Respondent Driven Sampling

Respondent driven sampling: Feasibility, validation and its potential use for estimating health indicators for large general populations.

By Robin Lee

A dissertation submitted to the University of Albany, State University of New York in partial fulfillment of the requirements for the Degree of Doctor of Philosophy. School of Public Health, Department of Epidemiology and Biostatistics, 2010

Modelling the Effect of Differential Recruitment on the Bias of Estimators for Respondent-Driven Sampling

Amber Tomas*

Department of Statistics University of Oxford
January 18, 2011

*With thanks to Krista J. Gile for many helpful suggestions and conversations.


Respondent Driven Sampling has previously been modelled as a random walk on a network. In this document we show that this model can be used to encompass within-group differential recruitment, and examine the implications for bias of several common estimators.

1 Introduction

Respondent Driven Sampling (RDS) (Heckathorn, 1997) is currently a widely used method for sampling from hidden populations. The basic method used to select a respondent-driven sample is as follows: Initially, a number of individuals from the population of interest are selected as seeds. Seeds are selected from a group of individuals in the population who are known to the researcher. Each seed is given a number of coupons, each of which has a unique bar-code, and is asked to pass on the coupons to other people they know within the population. When an individual has received a coupon, they are asked to report to a study centre where information of interest is collected by the researcher (such information is also collected from the seeds). A small monetary reward is often offered at this stage to encourage response. The responders are then themselves given coupons, and are asked to hand them on to others they know within the population, usually only to those who have not yet been recruited. In this manner, after the initial selection of seeds, the sampling is driven by the respondents. Those who report to the study centre are known to those who have already been selected, and recruitee-recruiter relationships can be determined from the bar-codes of the coupons. The information available on which to base an estimate is therefore information collected from the respondents and from the recruitment patterns. Respondents are usually asked how many people they know within the population of interest. This provides an estimate of degree, as described later.
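The recruitment procedure described above can be sketched as a small simulation. This is only an illustrative sketch under simplifying assumptions that are not taken from the source: every coupon that is handed out is redeemed, recruiters choose uniformly at random among acquaintances who have not yet been recruited, and the toy ring network and the names `ring_network` and `simulate_rds` are hypothetical.

```python
import random
from collections import deque


def ring_network(n, k=2):
    """Toy acquaintance network: node i knows the k nearest nodes on each side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def simulate_rds(neighbours, seeds, coupons=3, target=50, rng=None):
    """Return a list of (respondent, recruiter) pairs; seeds have recruiter None.

    Assumes every coupon is redeemed and that nobody is recruited twice
    (sampling without replacement).
    """
    rng = rng or random.Random(0)
    recruited = set(seeds)
    sample = [(s, None) for s in seeds]
    queue = deque(seeds)  # respondents who still hold coupons
    while queue and len(sample) < target:
        recruiter = queue.popleft()
        # Acquaintances who have not yet taken part in the study.
        eligible = [n for n in neighbours[recruiter] if n not in recruited]
        rng.shuffle(eligible)  # uniform choice among eligible acquaintances
        for recruit in eligible[:coupons]:
            if len(sample) >= target:
                break
            recruited.add(recruit)
            sample.append((recruit, recruiter))  # the coupon links the pair
            queue.append(recruit)
    return sample


if __name__ == "__main__":
    network = ring_network(30)
    for respondent, recruiter in simulate_rds(network, seeds=[0, 15], target=10):
        print(respondent, "recruited by", recruiter)
```

The recorded pairs play the role of the coupon bar-codes: they are what allows recruitment chains to be reconstructed afterwards. The length of each node's neighbour list stands in for the self-reported degree.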

The original RDS paper (Heckathorn, 1997) suggested using the sample proportion as an estimator of population proportion (we refer to this as the “Naïve” estimator), and showed that this estimator is unbiased under very strong assumptions about the sampling process. Subsequent papers (Salganik and Heckathorn, 2004; Volz and Heckathorn, 2008; Heckathorn, 2007) have relaxed some of these assumptions and proposed several alternative estimators. We will refer to these estimators as the Salganik-Heckathorn (SH) estimator (Salganik and Heckathorn, 2004), the Volz-Heckathorn (VH) estimator (Volz and Heckathorn, 2008) and the Heckathorn (H) estimator (Heckathorn, 2007).
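To make the difference between two of these estimators concrete, the sketch below computes the Naïve estimator and the VH estimator in the form in which it is usually stated, namely a weighted sample proportion with weights inversely proportional to reported degree. The excerpt itself does not give the formulas, so the function names and the toy data are illustrative assumptions; the SH and H estimators are not shown.

```python
def naive_estimate(in_group):
    """Sample proportion: the share of respondents who belong to the group."""
    return sum(in_group) / len(in_group)


def vh_estimate(in_group, degrees):
    """Inverse-degree weighted proportion (usual statement of the VH estimator).

    in_group: 0/1 indicators of group membership, one per respondent.
    degrees:  self-reported degrees, assumed to be strictly positive.
    """
    weights = [1.0 / d for d in degrees]
    weighted_members = sum(w for w, g in zip(weights, in_group) if g)
    return weighted_members / sum(weights)


if __name__ == "__main__":
    # Hypothetical sample: group members report much higher degree.
    in_group = [1, 1, 0, 0]
    degrees = [10, 10, 2, 2]
    print("Naive:", naive_estimate(in_group))
    print("VH:   ", vh_estimate(in_group, degrees))
```

In this toy sample the two group members report a degree of 10 and the two non-members a degree of 2, so the inverse-degree weights pull the VH estimate well below the sample proportion. This reflects the idea that well-connected individuals are more likely to be reached by recruitment chains and should therefore count for less.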

Although the estimators are easy to implement, the nature of their behaviour is not well understood. The main reason for this is that several of the assumptions which underpin the theoretical frameworks in which the estimators are derived and analysed are not met in practice. For example, it is generally assumed that sampling is with replacement or that seeds are selected randomly, whereas in practice these conditions almost never hold.

Another assumption used to derive the estimators, and one that is unlikely to hold in practice, is that sampled individuals recruit uniformly at random from their acquaintances in the population. When this does not hold, we say that there is differential recruitment.

In this paper we extend the theoretical framework used in Goel and Salganik (2009) and show how it can be used to incorporate some types of differential recruitment. We then investigate the effect that differential recruitment will have on the behaviour of the estimators. In the remainder of this document we first introduce the generalised network model which forms the basis for derivation of the RDS estimators. We then present and briefly discuss the form of the estimators used in this study. This allows us to use the network model to analyse the effect of differential recruitment on the bias of the estimators.
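One way to see why differential recruitment matters in a random-walk model is to compare the long-run visit probabilities of the walk with and without recruitment preferences. The sketch below is not the model developed in the paper, whose equations are not part of this excerpt. It assumes, purely for illustration, that each potential recruit j carries a positive preference weight w_j and that a recruiter picks among their acquaintances with probability proportional to those weights; equal weights recover uniform recruitment. The four-node network and all names are hypothetical.

```python
def transition_matrix(neighbours, weight):
    """P[i][j]: probability that i recruits j, proportional to weight[j]."""
    P = {}
    for i, nbrs in neighbours.items():
        total = sum(weight[j] for j in nbrs)
        P[i] = {j: weight[j] / total for j in nbrs}
    return P


def stationary(P, iterations=2000):
    """Approximate the stationary distribution by repeated propagation.

    Assumes the network is connected and not bipartite, so that the
    iteration converges.
    """
    pi = {i: 1.0 / len(P) for i in P}
    for _ in range(iterations):
        nxt = {i: 0.0 for i in P}
        for i, row in P.items():
            for j, p in row.items():
                nxt[j] += pi[i] * p
        pi = nxt
    return pi


if __name__ == "__main__":
    # Toy network: a triangle 0-1-2 with a pendant node 3 attached to node 2.
    neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    uniform = {i: 1.0 for i in neighbours}
    preferred = {0: 1.0, 1: 1.0, 2: 1.0, 3: 3.0}  # node 3 is favoured
    print(stationary(transition_matrix(neighbours, uniform)))
    print(stationary(transition_matrix(neighbours, preferred)))
```

Under uniform recruitment the stationary probabilities are proportional to degree, which is the property that degree-based weighting relies on. Under the assumed preference weights they are instead proportional to w_i times the summed weights of i's acquaintances, so weighting by degree alone would no longer correct the sample.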

Modified respondent driven sampling as a practical method for sampling of hidden “risk” networks Pavlo Smyrnov

Conference abstract

BACKGROUND. Network epidemiology is a promising area of epidemiological research that can improve understanding of how infection is transmitted within a population. Its empirical base is still developing: the theoretical background has been established and tested on computer-simulated models. One of the most difficult tasks in moving to empirical work is developing a feasible method of gathering reliable information on network links. This is particularly challenging when those links represent sexual and drug use relationships within a hidden group of injecting drug users.

METHODS. A combination of a coupon referral system and a personal network name-generator was used. All participants were asked to provide information about their sexual partners and about drug users with whom they had communicated personally during the last 30 days. This information was entered into the name-generator. Coupons were provided only for the current “risk” contacts, i.e. those injecting drug users with whom participants had sex or injected together. There was no limit on the recruitment of “risk” contacts. All named “risk” contacts, whether recruited or not, formed the egocentric networks of the study participants. These egocentric networks were linked through referral chains into larger network clusters. Repeated referrals provided information on new links.

RESULTS. Most of the study participants named and recruited their “risk” contacts. At each of the 5 study sites, at least 300 injecting drug users were selected from the name generators of their peers and recruited through the coupon referral system.

CONCLUSIONS. Respondent driven sampling was modified to sample network links. The recruitment links in the current study are therefore also the “risk” links through which infection can be transmitted when it is present in the network. These links form the “risk” network, which can be studied by network analysis methods.

The sensitivity of respondent-driven sampling

Xin Lu, Linus Bengtsson, Tom Britton, Martin Camitz, Beom Jun Kim, Anna Thorson, Fredrik Liljeros

Article first published online: 18 JUL 2011

DOI: 10.1111/j.1467-985X.2011.00711.x

© 2011 Royal Statistical Society


Keywords: Directed network; Hidden population; Network; Respondent-driven sampling; Sampling; Sensitivity

Summary. Researchers in many scientific fields make inferences from individuals to larger groups. For many groups, however, there is no list of members from which to draw a random sample. Respondent-driven sampling (RDS) is a relatively new sampling methodology that circumvents this difficulty by using the social networks of the groups under study. The RDS method has been shown to provide unbiased estimates of population proportions given certain conditions. The method is now widely used in human immunodeficiency virus related studies among high risk populations globally. We test the RDS methodology by simulating RDS studies on the social networks of a large Lesbian, gay, bisexual and transgender Web community. The robustness of the RDS method is tested by violating, one by one, the conditions under which the method provides unbiased estimates. Simulations indicate that the bias is large if networks are directed or respondents choose to invite people on the basis of characteristics that are correlated with the study outcomes. The bias and variance increase if participants invite close as opposed to more distant friends whereas sampling in denser networks sharply reduces variance. However, the RDS method shows strong resistance to sampling without replacement, low response rates and certain errors in the participants’ reporting of their network sizes, as well as the selection criteria of seeds. The effects of network structure and the number of seeds and coupons are also discussed.

Final Report Submitted to NIJ on Meth Markets in NYC

Travis Wendel
Bilal Khan
Kirk Dombrowski
Ric Curtis
Katherine McLean
Evan Misshula
Robert Riggs
David M. Marshall IV


Using Respondent Driven Sampling, this study piloted an innovative research design mixing qualitative and quantitative data collection methods, and social network analysis, that addresses a gap in information on retail methamphetamine markets and the role of illicit drug markets in consumption. Based on a sample of 132 methamphetamine users, buyers and sellers in New York City (NYC), findings describe a bifurcated market defined by differences in sexual identity, drug use behaviors, social network characteristics, and drug market behaviors. The larger submarket is a closed market related to a sexual network of men who have sex with men (MSM) where methamphetamine (referred to as “tina”) is used as a sex drug. The smaller submarket is a less-closed market not denominated by sexual identity where methamphetamine (referred to as “crank,” “speed,” or “crystal meth”) overlaps with powder and crack cocaine markets. Participants in the MSM submarket viewed “tina” as very different from cocaine, due to what they characterized as the drug’s intense sexual effects, whereas participants in the smaller non-sexual-identity-denominated submarket saw “crystal meth” as a cost-effective alternative to cocaine. While majorities of participants in all subpopulations studied reported that their use of methamphetamine primarily centered on sex, almost all (91%) MSM reported this. Many MSM reported that their sexuality had become indistinguishable from their drug use. MSM had denser patterns of social network ties and many more sex partners than other subpopulations. MSM market participants reported higher prices for the drug, which may be an indication that they are accessing purer forms of methamphetamine. Participants were more willing to discuss accessing or purchasing methamphetamine than they were to discuss providing or selling the drug, although all indications are that most market participants do both. Compared with the sometimes highly organized markets that have existed for other illegal drugs (e.g., heroin, cocaine, marijuana), retail methamphetamine markets have remained, by contrast, relatively primitive in their social and technical organization, and distinct patterns of drug use emerged as an outcome of interactions between drug providers and members of their social networks. In this case, those with less structurally advantageous positions within the network must depend on better-positioned network contacts to supply them with methamphetamine. Findings from the study indicate that the most striking characteristic of the methamphetamine market in New York City is the extent of the secondary market. Study data suggest this large secondary market has developed because of “bottlenecks” in the chain of distribution, which may be the outcome of the inconsistent supply of methamphetamine available in New York City. Participants reported essentially no violence in connection with methamphetamine markets in NYC. Participants have a lifetime total of 13 methamphetamine possession arrests for the sample of 132; none has ever been arrested for methamphetamine distribution. Study findings may be useful to practitioners, policy-makers and researchers in fields including law enforcement, criminal justice, and public health and substance abuse treatment.

National HIV Behavioral Surveillance Study Among Men who Have Sex with Men

The New York City Department of Health and Mental Hygiene and its collaborators at John Jay College of Criminal Justice are conducting a large study of HIV risk among gay and non-gay identified men who have sex with men (MSM) in New York City. The study is part of a National HIV Behavioral Surveillance (NHBS) study funded by the Centers for Disease Control and Prevention (CDC), which is conducted in 20 cities in the United States with high rates of HIV.

The goals of the NHBS study are to understand the characteristics of MSM at risk of HIV infection in New York City and the factors associated with risk. This study will help the health department plan future efforts to prevent HIV infections among MSM and to address unmet needs in current HIV prevention activities.


