Published in: BMC Medical Research Methodology 1/2014

Open Access 01-12-2014 | Research article

Estimating the size of hidden populations from register data

Authors: Anders Ledberg, Peter Wennberg


Abstract

Background

Prevalence estimates of drug use, or of its consequences, are considered important in many contexts and may have substantial influence over public policy. However, it is rarely possible to simply count the relevant individuals, in particular when the defining characteristics might be illegal, as in the case of drug use. Consequently, methods are needed to estimate the size of such partly ‘hidden’ populations, and many such methods have been developed and used within epidemiology, including in studies of alcohol and drug use. Here we introduce a method appropriate for estimating the size of human populations given a single source of data, for example entries in a health-care registry.

Methods

The setup is the following: during a fixed time period, e.g. a year, individuals belonging to the target population have a non-zero probability of being “registered”. Each individual might be registered multiple times, and the time points of the registrations are recorded. Assuming that the population is closed and that the probability of being registered at least once is constant, we derive a family of maximum likelihood (ML) estimators of total population size. We study the ML estimator using Monte Carlo simulations and delimit the range of cases where it is useful. In particular, we investigate the effect of making the population heterogeneous with respect to the probability of being registered.
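
To make this setup concrete, the following is a minimal simulation sketch in Python. It is not the authors' code. It assumes, purely for illustration, that each individual is registered according to a homogeneous Poisson process with a common rate; the names `simulate_registrations`, `n_pop`, `rate`, and `period` are hypothetical.

```python
import random


def simulate_registrations(n_pop, rate, period=1.0, seed=0):
    """Simulate one registry for a closed population of size n_pop.

    Assumption (not from the article): registrations of each individual
    follow a homogeneous Poisson process with the same rate for everyone.
    Returns a dict mapping individual id -> sorted registration times,
    containing only individuals registered at least once.
    """
    rng = random.Random(seed)
    registry = {}
    for person in range(n_pop):
        times = []
        # Waiting times between events of a Poisson process are exponential.
        t = rng.expovariate(rate)
        while t < period:
            times.append(t)
            t += rng.expovariate(rate)
        if times:
            registry[person] = times
    return registry


registry = simulate_registrations(n_pop=2000, rate=0.3)
n_observed = len(registry)                      # individuals seen at least once
n_entries = sum(len(ts) for ts in registry.values())  # total registry entries
```

Under this assumed model the chance of being registered at least once is 1 - exp(-rate * period), roughly 26% for the values above, so only about a quarter of the population appears in the registry. The estimation problem is to recover `n_pop` from the contents of `registry` alone.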

Results

The new estimator is asymptotically unbiased, and we show that high-precision estimates can be obtained for samples covering as little as 25% of the total population. However, if the total population is small (say, on the order of 500), a larger fraction needs to be sampled to achieve reliable estimates. Further, we show that the estimator gives reliable estimates even when individuals differ in their probability of being registered. We also compare the ML estimator to an estimator known as Chao’s estimator and show that the latter can have a substantial bias when applied to epidemiological data.

Conclusions

The population size estimator suggested herein complements existing methods and is less sensitive to certain types of dependencies typical in epidemiological data.
Metadata
Title
Estimating the size of hidden populations from register data
Authors
Anders Ledberg
Peter Wennberg
Publication date
01-12-2014
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2014
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-14-58
