Published in: Malaria Journal 1/2016

Open Access 01-12-2016 | Research

Approaches to estimating inbreeding coefficients in clinical isolates of Plasmodium falciparum from genomic sequence data

Authors: John D. O’Brien, Lucas Amenga-Etego, Ruiqi Li


Abstract

Background

The advent of whole-genome sequencing has generated increased interest in modelling the structure of strain mixture within clinical infections of Plasmodium falciparum. The life cycle of the parasite implies that the mixture of multiple strains within an infected individual is related to the out-crossing rate across populations, making methods for measuring this process in situ central to understanding the genetic epidemiology of the disease.

Results

This paper derives a set of new estimators for inferring inbreeding coefficients using whole-genome sequence read count data from P. falciparum clinical samples. These estimators provide resources for assessing within-sample mixture that connect to extensive literatures in population genetics and conservation ecology. Features of the P. falciparum genome mean that standard methods for inbreeding coefficients and related F-statistics cannot be used directly. After reviewing an initial effort to estimate the inbreeding coefficient within clinical isolates of P. falciparum, several generalizations using both frequentist and Bayesian approaches are provided. A simpler, more intuitive frequentist estimator is shown to have nearly identical properties to the initial estimator both in simulation and in real data sets. The Bayesian approach connects these estimates to the Balding–Nichols model, a mainstay within genetic epidemiology, and a possible framework for more complex modelling. A simulation study shows strong performance for all estimators with as few as ten variants. Application to samples from the PF3K data set indicates significant across-country variation in within-sample mixture. Finally, a comparison with results from a recent mixture model for within-sample strain mixture shows that inbreeding coefficients provide a strong proxy for these more complex models.
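
To make the idea of a read-count inbreeding estimate concrete, the following is a minimal sketch and not the estimators derived in the paper or the pfmix API. It assumes a generic ratio form, F = 1 - (within-sample heterozygosity) / (population heterozygosity), summed over biallelic variants, and it assumes that within-sample allele frequencies follow a Balding–Nichols (beta) distribution around known population frequencies with binomially sampled reads. All function names and parameters are hypothetical.

```python
import random


def within_sample_heterozygosity(alt, depth):
    """Unbiased estimate of 2*w*(1-w) from `alt` alternate reads out of `depth`.

    Under binomial read sampling, E[alt * (depth - alt)] equals
    depth * (depth - 1) * w * (1 - w), hence the (depth - 1) correction.
    """
    return 2.0 * alt * (depth - alt) / (depth * (depth - 1))


def estimate_inbreeding(alt_counts, depths, pop_freqs):
    """Ratio estimate of F for one sample (assumed form, see lead-in)."""
    observed = 0.0
    expected = 0.0
    for alt, depth, p in zip(alt_counts, depths, pop_freqs):
        if depth < 2:
            continue  # heterozygosity is not estimable from a single read
        observed += within_sample_heterozygosity(alt, depth)
        expected += 2.0 * p * (1.0 - p)
    if expected == 0.0:
        raise ValueError("no informative variants")
    return 1.0 - observed / expected


def simulate_sample(pop_freqs, f, depth, rng):
    """Draw alternate-read counts under a Balding-Nichols model, 0 < f < 1.

    The within-sample frequency w at each variant is drawn from
    Beta(p * (1 - f) / f, (1 - p) * (1 - f) / f), which has mean p and
    variance f * p * (1 - p); reads are then sampled binomially.
    """
    scale = (1.0 - f) / f
    alt_counts = []
    for p in pop_freqs:
        w = rng.betavariate(p * scale, (1.0 - p) * scale)
        alt_counts.append(sum(rng.random() < w for _ in range(depth)))
    return alt_counts


if __name__ == "__main__":
    rng = random.Random(1)
    freqs = [rng.uniform(0.2, 0.8) for _ in range(2000)]
    alts = simulate_sample(freqs, f=0.3, depth=50, rng=rng)
    print(round(estimate_inbreeding(alts, [50] * len(freqs), freqs), 3))
```

Under these assumptions the expected within-sample heterozygosity at a variant is 2p(1 - p)(1 - F), so the ratio recovers F on average. A sample with no mixed calls at any variant gives an estimate of exactly 1.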

Conclusions

This paper provides a set of methods for estimating inbreeding coefficients within P. falciparum samples from whole-genome sequence data, supported by simulation studies and empirical examples. It includes a substantially simpler estimator with similar statistical properties to the estimator in current use. These methods will also be applicable to other species with similar life cycles. Implementations of the methods described are available in the open-source R package pfmix. Estimates for the PF3K public data release are provided as part of this resource.
Metadata
Title
Approaches to estimating inbreeding coefficients in clinical isolates of Plasmodium falciparum from genomic sequence data
Authors
John D. O’Brien
Lucas Amenga-Etego
Ruiqi Li
Publication date
01-12-2016
Publisher
BioMed Central
Published in
Malaria Journal / Issue 1/2016
Electronic ISSN: 1475-2875
DOI
https://doi.org/10.1186/s12936-016-1531-z
