Published in: BMC Proceedings 2/2011

Open Access 01-12-2011 | Proceedings

Inferring ethnicity from mitochondrial DNA sequence

Authors: Chih Lee, Ion I Măndoiu, Craig E Nelson


Abstract

Background

The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Because it is maternally inherited, present in high copy number, and persists robustly in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome.

Results

We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome.

Conclusions

Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accuracy. In the presence of missing data, utilizing only the regions common to the training sequences and a test sequence proves to be the best strategy. Given these results, SVM algorithms are likely to also be useful in other DNA sequence classification applications.
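
The missing-data strategy described in the Conclusions, restricting the comparison to the regions shared by the training sequences and the test sequence, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: sequences are assumed to be pre-aligned strings of equal length, `N` is assumed to mark unsequenced positions, the group labels and toy sequences are invented, and a simple nearest-neighbor rule on Hamming distance stands in for the SVM so that the example needs only the Python standard library.

```python
MISSING = "N"  # assumed marker for positions that were not sequenced


def common_positions(train_seqs, test_seq):
    """Return alignment columns observed in every training sequence and in the test sequence."""
    seqs = list(train_seqs) + [test_seq]
    return [i for i in range(len(test_seq)) if all(s[i] != MISSING for s in seqs)]


def restrict(seq, positions):
    """Keep only the characters of seq at the given alignment columns."""
    return "".join(seq[i] for i in positions)


def hamming(a, b):
    """Count mismatching characters between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def predict_group(train, test_seq):
    """Classify test_seq using only the columns shared with all training sequences.

    train is a list of (aligned_sequence, group_label) pairs.
    A 1-nearest-neighbor rule is used here as a stand-in classifier.
    """
    positions = common_positions([seq for seq, _ in train], test_seq)
    if not positions:
        raise ValueError("no positions shared by training and test sequences")
    query = restrict(test_seq, positions)
    _, label = min(train, key=lambda item: hamming(restrict(item[0], positions), query))
    return label, positions


# Toy data (hypothetical): the test sequence is missing both ends of the region.
train = [
    ("ACGTACGT", "group_1"),
    ("ACGTACGA", "group_1"),
    ("TCGAACGT", "group_2"),
    ("NCGAACGT", "group_2"),
]
label, used = predict_group(train, "NNGAACGN")
print(label, used)
```

In this sketch the classifier is rebuilt for each test sequence from the shared columns only, so no missing value ever has to be imputed; the same restriction step could precede any classifier, including an SVM.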
Metadata
Title
Inferring ethnicity from mitochondrial DNA sequence
Authors
Chih Lee
Ion I Măndoiu
Craig E Nelson
Publication date
01-12-2011
Publisher
BioMed Central
Published in
BMC Proceedings / Issue Special Issue 2/2011
Electronic ISSN: 1753-6561
DOI
https://doi.org/10.1186/1753-6561-5-S2-S11
