Published in: BMC Medical Genetics 1/2018

Open Access 01-12-2018 | Research article

Validation of genotype imputation in Southeast Asian populations and the effect of single nucleotide polymorphism annotation on imputation outcome

Authors: Worachart Lert-itthiporn, Bhoom Suktitipat, Harald Grove, Anavaj Sakuntabhai, Prida Malasit, Nattaya Tangthawornchaikul, Fumihiko Matsuda, Prapat Suriyaphol

Abstract

Background

Imputation involves the inference of untyped single nucleotide polymorphisms (SNPs) in genome-wide association studies. The haplotypic reference of choice for imputation in Southeast Asian populations is unclear. Moreover, the influence of SNP annotation on imputation results has not been examined.

Methods

This study was divided into two parts. In the first part, we applied imputation to genotyped SNPs of Southeast Asian populations from the Pan-Asian SNP database. Five percent of the total SNPs were removed, the remaining SNPs were imputed with IMPUTE2, and the imputed SNPs were verified against the removed SNPs. We compared two imputation references: Chinese and Japanese haplotypes from HapMap phase II (HMII) and the complete set of haplotypes from the 1000 Genomes Project (1000G). In the second part, we assessed imputation accuracy and yield in a Thai patient dataset. Half of the autosomal SNPs were removed to create Set 1. A second dataset, Set 2, was then created by removing the other half instead. Both Set 1 and Set 2 were imputed with HMII to create a complete imputed SNP dataset, which was used to validate association testing, SNP annotation and imputation outcome.
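The masking-and-verification design described in the Methods can be illustrated with a short Python sketch. It is not the authors' pipeline: the function names, the data layout and the 0.9 calling threshold are all hypothetical, and the abstract does not define accuracy or yield. Here accuracy is assumed to be the share of called genotypes that match the held-out truth, and yield the share of masked genotypes that receive a call at all.

```python
import random


def split_mask(snp_ids, fraction, seed=0):
    """Randomly hide `fraction` of the SNP IDs; return (kept, masked)."""
    rng = random.Random(seed)
    ids = list(snp_ids)
    n_masked = round(len(ids) * fraction)
    masked = set(rng.sample(ids, n_masked))
    kept = [s for s in ids if s not in masked]
    return kept, sorted(masked)


def complementary_halves(snp_ids, seed=0):
    """Return the SNPs retained in Set 1 and in Set 2.

    Set 1 keeps one random half; Set 2 keeps exactly the half that
    Set 1 removed, so every SNP is masked in one of the two sets.
    """
    set1_kept, set2_kept = split_mask(snp_ids, 0.5, seed)
    return set1_kept, set2_kept


def accuracy_and_yield(truth, imputed, threshold=0.9):
    """Score imputed genotypes against held-out true genotypes.

    truth:   {(sample, snp): genotype}
    imputed: {(sample, snp): (genotype, probability)}
    threshold: assumed minimum probability for a genotype to count as called.
    Genotype strings are assumed to be normalised (e.g. "AG", never "GA").
    """
    called = correct = 0
    for key, true_genotype in truth.items():
        call = imputed.get(key)
        if call is None or call[1] < threshold:
            continue  # no confident call: lowers yield, not accuracy
        called += 1
        if call[0] == true_genotype:
            correct += 1
    yield_ = called / len(truth) if truth else 0.0
    accuracy = correct / called if called else 0.0
    return accuracy, yield_
```

Under these assumptions, raising the threshold tends to discard uncertain calls, which can raise accuracy while lowering yield. That is one way to picture the trade-off the study reports between reference panels.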

Results

Accuracy was highest for all populations when using the HMII reference, but at the cost of a lower yield. Thai genotypes showed higher accuracy than those of the other populations with both the HMII and 1000G panels, although accuracy and yield varied across chromosomes. Imputation was also tested in a clinical dataset to compare accuracy in gene-related regions; coding regions were found to have higher accuracy and yield.

Conclusions

This work provides the first evidence to guide imputation reference selection for Southeast Asian studies and highlights the effect of SNP location relative to genes on imputation outcome. Researchers will need to consider the trade-off between accuracy and yield in future imputation studies.
Metadata
Title
Validation of genotype imputation in Southeast Asian populations and the effect of single nucleotide polymorphism annotation on imputation outcome
Authors
Worachart Lert-itthiporn
Bhoom Suktitipat
Harald Grove
Anavaj Sakuntabhai
Prida Malasit
Nattaya Tangthawornchaikul
Fumihiko Matsuda
Prapat Suriyaphol
Publication date
01-12-2018
Publisher
BioMed Central
Published in
BMC Medical Genetics / Issue 1/2018
Electronic ISSN: 1471-2350
DOI
https://doi.org/10.1186/s12881-018-0534-8
