Abstract
Many recent studies have established that haplotype diversity in a small region may not be greatly diminished when the number of markers is reduced to a smaller set of “haplotype-tagging” single-nucleotide polymorphisms (SNPs) that identify the most common haplotypes. These studies are motivated by the assumption that retention of haplotype diversity assures retention of power for mapping disease susceptibility by allelic association. Using two bodies of real data, three proposed measures of diversity, and regression-based methods for association mapping, we found no scenario for which this assumption was tenable. We compared the chi-square for composite likelihood and the maximum chi-square for single SNPs in diplotypes, excluding the marker designated as causal. All haplotype-tagging methods conserve haplotype diversity by selecting common SNPs. When the causal marker has a range of allele frequencies as in real data, chi-square decreases faster than under random selection as the haplotype-tagging set diminishes. Selecting SNPs by maximizing haplotype diversity is inefficient when their frequency is much different from the unknown frequency of the causal variant. Loss of power is minimized when the difference between minor allele frequencies of the causal SNP and a closely associated marker SNP is small, which is unlikely in ignorance of the frequency of the causal SNP unless dense markers are used. Therefore retention of haplotype diversity in simulations that do not mirror genomic allele frequencies has no relevance to power for association mapping. TagSNPs that are assigned to bins instead of haplotype blocks also lose power compared with random SNPs. This evidence favours a multi-stage design in which both models and density change adaptively.
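The dependence of power on matching allele frequencies can be illustrated with a short sketch. It is not the regression or composite-likelihood analysis used in this study: it swaps in the standard pairwise measure r², whose maximum for two biallelic SNPs is fixed by their minor allele frequencies (MAFs). The function names `odds`, `max_r2`, and `best_case_marker_chi2` are hypothetical, and the scaling of the marker chi-square by r² is the usual large-sample approximation for a single-SNP allelic test, assumed here for illustration only.

```python
def odds(p: float) -> float:
    """Odds p / (1 - p) of an allele with frequency p."""
    return p / (1.0 - p)


def max_r2(maf_causal: float, maf_marker: float) -> float:
    """Upper bound on r^2 between two biallelic SNPs given their MAFs.

    With both frequencies in (0, 0.5], the bound is reached when the two
    minor alleles are in complete coupling, and equals the ratio of the
    smaller allele odds to the larger.
    """
    for p in (maf_causal, maf_marker):
        if not 0.0 < p <= 0.5:
            raise ValueError("minor allele frequency must lie in (0, 0.5]")
    lo, hi = sorted((maf_causal, maf_marker))
    return odds(lo) / odds(hi)


def best_case_marker_chi2(chi2_causal: float, maf_causal: float,
                          maf_marker: float) -> float:
    """Best-case chi-square at the marker (illustrative assumption:
    chi-square at the marker is roughly r^2 times chi-square at the
    causal SNP)."""
    return chi2_causal * max_r2(maf_causal, maf_marker)


if __name__ == "__main__":
    # A common marker (MAF 0.40) paired with causal SNPs of varying frequency.
    for maf_causal in (0.05, 0.10, 0.20, 0.40):
        bound = max_r2(maf_causal, 0.40)
        print(f"causal MAF {maf_causal:.2f}, marker MAF 0.40: max r^2 = {bound:.3f}")
```

Under these assumptions, a common tagging SNP cannot be in strong association with a rare causal variant however close the two are, which is consistent with the loss of power reported when selection favours common SNPs.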
Acknowledgements
We are grateful to Alec Jeffreys and Mark Daly for making their data publicly available. We thank Daniel Stram and Kui Zhang for the tagSNPs and HapBlock programs and suggestions in using them. This work was supported by a grant from the Medical Research Council.
Cite this article
Zhang, W., Collins, A. & Morton, N.E. Does haplotype diversity predict power for association mapping of disease susceptibility? Hum Genet 115, 157–164 (2004). https://doi.org/10.1007/s00439-004-1122-x