Abstract
High-throughput genotyping technologies have become popular in studies that aim to reveal the genetics behind polygenic traits such as complex diseases and the variable response to some drug treatments. These technologies rely on bioinformatics tools to define study strategies, analyze data, and estimate the associations between particular genetic markers and traits. The strategy chosen for an association study depends on its efficiency and cost. Its efficiency, in turn, depends on the assumed allele frequencies of the polymorphisms and on their linkage disequilibrium with putative causal alleles. Markers (single mutations or haplotypes) that show a statistically significant association with a human disorder should be validated, and their biological function should be elucidated. This chapter presents a subset of bioinformatics tools for haplotype inference, tag SNP selection, and genome-wide association studies, applied to a SNP data set generated by high-throughput genotyping.
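The sketch below runs an allelic chi-square test on a single biallelic SNP in a case-control design. It illustrates one simple way to estimate an association between a marker and a trait. It is not the chapter's pipeline. The function name `allelic_chi_square`, the genotype-count input format (AA, Aa, aa), and the example counts are hypothetical. The sketch also assumes that every allele count is nonzero. A genome-wide study repeats a test like this for every SNP and must then correct for multiple testing and population stratification.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """Allelic 2x2 chi-square test for one biallelic SNP (hypothetical helper).

    Each argument is a tuple of genotype counts (AA, Aa, aa).
    Returns (chi2, p_value, odds_ratio), where the odds ratio is for allele 'a'.
    Assumes all four allele counts are nonzero.
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        return 2 * n_AA + n_Aa, n_Aa + 2 * n_aa  # (copies of A, copies of a)

    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    (case_A, case_a), (ctrl_A, ctrl_a) = table
    odds_ratio = (case_a * ctrl_A) / (case_A * ctrl_a)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, Aa, aa) for cases and controls
    chi2, p, odds = allelic_chi_square((30, 50, 20), (50, 40, 10))
    print(round(chi2, 2), round(odds, 2))  # 9.6 1.91
```

A 1-degree-of-freedom allelic test treats each chromosome as an independent observation. Trend tests on genotype counts are a common alternative when that assumption is in doubt.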
Glossary
- Allele – One of the variant forms of a gene or a genetic locus.
- Causative SNPs – Changes in a single nucleotide that cause a disease or trait.
- Coding SNPs (cSNPs) – SNPs that occur in the protein-coding portion of a gene, that is, in exonic sequence that is transcribed into RNA and eventually translated into protein. cSNPs include synonymous SNPs, which leave the encoded amino acid unchanged, and nonsynonymous SNPs, which change the encoded amino acid.
- Genetic map – Also known as a linkage map. A genetic map shows the positions of genes and/or markers on chromosomes relative to each other, based on genetic distance rather than physical distance. The distance between two markers is derived from the frequency of recombination between them and is usually expressed in centimorgans (cM).
- Genetic marker – A DNA sequence whose presence or absence can be reliably measured. Because DNA segments that are in close proximity tend to be inherited together, markers can be used to indirectly track the inheritance pattern of a gene or region known to be nearby.
- Genotype – The combination of alleles carried by an individual at a particular genetic locus.
- Haplotype – An ordered set of alleles located on one chromosome. Haplotypes reveal whether a chromosomal segment was maternally or paternally inherited and can be used to delineate the boundaries of a possible disease-linked locus.
- Haplotype tagging SNPs (htSNPs) – A subset of SNPs that captures the variation within a haplotype block. The subset is selected on the basis of the linkage disequilibrium among the markers in that block.
- Hardy–Weinberg equilibrium (HWE) – The state in which the genotype frequencies of a population are those predicted by its allele frequencies. For a biallelic locus with allele frequencies p and q, the genotype frequencies are p², 2pq, and q². Allele and genotype frequencies then stay constant from generation to generation unless mating is nonrandom or the population is affected by mutation, selection, migration, or genetic drift. Hardy–Weinberg equilibrium is a basic principle of population genetics.
- Intronic SNPs – Single-nucleotide polymorphisms that occur in introns, the noncoding regions of a gene that separate the exons.
- Linkage disequilibrium (LD) – The nonrandom association of alleles at different loci. Alleles that lie close together in the genome tend to be inherited together on the same haplotype.
- Linkage map – See Genetic map.
- Mendelian pattern of inheritance – The predictable way in which traits determined by a single gene are passed from parents to children, as in autosomal dominant, autosomal recessive, or sex-linked inheritance.
- Minor allele frequency (MAF) – For a given SNP, the frequency of its less common allele in a given population.
- Mutation – A change in the DNA sequence. A mutation can be a change from one base to another, a deletion of bases, or an addition of bases. Typically, the term "mutation" is used to refer to a disease-causing change, but technically any change, whether or not it causes a different phenotype, is a mutation.
- Penetrance – The probability that an individual carrying a mutation will express the associated phenotype. Some mutations have high penetrance and almost always cause the phenotype, whereas others have low penetrance and may cause it only when other genetic or environmental conditions are present. The best way to measure penetrance is through phenotypic concordance in monozygotic twins.
- Phenotype – Visible or detectable traits caused by underlying genetic or environmental factors. Examples include height, weight, blood pressure, and the presence or absence of disease.
- Polygenic disorders – Disorders caused by the combined effect of multiple genes rather than by a single gene. Most common disorders are polygenic. Because the genes involved are often not located near each other, their inheritance does not usually follow Mendelian patterns in families.
- Surrogate SNPs – Single-nucleotide polymorphisms that do not themselves cause a phenotype but can be used to track one because they are in strong linkage disequilibrium with an SNP that does cause it.
- Susceptibility – The likelihood of developing a disease or condition.
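The glossary entry for coding SNPs above distinguishes synonymous from nonsynonymous changes. The sketch below classifies a single-base substitution within one codon. It assumes the standard nuclear genetic code and a codon given 5'→3' on the coding strand. The helper names are hypothetical. A change to a stop codon ('*') is counted as nonsynonymous.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a single-base change at 0-based `position` (0-2) of a codon.

    Returns (category, reference amino acid, alternative amino acid).
    """
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if ref_codon[position] == alt_base:
        raise ValueError("alternative base must differ from the reference base")
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    category = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return category, ref_aa, alt_aa


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # ('synonymous', 'E', 'E')
    print(classify_coding_snp("GAG", 1, "T"))  # ('nonsynonymous', 'E', 'V')
```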
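The genetic map entry above derives map distance from recombination. The sketch below shows one common conversion, Haldane's map function, d = −½ ln(1 − 2r) Morgans. This function assumes no crossover interference. Other map functions, such as Kosambi's, make different assumptions, so the choice here is illustrative. The function names are hypothetical.

```python
import math


def haldane_cm(r):
    """Map distance in centimorgans from a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)  # -0.5 * ln(1 - 2r) Morgans, times 100


def haldane_r(d_cm):
    """Inverse of haldane_cm: recombination fraction from distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


if __name__ == "__main__":
    for r in (0.01, 0.1, 0.3):
        print(r, round(haldane_cm(r), 2))
```

For small r, the distance in cM is close to 100·r. As r approaches 0.5, which means the loci are effectively unlinked, the distance grows without bound.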
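The haplotype entry above describes an ordered set of alleles on one chromosome, but genotyping arrays report unphased genotypes. An individual who is heterozygous at k SNPs is therefore compatible with 2^(k−1) distinct haplotype pairs. The sketch below only enumerates these candidates, assuming genotypes coded as 0/1/2 copies of allele 1. Statistical haplotype inference methods, such as EM-based frequency estimation, are what choose among the candidates. Enumeration grows exponentially with k, so this is for small examples only. The function name is hypothetical.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    `genotype` gives, for each biallelic SNP, the number of copies (0, 1 or 2)
    of allele 1; haplotypes are returned as tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: 0 copies -> allele 0, 2 copies -> allele 1
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for site, allele in zip(het_sites, phase):
            h1[site], h2[site] = allele, 1 - allele
        # Sort the pair so (h1, h2) and (h2, h1) count once
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs([1, 2, 1]):
        print(pair)
```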
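The htSNP entry above selects tags from the linkage disequilibrium among markers. The sketch below uses a simple greedy pairwise approach rather than the block-based selection the glossary describes. It repeatedly picks the SNP that captures the most not-yet-tagged SNPs at r² ≥ threshold. The threshold of 0.8, the input matrix, and the function name are assumptions for illustration. This is not the algorithm of any specific tool.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedily choose tag SNPs so every SNP has r^2 >= threshold with some tag.

    `r2` is a symmetric square matrix (list of lists) of pairwise r^2 values.
    Returns the indices of the selected tags in ascending order.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        best, best_cover = None, set()
        for candidate in range(len(r2)):
            # Untagged SNPs this candidate would capture (a SNP always captures itself)
            cover = {j for j in untagged
                     if j == candidate or r2[candidate][j] >= threshold}
            if len(cover) > len(best_cover):
                best, best_cover = candidate, cover
        tags.append(best)
        untagged -= best_cover
    return sorted(tags)


if __name__ == "__main__":
    # Hypothetical r^2 matrix: SNPs 0-2 are in strong LD, SNP 3 is independent
    r2 = [[1.0, 0.9, 0.85, 0.1],
          [0.9, 1.0, 0.95, 0.05],
          [0.85, 0.95, 1.0, 0.1],
          [0.1, 0.05, 0.1, 1.0]]
    print(greedy_tag_snps(r2))  # [0, 3]
```

A greedy cover is not guaranteed to find the smallest possible tag set. It is simple and usually gives a reasonable reduction in the number of SNPs to genotype.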
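The Hardy–Weinberg entry above is often applied in quality control. Each SNP is tested for departure from the expected p², 2pq, q² proportions, because strong deviations in controls can point to genotyping error. The sketch below is a chi-square goodness-of-fit version of that test with 1 degree of freedom: three genotype classes, minus one, minus one estimated allele frequency. It assumes both alleles are observed. The function name is hypothetical. An exact test is generally preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions; returns (chi2, p_value)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: expected counts would be zero")
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit
```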
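The linkage disequilibrium entry above describes the association of alleles at nearby loci. That association is commonly quantified with D, D′, and r². The sketch below computes all three for two biallelic loci. It assumes phased haplotype counts are already available. With unphased genotypes, haplotype frequencies would first have to be estimated, for example by EM. The function name and counts are hypothetical.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic loci from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second locus
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    D = p_AB - p_A * p_B
    # D' scales D by its maximum possible magnitude given the allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2


if __name__ == "__main__":
    print(ld_measures(40, 10, 10, 40))  # D ~ 0.15, D' ~ 0.6, r^2 ~ 0.36
```

|D′| = 1 means at least one of the four haplotypes is absent. r² = 1 additionally requires the two loci to have matching allele frequencies, which is why r² is the measure usually used for tagging.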
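The minor allele frequency entry above can be computed directly from genotype counts. It is commonly used to filter out rare SNPs before association testing. The sketch below assumes genotype counts (AA, Aa, aa) per SNP. The 0.05 cut-off, the SNP identifiers, and the function names are hypothetical choices for illustration.

```python
def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """MAF of a biallelic SNP from genotype counts (AA, Aa, aa)."""
    n_alleles = 2 * (n_AA + n_Aa + n_aa)
    if n_alleles == 0:
        raise ValueError("no genotyped individuals")
    freq_A = (2 * n_AA + n_Aa) / n_alleles
    return min(freq_A, 1.0 - freq_A)


def filter_by_maf(snp_counts, min_maf=0.05):
    """Keep the IDs of SNPs whose MAF is at least `min_maf` (illustrative cut-off)."""
    return [snp for snp, counts in snp_counts.items()
            if minor_allele_frequency(*counts) >= min_maf]


if __name__ == "__main__":
    # Hypothetical genotype counts per SNP
    counts = {"snp1": (81, 18, 1), "snp2": (99, 1, 0)}
    print(filter_by_maf(counts))  # ['snp1']
```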
Copyright information
© 2010 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Aransay, A.M., Matthiesen, R., Regueiro, M.M. (2010). SNP-PHAGE: High-Throughput SNP Discovery Pipeline. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_3
DOI: https://doi.org/10.1007/978-1-60327-194-3_3
Published:
Publisher Name: Humana Press
Print ISBN: 978-1-60327-193-6
Online ISBN: 978-1-60327-194-3
eBook Packages: Springer Protocols