SNP-PHAGE: High-Throughput SNP Discovery Pipeline

Bioinformatics Methods in Clinical Research

Part of the book series: Methods in Molecular Biology ((MIMB,volume 593))

Abstract

High-throughput genotyping technologies have become popular in studies that aim to reveal the genetics behind polygenic traits such as complex diseases and the diverse responses to some drug treatments. These technologies rely on bioinformatics tools to define strategies, analyze data, and estimate the final associations between certain genetic markers and traits. The strategy followed for an association study depends on its efficiency and cost. The efficiency is based on the assumed characteristics of the polymorphisms’ allele frequencies and linkage disequilibrium for putative causal alleles. Statistically significant markers (single mutations or haplotypes) that cause a human disorder should be validated and their biological function elucidated. The aim of this chapter is to present a subset of bioinformatics tools for haplotype inference, tag SNP selection, and genome-wide association studies using a high-throughput generated SNP data set.



Glossary

Allele

– One of the variant forms of a gene or a genetic locus.

Causative SNPs

– Changes in a single nucleotide that cause a disease or trait.

Coding SNPs (cSNPs)

– SNPs that occur in regions of a gene that are transcribed into RNA (i.e., an exon) and eventually translated into protein. cSNPs include synonymous SNPs (i.e., those that leave the encoded amino acid unchanged) and nonsynonymous SNPs (i.e., those that change the encoded amino acid).

Genetic map

– Also known as a linkage map. A genetic map shows the position of genes and/or markers on chromosomes relative to each other, based on genetic distance (rather than physical distance). The distance between any two markers is represented as a function of recombination.

Genetic marker

– A DNA sequence whose presence or absence can be reliably measured. Because DNA segments that are in close proximity tend to be inherited together, markers can be used to indirectly track the inheritance pattern of a gene or region known to be nearby.

Genotype

– The combination of alleles carried by an individual at a particular genetic locus.

Haplotype

– A haplotype is an ordered set of alleles located on one chromosome. Haplotypes reveal whether a chromosomal segment was maternally or paternally inherited and can be used to delineate the boundary of a possible disease-linked locus.

Haplotype tagging SNPs (htSNPs)

– Those SNPs that represent the variation in each block based on the linkage disequilibrium among the markers considered within a block.

Hardy–Weinberg equilibrium (HWE)

– The equilibrium between the allele frequencies and the genotype frequencies of a population. The frequency of a genotype stays constant from one generation to the next unless mating is nonrandom or mutations accumulate. When this holds, the genotype frequencies and the allele frequencies are said to be at “genetic equilibrium.” Genetic equilibrium is a basic principle of population genetics.
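
As an illustration of this definition, the following minimal Python sketch computes the genotype counts expected under Hardy–Weinberg proportions (p², 2pq, q²) for a biallelic marker and compares them with observed counts using a Pearson chi-square statistic. The function names and the choice of a chi-square test are assumptions made for this example; they are not taken from the chapter or from any specific software package.

```python
def hwe_expected_counts(n_aa, n_ab, n_bb):
    """Expected genotype counts (AA, AB, BB) under Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    # Frequency of allele A: each AA carries two copies, each AB carries one.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    return (p * p * n, 2 * p * q * n, q * q * n)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for departure from Hardy-Weinberg proportions.

    For a biallelic marker the statistic has one degree of freedom.
    Genotype classes with an expected count of zero are skipped.
    """
    observed = (n_aa, n_ab, n_bb)
    expected = hwe_expected_counts(n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Counts that match Hardy-Weinberg proportions exactly (p = q = 0.5).
    print(hwe_chi_square(25, 50, 25))
    # A sample with no heterozygotes at all, a strong departure.
    print(hwe_chi_square(50, 0, 50))
```

A large statistic suggests that the observed genotype counts depart from the proportions expected at equilibrium; for small samples or rare alleles an exact test is usually preferred over the chi-square approximation.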

Intronic SNPs

– Single-nucleotide polymorphisms that occur in noncoding regions of a gene that separate the exons (i.e., introns).

Linkage disequilibrium (LD)

– Phenomenon by which the alleles that are close together in the genome tend to be inherited together (haplotype).
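
The strength of this association between two biallelic loci is commonly summarized with the coefficient D, its normalized form D′, and the squared correlation r². The sketch below shows how these three quantities can be computed from one haplotype frequency and the two allele frequencies. The function name and its inputs are hypothetical and chosen for this example; the glossary itself does not say which LD measures the chapter uses.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D is the excess of the observed haplotype frequency over the frequency
    # expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could reach given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Complete LD: allele A always travels with allele B.
    print(ld_measures(0.5, 0.5, 0.5))
    # No LD: the haplotype frequency equals the product of allele frequencies.
    print(ld_measures(0.25, 0.5, 0.5))
```

In practice the haplotype frequency is rarely observed directly from unphased genotype data and has to be estimated first, which this sketch does not attempt.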

Linkage map

– See Genetic map.

Mendelian pattern of inheritance

– Refers to the predictable way in which single genes or traits can be passed from parents to children, such as in autosomal dominant, autosomal recessive, or sex-linked patterns.

Minor allele frequency (MAF)

– Given an SNP, its minor allele frequency is the frequency of the SNP’s less frequent allele in a given population.
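
A minimal Python sketch of this calculation, assuming diploid genotype calls stored as two-character strings such as "AG" with missing calls given as None. The input format and the function name are assumptions made for this example, not a format prescribed by the chapter.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Frequency of the less frequent allele of a biallelic SNP.

    genotypes: iterable of two-character genotype strings (e.g. "AA", "AG");
               None marks a missing call and is skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue
        counts.update(genotype)  # counts each of the two alleles separately

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called genotypes")
    if len(counts) > 2:
        raise ValueError("more than two alleles observed; expected a biallelic SNP")
    if len(counts) == 1:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / total


if __name__ == "__main__":
    sample = ["AA", "AG", "GG", "AA", None]
    print(minor_allele_frequency(sample))  # 3 G alleles out of 8 called alleles
```

Because the value depends on the sample, the same SNP can have a different minor allele frequency, or even a different minor allele, in another population.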

Mutation

– A change in the DNA sequence. A mutation can be a change from one base to another, a deletion of bases, or an addition of bases. Typically, the term “mutation” is used to refer to a disease-causing change, but technically any change, whether or not it causes a different phenotype, is a mutation.

Penetrance

– Penetrance describes the likelihood that a mutation will cause a phenotype. Some mutations have a high penetrance, almost always causing a phenotype, whereas others have a low penetrance, perhaps only causing a phenotype when other genetic or environmental conditions are present. The best way to measure penetrance is phenotypic concordance in monozygotic twins.

Phenotype

– Visible or detectable traits caused by underlying genetic or environmental factors. Examples include height, weight, blood pressure, and the presence or absence of disease.

Polygenic disorders

– Disorders that are caused by the combined effect of multiple genes, rather than by just one single gene. Most common disorders are polygenic. Because the genes involved are often not located near each other, their inheritance does not usually follow Mendelian patterns in families.

Surrogate SNPs

– Single-nucleotide polymorphisms that do not cause a phenotype but can be used to track one because of their strong physical association (linkage) to an SNP that does cause a phenotype.

Susceptibility

– The likelihood of developing a disease or condition.

Copyright information

© 2010 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Aransay, A.M., Matthiesen, R., Regueiro, M.M. (2010). SNP-PHAGE: High-Throughput SNP Discovery Pipeline. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_3

  • DOI: https://doi.org/10.1007/978-1-60327-194-3_3

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60327-193-6

  • Online ISBN: 978-1-60327-194-3

  • eBook Packages: Springer Protocols
