SNP-PHAGE: High-Throughput SNP Discovery Pipeline

Aransay, Ana M.; Matthiesen, Rune; Regueiro, Manuela M.

doi:10.1007/978-1-60327-194-3_3

Part of the book series: Methods in Molecular Biology (MIMB, volume 593)

Abstract

High-throughput genotyping technologies have become popular in studies that aim to reveal the genetics behind polygenic traits such as complex diseases and the variable response to some drug treatments. These technologies rely on bioinformatics tools to define strategies, analyze data, and estimate the final associations between genetic markers and traits. The strategy followed for an association study depends on its efficiency and cost. Efficiency depends on the assumed allele frequencies of the polymorphisms and on their linkage disequilibrium with putative causal alleles. Statistically significant markers (single mutations or haplotypes) that cause a human disorder should be validated and their biological function elucidated. The aim of this chapter is to present a subset of bioinformatics tools for haplotype inference, tag SNP selection, and genome-wide association studies using a high-throughput-generated SNP data set.

Author information

Authors and Affiliations

Functional Genomics Unit, Parque Tecnológico de Bizkaia, Derio, Spain
Ana M. Aransay
Instituto de Patologia e Imunologia Molecular da Universidade do Porto – IPATIMUP, Porto, Portugal
Rune Matthiesen
Department of Biological Sciences, Florida International University, Miami, FL, USA
Manuela M. Regueiro

Editor information

Editors and Affiliations

Inst. Patologia e Imunologia Molecular, Universidade do Porto, Rua Dr. Roberto Frias s/n, Porto, 4200-465, Portugal
Rune Matthiesen

Glossary

Allele: One of the variant forms of a gene or a genetic locus.
Causative SNPs: Changes in a single nucleotide that cause a disease or trait.
Coding SNPs (cSNPs): SNPs that occur in regions of a gene that are transcribed into RNA (i.e., exons) and eventually translated into protein. cSNPs include synonymous SNPs (which encode the same amino acid) and nonsynonymous SNPs (which encode a different amino acid).
Genetic map: Also known as a linkage map. A genetic map shows the position of genes and/or markers on chromosomes relative to each other, based on genetic distance (rather than physical distance). The distance between any two markers is represented as a function of the recombination between them.
Genetic marker: A DNA sequence whose presence or absence can be reliably measured. Because DNA segments that are in close proximity tend to be inherited together, markers can be used to indirectly track the inheritance pattern of a gene or region known to be nearby.
Genotype: The combination of alleles carried by an individual at a particular genetic locus.
Haplotype: An ordered set of alleles located on one chromosome. Haplotypes reveal whether a chromosomal segment was maternally or paternally inherited and can be used to delineate the boundary of a possible disease-linked locus.
Haplotype tagging SNPs (htSNPs): Those SNPs that represent the variation in each block based on the linkage disequilibrium among the markers considered within a block.
Hardy–Weinberg equilibrium (HWE): The equilibrium between the allele frequencies and the genotype frequencies of a population. The frequency of a genotype stays constant from one generation to the next unless mating is nonrandom or mutations accumulate. When this holds, the genotype frequencies and the allele frequencies are said to be at “genetic equilibrium.” Genetic equilibrium is a basic principle of population genetics.
Intronic SNPs: Single-nucleotide polymorphisms that occur in the noncoding regions of a gene that separate the exons (i.e., introns).
Linkage disequilibrium (LD): Phenomenon by which alleles that are close together in the genome tend to be inherited together (as a haplotype).
Linkage map: See Genetic map.
Mendelian pattern of inheritance: Refers to the predictable way in which single genes or traits can be passed from parents to children, such as in autosomal dominant, autosomal recessive, or sex-linked patterns.
Minor allele frequency (MAF): The frequency of an SNP’s less frequent allele in a given population.
Mutation: A change in the DNA sequence. A mutation can be a change from one base to another, a deletion of bases, or an addition of bases. Typically, the term “mutation” is used to refer to a disease-causing change, but technically any change, whether or not it causes a different phenotype, is a mutation.
Penetrance: The likelihood that a mutation will cause a phenotype. Some mutations have a high penetrance, almost always causing a phenotype, whereas others have a low penetrance, perhaps only causing a phenotype when other genetic or environmental conditions are present. The best way to measure penetrance is phenotypic concordance in monozygotic twins.
Phenotype: Visible or detectable traits caused by underlying genetic or environmental factors. Examples include height, weight, blood pressure, and the presence or absence of disease.
Polygenic disorders: Disorders that are caused by the combined effect of multiple genes, rather than by a single gene. Most common disorders are polygenic. Because the genes involved are often not located near each other, their inheritance does not usually follow Mendelian patterns in families.
Surrogate SNPs: Single-nucleotide polymorphisms that do not cause a phenotype but can be used to track one because of their strong physical association (linkage) to an SNP that does cause a phenotype.
Susceptibility: The likelihood of developing a disease or condition.
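
Three of the glossary quantities (MAF, HWE, and LD) can be made concrete with a few lines of arithmetic. The following is a minimal Python sketch, not code from the chapter: the function names are hypothetical, the SNP is assumed to be biallelic with alleles A and a, the HWE check is the standard one-degree-of-freedom chi-square goodness-of-fit test, and the LD function assumes that phased haplotype counts for two SNPs are already available.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the frequencies of alleles A and a, from genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    p = (2 * n_AA + n_Aa) / total_alleles
    return p, 1.0 - p


def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """MAF: the frequency of the less frequent of the two alleles."""
    return min(allele_frequencies(n_AA, n_Aa, n_aa))


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic (1 df) comparing observed genotype counts with
    the Hardy-Weinberg expectations n*p^2, 2*n*p*q, and n*q^2."""
    n = n_AA + n_Aa + n_aa
    p, q = allele_frequencies(n_AA, n_Aa, n_aa)
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNP) to avoid dividing by 0.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B  # deviation from independent assortment
    # Lewontin's normalization: divide D by its maximum attainable magnitude.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    print(minor_allele_frequency(81, 18, 1))
    print(hwe_chi_square(40, 20, 40))
    print(linkage_disequilibrium(50, 0, 0, 50))
```

As a rough guide for reading the output, an HWE statistic above about 3.84 corresponds to p < 0.05 at one degree of freedom. In practice, genotypes are usually unphased, so the haplotype counts passed to the LD function would first have to be estimated by a haplotype inference method.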

About this protocol

Cite this protocol

Aransay, A.M., Matthiesen, R., Regueiro, M.M. (2010). SNP-PHAGE: High-Throughput SNP Discovery Pipeline. In: Matthiesen, R. (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_3

DOI: https://doi.org/10.1007/978-1-60327-194-3_3
Published: 06 November 2009
Publisher Name: Humana Press
Print ISBN: 978-1-60327-193-6
Online ISBN: 978-1-60327-194-3
eBook Packages: Springer Protocols
