Published in: BMC Medical Genetics 1/2019

Open Access 01-12-2019 | Non-Hodgkin Lymphoma | Technical advance

A multivariable approach for risk markers from pooled molecular data with only partial overlap

Authors: Anne-Sophie Stelzer, Livia Maccioni, Aslihan Gerhold-Ay, Karin E. Smedby, Martin Schumacher, Alexandra Nieters, Harald Binder

Abstract

Background

Molecular measurements from multiple studies are increasingly pooled to identify risk scores, yet the measurements available from the different studies often overlap only partially. Univariate analyses of such markers are routinely performed in this setting, for example using meta-analysis techniques in genome-wide association studies to identify genetic risk scores. In contrast, multivariable techniques such as regularized regression, which might be more powerful, are hampered by the partial overlap of available markers, even when pooling individual-level data is feasible. This cannot easily be addressed at the preprocessing level, because quality criteria in the different studies may result in differential availability of markers, even after imputation.

Methods

Motivated by data from the InterLymph Consortium on risk factors for non-Hodgkin lymphoma, which exhibit these challenges, we adapted a regularized regression approach, componentwise boosting, to deal with partial overlap in single nucleotide polymorphisms (SNPs). This synthesis regression approach is combined with resampling to determine stable sets of SNPs, which could feed into a genetic risk score. The proposed approach is contrasted with univariate analyses, with an application of the lasso, and with an analysis that discards the studies causing the partial overlap. The question of statistical significance is addressed using stability selection.
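
The combination of componentwise boosting and resampling described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`componentwise_boost`, `inclusion_frequencies`, the simulated data) is hypothetical. The sketch assumes that unmeasured markers are coded as NaN, that each marker is centered at its observed mean, and that missing entries are then set to zero so that they contribute neither to the score statistic nor to the linear predictor. Markers are selected by a score statistic for a logistic model, and stability is assessed by refitting on random half-samples.

```python
import numpy as np


def componentwise_boost(X, y, steps=20, nu=0.1):
    """Componentwise boosting for logistic regression (illustrative sketch).

    X: (n, p) marker matrix; NaN marks a marker not measured for an individual.
    y: (n,) array of 0/1 outcomes. Returns the coefficient vector of length p.
    """
    observed = ~np.isnan(X)
    n_obs = np.maximum(observed.sum(axis=0), 1)
    col_mean = np.where(observed, X, 0.0).sum(axis=0) / n_obs
    # Assumption of this sketch: center on observed values, zero out the rest,
    # so individuals without a marker drop out of that marker's sums.
    Xc = np.where(observed, X - col_mean, 0.0)

    beta = np.zeros(X.shape[1])
    p_case = y.mean()
    intercept = np.log(p_case / (1.0 - p_case))

    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-(intercept + Xc @ beta)))
        w = mu * (1.0 - mu)
        score = Xc.T @ (y - mu)        # per-marker score, observed rows only
        info = (Xc ** 2).T @ w         # per-marker Fisher information
        stat = score ** 2 / np.where(info > 0, info, np.inf)
        j = int(np.argmax(stat))       # marker with the largest score statistic
        if info[j] <= 0:
            break
        beta[j] += nu * score[j] / info[j]      # damped one-step update
        intercept += (y - mu).sum() / w.sum()   # Newton step for the intercept
    return beta


def inclusion_frequencies(X, y, n_resamples=50, steps=20, nu=0.1, seed=0):
    """Fraction of random half-samples in which each marker gets selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += (componentwise_boost(X[idx], y[idx], steps, nu) != 0)
    return counts / n_resamples


# Simulated example: 400 individuals, 10 markers, only marker 0 has an effect.
rng = np.random.default_rng(1)
n, p = 400, 10
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
logit = -0.5 + 1.5 * (X[:, 0] - 0.6)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
X[300:, 1:4] = np.nan  # a second "study" that did not measure markers 1 to 3

freq = inclusion_frequencies(X, y, n_resamples=20, steps=10)
print(np.round(freq, 2))
```

Markers with high inclusion frequencies across the half-samples would be the candidates for a stable set. The number of boosting steps, the step size `nu` and any frequency threshold are tuning choices that this sketch leaves open.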

Results

Using an excerpt of the data from the InterLymph Consortium on two specific subtypes of non-Hodgkin lymphoma, we show that componentwise boosting can take into account all applicable information from different SNPs, irrespective of whether they are covered by all investigated studies and by all individuals within each study. The results indicate increased power, even when the studies that would be discarded in a complete case analysis comprise only a small proportion of individuals.

Conclusions

Given the observed gains in power, the proposed approach can be recommended more generally whenever there is only partial overlap of molecular measurements obtained from pooled studies and/or missing data within individual studies. A corresponding software implementation is available upon request.

Trial registration

All involved studies have provided signed GWAS data submission certifications to the U.S. National Institutes of Health and have been retrospectively registered.
Metadata
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Genetics / Issue 1/2019
Electronic ISSN: 1471-2350
DOI
https://doi.org/10.1186/s12881-019-0849-0