Published in: BMC Medical Research Methodology | Issue 1/2017

Open Access 01-12-2017 | Research article

Addressing data privacy in matched studies via virtual pooling

Authors: P. Saha-Chaudhuri, C.R. Weinberg

Abstract

Background

Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in multi-center studies and research with distributed data. A single dataset that includes covariate information for all participants would be ideal for straightforward analysis, but confidentiality restrictions forbid creating one. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations.

Methods

We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data and, because individual participant data are not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest.
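The within-node aggregation step can be illustrated with a minimal Python sketch. The data layout (one case covariate and a list of control covariates per matched set), the function name virtual_pool, the use of sums as the aggregate, and the pool size are all assumptions made for illustration; the abstract states only that covariates are aggregated within nodes.

```python
from typing import List, Tuple

# Hypothetical layout: each matched set is (case_covariate, [control_covariates]).
MatchedSet = Tuple[float, List[float]]


def virtual_pool(matched_sets: List[MatchedSet], pool_size: int) -> List[Tuple[float, float]]:
    """Sum case and control covariates over consecutive groups of matched sets.

    Returns one (pooled_case_sum, pooled_control_sum) pair per pool. Only these
    sums would leave the node; individual values stay local.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    pooled = []
    for start in range(0, len(matched_sets), pool_size):
        # The last group may hold fewer than pool_size matched sets.
        group = matched_sets[start:start + pool_size]
        case_sum = sum(case for case, _ in group)
        control_sum = sum(sum(controls) for _, controls in group)
        pooled.append((case_sum, control_sum))
    return pooled


# Four 1:1 matched sets held at a single node, pooled two sets at a time.
node_data = [(2.0, [1.0]), (3.0, [2.5]), (1.5, [1.0]), (4.0, [3.0])]
pooled = virtual_pool(node_data, pool_size=2)
print(pooled)  # [(5.0, 3.5), (5.5, 4.0)]
```

In this sketch a trailing pool may be smaller than the others; a real deployment would need a rule for such remainders, since a pool of size one discloses an individual value.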

Results

Parameter estimates from standard conditional logistic regression are compared with estimates from a conditional logistic regression model fit to aggregated data. The estimates based on aggregated data are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage.

Conclusions

Virtual data pooling can be used to maintain confidentiality of data from multi-center studies and can be particularly useful in research with large-scale distributed data.

We used the clogit function in the programming language R to fit a conditional logistic regression model to data from a matched case-control design.

All simulations and statistical analyses were performed using the programming language R. The R function “clogit” (from the survival package) was used to estimate the parameters of eq. (4) and eq. (5). R code is available upon request from the corresponding author.
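The authors' R code is not reproduced here. As a rough stand-in, the following Python sketch fits a conditional logistic model for 1:1 matched data with a single covariate, once on individual case-control differences and once on pooled differences. The Newton-Raphson routine, the simulated data, the true log odds ratio of 0.5, the pool size of 2, and all function names are assumptions made for illustration. The sketch also assumes that the pooled case sum and pooled control sum can be treated as a matched pair with the same coefficient, which is one reading of the abstract's claim that aggregated covariates recover individual-level odds ratios.

```python
import math
import random


def fit_paired_clogit(diffs, iterations=25):
    """Newton-Raphson estimate of beta for 1:1 matched (or pooled) pairs.

    diffs holds (case covariate - control covariate) for each pair.
    The log-likelihood is -sum(log(1 + exp(-beta * d))).
    Returns (beta_hat, approximate standard error).
    """
    beta = 0.0
    info = 0.0
    for _ in range(iterations):
        score = 0.0
        info = 0.0
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)          # first derivative
            info += d * d * p * (1.0 - p)   # negative second derivative
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, 1.0 / math.sqrt(info)


def simulate_pairs(n_pairs, beta, rng):
    """Simulate case-minus-control differences for 1:1 matched pairs."""
    diffs = []
    for _ in range(n_pairs):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Probability that the member with covariate a is the case.
        p_a = 1.0 / (1.0 + math.exp(-beta * (a - b)))
        diffs.append(a - b if rng.random() < p_a else b - a)
    return diffs


rng = random.Random(2017)
true_beta = 0.5   # illustrative value, not taken from the paper
pool_size = 2     # illustrative value, not taken from the paper
pair_diffs = simulate_pairs(4000, true_beta, rng)

# Pooled case sum minus pooled control sum equals the sum of pair differences.
pooled_diffs = [sum(pair_diffs[i:i + pool_size])
                for i in range(0, len(pair_diffs), pool_size)]

beta_ind, se_ind = fit_paired_clogit(pair_diffs)
beta_pool, se_pool = fit_paired_clogit(pooled_diffs)
print(f"individual data: beta = {beta_ind:.3f} (SE {se_ind:.3f})")
print(f"pooled data:     beta = {beta_pool:.3f} (SE {se_pool:.3f})")
```

Both fits target the same individual-level log odds ratio, so the two printed estimates can be compared directly; this mirrors the kind of comparison the abstract describes but does not reproduce the paper's simulation design.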

Available from: https://sites.google.com/site/paramitasaharesearch/data/

Title: Addressing data privacy in matched studies via virtual pooling
Authors: P. Saha-Chaudhuri, C.R. Weinberg
Publication date: 01-12-2017
Publisher: BioMed Central
Published in: BMC Medical Research Methodology / Issue 1/2017
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/s12874-017-0419-0
