Published in: European Journal of Epidemiology 3/2019

Open Access 01-03-2019 | ESSAY

Principles of confounder selection

Author: Tyler J. VanderWeele

Abstract

Selecting an appropriate set of confounders for which to control is critical for reliable causal inference. Recent theoretical and methodological developments have helped clarify a number of principles of confounder selection. When complete knowledge of a causal diagram relating all covariates to each other is available, graphical rules can be used to make decisions about covariate control. Unfortunately, such complete knowledge is often unavailable. This paper puts forward a practical approach to confounder selection decisions when the somewhat less stringent assumption is made that knowledge is available for each covariate whether it is a cause of the exposure, and whether it is a cause of the outcome. Based on recent theoretically justified developments in the causal inference literature, the following proposal is made for covariate control decisions: control for each covariate that is a cause of the exposure, or of the outcome, or of both; exclude from this set any variable known to be an instrumental variable; and include as a covariate any proxy for an unmeasured variable that is a common cause of both the exposure and the outcome. Various principles of confounder selection are then further related to statistical covariate selection methods.
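The selection rule proposed above can be sketched in a few lines of Python. This is a minimal illustration only, not code from the paper: the `Covariate` record, its field names, the `select_confounders` function, and the example covariates are all hypothetical, and the true/false flags stand in for the investigator's subject-matter knowledge about each pre-exposure covariate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Covariate:
    """Assumed knowledge about one pre-exposure covariate (hypothetical structure)."""
    name: str
    causes_exposure: bool = False
    causes_outcome: bool = False
    known_instrument: bool = False
    proxy_for_unmeasured_common_cause: bool = False


def select_confounders(covariates):
    """Return the names of covariates to control for under the three-step proposal."""
    selected = []
    for c in covariates:
        # Step 1: keep any cause of the exposure, of the outcome, or of both.
        keep = c.causes_exposure or c.causes_outcome
        # Step 2: exclude variables known to be instrumental variables.
        if c.known_instrument:
            keep = False
        # Step 3: include proxies for unmeasured common causes of exposure and outcome.
        # (Applied last, so in this sketch a proxy flag overrides the earlier steps.)
        if c.proxy_for_unmeasured_common_cause:
            keep = True
        if keep:
            selected.append(c.name)
    return selected


# Hypothetical covariates, invented purely for illustration.
covariates = [
    Covariate("age", causes_exposure=True, causes_outcome=True),
    Covariate("distance_to_clinic", causes_exposure=True, known_instrument=True),
    Covariate("baseline_biomarker", causes_outcome=True),
    Covariate("neighborhood_index", proxy_for_unmeasured_common_cause=True),
    Covariate("shoe_size"),
]

print(select_confounders(covariates))
# prints ['age', 'baseline_biomarker', 'neighborhood_index']
```

The function needs only per-covariate knowledge, not the relationships among the covariates, which is the sense in which the assumption is less stringent than knowing the full causal diagram.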
Footnotes
1
In principle, one could control for covariates that are temporally subsequent to the exposure but not affected by it [2], or even for variables affected by the exposure but not related to the outcome [4]. However, because it is difficult to know for sure whether a covariate that is temporally subsequent to the exposure is affected by it, the restriction is often made to covariates prior to the treatment or exposure under study. It is also possible that a variable occurs prior to the exposure but is measured retrospectively, after the exposure; such variables might also be considered, though doing so may introduce concerns about measurement error.
 
2
Another criterion that might be put forward, which we could refer to as an “extended common cause criterion,” would be to control for any variable that is either a common cause of the exposure and outcome or on the pathway from such a common cause to the exposure or outcome. This criterion, like the disjunctive cause criterion, would select a sufficient set of confounders in both Figs. 2 and 3. Its downside is that it requires far more knowledge of the underlying diagram. The “disjunctive cause criterion” and the “common cause criterion” require only knowledge of whether each variable is a cause of the exposure, of the outcome, or of both. The “extended common cause criterion” additionally requires knowing, for each variable, whether there is another variable that is a common cause of the exposure and the outcome and whether the variable in question lies on the pathway from that common cause to either the exposure or the outcome. In other words, the “extended common cause criterion” requires considerable knowledge of the relationships that potential covariates have to each other. It is difficult to conceive of contexts in which this information would be available without also having knowledge of the entire causal diagram; and with knowledge of the entire causal diagram, Pearl’s original backdoor path criterion would suffice.
 
Literature
1.
Pearl J. Causal diagrams for empirical research (with discussion). Biometrika. 1995;82:669–710.
2.
Pearl J. Causality: models, reasoning, and inference. 2nd ed. Cambridge: Cambridge University Press; 2009.
3.
Huang Y, Valtorta M. Pearl’s calculus of interventions is complete. In: Twenty second conference on uncertainty in artificial intelligence.
4.
Shpitser I, VanderWeele TJ, Robins JM. On the validity of covariate adjustment for estimating causal effects. In: Proceedings of the 26th conference on uncertainty in artificial intelligence. Corvallis: AUAI Press; 2010. p. 527–36.
5.
Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology. 2003;14:300–6.
6.
Ding P, Miratrix LW. To adjust or not to adjust? Sensitivity analysis of M-bias and butterfly-bias (with comments). J Causal Infer. 2015;3:41–57.
8.
Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, Poole C. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010;39:417–20.
10.
VanderWeele TJ. Explanation in causal inference: methods for mediation and interaction. New York: Oxford University Press; 2015.
11.
Bhattacharya J, Vogt W. Do instrumental variables belong in propensity scores? Int J Stat Econ. 2012;9:107–27.
12.
Myers JA, Rassen JA, Gagne JJ, Huybrechts KF, Schneeweiss S, Rothman KJ, Joffe MM, Glynn RJ. Effects of adjusting for instrumental variables on bias and precision of effect estimates. Am J Epidemiol. 2011;174:1213–22.
13.
Pearl J. On a class of bias-amplifying variables that endanger effect estimates. In: Grunwald P, Spirtes P, editors. Proceedings of the 26th conference on uncertainty in artificial intelligence (UAI 2010). Corvallis, Oregon: Association for Uncertainty in Artificial Intelligence; 2010. p. 425–32.
14.
Middleton JA, Scott MA, Diakow R, Hill JL. Bias amplification and bias unmasking. Polit Anal. 2016;24:307–23.
15.
Wooldridge J. Should instrumental variables be used as matching variables? Res Econ. 2016;70:232–7.
16.
Ding P, VanderWeele TJ, Robins JM. Instrumental variables as bias amplifiers with general outcome and confounding. Biometrika. 2017;104:291–302.
17.
Ogburn EL, VanderWeele TJ. Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders. Biometrika. 2013;100:241–8.
19.
Greenland S, Robins JM. Identifiability, exchangeability, and epidemiologic confounding. Int J Epidemiol. 1986;15:413–9.
20.
Robins JM. Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika. 1992;79:321–34.
21.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
22.
Barnow BS, Cain GG, Goldberger AS. Issues in the analysis of selectivity bias. In: Stromsdorfer E, Farkas G, editors. Evaluation studies, vol. 5. San Francisco: Sage; 1980.
23.
Imbens GW. Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat. 2004;86:4–29.
24.
VanderWeele TJ. Concerning the consistency assumption in causal inference. Epidemiology. 2009;20:880–3.
25.
Rubin DB. Author’s reply (to Judea Pearl’s and Arvid Sjolander’s letters to the editor). Stat Med. 2009;28:1420–3.
26.
Rubin DB. For objective causal inference, design trumps analysis. Ann Appl Stat. 2008;3:808–40.
27.
28.
Glymour MM, Weuve J, Chen JT. Methodological challenges in causal research on racial and ethnic patterns of cognitive trajectories: measurement, selection, and bias. Neuropsychol Rev. 2008;18:194–213.
29.
Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000;29:722–9.
30.
Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables (with discussion). J Am Stat Assoc. 1996;91:444–72.
31.
Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006;17:360–72.
32.
Greenland S. The effect of misclassification in the presence of covariates. Am J Epidemiol. 1980;112:564–9.
33.
Brenner H. Bias due to non-differential misclassification of polytomous confounders. J Clin Epidemiol. 1993;46:57–63.
34.
Schlesselman JJ. Assessing effects of confounding variables. Am J Epidemiol. 1978;108:3–8.
35.
Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Ser B. 1983;45:212–8.
36.
Flanders WD, Khoury MJ. Indirect assessment of confounding: graphic description and limits on effect of adjusting for covariates. Epidemiology. 1990;1:239–46.
37.
Lash TL, Fox MP, Fink AK. Applying quantitative bias analysis to epidemiologic data. New York: Springer; 2009.
39.
VanderWeele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167:268–74.
40.
Hernán MA, Robins JM. Causal inference. Boca Raton: Chapman & Hall/CRC; 2018.
41.
Robins JM. Causal inference from complex longitudinal data. In: Berkane M, editor. Latent variable modeling and applications to causality (Los Angeles, CA, 1994). Lecture notes in statistics, vol. 120. New York: Springer; 1997. p. 69–117.
42.
Taubman SL, Robins JM, Mittleman MA, Hernán MA. Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. Int J Epidemiol. 2009;38(6):1599–611.
43.
Garcia-Aymerich J, Varraso R, Danaei G, Camargo CA, Hernán MA. Incidence of adult-onset asthma after hypothetical interventions on body mass index and physical activity. An application of the parametric g-formula. Am J Epidemiol. 2014;179(1):20–6.
44.
Danaei G, Pan A, Hu FB, Hernán MA. Hypothetical lifestyle interventions in middle-aged women and risk of type 2 diabetes: a 24-year prospective study. Epidemiology. 2013;24(1):122–8.
45.
Lajous M, Willett WC, Robins JM, Young JG, Rimm EB, Mozaffarian D, Hernán MA. Changes in fish consumption in midlife and the risk of coronary heart disease in men and women. Am J Epidemiol. 2013;178(3):382–91.
46.
VanderWeele TJ, Jackson JW, Li S. Causal inference and longitudinal data: a case study of religion and mental health. Soc Psychiatry Psychiatr Epidemiol. 2016;51:1457–66.
47.
Maldonado G, Greenland S. Simulation study of confounder-selection strategies. Am J Epidemiol. 1993;138:923–36.
48.
Greenland S. Invited commentary: variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol. 2008;167:523–9.
49.
Belloni A, Chernozhukov V, Hansen C. Inference on treatment effects after selection among high-dimensional controls. Rev Econ Stud. 2014;81:608–50.
50.
Chernozhukov V, Hansen C, Spindler M. Valid post-selection and post-regularization inference: an elementary, general approach. Annu Rev Econ. 2015;7:649–88.
51.
Lee JD, Sun DL, Sun Y, Taylor JE. Exact post-selection inference with the lasso. Ann Stat. 2016;44(3):907–27.
52.
Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14:29–46.
53.
Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512–22.
54.
Rassen JA, Glynn RJ, Brookhart MA, Schneeweiss S. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples. Am J Epidemiol. 2011;173(12):1404–13.
55.
van der Laan MJ, Rose S. Targeted learning in data science: causal inference for complex longitudinal studies. New York: Springer; 2018.
56.
van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. New York: Springer; 2011.
57.
Schuler M, Rose S. Targeted maximum likelihood estimation for causal inference in observational studies. Am J Epidemiol. 2017;185(1):65–73.
Metadata
Title: Principles of confounder selection
Author: Tyler J. VanderWeele
Publication date: 01-03-2019
Publisher: Springer Netherlands
Published in: European Journal of Epidemiology, Issue 3/2019
Print ISSN: 0393-2990
Electronic ISSN: 1573-7284
DOI: https://doi.org/10.1007/s10654-019-00494-6