Causal models for estimating the effects of weight gain on mortality

Abstract

Suppose, contrary to fact, that in 1950 we had put the cohort of 18-year-old non-smoking American men on a stringent mandatory diet that guaranteed that no one would ever weigh more than his baseline weight established at the age of 18 years. How would the counterfactual mortality of these 18-year-olds have compared with their actual observed mortality through 2007? We describe in detail how this counterfactual contrast could be estimated from longitudinal epidemiologic data similar to those stored in the electronic medical records of a large health maintenance organization (HMO) by applying g-estimation to a novel structural nested model (SNM). Our analytic approach differs from any alternative approach in that, in the absence of model misspecification, it can successfully adjust for (i) measured time-varying confounders such as exercise, hypertension and diabetes that are simultaneously intermediate variables on the causal pathway from weight gain to death and determinants of future weight gain, (ii) unmeasured confounding by undiagnosed preclinical disease (that is, reverse causation) that can cause both poor weight gain and premature mortality (provided an upper bound can be specified for the maximum length of time a subject may suffer from a subclinical illness severe enough to affect his weight without the illness becoming clinically manifest) and (iii) the presence of particular identifiable subgroups, such as those suffering from serious renal, liver, pulmonary and/or cardiac disease, in whom confounding by unmeasured prognostic factors is so severe as to render useless any attempt at direct analytic adjustment. However, (ii) and (iii) limit the ability to empirically test whether the SNM is misspecified. The other two g-methods (the parametric g-computation algorithm and inverse probability of treatment weighted estimation of marginal structural models) can adjust for potential bias due to (i) but not due to (ii) or (iii).

References

  1. Willett WC, Dietz WH, Colditz GA. Guidelines for healthy weight. N Engl J Med 1999; 341: 427–434.

  2. Robins JM. Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika 1992; 79: 321–334.

  3. Robins JM, Wasserman L. Estimation of effects of sequential treatments by reparameterizing directed acyclic graphs. In: Geiger D, Shenoy P (eds). Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence Rhode Island, 1–3 August 1997. Morgan Kaufmann: San Francisco, 1997, pp 409–420.

  4. Robins JM. Association, causation, and marginal structural models. Synthese 1999; 121: 151–179.

  5. Robins JM, Hernan MA, Siebert U. Effects of multiple interventions. In: Ezzati M, Lopez AD, Rodgers A, Murray CJL (eds). Comparative Quantification of Health Risks: Global and Regional Burden of Disease Attributable to Selected Major Risk Factors, vol I. World Health Organization: Geneva, 2004, pp 2191–2230.

  6. Hernán MA, Hernandez Diaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004; 15: 615–625.

  7. Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat 1994; 23: 2379–2412.

  8. Robins JM. Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty P (eds). Proceedings of the Second Seattle Symposium on Biostatistics. Springer-Verlag: New York, 2004.

  9. Robins JM. Causal inference from complex longitudinal data. In: Berkane M (ed). Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics (120). Springer-Verlag: New York, 1997, pp 69–117.

  10. Murphy SA. Optimal dynamic treatment regimes. J R Stat Soc Ser B 2003; 65: 331–366.

  11. Robins JM, Scharfstein D, Rotnitzky A. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran ME, Berry D (eds). Statistical Models in Epidemiology: the Environment and Clinical Trials. Springer-Verlag: New York, 1999, pp 1–94.

  12. Lok JJ, Gill RD, van der Vaart AW, Robins JM. Estimating the causal effect of a time-varying treatment on time-to-event using structural nested failure time models. Stat Neerl 2001; 58: 271–295.

  13. Robins JM. General methodological considerations. J Econom 2003; 112: 89–106.

  14. Robins JM. Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In: Glymour C, Cooper G (eds). Computation, Causation, and Discovery. AAAI Press/The MIT Press: Menlo Park, CA; Cambridge, MA, 1999, pp 349–405.

  15. Robins JM. Analytic methods for estimating HIV treatment and cofactor effects. In: Ostrow DG, Kessler R (eds). Methodological Issues of AIDS Mental Health Research. Plenum Publishing: New York, 1993, pp 213–290.

  16. Joffe MM, Hoover DR, Jacobson LP, Kingsley L, Chmiel JS, Fischer BR et al. Estimating the effect of Zidovudine on Kaposi's sarcoma from observational data using a rank preserving failure time model. Stat Med 1998; 17: 1073–1102.

Correspondence to J M Robins.

Appendices

Appendix 1

A formal definition of a joint SNFTM for Xm and a SNMM for YmXm

The definition here is the alternative, more intuitive and more general definition mentioned in the main text. The equivalence with the definitions in the main text is proved below.

We first consider the uncensored case. The observed data are , X, Y, where X is a continuous time-to-event variable and Y is measured at K+1. The counterfactual data are ($X_m$, $Y_m$), m=0, …, K+1, denoting X and Y under treatment regimens in which a subject receives his observed treatment up to m and then receives no treatment (treatment level 0) thereafter. We assume that $X_{K+1}=X$ and $Y_{K+1}=Y$. The covariate L(k) precedes A(k), which precedes L(k+1).

The function is a counterfactual conditional quantile–quantile function, where S and S−1 denote a survivor function and its inverse. It is a standard result that xm(x, ) is the unique function for which , ) and Xm have the same conditional distribution, that is,

Define $X_{K+1}=X$ and then recursively define $X_m = x_m(X_{m+1}, )$. Robins and Wasserman3 proved the following

Theorem A1:

where we silently take such equations to hold for all m=0, …, K.

Furthermore, Robins and co-workers11, 14 and Lok et al.12 proved that the function $x_m$ is unique. That is, if the above equation holds with $X_m$ replaced by some $H_m = h_m(H_{m+1}, )$ and $H_{K+1}=X$, then the function $h_m$ must be the function $x_m$.
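The quantile–quantile idea behind $x_m$ can be illustrated with a minimal sketch. It ignores the conditioning on treatment and covariate history and assumes, purely for illustration, that the two survivor functions are exponential with hypothetical rates 1.0 and 0.5. The helper names are not from the paper. The map sends a value x to the point with the same survival probability under the target law, so the mapped variable inherits the target distribution.

```python
import math
import random


def exponential_survivor(rate):
    """Survivor function S(x) = exp(-rate * x) of an exponential law."""
    return lambda x: math.exp(-rate * x)


def exponential_inverse_survivor(rate):
    """Inverse survivor function S^{-1}(p) = -log(p) / rate."""
    return lambda p: -math.log(p) / rate


def quantile_quantile_map(x, survivor_from, inverse_survivor_to):
    """Send x to the value with the same survival probability under the target law."""
    return inverse_survivor_to(survivor_from(x))


# Hypothetical laws (assumptions): X_{m+1} ~ Exponential(1.0), X_m ~ Exponential(0.5).
s_next = exponential_survivor(1.0)
s_prev_inv = exponential_inverse_survivor(0.5)

random.seed(0)
draws_next = [random.expovariate(1.0) for _ in range(20000)]
mapped = [quantile_quantile_map(x, s_next, s_prev_inv) for x in draws_next]

# The mapped draws should behave like Exponential(0.5) draws, whose mean is 2.
mean_mapped = sum(mapped) / len(mapped)
```

With exponential laws the map reduces to a rescaling of x by the ratio of the two rates. In the appendix the survivor functions are conditional on the observed history, so a separate map would apply within each history stratum.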

An SNFTM for Xm assumes

for a known function xm(x, ; ψ) satisfying xm(x, , ψ)=x if ψ=0 or A(m)=0 with ψ* an unknown parameter vector.

It follows immediately that

with

The uniqueness of Xm implies that SNFTMs as defined in the text are also SNFTMs as defined here.

Recall , ) and define

which is equivalent to

Define $Y_{K+1}=Y$ and then recursively define $Y_m = Y_{m+1} - \gamma_m( , X_m)$.

We prove the following theorem below.

Theorem A2:

Furthermore, the function $\gamma_m$ is unique. That is, if the above equation holds with $Y_m$ replaced by some $H_m = H_{m+1} - h_m( , X_m)$ and $H_{K+1}=Y$, then the function $h_m$ must be the function $\gamma_m$.

An additive SNMM for YmXm assumes

for a known function γm( , x; β) satisfying γm( , x; β)=0 if β=0 or A(m)=0 with β* an unknown parameter vector.

It follows immediately that

with

The uniqueness of γm implies that an additive SNMM for YmXm as defined in the text is equivalent to the additive SNMM for YmXm as defined here.
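The recursive definition of $Y_m$ for an additive SNMM can be sketched as a backward pass that starts from $Y_{K+1}=Y$ and subtracts the blip $\gamma_m$ at each step. The linear blip used here, β·a(m), is a hypothetical choice made only so that the example runs. It satisfies the stated requirement that the blip vanish when β=0 or A(m)=0, but it ignores the covariate history and $X_m$, which the paper's $\gamma_m$ may depend on.

```python
def blip_down(y, treatments, x_counterfactuals, gamma):
    """Return [Y_0, ..., Y_{K+1}] with Y_{K+1} = y and Y_m = Y_{m+1} - gamma(m, a(m), X_m)."""
    n_times = len(treatments)          # treatments are indexed m = 0, ..., K
    ys = [0.0] * (n_times + 1)
    ys[n_times] = y                    # Y_{K+1} = Y
    for m in range(n_times - 1, -1, -1):
        ys[m] = ys[m + 1] - gamma(m, treatments[m], x_counterfactuals[m])
    return ys


def make_linear_blip(beta):
    """Hypothetical blip: beta * a(m); zero when beta == 0 or a(m) == 0."""
    return lambda m, a_m, x_m: beta * a_m


# Illustrative inputs (assumptions): K = 2, observed outcome Y = 10.
treatments = [1.0, 0.0, 2.0]
x_values = [30.0, 30.0, 30.0]          # placeholder X_m values, unused by this blip

y_path = blip_down(10.0, treatments, x_values, make_linear_blip(0.5))
y_path_null = blip_down(10.0, treatments, x_values, make_linear_blip(0.0))
```

Under the null value β=0 every $Y_m$ equals the observed Y, which is the property that g-estimation exploits when it searches for the parameter value at which the blipped-down outcome is unrelated to treatment.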

Proof of Theorem A2: By backward induction.

Case 1: m=K;

where the first equality uses the definition of YK and that YK+1=Y=YK+1, the second uses that by XK+1=X=XK+1, and the third is the definition of γK(Ā(K), .

Case 2: Assume true for m. We prove true for m+1.

(by the laws of probability)

(by the induction assumption)

(by and having identical distributions)

(by the laws of probability) (by the definition of )

(by the definition of

Uniqueness is proved as in Robins8 and Lok et al.12 and is omitted.

An additive SNMM for YmXm may not be appropriate for analyzing censored data due to administrative censoring of X at time K as discussed in the text. As indicated in the section Censoring, our approach requires that we consider a broader class of SNMM models, which we now describe.

Consider a collection of functions cm(x, ) indexed by m and define and . For fixed , cm(x, ) need not be a 1–1 function of x.

Redefine

which is equivalent to

Define $Y_{K+1}=Y$ and then recursively redefine $Y_m = Y_{m+1} - \gamma_m( , C_m)$. We have the following theorem.

Theorem A3:

Furthermore, the function $\gamma_m$ is unique. That is, if the above equation holds with $Y_m$ replaced by some $H_m = H_{m+1} - h_m( , C_m)$ and $H_{K+1}=Y$, then the function $h_m$ must be the function $\gamma_m$.

Proof of Theorem A3: We describe only where the proof differs from that of its special case, Theorem A2. The proof is essentially identical except for the replacement of $X_m$ by $C_m$, x by c and by

An additive SNMM for YmCm assumes

for a known function γm( , c; β) satisfying γm( , c; β)=0 if β=0 or A(m)=0 with β* an unknown parameter vector.

Then given a SNFTM for Xm and a function that may depend on the functions jm, suppose we can define a parametrized class of functions satisfying

For example, in the section on censoring in the main text, we took

Then defining we have

with

Appendix 2

Estimation of effects with the parametric g-formula and IPTW when a sufficiently long MLP exists

In this section, we show that the parametric g-formula and IPTW can be used to estimate certain causal effects when there exists a sufficiently long MLP. We begin with a preliminary discussion of these two methods of estimation.

Preliminaries. In this preliminary discussion, we assume that, as in the section A locally rank-preserving SNM, there is neither confounding by preclinical disease nor an MLP. Specifically we assume, for each regimen g, the COg assumption that, for each j, holds, with Ξg(m) defined in Equation (53).

Recoding: Without loss of generality, we henceforth redefine (that is, recode) such that Ξg(j) is now one of the components of but we remove from the components corresponding to X, that is, the components (XI(Xj), I(Xj)). Then we can write the COg assumption as

as, from these definitions, Ξg(j)=0 implies AΔg(j)=0. The COg assumption implies

as implies (Yjg, Xjg)=(Y0g, X0g). This last equation is the standard definition of no unmeasured confounding given ( , (XI(Xj), I(Xj))) for the effect of AΔg(j) on the counterfactuals Y0g, X0g. Let be the conditional hazard of X given the information in •.

Robins4, 15 proves that Equation (74) implies that

is identified through

with

where the first formula for is referred to as the g-computation algorithm formula (g-formula, for short) and the second formula as the IPTW formula. To shorten the formulae, we have written as a shorthand for when the time t is clear. In fact Robins4, 15 shows that the assumption

which is implied by the assumption of Equation (74), suffices to establish the identifying formulae. To estimate , we can use either the parametric g-formula estimator that replaces the unknowns and in the first formula by estimates based on parametric models or the IPTW estimator that replaces the unknown in the second formula with a parametric estimate and the unknown expectation with a sample average. Both approaches are alternatives to g-estimation of structural nested models (SNMs).
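The relationship between the two identifying formulae can be illustrated in a much simpler setting than the one treated here: a single time point, a binary treatment A, a binary covariate L and an outcome Y, with saturated (non-parametric) estimates in place of parametric models. The data and function names below are hypothetical. In this setting the g-formula standardizes stratum-specific outcome means over the distribution of L, the IPTW formula reweights treated subjects by the inverse of their treatment probability, and the two estimates coincide.

```python
from collections import defaultdict


def g_formula_mean(records, a):
    """Sum over l of E[Y | A=a, L=l] * P(L=l), using empirical frequencies."""
    n = len(records)
    l_counts = defaultdict(int)
    y_sums = defaultdict(float)
    la_counts = defaultdict(int)
    for l, a_i, y in records:
        l_counts[l] += 1
        if a_i == a:
            y_sums[l] += y
            la_counts[l] += 1
    return sum((y_sums[l] / la_counts[l]) * (l_counts[l] / n) for l in l_counts)


def iptw_mean(records, a):
    """Average of I(A=a) * Y / P(A=a | L), using empirical frequencies."""
    n = len(records)
    l_counts = defaultdict(int)
    la_counts = defaultdict(int)
    for l, a_i, _ in records:
        l_counts[l] += 1
        if a_i == a:
            la_counts[l] += 1
    total = 0.0
    for l, a_i, y in records:
        if a_i == a:
            total += y / (la_counts[l] / l_counts[l])
    return total / n


# Hypothetical records (l, a, y); every stratum of L contains both treatment levels.
records = [
    (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0),
]

g_treated, g_untreated = g_formula_mean(records, 1), g_formula_mean(records, 0)
w_treated, w_untreated = iptw_mean(records, 1), iptw_mean(records, 0)
```

Once parametric models replace the saturated estimates, as in the estimators described above, the two approaches rely on different models (outcome and covariate models for the g-formula, a treatment model for IPTW) and need no longer agree exactly.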

Robins4, 15 proves E[Y0g] is identified under the assumption of Equation (74) by

In the above formulae, we have assumed for simplicity that X has support on (0, K+1) so censoring for X is absent.

We next consider whether and E[Y0g] remain identified in the presence of confounding by preclinical disease and a sufficiently long MLP.

Identification and estimation of

The following theorem establishes the identification of First note under our recoding, the RCg assumption becomes

Theorem A4: Given a regimen g, let a g-specific MLP satisfy the definition of an MLP of the section Estimation under a rank-preserving SNM for YmXm with Xm known, except with Xk and Xm replaced by Xkg and Xmg and A(m) replaced by AΔg(m). Suppose AΔg(m) has a g-specific MLP of χ months for its effect on X where χ exceeds the time ς in the CDg assumption. Then, under the CDg and RCg assumptions, remains identified by both the g-formula and the IPTW formula when the recoded L(t) and AΔg(t) are redefined as L(t) and AΔg,†(t) where

The theorem thus states that the identifying formulae are the usual g-formula and IPTW formula except we replace both the treatment variable AΔg(t) and the covariate variable L(t) by their values χ time units earlier. (For the IPTW formula, the transformation is applied to It is important to emphasize that a similar transformation is not applied to X. Thus, the conditioning event transforms to .
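The transformation described by the theorem amounts to lagging the treatment and covariate series by χ time units while leaving the survival time X untouched. A minimal sketch follows, assuming discrete monthly time indexed from 0. The value assigned to times earlier than χ is not recoverable from the text here, so the `fill` argument is an explicit assumption, as are the function name and the example data.

```python
def lag_history(values, chi, fill=0):
    """Return the series whose entry at time t is values[t - chi], or `fill` when t < chi."""
    if chi < 0:
        raise ValueError("chi must be non-negative")
    return [values[t - chi] if t >= chi else fill for t in range(len(values))]


# Hypothetical monthly treatment and covariate series for one subject.
treatment = [1, 2, 3, 4, 5]
covariate = [10, 11, 12, 13, 14]
chi = 2

lagged_treatment = lag_history(treatment, chi)
lagged_covariate = lag_history(covariate, chi)
# The survival time X is deliberately not lagged.
```

The lagged series would then be passed to the usual g-formula or IPTW computation in place of the original treatment and covariate series.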

Proof of theorem: It suffices to show Equation (78) holds when L(t) and AΔg(t) are replaced by L(t) and AΔg,†(t). By RCg, Thus, By CDg and χ>ς,

Thus with mχ+j.

Now the event $X > (m-\chi)$ is the event $X^{g}_{m-\chi} > (m-\chi)$. Further, because χ is a g-specific MLP, the event $X^{g}_{m-\chi} > m$ is the event $X > m$. Thus, we have As, given we have $(Y^{g}_{m-\chi}, X^{g}_{m-\chi}) = (Y^{g}_{0}, X^{g}_{0})$, we conclude , which is exactly Equation (78) with L(t) and AΔg(t) replaced by L(t) and AΔg,†(t), proving the theorem.

In contrast, under the conditions of the previous theorem, E[Y0g] is not identified because Equation (74), in contrast to Equation (78), fails to hold when L(t) and AΔg(t) are replaced by L(t) and AΔg,†(t). Specifically, Equation (74) can be written as the conjunction of Equation (78),

and

We show below that, under the conditions of the previous theorem, Equation (81) holds but Equation (82) does not when L(t) and AΔg(t) are replaced by L(t) and AΔg,†(t). To show (81), we slightly modify the proof of Equation (78) as follows:

by the g-specific MLP assumption.

The proof of (82) fails because the event is not the same event as , under CDg because Xjg<j+ς does not imply

Proof that E[YT0] is non-parametrically identified when a sufficiently long MLP exists. In the section Intractable confounding in subgroups, we stated that E[YT0] is non-parametrically identified under the conditions of the previous theorem with the regimen g in the theorem being the regimen that always assigns exposure zero. A proof follows.

Let IN, AT, ΞT, YTm, XTm be as defined in the section Intractable confounding in subgroups where we recall that because of the existence of the MLP of length χ>ς, all subjects with ς<Xm<m+ς have IN(m)=1. First in Equations (78), (81) and (82) we replace (Y0g, X0g) by (YT0, XT0), AΔg(m) by AT(m−χ), and redefine L(m) as L(m−χ) with the component Ξ(m) of L(m) being replaced by ΞT(m−χ). Equation (82) now holds trivially because with probability one m−χ+ς>X implies IN(m−χ)=1 and thus ΞT(m−χ)=0 and AT(m−χ)=0. Furthermore, the proofs of Equations (78) and (81) go through as above with only minor notational changes. We therefore conclude that Equation (74) holds and thus that E[YT0] is non-parametrically identified. The identifying IPTW formula is explicitly given by

Appendix 3

Optimal regimen models

Suppose we now wish to estimate the regimen gopt that maximizes E[Y0g] over all regimens g. We will do so by specifying an optimal regimen SNMM and associated SNFTM.

To begin consider the dietary intervention a(k), gopt,k+1 in which one follows his observed diet up to month k, allows a BMI increase of a(k) over his maximum previous BMI in month k, and follows the unknown optimal regimen gopt thereafter. Let be the associated counterfactuals. When A(k)=a(k), write gopt,k+1 for the regimen A(k), gopt,k+1. Note

We will make the following assumptions:

Optimal regimen RC assumption: A(m) is statistically independent of given Ξ(m)=1, and for each a(m)0.

Optimal regimen CD assumption:

We next recursively define random variables by the relationship that and, for m=K, … 0,

These equations recursively define in terms of the observed data, the regimen gopt,m+1 and the parameter vector ψ as can be verified by noting that these equations imply the following relationship between and

We assume an optimal regimen SNFTM given by

for an unknown value ψ* of the vector ψ.

We also assume an optimal regimen SNMM

Above ω(a(t), ā(t−1), (t), ψ) and γm[a(m), ā(m−1), (m), x, β] are known functions satisfying ω(a(t), ā(t−1), (t), ψ)=0 if a(t)=0 or ψ=0 and γm(a(m), ā(m−1), (m), β)=0 if a(m)=0 or β=0.

The optimal regimen itself remains unknown. However, we show below that the following algorithm evaluated at the true (β*, ψ*) would find the optimal regimen gopt under the following additional condition, that we henceforth assume to hold.

Additional condition: For each ā(m−1), (m), x, β, m the function γmopt[a(m), ā(m−1), (m), x, β] is either everywhere zero or is strictly concave downward in a(m) on the support of A(m).

Optimal regimen algorithm: Given any (β, ψ), calculate as follows.

Calculate Define

Calculate

Calculate

Recursively for m=K−1, … 0, calculate

Calculate Calculate

Note that to carry out this algorithm we will need to be able to estimate

for all possible values of a(m) in support of A(m). One possibility is to specify and fit an appropriate multivariate regression model with the possible values of a(m) indexing the multivariate outcomes at time m.

To understand why this is the correct algorithm, we first note that any regimen at m can be a function of X only if Xm, so that X is known by m. When X>m, we must average over because is a function of X. When X>m, will be the value of if the optimal regimen gopt(β,ψ) dictates the exposure a(m). The optimal regimen will choose the a(m) that optimizes the contribution to the utility at time m. But the optimizing a(m) depends on the a(k) chosen for the regimen for k>m. Thus, we need to use backward recursion to estimate the optimal regimen.

To be more specific, consider the subgroup of subjects with a history with X<K so Then that maximizes is the optimal treatment choice at K. However, we are only considering regimens (interventions) that do not force subjects to gain weight. We now argue that for any subject with A(K) less than the optimal decision is not to intervene at all, so the subject receives his observed treatment A(K). The subject with A(K) less than could still have received any treatment between 0 and A(K). However, within this set of treatments, the treatment A(K) is optimal by the concavity condition above.
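The concavity argument can be checked numerically with a small sketch. It assumes a hypothetical quadratic blip, β1·a − β2·a² with β2 > 0, which is zero at a=0 and strictly concave, and it restricts the admissible treatments to the interval from 0 to the observed A(K), because the regimens considered never force weight gain. Function names and parameter values are illustrative only.

```python
def blip(a, beta1, beta2):
    """Hypothetical strictly concave blip (beta2 > 0) that equals zero at a = 0."""
    return beta1 * a - beta2 * a * a


def unconstrained_maximizer(beta1, beta2):
    """Maximizer of the blip over a >= 0."""
    return max(0.0, beta1 / (2.0 * beta2))


def optimal_allowed_gain(a_observed, beta1, beta2):
    """Maximizer over [0, a_observed]: by concavity, the smaller of the two candidates."""
    return min(a_observed, unconstrained_maximizer(beta1, beta2))


def grid_argmax(a_observed, beta1, beta2, steps=1000):
    """Brute-force check: best grid point in [0, a_observed]."""
    grid = [a_observed * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: blip(a, beta1, beta2))


beta1, beta2 = 2.0, 0.5   # assumed values; unconstrained maximizer is 2.0

# Observed gain below the unconstrained maximizer: do not intervene.
no_intervention = optimal_allowed_gain(1.5, beta1, beta2)
# Observed gain above the unconstrained maximizer: cap the gain at 2.0.
capped = optimal_allowed_gain(3.0, beta1, beta2)
```

Because the blip is increasing on the whole admissible interval when the observed gain lies below the unconstrained maximizer, the observed treatment itself is the constrained optimum, which is the claim made in the paragraph above.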

Next consider the subgroup of subjects with a history ( with X>K. To find the optimal treatment, we average over . As the average over of a function that is concave in a(K) for every possible value of remains a concave function of a(K), we again take

That the same argument holds for each m is a standard dynamic programming argument as discussed in Robins.8

As (β*, ψ*) are unknown we must estimate them by g-estimation. Define

Note these equations are much more complex than the equations for ψ using an SNFTM and SNMM for a fixed g in that gopt is now not known but depends on the parameters (β, ψ) through the above algorithm for gopt(β, ψ). Thus, we can no longer estimate ψ* independently of β* as is now a function of β as well as ψ through its dependence on gopt(β,ψ). Rather, we must solve both pairs of g-estimation equations simultaneously.

Specifically, given the optimal regimen RC and CD assumptions, to obtain CAN estimators of the unknown parameters, we find jointly ( ) so that both the score test for the covariate vector depending on is precisely zero and the score test for the covariate vector depending on is precisely zero (both tests are restricted to subjects with and Ξ(m)=1). This turns out to be a very difficult computational problem. Robins8 describes a number of computational simplifications, but they are beyond the scope of the current paper. Finally, we obtain as our estimate of the optimal regimen gopt(β*,ψ*) and as our estimate of the expected utility under the optimal regimen.

Both estimation of E[Y0g] for a known g and of can be modified to allow for censoring at the end of follow-up at K+1 and for intractable unmeasured confounding in certain subgroups using methods exactly analogous to the methods for the estimation of E[Y0].

Robins, J. Causal models for estimating the effects of weight gain on mortality. Int J Obes 32 (Suppl 3), S15–S41 (2008). https://doi.org/10.1038/ijo.2008.83
