Published online: https://doi.org/10.1176/ps.2010.61.2.160

Systematic, quantitative assessment of outcomes is a fundamental procedure in depression treatment research. However, the metrics most commonly used in outcome research bear little resemblance to the day-to-day experience of individuals with depression. Although there may be no methodological disadvantage to using abstract statistical constructs in evaluating treatment efficacy, the need to facilitate effectiveness research places a broader set of demands on treatment research. Two such demands are supporting cost-effectiveness analyses, which help judge the relative value of an intervention, and communicating outcomes effectively to frontline clinicians, who are increasingly interested in adopting evidence-based practices substantiated by effectiveness research. In this report we illustrate the feasibility and validity of using estimated depression-free days (DFDs) as an outcome metric that is methodologically sound, easily incorporated into cost-effectiveness analyses, and inherently representative of the lived experience of patients with depression (1).

Comparing response to treatment between groups is most commonly done by transforming two assessment points into an effect size. For example, Cohen's d is a standardized effect size that expresses the difference in symptom change between two groups in standard deviation units (2). This type of effect size is efficient for comparing groups but conveys virtually no clinically relevant information. To help reconcile clinical terminology with outcome metrics, Riso and colleagues (3) established a basis for using clinically relevant treatment response, commonly defined as a 50% reduction in symptoms between an initial assessment and a follow-up assessment. Treatment response (or another clinically relevant metric, such as remission) has the advantage of conveying clinically relevant information, but it is a snapshot in time: it does not reflect the actual course of change between assessment points and thus the patient's experience of depression over time. The DFD is an outcome metric that is easily interpretable and, when multiple assessments are available, intrinsically more accurate than methods based on simple transformations of two assessment points. Estimating DFDs from depression severity scores was first done by Lave and colleagues (4) in analyses of a depression treatment trial, and the approach has since been used in several trials of depression treatment (1,4–13). Converting ratings of depression severity over time into DFDs produces a construct with more direct clinical relevance and minimal loss of precision (1,9). Furthermore, DFDs can be readily translated into quality-adjusted life years (9) to facilitate cost analyses (9,11,13–17).
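For reference, the two-point metrics contrasted above can be written out. The notation is illustrative: for each group g, \bar{\Delta}_g is the mean change in severity between the two assessments, s_g is the standard deviation of change scores, n_g is the group size, and S_0 and S_1 are a patient's initial and follow-up severity scores. The pooled-standard-deviation form of d shown here is one common variant, not necessarily the one used in any study cited.

```latex
d = \frac{\bar{\Delta}_1 - \bar{\Delta}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}

\text{response:}\quad \frac{S_0 - S_1}{S_0} \ge 0.5
```

Both quantities depend only on the two endpoint scores, which is why neither reflects the course of symptoms between assessments.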

In this report we present depression outcomes based on two measures of depression symptom severity that were used in a large effectiveness trial of collaborative care for depression in older adults: the Patient Health Questionnaire (PHQ-9) (18,19) and the Hopkins Symptom Checklist (HSCL-20), a 20-item subset of depression items from the Symptom Checklist-90 (20). In doing so, we demonstrate two characteristics that make DFDs a compelling choice of main outcome metric: their clinical relevance and the potential improvement in assessment accuracy when multiple assessment points are available.

Methods

Data were derived from the intervention arm of the Improving Mood—Promoting Access to Collaborative Treatment (IMPACT) study ( 21 ). The IMPACT study was a multisite, randomized trial comparing a primary care-based collaborative care model with usual primary care for late-life depression. The study was conducted at seven study sites in five states (California, Indiana, North Carolina, Texas, and Washington) and represented eight health care organizations and 18 primary care clinics. Recruitment occurred between June 1999 and August 2001. Patients were followed for 24 months.

Sample

Primary care patients aged 60 or older were recruited from 18 diverse primary care clinics. All participants signed written informed consent forms approved by the institutional review boards at the study coordinating center and all study sites. Of the 35,098 patients approached, 1,801 met eligibility requirements (major depression, dysthymia, or both), consented to participate, and were randomly assigned to one of the two study arms; 906 were assigned to the IMPACT model of collaborative care. Exclusion criteria included severe cognitive impairment, active substance abuse, active suicidal behavior, severe mental illness, and current treatment by a psychiatrist. Intervention participants were selected for these analyses because, in addition to independent assessments of depression severity with the HSCL-20, they systematically completed the PHQ-9 at each clinical encounter as an integral part of their treatment (22).

DFD estimation

DFD estimates are calculated by using linear interpolation to estimate daily depression severity between assessment points (1). In this study the standard assessment points were at baseline, three months, six months, and 12 months. Study outcome assessments were conducted with the HSCL-20 by telephone by independent assessors who were blind to study condition. In addition, care managers in the study used the PHQ-9 as a clinical assessment tool at each contact with the patient. For this study we selected PHQ-9s administered within 30 days of the four standard assessment points. We used PHQ-9 data because of their clinical utility and because they offered the opportunity to examine how a larger number of assessment points, compared with the standard four, would influence DFD estimates.
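The daily interpolation step can be sketched in Python as follows. This is a minimal sketch, assuming assessment times are expressed as day offsets from baseline; the function name `interpolate_daily_scores` and the example day offsets are hypothetical and are not taken from the IMPACT analysis code.

```python
from typing import List, Sequence


def interpolate_daily_scores(days: Sequence[int], scores: Sequence[float]) -> List[float]:
    """Linearly interpolate a severity score for every day from days[0] to days[-1].

    days   -- assessment days relative to baseline, strictly increasing
              (hypothetical example: [0, 91, 182, 365])
    scores -- severity score observed at each assessment day
    Returns one interpolated score per day, including both endpoints.
    """
    if len(days) != len(scores) or len(days) < 2:
        raise ValueError("need at least two assessments with matching scores")
    daily: List[float] = []
    for (d0, s0), (d1, s1) in zip(zip(days, scores), zip(days[1:], scores[1:])):
        if d1 <= d0:
            raise ValueError("assessment days must be strictly increasing")
        # Fill days d0 .. d1-1 on the straight line between the two assessments;
        # day d1 is filled by the next interval (or appended after the loop).
        for day in range(d0, d1):
            frac = (day - d0) / (d1 - d0)
            daily.append(s0 + frac * (s1 - s0))
    daily.append(float(scores[-1]))
    return daily
```

Each interpolated daily score can then be mapped to a depression-free proportion with the instrument's cutoffs and summed. When scores cross a cutoff between assessments, this day-by-day total can differ somewhat from the weighted sum of per-assessment DFD values used in this study.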

Estimates of DFD based on HSCL-20 data

To estimate DFDs, we assigned a depression level to each day within the assessment period. Days within an assessment period in which the average HSCL-20 item score was below .5 (on a scale of 0–4) were characterized as fully depression free and assigned a score of 1. Days with average HSCL-20 scores above 1.7 (the mean score of depressed patients entering the trial) were characterized as fully depressed and assigned a score of 0. For days with average scores between .5 and 1.7, linear interpolation was used to convert scores into proportions between 0 and 1.
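Written out, the HSCL-20 rule assigns each day with average item score s a depression-free proportion d(s). The straight-line form between the two cutoffs is our reading of "linear interpolation" in this context, offered as a sketch rather than the authors' exact formula.

```latex
d(s) =
\begin{cases}
1, & s < 0.5 \\[4pt]
\dfrac{1.7 - s}{1.7 - 0.5}, & 0.5 \le s \le 1.7 \\[4pt]
0, & s > 1.7
\end{cases}
```

For example, a day with an average score of 1.1 receives d = (1.7 − 1.1)/1.2 = .5, that is, half of a depression-free day.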

The composite estimate of DFDs was equal to the number of days within the assessment period multiplied by the assigned depression-free proportion. When multiple assessment points were available, the DFD estimate was computed for each intervening time period, and the total DFD was then computed as a weighted sum (weighted by the relative duration of each period). For example, to compute DFDs using baseline and 12 months, each point receives equal weight, so the formula is DFD=365×[(.5×baseline DFD)+(.5×12-month DFD)]. To compute DFDs with the four assessment points at baseline, three months, six months, and 12 months, the formula is DFD=365×[(.125×baseline DFD)+(.250×3-month DFD)+(.375×6-month DFD)+(.250×12-month DFD)].
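Both formulas are special cases of trapezoid-rule weighting. In illustrative notation (not the authors'), let t_0 < t_1 < … < t_n be the assessment times, d_i the DFD value at time t_i, and T = t_n − t_0 the length of follow-up, with t_{−1} = t_0 and t_{n+1} = t_n at the endpoints:

```latex
\mathrm{DFD} = 365 \sum_{i=0}^{n} w_i \, d_i,
\qquad
w_i = \frac{t_{i+1} - t_{i-1}}{2T}
```

For t = (0, 3, 6, 12) months this gives w = (3/24, 6/24, 9/24, 6/24) = (.125, .250, .375, .250), and for t = (0, 12) it gives w = (.5, .5), matching the two formulas above. The factor 365 assumes a 12-month follow-up period.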

As stated above, the weight for each assessment point varies with the amount of time that point contributes to the estimate. In this example, the three-month period beginning at baseline represents 25% of the total period, and baseline contributes half of that period (.5×25%=.125). The three-month assessment point contributes twice: once to the initial period between baseline and three months (weight=.125) and again to the period between three and six months (weight=.125), for a total weight of .25. The six-month assessment point contributes to both the three-month interval between three and six months (weight=.125) and the longer, six-month interval between six and 12 months (weight=.25), for a total weight of .375. The 12-month assessment point contributes only to the final six-month interval (weight=.25).
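The weighting logic can be checked with a short Python sketch. The function names `trapezoid_weights` and `total_dfd` are hypothetical; the sketch assumes assessment times in months, per-assessment DFD values already on the 0–1 scale, and a 365-day follow-up period by default.

```python
from typing import List, Sequence


def trapezoid_weights(times: Sequence[float]) -> List[float]:
    """Weight of each assessment point in the weighted DFD sum.

    Each interval between consecutive assessments is split equally between
    its two endpoints, so a point's weight is half of every interval it
    borders, expressed as a fraction of total follow-up time.
    """
    if len(times) < 2 or any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("need at least two strictly increasing times")
    total = times[-1] - times[0]
    weights = [0.0] * len(times)
    for i in range(len(times) - 1):
        half = 0.5 * (times[i + 1] - times[i]) / total
        weights[i] += half
        weights[i + 1] += half
    return weights


def total_dfd(times: Sequence[float], proportions: Sequence[float],
              period_days: float = 365) -> float:
    """Weighted sum of per-assessment DFD values, scaled to days."""
    if len(times) != len(proportions):
        raise ValueError("times and proportions must have the same length")
    weights = trapezoid_weights(times)
    return period_days * sum(w * p for w, p in zip(weights, proportions))
```

For example, `trapezoid_weights([0, 3, 6, 12])` reproduces the weights .125, .25, .375, and .25, and a patient with DFD values of 0, .5, 1, and 1 at those points would be credited with 365×.75 = 273.75 DFDs.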

Estimates of DFDs have been reported with the Hamilton Depression Rating Scale, the Beck Depression Inventory, and the HSCL-20. No standards exist for establishing scale cutoffs for the interpolation process. Our HSCL-20 thresholds for computing DFDs were adapted from work by Simon (23), who used thresholds of .5 and 2.0 for one and zero DFDs, respectively. We used an upper cutoff of 1.7, the mean baseline HSCL-20 score of IMPACT participants, to better reflect the depression severity reported by this sample, all of whom met Structured Clinical Interview for DSM Disorders criteria for major depression or dysthymic disorder at study entry.

Estimates of DFD based on the PHQ-9

The procedure for estimating DFDs from PHQ-9 results followed the same method described above for the HSCL-20, with substitution of appropriate cutoff scores. Using cutoffs established by Kroenke and colleagues (19), we characterized days within assessment periods in which the average PHQ-9 score was below 5 (classified as no depression) as fully depression free and assigned them a score of 1. Days within an assessment period in which the average score was above 14 (classified as moderately severe to severe depression) were characterized as fully depressed and assigned a score of 0. Again, linear interpolation was used to convert average scores between the lower and upper cutoffs into proportions between 0 and 1.
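Because only the cutoffs change between instruments, the scoring rule can be parameterized once. The Python sketch below is an illustration under that assumption; the names `Cutoffs`, `HSCL20`, `PHQ9`, and `dfd_proportion` are hypothetical, while the cutoff values are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    depression_free_below: float  # scores below this count as a fully depression-free day (1)
    fully_depressed_above: float  # scores above this count as a fully depressed day (0)


HSCL20 = Cutoffs(0.5, 1.7)  # average item score, 0-4 scale
PHQ9 = Cutoffs(5.0, 14.0)   # PHQ-9 total score, 0-27 scale


def dfd_proportion(score: float, cutoffs: Cutoffs) -> float:
    """Map one severity score to a depression-free proportion between 0 and 1."""
    lo, hi = cutoffs.depression_free_below, cutoffs.fully_depressed_above
    if score < lo:
        return 1.0
    if score > hi:
        return 0.0
    # Straight line from 1 at the lower cutoff down to 0 at the upper cutoff.
    return (hi - score) / (hi - lo)
```

For example, a PHQ-9 total of 9.5 lies halfway between 5 and 14 and maps to a proportion of .5. The function is continuous at both cutoffs, so scores of exactly 5 or 14 receive 1 and 0, respectively.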

Results

Depression outcomes for participants who received collaborative care in the IMPACT study are reported in Table 1.

Table 1 Outcomes from the Improving Mood—Promoting Access to Collaborative Treatment trial based on various metrics

As discussed, one advantage of the DFD measure is its ability to incorporate multiple assessment points. On the HSCL-20, estimated DFDs increased from 153 with two assessment points to 197 with three and 204 with four, an increase of 33%. Similarly, on the PHQ-9, two assessment points yielded 200 DFDs, whereas four yielded 265, again an increase of about 33%. Using all available PHQ-9 assessments (mean of 16 per participant, range 8–38) yielded 273 DFDs, an increase of 3% over four assessment points.
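The percentage changes can be reproduced from the reported DFD totals with a few lines of Python. The sketch only restates the arithmetic; `percent_increase` is a hypothetical helper, and the DFD figures are those given in the paragraph above.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# (comparison, DFDs with fewer assessment points, DFDs with more), as reported
reported = [
    ("HSCL-20, 2 -> 3 points", 153, 197),
    ("HSCL-20, 2 -> 4 points", 153, 204),
    ("PHQ-9, 2 -> 4 points", 200, 265),
    ("PHQ-9, 4 points -> all available", 265, 273),
]

for label, old, new in reported:
    print(f"{label}: {percent_increase(old, new):.1f}%")
```

This prints 28.8%, 33.3%, 32.5%, and 3.0%, respectively, so the PHQ-9 figure reported as 33% is 32.5% before rounding.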

Discussion

We computed DFDs by using two assessment instruments and two methods for establishing the cutoff scores that define a DFD. With both approaches, incorporating four rather than two assessment points increased the estimated number of DFDs by about 33%. The DFD measure offers researchers two advantages: the ability to use multiple assessment points to represent the course of symptom response more accurately, and the ease with which cost analyses can be conducted.

In this study, we did not compare DFDs between the intervention and comparison groups from the IMPACT trial. Comparing the relative difference in DFDs between groups is a straightforward analytic step, and the results of this comparison have been reported elsewhere (24,25). Instead, we examined results from the intervention arm because only intervention participants completed frequent PHQ-9 assessments, which allowed us to investigate a potential ceiling effect of measurement frequency on outcome. The most accurate method of calculating DFDs would likely use daily experience sampling (with a depression diary, for example). However, daily measurement is expensive, and follow-through by patients is a major barrier. Our results suggest that such methods may not provide substantial additional benefit in estimating DFDs. Four assessment points gave nearly as much information as all available PHQ-9 assessments (a mean of 16 per participant), suggesting that, in samples like this one, measuring depression severity more than four times over the course of a year may add little to the accuracy of DFD estimates.

Our data do not allow us to determine an optimal number of assessment points for modeling DFDs. It is possible that four points are adequate but that five or six would add enough accuracy to justify the added cost of assessment. We illustrated two approaches to selecting cutoffs for a depression-free day and a fully depressed day. For the HSCL-20 we combined thresholds recommended in a previous study (23) with an adjustment to the mean level of depression in our sample. For the PHQ-9 we used more general values associated with the instrument when used in population studies. Although the choice of cutoff values is not likely to affect between-group differences in any given analysis (because the same cutoffs are applied to both groups), the cutoffs do affect the magnitude of the DFD estimate and hence its clinical relevance. Each method has advantages for representing different groups, and the choice of cutoffs should be clearly reported. Future studies could focus on determining optimal cutoff values (for example, by using a daily-diary reporting method as a gold standard and comparing different numbers of periodic assessment points and different cutoffs against it).

Conclusions

This study has implications for how depression treatment outcomes are measured in clinical research. Researchers have long debated the best methods for determining clinical significance, and few useful solutions have been proposed. Metrics such as numbers needed to treat may be helpful in determining the overall effects of treatment but convey little about the degree to which interventions have affected individual lives. Jacobson and Truax (26) proposed a definition of clinical significance that is widely used in the psychological literature, but it assumes that a clinically meaningful outcome requires patients to return to the symptom range of a functional, nonclinical population; this is not a realistic expectation for many real-world interventions. The DFD measure provides a more meaningful outcome, with excellent face validity and direct clinical relevance to consumers of depression treatment.

Acknowledgments and disclosures

Original data collection was supported by grants from the John A. Hartford Foundation, the California Health Care Foundation, the Hogg Foundation, and the Robert Wood Johnson Foundation. Data analysis and preparation of this manuscript were supported by grants 5K24MH074717 and KL2RR025015 from the National Institute of Mental Health. The design, conduct, data collection, analysis, and interpretation of the results of this study were performed independently of the funders. The funding agencies also played no role in review or approval of the manuscript.

The authors report no competing interests.

Dr. Vannoy and Dr. Unützer are affiliated with the Department of Psychiatry and Behavioral Sciences, University of Washington, 1959 N.E. Pacific St., Box 356560, BB1533, Seattle, WA 98195-6560 (e-mail: [email protected]). Dr. Arean is with the Department of Psychiatry, University of California, San Francisco.

References

1. Mallick R, Chen J, Entsuah AR, et al: Depression-free days as a summary measure of the temporal pattern of response and remission in the treatment of major depression: a comparison of venlafaxine, selective serotonin reuptake inhibitors, and placebo. Journal of Clinical Psychiatry 64:321–330, 2003

2. Cohen J: A power primer. Psychological Bulletin 112:155–159, 1992

3. Riso LP, Thase ME, Howland RH, et al: A prospective test of criteria for response, remission, relapse, recovery, and recurrence in depressed patients treated with cognitive behavior therapy. Journal of Affective Disorders 43:131–142, 1997

4. Lave JR, Frank RG, Schulberg HC, et al: Cost-effectiveness of treatments for major depression in primary care practice. Archives of General Psychiatry 55:645–651, 1998

5. Araya R, Flynn T, Rojas G, et al: Cost-effectiveness of a primary care treatment program for depression in low-income women in Santiago, Chile. American Journal of Psychiatry 163:1379–1387, 2006

6. Ciechanowski PS, Russo JE, Katon WJ, et al: The association of patient relationship style and outcomes in collaborative care treatment for depression in patients with diabetes. Medical Care 44:283–291, 2006

7. Liu CF, Hedrick SC, Chaney EF, et al: Cost-effectiveness of collaborative care for depression in a primary care veteran population. Psychiatric Services 54:698–704, 2003

8. Montgomery SA, Andersen HF: Escitalopram versus venlafaxine XR in the treatment of depression. International Clinical Psychopharmacology 21:297–309, 2006

9. Pyne JM, Tripathi S, Williams DK, et al: Depression-free day to utility-weighted score: is it valid? Medical Care 45:357–362, 2007

10. Revicki DA, Siddique J, Frank L, et al: Cost-effectiveness of evidence-based pharmacotherapy or cognitive behavior therapy compared with community referral for major depression in predominantly low-income minority women. Archives of General Psychiatry 62:868–875, 2005

11. Simon GE, Barber C, Birnbaum HG, et al: Depression and work productivity: the comparative costs of treatment versus nontreatment. Journal of Occupational and Environmental Medicine 43:2–9, 2001

12. Simon RI: Suicide risk assessment: what is the standard of care? Journal of the American Academy of Psychiatry and the Law 30:340–344, 2002

13. Trivedi MH, Wan GJ, Mallick R, et al: Cost and effectiveness of venlafaxine extended-release and selective serotonin reuptake inhibitors in the acute phase of outpatient treatment for major depressive disorder. Journal of Clinical Psychopharmacology 24:497–506, 2004

14. Simon GE, Manning WG, Katzelnick DJ, et al: Cost-effectiveness of systematic depression treatment for high utilizers of general medical care. Archives of General Psychiatry 58:181–187, 2001

15. Katon WJ, Unützer J, Fan MY, et al: Cost-effectiveness and net benefit of enhanced treatment of depression for older adults with diabetes and depression. Diabetes Care 29:265–270, 2006

16. Katon W, Lin EH, Kroenke K: The association of depression and anxiety with medical symptom burden in patients with chronic medical illness. General Hospital Psychiatry 29:147–155, 2007

17. Rost K, Pyne JM, Dickinson LM, et al: Cost-effectiveness of enhancing primary care depression management on an ongoing basis. Annals of Family Medicine 3:7–14, 2005

18. Spitzer RL, Williams JB, Kroenke K, et al: Utility of a new procedure for diagnosing mental disorders in primary care: the PRIME-MD 1000 study. JAMA 272:1749–1756, 1994

19. Kroenke K, Spitzer RL, Williams JB: The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine 16:606–613, 2001

20. Derogatis LR, Lipman RS, Rickels K, et al: The Hopkins Symptom Checklist (HSCL): a self-report symptom inventory. Behavioral Science 19:1–15, 1974

21. Unützer J, Katon WJ, Callahan CM, et al: Collaborative care management of late-life depression in the primary care setting: a randomized controlled trial. JAMA 288:2836–2845, 2002

22. Löwe B, Unützer J, Callahan CM, et al: Monitoring depression treatment outcomes with the Patient Health Questionnaire–9. Medical Care 42:1194–1201, 2004

23. Simon GE: Evidence review: efficacy and effectiveness of antidepressant treatment in primary care. General Hospital Psychiatry 24:213–224, 2002

24. Katon WJ, Schoenbaum M, Fan MY, et al: Cost-effectiveness of improving primary care treatment of late-life depression. Archives of General Psychiatry 62:1313–1320, 2005

25. Simon GE, Katon WJ, Lin EH, et al: Cost-effectiveness of systematic depression treatment among people with diabetes mellitus. Archives of General Psychiatry 64:65–72, 2007

26. Jacobson NS, Truax P: Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology 59:12–19, 1991