Exercise for depression

Background

Depression is a common and important cause of morbidity and mortality worldwide. Depression is commonly treated with antidepressants and/or psychological therapy, but some people may prefer alternative approaches such as exercise. There are a number of theoretical reasons why exercise may improve depression. This is an update of an earlier review first published in 2009.

Objectives

To determine the effectiveness of exercise in the treatment of depression in adults compared with no treatment or a comparator intervention.

Search methods

We searched the Cochrane Depression, Anxiety and Neurosis Review Group’s Controlled Trials Register (CCDANCTR) to 13 July 2012. This register includes relevant randomised controlled trials from the following bibliographic databases: The Cochrane Library (all years); MEDLINE (1950 to date); EMBASE (1974 to date) and PsycINFO (1967 to date). We also searched www.controlled‐trials.com, ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform. No date or language restrictions were applied to the search.

We conducted an additional search of the CCDANCTR up to 1st March 2013 and any potentially eligible trials not already included are listed as 'awaiting classification.'

Selection criteria

Randomised controlled trials in which exercise (defined according to American College of Sports Medicine criteria) was compared to standard treatment, no treatment or a placebo treatment, pharmacological treatment, psychological treatment or other active treatment in adults (aged 18 and over) with depression, as defined by trial authors. We included cluster trials and those that randomised individuals. We excluded trials of postnatal depression.

Data collection and analysis

Two review authors extracted data on primary and secondary outcomes at the end of the trial and end of follow‐up (if available). We calculated effect sizes for each trial using Hedges' g method and a standardised mean difference (SMD) for the overall pooled effect, using a random‐effects model; for dichotomous data we calculated risk ratios. Where trials used a number of different tools to assess depression, we included the main outcome measure only in the meta‐analysis. Where trials provided several 'doses' of exercise, we used data from the biggest 'dose' of exercise, and performed sensitivity analyses using the lower 'dose'. We performed subgroup analyses to explore the influence of method of diagnosis of depression (diagnostic interview or cut‐off point on scale), intensity of exercise and the number of sessions of exercise on effect sizes. Two authors performed the 'Risk of bias' assessments. Our sensitivity analyses explored the influence of study quality on outcome.
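As an illustration of the per-trial effect size described above, the following minimal Python sketch computes Hedges' g from summary statistics. The function name, argument names and example numbers are hypothetical and are not taken from any included trial. The small-sample correction uses the common approximation J = 1 - 3/(4df - 1), which may differ slightly from the software used in the review.

```python
import math

def hedges_g(mean_ex, sd_ex, n_ex, mean_ctl, sd_ctl, n_ctl):
    """Return (g, standard error) for one two-arm trial (illustrative sketch)."""
    df = n_ex + n_ctl - 2
    # Pooled standard deviation across the exercise and control arms.
    sd_pooled = math.sqrt(((n_ex - 1) * sd_ex**2 + (n_ctl - 1) * sd_ctl**2) / df)
    d = (mean_ex - mean_ctl) / sd_pooled
    # Approximate small-sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    # Approximate variance of d; the standard error of g is scaled by j.
    var_d = (n_ex + n_ctl) / (n_ex * n_ctl) + d**2 / (2 * (n_ex + n_ctl))
    return j * d, j * math.sqrt(var_d)

# Hypothetical depression scores (lower is better) at the end of treatment.
g, se = hedges_g(mean_ex=10.0, sd_ex=6.0, n_ex=20,
                 mean_ctl=14.0, sd_ctl=6.0, n_ctl=20)
print(f"g = {g:.2f}, SE = {se:.2f}")
```

With lower scores meaning fewer symptoms, a negative g favours exercise, which matches the sign convention of the SMDs reported in this review.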

Main results

Thirty‐nine trials (2326 participants) fulfilled our inclusion criteria, of which 37 provided data for meta‐analyses. There were multiple sources of bias in many of the trials; randomisation was adequately concealed in 14 studies, 15 used intention‐to‐treat analyses and 12 used blinded outcome assessors.

For the 35 trials (1356 participants) comparing exercise with no treatment or a control intervention, the pooled SMD for the primary outcome of depression at the end of treatment was ‐0.62 (95% confidence interval (CI) ‐0.81 to ‐0.42), indicating a moderate clinical effect. There was moderate heterogeneity (I² = 63%).
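The pooled SMD and I² reported above come from a random-effects meta-analysis. The sketch below shows one common way such a pooled estimate can be computed, the DerSimonian-Laird method. The review does not state which estimator was used, so this choice, the function name and the input numbers are all assumptions made for illustration only.

```python
import math

def pool_random_effects(effects, std_errors):
    """DerSimonian-Laird pooling (assumed estimator).

    Returns (pooled, ci_low, ci_high, i2_percent).
    """
    if len(effects) < 2:
        raise ValueError("need at least two trials to pool")
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1 / se**2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures variation beyond what chance would explain.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-trial variance
    # Random-effects weights add tau2 to each trial's variance.
    w_re = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, i2

# Hypothetical per-trial SMDs and standard errors (not from the review).
smd, low, high, i2 = pool_random_effects([-1.2, -0.1, -0.7, -0.3],
                                         [0.25, 0.20, 0.30, 0.15])
print(f"pooled SMD {smd:.2f} (95% CI {low:.2f} to {high:.2f}), I2 = {i2:.0f}%")
```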

When we included only the six trials (464 participants) with adequate allocation concealment, intention‐to‐treat analysis and blinded outcome assessment, the pooled SMD for this outcome was not statistically significant (‐0.18, 95% CI ‐0.47 to 0.11). Pooled data from the eight trials (377 participants) providing long‐term follow‐up data on mood found a small effect in favour of exercise (SMD ‐0.33, 95% CI ‐0.63 to ‐0.03).
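To see why the estimate from the six robust trials is described as not statistically significant while the other pooled estimates are, a standard error, z value and two-sided P value can be approximately recovered from each reported SMD and its 95% CI. This sketch assumes symmetric normal-based intervals (width = 2 × 1.96 × SE). The function name is hypothetical, and the recovered values are approximations affected by rounding in the published figures.

```python
import math

def z_and_p_from_ci(estimate, ci_low, ci_high):
    """Approximate (SE, z, two-sided P) from an estimate and its 95% CI."""
    se = (ci_high - ci_low) / (2 * 1.96)
    z = estimate / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

reported = [
    ("all 35 trials, end of treatment", -0.62, -0.81, -0.42),
    ("six methodologically robust trials", -0.18, -0.47, 0.11),
    ("eight trials, long-term follow-up", -0.33, -0.63, -0.03),
]
for label, est, low, high in reported:
    se, z, p = z_and_p_from_ci(est, low, high)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: SE ~ {se:.2f}, z ~ {z:.2f}, {verdict} at the 5% level")
```

An interval that crosses zero, as for the six robust trials, always gives a two-sided P value above 0.05 under these assumptions.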

Twenty‐nine trials reported acceptability of treatment, three trials reported quality of life, none reported cost, and six reported adverse events.

For acceptability of treatment (assessed by number of drop‐outs during the intervention), the risk ratio was 1.00 (95% CI 0.97 to 1.04).

Seven trials compared exercise with psychological therapy (189 participants), and found no significant difference (SMD ‐0.03, 95% CI ‐0.32 to 0.26). Four trials (n = 300) compared exercise with pharmacological treatment and found no significant difference (SMD ‐0.11, 95% CI ‐0.34 to 0.12). One trial (n = 18) reported that exercise was more effective than bright light therapy (MD ‐6.40, 95% CI ‐10.20 to ‐2.60).

For each trial that was included, two authors independently assessed for sources of bias in accordance with the Cochrane Collaboration 'Risk of bias' tool. In exercise trials, there are inherent difficulties in blinding both those receiving the intervention and those delivering the intervention. Many trials used participant self‐report rating scales as a method for post‐intervention analysis, which also has the potential to bias findings.

Authors' conclusions

Exercise is moderately more effective than a control intervention for reducing symptoms of depression, but analysis of methodologically robust trials only shows a smaller effect in favour of exercise. When compared to psychological or pharmacological therapies, exercise appears to be no more effective, though this conclusion is based on a few small trials.

Exercise for depression

Why is this review important?

Depression is a common and disabling illness, affecting over 100 million people worldwide. Depression can have a significant impact on people’s physical health, as well as reducing their quality of life. Research has shown that both pharmacological and psychological therapies can be effective in treating depression. However, many people prefer to try alternative treatments. Some NHS guidelines suggest that exercise could be used as a different treatment choice. However, it is not clear if research actually shows that exercise is an effective treatment for depression.

Who may be interested in this review?

Patients and families affected by depression.
General Practitioners.
Mental health policy makers.
Professionals working in mental health services.

What questions does this review aim to answer?

This review is an update of a previous Cochrane review from 2010 which suggested that exercise can reduce symptoms of depression, but the effect was small and did not seem to last after participants stopped exercising.

We wanted to find out if more trials of the effect of exercise as a treatment for depression have been conducted since our last review that allow us to answer the following questions:

Is exercise more effective than no therapy for reducing symptoms of depression?
Is exercise more effective than antidepressant medication for reducing symptoms of depression?
Is exercise more effective than psychological therapies or other non‐medical treatments for depression?
How acceptable to patients is exercise as a treatment for depression?

Which studies were included in the review?

We used search databases to find all high‐quality randomised controlled trials of how effective exercise is for treating depression in adults over 18 years of age. We searched for studies published up until March 2013. We also searched for ongoing studies to March 2013. All studies had to include adults with a diagnosis of depression, and the physical activity carried out had to fit criteria to ensure that it met with a definition of ‘exercise’.

We included 39 studies with a total of 2326 participants in the review. The reviewers noted that the quality of some of the studies was low, which limits confidence in the findings. When only high‐quality trials were included, exercise had only a small effect on mood that was not statistically significant.

What does the evidence from the review tell us?

Exercise is moderately more effective than no therapy for reducing symptoms of depression.
Exercise is no more effective than antidepressants for reducing symptoms of depression, although this conclusion is based on a small number of studies.
Exercise is no more effective than psychological therapies for reducing symptoms of depression, although this conclusion is based on a small number of studies.
The reviewers also note that when only high‐quality studies were included, the difference between exercise and no therapy is less conclusive.
Attendance rates for exercise treatments ranged from 50% to 100%.
The evidence about whether exercise for depression improves quality of life is inconclusive.

What should happen next?

The reviewers recommend that future research should look in more detail at what types of exercise could most benefit people with depression, and the number and duration of sessions which are of most benefit. Further larger trials are needed to find out whether exercise is as effective as antidepressants or psychological treatments.

Authors' conclusions

Implications for practice

Our review suggested that exercise might have a moderate‐sized effect on depression, but because of the risks of bias in many of the trials, the effect of exercise may only be small. We cannot be certain what type and intensity of exercise may be effective, or what the optimum duration and frequency of an exercise programme are. There are few data on whether any benefits persist after exercise has stopped.

The evidence also suggests that exercise may be as effective as psychological or pharmacological treatments, but the number of trials reporting these comparisons and the number of participants randomised were both small.

Implications for research

A future update of the current review, including results from ongoing trials and those 'awaiting classification', may increase the precision of estimates of effect sizes. Future systematic reviews and meta‐analyses could be performed to investigate the effect of exercise on people with dysthymia. 

This review would be strengthened by additional large‐scale high‐quality studies where all participants at the time of recruitment were diagnosed through clinical interview as having depression, adhered closely to an exercise regimen as a sole intervention and were further assessed through diagnostic clinical interview post‐intervention.

It would also be worth considering whether any long‐lasting effects of exercise correlated with sustained increases in physical activity over time. Now that we can measure physical activity directly using accelerometers, this would be a feasible piece of research to perform.

There is a paucity of data comparing exercise with psychological treatments and pharmacological treatments. Further trials are needed in this area.

Summary of findings

Summary of findings for the main comparison. Exercise compared to control for adults with depression

Exercise compared to no intervention or placebo for adults with depression

Patient or population: adults with depression
Settings: any setting
Intervention: Exercise
Comparison: no intervention or placebo

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk (no intervention or placebo) | Illustrative comparative risks* (95% CI): corresponding risk (exercise) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression. Different scales. Follow‐up: post‐treatment | The mean symptoms of depression in the control groups was 0 | The mean symptoms of depression in the intervention groups was 0.62 standard deviations lower (0.81 to 0.42 lower)¹ | 1353 (35 studies) | ⊕⊕⊕⊝ moderate²,³,⁴ | SMD ‐0.62 (95% CI: ‐0.81 to ‐0.42). The effect size was interpreted as 'moderate' (using Cohen's rule of thumb) |
| Symptoms of depression (long‐term). Different scales | The mean symptoms of depression (long‐term) in the control groups was 0 | The mean symptoms of depression (long‐term) in the intervention groups was 0.33 standard deviations lower (0.63 to 0.03 lower) | 377 (8 studies) | ⊕⊕⊝⊝ low⁴,⁵ | SMD ‐0.33 (95% CI: ‐0.63 to ‐0.03). The effect size was interpreted as 'small' (using Cohen's rule of thumb) |
| Adverse events | See comment | See comment | 0 (6 studies) | ⊕⊕⊕⊝ moderate | Seven trials reported no difference in adverse events between exercise and usual care groups. Dunn 2005 reported increased severity of depressive symptoms (n = 1), chest pain (n = 1) and joint pain/swelling (n = 1); all these participants discontinued exercise. Singh 1997 reported that 1 exerciser was referred to her psychologist at 6 weeks due to increasing suicidality; and musculoskeletal symptoms in 2 participants required adjustment of training regime. Singh 2005 reported adverse events in detail (visits to a health professional, minor illness, muscular pain, chest pain, injuries requiring training adjustment, falls, deaths and hospital days) and found no difference between the groups. Knubben 2007 reported "no negative effects of exercise (muscle pain, tightness or fatigue)"; after the training had finished, 1 person in the placebo group required gastric lavage and 1 person in the exercise group inflicted a superficial cut on her arm. Sims 2009 reported no adverse events or falls in either the exercise or control group. Blumenthal 2007 reported more side effects in the sertraline group (see comparison below) but there was no difference between the exercise and control group. Blumenthal 2012a reported more fatigue and sexual dysfunction in the sertraline group than the exercise group. |
| Acceptability of treatment | Study population: 865 per 1000 | 865 per 1000 (839 to 900) | 1363 (29 studies) | ⊕⊕⊕⊝ moderate² | RR 1 (95% CI: 0.97 to 1.04) |
| Quality of life | | The mean quality of life in the intervention groups was 0 higher (0 to 0 higher) | 0 (4 studies) | See comment | There were no statistically significant differences for the mental (SMD ‐0.24; 95% CI ‐0.76 to 0.29), psychological (SMD 0.28; 95% CI ‐0.29 to 0.86) and social domains (SMD 0.19; 95% CI ‐0.35 to 0.74). Two studies reported a statistically significant difference for the environment domain favouring exercise (SMD 0.62; 95% CI 0.06 to 1.18) and 4 studies reported a statistically significant difference for the physical domain favouring exercise (SMD 0.45; 95% CI 0.06 to 0.83). |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio;
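
The corresponding risks in the acceptability row can be reproduced from the assumed risk and the risk ratio, as the note above describes. The following is a minimal sketch, assuming simple multiplication and rounding to whole participants per 1000; the function name is hypothetical.

```python
def corresponding_risk(assumed_per_1000, rr, rr_low, rr_high):
    """Scale an assumed risk (per 1000) by a risk ratio and its 95% CI."""
    def scale(ratio):
        # A risk cannot exceed 1000 per 1000.
        return min(1000, round(assumed_per_1000 * ratio))
    return scale(rr), scale(rr_low), scale(rr_high)

# Acceptability of treatment, exercise versus no intervention or placebo:
# assumed risk 865 per 1000, RR 1.00 (95% CI 0.97 to 1.04).
risk, low, high = corresponding_risk(865, 1.00, 0.97, 1.04)
print(f"{risk} per 1000 ({low} to {high})")
```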

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1 Effect estimate calculated by re‐expressing the SMD on the Hamilton Depression Rating Scale using the control group SD (7) from Blumenthal 2007 (study chosen for being most representative). The SD was multiplied by the pooled SMD to provide the effect estimate on the HDRS.
2 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 23 studies.
3 I² = 63% and P < 0.00001, indicated moderate levels of heterogeneity
4 Population size is large, effect size is above 0.2 SD, and the 95% CI does not cross the line of no effect.
5 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 4 studies.
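
Footnote 1 describes re-expressing the pooled SMD on the Hamilton Depression Rating Scale (HDRS) by multiplying it by a reference SD of 7. The sketch below carries out that arithmetic and attaches Cohen's conventional labels (0.2 small, 0.5 moderate, 0.8 large). The function names and the 'negligible' label for values below 0.2 are assumptions added for illustration.

```python
def reexpress_smd(smd, reference_sd):
    """Convert an SMD to points on a scale with the given reference SD."""
    return smd * reference_sd

def cohen_label(smd):
    """Label the magnitude of an SMD using Cohen's rule of thumb."""
    size = abs(smd)
    if size < 0.2:
        return "negligible"  # assumed label; not part of Cohen's three categories
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"

HDRS_CONTROL_SD = 7  # control group SD from Blumenthal 2007, per footnote 1

for smd in (-0.62, -0.81, -0.42):
    points = reexpress_smd(smd, HDRS_CONTROL_SD)
    print(f"SMD {smd:.2f} -> {points:.2f} HDRS points ({cohen_label(smd)})")
```

Under these assumptions the pooled SMD of -0.62 corresponds to roughly 4.3 points lower on the HDRS, with the confidence limits corresponding to roughly 5.7 and 2.9 points lower.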

Summary of findings 2. Exercise compared to psychological treatments for adults with depression

Exercise compared to cognitive therapy for adults with depression

Patient or population: adults with depression
Settings:
Intervention: Exercise
Comparison: cognitive therapy

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk (cognitive therapy) | Illustrative comparative risks* (95% CI): corresponding risk (exercise) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression | | The mean symptoms of depression in the intervention groups was 0.03 standard deviations lower (0.32 lower to 0.26 higher) | 189 (7 studies) | ⊕⊕⊕⊝ moderate¹,²,³ | SMD ‐0.03 (95% CI: ‐0.32 to 0.26) |
| Acceptability of treatment | Study population: 766 per 1000 | 827 per 1000 (728 to 950) | 172 (4 studies) | ⊕⊕⊕⊝ moderate¹ | RR 1.08 (95% CI: 0.95 to 1.24) |
| Quality of life | | The mean quality of life in the intervention groups was 0 higher (0 to 0 higher) | 0 (1 study) | ⊕⊕⊕⊝ moderate¹ | One trial reported changes in the Minnesota Living with Heart Failure Questionnaire, a quality of life measure (Gary 2010). There was no statistically significant difference for the physical domain (MD 0.15; 95% CI: ‐7.40 to 7.70) or the mental domain (MD ‐0.09; 95% CI: ‐9.51 to 9.33). |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio;

1 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 7 studies.
2 I² = 0% and P = 0.62, indicated no heterogeneity
3 The studies included were all relevant to the review question, particularly given that all studies had to meet the criteria of the ACSM definition of exercise.

Summary of findings 3. Exercise compared to bright light therapy for adults with depression

Exercise compared to bright light therapy for adults with depression

Patient or population: adults with depression
Settings:
Intervention: Exercise
Comparison: bright light therapy

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk (bright light therapy) | Illustrative comparative risks* (95% CI): corresponding risk (exercise) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression | | The mean symptoms of depression in the intervention groups was 6.4 lower (10.2 to 2.6 lower) | 18 (1 study) | ⊕⊝⊝⊝ very low¹,²,³ | MD ‐6.40 (95% CI: ‐10.20 to ‐2.60). Although this trial suggests a benefit of exercise, it is too small to draw firm conclusions |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval;

1 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were not reported. Also sequence generation and concealment was considered unclear.
2 The study included was relevant to the review question, particularly given that all studies had to meet the criteria of the ACSM definition of exercise.
3 Based on 18 people

Summary of findings 4. Exercise compared to pharmacological treatments for adults with depression

Exercise compared to antidepressants for adults with depression

Patient or population: adults with depression
Settings:
Intervention: Exercise
Comparison: antidepressants

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk (antidepressants) | Illustrative comparative risks* (95% CI): corresponding risk (exercise) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression | | The mean symptoms of depression in the intervention groups was 0.11 standard deviations lower (0.34 lower to 0.12 higher) | 300 (4 studies) | ⊕⊕⊕⊝ moderate¹,²,³ | SMD ‐0.11 (95% CI: ‐0.34 to 0.12) |
| Acceptability of treatment | Study population: 891 per 1000 | 873 per 1000 (766 to 997) | 278 (3 studies) | ⊕⊕⊕⊝ moderate¹ | RR 0.98 (95% CI: 0.86 to 1.12) |
| Quality of life | | The mean quality of life in the intervention groups was 0 higher (0 to 0 higher) | 0 (1 study) | ⊕⊕⊕⊝ moderate¹ | One trial, Brenes 2007, reported no difference in change in SF‐36 mental health and physical health components between medication and exercise groups. |
| Adverse events | See comment | See comment | 0 (3 studies) | ⊕⊕⊕⊝ moderate¹ | Blumenthal 1999 reported that 3/53 in exercise group suffered musculoskeletal injuries; injuries in the medication group were not reported. Blumenthal 2007 collected data on side effects by asking participants to rate a 36‐item somatic symptom checklist and reported that "a few patients reported worsening of symptoms"; of the 36 side effects assessed, only 1 showed a statistically significant group difference (P = 0.03), i.e. that the sertraline group reported worse post‐treatment diarrhoea and loose stools. Blumenthal 2012a assessed 36 side effects; only 2 showed a significant group difference: 20% of participants receiving sertraline reported worse post‐treatment fatigue compared with 2.4% in the exercise group and 26% reported increased sexual problems compared with 2.4% in the exercise group. |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio;

1 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 1 study.
2 I² = 0% and P = 0.52, indicated no heterogeneity
3 The studies included were all relevant to the review question, particularly given that all studies had to meet the criteria of the ACSM definition of exercise.

Background

Description of the condition

Depression refers to a wide range of mental health problems characterised by the absence of a positive affect (a loss of interest and enjoyment in ordinary things and experiences), persistent low mood and a range of associated emotional, cognitive, physical and behavioural symptoms (NICE 2009).

Severity of depression is classified using the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM‐IV) criteria as mild (five or more symptoms with minor functional impairment), moderate (symptoms or functional impairment are between 'mild' and 'severe') and severe (most symptoms present and interfere with functioning, with or without psychotic symptoms) (NICE 2009). Depression is common, affecting 121 million adults worldwide, and rated as the fourth leading cause of disease burden in 2000 (Moussavi 2007). Depression is an important cause of morbidity and mortality and produces the greatest decrement in health compared with other chronic diseases such as angina or arthritis (Moussavi 2007).

Description of the intervention

Depression is commonly treated with antidepressants or psychological therapies or a combination of both. Antidepressants are effective for the treatment of depression in primary care (Arroll 2009). However antidepressants may have adverse side effects, adherence can be poor, and there is a lag time between starting antidepressants and improvements in mood. Psychological treatments are generally free from side effects and are recommended in the UK National Institute for Health and Clinical Excellence (NICE) guidelines (NICE 2009) but some people may not wish to receive psychological therapy due to low expectations of positive outcome or perceived stigma. Psychological therapy also requires sustained motivation and a degree of psychological mindedness in order to be effective. Depression is a well‐recognised reason for seeking alternative therapies (Astin 1998). Whilst this may reflect dissatisfaction with conventional treatments, another possibility is that alternative therapies may be more in line with people's own beliefs and philosophies (Astin 1998). There has been increasing interest in the potential role of alternative therapies such as music therapy, light therapy, acupuncture, family therapy, marital therapy, relaxation and exercise for the management of depression.

Exercise is defined as the "planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness" (ACSM 2001). The effect of exercise on depression has been the subject of research for several decades and is believed by a number of researchers and clinicians to be effective in the treatment of depression (Beesley 1997). This reflects an historic perspective on the role of aerobic exercise prescription for depression. For example, a report for the National Service Framework for Mental Health suggested that exercise should be included as a treatment option for people with depression (Donaghy 2000). The NICE guideline for depression recommended structured, supervised exercise programmes, three times a week (45 minutes to one hour) over 10 to 14 weeks, as a low‐intensity Step 2 intervention for mild to moderate depression (NICE 2009). A recent guideline published by the Scottish Intercollegiate Guidelines Network (SIGN) for non‐pharmaceutical management of depression in adults recommended that structured exercise may be considered as a treatment option for people with depression (graded 'B' relating to the strength of the evidence on which the recommendation was based) (SIGN 2010).

Exercise programmes can be offered in the UK through Exercise Referral Systems (DOH 2001). These schemes direct someone to a service offering an assessment of need, development of a tailored physical activity programme, monitoring of progress and follow‐up. However, a systematic review of exercise on prescription schemes found limited evidence about their effectiveness and recommended further research (Sorensen 2006), and a further more recent review found that there was still considerable uncertainty about the effectiveness of exercise referral schemes for increasing physical activity, fitness, or health indicators, or whether they are an efficient use of resources for sedentary people (Pavey 2011).

A second recent review noted that most trials in this area that have previously been included in systematic reviews recruit participants from outside of health services, making it difficult to assess whether prescribing exercise in a clinical setting (i.e. when a health professional has made a diagnosis of depression) is effective (Krogh 2011). In that review, studies were restricted only to those trials in which participants with a clinical diagnosis of depression were included, and the authors found no evidence of an effect of exercise in these trials (Krogh 2011). NICE concluded that there was insufficient evidence to recommend Exercise Referral Schemes other than as part of research studies to evaluate their effectiveness. Thus, whilst the published guidelines recommend exercise for depression, NICE recommends that Exercise Referral Schemes, to which people with depression are referred, need further evaluation.

This review focuses on exercise defined according to American College of Sports Medicine (ACSM) criteria. Whilst accepting that other forms of bodily movement may be effective, some of these are the subjects of other reviews.

How the intervention might work

Observational studies have shown that depression is associated with low levels of physical activity (Smith 2013). Whilst an association between two variables does not necessarily imply causality, there are plausible reasons why physical activity and exercise may improve mood. Exercise may act as a diversion from negative thoughts, and the mastery of a new skill may be important (LePore 1997). Social contact may be part of the mechanism. Craft 2005 found support for self efficacy as the mechanism by which exercise might have an antidepressant effect; people who experienced an improvement in mood following exercise showed higher self efficacy levels at three weeks and nine weeks post‐exercise. Self efficacy has been found to be intricately linked with self esteem, which in turn is considered to be one of the strongest predictors of overall, subjective well‐being (Diener 1984). Low self esteem is also considered to be closely related to mental illness (Fox 2000). Physical activity may have physiological effects such as changes in endorphin and monoamine levels, or reduction in the levels of the stress hormone cortisol (Chen 2013), all of which may improve mood. Exercise stimulates growth of new nerve cells and release of proteins known to improve health and survival of nerve cells, e.g. brain‐derived neurotrophic factor (Cotman 2002; Ernst 2005).

Why it is important to do this review

Several systematic reviews and meta‐analyses (Blake 2009; Carlson 1991; Craft 2013; Krogh 2011; Lawlor 2001; North 1990; Pinquart 2007; Rethorst 2009; Sjosten 2006; Stathopoulou 2006; Sorensen 2006) have looked at the effect of exercise on depression. However, five of these reviews pooled data from a range of study types that included uncontrolled studies and randomised as well as non‐randomised controlled trials, and combined data from trials that compared exercise with no treatment with data from trials that compared exercise with other forms of treatment (Blake 2009; Carlson 1991; Craft 2013; North 1990; Pinquart 2007). Two included trials predominantly of older people (Blake 2009; Sjosten 2006). One meta‐analysis (Stathopoulou 2006) included only publications from peer‐reviewed journals even though it is widely acknowledged that positive trials are more likely to be published than negative or inconclusive trials. The Cochrane Handbook for Systematic Reviews of Interventions recommends comprehensive searching for all trials, including unpublished ones, to avoid bias (Handbook 2011). Two meta‐analyses which included assessments of study quality both cautiously concluded that exercise may be effective, but recommended that further well‐designed trials are required (Lawlor 2001; Sjosten 2006). One meta‐analysis (Rethorst 2009) concluded that exercise is effective as a treatment for depression, but suggested that further conclusive results are necessary for exercise to become a recommended form of treatment. When only studies recruiting participants from a clinical setting were included (i.e. those diagnosed by a health professional as having depression), there was no evidence that exercise was of benefit (Krogh 2011). Another review of walking for depression suggested that walking might be a useful adjunct for depression treatment, and recommended further trials (Robertson 2012).

This review was published in 2001, in the British Medical Journal (Lawlor 2001). It was converted into a Cochrane review in 2009 (Mead 2009), and updated in 2012 (Rimer 2012). Since our last update, we had become aware of new trials that needed to be considered for inclusion, some of which had received considerable press coverage. Furthermore, several suggestions were made by the Cochrane Depression, Anxiety and Neurosis Review Group (CCDAN) editorial team about how to improve the review, e.g. inclusion of new subgroup analyses and summary of findings tables. The aim of this review is therefore to update the evidence in this area and to improve the methodology since the previous version (Rimer 2012). These changes are described below in Differences between protocol and review.

Objectives

  1. To determine the effectiveness of exercise compared with no treatment (no intervention or control) for depression in adults.

  2. To determine the effectiveness of exercise compared with other interventions (psychological therapies, alternative interventions such as light therapy, pharmacological treatment) for depression in adults.

Methods

Criteria for considering studies for this review

Types of studies

Randomised controlled trials (RCTs) (including parallel, cluster, or individual, or the first phase of cross‐over trials).

We defined a trial as a 'randomised controlled trial' if the allocation of participants to intervention and comparison groups was described as randomised (including terms such as 'randomly', 'random' and 'randomisation').

Types of participants

Adult men and women aged 18 and over (with no upper age limit) in any setting, including inpatients.

Studies were included if the participants were defined by the author of the trial as having depression (by any method of diagnosis and with any severity of depression). We excluded trials that randomised people both with and without depression, even if results from the subgroups of participants with depression were reported separately, as we had done in previous versions of the review (Mead 2009; Rimer 2012).

The effects of exercise on depressive symptoms in participants with emotional distress (but not fulfilling a diagnosis of depression) or those who are healthy were not included in this review. We acknowledge that it can sometimes be difficult to distinguish between depression and dysthymia in the 'real world', as there needs to be only two weeks of decreased interest and enjoyment to define depression. However, we were primarily interested in the role of exercise in people with depression, for whom there is substantial morbidity, rather than people with mild, transient episodes of low mood.

Studies that investigated the effect of exercise on anxiety and neurotic disorders, dysthymia (i.e. low mood not fulfilling diagnostic criteria for depression) or postnatal depression were not included in the review.

Types of interventions

Exercise was defined as "planned, structured and repetitive bodily movement done to improve or maintain one or more components of physical fitness" (ACSM 2001). The reviews in 2001 (Lawlor 2001) and 2009 (Mead 2009) included any trial where the intervention was defined by the authors as exercise, irrespective of whether it fulfilled this standard definition. For subsequent updates, we agreed with the CCDAN Review Group editorial team that we would use the widely accepted and standardised definition instead (ACSM 2001). This meant that we excluded two trials (Chou 2004; Tsang 2006) that we had included in a previous review, and we also excluded studies that provided advice on how to increase physical activity but did not fulfil the ACSM definition of exercise. We note however that studies are included irrespective of whether fitness gains were reported or not, and if they were reported, irrespective of whether fitness gains were achieved.

Experimental intervention

  • Any type of exercise (as defined above). We excluded studies that measured outcomes immediately before and after a single exercise session, and trials which provided less than a week of exercise.

Comparator intervention

  • A 'control' intervention. This included studies in which exercise was compared to no intervention or to a 'waiting list control'; those in which it was compared to an intervention which the authors defined as a placebo; and those in which exercise was used as an adjunct to an established treatment which was received (in an identical way) by participants in both the exercising and non‐exercising group, e.g. exercise plus cognitive behavioural therapy (CBT) versus CBT alone.

  • Other type of active treatments, where the aim of the treatment was to improve mood. This includes pharmacological treatments, psychological therapies, or other alternative treatments.

Note that this strategy was the same as that included in the original review (Lawlor 2001).

We excluded studies comparing two different types of exercise with no non‐exercising comparison group.

We excluded trials described by the authors as 'combination treatments', where exercise was one component of the 'combination', because we could not disentangle the effect of exercise from the effect of the other components of the intervention.

Types of outcome measures

Primary outcomes

1. Our primary outcome was a measure of depression or mood at the outcome assessment, either as a continuous measure or as a dichotomous outcome.

Continuous measures of depression were reported using a variety of depression scales, the most common of which were the Beck Depression Inventory (Beck 1961) and the Hamilton Rating Scale for Depression (Hamilton 1960).

In previous versions of the review, where trials used a number of different tools to assess depression, we included the main outcome measure only in the meta‐analysis. The main outcome measure was defined using a hierarchy of criteria as follows: identified by the trial authors as the main outcome measure, outcome reported in the abstract, first outcome reported in the Results section.

Where trials used dichotomous data as primary outcomes, and also provided data on continuous outcome measures, we used the data provided in the trial reports for the continuous outcome measure in our meta‐analysis. This was because we knew from previous updates that trials generally reported only continuous outcomes.

Secondary outcomes

2. Acceptability of treatment, assessed by a) attendance at exercise interventions, and b) the number of participants completing the interventions;
3. Quality of life;
4. Cost;
5. Adverse events, e.g. musculoskeletal pain, fatigue.

In order to better understand the generalisability of exercise for depression, we also extracted data on the number of people screened for inclusion and the number recruited (Table 1).

Table 1. Number screened; number still in trial and exercise intervention at end of trial

| Trial ID | Screened | Randomised | Allocated exercise | Completed trial | Completed comparator group, e.g. control, other treatment (as a proportion of those allocated) | Completed exercise (as a proportion of those allocated) |
| --- | --- | --- | --- | --- | --- | --- |
| Blumenthal 1999 | 604 underwent telephone screening | 156 | 55 | 133 | 41/48 (medication) | 44/55 (exercise plus medication); 39/53 (exercise alone) |
| Blumenthal 2007 | 457 | 202 | 51 (supervised), 53 home‐based | 183 | 42/49 (placebo); 45/49 (sertraline) | 45/51 (supervised), 51/53 home‐based |
| Blumenthal 2012b | 1680 enquired about the study | 101 | 37 | 95 | 23/24 completed 'placebo' and 36/40 completed the medication | 36/37 completed the exercise |
| Brenes 2007 | Not reported | 37 | 14 | Not reported | Not reported | Not reported |
| Bonnet 2005 | Not reported | 11 | 5 | 7 | 4/6 | 3/5 |
| Chu 2008 | 104 responded to adverts | 54 | 36 | 38 | 12/18 | 26/36 (both exercise arms combined); 15/18 in the high‐intensity arm |
| Dunn 2005 | 1664 assessed for eligibility | 80 | 17 | 45 | 9/13 | 11/17 (public health dose 3 times per week) |
| Doyne 1987 | 285 responded to adverts | 57 | Not reported | 40 completed treatment or control | 27 (denominator not known) | 13 (denominator not known) |
| Epstein 1986 | 250 telephone inquiries received | 33 | 7 | Not reported | Not reported | 7 |
| Fetsch 1979 | Not reported | 21 | 10 | 16 | 8/11 | 8/10 |
| Foley 2008 | 215 responded to adverts | 23 | 10 | 13 | 5/13 | 8/10 |
| Fremont 1987 | 72 initially expressed an interest | 61 | 21 | 49 | 31/40 | 18/21 |
| Gary 2010 | 982 referred, 242 had heart failure, 137 had a BDI > 10 and 74 eligible and consented | 74 | 20 | 68/74 completed post‐intervention assessments and 62 completed follow‐up assessments | usual care 15/17 | exercise only: 20/20 |
| Greist 1979 | Not reported | 28 | 10 | 22 | 15/18 | 8/10 |
| Hemat‐Far 2012 | 350 screened | 20 | 10 | 20 | not stated | not stated |
| Hess‐Homeier 1981 | Not reported | 17 | 5 | Not reported | Not reported | Not reported |
| Hoffman 2010 | 253 screened, 58 ineligible | 84 | 42 | 76 | 39/42 (2 were excluded by the trialists and 1 did not attend follow‐up) | 37/42 of exercise group provided data for analysis |
| Klein 1985 | 209 responded to an advertisement | 74 | 27 | 42 | 11/23 (meditation); 16/24 (group therapy) | 15/27 |
| Knubben 2007 | Not reported | 39 (note data on only 38 reported) | 20 | 35 | 16/18 | 19/20 |
| Krogh 2009 | 390 referred | 165 | 110 | 137 | 42/55 | 95/110 (both exercise arms combined); 47/55 (strength); 48/55 (aerobic) |
| Martinsen 1985 | Not reported | 43 | 24 | 37 | 17/19 | 20/24 |
| Mather 2002 | 1185 referred or screened | 86 | 43 | 86 | 42/43 | 43/43 |
| McCann 1984 | 250 completed BDI, 60 contacted | 47 | 16 | 43 | 14/15 completed placebo; 14/16 completed 'no treatment' | 15/16 |
| McNeil 1991 | 82 | 30 | 10 | 30 | 10/10 (waiting list); 10/10 (social contact) | 10/10 |
| Mota‐Pereira 2011 | 150 | 33 | 22 | 29/33 | 10/11 | 19/22 |
| Mutrie 1988 | 36 | 24 | 9 | 24 | 7/7 | 9/9 |
| Nabkasorn 2005 | 266 volunteers screened | 59 | 28 | 49 | 28/31 | 21/28 |
| Orth 1979 | 17 | 11 | 3 | 7 | 2/2 | 3/3 |
| Pilu 2007 | Not reported | 30 | 10 | 30 | 20/20 | 10/10 |
| Pinchasov 2000 | Not reported | 18 | 9 | Not reported | Not reported | Not reported |
| Reuter 1984 | Not reported | Not reported | 9 | Not reported | Not reported | 9 |
| Schuch 2011 | 14/40 invited patients were not interested in participating | 26 | 15 | "no patient withdrew from intervention" | "no patient withdrew from intervention" | "no patient withdrew from intervention" |
| Setaro 1985 | 211 responses to advertisement | 180 | 30 | 150 | Not reported | 25/30 |
| Shahidi 2011 | 70 older depressed women chosen from 500 members of a district using the geriatric depression scale | 70 | 23 | 60/70 | 20/24 | 20/23 |
| Sims 2009 | 1550 invitations, 233 responded | 45 | 23 | 43 | 22/22 | 21/23 |
| Singh 1997 | Letters sent to 2953 people, 884 replied | 32 | 17 | 32 | 15/15 | 17/17 |
| Singh 2005 | 451 | 60 | 20 | 54 | 19/20 (GP standard care) | 18/20 (high‐intensity training) |
| Veale 1992 | Not reported | 83 | 48 | 57 | 29/35 | 36/48 |
| Williams 2008 | 96 in parent study | 43 | 33 | 34 | 8/10 | 26/33 (both exercise groups combined); 15/16 exercise; 11/17 walking |

BDI: Beck Depression Inventory

Timing of outcome assessment

We extracted data at the end of treatment, and also at the end of any longer‐term follow‐up after the intervention had been stopped.

Search methods for identification of studies

Electronic searches

We carried out the following electronic searches (Appendix 1; Appendix 2):

  • The Cochrane Depression, Anxiety and Neurosis Review Group's Specialised Register (CCDANCTR) (all years to 1 March 2013);

  • The Cochrane Central Register of Controlled Trials (CENTRAL) (all years to 2010);

  • MEDLINE (1950 to February 2010);

  • EMBASE (1980 to February 2010);

  • PsycINFO (all years to February 2010);

  • Sports Discus (1975 to 2007).

We searched Current Controlled Trials (May 2008, November 2010 and March 2013) to identify any ongoing trials. We performed an electronic search of ClinicalTrials.gov and WHO International Clinical Trials Registry Platform (ICTRP) in March 2013.

Because the searches for the CCDANCTR register are up‐to‐date and comprehensive, we were advised by the CCDAN editorial team that it was not necessary to search the other databases.

For a previous version of this review (Mead 2009) we conducted a cited reference search in the Web of Science using the references to all included studies, excluded studies and studies awaiting assessment. This cited reference search was not repeated for subsequent updates.

In order to ensure that the review was as up‐to‐date as possible when it was submitted for editorial review, we searched the CCDANCTR (up to 1st March 2013) again on 2nd May 2013, so that we could list potentially eligible studies as ‘Studies awaiting classification’.

Searching other resources

For the initial review (Lawlor 2001), the following journals were searched: BMJ, JAMA, Archives of Internal Medicine, New England Journal of Medicine, Journal of the Royal Society of Medicine, Comprehensive Psychiatry, British Journal of Psychiatry, Acta Psychiatrica Scandinavica and British Journal of Sports Medicine.

For the update in 2009 (Mead 2009), we contacted experts, including authors of all included studies and those with at least two publications amongst the excluded studies, to identify any additional unpublished or ongoing studies, authors of significant papers and other experts in the field to ensure identification of all randomised controlled trials (published, unpublished or ongoing).

Due to limitations in available resources for this current update, we did not repeat these handsearches. We limited our contact with authors to those whose trials had been 'ongoing' in the previous version, to enquire whether they had subsequently been published. We also contacted authors to obtain any missing information about trial details. We had planned to do this should any data be missing e.g. standard deviations, although this was not necessary.

We screened the bibliographies of all included articles for additional references.

Data collection and analysis

Selection of studies

Two review authors (GC and GM) independently screened the citations from the searches, and decided which full texts should be retrieved. They then independently applied inclusion and exclusion criteria, resolving any differences in opinion through discussion. If they could not reach agreement, a third author was available (CG) to decide whether a study should be included or excluded.

For the searches of the CCDANCTR up to 1st March 2013, the Trials Search Co‐ordinator checked abstracts, excluded obviously irrelevant ones, and then sent a list of the remaining citations to GM for scrutiny, to be included as 'studies awaiting classification'. 

We created a PRISMA flow diagram to detail the study selection process.

Data extraction and management

We extracted data, when available, at the end of treatment and at the end of follow‐up.

For this update, two review authors (GC, FW) independently extracted data for our primary and secondary outcomes for each new trial identified. A third review author (GM) extracted data on type of exercise from all the included trials, to enable a fourth author (CG) to categorise intensity of exercise according to ACSM criteria.

Data extracted were participants, interventions, outcome measures, results, the number of people screened, the number randomised, the number allocated to exercise, the number who dropped out of the exercise arm (Table 1), secondary clinical outcomes, cost and adverse events, and main conclusions. All the review authors used the same structured paper extraction form that had been piloted on two studies. We resolved any discrepancies by referring to the original papers and by discussion.

Main comparisons

We undertook the following analyses.

  1. Exercise versus 'control' (as defined above).

  2. Exercise versus psychological therapies.

  3. Exercise versus alternative treatments.

  4. Exercise versus pharmacological treatments.

Assessment of risk of bias in included studies

The Cochrane Collaboration 'Risk of bias' tool was used to assess risks of bias, according to Chapter 8 of the Cochrane Handbook for Systematic Reviews of Interventions (Handbook 2011). Two review authors independently extracted data on random sequence generation, allocation concealment, blinding of participants, blinding of those delivering the intervention, blinding of outcome assessors, incomplete outcome data, selective reporting and other potential biases. Each of these domains was categorised as being at high risk of bias, unclear risk of bias or low risk of bias. We resolved any disagreements through discussion.

For concealment of allocation we distinguished between trials that were adequately concealed (central randomisation at a site remote from the study; computerised allocation in which records are in a locked, unreadable file that can be accessed only after entering participant details; the drawing of opaque envelopes), inadequately concealed (open lists or tables of random numbers; open computer systems; drawing of non‐opaque envelopes) and unclear (no information in report, and the authors either did not respond to requests for information or were unable to provide information).

Trials could only be defined as 'intention‐to‐treat' if participants were analysed according to the allocated treatment AND if all participants either completed allocated treatments or if missing outcome data were replaced using a recognised statistical method, e.g. last observation carried forward (LOCF).

For blinding we distinguished between trials in which the main outcome was measured by an assessor who was blind to treatment allocation (blind) and those in which the main outcome was measured either by the participants themselves (i.e. self report) or by a non‐blinded assessor (not blind).

Measures of treatment effect

We undertook a narrative review of all studies and a meta‐analysis of those studies with appropriate data. Where trials used a number of different tools to assess depression we included the main outcome measure only in the meta‐analysis. The main outcome measure was defined using a hierarchy of criteria as follows: identified by the authors as the main outcome measure; outcome reported in the abstract; first outcome reported in the Results section.

For continuous data where different scales were used, the standardised mean difference (SMD) was calculated and reported with a 95% confidence interval (CI). For dichotomous data the risk ratio was calculated and reported with a 95% CI.

We interpreted the SMDs using the following 'rule of thumb': 0.2 represents a small effect, 0.5 a moderate effect and 0.8 a large effect (Schünemann 2008).
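
As a minimal sketch of the calculation described above, the following shows how a standardised mean difference with the Hedges' g small‐sample correction could be computed for one trial from group means, standard deviations and sample sizes. The function names, the variance approximation and the use of the rule‐of‐thumb values as lower cut‐offs (including the 'negligible' label) are illustrative assumptions; review software may use slightly different formulae.

```python
import math


def hedges_g(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Return (g, se) for an exercise arm versus a comparator arm.

    Assumes a continuous depression score reported as mean and SD per arm.
    """
    df = n_e + n_c - 2
    # Pooled standard deviation across the two arms
    pooled_sd = math.sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_e - mean_c) / pooled_sd
    # Small-sample correction factor (common approximation)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate variance of d, then scaled by j squared for g
    var_d = (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))
    se = math.sqrt(j ** 2 * var_d)
    return g, se


def describe_smd(smd):
    """Label an SMD with the rule of thumb (cut-offs are an assumption)."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical trial: lower scores mean less depression
g, se = hedges_g(mean_e=10, sd_e=5, n_e=20, mean_c=14, sd_c=5, n_c=20)
print(round(g, 3), round(se, 3), describe_smd(g))
```

A negative value here favours exercise only because lower scores indicate less depression on the hypothetical scale; the sign convention depends on the scale used.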

We pooled long‐term follow‐up data from those trials that reassessed participants long after the interventions had been completed. 'Long after' could mean an assessment at any period of time after the intervention had been completed.

Unit of analysis issues

Studies with multiple treatment groups

Where trials included a control arm, an exercise arm and an 'established treatment' arm (e.g. CBT, antidepressants), we extracted data on control versus exercise, and exercise versus established treatment. This meant that data from the exercise arm were included in two separate comparisons, in separate univariate analyses.

Where trials compared an established treatment (e.g. CBT, antidepressants) versus exercise versus both the established treatment and exercise, we made two comparisons: (i) established treatment plus exercise versus established treatment alone, and included this in the meta‐analysis of treatment versus control; (ii) exercise versus established treatment (e.g. CBT, antidepressants). This means that data from the 'established treatment alone' arm were used in two separate comparisons.

In the review versions in 2001 (Lawlor 2001), 2009 (Mead 2009) and 2012 (Rimer 2012), for trials which included more than one intensity of exercise, we used the exercise arm with the greatest clinical effect in the review. Similarly, when trials provided more than one type of exercise, we used the type of exercise with the greatest clinical effect. However, because this may overestimate the effect of exercise, in this update we used the exercise arm which provided the biggest 'dose' of exercise, and performed a sensitivity analysis to explore the effect of using the smallest 'dose'.

Cross‐over trials

For cross‐over trials, we intended to use the first phase of the trial only due to the potential 'carry‐over' effect of exercise. To date, we have not included any cross‐over trials.

Cluster‐randomised trials

If cluster‐randomised trials were identified and incorrectly analysed using individuals as the unit of analysis, we intended to make corrections using the intracluster correlation coefficient (ICC). If this had not been available, we would have imputed the ICC from similar studies. In fact, we did not find any cluster‐RCTs to include.
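
To illustrate the kind of correction intended here, the sketch below applies the standard design effect, 1 + (m − 1) × ICC, where m is the average cluster size, to shrink a trial's sample size to its 'effective' size. No cluster trial was included in this review, so the numbers and function names are purely hypothetical.

```python
def design_effect(mean_cluster_size, icc):
    """Design effect for a cluster-randomised trial: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


# Hypothetical trial: 200 participants in clusters of 21, ICC of 0.05
print(design_effect(21, 0.05))
print(effective_sample_size(200, 21, 0.05))
```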

Dealing with missing data

For two previous versions of this review (Lawlor 2001; Mead 2009) we found current contact details of all authors through correspondence addresses in study reports and by searching websites. We contacted all authors by email or post (sending three reminders to non‐responders) to establish missing details in the methods and results sections of the written reports and to determine authors' knowledge of, or involvement in, any current work in the area. For the previous update (Rimer 2012) and this current update, we contacted authors only if there were missing data items, or if we needed more detail to decide on whether or not to include the study.

Some trials, in which participants dropped out, reported data from only the remaining participants, so we used these data in our meta‐analyses. For trials which attempted to impute data from missing participants (e.g. LOCF for continuous data) we used the imputed values and categorised the trial as 'intention‐to‐treat.' When we could not obtain information either from the publication or from the authors, we classified the trial as 'not intention‐to‐treat', and used the data from the available cases in the meta‐analysis.
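
For readers unfamiliar with last observation carried forward, the following minimal sketch shows the idea for one participant's depression scores across visits, with `None` marking a missed assessment. It is an illustration of the general technique only; the included trials may have implemented LOCF differently, and the function name and data are hypothetical.

```python
def locf(scores):
    """Fill missing visits (None) with the most recent observed score.

    Visits before the first observation stay None, as there is nothing
    to carry forward.
    """
    filled = []
    last_seen = None
    for score in scores:
        if score is not None:
            last_seen = score
        filled.append(last_seen)
    return filled


# Hypothetical participant who dropped out after the second visit
print(locf([24, 20, None, None]))
```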

Assessment of heterogeneity

We used the Chi² test, together with the I² statistic, to assess heterogeneity.

A P value of 0.1 or less indicates significant heterogeneity when considering Chi². The ranges for I² are:

  • 0% to 40%: might not be important;

  • 30% to 60%: may represent moderate heterogeneity;

  • 50% to 90%: may represent substantial heterogeneity;

  • 75% to 100%: considerable heterogeneity.

Note that the importance of the observed value of I² depends on (i) the magnitude and direction of effects and (ii) the strength of evidence for heterogeneity (e.g. P value from the Chi² test, or a confidence interval for I²) (Handbook 2011).
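
As a rough illustration of how these statistics relate, the sketch below computes Cochran's Q (the Chi² heterogeneity statistic) and I² from per‐trial effect sizes and their variances. The function names and example values are illustrative assumptions, not data from the included trials; the P value for Q is omitted because it requires a chi‐squared distribution function from a statistics library.

```python
def cochran_q(effects, variances):
    """Cochran's Q using inverse-variance (fixed-effect) weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, n_trials):
    """I² as a percentage: (Q - df) / Q, truncated at zero."""
    df = n_trials - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical SMDs and variances from two trials
effects = [0.0, 2.0]
variances = [0.5, 0.5]
q = cochran_q(effects, variances)
print(q, i_squared(q, len(effects)))
```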

Assessment of reporting biases

We used a funnel plot to explore reporting biases when 10 or more studies were included in the meta‐analysis. However, asymmetrical funnel plots can also arise for other reasons, such as heterogeneity and small‐study effects.

Data synthesis

We used a random‐effects model based on DerSimonian and Laird's method to calculate the pooled effect size (DerSimonian 1986). We synthesised data from trials where outcome data were collected as soon as the intervention ended, and performed a separate synthesis of data collected weeks or months after the intervention ended, to explore whether any benefits were retained after the intervention had been completed. When performing meta‐analyses of complex interventions, decisions need to be made about whether the interventions are sufficiently similar to be combined into a meta‐analysis. We included trials that fulfilled the ACSM definition of exercise (ACSM 2001), and combined these data in a meta‐analysis.
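
The following is a minimal sketch of a DerSimonian and Laird random‐effects pooling step, assuming each trial contributes an effect size (e.g. an SMD) and its variance. It is intended only to make the method concrete; the function name, return values and example data are assumptions, and it does not reproduce the review's actual software or results.

```python
import math


def dersimonian_laird(effects, variances):
    """Return (pooled_effect, standard_error, tau_squared)."""
    k = len(effects)
    w = [1 / v for v in variances]
    sum_w = sum(w)
    # Fixed-effect estimate, needed for the Q statistic
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Method-of-moments estimate of between-trial variance, truncated at 0
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each trial's variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


# Hypothetical effect sizes and variances from two trials
pooled, se, tau2 = dersimonian_laird([0.0, 2.0], [0.5, 0.5])
ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # approximate 95% CI
print(pooled, se, tau2, ci)
```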

We created 'Summary of findings' tables for outcomes and graded them accordingly using the GRADE approach (gradepro.org/aboutus.html).

Subgroup analysis and investigation of heterogeneity

  1. We explored the effect of different types of exercise (aerobic, resistance exercise or mixed aerobic and resistance) for those trials comparing exercise versus control on outcome, by performing subgroup analyses for the different types of exercise.

  2. This update has for the first time explored the impact of intensity of exercise on outcome, dividing intensity into hard/vigorous or moderate, using ACSM criteria (ACSM 1998). The combination (where possible) of authors' description, compendium of physical activities classification and the ACSM intensity/metabolic equations (MET) cut‐offs (in particular the most recent ones which take age into account) were used to categorise intensity (https://sites.google.com/site/compendiumofphysicalactivities/Activity).

  3. This update has for the first time explored the effect of the number of exercise sessions, by extracting data on the length of the exercise programme and the frequency of exercise sessions. We categorised studies by the total number of sessions and then grouped the total number as 0 ‐ 12, 13 ‐ 24, 25 ‐ 36, at least 37.

  4. This update has for the first time explored how the diagnosis of depression at baseline (using a cut‐point on a scale, or by psychiatric interview) influenced the effect of exercise on mood at the end of treatment.
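
The session‐count grouping in item 3 can be sketched as follows, assuming the total number of sessions is taken as programme length in weeks multiplied by sessions per week. The function names are hypothetical, and how partially reported programmes were handled is not specified here.

```python
def total_sessions(weeks, sessions_per_week):
    """Total planned sessions over the exercise programme."""
    return weeks * sessions_per_week


def session_band(total):
    """Map a total session count to the subgroup bands used in item 3."""
    if total < 0:
        raise ValueError("total sessions cannot be negative")
    if total <= 12:
        return "0-12"
    if total <= 24:
        return "13-24"
    if total <= 36:
        return "25-36"
    return "at least 37"


# Hypothetical programme: 12 weeks, 3 sessions per week
print(session_band(total_sessions(12, 3)))
```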

Sensitivity analysis

We undertook sensitivity analyses to explore how much of the variation between studies comparing exercise to no exercise is explained by between‐study differences in:

  1. publication type (peer‐reviewed journal, conference abstract/proceedings, doctoral dissertation).

  2. allocation concealment.

  3. intention‐to‐treat analysis (as defined above).

  4. blinding.

We included only trials at low risk of bias for each of these criteria in the sensitivity analysis. We then performed a sensitivity analysis, as we had done previously, including trials that were at low risk of bias for three key quality criteria: allocation concealment AND intention‐to‐treat AND blinding.

We also performed a sensitivity analysis using those trials that had several arms, for which we had included the arm with the biggest 'dose' of exercise in the initial analysis. Here we include the arm with the smallest 'dose'.

Results

Description of studies

Results of the search

The results of searches for the previous updates have already been described in detail (Lawlor 2001; Mead 2009; Rimer 2012).

The 2012 review update (Rimer 2012) included studies identified from searches performed in 2010 and 2011. In 2010 (Rimer 2012), we had identified three ongoing trials (Blumenthal 2012a; McClure 2008; Underwood 2013). Of these, one has been included (Blumenthal 2012a) in this update. One of these was excluded because the intervention was not exercise alone (McClure 2008). The other was excluded because participants did not have to have depression to enter the trial (Underwood 2013); although the trialists reported results from the subgroup with depression at entry, we had previously excluded trials reporting our main outcomes as subgroup analyses. In June 2011, our search of the Cochrane Depression, Anxiety and Neurosis Group Clinical Trials Register (CCDANCTR) identified 45 citations; of which we retrieved full texts for 10 studies. Of these 10 full‐text studies, we excluded five (Lolak 2008; Mailey 2010; Oeland 2010; Sneider 2008; Thomson 2010) and five studies (Annesi 2010; Ciocon 2003; Gary 2010; Shahidi 2011; Chalder 2012) were listed as 'awaiting classification' (Rimer 2012); in this current update, two of these have been included (Gary 2010; Shahidi 2011), one has been excluded because further scrutiny led us to conclude that the intervention did not fulfil the definition of 'exercise' (Ciocon 2003), one was excluded because it was a trial of advice to increase physical activity that did not fulfil the ACSM definition of exercise (Chalder 2012) and one was excluded because it was a subgroup analysis from a trial of people with obesity (Annesi 2010).

In September 2012, the searches of the CCDANCTR identified a further 290 citations. Of these, we retrieved full texts for 43 studies: 39 were excluded (Akandere 2011; Arcos‐Carmona 2011; Attia 2012; Aylin 2009; Bowden 2012; Chalder 2012; Chan 2011; Chow 2012; Christensen 2012; Clegg 2011; Demiralp 2011; Deslandes 2010; Gutierrez 2012; Hedayati 2012; Immink 2011; Jacobsen 2012; Johansson 2011; Lavretsky 2011; Leibold 2010; Levendoglu 2004; Levinger 2011; Littbrand 2011; Matthews 2011; Midtgaard 2011; Mudge 2008; O'Neil 2011; Ouzouni 2009; Penttinen 2011; Perna 2010; Piette 2011; Robledo Colonia 2012; Roshan 2011; Ruunsunen 2012; Schwarz 2012; Silveira 2010; Songoygard 2012; Trivedi 2011; Whitham 2011; Wipfli 2011); three were included (Hemat‐Far 2012; Mota‐Pereira 2011; Schuch 2011) and one trial is not yet complete (EFFORT D).

In March 2013, a search of the WHO Clinical Trials Registry Platform identified 188 citations. We sought full texts for 29; of these 29 studies, 23 are listed as ongoing trials (ACTRN12605000475640; ACTRN12612000094875; ACTRN12612000094875; CTR/2012/09/002985; EFFORT D; IRCT201205159763; IRCT2012061910003N1; ISRCTN05673017; NCT00103415; NCT00643695; NCT00931814; NCT01024790; NCT01383811; NCT01401569; NCT01464463; NCT01573130; NCT01573728; NCT01619930; NCT01696201; NCT01763983; NCT01787201; NCT01805479; UMIN000001488). One trial has been completed and is included (Hoffman 2010). Four trials were excluded (Bromby 2010; Lever‐van Milligen 2012; NCT00964054; NCT00416221) and one awaits assessment (DEMO II 2012).

Through correspondence with the authors of one study (Blumenthal 2012a), another study by the same group was identified (Blumenthal 2012b); however this reported data from a subgroup with depression and was excluded (as we did for previous trials reporting subgroups).

The search of CCDANCTR up to 1st March 2013 identified 151 records (titles and abstracts). The Trials Search Co‐ordinator excluded 89 obviously irrelevant citations. Of the remaining 62 studies, seven were already listed as included, excluded or awaiting assessment; one review author (GM) excluded 46 as obviously irrelevant; and the full texts of nine articles were retrieved. Of these nine, one was a subsidiary publication for an included study (Hoffman 2010), one had already been excluded (Silveira 2010) and the other seven are listed as 'awaiting classification' (Aghakhani 2011; DEMO II 2012; Gotta 2012; Murphy 2012; Pinniger 2012; Sturm 2012; Martiny 2012).

For this current update, we are therefore including seven new studies (Hemat‐Far 2012; Hoffman 2010; Gary 2010; Mota‐Pereira 2011; Shahidi 2011; Schuch 2011; Blumenthal 2012a), making a total of 39 included studies (Characteristics of included studies table). For this update, we have excluded a further 54 studies (Characteristics of excluded studies table), giving a total of 175 excluded, listed 23 as ongoing studies (Characteristics of ongoing studies table), and listed seven as awaiting classification (Characteristics of studies awaiting classification table).

See the PRISMA flow diagram for details of the study selection process for this current update (Figure 1).


Study flow diagram, showing the results of the searches for this current update.


Included studies

In our previous update, we identified 32 completed trials.

For this update, we include seven additional trials, recruiting a total of 408 additional participants at randomisation. Of these, 374 participants remained in the trials by the time of outcome analysis (Blumenthal 2012a; Gary 2010; Hemat‐Far 2012; Hoffman 2010; Mota‐Pereira 2011; Shahidi 2011; Schuch 2011) (see Characteristics of included studies table).

Of the 39 included trials (recruiting 2326 people), 22 were from the USA (Blumenthal 1999; Blumenthal 2007; Blumenthal 2012a; Bonnet 2005; Brenes 2007; Chu 2008; Dunn 2005; Doyne 1987; Epstein 1986; Fetsch 1979; Fremont 1987; Gary 2010; Greist 1979; Hess‐Homeier 1981; Hoffman 2010; Klein 1985; McCann 1984; Orth 1979; Reuter 1984; Setaro 1985; Singh 1997; Williams 2008); one was from Canada (McNeil 1991), three from the UK (Mather 2002; Mutrie 1988; Veale 1992), two from Australia (Sims 2009; Singh 2005), two from Iran (Hemat‐Far 2012; Shahidi 2011), one from New Zealand (Foley 2008), one from Norway (Martinsen 1985), one from Denmark (Krogh 2009), one from Germany (Knubben 2007), one from Italy (Pilu 2007), one from Russia (Pinchasov 2000), one from Brazil (Schuch 2011), one from Portugal (Mota‐Pereira 2011) and one from Thailand (Nabkasorn 2005).

Of these 39 trials, 30 were peer‐reviewed papers (Blumenthal 1999; Blumenthal 2007; Blumenthal 2012a; Brenes 2007; Dunn 2005; Doyne 1987; Foley 2008; Fremont 1987; Gary 2010; Greist 1979; Hemat‐Far 2012; Hoffman 2010; Klein 1985; Krogh 2009; Knubben 2007; Martinsen 1985; Mather 2002; McCann 1984; McNeil 1991; Mota‐Pereira 2011; Nabkasorn 2005; Pilu 2007; Pinchasov 2000; Schuch 2011; Shahidi 2011; Sims 2009; Singh 1997; Singh 2005; Veale 1992; Williams 2008). Seven were doctoral dissertations (Bonnet 2005; Chu 2008; Epstein 1986; Fetsch 1979; Hess‐Homeier 1981; Orth 1979; Setaro 1985) and two were published in abstract form only (Mutrie 1988; Reuter 1984).

Of these 39 trials, data from two studies were unsuitable for statistical pooling because they were provided in graphical form only (McCann 1984) or provided no numerical data at all (Greist 1979). One trial (Nabkasorn 2005) provided data in graphical form only which we were able to include after manually converting the graph into mean and standard deviation (SD) values by drawing a horizontal line from the mean and SD on the graph to the vertical axis. Hence, we used data from 37 trials in the meta‐analyses.

Five trials (Blumenthal 1999; Blumenthal 2007; Blumenthal 2012a; Krogh 2009; Mather 2002) provided data on whether participants fulfilled diagnostic criteria for depression at the end of the study, as well as depression scales. We used the scale results described in the paper rather than using formulae to convert the dichotomous outcomes to continuous outcomes, to allow inclusion of these trials in the meta‐analysis.

Five authors provided further data on their studies (Blumenthal 2012a; Gary 2010; Hoffman 2010; Mota‐Pereira 2011; Sims 2009).

Design

All included studies were randomised controlled trials (RCTs); further details are provided in the Characteristics of included studies table. There were no cluster‐RCTs that fulfilled our inclusion criteria.

Seventeen studies had two arms (Bonnet 2005; Fetsch 1979; Foley 2008; Knubben 2007; Hemat‐Far 2012; Hoffman 2010; Martinsen 1985; Mather 2002; Mota‐Pereira 2011; Nabkasorn 2005; Pilu 2007; Pinchasov 2000; Reuter 1984; Schuch 2011; Sims 2009; Singh 1997; Veale 1992), 17 had three arms (Blumenthal 1999; Blumenthal 2012a; Brenes 2007; Chu 2008; Doyne 1987; Epstein 1986; Fremont 1987; Greist 1979; Hess‐Homeier 1981; Klein 1985; Krogh 2009; McCann 1984; McNeil 1991; Mutrie 1988; Shahidi 2011; Singh 2005; Williams 2008), three had four arms (Blumenthal 2007; Gary 2010; Orth 1979), one had five arms (four intensities of exercise and control; Dunn 2005) and one had six arms (cognitive behavioural therapy (CBT) plus aerobic exercise, aerobic exercise only, CBT only, CBT plus non‐aerobic exercise, non‐aerobic exercise only or no intervention; Setaro 1985).

Of the 17 trials with two arms, exercise was compared with waiting list or usual care in eight trials (Hemat‐Far 2012; Hoffman 2010; Mota‐Pereira 2011; Nabkasorn 2005; Pilu 2007; Schuch 2011; Sims 2009; Veale 1992), exercise was compared with a placebo intervention (e.g. social activity) in four trials (Knubben 2007; Martinsen 1985; Mather 2002; Singh 1997), exercise was compared with CBT in one trial (Fetsch 1979), two trials compared CBT plus exercise versus CBT alone (Bonnet 2005; Reuter 1984), one trial compared exercise with stretching (Foley 2008) and one trial compared exercise with bright light therapy (Pinchasov 2000).

Of the 17 trials with three arms, one trial compared exercise versus exercise plus sertraline versus sertraline (Blumenthal 1999), one compared exercise versus sertraline versus usual care (Brenes 2007), one compared exercise versus antidepressant (sertraline) versus placebo (Blumenthal 2012a), one compared exercise versus walking versus social conversation (Williams 2008) and three compared exercise versus waiting list versus a placebo intervention (e.g. social activity) (McCann 1984; McNeil 1991; Mutrie 1988). Two compared exercise versus usual care versus CBT (Epstein 1986; Hess‐Homeier 1981), one compared exercise versus CBT versus both exercise and CBT (Fremont 1987), one compared exercise versus low‐intensity CBT versus high‐intensity CBT (Greist 1979), and one compared exercise versus a placebo versus CBT (Klein 1985). One trial compared high‐intensity versus low‐intensity aerobic exercise versus stretching (Chu 2008), one compared strength versus aerobic versus relaxation training (Krogh 2009), one compared high‐intensity resistance training versus low‐intensity resistance training versus usual care (Singh 2005), one compared exercise versus yoga versus control (Shahidi 2011) and one compared running versus weight‐lifting versus waiting list (Doyne 1987).

Of the three trials with four arms, one compared exercise to three types of control (Orth 1979), one compared home‐based exercise versus supervised exercise versus sertraline versus placebo (Blumenthal 2007), and one compared exercise versus combined exercise and CBT versus CBT alone versus usual care (Gary 2010).

Participants

Twenty‐three trials recruited participants from non‐clinical populations (Blumenthal 1999; Blumenthal 2007; Bonnet 2005; Brenes 2007; Dunn 2005; Doyne 1987; Epstein 1986; Fetsch 1979; Fremont 1987; Greist 1979; Hemat‐Far 2012; Hess‐Homeier 1981; Klein 1985; McCann 1984; McNeil 1991; Nabkasorn 2005; Orth 1979; Pinchasov 2000; Setaro 1985; Shahidi 2011; Singh 1997; Singh 2005; Williams 2008) with most involving recruitment of participants through the media.

Nine trials recruited participants from clinical populations, i.e. hospital inpatients or outpatients (Gary 2010; Knubben 2007; Martinsen 1985; Mota‐Pereira 2011; Mutrie 1988; Pilu 2007; Reuter 1984; Schuch 2011; Veale 1992).

Seven trials recruited participants from both clinical and non‐clinical populations (Blumenthal 2012a; Chu 2008; Foley 2008; Hoffman 2010; Krogh 2009; Mather 2002; Sims 2009).

Of the 23 trials recruiting people from non‐clinical populations, diagnosis of depression was by a clinical interview in ten studies (Blumenthal 1999; Blumenthal 2007; Bonnet 2005; Doyne 1987; Dunn 2005; Hemat‐Far 2012; Klein 1985; Pinchasov 2000; Singh 1997; Singh 2005). The other 13 studies used a cut‐off point on one of several depression scales: Beck Depression Inventory (Epstein 1986; Fremont 1987; Fetsch 1979; Hess‐Homeier 1981; McCann 1984; McNeil 1991); Centre for Epidemiologic Studies Depression Scale (Nabkasorn 2005); Cornell Scale for Depression in Dementia (Williams 2008); Depression Adjective Checklist (Orth 1979); Minnesota Multiple Personality Inventory (Setaro 1985); Patient Health Questionnaire‐9 (Brenes 2007); Geriatric Depression Scale (Shahidi 2011); or Symptom Checklist Score (Greist 1979).

There were more women than men (see Characteristics of included studies table) and mean age ranged from 22 years (Orth 1979) to 87.9 years (Williams 2008).

Interventions

Thirty‐three trials provided aerobic exercise, of which 16 trials provided running (Blumenthal 1999; Blumenthal 2012a; Doyne 1987; Epstein 1986; Fetsch 1979; Fremont 1987; Greist 1979; Hess‐Homeier 1981; Hemat‐Far 2012; Klein 1985; McCann 1984; Nabkasorn 2005; Orth 1979; Reuter 1984; Shahidi 2011; Veale 1992), three provided treadmill walking (Blumenthal 2007; Bonnet 2005; Dunn 2005), four provided walking (Gary 2010; Knubben 2007; McNeil 1991; Mota‐Pereira 2011), one provided aerobic training with an instructor (Martinsen 1985), one provided aerobic dance (Setaro 1985) and one provided cycling on a stationary bicycle (Pinchasov 2000).

Three studies provided aerobic exercises according to preference (Chu 2008; Hoffman 2010; Schuch 2011) and another provided mixed aerobic and resistance training (Brenes 2007). One study did not specify the type of aerobic exercise provided (Foley 2008).

Two trials compared two different exercise interventions versus control: Krogh 2009 compared resistance training with combination aerobic exercises (including cycling, running, stepping and rowing) and Williams 2008 compared a combination of walking and strength training with walking alone.

Two trials provided mixed exercise, i.e. endurance, muscle strengthening and stretching (Mather 2002; Mutrie 1988), and four provided resistance training (Pilu 2007; Sims 2009; Singh 1997; Singh 2005).

Seventeen trials (Blumenthal 2007; Blumenthal 2012a; Bonnet 2005; Brenes 2007; Doyne 1987; Dunn 2005; Fremont 1987; Hoffman 2010; Knubben 2007; Mather 2002; McCann 1984; Mutrie 1988; Schuch 2011; Setaro 1985; Sims 2009; Singh 1997; Singh 2005) provided indoor exercise, two trials provided outdoor exercise (Gary 2010; McNeil 1991) and the remaining trials did not report whether the exercise was indoors or outdoors.

Only one trial stated that unsupervised exercise was provided (Orth 1979). Two trials included both supervised and home‐based exercise arms (Blumenthal 2007; Chu 2008). The other trials provided supervised exercise or did not report this information.

Twelve trials provided individual exercises (Blumenthal 2007; Chu 2008; Doyne 1987; Dunn 2005; Greist 1979; Klein 1985; McNeil 1991; Mota‐Pereira 2011; Mutrie 1988; Orth 1979; Schuch 2011; Williams 2008), 16 provided group exercises (Blumenthal 1999; Blumenthal 2012a; Brenes 2007; Fetsch 1979; Fremont 1987; Krogh 2009; Mather 2002; McCann 1984; Nabkasorn 2005; Pilu 2007; Setaro 1985; Shahidi 2011; Sims 2009; Singh 1997; Singh 2005; Veale 1992) and the remaining trials did not report this information.

The duration of the intervention ranged from 10 days (Knubben 2007) to 16 weeks (Blumenthal 1999; Blumenthal 2007). Two trials did not state duration: one performed assessments at the end of the intervention at eight months (Pilu 2007) and the other at time of discharge from hospital (Schuch 2011).

The 'control' groups of 'no treatment' or 'placebo' comprised heterogeneous interventions including social conversation, telephone conversations to discuss their general health and relaxation (avoiding muscular contraction). For exercise versus control, there were different types of comparator arm (see Analysis 5.5). Two compared with a placebo; 17 with no treatment, waiting list, usual care or self management; six compared exercise plus treatment vs treatment; six compared exercise with stretching, meditation or relaxation; and four with 'occupational intervention', health education or casual conversation.

Outcomes
Depression measurement

Of the 39 trials, 12 reported Beck Depression Inventory (BDI) scores, and 13 reported Hamilton Rating Scale for Depression (HAMD) scores. A variety of other scales were also used.

Other clinical endpoints and adverse effects

Several trials recorded clinical endpoints as well as mood: Blumenthal 2012a, Brenes 2007 and Gary 2010 (physical functioning); Chu 2008 (self efficacy), Foley 2008 (self efficacy, episodic memory and cortisol awakening response), Knubben 2007 (length of hospital stay), Krogh 2009 (absence from work and effect on cognitive ability), Mather 2002 (participant and clinical global impression), Pilu 2007 and Mota‐Pereira 2011 (clinical global impression and global assessment of functioning), Sims 2009 (quality of life, stroke impact scale, psychosocial health status and adverse events), Singh 1997 (sickness impact profile), Gary 2010, Hoffman 2010, Schuch 2011, Singh 2005, Pilu 2007 and Brenes 2007 (quality of life), Shahidi 2011 (Life satisfaction scale) and Blumenthal 2012a (cardiovascular biomarkers).

Seven trials systematically recorded and reported adverse events (Blumenthal 2007; Blumenthal 2012a; Dunn 2005; Knubben 2007; Singh 1997; Singh 2005; Sims 2009). No trial provided data on costs.

Timing of outcome measures

All our included trials reported mood as a continuous outcome at the end of treatment. Long‐term follow‐up data beyond the end of the interventions are described for eight trials (ranging from 4 months to 26 months): Fremont 1987 (follow‐up at four months), Sims 2009 (follow‐up at six months), Klein 1985 (follow‐up for nine months), Blumenthal 1999 (follow‐up at 10 months, reported in Babyak 2000), Krogh 2009 (follow‐up at 12 months), Singh 1997 (follow‐up at 26 months, reported in Singh 2001), Mather 2002 (follow‐up at 34 weeks) and Gary 2010 (follow‐up at six months). Hoffman 2010 reported long‐term follow‐up but we were unable to include this in the meta‐analysis due to the way it was reported. The author has been contacted for data.

Excluded studies

In this update, a further 54 studies were excluded following review of full text.

In this update, we decided not to list reviews as excluded studies. Additionally some references that were previously classified as excluded studies have been re‐classified as additional reports of included studies. As a result there are now a total of 174 excluded studies.

One hundred and twenty‐nine publications described randomised trials of exercise; the reasons for excluding these are listed in more detail below:

In 93 trials, participants did not have to have depression (as defined by the authors of the trial) to be eligible for the trial (Abascal 2008; Akandere 2011; Arcos‐Carmona 2011; Asbury 2009; Aylin 2009; Badger 2007; Baker 2006; Berke 2007; Blumenthal 2012b; Bosch 2009; Brittle 2009; Burton 2009; Carney 1987; Chen 2009; Christensen 2012; Clegg 2011; Courneya 2007; Demiralp 2011; Dalton 1980; Eby 1985; Elavsky 2007; Ersek 2008; Fox 2007; Gary 2007; Ghroubi 2009; Gottlieb 2009; Gusi 2008; Gutierrez 2012; Haffmans 2006; Hannaford 1988; Haugen 2007; Hembree 2000; Herrera 1994; Hughes 1986; Jacobsen 2012; Johansson 2011; Karlsson 2007; Kerr 2008; Kerse 2010; Kim 2004; Knapen 2006; Kulcu 2007; Kupecz 2001; Lai 2006; Latimer 2004; Lautenschlager 2008; Leppämäki 2002; Levendoglu 2004; Lever‐van Milligen 2012; Levinger 2011; Lin 2007; Littbrand 2011; Lolak 2008; Machado 2007; Mailey 2010; Martin 2009; Matthews 2011; Midtgaard 2011; Morey 2003; Motl 2004; Mudge 2008; Mutrie 2007; Neidig 1998; Neuberger 2007; Nguyen 2001; Oeland 2010; Ouzouni 2009; Pakkala 2008; Penttinen 2011; Perna 2010; Rhodes 1980; Robledo Colonia 2012; Ruunsunen 2012; Salminen 2005; Sarsan 2006; Sims 2006; Smith 2008; Songoygard 2012; Stein 1992; Stern 1983; Strömbeck 2007; Sung 2009; Tapps 2009; Thomson 2010; Tomas‐Carus 2008; Tsang 2003; Tenorio 1986; Underwood 2013; Weinstein 2007; White 2007; Wilbur 2009; Wipfli 2008; Wipfli 2011).

Ten trials compared two types of exercise with no non‐exercising control (Bosscher 1993; NCT00546221; NCT01152086; Legrand 2009; Passmore 2006; Sexton 1989; TREAD 2003; Trivedi 2011; Wieman 1980; Williams 1992).

Three trials reported subgroup analyses of depressed patients, one from a randomised trial of exercise for osteoarthritis (Penninx 2002), one from a cohort of participants enrolled in cardiac rehabilitation following major cardiac events (Milani 2007) and one from a cluster‐RCT of exercise in nursing homes (Underwood 2013).

Three trials included only a single bout of exercise (Bartholomew 2005; Bodin 2004; Gustafsson 2009) and one trial provided exercise for only four days (Berlin 2003).

Two trials that recruited women with postnatal depression were excluded (Armstrong 2003; Armstrong 2004).

Five trials provided exercise interventions that did not fulfil the ACSM definition of exercise (Tai‐Chi (Chou 2004); Qigong (Tsang 2006; Chow 2012); and yoga (Oretzky 2006; Immink 2011)) to waiting list controls.

Seven trials involving adolescents (Beffert 1993; Brown 1992; NCT00964054; Hughes 2009; MacMahon 1988; Rofey 2008; Roshan 2011) were excluded.

Two trials were excluded as they provided exercise counselling, not exercise (Vickers 2009; Chalder 2012).

Three trials were excluded as the intervention was multifaceted (McClure 2008; O'Neil 2011; Sneider 2008).

Ongoing studies

There are 23 ongoing studies (IRCT201205159763; NCT01805479; CTR/2012/09/002985; NCT01787201; NCT01619930; NCT01573130; NCT01573728; EFFORT D; NCT01464463; ACTRN12605000475640; UMIN000001488; NCT01763983; ACTRN12612000094875; NCT00103415; ACTRN12609000150246; NCT01696201; ISRCTN05673017; NCT01401569; IRCT2012061910003N1; NCT01024790; NCT01383811; NCT00643695; NCT00931814). One trial (EFFORT D) was identified from the September 2012 search of the CCDANCTR and the remaining 22 were identified from the March 2013 search of the WHO Clinical Trials Registry Platform.

Studies awaiting classification

This is a fast‐moving field and our searches of CCDANCTR in March 2013 identified seven studies that are awaiting further assessment (Aghakhani 2011; DEMO II 2012; Martiny 2012; Murphy 2012; Pinniger 2012; Sturm 2012; Gotta 2012). Initial screening indicated that three of these studies (Martiny 2012; Pinniger 2012; Sturm 2012) could be eligible for inclusion in this review. We plan to update the review, ideally within the year, to include the constantly growing number of relevant studies. See Characteristics of studies awaiting classification for full details.

New studies found for this update

We are including seven additional trials, recruiting a total of 408 participants at randomisation. Of these, 374 participants remained in the trials by the time of outcome analysis (Blumenthal 2012a; Hemat‐Far 2012; Hoffman 2010; Gary 2010; Mota‐Pereira 2011; Shahidi 2011; Schuch 2011). See Characteristics of included studies.

Risk of bias in included studies

Sequence generation

We categorised 11 trials as being at low risk of bias, one as being at high risk of bias (Hemat‐Far 2012), and the rest as being at unclear risk of bias. A graphical representation of the 'Risk of bias' assessment can be seen in Figure 2 and Figure 3. Please see the Characteristics of included studies for the full 'Risk of bias' assessment for each study.


'Risk of bias' graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.



'Risk of bias' summary: review authors' judgements about each risk of bias item for each included study.


Allocation

Allocation concealment was adequate and therefore at low risk of bias in 14 trials (Blumenthal 2007; Blumenthal 2007; Blumenthal 2012a; Dunn 2005; Hoffman 2010; Knubben 2007; Krogh 2009; Martinsen 1985; Mather 2002; Sims 2009; Singh 1997; Singh 2005; Veale 1992; Williams 2008). For the remaining trials, the risk of bias was rated as unclear or high.

Blinding

Twelve trials included blinding of the outcome assessor so were rated as being at low risk of bias (Blumenthal 1999; Blumenthal 2007; Blumenthal 2012a; Brenes 2007; Dunn 2005; Gary 2010; Knubben 2007; Krogh 2009; Mather 2002; Mota‐Pereira 2011; Singh 2005; Williams 2008). The rest were categorised as being at unclear or high risk because they used self‐reported outcomes.

In exercise trials, participants cannot be blind to the treatment allocation. We were uncertain what effect this would have on bias, so we classified all trials as being at 'unclear' risk of bias. Similarly, for those trials where supervised exercise was provided, the person delivering the intervention could not be blind, so we classified all trials as being at 'unclear' risk of bias (note that not all reported whether exercise was performed with or without supervision).

Incomplete outcome data

Fifteen trials performed 'intention‐to‐treat' (ITT) analyses (Blumenthal 1999; Blumenthal 2007; Blumenthal 2012a; Dunn 2005; Hemat‐Far 2012; Hoffman 2010; Krogh 2009; Mather 2002; McNeil 1991; Mota‐Pereira 2011; Mutrie 1988; Orth 1979; Pilu 2007; Singh 1997; Schuch 2011), i.e. complete outcome data were reported or, if there were missing outcome data, these were replaced using a recognised statistical method, e.g. last observation carried forward, and participants remained in the group to which they had been allocated. One trial reported data for individual participants (Orth 1979), so by using last observation carried forward we replaced data from the participants who did not complete the trial and included these data in the meta‐analysis of ITT trials. One trial reported that the analysis was ITT, because it used a 'worse‐case' scenario assumption to replace data from participants who did not complete the trial, but only included 38 of the 39 randomised participants in the analyses, so we classified it as 'not ITT' (Knubben 2007). The remaining studies were classified as being at unclear or high risk of bias. Most trials did not report data from participants who dropped out.
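As an illustration of the imputation named above, the following minimal sketch shows last observation carried forward for one participant's repeated depression scores. The function name and the example scores are hypothetical and are not taken from any included trial.

```python
from typing import List, Optional


def locf(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing score (None) with the most recent observed score.

    Scores that are missing before any observation stay as None.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for score in scores:
        if score is not None:
            last = score  # remember the latest observed value
        filled.append(last)
    return filled


# Hypothetical participant who dropped out after the second assessment.
imputed = locf([24.0, 19.0, None, None])
```

Under this approach the participant's final analysed score is simply the last one recorded before dropout, and the participant stays in the group to which they were randomised.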

Selective reporting

We attempted to identify whether a protocol was available by screening the reference list of the publications. We identified protocols for three trials and checked that all prespecified outcome events were reported, and rated these as being at low risk of bias (Blumenthal 2012a; Dunn 2005; Krogh 2009). We categorised four other trials (Hemat‐Far 2012; Hoffman 2010; Mota‐Pereira 2011; Shahidi 2011) as being at low risk of bias as we judged that there was sufficient information in the methods to be sure that the trials had reported all their planned outcomes.

Other potential sources of bias

On the basis of the first 50 participants, one study (Krogh 2009) changed their sample size calculation based on the observed standard deviation (they had initially calculated they needed a minimum of 186 participants (SD = 6), but for the first 50 participants the SD was 3.9 so they adjusted their sample size calculation to 135 participants). Hemat‐Far 2012 told the control group to reduce their activity.

We decided to include data on continuous depression scores in our meta‐analysis, rather than depression measured as a dichotomous outcome. This is because we knew from previous updates that very few trials had reported depression as a dichotomous outcome, and we wished to include as many trials as possible in our meta‐analysis. For future updates, we will consider whether to perform a separate meta‐analysis for trials that measured depression as a dichotomous outcome.

Publication bias

Visually our funnel plot appeared to be asymmetrical.

There is evidence of bias (Begg P value = 0.02, Egger P value = 0.002) (funnel plot for Analysis 1.1, exercise versus control, Figure 4) that might be due to publication bias, to outcome reporting bias or to heterogeneity.
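For readers unfamiliar with the Egger test cited above, the sketch below shows the regression that underlies it: each trial's standardised effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is a plain least-squares illustration with a hypothetical function name and made-up inputs; it does not reproduce the P values reported here, which would also require a t distribution and the review's own data.

```python
import math
from typing import Sequence, Tuple


def egger_intercept(effects: Sequence[float],
                    std_errors: Sequence[float]) -> Tuple[float, float]:
    """Return the Egger regression intercept and its standard error."""
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three trials with matching inputs")
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept


# Made-up example in which smaller trials (larger SE) show larger effects.
intercept, se_int = egger_intercept([-0.2, -0.4, -0.8, -1.2],
                                    [0.1, 0.2, 0.4, 0.6])
```

Dividing the intercept by its standard error gives a t statistic on n - 2 degrees of freedom. As the text notes, a non-zero intercept may reflect publication bias, outcome reporting bias or heterogeneity, so the test alone cannot distinguish between them.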


Funnel plot of comparison: 1 Exercise versus control, outcome: 1.1 Reduction in depression symptoms post‐treatment.


Effects of interventions

See: Summary of findings for the main comparison Exercise compared to control for adults with depression; Summary of findings 2 Exercise compared to psychological treatments for adults with depression; Summary of findings 3 Exercise compared to bright light therapy for adults with depression; Summary of findings 4 Exercise compared to pharmacological treatments for adults with depression

We included 37 trials in our meta‐analyses. The remaining two trials could not be included for the reasons stated above (Greist 1979; McCann 1984).

Comparison 1: Exercise versus 'control'

Thirty‐five trials (1356 participants) included a comparison of exercise with a 'control' intervention.

Primary outcome measure
1.1 Reduction in depression symptom severity

Post‐treatment

The pooled standardised mean difference (SMD), for the 35 trials, calculated using the random‐effects model was ‐0.62 (95% confidence interval (CI) ‐0.81 to ‐0.42) (Analysis 1.1), indicating a moderate clinical effect in favour of exercise. There was substantial heterogeneity (I² = 63%).
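The pooling described above can be sketched as follows, assuming the standard textbook formulas for Hedges' g and the DerSimonian-Laird random-effects estimator. The review's own software may use slightly different variance formulas, and the function names and example numbers here are hypothetical rather than trial data.

```python
import math
from typing import Sequence, Tuple


def hedges_g(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Return Hedges' g (group 1 minus group 2) and its approximate variance."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))
    return j * d, j ** 2 * var_d


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]
                      ) -> Tuple[float, Tuple[float, float], float]:
    """Return the pooled effect, its 95% CI and I-squared (as a percentage)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-trial variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2


# Hypothetical trial: exercise arm scores lower (better) than control.
g, var_g = hedges_g(8.0, 5.0, 20, 12.0, 5.0, 20)
```

With depression scales on which lower scores mean fewer symptoms, a negative pooled value favours exercise, which is the convention used for the SMD reported here.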

End of long‐term follow‐up

The pooled SMD from the eight trials (Blumenthal 1999; Fremont 1987; Gary 2010; Klein 1985; Krogh 2009; Mather 2002; Sims 2009; Singh 1997) (377 participants) that provided long‐term follow‐up data found only a small effect in favour of exercise (SMD ‐0.33, 95% CI ‐0.63 to ‐0.03) (Analysis 1.2). The long‐term follow‐up data from Blumenthal 1999 were reported in a separate publication (Babyak 2000), and from Singh 1997 in a separate publication (Singh 2001). There was moderate statistical heterogeneity (I² = 49%). Follow‐up data from Blumenthal 2007 have been reported according to the proportion who had fully or partially remitted from depression, but continuous mood scores were not reported so we could not include these data in the meta‐analysis.

Secondary outcome measures
1.2a Acceptability of treatment: attendance at exercise

Fourteen trials reported attendance rates for exercise; these were 50.6% for the aerobic arm and 56.2% for the strength arm (Krogh 2009), 59% (Mather 2002), 70% (Doyne 1987), 72% (Dunn 2005), 78% (Nabkasorn 2005), 82% (Gary 2010), 91% (Mota‐Pereira 2011), 92% (Blumenthal 1999; Blumenthal 2007), 93% (Singh 1997), 94% (Blumenthal 2012a), 97.3% for high‐intensity and 99.1% for low‐intensity (Chu 2008) and 95% to 100% (Singh 2005). One trial reported the mean number of exercise sessions attended as 5.88 (Hoffman 2010). One trial rescheduled missed visits (McNeil 1991) so participants attended the full course of exercise. As with intensity of exercise, it is difficult to attribute any differences in outcome to differences in attendance rates, because there were other sources of variation in the type of interventions (e.g. duration of intervention, type of exercise) and differences in the methodological quality between trials which might account for differences in outcome.

1.2b Acceptability of treatment: completing the intervention or control

We extracted data on the number randomised and completing each trial (see Table 1). This ranged from 100% completion (Hemat‐Far 2012; Mather 2002; McNeil 1991; Mutrie 1988; Pilu 2007; Singh 1997; Schuch 2011) to 42% completion (Doyne 1987). For the exercise intervention, this ranged from 100% completion (Gary 2010; Hemat‐Far 2012; Mather 2002; McNeil 1991; Mutrie 1988; Pilu 2007; Singh 1997; Schuch 2011) to 55% completion (Klein 1985).

Twenty‐nine studies (1363 participants) reported how many completed the exercise and control arms (Analysis 1.3). The risk ratio (RR) was 1.00 (95% CI 0.97 to 1.04).
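To show how a risk ratio of this kind is read, the sketch below computes the risk ratio for completion in a single trial, with a 95% confidence interval obtained on the log scale. The counts are invented for illustration and the function name is hypothetical; the pooled figure above combines many trials and cannot be reproduced from one table.

```python
import math
from typing import Tuple


def risk_ratio(events_a: int, total_a: int,
               events_b: int, total_b: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Return the risk ratio (group A versus group B) and its confidence limits."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Invented counts: 40 of 50 completed exercise, 40 of 50 completed control.
rr, lower, upper = risk_ratio(40, 50, 40, 50)
```

A risk ratio of 1 with a confidence interval spanning 1 indicates no detectable difference in completion between arms.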

1.3 Quality of life

Five trials reported quality of life at the end of treatment (Gary 2010; Hoffman 2010; Schuch 2011; Singh 2005; Pilu 2007). One author provided data regarding the different domains (Gary 2010). One trial reported quality of life at baseline but not at follow‐up (Sims 2009).

There were no statistically significant differences for the mental (SMD ‐0.24; 95% CI ‐0.76 to 0.29), psychological (SMD 0.28; 95% CI ‐0.29 to 0.86) and social domains (SMD 0.19; 95% CI ‐0.35 to 0.74) (Analysis 1.4). Two studies reported a statistically significant difference for the environment domain favouring exercise (SMD 0.62; 95% CI 0.06 to 1.18) and four studies reported a statistically significant difference for the physical domain favouring exercise (SMD 0.45; 95% CI 0.06 to 0.83).

1.4 Cost

No trial reported costs.

1.5 Adverse events

Seven trials reported no difference in adverse events between the exercise and usual care groups (Blumenthal 2007; Blumenthal 2012a; Dunn 2005; Knubben 2007; Singh 1997; Singh 2005; Sims 2009). Dunn 2005 reported increased severity of depressive symptoms (n = 1), chest pain (n = 1) and joint pain/swelling (n = 1); all these participants discontinued exercise. Singh 1997 reported that one exerciser was referred to her psychologist at six weeks due to increasing suicidality; and musculoskeletal symptoms in two participants required adjustment of training regime. Singh 2005 reported adverse events in detail (visits to a health professional, minor illness, muscular pain, chest pain, injuries requiring training adjustment, falls, deaths and hospital days) and found no difference between the groups. Knubben 2007 reported "no negative effects of exercise (muscle pain, tightness or fatigue)"; after the training had finished, one person in the placebo group required gastric lavage and one person in the exercise group inflicted a superficial cut in her arm. Sims 2009 reported no adverse events or falls in either the exercise or control groups. Blumenthal 2007 reported more side effects in the sertraline group (see comparison below) but there was no difference between the exercise and control group. Blumenthal 2012a reported more fatigue and sexual dysfunction in the sertraline group than in the exercise group.

Because of the diversity of different adverse events reported, we decided not to do a meta‐analysis of these data.

Comparison 2: Exercise versus psychological therapies

Primary outcome
2.1 Reduction in depression symptom severity

Post‐treatment

Seven trials (189 participants) provided data comparing exercise with psychological therapies; the SMD was ‐0.03 (95% CI ‐0.32 to 0.26) (Analysis 2.1) indicating no significant difference between the two interventions. No statistical heterogeneity was indicated.

End of long‐term follow‐up

There were insufficient data available for long‐term follow‐up.

Secondary outcomes
2.2a Acceptability of treatment: attendance at exercise sessions

One trial reported adherence scores that were calculated based on the number of sessions attended of those prescribed. This trial reported that the adherence rates were 82% for exercise and 72% for CBT (Gary 2010).

2.2b Acceptability of treatment: completing the intervention or psychological therapies

For staying in the trial, there were data from four trials (172 participants) (Analysis 2.2). The risk ratio was 1.08 (95% CI 0.95 to 1.24).

2.3 Quality of life

One trial reported changes in the Minnesota Living with Heart Failure Questionnaire, a quality of life measure (Gary 2010). There was no statistically significant difference for the physical domain (MD 0.15; 95% CI: ‐7.40 to 7.70) or the mental domain (MD ‐0.09; 95% CI: ‐9.51 to 9.33) (Analysis 2.3).

2.4 Cost

No trial reported costs.

2.5 Adverse events

No data available.

Comparison 3: Exercise versus alternative treatments

Primary outcome
3.1 Reduction in depression symptom severity

Post‐treatment

One trial found that exercise was superior to bright light therapy in reducing depression symptoms (Pinchasov 2000) (MD ‐6.40, 95% CI ‐10.20 to ‐2.60) (Analysis 3.1).

End of long‐term follow‐up

There were no data with regard to long‐term follow‐up.

Secondary outcomes

This trial did not report on any of the following outcomes.

3.2a Acceptability of treatment: attendance at exercise
3.2b Acceptability of treatment: completing the intervention or control
3.3 Quality of life
3.4 Cost
3.5 Adverse events

Comparison 4: Exercise versus pharmacological treatments

Primary outcome
4.1 Reduction in depression symptom severity

Post‐treatment

For the four trials (298 participants) that compared exercise with pharmacological treatments (Blumenthal 1999; Blumenthal 2007; Blumenthal 2012a; Brenes 2007) the SMD was ‐0.11 (95% CI ‐0.34 to 0.12) (Analysis 4.1), indicating no significant difference between the two interventions. No statistical heterogeneity was indicated.

End of long‐term follow‐up

There were insufficient data available for long‐term follow‐up.

Secondary outcomes
4.2a Acceptability of treatment: attendance at exercise

Blumenthal 1999 reported that, of those allocated exercise alone, the median proportion of sessions attended was 89.6%. Of those allocated medication, no participant deviated by more than 5% from the prescribed dose.

4.2b Acceptability of treatment: completing the intervention or pharmacological intervention

For remaining in the trial, there were data from three trials (278 participants). The risk ratio was 0.98 (95% CI 0.86 to 1.12) (Analysis 4.2).

4.3 Quality of life

One trial (Brenes 2007) reported no difference in change in SF‐36 mental health and physical health components between medication and exercise groups (Analysis 4.3).

4.4 Cost

No trial reported costs.

4.5 Adverse events

Blumenthal 1999 reported that 3/53 in the exercise group suffered musculoskeletal injuries; injuries in the medication group were not reported.

Blumenthal 2007 collected data on side effects by asking participants to rate a 36‐item somatic symptom checklist and reported that "a few patients reported worsening of symptoms"; of the 36 side effects assessed, only one showed a statistically significant group difference (P = 0.03), i.e. that the sertraline group reported worse post‐treatment diarrhoea and loose stools.

Blumenthal 2012a assessed 36 side effects; only two showed a significant group difference: 20% of participants receiving sertraline reported worse post‐treatment fatigue compared with 2.4% in the exercise group and 26% reported increased sexual problems compared with 2.4% in the exercise group.

Subgroup analyses

Type of exercise

We explored the influence of the type of exercise (aerobic, mixed or resistance) on outcomes for those trials comparing exercise versus control (Analysis 5.1). The SMD for aerobic exercise indicated a moderate clinical effect (SMD ‐0.55, 95% CI ‐0.77 to ‐0.34), whilst the SMDs for both mixed exercise (SMD ‐0.85, 95% CI ‐1.85 to 0.15) and resistance exercise (SMD ‐1.03, 95% CI ‐1.52 to ‐0.53) indicated large effect sizes, but with wide confidence intervals.

Intensity of exercise

We explored the influence of intensity (light/moderate, moderate, moderate/hard, hard, moderate/vigorous, vigorous) on the reduction of depression for those trials comparing exercise versus control (Analysis 5.2). The SMD for moderate/vigorous intensity (SMD ‐0.38, 95% CI ‐1.61 to 0.85) indicated a small effect size, whilst for moderate (SMD ‐0.64, 95% CI ‐1.01 to ‐0.28), moderate/hard (SMD ‐0.63, 95% CI ‐1.13 to ‐0.13) and hard intensity (SMD ‐0.56, 95% CI ‐0.93 to ‐0.20) there was a moderate clinical effect. A large effect size was indicated for vigorous intensity (SMD ‐0.77, 95% CI ‐1.30 to ‐0.24) and light/moderate intensity (SMD ‐0.83, 95% CI ‐1.32 to ‐0.34).
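The descriptive labels used in these subgroup analyses can be reproduced with a simple threshold rule. The sketch below assumes cut‐offs of 0.40 and 0.70 on the absolute SMD; these cut‐offs are an assumption chosen because they match the labels given in the text, and the function name `effect_size_label` is hypothetical.

```python
def effect_size_label(smd, small_cutoff=0.40, large_cutoff=0.70):
    """Label the magnitude of an SMD. The cut-offs are assumed for
    illustration, not taken from the review's stated methods."""
    magnitude = abs(smd)
    if magnitude < small_cutoff:
        return "small"
    if magnitude < large_cutoff:
        return "moderate"
    return "large"

# SMDs reported by exercise intensity (Analysis 5.2)
intensity_smds = {
    "moderate/vigorous": -0.38,
    "moderate": -0.64,
    "moderate/hard": -0.63,
    "hard": -0.56,
    "vigorous": -0.77,
    "light/moderate": -0.83,
}
labels = {name: effect_size_label(smd) for name, smd in intensity_smds.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Such labels describe the point estimate only; the width of each confidence interval still needs to be read alongside them.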

Duration and frequency of exercise

We explored the influence of the total number of prescribed exercise sessions (0 ‐ 12, 13 ‐ 24, 25 ‐ 36, 37+ sessions) on the reduction of depression for those trials comparing exercise versus control (Analysis 5.3). A moderate effect size was observed for 0 ‐ 12 sessions (SMD ‐0.42, 95% CI ‐1.26 to 0.43), and 37+ sessions (SMD ‐0.46, 95% CI ‐0.69 to ‐0.23). A large effect size was observed for 13 ‐ 24 sessions (SMD ‐0.70, 95% CI ‐1.09 to ‐0.31) and 25 ‐ 36 sessions (SMD ‐0.80, 95% CI ‐1.30 to ‐0.29).

Type of diagnosis

We performed subgroup analyses according to how the diagnosis of depression was made (cut‐point on a scale, or clinical interview and proper psychiatric diagnosis) (Analysis 5.4). There was a moderate effect size for clinical diagnosis of depression (SMD ‐0.57, 95% CI ‐0.81 to ‐0.32) and a cut‐point on a scale (SMD ‐0.67, 95% CI ‐0.95 to ‐0.39).

Type of control

We categorised controls as a) placebo; b) no treatment (including waiting list); c) exercise plus treatment versus treatment alone; d) stretching, meditation or relaxation; e) 'occupational', including education, occupational therapy. These categorisations were made by one author (KD) on the basis of data extracted at the initial stage of data extraction, and were checked by a second author (GM) (Analysis 5.5). The largest effect size was when exercise was compared with 'occupational' (SMD ‐3.67, 95% CI ‐4.94 to ‐2.41), and there was no statistically significant effect of exercise when it was compared with 'stretching, meditation or relaxation' (SMD ‐0.09, 95% CI ‐0.65 to 0.48).

We considered whether to perform additional subgroup analyses according to supervised versus unsupervised, indoor versus outdoor and individual versus group, and according to the type of control group (in the no‐treatment comparison) but we wanted to focus on the main subgroups of interest. Multiple subgroup analyses are generally considered inadvisable.

Sensitivity analyses

We conducted sensitivity analyses for the first comparison of exercise versus control to explore the impact of study quality on effect sizes. We have commented on how these sensitivity analyses influence the pooled SMD, compared with the pooled SMD for the 35 trials comparing exercise with control (SMD ‐0.62, 95% CI ‐0.81 to ‐0.42) (Analysis 1.1).

Peer‐reviewed journal publications

For the 34 trials (1335 participants) that were reported in peer‐reviewed journal publications or doctoral theses, the SMD was ‐0.59 (95% CI ‐0.78 to ‐0.40) (Analysis 6.1), showing a moderate clinical effect in favour of exercise, which is similar to the pooled SMD for all 35 trials.

Published as abstracts/conference proceedings only

The SMD for the one study published as a conference abstract only was ‐2.00 (95% CI ‐3.19 to ‐0.82) (Analysis 6.2), showing a large effect size in favour of exercise, i.e. larger than the pooled SMD for all 35 trials.

Allocation concealment

For the 14 trials (829 participants) with adequate allocation concealment and therefore at low risk of bias, the SMD was ‐0.49 (95% CI ‐0.75 to ‐0.24) (Analysis 6.3), showing a moderate clinical effect in favour of exercise, similar to the pooled SMD for all 35 trials.

Use of intention‐to‐treat analysis

For the 11 trials (567 participants) with intention‐to‐treat analyses, the SMD was ‐0.61 (95% CI ‐1.00 to ‐0.22) (Analysis 6.4), showing a moderate clinical effect in favour of exercise, similar to the pooled SMD for all 35 trials.

Blinded outcome assessment

For the 12 trials (658 participants) with blinded outcome assessments and therefore at low risk of bias, the SMD was ‐0.36 (95% CI ‐0.60 to ‐0.12) (Analysis 6.5), showing a small clinical effect in favour of exercise, which is smaller than the pooled SMD for all 35 trials.

Allocation concealment and intention‐to‐treat analysis and blinded outcome assessment

For the six trials (Blumenthal 1999; Blumenthal 2007; Blumenthal 2012a; Dunn 2005; Krogh 2009; Mather 2002) (464 participants) with adequate allocation concealment and intention‐to‐treat analyses and blinded outcome assessment and therefore at low risk of bias, the SMD was ‐0.18 (95% CI ‐0.47 to 0.11) (Analysis 6.6), i.e. there was a small clinical effect in favour of exercise, which did not reach statistical significance. This is smaller than the pooled SMD for all 35 trials.

Sensitivity analyses: including the arm with the smallest dose of exercise for those trials for which we used the arm with the largest dose of exercise in comparison 1

We included the arm with the smallest dose of exercise for 10 trials (Blumenthal 2007; Chu 2008; Doyne 1987; Dunn 2005; Krogh 2009; Mutrie 1988; Orth 1979; Setaro 1985; Singh 2005; Williams 2008) for which we had used the arm with the largest dose of exercise in comparison 1 (Analysis 1.1). The SMD was ‐0.44 (95% CI ‐0.55 to ‐0.33) (Analysis 6.7), showing a moderate clinical effect in favour of exercise. This is similar to the pooled SMD for all 35 trials.

Recruitment and retention of participants

Table 1 presents data about the feasibility of recruiting and retaining participants both in the trial as a whole and in the exercise intervention in particular. We extracted data, when available, about the number of participants who were considered for inclusion in each trial, although this information was not available for all trials. The trials that did provide these data used different recruitment techniques (ranging from screening of people responding to advertisements to inclusion of those patients who were considered eligible by a referring doctor).

Discussion

Summary of main results

This updated review includes seven additional trials (384 additional participants); conclusions are similar to our previous review (Rimer 2012). The pooled standardised mean difference (SMD), for depression (measured by continuous variable), at the end of treatment, represented a moderate clinical effect. The 'Summary of findings' table suggests that the quality of the evidence is moderate.

There was some variation between studies with respect to attendance rates for exercise as an intervention, suggesting that there may be factors that influence acceptability of exercise among participants.

There was no difference between exercise and psychological therapy or pharmacological treatment on the primary outcome. There are too few data to draw conclusions about the effect of exercise on our secondary outcomes, including risk of harm.

Uncertainties

Uncertainties remain regarding how effective exercise is for improving mood in people with depression, primarily due to methodological shortcomings (please see below). Furthermore, if exercise does improve mood in people with depression, we cannot determine the optimum type, frequency and duration of exercise, whether it should be performed supervised or unsupervised, indoors or outdoors, or in a group or alone. There was, however, a suggestion that more sessions have a larger effect on mood than a smaller number of sessions, and that resistance and mixed training were more effective than aerobic training. Adverse events in those allocated to exercise were uncommon, but only a small number of trials reported this outcome. Ideally both the risks and benefits of exercise for depression should be evaluated in future trials. There were no data on costs, so we cannot comment on the cost‐effectiveness of exercise for depression. The type of control intervention may influence effect sizes. There was a paucity of data comparing exercise with psychological and pharmacological treatments; the available evidence suggests that exercise is no more effective than either psychological or pharmacological treatments.

Overall completeness and applicability of evidence

For this current update, we searched the CCDAN Group's trial register in September 2012, which is an up‐to‐date and comprehensive source of trials. We also searched the WHO trials portal in March 2013 in order to identify new ongoing trials. We scrutinised reference lists of the new trials identified. Ideally, we would have performed citation reference searches of all included studies, but with the large number of trials now in this review, this was no longer practical. Thus, it is possible that we may have missed some relevant trials. We updated our search of the CCDAN trials register up to 1st March 2013 and identified several studies that may need to be included in our next update. It is notable that in a seven‐month period (September 2012 to March 2013), several more potentially eligible completed trials have been published (Characteristics of studies awaiting classification). This demonstrates that exercise for depression is a topic of considerable interest to researchers, and that further updates of this review will be needed, ideally once a year, to ensure that the review is kept as up‐to‐date as possible.

The results of this review are applicable to adults classified by the trialists as having depression (either by a cut‐off score on a depression scale or by having a clinical diagnosis of depression) who were willing to participate in a programme of regular physical exercise, fulfilling the American College of Sports Medicine (ACSM) definition of exercise, within the context of a randomised controlled clinical trial. The trials we included are relevant to the review question. It is possible that only the most motivated of individuals were included in this type of research.

The data we extracted on aspects of feasibility (see Table 1) suggest that a large number of people need to be screened to identify suitable participants, unless recruiting from a clinical population, e.g. inpatients with depression. Note, though, that there was a wide range in the proportion of those screened who were subsequently randomised; this may be a function of the sampling frame (which may include a range of specifically screened or non‐screened potential participants), and interest in being a research participant at a time of low mood, as much as whether potential participants are interested in exercise as a therapy. A substantial number of people dropped out from both the exercise and control programmes, and even those who remained in the trial until the outcome assessments were not able to attend all exercise sessions.

We did not include trials in which advice was given to increase activity. Thus, we excluded a large, high‐quality trial (n = 361) in which people with depression in primary care were randomised to usual care or to usual care plus advice from a physical activity facilitator to increase activity (Chalder 2012), which showed no effect of the intervention on mood.

We had previously decided to exclude trials which included people both with and without depression, even if they reported data from a subgroup with depression. Thus, for this update, we excluded a large, high‐quality, cluster‐randomised trial recruiting 891 residents from 78 nursing homes (Underwood 2013), of whom 375 had baseline Geriatric Depression Scores suggesting depression. At the end of the treatment, there was no difference between the intervention and control group, for people both with and without depression at baseline. For future updates, we will include data from trials that reported subgroups with depression.

If this review had had broader inclusion criteria in relation to the type of intervention, we would have included additional studies, e.g. trials which provide advice to increase activity (e.g. Chalder 2012) and trials of other types of physical activity interventions that do not fulfil the ACSM definition for exercise (e.g. Tai Chi or Qigong, where mental processes are practiced alongside physical activity and may exert an additional or synergistic effect). Arguably, the review could be broader, but we have elected to keep it more focused, partly to ensure that it remains feasible to update the review on a regular basis, with the resources we have available. The original review questions were conceived more than 10 years ago (Lawlor 2001), and although they are still relevant today, it would be of value to broaden the research questions to include evidence for other modes of physical activity. This could be through a series of related Cochrane reviews. There are already separate reviews of Tai Chi for depression, and we suggest that a review of advice to increase physical activity would be of value.

This review did not attempt to take into account the effects of exercise when the experience is pleasurable and self‐determined, though this would have been difficult as such data were not reported in the trials.

There were more women than men in the studies that we included, and there was a wide range in mean ages. We cannot currently make any new recommendations for the effectiveness of exercise referral schemes for depression (DOH 2001; Pavey 2011; Sorensen 2006). One study of the Welsh exercise referral scheme is 'awaiting assessment'. Nor can we be certain about the effect of exercise on other relevant outcomes e.g. quality of life, adverse events or its cost‐effectiveness because the majority of trials did not systematically report this information, although our meta‐analysis of quality of life suggested that exercise did not significantly improve quality of life compared to control.

We cannot comment about the effect of exercise in people with dysthymia (or sub‐clinical depression) and in those without mood disorders, as we explicitly excluded these trials from the review. Future systematic reviews and meta‐analyses might include these people, although new reviews would need to ensure that the search strategy was sufficiently comprehensive to identify all relevant trials. We excluded trials of exercise for postnatal depression (as we had done for our previous update).

Quality of the evidence

The majority of the trials we included were small and many had methodological weaknesses. We explicitly aimed to determine the influence of study quality, in particular allocation concealment, blinding and intention‐to‐treat analyses, on effect sizes, as we had done in previous review versions (Lawlor 2001; Mead 2009; Rimer 2012). When only those trials with adequate allocation concealment and intention‐to‐treat analysis and blinded outcome assessors were included, the effect size was clinically small and not statistically significant (Analysis 6.6).

There was substantial heterogeneity; this might be explained by a number of factors including variation in the control intervention. However, when only high‐quality trials were included, the effect size was small and not statistically significant. Of the eight trials (377 participants) that provided long‐term follow‐up data, there was only a small effect in favour of exercise (SMD ‐0.33, 95% CI ‐0.63 to ‐0.03) at the end of long‐term follow‐up. This suggests that any benefits of exercise at the end of treatment may be lost over time. Thus, exercise may need to be continued in the longer term to maintain any early benefits. Our summary of findings tables indicate that the quality of evidence is low ('Summary of findings' table 5).

Our subgroup analyses showed that effect sizes were higher for mixed exercise and resistance exercise than for aerobic exercise alone, but confidence intervals were wide (Analysis 5.1). There were no apparent differences in effect sizes according to intensity of exercise (Analysis 5.2). Effect sizes were smaller in trials which provided fewer than 12 sessions of exercise (Analysis 5.3). Effect sizes were not statistically significant when compared with stretching, meditation or relaxation (Analysis 5.5). Our sensitivity analysis for 'dose' of exercise suggested that a lower dose of exercise was less effective than a higher dose (Analysis 6.7). Although our subgroup analyses are simply observational in nature, they are not inconsistent with the current recommendations by NICE (NICE 2009).

We extracted information from the trials about other potential sources of biases, in line with the Cochrane Collaboration 'Risk of bias' tool. In exercise trials, it is generally not possible to blind participants or those delivering the intervention to the treatment allocation. Thus, if the primary outcome is measured by self report, this is an important potential source of bias. When we performed sensitivity analysis by including only those trials with blinded outcome assessors, the effect size was smaller than when all trials were included. This suggests that self report may lead to an overestimate of treatment effect sizes. It is important to note, however, that clinician‐rated outcomes (e.g. Hamilton Rating Scale for Depression) may also be subject to clinical interpretation and therefore are not free from bias. For random sequence generation, the risk of bias was unclear for most of the trials. For selective reporting, we categorised risk of bias as unclear for most of the trials, although we did not have the study protocols.

Furthermore, the funnel plot was asymmetrical, suggesting small study bias, heterogeneity or outcome reporting bias.

Potential biases in the review process

We attempted to avoid bias by ensuring that we had identified all relevant studies through comprehensive systematic searching of the literature and contact with authors of the trials to identify other trials, both published and unpublished. However, we accept that some publication bias is inevitable and this is indicated by the asymmetrical funnel plot. This is likely to lead to an overestimate of effect sizes, because positive trials are more likely to be published than negative trials. The searches for this current update were less extensive than for the initial review in 2001 (Lawlor 2001), but because the CCDAN register of trials is updated regularly from many different sources, we think it is unlikely that we have missed relevant trials.

As noted above, there is considerable interest in the continued development of a robust and accurate evidence base in this field to guide practice and healthcare investment. We are already aware of three recent additional studies that were identified through extensive searches of CCDANCTR. Initial scrutiny of these studies suggests that they would not overturn our conclusions, but they highlight the need to maintain regular updates of this review.

For a previous version of this review (Mead 2009), we made post hoc decisions to exclude trials defined as a 'combination' intervention, a trial in which the exercise intervention lasted only four days (Berlin 2003), and trials of postnatal depression (Armstrong 2003; Armstrong 2004). For the update in 2012, (Rimer 2012), we specified in advance that we would exclude trials that did not fulfil the ACSM criteria (ACSM 2001) for exercise; this meant that we excluded two studies (Chou 2004; Tsang 2006) that had previously been included.

In previous versions of the review, we used data from the arm with the largest clinical effect; this approach could have biased the results in favour of exercise. For this update, we used the largest 'dose' of exercise and performed a sensitivity analysis to determine the effect of using the smaller 'dose' (Analysis 6.7). This showed that the effect size was slightly smaller for the lower dose than the higher dose (‐0.44 for the lower dose and ‐0.62 for the higher dose). This is consistent with one of the subgroup analyses which showed that fewer than 12 sessions was less effective than a larger number of sessions.

We performed several subgroup analyses, which, by their nature, are simply observational. A variety of control interventions were used. We explored the influence of the type of control intervention (Analysis 5.5); this suggests that exercise may be no more effective than stretching/meditation or relaxation on mood. When we performed subgroup analysis of high‐quality trials only, we categorised the comparator (relaxation) in one of the trials as a control intervention (Krogh 2009), rather than as an active treatment. Had we categorised relaxation as an active treatment (e.g. Analysis 6.6), exercise would have had a larger clinical effect in the meta‐analysis.

Agreements and disagreements with other studies or reviews

Previous systematic reviews which found that exercise improved depression included uncontrolled trials (Blake 2009; Carlson 1991; Craft 2013; North 1990; Pinquart 2007), so the results of these reviews are probably biased in favour of exercise. Another systematic review (Stathopoulou 2006), which identified trials in peer‐reviewed journals only, included just eight of the trials which we identified for our review (Doyne 1987; Dunn 2005; Klein 1985; McNeil 1991; Pinchasov 2000; Singh 1997; Singh 2005; Veale 1992), and also included two trials which we had excluded (Bosscher 1993; Sexton 1989). This review (Stathopoulou 2006) found a larger effect size than we did. A further two reviews included mainly older people (Blake 2009; Sjosten 2006), whereas we included participants of all ages (aged 18 and over). Another meta‐analysis (Rethorst 2009) concluded that exercise is effective as a treatment for depression, and also found a larger effect size than we did. A narrative review of existing systematic reviews suggested that it would seem appropriate that exercise is recommended in addition to other treatments pending further high‐quality trial data (Daley 2008). However, a systematic review that included only studies where participants had a clinical diagnosis of depression according to a healthcare professional found no benefit of exercise (Krogh 2011). Another review of walking for depression suggested that walking might be a useful adjunct for depression treatment, and recommended further trials (Robertson 2012).

Figure 1. Study flow diagram, showing the results of the searches for this current update.

Figure 2. 'Risk of bias' graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3. 'Risk of bias' summary: review authors' judgements about each risk of bias item for each included study.

Figure 4. Funnel plot of comparison: 1 Exercise versus control, outcome: 1.1 Reduction in depression symptoms post‐treatment.

Analysis 1.1. Comparison 1 Exercise versus 'control', Outcome 1 Reduction in depression symptoms post‐treatment.

Analysis 1.2. Comparison 1 Exercise versus 'control', Outcome 2 Reduction in depression symptoms follow‐up.

Analysis 1.3. Comparison 1 Exercise versus 'control', Outcome 3 Completed intervention or control.

Analysis 1.4. Comparison 1 Exercise versus 'control', Outcome 4 Quality of life.

Analysis 2.1. Comparison 2 Exercise versus psychological therapies, Outcome 1 Reduction in depression symptoms post‐treatment.

Analysis 2.2. Comparison 2 Exercise versus psychological therapies, Outcome 2 Completed exercise or psychological therapies.

Analysis 2.3. Comparison 2 Exercise versus psychological therapies, Outcome 3 Quality of life.

Analysis 3.1. Comparison 3 Exercise versus bright light therapy, Outcome 1 Reduction in depression symptoms post‐treatment.

Analysis 4.1. Comparison 4 Exercise versus pharmacological treatments, Outcome 1 Reduction in depression symptoms post‐treatment.

Analysis 4.2. Comparison 4 Exercise versus pharmacological treatments, Outcome 2 Completed exercise or antidepressants.

Analysis 4.3. Comparison 4 Exercise versus pharmacological treatments, Outcome 3 Quality of Life.

Analysis 5.1. Comparison 5 Reduction in depression symptoms post‐treatment: Subgroup analyses, Outcome 1 Exercise vs control subgroup analysis: type of exercise.

Analysis 5.2. Comparison 5 Reduction in depression symptoms post‐treatment: Subgroup analyses, Outcome 2 Exercise vs control subgroup analysis: intensity.

Analysis 5.3. Comparison 5 Reduction in depression symptoms post‐treatment: Subgroup analyses, Outcome 3 Exercise vs control subgroup analysis: number of sessions.

Analysis 5.4. Comparison 5 Reduction in depression symptoms post‐treatment: Subgroup analyses, Outcome 4 Exercise vs control subgroup analysis: diagnosis of depression.

Analysis 5.5. Comparison 5 Reduction in depression symptoms post‐treatment: Subgroup analyses, Outcome 5 Exercise vs control subgroup analysis: type of control.

Analysis 6.1. Comparison 6 Exercise versus control: sensitivity analyses, Outcome 1 Reduction in depression symptoms post‐treatment: peer‐reviewed journal publications and doctoral theses only.

Analysis 6.2. Comparison 6 Exercise versus control: sensitivity analyses, Outcome 2 Reduction in depression symptoms post‐treatment: studies published as abstracts or conference proceedings only.

Analysis 6.3. Comparison 6 Exercise versus control: sensitivity analyses, Outcome 3 Reduction in depression symptoms post‐treatment: studies with adequate allocation concealment.

Analysis 6.4. Comparison 6 Exercise versus control: sensitivity analyses, Outcome 4 Reduction in depression symptoms post‐treatment: studies using intention‐to‐treat analysis.

Analysis 6.5. Comparison 6 Exercise versus control: sensitivity analyses, Outcome 5 Reduction in depression symptoms post‐treatment: studies with blinded outcome assessment.

Analysis 6.6. Comparison 6 Exercise versus control: sensitivity analyses, Outcome 6 Reduction in depression symptoms post‐treatment: allocation concealment, intention‐to‐treat, blinded outcome.

Analysis 6.7. Comparison 6 Exercise versus control: sensitivity analyses, Outcome 7 Reduction in depression symptoms post‐treatment: Lowest dose of exercise.

Summary of findings for the main comparison. Exercise compared to control for adults with depression

Exercise compared to no intervention or placebo for adults with depression

Patient or population: adults with depression
Settings: any setting
Intervention: Exercise
Comparison: no intervention or placebo

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, no intervention or placebo | Illustrative comparative risks* (95% CI): corresponding risk, exercise | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression. Different scales. Follow‐up: post‐treatment | The mean symptoms of depression in the control groups was 0 | The mean symptoms of depression in the intervention groups was 0.62 standard deviations lower (0.81 to 0.42 lower) [1] | 1353 (35 studies) | ⊕⊕⊕⊝ moderate [2,3,4] | SMD ‐0.62 (95% CI: ‐0.81 to ‐0.42). The effect size was interpreted as 'moderate' (using Cohen's rule of thumb) |
| Symptoms of depression (long‐term). Different scales | The mean symptoms of depression (long‐term) in the control groups was 0 | The mean symptoms of depression (long‐term) in the intervention groups was 0.33 standard deviations lower (0.63 to 0.03 lower) | 377 (8 studies) | ⊕⊕⊝⊝ low [4,5] | SMD ‐0.33 (95% CI: ‐0.63 to ‐0.03). The effect size was interpreted as 'small' (using Cohen's rule of thumb) |
| Adverse events | See comment | See comment | 0 (6 studies) | ⊕⊕⊕⊝ moderate | Seven trials reported no difference in adverse events between exercise and usual care groups. Dunn 2005 reported increased severity of depressive symptoms (n = 1), chest pain (n = 1) and joint pain/swelling (n = 1); all these participants discontinued exercise. Singh 1997 reported that 1 exerciser was referred to her psychologist at 6 weeks due to increasing suicidality; and musculoskeletal symptoms in 2 participants required adjustment of training regime. Singh 2005 reported adverse events in detail (visits to a health professional, minor illness, muscular pain, chest pain, injuries requiring training adjustment, falls, deaths and hospital days) and found no difference between the groups. Knubben 2007 reported "no negative effects of exercise (muscle pain, tightness or fatigue)"; after the training had finished, 1 person in the placebo group required gastric lavage and 1 person in the exercise group inflicted a superficial cut on her arm. Sims 2009 reported no adverse events or falls in either the exercise or control group. Blumenthal 2007 reported more side effects in the sertraline group (see comparison below) but there was no difference between the exercise and control group. Blumenthal 2012a reported more fatigue and sexual dysfunction in the sertraline group than the exercise group. |
| Acceptability of treatment | Study population: 865 per 1000 | Study population: 865 per 1000 (839 to 900) | 1363 (29 studies) | ⊕⊕⊕⊝ moderate [2] | RR 1 (95% CI: 0.97 to 1.04) |
| Quality of life | | The mean quality of life in the intervention groups was 0 higher (0 to 0 higher) | 0 (4 studies) | See comment | There were no statistically significant differences for the mental (SMD ‐0.24; 95% CI ‐0.76 to 0.29), psychological (SMD 0.28; 95% CI ‐0.29 to 0.86) and social domains (SMD 0.19; 95% CI ‐0.35 to 0.74). Two studies reported a statistically significant difference for the environment domain favouring exercise (SMD 0.62; 95% CI 0.06 to 1.18) and 4 studies reported a statistically significant difference for the physical domain favouring exercise (SMD 0.45; 95% CI 0.06 to 0.83). |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio;

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1 Effect estimate calculated by re‐expressing the SMD on the Hamilton Depression Rating Scale using the control group SD (7) from Blumenthal 2007 (study chosen for being most representative). The SD was multiplied by the pooled SMD to provide the effect estimate on the HDRS.
2 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 23 studies.
3 I² = 63% and P < 0.00001, indicating moderate levels of heterogeneity.
4 Population size is large, effect size is above 0.2 SD, and the 95% CI does not cross the line of no effect.
5 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 4 studies.
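
Footnote 1 describes re‐expressing the pooled SMD on the Hamilton Depression Rating Scale (HDRS) by multiplying it by a control group SD of 7. The arithmetic, together with the Cohen's rule‐of‐thumb labels quoted in the Comments column, can be sketched in Python as follows. The function names and the cut‐offs (0.2, 0.5, 0.8) are assumptions made for illustration; the review does not state the exact thresholds it applied.

```python
# Illustrative sketch only: names and cut-offs are assumptions, not taken from the review.
CONTROL_SD_HDRS = 7.0  # control group SD quoted in footnote 1 (Blumenthal 2007)


def smd_to_scale_points(smd, sd=CONTROL_SD_HDRS):
    """Re-express a standardised mean difference in points on the original scale."""
    return round(smd * sd, 2)


def cohen_label(smd):
    """Label the magnitude of an SMD using Cohen's conventional cut-offs."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Pooled post-treatment estimate and its 95% CI limits from the table above.
for smd in (-0.62, -0.81, -0.42):
    print(smd, smd_to_scale_points(smd), cohen_label(smd))
```

Under these assumptions the pooled estimate of ‐0.62 corresponds to roughly 4.3 HDRS points in favour of exercise, and the long‐term estimate of ‐0.33 falls in the 'small' band.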

Summary of findings 2. Exercise compared to psychological treatments for adults with depression

Exercise compared to cognitive therapy for adults with depression

Patient or population: adults with depression
Settings:
Intervention: Exercise
Comparison: cognitive therapy

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, cognitive therapy | Illustrative comparative risks* (95% CI): corresponding risk, exercise | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression | | The mean symptoms of depression in the intervention groups was 0.03 standard deviations lower (0.32 lower to 0.26 higher) | 189 (7 studies) | ⊕⊕⊕⊝ moderate [1,2,3] | SMD ‐0.03 (95% CI: ‐0.32 to 0.26) |
| Acceptability of treatment | Study population: 766 per 1000 | Study population: 827 per 1000 (728 to 950) | 172 (4 studies) | ⊕⊕⊕⊝ moderate [1] | RR 1.08 (95% CI: 0.95 to 1.24) |
| Quality of life | | The mean quality of life in the intervention groups was 0 higher (0 to 0 higher) | 0 (1 study) | ⊕⊕⊕⊝ moderate [1] | One trial reported changes in the Minnesota Living with Heart Failure Questionnaire, a quality of life measure (Gary 2010). There was no statistically significant difference for the physical domain (MD 0.15; 95% CI: ‐7.40 to 7.70) or the mental domain (MD ‐0.09; 95% CI: ‐9.51 to 9.33). |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio;

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 7 studies.
2 I² = 0% and P = 0.62, indicating no heterogeneity.
3 The studies included were all relevant to the review question, particularly given that all studies had to meet the criteria of the ACSM definition of exercise.
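
The asterisked note above states that the corresponding risk is derived from the assumed risk and the relative effect. A minimal sketch of that calculation, using the acceptability figures from this table (766 per 1000, RR 1.08, 95% CI 0.95 to 1.24), is given below. The function name is hypothetical, and rounding to whole participants per 1000 is an assumption about how the table values were produced.

```python
# Illustrative sketch: function name and rounding rule are assumptions.
def corresponding_risk(assumed_per_1000, rr, rr_low, rr_high):
    """Scale an assumed (comparison group) risk by a risk ratio and its CI limits."""
    return tuple(round(assumed_per_1000 * factor) for factor in (rr, rr_low, rr_high))


point, low, high = corresponding_risk(766, 1.08, 0.95, 1.24)
print(f"{point} per 1000 ({low} to {high})")  # 827 per 1000 (728 to 950)
```

The result reproduces the corresponding risk shown in the table. Small discrepancies can arise elsewhere when the published RR has itself been rounded to two decimal places.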

Summary of findings 3. Exercise compared to bright light therapy for adults with depression

Exercise compared to bright light therapy for adults with depression

Patient or population: adults with depression
Settings:
Intervention: Exercise
Comparison: bright light therapy

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, bright light therapy | Illustrative comparative risks* (95% CI): corresponding risk, exercise | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression | | The mean symptoms of depression in the intervention groups was 6.4 lower (10.2 to 2.6 lower) | 18 (1 study) | ⊕⊝⊝⊝ very low [1,2,3] | MD ‐6.40 (95% CI: ‐10.20 to ‐2.60). Although this trial suggests a benefit of exercise, it is too small to draw firm conclusions |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval;

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were not reported. Also sequence generation and concealment were considered unclear.
2 The study included was relevant to the review question, particularly given that all studies had to meet the criteria of the ACSM definition of exercise.
3 Based on 18 people

Summary of findings 4. Exercise compared to pharmacological treatments for adults with depression

Exercise compared to antidepressants for adults with depression

Patient or population: adults with depression
Settings:
Intervention: Exercise
Comparison: antidepressants

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, antidepressants | Illustrative comparative risks* (95% CI): corresponding risk, exercise | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|
| Symptoms of depression | | The mean symptoms of depression in the intervention groups was 0.11 standard deviations lower (0.34 lower to 0.12 higher) | 300 (4 studies) | ⊕⊕⊕⊝ moderate [1,2,3] | SMD ‐0.11 (95% CI: ‐0.34 to 0.12) |
| Acceptability of treatment | Study population: 891 per 1000 | Study population: 873 per 1000 (766 to 997) | 278 (3 studies) | ⊕⊕⊕⊝ moderate [1] | RR 0.98 (95% CI: 0.86 to 1.12) |
| Quality of life | | The mean quality of life in the intervention groups was 0 higher (0 to 0 higher) | 0 (1 study) | ⊕⊕⊕⊝ moderate [1] | One trial, Brenes 2007, reported no difference in change in SF‐36 mental health and physical health components between medication and exercise groups. |
| Adverse events | See comment | See comment | 0 (3 studies) | ⊕⊕⊕⊝ moderate [1] | Blumenthal 1999 reported that 3/53 in exercise group suffered musculoskeletal injuries; injuries in the medication group were not reported. Blumenthal 2007 collected data on side effects by asking participants to rate a 36‐item somatic symptom checklist and reported that "a few patients reported worsening of symptoms"; of the 36 side effects assessed, only 1 showed a statistically significant group difference (P = 0.03), i.e. that the sertraline group reported worse post‐treatment diarrhoea and loose stools. Blumenthal 2012a assessed 36 side effects; only 2 showed a significant group difference: 20% of participants receiving sertraline reported worse post‐treatment fatigue compared with 2.4% in the exercise group and 26% reported increased sexual problems compared with 2.4% in the exercise group. |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: Confidence interval; RR: Risk ratio;

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1 Lack of blinding of outcome assessors probably increased effect sizes and drop‐out rates were high. Also sequence generation was considered unclear in 1 study.
2 I² = 0% and P = 0.52, indicating no heterogeneity.
3 The studies included were all relevant to the review question, particularly given that all studies had to meet the criteria of the ACSM definition of exercise.
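
Footnote 2 reports heterogeneity as an I² value. As a rough sketch of how I² relates to Cochran's Q, the Python below computes both from per‐study effect estimates and variances. The four effect sizes and variances are hypothetical placeholders, not data from the included trials, and the function name is likewise an assumption.

```python
# Illustrative sketch: the inputs below are hypothetical, not trial data from the review.
def i_squared(effects, variances):
    """Return Cochran's Q, its degrees of freedom and I² (as a percentage)."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        # Observed variation is no greater than expected by chance: I² is truncated at 0%.
        return q, df, 0.0
    return q, df, 100.0 * (q - df) / q


hypothetical_effects = [-0.30, 0.10, -0.05, 0.15]
hypothetical_variances = [0.06, 0.08, 0.05, 0.09]
q, df, i2 = i_squared(hypothetical_effects, hypothetical_variances)
print(f"Q = {q:.2f} on {df} df, I² = {i2:.0f}%")
```

An I² of 0% therefore means only that Q did not exceed its degrees of freedom; with few small studies it does not rule out real differences between trials.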

Table 1. Number screened; number still in trial and exercise intervention at end of trial

| Trial ID | Screened | Randomised | Allocated exercise | Completed trial | Completed comparator group, e.g. control, other treatment (as a proportion of those allocated) | Completed exercise (as a proportion of those allocated) |
|---|---|---|---|---|---|---|
| Blumenthal 1999 | 604 underwent telephone screening | 156 | 55 | 133 | 41/48 (medication) | 44/55 (exercise plus medication); 39/53 (exercise alone) |
| Blumenthal 2007 | 457 | 202 | 51 (supervised), 53 home‐based | 183 | 42/49 (placebo); 45/49 (sertraline) | 45/51 (supervised), 51/53 home‐based |
| Blumenthal 2012b | 1680 enquired about the study | 101 | 37 | 95 | 23/24 completed 'placebo' and 36/40 completed the medication | 36/37 completed the exercise |
| Brenes 2007 | Not reported | 37 | 14 | Not reported | Not reported | Not reported |
| Bonnet 2005 | Not reported | 11 | 5 | 7 | 4/6 | 3/5 |
| Chu 2008 | 104 responded to adverts | 54 | 36 | 38 | 12/18 | 26/36 (both exercise arms combined); 15/18 in the high‐intensity arm |
| Dunn 2005 | 1664 assessed for eligibility | 80 | 17 | 45 | 9/13 | 11/17 (public health dose 3 times per week) |
| Doyne 1987 | 285 responded to adverts | 57 | Not reported | 40 completed treatment or control | 27 (denominator not known) | 13 (denominator not known) |
| Epstein 1986 | 250 telephone inquiries received | 33 | 7 | Not reported | Not reported | 7 |
| Fetsch 1979 | Not reported | 21 | 10 | 16 | 8/11 | 8/10 |
| Foley 2008 | 215 responded to adverts | 23 | 10 | 13 | 5/13 | 8/10 |
| Fremont 1987 | 72 initially expressed an interest | 61 | 21 | 49 | 31/40 | 18/21 |
| Gary 2010 | 982 referred, 242 had heart failure, 137 had a BDI > 10 and 74 eligible and consented | 74 | 20 | 68/74 completed post‐intervention assessments and 62 completed follow‐up assessments | usual care 15/17 | exercise only: 20/20 |
| Greist 1979 | Not reported | 28 | 10 | 22 | 15/18 | 8/10 |
| Hemat‐Far 2012 | 350 screened | 20 | 10 | 20 | not stated | not stated |
| Hess‐Homeier 1981 | Not reported | 17 | 5 | Not reported | Not reported | Not reported |
| Hoffman 2010 | 253 screened, 58 ineligible | 84 | 42 | 76 | 39/42 (2 were excluded by the trialists and 1 did not attend follow‐up) | 37/42 of exercise group provided data for analysis |
| Klein 1985 | 209 responded to an advertisement | 74 | 27 | 42 | 11/23 (meditation); 16/24 (group therapy) | 15/27 |
| Knubben 2007 | Not reported | 39 (note data on only 38 reported) | 20 | 35 | 16/18 | 19/20 |
| Krogh 2009 | 390 referred | 165 | 110 | 137 | 42/55 | 95/110 (both exercise arms combined); 47/55 (strength); 48/55 (aerobic) |
| Martinsen 1985 | Not reported | 43 | 24 | 37 | 17/19 | 20/24 |
| Mather 2002 | 1185 referred or screened | 86 | 43 | 86 | 42/43 | 43/43 |
| McCann 1984 | 250 completed BDI, 60 contacted | 47 | 16 | 43 | 14/15 completed placebo; 14/16 completed 'no treatment' | 15/16 |
| McNeil 1991 | 82 | 30 | 10 | 30 | 10/10 (waiting list); 10/10 (social contact) | 10/10 |
| Mota‐Pereira 2011 | 150 | 33 | 22 | 29/33 | 10/11 | 19/22 |
| Mutrie 1988 | 36 | 24 | 9 | 24 | 7/7 | 9/9 |
| Nabkasorn 2005 | 266 volunteers screened | 59 | 28 | 49 | 28/31 | 21/28 |
| Orth 1979 | 17 | 11 | 3 | 7 | 2/2 | 3/3 |
| Pilu 2007 | Not reported | 30 | 10 | 30 | 20/20 | 10/10 |
| Pinchasov 2000 | Not reported | 18 | 9 | Not reported | Not reported | Not reported |
| Reuter 1984 | Not reported | Not reported | 9 | Not reported | Not reported | 9 |
| Schuch 2011 | 14/40 invited patients were not interested in participating | 26 | 15 | "no patient withdrew from intervention" | "no patient withdrew from intervention" | "no patient withdrew from intervention" |
| Setaro 1985 | 211 responses to advertisement | 180 | 30 | 150 | Not reported | 25/30 |
| Shahidi 2011 | 70 older depressed women chosen from 500 members of a district using the geriatric depression scale | 70 | 23 | 60/70 | 20/24 | 20/23 |
| Sims 2009 | 1550 invitations, 233 responded | 45 | 23 | 43 | 22/22 | 21/23 |
| Singh 1997 | Letters sent to 2953 people, 884 replied | 32 | 17 | 32 | 15/15 | 17/17 |
| Singh 2005 | 451 | 60 | 20 | 54 | 19/20 (GP standard care) | 18/20 (high‐intensity training) |
| Veale 1992 | Not reported | 83 | 48 | 57 | 29/35 | 36/48 |
| Williams 2008 | 96 in parent study | 43 | 33 | 34 | 8/10 | 26/33 (both exercise groups combined); 15/16 exercise; 11/17 walking |

BDI: Beck Depression Inventory
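
The completion counts in Table 1 are the raw material for the 'completed intervention or control' risk ratios. As a hedged illustration, the sketch below computes a single‐trial risk ratio with a 95% CI on the log scale, using the Krogh 2009 counts from the table (95/110 completing exercise, 42/55 completing the comparator). This is not the pooled Mantel‐Haenszel random‐effects analysis used in the review, and the function name is hypothetical.

```python
# Illustrative sketch: a single-trial risk ratio, not the review's pooled analysis.
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a log-scale (Wald) confidence interval."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


# Krogh 2009: completers / allocated, exercise arms combined versus comparator.
rr, low, high = risk_ratio(95, 110, 42, 55)
print(f"RR {rr:.2f} (95% CI: {low:.2f} to {high:.2f})")
```

The formula breaks down when a group has zero completers, in which case a continuity correction would be needed; none of the rows with full counts in Table 1 has that problem.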

Comparison 1. Exercise versus 'control'

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Reduction in depression symptoms post‐treatment | 35 | 1353 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.62 [‐0.81, ‐0.42] |
| 2 Reduction in depression symptoms follow‐up | 8 | 377 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.33 [‐0.63, ‐0.03] |
| 3 Completed intervention or control | 29 | 1363 | Risk Ratio (M‐H, Random, 95% CI) | 1.00 [0.97, 1.04] |
| 4 Quality of life | 4 | | Std. Mean Difference (IV, Fixed, 95% CI) | Subtotals only |
| 4.1 Mental | 2 | 59 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.24 [‐0.76, 0.29] |
| 4.2 Psychological | 2 | 56 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.28 [‐0.29, 0.86] |
| 4.3 Social | 2 | 56 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.19 [‐0.35, 0.74] |
| 4.4 Environment | 2 | 56 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.62 [0.06, 1.18] |
| 4.5 Physical | 4 | 115 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.45 [0.06, 0.83] |

Comparison 2. Exercise versus psychological therapies

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Reduction in depression symptoms post‐treatment | 7 | 189 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.03 [‐0.32, 0.26] |
| 2 Completed exercise or psychological therapies | 4 | 172 | Risk Ratio (M‐H, Random, 95% CI) | 1.08 [0.95, 1.24] |
| 3 Quality of life | 1 | | Mean Difference (IV, Fixed, 95% CI) | Totals not selected |
| 3.1 Physical | 1 | | Mean Difference (IV, Fixed, 95% CI) | 0.0 [0.0, 0.0] |
| 3.2 Mental | 1 | | Mean Difference (IV, Fixed, 95% CI) | 0.0 [0.0, 0.0] |

Comparison 3. Exercise versus bright light therapy

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Reduction in depression symptoms post‐treatment | 1 | 18 | Mean Difference (IV, Fixed, 95% CI) | ‐6.40 [‐10.20, ‐2.60] |

Comparison 4. Exercise versus pharmacological treatments

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Reduction in depression symptoms post‐treatment | 4 | 300 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.11 [‐0.34, 0.12] |
| 2 Completed exercise or antidepressants | 3 | 278 | Risk Ratio (M‐H, Random, 95% CI) | 0.98 [0.86, 1.12] |
| 3 Quality of life | 1 | | Mean Difference (IV, Fixed, 95% CI) | Subtotals only |
| 3.1 Mental | 1 | 25 | Mean Difference (IV, Fixed, 95% CI) | ‐11.90 [‐24.04, 0.24] |
| 3.2 Physical | 1 | 25 | Mean Difference (IV, Fixed, 95% CI) | 1.30 [‐0.67, 3.27] |

Comparison 5. Reduction in depression symptoms post‐treatment: Subgroup analyses

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Exercise vs control subgroup analysis: type of exercise | 35 | | Std. Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 1.1 Aerobic exercise | 28 | 1080 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.55 [‐0.77, ‐0.34] |
| 1.2 Mixed exercise | 3 | 128 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.85 [‐1.85, 0.15] |
| 1.3 Resistance exercise | 4 | 144 | Std. Mean Difference (IV, Random, 95% CI) | ‐1.03 [‐1.52, ‐0.53] |
| 2 Exercise vs control subgroup analysis: intensity | 35 | | Std. Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 2.1 Light/moderate | 3 | 76 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.83 [‐1.32, ‐0.34] |
| 2.2 Moderate | 12 | 343 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.64 [‐1.01, ‐0.28] |
| 2.3 Hard | 11 | 595 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.56 [‐0.93, ‐0.20] |
| 2.4 Vigorous | 5 | 230 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.77 [‐1.30, ‐0.24] |
| 2.5 Moderate/hard | 2 | 66 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.63 [‐1.13, ‐0.13] |
| 2.6 Moderate/vigorous | 2 | 42 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.38 [‐1.61, 0.85] |
| 3 Exercise vs control subgroup analysis: number of sessions | 35 | | Std. Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 3.1 0 ‐ 12 sessions | 5 | 195 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.42 [‐1.26, 0.43] |
| 3.2 13 ‐ 24 sessions | 9 | 296 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.70 [‐1.09, ‐0.31] |
| 3.3 25 ‐ 36 sessions | 8 | 264 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.80 [‐1.30, ‐0.29] |
| 3.4 37+ sessions | 10 | 524 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.46 [‐0.69, ‐0.23] |
| 3.5 Unclear | 3 | 73 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.89 [‐1.39, ‐0.40] |
| 4 Exercise vs control subgroup analysis: diagnosis of depression | 35 | | Std. Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 4.1 Clinical diagnosis of depression | 23 | 967 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.57 [‐0.81, ‐0.32] |
| 4.2 Depression categorised according to cut points on a scale | 11 | 367 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.67 [‐0.95, ‐0.39] |
| 4.3 Unclear | 1 | 18 | Std. Mean Difference (IV, Random, 95% CI) | ‐2.00 [‐3.19, ‐0.82] |
| 5 Exercise vs control subgroup analysis: type of control | 35 | 1353 | Mean Difference (IV, Fixed, 95% CI) | ‐1.57 [‐1.97, ‐1.16] |
| 5.1 Placebo | 2 | 156 | Mean Difference (IV, Fixed, 95% CI) | ‐2.66 [‐4.58, ‐0.75] |
| 5.2 No treatment, waiting list, usual care, self monitoring | 17 | 563 | Mean Difference (IV, Fixed, 95% CI) | ‐4.75 [‐5.72, ‐3.78] |
| 5.3 Exercise plus treatment vs treatment | 6 | 225 | Mean Difference (IV, Fixed, 95% CI) | ‐1.22 [‐2.21, ‐0.23] |
| 5.4 Stretching, meditation or relaxation | 6 | 219 | Mean Difference (IV, Fixed, 95% CI) | ‐0.09 [‐0.65, 0.48] |
| 5.5 Occupational intervention, health education, casual conversation | 4 | 190 | Mean Difference (IV, Fixed, 95% CI) | ‐3.67 [‐4.94, ‐2.41] |
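
To show how subgroup estimates such as those for type of exercise can be compared, the sketch below recovers each subgroup's standard error from its reported 95% CI and computes a chi‐squared statistic for differences between subgroups. It is an illustration built only from the three rounded subgroup results in the table (aerobic, mixed, resistance), not a reproduction of the review's own test; the function names are hypothetical, and the closed‐form P value used here holds only for 2 degrees of freedom.

```python
# Illustrative sketch: uses rounded table values, so results are approximate.
import math


def se_from_ci(low, high, z=1.96):
    """Recover a standard error from the limits of a 95% confidence interval."""
    return (high - low) / (2 * z)


def subgroup_difference_q(subgroups):
    """Chi-squared statistic for differences between subgroup estimates."""
    weights = [1.0 / se_from_ci(low, high) ** 2 for _, low, high in subgroups]
    estimates = [est for est, _, _ in subgroups]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return q, len(subgroups) - 1


# (SMD, lower limit, upper limit) for aerobic, mixed and resistance exercise.
type_of_exercise = [(-0.55, -0.77, -0.34), (-0.85, -1.85, 0.15), (-1.03, -1.52, -0.53)]
q, df = subgroup_difference_q(type_of_exercise)
# For exactly 2 degrees of freedom the chi-squared upper tail is exp(-q / 2).
p_value = math.exp(-q / 2) if df == 2 else None
print(f"Q = {q:.2f} on {df} df, P = {p_value:.2f}")
```

For other degrees of freedom a chi‐squared survival function from a statistics library would be needed in place of the closed form.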

Comparison 6. Exercise versus control: sensitivity analyses

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Reduction in depression symptoms post‐treatment: peer‐reviewed journal publications and doctoral theses only | 34 | 1335 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.59 [‐0.78, ‐0.40] |
| 2 Reduction in depression symptoms post‐treatment: studies published as abstracts or conference proceedings only | 1 | 18 | Std. Mean Difference (IV, Random, 95% CI) | ‐2.00 [‐3.19, ‐0.82] |
| 3 Reduction in depression symptoms post‐treatment: studies with adequate allocation concealment | 14 | 829 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.49 [‐0.75, ‐0.24] |
| 4 Reduction in depression symptoms post‐treatment: studies using intention‐to‐treat analysis | 11 | 567 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.61 [‐1.00, ‐0.22] |
| 5 Reduction in depression symptoms post‐treatment: studies with blinded outcome assessment | 12 | 658 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.36 [‐0.60, ‐0.12] |
| 6 Reduction in depression symptoms post‐treatment: allocation concealment, intention‐to‐treat, blinded outcome | 6 | 464 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.18 [‐0.47, 0.11] |
| 7 Reduction in depression symptoms post‐treatment: lowest dose of exercise | 35 | 1347 | Std. Mean Difference (IV, Fixed, 95% CI) | ‐0.44 [‐0.55, ‐0.33] |
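
One way to read the sensitivity analyses is to ask which restricted estimates still exclude no effect. The short sketch below applies that check to four of the rows above (outcomes 1, 3, 5 and 6); the dictionary keys are shorthand labels introduced here for illustration, not titles used in the review.

```python
# Illustrative sketch: labels are shorthand; values are SMD and 95% CI limits from the table.
sensitivity_results = {
    "peer-reviewed or thesis": (-0.59, -0.78, -0.40),
    "adequate allocation concealment": (-0.49, -0.75, -0.24),
    "blinded outcome assessment": (-0.36, -0.60, -0.12),
    "concealment + ITT + blinding": (-0.18, -0.47, 0.11),
}


def excludes_no_effect(low, high):
    """True when the confidence interval lies entirely on one side of zero."""
    return low > 0 or high < 0


for label, (smd, low, high) in sensitivity_results.items():
    verdict = "excludes 0" if excludes_no_effect(low, high) else "includes 0"
    print(f"{label}: SMD {smd:.2f} [{low:.2f}, {high:.2f}] {verdict}")
```

Of these four rows, only the analysis restricted to trials meeting all three quality criteria has an interval that includes zero.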
