Psychosocial interventions for people with both severe mental illness and substance misuse

Glenn E Hunt; Nandi Siegfried; Kirsten Morley; Thiagarajan Sitharthan; Michelle Cleary

doi:10.1002/14651858.CD001088.pub3

Version published: 03 October 2013

Abstract

Background

Even low levels of substance misuse by people with a severe mental illness can have detrimental effects.

Objectives

To assess the effects of psychosocial interventions for reduction in substance use in people with a serious mental illness compared with standard care.

Search methods

For this update (2013), the Trials Search Co‐ordinator of the Cochrane Schizophrenia Group (CSG) searched the CSG Trials Register (July 2012), which is based on regular searches of major medical and scientific databases. The principal authors conducted two further searches (8 October 2012 and 15 January 2013) of the Cochrane Database of Systematic Reviews, MEDLINE and PsycINFO. A separate search for trials of contingency management was completed as this was an additional intervention category for this update.

Selection criteria

We included all randomised controlled trials (RCTs) comparing psychosocial interventions for substance misuse with standard care in people with serious mental illness.

Data collection and analysis

We independently selected studies, extracted data and appraised study quality. For binary outcomes, we calculated standard estimates of relative risk (RR) and their 95% confidence intervals (CI) on an intention‐to‐treat basis. For continuous outcomes, we calculated the mean difference (MD) between groups. For all meta‐analyses we pooled data using a random‐effects model. Using the GRADE approach, we identified seven patient‐centred outcomes and assessed the quality of evidence for these within each comparison.
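The effect measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the review's own analysis code: the function names are hypothetical, the confidence intervals use the usual normal approximation (on the log scale for RR), and the DerSimonian‐Laird estimator of between‐trial variance is an assumption, since the review states only that a random‐effects model was used.

```python
import math

Z_95 = 1.959963984540054  # normal quantile for a two-sided 95% interval


def log_risk_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Log risk ratio and its standard error from one trial's 2x2 counts.

    Assumes both arms have at least one event (no zero-cell correction).
    """
    log_rr = math.log((events_trt / n_trt) / (events_ctl / n_ctl))
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return log_rr, se


def risk_ratio_ci(events_trt, n_trt, events_ctl, n_ctl):
    """Risk ratio with a 95% CI, computed on the log scale."""
    log_rr, se = log_risk_ratio(events_trt, n_trt, events_ctl, n_ctl)
    return (math.exp(log_rr),
            math.exp(log_rr - Z_95 * se),
            math.exp(log_rr + Z_95 * se))


def mean_difference_ci(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl):
    """Mean difference (treatment minus control) with a 95% CI."""
    md = mean_trt - mean_ctl
    se = math.sqrt(sd_trt ** 2 / n_trt + sd_ctl ** 2 / n_ctl)
    return md, md - Z_95 * se, md + Z_95 * se


def pool_random_effects(estimates, std_errors):
    """Pool per-trial estimates (log RR or MD) with DerSimonian-Laird weights.

    Returns (pooled, lower, upper, tau2), all on the scale of the inputs.
    """
    w = [1 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-trial variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled, tau2


# Hypothetical counts, not taken from any included trial.
rr, lower, upper = risk_ratio_ci(10, 100, 20, 100)
print(f"RR {rr:.2f} CI {lower:.2f} to {upper:.2f}")
```

To pool risk ratios with this sketch, pass each trial's log RR and standard error to `pool_random_effects`, then exponentiate the pooled estimate and its limits.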

Main results

We included 32 trials with a total of 3165 participants. Evaluation of long‐term integrated care included four RCTs (n = 735). We found no significant differences on loss to treatment (n = 603, 3 RCTs, RR 1.09 CI 0.82 to 1.45, low quality of evidence), death by 3 years (n = 421, 2 RCTs, RR 1.18 CI 0.39 to 3.57, low quality of evidence), alcohol use (not in remission at 36 months) (n = 143, 1 RCT, RR 1.15 CI 0.84 to 1.56, low quality of evidence), substance use (n = 85, 1 RCT, RR 0.89 CI 0.63 to 1.25, low quality of evidence), global assessment of functioning (n = 171, 1 RCT, MD 0.7 CI ‐2.07 to 3.47, low quality of evidence), or general life satisfaction (n = 372, 2 RCTs, MD 0.02 CI ‐0.28 to 0.32, moderate quality of evidence).

For evaluation of non‐integrated intensive case management with usual treatment (4 RCTs, n = 163) we found no statistically significant difference for loss to treatment at 12 months (n = 134, 3 RCTs, RR 1.21 CI 0.73 to 1.99, very low quality of evidence).

Motivational interviewing plus cognitive behavioural therapy compared to usual treatment (7 RCTs, total n = 878) did not reveal any advantage for retaining participants at 12 months (n = 327, 1 RCT, RR 0.99 CI 0.62 to 1.59, low quality of evidence) or for death (n = 493, 3 RCTs, RR 0.72 CI 0.22 to 2.41, low quality of evidence), and no benefit for reducing substance use (n = 119, 1 RCT, MD 0.19 CI ‐0.22 to 0.6, low quality of evidence), relapse (n = 36, 1 RCT, RR 0.5 CI 0.24 to 1.04, very low quality of evidence) or global functioning (n = 445, 4 RCTs, MD 1.24 CI ‐1.86 to 4.34, very low quality of evidence).

Cognitive behavioural therapy alone compared with usual treatment (2 RCTs, n = 152) showed no significant difference for losses from treatment at 3 months (n = 152, 2 RCTs, RR 1.12 CI 0.44 to 2.86, low quality of evidence). No benefits were observed on measures of lessening cannabis use at 6 months (n = 47, 1 RCT, RR 1.30 CI 0.79 to 2.15, very low quality of evidence) or mental state (n = 105, 1 RCT, Brief Psychiatric Rating Scale MD 0.52 CI ‐0.78 to 1.82, low quality of evidence).

We found no advantage for motivational interviewing alone compared with usual treatment (8 RCTs, n = 509) in reducing losses to treatment at 6 months (n = 62, 1 RCT, RR 1.71 CI 0.63 to 4.64, very low quality of evidence), although significantly more participants in the motivational interviewing group reported for their first aftercare appointment (n = 93, 1 RCT, RR 0.69 CI 0.53 to 0.9). Some differences, favouring treatment, were observed in abstaining from alcohol (n = 28, 1 RCT, RR 0.36 CI 0.17 to 0.75, very low quality of evidence) but not other substances (n = 89, 1 RCT, MD ‐0.07 CI ‐0.56 to 0.42, very low quality of evidence), and no differences were observed in mental state (n = 30, 1 RCT, MD ‐0.19 CI ‐0.59 to 0.21, very low quality of evidence).

We found no significant differences for skills training in the numbers lost to treatment by 12 months (n = 94, 2 RCTs, RR 0.70 CI 0.44 to 1.1, very low quality of evidence).

We found no differences for contingency management compared with usual treatment (2 RCTs, n = 206) in numbers lost to treatment at 3 months (n = 176, 1 RCT, RR 1.65 CI 1.18 to 2.31, low quality of evidence), number of stimulant positive urine tests at 6 months (n = 176, 1 RCT, RR 0.83 CI 0.65 to 1.06, low quality of evidence) or hospitalisations (n = 176, 1 RCT, RR 0.21 CI 0.05 to 0.93, low quality of evidence).

We were unable to summarise all findings due to skewed data or because trials did not measure the outcome of interest. In general, evidence was rated as low or very low due to high or unclear risks of bias because of poor trial methods, or poorly reported methods, and imprecision due to small sample sizes, low event rates and wide confidence intervals.

Authors' conclusions

We included 32 RCTs and found no compelling evidence to support any one psychosocial treatment over another for people to remain in treatment or to reduce substance use or improve mental state in people with serious mental illnesses. Furthermore, methodological difficulties exist which hinder pooling and interpreting results. Further high quality trials are required which address these concerns and improve the evidence in this important area.

Plain language summary

Psychosocial interventions for people with both severe mental illness and substance misuse

‘Dual diagnosis’ is the term used to describe people who have a mental health problem and also have problems with drugs or alcohol. In some areas, over 50% of all those with mental health difficulties will have problems with drugs or alcohol. For people with mental illness, substance misuse often has a negative and damaging effect on the symptoms of their illness and the way their medication works. They may become aggressive or engage in activities that are illegal. Substance misuse can also increase risk of suicide, hepatitis C, HIV, relapse, incarceration and homelessness.

People who have substance misuse problems but no mental illness can be treated via a variety of psychosocial interventions. These include motivational interviewing, or MI, that looks at people’s motivation for change; cognitive behavioural therapy, or CBT, which helps people adapt their behaviour by improving coping strategies; a supportive approach similar to that pioneered by Alcoholics Anonymous; family psycho‐education observing the signs and effects of substance misuse; and group or individual skills training. However, using these interventions for people with dual diagnosis is more complex.

The aim of this review was to assess the effects of psychosocial interventions for substance reduction in people with a serious mental illness compared to care as usual or standard care. A search for studies was carried out in July 2012; 32 studies were included in the review with a total of 3165 people. These studies used a variety of different psychosocial interventions (including CBT, MI, skills training, integrated models of care). In the main, evidence was graded as low or very low quality and no study showed any great difference between psychosocial interventions and treatment as usual. There was no compelling evidence to support any one psychosocial treatment over another. However, differences in study designs made comparisons between studies problematic. Studies also had high numbers of people leaving early, differences in outcomes measured, and differing ways in which the psychosocial interventions were delivered. More large scale, high quality and better reported studies are required to address these shortcomings. This will better address whether psychosocial interventions are effective and good for people with mental illness and substance misuse problems.

This plain language summary has been written by a consumer, Ben Gray from RETHINK.

Authors' conclusions

Implications for practice

This review is larger than the previous or original review (32 as opposed to 25 and six studies, respectively), although all three have similar results. The findings reveal no compelling evidence to support any one psychosocial treatment to reduce substance use or to improve mental state for people with severe mental illnesses. Some support for substance use reduction came from one small study assessing motivational interviewing, where more participants receiving this treatment abstained from alcohol. Further, more participants receiving motivational interviewing attended their first aftercare appointment. In combination with cognitive behavioural therapy, motivational interviewing also improved mental state, life satisfaction and social functioning. Little support was found for integrated, non‐integrated, or skills training programmes being superior to standard care. A recent study (McDonell 2013) reported reduced stimulant use in homeless people randomised to contingency management. This intervention was combined in another study (Bellack 2006) with motivational interviewing and cognitive behavioural therapy, with some positive outcomes.

However, methodological difficulties exist which hinder pooling and interpreting results and include high attrition rates; varying fidelity of interventions; varying outcome measures, settings and samples (sample size, participant level of substance use, motivation to change, diagnoses, age, gender, cultural, socioeconomic and contextual influences); and, in some cases, comparison groups may have received higher levels of treatment than usual standard care. Therefore, it is not yet possible to reach clear conclusions, although it is pleasing to see that the field is developing with an increase in high quality randomised controlled trials offering high‐fidelity programmes and reporting more usable data. However, the largest trial to date (Barrowclough 2010) did not find that motivational interviewing combined with cognitive behavioural therapy significantly improved patient outcomes.

1. For people with severe mental illness and substance misuse problems, and their carers

People with both severe mental health and substance misuse problems should be aware that at present there is little evidence to support any particular psychosocial intervention over another. This does not mean that particular treatments do not help, but that data are few and the little supportive evidence found in these studies should be replicated. No‐one can suggest to people entering a service that one form of support should really take precedence over another.

2. For clinicians

Clinicians need to keep up‐to‐date on the latest research findings in this area because as new trials are published, the evidence base should rapidly build to support particular interventions for this challenging group of patients. Interventions for substance reduction may need to be further developed and adapted for people with a serious mental illness. Clinicians who seek to offer existing interventions over and above standard care should take the opportunity to work with trial researchers to generate useful data.

3. For policy makers and commissioners of care

Developments in specific treatments and in models of service delivery are still taking place. While there is no evidence that the innovative integrated services that have been developed in the USA are helpful, there is also no convincing evidence that they lead to a worse outcome. This lack of a demonstrated effect may be a function of methodological problems within the studies or it may be that there is, in fact, no effect. The development of such services may be unlikely in other countries, such as the UK, where the general policy is to build on the existing links and to use mainstream services as far as possible (Seivewright 2005). Policies in this difficult area are needed. These policies should either be based on good evidence or generate the relevant evidence through their implementation.

Implications for research

1. General

1.1 Reporting of outcome measures

Only validated and non‐adapted scales should be used in future trials. Clear reporting of data during treatment and at various follow‐up periods with an indication that they meet the assumptions of the analyses undertaken would be helpful. Wherever possible, dichotomous data should be reported in addition to continuous data, as the use of outcomes such as retention in treatment, relapse, hospitalisation and abstinence rates are relevant to the topic and are preferable to reporting skewed data (Jones 2004).

1.2 Methodology

Clear and strict adherence to the CONSORT statement (Altman 2001; Begg 1996; Moher 1998; Turpin 2005) for methodology and all outcomes should be the goal of future trials. A full description of the number of participants lost to treatment and evaluation after the randomisation process should be completed at each time point for both treatment arms. A clear description of the randomisation process and blinding is also not difficult and is now necessary. The use of intention‐to‐treat analysis can assist with minimising bias resulting from missing data. Double‐blind evaluation of outcomes of psychosocial interventions is not possible due to the nature of the intervention. However, researchers should take every precaution to minimise the effect of bias by at least using raters blind to group assignment.

2. Specific

Consistent with our suggestions for more quality randomised controlled trials, other recently published reviews advocate a need for more consistent and methodologically rigorous trials on this topic to test both individual components and integrated programmes (Donald 2005; Drake 2004; Lubman 2010; Mueser 2005; Murthy 2012; Tiet 2007). Also worth noting are recent treatment recommendations on psychosocial interventions for substance reduction modified for people with a mental illness (Baker 2012; Dixon 2010; Kelly 2012; NICE 2011; Work Group 2007; Ziedonis 2005).

Future high quality trials in this area will contribute to the growing body of data and will allow future reviews to tease out findings. Assessing brief interventions (such as motivational interviewing) over standard care will allow the identification of cost‐effective and easy to implement components that can be quickly integrated into standard care. New trials should aim to recruit sufficiently large sample sizes and collect data that can be reported and, if appropriate, synthesised in meta‐analyses. Informed consent of participants should include statements that all anonymous data will be publicly available. The use of measurement scales should be of clinical value, in common use, and have demonstrated reliability and validity. We suggest a design for a future trial with the key methodological points highlighted in Table 2. Future reviews may explore differences between subgroups (determined a priori), such as differences between levels of substance use (misuse versus dependence), differences between substances used, and differences between age groups (for example, first episode schizophrenia versus older patients).

Table 2. Suggested design for trial

Methods	Allocation: centralised sequence generation with table of random numbers or computer generated code, stratified by severity of substance use. Sequence concealed until interventions assigned. Blinding: those recruiting and assigning participants, those assessing outcomes will be blind to treatment allocation. Duration: minimum of 1 year.
Participants	Diagnosis: Severe mental illness based on a diagnosis of schizophrenia, schizoaffective disorder, and other psychotic disorders. N=440* recruited to obtain a minimum sample of 280 at 12 months given the high drop‐out rate for some of the outcome measures. Age: adults 18‐55 years. Sex: men and women. Setting: hospital and community.
Interventions	1. Standard care plus 3‐5 sessions of motivational interviewing + 3 months of weekly CBT. 2. Standard care plus one motivational interview.
Outcomes	Lost to treatment. Death. Substance use: number of patients using substances, OTI. Mental state: BPRS. Relapse: number of patients readmitted. Quality of life: BQOL. Functioning: GAF. Arrests.
Notes	* size of study to detect a 10% difference in improvement with 80% certainty. If scales are used to measure outcome then there should be binary cut‐off points, defined before study start, of clinically important improvement.

Summary of findings

Summary of findings for the main comparison. INTEGRATED MODELS OF CARE compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

INTEGRATED MODELS OF CARE compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: People with both severe mental illness and substance misuse Settings: Outpatient Intervention: INTEGRATED MODELS OF CARE Comparison: TREATMENT AS USUAL
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	INTEGRATED MODELS OF CARE
Lost to treatment Follow‐up: mean 36 months	212 per 1000	231 per 1000 (174 to 308)	RR 1.09 (0.82 to 1.45)	603 (3 studies)	⊕⊕⊝⊝ low^1,2	Data were available for 36 months only
Death Follow‐up: mean 36 months	28 per 1000	33 per 1000 (11 to 101)	RR 1.18 (0.39 to 3.57)	421 (2 studies)	⊕⊕⊝⊝ low^3,4	Data were available for 36 months only
Alcohol use: Not in remission Follow‐up: mean 36 months	500 per 1000	575 per 1000 (420 to 780)	RR 1.15 (0.84 to 1.56)	143 (1 study)	⊕⊕⊝⊝ low^4,5	Data were available for 36 months only
Drug (non‐alcohol) use: Not in remission Follow‐up: mean 36 months	650 per 1000	578 per 1000 (409 to 812)	RR 0.89 (0.63 to 1.25)	85 (1 study)	⊕⊕⊝⊝ low^4,5	Data were available for 36 months only
Mental state Days of hospitalisation	See comment	See comment	Not estimable	0 (3 studies)	See comment	Data were skewed. In all three trials, days of hospitalisation were lower in the treatment group by approximately 3 days, but the SDs were large and overlapped.
Global Assessment of Functioning GAF scale of 1 ‐ 100 Follow‐up: mean 12 months		The mean global assessment of functioning in the intervention groups was 0.7 higher (2.07 lower to 3.47 higher)		171 (1 study)	⊕⊕⊝⊝ low^5,6	NOTE: The GAF measures functioning on a scale of 1 to 100 and the difference detected in this single trial is not of clinical importance.
General life satisfaction Quality of Life Interview section, scale of 1 to 7 Follow‐up: mean 12 months		The mean general life satisfaction in the intervention groups was 0.02 higher (0.28 lower to 0.32 higher)		372 (2 studies)	⊕⊕⊕⊝ moderate^3	The scale is from 1 to 7 and the very small difference was not statistically significant and is not of clinical importance.
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as SERIOUS: Blinding of participants and personnel in all three trials was not possible and performance bias was rated as unclear risk of bias. Similarly all trials were at an unclear risk of detection bias. ² Imprecision: Rated as SERIOUS: The number of events is less than 300 and the overall sample size is small. ³ Risk of bias: Rated as SERIOUS: Blinding of participants and personnel in both trials was not possible and performance bias was rated as unclear risk of bias. Similarly all trials were at an unclear risk of detection bias as outcomes ratings were not blinded. ⁴ Imprecision: Rated as SERIOUS: The event rate is very low and the 95% confidence interval is wide. ⁵ Risk of bias: Rated as SERIOUS: Blinding of participants and personnel was not possible and performance bias was rated as unclear risk of bias. Similarly there was an unclear risk of detection bias as outcomes ratings were not blinded. ⁶ Imprecision: Rated as SERIOUS: The confidence interval is very wide and the sample size small.
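The relationship between the assumed risk, the relative effect and the corresponding risk in the table above can be checked with a short calculation. The sketch below is illustrative only; the function name is hypothetical and the inputs are the values printed in the alcohol use row.

```python
def corresponding_risk(assumed_per_1000, rr, rr_lower, rr_upper):
    """Scale the assumed (control group) risk by the risk ratio and its 95% CI limits."""
    return tuple(round(assumed_per_1000 * x) for x in (rr, rr_lower, rr_upper))


# Alcohol use (not in remission): assumed risk 500 per 1000, RR 1.15 (0.84 to 1.56).
point, lower, upper = corresponding_risk(500, 1.15, 0.84, 1.56)
print(f"{point} per 1000 ({lower} to {upper})")  # 575 per 1000 (420 to 780)
```

Applying the same calculation to other rows can differ from the printed values by one in the last digit, presumably because the assumed risks shown in the table are themselves rounded.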

Summary of findings 2. NON‐INTEGRATED MODELS OF CARE OR INTENSIVE CASE MANAGEMENT compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

NON‐INTEGRATED MODELS OF CARE OR INTENSIVE CASE MANAGEMENT compared to TREATMENT AS USUAL for People with both severe mental illness and substance misuse
Patient or population: People with both severe mental illness and substance misuse Settings: Out‐patient Intervention: NON‐INTEGRATED MODELS OF CARE OR INTENSIVE CASE MANAGEMENT Comparison: TREATMENT AS USUAL
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	NON‐INTEGRATED MODELS OF CARE OR INTENSIVE CASE MANAGEMENT
Lost to treatment Follow‐up: mean 12 months	239 per 1000	289 per 1000 (174 to 475)	RR 1.21 (0.73 to 1.99)	134 (3 studies)	⊕⊝⊝⊝ very low^1,2
Death ‐ not measured	See comment	See comment	Not estimable	‐	See comment	Death was not measured in any of the trials.
Alcohol use C‐DIS‐R computer program for Diagnostic Interview Schedule: average score Follow‐up: mean 12 months		The mean alcohol use in the intervention groups was 0 higher (0 to 0 higher)		49 (1 study)	⊕⊝⊝⊝ very low^3,4	Data were skewed from one trial and there was no analysis of the difference between randomised arms.
Drug (non‐alcohol) use C‐DIS‐R computer program for Diagnostic Interview Schedule: average score Follow‐up: mean 12 months		The mean drug (non‐alcohol) use in the intervention groups was 0 higher (0 to 0 higher)		49 (1 study)	⊕⊝⊝⊝ very low^3,4	Data were skewed from one trial and there was no analysis of the difference between randomised arms.
Mental state Schizophrenia symptoms on C‐DIS‐R computer program for Diagnostic Interview Schedule: average score Follow‐up: mean 12 months		The mean mental state in the intervention groups was 0 higher (0 to 0 higher)		49 (1 study)	⊕⊝⊝⊝ very low^3,4	Data were skewed from one trial and there was no analysis of the difference between randomised arms.
Global Assessment of Functioning Role Functioning Scale Follow‐up: mean 12 months		The mean global assessment of functioning in the intervention groups was 0.7 higher (1.56 lower to 2.96 higher)		50 (1 study)	⊕⊝⊝⊝ very low^3,5	NOTE: the scale is 1 to 7 and the difference observed is not of clinical importance and is not statistically significant.
General life satisfaction Quality of Life Interview section, scale of 1 to 7 Follow‐up: mean 12 months		The mean general life satisfaction in the intervention groups was 0 higher (0 to 0 higher)		29 (1 study)	⊕⊝⊝⊝ very low^3,6	Data were skewed from one trial and there was no analysis of the difference between randomised arms.
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as VERY SERIOUS: Random generation and allocation concealment was not adequately reported and the risk of bias is unclear. Both performance and detection bias was unclear as blinding was not performed or was unclearly reported. Attrition was unclear or very high (57% in Bond‐Anderson 91) so overall the risk of bias was rated as very serious. ² Imprecision: Rated as SERIOUS: The confidence interval is wide and the sample size is very small. ³ Risk of bias: Rated as SERIOUS: Blinding of participants and personnel was not possible and performance bias was rated as unclear risk of bias. Similarly there was an unclear risk of detection bias as outcomes ratings were participant/clinician mediated. ⁴ Imprecision: Rated as VERY SERIOUS: The sample size is very small from this single trial (N = 49) ⁵ Imprecision: Rated as VERY SERIOUS: The sample size is small and the confidence interval is very wide. ⁶ Imprecision: Rated as VERY SERIOUS: The sample size used in this review from this single trial was 29 based on those participants who had current substance use disorders.

Summary of findings 3. COGNITIVE BEHAVIOUR THERAPY + MOTIVATIONAL INTERVIEWING compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

COGNITIVE BEHAVIOUR THERAPY + MOTIVATIONAL INTERVIEWING compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Out‐patient Intervention: COGNITIVE BEHAVIOUR THERAPY + MOTIVATIONAL INTERVIEWING Comparison: TREATMENT AS USUAL
Outcomes	*Illustrative comparative risks (95% CI)**		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	COGNITIVE BEHAVIOUR THERAPY + MOTIVATIONAL INTERVIEWING
Lost to treatment Follow‐up: mean 12 months	178 per 1000	176 per 1000 (110 to 283)	RR 0.99 (0.62 to 1.59)	327 (1 study)	⊕⊕⊝⊝ low^1,2
Death Follow‐up: mean 12 months	33 per 1000	23 per 1000 (7 to 78)	RR 0.72 (0.22 to 2.41)	493 (3 studies)	⊕⊕⊝⊝ low^2,3
Alcohol use Estimated daily consumption in previous month Follow‐up: mean 12 months	See comment	See comment	Not estimable	46 (1 study)	See comment	Data were skewed from one trial and there was no analysis of the difference between randomised arms.
Drug (non‐alcohol) use Average number of different drugs used during the past month measured by the Opiate Treatment Index Follow‐up: mean 6 months		The mean drug (non‐alcohol) use in the intervention groups was 0.19 higher (0.22 lower to 0.6 higher)		119 (1 study)	⊕⊕⊝⊝ low^4,5
Mental state Relapse at 3 months after 9 months of treatment Follow‐up: mean 12 months	667 per 1000	333 per 1000 (160 to 693)	RR 0.5 (0.24 to 1.04)	36 (1 study)	⊕⊝⊝⊝ very low^6,7
Global Assessment of Functioning GAF scale of 1‐ 100 Follow‐up: mean 12 months		The mean global assessment of functioning in the intervention groups was 1.24 higher (1.86 lower to 4.34 higher)		445 (4 studies)	⊕⊝⊝⊝ very low^8,9,10	NOTE: The GAF measures functioning on a scale of 1 to 100 and the difference detected in this single trial is not of clinical importance.
General life satisfaction Brief Quality of Life Scale Follow‐up: mean 6 months		The mean general life satisfaction in the intervention groups was 0.58 higher (0 to 1.16 higher)		110 (1 study)	⊕⊕⊝⊝ low^11,12
The basis for the assumed risk* (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as SERIOUS: The single trial which reported on 12 month loss to treatment had adequate random generation and allocation concealment. However, we down‐graded it for possible performance bias as participants and clinicians were not blinded. Detection bias was a low risk as outcome assessors were blinded. ² Imprecision: Rated as SERIOUS: The event rate is low and the confidence interval is wide and includes the line of no effect and appreciable harm. ³ Risk of bias: Rated as SERIOUS: The three trials included in this meta‐analysis were well‐conducted. A lack of blinding is unlikely to affect measurement of death. However, attrition was > 20% in all three trials and although missing data was balanced across groups, there is an unclear risk of bias due to attrition bias. ⁴ Risk of bias: Rated as SERIOUS: Random generation and allocation concealment were unclear and blinding was not possible for participants or clinicians. Attrition was high at 20% at 12 months, but missing outcome data was balanced between groups. ⁵ Imprecision: Rated as SERIOUS: The sample size is small and the confidence interval is wide. ⁶ Risk of bias: Rated as SERIOUS: The risk of attrition bias is unclear (22% across both groups at 18 months) despite missing outcome balanced between groups. A lack of blinding of participants and clinicians may result in performance bias. ⁷ Imprecision: Rated as VERY SERIOUS: The event rate is extremely low in this very small single trial (N = 36) and the confidence interval is wide. ⁸ Risk of bias: Rated as SERIOUS: Attrition was > 20% in all four trials and although missing data was balanced across groups, there is an unclear risk of bias due to attrition bias. ⁹ Inconsistency: Rated as SERIOUS: Heterogeneity was present (Chi² = 5.20, df = 3 (P = 0.16); I² = 42%). One trial (Barrowclough) showed significant improvement in the treatment group compared with the others, but we were unable to explain the reason for this.
¹⁰ Imprecision: Rated as SERIOUS: Four trials provided data for this meta‐analysis. The confidence interval is wide. ¹¹ Risk of bias: Rated as SERIOUS: Blinding of participants and clinicians was not possible and performance bias may be a risk. Attrition was 25% at 6 months and missing data were not balanced across interventions. Missing outcomes are enough to induce clinically relevant bias in observed effect size. ¹² Imprecision: Rated as SERIOUS: The sample size of the single trial is small and the confidence interval is wide.
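The heterogeneity figures quoted in footnote 9 are linked by a simple formula: the I² statistic is the share of the Chi² (Cochran's Q) value in excess of its degrees of freedom. The sketch below, with a hypothetical function name, reproduces the reported 42% from the reported Chi² and df.

```python
def i_squared(q, df):
    """I-squared (%) from Cochran's Q (the Chi-squared statistic) and its degrees of freedom."""
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100


print(round(i_squared(5.20, 3)))  # 42
```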

Summary of findings 4. COGNITIVE BEHAVIOUR THERAPY compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

COGNITIVE BEHAVIOUR THERAPY compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Outpatient Intervention: COGNITIVE BEHAVIOUR THERAPY Comparison: TREATMENT AS USUAL
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	COGNITIVE BEHAVIOUR THERAPY
Lost to treatment Follow‐up: mean 3 months	97 per 1000	108 per 1000 (43 to 277)	RR 1.12 (0.44 to 2.86)	152 (2 studies)	⊕⊕⊝⊝ low^1,2
Death	See comment	See comment	Not estimable	0 (0)	See comment	Death was not measured in any of the trials.
Alcohol use	See comment	See comment	Not estimable	105 (1 study)	⊕⊕⊝⊝ low^1,3	Naeem 2005 measured alcohol together with drug use in the Health of the Nation Outcome (HoNOS) scale. Edwards did not report on alcohol.
Drug (non‐alcohol) use: Cannabis Percentage of participants who used cannabis in last 4 weeks Follow‐up: mean 6 months	500 per 1000	650 per 1000 (395 to 1000)	RR 1.3 (0.79 to 2.15)	47 (1 study)	⊕⊝⊝⊝ very low^1,4	Outcome data for other drugs were skewed and were not compared between intervention and control.
Mental state General symptoms on Brief Psychiatric Rating Scale Follow‐up: mean 6 months		The mean mental state in the intervention groups was 0.52 higher (0.78 lower to 1.82 higher)		105 (1 study)	⊕⊕⊝⊝ low^1,4	The difference noted is unlikely to be clinically significant.
Global Assessment of Functioning The Social and Occupational Functioning Scale (SOFAS): scale of 1 to 100 Follow‐up: mean 6 months		The mean global assessment of functioning in the intervention groups was 4.7 lower (14.52 lower to 5.12 higher)		47 (1 study)	⊕⊝⊝⊝ very low^1,5	The other trial in this comparison (Naeem 2005) measured functioning with the HoNOS scale. Data were skewed and meta‐analysis was not possible.
General life satisfaction	See comment	See comment	Not estimable	0 (0)	See comment	No study measured life satisfaction.
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as SERIOUS: The participants and personnel were not blinded and performance bias may be present. Missing data were addressed by Last Observation Carried Forward in Edwards 2006 but attrition bias may be present as loss to follow‐up was 30% at 9 months. ² Imprecision: Rated as SERIOUS: The event rate is low and the confidence interval is wide. ³ Imprecision: Rated as SERIOUS: The sample size of the two trials combined is very small and any estimate of effect is likely to be imprecise. ⁴ Imprecision: Rated as VERY SERIOUS: The event rate is low, the sample size small and the confidence interval is wide. ⁵ Imprecision: Rated as VERY SERIOUS: The confidence interval is very wide and the sample size small.

Summary of findings 5. COGNITIVE BEHAVIOUR THERAPY and PSYCHOSOCIAL REHABILITATION compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

COGNITIVE BEHAVIOUR THERAPY and PSYCHOSOCIAL REHABILITATION compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Jail and community Intervention: COGNITIVE BEHAVIOUR THERAPY and PSYCHOSOCIAL REHABILITATION Comparison: TREATMENT AS USUAL
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	COGNITIVE BEHAVIOUR THERAPY and PSYCHOSOCIAL REHABILITATION
Loss to Treatment Follow‐up: mean 12 months	See comment	See comment	Not estimable	61 (1 study)	⊕⊝⊝⊝ very low^1,2	Loss to Treatment was only reported per trial and not per randomised arm.
Death ‐ not measured	See comment	See comment	Not estimable	‐	See comment	The trial did not measure death as an outcome.
Alcohol use ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
Drug (non‐alcohol) use ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. The data were skewed and were not compared between arms.
Mental state ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. The data were skewed and were not compared between arms.
Global Assessment of Functioning ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. The data were skewed and were not compared between arms.
General life satisfaction ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. The data were skewed and were not compared between arms.
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as VERY SERIOUS: Although random sequence generation is reported as computer‐generated, the numbers in the arms vary and the report does not state whether randomisation used a fixed allocation ratio. Blinding was not done, so the risk of performance bias is unclear, and detection bias is a high risk as the assessors were not blinded. There is a high risk of selective reporting bias as few outcomes are reported per arm and results are mainly reported by site. ² Imprecision: Rated as VERY SERIOUS: The sample size is small and any estimate of effect (had it been reported per arm) is highly likely to be imprecise.

Summary of findings 6. COMBINED COGNITIVE BEHAVIOUR THERAPY and INTENSIVE CASE MANAGEMENT compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

COMBINED COGNITIVE BEHAVIOUR THERAPY and INTENSIVE CASE MANAGEMENT compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Jail and community Intervention: COMBINED COGNITIVE BEHAVIOUR THERAPY and INTENSIVE CASE MANAGEMENT Comparison: TREATMENT AS USUAL
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	COMBINED COGNITIVE BEHAVIOUR THERAPY and INTENSIVE CASE MANAGEMENT
Loss to Treatment Follow‐up: mean 12 months	See comment	See comment	Not estimable	59 (1 study)	⊕⊝⊝⊝ very low^1,2
Death	See comment	See comment	Not estimable	0 (0)	See comment	The trial did not measure death.
Alcohol use ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
Drug (non‐alcohol) use ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
Mental state ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
Global Assessment of Functioning ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
General life satisfaction ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as VERY SERIOUS: Although random sequence generation is reported as computer‐generated, the numbers in the arms vary and the report does not state whether randomisation used a fixed allocation ratio. Blinding was not done, so the risk of performance bias is unclear, and detection bias is a high risk as the assessors were not blinded. There is a high risk of selective reporting bias as few outcomes are reported per arm and results are mainly reported by site. ² Risk of bias: Rated as VERY SERIOUS: Although random sequence generation is reported as computer‐generated, the numbers in the arms vary and the report does not state whether randomisation used a fixed allocation ratio. Blinding was not done, so the risk of performance bias is unclear, and detection bias is a high risk as the assessors were not blinded. There is a high risk of selective reporting bias as few outcomes are reported per arm and results are mainly reported by site.

Summary of findings 7. INTENSIVE CASE MANAGEMENT compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

INTENSIVE CASE MANAGEMENT compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Jail and community Intervention: INTENSIVE CASE MANAGEMENT Comparison: TREATMENT AS USUAL
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	INTENSIVE CASE MANAGEMENT
Loss to Treatment	See comment	See comment	Not estimable	101 (1 study)	⊕⊝⊝⊝ very low^1,2
Death ‐ not measured	See comment	See comment	Not estimable	‐	See comment
Alcohol use ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
Drug (non‐alcohol) use ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
Mental state ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This trial focused on criminal outcomes of jail and offences. Data were skewed and not compared between arms.
Global Assessment of Functioning ‐ not measured	See comment	See comment	Not estimable	‐	See comment	Data were skewed and not compared between arms.
General life satisfaction ‐ not measured	See comment	See comment	Not estimable	‐	See comment
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as VERY SERIOUS: Although random sequence generation is reported as computer‐generated, the numbers in the arms vary. The report states that group sizes were not equivalent due to early jail releases, which necessitated the discontinuation of new participants being randomly assigned to the treatment groups after May 2003. Allocation concealment was not conducted. Blinding was not done, so the risk of performance bias is unclear, and detection bias is a high risk as the assessors were not blinded. There is a high risk of selective reporting bias as few outcomes are reported per arm and results are mainly reported by site. ² Imprecision: Rated as SERIOUS: The sample size is small and any estimate of effect (had it been reported per arm) is likely to be imprecise.

Summary of findings 8. MOTIVATIONAL INTERVIEWING compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

MOTIVATIONAL INTERVIEWING compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Hospital and community Intervention: MOTIVATIONAL INTERVIEWING Comparison: TREATMENT AS USUAL
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	MOTIVATIONAL INTERVIEWING
Lost to treatment Follow‐up: mean 6 months	156 per 1000	266 per 1000 (94 to 560)	RR 1.71 (0.63 to 4.64)	62 (1 study)	⊕⊝⊝⊝ very low^1,2
Death Follow‐up: mean 18 months	40 per 1000	42 per 1000 (3 to 629)	RR 1.04 (0.07 to 15.73)	49 (1 study)	⊕⊝⊝⊝ very low^3,4
Alcohol use Not abstaining from alcohol Follow‐up: mean 6 months	923 per 1000	332 per 1000 (157 to 692)	RR 0.36 (0.17 to 0.75)	28 (1 study)	⊕⊝⊝⊝ very low^5,6
Drug (non‐alcohol) use Polydrug consumption levels measured by Opiate Treatment Index (OTI) Follow‐up: mean 12 months		The mean drug (non‐alcohol) use in the intervention groups was 0.07 lower (0.56 lower to 0.42 higher)		89 (1 study)	⊕⊝⊝⊝ very low^7,8
Mental state Symptom Checklist 90‐revised ‐ General Severity Index: Scale 0 to 4: Average score Follow‐up: mean 3 months		The mean mental state in the intervention groups was 0.19 lower (0.59 lower to 0.21 higher)		30 (1 study)	⊕⊝⊝⊝ very low^9,10	This is unlikely to be of clinical significance.
Global Assessment of Functioning GAF scale of 1 to 100 Follow‐up: mean 12 months		The mean global assessment of functioning in the intervention groups was 2.3 higher (1.3 lower to 5.9 higher)		54 (1 study)	⊕⊝⊝⊝ very low^1,9	The difference is unlikely to be of clinical significance given the scale is from 1 to 100.
General life satisfaction ‐ not measured	See comment	See comment	Not estimable	‐	See comment	None of the eight trials contributing data to this comparison measured general life satisfaction.
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio; OR: Odds ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as SERIOUS: This single trial has unclear risk of allocation concealment and an unclear risk for performance bias. ² Imprecision: Rated as VERY SERIOUS: The sample size is very small (N = 62), the event rate very low and the confidence interval very wide. ³ Risk of bias: Rated as SERIOUS: Allocation concealment was unclear and blinding was not possible so performance bias is unclear. Assessors were not blinded so there is a high risk of detection bias. Attrition was 29% at 18 months. ⁴ Imprecision: Rated as VERY SERIOUS: The event rate is very small and the confidence interval is very wide. ⁵ Risk of bias: Rated as SERIOUS: Random generation, allocation concealment and performance bias (lack of blinding) posed an unclear risk of bias. Detection bias was likely as assessors were not blinded. ⁶ Imprecision: Rated as VERY SERIOUS: The sample size was extremely small (N = 30), and the event rate very low. ⁷ Risk of bias: Rated as SERIOUS: Selection bias was unclear and performance bias may be present as personnel and participants were not blinded. Assessors were blinded. Attrition bias is a high risk as 44% were lost to follow‐up by 12 months. ⁸ Imprecision: Rated as VERY SERIOUS: The single trial has a very small sample size (N = 30) and imprecision is very likely. ⁹ Imprecision: Rated as VERY SERIOUS: The single trial sample size is very small (N = 54) and the confidence interval is very wide. ¹⁰ Risk of bias: Rated as VERY SERIOUS: Selection bias was a high risk as allocation concealment was modified to allow for participant refusal and to minimise disruption to the treatment programme. Performance and detection bias were unclear as blinding was not possible for personnel and participants and assessor blinding was not reported.

Summary of findings 9. SKILLS TRAINING compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

SKILLS TRAINING compared to TREATMENT AS USUAL for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Community and outpatient Intervention: SKILLS TRAINING Comparison: TREATMENT AS USUAL
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TREATMENT AS USUAL	SKILLS TRAINING
Lost to treatment Follow‐up: mean 12 months	367 per 1000	257 per 1000 (162 to 404)	RR 0.7 (0.44 to 1.1)	94 (2 studies)	⊕⊝⊝⊝ very low^1,2
Death ‐ not measured	See comment	See comment	Not estimable	‐	See comment	No trial measured death as an outcome.
Alcohol use C‐DIS‐R average score Follow‐up: mean 12 months	See comment	See comment	Not estimable	46 (1 study)	⊕⊝⊝⊝ very low^3,4	Data were skewed and no estimate of effect was calculated between randomised arms.
Drug (non‐alcohol) use C‐DIS‐R average score Follow‐up: mean 12 months	See comment	See comment	Not estimable	46 (1 study)	⊕⊝⊝⊝ very low^3,4	Data were skewed and no estimate of effect was calculated between randomised arms.
Mental state Relapse measured by days in hospital Follow‐up: 8 months	See comment	See comment	Not estimable	29 (1 study)	⊕⊝⊝⊝ very low^1,5	Data were highly skewed and no estimates of effects were calculated between randomised arms.
Global Assessment of Functioning Role Functioning Scale: scale 1 to 7 Follow‐up: mean 12 months		The mean global assessment of functioning in the intervention groups was 1.07 higher (1.15 lower to 3.29 higher)		47 (1 study)	⊕⊝⊝⊝ very low^3,6	NOTE: the scale is 1 to 7 and the difference observed is not of clinical importance and is not statistically significant.
General life satisfaction ‐ not measured	See comment	See comment	Not estimable	‐	See comment	Neither trial measured general life satisfaction.
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as VERY SERIOUS: Blinding was not possible and performance bias may be present. It was unclear whether assessors were blinded and detection bias may be present. Attrition bias was a high risk in Hellerstein 1995 with 47% loss to follow‐up at 4 months and 64% at 8 months, with no reasons for drop‐outs provided and attrition not addressed in the analysis. ² Imprecision: Rated as SERIOUS: The event rate was low (zero events in one trial) with a wide confidence interval (absolute risk: 110 fewer per 1000, ranging from 206 fewer to 27 more per 1000). ³ Risk of bias: Rated as SERIOUS: Blinding was not possible and performance bias may be present. It was unclear whether assessors were blinded and detection bias may be present. ⁴ Imprecision: Rated as VERY SERIOUS: The single trial has a very small sample size (N = 46) and imprecision is highly likely. ⁵ Imprecision: Rated as VERY SERIOUS: Data were available for only 29 participants and any estimate of effect is likely to be imprecise. ⁶ Imprecision: Rated as VERY SERIOUS: The single trial has a very small sample size (N = 47) and the confidence interval is wide.

Summary of findings 10. SPECIALISED CASE MANAGEMENT SERVICES compared to STANDARD CARE for both severe mental illness and substance misuse

SPECIALISED CASE MANAGEMENT SERVICES compared to STANDARD CARE for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Community Intervention: SPECIALISED CASE MANAGEMENT SERVICES Comparison: STANDARD CARE
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	STANDARD CARE	SPECIALISED CASE MANAGEMENT SERVICES
Loss to Treatment ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This was not measured in the trial.
Death ‐ not measured	See comment	See comment	Not estimable	‐	See comment	This was not measured in the trial.
Alcohol use Unpublished drug and alcohol questionnaire	See comment	See comment	Not estimable	64 (1 study)	⊕⊝⊝⊝ very low^1,2	Data were from one small sub‐trial and there was no analysis of the difference between treatment arms.
Drug (non‐alcohol) use Unpublished drug and alcohol questionnaire	See comment	See comment	Not estimable	64 (1 study)	⊕⊝⊝⊝ very low^1,2	Data were from one small sub‐trial and there was no analysis of the difference between treatment arms.
Mental state Days of admission Follow‐up: mean 24 months	See comment	See comment	Not estimable	56 (1 study)	⊕⊝⊝⊝ very low^1,2,3	Data were reported per randomised site and were highly skewed.
Global Assessment of Functioning ‐ not reported	See comment	See comment	Not estimable	‐	See comment	The GAF scale was conducted but the trial did not report on this.
General life satisfaction ‐ not measured	See comment	See comment	Not estimable	‐	See comment	The trial did not measure life satisfaction.
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as VERY SERIOUS: This small study was conducted in 6 sites but only 2 sites were randomised. In these sites, the risk of selection bias was unclear as random generation and allocation concealment were not reported and performance and detection bias were unclear as blinding was not reported. Risk of attrition was high as loss to follow‐up was 37% at 12 months and > 50% at 18 months. There is a high risk of selective reporting bias as many results were reported per site rather than by randomised arms. ² Imprecision: Rated as VERY SERIOUS: The sample size of the available denominators for the outcome of days of admission for the two randomised sites is very small (N = 36 and 28) and any estimate of effect is likely to be imprecise. ³ Inconsistency: Rated as SERIOUS: In one site there were fewer days in the specialised case management group (mean = 8.36 (SD: 22.36) vs 1.86 (SD: 4.20)) and in the second site there were many more days in the specialised group (mean = 37.06 (SD: 40.50) vs 76.91 (SD: 110.34)).

Summary of findings 11. CONTINGENCY MANAGEMENT compared to TREATMENT AS USUAL for both severe mental illness and substance misuse

Contingency Management compared to Treatment as usual for both severe mental illness and substance misuse
Patient or population: patients with both severe mental illness and substance misuse Settings: Community Intervention: Contingency Management Comparison: Treatment as usual
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	No of Participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	Treatment as usual	Contingency Management
Lost to treatment Follow‐up: mean 3 months	353 per 1000	582 per 1000 (416 to 815)	RR 1.65 (1.18 to 2.31)	176 (1 study)	⊕⊕⊝⊝ low^1,2
Death ‐ not measured	See comment	See comment	Not estimable	‐	See comment	Neither trial measured death as an outcome.
Alcohol use Mean days of alcohol use in 6 months	See comment	See comment	Not estimable	107 (1 study)	See comment	Data were skewed from a single trial and no between‐arm comparison was reported.
Drug (non‐alcohol) use Number with stimulant‐positive urine test Follow‐up: mean 6 months	647 per 1000	537 per 1000 (421 to 686)	RR 0.83 (0.65 to 1.06)	176 (1 study)	⊕⊕⊝⊝ low^1,3
Mental state Number hospitalised Follow‐up: mean 6 months	106 per 1000	22 per 1000 (5 to 98)	RR 0.21 (0.05 to 0.93)	176 (1 study)	⊕⊕⊝⊝ low^1,4
Global Assessment of Functioning ‐ not measured	See comment	See comment	Not estimable	‐	See comment	Neither trial measured functioning.
General life satisfaction ‐ not measured	See comment	See comment	Not estimable	‐	See comment	Neither trial measured life satisfaction.
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: Confidence interval; RR: Risk ratio;
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹ Risk of bias: Rated as SERIOUS: Blinding was not possible so performance bias was rated as an unclear risk of bias. The primary outcome was urinalysis so detection bias was unlikely. Attrition bias was an unclear risk with only 42% completing 4 months of intervention and 65% completing the control. ² Imprecision: Rated as SERIOUS: The confidence interval is wide. The estimate of effect and 95% confidence interval do not cross 1 and indicate harm. However, the estimate is likely to be imprecise given the low event rate. ³ Imprecision: Rated as SERIOUS: The number of events is low (fewer than 300, per GRADE guidance) and the confidence interval includes 1 and appreciable benefit. ⁴ Imprecision: Rated as SERIOUS: The event rate is very low and the confidence interval is wide.
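
The corresponding risks in the table above follow from multiplying the assumed (control group) risk by the risk ratio and by each limit of its confidence interval. A minimal sketch of that arithmetic is given below, using the stimulant‐positive urine test row as input. The function name `corresponding_risk` and the cap at 1000 per 1000 are illustrative assumptions rather than the review's own software, and small rounding differences from published figures are possible if the review worked from unrounded values.

```python
def corresponding_risk(assumed_per_1000: float, rr: float,
                       ci_low: float, ci_high: float) -> tuple[int, int, int]:
    """Scale an assumed risk (per 1000) by a risk ratio and its 95% CI limits."""
    def scale(ratio: float) -> int:
        # A risk cannot exceed 1000 per 1000, so the result is capped (assumption).
        return min(1000, round(assumed_per_1000 * ratio))

    return scale(rr), scale(ci_low), scale(ci_high)


# Drug (non-alcohol) use row: assumed risk 647 per 1000, RR 0.83 (0.65 to 1.06)
print(corresponding_risk(647, 0.83, 0.65, 1.06))  # prints (537, 421, 686)
```

The output matches the tabulated 537 per 1000 (421 to 686). Because the lower limit stays below the assumed risk of 647 and the upper limit rises above it, the interval is compatible with both fewer and more positive tests, which is the imprecision noted in footnote 3.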

Background

Description of the condition

Substance misuse among people with a severe mental illness is a major concern, with reported prevalence rates exceeding 50%. This figure varies across studies depending on location, methodology and the way substance misuse problems and severe mental illness are defined (Carra 2009; Green 2007; Gregg 2007; Lai 2012a; Lai 2012b; Lowe 2004; Regier 1990; Todd 2004). Improving services for these patients (often labelled as having a 'dual diagnosis') is a priority as using drugs or consuming alcohol, even at low levels, is associated with a range of adverse consequences, including higher rates of non‐adherence, relapse, suicide, HIV, hepatitis, homelessness, aggression and incarceration, and fewer social supports or financial resources (Donald 2005; Green 2007; Hunt 2002; Schmidt 2011; Siegfried 1998; Tsuang 2006). Furthermore, co‐morbidity places an additional burden on families and on psychiatric and government resources, and is particularly challenging to those providing services as these patients have lower rates of treatment completion and higher rates of relapse (Siegfried 1998; Tyrer 2004; Warren 2007).

Description of the intervention

It is important that co‐occurring substance use is detected as early as possible and that appropriate and effective treatment is provided (Green 2007; Siegfried 1998). Treatment has traditionally been complicated by different approaches and philosophies among mental health and drug services as they may differ in their theoretical underpinnings, policies and protocols. Separate treatment programmes have been offered in parallel or sequentially by different clinicians, which may result in less than optimum patient care with the patient having to negotiate two separate treatment systems (Green 2007). Another approach to care is the integrated treatment model where mental health and substance use treatments are brought together simultaneously by the same service, clinician or team of clinicians who are competent in both service areas and place similar importance on both (Drake 2004; Green 2007). Basic elements include an assertive style of engagement, techniques of close monitoring, comprehensive services (including inpatient, day hospital, community team and outpatient care), supportive living environments, flexibility and specialisation of clinicians, step‐wise treatment, and a long‐term perspective and optimism (Drake 1993). Assertive Community Treatment (ACT) and residential programmes are generally long‐term and can form a basis for integrated programmes.

How the intervention might work

As many substance users in the general population have benefited from a range of psychosocial interventions, it would follow that these same interventions should also benefit people with psychosis when their mental health problems are taken into account (Barrowclough 2006 a). Most, if not all, substances of abuse increase dopaminergic activity in the brain (Koob 2010). Given that schizophrenia and other forms of psychosis are characterised by heightened dopaminergic transmission and that neuroleptics decrease activity or block dopamine receptors (Kapur 2005), it stands to reason that most substances of abuse increase symptoms, the risk of relapse and compromise the beneficial effects of neuroleptics (LeDuc 1995; Seibyl 1993). This is especially true for stimulant drugs like amphetamine, cocaine and concentrated forms such as crack cocaine and methamphetamine ('ice') that can exacerbate or mimic psychotic symptoms (Callaghan 2012; McKetin 2013; Pluddemann 2013). Substance use is also related to poor compliance with treatment, further increasing the risk of relapse (Hunt 2002). Interventions that reduce substance use are likely to improve symptoms, relapse rates, recovery and other outcomes (Cleary 2009a; Drake 2008; Horsfall 2009). Common psychosocial interventions to reduce substance use and misuse include Twelve Step recovery, which adopts a supportive approach such as that used by Alcoholics Anonymous (AA); motivational interviewing, which aims to increase an individual's motivation for change; group and individual skills training; family psycho‐education regarding the signs and effects of substance use; and individual or group psychotherapy involving cognitive or behavioural principles, or both, which aim to increase coping strategies, awareness and self‐monitoring behaviour. All of these interventions can vary in intensity and duration, and can be offered in a variety of settings either individually or as part of an integrated programme. 
Integrated treatment ensures mental health and substance misuse services are available in the same setting and delivered in a coherent fashion.

Why it is important to do this review

While encouraging, results of trials assessing the effectiveness of these psychosocial interventions for mental health consumers are equivocal (for reviews, see: Bogenschutz 2006; Cleary 2009a; Dixon 2010; Drake 1998b; Drake 2004; Drake 2008; Horsfall 2009; Ley 2000; Mueser 2005; NICE 2011). Many studies have been hampered by small heterogeneous samples, poor experimental design (for example non‐random assignment), high attrition rates, short follow‐up periods, lack of accuracy of measuring substance use, skewed data, use of non‐standardised outcome measures and unclear descriptions of treatment components (Barrowclough 2006 a; Cleary 2008; Ley 2000). When assessing integrated programmes, it can also be difficult to determine exactly which part of the programme is the most effective, and control groups (particularly in the USA) may involve a certain level of service integration, making interpretations difficult (Drake 1996). Moreover, study methodologies, interventions and outcome measures vary across studies, as do patterns of participants' readiness to change, severity and type of illness and substance use, all of which make combining results in a review problematic (Donald 2005).

This current review updates the 2008 Cochrane review on "Psychosocial treatment programmes for people with both severe mental illness and substance misuse". The previous review included any programme of substance misuse treatment and located 25 randomised controlled trials. The authors from two previous reviews found no evidence to support any one substance misuse programme as being superior to another (Cleary 2008; Ley 2000). We felt an update of this review was warranted as there are several new studies that have been conducted in the last five years.

Objectives

To assess the effects of psychosocial interventions for reduction in substance use by people with a serious mental illness compared with standard care.

Methods

Criteria for considering studies for this review

Types of studies

We included all relevant, randomised controlled trials (RCTs) with or without blinding if they utilised a psychosocial intervention to reduce substance use in patients with severe mental illness and substance misuse compared with standard care. We excluded quasi‐randomised trials, such as those where allocation was alternate or sequential.

Types of participants

We included people with severe mental illness (for example, schizophrenia, bipolar disorder and psychosis) and a concurrent problem of substance misuse. We have defined people with 'severe' illness as those with a chronic mental illness like schizophrenia who present to adult services for long‐term care. Those with an organic disorder, non‐severe mental illness (for example, personality disorder, post‐traumatic stress disorder (PTSD), anxiety disorders, depressive symptoms based on scores from a scale) or those who solely abused tobacco were, if possible, excluded. Trials that included a mixture of patients with a severe mental diagnosis were included if a large proportion had a schizophrenia‐like illness or psychosis (see Characteristics of included studies). For the current update, studies were excluded if all of the participants had a diagnosis of bipolar disorder or major depressive disorder, so they do not overlap with affective disorder reviews.

Types of interventions

We anticipated that studies included in the review would use a wide variety of psychosocial interventions for substance misuse, making direct comparisons difficult. In order to enhance the utility of the review, we developed a priori categories within which we made planned comparisons. These categories were developed from theoretical models of the types of behavioural and psychosocial interventions offered to clients and the context in which they are delivered. The types of interventions were grouped in two strata, based on duration and intensity of treatment. The first stratum describes long‐term interventions for dual diagnosis patients that offered an array of services with different levels of integration and assertive outreach (taking place over years rather than weeks or months), and the second describes stand‐alone psychosocial interventions that clients received over shorter periods. We did not include interventions for informal carers (partner or family members) as separate categories, though we did sometimes include them as part of the treatments mentioned below.

1. Provider‐oriented long‐term interventions: integrated and non‐integrated care by community mental health teams for dual diagnosis populations

1.1 Integrated models of care with assertive community treatment (ACT)

Integrated treatment models for patients with a dual diagnosis unify services at the provider level rather than forcing clients to negotiate separate mental health and substance abuse treatment programmes (Drake 1993). The range of services provided varies according to client needs and should be able to handle patients at differing stages of readiness to change (Tsuang 2006). Substance abuse treatments are integrated into an array of direct services, such as frequent home visits, crisis intervention, housing skills training, vocational rehabilitation, medication monitoring, and family psycho‐education. Integrated treatment means that the same clinicians or teams of clinicians in the one setting provide long‐term treatments in a co‐ordinated fashion (Barrowclough 2006 a; Green 2007). Teams consist of three to six clinicians and attempt to remain faithful to a specified model of care. To the client, the services should appear seamless with a consistent approach, philosophy and set of recommendations. Usually the caseloads of dual diagnosis teams are lower (approximately 10 to 15 clients shared within a team) than for standard case managers (approximately 20 to 30). Integrated treatment is a process that takes place over years rather than weeks or months. Studies included in this category must have clearly demonstrated the following: 1) assertive community outreach to engage and retain clients and to offer services to reluctant or uncooperative clients, 2) staged interventions to reduce substance use, and 3) adherence to the integrated team philosophy. The intervention could be community‐based or provided for special populations, such as homeless people or forensic patients.

1.2 Non‐integrated models of care or intensive case management

Non‐integrated treatment entails similar interventions by community teams, as described above, except the same members do not deliver them in a co‐ordinated fashion and assertive community outreach is not included. Normally, case managers in this category are better trained and have higher clinical qualifications and better therapeutic skills than standard case managers. Intensive case management is defined as lower case load size (approximately 10 to 15 clients) than for standard case managers and tends to have a 'psychodynamic' flavour (see Marshall 1998). To be included in this category, part of the intervention had to address the client's drug and alcohol misuse.

2. Patient or client focused short‐term interventions for substance misuse

These interventions can be broadly grouped into individual and group modalities. They are offered in addition to routine care (treatment as usual, standard case management) and are based on different theoretical models. Although they could be part of the provider‐oriented packages described above, studies included here were easier to evaluate since they described a simplified intervention that can be easily reproduced. As some studies used more than one intervention (for example, cognitive behavioural therapy combined with motivational interviewing), these were included in a separate category.

2.1 Individual approaches

2.1.1 Cognitive behavioural therapies

Cognitive behavioural approaches include a variety of interventions (Rector 2012; Work Group 2007). The defining features are: 1) emphasis on functional analysis of drug use, understanding the reasons for use and consequences; and 2) skills training for recognising the situations where a person is most vulnerable to drug use and avoiding these situations. A cognitive behavioural intervention seeks to establish links between drug misuse, irrational beliefs, and misperceptions at a personal level and endeavours to correct the thoughts, feelings and actions of the recipient with respect to the target symptom, and to promote alternative ways of coping (Jones 2004; Jones 2012). The target symptom that is usually focused on is reducing problematic substance use or harm minimisation, such as reducing the risk of contracting HIV.

2.1.2 Motivational interviewing

Motivational interviewing takes a non‐confrontational approach to treating substance misuse and is intended to enhance the individual's intrinsic motivation for change, in patients who often find it difficult to commit to change (Tsuang 2006). It matches the patient's level of problem recognition to change with specific strategies and goals and can be delivered in brief sessions or over a number of weeks. It is based on four key principles: 1) expressing empathy, 2) developing discrepancy, 3) supporting self‐efficacy, and 4) rolling with resistance (Chanut 2005); and is directed at five stages: 1) pre‐contemplation, 2) contemplation, 3) preparation, 4) action, and 5) maintenance (Tsuang 2006). A key hypothesis is that the patient's perspective on the importance of change is fundamental to the patient's readiness to address the problem. Developing the patient's confidence in their ability to achieve the desired change is also a key issue of motivational interviewing. This treatment is delivered individually or in small group settings.

2.1.3 Contingency management

Based on principles of operant conditioning, contingency management (CM) offers incentives or rewards to reinforce specific goals (reduced substance use, risky behaviours etc). Typically, rewards are provided if a negative substance test is provided (urine test or breath test). Rewards can vary widely, ranging from encouraging statements ('keep up the good work') to large or small financial prizes (vouchers for food, cash etc). This approach has shown consistent success with various drug use disorders: cannabis, opiate and cocaine dependence and polysubstance use disorders (Dutra 2008). Contingency management has also been 'bundled' with other psychosocial interventions, for example, motivational interviewing plus cognitive behavioural therapies (Bellack 2006). Thus, contingency management was added to the current review due to the number of current and ongoing trials using this intervention.

2.2 Group approaches

2.2.1 Social skills training

These groups are aimed at helping clients develop interpersonal skills for establishing and maintaining relationships with others, dealing with conflict, and handling social situations involving substance misuse (Mueser 2004). They are taught in a highly structured way by using role play, corrective feedback and homework. This usually occurs in a group format, although the methods can also be employed in individual work as a type of cognitive behavioural counselling.

3. Standard care or treatment as usual

This was defined as the care that a person would normally receive had they not been included in the research trial. This could include standard case management (see Marshall 1998 for definition). Standard care varies between settings and can be supplemented by additional components, including psycho‐educational material, family therapy, or referral to self‐help groups (for example, Alcoholics Anonymous) or other agencies for substance abuse treatment.

Types of outcome measures

We intended to group data into short, medium and long‐term outcomes. However, this would have resulted in much data loss as outcome periods varied and therefore, post hoc, we reported for the following time periods: 3, 6, 9, 12, 18, 24 and 36 months (where applicable).

Primary outcomes

1. Numbers lost to treatment: this is a measure of stability and engagement.

This is the number of participants who did not continue with the treatment following randomisation; however, some may have provided data for the study. This varies with study design as some treatments are ongoing for the study duration and some are short‐term. When studies reported exactly the same data for both lost to treatment and lost to evaluation (see below), and if there were no other studies with which to pool data, then we only reported the numbers lost to treatment (to reduce the number of comparison tables). We did not adjust numbers lost to treatment for death (see below).

2. Change in substance use as defined by each of the studies.

3. Changes in symptoms as defined by each of the studies.

Secondary outcomes

1. Numbers lost to evaluation.

This is the number of people lost to the study who did not provide data at particular time points.

2. Death (all causes).

Some studies may not have reported on the number of participants dying over the treatment or evaluation period. If reported, we recorded death in a separate table but these cases were retained in the lost to treatment and lost to evaluation figures as it was often unclear when the death occurred or the cause of death was not stated as unlikely to be linked to the intervention.

3. Substance use (alcohol or drugs, or both).

4. Mental state.

5. Global functioning.

6. Social functioning.

7. Quality of life and life satisfaction.

8. Hospital readmissions (and days in the community).

9. Homelessness.

10. Compliance with treatment and medication.

Summary of findings table

We used the GRADE approach to interpret findings (Schünemann 2008) and used the GRADE profiler to import data from Review Manager (RevMan) to create 'Summary of findings' (SOF) tables. These tables provide outcome‐specific information concerning the overall quality of evidence from each included study in the comparison, the magnitude of effect of the interventions examined, and the sum of available data on all outcomes that we rated as important to patient care and decision making. We selected the following main outcomes for inclusion in the SOF tables.

Numbers lost to treatment (medium‐term: 12 months; if these data were not available we used the short‐term data).
Death.
Alcohol use (as measured in the trials).
Drug use (as measured in the trials).
Mental state (as measured in the trials, and if no specific scale assessment was done we reported on relapse or hospitalisation).
Global assessment of functioning (as measured in the trials).
General life satisfaction (as measured in the trials).

Search methods for identification of studies

Electronic searches

For previous search methods from prior review updates please see Appendix 1.

Cochrane Schizophrenia Group Trials Register

The Trials Search Co‐ordinator searched the Cochrane Schizophrenia Group Trials Register (July 2012) using the phrase:

[((*polydrug* or *substanc* or *alcoh* or *tranquiliz* or *narcot* or * abus* or *opiat* or *street drug* or *solvent* or *inhalan* or *intoxi*) in title, abstract and indexing terms REFERENCE) or ((*substance abus* or drug abus* or *alcohol* or *cannabis*) in health care conditions of STUDY)].

The Cochrane Schizophrenia Group Trials Register is compiled by systematic searches of major databases, handsearches of relevant journals and conference proceedings (see Group Module). Incoming trials are assigned to relevant existing or new review titles.

Searching other resources

1. Reference lists

We searched all references of articles selected for inclusion, major review articles (Baker 2012; Dixon 2010; Drake 2008; Dutra 2008; Horsfall 2009; Kelly 2012) as well as recent guidelines (NICE 2011) on this topic for further relevant trials.

2. Journal databases

Two further searches were completed (8 October 2012 and 15 January 2013) by the principal reviewer (GEH) using the Cochrane Database of Systematic Reviews, MEDLINE (daily update, PREMEDLINE), and PsycINFO. A separate search for randomised trials using contingency management was completed as this was an additional intervention category for this update. We also searched MEDLINE for recent articles (2008 to 2013) by the first authors of all included studies in order to get a more complete list of recent publications.

We also did 'forward' searches to identify trials that cited previously included RCTs using Web of Science and Scopus. Scopus was used to identify trials that cited the most recent version of this review (Cleary 2008) up to 15 February 2013.

3. Trials registries

In addition, websites and journals that list ongoing trials in the USA, UK, Australia and various European countries were searched for RCTs through the Cochrane Schizophrenia Group Trials Register. The principal researcher (GEH) searched www.clinicaltrials.gov for protocols of current and previously included studies for proposed outcome measures to assess selective reporting bias.

4. Personal contact

We contacted the first author (or corresponding author) of newly included studies for this update regarding their knowledge of ongoing or unpublished trials.

Data collection and analysis

For previous data collection and analysis methods see Appendix 2.

Selection of studies

For this update GEH inspected all citations from the new electronic search and identified relevant abstracts, full text articles and trials against the inclusion criteria. To ensure reliability, KM inspected all full text articles for inclusion. Where there were uncertainties or disagreements, two additional authors provided resolution (NS and MC). Where disputes could not be resolved, these studies remained as awaiting assessment or ongoing studies and the authors were contacted for clarification.

Data extraction and management

1. Extraction

For this update, GEH and KM extracted data from the included studies. We resolved disputes by discussion and adjudication from the other review authors (NS and MC) when necessary. If it was not possible to extract data, or if further information or clarification of methods was needed, we attempted to contact the study authors. We extracted data presented only in graphs and figures whenever possible, but included the data only if two review authors independently obtained the same result.

2. Management

2.1 Forms

We extracted data onto standard, simple forms.

2.2 Scale‐derived data

We included continuous data from rating scales only if:

the psychometric properties of the measuring instrument have been described in a peer‐reviewed journal (Marshall 2000); and
the measuring instrument has not been written or modified by one of the trialists for that particular trial.

Ideally the measuring instrument should either be: i) a self‐report or ii) completed by an independent rater or relative (not the therapist). We realise that this is not often reported clearly; we have noted whether or not this is the case in Characteristics of included studies.

2.3 Endpoint versus change data

There are advantages of both endpoint and change data. Change data can remove a component of between‐person variability from the analysis. On the other hand, calculation of change needs two assessments (baseline and endpoint), which can be difficult in unstable and difficult to measure conditions such as schizophrenia. We decided to primarily use endpoint data, and only use change data if the former were not available. We combined endpoint and change data in the analysis as we used mean differences (MD) rather than standardised mean differences throughout (Higgins 2011, Chapter 9.4.5.2).

2.4 Skewed data

Continuous data on clinical and social outcomes are often not normally distributed. To avoid the pitfall of applying parametric tests to non‐parametric data, we aimed to apply the following standards to all data before inclusion:

standard deviations and means are reported in the paper or obtainable from the authors;
when a scale starts from the finite number zero, the standard deviation, when multiplied by two, is less than the mean (as otherwise the mean is unlikely to be an appropriate measure of the centre of the distribution (Altman 1996));
if a scale started from a positive value (such as the Positive and Negative Syndrome Scale (PANSS) which can have values from 30 to 210), we modified the calculation described above to take the scale starting point into account. In these cases skew is present if 2SD > (S ‐ S min), where S is the mean score and S min is the minimum score.

Endpoint scores on scales often have a finite start and endpoint and these rules can be applied. We entered skewed endpoint data from studies of fewer than 200 participants as 'other data' within Data and analyses rather than into a statistical analysis. Skewed data pose less of a problem when looking at means if the sample size is large; we entered such endpoint data into the syntheses.
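As an illustration only, the skew rule described above can be sketched in Python. The function names, and the treatment of the boundary case 2SD = (S ‐ S min) as not skewed, are assumptions made for this sketch and are not part of the review's methods.

```python
def is_skewed(mean, sd, scale_min=0.0):
    """Flag skew when 2 * SD exceeds the distance between the mean and the scale minimum.

    scale_min is 0 for scales that start at zero, or the lowest possible
    score otherwise (for example, 30 for the PANSS).
    """
    return 2 * sd > (mean - scale_min)


def enters_meta_analysis(mean, sd, n_participants, scale_min=0.0, large_study=200):
    """Skewed endpoint data from studies with fewer than `large_study`
    participants are reported as 'other data' instead of being pooled."""
    if not is_skewed(mean, sd, scale_min):
        return True
    return n_participants >= large_study


if __name__ == "__main__":
    # Hypothetical PANSS endpoint summary: mean 45, SD 10, 120 participants.
    print(is_skewed(45, 10, scale_min=30))
    print(enters_meta_analysis(45, 10, 120, scale_min=30))
```

The check needs only the reported mean, SD and sample size, so it can be applied to published summary data without access to individual scores.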

When continuous data are presented on a scale that includes a possibility of negative values (such as change data), it is difficult to tell whether data are skewed or not; we entered skewed change data into analyses regardless of size of study.

2.5 Common measure

To facilitate comparison between trials, we intended to convert variables that can be reported in different metrics, such as days in hospital (mean days per year, per week or per month) to a common metric (for example, mean days per month).
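A minimal sketch of such a conversion is given below. The period lengths (a 365.25‐day year and a month of one twelfth of a year) and the function name are assumptions for illustration; the review does not state which lengths were intended.

```python
# Assumed period lengths in days; the review does not specify these.
DAYS_PER_PERIOD = {"week": 7.0, "month": 365.25 / 12, "year": 365.25}


def convert_rate(value, from_period, to_period):
    """Rescale a 'days per period' figure (for example, mean days in
    hospital per year) to another period such as per month."""
    return value * DAYS_PER_PERIOD[to_period] / DAYS_PER_PERIOD[from_period]


if __name__ == "__main__":
    # Hypothetical trial reporting a mean of 24 hospital days per year.
    print(convert_rate(24, "year", "month"))
```

Because the conversion is a simple rescaling, the same factor would apply to the standard deviation as to the mean.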

2.6 Conversion of continuous to binary

Where possible, we made efforts to convert outcome measures to dichotomous data. This can be done by identifying cut‐off points on rating scales and dividing participants accordingly into 'clinically improved' or 'not clinically improved'. It is generally assumed that if there is a 50% reduction in a scale‐derived score such as the Brief Psychiatric Rating Scale (BPRS) (Overall 1962) or the PANSS (Kay 1986; Kay 1987) this could be considered as a clinically significant response (Leucht 2005a; Leucht 2005b). If data based on these thresholds were not available, we used the primary cut‐off presented by the original authors.
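The 50% reduction rule can be illustrated with a short Python sketch. It assumes that the percentage reduction is computed on raw baseline and endpoint scores; whether the scale minimum should first be subtracted is not stated here, and the function names are hypothetical.

```python
def percent_reduction(baseline, endpoint):
    """Percentage fall in a scale score from baseline to endpoint (raw scores assumed)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline


def clinically_improved(baseline, endpoint, threshold=50.0):
    """Classify a participant as 'clinically improved' when the
    percentage reduction reaches the chosen cut-off."""
    return percent_reduction(baseline, endpoint) >= threshold


if __name__ == "__main__":
    # Hypothetical participants given as (baseline, endpoint) scores.
    scores = [(60, 28), (60, 40), (80, 40)]
    improved = sum(clinically_improved(b, e) for b, e in scores)
    print(improved, "of", len(scores), "clinically improved")
```

Counting participants on each side of the cut‐off in each arm yields the two‐by‐two table needed for a risk ratio.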

2.7 Direction of graphs

Where possible, we entered data in such a way that the area to the left of the line of no effect indicated a favourable outcome for the treatment intervention. Where keeping to this made it impossible to avoid outcome titles with clumsy double‐negatives (for example, 'Not improved') we reported data where the left of the line indicates an unfavourable outcome. This was noted in the relevant graphs.

Assessment of risk of bias in included studies

For this 2013 update, GEH worked independently by using criteria described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011) to assess trial quality. This new set of criteria is based on evidence of associations between overestimate of effect and high risk of bias of the article, such as sequence generation, allocation concealment, blinding, incomplete outcome data and selective reporting.

Where inadequate details of randomisation and other characteristics of trials were provided, we contacted authors of the studies in order to obtain additional information.

We have noted the level of risk of bias in the text of the review.

Measures of treatment effect

1. Binary data

For binary outcomes we calculated a standard estimation of the risk ratio (RR) and its 95% confidence interval (CI). It has been shown that RR is more intuitive (Boissel 1999) than odds ratios and that odds ratios tend to be interpreted as RR by clinicians (Deeks 2000). The Number Needed to Treat or Harm (NNT or H) statistic with its CIs is intuitively attractive to clinicians but is problematic both in its accurate calculation in meta‐analyses and interpretation (Hutton 2009). For binary data presented in the 'Summary of findings' tables, where possible, we calculated illustrative comparative risks.
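For readers who want to see the arithmetic, the sketch below computes a risk ratio and its 95% CI from a two‐by‐two table using the usual large‐sample standard error of the log risk ratio. It is an illustration under stated assumptions (no zero cells, no continuity correction, hypothetical function name), not the RevMan implementation.

```python
import math


def risk_ratio(events_t, total_t, events_c, total_c, z=1.96):
    """Risk ratio (treatment versus control) with a Wald-type CI on the log scale.

    Assumes at least one event in each arm; zero cells are not handled.
    """
    rr = (events_t / total_t) / (events_c / total_c)
    se_log_rr = math.sqrt(1 / events_t - 1 / total_t + 1 / events_c - 1 / total_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical trial: 10/50 lost to treatment versus 20/50 under standard care.
    rr, lower, upper = risk_ratio(10, 50, 20, 50)
    print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```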

2. Continuous data

For continuous outcomes we estimated the mean difference (MD) between groups. We preferred not to calculate effect size measures (standardised mean difference (SMD)). However, if scales of very considerable similarity had been used, we would have presumed there was only a small difference in measurement, calculated the effect size and transformed the effect back to the units of one or more of the specific instruments.
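A minimal sketch of the mean difference for a single trial follows. It assumes a normal approximation and unpooled variances, and the function name is hypothetical.

```python
import math


def mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Mean difference (treatment minus control) with a normal-approximation CI."""
    md = mean_t - mean_c
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return md, md - z * se, md + z * se


if __name__ == "__main__":
    # Hypothetical symptom scores: lower values indicate fewer symptoms.
    md, lower, upper = mean_difference(20, 5, 25, 24, 5, 25)
    print(f"MD {md:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the MD stays in the units of the original scale, it can only be pooled across trials that used the same instrument.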

Unit of analysis issues

1. Cluster trials

Studies increasingly employ 'cluster randomisation' (such as randomisation by clinician or practice), but analysis and pooling of clustered data poses problems. Authors often fail to account for intra‐class correlation in clustered studies, leading to a 'unit of analysis' error (Divine 1992) whereby P values are spuriously low, confidence intervals unduly narrow and statistical significance overestimated. This causes type I errors (Bland 1997; Gulliford 1999).

None of the presently included trials used cluster randomisation. For the purposes of future updates of this review, where clustering is not accounted for in primary studies we planned to present data in a table with a (*) symbol to indicate the presence of a probable unit of analysis error. In subsequent versions of this review, should we include cluster RCTs, we will seek to contact first authors of studies to obtain intra‐class correlation coefficients for their clustered data and to adjust for this by using accepted methods (Gulliford 1999). Where clustering has been incorporated into the analysis of primary studies, we plan to present these data as if from a non‐cluster randomised study but adjusted for the clustering effect.

We have sought statistical advice and have been advised that the binary data as presented in a report should be divided by a 'design effect'. This is calculated using the mean number of participants per cluster (m) and the intra‐class correlation coefficient (ICC) (design effect = 1 + (m ‐ 1)*ICC) (Donner 2002). If the ICC was not reported, it was assumed to be 0.1 (Ukoumunne 1999).
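The design‐effect adjustment can be sketched as follows. The default ICC of 0.1 mirrors the assumption stated above; the function names and the example figures are hypothetical.

```python
def design_effect(mean_cluster_size, icc=0.1):
    """Design effect = 1 + (m - 1) * ICC, where m is the mean cluster size."""
    return 1 + (mean_cluster_size - 1) * icc


def adjust_binary_data(events, total, mean_cluster_size, icc=0.1):
    """Divide both the event count and the sample size by the design effect,
    giving the 'effective' numbers to enter into a meta-analysis."""
    deff = design_effect(mean_cluster_size, icc)
    return events / deff, total / deff


if __name__ == "__main__":
    # Hypothetical cluster trial arm: 30 events among 120 people, 21 per cluster.
    print(design_effect(21))
    print(adjust_binary_data(30, 120, 21))
```

The event proportion is unchanged by the adjustment; only the effective sample size shrinks, which widens the confidence interval.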

If we had identified cluster trials, we would have analysed them taking into account intra‐class correlation coefficients and relevant data documented in the report. Synthesis with other studies would have been possible using the generic inverse variance technique.

2. Cross‐over trials

None of the presently included studies employed a cross‐over trial design. For the purposes of future updates of the review, a major concern of cross‐over trials is the carry‐over effect. It occurs if an effect (for example, pharmacological, physiological or psychological) of the treatment in the first phase is carried over to the second phase. As a consequence, on entry to the second phase the participants can differ systematically from their initial state despite a wash‐out phase. For the same reason cross‐over trials are not appropriate if the condition of interest is unstable (Elbourne 2002). As both effects are very likely in severe mental illness, we proposed to only use the data of the first phase of cross‐over studies.

3. Studies with multiple treatment groups

Where a study involved more than two treatment arms, if relevant, we presented the additional treatment arms in comparisons. If data were binary we simply added these and combined them within the two‐by‐two table. If data were continuous we combined data following the formula in section 7.7.3.8 (Combining groups) of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). Where the additional treatment arms were not relevant, we did not reproduce these data.
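The formula for combining two groups of continuous data can be sketched as below; it reproduces the mean and SD that would be obtained from the merged individual data. The function name and example values are hypothetical.

```python
import math


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Merge two arms into one group, returning (n, mean, sd).

    The between-group term (n1 * n2 / n) * (mean1 - mean2)**2 accounts for
    the difference between the two arm means.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (mean1 - mean2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical trial with two active arms to be merged before comparison.
    print(combine_groups(30, 12.0, 4.0, 28, 14.5, 5.0))
```

For more than two arms, the function can be applied repeatedly, merging one additional arm at a time.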

Dealing with missing data

1. Overall loss of credibility

At some degree of loss of follow‐up, data must lose credibility (Xia 2009). We chose that, for any particular outcome, should more than 50% of data be unaccounted for we would not reproduce these data or use them within the analyses. If, however, more than 50% of those in one arm of a study were lost, but the total loss was less than 50%, we would address this within the 'Summary of findings' tables by down‐rating quality. Finally, we would also downgrade quality within the 'Summary of findings' tables should loss be 25% to 50% in total.
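These thresholds amount to a simple decision rule, sketched below for illustration. The labels returned and the handling of a total loss of exactly 50% (treated here as a downgrade, not an exclusion) are assumptions of this sketch.

```python
def attrition_decision(lost_t, n_t, lost_c, n_c):
    """Return 'exclude', 'downgrade' or 'use' for one outcome of one study."""
    total_loss = (lost_t + lost_c) / (n_t + n_c)
    if total_loss > 0.5:
        return "exclude"  # more than half of the data unaccounted for
    one_arm_over_half = lost_t / n_t > 0.5 or lost_c / n_c > 0.5
    if total_loss >= 0.25 or one_arm_over_half:
        return "downgrade"  # keep the data but down-rate quality
    return "use"


if __name__ == "__main__":
    # Hypothetical study: 11 of 20 lost in one arm, 5 of 80 in the other.
    print(attrition_decision(11, 20, 5, 80))
```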

2. Binary

In the case where attrition for a binary outcome is between 0 and 50% and where these data are not clearly described, we presented data on a 'once‐randomised‐always‐analyse' basis (an intention to treat analysis). Those leaving the study early were all assumed to have the same rates of negative outcome as those who completed, with the exception of the outcome of death and adverse effects. For these outcomes the rate of those who stay in the study ‐ in that particular arm of the trial ‐ was used for those who did not. We undertook a sensitivity analysis testing how prone the primary outcomes are to change when data only from people who complete the study to that point were compared to the intention to treat analysis using the above assumptions.

3. Continuous

3.1 Attrition

In the case where attrition for a continuous outcome is between 0% and 50%, and data only from people who complete the study to that point are reported, we reproduced these.

3.2 Standard deviations

If standard deviations were not reported, we first tried to obtain the missing values from the authors. If these were not available, but an exact standard error or confidence interval was available for group means, or a P value or t value was available for differences in means, we calculated the missing measures of variance according to the rules described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). That is, when only the standard error (SE) is reported, standard deviations (SDs) are calculated by the formula SD = SE * square root (n). Chapters 7.7.3 and 16.1.3 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011) present detailed formulae for estimating SDs from P values, t or F values, confidence intervals, ranges or other statistics. If these formulae did not apply, we calculated the SDs according to a validated imputation method which is based on the SDs of the other included studies (Furukawa 2006). Although some of these imputation strategies can introduce error, the alternative would be to exclude a given study's outcome and thus to lose information. We nevertheless examined the validity of the imputations in a sensitivity analysis by excluding the imputed values.
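Two of the simpler conversions can be sketched in Python. The confidence interval version assumes a 95% interval for a single group mean based on a normal approximation (z = 1.96); small samples would call for a t‐distribution multiplier instead. Function names are hypothetical.

```python
import math


def sd_from_se(se, n):
    """SD = SE * sqrt(n), for a standard error of a single group mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover the SD from a confidence interval around a single group mean.

    The interval width equals 2 * z * SE under a normal approximation.
    """
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


if __name__ == "__main__":
    # Hypothetical report: mean 10 with SE 2 in a group of 25 participants.
    print(sd_from_se(2.0, 25))
```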

3.3 Last observation carried forward

We anticipated that in some studies the method of last observation carried forward (LOCF) would be employed within the study report. As with all methods of imputation to deal with missing data, LOCF introduces uncertainty about the reliability of the results (Leucht 2007). Therefore, where LOCF data have been used in the trial, if less than 50% of the data have been assumed we would present and use these data and indicate that they are the product of LOCF assumptions.

Assessment of heterogeneity

1. Clinical heterogeneity

We considered all included studies initially, without seeing comparison data, to judge clinical heterogeneity. We simply inspected all studies for clearly outlying people or situations which we had not predicted would arise. When such situations or participant groups arose, we fully discussed these.

2. Methodological heterogeneity

We considered all included studies initially, without seeing comparison data, to judge methodological heterogeneity. We simply inspected all studies for clearly outlying methods which we had not predicted would arise. When such methodological outliers arose, we fully discussed these.

3. Statistical heterogeneity

3.1 Visual inspection

We visually inspected graphs to investigate the possibility of statistical heterogeneity.

3.2 Employing the I² statistic

We investigated heterogeneity between studies by considering the I² statistic alongside the Chi² P value. The I² provides an estimate of the percentage of variability across studies that is due to heterogeneity rather than chance (Higgins 2003). The importance of the observed value of I² depends on: i) magnitude and direction of effects, and ii) strength of evidence for heterogeneity (for example, P value from Chi² test, or a confidence interval for I²). An I² estimate greater than or equal to around 50% accompanied by a statistically significant Chi² statistic was interpreted as evidence of substantial levels of heterogeneity (Higgins 2011). When substantial levels of heterogeneity were found in the primary outcome, we explored reasons for the heterogeneity (Subgroup analysis and investigation of heterogeneity).
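For readers unfamiliar with the statistic, I² can be computed from Cochran's Q (the Chi² heterogeneity statistic) and its degrees of freedom as I² = 100% × (Q − df)/Q, truncated at zero. The sketch below is illustrative only; the function names and inputs are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, n_studies):
    """I² as a percentage, truncated at zero when Q is below its degrees of freedom."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical meta-analysis of five studies with Q = 8.0 (df = 4)
example = i_squared(8.0, 5)
```

With Q = 8.0 on four degrees of freedom the result is 50%, the approximate threshold described above.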

Assessment of reporting biases

Reporting biases arise when the dissemination of research findings is influenced by the nature and direction of results (Egger 1997). These are described in Chapter 10 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). We are aware that funnel plots may be useful in investigating reporting biases but are of limited power to detect small‐study effects. We did not plan to use funnel plots for outcomes where there were 10 or fewer studies, or where all studies were of similar sizes. As no meta‐analyses of more than five studies were undertaken, we did not conduct funnel plot analysis.

Data synthesis

We understand that there is no closed argument for preference for use of fixed‐effect or random‐effects models. The random‐effects method incorporates an assumption that the different studies are estimating different, yet related, intervention effects. This often seems to be true to us and the random‐effects model takes into account differences between studies even if there is no statistically significant heterogeneity. There is, however, a disadvantage to the random‐effects model: it puts added weight onto small studies, which often are the most biased ones. Depending on the direction of effect, these studies can either inflate or deflate the effect size. We chose the random‐effects model for all analyses. The reader is, however, able to choose to inspect the data using the fixed‐effect model.
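The point about small studies gaining weight under a random‐effects model can be seen in a short sketch. It uses inverse‐variance pooling with the DerSimonian‐Laird estimate of between‐study variance, which is an assumption made for illustration rather than a description of the software settings used in this review. The effect sizes and variances are hypothetical.

```python
import math


def pool(effects, variances, model="random"):
    """Inverse-variance pooling; uses a DerSimonian-Laird tau^2 when model is 'random'."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    tau2 = 0.0
    if model == "random" and k > 1:
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Adding tau^2 to every study's variance flattens the weights.
    w_star = [1.0 / (v + tau2) for v in variances]
    total = sum(w_star)
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / total
    se = math.sqrt(1.0 / total)
    shares = [wi / total for wi in w_star]   # relative weight of each study
    return estimate, se, tau2, shares


# Hypothetical log relative risks; the third study is the smallest (largest variance).
effects = [-0.6, -0.1, 0.3]
variances = [0.01, 0.04, 0.25]
fixed_est, fixed_se, _, fixed_shares = pool(effects, variances, "fixed")
random_est, random_se, tau2, random_shares = pool(effects, variances, "random")
```

In this example the smallest study's share of the total weight is larger under the random‐effects model than under the fixed‐effect model, and the pooled standard error is wider.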

Subgroup analysis and investigation of heterogeneity

1. Subgroup analyses ‐ only primary outcomes

1.1 Clinical state, stage or problem

We proposed to undertake this review and provide an overview of the effects of psychosocial interventions for people with schizophrenia in general. In addition, however, we tried to report data on subgroups of people in the same clinical state, stage and with similar problems.

2. Investigation of heterogeneity

If inconsistency was high, we have reported this. First, we investigated whether data had been entered correctly. Second, if data were correct, we visually inspected the graph and successively removed studies outside of the company of the rest to see if homogeneity was restored. For this review we decided that should this occur, with data contributing to the summary finding of no more than around 10% of the total weighting, we would present the data. If not, then we did not pool the data and discussed the issues. We know of no supporting research for this 10% cut‐off, but we use prediction intervals as an alternative to this unsatisfactory state.

When unanticipated clinical or methodological heterogeneity is obvious we simply stated hypotheses regarding these for future reviews or versions of this review. We do not anticipate undertaking analyses relating to these.

Sensitivity analysis

We conducted sensitivity analyses on outcomes of comparisons with four or more trials in which studies of differing quality were combined. The aim was to ascertain whether results differed substantially when lesser quality trials, or those comprising patients with schizophrenia (or other psychoses), were compared with trials of higher quality or trials using mixed diagnostic groups. We applied all sensitivity analyses to the primary outcomes based on randomised sequence, allocation concealment and blinding of outcome measurement. We restricted sensitivity analyses to comparisons with four or more studies, as analyses with fewer than four trials would not give a clear indication of possible bias in the estimate of effects.

1. Implication of randomisation

We aimed to include trials in a sensitivity analysis if they were described in some way so as to imply randomisation. For the primary outcomes we included these studies and if there was no substantive difference when the implied randomised studies were added to those with a better description of randomisation then we entered all data from these studies.

2. Assumptions for lost binary data

Where assumptions had to be made regarding people lost to follow‐up (see Dealing with missing data) we compared the findings of the primary outcomes when we used our assumptions and when we used data only from people who completed the study to that point. If there was a substantial difference, we reported the results and discussed them but continued to employ our assumption.

Where assumptions had to be made regarding missing standard deviation (SD) data (see Dealing with missing data), we compared the findings of the primary outcomes when we used our assumptions and when we used data only from people who completed the study to that point. A sensitivity analysis was undertaken testing how prone results were to change when completer‐only data were compared to the imputed data using the above assumption. If there was a substantial difference, we reported results and discussed them but continued to employ our assumption.

3. Risk of bias

We analysed the effects of excluding trials that were judged to be at high risk of bias across one or more of the domains of randomisation (implied as randomised with no further details available), allocation concealment, blinding and outcome reporting for the meta‐analysis of the primary outcome. If the exclusion of trials at high risk of bias did not substantially alter the direction of effect or the precision of the effect estimates, then we included data from these trials in the analysis.

4. Imputed values

A sensitivity analysis to assess the effects of including data from trials where we used imputed values for ICC in calculating the design effect in cluster randomised trials was not needed for this update as there were no cluster randomised trials.

If we noted substantial differences in the direction or precision of effect estimates in any of the sensitivity analyses listed above, we did not pool data from the excluded trials with the other trials contributing to the outcome but presented them separately.

Results

Description of studies

Results of the search

A total of 4866 citations were found using the search strategy devised for the original version of this review. The inclusion of the word 'drug' in the search strategy produced a vast number of irrelevant references. For the updated search, we found an additional 661 citations of which 52 appeared relevant. From this pool, 25 were considered for inclusion (Cleary 2008). For the current update (2012 search) 130 additional relevant references were scrutinised in October 2012, which resulted in an additional five studies considered for inclusion. Two further studies were considered for inclusion from an updated search in January 2013. See also Figure 1. One trial report was in German (Bechdolf 2011) and was translated into English for the purposes of data extraction.

Figure 1

Search flow diagram, assessment and reporting of included and excluded studies for 2013 Update.

Included studies

In the previous review (Cleary 2008), 25 randomised controlled trials (RCTs) were selected for inclusion. Three studies (Godley 1994; Maloney 2006; Morse 2006) contained only skewed data (shown as 'other data' within the Data and analyses). The remaining 22 trials provided usable data (either dichotomous or continuous parametric data). For the current update, nine new trials were selected for inclusion. Two studies included in the previous review (Schmitz 2002; Weiss 2007) were excluded in this update as all of the participants were diagnosed with bipolar disorder (see Types of participants). In total, 32 RCTs were included in the current review.

1. Design

Three trials were set exclusively in hospital (Baker 2002; Bechdolf 2011; Swanson 1999) and 19 in the community. Eight trials recruited patients or were conducted in both the community (outpatients) and in hospital (Bellack 2006; Bonsack 2011; Graeber 2003; Hellerstein 1995; Hjorthoj 2013; Kavanagh 2004; Madigan 2013; Naeem 2005) and two were set in the community and in jail (Chandler 2006; Maloney 2006).

Most studies randomly allocated participants to one of two treatment conditions; the exceptions were Burnam 1995; Jerrell 1995a; Jerrell 1995b; Maloney 2006; and Morse 2006. These trials randomly allocated participants to one of three or four (Maloney 2006) interventions. We have used only two of the intervention arms in Burnam 1995 as the other did not fit into any a priori category described for inclusion in this review. Data are shown in additional tables. Study durations ranged from three months to three years and the length of the interventions ranged from less than one hour to three years. There were 19 trials from the USA, six from Australia, three from the UK and one each from Denmark, Germany, Ireland and Switzerland.

2. Participants

A total of 3165 people participated in the trials after giving informed consent and were randomised into one of the treatment arms. All participants were adults (aged 18 to 65 years) who were 'severely mentally ill' with the majority having a diagnosis of schizophrenia, schizoaffective disorder or psychosis. All had a current diagnosis of substance use disorder or had documented evidence of substance misuse. Some were homeless or had a history of unstable accommodation (Burnam 1995; Essock 2006; Morse 2006; Tracy 2007) and some were incarcerated at the time of the study (Chandler 2006; Maloney 2006).

3. Interventions

Integrated models of care (4 RCTs).
Non‐integrated models of care (4 RCTs).
Combined cognitive behavioural therapy and motivational interviewing (7 RCTs).
Cognitive behavioural therapy (2 RCTs).
Motivational interviewing (8 RCTs).
Contingency management (2 RCTs).
Skills training (2 RCTs).

Three trials, containing unusable data, were not allocated to a comparison (Godley 1994; Maloney 2006; Morse 2006) although skewed data were noted in 'Other data' tables where available.

4. Outcomes

Where possible, we included dichotomous data relating to loss to treatment, loss to evaluation, death, abstinence or reduced substance use, relapse, attendance at aftercare, and arrests.

All of the outcome scales and their abbreviations are listed in Table 1 together with the reference of the source of the scale. See below for descriptions of the continuous data scales that reported data used in the analyses. For a full list of the scales mentioned in each of the studies see Characteristics of included studies.

Table 1. List of scales and abbreviations used in included studies

Name of tool	Abbreviation	Source of scale ‐ reference
*Diagnostic tools*
Diagnostic and Statistical Manual of Mental Disorders, 4th edition	DSM‐IV	DSM‐IV
The classification of mental and behavioural disorders	ICD‐10	ICD‐10
Structured Clinical Interview for Diagnosis	SCID	Spitzer 1990
Diagnostic Interview Schedule (DIS), computerised scoring for DSM‐III‐R criteria	C‐DIS‐R	DSM III‐R
*Substance use scales*
Addiction Severity Index	ASI	McLellan 1980; McLellan 1992
Alcohol Use Inventory	AUI	Horn 1987
Alcohol Use Scale	AUS	Mueser 1995
Brief Drinker Profile	BDP	Miller 1987
Drug and Alcohol Problem Scale	DAPS	adapted non‐peer reviewed version of this scale used; see Bond 1991a
Drug Use Scale	DUS	Mueser 1995
Opiate Treatment Index	OTI	Darke 1991
Readiness to Change Questionnaire‐Cannabis	RTCQ‐C	Rollnick 1992
Substance Abuse Treatment Scale	SATS	McHugo 1995
Schedule for Clinical Assessment in Neuropsychiatry	SCAN	Wing 1990
Substance Use Severity Scale	USS	Carey 1996
*Mental state scales*
Addiction Severity Index (psychiatric sub‐scale)	ASI	McLellan 1980
Beck Depression Inventory ‐ Short Form	BDI‐SF, BDI‐11	Beck 1972
Brief Psychiatric Rating Scale	BPRS	Lukoff 1986
Brief Scale for Anxiety	BSA	Tyrer 1984
Brief Symptom Inventory	BSI	Derogatis 1983a
Calgary Depression Scale	CDS	Addington 1992
Comprehensive Psychopathological Rating Scale	CPRS	Asberg 1978
Hamilton Rating Scale for Depression	HAM ‐ D	Hamilton 1960
Insight Scale		David 1992
Montgomery Asberg Depression Rating Scale	MADRS	Montgomery 1979
Positive & Negative Syndrome Scale for schizophrenia	PANSS	Kay 1987
Psychiatric Epidemiologic Research Interview	PERI	Dohrenwend 1980
Scale for the Assessment of Negative Symptoms	SANS	Andreasen 1982
Scale for the assessment of Positive Symptoms	SAPS	Norman 1996
Symptom Checklist 90	SCL‐90	Derogatis 1973; Derogatis 1975
Symptom Checklist 90‐revised	SCL‐90‐R	Derogatis 1983b
Schizophrenia Change Scale	SCR	Montgomery 1978
Young Mania Rating Scale	YMRS	Young 1978
*General function scales*
Global Assessment of Functioning	GAF	DSM‐IV
Health of the Nation Outcome Scale	HoNOS	Wing 1996
Role Functioning Scale	RFS	Green 1987
Social Adjustment Scale for the Severely Mentally Ill	SAS‐SMI	Wieduwilt 1999
Social Functioning Scale	SFS	Birchwood 1990
The Social and Occupational Functioning Scale	SOFAS	Goldman 1992
*Quality of life scales*
Brief Quality of Life Scale	BQOL	Lehman 1995
Life Satisfaction Checklist	LSC	Bond 1988; Bond 1990
Manchester Short Assessment of Quality of Life	MANSA	Priebe 1999
Quality of Life Interview	QOLI	Lehman 1988
Satisfaction with Life Scale	SLS	Stein 1980
World Health Organization's Quality of Life assessment scale, short version	WHOQOL‐BREF	Skevington 2004
*Other*
Client Satisfaction Questionnaire	CSQ	Larsen 1979
The Service Utilization Rating Scale	SURS	Mihalopoulos 1999

4.1 Substance use scales

a. Drug and alcohol scales from Addiction Severity Index (ASI)

The ASI (McLellan 1980) provides two summary scores of problems of functioning in seven areas, including psychiatric problems, and those concerning drug and alcohol use. Severity ratings range from zero to nine and are assessments of lifetime and current problem severity derived by the interviewer. Composite scores are mathematically derived and are based on client responses to a set of items based on the last 30 days. Although difficulties have been reported concerning the use of the ASI with people who have severe mental illness (Corse 1995), the psychometric properties of the subscales with this population have been reported by a number of authors (Appleby 1997; Hodgins 1992; Zanis 1997). Given that the problems encountered by the scale are likely to be encountered by any other similar instrument based on self‐reports of those with severe and persistent mental illness, it was decided to include data obtained with the ASI (used in Barrowclough 2001; Bechdolf 2011; Bellack 2006; Drake 1998a; Essock 2006; Hellerstein 1995 and Lehman 1993).

b. Alcohol Use Inventory (AUI)

This inventory assesses alcohol use (Horn 1987) (used by Hickman 1997).

c. Alcohol Use Scale (AUS)

A five‐point scale based on clinicians' ratings of severity of disorder, ranging from one (abstinence) to five (severe dependence) (Mueser 1995). This was used in Drake 1998a and Essock 2006.

d. Cannabis and Substance Use Assessment Schedule (CASUAS) (modified from the SCAN)

This measures cannabis use and includes similar information to the ASI, such as percentage of days using cannabis in the past four weeks, frequency of cannabis use, and an index of severity (range 0 to 4) with higher scores indicating greater severity (Wing 1990) (used by Edwards 2006).

e. Drug Use Scale (DUS)

A five‐point scale based on clinicians' ratings of severity of disorder, ranging from one (abstinence) to five (severe dependence) (Mueser 1995) (used in Drake 1998a and Essock 2006).

f. Opiate Treatment Index (OTI)

The OTI has six domains reflecting treatment outcomes of: drug use, HIV risk‐taking behaviour, social functioning, criminality, health status and psychological adjustment (Darke 1991; Darke 1992). The drug use domain consists of 11 items measuring drug use over the last three days (recent drug use) or previous month (28 days) for alcohol, cannabis, amphetamines, cocaine, opiates and other drugs. Clients are asked to estimate the number of drinks or usage of drugs on the two most recent use days in the previous month. The quantity over the two days (q1 + q2) is divided by day interval (t1 + t2). Thus, an OTI score of 1.0 indicates one drink, injection or joint per day; 0.14 to 0.99 more than once a week; 0.01 to 0.13 once a week or less, and 2.0 or more indicates use more than once a day. Higher scores indicate a greater degree of dysfunction or substance use. Baker 2002 and Baker 2006 used the OTI to measure substance use over the previous month.
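The OTI drug use score described above is a simple ratio, sketched below. The function name is hypothetical, and the sketch assumes that q1 and q2 are the amounts consumed on the two most recent use days and that t1 and t2 are the intervals, in days, associated with those use days, as implied by the description.

```python
def oti_score(q1, q2, t1, t2):
    """Average daily use: total quantity on two use days over the total day interval."""
    interval = t1 + t2
    if interval <= 0:
        raise ValueError("the combined day interval must be positive")
    return (q1 + q2) / interval


# Hypothetical client: two drinks on each use day, use days two days apart
once_daily = oti_score(q1=2, q2=2, t1=2, t2=2)
# Hypothetical client: four drinks on each use day, use days two days apart
twice_daily = oti_score(q1=4, q2=4, t1=2, t2=2)
```

The first example gives a score of 1.0 (one drink per day) and the second a score of 2.0, the level at which the text places use of more than once a day.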

g. Substance Abuse Treatment Scale (SATS)

An eight‐point scale indicating progression toward recovery ranging from one (early stages of engagement) to eight (relapse prevention). Higher scores indicate greater progression (McHugo 1995). This was used by Drake 1998a and Essock 2006.

h. Alcohol and drug use disorders section of the Structured Clinical Interview for DSM‐III‐R (Patient Edition) (SCID)

Items relate to substance use in the past month (Spitzer 1990). Higher scores indicate a greater degree of dysfunction (used by Baker 2002).

i. Substance Use Severity Scale (USS)

This is a five‐point scale, ranging from one (not using) to five (meets criteria for severe use) (Carey 1996), used by Morse 2006.

4.2 Mental state assessment

a. Beck Depression Inventory (BDI)

This contains 21 self‐report items which measure the severity of depression (Beck 1972). Each item comprises four statements (rated from 0 to 3) describing increasing severity on how they felt over the preceding week. Scores range from 0 to 63, with higher scores indicating more severe symptoms (used in Baker 2006; Edwards 2006 used the short form of this scale (BDI‐SF)).

b. Brief Psychiatric Rating Scale (BPRS)

Used to assess the severity of a range of psychiatric symptoms, including psychotic symptoms (Lukoff 1986), the scale has 24 items of which 14 are based on the person's self‐report in the last two weeks and 10 on the person's behaviour during the interview. Each item can be defined on a seven‐point scale from one (not present) to seven (extremely severe). Total scoring ranges from 24 to 168 and there are five subscales with minimum scores ranging from three to four depending on the subscale (used in Baker 2006; Drake 1998a; Edwards 2006; and Essock 2006).

c. Brief Symptom Inventory (BSI)

This measures psychiatric symptomatology (Derogatis 1983a). A brief rating scale is used by an independent rater to assess severity of psychiatric symptoms. Scores range from 0 to 4 with higher scores indicating more symptoms (used by Baker 2002 and McDonell 2013).

d. Comprehensive Psychopathological Rating Scale (CPRS)

This is an interview rating scale covering a wide range of psychiatric symptoms, and can be used in total or as subscales. The Montgomery Asberg Depression Rating Scale (MADRS), Brief Scale for Anxiety (BSA) and the Schizophrenia Change Scale (SCR) are all subscales of the CPRS. It comprises 65 items that cover the range of psychopathology over the preceding week (40 symptom items are rated by the participant) (Asberg 1978). Each item is rated on a 0 to 3 scale, varying from 'not present' to 'extremely severe', with high scores indicating more severe symptoms and a worse outcome (used by Naeem 2005).

e. Global Assessment of Functioning (GAF)

The Global Assessment of Functioning is a revised version of the Global Assessment Scale (GAS) (Endicott 1976). The GAF scale allows the clinical progress of the patient to be expressed in global terms using a single measure. The GAF allows the clinician to express the patient's psychological, social and occupational functioning on a continuum extending from superior mental health, with optimal social and occupational performance, to profound mental impairment when social and occupational functioning is precluded. Developed for DSM‐IV to report global assessment of functioning on Axis V (DSM‐IV), it ranges from 1 to 100 (zero is used to acknowledge inadequate information). Higher scores indicate a better outcome: scores from 1 to 20 indicate a person unable to function independently; 21 to 40 indicate major impairment, severely impaired by delusions; 41 to 60 indicate moderate impairment with serious symptoms, and these patients usually need continuous treatment in a partial hospitalisation or outpatient setting; 61 to 80 indicate slight or mild impairment with transient symptoms; and 81 to 100 indicate good or superior functioning. Baker 2006, Barrowclough 2001, Barrowclough 2010, Bechdolf 2011, Bonsack 2011, Essock 2006 and Madigan 2013 used this scale.

f. Health of the Nation Outcome Scale (HoNOS)

HoNOS is a 12‐item instrument on a scale of 0 to 4 used to rate patients' symptoms and progress towards health (Wing 1996). Item 3 can be used to rate drug and alcohol use (0 = no problem, 1 = some over‐indulgence but within social norm, 2 = loss of control, 3 = marked craving, 4 = incapacitated by alcohol or drug problem) and other items can be used to assess social functioning. Thus, ratings range from 0 to 48 and higher scores indicate a poorer outcome (used by Naeem 2005).

g. Insight Scale

This is used to assess the level of insight the patient has of his or her illness (David 1992). Seven self‐report items are scored from 0 = no insight to 2 = full insight. One additional self‐report item is scored 0 to 4 (used by Naeem 2005).

h. The Positive and Negative Syndrome Scale (PANSS)

The PANSS was developed from the BPRS and the Psychopathology Rating Scale (Kay 1987). It is used as a method for evaluating positive, negative and other symptom dimensions in schizophrenia. The scale has 30 items and each item can be defined on a seven‐point scoring system, varying from one (absent) to seven (extreme), so total scores range from 30 to 210. This scale can be divided into three subscales for measuring the severity of general psychopathology (range 16 to 112), positive symptoms (PANSS‐P, range 7 to 49) and negative symptoms (PANSS‐N, range 7 to 49). A low score indicates low levels of symptoms. This was used by Barrowclough 2001, Barrowclough 2010, Bechdolf 2011, Bonsack 2011 and Kemp 2007.
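The score ranges quoted for summed scales follow directly from the number of items and the per‐item scoring. A minimal sketch, with a hypothetical helper name, reproduces the PANSS ranges given above; the subscale item counts (16, 7 and 7) are inferred from the stated ranges.

```python
def total_range(n_items, item_min, item_max):
    """Lowest and highest possible total for a scale that sums equally scored items."""
    return n_items * item_min, n_items * item_max


panss_total = total_range(30, 1, 7)      # full scale
panss_general = total_range(16, 1, 7)    # general psychopathology subscale
panss_positive = total_range(7, 1, 7)    # positive symptoms subscale
panss_negative = total_range(7, 1, 7)    # negative symptoms subscale
```

The same arithmetic applies to the other summed scales described in this section, for example the 24 items of the BPRS scored one to seven.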

i. Psychiatric scale from Addiction Severity Index (ASI‐psychiatric)

Psychiatric subscores (McLellan 1980) were reported in Lehman 1993 and Hellerstein 1995. See the ASI scoring above.

j. Scale for the Assessment of Negative Symptoms (SANS)

The scale assesses negative symptoms of schizophrenia (Andreasen 1982). It assesses five symptom complexes to obtain the clinical rating of negative symptoms over the preceding week. They are affective blunting, alogia, apathy, anhedonia and disturbance of attention. Each item uses a six‐point scale ranging from 0 (not at all) to 5 (severe). High scores indicate a worse outcome (used by Edwards 2006).

k. Symptom Checklist 90 (revised) (SCL‐90‐R)

Used to measure psychiatric symptoms (Derogatis 1983b), the scale has 90 self‐report items designed to measure nine symptom dimensions. Each item has a five‐point Likert scale ranging from 0 (mild or not at all) to 4 (severe or extremely distressing), with higher scores indicating greater symptomatology (used by Hickman 1997).

4.3 Quality of life and client satisfaction

a. The Quality of Life Interview (QOLI) and the Brief Quality of Life Scale (BQOL)

The QOLI contains 153 items that measure global life satisfaction as well as objective and subjective quality of life (Lehman 1988; Lehman 1995). It has eight domains (for example, living situations, daily activities and functioning, family relations, social relations). Rated on a 7‐point scale (1 = terrible, 2 = unhappy, 3 = mostly dissatisfied, 4 = equally satisfied and dissatisfied, 5 = mostly satisfied, 6 = pleased, and 7 = delighted) with higher scores indicating better quality of life. It was used by Baker 2006, Bellack 2006, Drake 1998a, Essock 2006 and Lehman 1993.

b. World Health Organization's Quality of Life scale (WHOQOL‐BREF)

The WHOQOL‐BREF is a 26‐item scale (Skevington 2004) assessing physical health, psychological well being, social relationships, and environmental factors (for example, home environment, recreation, access to health care, physical safety and financial resources). It also contains two general items and each item is rated on a 5‐point scale (1 to 5, with higher scores = better quality) (used by Madigan 2013).

c. Client Satisfaction Questionnaire (CSQ)

The Client Satisfaction Questionnaire (CSQ) (Larsen 1979) is a self‐report instrument that consists of eight items designed to measure global patient satisfaction with the services provided and whether these met their needs or approval. The items are rated on a 4‐point scale (from 1 = 'no, definitely not' to 4 = 'very satisfied'), giving a minimum score of 8 and a maximum of 32, with higher scores indicating greater satisfaction (used by Hjorthoj 2013).

4.4 Social functioning

a. Role Functioning Scale (RFS)

This is a self‐report scale whereby the total of four subscales measures global role functioning (Green 1987). Scores reported are summary scores derived from four independent raters. Higher scores indicate better functioning (used by Jerrell 1995a and Jerrell 1995b).

b. Social Adjustment Scale for the Severely Mentally Ill (SAS‐SMI)

An abbreviated version of the Social Adjustment Scale II is used to assess social adjustment (Wieduwilt 1999), with a self‐reported scale composed of 24 items covering seven areas including social, family and work functioning designed specifically for use with schizophrenic populations. Scores range from 1 to 7, with a high score indicating poor outcome (used by Jerrell 1995a and Jerrell 1995b).

c. Social Functioning Scale (SFS)

A self‐report scale developed for people with schizophrenia which enumerates basic skills necessary for community living and performance (Birchwood 1990), the SFS is a 79‐item questionnaire that uses a 4‐point rating scale (0 to 3) of frequency or ability. Items are grouped into seven domains. Raw scores for each subscale are converted to a standard score; overall functioning is based on the mean standard score (Burns 2007). Higher standardised scores indicate better functioning (range 55 to 135) (Birchwood 1990). This was used by Barrowclough 2001.

d. The Social and Occupational Functioning Scale (SOFAS)

SOFAS was derived from the GAF scale and is used to assess levels of physical and mental functioning in social and work settings (Burns 2007; Goldman 1992). Scored similarly to the GAF (see above) by an observer it ranges from 0 to 100 with zero representing inadequate information. Higher scores indicate better outcomes (used by Bonsack 2011 and Edwards 2006).

e. Service Utilisation Rating Scale (SURS)

This measures inpatient and outpatient attendance and medication usage (Mihalopoulos 1999) (used by Edwards 2006).

Excluded studies

In the current update, we excluded 46 studies or trials identified through the initial search (July 2012): five were not randomised, 30 did not include participants with a concurrent diagnosis of severe mental illness and substance misuse, 10 studies used a non‐psychosocial intervention or did not include a specific substance misuse treatment programme, and one trial had no usable data. Five further full text articles that were identified through subsequent searches (Bagoien 2013; Jones 2011; Sigmon 2000; Smeerdijk 2010; Weiss 2009) were excluded.

In the 2008 review, we excluded 68 studies (this did not include related studies, please see Characteristics of excluded studies). Twenty‐six were not randomised or used a quasi‐randomisation method, 18 did not have participants with a concurrent diagnosis of severe mental illness and substance misuse, and 14 used a non‐psychosocial intervention or did not include a specific substance misuse treatment programme. A further 10 RCTs were excluded either due to high attrition rates or unclear reporting (attempts were made to contact all authors for further information). One study previously listed in the 2008 review as ongoing (Sitharthan 1999) was excluded in the current review. Two studies previously included in the 2008 review (Schmitz 2002; Weiss 2007) were excluded as all the participants had bipolar disorder.

Two studies are listed as awaiting assessment (Meister 2010; Odom 2005). Both are dissertations: one requires translation from German to English and the other has been requested.

We found 12 ongoing studies and have tried to contact the authors for further information. Three trials each intend to assess cognitive behavioural therapy versus treatment as usual, motivational interviewing plus cognitive behavioural therapy versus treatment as usual, and contingency management; one trial intends to assess motivational interviewing versus treatment as usual, one integrative therapy, and one will use an (undescribed) educational intervention.

Risk of bias in included studies

For a summary of the overall risk of bias in the included trials please see Figure 2 and Figure 3.

Figure 2

Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3

Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Allocation

1. Random generation

All 32 studies were stated to be randomised. Some used blocking or stratification methods in the sequence to obtain evenly balanced groups or different proportions (2:1) for each intervention, site, or to control for various demographic variables (type or degree of substance use, gender or psychiatric diagnosis). Ten studies stated the sequence was computer generated (Barrowclough 2001; Barrowclough 2010; Bonsack 2011; Chandler 2006; Edwards 2006; Essock 2006; Hjorthoj 2013; Madigan 2013; Maloney 2006; Naeem 2005), but often it was unclear how the sequence was generated (for example, random number table). Six studies used urn randomisation or placed cards in envelopes that were shuffled to produce a random sequence (Bellack 2006; Jerrell 1995a; Jerrell 1995b; Kemp 2007; Lehman 1993; McDonell 2013). Four other studies mentioned that a numbered table or random sequence was used to generate the sequence or that the sequence was stratified (Kavanagh 2004; Nagel 2009; Swanson 1999; Tracy 2007) but it was not clear if a computer was involved as no further details were provided. The 20 studies listed above (with the exception of Maloney 2006) were classified as low risk of selection bias as they provided some details of the allocation process or further particulars were provided by the researchers. Some participants in the Maloney 2006 study were not randomised due to procedural difficulties. The remaining studies did not provide enough details of how the allocation sequence was generated to make a judgement so were classified as of unclear quality with a moderate risk of selection bias and an overestimate of positive effect.

2. Allocation concealment

One study provided a full description of the methods used to generate the random sequence and allocation concealment (Barrowclough 2010). Seven studies provided some details or made explicit their method used for allocation concealment. Four other studies used urn method randomisation (Jerrell 1995a; Jerrell 1995b; Lehman 1993; McDonell 2013), which has a low risk of bias if used properly, and some confirmed this via personal emails. Five trials stated that allocation concealment was achieved by a third party or researcher who was independent of the treating team (Barrowclough 2001; Chandler 2006; Edwards 2006; Hjorthoj 2013; Madigan 2013) but often no further details were provided. The 10 studies listed above were judged as low risk as it was implied that the allocation concealment was adequate. Two trials (Maloney 2006; Swanson 1999) were judged high risk of bias as the researcher or therapist was involved with allocating patients. Four studies used sealed envelopes or patients selected a card (Baker 2006; Bonsack 2011; Kemp 2007; Naeem 2005). However, it was not clear if the envelopes were opaque or if other measures were taken to ensure concealment, so these were judged as unclear risk. The remaining studies were classified as of unclear quality with a moderate risk of selection bias and overestimate of positive effect as no details were provided regarding allocation concealment, but this may be due to incomplete reporting and not how the study was conducted.

Blinding

1. Performance bias

We classified blinding with respect to primary outcomes for performance and detection bias. Because each intervention was a therapy or model of service, we assumed that participants and clinicians were not blind to treatment assignment when considering performance bias. Therefore, we judged performance bias of all trials to be of unclear risk.

2. Detection bias

Overall, 15 studies stated that independent raters were blinded to allocation when assessing clinical ratings of mental state or substance use. For 13 other studies it was unclear if the raters were blind to treatment as this was not stated. Four studies (Graeber 2003; Kemp 2007; Maloney 2006; Nagel 2009) were judged at high risk of bias because it was stated that the outcome assessors were not blind to treatment allocation and it was therefore possible to assess the risk of bias in these studies with higher confidence for clinical‐based ratings. In three of these studies blinding status would not influence the primary outcome data as these were administrative measures (hospital readmissions, convictions, time to first outpatient appointment etc.); however, they were still judged high risk as less effort may have been made to follow up those in the control arm.

Incomplete outcome data

We only rated risk of incomplete outcome data in respect to the primary outcome. The number of participants lost to treatment or evaluation across studies ranged from 0% to 57%. Four trials were judged as adequately addressing incomplete outcome data and were rated as low risk of attrition bias because there were no missing outcome data (Hickman 1997; Lehman 1993; Swanson 1999) and for one study (Graeber 2003) there were no missing values for the primary outcome.

The following trials were rated as high risk. Bellack 2006 excluded 46 of 175 participants after they were randomised, a further 19 participants because they did not become engaged in treatment, and a further 27 were lost to follow‐up. Therefore, a significant proportion of participants (92/175, 53%) were excluded from the analysis, which may have introduced clinically relevant bias into the intervention effect estimates. The attrition rate was greater than 50% for Bond 1991a, Godley 1994 (at 18 months) and Hellerstein 1995 (at 8 months) so data from these trials were excluded from the analysis as per protocol. In the Chandler 2006 trial the attrition rate for the primary outcome measure was 37% (68/182); for the controls no interviews were conducted to ascertain their whereabouts (moved from area, reincarcerated or died) and this may have led to severe bias. Three further trials were rated as high risk as more than 40% of patients were lost to follow‐up; no reasons were given for them being missing and a full intention‐to‐treat (ITT) analysis was not reported (Baker 2002; Jerrell 1995a; Jerrell 1995b).
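The exclusion figures quoted above for Bellack 2006 and Chandler 2006 can be reproduced with simple arithmetic. The sketch below is illustrative only: the helper name `attrition` is hypothetical, and the counts are those stated in the text.

```python
def attrition(total, *losses):
    """Return (number lost, percentage lost rounded to a whole number)."""
    lost = sum(losses)
    return lost, round(100 * lost / total)

# Bellack 2006: 46 excluded after randomisation, 19 not engaged,
# 27 lost to follow-up, out of 175 randomised.
bellack_lost, bellack_pct = attrition(175, 46, 19, 27)

# Chandler 2006: 68 of 182 missing for the primary outcome.
chandler_lost, chandler_pct = attrition(182, 68)

print(bellack_lost, bellack_pct)    # 92 53
print(chandler_lost, chandler_pct)  # 68 37
```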

Many, but not all, included studies provided reasons for attrition. Reasons given included: some of the participants died during the trial, some could not be contacted or moved elsewhere, and some withdrew. Seven studies reported their results based on a full ITT analysis with all missing data imputed for primary outcomes using appropriate methods (Barrowclough 2010; Bechdolf 2011; Bonsack 2011; Edwards 2006; Hjorthoj 2013; Naeem 2005; Nagel 2009). These were rated as unclear as all imputation strategies can bias study results. The remaining trials were rated as 'unclear'. They either did not address this issue, presented insufficient information of attrition or exclusions to permit judgement (that is, no reasons for missing data provided or numbers lost to evaluation not stated for each group) or did not report a full ITT analysis with imputed missing values (Baker 2006; Barrowclough 2001; Bond 1991b; Burnam 1995; Drake 1998a; Essock 2006; Kavanagh 2004; Kemp 2007; Madigan 2013; Maloney 2006; McDonell 2013; Morse 2006; Tracy 2007).

Selective reporting

Four studies were rated as high quality in reporting outcomes with a low risk of reporting bias (Barrowclough 2001; Barrowclough 2010; Hjorthoj 2013; McDonell 2013) as the pre‐specified outcomes listed in the trial protocol were fully reported (cases or means, SD and number (n) for each outcome at specific time points) in the study report. Conversely, four studies were rated as low quality with a high risk of reporting bias (Godley 1994; Maloney 2006; Morse 2006; Tracy 2007) as they presented data in a way we could not consider as free of suggestion of selective outcome reporting. For these studies, there were no usable data or data were reported incompletely for each treatment arm or in a way (for example, as a correlation matrix, graphically or in a mixed‐methods model) that they could not be entered in a meta‐analysis. For the rest of the studies the risk of bias was assessed as unclear with a moderate risk of reporting bias due to insufficient information to permit judgement of yes or no; there was no protocol to assess the presence of selective reporting.

Other potential sources of bias

The risk of other potential sources of bias was rated as low as no evidence of other bias was apparent. Most were publicly funded trials. No declaration of interest was made by authors, and we assume there was none to be made. However, many study authors were active pioneers in developing and implementing the experimental intervention model across the scientific community and clinical world. This raises the issue of how researcher beliefs could affect the entire process of evaluating an intervention in an RCT. Although conscious of this issue, we decided not to make any attempt to rate it as it is very difficult to judge, and erroneous quantification could drive bias into our conclusions.

Effects of interventions

Comparison 1: integrated models of care versus treatment as usual

See summary of findings Table for the main comparison. Data for this comparison came from four trials (Burnam 1995; Chandler 2006; Drake 1998a; Essock 2006).

1.1 Lost to treatment

By the end of treatment (36 months) we found no significant difference in the likelihood of participants being lost to treatment from the pooled results of Chandler 2006; Drake 1998a and Essock 2006 (treatment group 24% lost, control group 21% lost; n = 603, RR 1.09 CI 0.82 to 1.45, Analysis 1.1). Statistical heterogeneity was not present (Chi² = 1.95, df = 2 (P = 0.38); I² = 0%).

1.2 Lost to evaluation

In Burnam 1995 the treatment group was 46% less likely than the control group to be lost to evaluation by 3 months (treatment group 15% lost, control 28% lost; n = 132, RR 0.54 CI 0.27 to 1.08), although this difference was not statistically significant. Six months data (Burnam 1995; Essock 2006) also did not reveal any significant difference between groups (n = 330, RR 0.69 CI 0.27 to 1.73, Analysis 1.2). Nine, 12, 24 and 36 months data were also not significantly different. For 36 month data we combined the results from three studies (Chandler 2006; Drake 1998a; Essock 2006) in a meta‐analysis. There was considerable statistical heterogeneity (Chi² = 7.70, df = 2 (P = 0.02); I² = 74%). Closer inspection of the forest plot indicated a higher retention rate in the treatment group in Drake 1998a, likely to account for this heterogeneity.
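The I² values reported alongside each Chi² statistic in this section are consistent with the usual definition, I² = 100 × (Chi² - df) / Chi², truncated at zero. The sketch below assumes that definition (the text does not state the formula) and uses a hypothetical helper, `i_squared`, applied to Chi² and df values quoted for Comparison 1.

```python
def i_squared(chi2, df):
    """I-squared as a percentage: 100 * (chi2 - df) / chi2, floored at 0."""
    if chi2 <= 0:
        return 0.0
    return max(0.0, 100.0 * (chi2 - df) / chi2)

# Lost to evaluation at 36 months: Chi2 = 7.70, df = 2
print(round(i_squared(7.70, 2)))  # 74

# Lost to treatment at 36 months: Chi2 = 1.95, df = 2 (Chi2 below df gives 0)
print(round(i_squared(1.95, 2)))  # 0
```

When Chi² is smaller than its degrees of freedom the raw value would be negative, which is why the result is floored at 0%.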

1.3 Death

We found no significant differences in the pooled results of Drake 1998a and Essock 2006 with regard to the likelihood of participants dying by the end of 36 months of treatment (treatment 3% died, control 3% died; n = 421, RR 1.18 CI 0.39 to 3.57, Analysis 1.3). Statistical heterogeneity was not present (Chi² = 0.68, df = 1, P = 0.40; I² = 0%).

1.4 Substance use

We found no significant difference (Drake 1998a) between groups in the likelihood of participants not being in remission (alcohol ‐ treatment 57%, control 50%; n = 143, RR 1.15 CI 0.84 to 1.56; drugs ‐ treatment 58%, control 65%; n = 85, RR 0.89 CI 0.63 to 1.25, Analysis 1.4) or in their average SATS scores by 6 months (n = 203, weighted mean difference (MD) 0.07 CI ‐0.28 to 0.42) or 36 months (n = 203, MD 0.11 CI ‐0.41 to 0.63, Analysis 1.5). Further outcome data related to alcohol use (Analysis 1.6), drug use (Analysis 1.7) and general substance use attitudes (Analysis 1.8) contained skewed data and are reported in 'Other data' tables.

1.5 Mental state

We found that the relapse data (Analysis 1.9) and BPRS scores (Analysis 1.10) contained wide confidence intervals (skewed data) and reported these in 'Other data' tables.

1.6 Service utilisation

We found that the pooled results of two studies (Drake 1998a; Essock 2006) for average number of days spent in stable community residences (not in hospital) by 12 months were equivocal (n = 378, MD ‐10.00 CI ‐38.61 to 18.60), and also between 24 (n = 203, MD 7.40 CI ‐6.32 to 21.12) and 36 months (n = 364, MD 5.17 CI ‐9.20 to 19.55, Analysis 1.11). Statistical heterogeneity was not present (Chi² = 0.31, df = 1, P = 0.58; I² = 0%). We found no significant difference (Essock 2006) in likelihood of hospitalisation by 36 months (treatment 42% hospitalised, control 48% hospitalised; n = 198, RR 0.88 CI 0.64 to 1.19, Analysis 1.12). Other measures (skewed data) of service use are reported in 'Other data' tables (Analysis 1.13).

1.7 Functioning

Only Essock 2006 reported data for functioning and we found no significant differences for average global functioning scores (GAF) at 6 months (n = 162, MD 1.10 CI ‐1.58 to 3.78), 12 months (n = 171, MD 0.70 CI ‐2.07 to 3.47), 18 months (n = 176, MD 1.00 CI ‐1.58 to 3.58), 24 months (n = 166, MD 1.70 CI ‐1.18 to 4.58), 30 months (n = 164, MD ‐0.60 CI ‐3.56 to 2.36) or 36 months (n = 170, MD 0.40 CI ‐2.47 to 3.27, Analysis 1.14). Forensic measures (Analysis 1.15), number of hours requiring medication (Analysis 1.16), per cent of time on the street (Analysis 1.17) and time in independent housing (Analysis 1.18) were skewed and are reported in 'Other data' tables.

1.8 Satisfaction

The pooled results of Drake 1998a and Essock 2006 revealed no significant difference in average general life satisfaction (QOLI) scores by 6 months (n = 361, MD ‐0.11 CI ‐0.41 to 0.20), 12 months (n = 372, MD 0.02 CI ‐0.28 to 0.32), 18 months (n = 377, MD 0.09 CI ‐0.27 to 0.44), 24 months (n = 370, MD 0.02 CI ‐0.29 to 0.33), 30 months (n = 366, MD 0.02 CI ‐0.27 to 0.32) and 36 months (n = 373, MD 0.10 CI ‐0.18 to 0.38, Analysis 1.19). Statistical heterogeneity was not present at any of the 6 time points (for example, 24 months: Chi² = 1.09, df = 1, P = 0.30; I² = 8%).

Comparison 2: non‐integrated models of care (intensive case management) versus treatment as usual

See summary of findings Table 2. Four trials assessed this comparison (Bond 1991a; Bond 1991b; Jerrell 1995b; Lehman 1993).

2.1 Lost to treatment

Pooled results of Bond 1991a; Bond 1991b and Jerrell 1995b showed a 23% increase in the likelihood of patients being lost from the treatment group by 6 months (treatment 27% lost, control 22% lost; n = 134, RR 1.23 CI 0.73 to 2.06), which was not statistically significant. Longer‐term evaluations at 12 months (treatment 28% lost, control 24% lost; n = 134, RR 1.21 CI 0.73 to 1.99) and 18 months (treatment 51% lost, control 37% lost; n = 134, RR 1.35 CI 0.83 to 2.19, Analysis 2.1) did not reveal any significant difference between groups. Statistical heterogeneity was not present at 6 months, 12 months, 18 months, 24 months, 30 months or 36 months.

2.2 Lost to evaluation

We found no significant difference in the pooled results (Bond 1991b; Jerrell 1995b; Lehman 1993) for lost to evaluation by 6 months (treatment 10% lost, control 10% lost; n = 121, RR 1.00 CI 0.38 to 2.60) and by 12 months (treatment 12% lost, control 12% lost; n = 121, RR 1.00 CI 0.43 to 2.35). Pooled results (Bond 1991b; Jerrell 1995b) at 18 months also revealed no significant differences between treatment groups (treatment 43% lost, control 33% lost; n = 92, RR 1.26 CI 0.48 to 3.30, Analysis 2.2). Statistical heterogeneity was not present at 6, 12, or 18 months.

2.3 Substance use and mental state

Data for substance use (Analysis 2.3) and mental state (Analysis 2.4) were skewed and are included in 'Other data' tables.

2.4 Functioning

We found no significant difference in the average role functioning (RFS) scores (Jerrell 1995b) by 6 months (n = 50, MD ‐0.78 CI ‐2.91 to 1.35) or 12 months (n = 50, MD 0.70 CI ‐1.56 to 2.96), although by 18 months the data favoured the control group (n = 29, MD ‐2.67 CI ‐5.28 to ‐0.06, Z = 2.00, P = 0.045, Analysis 2.5). The average baseline means (SD) on the RFS were similar between groups so did not explain this difference: baseline treatment 9.46 (4.11) to 10.77 (2.36) at 18 months; and baseline control 10.03 (3.87) to 13.44 (4.78) at 18 months. Note that higher scores indicate better functioning.
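The Z and P values quoted for the 18 month RFS result can be approximately recovered from the mean difference and its 95% confidence interval. The sketch below assumes a normal approximation with a critical value of 1.96 and a symmetric interval; the helper name `z_and_p_from_ci` is hypothetical, and because the published limits are rounded the output only approximates the reported Z = 2.00 and P = 0.045.

```python
import math

def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the Z statistic and two-sided P value from a 95% CI."""
    se = (upper - lower) / (2 * z_crit)   # standard error implied by the CI width
    z = abs(estimate) / se
    p = math.erfc(z / math.sqrt(2))       # two-sided P under the standard normal
    return z, p

# RFS at 18 months (Jerrell 1995b): MD -2.67, CI -5.28 to -0.06
z, p = z_and_p_from_ci(-2.67, -5.28, -0.06)
print(f"Z = {z:.2f}, P = {p:.3f}")
```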

We found no significant difference in average levels of social adjustment scores (SAS) by 6 months (Jerrell 1995b) (n = 50, MD ‐0.93 CI ‐6.34 to 4.48), 12 months (n = 50, MD 3.09 CI ‐2.71 to 8.89) or 18 months (n = 29, MD ‐3.75 CI ‐10.12 to 2.62, Analysis 2.6).

2.5 Satisfaction

Data for average life satisfaction (QOLI) were skewed so are reported in 'Other data' tables (Analysis 2.7).

Comparison 3: cognitive behavioural therapy + motivational interviewing versus treatment as usual

See summary of findings Table 3. Data for this comparison came from seven trials (Baker 2006; Barrowclough 2001; Barrowclough 2010; Bellack 2006; Hjorthoj 2013; Kemp 2007; Madigan 2013).

3.1 Lost to treatment

We found that the results from Baker 2006 indicated that the treatment group were 17 times more likely to be lost to treatment by 3 months (treatment 12%, control 0%; n = 130, RR 17.00 CI 1.0 to 288.56). In contrast, Madigan 2013 reported no significant group difference in lost to treatment by 3 months (treatment 29%, control 24%; n = 88, RR 1.19 CI 0.56 to 2.55). Combined, the treatment group appeared more likely to be lost to treatment by 3 months (treatment 20%, control 7%; n = 218, RR 3.37 CI 0.20 to 57.79), but the confidence interval was wide and included no effect, and there was considerable statistical heterogeneity (Chi² = 3.95, df = 1, P = 0.05; I² = 75%). Six month data (Barrowclough 2010; Bellack 2006; Hjorthoj 2013) revealed no significant difference for loss to treatment (treatment 29%, control 23%; n = 605, RR 1.02 CI 0.68 to 1.54, P = 0.91). Similarly, we found 9 to 10 month data (Barrowclough 2001; Hjorthoj 2013) were not significantly different in rates of loss to treatment (treatment 23%, control 32%; n = 139, RR 0.72 CI 0.42 to 1.23) nor were 12 month data significantly different (Barrowclough 2010) (treatment 17.7%, control 17.8%, Analysis 3.1). Statistical heterogeneity was not present at 6 or 9 to 10 months.

3.2 Lost to evaluation

We found all data to be equivocal between the treatment and control groups by 3 months (Baker 2006) (treatment 8% lost, control 6% lost; n = 130, RR 1.25 CI 0.35 to 4.45) and by 6 months (Baker 2006; Bellack 2006; Kemp 2007) (treatment 15% lost, control 14% lost; n = 259, 3 RCTs, RR 1.02 CI 0.35 to 2.94). Longer evaluation times also did not reach statistical significance, at 9 months (Barrowclough 2001) (treatment 11%, control 17%; n = 36, RR 0.67 CI 0.13 to 3.53), 12 months (Baker 2006; Barrowclough 2001; Madigan 2013) (treatment 31%, control 21%; n = 254, 3 RCTs, RR 1.35 CI 0.87 to 2.08), 18 months (Barrowclough 2001; Barrowclough 2010) (treatment 20%, control 22%; n = 363, 2 RCTs, RR 0.92 CI 0.61 to 1.38) and 24 months (Barrowclough 2010) (treatment 21%, control 28%; n = 327, 1 RCT, RR 0.76 CI 0.52 to 1.11, Analysis 3.2). Statistical heterogeneity was not present for any of the above subgroup analyses.

3.3 Death

We found no significant difference in the pooled results (Baker 2006; Barrowclough 2001; Barrowclough 2010) for the likelihood of participants dying by about 1 year (treatment 2.4%, control 3.3%; n = 493, 3 RCTs, RR 0.72 CI 0.22 to 2.41, Analysis 3.3). Statistical heterogeneity was not present (Chi² = 2.18, df = 2, P = 0.34; I² = 8%). Similarly, we found no significant difference for the likelihood of participants hospitalised or dying versus alive and not admitted to hospital by 24 months (Barrowclough 2010) (treatment 23%, control 20%; n = 326, RR 1.15 CI 0.76 to 1.74, Analysis 3.4).

3.4 Substance use

Substance use from polydrug usage was not significantly different by 3 months (Baker 2006) (n = 119, MD 0.37 CI ‐0.01 to 0.75), or by 6 months (n = 119, MD 0.19 CI ‐0.22 to 0.60, Analysis 3.5). Moreover, cannabis use in the last 30 days was not significantly different at 3 months, the end of treatment (Madigan 2013) (n = 50, MD ‐0.2 CI ‐2.54 to 2.14) or at 12 months (Madigan 2013) (n = 42, MD ‐0.3 CI ‐2.84 to 2.24, Analysis 3.6). Averages of various substance use measures that reported skewed data are shown in 'Other data' tables (Analysis 3.7; Analysis 3.8).

3.5 Mental state

We were only able to include limited data for relapse and found no significant difference in the likelihood of relapse between groups (Barrowclough 2001) by 9 months (treatment 28% relapsed, control 56% relapsed; n = 36, RR 0.50 CI 0.21 to 1.17), or by 12 months (treatment 33%, control 67%; n = 36, RR 0.50 CI 0.24 to 1.04), or 18 months (treatment 39%, control 67%; n = 36, RR 0.58 CI 0.30 to 1.13, Analysis 3.9). No significant differences were found for total PANSS scores between treatment groups by 6 months (Hjorthoj 2013; Kemp 2007) (n = 78, MD 0.99 CI ‐5.91 to 7.89), 9 to 10 months (Barrowclough 2001; Hjorthoj 2013) (n = 92, MD ‐5.01 CI ‐11.25 to 1.22), 12 months (Barrowclough 2010) (n = 274, MD 2.52 CI ‐0.68 to 5.72) and by 24 months (Barrowclough 2010) (n = 247, MD 2.71 CI ‐0.58 to 6.00, Analysis 3.10). Moreover, no significant differences were reported for the PANSS positive symptom (Analysis 3.11) nor the PANSS negative symptom (Analysis 3.12) subscales at 12 or 24 months. Statistical heterogeneity was not present for any of the above time points. Average scores for other measures of mental state that reported skewed data are presented in 'Other data' tables (Analysis 3.13).

3.6 Functioning

3.6.1 Arrests

We found the number of reported arrests (Bellack 2006) were not significantly different between treatment and control group by 6 months (treatment 13%, control 27%; n = 110, RR 0.49 CI 0.22 to 1.10, Analysis 3.14).

3.6.2 Global assessment of functioning

Global assessment scores for functioning (GAF) were not significantly different by 3 months (Baker 2006; Madigan 2013) (n = 177, MD ‐1.17 CI ‐4.57 to 2.23), 6 months (n = 119, MD ‐0.09 CI ‐3.70 to 3.52), 12 months (Baker 2006; Barrowclough 2001; Barrowclough 2010; Madigan 2013) (n = 445, 4 RCTs, MD 1.24 CI ‐1.86 to 4.34), 18 months (n = 28, 1 RCT, MD 6.68 CI ‐5.24 to 18.60) or 24 months (n = 234, 1 RCT, MD ‐0.21 CI ‐2.93 to 2.51, Analysis 3.15). Lower scores indicate poorer functioning. Statistical heterogeneity was not present at 3 months or 12 months (Chi² = 5.20, df = 3, P = 0.16; I² = 42%).

3.6.3 Social functioning

We found no significant difference by 9 months (Barrowclough 2001) (n = 32, MD 5.01 CI ‐0.55 to 10.57) in social functioning scores. However, by 12 months (3 months following end of treatment) results favoured the treatment group (high scores = better) (Barrowclough 2001) (n = 32, MD 7.27 CI 0.86 to 13.68, Analysis 3.16).

3.7 Quality of life

Average general life satisfaction scores (BQOL) were higher for the treatment group (Bellack 2006) by 6 months (n = 110, MD 0.58 CI 0.00 to 1.16, P = 0.049, Analysis 3.17), although the lower confidence limit reached the line of no effect. Differences in baseline means (SD) did not account for this finding (treatment 4.25 (1.65) to 4.79 (1.66) at 6 months, and control 3.96 (1.58) to 4.21 (1.43) at 6 months). Lower scores indicate less life satisfaction. However, no significant differences were found in overall quality of life scores (BQOL) by 6 months (Bellack 2006) (n = 110, MD ‐0.02 CI ‐0.61 to 0.57, Analysis 3.18). No significant differences in WHOQOL Bref scores were reported by Kemp 2007 (n = 16, MD ‐15.70 CI ‐36.19 to 4.79, Analysis 3.19) nor were there any significant differences in quality of life scores using the MANSA by 6 months (Hjorthoj 2013) (n = 64, MD ‐2.70 CI ‐7.01 to 1.61) or 10 months (n = 61, MD 0.90 CI ‐3.73 to 5.53, Analysis 3.20).

3.8 Satisfaction

One study (Hjorthoj 2013) reported client satisfaction was higher for the treatment group by 10 months (n = 62, MD 6.40 CI 3.87 to 8.93, P < 0.001, Analysis 3.21). The average direct cost subscale of the BQOL at 6 months reported by Bellack 2006 was skewed and is reported in 'Other data' tables (Analysis 3.22).

Comparison 4: cognitive behavioural therapy versus treatment as usual

See summary of findings Table 4. Data for this comparison came from two trials (Edwards 2006; Naeem 2005).

4.1 Lost to treatment

We found that the data for being lost from treatment (Edwards 2006; Naeem 2005) by 3 months were not significantly different (treatment 18%, control 23% lost; n = 259, RR 1.12 CI 0.44 to 2.86, Analysis 4.1). Statistical heterogeneity was not present (Chi² = 0.00, df = 1, P = 0.95; I² = 0%).

4.2 Lost to evaluation

The number of participants lost to evaluation (Edwards 2006) after 9 months was similar in each group (treatment 30%, control 29%; n = 47, RR 1.04 CI 0.43 to 2.51, Analysis 4.2).

4.3 Substance use

No significant differences were found in the use of cannabis (Edwards 2006) in the previous 4 weeks between groups at 3 months assessment (treatment 57%, control 54%; n = 47, RR 1.04 CI 0.62 to 1.74). Six month data were also not significantly different (n = 47, RR 1.30 CI 0.79 to 2.15, Analysis 4.3). Various measures of substance use reporting skewed data are shown in 'Other data' tables (Analysis 4.4).

4.4 Mental state

We found no significant difference on insight scores (Insight Scale) by 3 months (Naeem 2005) (n = 105, MD 0.52 CI ‐0.78 to 1.82, Analysis 4.5). Various measures of mental state reporting skewed data are shown in 'Other data' tables (Analysis 4.6).

4.5 Functioning

We found no significant difference in average social and occupational functioning scores (Edwards 2006) (SOFAS) by 3 months (n = 47, MD ‐0.80 CI ‐9.95 to 8.35) or 6 months (n = 47, MD ‐4.70 CI ‐14.52 to 5.12, Analysis 4.7). Average HONOS scores (Analysis 4.8) and outpatient medication (Analysis 4.9) are shown in 'Other data' tables due to skewed data.

Comparison 5: cognitive behavioural therapy + psychosocial rehabilitation versus treatment as usual

5.1 Functioning

See summary of findings Table 5. We were only able to add outcome data relating to functioning and these were all skewed data, which are reported in 'Other data' tables (Maloney 2006). There was no real indication that the number of arrests was less in the cognitive behavioural therapy + psychosocial rehabilitation group over all the time periods (Analysis 5.1), and this also applied to the number of convictions (Analysis 5.2). The number of days in jail for each group was also not noticeably different (Analysis 5.3). It should be stressed that all data were skewed and not reanalysed, merely reported again in this review.

Comparison 6: combined cognitive behavioural therapy + intensive case management versus treatment as usual

6.1 Functioning

See summary of findings Table 6. We were only able to add outcome data relating to functioning and these were all skewed data, which are reported in 'Other data' tables (Maloney 2006). There is some indication that the number of arrests was less in the cognitive behavioural therapy + intensive case management group over all the time periods (Analysis 6.1) and this also applied to the number of convictions (Analysis 6.2). However, the number of days in jail for each group was not noticeably different (Analysis 6.3). It should be stressed that all data were skewed and not reanalysed, merely reported again in this review.

Comparison 7: intensive case management versus treatment as usual

See summary of findings Table 7.

7.1 Functioning

We were only able to add outcome data relating to functioning and these were all skewed data, which are reported in 'Other data' tables (Maloney 2006). There is no real indication that the number of arrests was less in the intensive case management group over all the time periods (Analysis 7.1) and this also applied to the number of convictions (Analysis 7.2). The number of days in jail for each group was also not noticeably different (Analysis 7.3). It should be stressed that all data were skewed and not reanalysed, merely reported again in this review.

Comparison 8: motivational interviewing versus treatment as usual

See summary of findings Table 8. Data for this comparison came from eight trials (Baker 2002; Bechdolf 2011; Bonsack 2011; Graeber 2003; Hickman 1997; Kavanagh 2004; Nagel 2009; Swanson 1999).

8.1 Lost to treatment

Bonsack 2011 had an unusually long treatment period using motivational interviewing (6 months). There were no significant differences in lost to treatment at 3 months (n = 62, RR 0.89 CI 0.30 to 2.61) or 6 months (n = 62, RR 1.71 CI 0.63 to 4.64, Analysis 8.1).

8.2 Lost to evaluation

Pooled results from six studies (Baker 2002; Bechdolf 2011; Graeber 2003; Hickman 1997; Kavanagh 2004; Swanson 1999) revealed no significant difference in those lost to evaluation by 3 months (treatment 17% lost, control 16% lost; n = 398, RR 1.12 CI 0.64 to 1.96). Similarly, 6 month data (Bechdolf 2011; Graeber 2003; Kavanagh 2004; Nagel 2009) (n = 164, 4 RCTs, RR 0.85 CI 0.29 to 2.53) and 12 month data were not significantly different (n = 247, 3 RCTs, RR 0.92 CI 0.44 to 1.92, Analysis 8.2) between motivational interviewing and the control group. Statistical heterogeneity was not present at 3, 6 or 12 months (Chi² = 3.55, df = 2, P = 0.17; I² = 44%).

8.3 Relapse

We found no significant difference in hospital readmissions by 12 months (Bonsack 2011) (treatment 30%, control 34%; n = 62, RR 0.82 CI 0.28 to 2.38, Analysis 8.3).

8.4 Lost to first aftercare appointment

We found participants in the control group were more likely to not attend their first aftercare appointment (Swanson 1999) (treatment 58%, control 84%; n = 93, RR 0.69 CI 0.53 to 0.90, Analysis 8.4) compared with those receiving motivational interviewing.

8.5 Death

We found no significant differences in the likelihood of death due to all causes by 18 months (Nagel 2009) (treatment 4%, control 4%; n = 49, RR 1.04 CI 0.07 to 15.73, Analysis 8.5).

8.6 Substance use

We found that alcohol dependence and abuse were not significantly different (Baker 2002) (treatment 39%, control 29%; n = 52, RR 1.35 CI 0.62 to 2.92) between groups. Also, we found no significant differences in the likelihood of participants using amphetamine (treatment 9%, control 38%; n = 19, RR 0.24 CI 0.03 to 1.92) or cannabis (treatment 50%, control 65%; n = 62, RR 0.77 CI 0.49 to 1.21, Analysis 8.6). Polydrug use was not found to be significantly different for 3 and 12 month evaluation data (OTI, high = poor) (Baker 2002) (n = 89, MD ‐0.41 and ‐0.07, respectively, Analysis 8.7).

We found no significant differences (Kavanagh 2004) for the outcome of not abstaining or not improved on all substances by 12 months (treatment 38%, control 75%; n = 25, RR 0.51 CI 0.24 to 1.10, Analysis 8.8). Three month data (Graeber 2003) did not reveal any significant difference in not abstaining from alcohol (treatment 40%, control 77%; n = 28, RR 0.52 CI 0.26 to 1.03). However, by 6 months we found results from this small study (Graeber 2003) favoured the treatment group (treatment 42%, control 92%; n = 28, RR 0.36 CI 0.17 to 0.75, Analysis 8.9). Change in cannabis use from baseline was lower in the treatment group at 3 months (Bonsack 2011) (n = 62, MD ‐12.81 CI ‐23.05 to ‐2.57, P = 0.014) and 6 months (n = 62, MD ‐9.64 CI ‐18.05 to ‐1.23, P = 0.025), but not at 12 months (n = 62, MD ‐5.82 CI ‐14.77 to 3.13, Analysis 8.10). Cannabis consumption (Analysis 8.11), average substance use scores on the Opiate Treatment Index (OTI) (Analysis 8.12) and other measures of alcohol use (Analysis 8.13) are reported in 'Other data' tables due to skewed data.

8.7 Mental state

We found that 3 month data by Hickman 1997 revealed no significant differences in general severity (n = 30, MD ‐0.19 CI ‐0.59 to 0.21), positive distress symptoms (n = 30, MD ‐0.19 CI ‐0.66 to 0.28), or total positive symptoms (n = 30, MD ‐4.20 CI ‐18.72 to 10.32) as measured by the SCL‐90 (Analysis 8.14). Further, PANSS negative symptom scores were not significantly different at 3 months (Bonsack 2011) (n = 62, MD ‐0.10 CI ‐2.06 to 1.86) or 6 months (MD 0.0 CI ‐1.80 to 1.8, Analysis 8.15); nor were PANSS positive symptom scores at 3 months (MD ‐0.30 CI ‐2.55 to 1.95) or 6 months (MD ‐0.10 CI ‐2.58 to 2.38, Analysis 8.16). Brief Symptom Inventory scores at 3 months were skewed (Analysis 8.17) and are reported in 'Other data' tables.

8.8 Functioning

Social functioning scores (Baker 2002) did not reveal any significant differences by 6 months (n = 102, MD ‐0.71 CI ‐2.76 to 1.34), or by 12 months as measured by the OTI (n = 102, MD ‐1.42 CI ‐3.35 to 0.51, Analysis 8.18). Moreover, GAF scores were not significantly different at 3 months (Bonsack 2011) (MD ‐0.40 CI ‐3.53 to 2.73), 6 months (MD ‐1.0 CI ‐4.81 to 2.81) or 12 months (MD 2.3 CI ‐1.30 to 5.90, Analysis 8.19). Social occupational functioning (SOFAS) scores were not significantly different at 3 months (Bonsack 2011) (MD 0.10 CI ‐3.02 to 3.22), 6 months (MD ‐0.10 CI ‐3.51 to 3.31) or 12 months (MD 2.70 CI ‐1.08 to 6.48, Analysis 8.20). Number of crimes reported at 6 and 12 months are reported in 'Other data' tables (Analysis 8.21).

Comparison 9: skills training versus treatment as usual

See summary of findings Table 9. Data for this comparison came from two trials (Hellerstein 1995; Jerrell 1995a).

9.1 Lost to treatment

We found that the pooled results of Hellerstein 1995 and Jerrell 1995a showed that participants in the treatment group were 51% less likely than controls to be lost to treatment by 6 months (treatment 16%, control 31%; n = 94, RR 0.49 CI 0.24 to 0.97), although the difference was not significant by 12 months (treatment 27%, control 37%; n = 94, RR 0.70 CI 0.44 to 1.10). By contrast, at 18 months we found that participants given skills training were more than twice as likely to be lost (treatment 68%, control 28%; n = 47, 1 RCT, RR 2.44 CI 1.22 to 4.86, Analysis 9.1).
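A risk ratio can be restated as a percentage change in risk for the treatment group relative to control, (RR - 1) × 100. The sketch below uses a hypothetical helper, `relative_change_percent`, on the two significant results above. Note that the corresponding ratio for the control group is the reciprocal 1/RR, not 1 - RR.

```python
def relative_change_percent(rr):
    """Percentage change in risk for treatment relative to control."""
    return round((rr - 1.0) * 100)

# 6 months: RR 0.49, treatment group 51% less likely to be lost
print(relative_change_percent(0.49))  # -51

# 18 months: RR 2.44, treatment group 144% more likely (about 2.4 times)
print(relative_change_percent(2.44))  # 144

# Reciprocal: control versus treatment at 6 months, roughly twice the risk
print(round(1 / 0.49, 2))             # 2.04
```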

9.2 Substance use

Average scores of various substance use scales were skewed and reported in 'Other data' tables (Analysis 9.2).

9.3 Functioning

We found no significant differences in average role functioning scores by 6 months (Jerrell 1995a) (n = 47, MD 0.61 CI ‐1.63 to 2.85), 12 months (n = 47, MD 1.07 CI ‐1.15 to 3.29) and 18 months (n = 25, MD ‐2.55 CI ‐6.24 to 1.14, Analysis 9.3). No differences were observed in social adjustment (SAS) by 6 months (Jerrell 1995a) (n = 47, MD ‐0.92 CI ‐6.58 to 4.74), 12 months (n = 47, MD 2.58 CI ‐3.39 to 8.55) and 18 months (n = 25, MD ‐4.66 CI ‐15.29 to 5.97, Analysis 9.4).

Comparison 10: specialised case management services versus standard care

See summary of findings Table 10. Godley 1994 was a small trial that we found difficult to present and interpret. Data were reported by site and were all skewed.

10.1 Service use

We were only able to add outcome data relating to admissions and length of stay and these were all skewed data, which we have reported in 'Other data' tables (Analysis 10.1). We found no pattern overall of one package of care favoured over another.

Comparison 11: integrated assertive community treatment versus assertive community treatment team versus standard care

One trial contributed data for this comparison (Morse 2006). We did not construct a GRADE 'Summary of findings' table as the data were skewed and were presented according to the three arms and not as direct comparisons between each arm.

11.1 Substance use

All data for this outcome were skewed and are reported in 'Other data' tables (Analysis 11.1).

11.2 Functioning

All data for this outcome were skewed and are reported in 'Other data' tables (Analysis 11.2; Analysis 11.3).

Comparison 12: contingency management versus standard care

See summary of findings Table 11. Two trials assessed this comparison (McDonell 2013; Tracy 2007).

12.1 Lost to treatment

No significant differences were reported in lost to treatment by 4 weeks (Tracy 2007) (treatment 0%, control 27%; n = 30, RR 0.11 CI 0.01 to 1.90). However, McDonell 2013 reported that those assigned to the contingency management condition were more likely not to complete the treatment period (dropping out) than controls at 3 months (treatment 58%, control 35% lost; n = 176, RR 1.65 CI 1.18 to 2.31, Z = 2.92, P = 0.0035, Analysis 12.1).
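
The Z and P values quoted with a risk ratio can be approximately recovered from the ratio and its 95% CI. The sketch below assumes the interval is symmetric on the log scale (a normal approximation); the function name is hypothetical, and this is not necessarily how the review software derived the figures.

```python
import math


def z_and_p_from_ratio_ci(rr, lower, upper, z_crit=1.959964):
    """Back-calculate Z and a two-sided P value from a ratio and its 95% CI.

    Assumes the CI was built as exp(log(rr) +/- z_crit * SE) on the log scale.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se_log
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# McDonell 2013, lost to treatment at 3 months: RR 1.65 CI 1.18 to 2.31.
z, p = z_and_p_from_ratio_ci(1.65, 1.18, 2.31)
print(f"Z = {z:.2f}, P = {p:.4f}")
```

Because the inputs are rounded to two decimal places, the recovered values should be expected to match the reported Z = 2.92 and P = 0.0035 only approximately.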

12.2 Lost to evaluation

No significant differences were reported in those lost to evaluation by 6 months (McDonell 2013) (treatment 32%, control 24%; n = 176, RR 1.35 CI 0.83 to 2.20, Analysis 12.2).

12.3 Substance use

Stimulant‐positive urine tests were significantly more likely in control versus treated patients by 12 weeks (McDonell 2013) (treatment 10%, control 25%; n = 176, RR 0.34 CI 0.17 to 0.68, Z = 3.04, P = 0.0024) but not at 6 months (treatment 54%, control 65%; n = 176, RR 0.83 CI 0.65 to 1.06, Z = 1.46, P = 0.14, Analysis 12.3). Injection use during treatment was significantly lower in the treatment arm compared to the control arm at 3 months (McDonell 2013) (treatment 37%, control 66%; n = 176, RR 0.57 CI 0.42 to 0.77, Z = 3.62, P < 0.001) but was not significantly different at the 6 month follow‐up (treatment 44%, control 56%; n = 107, RR 0.78 CI 0.53 to 1.15, Z = 1.24, P = 0.22, Analysis 12.4). Average scores on various substance use measures were skewed and reported in 'Other data' tables (Analysis 12.5).

12.4 Mental state

Relapse rates (hospitalised within 6 months after randomisation) were significantly lower in the treatment arm compared to the control arm (McDonell 2013) (treatment 2%, control 11%; n = 176, RR 0.21 CI 0.05 to 0.93, Analysis 12.6). Average scores on various mental state scales were skewed and reported in 'Other data' tables (Analysis 12.7).

Comparison 13: sensitivity analyses

All of the included studies were described as randomised and random sequence generation was judged as at low or unclear risk of bias for all included trials. Therefore, we did not undertake the anticipated sensitivity analysis. There were only two comparisons (Analysis 3.15; Analysis 8.2) where four or more studies were reported for a comparison and sensitivity analyses were undertaken for these. Analysis 13.1 grouped studies investigating motivational interviewing plus cognitive behavioural therapy according to risk of bias for allocation concealment and Analysis 13.2 grouped studies investigating motivational interviewing according to diagnostic entry criteria (mixed diagnoses versus schizophrenia only trials) for the short to medium term (three to six months). Neither of these analyses altered the overall result.

Discussion

Summary of main results

Comparison 1: integrated models of care versus treatment as usual

Please see summary of findings Table for the main comparison. Overall there was low quality evidence of no difference between integrated models of care and treatment as usual in terms of numbers lost to treatment or deaths by 36 months, although individually some studies (Burnam 1995; Essock 2006) showed some effect for retaining participants in evaluation during the early stages of each study. At the end of each treatment period differences were no longer apparent. All four studies had sample sizes greater than 100 participants, drawn from homeless (Burnam 1995), forensic (Chandler 2006) and community populations (Drake 1998a; Essock 2006). Modified scales were used by Burnam 1995, precluding inclusion.

There was low quality evidence of no difference in alcohol or substance use between integrated models of care and treatment as usual in terms of remission (or not) by 36 months.

Moreover, there was low quality evidence of no difference between integrated models of care and treatment as usual in terms of average general global functioning or satisfaction with quality of life.

Outcome measures of jail and hospital days, arrests and hours of medication service were all skewed in Chandler 2006. This resulted in attrition being the only clear outcome measure which could be analysed. We were able to include data from Essock 2006 and Drake 1998a. They provided both treatment and control groups with a certain level of integrated care, the difference being that the ACT teams provided most outpatient services themselves while standard case management (treatment as usual) brokered services to other clinicians. The null results found in this review suggest that providing services by the same team may not be crucial to successful integration of services, although readers are advised that the quality of evidence is low overall.

Comparison 2: non‐integrated models of care or intensive case management versus treatment as usual

Please see summary of findings Table 2. There was very low quality evidence of no difference between non‐integrated models of care or intensive case management and treatment as usual in terms of being lost to treatment by 12 months. Death was not measured in any of the trials. There was very low quality evidence of no difference between non‐integrated models of care or intensive case management and treatment as usual in terms of alcohol or drug use as data were skewed or not reported.

Moreover, there was very low quality evidence of no difference between non‐integrated models of care or intensive case management and treatment as usual in terms of mental state, average general global functioning or general life satisfaction.

The results showed no support for retaining participants in non‐integrated treatment over standard case management at any time period. We were able to include only limited data as attrition rates were high (Bond 1991a), adapted scales were used, and the data were skewed or reporting was unclear (Bond 1991b; Lehman 1993; Jerrell 1995b). The role functioning (RFS) data provided by Jerrell 1995b by the end of the study (18 months) favoured the Twelve Step recovery control group, with a small but significant difference. The social adjustment scores were similar between groups.

Comparison 3: cognitive behavioural therapy + motivational interviewing (CBT+MI) versus treatment as usual

Please see summary of findings Table 3. There was low quality evidence of no difference between CBT+MI and treatment as usual in terms of numbers lost to treatment or deaths by 12 months. All the data for alcohol use were skewed and evidence for substance use by 6 months was very low quality.

There was very low quality evidence of no difference between CBT+MI and treatment as usual in terms of mental state (relapse) and average global functioning at 12 months. Moreover, there was low quality evidence for quality of life at 6 months between treatment arms.

We found some support for the effectiveness of CBT+MI over standard care, yet findings were inconsistent and, again, many data from the seven eligible studies could not be used. Barrowclough 2001 was a small study but showed an increased likelihood of relapse in the control group up until 18 months. Global functioning was slightly lower in the control group by nine months, although this difference was not sustained at later time periods (up to 18 months). Bellack 2006 showed slightly decreased general life satisfaction scores and a 51% increased likelihood of being arrested in their reasonably sized control group by six months. By contrast, Baker 2006 showed that participants were more likely to drop out of the treatment group by three months. The treatment group also seemed to have a slightly higher mean number of drugs used by three months; this difference was not apparent by six months and Madigan 2013 showed no difference in cannabis use at three or six months. The largest study to date (Barrowclough 2010) reported no significant differences between interventions and death or hospitalised versus not admitted to hospital and alive by 24 months. Nor did this study report any differences in substance use, mental state (PANSS), or other outcomes. Hjorthoj 2013 reported higher satisfaction scores by 10 months but no differences were reported in quality of life or other outcomes.

Further research is required to determine whether long‐term cognitive behavioural therapy combined with motivational interviewing is useful and cost‐effective.

Comparison 4: cognitive behavioural therapy (CBT) versus treatment as usual

Please see summary of findings Table 4. There was low quality evidence of no difference between CBT and treatment as usual in terms of numbers lost to treatment by three months. Death was not measured in any of the trials. Neither trial reported alcohol use separately, so this effect could not be estimated and evidence for substance use by six months was very low quality.

There was low quality evidence of no difference between CBT and treatment as usual for mental state (BPRS) at six months, and evidence was very low for global functioning at six months. No study reported life satisfaction.

Support for retention in CBT was from the pooled results of Edwards 2006 and Naeem 2005. One study (Edwards 2006) found a 30% increased likelihood of cannabis use by those in the treatment group after 10 weekly sessions of CBT. No other differences were observed on measures of substance use or mental state and functioning, but again much of the data were unusable.

Comparison 5: cognitive behavioural therapy + psychosocial rehabilitation versus treatment as usual

Please see summary of findings Table 5. All of the outcomes for this comparison were very low quality or the outcome was not measured. It is problematic to interpret the skewed data and all data were from one study (Maloney 2006) allocating fewer than 100 people to this comparison. It is feasible that a subtle difference between treatment groups could not be highlighted because of the limited power of the trial but, from what data we have, there is no indication that the number of arrests is lower in the cognitive behavioural therapy plus psychosocial rehabilitation group over all the time periods. This also applies to the number of convictions and the number of days in jail.

Comparison 6: combined cognitive behavioural therapy + intensive case management versus treatment as usual

Please see summary of findings Table 6. All of the outcomes for this comparison were very low quality or the outcome was not measured. Again Maloney 2006 reports useful outcomes relating to functioning in society but again the data are skewed and difficult to interpret. Unlike the preceding comparison, however, there is a suggestion that there may be some positive effect for people allocated to the cognitive behavioural therapy + intensive case management group. The number of arrests is less in the cognitive behavioural therapy + intensive case management group over all time periods, and this also applies to the number of convictions at 12 and 30 months. However, the number of days in jail for each group is not noticeably different. This may give some hope that the very intensive approach does have some benefit in terms of these important outcomes but, again, these findings from such a small study should be replicated before making any change in policy. Economic analyses would also be of interest for this package of care that is likely to be expensive.

Comparison 7: intensive case management versus treatment as usual

Please see summary of findings Table 7. All of the outcomes for this comparison were very low quality or the outcome was not measured. The intensive case management on its own did not produce results that give the impression of there being any major real effect in terms of functioning. Again, these skewed data are difficult to interpret and come from one small study (Maloney 2006).

Comparison 8: motivational interviewing versus treatment as usual

Please see summary of findings Table 8. There was very low quality evidence of no difference between motivational interviewing and treatment as usual in terms of numbers lost to treatment (six months), lost to evaluation (12 months) or deaths (18 months). There was very low quality evidence of not abstaining from alcohol (6 months) or polydrug use (12 months).

There was very low quality evidence of no difference between motivational interviewing and treatment as usual in terms of mental state (SCL‐90, three months) and average global functioning at 12 months. None of the trials measured general life satisfaction.

Some support was found for the effectiveness of motivational interviewing in reducing substance use, even though studies were generally small, interventions brief, and follow‐up times shorter than for other comparisons. Graeber 2003 found that patients in the treatment group were more likely to abstain from alcohol after only three sessions of motivational interviewing; this likelihood increased by three months and six months. Bonsack 2011 reported that individual sessions of motivational interviewing for up to 6 months reduced the number of joints consumed at three and six months, but not at 12 months follow‐up. Similarly, patients in the treatment group of Kavanagh 2004 were more likely to be abstaining or to have improved on all substances by 12 months after three hours of motivational interviewing over six to nine sessions. More patients in the treatment group of Swanson 1999 attended their first aftercare appointment after one 15 minute and one one‐hour session. Bechdolf 2011 also reported higher chances of attending outpatients over a period of six months. In contrast, Baker 2002 reported little difference between groups after one 45 minute session; this lack of difference was more apparent at 12 months than at three months, when the treatment showed some benefit. Hickman 1997 showed little difference in mental state scores after one brief session. The results indicate that multiple sessions of motivational interviewing may lead to short‐term reductions in substance use and increased attendance at outpatient appointments.

Comparison 9: skills training versus treatment as usual

Please see summary of findings Table 9. There was very low quality evidence of no difference between skills training and treatment as usual in terms of numbers lost to treatment by 12 months. Death was not measured in any of the trials. There was also very low quality evidence for differences in alcohol use or substance use by 12 months as the data were skewed.

Moreover, there was very low quality evidence of no difference between skills training and treatment as usual for mental state (relapse) at eight months and for global functioning at 12 months. Neither trial reported on general life satisfaction.

Pooled results of Hellerstein 1995 and Jerrell 1995a showed that control group participants were more likely to be lost from the study. However, by 18 months Jerrell 1995a reported that participants in their treatment programme were more likely to be lost. Both studies adopted a psycho‐educational approach to both mental health and substance use treatment for their treatment groups. Hellerstein 1995 offered their treatment group a same site co‐ordinated treatment approach and their control group were offered the same treatment, which was not case co‐ordinated.

Comparison 10: specialised case management services versus standard care

Please see summary of findings Table 10. All of the outcomes for this comparison were very low quality or the outcome was not measured. One small study (Godley 1994) presents data by site and, clearly, practice by site does differ considerably. There is not really a clear pattern in the data suggesting an effect, and where there is some suggestion of a difference between groups the data are based on very few people.

Comparison 11: integrated assertive community treatment versus assertive community treatment team versus standard care

No 'Summary of findings' table was constructed for this comparison. All of the outcomes for this comparison were very low quality or the outcome was not measured. Morse 2006 was a three‐arm study with about 50 people in each arm. Interesting data were presented for important outcomes but all were continuous and skewed. None gave the impression of a real difference occurring between the two packages of care and the standard care. Again, considering the huge effort that must have gone into the integrated assertive community treatment and assertive community treatment, this might indicate how difficult this group of people is to treat, or how standard care has as good an effect as anything in terms of substance misuse and general housing outcomes.

Comparison 12: contingency management versus standard care

Please see summary of findings Table 11. There was low quality evidence of no difference between contingency management and treatment as usual in terms of numbers lost to treatment by three months. Death was not measured in any of the trials. There was also little evidence for differences in alcohol use (data skewed) or substance use (stimulant‐positive urine tests) by six months.

Moreover, there was low quality evidence of no difference between contingency management and treatment as usual for mental state (number hospitalised) at six months. Neither trial reported on global assessment of functioning and general life satisfaction.

McDonell 2013 reported fewer patients with a stimulant‐positive urine at the end of treatment (three months) for the contingency management arm compared to standard care. However, by six months (three months post‐treatment) this was no longer significant. Moreover, they also reported less injection use at the end of treatment (three months) for the active arm compared to standard care, and again this was no longer significant at six months. Over the six month trial, fewer patients in the contingency managed arm were hospitalised compared to standard care.

Sensitivity analysis

Sensitivity analyses were conducted to ascertain if there were substantial differences in the results when lesser quality trials were excluded. There were relatively few trials to conduct the sensitivity analysis due to the small numbers of trials in each intervention and the large number of outcome measures at variable time points. There was no indication that trials of lesser quality or those recruiting patients with severe mental illness other than schizophrenia influenced the overall outcomes in this review.

Overall completeness and applicability of evidence

Many of the included studies were described as pilot studies which included small sample sizes. Fourteen trials involved more than 100 participants and two of these involved more than 200 participants after randomisation (Barrowclough 2010; Drake 1998a). However, the overall power for a particular common outcome and comparison was low due to the variety of interventions and outcomes measured.

Examination of the summary of findings indicates that several critical or important outcomes were not measured by any of the studies, and therefore no power exists. Future research could examine these comparisons in order to bring to light any potential benefits in the management of patients with a dual diagnosis.

The majority of studies presented medium‐term data, with six months to one year follow‐up. This is a reasonable length of time to assess differences in the intervention effects. Longer‐term studies (one to three years) employed integrated and assertive community care interventions. These types of studies are important to engage patients in treatment programmes that help recovery from serious mental illness.

Quality of the evidence

Primary outcome measures selected for this review were: remaining in treatment, substance use, and mental state. Pooled results demonstrated no consistent evidence to support any one treatment intervention over standard care. Some support for motivational interviewing was found from individual studies for substance use reduction. When motivational interviewing was offered in conjunction with cognitive behavioural therapy there was little support for improved mental state. These findings suggest that motivational interviewing is a crucial component to the effectiveness of treatment with cognitive behavioural therapy. However, it was challenging to identify the key aspects of each intervention given that these are mostly complex, multi‐faceted interventions. Little attention was paid to reporting the fidelity of the delivery of each intervention.

A limitation of this review is that there was substantial variation between studies as to what constituted standard care, in addition to some differences between the interventions themselves. For example, fidelity, duration, and intensity of treatment conditions varied, and the outcome reporting periods also differed. This resulted in difficulties in grouping and interpreting data. There was a high volume of problematic data due to skew, use of non‐validated scales, or unclear reporting. Further high quality randomised trials are required which employ large samples, use validated and clinically relevant measures, and present data in a way that can be incorporated into a meta‐analysis.

All study participants had a diagnosis of severe mental illness and substance misuse. Participants were from a wide range of settings, so the results of this review will be applicable to similar patients, particularly those in the USA, as trials from the USA (21) were included in all comparisons. Some generalisation can be assumed for the UK (three trials) and Australia (six trials) for cognitive behavioural therapy and motivational interviewing as the studies from these areas examined these interventions. Integrated, non‐integrated, and skills training intervention findings may apply elsewhere only if the intervention is delivered in a similar manner. However, as there are differences between the USA and other countries' services, including education and training of health service staff, generalisation to other areas must be interpreted with caution (Donald 2005; Lowe 2004; Tyrer 2004). This is also true for resource‐constrained settings. We did not identify any trials from low or middle income countries.

Missing outcomes or too few data

Out of the primary outcome measures, studies only reported numbers lost to treatment clearly enough to allow pooling of results in each of the comparisons. Often the other primary outcome measures (substance use, mental state) were reported as continuous rather than binary data and many of these data were problematic. With this particular population, skewed data may be unavoidable and, as such, are problematic to present and manage in a meta‐analysis. However, opportunities were missed to report simple and useful binary outcomes.
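
To illustrate why such continuous data are flagged as skewed, the sketch below applies a common rule of thumb for measures that cannot fall below a known minimum: skew is suspected when twice the standard deviation exceeds the mean measured from that minimum. This section does not state the exact criterion applied in the review, so the rule and the function name are assumptions; the example values are the Drake 1998a 'number of days using in last 6 months' figures at 6 months from the 'Other data' tables.

```python
def looks_skewed(mean: float, sd: float, scale_min: float = 0.0) -> bool:
    """Return True when 2 * SD exceeds the mean measured from the scale minimum.

    Rule of thumb only (an assumption here, not the review's stated method).
    """
    return 2 * sd > (mean - scale_min)


# Drake 1998a, number of days using in last 6 months, 6 month time point.
treatment_days_skewed = looks_skewed(mean=56.80, sd=56.40)
control_days_skewed = looks_skewed(mean=47.50, sd=58.40)
print(treatment_days_skewed, control_days_skewed)
```

Both arms are flagged under this rule, which is consistent with these data being presented in 'Other data' tables rather than pooled as mean differences.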

Potential biases in the review process

It is possible that we failed to identify small negative trials, and we would be most interested if readers know of these. We endeavoured to reduce this potential bias by conducting a wide search, duplicate extraction, multiple checking, and handsearching key references and journals. We also contacted many of the authors of these trials over the years, and for this review we asked if they knew of any recently completed or ongoing trials. The introduction of websites and journals to register trials hopefully will reduce the 'file drawer' phenomenon, as negative trials are less likely to be published.

It is possible that our consideration of these data has been biased by our foreknowledge of the past work (Cleary 2008; Ley 2000). It is difficult to know what to do about this except to state that we do make every effort to be open to any new information or interpretation.

Agreements and disagreements with other studies or reviews

The findings of this review agree with other narrative syntheses of the literature, which have come to the same conclusions. There is little evidence from trials to support any one psychosocial treatment over another to reduce substance use or improve mental state for people with a serious mental illness (Baker 2012; Cleary 2009a; Dixon 2010; Drake 1998b; Horsfall 2009; NICE 2011).

Figure 1

Search flow diagram, assessment and reporting of included and excluded studies for 2013 Update.

Figure 2

Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3

Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Analysis 1.1

Comparison 1 Integrated models of care versus treatment as usual, Outcome 1 Lost to treatment ‐ by 36 months.

Analysis 1.2

Comparison 1 Integrated models of care versus treatment as usual, Outcome 2 Lost to evaluation.

Analysis 1.3

Comparison 1 Integrated models of care versus treatment as usual, Outcome 3 Death ‐ by 36 months.

Analysis 1.4

Comparison 1 Integrated models of care versus treatment as usual, Outcome 4 Substance use: 1. Not in remission ‐ by 36 months.

Analysis 1.5

Comparison 1 Integrated models of care versus treatment as usual, Outcome 5 Substance use: 2. Average score for progress towards recovery (SATS, low = poor).


Study	Intervention	Mean	SD	N
average score ‐ 6 months (AUS)
Drake 1998a	Treatment	3.12	1.03	70
Drake 1998a	Control	2.97	1.09	65
Essock 2006	Treatment	2.90	1.30	68
Essock 2006	Control	3.00	1.10	61
average score ‐ 12 months (AUS)
Drake 1998a	Treatment	3.17	1.05	75
Drake 1998a	Control	2.84	1.23	66
Essock 2006	Treatment	2.80	1.30	64
Essock 2006	Control	3.10	1.0	62
average score ‐ 18 months (AUS)
Drake 1998a	Treatment	3.07	1.15	75
Drake 1998a	Control	2.79	1.10	65
Essock 2006	Treatment	2.80	1.20	65
Essock 2006	Control	2.90	1.20	65
average score ‐ 24 months (AUS)
Drake 1998a	Treatment	2.98	1.07	73
Drake 1998a	Control	2.79	1.16	67
Essock 2006	Treatment	2.60	1.20	60
Essock 2006	Control	3.0	1.20	63
average score ‐ 30 months (AUS)
Drake 1998a	Treatment	2.86	1.09	72
Drake 1998a	Control	2.96	1.18	65
Essock 2006	Treatment	2.80	1.20	58
Essock 2006	Control	2.80	1.20	59
average score ‐ 36 months (AUS)
Drake 1998a	Treatment	2.70	1.12	74
Drake 1998a	Control	2.82	1.16	68
Essock 2006	Treatment	2.70	1.0	58
Essock 2006	Control	2.80	1.30	60
number of days using in last 6 months ‐ 6 month
Drake 1998a	Treatment	56.80	56.40	75
Drake 1998a	Control	47.50	58.40	68
Essock 2006	Treatment	36.60	50.1	65
Essock 2006	Control	38.60	54.20	53
number of days using in last 6 months ‐ 12 month
Drake 1998a	Treatment	59.10	53.30	75
Drake 1998a	Control	42.80	52.90	68
Essock 2006	Treatment	35.80	50.00	64
Essock 2006	Control	44.80	58.80	61
number of days using in last 6 months ‐ 18 month
Drake 1998a	Treatment	53.80	57.80	75
Drake 1998a	Control	33.50	44.40	68
Essock 2006	Treatment	30.00	46.90	64
Essock 2006	Control	37.00	46.80	66
number of days using in last 6 months ‐ 24 month
Drake 1998a	Treatment	52.40	55.90	75
Drake 1998a	Control	31.00	43.00	68
Essock 2006	Treatment	31.30	46.70	57
Essock 2006	Control	47.60	62.80	64
number of days using in last 6 months ‐ 30 month
Drake 1998a	Treatment	54.80	60.90	75
Drake 1998a	Control	54.00	63.00	68
Essock 2006	Treatment	39.40	52.40	60
Essock 2006	Control	42.80	59.80	62
number of days using in last 6 months ‐ 36 month
Drake 1998a	Treatment	46.40	53.60	75
Drake 1998a	Control	43.60	57.30	68
Essock 2006	Treatment	33.50	47.30	60
Essock 2006	Control	31.10	49.90	59

Analysis 1.6

Comparison 1 Integrated models of care versus treatment as usual, Outcome 6 Substance use: 3. Alcohol (skewed data).


Study	Treatment	Mean	SD	N	Notes
days institutionalised (hospital or incarcerated) ‐ 36 months (site 2)
Essock 2006	Integrated models of care	139	262	48	p=0.02 using Mann Whitney U test (non‐parametric test).
Essock 2006	TAU	158	254	50
days in hospital ‐ 36 months (site 2)
Essock 2006	Integrated models of care	32	91	48	p=0.002 using Mann Whitney U test (non‐parametric test).
Essock 2006	TAU	41	60	50
days in stable community residence ‐ 24 months
Essock 2006	Treatment	264.30	130.40	89
Essock 2006	Control	245.60	143.90	85
time on streets (%) ‐ 3 months
Burnam 1995	Treatment	24.77	42.21	54
Burnam 1995	Control	19.85	40.68	45
time on streets (%) ‐ 6 months
Burnam 1995	Treatment	29.67	44.86	45
Burnam 1995	Control	21.04	42.44	47
time on streets (%) ‐ 9 months
Burnam 1995	Treatment	19.73	47.28	43
Burnam 1995	Control	25.14	39.45	36
time in independent housing in past 60 days ‐ 3 months
Burnam 1995	Treatment	1.72	33.79	54
Burnam 1995	Control	9.67	40.33	45
time in independent housing in past 60 days ‐ 6 months
Burnam 1995	Treatment	35.20	48.78	45
Burnam 1995	Control	10.83	37.06	47
time in independent housing in past 60 days ‐ 9 months
Burnam 1995	Treatment	16.30	45.84	43
Burnam 1995	Control	26.31	46.83	36


Study	Treatment Group	Median	range	N	Notes
Barrowclough 2001	CBT + MI	19.99	‐25.6 to 83.4	17	U=86.5 p<0.03 Non‐parametric analysis. Data summed over 4 time periods (to 12 months) and subtracted from baseline.
Barrowclough 2001	TAU	‐6.52	‐67.9 to 53.2	15


Study	Intervention	Time period	Mean	SD	N
average score (CASUAS)
Edwards 2006	Treatment	3 months	1.4	1.4	23
Edwards 2006	Control		1.3	1.4	24
Edwards 2006	Treatment	6 months	1.4	1.4	23
Edwards 2006	Control		1.3	1.5	24
average score (HONOS ‐ item 3)
Naeem 2005	Treatment	3 months	0.85	1.00	67
Naeem 2005	Control		0.88	0.88	38
Naeem 2005
Naeem 2005

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
1 Lost to treatment ‐ by 36 months	3	603	Risk Ratio (M‐H, Random, 95% CI)	1.09 [0.82, 1.45]

2 Lost to evaluation	4		Risk Ratio (M‐H, Random, 95% CI)	Subtotals only

2.1 by 3 months	1	132	Risk Ratio (M‐H, Random, 95% CI)	0.54 [0.27, 1.08]
2.2 by 6 months	2	330	Risk Ratio (M‐H, Random, 95% CI)	0.69 [0.27, 1.73]
2.3 by 9 months	1	132	Risk Ratio (M‐H, Random, 95% CI)	0.76 [0.49, 1.19]
2.4 by 12 months	1	198	Risk Ratio (M‐H, Random, 95% CI)	0.54 [0.22, 1.29]
2.5 by 24 months	1	198	Risk Ratio (M‐H, Random, 95% CI)	1.00 [0.47, 2.12]
2.6 by 36 months	3	603	Risk Ratio (M‐H, Random, 95% CI)	0.76 [0.35, 1.66]
3 Death ‐ by 36 months	2	421	Risk Ratio (M‐H, Random, 95% CI)	1.18 [0.39, 3.57]

4 Substance use: 1. Not in remission ‐ by 36 months	1		Risk Ratio (M‐H, Random, 95% CI)	Subtotals only

4.1 alcohol	1	143	Risk Ratio (M‐H, Random, 95% CI)	1.15 [0.84, 1.56]
4.2 drugs	1	85	Risk Ratio (M‐H, Random, 95% CI)	0.89 [0.63, 1.25]
5 Substance use: 2. Average score for progress towards recovery (SATS, low = poor)	1		Mean Difference (IV, Random, 95% CI)	Subtotals only

5.1 by 6 months	1	203	Mean Difference (IV, Random, 95% CI)	0.07 [‐0.28, 0.42]
5.2 by 36 months	1	203	Mean Difference (IV, Random, 95% CI)	0.11 [‐0.41, 0.63]
6 Substance use: 3. Alcohol (skewed data)			Other data	No numeric data

6.1 average score ‐ 6 months (AUS)			Other data	No numeric data
6.2 average score ‐ 12 months (AUS)			Other data	No numeric data
6.3 average score ‐ 18 months (AUS)			Other data	No numeric data
6.4 average score ‐ 24 months (AUS)			Other data	No numeric data
6.5 average score ‐ 30 months (AUS)			Other data	No numeric data
6.6 average score ‐ 36 months (AUS)			Other data	No numeric data
6.7 number of days using in last 6 months ‐ 6 months			Other data	No numeric data
6.8 number of days using in last 6 months ‐ 12 months			Other data	No numeric data
6.9 number of days using in last 6 months ‐ 18 months			Other data	No numeric data
6.10 number of days using in last 6 months ‐ 24 months			Other data	No numeric data
6.11 number of days using in last 6 months ‐ 30 months			Other data	No numeric data
6.12 number of days using in last 6 months ‐ 36 months			Other data	No numeric data
7 Substance use: 4. Drugs (skewed data)			Other data	No numeric data

7.1 average score ‐ 6 months (DUS)			Other data	No numeric data
7.2 average score ‐ 12 months (DUS)			Other data	No numeric data
7.3 average score ‐ 18 months (DUS)			Other data	No numeric data
7.4 average score ‐ 24 months (DUS)			Other data	No numeric data
7.5 average score ‐ 30 months (DUS)			Other data	No numeric data
7.6 average score ‐ 36 months (DUS)			Other data	No numeric data
7.7 number of days using in last 6 months ‐ 6 months			Other data	No numeric data
7.8 number of days using in last 6 months ‐ 12 months			Other data	No numeric data
7.9 number of days using in last 6 months ‐ 18 months			Other data	No numeric data
7.10 number of days using in last 6 months ‐ 24 months			Other data	No numeric data
7.11 number of days using in last 6 months ‐ 30 months			Other data	No numeric data
7.12 number of days using in last 6 months ‐ 36 months			Other data	No numeric data
8 Substance use: 5. General (skewed data)			Other data	No numeric data

8.1 average score ‐ 6 months (SATS)			Other data	No numeric data
8.2 average score ‐ 12 months (SATS)			Other data	No numeric data
8.3 average score ‐ 18 months (SATS)			Other data	No numeric data
8.4 average score ‐ 24 months (SATS)			Other data	No numeric data
8.5 average score ‐ 30 months (SATS)			Other data	No numeric data
8.6 average score ‐ 36 months (SATS)			Other data	No numeric data
9 Mental state: 1. Relapse (hospitalization days and crisis care) ‐ 36 months (skewed data)			Other data	No numeric data

10 Mental state: 2. Average score (BPRS, skewed data)			Other data	No numeric data

10.1 6 months			Other data	No numeric data
10.2 12 months			Other data	No numeric data
10.3 18 months			Other data	No numeric data
10.4 24 months			Other data	No numeric data
10.5 30 months			Other data	No numeric data
10.6 36 months			Other data	No numeric data
11 Service use: 1. Days in stable community residences (not in hospital)	2		Mean Difference (IV, Random, 95% CI)	Subtotals only

11.1 by 12 months	2	378	Mean Difference (IV, Random, 95% CI)	‐10.00 [‐38.61, 18.60]
11.2 by 24 months	1	203	Mean Difference (IV, Random, 95% CI)	7.40 [‐6.32, 21.12]
11.3 by 36 months	2	364	Mean Difference (IV, Random, 95% CI)	5.17 [‐9.20, 19.55]
12 Service use: 2. Number hospitalised ‐ during the 36 month study period	1	198	Risk Ratio (M‐H, Random, 95% CI)	0.88 [0.64, 1.19]

13 Service use: 3. Various measures (skewed data)			Other data	No numeric data

13.1 days institutionalised (hospital or incarcerated) ‐ 36 months (site 2)			Other data	No numeric data
13.2 days in hospital ‐ 36 months (site 2)			Other data	No numeric data
13.3 days in stable community residence ‐ 24 months			Other data	No numeric data
13.4 time on streets (%) ‐ 3 months			Other data	No numeric data
13.5 time on streets (%) ‐ 6 months			Other data	No numeric data
13.6 time on streets (%) ‐ 9 months			Other data	No numeric data
13.7 time in independent housing in past 60 days ‐ 3 months			Other data	No numeric data
13.8 time in independent housing in past 60 days ‐ 6 months			Other data	No numeric data
13.9 time in independent housing in past 60 days ‐ 9 months			Other data	No numeric data
14 Functioning: 1. Average general score (GAF, low = poor)	1		Mean Difference (IV, Random, 95% CI)	Subtotals only

14.1 by 6 months	1	162	Mean Difference (IV, Random, 95% CI)	1.10 [‐1.58, 3.78]
14.2 by 12 months	1	171	Mean Difference (IV, Random, 95% CI)	0.70 [‐2.07, 3.47]
14.3 by 18 months	1	176	Mean Difference (IV, Random, 95% CI)	1.00 [‐1.58, 3.58]
14.4 by 24 months	1	166	Mean Difference (IV, Random, 95% CI)	1.70 [‐1.18, 4.58]
14.5 by 30 months	1	164	Mean Difference (IV, Random, 95% CI)	‐0.60 [‐3.56, 2.36]
14.6 by 36 months	1	170	Mean Difference (IV, Random, 95% CI)	0.40 [‐2.47, 3.27]
15 Functioning: 2. Forensic measures (skewed data)			Other data	No numeric data

15.1 arrests ‐ 36 months			Other data	No numeric data
15.2 convictions ‐ 36 months			Other data	No numeric data
15.3 felony ‐ 36 months			Other data	No numeric data
15.4 hospital or jail ‐ 3 months			Other data	No numeric data
15.5 jail days ‐ 36 months			Other data	No numeric data
16 Functioning: 3. Medication hours ‐ 36 months (skewed data)			Other data	No numeric data

17 Functioning: 4. Proportion of time on the street ‐ past 60 days			Other data	No numeric data

17.1 3 months			Other data	No numeric data
17.2 6 months			Other data	No numeric data
17.3 9 months			Other data	No numeric data
18 Functioning: 5. Proportion of time in independent housing ‐ past 60 days			Other data	No numeric data

18.1 3 months			Other data	No numeric data
18.2 6 months			Other data	No numeric data
18.3 9 months			Other data	No numeric data
19 Satisfaction with QOF: Average general score (QOLI, range 1‐7, low = poor)	2		Mean Difference (IV, Random, 95% CI)	Subtotals only

19.1 by 6 months	2	361	Mean Difference (IV, Random, 95% CI)	‐0.11 [‐0.41, 0.20]
19.2 by 12 months	2	372	Mean Difference (IV, Random, 95% CI)	0.02 [‐0.28, 0.32]
19.3 by 18 months	2	377	Mean Difference (IV, Random, 95% CI)	0.09 [‐0.27, 0.44]
19.4 by 24 months	2	370	Mean Difference (IV, Random, 95% CI)	0.02 [‐0.29, 0.33]
19.5 by 30 months	2	366	Mean Difference (IV, Random, 95% CI)	0.02 [‐0.27, 0.32]
19.6 by 36 months	2	373	Mean Difference (IV, Random, 95% CI)	0.10 [‐0.18, 0.38]
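
Each effect size in the table above is a point estimate followed by its 95% CI. As a reading aid, the sketch below recovers an approximate standard error from a reported interval and checks whether the interval excludes the null value (1 for a risk ratio, 0 for a mean difference). It assumes a normal approximation with z = 1.96 and, for risk ratios, an interval that is symmetric on the log scale. The helper names are hypothetical, and this is not a description of how the review's software produced the results.

```python
import math

Z_95 = 1.96  # assumed normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, log_scale=False):
    """Approximate standard error implied by a 95% CI.
    For ratio measures, pass log_scale=True to work on the log scale."""
    if log_scale:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * Z_95)


def excludes_null(lower, upper, null_value):
    """True if the interval does not contain the no-effect value."""
    return not (lower <= null_value <= upper)


# Values copied from the table above.
# 11.2 Days in stable community residences by 24 months: MD 7.40 [-6.32, 21.12]
se_md = se_from_ci(-6.32, 21.12)
md_excl = excludes_null(-6.32, 21.12, 0.0)

# 1 Lost to treatment by 36 months: RR 1.09 [0.82, 1.45]
se_log_rr = se_from_ci(0.82, 1.45, log_scale=True)
rr_excl = excludes_null(0.82, 1.45, 1.0)

print(f"MD: SE ~ {se_md:.2f}, excludes 0: {md_excl}")
print(f"RR: SE(log) ~ {se_log_rr:.3f}, excludes 1: {rr_excl}")
```

Neither of the two intervals used here excludes its null value, so neither result points to a clear difference between the groups.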
