
St John's wort for major depression


Abstract


Background

In some countries extracts of the plant Hypericum perforatum L. (popularly called St. John's wort) are widely used for treating patients with depressive symptoms.

Objectives

To investigate whether extracts of hypericum are more effective than placebo and as effective as standard antidepressants in the treatment of major depression; and whether they have fewer adverse effects than standard antidepressant drugs.

Search methods

Trials were searched in computerised databases, by checking bibliographies of relevant articles, and by contacting manufacturers and researchers.

Selection criteria

Trials were included if they: (1) were randomised and double‐blind; (2) included patients with major depression; (3) compared extracts of St. John's wort with placebo or standard antidepressants; (4) included clinical outcomes assessing depressive symptoms.

Data collection and analysis

At least two independent reviewers extracted information from study reports. The main outcome measure for assessing effectiveness was the responder rate ratio (the relative risk of having a response to treatment). The main outcome measure for adverse effects was the number of patients dropping out due to adverse effects.

Main results

A total of 29 trials (5489 patients) including 18 comparisons with placebo and 17 comparisons with synthetic standard antidepressants met the inclusion criteria. Results of placebo‐controlled trials showed marked heterogeneity. In nine larger trials the combined response rate ratio (RR) for hypericum extracts compared with placebo was 1.28 (95% confidence interval (CI), 1.10 to 1.49) and from nine smaller trials was 1.87 (95% CI, 1.22 to 2.87). Results of trials comparing hypericum extracts and standard antidepressants were statistically homogeneous. Compared with tri‐ or tetracyclic antidepressants and selective serotonin reuptake inhibitors (SSRIs), respectively, RRs were 1.02 (95% CI, 0.90 to 1.15; 5 trials) and 1.00 (95% CI, 0.90 to 1.11; 12 trials). Both in placebo‐controlled trials and in comparisons with standard antidepressants, trials from German‐speaking countries reported findings more favourable to hypericum. Patients given hypericum extracts dropped out of trials due to adverse effects less frequently than those given older antidepressants (odds ratio (OR) 0.24; 95% CI, 0.13 to 0.46) or SSRIs (OR 0.53; 95% CI, 0.34 to 0.83).

Authors' conclusions

The available evidence suggests that the hypericum extracts tested in the included trials a) are superior to placebo in patients with major depression; b) are similarly effective as standard antidepressants; and c) have fewer side effects than standard antidepressants. The association of country of origin and precision with effect sizes complicates the interpretation.


St. John's wort for treating depression.

Depression is characterised by depressed mood and/or loss of interest or pleasure in nearly all activities and a variety of other symptoms for periods longer than two weeks. Extracts of St. John's wort (botanical name Hypericum perforatum L.) are prescribed widely for the treatment of depression.

We have reviewed 29 studies in 5489 patients with depression that compared treatment with extracts of St. John's wort for 4 to 12 weeks with placebo treatment or standard antidepressants. The studies came from a variety of countries, tested several different St. John's wort extracts, and mostly included patients suffering from mild to moderately severe symptoms. Overall, the St. John's wort extracts tested in the trials were superior to placebo, similarly effective as standard antidepressants, and had fewer side effects than standard antidepressants. However, findings were more favourable to St. John's wort extracts in studies from German‐speaking countries where these products have a long tradition and are often prescribed by physicians, while in studies from other countries St. John's wort extracts seemed less effective. These differences could be due to the inclusion of patients with slightly different types of depression, but it cannot be ruled out that some smaller studies from German‐speaking countries were flawed and reported overoptimistic results.

Patients suffering from depressive symptoms who wish to use a St. John's wort product should consult a health professional. Using a St. John's wort extract might be justified, but important issues should be taken into account: St. John's wort products available on the market vary to a great extent. The results of this review apply only to the preparations tested in the studies included, and possibly to extracts with similar characteristics. Side effects of St. John's wort extracts are usually minor and uncommon. However, the effects of other drugs might be significantly compromised.

Authors' conclusions

Implications for practice

In older systematic reviews and meta‐analyses of hypericum extracts (Ernst 1995; Linde 1996; Linde 1998; Kim 1999; Gaster 2000; Williams 2000) the findings of the included studies were mostly positive, but reviewers drew cautious conclusions due to methodological limitations. The quality of trials on average clearly improved over recent years. However, study findings became more often contradictory, and in the last version of our review larger trials restricted to patients with major depression showed only minimal effects over placebo (Linde 2005a). With the addition of several new, partly large trials, the cumulative evidence now suggests that hypericum extracts have a modest effect over placebo in a similar range as standard antidepressants (Kirsch 2008; Turner 2008). The direct comparisons with older antidepressants and selective serotonin reuptake inhibitors seem to confirm this impression. The available clinical trials now also show that hypericum extracts have fewer side effects than both older antidepressants and selective serotonin reuptake inhibitors. This would imply that an attempt to treat mild to moderate major depression with one of the hypericum preparations positively tested in clinical trials is clearly justified. However, the differences in the findings from different countries make clear‐cut recommendations difficult. The evidence for severe major depression is still insufficient to draw conclusions.

Many patients buy St John's wort products from health food stores and might not disclose this to their physicians (Smith 2004). Such uncontrolled use is problematic as serious interactions can occur with a number of frequently used drugs (Ernst 1999; Hammerness 2003; Knüppel 2004; Whitten 2006). Therefore, physicians should regularly ask their patients about hypericum intake. However, it must be kept in mind that drug interactions are not a problem unique to hypericum extracts, but also common for standard antidepressants (Nieuwstraten 2006).

It has to be emphasised that the quality of hypericum preparations can differ considerably. The composition of a product depends on the raw plant material used, the extraction process, and the solvents. In consequence, the amounts of bioactive constituents in different products can vary enormously. A recent study has shown that a number of products available on the German market contain only minor amounts of bioactive constituents (Wurglics 2003). The hypericum extracts tested in clinical trials have to be considered high quality products. Results obtained with these extracts cannot be extrapolated directly to other products. In our meta‐analysis, the type of extract did not contribute to the explanation of heterogeneity. This does not mean, however, that all products tested in the available trials are equally effective. Standardisation of a product on a defined component (for example, hyperforin or hypericin) does not resolve the problem, as currently the exact mechanism for the antidepressant effects of hypericum extracts is still unclear, and available research indicates that several components are relevant. The findings of this review most likely apply to products (using ethanol 50 to 60% or methanol 80% for extraction from dried plant material) with daily extract dosages of 500 to 1200 mg with a ratio of raw material to extract of 3‐7:1.

Implications for research

There is a clear need to investigate the reasons for the differences in findings from trials originating from German‐speaking countries and those from other countries. Multinational trials would seem desirable, but it is unlikely that there will be funding for such studies in the near future. Individual patient data meta‐analysis of existing trials could be a possible tool to investigate predictors of treatment response in a more accurate manner. The authors will try to obtain such data from researchers and/or sponsors.

Background

Description of the condition

Depressive disorders are characterised by depressed mood and/or loss of interest or pleasure in nearly all activities in the presence of other symptoms such as loss of appetite, fatigue and lack of energy, sleep disturbance, restlessness or irritability, feelings of worthlessness or inappropriate guilt, difficulty in thinking, concentrating or making decisions and thoughts of death or suicide or attempts at suicide (Candy 2008). Depressive disorders are the largest source of non‐fatal disease burden in the world, accounting for 12% of years lived with disability (Ustun 2004). There are two major classification systems to diagnose depressive disorders, the Diagnostic and Statistical Manual of Mental Disorders (DSM; current version DSM‐IV‐TR) and the International Statistical Classification of Diseases and Related Health Problems (ICD; current version ICD‐10). DSM‐IV defined depressive diagnoses include recurrent or persistent major depression and minor depression. ICD‐10 diagnoses include recurrent or persistent depression with mild, moderate or severe episodes. According to the DSM‐IV diagnostic classification, either depressed mood or a loss of interest or pleasure in daily activities consistently for at least a two week period has to be present to diagnose a major depressive disorder. The ICD‐10 system uses the term depressive episode instead of major depressive disorder, but lists similar criteria.

Description of the intervention

Major depressive episodes are most commonly treated with antidepressant medication. Current first line drugs are selective serotonin reuptake inhibitors (SSRI) or tricyclic and related antidepressants (http://guidance.nice.org.uk/CG23). However, the size of effects over placebo in clinical trials has been modest (Turner 2008; Kirsch 2008), and although SSRIs are better tolerated than older antidepressants, side effects still occur in a relevant proportion of patients.

Extracts of the plant Hypericum perforatum L. (St. John's wort), a member of the Hypericaceae family, have been used in folk medicine for a long time for a range of indications including depressive disorders. Extracts of St. John's wort are licensed and widely used in Germany for the treatment of depressive, anxiety and sleep disorders. In recent years, hypericum extracts have also become increasingly popular in other countries.

How the intervention might work

The exact mechanism of action of the antidepressant effects of hypericum extracts is still unclear. Hypericum extracts contain at least seven constituents or groups of components that may contribute to their pharmacological effects (Nahrstedt 1997). These include naphthodianthrones (e.g., hypericins), flavonoids (e.g., quercetin), biflavonoids (e.g., biapigenin), xanthones, and phloroglucinol derivatives (e.g., hyperforin). Hypericum extracts have been shown to be active in a number of standard animal models that are used to indicate antidepressant effects (Wheatley 1998; Caccia 2005; Wurglics 2006). While some isolated substances, such as hyperforin, have been shown to have antidepressant activity, the total extract seems to be more effective (Reichling 2003).

Why it is important to do this review

Hypericum extracts have been tested in a number of clinical trials since the 1980s. The first two versions of this review and other systematic reviews published between 1995 and 2000 concluded that hypericum extracts are more effective than placebo and are comparable to older antidepressants in the treatment of mild to moderate depressive disorders (Ernst 1995; Linde 1996; Linde 1998; Kim 1999; Gaster 2000; Williams 2000). Several trials included in these reviews were criticised because they included patients with few and/or mild symptoms who did not meet criteria for major depression, were conducted by primary care physicians who were not experienced in depression research, and/or used low doses of comparator drugs (Shelton 2001). In the 2005 update of our review (Linde 2005a; Linde 2005b) several new well‐designed placebo‐controlled trials were included, some of which had negative findings (Shelton 2001; HDTSG 2002) and which had spurred renewed debate about the efficacy of hypericum extracts. We systematically investigated possible reasons for the contradictory findings. We found that larger, more precise studies yielded less positive results, suggesting that small studies with a higher risk of bias might overestimate the effects of hypericum extracts over placebo. The analyses also showed that effects over placebo were less pronounced in studies restricted to patients with major depression. Finally, we had the impression that studies originating from German‐speaking countries (Germany, Austria, and Switzerland) had more positive results than studies originating from other countries independently from precision and formal diagnosis, although multiple regression analyses did not identify this as an independent predictor.

Since we completed the search for our 2005 update, again, several new well‐designed trials restricted to patients with major depression have been published. To sharpen the focus of this review, to reduce clinical heterogeneity, and to reflect the fact that almost all new high‐quality trials of hypericum extracts are restricted to patients with major depression, we decided to limit the review now to this group of patients.

Objectives

This updated review aimed to investigate whether extracts of hypericum:

  • are more effective than placebo;

  • are as effective as standard antidepressant drugs; and

  • have fewer adverse effects than standard antidepressant drugs

in the treatment of major depression in adults.

In addition, we investigated possible reasons for varying results across studies, with a focus on precision of the studies, baseline severity of depression, and country of origin.

Methods

Criteria for considering studies for this review

Types of studies

To be included trials had to be double‐blind and randomised.

Types of participants

Patients had to suffer from major depression (meeting DSM‐IV or ICD‐10 criteria). Trials in children (< 16 years) were not eligible. In previous versions of this review (Linde 1998; Linde 2005a) trials not restricted to patients with major depression had been included.

Types of interventions

Experimental intervention
Preparations of hypericum (St. John's wort). Trials investigating combinations of hypericum with other herbs were excluded.

Control intervention
Placebo or synthetic antidepressants (tricyclic and related antidepressants, selective serotonin reuptake inhibitors, serotonin‐noradrenaline reuptake inhibitors). Trials using clearly inadequate comparator drugs (e.g., benzodiazepines) or a dosage clearly below the lower thresholds recommended in current guidelines (Härter 2003, ICSI 2007) were excluded.

Experimental and control treatments had to be given for at least four weeks.

Main comparisons
The following comparisons were performed:
1. hypericum extracts vs. placebo
2. hypericum extracts vs. standard antidepressants

Types of outcome measures

Primary outcome
To be included, trials had to measure clinical outcomes such as depression scales or symptoms. Trials that measured physiological parameters only were excluded. The primary outcomes of interest were

1. Effectiveness: treatment response

2. Safety: the proportion of patients who dropped out due to adverse effects

Secondary outcomes
1. Effectiveness: remission, depression scales such as the Hamilton Depression Scale (HAMD), the Clinical Global Impression Index (CGI), the Montgomery‐Asberg Depression Rating Scale (MADRS), patient‐rated depression scales

2. Safety: total proportion of drop‐outs, proportion of patients reporting adverse effects

Search methods for identification of studies

For the first version of the review we searched for published and unpublished eligible trials in the following ways:

1. Electronic searches
a) Clinical Trials Register of the Cochrane Collaboration Depression, Anxiety & Neurosis Group (CCDANTR)
b) database of the Cochrane Field for Complementary Medicine
c) full text searches in Medline SilverPlatter CD‐ROM from 1983 onwards and Embase 1989 onwards using the terms 'St. John's wort', 'Johanniskraut' (German for St John's wort), and 'hyperic*'
d) full text searches in Psychlit and Psychindex 1987 ‐ 1997 CD‐ROM
e) searches in the private database Phytodok, Munich.

2. Searching other resources
a) Checking bibliographies of obtained articles
b) Contacting pharmaceutical companies and authors.
There were no language restrictions.

For the updated version of the review, we searched for published and unpublished eligible trials in the following ways:

1. Electronic searches
For the update, regular electronic searches were performed in CCDANTR (last search July 2007) and PubMed (screening all hits for text word "hypericum", last search July 8, 2008).

2. Searching other resources
We screened bibliographies of published articles, and repeatedly contacted experts, researchers, and manufacturers inquiring for new trials. One reviewer (KL) initially screened reference lists to identify controlled studies on hypericum preparations in humans. All possibly relevant studies or publications were then checked formally for eligibility.

Data collection and analysis

Selection of studies

Two reviewers independently decided on eligibility for the revised inclusion criteria. Disagreements were resolved through discussion. Due to reading errors, disagreements occurred for two trials in which not all patients had major depression and which were excluded after discussion (Vorbach 1994, Winkel 2000). In two trials both reviewers had problems with assessing eligibility: For one small, older trial (Lehrl 1993) the publication did not state that inclusion was limited to patients with major depression, but we had a statement of the sponsor obtained for our 1998 update that all patients met the criteria. As this information could not be verified for this update, we decided to exclude the trial. A Chinese trial (Gu 2001) referred to a Chinese classification. As this classification is not completely comparable to ICD‐10 and DSM‐IV, we decided to also exclude this trial.

Data extraction and management

Primary study characteristics and results were extracted by at least two independent reviewers using a pretested form. In particular, we extracted diagnoses and main inclusion criteria, age, gender, duration of episodes, baseline depression scores, country of origin, number and type of study centers, numbers of patients who were randomised and analysed and who completed protocols, the number and reasons for drop‐outs and withdrawals, numbers of patients reporting adverse effects, and the number and type of adverse effects that were reported.

We assessed numbers of patients who were classified as responders based on score improvements on the Hamilton Rating Scale for Depression (HAMD), the Clinical Global Impression Index (CGI; subscale global improvement rating as at least "much improved"), or any other clinical response measurement. Missing or additional information was sought from authors/sponsors.

Most trials measured clinical outcomes with the Hamilton Depression Scale (HAMD) and the Clinical Global Impression Index (CGI). The HAMD is an observer‐rated scale that focuses mainly on somatic symptoms of depression (Hamilton 1960). The original version includes 21 items, but a version with 17 items is more commonly used in clinical trials. Most studies using the HAMD report the number of 'treatment responders' (patients achieving a score less than 10 and/or less than 50% of the baseline score). When available, we extracted means and standard deviations before, during and after treatment as well as the number of 'responders'. The CGI (CGI 1970) is an observer rated instrument with three items (severity of illness, global improvement, and an efficacy index). We extracted the number of patients rated as 'much improved' or 'very much improved' for global improvement. As recently the Montgomery‐Asberg Depression Rating Scale (MADRS (Montgomery 1979)) and the remission criterion for the HAMD (usually a score of less than 8 at the end of treatment) have been gaining importance as outcome criteria, we also checked all trials for the reporting of these measures. As the DS (Depression Scale von Zerssen (von Zerssen 1996)) was the most often used patient‐rated instrument in the included trials, we extracted post‐treatment data for this scale, if available. For additional post‐hoc analyses, one reviewer (KL) also extracted data for other self‐rating instruments.
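The responder and remission thresholds described above can be written as simple rules. The sketch below is only illustrative: it assumes the "and/or" wording means that either condition is sufficient, and the function names are hypothetical. Individual trials may have used slightly different definitions.

```python
def is_responder(baseline_hamd: float, final_hamd: float) -> bool:
    """Typical HAMD responder rule from the text: final score below 10
    and/or below 50% of the baseline score (read here as a logical OR)."""
    return final_hamd < 10 or final_hamd < 0.5 * baseline_hamd


def is_remitter(final_hamd: float) -> bool:
    """Usual HAMD remission criterion from the text: final score below 8."""
    return final_hamd < 8


# Hypothetical patient: baseline 22, end-of-treatment score 12.
# 12 is neither below 10 nor below 11 (half of 22), so not a responder.
print(is_responder(22, 12))
```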

Assessment of risk of bias in included studies

The main part of the update process of this review was completed before the new risk of bias tool of the Cochrane Collaboration (Higgins 2008) was available. The methodological quality of each trial was assessed by at least two independent reviewers using scales developed by A. Jadad et al. (Jadad 1996) and by one of the reviewers (KL). The results of the quality scoring are displayed in the table of included studies.

The Jadad scale has three items adding up to a maximum score of five points. 0, 1 or 2 points can be given for randomisation (explicit statement that allocation was randomised and description of an adequate generation of the random sequence), 0, 1 or 2 points for double‐blinding (explicit statement that patients and evaluators were blinded and that treatments were indistinguishable), 0 or 1 point for description of drop‐outs and withdrawals (numbers and reasons for all compared groups separately). The display in the table of included studies is as follows (examples): 2‐2‐1 (full score in every item), 1‐0‐0 (only statement on randomisation).
The second quality scale, the "Internal Validity Scale" (IV), which has been used in other reviews on complementary medicine (Linde 1996b, Linde 1997) has six items with possible scores of 0, 0.5 or 1 point for each. Items 1 through 6 refer to statement of random allocation, adequacy of randomization concealment, baseline comparability, blinding of patients, blinding of evaluators, and likelihood of selection bias after allocation, respectively. Results are displayed by item in the table of included studies (e.g., 1‐1‐1‐0.5‐1‐1 represents a full score, with the exception of blinding of patients which was stated but treatment and placebo might have been distinguishable).
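As a reading aid for the score strings shown in the table of included studies, the following sketch parses both display formats and checks them against the item ranges given above. The function names are hypothetical, and plain ASCII hyphens are assumed as separators.

```python
JADAD_ITEM_MAX = (2, 2, 1)        # randomisation, double-blinding, drop-outs
IV_ALLOWED_SCORES = (0.0, 0.5, 1.0)
IV_ITEM_COUNT = 6


def jadad_total(display: str) -> int:
    """Sum a Jadad display string such as '2-2-1' (maximum 5 points)."""
    items = [int(part) for part in display.split("-")]
    if len(items) != len(JADAD_ITEM_MAX):
        raise ValueError("Jadad display needs exactly three items")
    for value, maximum in zip(items, JADAD_ITEM_MAX):
        if not 0 <= value <= maximum:
            raise ValueError(f"item score {value} outside 0..{maximum}")
    return sum(items)


def iv_total(display: str) -> float:
    """Sum an Internal Validity display string such as '1-1-1-0.5-1-1'."""
    items = [float(part) for part in display.split("-")]
    if len(items) != IV_ITEM_COUNT:
        raise ValueError("IV display needs exactly six items")
    if any(value not in IV_ALLOWED_SCORES for value in items):
        raise ValueError("IV item scores must be 0, 0.5 or 1")
    return sum(items)


print(jadad_total("2-2-1"), iv_total("1-1-1-0.5-1-1"))
```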

The assessments in the Jadad and IV scores are solely based on the information provided in the publication (as additional information could not be gathered for all studies). In the table 'Characteristics of included studies', however, additional information provided from authors or sponsors was included. This table also contains information on allocation concealment and attrition.

Measures of treatment effect

Our primary outcome measure, to assess the effectiveness of St John's wort versus placebo and versus other antidepressants, was the proportion of responders (according to the Hamilton Depression Scale (first preference) or other responder measures (second preference)) at the end of treatment, or in case of treatment phases longer than 6 weeks, at the time point defined for primary outcome measurement by the study investigators.

Secondary outcome measures were: proportion of responders according to HAMD, proportion of responders according to CGI, mean HAMD after treatment (or, if this was not available, difference after treatment ‐ baseline), at 2, 4, 6 to 8 weeks, and mean DS score after treatment (or, if this was not available, difference after treatment ‐ baseline).

The main outcome measure for the safety analysis was the proportion of patients who dropped out due to adverse effects. Secondary measures were the total proportion of drop‐outs and the proportion of patients reporting adverse effects.

Dichotomous outcomes
We used responder rate ratios (= relative risks = proportion of responders in the treatment group/proportion of responders in the control group) and their 95% confidence intervals for the analysis of treatment response. Responder rate ratios greater than 1 indicate better response in the hypericum group.
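A minimal sketch of the responder rate ratio for a single trial follows. The confidence interval uses the common large-sample approximation on the log scale, which is an assumption here (the text does not state the interval method), and the counts are invented for illustration.

```python
import math


def responder_rate_ratio(resp_t, n_t, resp_c, n_c, z=1.96):
    """Rate ratio (relative risk) of response, treatment vs control,
    with an approximate 95% CI computed on the log scale."""
    rr = (resp_t / n_t) / (resp_c / n_c)
    se_log_rr = math.sqrt(1 / resp_t - 1 / n_t + 1 / resp_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical trial: 60/100 responders on hypericum, 40/100 on placebo.
rr, lower, upper = responder_rate_ratio(60, 100, 40, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A ratio above 1 with a lower limit above 1 would favour the hypericum group in this hypothetical trial.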

Due to highly variable frequency of side or adverse effects reported, odds ratios instead of rate ratios were calculated in the safety analyses. Odds ratios less than 1 indicate that fewer events occurred in the hypericum group.
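For the safety outcomes, a single-trial odds ratio can be sketched in the same way. The counts are invented, the log-scale interval is an assumed approximation, and the function does not handle zero cells, which would need a continuity correction.

```python
import math


def odds_ratio(events_t, n_t, events_c, n_c, z=1.96):
    """Odds ratio of an event (e.g. drop-out due to adverse effects),
    treatment vs control, with an approximate 95% CI on the log scale."""
    a, b = events_t, n_t - events_t   # treatment: events, non-events
    c, d = events_c, n_c - events_c   # control: events, non-events
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Hypothetical trial: 5/100 drop-outs on hypericum, 10/100 on the comparator.
or_value, lower, upper = odds_ratio(5, 100, 10, 100)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```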

Continuous outcomes
For HAMD and DS scores we calculated mean differences (also termed weighted mean differences). Negative mean differences indicate better response in the hypericum group.
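The mean difference for one trial can be sketched as below. The standard error formula assumes independent groups with unpooled variances, and all numbers are hypothetical.

```python
import math


def mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Mean difference (treatment minus control) with an approximate 95% CI."""
    md = mean_t - mean_c
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return md, md - z * se, md + z * se


# Hypothetical end-of-treatment HAMD scores: 10.0 (SD 6) vs 12.0 (SD 6), n = 100 each.
md, lower, upper = mean_difference(10.0, 6.0, 100, 12.0, 6.0, 100)
print(f"MD = {md:.1f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because lower HAMD scores mean fewer symptoms, a negative difference favours hypericum.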

Unit of analysis issues
Two trials with more than one hypericum group were included in the analyses as follows: Laakmann 1998 included an extract available on the market and an additional experimental extract with low hyperforin content which was never on the market and only used for control reasons. We did not include the data from the group receiving the experimental low‐hyperforin extract in the analyses. Kasper 2006 tested two dosages (600 and 1200 mg) of an available product. We pooled the data from these two groups to avoid counting the control group of this trial twice in the analyses.
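Pooling two active arms into one group can be sketched as follows. For dichotomous data the counts are simply added; for continuous data the sketch uses the standard formula for combining two groups' means and standard deviations. Whether the review applied exactly this formula is an assumption, and all numbers are invented.

```python
import math


def pool_dichotomous(arm_a, arm_b):
    """Each arm is (responders, n); returns the merged (responders, n)."""
    return arm_a[0] + arm_b[0], arm_a[1] + arm_b[1]


def pool_continuous(n1, m1, sd1, n2, m2, sd2):
    """Combine two arms into one group: returns (n, mean, sd)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (m1 - m2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


# Hypothetical low-dose and high-dose arms of one trial.
print(pool_dichotomous((40, 100), (50, 110)))
print(pool_continuous(50, 10.0, 2.0, 50, 10.0, 2.0))
```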

Dealing with missing data

Dichotomous outcomes
Responder proportions were calculated according to the intention to treat principle, counting drop‐outs as non‐responders. For the comparison hypericum extracts vs. standard antidepressants responder proportions were also calculated on a per protocol basis (as this is considered more appropriate to assess the equivalence of two treatments).
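The difference between the two denominators can be illustrated with a short sketch. It assumes that every responder completed the protocol, which may not hold in a real trial, and the counts are hypothetical.

```python
def responder_proportions(responders: int, randomised: int, completers: int):
    """Return (intention-to-treat, per-protocol) responder proportions.

    Intention to treat: all randomised patients in the denominator,
    so drop-outs count as non-responders.
    Per protocol: only patients who completed the protocol.
    """
    itt = responders / randomised
    per_protocol = responders / completers
    return itt, per_protocol


# Hypothetical arm: 100 randomised, 80 completed, 48 responded.
itt, per_protocol = responder_proportions(48, 100, 80)
print(f"ITT {itt:.2f}, per protocol {per_protocol:.2f}")
```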

Continuous outcomes
If means and standard deviations from intent to treat analysis with missing values replaced were available, we preferably used these data. In other cases we used analysis based on available data.

Obtaining missing data
If the number of patients responding to treatment and means and/or standard deviations of HAMD scores after completion were not reported, we always tried to contact first or corresponding authors and/or sponsors to obtain these data. In general, we also tried to obtain other missing details on methods and secondary outcomes from authors or sponsors, but the extent to which we were doing this depended on the cooperation of authors/sponsors and the amount of missing information in the publications. We did not impute or recalculate missing standard deviations as these were unavailable only for a few secondary outcomes in a minority of trials.

We tried to contact authors and/or sponsors of 27 of the 29 included trials; for two trials (Kalb 2001; Laakmann 1998) this was considered unnecessary. We did not receive responses for five trials (Behnke 2002; Brenner 2000; Fava 2005; Harrer 1999; Moreno 2005). Very limited additional information was available or needed for three studies (Bjerkenstedt 2005; Volz 2000; Woelk 2000). We obtained relevant additional information to a variable extent from authors, sponsors, or both for the remaining 19 trials.

Data synthesis

The following comparisons were performed:
1. hypericum extracts vs. placebo: a) for dichotomous outcomes (response rate ratios); b) for continuous outcomes; c) for drop‐outs and adverse effects
2. hypericum extracts vs. standard antidepressants: a) for dichotomous outcomes; b) for continuous outcomes; c) for drop‐outs and adverse effects

All main analyses were performed using RevMan 5.

Due to the clinical diversity of the studied populations, the hypericum extracts and the comparison drugs used, we considered that the included studies did not estimate a common underlying effect, but rather that each individual study estimated its own underlying effect. Thus, the application of a random‐effects model in all analyses seemed to be appropriate.
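A random-effects pooled estimate can be sketched with the DerSimonian-Laird method. The text only states that random-effects models were fitted in RevMan 5, so the choice of estimator here is an assumption, and the input values are invented. Effects are expected on an additive scale, for example log rate ratios.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled effect, standard error, tau^2)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-trial variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return pooled, se, tau2


# Hypothetical log rate ratios and their variances from three trials.
pooled, se, tau2 = dersimonian_laird([0.0, 0.5, 1.0], [0.01, 0.01, 0.01])
print(f"pooled RR = {math.exp(pooled):.2f}, tau^2 = {tau2:.2f}")
```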

The primary analysis for the comparison of response rate ratios (= relative risks) under treatment with hypericum extracts or placebo was a random‐effects intention to treat meta‐analysis stratified by study precision (above or below median of variance of treatment effect).
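Stratifying by precision amounts to splitting trials at the median variance of the treatment effect. In the sketch below the trial names and variances are placeholders, and assigning a trial that sits exactly on the median to the less precise stratum is an arbitrary choice.

```python
import statistics


def split_by_precision(trials):
    """trials: list of (name, variance of the treatment effect).
    Returns (more_precise, less_precise) lists of trial names."""
    median_variance = statistics.median([variance for _, variance in trials])
    more_precise = [name for name, variance in trials if variance < median_variance]
    less_precise = [name for name, variance in trials if variance >= median_variance]
    return more_precise, less_precise


# Placeholder trials, not taken from the review.
example_trials = [("A", 0.010), ("B", 0.040), ("C", 0.020), ("D", 0.090)]
more_precise, less_precise = split_by_precision(example_trials)
print(more_precise, less_precise)
```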

Assessment of heterogeneity
Heterogeneity of trials' results was tested with the Chi‐squared test, and the I‐squared statistic was calculated to give an estimate of the degree of heterogeneity. I‐squared values over 50% indicate considerable heterogeneity (Higgins 2003).
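The heterogeneity statistics can be sketched as follows. The Q statistic is referred to a chi-squared distribution with k - 1 degrees of freedom, and the p-value is omitted here because it needs a chi-squared distribution function from a statistics library. The input values are invented.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I-squared in percent."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical log rate ratios and variances from three trials.
q, df, i_squared = heterogeneity([0.0, 0.5, 1.0], [0.01, 0.01, 0.01])
print(f"Q = {q:.1f} on {df} df, I-squared = {i_squared:.0f}%")
```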

Investigation of heterogeneity and subgroup analyses
Predefined subgroup analyses were performed (a) including only trials with response operationalised with the HAMD score; (b) including only trials with response operationalized with the CGI; (c) for the type of extract investigated; and (d) comparing trials originating from German‐speaking countries and from other countries. Weighted mean differences for HAMD scores were calculated after therapy, at 2 to 3, 4, 6 to 8 weeks, and for differences compared to baseline values. For DS scores we calculated after therapy values, for MADRS score after therapy values and differences compared to baseline. As only relatively few studies used the DS, we performed an additional post‐hoc random effects analysis calculating standardised mean differences for any available patient‐rating scale (preferably end of treatment values, but if these were not available also differences from baseline) to investigate whether findings from physician‐rated instruments could be broadly reproduced.

The primary analysis for the comparison of responder rate ratios under treatment with hypericum extracts or standard antidepressants was a random effects intent to treat meta‐analysis stratified for type of synthetic antidepressant (selective serotonin reuptake inhibitors or older antidepressants). Predefined subgroup analyses were performed (a) using per protocol data; (b) stratified for country (German‐speaking Europe versus other countries); (c) including only trials with response operationalised with the HAMD score; and (d) including only trials with response operationalised with the CGI.

Additional meta‐regression analyses were performed to investigate the influence of country of origin (German‐speaking versus not German speaking), precision and HAMD baseline values on study findings (responder ratio and mean difference of HAMD scores after treatment) both in placebo and standard antidepressants comparisons. According to current recommendations of experts (Thompson 1999, Lipsey 2000), random effects meta‐regression analyses were carried out using the restricted maximum likelihood (REML) method. A main advantage of this approach is that it accounts for residual between‐trial heterogeneity. Both univariable and multiple regression models were fitted. We calculated the proportion of explained heterogeneity variance by dividing the heterogeneity explained by the independent variable(s) by the total heterogeneity variance present in the random‐effects meta‐analysis. When referring to a whole model, this coefficient was termed R². When referring to the contribution of single covariates the coefficient was termed β². In univariable meta‐regression analyses these coefficients are mathematically equal. In multiple meta‐regression analyses, the sum of β² values for all covariates may be slightly different from R². For all meta‐regression analyses the Statistical Package for the Social Sciences (SPSS; Chicago, Illinois) v13.0 software using additional macros by Wilson (Wilson 2002; Lipsey 2000) was used.
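The proportion of explained heterogeneity for a whole model can be sketched as below. It assumes that the explained heterogeneity equals the total between-trial variance minus the residual between-trial variance left after fitting the covariates; the text does not spell this out, and the REML fitting itself is not reproduced. The variance values are invented.

```python
def explained_heterogeneity(tau2_total: float, tau2_residual: float) -> float:
    """Share of between-trial variance explained by the covariates.

    tau2_total: heterogeneity variance from the random-effects meta-analysis
    tau2_residual: heterogeneity variance remaining in the meta-regression
    """
    if tau2_total <= 0:
        raise ValueError("total heterogeneity variance must be positive")
    return (tau2_total - tau2_residual) / tau2_total


# Hypothetical values: total 0.10, residual 0.04 after adding covariates.
share = explained_heterogeneity(0.10, 0.04)
print(f"explained share of heterogeneity: {share:.2f}")
```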

Assessment of reporting biases
Visual analysis of funnel plots was performed to identify possible publication bias (Sterne 2001). Furthermore, the asymmetry coefficient was calculated for formal examination of publication bias (Egger 1997).
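
The asymmetry coefficient can be sketched as the intercept of a simple linear regression of the standardised effect (effect divided by its standard error) on precision (the inverse of the standard error), following the approach described by Egger 1997. The code below is an illustrative sketch only: the function name and the data are hypothetical, and it does not compute the p value reported in the review.

```python
def egger_asymmetry(effects, std_errors):
    """Return (intercept, slope) of an Egger-type regression.

    effects    : effect estimates on an additive scale (e.g. log rate ratios)
    std_errors : their standard errors
    The intercept is the asymmetry coefficient; values far from zero
    point to small-study effects such as publication bias.
    """
    snd = [e / s for e, s in zip(effects, std_errors)]   # standardised effects
    precision = [1.0 / s for s in std_errors]
    n = len(snd)
    mean_x = sum(precision) / n
    mean_y = sum(snd) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, snd))
    slope = sxy / sxx                      # ordinary least squares slope
    intercept = mean_y - slope * mean_x    # asymmetry coefficient
    return intercept, slope


# Hypothetical data in which less precise trials report larger effects:
# effect = 0.2 + 1.5 * SE, so the intercept should be 1.5 and the slope 0.2.
ses = [0.1, 0.2, 0.3, 0.4]
log_rrs = [0.2 + 1.5 * s for s in ses]
intercept, slope = egger_asymmetry(log_rrs, ses)
```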

Results

Description of studies

Results of the search

A total of 79 possibly relevant studies were identified and checked formally for eligibility.

Included studies

Twenty nine trials including a total of 5489 (range 30 to 388) patients met inclusion criteria (see Characteristics of included studies).
Eighteen trials had a placebo‐control group (Bjerkenstedt 2005; Bracher 2001; Fava 2005; Gastpar 2006; HDTSG 2002; Hänsgen 1996; Kalb 2001; Kasper 2006; Laakmann 1998; Lecrubier 2002; Montgomery 2000; Moreno 2005; Philipp 1999; Schrader 1998; Shelton 2001; Uebelhack 2004; Volz 2000; Witte 1995), and 17 trials compared hypericum with standard antidepressants (Behnke 2002; Bjerkenstedt 2005; Brenner 2000; Fava 2005; Gastpar 2005; Gastpar 2006; Harrer 1993; Harrer 1999; HDTSG 2002; Moreno 2005; Philipp 1999; Schrader 2000; Szegedi 2005; van Gurp 2002; Vorbach 1997; Wheatley 1997; Woelk 2000). Six trials had both a placebo and a standard antidepressant control group (Bjerkenstedt 2005; Fava 2005; Gastpar 2006; HDTSG 2002; Moreno 2005; Philipp 1999). Eight trials are newly included since the last update (Bracher 2001; Fava 2005; Gastpar 2005; Gastpar 2006; Kasper 2006; Moreno 2005; Szegedi 2005; Uebelhack 2004) and one trial which had been included based on an abstract reference only is now included fully (Bjerkenstedt 2005). These eight new trials included a total of 1947 (range 72 to 388) patients. Details on patients, methods, interventions, outcomes, and results of all included studies are described in the table of included studies.

Types of participants
The severity of depression was described as mild to moderate in 19 trials, and as moderate to severe in 9 trials (one trial did not classify severity). Eighteen trials were from German‐speaking countries, four from the US, two from the UK, and one each from Brazil, Canada, Denmark, France and Sweden. Patients were recruited in private practices in all trials from German‐language countries, in the trials from Sweden (Bjerkenstedt 2005) and Canada (van Gurp 2002), and in one of the trials from the UK (Wheatley 1997). The second trial from the UK (Montgomery 2000) and the trial from France (Lecrubier 2002) were performed both in private practices and psychiatric outpatient departments. Three trials from the US (Shelton 2001; HDTSG 2002; Fava 2005) and the Brazilian trial (Moreno 2005) were performed in academic and/or community psychiatry research clinics. Two trials from the US and Denmark (Brenner 2000; Behnke 2002) did not report on the setting.

Types of intervention
A variety of hypericum preparations were studied in the trials. Daily extract doses ranged from 240 to 1800 mg, but most trials used 500 to 1200 mg. The standard antidepressants used as active comparators were fluoxetine (6 trials, 20 to 40 mg), sertraline (4 trials, 50 to 100 mg), imipramine (3 trials, 100 to 150 mg), citalopram (1 trial, 20 mg), paroxetine (1 trial, 20 to 40 mg), maprotiline (1 trial, 75 mg), and amitriptyline (1 trial, 75 mg). The comparator dosages of maprotiline and amitriptyline were slightly below those recommended in current guidelines (Härter 2003, ICSI 2007); in most other studies dosages were at the minimum of the recommended range. The treatment periods lasted 4 (1 trial), 6 (19), 7 (1), 8 (5) or 12 weeks (4 trials). Four trials included some long‐term follow‐up or continuation treatment after the main trial phase (Brenner 2000; Gastpar 2005; Shelton 2001; Szegedi 2005).

Types of outcome
The most frequently used instrument for outcome measurement was the Hamilton Rating Scale for Depression (used in all trials). A variety of other rating scales and instruments were used in addition.

Excluded studies

Fifty trials (see Characteristics of excluded studies) did not meet inclusion criteria: eight trials were not limited to patients with depression (Albertini 1986; Bendre 1980; Dittmer 1992; Hottenrott 1997; Maisenbacher 1995; Panijel 1985; Sindrup 2000; Volz 2002), four trials were on prevention or treatment of depressive symptoms in patients suffering primarily from other diseases (Häring 1996; Li 2005; Mo 2004; Werth 1989), two measured physiological parameters only (such as EEG) in depressed patients (Czekalla 1997; Kugler 1990b), five did not include a placebo or standard drug comparison group (Bernhadt 1993; Lenoir 1999; Martinez 1993; Spielberger 1985; Zeller 2000), eight involved healthy volunteers (Brockmöller 1997; Herberg 1992; Johnson 1992; Johnson 1993; Schmidt 1993b; Schulz 1993; Staffeldt 1993; Wienert 1991), three tested combinations of hypericum and other herbal extracts (Ditzler 1992; Kniebel 1988; Steger 1985), and two compared hypericum extract with medications which are no longer considered adequate for depression (diazepam or bromazepam) (Kugler 1990a; Warnecke 1986); one of these trials also was not explicitly randomised. Due to the new exclusion criterion, we excluded 17 trials not restricted to patients with major depression. Fifteen had been included in the previous version of the review (Halama 1991; Harrer 1991; Hoffmann 1979; Hübner 1993; König 1993; Lehrl 1993; Osterheider 1992; Quandt 1993; Reh 1992; Schlich 1987; Schmidt 1989; Schmidt 1993; Sommer 1994; Vorbach 1994; Winkel 2000) while two were not (Agrawal 1994, for which it had not been possible to obtain a full copy, and Gu 2001, which was newly identified in the update searches). Finally, we excluded one previously included trial as the standard antidepressant treatment was far below recommended dosages (30 mg amitriptyline daily; Bergmann 1993).

Risk of bias in included studies

The majority of the trials were of high quality. The median quality scores were 5 (out of 5, range 2 to 5) for the Jadad scale and 4.5 (out of 6; range 2 to 6) for the IV scale (see quality rating of the single trials in the Characteristics of included studies).

Sequence generation/allocation concealment
The information on how the random sequence was generated was reported or provided on request for 18 trials (in all cases a computer program). Twenty two trials reported an adequate method of allocation concealment (most often consecutively numbered medication).

Blinding
All trials were described as double‐blind, but only one trial reported that blinding was tested (HDTSG 2002). In this three‐armed trial (hypericum vs. sertraline vs. placebo), physicians guessed the allocation correctly for about a third of hypericum and placebo patients (as expected by chance alone), but for 66% of sertraline patients (p = 0.001).

Incomplete outcome data
In some trials attrition rates were high (for example, Fava 2005; see Characteristics of included studies). All placebo‐controlled trials included an intent to treat analysis.

Effects of interventions

Comparison 1: Hypericum extracts versus placebo

1. Effectiveness
a) Responder analyses

Sixteen of the 18 placebo‐controlled trials reported the number of patients classified as responders based on score reduction on the HAMD scale, one trial reported response according to the MADRS scale (Bracher 2001), and one trial only reported the proportion of patients rated at least as "improved" for the CGI (Volz 2000). Patients receiving hypericum extracts were significantly more likely to be responders (RR = 1.48; 95%CI 1.23 to 1.77; see comparison 1.1 and Figure 1) but study results were highly heterogeneous (I² = 75%). Effects in favour of hypericum extracts were less pronounced in more precise trials (RR = 1.28; 95%CI 1.10 to 1.49) compared to less precise trials (RR = 1.87; 95%CI 1.22 to 2.87) but heterogeneity was still strong in both subgroups (I² = 61% and 79%, respectively).
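
For readers unfamiliar with how a pooled responder rate ratio and I² are obtained, the following sketch pools log rate ratios with a random effects model. It assumes the DerSimonian‐Laird estimator for the between‐trial variance, which this section of the review does not name explicitly; the function names and the trial counts are hypothetical and are not data from the included studies.

```python
import math


def log_risk_ratio(resp_t, n_t, resp_c, n_c):
    """Log responder rate ratio and its standard error for one trial."""
    log_rr = math.log((resp_t / n_t) / (resp_c / n_c))
    se = math.sqrt(1 / resp_t - 1 / n_t + 1 / resp_c - 1 / n_c)
    return log_rr, se


def pool_random_effects(estimates):
    """Pool (effect, se) pairs; return pooled effect, its SE and I² in percent.

    Assumption: DerSimonian-Laird moment estimator for tau².
    """
    y = [eff for eff, _ in estimates]
    w = [1 / se ** 2 for _, se in estimates]           # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                      # between-trial variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for _, se in estimates]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), i2


# Hypothetical trials: responders and patients in the hypericum and placebo arms
trials = [(60, 100, 40, 100), (45, 80, 30, 80), (30, 50, 28, 50)]
pooled, se, i2 = pool_random_effects([log_risk_ratio(*t) for t in trials])
rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

Pooling is done on the log scale and the result is exponentiated, which is why the confidence interval of a rate ratio is asymmetric around the point estimate.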


Figure 1. Forest plot of comparison: 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, outcome: 1.1 Responder ‐ grouped by precision ‐ primary analysis.

Findings were similar if response rates were based only on the trials reporting response according to the HAMD scale, or on the CGI (see comparisons 1.2 and 1.3). If trials investigating defined extracts were analysed separately (subgroups of trials testing the same extracts; see comparison 1.4), heterogeneity was strong in 3 of 4 subgroups.

Trials from German‐speaking countries reported more positive findings than trials from other countries (RR = 1.78; 95%CI 1.42 to 2.25 vs. 1.07; 95%CI 0.88 to 1.31, respectively; see comparison 1.5 and Figure 2). Six trials reported remission rates. These were significantly higher in patients receiving hypericum extracts than in those receiving placebo (RR = 2.77; 95%CI 1.80 to 4.26; I² = 29%; see comparison 1.6).


Figure 2. Forest plot of comparison: 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, outcome: 1.5 Responder among studies from German‐speaking countries and other studies.

There was significant funnel plot asymmetry for the main responder analysis (coefficient = 2.19, p = 0.03; Figure 3).


Figure 3. Funnel plot of comparison: 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, outcome: 1.1 Responder ‐ grouped by precision ‐ primary analysis.

In univariable meta‐regression analyses, country of origin (studies from German‐speaking countries showing larger effect sizes; p = 0.002), precision (more precise studies showing smaller effects; p = 0.032) and HAMD baseline values (higher values associated with smaller effect sizes; p = 0.048) were significantly associated with effect sizes. In multiple analyses the association remained significant for country of origin (p = 0.035) and precision (p = 0.017) but became non‐significant for HAMD baseline values. Altogether, over half of the variance (R² = 0.51) could be explained by these three variables. Findings of meta‐regression analyses are summarised in Appendix 1.

b) Analyses of depression scales

Analyses based on mean HAMD values yielded similar findings. At the completion of treatment HAMD values were 3.04 (95%CI 1.78 to 4.29) score points lower in hypericum groups compared to placebo groups, but there was strong heterogeneity (I² = 86%; see comparison 2.1). Effects over placebo were significant after 2 (comparison 2.2), 4 (comparison 2.3), and 6 to 8 weeks of treatment (comparison 2.4), and for changes from baseline to end of treatment (comparison 2.5). Significant effects over placebo were also reported for the MADRS (comparisons 2.6 and 2.7). Studies from German‐language countries reported much larger effects over placebo (weighted mean difference = 4.29, 95%CI 2.97 to 5.61 score points; comparison 2.8) than studies from other countries (MD = 0.77 score points, 95%CI ‐0.20 to 1.74 score points).

There was no significant funnel plot asymmetry (coefficient in the analysis of HAMD values at completion of treatment = ‐2.12, p = 0.35).

In multiple meta‐regression analysis, country of origin was significantly associated with effect size (larger effects in trials from German‐speaking countries; p < 0.001), whereas precision and HAMD baseline values were not (R² = 0.63; see Appendix 1).

The four trials reporting results for the patient‐rated von Zerssen Depression Scale (D‐S) showed a significant effect of hypericum extracts over placebo (comparison 2.9). Post‐hoc analyses using available data from 12 placebo‐controlled trials for a variety of self‐rating instruments also confirmed analyses based on physician‐rated outcomes. The pooled standardised mean difference (SMD) was ‐0.47 (95% CI ‐0.64 to ‐0.30; I² = 74%; see comparison 2.10). Trials from German‐speaking countries again reported more favourable findings than trials from other countries (SMDs of ‐0.57 and ‐0.17 respectively; see comparison 2.11).

2. Safety

Primary outcome
The number of patients dropping out for adverse effects was similar among patients receiving hypericum extracts and placebo (OR = 0.92; 95%CI 0.45 to 1.88; I² = 0%; see comparison 3.1).

Secondary outcomes
The total number of patients dropping out and the number of patients dropping out for any reason were similar among patients receiving hypericum extracts and placebo (comparisons 3.2 and 3.3).

Comparison 2: Hypericum extracts versus standard antidepressants

1. Effectiveness

a) Responder analyses
All 17 trials comparing hypericum extracts to standard antidepressant treatment reported the number of responders according to the HAMD score. Based on an intention to treat approach the pooled responder rate ratio was 1.01 for all 17 trials (95%CI 0.93 to 1.09; I² = 17%; see comparison 4.1 and Figure 4). For the five trials comparing hypericum extracts with older antidepressants, the pooled estimate was 1.02 (95%CI 0.90 to 1.15; I² = 0%), and 1.00 for the 12 trials with selective serotonin reuptake inhibitors (95%CI 0.90 to 1.12; I² = 29%).


Figure 4. Forest plot of comparison: 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, outcome: 4.1 Responder (intent to treat) ‐ primary analysis.

If analyses were based on per protocol data, the pooled responder rate ratio was 0.96 (95%CI 0.88 to 1.05; I² = 43%; see comparison 4.2). Analysis based on the CGI also found no relevant differences (RR = 1.01; 95%CI 0.94 to 1.09; I² = 24%; see comparison 4.3). In trials originating from German‐speaking countries findings were slightly more favourable to hypericum than in trials from other countries (RR 1.04 and 0.90, respectively; see comparison 4.4 and Figure 5). In the four trials reporting remission rates the pooled rate ratio for remission was 1.24 (95%CI 1.02 to 1.50; I² = 0%; see comparison 4.5).


Figure 5. Forest plot of comparison: 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, outcome: 4.4 Responder among studies from German‐speaking countries and other studies.

The asymmetry coefficient for the main responder analysis was ‐1.07 (p = 0.09; see funnel plot in Figure 6). In univariable meta‐regression analysis, there was a significant association between country of origin and response (trials from German‐speaking countries favouring hypericum; p = 0.037). In the multivariable meta‐regression analysis, none of the three tested predictors proved significant (R² = 0.24).


Figure 6. Funnel plot of comparison: 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, outcome: 4.1 Responder (intent to treat) ‐ primary analysis.

b) Analyses of depression scales
Analyses based on HAMD scores confirmed the findings of the responder analysis (see comparisons 5.1 to 5.5, and 5.8). Analyses of MADRS and D‐S values are difficult to interpret, as only a few trials reported these outcomes (see comparisons 5.6, 5.7, 5.9). There was no funnel plot asymmetry (coefficient = 0.30, p = 0.73). In the multivariable meta‐regression analysis, trials with higher HAMD baseline values showed less favourable results (p = 0.010), while country of origin and precision had no significant influence (R² = 0.44).

Again, post‐hoc analyses using available data for a variety of self‐rating instruments from 10 trials comparing hypericum extracts and standard antidepressants confirmed analyses based on physician‐rated outcomes. The pooled SMD was 0.01 (95%CI ‐0.13 to 0.15; I² = 43%; see comparison 5.10). The pooled SMD in trials from German‐speaking countries was ‐0.02 compared to 0.10 in trials from other countries (comparison 5.11).

2. Safety

Primary outcome
Patients allocated to hypericum extracts were less likely to drop out from studies due to adverse effects than patients allocated to older standard antidepressants (OR = 0.24; 95%CI 0.13 to 0.46; I² = 0%) or to SSRIs (OR = 0.53; 95%CI 0.34 to 0.83; I² = 0%; see comparison 6.1 and Figure 7).
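
To show how an odds ratio for drop‐outs relates to the underlying counts, the sketch below computes the odds ratio and a log‐scale (Woolf) 95% confidence interval for a single trial. The function name and the counts are hypothetical, and the pooling method used for comparison 6.1 is not reproduced here.

```python
import math


def odds_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A versus group B with a Woolf 95% CI.

    events_* : patients dropping out due to adverse effects
    total_*  : patients randomised to the group
    """
    a, b = events_a, total_a - events_a      # drop-outs / completers in A
    c, d = events_b, total_b - events_b      # drop-outs / completers in B
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical single trial: 5 of 100 drop-outs on hypericum,
# 10 of 100 on a standard antidepressant
odds_ratio, low, high = odds_ratio_ci(5, 100, 10, 100)
```

An odds ratio below 1 with an upper confidence limit below 1 would indicate significantly fewer drop‐outs in the first group; in this invented single trial the interval would be too wide to show that.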


Figure 7. Forest plot of comparison: 6 Safety ‐ Hypericum mono‐preparations vs. standard antidepressants, outcome: 6.1 Number of patients discontinuing treatment/dropping out due to adverse/side effects.

Secondary outcomes
Attrition for any reason was significantly lower for hypericum extracts compared to older antidepressants (OR = 0.67; 95%CI 0.47 to 0.95; I² = 0%), but not compared to SSRIs (OR = 0.83; 95%CI 0.63 to 1.08; I² = 0%; see comparison 6.2).

The number of patients reporting side effects was significantly lower with hypericum extracts than with older standard antidepressants (OR = 0.39; 95%CI 0.30 to 0.50; I² = 0%); compared to SSRIs, the difference just missed significance (OR = 0.70; 95%CI 0.49 to 1.00; I² = 57%; see comparison 6.3).

Discussion

Summary of main results

Overall, the findings from newer trials seem to corroborate the evidence in favour of hypericum extracts. The available data suggest that the hypericum extracts tested in the included trials a) are superior to placebo in patients with major depression; b) are similarly effective as standard antidepressants; and c) have fewer side effects than standard antidepressants. There are two issues which complicate the interpretation of our findings: 1) While the influence of precision on study results in placebo‐controlled trials is less pronounced in this updated version of our review compared to the previous version (Linde 2005a), results from more precise trials still show smaller effects over placebo than less precise trials. 2) Results from German‐language countries are considerably more favourable for hypericum than trials from other countries.

Interpretation of the findings and limitations

For this update we excluded for the first time from our review all trials which were not restricted to patients with major depression. This does not mean that we believe that major depression is necessarily the only or best indication for hypericum extracts. Some authors argue that patients with signs of atypical depression might be particularly suited for treatment with hypericum extracts (Murck 2002, Murck 2005), the National Institutes of Health are currently funding a trial in patients with minor depression (www.clinicaltrials.gov, identifier NCT00048815), and the findings from older trials (some of which, however, seem methodologically questionable) not restricted to patients with major depression were very positive (Linde 1998). As pointed out in the introduction, we now focus on major depression to make our review more comparable to overviews on standard antidepressants, to have a more comparable set of studies for analysis, and also because almost all new trials of hypericum extracts are on this indication.

In spite of the tightened inclusion criteria, the findings of the placebo‐controlled trials are still quite heterogeneous. In the trials a variety of hypericum extracts has been tested and daily doses cover a wide range. Differences in the interventions might contribute to some extent to the observed heterogeneity, but they do not seem to be a major factor. In three of four subgroup analyses of single extracts there was strong heterogeneity; for one extract the 95% confidence intervals of the two available trials (Gastpar 2006; Uebelhack 2004) did not even overlap, indicating that the results are hardly compatible. However, some of the factors leading to considerable heterogeneity between study findings could be identified. Considering country of origin, precision, and baseline depression severity of included patients explained 50 to 60 percent of the variance between trial results in comparisons with placebo and 20 to 40 percent in comparisons with standard antidepressants. Nevertheless, it has to be stated that meta‐regression analyses are (even if defined a priori) entirely observational in nature. Findings on the association of baseline depression severity and effect size estimates may be biased through structural dependence and regression to the mean, and thus should be interpreted with caution (Higgins 2008). Furthermore, inferences drawn from a meta‐regression analysis on aggregate data may differ from inferences drawn from a meta‐regression analysis on individual data (Deeks 2006; e.g. 'ecological bias').

The finding that more precise placebo‐controlled trials yielded less positive results than less precise trials could indicate publication bias (trials with positive results are more likely to be published than trials with negative results) or bias within studies (smaller trials with less rigorous methods yielding overoptimistic results). We cannot rule out, but doubt, that selective publication of overoptimistic results in small trials strongly influences our findings. There is some evidence that "negative" trials without demonstrable differences between extracts and placebo were published less often as full articles than trials with "positive" findings. Our extensive searches identified three "negative" trials that were only published as abstracts or theses. Two that were conducted in the early 1990s (these were included in earlier versions of this review Linde 1996; Linde 1998; Linde 2005a) involved patients without documented major depression (König 1993; Osterheider 1992), and one that was conducted in the late 1990s involved patients with major depression (Montgomery 2000). One positive trial included in our last update, but now excluded, was published as an abstract and as a chapter in a not widely available book (Winkel 2000). One comparably large, positive trial (Bracher 2001) has been published only in a short report as a supplement to a German medical newspaper. This trial is an example that sponsors or manufacturers of herbal medicines sometimes have very limited interest in a major publication if their trial includes a new aspect (in that case a once daily dosage), as there is no patent protection for herbal extracts and results can be exploited by competitors, too. We suspect that there are few additional relevant unpublished trials. 
Few manufacturers of hypericum extracts sponsor research trials, and the five manufacturers whose products were tested in most of the trials told us they had (with the exception of one smaller negative trial) no other unpublished trials that possibly met our inclusion criteria. Through personal communication we were informed that there are at least one or two unpublished negative trials on tea preparations of Hypericum. However, tea preparations are phytochemically very different from alcoholic extracts and have to be evaluated separately.

We found that the quality of the majority of trials was adequate, and we detected no systematic differences in design aspects known to be potential sources of bias. All trials were double‐blind. Though adequacy of blinding was not formally assessed in most trials, achieving similarity between hypericum and placebo preparations is not particularly difficult. All trials were randomised, and most concealed allocation assignments by using consecutively numbered identical medication containers. Reported drop‐out rates were low in the majority of trials. Investigators involved in older trials may have had less training and/or experience with diagnostic standards and rating scales for depressive symptoms (Shelton 2001), but this issue, if true, is likely to affect generalisation of findings rather than internal validity. Finally, though we found no systematic differences in major factors generally related to trial quality, our subjective judgement was that larger trials tended to be of better quality than smaller trials. The dosages of standard antidepressants were (with two exceptions) within the range recommended in current guidelines (e.g., Härter 2003), but at the lower limit.

Our finding that studies from German‐speaking countries yielded more favourable results than trials performed elsewhere is difficult to interpret. As our analyses are partly data‐driven, they must be considered cautiously. However, the consistency and extent of the observed association suggest that there are important differences in trials performed in different countries. One possibility is that studies performed in German‐speaking countries with a long history of hypericum prescription by physicians enrolled slightly different patients in spite of similar inclusion criteria. With one exception (the extremely positive trial by Uebelhack 2004 performed in a research clinic of a contract research organisation), all German studies recruited patients in private practices, while a number of trials from other countries were performed in academic research settings or hospital outpatient units. Depression with atypical or reversed vegetative features might be present more often in primary care outpatient populations (Murck 2005). The trend that trials with higher HAMD baseline values reported slightly less favourable results also suggests that effectiveness of hypericum extracts might differ between subgroups of depressive patients. While we did not systematically investigate this issue, it seems to us that the trials from countries other than Germany might be more often investigator‐initiated. A closer link of trial planning, performance and analysis with manufacturer interests might influence study findings. This could possibly result in true bias, but also in conditions making a true positive outcome more likely. For example, for at least three trials (Kasper 2006; Schrader 1998; Uebelhack 2004) with large effects performed in German‐speaking countries authors or sponsors reported in the publication or in personal communications that contact times and interaction with patients were limited to minimise placebo response rates.
Increasing placebo group response rates due to the intensive care and monitoring in antidepressant trials are considered by some researchers to be a potential reason for the difficulty of demonstrating specific effects (Posternak 2007). One could also speculate whether unblinding might lead German physicians (who often use hypericum extracts in their usual practice) to give more positive ratings and (possibly more sceptical) colleagues from elsewhere to give more negative ratings. However, as hypericum extracts have no characteristic side effects, such a problem seems relevant only in comparisons with standard antidepressants.

Potential biases in the review process

The work for the first version of this review started in 1993 and three previous versions are available (Linde 1996; Linde 1998; Linde 2005a). During that period a large number of new trials became available, diagnostic classifications used for including patients in studies have changed and the quality of trials has improved. In parallel the methods of our review were adapted. The changes over time make it difficult to report our searches and their results in a consistent and transparent manner. The way we approached authors/sponsors to obtain missing information and the contents of inquiries were not fully systematic and have changed over time. This could imply that additional data necessary for some secondary analyses were obtained for a selected subset of studies. However, data for the main analysis were available for all or almost all trials; therefore, major biases seem highly unlikely. A potential source of bias in the responder analyses could be slightly variable responder definitions in the primary studies. Response according to the HAMD was defined as a reduction of at least 50%, a HAMD score < 10 (or 11) after treatment, at least one of these, or the combination of both. Whether these definitions were truly made a priori in each study could not be assessed. Decisions on the inclusion of subgroup analyses (for example, regarding precision or country effects) for updates were driven by findings in previous versions of the review. Therefore, these analyses must be interpreted with caution. Publication and small study bias have been discussed in the previous section.

Figure 1. Forest plot of comparison: 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, outcome: 1.1 Responder ‐ grouped by precision ‐ primary analysis.

Figure 2. Forest plot of comparison: 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, outcome: 1.5 Responder among studies from German‐speaking countries and other studies.

Figure 3. Funnel plot of comparison: 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, outcome: 1.1 Responder ‐ grouped by precision ‐ primary analysis.

Figure 4. Forest plot of comparison: 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, outcome: 4.1 Responder (intent to treat) ‐ primary analysis.

Figure 5. Forest plot of comparison: 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, outcome: 4.4 Responder among studies from German‐speaking countries and other studies.

Figure 6. Funnel plot of comparison: 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, outcome: 4.1 Responder (intent to treat) ‐ primary analysis.

Figure 7. Forest plot of comparison: 6 Safety ‐ Hypericum mono‐preparations vs. standard antidepressants, outcome: 6.1 Number of patients discontinuing treatment/dropping out due to adverse/side effects.

Comparison 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, Outcome 1 Responder ‐ grouped by precision ‐ primary analysis.
Figures and Tables -
Analysis 1.1

Comparison 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, Outcome 1 Responder ‐ grouped by precision ‐ primary analysis.

Analysis 1.2

Comparison 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, Outcome 2 Responder ‐ according to HAMD.

Analysis 1.3

Comparison 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, Outcome 3 Responder ‐ according to CGI (Clinical Global Impression Index at least "much improved").

Analysis 1.4

Comparison 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, Outcome 4 Responder ‐ grouped by extract.

Analysis 1.5

Comparison 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, Outcome 5 Responder among studies from German‐speaking countries and other studies.

Analysis 1.6

Comparison 1 Hypericum mono‐preparations vs. placebo A. Dichotomous measures, Outcome 6 Remission (HAMD score < 8 or < 7).

Analysis 2.1

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 1 Mean HAMD (Hamilton Rating Scale for Depression) scores after therapy.

Analysis 2.2

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 2 Mean HAMD (Hamilton Rating Scale for Depression) scores after 2 to 3 weeks of treatment.

Analysis 2.3

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 3 Mean HAMD (Hamilton Rating Scale for Depression) score after 4 weeks of treatment.

Analysis 2.4

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 4 Mean HAMD (Hamilton Rating Scale for Depression) scores after 6 to 8 weeks of treatment.

Analysis 2.5

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 5 Difference HAMD (Hamilton Rating Scale for Depression) baseline ‐ end of treatment.

Analysis 2.6

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 6 MADRS after treatment.

Analysis 2.7

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 7 Difference MADRS baseline ‐ end of treatment.

Analysis 2.8

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 8 Mean HAMD after treatment in studies from German‐speaking countries and other studies.

Analysis 2.9

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 9 Mean Depression Scale von Zerssen (D‐S) after therapy/difference baseline ‐ after therapy.

Analysis 2.10

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 10 Various self‐rating scales.

Analysis 2.11

Comparison 2 Hypericum mono‐preparations vs. placebo. B. Continuous measures, Outcome 11 Various self‐rating scales in studies from German‐speaking countries and other countries.

Analysis 3.1

Comparison 3 Safety ‐ Hypericum mono‐preparations vs. placebo, Outcome 1 Number of patients discontinuing treatment/dropping out for adverse effects ‐ primary analysis.

Analysis 3.2

Comparison 3 Safety ‐ Hypericum mono‐preparations vs. placebo, Outcome 2 Number of patients dropping out.

Analysis 3.3

Comparison 3 Safety ‐ Hypericum mono‐preparations vs. placebo, Outcome 3 Number of patients reporting adverse effects.

Analysis 4.1

Comparison 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, Outcome 1 Responder (intent to treat) ‐ primary analysis.

Analysis 4.2

Comparison 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, Outcome 2 Responder (per protocol).

Analysis 4.3

Comparison 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, Outcome 3 Responders according to CGI (Clinical Global Impression Index at least "much improved").

Analysis 4.4

Comparison 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, Outcome 4 Responder among studies from German‐speaking countries and other studies.

Analysis 4.5

Comparison 4 Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures, Outcome 5 Remission (HAMD score < 8).

Analysis 5.1

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 1 Mean HAMD (Hamilton Rating Scale for Depression) after therapy.

Analysis 5.2

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 2 Mean HAMD (Hamilton Rating Scale for Depression) scores after 2 or 3 weeks of treatment.

Analysis 5.3

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 3 Mean HAMD (Hamilton Rating Scale for Depression) scores after 4 weeks of treatment.

Analysis 5.4

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 4 Mean HAMD (Hamilton Rating Scale for Depression) scores after 6 to 8 weeks of treatment.

Analysis 5.5

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 5 Difference HAMD (Hamilton Rating Scale for Depression) baseline ‐ end of treatment.

Analysis 5.6

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 6 MADRS after treatment.

Analysis 5.7

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 7 Difference MADRS baseline ‐ end of treatment.

Analysis 5.8

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 8 Mean HAMD after treatment in studies from German‐speaking countries and other studies.

Analysis 5.9

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 9 Mean D‐S (Depression Scale von Zerssen) scores after therapy.

Analysis 5.10

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 10 Various self‐rating scales.

Analysis 5.11

Comparison 5 Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures, Outcome 11 Various self‐rating scales in studies from German‐speaking countries and other countries.

Analysis 6.1

Comparison 6 Safety ‐ Hypericum mono‐preparations vs. standard antidepressants, Outcome 1 Number of patients discontinuing treatment/dropping out due to adverse/side effects.

Analysis 6.2

Comparison 6 Safety ‐ Hypericum mono‐preparations vs. standard antidepressants, Outcome 2 Number of patients dropping out.

Analysis 6.3

Comparison 6 Safety ‐ Hypericum mono‐preparations vs. standard antidepressants, Outcome 3 Number of patients reporting adverse effects.

Comparison 1. Hypericum mono‐preparations vs. placebo A. Dichotomous measures

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Responder ‐ grouped by precision ‐ primary analysis | 18 | 3064 | Risk Ratio (M‐H, Random, 95% CI) | 1.48 [1.23, 1.77] |
| 1.1 Less precise trials | 9 | 1020 | Risk Ratio (M‐H, Random, 95% CI) | 1.87 [1.22, 2.87] |
| 1.2 More precise trials | 9 | 2044 | Risk Ratio (M‐H, Random, 95% CI) | 1.28 [1.10, 1.49] |
| 2 Responder ‐ according to HAMD | 16 | 2706 | Risk Ratio (M‐H, Random, 95% CI) | 1.51 [1.22, 1.87] |
| 2.1 Less precise trials | 8 | 948 | Risk Ratio (M‐H, Random, 95% CI) | 1.94 [1.19, 3.18] |
| 2.2 More precise trials | 8 | 1758 | Risk Ratio (M‐H, Random, 95% CI) | 1.28 [1.06, 1.53] |
| 3 Responder ‐ according to CGI (Clinical Global Impression Index at least "much improved") | 13 | 2306 | Risk Ratio (M‐H, Random, 95% CI) | 1.47 [1.24, 1.74] |
| 3.1 Less precise trials | 7 | 869 | Risk Ratio (M‐H, Random, 95% CI) | 1.74 [1.30, 2.33] |
| 3.2 More precise trials | 6 | 1437 | Risk Ratio (M‐H, Random, 95% CI) | 1.26 [1.06, 1.50] |
| 4 Responder ‐ grouped by extract | 18 | | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only |
| 4.1 extract LI 160 | 6 | 981 | Risk Ratio (M‐H, Random, 95% CI) | 1.31 [0.92, 1.86] |
| 4.2 extract WS 5570 | 2 | 699 | Risk Ratio (M‐H, Random, 95% CI) | 1.57 [0.96, 2.56] |
| 4.3 extract WS 5572 | 2 | 170 | Risk Ratio (M‐H, Random, 95% CI) | 1.47 [1.05, 2.06] |
| 4.4 extract STW3‐VI | 2 | 401 | Risk Ratio (M‐H, Random, 95% CI) | 3.59 [0.41, 31.56] |
| 4.5 other extracts | 6 | 813 | Risk Ratio (M‐H, Random, 95% CI) | 1.45 [1.08, 1.93] |
| 5 Responder among studies from German‐speaking countries and other studies | 18 | 3064 | Risk Ratio (M‐H, Random, 95% CI) | 1.48 [1.23, 1.77] |
| 5.1 Studies from German‐speaking countries | 11 | 1770 | Risk Ratio (M‐H, Random, 95% CI) | 1.78 [1.42, 2.25] |
| 5.2 Studies from other countries | 7 | 1294 | Risk Ratio (M‐H, Random, 95% CI) | 1.07 [0.88, 1.31] |
| 6 Remission (HAMD score < 8 or < 7) | 6 | 1236 | Odds Ratio (M‐H, Random, 95% CI) | 2.77 [1.80, 4.26] |

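
The two subgroup rows of outcome 5 (studies from German‐speaking countries vs. studies from other countries) can be compared directly using only the reported risk ratios and confidence intervals. The following is a minimal sketch, not the review's own method: it assumes the intervals are symmetric on the log scale, recovers an approximate standard error from the rounded interval bounds, and treats the two subgroups as independent. The function names are illustrative.

```python
import math

def log_se_from_ci(lower, upper, z=1.96):
    """Approximate standard error of log(ratio) from a reported 95% CI.

    Assumes the interval was built as exp(log(estimate) +/- z * SE).
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)

def subgroup_z(ratio_a, ci_a, ratio_b, ci_b):
    """z statistic and ratio of ratios for two independent subgroup estimates."""
    se_a = log_se_from_ci(*ci_a)
    se_b = log_se_from_ci(*ci_b)
    diff = math.log(ratio_a) - math.log(ratio_b)
    z_stat = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    return z_stat, math.exp(diff)

# Values taken from rows 5.1 and 5.2 of the table above.
z, ratio = subgroup_z(1.78, (1.42, 2.25), 1.07, (0.88, 1.31))

# Two-sided p-value from the standard normal distribution.
p = math.erfc(abs(z) / math.sqrt(2))

print(f"ratio of risk ratios = {ratio:.2f}, z = {z:.2f}, p = {p:.4f}")
```

Because the inputs are rounded to two decimals, the result is only an approximation of a formal test for subgroup differences.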
Comparison 2. Hypericum mono‐preparations vs. placebo. B. Continuous measures

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Mean HAMD (Hamilton Rating Scale for Depression) scores after therapy | 17 | 2871 | Mean Difference (IV, Random, 95% CI) | ‐3.04 [‐4.29, ‐1.78] |
| 2 Mean HAMD (Hamilton Rating Scale for Depression) scores after 2 to 3 weeks of treatment | 13 | 2299 | Mean Difference (IV, Random, 95% CI) | ‐1.22 [‐2.07, ‐0.37] |
| 3 Mean HAMD (Hamilton Rating Scale for Depression) score after 4 weeks of treatment | 11 | 1634 | Mean Difference (IV, Random, 95% CI) | ‐1.65 [‐2.78, ‐0.52] |
| 4 Mean HAMD (Hamilton Rating Scale for Depression) scores after 6 to 8 weeks of treatment | 15 | 2578 | Mean Difference (IV, Random, 95% CI) | ‐2.97 [‐4.31, ‐1.63] |
| 5 Difference HAMD (Hamilton Rating Scale for Depression) baseline ‐ end of treatment | 17 | 2931 | Mean Difference (IV, Random, 95% CI) | ‐3.03 [‐4.67, ‐1.39] |
| 6 MADRS after treatment | 3 | 640 | Mean Difference (IV, Random, 95% CI) | ‐3.86 [‐7.30, ‐0.42] |
| 7 Difference MADRS baseline ‐ end of treatment | 4 | 1015 | Mean Difference (IV, Random, 95% CI) | ‐3.01 [‐4.88, ‐1.14] |
| 8 Mean HAMD after treatment in studies from German‐speaking countries and other studies | 17 | 2871 | Mean Difference (IV, Random, 95% CI) | ‐3.04 [‐4.29, ‐1.78] |
| 8.1 Studies from German‐speaking countries | 11 | 1720 | Mean Difference (IV, Random, 95% CI) | ‐4.29 [‐5.61, ‐2.97] |
| 8.2 Studies from other countries | 6 | 1151 | Mean Difference (IV, Random, 95% CI) | ‐0.77 [‐1.74, 0.20] |
| 9 Mean Depression Scale von Zerssen (D‐S) after therapy/difference baseline ‐ after therapy | 4 | 411 | Mean Difference (IV, Random, 95% CI) | ‐3.72 [‐5.32, ‐2.12] |
| 10 Various self‐rating scales | 13 | 2330 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.47 [‐0.64, ‐0.30] |
| 10.1 von Zerssen Depression Scale (D‐S) after treatment | 3 | 313 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.62 [‐1.11, ‐0.14] |
| 10.2 von Zerssen Depression Scale (D‐S) difference baseline ‐ after treatment | 2 | 170 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.90 [‐2.02, 0.22] |
| 10.3 von Zerssen Adjective Mood Scale | 2 | 401 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.61 [‐0.84, ‐0.37] |
| 10.4 Beck Depression Inventory | 1 | 195 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.28 [‐0.56, 0.00] |
| 10.5 Beck Depression Inventory difference baseline ‐ after treatment | 2 | 553 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.31 [‐0.71, 0.09] |
| 10.6 Zung Self Rating Depression Scale (SDS) difference baseline ‐ after treatment | 1 | 146 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.37 [‐0.73, ‐0.02] |
| 10.7 Symptom Checklist (SCL‐58) depression score | 1 | 375 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.16 [‐0.37, 0.04] |
| 10.8 von Zerssen Paranoid‐Depressivitäts‐Skala | 1 | 177 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.18 [‐0.48, 0.11] |
| 11 Various self‐rating scales in studies from German‐speaking countries and other countries | 13 | 2330 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.47 [‐0.64, ‐0.30] |
| 11.1 Studies from German‐speaking countries | 10 | 1531 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.57 [‐0.77, ‐0.37] |
| 11.2 Studies from other countries | 3 | 799 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.17 [‐0.31, ‐0.04] |

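
The statistical method listed for these outcomes, "IV, Random", denotes inverse‐variance weighting under a random‐effects model. As a rough illustration of what such pooling involves, the sketch below uses the DerSimonian‐Laird estimator of between‐study variance; that choice is an assumption made here, since the table does not name the estimator. The three study‐level mean differences and standard errors are hypothetical and are not taken from the review.

```python
import math

def dersimonian_laird(effects, std_errors, z=1.96):
    """Pool study effects with inverse-variance random-effects weights.

    Returns (pooled effect, (ci_lower, ci_upper), tau squared).
    """
    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the DerSimonian-Laird estimate of tau squared.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau squared to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled), tau2

# Hypothetical mean differences on a depression scale (illustration only).
pooled, ci, tau2 = dersimonian_laird([-1.0, -3.0, -5.0], [1.0, 1.0, 1.0])
print(f"pooled MD = {pooled:.2f} [{ci[0]:.2f}, {ci[1]:.2f}], tau^2 = {tau2:.2f}")
```

When the studies disagree more than their standard errors would predict, tau squared grows, the weights become more equal, and the pooled confidence interval widens.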
Comparison 3. Safety ‐ Hypericum mono‐preparations vs. placebo

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Number of patients discontinuing treatment/dropping out for adverse effects ‐ primary analysis | 16 | 2784 | Odds Ratio (M‐H, Random, 95% CI) | 0.92 [0.45, 1.88] |
| 2 Number of patients dropping out | 16 | 2784 | Odds Ratio (M‐H, Random, 95% CI) | 0.87 [0.67, 1.12] |
| 3 Number of patients reporting adverse effects | 14 | 2496 | Odds Ratio (M‐H, Random, 95% CI) | 0.98 [0.78, 1.23] |

Comparison 4. Hypericum mono‐preparations vs. standard antidepressants. A. Dichotomous measures

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Responder (intent to treat) ‐ primary analysis | 17 | 2810 | Risk Ratio (M‐H, Random, 95% CI) | 1.01 [0.93, 1.09] |
| 1.1 vs. older antidepressants | 5 | 1016 | Risk Ratio (M‐H, Random, 95% CI) | 1.02 [0.90, 1.15] |
| 1.2 vs. SSRIs | 12 | 1794 | Risk Ratio (M‐H, Random, 95% CI) | 1.00 [0.90, 1.12] |
| 2 Responder (per protocol) | 17 | 2306 | Risk Ratio (M‐H, Random, 95% CI) | 0.96 [0.88, 1.05] |
| 2.1 vs. older antidepressants | 5 | 854 | Risk Ratio (M‐H, Random, 95% CI) | 0.93 [0.78, 1.11] |
| 2.2 vs. SSRIs | 12 | 1452 | Risk Ratio (M‐H, Random, 95% CI) | 0.97 [0.87, 1.08] |
| 3 Responders according to CGI (Clinical Global Impression Index at least "much improved") | 12 | 2234 | Risk Ratio (M‐H, Random, 95% CI) | 1.01 [0.94, 1.09] |
| 3.1 vs. older antidepressants | 4 | 692 | Risk Ratio (M‐H, Random, 95% CI) | 0.97 [0.87, 1.09] |
| 3.2 vs. newer antidepressants | 8 | 1542 | Risk Ratio (M‐H, Random, 95% CI) | 1.03 [0.92, 1.15] |
| 4 Responder among studies from German‐speaking countries and other studies | 17 | 2769 | Risk Ratio (M‐H, Random, 95% CI) | 1.00 [0.93, 1.09] |
| 4.1 Studies from German‐speaking countries | 9 | 1952 | Risk Ratio (M‐H, Random, 95% CI) | 1.04 [0.96, 1.13] |
| 4.2 Studies from other countries | 8 | 817 | Risk Ratio (M‐H, Random, 95% CI) | 0.90 [0.76, 1.06] |
| 5 Remission (HAMD score < 8) | 4 | 685 | Risk Ratio (M‐H, Random, 95% CI) | 1.24 [1.02, 1.50] |
| 5.1 vs. older antidepressants | 0 | 0 | Risk Ratio (M‐H, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 5.2 vs. SSRIs | 4 | 685 | Risk Ratio (M‐H, Random, 95% CI) | 1.24 [1.02, 1.50] |

Comparison 5. Hypericum mono‐preparations vs. standard antidepressants. B. Continuous measures

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Mean HAMD (Hamilton Rating Scale for Depression) after therapy | 12 | 1889 | Mean Difference (IV, Random, 95% CI) | ‐0.39 [‐1.23, 0.45] |
| 1.1 vs. older antidepressants | 3 | 477 | Mean Difference (IV, Random, 95% CI) | ‐0.06 [‐1.82, 1.71] |
| 1.2 vs. SSRIs | 9 | 1412 | Mean Difference (IV, Random, 95% CI) | ‐0.52 [‐1.55, 0.51] |
| 2 Mean HAMD (Hamilton Rating Scale for Depression) scores after 2 or 3 weeks of treatment | 9 | 1529 | Mean Difference (IV, Random, 95% CI) | ‐0.12 [‐1.02, 0.78] |
| 2.1 vs. older antidepressants | 3 | 477 | Mean Difference (IV, Random, 95% CI) | ‐0.05 [‐1.31, 1.20] |
| 2.2 vs. SSRIs | 6 | 1052 | Mean Difference (IV, Random, 95% CI) | ‐0.25 [‐1.50, 1.00] |
| 3 Mean HAMD (Hamilton Rating Scale for Depression) scores after 4 weeks of treatment | 9 | 1367 | Mean Difference (IV, Random, 95% CI) | ‐0.34 [‐1.48, 0.80] |
| 3.1 vs. older antidepressants | 3 | 477 | Mean Difference (IV, Random, 95% CI) | 0.02 [‐1.11, 1.15] |
| 3.2 vs. SSRIs | 6 | 890 | Mean Difference (IV, Random, 95% CI) | ‐0.69 [‐2.44, 1.06] |
| 4 Mean HAMD (Hamilton Rating Scale for Depression) scores after 6 to 8 weeks of treatment | 10 | 1659 | Mean Difference (IV, Random, 95% CI) | ‐0.34 [‐1.24, 0.57] |
| 4.1 vs. older antidepressants | 2 | 391 | Mean Difference (IV, Random, 95% CI) | ‐0.21 [‐2.56, 2.14] |
| 4.2 vs. SSRIs | 8 | 1268 | Mean Difference (IV, Random, 95% CI) | ‐0.38 [‐1.46, 0.69] |
| 5 Difference HAMD (Hamilton Rating Scale for Depression) baseline ‐ end of treatment | 10 | 1652 | Mean Difference (IV, Random, 95% CI) | ‐0.35 [‐1.23, 0.52] |
| 5.1 vs. older antidepressants | 1 | 210 | Mean Difference (IV, Random, 95% CI) | ‐1.20 [‐3.29, 0.89] |
| 5.2 vs. SSRIs | 9 | 1442 | Mean Difference (IV, Random, 95% CI) | ‐0.25 [‐1.21, 0.71] |
| 6 MADRS after treatment | 1 | 108 | Mean Difference (IV, Random, 95% CI) | ‐0.90 [‐4.73, 2.93] |
| 6.1 vs. older antidepressants | 0 | 0 | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 6.2 vs. SSRIs | 1 | 108 | Mean Difference (IV, Random, 95% CI) | ‐0.90 [‐4.73, 2.93] |
| 7 Difference MADRS baseline ‐ end of treatment | 2 | 352 | Mean Difference (IV, Random, 95% CI) | ‐2.90 [‐5.10, ‐0.70] |
| 7.1 vs. older antidepressants | 0 | 0 | Mean Difference (IV, Random, 95% CI) | 0.0 [0.0, 0.0] |
| 7.2 vs. SSRIs | 2 | 352 | Mean Difference (IV, Random, 95% CI) | ‐2.90 [‐5.10, ‐0.70] |
| 8 Mean HAMD after treatment in studies from German‐speaking countries and other studies | 15 | 2423 | Mean Difference (IV, Random, 95% CI) | ‐0.39 [‐1.23, 0.45] |
| 8.1 Studies from German‐speaking countries | 9 | 1888 | Mean Difference (IV, Random, 95% CI) | ‐0.43 [‐1.28, 0.41] |
| 8.2 Studies from other countries | 6 | 535 | Mean Difference (IV, Random, 95% CI) | ‐0.44 [‐2.67, 1.79] |
| 9 Mean D‐S (Depression Scale von Zerssen) scores after therapy | 4 | 360 | Mean Difference (IV, Random, 95% CI) | 2.66 [0.83, 4.50] |
| 9.1 vs. older antidepressants | 2 | 272 | Mean Difference (IV, Random, 95% CI) | 2.81 [0.77, 4.85] |
| 9.2 vs. SSRIs | 2 | 88 | Mean Difference (IV, Random, 95% CI) | 2.04 [‐2.13, 6.21] |
| 10 Various self‐rating scales | 10 | 1570 | Std. Mean Difference (IV, Random, 95% CI) | 0.01 [‐0.13, 0.15] |
| 10.1 von Zerssen Depression Scale (D‐S) after treatment | 4 | 360 | Std. Mean Difference (IV, Random, 95% CI) | 0.28 [0.07, 0.49] |
| 10.2 Beck Depression Inventory | 1 | 83 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.01 [‐0.44, 0.42] |
| 10.3 Beck Depression Inventory difference baseline ‐ after treatment | 2 | 466 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.12 [‐0.53, 0.29] |
| 10.4 Zung Self Rating Depression Scale (SDS) difference baseline ‐ after treatment | 1 | 205 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.11 [‐0.38, 0.17] |
| 10.5 von Zerssen Adjective Mood Scale | 2 | 456 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.06 [‐0.24, 0.13] |
| 11 Various self‐rating scales in studies from German‐speaking countries and other countries | 10 | 1570 | Std. Mean Difference (IV, Random, 95% CI) | 0.01 [‐0.13, 0.15] |
| 11.1 Studies from German‐speaking countries | 6 | 1177 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.02 [‐0.21, 0.18] |
| 11.2 Studies from other countries | 4 | 393 | Std. Mean Difference (IV, Random, 95% CI) | 0.10 [‐0.10, 0.29] |

Comparison 6. Safety ‐ Hypericum mono‐preparations vs. standard antidepressants

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
|---|---|---|---|---|
| 1 Number of patients discontinuing treatment/dropping out due to adverse/side effects | 16 | 2785 | Odds Ratio (M‐H, Random, 95% CI) | 0.41 [0.29, 0.60] |
| 1.1 vs. older antidepressants | 5 | 1016 | Odds Ratio (M‐H, Random, 95% CI) | 0.24 [0.13, 0.46] |
| 1.2 vs. SSRIs | 11 | 1769 | Odds Ratio (M‐H, Random, 95% CI) | 0.53 [0.34, 0.83] |
| 2 Number of patients dropping out | 16 | 2785 | Odds Ratio (M‐H, Random, 95% CI) | 0.77 [0.62, 0.95] |
| 2.1 vs. older antidepressants | 5 | 1016 | Odds Ratio (M‐H, Random, 95% CI) | 0.67 [0.47, 0.95] |
| 2.2 vs. SSRIs | 11 | 1769 | Odds Ratio (M‐H, Random, 95% CI) | 0.83 [0.63, 1.08] |
| 3 Number of patients reporting adverse effects | 14 | 2663 | Odds Ratio (M‐H, Random, 95% CI) | 0.56 [0.43, 0.74] |
| 3.1 vs. older antidepressants | 5 | 1016 | Odds Ratio (M‐H, Random, 95% CI) | 0.39 [0.30, 0.50] |
| 3.2 vs. SSRIs | 9 | 1647 | Odds Ratio (M‐H, Random, 95% CI) | 0.70 [0.49, 1.00] |
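
A quick consistency check on the odds ratios reported for Comparison 6: if a 95% CI is computed on the log scale, as is usual for ratio measures (an assumption here), the point estimate should sit at the geometric mean of the interval bounds. The sketch below applies that check to the nine values in the table; the 0.02 tolerance is an arbitrary allowance for two‐decimal rounding, and the dictionary labels are shorthand.

```python
import math

# (estimate, ci_lower, ci_upper) as reported for Comparison 6.
reported = {
    "1 dropouts due to adverse effects": (0.41, 0.29, 0.60),
    "1.1 vs. older antidepressants": (0.24, 0.13, 0.46),
    "1.2 vs. SSRIs": (0.53, 0.34, 0.83),
    "2 dropouts, any reason": (0.77, 0.62, 0.95),
    "2.1 vs. older antidepressants": (0.67, 0.47, 0.95),
    "2.2 vs. SSRIs": (0.83, 0.63, 1.08),
    "3 patients reporting adverse effects": (0.56, 0.43, 0.74),
    "3.1 vs. older antidepressants": (0.39, 0.30, 0.50),
    "3.2 vs. SSRIs": (0.70, 0.49, 1.00),
}

def log_scale_midpoint(lower, upper):
    """Geometric mean of the bounds, i.e. the midpoint on the log scale."""
    return math.sqrt(lower * upper)

def check_consistency(rows, tol=0.02):
    """Flag each row whose estimate matches the log-scale midpoint of its CI."""
    return {
        label: abs(est - log_scale_midpoint(lo, hi)) <= tol
        for label, (est, lo, hi) in rows.items()
    }

results = check_consistency(reported)
for label, ok in results.items():
    print(f"{label}: {'consistent' if ok else 'check transcription'}")
```

A row that fails this check would suggest a transcription or extraction error rather than a problem with the underlying analysis.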