Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions

Background

Systematic reviews may be compromised by selective inclusion and reporting of outcomes and analyses. Selective inclusion occurs when there are multiple effect estimates in a trial report that could be included in a particular meta‐analysis (e.g. from multiple measurement scales and time points) and the choice of effect estimate to include in the meta‐analysis is based on the results (e.g. statistical significance, magnitude or direction of effect). Selective reporting occurs when the reporting of a subset of outcomes and analyses in the systematic review is based on the results (e.g. a protocol‐defined outcome is omitted from the published systematic review).

Objectives

To summarise the characteristics and synthesise the results of empirical studies that have investigated the prevalence of selective inclusion or reporting in systematic reviews of randomised controlled trials (RCTs), investigated the factors (e.g. statistical significance or direction of effect) associated with the prevalence and quantified the bias.

Search methods

We searched the Cochrane Methodology Register (to July 2012), Ovid MEDLINE, Ovid EMBASE, Ovid PsycINFO and ISI Web of Science (each up to May 2013), and the US Agency for Healthcare Research and Quality (AHRQ) Effective Healthcare Program's Scientific Resource Center (SRC) Methods Library (to June 2013). We also searched the abstract books of the 2011 and 2012 Cochrane Colloquia and the article alerts for methodological work in research synthesis published from 2009 to 2011 and compiled in Research Synthesis Methods.

Selection criteria

We included both published and unpublished empirical studies that investigated the prevalence and factors associated with selective inclusion or reporting, or both, in systematic reviews of RCTs of healthcare interventions. We included empirical studies assessing any type of selective inclusion or reporting, such as investigations of how frequently RCT outcome data are selectively included in systematic reviews based on the results, how frequently outcomes and analyses are discrepant between the protocol and the published review, or how frequently non‐significant outcomes are only partially reported in the full text or summary within systematic reviews.

Data collection and analysis

Two review authors independently selected empirical studies for inclusion, extracted the data and performed a risk of bias assessment. A third review author resolved any disagreements about inclusion or exclusion of empirical studies, data extraction and risk of bias. We contacted authors of included studies for additional unpublished data. Primary outcomes included overall prevalence of selective inclusion or reporting, association between selective inclusion or reporting and the statistical significance of the effect estimate, and association between selective inclusion or reporting and the direction of the effect estimate. We combined prevalence estimates and risk ratios (RRs) using a random‐effects meta‐analysis model.

Main results

Seven studies met the inclusion criteria. No studies had investigated selective inclusion of results in systematic reviews, or discrepancies in outcomes and analyses between systematic review registry entries and published systematic reviews. Based on a meta‐analysis of four studies (including 485 Cochrane Reviews), 38% (95% confidence interval (CI) 23% to 54%) of systematic reviews added, omitted, upgraded or downgraded at least one outcome between the protocol and published systematic review. The association between statistical significance and discrepant outcome reporting between protocol and published systematic review was uncertain. The meta‐analytic estimate suggested an increased risk of adding or upgrading (i.e. changing a secondary outcome to primary) when the outcome was statistically significant, although the 95% CI included no association and a decreased risk as plausible estimates (RR 1.43, 95% CI 0.71 to 2.85; two studies, n = 552 meta‐analyses). Also, the meta‐analytic estimate suggested an increased risk of downgrading (i.e. changing a primary outcome to secondary) when the outcome was statistically significant, although the 95% CI included no association and a decreased risk as plausible estimates (RR 1.26, 95% CI 0.60 to 2.62; two studies, n = 484 meta‐analyses). None of the included studies had investigated whether the association between statistical significance and adding, upgrading or downgrading of outcomes was modified by the type of comparison, direction of effect or type of outcome; or whether there is an association between direction of the effect estimate and discrepant outcome reporting.

Several secondary outcomes were reported in the included studies. Two studies found that reasons for discrepant outcome reporting were infrequently reported in published systematic reviews (6% in one study and 22% in the other). One study (including 62 Cochrane Reviews) found that 32% (95% CI 21% to 45%) of systematic reviews did not report all primary outcomes in the abstract. Another study (including 64 Cochrane and 118 non‐Cochrane reviews) found that statistically significant primary outcomes were more likely to be completely reported in the systematic review abstract than non‐significant primary outcomes (RR 2.66, 95% CI 1.81 to 3.90). None of the studies included systematic reviews published after 2009 when reporting standards for systematic reviews (Preferred Reporting Items for Systematic reviews and Meta‐Analyses (PRISMA) Statement, and Methodological Expectations of Cochrane Intervention Reviews (MECIR)) were disseminated, so the results might not be generalisable to more recent systematic reviews.

Authors' conclusions

Discrepant outcome reporting between the protocol and published systematic review is fairly common, although the association between statistical significance and discrepant outcome reporting is uncertain. Complete reporting of outcomes in systematic review abstracts is associated with statistical significance of the results for those outcomes. Systematic review outcomes and analysis plans should be specified prior to seeing the results of included studies to minimise post‐hoc decisions that may be based on the observed results. Modifications that occur once the review has commenced, along with their justification, should be clearly reported. Effect estimates and CIs should be reported for all systematic review outcomes regardless of the results. The lack of research on selective inclusion of results in systematic reviews needs to be addressed and studies that avoid the methodological weaknesses of existing research are also needed.

Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions

A systematic review summarises evidence from multiple studies to answer a specific research question (e.g. what are the benefits and harms of a particular intervention for a particular health condition?). Often, there are many outcomes that systematic review authors could report to address their research question (e.g. pain, disability and quality of life for patients with musculoskeletal conditions) and many different results available for a particular outcome (e.g. a study might measure pain using three different scales at four time points). If the decision about which outcomes to investigate in a systematic review is based on the results for those outcomes in the eligible studies, this may lead to bias. Likewise, if the decision about which outcomes to report in a systematic review, and how to report them, is based on the results, this may mislead users of the systematic review.

This methodology review summarises the findings of studies examining the inclusion of results and reporting of outcomes in systematic reviews. We searched for studies indexed in electronic bibliographic databases up to May 2013. We included seven studies and found that outcomes investigated and reported in systematic reviews were often changed between the protocol and published systematic review. We also found that it was unclear whether the decision to make these changes was related to how statistically convincing the treatment effect for that outcome was. More studies are needed to confirm if this relationship exists. Also, one study found that some systematic reviews did not report all of the most important outcomes in the abstract of the review. Another study found that outcomes with a more statistically convincing result were more likely to be completely reported in the abstract than other outcomes. The studies that we included were limited to systematic reviews published before 2009. New studies are needed to examine the inclusion of results and reporting of outcomes in more recent systematic reviews.

Authors' conclusions

Implications for methodological research

Potential future studies could investigate: (1) potential selective inclusion of results in systematic reviews; (2) discrepant outcome reporting between registry entries and published systematic reviews (e.g. using registry entries in the PROSPERO database of prospectively registered systematic reviews); (3) discrepant outcome reporting between non‐Cochrane systematic review protocols and published systematic reviews (e.g. using protocols published in journals such as Systematic Reviews); (4) whether discrepant outcome reporting between protocols and published systematic reviews has reduced since the publication of the PRISMA Statement; (5) whether the prevalence of selective partial reporting in systematic review abstracts has reduced following publication of the PRISMA for Abstracts Statement; and (6) whether selective inclusion and reporting occurs less frequently in systematic reviews with a core outcome set. Where possible, investigators should assess both the prevalence and factors associated with selective inclusion and reporting in systematic reviews, and whether the associations are modified by the type of comparison, direction of effect and type of outcome.

Background

Systematic reviews and meta‐analyses of intervention studies provide evidence to decision makers on the benefits and harms of healthcare interventions, keep clinicians up‐to‐date, identify gaps in knowledge and provide recommendations for future research (Higgins 2011b; Moher 2009). However, various practices may affect the validity of systematic reviews. These practices include selective publication of studies (Begg 1988; Dickersin 1990) and selective reporting of a subset of outcomes and analyses within studies based on the results (e.g. statistical significance, magnitude or direction of effect). We refer to the latter as selective reporting (Hutton 2000; Song 2010).

Selective reporting can occur in various ways in randomised controlled trials (RCTs). Examples include omission of all data for a measured outcome, reporting data for only a subset of the time points measured, and reporting only subgroup analyses or sub‐scale data. Additional problems include the failure to report measures of variation (e.g. standard errors (SEs) or 95% confidence intervals (CIs)) or exact P values for non‐significant outcomes, and modification of 'primary' and 'secondary' outcome labels between registry entries, protocols and publications (Chan 2004a; Chan 2004b; Chan 2005; Dwan 2008; Dwan 2014; Mathieu 2009). Certain types of selective reporting in RCTs can lead to bias in the results of meta‐analyses (Kirkham 2010a; Williamson 2005a; Williamson 2005b).

Empirical evidence of selective reporting in RCTs exists. In two landmark studies, Chan and colleagues compared outcomes reported in RCT protocols to the methods and results sections of publications. They found that 71% of RCTs in one study and 88% in another had at least one unreported efficacy outcome (Chan 2004a; Chan 2004b). A systematic review of nine cohorts of RCTs found that at least one primary outcome was added, omitted or changed (upgraded from secondary to primary or downgraded from primary to secondary) in 4% to 50% of publications when compared with the registry entry or protocol. In addition, statistically significant outcomes were more often completely reported compared with non‐significant outcomes (range of odds ratios 2.4 to 4.7) where completely reported was defined as reporting sufficient data for inclusion in a meta‐analysis (Dwan 2011).

Description of the problem or issue

Most research on selective reporting has been undertaken at the RCT level (Norris 2013; Page 2013a). However, there is potential for similar processes to occur at the systematic review level, particularly when multiplicity of outcome data exists in RCT reports. Examples of multiplicity of data include multiple measurement scales, multiple time points and multiple analyses (e.g. when both final and change from baseline values, or both intention‐to‐treat and per‐protocol analyses are reported for a particular outcome) (Bender 2008; Tendal 2011). When multiplicity of data is present, systematic review authors may include a subset of outcome data in the review. There can be reasonable justifications for including a subset of outcome data in systematic reviews and meta‐analyses (e.g. choosing to include data collected using validated measurement scales only, or excluding surrogate outcomes that are of limited clinical relevance), but if the choice is made based on the results (which we refer to as 'selective inclusion of results') the corresponding meta‐analytic effect estimate may be biased (Page 2013a; Tendal 2009; Tendal 2011).

After inclusion in a systematic review, outcomes and analyses may be selectively reported in the same ways as occurs in RCTs (Kirkham 2010b; Moher 2007). For example, systematic review authors may not report results of all primary outcomes in the abstract of the review, or may modify the description of outcomes from primary to secondary or vice versa. Selective reporting can misrepresent the available evidence and mislead the users of systematic reviews regarding the importance of particular outcomes (Chalmers 1990; Liberati 2009; Moher 2009).

There are important differences between empirical investigations of selective reporting in systematic reviews and selective reporting in RCTs. Unlike RCTs, which involve prospective recruitment of participants and collection of outcome data, most systematic reviews are retrospective by nature, in that the studies included are usually identified after they have been completed and their results reported. This means that the outcomes and analyses reported in systematic reviews are dependent on the data available in the included studies. Therefore, when investigating bias due to selective inclusion or reporting in systematic reviews, the extent to which changes to planned outcomes and analyses occurred because none of the included studies measured the necessary outcome data (rather than because of the nature of the results) requires consideration by investigators.

Why it is important to do this review

Several empirical studies have assessed the prevalence of selective inclusion and reporting in systematic reviews of RCTs, and the factors associated with the prevalence, but we are unaware of any systematic reviews of such studies. If selective inclusion and reporting are found to occur frequently, interventions may be necessary to improve the conduct and reporting of systematic reviews. Examples of such interventions include: increasing review authors' awareness of guidelines for the reporting of outcomes in systematic reviews, particularly the Preferred Reporting Items for Systematic reviews and Meta‐Analyses (PRISMA) statement (Liberati 2009; Moher 2009) and the Methodological Expectations of Cochrane Intervention Reviews (MECIR) reporting standards (Chandler 2012); more detailed advice in guidance documents such as the Cochrane Handbook for Systematic Reviews of Interventions regarding these potential sources of bias (Higgins 2011a); the development of core outcome sets for a range of clinical conditions (Clarke 2007; COMET Initiative; Kirkham 2013); and registration of detailed protocols for all systematic reviews of healthcare interventions (Booth 2010; PLoS 2011).

Objectives

To summarise the characteristics and synthesise the results of empirical studies that have investigated the prevalence of selective inclusion or reporting in systematic reviews of RCTs, investigated the factors (e.g. statistical significance or direction of effect) associated with the prevalence and quantified the bias.

Methods

Criteria for considering studies for this review

Types of studies

We included both published and unpublished empirical studies that investigated the prevalence and factors associated with selective inclusion or reporting, or both, in systematic reviews of RCTs of healthcare interventions. If empirical studies included a mixture of systematic reviews of RCTs and non‐randomised studies (NRS) and reported data separately, we only included the findings for systematic reviews of RCTs. If it was not possible to separate the data, we contacted the study authors. If the data could not be separated, we presented the results but did not include them in the quantitative syntheses.

We included empirical studies that comprised a sample or a complete set of systematic reviews (e.g. all systematic reviews registered or published during a specific time period). The empirical studies could have assessed any type of selective inclusion or reporting, such as investigations of how frequently: (1) RCT outcome data are selectively included in systematic reviews based on the results; (2) outcomes and analyses are discrepant between the protocol and published review; or (3) non‐significant outcomes are partially reported in the full text or summary within systematic reviews. We defined the full text of the systematic review as the text reported in the results section, tables, forest plots and data available via online appendices. Summaries within systematic reviews included the abstract, plain language summary or 'Summary of findings' tables. We excluded empirical studies assessing discrepancies in information other than outcomes and analyses (e.g. search strategy or inclusion criteria), as discrepancies of this nature were beyond the scope of our review.

Prior to the launch in February 2011 of PROSPERO, an international online prospective register of systematic reviews hosted by the Centre for Reviews and Dissemination (University of York, United Kingdom), access to systematic review protocols was generally limited to organisations such as The Cochrane Collaboration, the Campbell Collaboration and the Joanna Briggs Institute (Booth 2010; Booth 2011). Therefore, we anticipated that it was unlikely that empirical studies that included systematic reviews other than those coming from these organisations would exist at the time of our search. We also anticipated that empirical studies comparing systematic review registry entries to published systematic reviews would be unlikely to exist at this time. However, such studies may be included in future updates of this Cochrane Review. While these issues may limit the generalisability of some of the empirical studies identified, these issues are unlikely to apply to empirical studies investigating selective inclusion of RCT outcome data in systematic reviews or selective partial reporting in systematic reviews, because both of these types of selective inclusion and reporting can be assessed in systematic reviews without a systematic review protocol or registry entry.

Types of data

We included estimates of the prevalence of selective inclusion or reporting in systematic reviews of RCTs and estimates of the association between selective inclusion or reporting and any factors predictive of this (e.g. statistical significance or direction of the effect) as assessed by the investigators who did the original empirical studies. We anticipated that these factors may have been defined by investigators differently, but included any empirical study regardless of the definitions used. The terms "full" or "complete" versus "partial" reporting in either the full text or summary of the systematic review were also anticipated to be defined variously by investigators, but we included any empirical study regardless of the definitions used.

Types of methods

We focused on five types of selective inclusion and reporting of outcomes and analyses in this review. These included:

  • selective inclusion of RCT outcome data in systematic reviews;

  • discrepancies between systematic review registry entries and published systematic reviews;

  • discrepancies between systematic review protocols and published systematic reviews;

  • discrepancies between the full text and the summaries (i.e. abstract, plain language summary or 'Summary of findings' table) in systematic reviews;

  • selective partial reporting in systematic reviews.

Types of outcome measures

For each of the five types of methods, we included three primary outcomes:

  1. Overall prevalence of selective inclusion or reporting;

  2. Whether the overall prevalence of selective inclusion or reporting is associated with statistical significance of the effect estimate;

  3. Whether the overall prevalence of selective inclusion or reporting is associated with direction of the effect estimate.

We also included four secondary outcomes:

  1. Prevalence of specific examples of selective inclusion or reporting (see Appendix 1 for examples that we anticipated might be reported in empirical studies);

  2. Whether the prevalence of each specific example of selective inclusion or reporting is associated with statistical significance of the effect estimate;

  3. Whether the prevalence of each specific example of selective inclusion or reporting is associated with the direction of the effect estimate;

  4. Whether the overall prevalence or prevalence of specific examples of selective inclusion or reporting is associated with other factors investigated by the empirical study investigators (e.g. clinical area, funding of the systematic review).

We did not exclude studies based on the outcomes reported.

Search methods for identification of studies

Electronic searches

We searched the following electronic databases:

  • Cochrane Methodology Register (to July 2012, which was the last month in which new records were added to this database);

  • Ovid MEDLINE (January 1946 to May 2013);

  • Ovid EMBASE (January 1980 to May 2013);

  • Ovid PsycINFO (January 1806 to May 2013);

  • ISI Web of Science (January 1898 to May 2013).

The search strategies for each database are reported in Appendix 2. We also searched the US Agency for Healthcare Research and Quality (AHRQ) Effective Healthcare Program's Scientific Resource Center (SRC) Methods Library (SRC Methods Library 2013) on 5th June 2013, using the following Search Descriptors: "Bias ‐ outcome selection", "Bias ‐ Publication", "Bias ‐ Reporting", "Reporting", "Registries ‐ Systematic Reviews" and "Synthesis, Quantitative ‐ Reporting".

Searching other resources

We searched the abstract books of the 2011 and 2012 Cochrane Colloquia (which were not indexed in any electronic databases at the time of the search). We also searched the article alerts for methodological work in research synthesis published from 2009 to 2011, which are available in Research Synthesis Methods (Hafdahl 2010b; Hafdahl 2010a; Hafdahl 2011a; Hafdahl 2011b; Hafdahl 2012). One review author (MJP) screened the reference lists of all included studies and any relevant reviews identified from the search. One review author (MJP) contacted the authors of all included studies and relevant reviews, as well as individuals with content expertise, to assist with the identification of published, unpublished and ongoing studies.

Data collection and analysis

Selection of studies

Two of four review authors (MJP and either JK, KD or SK) independently screened the titles and available abstracts of all studies identified by the search against the inclusion criteria (see Criteria for considering studies for this review), and excluded any clearly irrelevant studies. We did not exclude studies based on the language of the publication. Two review authors (MJP and SK) independently assessed full‐text copies of reports of potentially eligible studies. The authors resolved disagreements regarding study inclusion by discussion, involving a third review author (JEM) when required.

Data extraction and management

Two of four review authors (MJP and either JK, KD or SK) independently extracted data using a data extraction form developed for this Cochrane Review. Any discrepancies between the authors were resolved through discussion until consensus was reached or by arbitration of a third review author (JEM) when required. All the data extractors pilot tested the data extraction form and modified it accordingly before use. One review author (MJP) compiled all comparisons and entered outcome data into Review Manager 5 (RevMan 2011). One review author (JEM) cross‐checked the data entered. For studies where relevant outcome data were not reported, one review author (MJP) requested further information from the study investigators. When unsuccessful, we included the study in the review and fully described it (e.g. using tables), but did not include it in any quantitative syntheses.

We extracted the following data from each study:

  • characteristics of the study, in particular the types of selective inclusion or reporting investigated, the number of included systematic reviews, methods for selecting the systematic reviews, areas of health care addressed, range of years of publication of the systematic reviews, proportion of systematic reviews that were Cochrane Reviews and methodological quality of the systematic reviews (however assessed by the study authors);

  • data on estimates of prevalence and association, as specified under Types of outcome measures;

  • the definition of factors investigated for their association with selective inclusion or reporting (e.g. statistical significance defined as P < 0.05);

  • the definition of "full" or "partial" reporting used by study investigators who assessed this practice;

  • any confounding variables assessed by the study authors (e.g. funding type).

Assessment of risk of bias in included studies

There is no standard tool available to evaluate the risk of bias of empirical studies eligible for inclusion in this review. We used the following criteria:

  1. What is the risk of selection bias in the empirical study? Low risk of bias: the empirical study included all systematic reviews registered during a specified time period (where registration is defined as registration of the review in an online database or publication of a protocol for the review), or included a random sample of systematic reviews. High risk of bias: the empirical study included a non‐random sample of systematic reviews. Unclear risk of bias: the sampling frame for the sample of systematic reviews of RCTs is unclear.

  2. What is the risk of selective reporting bias in the empirical study? Low risk of bias: all comparisons and outcomes reported in the protocol for the empirical study are fully reported in the results section of the publication. High risk of bias: not all comparisons and outcomes reported in the protocol for the empirical study are fully reported in the results section of the publication. Unclear risk of bias: it is unclear if all comparisons and outcomes are fully reported in the results section of the publication (e.g. because a protocol for the empirical study is not available) (Dwan 2011).

Two of four authors (MJP and either JK, KD or SK) rated each criterion independently. Any discrepancies between the authors were resolved through discussion until consensus was reached or by arbitration of a third author (JEM) when required.

Measures of the effect of the methods

The measures of prevalence and associations between a factor and selective inclusion or reporting were dependent on the summary statistics reported by the empirical study investigators, and we reported the data as available. For empirical studies that reported estimates of prevalence, we reported percentages with 95% CIs. The possible prevalence estimates that we considered including are presented in Appendix 1. For empirical studies investigating the association between a factor and selective inclusion or reporting, we reported risk ratios (RRs) with 95% CIs. We standardised outcomes so that they had a consistent meaning; for example, RRs > 1 denote a higher risk of selective inclusion or reporting bias. If the empirical study investigators reported odds ratios (ORs), we converted these to RRs using the formula provided in section 12.5.4.4 of the Cochrane Handbook for Systematic Reviews of Interventions (Schünemann 2011). For studies that quantified bias in meta‐analytic estimates due to selective inclusion, we reported the estimates of bias as reported in the empirical studies.
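
As a minimal sketch of this conversion in R (the function name and the illustrative OR and assumed control risk are ours, not values taken from an included study):

```r
# Convert an odds ratio (OR) to a risk ratio (RR) given an assumed control
# risk (ACR), following RR = OR / (1 - ACR + ACR * OR), the formula in
# section 12.5.4.4 of the Cochrane Handbook.
or_to_rr <- function(or, acr) {
  or / (1 - acr + acr * or)
}

or_to_rr(or = 2.4, acr = 0.5)  # illustrative values only; returns ~1.41
```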

Unit of analysis issues

Within the systematic reviews evaluated in an empirical study, there is potential for overlap of included RCTs. We contacted the authors of the empirical studies to enquire how they dealt with this issue of RCT overlap. We reported whether the issue was addressed and discussed its likely impact on the results of the empirical study. Furthermore, we anticipated that there might be some overlap in the systematic reviews included in the empirical studies in our review. Based on information regarding the types of reviews (i.e. Cochrane versus non‐Cochrane), years of publication, clinical condition and types of outcomes (e.g. dichotomous versus continuous), we reported how likely such overlap was and how it may have affected the results of our review. If we suspected that two studies were likely to have a substantial proportion of overlapping systematic reviews, we performed sensitivity analyses by removing the study with the smaller number of included systematic reviews from the meta‐analysis.

Dealing with missing data

If the empirical studies had missing information on the characteristics of the included studies (e.g. methods for selecting the systematic reviews) or missing outcome data (e.g. measure of variation for a RR denoting the association between statistical significance and discrepant outcome reporting), we contacted the authors. We did not plan to impute any missing outcome data.

Assessment of heterogeneity

We assessed methodological heterogeneity by determining whether the characteristics of the included empirical studies were similar, particularly in terms of the areas of health care addressed by the systematic reviews and type of methodological comparisons undertaken. We assessed statistical heterogeneity by inspecting the forest plots and by calculating the I2 statistic with 95% CIs (Deeks 2011; Higgins 2002). We calculated 95% CIs for the I2 statistic using the non‐central Chi2 approximation implemented in the Stata module heterogi (this module can only be used when there are more than two studies included in a meta‐analysis) (Orsini 2006).
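
The I2 point estimate can be computed directly from Cochran's Q; a minimal sketch is below (the 95% CIs described above come from the non‐central Chi2 approximation in the Stata heterogi module, which is not reproduced here, and the Q and k values are illustrative only):

```r
# Point estimate of the I^2 statistic (Higgins 2002): the percentage of
# variability in the effect estimates that is due to heterogeneity rather
# than chance. Q is Cochran's Q statistic and k is the number of studies.
i_squared <- function(Q, k) {
  df <- k - 1
  100 * max(0, (Q - df) / Q)
}

i_squared(Q = 33, k = 4)  # illustrative values; returns approximately 91
```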

Assessment of reporting biases

To assess small study effects, we planned to generate funnel plots if at least 10 empirical studies examining the same methodological comparison and outcome met our inclusion criteria (Sterne 2011). To assess selective reporting in the included empirical studies, we compared outcomes reported in the empirical study protocol (if available) with outcomes reported in the results section of the publication. We searched for empirical study protocols in the electronic databases listed above (see Electronic searches). If a published protocol could not be identified, we requested one from the authors of the empirical study. If a protocol was not available, we compared the outcomes reported in the methods section of the publication with the outcomes reported in the results section of the publication.

Data synthesis

For each of the five types of selective inclusion and reporting, we combined estimates of the primary outcome of overall prevalence of selective inclusion or reporting (primary outcome #1) in a meta‐analysis using a random‐effects model. We used DerSimonian and Laird's method of moments estimator to estimate the between‐study variance (DerSimonian 1986). When it was not possible to meta‐analyse (because similar measures of overall prevalence had not been measured in more than one empirical study or were incompletely reported), we reported percentages and 95% CIs for each empirical study in tables. For each of the five types of selective inclusion and reporting, we combined RR estimates of associations between overall prevalence of selective inclusion or reporting and statistical significance or direction of effect (primary outcomes #2 and #3) in a meta‐analysis using a random‐effects model. Where data could not be combined in a meta‐analysis, we reported RR estimates, 95% CIs and statistical significance for each empirical study in tables. We did not plan to meta‐analyse estimates of bias due to selective inclusion. Instead, we have reported estimates of bias as reported by the empirical study investigators in tables.
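
For illustration, the prevalence synthesis can be re‐expressed with the 'metafor' package in R (the package used for the post‐hoc sensitivity analyses described below). The counts are taken from the first row of Table 1; this is a sketch of the analysis rather than the original code:

```r
library(metafor)

# Number of systematic reviews with any discrepancy in at least one outcome (xi)
# out of the reviews assessed (ni), from the first row of Table 1:
# Silagy 2002, Parmelli 2007, Kirkham 2010b, Dwan 2013a.
xi <- c(22, 49, 64, 18)
ni <- c(47, 104, 288, 46)

# Random-effects meta-analysis of raw proportions with the DerSimonian and
# Laird method of moments estimator for the between-study variance.
res <- rma(measure = "PR", xi = xi, ni = ni, method = "DL")
summary(res)  # pooled proportion, 95% CI and I^2
```

This sketch should reproduce, to rounding, the pooled prevalence of 38% (95% CI 23% to 54%) reported under Effect of methods.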

Subgroup analysis and investigation of heterogeneity

We planned to undertake subgroup analyses to investigate whether the primary prevalence and association outcomes were modified by whether the systematic reviews included in the empirical studies had a review protocol or not; and whether or not the clinical topic of the systematic reviews had an established core outcome set (COMET Initiative).

Sensitivity analysis

We planned to assess the robustness of meta‐analytic effect estimates based on the risk of bias of the included empirical studies. Specifically, we planned to compare meta‐analytic effect estimates that included all eligible empirical studies to meta‐analytic effect estimates that included empirical studies only if they were rated at low risk of selection bias, and low or unclear risk of selective reporting bias. We used the criterion of low or unclear risk of selective reporting bias because we anticipated that most empirical studies would not have a study protocol and hence would be rated as being at unclear risk of bias on this domain. We also planned to perform sensitivity analyses to determine the robustness of meta‐analysed effect estimates based on the different definitions of selective inclusion, discrepant outcome reporting or selective partial reporting used by the empirical study investigators. In cases where ORs were converted to RRs, and we were unable to determine the baseline risk in the empirical study, we planned to undertake sensitivity analyses assuming a range of baseline risks from the other included studies.
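
A minimal sketch of the last of these planned sensitivity analyses, re‐applying the Handbook conversion across a range of assumed baseline risks (the OR of 2.0 and the risk range are illustrative, not values from an included study):

```r
# Re-express a single odds ratio as risk ratios across a range of assumed
# baseline (control) risks; values are illustrative only.
or_to_rr <- function(or, acr) or / (1 - acr + acr * or)

baseline_risks <- seq(0.1, 0.9, by = 0.1)
data.frame(acr = baseline_risks,
           rr  = or_to_rr(or = 2.0, acr = baseline_risks))
```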

We undertook a post‐hoc sensitivity analysis to investigate whether the meta‐analytic proportion was robust to the method of analysis. In our primary analysis, we meta‐analysed raw proportions. However, other methods have been shown to be preferable because they avoid the bias that can arise from assuming the normal approximation to the binomial and from the correlation between the proportion and its variance (Trikalinos 2013). We undertook meta‐analyses of logit‐transformed, arcsine‐transformed and Freeman‐Tukey double arcsine‐transformed proportions (Freeman 1950), and fitted a random‐effects logistic regression (binomial‐normal model). These sensitivity analyses were performed using the 'metafor' package in the statistical package 'R' (Viechtbauer 2010).
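
These sensitivity analyses can be sketched with standard 'metafor' calls; the code below is an approximate re‐expression using the counts from the first row of Table 1, rather than the exact script used for the review:

```r
library(metafor)

# Reviews with any discrepancy in at least one outcome (xi) out of reviews
# assessed (ni), taken from the first row of Table 1.
xi <- c(22, 49, 64, 18)
ni <- c(47, 104, 288, 46)

# Logit-transformed proportions, back-transformed for presentation.
res_logit <- rma(measure = "PLO", xi = xi, ni = ni, method = "DL")
predict(res_logit, transf = transf.ilogit)

# Arcsine and Freeman-Tukey double arcsine transformed proportions.
res_as <- rma(measure = "PAS", xi = xi, ni = ni, method = "DL")
predict(res_as, transf = transf.iarcsin)
res_ft <- rma(measure = "PFT", xi = xi, ni = ni, method = "DL")
predict(res_ft, transf = transf.ipft.hm, targs = list(ni = ni))

# Random-effects logistic regression (binomial-normal model).
res_bn <- rma.glmm(measure = "PLO", xi = xi, ni = ni)
transf.ilogit(coef(res_bn))  # pooled proportion on the original scale
```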

Results

Description of studies

See Characteristics of included studies; Characteristics of excluded studies; Characteristics of studies awaiting classification; and Characteristics of ongoing studies.

Results of the search

The search of electronic databases identified a total of 5094 records. We identified 14 additional records through other sources. After we removed duplicates, 4660 records remained. From screening titles and abstracts, we excluded 4575 records that were not relevant to the review, and retrieved 85 full‐text reports for further examination. Of these, seven studies fulfilled the inclusion criteria. Two additional studies are awaiting classification because their results are available only as conference abstracts and are currently being drafted for full publication (Johnston 2012; Middleton 2010), and one study is ongoing (Page 2013b). Figure 1 depicts a flow diagram of the study selection process.


Study flow diagram.

Included studies

See Characteristics of included studies.

The seven included studies were published between 2002 and 2013. Four studies were published in a peer‐reviewed journal (Beller 2011; Dwan 2013a; Kirkham 2010b; Silagy 2002) while three were available as conference abstracts only (Hopewell 2010; Parmelli 2007; Vlassov 2008). The median number of included systematic reviews per study was 100 (range: 46 to 288). Six studies only included Cochrane Reviews (Dwan 2013a; Hopewell 2010; Kirkham 2010b; Parmelli 2007; Silagy 2002; Vlassov 2008), while one included both Cochrane and non‐Cochrane reviews (Beller 2011). The years of publication of systematic reviews included in the studies ranged from 1996 to 2009. In six studies, various areas of health care were addressed in the included systematic reviews, while one study only included systematic reviews of interventions for cystic fibrosis and genetic disorders (Dwan 2013a). No study assessed the methodological quality of included systematic reviews (e.g. using the Overview Quality Assessment Questionnaire (OQAQ, Oxman 1991) or AMSTAR (Shea 2007)). Only one study recorded the proportion of systematic reviews that only included RCTs (85%) (Dwan 2013a). However, five of the remaining six studies only included Cochrane Reviews (Hopewell 2010; Kirkham 2010b; Parmelli 2007; Silagy 2002; Vlassov 2008), which infrequently include NRS (CEU 2012; Moher 2007). No study assessed the extent of overlap of RCTs in the included systematic reviews. In terms of the types of selective inclusion and reporting investigated, no study investigated selective inclusion of RCT outcome data in systematic reviews, none investigated discrepancies between systematic review registry entries and published systematic reviews, four investigated discrepant outcome reporting between systematic review protocols and published systematic reviews (Dwan 2013a; Kirkham 2010b; Parmelli 2007; Silagy 2002), two investigated discrepant outcome reporting between the full text and abstract of systematic reviews (Hopewell 2010; Vlassov 2008), and one investigated selective partial reporting in systematic reviews (Beller 2011).

Excluded studies

See Characteristics of excluded studies.

We excluded 62 studies (70 full‐text reports). The main reasons for exclusion were: (i) 35 studies investigated the methods or reporting of systematic reviews using a quality checklist that did not include items focusing on selective inclusion or reporting (e.g. OQAQ, AMSTAR, QUOROM or PRISMA); (ii) five studies investigated the extent of multiplicity of data in RCT reports and the impact of multiplicity on the reliability of meta‐analysis results, but not whether results were selectively included by systematic review authors; (iii) one study investigated discrepancies in meta‐analytic effect estimates between pairs of systematic reviews of the same topic, but could not determine whether discrepancies were due to selective inclusion of RCT data or differential inclusion of trials based on some clinical or methodological rationale; and (iv) 21 studies investigated issues relating to other methodological components of systematic reviews, such as the quality of search strategies or study selection criteria (these 21 studies are not listed in the table of Characteristics of excluded studies but are available from the review authors on request).

Risk of bias in included studies

See Figure 2.


Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

We rated all studies as having a low risk of selection bias because either a random sample of systematic reviews or an equivalent sampling process (e.g. systematic sampling) that was unlikely to introduce bias was used, or all systematic reviews registered or published during a particular time period were included in the study. Three studies were rated as having a low risk of selective reporting bias because data for all outcomes and analyses that were pre‐specified in an unpublished study protocol were either reported in the publications or provided by the authors (Beller 2011; Dwan 2013a; Kirkham 2010b). The remaining four studies were rated as having an unclear risk of selective reporting bias because it was unclear whether all measured and analysed outcomes were reported (Hopewell 2010; Parmelli 2007; Silagy 2002; Vlassov 2008).

Effect of methods

Selective inclusion of RCT outcome data in systematic reviews

We did not identify any studies.

Discrepancies between systematic review registry entries and published systematic reviews

We did not identify any studies.

Discrepancies between systematic review protocols and published systematic reviews

Four studies, including 485 Cochrane Reviews, reported the prevalence of systematic reviews with discrepant outcome reporting between the protocol and published systematic review (Dwan 2013a; Kirkham 2010b; Parmelli 2007; Silagy 2002).

Primary outcomes
Overall prevalence of selective reporting

To estimate the overall prevalence of discrepant outcome reporting, we combined prevalence estimates of systematic reviews with any discrepancy in at least one outcome between the protocol and published systematic review (e.g. adding a new outcome, omitting a protocol‐defined outcome, upgrading a secondary outcome to primary or downgrading a primary outcome to secondary). The combined prevalence of systematic reviews that added, omitted, upgraded or downgraded at least one outcome between the protocol and published systematic review was 38% (95% CI 23% to 54%; Figure 3). There was considerable statistical heterogeneity in the prevalence estimates (I2 = 91%, 95% CI 79% to 96%). The reason for this heterogeneity is unclear. Kirkham 2010b had a lower prevalence estimate than the other studies, but we could not identify any variation in the types of reviews included or anything methodologically unique about Kirkham 2010b that could explain the statistical heterogeneity. The studies included Cochrane Reviews published between 2002 and 2009, and one post‐hoc explanation for the observed heterogeneity that we considered was whether there had been changes to editorial processes used in Cochrane Reviews over time. However, to the best of our knowledge, the only relevant change to editorial processes was that 'Differences between protocol and review' was introduced as a standard heading in Cochrane Reviews in 2008. This heading may discourage authors from changing outcomes between the protocol and review, so its introduction could have reduced the prevalence of discrepant outcome reporting in Cochrane Reviews published after 2008. However, this change would only affect a small proportion of the reviews included in Dwan 2013a and none of the reviews in the other studies, so it does not explain the observed statistical heterogeneity.


Random‐effects meta‐analysis of proportion of systematic reviews with any discrepancy in at least one outcome from protocol to published systematic review.

Association between statistical significance and selective reporting

Two studies investigated the association between statistical significance and discrepant outcome reporting (Kirkham 2010b; Parmelli 2007). The authors of Dwan 2013a and Silagy 2002 confirmed that neither study investigated this association. In both Kirkham 2010b and Parmelli 2007, statistical significance of the meta‐analysis was defined as P < 0.05 and types of discrepancies included adding, upgrading or downgrading outcomes between the protocol and the published systematic review. The unit of analysis in both studies was the outcome, which means that more than one outcome per systematic review may have contributed to the analysis. Kirkham 2010b also examined the association at the systematic review level (i.e. the number of systematic reviews with at least one discrepant outcome).

The association between statistical significance and adding or upgrading an outcome between protocol and published systematic review was uncertain. Our meta‐analysis of both studies analysed at the outcome level, including 552 meta‐analyses, suggested an increased risk of adding or upgrading when the outcome was statistically significant, although the 95% CI included no association and a decreased risk as plausible estimates of association (RR 1.43, 95% CI 0.71 to 2.85; Figure 4; Analysis 1.1). The analysis at the review level in Kirkham 2010b produced a similar effect estimate (RR 1.24, 95% CI 0.57 to 2.66; N = 139 systematic reviews).


Forest plot of association between statistical significance and outcome adding/upgrading.

There was also uncertainty in the association between statistical significance and downgrading an outcome between protocol and published systematic review. Downgrading was less common than adding or upgrading (Figure 5). Our meta‐analysis of both studies analysed at the outcome level, including 484 meta‐analyses, suggested an increased risk of downgrading when the outcome was statistically significant although the 95% CI included no association and a decreased risk as plausible estimates of association (RR 1.26, 95% CI 0.60 to 2.62; Figure 5; Analysis 1.2). Also, the direction of the estimated association differed for these two studies. The RR when analysed at the review level in Kirkham 2010b was below one, although the 95% CI included no association and an increased risk as plausible estimates of association (RR 0.79, 95% CI 0.21 to 3.00; N = 126 systematic reviews).


Forest plot of association between statistical significance and outcome downgrading.

Association between direction of the effect estimate and selective reporting

No study investigated the association between direction of the effect estimate and discrepant outcome reporting (confirmed via contact with authors of all four studies).

Secondary outcomes

The prevalences of specific types of discrepant outcome reporting (e.g. adding a new outcome, downgrading a primary outcome to secondary) are shown in Table 1.

Table 1. Prevalence of specific types of discrepant outcome reporting between systematic review protocols and published systematic reviews

| Type of discrepant outcome reporting | Silagy 2002 (2002)¹, N = 47 | Parmelli 2007 (2005 to 2006)¹, N = 104 | Kirkham 2010b (2006 to 2007)¹, N = 288 | Dwan 2013a (2006 to 2009)¹, N = 46 |
|---|---|---|---|---|
| Any discrepancy in at least one outcome (primary, secondary or unlabelled) from protocol to published systematic review | 22 (47%) | 49 (47%) | 64 (22%) | 18 (39%) |
| Any discrepancy in at least one primary outcome from protocol to published systematic review | ‐‐ | ‐‐ | 48 (17%) | ‐‐ |
| Upgrade of at least one outcome from secondary or unlabelled in the protocol to primary in the published systematic review | ‐‐ | 11 (11%) | 24 (8%) | 3 (7%) |
| Downgrade of at least one outcome from primary in the protocol to secondary or unlabelled in the published systematic review | ‐‐ | 8 (8%) | 12 (4%) | 10 (22%) |
| Addition of at least one new outcome (primary, secondary or unlabelled) in the published systematic review that was not specified in the protocol | 9 (19%) | 32 (31%) | ‐‐ | 7 (15%) |
| Addition of at least one new primary outcome in the published systematic review that was not specified in the protocol | ‐‐ | ‐‐ | 8 (3%) | 3 (7%) |
| Addition of at least one new secondary outcome in the published systematic review that was not specified in the protocol | ‐‐ | ‐‐ | ‐‐ | 4 (9%) |
| Addition of new measurement instruments or criteria for existing outcomes in the published systematic review that were not specified in the protocol | 7 (15%) | ‐‐ | ‐‐ | ‐‐ |
| Omission of at least one outcome (primary, secondary or unlabelled) from the published systematic review which was listed in the protocol | 6 (13%) | 23 (22%) | ‐‐ | 5 (11%) |
| Omission of at least one primary outcome from the published systematic review which was listed in the protocol | ‐‐ | ‐‐ | 7 (2%) | 2 (4%) |
| Omission of at least one secondary outcome from the published systematic review which was listed in the protocol | ‐‐ | ‐‐ | ‐‐ | 3 (7%) |

¹ Range of years of publication of the included systematic reviews.

'‐‐' indicates that the type of discrepancy was not examined by the study authors.

In two studies, the reasons for discrepant outcome reporting were sought in the reports of the published systematic reviews (Dwan 2013a; Kirkham 2010b). Kirkham 2010b reported that only four of 64 (6%) and Dwan 2013a that only four of 18 (22%) systematic reviews described the reasons for discrepant outcome reporting. In two studies, the systematic review authors were contacted to seek reasons for the discrepancies (Kirkham 2010b; Silagy 2002). Kirkham 2010b contacted the authors of 48 systematic reviews with any discrepancy in at least one primary outcome, of whom 34 replied but only 28 could recall the reason for the discrepancy. Of these 28 systematic reviews, there was potential for bias in eight (29%), as changes were made to the primary outcome after reading the results of the individual trials. Silagy 2002 contacted authors (65% response rate) to clarify the reasons for discrepancies in any section of the systematic review (e.g. background, search strategy, outcomes). However, we were unable to determine which reasons were relevant to discrepancies in outcomes. Despite this, Silagy 2002 reported that none of the respondents' stated reasons for discrepancies appeared to be related to knowledge of the results of individual trials. Parmelli 2007 did not assess reasons for discrepant outcome reporting.

Discrepancies between full text and abstract of systematic reviews

Two studies, including 152 Cochrane Reviews, reported the prevalence of systematic reviews with discrepant outcome reporting between the full text and abstract (Hopewell 2010; Vlassov 2008).

Primary outcomes

None of the primary outcomes for this review were measured in these studies (confirmed via contact with the authors).

Secondary outcomes

One study reported the prevalence of systematic reviews that reported only a subset of primary outcomes from the full text of the review in the abstract (Hopewell 2010). Vlassov 2008 confirmed that he did not assess this outcome. Of 62 Cochrane Reviews, 20 (32%, 95% CI 21% to 45%) did not report all primary outcomes in the abstract, where reporting was defined as presenting an effect estimate or at least stating whether or not the effect estimate was statistically or clinically significant. Of the 20 systematic reviews that did not report all primary outcomes in the abstract, the reason for non‐reporting was that none of the included trials reported data for the outcome (nine reviews) or was unclear (11 reviews). Insufficient data were reported to determine whether there was an association between statistical significance and non‐reporting of primary outcomes in the abstract.

Both studies reported the prevalence of systematic reviews presenting the results of a secondary outcome before the results of any primary outcomes in the abstract. No overlapping systematic reviews were included in the two studies. The combined prevalence of systematic reviews reporting secondary outcomes before a primary outcome was 14% (95% CI 7% to 20%; Figure 6). Insufficient data were reported to determine whether there was an association between statistical significance and the order in which outcomes were reported.


Random‐effects meta‐analysis of proportion of systematic reviews presenting the results of a secondary outcome before the results of the primary outcome(s) in the abstract.

Selective partial reporting in systematic reviews

One study, including 182 systematic reviews (64 Cochrane Reviews and 118 non‐Cochrane reviews), assessed the completeness of reporting of the primary outcome in the abstracts of systematic reviews (Beller 2011).

Primary outcomes

None of the primary outcomes for this review were measured in this study (confirmed via contact with the author).

Secondary outcomes

The association between statistical significance and complete reporting of primary outcomes in the abstract was reported. The study authors defined complete reporting as meeting the following minimum criteria: "The direction and size of effect can be determined from the wording or deduced numerically, and a measure of precision of the effect size was stated (e.g. CI)". Statistically significant primary outcomes were more likely to be completely reported in the abstract than non‐significant primary outcomes (85/112 versus 20/70; RR 2.66, 95% CI 1.81 to 3.90).

Subgroup and sensitivity analyses

There were a few overlapping systematic reviews across the Dwan 2013a and Kirkham 2010b studies. In a sensitivity analysis, removal of Dwan 2013a (which included fewer systematic reviews) resulted in the same meta‐analytic estimate of the prevalence of systematic reviews with at least one discrepant outcome between the protocol and published systematic review (38%, 95% CI 18% to 58%; N = 439 systematic reviews). We were unable to undertake any of our other planned subgroup analyses because there was either an insufficient number of studies or insufficient data available.

Our post‐hoc sensitivity analysis, which investigated the impact of combining proportions using a different transformation (logit), variance‐stabilising functions (arcsine and double arcsine) and random‐effects logistic regression, found that the meta‐analytic estimate was robust to the method used. The meta‐analytic proportion and its CI were 0.38 (95% CI 0.24 to 0.54) using the logit transformation; 0.38 (95% CI 0.23 to 0.54) using the arcsine transformation; 0.38 (95% CI 0.23 to 0.54) using the double arcsine transformation; and 0.37 (95% CI 0.26 to 0.49) in the random‐effects logistic regression.

Discussion

Summary of main results

Based on a meta‐analysis of four studies, 38% (95% CI 23% to 54%) of systematic reviews added, omitted, upgraded or downgraded at least one outcome between the protocol and published systematic review. The association between statistical significance and discrepant outcome reporting was uncertain. Two studies found that reasons for discrepant outcome reporting were infrequently reported in published systematic reviews. One study found that 32% (95% CI 21% to 45%) of systematic reviews did not report all primary outcomes in the abstract. Another study found that statistically significant primary outcomes were more likely to be completely reported in the systematic review abstract than non‐significant primary outcomes (RR 2.66, 95% CI 1.81 to 3.90).

Overall completeness and applicability of evidence

Some of the types of selective inclusion or reporting which we specified in our protocol have not been empirically investigated. No study has investigated selective inclusion of RCT outcome data in systematic reviews, although one such study is currently underway by authors of this review (Page 2013b). This study will explore whether, in the presence of multiplicity of outcome data, trial effect estimates selected for inclusion in a sample of meta‐analyses are systematically more or less favourable to the intervention than what would be expected under a process consistent with random selection. No study has investigated discrepancies between systematic review registry entries and published systematic reviews, perhaps because a publicly available register for systematic review protocols was only launched in February 2011 (Booth 2013). Also, selective partial reporting of outcomes has only been evaluated in abstracts, not in the full text of systematic reviews.

No study has investigated another type of selective reporting that we did not define in our protocol but identified through other work (Page 2013a). This is the practice whereby systematic review authors conduct multiple meta‐analyses of the different data available in trial reports for a particular outcome and then select which meta‐analytic effect estimate to report. For example, meta‐analyses of pain could include final values only, change‐from‐baseline values only or a mixture of the two, and systematic review authors may choose which of these meta‐analytic effects to report based on the results. One way to investigate this type of selective reporting of analyses would be to compare the analysis plans reported in the systematic review protocol with the analyses reported in the published systematic review. Similar studies have been conducted at the RCT level (Al‐Marzouki 2008; Chan 2008; Vedula 2009; Vedula 2013). Such investigations are strengthened by retrieving the studies included in each systematic review to determine whether the systematic review authors could have analysed the RCT data in multiple ways.

None of the included studies evaluated systematic reviews published after 2009. Several initiatives introduced around that time may have reduced the prevalence of selective reporting in more recent reviews. For example, a 'Differences between protocol and review' heading was introduced as a standard heading in Cochrane Reviews in 2008; this heading may discourage authors from changing outcomes between the protocol and review unless they have a good reason to do so. In July 2009, the PRISMA Statement was published (updating the QUOROM Statement), which advises that changes to the outcomes made after the review has started should be described and justified. The PRISMA Statement also advises that, when available, summary data per group, effect estimates and CIs should be reported for all outcomes considered in the systematic review, regardless of their results (Liberati 2009; Moher 2009). The PRISMA for Abstracts Statement (published April 2013) advises authors to clearly indicate in the abstract the protocol‐defined, pre‐specified importance of each outcome (i.e. primary or secondary) and not to report only outcomes with statistically or clinically significant results (Beller 2013). Similar recommendations are provided in the MECIR reporting standards (published in December 2012), which summarise attributes of reporting considered either mandatory (compliance required for publication) or highly desirable (expected but may justifiably not be done) for Cochrane intervention reviews (Chandler 2012). Given that improvements have been seen in the quality of reporting of RCTs following journal endorsement of the CONSORT reporting guideline (Turner 2012), it might be expected that similar improvements will occur for systematic reviews.

The four studies investigating discrepant outcome reporting between systematic review protocols and published systematic reviews focused only on Cochrane Reviews. This limits the generalisability of the findings, as Cochrane Reviews comprise a minority of all published systematic reviews (Moher 2007). It is not surprising that discrepant outcome reporting has not been investigated in non‐Cochrane reviews, since only 10% of non‐Cochrane reviews explicitly report working from a review protocol, and only 4% from a publicly available protocol (Turner 2013). The potential for selective reporting is possibly higher in such reviews, for reasons similar to those outlined with regard to selective reporting in non‐randomised studies (NRS) (Norris 2013). First, systematic review authors not working from a protocol may approach the selection of outcomes iteratively, changing their position about the importance of particular outcomes as they identify eligible studies, analyse the data and draft the manuscript. In this scenario, the results of included studies, coupled with existing beliefs about the effectiveness of an intervention, may bias decisions about the final set of outcomes to report in the review (rather than a pre‐specified clinical rationale guiding such decisions). Second, review authors who pre‐specify outcomes in a protocol may feel less obliged to follow it if it is not publicly available (Stewart 2012). Protocols for systematic reviews conducted outside of organisations such as The Cochrane Collaboration are becoming increasingly available with the launch of the PROSPERO database of prospectively registered systematic reviews (Booth 2013), a new BioMed Central journal (Systematic Reviews) for systematic reviews and associated research (Moher 2012), and the upcoming PRISMA for Protocols (PRISMA‐P) reporting guideline (Shamseer 2013). These initiatives will allow future investigations of selective inclusion and reporting in non‐Cochrane reviews.

An unanticipated finding in our review was that the meta‐analytic risk of downgrading an outcome between protocol and published systematic review was increased when the meta‐analysis result for the downgraded outcome was statistically significant. This contrasts with the association previously found at the RCT level, where outcomes were more likely to be downgraded from trial registry entry to published RCT if the result was non‐significant (Mathieu 2009). While this observed association may be a result of chance, the impetus for upgrading and downgrading outcomes may also differ depending on the direction of effect, the type of outcome and the review authors' preconceptions about the effectiveness and safety of the interventions. For example, in systematic reviews comparing active interventions with placebo, motivation to find an effective, safe intervention may be greater; thus statistically significant efficacy outcomes that favour the active intervention may be more likely to be upgraded, while statistically significant safety outcomes that favour the placebo may be more likely to be downgraded. In systematic reviews comparing two active interventions, there may be motivation to both upgrade and downgrade outcomes if systematic review authors have preconceptions about the effectiveness and safety of the included interventions. Neither Kirkham 2010b nor Parmelli 2007 reported the percentage of systematic reviews by type of outcome or type of comparison, nor did they examine whether the association was modified by these two factors (confirmed via personal communication). Further investigation of these issues would be useful.

None of the included studies reported sufficient data to enable investigation of whether the prevalence and association between statistical significance and discrepant outcome reporting were modified by the use of a core outcome set, which is a standardised set of outcomes to measure in all clinical trials and include in all systematic reviews of a particular condition (Williamson 2012a; Williamson 2012b). The use of a core outcome set would be expected to reduce bias in systematic reviews. A 2012 survey of Co‐ordinating Editors of Cochrane Review Groups showed that 36% of the groups have a centralised policy regarding which outcomes to include in the 'Summary of Findings' table, which is a standardised table presenting results of up to seven of the most important outcomes of the review (Kirkham 2013).

Quality of the evidence

Methodological weaknesses exist in the studies investigating discrepant outcome reporting between the protocol and published systematic review (Dwan 2013a; Kirkham 2010b; Parmelli 2007; Silagy 2002). The risk of bias was rated as low for both domains in two studies (Dwan 2013a; Kirkham 2010b), while two studies had a low risk of selection bias but an unclear risk of selective reporting bias (Parmelli 2007; Silagy 2002). The 95% CI of the combined prevalence estimate was wide, although all estimates within the confidence limits were of concern. The studies could have benefited from examining whether the association between statistical significance and changing the status of an outcome (adding, upgrading or downgrading) was modified by the type of comparison, direction of effect and type of outcome (efficacy or safety). Some studies missed opportunities for analysis; for example, two studies did not investigate the factors associated with discrepant outcome reporting (Dwan 2013a; Silagy 2002), and two studies did not assess reasons for discrepant outcome reporting (Parmelli 2007; Silagy 2002). These additions would have helped to disentangle bias‐related reasons from non‐bias‐related reasons (e.g. omitting a protocol‐defined outcome from the published systematic review because no RCTs measured the outcome (Liberati 2009; Stewart 2012)).

Methodological weaknesses also exist in studies investigating selective reporting practices in systematic review abstracts (Beller 2011; Hopewell 2010; Vlassov 2008). The risk of bias was rated as low for both domains in one study (Beller 2011), while the other two studies had a low risk of selection bias but an unclear risk of selective reporting bias (Hopewell 2010; Vlassov 2008). Neither of the two studies investigating discrepant outcome reporting between the full text and abstract formally investigated factors associated with discrepant outcome reporting (e.g. statistical significance or direction of effect).

Potential biases in the review process

There are some limitations to our methods. While we believe that all relevant, published studies were identified, we have no way of verifying this because no validated search strategy for locating empirical studies of selective inclusion and reporting exists. As outlined under Differences between protocol and review, we modified the search strategies that we had specified in our protocol for Ovid MEDLINE, Ovid EMBASE, Ovid PsycINFO and ISI Web of Science because the original strategies yielded an unexpectedly large number of citations for systematic reviews of intervention studies. These citations were most likely retrieved because terms such as 'bias' or 'selective reporting' were mentioned in the abstracts of the systematic reviews. We therefore modified our original search strategies to be similar to that used in Dwan 2011, and these updated strategies were reviewed by an information specialist. We also supplemented our original search strategies with searches of three sources that consist solely of records of methodological work in research synthesis. In addition, we were able to obtain and use unpublished data from four studies (Beller 2011; Hopewell 2010; Parmelli 2007; Vlassov 2008), which is likely to have reduced the chance of publication bias affecting the results of our review.

Two of us (JK and KD) were authors of two of the included studies (Dwan 2013a; Kirkham 2010b). Neither author was involved in the data extraction or risk of bias assessment for these two studies. We could have conducted risk of bias assessments under blinded conditions (i.e. where the assessor is unaware of the study author's name, institution, sponsorship, journal, etc.) but evidence regarding whether this process yields systematically different assessments is inconsistent (Morissette 2011).

While we planned to synthesise results of empirical studies including only systematic reviews of RCTs, we found that most studies did not record whether the included systematic reviews included RCTs only. However, since most of the evidence in this review is based on Cochrane Reviews, and Cochrane Reviews infrequently included NRS prior to 2012 (CEU 2012; Moher 2007), we believe our results are unlikely to be affected in an important way by the inclusion of some systematic reviews of NRS.

As outlined under Differences between protocol and review, we re‐labelled one of the five types of selective inclusion and reporting that we planned to focus on (from 'partial reporting in systematic reviews' to 'selective partial reporting in systematic reviews') while working on the review. This is because, during study screening for this review, we identified 13 studies that evaluated systematic reviews using a reporting quality checklist such as the PRISMA Statement (Assendelft 1995; Auperin 1997; Aytug 2012; CEU 2011; CEU 2012; Faggion Jr 2014; Gianola 2013; Ma 2012; Minozzi 2006a; Minozzi 2006b; Moseley 2009; Roundtree 2009; Shea 2006). These 13 studies collected data on whether outcomes were partially reported (e.g. reporting an effect estimate without a measure of variation). However, the authors defined the reporting practices they evaluated as markers of "reporting quality", not "selective reporting", and none attempted to investigate whether the reasons for partial reporting were related to the results (i.e. selective partial reporting). Studies evaluating the "reporting quality" of systematic reviews fall outside the scope of our review.

Agreements and disagreements with other studies or reviews

A 2008 systematic review that synthesised results of methodology reviews investigating any type of bias that can occur throughout the systematic review process did not identify any reviews examining biases associated with selecting studies for inclusion, selecting outcomes, synthesising studies or reporting outcomes (Tricco 2008). To the best of our knowledge, no subsequent systematic reviews of studies investigating such practices have been published.

The most similar systematic review to ours is a methodology review of studies examining discrepant outcome reporting at the RCT level (Dwan 2011). This review identified six studies comparing outcomes between protocols and published reports of RCTs, and three studies comparing outcomes between trial registry entries and published reports of RCTs. The studies examined heterogeneous samples of RCTs and were not combined in a meta‐analysis. The authors reported that at least one primary outcome was added, omitted, upgraded or downgraded in 4% to 50% of RCT publications. The equivalent outcome in our review had a narrower range of prevalence estimates (i.e. 22% to 47%). In another systematic review, statistical significance was associated with complete reporting of outcomes in the full text of RCTs (Dwan 2013b). The magnitude of this association was similar to the association between statistical significance and complete reporting of the primary outcome in the systematic review abstract, as reported in Beller 2011.

Figures and Tables

Figure 1. Study flow diagram.

Figure 2. Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Figure 3. Random‐effects meta‐analysis of proportion of systematic reviews with any discrepancy in at least one outcome from protocol to published systematic review.

Figure 4. Forest plot of association between statistical significance and outcome adding/upgrading.

Figure 5. Forest plot of association between statistical significance and outcome downgrading.

Figure 6. Random‐effects meta‐analysis of proportion of systematic reviews presenting the results of a secondary outcome before the results of the primary outcome(s) in the abstract.

Analysis 1.1. Comparison 1: Association between outcome discrepancies and statistical significance, Outcome 1: Outcome "added/upgraded".

Analysis 1.2. Comparison 1: Association between outcome discrepancies and statistical significance, Outcome 2: Outcome "downgraded".

Comparison 1. Association between outcome discrepancies and statistical significance

Outcome or subgroup title | No. of studies | No. of units analysed | Statistical method | Effect size
1 Outcome "added/upgraded" | 2 | – | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only
1.1 Unit of analysis: outcomes | 2 | 552 | Risk Ratio (M‐H, Random, 95% CI) | 1.43 [0.71, 2.85]
1.2 Unit of analysis: systematic reviews | 1 | 139 | Risk Ratio (M‐H, Random, 95% CI) | 1.24 [0.57, 2.66]
2 Outcome "downgraded" | 2 | – | Risk Ratio (M‐H, Random, 95% CI) | Subtotals only
2.1 Unit of analysis: outcomes | 2 | 484 | Risk Ratio (M‐H, Random, 95% CI) | 1.26 [0.60, 2.62]
2.2 Unit of analysis: systematic reviews | 1 | 126 | Risk Ratio (M‐H, Random, 95% CI) | 0.79 [0.21, 3.00]
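For illustration, the following minimal Python sketch shows how risk ratios of this kind can be pooled under a generic inverse‐variance DerSimonian‐Laird random‐effects model. The 2x2 counts are hypothetical, and this is not RevMan's exact Mantel‐Haenszel random‐effects implementation nor data from the included studies.

```python
import numpy as np

# Hypothetical 2x2 data for two studies: (events, total) in the "statistically
# significant" and "non-significant" groups. Illustration only.
sig = np.array([[40, 300], [25, 252]], dtype=float)
nonsig = np.array([[30, 300], [20, 252]], dtype=float)

# Per-study log risk ratios and their approximate variances.
log_rr = np.log((sig[:, 0] / sig[:, 1]) / (nonsig[:, 0] / nonsig[:, 1]))
var = 1/sig[:, 0] - 1/sig[:, 1] + 1/nonsig[:, 0] - 1/nonsig[:, 1]

# DerSimonian-Laird random-effects pooling (generic inverse-variance form).
w = 1 / var
fixed = np.sum(w * log_rr) / np.sum(w)
q = np.sum(w * (log_rr - fixed) ** 2)
tau2 = max(0.0, (q - (len(log_rr) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_star = 1 / (var + tau2)
pooled = np.sum(w_star * log_rr) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))
print(f"Pooled RR {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96*se):.2f} to {np.exp(pooled + 1.96*se):.2f})")
```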