Published in: BMC Medical Research Methodology

Open Access 01-12-2017 | Research article

Inter-rater reliability of AMSTAR is dependent on the pair of reviewers

Authors: Dawid Pieper, Anja Jacobs, Beate Weikert, Alba Fishta, Uta Wegewitz

Published in: BMC Medical Research Methodology | Issue 1/2017

Abstract

Background

Inter-rater reliability (IRR) is mostly assessed with only two reviewers of unknown expertise. This paper examines how the IRR of the Assessment of Multiple Systematic Reviews (AMSTAR) and R(evised)-AMSTAR differs depending on the pair of reviewers.

Methods

Five reviewers independently applied AMSTAR and R-AMSTAR to 16 systematic reviews (eight Cochrane reviews and eight non-Cochrane reviews) from the field of occupational health. Responses were dichotomized and reliability measures were calculated by applying Holsti’s method (r) and Cohen’s kappa (κ) to all potential pairs of reviewers. Given that five reviewers participated in the study, there were ten possible pairs of reviewers.
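The pairwise calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis: the reviewer labels A to E and their ratings are invented, Holsti's r is taken in its simplest form (the share of items on which two reviewers agree, which is what 2M/(N1 + N2) reduces to when both reviewers rate the same items), and the function names are hypothetical.

```python
from itertools import combinations


def holsti(a, b):
    """Holsti's r: share of items on which two raters gave the same rating."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must rate the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = holsti(a, b)
    categories = set(a) | set(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical dichotomized ratings (1 = criterion met, 0 = otherwise).
ratings = {
    "A": [1, 1, 0, 1, 0, 1, 1, 0],
    "B": [1, 1, 0, 1, 0, 1, 0, 0],
    "C": [1, 0, 0, 1, 1, 1, 1, 0],
    "D": [1, 1, 1, 1, 0, 0, 1, 0],
    "E": [0, 1, 0, 1, 0, 1, 1, 1],
}

# Five reviewers give C(5, 2) = 10 possible pairs.
pairs = list(combinations(sorted(ratings), 2))
results = {
    (p, q): (holsti(ratings[p], ratings[q]), cohen_kappa(ratings[p], ratings[q]))
    for p, q in pairs
}

for (p, q), (r, kappa) in results.items():
    print(f"{p}-{q}: r = {r:.2f}, kappa = {kappa:.2f}")
```

Because kappa subtracts the agreement expected by chance, it is always at or below Holsti's r for the same pair, which is consistent with the gap between the r and κ ranges reported in the Results.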

Results

For AMSTAR, inter-rater reliability ranged from r = 0.82 to r = 0.98 (median r = 0.88) using Holsti’s method and from κ = 0.41 to κ = 0.69 (median κ = 0.52) using Cohen’s kappa, depending on the pair of reviewers. For R-AMSTAR, it ranged from r = 0.77 to r = 0.89 (median r = 0.82) and from κ = 0.32 to κ = 0.67 (median κ = 0.45). The same pair of reviewers yielded the highest IRR for both instruments. The pairwise Cohen’s kappa values for AMSTAR and R-AMSTAR were moderately correlated (Spearman’s ρ = 0.50). For AMSTAR, mean inter-rater reliability was highest for item 1 (κ = 1.00) and item 5 (κ = 0.78) and lowest for items 3, 8, 9 and 11, which showed only fair agreement.
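Verbal labels such as "fair" are commonly assigned to kappa values using the bands proposed by Landis and Koch. The sketch below assumes that convention (the abstract itself does not state which scale was used) and applies it to the kappa values reported above. The function name and dictionary keys are hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values taken from the Results paragraph.
reported = {
    "AMSTAR min": 0.41,
    "AMSTAR median": 0.52,
    "AMSTAR max": 0.69,
    "R-AMSTAR min": 0.32,
    "R-AMSTAR median": 0.45,
    "R-AMSTAR max": 0.67,
    "AMSTAR item 1": 1.00,
    "AMSTAR item 5": 0.78,
}

labels = {name: landis_koch_label(k) for name, k in reported.items()}
for name, label in labels.items():
    print(f"{name}: kappa = {reported[name]:.2f} ({label})")
```

Under this assumed scale, the median pair falls in the moderate band for both instruments, while the best pair reaches substantial agreement and the weakest R-AMSTAR pair only fair agreement.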

Conclusions

Inter-rater reliability varies widely depending on the pair of reviewers. There may be some shortcomings associated with conducting reliability studies with only two reviewers. Further studies should include additional reviewers and should probably also take account of their level of expertise.

Title: Inter-rater reliability of AMSTAR is dependent on the pair of reviewers
Authors: Dawid Pieper, Anja Jacobs, Beate Weikert, Alba Fishta, Uta Wegewitz
Publication date: 01-12-2017
Publisher: BioMed Central
Published in: BMC Medical Research Methodology / Issue 1/2017
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/s12874-017-0380-y
