Skip to main content
Top
Published in: BMC Medical Research Methodology 1/2017

Open Access 01-12-2017 | Research article

Inter-rater reliability of AMSTAR is dependent on the pair of reviewers

Authors: Dawid Pieper, Anja Jacobs, Beate Weikert, Alba Fishta, Uta Wegewitz

Published in: BMC Medical Research Methodology | Issue 1/2017

Login to get access

Abstract

Background

Inter-rater reliability (IRR) is mainly assessed based on only two reviewers of unknown expertise. The aim of this paper is to examine differences in the IRR of the Assessment of Multiple Systematic Reviews (AMSTAR) and R(evised)-AMSTAR depending on the pair of reviewers.

Methods

Five reviewers independently applied AMSTAR and R-AMSTAR to 16 systematic reviews (eight Cochrane reviews and eight non-Cochrane reviews) from the field of occupational health. Responses were dichotomized and reliability measures were calculated by applying Holsti’s method (r) and Cohen’s kappa (κ) to all potential pairs of reviewers. Given that five reviewers participated in the study, there were ten possible pairs of reviewers.

Results

Inter-rater reliability varied for AMSTAR between r = 0.82 and r = 0.98 (median r = 0.88) using Holsti’s method and κ = 0.41 and κ = 0.69 (median κ = 0.52) using Cohen’s kappa and for R-AMSTAR between r = 0.77 and r = 0.89 (median r = 0.82) and κ = 0.32 and κ = 0.67 (median κ = 0.45) depending on the pair of reviewers. The same pair of reviewers yielded the highest IRR for both instruments. Pairwise Cohen’s kappa reliability measures showed a moderate correlation between AMSTAR and R-AMSTAR (Spearman’s ρ =0.50). The mean inter-rater reliability for AMSTAR was highest for item 1 (κ = 1.00) and item 5 (κ = 0.78), while lowest values were found for items 3, 8, 9 and 11, which showed only fair agreement.

Conclusions

Inter-rater reliability varies widely depending on the pair of reviewers. There may be some shortcomings associated with conducting reliability studies with only two reviewers. Further studies should include additional reviewers and should probably also take account of their level of expertise.
Appendix
Available only for authorised users
Literature
1.
go back to reference Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res Int J Qual Life Asp Treat Care Rehab. 2010;19(4):539–49.CrossRef Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res Int J Qual Life Asp Treat Care Rehab. 2010;19(4):539–49.CrossRef
2.
go back to reference Higgins JP, Altman DG, Gotzsche PC, Juni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.CrossRefPubMedPubMedCentral Higgins JP, Altman DG, Gotzsche PC, Juni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.CrossRefPubMedPubMedCentral
3.
go back to reference Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.CrossRefPubMed Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.CrossRefPubMed
7.
go back to reference Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44(11):1271–8.CrossRefPubMed Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol. 1991;44(11):1271–8.CrossRefPubMed
8.
go back to reference Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987;316(8):450–5.CrossRefPubMed Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987;316(8):450–5.CrossRefPubMed
9.
go back to reference Kung J, Chiappelli F, Cajulis OO, Avezova R, Kossan G, Chew L, et al. From Systematic Reviews to Clinical Recommendations for Evidence-Based Health Care: Validation of Revised Assessment of Multiple Systematic Reviews (R-AMSTAR) for Grading of Clinical Relevance. Open Dent J. 2010;4:84–91.PubMedPubMedCentral Kung J, Chiappelli F, Cajulis OO, Avezova R, Kossan G, Chew L, et al. From Systematic Reviews to Clinical Recommendations for Evidence-Based Health Care: Validation of Revised Assessment of Multiple Systematic Reviews (R-AMSTAR) for Grading of Clinical Relevance. Open Dent J. 2010;4:84–91.PubMedPubMedCentral
10.
go back to reference Pieper D, Buechter RB, Li L, Prediger B, Eikermann M. Systematic review found AMSTAR, but not R(evised)-AMSTAR, to have good measurement properties. J Clin Epidemiol. 2015;68(5):574–83.CrossRefPubMed Pieper D, Buechter RB, Li L, Prediger B, Eikermann M. Systematic review found AMSTAR, but not R(evised)-AMSTAR, to have good measurement properties. J Clin Epidemiol. 2015;68(5):574–83.CrossRefPubMed
11.
go back to reference Jorgensen L, Paludan-Muller AS, Laursen DR, Savovic J, Boutron I, Sterne JA, et al. Evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials: overview of published comments and analysis of user practice in Cochrane and non-Cochrane reviews. Syst Rev. 2016;5(1):80.CrossRefPubMedPubMedCentral Jorgensen L, Paludan-Muller AS, Laursen DR, Savovic J, Boutron I, Sterne JA, et al. Evaluation of the Cochrane tool for assessing risk of bias in randomized clinical trials: overview of published comments and analysis of user practice in Cochrane and non-Cochrane reviews. Syst Rev. 2016;5(1):80.CrossRefPubMedPubMedCentral
12.
go back to reference Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10.CrossRefPubMedPubMedCentral Shea BJ, Grimshaw JM, Wells GA, Boers M, Andersson N, Hamel C, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10.CrossRefPubMedPubMedCentral
13.
go back to reference Popovich I, Windsor B, Jordan V, Showell M, Shea B, Farquhar CM. Methodological quality of systematic reviews in subfertility: a comparison of two different approaches. PLoS One. 2012;7(12):e50403.CrossRefPubMedPubMedCentral Popovich I, Windsor B, Jordan V, Showell M, Shea B, Farquhar CM. Methodological quality of systematic reviews in subfertility: a comparison of two different approaches. PLoS One. 2012;7(12):e50403.CrossRefPubMedPubMedCentral
14.
go back to reference Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.CrossRefPubMedPubMedCentral Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.CrossRefPubMedPubMedCentral
16.
go back to reference Holsti OR. Content analysis for the social sciences and humanities. 1969. Holsti OR. Content analysis for the social sciences and humanities. 1969.
17.
go back to reference Cohen J. A Coefficient of Agreement for Nominal Scales. Educ Psychol Meas. 1960;20(1):37–46.CrossRef Cohen J. A Coefficient of Agreement for Nominal Scales. Educ Psychol Meas. 1960;20(1):37–46.CrossRef
18.
go back to reference Lombard M, Snyder-Duch J, Bracken CC. Content analysis in mass communication: Assessment and reporting of intercoder reliability. Hum Commun Res. 2002;28(4):587–604.CrossRef Lombard M, Snyder-Duch J, Bracken CC. Content analysis in mass communication: Assessment and reporting of intercoder reliability. Hum Commun Res. 2002;28(4):587–604.CrossRef
19.
go back to reference Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.CrossRefPubMed Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.CrossRefPubMed
21.
go back to reference Burda BU, Holmer HK, Norris SL. Limitations of A Measurement Tool to Assess Systematic Reviews (AMSTAR) and suggestions for improvement. Syst Rev. 2016;5(1):58.CrossRefPubMedPubMedCentral Burda BU, Holmer HK, Norris SL. Limitations of A Measurement Tool to Assess Systematic Reviews (AMSTAR) and suggestions for improvement. Syst Rev. 2016;5(1):58.CrossRefPubMedPubMedCentral
22.
go back to reference Faggion CM. Critical appraisal of AMSTAR: challenges, limitations, and potential solutions from the perspective of an assessor. BMC Med Res Methodol. 2015;15(1):1–5.CrossRef Faggion CM. Critical appraisal of AMSTAR: challenges, limitations, and potential solutions from the perspective of an assessor. BMC Med Res Methodol. 2015;15(1):1–5.CrossRef
23.
go back to reference Wegewitz U, Weikert B, Fishta A, Jacobs A, Pieper D. Resuming the discussion of AMSTAR: What can (should) be made better? BMC Med Res Methodol. 2016;16(1):111.CrossRefPubMedPubMedCentral Wegewitz U, Weikert B, Fishta A, Jacobs A, Pieper D. Resuming the discussion of AMSTAR: What can (should) be made better? BMC Med Res Methodol. 2016;16(1):111.CrossRefPubMedPubMedCentral
24.
go back to reference Whiting P, Savovic J, Higgins JP, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.CrossRefPubMedPubMedCentral Whiting P, Savovic J, Higgins JP, Caldwell DM, Reeves BC, Shea B, et al. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol. 2016;69:225–34.CrossRefPubMedPubMedCentral
25.
go back to reference Santaguida PL, Riley CM, Matchar DB. Assessing Risk of Bias as a Domain of Quality in Medical Test Studies. 2012. Santaguida PL, Riley CM, Matchar DB. Assessing Risk of Bias as a Domain of Quality in Medical Test Studies. 2012.
26.
go back to reference Hicks GE, Fritz JM, Delitto A, Mishock J. Interrater reliability of clinical examination measures for identification of lumbar segmental instability. Arch Phys Med Rehabil. 2003;84(12):1858–64.CrossRefPubMed Hicks GE, Fritz JM, Delitto A, Mishock J. Interrater reliability of clinical examination measures for identification of lumbar segmental instability. Arch Phys Med Rehabil. 2003;84(12):1858–64.CrossRefPubMed
27.
go back to reference Johnson CJ, Kittner SJ, McCarter RJ, Sloan MA, Stern BJ, Buchholz D, et al. Interrater reliability of an etiologic classification of ischemic stroke. Stroke. 1995;26(1):46–51.CrossRefPubMed Johnson CJ, Kittner SJ, McCarter RJ, Sloan MA, Stern BJ, Buchholz D, et al. Interrater reliability of an etiologic classification of ischemic stroke. Stroke. 1995;26(1):46–51.CrossRefPubMed
28.
go back to reference Hartling L, Hamm MP, Milne A, Vandermeer B, Santaguida PL, Ansari M, et al. Testing the risk of bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. J Clin Epidemiol. 2013;66(9):973–81.CrossRefPubMed Hartling L, Hamm MP, Milne A, Vandermeer B, Santaguida PL, Ansari M, et al. Testing the risk of bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. J Clin Epidemiol. 2013;66(9):973–81.CrossRefPubMed
29.
go back to reference Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, et al. AGREE II: advancing guideline development, reporting and evaluation in health care. J Clin Epidemiol. 2010;63(12):1308–11.CrossRefPubMed Brouwers MC, Kho ME, Browman GP, Burgers JS, Cluzeau F, Feder G, et al. AGREE II: advancing guideline development, reporting and evaluation in health care. J Clin Epidemiol. 2010;63(12):1308–11.CrossRefPubMed
30.
go back to reference Armijo-Olivo S, Ospina M, da Costa BR, Egger M, Saltaji H, Fuentes J, et al. Poor reliability between Cochrane reviewers and blinded external reviewers when applying the Cochrane risk of bias tool in physical therapy trials. PLoS One. 2014;9(5):e96920.CrossRefPubMedPubMedCentral Armijo-Olivo S, Ospina M, da Costa BR, Egger M, Saltaji H, Fuentes J, et al. Poor reliability between Cochrane reviewers and blinded external reviewers when applying the Cochrane risk of bias tool in physical therapy trials. PLoS One. 2014;9(5):e96920.CrossRefPubMedPubMedCentral
31.
go back to reference Hartling L, Bond K, Vandermeer B, Seida J, Dryden DM, Rowe BH. Applying the risk of bias tool in a systematic review of combination long-acting beta-agonists and inhaled corticosteroids for persistent asthma. PLoS One. 2011;6(2):e17242.CrossRefPubMedPubMedCentral Hartling L, Bond K, Vandermeer B, Seida J, Dryden DM, Rowe BH. Applying the risk of bias tool in a systematic review of combination long-acting beta-agonists and inhaled corticosteroids for persistent asthma. PLoS One. 2011;6(2):e17242.CrossRefPubMedPubMedCentral
32.
go back to reference Jamilian A, Cannavale R, Piancino MG, Eslami S, Perillo L. Methodological quality and outcome of systematic reviews reporting on orthopaedic treatment for class III malocclusion: Overview of systematic reviews. J Orthod. 2016:1–19. Jamilian A, Cannavale R, Piancino MG, Eslami S, Perillo L. Methodological quality and outcome of systematic reviews reporting on orthopaedic treatment for class III malocclusion: Overview of systematic reviews. J Orthod. 2016:1–19.
33.
go back to reference Laver K, Dyer S, Whitehead C, Clemson L, Crotty M. Interventions to delay functional decline in people with dementia: a systematic review of systematic reviews. BMJ Open. 2016;6(4):e010767.CrossRefPubMedPubMedCentral Laver K, Dyer S, Whitehead C, Clemson L, Crotty M. Interventions to delay functional decline in people with dementia: a systematic review of systematic reviews. BMJ Open. 2016;6(4):e010767.CrossRefPubMedPubMedCentral
34.
go back to reference Zhang Q, Liu F, Xiao Z, Li Z, Wang B, Dong J, et al. Internal Versus External Fixation for the Treatment of Distal Radial Fractures: A Systematic Review of Overlapping Meta-Analyses. Medicine. 2016;95(9):e2945.CrossRefPubMedPubMedCentral Zhang Q, Liu F, Xiao Z, Li Z, Wang B, Dong J, et al. Internal Versus External Fixation for the Treatment of Distal Radial Fractures: A Systematic Review of Overlapping Meta-Analyses. Medicine. 2016;95(9):e2945.CrossRefPubMedPubMedCentral
35.
go back to reference Gwet KL. Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters: Advanced Analytics, LLC; 2014. Gwet KL. Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters: Advanced Analytics, LLC; 2014.
Metadata
Title
Inter-rater reliability of AMSTAR is dependent on the pair of reviewers
Authors
Dawid Pieper
Anja Jacobs
Beate Weikert
Alba Fishta
Uta Wegewitz
Publication date
01-12-2017
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2017
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-017-0380-y

Other articles of this Issue 1/2017

BMC Medical Research Methodology 1/2017 Go to the issue