Published in: BMC Medical Research Methodology 1/2006

Open Access 01-12-2006 | Research article

Reproducibility of the STARD checklist: an instrument to assess the quality of reporting of diagnostic accuracy studies

Authors: Nynke Smidt, Anne WS Rutjes, Daniëlle AWM van der Windt, Raymond WJG Ostelo, Patrick M Bossuyt, Johannes B Reitsma, Lex M Bouter, Henrica CW de Vet


Abstract

Background

In January 2003, STAndards for the Reporting of Diagnostic accuracy studies (STARD) were published in a number of journals, to improve the quality of reporting in diagnostic accuracy studies. We designed a study to investigate the inter-assessment reproducibility, and intra- and inter-observer reproducibility of the items in the STARD statement.

Methods

Thirty-two diagnostic accuracy studies published in 2000 in medical journals with an impact factor of at least 4 were included. Two reviewers independently evaluated the quality of reporting of these studies using the 25 items of the STARD statement. A consensus evaluation was obtained by discussing and resolving disagreements between reviewers. Almost two years later, the same studies were evaluated again by the same reviewers. For each item, the percentage agreement and Cohen's kappa between the first and second consensus assessments (inter-assessment) were calculated. Intraclass correlation coefficients (ICC) were calculated to evaluate the reliability of the checklist.
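
As an illustration of the two per-item agreement statistics named above, the following minimal sketch computes the percentage agreement and Cohen's kappa for a single checklist item scored on two occasions. The ratings are hypothetical (1 = item reported, 0 = not reported) and are not data from the study; the function names are likewise illustrative and not taken from the authors' analysis.

```python
from collections import Counter


def percent_agreement(first, second):
    """Fraction of studies that received the same rating on both occasions."""
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


def cohens_kappa(first, second):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(first)
    p_observed = percent_agreement(first, second)
    counts_first = Counter(first)
    counts_second = Counter(second)
    categories = set(counts_first) | set(counts_second)
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_expected = sum(counts_first[c] * counts_second[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both assessments used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of one STARD item across eight studies (not study data).
first_assessment = [1, 1, 1, 0, 0, 1, 0, 1]
second_assessment = [1, 1, 0, 0, 0, 1, 1, 1]

agreement = percent_agreement(first_assessment, second_assessment)
kappa = cohens_kappa(first_assessment, second_assessment)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this made-up example the two assessments agree on 6 of 8 studies (75%), yet kappa is only about 0.47, because roughly half of that agreement would be expected by chance given how often the item is scored as reported. This is one reason an item can show high percentage agreement alongside a low kappa.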

Results

The overall inter-assessment agreement for all items of the STARD statement was 85% (Cohen's kappa 0.70) and varied from 63% to 100% for individual items. The largest differences between the two assessments were found for the reporting of the rationale of the reference standard (kappa 0.37), the number of included participants that underwent tests (kappa 0.28), the distribution of the severity of the disease (kappa 0.23), a cross tabulation of the results of the index test by the results of the reference standard (kappa 0.33) and how indeterminate results, missing data and outliers were handled (kappa 0.25). Large differences were also observed for these items within and between reviewers. The inter-assessment reliability of the STARD checklist was satisfactory (ICC = 0.79 [95% CI: 0.62 to 0.89]).

Conclusion

Although the overall reproducibility of the assessment of the quality of reporting of diagnostic accuracy studies using the STARD statement was found to be good, substantial disagreements were found for specific items. These disagreements were caused not so much by differences in the reviewers' interpretation of the items as by difficulties in assessing the reporting of these items due to a lack of clarity within the articles. Including a flow diagram in all reports on diagnostic accuracy studies would be very helpful in reducing confusion among readers and reviewers.
Metadata
Title
Reproducibility of the STARD checklist: an instrument to assess the quality of reporting of diagnostic accuracy studies
Authors
Nynke Smidt
Anne WS Rutjes
Daniëlle AWM van der Windt
Raymond WJG Ostelo
Patrick M Bossuyt
Johannes B Reitsma
Lex M Bouter
Henrica CW de Vet
Publication date
01-12-2006
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2006
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/1471-2288-6-12
