Published online Aug 23, 2016.
https://doi.org/10.3348/kjr.2016.17.5.706
Does the Reporting Quality of Diagnostic Test Accuracy Studies, as Defined by STARD 2015, Affect Citation?
Abstract
Objective
To determine the rate with which diagnostic test accuracy studies that are published in a general radiology journal adhere to the Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015, and to explore the relationship between adherence rate and citation rate while avoiding confounding by journal factors.
Materials and Methods
All eligible diagnostic test accuracy studies that were published in the Korean Journal of Radiology in 2011–2015 were identified. Five reviewers assessed each article for yes/no compliance with 27 of the 30 STARD 2015 checklist items (items 28, 29, and 30 were excluded). The total STARD score (number of fulfilled STARD items) was calculated. The score of the 15 STARD items that related directly to the Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 was also calculated. The number of times each article was cited (as indicated by the Web of Science) after publication until March 2016 and the article exposure time (time in months between publication and March 2016) were extracted.
Results
Sixty-three articles were analyzed. The mean (range) total and QUADAS-2-related STARD scores were 20.0 (14.5–25) and 11.4 (7–15), respectively. The mean (range) citation number was 4 (0–21). Citation number was not significantly associated with either STARD score after accounting for exposure time (total score: partial correlation coefficient = 0.154, p = 0.232; QUADAS-2-related score: partial correlation coefficient = 0.143, p = 0.266).
Conclusion
The degree of adherence to STARD 2015 was moderate for this journal, indicating that there is room for improvement. When adjusted for exposure time, the degree of adherence did not affect the citation rate.
INTRODUCTION
The quality of scientific research articles that are published in a journal consists of two elements, namely, the quality of the report and the quality of the science. These quality variables do not necessarily concur, although they do overlap. In other words, a well-reported study does not necessarily mean that the study is of good scientific quality while a poorly reported study does not necessarily mean that the scientific quality of the study is also poor. Nevertheless, the quality of the report is very important because poor reporting quality hinders the ability of the readership to understand the authenticity, integrity, quality, and clinical impact of a research study. These deficiencies also hamper the effective generation of systematic reviews, which subsequently impacts the development of clinical guidelines and ultimately influences patient care (1). The great need for quality reporting in clinical research explains why there are longstanding and ongoing international efforts to create and promote standardized reporting guidelines for research studies, such as those reported by the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network (1). The subject of the present study is the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guideline, which was published in 2003 (2) and was recently updated to its second version, namely, STARD 2015 (3, 4).
Peer-reviewed journals typically have limited space, and therefore publish far fewer articles than they receive. This continues to be true even in the present era of web-based electronic journals and open access journals (5). This space limitation means that journals try to select the highest quality articles for publication using editorial and peer reviews. However, it is well known that this review process can be deficient and subjective, and these problems can result in the occasional publication of articles that are grossly deficient (6, 7, 8). One way to improve the objectivity and quality of the review process of journals may be to examine candidate articles for adherence to the appropriate standardized reporting guidelines. However, although this presumption may sound logical, there is as yet little robust evidence that supports it. Two studies have assessed whether the degree of adherence to STARD guidelines can be used as a metric of article quality. Specifically, they asked whether such adherence was associated with the journal impact factor (9) or citation rate (10). However, the results were not conclusive; this partly reflected confounding due to impact factor differences between the journals that were included (10).
These observations led us to determine the rate with which original research studies of diagnostic test accuracy that were published in a single journal (the Korean Journal of Radiology [KJR]) complied with STARD 2015 guidelines (3, 4). The fact that the articles were all from a single journal meant that our analysis of the relationship between adherence to STARD 2015 and citation rate was not confounded by journal factors.
MATERIALS AND METHODS
Literature Search and Study Selection
First, all original research papers that were published in KJR between January 2011 and December 2015 and whose abstract or title contained at least one of the following key terms were selected: "sensitivity", "specificity", "accuracy", "performance", "receiver operating", and "ROC". The full text of these articles was double-checked by one reviewer and one of four other reviewers to identify the articles that reported the diagnostic test accuracy of one or more tests relative to the reference standard in humans. All reviewers were experienced in diagnostic test accuracy studies and STARD as well as in radiological research in general.
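The key-term screen described above can be sketched as follows. This is an illustration only: the text lists the key terms but not the matching rules, so the case-insensitive, whole-word matching and the two example records are assumptions, and the records are hypothetical rather than articles from the study.

```python
import re

# Key terms quoted in the text; the matching rules below are assumptions.
KEY_TERMS = ["sensitivity", "specificity", "accuracy", "performance",
             "receiver operating", "ROC"]

# Whole-word, case-insensitive match so that "ROC" does not hit "process".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in KEY_TERMS) + r")\b",
    re.IGNORECASE,
)

def matches_key_terms(title, abstract):
    """Return True if the title or abstract contains at least one key term."""
    return bool(_PATTERN.search(title) or _PATTERN.search(abstract))

# Hypothetical records for illustration only.
records = [
    {"title": "Diagnostic accuracy of CT for appendicitis", "abstract": "..."},
    {"title": "A case of a rare tumor", "abstract": "We describe the imaging process."},
]
candidates = [r for r in records if matches_key_terms(r["title"], r["abstract"])]
```

Whole-word matching trades recall for precision: plural forms such as "sensitivities" would be missed, which is one reason a full-text check by reviewers, as described above, remains necessary.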
Data Extraction
The five reviewers evaluated the eligible diagnostic test accuracy papers. Given that STARD 2015 had been released only shortly before this study started (3, 4), a seminar attended by the five reviewers and an additional expert on literature review and bibliographic research was convened. The aim of the seminar was to review and discuss the items listed in STARD 2015, thereby ensuring that all of the reviewers had a clear understanding of STARD 2015.
Each of the five reviewers was randomly assigned a fifth of all eligible articles. They evaluated the articles independently according to the STARD 2015 checklist and extracted the relevant information, as explained below. Thereafter, any doubtful results were discussed at meetings attended by the five reviewers and the additional expert on literature review and bibliographic search until complete agreement was achieved among all six people. Fulfilment of each STARD checklist item was recorded in a dichotomous manner, namely, yes (properly reported) or no (not reported). Of the 30 items in the STARD 2015 checklist (3, 4), we excluded items 28 (registration number and name of registry), 29 (where the full study protocol can be accessed), and 30 (sources of funding and other support; role of funders) because items 28 and 29 had not been requested by the journal during the study period and item 30 only applies to funded research studies, which account for a minority of articles published in the journal. After completing this evaluation of the articles, the five reviewers also determined the total number of times each article had been cited by March 1, 2016, as indicated by the citation index reported in the Web of Science (Thomson Reuters, New York, NY, USA).
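A random allocation of articles to reviewers could be sketched as below. The text does not state how the randomization was performed, so the shuffle-and-deal scheme, the function name, the seed, and the article identifiers are all assumptions made for illustration.

```python
import random

def assign_to_reviewers(article_ids, n_reviewers=5, seed=0):
    """Shuffle the articles, then deal them out round-robin to the reviewers."""
    rng = random.Random(seed)          # fixed seed only so the sketch is repeatable
    shuffled = list(article_ids)
    rng.shuffle(shuffled)
    return {r: shuffled[r::n_reviewers] for r in range(n_reviewers)}

# 63 hypothetical article identifiers (1..63), matching the number of eligible articles.
assignment = assign_to_reviewers(range(1, 64))
sizes = sorted(len(articles) for articles in assignment.values())
```

Because 63 is not divisible by 5, any such scheme gives three reviewers 13 articles and two reviewers 12, so "a fifth" can only hold approximately.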
Statistical Analysis
The proportion (%) of articles that fulfilled each reporting item was determined. The total STARD score was then calculated for each article by adding the number of reported items. Thus, the maximum STARD score was 27 points (i.e., 30 minus 3 items). Assuming that each item is of equal importance, higher scores may roughly indicate better reporting quality. Regarding items 10, 12, 13, and 21, each of which consists of two sub-items (i.e., 10a, 10b, 12a, 12b, 13a, 13b, 21a, and 21b), each sub-item was given 0.5 points when it was fulfilled. This follows the practice established in previous similar studies (9, 11). The mean and standard deviation (SD) of the total STARD scores were determined.
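The scoring rule above (one point per item, 0.5 points per sub-item of items 10, 12, 13, and 21, and items 28–30 excluded) can be written as a short sketch. The function names and the example article are hypothetical; only the point values come from the text.

```python
# Items split into "a"/"b" sub-items worth 0.5 points each, and excluded items.
SPLIT_ITEMS = {10, 12, 13, 21}
EXCLUDED_ITEMS = {28, 29, 30}

def scoring_weights():
    """Map each checklist label (e.g. "7", "10a") to its point value."""
    weights = {}
    for item in range(1, 31):
        if item in EXCLUDED_ITEMS:
            continue
        if item in SPLIT_ITEMS:
            weights[f"{item}a"] = 0.5
            weights[f"{item}b"] = 0.5
        else:
            weights[str(item)] = 1.0
    return weights

def total_stard_score(fulfilled_labels):
    """Sum the points of the labels that were rated "yes"."""
    weights = scoring_weights()
    return sum(weights[label] for label in fulfilled_labels)

# Hypothetical article: everything reported except items 13b, 18, 19, and 25.
missing = {"13b", "18", "19", "25"}
example_score = total_stard_score(set(scoring_weights()) - missing)
```

With this weighting the maximum is 27 points, and scores ending in .5 (such as the reported minimum of 14.5) arise whenever an odd number of sub-items is fulfilled.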
Of the 27 STARD items, 15 relate directly to the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 (12) tool for systematic reviews of diagnostic accuracy studies. QUADAS-2 consists of four key domains, namely, patient selection, the index test, the reference standard, and flow and timing. The 15 QUADAS-2-related STARD items are directly used to assess the risk of bias and study applicability and consist of STARD items 3, 6–13, 15, 16, and 19–22. Items 3, 6–9, 20, and 21 concern the patient selection domain of QUADAS-2, items 10a, 12a, and 13a concern the index test domain, items 10b, 11, 12b, and 13b concern the reference standard domain, items 19 and 22 concern the flow and timing domain, and items 15 and 16 concern both the index test and reference standard domains. The QUADAS-2-related yes/no score was calculated for each article, after which the mean (SD) QUADAS-2-related score of all articles was determined.
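The domain mapping above can be expressed as a lookup table. The sketch assumes that the half-point rule for sub-items also applies to the QUADAS-2-related score (which is consistent with the stated maximum of 15) and that both sub-items of item 21 fall under patient selection; the function name and return format are hypothetical.

```python
# STARD 2015 labels grouped by QUADAS-2 domain, as listed in the text.
QUADAS2_DOMAINS = {
    "patient selection": ["3", "6", "7", "8", "9", "20", "21a", "21b"],
    "index test": ["10a", "12a", "13a"],
    "reference standard": ["10b", "11", "12b", "13b"],
    "flow and timing": ["19", "22"],
    "index test and reference standard": ["15", "16"],
}

def label_points(label):
    """Sub-items ("...a"/"...b") count 0.5 points; whole items count 1 point."""
    return 0.5 if label[-1] in "ab" else 1.0

def quadas2_related_score(fulfilled_labels):
    """Return (total score, per-domain scores) for the fulfilled labels."""
    fulfilled = set(fulfilled_labels)
    per_domain = {
        domain: sum(label_points(label) for label in labels if label in fulfilled)
        for domain, labels in QUADAS2_DOMAINS.items()
    }
    return sum(per_domain.values()), per_domain

ALL_LABELS = [label for labels in QUADAS2_DOMAINS.values() for label in labels]
max_total, max_per_domain = quadas2_related_score(ALL_LABELS)
```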
The articles were also divided according to whether they were cohort-type or case-control-type diagnostic accuracy studies. Cohort-type (single-gate) accuracy studies are characterized by the selection of subjects using one set of inclusion criteria, whereas, in case-control-type accuracy studies, the subjects are selected using multiple sets of inclusion criteria (13, 14, 15). The mean (SD) total STARD and QUADAS-2-related scores of these subgroups were calculated, and the two study types were compared in terms of these scores using the Student t test.
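For readers unfamiliar with the comparison, a two-sample Student t statistic can be sketched as follows. This assumes the classic equal-variance (pooled) form; the text does not say which variant the statistical software applied. The scores shown are hypothetical, not data from the study, and the p value (which requires the t distribution) is not computed here.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical total STARD scores for two small groups of articles.
cohort_scores = [20, 22, 21, 19]
case_control_scores = [18, 20, 19]
t_stat, dof = student_t(cohort_scores, case_control_scores)
```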
To assess whether the degree of fulfilment of the STARD 2015 items was associated with the number of citations, a multivariable linear regression analysis was performed. The total STARD or QUADAS-2-related score served as an independent variable, exposure time (the time in months between publication and March 2016) served as a covariate to account for confounding by exposure time, and the number of citations served as the dependent variable.
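The adjustment for exposure time can be illustrated with a partial correlation, computed here as the correlation between the residuals of the score and of the citation count after each is regressed on exposure time. This is a sketch under assumptions: it is one standard way to obtain a partial correlation coefficient, not necessarily the exact computation used by the statistical software, and the function names and data are hypothetical.

```python
import math
from statistics import mean

def exposure_months(pub_year, pub_month, end_year=2016, end_month=3):
    """Whole months between the publication month and March 2016."""
    return (end_year - pub_year) * 12 + (end_month - pub_month)

def residuals(y, z):
    """Residuals of y after simple least-squares regression on z."""
    y_bar, z_bar = mean(y), mean(z)
    slope = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
             / sum((zi - z_bar) ** 2 for zi in z))
    return [(yi - y_bar) - slope * (zi - z_bar) for zi, yi in zip(z, y)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    numerator = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    denominator = math.sqrt(sum((ai - a_bar) ** 2 for ai in a)
                            * sum((bi - b_bar) ** 2 for bi in b))
    return numerator / denominator

def partial_correlation(score, citations, exposure):
    """Correlation of score and citations with exposure time partialled out."""
    return pearson(residuals(score, exposure), residuals(citations, exposure))

# Hypothetical data for six articles (not taken from the study).
scores = [18.0, 20.5, 22.0, 19.5, 21.0, 23.5]
exposure = [exposure_months(2011, 3), exposure_months(2015, 5),
            exposure_months(2013, 4), exposure_months(2012, 3),
            exposure_months(2015, 10), exposure_months(2014, 5)]
citations = [9, 1, 4, 6, 0, 3]
r_partial = partial_correlation(scores, citations, exposure)
```

Older articles have had more time to accumulate citations, so without this adjustment any link between publication date and reporting quality would be mistaken for a link between reporting quality and citations.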
All statistical analyses were performed using MedCalc for Windows (version 15.0; MedCalc Software, Ostend, Belgium), and p < 0.05 was considered statistically significant.
RESULTS
Subjects
The literature search strategy initially led to the identification of 123 articles. Of these, 60 were excluded: letters, editorials, or abstracts rather than full articles (n = 1), case reports or series (n = 2), review articles (n = 16), and articles outside the field of interest (n = 41). Thus, 63 articles fulfilled the eligibility criteria (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78).
Adherence to STARD 2015 Items
The mean ± SD total STARD score (maximum of 27 points) of the included studies was 20.0 ± 2.1 (range 14.5–25). All 63 articles reported more than 50% of the items (total STARD score > 13.5), and 13 articles (20.6%) (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28) reported more than 80% (total STARD score > 21.6). The mean ± SD QUADAS-2-related STARD score (maximum of 15 points) was 11.4 ± 1.7 (range 7–15). Sixty-one articles (96.8%) reported more than 50% of the items (QUADAS-2-related STARD score > 7.5), and 20 articles (31.7%) (16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 34, 58, 59, 61, 63, 67, 71, 75, 78) reported more than 80% (QUADAS-2-related STARD score > 12).
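The cut-offs and percentages in this paragraph follow from simple arithmetic, reproduced in the sketch below. The helper names are hypothetical; the counts and maximum scores are those reported above.

```python
N_ARTICLES = 63          # eligible articles
MAX_TOTAL = 27           # maximum total STARD score
MAX_QUADAS2 = 15         # maximum QUADAS-2-related STARD score

def cutoff(max_score, fraction):
    """Score that corresponds to a given fraction of the maximum."""
    return round(max_score * fraction, 1)

def percent(count, total=N_ARTICLES):
    """Share of articles, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)

total_cutoffs = (cutoff(MAX_TOTAL, 0.5), cutoff(MAX_TOTAL, 0.8))        # 13.5, 21.6
quadas2_cutoffs = (cutoff(MAX_QUADAS2, 0.5), cutoff(MAX_QUADAS2, 0.8))  # 7.5, 12.0
shares = (percent(13), percent(61), percent(20))                        # 20.6, 96.8, 31.7
mean_adherence = round(100 * 20.0 / MAX_TOTAL)                          # 74
```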
A closer assessment of each reporting item revealed that they varied widely in terms of rate of adherence (from 1.6% to 100%) (Table 1). Four items showed remarkably poor adherence rates (< 20%), namely, item 13b (whether clinical information and index test results were available to the assessors of the reference standard; 15.9%), item 18 (whether intended sample size and how it was determined were reported; 1.6%), item 19 (the flow of participants was indicated using a diagram; 17.5%), and item 25 (whether any adverse events from performing the index test or the reference standard were reported; 9.5%). By contrast, 12 items showed almost perfect adherence (> 95%), namely, item 1 (the study was identified as a study of diagnostic accuracy that used at least one measure of accuracy), item 2 (there was a structured summary of study design, methods, results, and conclusions), item 3 (scientific and clinical background, including the intended use and clinical role of the index test, was provided), item 4 (the study objectives and hypotheses were described), item 5 (it was indicated whether data collection was planned before or after the index test and reference standard were performed), item 6 (the eligibility criteria were listed), item 7 (the basis on which potentially eligible participants were identified was described), item 10a (the index test was described in sufficient detail to allow replication), item 12b (the definition of and rationale for test positivity cut-offs or result categories of the reference standard were described, with pre-specified cut-offs/result categories being distinguished from exploratory cut-offs/result categories), item 14 (methods for estimating or comparing measures of diagnostic accuracy were described), item 16 (how missing data on the index test and reference standard were handled was described), and item 27 (the implications for practice, including the intended use and clinical role of the index test, were described).
Table 1
Rate of Adherence to Each STARD 2015 Item
Of the 63 diagnostic test accuracy studies, 45 were cohort-type and 18 were case-control-type accuracy studies. The cohort- and case-control-type studies did not differ significantly in terms of total STARD score (20.2 ± 2.3 vs. 19.4 ± 1.7, p = 0.165) or QUADAS-2-related STARD score (11.6 ± 1.8 vs. 10.9 ± 1.5, p = 0.416).
Relationship between Adherence to STARD 2015 and Citation Rate
The mean ± SD number of times the included studies were cited between publication and March 2016 was 4 ± 4.6 (range 0–21). Multivariable linear regression analysis showed that exposure time correlated significantly with number of citations both when modelled with the total STARD score (partial correlation coefficient = 0.559, p < 0.001) and the QUADAS-2-related STARD score (partial correlation coefficient = 0.556, p < 0.001). When the effect of exposure time was accounted for, neither the total STARD score nor the QUADAS-2-related STARD score correlated significantly with number of citations (total score: partial correlation coefficient = 0.154, p = 0.232; QUADAS-2-related score: partial correlation coefficient = 0.143, p = 0.266).
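The link between the partial correlation coefficients and their p values can be checked with a short sketch. It assumes that each p value comes from a t test on the partial correlation with n − 3 = 60 degrees of freedom (63 articles, one covariate); the text does not state this explicitly. The t distribution is integrated numerically so that only the standard library is needed, and the function names are hypothetical.

```python
import math

def t_from_partial_r(r, n, n_covariates=1):
    """t statistic and degrees of freedom for a partial correlation r."""
    df = n - 2 - n_covariates
    return r * math.sqrt(df / (1.0 - r * r)), df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * (total * h / 3))

N_ARTICLES = 63
for label, r in [("total STARD score", 0.154),
                 ("QUADAS-2-related score", 0.143),
                 ("exposure time", 0.559)]:
    t, df = t_from_partial_r(r, N_ARTICLES)
    print(f"{label}: t = {t:.3f}, df = {df}, p = {two_sided_p(t, df):.3f}")
```

Under these assumptions the computed p values are close to the reported ones; small differences are to be expected because the coefficients are rounded to three decimal places.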
DISCUSSION
This analysis indicates that the articles that reported diagnostic test accuracy studies and were published in KJR in the recent past were of moderate reporting quality: all articles reported more than 50% of the 27 STARD items, while 20.6% of the articles reported more than 80% of the items. Moreover, 96.8% of the articles reported more than 50% of the 15 QUADAS-2-related STARD items, while 31.7% reported more than 80%.
Several studies have previously investigated the quality of reporting diagnostic test accuracy studies in the radiology field. These studies employed the previous version of STARD (STARD 2003) and either assessed the quality of the reports at a particular single cross-sectional time (i.e., across various research areas or in a particular area) or analyzed the changes in quality over time before and after the release of STARD 2003 (9, 11, 79, 80). These studies showed that the mean rate of adherence to STARD 2003 at specific time points ranged from 47.6% to 73.3% (9, 11, 79, 80) and that the quality of the reports improved gradually, albeit mildly, after STARD 2003 was introduced (79). In the present study, the mean rate with which the diagnostic accuracy studies published in KJR in 2011–2015 adhered to STARD 2015 was 74%, which is at least consistent with the reported results and trend given that STARD 2015 has essentially all of the elements of STARD 2003 (as well as some new elements). This suggests that the journal had maintained its peer-review and editorial processes and reporting quality relatively well over the study period. Nevertheless, there is room for improvement, particularly in terms of the items that had a rather low adherence rate, namely, item 13b (whether clinical information and index test results were available to the assessors of the reference standard), item 18 (the intended sample size and how it was determined were described), item 19 (the flow of participants was indicated using a diagram), and item 25 (any adverse events from performing the index test or the reference standard were indicated).
The ultimate goal of reporting guidelines such as STARD is to ensure that the results of research studies are delivered accurately and clearly, thus improving their impact on patient care. Since it is difficult to measure the ultimate effect of an article, a journal, or the adoption of reporting guidelines, the frequency with which an article is cited is often used as a surrogate variable. A recent study investigated the correlation between adherence to STARD 2003 and the citation rate (10). Unlike the present study, it found a weak positive correlation between these variables; however, the results were somewhat inconclusive because this correlation disappeared when the varying impact factor values of the different journals were accounted for (10). Our study lacked a between-journal effect; the fact that we did not detect a correlation between STARD adherence and citation rate may further support the notion that rate of adherence to STARD and citation rate do not really correlate. However, it remains possible that citation rate and journal impact factor do not accurately reflect the ultimate impact of an article or a journal, respectively (81). The ultimate effect of reporting guidelines on scientific publication may need to be examined more closely and thoroughly using alternative measures of article/journal impact.
This study had some limitations. First, because this study included articles published in a single journal, it may not be possible to directly generalize the results to other journals. However, the aim of this study design was to remove confounding caused by differences between journals. It would be worthwhile to also assess the relationship between STARD adherence and citation rate in other individual journals. Second, as mentioned above, the citation rate may not be the best variable for evaluating the ultimate impact or importance of an article. Better variables that allow a more robust analysis should be defined.
In conclusion, the degree of adherence to STARD 2015 was moderate for this particular journal, indicating that there is room for improvement. When adjusted for exposure time, the degree of adherence to STARD 2015 did not affect the citation rate.
References
-
Equator Network, Essential Resources for Writing and Publishing Health Research. Web site. [Accessed July 3, 2016]. http://www.equatornetwork.org/.
-
-
Wikipedia. Open access journal. Web site. [Accessed March 3, 2016].
-
-
Van Raan AFJ. The use of bibliometric analysis in research performance assessment and monitoring of interdisciplinary scientific developments. Technikfolgenabschätzung 2003;1:20–29.
-
-
Bossuyt PM, Leeflang MM. Developing Criteria for Including Studies. In: Deeks JJ, Bossuyt PM, Gatsonis C, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 0.4 [updated September 2008]. Oxford: The Cochrane Collaboration; 2008.
-
-
Lee DH, Lee JM, Klotz E, Kim SJ, Kim KW, Han JK, et al. Detection of recurrent hepatocellular carcinoma in cirrhotic liver after transcatheter arterial chemoembolization: value of quantitative color mapping of the arterial enhancement fraction of the liver. Korean J Radiol 2013;14:51–60.
-
-
Park SO, Seo JB, Kim N, Lee YK, Lee J, Kim DS. Comparison of usual interstitial pneumonia and nonspecific interstitial pneumonia: quantification of disease severity and discrimination between two diseases on HRCT using a texture-based automated system. Korean J Radiol 2011;12:297–307.
-
-
Luczyńska E, Heinze-Paluchowska S, Dyczek S, Blecharz P, Rys J, Reinfuss M. Contrast-enhanced spectral mammography: comparison with conventional mammography and histopathology in 152 women. Korean J Radiol 2014;15:689–696.
-
-
Lee EK, Choi SH, Yun TJ, Kang KM, Kim TM, Lee SH, et al. Prediction of response to concurrent chemoradiotherapy with temozolomide in glioblastoma: application of immediate post-operative dynamic susceptibility contrast and diffusion-weighted MR imaging. Korean J Radiol 2015;16:1341–1348.
-
-
Kim YP, Kannengiesser S, Paek MY, Kim S, Chung TS, Yoo YH, et al. Differentiation between focal malignant marrow-replacing lesions and benign red marrow deposition of the spine with T2*-corrected fat-signal fraction map using a three-echo volume interpolated breath-hold gradient echo Dixon sequence. Korean J Radiol 2014;15:781–791.
-
-
Not-so-deep impact. Nature 2005;435:1003–1004.
-