BMC Medical Imaging

Open Access 01-12-2016 | Research article

How to assess intra- and inter-observer agreement with quantitative PET using variance component analysis: a proposal for standardisation

Authors: Oke Gerke, Mie Holm Vilstrup, Eivind Antonsen Segtnan, Ulrich Halekoh, Poul Flemming Høilund-Carlsen

Published in: BMC Medical Imaging | Issue 1/2016

Abstract

Background

Quantitative measurement procedures need to be accurate and precise to justify their clinical use. Precision reflects the deviation of groups of measurements from one another and is often expressed as proportions of agreement, standard errors of measurement, coefficients of variation, or the Bland-Altman plot. We suggest variance component analysis (VCA) to estimate the influence of errors due to single elements of a PET scan (scanner, time point, observer, etc.), to express the composite uncertainty of repeated measurements, and to obtain relevant repeatability coefficients (RCs), which have a unique relation to Bland-Altman plots. Here, we present this approach for the assessment of intra- and inter-observer variation with PET/CT, exemplified with data from two clinical studies.

Methods

In study 1, 30 patients were scanned pre-operatively for the assessment of ovarian cancer, and their scans were assessed twice by the same observer to study intra-observer agreement. In study 2, 14 patients with glioma were scanned up to five times. The resulting 49 scans were assessed by three observers to examine inter-observer agreement. Outcome variables were SUVmax in study 1 and cerebral total hemispheric glycolysis (THG) in study 2.
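For a design like study 1 (each scan read twice by the same observer), the link between the RC and the Bland-Altman limits of agreement can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common definition RC = 1.96 × sqrt(2 × within-subject variance), estimates the within-subject variance as half the sample variance of the paired differences, and uses made-up readings. The function name `repeatability_from_pairs` is hypothetical.

```python
import math
import statistics


def repeatability_from_pairs(first, second):
    """Return (rc, lower_loa, upper_loa) for paired readings of the same scans.

    Assumption: the within-subject variance is taken as half the sample
    variance of the paired differences, which allows for a systematic
    shift between the first and the second reading.
    """
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    within_var = sd_diff ** 2 / 2              # variance of a single reading
    rc = 1.96 * math.sqrt(2 * within_var)      # repeatability coefficient
    lower_loa = mean_diff - 1.96 * sd_diff     # Bland-Altman lower limit
    upper_loa = mean_diff + 1.96 * sd_diff     # Bland-Altman upper limit
    return rc, lower_loa, upper_loa


# Hypothetical SUVmax readings for illustration only (NOT data from the study).
reading_1 = [4.2, 7.9, 12.5, 3.1, 9.8, 6.4]
reading_2 = [4.6, 7.1, 13.4, 3.0, 10.9, 5.8]

rc, lower_loa, upper_loa = repeatability_from_pairs(reading_1, reading_2)
print(f"RC = {rc:.2f}, limits of agreement = ({lower_loa:.2f}, {upper_loa:.2f})")
```

Under these assumptions the RC coincides with half the width of the limits of agreement, whatever the mean difference between the two readings happens to be.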

Results

In study 1, we found an RC of 2.46, equalling half the width of the Bland-Altman limits of agreement. In study 2, the RC for identical conditions (same scanner, patient, time point, and observer) was 2392; allowing for different scanners increased the RC to 2543. Inter-observer differences were negligible compared to differences owing to other factors: −10 (95% CI: −352 to 332) between observers 1 and 2, and 28 (95% CI: −313 to 370) between observers 1 and 3.
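To make the reported RCs easier to read, they can be converted back into standard deviations. The sketch below is a back-of-the-envelope calculation, not part of the published analysis. It assumes that RC = 1.96 × sqrt(2 × sum of the relevant variance components) and that allowing for different scanners simply adds one scanner variance component to that sum. The helper name `sd_from_rc` is hypothetical.

```python
import math

Z = 1.96  # normal quantile behind the 95 % repeatability coefficient


def sd_from_rc(rc):
    """Standard deviation implied by an RC, assuming RC = Z * sqrt(2) * SD."""
    return rc / (Z * math.sqrt(2))


# RCs as reported in the abstract.
sd_study1 = sd_from_rc(2.46)           # study 1, SUVmax
sd_same_scanner = sd_from_rc(2392)     # study 2, identical conditions
sd_any_scanner = sd_from_rc(2543)      # study 2, different scanners allowed

# Assumption: variance components are additive, so the increase in variance
# is attributed entirely to a scanner component.
scanner_var = sd_any_scanner ** 2 - sd_same_scanner ** 2
scanner_sd = math.sqrt(scanner_var)

print(f"Implied SD, study 1: {sd_study1:.3f}")
print(f"Implied SD, study 2 (same scanner): {sd_same_scanner:.1f}")
print(f"Implied SD, study 2 (any scanner): {sd_any_scanner:.1f}")
print(f"Implied scanner SD under the additivity assumption: {scanner_sd:.1f}")
```

If these assumptions hold, the scanner component is clearly smaller than the variation already present under identical conditions, in line with the modest rise in RC from 2392 to 2543.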

Conclusions

VCA is an appealing approach for weighing different sources of variation against each other, summarised as RCs. The linear mixed effects models involved require carefully considered sample sizes so that the variance components can be estimated with sufficient accuracy.
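The sample size point can be illustrated with a small simulation. The sketch below is not the authors' sample size method. It assumes the simplest paired design (two readings per patient, normally distributed measurement error, no systematic shift) and reports how much an estimated RC varies from one hypothetical study to the next. The function `simulated_rc_spread` and all parameter values are illustrative assumptions.

```python
import random
import statistics


def simulated_rc_spread(n_patients, true_within_sd=1.0, n_sim=2000, seed=1):
    """Coefficient of variation of the estimated RC across simulated studies.

    Each simulated study has n_patients, each read twice with independent
    normal measurement error of standard deviation true_within_sd.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sim):
        # Patient effects cancel in paired differences, so only the two
        # measurement errors per patient need to be drawn.
        diffs = [
            rng.gauss(0.0, true_within_sd) - rng.gauss(0.0, true_within_sd)
            for _ in range(n_patients)
        ]
        estimates.append(1.96 * statistics.stdev(diffs))  # estimated RC
    return statistics.stdev(estimates) / statistics.mean(estimates)


spread_small = simulated_rc_spread(10)    # small study
spread_large = simulated_rc_spread(100)   # larger study

print(f"Relative spread of RC with 10 patients: {spread_small:.2f}")
print(f"Relative spread of RC with 100 patients: {spread_large:.2f}")
```

In this simplified setting the estimated RC is markedly less stable with 10 patients than with 100. Models with several variance components (scanner, time point, observer) can be expected to need correspondingly more data per component.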

Title: How to assess intra- and inter-observer agreement with quantitative PET using variance component analysis: a proposal for standardisation
Authors: Oke Gerke, Mie Holm Vilstrup, Eivind Antonsen Segtnan, Ulrich Halekoh, Poul Flemming Høilund-Carlsen
Publication date: 01-12-2016
Publisher: BioMed Central
Published in: BMC Medical Imaging / Issue 1/2016
Electronic ISSN: 1471-2342
DOI: https://doi.org/10.1186/s12880-016-0159-3
