Published in: BMC Medical Imaging 1/2016

Open Access 01-12-2016 | Research article

How to assess intra- and inter-observer agreement with quantitative PET using variance component analysis: a proposal for standardisation

Authors: Oke Gerke, Mie Holm Vilstrup, Eivind Antonsen Segtnan, Ulrich Halekoh, Poul Flemming Høilund-Carlsen

Abstract

Background

Quantitative measurement procedures need to be accurate and precise to justify their clinical use. Precision reflects the deviation of groups of measurements from one another and is often expressed as proportions of agreement, standard errors of measurement, coefficients of variation, or the Bland-Altman plot. We suggest variance component analysis (VCA) to estimate the influence of errors due to individual elements of a PET scan (scanner, time point, observer, etc.), to express the composite uncertainty of repeated measurements, and to obtain relevant repeatability coefficients (RCs), which have a unique relation to Bland-Altman plots. Here, we present this approach for the assessment of intra- and inter-observer variation with PET/CT, exemplified with data from two clinical studies.
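As a minimal sketch of the idea, assume the simplest possible design: a balanced set of k replicate measurements per patient and a single random patient effect. The variance components can then be estimated with the classical one-way ANOVA (method-of-moments) estimator, and the within-patient component turned into an RC of 1.96·√2·σ_w. This is not the linear mixed model fit used in the study, and the function names are hypothetical.

```python
import math
from statistics import mean

Z = 1.96  # two-sided 95 % standard normal quantile


def one_way_variance_components(groups):
    """ANOVA (method-of-moments) estimates for the balanced one-way
    random-effects model y_ij = mu + a_i + e_ij.

    groups: one list of k replicate measurements per patient.
    Returns (sigma2_between, sigma2_within).
    """
    n, k = len(groups), len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need a balanced design: n >= 2 patients, k >= 2 replicates each")
    subject_means = [mean(g) for g in groups]
    grand_mean = mean(subject_means)  # equals the overall mean in a balanced design
    # Within-patient mean square: replicate scatter around each patient's mean
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, subject_means) for x in g) / (n * (k - 1))
    # Between-patient mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # A negative moment estimate of the between-patient variance is truncated at zero
    sigma2_between = max((ms_between - ms_within) / k, 0.0)
    return sigma2_between, ms_within


def within_subject_rc(sigma2_within):
    """RC = 1.96 * sqrt(2) * sigma_w: the bound that 95 % of absolute
    differences between two replicates on the same patient should stay within."""
    return Z * math.sqrt(2.0 * sigma2_within)


if __name__ == "__main__":
    # Hypothetical duplicate measurements on three patients
    data = [[10.0, 11.0], [20.0, 19.0], [15.0, 15.5]]
    s2_b, s2_w = one_way_variance_components(data)
    print(s2_b, s2_w, within_subject_rc(s2_w))
```

With richer designs (scanner, time point and observer as further factors) the same logic applies, but the components are then usually estimated by REML in a linear mixed model rather than by hand.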

Methods

In study 1, 30 patients were scanned pre-operatively for the assessment of ovarian cancer, and their scans were assessed twice by the same observer to study intra-observer agreement. In study 2, 14 patients with glioma were scanned up to five times. The resulting 49 scans were assessed by three observers to examine inter-observer agreement. Outcome variables were SUVmax in study 1 and cerebral total hemispheric glycolysis (THG) in study 2.
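For a design like study 1 (each scan read twice by the same observer), the link between the RC and the Bland-Altman plot can be checked numerically. The sketch below assumes a model with a random patient effect, a fixed effect for reading occasion and independent normal errors. Under that model the ANOVA estimate of the residual variance equals half the sample variance of the paired differences d, so 1.96·√(2σ²_e) reduces to 1.96·SD(d), the half-width of the limits of agreement. Function names are hypothetical and the readings in the example are invented, not study data.

```python
import math
from statistics import mean, stdev

Z = 1.96  # two-sided 95 % standard normal quantile


def bland_altman(first, second):
    """Bias, SD of differences and 95 % limits of agreement for paired readings."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired readings of equal length")
    d = [a - b for a, b in zip(first, second)]
    bias, sd = mean(d), stdev(d)
    return bias, sd, (bias - Z * sd, bias + Z * sd)


def rc_from_two_readings(first, second):
    """RC from the residual variance of a model with a patient effect and a
    fixed reading effect; with two readings per patient the ANOVA residual
    variance is var(d) / 2, so the RC reduces to 1.96 * SD(d)."""
    d = [a - b for a, b in zip(first, second)]
    sigma2_e = stdev(d) ** 2 / 2.0
    return Z * math.sqrt(2.0 * sigma2_e)


if __name__ == "__main__":
    # Hypothetical SUVmax values from two reads of the same five scans
    read1 = [5.1, 7.9, 3.2, 10.4, 6.0]
    read2 = [5.4, 7.5, 3.0, 10.9, 6.2]
    bias, sd, (lower, upper) = bland_altman(read1, read2)
    print(rc_from_two_readings(read1, read2), (upper - lower) / 2)  # equal up to rounding
```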

Results

In study 1, we found an RC of 2.46, equalling half the width of the Bland-Altman limits of agreement. In study 2, the RC for identical conditions (same scanner, patient, time point, and observer) was 2392; allowing for different scanners increased the RC to 2543. Inter-observer differences were negligible compared to differences owing to other factors: the difference between observers 1 and 2 was −10 (95 % CI: −352 to 332), and between observers 1 and 3 it was 28 (95 % CI: −313 to 370).
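The step from 2392 to 2543 can be read as adding a variance component. Assuming independent, additive random effects, two measurements that differ in a set of factors have a difference variance equal to twice the sum of the corresponding components, so RCs combine in quadrature rather than additively. The hypothetical helpers below compute such RCs and, conversely, the extra variance implied by two RCs. Attributing the whole increase to a single scanner component is an assumption of this sketch; the study's model may contain further terms, so the back-calculated value is illustrative only.

```python
import math

Z = 1.96  # two-sided 95 % standard normal quantile


def rc(*variance_components):
    """RC for two measurements that differ in every factor whose variance is
    listed (independent, additive random effects assumed):
    RC = 1.96 * sqrt(2 * sum of components)."""
    return Z * math.sqrt(2.0 * sum(variance_components))


def implied_extra_variance(rc_base, rc_extended):
    """Variance component that, added to those behind rc_base, yields
    rc_extended under the same additive model."""
    return (rc_extended ** 2 - rc_base ** 2) / (2.0 * Z ** 2)


if __name__ == "__main__":
    # RCs reported in the abstract for study 2 (THG); treating the increase
    # as one scanner component is an assumption of this sketch
    extra = implied_extra_variance(2392, 2543)
    print(math.sqrt(extra))  # implied SD of the added component, THG units
```

Because components add on the variance scale, a factor must have a variance that is sizeable relative to the residual variance before it moves the RC noticeably.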

Conclusions

VCA is an appealing approach for weighing different sources of variation against each other, summarised as RCs. The linear mixed effects models involved require carefully chosen sample sizes to ensure that the variance components are estimated with sufficient accuracy.
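To see why sample size matters, consider the confidence interval for a residual SD estimated with df degrees of freedom, which under normality is based on the chi-square distribution. The sketch below uses the Wilson-Hilferty approximation to chi-square quantiles so that it needs only the Python standard library; an exact quantile function (for example `scipy.stats.chi2.ppf`) is preferable in practice, and the approximation is rough below about five degrees of freedom. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the p-quantile of a chi-square
    distribution with df degrees of freedom."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    base = 1.0 - c + z * math.sqrt(c)
    if base <= 0:
        raise ValueError("approximation breaks down for such small df")
    return df * base ** 3


def sd_ci_multipliers(df, level=0.95):
    """Confidence limits for an SD estimated with df degrees of freedom,
    expressed as multiples of the point estimate (normal errors assumed)."""
    alpha = 1.0 - level
    lower = math.sqrt(df / chi2_quantile_wh(1.0 - alpha / 2, df))
    upper = math.sqrt(df / chi2_quantile_wh(alpha / 2, df))
    return lower, upper


if __name__ == "__main__":
    for df in (5, 10, 29, 100):
        lo, hi = sd_ci_multipliers(df)
        print(f"df={df:>3}: SD interval {lo:.2f} to {hi:.2f} x estimate")
```

With 30 patients read twice, a model with patient and reading effects leaves 29 residual degrees of freedom. A random factor with only a few levels, such as three observers, contributes at most two degrees of freedom to its own variance estimate, which is why such components are estimated far less precisely.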
Metadata
Publication date
01-12-2016
Publisher
BioMed Central
Published in
BMC Medical Imaging / Issue 1/2016
Electronic ISSN: 1471-2342
DOI
https://doi.org/10.1186/s12880-016-0159-3
