Topical Review | Open access

Task-based measures of image quality and their relation to radiation dose and patient risk


Published 7 January 2015 © 2015 Institute of Physics and Engineering in Medicine
Citation: Harrison H Barrett et al 2015 Phys. Med. Biol. 60 R1. DOI: 10.1088/0031-9155/60/2/R1


Abstract

The theory of task-based assessment of image quality is reviewed in the context of imaging with ionizing radiation, and objective figures of merit (FOMs) for image quality are summarized. The variation of the FOMs with the task, the observer and especially with the mean number of photons recorded in the image is discussed. Then various standard methods for specifying radiation dose are reviewed and related to the mean number of photons in the image and hence to image quality. Current knowledge of the relation between local radiation dose and the risk of various adverse effects is summarized, and some graphical depictions of the tradeoffs between image quality and risk are introduced. Then various dose-reduction strategies are discussed in terms of their effect on task-based measures of image quality.


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Like any engineering problem, the design of an imaging system always requires tradeoffs. For medical imaging with ionizing radiation, a major tradeoff is between image quality and risk to the patient from absorbed radiation dose. All else being equal, increasing the radiation dose to the patient will always improve image quality by reducing the Poisson noise in the image, but by how much? At the same time, increasing the radiation dose, above some level at least, will always result in increased risk of adverse effects on the patient, but by how much? The purpose of this paper is to survey our current knowledge of how radiation dose relates to image quality and patient risk, thereby laying the groundwork for a rigorous risk-benefit analysis applicable to such important medical modalities as mammography, nuclear medicine and computed tomography.

The tools required for this risk-benefit analysis are (a) the theory and practice of task-based assessment of image quality; (b) methods for calculating the spatiotemporal distribution of absorbed dose in a patient's body, and (c) knowledge from epidemiological data, mathematical models and laboratory studies of the probability of various adverse effects arising from the absorbed dose distribution.

Task-based assessment of image quality rests on the premise that the only rigorous way of defining image quality for a medical image is in terms of the medical goal of the image, which is often called the task of the imaging system (Barrett 1990, Barrett et al 1995, Barrett et al 1998a, Barrett et al 2006, Barrett and Myers 2004). Given an image, the task can be performed by a person such as a radiologist, by a computer algorithm, or by a mathematical construct called an ideal observer. We use the general term observer to refer to any of these methods for performing the task. For a given task, observer and imaging system, we can define a suitable figure of merit (FOM) that specifies how well that task can be performed on some ensemble of patients.

If the chosen task is relevant to the diagnosis and/or treatment of the patients in the ensemble, then the FOM quantifies the benefit to patients from the imaging procedure under study. Many other ad hoc figures of merit for image quality have been proposed in the literature (see section 14.1 in Barrett and Myers (2004) for a survey), but if they do not relate to benefits to the patients, i.e. to task performance, then they cannot be used for risk-benefit analysis.

The theory behind task-based assessment of image quality is now well established, and the methodology is used routinely (but not universally) in many investigations of image quality. After some mathematical and statistical preliminaries in section 2, a review of task-based assessment is given in section 3 of this paper.

A task-based FOM depends not only on the task and the chosen ensemble of patients, but also on many details of the imaging system, such as the spatial and temporal resolution of the system and, in the case of imaging with x-rays and gamma rays, the number of high-energy photons recorded in the image. The number of recorded photons, in turn, depends on many factors including the efficiency of the imaging system at collecting and recording the photons and the exposure time over which the image is acquired. Most importantly for this paper, however, all task-based FOMs will necessarily depend on the radiation dose to the patient.

Many physical processes enter into the calculation of the distribution of absorbed dose in a patient's body. A careful calculation requires consideration of how the photons are delivered to the body, the energy spectrum of the photons, and the physical processes of absorption and scattering in the body. Though complex, these factors are all well understood and precise computational methods have been developed, so there is no difficulty in principle in computing the distribution of the absorbed dose for a given patient and imaging procedure. Unfortunately, it is common practice in the field to reduce the dose distribution, which is really a function of spatial position in the body and time, to a single number which, it is hoped, will capture some indication of the biological hazard from the dose in a way that is easily communicated to the patient or to the scientific and medical community. Some aspects of the dose distribution and various summary measures of it are discussed in section 4.

The biological risk from medical imaging with ionizing radiation is a complicated and controversial subject but, as we shall see in section 5, there are viable approaches to assessing the risks of various adverse effects and incorporating them into an overall framework for assessing the tradeoffs between dose and image quality. It is not the objective of this paper to enter into the debate regarding whether the linear, no-threshold model or some nonlinear model is more nearly correct for some specific radiation hazard, but in section 5 we provide enough information about the models that a designer or user of a medical imaging system can quantify the relationship between image quality and patient risk for any of the extant models and for many different adverse effects of radiation.

Section 6 shows how these major themes—task-based assessment of image quality, spatial and temporal distribution of absorbed dose, and dose-risk models—can be integrated into a general theoretical framework for using and optimizing medical imaging systems. In section 7 we survey current research into methods that might be used for reducing the patient dose in imaging with ionizing radiation, potentially without loss of image quality. In section 8 we try to identify key gaps in our knowledge of image-quality assessment, dose calculation and estimation of biological hazards, and we suggest some research directions that will help fill these gaps.

2. Mathematical basics and notation

2.1. Mathematical descriptions of objects and images

The notation used in this paper closely follows (Barrett and Myers 2004). An object being imaged is a scalar-valued function of space and time, denoted f (r, t), where r = (x, y, z) is a three-dimensional (3D) position vector and t is the time. In this paper we consider only radiological imaging with x rays or nuclear-medical imaging with gamma rays.

With x-ray imaging the radiation source is outside the patient's body, and f (r, t) represents the spatiotemporal distribution of the x-ray attenuation coefficient in the body, which can arise either from the intrinsic properties of the tissue or from an exogenous x-ray contrast medium. The time dependence arises from respiratory or cardiac motion, peristalsis, gross patient motion or redistribution of the contrast medium. The x-ray attenuation coefficient can also be a function of photon energy, and there can be both absorption and scattering components of the attenuation.

In nuclear medicine, f (r, t) represents the distribution of a radioactive tracer in the body, with the time dependence arising from redistribution of the tracer, radioactive decay or patient motion. For notational ease we denote the object function as f, which can be regarded simply as a shorthand for f (r, t). Technically, f denotes a vector in a Hilbert space, but we do not need this level of rigor here.

Although an object being imaged is a function of continuous spatial and temporal variables, a digital image is a discrete set of numbers, each obtained as a result of a detector making a measurement of the x-ray or gamma-ray flux impinging on it. We denote the set of measurements for one object as {gm, m = 1, ..., M}, where the curly brackets denote a set, M is the total number of measurements, and index m denotes a particular detector measurement. The measurement consists of integrating the spatially and temporally varying photon flux over some finite detector area and some finite measurement time. Again for notational convenience, we denote the set of measurements as g, which we can think of as an M × 1 column vector.

Because objects are functions of continuous variables and images are discrete vectors, the imaging system is referred to as a continuous-to-discrete (CD) operator (Barrett and Myers 2004), which we denote as $\mathcal{H}$ . To be more precise, we denote the mean image as $\mathbf{\bar{g}}$ , where the overbar indicates an average over many repeated measurements of exactly the same object function. (Deviations from this average are discussed below.) With this notation, we describe the imaging system abstractly as

$\mathbf{\bar{g}}=\mathcal{H}\mathbf{f}.\qquad\qquad(1)$

For nuclear medicine, the operator $\mathcal{H}$ is linear to a good approximation. That is, if we replace the object f by the scaled object αf, for example by injecting a different activity of the radiotracer, then we change each component ${{\bar{g}}_{m}}$ of the mean data vector to $\alpha {{\bar{g}}_{m}}$ (where α is some nonnegative constant). For x-ray imaging, $\mathcal{H}$ is not linear because of the exponential attenuation of x-rays, but we can still write (1).

The general form of a linear CD operator is

$\bar{g}_{m}=\int_{\infty}\text{d}^{3}r\ h_{m}\left(\mathbf{r}\right)f\left(\mathbf{r}\right),\qquad\qquad(2)$

where hm(r) is referred to as either the kernel of the operator $\mathcal{H}$ or as the sensitivity function of the system. For single-photon radionuclide imaging, hm(r) specifies the probability that a gamma-ray photon originating at point r results in an event in element m of the discrete image; in conventional rotating-camera SPECT, the index m specifies not only the position of the event on the camera face but also the angular position of the camera, so M is the number of angles times the number of pixels in each 2D projection. The subscript on the integral in (2) indicates that the integral is over the full infinite range of x, y and z, but in practice the range is limited by hm(r).
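As a concrete, if highly simplified, illustration of (2), the following Python sketch discretizes a one-dimensional object on a fine grid and applies a set of Gaussian sensitivity functions to produce the mean data vector. The object, the number of detector elements and the Gaussian form of the sensitivity functions are illustrative assumptions, not a model of any real system.

```python
import numpy as np

# Minimal sketch of the linear continuous-to-discrete model of equation (2):
# g_bar_m = integral of h_m(r) f(r) dr, approximated on a fine 1D grid.
# The Gaussian sensitivity functions h_m are illustrative assumptions.

# Fine spatial grid standing in for the continuous object domain
r = np.linspace(0.0, 1.0, 2000)
dr = r[1] - r[0]

# Hypothetical object f(r): smooth background plus a small "lesion"
f = 1.0 + 0.5 * np.exp(-((r - 0.6) ** 2) / (2 * 0.02 ** 2))

# M detector elements, each with a Gaussian sensitivity function h_m(r)
M = 64
centers = np.linspace(0.0, 1.0, M)
width = 0.02
H = np.exp(-((r[None, :] - centers[:, None]) ** 2) / (2 * width ** 2))  # M x len(r)

# Mean data vector: discretized version of the integral in equation (2)
g_bar = H @ f * dr
print(g_bar.shape)   # (64,)
```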

All CD operators have null functions, functions in object space for which $\mathcal{H}{{\mathbf{f}}_{\text{null}}}=0$ . For linear CD operators, the method of singular-value decomposition (SVD) can be used to decompose any object uniquely as f = fnull + fmeas, where fmeas is called the measurement component of f. By definition, only the measurement component affects the mean data: $\mathbf{\bar{g}}=\mathcal{H}\mathbf{f}=\mathcal{H}{{\mathbf{f}}_{\text{meas}}}$ .

In tomographic imaging modalities such as PET, SPECT or CT, the raw detector measurements g do not constitute the final image. Instead, a linear or nonlinear algorithm is used to produce some discrete approximation to the original object function. We express the algorithm with an operator $\mathcal{O}$ and write

$\mathbf{\hat{f}}=\mathcal{O}\mathbf{g},\qquad\qquad(3)$

where $\mathbf{\hat{f}}$ is a set of N numbers on some reconstruction grid (almost always a voxel grid). Thus $\mathbf{\hat{f}}$ is a vector representation of the set $\left\{{{\hat{f}}_{n}},n=1,...,N\right\}$ , and the reconstruction operator $\mathcal{O}$ maps an M × 1 vector to an N × 1 vector; if the algorithm is linear, then $\mathcal{O}$ is an N × M matrix, which we can denote as O.

In x-ray CT, $\mathcal{O}$ is necessarily nonlinear because of a logarithmic operation that compensates for the exponential attenuation of x rays, and in all modalities the subsequent reconstruction algorithm can be nonlinear as well.

2.2. Stochastic descriptions of objects and images

In order to understand and analyze image quality and how it is affected by radiation dose, we need to understand why images are random and how we describe that randomness mathematically.

With ionizing radiation, a primary cause of image randomness is the random interaction of high-energy photons with the detector. Under very general assumptions (Barrett and Myers 2004), virtually always satisfied in medical imaging, the interactions in different detectors, or in the same detector in two different, nonoverlapping time intervals, are statistically independent of each other. From this simple independence assumption, it can be shown (Barrett and Myers 2004) that the random number of detected photons in the mth measurement, denoted Nm, follows the familiar Poisson probability law:

$\Pr\left(N_{m}\mid\bar{N}_{m}\right)=\frac{\left(\bar{N}_{m}\right)^{N_{m}}}{N_{m}!}\exp\left(-\bar{N}_{m}\right),\qquad\qquad(4)$

where the vertical bar indicates a conditional probability; $\Pr \left({{N}_{m}}\mid {{\bar{N}}_{m}}\right)$ is to be read: the probability of observing Nm 'conditional on' (or 'given') its mean ${{\bar{N}}_{m}}$ .

For a photon-counting detector, such as an Anger scintillation camera or some of the new cadmium zinc telluride (CZT) detectors being developed for CT, the detector output for each m is the number of photons detected, so that gm = Nm. If this detector is used in an imaging system, then it follows from (1) that ${{\bar{g}}_{m}}={{\left(\mathcal{H}\mathbf{f}\right)}_{m}}$ , and conditioning on ${{\bar{g}}_{m}}$ is the same as conditioning on f, so

$\Pr\left(g_{m}\mid\mathbf{f}\right)=\frac{\left[\left(\mathcal{H}\mathbf{f}\right)_{m}\right]^{g_{m}}}{g_{m}!}\exp\left[-\left(\mathcal{H}\mathbf{f}\right)_{m}\right].\qquad\qquad(5)$

An important property of Poisson random variables is that the variance equals the mean:

$\text{Var}\left\{g_{m}\mid\mathbf{f}\right\}\equiv\left\langle\left(g_{m}-\bar{g}_{m}\right)^{2}\right\rangle_{g_{m}\mid\mathbf{f}}=\bar{g}_{m},\qquad\qquad(6)$

where the subscripted angle brackets denote an average using the probability law (5).

Because the photon interactions in different detectors are statistically independent (conditional on the object f), the M-dimensional multivariate probability law for the full data vector g is given by

$\Pr\left(\mathbf{g}\mid\mathbf{f}\right)=\prod_{m=1}^{M}\Pr\left(g_{m}\mid\mathbf{f}\right).\qquad\qquad(7)$

We can use this multivariate probability to generalize the concept of variance and compute an M × M conditional covariance matrix, Kgf, with elements given by

$\left[\mathbf{K}_{\mathbf{g}\mid\mathbf{f}}\right]_{mm'}\equiv\left\langle\left(g_{m}-\bar{g}_{m}\right)\left(g_{m'}-\bar{g}_{m'}\right)\right\rangle_{\mathbf{g}\mid\mathbf{f}}=\bar{g}_{m}\,\delta_{mm'},\qquad\qquad(8)$

where δmm' is the Kronecker delta, which equals 1 if m = m' and 0 if m ≠ m'. Thus, for photon-counting detectors, Kgf is a diagonal matrix, and the diagonal elements, which are the variances of the individual detector outputs, are also the means in this case.
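The Poisson statistics in (4)-(8) are easy to verify numerically. The short Python sketch below, with an arbitrary illustrative mean data vector, draws repeated measurements of the same object and checks that the sample variance of each detector output approximates its mean and that the conditional covariance matrix is essentially diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean data vector g_bar = Hf for a photon-counting detector
g_bar = np.array([50.0, 200.0, 10.0, 400.0])

# Repeated measurements of the same object: g_m ~ Poisson(g_bar_m), eq. (5)
n_trials = 200_000
g = rng.poisson(g_bar, size=(n_trials, g_bar.size)).astype(float)

# Sample conditional covariance K_{g|f}, eq. (8)
K = np.cov(g, rowvar=False)

print(np.diag(K))       # close to g_bar: variance equals mean, eq. (6)
print(np.round(K, 1))   # off-diagonal elements near zero: independent detectors
```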

Detectors used in x-ray imaging (Yaffe and Rowlands 1997) are usually not photon-counting, and their outputs are not integers. Instead, gm is a continuous-valued random variable and hence g is an M × 1 continuous-valued random vector, with the randomness arising from both the random photon interactions and electronic readout noise. With these detectors, we need to use a conditional probability density function (PDF), denoted $\text{pr}\left(\mathbf{g}\mid \mathbf{f}\right)$ , rather than the probability Pr(gf) as in (7). The electronic noise has zero mean, so it does not affect the mean detector output $\mathbf{\bar{g}}$ , but it does affect the covariance matrix. If the electronic noise in different detector elements is statistically independent, then we can write the conditional PDF as a product of component PDFs, analogous to (7). The covariance matrix of (8) remains diagonal but the variance is no longer the same as the mean; the covariance matrix has the structure

$\left[\mathbf{K}_{\mathbf{g}\mid\mathbf{f}}\right]_{mm'}=\left(\alpha\,\bar{g}_{m}+\sigma_{m}^{2}\right)\delta_{mm'},\qquad\qquad(9)$

where α is a constant related to the random variations in detector output for different x-ray photons and $\sigma _{m}^{2}$ is the variance of the readout noise in detector m. If we normalize the detector output so that ${{\bar{g}}_{m}}$ is the mean number of detected x-ray photons in some exposure time, then σm is in units of the average detector output for one x-ray photon and α, which is the reciprocal of the Swank factor (Swank 1973, Yaffe and Rowlands 1997), is typically in the range 1.2–1.5.

The diagonal form in (9) holds only when fluctuations at different detector elements are statistically independent, but in many x-ray detectors an absorbed photon produces light or charge which spreads out and affects many detector elements, and in those cases the covariance matrix is not diagonal (Kyprianou et al 2009).
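For an energy-integrating detector, the variance model of (9) can likewise be checked by simulation. In the sketch below, each absorbed photon deposits a random signal whose gain fluctuations set the reciprocal Swank factor α, and zero-mean Gaussian readout noise is added; all numerical values are illustrative assumptions, not data for any particular detector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical energy-integrating detector element: each absorbed photon
# deposits a random signal (gain fluctuations -> reciprocal Swank factor alpha)
# and the readout adds zero-mean Gaussian electronic noise.
N_bar = 300.0                      # mean number of absorbed photons
gain_mean, gain_std = 1.0, 0.6     # per-photon signal, normalized to unit mean
sigma_readout = 2.0                # readout noise, same units

alpha = (gain_mean**2 + gain_std**2) / gain_mean**2   # ~1.36, in the 1.2-1.5 range

n_trials = 50_000
N = rng.poisson(N_bar, n_trials)
g = np.array([rng.normal(gain_mean, gain_std, n).sum() for n in N])
g += rng.normal(0.0, sigma_readout, n_trials)

print(g.mean(), N_bar)                                    # mean output ~ g_bar
print(g.var(), alpha * N_bar + sigma_readout**2)          # eq. (9): Var = alpha*g_bar + sigma^2
```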

So far we have discussed probability laws and covariance matrices conditional on f, which means that they apply to repeated measurements on the same object. In imaging, and especially in assessment of image quality, we must also consider many different objects, which we can think of as random functions f. The overall probability for the data vector g is obtained by averaging the conditional probability over a group of patients or other objects to be imaged:

$\text{pr}\left(\mathbf{g}\mid C\right)=\left\langle\text{pr}\left(\mathbf{g}\mid\mathbf{f}\right)\right\rangle_{\mathbf{f}\mid C},\qquad\qquad(10)$

where C denotes some class or group of objects (often a theoretical infinite ensemble), such as all patients who could undergo a particular diagnostic examination, and the subscripted angle brackets denote an average over all objects f in C.

Similarly, the overall covariance matrix is given by

$\left[\mathbf{K}_{\mathbf{g}\mid C}\right]_{mm'}\equiv\left\langle\left(g_{m}-\overline{\bar{g}}_{m}\right)\left(g_{m'}-\overline{\bar{g}}_{m'}\right)\right\rangle_{\mathbf{g}\mid C}.\qquad\qquad(11)$

Because the conditional covariance matrix is itself an average, given in (8), we can also write (11) in component form as

$\left[\mathbf{K}_{\mathbf{g}\mid C}\right]_{mm'}=\left\langle\left\langle\left(g_{m}-\overline{\bar{g}}_{m}\right)\left(g_{m'}-\overline{\bar{g}}_{m'}\right)\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C},\qquad\qquad(12)$

where the double overbar indicates the overall average, obtained by averaging first over g given f, then over f given that the object was drawn from set C.

A more compact way of writing (12), without explicitly displaying the components, is in terms of outer products; if x is an M × 1 column vector, then its transpose xt is a 1 × M row vector, and xxt, called an outer product, is an M × M matrix with mm' component xmxm'. Thus

$\mathbf{K}_{\mathbf{g}\mid C}=\left\langle\left\langle\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}\right)\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}\right)^{t}\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C}.\qquad\qquad(13)$

As we shall see in section 3, this overall covariance plays a key role in image quality.

Moreover, a simple algebraic manipulation of (13) will prove useful when we discuss the relation between dose and image quality in section 6. We merely add and subtract the conditional mean defined by (1), but written here as $\mathbf{\bar{g}}\left(\mathbf{f}\right)$ to emphasize that it depends on the random f. Thus

$\mathbf{K}_{\mathbf{g}\mid C}=\left\langle\left\langle\left[\left(\mathbf{g}-\mathbf{\bar{g}}\left(\mathbf{f}\right)\right)+\left(\mathbf{\bar{g}}\left(\mathbf{f}\right)-\overline{\mathbf{\bar{g}}}\right)\right]\left[\left(\mathbf{g}-\mathbf{\bar{g}}\left(\mathbf{f}\right)\right)+\left(\mathbf{\bar{g}}\left(\mathbf{f}\right)-\overline{\mathbf{\bar{g}}}\right)\right]^{t}\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C}.\qquad(14)$

When the outer product is expanded out and averaged, the cross terms vanish by construction; for example,

$\left\langle\left\langle\left(\mathbf{g}-\mathbf{\bar{g}}\left(\mathbf{f}\right)\right)\left(\mathbf{\bar{g}}\left(\mathbf{f}\right)-\overline{\mathbf{\bar{g}}}\right)^{t}\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C}=\left\langle\left[\left\langle\mathbf{g}\right\rangle_{\mathbf{g}\mid\mathbf{f}}-\mathbf{\bar{g}}\left(\mathbf{f}\right)\right]\left(\mathbf{\bar{g}}\left(\mathbf{f}\right)-\overline{\mathbf{\bar{g}}}\right)^{t}\right\rangle_{\mathbf{f}\mid C}=\mathbf{0},\qquad(15)$

since ${{\left\langle \mathbf{g}\right\rangle}_{\mathbf{g}\mid \mathbf{f}}}=\mathbf{\bar{g}}\left(\mathbf{f}\right)$ . Thus we find that

$\mathbf{K}_{\mathbf{g}\mid C}=\mathbf{\bar{K}}_{\mathbf{g}\mid C}^{\text{noise}}+\mathbf{K}_{\mathbf{\bar{g}}\mid C}^{\text{obj}},\qquad\qquad(16)$

where the first term is the noise covariance matrix averaged over objects and the second term accounts for the covariance induced by the object variability itself. The terms are defined by

$\mathbf{\bar{K}}_{\mathbf{g}\mid C}^{\text{noise}}\equiv\left\langle\left\langle\left(\mathbf{g}-\mathbf{\bar{g}}\left(\mathbf{f}\right)\right)\left(\mathbf{g}-\mathbf{\bar{g}}\left(\mathbf{f}\right)\right)^{t}\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C},\qquad\qquad(17)$

$\mathbf{K}_{\mathbf{\bar{g}}\mid C}^{\text{obj}}\equiv\left\langle\left(\mathbf{\bar{g}}\left(\mathbf{f}\right)-\overline{\mathbf{\bar{g}}}\right)\left(\mathbf{\bar{g}}\left(\mathbf{f}\right)-\overline{\mathbf{\bar{g}}}\right)^{t}\right\rangle_{\mathbf{f}\mid C}.\qquad\qquad(18)$

The covariance decomposition in (16) is just a mathematical tautology; it does not depend on statistical independence of the object and image or on any other assumptions about the object or image statistics.
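A small Monte Carlo experiment makes the decomposition in (16)-(18) concrete. In the sketch below the 'objects' are simply correlated random mean vectors and the noise is Poisson; these are placeholders, not a patient model, and the residual between the overall sample covariance and the sum of the two terms is Monte Carlo error only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check of equation (16): the overall data covariance equals the
# object-averaged noise covariance plus the covariance of the conditional
# means g_bar(f).  The "objects" are correlated lognormal mean vectors.
M, n_objects, n_noise = 5, 20_000, 10
L = 0.1 * rng.normal(size=(M, M))
g_bar = 100.0 * np.exp(rng.normal(size=(n_objects, M)) @ L)   # conditional means

# Poisson noise conditional on each object
g = rng.poisson(np.repeat(g_bar, n_noise, axis=0)).astype(float)

K_total = np.cov(g, rowvar=False)            # left-hand side of eq. (16)
K_noise = np.diag(g_bar.mean(axis=0))        # averaged Poisson term, eq. (17)
K_obj = np.cov(g_bar, rowvar=False)          # object-variability term, eq. (18)

resid = K_total - (K_noise + K_obj)
print(np.abs(resid).max() / np.abs(K_total).max())   # small; Monte Carlo error only
```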

In situations where the conditional noise covariance matrix Kgf is diagonal, such as (9), the averaged noise covariance matrix $\mathbf{\bar{K}}_{\mathbf{g}\mid \mathbf{C}}^{\text{noise}}$ is also diagonal; the average of diagonal matrices is diagonal. The object term $\mathbf{K}_{\mathbf{\bar{g}}\mid \mathbf{C}}^{\text{obj}}$ , however, is never diagonal; interesting objects always have point-to-point correlations. Because objects are functions of continuous variables, we must describe these correlations by a covariance function, rather than a covariance matrix; the definition is

$k_{\mathbf{f}}\left(\mathbf{r},t;\mathbf{r}',t'\right)\equiv\left\langle\left[f\left(\mathbf{r},t\right)-\bar{f}\left(\mathbf{r},t\right)\right]\left[f\left(\mathbf{r}',t'\right)-\bar{f}\left(\mathbf{r}',t'\right)\right]\right\rangle_{\mathbf{f}\mid C}.\qquad\qquad(19)$

If we regard kf (r, t; r', t') as the kernel of a continuous-to-continuous operator ${{\mathcal{K}}_{\mathbf{f}\mid \mathbf{C}}}$ , and $\mathcal{H}$ is linear, then we can express (18) compactly as

$\mathbf{K}_{\mathbf{\bar{g}}\mid C}^{\text{obj}}=\mathcal{H}\,\mathcal{K}_{\mathbf{f}\mid C}\,\mathcal{H}^{\dagger},\qquad\qquad(20)$

where ${{\mathcal{H}}^{\dagger}}$ is the adjoint (back-projection) operator corresponding to $\mathcal{H}$ ; for more details, see Barrett and Myers (2004).

For tomography, we can also define a covariance matrix for the reconstructed images rather than raw projection data, just by replacing g with $\mathbf{\hat{f}}$ in (13). For a linear reconstruction operator and a linear system, it is easy to show that

$\mathbf{K}_{\mathbf{\hat{f}}\mid C}=\mathbf{O}\,\mathbf{\bar{K}}_{\mathbf{g}\mid C}^{\text{noise}}\,\mathbf{O}^{t}+\mathbf{O}\,\mathcal{H}\,\mathcal{K}_{\mathbf{f}\mid C}\,\mathcal{H}^{\dagger}\,\mathbf{O}^{t}.\qquad\qquad(21)$

Again we have decomposed the covariance matrix into a term arising from noise (as propagated through the reconstruction algorithm) and a term arising from object variability (as propagated through both the system operator and the algorithm). Now, however, neither term is diagonal; even if the detectors do not introduce correlations, the reconstruction algorithm will.
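For a linear algorithm, the propagation in (21) amounts to a pair of matrix congruences, as in the following sketch. The toy system matrix, the pseudoinverse 'reconstruction' and the covariance models are placeholders chosen only to show the bookkeeping, not a real CT or SPECT geometry.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of equation (21): propagate the noise and object covariance terms of
# the raw data through a linear reconstruction matrix O.
M, N = 30, 20
H = rng.normal(size=(M, N))               # toy discretized system matrix
O = np.linalg.pinv(H)                     # a simple linear "reconstruction"

K_noise = np.diag(rng.uniform(50.0, 150.0, M))                 # diagonal detector noise
dist = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
K_f = 4.0 * np.exp(-dist / 2.0)                                # correlated object model
K_obj = H @ K_f @ H.T                                          # object term in data space, eq. (20)

K_fhat = O @ K_noise @ O.T + O @ K_obj @ O.T                   # equation (21)
print(K_fhat.shape)                        # (20, 20)
print(np.abs(K_fhat - np.diag(np.diag(K_fhat))).max())         # nonzero: not diagonal
```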

With nonlinear reconstruction algorithms or systems, decomposition of the covariance matrix into noise and object terms is still valid, but both terms are more complicated than indicated in (21). For a detailed treatment of the noise statistics of images produced by nonlinear reconstruction algorithms, see chapter 15 in Barrett and Myers (2004), Barrett et al (1994), Wilson et al (1994), Fessler (1996), Wang and Gindi (1997) and Qi and Leahy (2000), and for a method of estimating the object covariance term for nonlinear reconstructions, see section 6.3 of this paper.

Covariance decompositions are important in this paper because the two terms depend differently on patient dose (see section 6).

3. Principles of task-based assessment of image quality

3.1. Efficacy

In classic papers published in the 1990s, Fryback and Thornbury (1991) and Thornbury (1994) proposed a systematic approach to expressing the efficacy of diagnostic imaging. They identified six distinct stages of efficacy:

  • Stage 1. Technical capacity
  • Stage 2. Diagnostic accuracy
  • Stage 3. Diagnostic impact
  • Stage 4. Therapeutic impact
  • Stage 5. Patient outcomes
  • Stage 6. Societal outcomes

Stage 1 includes all of the things that engineers measure and attempt to optimize in designing medical imaging systems. Functional characterizations such as point spread function (PSF), modulation transfer function (MTF) and noise power spectrum (NPS) and summary measures such as spatial, temporal and energy resolution and count-rate capability all fit under technical capacity, Stage 1 of efficacy. For purposes of this paper, radiation dose is a technical characterization that is also part of Stage 1.

An immediate difficulty in defining efficacy in purely technical terms is that there are many different parameters and functions that one can—and should—determine as part of a comprehensive system characterization, but it is not clear how one can combine them all into a single scalar merit function that can be used for system optimization.

This difficulty is addressed in Stage 2 of efficacy. The most common measures of the accuracy of a diagnostic test are its sensitivity and specificity. As we shall see in section 3.2, these quantities can be combined into receiver operating characteristic (ROC) curves, and scalar metrics such as the area under the curve (AUC) or a detectability index derived from AUC can be used for optimization (Metz 1978, Metz 2006). Again for purposes of this paper, the variation of AUC with some measure of radiation dose is a relevant metric for the effect of dose on Stage 2 efficacy.

Stage 3, diagnostic impact, relates to the ability of a test to improve diagnostic decisions. This stage is the basis for studies of comparative effectiveness, often quantified by pre- and post-test probability of disease or by the expected utility of the diagnosis. Because diagnostic decisions are commonly based on quantitative information derived from images, we also include performance on parameter-estimation tasks as a metric for Stage 3 efficacy. Radiation dose affects the noise in an image and hence the accuracy of any quantitative information extracted from the images.

Stage 4, therapeutic impact, specifies efficacy of a diagnostic test by its ability to improve therapeutic choices, at least in the short term. In radiation therapy, for example, image data are used to plan a treatment that will control tumor growth while minimizing acute radiation damage to normal organs. The success of this plan depends in part on the quality of the information derived from the images, which in turn, for imaging with ionizing radiation, depends on the radiation dose to the patient in the imaging step.

Stage 5 refers to longer-term outcomes, specified in terms of length or quality of life. A common metric is quality-adjusted life years (QALYs) (Pliskin et al 1980, Mortimer and Segal 2007). Much of the literature on Stage 5 efficacy of imaging attempts to relate the benefit of an imaging study, in QALYs, to the monetary cost of the study. In imaging with ionizing radiation, however, improved diagnostic accuracy and impact (Stages 2–4) can increase the length or quality of life, but the concomitant radiation dose can decrease it.

Stage 6, societal outcomes, is the most difficult kind of efficacy to assess. It is here that the contentious issues of the epidemiological effects of low-level radiation arise. Nevertheless the eventual goal of research into the risk-benefit tradeoff of imaging with ionizing radiation is to extend the efficacy analysis to Stages 5 and 6.

3.2. Tasks and figures of merit

This subsection introduces the tasks and the corresponding figures of merit used to quantify efficacy. Most of the emphasis here is on Stage 2 efficacy, but some discussion of the later stages is also included. Mathematical observers for performing the tasks are summarized in sections 3.3 and 3.4.

The task in imaging is always to draw inferences about the object that produced an image. Because the image data are random, as discussed in the previous section, we require statistical inferences, and the underlying theory is referred to as statistical decision theory. Useful general references for this field include (Melsa and Cohn 1978, Kay 1998, Wickens 2002, Schonhoff and Giordano 2006, Van Trees and Bell 2013).

Generically, the task can be classification, estimation or some combination of the two. A classification task is to assign the object that produced the image to one of two or more disjoint classes. In the context of cancer imaging, the classes can be as simple as 'tumor present' versus 'tumor absent' or 'benign' versus 'malignant'. If there are only two possible classes, as in these examples, the classification task is said to be binary, but often there are more than two classes, such as different types of tumor. A binary classification task in which the objective is to determine whether some feature (such as a tumor) is present in the object is called a detection task.

An estimation task is to determine numerical values for one or more parameters that describe important characteristics of the object. In cancer imaging, an estimation task might be to estimate the volume of a tumor or its uptake of some tumor-specific radiotracer. If only one number is required, it is a scalar estimation task. If the goal is multiple numbers, such as tumor volume, location and uptake, then it is a vector estimation task.

Classification and estimation tasks are often combined in one diagnostic study. The clinician may want to know the type and size of a tumor. We refer to this situation as a joint classification/estimation task. If the presence of the tumor must also be determined before classification and size estimation, it is a joint detection/classification/estimation task.

3.2.1. Binary detection or classification tasks.

Consider a binary task in which the goal is to use an image vector g to classify the object that produced it as belonging to either class C1 (e.g. signal present) or C0 (e.g. signal absent). The hypothesis that the object was drawn from class C1 is denoted H1, and the hypothesis that the object was drawn from class C0 is denoted H0.

For algorithmic observers (though not for humans), we can assume that the task is performed without any additional randomness beyond that inherent in the image data (no coin flips or random-number generators can be used) and that no equivocation is permitted (every image must be classified as either C0 or C1). Under these two assumptions, it can be shown (see Barrett et al (1998) or section 13.2 in Barrett and Myers (2004)) that any decision rule is equivalent to computing a scalar test statistic t (g) and comparing it to a threshold tc. If t (g) > tc, then the decision is that the object came from class C1, or equivalently that hypothesis H1 is true; we refer to this decision as ${{\mathcal{D}}_{1}}$ . Conversely, if t (g) < tc, then the decision is that the object came from class C0, so H0 is true, and in this case we refer to the decision as ${{\mathcal{D}}_{0}}$ .

We can express this decision rule succinctly as

$t\left(\mathbf{g}\right)\ \underset{\mathcal{D}_{0}}{\overset{\mathcal{D}_{1}}{\gtrless}}\ t_{c},\qquad\qquad(22)$

which is read, 'choose ${{\mathcal{D}}_{1}}$ if > holds, and vice versa'. Specific choices for t (g), i.e. specific mathematical observers for the binary task, are discussed in section 3.3.

For each image vector g and any particular choice of tc, there are four possible outcome scenarios, as summarized in table 1. The true-positive and true-negative outcomes are correct decisions and the false-positive and false-negative decisions are erroneous. In the terminology of hypothesis testing, a false positive is a Type I error and a false negative is Type II.

Table 1. Possible outcomes for a binary classification task.

1. True positive (TP): H1 is true; observer makes decision ${{\mathcal{D}}_{1}}$ .
2. False positive (FP): H0 is true; observer makes decision ${{\mathcal{D}}_{1}}$ .
3. False negative (FN): H1 is true; observer makes decision ${{\mathcal{D}}_{0}}$ .
4. True negative (TN): H0 is true; observer makes decision ${{\mathcal{D}}_{0}}$ .

Associated with each of these outcomes there is a fraction or probability of occurrence. For example, the true-positive fraction, denoted TPF, is the probability that the observer makes the correct decision ${{\mathcal{D}}_{1}}$ when the object that produced g was actually in class C1, so that hypothesis H1 is true. In signal-detection terminology, TPF is the probability of detecting a signal when it is really there. Conversely, the false-positive fraction, FPF, is the probability of deciding that a signal is present when it really is not, so it is the false-alarm probability.

In terms of conditional probabilities, the four fractions are defined as

$\text{TPF}=\Pr\left(\mathcal{D}_{1}\mid H_{1}\right),\quad\text{FPF}=\Pr\left(\mathcal{D}_{1}\mid H_{0}\right),\quad\text{FNF}=\Pr\left(\mathcal{D}_{0}\mid H_{1}\right),\quad\text{TNF}=\Pr\left(\mathcal{D}_{0}\mid H_{0}\right).\qquad(23)$

In the medical literature, the TPF is referred to as the sensitivity, since it indicates how sensitive the test is to the presence of an abnormality. The TNF is commonly referred to as the specificity; it is the proportion of subjects who truly do not have the abnormality and are correctly identified as normal. Because FPF = 1 − TNF, a test with low specificity is one that produces many false-positive decisions.

These fractions can be computed if we have knowledge of the probability density functions on the test statistic t. For the generic decision rule of (22) and a given threshold tc, we can express the four fractions as

$\text{TPF}=\int_{t_{c}}^{\infty}\text{d}t\ \text{pr}\left(t\mid H_{1}\right),\qquad\qquad(24a)$

$\text{FPF}=\int_{t_{c}}^{\infty}\text{d}t\ \text{pr}\left(t\mid H_{0}\right),\qquad\qquad(24b)$

$\text{FNF}=\int_{-\infty}^{t_{c}}\text{d}t\ \text{pr}\left(t\mid H_{1}\right),\qquad\qquad(24c)$

$\text{TNF}=\int_{-\infty}^{t_{c}}\text{d}t\ \text{pr}\left(t\mid H_{0}\right).\qquad\qquad(24d)$

Recall that Pr denotes a probability and $\text{pr}$ denotes a probability density function; thus 0 ⩽ Pr(· ) ⩽ 1 and $\int_{ - \infty}^{\infty} \text{d}t\ \text{pr}\left(t\mid \cdot \right)=1$ .
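The fractions in (24a)-(24d) can be estimated directly from samples of the test statistic by sweeping the threshold, as in the sketch below. The normal distributions assumed for t under the two hypotheses are illustrative, and the area under the resulting ROC curve is computed via the equivalent Mann-Whitney form, the fraction of H1-H0 pairs for which the H1 statistic is larger.

```python
import numpy as np

rng = np.random.default_rng(4)

# Empirical TPF and FPF, equations (24a)-(24b): for a threshold t_c, count the
# fraction of H1 and H0 test statistics that exceed it.  The normal samples of
# t are illustrative, not tied to any particular observer.
t_H0 = rng.normal(0.0, 1.0, 5000)     # t under H0 (signal absent)
t_H1 = rng.normal(1.5, 1.0, 5000)     # t under H1 (signal present)

for tc in (-1.0, 0.75, 2.5):          # three operating points on one ROC curve
    TPF = (t_H1 > tc).mean()
    FPF = (t_H0 > tc).mean()
    print(f"t_c = {tc:5.2f}:  TPF = {TPF:.3f}, FPF = {FPF:.3f}")

# Area under the ROC curve via the Mann-Whitney form
AUC = (t_H1[:, None] > t_H0[None, :]).mean()
print("AUC ~", round(AUC, 3))         # analytic value 0.5 + 0.5*erf(0.75) ~ 0.856
```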

3.2.2. The ROC curve and figures of merit.

The integrals in (24) show that all four fractions depend on the chosen decision threshold tc, and in particular that decreasing tc increases both TPF and FPF. We can achieve any desired probability of detection, but only at the expense of an increased false-alarm probability. This trade-off is illustrated in a Receiver Operating Characteristic (ROC) curve (see figure 1), which is a plot of TPF versus FPF (or equivalently, sensitivity versus 1 − specificity) as tc is varied.


Figure 1. ROC curves. Left: single curve showing three different operating points A, B and C, corresponding to three decision thresholds tc. Right: three different ROC curves, corresponding to different values of the detectability index dA (hence different AUCs).


As seen from (24a) and (24b), an ROC curve is determined by the two PDFs on the test statistic, $\text{pr}\left(t\mid {{H}_{1}}\right)$ and $\text{pr}\left(t\mid {{H}_{0}}\right)$. If these PDFs are identical, then there is no way to discriminate between the two hypotheses, and the ROC curve is the diagonal line TPF = FPF as shown in figure 1. Conversely, if there is no overlap of $\text{pr}\left(t\mid {{H}_{1}}\right)$ and $\text{pr}\left(t\mid {{H}_{0}}\right)$, then perfect discrimination is possible and the ROC curve rises vertically from the origin to the upper left corner and then runs horizontally to the upper right corner. Perfect separation of the two PDFs (hence perfect performance on this binary classification task) corresponds to an area under the curve (AUC) of 1.0. The line TPF = FPF, indicating no ability to perform the task, has AUC = 0.5. Real ROC curves will have AUCs intermediate between these extremes, as shown on the right in figure 1, and AUC is a commonly used figure of merit for binary tasks.

In an imaging context, AUC depends on the classes of objects considered (C0 and C1), on the form of the test statistic t (g), and on the statistical properties of the image vectors g. For imaging with ionizing radiation, the statistics of the images, and hence the AUC, depend on radiation dose.

An alternative FOM for separability of $\text{pr}\left(t\mid {{H}_{1}}\right)$ and $\text{pr}\left(t\mid {{H}_{0}}\right)$ is the signal-to-noise ratio of the test statistic, denoted SNRt. With the notation developed in section 2.2, we can express the means and variances of the test statistic under the two hypotheses as

$\bar{t}_{j}=\left\langle t\left(\mathbf{g}\right)\right\rangle_{\mathbf{g}\mid H_{j}},\qquad\sigma_{j}^{2}=\left\langle\left[t\left(\mathbf{g}\right)-\bar{t}_{j}\right]^{2}\right\rangle_{\mathbf{g}\mid H_{j}},\qquad j=0,1.\qquad\qquad(25)$

Then SNRt is defined as

$\text{SNR}_{t}^{2}=\frac{\left(\bar{t}_{1}-\bar{t}_{0}\right)^{2}}{\tfrac{1}{2}\sigma_{0}^{2}+\tfrac{1}{2}\sigma_{1}^{2}}.\qquad\qquad(26)$

When $\text{pr}\left(t\mid {{H}_{1}}\right)$ and $\text{pr}\left(t\mid {{H}_{0}}\right)$ are both univariate normal PDFs, the area under the ROC curve is given by Barrett et al (1998) as

$\text{AUC}=\frac{1}{2}+\frac{1}{2}\,\text{erf}\left(\frac{\text{SNR}_{t}}{2}\right),\qquad\qquad(27)$

where erf(· ) is the error function, defined as $\text{erf}(z)=\frac{2}{\sqrt{\pi}}{\int}_{0}^{z}\text{d}t\ \text{exp}\left(-{{t}^{2}}\right)$ .

Even when $\text{pr}\left(t\mid {{H}_{1}}\right)$ and $\text{pr}\left(t\mid {{H}_{0}}\right)$ are not normals, AUC can be used to compute an effective signal-to-noise ratio simply by inverting (27). The resulting figure of merit is denoted dA and defined as (ICRU 1997)

$d_{A}\equiv 2\,\text{erf}^{-1}\left(2\,\text{AUC}-1\right).\qquad\qquad(28)$

This definition shows that dA is a nonlinear but monotonic transformation of AUC, so these two figures of merit are equivalent.
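The conversions in (27) and (28) are one-liners in code. The sketch below uses the identity AUC = Φ(dA/√2), with Φ the standard normal cumulative distribution function, which is algebraically equivalent to (27); the specific numerical values are just examples.

```python
from math import erf, sqrt
from statistics import NormalDist

def auc_from_dA(dA: float) -> float:
    """Equation (27): AUC = 1/2 + 1/2 * erf(dA / 2)."""
    return 0.5 + 0.5 * erf(dA / 2.0)

def dA_from_auc(auc: float) -> float:
    """Equation (28), written via the equivalent form AUC = Phi(dA / sqrt(2))."""
    return sqrt(2.0) * NormalDist().inv_cdf(auc)

print(auc_from_dA(1.5))      # ~0.856
print(dA_from_auc(0.856))    # ~1.5 (round trip)
```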

3.2.3. Multiclass tasks.

There has been some effort to extend the ROC paradigm to classification problems with three or more classes, but the results are at best unwieldy. With N classes, N − 1 scalar variables are needed to make a decision and the ROC curve becomes an (N − 1)-dimensional surface.

It is not clear how to relate these generalized ROC concepts to Stage 2 efficacy, but as discussed below it is relatively straightforward in principle to use Stage 3 efficacy for any number of classes.

3.2.4. Stage 3 efficacy in classification tasks.

The FOMs used for binary classification tasks—AUC, dA and SNRt—are properties of the complete ROC curve, so they are independent of the operating point tc. In practice, however, the choice of operating point itself depends on the clinical task and on the consequences of the outcome. In screening mammography, for example, a clinician might choose a low tc (high TPF and FPF) in order to avoid missing any true malignancies, especially if the consequence of a false positive is just that a second test will be performed to confirm the result from mammography. If, on the other hand, the consequence of a false positive is that the patient will undergo an unnecessary surgery, then a higher tc (lower TPF and FPF) is more appropriate.

In order to account for the consequences of clinical decisions, we must use Stage 3 efficacy. The key feature that sets Stage 3 apart from Stage 2 is the introduction of some measure of the clinical usefulness of the diagnostic information. One way to do this is to introduce the concept of utility.

The utility of a classification outcome has been defined, somewhat subjectively, as a measure of the desirability of the outcome relative to other outcomes (Patton and Woolfenden 1989). The utility can also be viewed as the negative of the cost of the outcome, especially if 'cost' is interpreted more broadly than simply monetary cost. The methodology of Fryback and Thornbury is to elicit estimates of utility from questionnaires submitted to expert clinicians.

The overall diagnostic utility of an imaging test can be evaluated as an expectation of the utility of the test over the population of patients and outcomes. For classification tasks with N classes and imaging system Sk, the overall utility Uk is defined as

$U_{k}=\sum_{i=1}^{N}\sum_{j=1}^{N}u_{ij}\Pr\left(\mathcal{D}_{i}\mid H_{j},S_{k}\right)\Pr\left(H_{j}\right),\qquad\qquad(29)$

where uij is the utility assigned a priori to decision ${{\mathcal{D}}_{i}}$ when hypothesis Hj is true, and Pr(Hj) is the probability that Hj is true (i.e. the object being imaged came from class Cj) in the population under study. The conditional probability $\Pr \left({{\mathcal{D}}_{i}}\mid {{H}_{j}},{{S}_{k}}\right)$ is the generalization of TPF, FPF, etc to more than 2 classes; it depends on the decision rule and on the statistical properties of the image vectors g, hence on the imaging system and the radiation dose. Note that the other factors in the summand of (29), uij and Pr(Hj), are independent of the imaging system and the radiation dose.
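For a two-class problem, (29) reduces to a sum of four terms, as in the sketch below. The utility matrix, prevalence and decision probabilities are made-up illustrative numbers intended only to show how the pieces combine.

```python
import numpy as np

# Hedged sketch of equation (29) for two classes and one system S_k.
# All numbers below are hypothetical.
u = np.array([[ 0.0, -5.0],    # u[i][j]: utility of decision D_i when H_j is true
              [-1.0, 10.0]])   # here a false negative (i=0, j=1) is the costliest outcome
prevalence = np.array([0.9, 0.1])           # Pr(H_0), Pr(H_1)
P_decision = np.array([[0.85, 0.10],        # Pr(D_0 | H_0, S_k), Pr(D_0 | H_1, S_k)
                       [0.15, 0.90]])       # Pr(D_1 | H_0, S_k), Pr(D_1 | H_1, S_k)

U_k = np.sum(u * P_decision * prevalence[None, :])   # overall utility, eq. (29)
print(U_k)
```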

Wunderlich and Abbey (2013) have shown that utility can also be used for choosing the performance-assessment paradigm itself.

The concept of expected utility, and the assessment of diagnostic image quality by Stage 3 efficacy in general, has been criticized on the grounds that it is difficult to know what utilities to assign to different outcomes and what probabilities of disease to use for computing the expected utility (Metz 2006). The increasing availability of Big Data should mitigate the latter problem, and the dependence of overall utility on the choice of the component utilities uij can be regarded as an opportunity for further research rather than a problem. We can in principle study the variation of $\Pr \left({{\mathcal{D}}_{i}}\mid {{H}_{j}},{{S}_{k}}\right)$ with radiation dose for different true hypotheses and decision outcomes and determine not only the overall effect of radiation dose but also its effect on various possible decision outcomes; if the outcomes for which large values of $\Pr \left({{\mathcal{D}}_{i}}\mid {{H}_{j}},{{S}_{k}}\right)$ require large dose are associated with relatively small utilities, then the dose can be reduced with little effect on overall utility.

Moreover, expected utility can be used to compare different imaging systems or to study the effect of radiation dose with a given system just by selecting reasonable values for uij and Pr(Hj) and holding them constant across the comparisons. The robustness of the conclusions can then be checked by varying uij and Pr(Hj) after the fact. In many cases it will suffice if the rank order of the comparisons is maintained as utilities and probabilities are varied over reasonable ranges.

On the other hand, defensible utilities are essential if we want to evaluate image quality not only for Stage 3 efficacy but also for Stages 5 and 6, and in particular if we want to study the impact of radiation dose on long-term patient outcomes and on society.

3.2.5. Estimation tasks.

In image-quality analysis, the goal of an estimation task is to use image data to estimate one or more numerical parameters defined in terms of the object. This goal is distinct from the goals in other areas of imaging, such as data mining, pattern recognition, texture analysis and computer-aided diagnosis (CAD), which often use parameters of an image.

To illustrate this distinction, there is considerable current interest in the heterogeneity of the uptake of fluorodeoxyglucose (FDG) in tumors, as observed in PET scans, because heterogeneity can have prognostic value for radionuclide therapy. Various metrics from the literature on texture analysis (see, for example, Tixier et al (2012)) have been proposed for this purpose, but they describe the heterogeneity of the image, not the object. Noise in the image can introduce heterogeneity in the image where none exists in the object, and limited resolution of the imaging system can hide fine structures that are really present. The identification of image texture features with physical properties of the object, much less with biological significance, has been criticized by Brooks (2013). An approach to avoiding this problem and directly estimating object texture parameters is given in Kupinski et al (2003b).

A scalar parameter θ(f), which is defined in terms of the object f, is said to be linear if it can be written as a scalar product. If f represents a time-independent spatial function f(r), the scalar product has the form (see (23)):

$\theta\left(\mathbf{f}\right)=\left(\boldsymbol{\chi},\mathbf{f}\right)\equiv\int_{\infty}\text{d}^{3}r\ \chi\left(\mathbf{r}\right)f\left(\mathbf{r}\right),\qquad\qquad(30)$

where χ represents the spatial function χ(r) (strictly, as a vector in a Hilbert space), and $\left(\boldsymbol{\chi},\mathbf{f}\right)$ is a notation for the scalar product of functions. As an example, if f(r) is a radionuclide distribution (activity per unit volume) and χ(r) is an indicator function equal to 1 for r inside the boundary of some region of interest (ROI) and 0 outside, then the integral in (30) is the total amount of activity in the ROI.

If we have a raw image data vector g, we can use one of the estimation rules to be discussed in section 3.4 to form an estimate $\hat{\theta}\left(\mathbf{g}\right)$ . On the other hand, if the data vector is a reconstructed image $\mathbf{\hat{f}}$ , then the estimate is denoted $\hat{\theta}\left(\mathbf{\hat{f}}\right)$ . For simplicity, we consider only $\hat{\theta}\left(\mathbf{g}\right)$ here.

Because g is random, $\hat{\theta}\left(\mathbf{g}\right)$ is also random. That is, if you repeatedly image the same object, hence keeping θ(f) constant, you will get a distribution of random images g and hence a distribution of values of $\hat{\theta}\left(\mathbf{g}\right)$ .

The full statistics of $\hat{\theta}\left(\mathbf{g}\right)$ can be difficult to derive. For this reason, it is common practice to provide the bias and variance only. These quantities are defined, respectively, as

$b_{\hat{\theta}}\left(\mathbf{f}\right)\equiv\left\langle\hat{\theta}\left(\mathbf{g}\right)-\theta\left(\mathbf{f}\right)\right\rangle_{\mathbf{g}\mid\mathbf{f}},\qquad\text{Var}\left\{\hat{\theta}\mid\mathbf{f}\right\}\equiv\left\langle\left[\hat{\theta}\left(\mathbf{g}\right)-\overline{\hat{\theta}}\left(\mathbf{f}\right)\right]^{2}\right\rangle_{\mathbf{g}\mid\mathbf{f}}.\qquad\qquad(31)$

Thus the bias is the average deviation of the estimate from the true θ (f), and the variance is the mean-square deviation of the estimate from the average of the estimates themselves. Bias is the systematic error and variance is the random error in the measurement of θ.

Bias and variance can be combined into a single FOM, the mean-square error (MSE), which is defined as

$\text{MSE}\left(\mathbf{f}\right)\equiv\left\langle\left[\hat{\theta}\left(\mathbf{g}\right)-\theta\left(\mathbf{f}\right)\right]^{2}\right\rangle_{\mathbf{g}\mid\mathbf{f}}=\text{Var}\left\{\hat{\theta}\mid\mathbf{f}\right\}+\left[b_{\hat{\theta}}\left(\mathbf{f}\right)\right]^{2}.\qquad\qquad(32)$

Thus MSE accounts for the total deviation (systematic and random) between the estimate and the true θ.
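The relation MSE = variance + bias² contained in (31) and (32) can be illustrated with a deliberately crude estimator, as in the sketch below. The 'imaging system' is reduced to Poisson counts whose mean is proportional to the true parameter, and all constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# Monte Carlo illustration of equations (31)-(32) for a scalar estimate.
theta_true = 120.0                        # true parameter theta(f)
sensitivity = 0.8                         # fraction of emissions detected (hypothetical)
blur_loss = 0.9                           # counts spilling out of the ROI (hypothetical)

mean_counts = theta_true * sensitivity * blur_loss
n_trials = 100_000
theta_hat = rng.poisson(mean_counts, n_trials) / sensitivity   # naive estimator

bias = theta_hat.mean() - theta_true                  # systematic error
variance = theta_hat.var()                            # random error
mse = np.mean((theta_hat - theta_true) ** 2)          # total error, eq. (32)

print(bias, variance, mse, variance + bias**2)        # MSE = Var + bias^2
```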

As the notation implies, bias, variance and MSE all depend on the object f; it is not correct, however, to say that these metrics depend on the true value of the parameter. The problem is that a linear CD imaging operator $\mathcal{H}$ necessarily has null functions, for which $\mathcal{H}{{\mathbf{f}}_{\text{null}}}=0$ . Moreover, any object vector can be decomposed uniquely into a null component and an orthogonal measurement component, f = fmeas + fnull. Objects that differ by a null function produce the same mean data $\mathbf{\bar{g}}$ , and the same conditional PDF $\text{pr}\left(\mathbf{g}\mid \mathbf{f}\right)$ , so they must produce the same conditional mean estimate, $\overline{{\hat{\theta}}}\left(\mathbf{f}\right)$ , and the same conditional variance, $\text{Var}\left\{\hat{\theta}\mid \mathbf{f}\right\}$ . Objects that differ by a null function do not, however, correspond to the same true value of the parameter unless θ(f) is independent of the null functions. This condition can be stated as θ(f) = θ (fmeas). A parameter that satisfies this condition is said to be estimable, which means that there exists an unbiased estimate of it for all true values.

For linear parameters as defined in (30), we can state the estimability condition more directly by writing

$\theta\left(\mathbf{f}\right)=\left(\boldsymbol{\chi},\mathbf{f}\right)=\left(\boldsymbol{\chi}_{\text{meas}},\mathbf{f}_{\text{meas}}\right)+\left(\boldsymbol{\chi}_{\text{null}},\mathbf{f}_{\text{null}}\right).\qquad\qquad(33)$

There are no cross terms $\left({{\boldsymbol{\chi}}_{\text{meas}}},{{\mathbf{f}}_{\text{null}}}\right)$ or $\left({{\boldsymbol{\chi}}_{\text{null}}},{{\mathbf{f}}_{\text{meas}}}\right)$ because null space and measurement space are orthogonal subspaces of object space (Barrett and Myers 2004). Thus, for a general object, a linear parameter is estimable only if the function that defines it, χ(r), contains no null functions, as determined, for example, by an SVD analysis. In particular, the integral of an object over a voxel is virtually never estimable. Intuitively, the sharp edges of the voxel function contain high spatial frequencies that are not captured by the imaging system; for more detail, see Whitaker et al (2008) and Kupinski et al (2013).
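The following sketch shows one way such an SVD check might look for a small, fully discretized toy system: the right singular vectors with nonzero singular values span the measurement space, and the fraction of a template's energy lying outside that space indicates whether the corresponding linear parameter is estimable. The matrix sizes and templates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

# Estimability check for a linear parameter theta = (chi, f), using a toy
# discretized system matrix H (M measurements, N voxels, M < N so null
# functions necessarily exist).
M, N = 20, 50
H = rng.normal(size=(M, N))

# Row space of H = measurement space; its orthogonal complement = null space
_, _, Vt = np.linalg.svd(H, full_matrices=True)
V_meas = Vt[:M].T               # basis of measurement space (H has rank M here)

def null_fraction(chi: np.ndarray) -> float:
    """Fraction of chi's energy lying in the null space of H."""
    chi_meas = V_meas @ (V_meas.T @ chi)
    return float(np.sum((chi - chi_meas) ** 2) / np.sum(chi ** 2))

chi_voxel = np.zeros(N)
chi_voxel[10] = 1.0                                # single-voxel "ROI" template
chi_smooth = V_meas @ rng.normal(size=M)           # built from measurement space

print(null_fraction(chi_voxel))    # > 0: the voxel integral is not estimable
print(null_fraction(chi_smooth))   # ~ 0: estimable by construction
```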

One way to account for null functions in the figure of merit for a scalar estimation task is to average the MSE over some class of objects, called an ensemble in this context. The ensemble mean-square error (EMSE) for object class C is defined as

$\text{EMSE}\equiv\left\langle\left\langle\left[\hat{\theta}\left(\mathbf{g}\right)-\theta\left(\mathbf{f}\right)\right]^{2}\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C}.\qquad\qquad(34)$

3.2.6. Vector parameters.

The discussion above on estimation of a single scalar parameter is readily extended to a set of P parameters. The parameters {θp, p = 1, ..., P } can be organized into a P × 1 parameter vector θ, and the estimates can be similarly organized as a P × 1 vector $\mathbf{\hat{\theta}}$ . Likewise the biases and variances are components of P × 1 vectors. A vector parameter θ(f) is estimable if and only if every component is estimable: θp(f) = θp(fmeas).

For P parameters, the EMSE is given by

$\text{EMSE}=\left\langle\left\langle\left\|\hat{\boldsymbol{\theta}}\left(\mathbf{g}\right)-\boldsymbol{\theta}\left(\mathbf{f}\right)\right\|^{2}\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C},\qquad\qquad(35)$

where ∣∣ · ∣∣ denotes the norm of the vector.

3.2.7. Efficacy in estimation tasks.

If the parameter values obtained in an estimation task are used to make diagnostic decisions, then Stage 2 efficacy (diagnostic accuracy) becomes relevant, and if we want to discuss the diagnostic impact of an estimation task, thereby moving on to Stage 3 efficacy, we must assign utilities to the outcomes. For classification tasks, we defined uij as the utility of making decision ${{\mathcal{D}}_{i}}$ when hypothesis Hj is true. A natural analog of this definition for a vector estimation task is the utility of obtaining estimate $\widehat{\boldsymbol{\theta }}$ when the true parameter is θ, denoted $u\left(\widehat{\boldsymbol{\theta }},\boldsymbol{\theta }\right)$ .

It is common in the estimation literature to use a quadratic cost function $c\left(\hat{\boldsymbol{\theta}},\boldsymbol{\theta}\right)={{\left\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}\right\|}^{2}}$. Since a cost can be viewed as a constant minus a utility, the negative of the EMSE in (35) is a possible expected utility associated with this quadratic cost.

EMSE is not directly related to diagnosis or Stage 3 efficacy, in part because it treats all components of the vector θ as equally important to the figure of merit. Some of the components may be nuisance parameters, defined as ones that influence the image data g but are of no direct clinical interest. It may be necessary to estimate the nuisance parameters in order to get good estimates of the parameters of interest, but they should be omitted from the EMSE if diagnostic relevance is to be ascribed to that metric. Conversely, high utilities should be assigned to accurate estimates of clinically relevant parameters.

One way to bring these judgments about clinical impact into the estimation figure of merit is to define a weighted EMSE by

$\text{WEMSE}\equiv\left\langle\left\langle\sum_{p=1}^{P}w_{p}\left[\hat{\theta}_{p}\left(\mathbf{g}\right)-\theta_{p}\left(\mathbf{f}\right)\right]^{2}\right\rangle_{\mathbf{g}\mid\mathbf{f}}\right\rangle_{\mathbf{f}\mid C},\qquad\qquad(36)$

where all of the weights wp are nonnegative. A component considered to be a nuisance parameter is assigned the weight wp = 0, and components of high clinical relevance are assigned high weights. As a side benefit, if different components of θ(f) have different units, the weights can be used to normalize each component by either the variance or the square of the mean of that component, thereby making WEMSE dimensionless.

Even with these modifications, however, there can be objections to relating EMSE to efficacy. There is no reason other than mathematical convenience to assume that the cost increases quadratically with the discrepancy between an estimate of a parameter and its true value. We know that the bias term in MSE or EMSE is influenced by null functions of the imaging system, and this term can be arbitrarily large because a null function remains a null function when it is multiplied by a constant. Rather than accepting a large MSE as indicative of low clinical utility, it is better to construct utility functions more directly related to the clinical tasks.

Diagnosis is ultimately about classification. If one or more of the estimated parameters is used for classification, then the expected utility as defined in (29) is a relevant FOM for an estimation task.

3.2.8. Joint detection and estimation.

In some clinical scenarios, one wants to detect a tumor or other lesion and, if it is present, estimate some numerical parameters such as the size or location of the lesion. These situations lead to several new kinds of ROC curve, including the localization ROC (LROC) and estimation ROC (EROC), both of which are illustrated in figure 2.


Figure 2. Left: schematic localization ROC (LROC) curve. Right: schematic estimation ROC (EROC) curve.


The LROC curve plots the joint probability of detection and correct localization (within some tolerance) versus the false-positive fraction as the decision threshold for detection is varied. The EROC curve (Clarkson 2007), a generalization of LROC, plots the expected utility of a true positive decision against the false-positive fraction, again as the decision threshold is varied. An LROC curve is an EROC where the parameter being estimated is the lesion location and the utility function is unity if the estimate is within the set tolerance of the true location, and zero otherwise.

3.2.9. Phantoms for search tasks.

Popescu and Myers (2013) recently published a paper describing a signal-search paradigm, along with a phantom design, that enables the evaluation and comparison of iterative reconstruction algorithms in terms of signal detectability versus dose performance using a few tens of images. Standard commercial phantoms for CT quality assessment have limited utility for quantitative evaluation of image quality, because of the limited area available for deriving the statistics of a model observer in the signal-absent condition. Moreover, such phantoms yield only one signal size/contrast combination per image. Phantoms that allow for multiple realizations of background-only and signal-present regions in each image, along with the addition of search as a component of the detection task, enable the investigator to customize the signal detectability level through adjustment of the search region area so that meaningful image quality comparisons can be made across algorithms or imaging systems with a fairly small number of images.

3.2.10. Tasks related to radiation therapy and Stage 4 efficacy.

Imaging with ionizing radiation is often used in the planning and monitoring of radiation therapy. For a discussion of anatomical imaging for radiotherapy applications, see Evans (2008), and for a discussion of the role of PET in radiotherapy, see Nestle et al (2009).

A recent paper in this journal (Barrett et al 2013) discussed how radiobiological models could be used to assess image quality in terms of therapeutic efficacy, defined by the tradeoff between the probability of tumor control and the probability of some critical adverse reaction in normal tissues. In essence, this paper defines image quality in terms of Stage 4 efficacy, and if we assign utilities, the theory is applicable to Stage 5.

3.3. Observers for classification and detection tasks

3.3.1. Ideal observer for binary classification tasks.

The ideal observer (IO) for a binary classification task can be defined in several equivalent ways: It maximizes the TPF for any given FPF; it maximizes the area under the ROC curve (AUC); it maximizes the detectability index dA, and it maximizes the overall utility. A well-known result from statistical decision theory is that the ideal observer implements the generic decision strategy (22) with the test statistic t (g) being the likelihood ratio (LR), defined as

$\Lambda\left(\mathbf{g}\right)\equiv\frac{\text{pr}\left(\mathbf{g}\mid H_{1}\right)}{\text{pr}\left(\mathbf{g}\mid H_{0}\right)},\qquad\qquad(37)$

where $\text{pr}\left(\mathbf{g}\mid {{H}_{j}}\right)$ is the multivariate PDF for the data given that hypothesis Hj (j = 0, 1) is true.

An important, though little recognized, point is that the LR is invariant to any invertible transformation of the data (see Barrett and Myers (2004), pp 829–30). For example, if we base the classification decision on a reconstructed image $\mathbf{\hat{f}}$ instead of raw projection data g, and it is possible to go backwards and compute g from $\mathbf{\hat{f}}$ , then $\Lambda \left(\mathbf{\hat{f}}\right)=\Lambda \left(\mathbf{g}\right)$ , and hence the ROC and any FOM derived from it will be the same for the two data sets. If the reconstruction algorithm is not invertible in this sense, then the IO performance on the reconstruction will be less than or equal to its performance on the projection data; if this were not the case, the ideal observer, being ideal, would include the reconstruction algorithm in her decision process when given g.

An important example of an invertible transformation is the logarithm. The logarithm of the likelihood ratio (LLR),

$\lambda\left(\mathbf{g}\right)\equiv\ln\Lambda\left(\mathbf{g}\right),\qquad\qquad(38)$

produces exactly the same ROC curve as the LR itself, hence the same value for any figure of merit derived from the ROC curve. Thus LR and LLR are equivalent test statistics for the ideal observer.

Analytical forms for the LR and LLR are available for several different families of data PDFs, $\text{pr}\left(\mathbf{g}\mid {{H}_{j}}\right)$ , including multivariate normal (Gaussian), log-normal, exponential and Poisson distributions (see sections 13.2.8 and 13.2.9 in Barrett and Myers (2004)).

Here we focus on multivariate normal data, for which the PDF is given by

$\text{pr}\left(\mathbf{g}\mid H_{j}\right)=\frac{1}{\left[\left(2\pi\right)^{M}\det\left(\mathbf{K}_{j}\right)\right]^{1/2}}\exp\left[-\frac{1}{2}\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}_{j}\right)^{t}\mathbf{K}_{j}^{-1}\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}_{j}\right)\right],\qquad\qquad(39)$

where det(· ) denotes determinant, and ${{\overline{{\mathbf{\bar{g}}}}}_{j}}$ and Kj, (j = 0, 1) are shorthand forms for the mean data vector and covariance matrix, respectively, when the object class is Cj. We remind the reader that the means and covariances include averages over both the conditional noise for a given object, described by the PDF $\text{pr}\left(\mathbf{g}\mid \mathbf{f}\right)$ , and the randomness of the object itself; this double averaging is indicated by the double overbar on ${{\overline{{\mathbf{\bar{g}}}}}_{j}}$ .

With (38) and (39), we see that the LLR is given by

$\lambda\left(\mathbf{g}\right)=\frac{1}{2}\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}_{0}\right)^{t}\mathbf{K}_{0}^{-1}\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}_{0}\right)-\frac{1}{2}\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}_{1}\right)^{t}\mathbf{K}_{1}^{-1}\left(\mathbf{g}-\overline{\mathbf{\bar{g}}}_{1}\right)+\frac{1}{2}\ln\frac{\det\left(\mathbf{K}_{0}\right)}{\det\left(\mathbf{K}_{1}\right)}.\qquad\qquad(40)$

Thus the test statistic for a binary classification task with multivariate-normal data is, in general, a quadratic function of the data vector, referred to as a quadratic discriminant.

In x-ray and gamma-ray imaging, however, it is frequently an excellent approximation to say that the two covariance matrices, K0 and K1, are nearly equal. In transmission x-ray imaging, the signal to be detected results from a localized change in the x-ray attenuation coefficient, but it is observed by its effect on a line integral of the attenuation coefficient through the patient's body. Similarly, in radionuclide imaging a localized change in the activity per unit volume of the tracer is observed by its effect on a line integral through all of the background activity within the body. In neither case is a significant change in the value of a single line-integral projection to be expected; a large change in one projection would be too easily detected from the full set of projections and would be of little use in comparing imaging systems or studying the effect of increased radiation dose. Thus we are interested in detecting small signals in this sense, and the signal is in the mean data vector, so we can usually neglect its effect on the covariances.

If we assume K0 = K1 ≡ Kg, and take advantage of the fact that covariance matrices are symmetric, then (40) becomes

$\lambda\left(\mathbf{g}\right)=\left(\overline{\mathbf{\bar{g}}}_{1}-\overline{\mathbf{\bar{g}}}_{0}\right)^{t}\mathbf{K}_{\mathbf{g}}^{-1}\mathbf{g}+\frac{1}{2}\overline{\mathbf{\bar{g}}}_{0}^{t}\mathbf{K}_{\mathbf{g}}^{-1}\overline{\mathbf{\bar{g}}}_{0}-\frac{1}{2}\overline{\mathbf{\bar{g}}}_{1}^{t}\mathbf{K}_{\mathbf{g}}^{-1}\overline{\mathbf{\bar{g}}}_{1}.\qquad\qquad(41)$

The last two terms in (41) are constants, independent of the data, so they can be lumped into the decision threshold. Thus the ideal observer for a binary classification task with multivariate normal data and equal covariance matrices is a linear discriminant, with test statistic given by

$t_{\text{IO}}(\mathbf{g}) = \left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right)^t\mathbf{K}_{\mathbf{g}}^{-1}\,\mathbf{g}.$    (42)

Because a covariance matrix is symmetric and positive definite, we can define its square root $\mathbf{K}_{\mathbf{g}}^{1/2}$ and then rewrite (42) as

$t_{\text{IO}}(\mathbf{g}) = \left[\mathbf{K}_{\mathbf{g}}^{-1/2}\left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right)\right]^t\left[\mathbf{K}_{\mathbf{g}}^{-1/2}\,\mathbf{g}\right].$    (43)

The M × M matrix $\mathbf{K}_{\mathbf{g}}^{-1/2}$ is called a prewhitening operator because the transformed data, $\mathbf{K}_{\mathbf{g}}^{-1/2}\mathbf{g}$, exhibit uncorrelated or 'white' noise. More precisely, the covariance matrix of $\mathbf{K}_{\mathbf{g}}^{-1/2}\mathbf{g}$ is the M × M unit matrix. Thus the IO test statistic for the present problem is just the scalar product of the prewhitened data and the prewhitened mean difference signal.

Because any linear transformation of a normal random vector or random variable is also normal, it follows that $\text{pr}\left({{t}_{\text{IO}}}\mid {{H}_{1}}\right)$ and $\text{pr}\left({{t}_{\text{IO}}}\mid {{H}_{0}}\right)$ are both univariate normal PDFs. Thus AUCIO is given by (27), with the SNR as defined in (26); it can be shown that

$\text{SNR}_{\text{IO}}^2 = \left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right)^t\mathbf{K}_{\mathbf{g}}^{-1}\left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right).$    (44)

Note that calculation of both the test statistic in (43) and its SNR in (44) requires prior knowledge of the mean data vector for both classes and their common covariance matrix.

Some alternative ways of writing this SNR are:

$\text{SNR}_{\text{IO}}^2 = \text{tr}\left[\mathbf{K}_{\mathbf{g}}^{-1}\left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right)\left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right)^t\right] = \text{tr}\left[\left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right)\left(\overline{\bar{\mathbf{g}}}_1-\overline{\bar{\mathbf{g}}}_0\right)^t\mathbf{K}_{\mathbf{g}}^{-1}\right],$    (45)

where tr(·) denotes the trace (sum of the diagonal elements) of a matrix.
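As a concrete numerical illustration of (42) and (44), the following minimal Python/NumPy sketch computes the prewhitened linear test statistic and its SNR for simulated multivariate-normal data with equal covariances. The dimensions, means and covariance model are hypothetical choices made only to keep the example small, not values from any real imaging system.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64                                   # number of detector elements (toy size)

# Hypothetical means under H0 and H1: flat background plus a small Gaussian signal
g0_bar = np.full(M, 100.0)
signal = 5.0 * np.exp(-0.5 * ((np.arange(M) - 32) / 3.0) ** 2)
g1_bar = g0_bar + signal
delta_g = g1_bar - g0_bar                # mean difference signal

# Common covariance: Poisson-like diagonal noise plus a correlated 'object' term
x = np.arange(M)
K_obj = 4.0 * np.exp(-0.5 * ((x[:, None] - x[None, :]) / 5.0) ** 2)
K_g = np.diag(g0_bar) + K_obj

# Ideal-observer template and SNR, in the form of (42) and (44)
w = np.linalg.solve(K_g, delta_g)        # w = K_g^{-1} delta_g
snr2 = delta_g @ w                       # SNR^2 = delta_g^t K_g^{-1} delta_g
print("SNR =", np.sqrt(snr2))

# Applying the template to a noisy signal-present image gives the test statistic
g = rng.multivariate_normal(g1_bar, K_g)
t = w @ g
```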

3.3.2. The Hotelling observer.

In 1931, the American statistician Harold Hotelling devised a multivariate generalization of Student's t test for determining the separability of two groups of N-dimensional random vectors (Hotelling 1931). In essence, Hotelling's separability measure, which he called T², was a linear discriminant function (Fisher's linear discriminant (Fisher 1936), in fact, though Hotelling's work was five years earlier) in the N-dimensional space.

The concept we now refer to as the Hotelling observer was introduced into the image-quality literature by Smith and Barrett (1986) and Fiete et al (1987). Because T² was in the form of a trace of a matrix, Smith and Fiete referred to this proposed FOM for classification tasks as the Hotelling trace criterion. A key difference, however, is that T² is a random variable (a test statistic) but the Hotelling trace, as Smith and Fiete used the term, is an ensemble average separability (a figure of merit).

Current terminology in the image-quality literature is that the Hotelling observer is a linear discriminant in the form of a scalar product, $t\left(\mathbf{g}\right)={{\mathbf{w}}^{t}}\mathbf{g}=\underset{m=1}{\overset{M}{\sum}} {{w}_{m}}{{g}_{m}}$ , where w is a vector of the same dimension as the data g. We refer to w as a template because the scalar product can be visualized by thinking of g as a transparency on a light box and w as a second transparency laid over the first one. The total light that emerges, in this metaphor, is proportional to the product of the two transmittances, integrated over position on the light box.

Specifically, the Hotelling observer uses the linear test statistic that maximizes SNR2. It can be shown by a straightforward calculation (Barrett and Myers 2004) that the Hotelling test statistic is given by

$t_{\text{Hot}}(\mathbf{g}) = \mathbf{w}_{\text{Hot}}^t\,\mathbf{g} = \Delta\overline{\bar{\mathbf{g}}}^t\,\mathbf{K}_{\mathbf{g}}^{-1}\,\mathbf{g},$    (46)

where $\mathbf{w}_{\text{Hot}}^{t}=\Delta {{\overline{{\mathbf{\bar{g}}}}}^{t}} \mathbf{K}_{\mathbf{g}}^{-1}$ , so that ${{\mathbf{w}}_{\text{Hot}}}=\mathbf{K}_{\mathbf{g}}^{-1} \Delta \overline{{\mathbf{\bar{g}}}}$ (because the inverse of a covariance matrix is symmetric), and now

$\Delta\overline{\bar{\mathbf{g}}} \equiv \overline{\bar{\mathbf{g}}}_1 - \overline{\bar{\mathbf{g}}}_0.$    (47)

The corresponding SNR2 is found to be

$\text{SNR}_{\text{Hot}}^2 = \Delta\overline{\bar{\mathbf{g}}}^t\,\mathbf{K}_{\mathbf{g}}^{-1}\,\Delta\overline{\bar{\mathbf{g}}}.$    (48)

Comparing these results to (42) and (44), respectively, we see that the Hotelling observer is also the ideal observer, in the sense of maximizing AUC, when the data PDFs are multivariate normal with equal covariances under the two hypotheses. Even when these conditions are not satisfied, however, the Hotelling observer is still optimal among all linear observers in the sense of maximizing SNR2.

On the other hand, no linear observer performs well on a signal detection task when the signal is at a random location in the image. Recall that the double overbar on $\Delta \overline{{\mathbf{\bar{g}}}}$ indicates averages over both image noise (Poisson or electronic) and object randomness, and Δ indicates the difference in these averages between the signal-absent and signal-present classes. If the background (non-signal) part of the data has the same statistics in the two classes, then $\Delta \overline{{\mathbf{\bar{g}}}}$ reflects only the randomness of the signal. If the signal can appear randomly anywhere in the image, the components of the Hotelling template vector, [wHot]m, will have very little variation with the pixel index m, and the test statistic tHot(g) is little more than a sum over the components of g, which conveys very little information about the presence or absence of the signal.

For this reason, the use of the Hotelling observer, as we have described it so far, is restricted to situations where the location of a potential signal is known, and the task is to determine whether or not it is present at that location. If we further restrict the signal to be completely nonrandom, then the task is said to be SKE (signal known exactly). These restrictions will be lifted below when we discuss scanning observers that can perform well even when the signal location and other parameters are random variables. The converse of SKE is SKS (signal known statistically).

We might also consider the case of BKE (background known exactly), which for example would describe the task of detecting a lesion in a water tank or other spatially uniform phantom. The much more realistic case is BKS (background known statistically), where there can be organs of unknown size, shape, location and radiographic appearance in the phantom or object being imaged, and there can be unknown fine structure within the organs. The SKE/BKS combination is often useful in image-quality studies, but SKE/BKE should be avoided because the BKE assumption eliminates the object-variability term in the covariance matrix. In particular, as we shall see in section 6, the noise and object-covariance terms depend differently on radiation dose, so conclusions about the effect of dose in a BKE task are not applicable in the real world of random, unknown anatomical backgrounds.

An important early example of a mathematically tractable random background is the lumpy background developed by Rolland and Barrett (1992) and used by Myers et al (1990) to study the optimization of pinhole apertures in nuclear medicine. A later variant called the clustered lumpy background provides an excellent model for the random background inhomogeneity seen in mammography (Bochud et al 1998).

3.3.3. Approximating the covariance matrix and performing its inverse.

An immediate difficulty with implementing the Hotelling observer is that the covariance matrix is enormous. As a simple example, a rotating-camera SPECT system with 64 2D projection images, each consisting of a grid of 128 × 128 pixels, acquires a data vector with approximately $10^6$ elements, so the covariance matrix of the data is $10^6 \times 10^6$, or $10^{12}$ elements. Modern CT systems can deliver even larger data vectors and covariance matrices; a typical single-organ volumetric scan might consist of 2000 projection images, each 2500 × 320 pixels.

The daunting prospect of computing and storing such large matrices, much less inverting them, has led to a malady termed megalopinakophobia (fear of large matrices) (Barrett et al 2001). The key to the treatment of this syndrome is the covariance decomposition of (16) or (21), where the covariance matrix is split into two terms, one arising from Poisson and other noise sources, and one arising because the objects being imaged are unknown and unpredictable, hence best regarded as random processes.

Many studies of task-based image quality these days are performed with simulated objects. Simulations can be very realistic and flexible, and they avoid all issues related to knowing the true class of the object or the true values of parameters being estimated. Most importantly for the present paper, they permit arbitrary variation of the radiation dose delivered to the 'patient' and hence allow control of the relative importance of the two terms in the covariance decomposition. In particular, we can simulate images with no noise, and with modern computers we can simulate as many of them as we wish. Therefore we can generate image data or reconstructions related just to the second term in the decomposition, with no noise term. Analytic models such as (9) then permit construction of the first term at any desired level of radiation dose.

If we have simulated a set of Ns noise-free sample images, and we denote the kth such image as gk, then we can form the M × Ns matrix R given by Smith and Barrett (1986) and Fiete et al (1987),

$\mathbf{R} = \frac{1}{\sqrt{N_s-1}}\left[\mathbf{g}_1-\bar{\mathbf{g}},\ \mathbf{g}_2-\bar{\mathbf{g}},\ \ldots,\ \mathbf{g}_{N_s}-\bar{\mathbf{g}}\right],$    (49)

where

$\bar{\mathbf{g}} = \frac{1}{N_s}\sum_{k=1}^{N_s}\mathbf{g}_k.$    (50)

Thus each column of R is $1/\sqrt{{{N}_{s}}-1}$ times the difference between a sample image and the arithmetic average of all of the sample images.

With these definitions, we can estimate the object covariance by a sample covariance matrix of the form,

$\hat{\mathbf{K}}_{\mathbf{g}}^{\text{obj}} = \mathbf{R}\,\mathbf{R}^t,$    (51)

where the hat indicates an estimate. The overall covariance matrix can then be estimated via

$\hat{\mathbf{K}}_{\mathbf{g}} = \bar{\mathbf{K}}_{\mathbf{g}}^{\text{noise}} + \mathbf{R}\,\mathbf{R}^t.$    (52)

The noise covariance matrix $\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}$ will be diagonal if we use noise models (8) or (9) but it will still depend on the grand mean of the data, which for a linear system is $\overline{{\mathbf{\bar{g}}}}=\mathcal{H}\mathbf{\bar{f}}$ , where $\mathbf{\bar{f}}$ can be estimated as the average of the noise-free images used to construct R. Logically, therefore, we should include a hat on $\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}$ , but in practice the error in the estimate of this term is negligible because we are making use of the analytic noise models and we can treat $\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}$ as a known, full-rank, diagonal matrix.

An exact inverse of the estimated overall covariance matrix ${{\mathbf{\hat{K}}}_{\mathbf{g}}}$ can be obtained without onerous calculations or impossibly large matrix inversions by use of the Woodbury matrix inversion lemma (Tylavsky and Sohie 1986, Barrett and Myers 2004). From (52), the lemma allows us to write

$\left[\hat{\mathbf{K}}_{\mathbf{g}}\right]^{-1} = \left[\bar{\mathbf{K}}_{\mathbf{g}}^{\text{noise}}\right]^{-1} - \left[\bar{\mathbf{K}}_{\mathbf{g}}^{\text{noise}}\right]^{-1}\mathbf{R}\left[\mathbf{I} + \mathbf{R}^t\left[\bar{\mathbf{K}}_{\mathbf{g}}^{\text{noise}}\right]^{-1}\mathbf{R}\right]^{-1}\mathbf{R}^t\left[\bar{\mathbf{K}}_{\mathbf{g}}^{\text{noise}}\right]^{-1},$    (53)

where I is the Ns × Ns identity matrix.

The advantage of this form is that $\mathbf{I}+{{\mathbf{R}}^{t}} {{\left[\mathbf{K}_{\mathbf{g}}^{\text{noise}}\right]}^{-1}} \mathbf{R}$ is an Ns × Ns matrix, readily inverted numerically for Ns up to $10^5$ or so. The matrix $\mathbf{K}_{\mathbf{g}}^{\text{noise}}$ is still M × M, but it is full rank and in fact diagonal if we work with the noise model of (9). Hence all of the inverses indicated in (53), including ${{\left[{{\mathbf{\hat{K}}}_{\mathbf{g}}}\right]}^{-1}}$ itself, will exist.
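The following sketch illustrates (49)–(53) under the assumptions above: a set of hypothetical noise-free sample images is used to form R, the noise covariance is taken to be diagonal as in the Poisson-like model, and the Woodbury form is used to apply the inverse of the estimated covariance to a vector without ever forming an M × M matrix. All sizes and numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
M, Ns = 4096, 200                        # data dimension and number of noise-free samples (toy sizes)

# Hypothetical noise-free sample images (columns of G), e.g. from a lumpy-background simulator
G = 100.0 + 10.0 * rng.standard_normal((M, Ns))
g_grand_mean = G.mean(axis=1)

# Matrix R of (49)-(50): scaled, mean-subtracted samples
R = (G - g_grand_mean[:, None]) / np.sqrt(Ns - 1)

# Diagonal noise covariance from the Poisson-like model, evaluated at the grand mean
K_noise_inv = 1.0 / g_grand_mean         # inverse of the diagonal entries

# Woodbury form of (53): apply [K_noise + R R^t]^{-1} to a vector without an M x M inverse
core = np.eye(Ns) + R.T @ (K_noise_inv[:, None] * R)        # Ns x Ns matrix

def apply_K_inv(v):
    Av = K_noise_inv * v
    return Av - K_noise_inv * (R @ np.linalg.solve(core, R.T @ Av))

delta_g = 0.5 * rng.standard_normal(M)   # hypothetical mean-difference signal
w_hot = apply_K_inv(delta_g)             # Hotelling template, cf. (46)
snr2_hot = delta_g @ w_hot               # cf. (48)
```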

3.3.4. Estimating the Hotelling template.

An alternative to inverting ${{\mathbf{\hat{K}}}_{\mathbf{g}}}$ is to use it in an iterative algorithm to find the Hotelling template. A simple manipulation of (46) shows that this template satisfies ${{\mathbf{K}}_{\mathbf{g}}}{{\mathbf{w}}_{\text{Hot}}}=\Delta \overline{{\mathbf{\bar{g}}}}$ , so an estimate of the template can be found by solving (see (52))

$\hat{\mathbf{K}}_{\mathbf{g}}\,\hat{\mathbf{w}}_{\text{Hot}} = \Delta\overline{\bar{\mathbf{g}}}.$    (54)

A variant of the iterative Landweber algorithm for solving this equation is given in Barrett and Myers (2004); the iteration rule is

$\hat{\mathbf{w}}_{\text{Hot}}^{(n+1)} = \hat{\mathbf{w}}_{\text{Hot}}^{(n)} + \alpha\left[\Delta\overline{\bar{\mathbf{g}}} - \hat{\mathbf{K}}_{\mathbf{g}}\,\hat{\mathbf{w}}_{\text{Hot}}^{(n)}\right],$    (55)

where $\mathbf{\hat{w}}_{\text{Hot}}^{(n)}$ is the estimated template at the nth iteration and α is a positive constant that controls the convergence rate.

Once the template has been estimated, the SNR can be found by applying the template to a set of sample images, determining the mean and variance of the resulting scalar test statistic under each hypothesis, and estimating the observer's performance via (26). Alternatively, we can directly estimate $\text{SNR}_{\text{Hot}}^{2}$ by (48) as $\Delta {{\overline{{\mathbf{\bar{g}}}}}^{t}}{{\mathbf{\hat{w}}}_{\text{Hot}}}$ . For a discussion of the errors in the estimate, see Kupinski et al (2007).
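A minimal sketch of this iterative approach is given below, assuming a generic symmetric positive-definite covariance and a conservatively chosen step size; the covariance model, signal and iteration count are all hypothetical.

```python
import numpy as np

def landweber_hotelling(apply_K, delta_g, alpha, n_iter=2000):
    """Iteratively solve K w = delta_g, as in (54)-(55).

    apply_K : function returning K @ w (the full matrix never needs to be stored)
    alpha   : positive step size; 0 < alpha < 2/lambda_max(K) is needed for convergence
    """
    w = np.zeros_like(delta_g)
    for _ in range(n_iter):
        w = w + alpha * (delta_g - apply_K(w))
    return w

# Usage with a small explicit covariance (all numbers hypothetical):
rng = np.random.default_rng(2)
M = 256
A = rng.standard_normal((M, M))
K = A @ A.T / M + np.eye(M)                  # symmetric positive definite
delta_g = rng.standard_normal(M)

alpha = 1.0 / np.linalg.norm(K, 2)           # conservative step size
w_hat = landweber_hotelling(lambda v: K @ v, delta_g, alpha)
snr2_hat = delta_g @ w_hat                   # direct estimate of SNR^2, cf. (48)
```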

3.3.5. Channelized Hotelling observers.

A channel vector is a relatively low-dimensional vector used in place of the original large data vector to perform the desired detection and/or estimation task. In almost all of the copious literature on this topic (which began with Myers and Barrett (1987)), the jth component of the channel vector is the scalar product of a channel template vector with a data vector of the same dimensionality:

$v_j = \mathbf{u}_j^t\,\mathbf{g} = \sum_{m=1}^{M} u_{jm}\,g_m, \qquad j = 1, \ldots, N_c,$    (56)

where Nc is the number of channels and ujm is the mth component of template vector uj. Equation (56) can also be written in matrix-vector form as

$\mathbf{v} = \mathbf{U}^t\,\mathbf{g},$    (57)

where the jth column of the M × Nc matrix U is the template vector for the jth channel.

By analogy to (46), the weak-signal form of the Hotelling discriminant for operating on channelized data is

$t_{\text{CHO}}(\mathbf{v}) = \Delta\overline{\bar{\mathbf{v}}}^t\,\mathbf{K}_{\mathbf{v}}^{-1}\,\mathbf{v},$    (58)

where

$\Delta\overline{\bar{\mathbf{v}}} = \mathbf{U}^t\,\Delta\overline{\bar{\mathbf{g}}},$    (59)

$\mathbf{K}_{\mathbf{v}} = \mathbf{U}^t\,\mathbf{K}_{\mathbf{g}}\,\mathbf{U}.$    (60)

Since Kv is an Nc × Nc matrix, computation of the CHO test statistic is much less onerous than for the full Hotelling observer.

The signal-to-noise ratio for the channelized Hotelling observer is given by (see (48))

$\text{SNR}_{\text{CHO}}^2 = \Delta\overline{\bar{\mathbf{v}}}^t\,\mathbf{K}_{\mathbf{v}}^{-1}\,\Delta\overline{\bar{\mathbf{v}}}.$    (61)

One possible objective in designing a set of channels is to make $\text{SNR}_{\text{CHO}}^{2}$ as close to $\text{SNR}_{\text{Hot}}^{2}$ as possible. Channels that achieve this goal to a good approximation are said to be efficient, meaning that there is little loss of information (defined by the performance of a Hotelling observer) as a result of the dimensionality reduction inherent in projecting the full data vector onto the channels. The condition for approximate efficiency in this sense is

Equation (62)

for all signals of interest. Channel vectors that satisfy this condition can be regarded as approximate sufficient statistics for the detection problem.

One way to construct efficient channels is to take advantage of the symmetries of the signal and noise in the images under study. For example, if we are dealing with 2D images and we know that the signal is rotationally symmetric and that the noise covariance is isotropic (independent of direction in the plane), then we can expand the channel template in rotationally symmetric functions such as Laguerre–Gauss polynomials (Barrett et al 1998b). For a detailed study of Laguerre–Gauss (LG) channels and the errors in estimating the CHO SNR from sample images, see Gallas and Barrett (2003).
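The sketch below builds a small bank of Laguerre–Gauss channels of the general form just described and estimates the CHO SNR of (58)–(61) from simulated sample images. The grid size, channel width, signal and background statistics are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.special import eval_laguerre

def lg_channels(npix, n_channels, a):
    """Rotationally symmetric Laguerre-Gauss channel templates on an npix x npix grid.

    Uses the usual form u_j(r) ~ exp(-pi r^2/a^2) L_j(2 pi r^2/a^2); the grid size,
    number of channels and width parameter 'a' below are illustrative choices.
    """
    x = np.arange(npix) - (npix - 1) / 2.0
    r2 = x[:, None] ** 2 + x[None, :] ** 2
    U = np.empty((npix * npix, n_channels))
    for j in range(n_channels):
        u = np.exp(-np.pi * r2 / a ** 2) * eval_laguerre(j, 2.0 * np.pi * r2 / a ** 2)
        U[:, j] = u.ravel()
    return U

rng = np.random.default_rng(3)
npix, Nc = 64, 6
U = lg_channels(npix, Nc, a=14.0)

# Hypothetical SKE simulation: Poisson images of a flat background with and without a small square signal
sig = np.zeros((npix, npix)); sig[28:36, 28:36] = 3.0
imgs0 = rng.poisson(50.0, size=(500, npix * npix)).astype(float)
imgs1 = rng.poisson(50.0 + sig.ravel(), size=(500, npix * npix)).astype(float)

v0, v1 = imgs0 @ U, imgs1 @ U                  # channel outputs, eq (57)
dv = v1.mean(axis=0) - v0.mean(axis=0)
K_v = 0.5 * (np.cov(v0.T) + np.cov(v1.T))      # pooled Nc x Nc channel covariance, cf. (60)
snr2_cho = dv @ np.linalg.solve(K_v, dv)       # eq (61)
```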

3.3.6. Human observers and anthropomorphic models.

Much of the early development of statistical decision theory was aimed at analyzing the results of psychophysical studies with human observers (Tanner et al 1954, Peterson et al 1954, Swets 1964, Green and Swets 1966, Chesters 1992), and some of the early papers on the Hotelling observer showed that $\text{SNR}_{\text{Hot}}^{2}$ was a good predictor of human performance on SKE/BKS tasks with uncorrelated noise (Fiete et al 1987, Rolland and Barrett 1992). The work of Myers on channels (Myers and Barrett 1987), however, was motivated by the observation that the Hotelling observer was not a good predictor of human performance when the noise was correlated as a result of the high-pass filter required in tomographic reconstruction. At the time, available psychophysical data on perception in tomographic images (e.g. Judy and Swensson 1981, Burgess 1984, Myers et al 1985) were generally summarized by saying that humans were unable to prewhiten the noise. Myers, however, was able to build on many previous experiments in human vision (e.g. Campbell and Robson 1968, Sachs et al 1971, Graham and Nachmias 1971, Strohmeyer and Julesz 1972, Mostafavi and Sakrison 1976) which showed that humans process visual information through spatial-frequency-selective channels. Typically, the channels had about one octave of bandwidth in the magnitude of the spatial-frequency vector and ∼± 45° selectivity in the orientation of the spatial frequency vector. By incorporating such channels in a channelized Hotelling model, Myers was able to account for the extant data on human detection performance in tomographic images.

Many subsequent studies (Barrett et al 1992, Barrett et al 1993, Abbey and Barrett 2001) confirmed the broad applicability of the CHO as a predictor of human task performance and refined the model by including an additional source of noise internal to the visual system; no adjustable parameters were needed in the resultant model (Burgess and Colborne 1988, Eckstein et al 1997, Abbey and Barrett 2001).

Of particular importance to the present paper, Abbey and Barrett (2001) showed that the CHO correctly predicted the minimum contrast required for a human observer to detect a weak signal as a function of radiation dose, over a range of several decades in dose.

This same paper (Abbey and Barrett 2001) showed that the CHO performance and its correlation with human performance were very insensitive to the spatial-frequency profiles of the channels, so long as all channels had a bandpass structure with zero response at zero spatial frequency. Interestingly, the response of Laguerre–Gauss channels is maximum, not zero, at zero frequency, so LG channels should not be used as models for human observers.

For a comprehensive review of research through 2000 on visual detection tasks in correlated image noise with linear model observers, see Abbey and Bochud (2000), and for a practical guide to model observers for visual detection in synthetic and natural noisy images, see Eckstein et al (2000). A more recent review of these topics is given in Burgess (2011).

Yu et al (2013) investigated how well a channelized Hotelling observer predicts human observer performance, with the aim of validating its use for the evaluation of CT reconstruction algorithms. Their work made use of a 2-alternative forced choice (2AFC) lesion-detection paradigm. They considered two different reconstruction algorithms: a filtered-backprojection (FBP) method and an iterative reconstruction (IR) method. Images of a torso-shaped phantom containing three rods of different diameters that simulated small, medium and large lesions were reconstructed. Because the goal of the work was to show the predictive ability of the model observer as a surrogate for human observers, the observer model incorporated Gabor channels intended to model the human visual system. Sixty channels in all were utilized, comprising six frequency-selective passbands, five orientations and two phases. With this model the investigators found that the performances of the human and model observers were highly correlated. Investigations of this type provide strong evidence that the channelized Hotelling observer with anthropomorphic channels can be used to predict human performance for the evaluation of CT image reconstruction algorithms.
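As a rough illustration of the kind of anthropomorphic channel bank described above, the following sketch constructs 60 Gabor channels (six passbands, five orientations, two phases). The specific center frequencies, envelope widths and phases are illustrative assumptions, not the values used by Yu et al.

```python
import numpy as np

def gabor_channels(npix, freqs, orientations, phases):
    """Bank of 2D Gabor channel templates; all parameter values here are illustrative."""
    x = np.arange(npix) - (npix - 1) / 2.0
    X, Y = np.meshgrid(x, x)
    templates = []
    for f in freqs:                       # center frequency in cycles/pixel
        sigma = 0.5 / f                   # envelope width tied to the passband (assumed, roughly octave-wide)
        for th in orientations:           # carrier orientation
            for ph in phases:             # even/odd phase
                carrier = np.cos(2.0 * np.pi * f * (X * np.cos(th) + Y * np.sin(th)) + ph)
                envelope = np.exp(-(X ** 2 + Y ** 2) / (2.0 * sigma ** 2))
                templates.append((envelope * carrier).ravel())
    return np.array(templates).T          # M x Nc matrix U, to be used as in (57)

freqs = [0.012, 0.024, 0.048, 0.096, 0.192, 0.384]     # six octave-spaced passbands (illustrative)
orientations = np.deg2rad([0, 36, 72, 108, 144])       # five orientations
phases = [0.0, np.pi / 2]                              # two phases
U = gabor_channels(128, freqs, orientations, phases)   # 60 channels in total
```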

The use of a CHO for the assessment of volumetric imaging data sets is presented in Platisa et al (2011).

3.3.7. Ideal observers for detection in non-Gaussian random backgrounds.

Linear observers are computationally convenient, and in section 6.3 they will prove to be useful vehicles for the study of task performance as a function of radiation dose. If the background statistics are not Gaussian, however, the performance of the ideal observer can significantly exceed that of the ideal linear observer, especially at high dose where object variability can dominate over measurement noise.

One fruitful approach to ideal-observer computation in medical imaging is the use of Markov-chain Monte Carlo (MCMC) techniques (Kupinski et al 2003c, Barrett and Myers 2004, He et al 2008, Park and Clarkson 2009), in which one generates a sequence of likelihood ratios drawn from the posterior density of the background given a measured image g or $\mathbf{\hat{f}}$ . The average of this sequence is then an estimate of the LR for that image, and repeating the estimation for many images leads to an ROC curve for the ideal observer. This method can then be used for optimizing imaging hardware (Gross et al 2003) or for determining the efficiency of a CHO (Park and Clarkson 2009) or a human observer (Park et al 2005) relative to the ideal observer.

Another approach is to use surrogate figures of merit based on Fisher information (see section 3.4), which accurately approximate the performance of the ideal observer (Shen and Clarkson 2006, Clarkson and Shen 2010, Clarkson 2012).

3.3.8. Stationary noise and Fourier methods for detection tasks.

Much of the early research on task-based assessment of image quality in radiology used continuous Fourier transforms to define quantities such as TF (transfer function), MTF (modulation transfer function), NPS (noise power spectrum), NEQ (noise equivalent quanta) and DQE (detective quantum efficiency), all of which are functions of the spatial frequency vector ρ. One of the first papers to make use of these quantities was the treatment of image quality in CT presented by Hanson (1979). In that paper Hanson lays out the various contributors to image quality in CT, including the noise power spectrum, the MTF, and the NEQ. Further, Hanson presents signal detection theory and the concept of an optimum receiver that makes use of knowledge of the noise correlations in the image as part of its test statistic, first in tutorial/general form and then as it applies to the particular case of CT, including the concept of dose efficiency in the comparison of systems. For a history of the use of Fourier methods in the evaluation of image quality, see Barrett (2009), and for a detailed critique of its applicability to task-based assessment, see section 16.1.6 of Barrett and Myers (2004).

Continuous Fourier transforms are useful when the image is a function of continuous spatial variables instead of discrete pixel indices; when the imaging system is linear and shift invariant, so that the operator $\mathcal{H}$ can be modeled as a convolution; and when both the noise and object randomness are statistically stationary continuous random processes, so that the autocovariance function of the continuous data depends only on the vector distance between two points of interest. When all of these conditions are satisfied for 2D continuous data, the Hotelling test statistic and SNR in the spatial-frequency domain are given, respectively, by (see (46) and (48))

$t_{\text{Hot}} = \int \mathrm{d}^2\rho\ \frac{\Delta\overline{\bar{G}}^{\,*}(\boldsymbol{\rho})\,G(\boldsymbol{\rho})}{\text{NPS}(\boldsymbol{\rho})}\,, \qquad \text{SNR}_{\text{Hot}}^2 = \int \mathrm{d}^2\rho\ \frac{\left|\Delta\overline{\bar{G}}(\boldsymbol{\rho})\right|^2}{\text{NPS}(\boldsymbol{\rho})}\,,$    (63)

where ρ is a 2D spatial-frequency vector, G (ρ) is the 2D transform of the continuous data g (r), the asterisk denotes complex conjugate, and NPS(ρ) is the noise power spectrum, defined as the 2D Fourier transform of the autocovariance function of the data.

One thing to note about (63), in contrast to all of the other Hotelling expressions in this section, is that there is no inverse covariance present in either tHot or $\text{SNR}_{\text{Hot}}^{2}$ . In essence, under the stated assumptions, the Fourier transform simultaneously diagonalizes both the system operator $\mathcal{H}$ and the covariance operator $\mathcal{K}$ . The system operator (now a convolution) is realizable as a product in the Fourier domain and the inverse of the covariance becomes a simple division by the NPS.

As in the covariance decomposition of (16), the NPS can be expressed as a sum of two terms, one arising from the noise in the data conditional on the object, and one arising from object randomness, provided that both of these effects are described by stationary random processes. Even with the simplest of noise models, (8) or (9), however, the noise in imaging with ionizing radiation is not stationary because it depends on the mean image, which must necessarily vary from point to point in any interesting image. Similarly, the object variability, described in continuous terms by (19), cannot be expected to depend on only r − r'.

Another conclusion to be drawn from (63) is that this form of $\text{SNR}_{\text{Hot}}^{2}$ is independent of signal location; a shift in position of a signal in the space domain corresponds to a phase change in its Fourier transform, so the square modulus of $\Delta \overline{{\bar{G}}}\left(\mathbf{\rho}\right)$ is invariant to signal location.

Further problems arise when one replaces the Fourier integrals required in (63) with discrete Fourier transforms (DFTs) for purposes of computer implementation. The correct discretization of a linear, shift-invariant operator or a stationary autocovariance function is a Toeplitz matrix, but DFTs diagonalize only circulant matrices; for a graphical depiction of the difference between a Toeplitz matrix and its circulant approximation, see figure 7.15 in Barrett and Myers (2004). A DFT will actually undiagonalize a noise covariance described by (8) or (9).
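Subject to the caveats just discussed, DFT-based approximations to (63) are nevertheless widely used. The sketch below is purely illustrative: the mean-difference signal and the two-component NPS are hypothetical, and the overall scale factor depends on the DFT normalization convention adopted.

```python
import numpy as np

npix = 128
x = np.arange(npix) - npix // 2
X, Y = np.meshgrid(x, x)

# Hypothetical mean-difference signal: a small Gaussian blob
delta_g = 2.0 * np.exp(-(X ** 2 + Y ** 2) / (2 * 4.0 ** 2))
DeltaG = np.fft.fft2(delta_g)

# Hypothetical NPS: a white (quantum-noise-like) floor plus a low-frequency 'object' term
fx = np.fft.fftfreq(npix)
rho2 = fx[:, None] ** 2 + fx[None, :] ** 2
nps = 100.0 + 5.0e4 * np.exp(-rho2 / (2 * 0.02 ** 2))

# Discrete analogue of the SNR integral in (63); the prefactor depends on the DFT convention
snr2_fourier = np.sum(np.abs(DeltaG) ** 2 / nps) / npix ** 2
```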

In spite of these difficulties, there is considerable current interest in Fourier-based approaches to task-based assessment, with their simple, intuitive interpretations (Siewerdsen et al 2002, Richard et al 2005, Tward et al 2007, Richard and Siewerdsen 2008, Pineda et al 2008, Pineda et al 2012, Tward and Siewerdsen 2008, Tward and Siewerdsen 2009, Gang et al 2010). In many cases deviations from strict shift-invariance or strict stationarity can be accommodated by considering detection of a signal at a specific location r0 (in 2D or 3D as appropriate), then defining a local point spread function and a local autocovariance function, from which we can use a Fourier transform to get a local modulation transfer function LMTF(ρ; r0) and a local noise power spectrum LNPS(ρ; r0), respectively. There are many options as to how the intermediate functions are defined and how one handles the discretization, but the end result will be a Fourier-based formula for a Hotelling-like figure of merit appropriate to detection of a signal localized at r0.

For more on the mathematics of quasistationary noise and local Fourier metrics for image quality, see sections 13.2.3 and 13.2.4 in Barrett and Myers (2004). Fourier-based approaches to task-based assessment will be discussed further in section 7 of this paper.

3.4. Observers for estimation and joint classification/estimation tasks

For an estimation task in imaging, the term observer refers to an algorithm for assigning numerical values to all components of a parameter vector on the basis of some data vector. In most of this subsection we denote the data available for performing the task as g and refer to the estimate as $\widehat{\boldsymbol{\theta }}\left(\mathbf{g}\right)$ , but the results still apply if we substitute a reconstructed image $\mathbf{\hat{f}}$ for g.

We begin with a discussion of general estimation principles and then specialize them to the kinds of data and parameters of interest in imaging with ionizing radiation. The dose dependence of the resulting figures of merit will be discussed in section 6.4.

3.4.1. Likelihood and Fisher information.

The likelihood function for estimating a P × 1 parameter vector θ from an M × 1 data vector g is defined as the conditional probability density function (or probability, for discrete-valued data), $\text{pr}\left(\mathbf{g}\mid \boldsymbol{\theta }\right)$ . For a particular data vector, $\text{pr}\left(\mathbf{g}\mid \boldsymbol{\theta }\right)$ can be regarded as a function in a P-dimensional parameter space.

The score s(g) is a P × 1 vector of partial derivatives of the log of the likelihood function, with components given by

$s_p(\mathbf{g}) = \frac{\partial}{\partial\theta_p}\,\ln\,\text{pr}(\mathbf{g}\mid\boldsymbol{\theta}), \qquad p = 1, \ldots, P.$    (64)

In terms of repeated data acquisitions with θ fixed, the score is a random vector, and its mean is readily shown to be zero, i.e. $\langle\mathbf{s}(\mathbf{g})\rangle_{\mathbf{g}\mid\boldsymbol{\theta}} = \mathbf{0}$. The P × P covariance matrix of the score, which is called the Fisher Information Matrix (FIM) and denoted F(θ), is defined in component form by

$\left[\mathbf{F}(\boldsymbol{\theta})\right]_{pp'} = \left\langle \frac{\partial\,\ln\,\text{pr}(\mathbf{g}\mid\boldsymbol{\theta})}{\partial\theta_p}\ \frac{\partial\,\ln\,\text{pr}(\mathbf{g}\mid\boldsymbol{\theta})}{\partial\theta_{p'}}\right\rangle_{\mathbf{g}\mid\boldsymbol{\theta}}.$    (65)

The FIM sets a fundamental lower bound, called the Cramér-Rao Bound (CRB), on the conditional variance of any unbiased estimator. If the bias of $\widehat{\boldsymbol{\theta }}\left(\mathbf{g}\right)$ is zero, then it can be shown that the variance of the pth component must satisfy

$\text{Var}\left\{\hat{\theta}_p\,\big|\,\boldsymbol{\theta}\right\} \geq \left[\mathbf{F}^{-1}(\boldsymbol{\theta})\right]_{pp}.$    (66)

An unbiased estimator for which this inequality becomes an equality is said to be efficient, which means that it achieves the best possible variance among all unbiased estimators for a specific estimation task.

It is always possible to find an estimator with zero bias just by ignoring the data and setting $\widehat{\boldsymbol{\theta }}$ to a constant. More generally, variance can always be made arbitrarily small just by accepting as much bias as needed. The tradeoff between variance and bias in an estimation task is analogous to the tradeoff between sensitivity and specificity in a detection task; in both cases, the tradeoff should be made in terms of clinical efficacy.

The form of the CRB in (66) immediately suggests that one or more of the diagonal elements of the inverse of the FIM can be used as FOMs for estimation tasks (Kupinski et al 2003a).
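As a simple numerical illustration, the sketch below evaluates the FIM and the CRB of (64)–(66) for a toy model with independent Poisson data whose mean depends on a signal amplitude and width; the model and its parameter values are hypothetical. For independent Poisson measurements the FIM reduces to a sum over detector elements of the products of the partial derivatives of the mean data, divided by the mean.

```python
import numpy as np

x = np.arange(64.0)

def mean_image(A, s, b=20.0, x0=32.0):
    """Toy 1D mean image: flat background b plus a Gaussian signal of amplitude A and width s."""
    return b + A * np.exp(-0.5 * ((x - x0) / s) ** 2)

def fisher_poisson(A, s, eps=1e-4):
    """FIM for independent Poisson data: F_pq = sum_m (d gbar_m/d theta_p)(d gbar_m/d theta_q) / gbar_m."""
    gbar = mean_image(A, s)
    dA = (mean_image(A + eps, s) - mean_image(A - eps, s)) / (2 * eps)
    ds = (mean_image(A, s + eps) - mean_image(A, s - eps)) / (2 * eps)
    D = np.stack([dA, ds], axis=1)           # M x 2 matrix of partial derivatives
    return D.T @ (D / gbar[:, None])

F = fisher_poisson(A=10.0, s=3.0)
crb = np.linalg.inv(F)                       # Cramer-Rao bound, cf. (66)
print("Lower bounds on the standard deviations:", np.sqrt(np.diag(crb)))
```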

3.4.2. Maximum-likelihood estimation.

The maximum-likelihood estimate (MLE) is defined as the point in the P-dimensional parameter space where the likelihood function, $\text{pr}\left(\mathbf{g}\mid \mathbf{\theta}\right)$ for a given, observed g, is maximal. This condition is expressed mathematically as

$\hat{\boldsymbol{\theta}}_{\text{ML}}(\mathbf{g}) = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}\ \text{pr}(\mathbf{g}\mid\boldsymbol{\theta}) = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}\ \ln\,\text{pr}(\mathbf{g}\mid\boldsymbol{\theta}),$    (67)

where the second form yields the same MLE as the first because the maximum of the logarithm of a function occurs at the same point as the maximum of the function itself.

The MLE is efficient (unbiased and best possible variance) in any problem where an efficient estimator exists. Further properties of the MLE are discussed in Barrett and Myers (2004), section 13.3.6.
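A minimal sketch of ML estimation for the toy Poisson model of the previous sketch, obtained by numerically minimizing the negative log-likelihood as in (67); the starting point and choice of optimizer are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = np.arange(64.0)

def mean_image(theta, b=20.0, x0=32.0):
    A, s = theta
    return b + A * np.exp(-0.5 * ((x - x0) / s) ** 2)

g = rng.poisson(mean_image([10.0, 3.0]))          # one noisy realization at the 'true' parameters

def neg_log_likelihood(theta):
    gbar = np.maximum(mean_image(theta), 1e-9)    # guard against nonpositive means during the search
    return np.sum(gbar - g * np.log(gbar))        # Poisson NLL up to a theta-independent constant

theta_ml = minimize(neg_log_likelihood, x0=[5.0, 5.0], method="Nelder-Mead").x
```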

3.4.3. Bayesian estimates.

In Bayesian estimation, the parameter of interest is treated as a random variable, described a priori (i.e. before any data are acquired) by a PDF $\text{pr}\left(\boldsymbol{\theta }\right)$ . The estimation rule is then based on the posterior density, $\text{pr}\left(\boldsymbol{\theta }\mid \mathbf{g}\right)$ , which is interpreted as the probability of the parameter after a data set is acquired. By Bayes' rule, this posterior can be written as $\text{pr}\left(\mathbf{g}\mid \boldsymbol{\theta }\right)\text{pr}\left(\boldsymbol{\theta }\right)/\text{pr}\left(\mathbf{g}\right)$ , and because $\text{pr}\left(\mathbf{g}\right)$ is independent of θ, the maximum a posteriori (MAP) estimate is given by

$\hat{\boldsymbol{\theta}}_{\text{MAP}}(\mathbf{g}) = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}\left[\text{pr}(\mathbf{g}\mid\boldsymbol{\theta})\,\text{pr}(\boldsymbol{\theta})\right].$    (68)

Thus the MAP estimator maximizes the product of the likelihood and the prior. Equivalently, it is the mode of the posterior density.

Another important Bayesian estimator is the posterior mean,

$\hat{\boldsymbol{\theta}}_{\text{PM}}(\mathbf{g}) = \int \mathrm{d}^P\theta\ \boldsymbol{\theta}\ \text{pr}(\boldsymbol{\theta}\mid\mathbf{g}),$    (69)

which can be shown to minimize the EMSE defined in (35).

3.4.4. Linear estimators for linear parameters.

As we saw in (30), a linear, scalar parameter has the form of a scalar product, θ (f) = χf. The general linear estimator of this parameter has the same structure as a general linear discriminant, $\widehat{\boldsymbol{\theta }}={{\mathbf{w}}^{t}}\mathbf{g}$ , and design of a linear estimator is equivalent to choosing the template w. The template can range from a simple indicator function that defines a region of interest to ones that account for the statistics of the object and image noise; whatever template is chosen, however, the task performance will depend on these statistics.

The Gauss–Markov Estimator (GME) accounts for the noise statistics but not the object variability. It applies to data obtained with a linear system operator $\mathcal{H}$ , and it minimizes the conditional MSE for a given object f by choosing (Barrett 1990)

Equation (70)

where $\mathbf{K}_{\mathbf{g}\mid \mathbf{f}}^{-1/2}$ is recognized from section 3.3 (see (43)) as a prewhitening operator and [ · ]+ denotes a Moore–Penrose pseudoinverse; for many properties of pseudoinverses and algorithms for computing them, see chapter 1 in Barrett and Myers (2004).

For completeness, the GME for estimation from a reconstructed image with a linear reconstruction operator $\mathcal{O}$ is given by

Equation (71)

Even though the GME was derived by minimizing MSE for a chosen object, we can still get the EMSE by averaging over an ensemble of objects. An expression for the resulting EMSE is given in Barrett (1990). This averaging step does not make the GME a Bayesian estimator because the statistics of the object ensemble are not used in defining the estimator.

3.4.5. Optimal Bayesian linear estimator.

We have noted above that the Bayesian estimator that minimizes EMSE among all estimators is the posterior mean estimator defined in (69), but the integral in that expression usually requires Markov-chain Monte Carlo techniques as in Kupinski et al (2003c). A more tractable approach is to use the Wiener Estimator (WE) which minimizes the EMSE among all linear estimators.

The most compact way of writing the WE is to assume that the prior means of the parameter and the data are zero, in which case the best linear estimate of a P × 1 parameter vector is given by

$\hat{\boldsymbol{\theta}}_{\text{W}}(\mathbf{g}) = \mathbf{K}_{\boldsymbol{\theta},\mathbf{g}}\,\mathbf{K}_{\mathbf{g}}^{-1}\,\mathbf{g},$    (72)

where Kθ,g is the P × M cross-covariance matrix of the random parameter and the random data. When the prior means of the parameter and the data are not zero but known, the WE becomes

$\hat{\boldsymbol{\theta}}_{\text{W}}(\mathbf{g}) = \overline{\boldsymbol{\theta}} + \mathbf{K}_{\boldsymbol{\theta},\mathbf{g}}\,\mathbf{K}_{\mathbf{g}}^{-1}\left(\mathbf{g} - \overline{\bar{\mathbf{g}}}\right).$    (73)

In this form, the WE is an affine transformation of the data, not a strictly linear one.
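The sketch below estimates the Wiener template of (72)–(73) from paired samples of parameters and data, as might be produced by the simulation tools discussed earlier; the linear data model and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
M, P, N = 200, 2, 5000                        # data dimension, parameter dimension, sample size (toy)

# Hypothetical training ensemble: random parameters mapped linearly to noisy data
theta = rng.standard_normal((N, P)) * np.array([1.0, 0.5]) + np.array([10.0, 3.0])
H = rng.standard_normal((M, P))
g = theta @ H.T + 2.0 * rng.standard_normal((N, M))

theta_bar, g_bar = theta.mean(axis=0), g.mean(axis=0)
K_tg = (theta - theta_bar).T @ (g - g_bar) / (N - 1)   # P x M cross-covariance
K_g = np.cov(g.T)                                      # M x M data covariance

W = K_tg @ np.linalg.inv(K_g)                          # Wiener template matrix

def wiener_estimate(g_new):
    """Affine Wiener estimate, in the form of (73), for a new data vector."""
    return theta_bar + W @ (g_new - g_bar)
```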

3.4.6. Signal parameters.

To apply the estimators discussed above to imaging, it is useful to decompose the object being imaged into signal and background components. The distinction is that the signal contributes to task performance and hence to efficacy and the background does not; therefore we treat the background as a nuisance parameter, albeit a very high-dimensional one.

For imaging with ionizing radiation, where no part of the object is opaque to the radiation, we can write the signal/background decomposition as a sum, and for estimation tasks we can assume that the parameters of interest are associated only with the signal. Thus we can write the object function as

$f(\mathbf{r}) = f_{\text{bg}}(\mathbf{r}) + f_{\text{sig}}(\mathbf{r};\boldsymbol{\theta}),$    (74)

or, in our usual shorthand, f = fbg + fsig(θ). Common choices for the components of θ might include its spatial coordinates, some measures of its shape and size, and a measure of the strength of the signal, such as its mean x-ray attenuation coefficient or the uptake of a radiotracer.

3.4.7. Gaussian likelihood models.

It may be a useful approximation to treat the likelihood for the signal parameters as a multivariate normal of the form

$\text{pr}(\mathbf{g}\mid\boldsymbol{\theta}) = \left[(2\pi)^M \det\left(\mathbf{K}_{\mathbf{g}\mid\boldsymbol{\theta}}\right)\right]^{-1/2}\exp\left[-\tfrac{1}{2}\left(\mathbf{g}-\overline{\bar{\mathbf{g}}}(\boldsymbol{\theta})\right)^t\mathbf{K}_{\mathbf{g}\mid\boldsymbol{\theta}}^{-1}\left(\mathbf{g}-\overline{\bar{\mathbf{g}}}(\boldsymbol{\theta})\right)\right].$    (75)

This PDF is a complicated function of the parameter vector through the covariance matrix, $\mathbf{K}_{\mathbf{g}\mid\boldsymbol{\theta}}$, but there are many cases where this dependence can be neglected. The easiest approach is to consider signals that are very weak compared to the background. If the signal is not weak, we may still be able to approximate $\mathbf{K}_{\mathbf{g}\mid\boldsymbol{\theta}}$ by either $\mathbf{K}_{\mathbf{g}\mid\bar{\boldsymbol{\theta}}}$ or $\langle\mathbf{K}_{\mathbf{g}\mid\boldsymbol{\theta}}\rangle_{\boldsymbol{\theta}}$ (Whitaker et al 2008, Kupinski et al 2013).

If the θ dependence of the covariance matrix can be neglected or approximated away, then a Gaussian log-likelihood can be written as

Equation (76)

which is a linear function of g. Under the same assumptions, the components of the Fisher information matrix can be shown to be:

Equation (77)

3.4.8. Linear observers for joint detection/estimation tasks.

As we mentioned in section 3.3, linear discriminant functions do not perform well on detection tasks when the signal, if present, can be at a random location in the image. Similarly, when the task is to estimate the location, size or other parameters of a lesion at a random location, linear estimators do not perform well either (Whitaker et al 2008).

A Scanning Linear Estimator (SLE) is an estimator that computes a scalar product of the data vector with a template w(θ) and then searches for the point in parameter space where the scalar product is maximum. Formally,

$\hat{\boldsymbol{\theta}}_{\text{SL}}(\mathbf{g}) = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}\left[\mathbf{w}^t(\boldsymbol{\theta})\,\mathbf{g}\right].$    (78)

If the parameter vector consists only of spatial coordinates of a tumor or other signal, then ${{\widehat{\boldsymbol{\theta }}}_{\text{SL}}}\left(\mathbf{g}\right)$ has a natural interpretation as spatial scanning of a template; more generally, (78) describes scanning in parameter space, not just spatially. The argmax operator is a nonlinear selection of the point in parameter space where the maximum occurs, so the SLE is nonlinear even though it performs only linear operations on the data.

Moreover, if the task requires detection of a signal before estimating its parameters, the same search in parameter space yields the test statistic for detection:

$t(\mathbf{g}) = \underset{\boldsymbol{\theta}}{\max}\left[\mathbf{w}^t(\boldsymbol{\theta})\,\mathbf{g}\right].$    (79)

As in any detection problem, we make the decision that the signal is present if t (g) exceeds the decision threshold (see (22)).

Conversely, any of the linear observers for detection discussed in section 3.3 can be converted to an SLE simply by showing its dependence on the signal parameters explicitly. For example, the Hotelling observer of (46) leads to

$\hat{\boldsymbol{\theta}}_{\text{SL}}(\mathbf{g}) = \underset{\boldsymbol{\theta}}{\operatorname{argmax}}\left[\Delta\overline{\bar{\mathbf{g}}}^t(\boldsymbol{\theta})\,\mathbf{K}_{\mathbf{g}}^{-1}\,\mathbf{g}\right], \qquad \text{i.e.}\ \ \mathbf{w}(\boldsymbol{\theta}) = \mathbf{K}_{\mathbf{g}}^{-1}\,\Delta\overline{\bar{\mathbf{g}}}(\boldsymbol{\theta}).$    (80)

If the data covariance arises entirely from the background, as in the case of a weak signal, then Kg is independent of θ and w(θ) can be computed with methods from section 3.3 without performing a matrix inverse at each point in parameter space.
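A minimal 1D sketch of the scanning procedure in (78)–(80) is given below, under the weak-signal assumption that the covariance does not depend on the signal location; the background, covariance model and signal profile are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
M = 256
x = np.arange(M)

def signal(center, amp=5.0, width=3.0):
    """Hypothetical mean-difference signal as a function of its location parameter."""
    return amp * np.exp(-0.5 * ((x - center) / width) ** 2)

# Hypothetical covariance: Poisson-like diagonal term plus a correlated background term
K_g = np.diag(np.full(M, 100.0)) + 4.0 * np.exp(-0.5 * ((x[:, None] - x[None, :]) / 8.0) ** 2)
K_inv = np.linalg.inv(K_g)

true_center = 170
g = rng.multivariate_normal(np.full(M, 100.0) + signal(true_center), K_g)
g_centered = g - 100.0                         # remove the known mean background level

Kinv_g = K_inv @ g_centered                    # prewhitened data, shared by all candidate locations
candidates = np.arange(10, 246)
scores = np.array([signal(c) @ Kinv_g for c in candidates])   # w(theta)^t g, as in (78) and (80)

theta_hat = candidates[np.argmax(scores)]      # scanning-linear estimate of the signal location
t_detect = scores.max()                        # detection test statistic, cf. (79)
```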

The performance of an SLE on a joint detection/estimation task can be specified by the area under an LROC or EROC curve (see figure 2). If detection is not at issue, the SLE performance can be specified by some variant of EMSE or by an expected utility. All of these FOMs will vary with radiation dose (see section 6.4).

Khurd and Gindi presented ideal-observer decision strategies that maximize the area under the LROC curve (Khurd and Gindi 2005), and Clarkson extended that work to the area under the EROC curve (Clarkson 2007). For a general treatment of methods for performing, analyzing and validating observer studies involving detection and localization, see Chakraborty and Berbaum (2004).

An interesting practical application is the work of Yendiki and Fessler (2007), who considered model observers for which the decision variable was the maximum value of a local test statistic within a search area. Their goal was to optimize regularized image reconstruction for emission tomography with respect to lesion detectability. They made use of approximations of tail probabilities for the maximum of correlated Gaussian random fields to facilitate the analytical evaluation of detection performance. They illustrated the reasonable accuracy of these approximations in the regime of low probability of false alarms and their potential use in the optimization of regularization functions.

4. Radiation dose

4.1. Basic definitions

A confusing mélange of terminologies and definitions related to radiation dose has appeared in the literature, but in this paper we adhere to the 2007 recommendations of the International Commission on Radiological Protection (ICRP 2007, Wrixon 2008).

The basic quantity on which these recommendations are based is the absorbed dose, which is a measure of the energy per unit mass deposited in a material by ionizing radiation. Absorbed dose can depend on the location in the material, denoted by a 3D vector r, and it can also vary with time. Often we are interested in just the spatial distribution, so we integrate the dose per unit time over some relevant time, such as the duration of an x-ray imaging procedure or over 0 < t < ∞ for radionuclide imaging, taking into account the physical halflife and the time-dependent biodistribution of the radiopharmaceutical.

The spatial distribution of absorbed radiation dose for a single study on a given patient is denoted d(r) and measured in grays (Gy), where 1 Gy = 1 J kg$^{-1}$. Because dose is defined as absorbed energy per unit mass, the absorbed energy per unit volume is given by d(r)ρ(r), where ρ(r) is the tissue density at point r. Thus the basic definition of d(r) is that the differential energy dE absorbed in the differential volume element $\mathrm{d}^3r$ is given by $\mathrm{d}E = d(\mathbf{r})\,\rho(\mathbf{r})\,\mathrm{d}^3r$.

The dose distribution can be integrated spatially to get the dose to any organ. If K different organs are considered, the dose to organ k, (k = 1, ..., K), is given by

$D_k = \frac{\displaystyle\int_{V_k}\mathrm{d}^3r\ d(\mathbf{r})\,\rho(\mathbf{r})}{\displaystyle\int_{V_k}\mathrm{d}^3r\ \rho(\mathbf{r})},$    (81)

where Vk is the volume of organ k and the denominator is the mass of the organ. An expression similar to (81) applies if we consider a particular tissue type t confined to a volume Vt, and in that case we refer to the tissue dose (in Gy) as Dt.

The equivalent dose to tissue t, denoted as Ht, is defined by multiplying Dt by a factor QR for the relative biological effectiveness of different types of radiation, indicated by the subscript R. By international convention, this factor is taken to be 1.0 for x-ray photons and electrons of all energies. In this paper, we do not consider neutrons or alpha particles, which would have a factor QR different from 1, so for our purposes the numerical values for the absorbed tissue dose and the equivalent tissue dose are the same. However, according to the conventions of the ICRP, the equivalent dose is expressed in a unit with a different name, the sievert (Sv), where 1 Sv = 1 J kg−1. Therefore, we can define the equivalent tissue dose, in the context of medical imaging, as the absorbed dose in the tissue multiplied by 1 Sv Gy−1. The same conclusion applies to organ doses.

The effective dose, denoted E and also measured in sieverts, is defined by summing up all equivalent doses to all irradiated organs k, each weighted by a numerical factor wk defined by ICRP:

$E = \sum_{k=1}^{K} w_k\,H_k.$    (82)

The weighting factor wk is explained by ICRP (2007) as the

'... factor by which the equivalent dose in a tissue or organ ... is weighted to represent the relative contribution of that tissue or organ to the total health detriment resulting from uniform irradiation of the body.' [Emphasis added.]

Radiation detriment is defined by ICRP as a

'... function of several factors, including incidence of radiation-related cancer or heritable effects, lethality of these conditions, quality of life, and years of life lost owing to these conditions.'

In the ICRP view, effective dose has a very limited range of applications. According to a summary by Wrixon (2008),

'ICRP emphasises that the effective dose provides a measure of radiation detriment for protection purposes only. It does not provide an individual-specific dose and should not be used for epidemiological evaluations. Furthermore, the collective effective dose, the main use of which is in the optimisation of radiological protection, should not be used in epidemiological studies and in assessing the hypothetical number of cases of cancer or heritable disease in an exposed population.'

Because the weighting factors are defined so that they sum to unity, the ratio of the effective dose (in Sv) to the absorbed dose (in Gy) depends strongly on the volume of tissue irradiated. As an example, suppose that only the liver receives any absorbed dose in a CT scan, and that d(r) is constant over the liver and zero everywhere else. Moreover, take the density of the liver as 1 g cm−3 at all points and assume that the average liver has a mass of 1 kg. Then, if the absorbed energy is 0.2 J, the absorbed dose to the liver is 0.2 Gy or 200 mGy, but because the weighting factor for liver is 0.04, the reported effective dose would be 8 mSv. This reported value would cause little concern compared to other sources of whole-body radiation, but in fact the local absorbed dose for the liver would still be 200 mGy. The same 200 mGy dose delivered uniformly to the whole body, corresponding to about a 70-fold increase in energy absorbed in the body for an average adult male, would yield an effective dose of 200 mSv.
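The arithmetic of this example, and the definitions in (81) and (82), can be checked with a few lines of Python; the only inputs are the numbers quoted above.

```python
# All numbers are taken from the liver example in the text above.
absorbed_energy_J = 0.2
liver_mass_kg = 1.0
D_liver_Gy = absorbed_energy_J / liver_mass_kg     # organ dose, cf. (81): 0.2 Gy = 200 mGy

Q_R = 1.0                                          # radiation weighting for photons and electrons
w_liver = 0.04                                     # ICRP tissue weighting factor for the liver
H_liver_Sv = Q_R * D_liver_Gy                      # equivalent dose: 0.2 Sv
E_Sv = w_liver * H_liver_Sv                        # effective dose, cf. (82): 0.008 Sv = 8 mSv

# Uniform whole-body irradiation at the same 200 mGy: the w_k sum to unity, so E = 200 mSv
E_whole_body_Sv = 1.0 * 0.2
```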

4.2. Other dose measures

4.2.1. Dose distribution concepts from radiation oncology.

Radiation treatment planning almost always starts by segmenting a CT image to identify the boundaries of the tumors and surrounding normal tissues. It is usually assumed for purposes of treatment planning that the tissue distribution within an organ is homogeneous and that the boundaries returned by the segmentation algorithm are exact. In some cases, substructures within an organ, such as cortical bone and marrow, are considered separately, and there are some current investigations into how much the substructure in organs like the lung would change the results of the dose calculations if it were not considered separately.

Even with the assumption that the tissues in an organ are homogeneous, the simulation tools used in therapy planning are able to provide some information about the distribution of doses within an organ. At present, however, this information is not used; the absorbed doses predicted by the plan are averaged over the organs, and often even these average values are then summarized by a single value, typically the effective dose.

If one wants to use the information available from the planning algorithm for more in-depth evaluation one could use the so-called dose–volume histograms (DVHs), preferably for each organ. This concept was introduced in 1979 in the context of radiation oncology (Shipley et al 1979, Drzymala et al 1991), where it is used to predict probability of tumor control and/or probability of healthy tissue detriment.

One might expect the DVH for an organ to be a histogram of the dose values in the volume elements (voxels) in that organ, hence a discretized approximation of the probability density function for occurrence of different values of the absorbed dose in a voxel, Dv. Mathematically, this PDF could be denoted $\text{pr}\left({{D}_{v}}\right)$ . In the radiation-oncology literature, however, the term DVH corresponds to a cumulative probability distribution function rather than a PDF. Specifically, the DVH is defined as the fraction of voxels that receive a dose greater than some amount D0, plotted as a function of D0. Thus it is a discretized approximation of the probability (not PDF) Pr(Dv > D0), which is expressed mathematically as $1-\underset{0}{\overset{{{D}_{0}}}{\int}} pr\left({{D}_{v}}\right)\text{d}{{D}_{v}}$ .
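A cumulative DVH in this sense is straightforward to compute from voxel doses; the sketch below uses a hypothetical dose distribution for a single organ.

```python
import numpy as np

rng = np.random.default_rng(7)
organ_voxel_doses_Gy = rng.gamma(shape=4.0, scale=0.5, size=100_000)   # hypothetical voxel doses

D0 = np.linspace(0.0, organ_voxel_doses_Gy.max(), 200)
dvh = np.array([(organ_voxel_doses_Gy > d).mean() for d in D0])        # Pr(D_v > D0) for each D0

# dvh starts at 1.0 (every voxel exceeds zero dose) and decreases monotonically to 0.
```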

The equivalent uniform dose (EUD), a concept similar to the DVH but more directly related to possible biological effects, was established by Niemierko (1997) in the context of radiation oncology. This concept can be applied to any specific organ or to the whole body. It is defined by comparing the actual dose distribution to a uniform one that is assumed to produce the same radiobiological effect. If the biological effect follows the linear no-threshold theory, and if only photons and electrons are involved, the EUD is equal to the average of the dose distribution. If so-called deterministic effects (see section 5) might occur, for example because of large dose variations inside an organ such as the skin, the EUD can be much higher than the average dose.

4.2.2. Mean glandular dose.

The breast is not a homogeneous tissue. Rather, it consists of glandular tissue, fibrous tissue, fatty tissue and the skin. Depending mainly on the size of the breast, the fractional content of the various tissues differs and hence the optimum imaging conditions differ (Thilander-Klang et al 1997). Besides the different contrast appearances of the different tissues, it is also important to note that the glandular (and the fibrous) tissues are much more sensitive to ionizing radiation than the fatty tissue and the skin. To compare mammographic imaging systems on the basis of radiation risk, it is therefore more useful to evaluate the average dose to the glandular and fibrous tissue. This can be done by measuring the air kerma at the entrance of a breast phantom, multiplying this value by a conversion coefficient to get the entrance dose for the real breast and then multiplying by a factor relating this entrance dose to the mean glandular dose. These factors have been determined by Dance et al, first for 50% glandularity and then for various other glandularities (Dance 1990, Dance et al 2000). Dance assumed a homogeneous mixture of fatty and glandular tissue in the main area of the breast, which is obviously not the case in reality. Therefore there are approaches that use more realistic phantoms of the breast, either by evaluating large numbers of mammograms and finding mathematical representations that yield 3D tissue distributions (Lau et al 2012, Bakic et al 2011, Bakic et al 2002) or by high-resolution 3D scanning of specially fixed breast specimens from corpses (Hoeschen et al 2005). The segmented high-resolution phantoms need to be extended to whole-body phantoms in order to gain additional dosimetric information for comparison of different imaging modalities.

4.2.3. DLP and CTDI.

The effective dose is intended to be a way to compare systems regarding their radiation burden for a large group of patients undergoing similar examinations, but it is not a directly measurable quantity. In addition, for CT applications, which generate the highest per capita radiation burden in most of the developed countries today, there is no available measurement system for measuring the radiation dose to the patient directly. Instead, various dose indicators have been proposed. They are typically based on measuring absorbed dose within polymethyl methacrylate (PMMA) phantoms during standard dose protocols. With these values and the known system parameters for a particular patient examination, dose indicators such as the so-called dose–length product (DLP) and the computed tomography dose index (CTDI) can be calculated and displayed. The DLP in general is the dose in a certain area multiplied by the length of the investigated area; in a CT scanner this means the scan length (typically calculated as the slice thickness multiplied by the number of slices). However, there is no measured or calculated value for the absorbed dose within the body of the patient, so the CTDIvol is used as the multiplier.

The CTDI as originally defined by the Food and Drug Administration (FDA) in 1981 was introduced for axial scanners (Shope et al 1981). It was meant to characterize the dose from the primary beam and from scattered radiation emitted from surrounding slices in such a scanning geometry and under standard conditions. The CTDI is thus defined as the equivalent value of absorbed dose in a central slice assuming that the total radiation is concentrated in that specific slice. Normalizing this value by the current output of the tube gives an indication of the dose level achievable by a scanner, but not of the dose to the patient or the dose efficiency of the scanner. This basic concept has been modified in various ways to make CTDI meaningful for spiral and multi-detector-row CT. CTDI100 is assumed to represent the dose contribution from a 100 mm measurement length along the scanner axis, measured with 100 mm dosimetric chambers under standard conditions for a single-collimation scan in a phantom. Performing such measurements in the center of an acrylic phantom, typically 32 cm or 16 cm in diameter, and at various positions on the periphery permits calculation of the weighted CTDI (CTDIw). For this purpose, the central measurement is weighted by 1/3 and the average of the peripheral measurements by 2/3, as expressed in the following equation:

$\text{CTDI}_{\text{w}} = \tfrac{1}{3}\,\text{CTDI}_{100,\text{center}} + \tfrac{2}{3}\,\text{CTDI}_{100,\text{periphery}}.$    (83)

From CTDIw one can calculate the volume CTDI (CTDIvol), which is the most commonly used CT dose index today, by dividing the weighted CTDI by the pitch factor. This dose index is somehow related to an absorbed dose to the patient in a specific volume, but only indirectly, because it does not refer either to a phantom measurement for the individual examination or to measurements in the center and the periphery with a fixed weighting; instead, it refers to a standard operation protocol for the scanner. Combining the CTDIvol, the scan length, the body part investigated and conversion coefficients from calculation programs based on Monte Carlo simulations gives effective-dose estimates for standard patients that are consistent with thermoluminescent dosimeter (TLD) measurements in anthropomorphic phantoms (Lechel et al 2007) and that can serve as indicators for dose comparisons between systems. Using these values for risk estimates for individual patients, however, is probably not useful, for various reasons.
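For reference, the sketch below strings together the quantities just discussed (CTDIw from (83), CTDIvol and the DLP) for a set of hypothetical measurement values.

```python
# Hypothetical pencil-chamber readings in a CTDI phantom (mGy)
ctdi100_center = 10.0
ctdi100_periphery = 20.0          # average of the peripheral positions

ctdi_w = ctdi100_center / 3.0 + 2.0 * ctdi100_periphery / 3.0   # weighted CTDI, eq (83)

pitch = 1.2
ctdi_vol = ctdi_w / pitch         # volume CTDI, as described above

scan_length_cm = 30.0
dlp = ctdi_vol * scan_length_cm   # dose-length product in mGy cm
```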

4.2.4. Approaches to patient-specific dose.

One approach to obtaining a better patient-specific dose estimate is the size-specific dose estimate introduced by the AAPM (AAPM 2011), which calculates dose values that depend on the size of the patient. However, this value still does not take into account the patient-specific radiation sensitivity, and it therefore does not give a true patient-specific risk value, but only an indicator.

Li et al (2011b) developed a method to estimate patient-specific dose and cancer risk from CT examinations by combining a validated Monte Carlo program with patient-specific anatomical models. The organ dose, effective dose and risk index (a surrogate for cancer risk) were estimated for clinical chest and abdominopelvic protocols. In Li et al (2011a), this same group evaluated the dependence of dose and risk on patient size and scanning parameters. The reported relationships can be used to estimate patient-specific dose and risk in clinical practice for a given pediatric patient.

Schmidt and Kalender (2002) describe a voxel-based Monte Carlo method for scanner- and patient-specific dose calculations in computed tomography.

4.3. Computation of dose

4.3.1. X-ray imaging.

Dose values cannot usually be measured inside a patient. It is possible to measure the skin dose directly on the patient or to calculate it from dose measurements (mostly air kerma measurements) in the x-ray beam, but this measurement does not yield the dose distributions in specific organs throughout the body. Therefore, ICRP and ICRU as well as many research groups have been using simulation tools in combination with models of the human body and biokinetic models to determine absorbed doses to organs. The simulation tools that are most commonly used today are Monte Carlo simulations (e.g. EGS, MCNP, Fluka, Penelope and GEANT (Bielajew et al 1994, Baró et al 1995, Agostinelli et al 2003, Ferrari et al 2005, Allison et al 2006, Reed 2007, Ljungberg and King 2012, Pozuelo et al 2012)). The models range from pure mathematical models (ICRP 1979, Zankl et al 1988) to various anthropomorphic voxel phantoms (Zubal et al 1994, Xu et al 2000, Petoussi-Henss et al 2002, Kramer et al 2004, Lee et al 2006, Menzel et al 2009, Xu and Eckerman 2010).

The voxel models for average adults in ICRP 110 (Menzel et al 2009) are based on real patient images that were scanned and segmented. These models were then modified to fit the conditions of the organs of average human beings. ICRP and ICRU are currently working on models for children of different ages. There are also other phantoms available (Zhang et al 2007, Becker et al 2007, Segars et al 2010). For all these phantoms and all these kinds of simulation tools, there are many publications investigating the variation of dose with different imaging parameters, e.g. for angiography or CT scanning (Schlattl et al 2007, 2012, Li et al 2012).

Many details on computational phantoms were presented at the 4th International Workshop on Computational Phantoms for Radiation Protection, Imaging, and Radiotherapy in 2013, and selected papers from this workshop were published in a special section of Physics in Medicine and Biology in September, 2014; included is a topical review on computational phantoms (Xu 2014).

4.3.2. Radionuclide imaging.

Two separate steps are required for rigorous dose calculation in emission imaging and radionuclide therapy. First, the spatiotemporal radiopharmaceutical distribution should be determined. Then, in principle, the distribution of absorbed dose, d (r, t), should be calculated, but in practice usually only time-integrated organ doses are considered.

Time-dependent radiopharmaceutical distributions for nuclear medical imaging have been investigated in Zankl et al (2012) and Giussani et al (2012). A detailed review of dosimetry in nuclear medicine is given by Stabin (2006).

5. Relating dose to risk

5.1. Types and measures of risk

Risks associated with ionizing radiation have been known for almost as long as ionizing radiation itself (Doll 1995). Within a year of Röntgen's discovery of x-rays, skin burns had been reported (Gilchrist 1897, Stevens 1896), and within 7 years a case of skin cancer was observed (Frieben 1902), in all cases associated with high-dose x-ray exposure. In general, risks associated with ionizing radiation can be divided into so-called stochastic effects (genetic risks in offspring, somatic effects (cancer) in the directly exposed population) and so-called tissue-reaction effects (previously termed deterministic effects) (ICRP 2007, 2012). We briefly review the stochastic and tissue-reaction risks associated with exposure to radiation.

There are a number of commonly used measures of risk employed in epidemiological studies. The odds ratio (OR) is the ratio of the odds of disease occurrence in a group with exposure to a factor to that in an unexposed group: within each group, the odds are the ratio of the numbers of diseased and non-diseased individuals.

The relative risk (RR) is the ratio of the disease rate in a group under study to that in a comparison group, with adjustment for confounding factors such as age, if necessary. The excess relative risk (ERR) is RR − 1. The excess absolute risk (EAR) is the difference in absolute risk between exposed and control populations.
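
To make these definitions concrete, the short Python sketch below computes OR, RR, ERR and EAR for a hypothetical two-group study; the counts are invented for illustration only and no adjustment for confounding factors is attempted.

```python
# Illustrative only: toy counts, not real epidemiological data.
def odds_ratio(d_exp, n_exp, d_unexp, n_unexp):
    """OR = odds of disease in the exposed group / odds in the unexposed group.
    d_* are diseased counts, n_* are non-diseased counts."""
    return (d_exp / n_exp) / (d_unexp / n_unexp)

def relative_risk(rate_exposed, rate_control):
    """RR = disease rate in the exposed group / rate in the comparison group."""
    return rate_exposed / rate_control

def excess_relative_risk(rate_exposed, rate_control):
    """ERR = RR - 1."""
    return relative_risk(rate_exposed, rate_control) - 1.0

def excess_absolute_risk(rate_exposed, rate_control):
    """EAR = difference in absolute risk between exposed and control groups."""
    return rate_exposed - rate_control

# Hypothetical example: 30 cases among 970 non-cases (exposed),
# 20 cases among 980 non-cases (unexposed), 1000 people per group.
print(odds_ratio(30, 970, 20, 980))                 # ~1.52
print(relative_risk(30 / 1000, 20 / 1000))          # 1.5
print(excess_relative_risk(30 / 1000, 20 / 1000))   # 0.5
print(excess_absolute_risk(30 / 1000, 20 / 1000))   # 0.01 per person
```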

5.2. Tissue-reaction (deterministic) effects

Tissue-reaction (deterministic) effects generally occur only after high-dose acute exposure (mostly > 0.1 Gy, see table 2) and are characterized by nonlinear dose responses, with a threshold dose below which the effect is not observed. Because of these features, tissue-reaction effects are of most relevance in radiotherapy; normal-tissue doses in therapy are limited to avoid these effects. Tissue-reaction effects are thought to arise from the killing of large groups of cells in the tissues concerned, leading to functional deterioration in the affected organs. They generally arise within days (e.g. gastrointestinal syndrome or central nervous system syndrome) or weeks (e.g. haematopoietic syndrome, pulmonary syndrome) of exposure; however, certain tissue-reaction effects (e.g. cataracts, hypothyroidism) are manifest only over periods of years or more. Most of the information on tissue-reaction effects of radiation comes from (a) medically exposed groups, (b) the survivors of the atomic bombings of Hiroshima and Nagasaki, (c) radiation accidents and (d) animal experiments (Edwards and Lloyd 1998, ICRP 2012).

Table 2. Estimates of the threshold doses for approximately 1% incidence of morbidity in tissues and organs in adults exposed to acute, fractionated or protracted, and chronic irradiation (taken from ICRP (2012)).

| Effect | Organ/tissue | Time to develop effect | Acute exposure (Gy) | Highly fractionated (2 Gy fractions) or equivalent protracted exposures (Gy) | Annual (chronic) dose rate for many years (Gy/year) |
| --- | --- | --- | --- | --- | --- |
| Temporary sterility | Testes | 3–9 weeks | ∼0.1 | NA | 0.4 |
| Permanent sterility | Testes | 3 weeks | ∼6 | <6 | 2.0 |
| Permanent sterility | Ovaries | <1 week | ∼3 | 6.0 | >0.2 |
| Depression of haematopoiesis | Bone marrow | 3–7 days | ∼0.5 | 10–14 | >0.4 |
| Xerostomia | Salivary glands | 1 week | NA | <20 | NA |
| Dysphagia, stricture | Oesophagus | 3–8 months | NA | 55 | NA |
| Dysphagia, ulceration | Stomach | 2 years | NA | 50 | NA |
| Stricture | Small intestine | 1.5 years | NA | 45 | NA |
| Stricture | Colon | 2 years | NA | 45 | NA |
| Anorectal dysfunction | Rectum | 1 year | NA | 60 | NA |
| Hepatomegaly, ascites | Liver | 2 weeks–3 months | NA | <30–32 | NA |
| Main phase of skin reddening | Skin (large areas) | 1–4 weeks | <3–6 | 30 | NA |
| Skin burns | Skin (large areas) | 2–3 weeks | 5–10 | 35 | NA |
| Temporary hair loss | Skin | 2–3 weeks | ∼4 | NA | NA |
| Late atrophy | Skin (large areas) | >1 year | 10 | 40 | NA |
| Telangiectasia at 5 years | Skin (large areas) | >1 year | 10 | 40 | NA |
| Cataract (visual impairment) | Eye | >20 years | ∼0.5 | ∼0.5 | ∼0.5 divided by years of duration |
| Acute pneumonitis | Lung | 1–3 months | 6–7 | 18 | NA |
| Oedema | Larynx | 4–5 months | NA | 70 | NA |
| Renal failure | Kidney | >1 year | 7–8 | 18 | NA |
| Fibrosis/necrosis | Bladder | >6 months | 15 | 55 | NA |
| Stricture | Ureters | >6 months | NA | 55–60 | NA |
| Fracture | Adult bone | >1 year | NA | 50 | NA |
| Fracture | Growing bone | <1 year | NA | 25 | NA |
| | Muscle | Several years | NA | 55 | NA |
| Endocrine dysfunction | Thyroid | >10 years | NA | >18 | NA |
| Endocrine dysfunction | Pituitary | >10 years | NA | ⩽10 | NA |
| Paralysis | Spinal cord | >6 months | NA | 55 | NA |
| Necrosis | Brain | >1 year | NA | 55–60 | NA |
| Cognitive defects | Brain | Several years | 1–2 | <20 | NA |
| Cognitive defects (infants <18 months) | Brain | Several years | 0.1–0.2 | NA | NA |

Edwards and Lloyd (1998) propose that the probability, P, of most tissue-reaction effects following an acute dose, D, is given by a modified Weibull distribution:

Equation (84)

where V is the shape factor, determining the steepness of the risk function, T is the threshold dose below which no effect is observed, and $1_{D>T}$ is an indicator function which equals 1 if D > T and 0 otherwise. The quantity $D_{50}$, defined as the dose at which the effect is expected to be observed in half the population, is a function of the radiation dose rate DR (in Gy h$^{-1}$):

Equation (85)

ICRP (2012) recommends use of a practical threshold dose for various tissue-reaction effects, below which morbidity would occur at <1%, as indicated in table 2.
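
As an illustration of how a thresholded, sigmoidal dose response of this kind can be explored numerically, the following Python sketch implements one common modified-Weibull parameterization, P = 1 − exp[−ln 2 (D/D50)^V] for D > T and P = 0 otherwise; this particular form, and the parameter values used, are assumptions for illustration and are not necessarily the exact expressions in (84) and (85).

```python
import math

def tissue_reaction_probability(dose_gy, d50, shape_v, threshold_t):
    """Sketch of a modified-Weibull dose response for tissue-reaction effects.

    Assumed parameterization (one common choice, not necessarily the exact
    form of equation (84)): P = 1 - exp[-ln(2) * (D / D50)**V] for D > T,
    and P = 0 at or below the threshold T."""
    if dose_gy <= threshold_t:
        return 0.0
    return 1.0 - math.exp(-math.log(2.0) * (dose_gy / d50) ** shape_v)

# Hypothetical parameters chosen only for illustration.
for d in (0.5, 2.0, 5.0, 10.0):
    p = tissue_reaction_probability(d, d50=5.0, shape_v=5.0, threshold_t=1.0)
    print(d, round(p, 3))   # 0.0, 0.007, 0.5, ~1.0: steep rise above threshold
```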

Although (84) is thought to describe most tissue-reaction effects, there are certain effects for which it probably does not apply. Particularly problematic in this respect are severe mental retardation and reduction in IQ following irradiation of the fetus. Otake et al (1996) and Otake and Schull (1998) observed a dose-related increase in severe mental retardation in those exposed in utero as a result of the atomic-bomb explosions in Hiroshima and Nagasaki. This effect was particularly marked for those exposed in the period 8–15 weeks post-conception. A threshold in the region of 0.1–0.3 Gy was indicated. The authors found a linear no-threshold reduction in IQ with increasing uterine dose in the atomic-bomb survivors, although thresholds of at least 0.1 Gy are consistent with the data (Otake and Schull 1998).

It is also arguable that a threshold may not apply for radiation-induced circulatory disease. There have been a number of reviews of circulatory disease in people exposed to low and moderate doses of radiation (Little et al 2008b, McGale and Darby 2005, McGale and Darby 2008, UNSCEAR 2008a, McMillan et al 2010, Little et al 2010, Little et al 2012). The systematic review and meta-analysis of Little et al (2012) documented statistically significant excess relative risk coefficients for three of the four major subtypes of circulatory disease (ischemic heart disease (IHD), heart disease apart from IHD, stroke, and all other circulatory disease) in people exposed to low and moderate doses (mean < 0.5 Sv) of radiation. For ischemic and non-ischemic heart disease there was no significant heterogeneity in risks between the various studies; however, this was not the case for stroke and other circulatory diseases (Little et al 2012). There is also accumulating evidence for radiation-associated cataracts following moderate-dose and low dose-rate radiation exposure (Little 2013).

5.3. Stochastic effects

Stochastic effects are the main late health effects that are expected to occur in populations exposed to ionizing radiation; somatic risks dominate the overall estimate of health detriment (see section 4.1). For both somatic and genetic effects the probability of their occurrence, but not their severity, is taken to depend on the radiation dose. The dose response may be non-linear, as for tissue-reaction effects. However, in contrast to the situation for tissue-reaction effects, for most stochastic effects it is generally accepted that at sufficiently low doses there is a non-zero linear component to the dose response, i.e. there is no threshold. There is little biological or epidemiological evidence for thresholds for stochastic effects (Little and Muirhead 1996, Little and Muirhead 1998, Pierce and Preston 2000, Little et al 2009).

5.3.1. Heritable genetic effects.

The heritable genetic risks associated with radiation exposure are estimated directly from animal studies in combination with data on baseline incidence of disease in human populations (UNSCEAR 2001). There are no usable human data on radiation-induced germ cell mutations, let alone induced genetic diseases, and the results of the largest and most comprehensive of human epidemiological studies, namely that carried out on the children of the Japanese atomic bomb survivors, are negative; there are no statistically significant radiation-associated adverse hereditary effects in this cohort (Neel et al 1990). The data on the induction of germline mutations at human minisatellite loci (Dubrova et al 1996), although of importance from the standpoint of direct demonstration of radiation-induced heritable genetic changes in humans, are probably not suitable for risk estimation; they occur in non-coding DNA and are thought not to be associated with heritable disease (UNSCEAR 2001, Little et al 2013).

Simple linear models of dose are generally employed to model genetic effects. For example, the preferred risk model used in the 2001 report of the United Nations Scientific Committee on the Effects of Atomic Radiation (UNSCEAR 2001) assumes that the excess heritable genetic risk for disease class i associated with a dose D of radiation to the parental gonads is given by:

$\text{Risk}_{i}=P_{i}\cdot MC_{i}\cdot PCRF_{i}\cdot \dfrac{D}{DD_{i}}\qquad (86)$

where Pi is the baseline incidence of that disease class, MCi is the mutational component of the disease class (defined as the relative increase in disease frequency (relative to the baseline) per unit relative increase in mutation rate (relative to the spontaneous rate)), PCRFi is the potential recoverability correction factor for the disease class (the fraction of induced mutations compatible with live births) and DDi is the mutational doubling dose (i.e. the dose required to double the mutational load associated with the disease). Table 3 summarizes estimates of risk for the first- and second-generation progeny of an irradiated population that has sustained radiation exposure in the parental generation and no radiation subsequently. These risk estimates assume a mutational doubling dose, DDi, of 1 Gy, that is based on human data on spontaneous mutation rates and mouse data on induced mutation rates (UNSCEAR 2001). Neel et al (1990) derived an estimate of the doubling dose in the atomic bomb survivors of 3.4–4.5 Sv, based on examination of a combination of five endpoints. However, as discussed in UNSCEAR (2001), in view of the differences between the endpoints considered by Neel et al (1990) and those employed by UNSCEAR, and the uncertainties in the estimates of Neel et al, their estimate of the doubling dose is entirely compatible with that used by UNSCEAR.

Table 3. Cancer and heritable genetic percentage risk estimates associated with low-dose irradiation (taken from UNSCEAR (2001, 2008b)).

| Somatic (cancer) mortality risks^a | | % Lifetime risk/Gy |
| --- | --- | --- |
| Generalized relative risk models for solid cancer and for leukemia | Leukemia | 0.50 (0.06, 1.09)^b |
| | Solid cancer | 5.45 (2.67, 8.51)^b |
| Generalized absolute risk models for solid cancer and for leukemia | Leukemia | 0.52 (0.08, 0.99)^b |
| | Solid cancer | 4.48 (1.90, 7.33)^b |
| Heritable genetic risk^c | | |
| Autosomal dominant, X-linked | Risk to first generation | 0.075–0.15 |
| | Risk to second generation | 0.05–0.10 |
| Autosomal recessive | Risk to first generation | ≈ 0 |
| | Risk to second generation | ≈ 0 |
| Chronic multifactorial disease | Risk to first generation | 0.025–0.12 |
| | Risk to second generation | 0.025–0.12 |
| Developmental abnormalities | Risk to first generation | 0.20 |
| | Risk to second generation | 0.04–0.10 |
| Total | Risk to first generation | 0.30–0.47 |
| | Risk to second generation | 0.12–0.32 |

^a Risk calculated for a UK population assuming a test dose of 0.1 Gy administered. ^b Mean and 95% BCI. ^c Risk calculated for radiation exposure in one generation only, with a dose and dose-rate effectiveness factor of 3 applied.
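
The doubling-dose calculation described in the text (cf. (86)) is simple to put into code. The Python sketch below multiplies the baseline incidence by the mutational component, the potential recoverability correction factor and the ratio of gonadal dose to doubling dose; all numerical inputs are hypothetical and chosen only to show the arithmetic.

```python
def heritable_risk(baseline_incidence, mutational_component,
                   recoverability_factor, doubling_dose_gy, dose_gy):
    """Doubling-dose calculation in the spirit of (86):
    Risk_i = P_i * MC_i * PCRF_i * (D / DD_i).
    All numerical inputs in the example below are hypothetical."""
    return (baseline_incidence * mutational_component *
            recoverability_factor * dose_gy / doubling_dose_gy)

# Example: baseline incidence 1.5% of live births, MC = 0.3, PCRF = 0.5,
# doubling dose 1 Gy (UNSCEAR 2001), gonadal dose 0.01 Gy.
print(heritable_risk(0.015, 0.3, 0.5, 1.0, 0.01))   # 2.25e-05, i.e. ~0.002%
```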

5.3.2. Somatic effects (cancer).

Most of the information on radiation-induced cancer risk comes from (a) the Japanese atomic bomb survivors, (b) medically exposed populations and (c) occupationally exposed groups (UNSCEAR 2008b). The Life Span Study (LSS) cohort of Japanese atomic bomb survivors is unusual among exposed populations in that both genders and a wide range of ages were exposed, comparable with those of a general population (Ozasa et al 2012, Preston et al 2007). Most medically treated groups are more restricted in their age and gender mix. For example, the women in the International Radiation Study of Cervical Cancer patients (IRSCC), a cohort followed up after treatment for cancer of the cervix, were all treated as adults, most above the age of 40 (Boice et al 1987, Boice et al 1988). Organ doses among those treated with radiotherapy tend to be higher than those received by the Japanese atomic bomb survivors, although there are some exceptions, e.g. breast doses in the IRSCC patients (Boice et al 1987, Boice et al 1988).

Occupationally exposed groups are also restricted in their age and gender mix. For example, the cohorts of workers exposed in the nuclear industry (Cardis et al 2007, Muirhead et al 2009) are overwhelmingly male and exposed in adulthood, as are the groups of underground miners (National Research Council 1999). For these reasons, most standard-setting bodies (National Research Council 1999, National Research Council 2006, ICRP 2007) use the LSS as the basis for estimates of population cancer risk associated with exposure to low-LET radiation. For certain cancer sites and types of radiation exposure, other groups are occasionally used; in particular, for high-LET (α-particle) exposure of the lung, the underground miners are used (National Research Council 1999). However, lung-cancer risks for the miners obtained by applying the LSS estimates in combination with the current ICRP dosimetric model (ICRP 1994) are close to those that can be estimated directly from the miner cohorts (Little 2002).

5.3.3. Temporal patterns of risk for radiation-induced cancer.

One of the principal uncertainties in calculating population cancer risks from epidemiological data results from the fact that few radiation-exposed cohorts have been followed up to extinction. For example, 58 years after the atomic bombings of Hiroshima and Nagasaki, about 42% of the survivors were still alive (Ozasa et al 2012). In attempting to calculate lifetime population cancer risks, it is therefore important to predict how risks vary as a function of time after radiation exposure, in particular for the group for whom projection of risk to the end of life is most uncertain, namely those who were exposed in childhood.

Analyses of solid cancers in the LSS and other exposed groups have found that the radiation-induced excess risk can be approximately described by a constant-relative-risk model (UNSCEAR 2008b). The time-constant excess relative risk (ERR) model assumes that if a dose of radiation is administered to a population, then, after some latent period, there is an increase in the cancer rate, the excess rate being proportional to the underlying cancer rate in an unirradiated population.

It is well known that for all cancer subtypes (including leukemia) the ERR diminishes with increasing age at exposure (UNSCEAR 2008b). For those irradiated in childhood there is some evidence of a reduction in the ERR of solid cancer 25 or more years after exposure. Therefore, even for solid cancers various factors have to be employed to modify the ERR. For many solid cancers a generalized relative risk model is commonly used, in which the cancer rate t years after exposure for gender s following exposure at age e to a dose D of radiation is given by

$r\left(a,s\mid e,D\right)={{r}_{0}}\left(a,s\right)\left[1+F\left(D\right)\,\phi \left(t,e,s\right)\right]\qquad (87)$

where r0(a, s) is the cancer rate in the absence of irradiation, i.e. the baseline cancer rate, a = t + e is the age at observation (attained age) of the person and F(D) is the function determining the dose dependency of the cancer risk, to be discussed below. The expression ϕ(t, e, s) describes the modification to the ERR, F(D), as a function of time since exposure t, age at exposure e and gender s. Similar types of model have been fitted to leukemia (Hsu et al 2013). However, generalized absolute risk models can also be used to model both solid cancers and leukemia, in which the cancer rate t years after exposure for gender s following exposure at age e to a dose D of radiation is given by:

$r\left(a,s\mid e,D\right)={{r}_{0}}\left(a,s\right)+F\left(D\right)\,\psi \left(t,e,s\right)\qquad (88)$

The expression ψ(t, e, s) describes the modification to the EAR, F(D), as a function of time since exposure t, age at exposure e and gender s.
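
A minimal Python sketch of the two projection models is given below, assuming the standard forms r0(1 + F(D)ϕ) for the generalized relative-risk model and r0 + F(D)ψ for the generalized absolute-risk model; the baseline rate and modifying-factor values are hypothetical.

```python
def err_model(r0, F_D, phi):
    """Generalized relative-risk model (cf. (87)):
    rate = r0(a, s) * (1 + F(D) * phi(t, e, s))."""
    return r0 * (1.0 + F_D * phi)

def ear_model(r0, F_D, psi):
    """Generalized absolute-risk model (cf. (88)):
    rate = r0(a, s) + F(D) * psi(t, e, s)."""
    return r0 + F_D * psi

# Hypothetical numbers: baseline rate 200 cases per 100,000 person-years,
# F(D) = 0.5 at the dose of interest, phi = 0.8, and psi = 50 per
# 100,000 person-years per unit F(D).
print(err_model(200.0, 0.5, 0.8))    # 280.0 per 100,000 person-years
print(ear_model(200.0, 0.5, 50.0))   # 225.0 per 100,000 person-years
```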

Given appropriate forms of the modifying functions ϕ(t, e, s) and ψ(t, e, s) of the relative and absolute risk respectively, equivalently good fits to the solid cancer and leukemia mortality dataset were achieved using both generalized ERR and generalized EAR models (Little et al 2008a). It is to some extent arbitrary which of these two models one uses. However, as can be seen from table 3, models with equivalent fit can yield slightly different lifetime population risks. The reason for this is that, as noted above, about 42% of the LSS cohort are still alive (Ozasa et al 2012), so that population risk calculations based on this dataset (used by many scientific committees (National Research Council 2006, ICRP 2007)) crucially depend on extrapolating the current mortality and incidence follow-up of this group to the end of life.

Uncertainties due to risk projection are greatest for solid cancers, because the radiation-associated excess risk in the LSS is still increasing (Ozasa et al 2012, Preston et al 2007). For leukemia the excess risk is reducing over time (Hsu et al 2013), and most models used predict very few radiation-associated leukemia deaths or cases from the current follow-up point in the LSS to extinction.

Use can also be made of so-called biologically based cancer risk models, as summarized by Little (2010). However, these would not be expected to yield materially different population cancer risks to those given by purely empirical models of the form discussed above, although they may be parametrically more parsimonious than purely empirical models.

5.3.4. Forms of cancer dose response.

It has been customary to model the dose-response function F(D) that appears in (87) and (88) in fits to biological (UNSCEAR 1993) and epidemiological data (UNSCEAR 2008b) by the linear-quadratic expression:

$F\left(D\right)=\alpha D+\beta {{D}^{2}}\qquad (89)$

There is significant curvilinearity in the dose response for leukemia in the LSS (Hsu et al 2013), although for solid cancers, apart from non-melanoma skin cancer (Ron et al 1998, Preston et al 2007), there is little evidence for anything other than a linear dose response in the Japanese cohort or in any other group (UNSCEAR 2008b). As well as changes in effectiveness (per unit dose) related to changes in the total dose, effectiveness can also vary with dose fractionation (splitting a given dose into a number of smaller doses suitably separated in time) and with dose rate (UNSCEAR 1993). This is not surprising radiobiologically; by administering a given dose at progressively lower dose rates (i.e. giving the same total dose over a longer period of time), or by splitting it into many fractions, the biological system has time to repair the damage, so the total damage induced is less (UNSCEAR 1993). Therefore, although for cancers other than leukemia there is generally little justification for assuming anything other than a linear dose response, i.e. β = 0, it may nevertheless be justifiable to employ a dose and dose-rate effectiveness factor (DDREF) other than 1. The DDREF is the factor by which one divides risks for high-dose and high-dose-rate exposure to obtain risks for low doses and low dose rates. The ICRP (2007) recommended that a DDREF of 2 be used together with models linear in dose for all cancer sites, largely on the basis of observations in various epidemiological datasets. UNSCEAR (2008b) used linear-quadratic models to derive cancer risks (as shown in table 3), so no DDREF should be applied to those estimates.

Another form of dose response, perhaps less commonly used, slightly generalizes (89):

$F\left(D\right)=\left(\alpha D+\beta {{D}^{2}}\right)\exp \left(\gamma D\right)\qquad (90)$

and this form has been employed in fits to biological data (UNSCEAR 1993) and epidemiological data (Boice et al 1987, Thomas et al 1992, Little et al 1999). The αD + βD2 component represents the effect of (carcinogenic) mutation induction, while the exp(γD) term represents the effect of cell sterilization or killing. In general the cell sterilization coefficient γ is negative. Essentially this implies that there is a competing effect due to cell killing which is greater at higher radiation doses. A dead cell cannot proliferate and become the focus of a malignant clone. Variant forms of the cell-sterilization term exp(γD), incorporating higher powers of dose D, i.e. exp(γDk) for k > 1, are sometimes employed (Little and Charles 1997).
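
The following Python sketch evaluates the linear-quadratic dose response with an optional exponential cell-sterilization factor, and shows how a DDREF is applied by simple division; the coefficients α, β and γ are hypothetical and are not fitted to any dataset.

```python
import math

def dose_response(dose_gy, alpha, beta, gamma=0.0):
    """Linear-quadratic dose response with optional cell-sterilization term,
    F(D) = (alpha*D + beta*D**2) * exp(gamma*D); gamma = 0 recovers (89),
    and gamma < 0 gives the high-dose downturn described for (90)."""
    return (alpha * dose_gy + beta * dose_gy ** 2) * math.exp(gamma * dose_gy)

def low_dose_risk(high_dose_rate_risk, ddref=2.0):
    """Apply a dose and dose-rate effectiveness factor (DDREF): risks estimated
    at high dose and high dose rate are divided by the DDREF."""
    return high_dose_rate_risk / ddref

# Hypothetical coefficients, for illustration only.
alpha, beta, gamma = 0.05, 0.01, -0.2
for d in (0.1, 1.0, 5.0, 10.0):
    print(d, round(dose_response(d, alpha, beta, gamma), 4))
print(low_dose_risk(0.05))   # 0.025 with the ICRP-recommended DDREF of 2
```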

6. Relating task-based image quality to dose and risk

6.1. General considerations

The most basic relationship between image quality and radiation dose in x-ray and gamma-ray imaging is that they both depend, albeit in complicated ways, on the number of high-energy photons that interact with the patient's body. How one defines this number is different for external-source x-ray imaging (e.g. planar radiography, CT, digital breast tomosynthesis) and internal-source radionuclide imaging (e.g. planar nuclear medicine, SPECT, PET).

In external-source imaging, the dose distribution dj(r) for patient j is proportional to the mean number of x-ray photons (averaged over Poisson fluctuations) incident on the patient, denoted $\bar{N}_{j}^{\text{inc}}$ . Similarly, in radionuclide imaging, the dose distribution is proportional to the mean number of gamma rays emitted by the radiotracer before it decays fully or is eliminated from the body. We shall denote this number also as $\bar{N}_{j}^{\text{inc}}$ . In external-source imaging, $\bar{N}_{j}^{\text{inc}}$ can be controlled while holding all other factors constant by varying the current in the x-ray tube, and in radionuclide imaging it is controlled by varying the injected activity of the radiopharmaceutical.

For both external-source imaging and radionuclide imaging, we can write

${{d}_{j}}\left(\mathbf{r}\right)=\bar{N}_{j}^{\text{inc}}\,{{d}_{1j}}\left(\mathbf{r}\right)\qquad (91)$

where d1j(r) is interpreted as the mean dose at point r in patient j per incident photon. Of course, d1j(r) is a complicated function of the parameters of the imaging system and the anatomy of patient j.

Because the dose to organ k in patient j is a linear functional of dj(r) (see (81)), we also have

${{D}_{kj}}=\bar{N}_{j}^{\text{inc}}\,{{D}_{1kj}}\qquad (92)$

where D1kj is the dose to organ k of patient j per incident photon. Averaging both sides of (92) over an ensemble of patients, we can write the average organ dose as

${{\bar{D}}_{k}}={{\overline{{\bar{N}}}}^{\text{inc}}}\,{{\bar{D}}_{k1}}\qquad (93)$

where the overbar on Dk indicates an average over patients; the second overbar is added to ${{\bar{N}}^{\text{inc}}}$ to account for any patient-specific variations in imaging protocol.

In contrast to the linear relations between dose and number of incident photons in (92) and (93), image quality depends nonlinearly on the number of detected photons and hence on the number of incident photons and the dose.

Let Q be some task-based figure of merit for an ensemble of patients undergoing a specific diagnostic examination with a specific instrument and imaging protocol. The figure of merit depends in a complicated way on the task, the patient ensemble and various system and algorithmic parameters, but if we hold these factors constant and focus on the effect of the number of detected photons in the image, it always turns out that more photons are better and that image quality can be increased by increasing the radiation dose to the patient.

To be more precise, suppose an imaging examination for patient j is based on detection of $ N_{j}^{\text{det}} $ photons, where $ N_{j}^{\text{det}} $ is a Poisson random variable with mean $ \bar{N}_{j}^{\text{det}} $ . If we average this number over the ensemble of patients undergoing the given examination, we can denote the resulting mean total number of detected photons per patient as $ {{\overline{{\bar{N}}}}^{\text{det}}} $ , and with all other factors held constant, it will be found that Q is a nondecreasing function of $ {{\overline{{\bar{N}}}}^{\text{det}}} $ . This statement is true even if there are other sources of noise besides Poisson statistics and even if the imaging instrument does not operate in a photon-counting mode.

To make the connection between image quality and radiation dose, we define an average detection efficiency (averaged over patients in the ensemble) as

$\bar{\eta}\equiv \dfrac{{{\overline{{\bar{N}}}}^{\text{det}}}}{{{\overline{{\bar{N}}}}^{\text{inc}}}}\qquad (94)$

Thus $\bar{\eta}$ is the average fraction of photons incident on the patient that result in measurable signals in the detector system in the imaging instrument. For external-source x-ray imaging, $\bar{\eta}$ is determined by the transmission through the patient and the geometry and efficiency of the detectors. For radionuclide imaging, there is an additional factor of the image-acquisition time relative to the decay and washout of the radiotracer; all gamma-ray emissions in the patient's body contribute to the dose, but only those during the acquisition time contribute to the image.

We can now find an expression for the derivative of the image-quality metric Q with respect to the average dose ${{\bar{D}}_{k}}$ to any organ of interest:

$\dfrac{\text{d}Q}{\text{d}{{\bar{D}}_{k}}}=\dfrac{\bar{\eta}}{{{\bar{D}}_{k1}}}\,\dfrac{\text{d}Q}{\text{d}{{\overline{{\bar{N}}}}^{\text{det}}}}\qquad (95)$

where ${{\bar{D}}_{k1}}\equiv \text{d}{{\bar{D}}_{k}}/\text{d}{{\overline{{\bar{N}}}}_{\text{inc}}}$ is the mean dose to organ k for one incident photon. Because dose depends linearly on the number of incident photons, this quantity can be estimated by any Monte Carlo dose-calculation program just by dividing the computed dose by the number of incident photons used. Similarly, the factor $\text{d}Q/\text{d} {{\overline{{\bar{N}}}}_{\text{det}}} $ is the increment in Q for one additional detected photon. This derivative can be determined by analytic methods surveyed in sections 6.3 and 6.4 or numerically by use of image-quality programs available on the internet (Ljungberg and King 2012), and we can then use (95) to relate the result to the derivative of the task-based image quality with respect to the mean organ dose. The conversion factor is simply the mean detection efficiency divided by the mean organ dose produced by one incident photon.

In some imaging studies, it will be possible to identify one or two critical organs that are at risk of damage. In these cases, (95) can be used to determine when an increased organ dose will result in little or no increase in image quality.
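
In practice, the conversion in (95) amounts to one multiplication and one division, as in the Python sketch below; the numerical values of dQ/dN̄det, η̄ and D̄k1 are hypothetical placeholders for quantities that would come from an image-quality calculation and a Monte Carlo dose code.

```python
def dq_d_organ_dose(dq_d_ndet, mean_detection_efficiency,
                    organ_dose_per_incident_photon):
    """Chain rule of (95): dQ/dD_k = (dQ/dN_det) * eta_bar / D_k1.

    dq_d_ndet: increment in the image-quality FOM per additional detected photon
    mean_detection_efficiency: eta_bar, detected photons per incident photon
    organ_dose_per_incident_photon: D_k1, e.g. from a Monte Carlo dose code
    (computed dose divided by the number of simulated incident photons)."""
    return dq_d_ndet * mean_detection_efficiency / organ_dose_per_incident_photon

# Hypothetical example: dQ/dN_det = 1e-6 per detected photon, eta_bar = 0.05,
# D_k1 = 2e-10 mGy per incident photon.
print(dq_d_organ_dose(1e-6, 0.05, 2e-10))   # 250.0 (FOM units per mGy)
```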

The relation in (95) still holds if we replace the organ dose with the effective dose E, the weighted average of organ doses defined in (82). Working with organ doses, however, has the considerable advantage that they can be related directly to adverse effects of the radiation on particular critical organs, which is no longer possible once the organ doses are summarized by a single effective dose.

6.2. Dose dependence of mean vectors and covariance matrices

In many cases, especially for complicated tasks and realistic object models, there is no substitute for simulation and numerical evaluation of figures of merit if one wants to study the relationship of image quality to radiation dose and patient risk. Nevertheless, some general conclusions and useful scaling laws can be derived from the analytical expressions for image-quality metrics developed in section 3.

In order to discuss how the various expressions for task performance developed in section 3 vary with absorbed dose, we must first decide how quantities such as the object f, system operator $\mathcal{H}$ , the mean and covariance of the data and various probability density functions relate to the mean number of detected photons. The discussion is different for linear operators as in SPECT or PET and nonlinear operators as in CT and mammography, and it is different for photon-counting and integrating detectors. It is also different for linear and nonlinear observers.

6.2.1. Emission computed tomography.

In SPECT and PET, the object of interest is the spatiotemporal distribution of a radiopharmaceutical, described in detail by the function f (r, t), the mean number of gamma-ray photons emitted per second per unit volume in the patient's body. The mean number of photons incident (from within) on the patient's body, following an injection at t = 0, is given by

${{\bar{N}}^{\text{inc}}}\left(\mathbf{f}\right)=\int_{0}^{\infty}\text{d}t\int {{\text{d}}^{3}}r\,f\left(\mathbf{r},t\right)\qquad (96)$

where the single overbar indicates that ${{\bar{N}}^{\text{inc}}}\left(\mathbf{f}\right)$ is averaged only over Poisson statistics, not over object variability, so it remains a function of f. We can define a normalized object function f0(r, t) by

${{f}_{0}}\left(\mathbf{r},t\right)\equiv \dfrac{f\left(\mathbf{r},t\right)}{{{\bar{N}}^{\text{inc}}}\left(\mathbf{f}\right)}\qquad (97)$

so that $f\left(\mathbf{r},t\right)={{\bar{N}}^{\text{inc}}}\left(\mathbf{f}\right){{f}_{0}}\left(\mathbf{r},t\right)$ .

Because SPECT and PET systems are linear continuous-to-discrete mappings, the mean detector output (averaged only over Poisson statistics) is given by (see (2))

${{\bar{g}}_{m}}\left(\mathbf{f}\right)=\int_{{{t}_{1}}}^{{{t}_{2}}}\text{d}t\int {{\text{d}}^{3}}r\,{{h}_{m}}\left(\mathbf{r},t\right)\,f\left(\mathbf{r},t\right)\qquad (98)$

where the image acquisition takes place over t1 < t < t2 and hm(r, t) is the probability (not PDF) that a gamma-ray photon emitted at time t and location r will be detected in detector bin m. The mean number of detected photons, in all bins, for a given object is

${{\bar{N}}^{\text{det}}}\left(\mathbf{f}\right)=\sum_{m=1}^{M}{{\bar{g}}_{m}}\left(\mathbf{f}\right)=\int_{{{t}_{1}}}^{{{t}_{2}}}\text{d}t\int {{\text{d}}^{3}}r\,s\left(\mathbf{r},t\right)\,f\left(\mathbf{r},t\right)\qquad (99)$

where s (r, t), defined as $\underset{m=1}{\overset{M}{\sum}} {{h}_{m}}\left(\mathbf{r},t\right)$ is called the system sensitivity function; in SPECT it can vary with time as the detector rotates around the patient's body.

The mean number of incident photons, ${{\bar{N}}^{\text{inc}}}\left(\mathbf{f}\right)$ , varies from patient to patient because the total injected activity is not always the same and because different patients may have different biological elimination rates. It is reasonable to assume that these effects are statistically independent of the more complicated patient-to-patient variations captured in the normalized distribution f0(r, t). Thus the efficiency defined in (94) is given, for patients in group C, by

$\bar{\eta}=\int_{{{t}_{1}}}^{{{t}_{2}}}\text{d}t\int {{\text{d}}^{3}}r\,s\left(\mathbf{r},t\right)\,{{\left\langle {{f}_{0}}\left(\mathbf{r},t\right)\right\rangle}_{\mathbf{C}}}\qquad (100)$

where the expectation within the integral is the average of the normalized object distribution weighted by the system sensitivity. We see immediately that the efficiency can be increased just by increasing the acquisition time t2 −t1, though practical issues of patient motion and patient throughput may intrude.
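
Numerically, (100) is just a discretized integral of the system sensitivity against the ensemble-mean normalized object. The Python sketch below assumes both functions have been sampled on a common space-time grid; the grid, the constant sensitivity and the fraction of decays occurring in the acquisition window are all hypothetical.

```python
import numpy as np

def mean_detection_efficiency(sensitivity, mean_f0, dt, dV):
    """Discretized version of (100): integrate the system sensitivity s(r, t)
    against the ensemble-mean normalized object <f0(r, t)> over the
    acquisition interval and the body volume.

    sensitivity, mean_f0: arrays of shape (n_time, n_voxels) on the same grid
    dt: time-sample spacing (s); dV: voxel volume (cm^3)."""
    return float(np.sum(sensitivity * mean_f0) * dt * dV)

# Toy example: constant sensitivity 1e-4; normalized object spread uniformly
# so that half of the emissions fall inside the acquisition window.
n_t, n_vox = 60, 1000
dt, dV = 1.0, 1.0
s = np.full((n_t, n_vox), 1e-4)
f0 = np.full((n_t, n_vox), 0.5 / (n_t * dt * n_vox * dV))
print(mean_detection_efficiency(s, f0, dt, dV))   # 5e-05
```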

As we saw in section 3, many figures of merit for image quality depend on some form of data covariance matrix. By (16) or (21), these matrices can always be decomposed into two terms, one arising from measurement noise and one from the randomness in the object being imaged. As noted in section 2.2, these terms have different dependences on $ {{\overline{{\bar{N}}}}^{\text{det}}} $ and hence on any measure of the radiation dose administered during the imaging procedure.

For SPECT and PET, the noise term in the covariance expansion is obtained from (8) by averaging over object variability, which requires only adding an overbar to f. From this result and (97) we obtain

Equation (101)

where the last step has used the fact that $\mathcal{H}$ is linear for SPECT and PET. We can express (101) in vector form, without explicit components, as

$\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}=\text{Diag}\left(\mathcal{H}\mathbf{\bar{f}}\right)={{\overline{{\bar{N}}}}^{\text{inc}}}\,\text{Diag}\left(\mathcal{H}{{\mathbf{\bar{f}}}_{0}}\right)={{\overline{{\bar{N}}}}^{\text{inc}}}\,\mathbf{\bar{K}}_{{{\mathbf{g}}_{0}}}^{\text{noise}}\qquad (102)$

where Diag indicates a diagonal matrix and the final covariance on the right is the noise covariance for the normalized object f0. Similar manipulations show that

$\mathbf{K}_{{\mathbf{\bar{g}}}}^{\text{obj}}={{\left({{\overline{{\bar{N}}}}^{\text{inc}}}\right)}^{2}}\,\mathbf{K}_{{{\mathbf{\bar{g}}}_{0}}}^{\text{obj}}\qquad (103)$

Thus, for the linear system operators in SPECT and PET, the noise covariance term for the raw projection data varies linearly with the mean number of incident photons, and hence linearly with any measure of radiation dose, and the object-variability term varies quadratically. The same conclusions hold if we consider SPECT and PET images reconstructed with linear or nonlinear algorithms.

6.2.2. Transmission computed tomography.

The situation is more complicated for x-ray modalities where the system operator is nonlinear and integrating detectors are usually used rather than photon-counting detectors as in SPECT and PET. We begin by examining the statistics of the raw detector outputs, and how they vary with dose, and then we derive expressions for the mean vectors and covariance matrices after logarithmic mapping and also after image reconstruction.

The object of interest in CT is the 3D distribution of x-ray attenuation coefficient in some portion of the patient's body. Though the attenuation coefficient is conventionally called μ, we refer to it as f in order to maintain a common notation across modalities. As noted in section 2, we can continue to write $\mathbf{\bar{g}}\left(\mathbf{f}\right)=\mathcal{H}\mathbf{f}$ , but the system operator is now nonlinear because f appears in an exponent. Moreover, in a complete treatment, the energy spectrum of the source, the energy dependence of the attenuation coefficient and the variation of the detector efficiency with photon energy would all have to be taken into account, yielding a very complicated nonlinear expression for the mean data (see section 16.1.4 in Barrett and Myers (2004), especially equation (16.109)).

For the purposes of this paper, however, we approximate the mean number of primary (unscattered) x-ray photons reaching the mth detector element in some exposure time by

$\bar{g}_{m}^{\text{prim}}\left(\mathbf{f}\right)=\bar{N}_{m}^{\text{inc}}\exp \left[-{{\left(\mathcal{R}\mathbf{f}\right)}_{m}}\right]\qquad (104)$

where $\mathcal{R}$ is a linear operator that integrates the object over a thin tube of directions from the x-ray source to the detector element and $\bar{N}_{m}^{\text{inc}}$ is the mean number of photons that would be incident on the detector element in the absence of the object, and hence it is the number incident on the object in the tube of integration when the object is present. The steps involved in getting to (104) are detailed in section 16.1.7 in Barrett and Myers (2004).

In addition to the primary photons, there are also some photons scattered in the patient's body that reach the detector element, so we can write

${{\bar{g}}_{m}}\left(\mathbf{f}\right)=\bar{g}_{m}^{\text{prim}}\left(\mathbf{f}\right)+\bar{g}_{m}^{\text{scat}}\left(\mathbf{f}\right)\qquad (105)$

Detected scattered photons convey very little information about the task, but it is important that we retain them in the noise analysis.

An initial step in the processing of CT data is usually to normalize each detector output to the number of incident photons (presumably known by careful scanner calibration) and take the negative logarithm of the ratio:

${{y}_{m}}\equiv -\ln \left(\dfrac{{{g}_{m}}}{\bar{N}_{m}^{\text{inc}}}\right)\qquad (106)$

The conditional mean of ym is

Equation (107)

Equation (108)

where we have used a truncated Taylor expansion, $ \ln (1+x)\approx x-\frac{1}{2}{{x}^{2}}$ , and the last step has used the noise model of (9). Notice that the variance of gm affects the mean of ym because of the logarithmic nonlinearity.

The usual assumption in CT is that ${{\bar{y}}_{m}}\left(\mathbf{f}\right)$ is equal to the tube integral ${{\left(\mathcal{R}\mathbf{f}\right)}_{m}}$, but it is not, because of the last term in (108) and the scattered radiation. Instead we get

Equation (109)

where

Equation (110)

The second term in bm is negligible if the scatter-to-primary ratio is small, and the first term is small if the mean number of detected photons for one detector element is large; neither assumption is guaranteed to be valid in practice.
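
The size of this noise- and scatter-induced bias is easy to demonstrate by simulation. The Python sketch below draws Poisson data for a single detector element at a low incident fluence with a small scatter contribution and compares the mean of the log-transformed data of (106) with the true line integral; all numbers are hypothetical, and electronic noise is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the bias in the log-transformed CT data of (106).
# All numbers hypothetical; pure Poisson noise, no electronic noise term.
N_inc = 200.0             # mean photons incident on one detector element (low dose)
line_integral = 2.0       # true (R f)_m along the tube of integration
scatter_fraction = 0.05   # scattered photons as a fraction of N_inc

primary_mean = N_inc * np.exp(-line_integral)
scatter_mean = scatter_fraction * N_inc
g = rng.poisson(primary_mean + scatter_mean, size=200_000)
g = np.maximum(g, 1)      # guard against log(0) in this toy example

y = -np.log(g / N_inc)
print("true line integral :", line_integral)
# The mean of y_m differs from the true value: scatter pulls it down, while the
# Poisson noise acting through the logarithm pushes it up slightly.
print("mean of y_m        :", y.mean())
```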

Calculation of the conditional covariance matrix, $\mathbf{K}_{\mathbf{y}\mid \mathbf{f}}$, follows the same pattern that led to (108), but with a different truncated Taylor series: $[\ln (1+x)]^{2}\approx {{x}^{2}}$. The result is

Equation (111)

Averaging over all objects f in set C yields the average noise covariance matrix for data vector y:

Equation (112)

In principle, this average can be estimated by collecting raw projection data for a large number of real or accurately simulated patients in some defined set C and estimating the ensemble average with an arithmetic average, separately for each index m.

The object term in the covariance matrix for y is complicated by the object-dependent bias bm. We should write (see (18))

Equation (113)

but for simplicity we assume that ${{b}_{m}}-{{\bar{b}}_{m}}$ is small, and we find

Equation (114)

where ${{\mathcal{K}}_{\mathbf{f}\mid \mathbf{C}}}$ was introduced in (19) and (20).

By analogy to (21), we can express the covariance matrix for a CT image reconstructed by a linear algorithm $\mathcal{O}$ , such that $\mathbf{\hat{f}}=\mathcal{O}\mathbf{y}$ , as

Equation (115)

6.2.3. Comparison of ECT and TCT.

We are now able to discuss the relationship of these various mean vectors and covariance matrices to radiation dose and to compare the results for transmission computed tomography (TCT, or CT for short) to those found earlier for ECT (SPECT and PET). One obvious difference is that the object in ECT is itself the source of the ionizing radiation, so fECT is linearly related to the absorbed dose at any point or to any organ dose. In CT, on the other hand, fCT is an x-ray attenuation coefficient and hence independent of dose. In both cases, however, $\mathcal{H}\mathbf{f}$ is linearly related to dose; the proportionality to dose comes from f in ECT and from $\mathcal{H}$ in TCT.

These and other dose dependences are summarized in table 4, and their relation to figures of merit for image quality are discussed in sections 6.3 and 6.4.

Table 4. Dose dependences of mean vectors and covariance matrices for emission computed tomography (ECT) and transmission computed tomography (TCT). Photon-counting detectors are assumed for ECT and integrating detectors are assumed for TCT. Symbol D indicates a generic dose quantity (such as absorbed dose at a point, organ dose or effective dose), and D2, for example, indicates that the quantity in the left column varies quadratically with dose; D0 is used for a quantity independent of dose. The notation ≈ D0 in the TCT column means that the quantity in the first column would be independent of dose if we neglected the noise-induced bias resulting from the logarithmic step in CT reconstruction. Angle brackets, included only when they affect the dose dependence, denote an average over object variability. Constant c relates to the dose-independent electronic noise in integrating detectors, and constant a serves to scale the dose-dependent part of the noise covariance to the dose-independent part. Double right arrow indicates asymptotic dependence for large dose. All other symbols are defined in the text.

| Quantity | ECT | TCT |
| --- | --- | --- |
| f, $\mathbf{\bar{f}}$ | D | $D^{0}$ |
| $\mathcal{H}$ | $D^{0}$ | D |
| $\mathbf{\bar{g}}$, $\overline{{\mathbf{\bar{g}}}}$ | D | D |
| $\mathbf{\bar{y}}$, $\overline{{\mathbf{\bar{y}}}}$ | NA | $D^{0}$ |
| $\overline{{\mathbf{\hat{f}}}}$, $\overline{\overline{{\mathbf{\hat{f}}}}}$ | D | $D^{0}$ |
| $\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}$ | D | $\left\langle bD + c\right\rangle \Rightarrow D$ |
| $\mathbf{K}_{{\mathbf{\bar{g}}}}^{\text{obj}}$ | $D^{2}$ | $D^{2}$ |
| $\mathbf{\bar{K}}_{\mathbf{y}}^{\text{noise}}$ | NA | $\left\langle \frac{bD+c}{{{D}^{2}}}\right\rangle \Rightarrow {{D}^{-1}}$ |
| $\mathbf{K}_{{\mathbf{\bar{y}}}}^{\text{obj}}$ | NA | $D^{0}$ |
| $\mathbf{\bar{K}}_{{\mathbf{\hat{f}}}}^{\text{noise}}$ | D | $\left\langle \frac{bD+c}{{{D}^{2}}}\right\rangle \Rightarrow {{D}^{-1}}$ |
| $\mathbf{K}_{\overline{{\mathbf{\hat{f}}}}}^{\text{obj}}$ | $D^{2}$ | $D^{0}$ |

6.3. Dose dependence of figures of merit for classification tasks

The various task-based figures of merit introduced in section 3 can be classified according to the task (classification, estimation or both) and how much knowledge of the underlying object and data statistics is used by the observer. Even if the observers do not make use of certain statistical properties, the performance of the observers must necessarily depend on these statistics and hence on the radiation dose. In this section we illustrate these points for classification tasks with various examples.

6.3.1. Signal detection with linear discriminants and dose-independent templates.

Consider first a simple linear discriminant for the task of detecting the presence of a lesion at a single, specified location. If the task is performed with raw projection data g (as opposed to a reconstructed image $\mathbf{\hat{f}})$ , then the linear discriminant is defined by a template wld, and the test statistic is a simple scalar product:

${{t}_{ld}}\left(\mathbf{g}\right)=\mathbf{w}_{ld}^{t}\,\mathbf{g}\qquad (116)$

We saw in (46) that the Hotelling observer uses a test statistic of this form, but there the template depends on the data mean vectors and covariance matrices, hence on the radiation dose. Here we consider templates that do not depend on the dose.

The template might have been designed for a certain signal s0 and without regard to the background in which it is immersed, but if it is applied in a real clinical setting (as opposed to a simulation study or a simplified psychophysical study with synthetic data), then the inevitable random variations in signal and background determine the statistics of g and hence of tld(g).

These statistics can be different for images that do not contain the signal and those that do, so we define two subsets of objects C0 and C1, respectively, and we denote averages over data vectors in the subsets with the corresponding subscripts. Thus

${{\left\langle {{t}_{ld}}\left(\mathbf{g}\right)\right\rangle}_{i}}=\mathbf{w}_{ld}^{t}\,{{\overline{{\mathbf{\bar{g}}}}}_{i}},\qquad i=0,1\qquad (117)$

Because tld(g) is a scalar, it does not have a covariance matrix, but its variance can still be decomposed into terms arising from noise and object variability. From (16), it is easy to show that

Equation (118)

The signal-to-noise ratio associated with tld(g) is defined by (see (26))

$\text{SNR}_{ld}^{2}=\dfrac{{{\left[{{\left\langle {{t}_{ld}}\right\rangle}_{1}}-{{\left\langle {{t}_{ld}}\right\rangle}_{0}}\right]}^{2}}}{\tfrac{1}{2}\text{Va}{{\text{r}}_{0}}\left({{t}_{ld}}\right)+\tfrac{1}{2}\text{Va}{{\text{r}}_{1}}\left({{t}_{ld}}\right)}\qquad (119)$

For a weak signal, it will usually be a valid approximation to replace the average variance in the denominator here with the signal-absent variance (see section 3.3).

The template wld is independent of dose, and the dose dependence of the terms in (117) and (118) can be found in table 4. For ECT, we find that

$\text{SNR}_{ld}^{2}=\dfrac{aD}{1+bD}\qquad (120)$

where a and b are constants that depend not only on the task, system operator and object and image statistics, but also on the particular measure of dose used. If D is an organ dose, Dk, the constants depend on which organ is chosen.

For TCT with integrating detectors and the task performed on raw data, we find a different dependence,

Equation (121)

where c arises from the electronic readout noise introduced in (9). The discussion below that equation shows that the readout noise is negligible if ${{\bar{g}}_{m}}\left(\mathbf{f}\right)\gg \sigma _{m}^{2}$, where σm is the standard deviation of the noise in detector m in units of the mean output for one x-ray photon, but this condition might be difficult to achieve for small detector pixels and low-dose imaging.

In TCT, the situation is more complicated when the detection task is performed on reconstructed images rather than raw projections, as it almost always will be in clinical practice. In that case w is a vector with the same dimension as $\mathbf{\hat{f}}$ , and ${{t}_{ld}}\left(\mathbf{\hat{f}}\right)=\mathbf{w}_{ld}^{t}\mathbf{\hat{f}}$ . Equations (117)–(119) still hold if we replace g with $\mathbf{\hat{f}}$ everywhere, but when we consult the TCT column of table 4 we find

Equation (122)

where a, b and c depend on the object statistics but are independent of D. The necessity for averaging the noise term over objects in this manner arises from the logarithmic nonlinearity used in CT reconstruction.

It follows from (121) and (122) that the SNR2 for any linear discriminant with a dose-independent template must asymptotically approach an upper limit as dose is increased.
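
The saturating behaviour is easy to visualize with a few lines of Python; the functional form aD/(1 + bD) is the one indicated for ECT in (120), and the constants a and b below are arbitrary illustrative values rather than quantities derived for any particular system or task.

```python
def snr2_fixed_template(dose, a, b):
    """Dose dependence of the form SNR^2 = a*D / (1 + b*D) (cf. (120)):
    linear (photon-limited) at low dose, saturating at a/b when object
    variability dominates. The constants a and b are task- and
    system-specific; the values below are hypothetical."""
    return a * dose / (1.0 + b * dose)

a, b = 4.0, 0.2   # arbitrary illustrative constants
for D in (0.5, 1, 2, 5, 10, 50, 1000):
    print(D, round(snr2_fixed_template(D, a, b), 2))   # approaches a/b = 20
```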

6.3.2. Signal detection from raw projection data with Hotelling observers.

As we saw in section 3.3, the Hotelling observer is also a linear discriminant of the form (116), but now the observer has full knowledge of the mean and covariance matrix of the data and uses it optimally to construct the template w; as a result the template depends on the dose.

A convenient way to study the dose-dependence of the performance of the Hotelling observer is to use the Karhunen-Loève (KL) expansion of the estimated covariance matrix ${{\mathbf{\hat{K}}}_{\mathbf{g}}}$ , defined in (52). In brief, KL is an expansion of a covariance matrix in terms of its eigenvectors, with its eigenvalues as coefficients.

Following Kupinski et al (2007), we rewrite (52) as

Equation (123)

where I is the M × M unit matrix and R is the M × Ns matrix of noise-free sample images as defined by (49) and (50). We recognize that ${{\left[\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}\right]}^{-1/2}}$ has the same form as the prewhitening operator introduced in (43), but it prewhitens with respect to the average noise covariance only. Because $\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}$ is often a diagonal matrix, as in (9), it can be very simple in practice to perform this kind of prewhitening operation.

After prewhitening in this sense, the data vector g and the matrix R become, respectively,

$\mathbf{\tilde{g}}\equiv {{\left[\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}\right]}^{-1/2}}\mathbf{g},\qquad \mathbf{\tilde{R}}\equiv {{\left[\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}\right]}^{-1/2}}\mathbf{R}\qquad (124)$

Thus (123) becomes

Equation (125)

The next step is to solve the eigenvalue problem for the M × M prewhitened sample covariance matrix ${{\mathbf{\widetilde R}}} {{{{\mathbf{\widetilde R}}}}^{t}}$ :

$\mathbf{\tilde{R}}{{\mathbf{\tilde{R}}}^{t}}{{\boldsymbol{\phi}}_{j}}={{\mu}_{j}}{{\boldsymbol{\phi}}_{j}}\qquad (126)$

If the complete set of M × 1 eigenvectors, {ϕj , j = 1, ..., M}, is found, it will form a complete orthonormal basis for the M-dimensional data space.

However, because a sample covariance matrix with Ns < M (where Ns is the number of samples) has rank Ns − 1, only Ns − 1 of the eigenvalues will be nonzero. We can find these nonzero eigenvalues and the corresponding eigenvectors by solving the eigenvalue problem for the much smaller Ns × Ns matrix ${{{{\mathbf{\widetilde R}}}}^{t}}{{\mathbf{\widetilde R}}} $ :

${{\mathbf{\tilde{R}}}^{t}}\mathbf{\tilde{R}}{{\boldsymbol{\psi}}_{j}}={{\mu}_{j}}{{\boldsymbol{\psi}}_{j}}\qquad (127)$

It can be shown that all of these eigenvalues are nonnegative, and it is convenient to order them from largest to smallest, i.e. ${{\mu}_{1}}\geqslant {{\mu}_{2}}\geqslant \cdots \geqslant {{\mu}_{{{N}_{s}}-1}}>0$.

Equation (127) can be solved by standard linear-algebra programs such as Matlab or Lapack for Ns up to $10^{4}$ or so, and once the Ns × 1 eigenvectors ψj and the corresponding eigenvalues μj are determined, the requisite M × 1 eigenvectors can be found from (Barrett and Myers 2004)

${{\boldsymbol{\phi}}_{j}}=\dfrac{1}{\sqrt{{{\mu}_{j}}}}\,\mathbf{\tilde{R}}{{\boldsymbol{\psi}}_{j}}\qquad (128)$

With these eigenvectors and eigenvalues, we can express ${{\mathbf{\widetilde R}}} {{{{\mathbf{\widetilde R}}}}^{t}}$ exactly by its spectral representation,

$\mathbf{\tilde{R}}{{\mathbf{\tilde{R}}}^{t}}=\sum_{j=1}^{{{N}_{s}}-1}{{\mu}_{j}}\,{{\boldsymbol{\phi}}_{j}}\boldsymbol{\phi}_{j}^{t}\qquad (129)$

If we choose M − Ns + 1 additional vectors that form an orthonormal set and satisfy $\mathbf{\tilde{R}}{{\mathbf{\tilde{R}}}^{t}}\boldsymbol{\phi }_{{j}}=0$ for ${{N}_{s}}\leqslant j\leqslant M$, then we can express the M × M unit matrix as

$\mathbf{I}=\sum_{j=1}^{M}{{\boldsymbol{\phi}}_{j}}\boldsymbol{\phi}_{j}^{t}\qquad (130)$

It turns out that we will not need any particular eigenvectors for j ⩾ Ns; it will suffice to know that they exist.

The estimated covariance matrix in the KL domain is now

Equation (131)

and by the orthonormality of the eigenfunctions, the inverse of the estimated covariance is

Equation (132)

If we use $\frac{1}{1+{{\mu}_{j}}}=1-\frac{{{\mu}_{j}}}{1+{{\mu}_{j}}}$ , we find

Equation (133)

This form of the inverse is equivalent to (53) but now in terms of KL eigenfunctions.

The estimated Hotelling FOM, $\widehat{\text{SNR}_{\text{Hot}}^{2}}\equiv \Delta {{\overline{{\mathbf{\bar{g}}}}}^{t}}\mathbf{\hat{K}}_{\mathbf{g}}^{-1}\Delta \overline{{\mathbf{\bar{g}}}}$ , is given in terms of these eigenfunctions by

Equation (134)

The first term in this expression is exactly what we would have for an SKE/BKE problem, and the second term (with its minus sign) is the estimated reduction in the FOM resulting from the object randomness. Because $\frac{{{\mu}_{j}}}{1+{{\mu}_{j}}}<1$ , the overall estimated SNR2 cannot go negative.
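
The whole procedure, from prewhitening through the reduced eigenvalue problem to the estimated figure of merit, fits comfortably in a short Python function. The sketch below assumes a diagonal average noise covariance and a matrix of noise-free sample images scaled so that R Rᵗ is the sample object covariance; this scaling convention and all numerical inputs are assumptions for illustration, and the exact definitions in (49)–(52) should be used in a real study.

```python
import numpy as np

def hotelling_snr2(delta_g_bar, noise_var, noise_free_samples):
    """Estimate SNR^2 for the Hotelling observer via the covariance
    decomposition K_hat = K_noise + (sample object covariance), using the
    reduced N_s x N_s eigenvalue problem described around (126)-(134).

    delta_g_bar:        (M,) mean difference signal
    noise_var:          (M,) diagonal of the average noise covariance
    noise_free_samples: (M, N_s) matrix R, assumed scaled so that R @ R.T
                        is the sample object covariance (a sketch only)."""
    w = 1.0 / np.sqrt(noise_var)                  # prewhitening for the noise term
    dg_tilde = w * delta_g_bar                    # prewhitened difference signal
    R_tilde = noise_free_samples * w[:, None]     # prewhitened R
    # Small eigenproblem for R_tilde.T @ R_tilde, lifted to data space (cf. (127), (128)).
    mu, psi = np.linalg.eigh(R_tilde.T @ R_tilde)
    keep = mu > 1e-12 * mu.max()
    mu, psi = mu[keep], psi[:, keep]
    phi = (R_tilde @ psi) / np.sqrt(mu)           # eigenvectors in data space
    proj = phi.T @ dg_tilde
    # SKE-like term minus the reduction due to object variability.
    return float(dg_tilde @ dg_tilde - np.sum(mu / (1.0 + mu) * proj ** 2))

# Toy example with random inputs (illustration only).
rng = np.random.default_rng(1)
M, Ns = 500, 40
delta = 0.1 * rng.normal(0, 1, M)
var_noise = rng.uniform(5, 10, M)
R = 0.5 * rng.normal(0, 1, (M, Ns))
print(hotelling_snr2(delta, var_noise, R))
```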

From table 4, we see that $\mathbf{\bar{g}}$ , $\overline{{\mathbf{\bar{g}}}}$ and hence $\Delta \overline{{\mathbf{\bar{g}}}}$ are ∝ D and that ${{\left[\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}\right]}^{-1/2}}\propto {{D}^{-1/2}}$ in ECT. It follows that ${{\mathbf{\widetilde R}}} {{{{\mathbf{\widetilde R}}}}^{t}}\propto D$. Moreover, because the eigenvectors ϕj always have unit norm, μj ∝ D. Thus we can write μj = μj1D, where μj1 is the eigenvalue found by solving (127) for D = 1 in whatever units we wish (e.g. Gy or mGy) and for whatever particular dose quantity we are discussing (e.g. organ dose, ROI dose or even effective dose).

Assembling these results, we find that the estimated FOM scales with dose as

Equation (135)

where a is $\Delta {{\overline{{\mathbf{\bar{g}}}}}^{t}}{{\left[\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}\right]}^{-1}}\Delta \overline{{\mathbf{\bar{g}}}}$ as computed for D = 1 and γj is the difference signal after prewhitening for the noise covariance and projecting it onto the jth basis vector in the KL expansion of the object covariance:

Equation (136)

The result in (134) holds for TCT as well as ECT, in spite of the fact that the system operator $\mathcal{H}$ is nonlinear for TCT; the exponential nonlinearity is included in the simulation code that generates noise-free sample images. From results in table 4, we can also show that (135) holds for TCT provided the excess detector noise (c in the table) can be neglected.

6.3.3. Hotelling observer applied to reconstructed images.

So far we have discussed application of the Hotelling observer to raw projection data g, but the expression in (134) still applies formally if we substitute the reconstruction $\mathbf{\hat{f}}$ for g everywhere. The practical difference is that the prewhitening operator is more complicated with reconstructions.

For linear algorithms in ECT, we know from (21) that $\mathbf{\bar{K}}_{{\mathbf{\hat{f}}}}^{\text{noise}}=\mathcal{O}\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}{{\mathcal{O}}^{t}}$ , where the reconstruction operator $\mathcal{O}$ is an N × M matrix with elements Onm, with index n specifying a voxel located at position rn in the reconstructed image. With the pure Poisson noise model of (8), we find (see Barrett and Myers (2004), section 15.2.6)

Equation (137)

To understand this expression and its implications for prewhitening, it is useful to look at the elements of the conditional covariance matrix for a single object, ${{\left[\mathbf{K}_{\mathbf{\hat{f}}\mid \mathbf{f}}^{\text{noise}}\right]}_{nn\prime}}$ , obtained from (137) just by omitting the overbars on K and f. With (2), we find

Equation (138)

The conditional variance of the reconstruction at voxel n is the nn diagonal element of this covariance matrix, which can be written as

Equation (139)

where ℵn(r), called the noise kernel (see Barrett and Myers (2004), section 15.2.6), is defined by

Equation (140)

In 2D filtered backprojection (FBP), ℵn(r) varies approximately as $|\mathbf{r}-{{\mathbf{r}}_{n}}{{|}^{-1}}$, with the disconcerting consequence that the variance at the center of the image of a disk object increases linearly, without bound, as the diameter of the disk increases (see Barrett and Myers (2004), section 17.3.2). A variance map (an image of $\text{Var}({{\hat{f}}_{n}})$ versus rn) looks like a highly blurred version of the object f (r). Moreover, the correlations predicted by (138) have very long range, especially if a line drawn through voxels n and n' passes through the center of rotation in a tomographic imaging system; low-dose FBP images exhibit radial noise patterns extending across the whole reconstruction.

In short, the noise in tomographic reconstructions with linear algorithms is very nonlocal, with strong, long-range correlations. For the present discussion, this means that $\mathbf{\bar{K}}_{{\mathbf{\hat{f}}}}^{\text{noise}}$ , though in principle full rank, is nowhere close to being diagonal, and computation of the prewhitening operator ${{\left[\mathbf{\bar{K}}_{{\mathbf{\hat{f}}}}^{\text{noise}}\right]}^{-1/2}}$ is far from trivial.

The noise covariance matrix behaves quite differently when nonlinear reconstruction algorithms such as MLEM (maximum-likelihood expectation maximization) are used. The noise is much more local, the variance map looks very much like the object, and ${{\left[\mathbf{K}_{\mathbf{\hat{f}}\mid \mathbf{f}}^{\text{noise}}\right]}_{nn\prime}}$ falls off rapidly as $|{{\mathbf{r}}_{n}}-{{\mathbf{r}}_{n\prime}}|$ increases (Barrett et al 1994, Wilson et al 1994). The average noise covariance matrix $\mathbf{\bar{K}}_{{\mathbf{\hat{f}}}}^{\text{noise}}$ is sparse, and there are efficient algorithms for prewhitening.
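
For small simulated systems, the covariance propagation described above can be carried out explicitly. The Python sketch below forms O Diag(ḡ(f)) Oᵗ for a linear reconstruction matrix under the pure Poisson model and extracts the variance map from its diagonal; the matrix sizes and entries are hypothetical, and a realistic system would require sparse or matrix-free implementations.

```python
import numpy as np

def reconstruction_noise_covariance(O, g_bar):
    """Conditional noise covariance of a linear reconstruction f_hat = O g
    under pure Poisson data noise: K = O Diag(g_bar(f)) O^T (cf. the
    discussion around (137)-(140)). O is the N x M reconstruction matrix and
    g_bar the mean projection data for one object; toy sizes only."""
    return (O * g_bar[None, :]) @ O.T

# Hypothetical small system: the variance map is the diagonal of K.
rng = np.random.default_rng(3)
N, M = 64, 128
O = rng.normal(0, 1.0 / M, (N, M))
g_bar = rng.uniform(50, 150, M)
K = reconstruction_noise_covariance(O, g_bar)
variance_map = np.diag(K)
print(variance_map[:5])
```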

6.3.4. Channelized Hotelling observers.

All of the computational difficulties associated with estimating SNR2 for the Hotelling observer, and its dependence on radiation dose, disappear if we use the channelized Hotelling observer. If the number of sample images is much greater than the number of channels (Ns >> Nc), then the sample covariance matrix for the channel outputs, ${{\mathbf{\hat{K}}}_{\mathbf{v}}}$ , will be invertible and well conditioned, for both raw data and for reconstructed images. There is no need to use the covariance decomposition, and there may not be much advantage to simulating noise-free images.

6.4. Dose dependence of figures of merit for estimation tasks

6.4.1. Linear estimation of tracer uptake in a defined ROI.

The estimators used for estimating radiotracer uptake in SPECT or PET are usually simple templates applied to the reconstructed images. The templates do not account for the spatial resolution of the imaging system, the background statistics, the noise correlations introduced by the reconstruction algorithm or the effects of radiation dose. They do not acknowledge the presence of null functions in the system operator or the effect of nuisance parameters. As a result, the estimates can be highly biased and suboptimal in terms of variance. Various correction methods referred to as contrast recovery or partial-volume correction have been invented, but they usually assume that the background is spatially uniform and nonrandom. For a recent review of partial-volume-correction techniques for ECT, see Erlandsson et al (2012).

The activity of a SPECT or PET tracer inside a specified ROI is a scalar defined as θ ≡ χ†f, where χ represents an indicator function χ (r) which equals 1 inside the ROI and 0 outside. The expression χ†f denotes a scalar product in object space as defined by the integral in (30).

A linear estimate of the uptake from a reconstructed image is $\hat{\theta}\left(\mathbf{\hat{f}}\right)\equiv {{\mathbf{w}}^{t}}\mathbf{\hat{f}}$ , where w is some template in reconstruction space, perhaps just a voxelized version of χ. For simplicity we assume that the reconstruction is linear so the operator $\mathcal{O}$ is the N × M matrix O.

It is straightforward to show that the bias and variance of this estimate, conditioned on a particular object, are given, respectively, by

Equation (141)

The conditional mean-square error, MSE(f), is related to conditional bias and variance by (32), and a further average over objects, followed by some algebra, yields

Equation (142)

The first term in this equation is the average variance in the estimate arising from noise in the image; the second term is the variance arising from object randomness, and the third term is the square of the ensemble-average bias. Both the second and third terms would be zero if we could choose the template so that ${{\mathbf{w}}^{t}}\mathbf{O}\mathcal{H}={\mathbf{\chi^\dagger}} $ , or equivalently ${{(\mathbf{O}\mathcal{H})}^\dagger} \mathbf{w}=\mathbf{\chi}$ , but this is not possible if the indicator function has null components, where $\mathbf{O}\mathcal{H}{{\mathbf{\chi}}_{\text{null}}}=0$ . In essence, the last two terms in (142) arise because we are trying to estimate a parameter that is not estimable.

With the dose dependences from table 4, we find that the first term in (142) varies linearly with any measure of dose, D, and the second and third terms go as D2. Of course, the parameter being estimated—activity in an ROI—is also linearly related to dose, and it is reasonable to normalize the estimate to the total injected activity. The normalized EMSE (denoted NEMSE) then has a noise term that varies as 1/D plus two dose-independent terms that come from object randomness and from the null functions. Smaller NEMSE is better, so its reciprocal can be used as a figure of merit, and we see that

${{\left[\text{NEMSE}\right]}^{-1}}=\dfrac{aD}{1+bD}\qquad (143)$

which is the same dependence as seen in (120) for a detection task with a dose-independent template.

6.4.2. More sophisticated estimators of tracer uptake in an ROI.

We cannot eliminate the bias term in (142), but we can minimize it by finding a least-squares solution to ${{(\mathbf{O}\mathcal{H})}^\dagger}\mathbf{w}=\mathbf{\chi}$ ; that is, we seek a template whose backprojection is, as closely as possible, the indicator function that defines the parameter being estimated. From Barrett and Myers (2004), section 1.7.4, the solution is

${{\mathbf{w}}_{\text{LS}}}={{\left(\mathbf{O}\mathcal{H}{{\mathcal{H}}^{\dagger}}{{\mathbf{O}}^{t}}\right)}^{+}}\mathbf{O}\mathcal{H}\boldsymbol{\chi}\qquad (144)$

where superscript + denotes a Moore–Penrose pseudoinverse. The operator $\mathbf{O}\mathcal{H}{{\mathcal{H}}^\dagger}{{\mathbf{O}}^{t}}$ is an N × N matrix, and in many cases the pseudoinverse becomes an ordinary matrix inverse.

As we saw in section 3.4, the Gauss–Markov estimator (GME) minimizes the first term in (142), i.e. the average variance arising from image noise. From (70), as transcribed to apply to reconstructed images, its template satisfies

Equation (145)

The dose dependence of the FOM given in (143) still holds for the LS and GM templates, though of course the constants are different.

If we wish to minimize the sum of the three terms in (142), we should use the Wiener estimator from (73). For the special case of estimating a scalar linear parameter such as uptake in an ROI from a reconstructed image, the Wiener estimator becomes (Whitaker et al 2008)

Equation (146)

where $\bar{\theta}$ is the prior mean of θ. The corresponding EMSE is

Equation (147)

where $\sigma _{\theta}^{2}$ is the prior variance of θ. From the ECT column of table 4, $\sigma _{\theta}^{2}\sim {{D}^{2}}$ , ${{\mathcal{K}}_{\mathbf{f}}}\sim {{D}^{2}}$ , $\mathbf{\bar{K}}_{{\mathbf{\hat{f}}}}^{\text{noise}}\sim D$ and $\mathbf{K}_{\overline{{\mathbf{\hat{f}}}}}^{\text{obj}}\sim {{D}^{2}}$ . To get the overall dose dependence, we can approximate $\mathbf{K}_{{\mathbf{\hat{f}}}}^{-1}$ as in (133), but with eigenvalues and eigenvectors determined by use of simulated noise-free reconstructions $\overline{{\mathbf{\hat{f}}}}$ rather than simulated, noise-free g.

6.4.3. Estimation of signal parameters.

Now suppose we separate the object into background and signal parts as in (74), $\mathbf{f}={{\mathbf{f}}_{\text{bg}}}+{{\mathbf{f}}_{\text{sig}}}(\boldsymbol{\theta})$, and our task is to estimate the P × 1 signal parameter vector θ. For example, the components of θ might be the 3D coordinates, diameter and tracer uptake for a tumor modeled as a sphere, or they might include other parameters to incorporate departures from sphericity or spatial variations of uptake within the boundaries. Some of these components might be null functions (e.g. fine structures in the signal beyond the resolution limit of the system) and some may be nuisance parameters (e.g. precise location of a tumor when the task is to estimate the total tracer uptake).

The use of models specified by a small number of parameters usually means that all of the parameters are estimable—provided the model is correct. On the other hand, if the model is wrong (as all models are), then there is a new form of bias arising from modeling error. There is no reason to think that bias from modeling error will depend on dose in the same way as bias arising from properties of the estimator or bias stemming from null functions in the template that defines the parameter, but this point does not seem to have been studied in the imaging literature.

We will proceed as if the model used in defining ${{\mathbf{f}}_{\text{sig}}}(\boldsymbol{\theta})$ is an accurate description of the real object and θ is estimable. We assume also that the set of P parameters (components of θ) is adequate to describe the signal completely, with no remaining randomness.

6.4.4. Dose dependence of ML and MAP estimates of signal parameters.

In section 3.4 we introduced a multivariate normal data model and some reasonable approximations under which the log-likelihood, $ \ln\ \text{pr}\left(\mathbf{g}\mid \boldsymbol{\theta }\right)$ , reduces to a linear function of g, albeit a nonlinear function of θ (see (76)). The corresponding P × P Fisher information matrix is given in (77).

This data model was also considered by Abbey et al (1998), who used earlier work by Barndorff-Nielsen (1983) and Fessler (1996) to develop methods for approximating the PDF of maximum-likelihood and maximum a posteriori estimates. For low noise and ML estimation, Abbey et al showed that

Equation (148)

which has the same structure as the generic multivariate normal PDF (see (39)) except that the Fisher information matrix F (θ) has replaced the inverse of the covariance matrix, ${{\left[{{\mathbf{K}}_{\widehat{\boldsymbol{\theta }}\mid \boldsymbol{\theta }}}\right]}^{-1}}$ , and the true parameter vector θ has replaced the mean of the estimate.

The PDF in (148) could have been obtained by citing a theorem that says that in the asymptotic (low-noise) limit, a maximum-likelihood estimate is unbiased, efficient and normally distributed, but Abbey et al derived it from a different viewpoint that facilitated its extension to higher noise where the PDF is not multivariate normal and to MAP estimates which are necessarily biased. The approximations derived by Abbey et al were in excellent agreement with Monte Carlo simulations.

The FIM that appears in (148), defined in (77), involves ${\bf{K}}_{\bf{g}}^{ - 1}$, which can be approximated by (133) if we can simulate realistic noise-free images. If all components of θ are defined to be independent of dose, then the derivatives in the FIM, $\partial {{\overline{{\bar{g}}}}_{m}}/\partial {{\theta}_{p}}$, have the same dependence on dose as ${{\overline{{\bar{g}}}}_{m}}$ itself, and the dose dependence of any component of the FIM has the same form as the detection SNR² in (135). The variance of any component of an ML estimate as a function of dose can then easily be found numerically by inverting the P × P FIM, and from there we can get the dose dependence of the EMSE (35), the WEMSE (36), or any other desired measure of efficacy for the estimation task.
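As a concrete illustration of this numerical recipe, the sketch below builds a FIM under the table-4 scalings for ECT (mean data and its parameter derivatives proportional to D, noise covariance proportional to D, object covariance proportional to D²) and inverts it to obtain the Cramér–Rao bound on the parameter estimates as a function of dose. The arrays are made-up placeholders standing in for simulated noise-free data; this is a minimal sketch, not the procedure of any particular paper.

```python
import numpy as np

def fisher_information(dose, dgbar_dtheta_unit, K_noise_unit, K_obj_unit):
    """Fisher information for a Gaussian data model whose mean scales linearly
    with dose and whose covariance is a noise term (proportional to dose) plus
    an object term (proportional to dose squared)."""
    dg = dose * dgbar_dtheta_unit                      # M x P derivatives at this dose
    Kg = dose * K_noise_unit + dose**2 * K_obj_unit    # M x M data covariance
    return dg.T @ np.linalg.solve(Kg, dg)              # P x P FIM

# hypothetical example: M = 50 measurements, P = 2 signal parameters
rng = np.random.default_rng(0)
M, P = 50, 2
dgbar = rng.normal(size=(M, P))                        # derivatives of the mean data at unit dose
A = rng.normal(size=(M, M))
K_obj = A @ A.T / M                                    # object covariance at unit dose
K_noise = np.diag(np.abs(rng.normal(1.0, 0.1, size=M)))  # Poisson-like diagonal noise term

for D in (0.5, 1.0, 2.0, 8.0, 32.0):
    crb = np.linalg.inv(fisher_information(D, dgbar, K_noise, K_obj))
    print(f"D = {D:5.1f}   CRB diagonal = {np.diag(crb)}")
```

Because each FIM element then behaves like D/(a + bD), the bound on the estimate variances decreases with dose at low dose but levels off once object randomness dominates.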

6.5. Graphical depictions of tradeoffs

The results in sections 6.3 and 6.4 enable us to plot task-based measures of image quality as a function of any desired measure of dose. Examples of such plots are given in figure 3, where the performance of a linear discriminant with a dose-independent template, referred to here as a non-prewhitening matched filter (NPWMF), is compared to the performance of a Hotelling observer, a linear discriminant whose template varies optimally with dose. The SNR² was calculated from (120) for the NPWMF and from (135) for the Hotelling observer; in both cases, SNR was converted to AUC by use of (28). The abscissa in this figure, D, can be any convenient measure of dose, including absorbed dose at a point, dose in a region of interest, organ dose, or effective dose.


Figure 3. Schematic quality-dose characteristic (QDC) curves for a binary classification task. Left: quality specified by SNR². Right: quality specified by area under the ROC curve (AUC). For both panels D can be any suitable dose measure (see text).


An interesting observation from figure 3 is that AUC as a function of D saturates at a level less than AUC = 1 for the NPWMF, so perfect performance cannot be obtained with this observer even at infinite dose. For the Hotelling observer, on the other hand, AUC is asymptotic to 1 as the dose increases (provided the covariance matrix is known accurately).

Note that the logarithm of SNR² is plotted in figure 3. This makes it easy to see how the curves vary with signal contrast C. For both observers, SNR² ∝ C², so the graphs of ln[SNR²] simply shift vertically as the contrast is changed. Even though the AUC saturates at 1.0 at high dose for some specific contrast, it does not follow that more dose is not useful; it could be useful for detecting a signal with smaller contrast. For this reason, it can be very helpful to plot detection performance in terms of SNR rather than AUC.
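Schematic QDC curves of this kind are easy to generate numerically. In the sketch below the functional forms of SNR²(D) are purely illustrative, chosen only so that the NPWMF curve saturates while the Hotelling curve keeps growing; the SNR-to-AUC conversion assumes that (28) is the usual error-function relation for a normally distributed test statistic.

```python
import numpy as np
from math import erf

def auc_from_snr(snr):
    # AUC = Phi(SNR / sqrt(2)) = 0.5 * [1 + erf(SNR / 2)] for a normal test statistic
    return 0.5 * (1.0 + erf(snr / 2.0))

D = np.linspace(0.01, 200.0, 400)   # arbitrary dose units
C = 1.0                             # signal contrast; SNR^2 scales as C^2 for both observers

# Illustrative forms only: object variability (~D^2) eventually dominates Poisson
# noise (~D) for the dose-independent NPWMF template, so its SNR^2 saturates;
# the Hotelling template re-whitens, so its SNR^2 is allowed to keep increasing here.
snr2_npwmf = C**2 * 4.0 * D / (1.0 + D)
snr2_hot = C**2 * (4.0 * D / (1.0 + D) + 0.5 * D)

auc_npwmf = [auc_from_snr(np.sqrt(s)) for s in snr2_npwmf]
auc_hot = [auc_from_snr(np.sqrt(s)) for s in snr2_hot]

print("NPWMF AUC at the largest dose:", round(auc_npwmf[-1], 4))     # saturates well below 1
print("Hotelling AUC at the largest dose:", round(auc_hot[-1], 4))   # approaches 1
```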

The QDC curves of figure 3 can be converted to plots of task performance versus some measure of risk from the ionizing radiation. For this purpose we consider three response models introduced in section 5 and plotted in figure 4. The resulting quality-risk characteristic (QRC) curves derived by combining figures 3 and 4 are shown in figure 5. The qualitative features of the QRC curves are relatively insensitive to the response model chosen; in particular, the performance of the NPWMF saturates at AUC < 1 as a function of risk for all models.


Figure 4. Schematic curves of risk versus dose for three response models. The term 'risk' is used broadly here and in subsequent graphics to include any of the absolute or relative risks defined in section 5 or the rates of cancer incidence given by (87) or (88). The solid curve is the conventional linear response model with no threshold (LNT); the dashed curve is the linear-quadratic (LQ) model of (89), and the dotted line is a linear-quadratic model modified with an exponential factor (LQE) as in (90).


Figure 5. Schematic quality-risk characteristic (QRC) curves. Left: SNR² on a binary detection task for each of the three risk models shown in figure 4 and for NPWMF and Hotelling observers. Right: AUC for the same observers and risk models.


None of the three RDCs illustrated in figure 4 and used to construct figure 5 explicitly includes a threshold. The only threshold-based model introduced in section 5, and commonly used in the community, is the Weibull model of (84). This model for tissue-reaction effects has three free parameters: threshold T, shape factor V and dose for 50% response, D50. As discussed in section 5, there is no strong evidence for a non-zero threshold for certain of the tissue-reaction effects (in particular circulatory disease, cataracts and CNS damage); nevertheless, ICRP (2012) recommends use of a practical threshold dose for various tissue-reaction effects, below which the incidence of morbidity would be less than 1%, as indicated in table 2.

With this definition of the threshold, the Weibull distribution for various shape factors is shown in figure 6. The interesting range of doses on this plot is D ≪ D50, and the corresponding QRCs are very similar to those for the no-threshold models.


Figure 6. Plots of the Weibull probability of a tissue-reaction effect. Left: threshold set at 1% incidence as recommended in ICRP (2012) and various shape factors. Right: fixed shape factor and various thresholds. Note that the graph on the right covers a small area in the lower left corner of the graph on the left.


It is important to note that the ICRP-recommended thresholds for tissue-reaction effects are on the response axis, not the dose axis. In essence, ICRP merely says that if you keep the dose below the threshold listed in table 2, the incidence of the effect will be less than 1%.
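As a concrete illustration, the sketch below uses one common Weibull parameterization (assumed here purely for illustration; the exact form of (84) may differ) and solves for the dose at which the modeled incidence first reaches 1%, which is the sense in which the ICRP practical threshold sits on the response axis. The parameter values are made up, not ICRP values.

```python
import numpy as np

def weibull_response(dose, T, V, D50):
    """Probability of a tissue-reaction effect under an assumed Weibull model with
    dose threshold T, shape factor V and 50%-response dose D50 (zero below T)."""
    scaled = np.clip((np.asarray(dose, dtype=float) - T) / (D50 - T), 0.0, None)
    return 1.0 - np.exp(-np.log(2.0) * scaled**V)

def dose_at_incidence(T, V, D50, incidence=0.01):
    """Dose at which the modeled incidence reaches the stated level (1% by default)."""
    scaled = (-np.log(1.0 - incidence) / np.log(2.0)) ** (1.0 / V)
    return T + scaled * (D50 - T)

# made-up parameters: no dose threshold, shape factor 3, 50% response at 5 Gy
print(dose_at_incidence(T=0.0, V=3.0, D50=5.0))   # about 1.2 Gy for 1% incidence
```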

Moreover, the dose must be in Gy, not Sv. A threshold, if there is one, must be local to where the adverse effect occurs in the body, not somehow averaged over the body or even necessarily over an organ. For example, radiation-induced cataractogenesis depends on dose to the lens, not the whole eye and certainly not the whole body. The energy absorbed in the lens divided by the mass of the lens is the only quantity that can matter.

7. Methods of dose reduction and their effect on task performance

7.1. General considerations

In SPECT, gamma-ray detectors are used with image-forming elements such as collimators, pinholes or coded apertures to form an image of the distribution of a gamma-ray-emitting radiopharmaceutical. Factors that influence task performance include: the properties of the image-forming elements; the characteristics of the image detector; the kind of information acquired and stored by the image detector; the reconstruction algorithm and, most importantly, the radiopharmaceutical itself.

PET imaging usually uses arrays of discrete detector elements operated in coincidence, so the image formation is implicit in the detector configuration itself. Other factors that can influence task performance include the coincidence timing resolution, the reconstruction algorithm and, again, the radiopharmaceutical.

In these emission modalities, the absorbed dose distribution in the patient's body, and hence any potential risk to the patient, is determined entirely by the radiopharmaceutical chosen and the amount of activity administered. Progress in detectors, image formation and reconstruction algorithms can improve task performance and diagnostic efficacy, but such advances will result in lower dose or risk only if the administered activity of the radiotracer is reduced.

Similarly, in transmission x-ray imaging modalities such as mammography and CT, the absorbed dose distribution and potential risk are determined entirely by the radiation incident on the patient's body. Image quality can be increased by improvements in the image detector, reconstruction algorithm and other imaging factors, but these measures result in dose reduction only if the incident radiation is reduced in some way. For a survey of strategies for reducing radiation dose in CT, see McCollough et al (2009) or Mattsson (2011).

Moreover, as we saw in sections 6.4 and 6.5, the relationship between radiation dose and objective measures of image quality can be very nonlinear, depending on the task, the observer and the class of objects being imaged. In particular, it is very important to consider both image noise, as expressed by $\mathbf{K}^{\text{noise}}$, and object randomness, as given by $\mathbf{K}^{\text{obj}}$. These covariance terms depend differently on dose (see table 4), and both must be considered in meaningful assessment of image quality and strategies for dose reduction. Because of the object randomness, there are many situations where the task performance saturates as a function of dose or risk (see figures 3 and 5), and in these cases dose reduction by reducing the injected activity in ECT or the tube current in TCT will not degrade image quality at all.

Conversely, any study of image quality or dose reduction that considers only a single patient or phantom, and states the result as contrast-to-noise ratio, subjective appearance or any other metric that does not involve an average over objects, will have little to do with clinical imaging.

In the subsections that follow, we present the major categories of approaches to dose reduction, provide some references to research on their relation to objective measures of image quality, and make general comments on the state of the art in each category.

7.2. DRAPE: Dose Reduction by Aperture Enhancement

It has long been recognized that pinholes and collimators are very inefficient in collecting gamma-ray photons from a radionuclide source. More photons can be collected by increasing the diameter of a pinhole or the bore of a collimator, but the concomitant loss in spatial resolution can degrade image quality as measured by task performance. One of the earliest papers on task-based assessment was by Tsui, Metz and others on optimizing collimators by using statistical decision theory and observer performance (Tsui et al 1978).

An alternative to pinholes and collimators is coded-aperture imaging, introduced into nuclear medicine in the early 1970s (Barrett 1972a, 1972b, Rogers et al 1972). Coded apertures can collect far more photons than pinholes or collimators for the same exposure time and spatial resolution, but the overlap of projection images, called multiplexing, can reduce task performance. A proper analysis of the effect of multiplexing must account for both noise and object randomness. Myers et al (1990) evaluated the impact of spatially varying, or lumpy, backgrounds on the optimum aperture size for both detection and discrimination tasks. They found that, while detectability of a signal in a uniform background is maximized by enlarging the aperture, a lumpy background results in an optimum aperture that is naturally matched to the size of the signal. For the task of discriminating a single signal from a double signal, a so-called Rayleigh task, the optimum aperture size is smaller in the presence of a lumpy background than in the flat-background case. These evaluations were among the first to demonstrate the use of statistical backgrounds in the objective evaluation of image quality with model observers, and in particular they showed that greater resolution is needed in the presence of a non-uniform background than when the background is known and uniform. Hesterman et al (2007) performed a detailed experimental and simulation-based study of multiple-pinhole apertures for lesion detection in a lumpy background. The experimental lumpy-background phantom consisted of plastic beads immersed in a radioactive solution. Measured covariance functions exhibited a power-law dependence previously seen in simulation studies, helping to validate the use of simulated lumpy backgrounds in observer studies. Detection performance was quantified with a channelized Hotelling observer using Laguerre–Gauss channels, as a function of radiation dose, signal size, lump size and aperture configuration. These studies revealed an improvement in AUC for certain combinations of signal and lump sizes through the use of multiplexed, multiple-pinhole apertures, indicating a need for task-specific aperture optimization.
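For readers who want to reproduce this kind of evaluation, the following is a minimal sketch of a channelized Hotelling observer with rotationally symmetric Laguerre–Gauss channels, estimated from samples of signal-present and signal-absent images. The array names and parameter values are illustrative, and details such as the channel width and number of channels would have to be tuned to the task; it is not the code used in the studies cited above.

```python
import numpy as np

def laguerre_gauss_channels(npix, n_channels, a):
    """Laguerre-Gauss channel profiles on an npix x npix grid; 'a' is the
    Gaussian width parameter in pixels. Returns an (npix*npix, n_channels) matrix."""
    x = np.arange(npix) - (npix - 1) / 2.0
    X, Y = np.meshgrid(x, x)
    g = 2.0 * np.pi * (X**2 + Y**2) / a**2
    channels = []
    for j in range(n_channels):
        Lj = np.polynomial.laguerre.lagval(g, [0.0] * j + [1.0])   # Laguerre polynomial L_j
        u = (np.sqrt(2.0) / a) * np.exp(-g / 2.0) * Lj
        channels.append(u.ravel() / np.linalg.norm(u))
    return np.stack(channels, axis=1)

def cho_snr(imgs_sig, imgs_bkg, U):
    """Channelized Hotelling SNR from samples of signal-present and signal-absent images,
    each given as an (n_samples, npix, npix) array; U holds the channel profiles."""
    v_sig = imgs_sig.reshape(len(imgs_sig), -1) @ U      # channel outputs, signal present
    v_bkg = imgs_bkg.reshape(len(imgs_bkg), -1) @ U      # channel outputs, signal absent
    dv = v_sig.mean(axis=0) - v_bkg.mean(axis=0)
    K = 0.5 * (np.cov(v_sig, rowvar=False) + np.cov(v_bkg, rowvar=False))
    w = np.linalg.solve(K, dv)                           # Hotelling template in channel space
    return np.sqrt(dv @ w)
```

Repeating such an SNR calculation at several dose levels, signal sizes and aperture configurations yields the kind of task-specific comparison described above.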

There are many other papers in the literature that use task-based methods to optimize collimators for SPECT; examples include (Kijewski et al 2001, Zhou et al 2008, Zhou and Gindi 2009). For a review of assessment methodology based on Monte Carlo simulation of SPECT systems, see Ljungberg and King (2012).

7.3. DRUID: Dose Reduction with Upgraded Image Detectors

This subsection deals with SPECT and PET detectors; x-ray detectors are discussed in section 7.4.

7.3.1. Detectors for SPECT.

The detectors commonly used in SPECT and planar nuclear medicine are scintillation cameras, modern variants of the venerable Anger camera (Peterson and Furenlid 2011), but there is also considerable current interest in semiconductor detector arrays. The important characteristics of these detectors that can affect image quality, and eventually radiation dose, are detection efficiency, energy resolution, spatial resolution and overall detector area. The detector area divided by the area of one resolution cell is referred to as the space-bandwidth product.

Of these detector properties, perhaps the most important in terms of dose and image quality is the space-bandwidth product, which is basically the dimension M of the data vector g in the preceding sections. All else being equal, increasing M will always increase the task performance at any given dose, or reduce the dose required for a specified task performance for any task.

Moreover, increasing the space-bandwidth product of the detector also opens up the opportunity for new imaging configurations that improve task performance. For example, in a simulation study of new pinhole imaging systems for brain SPECT, Rogulski et al (1999) demonstrated that a detector with a very large number of resolvable elements permits the use of a large number of pinholes without overlap of the images on the detector. The result is that the system can collect more photons, without multiplexing or loss of spatial resolution, than would be possible with a detector having a smaller space-bandwidth product.

Improvements in energy resolution in SPECT detectors result immediately in improvements in task performance by reducing the number of scattered photons that contribute to the data. In addition, better energy resolution facilitates imaging studies with multiple radioisotopes or multiple emission lines from a single isotope. Improved energy resolution is the main motivation for developing semiconductor detector arrays.

One way to improve the spatial and energy resolution of either scintillation cameras or semiconductor detector arrays is to use maximum-likelihood estimation to determine the position and energy of each gamma-ray interaction event. For a review of ML estimation applied to scintillation cameras, see Barrett et al (2009), and for a discussion of real-time implementations, see Hesterman et al (2010). For more on this topic, see section 7.4.

7.3.2. Detectors for PET.

For a review of the basic principles of PET imaging, see Muehllehner and Karp (2006), and for a tutorial on recent developments in PET detector technology, see Lewellen (2008).

Current PET systems usually use discrete arrays of scintillation detectors. Space-bandwidth product in these detectors can be interpreted simply as the number of individual scintillation crystals. In an attempt to get high absorption efficiency for 511 keV gamma rays and good spatial resolution, these crystals are usually several cm long and a few mm in cross section.

Early PET systems used a single ring of scintillation detectors to image a single 2D slice of the patient at a time, but recent trends are in the direction of multiple contiguous rings that image a significant volume of the patient with one position of the detector array. This improves the photon collection efficiency, and it facilitates dynamic studies, which are an important application of PET.

Advances in scintillation crystals, photodetector technology and readout electronics have led to improvements in coincidence timing resolution, thereby improving the rejection of accidental coincidences and allowing some degree of localization of a positron annihilation event in three dimensions by estimating the difference in time of flight (TOF) of the two gamma rays to their respective detectors.

These technological advances should lead directly to improvements in image quality or allow a reduction in the administered activity (hence reducing the absorbed dose) at the same image quality. A few groups are now using task-based assessment methods to investigate these gains quantitatively.

El Fakhri, Surti and coworkers have used model and human observers to demonstrate improvements in lesion detection in oncological PET both by volumetric imaging and with TOF-PET (El Fakhri et al 2007, El Fakhri et al 2011, Surti et al 2011). Lartizien et al (2004) compared the impact of 2-dimensional (2D) and fully 3-dimensional (3D) acquisition modes on the performance of human observers in detecting and localizing tumors in whole-body 18F-FDG images.

7.4. DRACO: Dose Reduction with the Advent of Counting Options

Detectors for high-resolution digital x-ray imaging usually consist of either a monolithic (non-segmented) slab of scintillator material with an array of photodetectors or a monolithic slab of a semiconductor and an array of electrodes (Yaffe and Rowlands 1997).

These detectors are used in an integrating mode where the accumulated signal after x-ray exposure for some time is read out and stored as the vector g of previous sections. An immediate difficulty is that the charge or light produced by a single x-ray photon usually spreads out over several contiguous readout elements (electrodes or photodetectors), making the noise covariance matrix $\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}$ nondiagonal and reducing task performance for all tasks. Further performance reduction arises from the inevitable noise contributions of dark current and various noise sources in the electronics. There is also the so-called Swank noise (Swank 1973) from the random amount of light or charge produced by different x-ray photons.

All of these deleterious effects can be ameliorated by reading out the charge or light signals produced from individual x-ray photons. The term photon counting is often used for this approach, but it is very misleading. The simplest, and most common, way of 'counting photons' is to determine the sensor element that receives the maximum signal for a particular x-ray interaction and assign the event to a data bin determined by the coordinates of that element. Moreover, the amount of signal in this chosen readout element is interpreted as the energy of the photon. At best it is an estimate of the energy deposited by the photon (which may not be all of its energy), and in practice it is a very poor estimate of the deposited energy because much of the deposited energy produces signals in adjacent elements, which are not even read out, much less used in the estimation of position and energy.

A much better approach, first applied to integrating semiconductor detectors by Marks et al (1999), is to read out the cluster of array elements that receive charge or light from a single x-ray photon and use all of that data to do ML estimation of the position and energy of the interaction. The rate at which these operations can be performed with modern readout technologies and computers is increasing rapidly, and there is no other approach that will produce more dose reduction in x-ray imaging.
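The sketch below is not the estimator of Marks et al (1999); it is a simplified illustration that assumes a Gaussian charge-cloud model of known width and Gaussian readout noise, in which case ML estimation of the interaction position and energy reduces to a weighted least-squares fit over the pixel cluster. All names and parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def mean_signals(x0, y0, E, pix_xy, sigma_c):
    """Mean signal in each cluster pixel, assuming the charge from one x-ray interaction
    spreads as a symmetric Gaussian of width sigma_c centred at (x0, y0), with the
    total collected signal proportional to the deposited energy E."""
    d2 = (pix_xy[:, 0] - x0)**2 + (pix_xy[:, 1] - y0)**2
    frac = np.exp(-d2 / (2.0 * sigma_c**2))
    return E * frac / frac.sum()

def ml_estimate(signals, pix_xy, sigma_c, sigma_read):
    """ML estimate of (x0, y0, E) for Gaussian readout noise of standard deviation
    sigma_read; maximizing the likelihood is then a weighted least-squares fit."""
    def nll(p):
        resid = signals - mean_signals(p[0], p[1], p[2], pix_xy, sigma_c)
        return 0.5 * np.sum(resid**2) / sigma_read**2
    w = np.clip(signals, 0.0, None) + 1e-9
    p0 = [np.average(pix_xy[:, 0], weights=w),      # centroid as a starting point
          np.average(pix_xy[:, 1], weights=w),
          float(signals.sum())]
    return minimize(nll, p0, method="Nelder-Mead").x
```

The same idea carries over to scintillation cameras, with the Gaussian charge cloud replaced by whatever light-spread and photodetector-response model is appropriate to the hardware.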

To be specific, a digital x-ray detector fast enough to respond to individual x-ray interactions, to read out all sensor elements that respond to each event and to do ML estimation of position and energy would accomplish the following goals (Barrett 2007):

  • Eliminate Swank noise
  • Be immune to readout noise (thresholded away)
  • Have pure Poisson image statistics (conditional on f)
  • Have subpixel spatial resolution
  • Provide an estimate of depth of interaction
  • Provide accurate energy information.

The best current approach to meeting these goals might be the use of crystalline silicon detectors where the charge spreading is small and the output of a single electrode is a good estimate of energy.

Tapiovaara and Wagner (1985) showed that a moderate increase in detection performance could be achieved by using a linear discriminant function that assigns different weights to photons of different energies. In a recent paper, Kalluri et al (2013) investigated the improvements in task performance with energy weighting for photon-counting breast CT.

7.5. DROLL: Dose Reduction by Obtaining Likelihood Lists

A common misconception in imaging is that it is necessary to detect a large number of photons on each detector pixel in order to achieve a large detector signal-to-noise ratio (DSNR). This quantity, not to be confused with task-based observer SNR, is defined as the mean signal from a detector element divided by its standard deviation for a constant x-ray flux; for a pure photon-counting detector, DSNR is just the square root of the mean number of detected photons per detector pixel. Some CT manufacturers go so far as to bin together the outputs of adjacent detector elements, thereby deliberately degrading spatial resolution in a quest for increased DSNR. A recent paper by Baek et al (2013) shows that such binning also degrades performance on detection tasks, or equivalently requires more radiation dose for the same performance. In fact, one can show quite generally that the best ideal-observer task performance for any photon-counting detector is obtained by making the detector elements very small, even to the point where the average number of detected photons per element is ≪1.

Rather than trying to construct detectors with ever smaller pixels, it is preferable to use a continuous, monolithic detector material and use ML estimation as discussed in section 7.4 to estimate the position and energy of each interaction. Once this is accomplished, the complete information about each photon interaction is preserved if we store the estimated position, energy and possibly other attributes in a list, what is commonly called list-mode storage.

The relation of list-mode data to task performance has been studied in Barrett et al (1997), Parra and Barrett (1998) and Caucci and Barrett (2012). A concept that emerges from these studies is that of the Ideal Dose Utilizer. Specifically, an Ideal Dose Utilizer is an x-ray or gamma-ray imaging system that maximizes the area under the quality-dose characteristic curve (AUQDC) by:

  • Responding to individual photons;
  • Collecting all sensor signals for each photon;
  • Using the sensor signals to compute ML estimates of photon attributes;
  • Recording the estimated attributes in list mode;
  • Implementing the list-mode ideal observer.

The list-mode ideal observer is discussed in detail in Caucci and Barrett (2012) for detection tasks, and a similar development for estimation tasks is possible.
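For the special case of a detection task in which the mean background and signal attribute densities are known exactly, the list-mode ideal-observer test statistic for a Poisson point process reduces to a simple sum over the recorded events. The sketch below illustrates that reduction; the function handles are hypothetical, and the general treatment in Caucci and Barrett (2012) covers the far more realistic case of random objects.

```python
import numpy as np

def list_mode_log_likelihood_ratio(events, b_density, s_density, S_total):
    """Log-likelihood ratio for 'signal present' versus 'signal absent' when the data
    are a Poisson point process in attribute space. 'events' is an (n_events, d) array
    of estimated attributes (e.g. position and energy); b_density and s_density return
    the mean background and signal attribute densities at those attributes; S_total is
    the mean number of signal counts. Large values favour 'signal present'."""
    b = b_density(events)
    s = s_density(events)
    return np.sum(np.log1p(s / b)) - S_total
```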

7.6. DRAMA: Dose Reduction by Accurate Modeling Algorithms

Iterative image-reconstruction algorithms are used routinely in PET and SPECT (Qi and Leahy 2006), and rapid advances in computer power are making them increasingly feasible for the large data sets in CT as well (Nuyts et al 2013). Iterative algorithms are not expected to be useful in dose reduction just because they are iterative, but rather because they reduce the level of inaccuracy in the modeling of the object and imaging system. Currently, all CT manufacturers implement some kind of iterative algorithm involving a system model, but details are usually not available in the literature.

Many iterative algorithms are variants on the well-known maximum-likelihood expectation-maximization (MLEM) algorithm (Dempster et al 1977, Shepp and Vardi 1982), so they require knowledge of the likelihood function, $\text{pr}\left(\mathbf{g}\mid \mathbf{f}\right)$ . The models used in practice for the likelihood functions are inaccurate for several reasons: (a) the object is represented by a finite set of numbers (usually voxel values) instead of the true continuous function; (b) the system is represented by a matrix rather than a CD operator; (c) various aspects of the physics of the radiation transport and detection process are omitted or oversimplified; (d) prior knowledge of the object support may not be used, and (e) the fact that the object cannot be negative may not be incorporated into the algorithm.

One of the earliest papers to make use of an objective assessment framework for the evaluation of image-reconstruction algorithms, with attention to accurate modeling of the image-formation process, was that of Hanson (1990). In that work Hanson compared unconstrained ART (Algebraic Reconstruction Technique) to ART with a non-negativity constraint, and considered both noise-free and noisy data. Of most relevance to our emphasis in this section is that he explicitly modeled the continuous-to-discrete nature of the image-formation process and incorporated object support into the reconstruction algorithm. The figure of merit was the detectability of low-contrast signals at known, but random, locations. The task was further complicated in that the scenes contained significant artifacts generated by randomly located high-contrast disks, allowing image quality to be evaluated in the presence of such artifacts, particularly when the number of projection views was limited.

In spite of the high degree of interest in iterative algorithms for CT, the great majority of papers in this field do not make use of objective, task-based assessment of image quality. Instead they use subjective ratings of anecdotal images; opinion polls, such as preference studies or conspicuity and sharpness ratings with experienced radiologists; or various noise measures such as variance and contrast-to-noise ratio. Recent examples of this kind of non-task-based assessment of iterative methods for CT include (May et al 2011, Hou et al 2012, Husarik et al 2012, Kalra et al 2012, Katsura et al 2012, Sato et al 2012, Singh et al 2012, Qi et al 2012, Ebersberger et al 2013).

Willemink et al (2013) performed a systematic review of the literature on iterative reconstruction in CT, covering the period January 2006 through January 2012. They found many indications that iterative reconstruction preserved image quality (compared to FBP) and allowed some dose reduction, but they concluded that 'IR has not yet been investigated with clinical diagnosis and accuracy as endpoints'.

A few recent papers have begun to fill this gap in the literature, using human observers, model observers or both for objective assessment of reconstruction algorithms.

Miéville et al (2013) evaluated human observer performance as well as a number of Stage 1 efficacy metrics (CT number accuracy, pixel standard deviation, noise power spectrum and modulation transfer function) for three commercial CT iterative reconstruction methods to assess their image quality and dose reduction performance. The human observers performed four-alternative forced-choice (4AFC) experiments to assess the detectability of low- and high-contrast objects embedded in two pediatric phantoms. These investigators found, not surprisingly, that the largest potential improvements in dose efficiency are to be found by making use of model-based iterative reconstruction algorithms that utilize a full volumetric model of the image acquisition process.

Xu and Tsui (2014) used approximations to the likelihood function and the ideal observer to investigate the effect of inaccurate statistical modeling on a detection task in CT.

An often overlooked form of modeling error is the use of an M × N matrix H to represent a CD operator $\mathcal{H}$. Of course the use of a matrix to approximate an operator is inevitable in computer implementations, but inaccuracies in the representations can impair task performance. In general, the number of reconstruction voxels N should be large, and the elements of the matrix should be as nearly as possible ${{H}_{mn}}={{h}_{m}}({{\mathbf{r}}_{n}})$, where ${{\mathbf{r}}_{n}}$ is the 3D location of voxel n.
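A minimal sketch of this prescription is given below: the matrix is built by sampling each measurement kernel at the voxel centres. The Gaussian kernels here are purely illustrative stand-ins for whatever response (collimator blur, detector response, attenuation) the real continuous-to-discrete operator has.

```python
import numpy as np

def build_system_matrix(kernel_centers, kernel_width, voxel_centers):
    """H[m, n] = h_m(r_n): sample the m-th measurement kernel at the n-th voxel centre.
    Each kernel is modeled here as an isotropic Gaussian centred on kernel_centers[m]."""
    diff = kernel_centers[:, None, :] - voxel_centers[None, :, :]   # (M, N, 3)
    r2 = np.sum(diff**2, axis=2)
    return np.exp(-r2 / (2.0 * kernel_width**2))                    # M x N matrix

# hypothetical geometry: M = 4 measurements, N = 8 voxels on a coarse 3D grid
kernels = np.array([[0.0, 0.0, z] for z in (0.0, 1.0, 2.0, 3.0)])
voxels = np.array([[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)])
H = build_system_matrix(kernels, kernel_width=1.0, voxel_centers=voxels)
print(H.shape)   # (4, 8): an M x N approximation to the CD operator
```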

Finally, accurate modeling of the physics of the photon transport, image formation and photon detection is essential in order to get the best possible task performance. Many groups are addressing this issue in SPECT (Ljungberg and King 2012) and PET (Nuyts et al 2013).

7.7. DRESS: Dose Reduction Enabled by Sparse Sampling

Sparse sampling, also called compressive sensing, refers to the use of 'incomplete data' to form an image. In a way, this term is misleading because all imaging systems that map functions to discrete data sets have null functions, hence there are no complete data sets. In practice, sparse sampling in tomography refers to using fewer projections than would be indicated by sampling theory if the object is both space-limited and band-limited (which is not possible).

The number of projections can be restricted by collecting over a limited range of projection angles, collecting at larger angular increments, or both. In transmission tomography, restricting the number of projections results in lower dose if the dose per projection is held constant.

Hanson (1990) presented a framework for the objective evaluation of sparse data-sampling methods for CT. In that work he compared variations of the ART algorithm in the presence of a limited number of projection views, where the task was the detection of low-contrast disk signals. The task was confounded by the presence of artifacts caused by high-contrast disks at random locations in the object. It is important to evaluate sparsely sampled systems within a framework of this kind, where the artifacts are randomized, in order to account fully and realistically for the associated degradation in task performance.

There is currently a high level of interest in applying sophisticated task-based assessment methodology to digital breast tomosynthesis (DBT) and related limited-angle CT systems. For reviews of DBT, see Dobbins and Godfrey (2003), Sechopoulos (2013a) and Sechopoulos (2013b), and for a comprehensive review of breast CT, see Glick (2007).

A recent paper by Young et al (2013) presented a virtual-trial framework for the assessment of DBT using digital phantoms, open-source x-ray transport codes, and a projection-space model observer (the channelized Hotelling observer). Using this framework, the authors compared various DBT system geometries (in terms of numbers of projection images and overall scan-angle range, for example) in terms of the resulting detectability of a small signal embedded in a large sample of digital breast phantoms with randomly varying anatomical backgrounds. Experiments were repeated for three test cases in which detectability was limited primarily by anatomical variability, quantum noise or electronic noise. It is notable that the authors made use of 1000 digital breast phantoms in an effort to sample more completely the range of patient variability in clinical imaging and thereby derive meaningful estimates of both the mean system performance and its variation resulting from random patient anatomical structure.

Young et al (2012) used the same virtual-trial framework to evaluate various exposure delivery schemes for DBT, including the standard approach that exposes the patient uniformly over projection angles, as well as exposure delivery schemes that emphasize either the central projections or the most oblique projections. This work further illustrates the power of virtual clinical trials using large samples of virtual patients and simulated image acquisition systems for the objective evaluation of image quality and patient exposure settings.

Rupcich et al (2013) evaluated various methods for reducing dose to the breast during CT coronary angiography. In a simulation study, they compared breast shielding, angular tube-current modulation, reduced kV and partial-angle protocols, making use of both a detection task and a localization task. They plotted a variety of figures of merit versus dose, including contrast-to-noise ratio (CNR) and detectability, and found that the system rankings were broadly similar across these figures of merit; in their discussion and conclusion, however, they pointed out the concern that CNR does not account for noise correlations or artifacts.

Makeev and Glick (2013) have investigated a penalized maximum likelihood (PML) algorithm (using Huber and Total Variation priors) for breast CT, and have reported its performance with varying dose levels using a Fourier domain ideal observer model. This same group also studied the PML reconstruction algorithm for breast tomosynthesis, and showed improved performance at lower dose than with filtered backprojection (Das et al 2011).

Das et al (2009) presented the results of human-observer studies (no model observers) using an LROC study design to compare a tomosynthesis acquisition with a uniform dose distribution to one with a variable dose distribution, thereby addressing the question of how best to allocate the dose to be incurred in a given imaging exam. They investigated both mass and microcalcification detection tasks; the images were simulated (without scatter or breast motion) and reconstructed using FBP.

Other recent papers on task-based assessment of DBT and related systems include (Das et al 2009, Park et al 2010, Reiser and Nishikawa 2010, Van de Sompel et al 2011, Warren et al 2012).

8.  Conclusions and recommendations for future research

This topical review has attempted to synthesize current knowledge on image-quality assessment, calculation and specification of radiation dose, and risk of various adverse effects from ionizing radiation, thereby providing the tools for a rigorous risk-benefit analysis of medical imaging systems based on gamma rays and x-rays.

In section 2 we introduced the mathematical framework for this synthesis, emphasizing two key points. First, objects to be imaged are correctly described as functions of spatial variables and possibly time, so the imaging system is a linear or nonlinear operator that, on average, maps a function of continuous variables to a discrete data vector. The basic equation $\mathbf{\bar{g}}=\mathcal{H}\mathbf{f}$ runs throughout this paper. Second, statistical deviations from this average arise from two fundamentally different sources: noise in the measurement process and randomness in the objects themselves. Another recurring basic result is that the covariance matrix for the data can be rigorously decomposed into two terms, ${{\mathbf{K}}_{\mathbf{g}}}=\mathbf{\bar{K}}_{\mathbf{g}}^{\text{noise}}+\mathbf{K}_{{\mathbf{\bar{g}}}}^{\text{obj}}$.

Section 3 is a survey of the principles of objective or task-based assessment of image quality. We consider only image-quality metrics that are defined in terms of patient benefit (efficacy), leaving aside metrics like contrast-to-noise ratio or subjective preference that are only indirectly related to task performance or efficacy. Even with this restriction, however, there is a wide variety of tasks and observers from which to choose. Section 3 starts with a review of the Fryback–Thornbury classification of stages of efficacy in medical imaging. Almost all of the past research on task-based assessment has focused on Stage 2, diagnostic accuracy, and in fact the same is true of this paper, but the long-term goal of image-quality research must be quantitative evaluation of the higher stages, especially Stage 5, patient outcomes, and Stage 6, societal outcomes. The remainder of section 3 surveys observers and figures of merit for classification tasks, parameter-estimation tasks and joint classification/estimation tasks.

Section 4 is a summary of definitions and computational methods for absorbed dose. The important point is that absorbed dose is a function of spatial position and time, d (r, t). It is futile to reduce this function to a single global number, effective dose in sieverts, if one wants to discuss the localized risks that can arise in medical imaging. As quoted in section 4, ICRP makes it clear that effective dose is not intended for many of the uses to which it is commonly put, such as projecting the number of cancers in a population undergoing medical examinations with ionizing radiation.

Section 5 surveys our current understanding of these risks and the mathematical models for them. A useful advance in the terminology used in this field is that the category of deterministic effects is now called tissue-reaction effects; nothing is deterministic in medicine. Moreover, it is important that all of the models express risk in terms of absorbed dose in grays, localized at least to the organ level.

Section 6 brings the three threads—image quality, dose and risk—together by developing mathematical expressions and graphical depictions showing at least schematically how image quality varies with measures of dose and risk. Key to this discussion is table 4, showing how the various mean vectors and covariance matrices that are needed for calculating task performance depend on radiation dose. This table shows that the noise and object terms in the covariance decomposition depend differently on radiation dose, and these dependences are built into figures of merit for task performance.

Section 7 considers nine categories of methods that have been proposed for dose reduction in CT and other modalities. These categories differ widely in how far the concepts of task-based assessment have penetrated into them. For some categories, there are many groups working on rigorous assessment of image quality and, at least indirectly, the relation between quality and dose. In these categories we can cite many references, but in other categories there is almost no use of quality metrics related to patient benefit, hence very little to cite in this survey of risk-benefit analysis.

One goal for future research in this area is to begin to convert the schematic quality-dose and quality-risk characteristic curves introduced in section 6 into real curves with real numbers applicable to specific medical examinations and specific risks.

As noted in sections 3 and 6, task-based image-quality analysis requires rigor and realism in modeling imaging systems and the objects that will be imaged with them. The current rapid advances in using realistic object models, accurate simulated images and validated mathematical model observers to evaluate and optimize imaging systems bode very well for the future of this effort. The term virtual clinical trials has been coined for this direction of research, and an AAPM Task Group on Virtual Tools for the Evaluation of New 3D/4D Breast Imaging Systems is currently formulating methodologies and standards for it. Examples of virtual clinical trials relevant to analysis of the tradeoff between radiation dose and image quality include (Pineda et al 2006, Gifford et al 2008, Chawla et al 2009, Lu and Chan 2011, Van de Sompel et al 2011, Young et al 2012, Young et al 2013).

These advances also open up the prospect of extending image-quality studies to higher levels of efficacy. This step will require quantitative risks and utilities, especially for Stages 5 and 6, but these assignments seem to be within reach. For a survey of approaches to this problem, see Mortimer and Segal (2007).

A long-term goal for this field would be the ability to create societal efficacy characteristic (SEC) curves for different radiological imaging examinations. An SEC would be a plot of the change in quality-adjusted life years (QALYs) against the radiation dose delivered in an imaging study. For imaging with ionizing radiation, this curve would start at zero change for zero dose, because the study would not be performed, then it would increase with dose and reach a plateau where an increase in dose has little or no effect. Finally, as dose increases still further, the change in QALYs would eventually come down and maybe even go negative, because at some level ionizing radiation always does damage. The methods described in this paper may assist in locating the plateau rigorously and determining how much latitude clinicians have in choosing the operating point on an SEC curve.

Acknowledgments

This research was supported by the National Institutes of Health under grants R01 EB000803 and P41 EB002035 at the University of Arizona. This work was also supported by the Intramural Research Program of the National Institutes of Health, the National Cancer Institute, Division of Cancer Epidemiology and Genetics.

We wish to thank Luca Caucci for careful reading of the manuscript and thoughtful comments and Helen Fan for preparing the figures.

Footnotes

  • Typographically, we will always denote vectors of any dimensionality with lower-case bold roman (e.g. g, f, $\mathbf{\hat{f}})$ ; finite-dimensional matrices with upper-case bold roman (e.g. O, K), and general operators (which may reduce to matrices in special cases) with a bold calligraphic font (e.g. $\mathcal{H},\mathcal{O})$ .

  • We use the lower-case pr for PDF here, but the results are general enough to apply to discrete-valued random vectors as well.
