Published in: Health and Quality of Life Outcomes 1/2004

Open Access 01-12-2004 | Research

Sample size and power estimation for studies with health related quality of life outcomes: a comparison of four methods using the SF-36

Author: Stephen J Walters


Abstract

We describe and compare four methods for estimating sample size and power when the primary outcome of a study is a Health Related Quality of Life (HRQoL) measure. The methods are: (1) assuming a Normal distribution and comparing two means; (2) using a non-parametric method; (3) Whitehead's method, based on the proportional odds model; and (4) the bootstrap. We illustrate the methods using data from the SF-36. For simplicity, this paper deals with studies designed to compare the effectiveness (or superiority) of a new treatment with that of a standard treatment at a single point in time. The results show that if the HRQoL outcome has a limited number of discrete values (< 7) and/or the expected proportion of cases at the boundaries (scoring 0 or 100) is high, then we would recommend Whitehead's method (Method 3). Alternatively, if the HRQoL outcome has a large number of distinct values and the proportion at the boundaries is low, then we would recommend Method 1. If a pilot or historical dataset is readily available (to estimate the shape of the distribution), then bootstrap simulation (Method 4) based on these data will provide a more accurate and reliable sample size estimate than conventional methods (Methods 1, 2, or 3). In the absence of a reliable pilot dataset, bootstrapping is not appropriate, and conventional methods of sample size estimation or simulation will need to be used. Fortunately, with the increasing use of HRQoL outcomes in research, historical datasets are becoming more readily available. Strictly speaking, our results and conclusions apply only to the SF-36 outcome measure, and further empirical work is required to see whether they hold for other HRQoL outcomes. However, the SF-36 has many features in common with other HRQoL outcomes (it is multi-dimensional, has ordinal or discrete response categories with upper and lower bounds, and has skewed distributions), so we believe these results and conclusions will be appropriate for other HRQoL measures.
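
As a rough illustration of the ideas behind Methods 1 and 4, the following Python sketch may help. It is not the paper's implementation. The first function uses the standard textbook Normal approximation for comparing two means, with a standardised effect size (difference in means divided by the common standard deviation). The second estimates power by resampling a pilot dataset with replacement. Several choices here are assumptions made for brevity: the function names, the additive treatment shift truncated to the 0 to 100 scale, and the large-sample z-test on the difference in means, for which a rank-based test could be substituted.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def n_per_group_normal(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means.

    Standard Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardised effect size.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def bootstrap_power(pilot, shift, n_per_group, alpha=0.05,
                    n_sim=2000, lo=0.0, hi=100.0, seed=1):
    """Estimate power by resampling pilot scores with replacement.

    Assumptions (illustrative only): the treatment adds `shift` points to
    each score, scores are truncated to [lo, hi], and the groups are
    compared with a large-sample z-test on the difference in means.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        control = rng.choices(pilot, k=n_per_group)
        treated = [min(hi, max(lo, x + shift))
                   for x in rng.choices(pilot, k=n_per_group)]
        se = math.sqrt(stdev(control) ** 2 / n_per_group
                       + stdev(treated) ** 2 / n_per_group)
        # Skip degenerate resamples with zero variance in both groups.
        if se > 0 and abs(mean(treated) - mean(control)) / se > z_crit:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print(n_per_group_normal(0.5))  # -> 63

    # Hypothetical pilot scores on a 0-100 scale, NOT data from the paper.
    pilot_scores = [float(s) for s in range(0, 101, 5)]
    print(bootstrap_power(pilot_scores, shift=10.0, n_per_group=100, n_sim=500))
```

In practice one would repeat the bootstrap over a grid of candidate sample sizes and choose the smallest that reaches the target power. Because the resampling draws on the observed pilot distribution, ceiling and floor effects are reflected in the estimate, which the Normal approximation ignores.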
Metadata
Title
Sample size and power estimation for studies with health related quality of life outcomes: a comparison of four methods using the SF-36
Author
Stephen J Walters
Publication date
01-12-2004
Publisher
BioMed Central
Published in
Health and Quality of Life Outcomes / Issue 1/2004
Electronic ISSN: 1477-7525
DOI
https://doi.org/10.1186/1477-7525-2-26
