Published in: BMC Medical Research Methodology 1/2015

Open Access 01-12-2015 | Research article

Sample size calculations for skewed distributions

Authors: Bonnie Cundill, Neal DE Alexander

Abstract

Background

Sample size calculations should correspond to the intended method of analysis. Nevertheless, for non-normal distributions, they are often done on the basis of normal approximations, even when the data are to be analysed using generalized linear models (GLMs).

Methods

For the comparison of two means, we use GLM theory to derive sample size formulae, with particular cases being the negative binomial, Poisson, binomial, and gamma families. By simulation we estimate the performance of normal approximations, which are special cases of our approach via the identity link, and of calculations based on common link functions such as the log. The negative binomial and gamma scenarios are motivated by examples in hookworm vaccine trials and insecticide-treated materials, respectively.
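To make the idea of a calculation on the link scale concrete, the following is a minimal sketch of a generic delta-method sample size for comparing two means, not necessarily the exact formula derived in the paper. It assumes equal group sizes, a two-sided Wald-type test, and the approximation Var(g(ȳ)) ≈ g'(μ)² V(μ)/n. The function name `n_per_group`, the example means, and the dispersion value `k` are hypothetical choices made for illustration only.

```python
import math
from statistics import NormalDist


def n_per_group(mu0, mu1, variance, link, dlink, alpha=0.05, power=0.8):
    """Approximate per-group sample size for comparing two means on a link scale.

    Assumption (delta method): Var(g(ybar)) ~= g'(mu)^2 * V(mu) / n,
    with a two-sided test at level alpha and equal allocation.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Variance of one observation, transformed to the link scale, in each group.
    var_sum = dlink(mu0) ** 2 * variance(mu0) + dlink(mu1) ** 2 * variance(mu1)
    # Effect size is the difference in means measured on the link scale.
    return z ** 2 * var_sum / (link(mu1) - link(mu0)) ** 2


# Hypothetical negative binomial scenario: V(mu) = mu + mu^2 / k.
k = 0.5
nb_variance = lambda mu: mu + mu ** 2 / k

# Log link: g(mu) = log(mu), g'(mu) = 1 / mu.
n_log = n_per_group(10.0, 5.0, nb_variance, math.log, lambda mu: 1.0 / mu)

# Identity link (normal approximation): g(mu) = mu, g'(mu) = 1.
n_identity = n_per_group(10.0, 5.0, nb_variance, lambda mu: mu, lambda mu: 1.0)

print(math.ceil(n_log), math.ceil(n_identity))
```

With the identity link and a constant variance function, this expression reduces to the familiar normal-approximation formula, which is one way to see the normal approximation as a special case of the link-scale calculation.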

Results

Calculations on the link function (log) scale work well for the negative binomial and gamma scenarios examined and are often superior to the normal approximations. However, they have little advantage for the Poisson and binomial distributions.

Conclusions

The proposed method is suitable for sample size calculations for comparisons of means of highly skewed outcome variables.
Metadata
Title
Sample size calculations for skewed distributions
Authors
Bonnie Cundill
Neal DE Alexander
Publication date
01-12-2015
Publisher
BioMed Central
Published in
BMC Medical Research Methodology / Issue 1/2015
Electronic ISSN: 1471-2288
DOI
https://doi.org/10.1186/s12874-015-0023-0