Logistic Regression

  • Protocol
Topics in Biostatistics

Part of the book series: Methods in Molecular Biology™ ((MIMB,volume 404))

Abstract

The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as “statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable.” Logistic regression models are used to study the effects of predictor variables on categorical outcomes. Usually the outcome is binary, such as the presence or absence of disease (e.g., non-Hodgkin’s lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments), the model is referred to as a multiple or multivariable logistic regression model; it is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
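
As a minimal sketch of the simple binary logistic model described above, consider a single 0/1 predictor (for example, exposed versus unexposed) and a binary outcome (disease present or absent). In that special case the maximum likelihood estimates of logit(p) = b0 + b1*x can be read directly from a 2x2 table: b0 is the log odds of disease among the unexposed, and b1 is the log odds ratio. The counts and the function names below are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
import math


def logistic(z):
    """Inverse logit: convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_binary_predictor(a, b, c, d):
    """Closed-form fit of logit(p) = b0 + b1*x for one 0/1 predictor.

    a = exposed with disease,    b = exposed without disease
    c = unexposed with disease,  d = unexposed without disease
    All four counts must be positive.
    """
    b0 = math.log(c / d)              # log odds of disease when x = 0
    b1 = math.log((a * d) / (b * c))  # log odds ratio, exposed vs. unexposed
    return b0, b1


# Hypothetical 2x2 table (illustrative counts only).
b0, b1 = fit_binary_predictor(a=30, b=70, c=10, d=90)

odds_ratio = math.exp(b1)        # exponentiated slope is the odds ratio
p_unexposed = logistic(b0)       # fitted probability of disease when x = 0
p_exposed = logistic(b0 + b1)    # fitted probability of disease when x = 1

print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
print(f"odds ratio = {odds_ratio:.3f}")
print(f"P(disease | unexposed) = {p_unexposed:.2f}")
print(f"P(disease | exposed)   = {p_exposed:.2f}")
```

With one binary predictor the fitted probabilities simply reproduce the observed proportions in each group. Models with several predictors or with continuous predictors generally have no closed form and are fitted iteratively by statistical software.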

© 2007 Humana Press Inc., Totowa, NJ

About this protocol

Cite this protocol

Nick, T.G., Campbell, K.M. (2007). Logistic Regression. In: Ambrosius, W.T. (eds) Topics in Biostatistics. Methods in Molecular Biology™, vol 404. Humana Press. https://doi.org/10.1007/978-1-59745-530-5_14

  • DOI: https://doi.org/10.1007/978-1-59745-530-5_14

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-531-6

  • Online ISBN: 978-1-59745-530-5

  • eBook Packages: Springer Protocols
