Published in: BMC Medical Informatics and Decision Making 1/2019

Open Access 01-12-2019 | Research article

The index lift in data mining has a close relationship with the association measure relative risk in epidemiological studies

Authors: Khanh Vu, Rebecca A. Clark, Colin Bellinger, Graham Erickson, Alvaro Osornio-Vargas, Osmar R. Zaïane, Yan Yuan

Abstract

Background

Data mining tools have been increasingly used in health research, with the promise of accelerating discoveries. Lift is a standard association metric in the data mining community. However, health researchers often struggle to interpret lift. As a result, data mining results can be met with hesitation when they are disseminated. The relative risk and odds ratio are standard association measures in the health domain because they are straightforward to interpret and comparable across populations. We aimed to investigate the lift-relative risk and lift-odds ratio relationships and to provide tools for converting lift to the relative risk and odds ratio.

Methods

We derived equations linking lift to the relative risk and lift to the odds ratio. We discussed how lift, relative risk, and odds ratio behave numerically with varying association strengths and exposure prevalence levels. The lift-relative risk relationship was further illustrated using a high-dimensional dataset that examines the association between exposure to airborne pollutants and adverse birth outcomes. We conducted spatial association rule mining using the Kingfisher algorithm, which identified association rules using its built-in lift metric. For each identified rule, we directly estimated the relative risk and odds ratio from its 2 by 2 table. These direct estimates were compared with the corresponding lift values and with the relative risks and odds ratios computed from lift using the derived equations.
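The derived equations themselves are not reproduced in this abstract. As a hedged illustration of how lift and relative risk can be linked, the identities below follow directly from the standard definitions, writing E for the exposure, Ē for its absence, O for the outcome and p = P(E). They are a sketch from first principles and may differ in form from the paper's own equations.

```latex
\begin{aligned}
\text{lift} &= \frac{P(O \cap E)}{P(O)\,P(E)} = \frac{P(O \mid E)}{P(O)},
\qquad
\text{RR} = \frac{P(O \mid E)}{P(O \mid \bar{E})}, \\[4pt]
P(O) &= p\,P(O \mid E) + (1-p)\,P(O \mid \bar{E})
      = P(O \mid \bar{E})\,\bigl(1 + p(\text{RR}-1)\bigr), \\[4pt]
\text{lift} &= \frac{\text{RR}}{1 + p(\text{RR}-1)},
\qquad
\text{RR} = \frac{\text{lift}\,(1-p)}{1 - \text{lift}\cdot p}, \\[4pt]
\text{OR} &= \frac{P(O \mid E)\,\bigl[1 - P(O \mid \bar{E})\bigr]}{P(O \mid \bar{E})\,\bigl[1 - P(O \mid E)\bigr]},
\quad
P(O \mid E) = \text{lift}\cdot P(O),
\quad
P(O \mid \bar{E}) = \frac{P(O)\,(1 - \text{lift}\cdot p)}{1-p}.
\end{aligned}
```

The third line shows why lift depends on the exposure prevalence p for a fixed relative risk, and why converting lift back to the relative risk requires knowing p (and, for the odds ratio, P(O) as well).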

Results

As the exposure-outcome association strengthens, the odds ratio and relative risk move away from 1 numerically faster than lift, i.e. |log(odds ratio)| ≥ |log(relative risk)| ≥ |log(lift)|. In addition, lift is bounded above by the smaller of the inverse probabilities of the outcome and the exposure, i.e. lift ≤ min(1/P(O), 1/P(E)). Unlike the relative risk and odds ratio, lift depends on the exposure prevalence for a fixed outcome. For example, when an exposure A and a less prevalent exposure B have the same relative risk (greater than 1) for an outcome, exposure A has a lower lift than B.
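To make these Results concrete, the sketch below computes lift, relative risk and odds ratio from 2 by 2 tables and checks the ordering, the upper bound on lift, and the prevalence example. The cell counts are hypothetical expected counts invented for illustration (not data from the study), and the class `Table2x2` and helper `table_from_risks` are illustrative names, not the authors' tools.

```python
import math
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Cell counts of a 2 by 2 exposure-outcome table (hypothetical layout)."""
    a: float  # exposed, outcome present
    b: float  # exposed, outcome absent
    c: float  # unexposed, outcome present
    d: float  # unexposed, outcome absent

    @property
    def n(self) -> float:
        return self.a + self.b + self.c + self.d

    def p_outcome(self) -> float:
        return (self.a + self.c) / self.n

    def p_exposure(self) -> float:
        return (self.a + self.b) / self.n

    def lift(self) -> float:
        # lift = P(O and E) / (P(O) * P(E))
        return (self.a / self.n) / (self.p_outcome() * self.p_exposure())

    def relative_risk(self) -> float:
        # RR = P(O | E) / P(O | not E)
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def odds_ratio(self) -> float:
        # OR = (a * d) / (b * c)
        return (self.a * self.d) / (self.b * self.c)

    def lift_bound(self) -> float:
        # Upper bound min(1/P(O), 1/P(E)) stated in the Results.
        return min(1 / self.p_outcome(), 1 / self.p_exposure())


def table_from_risks(n: float, p_e: float, risk_exposed: float,
                     risk_unexposed: float) -> Table2x2:
    """Build an expected-count table from exposure prevalence and outcome risks."""
    n_e = n * p_e
    n_u = n - n_e
    return Table2x2(
        a=n_e * risk_exposed,
        b=n_e * (1 - risk_exposed),
        c=n_u * risk_unexposed,
        d=n_u * (1 - risk_unexposed),
    )


# Hypothetical example: both exposures have RR = 2, but A is more prevalent than B.
exposure_a = table_from_risks(n=10_000, p_e=0.40, risk_exposed=0.10, risk_unexposed=0.05)
exposure_b = table_from_risks(n=10_000, p_e=0.05, risk_exposed=0.10, risk_unexposed=0.05)

for name, t in (("A", exposure_a), ("B", exposure_b)):
    print(f"{name}: lift={t.lift():.3f} RR={t.relative_risk():.3f} "
          f"OR={t.odds_ratio():.3f} bound={t.lift_bound():.3f}")
```

With these made-up numbers, both exposures have RR = 2 and OR ≈ 2.11, yet the lift of the more prevalent exposure A (about 1.43) is lower than that of B (about 1.90), in line with lift = RR / (1 + P(E)(RR − 1)).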

Conclusions

Lift, relative risk, and odds ratio are positively correlated and share the same null value (1). However, lift depends on the exposure prevalence and is therefore not straightforward to interpret or to use for comparing association strength. Tools are provided to obtain the relative risk and odds ratio from lift.
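The authors' conversion tools are not reproduced in this abstract. The functions below are a hypothetical sketch of such a conversion, derived here from the standard definitions: lift = RR / (1 + p(RR − 1)) with p = P(E), which inverts to RR = lift(1 − p) / (1 − lift·p). The odds ratio additionally needs P(O), via P(O | E) = lift·P(O) and P(O | not E) = P(O)(1 − lift·p) / (1 − p). Function names and input values are illustrative assumptions.

```python
def lift_to_relative_risk(lift: float, p_exposure: float) -> float:
    """Convert lift to relative risk, given the exposure prevalence P(E).

    Uses RR = lift * (1 - p) / (1 - lift * p), derived from the definitions.
    """
    if not 0 < p_exposure < 1:
        raise ValueError("exposure prevalence must lie strictly between 0 and 1")
    if lift * p_exposure >= 1:
        # lift * P(E) >= 1 would imply P(O | not E) <= 0.
        raise ValueError("lift * P(E) must be below 1")
    return lift * (1 - p_exposure) / (1 - lift * p_exposure)


def lift_to_odds_ratio(lift: float, p_exposure: float, p_outcome: float) -> float:
    """Convert lift to an odds ratio, given P(E) and P(O)."""
    risk_exposed = lift * p_outcome  # P(O | E)
    risk_unexposed = p_outcome * (1 - lift * p_exposure) / (1 - p_exposure)  # P(O | not E)
    if not (0 < risk_exposed < 1 and 0 < risk_unexposed < 1):
        raise ValueError("inputs imply a conditional risk outside (0, 1)")
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical inputs: an exposure with P(E) = 0.4, an outcome with P(O) = 0.07.
lift = 2 / 1.4
print(f"RR = {lift_to_relative_risk(lift, 0.4):.3f}")
print(f"OR = {lift_to_odds_ratio(lift, 0.4, 0.07):.3f}")
```

For these inputs the sketch recovers RR = 2 and OR ≈ 2.11. The need to supply P(E), and P(O) for the odds ratio, reflects the conclusion that lift alone does not determine the strength of association.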
Metadata
Publication date
01-12-2019
Publisher
BioMed Central
Published in
BMC Medical Informatics and Decision Making / Issue 1/2019
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-019-0838-4
