Published in: BMC Medical Informatics and Decision Making 5/2018

Open Access 01-12-2018 | Research

Application of data mining methods to improve screening for the risk of early gastric cancer

Authors: Mi-Mi Liu, Li Wen, Yong-Jia Liu, Qiao Cai, Li-Ting Li, Yong-Ming Cai

Abstract

Background

Although gastric cancer is a malignancy with high morbidity and mortality in China, patients with early gastric cancer (EGC) have a high survival rate after surgical resection. Strengthening diagnosis and screening is therefore key to improving the survival and quality of life of patients with EGC. This study applied data mining methods to improve screening for the risk of EGC on the basis of noninvasive factors, and identified important factors influencing the risk of EGC.

Methods

The dataset was derived from a project at the First Affiliated Hospital of Guangdong Pharmaceutical University. Questionnaire surveys, serological examinations, and endoscopy with pathological biopsy were conducted in 618 patients with gastric diseases. Based on the endoscopy and biopsy results, each patient was classified as being at low or high risk of EGC. The synthetic minority oversampling technique (SMOTE) was used to correct the imbalance between these two risk categories. Four classification models of EGC risk were established: logistic regression (LR) and three data mining algorithms.
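
SMOTE balances the classes by creating synthetic minority-class cases rather than duplicating existing ones. Each synthetic case is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal pure-Python sketch of that interpolation step, following the original SMOTE algorithm of Chawla et al. It makes several assumptions:

- All features have already been encoded as numbers. Categorical questionnaire answers would normally call for a variant such as SMOTE-NC.
- The function name `smote`, its defaults (k = 5, a fixed seed) and the toy data are hypothetical.
- The sketch does not reflect the settings or data of this study, which the abstract does not report.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Return n_synthetic new minority-class vectors created by SMOTE interpolation.

    Each synthetic vector lies on the line segment between a minority sample
    and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    k = min(k, len(minority) - 1)

    # For every minority sample, store the indices of its k nearest
    # minority-class neighbours (the sample itself is excluded).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for s in range(n_synthetic):
        i = s % len(minority)  # use each minority sample in turn as the seed
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # random one of its k neighbours
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical toy data with two numeric features (not study data):
# six low-risk cases (majority) and three high-risk cases (minority).
low_risk = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2], [0.5, 0.4]]
high_risk = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

new_high = smote(high_risk, n_synthetic=len(low_risk) - len(high_risk), k=2)
balanced_high = high_risk + new_high  # 6 high-risk vs 6 low-risk cases
print(len(low_risk), len(balanced_high))  # 6 6
```

Interpolating between real neighbours, rather than copying cases, keeps the synthetic high-risk cases inside the region already occupied by the minority class. This is why SMOTE tends to overfit less than simple duplication.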

Results

The three data mining models achieved higher accuracy than the LR model. Their gain curves were more convex and lay closer to the ideal curve than that of the LR model, and their areas under the ROC curve (AUC) were also larger. Overall, the three data mining models predicted the risk of EGC more effectively than the LR model. In addition, this study identified 16 important factors influencing the risk of EGC, including occupation, Helicobacter pylori infection, and drinking hot water.
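
The models are compared by gain curves and AUC, and both can be computed from a model's predicted risk scores. The sketch below shows one way to do this. The labels (1 = high risk of EGC, 0 = low risk) and the scores of the two hypothetical models `model_a` and `model_b` are invented toy values, not results from this study. AUC is computed with the pairwise (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve, instead of integrating a plotted ROC curve.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gain_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Cumulative gain points (fraction of cases screened, fraction of positives found).

    Cases are screened in order of decreasing predicted risk; tied scores keep
    their original order because Python's sort is stable.
    """
    n = len(labels)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    found = 0
    for rank, i in enumerate(order, start=1):
        found += labels[i]
        points.append((rank / n, found / total_pos))
    return points


# Hypothetical toy example (not study data): 1 = high risk of EGC, 0 = low risk.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # ranks most positives first
model_b = [0.6, 0.3, 0.7, 0.5, 0.8, 0.2, 0.4, 0.1]  # weaker ranking

print(f"AUC A = {auc(labels, model_a):.3f}")  # AUC A = 0.933
print(f"AUC B = {auc(labels, model_b):.3f}")  # AUC B = 0.533
print(gain_curve(labels, model_a)[4])  # (0.5, 1.0): all positives found after screening half
```

In this toy example, `model_a` finds every high-risk case after half of the cases are screened, while `model_b` has found only two of three. The gain curve of `model_a` therefore rises faster and sits closer to the ideal curve, and its AUC is correspondingly higher.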

Conclusions

The three data mining models outperform the LR model in prediction. They can therefore effectively evaluate the risk of EGC and help clinicians improve the diagnosis and screening of EGC. The sixteen important factors identified may help in assessing gastric carcinogenesis and serve as a reminder of the need for early prevention and early detection of gastric cancer. This study may also help clinical researchers select and build optimal predictive models.
Metadata
Publication date
01-12-2018
Publisher
BioMed Central
DOI
https://doi.org/10.1186/s12911-018-0689-4