Published in: BMC Medical Informatics and Decision Making 5/2020

Open Access 01-08-2020 | Research

Ordinal labels in machine learning: a user-centered approach to improve data validity in medical settings

Authors: Andrea Seveso, Andrea Campagner, Davide Ciucci, Federico Cabitza

Abstract

Background

Despite the vagueness and uncertainty intrinsic to any medical act, interpretation and decision (including acts of data reporting and representation of relevant medical conditions), little research has focused on how to take this uncertainty into account explicitly. In this paper, we focus on the representation of a general and widespread medical terminology, grounded in a traditional and well-established convention, that is used to represent the severity of health conditions (for instance, pain or visible signs) on a scale ranging from Absent to Extreme. Specifically, we study how both potential patients and doctors perceive the different levels of the terminology in both quantitative and qualitative terms, and whether the embedded user knowledge could improve the representation of ordinal values in the construction of machine learning models.

Methods

To this end, we conducted a questionnaire-based study involving a relatively large sample of 1,152 potential patients and 31 clinicians, in order to represent numerically the perceived meaning of standard and widely applied labels for describing health conditions. Using the collected values, we then present and discuss different possible fuzzy-set-based representations that address the vagueness of medical interpretation by taking into account the perceptions of domain experts. We also apply the findings of this user study to evaluate the impact of different encodings on the predictive performance of common machine learning models on a real-world medical prognostic task.
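As a rough illustration of the general idea, the sketch below turns numeric ratings of each ordinal label into a triangular fuzzy number and then into a single numeric encoding. Everything in it is an assumption made for illustration: the abstract does not specify which fuzzy-set representations or defuzzification methods the authors use, the intermediate label names (Mild, Moderate, Severe) are assumed, the 0-100 scale is assumed, and the ratings are invented rather than taken from the study.

```python
from statistics import median

# HYPOTHETICAL ratings on an assumed 0-100 scale, invented for illustration.
# They are NOT data from the study; the intermediate labels are also assumed.
RATINGS = {
    "Absent": [0, 0, 1, 3, 5],
    "Mild": [15, 20, 25, 30, 35],
    "Moderate": [40, 45, 50, 55, 60],
    "Severe": [65, 70, 75, 80, 90],
    "Extreme": [90, 95, 100, 100, 100],
}


def triangular_fuzzy_number(values):
    """Summarise ratings as a triangle (a, b, c) = (min, median, max)."""
    return (min(values), median(values), max(values))


def membership(x, tri):
    """Degree to which the value x belongs to the triangular fuzzy set."""
    a, b, c = tri
    if x == b:
        return 1.0  # the peak, also covers degenerate triangles
    if x <= a or x >= c:
        return 0.0  # outside the support
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)  # falling edge


def centroid(tri):
    """Centroid defuzzification of a triangular fuzzy number."""
    a, b, c = tri
    return (a + b + c) / 3


FUZZY = {label: triangular_fuzzy_number(v) for label, v in RATINGS.items()}

# Conventional encoding: equally spaced integer ranks 0..4.
RANK_ENCODING = {label: i for i, label in enumerate(RATINGS)}

# Perception-based encoding: one number per label, derived from the ratings.
PERCEIVED_ENCODING = {label: centroid(t) for label, t in FUZZY.items()}


def encode(labels, mapping):
    """Replace each ordinal label in a feature column with its numeric code."""
    return [mapping[label] for label in labels]


if __name__ == "__main__":
    column = ["Mild", "Severe", "Absent", "Extreme"]
    print(encode(column, RANK_ENCODING))
    print(encode(column, PERCEIVED_ENCODING))
```

The point of the comparison is that integer ranks force equal spacing between adjacent labels, whereas a perception-based encoding lets the spacing follow how respondents actually place the labels on the scale. Other summaries (for example quantiles instead of minimum and maximum) would be equally plausible choices here.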

Results

We found significant differences in the perception of pain levels between the two user groups. We also show that the proposed encodings can improve the performance of specific classes of models, and we discuss when this is the case.

Conclusions

We hope that the proposed techniques for ordinal scale representation and ordinal encoding will be useful to the research community, and that our methodology will be applied to other widely used ordinal scales to improve the validity of datasets and the results of machine learning tasks.
Metadata
Publisher
BioMed Central
DOI
https://doi.org/10.1186/s12911-020-01152-8