
BMC Medical Informatics and Decision Making


Open Access 01-12-2020 | Research article

Violence detection explanation via semantic roles embeddings

Authors: Enrico Mensa, Davide Colla, Marco Dalmasso, Marco Giustini, Carlo Mamo, Alessio Pitidis, Daniele P. Radicioni

Published in: BMC Medical Informatics and Decision Making | Issue 1/2020

Abstract

Background

Emergency room reports pose specific challenges to natural language processing techniques. In this setting, episodes of violence against women, the elderly and children are often under-reported. Categorizing textual descriptions as containing violence-related injuries (V) vs. non-violence-related injuries (NV) is thus relevant to devising alerting mechanisms that track (and help prevent) episodes of violence.

Methods

We present ViDeS (short for Violence Detection System), a system to detect episodes of violence from narrative texts in emergency room (ER) reports. It employs a deep neural network to categorize the text of ER reports, and complements this output by making explicit which elements support interpreting the record as reporting violence-related injuries. To this end, we designed a novel hybrid technique for filling semantic frames that employs distributed representations of the terms therein, along with syntactic and semantic information. The system has been validated on real data annotated with two kinds of information: the presence vs. absence of violence-related injuries, and a set of semantic roles that can be interpreted as major cues for violent episodes, such as the agent who committed the violence, the victim, and the body region involved. The dataset contains over 150K records annotated with class (V/NV) information, and 200 records annotated with finer-grained information on these semantic roles.
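The general idea of filling a frame from distributed representations can be illustrated with a minimal sketch. It is not the ViDeS technique: it assumes each semantic role is represented by the centroid of a few seed-word embeddings, uses part-of-speech tags as the only syntactic cue, and assigns each candidate noun to the role whose prototype is most cosine-similar, keeping the best-scoring token per slot above a threshold. The three-dimensional vectors, the seed words, the role names (`agent`, `victim`, `body_region`), the threshold and the English example tokens are all hypothetical; a real system would use trained embeddings for Italian and richer syntactic information such as dependency relations.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for trained embeddings.
EMBEDDINGS = {
    "husband": [0.9, 0.3, 0.0],
    "stranger": [0.8, 0.2, 0.1],
    "woman": [0.2, 0.9, 0.0],
    "patient": [0.1, 0.95, 0.05],
    "face": [0.0, 0.1, 0.95],
    "arm": [0.05, 0.0, 0.9],
    "punched": [0.5, 0.1, 0.2],
}

# Hypothetical seed words whose vectors define each role prototype.
ROLE_SEEDS = {
    "agent": ["stranger"],
    "victim": ["patient"],
    "body_region": ["arm"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fill_frame(tagged_tokens, embeddings, role_seeds, threshold=0.7):
    """Fill each role slot with the most similar noun, or leave it as None.

    tagged_tokens: (token, pos) pairs; only tokens tagged "NOUN" are candidates
    (a crude stand-in for the syntactic information a real system would use).
    """
    prototypes = {role: centroid([embeddings[w] for w in seeds])
                  for role, seeds in role_seeds.items()}
    frame = {role: None for role in role_seeds}
    best_score = {role: threshold for role in role_seeds}
    for token, pos in tagged_tokens:
        if pos != "NOUN" or token not in embeddings:
            continue
        vec = embeddings[token]
        # Semantic step: pick the role whose prototype is closest to this token.
        role = max(prototypes, key=lambda r: cosine(vec, prototypes[r]))
        score = cosine(vec, prototypes[role])
        # Keep the highest-scoring token per slot, provided it reaches the threshold.
        if score >= best_score[role]:
            frame[role] = token
            best_score[role] = score
    return frame


tokens = [("woman", "NOUN"), ("punched", "VERB"), ("in", "ADP"),
          ("the", "DET"), ("face", "NOUN"), ("by", "ADP"), ("husband", "NOUN")]
print(fill_frame(tokens, EMBEDDINGS, ROLE_SEEDS))
# {'agent': 'husband', 'victim': 'woman', 'body_region': 'face'}
```

The filled frame doubles as an explanation: each slot names the textual element that supports reading the record as violence-related, which is the kind of output the classifier alone does not provide.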

Results

We used data from an Italian branch of the EU-Injury Database (EU-IDB) project, compiled by hospital staff. Categorization figures approach full precision and recall on negative (NV) cases, with .97 precision and .94 recall on positive (V) cases. As regards the recognition of semantic roles, we recorded an accuracy ranging from .28 to .90 depending on the semantic role involved. Moreover, the system revealed annotation errors made by hospital staff.
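The figures above are reported per class (precision and recall for V and for NV) and per semantic role (accuracy). The sketch below shows how such per-class figures are conventionally computed from gold and predicted labels. The function names and toy labels are illustrative, not data from the study, and the per-role accuracy assumes a record counts as correct for a role only when the predicted slot filler exactly matches the gold one; the abstract does not state the matching criterion actually used.

```python
def precision_recall(gold, predicted, positive):
    """Precision and recall for one class label, treating all other labels as negative."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def role_accuracy(gold_frames, predicted_frames):
    """Per-role share of records whose predicted slot filler equals the gold one (exact match)."""
    roles = gold_frames[0].keys()
    n = len(gold_frames)
    return {role: sum(g[role] == p[role] for g, p in zip(gold_frames, predicted_frames)) / n
            for role in roles}


# Toy labels, not data from the study.
gold = ["V", "NV", "V", "NV", "V"]
pred = ["V", "NV", "NV", "NV", "V"]
print(precision_recall(gold, pred, "V"))   # (1.0, 0.6666666666666666)
print(precision_recall(gold, pred, "NV"))  # (0.6666666666666666, 1.0)
```

Computing the two classes separately matters here: a single V record misclassified as NV lowers recall on V and precision on NV at the same time, which is why the abstract quotes both classes.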

Conclusions

Explaining their results, so as to make their output more comprehensible and convincing, is now a necessity for AI systems. We propose combining distributed and symbolic (frame-like) representations as a possible answer to this pressing demand for interpretability. Although presently focused on the medical domain, the proposed methodology is general and can, in principle, be extended to other application areas and categorization tasks.


Title: Violence detection explanation via semantic roles embeddings
Authors: Enrico Mensa, Davide Colla, Marco Dalmasso, Marco Giustini, Carlo Mamo, Alessio Pitidis, Daniele P. Radicioni
Publication date: 01-12-2020
Publisher: BioMed Central
Published in: BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI: https://doi.org/10.1186/s12911-020-01237-4
