Published in:
01-07-2022 | Original Article
Neuroimaging-ITM: A Text Mining Pipeline Combining Deep Adversarial Learning with Interaction Based Topic Modeling for Enabling the FAIR Neuroimaging Study
Authors:
Jianzhuo Yan, Lihong Chen, Yongchuan Yu, Hongxia Xu, Zhe Xu, Ying Sheng, Jianhui Chen
Published in:
Neuroinformatics
|
Issue 3/2022
Login to get access
Abstract
Sharing various neuroimaging digital resources have received widespread attention in FAIR (Findable, Accessible, Interoperable and Reusable) neuroscience. In order to support a comprehensive understanding of brain cognition, neuroimaging provenance should be constructed to characterize both research processes and results, and integrates various digital resources for quick replication and open cooperation. This brings new challenges to neuroimaging text mining, including fragmented information, lack of labelled corpora, and vague topics. This paper proposes a text mining pipeline for enabling the FAIR neuroimaging study. In order to avoid fragmented information, the Brain Informatics provenance model is redesigned based on NIDM (Neuroimaging Data Model) and FAIR facets. It can systematically capture the provenance requests from the FAIR neuroimaging study and then transform them into a group of text mining tasks. A neuroimaging text mining pipeline combining deep adversarial learning with interaction based topic modeling, called neuroimaging interaction topic model (Neuroimaging-ITM), is proposed to automatically extract neuroimaging provenance and identify research topics in the few-shot scenario. Finally, a group of experiments is completed by using real data from the journal PloS One. The experimental results show that Neuroimaging-ITM can systematically and accurately extract provenance information and obtain high-quality research topics from the full text of neuroimaging articles. Most of the mean F1 values of provenance extraction exceed 0.9. The topic coherence and KL (Kullback–Leibler) divergence reach 9.95 and 0.96 respectively. The results are obviously better than baseline methods.