Published in: Neuroinformatics 3/2022

01-07-2022 | Original Article

Neuroimaging-ITM: A Text Mining Pipeline Combining Deep Adversarial Learning with Interaction Based Topic Modeling for Enabling the FAIR Neuroimaging Study

Authors: Jianzhuo Yan, Lihong Chen, Yongchuan Yu, Hongxia Xu, Zhe Xu, Ying Sheng, Jianhui Chen



Abstract

Sharing various neuroimaging digital resources has received widespread attention in FAIR (Findable, Accessible, Interoperable and Reusable) neuroscience. To support a comprehensive understanding of brain cognition, neuroimaging provenance should be constructed to characterize both research processes and results, and to integrate various digital resources for quick replication and open cooperation. This brings new challenges to neuroimaging text mining, including fragmented information, a lack of labelled corpora, and vague topics. This paper proposes a text mining pipeline for enabling the FAIR neuroimaging study. To avoid fragmented information, the Brain Informatics provenance model is redesigned based on NIDM (Neuroimaging Data Model) and FAIR facets. It can systematically capture the provenance requests from the FAIR neuroimaging study and then transform them into a group of text mining tasks. A neuroimaging text mining pipeline combining deep adversarial learning with interaction-based topic modeling, called the neuroimaging interaction topic model (Neuroimaging-ITM), is proposed to automatically extract neuroimaging provenance and identify research topics in the few-shot scenario. Finally, a group of experiments is conducted using real data from the journal PLoS One. The experimental results show that Neuroimaging-ITM can systematically and accurately extract provenance information and obtain high-quality research topics from the full text of neuroimaging articles. Most of the mean F1 values of provenance extraction exceed 0.9. The topic coherence and KL (Kullback–Leibler) divergence reach 9.95 and 0.96, respectively. These results are clearly better than those of the baseline methods.
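For readers unfamiliar with two of the evaluation measures named in the abstract, the following is a minimal sketch of the textbook definitions of the F1 score and the KL divergence. It is illustrative only: the function names `f1_score` and `kl_divergence` are hypothetical, and the abstract does not state how the authors compute or aggregate these values, so this should not be read as their evaluation code.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts.

    tp, fp, fn are true-positive, false-positive and false-negative counts
    (for example, counts of extracted entities). Hypothetical helper.
    """
    if tp == 0:
        return 0.0  # no correct predictions: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # 9 correct extractions, 1 spurious, 1 missed
    print(f1_score(tp=9, fp=1, fn=1))
    # Two toy topic-word distributions over a three-word vocabulary
    print(kl_divergence([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]))
```

Under these definitions, F1 lies between 0 and 1, and the KL divergence is 0 only when the two distributions are identical, so a larger divergence between topics indicates that they are more clearly separated.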
Metadata
Title
Neuroimaging-ITM: A Text Mining Pipeline Combining Deep Adversarial Learning with Interaction Based Topic Modeling for Enabling the FAIR Neuroimaging Study
Authors
Jianzhuo Yan
Lihong Chen
Yongchuan Yu
Hongxia Xu
Zhe Xu
Ying Sheng
Jianhui Chen
Publication date
01-07-2022
Publisher
Springer US
Published in
Neuroinformatics / Issue 3/2022
Print ISSN: 1539-2791
Electronic ISSN: 1559-0089
DOI
https://doi.org/10.1007/s12021-022-09571-w
