Ontology Based Text Classifier for Information Extraction from Coronavirus Literature

Authors

  • M Sivakami, Department of Computer Science, School of Information Technology, Madurai Kamaraj University, Madurai, Tamilnadu, India
  • M Thangaraj, Department of Computer Science, School of Information Technology, Madurai Kamaraj University, Madurai, Tamilnadu, India

DOI:

https://doi.org/10.48048/tis.2021.47

Keywords:

COVID-19, Multi-class classifier, Text classification, Ontology, Knowledge graphs

Abstract

The world is fighting an unprecedented coronavirus pandemic for which no country was prepared. With no cure available, understanding the nature of this disease is vital for supporting accurate clinical diagnosis and drug discovery. Because the available literature is vast, the disease domain must be represented as completely as possible. The system should capture the morphology, semantics, syntax, and pragmatics of the literature in order to extract useful information. In addition, a classifier built for a particular domain suffers from the zero-frequency problem. To address this, latent topics are extracted and semantically represented in an ontology, which is then used to build a text classifier for coronavirus literature. The classifier has two components: an ontology and a machine learning data model. The ontology models the morphological, semantic, and pragmatic aspects of the text data through Latent Dirichlet Allocation (LDA). It also preserves contextual information in the document space, providing a holistic feature representation. To solve the zero-frequency problem and to extract actionable insights, a machine learning algorithm, the Multi-class Support Vector Machine (M-SVM), is combined with the ontology. This combination encodes the features and yields a classifier with highly discriminated classes. Further, to preserve the contextual information space and to enable data model formulation, the ontology is generated as a knowledge graph with its respective predefined classes. The resulting dataset can be used for clinical diagnosis and further research on the disease. Experimental results show that the proposed classifier outperforms existing systems and provides better domain representation.
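The formulation of the M-SVM is not given here. The following is a minimal sketch, assuming a one-vs-rest decomposition trained with Pegasos-style stochastic subgradient descent on the regularized hinge loss. This is a choice made for illustration and not necessarily the authors' M-SVM. It trains a linear multi-class SVM on per-document topic proportions of the kind LDA produces. The feature vectors, the class names (`symptoms`, `transmission`, `treatment`), and all function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def with_bias(x: Sequence[float]) -> Vector:
    # Append a constant 1 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_binary_svm(xs: List[Vector], ys: List[int], lam: float = 0.01,
                     epochs: int = 500, seed: int = 0) -> Vector:
    """Pegasos-style SGD on the L2-regularized hinge loss; ys are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = ys[i] * dot(w, xs[i]) < 1.0  # hinge term is active
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinks w
            if violated:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def train_one_vs_rest(docs: List[Sequence[float]], labels: List[str]) -> Dict[str, Vector]:
    """Train one binary SVM per class: that class (+1) against all others (-1)."""
    xs = [with_bias(d) for d in docs]
    return {
        c: train_binary_svm(xs, [1 if y == c else -1 for y in labels])
        for c in sorted(set(labels))
    }


def predict(models: Dict[str, Vector], doc: Sequence[float]) -> str:
    """Pick the class whose binary SVM gives the highest decision value."""
    x = with_bias(doc)
    return max(models, key=lambda c: dot(models[c], x))


# Hypothetical topic proportions over three latent topics, one row per document.
docs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
        [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
labels = ["symptoms", "symptoms", "transmission", "transmission",
          "treatment", "treatment"]
models = train_one_vs_rest(docs, labels)
print(predict(models, [0.75, 0.15, 0.1]))
```

One-vs-rest keeps each sub-problem an ordinary binary SVM. The final label is the argmax of the per-class decision values, so a document is assigned to the class whose separating hyperplane it lies furthest on the positive side of.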

HIGHLIGHTS

  • When the available literature is vast, the disease domain must be represented as completely as possible. The system should capture the morphology, semantics, syntax, and pragmatics of the literature in order to extract useful information.
  • The classifier has two components: an ontology and a machine learning data model. The ontology models the morphological, semantic, and pragmatic aspects of the text data through Latent Dirichlet Allocation (LDA). It also preserves contextual information in the document space, providing a holistic feature representation.
  • To preserve the contextual information space and to enable data model formulation, the ontology is generated as a knowledge graph with its respective predefined classes. The resulting dataset can be used for clinical diagnosis and further research on the disease.
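The following is a hypothetical sketch of one way an ontology can be held as a knowledge graph. Each latent topic (for example, from LDA) is linked to its top terms and to one predefined class, and the links are stored as (subject, predicate, object) triples. The topic names, terms, class labels, and the predicates `hasTerm` and `belongsToClass` are illustrative assumptions, not the paper's schema. In the described system the M-SVM does the classification, so the `class_votes` helper only shows how the graph can be traversed from a document's terms to candidate classes.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """A tiny in-memory triple store holding (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        # Indexes: objects by (predicate, subject), subjects by (predicate, object).
        self._out: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._in: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))
        self._out[(p, s)].add(o)
        self._in[(p, o)].add(s)

    def objects(self, s: str, p: str) -> Set[str]:
        return set(self._out.get((p, s), set()))

    def subjects(self, p: str, o: str) -> Set[str]:
        return set(self._in.get((p, o), set()))


def build_ontology_graph(topics: Dict[str, Tuple[str, List[str]]]) -> KnowledgeGraph:
    """Link each latent topic to its predefined class and to its top terms."""
    kg = KnowledgeGraph()
    for topic, (label, terms) in topics.items():
        kg.add(topic, "belongsToClass", label)
        for term in terms:
            kg.add(topic, "hasTerm", term)
    return kg


def class_votes(kg: KnowledgeGraph, tokens: Iterable[str]) -> Counter:
    """Count how many document tokens reach each class via term -> topic -> class."""
    votes: Counter = Counter()
    for tok in tokens:
        for topic in kg.subjects("hasTerm", tok):
            for label in kg.objects(topic, "belongsToClass"):
                votes[label] += 1
    return votes


# Hypothetical topics, as if taken from an LDA model's top words.
topics = {
    "topic_0": ("Symptoms", ["fever", "cough", "fatigue"]),
    "topic_1": ("Transmission", ["droplet", "contact", "aerosol"]),
    "topic_2": ("Treatment", ["remdesivir", "ventilation", "oxygen"]),
}
kg = build_ontology_graph(topics)
print(class_votes(kg, ["patients", "fever", "cough", "oxygen"]))
# Counter({'Symptoms': 2, 'Treatment': 1})
```

Tokens with no matching topic (here `patients`) contribute nothing. A term shared by several topics votes for every class those topics reach, which is one reason a learned classifier is still needed on top of the graph.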

Published

2021-11-23