Feature Engineered Relation Extraction – Medical Documents Setting

Ioana Barbantan (Technical University of Cluj-Napoca, Cluj-Napoca, Romania)
Mihaela Porumb (Technical University of Cluj-Napoca, Cluj-Napoca, Romania)
Camelia Lemnaru (Technical University of Cluj-Napoca, Cluj-Napoca, Romania)
Rodica Potolea (Technical University of Cluj-Napoca, Cluj-Napoca, Romania)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 15 August 2016

Abstract

Purpose

Improving healthcare services through assistive technologies involves both health aid devices and the analysis of the data they collect. Modeled as a knowledge base, the acquired data give more insight into each patient’s health status and needs. The ultimate goal of a healthcare system is therefore to obtain recommendations from an assistive decision support system built on such a knowledge base, benefiting patients, physicians and the healthcare industry. This paper aims to define the knowledge flow for a medical assistive decision support system by structuring raw medical data and leveraging the knowledge contained in the data, proposing solutions for efficient data search, medical investigation or diagnosis and medication prediction, and relationship identification.

Design/methodology/approach

The proposed medical assistive decision support system can analyze any type of unstructured medical document. Documents are processed by Natural Language Processing (NLP) tasks followed by semantic analysis, which identifies the medical concepts and thus imposes a structure on the input documents. The structured information is filtered and classified so that custom decisions regarding patients’ health status can be made. The current research focuses on identifying the relationships between medical concepts, as defined by the REMed (Relation Extraction from Medical documents) solution, which aims to find the patterns that lead to the classification of concept pairs into concept-to-concept relations.
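As a rough illustration of the concept-pair step described above, the following Python sketch enumerates candidate concept pairs within one sentence and derives a few simple features for each pair. It is a minimal sketch under stated assumptions, not the REMed implementation: the `Concept` structure, the concept kinds ("problem", "treatment"), the example sentence and the three features are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Concept:
    """A medical concept found in a sentence (hypothetical structure)."""
    text: str
    kind: str   # assumed labels such as "problem" or "treatment"
    start: int  # index of the first token of the concept
    end: int    # index of the last token of the concept, inclusive


def candidate_pairs(concepts):
    """Return every unordered pair of concepts in one sentence, in text order.

    A sentence with fewer than two concepts yields no pairs.
    """
    ordered = sorted(concepts, key=lambda c: c.start)
    return list(combinations(ordered, 2))


def pair_features(tokens, first, second):
    """Illustrative features for one pair; NOT the paper's feature list."""
    between = tokens[first.end + 1:second.start]
    return {
        "types": f"{first.kind}-{second.kind}",
        "token_distance": len(between),
        "words_between": [t.lower() for t in between],
    }


# Invented example sentence with two pre-identified concepts.
tokens = "The patient was given aspirin for chest pain".split()
concepts = [
    Concept("chest pain", "problem", 6, 7),
    Concept("aspirin", "treatment", 4, 4),
]

pairs = candidate_pairs(concepts)
features = [pair_features(tokens, a, b) for a, b in pairs]
```

Each feature dictionary would then be passed to a classifier that assigns the pair a relation label. Working at sentence level keeps the number of candidate pairs small.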

Findings

This paper proposes the REMed solution, expressed as a multi-class classification problem and tackled using a support vector machine classifier. Experiments determined that the most appropriate setup for the multi-class classification problem is a combination of lexical, context, syntactic and grammatical features, because each feature category is good at representing particular relations, but not all of them. The best result obtained is an F1-measure of 74.9 per cent, which is 1.4 per cent better than the results reported by similar systems.
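For readers unfamiliar with the metric, the sketch below shows how an F1-measure can be computed for a multi-class relation classifier, both per class and micro-averaged. It is a minimal sketch under stated assumptions: the abstract does not say which averaging the reported figure uses, the gold and predicted labels are invented for illustration, and "Other" is a placeholder label that does not come from the paper.

```python
from collections import Counter


def f1_scores(gold, predicted):
    """Return (per-class F1 dict, micro-averaged F1) for two label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed

    per_class = {}
    for label in sorted(set(gold) | set(predicted)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    total_tp = sum(tp.values())
    total_err = sum(fp.values()) + sum(fn.values())
    micro_denom = 2 * total_tp + total_err
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return per_class, micro


# Invented labels: a classifier that over-predicts the majority class TrAP.
gold = ["TrAP", "TrAP", "TrAP", "TrIP", "Other", "Other"]
predicted = ["TrAP", "TrAP", "TrAP", "TrAP", "Other", "TrAP"]

per_class, micro = f1_scores(gold, predicted)
```

In this invented example the majority class scores well while the minority class TrIP scores zero, which is why per-class figures are worth inspecting alongside a single averaged number.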

Research limitations/implications

The difficulty in discriminating between the TrIP and TrAP relations stems from the hierarchical relationship between the two classes, as TrIP is a particular type (an instance) of TrAP. The intuition behind this behavior was that the classifier cannot discern the correct relations because of its bias toward the majority classes. The analysis was conducted using only sentences from electronic health records that contain at least two medical concepts. This limitation was imposed by the availability of annotated data with reported results, as relations were defined at sentence level.

Originality/value

The originality of the proposed solution lies in the methodology for extracting valuable information from medical records via semantic searches, concept-to-concept relation identification, and recommendations for diagnosis, treatment and further investigations. The REMed solution introduces a learning-based approach for the automatic discovery of relations between medical concepts. The paper proposes an original list of features: three lexical, six context, four grammatical and four syntactic. The similarity feature introduced in this paper has a significant influence on the classification and, to the best of the authors’ knowledge, has not been used as a feature in similar solutions.

Citation

Barbantan, I., Porumb, M., Lemnaru, C. and Potolea, R. (2016), "Feature Engineered Relation Extraction – Medical Documents Setting", International Journal of Web Information Systems, Vol. 12 No. 3, pp. 336-358. https://doi.org/10.1108/IJWIS-03-2016-0015

Publisher

Emerald Group Publishing Limited

Copyright © 2016, Emerald Group Publishing Limited