Search results

1 – 10 of 200
Article
Publication date: 7 April 2015

Andreas Vlachidis and Douglas Tudhope

Abstract

Purpose

The purpose of this paper is to present the role and contribution of natural language processing techniques, in particular negation detection and word sense disambiguation, in the process of Semantic Annotation of Archaeological Grey Literature. Archaeological reports contain a great deal of information that conveys facts and findings in different ways. This kind of information is highly relevant to the research and analysis of archaeological evidence but at the same time can be a hindrance to the accurate indexing of documents with respect to positive assertions.

Design/methodology/approach

The paper presents a method for adapting the biomedicine-oriented negation algorithm NegEx to the context of archaeology and discusses the evaluation results of the new modified negation detection module. A particular form of polysemy, which arises from the definition of ontology classes and concerns the semantics of small finds in archaeology, is addressed by a domain-specific word-sense disambiguation module.
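The core NegEx idea is compact: a curated list of trigger phrases negates target terms that fall within a fixed forward window of tokens. A minimal sketch in Python, assuming illustrative triggers and an arbitrary window size (the paper's adapted trigger list and archaeological target terms are not reproduced here):

```python
import re

# Illustrative pre-negation triggers and window size only; NegEx ships a much
# larger curated list, which the paper further adapts to archaeological text.
TRIGGERS = ["no", "not", "without", "absence of", "no evidence of"]
WINDOW = 5  # negation scope: this many tokens after the trigger

def negated_terms(sentence, target_terms):
    """Return the target terms that fall inside a negation scope."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    negated = set()
    for i in range(len(tokens)):
        for trigger in TRIGGERS:
            t = trigger.split()
            if tokens[i:i + len(t)] == t:
                scope = tokens[i + len(t):i + len(t) + WINDOW]
                negated.update(w for w in target_terms if w in scope)
    return negated

print(negated_terms("No evidence of pottery was found in the ditch",
                    {"pottery", "ditch"}))  # → {'pottery'}
```

Here "pottery" is flagged as a negated find while "ditch", outside the five-token window, is not; getting that window and trigger list right for archaeological prose is exactly the adaptation the paper evaluates.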

Findings

The performance of the negation detection module is compared against a “Gold Standard” that consists of 300 manually annotated pages of archaeological excavation and evaluation reports. The evaluation results are encouraging, delivering overall 89 per cent precision, 80 per cent recall and 83 per cent F-measure scores. The paper addresses limitations and future improvements of the current work and highlights the need for ontological modelling to accommodate negative assertions.

Originality/value

The discussed NLP modules contribute to the aims of the OPTIMA pipeline delivering an innovative application of such methods in the context of archaeological reports for the semantic annotation of archaeological grey literature with respect to the CIDOC-CRM ontology.

Article
Publication date: 21 December 2020

Sudha Cheerkoot-Jalim and Kavi Kumar Khedo

Abstract

Purpose

This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.

Design/methodology/approach

The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each of the papers was analyzed, and information relevant to the research questions was extracted.

Findings

It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums.

Originality/value

To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting on the advancement of biomedical research.

Details

Journal of Knowledge Management, vol. 25 no. 3
Type: Research Article
ISSN: 1367-3270

Article
Publication date: 27 August 2019

Barkha Bansal and Sangeet Srivastava

Abstract

Purpose

Vast volumes of rich online consumer-generated content (CGC) can be used effectively to gain important insights for decision-making, product improvement and brand management. Recently, many studies have proposed semi-supervised aspect-based sentiment classification of unstructured CGC. However, most of the existing CGC mining methods rely on explicitly detected aspect-based sentiments and overlook the context of sentiment-bearing words. Therefore, this study aims to extract implicit context-sensitive sentiment and to handle the slang and the ambiguous, informal and special words used in CGC.

Design/methodology/approach

A novel text mining framework is proposed to detect and evaluate implicit semantic word relations and context. First, POS (part of speech) tagging is used for detecting aspect descriptions and sentiment-bearing words. Then, LDA (latent Dirichlet allocation) is used to group similar aspects together and to form an attribute. Semantically and contextually similar words are found using the skip-gram model for distributed word vectorisation. Finally, to find context-sensitive sentiment of each attribute, cosine similarity is used along with a set of positive and negative seed words.
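The final scoring step can be illustrated compactly. In this sketch the corpus-trained skip-gram vectors are replaced by hand-made toy embeddings (an assumption for brevity); the scoring logic itself, cosine similarity against positive and negative seed words, follows the shape described above:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 3-d vectors standing in for skip-gram embeddings trained on the review
# corpus, where an informal spelling like "awsome" ends up near "good".
vectors = {
    "good":   [0.90, 0.10, 0.00],
    "bad":    [-0.80, 0.20, 0.10],
    "awsome": [0.85, 0.15, 0.05],
}
POS_SEEDS, NEG_SEEDS = ["good"], ["bad"]

def polarity(word):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(cosine(vectors[word], vectors[s]) for s in POS_SEEDS) / len(POS_SEEDS)
    neg = sum(cosine(vectors[word], vectors[s]) for s in NEG_SEEDS) / len(NEG_SEEDS)
    return pos - neg

print(polarity("awsome") > 0)  # → True
```

Because the embeddings are learned from the reviews themselves, misspellings and slang inherit sensible polarity scores without ever appearing in a sentiment lexicon, which is the point of the context-sensitive design.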

Findings

Experimental results using more than 400,000 Amazon mobile phone reviews showed that the proposed method efficiently found product attributes and corresponding context-aware sentiments. This method also outperforms the classification accuracy of the baseline model and state-of-the-art techniques using context-sensitive information on data sets from two different domains.

Practical implications

Extracted attributes can be easily classified into consumer issues and brand merits. A brand-based comparative study is presented to demonstrate the practical significance of the proposed approach.

Originality/value

This paper presents a novel method for context-sensitive attribute-based sentiment analysis of CGC, which is useful for both brand and product improvement.

Details

Kybernetes, vol. 50 no. 2
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 21 June 2023

Debasis Majhi and Bhaskar Mukherjee

Abstract

Purpose

The purpose of this study is to identify the research fronts by analysing highly cited core papers adjusted with the age of a paper in library and information science (LIS) where natural language processing (NLP) is being applied significantly.

Design/methodology/approach

By excavating international databases, 3,087 core papers that received at least 5 per cent of the total citations have been identified. By calculating the mean publication year of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for all 20 fronts to understand how much attention a front has been receiving among peers over time. One theme article has finally been identified from each of these 20 fronts.
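The abstract does not spell out the CPT formula, but one plausible reading of citation/publication/time is total citations per core paper per year of mean age. A hedged sketch under that assumption, with entirely hypothetical numbers:

```python
from statistics import mean

def cpt(citations, pub_years, current_year=2023):
    """Citations per core paper per year of mean age.

    One plausible reading of the CPT (citation/publication/time) metric;
    the exact formula is not given in the abstract, so treat this as an
    illustration only."""
    mean_age = current_year - mean(pub_years)
    return sum(citations) / len(pub_years) / mean_age

# Hypothetical front: four core papers with citation counts and years.
print(cpt([120, 80, 60, 40], [2019, 2020, 2021, 2022]))  # → 30.0
```

Whatever the exact definition, the intent is the same: normalising citations by both output volume and age lets a young, fast-moving front (such as BERT research) outrank older fronts with larger raw citation totals.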

Findings

Bidirectional encoder representations from transformers, with a CPT value of 1.608, followed by sentiment analysis, with a CPT of 1.292, received the highest attention in NLP research. Columbia University, New York, leads among universities; the Journal of the American Medical Informatics Association among journals; the USA, followed by the People's Republic of China, among countries; and Xu, H. (University of Texas) among authors in these fronts. It is identified that NLP applications boost the performance of digital libraries and automated library systems in the digital environment.

Practical implications

Any research fronts that are identified in the findings of this paper may serve as a base for researchers who intend to perform extensive research on NLP.

Originality/value

To the best of the authors’ knowledge, the methodology adopted in this paper is the first of its kind, in which a meta-analysis approach has been used for understanding the research fronts in a subfield such as NLP within a broad domain such as LIS.

Details

Digital Library Perspectives, vol. 39 no. 3
Type: Research Article
ISSN: 2059-5816

Book part
Publication date: 13 March 2023

Xiao Liu

Abstract

The expansion of marketing data is encouraging the growing use of deep learning (DL) in marketing. I summarize the intuition behind deep learning and explain the mechanisms of six popular algorithms: three discriminative (convolutional neural network (CNN), recurrent neural network (RNN), and Transformer), two generative (variational autoencoder (VAE) and generative adversarial networks (GAN)), and one reinforcement learning algorithm (the deep Q-network, DQN). I discuss what marketing problems DL is useful for and what fueled its growth in recent years. I emphasize the power and flexibility of DL for modeling unstructured data when formal theories and knowledge are absent. I also describe future research directions.

Article
Publication date: 4 September 2019

Yi-Hung Liu, Xiaolong Song and Sheng-Fong Chen

Abstract

Purpose

Whether automatically generated summaries of health social media can aid users in managing their diseases appropriately is an important question. The purpose of this paper is to introduce a novel text summarization approach for acquiring the most informative summaries from online patient posts accurately and effectively.

Design/methodology/approach

Data sets of diabetes and HIV posts were collected from two online disease forums, respectively. The proposed summarizer is based on a graph-based method and generates summaries by considering social network features, text sentiment and sentence features. Representative health-related summaries were identified, and summarization performance as well as user judgments were analyzed.
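A graph-based extractive summarizer of this general family can be sketched in a few lines. Here sentence similarity is plain Jaccard word overlap and the paper's social-network, sentiment and sentence features are collapsed into a single external weight per sentence; both are simplifications for illustration, not the authors' actual scoring:

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

def summarize(sentences, weights, k=1):
    """Score each sentence by its total similarity to the others (graph
    centrality), scaled by an external weight standing in for the paper's
    social-network, sentiment and sentence features; return the top k."""
    scored = []
    for i, s in enumerate(sentences):
        centrality = sum(jaccard(s, t) for j, t in enumerate(sentences) if j != i)
        scored.append((centrality * weights[i], s))
    return [s for _, s in sorted(scored, reverse=True)[:k]]

posts = ["insulin dose helps control blood sugar",
         "insulin timing matters for blood sugar control",
         "my cat likes sunny windows"]
print(summarize(posts, weights=[1.0, 1.0, 1.0], k=1))
```

The off-topic third post gets zero centrality and is never selected; in the full model, raising or lowering a sentence's weight (say, for a highly connected forum author) shifts the ranking without changing the graph machinery.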

Findings

The findings show that scoring sentences without incorporating all of the proposed features decreases summarization performance, as measured against the classic summarization method and the comparison approaches. The proposed summarizer significantly outperformed the comparison baseline.

Originality/value

This study contributes to the literature on health knowledge management by analyzing patients’ experiences and opinions through the health summarization model. The research additionally develops a new mindset to design abstractive summarization weighting schemes from the health user-generated content.

Details

Aslib Journal of Information Management, vol. 71 no. 6
Type: Research Article
ISSN: 2050-3806

Open Access
Article
Publication date: 19 July 2022

Shreyesh Doppalapudi, Tingyan Wang and Robin Qiu

Abstract

Purpose

Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and technical to most people, which is one of the most challenging obstacles to health information dissemination from healthcare providers to consumers. The authors aim to investigate how to leverage machine learning techniques to transform clinical notes of interest into understandable expressions.

Design/methodology/approach

The authors propose a natural language processing pipeline that is capable of extracting relevant information from long unstructured clinical notes and simplifying lexicons by replacing medical jargon and technical terms. Particularly, the authors develop an unsupervised keywords matching method to extract relevant information from clinical notes. To automatically evaluate completeness of the extracted information, the authors perform a multi-label classification task on the relevant texts. To simplify lexicons in the relevant text, the authors identify complex words using a sequence labeler and leverage transformer models to generate candidate words for substitution. The authors validate the proposed pipeline using 58,167 discharge summaries from critical care services.
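The substitution step can be pictured with a toy lookup table. The paper instead flags complex words with a sequence labeler and generates candidates with transformer models, so the dictionary below is purely illustrative:

```python
import re

# Toy substitution table; the paper identifies complex words with a sequence
# labeler and generates candidates with transformer models rather than a
# fixed dictionary like this one.
SIMPLER = {
    "dyspnea": "shortness of breath",
    "hypertension": "high blood pressure",
    "afebrile": "without fever",
}

def simplify(note):
    """Replace known jargon tokens, leaving all other text untouched."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: SIMPLER.get(m.group(0).lower(), m.group(0)),
                  note)

print(simplify("Patient afebrile, reports dyspnea on exertion."))
# → Patient without fever, reports shortness of breath on exertion.
```

The hard parts the pipeline actually solves are upstream of this: deciding which words are complex for a lay reader, and generating in-context candidates that preserve clinical meaning.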

Findings

The results show that the proposed pipeline can identify relevant information with high completeness and simplify complex expressions in clinical notes so that the converted notes have a high level of readability but a low degree of meaning change.

Social implications

The proposed pipeline can help healthcare consumers understand their medical information and therefore strengthen communication between healthcare providers and consumers for better care.

Originality/value

An innovative pipeline approach is developed to address the health literacy problem confronted by healthcare providers and consumers in the ongoing digital transformation process in the healthcare industry.

Article
Publication date: 8 July 2010

Andreas Vlachidis, Ceri Binding, Douglas Tudhope and Keith May

Abstract

Purpose

This paper sets out to discuss the use of information extraction (IE), a natural language‐processing (NLP) technique to assist “rich” semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic‐aware “rich” indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project.

Design/methodology/approach

The paper proposes use of the English Heritage extension (CRM‐EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology‐Oriented Information Extraction process. The process of semantic indexing is based on a rule‐based Information Extraction technique, which is facilitated by the General Architecture of Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules.
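The mechanics of thesaurus-driven semantic indexing can be suggested with a small sketch. The project itself loads domain thesauri into GATE and expresses the matching as JAPE rules; the Python gazetteer below, with made-up terms and simplified CRM-style class labels, only illustrates the shape of the output annotations:

```python
import re

# Tiny stand-in gazetteer with illustrative terms and simplified CRM-style
# labels; the STAR project uses full domain thesauri and JAPE rules in GATE.
GAZETTEER = {
    "Time_Appellation": ["roman", "medieval", "bronze age"],
    "Physical_Object": ["coin", "pottery", "brooch"],
}

def annotate(text):
    """Return (ontology class, matched term, character offset) triples."""
    spans = []
    for cls, terms in GAZETTEER.items():
        for term in terms:
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", text.lower()):
                spans.append((cls, term, m.start()))
    return sorted(spans, key=lambda s: s[2])

print(annotate("A Roman coin was recovered from the medieval ditch."))
```

Each triple is the skeleton of a semantic index entry: a mention in free text bound to an ontology class, which is what makes the indexed grey literature queryable through the CRM rather than by keyword alone.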

Findings

Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic‐aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms.

Originality/value

The value of the paper lies in the semantic indexing of 535 unpublished online documents, often referred to as “Grey Literature”, from the Archaeological Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and E19.Physical Object.

Details

Aslib Proceedings, vol. 62 no. 4/5
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 15 June 2023

Abena Owusu and Aparna Gupta

Abstract

Purpose

Although risk culture is a key determinant of effective risk management, identifying the risk culture of a firm can be challenging due to the abstract concept of culture. This paper proposes a novel approach that uses unsupervised machine learning techniques to identify significant features needed to assess and differentiate between different forms of risk culture.

Design/methodology/approach

To convert the unstructured text in our sample of banks' 10K reports into structured data, a two-dimensional dictionary for text mining is built to capture risk culture characteristics and the bank's attitude towards those characteristics. A principal component analysis (PCA) reduction technique is applied to extract the significant features that define risk culture, before K-means unsupervised learning is used to cluster the reports into distinct risk culture groups.
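The distinctive step here is the two-dimensional dictionary: a feature is not a single term count but a co-occurrence of a risk-culture characteristic with an attitude. A toy version, with invented term lists and naive substring matching (the paper's dictionary and preprocessing are far richer):

```python
from itertools import product

# Invented term lists for illustration; the paper's two-dimensional
# dictionary is built from a much larger curated vocabulary.
CHARACTERISTICS = {"risk appetite": ["risk appetite", "tolerance"],
                   "governance": ["board oversight", "compliance"]}
ATTITUDES = {"uncertainty": ["may", "approximately", "uncertain"],
             "constraining": ["must", "required", "shall"]}

def features(text):
    """Count sentences where a characteristic term and an attitude term
    co-occur, yielding one feature per (characteristic, attitude) pair."""
    counts = {}
    for sent in text.lower().split("."):
        for (c, c_terms), (a, a_terms) in product(CHARACTERISTICS.items(),
                                                  ATTITUDES.items()):
            if any(t in sent for t in c_terms) and any(t in sent for t in a_terms):
                counts[(c, a)] = counts.get((c, a), 0) + 1
    return counts

print(features("The board oversight function must review limits. "
               "Risk appetite may change."))
```

Vectors of these pairwise counts, one per 10K report, are what PCA then compresses and K-means clusters into the good, fair and poor risk-culture groups.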

Findings

The PCA identifies uncertainty, litigious and constraining sentiments among risk culture features to be significant in defining the risk culture of banks. Cluster analysis on the PCA factors proposes three distinct risk culture clusters: good, fair and poor. Consistent with regulatory expectations, a good or fair risk culture in banks is characterized by high profitability ratios, bank stability, lower default risk and good governance.

Originality/value

The relationship between culture and risk management can be difficult to study given that it is hard to measure culture from traditional data sources that are messy and diverse. This study offers a better understanding of risk culture using an unsupervised machine learning approach.

Details

International Journal of Managerial Finance, vol. 20 no. 2
Type: Research Article
ISSN: 1743-9132

Article
Publication date: 4 February 2021

Ransome Epie Bawack, Samuel Fosso Wamba and Kevin Daniel André Carillo

Abstract

Purpose

The current evolution of artificial intelligence (AI) practices and applications is creating a disconnection between modern-day information system (IS) research and practices. The purpose of this study is to propose a classification framework that connects the IS discipline to contemporary AI practices.

Design/methodology/approach

We conducted a review of practitioner literature to derive our framework's key dimensions. We reviewed 103 documents on AI published by 25 leading technology companies ranked in the 2019 list of Fortune 500 companies. After that, we reviewed and classified 110 information system (IS) publications on AI using our proposed framework to demonstrate its ability to classify IS research on AI and reveal relevant research gaps.

Findings

Practitioners have adopted different definitional perspectives of AI (field of study, concept, ability, system), explaining the differences in the development, implementation and expectations from AI experienced today. All these perspectives suggest that perception, comprehension, action and learning are the four capabilities AI artifacts must possess. However, leading IS journals have mostly published research adopting the “AI as an ability” perspective of AI with limited theoretical and empirical studies on AI adoption, use and impact.

Research limitations/implications

First, the framework is based on the perceptions of AI by a limited number of companies, although it includes all the companies leading current AI practices. Second, the IS literature reviewed is limited to a handful of journals. Thus, the conclusions may not be generalizable. However, they remain true for the articles reviewed, and they all come from well-respected IS journals.

Originality/value

This is the first study to consider the practitioner's AI perspective in designing a conceptual framework for AI research classification. The proposed framework and research agenda are used to show how IS could become a reference discipline in contemporary AI research.

Details

Journal of Enterprise Information Management, vol. 34 no. 2
Type: Research Article
ISSN: 1741-0398
