Search results

1 – 10 of 21
Article
Publication date: 16 August 2022

Jung Ran Park, Erik Poole and Jiexun Li

Abstract

Purpose

The purpose of this study is to explore linguistic stylometric patterns encompassing lexical, syntactic, structural, sentiment and politeness features that are found in librarians’ responses to user queries.

Design/methodology/approach

A total of 462 online texts/transcripts comprising librarians’ answers to users’ questions, drawn from the Internet Public Library, were examined. A Principal Component Analysis, a data reduction technique, was conducted on the texts and transcripts. Data analysis identified three principal components that predominantly occur in librarians’ answers: stylometric richness, stylometric brevity and interpersonal support.
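The dimensionality-reduction step described above can be sketched in a few lines. This is a toy illustration with invented feature names (type-token ratio, mean sentence length, politeness density) and fabricated transcripts; the study's actual feature set is far richer.

```python
# Hypothetical sketch: reduce simple stylometric features with PCA.
import re
import numpy as np
from sklearn.decomposition import PCA

POLITE = {"please", "thanks", "thank", "welcome", "glad"}

def stylometric_features(text):
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(set(words)) / len(words),                  # lexical richness (TTR)
        len(words) / len(sentences),                   # mean sentence length
        sum(w in POLITE for w in words) / len(words),  # politeness density
    ]

transcripts = [
    "Thank you for your question. Please see the catalog. Glad to help!",
    "The database covers 1990 onward. Search by author or title.",
    "You are welcome! The archive holds maps, photos and letters.",
    "Use quotation marks for exact phrases. Results sort by relevance.",
]

X = np.array([stylometric_features(t) for t in transcripts])
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print(scores.shape)   # (4, 2): one 2-component score row per transcript
```

In practice, components would be inspected via their loadings to interpret them as, e.g., "stylometric richness".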

Findings

The results of the study have important implications in digital information services because stylometric features such as lexical richness, structural clarity and interpersonal support may interplay with the degree of complexity of user queries, the (a)synchronous communication mode, application of information service guideline and manuals and overall characteristics and quality of a given digital information service. Such interplay may bring forth a direct impact on user perceptions and satisfaction regarding interaction with librarians and the information service received through the computer-mediated communication channel.

Originality/value

To the best of the authors’ knowledge, the stylometric features encompassing lexical, syntactic, structural, sentiment and politeness using Principal Component Analysis have not been explored in digital information/reference services. Thus, there is an emergent need to explore more fully how linguistic stylometric features interplay with the types of user queries, the asynchronous online communication mode, application of information service guidelines and the quality of a particular digital information service.

Details

Global Knowledge, Memory and Communication, vol. 73 no. 3
Type: Research Article
ISSN: 2514-9342

Open Access
Article
Publication date: 31 July 2020

Omar Alqaryouti, Nur Siyam, Azza Abdel Monem and Khaled Shaalan

Abstract

Digital resources such as smart application reviews and online feedback are important sources of customer input. This paper aims to help government entities gain insights into the needs and expectations of their customers. Towards this end, we propose an aspect-based sentiment analysis hybrid approach that integrates domain lexicons and rules to analyse the entities’ smart app reviews. The proposed model aims to extract the important aspects from the reviews and classify the corresponding sentiments. This approach adopts natural language processing techniques, rules and lexicons to address several sentiment analysis challenges and produce summarised results. According to the reported results, aspect extraction accuracy improves significantly when implicit aspects are considered. In addition, the integrated classification model outperforms the lexicon-based baseline and the other rule combinations by 5% in accuracy on average. When using the same dataset, the proposed approach also outperforms machine learning approaches that use support vector machines (SVMs). However, using these lexicons and rules as input features to the SVM model achieves higher accuracy than other SVM models.
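The lexicon-and-rule idea can be sketched as follows. The aspect lexicon, sentiment lexicon and negation-window rule below are toy stand-ins for the paper's domain lexicons and rule set, not its actual resources.

```python
# Minimal sketch of a lexicon-and-rule aspect-based sentiment pass.
ASPECTS = {"login": "usability", "crash": "reliability",
           "interface": "usability", "payment": "transactions"}
SENTIMENT = {"easy": 1, "great": 1, "fast": 1,
             "slow": -1, "broken": -1, "confusing": -1}
NEGATORS = {"not", "never", "no"}

def aspect_sentiments(review):
    tokens = review.lower().replace(".", " ").split()
    found = {}
    for i, tok in enumerate(tokens):
        if tok in ASPECTS:
            # rule: score opinion words in a small window around the aspect,
            # flipping polarity when a negator directly precedes them
            score = 0
            for j in range(max(0, i - 3), min(len(tokens), i + 4)):
                if tokens[j] in SENTIMENT:
                    s = SENTIMENT[tokens[j]]
                    if j > 0 and tokens[j - 1] in NEGATORS:
                        s = -s
                    score += s
            found[ASPECTS[tok]] = ("positive" if score > 0
                                   else "negative" if score < 0 else "neutral")
    return found

print(aspect_sentiments("The login is not easy and the interface is confusing."))
# → {'usability': 'negative'}
```

A real system would add implicit-aspect handling, which is where the paper reports its largest accuracy gains.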

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 20 December 2022

Javaid Ahmad Wani and Shabir Ahmad Ganaie

Abstract

Purpose

The current study aims to map the scientific output of grey literature (GL) through bibliometric approaches.

Design/methodology/approach

The source for data extraction is a comprehensive “indexing and abstracting” database, “Web of Science” (WoS). A lexical title search was applied to obtain the corpus of the study: a total of 4,599 articles were extracted for data analysis and visualisation. The data were then analysed using the analytical tools RStudio and VOSviewer.
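The descriptive core of such a bibliometric analysis, counting publications per year and the share contributed by a phase, can be sketched with fabricated years; the real study analyses 4,599 WoS records.

```python
# Toy sketch: publications per year and the share of a productive phase.
from collections import Counter

years = [2015, 2016, 2018, 2018, 2019, 2019, 2020, 2020, 2021, 2021]
per_year = Counter(years)                      # year -> publication count
phase = range(2018, 2022)                      # the 2018-2021 window
share = sum(per_year[y] for y in phase) / len(years)
print(per_year[2018], round(share, 2))         # 2 0.8
```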

Findings

The findings showed that publications grew substantially over the study period. The most productive phase (2018–2021) accounted for 47% of the articles. The most prominent sources were PLOS One and NeuroImage. The highest numbers of papers were contributed by Haddaway and Kumar, and the most relevant countries were the USA and the UK.

Practical implications

The study is useful for researchers interested in the GL research domain. The study helps to understand the evolution of the GL to provide research support further in this area.

Originality/value

The present study provides a new orientation to the scholarly output on GL. The study is rigorous and comprehensive, based on analytical operations such as research networks, collaboration analysis and visualisation. To the best of the authors' knowledge, this manuscript is original, and no similar works with the research objectives included here have been found.

Details

Library Hi Tech, vol. 42 no. 1
Type: Research Article
ISSN: 0737-8831

Open Access
Article
Publication date: 31 July 2023

Daniel Šandor and Marina Bagić Babac

Abstract

Purpose

Sarcasm is a linguistic expression that usually carries the opposite meaning of what is being said by words, thus making it difficult for machines to discover the actual meaning. It is mainly distinguished by the inflection with which it is spoken, with an undercurrent of irony, and is largely dependent on context, which makes it a difficult task for computational analysis. Moreover, sarcasm expresses negative sentiments using positive words, allowing it to easily confuse sentiment analysis models. This paper aims to demonstrate the task of sarcasm detection using the approach of machine and deep learning.

Design/methodology/approach

For the purpose of sarcasm detection, machine and deep learning models were used on a data set consisting of 1.3 million social media comments, including both sarcastic and non-sarcastic comments. The data set was pre-processed using natural language processing methods, and additional features were extracted and analysed. Several machine learning models, including logistic regression, ridge regression, linear support vector classification and support vector machines, along with two deep learning models based on bidirectional long short-term memory and one bidirectional encoder representations from transformers (BERT)-based model, were implemented, evaluated and compared.
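A minimal sketch of the simplest model family mentioned, TF-IDF features feeding logistic regression, might look like this. The six comments and labels are invented toy data, not the study's 1.3 million-comment data set.

```python
# Hedged sketch: TF-IDF + logistic regression for sarcasm detection.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Oh great, another Monday. Just what I needed.",
    "Wow, you really outdid yourself this time.",
    "Sure, because that plan worked so well last time.",
    "The concert last night was fantastic.",
    "I finished the report ahead of schedule.",
    "This recipe turned out delicious.",
]
labels = [1, 1, 1, 0, 0, 0]   # 1 = sarcastic, 0 = not sarcastic

# unigram + bigram TF-IDF features into a linear classifier
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)
print(list(model.predict(comments)))
```

The paper's point is that context-aware models such as BERT outperform this kind of bag-of-words baseline, since sarcasm hides negative intent behind positive surface words.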

Findings

The performance of machine and deep learning models was compared in the task of sarcasm detection, and possible ways of improvement were discussed. Deep learning models showed more promise, performance-wise, for this type of task. Specifically, a state-of-the-art model in natural language processing, namely, BERT-based model, outperformed other machine and deep learning models.

Originality/value

This study compared the performance of the various machine and deep learning models in the task of sarcasm detection using the data set of 1.3 million comments from social media.

Details

Information Discovery and Delivery, vol. 52 no. 2
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 30 May 2023

Dario Aversa

Abstract

Purpose

Climate change has a direct impact on companies. Scenario analysis is therefore used to provide companies and stakeholders in this sector with forward-looking measures and narratives of the world's future state. This work aims to provide an independent, wide and rigorous literature review on the topics of scenario analysis and climate change, analyzing a large set of refereed papers included in economic journals indexed in the Web of Science Clarivate Analytics data source. By means of a mixed approach, this review can help address new policy strategies and business models.

Design/methodology/approach

The work employs 416 abstracts and their titles in the field of economics, applying data mining for qualitative variables and performing descriptive statistics, lexicometric measures, similarity analysis and clustering with Reinert's hierarchical method in order to extract knowledge. Furthermore, qualitative content analysis allows a comprehensive and complete universe of meaning to be returned, as well as the analysis of co-occurrences.

Findings

Content analysis reveals three main classification clusters and four previously unknown patterns: model area, risks, emissions and energy, and carbon pricing, indicating research directions and limitations through an overview with an extensive reference bibliography. The research shows the prevalent use of quantitative instruments and their limitations, while qualitative instruments remain residual for climate change assessment; it also highlights the centrality of transition risk over adaptation measures and the combination of different types of instruments with reference to carbon pricing.

Originality/value

Scenario analysis is a relatively new topic in economics and finance research, and it is under-investigated by the academy. The analysis combines quantitative and qualitative research using text analytics.

Details

British Food Journal, vol. 126 no. 1
Type: Research Article
ISSN: 0007-070X

Article
Publication date: 5 May 2023

Ying Yu and Jing Ma

Abstract

Purpose

Tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive amounts of fine-grained data, ranging from information on the tenderee to shipping locations and shipping items. Automated information extraction in this area is, however, under-researched, making the extraction process time- and effort-consuming. For Chinese logistics tender entities in particular, existing named entity recognition (NER) solutions are mostly unsuitable, as these entities involve domain-specific terminology and possess different semantic features.

Design/methodology/approach

To tackle this problem, a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, is proposed in this paper for identifying valuable entities from logistic tender documents. Instead of traditional word embedding, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, with the Lattice-LSTM model, the information of characters and words is effectively utilized to avoid error segmentation.
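The CRF layer's role is to decode the globally best tag sequence rather than tag each token independently. A minimal Viterbi decoder over toy emission and transition scores (all numbers invented, tags in the usual BIO scheme) can be sketched as:

```python
# Toy Viterbi decoding, the inference step of a CRF tagging layer.
def viterbi(emissions, transitions, tags):
    # emissions: per-token dict tag -> score; transitions: (prev, cur) -> score
    best = [{t: (emissions[0][t], [t]) for t in tags}]
    for em in emissions[1:]:
        layer = {}
        for cur in tags:
            # pick the best previous tag for this current tag
            prev, (score, path) = max(
                ((p, best[-1][p]) for p in tags),
                key=lambda kv: kv[1][0] + transitions[(kv[0], cur)],
            )
            layer[cur] = (score + transitions[(prev, cur)] + em[cur], path + [cur])
        best.append(layer)
    return max(best[-1].values(), key=lambda sp: sp[0])[1]

tags = ["B-ORG", "I-ORG", "O"]
# heavily penalise the invalid O -> I-ORG transition
transitions = {(p, c): (-5.0 if (p == "O" and c == "I-ORG") else 0.5)
               for p in tags for c in tags}
emissions = [
    {"B-ORG": 2.0, "I-ORG": 0.1, "O": 0.3},
    {"B-ORG": 0.2, "I-ORG": 1.5, "O": 0.4},
    {"B-ORG": 0.1, "I-ORG": 0.2, "O": 2.5},
]
print(viterbi(emissions, transitions, tags))   # ['B-ORG', 'I-ORG', 'O']
```

In the paper's full model, the emission scores would come from the BERT-augmented Lattice-LSTM encoder and the transition scores would be learned, not hand-set.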

Findings

The proposed model is then verified by the Chinese logistic tender named entity corpus. Moreover, the results suggest that the proposed model excels in the logistics tender corpus over other mainstream NER models. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive the ever-changing market trends and make far-sighted logistic decisions.

Originality/value

(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT into the downstream task with a small amount of data, the experiment results show that the model has a better performance than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed and a program of the model for online-processing real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and further perceive the ever-changing market trends to make far-sighted logistic decisions.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 7 July 2023

Wuyan Liang and Xiaolong Xu

Abstract

Purpose

In the COVID-19 era, sign language (SL) translation has gained attention in online learning, where it evaluates the physical gestures of each student and bridges the communication gap between people with dysphonia and hearing people. The purpose of this paper is to improve the alignment between SL sequences and natural language sequences with high translation performance.

Design/methodology/approach

SL can be characterized as joint/bone location information in two-dimensional space over time, forming skeleton sequences. To encode joint, bone and their motion information, we propose a multistream hierarchy network (MHN) along with a vocab prediction network (VPN) and a joint network (JN) with the recurrent neural network transducer. The JN is used to concatenate the sequences encoded by the MHN and VPN and learn their sequence alignments.

Findings

We verify the effectiveness of the proposed approach on three large-scale datasets: translation accuracy is 94.96, 54.52 and 92.88 per cent, and inference is 18 and 1.7 times faster than the listen-attend-spell network (LAS) and the visual hierarchy to lexical sequence network (H2SNet), respectively.

Originality/value

In this paper, we propose a novel framework that can fuse multimodal input (i.e. joint, bone and their motion streams) and align the input streams with natural language. Moreover, the framework benefits from the complementary properties of the MHN, VPN and JN. Experimental results on the three datasets demonstrate that our approach outperforms state-of-the-art methods in terms of translation accuracy and speed.

Details

Data Technologies and Applications, vol. 58 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 3 October 2023

Anna Sokolova, Polina Lobanova and Ilya Kuzminov

Abstract

Purpose

The purpose of the paper is to present an integrated methodology for identifying trends in a particular subject area based on a combination of advanced text mining and expert methods. The authors aim to test it in the area of clinical psychology and psychotherapy over 2010–2019.

Design/methodology/approach

The authors demonstrate how text mining and the Word2Vec model can be applied to identify hot topics (HT) and emerging trends (ET) in clinical psychology and psychotherapy. The analysis of 11.3 million scientific publications in the Microsoft Academic Graph database revealed the most rapidly growing clinical psychology and psychotherapy terms, those with the largest increase in the number of publications, reflecting real or potential trends.
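The counting side of this trend identification, flagging terms whose publication counts grow fastest between two windows, can be sketched with fabricated counts; the Word2Vec grouping of semantically related terms is omitted here.

```python
# Toy sketch: rank terms by publication-count growth across two windows.
counts = {                      # term -> (early-window papers, late-window papers)
    "mindfulness": (120, 540),
    "cbt": (900, 1100),
    "emdr": (60, 75),
}
growth = {term: late / early for term, (early, late) in counts.items()}
trending = max(growth, key=growth.get)
print(trending, round(growth[trending], 1))   # mindfulness 4.5
```

A real pipeline would normalise for overall corpus growth before declaring a term an emerging trend.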

Findings

The proposed approach allows one to identify HT and ET for the six thematic clusters related to mental disorders, symptoms, pharmacology, psychotherapy, treatment techniques and important psychological skills.

Practical implications

The developed methodology allows one to see the broad picture of the most dynamic research areas in the field of clinical psychology and psychotherapy in 2010–2019. For clinicians, who are often overwhelmed by practical work, this map of the current research can help identify the areas worthy of further attention to improve the effectiveness of their clinical work. This methodology might be applied for the identification of trends in any other subject area by taking into account its specificity.

Originality/value

The paper demonstrates the value of the advanced text-mining approach for understanding trends in a subject area. To the best of the authors’ knowledge, for the first time, text-mining and the Word2Vec model have been applied to identifying trends in the field of clinical psychology and psychotherapy.

Details

foresight, vol. 26 no. 1
Type: Research Article
ISSN: 1463-6689

Article
Publication date: 6 February 2024

Somayeh Tamjid, Fatemeh Nooshinfard, Molouk Sadat Hosseini Beheshti, Nadjla Hariri and Fahimeh Babalhavaeji

Abstract

Purpose

The purpose of this study is to develop a domain-independent, cost-effective, time-saving and semi-automated ontology generation framework that can extract taxonomic concepts from an unstructured text corpus. In the human disease domain, ontologies are extremely useful for managing the diversity of technical expressions in support of information retrieval objectives. The boundaries of these domains are expanding so fast that it is essential to continuously develop new ontologies or upgrade available ones.

Design/methodology/approach

This paper proposes a semi-automated approach that extracts entities/relations via text mining of scientific publications. Code named text mining-based ontology (TmbOnt) was developed to assist a user in capturing, processing and establishing ontology elements. It takes a pile of unstructured text files as input and projects them into high-valued entities or relations as output. As a semi-automated approach, a user supervises the process, filters meaningful predecessor/successor phrases and finalises the demanded ontology-taxonomy. To verify the practical capabilities of the scheme, a case study was performed to derive a glaucoma ontology-taxonomy. For this purpose, text files containing 10,000 records were collected from PubMed.
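The predecessor/successor harvesting step can be sketched with a simple is-a pattern whose candidate pairs a user would then filter, as in the semi-automated workflow described. The pattern and sentences below are illustrative, not the paper's actual rules.

```python
# Toy sketch: harvest candidate is-a (taxonomy) pairs from raw sentences.
import re

PATTERN = re.compile(
    r"([\w-][\w -]*?) is a (?:type|form|kind) of ([\w-][\w -]*?)[.,]", re.I
)

sentences = [
    "Open-angle glaucoma is a type of glaucoma.",
    "Glaucoma is a form of optic neuropathy.",
    "The trial enrolled 120 patients.",        # no taxonomic relation
]
pairs = [(m.group(1), m.group(2))
         for s in sentences for m in PATTERN.finditer(s)]
print(pairs)
# → [('Open-angle glaucoma', 'glaucoma'), ('Glaucoma', 'optic neuropathy')]
```

The human-in-the-loop step then keeps only meaningful pairs before assembling the taxonomy.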

Findings

The proposed approach processed over 3.8 million tokenized terms from those records and yielded the resultant glaucoma ontology-taxonomy. The TmbOnt-derived taxonomy demonstrated a 60%–100% coverage ratio against well-known medical thesauruses and ontology taxonomies, such as the Human Disease Ontology, Medical Subject Headings and the National Cancer Institute Thesaurus, with an average of 70% additional terms recommended for ontology development.

Originality/value

According to the literature, the proposed scheme demonstrated novel capability in expanding the ontology-taxonomy structure with a semi-automated text mining approach, aiming for future fully-automated approaches.

Details

The Electronic Library, vol. 42 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 22 June 2023

Chiara Alzetta, Felice Dell'Orletta, Alessio Miaschi, Elena Prat and Giulia Venturi

Abstract

Purpose

The authors’ goal is to investigate variations in the writing style of book reviews published on different social reading platforms and referring to books of different genres, which enables acquiring insights into communication strategies adopted by readers to share their reading experiences.

Design/methodology/approach

The authors propose a corpus-based study focused on the analysis of A Good Review, a novel corpus of online book reviews written in Italian, posted on Amazon and Goodreads and covering six literary fiction genres. The authors rely on stylometric analysis to explore the linguistic properties and lexicon of the reviews, and they conducted automatic classification experiments using multiple approaches and feature configurations to predict either the review's platform or the book's literary genre.
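Two of the simplest stylometric properties such an analysis might start from, mean sentence length and type-token ratio, can be computed per platform as follows; the two reviews are invented, not drawn from A Good Review.

```python
# Toy sketch: basic stylometric profile of reviews grouped by platform.
import re

reviews = {
    "amazon": ["Great book. Fast delivery. Five stars."],
    "goodreads": ["A slow, meditative novel whose characters linger "
                  "long after the final page."],
}

def profile(texts):
    words = [w for t in texts for w in re.findall(r"[\w']+", t.lower())]
    sents = [s for t in texts for s in re.split(r"[.!?]+", t) if s.strip()]
    # (mean sentence length in words, type-token ratio)
    return round(len(words) / len(sents), 1), round(len(set(words)) / len(words), 2)

for platform, texts in reviews.items():
    print(platform, profile(texts))
```

The study's finding that syntactic structure separates platforms better than genres would emerge from comparing such profiles (and richer syntactic features) across the two groupings.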

Findings

The analysis of user-generated reviews demonstrates that language is quite a variable dimension across reading platforms, but much less so across book genres. The classification experiments revealed that features modelling the syntactic structure of the sentence are reliable proxies for discerning Amazon from Goodreads reviews, whereas lexical information has a higher predictive role in automatically discriminating the genre.

Originality/value

The high availability of cultural products makes information services necessary to help users navigate these resources and acquire information from unstructured data. This study contributes to a better understanding of the linguistic characteristics of user-generated book reviews, which can support the development of linguistically-informed recommendation services. Additionally, the authors release a novel corpus of online book reviews meant to support the reproducibility and advancements of the research.

Details

Journal of Documentation, vol. 80 no. 1
Type: Research Article
ISSN: 0022-0418
