Search results
1 – 10 of 19
Qinxu Ding, Ding Ding, Yue Wang, Chong Guan and Bosheng Ding
The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive…
Abstract
Purpose
The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.
Design/methodology/approach
Drawing from an extensive corpus of 198 records published between 1996 and 2023 from the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study delves deep into the multifaceted world of LLM research. In this study, the authors employed the BERTopic algorithm, a recent advancement in topic modeling, to conduct a comprehensive analysis of the data after it had been meticulously cleaned and preprocessed. BERTopic leverages the power of transformer-based language models like bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover valuable insights that might otherwise have remained obscure.
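The core idea behind a BERTopic-style analysis, grouping documents into topics by similarity and labeling each topic by its most characteristic words, can be illustrated with a minimal stdlib-only sketch. This is not the BERTopic implementation (which uses transformer embeddings, dimensionality reduction and density-based clustering); the greedy bag-of-words clustering, the threshold and the toy documents below are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt

def vectorize(doc):
    """Bag-of-words count vector for one document (BERTopic instead uses
    transformer embeddings; this is a deliberate simplification)."""
    return Counter(doc.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.3):
    """Greedy clustering: join the most similar existing cluster if its
    centroid similarity exceeds the threshold, else start a new cluster."""
    clusters = []  # list of [centroid Counter, member indices]
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c[0])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([vec, [i]])
        else:
            best[0].update(vec)   # fold the document into the centroid
            best[1].append(i)
    return clusters

docs = [
    "language models for natural language processing",
    "transformer language models and NLP tasks",
    "clinical notes and medical applications",
    "medical records in clinical practice",
]
result = cluster(docs)
# Label each cluster by its three most frequent words, as a stand-in for
# BERTopic's topic representations.
labels = [[w for w, _ in c[0].most_common(3)] for c in result]
```

On these toy documents the sketch recovers two clusters, one language/NLP-themed and one clinical/medical-themed, mirroring the kind of thematic grouping the study reports.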
Findings
The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.
Practical implications
This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.
Originality/value
This study stands out as the first to examine the evolution of LLMs across such a long time frame and across such diversified disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLM’s evolution.
Details
Keywords
Shreyesh Doppalapudi, Tingyan Wang and Robin Qiu
Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and technical to most people, which is one of the most challenging…
Abstract
Purpose
Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and technical to most people, which is one of the most challenging obstacles in health information dissemination to consumers by healthcare providers. The authors aim to investigate how to leverage machine learning techniques to transform clinical notes of interest into understandable expressions.
Design/methodology/approach
The authors propose a natural language processing pipeline that is capable of extracting relevant information from long unstructured clinical notes and simplifying lexicons by replacing medical jargon and technical terms. In particular, the authors develop an unsupervised keyword matching method to extract relevant information from clinical notes. To automatically evaluate the completeness of the extracted information, the authors perform a multi-label classification task on the relevant texts. To simplify lexicons in the relevant text, the authors identify complex words using a sequence labeler and leverage transformer models to generate candidate words for substitution. The authors validate the proposed pipeline using 58,167 discharge summaries from critical care services.
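The full pipeline (keyword matching, sequence labeling, transformer-generated substitutes) is beyond a short snippet, but the final lexicon-simplification step can be sketched with a toy jargon dictionary. The jargon map and the `simplify` function below are hypothetical illustrations, not the authors' implementation, which learns which words are complex and generates substitutes automatically.

```python
import re

# Hypothetical jargon-to-plain-language map; the paper instead identifies
# complex words with a sequence labeler and proposes substitutes with
# transformer models.
JARGON = {
    "myocardial infarction": "heart attack",
    "hypertension": "high blood pressure",
    "dyspnea": "shortness of breath",
}

def simplify(note):
    """Replace known jargon terms, longest first so multiword terms win,
    matching case-insensitively."""
    for term in sorted(JARGON, key=len, reverse=True):
        note = re.sub(re.escape(term), JARGON[term], note, flags=re.IGNORECASE)
    return note

plain = simplify("Patient presented with dyspnea and a history of hypertension.")
# plain == "Patient presented with shortness of breath and a history of high blood pressure."
```

The design goal the paper evaluates, high readability with low meaning change, corresponds here to choosing substitutes that preserve the clinical sense of each replaced term.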
Findings
The results show that the proposed pipeline can identify relevant information with high completeness and simplify complex expressions in clinical notes so that the converted notes have a high level of readability but a low degree of meaning change.
Social implications
The proposed pipeline can help healthcare consumers well understand their medical information and therefore strengthen communications between healthcare providers and consumers for better care.
Originality/value
An innovative pipeline approach is developed to address the health literacy problem confronted by healthcare providers and consumers in the ongoing digital transformation process in the healthcare industry.
Details
Keywords
Reema Khaled AlRowais and Duaa Alsaeed
Automatically extracting stance information from natural language texts is a significant research problem with various applications, particularly after the recent explosion of…
Abstract
Purpose
Automatically extracting stance information from natural language texts is a significant research problem with various applications, particularly after the recent explosion of data on the internet via platforms like social media sites. A stance detection system helps determine whether the author agrees with, is against or holds a neutral opinion toward a given target. Most of the research in stance detection focuses on the English language, while little research has been conducted on the Arabic language.
Design/methodology/approach
This paper aims to address stance detection on Arabic tweets by building and comparing different stance detection models using four transformers, namely Araelectra, MARBERT, AraBERT and Qarib. Using different weights for these transformers, the authors performed extensive experiments fine-tuning the four transformers for the task of stance detection on Arabic tweets.
Findings
The results showed that the AraBERT model learned better than the other three models with a 70% F1 score followed by the Qarib model with a 68% F1 score.
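The F1 scores reported above are presumably averaged over the stance classes; a stdlib sketch of macro-averaged F1, the usual metric for this three-class task, is shown below. The label names and toy predictions are assumptions for illustration, not the paper's dataset.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute per-class F1, then average with equal
    class weight (appropriate when, as here, classes are imbalanced)."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# Hypothetical stance labels: agree / against / neutral
y_true = ["agree", "against", "neutral", "agree"]
y_pred = ["agree", "against", "agree", "agree"]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, a model that ignores the rare neutral class is penalized, which is one reason macro F1 suits the imbalanced dataset the authors note as a limitation.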
Research limitations/implications
A limitation of this study is the imbalanced dataset and the limited availability of annotated stance detection datasets in Arabic.
Originality/value
The paper provides a comprehensive overview of the current resources for stance detection in the literature, including the datasets and machine learning methods used. The authors also examined the models to analyze and comprehend the obtained findings in order to make recommendations on the best-performing models for the stance detection task.
Details
Keywords
Bahareh Farhoudinia, Selcen Ozturkcan and Nihat Kasap
This paper aims to conduct an interdisciplinary systematic literature review (SLR) of fake news research and to advance the socio-technical understanding of digital information…
Abstract
Purpose
This paper aims to conduct an interdisciplinary systematic literature review (SLR) of fake news research and to advance the socio-technical understanding of digital information practices and platforms in business and management studies.
Design/methodology/approach
The paper applies a focused, SLR method to analyze articles on fake news in business and management journals from 2010 to 2020.
Findings
The paper analyzes the definition, theoretical frameworks, methods and research gaps of fake news in the business and management domains. It also identifies some promising research opportunities for future scholars.
Practical implications
The paper offers practical implications for various stakeholders who are affected by or involved in fake news dissemination, such as brands, consumers and policymakers. It provides recommendations to cope with the challenges and risks of fake news.
Social implications
The paper discusses the social consequences and future threats of fake news, especially in relation to social networking and social media. It calls for more awareness and responsibility from online communities to prevent and combat fake news.
Originality/value
The paper contributes to the literature on information management by showing the importance and consequences of fake news sharing for societies. It is among the frontier systematic reviews in the field that covers studies from different disciplines and focuses on business and management studies.
Details
Keywords
Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal
Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking…
Abstract
Purpose
Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth.
Design/methodology/approach
In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
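An extensible catalog of components chained into custom pipelines might look like the following sketch. The component names, registry decorator and toy dataset record are all hypothetical; the authors' actual framework (available on GitHub) defines its own component interfaces.

```python
from typing import Callable

# Hypothetical component registry: pipelines are just ordered lists of
# registered component names, mirroring the extensible-catalog idea.
CATALOG: dict[str, Callable] = {}

def component(name):
    """Decorator that registers a function as a named pipeline component."""
    def register(fn):
        CATALOG[name] = fn
        return fn
    return register

@component("extract_title")
def extract_title(dataset):
    # Pull a textual representation out of a dataset's metadata record.
    return dataset.get("title", "")

@component("tokenize")
def tokenize(text):
    return text.lower().split()

def run_pipeline(names, value):
    """Feed the value through each catalog component in order."""
    for name in names:
        value = CATALOG[name](value)
    return value

tokens = run_pipeline(["extract_title", "tokenize"], {"title": "Open Budget Data"})
```

Swapping one component name for another (say, a different representation extractor) changes the pipeline without touching the rest, which is the rapid-experimentation property the framework aims for.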
Findings
The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework.
Originality/value
To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.
Details
Keywords
This paper proposes a multi-facet sentiment analysis system.
Abstract
Purpose
This paper proposes a multi-facet sentiment analysis system.
Design/methodology/approach
This paper uses multidomain resources to build a sentiment analysis system. The manual lexicon-based features extracted from these resources are fed into a machine learning classifier, and their performance is then compared. The manual lexicon is replaced with a custom bag of words (BOW) to deal with its time-consuming construction. To help the system run faster and make the model interpretable, feature selection is performed using different existing and custom approaches such as term occurrence, information gain, principal component analysis, semantic clustering and POS tagging filters.
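The first stage described above, extracting manual lexicon-based features for a downstream classifier, can be sketched as follows. The tiny lexicons and the feature names are toy assumptions, not the multidomain resources the paper uses.

```python
# Hypothetical positive/negative lexicons; the paper builds these from
# multidomain resources and later replaces them with a custom BOW.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def lexicon_features(text):
    """Counts of positive and negative lexicon hits: a typical
    manual-lexicon feature vector to feed a machine learning classifier."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {"pos": pos, "neg": neg, "polarity": pos - neg}

feats = lexicon_features("great camera but terrible battery and poor support")
```

Features like these are cheap and interpretable, which is why the paper then prunes them further (term occurrence, information gain, PCA, clustering, POS filters) to keep the model fast and explainable.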
Findings
The proposed system, featuring automated lexicon extraction and feature-set size optimization, proved its efficiency when applied to multidomain and benchmark datasets, reaching 93.59% accuracy, which makes it competitive with state-of-the-art systems.
Originality/value
The construction of a custom BOW, and the optimization of features based on existing and custom feature selection and clustering approaches.
Details
Keywords
Linzi Wang, Qiudan Li, Jingjun David Xu and Minjie Yuan
Mining user-concerned actionable and interpretable hot topics will help management departments fully grasp the latest events and make timely decisions. Existing topic models…
Abstract
Purpose
Mining user-concerned actionable and interpretable hot topics will help management departments fully grasp the latest events and make timely decisions. Existing topic models primarily integrate word embedding and matrix decomposition, which only generates keyword-based hot topics with weak interpretability, making it difficult to meet the specific needs of users. Mining phrase-based hot topics with syntactic dependency structure has been proven to model structure information effectively. A key challenge lies in the effective integration of the above information into the hot topic mining process.
Design/methodology/approach
This paper proposes the nonnegative matrix factorization (NMF)-based hot topic mining method, semantics syntax-assisted hot topic model (SSAHM), which combines semantic association and syntactic dependency structure. First, a semantic–syntactic component association matrix is constructed. Then, the matrix is used as a constraint condition to be incorporated into the block coordinate descent (BCD)-based matrix decomposition process. Finally, a hot topic information-driven phrase extraction algorithm is applied to describe hot topics.
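To ground the factorization step, the snippet below sketches plain, unconstrained NMF via the classic multiplicative updates on toy nested lists. This is only the generic building block: SSAHM instead incorporates the semantic–syntactic association matrix as a constraint inside a BCD-based decomposition, which this sketch does not attempt.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, k, iters=500, eps=1e-9):
    """Unconstrained NMF via multiplicative updates: V (m x n) is
    approximated by W (m x k) @ H (k x n), all entries nonnegative."""
    random.seed(0)
    m, n = len(V), len(V[0])
    W = [[random.random() for _ in range(k)] for _ in range(m)]
    H = [[random.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # Update H, then W; each update keeps factors nonnegative.
        WH, Wt = matmul(W, H), transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        WH, Ht = matmul(W, H), transpose(H)
        num, den = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

V = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]  # toy term-document-style matrix, rank 2
W, H = nmf(V, k=2)
approx = matmul(W, H)
err = sum((V[i][j] - approx[i][j]) ** 2 for i in range(3) for j in range(3))
```

The rows of H play the role of topics over terms; SSAHM's contribution is steering this decomposition with the dependency-structure constraint so the resulting topics support phrase-level descriptions.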
Findings
The efficacy of the developed model is demonstrated on two real-world datasets, and the effects of dependency structure information on different topics are compared. The qualitative examples further explain the application of the method in real scenarios.
Originality/value
Most prior research focuses on keyword-based hot topics. Thus, the literature is advanced by mining phrase-based hot topics with syntactic dependency structure, which can effectively analyze the semantics. The development of syntactic dependency structure considering the combination of word order and part-of-speech (POS) is a step forward, as word order and POS are only separately utilized in the prior literature. Ignoring this synergy may miss important information, such as grammatical structure coherence and logical relations between syntactic components.
Details
Keywords
Stanislav Ivanov and Mohammad Soliman
The paper aims to evaluate the ways ChatGPT is going to disrupt tourism education and research.
Abstract
Purpose
The paper aims to evaluate the ways ChatGPT is going to disrupt tourism education and research.
Design/methodology/approach
This is a conceptual paper.
Findings
ChatGPT has the potential to revolutionize tourism education and research because it can do what students and researchers should do, namely, generate text (assignments and research papers). Universities will need to reevaluate their teaching and assessment strategies and incorporate generative language models in teaching. Publishers will need to be more receptive toward manuscripts that are partially generated by artificial intelligence. In the future, digital teachers and research assistants will take over many of the cognitive tasks of tourism educators and researchers.
Originality/value
To the authors’ best knowledge, this is one of the first academic papers that investigates the implications of ChatGPT for tourism education and research.
Details
Keywords
Dhong Fhel K. Gom-os and Kelvin Y. Yong
The goal of this study is to test the real-world use of an emotion recognition system.
Abstract
Purpose
The goal of this study is to test the real-world use of an emotion recognition system.
Design/methodology/approach
The researchers chose an existing algorithm that displayed high accuracy and speed. Four of the six universal emotions (happiness, sadness, anger and surprise) are used, each associated with its own mood markers. The mood-matrix interface is then coded as a web application. Four guidance counselors and 10 students participated in the testing of the mood-matrix. Guidance counselors answered the technology acceptance model (TAM) to assess its usefulness, and the students answered the general comfort questionnaire (GCQ) to assess their comfort levels.
Findings
Results from TAM found that the mood-matrix has significant use for the guidance counselors and the GCQ finds that the students were comfortable during testing.
Originality/value
No study yet has tested an emotion recognition system applied to counseling or any other mental health or psychological setting.
Details
Keywords
Ahmed A. Khalifa and Mariam A. Ibrahim
The study aims to evaluate PubMed publications on ChatGPT or artificial intelligence (AI) involvement in scientific or medical writing and investigate whether ChatGPT or AI was…
Abstract
Purpose
The study aims to evaluate PubMed publications on ChatGPT or artificial intelligence (AI) involvement in scientific or medical writing and investigate whether ChatGPT or AI was used to create these articles or listed as authors.
Design/methodology/approach
This scoping review was conducted according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. A PubMed database search was performed for articles published between January 1 and November 29, 2023, using appropriate search terms; both authors performed screening and selection independently.
Findings
From the initial search results of 127 articles, 41 were eligible for final analysis. Articles were published in 34 journals. Editorials were the most common article type, with 15 (36.6%) articles. Authors originated from 27 countries, and authors from the USA contributed the most, with 14 (34.1%) articles. The most discussed topic was AI tools and writing capabilities in 19 (46.3%) articles. AI or ChatGPT was involved in manuscript preparation in 31 (75.6%) articles. None of the articles listed AI or ChatGPT as an author, and in 19 (46.3%) articles, the authors acknowledged utilizing AI or ChatGPT.
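The reported proportions are internally consistent with the 41-article denominator; a quick check (the dictionary keys below are just illustrative labels):

```python
# Counts reported in the scoping review, all out of 41 eligible articles.
counts = {
    "editorials": 15,          # most common article type
    "usa_authors": 14,         # articles with USA-based authors
    "ai_writing_topic": 19,    # articles discussing AI tools and writing
    "ai_in_preparation": 31,   # articles where AI/ChatGPT aided preparation
}
pcts = {k: round(100 * v / 41, 1) for k, v in counts.items()}
```

Each computed percentage matches the figure quoted in the abstract (36.6%, 34.1%, 46.3% and 75.6% respectively).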
Practical implications
Researchers worldwide are concerned with AI or ChatGPT involvement in scientific research, specifically the writing process. The authors believe that precise and mature regulations will be developed soon by journals, publishers and editors, which will pave the way for the best usage of these tools.
Originality/value
This scoping review presented data published on the use of AI or ChatGPT in various aspects of scientific research and writing, besides alluding to the advantages, disadvantages and implications of their usage.
Details