Search results

1 – 10 of over 4000
Article
Publication date: 5 July 2024

Nouhaila Bensalah, Habib Ayad, Abdellah Adib and Abdelhamid Ibn El Farouk


Abstract

Purpose

The paper aims to enhance Arabic machine translation (MT) by proposing novel approaches: (1) a dimensionality reduction technique for word embeddings tailored for Arabic text, optimizing efficiency while retaining semantic information; (2) a comprehensive comparison of meta-embedding techniques to improve translation quality; and (3) a method leveraging self-attention and Gated CNNs to capture token dependencies, including temporal and hierarchical features within sentences, and interactions between different embedding types. These approaches collectively aim to enhance translation quality by combining different embedding schemes and leveraging advanced modeling techniques.

Design/methodology/approach

Recent works on MT in general and Arabic MT in particular often pick one type of word embedding model. In this paper, we present a novel approach to enhance Arabic MT by addressing three key aspects. Firstly, we propose a new dimensionality reduction technique for word embeddings, specifically tailored for Arabic text. This technique optimizes the efficiency of embeddings while retaining their semantic information. Secondly, we conduct an extensive comparison of different meta-embedding techniques, exploring the combination of static and contextual embeddings. Through this analysis, we identify the most effective approach to improve translation quality. Lastly, we introduce a novel method that leverages self-attention and Gated convolutional neural networks (CNNs) to capture token dependencies, including temporal and hierarchical features within sentences, as well as interactions between different types of embeddings. Our experimental results demonstrate the effectiveness of our proposed approach in significantly enhancing Arabic MT performance. It outperforms baseline models with a BLEU score increase of 2 points and achieves superior results compared to state-of-the-art approaches, with an average improvement of 4.6 points across all evaluation metrics.

Findings

The proposed approaches significantly enhance Arabic MT performance. The dimensionality reduction technique improves the efficiency of word embeddings while preserving semantic information. Comprehensive comparison identifies effective meta-embedding techniques, with the contextualized dynamic meta-embeddings (CDME) model showcasing competitive results. Integration of Gated CNNs with the transformer model surpasses baseline performance, leveraging both architectures' strengths. Overall, these findings demonstrate substantial improvements in translation quality, with a BLEU score increase of 2 points and an average improvement of 4.6 points across all evaluation metrics, outperforming state-of-the-art approaches.

Originality/value

The paper’s originality lies in its departure from simply fine-tuning the transformer model for a specific task. Instead, it introduces modifications to the internal architecture of the transformer, integrating Gated CNNs to enhance translation performance. This departure from traditional fine-tuning approaches demonstrates a novel perspective on model enhancement, offering unique insights into improving translation quality without solely relying on pre-existing architectures. The originality in dimensionality reduction lies in the tailored approach for Arabic text. While dimensionality reduction techniques are not new, the paper introduces a specific method optimized for Arabic word embeddings. By employing independent component analysis (ICA) and a post-processing method, the paper effectively reduces the dimensionality of word embeddings while preserving semantic information, an approach that has not been investigated before, especially for the MT task.
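The post-processing-plus-reduction step described above can be pictured in a few lines. The sketch below is a minimal illustration on toy data, not the paper's method: it substitutes an SVD-based projection (in the spirit of "all-but-the-top" post-processing) for ICA, and the matrix sizes are arbitrary.

```python
import numpy as np

def reduce_embeddings(E, n_remove=2, n_keep=50):
    """Post-process word embeddings, then reduce their dimensionality.

    E        : (vocab_size, dim) embedding matrix
    n_remove : number of dominant principal directions to discard
               (these tend to encode corpus-wide frequency effects)
    n_keep   : target dimensionality after reduction
    """
    E = E - E.mean(axis=0)                  # center the embedding space
    # principal directions of the centered embeddings
    _, _, Vt = np.linalg.svd(E, full_matrices=False)
    # drop the dominant directions, keep the next n_keep as the new basis
    basis = Vt[n_remove:n_remove + n_keep]  # (n_keep, dim)
    return E @ basis.T                      # (vocab_size, n_keep)

rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 300))            # toy 300-d "embeddings"
E_red = reduce_embeddings(E, n_remove=2, n_keep=50)
print(E_red.shape)  # (1000, 50)
```

In practice the retained dimensionality and the number of discarded directions would be tuned on the downstream MT task.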

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 3
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 20 September 2023

Hei-Chia Wang, Army Justitia and Ching-Wen Wang


Abstract

Purpose

The explosion of data brought about by advances in information and communication technology makes it simple for prospective tourists to learn about previous hotel guests' experiences, and they prioritize the rating score when selecting a hotel. However, rating scores are less reliable for suggesting a personalized preference for each aspect, especially when ratings are few. This study aims to recommend ratings and personalized hotel preferences using cross-domain and aspect-based features.

Design/methodology/approach

We propose aspect-based cross-domain personalized recommendation (AsCDPR), a novel framework for rating prediction and personalized customer preference recommendation. It incorporates a cross-domain personalized approach and aspect-based item features drawn from review text. Aspect-based feature vectors are extracted from the two domains using bidirectional long short-term memory and then mapped by a multilayer perceptron (MLP). The cross-domain recommendation module trains the MLP to analyze sentiment and predict item ratings and aspect polarities based on user preferences.
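The cross-domain mapping step can be illustrated on synthetic data. This sketch replaces the paper's MLP with a linear least-squares map, purely to show the idea of learning a mapping between two domains' aspect-feature spaces; all dimensions and data are toy values.

```python
import numpy as np

rng = np.random.default_rng(1)

d_src, d_tgt, n = 8, 6, 200
X_src = rng.normal(size=(n, d_src))       # source-domain aspect vectors
W_true = rng.normal(size=(d_src, d_tgt))  # hidden "true" relation
X_tgt = X_src @ W_true + 0.01 * rng.normal(size=(n, d_tgt))

# learn the cross-domain mapping (an MLP in the paper; least squares here)
W, *_ = np.linalg.lstsq(X_src, X_tgt, rcond=None)
pred = X_src @ W
mae = float(np.abs(pred - X_tgt).mean())
print(round(mae, 3))
```

A real implementation would learn a nonlinear MLP over BiLSTM-extracted aspect vectors rather than a linear map over random data.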

Findings

Expanding aspect-based features with their synonyms significantly improves sentiment-analysis accuracy and F1-score. With a mean absolute error of 1.3657 and a root mean square error of 1.6682, AsCDPR outperforms matrix factorization, collaborative matrix factorization, EMCDPR and personalized transfer of user preferences for cross-domain recommendation.

Research limitations/implications

This study assists users in recommending hotels based on their priority preferences. Users do not need to read other people's reviews to capture the key aspects of items. This model could enhance system reliability in the hospitality industry by providing personalized recommendations.

Originality/value

This study introduces a new approach that embeds aspect-based features of items in a cross-domain personalized recommendation. AsCDPR predicts ratings and provides recommendations based on priority aspects of each user's preferences.

Article
Publication date: 31 October 2023

Hong Zhou, Binwei Gao, Shilong Tang, Bing Li and Shuyu Wang


Abstract

Purpose

The number of construction dispute cases has maintained a high growth trend in recent years. Effective exploration and management of construction contract risk can directly improve performance across the project life cycle. Missing clauses may cause a contract to fail to match standard contracts, and if a contract modified by the owner omits key clauses, the resulting disputes may force contractors to pay substantial compensation. Identifying missing clauses in construction project contracts has therefore relied heavily on manual review, which is inefficient and highly dependent on personnel experience, while existing intelligent tools support only contract query and storage. It is urgent to raise the level of intelligence in contract clause management. This paper therefore aims to propose an intelligent method for detecting missing clauses in construction project contracts based on natural language processing (NLP) and deep learning technology.

Design/methodology/approach

A complete classification scheme for contract clauses is designed based on NLP. First, construction contract texts are pre-processed and converted from unstructured natural language into structured digital vector form. Following this initial categorization, a multi-label classification of long-text construction contract clauses is designed to preliminarily identify whether clause labels are missing. After the multi-label missing-clause detection, the authors implement a clause similarity algorithm that creatively integrates an image-detection idea, the MatchPyramid model, with BERT to identify missing substantive content in the contract clauses.
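The missing-clause check that follows the multi-label classification can be sketched as a set comparison between the labels predicted for a contract and a standard template. The clause names below are hypothetical, invented for illustration.

```python
# Hypothetical standard-template clause labels (not from the paper).
STANDARD_CLAUSES = {
    "payment terms", "scope of work", "dispute resolution",
    "termination", "liability", "force majeure",
}

def missing_clauses(predicted_labels):
    """Return standard clause labels absent from the predicted set."""
    return sorted(STANDARD_CLAUSES - set(predicted_labels))

# Labels a multi-label classifier might have assigned to one contract:
detected = ["payment terms", "scope of work", "termination", "liability"]
print(missing_clauses(detected))  # ['dispute resolution', 'force majeure']
```

The flagged clauses would then go to the similarity-matching stage to check whether their content is truly absent or merely worded differently.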

Findings

1,322 construction project contracts were tested. The accuracy of multi-label classification reached 93%, the accuracy of similarity matching reached 83%, and the recall and mean F1 of both exceeded 0.7. The experimental results verify, to some extent, the feasibility of intelligently detecting contract risk with the NLP-based method.

Originality/value

NLP is adept at recognizing textual content and has shown promising results in some contract processing applications. However, most existing approaches to risk detection in construction contract clauses are rule-based and struggle with intricate and lengthy engineering contracts. This paper introduces a deep-learning-based NLP technique that reduces manual intervention and can autonomously identify and tag types of contractual deficiencies, aligning with the evolving complexity anticipated in future construction contracts. Moreover, the method handles extended contract clause texts. Finally, the approach is versatile: users simply adjust parameters such as segmentation according to the language in order to detect omissions in contract clauses of diverse languages.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988


Article
Publication date: 30 January 2023

Zhongbao Liu and Wenjuan Zhao


Abstract

Purpose

In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and downstream task requirements were not explored thoroughly. It is not practical to directly migrate achievements obtained in English sentiment analysis to the analysis of Chinese because of the huge difference between the two languages.

Design/methodology/approach

In view of the particularity of Chinese text and the requirement of sentiment analysis, a Chinese sentiment analysis model integrating multi-granularity semantic features is proposed in this paper. This model introduces the radical and part-of-speech features based on the character and word features, with the application of bidirectional long short-term memory, attention mechanism and recurrent convolutional neural network.
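The fusion of multi-granularity features can be pictured as concatenation of the character, word, radical and part-of-speech representations before the recurrent and attention layers; the vectors below are toy values, and the dimensionalities are arbitrary.

```python
# Toy per-token feature vectors at each granularity (invented values).
char_feat    = [0.2, 0.7]           # character-level embedding
word_feat    = [0.1, 0.9, 0.4]      # word-level embedding
radical_feat = [0.5]                # radical feature
pos_feat     = [0.0, 1.0]           # e.g. a one-hot part-of-speech tag

# Concatenate the granularities into one multi-granularity representation.
fused = char_feat + word_feat + radical_feat + pos_feat
print(len(fused))  # 8
```

In the model proper, sequences of such fused vectors would feed the BiLSTM/attention/RCNN stack rather than being used directly.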

Findings

Comparative experiments showed that the F1 values of this model reach 88.28 and 84.80 per cent on the man-made dataset and the NLPECC dataset, respectively. An ablation experiment verified the effectiveness of the attention mechanism and of the part-of-speech, radical, character and word factors in Chinese sentiment analysis. The performance of the proposed model exceeds that of existing models to some extent.

Originality/value

The academic contribution of this paper is as follows: first, in view of the particularity of Chinese texts and the requirement of sentiment analysis, this paper focuses on solving the deficiency problem of Chinese sentiment analysis under the big data context. Second, this paper borrows ideas from multiple interdisciplinary frontier theories and methods, such as information science, linguistics and artificial intelligence, which makes it innovative and comprehensive. Finally, this paper deeply integrates multi-granularity semantic features such as character, word, radical and part of speech, which further complements the theoretical framework and method system of Chinese sentiment analysis.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 13 August 2024

Samia Nawaz Yousafzai, Hooria Shahbaz, Armughan Ali, Amreen Qamar, Inzamam Mashood Nasir, Sara Tehsin and Robertas Damaševičius


Abstract

Purpose

The objective is to develop a more effective model that simplifies and accelerates the news classification process using advanced text mining and deep learning (DL) techniques. A distributed framework utilizing Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. This approach leverages various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods.

Design/methodology/approach

This study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications.

Findings

The framework’s effectiveness was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results of this study demonstrate that the proposed strategy significantly outperforms other approaches in terms of accuracy and execution time. This indicates that the distributed framework, coupled with the use of BERT for text analysis, provides a robust solution for analyzing large volumes of data efficiently. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification, suggesting its potential to facilitate advancements in these areas.

Originality/value

This research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X


Open Access
Article
Publication date: 11 October 2023

Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati


Abstract

Purpose

The authors constructed an automatic essay scoring (AES) model for a discussion forum and compared its results with scores given by human evaluators. This research proposes essay scoring based on two parameters, semantic similarity and keyword similarity, using pre-trained SentenceTransformers models that produce the best-performing vector embeddings. The models are combined to increase scoring accuracy.

Design/methodology/approach

The development of the model in the study is divided into seven stages: (1) data collection, (2) pre-processing data, (3) selected pre-trained SentenceTransformers model, (4) semantic similarity (sentence pair), (5) keyword similarity, (6) calculate final score and (7) evaluating model.
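Stages (4) to (6), semantic similarity, keyword similarity and the final score, can be sketched as a weighted blend. The example below stands in a bag-of-words cosine for the SentenceTransformers embeddings, and the 0.5/0.5 weights and all texts are illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity over bag-of-words counts (embedding stand-in)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_similarity(answer, rubric_keywords):
    """Fraction of rubric keywords that appear in the answer."""
    tokens = set(answer.lower().split())
    return sum(1 for k in rubric_keywords if k in tokens) / len(rubric_keywords)

def final_score(answer, reference, rubric_keywords, w_sem=0.5, w_kw=0.5):
    """Stage (6): blend semantic and keyword similarity into one score."""
    return (w_sem * cosine(answer, reference)
            + w_kw * keyword_similarity(answer, rubric_keywords))

score = final_score(
    "photosynthesis converts light energy into chemical energy",
    "photosynthesis turns light energy into chemical energy",
    ["photosynthesis", "light", "energy"],
)
print(round(score, 2))
```

In the study itself, the semantic term comes from sentence-pair embeddings of a pre-trained SentenceTransformers model rather than word counts.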

Findings

The paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models achieved the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023), and both were adopted in this study. The two parameters are combined by comparing the extracted keywords against the rubric keywords. Experimental results show that the proposed combination increases the evaluation score by 0.2.

Originality/value

This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters. Forum discussions are still rated manually, so the authors built a model that automatically scores discussion-forum essays against the lecturer's answers and rubrics.

Details

Asian Association of Open Universities Journal, vol. 18 no. 3
Type: Research Article
ISSN: 1858-3431


Open Access
Article
Publication date: 9 January 2024

Kazuyuki Motohashi and Chen Zhu


Abstract

Purpose

This study aims to assess the technological capability of Chinese internet platforms (BAT: Baidu, Alibaba, Tencent) compared to US ones (GAFA: Google, Amazon, Facebook, Apple). More specifically, this study explores Baidu’s technological catching-up process with Google by analyzing their patent textual information.

Design/methodology/approach

The authors retrieved 26,383 Google patents and 6,695 Baidu patents from PATSTAT 2019 Spring version. The collected patent documents were vectorized using the Word2Vec model first, and then K-means clustering was applied to visualize the technological space of two firms. Finally, novel indicators were proposed to capture the technological catching-up process between Baidu and Google.
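The vectorize-then-cluster pipeline can be sketched with a small K-means over toy vectors standing in for the Word2Vec patent representations; the data, cluster count and iteration budget below are all illustrative.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means: assign to nearest center, recompute means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared distance of every vector to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned vectors
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(42)
# two well-separated toy "patent" clusters in a 5-d vector space
X = np.vstack([rng.normal(0, 0.1, (30, 5)), rng.normal(3, 0.1, (30, 5))])
labels, centers = kmeans(X, k=2)
print(sorted(set(labels.tolist())))
```

In the study, the cluster structure over real Word2Vec patent vectors is what is visualized as the firms' technological space.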

Findings

The results show that Baidu follows US rather than Chinese technology trends, which suggests that Baidu is aggressively seeking to catch up with US players in its technological development. At the same time, the impact index of Baidu patents increases over time, reflecting its growing technological competitiveness.

Originality/value

This study proposes a new method for analyzing technology mapping and evolution based on patent text information. As both the US and China are crucial players in the internet industry, it is vital for policymakers in third countries to understand the technological capacity and competitiveness of both countries in order to develop strategic partnerships effectively.

Details

Asia Pacific Journal of Innovation and Entrepreneurship, vol. 18 no. 3
Type: Research Article
ISSN: 2071-1395


Book part
Publication date: 22 November 2023

Chapman J. Lindgren, Wei Wang, Siddharth K. Upadhyay and Vladimer B. Kobayashi


Abstract

Sentiment analysis is a text analysis method developed to systematically detect, identify or extract the emotional intent of words, in order to infer whether a text expresses a positive or negative tone. Although this method has opened an exciting new avenue for organizational research – mainly due to the abundance of text data available in organizations and the maturity of sentiment analysis techniques – it has also posed a serious challenge to many organizational researchers. This chapter aims to introduce the sentiment analysis method from the text mining area to the organizational research community. The authors first briefly discuss the central role of sentiment in organizational research and then introduce the traditional and modern approaches to sentiment analysis. They further delineate research paradigms for text analysis research, advocating the iterative research paradigm (cf. the inductive and deductive paradigms) as more suitable for text mining research, and introduce the analytical procedure for sentiment analysis in three stages – discovery, measurement and inference. More importantly, the authors highlight both the dictionary-based and machine learning (ML) approaches in the measurement stage, with special coverage of deep learning and word embedding techniques as the latest breakthroughs in sentiment and text analysis. Lastly, the authors provide two illustrative examples to demonstrate the application of sentiment analysis in organizational research. It is the authors’ hope that this chapter, by providing these practical guidelines, will help facilitate more applications of this method in organizational research.
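A dictionary-based scorer of the kind contrasted with ML approaches above can be written in a few lines. The word lists here are illustrative, not a published sentiment dictionary.

```python
# Tiny illustrative sentiment word lists (not a real lexicon).
POSITIVE = {"good", "great", "excellent", "support", "improve"}
NEGATIVE = {"bad", "poor", "failure", "stress", "decline"}

def sentiment(text):
    """Net tone from dictionary matches: positive, negative or neutral."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("great support and excellent results"))  # positive
print(sentiment("poor morale and decline in output"))    # negative
```

Dictionary methods are transparent but brittle (negation, sarcasm, domain vocabulary), which is exactly the gap the ML, deep learning and word-embedding approaches discussed in the chapter address.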

Details

Stress and Well-being at the Strategic Level
Type: Book
ISBN: 978-1-83797-359-0


Article
Publication date: 29 August 2023

Hei-Chia Wang, Martinus Maslim and Hung-Yu Liu


Abstract

Purpose

A clickbait is a deceptive headline designed to boost ad revenue without presenting closely relevant content. Clickbait has numerous negative repercussions, such as making viewers feel tricked and unhappy, causing long-term confusion and even attracting cybercriminals. Automatic clickbait detection algorithms have been developed to address this issue, but existing technologies are limited by their use of a single semantic representation for each term and by the scarcity of Chinese datasets. This study aims to overcome these limitations of automated clickbait detection on Chinese data.

Design/methodology/approach

This study combines news headlines and news content to train the model to capture the probable relationship between clickbait headlines and the corresponding articles. In addition, part-of-speech elements are used to generate the most appropriate semantic representation for clickbait detection, improving detection performance.

Findings

This research successfully compiled a dataset containing up to 20,896 Chinese clickbait news articles. This collection contains news headlines, articles, categories and supplementary metadata. The suggested context-aware clickbait detection (CA-CD) model outperforms existing clickbait detection approaches on many criteria, demonstrating the proposed strategy's efficacy.

Originality/value

The originality of this study resides in the newly compiled Chinese clickbait dataset and contextual semantic representation-based clickbait detection approach employing transfer learning. This method can modify the semantic representation of each word based on context and assist the model in more precisely interpreting the original meaning of news articles.

Details

Data Technologies and Applications, vol. 58 no. 2
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 7 May 2024

Xinzhe Li, Qinglong Li, Dasom Jeong and Jaekyeong Kim


Abstract

Purpose

Most previous studies predicting review helpfulness ignored the significance of deep features embedded in review text and relied instead on hand-crafted features. Hand-crafted and deep features offer the complementary advantages of high interpretability and high predictive accuracy, respectively. This study aims to propose a novel review helpfulness prediction model that uses deep learning (DL) techniques to exploit this complementarity between hand-crafted and deep features.

Design/methodology/approach

First, an advanced convolutional neural network was applied to extract deep features from unstructured review text. Second, hand-crafted features shown in previous studies to affect review helpfulness were extracted to enhance interpretability. Third, the deep and hand-crafted features were incorporated into a review helpfulness prediction model, whose performance was evaluated on the Yelp.com data set using 2,417,796 restaurant reviews.
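The feature-fusion step can be pictured as scaling and concatenating the two feature views before the final helpfulness predictor. All values below are toy: the deep vector stands in for a CNN output, and the hand-crafted names (length, stars, readability) are hypothetical examples of such features.

```python
import numpy as np

# Toy stand-ins for the two feature views of one review.
deep_feat = np.array([0.12, -0.40, 0.88, 0.05])  # CNN output (invented)
hand_feat = np.array([235.0, 4.0, 0.31])         # length, stars, readability (invented)

# Standardize the hand-crafted view so neither view dominates by scale,
# then fuse the two views by concatenation.
hand_scaled = (hand_feat - hand_feat.mean()) / hand_feat.std()
fused = np.concatenate([deep_feat, hand_scaled])
print(fused.shape)  # (7,)
```

The paper's actual model uses an advanced feature-fusion method rather than plain concatenation, but the principle of combining both views into one predictor input is the same.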

Findings

Extensive experiments confirmed that the proposed methodology performs better than traditional machine learning methods. Moreover, this study confirms through an empirical analysis that combining hand-crafted and deep features demonstrates better prediction performance.

Originality/value

To the best of the authors’ knowledge, this is one of the first studies to apply DL techniques and use structured and unstructured data to predict review helpfulness in the restaurant context. In addition, an advanced feature-fusion method was adopted to better use the extracted feature information and identify the complementarity between features.

