Search results

1 – 10 of 716
Article
Publication date: 30 January 2023

Zhongbao Liu and Wenjuan Zhao

Abstract

Purpose

In recent years, Chinese sentiment analysis has made great progress, but the characteristics of the language itself and downstream task requirements were not explored thoroughly. It is not practical to directly migrate achievements obtained in English sentiment analysis to the analysis of Chinese because of the huge difference between the two languages.

Design/methodology/approach

In view of the particularity of Chinese text and the requirement of sentiment analysis, a Chinese sentiment analysis model integrating multi-granularity semantic features is proposed in this paper. This model introduces the radical and part-of-speech features based on the character and word features, with the application of bidirectional long short-term memory, attention mechanism and recurrent convolutional neural network.
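
The idea of stacking granularities can be illustrated with a small sketch. The radical table, segmentation and part-of-speech tag below are toy stand-ins for the paper's actual resources, and the model itself (BiLSTM, attention, RCNN) is not reproduced here:

```python
# Illustrative multi-granularity feature extraction for a Chinese string.
# TOY_RADICALS and TOY_SEGMENTS are invented stand-ins, not real resources.

TOY_RADICALS = {"好": "女", "极": "木", "了": "乛"}   # character -> radical
TOY_SEGMENTS = [("好极了", "ADJ")]                    # (word, part of speech)

def multi_granularity_features(text):
    """Return one (char, radical, word, pos) feature tuple per character."""
    features = []
    for word, pos in TOY_SEGMENTS:
        if word in text:
            for ch in word:
                features.append((ch, TOY_RADICALS.get(ch, ch), word, pos))
    return features

feats = multi_granularity_features("好极了")
```

Each character thus carries four parallel views (character, radical, word, part of speech) that a downstream encoder could embed and concatenate.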

Findings

The comparative experiments showed that the F1 values of this model reach 88.28 and 84.80 per cent on the man-made dataset and the NLPECC dataset, respectively. Meanwhile, an ablation experiment was conducted to verify the effectiveness of the attention mechanism and of the part-of-speech, radical, character and word factors in Chinese sentiment analysis. The performance of the proposed model exceeds that of existing models to some extent.

Originality/value

The academic contribution of this paper is as follows: first, in view of the particularity of Chinese texts and the requirement of sentiment analysis, this paper focuses on solving the deficiency problem of Chinese sentiment analysis under the big data context. Second, this paper borrows ideas from multiple interdisciplinary frontier theories and methods, such as information science, linguistics and artificial intelligence, which makes it innovative and comprehensive. Finally, this paper deeply integrates multi-granularity semantic features such as character, word, radical and part of speech, which further complements the theoretical framework and method system of Chinese sentiment analysis.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 29 August 2023

Hei-Chia Wang, Martinus Maslim and Hung-Yu Liu

Abstract

Purpose

Clickbait is a deceptive headline designed to boost ad revenue without presenting closely relevant content. Clickbait has numerous negative repercussions, such as causing viewers to feel tricked and unhappy, causing long-term confusion and even attracting cyber criminals. Automatic detection algorithms for clickbait have been developed to address this issue. However, existing detection technologies are limited by using only one semantic representation for each term and by the scarcity of Chinese datasets. This study aims to overcome these limitations of automated clickbait detection on Chinese data.

Design/methodology/approach

This study combines news headlines and news content to train a model that captures the probable relationship between clickbait headlines and the underlying articles. In addition, part-of-speech elements are used to generate the most appropriate semantic representation for clickbait detection, improving detection performance.
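
As a rough illustration of the headline-content relationship, a naive baseline flags headlines whose tokens barely overlap with the article body. This is an assumed toy heuristic, not the CA-CD model or its transfer-learning representation:

```python
# Naive headline-content consistency check: low lexical overlap between a
# headline and its article body is one weak clickbait signal.

def token_overlap(headline, body):
    """Jaccard overlap between headline and body token sets."""
    h, b = set(headline.lower().split()), set(body.lower().split())
    return len(h & b) / len(h | b) if h | b else 0.0

def looks_like_clickbait(headline, body, threshold=0.1):
    return token_overlap(headline, body) < threshold
```

A contextual model replaces the crude set overlap with learned semantic representations, which is where the paper's contribution lies.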

Findings

This research successfully compiled a dataset containing up to 20,896 Chinese clickbait news articles. This collection contains news headlines, articles, categories and supplementary metadata. The suggested context-aware clickbait detection (CA-CD) model outperforms existing clickbait detection approaches on many criteria, demonstrating the proposed strategy's efficacy.

Originality/value

The originality of this study resides in the newly compiled Chinese clickbait dataset and contextual semantic representation-based clickbait detection approach employing transfer learning. This method can modify the semantic representation of each word based on context and assist the model in more precisely interpreting the original meaning of news articles.

Details

Data Technologies and Applications, vol. 58 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 10 July 2017

Shubhadeep Mukherjee and Pradip Kumar Bala

Abstract

Purpose

The purpose of this paper is to study sarcasm in online text – specifically on Twitter – to better understand customer opinions about social issues, products, services, etc. This can be immensely helpful in reducing incorrect classification of consumer sentiment toward issues, products and services.

Design/methodology/approach

In this study, 5,000 tweets were downloaded and analyzed. Relevant features were extracted and supervised learning algorithms were applied to identify the best differentiating features between a sarcastic and non-sarcastic sentence.
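
The feature-extraction step can be sketched as follows; the function-word list is a tiny illustrative subset, and the classifiers themselves (Naïve Bayes, maximum entropy) are omitted:

```python
# Separate function-word and content-word counts for a tweet, plus a
# punctuation signal. FUNCTION_WORDS is a small illustrative subset.

FUNCTION_WORDS = {"a", "an", "the", "of", "so", "to", "is", "i", "just", "what"}

def sarcasm_features(tweet):
    """Return simple count features a classifier could consume."""
    tokens = tweet.lower().replace("!", " !").split()
    function = sum(t in FUNCTION_WORDS for t in tokens)
    content = sum(t.isalpha() and t not in FUNCTION_WORDS for t in tokens)
    exclaims = tokens.count("!")
    return {"function": function, "content": content, "exclaim": exclaims}
```

Feature dictionaries of this shape can be fed to any supervised learner to test which feature groups best separate sarcastic from non-sarcastic tweets.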

Findings

The results using two different classification algorithms, namely, Naïve Bayes and maximum entropy, show that function words and content words together are most effective in identifying sarcasm in tweets. The most differentiating features between a sarcastic and a non-sarcastic tweet were identified.

Practical implications

Understanding the use of sarcasm in tweets lets companies perform better sentiment analysis and make better product recommendations for users. This could help businesses attract new customers and retain existing ones, resulting in better customer management.

Originality/value

This paper uses novel features to identify sarcasm in online text which is one of the most challenging problems in natural language processing. To the authors’ knowledge, this is the first study on sarcasm detection from a customer management perspective.

Details

Industrial Management & Data Systems, vol. 117 no. 6
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 12 April 2022

Mengjuan Zha, Changping Hu and Yu Shi

Abstract

Purpose

Sentiment lexicon is an essential resource for sentiment analysis of user reviews. By far, there is still a lack of domain sentiment lexicon with large scale and high accuracy for Chinese book reviews. This paper aims to construct a large-scale sentiment lexicon based on the ultrashort reviews of Chinese books.

Design/methodology/approach

First, large-scale ultrashort reviews of Chinese books, whose length is no more than six Chinese characters, are collected and preprocessed as candidate sentiment words. Second, non-sentiment words are filtered out through certain rules, such as part of speech rules, context rules, feature word rules and user behaviour rules. Third, the relative frequency is used to select and judge the polarity of sentiment words. Finally, the performance of the sentiment lexicon is evaluated through experiments.
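
The relative-frequency step might look like the following sketch; the counts are invented toy numbers, not statistics from the Douban corpus:

```python
# Judge a candidate word's polarity by its relative frequency in positive
# versus negative reviews. The count dictionaries are toy examples.

def polarity(word, pos_counts, neg_counts):
    """Return 'positive', 'negative' or 'neutral' for a candidate word."""
    pos_total = sum(pos_counts.values()) or 1
    neg_total = sum(neg_counts.values()) or 1
    rf_pos = pos_counts.get(word, 0) / pos_total
    rf_neg = neg_counts.get(word, 0) / neg_total
    if rf_pos > rf_neg:
        return "positive"
    if rf_neg > rf_pos:
        return "negative"
    return "neutral"

pos = {"精彩": 30, "一般": 5}    # word counts in positive reviews (toy)
neg = {"失望": 25, "一般": 20}   # word counts in negative reviews (toy)
```

Normalising by the total count in each class keeps the comparison fair when the positive and negative review pools differ in size.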

Findings

This paper proposes a method of sentiment lexicon construction based on ultrashort reviews and successfully builds a lexicon of nearly 40,000 words for Chinese books based on Douban book reviews.

Originality/value

Compared with the idea of constructing a sentiment lexicon based on a small number of reviews, the proposed method can give full play to the advantages of data scale to build a corpus. Moreover, different from the computer segmentation method, this method helps to avoid the problems caused by immature segmentation technology and an imperfect N-gram language model.

Details

The Electronic Library, vol. 40 no. 3
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 31 May 2022

Osamah M. Al-Qershi, Junbum Kwon, Shuning Zhao and Zhaokun Li

Abstract

Purpose

Given the large number of content features, this paper aims to investigate which content features in video and text ads contribute more to accurately predicting the success of crowdfunding campaigns, by comparing prediction models.

Design/methodology/approach

With 1,368 features extracted from 15,195 Kickstarter campaigns in the USA, the authors compare base models such as logistic regression (LR) with tree-based homogeneous ensembles such as eXtreme gradient boosting (XGBoost) and heterogeneous ensembles such as XGBoost + LR.

Findings

XGBoost shows higher prediction accuracy than LR (82% vs 69%), in contrast to the findings of a previous relevant study. Regarding important content features, humans (e.g. founders) are more important than visual objects (e.g. products). In both spoken and written language, words related to experience (e.g. eat) or perception (e.g. hear) are more important than cognitive words (e.g. causation). In addition, a focus on the future is more important than a present or past time orientation. Speech aids (e.g. "see" and "compare") that complement visual content are also effective, and positive tone matters in speech.

Research limitations/implications

This research makes theoretical contributions by identifying the more important visual features (humans) and language features (experience, perception and future time orientation). Also, in a multimodal context, complementary cues (e.g. speech aids) across different modalities help. Furthermore, noncontent aspects of speech, such as a positive tone or pace of speech, are important.

Practical implications

Founders are encouraged to assess and revise the content of their video or text ads as well as their basic campaign features (e.g. goal, duration and reward) before they launch their campaigns. Next, overly complex ensembles may suffer from overfitting problems. In practice, model validation using unseen data is recommended.

Originality/value

Rather than reducing the number of content feature dimensions (Kaminski and Hopp, 2020), enabling advanced prediction models to accommodate many content features raises prediction accuracy substantially.

Article
Publication date: 14 May 2024

Xuemei Tang, Jun Wang and Qi Su

Abstract

Purpose

Recent trends have shown the integration of Chinese word segmentation (CWS) and part-of-speech (POS) tagging to enhance syntactic and semantic parsing. However, the potential utility of hierarchical and structural information in these tasks remains underexplored. This study aims to leverage multiple external knowledge sources (e.g. syntactic and semantic features, lexicons) through various modules for the joint task.

Design/methodology/approach

We introduce a novel learning framework for the joint CWS and POS tagging task, utilizing graph convolutional networks (GCNs) to encode syntactic structure and semantic features. The framework also incorporates a pre-defined lexicon through a lexicon attention module. We evaluate our model on a range of public corpora, including CTB5, PKU and UD, the novel ZX dataset and the comprehensive CTB9 dataset.
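
The benefit of a pre-defined lexicon for segmentation can be shown, much more crudely than with lexicon attention, by forward maximum matching: in-lexicon words are kept whole and anything else falls back to single characters. The lexicon here is a toy example:

```python
# Forward maximum matching against a pre-defined lexicon, a crude
# illustration of why lexicon information helps with segmentation and OOV.

LEXICON = {"北京", "大学", "北京大学"}   # toy lexicon

def forward_max_match(text, lexicon, max_len=4):
    """Greedily segment text, preferring the longest lexicon match."""
    result, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + length]
            if length == 1 or cand in lexicon:
                result.append(cand)
                i += length
                break
    return result
```

The paper's lexicon attention module learns soft weights over such candidate matches instead of committing greedily, but the information source is the same.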

Findings

Experimental results on these benchmark corpora demonstrate the effectiveness of our model in improving the performance of the joint task. Notably, we find that syntax information significantly enhances performance, while lexicon information helps mitigate the issue of out-of-vocabulary (OOV) words.

Originality/value

This study introduces a comprehensive approach to the joint CWS and POS tagging task by combining multiple features. Moreover, the proposed framework offers potential adaptability to other sequence labeling tasks, such as named entity recognition (NER).

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 9 May 2016

Valeria Noguti

Abstract

Purpose

This study aims to uncover relationships between the language of content community posts, such as parts of speech, and user engagement.

Design/methodology/approach

Analyses of almost 12,000 posts from the content community Reddit are undertaken. First, posts’ titles are subjected to electronic classification and subsequent counting of main parts of speech and other language elements. Then, statistical models are built to examine the relationships between these elements and user engagement, controlling for variables identified in previous research.
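
Counting parts of speech requires a tagger, but several of the other language elements (question marks, exclamation marks, advisory words) can be tallied with the standard library alone, as in this illustrative sketch:

```python
# Count simple language elements in a post title. The advisory-word list
# follows the ones named in the findings (should, shall, must, have to).
import re

ADVISORY = re.compile(r"\b(should|shall|must|have to)\b", re.IGNORECASE)

def title_elements(title):
    """Return counts of a few language elements in a title."""
    return {
        "question_marks": title.count("?"),
        "exclamations": title.count("!"),
        "advisory": len(ADVISORY.findall(title)),
        "words": len(re.findall(r"[A-Za-z']+", title)),
    }
```

Counts like these become the explanatory variables in the statistical models of engagement, alongside control variables.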

Findings

The number of adjectives and nouns, adverbs, pronouns, punctuation (exclamation marks, quotation marks and ellipses), question marks, advisory words (should, shall, must and have to) and complexity indicators that appear in content community posts’ titles relate to post popularity (scores: number of favourable minus unfavourable votes) and number of comments. However, these relationships vary according to the category, for example, text-based categories (e.g. Politics and World News) vs image-based ones (e.g. Pictures).

Research limitations/implications

While the relationships uncovered are appealing, this research is correlational, so causality cannot be implied.

Practical implications

Among other implications, companies may tailor their own content community post titles to match the types of language related to higher user engagement in a particular category. Companies may also provide advice to brand ambassadors on how to make better use of language to increase user engagement.

Originality/value

This paper shows that language features add explained variance to models of online engagement variables, providing significant contribution to both language and social media researchers and practitioners.

Details

European Journal of Marketing, vol. 50 no. 5/6
Type: Research Article
ISSN: 0309-0566

Article
Publication date: 1 January 1953

B.C. VICKERY

Abstract

The function of a subject index is above all practical: it is a working tool designed to help the user to find his way about the documented information in a given subject field. Any system of indexing and classification must be judged by its practical value to the user, not by its conformity to a set of abstract principles. Despite this, it is only on the basis of inductively derived principles that a system can be constructed at all.

Details

Journal of Documentation, vol. 9 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 March 1998

Robert Gaizauskas and Yorick Wilks

Abstract

In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
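
Template filling for a single pre-specified relation can be illustrated with a regular expression; real IE systems are far more robust, and the relation and pattern below are invented for the example:

```python
# Extract a pre-specified relation ("X acquired Y for Z") from free text
# into a structured record (template). Pattern and example are illustrative.
import re

PATTERN = re.compile(
    r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+) for (?P<price>\$[\d.]+ \w+)"
)

def fill_template(sentence):
    """Return a filled template dict, or None if the relation is absent."""
    m = PATTERN.search(sentence)
    return m.groupdict() if m else None

record = fill_template("Acme acquired Widgetco for $2.5 billion last spring.")
```

The template's named slots (buyer, target, price) play the role of the structured representations the abstract describes.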

Details

Journal of Documentation, vol. 54 no. 1
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 14 August 2020

Paramita Ray and Amlan Chakrabarti

Abstract

Social networks have changed communication patterns significantly. Information available from different social networking sites can be well utilized for the analysis of users' opinions. Hence, organizations would benefit from the development of a platform that can analyze public sentiment in social media about their products and services to provide value addition in their business processes. Over the last few years, deep learning has become very popular in areas such as image classification and speech recognition. However, research on the use of deep learning methods in sentiment analysis is limited. It has been observed that in some cases the existing machine learning methods for sentiment analysis fail to extract some implicit aspects and might not be very useful. Therefore, we propose a deep learning approach for aspect extraction from text and analysis of users' sentiment corresponding to the aspect. A seven-layer deep convolutional neural network (CNN) is used to tag each aspect in the opinionated sentences. We have combined the deep learning approach with a set of rule-based approaches to improve the performance of the aspect extraction method as well as the sentiment scoring method. We have also tried to improve the existing rule-based approach to aspect extraction by aspect categorization with a predefined set of aspect categories using a clustering method, and compared our proposed method with some of the state-of-the-art methods. It has been observed that the overall accuracy of our proposed method is 0.87, while that of other state-of-the-art methods such as the modified rule-based method and CNN are 0.75 and 0.80, respectively. The overall accuracy of our proposed method shows an increment of 7–12% over the state-of-the-art methods.
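
The rule-based side of such a pipeline can be sketched as: treat a noun adjacent to an opinion word as a candidate aspect. The POS tags and opinion lexicon below are toy inputs, not the paper's seven-layer CNN or its actual rules:

```python
# Rule-based aspect extraction sketch: a noun next to an opinion word is a
# candidate aspect. OPINION_WORDS and the tagged sentence are toy inputs.

OPINION_WORDS = {"great", "terrible", "slow"}

def extract_aspects(tagged_tokens):
    """tagged_tokens: list of (word, pos) pairs; return candidate aspects."""
    aspects = []
    for i, (word, pos) in enumerate(tagged_tokens):
        if pos == "NOUN":
            window = tagged_tokens[max(i - 1, 0):i + 2]
            if any(w in OPINION_WORDS for w, _ in window):
                aspects.append(word)
    return aspects

sent = [("great", "ADJ"), ("screen", "NOUN"), ("but", "CONJ"),
        ("battery", "NOUN"), ("terrible", "ADJ")]
```

In the paper, the CNN tagger replaces this adjacency heuristic for implicit aspects, and the rules supply coverage where the learned model is weak.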

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964
