CA-CD: context-aware clickbait detection using new Chinese clickbait dataset with transfer learning method

Hei-Chia Wang (Institute of Information Management, National Cheng Kung University, Tainan, Taiwan) (Center for Innovative FinTech Business Models, National Cheng Kung University, Tainan, Taiwan)
Martinus Maslim (Institute of Information Management, National Cheng Kung University, Tainan, Taiwan) (Universitas Atma Jaya Yogyakarta, Yogyakarta, Indonesia)
Hung-Yu Liu (Institute of Information Management, National Cheng Kung University, Tainan, Taiwan)

Data Technologies and Applications

ISSN: 2514-9288

Article publication date: 29 August 2023

Issue publication date: 15 April 2024

Abstract

Purpose

Clickbait is a deceptive headline designed to boost ad revenue without presenting closely relevant content. It has numerous negative repercussions: it leaves viewers feeling tricked and unhappy, causes long-term confusion and can even attract cyber criminals. Automatic detection algorithms for clickbait have been developed to address this issue. However, existing clickbait detection technologies have two limitations: they assign only one semantic representation to the same term, and the available Chinese datasets are limited. This study aims to address these limitations of automated clickbait detection for Chinese.

Design/methodology/approach

This study combines news headlines and news content to train the model so that it captures the probable relationship between the two. In addition, part-of-speech elements are used to generate the most appropriate semantic representation for clickbait detection, which improves detection performance.

Findings

This research successfully compiled a dataset containing up to 20,896 Chinese clickbait news articles. This collection contains news headlines, articles, categories and supplementary metadata. The suggested context-aware clickbait detection (CA-CD) model outperforms existing clickbait detection approaches on many criteria, demonstrating the proposed strategy's efficacy.

Originality/value

The originality of this study resides in the newly compiled Chinese clickbait dataset and in a clickbait detection approach based on contextual semantic representation and transfer learning. This method can modify the semantic representation of each word based on context and help the model interpret the original meaning of news articles more precisely.

Keywords

Acknowledgements

Funding: This research is based on work supported by the Taiwan Ministry of Science and Technology under Grant Nos. MOST 107-2410-H-006 040-MY3 and MOST 108-2511-H-006-009. It was also partially supported by research grants from the “Higher Education SPROUT Project” and the “Center for Innovative FinTech Business Models” of National Cheng Kung University (NCKU), sponsored by the Ministry of Education, Taiwan.

Citation

Wang, H.-C., Maslim, M. and Liu, H.-Y. (2024), "CA-CD: context-aware clickbait detection using new Chinese clickbait dataset with transfer learning method", Data Technologies and Applications, Vol. 58 No. 2, pp. 243-266. https://doi.org/10.1108/DTA-03-2023-0072

Publisher: Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
