X-News dataset for online news categorization

Samia Nawaz Yousafzai (Department of Computer Science, HITEC University, Taxila Cantt, Pakistan)
Hooria Shahbaz (Department of Computer Science, HITEC University, Taxila Cantt, Pakistan)
Armughan Ali (Department of Computer Science, HITEC University, Taxila Cantt, Pakistan)
Amreen Qamar (Department of Statistics, Quaid-I-Azam University, Islamabad, Pakistan)
Inzamam Mashood Nasir (Centre of Real Time Computer Systems, Kaunas University of Technology, Kaunas, Lithuania)
Sara Tehsin (Centre of Real Time Computer Systems, Kaunas University of Technology, Kaunas, Lithuania)
Robertas Damaševičius (Vytautas Magnus University, Kaunas, Lithuania)

International Journal of Intelligent Computing and Cybernetics

ISSN: 1756-378X

Article publication date: 13 August 2024

Issue publication date: 11 November 2024

Abstract

Purpose

This study aims to develop a more effective model that simplifies and accelerates news classification using advanced text mining and deep learning (DL) techniques. To that end, a distributed framework based on Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. The approach applies several text mining and DL techniques on a distributed infrastructure and is intended as an alternative to traditional news classification methods.

Design/methodology/approach

This study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications.

Findings

The framework was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, with the performance of each text mining and classification method assessed on all three. The results show that the proposed strategy significantly outperforms other approaches in both accuracy and execution time, indicating that the distributed framework, combined with BERT for text analysis, is a robust and efficient solution for analyzing large volumes of data. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification.

Originality/value

This research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.

Citation

Yousafzai, S.N., Shahbaz, H., Ali, A., Qamar, A., Nasir, I.M., Tehsin, S. and Damaševičius, R. (2024), "X-News dataset for online news categorization", International Journal of Intelligent Computing and Cybernetics, Vol. 17 No. 4, pp. 737-758. https://doi.org/10.1108/IJICC-04-2024-0184

Publisher: Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited
