Web spam detection using trust and distrust-based ant colony optimization learning

Bundit Manaskasemsak (Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand)
Arnon Rungsawang (Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 15 June 2015

Abstract

Purpose

This paper presents a machine learning approach to the problem of Web spam detection. Adapting ant colony optimization (ACO), it proposes three algorithms that construct rule-based classifiers to distinguish non-spam hosts from spam hosts. It also proposes an adaptive learning technique to improve spam detection performance.

Design/methodology/approach

The Trust-ACO algorithm lets an ant start from a non-spam seed and then decide which paths to walk through the host graph. The trails (i.e. trust paths) discovered by ants are then interpreted and compiled into non-spam classification rules. Similarly, the Distrust-ACO algorithm generates spam classification rules. The third algorithm, Combine-ACO, accumulates the rules produced by the first two. In addition, an adaptive learning technique lets ants walk with longer (or shorter) steps by rewarding them when they find desirable paths and penalizing them otherwise.
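As a rough illustration of the walk described above, the following Python sketch lets a single ant start from a non-spam seed and follow pheromone-weighted links, then lengthens or shortens its step limit depending on whether the path was desirable. It is not the authors' implementation: the function names, the pheromone dictionary, the "all visited hosts are non-spam" test for a desirable path, and the one-step reward/penalty are all assumptions made for this example.

```python
import random


def ant_walk(graph, seed_host, pheromone, max_steps, rng):
    """Walk from seed_host for at most max_steps hops.

    Each next host is drawn with probability proportional to the pheromone
    on the outgoing edge (assumed default of 1.0 when no pheromone is stored).
    """
    path = [seed_host]
    current = seed_host
    for _ in range(max_steps):
        # Only consider hosts not yet on the trail, so the path has no cycles.
        candidates = [h for h in graph.get(current, []) if h not in path]
        if not candidates:
            break
        weights = [pheromone.get((current, h), 1.0) for h in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path


def adapt_step_limit(max_steps, path, labels, lo=1, hi=10):
    """Hypothetical adaptive rule: reward a desirable path with one more step,
    penalize an undesirable one with one fewer, clamped to [lo, hi]."""
    desirable = all(labels.get(h) == "nonspam" for h in path)
    if desirable:
        return min(hi, max_steps + 1)
    return max(lo, max_steps - 1)


# Toy host graph (adjacency lists) and labels, invented for the example.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
labels = {"a": "nonspam", "b": "nonspam", "c": "spam"}

rng = random.Random(0)
path = ant_walk(graph, "a", {}, max_steps=3, rng=rng)
next_limit = adapt_step_limit(3, path, labels)
```

A Distrust-ACO counterpart would, under the same assumptions, start from a spam seed and treat paths through spam hosts as the desirable ones.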

Findings

Experiments are conducted on two publicly available datasets, WEBSPAM-UK2006 and WEBSPAM-UK2007. The results show that the proposed algorithms outperform well-known rule-based classification baselines. In particular, the proposed adaptive learning technique helps improve the AUC scores to as much as 0.899 and 0.784 on the former and the latter datasets, respectively.
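For readers unfamiliar with the metric, AUC (area under the ROC curve) can be read as the probability that a randomly chosen spam host receives a higher score than a randomly chosen non-spam host, with ties counted as half. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are invented for illustration and are not taken from the paper's experiments.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison: label 1 = spam (positive), 0 = non-spam."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: three of the four spam/non-spam pairs are ordered correctly.
example_auc = auc_score([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```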

Originality/value

To the best of our knowledge, this is the first comprehensive study to adopt the ACO learning approach for the problem of Web spam detection. In addition, we improve on traditional ACO by introducing the adaptive learning technique.

Acknowledgements

The initial idea of this paper was previously explored and published at the ICCSA 2014 conference; the authors thank Mr Apichat Taweesiriwate and Mr Jirayus Jiarpakdee, our former students, for their contribution to the first implementation of the algorithms.

Citation

Manaskasemsak, B. and Rungsawang, A. (2015), "Web spam detection using trust and distrust-based ant colony optimization learning", International Journal of Web Information Systems, Vol. 11 No. 2, pp. 142-161. https://doi.org/10.1108/IJWIS-12-2014-0047

Publisher

Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited
