Phishing web site detection using diverse machine learning algorithms

Ammara Zamir (Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan and Department of Computer Science, COMSATS University Islamabad – Wah Campus, Islamabad, Pakistan)
Hikmat Ullah Khan (Department of Computer Science, COMSATS University Islamabad, Wah Campus, Islamabad, Pakistan)
Tassawar Iqbal (Department of Computer Science, COMSATS University Islamabad, Wah Campus, Islamabad, Pakistan)
Nazish Yousaf (Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan and Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering, Islamabad, Pakistan)
Farah Aslam (Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan)
Almas Anjum (Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering, Islamabad, Pakistan)
Maryam Hamdani (Department of Computer Science, University of Wah, Quaid Avenue, Wah Cantt, Pakistan)

The Electronic Library

ISSN: 0264-0473

Article publication date: 10 January 2020

Issue publication date: 19 March 2020

Abstract

Purpose

This paper aims to present a framework for detecting phishing websites using a stacking model. Phishing is a type of fraud aimed at obtaining users' credentials: attackers gain access to users' personal and sensitive information for monetary gain. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and building websites that closely resemble the original ones. When people visit such a website, the phishers capture their personal information.

Design/methodology/approach

Features of a phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two new features are proposed by combining the strongest and weakest attributes. Principal component analysis is then applied to the proposed and remaining features together with diverse machine learning algorithms: random forest (RF), neural network (NN), bagging, support vector machine, Naïve Bayes and k-nearest neighbour (kNN). Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are built by combining the highest scoring classifiers to improve classification accuracy.
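
As an illustration of the first two feature selection scores named above, the following is a minimal sketch that assumes the standard entropy-based definitions of information gain and gain ratio for discrete features. The abstract does not state the exact formulation or tooling the authors used, and the function names and toy data here are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the label entropy inside each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # A constant feature cannot split the data.
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: 1 = phishing, 0 = legitimate.
labels = [1, 1, 0, 0]
feature_a = [1, 1, 0, 0]  # perfectly aligned with the label
feature_b = [1, 0, 1, 0]  # carries no information about the label

ig_a = information_gain(feature_a, labels)
ig_b = information_gain(feature_b, labels)
gr_a = gain_ratio(feature_a, labels)
```

Under these definitions a feature that separates the classes perfectly scores highest and an uninformative one scores zero, which is the kind of ranking from which strongest and weakest attributes can be identified.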

Findings

The proposed features improved the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in classification accuracy, detecting phishing websites with 97.4% accuracy.

Originality/value

This research is novel in that no previous research has focussed on using feed-forward NN and ensemble learners for detecting phishing websites.

Citation

Zamir, A., Khan, H.U., Iqbal, T., Yousaf, N., Aslam, F., Anjum, A. and Hamdani, M. (2020), "Phishing web site detection using diverse machine learning algorithms", The Electronic Library, Vol. 38 No. 1, pp. 65-80. https://doi.org/10.1108/EL-05-2019-0118

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
