Search results

1 – 10 of 811
Article
Publication date: 4 December 2018

Zhongyi Hu, Raymond Chiong, Ilung Pranata, Yukun Bao and Yuqing Lin

Abstract

Purpose

Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones).

Design/methodology/approach

The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. SMOTE is used for oversampling and PSO for undersampling.
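
Since the abstract gives only the outline of the approach, the following is a minimal sketch of how SMOTE oversampling might be combined with a binary PSO undersampler, assuming scikit-learn and imbalanced-learn. The swarm settings, fitness function and SMOTE sampling ratio are illustrative assumptions, not the authors' published configuration.

```python
# Sketch: SMOTE oversampling of the minority class, then a simplified binary PSO
# that selects which majority-class samples to keep. Not the authors' exact algorithm.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def pso_undersample(X_maj, X_min, y_maj, y_min, n_particles=8, n_iter=10, seed=0):
    """Pick a subset of majority-class samples with a binary PSO.

    Fitness is the cross-validated F1 of a simple classifier trained on the
    selected majority samples plus all minority samples.
    """
    rng = np.random.default_rng(seed)
    n = len(X_maj)
    pos = rng.random((n_particles, n))           # particle positions in [0, 1]
    vel = rng.normal(0.0, 0.1, (n_particles, n))
    pbest, pbest_fit = pos.copy(), np.full(n_particles, -np.inf)
    gbest, gbest_fit = pos[0].copy(), -np.inf

    def fitness(mask):
        if not mask.any():
            return -np.inf
        X = np.vstack([X_maj[mask], X_min])
        y = np.concatenate([y_maj[mask], y_min])
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, y, cv=3, scoring='f1').mean()

    for _ in range(n_iter):
        for i in range(n_particles):
            fit = fitness(pos[i] > 0.5)
            if fit > pbest_fit[i]:
                pbest_fit[i], pbest[i] = fit, pos[i].copy()
            if fit > gbest_fit:
                gbest_fit, gbest = fit, pos[i].copy()
        r1, r2 = rng.random((2, n_particles, n))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
    return gbest > 0.5

# Simulated imbalanced data (1 = malicious, the minority class).
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_os, y_os = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)
maj = y_os == 0
keep = pso_undersample(X_os[maj], X_os[~maj], y_os[maj], y_os[~maj])
X_bal = np.vstack([X_os[maj][keep], X_os[~maj]])
y_bal = np.concatenate([y_os[maj][keep], y_os[~maj]])
print(f"majority kept: {keep.sum()}/{maj.sum()}, minority: {(~maj).sum()}")
```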

Findings

By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective.

Practical implications

This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification.

Originality/value

Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.

Article
Publication date: 18 October 2018

Kalyan Nagaraj, Biplab Bhattacharjee, Amulyashree Sridhar and Sharvani GS

Abstract

Purpose

Phishing is one of the major threats affecting businesses worldwide in current times. Organizations and customers face the hazards arising out of phishing attacks because of anonymous access to vulnerable details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites.

Design/methodology/approach

A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed by deploying a group of methods, and the relevant features extracted were used to build the model. A twofold ensemble learner was developed by feeding the results of a random forest (RF) classifier into a feedforward neural network (NN). Performance of the ensemble classifier was validated using k-fold cross-validation. The twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate.
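
As a rough illustration of the twofold design, the sketch below stacks a random forest's class probabilities into a small feedforward network using scikit-learn's StackingClassifier, validated with k-fold cross-validation. The simulated data and hyperparameters are assumptions, not the paper's 10,068-instance corpus or tuned settings.

```python
# Sketch: RF class probabilities fed into a small NN (a stand-in for the RF_NN model).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=30, random_state=42)

rf_nn = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=200, random_state=42))],
    final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                  random_state=42),
    stack_method='predict_proba',  # the NN sees the RF's class probabilities
    cv=5,                          # out-of-fold probabilities limit leakage
)

scores = cross_val_score(rf_nn, X, y, cv=5, scoring='accuracy')
print(f"RF_NN cross-validated accuracy: {scores.mean():.4f}")
```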

Findings

Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. The statistical tests estimated that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026.

Research limitations/implications

The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other real-time data sets of recent origin must be performed to ensure the generalization of the model against various security breaches. Different variants of phishing threats must also be detected, rather than focusing solely on phishing website detection.

Originality/value

To the best of the authors' knowledge, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.

Details

Journal of Systems and Information Technology, vol. 20 no. 3
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 27 November 2020

Chaoqun Wang, Zhongyi Hu, Raymond Chiong, Yukun Bao and Jiang Wu

Abstract

Purpose

The aim of this study is to propose an efficient rule extraction and integration approach for identifying phishing websites. The proposed approach can elucidate patterns of phishing websites and identify them accurately.

Design/methodology/approach

Hyperlink indicators along with URL-based features are used to build the identification model. In the proposed approach, very simple rules are first extracted based on individual features to provide meaningful and easy-to-understand rules. Then, the F-measure score is used to select high-quality rules for identifying phishing websites. To construct a reliable and promising phishing website identification model, the selected rules are integrated using a simple neural network model.
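
As a rough sketch of this pipeline under stated assumptions, the example below extracts one very simple rule per binary indicator feature, scores each rule with the F-measure, keeps the best rules and integrates them with a small neural network. The simulated data, the top-k cut-off and the tiny network are illustrative choices, not the authors' exact settings.

```python
# Sketch: simple per-feature rules -> F-measure selection -> NN integration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Simulated data: 20 binary indicators (1 = suspicious), y = 1 for phishing.
X, y = make_classification(n_samples=2000, n_features=20, random_state=7)
X = (X > 0).astype(int)  # binarise into simple indicators
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

# Step 1: one very simple rule per feature ("indicator fires -> phishing"),
# scored with the F-measure on the training set.
rule_scores = np.array([f1_score(y_tr, X_tr[:, j]) for j in range(X_tr.shape[1])])

# Step 2: keep only the highest-quality rules (here, the top eight by F-measure).
selected = np.argsort(rule_scores)[-8:]
print("selected rule features:", selected, "scores:", rule_scores[selected].round(2))

# Step 3: integrate the selected rules with a simple neural network.
nn = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=7)
nn.fit(X_tr[:, selected], y_tr)
print("test F1:", round(f1_score(y_te, nn.predict(X_te[:, selected])), 3))
```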

Findings

Experiments conducted using self-collected and benchmark data sets show that the proposed approach outperforms 16 commonly used classifiers (including seven non–rule-based and four rule-based classifiers as well as five deep learning models) in terms of interpretability and identification performance.

Originality/value

Investigating patterns of phishing websites based on hyperlink indicators using the efficient rule-based approach is innovative. It is not only helpful for identifying phishing websites, but also beneficial for extracting simple and understandable rules.

Details

The Electronic Library, vol. 38 no. 5/6
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 25 November 2013

Michael Levi and Matthew Leighton Williams

Abstract

Purpose

This paper aims to map out multi-agency partnerships in the UK information assurance (UKIA) network.

Design/methodology/approach

The paper surveyed members of the UKIA community and achieved a 52 per cent response rate (n=104). A multi-dimensional scaling (MDS) technique was used to map the multi-agency cooperation space, while factor analysis and ordinary least squares regression identified predictive factors of cooperation frequency. Qualitative data were also solicited via the survey and through interviews with security managers.
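
A minimal sketch of this quantitative pipeline, assuming simulated survey data rather than the UKIA responses: MDS maps a pairwise cooperation matrix, factor analysis extracts latent survey factors, and OLS tests whether those factors predict cooperation frequency. All names and dimensions are illustrative assumptions.

```python
# Sketch: MDS on a cooperation dissimilarity matrix, FA on survey items, OLS link.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_orgs, n_items = 20, 8

# Symmetric dissimilarity matrix of cooperation frequency between organisations.
d = rng.random((n_orgs, n_orgs))
d = (d + d.T) / 2
np.fill_diagonal(d, 0.0)

# 1. MDS maps the multi-agency cooperation space into two dimensions.
coords = MDS(n_components=2, dissimilarity='precomputed',
             random_state=0).fit_transform(d)

# 2. Factor analysis reduces survey items (e.g. cybercrime concern, perceived
#    organisational effectiveness) to two latent factors.
items = rng.random((n_orgs, n_items))
factors = FactorAnalysis(n_components=2, random_state=0).fit_transform(items)

# 3. OLS regression: do the latent factors predict cooperation frequency?
coop = 1.0 - d.mean(axis=1)  # higher = more frequent cooperation on average
ols = LinearRegression().fit(factors, coop)
print("OLS R^2:", round(ols.score(factors, coop), 3))
```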

Findings

Via the quantitative measures, the paper locates gaps in the multi-agency cooperation network and identifies predictors of cooperation. The data indicate an over-crowded cybersecurity space, problems in apprehending perpetrators, and poor business case justifications for SMEs as potential inhibitors to cooperation, while concern over certain cybercrimes and perceptions of organisational effectiveness were identified as motivators.

Practical implications

The data suggest that the neo-liberal rationality that has been evoked in other areas of crime control is also evident in the control of cybercrimes. The paper concludes that divisions exist between the High Policing rhetoric of the UK's Cyber Security Strategy and the (relatively) Low Policing cooperation outcomes in "on the ground" cyber-policing. If the cooperation outcomes advocated by the UK Cyber Security Strategy are to be realised, UKIA organisations must begin to acknowledge and remedy gaps and barriers in cooperation.

Originality/value

This paper provides the first mixed-methods evidence on the multi-agency cooperation patterns amongst the UKIA community in the UK and highlights significant gaps in the network.

Details

Information Management & Computer Security, vol. 21 no. 5
Type: Research Article
ISSN: 0968-5227

Article
Publication date: 1 November 2019

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

Email is a rapid and inexpensive medium for sharing information, whereas unsolicited email (spam) is a constant source of trouble in email communication. The rapid growth of spam creates a need for reliable and robust spam classifiers. This paper aims to present a study of evolutionary classifiers (genetic algorithm [GA] and genetic programming [GP]) with and without the help of an ensemble of classifiers method. In this research, the classifier ensemble has been developed with the adaptive boosting technique.

Design/methodology/approach

Text mining methods are applied to classify spam and legitimate emails. Two data sets (Enron and SpamAssassin) are used to test the classifiers concerned. Initially, pre-processing is performed to extract the features/words from the email files. An informative feature subset is then selected using the greedy stepwise feature subset search method. With the help of these informative features, a comparative study is performed, first within the evolutionary classifiers and then against other popular machine learning classifiers (Bayesian, naive Bayes and support vector machine).
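
The sketch below illustrates greedy stepwise (forward) feature search under assumptions: a simulated binary bag-of-words matrix, and a naive Bayes scorer standing in for the GA/GP classifiers, which have no scikit-learn implementation. The stopping rule is an illustrative choice.

```python
# Sketch: greedy forward feature search scored by cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           random_state=1)
X = (X > 0).astype(int)  # binary word-presence features

selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))
while remaining:
    # Try adding each remaining feature; keep the one that helps most.
    trials = [(cross_val_score(BernoulliNB(), X[:, selected + [j]], y,
                               cv=3).mean(), j) for j in remaining]
    score, j = max(trials)
    if score <= best_score:      # stop when no feature improves CV accuracy
        break
    best_score = score
    selected.append(j)
    remaining.remove(j)

print(f"{len(selected)} features selected, CV accuracy {best_score:.3f}")
```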

Findings

This study reveals that evolutionary algorithms are promising for classification and prediction applications, with genetic programming with adaptive boosting turning out to be not only an accurate but also a sensitive classifier. Results show that GA initially performs better than GP, but after an ensemble of classifiers (a large number of iterations), GP overtakes GA with significantly higher accuracy. Amongst all the classifiers, boosted GP proves good not only in terms of classification accuracy but also in terms of low false positive (FP) rates, which is considered an important criterion in email spam classification. Greedy stepwise feature search is also found to be an effective method for feature selection in this application domain.

Research limitations/implications

The implications of this research include the reduction in costs incurred because of spam/unsolicited bulk email. Email is a fundamental necessity for sharing information among the units of an organization in order to stay competitive with business rivals. In addition, it is a continual hurdle for internet service providers to offer the best emailing services to their customers. Although organizations and internet service providers are continuously adopting novel spam filtering approaches to reduce the number of unwanted emails, the desired effect has not been significant because of installation costs, limited customizability and the threat of misclassifying important emails. This research addresses these issues and challenges faced by internet service providers and organizations.

Practical implications

In this research, the proposed models have not only provided excellent accuracy and sensitivity with low FP rates and customizable capability, but have also helped to reduce the cost of spam. The same models may be used for other text mining applications, such as sentiment analysis, blog mining or news mining.

Originality/value

A comparison between GP and GA, with and without ensembles, is presented in the spam classification application domain.

Article
Publication date: 3 January 2023

Saleem Raja A., Sundaravadivazhagan Balasubaramanian, Pradeepa Ganesan, Justin Rajasekaran and Karthikeyan R.

Abstract

Purpose

The internet has completely merged into contemporary life. People are addicted to using internet services for everyday activities. Consequently, an abundance of information about people and organizations is available online, which encourages the proliferation of cybercrimes. Cybercriminals often use malicious links for large-scale cyberattacks, which are disseminated via email, SMS and social media. Recognizing malicious links online can be exceedingly challenging. The purpose of this paper is to present a strong security system that can detect malicious links in cyberspace using natural language processing techniques.

Design/methodology/approach

Researchers have recommended a variety of approaches, including blacklisting and rule-based machine/deep learning, for automatically recognizing malicious links. However, these approaches generally necessitate generating a set of features to generalize the detection process. Most of the features are generated by processing URLs and web page content, along with external features such as the ranking of the web page and domain name system information. This process of feature extraction and selection typically takes time and demands a high level of domain expertise. Sometimes the generated features may not leverage the full potential of the data set. In addition, the majority of currently deployed systems use a single classifier to classify malicious links, yet prediction accuracy may vary widely depending on the data set and the classifier used.

Findings

To address the issue of generating feature sets, the proposed method uses natural language processing techniques (term frequency and inverse document frequency) to vectorize URLs. To build a robust system for classifying malicious links, the proposed system implements a weighted soft voting classifier, an ensemble classifier that combines the predictions of base classifiers. The weight assigned to each base classifier is based on its individual ability or skill.
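
A minimal sketch of this design, assuming scikit-learn: TF-IDF over character n-grams turns raw URLs into features with no hand-crafted extraction, and a soft voting ensemble weights each base classifier. The toy URLs, n-gram range and weights are illustrative assumptions, not the paper's tuned values.

```python
# Sketch: TF-IDF vectorisation of raw URLs + weighted soft voting ensemble.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

urls = [
    "http://example.com/home", "https://shop.example.org/cart",
    "http://paypa1-login.xyz/verify", "http://secure-update.bank.ru/acct",
] * 50
labels = [0, 0, 1, 1] * 50  # 1 = malicious

model = make_pipeline(
    # Character n-grams let TF-IDF treat URLs as text, with no manual features.
    TfidfVectorizer(analyzer='char', ngram_range=(2, 4)),
    VotingClassifier(
        estimators=[('lr', LogisticRegression(max_iter=1000)),
                    ('dt', DecisionTreeClassifier(max_depth=10)),
                    ('rf', RandomForestClassifier(n_estimators=100))],
        voting='soft',
        # Illustrative weights; the paper derives them from each classifier's skill.
        weights=[2, 1, 3],
    ),
)
model.fit(urls, labels)
print(model.predict(["http://example.com/about", "http://paypa1-login.xyz/acct"]))
```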

Originality/value

The proposed method performs better when optimal weights are assigned. Its performance was assessed using two different data sets (D1 and D2) and compared against base machine learning classifiers and previous research results. The resulting accuracy shows that the proposed method is superior to existing methods, offering 91.4% and 98.8% accuracy for data sets D1 and D2, respectively.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 30 October 2018

Shrawan Kumar Trivedi and Prabin Kumar Panigrahi

Abstract

Purpose

Email spam classification is now becoming a challenging area in the domain of text classification. Precise and robust classifiers are judged not only by classification accuracy but also by sensitivity (correctly classified legitimate emails) and specificity (correctly classified unsolicited emails), captured by both false positive and false negative rates. This paper aims to present a comparative study of various decision tree classifiers (AD tree, decision stump and REP tree) with and without different boosting algorithms (bagging, boosting with re-sampling and AdaBoost).

Design/methodology/approach

Artificial intelligence and text mining approaches have been incorporated in this study. Each decision tree classifier is tested on informative words/features selected from two publicly available data sets (SpamAssassin and LingSpam) using a greedy stepwise feature search method.
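
The sketch below compares a decision stump with and without boosting, and a fuller tree with bagging, assuming scikit-learn (parameter names as of version 1.2). AD tree and REP tree are Weka classifiers with no direct scikit-learn equivalent, so a depth-limited CART tree stands in for them, and the data are simulated rather than SpamAssassin or LingSpam.

```python
# Sketch: decision stump / tree, with AdaBoost and bagging, under 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=3)

stump = DecisionTreeClassifier(max_depth=1)        # "decision stump"
rep_like = DecisionTreeClassifier(max_depth=None)  # stand-in for REP tree

for name, clf in [
    ("stump alone", stump),
    ("stump + AdaBoost", AdaBoostClassifier(estimator=stump, n_estimators=100)),
    ("tree + bagging", BaggingClassifier(estimator=rep_like, n_estimators=50)),
]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:18s} accuracy: {acc:.3f}")
```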

Findings

Outcomes of this study show that without boosting, the REP tree provides high performance accuracy with the AD tree ranking as the second-best performer. Decision stump is found to be the under-performing classifier of this study. However, with boosting, the combination of REP tree and AdaBoost compares favourably with other classification models. If the metrics false positive rate and performance accuracy are taken together, AD tree and REP tree with AdaBoost were both found to carry out an effective classification task. Greedy stepwise has proven its worth in this study by selecting a subset of valuable features to identify the correct class of emails.

Research limitations/implications

This research is focussed on the classification of email spam written in the English language only. The proposed models work with the content (words/features) of email data, mostly found in the body of the mail. Image spam has not been included in this study, nor have other message types such as short message service or multimedia messaging service.

Practical implications

In this research, a boosted decision tree approach has been proposed and used to classify email spam and ham files; this is found to be a highly effective approach in comparison with other state-of-the-art models used in other studies. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Originality/value

A comparison of decision tree classifiers with/without ensemble has been presented for spam classification.

Details

Journal of Systems and Information Technology, vol. 20 no. 3
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 6 September 2021

Sivaraman Eswaran, Vakula Rani, Daniel D., Jayabrabu Ramakrishnan and Sadhana Selvakumar

Abstract

Purpose

In the recent era, banking infrastructure has constructed various remotely handled platforms for users. However, the security risk to the banking sector has also elevated, as is visible from the rising number of reported attacks against these security systems. Intelligence shows that cyberattacks by crawlers are increasing. Malicious crawlers can crawl web pages, crack passwords and reap the private data of users. Besides, intrusion detection systems in a dynamic environment produce more false positives. The purpose of this research paper is to propose an efficient methodology to sense attacks while creating low levels of false positives.

Design/methodology/approach

In this research, the authors have developed an efficient approach for malicious crawler detection and correlated the security alerts. The behavioral features of the crawlers are examined to recognize malicious crawlers, and a novel methodology is proposed to improve the security of the bank user portal. The authors have compared various machine learning strategies, including Bayesian network, support vector machine (SVM) and decision tree.
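
As a rough illustration of the classifier comparison, the sketch below classifies simulated session-level behavioral features (e.g. request rate, inter-request gap, 4xx ratio, robots.txt hits, all assumed here). Naive Bayes approximates the Bayesian network, since scikit-learn has no Bayesian network classifier.

```python
# Sketch: comparing classifiers on simulated crawler-vs-human session features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
n = 1000
# Columns: requests/min, mean inter-request gap (s), 4xx ratio, robots.txt hits.
human = np.column_stack([rng.normal(3, 1, n), rng.normal(8, 2, n),
                         rng.normal(0.02, 0.01, n), np.zeros(n)])
crawler = np.column_stack([rng.normal(40, 10, n), rng.normal(0.5, 0.2, n),
                           rng.normal(0.15, 0.05, n), rng.poisson(1, n)])
X = np.vstack([human, crawler])
y = np.array([0] * n + [1] * n)  # 1 = malicious crawler session

for name, clf in [("naive Bayes", GaussianNB()),
                  ("SVM", SVC(kernel='rbf')),
                  ("decision tree", DecisionTreeClassifier(max_depth=5))]:
    print(f"{name:13s} accuracy: {cross_val_score(clf, X, y, cv=5).mean():.3f}")
```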

Findings

The proposed work spans various aspects. Initially, the outcomes are stated for a mixture of different kinds of log files. Then, distinct sites from various log files are selected to construct acceptable data sets. Session identification, attribute extraction, session labeling and classification are then performed. Moreover, the approach clusters meta-alerts into higher-level meta-alerts to fuse multistage attacks and various attack types.

Originality/value

This methodology used incremental clustering techniques and analyzed the probability of existing topologies in SVM classifiers for more deterministic classification. It also enhanced the taxonomy for various domains.

Details

International Journal of Pervasive Computing and Communications, vol. 18 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 1 July 2004

Stefanos Gritzalis

Abstract

This paper presents a state-of-the-art review of Web privacy and anonymity enhancing security mechanisms, tools, applications and services with respect to their architecture, operational principles and vulnerabilities. Furthermore, to facilitate a detailed comparative analysis, appropriate parameters have been selected and grouped into classes of comparison criteria, in the form of an integrated comparison framework. The main concern during the design of this framework was to cover the security threats confronted, the applied technological issues and the satisfaction of users' demands. GNUnet's Anonymity Protocol (GAP), Freedom, Hordes, Crowds, Onion Routing, Platform for Privacy Preferences (P3P), TRUSTe, Lucent Personalized Web Assistant (LPWA) and Anonymizer have been reviewed and compared. The comparative review clearly highlights that the pros and cons of each system do not coincide, mainly because each one exhibits different design goals and thus adopts dissimilar techniques for protecting privacy and anonymity.

Details

Information Management & Computer Security, vol. 12 no. 3
Type: Research Article
ISSN: 0968-5227

Article
Publication date: 14 June 2013

Tran Tri Dang and Tran Khanh Dang

Abstract

Purpose

The purpose of this paper is to propose novel information visualization and interaction techniques to help security administrators analyze past web form submissions, with the goals of searching, inspecting, verifying and understanding malicious submissions.

Design/methodology/approach

The authors utilize well‐known visual design principles in the techniques to support the analysis process. They also implement a prototype and use it to investigate simulated normal and malicious web submissions.
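
In the spirit of the prototype described above, here is a minimal sketch of visualising simulated web form submissions for anomaly inspection. The visual encoding (submission length against time of day, coloured by special-character ratio) is an illustrative choice, not the authors' actual design.

```python
# Sketch: scatter plot of simulated submissions; injected anomalies stand out.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(9)
n = 500
times = rng.uniform(0, 24, n)            # hour of day
lengths = rng.lognormal(4, 0.5, n)       # characters per submission
special = rng.beta(1, 20, n)             # special-character ratio

# Inject a burst of suspicious submissions (e.g. injection probes at night).
times = np.append(times, rng.normal(3, 0.2, 30))
lengths = np.append(lengths, rng.lognormal(6, 0.3, 30))
special = np.append(special, rng.beta(8, 4, 30))

sc = plt.scatter(times, lengths, c=special, cmap='Reds', s=12)
plt.colorbar(sc, label='special-character ratio')
plt.xlabel('hour of day')
plt.ylabel('submission length (chars)')
plt.yscale('log')
plt.title('Simulated web form submissions')
plt.show()
```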

Findings

The techniques can increase analysts' efficiency by displaying large amounts of information at a time, help analysts detect certain kinds of anomalies, and support the analysis process via the provided interaction capabilities.

Research limitations/implications

Due to resource constraints, the authors experimented on simulated data only, not real data.

Practical implications

The techniques can be used to investigate past web form submissions, which is a first step in analysing and understanding the current security situation and attackers' skills. The knowledge gained from this process can be used to plan an effective future defence strategy, e.g. by improving/fine-tuning the attack signatures of an automatic intrusion detection system.

Originality/value

The visualization and interaction designs constitute the first visual analysis technique for the security investigation of web form submissions.

Details

International Journal of Web Information Systems, vol. 9 no. 2
Type: Research Article
ISSN: 1744-0084
