Search results
1 – 10 of over 3,000
Samira Khodabandehlou and Mahmoud Zivari Rahman
Abstract
Purpose
This paper aims to provide a predictive framework of customer churn through six stages for accurate prediction and preventing customer churn in the field of business.
Design/methodology/approach
The six stages are as follows: first, collection of customer behavioral data and preparation of the data; second, the formation of derived variables and selection of influential variables, using a method of discriminant analysis; third, selection of training and testing data and reviewing their proportion; fourth, the development of prediction models using simple, bagging and boosting versions of supervised machine learning; fifth, comparison of churn prediction models based on different versions of machine-learning methods and selected variables; and sixth, providing appropriate strategies based on the proposed model.
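The fourth and fifth stages, training simple, bagging and boosting versions of a learner and comparing them, can be sketched as follows. This is an illustrative scikit-learn sketch on synthetic data, not the authors' grocery-store data set or their exact configuration:

```python
# Compare simple, bagged and boosted versions of the same base learner
# (illustrative synthetic data, not the study's customer records).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "simple": DecisionTreeClassifier(max_depth=3, random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                                 n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                                   n_estimators=50, random_state=0),
}
# Held-out accuracy per version of the learner
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)
```

On the authors' data the boosting version was the strongest; on an arbitrary synthetic set the ordering may differ.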
Findings
According to the results, five variables, the number of items, reception of returned items, the discount, the distribution time and the prize, beside the recency, frequency and monetary (RFM) variables (RFMITSDP), were chosen as the best predictor variables. The proposed model, with an accuracy of 97.92 per cent, performed much better in churn prediction than RFM. Among the supervised machine learning methods, the artificial neural network (ANN) had the highest accuracy and decision trees (DT) the lowest. The results also show the substantial superiority of the boosting versions in prediction compared with the simple and bagging models.
Research limitations/implications
The period of the available data was limited to two years. The research data came from only one grocery store and may not be applicable to other industries; therefore, the results should be generalized to other business centers with caution.
Practical implications
Business owners should enforce a clear rule that provides a prize for a certain number of purchased items; the prize can, of course, be something other than the purchased item. They should accept items returned by customers for any reason, and the conditions and deadline for accepting returned items must be clearly communicated to customers. Store owners should offer a discount for a certain amount of purchase and use an exponential rule that increases the discount as the purchase amount grows, to encourage customers to buy more. Managers of large stores should deliver ordered items quickly, using well-equipped, modern transport vehicles and a skilled, friendly workforce. The types of services, the rules for prizes and discounts, the rules for accepting returned items and the method of distributing items should be posted in the store for all customers to see, and the store's special services and reward rules should be communicated to customers through new media such as social networks. To predict customer behavior from data, future researchers should use the boosting method, because it increases the efficiency and accuracy of prediction, and the ANN method is recommended for predicting customer behavior, particularly churn status. To extract and select the important variables influencing customer behavior, discriminant analysis can be used; it is a very accurate and powerful method for predicting customer classes.
Originality/value
The current study tries to fill this gap by considering five basic and important variables besides RFM in stores, i.e. prize, discount, accepting returns, delay in distribution and the number of items, so that business owners can understand the role that services such as prizes, discounts, distribution and accepting returns play in retaining customers and preventing churn. Another innovation is the comparison of machine-learning methods with their boosting and bagging versions, especially given that previous studies do not consider the bagging method. A further motivation is the conflicting results regarding which machine-learning method predicts customer behaviors, including churning, most accurately: some studies introduce ANN (Huang et al., 2010; Hung and Wang, 2004; Keramati et al., 2014; Runge et al., 2014), some support vector machine (Guo-en and Wei-dong, 2008; Vafeiadis et al., 2015; Yu et al., 2011) and some DT (Freund and Schapire, 1996; Qureshi et al., 2013; Umayaparvathi and Iyakutti, 2012) as the best predictor, leaving users of these studies unsure which method is best. The current study identifies the best prediction method specifically for store businesses, for both researchers and owners. Moreover, it uses discriminant analysis, not used in previous studies, to select and filter the variables that are important and effective in predicting churners and non-churners. The study is therefore unique in its variables, its method of comparing accuracy and its method of selecting effective variables.
Nasim Eslamirad, Soheil Malekpour Kolbadinejad, Mohammadjavad Mahdavinejad and Mohammad Mehranrad
Abstract
Purpose
This research introduces a new methodology for integrating urban design strategies with the supervised machine learning (SML) method, applying both energy engineering modeling (an evaluation phase for existing green sidewalks) and statistical energy modeling (a prediction phase for new ones), to offer algorithms that help find the optimum morphology of green sidewalks, with high outdoor thermal comfort and few errors in the results.
Design/methodology/approach
The study's core tool is SML, which predicts future cases from past ones; the machine learning is implemented in Python. The structure of the study consists of two main parts, as in the majority of similar studies: engineering energy modeling and statistical energy modeling. First, from 2,268 models, some are randomly selected, simulated and sensitivity-analyzed in ENVI-met. The ENVI-met output, the predicted mean vote (PMV) as a measure of thermal comfort, together with weather variables, then serves as input to Python. The resulting data set is processed by SML to reach the final, reliable predicted output.
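The statistical energy modeling phase described above, fitting a model on the simulated samples to predict PMV for unsimulated sidewalk configurations, might look roughly like the following sketch. The feature names and the stand-in PMV formula are invented for illustration; the study's actual ENVI-met outputs and variables are not reproduced here:

```python
# Toy regression stand-in for the ENVI-met -> Python -> SML pipeline:
# train on "simulated" samples, predict PMV for unseen configurations.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Invented features: air temperature, relative humidity, wind speed, green-cover ratio
X = rng.uniform([20, 30, 0, 0], [40, 90, 5, 1], size=(300, 4))
# Stand-in for the PMV values ENVI-met would produce for each configuration
pmv = 0.1 * X[:, 0] - 0.01 * X[:, 1] - 0.3 * X[:, 2] - 1.5 * X[:, 3]
X_tr, X_te, y_tr, y_te = train_test_split(X, pmv, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)  # error evaluation on held-out configurations
```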
Findings
The process of SML allows the study to find the thermal comfort of the current models and of other similar sidewalks. The results are evaluated by both the PMV mathematical model and SML error evaluation functions, and they confirm that the average error is about 1 per cent, so the method is reliable enough to apply in a variety of similar fields. The findings can support sustainable architecture strategies at building and urban scales, to determine, monitor and control energy-based behaviors (thermal comfort, heating, cooling, lighting and ventilation) in the operational phase of existing buildings and constructions and in the planning and design phase of future built cases, over their whole life spans.
Research limitations/implications
The limitations of the study relate to the study variables and alternatives, which have a notable impact on the findings. Furthermore, the more trustworthy the input data, the more accurate the output; the modeling and simulation processes are therefore the most significant part of the research for reaching exact results in the final step.
Practical implications
The findings can inform urban design strategies. With outdoor thermal comfort estimated by the machine learning method, urban and landscape designers, policymakers and architects can assess how their designs will perform in terms of air quality and urban health, and can be confident of meeting thermal comfort goals in the urban atmosphere.
Social implications
By 2030, about three out of five people will live in cities. Because green infrastructure helps moderate a city's climate, green spaces are linked to inhabitants' thermal comfort. Although improving outdoor thermal comfort through design methods is not a new subject, applying machine learning to predict the results is a new insight, supporting more effective design strategies and better-prepared, more comfortable urban environments. The study's social contribution thus lies in learning from previous projects and developing more efficient strategies, with less money and time, to make cities more comfortable and healthy places to live.
Originality/value
The study's achievements are expected to apply not only in Tehran but also in other climate zones, as a pattern for eco-city design strategies. Although similar studies exist in other disciplines, the concept of this study is a new vision in urban studies.
Yoon-Sung Kim, Hae-Chang Rim and Do-Gil Lee
Abstract
Purpose
The purpose of this paper is to propose a methodology to analyze a large amount of unstructured textual data into categories of business environmental analysis frameworks.
Design/methodology/approach
This paper uses machine learning to classify a vast amount of unstructured textual data by category of business environmental analysis framework. Generally, producing large quantities of high-quality training data for a machine-learning-based system is costly, so semi-supervised learning techniques are used to improve the classification performance. Additionally, the lack-of-features problem from which traditional classification systems suffer is resolved by applying semantic features obtained through word embedding, a recent technique in text mining.
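The semi-supervised idea described above, letting a classifier bootstrap itself from a small labeled seed plus a large unlabeled pool, can be illustrated with scikit-learn's self-training wrapper. This is a generic sketch on synthetic vectors, not the paper's corpus or its word-embedding features:

```python
# Self-training: fit on ~10% labeled data, pseudo-label the rest iteratively.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.9] = -1  # -1 marks "unlabeled" for sklearn

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)
accuracy = model.score(X, y)  # evaluated against the full ground truth
```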
Findings
The proposed methodology can be used for various business environmental analyses and the system is fully automated in both the training and classifying phases. Semi-supervised learning can solve the problems with insufficient training data. The proposed semantic features can be helpful for improving traditional classification systems.
Research limitations/implications
This paper focuses on classifying sentences that contain business environmental analysis information in a large number of documents. However, the proposed methodology is limited with respect to advanced analyses that could directly help managers establish strategies, since it does not summarize the environmental variables implied in the classified sentences. Advanced summarization and recommendation techniques could extract these environmental variables from the sentences and assist managers in establishing effective strategies.
Originality/value
The feature selection technique developed in this paper has not been used in traditional systems for business and industry, so the whole process can be fully automated. The approach is practical enough to be applied to various business environmental analysis frameworks. In addition, the system is more economical than traditional systems because of semi-supervised learning, and it resolves the lack-of-features problem from which traditional systems suffer. This work is valuable for analyzing environmental factors and establishing strategies for companies.
Harleen Kaur and Vinita Kumari
Abstract
Diabetes is a major metabolic disorder that can adversely affect the entire body system. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other disorders. Millions of people all over the world are affected by this disease, and early detection of diabetes is very important for maintaining a healthy life. The disease is a matter of global concern, as cases of diabetes are rising rapidly. Machine learning (ML) is a computational method for learning automatically from experience and improving performance to make more accurate predictions. In the current research, we applied ML techniques to the Pima Indian diabetes dataset to identify trends and detect patterns in risk factors, using R as the data manipulation tool. To classify patients as diabetic or non-diabetic, we developed and analyzed five predictive models built with supervised machine learning algorithms: linear kernel support vector machine (SVM-linear), radial basis function (RBF) kernel support vector machine, k-nearest neighbour (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR).
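The study itself was carried out in R; as a rough illustration of the same kind of comparison, four of the five classifiers (MDR has no standard scikit-learn implementation) can be benchmarked in Python. Synthetic stand-in data is used here, not the Pima Indian diabetes data set:

```python
# Cross-validated comparison of SVM-linear, SVM-RBF, k-NN and ANN
# on synthetic data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
classifiers = {
    "SVM-linear": SVC(kernel="linear"),
    "SVM-RBF": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}
cv_scores = {
    # Scaling matters for SVM, k-NN and ANN, hence the pipeline.
    name: cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5).mean()
    for name, clf in classifiers.items()
}
```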
Francisco Villarroel Ordenes and Shunyuan Zhang
Abstract
Purpose
The purpose of this paper is to describe and position the state-of-the-art of text and image mining methods in business research. By providing a detailed conceptual and technical review of both methods, it aims to increase their utilization in service research.
Design/methodology/approach
In the first stage, the authors review business literature in marketing, operations and management concerning the use of text and image mining methods. In the second stage, the authors identify and analyze empirical papers that used text and image mining methods in services journals and premier business journals. Finally, avenues for further research in services are provided.
Findings
The manuscript identifies seven text mining methods and describes the approaches, processes, techniques and algorithms involved in their implementation. Four of these methods are positioned similarly for image mining. There are 39 papers using text mining in service research, with a focus on measuring consumer sentiment, experiences and service quality. Because image mining has not yet been used in service journals, the authors review its application in marketing and management and suggest ideas for further research in services.
Research limitations/implications
This manuscript focuses on the different methods and their implementation in service research, but it does not offer a complete review of business literature using text and image mining methods.
Practical implications
The results have a number of implications for the discipline that are presented and discussed. The authors provide research directions using text and image mining methods in service priority areas such as artificial intelligence, frontline employees, transformative consumer research and customer experience.
Originality/value
The manuscript provides an introduction to text and image mining methods to service researchers and practitioners interested in the analysis of unstructured data. This paper provides several suggestions concerning the use of new sources of data (e.g. customer reviews, social media images, employee reviews and emails), measurement of new constructs (beyond sentiment and valence) and the use of more recent methods (e.g. deep learning).
Abstract
With the advent of Big Data, storing and using an unprecedented amount of clinical information is now feasible via Electronic Health Records (EHRs). The massive collection of clinical data by health care systems and treatment centers can be productively used to perform predictive analytics on treatment plans to improve patient health outcomes. These massive data sets have stimulated opportunities to adapt computational algorithms to track and identify target areas for quality improvement in health care.
According to a report from the Association of American Medical Colleges, there will be an alarming gap between demand and supply in the health care workforce in the near future: projections show that by 2032 there will be a shortfall of between 46,900 and 121,900 physicians in the US (AAMC, 2019). Early prediction of health care risks is therefore a pressing requirement for improving health care quality and reducing health care costs. Predictive analytics uses historical data and algorithms based on either statistics or machine learning to develop predictive models that capture important trends; such models can predict the likelihood of future events. Predictive models developed using supervised machine learning approaches are commonly applied to various health care problems such as disease diagnosis, treatment selection and treatment personalization.
This chapter provides an overview of various machine learning and statistical techniques for developing predictive models. Case examples from the extant literature are provided to illustrate the role of predictive modeling in health care research. Together with adaptation of these predictive modeling techniques with Big Data analytics underscores the need for standardization and transparency while recognizing the opportunities and challenges ahead.
Ammara Zamir, Hikmat Ullah Khan, Waqar Mehmood, Tassawar Iqbal and Abubakker Usman Akram
Abstract
Purpose
This research study proposes a feature-centric spam email detection model (FSEDM) based on content, sentiment, semantic, user and spam-lexicon features set. The purpose of this study is to exploit the role of sentiment features along with other proposed features to evaluate the classification accuracy of machine learning algorithms for spam email detection.
Design/methodology/approach
Existing studies primarily exploit a content-based feature engineering approach, but consider only a limited number of features. This research study therefore proposes a feature-centric framework (FSEDM) based on existing and novel features of the email data set, extracted after pre-processing. Diverse supervised learning techniques are then applied to the proposed features, in conjunction with feature selection techniques such as information gain, gain ratio and Relief-F, to rank the most prominent features and classify emails as spam or ham (not spam).
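One of the ranking criteria mentioned above, information gain, is equivalent to the mutual information between a feature and the class label. A minimal sketch with scikit-learn follows, on synthetic features rather than the paper's email data or exact pipeline:

```python
# Rank features by mutual information (information gain) with the class label.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
mi = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]  # feature indices, most informative first
```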
Findings
Analysis and experimental results indicate that the proposed model with sentiment analysis is a competitive approach for spam email detection. Using the proposed model, a deep neural network applied with sentiment features outperformed the other classifiers, reaching a classification accuracy of up to 97.2 per cent.
Originality/value
This research is novel in that no previous research has focused on sentiment analysis in conjunction with other email features for the detection of spam emails.
Zhuoxuan Jiang, Chunyan Miao and Xiaoming Li
Abstract
Purpose
Recent years have witnessed the rapid development of massive open online courses (MOOCs). With more and more courses produced by instructors and taken by learners all over the world, unprecedented massive educational resources are being aggregated. These resources include videos, subtitles, lecture notes, quizzes, etc., on the teaching side, and forum contents, wikis, logs of learning behavior, logs of homework, etc., on the learning side. However, the data are both unstructured and diverse. To facilitate knowledge management and mining on MOOCs, extracting keywords from the resources is important. This paper aims to adapt state-of-the-art techniques to MOOC settings and evaluate their effectiveness on real data. In terms of practice, this paper also tries to answer, for the first time, to what extent MOOC resources can support keyword extraction models and how much human effort is required to make the models work well.
Design/methodology/approach
Based on which side generates the data, i.e. instructors or learners, the data are classified into teaching resources and learning resources, respectively. The approach used on teaching resources is based on machine learning models with labels, while the approach used on learning resources is based on a graph model without labels.
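The abstract does not specify the graph model; a common choice for label-free keyword extraction is a TextRank-style co-occurrence graph, sketched here in plain Python on a toy corpus. The assumption that the authors use something TextRank-like is ours:

```python
# Minimal TextRank-style keyword ranking: build a word co-occurrence graph
# from sentences, then score nodes with iterative PageRank-like updates.
from collections import defaultdict
from itertools import combinations

sentences = [
    "machine learning models need labeled data",
    "graph models rank words without labels",
    "keyword extraction helps knowledge mining",
    "forum data supports keyword extraction models",
]

# Words sharing a sentence are linked in an undirected graph.
graph = defaultdict(set)
for s in sentences:
    for a, b in combinations(set(s.split()), 2):
        graph[a].add(b)
        graph[b].add(a)

# Iterative PageRank-style scoring (damping factor 0.85).
scores = {w: 1.0 for w in graph}
for _ in range(30):
    scores = {
        w: 0.15 + 0.85 * sum(scores[n] / len(graph[n]) for n in graph[w])
        for w in graph
    }

keywords = sorted(scores, key=scores.get, reverse=True)[:3]
```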
Findings
From the teaching resources, the authors' methods can accurately extract keywords with only 10 per cent labeled data. The authors find that resources of various forms, e.g. subtitles and PPTs, should be considered separately because the models perform differently on them. From the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they reflect the topics that are actively discussed in the forums, giving instructors useful feedback. The authors implement two applications with the extracted keywords: generating a concept map and generating a learning path. Visual demos show that both have the potential to improve learning efficiency when integrated into a real MOOC platform.
Research limitations/implications
Conducting keyword extraction on MOOC resources is quite difficult because teaching resources are hard to obtain due to copyright. Obtaining labeled data is also difficult, because expertise in the corresponding domain is usually required.
Practical implications
The experimental results support that MOOC resources are good enough for building keyword extraction models, and that an acceptable balance between human effort and model accuracy can be achieved.
Originality/value
This paper presents a pioneering study on keyword extraction from MOOC resources and reports some new findings.
Abstract
Purpose
This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and to propose functional requirements and workflow considerations for transitioning these tools from experimental to operational use.
Design/methodology/approach
The paper has four main sections: 1) a short overview of the NLP and machine learning concepts referenced in the paper; 2) a review of the literature on NLP and machine learning applied to archival processes; 3) an overview of, and commentary on, key existing and developing tools that use NLP or machine learning techniques for archives; and 4) a discussion of functional requirements and workflow considerations for NLP and machine learning tools in archival processing, informed by the preceding review and analysis.
Findings
Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out to develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable.
Originality/value
Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.
D. K. Malhotra, Kunal Malhotra and Rashmi Malhotra
Abstract
Traditionally, loan officers use various credit scoring models to complement judgmental methods when classifying consumer loan applications. This study explores the use of decision trees, AdaBoost and support vector machines (SVMs) to identify potential bad loans. Our results show that AdaBoost provides an improvement over both simple decision trees and SVM models in predicting good and bad credit clients. To cross-validate our results, we use k-fold cross-validation.
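The comparison described above, decision tree, SVM and AdaBoost evaluated with k-fold cross-validation, can be sketched with scikit-learn. The study's actual loan data and model settings are not given in the abstract, so synthetic data stands in here:

```python
# k-fold cross-validated comparison of decision tree, SVM and AdaBoost
# (synthetic data, not the study's consumer-loan records).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
# Mean accuracy over 10 folds for each model
kfold_scores = {name: cross_val_score(m, X, y, cv=10).mean()
                for name, m in models.items()}
```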