Search results

1 – 10 of 23

Open Access

Article

Publication date: 28 April 2023

A hybrid machine learning approach for analysis of stegomalware

Prudence Kadebu, Robert T.R. Shoniwa, Kudakwashe Zvarevashe, Addlight Mukwazvure, Innocent Mapanga, Nyasha Fadzai Thusabantu and Tatenda Trust Gotora

Downloads

2421

Abstract

Purpose

Given how smart today’s malware authors have become through employing highly sophisticated techniques, it is only logical that methods be developed to combat the most potent threats, particularly where the malware is stealthy and makes indicators of compromise (IOC) difficult to detect. After the analysis is completed, the output can be employed to detect and then counteract the attack. The goal of this work is to propose a machine learning approach to improve malware detection by combining the strengths of both supervised and unsupervised machine learning techniques. This study is essential as malware has certainly become ubiquitous as cyber-criminals use it to attack systems in cyberspace. Malware analysis is required to reveal hidden IOC, to comprehend the attacker’s goal and the severity of the damage and to find vulnerabilities within the system.

Design/methodology/approach

This research proposes a hybrid approach for dynamic and static malware analysis that combines unsupervised and supervised machine learning algorithms, and goes on to show how malware exploiting steganography can be exposed.

Findings

The tactics used by malware developers to circumvent detection are becoming more advanced with steganography becoming a popular technique applied in obfuscation to evade mechanisms for detection. Malware analysis continues to call for continuous improvement of existing techniques. State-of-the-art approaches applying machine learning have become increasingly popular with highly promising results.

Originality/value

Cyber security researchers globally are grappling with devising innovative strategies to identify and defend against the threat of extremely sophisticated malware attacks on key infrastructure containing sensitive data. The process of detecting the presence of malware requires expertise in malware analysis. Applying intelligent methods to this process can aid practitioners in identifying malware’s behaviour and features. This is especially expedient where the malware is stealthy, hiding IOC.

Details

International Journal of Industrial Engineering and Operations Management, vol. 5 no. 2

Type: Research Article

ISSN: 2690-6090

Open Access

Article

Publication date: 19 April 2023

Two decades of financial statement fraud detection literature review; combination of bibliometric analysis and topic modeling approach

Milad Soltani, Alexios Kythreotis and Arash Roshanpoor

Downloads

5654

Abstract

Purpose

The emergence of machine learning has opened a new way for researchers. It allows them to supplement the traditional manual methods for conducting a literature review and turning it into smart literature. This study aims to present a framework for incorporating machine learning into financial statement fraud (FSF) literature analysis. This framework facilitates the analysis of a large amount of literature to show the trend of the field and identify the most productive authors, journals and potential areas for future research.

Design/methodology/approach

In this study, a framework was introduced that merges bibliometric analysis techniques such as word frequency, co-word analysis and coauthorship analysis with the Latent Dirichlet Allocation topic modeling approach. This framework was used to uncover subtopics from 20 years of financial fraud research articles. Furthermore, the hierarchical clustering method was used on selected subtopics to demonstrate the primary contexts in the literature on FSF.
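The word-frequency and co-word steps mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `keyword_stats` and the sample keyword lists are hypothetical, and the Latent Dirichlet Allocation and hierarchical clustering stages are not shown.

```python
from collections import Counter
from itertools import combinations


def keyword_stats(keyword_lists):
    """Count keyword frequency and keyword co-occurrence across articles."""
    word_freq = Counter()
    co_word = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates within a single article.
        unique = sorted({k.lower() for k in keywords})
        word_freq.update(unique)
        # Each unordered pair of keywords in one article is one co-occurrence.
        co_word.update(combinations(unique, 2))
    return word_freq, co_word


# Hypothetical author keywords for three articles (illustrative only).
records = [
    ["fraud detection", "machine learning", "financial statements"],
    ["fraud detection", "data mining"],
    ["Machine Learning", "fraud detection", "text mining"],
]
word_freq, co_word = keyword_stats(records)
print(word_freq.most_common(2))
print(co_word.most_common(1))
```

Pairs are stored in sorted order so that the same two keywords always map to the same key, regardless of the order in which an article lists them.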

Findings

This study has contributed to the literature in two ways. First, this study has determined the top journals, articles, countries and keywords based on various bibliometric metrics. Second, using topic modeling and then hierarchy clustering, this study demonstrates the four primary contexts in FSF detection.

Research limitations/implications

In this study, the authors tried to comprehensively view the studies related to financial fraud conducted over two decades. However, this research has limitations that can be an opportunity for future researchers. The first limitation is due to language bias. This study has focused on English language articles, so it is suggested that other researchers consider other languages as well. The second limitation is caused by citation bias. In this study, the authors tried to show the top articles based on the citation criteria. However, judging based on citation alone can be misleading. Therefore, this study suggests that the researchers consider other measures to check the citation quality and assess the studies’ precision by applying meta-analysis.

Originality/value

Despite the popularity of bibliometric analysis and topic modeling, there have been limited efforts to use machine learning for literature review. This novel approach of using hierarchical clustering on topic modeling results enables us to uncover four primary contexts. Furthermore, this method allowed us to show the keywords of each context and highlight significant articles within each context.

Details

Journal of Financial Crime, vol. 30 no. 5

Type: Research Article

ISSN: 1359-0790

Open Access

Article

Publication date: 4 November 2022

An IoT-based and cloud-assisted AI-driven monitoring platform for smart manufacturing: design architecture and experimental validation

Bianca Caiazzo, Teresa Murino, Alberto Petrillo, Gianluca Piccirillo and Stefania Santini

Downloads

2954

Abstract

Purpose

This work proposes a novel Internet of Things (IoT)-based and cloud-assisted monitoring architecture for smart manufacturing systems that can evaluate their overall status and detect any anomalies occurring in production. A novel artificial intelligence (AI)-based technique, able to identify the specific anomalous event and the related risk classification for possible intervention, is hence proposed.

Design/methodology/approach

The proposed solution is a five-layer scalable and modular platform in an Industry 5.0 perspective, where the crucial layer is the Cloud Cyber one. This layer embeds a novel anomaly detection solution, designed by leveraging control charts, autoencoders (AE) long short-term memory (LSTM) and a Fuzzy Inference System (FIS). The proper combination of these methods allows not only detecting product defects but also recognizing their causes.
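Of the components listed above, the control chart is the simplest to illustrate. The sketch below is a generic Shewhart-style check with limits at the baseline mean plus or minus k standard deviations. It is an assumption-laden illustration rather than the platform's implementation: the function names, the choice k = 3 and the sample readings are hypothetical, and the autoencoder, LSTM and fuzzy inference stages are not shown.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower, upper) limits at mean +/- k sample standard deviations."""
    mean = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    return mean - k * sigma, mean + k * sigma


def out_of_control(samples, limits):
    """Return the indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical in-control readings of one production parameter.
baseline = [20.1, 19.8, 20.0, 20.3, 19.9, 20.2, 20.0, 19.7]
limits = control_limits(baseline)

# New readings; the third one lies far outside the baseline spread.
flags = out_of_control([20.1, 19.9, 23.5, 20.0], limits)
print(limits, flags)
```

A check like this only says that a reading is unusual. Identifying which anomalous event occurred and how risky it is would fall to the later stages the abstract describes.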

Findings

The proposed architecture, experimentally validated on a manufacturing system involved in the production of a solar thermal high-vacuum flat panel, provides human operators with information about anomalous events, where they occur, and crucial information about their risk levels.

Practical implications

Thanks to the abnormal risk panel, human operators and business managers are able not only to remotely visualize the real-time status of each production parameter, but also to properly deal with any anomalous events, only when necessary. This is especially relevant in an emergency situation, such as the COVID-19 pandemic.

Originality/value

The monitoring platform is one of the first attempts to lead modern manufacturing systems toward the Industry 5.0 concept. Indeed, it combines human strengths, IoT technology on machines, cloud-based solutions with AI and zero-defect manufacturing strategies in a unified framework so as to detect causalities in complex dynamic systems, making it possible to avoid product waste.

Details

Journal of Manufacturing Technology Management, vol. 34 no. 4

Type: Research Article

ISSN: 1741-038X

Open Access

Article

Publication date: 18 July 2022

Enabling intrusion detection systems with dueling double deep Q-learning

Youakim Badr

Downloads

1489

Abstract

Purpose

In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) to solve very complex problems (e.g. selecting input features, considering scarce resources and constraints) that cannot be solved by classical machine learning. The authors include a comparative study to build intrusion detection based on statistical machine learning and representational learning, using knowledge discovery in databases (KDD) Cup99 and Installation Support Center of Expertise (ISCX) 2012.

Design/methodology/approach

The methodology applies a data analytics approach, consisting of data exploration and machine learning model training and evaluation. To build a network-based intrusion detection system, the authors apply dueling double deep Q-networks architecture enabled with costly features, k-nearest neighbors (K-NN), support-vector machines (SVM) and convolution neural networks (CNN).
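For readers unfamiliar with the architecture named above, the two ideas it combines can be sketched in a few lines. This shows only the standard dueling aggregation and the double Q-learning target as commonly defined in the RL literature. It is not the author's network, and the function names and numbers are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a), centring advantages on their mean."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, next_q_online, next_q_target, done):
    """Double Q-learning target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical values for a two-action problem (e.g. "alert" vs "pass").
q = dueling_q_values(1.0, [0.5, -0.5])
target = double_dqn_target(1.0, 0.9, [0.2, 0.8], [1.0, 2.0], done=False)
print(q, target)
```

Separating action selection from action evaluation is what distinguishes the double variant from a plain Q-learning target, which would take the maximum over the target network's own estimates.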

Findings

Machine learning-based intrusion detection systems are trained on historical datasets, which leads to model drift and lack of generalization, whereas RL is trained with data collected through interactions. RL is bound to learn from its interactions with a stochastic environment in the absence of a training dataset, whereas supervised learning simply learns from collected data and requires fewer computational resources.

Research limitations/implications

All machine learning models have achieved high accuracy values and performance. One potential reason is that both datasets are simulated, and not realistic. It was not clear whether a validation was ever performed to show that data were collected from real network traffic.

Practical implications

The study provides guidelines to implement IDS with classical supervised learning, deep learning and RL.

Originality/value

The research applied the dueling double deep Q-networks architecture enabled with costly features to build network-based intrusion detection from network traffic. This research presents a comparative study of reinforcement-based intrusion detection with counterparts built with statistical and representational machine learning.

Details

Digital Transformation and Society, vol. 1 no. 1

Type: Research Article

ISSN: 2755-0761

Open Access

Article

Publication date: 13 November 2018

Anomaly data management and big data analytics: an application on disability datasets

Zhiwen Pan, Wen Ji, Yiqiang Chen, Lianjun Dai and Jun Zhang

Downloads

1306

Abstract

Purpose

The disability datasets are the datasets that contain the information of disabled populations. By analyzing these datasets, professionals who work with disabled populations can have a better understanding of the inherent characteristics of the disabled populations, so that working plans and policies, which can effectively help the disabled populations, can be made accordingly.

Design/methodology/approach

In this paper, the authors proposed a big data management and analytic approach for disability datasets.

Findings

By using a set of data mining algorithms, the proposed approach can provide the following services. The data management scheme in the approach can improve the quality of disability data by estimating missing attribute values and detecting anomalous and low-quality data instances. The data mining scheme in the approach can explore useful patterns that reflect the correlation, association and interaction between the disability data attributes. Experiments based on a real-world dataset are conducted at the end to prove the effectiveness of the approach.

Originality/value

The proposed approach can enable data-driven decision-making for professionals who work with disabled populations.

Details

International Journal of Crowd Science, vol. 2 no. 2

Type: Research Article

ISSN: 2398-7294

Open Access

Article

Publication date: 20 September 2022

Open problems in medical federated learning

Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung

Downloads

3424

Abstract

Purpose

This study aims to summarize the critical issues in medical federated learning and applicable solutions. Also, detailed explanations of how federated learning techniques can be applied to the medical field are presented. About 80 reference studies described in the field were reviewed, and the federated learning framework currently being developed by the research team is provided. This paper will help researchers to build an actual medical federated learning environment.

Design/methodology/approach

Since machine learning techniques emerged, more efficient analysis was possible with a large amount of data. However, data regulations have been tightened worldwide, and the usage of centralized machine learning methods has become almost infeasible. Federated learning techniques have been introduced as a solution. Even with its powerful structural advantages, there still exist unsolved challenges in federated learning in a real medical data environment. This paper aims to summarize those by category and presents possible solutions.

Findings

This paper provides four critical categorized issues to be aware of when applying the federated learning technique to the actual medical data environment, then provides general guidelines for building a federated learning environment as a solution.

Originality/value

Existing studies have dealt with issues such as heterogeneity problems in the federated learning environment itself, but they did not address how these issues cause problems in actual working tasks. Therefore, this paper helps researchers understand the federated learning issues through examples of actual medical machine learning environments.

Details

International Journal of Web Information Systems, vol. 18 no. 2/3

Type: Research Article

ISSN: 1744-0084

Open Access

Article

Publication date: 28 April 2022

Machine learning and engineering feature approaches to detect events perturbing the indoor microclimate in Ringebu and Heddal stave churches (Norway)

Pietro Miglioranza, Andrea Scanu, Giuseppe Simionato, Nicholas Sinigaglia and America Califano

Downloads

433

Abstract

Purpose

Climate-induced damage is a pressing problem for the preservation of cultural properties. Their physical deterioration is often the cumulative effect of different environmental hazards of variable intensity. Among these, fluctuations of temperature and relative humidity may cause nonrecoverable physical changes in building envelopes and artifacts made of hygroscopic materials, such as wood. Microclimatic fluctuations may be caused by several factors, including the presence of many visitors within the historical building. Within this framework, the current work is focused on detecting events taking place in two Norwegian stave churches, by identifying the fluctuations in temperature and relative humidity caused by the presence of people attending the public events.

Design/methodology/approach

The identification of such fluctuations and, so, of the presence of people within the churches has been carried out through three different methods. The first is an unsupervised clustering algorithm here termed “density peak,” the second is a supervised deep learning model based on a standard convolutional neural network (CNN) and the third is a novel ad hoc engineering feature approach “unexpected mixing ratio (UMR) peak.”

Findings

While the first two methods may have some instabilities (in terms of precision, recall and normalized mutual information [NMI]), the last one shows a promising performance in the detection of microclimatic fluctuations induced by the presence of visitors.

Originality/value

The novelty of this work lies in using both well-established and in-house ad hoc machine learning algorithms in the field of heritage science, proving that these smart approaches could be extremely useful and could lead to quick data analyses, if used properly.

Details

International Journal of Building Pathology and Adaptation, vol. 42 no. 1

Type: Research Article

ISSN: 2398-4708

Open Access

Article

Publication date: 22 November 2022

Research on optimization of index system design and its inspection method: data quality diagnosis, index classification and stratification

Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv

Downloads

915

Abstract

Purpose

The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.

Design/methodology/approach

Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.

Findings

Based on the ideas of statistics, system theory, machine learning and data mining, the focus in the present research is on “data quality diagnosis” and “index classification and stratification” and clarifying the classification standards and data quality characteristics of index data; a data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection” is established. Using a decision tree, explanatory structural model, cluster analysis, K-means clustering and other methods, classification and hierarchical method system of indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, the scientific and standardized classification and hierarchical design of the index system can be realized.

Originality/value

The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. The second is to design a systematic quality-inspection process for missing data based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. The third is to use the modern method of unsupervised learning to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.

Details

Marine Economics and Management, vol. 5 no. 2

Type: Research Article

ISSN: 2516-158X

Open Access

Article

Publication date: 8 December 2022

A comparative study of frequentist vs Bayesian A/B testing in the detection of E-commerce fraud

James Christopher Westland

Downloads

1516

Abstract

Purpose

This paper tests whether Bayesian A/B testing yields better decisions than traditional Neyman-Pearson hypothesis testing. It proposes a model and tests it using a large, multiyear Google Analytics (GA) dataset.

Design/methodology/approach

This paper is an empirical study. Competing A/B testing models were used to analyze a large, multiyear GA dataset for a firm that relies entirely on its website and online transactions for customer engagement and sales.

Findings

Bayesian A/B tests of the data not only yielded a clear delineation of the timing and impact of the intellectual property fraud, but calculated the loss of sales dollars, traffic and time on the firm’s website, with precise confidence limits. Frequentist A/B testing identified fraud in bounce rate at 5% significance, and bounces at 10% significance, but was unable to ascertain fraud at the standard significance cutoffs for scientific studies.
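The contrast between the two families of tests can be made concrete with a small sketch. It is not the paper's model: it compares a pooled two-proportion z-test with a Beta-Binomial posterior under uniform Beta(1, 1) priors, and the bounce counts, function names, prior and number of Monte Carlo draws are all hypothetical choices made for illustration.

```python
import math
import random


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Frequentist two-sided z-test for equality of two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def prob_b_exceeds_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws


# Hypothetical bounce counts before (A) and after (B) a suspected event.
z, p_value = two_proportion_z_test(400, 1000, 450, 1000)
posterior = prob_b_exceeds_a(400, 1000, 450, 1000)
print(round(z, 2), round(p_value, 3), round(posterior, 3))
```

The frequentist function returns a p-value to compare against a significance cutoff, whereas the Bayesian function returns a direct probability that the rate rose. Posterior draws of this kind can also be turned into a distribution over the size of the change.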

Research limitations/implications

None within the scope of the research plan.

Practical implications

Bayesian A/B tests of the data not only yielded a clear delineation of the timing and impact of the IP fraud, but calculated the loss of sales dollars, traffic and time on the firm’s website, with precise confidence limits.

Social implications

Bayesian A/B testing can derive economically meaningful statistics, whereas frequentist A/B testing provides only p-values, whose meaning may be hard to grasp and whose misuse is widespread and has been a major topic in metascience. While misuse of p-values in scholarly articles may simply be grist for academic debate, the uncertainty surrounding the meaning of p-values in business analytics actually can cost firms money.

Originality/value

There is very little empirical research in e-commerce that uses Bayesian A/B testing. Almost all corporate testing is done via frequentist Neyman-Pearson methods.

Details

Journal of Electronic Business & Digital Economics, vol. 1 no. 1/2

Type: Research Article

ISSN: 2754-4214

Open Access

Article

Publication date: 16 July 2021

Big data as a value generator in decision support systems: a literature review

Gustavo Grander, Luciano Ferreira da Silva and Ernesto Del Rosário Santibañez Gonzalez

Downloads

3955

Abstract

Purpose

This paper aims to analyze how decision support systems manage Big data to obtain value.

Design/methodology/approach

A systematic literature review was performed with screening and analysis of 72 articles published between 2012 and 2019.

Findings

The findings reveal that techniques of big data analytics, machine learning algorithms and technologies predominantly related to computer science and cloud computing are used in decision support systems. Another finding was that the main areas in which these techniques and technologies are being applied are logistics, traffic, health, business and market. This article also allows authors to understand the relationship in which descriptive, predictive and prescriptive analyses are used according to an inverse relationship of complexity in data analysis and the need for human decision-making.

Originality/value

As it is an emerging theme, this study seeks to present an overview of the techniques and technologies that are being discussed in the literature to solve problems in their respective areas, as a form of theoretical contribution. The authors also understand that there is a practical contribution to the maturity of the discussion and with reflections even presented as suggestions for future research, such as the ethical discussion. This study’s descriptive classification can also serve as a guide for new researchers who seek to understand the research involving decision support systems and big data to gain value in our society.

Details

Revista de Gestão, vol. 28 no. 3

Type: Research Article

ISSN: 1809-2276
