Search results
1 – 10 of 23Prudence Kadebu, Robert T.R. Shoniwa, Kudakwashe Zvarevashe, Addlight Mukwazvure, Innocent Mapanga, Nyasha Fadzai Thusabantu and Tatenda Trust Gotora
Given how smart today’s malware authors have become through employing highly sophisticated techniques, it is only logical that methods be developed to combat the most potent…
Abstract
Purpose
Given how smart today’s malware authors have become through employing highly sophisticated techniques, it is only logical that methods be developed to combat the most potent threats, particularly where the malware is stealthy and makes indicators of compromise (IOC) difficult to detect. After the analysis is completed, the output can be employed to detect and then counteract the attack. The goal of this work is to propose a machine learning approach to improve malware detection by combining the strengths of both supervised and unsupervised machine learning techniques. This study is essential as malware has certainly become ubiquitous as cyber-criminals use it to attack systems in cyberspace. Malware analysis is required to reveal hidden IOC, to comprehend the attacker’s goal and the severity of the damage and to find vulnerabilities within the system.
Design/methodology/approach
This research proposes a hybrid approach for dynamic and static malware analysis that combines unsupervised and supervised machine learning algorithms and goes on to show how Malware exploiting steganography can be exposed.
Findings
The tactics used by malware developers to circumvent detection are becoming more advanced with steganography becoming a popular technique applied in obfuscation to evade mechanisms for detection. Malware analysis continues to call for continuous improvement of existing techniques. State-of-the-art approaches applying machine learning have become increasingly popular with highly promising results.
Originality/value
Cyber security researchers globally are grappling with devising innovative strategies to identify and defend against the threat of extremely sophisticated malware attacks on key infrastructure containing sensitive data. The process of detecting the presence of malware requires expertise in malware analysis. Applying intelligent methods to this process can aid practitioners in identifying malware’s behaviour and features. This is especially expedient where the malware is stealthy, hiding IOC.
Details
Keywords
Milad Soltani, Alexios Kythreotis and Arash Roshanpoor
The emergence of machine learning has opened a new way for researchers. It allows them to supplement the traditional manual methods for conducting a literature review and turning…
Abstract
Purpose
The emergence of machine learning has opened a new way for researchers. It allows them to supplement the traditional manual methods for conducting a literature review and turning it into smart literature. This study aims to present a framework for incorporating machine learning into financial statement fraud (FSF) literature analysis. This framework facilitates the analysis of a large amount of literature to show the trend of the field and identify the most productive authors, journals and potential areas for future research.
Design/methodology/approach
In this study, a framework was introduced that merges bibliometric analysis techniques such as word frequency, co-word analysis and coauthorship analysis with the Latent Dirichlet Allocation topic modeling approach. This framework was used to uncover subtopics from 20 years of financial fraud research articles. Furthermore, the hierarchical clustering method was used on selected subtopics to demonstrate the primary contexts in the literature on FSF.
Findings
This study has contributed to the literature in two ways. First, this study has determined the top journals, articles, countries and keywords based on various bibliometric metrics. Second, using topic modeling and then hierarchy clustering, this study demonstrates the four primary contexts in FSF detection.
Research limitations/implications
In this study, the authors tried to comprehensively view the studies related to financial fraud conducted over two decades. However, this research has limitations that can be an opportunity for future researchers. The first limitation is due to language bias. This study has focused on English language articles, so it is suggested that other researchers consider other languages as well. The second limitation is caused by citation bias. In this study, the authors tried to show the top articles based on the citation criteria. However, judging based on citation alone can be misleading. Therefore, this study suggests that the researchers consider other measures to check the citation quality and assess the studies’ precision by applying meta-analysis.
Originality/value
Despite the popularity of bibliometric analysis and topic modeling, there have been limited efforts to use machine learning for literature review. This novel approach of using hierarchical clustering on topic modeling results enable us to uncover four primary contexts. Furthermore, this method allowed us to show the keywords of each context and highlight significant articles within each context.
Details
Keywords
Bianca Caiazzo, Teresa Murino, Alberto Petrillo, Gianluca Piccirillo and Stefania Santini
This work aims at proposing a novel Internet of Things (IoT)-based and cloud-assisted monitoring architecture for smart manufacturing systems able to evaluate their overall status…
Abstract
Purpose
This work aims at proposing a novel Internet of Things (IoT)-based and cloud-assisted monitoring architecture for smart manufacturing systems able to evaluate their overall status and detect eventual anomalies occurring into the production. A novel artificial intelligence (AI) based technique, able to identify the specific anomalous event and the related risk classification for possible intervention, is hence proposed.
Design/methodology/approach
The proposed solution is a five-layer scalable and modular platform in Industry 5.0 perspective, where the crucial layer is the Cloud Cyber one. This embeds a novel anomaly detection solution, designed by leveraging control charts, autoencoders (AE) long short-term memory (LSTM) and Fuzzy Inference System (FIS). The proper combination of these methods allows, not only detecting the products defects, but also recognizing their causalities.
Findings
The proposed architecture, experimentally validated on a manufacturing system involved into the production of a solar thermal high-vacuum flat panel, provides to human operators information about anomalous events, where they occur, and crucial information about their risk levels.
Practical implications
Thanks to the abnormal risk panel; human operators and business managers are able, not only of remotely visualizing the real-time status of each production parameter, but also to properly face with the eventual anomalous events, only when necessary. This is especially relevant in an emergency situation, such as the COVID-19 pandemic.
Originality/value
The monitoring platform is one of the first attempts in leading modern manufacturing systems toward the Industry 5.0 concept. Indeed, it combines human strengths, IoT technology on machines, cloud-based solutions with AI and zero detect manufacturing strategies in a unified framework so to detect causalities in complex dynamic systems by enabling the possibility of products’ waste avoidance.
Details
Keywords
In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) to solve very complex problems (e.g. selecting input…
Abstract
Purpose
In this research, the authors demonstrate the advantage of reinforcement learning (RL) based intrusion detection systems (IDS) to solve very complex problems (e.g. selecting input features, considering scarce resources and constrains) that cannot be solved by classical machine learning. The authors include a comparative study to build intrusion detection based on statistical machine learning and representational learning, using knowledge discovery in databases (KDD) Cup99 and Installation Support Center of Expertise (ISCX) 2012.
Design/methodology/approach
The methodology applies a data analytics approach, consisting of data exploration and machine learning model training and evaluation. To build a network-based intrusion detection system, the authors apply dueling double deep Q-networks architecture enabled with costly features, k-nearest neighbors (K-NN), support-vector machines (SVM) and convolution neural networks (CNN).
Findings
Machine learning-based intrusion detection are trained on historical datasets which lead to model drift and lack of generalization whereas RL is trained with data collected through interactions. RL is bound to learn from its interactions with a stochastic environment in the absence of a training dataset whereas supervised learning simply learns from collected data and require less computational resources.
Research limitations/implications
All machine learning models have achieved high accuracy values and performance. One potential reason is that both datasets are simulated, and not realistic. It was not clear whether a validation was ever performed to show that data were collected from real network traffics.
Practical implications
The study provides guidelines to implement IDS with classical supervised learning, deep learning and RL.
Originality/value
The research applied the dueling double deep Q-networks architecture enabled with costly features to build network-based intrusion detection from network traffics. This research presents a comparative study of reinforcement-based instruction detection with counterparts built with statistical and representational machine learning.
Details
Keywords
Zhiwen Pan, Wen Ji, Yiqiang Chen, Lianjun Dai and Jun Zhang
The disability datasets are the datasets that contain the information of disabled populations. By analyzing these datasets, professionals who work with disabled populations can…
Abstract
Purpose
The disability datasets are the datasets that contain the information of disabled populations. By analyzing these datasets, professionals who work with disabled populations can have a better understanding of the inherent characteristics of the disabled populations, so that working plans and policies, which can effectively help the disabled populations, can be made accordingly.
Design/methodology/approach
In this paper, the authors proposed a big data management and analytic approach for disability datasets.
Findings
By using a set of data mining algorithms, the proposed approach can provide the following services. The data management scheme in the approach can improve the quality of disability data by estimating miss attribute values and detecting anomaly and low-quality data instances. The data mining scheme in the approach can explore useful patterns which reflect the correlation, association and interactional between the disability data attributes. Experiments based on real-world dataset are conducted at the end to prove the effectiveness of the approach.
Originality/value
The proposed approach can enable data-driven decision-making for professionals who work with disabled populations.
Details
Keywords
Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung
This study aims to summarize the critical issues in medical federated learning and applicable solutions. Also, detailed explanations of how federated learning techniques can be…
Abstract
Purpose
This study aims to summarize the critical issues in medical federated learning and applicable solutions. Also, detailed explanations of how federated learning techniques can be applied to the medical field are presented. About 80 reference studies described in the field were reviewed, and the federated learning framework currently being developed by the research team is provided. This paper will help researchers to build an actual medical federated learning environment.
Design/methodology/approach
Since machine learning techniques emerged, more efficient analysis was possible with a large amount of data. However, data regulations have been tightened worldwide, and the usage of centralized machine learning methods has become almost infeasible. Federated learning techniques have been introduced as a solution. Even with its powerful structural advantages, there still exist unsolved challenges in federated learning in a real medical data environment. This paper aims to summarize those by category and presents possible solutions.
Findings
This paper provides four critical categorized issues to be aware of when applying the federated learning technique to the actual medical data environment, then provides general guidelines for building a federated learning environment as a solution.
Originality/value
Existing studies have dealt with issues such as heterogeneity problems in the federated learning environment itself, but those were lacking on how these issues incur problems in actual working tasks. Therefore, this paper helps researchers understand the federated learning issues through examples of actual medical machine learning environments.
Details
Keywords
Pietro Miglioranza, Andrea Scanu, Giuseppe Simionato, Nicholas Sinigaglia and America Califano
Climate-induced damage is a pressing problem for the preservation of cultural properties. Their physical deterioration is often the cumulative effect of different environmental…
Abstract
Purpose
Climate-induced damage is a pressing problem for the preservation of cultural properties. Their physical deterioration is often the cumulative effect of different environmental hazards of variable intensity. Among these, fluctuations of temperature and relative humidity may cause nonrecoverable physical changes in building envelopes and artifacts made of hygroscopic materials, such as wood. Microclimatic fluctuations may be caused by several factors, including the presence of many visitors within the historical building. Within this framework, the current work is focused on detecting events taking place in two Norwegian stave churches, by identifying the fluctuations in temperature and relative humidity caused by the presence of people attending the public events.
Design/methodology/approach
The identification of such fluctuations and, so, of the presence of people within the churches has been carried out through three different methods. The first is an unsupervised clustering algorithm here termed “density peak,” the second is a supervised deep learning model based on a standard convolutional neural network (CNN) and the third is a novel ad hoc engineering feature approach “unexpected mixing ratio (UMR) peak.”
Findings
While the first two methods may have some instabilities (in terms of precision, recall and normal mutual information [NMI]), the last one shows a promising performance in the detection of microclimatic fluctuations induced by the presence of visitors.
Originality/value
The novelty of this work stands in using both well-established and in-house ad hoc machine learning algorithms in the field of heritage science, proving that these smart approaches could be of extreme usefulness and could lead to quick data analyses, if used properly.
Details
Keywords
Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv
The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems…
Abstract
Purpose
The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.
Design/methodology/approach
Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.
Findings
Based on the ideas of statistics, system theory, machine learning and data mining, the focus in the present research is on “data quality diagnosis” and “index classification and stratification” and clarifying the classification standards and data quality characteristics of index data; a data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection” is established. Using a decision tree, explanatory structural model, cluster analysis, K-means clustering and other methods, classification and hierarchical method system of indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, the scientific and standardized classification and hierarchical design of the index system can be realized.
Originality/value
The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. The second is to design a systematic quality-inspection process for missing data based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. The third is to use the modern method of unsupervised learning to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.
Details
Keywords
This paper tests whether Bayesian A/B testing yields better decisions that traditional Neyman-Pearson hypothesis testing. It proposes a model and tests it using a large, multiyear…
Abstract
Purpose
This paper tests whether Bayesian A/B testing yields better decisions that traditional Neyman-Pearson hypothesis testing. It proposes a model and tests it using a large, multiyear Google Analytics (GA) dataset.
Design/methodology/approach
This paper is an empirical study. Competing A/B testing models were used to analyze a large, multiyear dataset of GA dataset for a firm that relies entirely on their website and online transactions for customer engagement and sales.
Findings
Bayesian A/B tests of the data not only yielded a clear delineation of the timing and impact of the intellectual property fraud, but calculated the loss of sales dollars, traffic and time on the firm’s website, with precise confidence limits. Frequentist A/B testing identified fraud in bounce rate at 5% significance, and bounces at 10% significance, but was unable to ascertain fraud at the standard significance cutoffs for scientific studies.
Research limitations/implications
None within the scope of the research plan.
Practical implications
Bayesian A/B tests of the data not only yielded a clear delineation of the timing and impact of the IP fraud, but calculated the loss of sales dollars, traffic and time on the firm’s website, with precise confidence limits.
Social implications
Bayesian A/B testing can derive economically meaningful statistics, whereas frequentist A/B testing only provide p-value’s whose meaning may be hard to grasp, and where misuse is widespread and has been a major topic in metascience. While misuse of p-values in scholarly articles may simply be grist for academic debate, the uncertainty surrounding the meaning of p-values in business analytics actually can cost firms money.
Originality/value
There is very little empirical research in e-commerce that uses Bayesian A/B testing. Almost all corporate testing is done via frequentist Neyman-Pearson methods.
Details
Keywords
Gustavo Grander, Luciano Ferreira da Silva and Ernesto Del Rosário Santibañez Gonzalez
This paper aims to analyze how decision support systems manage Big data to obtain value.
Abstract
Purpose
This paper aims to analyze how decision support systems manage Big data to obtain value.
Design/methodology/approach
A systematic literature review was performed with screening and analysis of 72 articles published between 2012 and 2019.
Findings
The findings reveal that techniques of big data analytics, machine learning algorithms and technologies predominantly related to computer science and cloud computing are used on decision support systems. Another finding was that the main areas that these techniques and technologies are been applied are logistic, traffic, health, business and market. This article also allows authors to understand the relationship in which descriptive, predictive and prescriptive analyses are used according to an inverse relationship of complexity in data analysis and the need for human decision-making.
Originality/value
As it is an emerging theme, this study seeks to present an overview of the techniques and technologies that are being discussed in the literature to solve problems in their respective areas, as a form of theoretical contribution. The authors also understand that there is a practical contribution to the maturity of the discussion and with reflections even presented as suggestions for future research, such as the ethical discussion. This study’s descriptive classification can also serve as a guide for new researchers who seek to understand the research involving decision support systems and big data to gain value in our society.
Details