Search results
1 – 10 of 446Classification techniques have been applied to many applications in various fields of sciences. There are several ways of evaluating classification algorithms. The analysis of…
Abstract
Classification techniques have been applied to many applications in various fields of sciences. There are several ways of evaluating classification algorithms. The analysis of such metrics and its significance must be interpreted correctly for evaluating different learning algorithms. Most of these measures are scalar metrics and some of them are graphical methods. This paper introduces a detailed overview of the classification assessment measures with the aim of providing the basics of these measures and to show how it works to serve as a comprehensive source for researchers who are interested in this field. This overview starts by highlighting the definition of the confusion matrix in binary and multi-class classification problems. Many classification measures are also explained in details, and the influence of balanced and imbalanced data on each metric is presented. An illustrative example is introduced to show (1) how to calculate these measures in binary and multi-class classification problems, and (2) the robustness of some measures against balanced and imbalanced data. Moreover, some graphical measures such as Receiver operating characteristics (ROC), Precision-Recall, and Detection error trade-off (DET) curves are presented with details. Additionally, in a step-by-step approach, different numerical examples are demonstrated to explain the preprocessing steps of plotting ROC, PR, and DET curves.
Details
Keywords
Rui Wang, Shunjie Zhang, Shengqiang Liu, Weidong Liu and Ao Ding
The purpose is using generative adversarial network (GAN) to solve the problem of sample augmentation in the case of imbalanced bearing fault data sets and improving residual…
Abstract
Purpose
The purpose is using generative adversarial network (GAN) to solve the problem of sample augmentation in the case of imbalanced bearing fault data sets and improving residual network is used to improve the diagnostic accuracy of the bearing fault intelligent diagnosis model in the environment of high signal noise.
Design/methodology/approach
A bearing vibration data generation model based on conditional GAN (CGAN) framework is proposed. The method generates data based on the adversarial mechanism of GANs and uses a small number of real samples to generate data, thereby effectively expanding imbalanced data sets. Combined with the data augmentation method based on CGAN, a fault diagnosis model of rolling bearing under the condition of data imbalance based on CGAN and improved residual network with attention mechanism is proposed.
Findings
The method proposed in this paper is verified by the western reserve data set and the truck bearing test bench data set, proving that the CGAN-based data generation method can form a high-quality augmented data set, while the CGAN-based and improved residual with attention mechanism. The diagnostic model of the network has better diagnostic accuracy under low signal-to-noise ratio samples.
Originality/value
A bearing vibration data generation model based on CGAN framework is proposed. The method generates data based on the adversarial mechanism of GAN and uses a small number of real samples to generate data, thereby effectively expanding imbalanced data sets. Combined with the data augmentation method based on CGAN, a fault diagnosis model of rolling bearing under the condition of data imbalance based on CGAN and improved residual network with attention mechanism is proposed.
Details
Keywords
Ema Utami, Irwan Oyong, Suwanto Raharjo, Anggit Dwi Hartanto and Sumarni Adi
Gathering knowledge regarding personality traits has long been the interest of academics and researchers in the fields of psychology and in computer science. Analyzing profile…
Abstract
Purpose
Gathering knowledge regarding personality traits has long been the interest of academics and researchers in the fields of psychology and in computer science. Analyzing profile data from personal social media accounts reduces data collection time, as this method does not require users to fill any questionnaires. A pure natural language processing (NLP) approach can give decent results, and its reliability can be improved by combining it with machine learning (as shown by previous studies).
Design/methodology/approach
In this, cleaning the dataset and extracting relevant potential features “as assessed by psychological experts” are essential, as Indonesians tend to mix formal words, non-formal words, slang and abbreviations when writing social media posts. For this article, raw data were derived from a predefined dominance, influence, stability and conscientious (DISC) quiz website, returning 316,967 tweets from 1,244 Twitter accounts “filtered to include only personal and Indonesian-language accounts”. Using a combination of NLP techniques and machine learning, the authors aim to develop a better approach and more robust model, especially for the Indonesian language.
Findings
The authors find that employing a SMOTETomek re-sampling technique and hyperparameter tuning boosts the model’s performance on formalized datasets by 57% (as measured through the F1-score).
Originality/value
The process of cleaning dataset and extracting relevant potential features assessed by psychological experts from it are essential because Indonesian people tend to mix formal words, non-formal words, slang words and abbreviations when writing tweets. Organic data derived from a predefined DISC quiz website resulting 1244 records of Twitter accounts and 316.967 tweets.
Details
Keywords
Bo Wang, Guanwei Wang, Youwei Wang, Zhengzheng Lou, Shizhe Hu and Yangdong Ye
Vehicle fault diagnosis is a key factor in ensuring the safe and efficient operation of the railway system. Due to the numerous vehicle categories and different fault mechanisms…
Abstract
Purpose
Vehicle fault diagnosis is a key factor in ensuring the safe and efficient operation of the railway system. Due to the numerous vehicle categories and different fault mechanisms, there is an unbalanced fault category problem. Most of the current methods to solve this problem have complex algorithm structures, low efficiency and require prior knowledge. This study aims to propose a new method which has a simple structure and does not require any prior knowledge to achieve a fast diagnosis of unbalanced vehicle faults.
Design/methodology/approach
This study proposes a novel K-means with feature learning based on the feature learning K-means-improved cluster-centers selection (FKM-ICS) method, which includes the ICS and the FKM. Specifically, this study defines cluster centers approximation to select the initialized cluster centers in the ICS. This study uses improved term frequency-inverse document frequency to measure and adjust the feature word weights in each cluster, retaining the top τ feature words with the highest weight in each cluster and perform the clustering process again in the FKM. With the FKM-ICS method, clustering performance for unbalanced vehicle fault diagnosis can be significantly enhanced.
Findings
This study finds that the FKM-ICS can achieve a fast diagnosis of vehicle faults on the vehicle fault text (VFT) data set from a railway station in the 2017 (VFT) data set. The experimental results on VFT indicate the proposed method in this paper, outperforms several state-of-the-art methods.
Originality/value
This is the first effort to address the vehicle fault diagnostic problem and the proposed method performs effectively and efficiently. The ICS enables the FKM-ICS method to exclude the effect of outliers, solves the disadvantages of the fault text data contained a certain amount of noisy data, which effectively enhanced the method stability. The FKM enhances the distribution of feature words that discriminate between different fault categories and reduces the number of feature words to make the FKM-ICS method faster and better cluster for unbalanced vehicle fault diagnostic.
Details
Keywords
M'hamed Bilal Abidine, Mourad Oussalah, Belkacem Fergani and Hakim Lounis
Mobile phone-based human activity recognition (HAR) consists of inferring user’s activity type from the analysis of the inertial mobile sensor data. This paper aims to mainly…
Abstract
Purpose
Mobile phone-based human activity recognition (HAR) consists of inferring user’s activity type from the analysis of the inertial mobile sensor data. This paper aims to mainly introduce a new classification approach called adaptive k-nearest neighbors (AKNN) for intelligent HAR using smartphone inertial sensors with a potential real-time implementation on smartphone platform.
Design/methodology/approach
The proposed method puts forward several modification on AKNN baseline by using kernel discriminant analysis for feature reduction and hybridizing weighted support vector machines and KNN to tackle imbalanced class data set.
Findings
Extensive experiments on a five large scale daily activity recognition data set have been performed to demonstrate the effectiveness of the method in terms of error rate, recall, precision, F1-score and computational/memory resources, with several comparison with state-of-the art methods and other hybridization modes. The results showed that the proposed method can achieve more than 50% improvement in error rate metric and up to 5.6% in F1-score. The training phase is also shown to be reduced by a factor of six compared to baseline, which provides solid assets for smartphone implementation.
Practical implications
This work builds a bridge to already growing work in machine learning related to learning with small data set. Besides, the availability of systems that are able to perform on flight activity recognition on smartphone will have a significant impact in the field of pervasive health care, supporting a variety of practical applications such as elderly care, ambient assisted living and remote monitoring.
Originality/value
The purpose of this study is to build and test an accurate offline model by using only a compact training data that can reduce the computational and memory complexity of the system. This provides grounds for developing new innovative hybridization modes in the context of daily activity recognition and smartphone-based implementation. This study demonstrates that the new AKNN is able to classify the data without any training step because it does not use any model for fitting and only uses memory resources to store the corresponding support vectors.
Details
Keywords
Clara Martin-Duque, Juan José Fernández-Muñoz, Javier M. Moguerza and Aurora Ruiz-Rua
Recommendation systems are a fundamental tool for hotels to adopt a differentiating competitive strategy. The main purpose of this work is to use machine learning techniques to…
Abstract
Purpose
Recommendation systems are a fundamental tool for hotels to adopt a differentiating competitive strategy. The main purpose of this work is to use machine learning techniques to treat imbalanced data sets, not applied until now in the tourism field. These techniques have allowed the authors to analyse the influence of imbalance data on hotel recommendation models and how this phenomenon affects client dissatisfaction.
Design/methodology/approach
An opinion survey was conducted among hotel customers of different categories in 120 different countries. A total of 135.102 surveys were collected over eleven quarters. A longitudinal design was conducted during this period. A binary logistic model was applied using the function generalized lineal model (GLM).
Findings
Through the analysis of a representative amount of data, the authors empirically demonstrate that the imbalance phenomenon is systematically present in hotel recommendation surveys. In addition, the authors show that the imbalance exists independently of the period in which the survey is done, which means that it is intrinsic to recommendation surveys on this topic. The authors demonstrate the improvement of recommendation systems highlighting the presence of imbalance data and consequences for marketing strategies.
Originality/value
The main contribution of the current work is to apply to the tourism sector the framework for imbalanced data, typically used in the machine learning, improving predictive models.
Details
Keywords
Leila Ismail and Huned Materwala
Machine Learning is an intelligent methodology used for prediction and has shown promising results in predictive classifications. One of the critical areas in which machine…
Abstract
Purpose
Machine Learning is an intelligent methodology used for prediction and has shown promising results in predictive classifications. One of the critical areas in which machine learning can save lives is diabetes prediction. Diabetes is a chronic disease and one of the 10 causes of death worldwide. It is expected that the total number of diabetes will be 700 million in 2045; a 51.18% increase compared to 2019. These are alarming figures, and therefore, it becomes an emergency to provide an accurate diabetes prediction.
Design/methodology/approach
Health professionals and stakeholders are striving for classification models to support prognosis of diabetes and formulate strategies for prevention. The authors conduct literature review of machine models and propose an intelligent framework for diabetes prediction.
Findings
The authors provide critical analysis of machine learning models, propose and evaluate an intelligent machine learning-based architecture for diabetes prediction. The authors implement and evaluate the decision tree (DT)-based random forest (RF) and support vector machine (SVM) learning models for diabetes prediction as the mostly used approaches in the literature using our framework.
Originality/value
This paper provides novel intelligent diabetes mellitus prediction framework (IDMPF) using machine learning. The framework is the result of a critical examination of prediction models in the literature and their application to diabetes. The authors identify the training methodologies, models evaluation strategies, the challenges in diabetes prediction and propose solutions within the framework. The research results can be used by health professionals, stakeholders, students and researchers working in the diabetes prediction area.
Details
Keywords
Xiaomei Jiang, Shuo Wang, Wenjian Liu and Yun Yang
Traditional Chinese medicine (TCM) prescriptions have always relied on the experience of TCM doctors, and machine learning(ML) provides a technical means for learning these…
Abstract
Purpose
Traditional Chinese medicine (TCM) prescriptions have always relied on the experience of TCM doctors, and machine learning(ML) provides a technical means for learning these experiences and intelligently assists in prescribing. However, in TCM prescription, there are the main (Jun) herb and the auxiliary (Chen, Zuo and Shi) herb collocations. In a prescription, the types of auxiliary herbs are often more than the main herb and the auxiliary herbs often appear in other prescriptions. This leads to different frequencies of different herbs in prescriptions, namely, imbalanced labels (herbs). As a result, the existing ML algorithms are biased, and it is difficult to predict the main herb with less frequency in the actual prediction and poor performance. In order to solve the impact of this problem, this paper proposes a framework for multi-label traditional Chinese medicine (ML-TCM) based on multi-label resampling.
Design/methodology/approach
In this work, a multi-label learning framework is proposed that adopts and compares the multi-label random resampling (MLROS), multi-label synthesized resampling (MLSMOTE) and multi-label synthesized resampling based on local label imbalance (MLSOL), three multi-label oversampling techniques to rebalance the TCM data.
Findings
The experimental results show that after resampling, the less frequent but important herbs can be predicted more accurately. The MLSOL method is shown to be the best with over 10% improvements on average because it balances the data by considering both features and labels when resampling.
Originality/value
The authors first systematically analyzed the label imbalance problem of different sampling methods in the field of TCM and provide a solution. And through the experimental results analysis, the authors proved the feasibility of this method, which can improve the performance by 10%−30% compared with the state-of-the-art methods.
Details
Keywords
Afreen Khan, Swaleha Zubair and Samreen Khan
This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.
Abstract
Purpose
This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.
Design/methodology/approach
Dementia staging severity is clinically an essential task, so the authors used machine learning (ML) on the magnetic resonance imaging (MRI) features to locate and study the impact of various MR readings onto the classification of demented and nondemented patients. The authors used cross-sectional MRI data in this study. The designed ML approach established the role of CDR in the prognosis of inflicted and normal patients. Moreover, the pattern analysis indicated CDR as a strong cohort amongst the various attributes, with CDR to have a significant value of p < 0.01. The authors employed 20 ML classifiers.
Findings
The mean prediction accuracy varied with the various ML classifier used, with the bagging classifier (random forest as a base estimator) achieving the highest (93.67%). A series of ML analyses demonstrated that the model including the CDR score had better prediction accuracy and other related performance metrics.
Originality/value
The results suggest that the CDR score, a simple clinical measure, can be used in real community settings. It can be used to predict dementia progression with ML modeling.
Details
Keywords
Francois Du Rand, André Francois van der Merwe and Malan van Tonder
This paper aims to discuss the development of a defect classification system that can be used to detect and classify powder bed surface defects from captured layer images without…
Abstract
Purpose
This paper aims to discuss the development of a defect classification system that can be used to detect and classify powder bed surface defects from captured layer images without the need for specialised computational hardware. The idea is to develop this system by making use of more traditional machine learning (ML) models instead of using computationally intensive deep learning (DL) models.
Design/methodology/approach
The approach that is used by this study is to use traditional image processing and classification techniques that can be applied to captured layer images to detect and classify defects without the need for DL algorithms.
Findings
The study proved that a defect classification algorithm could be developed by making use of traditional ML models with a high degree of accuracy and the images could be processed at higher speeds than typically reported in literature when making use of DL models.
Originality/value
This paper addresses a need that has been identified for a high-speed defect classification algorithm that can detect and classify defects without the need for specialised hardware that is typically used when making use of DL technologies. This is because when developing closed-loop feedback systems for these additive manufacturing machines, it is important to detect and classify defects without inducing additional delays to the control system.
Details