Search results

1 – 10 of 487
Book part
Publication date: 26 October 2017

Son Nguyen, John Quinn and Alan Olinsky


Abstract

We propose an oversampling technique to increase the true positive rate (sensitivity) in classifying imbalanced datasets (i.e., those with a value for the target variable that occurs with a small frequency) and hence boost the overall performance measurements such as balanced accuracy, G-mean and area under the receiver operating characteristic (ROC) curve, AUC. This oversampling method is based on the idea of applying the Synthetic Minority Oversampling Technique (SMOTE) on only a selective portion of the dataset instead of the entire dataset. We demonstrate the effectiveness of our oversampling method with four real and simulated datasets generated from three models.
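
As a rough illustration of the idea, the sketch below applies SMOTE to only a selected portion of the training data rather than to the entire dataset, using the imbalanced-learn library. The chapter's actual selection criterion is not stated in this abstract, so a random mask stands in for it; all names and parameters here are illustrative.

import numpy as np
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Simulated imbalanced data (about 5% minority class)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Hypothetical "selective portion": a random half of the data; the chapter's
# own selection rule is not given in the abstract.
rng = np.random.default_rng(0)
mask = rng.random(len(y)) < 0.5
X_sel, y_sel = SMOTE(random_state=0).fit_resample(X[mask], y[mask])

# Recombine the oversampled portion with the untouched remainder
X_train = np.vstack([X_sel, X[~mask]])
y_train = np.concatenate([y_sel, y[~mask]])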

Details

Advances in Business and Management Forecasting
Type: Book
ISBN: 978-1-78743-069-3

Keywords

Book part
Publication date: 1 September 2021

Son Nguyen, Phyllis Schumacher, Alan Olinsky and John Quinn


Abstract

We study the performances of various predictive models including decision trees, random forests, neural networks, and linear discriminant analysis on an imbalanced data set of home loan applications. During the process, we propose our undersampling algorithm to cope with the issues created by the imbalance of the data. Our technique is shown to work competitively against popular resampling techniques such as random oversampling, undersampling, synthetic minority oversampling technique (SMOTE), and random oversampling examples (ROSE). We also investigate the relation between the true positive rate, true negative rate, and the imbalance of the data.
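
The authors' own undersampling algorithm is not described in the abstract; the sketch below only shows how the kind of comparison the abstract mentions can be set up with imbalanced-learn, using random over- and undersampling and SMOTE as baselines and the true positive/negative rates as metrics. The dataset and names are placeholders, not the home-loan data.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=4000, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

samplers = {
    "random oversampling": RandomOverSampler(random_state=1),
    "random undersampling": RandomUnderSampler(random_state=1),
    "SMOTE": SMOTE(random_state=1),
}
for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)
    pred = RandomForestClassifier(random_state=1).fit(X_res, y_res).predict(X_te)
    tpr = recall_score(y_te, pred, pos_label=1)   # true positive rate
    tnr = recall_score(y_te, pred, pos_label=0)   # true negative rate
    print(f"{name}: TPR={tpr:.3f}, TNR={tnr:.3f}")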

Article
Publication date: 4 December 2018

Zhongyi Hu, Raymond Chiong, Ilung Pranata, Yukun Bao and Yuqing Lin


Abstract

Purpose

Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones).

Design/methodology/approach

The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling.
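
A minimal sketch of the overall shape of such an integrated resampling step is shown below. SMOTE handles the oversampling via imbalanced-learn; since a full particle swarm optimiser would be too long here, a plain random search over majority-class subsets stands in for PSO, scored by balanced accuracy on a validation split. The dataset and all names are illustrative, not the authors' implementation.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=3000, weights=[0.93, 0.07], random_state=2)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=2)

# Step 1: SMOTE oversampling of the minority (malicious) class
X_sm, y_sm = SMOTE(random_state=2).fit_resample(X_tr, y_tr)

# Step 2: stand-in for PSO-based undersampling -- search for a binary mask over
# the majority class that maximises balanced accuracy on the validation split.
maj_idx = np.where(y_sm == 0)[0]
min_idx = np.where(y_sm == 1)[0]
rng = np.random.default_rng(2)
best_score, best_mask = -1.0, None
for _ in range(30):                          # PSO would iterate particles here
    mask = rng.random(len(maj_idx)) < 0.7    # keep roughly 70% of majority samples
    keep = np.concatenate([maj_idx[mask], min_idx])
    clf = LogisticRegression(max_iter=1000).fit(X_sm[keep], y_sm[keep])
    score = balanced_accuracy_score(y_val, clf.predict(X_val))
    if score > best_score:
        best_score, best_mask = score, mask   # best_mask defines the kept majority subset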

Findings

By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective.

Practical implications

This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification.

Originality/value

Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.

Article
Publication date: 20 June 2016

Lei Wang, Yongde Zhang, Shuanghui Hao, Baoyu Song, Minghui Hao and Zili Tang


Abstract

Purpose

To eliminate the angle deviation of the magnetic encoder, this paper aims to propose a compensation method based on permanent magnet synchronous motor (PMSM) sensorless control. The paper also describes the experiments performed to verify the validity of the proposed method.

Design/methodology/approach

The proposed method uses PMSM sensorless control to obtain a high-precision virtual angle value and then computes the deviation between the virtual position and the magnetic angle, which is used as a compensation table. An oversampling linear interpolation tabulation method is proposed to eliminate noise signals. Finally, a magnetic encoder with a precision (repeatability) of 0.09° and a unidirectional motion precision of 0.03° is realized. A control system in which the encoder runs at 14,000 r/min and at 0.01 r/min, showing high motion resolution, is also realized.

Findings

A higher current in the PMSM leads to a magnetic encoder with higher precision. When using oversampling linear interpolation to tabulate the compensation table, more oversampling does not necessarily lead to a better result; experiments show that using eight intervals to calculate the mean angle deviation gives the best result.
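
A simplified numerical sketch of the deviation-table idea, with synthetic data standing in for the virtual angle from sensorless control and the raw magnetic-encoder reading (neither the authors' data nor their firmware): deviations are averaged per interval to suppress noise, and the resulting table is applied by linear interpolation.

import numpy as np

rng = np.random.default_rng(0)
true_angle = np.sort(rng.uniform(0, 360, 5000))                  # virtual angle (deg), assumed
deviation = 0.5 * np.sin(np.radians(2 * true_angle))             # assumed systematic error
measured = true_angle + deviation + rng.normal(0, 0.05, 5000)    # noisy encoder reading

# Tabulate the mean deviation per interval (the abstract reports 8 intervals working best)
n_intervals = 8
edges = np.linspace(0, 360, n_intervals + 1)
centres = 0.5 * (edges[:-1] + edges[1:])
table = np.array([(measured - true_angle)[(true_angle >= lo) & (true_angle < hi)].mean()
                  for lo, hi in zip(edges[:-1], edges[1:])])

# Correct a reading by linear interpolation of the compensation table
corrected = measured - np.interp(measured, centres, table)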

Practical implications

The angle deviation compensation method proposed in this paper has strong practical and commercial applications. It can be used to self-correct magnetic encoders based on the arctangent method and to correct other rotary encoder sensors.

Originality/value

This paper proposes an adaptive correction method for a rotary encoder based on PMSM sensorless control. To eliminate noise signals in the angle compensation table, an oversampling linear interpolation tabulation method is proposed, which also guarantees the precision of the compensation table.

Details

Sensor Review, vol. 36 no. 3
Type: Research Article
ISSN: 0260-2288

Keywords

Article
Publication date: 15 March 2021

Putta Hemalatha and Geetha Mary Amalanathan


Abstract

Purpose

Adequate resources for learning and training the data are an important prerequisite for developing an efficient classifier with outstanding performance. The data usually follow a biased class distribution, that is, an unequal distribution of classes within a dataset. This issue, known as the imbalance problem, is one of the most common issues occurring in real-time applications, and learning from imbalanced datasets is a ubiquitous challenge in the field of data mining. Imbalanced data degrade the performance of the classifier by producing inaccurate results.

Design/methodology/approach

A novel fuzzy-based Gaussian synthetic minority oversampling (FG-SMOTE) algorithm is proposed to process the imbalanced data. The Gaussian SMOTE technique is based on the nearest-neighbour concept for balancing the ratio between the minority and majority classes, and this ratio is balanced using a fuzzy-based Levenshtein distance measure.
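
The fuzzy Levenshtein component is specific to the paper and is not reproduced here; the sketch below only illustrates the Gaussian-SMOTE idea the abstract builds on, i.e. generating synthetic minority points with Gaussian jitter between a minority point and one of its nearest minority neighbours. All parameters are assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=1500, weights=[0.92, 0.08], random_state=3)
X_min = X[y == 1]

# For each minority point, pick one of its k nearest minority neighbours and
# place a synthetic point between them, with small Gaussian noise added.
rng = np.random.default_rng(3)
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
_, idx = nn.kneighbors(X_min)                     # idx[:, 0] is the point itself

synthetic = []
for i, neighbours in enumerate(idx):
    j = rng.choice(neighbours[1:])                # a random true neighbour
    lam = rng.random()
    point = X_min[i] + lam * (X_min[j] - X_min[i])
    synthetic.append(point + rng.normal(0, 0.05, size=point.shape))
synthetic = np.array(synthetic)

X_bal = np.vstack([X, synthetic])
y_bal = np.concatenate([y, np.ones(len(synthetic), dtype=int)])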

Findings

The performance and accuracy of the proposed algorithm are evaluated using a deep belief network classifier. The results show the efficiency of the fuzzy-based Gaussian SMOTE technique, which achieves an AUC of 93.7%, an F1 score of 94.2% and a geometric mean score of 93.6%, computed from the confusion matrix.

Research limitations/implications

The proposed research still leaves some challenges to be addressed, such as applying FG-SMOTE to multiclass imbalanced datasets and evaluating the data imbalance problem in a distributed environment.

Originality/value

The proposed algorithm fundamentally solves the data imbalance issues and challenges involved in handling the imbalanced data. FG-SMOTE has aided in balancing minority and majority class datasets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 29 September 2020

G. Sreeram, S. Pradeep, K. Sreenivasa Rao, B. Deevana Raju and Parveen Nikhat


Abstract

Purpose

This paper aims at the precise and fast classification of network connections, which has become indispensable. The main shortcoming of current intrusion detection systems (IDS) is their low detection rate for infrequent attack connections and their high false alarm rate.

Design/methodology/approach

Network intrusion detection examines and analyses the connections occurring in a system to uncover dangerous and potentially fatal exchanges, with the aim of guaranteeing the confidentiality, availability and integrity of communication; precise and fast classification of these connections is therefore indispensable. Current IDS, however, suffer from a low detection rate for infrequent attack connections and a high false alarm rate. To address this, a hybrid methodology for IDS is proposed that pairs the discrete wavelet transform with an artificial neural network (ANN). The class imbalance in the data is handled through synthetic minority oversampling technique (SMOTE)-based oversampling of the infrequent attack classes and undersampling of the dominant class. A three-layer ANN is used for classification, and experimental results on knowledge discovery databases (KDD) data are reported in terms of accuracy and detection rate, together with true and false positive rates.
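
A compact sketch of the resampling-plus-ANN part of such a pipeline (the discrete wavelet preprocessing and the KDD data themselves are omitted; a synthetic dataset and scikit-learn's MLP stand in, so nothing here is the authors' implementation):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, recall_score
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for preprocessed (e.g. wavelet-transformed) connection records
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

# Rebalance the rare attack class, then train a small three-layer ANN
# (input layer, one hidden layer, output layer)
X_res, y_res = SMOTE(random_state=4).fit_resample(X_tr, y_tr)
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=4)
ann.fit(X_res, y_res)

pred = ann.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("detection rate (TPR):", recall_score(y_te, pred))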

Findings

A hybrid methodology that pairs the discrete wavelet transform with an ANN is proposed for IDS. The class imbalance in the data is handled through SMOTE-based oversampling of the infrequent attack classes and undersampling of the dominant class.

Originality/value

Because network intrusion detection is the process of monitoring connections to preserve the confidentiality, availability and integrity of communication, accurate classification of connections becomes necessary. The capacity issue of intrusion detection is that current systems show a low detection rate for infrequent attack connections and a high false alarm rate.

Details

International Journal of Pervasive Computing and Communications, vol. 17 no. 1
Type: Research Article
ISSN: 1742-7371

Keywords

Article
Publication date: 28 February 2019

Gabrijela Dimic, Dejan Rancic, Nemanja Macek, Petar Spalevic and Vida Drasute


Abstract

Purpose

This paper aims to address the previously unknown prediction accuracy of students' activity patterns in a blended learning environment.

Design/methodology/approach

To extract the most relevant subset of activity features, different feature-selection methods were applied. Classification models were then compared across feature subsets of different cardinality.
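
As a generic illustration of that comparison (not the authors' feature set or models), the sketch below scores a classifier on activity-feature subsets of increasing cardinality chosen by a univariate filter from scikit-learn:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for students' activity features
X, y = make_classification(n_samples=800, n_features=30, n_informative=8, random_state=5)

for k in (5, 10, 20, 30):                     # subsets of different cardinality
    model = make_pipeline(SelectKBest(f_classif, k=k),
                          DecisionTreeClassifier(random_state=5))
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{k} features: accuracy={score:.3f}")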

Findings

The experimental evaluation opposes the hypothesis that reducing feature vector dimensionality increases prediction accuracy.

Research limitations/implications

Improving prediction accuracy in the described learning environment was based on applying the synthetic minority oversampling technique, which affected the results of the correlation-based feature-selection method.

Originality/value

The major contribution of the research is the proposed methodology for selecting the optimal low-cardinality subset of students' activities and the significant improvement of prediction accuracy in a blended learning environment.

Details

Information Discovery and Delivery, vol. 47 no. 2
Type: Research Article
ISSN: 2398-6247

Keywords

Article
Publication date: 10 July 2017

Hui Li, Yu-Hui Xu and Lean Yu


Abstract

Purpose

Available information for evaluating the possibility of hospitality firm failure in emerging countries is often deficient. Oversampling can compensate for this but can also yield mixed samples, which limit prediction models’ effectiveness. This research aims to provide a feasible approach to handle possible mixed information caused by oversampling.

Design/methodology/approach

This paper uses mixed sample modelling (MSM) to evaluate the possibility of firm failure on an enlarged (oversampled) set of hospitality firms. Mixed samples are filtered out with a mixed sample index through control of a noise parameter and an outlier parameter, and meta-models are used to build MSM models for hospitality firm failure prediction, with performance compared to traditional models.
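
The mixed sample index itself is the paper's own construct and is not specified in the abstract; the sketch below only illustrates the general pattern of oversampling and then filtering out "mixed" points, with imbalanced-learn's edited-nearest-neighbours cleaning as a rough stand-in for that filtering step.

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import EditedNearestNeighbours

# Synthetic stand-in for a small, imbalanced sample of firm-failure records
X, y = make_classification(n_samples=600, weights=[0.88, 0.12], flip_y=0.05, random_state=6)

# Enlarge the failed-firm class by oversampling ...
X_over, y_over = RandomOverSampler(random_state=6).fit_resample(X, y)

# ... then remove points whose neighbourhood mostly belongs to the other class,
# a crude proxy for filtering mixed samples before building the prediction models.
X_clean, y_clean = EditedNearestNeighbours().fit_resample(X_over, y_over)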

Findings

The proposed models are helpful in predicting hospitality firm failure in the mixed-information situation caused by oversampling, and MSM significantly improves the performance of traditional models. Meanwhile, only part of the mixed hospitality samples matters for predicting firm failure in both rich- and poor-information situations.

Practical implications

This research is helpful for managers, investors, employees and customers to reduce their hospitality-related risk in the emerging Chinese market. The two-dimensional sample collection strategies, three-step prediction process and five MSM modelling principles are helpful for the practice of hospitality firm failure prediction.

Originality/value

This research provides a means of processing mixed hospitality firm samples through the early definition and proposal of MSM, which addresses the ranking information within samples in deficient information environments and improves forecasting accuracy of traditional models. Moreover, it provides empirical evidence for the validation of sample selection and sample pairing strategy in evaluating the possibility of hospitality firm failure.

Details

International Journal of Contemporary Hospitality Management, vol. 29 no. 7
Type: Research Article
ISSN: 0959-6119

Keywords

Article
Publication date: 5 May 2015

Yijiu Zhao, Houjun Wang and Zhijian Dai


Abstract

Purpose

The purpose of this paper is to present a model calibration technique for the modulated wideband converter (MWC) with a non-ideal lowpass filter. Without making any change to the system architecture, and at the cost of a moderate oversampling, the calibrated system can perform like a system with an ideal lowpass filter.

Design/methodology/approach

A known sparse test signal is used to approximate the finite impulse response (FIR) of the practical non-ideal lowpass filter. Based on the approximated FIR, a digital compensation filter is designed to calibrate the practical filter so that it meets the perfect reconstruction condition. The non-ideal sub-Nyquist samples are then filtered by the compensation filter.
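
A stripped-down numerical sketch of that two-step idea, with a random test signal and a Hanning-window FIR standing in for the practical lowpass filter (none of this reflects the authors' actual MWC hardware or algorithm): the FIR is estimated by least squares from the known input/output pair, and a regularised inverse of its frequency response acts as the digital compensation filter.

import numpy as np

rng = np.random.default_rng(7)
n, taps = 1024, 17
x = rng.standard_normal(n)                          # known test signal (sparse in the paper)
h_true = np.hanning(taps) / np.hanning(taps).sum()  # stand-in for the non-ideal lowpass
y = np.convolve(x, h_true)[:n]                      # measured filter output

# Least-squares FIR estimate: y ~= X @ h, with X built from delayed copies of x
X = np.column_stack([np.concatenate([np.zeros(k), x[:n - k]]) for k in range(taps)])
h_est, *_ = np.linalg.lstsq(X, y, rcond=None)

# Regularised inverse of the estimated response = digital compensation filter
H = np.fft.rfft(h_est, n)
comp = np.conj(H) / (np.abs(H) ** 2 + 1e-3)
x_compensated = np.fft.irfft(np.fft.rfft(y) * comp, n)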

Findings

Experimental results indicate that, by calibrating the MWC with the proposed algorithm, the impact of the non-ideal lowpass filter can be avoided and the performance of signal reconstruction can be improved significantly.

Originality/value

Without making any change to the MWC architecture, the proposed algorithm can calibrate the non-ideal lowpass filter. By filtering the non-ideal sub-Nyquist samples with the designed compensation filter, the original signal can be reconstructed with high accuracy.

Details

COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 34 no. 3
Type: Research Article
ISSN: 0332-1649

Keywords

Open Access
Article
Publication date: 29 January 2024

Miaoxian Guo, Shouheng Wei, Chentong Han, Wanliang Xia, Chao Luo and Zhijian Lin


Abstract

Purpose

Surface roughness has a serious impact on the fatigue strength, wear resistance and life of mechanical products. Capturing the evolution of surface quality through theoretical modeling takes considerable effort. To predict the surface roughness in milling, this paper aims to construct a neural network based on deep learning and data augmentation.

Design/methodology/approach

This study proposes a method consisting of three steps. Firstly, a machine tool multisource data acquisition platform is established, which combines sensor monitoring with machine tool communication to collect processing signals. Secondly, feature parameters are extracted to reduce interference and improve the model's generalization ability. Thirdly, for different objectives, the parameters of the deep belief network (DBN) model are optimized by the Tent-SSA algorithm to achieve more accurate roughness classification and regression prediction.
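
scikit-learn has no DBN and the Tent-SSA optimiser is specific to the paper, so the sketch below substitutes an MLP classifier and a plain randomised hyperparameter search; it only illustrates the combination of ADASYN class balancing with hyperparameter optimisation that the third step describes, on placeholder data.

from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from imblearn.over_sampling import ADASYN

# Synthetic stand-in for roughness-class labels derived from machining signals
X, y = make_classification(n_samples=1200, weights=[0.8, 0.2], random_state=8)

# Step 1: balance the classes with ADASYN
X_bal, y_bal = ADASYN(random_state=8).fit_resample(X, y)

# Step 2: stand-in for Tent-SSA -- randomised search over network hyperparameters
search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=8),
    param_distributions={"hidden_layer_sizes": [(16,), (32,), (64,), (32, 16)],
                         "alpha": [1e-4, 1e-3, 1e-2]},
    n_iter=8, cv=3, random_state=8)
search.fit(X_bal, y_bal)
print(search.best_params_, search.best_score_)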

Findings

The adaptive synthetic sampling (ADASYN) algorithm can improve the classification prediction accuracy of DBN from 80.67% to 94.23%. After the DBN parameters were optimized by Tent-SSA, the roughness prediction accuracy was significantly improved. For the classification model, the prediction accuracy is improved by 5.77% based on ADASYN optimization. For regression models, different objective functions can be set according to production requirements, such as root-mean-square error (RMSE) or MaxAE, and the error is reduced by more than 40% compared to the original model.

Originality/value

A roughness prediction model based on multiple monitoring signals is proposed, which reduces the dependence on the acquisition of environmental variables and enhances the model's applicability. Furthermore, with the ADASYN algorithm, the Tent-SSA intelligent optimization algorithm is introduced to optimize the hyperparameters of the DBN model and improve the optimization performance.

Details

Journal of Intelligent Manufacturing and Special Equipment, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2633-6596

Keywords
