We propose an oversampling technique to increase the true positive rate (sensitivity) when classifying imbalanced datasets (i.e., those in which one value of the target variable occurs with low frequency) and hence improve overall performance measures such as balanced accuracy, G-mean, and the area under the receiver operating characteristic (ROC) curve (AUC). The method is based on applying the Synthetic Minority Oversampling Technique (SMOTE) to only a selected portion of the dataset rather than to the entire dataset. We demonstrate the effectiveness of our oversampling method with four real and simulated datasets generated from three models.
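As an illustration only, the idea of restricting SMOTE to part of the data, and the sensitivity-based measures named above, can be sketched in plain Python. The abstract does not say how the portion is chosen, so the `select` predicate below is a hypothetical, caller-supplied rule and not the selection rule used in the chapter; the function names, the toy data, and the parameter values are likewise assumptions made for this sketch.

```python
import math
import random


def smote(points, n_new, k=5, rng=None):
    """Basic SMOTE: interpolate between a point and one of its k nearest neighbours."""
    rng = rng or random.Random(0)
    if len(points) < 2:
        raise ValueError("SMOTE needs at least two points")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        others = points[:i] + points[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def selective_smote(X, y, select, n_new, minority=1, k=5, rng=None):
    """Run SMOTE only on minority rows accepted by `select` (hypothetical rule)."""
    chosen = [x for x, label in zip(X, y) if label == minority and select(x)]
    new_rows = smote(chosen, n_new, k=k, rng=rng)
    return list(X) + new_rows, list(y) + [minority] * len(new_rows)


def sensitivity_specificity(y_true, y_pred, positive=1):
    """True positive rate and true negative rate from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(sens, spec):
    return (sens + spec) / 2


def g_mean(sens, spec):
    return math.sqrt(sens * spec)


# Toy data: six majority rows (label 0) and four minority rows (label 1).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (1.0, 1.0), (1.1, 0.9), (1.2, 1.1), (3.0, 3.0)]
y = [0] * 6 + [1] * 4

# Hypothetical selection rule: oversample only minority rows with x[0] < 2.
X_aug, y_aug = selective_smote(X, y, lambda x: x[0] < 2.0, n_new=3, k=2)

# Measures for a made-up set of predictions.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0, 0],
                                     [1, 1, 0, 0, 0, 0, 0, 1])
print(sens, spec, balanced_accuracy(sens, spec), g_mean(sens, spec))
```

Because each synthetic row is an interpolation between two selected minority rows, the excluded row `(3.0, 3.0)` has no influence on where new points are placed. That is the only difference between this sketch and SMOTE applied to the whole minority class.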
Nguyen, S., Quinn, J. and Olinsky, A. (2017), "An Oversampling Technique for Classifying Imbalanced Datasets", Advances in Business and Management Forecasting (Advances in Business and Management Forecasting, Vol. 12), Emerald Publishing Limited, pp. 63-80. https://doi.org/10.1108/S1477-407020170000012004
Copyright © 2018 Emerald Publishing Limited