Search results
1 – 10 of over 2000
Abstract
Purpose
Single-shot multi-category clothing recognition and retrieval play a crucial role in online searching and offline settlement scenarios. Existing clothing recognition methods based on RGBD clothing images often suffer from high-dimensional feature representations, leading to compromised performance and efficiency.
Design/methodology/approach
To address this issue, this paper proposes a novel method called Manifold Embedded Discriminative Feature Selection (MEDFS) to select global and local features, thereby reducing the dimensionality of the feature representation and improving performance. Specifically, by combining three global features and three local features, a low-dimensional embedding is constructed to capture the correlations between features and categories. The MEDFS method designs an optimization framework utilizing manifold mapping and sparse regularization to achieve feature selection. The optimization objective is solved using an alternating iterative strategy, ensuring convergence.
Findings
Empirical studies conducted on a publicly available RGBD clothing image dataset demonstrate that the proposed MEDFS method achieves highly competitive clothing classification performance while maintaining efficiency in clothing recognition and retrieval.
Originality/value
This paper introduces a novel approach for multi-category clothing recognition and retrieval, incorporating the selection of global and local features. The proposed method holds potential for practical applications in real-world clothing scenarios.
Chong Wu, Xiaofang Chen and Yongjie Jiang
Abstract
Purpose
While the Chinese securities market is booming, the phenomenon of listed companies falling into financial distress is also emerging, which affects the operation and development of enterprises and also jeopardizes the interests of investors. Therefore, it is important to understand how to accurately and reasonably predict the financial distress of enterprises.
Design/methodology/approach
In the present study, ensemble feature selection (EFS) and improved stacking were used for financial distress prediction (FDP). Mutual information, analysis of variance (ANOVA), random forest (RF), genetic algorithms, and recursive feature elimination (RFE) were chosen as the feature selectors for EFS. Because information may be lost when the outputs of the base learners are fed directly into the meta-learner, the features with high importance were fed into the meta-learner alongside those outputs. A screening layer was added to select the meta-learner with better performance. Finally, the learners were tuned to their optimal hyperparameters.
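The abstract does not say how the individual selectors are combined, so the following is only a minimal sketch under one plausible assumption: each selector votes for its top-ranked features, and a feature is kept when enough selectors agree. The function name `ensemble_select`, the feature names and all scores are hypothetical and are not taken from the study.

```python
from collections import Counter


def ensemble_select(score_tables, top_k, min_votes):
    """Keep features that land in the top_k of at least min_votes selectors."""
    votes = Counter()
    for scores in score_tables:
        # Rank this selector's features from highest to lowest score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        votes.update(ranked[:top_k])
    # Sort the survivors so the result is deterministic.
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Made-up scores standing in for three of the selectors named in the abstract.
mutual_info = {"roa": 0.30, "leverage": 0.25, "current_ratio": 0.10, "firm_age": 0.02}
anova_f = {"roa": 12.0, "leverage": 3.0, "current_ratio": 9.0, "firm_age": 0.5}
forest_importance = {"roa": 0.40, "leverage": 0.35, "current_ratio": 0.15, "firm_age": 0.10}

selected = ensemble_select(
    [mutual_info, anova_f, forest_importance], top_k=2, min_votes=2
)
print(selected)
```

Majority voting is only one way to aggregate; averaging ranks or taking the union of the selectors' choices would be equally reasonable readings of "ensemble feature selection".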
Findings
An empirical study was conducted with a sample of A-share listed companies in China. The F1-score of the model constructed using the features screened by EFS reached 84.55%, representing an improvement of 4.37% compared to the original features. To verify the effectiveness of improved stacking, benchmark model comparison experiments were conducted. Compared to the original stacking model, the accuracy of the improved stacking model was improved by 0.44%, and the F1-score was improved by 0.51%. In addition, the improved stacking model had the highest area under the curve (AUC) value (0.905) among all the compared models.
Originality/value
Compared to previous models, the proposed FDP model has better performance, thus bridging the research gap of feature selection. The present study provides new ideas for stacking improvement research and a reference for subsequent research in this field.
Faris Elghaish, Sandra Matarneh, Essam Abdellatef, Farzad Rahimian, M. Reza Hosseini and Ahmed Farouk Kineber
Abstract
Purpose
Cracks are prevalent signs of pavement distress found on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly considered as an optimal solution. Consequently, this paper introduces a novel, fully connected, optimised convolutional neural network (CNN) model using feature selection algorithms for the purpose of detecting cracks in highway pavements.
Design/methodology/approach
To enhance the accuracy of the CNN model for crack detection, the authors employed a CNN model with fully connected deep learning layers along with several optimisation techniques. Specifically, three optimisation algorithms, namely adaptive moment estimation (ADAM), stochastic gradient descent with momentum (SGDM), and RMSProp, were used to fine-tune the CNN model and enhance its overall performance. Subsequently, the authors applied eight feature selection algorithms to further improve the accuracy of the optimised CNN model, identifying the most relevant features contributing to crack detection in the given dataset. Finally, the authors tested the proposed model against seven pre-trained models.
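To make the difference between the three optimisers concrete, here is a minimal sketch of their textbook update rules applied to a one-parameter toy loss, f(w) = (w - 3)^2. The loss, the learning rates and the step counts are assumptions chosen for illustration; they are not the settings used to train the CNN in the paper.

```python
import math


def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)


def sgdm(w, steps, lr=0.1, momentum=0.9):
    """Stochastic gradient descent with momentum (here on an exact gradient)."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


def rmsprop(w, steps, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: scale each step by a running average of squared gradients."""
    sq_avg = 0.0
    for _ in range(steps):
        g = grad(w)
        sq_avg = decay * sq_avg + (1 - decay) * g * g
        w -= lr * g / (math.sqrt(sq_avg) + eps)
    return w


def adam(w, steps, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum plus RMSProp-style scaling, with bias correction."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


for name, result in [
    ("SGDM", sgdm(0.0, 500)),
    ("RMSProp", rmsprop(0.0, 2000)),
    ("ADAM", adam(0.0, 1000)),
]:
    print(f"{name}: w = {result:.4f}")
```

All three should end close to the minimiser w = 3. In a real network the same rules are applied element-wise to every weight, using mini-batch gradients.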
Findings
The study's results show that the accuracy of the five-layer deep learning model with the three optimisers (ADAM, SGDM, and RMSProp) is 97.4%, 98.2%, and 96.09%, respectively. Following this, eight feature selection algorithms were applied to the five-layer model to enhance accuracy, with particle swarm optimisation (PSO) achieving the highest F-score at 98.72%. The model was then compared with other pre-trained models and exhibited the highest performance.
Practical implications
With an achieved precision of 98.19% and F-score of 98.72% using PSO, the developed model is highly accurate and effective in detecting and evaluating the condition of cracks in pavements. As a result, the model has the potential to significantly reduce the effort required for crack detection and evaluation.
Originality/value
The proposed method for enhancing CNN model accuracy in crack detection stands out for its combination of optimisation algorithms (ADAM, SGDM, and RMSProp) with the systematic application of multiple feature selection techniques to identify relevant crack detection features, and for its comparison of results with existing pre-trained models.
Fung Yuen Chin, Kong Hoong Lem and Khye Mun Wong
Abstract
Purpose
The number of features in handwritten digit data is often very large due to the different aspects of personal handwriting, leading to high-dimensional data. Therefore, employing a feature selection algorithm becomes crucial for successful classification modeling, because the inclusion of irrelevant or redundant features can mislead the modeling algorithms, resulting in overfitting and a decrease in efficiency.
Design/methodology/approach
The minimum redundancy and maximum relevance (mRMR) and the recursive feature elimination (RFE) algorithms are two frequently used feature selection algorithms. While mRMR is capable of identifying a subset of features that are highly relevant to the target classification variable, it has the weakness of still capturing redundant features. RFE, on the other hand, can effectively eliminate the less important features and exclude redundant ones, but the features it selects are not ranked by importance.
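As an illustration of the mRMR half of this comparison only, the sketch below implements one common greedy variant: at each step it picks the feature whose mutual information with the target, minus its mean mutual information with the already selected features, is largest. The discrete toy data and the names `f1`, `f1_copy` and `f2` are hypothetical; the abstract does not state which mRMR criterion or estimator the authors used, and the SVM-based RFE stage is not shown.

```python
import math
from collections import Counter


def mutual_info(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: maximise relevance minus mean redundancy."""
    relevance = {name: mutual_info(col, target) for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_info(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < k:
        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Toy data: f1_copy duplicates f1, while f2 is weaker but less redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f1_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2": [0, 0, 1, 0, 1, 1, 0, 1],
}
picks = mrmr(features, target, k=2)
print(picks)
```

In this toy case the exact duplicate is penalised heavily by the redundancy term, so the weaker but complementary feature is chosen second.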
Findings
The hybrid method was demonstrated on binary classification between digits “4” and “9” and between digits “6” and “8” from a multiple features dataset. The results showed that the hybrid of mRMR and support vector machine recursive feature elimination (SVMRFE) performs better than both the support vector machine (SVM) alone and mRMR.
Originality/value
In view of the respective strengths and deficiencies of mRMR and RFE, this study combined the two methods and used an SVM as the underlying classifier, anticipating that mRMR would complement SVMRFE well.
Mohd Mustaqeem, Suhel Mustajab and Mahfooz Alam
Abstract
Purpose
Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we propose a novel hybrid approach that combines Gray Wolf Optimization with Feature Selection (GWOFS) and multilayer perceptron (MLP) for SDP. The GWOFS-MLP hybrid model is designed to optimize feature selection, ultimately enhancing the accuracy and efficiency of SDP. Gray Wolf Optimization, inspired by the social hierarchy and hunting behavior of gray wolves, is employed to select a subset of relevant features from an extensive pool of potential predictors. This study investigates the key challenges that traditional SDP approaches encounter and proposes promising solutions to overcome time complexity and the curse of dimensionality.
Design/methodology/approach
The integration of GWOFS and MLP results in a robust hybrid model that can adapt to diverse software datasets. This feature selection process harnesses the cooperative hunting behavior of wolves, allowing for the exploration of critical feature combinations. The selected features are then fed into an MLP, a powerful artificial neural network (ANN) known for its capability to learn intricate patterns within software metrics. MLP serves as the predictive engine, utilizing the curated feature set to model and classify software defects accurately.
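The wolf-pack search can be sketched in a few lines. The version below is a simplified binary Gray Wolf Optimizer, not the authors' GWOFS: positions live in [0, 1], a feature counts as selected when its coordinate exceeds 0.5, and every wolf moves toward the three best wolves (alpha, beta, delta). The thresholding rule, the toy fitness function and all parameter values are assumptions; in the paper's setting the fitness would come from the MLP's predictive performance.

```python
import random


def to_mask(position):
    """A feature is selected when its coordinate exceeds 0.5 (assumed rule)."""
    return [1 if p > 0.5 else 0 for p in position]


def gwo_select(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Simplified binary Gray Wolf Optimizer that maximises fitness(mask)."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_mask, best_fit = None, float("-inf")
    for it in range(n_iter):
        ranked = sorted(wolves, key=lambda w: fitness(to_mask(w)), reverse=True)
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta and delta wolves
        top_fit = fitness(to_mask(leaders[0]))
        if top_fit > best_fit:
            best_mask, best_fit = to_mask(leaders[0]), top_fit
        a = 2.0 * (1 - it / n_iter)  # exploration factor shrinks from 2 toward 0
        for w in wolves:
            for j in range(n_features):
                pulls = []
                for lead in leaders:
                    A = a * (2 * rng.random() - 1)
                    C = 2 * rng.random()
                    pulls.append(lead[j] - A * abs(C * lead[j] - w[j]))
                # Average the three pulls and clip back into [0, 1].
                w[j] = min(1.0, max(0.0, sum(pulls) / 3))
    return best_mask, best_fit


RELEVANT = {0, 2}  # hypothetical indices of the truly useful features


def toy_fitness(mask):
    # Reward relevant features, penalise every extra feature that is kept.
    hits = sum(mask[j] for j in RELEVANT)
    return hits - 0.5 * (sum(mask) - hits)


mask, fit = gwo_select(toy_fitness, n_features=5)
print(mask, fit)
```

The result depends on the random seed, so no particular output is claimed here; the function returns the best mask seen at the start of any iteration together with its fitness.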
Findings
The performance evaluation of the GWOFS-MLP hybrid model on a real-world software defect dataset demonstrates its effectiveness. The model achieves a remarkable training accuracy of 97.69% and a testing accuracy of 97.99%. Additionally, the receiver operating characteristic area under the curve (ROC-AUC) score of 0.89 highlights the model’s ability to discriminate between defective and defect-free software components.
Originality/value
Experimental implementations using machine learning-based techniques with feature reduction are conducted to validate the proposed solutions. The goal is to enhance SDP’s accuracy, relevance and efficiency, ultimately improving software quality assurance processes. The confusion matrix further illustrates the model’s performance, with only a small number of false positives and false negatives.
Yong Gui and Lanxin Zhang
Abstract
Purpose
Influenced by the constantly changing manufacturing environment, no single dispatching rule (SDR) can consistently obtain better scheduling results than other rules for the dynamic job-shop scheduling problem (DJSP). Although the dynamic SDR selection classifier (DSSC) mined by traditional data-mining-based scheduling methods has shown some improvement over an SDR, the enhancement is not significant since the rule selected by the DSSC is still an SDR.
Design/methodology/approach
This paper presents a novel data-mining-based scheduling method for the DJSP with machine failure aiming at minimizing the makespan. Firstly, a scheduling priority relation model (SPRM) is constructed to determine the appropriate priority relation between two operations based on the production system state and the difference between their priority values calculated using multiple SDRs. Subsequently, a training sample acquisition mechanism based on the optimal scheduling schemes is proposed to acquire training samples for the SPRM. Furthermore, feature selection and machine learning are conducted using the genetic algorithm and extreme learning machine to mine the SPRM.
Findings
Results from numerical experiments demonstrate that the SPRM, mined by the proposed method, not only achieves better scheduling results in most manufacturing environments but also maintains a higher level of stability in diverse manufacturing environments than an SDR and the DSSC.
Originality/value
This paper constructs an SPRM and mines it using data mining technologies to obtain better results than an SDR and the DSSC in various manufacturing environments.
Khalid Iqbal and Muhammad Shehrayar Khan
Abstract
Purpose
In this digital era, email is the most pervasive form of communication between people. Many users fall victim to spam emails, and their data are exposed.
Design/methodology/approach
Researchers have contributed to solving this problem by focusing on advanced machine learning algorithms and improved models for detecting spam emails, but there is still a gap regarding features. Features also play an important role in achieving good results. To evaluate the performance of the applied classifiers, 10-fold cross-validation is used.
Findings
The results confirm that spam emails are correctly classified with an accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network, higher than the other applied machine learning classifiers.
Originality/value
In this paper, Point-Biserial correlation is computed between each feature and the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on the selected features by training different classifiers.
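A minimal sketch of this kind of filter, assuming features are kept when the absolute point-biserial correlation with the 0/1 class label reaches a chosen threshold. The threshold, the column names and the six toy rows are hypothetical; the abstract does not give the selection rule actually used.

```python
import math
import statistics


def point_biserial(values, labels):
    """Point-biserial correlation between a numeric feature and 0/1 labels."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0 or not ones or not zeros:
        return 0.0  # correlation is undefined here; treat as uninformative
    gap = statistics.fmean(ones) - statistics.fmean(zeros)
    return gap / spread * math.sqrt(len(ones) * len(zeros) / len(values) ** 2)


def select_features(columns, labels, threshold):
    """Keep the features whose absolute correlation meets the threshold."""
    scores = {name: point_biserial(col, labels) for name, col in columns.items()}
    return [name for name, r in scores.items() if abs(r) >= threshold]


# Made-up word-frequency columns; label 1 = spam, 0 = not spam.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "freq_free": [0.9, 0.8, 0.7, 0.1, 0.0, 0.2],
    "freq_the": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}
kept = select_features(columns, labels, threshold=0.3)
print(kept)
```

The point-biserial coefficient is numerically the same as the Pearson correlation between the feature and the binary label, so any Pearson implementation could be substituted.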
Vaishali Rajput, Preeti Mulay and Chandrashekhar Madhavrao Mahajan
Abstract
Purpose
Nature’s evolution has shaped intelligent behaviors in creatures like insects and birds, inspiring the field of Swarm Intelligence. Researchers have developed bio-inspired algorithms to address complex optimization problems efficiently. These algorithms strike a balance between computational efficiency and solution optimality, attracting significant attention across domains.
Design/methodology/approach
Bio-inspired optimization techniques for feature engineering and their applications are systematically reviewed, with the chief objective of assessing the statistical influence and significance of “Bio-inspired optimization”-based computational models by referring to the vast research literature published between 2015 and 2022.
Findings
The Scopus and Web of Science databases were explored for the review, with a focus on parameters such as country-wise publications, keyword occurrences and citations per year. Springer and IEEE emerge as the leading publishers, with prominent journals including PLoS ONE, Neural Computing and Applications, Lecture Notes in Computer Science and IEEE Transactions. The “National Natural Science Foundation” of China and the “Ministry of Electronics and Information Technology” of India lead in funding projects in this area. China, India and Germany stand out as leaders in publications related to bio-inspired algorithms for feature engineering research.
Originality/value
The review findings integrate various bio-inspired algorithm selection techniques over a diverse spectrum of optimization techniques. Ant colony optimization contributes decentralized and cooperative search strategies, bee colony optimization (BCO) improves collaborative decision-making, particle swarm optimization balances exploration and exploitation, and bio-inspired algorithms as a whole offer a range of nature-inspired heuristics.
K.V. Sheelavathy and V. Udaya Rani
Abstract
Purpose
Internet of Things (IoT) is a network that connects various physical objects such as smart machines, smart home appliances and so on. Each physical object is allocated a unique internet address, namely, an Internet Protocol address, which is used to broadcast data to external objects over the internet. The sudden increase in the number of attacks generated by intruders causes security-related problems in IoT devices during communication. The main purpose of this paper is to develop an effective attack detection method to enhance robustness against attackers in IoT.
Design/methodology/approach
In this research, the lasso regression algorithm is proposed along with an ensemble classifier for identifying IoT attacks. The lasso algorithm is used for feature selection, producing sparse models with fewer parameters. This type of regression is useful when parts of model selection, such as parameter elimination, are needed. The lasso regression obtains the subset of predictors that lowers the prediction error with respect to a quantitative response variable. The lasso imposes a constraint on the model parameters that causes the coefficients of some variables to shrink to zero. The selected features are classified using an ensemble classifier, which combines models to handle both the linear and nonlinear types of data in the dataset.
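How the lasso drives coefficients to exactly zero can be shown with a small coordinate-descent sketch that minimises (1/2n)·||y − Xw||² + α·||w||₁ without an intercept. The toy data, the value of α and the function names are assumptions made for illustration; this is not the authors' implementation, and the ensemble classifier that follows the selection step is not shown.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; exactly zero inside [-alpha, alpha]."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso(X, y, alpha, sweeps=200):
    """Coordinate descent for the lasso objective (no intercept term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(w[k] * X[i][k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Toy data: the response depends only on the first column.
X = [
    [1.0, 0.5, -1.0],
    [2.0, -0.5, 1.0],
    [3.0, 0.5, 1.0],
    [4.0, -0.5, -1.0],
    [5.0, 0.5, -1.0],
    [6.0, -0.5, 1.0],
]
y = [2.0 * row[0] for row in X]

weights = lasso(X, y, alpha=0.1)
selected = [j for j, coef in enumerate(weights) if coef != 0.0]
print(weights, selected)
```

The soft-threshold step is what sets weakly correlated coefficients to exactly zero, and the indices with non-zero weights are the features passed on to the classifier.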
Findings
The lasso regression with ensemble classifier–based attack classification, covering distributed denial-of-service and Mirai botnet attacks, achieved an accuracy of 99.981%, an improvement over conventional deep neural network (DNN) methods.
Originality/value
Here, an efficient lasso regression algorithm is developed for extracting the features to perform the network anomaly detection using ensemble classifier.
Oladosu Oyebisi Oladimeji, Abimbola Oladimeji and Olayanju Oladimeji
Abstract
Purpose
Diabetes is one of the life-threatening chronic diseases, already affecting 422 million people globally according to a World Health Organization (WHO) report as of 2018. It costs individuals, governments and groups a great deal, from the diagnosis stage to the treatment stage. One reason for this cost, among others, is that the disease requires long-term treatment. The disease is likely to continue to affect more people because of its long asymptomatic phase, which makes its early detection not feasible.
Design/methodology/approach
In this study, the authors present machine learning models with feature selection that can detect diabetes at an early stage. The models presented are also inexpensive and available to everyone, including those in remote areas.
Findings
The study results show that feature selection helps in obtaining a better model, as it prevents overfitting and removes redundant data. Compared with previous research, the study achieves better results when evaluated on metrics such as F-measure, the Precision-Recall curve and Receiver Operating Characteristic Area Under Curve. This finding has the potential to impact clinical practice when health workers aim to diagnose diabetes at an early stage.
Originality/value
This study has not been published anywhere else.