Search results

1 – 10 of over 7000
Open Access
Article
Publication date: 25 July 2022

Fung Yuen Chin, Kong Hoong Lem and Khye Mun Wong


Abstract

Purpose

The number of features in handwritten digit data is often very large because of the many aspects of personal handwriting, leading to high-dimensional data. Therefore, employing a feature selection algorithm becomes crucial for successful classification modeling, because the inclusion of irrelevant or redundant features can mislead the modeling algorithm, resulting in overfitting and a decrease in efficiency.

Design/methodology/approach

The minimum redundancy maximum relevance (mRMR) and recursive feature elimination (RFE) algorithms are two frequently used feature selection methods. While mRMR can identify a subset of features highly relevant to the target classification variable, it tends to capture redundant features along the way. RFE, on the other hand, effectively eliminates less important features and excludes redundant ones, but the features it selects are not ranked by importance.

Findings

The hybrid method was exemplified in binary classifications between digits “4” and “9” and between digits “6” and “8” from a multiple features dataset. The results showed that the hybrid mRMR + support vector machine recursive feature elimination (SVMRFE) performed better than both the sole support vector machine (SVM) and mRMR.

Originality/value

In view of the respective strengths and deficiencies of mRMR and RFE, this study combined the two methods, using an SVM as the underlying classifier, anticipating that mRMR would make an excellent complement to SVMRFE.
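
As a concrete illustration of the hybrid pipeline, the sketch below chains the two stages with scikit-learn: a greedy mutual-information mRMR pre-filter followed by SVM-RFE on the surviving candidates. This is a minimal sketch, not the authors' code; the digits used (scikit-learn's 0-vs-1 subset standing in for “4” vs “9”), the correlation-based redundancy term and the subset sizes are all assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Binary digit problem as a stand-in for the paper's "4" vs "9" task.
X, y = load_digits(n_class=2, return_X_y=True)
X = X[:, X.std(axis=0) > 0]          # drop constant pixels up front

def mrmr(X, y, k):
    """Greedy mRMR: maximize relevance (MI with y), penalize mean redundancy."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            if relevance[j] - redundancy > best_score:
                best_j, best_score = j, relevance[j] - redundancy
        selected.append(best_j)
    return selected

candidates = mrmr(X, y, k=20)                        # stage 1: mRMR pre-filter
rfe = RFE(SVC(kernel="linear"), n_features_to_select=10).fit(X[:, candidates], y)
final = [candidates[i] for i in np.flatnonzero(rfe.support_)]  # stage 2: SVM-RFE
print(cross_val_score(SVC(kernel="linear"), X[:, final], y, cv=5).mean())
```

One practical upside of this ordering is cost: RFE refits the SVM once per eliminated feature, so pre-filtering with mRMR shrinks the pool RFE has to iterate over.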

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964


Article
Publication date: 12 June 2020

Sandeepkumar Hegde and Monica R. Mundada


Abstract

Purpose

According to the World Health Organization, by 2025 chronic diseases are expected to account for 73% of all deaths and for 60% of the global burden of disease. These diseases persist over a long duration, are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered the three major chronic diseases whose risk increases in adults as they get older; among them, CKD is considered the major one. Overall, 10% of the world's population is affected by CKD, and this figure is likely to double by the year 2030. The paper aims to propose a novel feature selection approach that, in combination with a machine-learning algorithm, can predict chronic disease early with utmost accuracy. Hence, a novel adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with a hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease.

Design/methodology/approach

A novel APDFS feature selection algorithm is proposed that explicitly handles each feature's association with the class label through relevance and redundancy analysis. The algorithm applies statistical divergence-based information theory to identify relationships between the distant features of the chronic disease data set. The data sets required for the experiments were obtained from several medical labs and hospitals in India. The HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves results competitive with existing work in most cases.

Findings

The performance of the proposed framework is validated using metrics such as recall, precision, F1 measure and ROC. Its predictive performance is analyzed on data sets for various chronic diseases, such as CKD, diabetes and heart disease, and its diagnostic ability is demonstrated by comparing its results with existing algorithms. The experiments show that the proposed framework performs exceptionally well in the early prediction of CKD, with an accuracy of 91.6%.

Originality/value

The capability of machine learning algorithms depends on feature selection (FS) algorithms identifying the relevant traits in the data set, which impacts the predictive result. FS is the process of choosing the relevant features from a data set by removing redundant and irrelevant ones. Although many approaches have been proposed toward this objective, they are computationally complex because they follow a one-step scheme in selecting features. In this paper, a novel APDFS feature selection algorithm is proposed that explicitly handles each feature's association with the class label through relevance and redundancy analysis. The proposed algorithm handles feature selection in two separate indices; hence, its computational complexity is reduced to O(nk+1). The algorithm applies statistical divergence-based information theory to identify relationships between the distant features of the chronic disease data set. The data sets required for the experiments were obtained from several medical labs and hospitals of Karkala taluk, India. The HLRM is used as the machine-learning classifier. The predictive ability of the framework is compared with various algorithms and across various chronic disease data sets. The experimental results illustrate that the proposed framework is efficient and achieves results competitive with existing work in most cases.
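
The abstract does not disclose APDFS's exact divergence measure or thresholds, so the sketch below only illustrates the general two-stage relevance/redundancy idea under stated assumptions: KL divergence between class-conditional histograms as the relevance score, Pearson correlation as the redundancy check, and a grid-searched logistic regression standing in for the HLRM.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def kl_relevance(x, y, bins=10):
    """Relevance of one feature: KL divergence between its class-conditional
    histograms (larger divergence suggests a more discriminative feature)."""
    edges = np.histogram_bin_edges(x, bins=bins)
    p, _ = np.histogram(x[y == 0], bins=edges, density=True)
    q, _ = np.histogram(x[y == 1], bins=edges, density=True)
    return entropy(p + 1e-9, q + 1e-9)   # smoothed to avoid empty bins

def select(X, y, n_keep=8, corr_cut=0.9):
    """Rank by divergence (relevance), then drop features highly correlated
    with an already-kept feature (redundancy analysis)."""
    order = np.argsort([-kl_relevance(X[:, j], y) for j in range(X.shape[1])])
    kept = []
    for j in order:
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < corr_cut for s in kept):
            kept.append(j)
        if len(kept) == n_keep:
            break
    return kept

X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           random_state=0)
cols = select(X, y)
# "Hyper-parameterized" logistic regression: tune C by cross-validated search.
clf = GridSearchCV(LogisticRegression(max_iter=1000),
                   {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X[:, cols], y)
print(cols, clf.best_params_, round(clf.best_score_, 3))
```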

Details

International Journal of Pervasive Computing and Communications, vol. 17 no. 1
Type: Research Article
ISSN: 1742-7371


Open Access
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki


Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) in real time is challenging. In machine learning, streaming feature selection has long been considered a superior technique for selecting the relevant subset of features from highly dimensional data, thus reducing learning complexity. In the relevant literature, streaming feature selection refers to features that arrive consecutively over time: the number of features is not known in advance, while the number of instances is fixed and well established. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to find a proper solution to this problem. This paper presents an exhaustive and methodical introduction to these techniques. The study reviews the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection, to determine their strengths and weaknesses. The survey also sheds light on ongoing challenges in big-data research.
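
To make the streaming setting concrete, the following minimal sketch accepts or discards features one at a time as they arrive. The simple correlation thresholds are stand-ins for the statistical relevance and redundancy tests that real algorithms in this literature (e.g. OSFS or alpha-investing-style methods) use; they are assumptions for illustration only.

```python
import numpy as np

def stream_select(feature_stream, y, rel_cut=0.1, red_cut=0.9):
    """Online feature selection: features arrive one by one, instances are fixed."""
    selected, kept_cols = [], []
    for i, x in enumerate(feature_stream):
        rel = abs(np.corrcoef(x, y)[0, 1])
        if rel < rel_cut:
            continue                      # relevance test: discard weak features
        if any(abs(np.corrcoef(x, k)[0, 1]) > red_cut for k in kept_cols):
            continue                      # redundancy test: near-duplicate of a kept one
        selected.append(i)
        kept_cols.append(x)
    return selected

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200).astype(float)
# Features arrive consecutively; noise level controls how relevant each one is.
stream = (y + rng.normal(0, s, 200) for s in (0.5, 5.0, 0.5, 1.0))
print(stream_select(stream, y))
```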

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964


Article
Publication date: 9 December 2020

Aditya Singh, Padmakar Pandey and G.C. Nandi


Abstract

Purpose

For efficient trajectory control of industrial robots, cumbersome computations for inverse kinematics and inverse dynamics are needed; these are usually developed through spatial transformation using the Denavit–Hartenberg principle and the Lagrangian or Newton–Euler methods, respectively. The model is highly non-linear and must deal with uncertainties arising from the lack of accurate measurement of mechanical parameters, noise and the non-inclusion of joint friction, all of which cause inaccuracies in predicting torque trajectories. To get a guaranteed closed-form solution, robot designers normally follow Pieper’s recommendation and compromise on the mechanical design. While this may be acceptable for industrial robots, where the aesthetic look is not that important, it is not for humanoid and social robots. To help solve this problem, this study proposes an alternative machine learning-based computational approach, built on a multi-gated sequence model, for finding appropriate mappings from Cartesian space to joint space and from motion space to joint torque space.

Design/methodology/approach

First, the authors generate the data required for the sequence model, using forward kinematics and forward dynamics by running N nested loops, where N is the number of joints of the robot. Subsequently, to develop a learning-based model based on sequence analysis, the authors propose long short-term memory (LSTM) and train an LSTM model, the architecture details of which are discussed in the paper. For the LSTM learning algorithm to perform efficiently, redundant features must be detected and eliminated from the data set, which the authors propose to do using the Pearson correlation coefficient.
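
A minimal sketch of these two steps is given below: a Pearson-correlation filter to drop redundant columns, then a small LSTM mapping Cartesian-space sequences to joint-space targets. The layer sizes, sequence shapes and threshold are illustrative assumptions, not the paper's architecture.

```python
import numpy as np
import torch
import torch.nn as nn

def drop_redundant(X, threshold=0.95):
    """Keep a column only if its |Pearson r| with every kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

class IKNet(nn.Module):
    """Sequence model mapping Cartesian-space trajectories to joint-space targets."""
    def __init__(self, n_in, n_joints, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_in, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_joints)

    def forward(self, x):                 # x: (batch, time, n_in)
        out, _ = self.lstm(x)
        return self.head(out)             # per-step joint angles (or torques)

# Toy usage: 3-D end-effector positions mapped to 7 joint angles over 50 steps.
cols = drop_redundant(np.random.randn(1000, 3))
model = IKNet(n_in=len(cols), n_joints=7)
print(model(torch.randn(8, 50, len(cols))).shape)   # torch.Size([8, 50, 7])
```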

Findings

To validate the proposed model, the authors performed rigorous experiments using both a hardware robot and simulation robots (the Baxter/Anukul robot) available in their laboratory, as well as a KUKA simulation robot data set made available by the Neural Learning for Robotics Laboratory. Several characteristic plots show that a sequence-based LSTM model of deep learning architecture with non-redundant features helps the robots learn smooth and accurate trajectories more quickly than data sets containing redundancy. Such data-driven modeling techniques can change the future direction of robotics research on classical problems such as trajectory planning and motion planning for industrial as well as social humanoid robots.

Originality/value

The present investigation involves the development of a deep learning-based computational model; statistical analyses to eliminate redundant features; data creation from one hardware robot (Anukul) and one simulation robot model (KUKA); the rigorous, separate training and testing of two computational models (two specially configured LSTM models), one for learning the inverse kinematics problem and one for learning the inverse dynamics problem; and a comparison of the inverse dynamics model with the state-of-the-art model. The authors believe these contributions can benefit researchers in the area of robotics.

Details

Industrial Robot: the international journal of robotics research and application, vol. 48 no. 1
Type: Research Article
ISSN: 0143-991X


Article
Publication date: 1 November 2023

Juan Yang, Zhenkun Li and Xu Du


Abstract

Purpose

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms through which human beings express their emotional states in daily communication. Therefore, achieving automatic and accurate audiovisual emotion recognition is significantly important for developing an engaging and empathetic human–computer interaction environment. However, two major challenges exist in the field: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from the two modalities to generate discriminative representations.

Design/methodology/approach

A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN integrates key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, a local-maximum-based content analysis is designed to extract key-frames from videos in order to eliminate data redundancy. Two modules, a “Multi-head Attention-based Intra-modality Interaction Module” and a “Multi-head Attention-based Cross-modality Interaction Module”, are proposed to mine and capture intra- and cross-modality interactions, further reducing data redundancy and producing more powerful multimodal representations.
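
The sketch below illustrates the two ingredients named above under stated assumptions: the inter-frame-difference content score, the module sizes and the mean-pooled output are stand-ins, not the paper's exact design.

```python
import torch
import torch.nn as nn

def key_frames(frames, k=8):
    """Pick frames at local maxima of inter-frame change (a simple content score)."""
    diff = (frames[1:] - frames[:-1]).flatten(1).norm(dim=1)
    peaks = [i for i in range(1, len(diff) - 1)
             if diff[i] > diff[i - 1] and diff[i] > diff[i + 1]]
    top = sorted(peaks, key=lambda i: -diff[i].item())[:k]
    return frames[sorted(p + 1 for p in top)]

class CrossModalFusion(nn.Module):
    """Intra-modality self-attention followed by cross-modality attention."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis, aud):                  # (batch, tokens, dim) each
        vis, _ = self.intra(vis, vis, vis)        # intra-modality interaction
        fused, _ = self.cross(vis, aud, aud)      # visual queries attend to audio
        return fused.mean(dim=1)                  # clip-level representation

video = torch.randn(120, 3, 64, 64)               # 120 raw frames
print(key_frames(video).shape)                    # about 8 selected key-frames
fusion = CrossModalFusion()
print(fusion(torch.randn(2, 8, 128), torch.randn(2, 20, 128)).shape)  # (2, 128)
```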

Findings

Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities provides more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy improves accuracy by more than 2.79 per cent. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion lead to better prediction performance.

Originality/value

The proposed KE-AFN can support the development of an engaging and empathetic human–computer interaction environment.

Article
Publication date: 25 January 2022

Tobias Mueller, Alexander Segin, Christoph Weigand and Robert H. Schmitt


Abstract

Purpose

In the determination of measurement uncertainty, the GUM procedure requires building a measurement model that establishes a functional relationship between the measurand and all influencing quantities. Since the effort of modelling, as well as of quantifying the measurement uncertainties, depends on the number of influencing quantities considered, the aim of this study is to identify relevant influencing quantities and to remove irrelevant ones from the dataset.

Design/methodology/approach

In this work, it was investigated whether the effort of modelling for the determination of measurement uncertainty can be reduced by the use of feature selection (FS) methods. For this purpose, 9 different FS methods were tested on 16 artificial test datasets, whose properties (number of data points, number of features, complexity, features with low influence and redundant features) were varied via a design of experiments.
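
Such artificial test datasets can be pictured as a factorial design over the listed properties. The sketch below is a 2^4 full-factorial stand-in (the paper's actual factors and levels are not given in the abstract) that generates 16 regression-style datasets, regression being a natural fit for measurement models of the form measurand = f(influencing quantities):

```python
import itertools
from sklearn.datasets import make_regression

# One two-level factor per dataset property; the full factorial gives 2^4 = 16
# datasets, matching the study's count. The factors chosen here are assumptions.
grid = {"n_samples": [100, 1000], "n_features": [10, 50],
        "n_informative": [3, 8], "noise": [0.1, 1.0]}

datasets = []
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid, values))
    X, y = make_regression(random_state=0, **cfg)  # y = f(influences) + noise
    datasets.append((cfg, X, y))
print(len(datasets))    # 16 artificial test datasets
```

Redundant features, another property the study varied, could be introduced by lowering make_regression's effective_rank parameter or by appending linear combinations of existing columns.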

Findings

Based on a success metric, the stability, universality and complexity of the method, two FS methods could be identified that reliably identify relevant and irrelevant influencing quantities for a measurement model.

Originality/value

For the first time, FS methods were applied to datasets with properties of classical measurement processes. The simulation-based results serve as a basis for further research in the field of FS for measurement models. The identified algorithms will be applied to real measurement processes in the future.

Details

International Journal of Quality & Reliability Management, vol. 40 no. 3
Type: Research Article
ISSN: 0265-671X


Article
Publication date: 6 January 2022

Deepti Sisodia and Dilip Singh Sisodia


Abstract

Purpose

The problem of choosing the most useful features from among hundreds in time-series user click data arises in online advertising when classifying fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common, but they neglect the correlations among features; conversely, wrapper approaches are often inapplicable because of their complexity. Moreover, existing feature selection methods in particular cannot handle such data, which is one of the major causes of instability in feature selection.

Design/methodology/approach

To overcome these issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing publishers' fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting, and (2) accumulated selection, where an accumulated evaluation of relevant feature subsets is performed to search for an optimal feature subset using effective machine learning (ML) models.
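
A minimal sketch of the distillation phase follows; the particular selectors, the vote threshold and the synthetic data are assumptions for illustration, not FDAS's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=1)
k = 10

# Each selector (two filters, one wrapper, one embedded) nominates k features.
votes = np.zeros(X.shape[1], dtype=int)
votes[SelectKBest(f_classif, k=k).fit(X, y).get_support()] += 1
votes[SelectKBest(mutual_info_classif, k=k).fit(X, y).get_support()] += 1
votes[RFE(LogisticRegression(max_iter=1000), n_features_to_select=k)
      .fit(X, y).support_] += 1
imp = RandomForestClassifier(random_state=1).fit(X, y).feature_importances_
votes[np.argsort(imp)[-k:]] += 1

# Features backed by at least two of the four selectors survive distillation;
# an accumulated wrapper evaluation would then search within this pool.
distilled = np.flatnonzero(votes >= 2)
print(distilled)
```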

Findings

Empirical results show enhanced classification performance with the proposed features in terms of average precision, recall, F1-score and AUC for publisher identification and classification.

Originality/value

FDAS is evaluated on the FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics: first, considering the original features; second, with relevant feature subsets selected by feature selection (FS) methods; and third, with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 17 April 2023

Cornelia Grabe, Florian Jäckel, Parv Khurana and Richard P. Dwight


Abstract

Purpose

This paper aims to improve Reynolds-averaged Navier–Stokes (RANS) turbulence models using a data-driven approach based on machine learning (ML). A special focus is put on determining the optimal input features used for the ML model.

Design/methodology/approach

The field inversion and machine learning (FIML) approach is applied to the negative Spalart-Allmaras turbulence model for transonic flows over an airfoil where shock-induced separation occurs.
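
Schematically, the ML step of FIML fits a regressor mapping local, nondimensional flow features to the correction field produced by field inversion, so that the learned correction can be embedded in the RANS model. The sketch below uses synthetic data and a random forest as a generic, hedged stand-in for the paper's actual toolchain and feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for field-inversion output: a correction factor beta
# defined at many mesh points as a function of local flow features.
rng = np.random.default_rng(0)
features = rng.random((5000, 6))   # e.g. strain/rotation ratios, wall-distance terms
beta = 1.0 + 0.5 * np.sin(3 * features[:, 0]) * features[:, 1]

model = RandomForestRegressor(n_estimators=200, random_state=0)
print(cross_val_score(model, features, beta, cv=5, scoring="r2").mean())

# Feature importances indicate which candidate input features the embedded
# ML augmentation should keep, mirroring the input-feature study in the paper.
print(model.fit(features, beta).feature_importances_)
```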

Findings

Optimal input features and an ML model are developed, which improve the existing negative Spalart-Allmaras turbulence model with respect to shock-induced flow separation.

Originality/value

A comprehensive workflow is demonstrated that yields insights into which input features and which ML model should be used in the context of the FIML approach.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 33 no. 4
Type: Research Article
ISSN: 0961-5539


Article
Publication date: 22 March 2024

Mohd Mustaqeem, Suhel Mustajab and Mahfooz Alam


Abstract

Purpose

Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we propose a novel hybrid approach that combines Gray Wolf Optimization-based feature selection (GWOFS) with a multilayer perceptron (MLP) for SDP. The GWOFS-MLP hybrid model is designed to optimize feature selection, ultimately enhancing the accuracy and efficiency of SDP. Gray Wolf Optimization, inspired by the social hierarchy and hunting behavior of gray wolves, is employed to select a subset of relevant features from an extensive pool of potential predictors. This study investigates the key challenges that traditional SDP approaches encounter and proposes promising solutions to overcome time complexity and the curse of dimensionality.

Design/methodology/approach

The integration of GWOFS and MLP results in a robust hybrid model that can adapt to diverse software datasets. This feature selection process harnesses the cooperative hunting behavior of wolves, allowing for the exploration of critical feature combinations. The selected features are then fed into an MLP, a powerful artificial neural network (ANN) known for its capability to learn intricate patterns within software metrics. MLP serves as the predictive engine, utilizing the curated feature set to model and classify software defects accurately.
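
A compact, illustrative version of such a wrapper is sketched below: a binary Gray Wolf Optimization loop (simple 0.5 thresholding of continuous positions, small population and iteration budget) scored by a cross-validated MLP. All sizes and constants are standard textbook choices, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    """Wrapper fitness: cross-validated MLP accuracy on the selected columns."""
    if not mask.any():
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_wolves, n_iter, d = 5, 5, X.shape[1]
pos = rng.random((n_wolves, d))              # continuous positions in [0, 1]
best_mask, best_fit = None, -1.0
for t in range(n_iter):
    masks = pos > 0.5                        # binarize: feature in or out
    scores = np.array([fitness(m) for m in masks])
    if scores.max() > best_fit:
        best_fit, best_mask = scores.max(), masks[scores.argmax()].copy()
    alpha, beta, delta = pos[np.argsort(-scores)[:3]]   # the three pack leaders
    a = 2 - 2 * t / n_iter                   # exploration tapering to exploitation
    for i in range(n_wolves):
        step = np.zeros(d)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(d), rng.random(d)
            A, C = 2 * a * r1 - a, 2 * r2
            step += leader - A * np.abs(C * leader - pos[i])
        pos[i] = np.clip(step / 3, 0, 1)     # move toward the leaders' average

print("selected:", np.flatnonzero(best_mask), "cv accuracy:", round(best_fit, 3))
```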

Findings

The performance evaluation of the GWOFS-MLP hybrid model on a real-world software defect dataset demonstrates its effectiveness. The model achieves a remarkable training accuracy of 97.69% and a testing accuracy of 97.99%. Additionally, the receiver operating characteristic area under the curve (ROC-AUC) score of 0.89 highlights the model’s ability to discriminate between defective and defect-free software components.

Originality/value

Experimental implementations using machine learning-based techniques with feature reduction are conducted to validate the proposed solutions. The goal is to enhance SDP’s accuracy, relevance and efficiency, ultimately improving software quality assurance processes. The confusion matrix further illustrates the model’s performance, with only a small number of false positives and false negatives.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 4 January 2022

Satish Kumar, Tushar Kolekar, Ketan Kotecha, Shruti Patil and Arunkumar Bongale


Abstract

Purpose

Excessive tool wear is responsible for damage or breakage of the tool, workpiece, or machining center. Thus, it is crucial to examine tool conditions during the machining process to improve its useful functional life and the surface quality of the final product. AI-based tool wear prediction techniques have proven to be effective in estimating the Remaining Useful Life (RUL) of the cutting tool. However, the model prediction needs improvement in terms of accuracy.

Design/methodology/approach

This paper presents a methodology that fuses a feature selection technique with state-of-the-art deep learning models. The authors used NASA milling data sets along with vibration signals for tool wear prediction and performance analysis in 15 different fault scenarios. Multiple steps are used for feature selection and ranking, and different Long Short-Term Memory (LSTM) approaches are used to improve the overall prediction accuracy of the model for tool wear prediction. The LSTM models' performance is evaluated using R-square, Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE).

Findings

The R-square accuracy of the hybrid model is consistently high, with low MAE, MAPE and RMSE values. The average R-square scores for the LSTM, Bidirectional LSTM, Encoder–Decoder LSTM and Hybrid LSTM models are 80.43, 84.74, 94.20 and 97.85%, respectively, with corresponding average MAPE values of 23.46, 22.20, 9.57 and 6.21%. The hybrid model thus shows high accuracy compared to the remaining LSTM models.

Originality/value

Low-variance filtering, the Spearman correlation coefficient and Random Forest regression are used to select the most significant feature vectors for training the various LSTM model versions and to highlight the best approach. The selected features are passed to different LSTM models, namely Bidirectional, Encoder–Decoder and Hybrid LSTM, for tool wear prediction. The Hybrid LSTM approach shows a significant improvement in tool wear prediction.
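
A minimal sketch of this three-step screening is given below; the cut-offs and the synthetic stand-in for the vibration-signal features are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                  # stand-in for vibration features
wear = 0.8 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.2, 500)

# Step 1: drop near-constant features.
idx = np.flatnonzero(VarianceThreshold(threshold=1e-3).fit(X).get_support())

# Step 2: keep features with a monotone (Spearman) relationship to tool wear.
idx = [j for j in idx if abs(spearmanr(X[:, j], wear)[0]) > 0.1]

# Step 3: rank the survivors by Random Forest importance; the top-ranked
# features would then feed the LSTM variants for tool-wear prediction.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:, idx], wear)
print([idx[i] for i in np.argsort(-rf.feature_importances_)])
```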

Details

International Journal of Quality & Reliability Management, vol. 39 no. 7
Type: Research Article
ISSN: 0265-671X

