Search results
1–10 of 188
Gaurav Sarin, Pradeep Kumar and M. Mukund
Abstract
Purpose
Text classification is a widely accepted and adopted technique in organizations for mining and analyzing unstructured and semi-structured data. With the advancement of computing technology, deep learning has become more popular among academics and professionals for performing mining and analytical operations. In this work, the authors study the research carried out in the field of text classification using deep learning techniques to identify gaps and opportunities for further research.
Design/methodology/approach
The authors adopted a bibliometric approach in conjunction with visualization techniques to uncover new insights and findings. They collected two decades of data from the Scopus global database to perform this study and discuss business applications of deep learning techniques for text classification.
Findings
The study provides an overview of the various publication sources in the combined field of text classification and deep learning, presents a list of prominent authors and their countries working in this field, and lists the most cited articles by citation count and country of research. Various visualization techniques, such as word clouds, network diagrams and thematic maps, were used to identify collaboration networks.
Originality/value
The study helps to understand existing research gaps, which is an original contribution to the body of literature. To the best of the authors' knowledge, an in-depth study of the field of text classification and deep learning has not previously been performed in this detail. The study offers high value to scholars and professionals by pointing them to opportunities for research in this area.
Konstantinos Kalodanis, Panagiotis Rizomiliotis and Dimosthenis Anagnostopoulos
Abstract
Purpose
The purpose of this paper is to highlight the key technical challenges that derive from the recently proposed European Artificial Intelligence Act and, specifically, to investigate the applicability of the requirements that the AI Act mandates for high-risk AI systems from the perspective of AI security.
Design/methodology/approach
This paper presents the main points of the proposed AI Act, with emphasis on the compliance requirements of high-risk systems. It matches known AI security threats with the relevant technical requirements, demonstrates the impact that these security threats can have on the AI Act's technical requirements and evaluates the applicability of those requirements based on the effectiveness of existing security protection measures. Finally, the paper highlights the necessity of an integrated framework for AI system evaluation.
Findings
The findings of the EU AI Act technical assessment highlight the gap between the proposed requirements and the available AI security countermeasures as well as the necessity for an AI security evaluation framework.
Keywords
AI Act, high-risk AI systems, security threats, security countermeasures.
Elisa Gonzalez Santacruz, David Romero, Julieta Noguez and Thorsten Wuest
Abstract
Purpose
This research paper aims to analyze the scientific and grey literature on Quality 4.0 and zero-defect manufacturing (ZDM) frameworks to develop an integrated quality 4.0 framework (IQ4.0F) for quality improvement (QI) based on Six Sigma and machine learning (ML) techniques towards ZDM. The IQ4.0F aims to contribute to the advancement of defect prediction approaches in diverse manufacturing processes. Furthermore, the work enables a comprehensive analysis of process variables influencing product quality, with emphasis on the use of supervised and unsupervised ML techniques in the "Analyze" stage of Six Sigma's DMAIC (Define, Measure, Analyze, Improve and Control) cycle.
Design/methodology/approach
The research methodology employed a systematic literature review (SLR) based on PRISMA guidelines to develop the integrated framework, followed by a real industrial case study set in the automotive industry to fulfill the objectives of verifying and validating the proposed IQ4.0F with primary data.
Findings
This research work demonstrates the value of a “stepwise framework” to facilitate a shift from conventional quality management systems (QMSs) to QMSs 4.0. It uses the IDEF0 modeling methodology and Six Sigma’s DMAIC cycle to structure the steps to be followed to adopt the Quality 4.0 paradigm for QI. It also proves the worth of integrating Six Sigma and ML techniques into the “Analyze” stage of the DMAIC cycle for improving defect prediction in manufacturing processes and supporting problem-solving activities for quality managers.
Originality/value
This research paper introduces a first-of-its-kind Quality 4.0 framework – the IQ4.0F. Each step of the IQ4.0F was verified and validated in an original industrial case study set in the automotive industry. It is the first Quality 4.0 framework, according to the SLR conducted, to utilize the principal component analysis technique as a substitute for “Screening Design” in the Design of Experiments phase and K-means clustering technique for multivariable analysis, identifying process parameters that significantly impact product quality. The proposed IQ4.0F not only empowers decision-makers with the knowledge to launch a Quality 4.0 initiative but also provides quality managers with a systematic problem-solving methodology for quality improvement.
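The substitution described above, principal component analysis in place of a "Screening Design" and K-means for multivariable analysis, can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic process data; the parameter counts, component numbers and cluster counts are assumptions for the example, not the settings used in the IQ4.0F case study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical process data: 200 production runs x 8 process parameters
X = rng.normal(size=(200, 8))
X_std = StandardScaler().fit_transform(X)

# PCA as a stand-in for "Screening Design": rank parameters by how
# strongly they load on the retained components
pca = PCA(n_components=3).fit(X_std)
loadings = np.abs(pca.components_).sum(axis=0)
top_params = np.argsort(loadings)[::-1][:4]  # four most influential parameters

# K-means on the retained component scores for multivariable analysis,
# grouping runs into regimes that can then be inspected for defect rates
scores = pca.transform(X_std)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
```

In a real DMAIC "Analyze" step, the ranked parameters and cluster memberships would be cross-checked against measured defect outcomes rather than used on their own.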
Loris Nanni, Alessandra Lumini and Sheryl Brahnam
Abstract
Purpose
Automatic anatomical therapeutic chemical (ATC) classification is progressing at a rapid pace because of its potential in drug development. Predicting an unknown compound's therapeutic and chemical characteristics in terms of how it affects multiple organs and physiological systems makes automatic ATC classification a vital yet challenging multilabel problem. The aim of this paper is to experimentally derive an ensemble of different feature descriptors and classifiers for ATC classification that outperforms the state-of-the-art.
Design/methodology/approach
The proposed method is an ensemble generated by the fusion of neural networks (i.e. a tabular model and long short-term memory networks (LSTM)) and multilabel classifiers based on multiple linear regression (hMuLab). All classifiers are trained on three sets of descriptors. Features extracted from the trained LSTMs are also fed into hMuLab. Evaluations of ensembles are compared on a benchmark data set of 3883 ATC-coded pharmaceuticals taken from KEGG, a publicly available drug databank.
Findings
Experiments demonstrate the power of the authors’ best ensemble, EnsATC, which is shown to outperform the best methods reported in the literature, including the state-of-the-art developed by the fast.ai research group. The MATLAB source code of the authors’ system is freely available to the public at https://github.com/LorisNanni/Neural-networks-for-anatomical-therapeutic-chemical-ATC-classification.
Originality/value
This study demonstrates the power of extracting LSTM features and combining them with ATC descriptors in ensembles for ATC classification.
Saleem Raja A., Sundaravadivazhagan Balasubaramanian, Pradeepa Ganesan, Justin Rajasekaran and Karthikeyan R.
Abstract
Purpose
The internet has become fully woven into contemporary life, and people rely on internet services for everyday activities. Consequently, an abundance of information about people and organizations is available online, which encourages the proliferation of cybercrimes. Cybercriminals often use malicious links for large-scale cyberattacks, disseminated via email, SMS and social media, and recognizing such links online can be exceedingly challenging. The purpose of this paper is to present a strong security system that can detect malicious links in cyberspace using natural language processing techniques.
Design/methodology/approach
Researchers have recommended a variety of approaches, including blacklisting and rule-based machine/deep learning, for automatically recognizing malicious links. However, these approaches generally necessitate generating a set of features to generalize the detection process. Most features are derived by processing URLs and web-page content, along with external features such as the page's ranking and domain name system information. This feature extraction and selection typically takes considerable time and demands a high level of domain expertise, and the generated features may not exploit the full potential of the data set. In addition, the majority of currently deployed systems use a single classifier for classifying malicious links, yet prediction accuracy can vary widely depending on the data set and the classifier used.
Findings
To address the issue of generating feature sets, the proposed method uses natural language processing techniques (term frequency and inverse document frequency) to vectorize URLs. To build a robust system for classifying malicious links, it implements a weighted soft-voting classifier, an ensemble that combines the predictions of base classifiers; the skill of each base classifier determines the weight assigned to it.
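The two ideas in this abstract, TF-IDF vectorization of raw URLs and a weighted soft-voting ensemble, can be sketched as below. This is an illustrative scikit-learn example with made-up URLs, classifiers and weights; the authors' actual base classifiers and skill-derived weights are not specified here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labeled URLs: 1 = malicious, 0 = benign
urls = ["http://example.com/login", "http://paypa1-verify.xyz/update",
        "https://news.site.org/article", "http://free-gift.win/claim"] * 10
labels = [0, 1, 0, 1] * 10

# Character n-grams let TF-IDF vectorize raw URLs directly,
# without handcrafted lexical or host-based features
vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = vec.fit_transform(urls)

# Weighted soft voting: in the paper the weights reflect each base
# classifier's measured skill; the values here are placeholders
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", MultinomialNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="soft", weights=[3, 2, 1])
ensemble.fit(X, labels)
pred = ensemble.predict(vec.transform(["http://paypa1-verify.xyz/update"]))
```

Soft voting averages the weighted class probabilities of the base models, so a well-calibrated strong classifier can outvote weaker ones without hard thresholding.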
Originality/value
The proposed method performs better when optimal weights are assigned. Its performance was assessed on two different data sets (D1 and D2) and compared against base machine learning classifiers and previous research results. The resulting accuracy shows that the proposed method is superior to existing methods, achieving 91.4% and 98.8% accuracy on data sets D1 and D2, respectively.
Ning Chen, Zhenyu Zhang and An Chen
Abstract
Purpose
Consequence prediction is an emerging topic in safety management concerning the severity outcome of accidents. In practical applications, it is usually implemented through supervised learning methods; however, the evaluation of classification results remains a challenge. The previous studies mostly adopted simplex evaluation based on empirical and quantitative assessment strategies. This paper aims to shed new light on the comprehensive evaluation and comparison of diverse classification methods through visualization, clustering and ranking techniques.
Design/methodology/approach
An empirical study is conducted using 9 state-of-the-art classification methods on a real-world data set of 653 construction accidents in China for predicting the consequence with respect to 39 carefully featured factors and accident type. The proposed comprehensive evaluation enriches the interpretation of classification results from different perspectives. Furthermore, the critical factors leading to severe construction accidents are identified by analyzing the coefficients of a logistic regression model.
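The last step described above, identifying critical factors from the coefficients of a logistic regression model, can be sketched as follows. This is a generic scikit-learn illustration on synthetic binary factor data; the factor names, sample size and signal structure are assumptions, not the study's 653-accident data set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
factor_names = [f"F{i}" for i in range(10)]  # hypothetical factor labels
# Binary indicators for whether each factor was present in an accident
X = rng.integers(0, 2, size=(300, 10)).astype(float)
# Synthetic severity labels driven mainly by factors F0 and F3
y = (2.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(0, 0.5, 300) > 1.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(
    StandardScaler().fit_transform(X), y)

# Rank factors by absolute coefficient magnitude: larger |coef| means
# a stronger association with severe outcomes (after standardization)
ranking = sorted(zip(factor_names, model.coef_[0]),
                 key=lambda t: abs(t[1]), reverse=True)
```

Standardizing the inputs first makes the coefficient magnitudes comparable across factors, which is what justifies reading them as relative importance.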
Findings
This paper identifies the critical factors that significantly influence the consequence of construction accidents, which include accident type (particularly collapse), improper accident reporting and handling (E21), inadequate supervision engineers (O41), no special safety department (O11), delayed or low-quality drawings (T11), unqualified contractor (C21), schedule pressure (C11), multi-level subcontracting (C22), lacking safety examination (S22), improper operation of mechanical equipment (R11) and improper construction procedure arrangement (T21). The prediction models and findings of critical factors help make safety intervention measures in a targeted way and enhance the experience of safety professionals in the construction industry.
Research limitations/implications
The empirical study using some well-known classification methods for forecasting the consequences of construction accidents provides some evidence for the comprehensive evaluation of multiple classifiers. These techniques can be used jointly with other evaluation approaches for a comprehensive understanding of the classification algorithms. Despite the limitation of specific methods used in the study, the presented methodology can be configured with other classification methods and performance metrics and even applied to other decision-making problems such as clustering.
Originality/value
This study sheds new light on the comprehensive comparison and evaluation of classification results through visualization, clustering and ranking techniques using an empirical study of consequence prediction of construction accidents. The relevance of construction accident type is discussed with the severity of accidents. The critical factors influencing the accident consequence are identified for the sake of taking prevention measures for risk reduction. The proposed method can be applied to other decision-making tasks where the evaluation is involved as an important component.
Abstract
Purpose
Single-shot multi-category clothing recognition and retrieval play a crucial role in online searching and offline settlement scenarios. Existing clothing recognition methods based on RGBD clothing images often suffer from high-dimensional feature representations, leading to compromised performance and efficiency.
Design/methodology/approach
To address this issue, this paper proposes a novel method called Manifold Embedded Discriminative Feature Selection (MEDFS) to select global and local features, thereby reducing the dimensionality of the feature representation and improving performance. Specifically, by combining three global features and three local features, a low-dimensional embedding is constructed to capture the correlations between features and categories. The MEDFS method designs an optimization framework utilizing manifold mapping and sparse regularization to achieve feature selection. The optimization objective is solved using an alternating iterative strategy, ensuring convergence.
Findings
Empirical studies conducted on a publicly available RGBD clothing image dataset demonstrate that the proposed MEDFS method achieves highly competitive clothing classification performance while maintaining efficiency in clothing recognition and retrieval.
Originality/value
This paper introduces a novel approach for multi-category clothing recognition and retrieval, incorporating the selection of global and local features. The proposed method holds potential for practical applications in real-world clothing scenarios.
Faisal Mehraj Wani, Jayaprakash Vemuri and Rajaram Chenna
Abstract
Purpose
Near-fault pulse-like ground motions have distinct and very severe effects on reinforced concrete (RC) structures. However, there is a paucity of recorded data from Near-Fault Ground Motions (NFGMs), and thus forecasting the dynamic seismic response of structures, using conventional techniques, under such intense ground motions has remained a challenge.
Design/methodology/approach
The present study utilizes a 2D finite element model of an RC structure subjected to near-fault pulse-like ground motions with a focus on the storey drift ratio (SDR) as the key demand parameter. Five machine learning classifiers (MLCs), namely decision tree, k-nearest neighbor, random forest, support vector machine and Naïve Bayes classifier, were evaluated to classify the damage states of the RC structure.
Findings
The results, including the confusion matrix, accuracy and mean squared error, indicate that the Naïve Bayes classifier outperforms the other MLCs with 80.0% accuracy. Furthermore, the three MLCs with accuracy greater than 75% were combined using a voting classifier to enhance the models' performance score. Finally, a sensitivity analysis was performed to evaluate the model's resilience and dependability.
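The ensembling step above, keeping only classifiers that clear an accuracy bar and combining them with a voting classifier, can be sketched as follows. This is a hedged scikit-learn illustration on synthetic data; the feature set, the 75% threshold application and the candidate models only loosely mirror the study's setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the storey-drift-based damage-state features
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)

candidates = {"dt": DecisionTreeClassifier(random_state=0),
              "knn": KNeighborsClassifier(),
              "rf": RandomForestClassifier(random_state=0),
              "svm": SVC(probability=True, random_state=0),
              "nb": GaussianNB()}

# Keep only classifiers clearing a 75% cross-validated accuracy bar
kept = [(name, clf) for name, clf in candidates.items()
        if cross_val_score(clf, X, y, cv=5).mean() > 0.75]

# Soft voting over the surviving classifiers
voter = VotingClassifier(estimators=kept, voting="soft").fit(X, y)
```

Filtering on cross-validated rather than training accuracy avoids promoting classifiers that merely memorize the data into the ensemble.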
Originality/value
The objective of the current study is to predict the nonlinear storey drift demand for low-rise RC structures using machine learning techniques, instead of labor-intensive nonlinear dynamic analysis.
Yong Gui and Lanxin Zhang
Abstract
Purpose
Influenced by the constantly changing manufacturing environment, no single dispatching rule (SDR) can consistently obtain better scheduling results than other rules for the dynamic job-shop scheduling problem (DJSP). Although the dynamic SDR selection classifier (DSSC) mined by traditional data-mining-based scheduling methods has shown some improvement over an SDR, the enhancement is not significant, since the rule selected by the DSSC is still an SDR.
Design/methodology/approach
This paper presents a novel data-mining-based scheduling method for the DJSP with machine failure aiming at minimizing the makespan. Firstly, a scheduling priority relation model (SPRM) is constructed to determine the appropriate priority relation between two operations based on the production system state and the difference between their priority values calculated using multiple SDRs. Subsequently, a training sample acquisition mechanism based on the optimal scheduling schemes is proposed to acquire training samples for the SPRM. Furthermore, feature selection and machine learning are conducted using the genetic algorithm and extreme learning machine to mine the SPRM.
Findings
Results from numerical experiments demonstrate that the SPRM, mined by the proposed method, not only achieves better scheduling results in most manufacturing environments but also maintains a higher level of stability in diverse manufacturing environments than an SDR and the DSSC.
Originality/value
This paper constructs an SPRM and mines it using data mining technologies, obtaining better results than an SDR and the DSSC in various manufacturing environments.
Loris Nanni and Sheryl Brahnam
Abstract
Purpose
Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.
Design/methodology/approach
Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.
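The weighted sum rule used above to fuse the SVM and network decisions can be sketched in a few lines. This is a generic NumPy illustration with made-up score matrices; it is not the authors' actual SVM or CNN outputs, and the equal weights are a placeholder.

```python
import numpy as np

def weighted_sum_fusion(score_matrices, weights):
    """Fuse per-classifier probability matrices (n_samples x n_classes)
    with the weighted sum rule, then pick the arg-max class."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    fused = sum(w * np.asarray(s) for w, s in zip(weights, score_matrices))
    return fused, fused.argmax(axis=1)

# Hypothetical softmax outputs for two samples from an SVM and a CNN
svm_scores = np.array([[0.7, 0.3], [0.4, 0.6]])
cnn_scores = np.array([[0.6, 0.4], [0.1, 0.9]])
fused, pred = weighted_sum_fusion([svm_scores, cnn_scores], weights=[1, 1])
```

With equal weights the rule reduces to averaging the probability matrices, so a confident classifier can flip a borderline decision by the other, as happens for the second sample here.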
Findings
The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.
Originality/value
Most DNA-BP methods proposed in the literature are validated on only one (rarely two) datasets/tasks. In this work, the authors report the performance of their general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach, and these results can now be used for baseline comparisons by other researchers in the field.