Search results

1 – 10 of over 60,000
Open Access
Article
Publication date: 22 November 2022

Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv


Abstract

Purpose

The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.

Design/methodology/approach

Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.

Findings

Drawing on ideas from statistics, system theory, machine learning and data mining, the present research focuses on “data quality diagnosis” and “index classification and stratification”. It clarifies the classification standards and data-quality characteristics of index data and establishes a data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection”. Using decision trees, explanatory structural models, cluster analysis, K-means clustering and other methods, a classification and stratification method system for indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. In this way, a scientific and standardized classification and hierarchical design of the index system can be realized.
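As a rough illustration of the clustering step mentioned above (not the authors' implementation), the sketch below groups candidate indicators with K-means so that strongly overlapping indicators fall into the same group; the indicator names and synthetic data are hypothetical.

```python
# Hypothetical sketch: grouping indicators with K-means so that highly
# similar (redundant) indicators land in the same cluster.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Rows = observations, columns = candidate indicators (names are made up).
indicators = ["gdp", "port_throughput", "fleet_size", "fish_catch", "tourism_revenue"]
X = rng.normal(size=(200, len(indicators)))

# Cluster the *indicators*, i.e. the columns, by their standardized profiles.
profiles = StandardScaler().fit_transform(X).T          # shape: (n_indicators, n_obs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)

for cluster in sorted(set(labels)):
    members = [name for name, lab in zip(indicators, labels) if lab == cluster]
    print(f"indicator group {cluster}: {members}")
```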

Originality/value

The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of the multi-source, heterogeneous and mixed-frequency data of the index system. Second, a systematic quality-inspection process for missing data is designed based on systematic thinking about the whole and the individual. Aiming at the accuracy, reliability and feasibility of the patched data, a quality-inspection method for patched data based on inversion thinking and a unified representation method for data fusion based on a tensor model are proposed. Third, the modern method of unsupervised learning is used to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.

Details

Marine Economics and Management, vol. 5 no. 2
Type: Research Article
ISSN: 2516-158X


Article
Publication date: 9 July 2020

James Wakiru, Liliane Pintelon, Peter Muchiri and Peter Chemweno


Abstract

Purpose

The purpose of this paper is to develop a maintenance decision support system (DSS) framework using in-service lubricant data for fault diagnosis. The DSS reveals embedded patterns in the data (knowledge discovery) and automatically quantifies the influence of lubricant parameters on the unhealthy state of the machine using alternative classifiers. The classifiers are compared for robustness, from which decision-makers select an appropriate classifier for a given lubricant data set.

Design/methodology/approach

The DSS embeds a framework integrating cluster and principal component analysis, for feature extraction, and eight classifiers, among them extreme gradient boosting (XGB), random forest (RF), decision trees (DT) and logistic regression (LR). A qualitative and quantitative criterion is developed in conjunction with practitioners for comparing the classifier models.
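The comparison idea can be sketched as follows: standardization and PCA for feature extraction, followed by several candidate classifiers scored with cross-validation. This is a minimal stand-in, not the paper's framework; the synthetic lubricant features, the fixed number of components and the accuracy-only criterion are all assumptions (the paper also applies qualitative criteria, and XGBoost would be added via the separate xgboost package).

```python
# Hypothetical sketch: PCA feature extraction + comparison of alternative
# classifiers by cross-validated accuracy (the paper also uses qualitative criteria).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 12))              # e.g. wear metals, viscosity, additive levels
y = rng.integers(0, 2, size=300)            # 0 = healthy, 1 = unhealthy machine state

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
for name, clf in candidates.items():
    model = make_pipeline(StandardScaler(), PCA(n_components=5), clf)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```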

Findings

The results show the importance of embedded knowledge, explored via a knowledge discovery approach. Moreover, the efficacy of the embedded knowledge for maintenance DSS is emphasized. Importantly, the proposed framework is demonstrated as plausible for decision support due to its high accuracy and consideration of practitioners' needs.

Practical implications

The proposed framework will potentially assist maintenance managers in accurately exploiting lubricant data for maintenance DSS, while offering insights with reduced time and errors.

Originality/value

Lubricant-based intelligent approaches for fault diagnosis are seldom utilized in practice; however, they may be incorporated into information management systems, offering high predictive accuracy. The classification-model comparison approach will inevitably assist the industry in selecting among divergent models for DSS.

Details

Journal of Quality in Maintenance Engineering, vol. 27 no. 2
Type: Research Article
ISSN: 1355-2511


Article
Publication date: 28 July 2020

Sathyaraj R, Ramanathan L, Lavanya K, Balasubramanian V and Saira Banu J


Abstract

Purpose

Big data is growing so rapidly that conventional software tools face several problems in managing it. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint for the research community.

Design/methodology/approach

The purpose of the paper is to introduce a big data classification technique using the MapReduce framework based on an optimization algorithm. The big data classification is enabled using the MapReduce framework, which utilizes the proposed optimization algorithm, named the chicken-based bacterial foraging (CBF) algorithm. The proposed algorithm is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The proposed model executes the process in two stages, namely, training and testing phases.

In the training phase, the big data produced from different distributed sources is subjected to parallel processing by the mappers in the mapper phase, which perform preprocessing and feature selection based on the proposed CBF algorithm. The preprocessing step eliminates redundant and inconsistent data, whereas the feature selection step is performed on the preprocessed data to extract the significant features and provide improved classification accuracy. The selected features are fed into the reducer for data classification using the deep belief network (DBN) classifier, which is trained using the proposed CBF algorithm such that the data are classified into various classes; at the end of the training process, the individual reducers present the trained models. Thus, incremental data are handled effectively based on the training model obtained in the training phase. In the testing phase, the incremental data are split into different subsets and fed into the different mappers for classification. Each mapper contains a trained model obtained from the training phase, which is utilized for classifying the incremental data. After classification, the outputs obtained from the mappers are fused and fed into the reducer for the final classification.
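A schematic of this map/train/fuse control flow is sketched below. It shows only the structure of the two phases: each mapper selects features and trains a local model on its block, and the reducer fuses the mappers' predictions by majority vote. The toy correlation-based feature selection and the scikit-learn SGD classifier are stand-ins for the paper's CBF algorithm and CBF-trained deep belief network.

```python
# Schematic of the MapReduce-style flow only: each mapper trains a local model on its
# data block; at test time every mapper scores the incoming block and the reducer fuses
# the votes. SGDClassifier stands in for the paper's CBF-optimized deep belief network.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
blocks = [(rng.normal(size=(500, 20)), rng.integers(0, 2, 500)) for _ in range(4)]

# --- training phase: one model per mapper ---------------------------------
def mapper_train(X, y):
    # Toy feature selection: keep the 10 features most correlated with the label.
    keep = np.argsort(np.abs(np.corrcoef(X.T, y)[-1, :-1]))[-10:]
    return keep, SGDClassifier(max_iter=1000).fit(X[:, keep], y)

models = [mapper_train(X, y) for X, y in blocks]

# --- testing phase: mappers predict, reducer fuses by majority vote -------
X_new = rng.normal(size=(100, 20))
votes = np.stack([m.predict(X_new[:, keep]) for keep, m in models])
fused = (votes.mean(axis=0) >= 0.5).astype(int)          # reducer-side fusion
print("fused class counts:", np.bincount(fused))
```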

Findings

The maximum accuracy and Jaccard coefficient are obtained on the epileptic seizure recognition database. The proposed CBF-DBN produces a maximal accuracy value of 91.129%, whereas the accuracy values of the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) are 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN produces a maximal Jaccard coefficient value of 88.928%, whereas the Jaccard coefficient values of the existing NN, DBN and NBC-TFIDF are 75.891%, 79.850% and 81.103%, respectively.

Originality/value

In this paper, a big data classification method is proposed for categorizing massive data sets under the constraints of huge data volumes. The big data classification is performed on the MapReduce framework based on training and testing phases, such that the data are handled in parallel. In the training phase, the big data is obtained, partitioned into different subsets and fed into the mappers. In the mapper, the feature extraction step is performed to extract the significant features. The obtained features are passed to the reducers, which classify the data using those features. The DBN classifier is utilized for the classification, wherein the DBN is trained using the proposed CBF algorithm; the trained model is obtained as an output after the classification. In the testing phase, the incremental data are considered for the classification. New data are first split into subsets and fed into the mappers for classification. The trained models obtained from the training phase are used for the classification, and the classified results from each mapper are fused and fed into the reducer for the final classification of the big data.

Details

Data Technologies and Applications, vol. 55 no. 3
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 28 September 2021

Nageswara Rao Eluri, Gangadhara Rao Kancharla, Suresh Dara and Venkatesulu Dondeti


Abstract

Purpose

Gene selection is considered a fundamental process in the bioinformatics field. The existing methodologies pertaining to cancer classification are mostly based on clinical data, and their diagnostic capability is limited. Nowadays, significant problems of cancer diagnosis are solved by utilizing gene expression data. Researchers have been introducing many possibilities to diagnose cancer appropriately and effectively. This paper aims to develop cancer data classification using gene expression data.

Design/methodology/approach

The proposed classification model involves three main phases: “(1) Feature extraction, (2) Optimal Feature Selection and (3) Classification”. Initially, five benchmark gene expression datasets are collected. From the collected gene expression data, feature extraction is performed. To diminish the length of the feature vectors, optimal feature selection is performed, for which a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization (QICO) algorithm is used. Once the relevant features are selected, the classification is performed by a deep learning model called a recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms the other heuristic-based feature selection methods and that the optimized RNN outperforms the other machine learning methods.
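The wrapper-style feature selection loop behind this pipeline can be sketched as follows, with a plain random search standing in for QICO and a scikit-learn MLP standing in for the RNN; the synthetic gene-expression matrix and all parameter choices are illustrative only.

```python
# Generic wrapper feature-selection loop as a stand-in for QICO: candidate feature
# masks are scored by cross-validated accuracy of a classifier. A scikit-learn MLP
# stands in for the paper's recurrent neural network; QICO itself is not reproduced.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 50))          # gene-expression-like matrix (samples x genes)
y = rng.integers(0, 2, size=120)        # tumour vs normal labels (synthetic)

def score(mask):
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

best_mask, best_score = None, -np.inf
for _ in range(10):                      # QICO would search this space far more cleverly
    mask = rng.random(X.shape[1]) < 0.2  # candidate subset of genes
    if mask.sum() == 0:
        continue
    s = score(mask)
    if s > best_score:
        best_mask, best_score = mask, s

print(f"selected {best_mask.sum()} genes, CV accuracy {best_score:.3f}")
```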

Findings

The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% better than RNN, 4.3% better than RF, 3.8% better than NB and 2.1% better than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% better than RNN, 8.9% better than RF and 14.8% better than NB and KNN. Hence, the developed QICO algorithm performs well in accurately classifying cancer data using gene expression data.

Originality/value

This paper introduces a new optimal feature selection model using QICO and a QICO-based RNN for the effective classification of cancer data using gene expression data. To the authors' knowledge, it is the first work to utilize such an optimal feature selection model with QICO and QICO-RNN for this purpose.

Article
Publication date: 5 March 2018

Mohammad Asjad, Azazullah Alam and Faisal Hasan


Abstract

Purpose

A classifier technique is one of the important tools used to organize data or information in a systematic manner, based on certain criteria, in order to obtain accurate statistical information for decision-making. It plays a vital role in various applications, such as business organization, e-commerce, health care, and scientific and engineering applications. The purpose of this paper is to examine the performance of different classification techniques in lift index (LI) data classification.

Design/methodology/approach

The analyses consist of two stages. First, random data for lifting tasks are generated through computer programming and fed into the National Institute for Occupational Safety and Health (NIOSH) lifting equation for LI estimation. Based on the evaluated index, each task is classified into one of two groups, i.e. high-risk or low-risk. The classified tasks are then used to analyze the performance of different tools, namely artificial neural networks (ANN), discriminant analysis (DA) and support vector machines (SVMs).
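For reference, the revised NIOSH lifting equation used for LI estimation can be written as RWL = LC × HM × VM × DM × AM × FM × CM with LI = load / RWL; a minimal sketch follows, in which the frequency and coupling multipliers are supplied directly rather than looked up from the NIOSH tables, and the task values are hypothetical.

```python
# Revised NIOSH lifting equation: RWL = LC * HM * VM * DM * AM * FM * CM (metric units),
# and LI = load / RWL. Tasks with LI > 1 are treated as high-risk here.
def recommended_weight_limit(H, V, D, A, FM=1.0, CM=1.0):
    LC = 23.0                          # load constant, kg
    HM = 25.0 / H                      # horizontal multiplier (H in cm)
    VM = 1.0 - 0.003 * abs(V - 75.0)   # vertical multiplier (V in cm)
    DM = 0.82 + 4.5 / D                # distance multiplier (D in cm)
    AM = 1.0 - 0.0032 * A              # asymmetry multiplier (A in degrees)
    return LC * HM * VM * DM * AM * FM * CM   # FM, CM taken from the NIOSH tables

def lift_index(load_kg, **task):
    return load_kg / recommended_weight_limit(**task)

# Hypothetical task: 15 kg load, H=40 cm, V=60 cm, D=50 cm, A=30 degrees.
li = lift_index(15.0, H=40, V=60, D=50, A=30, FM=0.88, CM=0.95)
print(f"LI = {li:.2f} ->", "high-risk" if li > 1.0 else "low-risk")
```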

Findings

The work clearly demonstrates the accuracy and computational ability of ANN, DA and SVM for data classification problems in general and LI data in particular. The research suggests that SVM may outperform ANN and DA.

Research limitations/implications

The research is limited to a particular kind of data; it may be extended by selecting different controllable parameters and model specifications. The study can also be applied to realistic problems of manual loading. It is expected that this will help researchers, designers and practicing engineers by making them aware of the performance of classification techniques in this area.

Originality/value

The objective of this research work is to assess and compare the relative performance of some well-known classification techniques, namely DA, ANN and SVM. The results suggest that data characteristics considerably impact the classification performance of these methods.

Details

Benchmarking: An International Journal, vol. 25 no. 2
Type: Research Article
ISSN: 1463-5771


Article
Publication date: 17 August 2018

Youlong Lv, Wei Qin, Jungang Yang and Jie Zhang


Abstract

Purpose

Three adjustment modes are alternatives for mixed-model assembly lines (MMALs) to improve their production plans according to constantly changing customer requirements. The purpose of this paper is to deal with the decision-making problem between these modes by proposing a novel multi-classification method. This method recommends appropriate adjustment modes for the assembly lines faced with different customer orders through machine learning from historical data.

Design/methodology/approach

The decision-making method uses the classification model composed of an input layer, two intermediate layers and an output layer. The input layer describes the assembly line in a knowledge-intensive manner by presenting the impact degrees of production parameters on line performances. The first intermediate layer provides the support vector data description (SVDD) of each adjustment mode through historical data training. The second intermediate layer employs the Dempster–Shafer (D–S) theory to combine the posterior classification possibilities generated from different SVDDs. The output layer gives the adjustment mode with the maximum posterior possibility as the classification result according to Bayesian decision theory.
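The combination step in the second intermediate layer follows Dempster's rule; a small sketch of that rule over a three-mode frame is given below. The mass values are illustrative, and the SVDD training that would produce them is not reproduced.

```python
# Dempster's rule of combination over a frame of three adjustment modes.
# Each mass function maps a frozenset of modes to a belief mass summing to 1.
from itertools import product

def combine(m1, m2):
    fused, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                      # mass on conflicting pairs
    return {k: v / (1.0 - conflict) for k, v in fused.items()}

modes = frozenset({"mode1", "mode2", "mode3"})
# Illustrative masses from two SVDD-based classifiers (not real posteriors).
m_a = {frozenset({"mode1"}): 0.6, frozenset({"mode2"}): 0.3, modes: 0.1}
m_b = {frozenset({"mode1"}): 0.5, frozenset({"mode3"}): 0.4, modes: 0.1}

fused = combine(m_a, m_b)
decision = max(fused, key=fused.get)                  # maximum-belief decision
print({tuple(sorted(k)): round(v, 3) for k, v in fused.items()}, "->", sorted(decision))
```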

Findings

The proposed method achieves higher classification accuracies than the support vector machine methods and the traditional SVDD method in the numerical test consisting of data sets from the machine-learning repository and the case study of a diesel engine assembly line.

Practical implications

This research recommends appropriate adjustment modes for MMALs in response to customer demand changes. According to the suggested adjustment mode, the managers can improve the line performance more effectively by using the well-designed optimization methods for a specific scope.

Originality/value

The adjustment mode decision is a multi-classification problem characterized by limited historical data. Although traditional SVDD methods can solve such problems by providing the posterior possibility of each classification result, they might have poor classification accuracies owing to the conflicts and uncertainties of these possibilities. This paper develops a novel classification model that integrates the SVDD method with the D–S theory. By handling the conflicts and uncertainties appropriately, this model achieves higher classification accuracies than traditional methods.

Details

Industrial Management & Data Systems, vol. 118 no. 8
Type: Research Article
ISSN: 0263-5577


Article
Publication date: 13 August 2019

Hongshan Xiao and Yu Wang


Abstract

Purpose

Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach

A measurement is first developed for quantifying and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
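A hedged sketch of such a pipeline is shown below: orthogonal factor analysis transforms the features, samples are partitioned by clustering the factor scores and a separate classifier is fitted per partition. The synthetic data, the number of factors and clusters, and the logistic regression base learner are assumptions, not the paper's configuration.

```python
# Sketch: factor analysis -> cluster the factor scores -> one classifier per cluster.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X, y = rng.normal(size=(400, 15)), rng.integers(0, 2, 400)

fa = FactorAnalysis(n_components=4, rotation="varimax").fit(X)
scores = fa.transform(X)                                   # orthogonal factor scores
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
local = {c: LogisticRegression(max_iter=1000).fit(X[clusters.labels_ == c],
                                                  y[clusters.labels_ == c])
         for c in range(3)}

def predict(X_new):
    # Route each sample to the classifier of its nearest factor-score cluster.
    c_new = clusters.predict(fa.transform(X_new))
    return np.array([local[c].predict(x.reshape(1, -1))[0] for c, x in zip(c_new, X_new)])

print(predict(rng.normal(size=(5, 15))))
```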

Findings

The proposed approach has two main advantages over previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second advantage rests on sample partitioning to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.

Research limitations/implications

The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications

Measuring and eliminating the feature space heterogeneity possibly existing in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques to real-world problems.

Originality/value

A measurement based on meta-analysis is developed for identifying any significant feature space heterogeneity in a classification problem, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.

Details

Kybernetes, vol. 48 no. 9
Type: Research Article
ISSN: 0368-492X


Article
Publication date: 7 November 2016

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab


Abstract

Purpose

Multi-label text classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains, for reasons such as the rapid growth of online data and the increasing tendency of internet users to assign multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of the labels makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several approaches have been proposed to benefit from the rich literature of TC through what are called problem transformation (PT) methods. Basically, PT methods transform multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of the MTC of Arabic articles.

Design/methodology/approach

This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
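A minimal sketch of the lexicon-based idea (not the paper's exact procedure) follows: per-label keyword sets are extracted from training documents, and a test document receives every label whose keyword overlap clears that label's threshold. The tiny English corpus, the frequency-based association score and the unit thresholds are placeholders.

```python
# Per-label keyword lexicons built from training data, then threshold-based assignment.
from collections import Counter

train = [
    ("the economy and markets and inflation", {"economy"}),
    ("football match and league results", {"sports"}),
    ("markets react to the football economy", {"economy", "sports"}),
]

def top_keywords(label, k=5):
    # Toy association score: raw word frequency within documents carrying the label.
    words = Counter(w for doc, labels in train if label in labels for w in doc.split())
    return {w for w, _ in words.most_common(k)}

labels = {"economy", "sports"}
lexicon = {lab: top_keywords(lab) for lab in labels}
threshold = {lab: 1 for lab in labels}        # illustrative per-label thresholds

def classify(doc):
    words = set(doc.split())
    return {lab for lab in labels if len(words & lexicon[lab]) >= threshold[lab]}

print(classify("inflation hits the football league"))
```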

Findings

The experiments show that the presented approach outperforms the currently available approaches. Specifically, the results show that the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach can reach an accuracy of 31 per cent.

Originality/value

Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.

Details

International Journal of Web Information Systems, vol. 12 no. 4
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 29 December 2023

Thanh-Nghi Do and Minh-Thu Tran-Nguyen


Abstract

Purpose

This study aims to propose novel edge device-tailored federated learning algorithms of local classifiers (stochastic gradient descent, support vector machines), namely, FL-lSGD and FL-lSVM. These algorithms are designed to address the challenge of large-scale ImageNet classification.

Design/methodology/approach

The authors’ FL-lSGD and FL-lSVM train in a parallel and incremental manner to build an ensemble of local classifiers on Raspberry Pis without requiring data exchange. The algorithms sequentially load small data blocks of the local training subset stored on the Raspberry Pi to train the local classifiers. Each data block is split into k partitions using the k-means algorithm, and models are trained in parallel on each data partition to enable local data classification.
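A rough single-device sketch of this local training scheme is given below: data blocks are loaded in turn, each block is split into k partitions with k-means, and one linear SGD classifier is fitted per partition to form the local ensemble. The synthetic data and the sequential (rather than parallel) training are simplifications, and the federated aggregation across Raspberry Pis is not shown.

```python
# Sketch of one edge device's local training: stream data blocks, k-means-split each
# block into k partitions, fit one linear SGD classifier per partition.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(5)
k = 3
ensemble = []                                          # (kmeans, models) per data block

def load_blocks(n_blocks=4, block_size=600, dim=32):   # stand-in for reading blocks from disk
    for _ in range(n_blocks):
        yield rng.normal(size=(block_size, dim)), rng.integers(0, 5, block_size)

for X, y in load_blocks():
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    models = []
    for part in range(k):
        idx = km.labels_ == part
        models.append(SGDClassifier(max_iter=1000).fit(X[idx], y[idx]))
    ensemble.append((km, models))

# Local prediction: route the sample to the nearest partition's model within each
# block's sub-ensemble, then take a majority vote across blocks.
x = rng.normal(size=(1, 32))
votes = [models[km.predict(x)[0]].predict(x)[0] for km, models in ensemble]
print("predicted class:", np.bincount(votes).argmax())
```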

Findings

Empirical test results on the ImageNet data set show that the authors’ FL-lSGD and FL-lSVM algorithms with 4 Raspberry Pis (Quad core Cortex-A72, ARM v8, 64-bit SoC @ 1.5GHz, 4GB RAM) are faster than the state-of-the-art LIBLINEAR algorithm run on a PC (Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores, 32GB RAM).

Originality/value

Efficiently addressing the challenge of large-scale ImageNet classification, the authors’ novel federated learning algorithms of local classifiers have been tailored to work on the Raspberry Pi. These algorithms can handle 1,281,167 images and 1,000 classes effectively.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 3 January 2018

Lei La, Shuyan Cao and Liangjuan Qin


Abstract

Purpose

As a foundational issue in social mining, sentiment classification suffers from a lack of labeled data. To enhance the accuracy of classification with few labeled data, many semi-supervised algorithms have been proposed. These algorithms improve classification performance when labeled data are insufficient. However, precision and efficiency are difficult to ensure at the same time in many semi-supervised methods. This paper aims to present a novel method for using unlabeled data in a more accurate and more efficient way.

Design/methodology/approach

First, the authors designed a boosting-based method for unlabeled data selection. The improved boosting-based method can choose unlabeled data that have the same distribution as the labeled data. The authors then proposed a novel strategy that combines weak classifiers into strong classifiers in a more rational way. Finally, a semi-supervised sentiment classification algorithm is given.
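A generic boosting-based self-training loop in the spirit of this design is sketched below: an AdaBoost model trained on the labeled set pseudo-labels the unlabeled samples it is most confident about, which are then added for retraining. The authors' transfer-learning-based weighting and distribution-matching selection are not reproduced; the confidence threshold and synthetic data are assumptions.

```python
# Generic boosting-based self-training loop (not the paper's exact weighting scheme):
# train AdaBoost on labeled data, pseudo-label confident unlabeled samples, retrain.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(6)
X_lab, y_lab = rng.normal(size=(100, 20)), rng.integers(0, 2, 100)
X_unl = rng.normal(size=(1000, 20))

for round_ in range(3):
    clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_lab, y_lab)
    proba = clf.predict_proba(X_unl)
    confident = proba.max(axis=1) > 0.9               # keep only confident pseudo-labels
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unl[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unl = X_unl[~confident]
    print(f"round {round_}: added {confident.sum()} pseudo-labeled samples")

print("final labeled set size:", len(X_lab))
```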

Findings

Experimental results demonstrate that the novel algorithm achieves high accuracy with low time consumption, which is helpful for building high-performance social network-related applications.

Research limitations/implications

The novel method needs a small labeled data set for semi-supervised learning. In future work, the authors may improve it into a fully unsupervised method.

Practical implications

The method can be used in text mining, image classification, audio processing and other unstructured data mining-related fields. It overcomes the problem of insufficient labeled data and achieves high precision with less computational time.

Social implications

Sentiment mining has wide applications in public opinion management, public security, market analysis, social network and related fields. Sentiment classification is the basis of sentiment mining.

Originality/value

To the best of the authors' knowledge, this is the first time transfer learning has been introduced into AdaBoost for semi-supervised learning. Moreover, the improved AdaBoost uses a totally new mechanism for weighting.

Details

Kybernetes, vol. 47 no. 3
Type: Research Article
ISSN: 0368-492X

