Search results

1 – 10 of 405
Article
Publication date: 6 March 2007

Sven Sandow and Xuelong Zhou

Abstract

Purpose

Investors often rely on probabilistic models that were learned from small historical labeled datasets. The purpose of this article is to propose a new method for data‐efficient model learning.

Design/methodology/approach

The proposed method, which is an extension of the standard minimum relative entropy (MRE) approach and has a clear financial interpretation, belongs to the class of semi‐supervised algorithms, which can learn from data that are only partially labeled with values of the variable of interest.
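For orientation, the standard relative entropy and the generic MRE problem can be written as below. This is the textbook form only, not the semi-supervised extension the article proposes; the symbols $p$ (model), $q$ (prior), $f_j$ (feature functions) and $\tilde{E}$ (sample average over the labeled data) are notation assumed here for illustration.

```latex
D(p \,\|\, q) = \sum_{y} p(y) \, \log \frac{p(y)}{q(y)},
\qquad
p^{*} = \arg\min_{p} \; D(p \,\|\, q)
\quad \text{subject to} \quad
E_{p}[f_j] = \tilde{E}[f_j], \; j = 1, \dots, J .
```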

Findings

This study tests the method on an artificial dataset and uses it to learn a model for recovery of defaulted debt. In both cases, the resulting models perform better than the standard MRE model, when the number of labeled data is small.

Originality/value

The method can be applied to financial problems where labeled data are sparse but unlabeled data are readily available.

Details

The Journal of Risk Finance, vol. 8 no. 2
Type: Research Article
ISSN: 1526-5943

Article
Publication date: 3 January 2018

Lei La, Shuyan Cao and Liangjuan Qin

Abstract

Purpose

As a foundational issue of social mining, sentiment classification suffers from a lack of labeled data. To enhance classification accuracy with few labeled data, many semi-supervised algorithms have been proposed. These algorithms improve classification performance when labeled data are insufficient. However, many semi-supervised methods struggle to ensure precision and efficiency at the same time. This paper aims to present a novel method for using unlabeled data in a more accurate and more efficient way.

Design/methodology/approach

First, the authors design a boosting-based method for unlabeled data selection. The improved boosting-based method can choose unlabeled data that have the same distribution as the labeled data. The authors then propose a novel strategy that combines weak classifiers into strong classifiers in a more rational way. Finally, a semi-supervised sentiment classification algorithm is given.
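As background for how weak classifiers are combined into a strong one, the following is a minimal sketch of the standard AdaBoost weighted vote. It is not the authors' improved weighting mechanism, which the abstract does not specify; the function names and the error rates are hypothetical and chosen only for illustration.

```python
import math


def classifier_weight(error_rate):
    """Standard AdaBoost vote weight: alpha = 0.5 * ln((1 - err) / err)."""
    eps = 1e-10  # keep the logarithm finite when err is exactly 0 or 1
    err = min(max(error_rate, eps), 1 - eps)
    return 0.5 * math.log((1 - err) / err)


def strong_classify(weak_predictions, alphas):
    """Combine weak votes in {-1, +1} by the sign of their weighted sum."""
    score = sum(a * p for a, p in zip(alphas, weak_predictions))
    return 1 if score >= 0 else -1


# Hypothetical weighted error rates of three weak classifiers.
alphas = [classifier_weight(e) for e in (0.1, 0.4, 0.45)]

# One accurate classifier votes +1, two weaker ones vote -1.
label = strong_classify([+1, -1, -1], alphas)
```

In this toy case the single low-error classifier outweighs the two near-random ones, which is the behavior the weighted vote is meant to produce.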

Findings

Experimental results demonstrate that the novel algorithm can achieve high accuracy with low time consumption. It is helpful for building high-performance social network-related applications.

Research limitations/implications

The novel method needs a small labeled data set for semi-supervised learning. Future work may extend it to an unsupervised method.

Practical implications

The method can be used in text mining, image classification, audio processing and other fields related to unstructured data mining. It overcomes the problem of insufficient labeled data and achieves high precision with less computation time.

Social implications

Sentiment mining has wide applications in public opinion management, public security, market analysis, social network and related fields. Sentiment classification is the basis of sentiment mining.

Originality/value

To the authors' knowledge, this is the first time transfer learning has been introduced to AdaBoost for semi-supervised learning. Moreover, the improved AdaBoost uses a totally new mechanism for weighting.

Details

Kybernetes, vol. 47 no. 3
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 25 October 2018

Yoon-Sung Kim, Hae-Chang Rim and Do-Gil Lee

Abstract

Purpose

The purpose of this paper is to propose a methodology to analyze a large amount of unstructured textual data into categories of business environmental analysis frameworks.

Design/methodology/approach

This paper uses machine learning to classify a vast amount of unstructured textual data by category of business environmental analysis framework. Generally, producing large volumes of high-quality training data for a machine-learning-based system is costly. Semi-supervised learning techniques are used to improve the classification performance. Additionally, the lack-of-features problem that traditional classification systems have suffered from is resolved by applying semantic features based on word embedding, a new technique in text mining.

Findings

The proposed methodology can be used for various business environmental analyses and the system is fully automated in both the training and classifying phases. Semi-supervised learning can solve the problems with insufficient training data. The proposed semantic features can be helpful for improving traditional classification systems.

Research limitations/implications

This paper focuses on classifying sentences that contain business environmental analysis information in a large number of documents. However, the proposed methodology is limited with respect to advanced analyses that can directly help managers establish strategies, since it does not summarize the environmental variables implied in the classified sentences. Advanced summarization and recommendation techniques could extract the environmental variables from the sentences and help managers establish effective strategies.

Originality/value

The feature selection technique developed in this paper has not been used in traditional systems for business and industry, and it allows the whole process to be fully automated. The paper also demonstrates the technique's practicality, showing that it can be applied to various business environmental analysis frameworks. In addition, the system is more economical than traditional systems because of semi-supervised learning, and it can resolve the lack-of-features problem that traditional systems suffer from. This work is valuable for analyzing environmental factors and establishing strategies for companies.

Details

Industrial Management & Data Systems, vol. 119 no. 1
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 7 February 2023

Riju Bhattacharya, Naresh Kumar Nagwani and Sarsij Tripathi

Abstract

Purpose

A community demonstrates the unique qualities and relationships between its members that distinguish it from other communities within a network. Network analysis relies heavily on community detection. Alongside traditional spectral clustering and statistical inference methods, deep learning techniques for community detection have grown in popularity because they handle high-dimensional network data easily. Graph convolutional neural networks (GCNNs) have received much attention recently and have developed into a promising and ubiquitous method for directly detecting communities on graphs. Inspired by the promising results of graph convolutional networks (GCNs) in analyzing graph-structured data, this work proposes a novel community graph convolutional network (CommunityGCN) as a semi-supervised node classification model and compares it with recent baseline methods: graph attention network (GAT), a GCN-based technique for unsupervised community detection and Markov random fields combined with graph convolutional network (MRFasGCN).

Design/methodology/approach

This work presents a method for identifying communities that combines the notion of node classification via message passing with the architecture of a semi-supervised graph neural network. Six benchmark datasets, namely, Cora, CiteSeer, ACM, Karate, IMDB and Facebook, have been used in the experimentation.

Findings

In the first set of experiments, the scaled normalized average matrix of all neighbors' features, including those of the node itself, was obtained, followed by the weighted average matrix of low-dimensional nodes. In the second set of experiments, the average weighted matrix was forwarded to a two-layer GCN, and the activation function for predicting the node class was applied. The results demonstrate that node classification with GCN can improve the performance of identifying communities on graph datasets.
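The two steps described above (averaging each node's features with those of its neighbors, then passing the result through a two-layer GCN) can be sketched roughly as follows. This is a dependency-free illustration, not the CommunityGCN implementation: the row-normalized mean, the ReLU and softmax activations, the toy graph and the weight matrices are all assumptions, since the abstract does not specify them.

```python
import math


def mean_aggregate(adj, feats):
    """Average each node's features with its neighbours' (self-loop included)."""
    n = len(adj)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        out.append([sum(feats[j][k] for j in nbrs) / len(nbrs)
                    for k in range(len(feats[0]))])
    return out


def matmul(a, b):
    """Plain-list matrix product of a (n x d) and b (d x h)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(adj, feats, w1, w2):
    """Two aggregate-then-transform layers ending in class probabilities."""
    hidden = relu(matmul(mean_aggregate(adj, feats), w1))
    return softmax_rows(matmul(mean_aggregate(adj, hidden), w2))


# Toy path graph 0 - 1 - 2 with two features per node (hypothetical values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[1.0, -1.0], [0.5, 1.0]]  # untrained, illustrative weights
w2 = [[1.0, 0.0], [0.0, 1.0]]
probs = gcn_forward(adj, feats, w1, w2)  # one probability row per node
```

In practice the weights would be learned from the labeled nodes, and a symmetric normalization of the adjacency matrix is a common alternative to the simple mean used here.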

Originality/value

The experiment reveals that the CommunityGCN approach has given better results with accuracy, normalized mutual information, F1 and modularity scores of 91.26, 79.9, 92.58 and 70.5 per cent, respectively, for detecting communities in the graph network, which is much greater than the range of 55.7–87.07 per cent reported in previous literature. Thus, it has been concluded that the GCN with node classification models has improved the accuracy.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 19 April 2022

D. Divya, Bhasi Marath and M.B. Santosh Kumar

Abstract

Purpose

This study aims to raise awareness of the development of fault detection systems that use data collected from sensor devices/physical devices of various systems for predictive maintenance. Opportunities and challenges in developing anomaly detection algorithms for predictive maintenance and unexplored areas in this context are also discussed.

Design/methodology/approach

To conduct a systematic review of the state-of-the-art algorithms in fault detection for predictive maintenance, review papers from the years 2017–2021 available in the Scopus database were selected. A total of 93 papers were chosen. They are classified under electrical and electronics, civil and construction, automobile, production and mechanical. In addition, the paper provides a detailed discussion of various fault-detection algorithms that can be categorised under supervised, semi-supervised and unsupervised learning and traditional statistical methods, along with an analysis of various forms of anomalies prevalent across different sectors of industry.

Findings

Based on the literature reviewed, seven propositions with a focus on the following areas are presented: the need for a uniform framework while scaling the number of sensors; the need for identification of erroneous parameters; the need for new algorithms based on unsupervised and semi-supervised learning; the importance of ensemble learning and data fusion algorithms; the necessity of automatic fault diagnostic systems; concerns about multiple fault detection; and cost-effective fault detection. These propositions shed light on the unsolved issues of predictive maintenance using fault detection algorithms. A novel architecture based on the methodologies and propositions gives the reader more clarity for exploring this area further.

Originality/value

Papers for this study were selected from the Scopus database for predictive maintenance in the field of fault detection. Review papers published in this area deal only with methods used to detect anomalies, whereas this paper attempts to establish a link between different industrial domains and the methods used in each industry that uses fault detection for predictive maintenance.

Details

Journal of Quality in Maintenance Engineering, vol. 29 no. 2
Type: Research Article
ISSN: 1355-2511

Article
Publication date: 17 October 2022

Fengwei Jing, Mengyang Zhang, Jie Li, Guozheng Xu and Jing Wang

Abstract

Purpose

Coil shape quality is the external representation of strip product quality, and it is also a direct reflection of strip production process level. This paper aims to predict the coil shape results in advance based on the real-time data through the designed algorithm.

Design/methodology/approach

Given the strip production scale and coil shape application requirements, this paper proposes a strip coil shape defect prediction algorithm based on Siamese semi-supervised denoising auto-encoder (DAE)-convolutional neural networks. The prediction algorithm first reconstructs the information eigenvectors using the DAE, then combines the convolutional neural networks with skip connections to further process the eigenvectors and finally compares the eigenvectors using a fully connected neural network to predict the strip coil shape condition.

Findings

The performance of the model is verified using coil shape data from a steel mill. The results show that the overall prediction accuracy, recall rate and F-measure of the model are significantly better than those of other commonly used classification models, with each index exceeding 88%. In addition, the model's predictions for strip coil shapes of different steel grades are very stable, and the model has strong generalization ability.
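For readers unfamiliar with the reported indices, the sketch below shows how precision, recall and the F-measure (here F1, the harmonic mean of the two) are conventionally computed for a binary defect label. The function name and the toy labels are hypothetical; they are not the steel mill data, and the abstract does not say which F-measure variant was used.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = defective coil shape, 0 = normal.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With three true positives, one false positive and one false negative, all three values come out at 0.75 in this toy case.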

Originality/value

This research provides technical support for the adjustment and optimization of strip coil shape process based on the data-driven level, which helps to improve the production quality and intelligence level of hot strip continuous rolling.

Details

Assembly Automation, vol. 42 no. 6
Type: Research Article
ISSN: 0144-5154

Article
Publication date: 20 July 2023

Mu Shengdong, Liu Yunjie and Gu Jijian

Abstract

Purpose

By introducing the Stacking algorithm to solve the underfitting problem caused by insufficient data in traditional machine learning, this paper provides a new solution to the cold start problem of entrepreneurial borrowing risk control.

Design/methodology/approach

The authors introduce semi-supervised learning and integrated learning (ensemble learning) into the field of migration learning (transfer learning) and propose Stacking model migration learning. The approach trains models independently on entrepreneurial borrowing credit data, treats the migration strategy itself as the learning object and uses the Stacking algorithm to combine the prediction results of the source domain model and the target domain model.
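The idea of combining source-domain and target-domain predictions with a learned second stage can be illustrated with a deliberately simple blend. This sketch swaps in a one-parameter linear meta-learner fitted by grid search on held-out predictions, which is far simpler than a full Stacking setup and is not the authors' model; all names and numbers are hypothetical.

```python
def blend(source_p, target_p, w):
    """Weighted average of two models' predicted default probabilities."""
    return [w * s + (1 - w) * t for s, t in zip(source_p, target_p)]


def squared_error(preds, labels):
    return sum((p - y) ** 2 for p, y in zip(preds, labels))


def fit_blend_weight(source_p, target_p, labels, steps=100):
    """Pick the source-model weight in [0, 1] with the lowest squared error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda w: squared_error(blend(source_p, target_p, w), labels))


# Hypothetical held-out predictions from the two base models.
source_p = [0.9, 0.3, 0.6, 0.1]  # model trained on traditional credit data
target_p = [0.6, 0.1, 0.9, 0.4]  # model trained on entrepreneurial borrowing data
labels = [1, 0, 1, 0]

w = fit_blend_weight(source_p, target_p, labels)
stacked = blend(source_p, target_p, w)
```

On this toy data the fitted blend has a lower squared error than either base model alone, which is the effect a second-stage learner aims for. A real Stacking pipeline would fit the meta-learner on out-of-fold predictions to avoid leaking labels.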

Findings

The effectiveness of the two migration learning models is evaluated with real entrepreneurial borrowing data. The Stacking-based model migration learning further improves on the benchmark model without migration learning techniques, with the model's area under the curve (AUC) rising to 0.8. Comparing the two migration learning models reveals that the model-based migration learning approach performs better. The reason is that the sample-based migration learning approach only eliminates the noisy samples that are relatively less similar to the entrepreneurial borrowing data. However, the calculation and weighting of similarity are subjective, and there is no unified judgment standard or operating method, so there is no guarantee that the retained traditional credit samples have the same sample distribution and feature structure as the entrepreneurial borrowing data.

Practical implications

From a practical standpoint, this paper provides a new solution to the cold start problem of entrepreneurial borrowing risk control. A small number of labeled high-quality samples cannot support the learning and deployment of big data risk control models; this is the cold start problem of the entrepreneurial borrowing risk control system. By extending the training sample set with auxiliary domain data through suitable migration learning methods, the prediction performance of the model can be improved to a certain extent and more general patterns can be learned.

Originality/value

This paper introduces the migration learning approach to the entrepreneurial borrowing scenario, provides a new solution to the cold start problem of the entrepreneurial borrowing risk control system and verifies, through empirical data, the feasibility and effectiveness of the migration learning method in the risk control field.

Details

Management Decision, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0025-1747

Article
Publication date: 16 October 2018

Guan Yuan, Zhaohui Wang, Fanrong Meng, Qiuyan Yan and Shixiong Xia

Abstract

Purpose

Currently, ubiquitous smartphones embedded with various sensors provide a convenient way to collect raw sequence data. These data bridge the gap between human activity and multiple sensors. Human activity recognition has been widely used in many aspects of daily life, such as medical security, personal safety and living assistance.

Design/methodology/approach

To provide an overview, the authors survey and summarize some important technologies and key issues of human activity recognition, including activity categorization, feature engineering and typical algorithms presented in recent years. In this paper, the authors first introduce the characteristics of embedded sensors and discuss their features, and survey some data labeling strategies for obtaining ground truth labels. Second, following the process of human activity recognition, the authors discuss the methods and techniques of raw data preprocessing and feature extraction, and summarize some popular algorithms used in model training and activity recognition. Third, they introduce some interesting application scenarios of human activity recognition and provide some available data sets as ground truth data to validate proposed algorithms.

Findings

The authors summarize their viewpoints on human activity recognition, discuss the main challenges and point out some potential research directions.

Originality/value

It is hoped that this work will serve as the steppingstone for those interested in advancing human activity recognition.

Details

Sensor Review, vol. 39 no. 2
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 1 November 2021

Vishakha Pareek, Santanu Chaudhury and Sanjay Singh

Abstract

Purpose

The electronic nose is an array of chemical or gas sensors associated with a pattern-recognition framework capable of identifying and classifying odorant or non-odorant and simple or complex gases. Despite more than 30 years of research, robust e-nose devices remain limited. Most of the challenges towards reliable e-nose devices are associated with the non-stationary environment and non-stationary sensor behaviour. The data distribution of the sensor array response evolves with time, which is referred to as non-stationarity. The purpose of this paper is to provide a comprehensive introduction to challenges related to non-stationarity in e-nose design and to review the existing literature from an application, system and algorithm perspective to provide an integrated and practical view.

Design/methodology/approach

The authors discuss non-stationary data in general and the challenges in e-nose design related to the non-stationary environment or non-stationary sensor behaviour. The challenges are categorised and discussed from the perspective of learning with data obtained from the sensor systems. Later, the e-nose technology is reviewed from the system, application and algorithmic points of view to discuss the current status.

Findings

The discussion of challenges in e-nose design will be beneficial for researchers as well as practitioners, as it presents a comprehensive view of multiple aspects of non-stationary learning, systems, algorithms and applications for e-nose. The paper presents a review of the pattern-recognition techniques and public data sets that are commonly referred to in olfactory research. Generic techniques for learning in the non-stationary environment are also presented. The authors discuss the future direction of research and major open problems related to handling non-stationarity in e-nose design.

Originality/value

The authors review, for the first time, the existing literature related to learning with e-nose in a non-stationary environment together with existing generic pattern-recognition algorithms for learning in the non-stationary environment, to bridge the gap between the two. The authors also present details of publicly available sensor array data sets, which will benefit upcoming researchers in this field. The authors further emphasise several open problems and future directions, which should be considered to provide efficient solutions that can handle non-stationarity and make e-nose the next everyday device.

Article
Publication date: 1 October 2005

Marko Grobelnik and Dunja Mladenić

Abstract

Purpose

To present approaches and some research results from various research areas contributing to knowledge discovery from different sources and different data forms, at different scales and for different purposes.

Design/methodology/approach

Contribute to knowledge management by applying knowledge discovery approaches that enable the computer to search for the relevant knowledge while humans give just broad directions.

Findings

Knowledge discovery techniques proved to be very appropriate for many problems related to knowledge management. Surprisingly, it is often the case that relatively simple approaches already provide valuable results.

Research limitations/implications

There are still many open problems and scalability issues that arise when dealing with real‐world data, especially in the areas involving text and network analysis.

Practical implications

Each problem should be handled with care, taking into account different aspects and selecting/extending the most appropriate available methods or developing some new approaches.

Originality/value

This paper provides an interesting collection of selected knowledge discovery methods applied in different contexts but all contributing in some way to knowledge management. Several of the reported approaches were developed in collaboration with the authors of the paper, with special emphasis on their usability for practical problems involving knowledge management.

Details

Journal of Knowledge Management, vol. 9 no. 5
Type: Research Article
ISSN: 1367-3270
