Search results

1 – 10 of over 16000

Article

Publication date: 6 February 2017

Hybrid supervised clustering based ensemble scheme for text classification

Abstract

Purpose

The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification is an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of an ensemble, which is a key issue in ensemble design.

Design/methodology/approach

An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, which is based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as Naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
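The combination step described above, majority voting over individual classifiers, can be sketched as follows. This is only a minimal illustration: the three base classifiers are hypothetical stand-ins for models trained on the diversified subsets, the tie-breaking rule is an assumption, and the cuckoo search and k-means clustering stage is not reproduced.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers.

    Ties go to the label that appears first in `predictions`
    (an assumption; the abstract does not specify tie handling).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(classifiers, sample):
    """Apply every base classifier to `sample` and combine by majority vote."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical base classifiers, each standing in for a model trained on
# one diversified training subset.
classifiers = [
    lambda text: "sports" if "match" in text else "finance",
    lambda text: "sports" if "goal" in text else "finance",
    lambda text: "finance" if "stock" in text else "sports",
]

print(ensemble_predict(classifiers, "late goal wins the match"))
```

Majority voting needs only the predicted labels, so the base classifiers can be of any type.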

Findings

The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.

Originality/value

The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.

Details

Kybernetes, vol. 46 no. 2

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 6 March 2007

Data‐efficient model building for financial applications: A semi‐supervised learning approach

Sven Sandow and Xuelong Zhou

Abstract

Purpose

Investors often rely on probabilistic models that were learned from small historical labeled datasets. The purpose of this article is to propose a new method for data‐efficient model learning.

Design/methodology/approach

The proposed method, which is an extension of the standard minimum relative entropy (MRE) approach and has a clear financial interpretation, belongs to the class of semi‐supervised algorithms, which can learn from data that are only partially labeled with values of the variable of interest.
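For orientation, the quantity at the heart of a minimum relative entropy approach is the relative entropy (Kullback-Leibler divergence) between a candidate model and a prior. The sketch below computes only this standard quantity for discrete distributions. It is not the authors' semi-supervised extension, and the function name is an illustrative assumption.

```python
import math


def relative_entropy(p, q):
    """Relative entropy D(p || q) of discrete distributions, in nats."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p_i = 0 contributes nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# A model identical to the prior has zero relative entropy.
print(relative_entropy([0.5, 0.5], [0.5, 0.5]))
```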

Findings

This study tests the method on an artificial dataset and uses it to learn a model for recovery of defaulted debt. In both cases, the resulting models perform better than the standard MRE model when the amount of labeled data is small.

Originality/value

The method can be applied to financial problems where labeled data are sparse but unlabeled data are readily available.

Details

The Journal of Risk Finance, vol. 8 no. 2

Type: Research Article

ISSN: 1526-5943

Article

Publication date: 5 April 2021

Evaluating disaster-related tweet credibility using content-based and user-based features

Nasser Assery, Yuan (Dorothy) Xiaohong, Qu Xiuli, Roy Kaushik and Sultan Almalki

Abstract

Purpose

This study aims to propose an unsupervised learning model to evaluate the credibility of disaster-related Twitter data and present a performance comparison with commonly used supervised machine learning models.

Design/methodology/approach

First, historical tweets on two recent hurricane events are collected via the Twitter API. Then a credibility scoring system is implemented in which the tweet features are analyzed to give each tweet a credibility score and a credibility label. After that, supervised machine learning classification is implemented using various classification algorithms and their performances are compared.
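A rule-based scoring step of this kind might look like the sketch below. The abstract does not list the features, weights or threshold, so every feature name, point value and cut-off here is a hypothetical placeholder; only the idea of summing user-based and content-based points into a score out of 10 and deriving a label follows the description.

```python
def credibility_score(tweet):
    """Score a tweet from 0 to 10 using hypothetical feature rules."""
    score = 0
    # User-based features (hypothetical choices and weights)
    if tweet.get("verified"):
        score += 3
    if tweet.get("followers", 0) >= 1000:
        score += 2
    if tweet.get("account_age_days", 0) >= 365:
        score += 1
    # Content-based features (hypothetical choices and weights)
    if tweet.get("has_url"):
        score += 2
    if tweet.get("has_media"):
        score += 1
    if not tweet.get("all_caps"):
        score += 1
    return score


def credibility_label(score, threshold=5):
    """Turn a score into a label; the threshold is an assumption."""
    return "credible" if score >= threshold else "not credible"


example = {"verified": True, "followers": 2500, "has_url": True}
print(credibility_score(example), credibility_label(credibility_score(example)))
```

Labels produced this way could then serve as training targets for the supervised classifiers that the study compares.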

Findings

The proposed unsupervised learning model could enhance the emergency response by providing a fast way to determine the credibility of disaster-related tweets. Additionally, the comparison of the supervised classification models reveals that the Random Forest classifier performs significantly better than the SVM and Logistic Regression classifiers in classifying the credibility of disaster-related tweets.

Originality/value

In this paper, an unsupervised 10-point scoring model is proposed to evaluate the tweets’ credibility based on the user-based and content-based features. This technique could be used to evaluate the credibility of disaster-related tweets on future hurricanes and would have the potential to enhance emergency response during critical events. The comparative study of different supervised learning methods has revealed effective supervised learning methods for evaluating the credibility of Twitter data.

Details

Information Discovery and Delivery, vol. 50 no. 1

Type: Research Article

ISSN: 2398-6247

Article

Publication date: 30 April 2020

Thermal comfort prediction by applying supervised machine learning in green sidewalks of Tehran

Nasim Eslamirad, Soheil Malekpour Kolbadinejad, Mohammadjavad Mahdavinejad and Mohammad Mehranrad

Abstract

Purpose

This research aims to introduce a new methodology that integrates urban design strategies with supervised machine learning (SML), applying energy engineering modeling (the evaluating phase) to existing green sidewalks and statistical energy modeling (the predicting phase) to new ones. The goal is to offer algorithms that help identify the optimum morphology of green sidewalks, with high outdoor thermal comfort and fewer errors in the results.

Design/methodology/approach

The study uses SML, which predicts future outcomes from past data, implemented in Python. As in most similar studies, the work consists of two main parts: engineering energy modeling and statistical energy modeling. First, some of the 2,268 models are randomly selected, simulated and analyzed for sensitivity in ENVI-met. The ENVI-met output, namely thermal comfort expressed as predicted mean vote (PMV), together with weather items, is then used as input in Python. Finally, the resulting data set is processed by SML to reach a reliable predicted output.

Findings

The SML process allows the study to estimate the thermal comfort of the current models and of other similar sidewalks. The results are evaluated with both the PMV mathematical model and SML error evaluation functions, and they confirm that the average error is about 1%. The method is therefore reliable enough to apply in a variety of similar fields. The findings can support sustainable architecture strategies at building and urban scales, helping to determine, monitor and control energy-based behaviors (thermal comfort, heating, cooling, lighting and ventilation) in the operational phase of systems (existing elements in buildings and constructions) and in the planning and design phase of future built cases, over their whole life spans.

Research limitations/implications

Limitations of the study relate to the study variables and alternatives, which have a notable impact on the findings. Furthermore, more trustworthy input data yield more accurate output. The modeling and simulation processes are therefore the most significant part of the research for reaching exact results in the final step.

Practical implications

The findings of the study can be helpful in urban design strategies. By estimating outdoor thermal comfort with the machine learning method, urban and landscape designers, policymakers and architects can assess how their designs affect air quality and urban health, and can be confident of meeting their thermal comfort design goals in the urban atmosphere.

Social implications

By 2030, cities are expected to be living spaces for about three out of five people. Because green infrastructure moderates cities’ climate, a relationship between green spaces and inhabitants’ thermal comfort can be inferred. Although strategies for improving outdoor thermal comfort through design methods and applications are not a new subject, applying machine learning to predict results offers new insight into more effective design strategies and into providing comfort in the urban environment. The study’s social implications therefore stem from learning from previous projects and developing more efficient strategies to make cities more comfortable and healthy places to live, with more efficient models and more efficient use of money and time.

Originality/value

The study’s achievements are expected to be applicable not only in Tehran but also in other climate zones, as a pattern for more eco-city design strategies. Although similar studies have been done in other disciplines, the concept of the study is a new vision in urban studies.

Details

Smart and Sustainable Built Environment, vol. 9 no. 4

Type: Research Article

ISSN: 2046-6099

Article

Publication date: 3 January 2018

Take full advantage of unlabeled data for sentiment classification

Lei La, Shuyan Cao and Liangjuan Qin

Abstract

Purpose

As a foundational issue of social mining, sentiment classification suffers from a lack of labeled data. To enhance the accuracy of classification with few labeled data, many semi-supervised algorithms have been proposed. These algorithms improve classification performance when the labeled data are insufficient. However, precision and efficiency are difficult to ensure at the same time in many semi-supervised methods. This paper aims to present a novel method for using unlabeled data in a more accurate and more efficient way.

Design/methodology/approach

First, the authors designed a boosting-based method for unlabeled data selection. The improved boosting-based method can choose unlabeled data which have the same distribution as the labeled data. The authors then proposed a novel strategy which can combine weak classifiers into strong classifiers that are more rational. Finally, a semi-supervised sentiment classification algorithm is given.
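As background for combining weak classifiers into a strong one, the sketch below shows the weighting and voting rule of standard AdaBoost. It is not the authors' improved weighting mechanism, which the abstract does not detail; the function names and the example error rates are illustrative assumptions.

```python
import math


def classifier_weight(error_rate):
    """Standard AdaBoost weight: alpha = 0.5 * ln((1 - e) / e)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


def weighted_vote(votes, weights):
    """Combine weak-classifier votes in {-1, +1} into one prediction."""
    total = sum(w * v for v, w in zip(votes, weights))
    return 1 if total >= 0 else -1


# Hypothetical weighted error rates of three weak classifiers.
weights = [classifier_weight(e) for e in (0.1, 0.4, 0.4)]

# One accurate classifier voting +1 outweighs two weak ones voting -1.
print(weighted_vote([1, -1, -1], weights))
```

Unlike simple majority voting, this rule lets one accurate weak classifier outweigh several inaccurate ones.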

Findings

Experimental results demonstrate that the novel algorithm can achieve high accuracy with low time consumption. It is helpful for building high-performance social network-related applications.

Research limitations/implications

The novel method needs a small labeled data set for semi-supervised learning. In future work, the authors may extend it to an unsupervised method.

Practical implications

The method can be used in text mining, image classification, audio processing and other fields related to unstructured data mining. It overcomes the problem of insufficient labeled data and achieves high precision using less computational time.

Social implications

Sentiment mining has wide applications in public opinion management, public security, market analysis, social network and related fields. Sentiment classification is the basis of sentiment mining.

Originality/value

To the authors' knowledge, this is the first time transfer learning has been introduced to AdaBoost for semi-supervised learning. Moreover, the improved AdaBoost uses a new mechanism for weighting.

Details

Kybernetes, vol. 47 no. 3

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 5 May 2022

Semisupervised fault diagnosis of aeroengine based on denoising autoencoder and deep belief network

Defeng Lv, Huawei Wang and Changchang Che

Abstract

Purpose

The purpose of this study is to analyze the intelligent semisupervised fault diagnosis method of aeroengine.

Design/methodology/approach

A semisupervised fault diagnosis method based on denoising autoencoder (DAE) and deep belief network (DBN) is proposed for aeroengine. Multiple state parameters of aeroengine with long time series are processed to form high-dimensional fault samples and corresponding fault types are taken as sample labels. DAE is applied for unsupervised learning of fault samples, so as to achieve denoised dimension-reduction features. Subsequently, the extracted features and sample labels are put into DBN for supervised learning. Thus, the semisupervised fault diagnosis of aeroengine can be achieved by the combination of unsupervised learning and supervised learning.

Findings

The JT9D aeroengine data set and simulated aeroengine data set are applied to test the effectiveness of the proposed method. The result shows that the semisupervised fault diagnosis method of aeroengine based on DAE and DBN has great robustness and can maintain high accuracy of fault diagnosis under noise interference. Compared with other traditional models and separate deep learning model, the proposed method also has lower error and higher accuracy of fault diagnosis.

Originality/value

Multiple state parameters with long time series are processed to form high-dimensional fault samples. As a typical unsupervised learning method, DAE is used to denoise the fault samples and extract dimension-reduction features for subsequent deep learning. Based on supervised learning, DBN is applied to process the extracted features, and fault diagnosis of an aeroengine with multiple state parameters can be achieved through the pretraining and reverse fine-tuning of restricted Boltzmann machines.

Details

Aircraft Engineering and Aerospace Technology, vol. 94 no. 10

Type: Research Article

ISSN: 1748-8842

Article

Publication date: 25 October 2018

Business environmental analysis for textual data using data mining and sentence-level classification

Yoon-Sung Kim, Hae-Chang Rim and Do-Gil Lee

Abstract

Purpose

The purpose of this paper is to propose a methodology to analyze a large amount of unstructured textual data into categories of business environmental analysis frameworks.

Design/methodology/approach

This paper uses machine learning to classify a vast amount of unstructured textual data by category of business environmental analysis framework. Producing large amounts of high-quality training data for a machine-learning-based system is generally costly, so semi-supervised learning techniques are used to improve the classification performance. Additionally, the lack-of-features problem that traditional classification systems have suffered from is resolved by applying semantic features derived from word embedding, a new technique in text mining.

Findings

The proposed methodology can be used for various business environmental analyses and the system is fully automated in both the training and classifying phases. Semi-supervised learning can solve the problems with insufficient training data. The proposed semantic features can be helpful for improving traditional classification systems.

Research limitations/implications

This paper focuses on classifying sentences that contain business environmental analysis information in a large number of documents. However, the proposed methodology is limited with respect to advanced analyses that can directly help managers establish strategies, since it does not summarize the environmental variables implied in the classified sentences. Advanced summarization and recommendation techniques could extract the environmental variables from the sentences and help managers establish effective strategies.

Originality/value

The feature selection technique developed in this paper has not been used in traditional systems for business and industry, and it allows the whole process to be fully automated. The paper also demonstrates the technique's practicality, since it can be applied to various business environmental analysis frameworks. In addition, the system is more economical than traditional systems because of semi-supervised learning, and it can resolve the lack-of-features problem that traditional systems suffer from. This work is valuable for analyzing environmental factors and establishing strategies for companies.

Details

Industrial Management & Data Systems, vol. 119 no. 1

Type: Research Article

ISSN: 0263-5577

Open Access

Article

Publication date: 18 November 2021

Combating money laundering with machine learning – applicability of supervised-learning algorithms at cryptocurrency exchanges

Eric Pettersson Ruiz and Jannis Angelis

Abstract

Purpose

This study aims to explore how to deanonymize cryptocurrency money launderers with the help of machine learning (ML). Money is laundered through cryptocurrencies by distributing funds to multiple accounts and then reexchanging the crypto back. This process of exchanging currencies is done through cryptocurrency exchanges. Current preventive efforts are outdated, and ML may provide novel ways to identify illicit currency movements. Hence, this study investigates ML applicability for combatting money laundering activities using cryptocurrency.

Design/methodology/approach

Four supervised-learning algorithms were compared using the Bitcoin Elliptic Dataset. The method covered a quantitative analysis of the algorithmic performance, capturing differences in three key evaluation metrics of F1-scores, precision and recall. Two complementary qualitative interviews were performed at cryptocurrency exchanges to identify fit and applicability of the algorithms.
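The three evaluation metrics named above follow from counts of true positives, false positives and false negatives. The sketch below computes them for a binary task; the label encoding (1 for the positive class, for example an illicit transaction) and the sample labels are assumptions for illustration and are not drawn from the Elliptic dataset.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 1 marks the positive class, 0 the negative class.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(precision_recall_f1(y_true, y_pred))
```

When the positive class is rare, these metrics are more informative than plain accuracy, since a classifier that flags nothing can still score high accuracy.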

Findings

The study results show that the currently implemented ML tools for preventing money laundering at cryptocurrency exchanges are all too slow and need to be optimized for the task. The results also show that, while no single algorithm is most suitable for detecting transactions related to money laundering, the decision tree algorithm is the most suitable for adoption by cryptocurrency exchanges.

Originality/value

Given the growth of cryptocurrency use, this study explores the newly developed field of algorithmic tools to combat illicit currency movement, in particular in the growing arena of cryptocurrencies. The study results provide new insights into the applicability of ML as a tool to combat money laundering using cryptocurrency exchanges.

Details

Journal of Money Laundering Control, vol. 25 no. 4

Type: Research Article

ISSN: 1368-5201

Article

Publication date: 16 August 2023

A weakly supervised pairwise comparison learning approach for bearing health quantitative evaluation and remaining useful life prediction

Fanshu Zhao, Jin Cui, Mei Yuan and Juanru Zhao

Abstract

Purpose

The purpose of this paper is to present a weakly supervised learning method to perform health evaluation and predict the remaining useful life (RUL) of rolling bearings.

Design/methodology/approach

Based on the principle that bearing health degrades with increasing service time, a weakly labeled qualitative pairwise comparison dataset for bearing health is extracted from the original time series monitoring data of the bearing. A quantitative evaluation model for the bearing health indicator (HI) is obtained by training a carefully designed neural network structure on qualitative comparison data between different bearing health statuses. The remaining useful life is then predicted using the bearing health evaluation model and the degradation tolerance threshold. To validate the feasibility, efficiency and superiority of the proposed method, comparison experiments are designed and carried out on a widely used bearing dataset.
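The weak-labeling idea, that an earlier sample from the same bearing is assumed healthier than a later one, can be sketched as follows. The pair construction follows the stated principle. The logistic ranking loss and the threshold-crossing helper are common choices assumed here for illustration, not details confirmed by the abstract, and all names are hypothetical.

```python
import math


def build_pairs(num_samples, min_gap=1):
    """Return (earlier, later) index pairs; the earlier sample is assumed healthier."""
    return [(i, j)
            for i in range(num_samples)
            for j in range(i + min_gap, num_samples)]


def pairwise_loss(hi_earlier, hi_later):
    """Logistic ranking loss: small when the earlier sample scores a higher HI."""
    return math.log1p(math.exp(-(hi_earlier - hi_later)))


def first_crossing(hi_series, threshold):
    """Index of the first HI value at or below the threshold, or None."""
    for t, hi in enumerate(hi_series):
        if hi <= threshold:
            return t
    return None


print(build_pairs(3))
print(first_crossing([1.0, 0.8, 0.5, 0.2], threshold=0.5))
```

Under these assumptions, the remaining useful life at time t would be the crossing index minus t, once an HI trajectory is available.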

Findings

The method achieves the transformation of bearing health from qualitative comparison to quantitative evaluation via a learning algorithm, which is promising in industrial equipment health evaluation and prediction.

Originality/value

Details

Engineering Computations, vol. 40 no. 7/8

Type: Research Article

ISSN: 0264-4401

Article

Publication date: 9 October 2019

From words to pixels: text and image mining methods for service research

Francisco Villarroel Ordenes and Shunyuan Zhang

Abstract

Purpose

The purpose of this paper is to describe and position the state-of-the-art of text and image mining methods in business research. By providing a detailed conceptual and technical review of both methods, it aims to increase their utilization in service research.

Design/methodology/approach

In a first stage, the authors review business literature in marketing, operations and management concerning the use of text and image mining methods. In a second stage, the authors identify and analyze empirical papers that used text and image mining methods in services journals and premier business journals. Finally, avenues for further research in services are provided.

Findings

The manuscript identifies seven text mining methods and describes the approaches, processes, techniques and algorithms involved in their implementation. Four of these methods are positioned similarly for image mining. There are 39 papers using text mining in service research, with a focus on measuring consumer sentiment, experiences and service quality. Because image mining is not yet used in service journals, the authors review its application in marketing and management, and suggest ideas for further research in services.

Research limitations/implications

This manuscript focuses on the different methods and their implementation in service research, but it does not offer a complete review of business literature using text and image mining methods.

Practical implications

The results have a number of implications for the discipline that are presented and discussed. The authors provide research directions using text and image mining methods in service priority areas such as artificial intelligence, frontline employees, transformative consumer research and customer experience.

Originality/value

The manuscript provides an introduction to text and image mining methods to service researchers and practitioners interested in the analysis of unstructured data. This paper provides several suggestions concerning the use of new sources of data (e.g. customer reviews, social media images, employee reviews and emails), measurement of new constructs (beyond sentiment and valence) and the use of more recent methods (e.g. deep learning).

Details

Journal of Service Management, vol. 30 no. 5

Type: Research Article

ISSN: 1757-5818
