Search results

1 – 10 of 366
Open Access
Article
Publication date: 14 December 2021

Mariam Elhussein and Samiha Brahimi

This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile…

Abstract

Purpose

This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile classification. The method is demonstrated through the problem of sick-leave promoters on Twitter.

Design/methodology/approach

Four machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.

Findings

Radom forest achieved the highest accuracy of 95.91% higher than similar work compared. Furthermore, using clustering as a feature selection method improved the Sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoters’ accounts that have spam-like behavior.

Research limitations/implications

The method applied is novel, more testing is needed in other datasets before generalizing its results.

Practical implications

The model applied can be used by Saudi authorities to report on the accounts that sell sick-leaves online.

Originality/value

The research is proposing a new way textual clustering can be used in feature selection.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 23 July 2020

Rami Mustafa A. Mohammad

Spam emails classification using data mining and machine learning approaches has enticed the researchers' attention duo to its obvious positive impact in protecting internet…

1962

Abstract

Spam emails classification using data mining and machine learning approaches has enticed the researchers' attention duo to its obvious positive impact in protecting internet users. Several features can be used for creating data mining and machine learning based spam classification models. Yet, spammers know that the longer they will use the same set of features for tricking email users the more probably the anti-spam parties might develop tools for combating this kind of annoying email messages. Spammers, so, adapt by continuously reforming the group of features utilized for composing spam emails. For that reason, even though traditional classification methods possess sound classification results, they were ineffective for lifelong classification of spam emails duo to the fact that they might be prone to the so-called “Concept Drift”. In the current study, an enhanced model is proposed for ensuring lifelong spam classification model. For the evaluation purposes, the overall performance of the suggested model is contrasted against various other stream mining classification techniques. The results proved the success of the suggested model as a lifelong spam emails classification method.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 16 July 2021

Gustavo Grander, Luciano Ferreira da Silva and Ernesto Del Rosário Santibañez Gonzalez

This paper aims to analyze how decision support systems manage Big data to obtain value.

3509

Abstract

Purpose

This paper aims to analyze how decision support systems manage Big data to obtain value.

Design/methodology/approach

A systematic literature review was performed with screening and analysis of 72 articles published between 2012 and 2019.

Findings

The findings reveal that techniques of big data analytics, machine learning algorithms and technologies predominantly related to computer science and cloud computing are used on decision support systems. Another finding was that the main areas that these techniques and technologies are been applied are logistic, traffic, health, business and market. This article also allows authors to understand the relationship in which descriptive, predictive and prescriptive analyses are used according to an inverse relationship of complexity in data analysis and the need for human decision-making.

Originality/value

As it is an emerging theme, this study seeks to present an overview of the techniques and technologies that are being discussed in the literature to solve problems in their respective areas, as a form of theoretical contribution. The authors also understand that there is a practical contribution to the maturity of the discussion and with reflections even presented as suggestions for future research, such as the ethical discussion. This study’s descriptive classification can also serve as a guide for new researchers who seek to understand the research involving decision support systems and big data to gain value in our society.

Details

Revista de Gestão, vol. 28 no. 3
Type: Research Article
ISSN: 1809-2276

Keywords

Open Access
Article
Publication date: 18 November 2021

Eric Pettersson Ruiz and Jannis Angelis

This study aims to explore how to deanonymize cryptocurrency money launderers with the help of machine learning (ML). Money is laundered through cryptocurrencies by distributing…

5534

Abstract

Purpose

This study aims to explore how to deanonymize cryptocurrency money launderers with the help of machine learning (ML). Money is laundered through cryptocurrencies by distributing funds to multiple accounts and then reexchanging the crypto back. This process of exchanging currencies is done through cryptocurrency exchanges. Current preventive efforts are outdated, and ML may provide novel ways to identify illicit currency movements. Hence, this study investigates ML applicability for combatting money laundering activities using cryptocurrency.

Design/methodology/approach

Four supervised-learning algorithms were compared using the Bitcoin Elliptic Dataset. The method covered a quantitative analysis of the algorithmic performance, capturing differences in three key evaluation metrics of F1-scores, precision and recall. Two complementary qualitative interviews were performed at cryptocurrency exchanges to identify fit and applicability of the algorithms.

Findings

The study results show that the current implemented ML tools for preventing money laundering at cryptocurrency exchanges are all too slow and need to be optimized for the task. The results also show that while not one single algorithm is most suitable for detecting transactions related to money-laundering, the specific applicability of the decision tree algorithm is most suitable for adoption by cryptocurrency exchanges.

Originality/value

Given the growth of cryptocurrency use, this study explores the newly developed field of algorithmic tools to combat illicit currency movement, in particular in the growing arena of cryptocurrencies. The study results provide new insights into the applicability of ML as a tool to combat money laundering using cryptocurrency exchanges.

Details

Journal of Money Laundering Control, vol. 25 no. 4
Type: Research Article
ISSN: 1368-5201

Keywords

Open Access
Article
Publication date: 28 July 2020

Harleen Kaur and Vinita Kumari

Diabetes is a major metabolic disorder which can affect entire body system adversely. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other…

11411

Abstract

Diabetes is a major metabolic disorder which can affect entire body system adversely. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other disorders. All over the world millions of people are affected by this disease. Early detection of diabetes is very important to maintain a healthy life. This disease is a reason of global concern as the cases of diabetes are rising rapidly. Machine learning (ML) is a computational method for automatic learning from experience and improves the performance to make more accurate predictions. In the current research we have utilized machine learning technique in Pima Indian diabetes dataset to develop trends and detect patterns with risk factors using R data manipulation tool. To classify the patients into diabetic and non-diabetic we have developed and analyzed five different predictive models using R data manipulation tool. For this purpose we used supervised machine learning algorithms namely linear kernel support vector machine (SVM-linear), radial basis function (RBF) kernel support vector machine, k-nearest neighbour (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR).

Open Access
Article
Publication date: 30 November 2021

Federico Barravecchia, Luca Mastrogiacomo and Fiorenzo Franceschini

Digital voice-of-customer (digital VoC) analysis is gaining much attention in the field of quality management. Digital VoC can be a great source of knowledge about customer needs…

1735

Abstract

Purpose

Digital voice-of-customer (digital VoC) analysis is gaining much attention in the field of quality management. Digital VoC can be a great source of knowledge about customer needs, habits and expectations. To this end, the most popular approach is based on the application of text mining algorithms named topic modelling. These algorithms can identify latent topics discussed within digital VoC and categorise each source (e.g. each review) based on its content. This paper aims to propose a structured procedure for validating the results produced by topic modelling algorithms.

Design/methodology/approach

The proposed procedure compares, on random samples, the results produced by topic modelling algorithms with those generated by human evaluators. The use of specific metrics allows to make a comparison between the two approaches and to provide a preliminary empirical validation.

Findings

The proposed procedure can address users of topic modelling algorithms in validating the obtained results. An application case study related to some car-sharing services supports the description.

Originality/value

Despite the vast success of topic modelling-based approaches, metrics and procedures to validate the obtained results are still lacking. This paper provides a first practical and structured validation procedure specifically employed for quality-related applications.

Details

International Journal of Quality & Reliability Management, vol. 39 no. 6
Type: Research Article
ISSN: 0265-671X

Keywords

Open Access
Article
Publication date: 28 April 2022

Pietro Miglioranza, Andrea Scanu, Giuseppe Simionato, Nicholas Sinigaglia and America Califano

Climate-induced damage is a pressing problem for the preservation of cultural properties. Their physical deterioration is often the cumulative effect of different environmental…

Abstract

Purpose

Climate-induced damage is a pressing problem for the preservation of cultural properties. Their physical deterioration is often the cumulative effect of different environmental hazards of variable intensity. Among these, fluctuations of temperature and relative humidity may cause nonrecoverable physical changes in building envelopes and artifacts made of hygroscopic materials, such as wood. Microclimatic fluctuations may be caused by several factors, including the presence of many visitors within the historical building. Within this framework, the current work is focused on detecting events taking place in two Norwegian stave churches, by identifying the fluctuations in temperature and relative humidity caused by the presence of people attending the public events.

Design/methodology/approach

The identification of such fluctuations and, so, of the presence of people within the churches has been carried out through three different methods. The first is an unsupervised clustering algorithm here termed “density peak,” the second is a supervised deep learning model based on a standard convolutional neural network (CNN) and the third is a novel ad hoc engineering feature approach “unexpected mixing ratio (UMR) peak.”

Findings

While the first two methods may have some instabilities (in terms of precision, recall and normal mutual information [NMI]), the last one shows a promising performance in the detection of microclimatic fluctuations induced by the presence of visitors.

Originality/value

The novelty of this work stands in using both well-established and in-house ad hoc machine learning algorithms in the field of heritage science, proving that these smart approaches could be of extreme usefulness and could lead to quick data analyses, if used properly.

Details

International Journal of Building Pathology and Adaptation, vol. 42 no. 1
Type: Research Article
ISSN: 2398-4708

Keywords

Open Access
Article
Publication date: 1 May 2020

Gonzalo Perera, Martin Sprechmann and Mathias Bourel

This study aims to perform a benefit segmentation and then a classification of visitors that travel to the Rocha Department in Uruguay from the capital city of Montevideo during…

1566

Abstract

Purpose

This study aims to perform a benefit segmentation and then a classification of visitors that travel to the Rocha Department in Uruguay from the capital city of Montevideo during the summer months.

Design/methodology/approach

A convenience sample was obtained with an online survey. A total of 290 cases were usable for subsequent data analysis. The following statistical techniques were used: hierarchical cluster analysis, K-means cluster analysis, machine learning, support vector machines, random forest and logistic regression.

Findings

Visitors that travel to the Rocha Department from Montevideo can be classified into four distinct clusters. Clusters are labelled as “entertainment seekers”, “Rocha followers”, “relax and activities seekers” and “active tourists”. The support vector machine model achieved the best classification results.

Research limitations/implications

Implications for destination marketers who cater to young visitors are discussed. Destination marketers should determine an optimal level of resource allocation and destination management activities that compare both present costs and discounted potential future income of the different target markets. Surveying non-residents was not possible. Future work should sample tourists from abroad.

Originality/value

The combination of market segmentation of Rocha Department’s visitors from the city of Montevideo and classification of sampled individuals training various machine learning classifiers would allow Rocha’s destination marketers determine the belonging of an unsampled individual into one of the already obtained four clusters, enhancing marketing promotion for targeted offers.

Details

Journal of Tourism Analysis: Revista de Análisis Turístico, vol. 27 no. 2
Type: Research Article
ISSN: 2254-0644

Keywords

Open Access
Article
Publication date: 9 December 2019

Zhengfa Yang, Qian Liu, Baowen Sun and Xin Zhao

This paper aims to make it convenient for those who have only just begun their research into Community Question Answering (CQA) expert recommendation, and for those who are…

1949

Abstract

Purpose

This paper aims to make it convenient for those who have only just begun their research into Community Question Answering (CQA) expert recommendation, and for those who are already concerned with this issue, to ease the extension of our understanding with future research.

Design/methodology/approach

In this paper, keywords such as “CQA”, “Social Question Answering”, “expert recommendation”, “question routing” and “expert finding” are used to search major digital libraries. The final sample includes a list of 83 relevant articles authored in academia as well as industry that have been published from January 1, 2008 to March 1, 2019.

Findings

This study proposes a comprehensive framework to categorize extant studies into three broad areas of CQA expert recommendation research: understanding profile modeling, recommendation approaches and recommendation system impacts.

Originality/value

This paper focuses on discussing and sorting out the key research issues from these three research genres. Finally, it was found that conflicting and contradictory research results and research gaps in the existing research, and then put forward the urgent research topics.

Details

International Journal of Crowd Science, vol. 3 no. 3
Type: Research Article
ISSN: 2398-7294

Keywords

Open Access
Article
Publication date: 11 July 2022

Afreen Khan, Swaleha Zubair and Samreen Khan

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Abstract

Purpose

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Design/methodology/approach

Dementia staging severity is clinically an essential task, so the authors used machine learning (ML) on the magnetic resonance imaging (MRI) features to locate and study the impact of various MR readings onto the classification of demented and nondemented patients. The authors used cross-sectional MRI data in this study. The designed ML approach established the role of CDR in the prognosis of inflicted and normal patients. Moreover, the pattern analysis indicated CDR as a strong cohort amongst the various attributes, with CDR to have a significant value of p < 0.01. The authors employed 20 ML classifiers.

Findings

The mean prediction accuracy varied with the various ML classifier used, with the bagging classifier (random forest as a base estimator) achieving the highest (93.67%). A series of ML analyses demonstrated that the model including the CDR score had better prediction accuracy and other related performance metrics.

Originality/value

The results suggest that the CDR score, a simple clinical measure, can be used in real community settings. It can be used to predict dementia progression with ML modeling.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 1
Type: Research Article
ISSN: 1985-9899

Keywords

1 – 10 of 366