Search results
1 – 10 of 366
Mariam Elhussein and Samiha Brahimi
Abstract
Purpose
This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile classification. The method is demonstrated through the problem of sick-leave promoters on Twitter.
Design/methodology/approach
Four machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.
Findings
Random forest achieved the highest accuracy of 95.91%, higher than comparable prior work. Furthermore, using clustering as a feature selection method improved the sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoter accounts that exhibit spam-like behavior.
Research limitations/implications
The method applied is novel; more testing on other datasets is needed before generalizing its results.
Practical implications
The model can be used by Saudi authorities to report accounts that sell sick leaves online.
Originality/value
The research proposes a new way of using textual clustering for feature selection.
Abstract
Spam email classification using data mining and machine learning approaches has attracted researchers' attention due to its obvious positive impact in protecting internet users. Several features can be used for creating data mining and machine learning based spam classification models. Yet, spammers know that the longer they use the same set of features for tricking email users, the more likely it is that anti-spam parties will develop tools for combating this kind of annoying email message. Spammers therefore adapt by continuously reforming the group of features utilized for composing spam emails. For that reason, even though traditional classification methods achieve sound classification results, they are ineffective for lifelong classification of spam emails because they are prone to the so-called "concept drift". In the current study, an enhanced model is proposed to ensure lifelong spam classification. For evaluation purposes, the overall performance of the suggested model is contrasted against various other stream mining classification techniques. The results prove the success of the suggested model as a lifelong spam email classification method.
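The abstract does not describe the proposed model itself, but the stream-mining setting it targets can be illustrated with a generic incremental learner: the classifier is evaluated on each new chunk of data and then updated on it (prequential, or "test-then-train", evaluation), so it keeps adapting as the concept drifts. The drift simulation below is an assumption for illustration only.

```python
# Minimal sketch of stream-style classification under concept drift:
# the model is tested on each incoming chunk, then updated on it.
# Synthetic data; the paper's actual model and features are not shown.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)  # supports incremental partial_fit

accs = []
for chunk in range(20):
    # Simulate drift: the class-separating direction rotates over time.
    theta = chunk * 0.1
    w = np.array([np.cos(theta), np.sin(theta)])
    X = rng.normal(size=(200, 2))
    y = (X @ w > 0).astype(int)
    if chunk > 0:
        accs.append(clf.score(X, y))       # test-then-train evaluation
    clf.partial_fit(X, y, classes=[0, 1])  # incremental model update
print(f"mean prequential accuracy: {np.mean(accs):.2f}")
```

A batch model trained once on the first chunk would degrade as the boundary rotates; the incremental update is what keeps accuracy stable over the stream.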
Gustavo Grander, Luciano Ferreira da Silva and Ernesto Del Rosário Santibañez Gonzalez
Abstract
Purpose
This paper aims to analyze how decision support systems manage Big data to obtain value.
Design/methodology/approach
A systematic literature review was performed with screening and analysis of 72 articles published between 2012 and 2019.
Findings
The findings reveal that big data analytics techniques, machine learning algorithms and technologies predominantly related to computer science and cloud computing are used in decision support systems. Another finding was that the main areas in which these techniques and technologies are being applied are logistics, traffic, health, business and markets. This article also shows how descriptive, predictive and prescriptive analyses are used according to an inverse relationship between the complexity of data analysis and the need for human decision-making.
Originality/value
As this is an emerging theme, this study presents an overview of the techniques and technologies being discussed in the literature to solve problems in their respective areas, as a theoretical contribution. There is also a practical contribution in the maturity of the discussion and in the reflections presented as suggestions for future research, such as the ethical discussion. This study's descriptive classification can also serve as a guide for new researchers who seek to understand research involving decision support systems and big data to gain value in our society.
Eric Pettersson Ruiz and Jannis Angelis
Abstract
Purpose
This study aims to explore how to deanonymize cryptocurrency money launderers with the help of machine learning (ML). Money is laundered through cryptocurrencies by distributing funds to multiple accounts and then reexchanging the crypto back. This process of exchanging currencies is done through cryptocurrency exchanges. Current preventive efforts are outdated, and ML may provide novel ways to identify illicit currency movements. Hence, this study investigates ML applicability for combatting money laundering activities using cryptocurrency.
Design/methodology/approach
Four supervised-learning algorithms were compared using the Bitcoin Elliptic Dataset. The method covered a quantitative analysis of algorithmic performance, capturing differences in three key evaluation metrics: F1-score, precision and recall. Two complementary qualitative interviews were performed at cryptocurrency exchanges to identify the fit and applicability of the algorithms.
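The abstract does not name the four algorithms, so the comparison loop below uses common supervised learners as stand-ins and a synthetic imbalanced dataset in place of the Elliptic data (which is not bundled with scikit-learn). What it shows is the evaluation design: each model scored on precision, recall and F1 on a held-out split.

```python
# Sketch of the comparison design: several supervised classifiers
# evaluated on precision, recall and F1. Models and data are stand-ins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

# Imbalanced data, mimicking the rarity of illicit transactions.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (precision_score(y_te, pred),
                    recall_score(y_te, pred),
                    f1_score(y_te, pred))
for name, (p, r, f) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```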
Findings
The study results show that the ML tools currently implemented for preventing money laundering at cryptocurrency exchanges are too slow and need to be optimized for the task. The results also show that while no single algorithm is best at detecting money-laundering transactions, the decision tree algorithm is the most suitable for adoption by cryptocurrency exchanges.
Originality/value
Given the growth of cryptocurrency use, this study explores the newly developed field of algorithmic tools to combat illicit currency movement, in particular in the growing arena of cryptocurrencies. The study results provide new insights into the applicability of ML as a tool to combat money laundering using cryptocurrency exchanges.
Harleen Kaur and Vinita Kumari
Abstract
Diabetes is a major metabolic disorder which can adversely affect the entire body system. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other disorders. Millions of people are affected by this disease all over the world. Early detection of diabetes is very important for maintaining a healthy life. This disease is a cause of global concern as cases of diabetes are rising rapidly. Machine learning (ML) is a computational method for automatic learning from experience, which improves performance to make more accurate predictions. In the current research, we utilized ML techniques on the Pima Indian diabetes dataset to develop trends and detect patterns with risk factors using the R data manipulation tool. To classify patients into diabetic and non-diabetic, we developed and analyzed five different predictive models. For this purpose, we used supervised ML algorithms, namely linear kernel support vector machine (SVM-linear), radial basis function (RBF) kernel support vector machine, k-nearest neighbour (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR).
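The study itself used R; a Python sketch of the same comparison is shown below for the four models with scikit-learn equivalents (MDR has no scikit-learn implementation and is omitted). The Pima dataset is replaced by a synthetic stand-in, so the scores are illustrative only.

```python
# Python sketch of the model comparison described (the study used R).
# MDR is omitted (no scikit-learn equivalent); data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 8-feature Pima Indian diabetes data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

models = {
    "SVM-linear": SVC(kernel="linear"),
    "SVM-RBF": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(max_iter=2000, random_state=0),
}
results = {}
for name, model in models.items():
    # Standardize features first, as SVMs and k-NN are scale-sensitive.
    pipe = make_pipeline(StandardScaler(), model)
    results[name] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```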
Federico Barravecchia, Luca Mastrogiacomo and Fiorenzo Franceschini
Abstract
Purpose
Digital voice-of-customer (digital VoC) analysis is gaining much attention in the field of quality management. Digital VoC can be a great source of knowledge about customer needs, habits and expectations. To this end, the most popular approach is based on the application of text mining algorithms named topic modelling. These algorithms can identify latent topics discussed within digital VoC and categorise each source (e.g. each review) based on its content. This paper aims to propose a structured procedure for validating the results produced by topic modelling algorithms.
Design/methodology/approach
The proposed procedure compares, on random samples, the results produced by topic modelling algorithms with those generated by human evaluators. The use of specific metrics allows a comparison to be made between the two approaches and provides a preliminary empirical validation.
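The core of such a comparison is an agreement metric between the algorithm's topic assignments and the human evaluators' labels on a sampled subset. The labels below are invented for illustration, and the specific metrics (Cohen's kappa, adjusted Rand index) are common choices rather than necessarily the ones used in the paper.

```python
# Sketch of the validation idea: measure agreement between algorithmic
# topic assignments and human labels on a random sample of reviews.
# Labels are illustrative; the paper's metrics may differ.
from sklearn.metrics import cohen_kappa_score, adjusted_rand_score

algo_topics = ["price", "cleanliness", "app", "price", "app",
               "cleanliness", "price", "app", "cleanliness", "price"]
human_topics = ["price", "cleanliness", "app", "price", "app",
                "app", "price", "app", "cleanliness", "price"]

# Chance-corrected label agreement between algorithm and humans.
kappa = cohen_kappa_score(algo_topics, human_topics)
# Partition similarity, insensitive to topic-name permutations.
ari = adjusted_rand_score(algo_topics, human_topics)
print(f"kappa={kappa:.2f} ARI={ari:.2f}")
```

The adjusted Rand index is the more forgiving check here, since it only asks whether the two labelings group the same reviews together, regardless of what each topic is called.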
Findings
The proposed procedure can guide users of topic modelling algorithms in validating the results obtained. An application case study related to some car-sharing services supports the description.
Originality/value
Despite the vast success of topic modelling-based approaches, metrics and procedures to validate the obtained results are still lacking. This paper provides a first practical and structured validation procedure specifically employed for quality-related applications.
Pietro Miglioranza, Andrea Scanu, Giuseppe Simionato, Nicholas Sinigaglia and America Califano
Abstract
Purpose
Climate-induced damage is a pressing problem for the preservation of cultural properties. Their physical deterioration is often the cumulative effect of different environmental hazards of variable intensity. Among these, fluctuations of temperature and relative humidity may cause nonrecoverable physical changes in building envelopes and artifacts made of hygroscopic materials, such as wood. Microclimatic fluctuations may be caused by several factors, including the presence of many visitors within the historical building. Within this framework, the current work is focused on detecting events taking place in two Norwegian stave churches, by identifying the fluctuations in temperature and relative humidity caused by the presence of people attending the public events.
Design/methodology/approach
The identification of such fluctuations, and thus of the presence of people within the churches, has been carried out through three different methods. The first is an unsupervised clustering algorithm here termed "density peak", the second is a supervised deep learning model based on a standard convolutional neural network (CNN) and the third is a novel ad hoc engineered-feature approach, the "unexpected mixing ratio (UMR) peak".
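The third approach rests on a standard physical quantity: the mixing ratio, derived from temperature and relative humidity, which rises when visitors release moisture into the air. The sketch below derives the mixing ratio with the Magnus/Bolton saturation formula and flags points well above a robust baseline; the simulated traces and the detection threshold are assumptions, not the authors' exact rule.

```python
# Illustrative sketch of the "unexpected mixing ratio peak" idea:
# derive the mixing ratio from T and RH, then flag points that rise
# well above a robust baseline. Thresholds and data are assumptions.
import numpy as np

def mixing_ratio(T, RH, P=1013.25):
    """Mixing ratio in g/kg from T (deg C), RH (%), pressure P (hPa)."""
    es = 6.112 * np.exp(17.67 * T / (T + 243.5))  # Magnus/Bolton formula
    e = RH / 100.0 * es                            # actual vapor pressure
    return 622.0 * e / (P - e)

rng = np.random.default_rng(0)
T = 12 + rng.normal(0, 0.1, 300)   # simulated indoor temperature trace
RH = 55 + rng.normal(0, 0.5, 300)  # simulated relative humidity trace
RH[140:160] += 8                   # moisture released by visitors

w = mixing_ratio(T, RH)
baseline = np.median(w)                      # robust background level
noise = np.median(np.abs(w - baseline))      # robust spread (MAD)
events = np.where(w - baseline > 6 * noise)[0]
print(f"event samples detected: {len(events)}")
```

Using the median and MAD instead of mean and standard deviation keeps the baseline from being dragged upward by the event itself, which is what makes a short visitor-induced peak stand out.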
Findings
While the first two methods may show some instabilities (in terms of precision, recall and normalized mutual information [NMI]), the last one shows a promising performance in detecting microclimatic fluctuations induced by the presence of visitors.
Originality/value
The novelty of this work stands in using both well-established and in-house ad hoc machine learning algorithms in the field of heritage science, proving that these smart approaches could be of extreme usefulness and could lead to quick data analyses, if used properly.
Gonzalo Perera, Martin Sprechmann and Mathias Bourel
Abstract
Purpose
This study aims to perform a benefit segmentation and then a classification of visitors that travel to the Rocha Department in Uruguay from the capital city of Montevideo during the summer months.
Design/methodology/approach
A convenience sample was obtained with an online survey. A total of 290 cases were usable for subsequent data analysis. The following statistical techniques were used: hierarchical cluster analysis, K-means cluster analysis, machine learning, support vector machines, random forest and logistic regression.
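The two-stage design described here, unsupervised segmentation followed by supervised classification into the segments, can be sketched as follows. The survey responses are simulated and the benefit items invented; only the overall structure (K-means segmentation, then SVM, random forest and logistic regression predicting cluster membership) mirrors the abstract.

```python
# Sketch of the two-stage design: cluster respondents on benefit
# ratings, then train classifiers to predict cluster membership.
# Survey data is simulated; 290 cases and 4 clusters mirror the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# 290 respondents rating 6 hypothetical benefit items on a 1-5 scale.
X = rng.integers(1, 6, size=(290, 6)).astype(float)

# Stage 1: benefit segmentation via K-means.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Stage 2: classifiers learn to assign respondents to their segment.
for name, model in {"SVM": SVC(),
                    "RF": RandomForestClassifier(random_state=0),
                    "logit": LogisticRegression(max_iter=1000)}.items():
    acc = cross_val_score(model, X, segments, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```

A classifier trained this way is what lets an unsampled visitor be assigned to one of the existing segments without rerunning the clustering.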
Findings
Visitors that travel to the Rocha Department from Montevideo can be classified into four distinct clusters. Clusters are labelled as “entertainment seekers”, “Rocha followers”, “relax and activities seekers” and “active tourists”. The support vector machine model achieved the best classification results.
Research limitations/implications
Implications for destination marketers who cater to young visitors are discussed. Destination marketers should determine an optimal level of resource allocation and destination management activities that compare both present costs and discounted potential future income of the different target markets. Surveying non-residents was not possible. Future work should sample tourists from abroad.
Originality/value
The combination of market segmentation of Rocha Department's visitors from the city of Montevideo and classification of the sampled individuals by training various machine learning classifiers would allow Rocha's destination marketers to assign an unsampled individual to one of the four obtained clusters, enhancing marketing promotion for targeted offers.
Zhengfa Yang, Qian Liu, Baowen Sun and Xin Zhao
Abstract
Purpose
This paper aims to orient those who have only just begun their research into Community Question Answering (CQA) expert recommendation, and to help those already concerned with this issue extend their understanding through future research.
Design/methodology/approach
In this paper, keywords such as “CQA”, “Social Question Answering”, “expert recommendation”, “question routing” and “expert finding” are used to search major digital libraries. The final sample includes a list of 83 relevant articles authored in academia as well as industry that have been published from January 1, 2008 to March 1, 2019.
Findings
This study proposes a comprehensive framework to categorize extant studies into three broad areas of CQA expert recommendation research: understanding profile modeling, recommendation approaches and recommendation system impacts.
Originality/value
This paper focuses on discussing and sorting out the key research issues across these three research genres. Finally, conflicting and contradictory research results and gaps in the existing research were identified, and urgent research topics were put forward.
Afreen Khan, Swaleha Zubair and Samreen Khan
Abstract
Purpose
This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.
Design/methodology/approach
Dementia severity staging is clinically an essential task, so the authors used machine learning (ML) on magnetic resonance imaging (MRI) features to locate and study the impact of various MR readings on the classification of demented and nondemented patients. The authors used cross-sectional MRI data in this study. The designed ML approach established the role of CDR in the prognosis of afflicted and normal patients. Moreover, the pattern analysis indicated CDR as a strong cohort amongst the various attributes, with CDR having a significant value of p < 0.01. The authors employed 20 ML classifiers.
Findings
The mean prediction accuracy varied across the ML classifiers used, with the bagging classifier (random forest as a base estimator) achieving the highest (93.67%). A series of ML analyses demonstrated that the model including the CDR score had better prediction accuracy and other related performance metrics.
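The best-performing configuration named here, a bagging ensemble with random forest as the base estimator, can be sketched directly in scikit-learn. The data below is synthetic; the study's MRI features and its 93.67% accuracy are not reproduced.

```python
# Sketch of the winning setup: a bagging ensemble whose base estimator
# is itself a random forest. Synthetic data stands in for MRI features.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Pass the base estimator positionally so the code works across
# scikit-learn versions (the keyword was renamed base_estimator -> estimator).
bag = BaggingClassifier(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_estimators=10, random_state=0)

acc = cross_val_score(bag, X, y, cv=5).mean()
print(f"mean CV accuracy: {acc:.2f}")
```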
Originality/value
The results suggest that the CDR score, a simple clinical measure, can be used in real community settings. It can be used to predict dementia progression with ML modeling.