Search results

1 – 10 of 55
Open Access
Article
Publication date: 14 December 2021

Mariam Elhussein and Samiha Brahimi

This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile…

Abstract

Purpose

This paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile classification. The method is demonstrated through the problem of sick-leave promoters on Twitter.

Design/methodology/approach

Four machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.

Findings

Radom forest achieved the highest accuracy of 95.91% higher than similar work compared. Furthermore, using clustering as a feature selection method improved the Sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoters’ accounts that have spam-like behavior.

Research limitations/implications

The method applied is novel, more testing is needed in other datasets before generalizing its results.

Practical implications

The model applied can be used by Saudi authorities to report on the accounts that sell sick-leaves online.

Originality/value

The research is proposing a new way textual clustering can be used in feature selection.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 9 May 2022

Khalid Iqbal and Muhammad Shehrayar Khan

In this digital era, email is the most pervasive form of communication between people. Many users become a victim of spam emails and their data have been exposed.

9384

Abstract

Purpose

In this digital era, email is the most pervasive form of communication between people. Many users become a victim of spam emails and their data have been exposed.

Design/methodology/approach

Researchers contribute to solving this problem by a focus on advanced machine learning algorithms and improved models for detecting spam emails but there is still a gap in features. To achieve good results, features also play an important role. To evaluate the performance of applied classifiers, 10-fold cross-validation is used.

Findings

The results approve that the spam emails are correctly classified with the accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network as compared to other applied machine learning classifiers.

Originality/value

In this paper, Point-Biserial correlation is applied to each feature concerning the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on selected features by training the different classifiers.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 23 October 2023

Jan Svanberg, Tohid Ardeshiri, Isak Samsten, Peter Öhman, Presha E. Neidermeyer, Tarek Rana, Frank Maisano and Mats Danielson

The purpose of this study is to develop a method to assess social performance. Traditionally, environment, social and governance (ESG) rating providers use subjectively weighted…

Abstract

Purpose

The purpose of this study is to develop a method to assess social performance. Traditionally, environment, social and governance (ESG) rating providers use subjectively weighted arithmetic averages to combine a set of social performance (SP) indicators into one single rating. To overcome this problem, this study investigates the preconditions for a new methodology for rating the SP component of the ESG by applying machine learning (ML) and artificial intelligence (AI) anchored to social controversies.

Design/methodology/approach

This study proposes the use of a data-driven rating methodology that derives the relative importance of SP features from their contribution to the prediction of social controversies. The authors use the proposed methodology to solve the weighting problem with overall ESG ratings and further investigate whether prediction is possible.

Findings

The authors find that ML models are able to predict controversies with high predictive performance and validity. The findings indicate that the weighting problem with the ESG ratings can be addressed with a data-driven approach. The decisive prerequisite, however, for the proposed rating methodology is that social controversies are predicted by a broad set of SP indicators. The results also suggest that predictively valid ratings can be developed with this ML-based AI method.

Practical implications

This study offers practical solutions to ESG rating problems that have implications for investors, ESG raters and socially responsible investments.

Social implications

The proposed ML-based AI method can help to achieve better ESG ratings, which will in turn help to improve SP, which has implications for organizations and societies through sustainable development.

Originality/value

To the best of the authors’ knowledge, this research is one of the first studies that offers a unique method to address the ESG rating problem and improve sustainability by focusing on SP indicators.

Details

Sustainability Accounting, Management and Policy Journal, vol. 14 no. 7
Type: Research Article
ISSN: 2040-8021

Keywords

Open Access
Article
Publication date: 4 May 2021

Loris Nanni and Sheryl Brahnam

Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or…

1358

Abstract

Purpose

Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.

Design/methodology/approach

Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.

Findings

The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.

Originality/value

Most DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of our general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 11 April 2022

Jie Zhu, Said Easa and Kun Gao

On-ramp merging areas are typical bottlenecks in the freeway network since merging on-ramp vehicles may cause intensive disturbances on the mainline traffic flow and lead to…

2318

Abstract

Purpose

On-ramp merging areas are typical bottlenecks in the freeway network since merging on-ramp vehicles may cause intensive disturbances on the mainline traffic flow and lead to various negative impacts on traffic efficiency and safety. The connected and autonomous vehicles (CAVs), with their capabilities of real-time communication and precise motion control, hold a great potential to facilitate ramp merging operation through enhanced coordination strategies. This paper aims to present a comprehensive review of the existing ramp merging strategies leveraging CAVs, focusing on the latest trends and developments in the research field.

Design/methodology/approach

The review comprehensively covers 44 papers recently published in leading transportation journals. Based on the application context, control strategies are categorized into three categories: merging into sing-lane freeways with total CAVs, merging into sing-lane freeways with mixed traffic flows and merging into multilane freeways.

Findings

Relevant literature is reviewed regarding the required technologies, control decision level, applied methods and impacts on traffic performance. More importantly, the authors identify the existing research gaps and provide insightful discussions on the potential and promising directions for future research based on the review, which facilitates further advancement in this research topic.

Originality/value

Many strategies based on the communication and automation capabilities of CAVs have been developed over the past decades, devoted to facilitating the merging/lane-changing maneuvers at freeway on-ramps. Despite the significant progress made, an up-to-date review covering these latest developments is missing to the authors’ best knowledge. This paper conducts a thorough review of the cooperation/coordination strategies that facilitate freeway on-ramp merging using CAVs, focusing on the latest developments in this field. Based on the review, the authors identify the existing research gaps in CAV ramp merging and discuss the potential and promising future research directions to address the gaps.

Details

Journal of Intelligent and Connected Vehicles, vol. 5 no. 2
Type: Research Article
ISSN: 2399-9802

Keywords

Open Access
Article
Publication date: 15 February 2022

Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking…

1221

Abstract

Purpose

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving the dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth.

Design/methodology/approach

In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.

Findings

The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework.

Originality/value

To the best of authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Keywords

Open Access
Article
Publication date: 28 November 2018

Lisa M. Young and Swapnil Rajendra Gavade

The purpose of this paper is to use the data analysis method of sentiment analysis to improve the understanding of a large data set of employee comments from an annual employee…

4310

Abstract

Purpose

The purpose of this paper is to use the data analysis method of sentiment analysis to improve the understanding of a large data set of employee comments from an annual employee job satisfaction survey of a US hospitality organization.

Design/methodology/approach

Sentiment analysis is used to examine the employee comments by identifying meaningful patterns, frequently used words and emotions. The statistical computing language, R, uses the sentiment analysis process to scan each employee survey comment, compare the words with the predefined word dictionary and classify the employee comments into the appropriate emotion category.

Findings

Employee responses written in English and in Spanish are compared with significant differences identified between the two groups, triggering further investigation of the Spanish comments. Sentiment analysis was then conducted on the Spanish comments comparing two groups, front-of-house vs back-of-house employees and employees with male supervisors vs female supervisors. Results from the analysis of employee comments written in Spanish point to higher scores for job sadness and anger. The negative comments referred to desires for improved healthcare, requests for increased wages and frustration with difficult supervisor relationships. The findings from this study add to the growing body of literature that has begun to focus on the unique work experiences of Latino employees in the USA.

Originality/value

This is the first study to examine a large unstructured English and Spanish text database from a hospitality organization’s employee job satisfaction surveys using sentiment analysis. Applying this big data analytics process to advance new insights into the human capital aspects of hospitality management is intriguing to many researchers. The results of this study demonstrate an issue that needs to be further investigated particularly considering the hospitality industry’s employee demographics.

Details

International Hospitality Review, vol. 32 no. 1
Type: Research Article
ISSN: 2516-8142

Keywords

Open Access
Article
Publication date: 19 December 2018

Min Wang, Shuguang Li, Lei Zhu and Jin Yao

Analysis of characteristic driving operations can help develop supports for drivers with different driving skills. However, the existing knowledge on analysis of driving skills…

1112

Abstract

Purpose

Analysis of characteristic driving operations can help develop supports for drivers with different driving skills. However, the existing knowledge on analysis of driving skills only focuses on single driving operation and cannot reflect the differences on proficiency of coordination of driving operations. Thus, the purpose of this paper is to analyze driving skills from driving coordinating operations. There are two main contributions: the first involves a method for feature extraction based on AdaBoost, which selects features critical for coordinating operations of experienced drivers and inexperienced drivers, and the second involves a generating method for candidate features, called the combined features method, through which two or more different driving operations at the same location are combined into a candidate combined feature. A series of experiments based on driving simulator and specific course with several different curves were carried out, and the result indicated the feasibility of analyzing driving behavior through AdaBoost and the combined features method.

Design/methodology/approach

AdaBoost was used to extract features and the combined features method was used to combine two or more different driving operations at the same location.

Findings

A series of experiments based on driving simulator and specific course with several different curves were carried out, and the result indicated the feasibility of analyzing driving behavior through AdaBoost and the combined features method.

Originality/value

There are two main contributions: the first involves a method for feature extraction based on AdaBoost, which selects features critical for coordinating operations of experienced drivers and inexperienced drivers, and the second involves a generating method for candidate features, called the combined features method, through which two or more different driving operations at the same location are combined into a candidate combined feature.

Details

Journal of Intelligent and Connected Vehicles, vol. 1 no. 3
Type: Research Article
ISSN: 2399-9802

Keywords

Open Access
Article
Publication date: 11 July 2022

Afreen Khan, Swaleha Zubair and Samreen Khan

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Abstract

Purpose

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Design/methodology/approach

Dementia staging severity is clinically an essential task, so the authors used machine learning (ML) on the magnetic resonance imaging (MRI) features to locate and study the impact of various MR readings onto the classification of demented and nondemented patients. The authors used cross-sectional MRI data in this study. The designed ML approach established the role of CDR in the prognosis of inflicted and normal patients. Moreover, the pattern analysis indicated CDR as a strong cohort amongst the various attributes, with CDR to have a significant value of p < 0.01. The authors employed 20 ML classifiers.

Findings

The mean prediction accuracy varied with the various ML classifier used, with the bagging classifier (random forest as a base estimator) achieving the highest (93.67%). A series of ML analyses demonstrated that the model including the CDR score had better prediction accuracy and other related performance metrics.

Originality/value

The results suggest that the CDR score, a simple clinical measure, can be used in real community settings. It can be used to predict dementia progression with ML modeling.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 1
Type: Research Article
ISSN: 1985-9899

Keywords

Open Access
Article
Publication date: 29 July 2020

Abdullah Alharbi, Wajdi Alhakami, Sami Bourouis, Fatma Najar and Nizar Bouguila

We propose in this paper a novel reliable detection method to recognize forged inpainting images. Detecting potential forgeries and authenticating the content of digital images is…

Abstract

We propose in this paper a novel reliable detection method to recognize forged inpainting images. Detecting potential forgeries and authenticating the content of digital images is extremely challenging and important for many applications. The proposed approach involves developing new probabilistic support vector machines (SVMs) kernels from a flexible generative statistical model named “bounded generalized Gaussian mixture model”. The developed learning framework has the advantage to combine properly the benefits of both discriminative and generative models and to include prior knowledge about the nature of data. It can effectively recognize if an image is a tampered one and also to identify both forged and authentic images. The obtained results confirmed that the developed framework has good performance under numerous inpainted images.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Access

Only Open Access

Year

All dates (55)

Content type

1 – 10 of 55