Search results

1 – 10 of 221
Open Access
Article
Publication date: 11 July 2022

Afreen Khan, Swaleha Zubair and Samreen Khan

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Abstract

Purpose

This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.

Design/methodology/approach

Dementia staging severity is clinically an essential task, so the authors used machine learning (ML) on the magnetic resonance imaging (MRI) features to locate and study the impact of various MR readings onto the classification of demented and nondemented patients. The authors used cross-sectional MRI data in this study. The designed ML approach established the role of CDR in the prognosis of inflicted and normal patients. Moreover, the pattern analysis indicated CDR as a strong cohort amongst the various attributes, with CDR to have a significant value of p < 0.01. The authors employed 20 ML classifiers.

Findings

The mean prediction accuracy varied with the various ML classifier used, with the bagging classifier (random forest as a base estimator) achieving the highest (93.67%). A series of ML analyses demonstrated that the model including the CDR score had better prediction accuracy and other related performance metrics.

Originality/value

The results suggest that the CDR score, a simple clinical measure, can be used in real community settings. It can be used to predict dementia progression with ML modeling.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 1
Type: Research Article
ISSN: 1985-9899

Keywords

Open Access
Article
Publication date: 18 March 2022

Loris Nanni, Alessandra Lumini and Sheryl Brahnam

Automatic anatomical therapeutic chemical (ATC) classification is progressing at a rapid pace because of its potential in drug development. Predicting an unknown compound's…

Abstract

Purpose

Automatic anatomical therapeutic chemical (ATC) classification is progressing at a rapid pace because of its potential in drug development. Predicting an unknown compound's therapeutic and chemical characteristics in terms of how it affects multiple organs and physiological systems makes automatic ATC classification a vital yet challenging multilabel problem. The aim of this paper is to experimentally derive an ensemble of different feature descriptors and classifiers for ATC classification that outperforms the state-of-the-art.

Design/methodology/approach

The proposed method is an ensemble generated by the fusion of neural networks (i.e. a tabular model and long short-term memory networks (LSTM)) and multilabel classifiers based on multiple linear regression (hMuLab). All classifiers are trained on three sets of descriptors. Features extracted from the trained LSTMs are also fed into hMuLab. Evaluations of ensembles are compared on a benchmark data set of 3883 ATC-coded pharmaceuticals taken from KEGG, a publicly available drug databank.

Findings

Experiments demonstrate the power of the authors’ best ensemble, EnsATC, which is shown to outperform the best methods reported in the literature, including the state-of-the-art developed by the fast.ai research group. The MATLAB source code of the authors’ system is freely available to the public at https://github.com/LorisNanni/Neural-networks-for-anatomical-therapeutic-chemical-ATC-classification.

Originality/value

This study demonstrates the power of extracting LSTM features and combining them with ATC descriptors in ensembles for ATC classification.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 9 May 2022

Khalid Iqbal and Muhammad Shehrayar Khan

In this digital era, email is the most pervasive form of communication between people. Many users become a victim of spam emails and their data have been exposed.

9365

Abstract

Purpose

In this digital era, email is the most pervasive form of communication between people. Many users become a victim of spam emails and their data have been exposed.

Design/methodology/approach

Researchers contribute to solving this problem by a focus on advanced machine learning algorithms and improved models for detecting spam emails but there is still a gap in features. To achieve good results, features also play an important role. To evaluate the performance of applied classifiers, 10-fold cross-validation is used.

Findings

The results approve that the spam emails are correctly classified with the accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network as compared to other applied machine learning classifiers.

Originality/value

In this paper, Point-Biserial correlation is applied to each feature concerning the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on selected features by training the different classifiers.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 30 July 2020

Alaa Tharwat

Classification techniques have been applied to many applications in various fields of sciences. There are several ways of evaluating classification algorithms. The analysis of…

33526

Abstract

Classification techniques have been applied to many applications in various fields of sciences. There are several ways of evaluating classification algorithms. The analysis of such metrics and its significance must be interpreted correctly for evaluating different learning algorithms. Most of these measures are scalar metrics and some of them are graphical methods. This paper introduces a detailed overview of the classification assessment measures with the aim of providing the basics of these measures and to show how it works to serve as a comprehensive source for researchers who are interested in this field. This overview starts by highlighting the definition of the confusion matrix in binary and multi-class classification problems. Many classification measures are also explained in details, and the influence of balanced and imbalanced data on each metric is presented. An illustrative example is introduced to show (1) how to calculate these measures in binary and multi-class classification problems, and (2) the robustness of some measures against balanced and imbalanced data. Moreover, some graphical measures such as Receiver operating characteristics (ROC), Precision-Recall, and Detection error trade-off (DET) curves are presented with details. Additionally, in a step-by-step approach, different numerical examples are demonstrated to explain the preprocessing steps of plotting ROC, PR, and DET curves.

Details

Applied Computing and Informatics, vol. 17 no. 1
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 1 April 2021

Arunit Maity, P. Prakasam and Sarthak Bhargava

Due to the continuous and rapid evolution of telecommunication equipment, the demand for more efficient and noise-robust detection of dual-tone multi-frequency (DTMF) signals is…

1308

Abstract

Purpose

Due to the continuous and rapid evolution of telecommunication equipment, the demand for more efficient and noise-robust detection of dual-tone multi-frequency (DTMF) signals is most significant.

Design/methodology/approach

A novel machine learning-based approach to detect DTMF tones affected by noise, frequency and time variations by employing the k-nearest neighbour (KNN) algorithm is proposed. The features required for training the proposed KNN classifier are extracted using Goertzel's algorithm that estimates the absolute discrete Fourier transform (DFT) coefficient values for the fundamental DTMF frequencies with or without considering their second harmonic frequencies. The proposed KNN classifier model is configured in four different manners which differ in being trained with or without augmented data, as well as, with or without the inclusion of second harmonic frequency DFT coefficient values as features.

Findings

It is found that the model which is trained using the augmented data set and additionally includes the absolute DFT values of the second harmonic frequency values for the eight fundamental DTMF frequencies as the features, achieved the best performance with a macro classification F1 score of 0.980835, a five-fold stratified cross-validation accuracy of 98.47% and test data set detection accuracy of 98.1053%.

Originality/value

The generated DTMF signal has been classified and detected using the proposed KNN classifier which utilizes the DFT coefficient along with second harmonic frequencies for better classification. Additionally, the proposed KNN classifier has been compared with existing models to ascertain its superiority and proclaim its state-of-the-art performance.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 13 November 2019

Ahmad Younso, Ziad Kanaya and Nour Azhari

We consider the kernel-based classifier proposed by Younso (2017). This nonparametric classifier allows for the classification of missing spatially dependent data. The weak…

Abstract

We consider the kernel-based classifier proposed by Younso (2017). This nonparametric classifier allows for the classification of missing spatially dependent data. The weak consistency of the classifier has been studied by Younso (2017). The purpose of this paper is to establish strong consistency of this classifier under mild conditions. The classifier is discussed in a multi-class case. The results are illustrated with simulation studies and real applications.

Details

Arab Journal of Mathematical Sciences, vol. 26 no. 1/2
Type: Research Article
ISSN: 1319-5166

Keywords

Open Access
Article
Publication date: 3 July 2017

Rahila Umer, Teo Susnjak, Anuradha Mathrani and Suriadi Suriadi

The purpose of this paper is to propose a process mining approach to help in making early predictions to improve students’ learning experience in massive open online courses…

6260

Abstract

Purpose

The purpose of this paper is to propose a process mining approach to help in making early predictions to improve students’ learning experience in massive open online courses (MOOCs). It investigates the impact of various machine learning techniques in combination with process mining features to measure effectiveness of these techniques.

Design/methodology/approach

Student’s data (e.g. assessment grades, demographic information) and weekly interaction data based on event logs (e.g. video lecture interaction, solution submission time, time spent weekly) have guided this design. This study evaluates four machine learning classification techniques used in the literature (logistic regression (LR), Naïve Bayes (NB), random forest (RF) and K-nearest neighbor) to monitor weekly progression of students’ performance and to predict their overall performance outcome. Two data sets – one, with traditional features and second, with features obtained from process conformance testing – have been used.

Findings

The results show that techniques used in the study are able to make predictions on the performance of students. Overall accuracy (F1-score, area under curve) of machine learning techniques can be improved by integrating process mining features with standard features. Specifically, the use of LR and NB classifiers outperforms other techniques in a statistical significant way.

Practical implications

Although MOOCs provide a platform for learning in highly scalable and flexible manner, they are prone to early dropout and low completion rate. This study outlines a data-driven approach to improve students’ learning experience and decrease the dropout rate.

Social implications

Early predictions based on individual’s participation can help educators provide support to students who are struggling in the course.

Originality/value

This study outlines the innovative use of process mining techniques in education data mining to help educators gather data-driven insight on student performances in the enrolled courses.

Details

Journal of Research in Innovative Teaching & Learning, vol. 10 no. 2
Type: Research Article
ISSN: 2397-7604

Keywords

Open Access
Article
Publication date: 23 July 2020

Rami Mustafa A. Mohammad

Spam emails classification using data mining and machine learning approaches has enticed the researchers' attention duo to its obvious positive impact in protecting internet…

2020

Abstract

Spam emails classification using data mining and machine learning approaches has enticed the researchers' attention duo to its obvious positive impact in protecting internet users. Several features can be used for creating data mining and machine learning based spam classification models. Yet, spammers know that the longer they will use the same set of features for tricking email users the more probably the anti-spam parties might develop tools for combating this kind of annoying email messages. Spammers, so, adapt by continuously reforming the group of features utilized for composing spam emails. For that reason, even though traditional classification methods possess sound classification results, they were ineffective for lifelong classification of spam emails duo to the fact that they might be prone to the so-called “Concept Drift”. In the current study, an enhanced model is proposed for ensuring lifelong spam classification model. For the evaluation purposes, the overall performance of the suggested model is contrasted against various other stream mining classification techniques. The results proved the success of the suggested model as a lifelong spam emails classification method.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 2 May 2017

Choo Jun Tan, Ting Yee Lim, Chin Wei Bong and Teik Kooi Liew

The purpose of this paper is to propose a soft computing model based on multi-objective evolutionary algorithm (MOEA), namely, modified micro genetic algorithm (MmGA) coupled with…

1689

Abstract

Purpose

The purpose of this paper is to propose a soft computing model based on multi-objective evolutionary algorithm (MOEA), namely, modified micro genetic algorithm (MmGA) coupled with a decision tree (DT)-based classifier, in classifying and optimising the students’ online interaction activities as classifier of student achievement. Subsequently, the results are transformed into useful information that may help educator in designing better learning instructions geared towards higher student achievement.

Design/methodology/approach

A soft computing model based on MOEA is proposed. It is tested on benchmark data pertaining to student activities and achievement obtained from the University of California at Irvine machine learning repository. Additional, a real-world case study in a distance learning institution, namely, Wawasan Open University in Malaysia has been conducted. The case study involves a total of 46 courses collected over 24 consecutive weeks with students across the entire regions in Malaysia and worldwide.

Findings

The proposed model obtains high classification accuracy rates at reduced number of features used. These results are transformed into useful information for the educational institution in our case study in an effort to improve student achievement. Whether benchmark or real-world case study, the proposed model successfully reduced the number features used by at least 48 per cent while achieving higher classification accuracy.

Originality/value

A soft computing model based on MOEA, namely, MmGA coupled with a DT-based classifier, in handling educational data is proposed.

Details

Asian Association of Open Universities Journal, vol. 12 no. 1
Type: Research Article
ISSN: 1858-3431

Keywords

Open Access
Article
Publication date: 12 June 2017

Aida Krichene

Loan default risk or credit risk evaluation is important to financial institutions which provide loans to businesses and individuals. Loans carry the risk of being defaulted. To…

6757

Abstract

Purpose

Loan default risk or credit risk evaluation is important to financial institutions which provide loans to businesses and individuals. Loans carry the risk of being defaulted. To understand the risk levels of credit users (corporations and individuals), credit providers (bankers) normally collect vast amounts of information on borrowers. Statistical predictive analytic techniques can be used to analyse or to determine the risk levels involved in loans. This paper aims to address the question of default prediction of short-term loans for a Tunisian commercial bank.

Design/methodology/approach

The authors have used a database of 924 files of credits granted to industrial Tunisian companies by a commercial bank in the years 2003, 2004, 2005 and 2006. The naive Bayesian classifier algorithm was used, and the results show that the good classification rate is of the order of 63.85 per cent. The default probability is explained by the variables measuring working capital, leverage, solvency, profitability and cash flow indicators.

Findings

The results of the validation test show that the good classification rate is of the order of 58.66 per cent; nevertheless, the error types I and II remain relatively high at 42.42 and 40.47 per cent, respectively. A receiver operating characteristic curve is plotted to evaluate the performance of the model. The result shows that the area under the curve criterion is of the order of 69 per cent.

Originality/value

The paper highlights the fact that the Tunisian central bank obliged all commercial banks to conduct a survey study to collect qualitative data for better credit notation of the borrowers.

Propósito

El riesgo de incumplimiento de préstamos o la evaluación del riesgo de crédito es importante para las instituciones financieras que otorgan préstamos a empresas e individuos. Existe el riesgo de que el pago de préstamos no se cumpla. Para entender los niveles de riesgo de los usuarios de crédito (corporaciones e individuos), los proveedores de crédito (banqueros) normalmente recogen gran cantidad de información sobre los prestatarios. Las técnicas analíticas predictivas estadísticas pueden utilizarse para analizar o determinar los niveles de riesgo involucrados en los préstamos. En este artículo abordamos la cuestión de la predicción por defecto de los préstamos a corto plazo para un banco comercial tunecino.

Diseño/metodología/enfoque

Utilizamos una base de datos de 924 archivos de créditos concedidos a empresas industriales tunecinas por un banco comercial en 2003, 2004, 2005 y 2006. El algoritmo bayesiano de clasificadores se llevó a cabo y los resultados muestran que la tasa de clasificación buena es del orden del 63.85%. La probabilidad de incumplimiento se explica por las variables que miden el capital de trabajo, el apalancamiento, la solvencia, la rentabilidad y los indicadores de flujo de efectivo.

Hallazgos

Los resultados de la prueba de validación muestran que la buena tasa de clasificación es del orden de 58.66% ; sin embargo, los errores tipo I y II permanecen relativamente altos, siendo de 42.42% y 40.47%, respectivamente. Se traza una curva ROC para evaluar el rendimiento del modelo. El resultado muestra que el criterio de área bajo curva (AUC, por sus siglas en inglés) es del orden del 69%.

Originalidad/valor

El documento destaca el hecho de que el Banco Central tunecino obligó a todas las entidades del sector llevar a cabo un estudio de encuesta para recopilar datos cualitativos para un mejor registro de crédito de los prestatarios.

Palabras clave

Curva ROC, Evaluación de riesgos, Riesgo de incumplimiento, Sector bancario, Algoritmo clasificador bayesiano.

Tipo de artículo

Artículo de investigación

Details

Journal of Economics, Finance and Administrative Science, vol. 22 no. 42
Type: Research Article
ISSN: 2077-1886

Keywords

1 – 10 of 221