Search results

1 – 10 of 13

Open Access

Article

Publication date: 9 May 2022

Email classification analysis using machine learning techniques

Khalid Iqbal and Muhammad Shehrayar Khan

Downloads

9226

Abstract

Purpose

In this digital era, email is the most pervasive form of communication between people. Many users fall victim to spam emails, and their data are exposed.

Design/methodology/approach

Researchers have addressed this problem by focusing on advanced machine learning algorithms and improved models for detecting spam emails, but a gap remains in the features used. Features also play an important role in achieving good results. To evaluate the performance of the applied classifiers, 10-fold cross-validation is used.
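As an illustration of the 10-fold cross-validation mentioned above, the following is a minimal sketch of how sample indices can be partitioned into folds. The function name `k_fold_indices` is hypothetical, and the contiguous, unshuffled split is an assumption made for brevity; the paper does not state how its folds were drawn.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.

    Assumption: no shuffling or stratification; a real evaluation
    would usually shuffle (and often stratify) before splitting.
    """
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every sample once for evaluation.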

Findings

The results confirm that spam emails are correctly classified with an accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network, the highest among the applied machine learning classifiers.

Originality/value

In this paper, Point-Biserial correlation is computed between each feature and the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on the selected features by training different classifiers.
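The point-biserial correlation between a continuous feature and a binary label can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `point_biserial` and `select_features` and the threshold of 0.2 are hypothetical, since the abstract does not state the selection rule.

```python
import math

def point_biserial(values, labels):
    """Point-biserial correlation between a continuous feature and 0/1 labels."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    n, n1, n0 = len(values), len(ones), len(zeros)
    mean_all = sum(values) / n
    # Population standard deviation of the feature over all samples.
    std = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if std == 0 or n1 == 0 or n0 == 0:
        return 0.0  # correlation undefined; treated as uninformative here
    mean_diff = sum(ones) / n1 - sum(zeros) / n0
    return mean_diff / std * math.sqrt(n1 * n0 / n ** 2)

def select_features(columns, labels, threshold=0.2):
    """Keep feature names whose |r_pb| meets a hypothetical threshold."""
    return [name for name, col in columns.items()
            if abs(point_biserial(col, labels)) >= threshold]
```

The value equals the Pearson correlation computed with the label coded as 0 and 1, so features with a larger absolute value separate the two classes more strongly.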

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 15 December 2023

Advancing tourism demand forecasting in Sri Lanka: evaluating the performance of machine learning models and the impact of social media data integration

Isuru Udayangani Hewapathirana

Downloads

566

Abstract

Purpose

This study explores the pioneering approach of utilising machine learning (ML) models and integrating social media data for predicting tourist arrivals in Sri Lanka.

Design/methodology/approach

Two sets of experiments are performed in this research. First, the predictive accuracy of three ML models, support vector regression (SVR), random forest (RF) and artificial neural network (ANN), is compared against the seasonal autoregressive integrated moving average (SARIMA) model using historical tourist arrivals as features. Subsequently, the impact of incorporating social media data from TripAdvisor and Google Trends as additional features is investigated.
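Using historical arrivals as features typically means turning the series into lagged input rows. The sketch below shows one way to do this; the function name `make_lag_features`, the default of 12 lags, and the choice to append the previous period's external signal are all assumptions for illustration, not details taken from the study.

```python
def make_lag_features(series, n_lags=12, exog=None):
    """Build (rows, targets) where each row holds the n_lags previous values.

    exog is an optional external signal aligned with series (for example
    a search-interest index); only its value at t - 1 is appended so that
    no information from the forecast period leaks into the features.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        row = list(series[t - n_lags:t])
        if exog is not None:
            row.append(exog[t - 1])
        rows.append(row)
        targets.append(series[t])
    return rows, targets
```

Running it once without `exog` and once with it gives the two feature sets that a comparison of the kind described above would need.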

Findings

The findings reveal that the ML models generally outperform the SARIMA model, particularly from 2019 to 2021, when several unexpected events occurred in Sri Lanka. When integrating social media data, the RF model performs significantly better during most years, whereas the SVR model does not exhibit significant improvement. Although adding social media data to the ANN model does not yield superior forecasts, it exhibits proficiency in capturing data trends.

Practical implications

The findings offer substantial implications for the industry's growth and resilience, allowing stakeholders to make accurate data-driven decisions to navigate the unpredictable dynamics of Sri Lanka's tourism sector.

Originality/value

This study presents the first exploration of ML models and the integration of social media data for forecasting Sri Lankan tourist arrivals, contributing to the advancement of research in this domain.

Details

Journal of Tourism Futures, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2055-5911

Open Access

Article

Publication date: 13 August 2021

Boruta-grid-search least square support vector machine for NO₂ pollution prediction using big data analytics and IoT emission sensors

Habeeb Balogun, Hafiz Alaka and Christian Nnaemeka Egwim

Downloads

1134

Abstract

Purpose

This paper seeks to assess the performance of BA-GS-LSSVM compared to popular standalone algorithms used to build NO₂ prediction models. The purpose of this paper is to pre-process a relatively large NO₂ data set from Internet of Things (IoT) sensors, together with time-corresponding weather and traffic data, and to use the data to develop NO₂ prediction models with BA-GS-LSSVM and popular standalone algorithms to allow for a fair comparison.

Design/methodology/approach

This research installed and used data from 14 IoT emission sensors to develop machine learning predictive models for NO₂ pollution concentration. The authors used big data analytics infrastructure to retrieve the large volume of data collected in tens of seconds for over 5 months. Weather data from the UK meteorology department and traffic data from the department for transport were collected and merged for the corresponding time and location where the pollution sensors exist.
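Merging high-frequency sensor readings with coarser weather records requires bringing both onto a common time grid. The following is a rough sketch under stated assumptions: hourly alignment, the function names `hourly_means` and `join_on_hour`, and the tuple layout are all hypothetical, as the abstract does not describe the merge procedure.

```python
from collections import defaultdict
from datetime import datetime

def hourly_means(readings):
    """Average (timestamp, value) readings into hourly buckets."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncate each timestamp to the start of its hour.
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}

def join_on_hour(no2_hourly, weather_hourly):
    """Inner join: keep only hours present in both sources."""
    return {hour: (no2_hourly[hour], weather_hourly[hour])
            for hour in no2_hourly if hour in weather_hourly}
```

An inner join drops hours for which either source has no data, which avoids imputing values at the cost of losing rows.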

Findings

The results show that the hybrid BA-GS-LSSVM outperforms all the other standalone machine learning predictive models for NO₂ pollution.

Practical implications

This paper's hybrid model provides a basis for giving an informed decision on the NO₂ pollutant avoidance system.

Originality/value

This research installed and used data from 14 IoT emission sensors to develop machine learning predictive models for NO₂ pollution concentration.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 29 January 2024

Prediction of surface roughness using deep learning and data augmentation

Miaoxian Guo, Shouheng Wei, Chentong Han, Wanliang Xia, Chao Luo and Zhijian Lin

Downloads

320

Abstract

Purpose

Surface roughness has a serious impact on the fatigue strength, wear resistance and life of mechanical products. Realizing the evolution of surface quality through theoretical modeling takes a lot of effort. To predict the surface roughness of milling processing, this paper aims to construct a neural network based on deep learning and data augmentation.

Design/methodology/approach

This study proposes a method consisting of three steps. Firstly, the machine tool multisource data acquisition platform is established, which combines sensor monitoring with machine tool communication to collect processing signals. Secondly, the feature parameters are extracted to reduce the interference and improve the model generalization ability. Thirdly, for different expectations, the parameters of the deep belief network (DBN) model are optimized by the tent-SSA algorithm to achieve more accurate roughness classification and regression prediction.

Findings

The adaptive synthetic sampling (ADASYN) algorithm can improve the classification prediction accuracy of DBN from 80.67% to 94.23%. After the DBN parameters were optimized by Tent-SSA, the roughness prediction accuracy was significantly improved. For the classification model, the prediction accuracy is improved by 5.77% based on ADASYN optimization. For regression models, different objective functions can be set according to production requirements, such as root-mean-square error (RMSE) or MaxAE, and the error is reduced by more than 40% compared to the original model.
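The two regression objectives named above have standard definitions, sketched here for reference. The function names are chosen for illustration, and MaxAE is assumed to mean the maximum absolute error.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error: penalizes large deviations on average."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def max_ae(y_true, y_pred):
    """Maximum absolute error: the single worst prediction."""
    return max(abs(a - b) for a, b in zip(y_true, y_pred))
```

Optimizing for RMSE favors good average behaviour, while optimizing for MaxAE bounds the worst case, which is why the choice can follow production requirements.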

Originality/value

A roughness prediction model based on multiple monitoring signals is proposed, which reduces the dependence on the acquisition of environmental variables and enhances the model's applicability. Furthermore, with the ADASYN algorithm, the Tent-SSA intelligent optimization algorithm is introduced to optimize the hyperparameters of the DBN model and improve the optimization performance.

Details

Journal of Intelligent Manufacturing and Special Equipment, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2633-6596

Open Access

Article

Publication date: 27 November 2023

Predictive machine learning model for mental health issues in higher education students due to COVID-19 using HADS assessment

Reshmy Krishnan, Shantha Kumari, Ali Al Badi, Shermina Jeba and Menila James

Downloads

401

Abstract

Purpose

Students pursuing different professional courses at the higher education level during 2021–2022 saw the first-time occurrence of a pandemic in the form of coronavirus disease 2019 (COVID-19), and their mental health was affected. Many works are available in the literature to assess mental health severity. However, it is necessary to identify the affected students early for effective treatment.

Design/methodology/approach

Predictive analytics, a part of machine learning (ML), helps with early identification based on mental health severity levels to aid clinical psychologists. As a case study, engineering and medical course students were comparatively analysed in this work as they have rich course content and a stricter evaluation process than other streams. The methodology includes an online survey that obtains demographic details, academic qualifications, family details, etc. and anxiety and depression questions using the Hospital Anxiety and Depression Scale (HADS). The responses acquired through social media networks are analysed using ML algorithms – support vector machines (SVMs) (robust handling of health information) and J48 decision tree (DT) (interpretability/comprehensibility). Also, random forest is used to identify the predictors for anxiety and depression.

Findings

The results show that the support vector classifier performs best, with a classification accuracy of 100%, 1.0 precision and 1.0 recall, followed by the J48 DT classifier with 96%. Medical students were found to be affected by anxiety and depression marginally more than engineering students.
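For reference, accuracy, precision and recall for a binary classifier can be computed from true and predicted labels as sketched below. The function name `binary_metrics` and the convention of returning 0.0 when a ratio is undefined are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no predicted or no actual positives) fall back to 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

A result of 100% accuracy with 1.0 precision and 1.0 recall corresponds to zero false positives and zero false negatives on the evaluated data.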

Research limitations/implications

The entire work is dependent on the social media-displayed online questionnaire, and the participants were not met in person. This indicates that the response rate could not be evaluated appropriately. Due to the medical restrictions imposed by COVID-19, which remain in effect in 2022, this is the only method found to collect primary data from college students. Additionally, students self-selected themselves to participate in this survey, which raises the possibility of selection bias.

Practical implications

The responses acquired through social media networks are analysed using ML algorithms. This will be a big support for understanding students' mental health issues due to COVID-19 and for taking appropriate actions to address them. This will improve the quality of the learning process in higher education in Oman.

Social implications

Furthermore, this study aims to provide recommendations for mental health screening as a regular practice in educational institutions to identify undetected students.

Originality/value

Comparing the mental health issues of students in two professional courses is the novelty of this work. This is needed because both courses require practical learning, long hours of work, etc.

Details

Arab Gulf Journal of Scientific Research, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1985-9899

Open Access

Article

Publication date: 21 February 2024

Analysis of data-driven approaches for radar target classification

Aysu Coşkun and Sándor Bilicz

Downloads

171

Abstract

Purpose

This study focuses on the classification of targets with varying shapes using radar cross section (RCS), which is influenced by the target’s shape. This study aims to develop a robust classification method by considering an incident angle with minor random fluctuations and using a physical optics simulation to generate data sets.

Design/methodology/approach

The approach involves several supervised machine learning and classification methods, including traditional algorithms and a deep neural network classifier. It uses histogram-based definitions of the RCS for feature extraction, with an emphasis on resilience against noise in the RCS data. Data enrichment techniques are incorporated, including the use of noise-impacted histogram data sets.
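A histogram-based feature vector can be built from a set of RCS samples roughly as follows. This is only a sketch: the function name `histogram_features`, the fixed bin range, and the clamping of out-of-range samples into the edge bins are assumptions, and the paper's actual histogram definitions may differ.

```python
def histogram_features(samples, n_bins=8, lo=0.0, hi=1.0):
    """Return normalized bin frequencies of samples over [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        idx = int((s - lo) / width)
        # Clamp out-of-range values (e.g. noisy samples) into the edge bins.
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]
```

Because small perturbations usually leave a sample in the same bin, the resulting vector changes little under mild noise, which is consistent with the emphasis on noise resilience above.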

Findings

The classification algorithms are extensively evaluated, highlighting their efficacy in feature extraction from RCS histograms. Among the studied algorithms, the K-nearest neighbour is found to be the most accurate of the traditional methods, but it is surpassed in accuracy by a deep learning network classifier. The results demonstrate the robustness of the feature extraction from the RCS histograms, motivated by mm-wave radar applications.

Originality/value

This study presents a novel approach to target classification that extends beyond traditional methods by integrating deep neural networks and focusing on histogram-based methodologies. It also incorporates data enrichment techniques to enhance the analysis, providing a comprehensive perspective for target detection using RCS.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0332-1649

Open Access

Article

Publication date: 4 May 2021

Robust ensemble of handcrafted and learned approaches for DNA-binding proteins

Loris Nanni and Sheryl Brahnam

Downloads

1349

Abstract

Purpose

Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.

Design/methodology/approach

Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.
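The weighted sum rule combines the scores of several classifiers into one score per sample. A minimal sketch follows; the function name `weighted_sum_fusion` is hypothetical, and the scores are assumed to have been normalized to a comparable range beforehand, which the abstract does not detail.

```python
def weighted_sum_fusion(score_lists, weights):
    """Fuse per-classifier score lists into one weighted-average score list.

    score_lists[k][i] is classifier k's score for sample i.
    Assumption: scores are already on a comparable scale.
    """
    total_weight = sum(weights)
    n_samples = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total_weight
        for i in range(n_samples)
    ]
```

Dividing by the total weight keeps the fused score in the same range as the inputs, so a single decision threshold can still be applied.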

Findings

The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.

Originality/value

Most DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of their general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 21 June 2022

Design of ensemble recurrent model with stacked fuzzy ARTMAP for breast cancer detection

Abhishek Das and Mihir Narayan Mohanty

Downloads

545

Abstract

Purpose

Timely and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer has the highest incidence among all cancers, whereas it ranks fifth in mortality. Among the many image processing techniques, certain works have focused on convolutional neural networks (CNNs) for processing these images. However, deep learning models remain to be explored thoroughly.

Design/methodology/approach

In this work, multivariate statistics-based kernel principal component analysis (KPCA) is used to extract essential features. KPCA is simultaneously helpful for denoising the data. These features are processed through a heterogeneous ensemble model that consists of three base models: a recurrent neural network (RNN), a long short-term memory (LSTM) network and a gated recurrent unit (GRU). The outcomes of these base learners are fed to a fuzzy adaptive resonance theory mapping (ARTMAP) model for decision making; nodes are added to the F_2^a layer if the winning criteria are fulfilled, which makes the ARTMAP model more robust.

Findings

The proposed model is verified using a breast histopathology image dataset publicly available on Kaggle. The model provides 99.36% training accuracy and 98.72% validation accuracy. The proposed model applies data processing at every stage: image denoising to reduce data redundancy, and training by ensemble learning to provide better results than single models. The final classification by a fuzzy ARTMAP model, which controls the number of nodes depending on the performance, makes the classification robust and accurate.

Research limitations/implications

Research in the field of medical applications is ongoing. More advanced algorithms are being developed for better classification. Still, there is scope to design models with better performance, practicability and cost efficiency in the future. Also, the ensemble models may be chosen with different combinations and characteristics. The proposed model may also be verified on signals instead of images. Experimental analysis shows the improved performance of the proposed model. This method needs to be verified using practical models. Also, the practical implementation will be carried out to assess its real-time performance and cost efficiency.

Originality/value

The proposed model uses KPCA for feature selection, which denoises the data and reduces data redundancy. Training and classification are performed using a heterogeneous ensemble model built from RNN, LSTM and GRU base classifiers to provide better results than single models. The use of an adaptive fuzzy mapping model makes the final classification accurate. The effectiveness of combining these methods into a single model is analyzed in this work.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 15 June 2021

IDMPF: intelligent diabetes mellitus prediction framework using machine learning

Leila Ismail and Huned Materwala

Downloads

2131

Abstract

Purpose

Machine learning is an intelligent methodology used for prediction and has shown promising results in predictive classification. One of the critical areas in which machine learning can save lives is diabetes prediction. Diabetes is a chronic disease and one of the top 10 causes of death worldwide. The total number of people with diabetes is expected to reach 700 million in 2045, a 51.18% increase compared to 2019. These are alarming figures, and it is therefore urgent to provide accurate diabetes prediction.
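The 2019 baseline implied by the quoted projection can be recovered with simple arithmetic, shown here as a short sketch. The function name `baseline_from_growth` is hypothetical; the inputs are the two figures stated in the abstract.

```python
def baseline_from_growth(projected, pct_increase):
    """Return the starting value that grows to `projected` after `pct_increase` percent."""
    return projected / (1 + pct_increase / 100)

# 700 million in 2045 after a 51.18% increase implies roughly 463 million in 2019.
baseline_2019_millions = baseline_from_growth(700, 51.18)
```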

Design/methodology/approach

Health professionals and stakeholders are striving for classification models to support the prognosis of diabetes and formulate strategies for prevention. The authors conduct a literature review of machine learning models and propose an intelligent framework for diabetes prediction.

Findings

The authors provide a critical analysis of machine learning models, and propose and evaluate an intelligent machine learning-based architecture for diabetes prediction. Using their framework, the authors implement and evaluate the decision tree (DT)-based random forest (RF) and support vector machine (SVM) learning models for diabetes prediction, as these are the most used approaches in the literature.

Originality/value

This paper provides a novel intelligent diabetes mellitus prediction framework (IDMPF) using machine learning. The framework is the result of a critical examination of prediction models in the literature and their application to diabetes. The authors identify the training methodologies, model evaluation strategies and challenges in diabetes prediction, and propose solutions within the framework. The research results can be used by health professionals, stakeholders, students and researchers working in the diabetes prediction area.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 25 May 2021

Classification models for likelihood prediction of diabetes at early stage using feature selection

Oladosu Oyebisi Oladimeji, Abimbola Oladimeji and Olayanju Oladimeji

Downloads

2070

Abstract

Purpose

Diabetes is one of the life-threatening chronic diseases, already affecting 422m people globally according to a World Health Organization (WHO) report as of 2018. It imposes substantial costs on individuals, governments and groups, from the diagnosis stage to the treatment stage. One reason for this cost is that the disease requires long-term treatment. The disease is likely to continue to affect more people because of its long asymptomatic phase, which makes its early detection not feasible.

Design/methodology/approach

In this study, the authors present machine learning models with feature selection that can detect diabetes at an early stage. The models presented are also inexpensive and available to everyone, including those in remote areas.

Findings

The results show that feature selection helps produce a better model, as it prevents overfitting and removes redundant data. Compared with previous research, the study achieves better results when evaluated on metrics such as F-measure, the Precision-Recall curve and the Receiver Operating Characteristic Area Under Curve. This finding has the potential to impact clinical practice when health workers aim to diagnose diabetes at an early stage.
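Of the metrics listed above, the F-measure has a compact closed form, sketched below. The function name `f_measure` and the default of beta = 1 (the common F1 score) are assumptions; the abstract does not say which variant was reported.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 1 the score is the plain harmonic mean, so it is high only when both precision and recall are high.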

Originality/value

This study has not been published anywhere else.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964