Search results

1 – 10 of 153
Open Access
Article
Publication date: 25 July 2022

Fung Yuen Chin, Kong Hoong Lem and Khye Mun Wong

The amount of features in handwritten digit data is often very large due to the different aspects in personal handwriting, leading to high-dimensional data. Therefore, the…

1055

Abstract

Purpose

The amount of features in handwritten digit data is often very large due to the different aspects in personal handwriting, leading to high-dimensional data. Therefore, the employment of a feature selection algorithm becomes crucial for successful classification modeling, because the inclusion of irrelevant or redundant features can mislead the modeling algorithms, resulting in overfitting and decrease in efficiency.

Design/methodology/approach

The minimum redundancy and maximum relevance (mRMR) and the recursive feature elimination (RFE) are two frequently used feature selection algorithms. While mRMR is capable of identifying a subset of features that are highly relevant to the targeted classification variable, mRMR still carries the weakness of capturing redundant features along with the algorithm. On the other hand, RFE is flawed by the fact that those features selected by RFE are not ranked by importance, albeit RFE can effectively eliminate the less important features and exclude redundant features.

Findings

The hybrid method was exemplified in a binary classification between digits “4” and “9” and between digits “6” and “8” from a multiple features dataset. The result showed that the hybrid mRMR +  support vector machine recursive feature elimination (SVMRFE) is better than both the sole support vector machine (SVM) and mRMR.

Originality/value

In view of the respective strength and deficiency mRMR and RFE, this study combined both these methods and used an SVM as the underlying classifier anticipating the mRMR to make an excellent complement to the SVMRFE.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 31 July 2020

Omar Alqaryouti, Nur Siyam, Azza Abdel Monem and Khaled Shaalan

Digital resources such as smart applications reviews and online feedback information are important sources to seek customers’ feedback and input. This paper aims to help…

7635

Abstract

Digital resources such as smart applications reviews and online feedback information are important sources to seek customers’ feedback and input. This paper aims to help government entities gain insights on the needs and expectations of their customers. Towards this end, we propose an aspect-based sentiment analysis hybrid approach that integrates domain lexicons and rules to analyse the entities smart apps reviews. The proposed model aims to extract the important aspects from the reviews and classify the corresponding sentiments. This approach adopts language processing techniques, rules, and lexicons to address several sentiment analysis challenges, and produce summarized results. According to the reported results, the aspect extraction accuracy improves significantly when the implicit aspects are considered. Also, the integrated classification model outperforms the lexicon-based baseline and the other rules combinations by 5% in terms of Accuracy on average. Also, when using the same dataset, the proposed approach outperforms machine learning approaches that uses support vector machine (SVM). However, using these lexicons and rules as input features to the SVM model has achieved higher accuracy than other SVM models.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 21 June 2022

Abhishek Das and Mihir Narayan Mohanty

In time and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer occupies the most frequent…

Abstract

Purpose

In time and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer occupies the most frequent incidence among all the cancers whereas breast cancer takes fifth place in the case of mortality numbers. Out of many image processing techniques, certain works have focused on convolutional neural networks (CNNs) for processing these images. However, deep learning models are to be explored well.

Design/methodology/approach

In this work, multivariate statistics-based kernel principal component analysis (KPCA) is used for essential features. KPCA is simultaneously helpful for denoising the data. These features are processed through a heterogeneous ensemble model that consists of three base models. The base models comprise recurrent neural network (RNN), long short-term memory (LSTM) and gated recurrent unit (GRU). The outcomes of these base learners are fed to fuzzy adaptive resonance theory mapping (ARTMAP) model for decision making as the nodes are added to the F_2ˆa layer if the winning criteria are fulfilled that makes the ARTMAP model more robust.

Findings

The proposed model is verified using breast histopathology image dataset publicly available at Kaggle. The model provides 99.36% training accuracy and 98.72% validation accuracy. The proposed model utilizes data processing in all aspects, i.e. image denoising to reduce the data redundancy, training by ensemble learning to provide higher results than that of single models. The final classification by a fuzzy ARTMAP model that controls the number of nodes depending upon the performance makes robust accurate classification.

Research limitations/implications

Research in the field of medical applications is an ongoing method. More advanced algorithms are being developed for better classification. Still, the scope is there to design the models in terms of better performance, practicability and cost efficiency in the future. Also, the ensemble models may be chosen with different combinations and characteristics. Only signal instead of images may be verified for this proposed model. Experimental analysis shows the improved performance of the proposed model. This method needs to be verified using practical models. Also, the practical implementation will be carried out for its real-time performance and cost efficiency.

Originality/value

The proposed model is utilized for denoising and to reduce the data redundancy so that the feature selection is done using KPCA. Training and classification are performed using heterogeneous ensemble model designed using RNN, LSTM and GRU as base classifiers to provide higher results than that of single models. Use of adaptive fuzzy mapping model makes the final classification accurate. The effectiveness of combining these methods to a single model is analyzed in this work.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 4 August 2020

Alessandra Lumini, Loris Nanni and Gianluca Maguolo

In this paper, we present a study about an automated system for monitoring underwater ecosystems. The system here proposed is based on the fusion of different deep learning…

2346

Abstract

In this paper, we present a study about an automated system for monitoring underwater ecosystems. The system here proposed is based on the fusion of different deep learning methods. We study how to create an ensemble based of different Convolutional Neural Network (CNN) models, fine-tuned on several datasets with the aim of exploiting their diversity. The aim of our study is to experiment the possibility of fine-tuning CNNs for underwater imagery analysis, the opportunity of using different datasets for pre-training models, the possibility to design an ensemble using the same architecture with small variations in the training procedure.

Our experiments, performed on 5 well-known datasets (3 plankton and 2 coral datasets) show that the combination of such different CNN models in a heterogeneous ensemble grants a substantial performance improvement with respect to other state-of-the-art approaches in all the tested problems. One of the main contributions of this work is a wide experimental evaluation of famous CNN architectures to report the performance of both the single CNN and the ensemble of CNNs in different problems. Moreover, we show how to create an ensemble which improves the performance of the best single model. The MATLAB source code is freely link provided in title page.

Details

Applied Computing and Informatics, vol. 19 no. 3/4
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 17 July 2020

Sheryl Brahnam, Loris Nanni, Shannon McMurtrey, Alessandra Lumini, Rick Brattin, Melinda Slack and Tonya Barrier

Diagnosing pain in neonates is difficult but critical. Although approximately thirty manual pain instruments have been developed for neonatal pain diagnosis, most are complex…

2303

Abstract

Diagnosing pain in neonates is difficult but critical. Although approximately thirty manual pain instruments have been developed for neonatal pain diagnosis, most are complex, multifactorial, and geared toward research. The goals of this work are twofold: 1) to develop a new video dataset for automatic neonatal pain detection called iCOPEvid (infant Classification Of Pain Expressions videos), and 2) to present a classification system that sets a challenging comparison performance on this dataset. The iCOPEvid dataset contains 234 videos of 49 neonates experiencing a set of noxious stimuli, a period of rest, and an acute pain stimulus. From these videos 20 s segments are extracted and grouped into two classes: pain (49) and nopain (185), with the nopain video segments handpicked to produce a highly challenging dataset. An ensemble of twelve global and local descriptors with a Bag-of-Features approach is utilized to improve the performance of some new descriptors based on Gaussian of Local Descriptors (GOLD). The basic classifier used in the ensembles is the Support Vector Machine, and decisions are combined by sum rule. These results are compared with standard methods, some deep learning approaches, and 185 human assessments. Our best machine learning methods are shown to outperform the human judges.

Details

Applied Computing and Informatics, vol. 19 no. 1/2
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 16 July 2020

Loris Nanni, Stefano Ghidoni and Sheryl Brahnam

This work presents a system based on an ensemble of Convolutional Neural Networks (CNNs) and descriptors for bioimage classification that has been validated on different datasets…

2315

Abstract

This work presents a system based on an ensemble of Convolutional Neural Networks (CNNs) and descriptors for bioimage classification that has been validated on different datasets of color images. The proposed system represents a very simple yet effective way of boosting the performance of trained CNNs by composing multiple CNNs into an ensemble and combining scores by sum rule. Several types of ensembles are considered, with different CNN topologies along with different learning parameter sets. The proposed system not only exhibits strong discriminative power but also generalizes well over multiple datasets thanks to the combination of multiple descriptors based on different feature types, both learned and handcrafted. Separate classifiers are trained for each descriptor, and the entire set of classifiers is combined by sum rule. Results show that the proposed system obtains state-of-the-art performance across four different bioimage and medical datasets. The MATLAB code of the descriptors will be available at https://github.com/LorisNanni.

Details

Applied Computing and Informatics, vol. 17 no. 1
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 4 May 2021

Loris Nanni and Sheryl Brahnam

Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or…

1354

Abstract

Purpose

Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.

Design/methodology/approach

Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.

Findings

The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.

Originality/value

Most DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of our general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Open Access
Article
Publication date: 26 April 2024

Luís Jacques de Sousa, João Poças Martins and Luís Sanhudo

Factors like bid price, submission time, and number of bidders influence the procurement process in public projects. These factors and the award criteria may impact the project’s…

Abstract

Purpose

Factors like bid price, submission time, and number of bidders influence the procurement process in public projects. These factors and the award criteria may impact the project’s financial compliance. Predicting budget compliance in construction projects has been traditionally challenging, but Machine Learning (ML) techniques have revolutionised estimations.

Design/methodology/approach

In this study, Portuguese Public Procurement Data (PPPData) was utilised as the model’s input. Notably, this dataset exhibited a substantial imbalance in the target feature. To address this issue, the study evaluated three distinct data balancing techniques: oversampling, undersampling, and the SMOTE method. Next, a comprehensive feature selection process was conducted, leading to the testing of five different algorithms for forecasting budget compliance. Finally, a secondary test was conducted, refining the features to include only those elements that procurement technicians can modify while also considering the two most accurate predictors identified in the previous test.

Findings

The findings indicate that employing the SMOTE method on the scraped data can achieve a balanced dataset. Furthermore, the results demonstrate that the Adam ANN algorithm outperformed others, boasting a precision rate of 68.1%.

Practical implications

The model can aid procurement technicians during the tendering phase by using historical data and analogous projects to predict performance.

Social implications

Although the study reveals that ML algorithms cannot accurately predict budget compliance using procurement data, they can still provide project owners with insights into the most suitable criteria, aiding decision-making. Further research should assess the model’s impact and capacity within the procurement workflow.

Originality/value

Previous research predominantly focused on forecasting budgets by leveraging data from the private construction execution phase. While some investigations incorporated procurement data, this study distinguishes itself by using an imbalanced dataset and anticipating compliance rather than predicting budgetary figures. The model predicts budget compliance by analysing qualitative and quantitative characteristics of public project contracts. The research paper explores various model architectures and data treatment techniques to develop a model to assist the Client in tender definition.

Details

Engineering, Construction and Architectural Management, vol. 31 no. 13
Type: Research Article
ISSN: 0969-9988

Keywords

Open Access
Article
Publication date: 13 November 2018

Bo Liu, Libin Shen, Huanling You, Yan Dong, Jianqiang Li and Yong Li

The influence of road surface temperature (RST) on vehicles is becoming more and more obvious. Accurate predication of RST is distinctly meaningful. At present, however, the…

1020

Abstract

Purpose

The influence of road surface temperature (RST) on vehicles is becoming more and more obvious. Accurate predication of RST is distinctly meaningful. At present, however, the prediction accuracy of RST is not satisfied with physical methods or statistical learning methods. To find an effective prediction method, this paper selects five representative algorithms to predict the road surface temperature separately.

Design/methodology/approach

Multiple linear regressions, least absolute shrinkage and selection operator, random forest and gradient boosting regression tree (GBRT) and neural network are chosen to be representative predictors.

Findings

The experimental results show that for temperature data set of this experiment, the prediction effect of GBRT in the ensemble algorithm is the best compared with the other four algorithms.

Originality/value

This paper compares different kinds of machine learning algorithms, observes the road surface temperature data from different angles, and finds the most suitable prediction method.

Details

International Journal of Crowd Science, vol. 2 no. 3
Type: Research Article
ISSN: 2398-7294

Keywords

Open Access
Article
Publication date: 3 February 2020

Wen Li, Wei Wang and Wenjun Huo

Inspired by the basic idea of gradient boosting, this study aims to design a novel multivariate regression ensemble algorithm RegBoost by using multivariate linear regression as a…

4546

Abstract

Purpose

Inspired by the basic idea of gradient boosting, this study aims to design a novel multivariate regression ensemble algorithm RegBoost by using multivariate linear regression as a weak predictor.

Design/methodology/approach

To achieve nonlinearity after combining all linear regression predictors, the training data is divided into two branches according to the prediction results using the current weak predictor. The linear regression modeling is recursively executed in two branches. In the test phase, test data is distributed to a specific branch to continue with the next weak predictor. The final result is the sum of all weak predictors across the entire path.

Findings

Through comparison experiments, it is found that the algorithm RegBoost can achieve similar performance to the gradient boosted decision tree (GBDT). The algorithm is very effective compared to linear regression.

Originality/value

This paper attempts to design a novel regression algorithm RegBoost with reference to GBDT. To the best of the knowledge, for the first time, RegBoost uses linear regression as a weak predictor, and combine with gradient boosting to build an ensemble algorithm.

Details

International Journal of Crowd Science, vol. 4 no. 1
Type: Research Article
ISSN: 2398-7294

Keywords

1 – 10 of 153