Search results
1 – 10 of 153Fung Yuen Chin, Kong Hoong Lem and Khye Mun Wong
The amount of features in handwritten digit data is often very large due to the different aspects in personal handwriting, leading to high-dimensional data. Therefore, the…
Abstract
Purpose
The amount of features in handwritten digit data is often very large due to the different aspects in personal handwriting, leading to high-dimensional data. Therefore, the employment of a feature selection algorithm becomes crucial for successful classification modeling, because the inclusion of irrelevant or redundant features can mislead the modeling algorithms, resulting in overfitting and decrease in efficiency.
Design/methodology/approach
The minimum redundancy and maximum relevance (mRMR) and the recursive feature elimination (RFE) are two frequently used feature selection algorithms. While mRMR is capable of identifying a subset of features that are highly relevant to the targeted classification variable, mRMR still carries the weakness of capturing redundant features along with the algorithm. On the other hand, RFE is flawed by the fact that those features selected by RFE are not ranked by importance, albeit RFE can effectively eliminate the less important features and exclude redundant features.
Findings
The hybrid method was exemplified in a binary classification between digits “4” and “9” and between digits “6” and “8” from a multiple features dataset. The result showed that the hybrid mRMR + support vector machine recursive feature elimination (SVMRFE) is better than both the sole support vector machine (SVM) and mRMR.
Originality/value
In view of the respective strength and deficiency mRMR and RFE, this study combined both these methods and used an SVM as the underlying classifier anticipating the mRMR to make an excellent complement to the SVMRFE.
Details
Keywords
Omar Alqaryouti, Nur Siyam, Azza Abdel Monem and Khaled Shaalan
Digital resources such as smart applications reviews and online feedback information are important sources to seek customers’ feedback and input. This paper aims to help…
Abstract
Digital resources such as smart applications reviews and online feedback information are important sources to seek customers’ feedback and input. This paper aims to help government entities gain insights on the needs and expectations of their customers. Towards this end, we propose an aspect-based sentiment analysis hybrid approach that integrates domain lexicons and rules to analyse the entities smart apps reviews. The proposed model aims to extract the important aspects from the reviews and classify the corresponding sentiments. This approach adopts language processing techniques, rules, and lexicons to address several sentiment analysis challenges, and produce summarized results. According to the reported results, the aspect extraction accuracy improves significantly when the implicit aspects are considered. Also, the integrated classification model outperforms the lexicon-based baseline and the other rules combinations by 5% in terms of Accuracy on average. Also, when using the same dataset, the proposed approach outperforms machine learning approaches that uses support vector machine (SVM). However, using these lexicons and rules as input features to the SVM model has achieved higher accuracy than other SVM models.
Details
Keywords
Abhishek Das and Mihir Narayan Mohanty
In time and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer occupies the most frequent…
Abstract
Purpose
In time and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer occupies the most frequent incidence among all the cancers whereas breast cancer takes fifth place in the case of mortality numbers. Out of many image processing techniques, certain works have focused on convolutional neural networks (CNNs) for processing these images. However, deep learning models are to be explored well.
Design/methodology/approach
In this work, multivariate statistics-based kernel principal component analysis (KPCA) is used for essential features. KPCA is simultaneously helpful for denoising the data. These features are processed through a heterogeneous ensemble model that consists of three base models. The base models comprise recurrent neural network (RNN), long short-term memory (LSTM) and gated recurrent unit (GRU). The outcomes of these base learners are fed to fuzzy adaptive resonance theory mapping (ARTMAP) model for decision making as the nodes are added to the F_2ˆa layer if the winning criteria are fulfilled that makes the ARTMAP model more robust.
Findings
The proposed model is verified using breast histopathology image dataset publicly available at Kaggle. The model provides 99.36% training accuracy and 98.72% validation accuracy. The proposed model utilizes data processing in all aspects, i.e. image denoising to reduce the data redundancy, training by ensemble learning to provide higher results than that of single models. The final classification by a fuzzy ARTMAP model that controls the number of nodes depending upon the performance makes robust accurate classification.
Research limitations/implications
Research in the field of medical applications is an ongoing method. More advanced algorithms are being developed for better classification. Still, the scope is there to design the models in terms of better performance, practicability and cost efficiency in the future. Also, the ensemble models may be chosen with different combinations and characteristics. Only signal instead of images may be verified for this proposed model. Experimental analysis shows the improved performance of the proposed model. This method needs to be verified using practical models. Also, the practical implementation will be carried out for its real-time performance and cost efficiency.
Originality/value
The proposed model is utilized for denoising and to reduce the data redundancy so that the feature selection is done using KPCA. Training and classification are performed using heterogeneous ensemble model designed using RNN, LSTM and GRU as base classifiers to provide higher results than that of single models. Use of adaptive fuzzy mapping model makes the final classification accurate. The effectiveness of combining these methods to a single model is analyzed in this work.
Details
Keywords
Alessandra Lumini, Loris Nanni and Gianluca Maguolo
In this paper, we present a study about an automated system for monitoring underwater ecosystems. The system here proposed is based on the fusion of different deep learning…
Abstract
In this paper, we present a study about an automated system for monitoring underwater ecosystems. The system here proposed is based on the fusion of different deep learning methods. We study how to create an ensemble based of different Convolutional Neural Network (CNN) models, fine-tuned on several datasets with the aim of exploiting their diversity. The aim of our study is to experiment the possibility of fine-tuning CNNs for underwater imagery analysis, the opportunity of using different datasets for pre-training models, the possibility to design an ensemble using the same architecture with small variations in the training procedure.
Our experiments, performed on 5 well-known datasets (3 plankton and 2 coral datasets) show that the combination of such different CNN models in a heterogeneous ensemble grants a substantial performance improvement with respect to other state-of-the-art approaches in all the tested problems. One of the main contributions of this work is a wide experimental evaluation of famous CNN architectures to report the performance of both the single CNN and the ensemble of CNNs in different problems. Moreover, we show how to create an ensemble which improves the performance of the best single model. The MATLAB source code is freely link provided in title page.
Details
Keywords
Sheryl Brahnam, Loris Nanni, Shannon McMurtrey, Alessandra Lumini, Rick Brattin, Melinda Slack and Tonya Barrier
Diagnosing pain in neonates is difficult but critical. Although approximately thirty manual pain instruments have been developed for neonatal pain diagnosis, most are complex…
Abstract
Diagnosing pain in neonates is difficult but critical. Although approximately thirty manual pain instruments have been developed for neonatal pain diagnosis, most are complex, multifactorial, and geared toward research. The goals of this work are twofold: 1) to develop a new video dataset for automatic neonatal pain detection called iCOPEvid (infant Classification Of Pain Expressions videos), and 2) to present a classification system that sets a challenging comparison performance on this dataset. The iCOPEvid dataset contains 234 videos of 49 neonates experiencing a set of noxious stimuli, a period of rest, and an acute pain stimulus. From these videos 20 s segments are extracted and grouped into two classes: pain (49) and nopain (185), with the nopain video segments handpicked to produce a highly challenging dataset. An ensemble of twelve global and local descriptors with a Bag-of-Features approach is utilized to improve the performance of some new descriptors based on Gaussian of Local Descriptors (GOLD). The basic classifier used in the ensembles is the Support Vector Machine, and decisions are combined by sum rule. These results are compared with standard methods, some deep learning approaches, and 185 human assessments. Our best machine learning methods are shown to outperform the human judges.
Details
Keywords
Loris Nanni, Stefano Ghidoni and Sheryl Brahnam
This work presents a system based on an ensemble of Convolutional Neural Networks (CNNs) and descriptors for bioimage classification that has been validated on different datasets…
Abstract
This work presents a system based on an ensemble of Convolutional Neural Networks (CNNs) and descriptors for bioimage classification that has been validated on different datasets of color images. The proposed system represents a very simple yet effective way of boosting the performance of trained CNNs by composing multiple CNNs into an ensemble and combining scores by sum rule. Several types of ensembles are considered, with different CNN topologies along with different learning parameter sets. The proposed system not only exhibits strong discriminative power but also generalizes well over multiple datasets thanks to the combination of multiple descriptors based on different feature types, both learned and handcrafted. Separate classifiers are trained for each descriptor, and the entire set of classifiers is combined by sum rule. Results show that the proposed system obtains state-of-the-art performance across four different bioimage and medical datasets. The MATLAB code of the descriptors will be available at https://github.com/LorisNanni.
Loris Nanni and Sheryl Brahnam
Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or…
Abstract
Purpose
Automatic DNA-binding protein (DNA-BP) classification is now an essential proteomic technology. Unfortunately, many systems reported in the literature are tested on only one or two datasets/tasks. The purpose of this study is to create the most optimal and universal system for DNA-BP classification, one that performs competitively across several DNA-BP classification tasks.
Design/methodology/approach
Efficient DNA-BP classifier systems require the discovery of powerful protein representations and feature extraction methods. Experiments were performed that combined and compared descriptors extracted from state-of-the-art matrix/image protein representations. These descriptors were trained on separate support vector machines (SVMs) and evaluated. Convolutional neural networks with different parameter settings were fine-tuned on two matrix representations of proteins. Decisions were fused with the SVMs using the weighted sum rule and evaluated to experimentally derive the most powerful general-purpose DNA-BP classifier system.
Findings
The best ensemble proposed here produced comparable, if not superior, classification results on a broad and fair comparison with the literature across four different datasets representing a variety of DNA-BP classification tasks, thereby demonstrating both the power and generalizability of the proposed system.
Originality/value
Most DNA-BP methods proposed in the literature are only validated on one (rarely two) datasets/tasks. In this work, the authors report the performance of our general-purpose DNA-BP system on four datasets representing different DNA-BP classification tasks. The excellent results of the proposed best classifier system demonstrate the power of the proposed approach. These results can now be used for baseline comparisons by other researchers in the field.
Details
Keywords
Luís Jacques de Sousa, João Poças Martins and Luís Sanhudo
Factors like bid price, submission time, and number of bidders influence the procurement process in public projects. These factors and the award criteria may impact the project’s…
Abstract
Purpose
Factors like bid price, submission time, and number of bidders influence the procurement process in public projects. These factors and the award criteria may impact the project’s financial compliance. Predicting budget compliance in construction projects has been traditionally challenging, but Machine Learning (ML) techniques have revolutionised estimations.
Design/methodology/approach
In this study, Portuguese Public Procurement Data (PPPData) was utilised as the model’s input. Notably, this dataset exhibited a substantial imbalance in the target feature. To address this issue, the study evaluated three distinct data balancing techniques: oversampling, undersampling, and the SMOTE method. Next, a comprehensive feature selection process was conducted, leading to the testing of five different algorithms for forecasting budget compliance. Finally, a secondary test was conducted, refining the features to include only those elements that procurement technicians can modify while also considering the two most accurate predictors identified in the previous test.
Findings
The findings indicate that employing the SMOTE method on the scraped data can achieve a balanced dataset. Furthermore, the results demonstrate that the Adam ANN algorithm outperformed others, boasting a precision rate of 68.1%.
Practical implications
The model can aid procurement technicians during the tendering phase by using historical data and analogous projects to predict performance.
Social implications
Although the study reveals that ML algorithms cannot accurately predict budget compliance using procurement data, they can still provide project owners with insights into the most suitable criteria, aiding decision-making. Further research should assess the model’s impact and capacity within the procurement workflow.
Originality/value
Previous research predominantly focused on forecasting budgets by leveraging data from the private construction execution phase. While some investigations incorporated procurement data, this study distinguishes itself by using an imbalanced dataset and anticipating compliance rather than predicting budgetary figures. The model predicts budget compliance by analysing qualitative and quantitative characteristics of public project contracts. The research paper explores various model architectures and data treatment techniques to develop a model to assist the Client in tender definition.
Details
Keywords
Bo Liu, Libin Shen, Huanling You, Yan Dong, Jianqiang Li and Yong Li
The influence of road surface temperature (RST) on vehicles is becoming more and more obvious. Accurate predication of RST is distinctly meaningful. At present, however, the…
Abstract
Purpose
The influence of road surface temperature (RST) on vehicles is becoming more and more obvious. Accurate predication of RST is distinctly meaningful. At present, however, the prediction accuracy of RST is not satisfied with physical methods or statistical learning methods. To find an effective prediction method, this paper selects five representative algorithms to predict the road surface temperature separately.
Design/methodology/approach
Multiple linear regressions, least absolute shrinkage and selection operator, random forest and gradient boosting regression tree (GBRT) and neural network are chosen to be representative predictors.
Findings
The experimental results show that for temperature data set of this experiment, the prediction effect of GBRT in the ensemble algorithm is the best compared with the other four algorithms.
Originality/value
This paper compares different kinds of machine learning algorithms, observes the road surface temperature data from different angles, and finds the most suitable prediction method.
Details
Keywords
Wen Li, Wei Wang and Wenjun Huo
Inspired by the basic idea of gradient boosting, this study aims to design a novel multivariate regression ensemble algorithm RegBoost by using multivariate linear regression as a…
Abstract
Purpose
Inspired by the basic idea of gradient boosting, this study aims to design a novel multivariate regression ensemble algorithm RegBoost by using multivariate linear regression as a weak predictor.
Design/methodology/approach
To achieve nonlinearity after combining all linear regression predictors, the training data is divided into two branches according to the prediction results using the current weak predictor. The linear regression modeling is recursively executed in two branches. In the test phase, test data is distributed to a specific branch to continue with the next weak predictor. The final result is the sum of all weak predictors across the entire path.
Findings
Through comparison experiments, it is found that the algorithm RegBoost can achieve similar performance to the gradient boosted decision tree (GBDT). The algorithm is very effective compared to linear regression.
Originality/value
This paper attempts to design a novel regression algorithm RegBoost with reference to GBDT. To the best of the knowledge, for the first time, RegBoost uses linear regression as a weak predictor, and combine with gradient boosting to build an ensemble algorithm.
Details