Search results

1 – 10 of 19
Open Access
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki

Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has long been considered a superior technique for selecting the relevant subset of features from high-dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to settings in which features arrive consecutively over time: the total number of features is not known in advance, whereas the number of instances is well established. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to solve this problem. This paper presents an exhaustive and methodical introduction to these techniques. It reviews the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 25 July 2022

Fung Yuen Chin, Kong Hoong Lem and Khye Mun Wong

Abstract

Purpose

The number of features in handwritten digit data is often very large because of the many variations in personal handwriting, leading to high-dimensional data. Therefore, the use of a feature selection algorithm becomes crucial for successful classification modeling, because the inclusion of irrelevant or redundant features can mislead the modeling algorithms, resulting in overfitting and decreased efficiency.

Design/methodology/approach

Minimum redundancy and maximum relevance (mRMR) and recursive feature elimination (RFE) are two frequently used feature selection algorithms. While mRMR is capable of identifying a subset of features that are highly relevant to the targeted classification variable, it can still capture redundant features. RFE, on the other hand, effectively eliminates the less important features and excludes redundant ones, but the features it selects are not ranked by importance.
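To make the relevance-versus-redundancy trade-off concrete, the following is a minimal sketch of a greedy mRMR-style selection, not the authors' implementation. It assumes absolute Pearson correlation as a stand-in for the mutual-information terms normally used by mRMR, and the function names (`pearson`, `mrmr`) and the toy feature names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def mrmr(features, target, k):
    """Greedily pick k feature names, scoring each candidate as
    relevance minus mean redundancy with the features already chosen.

    Assumption: |Pearson correlation| replaces mutual information.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (hypothetical): "a_copy" is almost a duplicate of "a".
target = [0, 0, 1, 1, 0, 1]
features = {
    "a": [0.1, 0.0, 0.9, 1.0, 0.2, 0.9],
    "a_copy": [0.1, 0.0, 0.9, 1.0, 0.2, 0.8],
    "b": [1, 0, 1, 0, 0, 1],
}
chosen = mrmr(features, target, k=2)
```

On this toy data a pure relevance ranking would keep both `a` and `a_copy`, whereas the redundancy penalty makes the second pick the weaker but less correlated `b`. A wrapper such as RFE would instead retrain a classifier repeatedly and drop the lowest-weighted features at each round.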

Findings

The hybrid method was demonstrated on binary classification between digits “4” and “9” and between digits “6” and “8” from a multiple features dataset. The results showed that the hybrid mRMR + support vector machine recursive feature elimination (SVMRFE) performed better than both the support vector machine (SVM) alone and mRMR.

Originality/value

In view of the respective strengths and deficiencies of mRMR and RFE, this study combined the two methods and used an SVM as the underlying classifier, anticipating that mRMR would make an excellent complement to SVMRFE.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 22 March 2022

Shiva Sumanth Reddy and C. Nandini

Abstract

Purpose

The present research work aims to detect haemoprotozoan diseases in cattle and breast cancer in humans at an early stage. A combination of LeNet and a bidirectional long short-term memory (Bi-LSTM) model is used to classify haemoprotozoan samples into three classes: theileriosis, babesiosis and anaplasmosis. In addition, image samples from the BreaKHis dataset are classified into two major classes, malignant and benign. Hyperparameter optimization is used for selecting the prominent features. The main objective of this approach is to replace the manual identification and classification of samples into different haemoprotozoan diseases in cattle. The traditional laboratory approach to identification is time-consuming and requires human expertise. The proposed methodology will help to identify and classify haemoprotozoan disease at an early stage without much human involvement.

Design/methodology/approach

A LeNet-based Bi-LSTM model is used to classify pathology images into babesiosis, anaplasmosis and theileriosis, and breast images into malignant or benign. An optimization-based super pixel clustering algorithm is used for segmentation once the histopathology images have been normalized. The edge information in the normalized images is used to identify the irregularly shaped regions of the images, which are structurally meaningful. The algorithm is also compared with another segmentation approach, the circular Hough transform (CHT), which is used to separate the nuclei from non-nuclei. Canny edge detection and a Gaussian filter are used to extract the edges before they are passed to the CHT.

Findings

Existing methods such as the artificial neural network (ANN), convolutional neural network (CNN), recurrent neural network (RNN), LSTM and Bi-LSTM models were compared with the proposed hyperparameter optimization approach using LeNet and Bi-LSTM. The proposed hyperparameter optimization-Bi-LSTM model achieved an accuracy of 98.99%, compared with 95.29% for the Ensemble of Deep Learning Models and 95.94% for the Modified ReliefF Algorithm.

Originality/value

In contrast to earlier research done using Modified ReliefF, the suggested LeNet with Bi-LSTM model significantly improves accuracy, precision and F-score. A real-time data set is used for the haemoprotozoan disease samples. In addition, for anaplasmosis and babesiosis, a second set of datasets was used; these are coloured datasets obtained by adding a chemical acetone and stain.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 5 June 2017

Stephen Loh Tangwe, Michael Simon and Edson Leroy Meyer

Abstract

Purpose

The purpose of this study was to build and develop mathematical models correlating ambient conditions and electrical energy to the coefficient of performance (COP) of an air-source heat pump (ASHP) water heater. This study also aimed to design a simulation application to compute the COP under different heating up scenarios, and to calculate the mean significant difference under the specified scenarios by using a statistical method.

Design/methodology/approach

A data acquisition system was designed with respect to the required sensors and data loggers on the basis of the experimental setup. The two critical scenarios (with hot water draws and without hot water draws) during the heating up cycles were analyzed. Both mathematical models and the simulation application were developed using the analyzed data.

Findings

The predictors showed a direct linear relationship to the COP under the no successive hot water draws scenario, while they exhibited a linear relationship with a negative gradient to the COP under the simultaneous draws scenario. Both scenarios showed the ambient conditions to be the primary factor, and the weight of importance of the contribution to the COP was five times more in the scenario of simultaneous hot water draws than in the other scenario. The average COP of the ASHP water heater was better during a heating cycle with simultaneous hot water draws but demonstrated no mean significant difference from the other scenario.

Research limitations/implications

There was a need to include other prediction parameters such as air speed, difference in condenser temperature and difference in compressor temperature, which could help improve model accuracy. However, these were excluded because of insufficient funding for the purchase of additional temperature sensors and an air speed transducer.

Practical implications

The research was conducted in a normal middle-income family home, and all the results were obtained from the data collected by the data acquisition system. Moreover, the experiment was very feasible because the study did not interfere with the activities of the house: occupants were able to carry out their activities as usual.

Social implications

This paper attempts to justify the system efficiency under different heating up scenarios. Based on the mathematical model, the performance of the system could be determined all year round and the payback period could be easily evaluated. Finally, from the study, homeowners could see the value of the efficiency of the technology, as they could easily compute its performance on the basis of the ambient conditions at their location.

Originality/value

This is the first research on the mathematical modeling of the COP of an ASHP water heater using ambient conditions and electrical energy as the predictors and by using surface fitting multi-linear regression. Further, the novelty is the design of the simulation application for a Simulink environment to compute the performance from real-time data.

Article
Publication date: 13 October 2020

Russel Mhundwa and Michael Simon

Abstract

Purpose

This paper aims to show that a simplified surface fitting model can be efficient in determining the energy consumption during milk cooling by an on-farm direct expansion bulk milk cooler (DXBMC). The study reveals that milk volume and the temperature gradient between the room and the final milk temperature can effectively be used for predicting the energy consumption within 95% confidence bounds.

Design/methodology/approach

A data acquisition system comprising a Landis and Gyr E650 power meter, TMC6-HE temperature sensors and a HOBO UX120-006M four-channel analog data logger was designed and built to monitor the DXBMC. The temperature of the room where the DXBMC is housed was measured using a TMC6-HE temperature sensor connected to the HOBO UX120-006M data logger, which was configured to log at one-minute intervals. The electrical energy consumed by the DXBMC was measured using the Landis and Gyr E650 meter, while the volume of milk was extracted from on-farm records.

Findings

The results showed that the developed model can predict the electrical energy consumption of the DXBMC within an acceptable accuracy since 80% of the variation in the electrical energy consumption by the DXBMC was explained by the mathematical model. Also, milk volume and the temperature gradient between the room and final milk temperature in the BMC are primary and secondary contributors, respectively, to electrical energy consumption by the DXBMC. Based on the system that has been monitored the findings reveal that the DXBMC was operating within the expected efficiency level as evidenced by the optimized electrical energy consumption (EEC) closely mirroring the modelled EEC with a determination coefficient of 0.95.

Research limitations/implications

Only one system was monitored because funding was unavailable to deploy several data acquisition systems across the country. The milk blending temperatures and the effects of the insulation of the DXBMC were not taken into account in this study.

Practical implications

The developed model is simple to use, cost effective and can be applied in real-time on the dairy farm which will enable the farmer to quickly identify an increase in the cooling energy per unit of milk cooled.

Social implications

The developed easy to use model can be used by dairy farmers on similar on-farm DXBMC; hence, they can devise ways to manage their energy consumption on the farm during the cooling of milk and foster some energy efficiency initiatives.

Originality/value

The implementation of the developed model can be useful to dairy farmers in South Africa. Through energy optimization, the maintenance of the DXBMC can be determined and scheduled accordingly.

Details

Journal of Engineering, Design and Technology , vol. 19 no. 3
Type: Research Article
ISSN: 1726-0531

Article
Publication date: 9 June 2021

Md Nazmus Sakib, Theodora Chaspari and Amir H. Behzadan

Abstract

Purpose

As drones are rapidly transforming tasks such as mapping and surveying, safety inspection and progress monitoring, human operators continue to play a critical role in ensuring safe drone missions in compliance with safety regulations and standard operating procedures. Research shows that operator's stress and fatigue are leading causes of drone accidents. Building upon the authors’ past work, this study presents a systematic approach to predicting impending drone accidents using data that capture the drone operator's physiological state preceding the accident.

Design/methodology/approach

The authors collect physiological data from 25 participants in real-world and virtual reality flight experiments to design a feedforward neural network (FNN) with back propagation. Four time series signals, namely electrodermal activity (EDA), skin temperature (ST), electrocardiogram (ECG) and heart rate (HR), are selected, filtered for noise and used to extract 92 time- and frequency-domain features. The FNN is trained with data from a window of length t = 3…8 s to predict accidents in the next p = 3…8 s.
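The window-and-horizon setup can be sketched as follows. This is an illustrative outline only: the helper name `window_pairs`, the sampling rate argument and the one-second stride are assumptions, not details reported in the abstract. Each pair gives the sample range used to extract features (length t) and the range that follows it (length p), in which an accident would be labeled.

```python
from itertools import product


def window_pairs(n_samples, rate_hz, t, p, stride_s=1):
    """Return ((win_start, win_end), (hor_start, hor_end)) index pairs.

    t        analysis window length in seconds (signal used for features)
    p        prediction horizon in seconds (interval checked for an accident)
    stride_s step between consecutive windows in seconds (assumed, not from the paper)
    """
    win, hor, step = t * rate_hz, p * rate_hz, stride_s * rate_hz
    pairs = []
    start = 0
    while start + win + hor <= n_samples:
        pairs.append(((start, start + win), (start + win, start + win + hor)))
        start += step
    return pairs


# Every (t, p) combination with t and p each ranging over 3..8 seconds.
GRID = list(product(range(3, 9), repeat=2))

# Example: 16 s of a 1 Hz signal, t = 8 s, p = 6 s.
example = window_pairs(n_samples=16, rate_hz=1, t=8, p=6)
```

Because t and p each take six values, `GRID` holds 6 × 6 = 36 configurations, each of which would need its own training and evaluation run.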

Findings

Analysis of model performance across all 36 combinations of analysis window (t) and prediction horizon (p) reveals that the FNN trained with 8 s of physiological signal (i.e. t = 8) to predict drone accidents in the next 6 s (i.e. p = 6) achieved the highest F1-score of 0.81 and AP of 0.71 after feature selection and data balancing.

Originality/value

The safety and integrity of collaborative human–machine systems (e.g. remotely operated drones) rely not only on the attributes of the human operator or the machinery but also on how one perceives the other and adapts to the evolving nature of the operational environment. This study is a first systematic attempt at objective prediction of potential drone accident events from operators' physiological data in (near-) real time. Findings will lay the foundation for creating automated intervention systems for drone operations, ultimately leading to safer jobsites.

Details

Smart and Sustainable Built Environment, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2046-6099

Article
Publication date: 18 March 2022

Pinsheng Duan, Jianliang Zhou and Shiwei Tao

Abstract

Purpose

The outbreak of the pandemic makes it more difficult to manage the safety or health of construction workers in infrastructure construction. Risk events in construction workers' material handling tasks are highly relevant to workers' work-related musculoskeletal disorders. However, there are still many problems to be resolved in recognizing risk events accurately. The purpose of this research is to propose an automatic and non-invasive recognition method for construction workers in material handling tasks during the pandemic based on smartphone and machine learning.

Design/methodology/approach

This research proposes a method to recognize and classify four different risk events by collecting specific acceleration and angular velocity patterns through built-in sensors of smartphones. The events were simulated with anterior handling and shoulder handling methods in the laboratory. After data segmentation and feature extraction, five different machine learning methods are used to recognize risk events and the classification performances are compared.

Findings

The classification result of the shoulder handling method was slightly better than the anterior handling method. By comparing the accuracy of five different classifiers, cross-validation results showed that the classification accuracy of the random forest algorithm was the highest (76.71% in anterior handling method and 80.13% in shoulder handling method) when the window size was 0.64 s.

Originality/value

Less attention has been paid to the risk events in workers' material handling tasks in previous studies, and most events are recorded by manual observation methods. This study provided a simple and objective way to judge the risk events in manual material handling tasks of construction workers based on smartphones, which can be used as a non-invasive way for managers to improve health and labor productivity during the pandemic.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 29 April 2020

Samer BuHamdan, Aladdin Alwisy and Ahmed Bouferguene

Abstract

Purpose

The purpose of this paper is to develop a clear understanding of the features that increase the probability of condos’ sale, with a focus on design-related features.

Design/methodology/approach

The present research uses survival analysis (SA) and the Cox proportional-hazards regression (CPHR) to analyze condo sales data provided by the REALTORS® Association of Edmonton (RAE) (Alberta, Canada).
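For orientation, the Cox proportional-hazards model has the standard form below. Reading t as the time a condo has spent on the market and x as the vector of listing features is an interpretive assumption here, since the abstract does not spell out the specification.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = e^{\beta_j}
```

Here h_0(t) is the unspecified baseline hazard and HR_j is the hazard ratio for a one-unit increase in feature j. Under the assumed reading, a hazard ratio above 1 corresponds to a higher instantaneous chance of sale and therefore a shorter expected time on the market.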

Findings

The analysis of the provided data shows that the listed price, building age, appliances and condo fees have less effect on the time a condo spends on the market compared to the condo’s physical features, such as construction material, interior finishing and heating type and source.

Research limitations/implications

The data used in the present research come from one geographical area (i.e. Edmonton, Canada). Furthermore, the data provided by the RAE do not include any real estate transactions not involving a realtor. Additionally, the present research, owing to its focus on design-related features, does not control for features related to the external environment, such as community and transportation proximity.

Practical implications

The findings of the present research help construction practitioners (e.g. architects, builders and realtors) better understand the features that influence condo buyers’ decisions. This knowledge helps to develop designs and marketing strategies that increase the likelihood of selling and decrease the time listed condos spend on the market.

Originality/value

The present research expands our knowledge of the drivers influencing the purchasers’ decisions concerning the building’s physical features that can be controlled during the design stage. Also, analyzing the provided data by using SA and CPHR, as followed in this paper, facilitates the inclusion of records that are listed but not sold, which helps to overcome the survivorship bias and avoid the over-optimism that exists in the present literature.

Details

International Journal of Housing Markets and Analysis, vol. 14 no. 1
Type: Research Article
ISSN: 1753-8270

Open Access
Article
Publication date: 28 July 2020

Kumash Kapadia, Hussein Abdel-Jaber, Fadi Thabtah and Wael Hadi

Abstract

The Indian Premier League (IPL) is one of the more popular cricket tournaments in the world: its finances are growing each season, its viewership has increased markedly and the betting market for the IPL is growing significantly every year. Because cricket is a very dynamic game that changes ball by ball, bettors and bookies are incentivised to bet on match results. This paper investigates machine learning technology to deal with the problem of predicting cricket match results based on historical match data of the IPL. Influential features of the dataset have been identified using filter-based methods including Correlation-based Feature Selection, Information Gain (IG), ReliefF and Wrapper. More importantly, machine learning techniques including Naïve Bayes, Random Forest, K-Nearest Neighbour (KNN) and Model Trees (classification via regression) have been adopted to generate predictive models from distinctive feature sets derived by the filter-based methods. Two feature subsets were formulated, one based on home team advantage and the other based on the toss decision. Selected machine learning techniques were applied to both feature sets to determine a predictive model. Experimental tests show that tree-based models, particularly Random Forest, performed better in terms of accuracy, precision and recall when compared to probabilistic and statistical models. However, on the toss feature subset, none of the considered machine learning algorithms produced accurate predictive models.

Details

Applied Computing and Informatics, vol. 18 no. 3/4
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 13 October 2021

Sharanabasappa and Suvarna Nandyal

Abstract

Purpose

In order to prevent accidents during driving, driver drowsiness detection systems have become a hot topic for researchers. There are various types of features that can be used to detect drowsiness. Detection can be done by utilizing behavioral data, physiological measurements and vehicle-based data. The existing ensemble approach based on deep convolutional neural network (CNN) models analyzed behavioral data comprising eye, face or head movements captured in camera images or videos. However, that model suffered from high computational cost because it uses approximately 140 million parameters.

Design/methodology/approach

The proposed model takes significant feature parameters from the feature extraction process; methods such as ReliefF, Infinite, Correlation and Term Variance are used for feature selection. The selected features are then classified using an ensemble classifier.

Findings

The output of these models is classified into non-drowsiness or drowsiness categories.

Research limitations/implications

In this research work, higher-end cameras are required to collect videos as it is cost-effective. Therefore, researchers are encouraged to use the existing datasets.

Practical implications

This paper overcomes the limitations of the earlier approach. The developed model used complex deep learning models on a small dataset, which also extracted additional features and thereby provided a more satisfying result.

Originality/value

Drowsiness can be detected at the earliest stage using the ensemble model, which helps to reduce the number of accidents.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 2
Type: Research Article
ISSN: 1756-378X
