Search results

1 – 10 of 296
Article
Publication date: 15 March 2021

Putta Hemalatha and Geetha Mary Amalanathan

Abstract

Purpose

Adequate resources for learning and training are an important constraint in developing an efficient, high-performing classifier. Real data usually follows a biased class distribution, with classes unequally represented within a dataset. This issue is known as the imbalance problem, and it is one of the most common issues in real-time applications. Learning from imbalanced datasets is a ubiquitous challenge in data mining, because imbalanced data degrades classifier performance and produces inaccurate results.

Design/methodology/approach

This work proposes a novel fuzzy-based Gaussian synthetic minority oversampling (FG-SMOTE) algorithm for processing imbalanced data. The Gaussian SMOTE technique relies on nearest-neighbour search to balance the ratio between minority- and majority-class samples. Here, that ratio is balanced using a fuzzy-based Levenshtein distance measure.
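
As a rough illustration of the nearest-neighbour interpolation that SMOTE-style methods build on, the sketch below implements plain SMOTE with Euclidean distance. It is not the Gaussian or fuzzy Levenshtein variant described in the abstract; the function name `smote`, the parameter `k` and the sample points are all assumptions made for the example.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Plain SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class points in a 2-D feature space
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
```

Each synthetic point lies on a line segment between two existing minority samples, so the oversampled class stays inside the region the minority class already occupies.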

Findings

The performance and accuracy of the proposed algorithm are evaluated using a deep belief network classifier. The fuzzy-based Gaussian SMOTE technique achieved an AUC of 93.7%, an F1 score of 94.2% and a geometric mean score of 93.6%, computed from the confusion matrix.
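
For reference, the F1 score and geometric mean score can be derived from the four cells of a binary confusion matrix, as in the minimal sketch below. The counts are invented for the example and are not the paper's data. AUC is not included because it needs the classifier's ranked scores rather than a single confusion matrix.

```python
import math

def f1_and_gmean(tp, fp, fn, tn):
    """F1 score and geometric mean from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean

# Hypothetical counts, chosen only to keep the arithmetic easy to follow
f1, g_mean = f1_and_gmean(tp=90, fp=10, fn=10, tn=90)
```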

Research limitations/implications

Some challenges remain to be addressed, such as applying FG-SMOTE to multiclass imbalanced datasets and evaluating the dataset imbalance problem in a distributed environment.

Originality/value

The proposed algorithm fundamentally solves the data imbalance issues and challenges involved in handling the imbalanced data. FG-SMOTE has aided in balancing minority and majority class datasets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 28 July 2020

Sathyaraj R, Ramanathan L, Lavanya K, Balasubramanian V and Saira Banu J

Abstract

Purpose

Innovation in big data is advancing so quickly that conventional software tools face several problems in managing big data. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint for the research industry.

Design/methodology/approach

The paper introduces a big data classification technique built on the MapReduce framework and an optimization algorithm. Classification runs on the MapReduce framework using the proposed chicken-based bacterial foraging (CBF) algorithm, which is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The proposed model executes in two stages, namely, the training and testing phases. In the training phase, big data produced by different distributed sources is processed in parallel by the mappers in the mapper phase, which perform preprocessing and feature selection based on the proposed CBF algorithm. The preprocessing step eliminates redundant and inconsistent data, whereas the feature selection step extracts the significant features from the preprocessed data to improve classification accuracy. The selected features are fed into the reducer for classification using the deep belief network (DBN) classifier, which is trained using the proposed CBF algorithm so that the data are classified into various classes; at the end of the training process, the individual reducers present the trained models. The incremental data are thus handled effectively based on the model trained in the training phase. In the testing phase, the incremental data are split into different subsets and fed into different mappers for classification. Each mapper contains a trained model obtained from the training phase and uses it to classify the incremental data. After classification, the outputs obtained from the mappers are fused and fed into the reducer for classification.
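
The testing-phase data flow described above (split, classify in each mapper, fuse in the reducer) can be sketched in a few lines. This is a single-process toy, not a real MapReduce job. The helper names `split`, `mapper` and `reducer` are hypothetical, and a simple threshold function stands in for the trained DBN.

```python
from functools import reduce

def split(records, n_parts):
    """Partition the incoming records into n_parts subsets."""
    return [records[i::n_parts] for i in range(n_parts)]

def mapper(subset, model):
    """Classify every record in one subset with the trained model."""
    return [(x, model(x)) for x in subset]

def reducer(fused, mapped):
    """Fuse one mapper's output into a label -> records dictionary."""
    for x, label in mapped:
        fused.setdefault(label, []).append(x)
    return fused

# Hypothetical stand-in for a trained classifier (NOT a DBN)
model = lambda x: "high" if x >= 5 else "low"

records = [1, 7, 3, 7, 9, 2, 5, 1]
mapped = [mapper(part, model) for part in split(records, 3)]
fused = reduce(reducer, mapped, {})
```

In an actual deployment the mappers would run in parallel on separate nodes; the list comprehension here only mimics that structure.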

Findings

The maximum accuracy and Jaccard coefficient are obtained using the epileptic seizure recognition database. The proposed CBF-DBN produces a maximal accuracy value of 91.129%, whereas the accuracy values of the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) are 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN produces a maximal Jaccard coefficient value of 88.928%, whereas the Jaccard coefficient values of the existing NN, DBN and NBC-TFIDF are 75.891%, 79.850% and 81.103%, respectively.
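
To make the two reported measures concrete, the sketch below computes accuracy and the Jaccard coefficient for a binary labelling, taking the Jaccard coefficient as TP / (TP + FP + FN). Whether the paper uses exactly this formulation is an assumption, and the label vectors are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def jaccard(y_true, y_pred, positive=1):
    """Jaccard coefficient of the positive class: TP / (TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp / (tp + fp + fn)

# Hypothetical labels: 2 true positives, 1 false negative, 1 false positive
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc = accuracy(y_true, y_pred)
jac = jaccard(y_true, y_pred)
```

Because true negatives do not enter the Jaccard coefficient, it is typically lower than accuracy on the same predictions, which matches the pattern in the figures quoted above.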

Originality/value

In this paper, a big data classification method is proposed for categorizing massive data sets to meet the constraints of huge data. Classification is performed on the MapReduce framework in training and testing phases so that the data are handled in parallel. In the training phase, the big data is partitioned into different subsets and fed into the mappers, where the feature extraction step extracts the significant features. The obtained features are passed to the reducers, which classify the data using the DBN classifier trained with the proposed CBF algorithm. The trained model is obtained as the output of this phase. In the testing phase, the incremental data are considered for classification. New data are first split into subsets and fed into the mappers, which classify them using the trained models obtained from the training phase. The classified results from each mapper are fused and fed into the reducer for the classification of big data.

Details

Data Technologies and Applications, vol. 55 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 30 July 2020

V. Srilakshmi, K. Anuradha and C. Shoba Bindu

Abstract

Purpose

This paper aims to model a technique that categorizes texts from huge documents. The progression of internet technologies has increased document accessibility, and the documents available online have become countless. Text documents comprise research articles, journal papers, newspapers, technical reports and blogs. These large documents are useful and valuable for processing real-time applications and are used in several retrieval methods. Text classification plays a vital role in information retrieval technologies and is considered an active field for processing massive applications. The aim of text classification is to categorize large documents into different categories on the basis of their contents. Numerous text-related tasks, such as user profiling, sentiment analysis and spam identification, are treated as supervised learning problems and addressed with a text classifier.

Design/methodology/approach

First, the input documents are pre-processed using stop word removal and stemming so that the input is suitable for feature extraction. In the feature extraction process, the features are extracted using the vector space model (VSM), and feature selection then picks the highly relevant features for text categorization. Once the features are selected, text categorization is performed using the deep belief network (DBN). The DBN is trained using the proposed grasshopper crow optimization algorithm (GCOA), which integrates the grasshopper optimization algorithm (GOA) and the crow search algorithm (CSA). Moreover, the hybrid weight bounding model is devised using the proposed GCOA and range degree. Thus, the proposed GCOA + DBN is used for classifying the text documents.
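
The pre-processing and vector space model steps can be illustrated with a small sketch that removes stop words, applies a crude suffix-stripping stemmer and builds term-frequency vectors. The stop word list, the `stem` rule and the sample documents are all assumptions for the example; a real pipeline would use a full stop word list and an established stemmer such as Porter's.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and", "are", "in"}  # tiny illustrative list

def stem(word):
    """Crude suffix stripping, a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, then stem what remains."""
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]

def vectorize(docs):
    """Build a shared vocabulary and one term-frequency vector per document."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted({term for c in counts for term in c})
    return vocab, [[c[term] for term in vocab] for c in counts]

docs = ["the network is training", "networks trained in the cloud"]
vocab, vectors = vectorize(docs)
```

After stemming, "training" and "trained" map to the same term, so both documents share the `network` and `train` dimensions of the vector space.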

Findings

The performance of the proposed technique is evaluated using accuracy, precision and recall and is compared with existing techniques such as naive Bayes, k-nearest neighbors, support vector machine, deep convolutional neural network (DCNN) and Stochastic Gradient-CAViaR + DCNN. The proposed GCOA + DBN shows improved performance, with values of 0.959, 0.959 and 0.96 for precision, recall and accuracy, respectively.

Originality/value

This paper proposes a technique that categorizes texts from massive documents. The findings show that the proposed GCOA-based DBN effectively classifies the text documents.

Details

International Journal of Web Information Systems, vol. 16 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 21 December 2021

Laouni Djafri

Abstract

Purpose

This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.

Design/methodology/approach

In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can later be used for prediction. This knowledge thus becomes a great asset in companies' hands, which is precisely the objective of data mining. With data and knowledge being produced in large amounts and at a faster pace, the field now speaks of Big Data mining. For this reason, the authors' proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The question raised in this work is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML) algorithms. The work is divided into two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm, which in turn depends on a random sampling technique. The architecture is designed specifically to handle big data processing and operates coherently and efficiently with the sampling strategy proposed in this work. It also helps the authors verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method. This sampling method is also applied to extract the shared learning base (SLB), the partial learning base for the first level (PLBL1) and the partial learning base for the second level (PLBL2).
The experimental results show the efficiency of the proposed solution, with no significant loss in classification results. In practical terms, the DDPML system is dedicated to big data mining and works effectively in distributed systems with a simple structure, such as client-server networks.
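
Stratified random sampling, which the abstract uses to extract its learning bases, draws the same fraction from every class so that the sample keeps the class proportions of the full data. The sketch below shows a single sampling level only; the function name, the `fraction` parameter and the toy records are assumptions, and the paper's two-level scheme is not reproduced.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_of, fraction, seed=0):
    """Draw `fraction` of the records from each class (stratum) at random."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # keep at least one per class
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data set: 80 records of class "a", 20 of class "b"
records = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(records, label_of=lambda r: r[0], fraction=0.1)
```

The 80:20 class ratio of the full set carries over to the 10-record sample, which is the property that lets a small learning base stand in for the whole data set.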

Findings

The authors obtained very satisfactory classification results.

Originality/value

The DDPML system is specially designed to handle big data mining classification smoothly.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 25 January 2021

Jiake Fu, Huijing Tian, Lingguang Song, Mingchao Li, Shuo Bai and Qiubing Ren

Abstract

Purpose

This paper presents a new approach to estimating the productivity of cutter suction dredger operations through data mining and learning from real-time big data.

Design/methodology/approach

The paper used big data, data mining and machine learning techniques to extract features of cutter suction dredgers (CSD) for predicting their productivity. The ElasticNet-SVR (Elastic Net-Support Vector Regression) method is used to filter the original monitoring data. Taking the actual working conditions of CSD into account, 15 features were selected. Then, a box plot was used to clean the corresponding data by filtering out outliers. Finally, four algorithms, namely SVR (Support Vector Regression), XGBoost (Extreme Gradient Boosting), LSTM (Long Short-Term Memory network) and BP (Back Propagation) Neural Network, were used for modeling and testing.
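
Box-plot cleaning usually means discarding values that fall outside the whiskers, conventionally 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule. The 1.5 factor and the sample readings are assumptions, since the abstract does not state the exact rule used.

```python
import statistics

def remove_outliers(values, whisker=1.5):
    """Keep only values inside the box-plot whiskers (quartiles +/- whisker * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical sensor readings with one obvious outlier
cleaned = remove_outliers([1, 2, 3, 4, 5, 100])
```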

Findings

The paper provided a comprehensive forecasting framework for productivity estimation, including feature selection, data processing and model evaluation. The optimal coefficients of determination (R2) of the four algorithms were all above 80.0%, indicating that the selected features were representative. Finally, the BP neural network model coupled with the SVR model was selected as the final model.
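
The coefficient of determination compares a model's squared error with the squared error of always predicting the mean. A minimal sketch with invented values follows.

```python
def r_squared(y_true, y_pred):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted productivity values
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_true, y_pred)
```

An R2 of 1 means the predictions match the observations exactly, while 0 means the model does no better than predicting the mean.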

Originality/value

A machine-learning algorithm incorporating domain expert judgments was used to select predictive features. The final optimal coefficient of determination (R2) of the coupled BP neural network and SVR model is 87.6%, indicating that the method proposed in this paper is effective for CSD productivity estimation.

Details

Engineering, Construction and Architectural Management, vol. 28 no. 7
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 21 July 2020

Arshey M. and Angel Viji K. S.

Abstract

Purpose

Phishing is a serious cybersecurity problem that is widely carried out through media such as e-mail and Short Message Service (SMS) to collect individuals' personal information. The rapid growth of unsolicited and unwanted information needs to be addressed, which raises the need for effective anti-phishing methods.

Design/methodology/approach

The primary intention of this research is to design and develop an approach for preventing phishing by proposing an optimization algorithm. The proposed approach involves four steps for dealing with phishing e-mails, namely preprocessing, feature extraction, feature selection and classification. Initially, the input data set is subjected to preprocessing, which removes stop words and applies stemming, and the preprocessed output is given to the feature extraction process. Keyword frequencies are extracted from the preprocessed data, and the important words are selected as features. Then, the feature selection process is carried out using the Bhattacharyya distance so that only the significant features that can aid the classification are selected. Using the selected features, classification is done by the deep belief network (DBN), which is trained using the proposed fractional-earthworm optimization algorithm (EWA). The proposed fractional-EWA integrates EWA and fractional calculus to determine the weights in the DBN optimally.
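
The Bhattacharyya distance measures how far apart two probability distributions are, which is one way to judge whether a feature separates two classes. The sketch below covers the discrete case only. How the paper applies the distance to keyword features is not specified in the abstract, so the histograms are purely illustrative.

```python
import math

def bhattacharyya_distance(p, q):
    """Distance between two discrete distributions over the same bins:
    D = -ln( sum_i sqrt(p_i * q_i) )."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient)

# Hypothetical normalized histograms of one feature in two classes
d_same = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])
d_diff = bhattacharyya_distance([0.9, 0.1], [0.1, 0.9])
```

Identical distributions give a distance of zero, and the distance grows as the two distributions overlap less, so a larger value suggests a more discriminative feature.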

Findings

The accuracy of the methods, naive Bayes (NB), DBN, neural network (NN), EWA-DBN and fractional EWA-DBN is 0.5333, 0.5455, 0.5556, 0.5714 and 0.8571, respectively. The sensitivity of the methods, NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.4558, 0.5631, 0.7035, 0.7045 and 0.8182, respectively. Likewise, the specificity of the methods, NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.5052, 0.5631, 0.7028, 0.7040 and 0.8800, respectively. It is clear from the comparative table that the proposed method acquired the maximal accuracy, sensitivity and specificity compared with the existing methods.

Originality/value

In this paper, e-mail phishing detection is performed using optimization-based deep learning networks. E-mails include a number of unwanted messages that need to be detected to avoid storage issues. The importance of the method is that including historical data in the detection process enhances detection accuracy.

Details

Data Technologies and Applications, vol. 54 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 11 October 2022

Chuanzhi Sun, Yin Chu Wang, Qing Lu, Yongmeng Liu and Jiubin Tan

Abstract

Purpose

The transmission mechanism of assembly error in multi-stage rotors with a saddle surface type is not clear. To address this problem, the paper proposes a deep belief network to predict the coaxiality and perpendicularity of the multi-stage rotor.

Design/methodology/approach

First, the surface type of the aero-engine rotor is classified. The rotor surface profile sampling data is converted into image structure data, and a rotor surface type classifier based on convolutional neural network is established. Then, for the saddle surface rotor, a prediction model of coaxiality and perpendicularity based on deep belief network is established. To verify the effectiveness of the coaxiality and perpendicularity prediction method proposed in this paper, a multi-stage rotor coaxiality and perpendicularity assembly measurement experiment is carried out.

Findings

The results of this paper show that the accuracy of surface type classification using the convolutional neural network is 99%, which meets the requirements of the subsequent assembly process. For the 80 sets of test samples, the average errors of the coaxiality and perpendicularity of the deep belief network prediction method are 0.1 and 1.6 µm, respectively.

Originality/value

The method proposed in this paper can be used not only for rotor surface classification but also to guide the assembly of aero-engine multi-stage rotors.

Details

Assembly Automation, vol. 42 no. 6
Type: Research Article
ISSN: 0144-5154

Article
Publication date: 9 November 2021

Shilpa B L and Shambhavi B R

Abstract

Purpose

Stock market forecasters are focused on creating a positive approach for predicting stock prices. The fundamental principle of effective stock market prediction is not only to produce the maximum outcomes but also to reduce unreliable stock price estimates. In the stock market, sentiment analysis enables people to make educated decisions about investing in a business. Moreover, stock analysis identifies the business of an organization or a company. Predicting stock prices is complex because of the market's highly volatile nature, which varies with a large range of factors, including investor sentiment, economic and political factors and changes in leadership. Prediction often becomes ineffective when only historical data or textual information is considered. Attempts are therefore made to make the prediction more precise by combining news sentiment with stock price information.

Design/methodology/approach

This paper introduces a prediction framework via sentiment analysis in which both stock data and news sentiment data are considered. From the stock data, technical indicator-based features such as moving average convergence divergence (MACD), relative strength index (RSI) and moving average (MA) are extracted. At the same time, the news data are processed to determine sentiment through the following steps: (1) pre-processing, where keyword extraction and sentiment categorization take place; (2) keyword extraction, where WordNet and sentiment categorization are applied; (3) feature extraction, where the proposed holoentropy-based features are extracted; and (4) classification, where a deep neural network returns the sentiment output. To make sentiment prediction more accurate, the NN is trained by a self-improved whale optimization algorithm (SIWOA). Finally, an optimized deep belief network (DBN) predicts the stock, taking into account the features of the stock data and the sentiment results from the news data. Here, the weights of the DBN are tuned by the new SIWOA.
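
Two of the technical indicators named above, the moving average and MACD, can be computed directly from a price series as sketched below. The 12- and 26-period spans are the conventional MACD defaults rather than values taken from the paper, the EMA is seeded with the first price (one of several common conventions), and the price series are invented. RSI is omitted because its smoothing has several variants.

```python
def sma(prices, window):
    """Simple moving average over a sliding window."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

def ema(prices, span):
    """Exponential moving average, seeded with the first price."""
    alpha = 2 / (span + 1)
    out = [prices[0]]
    for price in prices[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out

def macd_line(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA (conventional default spans)."""
    return [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]

# Hypothetical price series
moving_average = sma([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
flat_macd = macd_line([10.0] * 30)
rising_macd = macd_line([float(p) for p in range(40)])
```

A flat series yields a MACD of zero, while a steadily rising series yields a positive MACD because the fast average tracks the latest prices more closely than the slow one.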

Findings

The performance of the adopted scheme is compared with that of existing models in terms of certain measures. The stock dataset covers two companies, Reliance Communications and Relaxo Footwear. Each company has three datasets: (a) a daily option, with start day 1-1-2019 and end day 1-12-2020; (b) a monthly option, with start Jan 2000 and end Dec 2020; and (c) a yearly option, set to year 2000. The adopted NN + DBN + SIWOA model was compared with traditional classifiers such as LSTM, NN + RF, NN + MLP and NN + SVM, and with existing optimization algorithms such as NN + DBN + MFO, NN + DBN + CSA, NN + DBN + WOA and NN + DBN + PSO. Performance was calculated at learning percentages of 60, 70, 80 and 90 in terms of measures such as MAE, MSE and RMSE for the six datasets. According to the graph, the MAE of the adopted NN + DBN + SIWOA model was 91.67, 80, 91.11 and 93.33% superior to that of the existing classifiers LSTM, NN + RF, NN + MLP and NN + SVM, respectively, for dataset 1. The proposed NN + DBN + SIWOA method holds the minimum MAE value (∼0.21) at learning percentage 80 for dataset 1, whereas the traditional models hold values of ∼1.20 for NN + DBN + CSA, ∼1.21 for NN + DBN + MFO, ∼0.23 for NN + DBN + PSO and ∼0.25 for NN + DBN + WOA. According to the table, the RMSRE of the proposed NN + DBN + SIWOA model was 3.14, 1.08, 1.38 and 15.28% better than that of the existing classifiers LSTM, NN + RF, NN + MLP and NN + SVM, respectively, for dataset 6. In addition, the MSE of the adopted NN + DBN + SIWOA method attains a lower value (∼54944.41) for dataset 2 than other existing schemes such as NN + DBN + CSA (∼9.43), NN + DBN + MFO (∼56728.68), NN + DBN + PSO (∼2.95) and NN + DBN + WOA (∼56767.88).
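
For readers unfamiliar with the error measures quoted above, the sketch below computes MAE, MSE and RMSE for a small invented series. It does not reproduce the paper's figures or its method of deriving percentage improvements.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical actual and predicted prices (errors of 1, 0 and 2)
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
errors = (mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

MSE is expressed in squared units of the target, so for large price values it can be orders of magnitude larger than MAE on the same data.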

Originality/value

This paper introduces a prediction framework via sentiment analysis in which both stock data and news sentiment data are considered. From the stock data, technical indicator-based features such as MACD, RSI and MA are extracted. The proposed work is therefore considered well suited to stock market prediction.

Details

Kybernetes, vol. 52 no. 3
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 26 July 2019

Ayalapogu Ratna Raju, Suresh Pabboju and Ramisetty Rajeswara Rao

Abstract

Purpose

Brain tumor segmentation and classification is an area of interest for differentiating tumorous from non-tumorous cells in the brain and for classifying the tumorous cells to identify their level. The methods developed so far lack automatic classification and take considerable time to classify. In this work, a novel brain tumor classification approach, namely, the harmony cuckoo search-based deep belief network (HCS-DBN), is proposed. The images in the database are segmented using the newly developed hybrid active contour (HAC) segmentation model, which integrates Bayesian fuzzy clustering (BFC) and the active contour model. The proposed HCS-DBN algorithm is trained with the features obtained from the segmented images. Finally, the classifier provides information about the tumor class in each slice available in the database. The proposed HAC and HCS-DBN algorithm are tested on the MRI images available in the BRATS database, and the results are observed. The simulation results show that the proposed HAC and HCS-DBN algorithm have better overall performance, with values of 0.945, 0.9695 and 0.99348 for accuracy, sensitivity and specificity, respectively.

Design/methodology/approach

The proposed HAC segmentation approach integrates the properties of the AC model and BFC. Initially, the brain image with different modalities is subjected to segmentation with the BFC and AC models. Then, the Laplacian correction is applied to fuse the segmented outputs from each model. Finally, the proposed HAC segmentation provides the error-free segments of the brain tumor regions prevailing in the MRI image. The next step is to extract the useful features, based on scattering transform, wavelet transform and local Gabor binary pattern, from the segmented brain image. Finally, the extracted features from each segment are provided to the DBN for the training, and the HCS algorithm chooses the optimal weights for DBN training.

Findings

The proposed HAC with the HCS-DBN algorithm is analyzed on the standard BRATS database, and its performance is evaluated using metrics such as accuracy, sensitivity and specificity. The simulation results are compared against existing works such as k-NN, NN, multi-SVM and multi-SVNN. The results achieved by the proposed HAC with the HCS-DBN algorithm are higher than those of the existing works, with values of 0.945, 0.9695 and 0.99348 for accuracy, sensitivity and specificity, respectively.

Originality/value

This work presents the brain tumor segmentation and the classification scheme by introducing the HAC-based segmentation model. The proposed HAC model combines the BFC and the active contour model through a fusion process, using the Laplacian correction probability for segmenting the slices in the database.

Details

Sensor Review, vol. 39 no. 4
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 24 September 2019

Qinghua Liu, Lu Sun, Alain Kornhauser, Jiahui Sun and Nick Sangwa

Abstract

Purpose

To classify different pavements, this paper presents a road roughness acquisition system design and an improved restricted Boltzmann machine deep neural network algorithm, based on the Adaboost Backward Propagation algorithm, for road roughness detection. The developed measurement system, including hardware designs and software algorithms, constitutes an independent system that is low-cost, small and convenient to install.

Design/methodology/approach

The inputs of the restricted Boltzmann machine deep neural network are the vehicle vertical acceleration power spectrum and the pitch acceleration power spectrum, which are calculated using ADAMS finite element software. The Adaboost Backward Propagation algorithm is used in each restricted Boltzmann machine deep neural network classification model for fine-tuning, given its global search performance. The algorithm is first applied to road spectrum detection, and experiments indicate that it is suitable for detecting pavement roughness.
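
Restricted Boltzmann machines are commonly pre-trained with contrastive divergence. The sketch below performs a single CD-1 update on a tiny binary RBM to show the mechanics. It is a generic textbook-style step, not the paper's improved algorithm or its Adaboost Backward Propagation fine-tuning, and the layer sizes, learning rate and input vector are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_probs(inputs, weight_rows, biases):
    """Activation probabilities of one layer given the other.
    weight_rows[k] holds the weights from every input unit to output unit k."""
    return [
        sigmoid(b + sum(x * w for x, w in zip(inputs, row)))
        for row, b in zip(weight_rows, biases)
    ]

def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=None):
    """One contrastive divergence (CD-1) update. W is n_visible x n_hidden."""
    rng = rng or random.Random(0)
    W_T = [list(col) for col in zip(*W)]             # n_hidden x n_visible
    h0 = layer_probs(v0, W_T, b_hid)                 # positive phase
    h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
    v1 = layer_probs(h0_sample, W, b_vis)            # reconstruction
    h1 = layer_probs(v1, W_T, b_hid)                 # negative phase
    new_W = [
        [W[i][j] + lr * (v0[i] * h0[j] - v1[i] * h1[j]) for j in range(len(b_hid))]
        for i in range(len(b_vis))
    ]
    new_b_vis = [b + lr * (a - r) for b, a, r in zip(b_vis, v0, v1)]
    new_b_hid = [b + lr * (p - n) for b, p, n in zip(b_hid, h0, h1)]
    return new_W, new_b_vis, new_b_hid, v1

# Hypothetical 4-visible, 2-hidden RBM and one binary training vector
rng = random.Random(42)
W = [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(4)]
v0 = [1.0, 0.0, 1.0, 1.0]
W, b_vis, b_hid, reconstruction = cd1_step(v0, W, [0.0] * 4, [0.0] * 2, rng=rng)
```

Repeating this step over many training vectors gives the unsupervised pre-training stage; a supervised fine-tuning pass would then follow.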

Findings

The detection rate of the RBM deep neural network algorithm based on Adaboost Backward Propagation is up to 96 per cent, and the false positive rate is below 3.34 per cent. Both indices are better than those of the other supervised algorithms. The algorithm also performs better in extracting the intrinsic characteristics of data and therefore improves classification accuracy and classification quality. Additionally, the classification performance is optimized. The experimental results show that the algorithm can improve the performance of restricted Boltzmann machine deep neural networks. The system can be used for detecting pavement roughness.

Originality/value

This paper presents an improved restricted Boltzmann machine deep neural network algorithm based on Adaboost Backward Propagation for identifying the road roughness. Through the restricted Boltzmann machine, it completes pre-training and initializing sample weights. The entire neural network is fine-tuned through the Adaboost Backward Propagation algorithm, verifying the validity of the algorithm on the MNIST data set. A quarter vehicle model is used as the foundation, and the vertical acceleration spectrum of the vehicle center of mass and pitch acceleration spectrum were obtained by simulation in ADAMS as the input samples. The experimental results show that the improved algorithm has better optimization ability, improves the detection rate and can detect the road roughness more effectively.
