Search results


Article

Publication date: 23 September 2020

Steel strip surface inspection through the combination of feature selection and multiclass classifiers

Z.F. Zhang, Wei Liu, Egon Ostrosi, Yongjie Tian and Jianping Yi


Abstract

Purpose

During the production of steel strip, defects may appear on the surface, and traditional manual inspection cannot meet the requirements of low-cost, high-efficiency production. The purpose of this paper is to propose a feature selection approach that combines filter methods with a hidden Bayesian classifier to improve the efficiency of defect recognition and reduce computational complexity. The method can select the optimal hybrid model for accurate classification of steel strip surface defects.

Design/methodology/approach

A large image feature set was initially obtained using discrete wavelet transform feature extraction. Three feature selection methods (correlation-based feature selection, consistency subset evaluator [CSE] and information gain) were then used to optimize the feature space. Parameters for the feature selection methods were chosen according to the classification accuracy of the hidden naive Bayes (HNB) algorithm. The selected feature subset was then applied to the traditional NB classifier and leading extended NB classifiers.
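As a rough illustration of the filter step described above, the sketch below ranks discrete features by information gain and keeps the top k. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `select_top_k`) and the toy data are assumptions made for this example, and real wavelet features would first need to be discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the highest information gain."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels)
             for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
rows = [(0, 1), (0, 0), (1, 1), (1, 0)]
labels = ["ok", "ok", "defect", "defect"]
print(select_top_k(rows, labels, k=1))
```

The selected indices would then be used to project the data before training a classifier such as NB or HNB.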

Findings

The experimental results demonstrated that the HNB model combined with feature selection approaches classifies defects better than the other defect recognition models. Among the models studied, the proposed CSE + HNB hybrid is the most robust and effective, and achieves the highest classification accuracy when identifying the optimal subset of the surface defect database.

Originality/value

The main contribution of this paper is the development of a hybrid model combining feature selection and multi-class classification algorithms for steel strip surface inspection. The proposed hybrid model is robust and effective for this task.

Details

Engineering Computations, vol. 38 no. 4

Type: Research Article

ISSN: 0264-4401

Article

Publication date: 10 January 2023

Predictive model for admission uncertainty in high education using Naïve Bayes classifier

Atul Rawal and Bechoo Lal


Abstract

Purpose

Uncertainty about admission into universities/institutions is a global problem in academia. Students with good marks and strong credentials still cannot be sure of being admitted. In this research study, the researcher builds a predictive model using Naïve Bayes classifiers, a machine learning algorithm, to extract and analyze hidden patterns in students’ academic records and credentials. The main purpose of the study is to reduce the uncertainty of admission into universities/institutions based on students’ previous credentials and some other essential parameters.

Design/methodology/approach

This research study combines Naïve Bayes classification with kernel density estimation (KDE) to predict students’ admission into universities or other higher institutions. The researcher collected data from Kaggle data sets covering grade point average (GPA), graduate record examinations (GRE) and university RANK, which are essential for admission into higher education.
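One way to picture the combination of Naïve Bayes and KDE is to replace the usual parametric class-conditional likelihood of each feature with a kernel density estimate. The sketch below is a minimal version under stated assumptions: a Gaussian kernel with a fixed, hand-picked bandwidth, invented toy records, and hypothetical function names. It does not reproduce the study's model or data.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                   for s in samples) / norm

    return density


def fit_kde_naive_bayes(rows, labels, bandwidth=1.0):
    """Fit a class prior and one KDE per (class, feature) pair."""
    model = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        prior = len(cls_rows) / len(rows)
        densities = [gaussian_kde([r[j] for r in cls_rows], bandwidth)
                     for j in range(len(rows[0]))]
        model[cls] = (prior, densities)
    return model


def predict(model, row):
    """Pick the class with the highest log posterior (up to a constant)."""
    def score(cls):
        prior, densities = model[cls]
        # A tiny floor avoids log(0) when a point is far from every sample.
        return math.log(prior) + sum(math.log(max(d(x), 1e-300))
                                     for d, x in zip(densities, row))

    return max(model, key=score)


# Hypothetical toy records: (GPA, GRE score divided by 100).
rows = [(3.9, 3.3), (3.7, 3.2), (2.8, 2.9), (2.5, 3.0)]
labels = ["admitted", "admitted", "not admitted", "not admitted"]
model = fit_kde_naive_bayes(rows, labels, bandwidth=0.3)
print(predict(model, (3.8, 3.25)))
```

In practice the bandwidth strongly affects the result and would normally be chosen by cross-validation or a rule of thumb rather than fixed by hand.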

Findings

The classification model is built on a training data set of students’ examination scores (GPA, GRE and RANK) and some other essential features; it predicts admission with an experimentally verified accuracy of 72%. To improve accuracy, the researcher applied the Shapiro–Wilk normality test and a Gaussian distribution on large data sets.

Research limitations/implications

The developed predictive model does not apply to admission into all courses. It relies on only three admission attributes (GRE, GPA and RANK), which do not determine admission into every possible course; it is applicable only to students’ admission into universities/institutions.

Practical implications

The researcher used Naïve Bayes classifiers and KDE to develop a predictive model that classifies the admission category (Admitted/Not Admitted) more reliably and efficiently. During the study, the researcher found that the accuracy of predictive Model 1 and that of predictive Model 2 are very close to each other, with Model 1 achieving correct and incorrect prediction rates of 70.46% and 29.53%, respectively.

Social implications

The study makes a significant contribution to society: students and parents can obtain prior information about the chances of admission into higher academic institutions and universities.

Originality/value

The classification model can reduce admission uncertainty and enhance universities’ decision-making capabilities. The significance of this research study is that it reduces human intervention in decisions about students’ admission into universities or other higher academic institutions, and it demonstrates that many universities and higher-level institutions could use this predictive model to improve their admission process without human intervention.

Details

Journal of Indian Business Research, vol. 15 no. 2

Type: Research Article

ISSN: 1755-4195

Article

Publication date: 3 November 2020

Hate speech detection in Twitter using hybrid embeddings and improved cuckoo search-based neural networks

Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu and Idowu Ademola Osinuga


Abstract

Purpose

Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection using social media data has received particular research attention in recent studies; hence the need to design a generic metadata architecture and an efficient feature extraction technique to enhance hate speech detection.

Design/methodology/approach

This study proposes hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data. The proposed method uses a hybrid embeddings technique that includes Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction and Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The features extracted from the hybrid embeddings then serve as input to the improved cuckoo search neural network, which predicts whether a tweet is hate speech, offensive language or neither.
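The word-level TF-IDF weighting mentioned above can be sketched in a few lines. This is only one common variant (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency); the abstract does not say which variant the authors used, and the function name and toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return weights


# Hypothetical toy corpus of two tokenized tweets.
docs = [["good", "day"], ["bad", "day"]]
w = tf_idf(docs)
print(w[0])
```

A term that appears in every document, such as "day" here, receives weight zero, which is why TF-IDF emphasizes the words that distinguish one tweet from another.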

Findings

The proposed method showed better results than other related methods when tested on the collected Twitter datasets. To validate its performance, a t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with those of other related methods for hate speech detection. A paired-sample t-test was also conducted for the same purpose.

Research limitations/implications

Finally, the evaluation results showed that the proposed method outperforms other related methods with mean F1-score of 91.3.

Originality/value

The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4

Type: Research Article

ISSN: 1756-378X

Article

Publication date: 4 October 2021

On the potential of a graph attention network in money laundering detection

Guang-Yih Sheu and Chang-Yu Li


Abstract

Purpose

In a classroom, a support vector machines model with a linear kernel, a neural network and the k-nearest neighbors algorithm failed to detect simulated money laundering accounts generated from the Panama papers data set of the offshore leak database. This study aims to resolve this failure.

Design/methodology/approach

The study builds a graph attention network with three modules as a new money laundering detection tool. A feature extraction module encodes the input data to create a weighted graph structure in which directed edges and their end vertices denote financial transactions. Each directed edge has weights that store the frequency of money transactions and other significant features. Social network metrics serve as node features that characterize an account’s role in a money laundering typology. A graph attention module implements a self-attention mechanism for highlighting target nodes. A classification module further filters out such targets using the biased rectified linear unit function.
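To make the self-attention step more concrete, the sketch below implements one single-head attention pass in the style commonly used for graph attention networks: each node scores its neighbors, normalizes the scores with a softmax, and aggregates the neighbors' transformed features. It is a generic illustration, not the three-module network from the paper; the weight matrix `W`, the attention vector `a`, the LeakyReLU slope and the toy graph are all assumptions.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_layer(features, neighbors, W, a):
    """One single-head graph attention step.

    features:  {node: feature vector}
    neighbors: {node: list of nodes it attends to (may include itself)}
    W:         weight matrix applied to every node's features
    a:         attention vector of length 2 * len(W)
    """
    z = {v: matvec(W, h) for v, h in features.items()}
    out, alphas = {}, {}
    for i, nbrs in neighbors.items():
        # Unnormalized score for each neighbor j: LeakyReLU(a . [z_i || z_j]).
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
                  for j in nbrs]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        alphas[i] = dict(zip(nbrs, alpha))
        # Attention-weighted sum of the neighbors' transformed features.
        out[i] = [sum(al * z[j][k] for al, j in zip(alpha, nbrs))
                  for k in range(len(W))]
    return out, alphas


# Hypothetical two-account graph with one scalar feature per account.
features = {"A": [1.0], "B": [3.0]}
neighbors = {"A": ["A", "B"], "B": ["B"]}
out, alphas = attention_layer(features, neighbors, W=[[1.0]], a=[0.0, 1.0])
print(alphas["A"])
```

With `a = [0.0, 1.0]` the score depends only on the neighbor's feature, so account A attends more strongly to B than to itself; in a trained network both `W` and `a` would be learned.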

Findings

Because it highlights nodes using a self-attention mechanism, the proposed graph attention network outperforms a Naïve Bayes classifier, the random forest method and a support vector machines model with a radial kernel in detecting money laundering accounts. The Naïve Bayes classifier produces the second most accurate classifications.

Originality/value

This paper develops a new money laundering detection tool that outperforms existing methods. The tool detects money laundering more accurately, improves warnings of money laundering accounts or links and processes financial transaction records efficiently regardless of their volume.

Details

Journal of Money Laundering Control, vol. 25 no. 3

Type: Research Article

ISSN: 1368-5201

Open Access

Article

Publication date: 9 May 2022

Email classification analysis using machine learning techniques

Khalid Iqbal and Muhammad Shehrayar Khan


Abstract

Purpose

In this digital era, email is the most pervasive form of communication between people. Many users become victims of spam emails, and their data are exposed.

Design/methodology/approach

Researchers have addressed this problem by focusing on advanced machine learning algorithms and improved models for detecting spam emails, but there is still a gap regarding features, which also play an important role in achieving good results. The performance of the applied classifiers is evaluated using 10-fold cross-validation.

Findings

The results confirm that spam emails are correctly classified with an accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network, higher than the other machine learning classifiers applied.

Originality/value

In this paper, the point-biserial correlation between each feature and the class label of the University of California Irvine (UCI) spambase email dataset is used to select the best features. Extensive experiments are conducted on the selected features by training the different classifiers.
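Point-biserial correlation measures how strongly a numeric feature is associated with a binary label, so it can act as a simple filter for feature selection. The sketch below computes it from the standard formula and keeps features whose absolute correlation meets a threshold. The threshold, the function names and the toy rows are assumptions for illustration; the paper's selection criterion is not given in the abstract.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between a numeric feature and a 0/1 label."""
    n = len(values)
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std
    if std == 0 or not group1 or not group0:
        # Assumption: a constant feature or single-class label is uninformative.
        return 0.0
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / std * math.sqrt(len(group1) * len(group0) / n ** 2)


def select_features(rows, labels, threshold):
    """Keep the indices of features whose |r_pb| meets the threshold."""
    return [j for j in range(len(rows[0]))
            if abs(point_biserial([r[j] for r in rows], labels)) >= threshold]


# Hypothetical toy data: feature 0 tracks the spam label, feature 1 does not.
rows = [(0, 1), (0, 2), (1, 1), (1, 2)]
labels = [0, 0, 1, 1]
print(select_features(rows, labels, threshold=0.5))
```

The value returned by `point_biserial` equals the Pearson correlation between the feature and the 0/1 label, which is a convenient way to cross-check the computation.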

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Article

Publication date: 1 May 2007

Machine learning for Asian language text classification

Fuchun Peng and Xiangji Huang


Abstract

Purpose

The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task.

Design/methodology/approach

Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and were applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation‐based approach was compared with the non‐segmentation‐based approach.
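A language modeling classifier that needs no word segmentation can be sketched with character n-grams: train one model per class, then assign a text to the class whose model gives it the highest likelihood. The version below uses character bigrams with add-one smoothing and uniform class priors; these choices, the function names and the tiny Chinese examples are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def train(texts_by_class):
    """Count character bigrams per class; return models and smoothing vocab size."""
    vocab = {ch for texts in texts_by_class.values() for t in texts for ch in t}
    models = {}
    for cls, texts in texts_by_class.items():
        bigrams, contexts = Counter(), Counter()
        for t in texts:
            padded = "^" + t  # "^" marks the start of a text
            bigrams.update(zip(padded, padded[1:]))
            contexts.update(padded[:-1])
        models[cls] = (bigrams, contexts)
    return models, len(vocab) + 1  # +1 reserves mass for unseen characters


def log_prob(text, model, vocab_size):
    """Add-one smoothed log likelihood of text under one class model."""
    bigrams, contexts = model
    padded = "^" + text
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
               for a, b in zip(padded, padded[1:]))


def classify(text, models, vocab_size):
    """Return the class whose language model scores the text highest."""
    return max(models, key=lambda cls: log_prob(text, models[cls], vocab_size))


# Hypothetical toy training data (unsegmented Chinese text).
data = {
    "sports": ["足球比赛", "篮球比赛"],
    "finance": ["股票市场", "债券市场"],
}
models, vocab_size = train(data)
print(classify("足球", models, vocab_size))
```

Because the model works directly on characters, it sidesteps the word segmentation step; a segmentation-based variant would apply the same idea to word tokens instead.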

Findings

There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy.

Practical implications

Applying the findings to real web text classification is ongoing work.

Originality/value

The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.

Details

Journal of Documentation, vol. 63 no. 3

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 14 January 2014

A system of human vital signs monitoring and activity recognition based on body sensor network

Zhelong Wang, Cong Zhao and Sen Qiu


Abstract

Purpose

The purpose of this paper is to develop a health monitoring system that can measure human vital signs and recognize human activity based on body sensor network (BSN).

Design/methodology/approach

The system mainly comprises an electrocardiogram (ECG) signal collection node, a blood oxygen signal collection node, an inertial sensor node, a receiving node and upper computer software. The three collection nodes collect ECG signals, blood oxygen signals and motion signals. The collected signals are then transmitted wirelessly to the receiving node and analyzed in real time by software on the upper computer.

Findings

Experimental results show that the system can simultaneously monitor human ECG, heart rate, pulse rate and SpO₂, and recognize human activity. A classifier based on the coupled hidden Markov model (CHMM) is adopted to recognize human activity. The average recognition accuracy of the CHMM classifier is 94.8 percent, which is higher than that of some existing methods, such as the support vector machine (SVM), C4.5 decision tree and naive Bayes classifier (NBC).

Practical implications

The monitoring system may be used for fall detection, elderly care, postoperative care, rehabilitation training, sports training and other fields in the future.

Originality/value

First, the system can measure human vital signs (ECG, blood pressure, pulse rate, SpO₂, temperature, heart rate) and recognize some specific simple or complex activities (sitting, lying, boating, bicycle riding). Second, very few studies have used CHMM for activity recognition based on BSN. Consequently, this paper adopts a CHMM-based classifier, which recognizes activity with ideal accuracy.

Details

Sensor Review, vol. 34 no. 1

Type: Research Article

ISSN: 0260-2288

Article

Publication date: 9 March 2022

Deep learning based loan eligibility prediction with Social Border Collie Optimization

G.L. Infant Cyril and J.P. Ananth


Abstract

Purpose

Banks are an essential part of the market economy. The failure or success of an institution relies on the industry’s ability to compute credit risk. A loan eligibility prediction model uses analysis methods that draw on past and current information about credit users to make predictions. However, precise loan prediction with risk and assessment analysis remains a major challenge in loan eligibility prediction.

Design/methodology/approach

The aim of this research is to present a new method, namely a Social Border Collie Optimization (SBCO)-based deep neuro fuzzy network for loan eligibility prediction. In this method, the Box-Cox transformation is applied to the input loan data to make them suitable for further processing. Wrapper-based feature selection then chooses suitable features from the transformed data to boost the performance of loan eligibility calculation. Once the features are chosen, naive Bayes (NB) is adapted for feature fusion. In NB training, the classifier builds a probability index table from the input data features and group values. The NB classifier is tested using the posterior probability ratio, considering the conditional probability of the normalization constant with class evidence. Finally, loan eligibility is predicted by the deep neuro fuzzy network, which is trained with the designed SBCO. The SBCO is devised by combining the social ski driver (SSD) algorithm and Border Collie Optimization (BCO) to produce the most precise result.
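The Box-Cox transformation used in the preprocessing step has a simple closed form: for a strictly positive value x and parameter λ, it returns (x^λ − 1)/λ when λ ≠ 0 and ln x when λ = 0. The sketch below applies it with a fixed λ; how the authors chose λ is not stated in the abstract, so the value used here and the sample loan amounts are assumptions.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox power transformation with parameter lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical right-skewed loan amounts, transformed with lam = 0.5.
amounts = [1.0, 4.0, 9.0, 100.0]
print(box_cox(amounts, 0.5))
```

In practice λ is usually estimated from the data, for example by maximizing the log-likelihood of the transformed values under a normal model.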

Findings

The analysis uses accuracy, sensitivity and specificity as evaluation metrics. The designed method achieves the highest accuracy of 95%, with sensitivity and specificity of 95.4% and 97.3%, respectively, compared with existing methods such as fuzzy neural network (Fuzzy NN), multiple partial least squares regression model (Multi_PLS), instance-based entropy fuzzy support vector machine (IEFSVM), deep recurrent neural network (Deep RNN) and whale social optimization algorithm-based deep RNN (WSOA-based Deep RNN).
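For reference, the three metrics reported above are all derived from the binary confusion matrix: accuracy is the share of correct predictions, sensitivity is the share of actual positives that are detected, and specificity is the share of actual negatives that are correctly rejected. The helper below is a minimal sketch; the function name and the sample labels are assumptions and do not come from the paper's experiments.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Undefined when a class is absent, so report NaN in that case.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels: 1 = eligible, 0 = not eligible.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```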

Originality/value

This paper devises an SBCO-based deep neuro fuzzy network for predicting loan eligibility. The deep neuro fuzzy network is trained with the proposed SBCO, which combines the SSD and BCO algorithms to produce the most precise result for loan eligibility prediction.

Details

Kybernetes, vol. 52 no. 8

Type: Research Article

ISSN: 0368-492X

Open Access

Article

Publication date: 3 July 2017

On predicting academic performance with process mining in learning analytics

Rahila Umer, Teo Susnjak, Anuradha Mathrani and Suriadi Suriadi


Abstract

Purpose

The purpose of this paper is to propose a process mining approach to help in making early predictions to improve students’ learning experience in massive open online courses (MOOCs). It investigates the impact of various machine learning techniques in combination with process mining features to measure effectiveness of these techniques.

Design/methodology/approach

Student data (e.g. assessment grades, demographic information) and weekly interaction data based on event logs (e.g. video lecture interaction, solution submission time, time spent weekly) have guided this design. This study evaluates four machine learning classification techniques used in the literature (logistic regression (LR), Naïve Bayes (NB), random forest (RF) and K-nearest neighbor) to monitor the weekly progression of students’ performance and to predict their overall performance outcome. Two data sets have been used: one with traditional features and a second with features obtained from process conformance testing.

Findings

The results show that the techniques used in the study are able to predict students’ performance. The overall accuracy (F1-score, area under curve) of the machine learning techniques can be improved by integrating process mining features with standard features. Specifically, the LR and NB classifiers outperform the other techniques in a statistically significant way.

Practical implications

Although MOOCs provide a platform for learning in a highly scalable and flexible manner, they are prone to early dropout and low completion rates. This study outlines a data-driven approach to improve students’ learning experience and decrease the dropout rate.

Social implications

Early predictions based on individual’s participation can help educators provide support to students who are struggling in the course.

Originality/value

This study outlines the innovative use of process mining techniques in education data mining to help educators gather data-driven insight on student performances in the enrolled courses.

Details

Journal of Research in Innovative Teaching & Learning, vol. 10 no. 2

Type: Research Article

ISSN: 2397-7604

Open Access

Article

Publication date: 12 June 2017

Using a naive Bayesian classifier methodology for loan risk assessment: Evidence from a Tunisian commercial bank

Aida Krichene


Abstract

Purpose

Loan default risk or credit risk evaluation is important to financial institutions which provide loans to businesses and individuals. Loans carry the risk of being defaulted. To understand the risk levels of credit users (corporations and individuals), credit providers (bankers) normally collect vast amounts of information on borrowers. Statistical predictive analytic techniques can be used to analyse or to determine the risk levels involved in loans. This paper aims to address the question of default prediction of short-term loans for a Tunisian commercial bank.

Design/methodology/approach

The authors have used a database of 924 files of credits granted to industrial Tunisian companies by a commercial bank in the years 2003, 2004, 2005 and 2006. The naive Bayesian classifier algorithm was used, and the results show that the good classification rate is of the order of 63.85 per cent. The default probability is explained by the variables measuring working capital, leverage, solvency, profitability and cash flow indicators.

Findings

The results of the validation test show that the good classification rate is of the order of 58.66 per cent; nevertheless, the error types I and II remain relatively high at 42.42 and 40.47 per cent, respectively. A receiver operating characteristic curve is plotted to evaluate the performance of the model. The result shows that the area under the curve criterion is of the order of 69 per cent.

Originality/value

The paper highlights the fact that the Tunisian central bank obliged all commercial banks to conduct a survey study to collect qualitative data for better credit notation of the borrowers.


Keywords

ROC curve, Risk assessment, Default risk, Banking sector, Bayesian classifier algorithm.

Details

Journal of Economics, Finance and Administrative Science, vol. 22 no. 42

Type: Research Article

ISSN: 2077-1886