Search results

1 – 10 of 750
Open Access
Article
Publication date: 12 June 2017

Aida Krichene

Abstract

Purpose

Loan default risk, or credit risk, evaluation is important to financial institutions which provide loans to businesses and individuals. Loans carry the risk of default. To understand the risk levels of credit users (corporations and individuals), credit providers (bankers) normally collect vast amounts of information on borrowers. Statistical predictive analytics techniques can be used to analyse or determine the risk levels involved in loans. This paper aims to address the question of default prediction of short-term loans for a Tunisian commercial bank.

Design/methodology/approach

The authors used a database of 924 credit files for loans granted to Tunisian industrial companies by a commercial bank in 2003, 2004, 2005 and 2006. The naive Bayesian classifier algorithm was applied, and the results show a correct classification rate of about 63.85 per cent. The default probability is explained by variables measuring working capital, leverage, solvency, profitability and cash flow indicators.

Findings

The results of the validation test show a correct classification rate of about 58.66 per cent; nevertheless, the type I and type II errors remain relatively high at 42.42 and 40.47 per cent, respectively. A receiver operating characteristic (ROC) curve is plotted to evaluate the performance of the model. The area under the curve (AUC) is about 69 per cent.
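
The reported measures can be illustrated with a minimal sketch. It is not the authors' code, and it rests on assumptions the abstract does not state: labels are coded 1 for default and 0 for non-default, a type I error is taken to be a defaulting loan classified as good, and a type II error a good loan classified as default. The function names are hypothetical.

```python
def classification_rates(y_true, y_pred):
    """Correct classification rate and both error rates.

    Assumed coding (not from the paper): 1 = default, 0 = non-default.
    """
    pairs = list(zip(y_true, y_pred))
    defaults = sum(t == 1 for t, _ in pairs)
    non_defaults = len(pairs) - defaults
    correct = sum(t == p for t, p in pairs)
    missed_defaults = sum(t == 1 and p == 0 for t, p in pairs)
    false_alarms = sum(t == 0 and p == 1 for t, p in pairs)
    return {
        "correct_rate": correct / len(pairs),
        "type_i_error": missed_defaults / defaults,     # assumed definition
        "type_ii_error": false_alarms / non_defaults,   # assumed definition
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen default receives a
    higher score than a randomly chosen non-default (ties count half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported value.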

Originality/value

The paper highlights the fact that the Tunisian central bank obliged all commercial banks to conduct a survey to collect qualitative data for better credit rating of borrowers.

Keywords

ROC curve, Risk assessment, Default risk, Banking sector, Bayesian classifier algorithm.

Article type

Research article

Details

Journal of Economics, Finance and Administrative Science, vol. 22 no. 42
Type: Research Article
ISSN: 2077-1886

Article
Publication date: 14 November 2016

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

Email is an important medium for sharing information rapidly. However, spam, being a nuisance in such communication, motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper aims to present a combined classifier technique using a committee selection mechanism, where the main objective is to identify a set of classifiers so that their individual decisions can be combined by a committee selection procedure for accurate detection of spam.

Design/methodology/approach

For training and testing of the relevant machine learning classifiers, text mining approaches are used in this research. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. In the next step, the extracted features are taken through a dimensionality reduction method where non-informative features are removed. Subsequently, an informative feature subset is selected using genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results compared with those of other classifiers.

Findings

For building the proposed combined classifier, three different studies have been performed. The first study identifies the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naïve Bayes. In that study, AdaBoost has been found to be the best algorithm for performance boosting. The second study was on the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with the normalized polynomial (NP) kernel was observed to be the best. The last study was on combining classifiers with committee selection, where the committee members were the best classifiers identified by the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent performance accuracy with a low false positive rate.
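
The abstract does not spell out how the committee reaches a decision, so the following is only a sketch of one plausible rule, stated as an assumption: the members vote, a unanimous vote stands, and the president breaks any disagreement. The function name and the "spam"/"ham" labels are hypothetical.

```python
def committee_decision(member_votes, president_vote):
    """Combine member votes with a president acting as tie-breaker.

    Assumed rule (not confirmed by the source): if all members agree,
    their label is returned; otherwise the president's label is used.
    """
    if len(set(member_votes)) == 1:
        return member_votes[0]
    return president_vote


# Example: two boosted probabilistic members and an SVM president.
print(committee_decision(["spam", "spam"], "ham"))  # members agree
print(committee_decision(["spam", "ham"], "ham"))   # president decides
```

Under this rule the president only matters when the members disagree, which is one way to keep false positives low when the president is the most reliable classifier.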

Research limitations/implications

This research is focused on the classification of email spam written in English. Only the body (text) parts of the emails have been used. Image spam has not been included in this work. We have restricted our work to email messages only. No other types of messages, such as short message service or multimedia messaging service, were part of this study.

Practical implications

This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.

Originality/value

The proposed combined classifier is a novel classifier designed for accurate classification of email spam.

Details

VINE Journal of Information and Knowledge Management Systems, vol. 46 no. 4
Type: Research Article
ISSN: 2059-5891

Article
Publication date: 1 April 2006

María M. Abad‐Grau and Daniel Arias‐Aranda

Abstract

Purpose

Information analysis tools enhance the possibilities of firm competition in terms of knowledge management. However, the generalization of decision support systems (DSS) is still far away from everyday use by managers and academicians. This paper aims to present a framework of analysis based on Bayesian networks (BN) whose accuracy is measured in order to assess scientific evidence.

Design/methodology/approach

Different learning algorithms based on BN are applied to extract relevant information about the relationship between operations strategy and flexibility in a sample of engineering consulting firms. Feature selection algorithms are able to improve the accuracy of these classifiers automatically.

Findings

Results show that the behaviors of the firms can be reduced to different rules that help in the decision‐making process about investments in technology and production resources.

Originality/value

In contrast with methods from classical statistics, Bayesian classifiers are able to model a variety of relationships between the variables affecting the dependent variable. In contrast with other methods from the artificial intelligence field, such as neural networks or support vector machines, Bayesian classifiers are white‐box models that can be interpreted directly. Together with feature selection techniques from the machine learning field, they are able to automatically learn a model that accurately fits the data.

Details

Industrial Management & Data Systems, vol. 106 no. 4
Type: Research Article
ISSN: 0263-5577

Book part
Publication date: 31 January 2015

Davy Janssens and Geert Wets

Abstract

Several activity-based transportation models are now becoming operational and are entering the stage of application for the modelling of travel demand. In our application, we will use decision rules to support the decision-making of the model instead of principles of utility maximization, which means our work can be interpreted as an application of the concept of bounded rationality in the transportation domain. In this chapter we explored a novel idea of combining decision trees and Bayesian networks to improve decision-making in order to maintain the potential advantages of both techniques. The results of this study suggest that integrated Bayesian networks and decision trees can be used for modelling the different choice facets of a travel demand model with better predictive power than CHAID decision trees. Another conclusion is that there are initial indications that the new way of integrating decision trees and Bayesian networks has produced a decision tree that is structurally more stable.

Details

Bounded Rational Choice Behaviour: Applications in Transport
Type: Book
ISBN: 978-1-78441-071-1

Article
Publication date: 25 October 2018

Shrawan Kumar Trivedi, Shubhamoy Dey and Anil Kumar

Abstract

Purpose

Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data and capturing users’ sentiments. This research aims to present sentiment analysis of an Indian movie review corpus using natural language processing and various machine learning classifiers.

Design/methodology/approach

In this paper, a comparative study between three machine learning classifiers (Bayesian, naïve Bayesian and support vector machine [SVM]) was performed. All the classifiers were trained on the words/features of the corpus extracted, using five different feature selection algorithms (Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and a comparative study was performed between them. The classifiers and feature selection approaches were evaluated using different metrics (F-value, false-positive [FP] rate and training time).
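
Of the five feature selection criteria listed, the chi-square score is the simplest to show in a few lines. The sketch below uses the standard chi-square statistic for a 2x2 term/class contingency table; the function names and the example counts are illustrative and not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one term and a binary class.

    a: positive reviews containing the term
    b: negative reviews containing the term
    c: positive reviews without the term
    d: negative reviews without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term or class never varies, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(term_counts):
    """Order terms from most to least class-dependent.

    term_counts maps each term to its (a, b, c, d) counts.
    """
    return sorted(term_counts,
                  key=lambda term: chi_square(*term_counts[term]),
                  reverse=True)
```

A term that appears equally often in both classes scores 0, so keeping only the top-ranked terms discards words that do not help separate positive from negative reviews.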

Findings

The results of this study show that, for the maximum number of features, the RF feature selection approach was found to be the best, with better F-values, a low FP rate and less time needed to train the classifiers, whereas for the least number of features, one-R was better than RF. When the evaluation was performed for machine learning classifiers, SVM was found to be superior, although the Bayesian classifier was comparable with SVM.

Originality/value

This is novel research in which Indian review data were collected and a classification model for sentiment polarity (positive/negative) was then constructed.

Details

The Electronic Library, vol. 36 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 November 2019

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

Email is a rapid and inexpensive medium for sharing information, whereas unsolicited email (spam) is a constant source of trouble in email communication. The rapid growth of spam creates the need to build a reliable and robust spam classifier. This paper aims to present a study of evolutionary classifiers (genetic algorithm [GA] and genetic programming [GP]) with and without the help of an ensemble of classifiers method. In this research, the classifier ensemble has been developed with the adaptive boosting technique.

Design/methodology/approach

Text mining methods are applied for classifying spam emails and legitimate emails. Two data sets (Enron and SpamAssassin) are used to test the classifiers concerned. Initially, pre-processing is performed to extract the features/words from the email files. An informative feature subset is selected using a greedy stepwise feature subset search method. With the help of the informative features, a comparative study is performed, first among the evolutionary classifiers and then with other popular machine learning classifiers (Bayesian, naive Bayes and support vector machine).

Findings

This study shows that evolutionary algorithms are promising in classification and prediction applications, where genetic programming with adaptive boosting turns out to be not only an accurate classifier but also a sensitive one. Results show that initially GA performs better than GP, but after an ensemble of classifiers is built (a large number of iterations), GP overtakes GA with significantly higher accuracy. Amongst all classifiers, boosted GP turns out to have not only good classification accuracy but also a low false positive (FP) rate, which is considered an important criterion in email spam classification. Also, greedy stepwise feature search is found to be an effective method for feature selection in this application domain.
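
The boosting step mentioned above can be sketched with the textbook discrete AdaBoost update. This is a generic illustration, not the authors' implementation: how the GA or GP base classifiers are trained is not shown, labels are assumed to be +1 (spam) and -1 (legitimate), and the function name is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting.

    weights: current example weights, summing to 1
    y_true, y_pred: labels in {+1, -1} (assumed coding)
    Returns the base classifier's vote weight and the new example weights.
    """
    # Weighted error of the base classifier on the current distribution.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)

    # Misclassified examples gain weight, correct ones lose weight.
    updated = [w * math.exp(alpha if t != p else -alpha)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

After each round the misclassified examples together hold half of the total weight, so the next base classifier is pushed towards the emails the previous one got wrong. This offers one intuition for why a weaker starting classifier can overtake a stronger one after many iterations.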

Research limitations/implications

The research implication of this work is a reduction in the cost incurred because of spam/unsolicited bulk email. Email is fundamental for sharing information among the units of an organization so that it can stay competitive with its business rivals. In addition, providing the best emailing services to customers is a continual hurdle for internet service providers. Although organizations and internet service providers are continuously adopting novel spam filtering approaches to reduce the number of unwanted emails, the desired effect has not been seen to a significant degree because of the cost of installation, limited customizability and the threat of misclassifying important emails. This research deals with the issues and challenges faced by internet service providers and organizations.

Practical implications

In this research, the proposed models not only provided excellent accuracy, sensitivity with a low FP rate and customizability but also helped reduce the cost of spam. The same models may also be used for other text mining applications such as sentiment analysis, blog mining or news mining.

Originality/value

A comparison between GP and GAs has been shown with/without ensemble in spam classification application domain.

Article
Publication date: 4 April 2016

GopalaKrishnan T and P Sengottuvelan

Abstract

Purpose

The ultimate objective of any e-Learning system is to meet the specific needs of online learners and provide them with features that support an efficacious learning experience by understanding their complexities. Any e-Learning system could be much improved by tracking students' commitment to and disengagement from a course, which in turn would allow the system to make personalized interventions at appropriate times in order to re-engage learners. Motivation plays an important role in getting learners back on track, which could be done by analyzing several attributes of the log files. This paper aims to analyze the multiple attributes which cause learners to disengage from an online learning environment.

Design/methodology/approach

For this improvement, a Web-based learning system is studied using educational data mining techniques. Various attributes are characterized for disengagement prediction using web log file analysis. Although there have been several attempts to include motivating characteristics in e-Learning systems, at present it is mostly the influence on cognition that is acknowledged.

Findings

Classification is a predictive data mining technique which makes predictions about values of data using known results found from different data sets. To find the optimal solution for identifying disengaged learners in online learning systems, a Naive Bayesian (NB) classifier with the Particle Swarm Optimization (PSO) algorithm is used, which classifies the data set and then performs the independent analysis.

Originality/value

The experimental results show that the use of unrelated variables in the class attributes reduces the accuracy and reliability of any classification model. However, the hybrid PSO algorithm is clearly more apt to find small subsets of attributes than PSO with the NB classifier. The NB classifier combined with the hybrid PSO feature selection method proves to have the best feature selection capability without degrading the classification accuracy. It is further shown to be an effective method for mining large structural data in much less computation time.
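
The abstract does not describe the hybrid PSO variant, so the sketch below shows only the standard binary PSO update that is commonly used for feature selection, with each bit marking whether an attribute is kept. The function name and the parameter values (inertia, c1, c2) are illustrative assumptions, and the fitness evaluation with the NB classifier is left out.

```python
import math
import random


def binary_pso_step(position, velocity, personal_best, global_best, rng,
                    inertia=0.7, c1=1.5, c2=1.5):
    """Update one particle; position bits are 1 (keep attribute) or 0 (drop).

    Standard binary PSO, not the paper's hybrid variant: the velocity is
    pulled towards the particle's own best and the swarm's best subset,
    then squashed by a sigmoid into the probability of keeping each bit.
    """
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        v = (inertia * v
             + c1 * rng.random() * (pb - x)
             + c2 * rng.random() * (gb - x))
        keep_probability = 1.0 / (1.0 + math.exp(-v))
        new_velocity.append(v)
        new_position.append(1 if rng.random() < keep_probability else 0)
    return new_position, new_velocity
```

In a full loop, each new position would be scored, for example by the accuracy of an NB classifier trained on the selected attributes, and the personal and global bests updated before the next step.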

Article
Publication date: 8 June 2010

Pablo A.D. Castro and Fernando J. Von Zuben

Abstract

Purpose

The purpose of this paper is to apply a multi‐objective Bayesian artificial immune system (MOBAIS) to feature selection in classification problems aiming at minimizing both the classification error and cardinality of the subset of features. The algorithm is able to perform a multimodal search maintaining population diversity and controlling automatically the population size according to the problem. In addition, it is capable of identifying and preserving building blocks (partial components of the whole solution) effectively.
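
Minimizing classification error and subset size at the same time means that candidate subsets are compared by Pareto dominance rather than by a single score. The sketch below illustrates that comparison only; it is not MOBAIS itself, and the function names and example values are hypothetical.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    and strictly better on one.

    Each candidate is (classification_error, number_of_features);
    lower is better for both.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# (0.10, 7) is dropped: (0.10, 5) has the same error with fewer features.
print(pareto_front([(0.10, 5), (0.12, 3), (0.10, 7), (0.20, 3)]))
```

The result is a set of trade-offs rather than one winner, so the final choice between a smaller subset and a lower error is left to the user.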

Design/methodology/approach

The algorithm evolves candidate subsets of features by replacing the traditional mutation operator in immune‐inspired algorithms with a probabilistic model which represents the probability distribution of the promising solutions found so far. Then, the probabilistic model is used to generate new individuals. A Bayesian network is adopted as the probabilistic model due to its capability of capturing expressive interactions among the variables of the problem. In order to evaluate the proposal, it was applied to ten datasets and the results compared with those generated by state‐of‐the‐art algorithms.
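
The idea of replacing mutation with a probabilistic model can be sketched in a deliberately simplified form. The paper uses a Bayesian network, which captures interactions among features; the sketch below substitutes independent per-feature probabilities (a univariate model), so it shows the estimate-then-sample loop but not the interactions. All names are hypothetical.

```python
import random


def estimate_marginals(promising_subsets):
    """Fraction of promising solutions that include each feature.

    Simplification: features are treated as independent, unlike the
    Bayesian network used in the paper.
    """
    n = len(promising_subsets)
    return [sum(bits) / n for bits in zip(*promising_subsets)]


def sample_subset(marginals, rng):
    """Generate a new candidate by sampling each feature independently."""
    return [1 if rng.random() < p else 0 for p in marginals]


promising = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 1]]
model = estimate_marginals(promising)          # [1.0, 0.25, 0.5]
candidate = sample_subset(model, random.Random(42))
```

A feature present in every promising solution is always sampled, which loosely mirrors how building blocks are preserved instead of being disrupted by random mutation.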

Findings

The experiments demonstrate the effectiveness of the multi‐objective approach to feature selection. The algorithm found parsimonious subsets of features and the classifiers produced a significant improvement in the accuracy. In addition, the maintenance of building blocks avoids the disruption of partial solutions, leading to a quick convergence.

Originality/value

The originality of this paper lies in the proposal of a novel algorithm for multi‐objective feature selection.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 3 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 26 July 2019

Ayalapogu Ratna Raju, Suresh Pabboju and Ramisetty Rajeswara Rao

Abstract

Purpose

Brain tumor segmentation and classification is an area of interest that differentiates tumorous from non-tumorous cells in the brain and classifies the tumorous cells to identify their level. The methods developed so far lack automatic classification and consume considerable time for classification. In this work, a novel brain tumor classification approach, namely, harmony cuckoo search-based deep belief network (HCS-DBN), has been proposed. Here, the images present in the database are segmented based on the newly developed hybrid active contour (HAC) segmentation model, which integrates Bayesian fuzzy clustering (BFC) and the active contour model. The proposed HCS-DBN algorithm is trained with the features obtained from the segmented images. Finally, the classifier provides information about the tumor class in each slice available in the database. The proposed HAC and HCS-DBN algorithm are tested using the MRI images available in the BRATS database, and the results are observed. The simulation results show that the proposed HAC and HCS-DBN algorithm have better overall performance, with values of 0.945, 0.9695 and 0.99348 for accuracy, sensitivity and specificity, respectively.

Design/methodology/approach

The proposed HAC segmentation approach integrates the properties of the AC model and BFC. Initially, the brain image with different modalities is subjected to segmentation with the BFC and AC models. Then, the Laplacian correction is applied to fuse the segmented outputs from each model. Finally, the proposed HAC segmentation provides the error-free segments of the brain tumor regions prevailing in the MRI image. The next step is to extract the useful features, based on scattering transform, wavelet transform and local Gabor binary pattern, from the segmented brain image. Finally, the extracted features from each segment are provided to the DBN for the training, and the HCS algorithm chooses the optimal weights for DBN training.

Findings

The proposed HAC with the HCS-DBN algorithm is tested on the standard BRATS database, and its performance is evaluated based on metrics such as accuracy, sensitivity and specificity. The simulation results of the proposed HAC with the HCS-DBN algorithm are compared against existing works such as k-NN, NN, multi-SVM and multi-SVNN. The results achieved by the proposed HAC with the HCS-DBN algorithm are higher than those of the existing works, with values of 0.945, 0.9695 and 0.99348 for accuracy, sensitivity and specificity, respectively.

Originality/value

This work presents a brain tumor segmentation and classification scheme by introducing the HAC-based segmentation model. The proposed HAC model combines BFC and the active contour model through a fusion process, using the Laplacian correction probability for segmenting the slices in the database.

Details

Sensor Review, vol. 39 no. 4
Type: Research Article
ISSN: 0260-2288

Open Access
Article
Publication date: 11 December 2020

Balamurugan Souprayen, Ayyasamy Ayyanar and Suresh Joseph K

Abstract

Purpose

The purpose of food traceability is to retain the good quality of the raw material supply, diminish loss and reduce system complexity.

Design/methodology/approach

The proposed hybrid algorithm for food traceability is intended to make accurate predictions and enhance period data. The operation of the internet of things is addressed to track and trace food quality and to check the data acquired from manufacturers and consumers.

Findings

To cope with existing financial circumstances and the development of the global food supply chain, the authors propose efficient food traceability techniques using the internet of things and obtain a solution for data prediction.

Originality/value

The operation of the internet of things is addressed to track and trace food quality and to check the data acquired from manufacturers and consumers. The experimental analysis shows that the proposed algorithm has a high accuracy rate, a shorter execution time and a lower error rate.

Details

Modern Supply Chain Research and Applications, vol. 3 no. 1
Type: Research Article
ISSN: 2631-3871
