Search results
Abstract
Purpose
Evaluating loan default risk, or credit risk, is important to financial institutions that provide loans to businesses and individuals, because every loan carries a risk of default. To understand the risk levels of credit users (corporations and individuals), credit providers (bankers) normally collect vast amounts of information on borrowers. Statistical predictive analytic techniques can be used to analyse or determine the risk levels involved in loans. This paper aims to address the question of default prediction of short-term loans for a Tunisian commercial bank.
Design/methodology/approach
The authors used a database of 924 credit files for loans granted to Tunisian industrial companies by a commercial bank in 2003, 2004, 2005 and 2006. The naive Bayesian classifier algorithm was used, and the results show a correct classification rate of about 63.85 per cent. The default probability is explained by variables measuring working capital, leverage, solvency, profitability and cash flow indicators.
Findings
The results of the validation test show a correct classification rate of about 58.66 per cent; nevertheless, the type I and type II error rates remain relatively high at 42.42 and 40.47 per cent, respectively. A receiver operating characteristic (ROC) curve is plotted to evaluate the performance of the model. The area under the curve (AUC) is about 69 per cent.
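As a rough illustration of the metrics reported in this abstract, the following sketch computes a correct classification rate, type I and type II error rates and an area under the ROC curve. It is not the authors' code. The function names and the toy data are hypothetical, and the sketch assumes that "default" is the positive class, that a type I error is a non-defaulter flagged as a defaulter and that a type II error is a missed defaulter; the paper may use the opposite convention.

```python
from typing import Sequence


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Correct-classification rate plus type I / type II error rates.

    Assumption: 1 = default (positive class), 0 = non-default.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "correct_rate": (tp + tn) / len(pairs),
        "type_I": fp / (fp + tn),   # share of non-defaulters flagged as defaulters
        "type_II": fn / (fn + tp),  # share of defaulters that were missed
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random defaulter receives a higher
    score than a random non-defaulter (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores (predicted default probabilities), not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cut-off is an assumption

rates = error_rates(y_true, y_pred)
area = auc(y_true, scores)
print(rates, area)
```

The pairwise AUC shown here is quadratic in the sample size, which is acceptable for a few hundred credit files; a rank-based formulation would be preferable for larger samples.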
Originality/value
The paper highlights the fact that the Tunisian central bank obliged all commercial banks to conduct a survey to collect qualitative data for better credit rating of borrowers.
Keywords
ROC curve, Risk assessment, Default risk, Banking sector, Bayesian classifier algorithm.
Article type
Research article
Ruchi Kejriwal, Monika Garg and Gaurav Sarin
Abstract
Purpose
The stock market has always been lucrative for investors. But, because of its speculative nature, it is difficult to predict price movements. Investors have been using both fundamental and technical analysis to predict prices. Fundamental analysis helps to study structured data of the company. Technical analysis helps to study price trends, and the increasing and easy availability of unstructured data has made it important to study market sentiment as well. Market sentiment has a major impact on prices in the short run. Hence, the purpose is to understand market sentiment in a timely and effective manner.
Design/methodology/approach
The research includes text mining followed by the creation of various classification models. The accuracy of these models is checked using a confusion matrix.
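To make the accuracy check concrete, here is a minimal sketch of how a three-class confusion matrix and its accuracy could be computed. The label set follows the "positive", "negative" and "neutral" classes named in this abstract; the helper names and the toy predictions are hypothetical and do not come from the study.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of items on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up sentiment labels for six texts, for illustration only.
actual = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
predicted = ["positive", "neutral", "neutral", "negative", "negative", "neutral"]

matrix = confusion_matrix(actual, predicted)
acc = accuracy(matrix)
print(matrix, round(acc, 4))
```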
Findings
Of the six machine learning techniques used to create the classification model, the kernel support vector machine gave the highest accuracy, 68%. This model can now be used to analyse tweets, news and various other unstructured data to predict price movements.
Originality/value
This study will help investors quickly classify a news item or a tweet as "positive", "negative" or "neutral" and determine stock price trends.
Khalid Iqbal and Muhammad Shehrayar Khan
Abstract
Purpose
In this digital era, email is the most pervasive form of communication between people. Many users fall victim to spam emails, and their data are exposed.
Design/methodology/approach
Researchers have addressed this problem by focusing on advanced machine learning algorithms and improved models for detecting spam emails, but there is still a gap regarding features, which also play an important role in achieving good results. To evaluate the performance of the applied classifiers, 10-fold cross-validation is used.
Findings
The results confirm that spam emails are correctly classified with an accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network, compared with the other applied machine learning classifiers.
Originality/value
In this paper, point-biserial correlation is computed between each feature and the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on the selected features by training different classifiers.
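The feature-scoring step can be sketched as follows. This is an illustrative implementation of the standard point-biserial coefficient between a continuous feature and a binary label, not the authors' code. The feature names, the toy values and the idea of ranking by absolute correlation are assumptions made for this example.

```python
import math


def point_biserial(feature, labels):
    """Point-biserial correlation between a continuous feature and a 0/1 label.

    Uses r = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the
    population standard deviation of the feature.
    """
    n = len(feature)
    group1 = [x for x, y in zip(feature, labels) if y == 1]
    group0 = [x for x, y in zip(feature, labels) if y == 0]
    n1, n0 = len(group1), len(group0)
    mean_all = sum(feature) / n
    std_all = math.sqrt(sum((x - mean_all) ** 2 for x in feature) / n)
    if std_all == 0 or n1 == 0 or n0 == 0:
        return 0.0  # correlation is undefined; treat as uninformative
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    return (mean1 - mean0) / std_all * math.sqrt(n1 * n0 / n ** 2)


def rank_features(columns, labels):
    """Order feature names by absolute correlation with the label."""
    scores = {name: abs(point_biserial(col, labels))
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical word-frequency features; 1 = spam, 0 = not spam.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "freq_free": [0.9, 0.8, 0.7, 0.1, 0.0, 0.2],
    "freq_the": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}

ranking = rank_features(columns, labels)
r_free = point_biserial(columns["freq_free"], labels)
print(ranking, round(r_free, 3))
```

The coefficient is numerically identical to the Pearson correlation computed with the label coded as 0 and 1, so a library routine for Pearson correlation would be an equivalent choice.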
Kumash Kapadia, Hussein Abdel-Jaber, Fadi Thabtah and Wael Hadi
Abstract
The Indian Premier League (IPL) is one of the more popular cricket tournaments in the world: its finances are growing each season, its viewership has increased markedly and the betting market for the IPL is growing significantly every year. Because cricket is a very dynamic game that changes ball-by-ball, bettors and bookies are incentivised to bet on match results. This paper investigates machine learning technology to deal with the problem of predicting cricket match results based on historical match data of the IPL. Influential features of the dataset have been identified using filter-based methods including Correlation-based Feature Selection, Information Gain (IG), ReliefF and Wrapper. More importantly, machine learning techniques including Naïve Bayes, Random Forest, K-Nearest Neighbour (KNN) and Model Trees (classification via regression) have been adopted to generate predictive models from distinctive feature sets derived by the filter-based methods. Two feature subsets were formulated, one based on home team advantage and the other based on the toss decision. The selected machine learning techniques were applied to both feature sets to determine a predictive model. Experimental tests show that tree-based models, particularly Random Forest, performed better in terms of accuracy, precision and recall when compared to probabilistic and statistical models. However, on the toss feature subset, none of the considered machine learning algorithms performed well in producing accurate predictive models.
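Of the filter-based methods listed, Information Gain is the simplest to show in a few lines. The sketch below scores a categorical feature by how much it reduces the entropy of the match result. It is only an illustration: the function names are hypothetical, and the toy match records are invented to exercise the function, so they say nothing about the paper's actual findings.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Invented records for eight matches, for illustration only.
results = ["win", "win", "win", "loss", "loss", "loss", "win", "loss"]
venue = ["home", "home", "home", "away", "away", "away", "home", "away"]
toss = ["won", "lost", "won", "lost", "won", "lost", "lost", "won"]

ig_venue = information_gain(venue, results)
ig_toss = information_gain(toss, results)
print(ig_venue, ig_toss)
```

In this toy set the venue column separates the two outcomes perfectly and the toss column not at all, which is why the two scores sit at the extremes of the possible range.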
Afreen Khan, Swaleha Zubair and Samreen Khan
Abstract
Purpose
This study aimed to assess the potential of the Clinical Dementia Rating (CDR) Scale in the prognosis of dementia in elderly subjects.
Design/methodology/approach
Staging dementia severity is an essential clinical task, so the authors used machine learning (ML) on magnetic resonance imaging (MRI) features to locate and study the impact of various MR readings on the classification of demented and nondemented patients. The authors used cross-sectional MRI data in this study. The designed ML approach established the role of CDR in the prognosis of afflicted and normal patients. Moreover, the pattern analysis indicated CDR as a strong cohort amongst the various attributes, with CDR having a significant value of p < 0.01. The authors employed 20 ML classifiers.
Findings
The mean prediction accuracy varied with the ML classifier used, with the bagging classifier (random forest as a base estimator) achieving the highest (93.67%). A series of ML analyses demonstrated that the model including the CDR score had better prediction accuracy and other related performance metrics.
Originality/value
The results suggest that the CDR score, a simple clinical measure, can be used in real community settings. It can be used to predict dementia progression with ML modeling.
Muneza Kagzi, Sayantan Khanra and Sanjoy Kumar Paul
Abstract
Purpose
From a technological determinist perspective, machine learning (ML) may significantly contribute towards sustainable development. The purpose of this study is to synthesize prior literature on the role of ML in promoting sustainability and to encourage future inquiries.
Design/methodology/approach
This study conducts a systematic review of 110 papers that demonstrate the utilization of ML in the context of sustainable development.
Findings
ML techniques may play a vital role in enabling sustainable development by leveraging data to uncover patterns and facilitate the prediction of various variables, thereby aiding in decision-making processes. Through the synthesis of findings from prior research, it is evident that ML may help in achieving many of the United Nations’ sustainable development goals.
Originality/value
This study represents one of the initial investigations that conducted a comprehensive examination of the literature concerning ML’s contribution to sustainability. The analysis revealed that the research domain is still in its early stages, indicating a need for further exploration.
Adam Christian Haupt, Jonathan Alt and Samuel Buttrey
Abstract
Purpose
This paper aims to use a data-driven approach to identify the factors and metrics that provide the best indicators of academic attrition in the Korean language program at the Defense Language Institute Foreign Language Center.
Design/methodology/approach
This research develops logistic regression models to aid in the identification of at-risk students in the Defense Language Institute’s Korean language school.
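For readers unfamiliar with the technique, the following is a minimal sketch of fitting a logistic regression by batch gradient descent and using it to flag an at-risk student. It is not the authors' model. The two predictors (a scaled early test score and a scaled absence rate), the toy records, the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Probability of attrition; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n = len(rows)
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical records: [early test score, absence rate], both scaled to 0..1.
rows = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1],
        [0.3, 0.8], [0.2, 0.9], [0.4, 0.7]]
labels = [0, 0, 0, 1, 1, 1]  # 1 = student left the program

weights = fit_logistic(rows, labels)
p_risky = predict_proba(weights, [0.25, 0.85])
p_safe = predict_proba(weights, [0.85, 0.15])
print(weights, p_risky, p_safe)
```

The sign of each fitted weight indicates the direction in which a predictor moves the estimated risk, which is the kind of interpretability that makes logistic regression attractive for policy questions. Assessing statistical significance, as the paper does, would require standard errors that this sketch does not compute.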
Findings
The results of this research demonstrate that this methodology can detect significant factors and metrics that identify at-risk students. Additionally, this research shows that school policy changes can be detected using logistic regression models and stepwise regression.
Originality/value
This research represents a real-world application of logistic regression modeling methods applied to the problem of identifying at-risk students for the purpose of academic intervention or other negative outcomes. By using logistic regression, the authors are able to gain a greater understanding of the problem and identify statistically significant predictors of student attrition that they believe can be converted into meaningful policy change.
Abstract
Purpose
The purpose of this paper is to review some of the statistical methods used in the field of social sciences.
Design/methodology/approach
A review of some of the statistical methodologies used in areas like survey methodology, official statistics, sociology, psychology, political science, criminology, public policy, marketing research, demography, education and economics.
Findings
Several areas are presented such as parametric modeling, nonparametric modeling and multivariate methods. Focus is also given to time series modeling, analysis of categorical data and sampling issues and other useful techniques for the analysis of data in the social sciences. Indicative references are given for all the above methods along with some insights for the application of these techniques.
Originality/value
This paper reviews some statistical methods that are used in the social sciences, and the authors draw the attention of researchers to less popular methods. The purpose is neither to give technical details nor to cover all existing techniques or all possible areas of statistics. The focus is mainly on the applied aspect of the techniques, and the authors give insights about techniques that can be used to answer problems in the abovementioned areas of research.