Search results

1 – 10 of over 2000

Article

Publication date: 27 August 2024

Machine learning insights: probing the variable importance of ex-ante information

Ali Albada, Eimad Eldin Abusham, Chui Zi Ong and Khalid Al Qatiti

Abstract

Purpose

Empirical examinations of initial public offering (IPO) initial returns often rely heavily on linear regression models. However, these models can be inefficient because they are sensitive to outliers, which are common in IPO data. This study introduces a machine learning method, random forest, to address issues that linear regression may struggle to resolve.

Design/methodology/approach

The study’s sample comprises 352 fixed-price IPOs from 2004 to 2021. A distinctive aspect of this research is its application of the random forest method, whose accuracy is evaluated against that of other methods. The findings indicate that the random forest model significantly outperforms the other methods in all of the evaluated aspects.

Findings

The variable importance measure indicates that investors’ demand, divergence of opinion among investors and offer price are the most crucial predictors of IPO initial returns. These determinants hold particular significance due to the widespread use of the fixed-price method in Malaysia, as this method amplifies the information asymmetry in the IPO market.
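
One common way to compute a variable importance measure for a random forest is permutation importance: shuffle one predictor at a time and record how much the prediction error grows. The sketch below is a generic, model-agnostic illustration and not the authors' code; the toy `model` function and the synthetic data are hypothetical stand-ins for a fitted random forest and the IPO predictors.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(model: Callable[[Row], float], X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mse(model, X_perm, y) - baseline
        importances.append(increase / n_repeats)
    return importances


# Hypothetical toy setup: strong effect of feature 0, weak effect of 1, none of 2.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * a + 0.5 * b for a, b, _ in X]


def model(row: Row) -> float:
    return 3.0 * row[0] + 0.5 * row[1]


scores = permutation_importance(model, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)  # [0, 1, 2]
```

Random forest libraries typically also report impurity-based importance, which is computed from the trees' splits rather than by shuffling; the two can rank predictors differently.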

Originality/value

To the best of the authors’ knowledge, this study is among the pioneering works in Malaysian literature to apply the random forest method to address the constraints of conventional linear regression models. This is achieved by considering a more extensive array of factors and acknowledging the influence of outliers. Additionally, this study adds value to Malaysian literature by ranking and identifying the ex-ante information that best signals the issuing firm’s quality. This contribution facilitates prospective investors’ decision-making processes and provides issuing firms with effective means to communicate their value and quality to the IPO market.

Details

Managerial Finance, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0307-4358

Book part

Publication date: 6 September 2019

Detecting Non-injured Passengers and Drivers in Car Accidents: A New Under-resampling Method for Imbalanced Classification

Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop

Abstract

In recent years, classification with imbalanced data has attracted growing attention in the data-mining and machine-learning communities because imbalanced data have become abundant in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study shows that all of the classification methods have difficulty with the imbalanced data, with KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve in overall performance. In particular, the random forest combined with the random over-sampling technique performs best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate).

We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before under-sampling them. It further improves the balanced accuracy of the random forest to 83.7%, making it the best approach for the imbalanced data.
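
Balanced accuracy, as defined above, is the mean of the true-positive and true-negative rates. The sketch below computes it and illustrates one possible reading of the proposed resampling idea. The chapter's exact definition of an “easy” majority observation is not given here, so the sketch assumes, hypothetically, that a majority point is easy when all of its k nearest neighbours also belong to the majority class; such points are dropped before random under-sampling. The function names, k = 3 and the toy points are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true-positive rate (class 1) and true-negative rate (class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return (tp / pos + tn / neg) / 2


def remove_easy_majority(X: List[Point], y: List[int], majority: int = 0, k: int = 3):
    """Drop majority points whose k nearest neighbours are all majority points.

    This 'easy' rule is a hypothetical stand-in for the chapter's definition.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority:
            keep.append(i)  # minority points are always kept
            continue
        neighbours = sorted(
            (math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )
        if any(label != majority for _, label in neighbours[:k]):
            keep.append(i)  # near the minority class: a "hard" point worth keeping
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X: List[Point], y: List[int], majority: int = 0, seed: int = 0):
    """Keep all minority points and an equal-sized random subset of majority points."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    chosen = sorted(mino + rng.sample(maj, min(len(maj), len(mino))))
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data: a tight majority cluster far from the minority class, plus one borderline point.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9), (10, 10), (10, 11), (11, 10)]
y = [0, 0, 0, 0, 0, 1, 1, 1]

X_hard, y_hard = remove_easy_majority(X, y)
X_bal, y_bal = random_undersample(X_hard, y_hard)
print(y_hard)  # [0, 1, 1, 1]
print(y_bal)  # [0, 1, 1, 1]
print(balanced_accuracy([0] * 90 + [1] * 10, [0] * 100))  # 0.5
```

The last line shows why balanced accuracy suits imbalanced data: a classifier that always predicts the majority class reaches 90% plain accuracy here but only 0.5 balanced accuracy. On real data the majority class would usually still outnumber the minority after the first step, so the under-sampling step would then discard further majority points.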

Details

Advances in Business and Management Forecasting

Type: Book

ISBN: 978-1-78754-290-7

Article

Publication date: 2 August 2024

Predicting the financial performance of microfinance institutions with machine learning techniques

Tang Ting, Md Aslam Mia, Md Imran Hossain and Khaw Khai Wah

Abstract

Purpose

Given the growing emphasis among scholars, practitioners and policymakers on financial sustainability, this study aims to explore the applicability of machine learning techniques in predicting the financial performance of microfinance institutions (MFIs).

Design/methodology/approach

This study gathered 9,059 firm-year observations spanning from 2003 to 2018 from the World Bank's Mix Market database. To predict the financial performance of MFIs, the authors applied a range of machine learning regression approaches to both training and testing data sets. These included linear regression, partial least squares, linear regression with stepwise selection, elastic net, random forest, quantile random forest, Bayesian ridge regression, K-Nearest Neighbors and support vector regression. All models were implemented using Python.

Findings

The findings revealed the random forest model as the most suitable choice, outperforming the other models considered. The effectiveness of the random forest model varied depending on specific scenarios, particularly the balance between training and testing data set proportions. More importantly, the results identified operational self-sufficiency as the most critical factor influencing the financial performance of MFIs.

Research limitations/implications

This study leveraged machine learning on a well-defined data set to identify the factors predicting the financial performance of MFIs. These insights offer valuable guidance for MFIs aiming to predict their long-term financial sustainability. Investors and donors can also use these findings to make informed decisions when selecting their potential recipients. Furthermore, practitioners and policymakers can use these findings to identify potential financial performance vulnerabilities.

Originality/value

This study stands out by using a global data set to investigate the best model for predicting the financial performance of MFIs, a relatively scarce subject in the existing microfinance literature. Moreover, it uses advanced machine learning techniques to gain a deeper understanding of the factors affecting the financial performance of MFIs.

Details

Journal of Modelling in Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1746-5664

Article

Publication date: 9 September 2022

Comparison of machine learning predictions of subjective poverty in rural China

Lucie Maruejols, Hanjie Wang, Qiran Zhao, Yunli Bai and Linxiu Zhang

Abstract

Purpose

Despite rising incomes and the reduction of extreme poverty, the feeling of being poor remains widespread. Support programs can improve well-being, but they first require identifying which households judge their income insufficient to meet their basic needs, and which factors are associated with subjective poverty.

Design/methodology/approach

First, households report the income level they judge sufficient to make ends meet, and they are classified as subjectively poor if their own monetary income is below that level. Second, the study compares the performance of three machine learning algorithms, the random forest, support vector machines and least absolute shrinkage and selection operator (LASSO) regression, applied to a set of socioeconomic variables to predict subjective poverty status.
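
The labelling rule described above can be stated compactly: a household is subjectively poor when its own monetary income is below the income it reports as sufficient to make ends meet. The sketch below applies that rule to produce the prediction target; the `Household` fields and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Household:
    income: float             # household's own monetary income
    sufficient_income: float  # self-reported income needed to make ends meet


def is_subjectively_poor(h: Household) -> bool:
    """Target label: income strictly below the self-reported sufficient level."""
    return h.income < h.sufficient_income


def prevalence(households: List[Household]) -> float:
    """Share of households labelled subjectively poor."""
    return sum(is_subjectively_poor(h) for h in households) / len(households)


sample = [
    Household(12000, 15000),
    Household(30000, 20000),
    Household(8000, 8000),
    Household(5000, 9000),
]
print([is_subjectively_poor(h) for h in sample])  # [True, False, False, True]
print(prevalence(sample))  # 0.5
```

The classifiers compared in the study then predict this label from socioeconomic variables; the rule itself only defines the target.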

Findings

The random forest correctly predicts 85.29% of cases using a range of income and non-income predictors, closely followed by the other two techniques. For the middle-income group, LASSO regression outperforms the random forest. Subjective poverty is mostly associated with monetary income for low-income households. However, for middle-income households, a combination of low income, low endowment (land, consumption assets) and unusually large expenditures (medical, gifts) constitutes the key predictors of feeling poor.

Practical implications

To reduce the feeling of poverty, policy intervention should continue to focus on increasing incomes. However, improvements in non-income domains such as health expenditure, education and family demographics can also relieve the feeling of income inadequacy. Methodologically, which algorithm performs better depends on the data at hand.

Originality/value

For the first time, the authors show that prediction techniques can reliably identify subjective poverty prevalence, using rural China as an example. The analysis pays specific attention to modest-income households, who may feel poor but not be identified as such by objective poverty lines, and is relevant when policy-makers seek to address the “next step” after ending extreme poverty. The prediction performance and mechanisms of three machine learning algorithms are compared.

Details

China Agricultural Economic Review, vol. 15 no. 2

Type: Research Article

ISSN: 1756-137X

Book part

Publication date: 8 November 2021

Do Machine Learning Models Hold the Key to Better Money Demand Forecasting?

Taniya Ghosh and Sakshi Agarwal

Abstract

Significant evidence in the literature points to money demand instability and, consequently, inaccurate forecasting. In view of this issue, this chapter applies a method that is new to the money demand literature, a machine learning model, to predict money demand. Specifically, it uses Random Forest Regression to predict money demand with monthly Indian data from April 1996 to December 2018, using the variables commonly employed in the literature. The chapter finds that Random Forest Regression performs fairly well in predicting money demand. The results are also compared with traditional models, and the Random Forest Regression model is found to have the potential to improve the prediction of money demand over what traditional models predict.

Details

Environmental, Social, and Governance Perspectives on Economic Development in Asia

Type: Book

ISBN: 978-1-80117-594-4

Article

Publication date: 13 February 2024

What are tenants demanding the most? A machine learning approach for the prediction of time on market

Marcelo Cajias and Anna Freudenreich

Abstract

Purpose

This is the first article to apply a machine learning approach to the analysis of time on market in real estate markets.

Design/methodology/approach

The random survival forest approach is introduced to the real estate market. The most important predictors of time on market are identified, and the analysis examines how the survival probability of residential rental apartments responds to these major characteristics.
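
A random survival forest estimates, within each tree's terminal nodes, a survival curve (here, the probability that an apartment is still on the market after t days), typically from a Kaplan-Meier or Nelson-Aalen estimator, and aggregates these across trees. The sketch below shows only the Kaplan-Meier building block, not a full forest. Listings still advertised when observation ends are treated as right-censored; the durations are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], observed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one event.

    durations: days on market for each listing.
    observed: True if the listing was let (event), False if still listed (censored).
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group all listings that share the same duration t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Listings let or censored at t leave the risk set afterwards.
        at_risk -= events + censored
    return curve


days = [5, 8, 8, 12, 20, 20, 30]
let = [True, True, False, True, True, False, False]
for t, s in kaplan_meier(days, let):
    print(t, round(s, 3))
# 5 0.857
# 8 0.714
# 12 0.536
# 20 0.357
```

Censored listings still count in the risk set up to their last observed day, which is what distinguishes this estimator from simply averaging completed marketing times.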

Findings

Results show that price, living area, construction year, year of listing and the distances to the nearest hairdresser, bakery and city center have the greatest impact on the marketing time of residential apartments. Time on market for an apartment in Munich is lowest at a price of 750 € per month, an area of 60 m², a construction year of 1985 and a distance of 200–400 meters from the important amenities.

Practical implications

The findings may help private and institutional investors make real estate investment decisions, derive implications for portfolio management strategies and ultimately minimize cash-flow failure.

Originality/value

Although machine learning algorithms have frequently been applied to the real estate market to analyze prices, their application to time on market is completely novel. This is the first paper to apply a machine learning approach to survival analysis in the real estate market.

Details

Journal of Property Investment & Finance, vol. 42 no. 2

Type: Research Article

ISSN: 1463-578X

Article

Publication date: 8 August 2022

Explainable housing price prediction with determinant analysis

Ean Zou Teoh, Wei-Chuen Yau, Thian Song Ong and Tee Connie

Abstract

Purpose

This study aims to develop a regression-based machine learning model to predict housing prices and to determine and interpret the factors that contribute to housing prices, using different publicly available data sets. The significant determinants of housing prices are first identified by multinomial logistic regression (MLR) based on their relative importance. A comprehensive study is then conducted using SHapley Additive exPlanations (SHAP) analysis to examine the features that cause the major changes in housing prices.
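
SHAP attributes a single prediction to individual features using Shapley values from cooperative game theory. With only a few features, these values can be computed exactly by averaging each feature's marginal contribution over all orderings of the features. The sketch below does this under one simple, assumed value function in which features outside a coalition are set to baseline values (for example, dataset means). The linear `price_model`, its coefficients and the baseline are hypothetical; the `shap` library uses more efficient algorithms and other value functions.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List


def shapley_values(
    model: Callable[[List[float]], float], x: List[float], baseline: List[float]
) -> List[float]:
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)  # start with every feature at its baseline value
        prev = model(current)
        for j in order:
            current[j] = x[j]  # add feature j to the coalition
            value = model(current)
            phi[j] += value - prev
            prev = value
    return [p / factorial(n) for p in phi]


def price_model(features: List[float]) -> float:
    # Hypothetical model: living area (m^2), basement size (m^2), years since remodelling.
    area, basement, age = features
    return 50_000 + 1_200 * area + 400 * basement - 800 * age


house = [150.0, 60.0, 10.0]
baseline = [120.0, 40.0, 20.0]
phi = shapley_values(price_model, house, baseline)
print(phi)  # [36000.0, 8000.0, 8000.0]
```

For a linear model under this value function, each Shapley value equals the coefficient times (x minus baseline), and the values always sum to the difference between the prediction and the baseline prediction (here 52,000). The exact enumeration costs n! model evaluations per ordering set, which is why practical tools approximate or exploit model structure.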

Design/methodology/approach

Predictive analytics is an effective way to deal with uncertainties in process modelling and to improve decision-making for housing price prediction. The focus of this paper is twofold. First, the authors apply regression analysis to investigate how well the independent housing variables contribute to housing price prediction. Two data sets are used, namely, the Ames Housing dataset and the Melbourne Housing dataset. For both data sets, random forest regression performs best, achieving an average R² of 86% for the Ames dataset and 85% for the Melbourne dataset. Second, multinomial logistic regression is adopted to investigate and identify the determinants of housing sale price. For the Ames dataset, the authors find that the three most significant variables determining housing price are the general living area, basement size and age of remodelling. For the Melbourne dataset, properties with more rooms/bathrooms, larger land size and a shorter distance to the central business district (CBD) are priced higher. This is followed by a comprehensive analysis, using explainable SHAP values, of how these determinants contribute to the predictions of the selected regression model. These prominent factors can be used to determine the optimal price range of a property, which is useful for decision-making by both buyers and sellers.
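
The R² figures above express the share of the variance in sale prices that the model's predictions explain, computed as one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with made-up prices and predictions:

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


prices = [200.0, 250.0, 300.0, 350.0]     # hypothetical sale prices (thousands)
predicted = [210.0, 240.0, 310.0, 340.0]  # hypothetical model predictions
print(round(r_squared(prices, predicted), 3))  # 0.968
```

An R² of 86% therefore means the residual error is 14% of the variance around the mean price.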

Findings

By using the combination of MLR and SHAP analysis, it is noticeable that general living area, basement size and age of remodelling are the top three most important variables in determining the house’s price in the Ames dataset, while properties with more rooms/bathrooms, larger land area and closer proximity to the CBD or to the South of Melbourne are more expensive in the Melbourne dataset. These important factors can be used to estimate the best price range for a housing property for better decision-making.

Research limitations/implications

A limitation of this study is that the distribution of housing prices is highly skewed. This is expected, because property prices typically cluster at the lower end and only a few houses are highly priced. As mentioned before, MLR can effectively help evaluate the likelihood ratio of each variable towards these categories. However, housing price is inherently continuous and must be converted into a categorical variable, and the most effective way to categorize the data remains an open question.
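
The limitation above concerns converting a continuous, right-skewed price into categories for MLR. One common option, shown here only as an illustration and not necessarily the authors' choice, is quantile binning, which places roughly the same number of properties in each price band even when prices cluster at the low end. The prices and the choice of three bands are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior cut points so that each bin holds roughly len(values) / n_bins items."""
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // n_bins] for k in range(1, n_bins)]


def to_category(value: float, edges: Sequence[float]) -> int:
    """Index of the price band (0 = cheapest) that `value` falls into."""
    return bisect_right(edges, value)


prices = [210, 220, 230, 240, 250, 260, 280, 300, 900, 1500, 2500, 5000]  # skewed, thousands
edges = quantile_edges(prices, 3)
bands = [to_category(p, edges) for p in prices]
print(edges)  # [250, 900]
print(bands)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Equal-width bands over the same range (about 1,600 each) would put nine of the twelve properties in the cheapest band, which illustrates why the choice of categorization matters for skewed prices.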

Originality/value

The key contribution of this paper is the use of an explainable machine learning approach to identify the prominent factors of housing price determination, which can be used to determine the optimal price range of a property and support decision-making by both buyers and sellers.

Details

International Journal of Housing Markets and Analysis, vol. 16 no. 5

Type: Research Article

ISSN: 1753-8270

Article

Publication date: 25 April 2022

Longitudinal modelling of housing prices with machine learning and temporal regression

Yu Zhang, Arnab Rahman and Eric Miller

Abstract

Purpose

The purpose of this paper is to model housing price temporal variations and to predict price trends within the context of land use–transportation interactions using machine learning methods based on longitudinal observation of housing transaction prices.

Design/methodology/approach

This paper examines three machine learning (ML) algorithms (linear regression, random forest and decision trees) applied to housing price trends from 2001 to 2016 in the Greater Toronto and Hamilton Area, with particular interest in the role of accessibility in modelling housing prices. It compares the performance of the ML algorithms with traditional temporal lagged regression models.
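
A temporal lagged regression explains the current price partly by the price observed in the previous period. The sketch below fits the simplest such model, price_t = a + b · price_{t-1}, by ordinary least squares. The annual price series is hypothetical, and the study's models also include accessibility, physical and socio-economic covariates that are omitted here.

```python
from typing import Sequence, Tuple


def fit_lag1(series: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of y_t = a + b * y_{t-1}; returns (a, b)."""
    x = series[:-1]  # lagged prices
    y = series[1:]   # current prices
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict_next(series: Sequence[float], a: float, b: float) -> float:
    """One-step-ahead forecast from the last observed price."""
    return a + b * series[-1]


prices = [100.0, 110.0, 121.0, 133.1, 146.41]  # hypothetical index growing 10% per year
a, b = fit_lag1(prices)
print(round(predict_next(prices, a, b), 2))  # 161.05
```

Because this toy series grows by exactly 10% per period, the fit recovers a ≈ 0 and b ≈ 1.1; real price series would add noise and covariates.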

Findings

The empirical results show that the ML algorithms achieve good accuracy (R² of 0.873 after cross-validation) and that the temporal regression produces competitive results (R² of 0.876). Temporal lag effects are found to play a key role in housing price modelling, along with physical conditions and socio-economic factors. Accessibility effects on housing prices differ by mode and activity type.

Originality/value

Housing prices have been extensively modelled through hedonic-based spatio-temporal regression and ML approaches. However, the interdependence between transportation and land use makes price determination a complex process, and different longitudinal analysis methods are rarely compared. The findings present the longitudinal dynamics of housing market variation to housing planners.

Details

International Journal of Housing Markets and Analysis, vol. 16 no. 4

Type: Research Article

ISSN: 1753-8270

Article

Publication date: 14 February 2023

Prognosis of entrepreneurial traits among agricultural undergraduate students in India using machine learning

Sapna Jarial and Jayant Verma

Abstract

Purpose

This study aimed to understand the agri-entrepreneurial traits of undergraduate university students using machine learning (ML) algorithms.

Design/methodology/approach

This study used a conceptual framework of individual-level determinants of entrepreneurship together with ML. A Google Survey instrument on a 5-point scale was administered to 656 students in different sections of the same class during regular virtual classes in 2021. The datasets were analyzed and compared using ML.

Findings

Entrepreneurial traits existed among students before they attended undergraduate entrepreneurship courses. Establishing strong partnerships (0.359), learning (0.347) and people-organizing ability (0.341) emerged as promising correlated entrepreneurial traits. Female students exhibited fewer entrepreneurial traits than male students. The random forest model achieved 60% accuracy in trait prediction, compared with gradient boosting (58.4%), linear regression (56.8%), ridge (56.7%) and lasso regression (56.0%). Thus, the ML model appeared to be unsuitable for predicting entrepreneurial traits. Quality data are important for accurate trait prediction.

Research limitations/implications

Further studies can validate K-nearest neighbors (KNN) and support vector machine (SVM) models against random forest to test the claim that ML models cannot be used for entrepreneurial trait prediction.

Originality/value

This research is unique in using ML models, such as random forest, gradient boosting and lasso regression, to predict the entrepreneurial traits of students in the agricultural domain.

Details

Journal of Agribusiness in Developing and Emerging Economies, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2044-0839

Open Access

Article

Publication date: 19 August 2022

Stock market prediction by applying big data mining

Bedour M. Alshammari, Fairouz Aldhmour, Zainab M. AlQenaei and Haidar Almohri

Abstract

Purpose

There is a gap in knowledge about the Gulf Cooperation Council (GCC) because most studies are conducted in countries outside the Gulf region, such as China, India, the US and Taiwan. The stock market contains rich, valuable and considerable data, and these data need careful analysis to support good decisions that can increase business efficiency. Data mining techniques offer data processing tools and applications that improve decision-making. This study aims to predict the Kuwait stock market by applying big data mining.

Design/methodology/approach

The study uses quantitative techniques, that is, mathematical and statistical models that describe a variety of relationships among variables. Four techniques were implemented to predict the direction of stock market returns: logistic regression, decision trees, support vector machine and random forest.

Findings

All variables are statistically significant at the 5% level except gold price and oil price. The variables that do not influence the direction of Boursa Kuwait's rate of return are money supply and gold price, whereas the Kuwait index has the highest coefficient. Furthermore, the variable with the highest score for the direction of the rate of return is the firms, and the overall accuracy of the four models is nearly 50%.
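
An overall accuracy of nearly 50% for predicting whether returns go up or down is close to what a coin flip achieves, so such results are best read against a naive baseline. The sketch below converts returns into direction labels and scores a classifier's predictions alongside an always-predict-the-majority-direction baseline. The returns, the predictions and the helper names are hypothetical.

```python
from typing import List, Sequence


def directions(returns: Sequence[float]) -> List[int]:
    """1 if the return is positive (market up), else 0."""
    return [1 if r > 0 else 0 for r in returns]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of days whose predicted direction matches the actual direction."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train: Sequence[int], n: int) -> List[int]:
    """Predict the most frequent training direction for every test day."""
    label = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return [label] * n


train_returns = [0.004, -0.002, 0.001, 0.003, -0.001, 0.002]
test_returns = [0.002, -0.003, 0.001, -0.002]
model_pred = [1, 1, 0, 0]  # hypothetical classifier output for the test days

y_train, y_test = directions(train_returns), directions(test_returns)
print(accuracy(y_test, model_pred))  # 0.5
print(accuracy(y_test, majority_baseline(y_train, len(y_test))))  # 0.5
```

In this toy case the classifier does no better than the baseline, which is the comparison that a reported accuracy near 50% invites.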

Research limitations/implications

The limitations of this study are as follows: (1) location: the study is limited to the Kuwait Stock Exchange; (2) time: the study was completed within the academic years 2019-2020 and 2020-2021, and the COVID-19 pandemic, which occurred in 2020 during data collection and analysis, was a major obstacle; (3) data: the Kuwait Stock Exchange data were collected from May 2019 to March 2020, while the data on factors affecting the stock exchange were collected in July 2020 because of the pandemic.

Originality/value

The study uses new titles, variables and techniques, such as data mining, to predict the Kuwait stock market. There are not enough studies that use data mining to predict the stock market in the GCC, especially in Kuwait. This leaves a gap in knowledge about the GCC, as most studies focus on other countries, such as China, India, the US and Taiwan.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 2

Type: Research Article

ISSN: 1985-9899
