Search results

1 – 10 of over 2000
Book part
Publication date: 6 September 2019

Son Nguyen, Gao Niu, John Quinn, Alan Olinsky, Jonathan Ormsbee, Richard M. Smith and James Bishop

Abstract

In recent years, the problem of classification with imbalanced data has attracted growing attention in the data-mining and machine-learning communities because imbalanced data have become abundant in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study has shown that all of the classification methods have difficulty when working with the imbalanced data, with KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve in overall performance. In particular, the random forest, in combination with the random over-sampling technique, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate).
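The balanced accuracy defined in this abstract (the average of the true-positive rate and the true-negative rate) can be illustrated with a minimal Python sketch. The function name `balanced_accuracy` and the toy labels are assumptions made for illustration; they are not taken from the chapter.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Average of the true-positive rate and the true-negative rate."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)  # share of the positive (minority) class detected
    tnr = tn / (tn + fp)  # share of the negative (majority) class detected
    return (tpr + tnr) / 2


# Made-up toy labels: 8 majority (0) and 2 minority (1) observations.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10  # a classifier that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, score)
```

On this toy input the plain accuracy looks high although no minority observation is detected, while the balanced accuracy does not reward ignoring the minority class. This is one reason the measure suits imbalanced data.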

We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before under-sampling them. It has further improved the balanced accuracy of the Random Forest to 83.7%, making it the best approach for the imbalanced data.
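The idea of removing "easy" majority observations before under-sampling can be sketched roughly as follows. The abstract does not say how "easy" is determined, so this sketch assumes a hypothetical preliminary score (an estimated probability of belonging to the minority class) and a hypothetical threshold. It illustrates the general idea and is not the authors' procedure.

```python
import random


def drop_easy_then_undersample(rows, labels, scores, threshold=0.1, seed=0):
    """Drop 'easy' majority rows, then randomly under-sample the rest.

    labels: 0 = majority class, 1 = minority class.
    scores: ASSUMED preliminary estimate of the minority-class probability
            for each row (hypothetical; the source does not specify this).
    threshold: ASSUMED cut-off below which a majority row counts as 'easy'.
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]

    # Keep only majority rows that are not trivially recognised as majority.
    hard_majority = [i for i in majority if scores[i] >= threshold]

    # Random under-sampling of the remaining majority rows to minority size.
    rng = random.Random(seed)
    k = min(len(minority), len(hard_majority))
    kept_majority = rng.sample(hard_majority, k)

    keep = sorted(minority + kept_majority)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Made-up toy data: row ids 0-7 are majority, 8-9 are minority.
rows = list(range(10))
labels = [0] * 8 + [1] * 2
scores = [0.01, 0.02, 0.03, 0.04, 0.3, 0.4, 0.5, 0.2, 0.9, 0.6]

new_rows, new_labels = drop_easy_then_undersample(rows, labels, scores)
print(new_rows, new_labels)
```

In this sketch the resampled set is balanced, and the majority rows that survive are drawn only from those the preliminary score did not already separate clearly.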

Details

Advances in Business and Management Forecasting
Type: Book
ISBN: 978-1-78754-290-7

Article
Publication date: 9 September 2022

Lucie Maruejols, Hanjie Wang, Qiran Zhao, Yunli Bai and Linxiu Zhang

Abstract

Purpose

Despite rising incomes and reduction of extreme poverty, the feeling of being poor remains widespread. Support programs can improve well-being, but they first require identifying which households judge their income insufficient to meet their basic needs and which factors are associated with subjective poverty.

Design/methodology/approach

First, households report the income level they judge sufficient to make ends meet and are classified as subjectively poor if their own monetary income is below that level. Second, the study compares the performance of three machine learning algorithms, the random forest, support vector machines and least absolute shrinkage and selection operator (LASSO) regression, applied to a set of socioeconomic variables to predict subjective poverty status.
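The labelling rule described above (a household is subjectively poor when its own income falls below the level it reports as sufficient) can be sketched in a few lines of Python. The field names and the household figures are made up for illustration and do not come from the study's data.

```python
def is_subjectively_poor(actual_income, sufficient_income):
    """True when actual income is below the self-reported sufficient level."""
    return actual_income < sufficient_income


# Made-up households; "income" and "sufficient" are hypothetical field names.
households = [
    {"id": "A", "income": 12000, "sufficient": 15000},
    {"id": "B", "income": 30000, "sufficient": 25000},
    {"id": "C", "income": 20000, "sufficient": 20000},
]

poverty_labels = {
    h["id"]: is_subjectively_poor(h["income"], h["sufficient"])
    for h in households
}
print(poverty_labels)
```

Labels of this kind would serve as the binary target that the classifiers in the second step are trained to predict from the socioeconomic variables.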

Findings

The random forest generates 85.29% correct predictions using a range of income and non-income predictors, closely followed by the other two techniques. For the middle-income group, the LASSO regression outperforms random forest. Subjective poverty is mostly associated with monetary income for low-income households. However, a combination of low income, low endowment (land, consumption assets) and unusually large expenditure (medical, gifts) constitutes the key predictors of feeling poor for the middle-income households.

Practical implications

To reduce the feeling of poverty, policy intervention should continue to focus on increasing incomes. However, improvements in nonincome domains such as health expenditure, education and family demographics can also relieve the feeling of income inadequacy. Methodologically, which algorithm performs better depends on the data at hand.

Originality/value

For the first time, the authors show that prediction techniques can reliably identify subjective poverty prevalence, with an example from rural China. The analysis pays specific attention to the modest-income households, who may feel poor but not be identified as such by objective poverty lines, and is relevant when policy-makers seek to address the “next step” after ending extreme poverty. Prediction performance and mechanisms for three machine learning algorithms are compared.

Details

China Agricultural Economic Review, vol. 15 no. 2
Type: Research Article
ISSN: 1756-137X

Book part
Publication date: 8 November 2021

Taniya Ghosh and Sakshi Agarwal

Abstract

Significant evidence in the literature points to money demand instability and therefore inaccurate forecasting. In view of this issue, this chapter uses a method that is innovative for the money demand literature, namely a machine learning model, to predict money demand. Specifically, this chapter uses Random Forest Regression to predict money demand in the Indian context, using monthly data over the period April 1996 to December 2018 and the variables usually used in the literature. The chapter finds that in money demand prediction, the Random Forest Regression performs fairly well. The results are also compared to traditional models, and it is found that the Random Forest Regression model has the potential to enhance the prediction of money demand over what traditional models predict.

Details

Environmental, Social, and Governance Perspectives on Economic Development in Asia
Type: Book
ISBN: 978-1-80117-594-4

Article
Publication date: 13 February 2024

Marcelo Cajias and Anna Freudenreich

Abstract

Purpose

This is the first article to apply a machine learning approach to the analysis of time on market on real estate markets.

Design/methodology/approach

The random survival forest approach is introduced to the real estate market. The most important predictors of time on market are revealed and it is analyzed how the survival probability of residential rental apartments responds to these major characteristics.

Findings

Results show that price, living area, construction year, year of listing and the distances to the next hairdresser, bakery and city center have the greatest impact on the marketing time of residential apartments. The time on market for an apartment in Munich is lowest at a price of 750 € per month, an area of 60 m², a construction year of 1985 and a distance of 200–400 meters from the important amenities.

Practical implications

The findings might be interesting for private and institutional investors to derive real estate investment decisions and implications for portfolio management strategies and ultimately to minimize cash-flow failure.

Originality/value

Although machine learning algorithms have been applied frequently on the real estate market for the analysis of prices, their application for examining time on market is completely novel. This is the first paper to apply a machine learning approach to survival analysis on the real estate market.

Details

Journal of Property Investment & Finance, vol. 42 no. 2
Type: Research Article
ISSN: 1463-578X

Article
Publication date: 8 August 2022

Ean Zou Teoh, Wei-Chuen Yau, Thian Song Ong and Tee Connie

Abstract

Purpose

This study aims to develop a regression-based machine learning model to predict housing prices and to determine and interpret the factors that contribute to housing prices, using different publicly available data sets. The significant determinants that affect housing prices are first identified by using multinomial logistic regression (MLR) based on the level of relative importance. A comprehensive study is then conducted by using SHapley Additive exPlanations (SHAP) analysis to examine the features that cause the major changes in housing prices.

Design/methodology/approach

Predictive analytics is an effective way to deal with uncertainties in process modelling and improve decision-making for housing price prediction. The focus of this paper is two-fold. First, the authors apply regression analysis to investigate how well the housing independent variables contribute to the housing price prediction. Two data sets are used for this study, namely, the Ames Housing dataset and the Melbourne Housing dataset. For both data sets, random forest regression performs the best, achieving an average R² of 86% for the Ames dataset and 85% for the Melbourne dataset. Second, multinomial logistic regression is adopted to investigate and identify the determinants of housing sales price. For the Ames dataset, the authors find that the three most significant variables in determining the housing price are the general living area, basement size and age of remodelling. As for the Melbourne dataset, properties having more rooms/bathrooms, larger land size and closer distance to the central business district (CBD) are higher priced. This is followed by a comprehensive analysis of how these determinants contribute to the predictability of the selected regression model by using explainable SHAP values. These prominent factors can be used to determine the optimal price range of a property, which is useful for decision-making for both buyers and sellers.

Findings

By using the combination of MLR and SHAP analysis, it is noticeable that general living area, basement size and age of remodelling are the top three most important variables in determining the house’s price in the Ames dataset, while properties with more rooms/bathrooms, larger land area and closer proximity to the CBD or to the South of Melbourne are more expensive in the Melbourne dataset. These important factors can be used to estimate the best price range for a housing property for better decision-making.

Research limitations/implications

A limitation of this study is that the distribution of the housing prices is highly skewed, although it is normal for property prices to cluster at the lower end, with only a few houses highly priced. As mentioned before, MLR can effectively help in evaluating the likelihood ratio of each variable towards these categories. However, housing price is originally continuous, so the price must be converted to a categorical type, and the most effective method to categorize the data is still questionable.

Originality/value

The key point of this paper is the use of an explainable machine learning approach to identify the prominent factors of housing price determination. These factors could be used to determine the optimal price range of a property, which is useful for decision-making for both buyers and sellers.

Details

International Journal of Housing Markets and Analysis, vol. 16 no. 5
Type: Research Article
ISSN: 1753-8270

Article
Publication date: 25 April 2022

Yu Zhang, Arnab Rahman and Eric Miller

Abstract

Purpose

The purpose of this paper is to model housing price temporal variations and to predict price trends within the context of land use–transportation interactions using machine learning methods based on longitudinal observation of housing transaction prices.

Design/methodology/approach

This paper examines three machine learning (ML) algorithms (linear regression, random forest and decision trees) applied to housing price trends from 2001 to 2016 in the Greater Toronto and Hamilton Area, with particular interest in the role of accessibility in modelling housing prices. It compares the performance of the ML algorithms with traditional temporal lagged regression models.

Findings

The empirical results show that the ML algorithms achieve good accuracy (R² of 0.873 after cross-validation), and the temporal regression produces competitive results (R² of 0.876). Temporal lag effects are found to play a key role in housing price modelling, along with physical conditions and socio-economic factors. Accessibility effects on housing prices differ by mode and activity type.
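For readers unfamiliar with the R² figures quoted in these findings, the coefficient of determination is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up prices and predictions; it is a generic illustration, not the paper's evaluation code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has no variance, so R^2 is undefined")
    return 1 - ss_res / ss_tot


# Made-up observed and predicted housing prices (arbitrary units).
prices = [300.0, 350.0, 400.0, 450.0]
predicted = [310.0, 340.0, 410.0, 440.0]

r2 = r_squared(prices, predicted)
print(round(r2, 3))
```

A value close to 1 means the predictions explain most of the variation in the observed prices. Comparing two models on this measure, as the paper does, is only meaningful when both are evaluated on the same held-out data.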

Originality/value

Housing prices have been extensively modelled through hedonic-based spatio-temporal regression and ML approaches. However, the mutually dependent relationship between transportation and land use makes price determination a complex process, and the comparison of different longitudinal analysis methods is rarely considered. The finding presents the longitudinal dynamics of housing market variation to housing planners.

Details

International Journal of Housing Markets and Analysis, vol. 16 no. 4
Type: Research Article
ISSN: 1753-8270

Open Access
Article
Publication date: 19 August 2022

Bedour M. Alshammari, Fairouz Aldhmour, Zainab M. AlQenaei and Haidar Almohri

Abstract

Purpose

There is a gap in knowledge about the Gulf Cooperation Council (GCC) because most studies are undertaken in countries outside the Gulf region, such as China, India, the US and Taiwan. The stock market contains rich, valuable and considerable data, and these data need careful analysis for good decisions to be made that can lead to increases in the efficiency of a business. Data mining techniques offer data processing tools and applications that support decision-makers. This study aims to predict the Kuwait stock market by applying big data mining.

Design/methodology/approach

The methodology is quantitative, using mathematical and statistical models that describe a varied array of relationships among variables. Four techniques were implemented to predict the direction of stock market returns: logistic regression, decision trees, support vector machine and random forest.

Findings

The results show that all variables are statistically significant at the 5% level except gold price and oil price. The variables that have no influence on the direction of the rate of return of Boursa Kuwait are money supply and gold price, whereas the Kuwait index has the highest coefficient. Furthermore, the variable with the highest score in affecting the direction of the rate of return is the firms, and the overall accuracy of the four models is nearly 50%.

Research limitations/implications

Some of the limitations identified for this study are as follows: (1) location limitation: Kuwait Stock Exchange; (2) time limitation: the study had to be completed within the academic years 2019-2020 and 2020-2021, and the coronavirus pandemic (COVID-19) in 2020 was a major obstacle during data collection and analysis; (3) data limitation: the Kuwait Stock Exchange data were collected from May 2019 to March 2020, while the factors affecting the stock exchange data were collected in July 2020 due to the corona pandemic.

Originality/value

The study used new titles, variables and techniques such as using data mining to predict the Kuwait stock market. There are no adequate studies that predict the stock market by data mining in the GCC, especially in Kuwait. There is a gap in knowledge in the GCC as most studies are in foreign countries, such as China, India, the US and Taiwan.

Details

Arab Gulf Journal of Scientific Research, vol. 40 no. 2
Type: Research Article
ISSN: 1985-9899

Article
Publication date: 14 February 2023

Sapna Jarial and Jayant Verma

Abstract

Purpose

This study aimed to understand the agri-entrepreneurial traits of undergraduate university students using machine learning (ML) algorithms.

Design/methodology/approach

This study used a conceptual framework of individual-level determinants of entrepreneurship and ML. The Google Survey instrument was prepared on a 5-point scale and administered to 656 students in different sections of the same class during regular virtual classrooms in 2021. The datasets were analyzed and compared using ML.

Findings

Entrepreneurial traits existed among students before attending undergraduate entrepreneurship courses. Establishing strong partnerships (0.359), learning (0.347) and people-organizing ability (0.341) were promising correlated entrepreneurial traits. Female students exhibited fewer entrepreneurial traits than male students. The random forest model exhibited 60% accuracy in trait prediction against gradient boosting (58.4%), linear regression (56.8%), ridge (56.7%) and lasso regression (56.0%). Thus, the ML model appeared to be unsuitable to predict entrepreneurial traits. Quality data are important for accurate trait predictions.

Research limitations/implications

Further studies can validate K-nearest neighbors (KNN) and support vector machine (SVM) models against random forest to support the statement that the ML model cannot be used for entrepreneurial trait prediction.

Originality/value

This research is unique because ML models, such as random forest, gradient boosting and lasso regression, are used for entrepreneurial trait prediction by agricultural domain students.

Details

Journal of Agribusiness in Developing and Emerging Economies, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2044-0839

Article
Publication date: 1 October 2018

Vinod Nistane and Suraj Harsha

Abstract

Purpose

In rotary machines, bearing failure is one of the major causes of machinery breakdown. Monitoring bearing degradation is a major concern for the prevention of bearing failures. This paper aims to present a combination of the stationary wavelet decomposition and extra-trees regression (ETR) for the evaluation of bearing degradation.

Design/methodology/approach

The higher-order cumulant features are extracted from the bearing vibration signals by using the stationary wavelet decomposition (stationary wavelet transform [SWT]). The extracted features are then subjected to the ETR to obtain the normal and failure states. A dominance level curve is built using the dissimilarity data of the test object and retained as a health degradation indicator for the evaluation of bearing health.

Findings

Experiments are conducted to verify and assess the effectiveness of ETR for evaluating bearing degradation. To justify the preeminence of the recommended approach, it is compared with the performance of random forest regression and multi-layer perceptron regression.

Originality/value

The experimental results indicated that the presently adopted method performs better, detecting the degradation more accurately at an early stage. Furthermore, diagnostics and prognostics have been receiving much attention in the field of vibration, and they play a significant role in avoiding accidents.

Book part
Publication date: 28 September 2023

M Anand Shankar Raja, Keerthana Shekar, B Harshith and Purvi Rastogi

Abstract

The COVID-19 pandemic has recently had an impact on the stock market all over the globe. A thorough review of the literature that included the most cited articles and articles from well-known databases revealed that earlier research in the field had not specifically addressed how the BRIC stock markets responded to the COVID-19 pandemic. The data regarding COVID-19 were collected from the World Health Organization (WHO) website, and the stock market data were collected from Yahoo Finance and the respective country’s stock exchange. A random forest regression algorithm takes the closing price of respective stock indices as target variables and COVID-19 variables as input variables. Using this algorithm, a model is fit to the data and is visualised using line plots. This study’s findings highlight a relationship between the COVID-19 variables and stock market indices. In addition, the stock market of BRIC countries showed a high correlation, especially with the Shanghai Composite Stock Index with a correlation value of 0.7 and above. Brazil took the worst hit in the studied duration by declining approximately 45.99%, followed by India by 37.76%. Finally, the data set’s model fit, which employed the random forest machine learning method, produced R² values of 0.972, 0.005, 0.997, and 0.983 and mean percentage errors of 1.4, 0.8, 0.9, and 0.8 for Brazil, Russia, India, and China (BRIC), respectively. Even now, two years after the coronavirus pandemic started, the Brazilian stock index has not yet returned to its pre-pandemic level.

Details

Digital Transformation, Strategic Resilience, Cyber Security and Risk Management
Type: Book
ISBN: 978-1-83797-009-4
