
Machine learning prediction of factors affecting Major League Baseball (MLB) game attendance: algorithm comparisons and macroeconomic factor of unemployment

Juho Park (Korea Institute of Sport Science, Nowon-gu, South Korea)
Junghwan Cho (Halla University, Wonju, South Korea)
Alex C. Gang (Department of Educational Leadership and Sport Management, Washington State University, Pullman, Washington, USA)
Hyun-Woo Lee (Department of Health and Kinesiology, Texas A&M University, College Station, Texas, USA)
Paul M. Pedersen (Department of Kinesiology, School of Public Health, Indiana University, Bloomington, Indiana, USA)

International Journal of Sports Marketing and Sponsorship

ISSN: 1464-6668

Article publication date: 8 February 2024

Issue publication date: 19 March 2024




This study aims to identify a highly accurate automated machine learning algorithm that sport practitioners can use to determine the specific factors that predict Major League Baseball (MLB) attendance. Furthermore, by predicting attendance for each league (American League and National League) and each division in MLB, the authors identify the specific factors that increase accuracy, discuss them and provide implications for marketing strategies for academics and practitioners in sport.


This study used six years of daily MLB game data (2014–2019). The predictors included game performance, weather and the unemployment rate, and the attendance rate served as the outcome variable. Random Forest, Lasso regression and XGBoost models were used to build the prediction models, and the analysis was conducted using Python 3.7.


After its hyperparameters were tuned, the XGBoost model performed best in forecasting the attendance rate, with an RMSE of 0.14 and an R² of 0.62. The most influential variable in the model was “Rank”, with a value of 0.247, followed in order by “Day of the week”, “Home team” and “Day/Night game”. The “Unemployment rate”, as a macroeconomic factor, had a value of 0.06, and the weather factors had a combined value of 0.147.
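For readers less familiar with the two accuracy metrics reported here, the following is a minimal sketch of how RMSE and R² are conventionally computed from observed and predicted attendance rates. It is not the authors' code: the function names `rmse` and `r_squared` are hypothetical, and the sample values are made up purely for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical attendance rates (illustrative only, not data from the study).
observed = [0.50, 0.70, 0.90, 0.60]
predicted = [0.55, 0.65, 0.80, 0.70]

print(f"RMSE: {rmse(observed, predicted):.3f}")
print(f"R2:   {r_squared(observed, predicted):.3f}")
```

A lower RMSE means predictions sit closer to the observed attendance rates on average, while R² gives the share of variance in the attendance rate that the model accounts for.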


This research highlights the unemployment rate as a determinant of MLB game attendance rates. Beyond contextual elements such as climate, the findings underscore the significance of economic factors, particularly unemployment rates, and call for further investigation of these factors to gain a more comprehensive understanding of game attendance.



Park, J., Cho, J., Gang, A.C., Lee, H.-W. and Pedersen, P.M. (2024), "Machine learning prediction of factors affecting Major League Baseball (MLB) game attendance: algorithm comparisons and macroeconomic factor of unemployment", International Journal of Sports Marketing and Sponsorship, Vol. 25 No. 2, pp. 382-395.



Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited
