The effect of COVID-19 on the Egyptian exchange using principal component analysis

Purpose
Since the beginning of 2020, economies have faced many changes as a result of the coronavirus disease 2019 (COVID-19) pandemic. The effect of COVID-19 on the Egyptian Exchange (EGX) is investigated in this research.
Design/methodology/approach
To explore the impact of COVID-19, three periods were considered: (1) 17 months before the spread of COVID-19 and the start of the lockdown, (2) 17 months after the spread of COVID-19 and during the lockdown and (3) 34 months covering the whole period (before and during COVID-19). Due to the large number of variables that could be considered, a dimensionality reduction method, principal component analysis (PCA), is followed. This method helps in determining the individual stocks that contribute most to the main EGX index (EGX 30). The PCA also addresses the multicollinearity between the variables under investigation. Additionally, a principal component regression (PCR) model is developed to predict the future behavior of the EGX 30.
Findings
The results demonstrate that the first three principal components (PCs) could be considered to explain 89%, 85% and 88% of data variability (1) before COVID-19, (2) during COVID-19 and (3) over the whole period, respectively. Furthermore, the sectors of food and beverage, basic resources and real estate have not been affected by COVID-19. The resulting PCR model performs very well. This is concluded by comparing the observed values of EGX 30 with the predicted ones (R-squared estimated as 0.99).
Originality/value
To the best of our knowledge, no research has been conducted to investigate the effect of COVID-19 on the EGX following an unsupervised machine learning method.


Introduction
Since the end of 2019, economies have faced many challenges due to the imposed lockdown and public fear. Financial markets play a significant role in countries' economies (Mishkin, 2010). However, financial markets are very fragile in the face of sudden changes (Cont, 2001). Hence, inspecting the effect of coronavirus disease 2019 (COVID-19) on financial markets is of significant importance.
Many studies have investigated the effect of the pandemic on the stock markets of different countries [cf. (Amin et al., 2021; Hong et al., 2021; Uddin et al., 2021; Yousfi et al., 2021; He et al., 2020; Sharif et al., 2020)]. Awad (2020) and Elayed and Abdelrhim (2020) investigated the impact of the number of COVID-19 cases on the EGX. Authors above studied the contribution of individual stocks to EGX 30 (which includes the major 30 individual stocks in terms of activity and liquidity traded in the EGX) to identify stable sectors in the EGX.
To our knowledge, no study has investigated the effect of COVID-19 on the EGX following an unsupervised machine learning approach. In this paper, results are drawn from stock trading behavior instead of linking the number of COVID-19 cases to the EGX. In this way, any inefficiency or inaccuracy in the announced number of COVID-19 cases is avoided. This paper aims to help detect the individual stocks most affected by COVID-19. The results of this research will aid investors in deciding which stocks to trade before and during the COVID-19 period. Also, this research would help the government identify the negatively affected sectors in order to support them. Finally, researchers can study the spillover effect on the EGX returns.
Principal component analysis (PCA) is a dimensionality reduction technique whose main goal is to express the original variables with a much smaller number of PCs that explain most of the data variability. A data set with n variables can be converted into a new lower-dimensional set of p (where p < n) principal components (PCs) following the PCA. Each PC is a linear combination of the original n variables. PCA addresses the two main challenges in financial time series: (1) multicollinearity and (2) the huge number of dimensions (30 stocks).
Recently, many studies have developed models that could work efficiently for financial market prediction (Hargreaves, 2019; Zhong and Enke, 2019; Cavalcante et al., 2016). As stock markets could be affected by a vast number of factors, it is sensible to rely on the most important variables to predict future behavior. Many researchers have developed prediction models following dimensionality reduction techniques [cf. (Cao and Wang, 2020; Ghorbani and Chong, 2020; Zhang, 2018; Waqar et al., 2017)]. In this research, the future behavior of the EGX is predicted following the principal component regression (PCR) model, which is a regression analysis relying on the PCA algorithm. The period included in the study comprises a period before the pandemic and a period since the start of the spread of COVID-19.
The main results reveal that the first three PCs could be considered to explain most of the data variability (1) before COVID-19, (2) during COVID-19 and (3) over the whole period. Moreover, the sectors of food and beverage, basic resources and real estate have not been affected by COVID-19. The developed PCR model performs very well. This is concluded by comparing the observed values of EGX 30 against the predicted ones (R-squared estimated as 0.99).
The rest of the paper is divided into five sections. Section 2 reviews the literature on the effect of COVID-19 on financial markets. Section 3 describes the data and their preprocessing. Section 4 illustrates the research methodology and the PCA and PCR algorithms. Main results and analyses are displayed in Section 5. Finally, Section 6 concludes the paper.

Literature review
In this section, we review previous studies that investigated the effect of COVID-19 on financial markets. Baker et al. (2020) studied the effect of COVID-19 on the US stock market. They concluded that the crash caused by COVID-19 was very dramatic compared to previous crashes caused by infectious diseases.
Additionally, Mazur et al. (2021) investigated the US stock market performance during the COVID-19 pandemic. The main findings show that natural gas, food, health care and software stocks earned high returns. On the other hand, the petroleum, real estate, entertainment and hospitality sectors fell drastically. Liu et al. (2021) studied the effect of COVID-19 on the stock market in China. They found that the pandemic increased stock crash risk. Izzeldin et al. (2021) investigated the impact of COVID-19 on the stock markets in the G7 countries. The authors found that health care and consumer services were the most severely affected sectors. The technology sector was marginally hit, as the enforced lockdown pushed people to exploit web-based entertainment and other distraction sources. Contessi and Pace (2021) studied the spread of COVID-19 and stock market collapses in 18 major countries. The main results reveal that the instability was transmitted from the stock market in China to all other markets, especially the European ones.
Amin et al. (2021) studied the effect of the spread of COVID-19 on financial markets in three regions: Central America, North America and South America. The results show that COVID-19 had a negative impact on the stock markets. However, there was an insignificant correlation between COVID-19 and the stock market in South America. Hong et al. (2021) investigated the association between COVID-19 and the instability of both stock return predictability and price volatility in the US. The results reveal a single break in return predictability and price volatility. Also, the timing of the break is synchronized with the COVID-19 outbreak. Uddin et al. (2021) studied the effect of the COVID-19 pandemic on stock market volatility and economic strength. The effect was evaluated by a set of selected country-level economic measures and factors. The results show that governments could minimize financial volatility by setting different economic policy responses.
Yousfi et al. (2021) performed a comparative assessment of the effect of the first and second waves of COVID-19 on the US stock market and its uncertainty. The paper presents the dynamic conditional correlation and asymmetric impacts of the waves on the association between the US and the Chinese stock markets before and during COVID-19. Also, the correlation between COVID-19 and the US returns and uncertainty during the pandemic was examined. The results support the existence of a spillover effect between the two stock markets. Also, a persistent link between the US returns, uncertainty and COVID-19 was observed. The results demonstrate that the pandemic caused harmful effects on financial markets in general and on the US economy in particular. He et al. (2020) studied the spillover effect of COVID-19 on the stock markets of countries such as China, Italy, South Korea, France, Spain, Germany, Japan and the US. The results illustrate that COVID-19 has a negative but short-term effect on the stock markets of affected countries. Additionally, the effect of COVID-19 on stock markets shows a bidirectional spillover association between Asian countries and European and American countries.

JHASS
Given the foregoing discussion and argument, we propose the following hypothesis:
H1. Some sectors in the EGX are affected by COVID-19 more than others.
To examine H1, we divided the period under investigation into three windows: before COVID-19, during COVID-19 and the whole period. The correlation between individual stocks contributing to each PC and the respective PCs was examined.

Data description and preprocessing
This research investigates the effect of COVID-19 on the EGX. To achieve the research goal, we studied the EGX 30 price index and its constituents in the period from August 1, 2018 to April 30, 2021 (670 observations). This period is divided into two main windows: (1) before COVID-19 (from August 1, 2018 to January 1, 2020) and (2) since the spread of COVID-19 (from January 1, 2020 to April 30, 2021). There were 20 individual stocks registered in EGX 30 for the two time windows. A description of the individual stocks' names, symbols and sectors is provided in Table 1.
As illustrated in Table 1, some data were missing. Deleting their corresponding records may generate a totally biased data set. Accordingly, the Missing Completely At Random (MCAR) test was run. The p-value for the Hawkins test of normality and homoscedasticity is 2.35e-26. This implies that multivariate normality or homoscedasticity (or both) is rejected. Provided that normality can be assumed, the hypothesis of MCAR is rejected at the 0.05 significance level. The Multiple Imputation by Chained Equations (MICE) approach that uses Classification And Regression Trees (CART) was followed to replace the missing values (Burgette and Reiter, 2010). It does not require parametric assumptions or data transformations to fit nonlinear relations and complex distributions. The algorithm was run 1,000 times, and the median of these runs was used to replace the missing values. Figure 1 displays the result of the multiple imputation.
Heteroskedasticity is one of the important features of financial time series that would influence the results of the PCA. Therefore, stock returns are used for the PCA instead of stock closing prices. Stock returns, $r_{tj}$, are calculated as
$$r_{tj} = p_{tj} - p_{t-1,j},$$
where $p_{tj} = \log(P_{tj})$ and $P_{tj}$ is the closing price at time $t$, where $t = 1, 2, \ldots, T$, for stock $j$, where $j = 1, 2, \ldots, J$.
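The return calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming that returns are first differences of log closing prices and that prices arrive as a (T, J) array with one column per stock; the function name `log_returns` and the example prices are hypothetical and do not come from the EGX data.

```python
import numpy as np

def log_returns(prices):
    """Return r[t, j] = log(P[t, j]) - log(P[t-1, j]) for a (T, J) price array."""
    prices = np.asarray(prices, dtype=float)
    if prices.ndim != 2 or prices.shape[0] < 2:
        raise ValueError("expected a (T, J) array with T >= 2")
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    log_prices = np.log(prices)          # p_tj = log(P_tj)
    return np.diff(log_prices, axis=0)   # one row fewer than the input

# Hypothetical closing prices for two stocks over four trading days
example_prices = np.array([[10.0, 50.0],
                           [11.0, 49.0],
                           [12.1, 49.0],
                           [11.0, 50.0]])
example_returns = log_returns(example_prices)
```

Because log returns are additive over time, the column sums of `example_returns` equal the log of the last price divided by the first, which gives a quick sanity check on real data.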

Research methodology
Many studies show that stock market data are highly correlated (Sharma and Banerjee, 2015; Solnik et al., 1996). One important feature of the PCA is its ability to address multicollinearity and high dimensionality. The PCA algorithm is run after data preprocessing. To find the PCs, we first need to standardize the data as
$$z_{tj} = \frac{r_{tj} - \bar{r}_j}{s_j},$$
where $\bar{r}_j$ is the mean and $s_j$ is the standard deviation for stock $j$, where $j = 1, 2, \ldots, J$. Then, the correlation matrix is calculated for the standardized values. The PCs are the eigenvectors of the correlation matrix. These eigenvectors are the directions that explain most of the data variance. Eigenvalues are the coefficients associated with the eigenvectors. Eigenvalues provide the amount of variance that could be captured by each PC.
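The steps of this section (standardize, compute the correlation matrix, take its eigenvectors and eigenvalues) can be sketched with NumPy alone. This is an illustrative sketch rather than the authors' implementation: the function name `pca_from_correlation`, the use of the sample standard deviation (`ddof=1`) and the simulated returns are all assumptions made for the example.

```python
import numpy as np

def pca_from_correlation(returns):
    """PCA of a (T, J) return array via the eigen-decomposition of its correlation matrix."""
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean(axis=0)
    std = returns.std(axis=0, ddof=1)            # sample standard deviation per stock
    z = (returns - mean) / std                   # standardized returns
    corr = np.corrcoef(z, rowvar=False)          # J x J correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder so PC1 comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                         # PC scores, one column per PC
    explained = eigvals / eigvals.sum()          # share of variance per PC
    return z, eigvals, eigvecs, scores, explained

# Simulated returns (not EGX data): five stocks driven by one common factor
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
simulated_returns = common + 0.5 * rng.normal(size=(200, 5))
z, eigvals, eigvecs, scores, explained = pca_from_correlation(simulated_returns)
```

Two properties are worth checking on any data set: the eigenvalues of a correlation matrix sum to the number of variables J, and the sample variance of each score column equals the corresponding eigenvalue.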

Results and analyses
The main results of the PCA are presented in this section. Tables 2-4 report the eigenvalue, percentage of explained variance and cumulative percentage of variance for each PC for the time windows: (1) before COVID-19, (2) during COVID-19 and (3) the whole period, respectively. To decide on the number of PCs that should be considered, three rules in the literature could be followed (Jolliffe, 2002): (1) consider the first PCs that explain at least 70 percent of cumulative variance, (2) determine in the scree plot the PC at which the percentage of explained variance stops improving (the elbow) and/or (3) choose PCs with eigenvalues greater than one, a rule known as Kaiser's rule (Kaiser, 1960). Tables 2-4 report that the first three PCs could explain about 89%, 85% and 88%, respectively, of data variability. This cumulative percentage of variance is considered satisfactory. Figure 2 illustrates the scree plots for the three time windows: before COVID-19, during COVID-19 and the whole period, respectively. The figure shows that after the fourth PC the added explanation of the variance is very low. Additionally, Tables 2-4 report that the eigenvalues attached to the first three PCs are greater than one. Accordingly, each of the first three PCs explains more variance than a single original variable. So, the three rules support retaining the first three PCs in the analysis.
Each panel in Figures 3-5 displays the ten individual stocks contributing most to the first three PCs for each time window: before COVID-19, during COVID-19 and the whole period, respectively. For the before COVID-19 time window, ORWE, HRHO, ACGC, OCDI and HELI are the highest stocks contributing to the first PC; ACGC, COMI, CIEB and EGTS are the highest stocks contributing to the second PC; and ORWE, ACGC, HELI, OCDI and COMI are the highest stocks contributing to the third PC. For the during COVID-19 time window, IRON, ESRS, EGTS, COMI, JUFO, SWDY, OCDI, EAST, MNHD and PHDC are the highest stocks contributing to the first PC; HELI, COMI ...
Figure 3. The ten stocks contributing most to the first three PCs before COVID-19, respectively
Figure 4. The ten stocks contributing most to the first three PCs during COVID-19, respectively
Figure 5. The ten stocks contributing most to the first three PCs for the whole period, respectively
Linear combinations of the variables that could be conveyed by the first three PCs for the whole period could be described by the following three equations:
Table 5 reports correlation coefficients between individual stocks and the first three PCs for the before COVID-19, during COVID-19 and the whole period, respectively. Studying the correlation between variables and the respective PCs is important for interpreting the results (Jolliffe, 2002). Let us consider the results reported in Table 5. As PC1 in all time windows explains 50% or more of the data variability, we focus on analyzing PC1 in the three time windows.
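Two of the selection rules discussed in this section (the 70 percent cumulative variance rule and Kaiser's rule) and the correlations between variables and PCs can be computed directly from the eigen-decomposition; the scree-plot elbow is a visual judgment and is not coded here. The sketch below is illustrative only. The function names are hypothetical, and the 4 x 4 equicorrelation matrix is a toy input chosen because its eigenvalues (2.5, 0.5, 0.5, 0.5) are known in closed form; it is not EGX data. For standardized variables, the correlation between variable j and PC k equals the eigenvector entry multiplied by the square root of the k-th eigenvalue.

```python
import numpy as np

def n_components_cumulative(eigvals, threshold=0.70):
    """Smallest number of leading PCs whose cumulative variance share reaches the threshold."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(eigvals))

def n_components_kaiser(eigvals):
    """Kaiser's rule: keep the PCs whose eigenvalue exceeds one."""
    return int(np.sum(np.asarray(eigvals, dtype=float) > 1.0))

def variable_pc_correlations(eigvals, eigvecs):
    """Correlation of each standardized variable (rows) with each PC (columns)."""
    return eigvecs * np.sqrt(eigvals)   # scales column k by sqrt(eigenvalue k)

# Toy correlation matrix: four variables with pairwise correlation 0.5
toy_corr = 0.5 * np.ones((4, 4)) + 0.5 * np.eye(4)
vals, vecs = np.linalg.eigh(toy_corr)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

k_cumulative = n_components_cumulative(vals)   # cumulative shares: 0.625, 0.75, ...
k_kaiser = n_components_kaiser(vals)           # only the eigenvalue 2.5 exceeds one
corr_with_pcs = variable_pc_correlations(vals, vecs)
```

On this toy input the two rules disagree (two components versus one), which illustrates why the paper checks several rules together before settling on a number of PCs. The sign of each eigenvector is arbitrary, so the correlations are interpretable only up to a sign flip per PC.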
The results show that ESRS, IRON, JUFO, OCDI, PHDC and TMGH are positively related to PC1 in all the time windows. This indicates that these individual stocks were not affected by COVID-19 and that they move together. This suggests that the sectors of food and beverage, basic resources and real estate have not been affected by COVID-19. Accordingly, for long-run investment decisions, investing in these individual stocks may be a more stable and sound decision. On the other hand, PIOH (financial services excluding banks) and MNHD (real estate) did not play a significant role during COVID-19. However, CCAP and EKHO (financial services excluding banks), EGCH (chemicals) and EGTS (travel and leisure) are positively correlated with PC1 during COVID-19.
Finally, let us consider the results of implementing the PCR. As past effects cannot be ignored, the prediction model is run considering the whole period. Figure 6 shows the results of the cross-validation method used to verify the best model. The figure shows that the minimal Mean Squared Error of Prediction (MSEP) is approached at the first PC. Therefore, PC1 is sufficient for predicting EGX 30 (98% of variability is explained with PC1).
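The PCR procedure with cross-validated MSEP can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not specify its cross-validation scheme, so plain k-fold on contiguous blocks is assumed; the function names (`fit_pcr`, `predict_pcr`, `cv_msep`) are hypothetical; and the data are simulated, with the response built as the average of six correlated series to mimic an index.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Regress y on the first n_components PC scores of standardized X."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    V = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]   # leading eigenvectors
    design = np.column_stack([np.ones(len(Z)), Z @ V])          # intercept + scores
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mean, std, V, coef

def predict_pcr(model, X):
    mean, std, V, coef = model
    return coef[0] + ((X - mean) / std) @ V @ coef[1:]

def cv_msep(X, y, n_components, n_folds=5):
    """Mean squared error of prediction from k-fold cross-validation (assumed scheme)."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    squared_errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = fit_pcr(X[train], y[train], n_components)
        squared_errors.append((y[test_idx] - predict_pcr(model, X[test_idx])) ** 2)
    return float(np.mean(np.concatenate(squared_errors)))

# Simulated data (not EGX data): six series sharing one factor, index = their average
rng = np.random.default_rng(1)
common = rng.normal(size=(300, 1))
X_sim = common + 0.3 * rng.normal(size=(300, 6))
y_sim = X_sim.mean(axis=1) + 0.01 * rng.normal(size=300)

msep_by_k = [cv_msep(X_sim, y_sim, k) for k in range(1, 7)]
best_k = int(np.argmin(msep_by_k)) + 1
fitted = predict_pcr(fit_pcr(X_sim, y_sim, best_k), X_sim)
r_squared = 1.0 - np.sum((y_sim - fitted) ** 2) / np.sum((y_sim - y_sim.mean()) ** 2)
```

Plotting `msep_by_k` against the number of components gives the kind of curve described for Figure 6. For time series, a forward-chaining split may be preferable to k-fold because it never trains on observations that follow the test block.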

Conclusion
This paper investigates the effect of COVID-19 on the Egyptian Exchange (EGX). This is achieved by applying an unsupervised machine learning method, principal component analysis (PCA). PCA is a dimensionality reduction technique, which aims to find linear combinations of the original variables that explain most of the data variability. A total period of three years was studied. This period includes 16 months before COVID-19 and 16 months during it. Three PCA models were built: (1) before COVID-19, (2) during COVID-19 and (3) the whole period. The study finds that for the three periods the first three principal components (PCs) capture about 89%, 85% and 88% of the data variability, respectively. Moreover, the results indicate that the sectors of food and beverage, basic resources and real estate have not been affected by COVID-19. Real estate stability could be explained by the huge investments of the government in this sector, especially the construction and development of the new capital. Policymakers could benefit from these results by issuing different regulations to support negatively affected sectors. Researchers need to study the features of the economy that could absorb the negative impact of the pandemic on financial markets. Moreover, a PCR model was developed to predict the future performance of the EGX 30. By comparing the observed values of EGX 30 with the predicted ones (R-squared estimated as 0.99), we can conclude that the PCR model performs fairly well.