Abstract
Purpose
This study explores whether a new machine learning method can more accurately predict the movement of stock prices.
Design/methodology/approach
This study presents a novel hybrid deep learning model, Residual-CNN-Seq2Seq (RCSNet), to predict the trend of stock price movement. RCSNet integrates the autoregressive integrated moving average (ARIMA) model, the convolutional neural network (CNN) and the sequence-to-sequence (Seq2Seq) long short-term memory (LSTM) model.
Findings
The hybrid model is able to forecast both the linear and non-linear time-series components of the stock dataset. CNN and Seq2Seq LSTMs can be effectively combined for dynamic modeling of short- and long-term-dependent patterns in non-linear time-series forecasting. Experimental results show that the proposed model outperforms baseline models on the S&P 500 index dataset from January 2000 to August 2016.
Originality/value
This study develops the RCSNet hybrid model to tackle the challenge by combining linear and non-linear models, providing new evidence on predicting the movement of stock market prices.
Citation
Zhao, Y. and Chen, Z. (2022), "Forecasting stock price movement: new evidence from a novel hybrid deep learning model", Journal of Asian Business and Economic Studies, Vol. 29 No. 2, pp. 91-104. https://doi.org/10.1108/JABES-05-2021-0061
Publisher
Emerald Publishing Limited
Copyright © 2021, Yang Zhao and Zhonglu Chen
License
Published in Journal of Asian Business and Economic Studies. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
1. Introduction
It is widely acknowledged that predicting the trend of stock price movement is a difficult financial problem. Relatively precise prediction of a stock's future price movement can maximize investors' profits. However, owing to the variability and instability of the stock market, accurate prediction remains an open question.
Generally, traditional stock forecasting methods fall into two categories: technical analysis and fundamental analysis. Fundamental analysis examines a company's financial environment, operations, and macroeconomic and microeconomic indicators to predict its stock price. In recent years, with the massive growth of internet content, the development of natural language processing (NLP) techniques has enabled investors to capture market movement trends online. However, the quality of online stock market content is not guaranteed: low-quality content, and even fake news and comments, cannot be excluded. Consequently, methods based on fundamental analysis are difficult to model. Therefore, this work focuses on technical analysis methods.
Technical analysis methods are based on historical time-series data, and stock price movement prediction is treated as a time-series forecasting problem. Generally, rather than forecasting the original stock series directly, stock price time-series data are decomposed into two components: a linear one and a non-linear one. Hence, we can study linear and non-linear stock forecasts separately. Classical linear time-series models, including the vector autoregression (VAR) model and the autoregressive integrated moving average (ARIMA) model (Lütkepohl, 2005), have proved effective for linear time-series forecasting in many fields, such as economics (Hamilton, 1989) and power price forecasting (Contreras et al., 2003), but perform poorly on non-linear forecasts. On the other hand, traditional non-linear forecasting models, such as the support vector machine (SVM) (Hossain et al., 2009) and the back-propagation (BP) neural network (NN) (Maier and Dandy, 2000), specialize in describing non-linear data, while performing worse on linear data and on long-term time-series forecasts.
A successful stock time-series forecasting model should satisfy two requirements. First, the model should suit both linear and non-linear data, since stock movement data contain both. Second, the model should capture multi-frequency (short- and long-term) patterns to predict the non-linear part accurately.
In recent years, driven by the fast growth of computing power and massive data, deep learning methodologies have been widely adopted in speech recognition (Hinton et al., 2012), image classification, machine translation (Cho et al., 2014) and other areas. These methods are composed of various derivatives of the artificial neural network (ANN), such as the convolutional neural network (CNN) and the recurrent neural network (RNN). CNN models achieve outstanding image recognition performance by extracting local features at various granularity levels from input images. The RNN (as shown in Figure 1(a)) is a type of non-linear model that can model long-term dependencies, which makes it well suited to both non-linear data and long-term dependencies. However, long-term dependencies are hard for traditional RNNs to detect, as they suffer from the gradient vanishing problem (Bengio et al., 1994). To solve this problem, long short-term memory (LSTM) units (as shown in Figure 1(b)) (Hochreiter and Schmidhuber, 1997) and the gated recurrent unit (GRU) (Chung et al., 2014) have achieved great success in various domains such as computer vision and NLP.
With the recent success of deep learning, we develop the Residual-CNN-Seq2Seq (RCSNet) hybrid model to tackle this challenge by combining linear and non-linear models. As Figure 2 shows, the model consists of an ARIMA layer, a CNN layer, a Seq2Seq LSTM layer and a fully connected (FC) layer. Hence, the model is capable of forecasting both linear and non-linear time series and can also process the frequency patterns separately in the Seq2Seq LSTM layer. Based on these characteristics, we apply it to forecast the stock movement trend.
The rest of this work is organized as follows: Section 2 presents the preliminaries and related works. Section 3 provides the details of the proposed RCSNet model. Section 4 compares the results of our model with those of traditional models. Section 5 presents the conclusion and future work.
2. Preliminary and related works
This study primarily involves three groups of models: linear models, non-linear models and hybrid models. Accordingly, this section reviews the traditional linear, non-linear and hybrid models for stock movement forecasting.
2.1 Autoregressive model
The ARIMA model is one of the most commonly used autoregressive models and has been extensively studied (Hamilton, 1989; Contreras et al., 2003). Such linear models can be applied effectively to forecast the behavior of economic and financial time series (Tsay, 2005; Sims, 1980). The ARIMA model is generally expressed as ARIMA(p, d, q), where p, d and q are non-negative integer parameters: p is the number of time lags of the autoregressive model (the order of the AR terms), d denotes the order of differencing, and q indicates the order of the moving-average model. With the lag operator $L$, AR coefficients $\phi_i$, MA coefficients $\theta_i$ and white noise $\varepsilon_t$, the ARIMA model is formulated as:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d x_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\varepsilon_t$$
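As a minimal sketch of this notation, the following code fits the ARIMA(5, 1, 0) configuration later used in Section 4.3 with the statsmodels library; the synthetic random-walk series is merely a stand-in for real price data.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

prices = np.cumsum(np.random.randn(500)) + 100.0  # stand-in for a real close-price series

model = ARIMA(prices, order=(5, 1, 0))  # p = 5 AR lags, d = 1 difference, q = 0 MA terms
result = model.fit()
print(result.forecast(steps=7))         # seven-step-ahead linear forecast
```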
2.2 Back-propagation neural network model
A back-propagation neural network (BPNN) is one of the most essential ANNs, proposed by Rumelhart et al. in 1986 (Rumelhart et al., 1988). It is widely used in prediction and forecasting. J.Z. Wang et al. proposed the wavelet de-noising-based back-propagation (WDBP) NN to predict stock prices (Wang et al., 2011). M. Göçken et al. provided a NN that selects the most relevant technical indicators for stock market forecasting (Göçken et al., 2016). In this study, we assume that the BPNN is composed of non-linear layers with the sigmoid activation function $\sigma(x) = 1/(1 + e^{-x})$.
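The following is a minimal sketch of such a BPNN in PyTorch, with sigmoid hidden layers trained by back-propagating a squared error; the layer sizes and the 90-day input window are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

bpnn = nn.Sequential(
    nn.Linear(90, 32),   # input: a 90-day price window (window length borrowed from Section 4.3)
    nn.Sigmoid(),
    nn.Linear(32, 16),
    nn.Sigmoid(),
    nn.Linear(16, 1),    # output: next-step forecast
)

x = torch.randn(8, 90)                           # a batch of 8 windows
loss = nn.MSELoss()(bpnn(x), torch.randn(8, 1))  # squared-error objective
loss.backward()                                  # the error back-propagation step
```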
2.3 Recurrent neural network model
The RNN is a sort of ANN that connects its units along sequential data, forming a directed cycle; this enables it to exhibit dynamic temporal behavior. There are various types of RNN, including the simple RNN, the LSTM and so on. With gated memory cells, the RNN model is capable of capturing the long-term, non-linear internal regularities of the data, and can hence memorize the long-term content of time-series data. E.W. Saad et al. exploited time-delay, recurrent and probabilistic NNs for stock price movement trend prediction (Saad et al., 1998). C.M. Kuan and T. Liu studied the out-of-sample forecasting performance of RNNs on empirical foreign exchange rate data (Kuan and Liu, 1995). An adaptive "forget gate," presented by F. Gers, allows an LSTM cell to learn to reset itself at a suitable time, thereby releasing internal resources during forecasting (Gers et al., 2000).
2.3.1 Simple recurrent neural network model
A simple RNN model can deal with a time series of inputs by utilizing its internal memory. At each time step, the input is propagated in the manner of a standard feed-forward network, while fixed back connections cause the context units to hold a copy of the previous hidden-unit values. The equations are given by:

$$h_t = \sigma_h\left(W_h x_t + U_h h_{t-1} + b_h\right)$$
$$y_t = \sigma_y\left(W_y h_t + b_y\right)$$

In the model, $X = (x_1, x_2, \ldots, x_T)$ is the input series, $h_t$ the hidden-state vector, $y_t$ the output vector, $W$, $U$ and $b$ the trainable weights and biases, and $\sigma_h$, $\sigma_y$ the activation functions.
2.3.2 Simple long short-term memory model
The simple LSTM model is composed of LSTM units, each of which contains a cell, an input gate, an output gate and a forget gate. The equations are given by:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh\left(c_t\right)$$

where $f_t$, $i_t$ and $o_t$ are the forget, input and output gate activations, $c_t$ is the cell state, $h_t$ is the hidden state and $\odot$ denotes element-wise multiplication.
2.4 Hybrid model
A hybrid model combines two or more base models of various kinds to reach a better model, and such models have been extensively researched. P.F. Pai and C.S. Lin studied a hybrid model combining ARIMA with SVMs on time-series data (Pai and Lin, 2005); their hybrid methodology predicts stock prices by exploiting the respective strengths of the ARIMA and SVM models. A. Jain and A.M. Kumar developed a hybrid time-series NN model that draws on the advantages of both traditional time-series methods and ANNs (Jain and Kumar, 2007). Nayak et al. (2015) proposed a hybridized framework of SVM with a K-nearest-neighbor approach for predicting Indian stock market indices. F.A. Gers et al. designed a hybrid model that combined a time-window-based MLP and an LSTM for forecasting (Gers et al., 2001). In this model, the time-window-based MLP was trained first, its weights were then frozen, and finally the LSTM was employed to decrease the forecast error.
As discussed above, the related works ignore the following:
The stock time-series data contain both linear and non-linear dependency components.
The non-linear component of stock time-series data contains both long- and short-term patterns.
Hence, we design a hybrid model that addresses these two issues to forecast the trend of stock movement.
3. Residual-CNN-Seq2Seq model
This section discusses the details of the model for stock time-series prediction. RCSNet comprises the ARIMA, CNN, Seq2Seq LSTM and FC layers. The objective function and the optimization strategy are also discussed.
3.1 Problem statement and algorithm
The problem is typically described as follows: given a target series $Y = (y_1, y_2, \ldots, y_T)$, the task is to forecast the future value $y_{T+h}$, where $h$ is the prediction horizon (h = 1, 3, 7 and 14 in our experiments; see Section 4.3).
RCSNet first extracts the linear dependence component with the ARIMA model. The residual error time series (the target data minus ARIMA's forecast output) is then treated as the non-linear component. The CNN layer is used to extract short- and long-term trading patterns, after which the residual error series of the different patterns is predicted by the Seq2Seq layer to generate the non-linear intermediate forecast. Finally, the FC layer jointly outputs the final forecast from the linear and non-linear intermediate results. Generally, RCSNet can be described as below:
The linear model takes the input series $X = (x_1, x_2, \ldots, x_T)$ and produces the linear forecast $\hat{L}_t$; the residual series $e_t = x_t - \hat{L}_t$ is processed by the CNN and Seq2Seq LSTM layers to produce the non-linear forecast $\hat{N}_t$; and the FC layer combines the two into the final forecast $\hat{x}_t = f(\hat{L}_t, \hat{N}_t)$.
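Under this decomposition, the overall flow can be sketched as follows; the four helper functions are placeholders for the layers detailed in Sections 3.2-3.5, not a published implementation.

```python
def rcsnet_forecast(series, horizon):
    """Illustrative RCSNet pipeline sketch (placeholder helpers, not a real API)."""
    linear_pred, fitted = arima_layer(series, horizon)  # linear forecast + in-sample fit (3.2)
    residuals = series - fitted                         # non-linear component
    patterns = cnn_layer(residuals)                     # short/long-term patterns (3.3)
    nonlinear_pred = seq2seq_layer(patterns, horizon)   # residual forecast (3.4)
    return fc_layer(linear_pred, nonlinear_pred)        # joint final forecast (3.5)
```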
3.2 Autoregressive integrated moving average layer
RCSNet uses the ARIMA model as the linear filter, and the Seq2Seq LSTM model is trained on the residual of the linear model (the non-linear component). As Figure 3 shows, the ARIMA filter separates the linear component from the historical stock data series, leaving the non-linear data series.
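A hedged sketch of this filtering step with statsmodels, assuming the ARIMA(5, 1, 0) order from Section 4.3: the one-step in-sample predictions serve as the linear component, and their residuals feed the CNN layer.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

prices = np.cumsum(np.random.randn(500)) + 100.0           # stand-in for S&P 500 closes

result = ARIMA(prices, order=(5, 1, 0)).fit()
linear_fit = result.predict(start=1, end=len(prices) - 1)  # one-step-ahead in-sample fit
residuals = prices[1:] - linear_fit                        # non-linear component for the CNN
```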
3.3 Convolutional neural network layer
The second layer of RCSNet, designed to extract short- and long-term patterns along the time dimension, is a convolutional layer without pooling. The CNN layer comprises multiple filters of height w, which is set equal to the number of variables; the k-th filter sweeps through the input matrix X. Long-term patterns reflect seasonal and monthly trading frequencies, while short-term patterns express weekly and daily trading frequencies. By taking these different trading-frequency patterns into consideration, we can forecast the time series more precisely.
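A minimal sketch of such a layer in PyTorch: the kernel height spans all input variables and the filter sweeps along the time axis, with no pooling. The four filters follow Section 4.3; the temporal kernel width and single-variable setting are assumptions.

```python
import torch
import torch.nn as nn

n_vars, n_filters, kernel_t = 1, 4, 7   # 4 filters as in Section 4.3; temporal width 7 assumed
conv = nn.Conv2d(1, n_filters, kernel_size=(n_vars, kernel_t))  # height w spans all variables

x = torch.randn(8, 1, n_vars, 90)          # (batch, channel, variables, time): 90-day windows
features = torch.relu(conv(x)).squeeze(2)  # (batch, filters, time'), no pooling applied
```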
3.4 Seq2Seq long short-term memory layer
Inspired by its success in machine translation (Cho et al., 2014), we recognize the power of the Seq2Seq model in NLP. A standard Seq2Seq model is made up of two crucial components: an encoder and a decoder. The former maps the source input x to a vector representation, while the latter produces an output series based on that representation. Both the encoder and the decoder are LSTMs. By transmitting the last memory state of the encoder to the decoder as its initial memory state, the decoder is capable of accessing information from the encoder. The input and output sides generally use separate LSTMs with their own parameters to capture different compositional patterns. We apply a Seq2Seq LSTM model as the third layer to address the non-linear time-series forecasting problem. Figure 4 shows the model, constituted by an encoder and a decoder. In the encoder part, an input LSTM reads the input series; in the decoder part, an output LSTM decodes the hidden states of the encoder across all previous time steps.
3.4.1 Encoder
The encoder module is essentially an LSTM that encodes the input series into a characteristic representation. For example, given an input series $X = (x_1, x_2, \ldots, x_T)$ with $x_t \in \mathbb{R}^n$, the encoder reads the series step by step and summarizes it into its final hidden and cell states, which are then passed to the decoder.
3.4.2 Decoder
The decoder is a generative network that produces the next data series based on the previously generated data and the encoder states. It is trained to generate the next data series (the intermediate predicted result) for the FC layer, given the previous state of the encoder. Importantly, the decoder uses the hidden state vector from the encoder as its initial state, from which it obtains its initial information. Effectively, the decoder learns to generate the targets $y_{T+1}, \ldots, y_{T+h}$ conditioned on the encoder's summary of the input series.
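The encoder-decoder pair can be sketched in PyTorch as below, mirroring Figure 4: the encoder's final (hidden, cell) state initializes the decoder. The hidden size of 128 echoes the baseline LSTM of Section 4.3; the feature count and decoder inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder LSTM sketch (sizes are assumptions, not the authors' settings)."""

    def __init__(self, n_features, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, src, tgt):
        _, state = self.encoder(src)           # (h_T, c_T): the encoder's summary of the input
        dec_out, _ = self.decoder(tgt, state)  # decoder starts from the encoder state
        return self.out(dec_out)               # one intermediate forecast per decoder step

model = Seq2Seq(n_features=4)                              # e.g. 4 CNN feature channels
pred = model(torch.randn(8, 84, 4), torch.randn(8, 7, 4))  # (batch, horizon, 1) residual forecast
```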
3.5 Fully connected layer
The FC layer has two hidden layers comprised of N rectified linear units (ReLUs) (Nair and Hinton, 2010). Each unit in the hidden layers is fully connected to the previous layer.
The layer receives the linear forecast series from the ARIMA filter and the non-linear forecast series from the Seq2Seq LSTM layer. It then jointly generates the final forecast from the linear and non-linear intermediate forecasts:

$$\hat{y}_t = f_{\mathrm{FC}}\left(\hat{y}_t^{L}, \hat{y}_t^{N}\right)$$

where $\hat{y}_t^{L}$ denotes the ARIMA (linear) intermediate forecast and $\hat{y}_t^{N}$ the Seq2Seq (non-linear) intermediate forecast.
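A minimal sketch of this fusion in PyTorch, with two ReLU hidden layers as described above; the hidden width and the horizon are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionFC(nn.Module):
    """Two ReLU hidden layers fusing linear and non-linear forecasts (sizes assumed)."""

    def __init__(self, horizon, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * horizon, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),            # final h-step joint forecast
        )

    def forward(self, linear_pred, nonlinear_pred):
        return self.net(torch.cat([linear_pred, nonlinear_pred], dim=-1))

fuse = FusionFC(horizon=7)
y_hat = fuse(torch.randn(8, 7), torch.randn(8, 7))  # combine the two intermediate forecasts
```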
3.6 Objective function and optimization strategy
To adjust the parameters and evaluate the results, we adopt the squared error as the loss function. In our model, the corresponding optimization objective is formulated as

$$\min_{\Theta} \sum_{t}\left(y_t - \hat{y}_t\right)^2,$$

where $\Theta$ denotes all trainable parameters, $y_t$ the observed value and $\hat{y}_t$ the model forecast at time $t$.
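One optimization step under this objective can be sketched as follows. The Adam optimizer is an assumption suggested by the Kingma and Ba (2014) entry in the reference list, not stated explicitly in this section.

```python
import torch

def train_step(model, optimizer, inputs, targets):
    """One optimizer step under the squared-error objective."""
    criterion = torch.nn.MSELoss()             # mean squared error, matching the objective above
    optimizer.zero_grad()
    loss = criterion(model(*inputs), targets)
    loss.backward()                            # back-propagate through all trainable layers
    optimizer.step()
    return loss.item()

# usage sketch: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```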
4. Experiments
In this section, we conduct a comprehensive set of experiments, present the experimental details and compare our results with those of the baselines.
4.1 Dataset
We choose the S&P 500 index as our dataset. It is an American stock market index based on the market capitalizations of 500 major corporations whose common stock is listed on the NYSE or NASDAQ, and it is representative among international stock price indices. The dataset, downloaded from the Yahoo Finance database, covers roughly 16 years of trading price movement from 2000-01-02 to 2016-12-07 and consists of 4,262 daily close prices, as shown in Figure 5. The study splits the data into two consecutive segments: the first 70% for training and the remaining 30% for testing.
4.2 Evaluation
This research adopts two scale-dependent metrics to assess the performance of different methodologies for stock movement time-series prediction: the root mean squared error (RMSE) (Plutowski et al., 1996) and the mean absolute error (MAE). Specifically, assuming $y_i$ is the actual value and $\hat{y}_i$ the predicted value over $n$ test points,

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|.$$

Lower values indicate more accurate forecasts.
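For reference, the two formulas translate directly into numpy:

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))  # root mean squared error

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))          # mean absolute error
```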
4.3 Training and testing
The training proceeds as follows. First, the dataset is normalized, scaling values into the range between 0 and 1. When training, we set the input window to 90 days, so one sample of the training data contains 90 days of index data. Next, we set ARIMA(p, d, q) to ARIMA(5, 1, 0) and use the ARIMA layer to extract the linear component of the data as the linear intermediate output. Third, the residual data are fed into four CNN filters to extract the short- and long-term patterns; the resulting pattern series are then forecast by the Seq2Seq LSTM layer, and finally the FC layer fuses the linear and non-linear intermediate results, with the network trained by minimizing the squared-error objective of Section 3.6.
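A minimal sketch of this data preparation, assuming min-max scaling into [0, 1] fitted on the training segment; the window builder pairs each 90-day window with its h-step-ahead target.

```python
import numpy as np

def make_windows(series, window=90, horizon=1):
    """Build (90-day input, h-step-ahead target) pairs from a 1-D series."""
    x, y = [], []
    for i in range(len(series) - window - horizon + 1):
        x.append(series[i : i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(x), np.array(y)

prices = np.cumsum(np.random.randn(4262)) + 100.0  # stand-in for the 4,262 S&P 500 closes
split = int(0.7 * len(prices))                     # consecutive 70/30 split (Section 4.1)
train, test = prices[:split], prices[split:]

lo, hi = train.min(), train.max()                  # scale into [0, 1] using training statistics
x_train, y_train = make_windows((train - lo) / (hi - lo), window=90, horizon=1)
```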
The testing proceeds as follows. First, the test dataset (30% of the data) is normalized in the same way as the training data. Then, to demonstrate the effectiveness of the RCSNet model, we compare it with three baseline models: ARIMA(5, 1, 0), BPNN (three layers) and a simple LSTM (128 units). We perform one-, three-, seven- and 14-day-ahead prediction for all the test variables using the four models, i.e. the prediction horizon h (length of time steps) is 1, 3, 7 and 14.
4.4 Results
First, we compare the results of the various methods for one-, three-, seven- and 14-day-ahead (length of time steps) prediction. As Table 1 shows, when the time step h is set to 1, the ARIMA model achieves an MAE of 10.939 and an RMSE of 15.108, performing better than the other methods. However, as the time step h increases, RCSNet outperforms the average performance of the baseline models. For longer time steps: when h is set to 3, RCSNet improves on the average performance of the baseline models by 81.17%; when h is set to 7, the improvement is 78.72%; and when h is set to 14, the improvement is 74.2%. Clearly, RCSNet performs better, as shown in Figure 6. In summary, the effectiveness of our model is confirmed by these experiments.
5. Conclusion and future work
This study presents a novel framework, the RCSNet model, for stock time-series prediction by combining traditional linear and non-linear time-series models. Our model integrates the power of existing classical models: it exploits ARIMA to capture the linear time-series component, then applies the CNN and Seq2Seq LSTM to handle the non-linear residual component, and finally combines the two sets of intermediate results to generate the final forecast. The paper's experimental results demonstrate the following three points:
The hybrid model is able to forecast both the linear and non-linear time-series components of the stock dataset.
CNN and Seq2Seq LSTMs can be effectively combined for dynamic modeling of short- and long-term-dependent patterns in non-linear time-series forecasting.
Our model outperforms the baseline models on the S&P 500 index dataset from January 2000 to August 2016.
For future research, we will take into account social media information related to a company's products and economic environment, and we will study how to filter "fake" information so that predictions rely not only on technical analysis but also on fundamental analysis using NLP tools. Besides, we will use time-series cross-validation rather than a simple data split for time-series experiments.
Table 1. The results of one-, three-, seven- and 14-day-ahead (length of time steps) prediction with different methods
| Model | Evaluation | Horizon = 1 | Horizon = 3 | Horizon = 7 | Horizon = 14 |
|---|---|---|---|---|---|
| ARIMA | MAE | 10.939 | 370.47 | 469.86 | 479.417 |
| | RMSE | 15.108 | 424.28 | 533.16 | 533.074 |
| BPNN | MAE | 64.089 | 320.27 | 536.42 | 730.397 |
| | RMSE | 71.477 | 330.50 | 555.47 | 760.644 |
| LSTM | MAE | 139.06 | 41.156 | 136.44 | 297.350 |
| | RMSE | 162.59 | 50.997 | 158.31 | 344.391 |
| RCSNet | MAE | 39.456 | 53.176 | 80.777 | 129.269 |
| | RMSE | 46.471 | 62.466 | 98.474 | 151.624 |
References
Bengio, Y., Simard, P. and Frasconi, P. (1994), "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, Vol. 5 No. 2, pp. 157-166.
Cho, K., van Merriënboer, B., Bahdanau, D. and Bengio, Y. (2014), "On the properties of neural machine translation: encoder-decoder approaches", arXiv preprint arXiv:1409.1259.
Chung, J., Gulcehre, C., Cho, K. and Bengio, Y. (2014), “Empirical evaluation of gated recurrent neural networks on sequence modeling”, arXiv preprint arXiv:1412.3555.
Contreras, J., Espinola, R., Nogales, F.J. and Conejo, A.J. (2003), "ARIMA models to predict next-day electricity prices", IEEE Transactions on Power Systems, Vol. 18 No. 3, pp. 1014-1020.
Gers, F., Schmidhuber, J. and Cummins, F. (2000), "Learning to forget: continual prediction with LSTM", Neural Computation, Vol. 12 No. 10, pp. 2451-2471.
Gers, F.A., Eck, D. and Schmidhuber, J. (2001), "Applying LSTM to time series predictable through time-window approaches", International Conference on Artificial Neural Networks, pp. 669-676.
Göçken, M., Özçalıcı, M., Boru, A. and Dosdoğru, A.T. (2016), "Integrating metaheuristics and artificial neural networks for improved stock price prediction", Expert Systems with Applications, Vol. 44, pp. 320-331.
Hamilton, J.D. (1989), "A new approach to the economic analysis of nonstationary time series and the business cycle", Econometrica, Vol. 57 No. 2, pp. 357-384.
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P. and Sainath, T.N. (2012), "Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups", IEEE Signal Processing Magazine, Vol. 29 No. 6, pp. 82-97.
Hochreiter, S. and Schmidhuber, J. (1997), "Long short-term memory", Neural Computation, Vol. 9 No. 8, pp. 1735-1780.
Hossain, A., Zaman, F., Nasser, M. and Mufakhkharul Islam, M. (2009), "Comparison of GARCH, neural network and support vector machine in financial time series prediction", Lecture Notes in Computer Science, Vol. 5909, pp. 597-602.
Jain, A. and Kumar, A.M. (2007), "Hybrid neural network models for hydrologic time series forecasting", Applied Soft Computing, Vol. 7 No. 2, pp. 585-592.
Kingma, D.P. and Ba, J. (2014), “Adam: a method for stochastic optimization”, arXiv preprint arXiv:1412.6980.
Kuan, C.M. and Liu, T. (1995), "Forecasting exchange rates using feedforward and recurrent neural networks", Journal of Applied Econometrics, Vol. 10 No. 4, pp. 347-364.
Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Springer Science & Business Media.
Maier, H.R. and Dandy, G.C. (2000), "Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications", Environmental Modelling & Software, Vol. 15 No. 1, pp. 101-124.
Nair, V. and Hinton, G.E. (2010), "Rectified linear units improve restricted Boltzmann machines", International Conference on Machine Learning, pp. 807-814.
Nayak, R.K., Mishra, D. and Rath, A.K. (2015), “A Naïve SVM-KNN based stock market trend reversal analysis for Indian benchmark indices”, Applied Soft Computing, Vol. 35, pp. 670-680.
Pai, P.F. and Lin, C.S. (2005), "A hybrid ARIMA and support vector machines model in stock price forecasting", Omega, Vol. 33 No. 6, pp. 497-505.
Plutowski, M., Cottrell, G. and White, H. (1996), "Experience with selecting exemplars from clean data", Neural Networks, Vol. 9 No. 2, pp. 273-294.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1988), Learning Internal Representations by Error Propagation, Institute for Cognitive Science, University of California, San Diego.
Saad, E.W., Prokhorov, D.V. and Wunsch, D.C. (1998), "Comparative study of stock trend prediction using time delay, recurrent and probabilistic neural networks", IEEE Transactions on Neural Networks, Vol. 9 No. 6, pp. 1456-1470.
Sims, C.A. (1980), "Macroeconomics and reality", Econometrica, Vol. 48 No. 1, pp. 1-48.
Tsay, R.S. (2005), Analysis of Financial Time Series, John Wiley & Sons, Vol. 543.
Wang, J.Z., Wang, J.J., Zhang, Z.G. and Guo, S.P. (2011), "Forecasting stock indices with back propagation neural network", Expert Systems with Applications, Vol. 38 No. 11, pp. 14346-14355.
Further reading
Ardalani-Farsa, M. and Zolfaghari, S. (2010), "Chaotic time series prediction with residual analysis method using hybrid Elman-NARX neural networks", Neurocomputing, Vol. 73 Nos 13-15, pp. 2540-2553.
Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R. and Schmidhuber, J. (2017), "LSTM: a search space odyssey", IEEE Transactions on Neural Networks and Learning Systems, Vol. 28 No. 10, pp. 2222-2232.
Kaastra, I. and Boyd, M. (1996), “Designing a neural network for forecasting financial and economic time series”, Neurocomputing, Vol. 10 No. 3, pp. 215-236.
Lipton, Z.C., Berkowitz, J. and Elkan, C. (2015), “A critical review of recurrent neural networks for sequence learning”, Computer Science.
Makridakis, S. and Hibon, M. (1997), "ARMA models and the Box-Jenkins methodology", Journal of Forecasting, Vol. 16 No. 3, pp. 147-163.
Rather, A.M., Agarwal, A. and Sastry, V.N. (2015), “Recurrent neural network and a hybrid model for prediction of stock returns”, Expert Systems with Applications, Vol. 42 No. 6, pp. 3234-3241.
Zhang, G.P. (2003), “Time series forecasting using a hybrid arima and neural network model”, Neurocomputing, Vol. 50 No. 1, pp. 159-175.