• Open access
  • Published: 28 August 2020

Short-term stock market price trend prediction using a comprehensive deep learning system

  • Jingyi Shen 1 &
  • M. Omair Shafiq   ORCID: orcid.org/0000-0002-1859-8296 1  

Journal of Big Data volume 7, Article number: 66 (2020)

In the era of big data, deep learning for predicting stock market prices and trends has become even more popular than before. We collected 2 years of data from the Chinese stock market and proposed a comprehensive customization of feature engineering and a deep learning-based model for predicting the price trend of stock markets. The proposed solution is comprehensive as it includes pre-processing of the stock market dataset and utilization of multiple feature engineering techniques, combined with a customized deep learning-based system for stock market price trend prediction. We conducted comprehensive evaluations on frequently used machine learning models and conclude that our proposed solution outperforms them due to the comprehensive feature engineering that we built. The system achieves overall high accuracy for stock market trend prediction. With the detailed design and evaluation of prediction term lengths, feature engineering, and data pre-processing methods, this work contributes to the stock analysis research community in both the financial and technical domains.

Introduction

The stock market is one of the major fields that investors are dedicated to, so stock market price trend prediction has long been a hot topic for researchers from both the financial and technical domains. In this research, our objective is to build a state-of-the-art model for short-term price trend prediction.

As concluded by Fama in [26], financial time series prediction is known to be a notoriously difficult task due to the generally accepted semi-strong form of market efficiency and the high level of noise. Back in 2003, Wang et al. in [44] already applied artificial neural networks to stock market price prediction and focused on volume as a specific feature of the stock market. One of their key findings was that volume was not effective in improving the forecasting performance on the datasets they used, which were the S&P 500 and DJI. Ince and Trafalis in [15] targeted short-term forecasting and applied a support vector machine (SVM) model to stock price prediction. Their main contribution is a comparison between multi-layer perceptron (MLP) and SVM, which found that SVM outperformed MLP in most scenarios, although the result was also affected by different trading strategies. In the meantime, researchers from financial domains were applying conventional statistical methods and signal processing techniques to analyze stock market data.

Optimization techniques such as principal component analysis (PCA) were also applied in short-term stock price prediction [22]. Over the years, researchers have not only focused on stock price-related analysis but have also tried to analyze stock market transactions such as volume burst risks, which broadens the stock market analysis research domain and indicates that this domain still has high potential [39]. As artificial intelligence techniques evolved in recent years, many proposed solutions attempted to combine machine learning and deep learning techniques based on previous approaches, and then proposed new metrics that serve as training features, such as Liu and Wang [23]. This type of previous work belongs to the feature engineering domain and can be considered the inspiration for the feature extension ideas in our research. Liu et al. in [24] proposed a convolutional neural network (CNN) as well as a long short-term memory (LSTM) neural network based model to analyze different quantitative strategies in stock markets. The CNN serves the stock selection strategy and automatically extracts features from quantitative data; it is followed by an LSTM that preserves the time-series features for improving profits.

The latest work also proposes a similar hybrid neural network architecture, integrating a convolutional neural network with a bidirectional long short-term memory to predict the stock market index [4]. While researchers have frequently proposed different neural network architectures, this has prompted further discussion about whether the high cost of training such models is worth the result.

There are three key contributions of our work: (1) a newly extracted and cleansed dataset, (2) a comprehensive feature engineering procedure, and (3) a customized long short-term memory (LSTM) based deep learning model.

We built the dataset ourselves, using an open-sourced data API called Tushare [43] as the data source. The novelty of our proposed solution is that we propose feature engineering along with a fine-tuned system instead of an LSTM model only. We identified gaps in the previous works and proposed a solution architecture with a comprehensive feature engineering procedure before training the prediction model. The success of the feature extension method, working together with recursive feature elimination algorithms, opens doors for many other machine learning algorithms to achieve high accuracy scores for short-term price trend prediction, and it demonstrates the effectiveness of our proposed feature extension as feature engineering. We further introduced our customized LSTM model, which improved the prediction scores in all the evaluation metrics. The proposed solution outperformed the machine learning and deep learning-based models in similar previous works.

The remainder of this paper is organized as follows. The “Survey of related works” section describes the survey of related works. The “The dataset” section provides details on the data that we extracted from the public data sources and the dataset prepared. The “Methods” section presents the research problems, methods, and design of the proposed solution; the detailed technical design with algorithms and how the model is implemented are also included in this section. The “Results” section presents comprehensive results and an evaluation of our proposed model, comparing it with the models used in most of the related works. The “Discussion” section provides a discussion and comparison of the results. The “Conclusion” section presents the conclusion. This research paper builds on Shen [36].

Survey of related works

In this section, we discuss related works. We reviewed the related work in two different domains: technical and financial, respectively.

Kim and Han in [19] built a model combining artificial neural networks (ANN) and genetic algorithms (GAs) with discretization of features for predicting the stock price index. The data used in their study include technical indicators as well as the direction of change in the daily Korea stock price index (KOSPI). They used data containing samples of 2928 trading days, ranging from January 1989 to December 1998, and gave their selected features and formulas. They also applied optimization of feature discretization, a technique that is similar to dimensionality reduction. The strength of their work is that they introduced GA to optimize the ANN. The work also has limitations. First, the numbers of input features and of processing elements in the hidden layer are both fixed at 12 and are not adjustable. Second, in the learning process of the ANN, the authors focused on only two factors in optimization. Nevertheless, they still believed that GA has great potential for feature discretization optimization. Our initial feature pool draws on their selected features. Qiu and Song in [34] also presented a solution to predict the direction of the Japanese stock market based on an optimized artificial neural network model. In this work, the authors use genetic algorithms together with artificial neural network based models and name the result a hybrid GA-ANN model.

Piramuthu in [33] conducted a thorough evaluation of different feature selection methods for data mining applications. He used four datasets, which were credit approval data, loan defaults data, web traffic data, and Tam and Kiang data, and compared how different feature selection methods optimized decision tree performance. The feature selection methods he compared included probabilistic distance measures (the Bhattacharyya measure, the Matusita measure, the divergence measure, the Mahalanobis distance measure, and the Patrick-Fisher measure) and inter-class distance measures (the Minkowski distance measure, the city block distance measure, the Euclidean distance measure, the Chebychev distance measure, and the nonlinear (Parzen and hyper-spherical kernel) distance measure). The strength of this paper is that the author evaluated both probabilistic distance-based and several inter-class feature selection methods. Besides, the author performed the evaluation on different datasets, which reinforced the strength of this paper. However, the only evaluation algorithm was a decision tree, so we cannot conclude whether the feature selection methods would perform the same on a larger dataset or a more complex model.

Hassan and Nath in [9] applied the Hidden Markov Model (HMM) to stock market forecasting on the stock prices of four different airlines. They reduced the states of the model to four: the opening price, the closing price, the highest price, and the lowest price. The strong point of this paper is that the approach does not need expert knowledge to build a prediction model. However, this work is limited to the airline industry and was evaluated on a very small dataset, so it may not lead to a prediction model with generality. One of the approaches in stock market prediction related works could be exploited for the comparison work. The authors selected a maximum of 2 years as the date range of the training and testing dataset, which provided us with a date range reference for our evaluation part.

Lei in [21] exploited a Wavelet Neural Network (WNN) to predict stock price trends. The author also applied Rough Set (RS) for attribute reduction as an optimization. Rough Set was utilized to reduce the stock price trend feature dimensions and to determine the structure of the Wavelet Neural Network. The dataset of this work consists of five well-known stock market indices, i.e., (1) SSE Composite Index (China), (2) CSI 300 Index (China), (3) All Ordinaries Index (Australia), (4) Nikkei 225 Index (Japan), and (5) Dow Jones Index (USA). The model was evaluated on different stock market indices, and the result was convincing with generality. Using Rough Set to optimize the feature dimension before processing reduces the computational complexity. However, the author only stressed the parameter adjustment in the discussion part and did not specify the weaknesses of the model itself. Meanwhile, we also found that the evaluations were performed on indices; the same model may not have the same performance if applied to a specific stock.

Lee in [20] used the support vector machine (SVM) along with a hybrid feature selection method to predict stock trends. The dataset in this research is a sub-dataset of the NASDAQ Index in the Taiwan Economic Journal Database (TEJD) in 2008. The feature selection part used a hybrid method in which supported sequential forward search (SSFS) played the role of the wrapper. Another advantage of this work is that the author designed a detailed procedure of parameter adjustment, reporting performance under different parameter values. The clear structure of the feature selection model is also instructive for the primary stage of model structuring. One of the limitations was that the performance of SVM was compared only to a back-propagation neural network (BPNN) and not to other machine learning algorithms.

Sirignano and Cont leveraged a deep learning solution trained on a universal feature set of financial markets in [40]. The dataset used included buy and sell records of all transactions and cancellations of orders for approximately 1000 NASDAQ stocks through the order book of the stock exchange. The NN consists of three layers with LSTM units followed by a final feed-forward layer with rectified linear units (ReLUs), with the stochastic gradient descent (SGD) algorithm as the optimizer. Their universal model was able to generalize and cover stocks other than the ones in the training data. Though they mentioned the advantages of a universal model, the training cost was still expensive. Meanwhile, because the deep learning algorithm is not explicitly programmed, it is unclear whether useless features contaminate the data fed into the model. We consider that it would have been better had they performed feature selection before training the model, as it is an effective way to reduce the computational complexity.

Ni et al. in [30] predicted stock price trends by exploiting SVM and performed fractal feature selection for optimization. The dataset they used is the Shanghai Stock Exchange Composite Index (SSECI), with 19 technical indicators as features. Before processing the data, they optimized the input data by performing feature selection. When finding the best parameter combination, they also used a grid search method based on k-fold cross-validation. Besides, the evaluation of different feature selection methods is also comprehensive. As the authors mentioned in their conclusion part, they only considered the technical indicators but not macro and micro factors in the financial domain. The source of the datasets that the authors used was similar to our dataset, which makes their evaluation results useful to our research. Their use of k-fold cross-validation when testing hyper-parameter combinations is also a useful reference.
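
The grid search with k-fold cross-validation mentioned above can be illustrated with a minimal sketch. It is not the implementation of Ni et al.: the threshold classifier, the toy data, and the names k_fold_indices, grid_search_cv, fit_threshold, and accuracy are all assumptions made for illustration, and an SVM would take the place of the toy model in practice.

```python
from itertools import product
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search_cv(xs, ys, param_grid, fit, score, k=5):
    """Return (best_params, best_mean_score) over every combination in param_grid."""
    folds = k_fold_indices(len(xs), k)
    names = sorted(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            model = fit([xs[i] for i in train], [ys[i] for i in train], **params)
            scores.append(score(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
        avg = mean(scores)
        if best is None or avg > best[1]:
            best = (params, avg)
    return best

# Hypothetical toy model: predict class 1 when the feature exceeds a threshold.
def fit_threshold(train_x, train_y, threshold):
    return threshold  # nothing is learned; the hyper-parameter is the model

def accuracy(threshold, test_x, test_y):
    hits = sum((x > threshold) == bool(y) for x, y in zip(test_x, test_y))
    return hits / len(test_x)

xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best_params, best_score = grid_search_cv(
    xs, ys, {"threshold": [0.25, 0.5, 0.75]}, fit_threshold, accuracy, k=5
)
```

The folds here are contiguous and unshuffled. For time-ordered stock data, a forward-chaining split that always trains on earlier samples may be a more appropriate choice.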

McNally et al. in [27] leveraged RNN and LSTM to predict the price of Bitcoin, optimized by using the Boruta algorithm for the feature engineering part, which works similarly to the random forest classifier. Besides feature selection, they also used Bayesian optimization to select LSTM parameters. The Bitcoin dataset ranged from the 19th of August 2013 to the 19th of July 2016. They used multiple optimization methods to improve the performance of deep learning methods. The primary problem of their work is overfitting. The research problem of predicting the Bitcoin price trend has some similarities with stock market price prediction. Hidden features and noise embedded in the price data are threats to this work. The authors treated the research question as a time sequence problem. The best part of this paper is the feature engineering and optimization part; we could replicate the methods they exploited in our data pre-processing.

Weng et al. in [45] focused on short-term stock price prediction by using ensemble methods of four well-known machine learning models. The data for this research comprise five datasets, which they obtained from three open-sourced APIs and an R package named TTR. The machine learning models they used are (1) neural network regression ensemble (NNRE), (2) a Random Forest with unpruned regression trees as base learners (RFR), (3) AdaBoost with unpruned regression trees as base learners (BRT) and (4) a support vector regression ensemble (SVRE). It is a thorough study of ensemble methods for short-term stock price prediction. With background knowledge, the authors selected eight technical indicators in this study and then performed a thoughtful evaluation on five datasets. The primary contribution of this paper is that they developed a platform for investors using R, which does not require users to input their own data but calls an API to fetch the data directly from an online source. From the research perspective, they only evaluated the prediction of the price for 1 up to 10 days ahead and did not evaluate terms longer than two trading weeks or shorter than 1 day. The primary limitation of their research was that they only analyzed 20 U.S.-based stocks; the model might not generalize to other stock markets, or might need further revalidation to see whether it suffers from overfitting problems.

Kara et al. in [17] also exploited ANN and SVM in predicting the movement of the stock price index. The dataset they used covers the period from January 2, 1997, to December 31, 2007, of the Istanbul Stock Exchange. The primary strength of this work is its detailed record of parameter adjustment procedures. The weaknesses of this work are that neither the technical indicators nor the model structure is novel, and the authors did not explain how their model performed better than other models in previous works. Thus, more validation on other datasets would help. They explained how ANN and SVM work with stock market features and also recorded the parameter adjustment. The implementation part of our research could benefit from this previous work.

Jeon et al. in [16] performed research on a millisecond interval-based big dataset by using pattern graph tracking to complete stock price prediction tasks. The dataset they used is a millisecond interval-based big dataset of historical stock data from KOSCOM, from August 2014 to October 2014, with a size of 10G–15G. The authors applied Euclidean distance and Dynamic Time Warping (DTW) for pattern recognition. For feature selection, they used stepwise regression. The authors completed the prediction task with ANN, using Hadoop and RHive for big data processing. The “Results” section is based on the result processed by a combination of SAX and Jaro–Winkler distance. Before processing the data, they generated aggregated data at 5-min intervals from discrete data. The primary strength of this work is the explicit structure of the whole implementation procedure. One weakness is that they exploited a relatively old model; another is that the overall time span of the training dataset is extremely short. It is difficult to access millisecond interval-based data in real life, so the model is not as practical as a model based on daily data.
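
To make the aggregation step concrete, the following minimal sketch groups millisecond-stamped ticks into 5-min bars. The open/high/low/close summary, the (timestamp_ms, price) tick layout, and the function name aggregate_to_bars are assumptions for illustration; the source does not state which aggregates Jeon et al. computed.

```python
def aggregate_to_bars(ticks, interval_ms=5 * 60 * 1000):
    """Group (timestamp_ms, price) ticks into fixed-width bars.

    Returns a dict keyed by the bar's starting timestamp, each value holding
    the open, high, low, and close price observed inside that interval.
    """
    bars = {}
    # Sort by timestamp only, so ticks sharing a timestamp keep their input order.
    for ts, price in sorted(ticks, key=lambda tick: tick[0]):
        bucket = ts - ts % interval_ms  # start of the interval containing ts
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
    return bars

# Hypothetical ticks: three in the first 5-min interval, two in the second.
ticks = [(0, 100.0), (120_000, 101.5), (299_999, 99.0), (300_000, 100.5), (450_000, 102.0)]
bars = aggregate_to_bars(ticks)
```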

Huang et al. in [12] applied a fuzzy-GA model to complete the stock selection task. They used the 200 largest market capitalization stocks listed on the Taiwan Stock Exchange as the investment universe. Besides, the yearly financial statement data and the stock returns were taken from the Taiwan Economic Journal (TEJ) database at www.tej.com.tw/ for the period from 1995 to 2009. They constructed the fuzzy membership function with model parameters optimized with GA and extracted features for optimizing stock scoring. The authors proposed an optimized model for the selection and scoring of stocks. Unlike a prediction model, their work focused more on stock rankings, selection, and performance evaluation. Their structure is more practical for investors. But in the model validation part, they compared the model not with existing algorithms but with the statistics of the benchmark, which made it challenging to identify whether GA would outperform other algorithms.

Fischer and Krauss in [5] applied long short-term memory (LSTM) to financial market prediction. The dataset they used is the S&P 500 index constituents from Thomson Reuters. They obtained all month-end constituent lists for the S&P 500 from Dec 1989 to Sep 2015, then consolidated the lists into a binary matrix to eliminate survivor bias. The authors also used RMSprop as an optimizer, which is a mini-batch version of rprop. The primary strength of this work is that the authors used the latest deep learning technique to perform predictions. However, they relied on the LSTM technique alone and lacked background knowledge in the financial domain. In addition, although the LSTM outperformed the standard DNN and logistic regression algorithms, the authors did not mention the effort required to train an LSTM with long-time dependencies.

Tsai and Hsiao in [42] proposed a solution that combines different feature selection methods for the prediction of stocks. They used the Taiwan Economic Journal (TEJ) database as the data source. The data used in their analysis was from 2000 to 2007. In their work, they used a sliding window method and combined it with multi-layer perceptron (MLP) based artificial neural networks with back propagation as their prediction model. They also applied principal component analysis (PCA) for dimensionality reduction, and genetic algorithms (GA) and classification and regression trees (CART) to select important features. They did not rely on technical indices only; instead, they also included both fundamental and macroeconomic indices in their analysis. The authors also reported a comparison of feature selection methods. The validation part was done by combining the model performance statistics with statistical analysis.
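
A sliding window turns a single series into supervised training samples. The sketch below shows one common form of the idea; the window length, the one-step horizon, and the name sliding_windows are assumptions for illustration rather than the configuration used by Tsai and Hsiao.

```python
def sliding_windows(series, window, horizon=1):
    """Return (inputs, target) pairs from a series.

    Each pair holds `window` consecutive values as inputs and, as target,
    the value that lies `horizon` steps after the end of that window.
    """
    pairs = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs

# Hypothetical closing prices, used only to show the shape of the output.
closing_prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
samples = sliding_windows(closing_prices, window=3)
```

Each sample could then be fed to a model such as an MLP, with the three input values as features and the following value as the label.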

Pimenta et al. in [32] leveraged an automated investing method by using multi-objective genetic programming and applied it in the stock market. The dataset was obtained from the Brazilian stock exchange market (BOVESPA), and the primary techniques they exploited were a combination of multi-objective optimization, genetic programming, and technical trading rules. For optimization, they leveraged genetic programming (GP) to optimize decision rules. The novelty of this paper was in the evaluation part. When performing validation, they included a historical period that was a critical moment of Brazilian politics and economics. This approach reinforced the generalization strength of their proposed model. When selecting the sub-dataset for evaluation, they also set criteria to ensure more asset liquidity. However, the baseline of the comparison was too basic and fundamental, and the authors did not perform any comparison with other existing models.

Huang and Tsai in [13] conducted a filter-based feature selection assembled with a hybrid self-organizing feature map (SOFM) support vector regression (SVR) model to forecast the Taiwan index futures (FITX) trend. They divided the training samples into clusters to marginally improve the training efficiency. The authors proposed a comprehensive model, which was a combination of two novel machine learning techniques in stock market analysis. Besides, the feature selection optimizer was also applied before the data processing to improve the prediction accuracy and reduce the computational complexity of processing daily stock index data. Though they optimized the feature selection part and split the sample data into small clusters, it was already strenuous to train this model on daily stock index data. It would be difficult for this model to predict trading activities in shorter time intervals since the data volume would increase drastically. Moreover, the evaluation is not strong enough: they set a single SVR model as a baseline but did not compare the performance with other previous works, which makes it difficult for future researchers to identify the advantages of the SOFM-SVR model and why it outperforms other algorithms.

Thakur and Kumar in [41] also developed a hybrid financial trading support system by exploiting multi-category classifiers and random forest (RAF). They conducted their research on stock indices from NASDAQ, DOW JONES, S&P 500, NIFTY 50, and NIFTY BANK. The authors proposed a hybrid model that combined random forest (RF) algorithms with a weighted multicategory generalized eigenvalue support vector machine (WMGEPSVM) to generate “Buy/Hold/Sell” signals. Before processing the data, they used Random Forest (RF) for feature pruning. The authors proposed a practical model designed for real-life investment activities, which could generate three basic signals for investors to refer to. They also performed a thorough comparison of related algorithms. However, they did not mention the time and computational complexity of their work. Meanwhile, a significant issue of their work was the lack of financial domain knowledge. Investors regard index data as one of the attributes but cannot directly use a signal from indices to trade a specific stock.

Hsu in [11] assembled feature selection with a back propagation neural network (BNN) combined with genetic programming to predict the stock/futures price. The dataset in this research was obtained from the Taiwan Stock Exchange Corporation (TWSE). The author introduced the background knowledge in detail. The weakness of this work is that it lacks a description of the dataset. The model is a combination of models proposed by previous works. Though we did not see novelty in this work, we can still conclude that the genetic programming (GP) algorithm is accepted in the stock market research domain. To reinforce the validation strengths, it would be good to consider adding GP models to the evaluation if the model is predicting a specific price.

Hafezi et al. in [7] built a bat-neural network multi-agent system (BN-NMAS) to predict stock price. The dataset was obtained from the Deutsche Bundesbank. They also applied the Bat algorithm (BA) for optimizing neural network weights. The authors illustrated their overall structure and logic of system design in clear flowcharts. Because very few previous works had been performed on DAX data, it would be difficult to recognize whether the model they proposed retains its generality if migrated to other datasets. The system design and feature selection logic are fascinating and worth referring to. Their findings in optimization algorithms are also valuable for research in the stock market price prediction domain. It is worth trying the Bat algorithm (BA) when constructing neural network models.

Long et al. in [25] conducted a deep learning approach to predict the stock price movement. The dataset they used is the Chinese stock market index CSI 300. For predicting the stock price movement, they constructed a multi-filter neural network (MFNN) with stochastic gradient descent (SGD) and a back propagation optimizer for learning NN parameters. The strength of this paper is that the authors exploited a novel hybrid model constructed from different kinds of neural networks, which provides inspiration for constructing hybrid neural network structures.

Atsalakis and Valavanis in [1] proposed a neuro-fuzzy system, composed of a controller named the Adaptive Neuro Fuzzy Inference System (ANFIS), to achieve short-term stock price trend prediction. The noticeable strength of this work is the evaluation part. Not only did they compare their proposed system with popular data models, but they also compared it with investment strategies. The weakness that we found in their proposed solution is that the solution architecture lacks an optimization part, which might limit their model performance. Since our proposed solution also focuses on short-term stock price trend prediction, this work is instructive for our system design. Meanwhile, by comparing with the popular trading strategies of investors, their work inspired us to compare the strategies used by investors with the techniques used by researchers.

Nekoeiqachkanloo et al. in [29] proposed a system with two different approaches for stock investment. The strengths of their proposed solution are obvious. First, it is a comprehensive system that consists of data pre-processing and two different algorithms to suggest the best investment portions. Second, the system is also embedded with a forecasting component, which retains the features of the time series. Last but not least, their input features are a mix of fundamental features and technical indices that aim to fill the gap between the financial domain and the technical domain. However, their work has a weakness in the evaluation part. Instead of evaluating the proposed system on a large dataset, they chose 25 well-known stocks. There is a high possibility that well-known stocks share some common hidden features.

As another recent related work, Idrees et al. [14] published a time series-based prediction approach for the volatility of the stock market. Their approach is based on ARIMA, which is not a new approach in the time series prediction research domain. Their work focuses more on the feature engineering side. Before feeding the features into ARIMA models, they designed three steps for feature engineering: analyze the time series, identify whether the time series is stationary or not, and perform estimation by plotting ACF and PACF charts to look for parameters. The only weakness of their proposed solution is that the authors did not perform any customization on the existing ARIMA model, which might limit how far the system performance can be improved.
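
The stationarity check and ACF inspection described above can be sketched in a few lines. This is a minimal illustration of the sample autocorrelation and first-order differencing under assumed toy data, not the procedure of Idrees et al.; the names autocorrelation and difference are hypothetical, and the PACF is omitted for brevity.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, using the full-series mean and variance."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numer = sum(
        (series[t] - mu) * (series[t + lag] - mu)
        for t in range(len(series) - lag)
    )
    return numer / denom

def difference(series):
    """First-order differencing, a common way to remove a trend before fitting ARIMA."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical upward-trending prices, used only for illustration.
prices = [10.0, 10.4, 10.9, 11.1, 11.8, 12.0, 12.7, 13.1, 13.4, 14.0]
acf_raw = [autocorrelation(prices, lag) for lag in range(1, 4)]
acf_diff = [autocorrelation(difference(prices), lag) for lag in range(1, 4)]
```

An autocorrelation that decays slowly across lags in the raw series is a typical sign of non-stationarity, which is why the differenced series is inspected as well.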

One of the main weaknesses found in the related works is the limited data pre-processing mechanisms built and used. Technical works mostly tend to focus on building prediction models. When they select the features, they list all the features mentioned in previous works, run them through the feature selection algorithm, and then select the best-voted features. Related works in the investment domain have shown more interest in behavior analysis, such as how herding behaviors affect stock performance, or how the percentage of the firm’s common stock held by inside directors affects the performance of a certain stock. These behaviors often need a pre-processing procedure of standard technical indices and investment experience to recognize.

In the related works, a thorough statistical analysis is often performed on a special dataset to derive new features rather than performing feature selection. Some data, such as the percentage of a certain index fluctuation, has been proven to be effective on stock performance. We believe that extracting new features from data and then combining such features with existing common technical indices will significantly benefit the existing and well-tested prediction models.

The dataset

This section details the data that was extracted from the public data sources and the final dataset that was prepared. Stock market-related data are diverse, so we first compared the related works from the survey of financial research works in stock market data analysis to specify the data collection directions. After collecting the data, we defined a data structure for the dataset. Below, we describe the dataset in detail, including the data structure and the data tables in each category of data with the segment definitions.

Description of our dataset

In this section, we describe the dataset in detail. This dataset consists of 3558 stocks from the Chinese stock market. Besides the daily price data and daily fundamental data of each stock ID, we also collected the suspending and resuming history, top 10 shareholders, etc. We chose 2 years as the time span of this dataset for two reasons: (1) most investors perform stock market price trend analysis using the data within the latest 2 years, and (2) using more recent data would benefit the analysis result. We collected data through the open-sourced API Tushare [43]; meanwhile, we also leveraged a web-scraping technique to collect data from Sina Finance web pages and the SWS Research website.

Data structure

Figure 1 illustrates all the data tables in the dataset. We collected four categories of data in this dataset: (1) basic data, (2) trading data, (3) finance data, and (4) other reference data. All the data tables can be linked to each other by a common field called “Stock ID”, which is a unique stock identifier registered in the Chinese stock market. Table 1 shows an overview of the dataset.
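
As an illustration of how tables that share the “Stock ID” field can be linked, the following minimal sketch joins a trading table to a basic-data table. The field names (stock_id, name, date, close), the identifiers, and the values are all hypothetical placeholders and do not reflect the actual schema of the dataset.

```python
from collections import defaultdict

def join_on_stock_id(left_rows, right_rows):
    """Inner join: one output row per (left, right) pair sharing a stock_id."""
    by_id = defaultdict(list)
    for row in right_rows:
        by_id[row["stock_id"]].append(row)
    joined = []
    for left in left_rows:
        # Left rows without a matching stock_id on the right are dropped.
        for right in by_id.get(left["stock_id"], []):
            joined.append({**right, **left})
    return joined

# Hypothetical rows standing in for the basic data and trading data categories.
basic = [
    {"stock_id": "S0001", "name": "Alpha"},
    {"stock_id": "S0002", "name": "Beta"},
]
trading = [
    {"stock_id": "S0001", "date": "2018-01-02", "close": 10.0},
    {"stock_id": "S0001", "date": "2018-01-03", "close": 10.4},
    {"stock_id": "S0002", "date": "2018-01-02", "close": 25.1},
    {"stock_id": "S0003", "date": "2018-01-02", "close": 7.3},
]
joined = join_on_stock_id(trading, basic)
```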

Figure 1. Data structure for the extracted dataset

Table 1 lists the field information of each data table as well as the category to which each data table belongs.

Methods

In this section, we present the proposed methods and the design of the proposed solution. Moreover, we also introduce the architecture design as well as algorithmic and implementation details.

Problem statement

We analyzed the best possible approach for predicting short-term price trends from three aspects: feature engineering, financial domain knowledge, and prediction algorithm. We then posed one research question for each aspect, respectively: How can feature engineering benefit model prediction accuracy? How do findings from the financial domain benefit prediction model design? And what is the best algorithm for predicting short-term price trends?

The first research question is about feature engineering. We would like to know how the feature selection method benefits the performance of prediction models. From the abundance of previous works, we can conclude that stock price data is embedded with a high level of noise and that there are correlations between features, which makes price prediction notoriously difficult. That is also the primary reason why most of the previous works introduced a feature engineering part as an optimization module.

The second research question evaluates the effectiveness of the findings we extracted from the financial domain. Different from the previous works, besides the common evaluation of data models such as training costs and scores, our evaluation emphasizes the effectiveness of the newly added features that we extracted from the financial domain. We only obtained some specific findings from previous works, and the related raw data needs to be processed into usable features. After extracting the related features from the financial domain, we combine them with other common technical indices and vote out the features with a higher impact. Numerous features from the financial domain are said to be effective, and it would be impossible for us to cover all of them. Thus, how to appropriately convert the findings from the financial domain into a data processing module of our system design is a hidden research question that we attempt to answer.

The third research question is which algorithm we should use to model our data. In the previous works, researchers have been putting effort into exact price prediction. We decompose the problem into predicting the trend and then the exact number, and this paper focuses on the first step. Hence, the objective is converted into resolving a binary classification problem while finding an effective way to eliminate the negative effect brought by the high level of noise. Our approach is to decompose the complex problem into sub-problems that have fewer dependencies, resolve them one by one, and then compile the resolutions into an ensemble model as an aiding system for investment decision reference.

In the previous works, researchers have used a variety of models for predicting stock price trends. Since most of the best-performing models are based on machine learning techniques, in this work we compare our approach with the best-performing machine learning models in the evaluation part to answer this research question.

Proposed solution

The high-level architecture of our proposed solution can be separated into three parts. The first is the feature selection part, which guarantees that the selected features are highly effective. Second, we look into the data and perform dimensionality reduction. The last part, which is the main contribution of our work, is to build a prediction model of target stocks. Figure 2 depicts the high-level architecture of the proposed solution.

figure 2

High-level architecture of the proposed solution

There are different ways to classify stocks into categories. Some investors prefer long-term investments, while others show more interest in short-term investments. It is common to see stock-related reports showing an average performance while the stock price is increasing drastically; this is one of the phenomena indicating that stock price prediction has no fixed rules, so finding effective features before training a model on the data is necessary.

In this research, we focus on short-term price trend prediction. Currently, we only have raw data with no labels, so the very first step is to label the data. We mark the price trend by comparing the current closing price with the closing price of n trading days ago; n ranges from 1 to 10 since our research focuses on the short term. If the price goes up, we mark the trend as 1; otherwise, we mark it as 0. To be more specific, we use the indices of the ( n − 1)th day to predict the price trend of the n th day.
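The labeling rule above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `label_trend` and the choice to mark the first n days (which have no reference price) as `None` are assumptions made here.

```python
def label_trend(close, n):
    """Label each day 1 if its close is above the close n trading days earlier, else 0.

    The first n entries have no reference price, so they are set to None
    (an assumption of this sketch; the paper does not say how they are handled).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    labels = [None] * min(n, len(close))
    for t in range(n, len(close)):
        labels.append(1 if close[t] > close[t - n] else 0)
    return labels


# Hypothetical closing prices for five trading days.
closing_prices = [10.0, 10.5, 10.2, 10.2, 11.0]
daily_labels = label_trend(closing_prices, n=1)
two_day_labels = label_trend(closing_prices, n=2)
```

An unchanged price is labeled 0 here, which matches the rule that only a price increase is labeled 1.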

According to the previous works, some researchers who applied both financial domain knowledge and technical methods on stock data were using rules to filter the high-quality stocks. We referred to their works and exploited their rules to contribute to our feature extension design.

However, to ensure the best performance of the prediction model, we look into the data first. There are a large number of features in the raw data; if we take all of them into consideration, it will not only drastically increase the computational complexity but also cause side effects if we would like to perform unsupervised learning in further research. So, we leverage recursive feature elimination (RFE) to ensure that all the selected features are effective.

We found that most of the previous works in the technical domain analyzed all the stocks, while in the financial domain, researchers prefer to analyze specific investment scenarios. To fill the gap between the two domains, we decided to apply a feature extension based on the findings we gathered from the financial domain before starting the RFE procedure.

Since we plan to model the data as time series, the larger the number of features, the more complex the training procedure will be. So, we apply dimensionality reduction using randomized PCA at the beginning of our proposed solution architecture.

Detailed technical design elaboration

This section provides an elaboration of the detailed technical design as being a comprehensive solution based on utilizing, combining, and customizing several existing data preprocessing, feature engineering, and deep learning techniques. Figure  3 provides the detailed technical design from data processing to prediction, including the data exploration. We split the content by main procedures, and each procedure contains algorithmic steps. Algorithmic details are elaborated in the next section. The contents of this section will focus on illustrating the data workflow.

figure 3

Detailed technical design of the proposed solution

Based on the literature review, we select the most commonly used technical indices and then feed them into the feature extension procedure to get the expanded feature set. We then select the most effective i features from the expanded feature set and feed the data with the i selected features into the PCA algorithm to reduce the dimension to j features. After we get the best combination of i and j , we process the data into the finalized feature set and feed it into the LSTM [ 10 ] model to get the price trend prediction result.

The novelty of our proposed solution is that we will not only apply the technical method on raw data but also carry out the feature extensions that are used among stock market investors. Details on feature extension are given in the next subsection. Experiences gained from applying and optimizing deep learning based solutions in [ 37 , 38 ] were taken into account while designing and customizing feature engineering and deep learning solution in this work.

Applying feature extension

The first main procedure in Fig.  3 is the feature extension. In this block, the input data is the most commonly used technical indices concluded from related works. The three feature extension methods are max–min scaling, polarizing, and calculating fluctuation percentage. Not all the technical indices are applicable for all three of the feature extension methods; this procedure only applies the meaningful extension methods on technical indices. We choose meaningful extension methods while looking at how the indices are calculated. The technical indices and the corresponding feature extension methods are illustrated in Table  2 .

After the feature extension procedure, the expanded features are combined with the most commonly used technical indices, i.e., the input data is combined with the output data, and fed into the RFE block as input data in the next step.

Applying recursive feature elimination

After the feature extension above, we explore the most effective i features using the Recursive Feature Elimination (RFE) algorithm [ 6 ]. We estimate all the features by two attributes: coefficient and feature importance. We also limit the number of features removed from the pool at each step to one, which means we remove one feature at each step and retain all the relevant features. The output of the RFE block is then the input of the next step, PCA.

Applying principal component analysis (PCA)

The very first step before leveraging PCA is feature pre-processing. Some of the features after RFE are percentage data while others are very large numbers, i.e., the outputs of RFE are in different units, which affects the principal component extraction result. Thus, feature pre-processing is necessary before feeding the data into the PCA algorithm [ 8 ]. We also illustrate the effectiveness and comparison of the pre-processing methods in the “ Results ” section.

After performing feature pre-processing, the next step is to feed the processed data with the selected i features into the PCA algorithm to reduce the feature matrix scale to j features. This step retains as much effective information as possible while reducing the computational complexity of training the model. This research work also evaluates the best combination of i and j , i.e., the one that has relatively better prediction accuracy while cutting the computational cost. The result can be found in the “ Results ” section as well. After the PCA step, the system gets a reshaped matrix with j columns.

Fitting long short-term memory (LSTM) model

PCA reduces the dimensions of the input data, but data pre-processing is still mandatory before feeding the data into the LSTM layer. The reason is that the input matrix formed by principal components has no time steps, while the number of time steps is one of the most important parameters for training an LSTM. Hence, we have to model the matrix into corresponding time steps for both the training and testing datasets.

After the data pre-processing part, the last step is to feed the training data into the LSTM and evaluate the performance using the testing data. Because LSTM is a variant of the RNN, the NN structure is still a deep neural network even with one LSTM layer, since it can process sequential data and memorize its hidden states through time. An LSTM layer is composed of one or more LSTM units, and an LSTM unit consists of cells and gates to perform classification and prediction based on time series data.
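To make the role of the cells and gates concrete, the following is a minimal NumPy sketch of one forward step of a standard LSTM unit. It is the textbook formulation, not the authors' model; the function name `lstm_step`, the gate ordering in the stacked weight matrices, and the random weights in the usage lines are assumptions of this sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM unit.

    W has shape (4H, D), U has shape (4H, H), and b has shape (4H,).
    The assumed gate order is: input, forget, cell candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # new cell state (the memory carried through time)
    h = o * np.tanh(c)             # new hidden state
    return h, c


# Run a short sequence of j = 5 principal components per day through one unit layer.
rng = np.random.default_rng(0)
D, H = 5, 4
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(3, D)):   # three time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The hidden state `h` after the last time step is what an output layer would map to the 0/1 trend prediction.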

The LSTM structure is formed by two layers. The input dimension is determined by j after the PCA algorithm. The first layer is the input LSTM layer, and the second layer is the output layer. The final output is 0 or 1, indicating whether the predicted stock price trend is going down or going up, as a supporting suggestion for investors making their next investment decision.

Design discussion

Feature extension is one of the novelties of our proposed price trend predicting system. In the feature extension procedure, we use technical indices to collaborate with the heuristic processing methods learned from investors, which fills the gap between the financial research area and technical research area.

Since we propose a system for price trend prediction, feature engineering is extremely important to the final prediction result. The feature extension method helps guarantee that we do not miss potentially correlated features, and the feature selection method is necessary for pooling the effective features. The more irrelevant features are fed into the model, the more noise is introduced. Each main procedure was carefully considered for its contribution to the whole system design.

Besides the feature engineering part, we also leverage LSTM, a state-of-the-art deep learning method for time-series prediction, so that the prediction model can capture both complex hidden patterns and time-series related patterns.

It is known that the training cost of deep learning models is expensive in both time and hardware aspects; another advantage of our system design is the optimization procedure, PCA. It retains the principal components of the features while reducing the scale of the feature matrix, and thus helps the system save the training cost of processing the large time-series feature matrix.

Algorithm elaboration

This section provides comprehensive details on the algorithms we built by utilizing and customizing different existing techniques, including details about the terminologies, parameters, and optimizers. In the legend on the right side of Fig. 3, the algorithm steps are denoted as octagons; all of them can be found in this “ Algorithm elaboration ” section.

Before diving deep into the algorithm steps, here is a brief introduction to data pre-processing: since we use supervised learning algorithms, we also need to program the ground truth. The ground truth of this research is programmed by comparing the closing price of the current trading date with the closing price of the previous trading date the users want to compare with. A price increase is labeled as 1; otherwise, the ground truth is labeled as 0. Because this research work focuses not only on predicting the price trend of a specific period of time but on the short term in general, the ground truth is processed according to a range of trading days. Since the algorithms do not change with the prediction term length, we can regard the term length as a parameter.

The algorithmic details are elaborated in two parts. The first algorithm is the hybrid feature engineering part for preparing high-quality training and testing data. It corresponds to the Feature extension, RFE, and PCA blocks in Fig. 3. The second algorithm is the LSTM procedure block, including time-series data pre-processing, NN construction, training, and testing.

Algorithm 1: Short-term stock market price trend prediction—applying feature engineering using FE + RFE + PCA

The function FE corresponds to the feature extension block. For the feature extension procedure, we apply three different processing methods to translate the findings from the financial domain into a technical module in our system design. Since not all the indices are suitable for expansion, we only choose the proper method(s) for certain features to perform the feature extension (FE), according to Table 2.

The normalize method preserves the relative frequencies of the terms and transforms the technical indices into the range [0, 1]. Polarize is a well-known method often used by real-world investors, who sometimes prefer to consider only whether a technical index value is above or below zero; we process some of the features using the polarize method in preparation for RFE. Max-min (or min-max) [ 35 ] scaling is a transformation method often used as an alternative to zero-mean, unit-variance scaling. Another well-known method is the fluctuation percentage; we transform the fluctuation percentage of the technical indices into the range [− 1, 1].
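The extension methods described above can be sketched in plain Python as follows. The paper does not give formulas for them, so several details are assumptions of this sketch: polarize is encoded as 1 for values above zero and 0 otherwise, the fluctuation percentage is the relative change from the previous trading day, and values outside [− 1, 1] are clipped to that range.

```python
def max_min_scale(values):
    """Map values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def polarize(values):
    """Keep only the sign information: 1 if above zero, else 0 (assumed encoding)."""
    return [1 if v > 0 else 0 for v in values]


def fluctuation_percentage(values):
    """Relative change from the previous day, clipped to [-1, 1] (assumed clipping)."""
    if not values:
        return []
    out = [0.0]  # the first day has no previous value to compare with
    for prev, cur in zip(values, values[1:]):
        change = 0.0 if prev == 0 else (cur - prev) / abs(prev)
        out.append(max(-1.0, min(1.0, change)))
    return out


# Hypothetical index values for four trading days.
index_values = [10.0, 11.0, 5.5, 22.0]
scaled = max_min_scale(index_values)
signs = polarize([-0.3, 0.0, 1.2])
changes = fluctuation_percentage(index_values)
```

Clipping is only one way to bound the fluctuation percentage; a different bounding rule would change the tails of the feature but not its sign.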

The function RFE () in the first algorithm refers to recursive feature elimination. Before we perform the training data scale reduction, we will have to make sure that the features we selected are effective. Ineffective features will not only drag down the classification precision but also add more computational complexity. For the feature selection part, we choose recursive feature elimination (RFE). As [ 45 ] explained, the process of recursive feature elimination can be split into the ranking algorithm, resampling, and external validation.

The ranking algorithm fits the model to the features and ranks them by their importance to the model. We set the parameter to retain i features; each iteration of feature selection retains the Si top-ranked features, then refits the model and assesses the performance again before beginning another iteration. The ranking algorithm eventually determines the top Si features.
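The fit, rank, and eliminate loop can be sketched with NumPy as follows. Note the substitution: the paper's estimator is a linear-kernel SVR, whereas this sketch ranks features by the absolute coefficients of an ordinary least-squares fit on standardized features, purely to stay self-contained. The function name `rfe_rank` and the synthetic data are hypothetical.

```python
import numpy as np


def rfe_rank(X, y, n_keep):
    """Recursively drop the feature with the smallest |coefficient|, one per step.

    Returns (indices of the retained features, indices in elimination order).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > n_keep:
        Xr = X[:, remaining]
        std = Xr.std(axis=0)
        std[std == 0] = 1.0                      # guard against constant columns
        Xs = (Xr - Xr.mean(axis=0)) / std        # make coefficient magnitudes comparable
        coef, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
        worst = int(np.argmin(np.abs(coef)))     # least important feature in this round
        eliminated.append(remaining.pop(worst))  # remove exactly one feature, then refit
    return remaining, eliminated


# Synthetic example: only features 0 and 3 carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 3] + 0.01 * rng.normal(size=200)
kept, dropped = rfe_rank(X_demo, y_demo, n_keep=2)
```

Removing one feature per iteration is slower than removing several at once, but it lets the ranking of the remaining features be reassessed after every refit.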

The RFE algorithm is known to suffer from the over-fitting problem. To mitigate over-fitting, we run the RFE algorithm multiple times on randomly selected stocks as the training set and ensure that all the features we select are high-weighted. This procedure is called data resampling. Resampling can be built as an optimization step in an outer layer around the RFE algorithm.
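One simple way to combine the repeated RFE runs is to count how often each feature is selected and keep the most frequently selected ones. The sketch below assumes this counting scheme and an alphabetical tie-break; the paper does not specify either, and the feature names in the example are illustrative.

```python
from collections import Counter


def vote_features(selections, n_keep):
    """Keep the n_keep features selected most often across resampled RFE runs.

    `selections` holds one collection of selected feature names per run.
    Ties are broken alphabetically (an assumption of this sketch).
    """
    votes = Counter()
    for selected in selections:
        votes.update(set(selected))  # each run votes at most once per feature
    ranked = sorted(votes, key=lambda name: (-votes[name], name))
    return ranked[:n_keep]


# Hypothetical outputs of three RFE runs on different random subsets of stocks.
runs = [
    ["RSI_5", "SLOWK", "MACD"],
    ["RSI_5", "SLOWD", "SLOWK"],
    ["RSI_5", "MACD", "SLOWD"],
]
top_features = vote_features(runs, n_keep=2)
```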

The last part of our hybrid feature engineering algorithm is for optimization purposes. For the training data matrix scale reduction, we apply Randomized principal component analysis (PCA) [ 31 ], before we decide the features of the classification model.

Financial ratios of a listed company are used to present its growth ability, earning ability, solvency ability, etc. Each financial ratio consists of a set of technical indices; each technical index (or feature) we add contributes another column to the data matrix, which results in low training efficiency and redundancy. If non-relevant or less relevant features are included in the training data, the precision of classification also decreases.

figure a

The above equation represents the explanatory power of the principal components extracted by the PCA method for the original data. If the ACR is below 85%, the PCA method would be unsuitable due to the loss of original information. Because the covariance matrix is sensitive to the order of magnitude of the data, a data standardization procedure should precede PCA. The commonly used standardization methods are mean-standardization and normal-standardization, given below:

Mean-standardization: \(X_{ij}^{*} = X_{ij} /\overline{{X_{j} }}\) , where \(\overline{{X_{j} }}\) represents the mean value.

Normal-standardization: \(X_{ij}^{*} = (X_{ij} - \overline{{X_{j} }} )/s_{j}\) , where \(\overline{{X_{j} }}\) represents the mean value, and \(s_{j}\) is the standard deviation.
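The two standardization formulas, together with the 85% check, can be sketched in NumPy as follows. ACR is read here as the accumulated contribution rate, i.e., the sum of the k largest eigenvalues of the covariance matrix divided by the sum of all eigenvalues; since the original equation is not reproduced in the text, this standard definition is an assumption. Using the sample standard deviation (`ddof=1`) for s_j is also an assumption.

```python
import numpy as np


def mean_standardize(X):
    """X*_ij = X_ij / mean_j (column-wise)."""
    X = np.asarray(X, dtype=float)
    return X / X.mean(axis=0)


def normal_standardize(X):
    """X*_ij = (X_ij - mean_j) / s_j, with s_j the sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def accumulated_contribution_rate(X, k):
    """Share of total variance explained by the k leading principal components."""
    X = np.asarray(X, dtype=float)
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    eigvals = np.sort(eigvals)[::-1]  # largest eigenvalue first
    return float(eigvals[:k].sum() / eigvals.sum())


# Two perfectly correlated columns on very different scales.
X_demo = [[1, 10], [2, 20], [3, 30]]
X_std = normal_standardize(X_demo)
acr = accumulated_contribution_rate(X_std, k=1)
pca_is_suitable = acr >= 0.85  # the threshold quoted in the text
```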

The array fe_array is defined according to Table 2: each row maps to a feature, and columns 0, 1, 2, 3 denote the extension methods normalize, polarize, max–min scale, and fluctuation percentage, respectively. We fill in the values of the array by the rule that 0 means the feature does not need to be expanded and 1 means the corresponding extension method is applied to the feature. The final algorithm of data preprocessing using RFE and PCA is illustrated as Algorithm 1.
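A small sketch of how such a 0/1 array could drive the extension step is shown below. The helper `expand_features`, the naming of the new columns, and the two-method toy mask are hypothetical; the actual contents of fe_array come from Table 2, which is not reproduced here.

```python
def expand_features(columns, names, fe_array, methods):
    """Apply each flagged extension method to each feature and keep the originals.

    columns : dict mapping feature name -> list of values
    names   : feature names, in the same order as the rows of fe_array
    fe_array: rows of 0/1 flags, one flag per extension method
    methods : list of (suffix, function) pairs, in the same order as the flag columns
    """
    expanded = dict(columns)  # original features are retained alongside the new ones
    for name, flags in zip(names, fe_array):
        for flag, (suffix, fn) in zip(flags, methods):
            if flag == 1:
                expanded[f"{name}_{suffix}"] = fn(columns[name])
    return expanded


def _polarize(values):
    return [1 if v > 0 else 0 for v in values]


def _max_min(values):
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


# Toy example with two illustrative features and two methods.
columns = {"MACD": [-1.0, 2.0], "RSI_5": [30.0, 70.0]}
names = ["MACD", "RSI_5"]
fe_array = [[1, 0],   # MACD: polarize only
            [0, 1]]   # RSI_5: max-min scale only
methods = [("plr", _polarize), ("mm", _max_min)]
expanded = expand_features(columns, names, fe_array, methods)
```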

Algorithm 2: Price trend prediction model using LSTM

After the principal component extraction, we get the scale-reduced matrix, which means the i most effective features are converted into j principal components for training the prediction model. We utilized an LSTM model and added a conversion procedure for our stock price dataset. The detailed algorithm design is illustrated in Algorithm 2. The function TimeSeriesConversion () converts the principal components matrix into time series by shifting the input data frame according to the number of time steps [ 3 ], i.e., the term length in this research. The processed dataset consists of the input sequence and the forecast sequence. In this research, the parameter LAG is 1, because the model detects the pattern of feature fluctuation on a daily basis. Meanwhile, N_TIME_STEPS varies from 1 trading day to 10 trading days. The functions DataPartition (), FitModel (), and EvaluateModel () are regular steps without customization. The NN structure design, optimizer decision, and other parameters are illustrated in the function ModelCompile () .
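The conversion into time steps can be sketched as a sliding window over the rows of the principal component matrix. This is one plausible reading of TimeSeriesConversion (), not the authors' code: each sample stacks `n_time_steps` consecutive rows, and its target is the label `lag` rows after the window's last row. The function name and argument names below are assumptions.

```python
def time_series_conversion(rows, labels, n_time_steps, lag=1):
    """Turn a (days x features) matrix into windows of shape (n_time_steps x features).

    Returns (samples, targets); windows without a label `lag` days ahead are dropped.
    """
    samples, targets = [], []
    last_start = len(rows) - n_time_steps - lag
    for start in range(last_start + 1):
        end = start + n_time_steps                  # exclusive end of the window
        samples.append([list(r) for r in rows[start:end]])
        targets.append(labels[end - 1 + lag])       # label `lag` days after the window
    return samples, targets


# Five trading days with one principal component each (hypothetical values).
pc_rows = [[0.1], [0.2], [0.3], [0.4], [0.5]]
trend = [0, 1, 1, 0, 1]
samples, targets = time_series_conversion(pc_rows, trend, n_time_steps=2, lag=1)
```

The resulting samples have the three-dimensional layout (samples, time steps, features) that LSTM layers generally expect.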

Some procedures impact the efficiency but do not affect the accuracy or precision and vice versa, while other procedures may affect both efficiency and prediction result. To fully evaluate our algorithm design, we structure the evaluation part by main procedures and evaluate how each procedure affects the algorithm performance. First, we evaluated our solution on a machine with 2.2 GHz i7 processor, with 16 GB of RAM. Furthermore, we also evaluated our solution on Amazon EC2 instance, 3.1 GHz Processor with 16 vCPUs, and 64 GB RAM.

In the implementation part, we expanded 20 features into 54 features and retained the 30 most effective ones. In this section, we discuss the evaluation of feature selection. The dataset was divided into two different subsets, i.e., training and testing datasets. The test procedure included two parts: one testing dataset is for feature selection, and the other is for model testing. We denote the feature selection dataset and the model testing dataset as DS_test_f and DS_test_m, respectively.

We randomly selected two-thirds of the stock data by stock ID for RFE training and denote this dataset as DS_train_f; all the data consist of the full technical indices and expanded features throughout 2018. The estimator of the RFE algorithm is SVR with a linear kernel. We rank the 54 features by voting to get 30 effective features, then process them using the PCA algorithm to perform dimension reduction and reduce the features to 20 principal components. The rest of the stock data forms the testing dataset DS_test_f, used to validate the effectiveness of the principal components we extracted from the selected features. We reformed all the data from 2018 as the training dataset of the data model, denoted as DS_train_m. The model testing dataset DS_test_m consists of the first 3 months of data in 2019, which has no overlap with the datasets we utilized in the previous steps. This approach prevents the hidden problem caused by overfitting.
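The split by stock ID can be sketched as follows, assuming a seeded shuffle for reproducibility; the function name `split_stock_ids` and the example IDs are hypothetical. Splitting by stock ID rather than by row keeps all records of one stock on the same side of the split.

```python
import random


def split_stock_ids(stock_ids, train_fraction=2 / 3, seed=0):
    """Randomly assign whole stocks to a training set and a testing set."""
    ids = sorted(set(stock_ids))       # de-duplicate and fix the order before shuffling
    rng = random.Random(seed)          # local generator, so global state is untouched
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])


# Nine hypothetical stock IDs; two-thirds go to DS_train_f, the rest to DS_test_f.
all_ids = [f"{k:06d}" for k in range(1, 10)]
train_ids, test_ids = split_stock_ids(all_ids)
```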

Term length

To build an efficient prediction model, instead of modeling the data as time series, we decided to use the indices data of 1 day ahead to predict the price trend of the next day. We tested the RFE algorithm on a range of short terms from 1 day to 2 weeks (ten trading days) to evaluate how the commonly used technical indices correlate with price trends. For evaluating the prediction term length, we fully expanded the features according to Table 2 and fed them to RFE. During the test, we found that different term lengths have different levels of sensitivity to the same indices set.

We take the close price of the first trading date and compare it with the close price of the n th trading date. Since we are predicting the price trend, we do not consider term lengths whose cross-validation score is below 0.5. After the test, as we can see from Fig. 4, there are three term lengths that are most sensitive to the indices we selected from the related works: n = {2, 5, 10}. This indicates that price trend predictions for every other day, 1 week, and 2 weeks using the indices set are likely to be more reliable.

figure 4

How do term lengths affect the cross-validation score of RFE

While these curves have different patterns, for the length of 2 weeks, the cross-validation score increases with the number of features selected. If the prediction term length is 1 week, the cross-validation score decreases when more than 8 features are selected. For every-other-day price trend prediction, the best cross-validation score is achieved by selecting 48 features. Biweekly prediction requires 29 features to achieve the best score. In Table 3, we list the top 15 effective features for these three term lengths. If we predict the price trend of every other day, the cross-validation score merely fluctuates with the number of features selected. So, in the next step, we evaluate the RFE result for these three term lengths, as shown in Fig. 4.

We compare the output feature set of RFE with the all-original feature set as a baseline. The all-original feature set consists of n features, and we choose the n most effective features from the RFE output features to evaluate the result using linear SVR. We used two different approaches to evaluate feature effectiveness. The first method is to combine all the data into one large matrix and evaluate it by running the RFE algorithm once. The other method is to run RFE for each individual stock and calculate the most effective features by voting.

Feature extension and RFE

From the result of the previous subsection, we can see that when predicting the price trend for every other day or biweekly, the best result is achieved by selecting a large number of features. Within the selected features, some features processed from extension methods have better ranks than original features, which proves that the feature extension method is useful for optimizing the model. The feature extension affects both precision and efficiency, while in this part, we only discuss the precision aspect and leave efficiency part in the next step since PCA is the most effective method for training efficiency optimization in our design. We involved an evaluation of how feature extension affects RFE and use the test result to measure the improvement of involving feature extension.

We further test the effectiveness of feature extension, i.e., whether polarize, max–min scale, and fluctuation percentage work better than the original technical indices. The best case for this test is the weekly prediction, since it has the fewest effective features selected. From the result of the last section, we know the best cross-validation score appears when selecting 8 features. The test consists of two steps. The first step is to test the feature set formed by original features only; in this case, only SLOWK, SLOWD, and RSI_5 are included. The next step is to test the feature set of all 8 features we selected in the previous subsection. We ran the test by defining the simplest DNN model with three layers.

The normalized confusion matrices of testing the two feature sets are illustrated in Fig. 5. The left one is the confusion matrix of the feature set with expanded features, and the right one is the test result of using original features only. The precisions of true positive and true negative improved by 7% and 10%, respectively, which indicates that our feature extension method design is reasonably effective.
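For reference, a normalized confusion matrix for binary labels can be computed as in the sketch below. Normalizing each row by the number of samples with that true label is an assumption here; the text does not state which normalization the figures use, and the labels in the example are made up.

```python
def normalized_confusion_matrix(y_true, y_pred):
    """Return a 2x2 matrix m where m[t][p] is the share of true-label-t samples predicted as p."""
    counts = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical ground truth and predictions (1 = price goes up, 0 = otherwise).
truth = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]
cm = normalized_confusion_matrix(truth, predicted)
true_negative_rate, true_positive_rate = cm[0][0], cm[1][1]
```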

figure 5

Confusion matrix of validating feature extension effectiveness

Feature reduction using principal component analysis

PCA affects the algorithm performance in both prediction accuracy and training efficiency. Because this part should be evaluated together with the NN model, we again use the simplest three-layer DNN model from the previous step to perform the evaluation. This part introduces the evaluation method and the result of the optimization part of the model from the perspectives of computational efficiency and accuracy impact.

In this section, we choose bi-weekly prediction to perform a use case analysis, since it has a smoothly increasing cross-validation score curve; moreover, unlike every-other-day prediction, it has already excluded more than 20 ineffective features. In the first step, we select all 29 effective features and train the NN model without performing PCA. This creates a baseline of accuracy and training time for comparison. To evaluate the accuracy and efficiency, we set the number of principal components to 5, 10, 15, 20, and 25. Table 4 records how the number of features affects the model training efficiency, and the stacked bar chart in Fig. 6 illustrates how PCA affects training efficiency. Table 6 shows the accuracy and efficiency analysis of different procedures for the pre-processing of features. The times shown in Tables 4 and 6 are based on experiments conducted on a standard user machine to show the viability of our solution with limited or average resource availability.

figure 6

Relationship between feature number and training time

We also list the confusion matrix of each test in Fig. 7. The stacked bar chart shows that the overall time spent on training the model decreases as the number of selected features decreases, and that the PCA method is significantly effective in optimizing training dataset preparation. For the time spent on the training stage, PCA is not as effective as it is for the data preparation stage. It is possible that the optimization effect of PCA is not drastic enough because of the simple structure of the NN model.

figure 7

How does the number of principal components affect evaluation results

Table 5 indicates that the overall prediction accuracy is not drastically affected by reducing the dimension. However, accuracy alone cannot fully establish that PCA has no side effect on model prediction, so we looked into the confusion matrices of the test results.

From Fig. 7 we can conclude that PCA does not have a severe negative impact on prediction precision. The true positive rate and false positive rate are barely affected, while the false negative and true negative rates are influenced by 2% to 4%. Besides evaluating how the number of selected features affects the training efficiency and model performance, we also tested how data pre-processing procedures affect the training procedure and the prediction result. Normalizing and max–min scaling are the most commonly seen data pre-processing procedures performed before PCA, since the measurement units of features vary, and they are said to increase the training efficiency afterward.

We ran another test that adds pre-processing procedures before extracting 20 principal components from the original dataset, and compared the elapsed time of the training stage and the prediction precision. However, the test results lead to different conclusions. From Table 6 we can conclude that feature pre-processing does not have a significant impact on training efficiency, but it does influence the model prediction accuracy. Moreover, the first confusion matrix in Fig. 8 indicates that without any feature pre-processing procedure, the false-negative rate and true-negative rate are severely affected, while the true-positive rate and false-positive rate are not affected. If normalization is performed before PCA, both the true-positive rate and the true-negative rate decrease by approximately 10%. This test also showed that the best feature pre-processing method for our feature set is the max–min scale.

figure 8

Confusion matrices of different feature pre-processing methods

In this section, we discuss and compare the results of our proposed model, other approaches, and the most related works.

Comparison with related works

From the previous works, we found that the most commonly exploited models for short-term stock market price trend prediction are support vector machine (SVM), multilayer perceptron artificial neural network (MLP), Naive Bayes classifier (NB), random forest classifier (RAF), and logistic regression classifier (LR). The test case for comparison is also bi-weekly price trend prediction; to evaluate the best result of all models, we keep all 29 features selected by the RFE algorithm. For the MLP evaluation, to test whether the number of hidden layers would affect the metric scores, we denoted the layer number as n and tested n = {1, 3, 5} with 150 training epochs for all the tests. We found only slight differences in model performance, which indicates that the number of MLP layers hardly affects the metric scores.

From the confusion matrices in Fig. 9, we can see that all the machine learning models perform well when trained with the full feature set we selected by RFE. From the perspective of training time, the NB model was the most efficient to train. The LR algorithm cost less training time than other algorithms while achieving a prediction result similar to that of costlier models such as SVM and MLP. The RAF algorithm achieved a relatively high true-positive rate but performed poorly in predicting negative labels. Our proposed LSTM model achieves a binary accuracy of 93.25%, which is a high accuracy for predicting the bi-weekly price trend. We also pre-processed the data through PCA to get five principal components, then trained for 150 epochs. The learning curve of our proposed solution, based on feature engineering and the LSTM model, is illustrated in Fig. 10. The confusion matrix is the figure on the right in Fig. 11, and detailed metric scores can be found in Table 9.

Fig. 9 Model prediction comparison: confusion matrices

Fig. 10 Learning curve of proposed solution

Fig. 11 Proposed model prediction precision comparison: confusion matrices

The detailed evaluation results are recorded in Table 7. We discuss the evaluation results in the next section.

Because the result structure of our proposed solution differs from that of most related works, a naïve comparison with previous works is difficult. For example, it is hard to find the exact price trend prediction accuracy in most related works, since the authors prefer to report the gain rate of a simulated investment. The gain rate is a derived number based on simulated investment tests; sometimes a single correct investment decision with a large trading volume can achieve a high gain rate regardless of the price trend prediction accuracy. In addition, as a distinctive and heuristic design choice in our proposed solution, we transform the problem of directly predicting an exact price into two sequential problems: we first predict the price trend, focusing on building an accurate binary classification model, which lays a solid foundation for predicting the exact price change in future work. Besides the different result structure, the datasets studied in previous works also differ from ours. Some previous works include news data for sentiment analysis and use the SE part as another system component to support their prediction model.
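The gap between gain rate and trend accuracy can be illustrated with a small sketch. The trading rule (invest a stake whenever the prediction is "up"), the function names, and all numbers below are hypothetical assumptions for illustration; they are not the simulation protocol of any cited work.

```python
def accuracy(predicted_up, actual_up):
    """Share of days on which the predicted direction matches the actual one."""
    hits = sum(p == a for p, a in zip(predicted_up, actual_up))
    return hits / len(actual_up)


def gain_rate(predicted_up, returns, stakes):
    """Profit divided by capital invested, investing only on 'up' predictions."""
    invested = sum(s for p, s in zip(predicted_up, stakes) if p)
    profit = sum(s * r for p, r, s in zip(predicted_up, returns, stakes) if p)
    return profit / invested if invested else 0.0


# Hypothetical five-day simulation: one large, correct trade and four
# small, wrong ones.
returns = [0.10, -0.01, -0.01, -0.01, -0.01]
actual_up = [r > 0 for r in returns]
predicted_up = [True, True, True, True, True]
stakes = [1000, 10, 10, 10, 10]

acc = accuracy(predicted_up, actual_up)
gain = gain_rate(predicted_up, returns, stakes)
print(f"accuracy={acc:.0%}, gain rate={gain:.1%}")
```

In this made-up case only one of five direction calls is correct, yet the gain rate is clearly positive because the single correct call carried almost all of the invested capital.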

The latest comparable related work is Zubair et al. [ 47 ], in which the authors use multiple R-squared to measure model accuracy. Multiple R-squared is also called the coefficient of determination, and it shows the strength of the predictor variables in explaining the variation in stock return [ 28 ]. They used three datasets (KSE 100 Index, Lucky Cement Stock, Engro Fertilizer Limited) to evaluate the proposed multiple regression model and achieved 95%, 89%, and 97%, respectively. Except for the KSE 100 Index, the datasets in this related work are individual stocks; thus, we use the evaluation result of their proposed model on the first dataset.
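For reference, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses this standard definition with made-up values; the function name and the data are hypothetical and unrelated to the datasets of Zubair et al.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and regression predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(observed, predicted)
print(round(r2, 4))
```

Note that this score measures how much of the variation a regression explains, whereas binary accuracy counts correct up/down calls, so the two numbers are only loosely comparable.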

We list the performance of the leading stock price trend prediction models in Table 8. On the comparable metrics, the scores of our proposed solution are generally better than those of the related works. Instead of concluding arbitrarily that our proposed model outperforms the models in related works, we first look into the dataset column of Table 8. Khaidem and Dey [ 18 ] only trained and tested their proposed solution on three individual stocks, which makes it difficult to demonstrate the generalization of their model. Ayo [ 2 ] analyzed stock data from the New York Stock Exchange (NYSE), but the weakness is that the analysis covers only the closing price, a feature with high noise. Zubair et al. [ 47 ] trained their proposed model on both individual stocks and an index price, but as mentioned in the previous section, an index price covers only a limited number of features and stock IDs, which further affects the model training quality. For our proposed solution, we collected sufficient data from the Chinese stock market and applied the FE + RFE algorithm to the original indices to obtain more effective features; the comprehensive evaluation over 3558 stock IDs can reasonably demonstrate the generalization and effectiveness of our proposed solution in the Chinese stock market. However, Khaidem and Dey [ 18 ] and Ayo [ 2 ] analyzed the stock market in the United States, Zubair et al. [ 47 ] analyzed Pakistani stock market prices, and we obtained our dataset from the Chinese stock market. The policies of different countries might affect model performance, which needs further research to validate.

Proposed model evaluation—PCA effectiveness

Besides comparing performance across popular machine learning models, we also evaluated how the PCA algorithm optimizes the training procedure of the proposed LSTM model. Fig. 11 compares the confusion matrices obtained by training the model with 29 features and with five principal components. Training with the full 29 features takes 28.5 s per epoch on average, whereas training on the five principal components takes only 18 s per epoch on average. PCA thus reduces the per-epoch training time of the LSTM model by 36.8%. The detailed metrics are listed in Table 9. We discuss the complexity analysis in the next section.
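The 36.8% figure follows directly from the two per-epoch timings reported above, as the short check below shows. The variable names are illustrative only.

```python
# Average seconds per epoch, as reported in the text.
full_features_s = 28.5  # training on all 29 features
pca_features_s = 18.0   # training on five principal components

# Relative reduction in per-epoch training time.
reduction = (full_features_s - pca_features_s) / full_features_s
print(f"{reduction:.1%}")  # 36.8%
```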

Complexity analysis of proposed solution

This section analyzes the complexity of our proposed solution. Long short-term memory (LSTM) differs from other NNs: it is a variant of the standard RNN, which also has time steps, with memory and a gate architecture. In previous work [ 46 ], the authors analyzed the architectural complexity of RNNs. They introduced a method that regards an RNN as a directed acyclic graph and proposed the concept of recurrent depth, which helps analyze the intricacy of an RNN.

The recurrent depth is a positive rational number, and we denote it as \(d_{rc}\) . As \(n\) grows, \(d_{rc}\) measures the average maximum number of nonlinear transformations per time step. We then unfold the directed acyclic graph of the RNN and denote the processed graph as \(g_{c}\) ; meanwhile, we denote \(C(g_{c} )\) as the set of directed cycles in this graph. For a vertex \(v\) , we denote \(\sigma_{s} (v)\) as the sum of edge weights and \(l(v)\) as the length. The equation below is proved under a mild assumption, which can be found in [ 46 ].

They also found another crucial factor that impacts the performance of LSTM: the recurrent skip coefficient. We denote \(s_{rc}\) as the reciprocal of the recurrent skip coefficient. Note that \(s_{rc}\) is also a positive rational number.

According to the above definitions, our proposed model is a 2-layer stacked LSTM, for which \(d_{rc} = 2\) and \(s_{rc} = 1\) . From the experiments in the previous work, the authors also found that, when facing problems with long-term dependencies, LSTMs may benefit from decreasing the reciprocal of the recurrent skip coefficient and from increasing the recurrent depth. These empirical findings are useful for further enhancing the performance of our proposed model.

This work consists of three parts: data extraction and pre-processing of the Chinese stock market dataset, feature engineering, and a stock price trend prediction model based on long short-term memory (LSTM). We collected, cleaned, and structured 2 years of Chinese stock market data. We reviewed different techniques often used by real-world investors and developed a new algorithm component, named feature extension, which proved to be effective. We applied the feature extension (FE) approaches with recursive feature elimination (RFE), followed by principal component analysis (PCA), to build a feature engineering procedure that is both effective and efficient. The system is customized by assembling the feature engineering procedure with an LSTM prediction model, and it achieves a high prediction accuracy that outperforms the leading models in most related works. We also carried out a comprehensive evaluation of this work. By comparing the most frequently used machine learning models with our proposed LSTM model under the feature engineering part of our proposed system, we arrive at many heuristic findings that could become future research questions in both technical and financial research domains.

Our proposed solution is a unique customization compared with previous works: rather than proposing yet another state-of-the-art LSTM model, we propose a fine-tuned and customized deep learning prediction system that combines comprehensive feature engineering with an LSTM to perform prediction. Building on the observations from previous works, we fill the gap between investors and researchers by applying a feature extension algorithm before recursive feature elimination, which yields a noticeable improvement in model performance.

Although our proposed solution achieves a decent outcome, this research leaves room for future work. During the evaluation, we also found that the RFE algorithm is not sensitive to term lengths other than 2-day, weekly, and biweekly. More in-depth research into which technical indices influence the irregular term lengths is a possible future direction. Moreover, combining the latest sentiment analysis techniques with feature engineering and a deep learning model has high potential for developing a more comprehensive prediction system trained on diverse types of information such as tweets, news, and other text-based data.

Abbreviations

LSTM: Long short term memory

PCA: Principal component analysis

RNN: Recurrent neural networks

ANN: Artificial neural network

DNN: Deep neural network

DTW: Dynamic Time Warping

RFE: Recursive feature elimination

SVM: Support vector machine

CNN: Convolutional neural network

SGD: Stochastic gradient descent

ReLU: Rectified linear unit

MLP: Multi layer perceptron

Atsalakis GS, Valavanis KP. Forecasting stock market short-term trends using a neuro-fuzzy based methodology. Expert Syst Appl. 2009;36(7):10696–707.

Ayo CK. Stock price prediction using the ARIMA model. In: 2014 UKSim-AMSS 16th international conference on computer modelling and simulation. 2014. https://doi.org/10.1109/UKSim.2014.67 .

Brownlee J. Deep learning for time series forecasting: predict the future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery. 2018. https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

Eapen J, Bein D, Verma A. Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction. In: 2019 IEEE 9th annual computing and communication workshop and conference (CCWC). 2019. pp. 264–70. https://doi.org/10.1109/CCWC.2019.8666592 .

Fischer T, Krauss C. Deep learning with long short-term memory networks for financial market predictions. Eur J Oper Res. 2018;270(2):654–69. https://doi.org/10.1016/j.ejor.2017.11.054 .

Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn 2002;46:389–422.

Hafezi R, Shahrabi J, Hadavandi E. A bat-neural network multi-agent system (BNNMAS) for stock price prediction: case study of DAX stock price. Appl Soft Comput J. 2015;29:196–210. https://doi.org/10.1016/j.asoc.2014.12.028 .

Halko N, Martinsson PG, Tropp JA. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 2011;53(2):217–88.

Hassan MR, Nath B. Stock market forecasting using Hidden Markov Model: a new approach. In: Proceedings—5th international conference on intelligent systems design and applications 2005, ISDA’05. 2005. pp. 192–6. https://doi.org/10.1109/ISDA.2005.85 .

Hochreiter S, Schmidhuber J. Long short-term memory. J Neural Comput. 1997;9(8):1735–80. https://doi.org/10.1162/neco.1997.9.8.1735 .

Hsu CM. A hybrid procedure with feature selection for resolving stock/futures price forecasting problems. Neural Comput Appl. 2013;22(3–4):651–71. https://doi.org/10.1007/s00521-011-0721-4 .

Huang CF, Chang BR, Cheng DW, Chang CH. Feature selection and parameter optimization of a fuzzy-based stock selection model using genetic algorithms. Int J Fuzzy Syst. 2012;14(1):65–75. https://doi.org/10.1016/J.POLYMER.2016.08.021 .

Huang CL, Tsai CY. A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting. Expert Syst Appl. 2009;36(2 PART 1):1529–39. https://doi.org/10.1016/j.eswa.2007.11.062 .

Idrees SM, Alam MA, Agarwal P. A prediction approach for stock market volatility based on time series data. IEEE Access. 2019;7:17287–98. https://doi.org/10.1109/ACCESS.2019.2895252 .

Ince H, Trafalis TB. Short term forecasting with support vector machines and application to stock price prediction. Int J Gen Syst. 2008;37:677–87. https://doi.org/10.1080/03081070601068595 .

Jeon S, Hong B, Chang V. Pattern graph tracking-based stock price prediction using big data. Future Gener Comput Syst. 2018;80:171–87. https://doi.org/10.1016/j.future.2017.02.010 .

Kara Y, Acar Boyacioglu M, Baykan ÖK. Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul Stock Exchange. Expert Syst Appl. 2011;38(5):5311–9. https://doi.org/10.1016/j.eswa.2010.10.027 .

Khaidem L, Dey SR. Predicting the direction of stock market prices using random forest. 2016. pp. 1–20.

Kim K, Han I. Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Syst Appl. 2000;19:125–32. https://doi.org/10.1016/S0957-4174(00)00027-0 .

Lee MC. Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Syst Appl. 2009;36(8):10896–904. https://doi.org/10.1016/j.eswa.2009.02.038 .

Lei L. Wavelet neural network prediction method of stock price trend based on rough set attribute reduction. Appl Soft Comput J. 2018;62:923–32. https://doi.org/10.1016/j.asoc.2017.09.029 .

Lin X, Yang Z, Song Y. Short-term stock price prediction based on echo state networks. Expert Syst Appl. 2009;36(3):7313–7. https://doi.org/10.1016/j.eswa.2008.09.049 .

Liu G, Wang X. A new metric for individual stock trend prediction. Eng Appl Artif Intell. 2019;82(March):1–12. https://doi.org/10.1016/j.engappai.2019.03.019 .

Liu S, Zhang C, Ma J. CNN-LSTM neural network model for quantitative strategy analysis in stock markets. 2017;1:198–206. https://doi.org/10.1007/978-3-319-70096-0 .

Long W, Lu Z, Cui L. Deep learning-based feature engineering for stock price movement prediction. Knowl Based Syst. 2018;164:163–73. https://doi.org/10.1016/j.knosys.2018.10.034 .

Malkiel BG, Fama EF. Efficient capital markets: a review of theory and empirical work. J Finance. 1970;25(2):383–417.

McNally S, Roche J, Caton S. Predicting the price of bitcoin using machine learning. In: Proceedings—26th Euromicro international conference on parallel, distributed, and network-based processing, PDP 2018. pp. 339–43. https://doi.org/10.1109/PDP2018.2018.00060 .

Nagar A, Hahsler M. News sentiment analysis using R to predict stock market trends. 2012. http://past.rinfinance.com/agenda/2012/talk/Nagar+Hahsler.pdf . Accessed 20 July 2019.

Nekoeiqachkanloo H, Ghojogh B, Pasand AS, Crowley M. Artificial counselor system for stock investment. 2019. ArXiv Preprint arXiv:1903.00955 .

Ni LP, Ni ZW, Gao YZ. Stock trend prediction based on fractal feature selection and support vector machine. Expert Syst Appl. 2011;38(5):5569–76. https://doi.org/10.1016/j.eswa.2010.10.079 .

Pang X, Zhou Y, Wang P, Lin W, Chang V. An innovative neural network approach for stock market prediction. J Supercomput. 2018. https://doi.org/10.1007/s11227-017-2228-y .

Pimenta A, Nametala CAL, Guimarães FG, Carrano EG. An automated investing method for stock market based on multiobjective genetic programming. Comput Econ. 2018;52(1):125–44. https://doi.org/10.1007/s10614-017-9665-9 .

Piramuthu S. Evaluating feature selection methods for learning in data mining applications. Eur J Oper Res. 2004;156(2):483–94. https://doi.org/10.1016/S0377-2217(02)00911-6 .

Qiu M, Song Y. Predicting the direction of stock market index movement using an optimized artificial neural network model. PLoS ONE. 2016;11(5):e0155133.

Scikit-learn. Scikit-learn Min-Max Scaler. 2019. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html . Retrieved 26 July 2020.

Shen J. Thesis, “Short-term stock market price trend prediction using a customized deep learning system”, supervised by M. Omair Shafiq, Carleton University. 2019.

Shen J, Shafiq MO. Deep learning convolutional neural networks with dropout—a parallel approach. ICMLA. 2018;2018:572–7.

Shen J, Shafiq MO. Learning mobile application usage—a deep learning approach. ICMLA. 2019;2019:287–92.

Shih D. A study of early warning system in volume burst risk assessment of stock with Big Data platform. In: 2019 IEEE 4th international conference on cloud computing and big data analysis (ICCCBDA). 2019. pp. 244–8.

Sirignano J, Cont R. Universal features of price formation in financial markets: perspectives from deep learning. Ssrn. 2018. https://doi.org/10.2139/ssrn.3141294 .

Thakur M, Kumar D. A hybrid financial trading support system using multi-category classifiers and random forest. Appl Soft Comput J. 2018;67:337–49. https://doi.org/10.1016/j.asoc.2018.03.006 .

Tsai CF, Hsiao YC. Combining multiple feature selection methods for stock prediction: union, intersection, and multi-intersection approaches. Decis Support Syst. 2010;50(1):258–69. https://doi.org/10.1016/j.dss.2010.08.028 .

Tushare API. 2018. https://github.com/waditu/tushare . Accessed 1 July 2019.

Wang X, Lin W. Stock market prediction using neural networks: does trading volume help in short-term prediction?. n.d.

Weng B, Lu L, Wang X, Megahed FM, Martinez W. Predicting short-term stock prices using ensemble methods and online data sources. Expert Syst Appl. 2018;112:258–73. https://doi.org/10.1016/j.eswa.2018.06.016 .

Zhang S. Architectural complexity measures of recurrent neural networks, (NIPS). 2016. pp. 1–9.

Zubair M, Fazal A, Fazal R, Kundi M. Development of stock market trend prediction system using multiple regression. Computational and mathematical organization theory. Berlin: Springer US; 2019. https://doi.org/10.1007/s10588-019-09292-7 .

Acknowledgements

This research is supported by Carleton University, in Ottawa, ON, Canada. This research paper has been built based on the thesis [ 36 ] of Jingyi Shen, supervised by M. Omair Shafiq at Carleton University, Canada, available at https://curve.carleton.ca/52e9187a-7f71-48ce-bdfe-e3f6a420e31a .

NSERC and Carleton University.

Author information

Authors and affiliations.

School of Information Technology, Carleton University, Ottawa, ON, Canada

Jingyi Shen & M. Omair Shafiq

Contributions

Yes. All authors read and approved the final manuscript.

Corresponding author

Correspondence to M. Omair Shafiq .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Shen, J., Shafiq, M.O. Short-term stock market price trend prediction using a comprehensive deep learning system. J Big Data 7 , 66 (2020). https://doi.org/10.1186/s40537-020-00333-6

Received : 24 January 2020

Accepted : 30 July 2020

Published : 28 August 2020

DOI : https://doi.org/10.1186/s40537-020-00333-6

  • Deep learning
  • Stock market trend
  • Feature engineering

Open Access

Peer-reviewed

Research Article

LSTM based stock prediction using weighted and categorized financial news

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

Affiliation Systems Research Laboratory, FAST-National University of Computer and Emerging Sciences, Karachi, Pakistan

Roles Supervision

  • Shazia Usmani, 
  • Jawwad A. Shamsi

PLOS

  • Published: March 7, 2023
  • https://doi.org/10.1371/journal.pone.0282234

A significant correlation between financial news and stock market trends has been explored extensively. However, very little research has been conducted on stock prediction models that utilize news categories weighted according to their relevance to the target stock. In this paper, we show that prediction accuracy can be enhanced by incorporating weighted news categories simultaneously into the prediction model. We suggest utilizing news categories associated with the structural hierarchy of the stock market: that is, news categories for market, sector, and stock-related news. In this context, a Long Short-Term Memory (LSTM) based Weighted and Categorized News stock prediction model (WCN-LSTM) is proposed. The model incorporates news categories with their learned weights simultaneously. To enhance its effectiveness, sophisticated features are integrated into WCN-LSTM. These include hybrid input, lexicon-based sentiment analysis, and deep learning to impose sequential learning. Experiments have been performed for the case of the Pakistan Stock Exchange (PSX) using different sentiment dictionaries and time steps. Accuracy and F1-score are used to evaluate the prediction model. We have analyzed the WCN-LSTM results thoroughly and found that WCN-LSTM performs better than the baseline model. Moreover, the sentiment lexicon HIV4, along with time steps 3 and 7, optimized the prediction accuracy. We have conducted statistical analysis to quantitatively assess our findings. A qualitative comparison of WCN-LSTM with existing prediction models is also presented to highlight its superiority and novelty over its counterparts.

Citation: Usmani S, Shamsi JA (2023) LSTM based stock prediction using weighted and categorized financial news. PLoS ONE 18(3): e0282234. https://doi.org/10.1371/journal.pone.0282234

Editor: Sriparna Saha, Indian Institute of Technology Patna, INDIA

Received: May 23, 2022; Accepted: February 11, 2023; Published: March 7, 2023

Copyright: © 2023 Usmani, Shamsi. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data are fully available from the Mendeley Data repository. Data identification number: 10.17632/mc4s7zvx9c.1 Direct URL to data: https://data.mendeley.com/datasets/mc4s7zvx9c/1 .

Funding: This work has been supported by Ph.D. fellowship of Shazia Usmani. The work has been supported by the Higher Education Commission of Pakistan. But paper publication charges will not be paid by HEC.

Competing interests: We have no conflicts of interest to disclose.

1. Introduction

News analysis could play a significant role in the prediction of stock trends because the stock market is heavily influenced by market-related news [ 1 ]. News analysis with deep insight could generate significant benefits by improving stock prediction performance. In recent years, stock-related news has been analysed from different perspectives, but there is still much room to mine information from financial news repositories. However, the task of news analysis is challenging due to many factors.

First and foremost, proper categorization of financial news is important so that news can be assessed precisely in its area of influence. For instance, Schumaker and Chen partitioned financial news articles into news groups related to similar industries and sectors and found that sector-based grouping enhanced the prediction model’s performance [ 2 ]. Inspecting the significance of using multiple and simultaneous news groups in predicting stock trends has been a research-oriented task in the context of news analysis as well. Shynkevich, et al. [ 3 ] have shown improvement in prediction accuracy by concurrent and appropriately weighted incorporation of news groups into the prediction model. Hence, news groups should be identified according to their area of influence. Furthermore, there should be a way to identify the optimized weights for the impact of each news group. Finally, an efficient machine learning approach should be investigated that simultaneously incorporates this categorized and weighted information in order to improve prediction performance.

Selecting an efficient way to extract information from news and represent it in a machine-readable format is another research-oriented task. Textual analytics deals with text processing to extract significant information [ 4 ]. Sentiment analysis is a form of text analytics that measures the polarity of text. It is a way to identify the meaning of the text in terms of how positive or negative it is [ 5 ]. A few researchers have also used sentiment analysis for stock prediction [ 6 , 7 ]. Sentiment dictionaries play a vital role in measuring sentiment scores. These dictionaries can be general or domain-specific; for example, Harvard IV (HIV4) is used as a general dictionary and Loughran and McDonald (LM) as a domain-specific one. There is therefore a need for comparative analysis between different sentiment lexicons to achieve optimized prediction performance.
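A lexicon-based polarity score can be sketched in a few lines. The two tiny word sets below are hypothetical stand-ins, not entries taken from HIV4 or LM, and the normalised score (positive minus negative hits over total hits) is one common convention assumed here for illustration, not necessarily the formula used by the authors.

```python
import re

# Hypothetical miniature lexicon (NOT actual HIV4 or LM entries).
POSITIVE = {"gain", "profit", "growth", "surge"}
NEGATIVE = {"loss", "decline", "default", "slump"}


def sentiment_score(headline):
    """Polarity in [-1, 1]: (pos - neg) / (pos + neg); 0.0 if no lexicon hits."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


score = sentiment_score("Cement sector profit growth offsets export decline")
print(score)
```

Swapping the word sets is all it takes to compare a general lexicon with a finance-specific one on the same headlines.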

Most of the earlier work relies only on the input data at time point t to predict the stock trend at time point t+1 . Recently, many studies have treated stock prediction as a sequence learning problem, where the input to the prediction model is a sequence of inputs at successive time points [ 8 – 10 ]. However, little work has investigated the effectiveness of multiple input sequence lengths in the quest to enhance prediction performance. Moreover, to implement sequence learning, it is important to adopt a machine learning model that can efficiently maintain memory across long sequences to improve prediction accuracy.
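The difference between single-point input and sequence input can be sketched as a sliding-window transformation. The function name, the data layout (one feature vector and one trend label per trading day), and the toy values are assumptions made for illustration.

```python
def make_sequences(features, labels, time_steps):
    """Build (sequence, next-day label) pairs.

    features[t] is the input vector of day t and labels[t] is the trend
    of day t. Each sample holds `time_steps` consecutive days and is
    paired with the trend of the day that follows the window.
    """
    X, y = [], []
    for end in range(time_steps, len(features)):
        X.append(features[end - time_steps:end])  # days end-time_steps .. end-1
        y.append(labels[end])                     # trend of day `end`
    return X, y


# Hypothetical data: 10 days, one feature per day, alternating trend labels.
features = [[float(day)] for day in range(10)]
labels = [day % 2 for day in range(10)]

X3, y3 = make_sequences(features, labels, time_steps=3)
X7, y7 = make_sequences(features, labels, time_steps=7)
print(len(X3), len(X7))
```

A longer time step gives each sample more history but leaves fewer samples, which is one reason the sequence length is worth tuning.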

Our research objective is motivated by the above mentioned opportunities and challenges in the quest of enhancing the performance of news sensitive stock prediction model.

1.1 Research objectives

In order to improve the performance of news sensitive stock trend prediction, we have identified the following objectives of our research:

  • To propose and evaluate an extensive model that can cater to the complexities of the hybrid input dataset in order to extract significant information.
  • To incorporate the filtered news groups into the prediction model simultaneously with their learned weight.
  • To effectively identify sentiments of news to predict the stock trend.
  • To adopt sequence learning in stock prediction to utilize historical sequence information.

The above research objectives lead to the proposal of a Long Short-Term Memory (LSTM) based Weighted and Categorized News (WCN-LSTM) stock prediction model. WCN-LSTM integrates state-of-the-art features. The proposed model utilizes hybrid input. It allows the incorporation of multiple weighted news groups simultaneously into the prediction model. WCN-LSTM performs feature extraction from news using lexicon-based sentiment analysis. It uses an LSTM layer to process sequential input data. The implementation of WCN-LSTM raises the following empirical research questions.
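The idea of feeding several weighted news groups into the model at once can be sketched as a weighted sum of per-category sentiment scores. This is only an illustration: the softmax normalisation, the function names, and the numbers are assumptions, and in WCN-LSTM the weights are learned during training rather than set by hand as they are here.

```python
import math


def softmax(values):
    """Turn raw weights into positive weights that sum to 1."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def combine_categories(scores, raw_weights):
    """Weighted sum of per-category sentiment scores for one day."""
    categories = list(scores)
    weights = softmax([raw_weights[c] for c in categories])
    return sum(w * scores[c] for w, c in zip(weights, categories))


# Hypothetical daily sentiment per news category.
scores = {"market": 0.2, "sector": -0.1, "stock": 0.5}

equal = combine_categories(scores, {"market": 0.0, "sector": 0.0, "stock": 0.0})
stock_heavy = combine_categories(scores, {"market": 0.0, "sector": 0.0, "stock": 2.0})
print(equal, stock_heavy)
```

With equal raw weights the result is the plain average of the three scores; raising the stock weight pulls the combined value toward the stock-level sentiment.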

1.2 Research questions

RQ1: Does news categorization give more insight into understanding news impact on the stock market in turn improving prediction accuracy?

RQ2: Do different news categories have different weights of their impact on the stock market to significantly enhance prediction performance?

RQ3: How to identify the optimized value of weights for categorized news impact?

RQ4: Which type of sentiment dictionary, general or domain-specific, improves WCN-LSTM performance significantly?

RQ5: What should be the optimized time step value of the input sequence for WCN-LSTM to predict the trend of the next day for a stock?

1.3 Contributions

We have proposed a new stock prediction model. To demonstrate its significance in improving stock prediction accuracy, we make the following contributions:

  • We have selected Pakistan Stock Exchange (PSX) in order to perform experiments. Stock prices are downloaded from the website of PSX Stock for the period of January 2006 to August 2018. News headlines are scraped from a newspaper ( The News ) archive along with the publishing date. News headlines are categorized using an unsupervised classification technique suggested by Usmani and Shamsi [ 11 ].
  • We have implemented our proposed stock prediction model, WCN-LSTM. Moreover, to perform a quantitative analysis, we have also implemented a baseline model proposed by Li, et al. [ 8 ].
  • We have performed extensive experiments by utilizing different sentiment lexicons and varying input sequence lengths, to reveal significant findings for the case of PSX.
  • We have adopted the Wilcoxon signed-rank test to perform statistical analysis for all experimental scenarios (see section 6.2).
  • We have also presented a qualitative analysis between WCN-LSTM and existing stock prediction models to show the significance and novelty of our proposed model (see Fig 13 ).
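As background for the statistical analysis mentioned in the list above, the following sketch computes the Wilcoxon signed-rank statistic for paired scores from two models. The paired accuracy values are hypothetical, integers are used so that tied differences compare exactly, and the sketch stops at the test statistic; a p-value would normally come from a statistics library such as `scipy.stats.wilcoxon`.

```python
def wilcoxon_statistic(a, b):
    """Return (W+, W-, min(W+, W-)) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical accuracy (%) of two models on six experimental scenarios.
model_a = [72, 68, 75, 70, 66, 74]
model_b = [70, 69, 71, 65, 66, 71]
w_plus, w_minus, w = wilcoxon_statistic(model_a, model_b)
print(w_plus, w_minus, w)
```

A small value of the statistic relative to the number of non-zero pairs indicates that the differences mostly point in one direction.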

For evaluation, we have adopted accuracy and F1-score as metrics. We have observed that our proposed approach performs better than the baseline approach by combining the impact of news categories according to their learned weights. The empirical research questions are answered by employing empirical and statistical evidence in the last section of the paper. The paper is organized as follows. Section 2 discusses the literature review. The proposed approach is discussed in section 3. The dataset description along with input preparation is presented in section 4. The experimental setup and results are discussed in sections 5 and 6, respectively. Finally, section 7 concludes this research along with future work. The organization of the paper is shown in Fig 1 .

Fig 1. Organization of the paper.

https://doi.org/10.1371/journal.pone.0282234.g001

2. Related work

News sensitive stock trend prediction is addressed in the literature mostly by using stock price data along with the news. Moreover, technical indicators are also used frequently which are derived from stock price data. In [ 12 ], the authors discussed the importance of hybrid information extracted from stock price time series and news to improve prediction performance. A few researchers have shown the significance of incorporating hybrid information into the prediction model in terms of accuracy [ 13 – 15 ]. Moreover, it has also been shown that deep learning architectures improve feature representations [ 16 ] and prediction performance [ 17 ] in the financial domain. In this section, literature related to stock prediction is discussed briefly from the perspective of news categorization, sentiment analysis, and sequence learning.

2.1 Stock prediction using categorized financial news

Most stock prediction studies consider only one news category that influences the stock market. In the literature, only a few authors have addressed the significance of incorporating categorized news in the prediction model. In [ 13 ], the authors employed SVM for prediction and incorporated general market and company-specific news along with technical indicators. They showed that prediction performance and system profitability were enhanced by taking in multiple news categories and technical indicators. However, they considered the impact of news on the market for only 24 hours after its release, although some news, such as news about government policies for the financial market, may have a long-term impact.

There are different industry classification standards that group companies with similar output so that their group effect can be investigated. The Global Industry Classification Standard (GICS) is one such standard and employs a four-level hierarchy of Sector, Industry Group, Industry, and Sub-Industry [ 2 , 3 ].

Schumaker and Chen [ 2 ] employed GICS to partition news articles according to their relevance to sector, industry, sub-industry, etc. They used these news groups to identify their effect on stock prediction performance and found that prediction performance varies across news groups. However, they used only one news group at a time. For instance, the whole news dataset, the sector-related news group, the industry-related news group, etc. were each incorporated in the prediction model separately, and the sector-related news group performed best among all news groups. They employed the Support Vector Regression (SVR) machine learning algorithm for prediction. News groups were not incorporated into the prediction model simultaneously; that is, the combined impact of all news groups was not investigated in their work.

In [ 3 ], the authors extended this approach by incorporating all identified news groups simultaneously in the prediction model. They showed that properly weighted news articles with different degrees of relevance to stock prices, used simultaneously, can improve prediction performance significantly. They used Multiple Kernel Learning (MKL) for stock prediction, integrating the information from a separate kernel for each news group. Their news groups are based on the GICS standard, which has limitations in finding groups of relevant articles.

In [ 18 ], the authors claimed that heterogeneity exists in GICS that limits the relevant findings regarding stock prediction. They proposed a model that searches for a group of companies with high relevance and showed that it outperforms GICS-based prediction. However, these news groups do not belong to the structural hierarchy of the stock market, and their degree of relevance has not been learned either.

2.2 Stock prediction using sentiment analysis

Stock market-related news not only states the current market status but also has an impact on market volatility. A large body of work mines market-related textual data to extract sentiments about the financial market. There are two major ways to perform sentiment analysis: classification algorithm-based and sentiment lexicon-based.

In the classification algorithm-based approach, labeled data is used to train the algorithm. Jiawei and Murata [ 6 ] used training data with positive and negative sentiment labels and adopted LSTM for sentiment analysis and trend prediction. Multiple news articles at day t form an input sequence and are passed into the sentiment analysis module to generate a sentiment label at day t . Technical indicators after dimensionality reduction are passed along with sentiment labels into the trend prediction module. This module predicts the average trend of the next three days from day t and achieves 66.32% accuracy. Although they demonstrated the effectiveness of sentiment analysis by improving prediction performance, they did not exploit the strength of the LSTM model, because input data from successive days was not passed to it as a sequence. Carosia, et al. [ 7 ] performed tweet sentiment analysis using a Multi-Layer Perceptron (MLP) for the Brazilian stock market. Tweet sentiments are investigated using the absolute number of tweets, tweet sentiments weighted by favorites, and tweet sentiments weighted by retweets. Comparative analysis showed that MLP outperformed other machine learning models. They considered only tweet sentiments as input data for market movement prediction, which raises questions about prediction accuracy, because there may be tweets about past moves and users may use multiple accounts to post the same tweets.

Sentiment analysis can also be approached by using sentiment lexicons. These lexicons are created manually using rules and vocabulary [ 19 , 20 ]. Manually created lexicons are small, which in turn limits the performance of prediction algorithms. To increase the size of sentiment lexicons, semi-automated approaches have been proposed in which small manually created lexicons are used as seeds for automated approaches [ 21 ]. For instance, SentiWordNet 3.0 [ 22 ] and SenticNet 5 [ 23 ] are semi-automatically created sentiment lexicons. Sentiment lexicons can be general-purpose or domain-specific. For instance, Vader [ 19 ] and Harvard IV (HIV4) are general sentiment lexicons, while LM is specific to the financial domain.

Li, et al. [ 24 ] adopted the HIV4 and LM sentiment lexicons to generate news sentiment scores. They also used stock price data along with sentiment scores. A Support Vector Machine (SVM) is used for making stock trend predictions. They showed that sentiment scores enhanced prediction accuracy compared to the Bag of Words (BoW) feature representation. However, the difference between the prediction accuracy of HIV4 and LM was insignificant. Although they mention two groups of news, market-related and stock-related, they did not capture the impact of each group separately, which could have provided deeper insight.

Picasso, et al. [ 25 ] used technical indicators and news sentiment as input. They adopted a feed-forward neural network for stock trend prediction. News sentiment analysis is performed using LM and AffectiveSpace 2. They examined the effectiveness of different feature sets. However, their dataset was too small for a comparative analysis.

Li, et al. [ 8 ] used four different manually and semi-automatically created sentiment lexicons. They performed stock trend prediction for the Hong Kong Stock Exchange. The proposed LSTM based stock prediction model incorporates stock prices, technical indicators, and news sentiment scores. They showed that hybrid input enhanced prediction accuracy. Moreover, the sentiment lexicon specific to the financial domain performed better than the other sentiment lexicons. They used LSTM layers in the prediction model and incorporated sequential input data, but an experiment to investigate the optimal time step for the input sequence was missing.

2.3 Stock prediction as a sequence learning problem

The problem of stock prediction is always challenging for the research community due to its high volatility. However, recent development in deep learning models opens new ways to tackle this type of data from different perspectives. For instance, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), LSTM, etc. are efficiently used in literature.

Ding, et al. [ 26 ] extracted event-based textual features from the news. They improved the quality of extracted events using embedding and knowledge bases. They employed CNN for prediction and showed that their proposed approach for event extraction enhanced prediction accuracy. However, they have not taken advantage of stock price data along with textual features for stock trend prediction.

Vargas, et al. [ 14 ] input hybrid information into a prediction model based on hybrid deep learning approaches. Textual features are extracted using word and sentence embedding and technical indicators are derived from stock price data. The prediction model is built by combining CNN and LSTM layers and incorporating the previous day’s information in order to predict the stock trend of the next day. They showed that prediction accuracy is improved when technical indicators are also added to the input set along with the news titles. But they have not combined observations from successive days to form an input sequence that can be efficiently processed by LSTM.

Hu, et al. [ 9 ] incorporated hybrid information and utilized a hybrid deep learning model for news-oriented stock trend prediction. They adopted the Gated Recurrent Unit (GRU), a variant of RNN, for sequential modeling. The authors of [ 10 ] utilized stock prices with a news sentiment score and showed that analyzing lengthy input sequences can significantly improve the accuracy of the LSTM based model. Li, et al. [ 8 ] input hybrid sequential information into the LSTM based prediction model. Pokhrel, et al. [ 27 ] performed a comparative study between CNN, LSTM, and GRU to predict the closing price for the Nepal Stock Exchange. They used Root Mean Square Error (RMSE) as the performance evaluation metric, while the input sequence comprised stock prices, macroeconomic data, technical indicators, and sentiment scores of financial news. They found that LSTM performed better than CNN and GRU. Li and Pan [ 28 ] used an ensemble of LSTM and GRU deep learning models to learn from sequential data and showed that ensemble learning significantly improves prediction performance. However, the above review in the context of the sequence learning problem shows that the optimal length of sequential data has not been investigated for the problem under consideration. Moreover, hybrid deep learning models should be adopted to effectively enhance prediction performance.

In this discussion, we have found that previous approaches have recognized the significance of hybrid information for stock prediction. News processing using sentiment analysis based approaches has gained widespread interest in recent years because it allows faster and more accurate conclusions. In stock prediction, sentiment analysis is mostly adopted by employing sentiment lexicons. However, research is ongoing to enrich sentiment lexicons using semi-automated and state-of-the-art approaches.

News plays a vital role in stock prediction. Despite that, financial news categorization at a more granular level according to the structural hierarchy of the stock market is not addressed in the literature, although categorized news opens new perspectives for investigating news impact more deeply.

By addressing the research gap and considering the state of the art techniques, we propose an approach to predict the trend of the next day for target stock by employing information from previous days. Our proposed approach utilizes hybrid information as model input, employs sentiment analysis as a text mining technique, and incorporates categorized news into the prediction model along with their learned weights. Consecutive input vectors are combined to form a sequence according to a given time step and passed into the model. Moreover, the proposed model adopts LSTM layers to perform sequential learning. The proposed approach is discussed in the next section.

3. Proposed approach

We propose an LSTM based Weighted Categorized News (WCN-LSTM) stock trend prediction model. The WCN-LSTM model utilizes input data from textual and numerical sources and performs binary classification. It strengthens the prediction approach by using sequential data along with weighted news sentiment scores for different news categories. The WCN-LSTM is formulated as given below:

3.1 Problem statement

[Equation image not recovered.]

δ = Stock close price, volume, and technical indicators

θ1 = Market-related news sentiment scores

θ2 = Sector-related news sentiment scores

θ3 = Stock-related news sentiment scores

α , β , and γ are weights for θ1 , θ2 , and θ3 .

[Equation image not recovered.]

Trend = set of prediction labels

LSTM and Dense are neural network layers, used to predict stock trends.

The impact of financial news is as important as the impact of stock price data in stock trend prediction. In our scenario, we have categorized financial news into three news groups according to the structural hierarchy of the stock market. Hence, there is a constraint that the total weight of categorized financial news must equal 1, and the weight of each news category is learned from the training dataset. Eq 3 expresses this constraint: the weights sum to 1, that is, α + β + γ = 1.
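One common way to keep learned category weights on the constraint of Eq 3 is to pass unconstrained parameters through a softmax. The sketch below only illustrates that idea; the source does not state how the constraint is enforced during training, and the names `normalize_weights` and the raw values are hypothetical.

```python
import math

def normalize_weights(raw_weights):
    """Map unconstrained real values to positive weights that sum to 1 (softmax)."""
    # Subtracting the maximum improves numerical stability and leaves the result unchanged.
    shift = max(raw_weights)
    exps = [math.exp(w - shift) for w in raw_weights]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical unconstrained parameters for market, sector, and stock news.
alpha, beta, gamma = normalize_weights([0.2, -0.5, 1.0])
```

With this parameterization the three weights stay positive and sum to 1 for any raw values, so an optimizer can update the raw values freely.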

3.2 Architecture of the WCN-LSTM prediction model

The architecture of WCN-LSTM comprises multiple neural network layers. In WCN-LSTM, each sequential input is passed to an LSTM layer. LSTM is a type of neural network that deals with sequential data, where the next observation depends on previous observations in the sequence. It can capture short-term as well as long-term dependencies in a sequence through its gating mechanism [ 5 , 29 ]. In our scenario, the LSTM cell is adopted according to the baseline approach proposed in [ 8 ], where the sigmoid activation function is replaced with the hard sigmoid activation function.
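The hard sigmoid is a piecewise-linear approximation of the sigmoid that is cheaper to compute. A minimal sketch follows, assuming the definition max(0, min(1, 0.2x + 0.5)) used by earlier Keras releases; whether [ 8 ] uses exactly these constants is an assumption here.

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x, slope=0.2, offset=0.5):
    """Piecewise-linear approximation of the sigmoid, clipped to [0, 1].

    The slope and offset are assumed values, not taken from the source.
    """
    return max(0.0, min(1.0, slope * x + offset))
```

Both functions agree at x = 0 and saturate towards 0 and 1, but the hard sigmoid reaches exactly 0 and 1 outside the interval [-2.5, 2.5] under these constants.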

The dropout layer tackles overfitting, which deep neural networks are prone to when the training dataset is small. In our model, a dropout layer is added after each LSTM layer so that it probabilistically excludes some input vectors and recurrent connections to the LSTM [ 8 ]. The concatenate layer concatenates the input vectors coming from different tracks into a single output vector.

The dense layer is a fully connected neural network layer, which means that each neuron in the dense layer receives input from all neurons of the previous layer. It is one of the most commonly used layers in neural network models. In WCN-LSTM, it is the last layer and uses a sigmoid activation function, which is commonly used for binary classification problems.

We have combined all layers discussed above and presented our proposed prediction model WCN-LSTM. The architecture of our proposed model is demonstrated in Fig 2 .

Fig 2. https://doi.org/10.1371/journal.pone.0282234.g002

4. Dataset description and input sequence formulation

WCN-LSTM utilizes the news and stock price dataset. This section describes datasets along with the step of feature extraction and formulation of input sequences.

4.1 News dataset

There are two ways to perform text categorization: adopting automatic classification algorithms or manually assigning a category to each news item. Classification algorithms require a sufficient amount of labelled data for training [ 30 , 31 ].

If there is no labelled data, manual effort is required to perform news categorization which is a tiring job. There are approaches that reduce manual efforts by introducing the semi-automatic way of text classification. These approaches manually define some domain related keywords, extend the list of keywords using NLP techniques, and then utilize these keywords in clustering methods [ 32 , 33 ].

We consider news headlines, rather than complete news articles, as textual input data for text categorization. Chen, et al. [ 34 ] suggested that news headlines contain less noise and more valuable information than news bodies. News titles are also used for experiments in [ 14 , 35 , 36 ].

Publicly available news headlines from 2006 to 2018 are scraped and then arranged into three different groups. In order to categorize financial news headlines according to their area of influence, we considered the structural hierarchy of the stock market, where the stock market has multiple sectors and each sector has multiple stocks.

In [ 11 ], news headlines related to Pakistan Stock Exchange (PSX) are filtered and categorized according to the structural hierarchy of the stock market using a proposed semi-automatic approach. News headlines are divided into three news groups. The first group contains news related to the whole stock market. The second group contains news related to the specific sector and the third group holds news related to a specific stock. News headlines are aligned with their publishing date. For any specific date, there might be news related to all three news groups or maybe only for one or two news groups. In Table 1 , all news groups and their descriptions are given. PSX has many sectors but for this research work, a limited number of leading sectors are considered. News headlines related to these sectors are labeled according to the approach proposed in [ 11 ] for unlabeled data.

Table 1. https://doi.org/10.1371/journal.pone.0282234.t001

PSX- or KSE-related news headlines belong to the first news group, where news represents the whole stock market. News related to any of the selected sectors belongs to the second news group, and news related to any active stock in the selected sectors belongs to the third news group. Table 2 lists all news categories along with their number of filtered news headlines.

Table 2. https://doi.org/10.1371/journal.pone.0282234.t002

4.1.1 Lexicon based sentiment scores.

We have selected three lexicons to generate a sentiment score vector for news headlines. Vader and HIV4 are general purpose lexicons while LM is specifically used in the financial domain.
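The general idea of lexicon-based scoring can be sketched with a tiny made-up word list. This does not reproduce Vader, HIV4, or LM, whose vocabularies and scoring rules differ, and it returns a single scalar rather than a score vector; `TOY_LEXICON`, `headline_score`, and `daily_score` are hypothetical names, and treating a day without news as neutral (0.0) is an assumption.

```python
# Hypothetical miniature lexicon: +1 for positive words, -1 for negative words.
TOY_LEXICON = {"gain": 1, "surge": 1, "profit": 1, "loss": -1, "fall": -1, "default": -1}

def headline_score(headline, lexicon=TOY_LEXICON):
    """Average polarity of the lexicon words found in one headline (0.0 if none)."""
    tokens = [tok.strip(".,!?:;\"'").lower() for tok in headline.split()]
    hits = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

def daily_score(headlines, lexicon=TOY_LEXICON):
    """Mean headline score for one day and one news category (0.0 if no news)."""
    if not headlines:
        return 0.0
    return sum(headline_score(h, lexicon) for h in headlines) / len(headlines)
```

Calling `daily_score` once per day and per news category yields one score series for each of the market, sector, and stock groups.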

4.1.2 Categorized news sentiment scores sequence.

[Eqs 4, 5, and 6: equation images not recovered.]

Where n = lookback days

Eqs 4 , 5 , and 6 represent sequences of sentiment scores for the market, sector, and stock-related news categories.

4.2 Stock price dataset

Stock price data contains different attributes such as open price, close price, and volume, and is aligned with the publishing date. We have selected close price and volume as input features. Furthermore, the dataset is processed to derive some new attributes. Because stock trend prediction is treated as a binary classification problem, the target variable trend is calculated according to Eq 1.

4.2.1 Technical indicators.

Technical indicators are data points derived from historical stock prices and are used to indicate future price trends. They are widely used in the literature along with other textual and numerical input data [ 8 , 13 , 25 ]. We have used ten technical indicators suggested by Li, et al. [ 8 ]. The adopted indicators are shown in Table 3.

Table 3. https://doi.org/10.1371/journal.pone.0282234.t003

4.2.2 Stock price and technical indicators sequence.

Input vector δ contains the stock close price, volume, and ten technical indicators. For details of the technical indicators, see Li, et al. [ 8 ].

[Eqs 7 and 8: equation images not recovered.]

Eqs 7 and 8 represent the input vectors and sequence containing the stock price and its derived information.
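Forming input sequences from daily vectors with a lookback of n days can be sketched as a sliding window. The alignment used below, where the window covers the n days before the target day, is an assumption; `make_sequences` and the toy data are hypothetical.

```python
def make_sequences(vectors, labels, lookback):
    """Pair each window of `lookback` consecutive daily vectors with the label of the following day."""
    sequences, targets = [], []
    for end in range(lookback, len(vectors)):
        sequences.append(vectors[end - lookback:end])  # days end-lookback .. end-1
        targets.append(labels[end])                    # trend label of day `end`
    return sequences, targets

# Toy data: one feature per day, six days, lookback of three days.
toy_vectors = [[day] for day in range(6)]
toy_labels = [0, 1, 0, 1, 1, 0]
sequences, targets = make_sequences(toy_vectors, toy_labels, lookback=3)
```

The same windowing can be applied separately to the stock price vectors and to each category's sentiment scores, so that all sequences stay aligned by date.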

5. Experimental setup

In this section, baseline model architecture, setup of both models along with hyper-parameter tuning are presented. Moreover, prediction models are evaluated using performance measures. For binary prediction problems, accuracy, precision, recall, and F1-score are commonly used as evaluation metrics. We have selected accuracy and F1-score performance measures to evaluate the prediction model’s performance.
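For reference, accuracy and F1-score for binary labels can be computed as in the following sketch. The function name and the example labels are illustrative only and are not taken from the experiments.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1-score) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Illustrative labels: 4 of 6 predictions are correct; precision and recall are both 2/3.
accuracy, f1 = accuracy_and_f1([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Because F1 depends only on the positive class, a model can have higher accuracy but lower F1 than another, which is consistent with the two metrics being reported separately.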

5.1 Baseline model

[Equation image not recovered.]

In the baseline model, two sequences of stock price data and news sentiment scores are passed as input into the model. Input sequences are passed into the concatenation layer, where both sequences are combined and observations in the sequences are aligned according to publishing date. Our WCN-LSTM model is adapted from the baseline model, and the details of the neural network layers have been discussed in section 3. The architecture of the baseline model is demonstrated in Fig 3.

Fig 3. https://doi.org/10.1371/journal.pone.0282234.g003

5.2 Prediction models’ setup

For the neural network based prediction models, we have selected binary cross-entropy as the loss function for our binary classification problem. Training is performed by an optimizer algorithm; we have selected root mean square propagation (RMSProp), suggested by [ 37 ] for recurrent neural networks. The learning rate is a critical hyper-parameter of an optimizer. It defines the step size of each iteration while approaching a minimum of the loss function. Initially, the learning rate is set to 0.001. Moreover, a learning rate reduction technique is employed to adjust it according to the change in loss: if the loss does not change for five consecutive epochs, the learning rate is reduced to one tenth of its value [ 8 ]. The batch size and the maximum number of epochs are set to 32 and 500, respectively, for both prediction models.
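The learning rate reduction rule can be sketched as follows. This is an illustration in the spirit of reduce-on-plateau callbacks, not the authors' implementation; it interprets "no change in loss" as "no improvement over the best loss so far", which is an assumption, and `schedule_learning_rate` is a hypothetical name.

```python
def schedule_learning_rate(losses, initial_lr=0.001, patience=5, factor=0.1):
    """Return the learning rate used in each epoch, given the loss seen at the end of each epoch."""
    lr = initial_lr
    best_loss = float("inf")
    stale_epochs = 0
    rates = []
    for loss in losses:
        rates.append(lr)
        if loss < best_loss:
            best_loss = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                lr *= factor        # reduce to one tenth of the current value
                stale_epochs = 0    # start counting again after a reduction
    return rates

# Illustrative loss curve: five epochs without improvement trigger one reduction.
rates = schedule_learning_rate([1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.8])
```

In this toy run the first seven epochs use 0.001 and the eighth uses 0.0001.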

5.3 Prediction models’ validation and optimization

In order to validate and optimize prediction model performance, we have performed model training using cross validation and adopted the grid search technique in search of optimizing the model’s hyper-parameters. We have also incorporated an early stop mechanism to improve generalization and to reduce overfitting of the deep learning model.

To accomplish the model tuning process, we have divided the dataset into three parts. The first part contains 70% of the total data and is used for model learning. The remaining data is divided into two equal parts (15% each), one used for model testing and the other for the early-stop mechanism.
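A chronological 70/15/15 split can be sketched as below. The source does not state the order of the two held-out parts, so placing the early-stopping part before the test part is an assumption, as is the function name.

```python
def chronological_split(samples, train_percent=70):
    """Split time-ordered samples into train, early-stopping, and test parts without shuffling."""
    n = len(samples)
    train_end = (n * train_percent) // 100      # integer arithmetic avoids rounding surprises
    holdout = n - train_end
    stop_end = train_end + holdout // 2
    return samples[:train_end], samples[train_end:stop_end], samples[stop_end:]

# Illustrative data: 100 time-ordered samples.
train_part, stop_part, test_part = chronological_split(list(range(100)))
```

Keeping the slices in time order ensures that the held-out parts contain only observations that come after the training data.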

5.3.1 Time series split cross validation.

In stock prediction, data related to the stock market is treated as a time series; that is, observations are collected at regular intervals of time. Mathematically, it is described as in Eqs 5 , 6 , 7 , 8 , and 15 for numerical and textual data. For time series data, shuffling the data is not appropriate for validating the model's performance, because it would let the model train on observations that come after those it is validated on. Consequently, time series split cross validation is suggested in the literature by Ratto, et al. [ 38 ] for stock prediction. In a time series split cross validation scheme, training and validation sets are selected in each iteration so that the validation set is always later in time than the training set. Likewise, we have adopted time series cross validation and divided the training set into 3 folds.
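A 3-fold time series split can be sketched with an expanding training window, as below. Whether the authors used an expanding or a sliding window is not stated, so this layout is an assumption, and `time_series_folds` is a hypothetical helper.

```python
def time_series_folds(n_samples, n_folds=3):
    """Return (train_indices, validation_indices) pairs where validation always follows training."""
    fold_size = n_samples // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        train_end = fold_size * k
        # The last fold absorbs any leftover samples.
        val_end = train_end + fold_size if k < n_folds else n_samples
        folds.append((range(0, train_end), range(train_end, val_end)))
    return folds

# Illustrative run on 12 time-ordered samples.
folds = time_series_folds(12)
```

For 12 samples this gives training windows of 3, 6, and 9 samples, each followed by a validation block of 3 later samples.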

5.3.2 Hyper-parameter optimization.

In machine learning, hyper-parameters are settings that control the learning process of a model. Hyper-parameter optimization is the process of selecting the combination of hyper-parameter values that optimizes the performance of the learning model.

We have adopted a grid search technique to search for the best value for hyper-parameters. We have performed a grid search for baseline and WCN-LSTM and identified the best values by considering the models’ accuracy from the candidate values. It is shown in Tables 4 and 5 for baseline (LSTM) and WCN-LSTM prediction models.
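Grid search itself is a simple exhaustive loop over candidate values. The sketch below uses a made-up grid and a toy scoring function in place of "train with cross validation and return accuracy"; the hyper-parameter names and values are hypothetical and are not the candidates from Tables 4 and 5.

```python
from itertools import product

def grid_search(grid, evaluate):
    """Try every combination in `grid` and return the best (params, score) under `evaluate`."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values, for illustration only.
grid = {"units": [32, 64, 128], "dropout": [0.2, 0.5]}

def toy_accuracy(params):
    # Stand-in for training a model and measuring validation accuracy.
    return 1.0 - abs(params["units"] - 64) / 1000 - abs(params["dropout"] - 0.2)

best_params, best_score = grid_search(grid, toy_accuracy)
```

The cost grows with the product of the candidate list lengths, which is why grids are usually kept small.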

Table 4. https://doi.org/10.1371/journal.pone.0282234.t004

Table 5. https://doi.org/10.1371/journal.pone.0282234.t005

5.3.3 Early stopping.

In the training phase of a neural network, the number of epochs is a critical hyper-parameter. If the number of epochs is too high, the model can overfit the training dataset, whereas too few epochs may yield an underfit model.

Early stopping is a mechanism that controls the number of epochs by monitoring the performance measure and stops training when the model’s performance reaches the maximum. We have implemented this mechanism for baseline (LSTM) and WCN-LSTM prediction models and training stops when validation loss does not decrease in 10 consecutive epochs in order to get the regularized models.
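The stopping rule with a patience of 10 epochs can be sketched as follows. The function name and the loss values are illustrative; only the patience value comes from the text.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch   # stop: no decrease in `patience` consecutive epochs
    return len(val_losses) - 1, best_epoch  # ran out of epochs before the rule triggered

# Illustrative curve: the loss improves for three epochs and then stalls.
stop_epoch, best_epoch = early_stopping_epoch([0.70, 0.65, 0.60] + [0.61] * 12)
```

In practice the weights from `best_epoch` are usually the ones kept, although the source does not say whether the authors restored them.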

After performing all the validation and optimization steps we have got optimized hyper-parameters values for all scenarios under consideration. By using these values, we have finalized models for all combinations of sentiment lexicon and time steps.

6. Experimental results and discussion

We have employed proposed and baseline models to perform experiments. Furthermore, we also conducted statistical testing for making quantitative decisions. Finally, the discussion is presented to answer research questions and to describe the importance of the proposed model.

6.1 Experimental results

Experiments are performed for three different time steps: 3, 7, and 10 previous days’ information, using three different sentiment dictionary scores. For the baseline model, input sequences are generated using the selected time step and concatenated using date, and then passed to the first LSTM layer in the baseline model.

For WCN-LSTM input sequences are generated using the same approach but they are not concatenated. All these sequences are aligned according to the stock transactions and news publishing dates. These sequences are input to the model using a unique path for each sequence.

Experimental results are shown in graphs, where columns represent stocks from all leading sectors of PSX and rows represent the accuracy of prediction models for each sentiment dictionary.

6.1.1 For time step = 3.

A time series sequence is generated using the stock data from the last three days of transactions along with stock-related news headlines. Experiments are performed using the proposed and baseline model using stock price and sentiment scores sequences calculated from three different sentiment dictionaries. It is shown in Figs 4 and 5 .

Fig 4. https://doi.org/10.1371/journal.pone.0282234.g004

Fig 5. https://doi.org/10.1371/journal.pone.0282234.g005

It can be observed that WCN-LSTM performs better in terms of accuracy for most of the stocks. Using the HIV4 sentiment dictionary, WCN-LSTM achieves better accuracy for 32 out of 41 stocks, using LM for 28 out of 41, and using Vader for 22 out of 41. For the F1-score, the baseline performs better than WCN-LSTM with the HIV4 and LM sentiment dictionaries: out of 41 stocks, the baseline model produces a better F1-score for 26 stocks using HIV4 and 20 stocks using LM, while WCN-LSTM gives a better F1-score for 24 stocks using Vader. Some stocks from different sectors have the same accuracy and F1-score for both models.

The accuracy of WCN-LSTM for all three sentiment dictionaries can be analyzed in Fig 4. In most experiments, the accuracy of the proposed model using HIV4 is better than with LM and Vader, while sentiment scores calculated using the LM sentiment dictionary improved WCN-LSTM performance more than the Vader lexicon.

6.1.2 For time step = 7.

A time series sequence is generated using the data from the last seven days of transactions along with stock-related news headlines. Experiments are performed using a proposed and baseline model for sentiment scores calculated from three different sentiment dictionaries. Experimental results for time step 7 are illustrated in Figs 6 and 7 .

Fig 6. Accuracy of prediction models by incorporating information from the last 7 days using the test set. https://doi.org/10.1371/journal.pone.0282234.g006

Fig 7. https://doi.org/10.1371/journal.pone.0282234.g007

It can be observed that WCN-LSTM performs better in terms of accuracy for most of the stocks. Using the HIV4 sentiment dictionary, WCN-LSTM achieves better accuracy for 28 out of 41 stocks, while using LM and Vader, 23 out of 41 stocks. For the F1-score, WCN-LSTM produces a better F1-score for 20 out of 41 stocks using HIV4 and LM, while LSTM gives the better F1-score for 25 stocks using Vader. Some stocks from different sectors have the same accuracy and F1-score for both models.

The accuracy of WCN-LSTM for all three sentiment dictionaries is analyzed, and it is observed that the accuracy of the model using HIV4 is better than with LM and Vader in most experiments. The Vader and LM sentiment lexicons influenced WCN-LSTM performance equally.

6.1.3 For time step = 10.

A time series sequence is generated using the data from the last 10 transactions days along with stock-related news headlines. Experiments are performed using proposed and baseline model for sentiment scores calculated from three different sentiment dictionaries. It is presented in Figs 8 and 9 .

Fig 8. https://doi.org/10.1371/journal.pone.0282234.g008

Fig 9. https://doi.org/10.1371/journal.pone.0282234.g009

WCN-LSTM performs better in terms of accuracy for most of the stocks. Using the HIV4 sentiment dictionary, WCN-LSTM achieves better accuracy for 24 out of 41 stocks, using LM for 22 out of 41, and using Vader for 26 out of 41. For the F1-score, baseline performance is better than WCN-LSTM using the HIV4, LM, and Vader sentiment dictionaries. Out of 41 stocks, the baseline model produces a better F1-score for 24 stocks using HIV4 and 23 stocks using LM and Vader. Some stocks from different sectors have the same accuracy and F1-score for both models.

The accuracy of WCN-LSTM for all three sentiment dictionaries is analyzed using the experimental results. WCN-LSTM performance using Vader is slightly better than with HIV4. Similarly, sentiment scores calculated using HIV4 produced better WCN-LSTM results than the LM sentiment dictionary.

6.2 Statistical analysis

We have adopted the Wilcoxon signed-rank test to determine whether the predictive performance of the two models differs significantly. The Wilcoxon signed-rank test is a non-parametric, distribution-free technique. It is considered safer than a parametric t-test because it does not assume normality or homogeneity of variance [ 39 ]. We have followed the work of [ 40 – 42 ] in using the Wilcoxon signed-rank test to comparatively analyze the predictive performance of the WCN-LSTM and LSTM forecasting models.

We have constructed three hypotheses to statistically analyze the experimental results. For hypothesis testing, the model accuracy measure which is a continuous value is taken as a response variable. While for all three hypotheses, models, time steps, and sentiment lexicons are taken as independent variables. Our independent variable consists of two related groups where the same participants are presented in both groups. In order to conclude that one model performs better than the other, we have defined null and alternative hypotheses accordingly.

We have chosen 0.05 as the significance level (threshold value). To decide whether to reject the null hypothesis, the p-value is examined. If the p-value is less than the significance level, the null hypothesis is rejected at a confidence level of 95%. The null hypothesis is not rejected if the p-value is greater than the significance level. In the results, a p-value less than the significance level is marked with ‘*’.
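The decision rule can be illustrated with a small one-sided Wilcoxon signed-rank test. The sketch below uses the normal approximation with a tie correction and no continuity correction; for samples this small an exact test (for example via SciPy's `scipy.stats.wilcoxon`) is generally preferable. The paired values are made-up percentages, not results from the paper, and the function name is hypothetical.

```python
import math

def wilcoxon_signed_rank_greater(x, y):
    """One-sided test of H1: values in x tend to exceed their paired values in y.

    Returns (W+, p-value) using the normal approximation with tie correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1          # tied values share their average rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of the standard normal
    return w_plus, p_value

# Made-up paired accuracies (in percent) for ten stocks under two models.
model_a = [62, 58, 71, 66, 60, 69, 64, 73, 59, 67]
model_b = [60, 57, 65, 61, 62, 63, 60, 66, 56, 59]
w_plus, p_value = wilcoxon_signed_rank_greater(model_a, model_b)
reject_null = p_value < 0.05
```

Here the null hypothesis would be rejected at the 0.05 level, which corresponds to an entry marked with ‘*’ in the result tables.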

6.2.1 First hypothesis.

In our first hypothesis, we want to compare the prediction accuracy of the WCN-LSTM and LSTM forecasting models, denoted as Acc WCN-LSTM and Acc LSTM. Our null hypothesis states that there is no significant difference between the predictive performance of the two models, while the alternative hypothesis states that the predictive performance of WCN-LSTM is better than that of the LSTM model.

Null Hypothesis: H 0 : Acc WCN-LSTM = Acc LSTM

Alternative Hypothesis: H 1 : Acc WCN-LSTM > Acc LSTM

We have performed hypothesis testing for three different time steps and three different sentiment dictionaries; the results are presented in Table 6. According to Table 6, the predictive performance of WCN-LSTM was significantly better than that of the LSTM forecasting model in 7 out of 9 scenarios.

Table 6. https://doi.org/10.1371/journal.pone.0282234.t006

6.2.2 Second hypothesis.

In the second hypothesis, WCN-LSTM prediction accuracy is compared across different sentiment dictionaries for each time step. Because the Wilcoxon signed-rank test cannot compare more than two groups at once, we conducted multiple pairwise tests. The null hypothesis states that employing different sentiment lexicons in the WCN-LSTM model produces no significant difference in prediction accuracy. The null and alternative hypotheses are given below:

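Null Hypothesis ($H_0$): $\mathrm{Acc}_{\text{WCN-LSTM}}(\text{HIV4}) = \mathrm{Acc}_{\text{WCN-LSTM}}(\text{LM})$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(\text{HIV4}) = \mathrm{Acc}_{\text{WCN-LSTM}}(\text{Vader})$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(\text{LM}) = \mathrm{Acc}_{\text{WCN-LSTM}}(\text{Vader})$

Alternative Hypothesis ($H_1$): $\mathrm{Acc}_{\text{WCN-LSTM}}(\text{HIV4}) > \mathrm{Acc}_{\text{WCN-LSTM}}(\text{LM})$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(\text{HIV4}) > \mathrm{Acc}_{\text{WCN-LSTM}}(\text{Vader})$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(\text{LM}) > \mathrm{Acc}_{\text{WCN-LSTM}}(\text{Vader})$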

According to Table 7, for time steps 3 and 7, the HIV4 sentiment lexicon performs significantly better than the LM and Vader sentiment lexicons.
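Because each test compares only two groups, the comparisons for this hypothesis form a small grid of lexicon pairs per time step. The sketch below simply enumerates that grid; the constant and function names are illustrative, and the time steps 3, 7 and 10 are the ones used elsewhere in this section.

```python
from itertools import combinations

LEXICONS = ("HIV4", "LM", "Vader")
TIME_STEPS = (3, 7, 10)


def second_hypothesis_tests():
    """List every (time_step, lexicon_a, lexicon_b) comparison.

    Each entry stands for one paired test of
    H1: Acc(lexicon_a) > Acc(lexicon_b) at the given time step.
    """
    return [
        (step, first, second)
        for step in TIME_STEPS
        for first, second in combinations(LEXICONS, 2)
    ]


for step, first, second in second_hypothesis_tests():
    print(f"time step {step}: {first} vs {second}")
```

Three lexicon pairs at three time steps give nine tests in total, and the pairs come out in the same order as the hypotheses listed above.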

Table 7: https://doi.org/10.1371/journal.pone.0282234.t007

6.2.3 Third hypothesis.

In the third hypothesis, the prediction accuracy of WCN-LSTM is compared across input sequences formed using different time steps. The null hypothesis states that using sequences formed with different time steps produces no significant difference in the prediction accuracy of WCN-LSTM. Because there are more than two groups, the Wilcoxon signed-rank test is applied to each pairwise combination. Moreover, these tests are performed for each sentiment lexicon adopted in the experiments. The null and alternative hypotheses for the pairwise comparisons are given below:

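Null Hypothesis ($H_0$): $\mathrm{Acc}_{\text{WCN-LSTM}}(t3) = \mathrm{Acc}_{\text{WCN-LSTM}}(t7)$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(t3) = \mathrm{Acc}_{\text{WCN-LSTM}}(t10)$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(t7) = \mathrm{Acc}_{\text{WCN-LSTM}}(t10)$

Alternative Hypothesis ($H_1$): $\mathrm{Acc}_{\text{WCN-LSTM}}(t3) > \mathrm{Acc}_{\text{WCN-LSTM}}(t7)$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(t3) > \mathrm{Acc}_{\text{WCN-LSTM}}(t10)$,

$\mathrm{Acc}_{\text{WCN-LSTM}}(t7) > \mathrm{Acc}_{\text{WCN-LSTM}}(t10)$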

To perform the Wilcoxon signed-rank test for the third hypothesis, all test combinations are evaluated for each sentiment lexicon. Table 8 shows that, with the HIV4 sentiment lexicon, time steps 3 and 7 perform better than time step 10. With the LM sentiment lexicon, time step 7 performs better than time steps 3 and 10. With the Vader sentiment lexicon, there is no significant difference among the three time steps.
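To make the notion of a time step concrete, the following sketch shows one common way to form input sequences of a given length with a sliding window. It is an assumption made for illustration: the paper's exact windowing and target definition are not given in this section, and `make_sequences` and the toy series are hypothetical.

```python
def make_sequences(series, time_step):
    """Split a series into overlapping windows of length `time_step`.

    Returns (windows, targets). Each target is the value that immediately
    follows its window (an assumed convention for this sketch).
    """
    n_windows = len(series) - time_step
    windows = [series[i:i + time_step] for i in range(n_windows)]
    targets = [series[i + time_step] for i in range(n_windows)]
    return windows, targets


# Toy daily series with 12 observations.
series = list(range(12))

for step in (3, 7, 10):
    windows, targets = make_sequences(series, step)
    print(step, len(windows), windows[0], targets[0])
```

A longer time step gives the model more history per sample but leaves fewer training samples: on this 12-point toy series the three settings yield 9, 5 and 2 windows respectively.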

Table 8: https://doi.org/10.1371/journal.pone.0282234.t008

6.3 Discussion

We implemented WCN-LSTM and LSTM based prediction models to answer our empirical research questions. We also compared WCN-LSTM qualitatively with existing models to highlight its novelty. Our comparative findings from different perspectives are discussed below:

6.3.1 The superiority of WCN-LSTM.

The experimental results illustrated in Figs 4 – 9 show that WCN-LSTM, which incorporates sentiment scores of categorized news along with their weighted impact, achieves better accuracy than LSTM. In the statistical analysis phase, the Wilcoxon signed-rank test is adopted to confirm the significance of the weighted categorized news incorporated in the prediction model. Table 6 shows that the alternative hypothesis is accepted in 7 of the 9 scenarios. Therefore, the performance of WCN-LSTM is better than that of the LSTM prediction model.

The performance of WCN-LSTM can also be observed at the sector level. The accuracy achieved by WCN-LSTM is averaged over the stocks in each sector and compared with that of the baseline model. Figs 10 – 12 show that WCN-LSTM clearly performs better than LSTM, especially for the HIV4 sentiment lexicon and time steps 3 and 7.
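The sector-level aggregation described above amounts to grouping per-stock accuracies by sector and taking the mean. A minimal sketch follows; the stock identifiers and accuracy values are invented for illustration and do not come from the paper, and `sector_average_accuracy` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def sector_average_accuracy(stock_accuracy, stock_sector):
    """Average per-stock accuracy within each sector."""
    by_sector = defaultdict(list)
    for stock, accuracy in stock_accuracy.items():
        by_sector[stock_sector[stock]].append(accuracy)
    return {sector: mean(values) for sector, values in by_sector.items()}


# Hypothetical stocks and accuracies (in percent).
stock_sector = {
    "BANK_A": "Commercial Banks",
    "BANK_B": "Commercial Banks",
    "OIL_A": "Oil & Gas",
}
wcn_lstm_accuracy = {"BANK_A": 70, "BANK_B": 74, "OIL_A": 66}

print(sector_average_accuracy(wcn_lstm_accuracy, stock_sector))
```

Running the same function on the baseline model's accuracies would give the second series needed for a sector-by-sector comparison.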

Fig 10: https://doi.org/10.1371/journal.pone.0282234.g010

Fig 11: https://doi.org/10.1371/journal.pone.0282234.g011

Fig 12: https://doi.org/10.1371/journal.pone.0282234.g012

WCN-LSTM results can also be analyzed for individual stocks to study the relationship between prediction accuracy and the number of news headlines in each news category. The number of market news headlines is the same for all stocks, while the numbers of sector-related and stock-related news headlines differ by sector and stock. Table 2 lists each news category along with its number of news headlines. Figs 10 – 12 show that Oil & Gas and Commercial Banks, two sectors with a large number of active stocks, achieved better prediction accuracy with WCN-LSTM for every combination of sentiment lexicon and time step.

The Technology & Communication sector performs better in all but one case. This suggests that the number of sector-related news headlines has a positive influence on a sector’s prediction accuracy. However, the Textile sector does not fit this pattern: it has more sector-related news headlines than any other sector, yet fewer active stocks selected for the experiments than any other sector. Furthermore, the stocks of the Commercial Banks sector have comparatively more stock-related news than the stocks of any other sector. In the case of PSX, it can be deduced that a large volume of news in all categories improves prediction accuracy at the sector and stock levels.

6.3.2 Comparison between sentiment lexicons and input sequence length.

The comparative analysis of sentiment lexicons is performed statistically and presented in Table 7 in order to answer research question RQ4. It reveals that HIV4 performed better than the LM and Vader sentiment lexicons. Finally, the experimental results show that generating input sequences with time steps 3 and 7 significantly enhanced WCN-LSTM performance. Table 8 confirms these findings statistically, which answers research question RQ5.

6.3.3 Implications.

In Fig 13, a qualitative comparison is presented to show the novelty of WCN-LSTM among other state-of-the-art stock prediction models. We reviewed the existing work and identified the strength of hybrid input, sentiment analysis, and sequence learning in making predictions. Although the baseline approach incorporates all these features, it does not investigate sequence length in search of an optimal value. Furthermore, we found that very few attempts have been made to group financial news according to its area of influence in the stock market and to incorporate these news groups into the prediction model simultaneously. In existing work, the GICS standard is used to group financial news. However, GICS has been shown to have a limitation in finding homogeneous news groups. Taking these opportunities and limitations into account, our proposed model WCN-LSTM employs hybrid input, lexicon-based sentiment analysis, and sequential learning. We incorporated news groups related to the structural hierarchy of the stock market, and fed these groups into WCN-LSTM simultaneously with learned weights. This is the first paper to suggest a sophisticated prediction model that incorporates news groups related to the structural hierarchy of the stock market. Furthermore, we considered the case of PSX, which the research community has not yet explored for forecasting with hierarchical news groups. At a broader level, we have established a news-sensitive stock prediction model that utilizes news groups influencing market volatility with varying impact. For instance, news groups related to politics, terrorism, foreign affairs, natural disasters, etc., could also be incorporated into the prediction model according to their learned weights with minor adaptation.
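As a rough intuition for what weighting news groups means, the sketch below combines market, sector and stock sentiment scores into one value using normalized weights. This is a deliberate simplification and an assumption: in WCN-LSTM the weights are learned as part of the model, whereas here they are fixed, made-up numbers, and `combine_news_sentiment` is a hypothetical helper rather than part of the authors' implementation.

```python
def combine_news_sentiment(scores, weights):
    """Weighted average of per-group sentiment scores.

    `scores` and `weights` map a news group name to a number. Weights are
    normalized so that they sum to one before being applied.
    """
    total_weight = sum(weights[group] for group in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[g] * weights[g] / total_weight for g in scores)


# Hypothetical daily sentiment scores and illustrative (not learned) weights.
scores = {"market": 0.2, "sector": -0.1, "stock": 0.6}
weights = {"market": 1.0, "sector": 1.0, "stock": 2.0}

print(round(combine_news_sentiment(scores, weights), 3))  # 0.325
```

Adding a further news group, such as politics or foreign affairs, would only require one more entry in each dictionary, which mirrors the minor adaptation mentioned above.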

Fig 13: https://doi.org/10.1371/journal.pone.0282234.g013

7. Conclusion and future work

In this paper, we addressed the stock trend prediction problem by utilizing hybrid input, weighted news groups, sentiment analysis, and sequential learning. Sequential learning is implemented with an LSTM layer, which is specifically designed to memorize long- and short-term dependencies in an input sequence efficiently. WCN-LSTM incorporates three news groups, namely market news, sector news, and stock news, according to the structural hierarchy of the stock market. We selected the Pakistan Stock Exchange as the case for our experiments. The WCN-LSTM prediction model is adapted from an LSTM based prediction model, which we treat as the baseline for quantitative analysis. We identified five empirical research questions and, to answer them, performed experiments using the WCN-LSTM and baseline prediction models. We also performed statistical analysis using the Wilcoxon signed-rank test. We showed that WCN-LSTM performed better than the baseline model. For WCN-LSTM, the HIV4 sentiment lexicon with time steps 3 and 7 performed best among the candidate choices. To present the novelty of WCN-LSTM, we conducted a qualitative analysis comparing the features of our proposed model with those of existing stock prediction models. However, there is a strong requirement for homogeneous news groups, whose degree of influence on market dynamics must be learned before making predictions.

We found that the LM sentiment lexicon did not perform well, although it is specifically designed for the financial domain. To tackle this shortcoming, we are interested in adapting the sentiment lexicon for PSX using transfer learning based approaches such as the BERT language model and incorporating it into our proposed WCN-LSTM model. We also intend to examine other textual feature representations, such as word and event embeddings, along with knowledge bases that refine these embeddings. Other sources of textual features, such as company annual reports, as well as other news groups related to politics, terrorism, and foreign policies, could also be incorporated into the prediction model to further enhance prediction quality. Furthermore, there is much room to investigate hybrid architectures in which the sequence learning model is combined with other deep learning models to improve the prediction results of our proposed model.

Supporting information

https://doi.org/10.1371/journal.pone.0282234.s001

Acknowledgments

This research work is supported by the Higher Education Commission (HEC), Islamabad, Pakistan.

Improving stock market predictions using LSTM based on MLP’s comparative analysis

Yuvika Saini , Aleem Ali , Ananshu Kukreja; Improving stock market predictions using LSTM based on MLP’s comparative analysis. AIP Conf. Proc. 19 March 2024; 3072 (1): 020023. https://doi.org/10.1063/5.0199407

The stock market serves as a vital investment platform, attracting numerous financial investors due to its growing capitalization. However, predicting stock prices accurately has always been a challenging task, requiring advanced algorithmic techniques. This research paper presents an innovative approach to improve stock market predictions by leveraging the strengths of Long Short-Term Memory (LSTM) based on a comparative analysis with Multi-Layer Perceptron (MLP) models. The study compares the predictive capabilities of Random Forest, MLP, and LSTM models, with MLP demonstrating superior accuracy compared to Random Forest in the initial analysis. Building upon this finding, LSTM is employed by incorporating MLP predictions to forecast future market trends. The proposed technique demonstrates reliable results by combining machine learning and deep learning approaches. The MLP analysis further enhances the suggested model, resulting in improved accuracy and reduced mean square error. The result highlights the potential of integrating LSTM and MLP models for enhanced stock market predictions, offering valuable insights for financial forecasting and opening avenues for further research and refinement in this field.

  • Online ISSN 1551-7616
  • Print ISSN 0094-243X
International Conference on Frontiers of Intelligent Computing: Theory and Applications

FICTA 2023: Evolution in Computational Intelligence, pp 161–172

Indian Stock Price Prediction Using Long Short-Term Memory

  • Himanshu Rathi   ORCID: orcid.org/0000-0002-6066-842X 8 ,
  • Ishaan Joardar   ORCID: orcid.org/0009-0009-1841-8345 8 ,
  • Gaurav Dhanuka   ORCID: orcid.org/0000-0002-6466-5728 8 ,
  • Lakshya Gupta 9 &
  • J. Angel Arul Jothi   ORCID: orcid.org/0000-0002-1773-8779 8  
  • Conference paper
  • First Online: 21 November 2023

86 Accesses

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 370)

In recent years, numerous researchers across the world have developed various methods for predicting stock prices. However, the accuracy of these models has been found to be inconsistent. This field, known as stock market prediction and analysis, offers potential for further improvement. This paper proposes a framework based on a long short-term memory (LSTM)-based deep learning model, capable of accurately predicting the closing prices of companies listed on the National Stock Exchange (NSE) or Bombay Stock Exchange (BSE) of India. The LSTM model was trained using historical stock market data of Tata Motors and demonstrated a high degree of accuracy in predicting future price movements. The LSTM approach was found to be superior to other methods in terms of accuracy and precision.

  • Stock prediction
  • Machine learning
  • Long short-term memory

Author information

Authors and affiliations.

Department of Computer Science, Birla Institute of Technology and Science, Pilani, Dubai Campus, Dubai, UAE

Himanshu Rathi, Ishaan Joardar, Gaurav Dhanuka & J. Angel Arul Jothi

Delhi Technological University, Delhi, India

Lakshya Gupta

Corresponding author

Correspondence to Himanshu Rathi .

Editor information

Editors and affiliations.

Department of Electronics Engineering, Faculty of Engineering and Technology (UNSIET), Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India

Vikrant Bhateja

Middlesex University, London, UK

Xin-She Yang

Faculty of Engineering, University of Porto, Porto, Portugal

Marta Campos Ferreira

Cardiff Metropolitan University, Cardiff, Warwickshire, UK

Sandeep Singh Sengar

Institute for Technological Development and Innovation in Communications, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain

Carlos M. Travieso-Gonzalez

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Rathi, H., Joardar, I., Dhanuka, G., Gupta, L., Angel Arul Jothi, J. (2023). Indian Stock Price Prediction Using Long Short-Term Memory. In: Bhateja, V., Yang, XS., Ferreira, M.C., Sengar, S.S., Travieso-Gonzalez, C.M. (eds) Evolution in Computational Intelligence. FICTA 2023. Smart Innovation, Systems and Technologies, vol 370. Springer, Singapore. https://doi.org/10.1007/978-981-99-6702-5_13

DOI : https://doi.org/10.1007/978-981-99-6702-5_13

Published : 21 November 2023

Publisher Name : Springer, Singapore

Print ISBN : 978-981-99-6701-8

Online ISBN : 978-981-99-6702-5

eBook Packages : Intelligent Technologies and Robotics (R0)

Stock Market Prediction Using Machine Learning

IMAGES

  1. Block diagram of stock prediction using LSTM

  2. Prediction of stock price direction using the LASSO-LSTM model combines

  3. Use Case Diagram For Stock Market Prediction

  4. Flowchart of the IPSO-LSTM model for stock index forecasting

  5. Stock Price prediction using LSTM and SVR

  6. (PDF) Multivariate LSTM for Stock Market Volatility Prediction

VIDEO

  1. Stock Market Prediction and Forecasting using LSTM

  2. Biggest Stock Market Risks in 2024 ?? Market Crash 2024

  3. B3 Stock Price Prediction Using LSTM Neural Networks and Sentiment Analysis

  4. LetsGrowMore || Task 2

  5. Stock Market Prediction Using LSTM Algorithm

  6. Stock Market Prediction & Forecasting||Stacked LSTM||Task 02-LGMVIP||PART-1

COMMENTS

  1. (PDF) Stock Market Price Prediction Using LSTM

    Abstract. This research proposes an innovative approach involving the implementation of an LSTM (Long Short-Term Memory) model for forecasting stock prices. The predictive analysis relies on ...

  2. Stock Price Prediction Using Lstm: An Advanced Review

    Abstract. This paper presents a brief study of some existing methods by which a retail investor can predict the stock price. Either the price to go down or up depending upon the quarterly result, financial news inflow, technical behavior, or market sentiment due to global scenario, in the past few days.

  3. Short-term stock market price trend prediction using a comprehensive

    In the era of big data, deep learning for predicting stock market prices and trends has become even more popular than before. We collected 2 years of data from Chinese stock market and proposed a comprehensive customization of feature engineering and deep learning-based model for predicting price trend of stock markets. The proposed solution is comprehensive as it includes pre-processing of ...

  4. LSTM based stock prediction using weighted and categorized ...

    A significant correlation between financial news with stock market trends has been explored extensively. However, very little research has been conducted for stock prediction models that utilize news categories, weighted according to their relevance with the target stock. In this paper, we show that prediction accuracy can be enhanced by incorporating weighted news categories simultaneously ...

  5. LSTM-based Deep Learning Model for Stock Prediction and Predictive

    The future work includes improving the model by using some hybrid prediction-based models to get better predictions of stock prices, study existing portfolio models, improve the proposed model from the perspective of genetic algorithms and particle swarm optimization. This is an important approach for future research.

  6. Stock Price Prediction using Linear Regression and LSTM Neural Network

    The stock market has a profound influence on the modern society. Therefore, predicting stock prices is always a hot research topic. In this paper, we use linear regression models and LSTM models based on machine learning to predict the stock price of Amazon. In order to let the algorithm more available for individual investors, we only use the historical stock price of the company as data ...

  7. Predicting Stock Market Movements Using Long Short-Term Memory (LSTM

    This paper explores using artificial intelligence (AI) to predict stock market movements and build optimal portfolios. The research methodology involves using LSTM networks to predict stock performance. The study aims to combine AI with human expertise to develop an intelligent trading system. The findings emphasize the importance of selecting appropriate AI approaches for accurate predictions ...

  8. Stock Price Prediction Using Machine Learning and LSTM-Based Deep

    Prediction of stock prices has been an important area of research for a long time. While supporters of the efficient market hypothesis believe that it is impossible to predict stock prices accurately, there are formal propositions demonstrating that accurate modeling and designing of appropriate variables may lead to models using which stock prices and stock price movement patterns can be very ...

  9. Stock Price Prediction Using CNN and LSTM-Based Deep Learning Models

    Stock Price Prediction Using CNN and LSTM-Based Deep Learning Models. Designing robust and accurate predictive models for stock price prediction has been an active area of research for a long time. While on one side, the supporters of the efficient market hypothesis claim that it is impossible to forecast stock prices accurately, many ...

  10. Implementing and Analysis of RNN LSTM Model for Stock Market Prediction

    In this work, we use a recurrent neural network (RNN) with long short-term memory (LSTM) to study stock market prediction problem. The main aim of this study is to measure the feasibility and effectiveness of LSTM models in stock price forecasting. LSTM model is deployed with different configurations. Multi-layer neural network is built using a ...

  11. PDF Stock Price Prediction using Sentiment Analysis and Deep Learning for

    Keywords: Sentiment analysis, Stock Prediction, LSTM, Random Forest 1 Introduction The objective of this exercise has been to predict future stock prices using Machine Learning and other Artificial Intelligence. The exercise started with a comprehensive review of available literature in this domain. Research papers as well as online sources

  12. Stock Price Prediction Based on LSTM Deep Learning Model

    Predicting the stock market is either the easiest or the toughest task in the field of computations. There are many factors related to prediction, physical factors vs. physiological, rational and irrational, capitalist sentiment, market, etc. All these aspects combine to make stock costs volatile and are extremely tough to predict with high accuracy. The prices of a stock market depend very ...

  13. Improving stock market predictions using LSTM based on MLP's

    However, predicting stock prices accurately has always been a challenging task, requiring advanced algorithmic techniques. This research paper presents an innovative approach to improve stock market predictions by leveraging the strengths of Long Short-Term Memory (LSTM) based on a comparative analysis with Multi-Layer Perceptron (MLP) models.

  14. A Combined Model for INDEX Price Forecasting Using LSTM, RNN ...

    A Combined Model for INDEX Price Forecasting Using LSTM, RNN, and GRU. Conference paper; First Online: 11 April 2024; pp 499-514; ... Improving the criteria of the investment on stock market using data mining techniques: the case of S&P 500 index. ... Abdullah H (2019) Stock price prediction using LSTM, RNN, and CNN. Google Scholar ...

  15. PDF Stock Price Prediction Using CNN and LSTM- Based Deep Learning Models

    This flattened vector is fed as an input to the decoder LSTM sub-model. The decoder LSTM sub-model remains exactly identical to that in the LSTM#1 model discussed earlier. As in the case of LSTM#1, this model is also trained using a batch size of 16, over 20 epochs. The architecture of the model is depicted in Fig. 4.

  16. S_I_LSTM: stock price prediction based on multiple data sources and

    An S_I_LSTM framework is designed by incorporating multiple data sources and investors' sentiment. Sentiment analysis method based on CNN is proposed to calculate the investors' sentiment index. LSTM network with attention mechanism is proposed to predict stock price. The rest of this paper is organised as follows.

  17. PDF Using LSTM in Stock prediction and Quantitative Trading

    In this research, we have constructed and applied the state-of-art deep learning sequential model, namely Long Short Term Memory Model (LSTM), Stacked-LSTM and Attention-Based LSTM, along with the traditional ARIMA model, into the prediction of stock prices on the next day. Moreover, using our prediction,

  18. Stock Price Prediction Using Machine Learning Techniques

    The main method used in this article is the LSTM structure, and the building blocks are constructed into a full Recurrent Neural Network to predict the price for several famous technology companies from NASDAQ. ... Stock Price Prediction Using Machine Learning Techniques @article{Li2024StockPP, title={Stock Price Prediction Using Machine ...

  19. Stock Prediction with Stacked-LSTM Neural Networks

    This paper explores a stacked long-term and short-term memory (LSTM) model for non-stationary financial time series in stock price prediction. The proposed LSTM is designed to overcome gradient explosion, gradient vanishing, and save long-term memory. Firstly, build time series with different days for network input, and then add early-stopping, rectified linear units (Relu) activation function ...

  20. A Predictive Model of the Stock Market Using the LSTM Algorithm with a

    This research paper has focused on the integration of promising stock market indicators such as the relative strength index (RSI) and different versions of the exponential moving average (EMA) (i.e., 50-day, 100-day, and 150-day EMA) with the long short-term memory (LSTM) machine learning algorithm for stock price prediction. LSTM is the most robust version of recurrent neural network because ...

  21. PDF Stock Market Prediction Using Streamlit and Lstm

    ... work enhances the field of stock analysis research in both the financial and technical spheres. Index Terms: Trade High, Trade Close, CNN, Machine Learning, Deep Learning, LSTM, etc. ... Utilizing Google Colab and Streamlit, the Stock Market Prediction Using LSTM (Web Application) is effectively implemented [3].

  22. Stock Price Prediction Using LSTM, CNN and ANN

    Abstract. Forecasting the stock market is difficult because the stock price time series is so intricate. We applied long short-term memory (LSTM) algorithm, convolutional neural network (CNN) and artificial neural network (ANN) because of the following reason: Recent work provides preliminary proof that machine learning methods could locate ...

  23. Updated deep long short-term memory with Namib beetle Henry

    Updated deep long short-term memory with Namib beetle Henry optimisation for sentiment-based stock market prediction. Authors: Nital Adikane. ... An optimal deep learning-based LSTM for stock price prediction using twitter sentiment analysis. ... This paper proposes a stock price prediction model, which extracts features from time series data ...

  24. Indian Stock Price Prediction Using Long Short-Term Memory

    Our LSTM stock prediction model consists of three LSTM layers and three dense layers, all with a ReLU activation function. ... A cited paper proposes a stock market prediction tool using machine learning models such as Random Forest and Support Vector Machine. The model was tested on the Dow Jones Industrial Average Index and was able to provide ...

  25. Applied Sciences

    Unemployment, a significant economic and social challenge, triggers repercussions that affect individual workers and companies, generating a national economic impact. Forecasting the unemployment rate becomes essential for policymakers, allowing them to make short-term estimates, assess economic health, and make informed monetary policy decisions. This paper proposes the innovative GA-LSTM ...

  26. Prediction of Stock Price Based on LSTM Neural Network

    This study, based on the demand for stock price prediction and the practical problems it faces, compared and analyzed a variety of neural network prediction methods and finally chose the long short-term memory (LSTM) neural network. Then, through in-depth study of how to predict the stock price with an LSTM neural network optimized by the MBGD algorithm, the feasibility of the method and the ...

  27. Stock Market Prediction Using Machine Learning

    In Stock Market Prediction, the aim is to predict the future value of the financial stocks of a company. The recent trend in stock market prediction technologies is the use of machine learning which makes predictions based on the values of current stock market indices by training on their previous values. Machine learning itself employs different models to make prediction easier and authentic ...