1. Stock Market Prediction Using Disparate Data Sources
INFORMS Philadelphia, November 2015
Bin Weng (Email: [email protected]), Ph.D. Candidate of Industrial and System Engineering
Mohamed A. Ahmed (Email: [email protected]), M.S. Candidate of Industrial and System Engineering
Fadel M. Megahed (Email: [email protected]), Assistant Professor of Industrial and System Engineering
2. Stock Market Prediction: Why?
• The stock market is one of the most important ways for companies to raise money.
• About 48% of Americans were invested in the stock market as of 2015 (CNBC).
• Successfully predicting a stock’s future price could yield significant profit.
3. Stock Market Prediction: How?
• Guess?
• Fundamental Analysis
• Technical Analysis (Charting)
• Technological Methods
4. Stock Market Prediction
Ray Dalio’s $165B Bridgewater Associates will start a new artificial-intelligence unit to use predictive analysis for trades. (Bloomberg, 2015)
5. Related Works
Paper Index   Selected Papers
[1] Predicting Financial Markets: Comparing Survey, News, Twitter and Search Engine Data
[2] A fusion model of HMM, ANN and GA for stock market forecasting
[3] Twitter mood predicts the stock market
[4] Stock Market Prediction System with Modular Neural Networks
[5] Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news
[6] A Hybrid Machine Learning System for Stock Market Forecasting
[7] Market Index and Stock Price Direction Prediction using Machine Learning Techniques: An empirical study on the KOSPI and HSI
[8] Stock Market Prediction Using Disparate Data Sources (Proposed)
6. Related Works
Data: Market Data, Technical Indicator, Social Media News, Secondary Variable
Model: Time Series, Logistic Regression, Decision Trees, Neural Networks, Support Vector Machines
Type of Stock: IT Index, Mix of companies
Paper   Target
[1]     Price Volume
[2]     Price
[3]     Movement
[4]     Buy and sell signal
[5]     Price Volume
[6]     Movement
[7]     Movement
[8]     Movement
7. Research Motivation
• Which sources of data have the most correlation with the stock market time series?
• Which logical target has the best prediction capability with regard to the stock movement?
• Which technological model is best at predicting the stock movement?
• Can we construct a better model using disparate data sources?
8. Data Sources
9. Process Overview
10. Data Sources: Social Media and Internet Data
• “Financial news articles play a large role in influencing the movement of a stock as humans react to the information.” (M. Nardo et al., 2015)
• “Data on changes in how often financially related Wikipedia pages were viewed have contained early signs of stock market moves.” (H. Moat et al., 2013)
• Blog communication exhibits remarkable predictive power. (M. Choudhury et al., 2008)
11. Data Sources: Secondary Variables
• Data from social media and the Internet tend to have high variability, which motivates deriving secondary variables from them (e.g. Moving Average, Momentum, Relative Strength Index).
• Does the upward or downward movement of the predicting variables have an effect on the target movement?
• What range of the primary variables has predictive power over the targets?
[Figure: Google News & Blogs, daily series from 1/2/2014 to 6/25/2014; vertical axis from 0 to 3500]
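The secondary variables named on this slide (moving average, momentum, Relative Strength Index), together with the rate-of-change and disparity variables that appear later in the feature lists, can be derived from any daily series such as page views or news counts. The following is a minimal sketch in plain Python. The function names, window lengths and sample values are illustrative assumptions, not the authors' implementation, and the RSI here uses simple sums over the window rather than Wilder's smoothing.

```python
from typing import List, Optional

def moving_average(x: List[float], n: int) -> List[Optional[float]]:
    """Simple n-day moving average; None until n observations exist."""
    return [None if i + 1 < n else sum(x[i + 1 - n:i + 1]) / n
            for i in range(len(x))]

def momentum(x: List[float], n: int) -> List[Optional[float]]:
    """Difference between today's value and the value n days ago."""
    return [None if i < n else x[i] - x[i - n] for i in range(len(x))]

def rate_of_change(x: List[float], n: int) -> List[Optional[float]]:
    """Percentage change relative to the value n days ago (assumed nonzero)."""
    return [None if i < n else 100.0 * (x[i] - x[i - n]) / x[i - n]
            for i in range(len(x))]

def disparity(x: List[float], n: int) -> List[Optional[float]]:
    """Today's value as a percentage of its n-day moving average (assumed nonzero)."""
    ma = moving_average(x, n)
    return [None if m is None else 100.0 * v / m for v, m in zip(x, ma)]

def rsi(x: List[float], n: int) -> List[Optional[float]]:
    """Relative Strength Index over the last n daily changes."""
    out: List[Optional[float]] = [None] * len(x)
    for i in range(n, len(x)):
        changes = [x[j] - x[j - 1] for j in range(i - n + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = -sum(c for c in changes if c < 0)
        # With no losses in the window the index is set to its upper bound.
        out[i] = 100.0 if losses == 0 else 100.0 - 100.0 / (1.0 + gains / losses)
    return out

# Made-up daily counts, standing in for a Wikipedia or Google News series.
views = [120.0, 150.0, 90.0, 200.0, 180.0, 160.0, 210.0]
print(moving_average(views, 3))
print(momentum(views, 1))
print(disparity(views, 3))
print(rsi(views, 3))
```

Each function returns a list aligned with the input, with None where the window is not yet full, so the derived columns can be placed next to the primary series by date.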
12. Target Matrix
Target Type   Method
1             Open (i+1) - Close (i)
2             Open (i+1) - Open (i)
3             Close (i+1) - Close (i)
4             Close (i+1) - Open (i)
5             Volume of trades moves as previous day
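The five targets in the matrix can be sketched as binary movement labels computed from consecutive trading days. This is a minimal illustration under stated assumptions: the 1 = up, 0 = not up encoding, the dictionary keys and the sample prices are hypothetical, and target 5 is read here as Volume (i+1) - Volume (i), which is only one possible interpretation of the slide's wording.

```python
from typing import Dict, List

def movement(next_value: float, base_value: float) -> int:
    """Return 1 if the next value is above the base value, else 0."""
    return 1 if next_value - base_value > 0 else 0

def build_targets(days: List[Dict[str, float]]) -> List[Dict[str, int]]:
    """One row of five binary targets for each day i that has a following day i+1."""
    rows = []
    for today, nxt in zip(days, days[1:]):
        rows.append({
            "target1": movement(nxt["open"], today["close"]),   # Open(i+1) - Close(i)
            "target2": movement(nxt["open"], today["open"]),    # Open(i+1) - Open(i)
            "target3": movement(nxt["close"], today["close"]),  # Close(i+1) - Close(i)
            "target4": movement(nxt["close"], today["open"]),   # Close(i+1) - Open(i)
            # Assumption: target 5 compares the next day's volume with today's.
            "target5": movement(nxt["volume"], today["volume"]),
        })
    return rows

# Made-up prices and volumes for three trading days.
days = [
    {"open": 100.0, "close": 102.0, "volume": 1000.0},
    {"open": 101.0, "close": 105.0, "volume": 1200.0},
    {"open": 106.0, "close": 103.0, "volume": 900.0},
]
targets = build_targets(days)
print(targets)
```

The last day in the input produces no row because its next-day values are not yet known.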
13. Data Fusion
14. Feature Selection
• Simplification of model
• Shorter training times
• Improved accuracy
• Enhanced generalization by reducing overfitting
15. Feature Selection: Chord Diagram
16. Feature Selection
Method: Recursive feature elimination (RFE)
Coding: Python with multiple feature selection packages
Pseudo Code of RFE
* Code is available on https://github.com/binweng/SFS
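The pseudo code from this slide did not survive in the text, so the following is only a minimal sketch of the general RFE idea: fit a model, drop the feature with the smallest absolute weight, and repeat until the desired number of features remains. The ranking model (a small logistic regression trained by gradient descent), the function names and the toy data are all assumptions made for illustration; the authors' code at the link above may differ, and in practice a library routine such as scikit-learn's `sklearn.feature_selection.RFE` would normally be used.

```python
import math
from typing import List

def fit_logistic(X: List[List[float]], y: List[int],
                 epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w

def rfe(X: List[List[float]], y: List[int],
        names: List[str], n_keep: int) -> List[str]:
    """Refit repeatedly, each time dropping the feature with the smallest |weight|."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w = fit_logistic(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]

# Toy data (hypothetical): "signal" determines the label, "weak" agrees with it
# in 12 of 16 rows, and "noise" alternates independently of the label.
names = ["signal", "weak", "noise"]
X, y = [], []
for s, wk, count in [(-1, -1, 6), (-1, 1, 2), (1, -1, 2), (1, 1, 6)]:
    for k in range(count):
        X.append([float(s), float(wk), 1.0 if k % 2 == 0 else -1.0])
        y.append(1 if s > 0 else 0)

print(rfe(X, y, names, n_keep=2))
print(rfe(X, y, names, n_keep=1))
```

Comparing weights by magnitude only makes sense when the features are on comparable scales, which is why the toy features all take values of -1 or 1; real inputs would need to be standardized first.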
17. Feature Selection
Target   Variables
Target 1
Close, Open, High, Low, P/E Ratio
Wiki_3_day_disparity, Wiki_5_day_disparity, Wiki_10_day_disparity, Wiki_Momentum_1, Wiki_ROC
Google_MA_5, Google_EMA_3, Google_3_Day_disparity, Google_5_day_disparity, RSI
Stochastic Oscillator, Wiki_RSI, Google_MA_4, Williams %R, Google_MA_3
Target 2
Close, Open, High, Low, P/E Ratio
Wiki_5_day_disparity, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move, Wiki_5day_disparity_Move
Google_EMA5_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Wiki_3_day_disparity
Stochastic Oscillator, RSI_Move, Wiki_RSI_Move, Google_MA_6, Google_Move
Target 3
Close, Open, High, P/E Ratio, Stochastic_Move
Wiki_Momentum_1, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move, Wiki_ROC_Move
Google_EMA5_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Wiki_10_day_disparity
RSI_Move, Wiki_RSI_Move, Wiki_3_day_disparity, Google_Move, Google_MA5_Move
Target 4
Close, Open, High, Low, P/E Ratio
RSI_Move, Wiki_10_day_Disparity, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move
Google_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Williams %R
Stochastic Oscillator, Stochastic_Move, Wiki_3day_disparity_Move, Wiki_ROC_Move, Wiki_RSI_Move
Target 5
Close, Open, High, Low, Williams %R
Wiki_Momentum_1, Wiki_RSI, Google_MA_2, Google_MA_3, Google_MA_4
Google_MA_9, Google_3_day_disparity, Google_5_day_disparity, Google_10_day_disparity, Wiki_10_day_disparity
Wiki_3_day_disparity, Wiki_5_day_disparity, Google_MA_6, Google_MA_7, Google_MA_8
18. Model Comparison
19. Model Comparison
Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/
20. Model Comparison
21. Experimental Result
Paper 1: B. Nair et al., 2010
Paper 2: A. Chen, 2003
22. Experimental Result
• Comparison of model accuracy by information input
23. Experimental Result
• Evaluate the model using AUC
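AUC, the area under the ROC curve, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition in plain Python; the function name, labels and scores are made up for illustration and are not results from the study.

```python
from typing import Sequence

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # made-up classifier scores
print(auc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes AUC less sensitive than raw accuracy to how many up and down days the sample contains.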
24. Experimental Result: Target Coincidence Matrix for SVM
Target     Training   0     1     Testing   0     1
Target 1   0          55    113   0         60    95
           1          27    229   1         34    183
Target 2   0          160   28    0         156   39
           1          37    180   1         32    164
Target 3   0          147   46    0         164   32
           1          30    172   1         31    174
Target 4   0          150   31    0         165   34
           1          34    172   1         31    179
Target 5   0          177   29    0         183   37
           1          130   61    1         125   54
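Accuracy can be read off a coincidence matrix as the share of cases on its diagonal. The sketch below applies this to the testing matrices, using the counts as read from the flattened slide text; that reading, along with the variable and function names, is an assumption. The result does not depend on whether rows hold the actual or the predicted class.

```python
from typing import Dict, List

# Testing-set coincidence matrices for the SVM, as read from the slide.
# Each matrix is [[row 0], [row 1]] with columns for classes 0 and 1.
testing: Dict[str, List[List[int]]] = {
    "Target 1": [[60, 95], [34, 183]],
    "Target 2": [[156, 39], [32, 164]],
    "Target 3": [[164, 32], [31, 174]],
    "Target 4": [[165, 34], [31, 179]],
    "Target 5": [[183, 37], [125, 54]],
}

def accuracy(matrix: List[List[int]]) -> float:
    """Share of cases on the diagonal, i.e. where the two labels agree."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

for name, matrix in testing.items():
    print(f"{name}: {accuracy(matrix):.3f}")
```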
25. Target Matrix
Target Type   Method
1             Open (i+1) - Close (i)
2             Open (i+1) - Open (i)
3             Close (i+1) - Close (i)
4             Close (i+1) - Open (i)
5             Volume of trades as previous day
26. Evaluation: 10-fold cross validation
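In 10-fold cross validation the sample is split into ten parts; each part serves once as the test set while the other nine are used for training, and the ten scores are then averaged. The following minimal sketch generates the index splits in plain Python. The function name and the sample size are hypothetical, and the folds are contiguous and unshuffled, which may or may not match how the study partitioned its data.

```python
from typing import Iterator, List, Tuple

def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Every observation lands in exactly one test fold, so the averaged score uses each day once for evaluation.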
27. Evaluation: Cross validation result
28. Evaluation
Accuracy: 82% - 89%
29. Moving Prediction
30. Conclusion
• Disparate sources of data help predict the stock market.
• Multiple targets’ prediction results can be used in conjunction to successfully track stock market movements.
• Decision tree and support vector machine models alternate as the best performer, depending on the combination of input data.
• With all types of input data combined, SVMs performed best.
31. Future Work
• Identify new sources of data that have a predictive effect on the movement of the stock market, such as Twitter sentiment and textual analysis of market news, and add them to a more inclusive form of this model.
• Include linguistic modeling, clustering, and control methods such as fuzzy theory to obtain predictions of the price range.
Fuzzy Membership Function
Fuzzy System