INFORMS 2015


Page 1: INFORMS 2015

INFORMS Philadelphia, November 2015

Bin Weng (Email: [email protected]), Ph.D. Candidate of Industrial and System Engineering

Mohamed A. Ahmed (Email: [email protected]), M.S. Candidate of Industrial and System Engineering

Fadel M. Megahed (Email: [email protected]), Assistant Professor of Industrial and System Engineering

Stock Market Prediction Using Disparate Data Sources

Page 2: INFORMS 2015

Stock Market Prediction: Why?

• The stock market is one of the most important ways for companies to raise money.

• About 48% of Americans were invested in the stock market as of 2015 (CNBC).

• Successfully predicting a stock’s future price could yield significant profit.

Page 3: INFORMS 2015

Stock Market Prediction: How?

• Guess?

• Fundamental Analysis

• Technical Analysis (Charting)

• Technological Methods

Page 4: INFORMS 2015

Stock Market Prediction

Ray Dalio’s $165B Bridgewater Associates will start a new artificial-intelligence unit to use predictive analysis for trades. (Bloomberg, 2015)

Page 5: INFORMS 2015

Related Works: Selected Papers

[1] Predicting Financial Markets: Comparing Survey, News, Twitter and Search Engine Data

[2] A fusion model of HMM, ANN and GA for stock market forecasting

[3] Twitter mood predicts the stock market

[4] Stock Market Prediction System with Modular Neural Networks

[5] Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news

[6] A Hybrid Machine Learning System for Stock Market Forecasting

[7] Market Index and Stock Price Direction Prediction using Machine Learning Techniques: An empirical study on the KOSPI and HSI

[8] Stock Market Prediction Using Disparate Data Sources (Proposed)

Page 6: INFORMS 2015

Related Works

Papers [1] to [8] are compared on the following attributes.

Data: Market Data, Technical Indicator, Social Media, News, Secondary Variable

Model: Time Series, Logistic Regression, Decision Trees, Neural Networks, Support Vector Machines

Type of Stock: IT, Index, Mix of companies

Paper   Target
[1]     Price, Volume
[2]     Price
[3]     Movement
[4]     Buy and sell signal
[5]     Price, Volume
[6]     Movement
[7]     Movement
[8]     Movement

Page 7: INFORMS 2015

Research Motivation

• Which sources of data correlate most strongly with the stock market time series?

• Which logical target has the best prediction capability with regard to the stock movement?

• Which technological model is best at predicting the stock movement?

• Can we construct a better model using disparate data sources?

Page 8: INFORMS 2015

Data Sources

Page 9: INFORMS 2015

Process Overview

Page 10: INFORMS 2015

Data Sources: Social Media and Internet Data

• “Financial news articles play a large role in influencing the movement of a stock as humans react to the information.” (M. Nardo et al., 2015)

• “Data on changes in how often financially related Wikipedia pages were viewed have contained early signs of stock market moves.” (H. Moat et al., 2013)

• Blog communication exhibits remarkable predictive power. (M. Choudhury et al., 2008)

Page 11: INFORMS 2015

Data Sources: Secondary Variables

• Data from social media and the Internet are highly variable, so secondary variables (e.g. Moving Average, Momentum, Relative Strength Index) are derived from them.

• Does the upward or downward movement of a predictor variable have an effect on the target movement?

• What range of the primary variables has predictive power over the targets?

[Figure: Google News & Blogs time series, 1/2/2014 to 6/25/2014; vertical axis 0 to 3500]
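
The secondary variables named on this slide can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the formulas are the common textbook definitions, the `move` helper is a guess at how the "_Move" variables are built, the flat-window RSI value of 50 is a convention chosen here, and the `views` series is made-up data.

```python
from typing import List, Optional

def moving_average(values: List[float], n: int) -> List[Optional[float]]:
    """Simple n-day moving average; None until n observations exist."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < n:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - n : i + 1]) / n)
    return out

def momentum(values: List[float], n: int) -> List[Optional[float]]:
    """Today's value minus the value n days earlier."""
    return [None if i < n else values[i] - values[i - n] for i in range(len(values))]

def disparity(values: List[float], n: int) -> List[Optional[float]]:
    """Today's value as a percentage of its n-day moving average."""
    ma = moving_average(values, n)
    return [None if m is None or m == 0 else 100.0 * v / m for v, m in zip(values, ma)]

def rsi(values: List[float], n: int) -> List[Optional[float]]:
    """Relative Strength Index from total gains and losses over the last n changes."""
    out: List[Optional[float]] = [None] * len(values)
    for i in range(n, len(values)):
        changes = [values[j] - values[j - 1] for j in range(i - n + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        # Assumption: a completely flat window is scored as neutral (50).
        out[i] = 50.0 if gains + losses == 0 else 100.0 * gains / (gains + losses)
    return out

def move(values: List[float]) -> List[Optional[int]]:
    """Assumed '_Move' encoding: 1 if the value rose from the previous day, else 0."""
    return [None if i == 0 else int(values[i] > values[i - 1]) for i in range(len(values))]

# Hypothetical daily page-view counts, for illustration only.
views = [100, 120, 110, 130, 150, 140]
print(moving_average(views, 3))
print(rsi(views, 3))
print(move(views))
```

Each function returns a list aligned with the input, so the derived series can sit next to the market data for the same dates.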

Page 12: INFORMS 2015

Target Matrix

Target Type   Method
1             Open (i+1) – Close (i)
2             Open (i+1) – Open (i)
3             Close (i+1) – Close (i)
4             Close (i+1) – Open (i)
5             Volume of trades moves as previous day
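
A short sketch of how Targets 1 to 4 could be turned into class labels. The slide gives only the price differences; labelling a positive difference as 1 and anything else as 0 is an assumption made here, as are the function name and the sample prices. Target 5 is left out because the slide does not define it precisely.

```python
from typing import Dict, List

def movement_targets(open_: List[float], close: List[float]) -> Dict[str, List[int]]:
    """Binary labels for Targets 1-4, one per day i that has a following day i+1.

    Assumption: label 1 means the difference in the Target Matrix is positive.
    """
    def up(later: float, earlier: float) -> int:
        return 1 if later > earlier else 0

    days = range(len(open_) - 1)  # the last day has no day i+1
    return {
        "target1": [up(open_[i + 1], close[i]) for i in days],   # Open(i+1) - Close(i)
        "target2": [up(open_[i + 1], open_[i]) for i in days],   # Open(i+1) - Open(i)
        "target3": [up(close[i + 1], close[i]) for i in days],   # Close(i+1) - Close(i)
        "target4": [up(close[i + 1], open_[i]) for i in days],   # Close(i+1) - Open(i)
    }

# Hypothetical prices for three trading days.
open_prices = [10.0, 10.5, 10.2]
close_prices = [10.4, 10.1, 10.6]
print(movement_targets(open_prices, close_prices))
```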

Page 13: INFORMS 2015

Data Fusion

Page 14: INFORMS 2015

Feature Selection

• Simplification of model

• Shorter training times

• Improved accuracy

• Enhanced generalization by reducing overfitting

Page 15: INFORMS 2015

Feature Selection: Chord Diagram

Page 16: INFORMS 2015

Feature Selection

Method: Recursive feature elimination (RFE)

Coding: Python with multiple feature selection packages

Pseudo Code of RFE

* Code is available on https://github.com/binweng/SFS
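
A minimal, generic sketch of recursive feature elimination, not the authors' implementation (that is in the repository linked above). It assumes a logistic regression fitted by gradient descent as the ranking model and uses absolute weight as the importance score, which is only meaningful when features share a similar scale. The feature names and data are hypothetical. In practice a library routine such as scikit-learn's `sklearn.feature_selection.RFE` would normally be used.

```python
import math
from typing import List

def fit_logistic(X: List[List[float]], y: List[int],
                 epochs: int = 300, lr: float = 0.5) -> List[float]:
    """Fit logistic regression by batch gradient descent; return feature weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w

def rfe(X: List[List[float]], y: List[int], names: List[str], n_keep: int) -> List[str]:
    """Refit the model repeatedly, dropping the smallest-|weight| feature each round."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        subset = [[row[j] for j in active] for row in X]
        w = fit_logistic(subset, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]

# Hypothetical data: the first column separates the classes, the others are noise.
X = [[1.0, 0.2, -0.1], [0.8, -0.3, 0.1], [0.9, 0.1, 0.2], [1.1, -0.1, -0.2],
     [-1.0, 0.3, -0.1], [-0.9, -0.2, 0.1], [-1.1, 0.1, 0.2], [-0.8, -0.2, -0.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(rfe(X, y, ["signal", "noise_a", "noise_b"], n_keep=1))
```

The model is refitted after every removal, so the ranking of the remaining features can change from round to round; this is what distinguishes RFE from a one-shot ranking.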

Page 17: INFORMS 2015

Feature Selection: Variables Selected for Each Target

Target 1

Close, Open, High, Low, P/E Ratio
Wiki_3_day_disparity, Wiki_5_day_disparity, Wiki_10_day_disparity, Wiki_Momentum_1, Wiki_ROC
Google_MA_5, Google_EMA_3, Google_3_Day_disparity, Google_5_day_disparity, RSI
Stochastic Oscillator, Wiki_RSI, Google_MA_4, Williams %R, Google_MA_3

Target 2

Close, Open, High, Low, P/E Ratio
Wiki_5_day_disparity, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move, Wiki_5day_disparity_Move
Google_EMA5_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Wiki_3_day_disparity
Stochastic Oscillator, RSI_Move, Wiki_RSI_Move, Google_MA_6, Google_Move

Target 3

Close, Open, High, P/E Ratio, Stochastic_Move
Wiki_Momentum_1, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move, Wiki_ROC_Move
Google_EMA5_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Wiki_10_day_disparity
RSI_Move, Wiki_RSI_Move, Wiki_3_day_disparity, Google_Move, Google_MA5_Move

Target 4

Close, Open, High, Low, P/E Ratio
RSI_Move, Wiki_10_day_Disparity, Wiki_Move, Wiki_MA3_Move, Wiki_EMA5_Move
Google_Move, Google_3day_disparity_Move, Google_ROC_Move, Google_RSI_Move, Williams %R
Stochastic Oscillator, Stochastic_Move, Wiki_3day_disparity_Move, Wiki_ROC_Move, Wiki_RSI_Move

Target 5

Close, Open, High, Low, Williams %R
Wiki_Momentum_1, Wiki_RSI, Google_MA_2, Google_MA_3, Google_MA_4
Google_MA_9, Google_3_day_disparity, Google_5_day_disparity, Google_10_day_disparity, Wiki_10_day_disparity
Wiki_3_day_disparity, Wiki_5_day_disparity, Google_MA_6, Google_MA_7, Google_MA_8

Page 18: INFORMS 2015

Model Comparison

Page 19: INFORMS 2015

Model Comparison

Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/

Page 20: INFORMS 2015

Model Comparison

Page 21: INFORMS 2015

Experimental Result

Paper 1 – B. Nair et al., 2010

Paper 2 – A. Chen, 2003

Page 22: INFORMS 2015

Experimental Result

• Comparison of model accuracy by information input

Page 23: INFORMS 2015

Experimental Result

• Evaluate the model using AUC
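
For reference, AUC (area under the ROC curve) equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The labels and scores are made up, and this is not the authors' evaluation code; scikit-learn's `sklearn.metrics.roc_auc_score` is a common library alternative.

```python
from typing import List

def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical up/down labels and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
print(auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which makes AUC useful when the up and down classes are unbalanced.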

Page 24: INFORMS 2015

Experimental Result

Target Coincidence Matrix for SVM

Target 1
     Training        Testing
     0      1        0      1
0    55     113      60     95
1    27     229      34     183

Target 2
     Training        Testing
     0      1        0      1
0    160    28       156    39
1    37     180      32     164

Target 3
     Training        Testing
     0      1        0      1
0    147    46       164    32
1    30     172      31     174

Target 4
     Training        Testing
     0      1        0      1
0    150    31       165    34
1    34     172      31     179

Target 5
     Training        Testing
     0      1        0      1
0    177    29       183    37
1    130    61       125    54
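
Accuracy can be read off a coincidence (confusion) matrix as the diagonal sum divided by the total. The sketch below applies that to the testing counts on this slide, read as 2x2 matrices. The slide does not say which axis holds the actual class and which the predicted class, but accuracy is the same either way.

```python
from typing import Dict, List

def accuracy(matrix: List[List[int]]) -> float:
    """Share of cases on the diagonal, i.e. where both classifications agree."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Testing counts for the SVM, as read from the slide.
testing: Dict[str, List[List[int]]] = {
    "Target 1": [[60, 95], [34, 183]],
    "Target 2": [[156, 39], [32, 164]],
    "Target 3": [[164, 32], [31, 174]],
    "Target 4": [[165, 34], [31, 179]],
    "Target 5": [[183, 37], [125, 54]],
}

for name, matrix in testing.items():
    print(name, round(accuracy(matrix), 3))
```

Under this reading, Targets 2 to 4 come out noticeably higher than Targets 1 and 5 on the testing data.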

Page 25: INFORMS 2015

Target Matrix

Target Type   Method
1             Open (i+1) – Close (i)
2             Open (i+1) – Open (i)
3             Close (i+1) – Close (i)
4             Close (i+1) – Open (i)
5             Volume of trades as previous day

Page 26: INFORMS 2015

Evaluation

10-fold cross validation
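
In 10-fold cross validation the data are split into ten parts; each part serves once as the test set while the other nine are used for training. A minimal sketch of the index bookkeeping follows. Using contiguous folds without shuffling is an assumption made here, since the slide does not say how the folds were formed; scikit-learn's `sklearn.model_selection.KFold` offers the same split as a library routine.

```python
from typing import Iterator, List, Tuple

def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Example with 25 hypothetical trading days.
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(25, k=10), start=1):
    print(fold, len(train_idx), test_idx)
```

The reported score is then the average of the ten per-fold accuracies.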

Page 27: INFORMS 2015

Evaluation

Cross validation result

Page 28: INFORMS 2015

Evaluation

Accuracy: 82% - 89%

Page 29: INFORMS 2015

Moving Prediction

Page 30: INFORMS 2015

Conclusion

• Disparate sources of data help predict the stock market.

• Multiple targets’ prediction results can be used in conjunction to successfully track stock market movements.

• Decision tree and support vector machine models alternate as the best performer, depending on the combination of input data.

• With all the types of input data, SVMs performed best.

Page 31: INFORMS 2015

Future Work

• Identify new sources of data that have a predictive effect on the movement of the stock market, such as Twitter sentiment and textual analysis of market news, and add them to a more inclusive form of this model.

• Include linguistic modeling, clustering, and control methods such as fuzzy theory to obtain predictions of the price range.

Fuzzy Membership Function

Fuzzy System
