data driven portfolio optimization utilizing machine learning

25
Data-Driven Portfolio Optimization Utilizing Machine Learning MELINDA HSIEH RIDER UNIVERSITY 5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 1

Upload: others

Post on 23-Feb-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Data-Driven Portfolio Optimization Utilizing Machine Learning

MELINDA HSIEH

RIDER UNIVERSITY

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 1

Topics

Data-Driven Portfolio Optimization with Drawdown Constraints

Prescribe Optimal Portfolio utilizing Machine Learning Methods

Portfolio Performance Assessment

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 2

Portfolio Performance

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 3

Drawdown โ€“ Peak to Trough decline

๐‘ซ ๐’• = ๐ฆ๐š๐ฑ๐ŸŽโ‰ค๐‰โ‰ค๐’•

๐‘น ๐‰ โˆ’ ๐‘น(๐’•)

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 4

๐‘ก1 ๐‘ก2 ๐‘ก3 โ‹ฏโ‹ฏ ๐‘‡ ๐‘ก๐‘–๐‘š๐‘’

Po

rtfo

lio C

um

ula

tive

R

etu

rn

R(t)

D(t)

Portfolio Optimization โ€“ with constraint in drawdown

๐ฆ๐š๐ฑ๐Ž

๐ธ[๐Ž๐‘‡ ๐‘…๐‘ต]

s.t. max ๐ท ๐Ž, ๐‘ก โ‰ค ๐ถ

๐ถ1 โ‰ค ๐Ž๐‘‡1๐‘ โ‰ค ๐ถ2

๐‘น๐‘ต: the annualized cumulative returns of N assets over the time period [0,T]

๐Ž: investment weights of the N assets

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 5

Data-Driven Portfolio Optimization โ€“ with constraint in drawdown

๐ฆ๐š๐ฑ๐Ž

๐Ž๐‘‡ ๐‘…๐‘ต

s.t. max ๐ท ๐Ž, ๐‘ก โ‰ค ๐ถ

๐ถ1 โ‰ค ๐Ž๐‘‡1๐‘ โ‰ค ๐ถ2

๐‘น๐‘ต: the estimated average annualized cumulative returns of N assets over the time period [0,T]

๐Ž: investment weights of the N assets

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 6

Big Data

An explosion in the availability and accessibility in data

For example, people frequently browse the internet: shopping, streaming, and searching for key words

Online activities generate footprints - source of data

A variety of data sources may be related to stock returns

Optimal portfolio weights should take into account of these

auxiliary variables

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 7

Portfolio Optimization with Auxiliary Variables

max๐Ž(๐’™)

๐Ž๐‘‡(๐‘ฅ)๐ธ[ ๐‘…๐‘ |๐’™]

s.t. max ๐ท ๐Ž(๐‘ฅ), ๐‘ก โ‰ค ๐ถ

๐ถ1 โ‰ค ๐Ž(๐’™)๐‘‡1๐‘ โ‰ค ๐ถ2

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 8

๐’™ โˆˆ ๐‘น๐’… are auxiliary variables with d-dimensions

Data-Driven Portfolio Optimization with Auxiliary Variables

max๐Ž(๐’™)

๐Ž๐‘‡(๐‘ฅ) ๐‘…๐‘ (๐’™)

s.t. max ๐ท ๐Ž(๐‘ฅ), ๐‘ก โ‰ค ๐ถ

๐ถ1 โ‰ค ๐Ž(๐’™)๐‘‡1๐‘ โ‰ค ๐ถ2

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 9

๐‘…๐‘ ๐‘ฅ : the estimated conditional mean of annualized cumulative returns of N assets

Searching Optimal Portfolio Weights โ€“ Linear Programming Problem

The portfolio optimization can be represented as a

linear programming problem

Compute average accumulative returns to obtain

๐‘…๐‘

The derived optimal portfolio weight is data-driven,

depending on the estimated ๐‘…๐‘

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 10

Machine Learning

Apply machine learning methods to estimate the conditional means

of returns ๐‘น๐‘ต ๐’™ and derive the optimal portfolio weight ๐Ž(๐’™):

1. Use machine learning methods to classify asset returns into several groups based on the values of auxiliary variables

2. For each group, compute ๐‘น๐‘ต(๐’™)

3. Solve the linear programing problem to find the optimal portfolio weight ๐Ž(๐’™)

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 11

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 12

๐‘…๐‘(1); ๐‘ฅ1

(1), ๐‘ฅ2

(1)โ€ฆ , ๐‘ฅ๐‘‘

(1)

๐‘…๐‘(2)

; ๐‘ฅ1(2), ๐‘ฅ2

(2)โ€ฆ , ๐‘ฅ๐‘‘

(2)

๐‘…๐‘(3)

; ๐‘ฅ1(3), ๐‘ฅ2

(3)โ€ฆ , ๐‘ฅ๐‘‘

(3)

โ‹ฎ

๐‘…๐‘(๐‘‡); ๐‘ฅ1

(๐‘‡), ๐‘ฅ2

(๐‘‡)โ€ฆ , ๐‘ฅ๐‘‘

(๐‘‡)

๐‘…๐‘(1); ๐‘ฅ1

(1), ๐‘ฅ2

(1)โ€ฆ , ๐‘ฅ๐‘‘

(1)

๐‘…๐‘(3); ๐‘ฅ1

(3), ๐‘ฅ2

(3)โ€ฆ , ๐‘ฅ๐‘‘

(3)

๐‘…๐‘(4); ๐‘ฅ1

(4), ๐‘ฅ2

(4)โ€ฆ , ๐‘ฅ๐‘‘

(4)

๐‘…๐‘(5); ๐‘ฅ1

(5), ๐‘ฅ2

(5)โ€ฆ , ๐‘ฅ๐‘‘

(5)

๐‘…๐‘(10)

; ๐‘ฅ1(10)

, ๐‘ฅ2(10)

โ€ฆ , ๐‘ฅ๐‘‘(10)

๐‘…๐‘(2); ๐‘ฅ1

(2), ๐‘ฅ2

(2)โ€ฆ , ๐‘ฅ๐‘‘

(2)

๐‘…๐‘(6); ๐‘ฅ1

(6), ๐‘ฅ2

(6)โ€ฆ , ๐‘ฅ๐‘‘

(6)

๐‘…๐‘(20)

; ๐‘ฅ1(20)

, ๐‘ฅ2(20)

โ€ฆ , ๐‘ฅ๐‘‘(20)

๐‘…๐‘(๐‘‡); ๐‘ฅ1

(๐‘‡), ๐‘ฅ2

(๐‘‡)โ€ฆ , ๐‘ฅ๐‘‘

(๐‘‡)

Train data ๐Ž ๐’™(๐Ÿ), ๐’™(๐Ÿ‘), ๐’™(๐Ÿ’)

๐Ž ๐’™(๐Ÿ“), ๐’™(๐Ÿ๐ŸŽ)

๐Ž ๐’™(๐Ÿ), ๐’™(๐Ÿ”), ๐’™(๐Ÿ๐ŸŽ), ๐’™(๐‘ป)

Prescribe Investment Weights

๐‘…๐‘(๐‘–): cumulative returns observed at time i

KNN - K Nearest Neighborhood

๐Ž๐‘ฒ๐‘ต๐‘ต(๐’™๐ŸŽ) โˆˆ ๐ฆ๐š๐ฑ๐’™ ๐’Š โˆˆ โ„ต๐’Œ(๐’™๐ŸŽ)

๐Ž๐‘‡ ๐‘น๐‘ (๐’™(๐’Š))

s.t. max ๐ท ๐Ž, ๐‘ก โ‰ค ๐ถ

๐ถ1 โ‰ค ๐Ž๐‘‡1๐‘ โ‰ค ๐ถ2

where โ„ต๐พ ๐’™๐ŸŽ = ๐‘– = 1,โ€ฆ , ๐‘‡ โˆถ ๐‘—=1๐‘‡ ๐ผ ๐’™๐ŸŽ โˆ’ ๐’™(๐’Š) โ‰ฅ ๐’™๐ŸŽ โˆ’ ๐’™(๐’‹) โ‰ค ๐‘˜ is the neighborhood of

k data points that are closest to the point ๐’™๐ŸŽ

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 13

Ctree - Conditional Inference Tree (Hothorn et al (2006)

๐Ž๐‘ช๐’•๐’“๐’†๐’†(๐’™) โˆˆ ๐ฆ๐š๐ฑ๐’Š:๐‘น(๐’™ ๐’Š ) โˆˆ ๐‘น(๐’™)

๐Ž๐‘‡ ๐‘น๐‘ (๐’™(๐’Š))

s.t. max ๐ท ๐Ž, ๐‘ก โ‰ค ๐ถ

๐ถ1 โ‰ค ๐Ž๐‘‡1๐‘ โ‰ค ๐ถ2

where R(x) is the splitting rule implied by a regression tree trained based on the training data

Ctree applies the inference test to determine if a possible split is significant

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 14

Random Forest

๐Ž๐‘น๐‘ญ๐’Œ (๐’™) โˆˆ ๐ฆ๐š๐ฑ

๐’Š:๐‘น๐’Œ(๐’™ ๐’Š ) โˆˆ ๐‘น๐’Œ(๐’™)๐Ž๐‘‡ ๐‘น๐‘ (๐’™(๐’Š))

s.t. max ๐ท ๐Ž, ๐‘ก โ‰ค ๐ถ

๐ถ1 โ‰ค ๐Ž๐‘‡1๐‘ โ‰ค ๐ถ2

๐Ž๐‘น๐‘ญ ๐’™ =๐Ÿ

๐’Ž

๐’Œ=๐Ÿ

๐’Ž

๐Ž๐‘น๐‘ญ๐’Œ (๐’™)

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 15

Simulation

Let ๐‘‹1, ๐‘‹2, ๐‘‹3 be the market factors that are related to the returns of the underlying assets in the portfolio.

Assume ๐‘‹ ๐‘ก = {๐‘‹1(๐‘ก), ๐‘‹2(๐‘ก), ๐‘‹3(๐‘ก)} follows a multivariate ARMA(2,2) process

Generate 12 time series of returns based on the three market factors

๐‘…๐‘ ๐‘ก = ๐ด๐‘‡ ๐‘‹ ๐‘ก +๐›ฟ

4+ (๐ต๐‘‡(๐‘‹(๐‘ก))๐œ‚

where ๐›ฟ and ๐œ‚ are independent Gaussian noises

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 16

Portfolio Performance โ€“ Validation

For each realization of the simulated returns

split the series into in-sample (training) and out-of-sample (validation) data.

Trained the in-sample data based on the machine learning algorithm and prescribe optimal portfolio weights

Apply the prescribed weights to the out-of-sample (validation) returns and compute the portfolio returns

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 17

Performance Metrics

1. Average Cumulative Return

๐Ÿ

๐’

๐’•=๐Ÿ

๐’

๐‘น๐’•๐’—

2. Maximum Drawdown

๐ฆ๐š๐ฑ๐ŸŽโ‰ค๐’•โ‰ค๐’

๐ฆ๐š๐ฑ๐ŸŽโ‰ค๐‰โ‰ค๐’•

๐‘น ๐‰ โˆ’ ๐‘น ๐’•

3. Reward Risk ๐Ÿ๐’ ๐’•=๐Ÿ๐’ ๐‘น๐’•

๐’—

๐ฆ๐š๐ฑ๐ŸŽโ‰ค๐’•โ‰ค๐’

๐ฆ๐š๐ฑ๐ŸŽโ‰ค๐‰โ‰ค๐’•

๐‘น ๐‰ โˆ’ ๐‘น ๐’•

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 18

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 19

0.0

0.2

0.4

10 100 1000 2500 5000 10000Training Size

Av

era

ge

Cu

mu

lati

ve

Re

turn

(%)

method

CtreeKNNRFSAA

Average Cumulative Returns

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 20

0.5

1.0

1.5

2.0

10 100 1000 2500 5000 10000Training Size

Ma

x D

raw

do

wn

method

CtreeKNNRFSAA

Maximum Drawdown

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 21

0.0

0.5

1.0

1.5

10 100 1000 2500 5000 10000Training Size

Re

wa

rd R

isk method

CtreeKNNRFSAA

Reward Risk

Summary

Machine Learning method improves the out-of sample performance of the data-driven optimal portfolio

The performance improves as the size of training data increases

Tree-based approaches such as random forest and Ctreeoutperform the SAA method which does not incorporate the inputs from auxiliary variables.

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 22

Future Work

how the tuning parameters affect optimal portfolio weights and portfolio performance?

how the correlations of the auxiliary variables affect optimal portfolio weights and portfolio performance?

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 23

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 24

Appendix

Searching Optimal Portfolio Weights โ€“ Linear Programming Problem

๐ฆ๐š๐ฑ๐Ž

๐Ž๐‘‡ ๐‘น๐‘ปร—๐‘ต

s.t. ๐‘ข๐‘˜โˆ’ ๐Ž๐‘‡ ๐‘…๐‘˜ร—๐‘ โ‰ค ๐ถ , 1 โ‰ค ๐‘˜ โ‰ค ๐‘

๐‘ข๐‘˜ โ‰ฅ ๐Ž๐‘‡ ๐‘…๐‘˜ร—๐‘ , 1 โ‰ค ๐‘˜ โ‰ค ๐‘

๐‘ข๐‘˜ โ‰ฅ ๐‘ข๐‘˜โˆ’1, 1 โ‰ค ๐‘˜ โ‰ค ๐‘

๐‘ข0= 0

๐ถ1 โ‰ค ๐Ž๐‘‡1๐‘ โ‰ค ๐ถ2

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 25

where ๐‘ข๐‘˜ are auxiliary variables, 1 โ‰ค ๐‘˜ โ‰ค ๐‘