data driven portfolio optimization utilizing machine learning

Data-Driven Portfolio Optimization Utilizing Machine Learning

MELINDA HSIEH

RIDER UNIVERSITY

5/23/2018 SYMPOSIUM ON DATA SCIENCE AND STATISTICS 1

Topics

Data-Driven Portfolio Optimization with Drawdown Constraints

Prescribe Optimal Portfolio utilizing Machine Learning Methods

Portfolio Performance Assessment


Portfolio Performance


Drawdown – Peak to Trough decline

𝑫 𝒕 = 𝐦𝐚𝐱𝟎≤𝝉≤𝒕

𝑹 𝝉 − 𝑹(𝒕)


𝑡1 𝑡2 𝑡3 ⋯⋯ 𝑇 𝑡𝑖𝑚𝑒

Po

rtfo

lio C

um

ula

tive

R

etu

rn

R(t)

D(t)

Portfolio Optimization – with constraint in drawdown

𝐦𝐚𝐱𝝎

𝐸[𝝎𝑇 𝑅𝑵]

s.t. max 𝐷 𝝎, 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

𝑹𝑵: the annualized cumulative returns of N assets over the time period [0,T]

𝝎: investment weights of the N assets


Data-Driven Portfolio Optimization – with constraint in drawdown

𝐦𝐚𝐱𝝎

𝝎𝑇 𝑅𝑵


𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

𝑹𝑵: the estimated average annualized cumulative returns of N assets over the time period [0,T]

𝝎: investment weights of the N assets


Big Data

An explosion in the availability and accessibility in data

For example, people frequently browse the internet: shopping, streaming, and searching for key words

Online activities generate footprints - source of data

A variety of data sources may be related to stock returns

Optimal portfolio weights should take into account of these

auxiliary variables


Portfolio Optimization with Auxiliary Variables

max𝝎(𝒙)

𝝎𝑇(𝑥)𝐸[ 𝑅𝑁 |𝒙]

s.t. max 𝐷 𝝎(𝑥), 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎(𝒙)𝑇1𝑁 ≤ 𝐶2


𝒙 ∈ 𝑹𝒅 are auxiliary variables with d-dimensions

Data-Driven Portfolio Optimization with Auxiliary Variables

max𝝎(𝒙)

𝝎𝑇(𝑥) 𝑅𝑁 (𝒙)

s.t. max 𝐷 𝝎(𝑥), 𝑡 ≤ 𝐶

𝐶1 ≤ 𝝎(𝒙)𝑇1𝑁 ≤ 𝐶2


𝑅𝑁 𝑥 : the estimated conditional mean of annualized cumulative returns of N assets

Searching Optimal Portfolio Weights – Linear Programming Problem

The portfolio optimization can be represented as a

linear programming problem

Compute average accumulative returns to obtain

𝑅𝑁

The derived optimal portfolio weight is data-driven,

depending on the estimated 𝑅𝑁


Machine Learning

Apply machine learning methods to estimate the conditional means

of returns 𝑹𝑵 𝒙 and derive the optimal portfolio weight 𝝎(𝒙):

1. Use machine learning methods to classify asset returns into several groups based on the values of auxiliary variables

2. For each group, compute 𝑹𝑵(𝒙)

3. Solve the linear programing problem to find the optimal portfolio weight 𝝎(𝒙)



𝑅𝑁(1); 𝑥1

(1), 𝑥2

(1)… , 𝑥𝑑

(1)

𝑅𝑁(2)

; 𝑥1(2), 𝑥2

(2)… , 𝑥𝑑

(2)

𝑅𝑁(3)

; 𝑥1(3), 𝑥2

(3)… , 𝑥𝑑

(3)

⋮

𝑅𝑁(𝑇); 𝑥1

(𝑇), 𝑥2

(𝑇)… , 𝑥𝑑

(𝑇)

𝑅𝑁(1); 𝑥1

(1), 𝑥2

(1)… , 𝑥𝑑

(1)

𝑅𝑁(3); 𝑥1

(3), 𝑥2

(3)… , 𝑥𝑑

(3)

𝑅𝑁(4); 𝑥1

(4), 𝑥2

(4)… , 𝑥𝑑

(4)

𝑅𝑁(5); 𝑥1

(5), 𝑥2

(5)… , 𝑥𝑑

(5)

𝑅𝑁(10)

; 𝑥1(10)

, 𝑥2(10)

… , 𝑥𝑑(10)

𝑅𝑁(2); 𝑥1

(2), 𝑥2

(2)… , 𝑥𝑑

(2)

𝑅𝑁(6); 𝑥1

(6), 𝑥2

(6)… , 𝑥𝑑

(6)

𝑅𝑁(20)

; 𝑥1(20)

, 𝑥2(20)

… , 𝑥𝑑(20)

𝑅𝑁(𝑇); 𝑥1

(𝑇), 𝑥2

(𝑇)… , 𝑥𝑑

(𝑇)

Train data 𝝎 𝒙(𝟏), 𝒙(𝟑), 𝒙(𝟒)

𝝎 𝒙(𝟓), 𝒙(𝟏𝟎)

𝝎 𝒙(𝟐), 𝒙(𝟔), 𝒙(𝟐𝟎), 𝒙(𝑻)

Prescribe Investment Weights

𝑅𝑁(𝑖): cumulative returns observed at time i

KNN - K Nearest Neighborhood

𝝎𝑲𝑵𝑵(𝒙𝟎) ∈ 𝐦𝐚𝐱𝒙 𝒊 ∈ ℵ𝒌(𝒙𝟎)

𝝎𝑇 𝑹𝑁 (𝒙(𝒊))


𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

where ℵ𝐾 𝒙𝟎 = 𝑖 = 1,… , 𝑇 ∶ 𝑗=1𝑇 𝐼 𝒙𝟎 − 𝒙(𝒊) ≥ 𝒙𝟎 − 𝒙(𝒋) ≤ 𝑘 is the neighborhood of

k data points that are closest to the point 𝒙𝟎


Ctree - Conditional Inference Tree (Hothorn et al (2006)

𝝎𝑪𝒕𝒓𝒆𝒆(𝒙) ∈ 𝐦𝐚𝐱𝒊:𝑹(𝒙 𝒊 ) ∈ 𝑹(𝒙)

𝝎𝑇 𝑹𝑁 (𝒙(𝒊))


𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

where R(x) is the splitting rule implied by a regression tree trained based on the training data

Ctree applies the inference test to determine if a possible split is significant


Random Forest

𝝎𝑹𝑭𝒌 (𝒙) ∈ 𝐦𝐚𝐱

𝒊:𝑹𝒌(𝒙 𝒊 ) ∈ 𝑹𝒌(𝒙)𝝎𝑇 𝑹𝑁 (𝒙(𝒊))


𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2

𝝎𝑹𝑭 𝒙 =𝟏

𝒎

𝒌=𝟏

𝒎

𝝎𝑹𝑭𝒌 (𝒙)


Simulation

Let 𝑋1, 𝑋2, 𝑋3 be the market factors that are related to the returns of the underlying assets in the portfolio.

Assume 𝑋 𝑡 = {𝑋1(𝑡), 𝑋2(𝑡), 𝑋3(𝑡)} follows a multivariate ARMA(2,2) process

Generate 12 time series of returns based on the three market factors

𝑅𝑁 𝑡 = 𝐴𝑇 𝑋 𝑡 +𝛿

4+ (𝐵𝑇(𝑋(𝑡))𝜂

where 𝛿 and 𝜂 are independent Gaussian noises


Portfolio Performance – Validation

For each realization of the simulated returns

split the series into in-sample (training) and out-of-sample (validation) data.

Trained the in-sample data based on the machine learning algorithm and prescribe optimal portfolio weights

Apply the prescribed weights to the out-of-sample (validation) returns and compute the portfolio returns


Performance Metrics

1. Average Cumulative Return

𝟏

𝒏

𝒕=𝟏

𝒏

𝑹𝒕𝒗

2. Maximum Drawdown

𝐦𝐚𝐱𝟎≤𝒕≤𝒏

𝐦𝐚𝐱𝟎≤𝝉≤𝒕

𝑹 𝝉 − 𝑹 𝒕

3. Reward Risk 𝟏𝒏 𝒕=𝟏𝒏 𝑹𝒕

𝒗

𝐦𝐚𝐱𝟎≤𝒕≤𝒏

𝐦𝐚𝐱𝟎≤𝝉≤𝒕

𝑹 𝝉 − 𝑹 𝒕



0.0

0.2

0.4

10 100 1000 2500 5000 10000Training Size

Av

era

ge

Cu

mu

lati

ve

Re

turn

(%)

method

CtreeKNNRFSAA

Average Cumulative Returns


0.5

1.0

1.5

2.0

10 100 1000 2500 5000 10000Training Size

Ma

x D

raw

do

wn

method

CtreeKNNRFSAA

Maximum Drawdown


0.0

0.5

1.0

1.5

10 100 1000 2500 5000 10000Training Size

Re

wa

rd R

isk method

CtreeKNNRFSAA

Reward Risk

Summary

Machine Learning method improves the out-of sample performance of the data-driven optimal portfolio

The performance improves as the size of training data increases

Tree-based approaches such as random forest and Ctreeoutperform the SAA method which does not incorporate the inputs from auxiliary variables.


Future Work

how the tuning parameters affect optimal portfolio weights and portfolio performance?

how the correlations of the auxiliary variables affect optimal portfolio weights and portfolio performance?



Appendix

Searching Optimal Portfolio Weights – Linear Programming Problem

𝐦𝐚𝐱𝝎

𝝎𝑇 𝑹𝑻×𝑵

s.t. 𝑢𝑘− 𝝎𝑇 𝑅𝑘×𝑁 ≤ 𝐶 , 1 ≤ 𝑘 ≤ 𝑁

𝑢𝑘 ≥ 𝝎𝑇 𝑅𝑘×𝑁 , 1 ≤ 𝑘 ≤ 𝑁

𝑢𝑘 ≥ 𝑢𝑘−1, 1 ≤ 𝑘 ≤ 𝑁

𝑢0= 0

𝐶1 ≤ 𝝎𝑇1𝑁 ≤ 𝐶2


where 𝑢𝑘 are auxiliary variables, 1 ≤ 𝑘 ≤ 𝑁

data driven portfolio optimization utilizing machine learning

Documents