
A PORTFOLIO STRATEGY NOT TOO BAD… Group 3: Yue Duan, Jie Song, Le You, Kunhui Zhang

STAT682 Final Project Rice University 2014 Fall


Introduction

In today’s society, data analysis plays an important role in business decisions. From highly automated high-frequency trading systems to annual financial report analysis, the ability to interpret financial statistics and use them as a decision-making tool has become crucial.

In this report, we offer an approach to stock selection and portfolio construction that incorporates both data mining techniques and classic portfolio theory. Instead of mining historical returns and volatility, or applying strategies like statistical arbitrage, we decided to look into another aspect of businesses that intrigues investors: accounting ratios. In What Works on Wall Street, O'Shaughnessy carefully demonstrates the interpretation of each accounting ratio and its influence on stock movements. In our project, we study how a collection of ratios drives stock movements, and rank the importance of each ratio. After building our stock selection scheme, we test the performance of our selection over different time frames. We use classic portfolio theory and weight our portfolio at the maximum Sharpe ratio.

The whole procedure involves both machine learning and human screening. We do not try to build an automatic buy-sell-rebalance system. We also do not treat sequentially recorded historical annual ratios as time series data; instead, we treat them as fixed points in the “history plane.” Moreover, we use the next year’s excess return over the market return as our response variable. The reasoning is that the primary objective of an investment is to beat the market, and the market needs some time to respond after financial statements are released.

Due to time, energy, and intelligence constraints, our model has many imperfections and limitations, and huge room for improvement. We reflect on our work throughout the report.


PROCEDURE

The diagram below summarizes our procedure, which is explained in detail in the following sections.

Database Construction → Model Building → Stock Selection → Portfolio Optimization → Performance Evaluation


Dataset Construction

COVERAGE & RATIOS

We retrieved our data from the CRSP-Compustat database. The variables we selected and their coverage are summarized in Appendix 1. The ratios we selected are diversified across different aspects of a company’s operations. Note that we analyze only financial ratios, because raw monetary values are affected by the time value of money and inflation. After careful selection and examination of coverage, we picked the following ratios:

Price to Earnings (P/E), Price to Book (P/B), Price to Sales (P/S), Dividend Yield, Earnings Per Share (EPS), Return on Assets (ROA), Return on Equity (ROE), Current Ratio, Payout Ratio, Tobin’s Q Ratio, Debt/Equity Ratio, and Market Share.

A summary of how those ratios are derived is presented in Appendix 2.

These 12 ratios constitute our predictors. We compute the next year’s excess return over the market return as our response variable. To keep our model consistent, we omitted all observations with NA values.
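As a rough illustration of the response variable, the sketch below computes a next-year excess return on a toy firm-year panel. The data frames `panel` and `market`, their column names, and all numbers are hypothetical; this is not the code used for the report (see Appendix 3 for that), only a minimal base R sketch of the idea.

```r
# Toy firm-year panel; all values are made up for illustration.
panel <- data.frame(
  tic   = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  fyear = c(2010, 2011, 2012, 2010, 2011),
  price = c(10, 12, 9, 20, 25),
  stringsAsFactors = FALSE
)
# Hypothetical annual market returns, keyed by the year in which they are earned.
market <- data.frame(year = c(2011, 2012), mkt_ret = c(0.05, -0.02))

# Sort so that each firm's years are consecutive.
panel <- panel[order(panel$tic, panel$fyear), ]
n <- nrow(panel)

# Next-year price within the same firm; NA when the next row belongs to
# another firm or is not the following fiscal year.
next_price <- c(panel$price[-1], NA)
same_firm  <- c(panel$tic[-1] == panel$tic[-n], FALSE)
next_year  <- c(panel$fyear[-1] == panel$fyear[-n] + 1, FALSE)
next_price[!(same_firm & next_year)] <- NA

panel$next_ret <- next_price / panel$price - 1
panel$mkt_next <- market$mkt_ret[match(panel$fyear + 1, market$year)]
panel$excess   <- panel$next_ret - panel$mkt_next   # response variable
print(panel)
```

Rows without a valid next-year price get an NA response and would be dropped before model fitting.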

NOTES ON NEGATIVE VALUES

For value creation purposes, investors generally do not look at stocks with negative EPS or P/E. It is true that stocks with negative EPS or P/E can change abruptly and gain massive returns, but investors generally do not want to take that risk. Such data points could also be outliers in our data set. Thus, we omit all observations with negative EPS or P/E.

REFLECTION

For model building purposes, more data does not necessarily mean improved accuracy. However, the time interval at which data are extracted can make a difference. For example, daily data tend to capture market reactions more precisely than annual data. Unfortunately, accounting ratios are only released annually or quarterly. Thus, in this analysis, it is very hard to capture the immediate response of the stock market to a company’s financial status.
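The screening rule can be sketched in a few lines of base R. The data frame `ratios` and its values are hypothetical; the sketch also treats infinite ratios (for example, from division by zero) as missing, which is an assumption made here for illustration.

```r
# Hypothetical ratio table; values are made up.
ratios <- data.frame(
  tic = c("AAA", "BBB", "CCC", "DDD"),
  eps = c(1.2, -0.4, 0.8, NA),
  pe  = c(15, -30, Inf, 12),
  stringsAsFactors = FALSE
)

# Treat infinite ratios as missing, then drop incomplete rows.
for (col in c("eps", "pe")) {
  ratios[[col]][is.infinite(ratios[[col]])] <- NA
}
clean <- na.omit(ratios)

# Keep only observations with positive EPS and positive P/E.
clean <- clean[clean$eps > 0 & clean$pe > 0, ]
print(clean)
```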


Model Building

For regression on massive data, a good black-box learning method is Random Forest. The general Random Forest algorithm for regression is as follows.

First, the algorithm draws n bootstrap samples from the raw data. Second, it grows a regression tree on each bootstrap sample, with one modification: at each node, rather than searching over all predictors, it randomly samples mtry of the predictors and chooses the best split among them. Third, the random forest aggregates the predictions of the n trees to predict new data.
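The three steps can be sketched in base R with a deliberately simplified forest in which every tree has depth one (a "stump"). This is an illustrative assumption, not the `randomForest` package used in the report, which grows deep trees; the function names `fit_stump`, `fit_forest`, and `predict_forest` and the toy data are hypothetical.

```r
set.seed(1)

# Depth-one regression tree: among `mtry` randomly chosen predictors,
# pick the split that minimises the residual sum of squares.
fit_stump <- function(x, y, mtry) {
  best <- list(sse = Inf)
  for (j in sample(ncol(x), mtry)) {
    vals <- sort(unique(x[, j]))
    if (length(vals) < 2) next
    thresholds <- (vals[-1] + vals[-length(vals)]) / 2
    for (thr in thresholds) {
      left <- x[, j] <= thr
      sse <- sum((y[left] - mean(y[left]))^2) +
             sum((y[!left] - mean(y[!left]))^2)
      if (sse < best$sse) {
        best <- list(sse = sse, var = j, thr = thr,
                     left = mean(y[left]), right = mean(y[!left]))
      }
    }
  }
  best
}

predict_stump <- function(stump, x) {
  ifelse(x[, stump$var] <= stump$thr, stump$left, stump$right)
}

# Step 1 and 2: one bootstrap sample and one stump per tree.
fit_forest <- function(x, y, ntree = 100, mtry = 1) {
  lapply(seq_len(ntree), function(b) {
    idx <- sample(nrow(x), replace = TRUE)
    fit_stump(x[idx, , drop = FALSE], y[idx], mtry)
  })
}

# Step 3: aggregate by averaging the trees' predictions.
predict_forest <- function(forest, x) {
  preds <- do.call(cbind, lapply(forest, predict_stump, x = x))
  rowMeans(preds)
}

# Toy data: the response depends only on predictor "a".
n <- 200
x <- cbind(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y <- ifelse(x[, "a"] > 0, 1, -1) + rnorm(n, sd = 0.1)
forest <- fit_forest(x, y, ntree = 100, mtry = 2)
pred <- predict_forest(forest, x)
print(cor(pred, y))
```

Because `mtry` is smaller than the number of predictors, some stumps never see the informative predictor; averaging over many bootstrap trees is what stabilizes the prediction.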

The method also reports the importance of each predictor. Based on our model, the top-ranked variables are listed below in order of importance:

• Market Share %: defined as a company’s market capitalization divided by total market capitalization. Surprisingly, we find a positive correlation, which contradicts O’Shaughnessy’s result that small stocks slightly outperform.

• Price to Earnings (P/E): P/E relates a company's stock price to its earnings. Low P/E stocks are generally considered underpriced relative to their earnings.

• Price to Book Value (P/B): P/B indicates how much shareholders pay for a company’s net assets. According to O’Shaughnessy, the market rewards low P/B stocks.

• Earnings Per Share (EPS): EPS is the ratio of net income to average shares outstanding over the reporting period. EPS is considered one of the most important indicators of a company’s profitability, but it should be considered together with other variables.

• Return On Assets (ROA): ROA indicates a company’s ability to generate earnings from its assets. A higher ROA generally indicates more efficient management, though the ratio varies by industry.

• Tobin’s Q Ratio: The Q ratio of a stock is calculated by dividing the market value of a company by the replacement value of its assets. A Q ratio lower than 1 implies the stock is undervalued, and a ratio greater than 1 implies it is overvalued.

• Price-To-Sales Ratio (P/S): P/S is computed by dividing the company’s market capitalization by its annual sales. O’Shaughnessy considers it the most consistent guide to future performance.


REFLECTION

Many improvements can be made here. In general, putting all our faith in a single base learner is not reliable. Tuning and cross-validating ensemble learners can be extremely time-consuming and CPU-intensive. Although we did not include them in this report, we recognize the importance of avoiding overfitting. One advantage of random forest is that the bootstrap resampling in the algorithm provides a certain degree of generalization.

RECONSTRUCTING & TESTING OUR MODEL

Since our model was built on annual data without accounting for time series structure, it tends to work better over shorter holding periods. To test our model in a real-world scenario, we first tested our strategy on 2012 with a one-year holding period; we then rebuilt our model using data before 2000 and applied our strategy to 2000. For each scenario, we selected the top 50 stocks with the highest predicted annual excess return from the model. We then applied portfolio theory to assign weights to the stocks.
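The selection step amounts to ranking candidates by predicted excess return and keeping the first 50. A minimal sketch, assuming a hypothetical data frame `candidates` with made-up predictions and a hypothetical helper `top_k`:

```r
set.seed(42)

# Hypothetical model output: one predicted excess return per candidate stock.
candidates <- data.frame(
  tic = paste0("S", 1:200),
  predicted_excess = rnorm(200),
  stringsAsFactors = FALSE
)

# Rank by predicted excess return (highest first) and keep the top k.
top_k <- function(df, k = 50) {
  ranked <- df[order(df$predicted_excess, decreasing = TRUE), ]
  head(ranked, k)
}

picks <- top_k(candidates, 50)
print(head(picks))
```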


Portfolio Optimization

We simplify our portfolio into a two-asset portfolio consisting of a risk-free asset and a risky asset. The risky asset is itself a risky portfolio consisting of the securities selected by our model. After picking the stocks, the next steps are to (1) determine the weight of each stock in our risky asset, and (2) determine the weights of the risk-free and risky assets.

The strategy is summarized below:

P = α1P1 + α2B

where B is the risk-free asset, P1 is the risky asset (our risky portfolio), and α1 and α2 are the weights of the risky and risk-free assets.

Also we have

P1 = α11S1 + α12S2 + … + α1nSn

where S1… Sn are stocks in the risky portfolio, and α11 … α1n are individual weights.

Thus our portfolio is

P = α1(α11S1 + α12S2 + … + α1nSn) + α2B

WEIGHTS OPTIMIZATION OF RISKY PORTFOLIO

By Markowitz’s Modern Portfolio Theory (MPT), risk is an inherent part of higher reward, and for rational risk-averse investors a portfolio is efficient if it maximizes expected return for a given level of risk. Through diversification, we can achieve the highest risk-adjusted return, since the risk of a portfolio is lower than the weighted average of the risks of its individual assets whenever the assets are not perfectly correlated.

We denote the return of the $i$-th stock $S_i$ by $x_i$, its weight by $\alpha_i$, and the variance of its return by $\sigma_i^2$. For a portfolio with returns $X = (x_1, x_2, \dots, x_n)$ and weights $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$, the expected return vector is $\mu = E[X]$, and the covariance matrix of returns is $\Sigma = E[(X - \mu)(X - \mu)^T]$. The expected return and variance of the portfolio are then

$$\mu_p = \alpha^T \mu, \qquad \sigma_p^2 = \alpha^T \Sigma \alpha = \sum_{i} \alpha_i^2 \sigma_i^2 + \sum_{i} \sum_{j \neq i} \alpha_i \alpha_j \sigma_{ij}.$$

Thus we can construct the efficient frontier by minimizing $\sigma_p^2$ for a given $\mu_p$.
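The portfolio mean and variance can be checked numerically. The sketch below uses three hypothetical assets; the vectors `mu` and `alpha` and the matrix `Sigma` are made-up numbers chosen only to illustrate the two forms of the variance and the diversification effect.

```r
# Hypothetical daily expected returns, covariance matrix, and weights.
mu    <- c(0.0008, 0.0005, 0.0011)
Sigma <- matrix(c(4.0, 1.0, 0.5,
                  1.0, 2.5, 0.3,
                  0.5, 0.3, 6.0) * 1e-4, nrow = 3, byrow = TRUE)
alpha <- c(0.5, 0.3, 0.2)

# Expected return and variance in matrix form.
mu_p  <- sum(alpha * mu)
var_p <- as.numeric(t(alpha) %*% Sigma %*% alpha)

# The same variance split into own-variance terms and covariance terms.
own   <- sum(alpha^2 * diag(Sigma))
cross <- sum(outer(alpha, alpha) * Sigma) - own

# Portfolio risk versus the weighted average of individual risks.
sd_p        <- sqrt(var_p)
sd_weighted <- sum(alpha * sqrt(diag(Sigma)))
print(c(mu_p = mu_p, sd_p = sd_p, sd_weighted = sd_weighted))
```

With these numbers the portfolio standard deviation is below the weighted average of the individual standard deviations, which is the diversification effect described in the text.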



Using this rationale, we optimized our risky asset portfolio. The constraints are no short selling and a maximum allocation of 50% to any single stock. Since we also include a risk-free asset, we took the tangency portfolio, which has the largest Sharpe ratio.

As an example, we ran the model using data before 2012 to pick stocks for 2013, and obtained the optimal weights for those stocks. The portfolio we picked is summarized below (stocks with weight < 1% were removed).
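To make the constrained search concrete, the sketch below finds the maximum-Sharpe portfolio for three hypothetical assets by brute-force grid search over long-only weights capped at 50%. This is an illustrative substitute for the quadratic-programming approach in Appendix 3, not the method used in the report; all inputs, including the daily risk-free rate `rf`, are made-up assumptions, and the Sharpe ratio here subtracts `rf` from the expected return.

```r
# Hypothetical daily expected returns, covariance matrix, and risk-free rate.
mu    <- c(0.0008, 0.0005, 0.0011)
Sigma <- matrix(c(4.0, 1.0, 0.5,
                  1.0, 2.5, 0.3,
                  0.5, 0.3, 6.0) * 1e-4, nrow = 3, byrow = TRUE)
rf <- 0.0002

# Enumerate weights on a 1% grid: no short selling, at most 50% per stock.
grid <- expand.grid(w1 = seq(0, 0.5, by = 0.01), w2 = seq(0, 0.5, by = 0.01))
grid$w3 <- 1 - grid$w1 - grid$w2
grid <- grid[grid$w3 >= -1e-12 & grid$w3 <= 0.5 + 1e-12, ]
grid$w3 <- pmin(pmax(grid$w3, 0), 0.5)   # remove floating-point noise

W      <- as.matrix(grid)
ret    <- as.numeric(W %*% mu)
risk   <- sqrt(rowSums((W %*% Sigma) * W))
sharpe <- (ret - rf) / risk

best <- W[which.max(sharpe), ]
print(best)
print(max(sharpe))
```

A grid search scales poorly with the number of stocks, which is why a quadratic-programming solver is the practical choice for portfolios of realistic size.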

Table 1. Stocks and Weights in Portfolio Picked by Model

Ticker      SPCB   HGSH   NTWK    TCX     NXST    TPI    EVC    FRGI    LOAN
Weight (%)  1.110  6.142  16.673  13.761  33.349  0.629  1.293  19.927  7.116

INCLUDING RISK-FREE ASSET

We choose the one-month Treasury bill return (data from the WRDS Fama-French Factor Library) as the risk-free asset in our portfolio. Since it is “risk-free”, this asset has zero variance in returns and is hence uncorrelated with any other asset. So when it is combined with another asset or portfolio, the change in return is linearly related to the change in risk as the combination varies. By Modern Portfolio Theory, the tangency portfolio is the point where the half-line from the risk-free rate is tangent to the efficient frontier of the risky portfolio, which is the point with the highest Sharpe ratio. The points on the line connecting the intercept and the tangency point represent combinations with different weights of the risk-free and risky assets.
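The linear relation can be written out explicitly. In the notation of the two-asset portfolio $P = \alpha_1 P_1 + \alpha_2 B$ with $\alpha_1 + \alpha_2 = 1$, and writing $r_f$ for the risk-free return and $\mu_{P_1}$, $\sigma_{P_1}$ for the expected return and standard deviation of the risky portfolio (symbols introduced here for illustration), a short derivation under the zero-variance assumption is:

```latex
\begin{aligned}
E[r_P]   &= \alpha_1 \mu_{P_1} + (1 - \alpha_1)\, r_f
          = r_f + \alpha_1 (\mu_{P_1} - r_f), \\
\sigma_P &= \alpha_1 \sigma_{P_1}, \\
E[r_P]   &= r_f + \frac{\mu_{P_1} - r_f}{\sigma_{P_1}}\, \sigma_P .
\end{aligned}
```

The slope $(\mu_{P_1} - r_f)/\sigma_{P_1}$ is the Sharpe ratio of the risky portfolio, so choosing the tangency portfolio gives the steepest such line.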

[Figure: Efficient Frontier and Optimal Portfolio. x-axis: Risk (standard deviation of portfolio, daily); y-axis: Return (Daily). Optimal point: Risk: 2.465%, Return: 0.475%, Sharpe: 19.27%.]



We conducted a simple test to compare the optimally weighted and equally weighted risky portfolios. We assumed an initial capital of $10,000 at the beginning of 2013 and a one-year holding period.

Table 2. Comparison of optimal weight and equal weight risky portfolio - 2012

Weight     Initial Balance ($)   Final Balance ($)   CAGR (%)
Weighted   10,000                33,196              231.96
Equal      10,000                26,740              167.40

Based on the test results, our portfolio performed impressively in 2013 in two respects. First, the stocks we had faith in performed as well as we expected. Second, the weights we set increased our profit compared with an equally weighted portfolio of the same stocks. Our model seems to be valid.
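The CAGR figures follow directly from the initial and final balances. A minimal sketch, with `cagr` as a hypothetical helper name; for a one-year holding period the CAGR is simply the annual return.

```r
# Compound annual growth rate from initial and final balance.
cagr <- function(initial, final, years) {
  (final / initial)^(1 / years) - 1
}

weighted <- cagr(10000, 33196, years = 1)   # optimally weighted portfolio
equal    <- cagr(10000, 26740, years = 1)   # equally weighted portfolio
print(round(100 * c(weighted = weighted, equal = equal), 2))
```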


Model Testing

After obtaining our portfolio, we needed to test the validity and performance of the model in the long run. So we adopted a multi-year simulated-investment test with real market data, which allowed us to evaluate the return of our portfolio in later years and compare it with the total market return.

We assumed a fixed amount of initial capital in the year after the portfolio-building period, allocated to our selected stocks by preset weights. Our trading strategy was very simple: we bought stocks at the beginning of each year and sold them all at year-end. We did not try to find the perfect time to buy or sell, and we did not adjust the weights during the holding period.
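One way to read this strategy is sketched below: at the start of each year the capital is split by the preset weights, and at year-end everything is sold. The function `simulate_buy_hold`, the weights, and the return matrix are hypothetical; this is a sketch of the described rule, not the simulation actually run for the report.

```r
# annual_returns: matrix with one row per year and one column per stock.
simulate_buy_hold <- function(capital, weights, annual_returns) {
  balance <- numeric(nrow(annual_returns))
  for (t in seq_len(nrow(annual_returns))) {
    positions <- capital * weights                          # buy at start of year
    capital   <- sum(positions * (1 + annual_returns[t, ])) # sell all at year-end
    balance[t] <- capital
  }
  balance
}

# Made-up weights and returns for three stocks over two years.
weights <- c(0.5, 0.3, 0.2)
rets <- matrix(c(0.10, -0.05,  0.20,
                 0.00,  0.10, -0.10), nrow = 2, byrow = TRUE)
balance <- simulate_buy_hold(10000, weights, rets)
print(balance)
```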

We generated a portfolio using data before 2000; the optimal portfolio is summarized below:

Table 2. Stocks and Weights in Portfolio Picked by Model - 2000

Ticker      SGRP  ENG   ARCI  EDAP  EVI   AGX   TOF   UWN   FCX    JOSB  AN     PQ
Weight (%)  1.00  4.00  1.00  1.00  2.00  1.00  2.00  5.00  39.00  4.00  18.00  22.00

[Figure: Efficient Frontier and Optimal Portfolio. x-axis: Risk (standard deviation of portfolio, daily); y-axis: Return (Daily). Optimal point: Risk: 1.62, Return: 0.571%, Sharpe: 35.22%.]


We then evaluated how well it performed with a long-period simulation test from 2001 to 2011. As before, we assumed an initial capital of $10,000 in 2001.

Table 3. Portfolio Performance

Initial Balance ($)     10,000
Final Balance ($)       65,619
CAGR (%)                18.65
Sd (%)                  24.96
Best Year (%)           52.06
Worst Year (%)          -26.19
Sharpe Ratio            0.77
Max. Drawdown (%)       -65.7
Sortino Ratio           1.71
US Market Correlation   0.36

According to the test result, our portfolio earned an impressive return, growing our capital to more than six times its initial value. After checking the absolute performance, we also wanted to check the relative performance by comparing our portfolio return with the market return.
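For readers unfamiliar with these performance measures, the sketch below computes a Sharpe ratio, a Sortino ratio, and a maximum drawdown from a short series of hypothetical annual returns. The return series and the risk-free rate `rf` are made up, and the definitions used (sample standard deviation, downside deviation relative to `rf`, drawdown on year-end wealth) are common conventions assumed here; the report does not state which conventions or data frequency were used. The last lines check the reported growth from $10,000 to $65,619 under the assumption that 2001 to 2011 counts as 11 annual periods.

```r
# Hypothetical annual returns and risk-free rate.
annual <- c(0.20, -0.10, 0.30, 0.05, -0.25, 0.40)
rf <- 0.02

# Sharpe ratio: mean excess return over total volatility.
sharpe <- (mean(annual) - rf) / sd(annual)

# Sortino ratio: mean excess return over downside deviation only.
downside <- sqrt(mean(pmin(annual - rf, 0)^2))
sortino  <- (mean(annual) - rf) / downside

# Maximum drawdown: worst peak-to-trough loss of cumulative wealth.
wealth   <- cumprod(1 + annual)
drawdown <- wealth / cummax(wealth) - 1
max_dd   <- min(drawdown)

print(c(sharpe = sharpe, sortino = sortino, max_dd = max_dd))

# CAGR implied by growing 10,000 into 65,619 over 11 annual periods.
cagr_11 <- (65619 / 10000)^(1 / 11) - 1
print(round(100 * cagr_11, 2))
```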


It is clear from the figure above that our portfolio generally outperformed the total market. Yet we also noticed that during 2005-2007 our portfolio did not beat the total market and went down before the market did. It then also rebounded before the market. We suspect this is due to the high volatility of our portfolio, since most of our selected stocks were issued by companies with high growth expectations. In other words, they are capital-driven stocks. When the market is good, and even at the beginning of turbulence, they perform well, since capital does not tend to leave very quickly. However, when the market goes down, capital tends to be pulled out very quickly.

Overall, our model seems to be valid and to outperform the market.


Conclusion & Reflection

In this project, we provided an approach to selecting a portfolio and making investments. Although our procedure was not very rigorous, each step was outlined carefully. An advantage of our model is that the portfolio can be adjusted quickly: we can update the data as needed, and the model should improve and generate more accurate predictions as the data grow. After careful thought, we believe our model is probably better suited to short-term investment, since we saw better performance when testing it over relatively shorter holding periods (not shown in this report). Thanks to computers, we can easily use the model to update our portfolio whenever needed, so instead of only rebalancing annually, we can reconstruct the portfolio each year.

The next steps are to build and refine a more general model and to design a more rigorous testing scenario. Although our model returns a portfolio with not-too-bad performance, we did not carefully structure our risk-return tradeoff, so consistently good results are not guaranteed. We are not fully satisfied with our results. There are many improvements still to be made, but we hope the general idea of our project makes sense.


References

1. Campbell, Kyle; Enos, Jeff; Gerlanc, Daniel; Kane, David. Backtests. Retrieved from: http://cran.r-project.org/web/packages/backtest/backtest.pdf.

2. Cenci, Marisa; Corradini, Massimiliano; Gheno, Andrea. 2005. Dynamic Portfolio Selection in a Dual Expected Utility Theory Framework. Retrieved from: www.actuaries.org/LIBRARY/ASTIN/vol36no2/505.pdf.

3. Dobelman, J.A.; Kang, H.B.; Park, S.W. 2014. WRDS Index Data Extraction Methodology. Rice University, Statistics Department Technical Report TR2014-01.

4. Grishina, Nina. A Behavioural Approach to Financial Portfolio Selection Problem: an Empirical Study Using Heuristics. Retrieved from: http://bura.brunel.ac.uk/handle/2438/9173.

5. Lam, Hon Cheong; Fisher, Brian; Ebert, D.S. An Experimental Study of Financial Portfolio Selection with Visual Analytics for Decision Support. Retrieved from: http://www.academia.edu/2976841/An_Experimental_Study_of_Financial_Portfolio_Selection_with_Visual_Analytics_for_Decision_Support.

6. Leung, Angela Hei-Yan. Portfolio Selection and Risk Management: An Introduction, Empirical Demonstration and R-Application for Stock Portfolios. Retrieved from: http://statistics.ucla.edu/system/resources/.

7. O'Shaughnessy, James. 2011. What Works on Wall Street: A Guide to the Best-Performing Investment Strategies of All Time, 4th Ed. New York: McGraw-Hill.

8. Yollin, Guy. R Tools for Portfolio Optimization. Retrieved from: http://economistatlarge.com/portfolio-theory/r-optimized-portfolio.


Appendix

APPENDIX 1.

Table. Variable Description and Coverage

Variable   Description                                             Total Coverage %
GVKEY      Standard and Poor's Identifier                          100.00
datadate   date                                                    100.00
exchg      Stock Exchange Code                                     100.00
fyear      Data Year - Fiscal                                      99.99
tic        Ticker                                                  99.99
fyr        Fiscal Year                                             99.99
act        Current Assets - Total                                  78.90
ap         Accounts Payable - Trade                                87.50
at         Total Assets                                            93.72
bkvlps     Book Value Per Share                                    91.98
cogs       Cost of Goods Sold                                      92.81
csho       Common Shares Outstanding                               99.34
dlc        Debt in Current Liabilities - Total                     92.38
dvpsp_c    Dividends per Share - Pay Date - Calendar               92.29
dvpsp_f    Dividends per Share - Pay Date - Fiscal                 92.81
dvt        Dividends - Total                                       93.23
ebit       Earnings Before Interest and Taxes                      89.64
ebita      Earnings Before Interest                                91.26
epspx      Earnings Per Share (Basic) Excl. Extraordinary Items    93.02
ib         Income Before Extraordinary Items                       93.59
icapt      Invested Capital - Total                                92.13
invt       Inventories - Total                                     89.75
lt         Total Liabilities                                       90.81
ni         Net Income (Loss)                                       93.31
prcc_f     Price Close - Annual - Fiscal                           95.69
sale       Sales/Turnover (Net)                                    93.34
seq        Stockholders' Equity - Total                            90.13


APPENDIX 2.

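Formulas used to derive the selected variables. The Market Share and Tobin's Q expressions follow the R code in Appendix 3, where market value is computed as PRCC_F × CSHO and the total is summed over all companies in the same fiscal year.

$\text{Market Share} = \frac{\text{Market Value}}{\text{Total Market Value}} = \frac{\text{PRCC\_F} \times \text{CSHO}}{\sum (\text{PRCC\_F} \times \text{CSHO})}$

$\text{Price to Earnings ratio} = \frac{\text{Stock Price}}{\text{Earnings per Share}} = \frac{\text{PRCC\_C} \times \text{CSHO}}{\text{IB}}$

$\text{Price to Book ratio} = \frac{\text{Market Capitalization}}{\text{Book Value}} = \frac{\text{PRCC\_C}}{\text{BKVLPS}}$

$\text{Price to Sales ratio} = \frac{\text{Share Price}}{\text{Sales per Share}} = \frac{\text{PRCC\_C}}{\text{SALE} / \text{CSHO}} = \frac{\text{PRCC\_C} \times \text{CSHO}}{\text{SALE}}$

$\text{Dividend Yield} = \frac{\text{DVPSP\_F}}{\text{PRCC\_F}}$

$\text{Earnings per Share} = \text{EPSPX}$

$\text{Debt to Equity} = \frac{\text{Total Debt}}{\text{Total Equity}} = \frac{\text{LT}}{\text{SEQ}}$

$\text{Return on Assets} = \frac{\text{Net Income}}{\text{Total Assets}} = \frac{\text{NI}}{\text{AT}}$

$\text{Return on Equity} = \frac{\text{Net Income}}{\text{Total Equity}} = \frac{\text{NI}}{\text{SEQ}}$

$\text{Current Ratio} = \frac{\text{Current Asset}}{\text{Current Liability}} = \frac{\text{ACT}}{\text{DLC}}$

$\text{Inventory Turnover} = \frac{\text{Cost of Goods Sold}}{\text{Average Inventory}} = \frac{\text{SALE}}{\text{INVT}}$

$\text{Payout Ratio} = \frac{\text{Dividends per Share}}{\text{Earnings per Share}} = \frac{\text{DVPSP\_F}}{\text{EPSPX}}$

$\text{Tobin's Q Ratio} = \frac{\text{Total Market Value}}{\text{Total Asset Value}} = \frac{\text{PRCC\_F} \times \text{CSHO}}{\text{AT}}$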


APPENDIX 3.

R code:

library(plyr)
setwd("E:/SJ/1Rice/682-Quantitative Financial Analytics/final")
getwd()

## load data and drop unwanted exchange codes
fulldata <- read.csv("fulldata.csv", header=T)
data <- fulldata[which(fulldata$exchg!=0 & fulldata$exchg!=1 & fulldata$exchg!=3 &
                       fulldata$exchg!=13 & fulldata$exchg!=19),]
data <- subset(data, select=c(-opincar,-oancf))
data.c <- na.omit(data)
data.c$tic = as.character(data.c$tic)
library(fBasics)
basicStats(data.c$fyear)

## market capitalization and total market capitalization by fiscal year
data.c$mkt <- data.c$prcc_f*data.c$csho
names(data.c)
mktsum <- ddply(data.c, .(fyear), summarise, summkt=sum(mkt))
data.c <- join(mktsum, data.c, by="fyear")
data.c <- data.c[order(data.c$GVKEY),]
data.c$ID <- 1:length(data.c[,1])

## ratios
data.r <- data.frame(ID=1:length(data.c[,1]))
data.r$cr <- data.c$act/data.c$dlc
data.r$payout <- data.c$dvpsp_f/data.c$epspx
data.r$mktshare <- data.c$mkt/data.c$summkt
data.r$tq <- data.c$prcc_f*data.c$csho/data.c$at
data.r$pe <- data.c$prcc_f*data.c$csho/data.c$ib
data.r$pb <- data.c$prcc_f*data.c$csho/data.c$bkvlps
data.r$ps <- data.c$prcc_f*data.c$csho/data.c$sale
data.r$dy <- data.c$dvpsp_f/data.c$prcc_f
data.r$eps <- data.c$epspx
data.r$de <- data.c$lt/data.c$seq
data.r$roa <- data.c$ni/data.c$at
data.r$roe <- data.c$ni/data.c$seq
data.r$fyear <- data.c$fyear


data.r$tic <- data.c$tic
av <- ddply(data.r, .(fyear), summarise, mean=mean(roe))
av

## calculating return
p_end <- data.c$prcc_f
p_endnxty <- c(data.c$prcc_f[-1], 0)
p_return <- p_endnxty/p_end-1
data.r$rtrn <- p_return

## mkt return, transfer from monthly to annual
mkt.r <- read.csv("market.csv", header=T)
names(mkt.r)
mkt.r$r1 <- round(mkt.r$vwretd+1, 5)
mkt.r$fyear <- rep(1950:2013, each=12)
mkt <- ddply(mkt.r, .(fyear), summarise, mktr=prod(r1)-1)
mkt$fyear = mkt$fyear-1
mkt

## assign mkt return by year
data.r <- join(data.r, mkt, by="fyear")
data.r$excsr <- data.r$rtrn - data.r$mktr
data.r

## delete last row for each company
f = c()
for (i in 1:(length(data.r[,1])-1)){
  if (data.r$fyear[i+1]-data.r$fyear[i]!=1){
    f = c(f, data.r$ID[i])
  }
}
f
data.model <- data.r[-f,]
basicStats(data.model$tq)
is.na(data.model) <- sapply(data.model, is.infinite)
data.model <- na.omit(data.model)
data.model[which(data.model==2705),]
data.model <- subset(data.model, data.model$eps>0 & data.model$pe>0 & data.model$ps>0 & data.model$pb>0)
data.model.c <- data.model[ , -which(names(data.model) %in% c("fyear","rtrn","mktr","ID","tic"))]


## sample
set.seed(1000)
samp1 = sample(2, nrow(data.model.c), replace=TRUE, prob=c(0.4,0.6))
data.train <- data.model.c[samp1==1,]
data.test <- data.model.c[samp1==2,]
length(which(data.train$eps<0))
write.csv(data.train, "modeltrain.csv", row.names=F)
library(randomForest)

## small sample
samp2 = sample(2, nrow(data.model.c), replace=TRUE, prob=c(0.4,0.6))
data.small <- data.model.c[samp2==1,]
rf1 <- randomForest(excsr~., data=data.train)
imp1 <- importance(rf1)

## model using data before 2000
data.2000 <- data.model[which(data.model$fyear<2000),]
data.2000.c <- data.2000[ , -which(names(data.2000) %in% c("fyear","rtrn","mktr","ID","tic"))]
samp2000 = sample(2, nrow(data.2000.c), replace=TRUE, prob=c(0.4,0.6))
data.2000samp <- data.2000.c[samp2000==1,]
rf2000 <- randomForest(excsr~., data=data.2000samp)
imp2000 <- importance(rf2000)
imp2000

## model using data before 2013
data.2013 <- data.model[which(data.model$fyear<2013),]
data.2013.c <- data.2013[ , -which(names(data.2013) %in% c("fyear","rtrn","mktr","ID","tic"))]
samp2013 = sample(2, nrow(data.2013.c), replace=TRUE, prob=c(0.4,0.6))
data.2013samp <- data.2013.c[samp2013==1,]
rf2013 <- randomForest(excsr~., data=data.2013samp)
imp2013 <- importance(rf2013)
imp2013


library(nnet)
nnet1 <- nnet(excsr~., data.train, size=10, abstol = 1e-10)
library(lars)
lasso1 <- lars(x=as.matrix(data.small[,-15]), y=as.matrix(data.small[,15]), type="lasso")
names(lasso1)
lasso1$df

## predict for 2000 with rf1 and keep the top 50
data.2000 <- data.model[which(data.model$fyear==2000),]
data.2000.c <- data.2000[ , -which(names(data.model) %in% c("ID","fyear","rtrn","mktr","tic"))]
pred2000 <- predict(rf1, data.2000.c)
pred2000 <- cbind(pred2000, data.2000)
pred2000top100 <- pred2000[order(pred2000$pred2000, decreasing=T),][1:50,]
pred2000top100rank <- pred2000[order(pred2000$excsr, decreasing=T),][1:50,]
length(which(pred2000top100$excsr>0))
write.csv(pred2000top100, "pred2000top100.csv", row.names=F)

## predict for 2000 with rf2013 and keep the top 80
data.2000 <- data.model[which(data.model$fyear==2000),]
data.2000.c <- data.2000[ , -which(names(data.model) %in% c("ID","fyear","rtrn","mktr","tic"))]
pred2000a <- predict(rf2013, data.2000.c)
pred2000a <- cbind(pred2000a, data.2000)
pred2000top80a <- pred2000a[order(pred2000a$pred2000a, decreasing=T),][1:80,]
pred2000top80ranka <- pred2000a[order(pred2000a$excsr, decreasing=T),][1:80,]
length(which(pred2000top80a$excsr>0))
write.csv(pred2000top80a, "pred2000top80a.csv", row.names=F)

data.2012 <- data.model[which(data.model$fyear==2012),]
data.2012.c <- data.2012[ , -which(names(data.model) %in% c("ID","fyear","rtrn","mktr","tic"))]


pred2012 <- predict(rf1, data.2012.c)
pred2012 <- cbind(pred2012, data.2012)
pred2012top100 <- pred2012[order(pred2012$pred2012, decreasing=T),][1:100,]
pred2012top100rank <- pred2012[order(pred2012$excsr, decreasing=T),][1:100,]
pred2012top30 <- pred2012[order(pred2012$pred2012, decreasing=T),][1:30,]
write.csv(pred2012top30, "pred2012top30.csv", row.names=F)

## extract stock
getrt <- function(tic, s, e){
  year = s:e
  art = matrix(nrow=length(year), ncol=length(tic))
  for(j in 1:length(tic)){
    for (i in (1:length(year))){
      a = try(fulldata[which(fulldata$tic==tic[j] & fulldata$fyear==year[i]),]$prcc_f, silent=T)/
          try(fulldata[which(fulldata$tic==tic[j] & fulldata$fyear==year[i-1]),]$prcc_f, silent=T)
      if(length(a)==0) {art[i,j] <- NA} else {art[i,j] = a-1}
    }
  }
  colnames(art) = tic
  art = art[-1,]
  art = t(na.omit(t(art)))
}
a = try(fulldata[which(fulldata$tic=="BYBI" & fulldata$fyear==1995),]$prcc_f, silent=T)/
    try(fulldata[which(fulldata$tic=="BYBI" & fulldata$fyear==1994),]$prcc_f, silent=T)
a
fff <- getrt(pred2000top50a$tic, 1991, 2000)
fff


ourreturns$R
###

## fpca
install.packages("psy")
library("psy")
data = read.csv("F:\\MSTAT\\682\\final\\modeltrain.csv", header= T)
data2 = read.csv("F:\\MSTAT\\682\\final\\modeltrain.csv", header= T)
nrow(data)*0.6
data3 = data2
data4 = read.csv("F:\\MSTAT\\682\\final\\modeltrain2.csv", header= T)
head(data4)
is.infinite(as.matrix(data[,1:12]))==T
fpca(formula=data$excsr~data$cr + data$invt.tvr + data$payout + data$mktshare + data$tq +
       data$pe + data$pb + data$ps + data$dy + data$eps + data$de + data$roa + data$roe +
       data$atshare,
     y = data$excsr, x = data[,1:12], data = data, cx=0.75, pvalues="No", partial="Yes",
     input="data", contraction="No", sample.size=1)

## pca
conomy.pr <- princomp(~data$cr + data$payout + data$mktshare + data$tq + data$pe + data$pb +
                        data$ps + data$dy + data$eps + data$de + data$roa + data$roe,
                      data=sample(data, 23137, replace = T), cor=T)
summa = summary(conomy.pr, loadings=TRUE)
summa$loading
pre <- predict(conomy.pr)
head(pre)
data$z1 <- pre[,1]
data$z2 <- pre[,2]
data$z3 <- pre[,3]
data$z4 <- pre[,4]
data$z5 <- pre[,5]
data$z6 <- pre[,6]


data$z7 <- pre[,7]
data$z8 <- pre[,8]
data$z9 <- pre[,9]
data$z10 <- pre[,10]
data$z11 <- pre[,11]
data$z12 <- pre[,12]
lm.sol <- lm(excsr ~ z1+z2+z3+z4+z5+z6+z7+z8+z9+z10+z11+z12, data=data)
summary(lm.sol)
beta <- coef(lm.sol)
A <- loadings(conomy.pr)
x.bar <- conomy.pr$center
x.sd <- conomy.pr$scale
coef <- (beta[2]*A[,1] + beta[3]*A[,2] + beta[4]*A[,3] + beta[5]*A[,4] + beta[6]*A[,5] +
         beta[7]*A[,6] + beta[8]*A[,7] + beta[9]*A[,8] + beta[10]*A[,9] + beta[11]*A[,10] +
         beta[12]*A[,11])/x.sd
beta0 <- beta[1] - sum(x.bar * coef)
c(beta0, coef)

## can
a <- data2[ ,1:12]
b <- data2[ ,13]
cc = cancor(a, b)
names(cc)
cc$cor
str(data)
install.packages("CCA")
library(CCA)
cxy1 = cc(a, b)
plt.cc(cxy1)

## PLSR
install.packages("pls")
library(pls)
head(data3)
nir.mvr <- mvr(excsr ~ ., ncomp = 12, data=data.train)
summary(nir.mvr)
names(nir.mvr)
nir.mvr$coefficients
head(data3)
v = predict(nir.mvr, comps = 1:12, newdata = data4)


head(v)
head(data2)
head(data)
coef(nir.mvr)

### portfolio optimization ###
library(quadprog)
library(stockPortfolio)

# the ticker for stocks
stocks <- c( "SPY", "EFA", "IWM", "VWO", "LQD", "HYG")

# get information of the stocks using ticker
returns <- getReturns(stocks, freq="day")
str(returns)    # get returns
summary(returns)

# creating efficient frontier plot
# This file uses the solve.QP function in the quadprog package to solve for the
# efficient frontier.
# Since the efficient frontier is a parabolic function, we can find the solution
# that minimizes portfolio variance and then vary the risk premium to find
# points along the efficient frontier. Then simply find the portfolio with the
# largest Sharpe ratio (expected return / sd) to identify the most
# efficient portfolio
library(stockPortfolio) # Base package for retrieving returns
library(ggplot2)        # Used to graph efficient frontier
library(reshape2)       # Used to melt the data


library(quadprog)       # Needed for solve.QP

###### coverage
fun = function(x) {1-sum(is.na(x))/length(x)}
coverage = apply(fulldata, 2, fun)
coverage

# Find the optimal portfolio
eff.optimal.point.1 <- eff.1[eff.1$sharpe==max(eff.1$sharpe),]
write.csv(eff.optimal.point.1, file="weight.csv")
eff = eff.1
eff.optimal.point = eff.optimal.point.1
r.f. = 0.00001
# rf = data.frame(x=c(0, eff.optimal.point$Std.Dev), y=c(r.f., eff.optimal.point$Exp.Return))
ggplot(eff, aes(x=Std.Dev, y=Exp.Return)) +
  geom_point(alpha=.3, color=ealdark) +
  geom_point(data=eff.optimal.point, aes(x=Std.Dev, y=Exp.Return, label=sharpe),
             color=ealred, size=5) +
  geom_abline(intercept = r.f.,
              slope=(eff.optimal.point$Exp.Return - r.f.)/eff.optimal.point$Std.Dev,
              color="blue") +
  annotate(geom="text", x=eff.optimal.point$Std.Dev, y=eff.optimal.point$Exp.Return,
           label=paste("Risk: ", round(eff.optimal.point$Std.Dev*100, digits=3),
                       "\nReturn: ", round(eff.optimal.point$Exp.Return*100, digits=3),
                       "%\nSharpe: ", round(eff.optimal.point$sharpe*100, digits=2), "%", sep=""),
           hjust=0, vjust=1.2) +
  ggtitle("Efficient Frontier and Optimal Portfolio") +
  labs(x="Risk (standard deviation of portfolio, daily)", y="Return (Daily)") +
  theme(panel.background=element_rect(fill=eallighttan),
        text=element_text(color=ealdark),
        plot.title=element_text(size=24, color=ealred))


ggsave("Efficient.Frontier.2013.png")

##### 2000, 50 STOCKS
tic50.1 = c("MED", "AGX", "PDO", "EDAP", "UWN", "ENG", "AOI", "HW", "ICCC", "SPAR",
            "CLWT", "TSCO", "KAI", "MNTG", "JST", "TRR", "GPI", "BEBE", "MSN", "SGRP",
            "TISI", "ARCI", "STEI", "BRLI", "HNR")
ourreturns.2 <- getReturns(tic50.1, freq="day", start="1995-01-01", end="2000-12-31")
eff.2 <- eff.frontier(returns=ourreturns.2$R, short="no", max.allocation=.50,
                      risk.premium.up=1, risk.increment=.001)
eff.optimal.point.2 <- eff.2[eff.2$sharpe==max(eff.2$sharpe),]
write.csv(eff.optimal.point.2, file="weight2.csv")

# edit ticker
tic50.3 = c("MED", "AGX", "EDAP", "UWN", "ENG", "AOI", "HW", "ICCC", "SPAR", "CLWT",
            "TSCO", "KAI", "JST", "TRR", "GPI", "BEBE", "MSN", "SGRP", "TISI", "ARCI",
            "BRLI", "HNR")
ourreturns.3 <- getReturns(tic50.3, freq="day", start="1995-01-01", end="2000-12-31")
eff.3 <- eff.frontier(returns=ourreturns.3$R, short="no", max.allocation=.50,
                      risk.premium.up=1, risk.increment=.001)
eff.optimal.point.3 <- eff.3[eff.3$sharpe==max(eff.3$sharpe),]
write.csv(eff.optimal.point.3, file="weight3.csv")

##### new model using 2000 data
pred2000top50a <- read.csv("~/Desktop/682/Final/code/pred2000top50a.csv")
tic2000 = c("GEL", "MED", "CLWT", "SGRP", "ENG", "ARCI", "EDAP", "HNR", "ENTG", "CMT",
            "ICCC", "IDCC", "MSN", "TRNS", "EVI", "RCPI", "TPC", "INS", "LBIX", "UWN",
            "SNP", "AGX", "PDO", "ANTP")
# no information "ERS", "KALU"
ourreturns.2000 <- getReturns(tic2000, freq="day", start="1995-01-01", end="2000-12-31")
eff.2000 <- eff.frontier(returns=ourreturns.2000$R, short="no", max.allocation=.50,


risk.premium.up=1, risk.increment=.001) eff.optimal.point.2000 <- eff.2000[eff.2000$sharpe==max(eff.2000$sharpe),] write.csv(eff.optimal.point.2000, file="weight2000.csv") eff = eff.2000 eff.optimal.point = eff.optimal.point.2000 r.f. = 0.0002 ggplot(eff, aes(x=Std.Dev, y=Exp.Return)) + geom_point(alpha=.3, color=ealdark) + geom_point(data=eff.optimal.point, aes(x=Std.Dev, y=Exp.Return, label=sharpe), color=ealred, size=5) + geom_abline(intercept = r.f., slope=(eff.optimal.point$Exp.Return - r.f.)/eff.optimal.point$Std.Dev, color="blue") + annotate(geom="text", x=eff.optimal.point$Std.Dev, y=eff.optimal.point$Exp.Return, label=paste("Risk: ", round(eff.optimal.point$Std.Dev*100, digits=3),"\nReturn: ", round(eff.optimal.point$Exp.Return*100, digits=3),"%\nSharpe: ", round(eff.optimal.point$sharpe*100, digits=2), "%", sep=""), hjust=0, vjust=1.2) + ggtitle("Efficient Frontier and Optimal Portfolio") + labs(x="Daily Risk (standard deviation of portfolio)", y="Daily Return") + theme(panel.background=element_rect(fill=eallighttan), text=element_text(color=ealdark), plot.title=element_text(size=24, color=ealred)) ggsave("Efficient.Frontier.2000.png") ###### 2000 - 80 tic2000.80 = c("GEL", "MED", "CLWT", "SGRP", "ENG", "ARCI", "EDAP", "HNR", "ENTG", "CMT", "ICCC", "IDCC", "MSN","TRNS", "EVI", "RCPI", "TPC", "INS", "LBIX", "UWN", "SNP", "AGX", "PDO", "ANTP", "JST", "TOF", "RCKY", "FCX", "PQ", "RICK", "HW",


               "JOSB", "BAMM", "IVAN", "AN")
# "FPP","BAGL", no info
ourreturns.2000.80 <- getReturns(tic2000.80, freq="day", start="1995-01-01", end="2000-12-31")
eff.2000.80 <- eff.frontier(returns=ourreturns.2000.80$R, short="no", max.allocation=.50,
                            risk.premium.up=1, risk.increment=.001)
eff.optimal.point.2000.80 <- eff.2000.80[eff.2000.80$sharpe==max(eff.2000.80$sharpe),]
write.csv(eff.optimal.point.2000.80, file="weight2000.80.csv")

eff = eff.2000.80
eff.optimal.point = eff.optimal.point.2000.80
r.f. = 0.0002
# rf = data.frame(x=c(0, eff.optimal.point$Std.Dev), y=c(r.f., eff.optimal.point$Exp.Return))
ggplot(eff, aes(x=Std.Dev, y=Exp.Return)) +
  geom_point(alpha=.3, color=ealdark) +
  geom_point(data=eff.optimal.point, aes(x=Std.Dev, y=Exp.Return, label=sharpe),
             color=ealred, size=5) +
  geom_abline(intercept = r.f.,
              slope=(eff.optimal.point$Exp.Return - r.f.)/eff.optimal.point$Std.Dev,
              color="blue") +
  annotate(geom="text", x=eff.optimal.point$Std.Dev, y=eff.optimal.point$Exp.Return,
           label=paste("Risk: ", round(eff.optimal.point$Std.Dev*100, digits=3),
                       "\nReturn: ", round(eff.optimal.point$Exp.Return*100, digits=3),
                       "%\nSharpe: ", round(eff.optimal.point$sharpe*100, digits=2),
                       "%", sep=""),
           hjust=0, vjust=1.2) +
  ggtitle("Efficient Frontier and Optimal Portfolio") +
  labs(x="Risk (standard deviation of portfolio, daily)", y="Return (Daily)") +
  theme(panel.background=element_rect(fill=eallighttan),
        text=element_text(color=ealdark),
        plot.title=element_text(size=24, color=ealred))


ggsave("Efficient.Frontier.2000.80.png")

#################
##### geometric mean function
# test matrix
abc = matrix(rep(1:10,3), ncol=3, byrow=FALSE)
abc

# function for matrix geom mean
matrix.gm = function(x) {
  gm_mean = function(a) {exp(mean(log(a)))}
  apply(x, 2, gm_mean)
}
matrix.gm(abc)

#########################
# modified eff function
########################
eff.frontier <- function (returns, short="no", max.allocation=NULL,
                          risk.premium.up=.5, risk.increment=.005){
  # return argument should be an m x n matrix with one column per security
  # short argument is whether short-selling is allowed; default is no
  # (short selling prohibited)
  # max.allocation is the maximum % allowed for any one security
  # (reduces concentration)
  # risk.premium.up is the upper limit of the risk premium modeled
  # (see for loop below) and risk.increment is the increment (by) value
  # used in the for loop
  covariance <- cov(returns)
  print(covariance)
  n <- ncol(covariance)

  # Create initial Amat and bvec assuming only equality constraint
  # (short-selling is allowed, no allocation constraints)
  Amat <- matrix (1, nrow=n)
  bvec <- 1
  meq <- 1

  # Then modify the Amat and bvec if short-selling is prohibited
  if(short=="no"){


    Amat <- cbind(1, diag(n))
    bvec <- c(bvec, rep(0, n))
  }

  # And modify Amat and bvec if a max allocation (concentration) is specified
  if(!is.null(max.allocation)){
    if(max.allocation > 1 | max.allocation <0){
      stop("max.allocation must be greater than 0 and less than 1")
    }
    if(max.allocation * n < 1){
      stop("Need to set max.allocation higher; not enough assets to add to 1")
    }
    Amat <- cbind(Amat, -diag(n))
    bvec <- c(bvec, rep(-max.allocation, n))
  }

  # Calculate the number of loops
  loops <- risk.premium.up / risk.increment + 1
  loop <- 1

  # Initialize a matrix to contain allocation and statistics
  # This is not necessary, but speeds up processing and uses less memory
  eff <- matrix(nrow=loops, ncol=n+3)
  # Now I need to give the matrix column names
  colnames(eff) <- c(colnames(returns), "Std.Dev", "Exp.Return", "sharpe")

  # geometric mean
  matrix.gm = function(x) {
    gm_mean = function(a) {exp(mean(log(a)))}
    apply(x, 2, gm_mean)
  }

  # Loop through the quadratic program solver
  for (i in seq(from=0, to=risk.premium.up, by=risk.increment)){
    dvec <- colMeans(returns) * i # This moves the solution along the EF
    sol <- solve.QP(covariance, dvec=dvec, Amat=Amat, bvec=bvec, meq=meq)
    eff[loop,"Std.Dev"] <-


      sqrt(sum(sol$solution*colSums((covariance*sol$solution))))
    eff[loop,"Exp.Return"] <- as.numeric(sol$solution %*% (matrix.gm(returns+1))-1)
    # eff[loop,"Exp.Return"] <- as.numeric(sol$solution %*% colMeans(returns))
    eff[loop,"sharpe"] <- eff[loop,"Exp.Return"] / eff[loop,"Std.Dev"]
    eff[loop,1:n] <- sol$solution
    loop <- loop+1
  }
  return(as.data.frame(eff))
}
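
For readers who want the optimization behind eff.frontier stated explicitly, the block below is our reading of the code against the documented form of quadprog's solve.QP (minimize -d'b + (1/2) b'Db subject to A'b >= b0, with the first meq constraints treated as equalities). The symbols w, \Sigma, \hat{\mu}, \lambda and m are notation introduced here for illustration; they are not names used in the code.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^n} \quad & \tfrac{1}{2}\, w^\top \Sigma\, w \;-\; \lambda\, \hat{\mu}^\top w \\
\text{subject to} \quad & \mathbf{1}^\top w = 1, \\
& w_j \ge 0 \quad \text{for all } j \quad (\text{when short = "no"}), \\
& w_j \le m \quad \text{for all } j \quad (\text{when max.allocation} = m \text{ is given}),
\end{aligned}
\qquad
\Sigma = \operatorname{cov}(\text{returns}), \quad
\hat{\mu} = \operatorname{colMeans}(\text{returns}).
```

Here \lambda plays the role of the loop variable i, swept from 0 to risk.premium.up in steps of risk.increment, so \lambda = 0 should give the minimum-variance portfolio under the constraints and larger values trade variance for mean return. For each solution the function records the standard deviation \sqrt{w^\top \Sigma w}, an expected return built from the geometric means of 1 + returns, and the ratio of the two as "sharpe"; as far as we can tell from the code, the risk-free rate r.f. is not subtracted in that ratio and enters only in the plotted tangent line.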
