Using R 4/10/2012 Geoff Black Matthew Goglia

What is R?

R is a free program for statistical analysis and graphical display of data.

R uses code; these saved scripts can be easily used again to perform calculations on new data

It is particularly strong at performing matrix calculations

Gives us the ability to quickly pull various data in a usable format

Before Beginning R

1. Download R
http://cran.cnr.berkeley.edu/

2. Demonstration
Basic Arithmetic
Matrices
Treasury Bill data
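
The first two demonstration topics can be previewed with a few lines typed at the console. This is a minimal sketch with made-up numbers; the Treasury Bill data from the live demonstration is not reproduced here.

```r
# Basic arithmetic: R works like a calculator
total <- 2 + 3 * 4        # multiplication happens before addition
ratio <- 10 / 4

# Matrices: matrix() fills column by column by default
A <- matrix(c(2, 0, 0, 4), nrow = 2, ncol = 2)
A_inv <- solve(A)         # matrix inverse
check <- A %*% A_inv      # matrix product; should be the identity matrix

print(total)
print(ratio)
print(check)
```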

1) Importing Data and Calculating Returns

Before Beginning R

1. Create an Excel Spreadsheet

Input all ETF symbols that you want data for into one column

Save as a .CSV file, for example “symbols.csv”

2. Make sure your spreadsheet and R are in the same directory

Create a new folder on your desktop and save your spreadsheet and a copy of R in it

Set R to start in the same directory as your spreadsheet by editing “properties” for PC or “preferences” for Mac
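
As an alternative to editing the shortcut, the working directory can also be checked from inside R. A small sketch; the folder path in the comment is hypothetical and needs to be adjusted to your machine.

```r
# Show the directory R is currently working in
current_dir <- getwd()
print(current_dir)

# To switch to the folder that holds your spreadsheet, you could run
# something like the following (hypothetical path):
# setwd("C:/Users/yourname/Desktop/R-project")

# Check whether the symbols file is visible from the working directory
found <- file.exists("symbols.csv")
print(found)
```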

Install and Load Yahoo Finance Package

Open R and type in the following:

>if (!require(fImport)) install.packages('fImport')

>library("fImport")

You are now able to import market data from Yahoo.

(“>” is the command line in R; it is not part of the code.)

Pull Symbols from Excel Spreadsheet

Type in the following:

>symbols <- scan("symbols.csv",what=character(),sep = ",")

“<-” is R’s assignment operator: it stores the value on the right under the name on the left.

Now R knows which ETFs to pull data for. Make sure to use the name of the spreadsheet that you created if you did not name it “symbols.csv”.
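
To see what scan() returns without touching your own spreadsheet, the same call can be tried on a temporary file. This is an illustrative sketch; the three tickers are placeholders, and the name symbols_demo is used so that your real symbols are not overwritten.

```r
# Write a small stand-in for symbols.csv to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("SPY", "EFA", "AGG"), tmp)

# Same call as above, pointed at the temporary file
symbols_demo <- scan(tmp, what = character(), sep = ",")
print(symbols_demo)
```

The result is a character vector with one element per symbol, which is the form yahooSeries is given in the next step.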

Load Yahoo Data for ETFs

Type in the following:

>stockdata <- yahooSeries(symbols,nDaysBack = 365*3)

This code pulls the last 3 years of daily data for each ETF in your symbols file. You can change the number of years by changing the 3 at the end of 365*3.

Remove Unnecessary Data

Type in the following:

>c <- 1:(ncol(stockdata)/6)*6

>stockadj <- stockdata[,c]

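Yahoo gives us six columns of data for each ETF, and we only need the 6th (the adjusted price, which is what the name “stockadj” refers to). The first line defines “c” as the positions of every sixth column (6, 12, 18, ...), so the columns we don’t need are ignored.

The second line defines “stockadj” as just those columns, extracted from the Yahoo data we received.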
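
The column arithmetic can be checked on a mock table. This is a sketch only: the data is arbitrary, and the column names are an assumption about how the six Yahoo columns are labeled.

```r
# Mock data: 2 symbols x 6 columns each, 3 rows (values are arbitrary)
mock <- matrix(1:36, nrow = 3, ncol = 12)
colnames(mock) <- paste(rep(c("AAA", "BBB"), each = 6),
                        c("Open", "High", "Low", "Close", "Volume", "Adj.Close"),
                        sep = ".")

# ":" is evaluated before "*", so this is (1:2) * 6, i.e. columns 6 and 12
keep <- 1:(ncol(mock) / 6) * 6
adj <- mock[, keep]

print(keep)
print(colnames(adj))
```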

Calculate Daily Returns

Type in the following:

>returns <- stockadj/lag(stockadj,k=1)-1

This code defines returns as: (price/previous day’s price)-1
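
The same formula can be worked by hand on a few made-up prices. This sketch uses a plain numeric vector rather than the time series object above, so it lines up "today" and "yesterday" by indexing instead of calling lag().

```r
# Made-up closing prices for four consecutive days
prices <- c(100, 102, 101, 105)

# Drop the first element for "today" and the last for "yesterday"
today <- prices[-1]
yesterday <- prices[-length(prices)]

# (price / previous day's price) - 1
daily_returns <- today / yesterday - 1
print(round(daily_returns, 4))
```

Four prices give three returns, since the first day has no previous price to compare against.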

Write to File

Type in the following:

>write.table(returns, "returns.csv", sep=",", col.names=NA)

This code will produce a .CSV file that Excel can open, saved in your working directory under the name “returns.csv”. You now have 3 years of daily returns for each ETF in your original spreadsheet.

Final Code

if (!require(fImport)) install.packages('fImport')
library("fImport")
symbols <- scan("symbols.csv",what=character(),sep = ",")
stockdata <- yahooSeries(symbols,nDaysBack = 365*3)
c <- 1:(ncol(stockdata)/6)*6
stockadj <- stockdata[,c]
returns <- stockadj/lag(stockadj,k=1)-1
write.table(returns, "returns.csv", sep=",", col.names=NA)

Now that you know how to write the code, you can just copy and paste the above into R.

You are now ready to use R to solve for the global minimum variance portfolio (GMVP).

2) Creating GMVP

A Review

1 Make a Covariance Matrix of returns above the risk-free rate

2 Make an Inverse Matrix

3 Make a Vector of Ones

4 Multiply the Inverse Matrix by the Vector of Ones
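
The review steps can be sketched in a few lines of R. This assumes a made-up 2-by-2 covariance matrix, and it adds a final normalization (dividing by the sum) so that the weights total 1, a step the list above leaves implicit.

```r
# Step 1: a made-up covariance matrix for two assets
sigma_demo <- matrix(c(0.04, 0.01,
                       0.01, 0.09), nrow = 2, byrow = TRUE)

# Step 2: inverse matrix
sigma_inv <- solve(sigma_demo)

# Step 3: vector of ones
ones_demo <- matrix(1, nrow = 2, ncol = 1)

# Step 4: inverse matrix times the vector of ones
raw <- sigma_inv %*% ones_demo

# Assumed extra step: rescale so the weights sum to 1
w <- raw / sum(raw)
print(round(w, 4))
```

This closed form has no limits on the weights, which is why the constrained version below uses a quadratic programming solver instead.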

This is the basic equation we need to solve:
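
The quadprog documentation cited below states the problem that solve.QP handles in the following form, where b is the vector being solved for (here, the portfolio weights):

```latex
\min_{b} \; \left( -d^{T} b + \tfrac{1}{2}\, b^{T} D\, b \right)
\quad \text{subject to} \quad A^{T} b \ge b_{0}
```

In the code that follows, D corresponds to sigma, d to zeros, A to Amat, and b_0 to bvec.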

Source: cran.r-project.org/web/packages/quadprog/quadprog.pdf

Install the Quadratic Programming Package

>if (!require(quadprog)) install.packages('quadprog')

>library("quadprog")

Definitions

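>n <- dim(returns)[1]

>p <- dim(returns)[2]

>ub <- rep(.1,p)

>zeros <- numeric(p)

>ones <- zeros +1

>dim(ub) <- c(p,1)

>dim(ones) <- c(p,1)

>dim(zeros) <- c(p,1)

“n” is the number of rows (days of returns), “p” is the number of columns (ETFs), “ub” is the upper bound on each weight (10% here), and “zeros” and “ones” are vectors of length p. The dim() lines turn each vector into a p-by-1 column. With a 10% upper bound, you need at least 10 ETFs for the weights to be able to sum to 1.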

Construct the A-Matrix and B-Vector

>Atmp <- rbind(t(ones),diag(p),-diag(p))

>Amat <- t(Atmp)

>bvec <- rbind(1,zeros,-ub)

Constraints: Ax >= b

Sum of stock weights = 1

No shorts

Weights must be below the upper bound
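
To see how the three constraint groups stack up, the same construction can be printed for a small case. A sketch assuming 3 assets and a 50% upper bound; the names ending in _demo are illustrative and keep your real objects untouched.

```r
p_demo <- 3
ub_demo <- matrix(0.5, nrow = p_demo, ncol = 1)
zeros_demo <- matrix(0, nrow = p_demo, ncol = 1)
ones_demo <- matrix(1, nrow = p_demo, ncol = 1)

# Row 1: weights sum to 1; rows 2-4: each weight >= 0; rows 5-7: -weight >= -ub
Atmp_demo <- rbind(t(ones_demo), diag(p_demo), -diag(p_demo))
bvec_demo <- rbind(1, zeros_demo, -ub_demo)

# Left-hand side coefficients next to the right-hand side values
print(cbind(Atmp_demo, bvec_demo))

# A candidate set of weights is feasible when Atmp %*% w >= bvec in every row
# (a tiny tolerance guards against floating point rounding)
w_try <- c(0.4, 0.3, 0.3)
ok <- all(Atmp_demo %*% w_try >= bvec_demo - 1e-12)
print(ok)
```

The upper bound appears with a minus sign because the solver only accepts ">=" constraints: weight <= ub is rewritten as -weight >= -ub.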

Build the Covariance Matrix

>sigma<- cov(returns, use="pairwise")

Pairwise means each covariance is calculated from the days on which both ETFs have a return. If one of the two values is missing on a given day, that day is left out of the calculation for that pair.
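
The effect of use="pairwise" can be seen on two short made-up return series with gaps. This is a sketch for illustration only.

```r
# Two made-up return series, each with one missing value
x <- c(0.01, NA, 0.03, -0.02, 0.00)
y <- c(0.02, 0.01, NA, -0.01, 0.01)
m <- cbind(x, y)

# Pairwise covariance matrix
cov_pair <- cov(m, use = "pairwise")

# The x-y entry uses only rows 1, 4 and 5, where both values are present
both <- c(1, 4, 5)
manual <- cov(x[both], y[both])

print(cov_pair)
print(manual)
```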

Solve for GMVP and Write to Table

>gmvp <- solve.QP(sigma,zeros,Amat,bvec,meq=1)

>gmvp$solution <- round(gmvp$solution,2)

>soln <- matrix(cbind(symbols,gmvp$solution),nrow=p,ncol=2)

>write.table(soln, "soln.csv", sep=",",col.names=F,row.names=F)

meq=1 means the first row of constraints (the weights sum to 1) is an equality; the remaining rows are inequalities.
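
Before running this on real data, the solver call can be tried on a toy problem. This sketch assumes the quadprog package is already installed, and uses a made-up diagonal covariance matrix for 3 assets with a 50% upper bound.

```r
library(quadprog)

p_demo <- 3
sigma_demo <- diag(c(0.04, 0.09, 0.16))   # made-up variances, no correlation
ub_demo <- matrix(0.5, p_demo, 1)
zeros_demo <- matrix(0, p_demo, 1)
ones_demo <- matrix(1, p_demo, 1)

# Same constraint layout as above: sum to 1, no shorts, upper bound
Amat_demo <- t(rbind(t(ones_demo), diag(p_demo), -diag(p_demo)))
bvec_demo <- rbind(1, zeros_demo, -ub_demo)

# meq = 1 makes the first constraint an equality
fit <- solve.QP(sigma_demo, zeros_demo, Amat_demo, bvec_demo, meq = 1)
w <- fit$solution
print(round(w, 4))
```

The weights should sum to 1 and stay between 0 and the upper bound, with the lowest-variance asset receiving the largest weight.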

Final Code

if (!require(quadprog)) install.packages('quadprog')
library("quadprog")
n <- dim(returns)[1]
p <- dim(returns)[2]
ub <- rep(.1,p)
zeros <- numeric(p)
ones <- zeros +1
dim(ub) <- c(p,1)
dim(ones) <- c(p,1)
dim(zeros) <- c(p,1)
Atmp <- rbind(t(ones),diag(p),-diag(p))
Amat <- t(Atmp)
bvec <- rbind(1,zeros,-ub)
sigma <- cov(returns, use="pairwise")
#meq=1 means first row of constraints is an equality
gmvp <- solve.QP(sigma,zeros,Amat,bvec,meq=1)
gmvp$solution <- round(gmvp$solution,2)
soln <- matrix(cbind(symbols,gmvp$solution),nrow=p,ncol=2)
write.table(soln, "soln.csv", sep=",",col.names=F,row.names=F)

Now that you know how to write the code, you can just copy and paste the above into R to solve for GMVP.