Introduction to Statistical Analysis

Statistical Methods in Finance

Lecture 1

Ta-Wei Huang

September 8, 2015

Table of Contents

We all know the importance of data analysis, but we seldom know the procedure of data analysis. In this class, I would like to introduce the basic concepts of statistical data analysis.

1 What is statistics?

2 Statistical Procedures

3 Next Lecture

What is statistics?

Definition on Wikipedia

Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.

Descriptive statistics: summarize data from a sample using indexes such as the mean or standard deviation.

Inferential statistics: draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).

Actually, this is an old-fashioned statement! Modern statistical methods concern more than descriptive and inferential analysis.
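To make the distinction between descriptive and inferential statistics concrete, here is a minimal sketch in Python. The sample of returns is hypothetical, and the interval uses the normal critical value 1.96 as a rough approximation; neither comes from the lecture.

```python
import math
import statistics

# Hypothetical sample of monthly returns in percent (illustrative values only).
returns = [1.2, -0.8, 2.5, 0.3, -1.1, 1.9, 0.7, -0.4]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(returns)
std = statistics.stdev(returns)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: say something about the unknown population mean.
# Rough 95% interval using the normal critical value 1.96; for a sample this
# small, a t critical value would be more appropriate.
std_error = std / math.sqrt(len(returns))
ci_low = mean - 1.96 * std_error
ci_high = mean + 1.96 * std_error

print(f"mean = {mean:.3f}, std = {std:.3f}")
print(f"approx. 95% CI for the population mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

The first two numbers only describe the eight observations at hand. The interval is a statement about the population they were drawn from, and it is only as good as the sampling assumptions behind it.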

Statistical Procedures

Modern Statistical Procedures

A modern statistical analysis goes through the following procedures.

Problem Formulation

Don't ask "What can we learn from this data set?" The most important question is: what problem are we facing now? Then decide what kinds of data you need.

Improve the credit card coverage of our bank in Taiwan.

Decrease the non-performing loan ratio of our bank.

Develop a trading rule to earn higher profit.

Domain knowledge plays the most important role in this step! Learn basic financial theory and understand how the system works!

Data Collection

After you have formulated your problem, you should collect the data. It is natural to ask two questions: what kinds of data do you need, and how do you collect them?

What kinds of data do you need? You need to use domain knowledge to answer this question. ⇒ Define the population of your problem.

How do you collect them? Most of the time, we get data from some database. ⇒ Produce representative data so that you can draw correct information to solve your problem!

Data Cleaning and Exploratory Data Analysis 1

Data cleaning deals with detecting and removing errors and inconsistencies from data in order to improve the quality of the data. There are several types of dirty data.

Missing values: some required values in the dataset are missing.

Inconsistent responses: usually seen in survey sampling.

Other errors: such as mistyping, undesired formats, etc.

Data cleaning is the most exhausting step in the whole statistical analysis. We need to clean the data so that the final dataset is well structured.
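The types of dirty data listed above can be illustrated with a small sketch. The records, field names, and the `parse_number` helper below are all hypothetical; a real project would more likely use a data-frame library, but the logic is the same.

```python
# Hypothetical raw records; identifiers and values are invented for illustration.
raw_records = [
    {"company_id": "A001", "eps": "0.39", "roe": "3.13"},
    {"company_id": "A002", "eps": "", "roe": "0.28"},          # missing value
    {"company_id": "A003", "eps": "-2.82", "roe": "-19.67%"},  # undesired format
    {"company_id": "A004", "eps": "4.O5", "roe": "20.88"},     # mistyped: letter O
]


def parse_number(text):
    """Return a float, or None when the field is missing or cannot be parsed."""
    cleaned = text.strip().rstrip("%")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


clean_rows, rejected_rows = [], []
for record in raw_records:
    row = {
        "company_id": record["company_id"],
        "eps": parse_number(record["eps"]),
        "roe": parse_number(record["roe"]),
    }
    # Keep only complete rows; set problem rows aside for manual review
    # instead of silently dropping them.
    if None in row.values():
        rejected_rows.append(row)
    else:
        clean_rows.append(row)

print(len(clean_rows), "clean rows,", len(rejected_rows), "rows need review")
```

Setting rejected rows aside rather than deleting them is a design choice: whether a missing value should be dropped, imputed, or corrected at the source depends on the problem formulated earlier.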

Data Cleaning and Exploratory Data Analysis 2

EDA is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

Graphical techniques: use suitable visualizations to discover patterns.

Cluster analysis: find individuals with similar features and group them.

Dimension reduction: decrease the number of variables, for example by rotating the axes.

EDA Example 1

Question: Return on large stocks > small stocks? Stock dividend > cash dividend? Is there any interaction effect?

                      Dividend Policy
Capital    Cash Dividend        Stock Dividend
Large       4.24%    -5.23%      2.69%     8.12%
            3.94%     9.37%      6.71%    12.20%
Small       5.92%    12.10%     24.65%    -8.53%
           -9.03%     0.24%      1.69%    12.63%
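A first look at these questions is to compare the cell means. The sketch below assumes that, in each row of the table, the first two values belong to the cash dividend column and the last two to the stock dividend column. That layout is a reading of the flattened table, not something the slide states explicitly.

```python
from statistics import mean

# Returns in percent, keyed by (capital, dividend policy).
# ASSUMPTION: in each table row, the first two values fall under "Cash Dividend"
# and the last two under "Stock Dividend".
returns = {
    ("Large", "Cash"): [4.24, -5.23, 3.94, 9.37],
    ("Large", "Stock"): [2.69, 8.12, 6.71, 12.20],
    ("Small", "Cash"): [5.92, 12.10, -9.03, 0.24],
    ("Small", "Stock"): [24.65, -8.53, 1.69, 12.63],
}

cell_means = {cell: mean(values) for cell, values in returns.items()}

# Effect of switching from cash to stock dividends within each capital group.
effect_large = cell_means[("Large", "Stock")] - cell_means[("Large", "Cash")]
effect_small = cell_means[("Small", "Stock")] - cell_means[("Small", "Cash")]

# If the two effects differ, the lines in an interaction plot are not parallel.
interaction = effect_large - effect_small

for cell, value in cell_means.items():
    print(cell, f"{value:.2f}%")
print(f"dividend effect (large) = {effect_large:.2f}, (small) = {effect_small:.2f}")
print(f"interaction contrast = {interaction:.2f}")
```

With only four observations per cell, these means are noisy, so any apparent difference is a hint for further analysis rather than a conclusion.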

EDA Example 2

What can we find from the following interaction plot (or profile chart)?

Statistical Task Formulation 1

Now that we have structured data, we can determine our task. In terms of purpose, there are mainly three kinds of task. Note that the task should connect tightly with your problem.

Explanatory analysis: we want to find a hidden common structure behind the population.

Prediction: we want to predict some feature when a new individual comes in. (Why is this important?)

Forecasting: we want to forecast the future outcome of one or more given time series variables.

Then, determine your outputs and inputs.

Statistical Task Formulation 2

Suppose that you have n stocks with the following variables: the return on stock i, $R_{i,t}$, the risk-free rate, $R_{f,t}$, and the market return, $R_{m,t}$.

Explanatory analysis: Does the CAPM hold for this data set?

⇒ Need to design an "empirical form" of the CAPM.

⇒ Performance measure: the explanatory power.

Forecasting: Can we use the CAPM to predict a company's future return?

⇒ Need to design a forecasting model.

⇒ Performance measure: the predictive accuracy.
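One common empirical form, assumed here for illustration since the slide does not specify one, is the excess-return regression $R_{i,t} - R_{f,t} = \alpha_i + \beta_i (R_{m,t} - R_{f,t}) + \varepsilon_{i,t}$, under which the CAPM implies $\alpha_i = 0$. The sketch below fits it by ordinary least squares on synthetic data and reports $R^2$ as a simple measure of explanatory power. All numbers, and the `ols_fit` helper, are hypothetical.

```python
import random

random.seed(0)

# Synthetic data for one stock (all numbers are invented for illustration).
n_periods = 120
true_alpha, true_beta = 0.0, 1.2
risk_free = [0.1] * n_periods
market = [random.gauss(0.8, 4.0) for _ in range(n_periods)]
stock = [
    rf + true_alpha + true_beta * (rm - rf) + random.gauss(0.0, 2.0)
    for rf, rm in zip(risk_free, market)
]

# Excess returns: the regression is R_i - R_f on R_m - R_f.
x = [rm - rf for rm, rf in zip(market, risk_free)]
y = [ri - rf for ri, rf in zip(stock, risk_free)]


def ols_fit(x, y):
    """OLS for y = alpha + beta * x; returns (alpha, beta, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    ss_res = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return alpha, beta, 1.0 - ss_res / ss_tot


alpha_hat, beta_hat, r_squared = ols_fit(x, y)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, R^2 = {r_squared:.3f}")
```

For the explanatory task one would then test whether the estimated alpha differs from zero. For the forecasting task the same fit would instead be judged on out-of-sample accuracy.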

Statistical Methods Selection

Choose the models to apply according to the following criteria.

Apply methods appropriate for the statistical task.

Apply methods appropriate for the outputs/inputs and data types.

Apply methods that your computer can handle. (Important!)

In finance, linear models and time series analysis are the most popular methods, but others are also useful, such as multivariate analysis (or data mining) and statistical learning.

Data Type

There are three types of data you will use when dealing with a problem in financial econometrics.

Cross-sectional data: data on one or more variables collected at a single point in time.

Time series data: data on one or more variables collected over a period of time.

Panel data: data having the dimensions of both time series and cross-sections (very common in practice).
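The relationship between the three data types can be sketched with a small panel from which a cross-section and a time series are sliced. The stock identifiers, months, and returns below are invented, and the two helper functions are illustrative names only.

```python
# Hypothetical panel: monthly returns (in percent) for three stocks.
# Keys are (stock, month); all identifiers and values are invented.
panel = {
    ("AAA", "2015-01"): 1.2, ("AAA", "2015-02"): -0.5,
    ("AAA", "2015-03"): 0.8, ("AAA", "2015-04"): 2.1,
    ("BBB", "2015-01"): -1.0, ("BBB", "2015-02"): 0.4,
    ("BBB", "2015-03"): 1.5, ("BBB", "2015-04"): -0.3,
    ("CCC", "2015-01"): 0.6, ("CCC", "2015-02"): 0.9,
    ("CCC", "2015-03"): -2.2, ("CCC", "2015-04"): 1.1,
}


def cross_section(panel, period):
    """All individuals observed at one point in time."""
    return {stock: value for (stock, t), value in panel.items() if t == period}


def time_series(panel, stock_id):
    """One individual followed over time, in chronological order."""
    return [(t, value) for (stock, t), value in sorted(panel.items()) if stock == stock_id]


print(cross_section(panel, "2015-02"))
print(time_series(panel, "BBB"))
```

Fixing the time index gives cross-sectional data, fixing the individual gives time series data, and keeping both dimensions gives panel data.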

Cross-sectional Data

Cross-sectional data is collected by observing many subjects (such as individuals, firms, countries, or regions) at the same point in time, or without regard to differences in time.

Company ID   Delisted     EPS       ROE   Profit Margin
3651         1           0.39      3.13            0.68
5296         1           0.13      0.28           -6.74
4975         1          -2.82    -19.67          -76.53
3613         1           4.05     20.88            3.92

Time Series Data

A time series is a sequence of data points, typically consisting of successive measurements made over a time interval.

Panel Data

Panel data refers to multi-dimensional data frequently involving measurements over time. Panel data contain observations of multiple variables obtained over multiple time periods for the same individuals.

Example

A simple market model is of the form

$$R_{i,t} = \alpha_{i,t} + \beta_{i,t} R_{m,t} + \varepsilon_{i,t},$$

where $R_{i,t}$ is the return on stock i at time t and $R_{m,t}$ is the market return on stocks at time t.

Model Evaluation

After getting the results, an evaluation step is necessary, and the performance measures differ by purpose.

Explanatory analysis: goodness-of-fit and model interpretation.

Prediction: RMSE (of prediction), ROC curve and AUC, cost analysis.

Forecasting: RMSE, MAE, MAPE, MASE, and so on.
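The forecasting measures named above can be written out in a few lines. This is a minimal sketch using the standard definitions; the series are hypothetical, and MASE is scaled here by the in-sample error of the naive forecast that repeats the previous observation.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train):
    """Mean absolute scaled error: MAE divided by the in-sample MAE of the
    naive forecast that repeats the previous observation."""
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae(actual, forecast) / naive_mae


# Hypothetical series (invented values).
train = [100.0, 102.0, 101.0, 105.0, 107.0]
actual = [110.0, 108.0, 112.0]
forecast = [108.0, 109.0, 115.0]

print(f"RMSE = {rmse(actual, forecast):.4f}")
print(f"MAE  = {mae(actual, forecast):.4f}")
print(f"MAPE = {mape(actual, forecast):.4f}%")
print(f"MASE = {mase(actual, forecast, train):.4f}")
```

RMSE penalizes large errors more heavily than MAE, MAPE is scale-free but breaks down when actual values are near zero, and a MASE below 1 means the forecast beats the naive benchmark on average.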

Spirit

max Profit(Θ) or min Cost(Θ) subject to model risk, where Θ is the result from a model.

Final Step: Model Deployment

Congratulations! From here we can deploy our statistical model! But wait, we still need to ask some questions.

What are the risk and the loss if your model is totally wrong?

Is your dataset reproducible? How often should you update your model?

If there is a structural change in your population, what should you do?

Model risk is very important! You should always be aware of the limitations of your model so that when your model dies, the loss is controllable.
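One simple way to act on these questions, offered only as an illustrative sketch and not as the lecture's method, is to monitor recent forecast errors after deployment and flag the model for review when they grow well beyond the level seen at evaluation time. The function names, the window of 5, and the tolerance of 1.5 are arbitrary assumptions.

```python
import math


def rolling_rmse(errors, window):
    """RMSE over the most recent `window` forecast errors."""
    recent = errors[-window:]
    return math.sqrt(sum(e ** 2 for e in recent) / len(recent))


def needs_review(errors, baseline_rmse, window=5, tolerance=1.5):
    """Flag the model when recent RMSE exceeds `tolerance` times the baseline.

    `window` and `tolerance` are arbitrary illustrative choices, not recommendations.
    """
    if len(errors) < window:
        return False  # not enough evidence yet
    return rolling_rmse(errors, window) > tolerance * baseline_rmse


# Hypothetical forecast errors: stable at first, then larger after a change.
baseline_rmse = 1.0
errors = [0.5, -0.8, 1.1, -0.4, 0.9, 2.5, -3.1, 2.8, -2.6, 3.0]

print(needs_review(errors[:5], baseline_rmse))
print(needs_review(errors, baseline_rmse))
```

A flag like this does not say why the model degraded. It only limits how long a failing model runs unnoticed, which is one way to keep the loss controllable.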

Next Lecture

The Next Lecture

In the next lecture, we will review probability theory at an advanced level!
