
INTRODUCTION TO PANEL DATA ANALYSIS

USING EVIEWS

FARIDAH NAJUNA MISMAN, PHD

DEPARTMENT OF FINANCE

FACULTY OF BUSINESS AND MANAGEMENT

UITM JOHOR

OUTLINE

1. Introduction

2. CLRM Assumptions

3. Static Panel Data Models

4. Getting Started with EViews 9

5. Data Analysis

6. Reading The Results

Panel Data Analysis Workshop-6 May 2018


1. INTRODUCTION

There are three types of data structure:

1. Time series data is collected at regular time intervals, such as every month or every year. (N = 1; t = 1, …, T)

• It usually represents the values for a single firm or a single variable at different points in time.

• Most macroeconomic data for real variables, e.g. GDP or consumption, are quarterly time series.

• Data for monetary variables such as interest rates are often monthly time series.

2. Cross-sectional data records the values for many different firms or households at a single point in time. (i = 1, …, N; T = 1)

3. Panel data combines the other two: values for all members of a panel or group of firms or households are measured at more than one point in time. (i = 1, …, N; t = 1, …, T)



Classical panel data: N > T, also known as a short or micro panel.

Macro panel: T > N, also known as a long panel.

Balanced panel: data are available for every cross-section unit in every period.

Number of observations: n = NT

Unbalanced panel: the number of periods T differs across individuals, so n = T_1 + … + T_N.
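
The definitions above can be checked mechanically. The following is a minimal sketch, assuming the panel is available as a list of (unit, period) pairs; the firm labels, years and the helper name `panel_shape` are hypothetical and only serve the illustration.

```python
from collections import defaultdict

def panel_shape(records):
    """Return (N, periods per unit, n_obs, is_balanced) for (unit, period) pairs."""
    periods = defaultdict(set)
    for unit, period in records:
        periods[unit].add(period)
    all_periods = set().union(*periods.values())
    counts = {unit: len(p) for unit, p in periods.items()}
    n_obs = sum(counts.values())
    # Balanced: every unit is observed in every period that appears in the data.
    balanced = all(p == all_periods for p in periods.values())
    return len(periods), counts, n_obs, balanced

# Hypothetical panels: firm labels and years are made up for illustration.
balanced_panel = [(f, y) for f in ("A", "B", "C") for y in (2015, 2016)]
unbalanced_panel = balanced_panel[:-1]  # drop firm C's 2016 observation

n_units, counts, n_obs, is_bal = panel_shape(balanced_panel)
print(n_units, n_obs, is_bal)        # N = 3, T = 2, so n = NT = 6

n_units_u, counts_u, n_obs_u, is_bal_u = panel_shape(unbalanced_panel)
print(n_units_u, n_obs_u, is_bal_u)  # n is now the sum of the T_i
```

In the unbalanced case n can no longer be computed as NT; it has to be counted unit by unit.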



The choice of econometric model depends on the type of data:

1. Cross-section model: applied to cross-section data sets.

2. Time-series model: normally applied to time series data to uncover long-run relations and short-run dynamics.

3. Panel data model: normally used to capture heterogeneity across units and to obtain a bigger sample size.

❖ Static panel data models: pooled OLS (POLS), fixed effects (FE), random effects (RE), between effects (BE)

❖ Dynamic panel data models: GMM

❖ Panel unit root and cointegration (macro panel)



• Advantages & Disadvantages

Panel data allow us to control for variables we cannot observe or measure, such as:

❖ Time-invariant factors, like geographical area or firm management characteristics.

❖ Variables that change over time but not across entities, like national policies, federal regulation and international agreements.

In other words, panel data can account for individual heterogeneity (uniqueness), which results in more efficient estimates.
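
To see how repeated observations remove an unobserved time-invariant factor, the sketch below compares a pooled regression with a regression on unit-demeaned data (the within transformation behind the fixed effects estimator). The data are hypothetical and noise-free by construction: y = 2x + alpha_i, with alpha_i larger for firms that also have larger x. The helper names are illustrative, not part of any library.

```python
def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_unit(units, values):
    """Subtract each unit's own mean from its observations."""
    totals, counts = {}, {}
    for u, v in zip(units, values):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - totals[u] / counts[u] for u, v in zip(units, values)]

# Hypothetical data: the unobserved effect alpha_i is correlated with x.
true_slope = 2.0
alpha = {"A": 0.0, "B": 10.0, "C": 20.0}
units = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [true_slope * xi + alpha[u] for u, xi in zip(units, x)]

pooled = ols_slope(x, y)  # ignores alpha_i
within = ols_slope(demean_by_unit(units, x), demean_by_unit(units, y))
print(f"pooled slope: {pooled:.3f}, within slope: {within:.3f}")
```

With these made-up numbers the pooled slope is 5.0 while the within slope recovers the true value 2.0, because demeaning wipes out anything that is constant within a unit.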



Advantages:

i. Larger sample size, more variation and less collinearity, which increase the precision of estimates

ii. Ability to study dynamics: repeated cross-sectional observations show adjustment over time

iii. Ability to account for heterogeneity across individuals, which is often ignored in pooled data; more robust against misspecification due to omitted variables

Disadvantages:

i. Data availability/maintenance

ii. Measurement errors

iii. Self-selection bias



• Why Analyse Panel Data?

We are interested in describing change over time

o social change, e.g. changing attitudes, behaviours, social relationships

o individual growth or development, e.g. life-course studies, child development, career trajectories, school achievement

o occurrence (or non-occurrence) of events

We want superior estimates of trends in social phenomena

o Panel models can be used to inform policy, e.g. health, obesity

o Multiple observations on each unit can provide superior estimates as compared to cross-sectional models of association

We want to estimate causal models

o Policy evaluation

o Estimation of treatment effects



• What kind of data are required for panel analysis?

Basic panel methods require at least two “waves” of measurement. Consider student GPAs and job hours during two semesters of college.

One way to organize the panel data is to create a single record for each combination of unit and time period.

Notice that the data include:

• A time-invariant unique identifier for each unit (Student ID)

• A time-varying outcome (GPA)

• An indicator for time (Semester)

Panel datasets can include other time-varying or time-invariant variables
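
The layout described above, one record per unit and time period, is often called long format. The following is a minimal sketch of building it from a wide table; all student IDs, GPAs and job hours are made up for illustration, and `wide_to_long` is a hypothetical helper rather than a library function.

```python
# Hypothetical wide-format data: one row per student, one column per
# variable and semester. All IDs and values are made up for illustration.
wide = [
    {"student_id": 1, "gpa_1": 3.2, "gpa_2": 3.4, "job_hours_1": 10, "job_hours_2": 5},
    {"student_id": 2, "gpa_1": 2.8, "gpa_2": 2.6, "job_hours_1": 0, "job_hours_2": 15},
]

def wide_to_long(rows, variables, periods):
    """Create one record per (unit, period) combination."""
    long_rows = []
    for row in rows:
        for t in periods:
            # Time-invariant identifier plus the indicator for time.
            record = {"student_id": row["student_id"], "semester": t}
            for var in variables:
                record[var] = row[f"{var}_{t}"]  # time-varying values
            long_rows.append(record)
    return long_rows

long_rows = wide_to_long(wide, variables=("gpa", "job_hours"), periods=(1, 2))
for record in long_rows:
    print(record)
```

Two students observed over two semesters give four records, each carrying the identifier, the time indicator and the time-varying variables.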



2. CLASSICAL LINEAR REGRESSION MODEL (CLRM)

Table taken from page 37 of “Applied Econometrics”, Asteriou & Hall, 2nd ed., 2011, Palgrave Macmillan.



3. PANEL DATA MODEL: POOLED OLS

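Pooled OLS: $y_{it} = \beta_0 + \beta_1 X_{it} + \alpha_i + \nu_{it}$

i. $\alpha_i$ and $\nu_{ij}$ are normally distributed and mutually independent,

ii. $E(\alpha_i) = E(\nu_{ij}) = 0$, for $i = 1, \dots, m$ and $j = 1, 2, \dots, m(i)$,

iii. $E(\alpha_i \alpha_{i'}) = \begin{cases} \sigma_1^2, & i = i' \\ 0, & \text{otherwise,} \end{cases}$

iv. $E(\nu_{ij} \nu_{i'j'}) = \begin{cases} \sigma_2^2, & i = i',\ j = j' \\ 0, & \text{otherwise.} \end{cases}$

Here $j$ indexes the observations on unit $i$, so it plays the role of the time index $t$ in the model equation.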



4. GETTING STARTED WITH EVIEWS 9



5. DATA ANALYSIS



DESCRIPTIVE STATISTICS



CORRELATION ANALYSIS



POOLED OLS REGRESSION



NORMALITY TEST



DUMMY VARIABLES



MODEL VALIDATION

1. Face validity: signs and magnitudes make sense

2. Statistical validity:

• Parameter significance: t-test

• Model fit: $R^2$

• Model significance: F-test

• Strength of effects: beta-coefficients

• Discussion of multicollinearity: correlation matrix

3. Predictive validity: how well the model predicts

• Out-of-sample forecast errors

© 2012 John Wiley & Sons Ltd.

www.wiley.com/college/sekaran


PARAMETER SIGNIFICANCE

• Testing that a specific parameter is significant (i.e., $\beta_j \neq 0$)

• $H_0$: $\beta_j = 0$

$H_1$: $\beta_j \neq 0$

• Test statistic: $t = b_j / SE_j \sim t_{n-k-1}$

with $b_j$ = the estimated coefficient for $\beta_j$

$SE_j$ = the standard error of $b_j$
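
As a worked check of the test statistic, the sketch below divides each coefficient by its standard error, using the values reported in the EViews output in the Reading the Results section of this document. The degrees of freedom assume k = 4 regressors excluding the intercept and n = 85 observations, as in that output.

```python
# Coefficients and standard errors as reported in the EViews output
# in the "Reading the Results" section of this document.
estimates = {
    "C": (12.83313, 2.387841),
    "FE": (-0.160617, 0.039199),
    "FQ": (2.032662, 0.380137),
    "CB": (0.362423, 0.185213),
    "CAPR": (-0.203388, 0.075746),
}

def t_statistic(b, se):
    """t = b_j / SE_j for the null hypothesis beta_j = 0."""
    return b / se

t_stats = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}

n, k = 85, 4    # observations; regressors excluding the intercept
df = n - k - 1  # degrees of freedom of the t-distribution

for name, t in t_stats.items():
    print(f"{name:>4}: t = {t:8.4f} (df = {df})")
```

The ratios agree with the t-Statistic column of the output up to the rounding of the printed coefficients and standard errors.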





P-VALUES

• This is an alternative to comparing the t-score with a critical value

• A p-value, or marginal significance level, is the probability of observing a t-score that size or larger (in absolute value) if the null hypothesis were true

• Graphically, it’s two times the area under the curve of the t-distribution between the absolute value of the actual t-score and infinity.

• In theory, we could find this by combing through pages and pages of statistical tables

• But we don’t have to, since we have EViews and Stata: these (and other) statistical software packages automatically give the p-values as part of the standard output!

• In light of all this, the p-value decision rule therefore is:

Reject $H_0$ if p-value$_K$ < the level of significance and if $\hat{\beta}_K$ has the sign implied by $H_A$
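
The graphical definition above (two times the tail area beyond the absolute t-score) can be reproduced numerically. This is a rough sketch using only the standard library: it integrates the t-density with Simpson's rule instead of calling a statistics package, and it assumes 80 degrees of freedom (85 observations, 4 regressors and an intercept). The t-scores are the ones reported for CB and CAPR in the EViews output later in this document.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Two times the upper-tail area beyond |t|, via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central_area)

df = 85 - 4 - 1
p_cb = two_sided_p_value(1.956787, df)
p_capr = two_sided_p_value(-2.685126, df)
print(f"CB: p = {p_cb:.4f}, CAPR: p = {p_capr:.4f}")
```

The results should be close to the Prob. values printed by EViews for the same rows; in practice the software output is the value to report.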


MEASURE OF OVERALL FIT: $R^2$

• $R^2$ measures the proportion of the variation in y that is explained by the variation in x.

• $R^2 = \dfrac{\text{total variation} - \text{unexplained variation}}{\text{total variation}}$

• $R^2$ takes on any value between zero and one:

• $R^2 = 1$: Perfect match between the line and the data points.

• $R^2 = 0$: There is no linear relationship between x and y.
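
The formula can be checked against the EViews output in the Reading the Results section of this document. The sketch below assumes that the reported "S.D. dependent var" uses the usual n - 1 divisor, so that total variation can be recovered from it; the unexplained variation is the reported "Sum squared resid".

```python
# Summary statistics as reported in the EViews output in the
# "Reading the Results" section of this document.
n = 85                        # total panel observations
sd_dependent = 5.639222       # S.D. dependent var
sum_squared_resid = 1678.770  # unexplained variation

# Total variation recovered from the sample standard deviation,
# assuming it was computed with the usual n - 1 divisor.
total_variation = (n - 1) * sd_dependent ** 2

r_squared = (total_variation - sum_squared_resid) / total_variation
print(f"R-squared = {r_squared:.6f}")
```

The value should agree with the R-squared line of the output up to rounding of the printed inputs.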




MODEL SIGNIFICANCE: F-TEST

• $H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (all slope parameters are zero)

$H_1$: Not $H_0$

• Test statistic (k = number of variables excluding the intercept)

$F = \dfrac{SS_{Reg}/k}{SS_e/(n - 1 - k)} \sim F_{k,\, n-1-k}$

$SS_{Reg}$ = variation explained by the regression

$SS_e$ = variation left unexplained by the regression
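
Because explained and unexplained variation are both shares of total variation, their ratio equals R-squared / (1 - R-squared), so the F-statistic can be computed from R-squared alone. The sketch below applies this to the values reported in the EViews output in the Reading the Results section, assuming k = 4 regressors excluding the intercept.

```python
n, k = 85, 4          # observations; regressors excluding the intercept
r_squared = 0.371546  # R-squared as reported in the EViews output

# SSReg / SSe expressed through R-squared: both are shares of total variation.
explained_to_unexplained = r_squared / (1 - r_squared)

f_stat = explained_to_unexplained * (n - 1 - k) / k
print(f"F({k}, {n - 1 - k}) = {f_stat:.4f}")
```

The result should match the F-statistic line of the output up to rounding of the printed R-squared.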




6. READING THE RESULTS

Dependent Variable: CR
Method: Panel Least Squares
Date: 05/23/17 Time: 17:06
Sample (adjusted): 1996 2011
Periods included: 16
Cross-sections included: 17
Total panel (unbalanced) observations: 85

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             12.83313     2.387841      5.374368    0.0000
FE            -0.160617    0.039199     -4.097434    0.0001
FQ             2.032662    0.380137      5.347179    0.0000
CB             0.362423    0.185213      1.956787    0.0539
CAPR          -0.203388    0.075746     -2.685126    0.0088

R-squared            0.371546    Mean dependent var      6.020596
Adjusted R-squared   0.340123    S.D. dependent var      5.639222
S.E. of regression   4.580898    Akaike info criterion   5.938690
Sum squared resid    1678.770    Schwarz criterion       6.082375
Log likelihood      -247.3943    Hannan-Quinn criter.    5.996484
F-statistic          11.82412    Durbin-Watson stat      0.735389
Prob(F-statistic)    0.000000

Notes on the output:

• Sample (adjusted): the time span included.

• Periods included: the number of time periods, T.

• Cross-sections included: the total number of groups, N.

• Total panel observations: n = NT in a balanced panel. This panel is unbalanced, so n = 85 is less than 16 × 17 = 272.

• C: the constant.

• Prob(F-statistic): if this number is < 0.05, the model is OK. This is the F-test of whether the coefficients in the model are jointly different from zero.



Reading the coefficient table of the output above, column by column:

• Coefficient: the coefficients of the regressors. Each indicates how much Y changes when X changes by one unit, holding the other regressors constant.

• t-Statistic: t-values test the null hypothesis that each coefficient is equal to 0. To reject it, the absolute t-value has to be higher than 1.96 (5% significance level, i.e. 95% confidence, in large samples). If this is the case, you can say that the variable has a significant influence on your dependent variable (Y). The larger the absolute t-value, the stronger the evidence against the null hypothesis.

• Prob.: two-tailed p-values test the same null hypothesis that each coefficient is equal to 0. To reject it, the p-value has to be lower than 0.05 (95% confidence). If this is the case, you can say that the variable has a significant influence on your dependent variable (Y).
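
The two decision rules can be applied row by row to the reported output. The sketch below uses the t-statistics and p-values printed by EViews in this section; the 1.96 threshold is the large-sample critical value mentioned in the notes, and the names `ALPHA` and `T_CRITICAL` are illustrative.

```python
# t-statistics and two-tailed p-values as reported in the EViews output
# in this section.
results = {
    "C": (5.374368, 0.0000),
    "FE": (-4.097434, 0.0001),
    "FQ": (5.347179, 0.0000),
    "CB": (1.956787, 0.0539),
    "CAPR": (-2.685126, 0.0088),
}

ALPHA = 0.05       # significance level
T_CRITICAL = 1.96  # large-sample two-sided critical value at 5%

decisions = {}
for name, (t, p) in results.items():
    by_t = abs(t) > T_CRITICAL  # rule based on the t-value
    by_p = p < ALPHA            # rule based on the p-value
    decisions[name] = (by_t, by_p)
    print(f"{name:>4}: |t| > 1.96: {by_t}, p < 0.05: {by_p}")

significant = [name for name, (by_t, by_p) in decisions.items() if by_t and by_p]
print("Significant at 5%:", significant)
```

Both rules agree for every row here: CB narrowly misses the 5% level, while the constant, FE, FQ and CAPR are significant.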



Reading the summary statistics of the output above:

• R-squared (0.371546) shows the share of the variance of Y explained by X.

• Adjusted R-squared (0.340123) shows the same as R-squared but adjusted for the number of cases and the number of variables.

• When the number of variables is small and the number of cases is very large, Adjusted R-squared is close to R-squared.
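
The adjustment can be made concrete with the standard adjusted R-squared formula, which is not stated on the slide and is assumed here: 1 - (1 - R²)(n - 1)/(n - k - 1), with k the number of regressors excluding the intercept. The second call uses a hypothetical larger sample to show the gap shrinking.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjust R-squared for n cases and k regressors (excluding the intercept)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Values reported in the EViews output in this section.
adj = adjusted_r_squared(0.371546, n=85, k=4)
print(f"Adjusted R-squared = {adj:.6f}")

# Hypothetical larger sample with the same R-squared and k: the gap shrinks.
adj_large = adjusted_r_squared(0.371546, n=8500, k=4)
print(f"With n = 8500: {adj_large:.6f}")
```

With n = 85 the result should match the Adjusted R-squared line of the output; with many more cases the adjusted value moves towards the unadjusted one.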

