INTRODUCTION TO PANEL DATA ANALYSIS
USING EVIEWS
FARIDAH NAJUNA MISMAN, PHD
DEPARTMENT OF FINANCE
FACULTY OF BUSINESS AND MANAGEMENT
UITM JOHOR
OUTLINE
1. Introduction
2. CLRM Assumptions
3. Static Panel Data Models
4. Getting Started with EViews 9
5. Data Analysis
6. Reading The Results
Panel Data Analysis Workshop-6 May 2018
1. INTRODUCTION
There are three types of data structure:
1. Time series data are collected at regular time intervals, such as every month or every year. (N = 1; t = 1, …, T)
• Usually these are the values for a single firm or a single variable at different points in time.
• Most macroeconomic data for real variables, e.g. GDP or consumption, are quarterly time series.
• Data for monetary variables such as interest rates are often monthly time series.
2. Cross-sectional data are the values for many different firms or households collected at a single point in time. (i = 1, …, N; T = 1)
3. Panel data combine the other two: we have values for all members of a panel or group of firms or households, measured at more than one point in time. (i = 1, …, N; t = 1, …, T)
1. INTRODUCTION
Classical panel data: N > T, also known as a short or micro panel.
Macro panel: T > N, also known as a long panel.
Balanced panel: data are available for every cross-section unit in every period.
Number of observations: n = NT
Unbalanced panel: the number of periods differs across individuals, so n = T_1 + … + T_N.
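As a rough illustration of the balanced/unbalanced distinction, the sketch below counts observations per unit in a small made-up panel and compares the total with N × T. The firm labels and years are hypothetical, not workshop data, and the function name is an illustrative choice.

```python
from collections import Counter

# Hypothetical (unit, period) pairs; firm "C" has no record for 2017.
observations = [
    ("A", 2016), ("A", 2017), ("A", 2018),
    ("B", 2016), ("B", 2017), ("B", 2018),
    ("C", 2016), ("C", 2018),
]

def describe_panel(obs):
    """Return N, T, n and whether every unit appears in every period.

    Assumes each (unit, period) pair appears at most once.
    """
    units = {unit for unit, _ in obs}
    periods = {period for _, period in obs}
    per_unit = Counter(unit for unit, _ in obs)
    n_units, n_periods, n_obs = len(units), len(periods), len(obs)
    balanced = all(count == n_periods for count in per_unit.values())
    return n_units, n_periods, n_obs, balanced

N, T, n, balanced = describe_panel(observations)
print(f"N={N}, T={T}, n={n}, N*T={N * T}, balanced={balanced}")
```

Here n is smaller than N × T because one unit is missing a period, which is exactly what makes the panel unbalanced.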
1. INTRODUCTION
The choice of econometric model depends on the type of data:
1. Cross-section model: applied to cross-section data sets.
2. Time-series model: normally applied to time series data, to uncover long-run relations and short-run dynamics.
3. Panel data model: normally used to capture heterogeneity across units and to obtain a bigger sample size.
❖ Static panel data models: pooled OLS (POLS), fixed effects (FE), random effects (RE), between effects (BE)
❖ Dynamic panel data models: generalized method of moments (GMM)
❖ Panel unit root and cointegration (macro panel)
1. INTRODUCTION
• Advantages & Disadvantages
Panel data allow us to control for variables we cannot observe or measure, such as:
❖ Time-invariant factors like geographical area or firm management characteristics.
❖ Variables that change over time but not across entities, like national policies, federal regulations and international agreements.
In other words, panel data can account for individual heterogeneity (uniqueness), which results in more efficient estimates.
1. INTRODUCTION
Advantages:
i. Larger sample size, more variation and less collinearity, which increase the precision of the estimates
ii. Ability to study dynamics: repeated cross-sectional observations show adjustment over time
iii. Ability to account for heterogeneity across individuals, which is often ignored in pooled data; estimates are more robust against misspecification due to omitted variables
Disadvantages:
i. Data availability/maintenance
ii. Measurement errors
iii. Self-selection bias
1. INTRODUCTION
• Why Analyse Panel Data?
We are interested in describing change over time:
o social change, e.g. changing attitudes, behaviours, social relationships
o individual growth or development, e.g. life-course studies, child development, career trajectories, school achievement
o occurrence (or non-occurrence) of events
We want superior estimates of trends in social phenomena:
o Panel models can be used to inform policy, e.g. health, obesity
o Multiple observations on each unit can provide superior estimates as compared to cross-sectional models of association
We want to estimate causal models:
o Policy evaluation
o Estimation of treatment effects
1. INTRODUCTION
• What kind of data are required for panel analysis?
Basic panel methods require at least two “waves” of measurement. Consider student
GPAs and job hours during two semesters of college
One way to organize the panel data is to create a single record for each combination of
unit and time period
Notice that the data include:
• A time-invariant unique identifier for each unit (Student ID)
• A time-varying outcome (GPA)
• An indicator for time (Semester)
Panel datasets can include other time-varying or time-invariant variables
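One way to picture the "single record per unit and time period" layout is the sketch below. The student IDs, GPAs and job hours are invented purely for illustration, and the field and function names are assumptions rather than names from the workshop dataset.

```python
# Long format: one record per (student, semester) combination.
# All values are hypothetical and serve only to show the layout.
records = [
    {"student_id": 1, "semester": 1, "gpa": 3.2, "job_hours": 10},
    {"student_id": 1, "semester": 2, "gpa": 3.0, "job_hours": 20},
    {"student_id": 2, "semester": 1, "gpa": 3.6, "job_hours": 0},
    {"student_id": 2, "semester": 2, "gpa": 3.7, "job_hours": 5},
]

def changes_between_waves(rows, variable):
    """Change in `variable` from semester 1 to semester 2, per student."""
    by_key = {(r["student_id"], r["semester"]): r[variable] for r in rows}
    students = sorted({r["student_id"] for r in rows})
    return {s: round(by_key[(s, 2)] - by_key[(s, 1)], 2) for s in students}

gpa_change = changes_between_waves(records, "gpa")
hours_change = changes_between_waves(records, "job_hours")
print("GPA change:      ", gpa_change)
print("Job hours change:", hours_change)
```

With two waves per unit, the within-student change in each variable can be computed directly, which is the basic ingredient of panel methods that compare a unit with itself over time.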
2. CLASSICAL LINEAR REGRESSION MODEL (CLRM)
Table taken from page 37 of “Applied Econometrics”, Asteriou & Hall, 2nd ed., 2011, Palgrave Macmillan.
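3. PANEL DATA MODEL: POOLED OLS
Pooled OLS: $y_{it} = \beta_0 + \beta_{it} X_{it} + \alpha_i + \nu_{it}$
i. $\alpha_i$ and $\nu_{it}$ are normally distributed and mutually independent,
ii. $E(\alpha_i) = E(\nu_{ij}) = 0$, for $i = 1, \dots, m$, $j = 1, 2, \dots, m(i)$,
iii. $E(\alpha_i \alpha_{i'}) = \begin{cases} \sigma_1^2, & i = i' \\ 0, & \text{otherwise,} \end{cases}$
iv. $E(\nu_{ij} \nu_{i'j'}) = \begin{cases} \sigma_2^2, & i = i',\ j = j' \\ 0, & \text{otherwise.} \end{cases}$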
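A short worked derivation may help show what assumptions (i) to (iv) imply. It is derived here from the stated assumptions rather than quoted from the slides. Write $\sigma_1^2$ and $\sigma_2^2$ for the variances of $\alpha_i$ and $\nu_{ij}$; the composite error $u_{ij} = \alpha_i + \nu_{ij}$ and the symbol $\rho$ are notation introduced only for this illustration.

```latex
\begin{aligned}
\operatorname{Var}(u_{ij})
  &= E(\alpha_i^2) + 2\,E(\alpha_i \nu_{ij}) + E(\nu_{ij}^2)
   = \sigma_1^2 + \sigma_2^2, \\
\operatorname{Cov}(u_{ij}, u_{ij'})
  &= E(\alpha_i^2) + E(\alpha_i \nu_{ij'}) + E(\nu_{ij} \alpha_i) + E(\nu_{ij} \nu_{ij'})
   = \sigma_1^2 \qquad (j \neq j'), \\
\rho
  &= \frac{\operatorname{Cov}(u_{ij}, u_{ij'})}{\operatorname{Var}(u_{ij})}
   = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}.
\end{aligned}
```

So two observations on the same unit share the component $\alpha_i$ and are correlated whenever $\sigma_1^2 > 0$, while observations on different units are uncorrelated.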
4. GETTING STARTED WITH EVIEWS 9
5. DATA ANALYSIS
DESCRIPTIVE STATISTICS
CORRELATION ANALYSIS
POOLED OLS REGRESSION
NORMALITY TEST
DUMMY VARIABLES
MODEL VALIDATION
1. Face validity: signs and magnitudes make sense
2. Statistical validity:
• Parameter significance: t-test
• Model fit: R²
• Model significance: F-test
• Strength of effects: beta-coefficients
• Discussion of multicollinearity: correlation matrix
3. Predictive validity: how well the model predicts
• Out-of-sample forecast errors
© 2012 John Wiley & Sons Ltd.
www.wiley.com/college/sekaran
PARAMETER SIGNIFICANCE
• Testing that a specific parameter is significant (i.e., $\beta_j \neq 0$)
• $H_0$: $\beta_j = 0$
  $H_1$: $\beta_j \neq 0$
• Test statistic: $t = b_j / SE_j \sim t_{n-k-1}$
  with $b_j$ = the estimated coefficient for $\beta_j$
  $SE_j$ = the standard error of $b_j$
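As a small worked check of t = b_j / SE_j, the sketch below recomputes the t-statistics from the coefficients and standard errors reported in the regression output of Section 6. The dictionary and function names are illustrative choices.

```python
# (coefficient, standard error) pairs copied from the Section 6 output.
estimates = {
    "C":    (12.83313, 2.387841),
    "FE":   (-0.160617, 0.039199),
    "FQ":   (2.032662, 0.380137),
    "CB":   (0.362423, 0.185213),
    "CAPR": (-0.203388, 0.075746),
}

def t_statistic(coefficient, std_error):
    """t = b_j / SE_j for the null hypothesis beta_j = 0."""
    return coefficient / std_error

t_stats = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}
for name, t in t_stats.items():
    print(f"{name:5s} t = {t:8.4f}")
```

The results agree with the t-Statistic column of the output up to the rounding of the printed coefficients and standard errors.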
P-VALUES
• This is an alternative way of carrying out the t-test.
• A p-value, or marginal significance level, is the probability of observing a t-score that size or larger (in absolute value) if the null hypothesis were true.
• Graphically, it is two times the area under the curve of the t-distribution between the absolute value of the actual t-score and infinity.
• In theory, we could find this by combing through pages and pages of statistical tables.
• But we don't have to, since we have EViews and Stata: these (and other) statistical software packages automatically give the p-values as part of the standard output.
• In light of all this, the p-value decision rule is:
Reject H0 if the p-value for coefficient k is below the level of significance and the estimated coefficient has the sign implied by HA.
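To make the "two times the tail area" description concrete, here is a minimal sketch that computes a two-sided p-value by numerically integrating the Student t density with Simpson's rule, using only the Python standard library. The t-score 1.956787 is the one reported for CB in the Section 6 output; the 80 degrees of freedom are an assumption (85 observations minus 5 estimated coefficients). In practice EViews or Stata reports this value directly.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_score, df, steps=2000):
    """2 * P(T > |t|); the area from 0 to |t| is found by Simpson's rule."""
    upper = abs(t_score)
    if upper == 0:
        return 1.0
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # Half of the distribution lies above zero, so the upper tail is 0.5 - area.
    return 2 * (0.5 - area_0_to_t)

p = two_sided_p_value(1.956787, df=80)
print(f"p-value = {p:.4f}")
print("reject H0 at 5%" if p < 0.05 else "do not reject H0 at 5%")
```

The same function can be called with any other t-score and degrees of freedom to see how the tail area shrinks as the absolute t-score grows.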
MEASURE OF OVERALL FIT: R²
• R² measures the proportion of the variation in y that is explained by the variation in x.
• $R^2 = \dfrac{\text{total variation} - \text{unexplained variation}}{\text{total variation}}$
• R² takes a value between zero and one:
• R² = 1: Perfect match between the line and the data points.
• R² = 0: There is no linear relationship between x and y.
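A quick numerical check of this formula, using figures from the regression output in Section 6. The sketch assumes that the total variation equals (S.D. dependent var)² × (n − 1) with n = 85 observations; the variable names are illustrative.

```python
# Figures copied from the Section 6 output.
n_obs = 85
sd_dependent = 5.639222        # S.D. dependent var
sum_squared_resid = 1678.770   # unexplained variation (Sum squared resid)

# Assumption: S.D. dependent var is the sample standard deviation
# (divisor n - 1), so total variation = sd^2 * (n - 1).
total_variation = sd_dependent ** 2 * (n_obs - 1)

r_squared = (total_variation - sum_squared_resid) / total_variation
print(f"R-squared = {r_squared:.4f}")
```

The result can be compared with the R-squared line of the output; small differences come from rounding in the printed figures.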
MODEL SIGNIFICANCE: F-TEST
• $H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (all slope parameters are zero)
  $H_1$: not $H_0$
• Test statistic (k = number of variables excluding the intercept):
  $F = \dfrac{SS_{Reg}/k}{SS_e/(n-1-k)} \sim F_{k,\,n-1-k}$
  $SS_{Reg}$ = variation explained by the regression
  $SS_e$ = variation left unexplained by the regression
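The F statistic can be reproduced by hand from the regression output in Section 6, as in the sketch below. It assumes that the total variation is (S.D. dependent var)² × (n − 1), so that the explained variation is the total minus the sum of squared residuals; the variable names are illustrative.

```python
# Figures copied from the Section 6 output.
n_obs = 85                 # total panel observations
k = 4                      # regressors excluding the intercept: FE, FQ, CB, CAPR
sd_dependent = 5.639222    # S.D. dependent var
ss_error = 1678.770        # Sum squared resid (unexplained variation)

# Assumption: total variation = sd^2 * (n - 1); explained = total - unexplained.
ss_total = sd_dependent ** 2 * (n_obs - 1)
ss_regression = ss_total - ss_error

df_error = n_obs - 1 - k
f_statistic = (ss_regression / k) / (ss_error / df_error)
print(f"F = {f_statistic:.3f} with ({k}, {df_error}) degrees of freedom")
```

The value can be compared with the F-statistic line of the output, again up to rounding of the printed inputs.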
6. READING THE RESULTS
Dependent Variable: CR
Method: Panel Least Squares
Date: 05/23/17 Time: 17:06
Sample (adjusted): 1996 2011
Periods included: 16
Cross-sections included: 17
Total panel (unbalanced) observations: 85

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C           12.83313     2.387841      5.374368     0.0000
FE          -0.160617    0.039199     -4.097434     0.0001
FQ           2.032662    0.380137      5.347179     0.0000
CB           0.362423    0.185213      1.956787     0.0539
CAPR        -0.203388    0.075746     -2.685126     0.0088

R-squared            0.371546    Mean dependent var      6.020596
Adjusted R-squared   0.340123    S.D. dependent var      5.639222
S.E. of regression   4.580898    Akaike info criterion   5.938690
Sum squared resid    1678.770    Schwarz criterion       6.082375
Log likelihood      -247.3943    Hannan-Quinn criter.    5.996484
F-statistic          11.82412    Durbin-Watson stat      0.735389
Prob(F-statistic)    0.000000

Notes on the output:
• Periods included: the number of time periods included (T).
• Cross-sections included: the total number of groups (N).
• Total panel observations: n = NT in a balanced panel. This panel is unbalanced, so n = 85 is less than 17 × 16 = 272.
• C: the constant.
• Prob(F-statistic): if this number is < 0.05, the model as a whole is statistically significant. This is the F-test of whether the coefficients in the model are jointly different from zero.
Coefficient   Std. Error   t-Statistic   Prob.
 12.83313     2.387841      5.374368     0.0000
 -0.160617    0.039199     -4.097434     0.0001
  2.032662    0.380137      5.347179     0.0000
  0.362423    0.185213      1.956787     0.0539
 -0.203388    0.075746     -2.685126     0.0088
Coefficient: the coefficients of the regressors. They indicate how much Y changes when X changes by one unit.
t-Statistic: the t-values test the hypothesis that each coefficient is different from 0. To reject the null hypothesis of a zero coefficient, the absolute t-value has to be higher than about 1.96 (5% significance level, two-tailed). If this is the case, you can say that the variable has a significant influence on your dependent variable (Y). The higher the absolute t-value, the stronger the evidence against a zero coefficient.
Prob.: the two-tail p-values test the hypothesis that each coefficient is different from 0. To reject the null hypothesis, the p-value has to be lower than 0.05 (95% confidence level). If this is the case, you can say that the variable has a significant influence on your dependent variable (Y).
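The p-value rule can be applied mechanically to the Prob. column, as in this short sketch. The variable names come from the full output table above; the function name and the extra 10% comparison are illustrative additions, not part of the slides.

```python
# Two-tail p-values (Prob. column) copied from the output above.
p_values = {"C": 0.0000, "FE": 0.0001, "FQ": 0.0000, "CB": 0.0539, "CAPR": 0.0088}

def significant_terms(p_by_name, alpha=0.05):
    """Names whose two-tail p-value is below the significance level alpha."""
    return [name for name, p in p_by_name.items() if p < alpha]

at_5_percent = significant_terms(p_values)
at_10_percent = significant_terms(p_values, alpha=0.10)
print("significant at 5%: ", at_5_percent)
print("significant at 10%:", at_10_percent)
```

CB is the borderline case: its t-statistic of 1.956787 is just under 1.96 and its p-value of 0.0539 is just above 0.05, so the two rules agree.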
R-squared            0.371546    Mean dependent var      6.020596
Adjusted R-squared   0.340123    S.D. dependent var      5.639222
S.E. of regression   4.580898    Akaike info criterion   5.938690
Sum squared resid    1678.770    Schwarz criterion       6.082375
Log likelihood      -247.3943    Hannan-Quinn criter.    5.996484
F-statistic          11.82412    Durbin-Watson stat      0.735389
Prob(F-statistic)    0.000000
R-squared shows the proportion of the variance of Y explained by X.
Adjusted R-squared shows the same as R-squared, but adjusted for the number of cases and the number of variables.
When the number of variables is small and the number of cases is very large, adjusted R-squared is close to R-squared.
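The adjustment can be checked with the usual textbook formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The formula is not shown on the slides, so treat this as a sketch under that assumption. The inputs are taken from the output above (85 observations, 4 regressors excluding the constant), and the n = 8500 case is a made-up comparison.

```python
def adjusted_r_squared(r_squared, n_obs, k):
    """1 - (1 - R^2) * (n - 1) / (n - k - 1); k excludes the intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

# Figures from the output above: R-squared, 85 observations, 4 regressors.
adj = adjusted_r_squared(0.371546, n_obs=85, k=4)
print(f"Adjusted R-squared = {adj:.6f}")

# Hypothetical comparison: same R-squared and regressors, many more cases.
adj_large_n = adjusted_r_squared(0.371546, n_obs=8500, k=4)
print(f"With n = 8500:       {adj_large_n:.6f}")
```

With far more cases than variables the penalty becomes negligible, which is the point made in the last note above.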