6034 - classical linear regression model.pptx
TRANSCRIPT
1
Two Variable Regression model: The problem of estimation
2
Sample and population regression line
3
4
 i      Yi        Xi        yi=Yi-Ȳ    xi=Xi-X̄      xi·yi       xi²
 1     52.25    258.30    -60.0535   -286.635    17213.43    82159.62
 2     58.32    343.10    -53.9835   -201.835    10895.76    40737.37
 3     81.79    425.00    -30.5135   -119.935     3659.64    14384.40
 4    119.90    467.50      7.5965    -77.435     -588.23     5996.18
 5    125.80    482.90     13.4965    -62.035     -837.26     3848.34
 6    100.46    487.70    -11.8435    -57.235      677.86     3275.85
 7    121.51    496.50      9.2065    -48.435     -445.92     2345.95
 8    100.08    519.40    -12.2235    -25.535      312.13      652.04
 9    127.75    543.30     15.4465     -1.635      -25.26        2.67
10    104.94    548.70     -7.3635      3.765      -27.72       14.18
11    107.48    564.60     -4.8235     19.665      -94.85      386.71
12     98.48    588.30    -13.8235     43.365     -599.46     1880.52
13    181.21    591.30     68.9065     46.365     3194.85     2149.71
14    122.23    607.30      9.9265     62.365      619.07     3889.39
15    129.57    611.20     17.2665     66.265     1144.16     4391.05
16     92.84    631.00    -19.4635     86.065    -1675.13     7407.18
17    117.92    659.60      5.6165    114.665      644.02    13148.06
18     82.13    664.00    -30.1735    119.065    -3592.61    14176.47
19    182.28    704.20     69.9765    159.265    11144.81    25365.34
20    139.13    704.80     26.8265    159.865     4288.62    25556.82
Sum  2246.07  10898.70                            45907.91   251767.87
Ȳ = ΣYi / n = 112.3035        X̄ = ΣXi / n = 544.935
β̂2 = Σxiyi / Σxi² = 45907.91 / 251767.87 = 0.1823
β̂1 = Ȳ - β̂2X̄ = 12.9388
SRF: Ŷi = β̂1 + β̂2Xi
Ŷi = 12.9388 + 0.1823Xi
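The calculations above can be reproduced directly from the Y and X columns of the table. The following is a minimal Python sketch (not from the slides) that computes the OLS slope and intercept from the raw data:

```python
# Y and X values taken from the table above (20 observations).
Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75, 104.94,
     107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13, 182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30, 548.70,
     564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00, 704.20, 704.80]

n = len(Y)
y_bar = sum(Y) / n          # sample mean of Y (slide: 112.3035)
x_bar = sum(X) / n          # sample mean of X (slide: 544.935)
# Deviations from the means
y = [yi - y_bar for yi in Y]
x = [xi - x_bar for xi in X]
# OLS slope and intercept
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
alpha_hat = y_bar - beta_hat * x_bar
print(round(beta_hat, 4), round(alpha_hat, 4))  # slide reports 0.1823 and 12.9388
```

The result agrees with the slide's SRF up to rounding of the intermediate sums.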
5
Properties of SRF
1. It passes through the sample means of X and Y, the point (X̄, Ȳ)
2. The mean value of the estimated Y (Ŷi) is equal to the mean value of the actual Y
3. The mean value of the residuals ûi is zero
4. The residuals ûi are uncorrelated with the predicted Ŷi
5. The residuals ûi are uncorrelated with Xi
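These five properties hold mechanically for any OLS fit with an intercept. A small numerical check, using an illustrative dataset of my own (not from the slides):

```python
# Verify the SRF properties numerically on toy data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) \
     / sum((xi - x_bar) ** 2 for xi in X)
b1 = y_bar - b2 * x_bar                        # line passes through (X̄, Ȳ)

Y_hat = [b1 + b2 * xi for xi in X]             # fitted values
u_hat = [yi - yh for yi, yh in zip(Y, Y_hat)]  # residuals

def cross(a, b):
    # sum of cross-products of deviations (proportional to the covariance)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

print(abs(sum(Y_hat) / n - y_bar) < 1e-9)   # property 2: mean(Ŷ) = mean(Y)
print(abs(sum(u_hat)) < 1e-9)               # property 3: residuals sum to zero
print(abs(cross(u_hat, Y_hat)) < 1e-9)      # property 4: û uncorrelated with Ŷ
print(abs(cross(u_hat, X)) < 1e-9)          # property 5: û uncorrelated with X
```

All four checks print True; property 1 holds by construction since b1 = Ȳ - b2X̄.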
6
The Classical Linear Regression Model
Assumptions (pertain to the PRF):
1. Linear in the parameters.
2. Fixed X values in repeated samples (fixed regressor), or X values independent of the error term (stochastic regressor).
3. Zero mean value of the disturbance ui: E(ui | Xi) = 0;
Yi = β1 + β2Xi + ui
7
The Classical Linear Regression Model
4. Constant variance of ui (homoscedasticity): var(ui) = σ².
If the variance differs across observations, there is heteroscedasticity.
8
The Classical Linear Regression Model
5. No autocorrelation between the disturbances ui.
6. The number of observations n must be greater than the number of parameters to be estimated.
7. There must be variation in the values of the X variables.
8. No exact collinearity between the X variables.
9. There is no specification bias.
9
Precision or standard errors of Least-squares estimates
Standard errors of estimates:
• Standard error: the standard deviation of the sampling distribution of the estimator, i.e., of the set of values of the estimator obtained from all possible samples of the same size from a given population.
10
• σ̂ = √(Σûi² / (n - 2)): the standard error of estimate, or standard error of the regression
(the standard deviation of the Y values about the estimated regression line)
Important feature of the variance of β̂2: var(β̂2) = σ² / Σxi²
The variance of β̂2 is directly proportional to σ² but inversely proportional to Σxi².
-> Given σ², the larger the variation in the X values, the smaller the variance of β̂2, implying greater precision.
-> Given Σxi², the larger σ², the larger the variance of β̂2.
-> As the sample size n increases, the number of terms in Σxi² increases, so the variance of β̂2 becomes smaller.
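The first of these comparative statics can be seen in a quick Monte Carlo sketch. The data-generating process below (Y = 1 + 2X + u with u ~ N(0, 1)) and the two X designs are illustrative assumptions of mine, not from the slides:

```python
import random
random.seed(0)

def slope_sd(X, sigma=1.0, reps=2000):
    """Empirical standard deviation of the OLS slope over repeated samples,
    with assumed true model Y = 1 + 2X + u, u ~ N(0, sigma^2)."""
    n = len(X)
    x_bar = sum(X) / n
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    slopes = []
    for _ in range(reps):
        Y = [1 + 2 * xi + random.gauss(0, sigma) for xi in X]
        y_bar = sum(Y) / n
        b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
        slopes.append(b2)
    m = sum(slopes) / reps
    return (sum((b - m) ** 2 for b in slopes) / reps) ** 0.5

narrow = slope_sd([4.5, 4.75, 5.0, 5.25, 5.5])  # little variation in X
wide   = slope_sd([1.0, 3.0, 5.0, 7.0, 9.0])    # large variation in X
print(narrow > wide)  # wider X spread -> smaller sampling variance of β̂2
```

The empirical standard deviations land near the theoretical values σ/√Σxi² (about 1.26 for the narrow design and 0.16 for the wide one).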
11
Properties of Least Square Estimators
The Gauss-Markov Theorem
Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE (best linear unbiased estimators).
1. Linear function of a random variable, such as the dependent variable Y in the regression model.
2. Unbiased: its average or expected value is equal to the true value.
3. Minimum variance in the class of all such linear unbiased estimators -> efficient estimator
12
Coefficient of determination
How well does the sample regression line fit the data? The goodness of fit of the fitted regression line to the data set.
-> coefficient of determination r²
Circle Y: variation in the dependent variable Y. Circle X: variation in the independent variable X.
Shaded area (overlap): the extent to which the variation in Y is explained by the variation in X.
r² is a numerical measure of this overlap; it lies between 0 and 1.
In deviation form
Σyi² = Σ(ŷi + ûi)² = Σŷi² + Σûi²    (since Σŷiûi = 0)
• Total sum of squares (TSS) = total variation of the actual Y values about their sample mean
• Explained sum of squares (ESS) = variation of the estimated Y values about their sample mean; the sum of squares due to regression (the explanatory variable)
• Residual sum of squares (RSS) = unexplained variation of the Y values about the regression line
TSS = ESS + RSS
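The decomposition can be verified numerically. The dataset below is a toy example of my own, not the lecture's table:

```python
# Decompose the variation in Y into explained and unexplained parts.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) \
     / sum((xi - x_bar) ** 2 for xi in X)
b1 = y_bar - b2 * x_bar
Y_hat = [b1 + b2 * xi for xi in X]

TSS = sum((yi - y_bar) ** 2 for yi in Y)               # total variation
ESS = sum((yh - y_bar) ** 2 for yh in Y_hat)           # explained variation
RSS = sum((yi - yh) ** 2 for yi, yh in zip(Y, Y_hat))  # unexplained variation
r2 = ESS / TSS

print(abs(TSS - (ESS + RSS)) < 1e-9)  # TSS = ESS + RSS holds
print(round(r2, 4))                   # ≈ 0.9976 for these toy numbers
```

The identity TSS = ESS + RSS holds for any OLS fit with an intercept, which is exactly why r² = ESS/TSS stays between 0 and 1.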
14
Coefficient of determination (r²)
r² = ESS / TSS measures the proportion or percentage of the total variation in Y explained by the regression model.
i. A nonnegative quantity
ii. Lies in [0, 1]
What is the value of r² from exercise 1?
15
Alternatively: r² = 1 - RSS / TSS
16
Coefficient of correlation
17
Properties of r:
1. Can be positive or negative
2. Lies in [-1, 1]
3. Symmetrical: r(X, Y) = r(Y, X)
4. Independent of origin and scale
5. If X and Y are statistically independent, r = 0. But zero correlation does not necessarily imply independence.
6. A measure of linear association or linear dependence only.
7. Does not necessarily imply any cause-and-effect relationship.
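Properties 3, 4, and 6 are easy to demonstrate with toy data of my own (not from the slides). In particular, an exact but purely nonlinear dependence can still give r = 0:

```python
def corr(a, b):
    # Pearson correlation coefficient
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]

print(abs(corr(X, Y) - corr(Y, X)) < 1e-12)   # property 3: symmetry
X2 = [10 + 3 * x for x in X]                  # change origin and scale of X
print(abs(corr(X, Y) - corr(X2, Y)) < 1e-12)  # property 4: r is unchanged

U = [-2.0, -1.0, 0.0, 1.0, 2.0]
V = [u ** 2 for u in U]          # V depends on U exactly, but nonlinearly
print(abs(corr(U, V)) < 1e-12)   # yet r = 0: r measures linear association only
```

The last check illustrates property 5 as well: zero correlation here does not mean U and V are independent.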
18
Classical Normal Linear Regression Model
The values of the OLS estimators change from sample to sample: they are random variables.
Need to find out their probability distribution.
The Normality Assumption for ui
The classical normal linear regression model assumes that each ui is distributed normally: ui ~ N(0, σ²), with mean E(ui) = 0, variance var(ui) = σ², and cov(ui, uj) = 0 for i ≠ j.
19
• Theoretical justification: the central limit theorem (CLT). Given random and independent samples of N observations each, the distribution of sample means approaches normality as N increases, regardless of the shape of the population distribution.
The disturbances ui represent the combined influence (on the dependent variable) of a large number of independent variables that are not explicitly introduced in the regression model. By the CLT, if there is a large number of independent and identically distributed random variables, the distribution of their sum tends to a normal distribution as the number of variables increases indefinitely.
By a variant of the CLT, even if the number of variables is not very large, or if these variables are not strictly independent, their sum may still be normally distributed.
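The CLT argument can be illustrated with a quick simulation (my own sketch, with uniform shocks chosen as an arbitrary non-normal distribution):

```python
# Sums of many independent uniform shocks look approximately normal.
import random
random.seed(2)

def standardized_sum(k):
    # Sum of k independent U(-0.5, 0.5) shocks, standardized to unit variance.
    s = sum(random.uniform(-0.5, 0.5) for _ in range(k))
    return s / (k / 12) ** 0.5   # the variance of one U(-0.5, 0.5) draw is 1/12

draws = [standardized_sum(30) for _ in range(10000)]
# For a standard normal, about 68% of draws fall within one standard deviation.
share = sum(1 for d in draws if abs(d) <= 1) / len(draws)
print(share)
```

With 30 shocks per sum, the share lands close to the normal benchmark of 0.68 even though each individual shock is far from normal.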
20
• One property of normal distribution:
Any linear function of normally distributed variables is itself normally distributed.
-> If the ui are normally distributed, β̂1 and β̂2 are also normally distributed (they are linear functions of the ui).
21
Properties of Least Square Estimators under the Normality assumption
1. Unbiased
2. Minimum variance
3. Consistency: as the sample size increases indefinitely, the estimators converge to their true population values.
4. β̂1 is normally distributed with mean β1 and variance σ²ΣXi² / (nΣxi²).
5. β̂2 is normally distributed with mean β2 and variance σ² / Σxi².
22
23
6. The variable (n - 2)σ̂² / σ² follows the chi-square distribution with (n - 2) df.
7. β̂1 and β̂2 are distributed independently of σ̂².
8. β̂1 and β̂2 have minimum variance in the entire class of unbiased estimators, whether linear or not.
The normality assumption on ui enables us to derive the probability distributions of β̂1, β̂2, and σ̂² -> estimation and hypothesis testing.
Note that β̂2 is normally distributed since it is a linear function of the normally distributed ui.
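Properties 4 and 5 can be checked by simulation. The data-generating process below (β1 = 1, β2 = 2, σ = 1, and a fixed X design) is an illustrative assumption of mine:

```python
import random
random.seed(1)

beta1, beta2, sigma = 1.0, 2.0, 1.0
X = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(X)
x_bar = sum(X) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)   # Σxi² = 10 here

slopes = []
for _ in range(5000):
    Y = [beta1 + beta2 * xi + random.gauss(0, sigma) for xi in X]
    y_bar = sum(Y) / n
    slopes.append(sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx)

mean = sum(slopes) / len(slopes)
var = sum((b - mean) ** 2 for b in slopes) / len(slopes)
print(mean)                  # close to the true β2 = 2 (unbiasedness)
print(var, sigma ** 2 / sxx) # empirical variance close to σ²/Σxi² = 0.1
```

A histogram of the 5000 slopes would also show the bell shape predicted by the normality of the ui.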
24
Interval Estimation
Example: β̂2 = 0.7240 is a point estimator -> construct an interval around the point estimator
Pr(β̂2 - δ ≤ β2 ≤ β̂2 + δ) = 1 - α: the probability that the random interval contains the true β2 is 1 - α
• (β̂2 - δ, β̂2 + δ): confidence interval
• 1 - α: confidence coefficient
• α: level of significance
• Endpoints of the CI: confidence limits (lower / upper)
25
Confidence intervals for β1 and β2
The OLS estimators β̂1 and β̂2 are normally distributed.
Standardized normal variable: Z = (β̂2 - β2) / (σ / √Σxi²) ~ N(0, 1)
However, σ² is unknown -> replace σ by σ̂ -> t = (β̂2 - β2) / se(β̂2) follows the t distribution with n - 2 df.
26
t_{α/2}: the critical t value at the α/2 level of significance; the value of the t variable obtained from the t distribution for the α/2 significance level and n - 2 df.
27
100(1 - α)% confidence interval for β2: β̂2 ± t_{α/2} se(β̂2)
• The larger the standard error, the larger the width of the CI.
• The larger the standard error, the greater the uncertainty of estimating the true value of the unknown parameter.
The standard error of an estimator is thus a measure of its precision: how precisely the estimator measures the true population value.
Example: β̂2 = 0.7240, se(β̂2) = 0.07, n = 13.
What is the critical value from the t table assuming α = 0.05? What is the 95 percent confidence interval for β2?
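The arithmetic for this exercise is short enough to script. With df = n - 2 = 11, a standard t table gives a two-tailed critical value of 2.201 for α = 0.05:

```python
# CI for β2 using the slide's example numbers.
beta2_hat = 0.7240
se = 0.07
t_crit = 2.201   # t_{0.025, 11} from a standard t table

lower = beta2_hat - t_crit * se
upper = beta2_hat + t_crit * se
print(round(lower, 4), round(upper, 4))   # the 95% CI for β2
```

The interval comes out to roughly (0.5699, 0.8781), so values of β2 outside that range would be rejected at the 5 percent level.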
28
29
100(1 - α)% confidence interval for β1: β̂1 ± t_{α/2} se(β̂1)
Interpretation:
Given the confidence coefficient of 95 percent, in 95 out of 100 cases intervals constructed in this way will contain the true parameter value.
However, for any single specified (fixed) interval, the probability that it includes the true value is either 1 or 0.
30
Simple Regression in Eviews:• Step 1: Open Eviews
• Step 2: Click on File/New/Workfile in order to create a new file
• Step 3: Choose the frequency of the data in the case of time series data or Undated or Irregular in the case of cross-sectional data, and specify the start and end of your data set. Eviews will open a new window which automatically contains a constant (c) and a residual (resid) series.
• Step 4: On the command line type:
genr x=0 (press enter)
genr y=0 (press enter)
which creates two new series named x and y that contain zeros for every observation. Open x and y as a group by selecting them and double clicking with the mouse.
• Step 5: Either type the data or copy/paste from Excel. To be able to type (edit) the data of your series or to paste anything into the Eviews cells, the edit +/- button must be pressed. After editing the series press the edit +/- button again to lock or secure the data.
• Step 6: Once the data have been entered into Eviews, the regression line may be estimated either by typing
ls y c x (press enter)
on the command line, or by clicking on Quick/Estimate equation and then writing your equation (y c x) in the new window.
Do exercise 3.20.