![Page 1: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/1.jpg)
Analysis of Individual Variables
• Descriptive
  – Measures of Central Tendency
    • Mean – Average score of the distribution (1st moment)
    • Median – Middle score (50th percentile) of the distribution
  – Measures of Variation (used to measure the spread of the distribution relative to the measures of central tendency)
    • Range – Distance between the lowest and highest data points
    • Mean Deviation – Average absolute distance between the mean and the data points
    • Variance – Average squared distance from the mean (2nd central moment)
    • Standard Deviation – Square root of the variance
![Page 2: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/2.jpg)
Analysis of Individual Variables
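| Obs | Income |
|-----|--------|
| 1 | 20.50 |
| 2 | 31.50 |
| 3 | 47.70 |
| 4 | 26.20 |
| 5 | 44.00 |
| 6 | 8.28 |
| 7 | 30.80 |
| 8 | 17.20 |
| 9 | 19.90 |
| 10 | 9.96 |
| 11 | 55.80 |
| 12 | 25.20 |
| 13 | 29.00 |
| 14 | 85.50 |
| 15 | 15.10 |
| 16 | 28.50 |
| 17 | 21.40 |
| 18 | 17.70 |
| 19 | 6.42 |
| 20 | 84.90 |

| Statistic | Value |
|-----------|-------|
| Mean | 31.28 |
| Median | 25.70 |
| Variance | 500.68 |
| Stdev | 22.38 |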
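The summary statistics on this slide can be reproduced with a short sketch like the one below. It assumes the slide reports the sample variance (dividing by n - 1), which is what the 500.68 figure is consistent with; the variable names are illustrative only.

```python
import statistics

# Income observations 1..20 as listed on the slide.
incomes = [20.50, 31.50, 47.70, 26.20, 44.00, 8.28, 30.80, 17.20, 19.90, 9.96,
           55.80, 25.20, 29.00, 85.50, 15.10, 28.50, 21.40, 17.70, 6.42, 84.90]

# Measures of central tendency.
mean = statistics.mean(incomes)
median = statistics.median(incomes)

# Measures of variation.
value_range = max(incomes) - min(incomes)
mean_deviation = sum(abs(x - mean) for x in incomes) / len(incomes)
variance = statistics.variance(incomes)  # sample variance: divides by n - 1
stdev = statistics.stdev(incomes)        # square root of the sample variance

print(f"Mean {mean:.2f}, Median {median:.2f}")
print(f"Range {value_range:.2f}, Mean deviation {mean_deviation:.2f}")
print(f"Variance {variance:.2f}, Stdev {stdev:.2f}")
```

Using `statistics.pvariance` instead would divide by n and give a slightly smaller number than the one on the slide.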
![Page 3: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/3.jpg)
Analysis of Relationship among Variables
• Correlation
• Regression
  – Two Variable Models
  – Multiple Variable Models
  – Discrete Dependent Variable Models
![Page 4: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/4.jpg)
Scatter Plot of Money Supply Growth and Inflation
![Page 5: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/5.jpg)
Correlation
• A scatter plot is a graph that shows the relationship between the observations for two data series in two dimensions
• Correlation analysis expresses this numerically
  – In contrast to a scatter plot, which graphically depicts the relationship between two data series, correlation analysis expresses this same relationship using a single number
  – The correlation coefficient is a measure of how closely related two data series are
  – The correlation coefficient measures the linear association between two variables
![Page 6: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/6.jpg)
Correlation
• Determines the association between 2 variables
• Measured on a scale from +1 to -1
  – values close to +1.0 indicate a strong positive relationship
  – values close to -1.0 indicate a strong negative relationship
  – values close to 0 indicate little or no linear relationship

[Figure: scale running from +1 through 0 to -1]
![Page 7: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/7.jpg)
Variables with Perfect Positive Correlation
![Page 8: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/8.jpg)
Variables with Perfect Negative Correlation
![Page 9: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/9.jpg)
Variables with a Correlation of 0
![Page 10: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/10.jpg)
Variables with a Non-Linear Association
![Page 11: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/11.jpg)
Calculating correlations
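• The sample correlation coefficient ‘r’ is,

$$
r = \frac{\operatorname{Cov}(X, Y)}{s_X \, s_Y}
$$

$$
\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \qquad
s_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \qquad
s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}
$$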
![Page 12: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/12.jpg)
Calculating correlations
• E.g.: Is it true that higher education leads to higher compensation?
  – To answer this question, we need to look at the data and calculate the correlation

| Years of Education | Compensation (000) |
|--------------------|--------------------|
| 17.97 | 163.30 |
| 22.86 | 142.05 |
| 17.25 | 100.00 |
| 13.35 | 103.55 |
| 14.97 | 90.00 |
| 15.87 | 97.50 |
| 13.17 | 90.00 |
| 11.10 | 80.00 |
| 13.86 | 90.25 |
| 8.97 | 49.50 |
![Page 13: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/13.jpg)
Calculating correlations
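• The sample correlation coefficient ‘r’ is,

$$
r = \frac{\operatorname{Cov}(X, Y)}{s_X \, s_Y}
$$

$$
\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}, \qquad
s_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \qquad
s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}
$$

so we need to calculate

  – $\bar{X}$, the average of X
  – $\bar{Y}$, the average of Y
  – $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$
  – $(X_i - \bar{X})^2$ and $(Y_i - \bar{Y})^2$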
![Page 14: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/14.jpg)
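Calculating correlations

| Years of Ed. | Comp (000) | (X-XBar)² | (Y-YBar)² | (X-XBar)(Y-YBar) |
|--------------|------------|-----------|-----------|------------------|
| 17.97 | 163.30 | 9.20 | 3929.41 | 190.12 |
| 22.86 | 142.05 | 62.77 | 1716.86 | 328.29 |
| 17.25 | 100.00 | 5.35 | 0.38 | -1.42 |
| 13.35 | 103.55 | 2.52 | 8.61 | -4.66 |
| 14.97 | 90.00 | 0.00 | 112.68 | -0.35 |
| 15.87 | 97.50 | 0.87 | 9.70 | -2.91 |
| 13.17 | 90.00 | 3.12 | 112.68 | 18.76 |
| 11.10 | 80.00 | 14.72 | 424.98 | 79.10 |
| 13.86 | 90.25 | 1.16 | 107.43 | 11.16 |
| 8.97 | 49.50 | 35.61 | 2612.74 | 305.00 |
| Sums: | | 135.32 | 9035.48 | 923.10 |

Calculations

| Quantity | Value |
|----------|-------|
| XBar | 14.94 |
| YBar | 100.62 |
| n - 1 | 9.00 |
| Covariance | 102.57 |
| SX | 3.88 |
| SY | 31.69 |
| r | 0.83 |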
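The worked calculation above can be sketched in code as follows. This is a minimal illustration that follows the same steps as the slide (means, deviations, sums, then covariance and standard deviations with n - 1); the function and variable names are made up for this example.

```python
import math

years_of_education = [17.97, 22.86, 17.25, 13.35, 14.97,
                      15.87, 13.17, 11.10, 13.86, 8.97]
compensation = [163.30, 142.05, 100.00, 103.55, 90.00,
                97.50, 90.00, 80.00, 90.25, 49.50]


def sample_correlation(xs, ys):
    """Sample correlation r = Cov(X, Y) / (s_X * s_Y), using n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products, as in the table.
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    covariance = sum_xy / (n - 1)
    s_x = math.sqrt(sum_xx / (n - 1))
    s_y = math.sqrt(sum_yy / (n - 1))
    return covariance / (s_x * s_y)


r = sample_correlation(years_of_education, compensation)
print(f"r = {r:.2f}")
```

The result should agree with the slide's r of about 0.83, up to rounding in the intermediate columns.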
![Page 15: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/15.jpg)
Calculating correlations (EXCEL)
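| Years of Ed. | Comp (000) |
|--------------|------------|
| 17.97 | 163.30 |
| 22.86 | 142.05 |
| 17.25 | 100.00 |
| 13.35 | 103.55 |
| 14.97 | 90.00 |
| 15.87 | 97.50 |
| 13.17 | 90.00 |
| 11.10 | 80.00 |
| 13.86 | 90.25 |
| 8.97 | 49.50 |

Correlation: =CORREL(array1, array2)
Correlation: 0.83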
![Page 16: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/16.jpg)
Correlation Matrix
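| | US Eqt | UK | US FI | Japan | Korea | Mexico | China | HK | S'pore | India |
|---|---|---|---|---|---|---|---|---|---|---|
| US Eqt | 1.00 | | | | | | | | | |
| UK | 0.27 | 1.00 | | | | | | | | |
| US FI | -0.13 | -0.27 | 1.00 | | | | | | | |
| Japan | 0.20 | -0.15 | 0.08 | 1.00 | | | | | | |
| Korea | -0.13 | -0.17 | 0.28 | -0.01 | 1.00 | | | | | |
| Mexico | -0.10 | 0.28 | -0.35 | -0.38 | -0.01 | 1.00 | | | | |
| China | 0.17 | -0.12 | 0.29 | 0.09 | 0.19 | 0.00 | 1.00 | | | |
| HK | 0.22 | 0.24 | -0.38 | -0.23 | -0.55 | 0.32 | -0.08 | 1.00 | | |
| S'pore | 0.52 | 0.24 | 0.00 | 0.08 | -0.02 | 0.30 | 0.35 | -0.01 | 1.00 | |
| India | 0.30 | 0.57 | 0.17 | -0.12 | -0.11 | -0.17 | 0.24 | 0.01 | 0.35 | 1.00 |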
![Page 17: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/17.jpg)
Correlations Among Stock Return Series
![Page 18: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/18.jpg)
Regression
• Most times it's not enough to just say whether 2 variables are correlated
  – we would like to define a relationship between the two variables
  – E.g. when the economy grows 1%, how much will the S&P500 increase?
• To do this, we use a technique called Regression
![Page 19: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/19.jpg)
Regression
• How the term Regression came to be applied to the subject of statistical models.
• 19th century scientist, Sir Francis Galton, studying human subjects found in all things “regression toward mediocrity”
  – E.g. If your parents are very smart, you are likely to be significantly less smart - so it's really not your fault!!
![Page 20: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/20.jpg)
Regression
• In modern times, when we talk of Regression analysis, we make an implicit assumption of a ‘mean’ relationship between variables and we try to determine that relationship.
• Regression analysis is concerned with –
  – the study of the dependence of one variable (the dependent variable)
  – on one or more other variables (the explanatory variables)
  – with a view to estimating and/or predicting the mean or average value of the former
  – in terms of the fixed values of the latter.
![Page 21: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/21.jpg)
Two Variable Regression Model
• Regression analysis is concerned with the relationship between 2 variables, say ‘y’ and ‘x’, and can be written as –

$$
y_i = f(x_i)
$$

  – All this means is that the value of ‘y’ is a function of the value of ‘x’
  – Another way of saying it is that ‘y’ doesn’t independently get its value, but somehow depends on ‘x’ to get its value
  – Thus ‘y’ can somehow be derived from ‘x’
  – Thus ‘y’ is a dependent variable and ‘x’ is an independent variable
• Regression is thus the study of a relationship between the dependent and independent variables
![Page 22: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/22.jpg)
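Regression

| x | y |
|---|---|
| 1.0 | 2.0 |
| 1.5 | 3.0 |
| 2.0 | 4.0 |
| 2.5 | 5.0 |
| 3.0 | 6.0 |
| 3.5 | 7.0 |
| 4.0 | 8.0 |
| 4.5 | 9.0 |
| 5.0 | 10.0 |
| 5.5 | 11.0 |
| 6.0 | 12.0 |
| 6.5 | 13.0 |
| 7.0 | 14.0 |
| 7.5 | 15.0 |
| 25 | ? |

Q: what is y when x = 25?

A: $y = f(x)$

$y = f(x) \Rightarrow y = \beta \cdot x$, where $\beta = 2$

$y = 2 \cdot x$, so if $x = 25$, $y = 50$

[Figure: line chart of y (axis 0 to 16) against x (axis 1.0 to 7.5)]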
![Page 23: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/23.jpg)
Regression
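| x | y1 | y2 | y3 |
|---|----|----|----|
| 0.0 | 0.0 | 0.0 | 0.0 |
| 0.5 | 0.5 | 1.0 | 1.5 |
| 1.0 | 1.0 | 2.0 | 3.0 |
| 1.5 | 1.5 | 3.0 | 4.5 |
| 2.0 | 2.0 | 4.0 | 6.0 |
| 2.5 | 2.5 | 5.0 | 7.5 |
| 3.0 | 3.0 | 6.0 | 9.0 |
| 3.5 | 3.5 | 7.0 | 10.5 |
| 4.0 | 4.0 | 8.0 | 12.0 |
| 4.5 | 4.5 | 9.0 | 13.5 |
| 5.0 | 5.0 | 10.0 | 15.0 |
| 5.5 | 5.5 | 11.0 | 16.5 |
| 6.0 | 6.0 | 12.0 | 18.0 |
| 6.5 | 6.5 | 13.0 | 19.5 |
| 7.0 | 7.0 | 14.0 | 21.0 |
| 7.5 | 7.5 | 15.0 | 22.5 |
| 10 | | | |

Q: what are y1, y2, y3 when x = 10?

A: $y_1 = x \Rightarrow y_1 = 10$

$y_2 = 2 \cdot x \Rightarrow y_2 = 20$

$y_3 = 3 \cdot x \Rightarrow y_3 = 30$

[Figure: line chart of y1, y2, y3 (axis 0 to 25) against x (axis 0.0 to 7.5)]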
![Page 24: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/24.jpg)
Regression
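| x | y |
|---|---|
| 0.0 | 1.0 |
| 0.5 | 2.0 |
| 1.0 | 3.0 |
| 1.5 | 4.0 |
| 2.0 | 5.0 |
| 2.5 | 6.0 |
| 3.0 | 7.0 |
| 3.5 | 8.0 |
| 4.0 | 9.0 |
| 4.5 | 10.0 |
| 5.0 | 11.0 |
| 5.5 | 12.0 |
| 6.0 | 13.0 |
| 6.5 | 14.0 |
| 7.0 | 15.0 |
| 7.5 | 16.0 |
| 10.0 | |

Q: what is y when x = 10?

A: $y = f(x)$

$y = f(x) \Rightarrow y = \alpha + \beta \cdot x$, where $\alpha = 1$, $\beta = 2$

$y = 1 + (2 \cdot x)$, so if $x = 10$, $y = 21$

[Figure: line chart of y (axis 0 to 18) against x (axis 0.0 to 7.5)]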
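The three question-and-answer slides above all evaluate a straight line with an intercept and a slope. A small sketch of that arithmetic follows; the names `alpha` (intercept) and `beta` (slope) are assumptions chosen for this illustration.

```python
def linear(x, alpha=0.0, beta=1.0):
    """Evaluate the line y = alpha + beta * x."""
    return alpha + beta * x


# Slope only: y = 2 * x, evaluated at x = 25.
y_slope_only = linear(25, beta=2)

# Three lines with slopes 1, 2 and 3, evaluated at x = 10.
y_three_slopes = [linear(10, beta=b) for b in (1, 2, 3)]

# Intercept and slope: y = 1 + 2 * x, evaluated at x = 10.
y_with_intercept = linear(10, alpha=1, beta=2)

print(y_slope_only, y_three_slopes, y_with_intercept)
```

Changing `beta` tilts the line, while changing `alpha` shifts it up or down without changing its steepness.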
![Page 25: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/25.jpg)
Two Variable Regression Model
• Regression analysis is concerned with –
  – the study of a relationship between the dependent and independent variables

$$
y_i = f(x_i)
$$

  – In reality, we are estimating a relationship, so that we can calculate the value of a random variable

$$
y_i = \alpha + \beta x_i
$$
![Page 26: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/26.jpg)
Two Variable Regression Model
• Real data from which we estimate a relationship is never very good, because we deal with random variables
  – What we end up having is something like this

$$
y_i = \alpha + \beta x_i + \text{error}
$$

  – What we try to do in regression is estimate the “Line of Best Fit”, so that we can come up with this equation

$$
y_i = \alpha + \beta x_i
$$

  – This is also the equation of a line, so this form of regression is called a “Linear regression”
![Page 27: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/27.jpg)
Two Variable Regression Model
[Figure: scatter plot with fitted regression line; x-axis from 2.00 to 8.00, y-axis from 2.00 to 4.00]

y = 0.841 + 0.3909x

R² = 0.7247
![Page 28: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/28.jpg)
Two Variable Regression Model
• Regression Model – Equation of a Line

$$
y_i = \alpha + \beta x_i + \varepsilon_i
$$

• Terminology – ‘y’
  – Dependent Variable, or
  – Left-Hand Side Variable, or
  – Explained Variable
![Page 29: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/29.jpg)
Two Variable Regression Model
• Terminology – ‘x’
  – Independent Variable, or
  – Right-Hand Side Variable, or
  – Explanatory Variable, or
  – Regressor, Covariate, Control Variable
• Terminology – ‘ε’
  – Error
  – Disturbance
![Page 30: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/30.jpg)
Two Variable Regression Model
$$
y_i = \alpha + \beta x_i + \varepsilon_i
$$

• Terminology –
  – ‘α’ - Intercept
  – ‘β’ – Slope
  – ‘ε’ - error
![Page 31: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/31.jpg)
Assumptions of the Linear Regression Model
• The relationship between the dependent variable, Y, and the independent variable, X, is linear
• The independent variable, X, is not random
• About the error –
  – The expected value (remember, the average) of the error term is 0
  – The error term is normally distributed
  – The variance of the error term is the same for all observations
  – The error term is uncorrelated across observations
![Page 32: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/32.jpg)
Regression Relationship estimation
• The model is estimated by the “Least Squares Estimation” method
![Page 33: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/33.jpg)
Two Variable Regression Model
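$$
y_i = \alpha + \beta x_i
$$

$$
\beta = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
$$

$$
\alpha = \bar{Y} - \beta \bar{X}
$$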
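As an illustration of the least squares formulas on this slide (slope = Cov(X, Y) / Var(X), intercept = YBar - slope * XBar), here is a minimal sketch applied to the education and compensation data from the correlation example. Reusing that data set for a regression is an assumption made for this example; the slides do not fit this particular line.

```python
years_of_education = [17.97, 22.86, 17.25, 13.35, 14.97,
                      15.87, 13.17, 11.10, 13.86, 8.97]
compensation = [163.30, 142.05, 100.00, 103.55, 90.00,
                97.50, 90.00, 80.00, 90.25, 49.50]


def least_squares_fit(xs, ys):
    """Return (alpha, beta) for the line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    beta = cov_xy / var_x          # slope
    alpha = y_bar - beta * x_bar   # intercept
    return alpha, beta


alpha, beta = least_squares_fit(years_of_education, compensation)
residuals = [y - (alpha + beta * x)
             for x, y in zip(years_of_education, compensation)]
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

With an intercept in the model, the residuals of a least squares fit sum to zero (up to floating-point error), which is a quick sanity check on the implementation.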
![Page 34: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/34.jpg)
Inferences from Regression

$$
y_i = \alpha + \beta_1 x_{1i} + \varepsilon_i
$$

• Inferences from Regression can be made about
  – Model - how well does the specified model perform, i.e., are the specified independent variables, taken together, a good predictor of the dependent variable (R²)
  – Independent Variables – the contribution of each independent variable in predicting the dependent variable (hypothesis test)
![Page 35: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/35.jpg)
Model power
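$$
R^2 = \frac{\text{Explained variation}}{\text{Total variation}}
    = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}}
    = 1 - \frac{\text{Unexplained variation}}{\text{Total variation}}
$$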
![Page 36: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/36.jpg)
Inference about Model
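• Coeff. of Determination (R²)
• So, the higher the R², the better the model? (Yes? That would be too easy!)

[Figure: deviation of a data point (x1, y1) from the mean, with ym, yp and y1 marked on the y-axis, xm and x1 on the x-axis, the distance (x1-xm), and the segments labelled SST, SSE and SSR]

$$
SST = SSR + SSE
$$

$$
1 = \frac{SSR}{SST} + \frac{SSE}{SST}
$$

$$
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
$$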
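To make the decomposition concrete, the sketch below fits a least squares line and computes the three sums of squares. It assumes SSR denotes the regression (explained) sum of squares and SSE the error (unexplained) sum of squares, which matches the R² formula on this slide; some textbooks swap the two abbreviations. The education and compensation data are reused purely for illustration.

```python
years_of_education = [17.97, 22.86, 17.25, 13.35, 14.97,
                      15.87, 13.17, 11.10, 13.86, 8.97]
compensation = [163.30, 142.05, 100.00, 103.55, 90.00,
                97.50, 90.00, 80.00, 90.25, 49.50]


def variation_decomposition(xs, ys):
    """Return (SST, SSR, SSE) for a least squares line fitted to xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    alpha = y_bar - beta * x_bar
    fitted = [alpha + beta * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # unexplained variation
    return sst, ssr, sse


sst, ssr, sse = variation_decomposition(years_of_education, compensation)
r_squared = 1 - sse / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R^2 = {r_squared:.3f}")
```

In a two-variable regression, R² equals the square of the sample correlation coefficient, so the value here should be close to 0.83² from the earlier correlation example.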
![Page 37: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/37.jpg)
Inference about Model
• If the model is correctly specified, R² is an ideal measure
• Addition of a variable to a regression will never decrease R², and almost always increases it (by construction)
• This fact can be exploited to get regressions with R² ~ 100% by addition of variables, but this doesn’t mean that the model is any good
• Adj-R² should be reported
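The slides do not give a formula for Adj-R². A commonly used definition, assumed here, is Adj-R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below applies it to the R Square and Observations values from the Excel summary output at the end of the deck.

```python
def adjusted_r_squared(r_squared, n_observations, n_regressors):
    """Adjusted R^2 under the standard definition (an assumption here):
    1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    n, k = n_observations, n_regressors
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Values from the Excel summary output: R Square, 60 observations, 1 regressor.
adj_r2 = adjusted_r_squared(0.164151419, n_observations=60, n_regressors=1)
print(f"Adjusted R^2 = {adj_r2:.9f}")
```

Because the penalty factor (n - 1)/(n - k - 1) grows with k, adding a variable that explains little can lower the adjusted value even though the plain R² does not fall.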
![Page 38: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/38.jpg)
Inference about Parameters
• Coefficients are estimated with a confidence interval
• To know if a specific independent variable (xi) is influential in predicting the dependent variable (y), we test whether the corresponding coefficient is statistically different from 0 (i.e. we test the null hypothesis βi = 0).
• We do so by calculating the t-statistic for the coefficient
• If the t-stat is sufficiently large in absolute value, it indicates that bi is significantly different from 0, indicating that βi * xi plays a role in determining y
![Page 39: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/39.jpg)
Inference about parameters
• We can test to see if the slope coefficient is significant by using a t-test.
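$$
t = \frac{\hat{b}_1 - 0}{s_{\hat{b}_1}}
$$

where $\hat{b}_1$ is the estimated slope coefficient and $s_{\hat{b}_1}$ is its standard error.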
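As a sketch of the t-test arithmetic, the snippet below divides an estimated slope by its standard error, using the "X Variable 1" coefficient and standard error from the Excel summary output at the end of the deck. The cutoff of 2.0 is an assumed rule of thumb for a two-sided 5% test with this many observations, not a value taken from the slides.

```python
def t_statistic(estimate, standard_error, hypothesized_value=0.0):
    """t = (estimate - hypothesized value) / standard error of the estimate."""
    return (estimate - hypothesized_value) / standard_error


# Slope estimate and its standard error from the Excel summary output.
slope = 0.939398568
slope_standard_error = 0.278341127

t_stat = t_statistic(slope, slope_standard_error)

# Assumption: approximate critical value for a two-sided 5% test, 58 df.
T_CRITICAL = 2.0
is_significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.4f}, significant at about 5%: {is_significant}")
```

The same function can test a hypothesis other than zero by passing `hypothesized_value`, for example 1.0 to ask whether the slope differs from one.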
![Page 40: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/40.jpg)
In Excel
![Page 41: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/41.jpg)
In Excel
![Page 42: Analysis of Individual Variables Descriptive – –Measures of Central Tendency Mean – Average score of distribution (1 st moment) Median – Middle score (50](https://reader035.vdocument.in/reader035/viewer/2022062715/56649d795503460f94a5c79a/html5/thumbnails/42.jpg)
In Excel
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.405156042 |
| R Square | 0.164151419 |
| Adjusted R Square | 0.149740236 |
| Standard Error | 0.05350165 |
| Observations | 60 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 0.032604637 | 0.032604637 | 11.39055864 | 0.001321732 |
| Residual | 58 | 0.166020739 | 0.002862427 | | |
| Total | 59 | 0.198625377 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -9.72076E-05 | 0.007438982 | -0.01306732 | 0.98961893 | -0.014987948 | 0.014793533 |
| X Variable 1 | 0.939398568 | 0.278341127 | 3.374990169 | 0.001321732 | 0.382238272 | 1.496558865 |
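Several figures in this output can be cross-checked against each other from the ANOVA rows alone. The sketch below is one way to do that; the variable names are illustrative, and the relationships used (R² = regression SS / total SS, F = MS regression / MS residual, standard error = square root of MS residual) are standard identities rather than something stated on the slide.

```python
import math

# ANOVA values copied from the summary output.
ss_regression = 0.032604637
ss_residual = 0.166020739
ss_total = 0.198625377
df_regression = 1
df_residual = 58

r_square = ss_regression / ss_total          # explained / total variation
multiple_r = math.sqrt(r_square)             # "Multiple R" row
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual         # "F" column
standard_error = math.sqrt(ms_residual)      # "Standard Error" row

print(f"R Square {r_square:.6f}, Multiple R {multiple_r:.6f}")
print(f"F {f_stat:.4f}, Standard Error {standard_error:.6f}")
```

With a single independent variable, the F statistic is also the square of the slope's t Stat (3.374990169² is about 11.39), which is why the Significance F and the slope's P-value coincide.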