![Page 1: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/1.jpg)
Bivariate Data Analysis 4
![Page 2: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/2.jpg)
If the relationship is linear, the residuals plotted against the original x-values will be
scattered randomly above and below the zero line.
![Page 3: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/3.jpg)
![Page 4: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/4.jpg)
![Page 5: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/5.jpg)
A scatter plot of residuals versus the x-values should be boring, with no interesting features such as
direction or shape. It should stretch horizontally with about
the same amount of scatter throughout. It should show no
curves or outliers.
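As a minimal sketch of how such a residual plot is prepared, the code below fits a least-squares line and computes the residuals (observed minus predicted). The function name `linear_residuals` and the data values are hypothetical, chosen only for illustration.

```python
import numpy as np

def linear_residuals(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)        # polyfit returns slope first, then intercept
    residuals = y - (a + b * x)       # observed minus predicted
    return a, b, residuals

# Hypothetical, roughly linear data (not taken from the slides)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

a, b, res = linear_residuals(x, y)
for xi, ri in zip(x, res):
    print(f"x = {xi}: residual = {ri:+.3f}")
```

Plotting `res` against `x` gives the residual plot. For a least-squares line with an intercept, the residuals always sum to zero, so the check is for pattern and spread, not for the average level.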
![Page 6: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/6.jpg)
A correlation of r = 0.87 suggests a strong linear relationship between x and y.
![Page 7: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/7.jpg)
The scatter plot below, however, shows that the relationship is clearly non-linear.
![Page 8: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/8.jpg)
When examining residuals to check whether a linear model is appropriate, it is usually best to
plot them.
The variation in the residuals is the key to assessing how well the
model fits.
![Page 9: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/9.jpg)
The residuals follow a pattern that looks like a parabola. This indicates that the data were not really linear and are
more likely to be quadratic.
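The following sketch illustrates the point with hypothetical data that are exactly quadratic (y = x² for x = 1 to 10, not the data shown on the slide). The correlation is high, yet the residuals from the fitted line trace out a parabola.

```python
import numpy as np

# Hypothetical, exactly quadratic data
x = np.arange(1, 11, dtype=float)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]                 # about 0.97, which looks "strongly linear"
slope, intercept = np.polyfit(x, y, 1)      # least-squares line
residuals = y - (intercept + slope * x)

print(f"r = {r:.3f}")
print(np.round(residuals, 2))               # positive at both ends, negative in the middle
```

The high value of r alone would not reveal the curvature; the run of signs in the residuals (positive, then negative, then positive) does.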
![Page 10: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/10.jpg)
Discuss this data.
![Page 11: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/11.jpg)
Discuss this situation.
Outlier?
![Page 12: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/12.jpg)
Discuss the plot of the residuals
![Page 13: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/13.jpg)
Discuss this scatter plot
![Page 14: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/14.jpg)
Linear?
![Page 15: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/15.jpg)
Residuals
![Page 16: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/16.jpg)
Useful website
• http://stat-www.berkeley.edu/~stark/Java/Correlation.htm plots residuals, regression lines etc
![Page 17: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/17.jpg)
Many of our tools for displaying and summarizing data work only
when the data meet certain conditions.
We cannot use a linear model unless the relationship between two variables is linear.
Often re-expression can save the day, straightening bent relationships so that we can fit and use a simple linear model.
![Page 18: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/18.jpg)
Displays of the residuals can often help you find subsets in the
data.
![Page 19: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/19.jpg)
When a scatterplot shows a CURVED form that consistently increases or decreases, we can often straighten the form of the
plot by re-expressing one or both of the variables.
![Page 20: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/20.jpg)
The correlation is 0.979. That sounds pretty high, but the scatter plot shows something is not quite right.
![Page 21: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/21.jpg)
Re-expressing f/stop speed by squaring straightens the plot.
![Page 22: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/22.jpg)
This plot looks ‘straight’. The correlation is now 0.998, but the increase in correlation is not
important. (The original value of 0.979 is already large.) What is
important is that the form of the plot is now straight, so the
correlation is now an appropriate measure of association.
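A short sketch of the same idea, using hypothetical data because the camera data from the slide are not reproduced here. The y-values are built as a square root of a linear function of x, so squaring y straightens the plot exactly.

```python
import numpy as np

# Hypothetical curved, steadily increasing data (an assumption for illustration)
x = np.arange(1, 21, dtype=float)
y = np.sqrt(3.0 * x + 2.0)

r_before = np.corrcoef(x, y)[0, 1]        # already high, but the form is curved
r_after = np.corrcoef(x, y ** 2)[0, 1]    # after squaring y the form is straight

print(f"r before re-expression: {r_before:.4f}")
print(f"r after squaring y:     {r_after:.4f}")
```

Both correlations are large. The reason to prefer the re-expressed version is the straight form, which is what makes r an appropriate summary.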
![Page 23: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/23.jpg)
Goals of re-expression
• Make the distribution (as seen in its histogram, for example) more symmetric.
• Make the form of the scatter plot more nearly linear.
• Make the scatter in a scatter plot spread out evenly rather than following a fan shape.
![Page 24: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/24.jpg)
Some hints
• Try y² for unimodal distributions skewed to the left.
• Try the square root of y for counted data.
• Try logs for measurements that can’t be negative, especially when they grow by percentage increases.
• Try -1/y or -1/(square root of y).
• Logs straighten exponential trends and pull in a long right tail.
• Logs straighten power curves.
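The hints above can be collected into one small helper. This is a sketch under assumptions: the function name `re_express` is hypothetical, power 0 is taken to mean the (base-10) log, and negative powers are negated so that the order of the values is preserved, matching the -1/y form in the hints.

```python
import numpy as np

def re_express(y, power):
    """Re-express y by a power: 2 -> y^2, 0.5 -> sqrt(y), 0 -> log10(y), -1 -> -1/y."""
    y = np.asarray(y, dtype=float)
    if power < 1 and np.any(y <= 0):
        # Simplifying assumption of this sketch: roots, logs and reciprocals need positive values
        raise ValueError("powers below 1 require positive values")
    if power == 0:
        return np.log10(y)
    if power < 0:
        return -(y ** power)          # negate so larger y still gives larger result
    return y ** power

print(re_express([1, 4, 9], 0.5))     # square roots
print(re_express([1, 10, 100], 0))    # logs
print(re_express([1, 2, 4], -1))      # negative reciprocals
```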
![Page 25: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/25.jpg)
Try y versus x²
![Page 26: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/26.jpg)
Try y versus x²
![Page 27: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/27.jpg)
Try log or 1/x
![Page 28: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/28.jpg)
Try log or 1/x
![Page 29: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/29.jpg)
Don’t stray too far from the powers suggested. Taking a high power may
artificially inflate R², but it won’t give a useful or meaningful model. It is better to stick with powers between 2 and -2. Even
in that range you should prefer the simpler powers on the ladder to those in the cracks. A square root is easier to
understand than the 0.413 power.
![Page 30: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/30.jpg)
Comparing histograms and scatter graphs
![Page 31: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/31.jpg)
The data in the scatter plot below show the progression of the fastest times for the men’s marathon since the Second World War.
We may want to use these data to predict the fastest time at 1 January 2010 (i.e. 64 years after 1 January 1946).
Page 53
![Page 32: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/32.jpg)
Possible solutions
• a quadratic (y = ax² + bx + c)
• an exponential function (y = ae^(bx))
• a power function (y = ax^b)
• 2 separate straight lines: one for, say, 0 to 23 years and one for, say, 23 to 60 years
• a line for only the later years, say 23 to 60 years
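One way to fit the first three candidate models is sketched below. The function names and the data are hypothetical (the marathon times themselves are not reproduced here). The exponential and power fits are done by fitting a straight line to re-expressed data, so they minimise error on the log scale, which may differ slightly from what a calculator or spreadsheet reports.

```python
import numpy as np

def r_squared(y, fitted):
    """Proportion of variation in y explained by the fitted values."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_quadratic(x, y):
    a, b, c = np.polyfit(x, y, 2)                    # y = a x^2 + b x + c
    return lambda t: a * t ** 2 + b * t + c

def fit_exponential(x, y):
    b, ln_a = np.polyfit(x, np.log(y), 1)            # ln y = ln a + b x  (needs y > 0)
    return lambda t: np.exp(ln_a) * np.exp(b * t)

def fit_power(x, y):
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)    # ln y = ln a + b ln x  (needs x, y > 0)
    return lambda t: np.exp(ln_a) * t ** b

# Hypothetical data that follow a power curve exactly
x = np.arange(1.0, 11.0)
y = 150.0 * x ** -0.05

for name, fit in [("quadratic", fit_quadratic),
                  ("exponential", fit_exponential),
                  ("power", fit_power)]:
    model = fit(x, y)
    print(f"{name:12s} R^2 = {r_squared(y, model(x)):.4f}  prediction at x = 20: {model(20.0):.2f}")
```

Note that the power fit requires x > 0, so a time variable measured in years since a starting date cannot include year 0.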
![Page 33: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/33.jpg)
Quadratic
• Curve seems to fit
• R² = 0.9592 is very high
• Inappropriate to quote r as the relationship is not linear
• The fitted curve eventually predicts that the time starts increasing (not sensible)
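The last point can be checked directly: a quadratic y = ax² + bx + c with a > 0 has its minimum at x = -b/(2a), and the fitted values rise after that. The coefficients below are hypothetical, not the ones fitted on the slide.

```python
def turning_point(a, b):
    """x-value of the vertex of y = a x^2 + b x + c."""
    return -b / (2.0 * a)

# Hypothetical coefficients for a quadratic that opens upward (a > 0)
a, b, c = 0.01, -1.0, 160.0

def predict(x):
    return a * x ** 2 + b * x + c

x_turn = turning_point(a, b)
print(f"fitted values stop decreasing at x = {x_turn:.1f}")
print(f"prediction at the turning point: {predict(x_turn):.2f}")
print(f"prediction at x = 64: {predict(64):.2f}")   # higher than at the turning point
```

If the turning point falls before the year being predicted, the quadratic forecasts a slower record time than one already achieved, which is why the model is rejected despite its high R².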
Page 54
![Page 34: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/34.jpg)
Exponential
• Doesn’t fit the data points particularly well
![Page 35: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/35.jpg)
Power Function
• Reasonable fit
• R² is high (R² = 0.9401)
✓
![Page 36: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/36.jpg)
Line for only the later years (1969-2003)
• Line (1969-2003): reasonable fit
• R² is high
• Note: We only use the later years line for the prediction and ignore the earlier years
✓
![Page 37: Bivariate Data Analysis](https://reader035.vdocument.in/reader035/viewer/2022062322/568150ad550346895dbec2ca/html5/thumbnails/37.jpg)
The data in the scatter plot below come from a random sample of 60 models of new cars taken from all models on the market in New Zealand in May 2000. We want to
use the engine size to predict the weight of a car.
• Seems to be linear for engine sizes less than 2500cc.
• Very weak or no linear relationship for engine sizes over 2500cc.
• Solution: Fit a line for engine sizes less than 2500cc.
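Fitting a line to only part of the data can be sketched as follows. The function name `fit_line_below` and the engine sizes and weights are hypothetical; they are not the 60 cars in the sample.

```python
import numpy as np

def fit_line_below(x, y, cutoff):
    """Least-squares line using only the points with x < cutoff."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = x < cutoff                                  # keep the subset where the trend looks linear
    slope, intercept = np.polyfit(x[mask], y[mask], 1)
    return slope, intercept, int(mask.sum())

# Hypothetical engine sizes (cc) and weights (kg)
engine_cc = [1000, 1300, 1600, 1800, 2000, 2200, 2400, 3000, 3500, 4000]
weight_kg = [800, 920, 1040, 1120, 1200, 1280, 1360, 1500, 1450, 1550]

slope, intercept, n_used = fit_line_below(engine_cc, weight_kg, 2500)
print(f"weight = {intercept:.1f} + {slope:.3f} * engine size, fitted on {n_used} cars")
```

A line fitted this way should only be used to predict weights for engine sizes below the cutoff.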
Page 55