CORRELATION VS. CAUSATION
4.2 CAUTIONS ABOUT CORRELATION AND REGRESSION
Correlation and Regression describe ONLY linear relationships
r and the Least Squares Line are NOT resistant
Extreme values and influential points can have a large effect
Plot your scatter plot FIRST!!!!
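A minimal sketch of the two cautions above, using small made-up data sets (the numbers and the helper name `pearson_r` are illustrative assumptions, not from the slides). The first data set is a perfect curve that r reports as no relationship at all; the second shows a single extreme point dragging r away from a perfect line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Caution 1: r only measures LINEAR association.
# y = x^2 is a perfect (curved) relationship, yet r comes out as 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Caution 2: r is NOT resistant.
# Six points on a perfect line give r = 1; one extreme point changes that a lot.
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(line_x, line_y)
r_outlier = pearson_r(line_x + [7], line_y + [0])

print("curve:       r =", round(r_curve, 3))    # 0.0
print("clean line:  r =", round(r_clean, 3))    # 1.0
print("one outlier: r =", round(r_outlier, 3))  # 0.25
```

Neither problem shows up in the value of r alone, which is why the scatter plot comes first.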
EXTRAPOLATION
Extrapolation = predicting y values for x values outside the range of your data
You SHOULD remain within the domain of your data, or very close to it
Predictions outside your domain are often VERY inaccurate
The following is the least squares regression equation obtained for a young child’s height in feet (y) compared to her age in years (x). Assuming the girl will live to be 52, predict her height at this ripe old age.
ŷ = 2.338375 + 0.1495x
At x = 52: ŷ = 2.338375 + 0.1495(52) ≈ 10.1, so about 10 feet tall
Age (yrs) Height (ft)
3 2.795
4 2.925
4.25 2.9575
4.5 3.0225
4.75 3.055
5 3.0875
Obviously people don’t keep growing at the same rate for their whole lives…
Just remember to be careful when extrapolating!!
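As a rough check on the example above, the sketch below refits the least squares line from the age/height table and compares a prediction inside the data's domain with the prediction at age 52. The function names `least_squares` and `predict` are illustrative choices, not anything from the slides.

```python
# Age (yrs) and height (ft) from the table above.
ages = [3, 4, 4.25, 4.5, 4.75, 5]
heights = [2.795, 2.925, 2.9575, 3.0225, 3.055, 3.0875]

def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = least_squares(ages, heights)

def predict(age):
    """Predicted height in feet for a given age in years."""
    return intercept + slope * age

print("slope     =", round(slope, 4))      # about 0.1495 ft per year
print("intercept =", round(intercept, 4))  # about 2.3384 ft

# Inside the domain (ages 3 to 5) the line tracks the data closely.
print("age 4.5:", round(predict(4.5), 2), "ft  (observed 3.0225)")

# Far outside the domain the same line gives an absurd answer.
print("age 52: ", round(predict(52), 2), "ft")  # about 10.11 ft
```

The line itself is fine for ages 3 to 5; the trouble is only in asking it about an age ten times larger than anything in the data.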
LURKING VARIABLES
Lurking Variable
Variable not in your study that can (and probably does) affect the interpretation of the relationship between your two measured variables
Often accounts for the “left over” variation that r² does not explain
May be hidden
Can make a relationship look “strong” or “weak” when it really isn’t
Dangerous to data and interpretations
What do I do about them?
Try to identify them BEFORE the study
Talk about their possible effects in your interpretations
Use a residual plot with time as your x to try to identify potential effects
SHOULD I USE AVERAGED DATA?
Averaged data is okay, BUT
It shouldn’t really be used to predict or interpret for INDIVIDUALS
Correlations based on averaged data are often too high when applied to individuals
Averaged data should be used to make predictions about averages
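A small simulation can illustrate why correlations from averaged data overstate the relationship for individuals. Everything below is synthetic: the ten x values, the 200 individuals per x value, and the noise level are assumptions picked only to make the effect visible.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical setup: 10 x values, 200 individuals at each one.
# Each individual's y follows the trend y = x plus a lot of individual noise.
x_values = list(range(1, 11))
per_group = 200

individual_x, individual_y, average_y = [], [], []
for x in x_values:
    ys_at_x = [x + random.gauss(0, 5) for _ in range(per_group)]
    individual_x.extend([x] * per_group)
    individual_y.extend(ys_at_x)
    average_y.append(sum(ys_at_x) / per_group)  # one averaged point per x

r_individual = pearson_r(individual_x, individual_y)
r_average = pearson_r(x_values, average_y)

print("r using individuals:", round(r_individual, 2))
print("r using averages:   ", round(r_average, 2))
```

Averaging cancels most of the individual-to-individual scatter, so the averaged points hug the line much more tightly than any single person does.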
So What Do I Need to Do?
Pay attention to the WHOLE situation:
Look at the data (contextually)
Look for possible lurking variables
Make sure to DOUBLE CHECK any contextual inferences you make!!
CAUSATION
r and r², our regression statistics, describe an association between 2 variables. But does this association mean that the explanatory variable CAUSES the response variable? An obvious example of why the answer can be no comes from a true study that found the association listed below:
An actual study performed over a one year time span found a statistically strong relationship between the number of ice cream cones sold in a month and the number of homicides in the same month.
While there appeared to be a statistical association between these two variables, we know that it would be incorrect to say that the number of ice cream cones sold CAUSES the number of homicides.
This is where a LURKING variable comes into play…
CAUSATION (VISUALLY)
Below are three visual examples of situations and underlying variables that can explain an association
[Diagram, three panels: Causation (x and y); Common Response, lurking variable (x, y, and z); Confounding (x, y, and z)]
Dotted lines = association
Arrow = causal relationship
Causation doesn’t mean there aren’t other factors that affect the result… Just that the response is directly caused by the explanatory variable…
CAUSATION (DIRECT)
Let’s look at situations where direct causation occurs
A study recorded the heights of young males (between the ages of 12 and 15) and their fathers. The study found an association between the two heights with an r² of about 25%.
There is a direct causal relationship between the height of a father and his son through heredity. It is possible to have direct causation with a low r²; it just says that the father’s height only explains about 25% of the variation in the son’s height.
A study performed on a number of lab rats found an association between the number of ounces of battery acid eaten and the thickness of the stomach lining.
While there is a direct causal link between the ounces of battery acid eaten and the thickness of the rat’s stomach lining, this is an example of a situation that you can’t generalize to all cases, i.e. the effect might not be the same for humans.
COMMON RESPONSE (LURKING VARIABLE)
Let’s look at situations where there is a “lurking” variable
An actual study performed over a one-year time span found a statistically strong relationship between the number of ice cream cones sold in a month and the number of homicides committed in the same month.
While this study provided evidence that there was an association between ice cream and homicides, they both are probably affected by a lurking variable such as heat/temperature, i.e. when people are hot, they eat ice cream, and when they are hot they are CRANKY.
Earlier we found a fairly good association between the number of TVs that a person owns and their life expectancy.
While this study may show an association between the two, we know that there are many other “lurking” variables that can have an effect on both life expectancy and the # of TVs you own…. (DISCUSSION!!)
The MORAL: Association doesn’t mean CAUSATION
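The ice cream example can be mimicked with a small simulation of common response. This is only a sketch: the data are synthetic, and the temperature range, the coefficients, and the number of simulated months are all made-up assumptions. In the simulation, temperature drives both variables and neither one affects the other, yet they come out strongly correlated; once the part explained by temperature is removed from each (by taking regression residuals), the association essentially disappears.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(xs, ys):
    """Residuals of ys after fitting the least squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic "months": temperature (z) is the lurking variable.
months = 1000  # many simulated months so the result is stable
temperature = [random.uniform(0, 35) for _ in range(months)]

# Both responses depend on temperature only, never on each other.
ice_cream = [10 * t + random.gauss(0, 30) for t in temperature]
homicides = [0.5 * t + random.gauss(0, 3) for t in temperature]

r_raw = pearson_r(ice_cream, homicides)

# Remove what temperature explains from each variable, then correlate the rest.
r_after_temperature = pearson_r(
    residuals(temperature, ice_cream),
    residuals(temperature, homicides),
)

print("ice cream vs homicides:        r =", round(r_raw, 2))
print("after accounting for temp:     r =", round(r_after_temperature, 2))
```

The strong raw correlation here is produced entirely by z, which is the situation the Common Response diagram describes.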
CONFOUNDING
Two variables are “confounded” when you can’t tell which variable is affecting the response
Mr. Arnold and Mr. Reed have been selected to compare the effectiveness of two well-known laundry detergents, PRIDE and NONE. Each takes his detergent home, washes his clothes, and then brings them to a panel of judges for submission. It is found that PRIDE is the better detergent because Mr. Reed’s clothes are cleaner.
While the detergent may well have had an effect on the cleanliness of their clothes, there are other factors that could equally have affected the outcome… Washer quality, water quality, laundry cycle, etc… When we can’t tell whether the “lurking” variables or the explanatory variable had the effect, the study is CONFOUNDED.
The MORAL: Association doesn’t mean CAUSATION
SO WHEN CAN I SAY CAUSE?
Remember, even HIGH correlation doesn’t mean CAUSATION
When can I say it?
If you do an EXPERIMENT and control lurking variables, OR if a strong association shows up consistently over repeated studies, then you can say the magic word!!!
Cause
Man, I look good!!
MORAL OF THE STORY
Correlation and Association don’t mean CAUSATION
Really examine the CONTEXT of your data
Don’t just look at the numbers
“Numbers tell you everything!! I love Numbers!!”
“Don’t listen to that Geek! You better look at the CONTEXT, not just the numbers.”
HOMEWORK
#38-45