Correlation

Upload: millicent-rodgers

Post on 29-Jan-2016

TRANSCRIPT

Page 1: Correlation. What is correlation? Correlation is the measure of whether and how strongly pairs of variables are related

Correlation

Page 2

What is correlation?

• Correlation is the measure of whether and how strongly pairs of variables are related.

Page 3

Types of correlation

• Positive / negative

• Strong / weak

Page 4

Correlation coefficient r

• Correlation coefficient r is a number that indicates the strength and direction of the linear relationship between two variables

• Ranges between –1 and 1

• The closer to –1, the stronger the negative linear relationship

• The closer to 1, the stronger the positive linear relationship

• The closer to 0, the weaker any linear relationship, positive or negative

Page 5

Correlation coefficient r

• 1 is a perfect positive correlation

• 0 is no correlation (the values don't seem linked at all)

• –1 is a perfect negative correlation
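
As a small illustration of how the sign and size of r are read, the sketch below splits a given r into a direction and a strength. The function name `describe_r` and the use of the absolute value of r as a strength score are assumptions made for this example; they are not part of the slides.

```python
def describe_r(r):
    """Return (direction, strength) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # Assumed strength score: 0 means no linear relationship, 1 means perfect.
    strength = abs(r)
    return direction, strength


# Illustrative values of r only, not results from real data.
for r in (1.0, 0.85, 0.0, -0.3, -1.0):
    print(r, describe_r(r))
```

The closer the strength is to 1, the more tightly the points follow a straight line; the direction only says whether that line rises or falls.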

Page 6

Correlation and causation

• Causation is a relationship of cause and effect: a change in one variable directly produces a change in the other

• Correlation is NOT causation

Page 7

Examples

• Temperature & ice-cream sales

• Ice-cream sales & shark attacks

• Runny nose & headache

Page 8

How to calculate r

• Step 1: Find the mean of x and the mean of y

• Step 2: Subtract the mean of x from every x value (call the results "a"); do the same for y (call the results "b")

• Step 3: Calculate a × b, a² and b² for every pair of values

• Step 4: Sum up a × b, sum up a² and sum up b²

• Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)]
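
The five steps above can be sketched in plain Python as follows. The function name `correlation_by_steps` and the sample data are made up for illustration; the data are not the figures from the slides.

```python
from math import sqrt


def correlation_by_steps(xs, ys):
    """Compute r by following the five steps listed above."""
    # Step 1: means of x and y
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Step 2: deviations from the means ("a" for x, "b" for y)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    # Step 3: a*b, a^2 and b^2 for every pair of values
    ab = [ai * bi for ai, bi in zip(a, b)]
    a2 = [ai ** 2 for ai in a]
    b2 = [bi ** 2 for bi in b]
    # Step 4: sum each column
    sum_ab, sum_a2, sum_b2 = sum(ab), sum(a2), sum(b2)
    # Step 5: divide by the square root of the product of the two sums
    return sum_ab / sqrt(sum_a2 * sum_b2)


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation_by_steps(xs, ys), 3))
```

Note that the function divides by zero if every x (or every y) is identical, because r is undefined when one variable does not vary.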

Page 9

How to calculate r

• Formula for r

Page 10

• Sxx is the sum of the squares of the differences between each xᵢ and the mean of x, for all i from 1 to n.

• Syy is the sum of the squares of the differences between each yᵢ and the mean of y, for all i from 1 to n.

• Sxy is the sum of the products of the differences between each xᵢ and the mean of x and the differences between each yᵢ and the mean of y, for all i from 1 to n.
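
The formula image itself did not survive in this transcript. Based on the three definitions above and on Step 5 of the calculation, it is presumably the standard form below, where the bar notation for the means of x and y is an assumption made here:

```latex
\[
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\]
\[
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
\]
```

Here Sxy matches the "sum of a × b", and Sxx and Syy match the "sum of a²" and "sum of b²" from the step-by-step method.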

Page 11

Example

• A local milkshake shop keeps track of how many milkshakes it sells and the temperature on each day. Below are its sales and temperature figures for the last 12 days. Comment on the relationship.

Page 12

Solution

Page 13

Question

• Find the correlation coefficient of the following data.

Page 14

Linear Regression

• In regression, one variable is treated as the independent (predictor) variable X and the other as the dependent (outcome) variable Y.

• In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable Y and one or more explanatory (independent) variables, denoted X.

• The case of one explanatory variable is called simple linear regression.

Page 15

Linear Regression

• Remember this?

• Y = mX + B, where m is the slope of the line and B is its Y-intercept

Page 16

What is a slope?

• A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Page 17

Prediction

• If you know the value of x, the model lets you estimate the corresponding value of y

• Extrapolation: attempting to use a regression equation to predict values outside of the observed range
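
A minimal sketch of prediction with a fitted line, flagging when a prediction is an extrapolation. The function `predict`, the slope and intercept values, and the observed x-range are all hypothetical values chosen for illustration.

```python
def predict(x, m, b, x_min, x_max):
    """Return (predicted y, is_extrapolation) for the line y = m*x + b."""
    y = m * x + b
    # A prediction is an extrapolation when x lies outside the observed range.
    is_extrapolation = x < x_min or x > x_max
    return y, is_extrapolation


# Hypothetical fitted line and observed range of x
m, b = 2.0, 1.0
x_min, x_max = 10.0, 30.0

print(predict(20.0, m, b, x_min, x_max))  # x inside the observed range
print(predict(45.0, m, b, x_min, x_max))  # x outside the observed range
```

The model still returns a number for an extrapolated x, but nothing in the data supports the assumption that the straight-line pattern continues there.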

Page 18

How to calculate linear regression
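
The method shown on this slide did not survive in the transcript. One common approach, assumed here rather than taken from the slides, is ordinary least squares: the slope is m = Sxy / Sxx and the intercept is B = (mean of y) − m × (mean of x). The function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of products of deviations; Sxx: sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx             # slope
    b = mean_y - m * mean_x   # intercept: the line passes through the means
    return m, b


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
print(f"Y = {m:.2f}X + {b:.2f}")
```

The slope reuses the same Sxy and Sxx sums that appear in the calculation of r, so the two results always share a sign.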

Page 19

Exercise

• Find the linear regression line for the following data.