Lecture 25: Scatterplots for Bivariate Data Section 3.1 and 3.2

Post on 15-Jul-2020

Page 1: Lecture 25: Scatterplots for Bivariate Dataxuanyaoh/stat350/xyApr2Lec25.pdf · Scatterplots for Bivariate Data Section 3.1 and 3.2 . Announcement • Grades of Exam 2 are posted

Lecture 25: Scatterplots for Bivariate Data

Section 3.1 and 3.2

Announcement

•  Grades of Exam 2 are posted.
•  Blackboard Signal Intervention will re-run.

3.1 Visually Display Bivariate …

Scatterplot for H/W data

[Figure: scatterplot of the H/W data; one axis runs from 40 to 70, the other from 155 to 180]

Scatter Plots

Example 3.1 with SAS Code

Scatterplot of Example 3.1

Scatterplots

•  Plot bivariate data
•  Plot the (x,y) pairs directly on plot
•  Pattern within plot can indicate certain relationships between x and y
   –  Linear
      •  we like these A LOT!
   –  Quadratic, Cubic?
   –  Nonlinear? Exponential or Log?
   –  Other? Random?
   –  Etc.

3.2 Pearson’s Correlation Coefficient

•  Suppose a scatterplot shows a linear (or roughly linear) relationship between X and Y (note: both must be quantitative)

•  The correlation coefficient, r, measures the strength and direction of the linear relationship
   –  Formally called Pearson’s correlation coefficient
•  Examples:
   –  Age and Bone Density
   –  Weight and Blood Pressure
   –  Etc.

How to calculate Correlation

•  Where:

•  Typo: on the right-hand side of the “Sxy” equation above, the second factor in the numerator should be the sum of the yi, not the sum of the xi.
•  See Example 3.3 in text on page 108.
•  Or by calculator
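
The formulas on this slide did not survive in the transcript. The sketch below assumes the usual computational definitions, which agree with the typo note above: Sxy = Σxiyi − (Σxi)(Σyi)/n, Sxx = Σxi² − (Σxi)²/n, Syy = Σyi² − (Σyi)²/n, and r = Sxy / sqrt(Sxx·Syy). The function name `pearson_r` and the height/weight numbers are made up for illustration; they are not the H/W data from the lecture.

```python
import math

def pearson_r(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy), via the computational sums."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Note the second factor in the numerator is the sum of the y values.
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(a * a for a in x) - sum_x ** 2 / n
    syy = sum(b * b for b in y) - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (NOT from the lecture): heights in cm, weights in kg.
heights = [155, 160, 165, 170, 175, 180]
weights = [45, 50, 52, 60, 63, 70]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")  # a value close to +1: strong positive linear pattern
```

With many observations or large values, the mean-centered form Σ(xi − x̄)(yi − ȳ) is numerically safer than these shortcut sums, but the shortcut form matches hand calculation.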

More about the Correlation

•  Takes values between -1 and 1
   –  Sign indicates type of relationship
      •  Positive, i.e., as X increases, Y tends to increase
      •  Negative, i.e., as X increases, Y tends to decrease (and vice versa)
   –  Value indicates strength, farther from 0 is stronger
      •  If r is near 0, it implies a weak (or no) linear relationship
      •  Closer to +1 or -1 suggests very strong linear pattern
      •  See page 109 indicating “strengths”
•  If the roles of X and Y are switched → r doesn’t change
•  Unit free: unaffected by linear changes of units in X or Y (adding a constant, multiplying by a positive constant)
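
A quick check of the last two properties, as a sketch on made-up numbers (the data and the name `pearson_r` are illustrative, not from the lecture). It also shows the one caveat: multiplying one variable by a negative constant flips the sign of r.

```python
import math

def pearson_r(x, y):
    """Sample correlation from mean-centered sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in cm, weights in kg.
heights_cm = [155, 160, 165, 170, 175, 180]
weights_kg = [45, 50, 52, 60, 63, 70]

r = pearson_r(heights_cm, weights_kg)

# Switching the roles of X and Y leaves r unchanged.
r_swapped = pearson_r(weights_kg, heights_cm)

# Changing units (cm -> inches, kg -> pounds) leaves r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_units = pearson_r(heights_in, weights_lb)

# Multiplying one variable by a negative constant flips the sign.
r_negated = pearson_r([-h for h in heights_cm], weights_kg)

print(r, r_swapped, r_units, r_negated)
```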

Visual understanding

Concerns with Correlation

•  r is affected by outliers, see formula
•  Captures only the strength of the “linear” relationship
   –  it could be true that Y and X have a very strong non-linear relationship but r is close to zero
•  r = +1 or -1 only when points lie perfectly on a straight line (e.g., Y=2X+3)
   –  Rarely, if ever, true for real data!
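
The three concerns can be seen on tiny made-up datasets. This is a sketch only; the data and the name `pearson_r` are illustrative assumptions, with the line Y=2X+3 borrowed from the slide.

```python
import math

def pearson_r(x, y):
    """Sample correlation from mean-centered sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Points exactly on Y = 2X + 3 give r = 1.
x_line = [1, 2, 3, 4, 5]
y_line = [2 * x + 3 for x in x_line]
r_line = pearson_r(x_line, y_line)

# One outlier, the point (6, 0), is enough to destroy that.
r_outlier = pearson_r(x_line + [6], y_line + [0])

# A perfect but non-linear relationship, Y = X^2 on symmetric X, gives r = 0.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_quad = [x ** 2 for x in x_sym]
r_quad = pearson_r(x_sym, y_quad)

print(r_line, r_outlier, r_quad)
```

Here Y is completely determined by X in the quadratic case, yet r reports no linear association at all.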

Do datasets with the same r value have the same relationship? …

•  All four datasets have the same r = 0.816
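
The figure for this slide is not in the transcript. Four datasets sharing r = 0.816 matches the well-known Anscombe quartet, so the sketch below assumes that is what the slide showed. The helper name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Sample correlation from mean-centered sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet (assumed to be the four datasets on the slide).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

r_values = {name: pearson_r(x, y) for name, (x, y) in quartet.items()}
for name, r in r_values.items():
    print(f"dataset {name}: r = {r:.4f}")  # all four are about 0.816
```

Plotted, the four look nothing alike: a roughly linear cloud, a smooth curve, a tight line with one outlier, and a vertical stack with one far-off point. The number r alone cannot tell them apart.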

More about r

•  Does a small r indicate that x and y are NOT associated?
   –  Not exactly, although maybe
   –  Linear association is weak between x and y BUT another association may still exist!
   –  Are there outliers?
   –  Are there clusters?
•  Does a large r indicate that x and y are always linearly associated?
   –  Not always, could have clusters that look linear

•  Always check your scatterplot!!
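
A sketch of the cluster case on made-up points (data and the name `pearson_r` are illustrative): two tight clusters, each with no linear pattern inside it, still produce a large r when pooled.

```python
import math

def pearson_r(x, y):
    """Sample correlation from mean-centered sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical clusters: the four corners of a small square each.
cluster_a = [(1, 1), (1, 2), (2, 1), (2, 2)]
cluster_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

def r_of(points):
    xs, ys = zip(*points)
    return pearson_r(xs, ys)

r_a = r_of(cluster_a)                # no linear pattern inside cluster A
r_b = r_of(cluster_b)                # no linear pattern inside cluster B
r_all = r_of(cluster_a + cluster_b)  # pooled: large r

print(r_a, r_b, r_all)
```

The large pooled r only says the two clusters sit at different locations, which the scatterplot shows at a glance.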

What about a similar idea for populations?

•  Yes! We can define the correlation for populations as well, designated as ρ

   –  Called the population correlation coefficient
   –  Maintains similar properties as r, i.e.
      –  It is between −1 and 1
      –  The correlation is 1 in the case of a perfect increasing linear relationship, −1 in the case of a perfect decreasing linear relationship
      –  Some value in between otherwise, indicating the degree of linear association between the variables
   –  We are not required to calculate ρ at this point
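
For reference only, since the slides give no formula for ρ: the standard definition, with μ and σ denoting the population means and standard deviations of X and Y (notation assumed here, not taken from the lecture), is

```latex
\rho \;=\; \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
     \;=\; \frac{E\big[(X - \mu_X)(Y - \mu_Y)\big]}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho \le 1 .
```

The sample r plays the same role for data that ρ plays for the population: sums over the observed pairs replace the expectations.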

Correlation and Causation

•  A correlation, even a very strong one, DOES NOT IMPLY CAUSATION!!!
•  Examples
   –  For children, there is an extremely strong correlation between shoe size and math scores
   –  Very strong correlation between ice cream sales and number of deaths by drowning
   –  Very strong correlation between number of churches in a town and number of bars in a town
   –  A large correlation between height and weight of a person only means that there is a positive association between height and weight
   –  Heavy weight does not cause a person to grow tall
   –  Examples of common response…NOT causation!
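
Common response can be sketched with the shoe size example. The numbers below are invented so that age drives both variables; nothing here is real data, and the names are illustrative. Shoe size and math score are strongly correlated overall, yet among children of the same age the correlation is zero.

```python
import math

def pearson_r(x, y):
    """Sample correlation from mean-centered sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented data: four children at each age from 6 to 12.
# Age sets the typical shoe size and math score; the small offsets are
# arranged so that, within one age, the two variables are unrelated.
shoe_offsets = [-0.5, -0.5, 0.5, 0.5]
math_offsets = [-3, 3, -3, 3]

children = []  # (age, shoe_size, math_score)
for age in range(6, 13):
    for ds, dm in zip(shoe_offsets, math_offsets):
        children.append((age, age + ds, 10 * age + dm))

r_overall = pearson_r([c[1] for c in children], [c[2] for c in children])

r_within_age = {}
for age in range(6, 13):
    group = [c for c in children if c[0] == age]
    r_within_age[age] = pearson_r([c[1] for c in group], [c[2] for c in group])

print(f"overall r = {r_overall:.3f}")
print("r within each age:", r_within_age)
```

Once age is held fixed the association disappears, which is what one expects when a third variable is responsible for both.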

After Class …

•  Review Section 3.1 and 3.2
•  Read section 3.3
•  Hw#9, 5pm today
•  This Wed: Lab#5