examining relationships prob. and stat. ch.2.1 scatterplots

21
Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

Upload: flora-daniel

Post on 27-Dec-2015

226 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

Examining Relationships

Prob. And Stat. CH.2.1Scatterplots

Page 2: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

• The relationships between two variables can be strongly influenced by other variables that are lurking in the background.• Because variation is everywhere, statistical relationships are overall tendencies, not ironclad rules. They allow individual exceptions.• To study a relationship between two variables, we measure both variables on the same individual.• We often think that one of the variables explains or influences the other.

Page 3: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

Response Variable, Explanatory Variable

A response variable measures an outcome of a study.

An explanatory variable explains or influences changes in a response variable.

Page 4: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

•A response variable measures an outcome of a study. (Independent Variables)•An explanatory variable explains or influences changes in a response variable. (Dependent Variables)•Because of the multiple meanings that the words “independent”, and “dependent”, have in statistics, we should try to avoid using them.•It is easier to identify explanatory and response variables when we actually set values of one variable in order to see how it affects another variable.

Page 5: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

When we don’t set the values of either variable but just observe both variables, there may or may not be explanatory and response variables. Whether there are depends on how we plan to use the data.

Page 6: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

EXAMPLE 2.1 The effects of alcohol

Alcohol has many effects on the body. One effect is a drop in body temperature. To study this effect, researchers give several different amounts of alcohol to mice, then measure the change in each mouse’s body temperature in the 15 minutes after taking the alcohol. Amount of alcohol is the explanatory variable, and change in body temperature is the response variable.

Page 7: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

EXAMPLE 2.2 SAT scores

Jim wants to know the average SAT math and verbal scores in the 51 states (including the District of Columbia) are related to each other. He doesn’t think that either score explains or causes the other. Jim has two related variables, and neither is an explanatory variable.Julie looks at the same data. She asks, “Can I predict a state’s SAT math score if I know its SAT verbal score?” Julie is treating the verbal score as the explanatory variable and the math score as the response variable.

Page 8: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

• Alcohol actually causes a change in body temperature.

• There is no cause-and –effect relationship between SAT math and verbal scores.

• Because the scores are closely related, we can use a state’s SAT verbal score to predict its math score.

• In section 2.3 we will look at predictions, that require us to identify an explanatory variable and a response variable

• Most statistical studies examine data on more than one variable.

• If we call one variable explanatory and the other response doesn’t mean that changes in one cause changes in the other.

Page 9: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

Statistical Principles

• First plot the data, then add numerical summaries.

• Look for overall patterns and deviations from those patterns.

• When the overall pattern is quite regular, use a compact mathematical model to describe it.

Page 10: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

•The most common way to display the relation between two quantitative variables is a scatterplot.

•Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis) of a scatterplot.

•Usually we call the explanatory variable x and the response variable y.

Page 11: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

SCATTERPLOT

A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

Page 12: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

EXAMINING A SCATTERPLOT

In any graph of data, look for the overall pattern and for striking deviations from that pattern.

You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.

An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.

Page 13: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

Analysis of SAT/ACT Scatterplots

Figure 2.1 shows a clear form: there are two distinct clusters of states with a gap between them. In the cluster at the right of the plot, 45% or more of high school graduates take the SAT, and the average scores are low. The states in the cluster at the left have higher SAT scores and lower percents of graduates taking the test. There are no clear outliers. That is, no points fall clearly outside the clusters.

Page 14: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

What explains the clusters? There are two widely used college entrance exams, the SAT and the America College Testing (ACT) exam. Each state favors one or the other. The left cluster in figure 2.1 contains the ACT states, and the SAT states make up the right cluster. In ACT states, most students who take the SAT are applying to a selective college that requires SAT scores. This select group of students has a higher average score than the much larger group of students who take the SAT in SAT states.

Page 15: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

The relationship in Figure 2.1 also has a clear direction: states in which a higher percent of students take the SAT tend to have lower average scores. This is a negative association between the two variables.

Page 16: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

POSITIVE ASSOCIATION, NEGATIVE ASSOCIATION

Two variables are positively associated when above-average values of one tend to accompany above-average values of the other and below-average values also tend to occur together.

Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa.

Page 17: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

The strength of a relationship in a scatterplot is determined by how closely the points follow a clear form. The overall relationship in Figure 2.1 is not strong-states with similar percents taking the SAT show quite a bit of scatter in their average scores. Here is an example of a stronger relationship with a clearer form.

Page 18: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

•Not all relationships are linear in form. They can be nonlinear.

•Not all relationships have a clear direction that we can describe as a positive association or a negative association.

Page 19: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

Adding categorical variables to scatterplots

Looking at SAT scores, we can add categorical variables for Southern and Non-southern states, adds a third variable to the scatterplot. Figure 2.3 uses two different colors to distinguish the difference.

Use different colors or symbols to plot points when you want to add a categorical variable to a scatterplot.

We can use different symbols if two or more values occupy the same point or represent more than one individual.

Page 20: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

Example 2.5 Is the South different?

Figure 2.3 enhances the scatterplot in Figure 2.1 by plotting the southern states in purple. (We took the South to be the states in the East South Central and South Atlantic regions.) Most of the southern states blend in with the rest of the country. Several southern states do lie at the lower edges of their clusters, along with the District of Columbia, which is a city rather than a state. Georgia, South Carolina, and West Virginia have lower SAT scores than we would expect from the percent of their high school graduates who take the examination.

Page 21: Examining Relationships Prob. And Stat. CH.2.1 Scatterplots

A scatterplot displays the form, direction, and strength of the relationship between two quantitative variables.