algebra 1 and statistics…. teacher reference: descriptive statistics and analyses


UNIT 3 – S.ID.A.1, S.ID.A.2, S.ID.A.3

Quantitative and Categorical Variables

Quantitative Variable: Takes numerical values for which arithmetic operations such as adding and averaging make sense. Examples: cholesterol levels, salaries, numerical test grades, etc.

Categorical Variable: places an individual into one of several groups or categories. Examples: car color, gender, zip code, drink size, etc.

Quantitative Variables

Discrete: Takes only a finite number of values between any two numbers on a number line. These are usually counts. Examples: number of siblings, number of states visited, shoe size, number of pets.

Continuous: Can take an infinite number of values between any two numbers on a number line. These are measurements. Examples: height, weight, temperature, number of ounces in a Starbucks coffee.

Distribution – Center

Median (M): The midpoint of a distribution: the number such that half the observations are smaller and the other half are larger.

How to find the median of a distribution

1. Arrange all observations in order of size, from smallest to largest.

2. If the number of observations (n) is odd, the median M is the center observation in the ordered list.

3. If the number of observations (n) is even, the median M is the mean of the two center observations in the ordered list.
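The three steps above translate directly into a short Python function. This is a minimal sketch: the function name and the small lists are hypothetical examples, not data from the activities. (Python's built-in statistics.median follows the same odd/even rule.)

```python
def median(values):
    """Return the median M of a list of numbers using the three steps above."""
    if not values:
        raise ValueError("median needs at least one observation")
    ordered = sorted(values)   # Step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # Step 2: n is odd, take the center observation
        return ordered[mid]
    # Step 3: n is even, take the mean of the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```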

Mean (x̄, read “x-bar”): The arithmetic average.

How to find the mean of a distribution

1. Add all observations in the distribution.

2. Divide the sum by the number of observations, n.

$$\bar{x} = \frac{\sum x_i}{n}$$
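As a quick illustration of the formula, the sketch below adds the observations and divides by n. The sample values are the first four gas mileages from the car-weight activity; the function name is a hypothetical choice.

```python
def mean(values):
    """Return x-bar: the sum of the observations divided by n."""
    if not values:
        raise ValueError("mean needs at least one observation")
    total = sum(values)   # Step 1: add all observations
    n = len(values)
    return total / n      # Step 2: divide the sum by n


print(mean([28, 25, 27, 29]))   # 27.25
```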

Distribution – Spread

Interquartile Range (IQR): IQR = Q3 – Q1

First Quartile (Q1): The median of the observations whose position in the ordered list is to the left of the location of the overall median.

Third Quartile (Q3): The median of the observations whose position in the ordered list is to the right of the location of the overall median.
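The quartile definitions above amount to “take the median of each half of the ordered list.” A minimal sketch follows, assuming that when n is odd the overall median is left out of both halves (the positions strictly to its left and right). Other software, such as Python's statistics.quantiles, may use a different quartile convention and give slightly different values. The data are the eight highway gas mileages from the car-weight activity.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the 'median of each half' method."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]         # positions left of the median's location
    upper = ordered[(n + 1) // 2 :]   # positions right of the median's location
    return median(lower), median(ordered), median(upper)


q1, m, q3 = quartiles([28, 25, 27, 29, 18, 21, 20, 26])
print(q1, m, q3, q3 - q1)   # 20.5 25.5 27.5 7.0  (last value is the IQR)
```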

Standard Deviation (s): The square root of the average of the squared deviations of the observations from their mean, where this “average” divides by n − 1 rather than n. In plain terms, it measures roughly how far a typical observation falls from the mean. If the deviations from the mean are small, the standard deviation will be small.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
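The formula can be checked step by step: find x̄, square each deviation, add them, divide by n − 1, and take the square root. This is a minimal sketch using the eight gas mileages from the car-weight activity; the function name is hypothetical.

```python
import math


def sample_sd(values):
    """Sample standard deviation s, dividing the sum of squared deviations by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    squared_devs = [(x - x_bar) ** 2 for x in values]   # (x_i - x-bar)^2
    return math.sqrt(sum(squared_devs) / (n - 1))


print(round(sample_sd([28, 25, 27, 29, 18, 21, 20, 26]), 3))   # 4.062
```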

Outlier: Any individual observation that falls outside the overall pattern of the graph.

Outlier Rule: Any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is considered an outlier.
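The outlier rule only needs Q1 and Q3, so a sketch can take them as inputs and report every value outside the two fences. The first example uses the gas-mileage quartiles (Q1 = 20.5, Q3 = 27.5); the second list and its quartiles (Q1 = 12, Q3 = 18) are hypothetical, chosen so that one value is flagged.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """List the observations below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]


mpg = [28, 25, 27, 29, 18, 21, 20, 26]
print(outlier_fences(20.5, 27.5))        # (10.0, 38.0)
print(find_outliers(mpg, 20.5, 27.5))    # []

# Hypothetical data: median 15, Q1 = 12, Q3 = 18, so the fences are 3 and 27
print(find_outliers([10, 12, 13, 15, 16, 18, 40], 12, 18))   # [40]
```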

UNIT 3 – S.ID.B.5, S.ID.B.6

Two-way Frequency Table

Two-way Table: Describes two categorical variables, the row variable (pass/fail) and the column variable (gender).

          male   female
pass        84       92
fail        11        6

Relative Frequency: A frequency converted into a proportion or percent of the total.

Marginal Distribution: The row and column totals.

Two-way relative frequency table by gender, with each count divided by its column total (the total row shows that each column sums to 1.0):

Conditional Relative Frequency: A distribution that refers only to the individuals who satisfy a given condition. Example: What percent of male students passed? Of the 84 + 11 = 95 male students, 84 passed, and 84 ÷ 95 ≈ 0.884, or 88.4%.

          male    female
pass      0.884   0.939
fail      0.116   0.061
total     1.0     1.0
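The column-by-column division behind the relative frequency table can be sketched as follows, using the pass/fail counts by gender from the two-way table. The dictionary layout and function name are illustrative choices, not part of the original lesson.

```python
# Counts from the pass/fail by gender table
counts = {
    "male": {"pass": 84, "fail": 11},
    "female": {"pass": 92, "fail": 6},
}


def conditional_by_column(table):
    """Divide each count by its column (gender) total so every column sums to 1."""
    result = {}
    for group, cells in table.items():
        column_total = sum(cells.values())
        result[group] = {outcome: n / column_total for outcome, n in cells.items()}
    return result


rel = conditional_by_column(counts)
for group, cells in rel.items():
    print(group, {outcome: round(p, 3) for outcome, p in cells.items()})
```

The printed values match the table: 0.884 and 0.116 for males, 0.939 and 0.061 for females.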

UNIT 3 – S.ID.C.7, S.ID.C.8, S.ID.C.9

Residual: The difference between an observed value of the response variable and the value predicted by the regression line.

residual = observed y – predicted y

A residual plot is a scatterplot of the regression residuals against the independent variable (x-values). Residual plots help us assess the fit of a regression line.

If the regression line captures the overall relationship between x and y, the residuals should have no systematic pattern.

A curved pattern in a residual plot shows that the relationship between x and y is non-linear.

Rate of Change

Slope in the context of a problem: For each one-unit increase in the x-value, the predicted y-value increases (or decreases, if the slope is negative) by the amount of the slope, on average.

Intercept

It is the predicted value of y when x = 0. Sometimes the y-intercept does not have any meaning in the context of the data.

Linear Model

We find the best-fitting line using least-squares regression. This model is

$$\hat{y} = ax + b$$

where a is the slope of the model and b is the y-intercept.
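For reference, the least-squares slope and intercept come from the standard formulas a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − a·x̄. The sketch below applies them to the eight cars from the car-weight activity and reproduces the activity's equation and Smart Car prediction. It is a hand-rolled illustration; the function name is hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)   # must be nonzero (x-values not all equal)
    a = sxy / sxx             # slope
    b = y_bar - a * x_bar     # intercept: the line passes through (x-bar, y-bar)
    return a, b


weights = [3489, 3955, 3345, 3085, 4915, 4159, 4289, 3992]   # pounds
mpg = [28, 25, 27, 29, 18, 21, 20, 26]                       # highway mpg
a, b = least_squares(weights, mpg)
print(round(a, 4), round(b, 2))   # -0.0065 49.57
print(round(a * 1600 + b, 1))     # 39.2  (Smart Car prediction)
```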

Using Technology

To display the correlation coefficient on the TI-84, go into the CATALOG, select DiagnosticOn, and press ENTER twice.

To find the regression line on the TI-84, press STAT, arrow over to CALC, select 4:LinReg(ax+b), and press ENTER twice.

Understanding Data as Linear Models Activity

1. Which variable is the explanatory variable and which is the response variable? Explain your reasoning.

explanatory variable: car weight – we believe the weight of a car explains the gas mileage

response variable: gas mileage – we believe how much gas a car uses is a response to (or depends on) the weight of the car.

2. Use this data to make a scatterplot.

[Scatterplot: gas mileage (mpg) vs. car weight (pounds)]

3. Find the linear regression equation and graph this on your plot.

y = -0.0065x + 49.57

4. What is the rate of change for this line? In context of the data, describe the rate of change.

rate of change = slope = -0.0065

For each one-pound increase in car weight, gas mileage decreases by about 0.0065 mpg, on average.

5. What is the correlation coefficient? In context of the data, describe the correlation coefficient.

correlation coefficient = r = -0.935

There is a strong, negative, linear relationship between car weight and fuel efficiency.

6. A Smart Car weighs about 1,600 pounds. Showing your rationale, predict its gas mileage.

prediction: -0.0065(1600) + 49.57 = 39.2

We would expect a Smart Car to get about 39.2 mpg.

7. What is the residual value for a car weighing 3489 pounds?

observed y: 28 mpg

predicted y: -0.0065(3489) + 49.57 ≈ 26.9 mpg

residual = observed y – predicted y = 28 mpg – 26.9 mpg = 1.1 mpg

8. Find the residual value for each of the car weights.

Car Weight (pounds)   Gas Mileage (MPG, highway)   Residual Value
3489                  28                            1.06
3955                  25                            1.08
3345                  27                           -0.87
3085                  29                           -0.56
4915                  18                            0.31
4159                  21                           -1.59
4289                  20                           -1.75
3992                  26                            2.32

Which car weight has the largest residual value? Show this on your scatterplot. Which car weight has the smallest residual value? Show this on your scatterplot.

Largest residual: x = 3992 pounds (residual 2.32)

Smallest residual (closest to the line): x = 4915 pounds (residual 0.31)

Describe what a residual value from your data means.

A residual value is the vertical (y) distance an observation point is from the prediction (regression) line.

The larger the absolute value of a residual, the farther the point lies from the prediction line. Positive residuals belong to points above the regression line, and negative residuals to points below it.

9. Use your car weights and residual values to make a residual plot. Analyze your residual plot.

[Residual plot: residual vs. car weight]

Because there is no clear pattern in the residual plot, we can conclude that a linear model is an appropriate fit for our data (mpg vs. car weight).

Correlation does not equal causation!

10. Examine this data and describe the correlation.

There is a VERY strong, positive, linear relationship between our puppy’s weight and Alaska’s snowshoe price.

Discuss the moral of this example, “be careful what you infer from your statistical analysis.”

BE SURE YOUR RELATIONSHIP MAKES SENSE!

What other variables could be involved in this relationship?

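Time is a likely lurking variable. If our puppy was born at the beginning of snowshoe season, it would make sense that its weight and the price of snowshoes would both increase over the same period, even though neither one causes the other.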

Representations of Data Activity

1. Find the minimum, quartile 1, median, quartile 3, and maximum for the weights of the players. Use this information to construct a boxplot.

[Boxplot of player weights; values marked on the plot: 165, 220, 310]

2. Find the minimum, quartile 1, median, quartile 3, and maximum for the heights of the players. Use this information to construct a boxplot.

[Boxplot of player heights; values marked on the plot: 71, 78, 90]

3. Find the minimum, quartile 1, median, quartile 3, and maximum for the heights of all the players except for Yao Ming. Use this information to construct a boxplot.

[Boxplot of player heights without Yao Ming; values marked on the plot: 71, 78, 86]

4. Compare the boxplots from Questions 2 and 3. How has the plot changed?

The right “whisker” and the box both got shorter when we removed Yao Ming’s height. Removing the largest value reduced the spread of our data.


5. Did the minimum or the maximum change? Why or why not? Be sure to relate your reasons to the data you used to construct your plot.

The minimum stayed the same, but the maximum changed because we removed the largest (maximum) observation.

6. Did the median change? Why or why not? Be sure to relate your reasons to the data you used to construct your plot.

The median stayed the same because the middle observation did not change.

7. Did the upper or lower quartile change? Why or why not? Be sure to relate your reasons to the data you used to construct your plot.

Because we removed only the largest observation, the lower half of the data did not change, so neither did the lower quartile. The upper half, however, lost one data point, so the upper quartile shifted slightly. Both the maximum and the upper quartile changed.

Relative, Joint, and Marginal Frequencies Activity

1. Divide the numbers in the frequency table by the total to obtain relative frequencies as decimals. Record the results in the table below.

preferred food at game   hot dogs   hamburgers   pizza   total
relative frequency       0.45       0.3          0.25    1.0

2. How can you check to see if you have accurately converted frequencies to relative frequencies?

If the sum of the relative frequencies is 1 (or 100%), then we have correctly converted.

3. Explain why the number in the total column of a relative frequency table is always 1 or 100%.

The total accounts for all (100%) of the observations, and every observation falls in exactly one category. If the total were more than 100%, at least one category would have too much frequency; if it were less than 100%, at least one category would have too little.

4. What does the data tell us about the most preferred food to eat at a baseball game?

Hot dogs, because that category has the largest relative frequency at 45%.

5. Fill in the missing marginal frequencies (the entries in the row and column total).

6. Highlight the joint frequencies (entries in the body of the table).

7. Find the grand total, which is the sum of the row totals as well as the sum of the column totals. Write the grand total in the lower-right corner of the two-way table.

            hot dogs   hamburgers   pizza   total
child           8           1          2      11
teenager        5           3          5      13
adult           5           8          3      16
total          18          12         10      40

8. Where have you seen the row totals before?

They are the counts from Carla’s original table (18 hot dogs, 12 hamburgers, 10 pizza), which did not consider the age of the respondent.

9. In terms of Carla’s survey, what does the grand total represent?

It is the total number of people that Carla surveyed.

10. What does the data tell us about the preference of food for children at a baseball game?

Most children (8 of 11, about 73%) prefer to eat hot dogs at baseball games.

11. How does this compare with the adults?

Adults most often prefer hamburgers: half of the adults polled (8 of 16) chose hamburgers while at baseball games.

12. Make a relative frequency table for each age group (row variable).

13. What is the conditional probability that a child will choose pizza?

2 ÷ 11 ≈ 18%

14. What is the conditional probability that an adult will choose hot dogs?

5 ÷ 16 ≈ 31%

            hot dogs   hamburgers   pizza   total
child         0.73        0.09       0.18     1
teenager      0.385       0.23       0.385    1
adult         0.31        0.5        0.19     1
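The calculations in Carla's survey activity (marginal totals, grand total, relative frequencies, and conditional relative frequencies for each age group) can be sketched in a few lines of Python. The counts are the joint frequencies from the two-way table; the variable names are illustrative.

```python
foods = ["hot dogs", "hamburgers", "pizza"]
joint = {  # joint frequencies from Carla's two-way table
    "child": [8, 1, 2],
    "teenager": [5, 3, 5],
    "adult": [5, 8, 3],
}

# Marginal frequencies: row totals (per age group) and column totals (per food)
row_totals = {age: sum(counts) for age, counts in joint.items()}
col_totals = [sum(counts[j] for counts in joint.values()) for j in range(len(foods))]
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals)   # both ways of adding must agree

# Relative frequency of each food, ignoring age; these must sum to 1
relative = [c / grand_total for c in col_totals]

# Conditional relative frequencies: divide each row by its own row total
conditional = {age: [c / row_totals[age] for c in counts] for age, counts in joint.items()}

print(row_totals)                          # {'child': 11, 'teenager': 13, 'adult': 16}
print(col_totals, grand_total)             # [18, 12, 10] 40
print(relative)                            # [0.45, 0.3, 0.25]
print(round(conditional["child"][2], 2))   # 0.18, P(pizza | child)
print(round(conditional["adult"][0], 2))   # 0.31, P(hot dogs | adult)
```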
