5-minute check on lesson 1-2 click the mouse button or press the space bar to display the answers....


Upload: brendan-buck

Post on 14-Dec-2015


Page 1

5-Minute Check on Lesson 1-2

Click the mouse button or press the Space Bar to display the answers.

1. What 4 terms are used to describe data sets or distributions?

2. Which type of graph can our calculators do (bar or histogram)?

3. How many classes should a histogram have?

4. What needs to be looked for in time-series graphs?

5. What is the major difference between a histogram and a stem-plot?

6. Name a possible graphical error in a histogram

Answers:

1. Shape, Outliers, Center, Spread (SOCS)

2. Histogram

3. Number of classes ≈ √(number of observations)

4. Seasonal trends

5. A histogram summarizes the data; a stem-plot keeps the individual data values

6. Overlapping categories
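The class-count rule of thumb in answer 3 can be sketched in Python. Rounding the square root up to a whole number of classes is an assumption (the slide gives only the square-root rule), and the helper name `suggested_classes` is hypothetical.

```python
import math


def suggested_classes(n_observations: int) -> int:
    """Rule of thumb: the number of histogram classes is about sqrt(n).

    Rounding up with math.ceil is an assumption, not part of the slide's rule.
    """
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.sqrt(n_observations))


print(suggested_classes(20))  # sqrt(20) is about 4.47, so 5 classes
print(suggested_classes(40))  # sqrt(40) is about 6.32, so 7 classes
```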

Page 2

Lesson 1 - 3

Describing Quantitative Data with Numbers

adapted from Mr. Molesky’s TPS 4E slides

Page 3

Objectives

• Calculate and interpret measures of center (mean, median, mode)

• Calculate and interpret measures of spread (IQR, standard deviation, range)

• Identify outliers using the 1.5 x IQR rule

• Make a boxplot

• Select appropriate measures of center and spread

• Use appropriate graphs and numerical summaries to compare distributions of quantitative variables

Page 4

Vocabulary

• Boxplot – graphs the five number summary and any outliers

• Degrees of freedom – the number of independent pieces of information that are included in your measurement

• Five-number summary – the minimum, Q1, Median, Q3, maximum

• Interquartile range (IQR) – the range of the middle 50% of the data; IQR = Q3 – Q1

• Mean – the average value (balance point); x-bar

• Median – the middle value (in an ordered list); M

• Mode – the most frequent data value

Page 5

Vocabulary cont

• Outlier – a data value that lies outside the interval [Q1 – 1.5 IQR, Q3 + 1.5 IQR]

• pth percentile – the value such that p percent of the observations (in an ordered list) fall at or below it

• Quartiles – the 25th, 50th, and 75th percentiles (Q1 – 25th; Q2 – 50th, or median; Q3 – 75th)

• Range – difference between the largest and smallest observations

• Resistant measure – a measure (statistic or parameter) that is not sensitive to the influence of extreme observations

• Standard deviation – the square root of the variance

• Variance – the average of the squared deviations from the mean (for a sample, the sum of the squared deviations is divided by n – 1)

Page 6

Measures of Center

Numerical descriptions of distributions begin with a measure of their “center”

If you could summarize the data with one number, what would it be?

Mean: The “average” value of a dataset

Median: The “middle” value of an ordered dataset
1. Arrange the observations in order from min to max.
2. Locate the middle observation; if n is even, average the two middle observations.
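The two steps for the median, together with the mean, can be written as short Python functions. This is an illustrative sketch (the function names are hypothetical, and it assumes a non-empty list); Python's standard `statistics.mean` and `statistics.median` do the same job.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data; average the two middle values when n is even."""
    ordered = sorted(values)                      # step 1: arrange from min to max
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                # odd n: a single middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle ones


scores = [7, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6]  # data from the Lesson 1-3a check
print(mean(scores))    # 9.25
print(median(scores))  # 8.0
```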

Page 7

Mean vs Median

The mean and the median are the most common measures of center

If a distribution is perfectly symmetric, the mean and the median are the same

The mean is not resistant to outliers

The mode, the data value that occurs the most often, is a common measure of center for categorical data

You must decide which number is the most appropriate description of the center...


Use the mean on symmetric data and the median on skewed data or data with outliers

Page 8

Skewed Left (tail to the left): mean substantially smaller than median (the tail pulls the mean toward it)

Mean < Median < Mode

[Figure: left-skewed distribution with the mode, median, and mean marked]

Page 9

Symmetric: mean roughly equal to median

Mean ≈ Median ≈ Mode

[Figure: symmetric distribution with the mode, median, and mean marked]

Page 10

Skewed Right (tail to the right): mean substantially greater than median (the tail pulls the mean toward it)

Mean > Median > Mode

[Figure: right-skewed distribution with the mode, median, and mean marked]

Page 11

Central Measures Comparisons

| Measure of central tendency | Computation | Interpretation | When to use |
|---|---|---|---|
| Mean | $\mu = (\sum x_i)/N$ (population); $\bar{x} = (\sum x_i)/n$ (sample) | Center of gravity | Data are quantitative and the frequency distribution is roughly symmetric |
| Median | Arrange the data in ascending order and divide the data set in half | Divides the data into the bottom 50% and top 50% | Data are quantitative and the frequency distribution is skewed |
| Mode | Tally the data to determine the most frequent observation | Most frequent observation | Data are categorical, or the most frequent observation is the desired measure of central tendency |

Page 12

Measuring Center: Example 1

• Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers. (Example, page 53)

10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

$\bar{x} = \dfrac{10 + 30 + 5 + 25 + \cdots + 40 + 45}{20} = 31.25$ minutes

Stemplot:
0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5

Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.

$M = \dfrac{20 + 25}{2} = 22.5$ minutes
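A stemplot like the one above can be generated with a few lines of Python. This sketch assumes non-negative whole numbers, uses the tens digit as the stem and the units digit as the leaf, and keeps empty stems; `stemplot` is a hypothetical helper. The last line checks the mean and median with the standard `statistics` module.

```python
import statistics
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows: tens digit as the stem, units digit as the leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):               # sorting puts each stem's leaves in order
        leaves[v // 10].append(str(v % 10))
    lo, hi = min(leaves), max(leaves)
    # Include empty stems so gaps in the distribution stay visible.
    return [f"{stem} | {''.join(leaves.get(stem, []))}".rstrip()
            for stem in range(lo, hi + 1)]


times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
for row in stemplot(times):
    print(row)
print(statistics.mean(times), statistics.median(times))  # 31.25 22.5
```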

Page 13

Example 2

Which of the following measures of central tendency are resistant?

1. Mean

2. Median

3. Mode

Not resistant

Resistant

Resistant

Page 14

Example 3

Given the following set of data:

70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51, 56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52

What is the mean?

What is the median?

What is the mode?

 

What is the shape of the distribution?

Mean: 51.125

Median: 51

Mode: 48, 51, 56

Shape: symmetric (tri-modal)
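The Example 3 answers can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The shape still has to be judged from a plot.

```python
import statistics

data = [70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51,
        56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52]

print(len(data))                           # 40 observations
print(statistics.mean(data))               # 51.125
print(statistics.median(data))             # 51.0 (average of the 20th and 21st values)
print(sorted(statistics.multimode(data)))  # [48, 51, 56]; each appears 3 times
```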

Page 15

Example 4

Given the following types of data and sample sizes, list the measure of central tendency you would use and explain why.

| Variable | Sample of 50 | Sample of 200 |
|---|---|---|
| Hair color | mode | mode |
| Height | mean | mean |
| Weight | mean | mean |
| Parent's income | median | median |
| Number of siblings | mean | mean |
| Age | mean | mean |

Does sample size affect your decision?

Not in this case, but a larger sample size might allow us to use the mean instead of the median.

Page 16

Day 1 Summary and Homework

• Summary
  – Four characteristics must be used to describe distributions (from histograms or similar charts):
    • Shape (uniform, symmetric, bi-modal, etc.)
    • Outliers (rule next lesson)
    • Center (mean, median, mode)
    • Spread (IQR, variance – next lesson)
  – The median is resistant to outliers; the mean is not!
  – Use the mean for symmetric data
  – Use the median for skewed data (or data with outliers)
  – Use the mode for categorical data

• Homework – pg 70–74: probs 79, 81, 83, 87, 89

Page 17

5-Minute Check on Lesson 1-3a

Click the mouse button or press the Space Bar to display the answers.

1. What are the two quantitative measures of center?

2. When do we use one versus the other?

3. Which one is resistant to outliers?

4. Which measure of center is used for qualitative data?

5. Find the mean, median and mode of the following data set: 7, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6

Mean and median

Mean for symmetric data and median for skewed

Median

Mode

Mean: 9.25, Median: 8, Mode: 8

Page 18

Measures of Spread

Variability is the key to Statistics. Without variability, there would be no need for the subject.

When describing data, never rely on center alone.

Measures of spread:

• Range {rarely used ... why?}

• Quartiles – Interquartile Range {IQR = Q3 – Q1}

• Variance and Standard Deviation {var and sx}

Like Measures of Center, you must choose the most appropriate measure of spread.

Page 19

Standard Deviation

Another common measure of spread is the Standard Deviation: a measure of the “average” deviation of all observations from the mean.

To calculate the standard deviation:
1. Calculate the mean, x̄.
2. Determine each observation's deviation, x – x̄.
3. Square each deviation.
4. “Average” the squared deviations by dividing their total by (n – 1). This quantity is the variance.
5. Take the square root of the variance to get the standard deviation.
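The five steps can be followed one at a time in Python. This is a sketch with hypothetical function names; `statistics.variance` and `statistics.stdev` in the standard library compute the same sample (n – 1) quantities. The usage lines reuse the metabolic-rate data from Example 1.16.

```python
import math


def sample_variance(values):
    """Total squared deviation from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n                   # step 1: the mean
    deviations = [x - xbar for x in values]  # step 2: each deviation x - xbar
    squared = [d * d for d in deviations]    # step 3: square each deviation
    return sum(squared) / (n - 1)            # step 4: "average" using n - 1


def sample_sd(values):
    """Step 5: the standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]  # metabolic rates (cal)
print(round(sample_variance(rates), 2))  # 35811.67
print(round(sample_sd(rates), 2))        # 189.24
```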

Page 20

Standard Deviation Properties

s measures spread about the mean and should be used only when the mean is used as the measure of center

s = 0 only when there is no spread/variability. This happens only when all observations have the same value. Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger

s, like the mean x-bar, is not resistant. A few outliers can make s very large

Page 21

Standard Deviation

Variance:

$$\text{var} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

Standard Deviation:

$$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Example 1.16 (p. 85 of TPS 3E): Metabolic Rates

1792 1666 1362 1614 1460 1867 1439

Page 22

Standard Deviation

1792 1666 1362 1614 1460 1867 1439

| x | x − x̄ | (x − x̄)² |
|---|---|---|
| 1792 | 192 | 36864 |
| 1666 | 66 | 4356 |
| 1362 | -238 | 56644 |
| 1614 | 14 | 196 |
| 1460 | -140 | 19600 |
| 1867 | 267 | 71289 |
| 1439 | -161 | 25921 |
| Totals | 0 | 214870 |

Metabolic rates: mean = 1600

Total squared deviation = 214870

Variance: var = 214870 / 6 ≈ 35811.67

Standard deviation: s = √35811.67 ≈ 189.24 cal

What does this value, s, mean?

Page 23

The Interquartile Range (IQR)

– A measure of center alone can be misleading.
– A useful numerical description of a distribution requires both a measure of center and a measure of spread.

How to Calculate the Quartiles and the Interquartile Range

To calculate the quartiles:

1) Arrange the observations in increasing order and locate the median M.

2) The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.

3) The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.

The interquartile range (IQR) is defined as:

IQR = Q3 – Q1
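The three steps translate directly into Python. In this sketch (hypothetical function names), when n is odd the median is left out of both halves, as "to the left" and "to the right" of the median imply. Other software, for example NumPy's `percentile` with its default interpolation, may report slightly different quartiles.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    ordered = sorted(values)              # step 1: increasing order
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = ordered[: n // 2]             # step 2: observations left of the median
    upper = ordered[(n + 1) // 2 :]       # step 3: observations right of the median
    return median(lower), median(ordered), median(upper)


q1, m, q3 = quartiles([19, 22, 23, 23, 23, 26, 26, 27, 28, 29, 30, 31, 32])
print(q1, m, q3, q3 - q1)  # 23.0 26 29.5 6.5
```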

Page 24

Quartiles

Quartiles Q1 and Q3 represent the 25th and 75th percentiles.

To find them, order data from min to max.

Determine the median - average if necessary.

The first quartile is the middle of the ‘bottom half’.

The third quartile is the middle of the ‘top half’.

19 22 23 23 23 26 26 27 28 29 30 31 32 (n = 13): median = 26, Q1 = 23, Q3 = 29.5

45 68 74 75 76 82 82 91 93 98 (n = 10): median = 79, Q1 = 74, Q3 = 91

Page 25

Example 1

Which of the following measures of spread are resistant?

1. Range

2. Variance

3. Standard Deviation

4. Interquartile Range (IQR)

Not Resistant

Not Resistant

Not Resistant

Resistant
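A quick way to see which measures of spread are resistant is to recompute them with and without extreme values. This sketch uses the junior weights from Example 4 later in this lesson and drops the two heaviest students; the helper names are hypothetical, and the quartiles follow the median-of-halves method.

```python
import statistics


def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def spread_summary(values):
    """Range, sample standard deviation, and IQR (median-of-halves quartiles)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    return {
        "range": ordered[-1] - ordered[0],
        "sd": statistics.stdev(ordered),  # sample SD, divides by n - 1
        "iqr": q3 - q1,
    }


weights = [121, 126, 130, 132, 143, 137, 141, 144, 148, 205,
           125, 128, 131, 133, 135, 139, 141, 147, 153, 213]  # Example 4 weights (lbs)
trimmed = [w for w in weights if w not in (205, 213)]          # drop the two largest values

for label, data in (("with 205 and 213", weights), ("without them", trimmed)):
    s = spread_summary(data)
    print(f"{label}: range = {s['range']}, s = {s['sd']:.2f}, IQR = {s['iqr']}")
```

Dropping the two heaviest students cuts the range from 92 to 32 and s from about 23.9 to about 8.8, while the IQR only moves from 15 to 13.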

Page 26

Example 2

• Travel times to work for 20 randomly selected New Yorkers

Example, page 57

10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85

M = 22.5, Q1 = 15, Q3 = 42.5

IQR = Q3 – Q1 = 42.5 – 15 = 27.5 minutes

Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.

Page 27

Determining Outliers

InterQuartile Range “IQR”: the distance between Q1 and Q3. It is a resistant measure of spread because it only measures the middle 50% of the data.

IQR = Q3 - Q1 {width of the “box” in a boxplot}

1.5 IQR Rule: If an observation falls more than 1.5 IQRs above Q3 or below Q1, it is an outlier.

“1.5 IQR Rule”

Why 1.5? According to John Tukey, 1 IQR seemed like too little and 2 IQRs seemed like too much...

Page 28

Outliers: 1.5 IQR Rule

To determine outliers:

1. Find 5 Number Summary

2. Determine IQR

3. Compute 1.5 × IQR

4. Set up “fences”

A. Lower Fence: Q1 - (1.5 IQR)

B. Upper Fence: Q3 + (1.5 IQR)

5. Observations “outside” the fences are outliers.
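The five steps can be sketched in Python and applied to the New York travel times. The function names are hypothetical; quartiles use the median-of-halves method, and "outside" is taken to mean strictly beyond a fence.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def iqr_outliers(values):
    """Apply the 1.5 x IQR rule; return (lower fence, upper fence, outliers)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])                    # steps 1-2: quartiles and IQR
    q3 = median(ordered[(n + 1) // 2 :])
    step = 1.5 * (q3 - q1)                            # step 3: 1.5 x IQR
    lower_fence, upper_fence = q1 - step, q3 + step   # step 4: the fences
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]  # step 5
    return lower_fence, upper_fence, outliers


times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
print(iqr_outliers(times))  # (-26.25, 83.75, [85])
```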

Page 29

Example 2 part 2

• In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.

Definition:

The 1.5 x IQR Rule for Outliers

Call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.

Example, page 57

In the New York travel time data, we found Q1=15 minutes, Q3=42.5 minutes, and IQR=27.5 minutes.

For these data, 1.5 x IQR = 1.5(27.5) = 41.25

Q1 - 1.5 x IQR = 15 – 41.25 = -26.25

Q3+ 1.5 x IQR = 42.5 + 41.25 = 83.75

Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an outlier. The only such value in the data is the 85-minute travel time.

[Figure: stemplot of the travel-time data]

Page 30

5-Number Summary, Boxplots

The 5 Number Summary provides a reasonably complete description of the center and spread of a distribution

We can visualize the 5 Number Summary with a boxplot.

MIN Q1 MED Q3 MAX

min=45 Q1=74 med=79 Q3=91 max=98

[Figure: boxplot of quiz scores on an axis from 45 to 100, with possible outliers labeled “Outlier?”]

Page 31

Drawing a Boxplot

The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.

• Draw and label a number line that includes the range of the distribution.

• Draw a central box from Q1 to Q3.

• Note the median M inside the box.

• Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers; plot any outliers as separate points.
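The numbers needed to draw such a boxplot can be computed before sketching it. This is a minimal sketch with a hypothetical `boxplot_parts` helper; it assumes the 1.5 × IQR rule decides which points are outliers and uses median-of-halves quartiles.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_parts(values):
    """Box, median line, whisker ends, and outliers for a boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    step = 1.5 * (q3 - q1)
    inside = [x for x in ordered if q1 - step <= x <= q3 + step]
    return {
        "box": (q1, q3),                                      # central box from Q1 to Q3
        "median": median(ordered),                            # line drawn inside the box
        "whiskers": (inside[0], inside[-1]),                  # smallest and largest non-outliers
        "outliers": [x for x in ordered if x not in inside],  # plotted individually
    }


times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]
print(boxplot_parts(times))  # whiskers (5, 65); 85 is plotted as an outlier
```

For the quiz scores (45 68 74 75 76 82 82 91 93 98), the same function gives a box from 74 to 91, whiskers from 68 to 98, and 45 as an outlier.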

Page 32

Example 2 part 3

• Boxplot

M = 22.5, Q1 = 15, Q3 = 42.5, Min = 5

10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85

[Figure: boxplot of travel time (minutes) on an axis from 0 to 90]

Max = 85. Recall that this is an outlier by the 1.5 x IQR rule.

Page 33

Example 3

Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue. Twenty-seven bars having a taste-test rating of at least “fair” were listed, and calories per bar were included. Calories vary quite a bit, partly because bars are not of uniform size. Just how many calories should an ice cream bar contain?

 

Construct a boxplot for the data above.

342 377 319 353 295 234 294 286

377 182 310 439 111 201 182 197

209 147 190 151 131 151

Page 34

Example 3 - Answer

Q1 = 182, Q2 = 221.5, Q3 = 319

Min = 111, Max = 439, Range = 328

IQR = 137, UF = 524.5, LF = -23.5

[Figure: boxplot of calories per bar on an axis from 100 to 500]

Page 35

Example 4

The weights of 20 randomly selected juniors at MSHS are recorded below:

 

 

a) Construct a boxplot of the data

b) Determine if there are any mild or extreme outliers

c) Comment on the distribution

121 126 130 132 143 137 141 144 148 205

125 128 131 133 135 139 141 147 153 213

Page 36

Example 4 - Answer

Q1 = 130.5, Q2 = 138, Q3 = 145.5

Min = 121, Max = 213, Range = 92

IQR = 15, UF = 168, LF = 108

Mean = 143.6, StDev = 23.91

[Figure: boxplot of weight (lbs) on an axis from 100 to 220, with 205 and 213 marked * as extreme outliers (more than 3 IQR above Q3)]

Shape: somewhat symmetric; Outliers: 2 extreme outliers; Center: median = 138; Spread: IQR = 15
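Part (b) can be sketched in Python. The sketch assumes the convention used on this slide: a value more than 1.5 IQR beyond a quartile is an outlier, and one more than 3 IQR beyond is extreme, so anything between the two fences counts as mild. The helper name `classify_outliers` is hypothetical.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def classify_outliers(values):
    """Return (mild, extreme): beyond 1.5 IQR but within 3 IQR, and beyond 3 IQR."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # 1.5 x IQR fences
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)      # 3 x IQR fences
    extreme = [x for x in ordered if x < outer[0] or x > outer[1]]
    mild = [x for x in ordered
            if (x < inner[0] or x > inner[1]) and x not in extreme]
    return mild, extreme


weights = [121, 126, 130, 132, 143, 137, 141, 144, 148, 205,
           125, 128, 131, 133, 135, 139, 141, 147, 153, 213]
print(classify_outliers(weights))  # ([], [205, 213])
```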

Page 37

Example 5

Consider the following test scores for a small class:

75 76 82 93 45 68 74 82 91 98

Plot the data and describe the SOCS:

Why use the median to describe the “center”? Why use the IQR to describe the “spread”?

[Figure: dot plot of the scores on an axis from 40 to 100]

Shape: skewed left

Outliers: 45 (it falls below Q1 – 1.5 × IQR = 74 – 25.5 = 48.5)

Center: M = 79 (the median, because the data are skewed)

Spread: IQR = 91 – 74 = 17 (the IQR, because the data are skewed)

Page 38

Choosing Measures of Center & Spread

• We now have a choice between two descriptions for center and spread:
  – Mean and standard deviation
  – Median and interquartile range

• The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers.

• Use the mean and standard deviation only for reasonably symmetric distributions that don't have outliers.

• NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA!


Page 39

Using the TI-83

• Enter the test data into a list, L1
  – STAT, EDIT: enter the data into L1

• Calculate the 5-number summary
  – Press STAT, go over to CALC, select 1-Var Stats, and press 2nd 1 (L1)

• Use 2nd Y= (STAT PLOT) to graph the boxplot
  – Turn Plot1 ON
  – Select BOX PLOT (4th option, first in second row)
  – Xlist: L1
  – Freq: 1
  – Press ZOOM 9:ZoomStat to graph the boxplot

• Copy the graph with appropriate labels and titles

Page 40

Day 2 Summary and Homework

• Summary
  – Sample variance is found by dividing by (n – 1) so that it is an unbiased estimator of the population variance (because the population mean, μ, is estimated by the sample mean, x̄)
  – The larger the standard deviation, the more dispersion the distribution has
  – Boxplots can be used to check for outliers and to examine distributions
  – Use comparative boxplots for two datasets
  – Identifying a distribution's shape from boxplots or histograms is subjective!
  – Use the standard deviation with the mean and the IQR with the median

• Homework – pg 82: prob 33; pg 89: probs 40, 41; pg 97: probs 45, 46
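The claim that dividing by (n – 1) makes the sample variance unbiased can be checked exactly on a tiny example. This sketch assumes independent draws (sampling with replacement) from a made-up three-value population, averages the sample variance over every equally likely ordered sample of size 3, and uses exact fractions so no rounding is involved. The `variance` helper is hypothetical.

```python
from fractions import Fraction
from itertools import product


def variance(values, divisor):
    """Sum of squared deviations from the mean of `values`, divided by `divisor`."""
    xbar = Fraction(sum(values), len(values))
    return sum((x - xbar) ** 2 for x in values) / divisor


population = [1, 2, 6]                          # hypothetical tiny population
sigma2 = variance(population, len(population))  # population variance divides by N

n = 3
samples = list(product(population, repeat=n))   # all 27 equally likely ordered samples
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)

print(sigma2, avg_div_n_minus_1, avg_div_n)     # 14/3 14/3 28/9
```

On average, the (n – 1) version hits the population variance exactly, while dividing by n comes out low by a factor of (n – 1)/n.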