chapter 1: exploring data - quia 1: exploring data sym 4th edition notes 5 ex. 1 construct a bar...

20
Chapter 1: Exploring Data SYM 4 th Edition Notes 1 Chapter 1: Exploring Data Objectives: Students will: Use a variety of graphical techniques to display a distribution. These should include bar graphs, pie charts, stemplots, histograms, ogives, time plots, and Boxplots Interpret graphical displays in terms of the shape, center, and spread of the distribution, as well as gaps and outliers Use a variety of numerical techniques to describe a distribution. These should include mean, median, quartiles, five- number summary, interquartile range, standard deviation, range, and variance Interpret numerical measures in the context of the situation in which they occur Learn to identify outliers in a data set AP Outline Fit: I. Exploring Data: Describing patterns and departures from patterns. A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram, cumulative frequency plot covered in chapter 2) 1. Center and spread 2. Clusters and gaps 3. Outliers and other unusual features 4. Shape B. Summarizing distributions of univariate data 1. Measuring center: median, mean 2. Measuring spread: range, interquartile range, standard deviation 3. Measuring position: quartiles, percentiles, standardized scores (z-scores covered in chapter 2) 4. Using boxplots 5. The effect of changing units on summary measures (covered in chapter 2) C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots) 1. Comparing center and spread: within group, between group variation 2. Comparing clusters and gaps 3. Comparing outliers and other unusual features 4. Comparing shapes E. Exploring categorical data 1. Frequency tables and bar charts 2. Marginal and joint frequencies for two-way tables (covered in chapter 5) 3. Conditional relative frequencies and association 4. Comparing distributions with bar charts. Calculator Functions in this Chapter: 1. Using Lists for data 2. Graphing a. Scatter plots b. Histograms c. Box plots 3. Summary Statistics a. Mean b. Standard Deviation c. Five-Number Summary

Upload: doanhanh

Post on 30-Mar-2018

220 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 1

Chapter 1: Exploring Data

Objectives: Students will:

Use a variety of graphical techniques to display a distribution. These should include bar graphs, pie charts, stemplots,

histograms, ogives, time plots, and Boxplots

Interpret graphical displays in terms of the shape, center, and spread of the distribution, as well as gaps and outliers

Use a variety of numerical techniques to describe a distribution. These should include mean, median, quartiles, five-

number summary, interquartile range, standard deviation, range, and variance

Interpret numerical measures in the context of the situation in which they occur

Learn to identify outliers in a data set

AP Outline Fit:

I. Exploring Data: Describing patterns and departures from patterns.

A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram,

cumulative frequency plot covered in chapter 2)

1. Center and spread

2. Clusters and gaps

3. Outliers and other unusual features

4. Shape

B. Summarizing distributions of univariate data

1. Measuring center: median, mean

2. Measuring spread: range, interquartile range, standard deviation

3. Measuring position: quartiles, percentiles, standardized scores (z-scores covered in chapter 2)

4. Using boxplots

5. The effect of changing units on summary measures (covered in chapter 2)

C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots)

1. Comparing center and spread: within group, between group variation

2. Comparing clusters and gaps

3. Comparing outliers and other unusual features

4. Comparing shapes

E. Exploring categorical data

1. Frequency tables and bar charts

2. Marginal and joint frequencies for two-way tables (covered in chapter 5)

3. Conditional relative frequencies and association

4. Comparing distributions with bar charts.

Calculator Functions in this Chapter:

1. Using Lists for data

2. Graphing

a. Scatter plots

b. Histograms

c. Box plots

3. Summary Statistics

a. Mean

b. Standard Deviation

c. Five-Number Summary

Page 2: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 2

What You Will Learn:

A. Displaying Distribution

1. Make a stemplot of the distribution of a quantitative variable.

2. Trim the numbers or split stems as needed to make an effective stemplot

3. Make a histogram of the distribution of a quantitative variable

4. Construct and interpret an ogive of a set of quantitative data

B. Inspecting Distributions (Quantitative Variables)

1. Look for the overall pattern and for major deviations from the pattern

2. Assess from a dotplot, stemplot, or histogram whether the shape of a distribution is roughly symmetric,

distinctly skewed, or neither.

3. Assess whether the distribution has one or more major modes

4. Describe the overall pattern by giving numerical measures of center and spread in addition to a verbal

description of shape

5. Decide which measures of center and spread are more appropriate: the mean and standard deviation (especially

for symmetric distributions) or the five-number summary (especially for skewed distributions)

6. Recognize outliers

C. Time Plots

1. Make a time plot of data, with the time of each observation on the horizontal axis and the value of the observed

variable on the vertical axis

2. Recognize strong trends or other patterns in a time plot

D. Measuring Center

1. Find the mean, x-bar, of a set of observations

2. Find the median M of a set of observations

3. Understand that the median is more resistant (less affected by extreme observations) than the mean.

4. Recognize that skewness in a distribution moves the mean away from the median toward the long fall (tail)

E. Measuring Spread

1. Find the quartiles Q1 and Q3 for a set of data

2. Give the five-number summary and draw a boxplot, assess center, spread, symmetry, and skewness from a

boxplot.

3. Determine outliers

4. Using a calculator or software, find the standard deviation, s, for a set of observations

5. Know the basic properties of s:

a) s ≥ 0 always;

b) s = 0 only when all observations are identical; s increases as the spread increases;

c) s has the same units as the original measurements;

d) s is increased by outliers or skewness

F. Comparing Distributions

1. Use side-by-side bar graphs to compare distributions of categorical data

2. Make back-to-back stemplots or side-by-side Boxplots to compare distributions of quantitative variables

3. Write narrative comparisons of the shape, center, spread, and outliers for two or more quantitative distributions

Page 3: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 3

Section 1.0: Data Analysis: Making Sense of Data

Objectives: Students will be able to:

Identify the individuals and variables in a set of data

Classify variables as categorical or quantitative

Identify units of measurement for a quantitative variable

Vocabulary:

Individuals – objects described by a set of data; maybe people, animals or things

Variable – any characteristic of an individual; can take on different values for different individuals

Categorical variable – places an individual into one of several groups or categories

Quantitative variable – takes numerical values; for which it makes sense to find an average!

Distribution – tells us what values the variable takes on and how often it takes these values

Inference – using a sample of data to infer (to draw conclusions) about a larger group of data

Key Concept:

Data Exploration:

a) Begin by examining each variable by itself

b) Study the relationship between the variables

c) Use plots or graphs

d) Add numerical summaries

Activity:

Hiring Discrimination on pg 5

Read the scenario and answer the following question:

Reverse discrimination: Yes or No

Break into pairs and follow the instructions in the activity

One person work with the cards

One person acts as a recorder of the data and fills in the following table:

Captains Trial 1 Trial 2 Trial 3 Trial 4 Trial 5

Male

Female

Did your opinion change after the class’s data was displayed?

Examine the homework from last night:

What variables on the information sheet were categorical? Quantitative?

What variables on Mr. Starnes’s survey were categorical? Quantitative?

Summary:

A data set contains information on a number of individuals

Information is often values for one or more variables

Variables can be categorical or quantitative

Distribution of a variable describes what values it can take on and how often

Homework: pg 7-8, problems 1, 3, 5, 7, 8

Page 4: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 4

Section 1.1: Analyzing Categorical Data

Objectives: Students will be able to:

Make a bar graph of the distribution of a categorical variable or, in general, to compare related quantities

Recognize when a pie chart can and cannot be used

Identify what makes some graphs deceptive

From two-way table of counts, answer questions involving marginal and conditional distributions

Describe the relationship between two categorical variables by computing appropriate conditional distributions

Construct bar graphs to display the relationship between two categorical variables

Vocabulary:

Association – two variables are associated if specific values of one variable tend to occur in common with specific values

of the other variable

Bar graph – displays the distribution of a categorical variable

Bimodal – a distribution whose shape has two peaks (modes)

Conditional distribution – describes the values of a variable among individuals who have a specific value of another

variable; separate distributions for each value of the other variable.

Histogram – breaks range of values into classes and displays their frequencies

Frequency – counts of data in a class

Frequency table – table of frequencies

Marginal distribution – the distribution of just one variable in a two-way table (found at the right or bottom)

Modes – major peaks in a distribution

Ogive – relative cumulative frequency graph

Pie chart – chart that emphasize each category’s relation to the whole

Roundoff error – errors associated with decimal inaccuracies

Simpson’s paradox – an association between two variables that holds for each individual value of a third variable can be

changed or even reversed with the data for all values of the third variable are combined.

Two-way table – describes two categorical variables, organizing counts according to a row variable and a column

variable

Unimodal – a distribution whose shape with a single peak (mode)

Key Concepts:

Categorical Charts

0

2

4

6

8

10

12

14

Ba

ck

Wri

st

Elb

ow

Hip

Sh

ou

lde

r

Kn

ee

Ha

nd

Gro

in

Ne

ck

Rehab

Rehab

Back

40%

Wrist

7%Elbow

3%

Hip

7%

Shoulder

13%

Knee

17%

Hand

7%

Groin

3%

Neck

3% Pie ChartBar Chart

Page 5: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 5

Ex. 1 Construct a bar graph and a pie chart from the following data:

Format Nr of Stations Pct

Adult contemporary 1,556 11.2

Adult standards 1.196 8.6

Contemporary Hits 569 4.1

Country 2,066 14.9

News/Talk/Info 2,179 15.7

Oldies 1,060 7.7

Religious 2,014 14.6

Rock 869 6.3

Spanish Language 750 5.4

Other formats 1,579 11.4

Total 13,838 99.9

Ex. 2 Below are the results of a survey of young adults by gender:

Young adults by gender and chance of getting rich

Female Male Total

Almost no chance 96 98 194

Some chance, but probably not 426 286 712

A 50-50 chance 696 720 1416

A good chance 663 758 1421

Almost certain 486 597 1083

Total 2367 2459 4826

a) What are the variables described by this two-way table?

b) How many young adults were surveyed?

Ex. 3 Why can’t we get totals in the super-powers data?

Homework: pg 22-6, problems 12, 16, 20, 23, 24

Page 6: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 6

Section 1.2: Displaying Quantitative Data with Graphs

Objectives: Students will be able to:

Make a dotplot or stemplot to display small sets of data

Describe the overall pattern (shape, outliers – major departures from the pattern, center, and spread) of a distribution

Make a histogram with a reasonable choice of classes

Identify the shape of a distribution from a dotplot, stemplot or histogram (roughly symmetric or skewed – right/left)

Identify the number of modes of a distribution

Interpret histograms

Vocabulary:

Back-to-back stemplot – two distributions plotted with a common stem

Bimodal – a distribution whose shape has two peaks (modes)

Dotplot – each data point is marked as a dot above a number line

Histogram – breaks range of values into classes and displays their frequencies

Frequency – counts of data in a class

Modes – major peaks in a distribution

Multimodal – a distribution whose shape has more than two peaks (modes)

Ogive – relative cumulative frequency graph

Seasonal variation – a regular rise and fall in a time plot

Skewed – if smaller or larger values from the center form a tail

Splitting stems – divides step into 0-4 and 5-9

Stemplot – includes actual numerical values in a plot that gives a quick picture of the distribution

Symmetric – if values smaller and larger of the center are mirror images of each other

Time plot – plots a variable against time on the horizontal scale of the plot

Trimming – removes the last digit or digits before making a stemplot

Unimodal – a distribution whose shape with a single peak (mode)

Key Concepts:

Dot plots: Maintains the raw data, while histograms do not maintain the raw data

Best used when the data sets are small

Stem and Leaf plots: Maintains the raw data, while histograms do not maintain the raw data

Best used when the data sets are small

Page 7: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 7

Histograms and Bar Graphs: Bar graphs have bars touching; histograms don’t

The number of classes, k, to be constructed can be roughly approximated by

k = number of observations

To determine the width of a class use w = maximum - minimum

k

and always round up to the same decimal units as the original data.

Other Charts: Charts for both types of data

00.05

0.1

0.150.2

0.250.3

0.350.4

0.45

Ba

ck

Wri

st

Elb

ow

Hip

Sh

ou

lde

r

Kn

ee

Ha

nd

Gro

in

Ne

ck

Pe

rce

nt

Rehab

00.05

0.1

0.150.2

0.250.3

0.350.4

0.45

Ba

ck

Kn

ee

Sh

ou

lde

r

Wri

st

Ha

nd

Hip

Elb

ow

Gro

in

Ne

ck

Pe

rce

nt

Rehab

Pareto ChartRelative Frequency Chart

0

0.2

0.4

0.6

0.8

1

1.2B

ac

k

Wri

st

Elb

ow

Hip

Sh

ou

lde

r

Kn

ee

Ha

nd

Gro

in

Ne

ck

Pe

rce

nt

Rehab

Cumulative Frequency Chart

Plot in the upper right corner is a Pareto chart. It is the same as the relative frequency chart; except the categories are in

relative frequency order (from largest to smallest) from left to right. This graph came from the Total Quality Management

(TQM) era in the middle to late 1980’s. The bottom chart is also known as an ogive (a confusing chart for students!).

Cautions:

• Label all axes and title all graphs

• Histogram rectangles touch each other; rectangles in bar graphs do not touch.

• Can’t have class widths that overlap

• Raw data can be retrieved from the stem-and-leaf plot; but a frequency distribution of histogram of continuous data

summarizes the raw data

• Only quantitative data can be described as skewed left, skewed right or symmetric (uniform or bell-shaped)

Uniform Mound-like (Bell-Shaped)

Skewed Left (-- tail)Bi-Modal Skewed Right (-- tail)

Frequency Distributions

Page 8: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 8

With the following data

a) Construct a stem graph (in example 1 do a back-to-back [comparative] stem plot)

b) Construct a histogram

Ex. 1 The ages (measured by last birthday) of the employees of Dewey, Cheatum and Howe are listed below.

Office A: 22 31 21 49 26 42 42 30 28 31 39 39

Office B: 20 37 32 36 35 33 45 47 49 38 28 48

Ex. 2 Below are times obtained from a mail-order company's shipping records concerning time from receipt of order to

delivery (in days) for items from their catalogue?

3 7 10 5 14 12 6 2 9 22 25 11

5 7 12 10 22 23 14 8 5 4 7 13

27 31 13 21 6 8 3 10 19 12 11 8

Homework: pg 42-50; problems 37, 41, 43, 49, 62, 63

Page 9: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 9

Section 1.3: Describing Quantitative Data with Numbers

Objectives: Students will:

Calculate and interpret measures of center (mean, median, mode)

Calculate and interpret measures of spread (IQR, standard deviation, range)

Identify outliers using the 1.5 x IQR rule

Make a boxplot

Select appropriate measures of center and spread

Use appropriate graphs and numerical summaries to compare distributions of quantitative variables

Vocabulary:

Boxplot – graphs the five number summary and any outliers

Degrees of freedom – the number of independent pieces of information that are included in your measurement

Five-number summary – the minimum, Q1, Median, Q3, maximum

Interquartile range – the range of the middle 50% of the data; (IQR) – IQR = Q3 – Q1

Mean – the average value (balance point); x-bar

Median – the middle value (in an ordered list); M

Mode – the most frequent data value

Outlier– a data value that lies outside the interval [Q1 – 1.5 IQR, Q3 + 1.5 IQR]

Pth

percentile – p percent of the observations (in an ordered list) fall below at or below this number

Quartile – multiples of 25th

percentile (Q1 – 25th

; Q2 –50th

or median; Q3 – 75th

)

Range – difference between the largest and smallest observations

Resistant measure – a measure (statistic or parameter) that is not sensitive to the influence of extreme observations

Standard Deviation– the square root of the variance

Variance – the average of the squares of the deviations from the mean

Key Concepts:

Measure of

Central

Tendency

Computation Interpretation When to use

Mean

μ = (∑xi ) / N

x‾ = (∑xi) / n

Center of gravity

Data are quantitative and

frequency distribution is

roughly symmetric

Median

Arrange data in ascending order;

middle value or average of the two

middle values (if even number)

Divides into

bottom 50% and top 50%

Data are quantitative and

frequency distribution is

skewed

Mode Tally data to determine most

frequent observation Most frequent observation

Data are qualitative or the

most frequent observation is

the desired measure of

central tendency

Center: The mean and the median are the most common measures of center

If a distribution is perfectly symmetric, the mean and the median are the same

The mean is not resistant to outliers

The mode, the data value that occurs the most often, is a common measure of

center for categorical data

Use the mean on symmetric data and the median on skewed data or data with

outliers

Spread:

Standard deviation is the most common measure of spread and is used only

with the mean.

Inter-quartile range (IQR = Q3 – Q1) is used with medians

Range (max – min value) is also a measure of spread.

Distributions Shape

Skewed Left:

Mean substantially smaller

than median

Symmetric:

Mean roughly equal to median

Skewed Right:

Mean substantially greater

than median

Mean ≈ Median ≈ Mode

Mean > Median > Mode

Mean < Median < Mode

Page 10: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 10

Distribution Shape Based on Boxplots:

a. If the median is near the center of the box and each horizontal line is of approximately equal length, then the

distribution is roughly symmetric

b. If the median is to the left of the center of the box or the right line is substantially longer than the left line,

then the distribution is skewed right

c. If the median is to the right of the center of the box or the left line is substantially longer than the right line,

then the distribution is skewed left

Remember identifying a distribution from boxplots or histograms is subjective!

Why Use a Boxplot?

A boxplot provides an alternative to a histogram, a dotplot, and a stem-and-leaf plot. Among the advantages of a

boxplot over a histogram are ease of construction and convenient handling of outliers. In addition, the

construction of a boxplot does not involve subjective judgments, as does a histogram. That is, two individuals

will construct the same boxplot for a given set of data - which is not necessarily true of a histogram, because the

number of classes and the class endpoints must be chosen. On the other hand, the boxplot lacks the details the

histogram provides.

Dotplots and stemplots retain the identity of the individual observations; a boxplot does not. Many sets of data

are more suitable for display as boxplots than as a stemplot. A boxplot as well as a stemplot are useful for

making side-by-side comparisons.

Five-number summary

Min Q1 M Q3 Max

smallest

value

largest

value

Boxplot

First, Second and Third Quartiles

(Second Quartile is the Median, M)

[ ] *

Outlier

Lower

Fence

Upper

Fence

Smallest Data Value > Lower Fence Largest Data Value < Upper Fence

(Min unless min is an outlier) (Max unless max is an outlier)

Median Notes:

If the number of observations, n, is odd, then the median is the center observation in the ordered list

If the number of observations is even, then the median is the average of the two center observations

Mean Notes:

The mean is pulled toward the tail of the distribution, away from the median, when the distribution is skewed

Outliers can also pull the mean toward them

Page 11: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 11

Example 1: Which of the following are resistant measures of central tendency:

Mean, Range

Median or Variance

Mode? Standard Deviation

IQR

Example 2: Given the following set of data:

70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51,

56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52

What is the mean? What is the range?

What is the median? What is the variance?

What is the mode? What is the standard deviation?

What is the shape of the distribution? What is the IQR?

What is the Q1?

What is the Q3?

What is the IQR?

What is the upper fence?

What is the lower fence?

Are there any outliers?

Example 3: Given the following types of data and sample sizes, list the measure of central tendency you would use and

explain why?

Sample of 50 Sample of 200

Hair color

Height

Weight

Parent’s Income

Number of Siblings

Age

Does sample size affect your decision?

Page 12: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 12

Example 4: Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue.

Twenty-seven bars having a taste-test rating of at least “fair” were listed, and calories per bar was included. Calories vary

quite a bit partly because bars are not of uniform size. Just how many calories should an ice cream bar contain?

342 377 319 353 295 234 294 286 377 182 310

439 111 201 182 197 209 147 190 151 131 151

a) Construct a boxplot of the data

b) Determine if there are any outliers.

Example 5: The weights of 20 randomly selected juniors at MSHS are recorded below:

121 126 130 132 134 137 141 144 148 205

125 128 131 133 135 139 141 147 153 213

a) Construct a boxplot of the data

b) Determine if there are any outliers.

Example 6: Using the data from example #5

a) Change the weight from pounds to kilograms and add 2 kg (special uniform)

b) Get summary statistics and compare with example 5

c) Draw a box plot

Homework: pg 70-74; problems 79, 81, 85, 89, 91, 95, 99, 105

Page 13: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 13

Chapter 1 Review

Objectives: Students will be able to:

Summarize the chapter

Define the vocabulary used

Know and be able to discuss all sectional and chapter knowledge objectives

Be able to do all sectional and chapter construction objectives

Successfully answer any of the review exercises

Vocabulary: None new

Summary:

Always include labels, scales, legend (if needed) and title on any graph you are asked to make

Supporting graphs (that you make as supporting evidence) don’t have to have as much detail

Numerical summaries do not fully describe the shape of a distribution; Always plot your data!

Median and IQR are resistant to outliers; mean and standard deviations are not

Boxplots do not tell you everything about the distribution (especially about the specific type)

IQR is a number (the length of the box part in the boxplot)

Describe all distributions by their Shape, Outliers, Center and Spread (SOCS)

o Shape – symmetric (uniform or mound-shape) or skewed (left or right); uni-, bi- or multi-modal

o Outliers – or any unusual data values or patterns

o Center – mean (symmetric) or median(skewed) or mode (for categorical data)

o Spread – tied to center; standard deviation or IQR (or range for categorical data)

Always use comparative language (much greater, less, about the same) when comparing distributions SOCS

Homework: pg 75-81; problems T1 1-15

Plot your data

Dotplot, Stemplot, Histogram

Interpret what you see

Shape, Outliers, Center, Spread

Choose numerical summary

x and s, or median and IQR

Page 14: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 14

1. The upper or third quartile for grades on the first calculus test was 85%. Your friend, who has not taken statistics, scored

90% on the test. Explain to your friend how her grade compares to others in her class.

2. Suppose you have test scores of 72%, 91%, 86%, and 95% in your chemistry class. What score do you need to make on

the next test in order to have an 85% average?

3. In the computational formula for standard deviation, you sometimes use n and sometimes use (n – 1). Under what

circumstances should you use n?

4. Fill in the following blanks:

a. We studied two measures of central tendency, mean and median. Which of these is the more resistant measure?

_________________ Explain why this measure is more resistant.

b. We studied three measures of spread: standard deviation, interquartile range, and range. Which of these is the most

resistant measure? ________________

5. In an experiment designed to determine the effect of a drug on reaction time, a subject is asked to press a button

whenever a light flashes. The reaction times (in milliseconds) for ten trials are:

96 101 112 138 93 99 107 93 95 100

a. Make a stem and leaf plot to display this information. Be sure to include unit information (a legend).

b. What information about the distribution does the stem and leaf plot provide? Be thorough in your response.

6. Data were collected on a sample of Deerfield Academy students. Several of the variables are listed below. Next to each

variable, put all of the following words that correctly describe the variable:

Categorical quantitative discrete continuous

(a) Advisor ______________________________

(b) Height _______________________________

(c) Number of courses student is taking this term ______________________________________

7. A teacher returned the first test to the five students in a small class. She reported that the median score was 85 and the

mean score was 84. The student with the lowest score (62) realized that the teacher had incorrectly calculated her grade

and that the correct grade was 72. Assuming that this is still be the lowest score for the seminar students, when the

teacher recomputed the summary statistics, the median will equal _____________ and the mean will equal

________________ .

Page 15: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 15

8. The histogram below displays weight increases (in pounds) for a sample of pigs

fed a certain diet. Assume that bars include right endpoints.

How many pigs were in this sample? ___________

Estimate the median weight increase for the pigs in this sample. __________

What proportion of these pigs had a weight increase exceeding 20 pounds?

_________________

Briefly (but completely) describe the shape of this distribution

9. As I drove through Connecticut several weeks ago, I obtained a sample of prices for a gallon of unleaded gasoline at

service stations I passed. Four of these are provided here: $3.09, $3.15, $3.19, $3.29. Use the definition and show work

below to find the mean and standard deviation of these prices. Round answers to the nearest cent.

Mean

Standard deviation

10. The Los Angeles Times reported interest rates for savings accounts at a sample of California banks. Summary statistics

are provided below:

Minimum = 3.15% Q1 = 3.25% Median = 3.31% Q3 = 3.33% Maximum = 4.35%

Determine whether the data set has any outliers (check for extremely low and high values). Show work and provide an

explanation to support your answer.

Page 16: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 16

5 minute Reviews

Lesson 1-0:

1. Gender is an example of what type of variable?

2. Age is an example of what type of variable?

3. Your zip code is an example of what type of variable?

4. What was the percentage that was our boundary for something being unusual?

Lesson 1-1:

1. To organize data on two categorical variables use a:

2. Row totals and column totals are called:

3. When we fix the value of one categorical variable and look at the distribution of the other variable it is called:

4. A variable not in the data that influences variables in the collected data is called:

5. The four-steps in statistical analysis are:

Lesson 1-2A:

1. Dot plots and stem-plots have what advantages:

2. Dot plots and stem-plots are impractical when:

3. What pieces of SOCS can be seen in dot and stem-plots?

4. Compare the following distributions:

Office A Ages of Personnel Office B

1, 2, 6, 8 2 0, 8

0, 1, 1, 9, 9 3 2, 3, 5, 6, 7, 8

2, 2, 9 4 5, 7, 8, 9

Page 17: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 17

Lesson 1-2B:

1. What 4 terms are used to describe data sets or distributions?

2. Which type of graph can our calculators do (bar or histogram)?

3. How many classes should a histogram have?

4. What needs to be looked for in time-series graphs?

5. What is the major difference between a histogram and a stem-plot?

6. Name a possible graphical error in a histogram

Lesson 1-3A:

1. What are the two quantitative measures of center?

2. When do we use one versus the other?

3. Which one is resistant to outliers?

4. Which measure of center is used for qualitative data?

5. Find the mean, median and mode of the following data set: 7, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6

Page 18: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 18

Homework:

pg 7-8, problems 1, 3, 5, 7, 8

1. Cat: type of wood and water repellent, paint color Quan: paint thickness, weathering time

3. a. stat students completed survey b. Cat: gender, handedness, fav music; Quan: ht, homework time, total value of money

c. Female, right handed 58 inches tall, 60 minutes on homework, alternative music and 76 cents in change

5. Quan: percentage rates for retention and graduation rates; Cat: region of country, school type

7. B 8. C

pg 22-6, problems 12, 16, 20, 23, 24

12. You need to know total deaths to compute an “other” to get to 100%

16. a. Movie attendance seems to drop off as people get older; b. No, given percentages represent fractions of different age

groups rather than parts from a single whole. c. We don’t know how many people are in each age group

20. a. 5375 students total; 18.68% smoke b. marginal dist:

Neither One Both

Count 1356 2239 1780

% 25.23 41.66 33.12

23. Americans are much more likely to choose white or pearl and read, while Europeans are more likely to choose silver,

black or gray

24. b. heights of the bars in the first graph are different for every color except silver; biggest differences in white or pearl

Second graph shows black are most popular choice for luxury and white with SUVs

pg 42-50; problems 37, 41, 43, 49, 62, 63

37. Roughly symmetric, center of 6 and a range of 8 with no apparent outliers

41. a. tail must be to the left b. Fewer older coins in people’s pockets

43. Both distributions are roughly symmetric with a range of about 20 and a cluster to the left. Internal center is 21 while the

external is lower at 17.5. Ratings are on average higher for the internal

49. a. Most students seem to round to either 10 minute increments or ½ or 1 hour increments. Two students claimed to have

studied 5 and 6 hours (not likely) b. Yes; center for women is apx 175, which is greater than the 120 minutes for men

62. Should be near 10 for each of the six values (extremely rare for them all to be ten!!)

63. a. round-off error b. relative frequency diagrams would be best since much men than women c. Both histograms are

skewed to the right and generally women’s salaries are lower than men’s. Peak for women is between 20 and 25K, while

men’s peak is between 25 and 30K. Range is about the same.

Page 19: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 19

pg 70-74; problems 79, 81, 85, 89, 91, 95, 99, 105

79. Mean is 85. Add scores and divide by 14

81. a. Median is the average of the 7th

and 8th

values (84+86)/2 = 85 b. Median is 84, but mean drops to 79.33. Mean is not

resistant to outliers

85. Total payroll = 30 Million. With Median you can’t calculate the sum of all 25 salaries!!

89. a. IQR = 91 – 78 = 13 (middle 50% has a spread of 13 points) b. No outliers. No values below LF (78 – 1.5*13 = 58.5)

or above UF (91 + 1.5*13 = 110.5)

91. Boxplot is skewed right with one high outlier. Half of students sent less than 10 texts a day, way less than the average of

58 texts per day (1742/30) claimed by the article.

95. a. Stock varied between -3.5% and 3% b. Median return for stocks was about 0.1% and for real estate about 0%. c.

Stock fund has much more variability (chances to make big positive or negative)

99. a. Since mean is much larger than the median, then it would appear to be skewed right. b. The stand dev = 21.70 and

represents the average distance between individual amounts and the mean. c. Q1 = 19.27 Q3 = 45.4 so IQR = 45.4 – 19.27 =

26.13. Any points outside the fences (LF = 19.27 – 1.5*26.13 = -19.925 and UF = 45.4 + 1.5*26.13 = 84.595) would be

outliers; since max value is 93.34, then at least one high outlier.

105. Women have a higher center (mean or median) than men. Women have one high side outlier, but still have a smaller

spread (by standard dev, IQR or range) than the men. With the exception of the outlier, women’s distribution is symmetric,

but men’s has more right skewness.

pg 75-81; problems T1 1-15

1. d

2. e

3. b

4. b

5. c

6. c

7. b

8. c

9. e

10. b

11. d

12. b. Q1 = 30 Q3 = 77; IQR = 77 – 30 = 47; 151 is an outlier because it falls outside the UF(77+1.5*47 = 147.5). No values

below 0 (LF = 30 – 1.5*47 = -40.5) so all above lower fence

13. d. There appears to be an association. Those with diabetes have a much higher rate of babies with birth defects than non

or pre diabetics

14. a. Longest any battery lasted was between 550-559 hours b. Brand X has a higher minimum lifetime (for people who are

risk averse) c. Brand Y has a higher median lifetime and would probably appeal to more people.

15. Both distributions are reasonably symmetric; AL has no outliers, but NL has two outliers (29, 31). All NL centers (mean

and median) are less than AL; but AL has more variability (because of higher standard deviation and range)

Page 20: Chapter 1: Exploring Data - Quia 1: Exploring Data SYM 4th Edition Notes 5 Ex. 1 Construct a bar graph and a pie chart from the following data: Format Nr of Stations Pct Adult contemporary

Chapter 1: Exploring Data

SYM 4th

Edition Notes 20

Mr. Starnes’s Infamous AP Statistics Survey

This is an anonymous survey for our AP Statistics class. Do not put your name on it or show it to anyone

else! I will not share information in any way that reveals who you are.

Amount of sleep you got last night: ____________ hours Gender: ________

# of siblings (brothers/sisters) you have: __________ Hair color: __________

Height: ___________(in inches) Cumulative GPA: __________

SAT Math score: ___________ SAT CR score: ___________

# of AP classes you are taking this year: __________ Birth date: ___________________

Favorite fast-food restaurant: _____________________ Left- or right-handed: ____________

Ounces of soda consumed yesterday: __________ Pulse rate: ___________

Days since you last consumed food from a fast-food restaurant: ___________

Do you believe that fast-food restaurants should be held responsible for obesity resulting from eating too much

fast-food? Circle: Yes or No

Time spent yesterday: on a computer __________

on the Internet __________

watching TV ___________

Thinking back to the most recent time you went to the bathroom; did you wash your hands immediately

afterward? Circle: Yes or No

Which of the following best describes what percent of the time you wash your hands immediately after using the

bathroom? 100% 75%–99% 50%–74% 25%–49% 0%–24%

Choose one of the following numbers. Then circle your choice. 1 2 3 4

Randomly choose one of the letters S or Q. Circle your choice.

Imagine a sequence of 10 tosses of a fair penny. Write down a sequence of outcomes that you believe is likely to

occur. Use H for heads and T for tails.

Guess your teacher’s age: _______