chapter 1: exploring data - quia 1: exploring data sym 4th edition notes 5 ex. 1 construct a bar...
TRANSCRIPT
Chapter 1: Exploring Data
SYM 4th
Edition Notes 1
Chapter 1: Exploring Data
Objectives: Students will:
Use a variety of graphical techniques to display a distribution. These should include bar graphs, pie charts, stemplots,
histograms, ogives, time plots, and Boxplots
Interpret graphical displays in terms of the shape, center, and spread of the distribution, as well as gaps and outliers
Use a variety of numerical techniques to describe a distribution. These should include mean, median, quartiles, five-
number summary, interquartile range, standard deviation, range, and variance
Interpret numerical measures in the context of the situation in which they occur
Learn to identify outliers in a data set
AP Outline Fit:
I. Exploring Data: Describing patterns and departures from patterns.
A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram,
cumulative frequency plot covered in chapter 2)
1. Center and spread
2. Clusters and gaps
3. Outliers and other unusual features
4. Shape
B. Summarizing distributions of univariate data
1. Measuring center: median, mean
2. Measuring spread: range, interquartile range, standard deviation
3. Measuring position: quartiles, percentiles, standardized scores (z-scores covered in chapter 2)
4. Using boxplots
5. The effect of changing units on summary measures (covered in chapter 2)
C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots)
1. Comparing center and spread: within group, between group variation
2. Comparing clusters and gaps
3. Comparing outliers and other unusual features
4. Comparing shapes
E. Exploring categorical data
1. Frequency tables and bar charts
2. Marginal and joint frequencies for two-way tables (covered in chapter 5)
3. Conditional relative frequencies and association
4. Comparing distributions with bar charts.
Calculator Functions in this Chapter:
1. Using Lists for data
2. Graphing
a. Scatter plots
b. Histograms
c. Box plots
3. Summary Statistics
a. Mean
b. Standard Deviation
c. Five-Number Summary
Chapter 1: Exploring Data
SYM 4th
Edition Notes 2
What You Will Learn:
A. Displaying Distribution
1. Make a stemplot of the distribution of a quantitative variable.
2. Trim the numbers or split stems as needed to make an effective stemplot
3. Make a histogram of the distribution of a quantitative variable
4. Construct and interpret an ogive of a set of quantitative data
B. Inspecting Distributions (Quantitative Variables)
1. Look for the overall pattern and for major deviations from the pattern
2. Assess from a dotplot, stemplot, or histogram whether the shape of a distribution is roughly symmetric,
distinctly skewed, or neither.
3. Assess whether the distribution has one or more major modes
4. Describe the overall pattern by giving numerical measures of center and spread in addition to a verbal
description of shape
5. Decide which measures of center and spread are more appropriate: the mean and standard deviation (especially
for symmetric distributions) or the five-number summary (especially for skewed distributions)
6. Recognize outliers
C. Time Plots
1. Make a time plot of data, with the time of each observation on the horizontal axis and the value of the observed
variable on the vertical axis
2. Recognize strong trends or other patterns in a time plot
D. Measuring Center
1. Find the mean, x-bar, of a set of observations
2. Find the median M of a set of observations
3. Understand that the median is more resistant (less affected by extreme observations) than the mean.
4. Recognize that skewness in a distribution moves the mean away from the median toward the long fall (tail)
E. Measuring Spread
1. Find the quartiles Q1 and Q3 for a set of data
2. Give the five-number summary and draw a boxplot, assess center, spread, symmetry, and skewness from a
boxplot.
3. Determine outliers
4. Using a calculator or software, find the standard deviation, s, for a set of observations
5. Know the basic properties of s:
a) s ≥ 0 always;
b) s = 0 only when all observations are identical; s increases as the spread increases;
c) s has the same units as the original measurements;
d) s is increased by outliers or skewness
F. Comparing Distributions
1. Use side-by-side bar graphs to compare distributions of categorical data
2. Make back-to-back stemplots or side-by-side Boxplots to compare distributions of quantitative variables
3. Write narrative comparisons of the shape, center, spread, and outliers for two or more quantitative distributions
Chapter 1: Exploring Data
SYM 4th
Edition Notes 3
Section 1.0: Data Analysis: Making Sense of Data
Objectives: Students will be able to:
Identify the individuals and variables in a set of data
Classify variables as categorical or quantitative
Identify units of measurement for a quantitative variable
Vocabulary:
Individuals – objects described by a set of data; maybe people, animals or things
Variable – any characteristic of an individual; can take on different values for different individuals
Categorical variable – places an individual into one of several groups or categories
Quantitative variable – takes numerical values; for which it makes sense to find an average!
Distribution – tells us what values the variable takes on and how often it takes these values
Inference – using a sample of data to infer (to draw conclusions) about a larger group of data
Key Concept:
Data Exploration:
a) Begin by examining each variable by itself
b) Study the relationship between the variables
c) Use plots or graphs
d) Add numerical summaries
Activity:
Hiring Discrimination on pg 5
Read the scenario and answer the following question:
Reverse discrimination: Yes or No
Break into pairs and follow the instructions in the activity
One person work with the cards
One person acts as a recorder of the data and fills in the following table:
Captains Trial 1 Trial 2 Trial 3 Trial 4 Trial 5
Male
Female
Did your opinion change after the class’s data was displayed?
Examine the homework from last night:
What variables on the information sheet were categorical? Quantitative?
What variables on Mr. Starnes’s survey were categorical? Quantitative?
Summary:
A data set contains information on a number of individuals
Information is often values for one or more variables
Variables can be categorical or quantitative
Distribution of a variable describes what values it can take on and how often
Homework: pg 7-8, problems 1, 3, 5, 7, 8
Chapter 1: Exploring Data
SYM 4th
Edition Notes 4
Section 1.1: Analyzing Categorical Data
Objectives: Students will be able to:
Make a bar graph of the distribution of a categorical variable or, in general, to compare related quantities
Recognize when a pie chart can and cannot be used
Identify what makes some graphs deceptive
From two-way table of counts, answer questions involving marginal and conditional distributions
Describe the relationship between two categorical variables by computing appropriate conditional distributions
Construct bar graphs to display the relationship between two categorical variables
Vocabulary:
Association – two variables are associated if specific values of one variable tend to occur in common with specific values
of the other variable
Bar graph – displays the distribution of a categorical variable
Bimodal – a distribution whose shape has two peaks (modes)
Conditional distribution – describes the values of a variable among individuals who have a specific value of another
variable; separate distributions for each value of the other variable.
Histogram – breaks range of values into classes and displays their frequencies
Frequency – counts of data in a class
Frequency table – table of frequencies
Marginal distribution – the distribution of just one variable in a two-way table (found at the right or bottom)
Modes – major peaks in a distribution
Ogive – relative cumulative frequency graph
Pie chart – chart that emphasize each category’s relation to the whole
Roundoff error – errors associated with decimal inaccuracies
Simpson’s paradox – an association between two variables that holds for each individual value of a third variable can be
changed or even reversed with the data for all values of the third variable are combined.
Two-way table – describes two categorical variables, organizing counts according to a row variable and a column
variable
Unimodal – a distribution whose shape with a single peak (mode)
Key Concepts:
Categorical Charts
0
2
4
6
8
10
12
14
Ba
ck
Wri
st
Elb
ow
Hip
Sh
ou
lde
r
Kn
ee
Ha
nd
Gro
in
Ne
ck
Rehab
Rehab
Back
40%
Wrist
7%Elbow
3%
Hip
7%
Shoulder
13%
Knee
17%
Hand
7%
Groin
3%
Neck
3% Pie ChartBar Chart
Chapter 1: Exploring Data
SYM 4th
Edition Notes 5
Ex. 1 Construct a bar graph and a pie chart from the following data:
Format Nr of Stations Pct
Adult contemporary 1,556 11.2
Adult standards 1.196 8.6
Contemporary Hits 569 4.1
Country 2,066 14.9
News/Talk/Info 2,179 15.7
Oldies 1,060 7.7
Religious 2,014 14.6
Rock 869 6.3
Spanish Language 750 5.4
Other formats 1,579 11.4
Total 13,838 99.9
Ex. 2 Below are the results of a survey of young adults by gender:
Young adults by gender and chance of getting rich
Female Male Total
Almost no chance 96 98 194
Some chance, but probably not 426 286 712
A 50-50 chance 696 720 1416
A good chance 663 758 1421
Almost certain 486 597 1083
Total 2367 2459 4826
a) What are the variables described by this two-way table?
b) How many young adults were surveyed?
Ex. 3 Why can’t we get totals in the super-powers data?
Homework: pg 22-6, problems 12, 16, 20, 23, 24
Chapter 1: Exploring Data
SYM 4th
Edition Notes 6
Section 1.2: Displaying Quantitative Data with Graphs
Objectives: Students will be able to:
Make a dotplot or stemplot to display small sets of data
Describe the overall pattern (shape, outliers – major departures from the pattern, center, and spread) of a distribution
Make a histogram with a reasonable choice of classes
Identify the shape of a distribution from a dotplot, stemplot or histogram (roughly symmetric or skewed – right/left)
Identify the number of modes of a distribution
Interpret histograms
Vocabulary:
Back-to-back stemplot – two distributions plotted with a common stem
Bimodal – a distribution whose shape has two peaks (modes)
Dotplot – each data point is marked as a dot above a number line
Histogram – breaks range of values into classes and displays their frequencies
Frequency – counts of data in a class
Modes – major peaks in a distribution
Multimodal – a distribution whose shape has more than two peaks (modes)
Ogive – relative cumulative frequency graph
Seasonal variation – a regular rise and fall in a time plot
Skewed – if smaller or larger values from the center form a tail
Splitting stems – divides step into 0-4 and 5-9
Stemplot – includes actual numerical values in a plot that gives a quick picture of the distribution
Symmetric – if values smaller and larger of the center are mirror images of each other
Time plot – plots a variable against time on the horizontal scale of the plot
Trimming – removes the last digit or digits before making a stemplot
Unimodal – a distribution whose shape with a single peak (mode)
Key Concepts:
Dot plots: Maintains the raw data, while histograms do not maintain the raw data
Best used when the data sets are small
Stem and Leaf plots: Maintains the raw data, while histograms do not maintain the raw data
Best used when the data sets are small
Chapter 1: Exploring Data
SYM 4th
Edition Notes 7
Histograms and Bar Graphs: Bar graphs have bars touching; histograms don’t
The number of classes, k, to be constructed can be roughly approximated by
k = number of observations
To determine the width of a class use w = maximum - minimum
k
and always round up to the same decimal units as the original data.
Other Charts: Charts for both types of data
00.05
0.1
0.150.2
0.250.3
0.350.4
0.45
Ba
ck
Wri
st
Elb
ow
Hip
Sh
ou
lde
r
Kn
ee
Ha
nd
Gro
in
Ne
ck
Pe
rce
nt
Rehab
00.05
0.1
0.150.2
0.250.3
0.350.4
0.45
Ba
ck
Kn
ee
Sh
ou
lde
r
Wri
st
Ha
nd
Hip
Elb
ow
Gro
in
Ne
ck
Pe
rce
nt
Rehab
Pareto ChartRelative Frequency Chart
0
0.2
0.4
0.6
0.8
1
1.2B
ac
k
Wri
st
Elb
ow
Hip
Sh
ou
lde
r
Kn
ee
Ha
nd
Gro
in
Ne
ck
Pe
rce
nt
Rehab
Cumulative Frequency Chart
Plot in the upper right corner is a Pareto chart. It is the same as the relative frequency chart; except the categories are in
relative frequency order (from largest to smallest) from left to right. This graph came from the Total Quality Management
(TQM) era in the middle to late 1980’s. The bottom chart is also known as an ogive (a confusing chart for students!).
Cautions:
• Label all axes and title all graphs
• Histogram rectangles touch each other; rectangles in bar graphs do not touch.
• Can’t have class widths that overlap
• Raw data can be retrieved from the stem-and-leaf plot; but a frequency distribution of histogram of continuous data
summarizes the raw data
• Only quantitative data can be described as skewed left, skewed right or symmetric (uniform or bell-shaped)
Uniform Mound-like (Bell-Shaped)
Skewed Left (-- tail)Bi-Modal Skewed Right (-- tail)
Frequency Distributions
Chapter 1: Exploring Data
SYM 4th
Edition Notes 8
With the following data
a) Construct a stem graph (in example 1 do a back-to-back [comparative] stem plot)
b) Construct a histogram
Ex. 1 The ages (measured by last birthday) of the employees of Dewey, Cheatum and Howe are listed below.
Office A: 22 31 21 49 26 42 42 30 28 31 39 39
Office B: 20 37 32 36 35 33 45 47 49 38 28 48
Ex. 2 Below are times obtained from a mail-order company's shipping records concerning time from receipt of order to
delivery (in days) for items from their catalogue?
3 7 10 5 14 12 6 2 9 22 25 11
5 7 12 10 22 23 14 8 5 4 7 13
27 31 13 21 6 8 3 10 19 12 11 8
Homework: pg 42-50; problems 37, 41, 43, 49, 62, 63
Chapter 1: Exploring Data
SYM 4th
Edition Notes 9
Section 1.3: Describing Quantitative Data with Numbers
Objectives: Students will:
Calculate and interpret measures of center (mean, median, mode)
Calculate and interpret measures of spread (IQR, standard deviation, range)
Identify outliers using the 1.5 x IQR rule
Make a boxplot
Select appropriate measures of center and spread
Use appropriate graphs and numerical summaries to compare distributions of quantitative variables
Vocabulary:
Boxplot – graphs the five number summary and any outliers
Degrees of freedom – the number of independent pieces of information that are included in your measurement
Five-number summary – the minimum, Q1, Median, Q3, maximum
Interquartile range – the range of the middle 50% of the data; (IQR) – IQR = Q3 – Q1
Mean – the average value (balance point); x-bar
Median – the middle value (in an ordered list); M
Mode – the most frequent data value
Outlier– a data value that lies outside the interval [Q1 – 1.5 IQR, Q3 + 1.5 IQR]
Pth
percentile – p percent of the observations (in an ordered list) fall below at or below this number
Quartile – multiples of 25th
percentile (Q1 – 25th
; Q2 –50th
or median; Q3 – 75th
)
Range – difference between the largest and smallest observations
Resistant measure – a measure (statistic or parameter) that is not sensitive to the influence of extreme observations
Standard Deviation– the square root of the variance
Variance – the average of the squares of the deviations from the mean
Key Concepts:
Measure of
Central
Tendency
Computation Interpretation When to use
Mean
μ = (∑xi ) / N
x‾ = (∑xi) / n
Center of gravity
Data are quantitative and
frequency distribution is
roughly symmetric
Median
Arrange data in ascending order;
middle value or average of the two
middle values (if even number)
Divides into
bottom 50% and top 50%
Data are quantitative and
frequency distribution is
skewed
Mode Tally data to determine most
frequent observation Most frequent observation
Data are qualitative or the
most frequent observation is
the desired measure of
central tendency
Center: The mean and the median are the most common measures of center
If a distribution is perfectly symmetric, the mean and the median are the same
The mean is not resistant to outliers
The mode, the data value that occurs the most often, is a common measure of
center for categorical data
Use the mean on symmetric data and the median on skewed data or data with
outliers
Spread:
Standard deviation is the most common measure of spread and is used only
with the mean.
Inter-quartile range (IQR = Q3 – Q1) is used with medians
Range (max – min value) is also a measure of spread.
Distributions Shape
Skewed Left:
Mean substantially smaller
than median
Symmetric:
Mean roughly equal to median
Skewed Right:
Mean substantially greater
than median
Mean ≈ Median ≈ Mode
Mean > Median > Mode
Mean < Median < Mode
Chapter 1: Exploring Data
SYM 4th
Edition Notes 10
Distribution Shape Based on Boxplots:
a. If the median is near the center of the box and each horizontal line is of approximately equal length, then the
distribution is roughly symmetric
b. If the median is to the left of the center of the box or the right line is substantially longer than the left line,
then the distribution is skewed right
c. If the median is to the right of the center of the box or the left line is substantially longer than the right line,
then the distribution is skewed left
Remember identifying a distribution from boxplots or histograms is subjective!
Why Use a Boxplot?
A boxplot provides an alternative to a histogram, a dotplot, and a stem-and-leaf plot. Among the advantages of a
boxplot over a histogram are ease of construction and convenient handling of outliers. In addition, the
construction of a boxplot does not involve subjective judgments, as does a histogram. That is, two individuals
will construct the same boxplot for a given set of data - which is not necessarily true of a histogram, because the
number of classes and the class endpoints must be chosen. On the other hand, the boxplot lacks the details the
histogram provides.
Dotplots and stemplots retain the identity of the individual observations; a boxplot does not. Many sets of data
are more suitable for display as boxplots than as a stemplot. A boxplot as well as a stemplot are useful for
making side-by-side comparisons.
Five-number summary
Min Q1 M Q3 Max
smallest
value
largest
value
Boxplot
First, Second and Third Quartiles
(Second Quartile is the Median, M)
[ ] *
Outlier
Lower
Fence
Upper
Fence
Smallest Data Value > Lower Fence Largest Data Value < Upper Fence
(Min unless min is an outlier) (Max unless max is an outlier)
Median Notes:
If the number of observations, n, is odd, then the median is the center observation in the ordered list
If the number of observations is even, then the median is the average of the two center observations
Mean Notes:
The mean is pulled toward the tail of the distribution, away from the median, when the distribution is skewed
Outliers can also pull the mean toward them
Chapter 1: Exploring Data
SYM 4th
Edition Notes 11
Example 1: Which of the following are resistant measures of central tendency:
Mean, Range
Median or Variance
Mode? Standard Deviation
IQR
Example 2: Given the following set of data:
70, 56, 48, 48, 53, 52, 66, 48, 36, 49, 28, 35, 58, 62, 45, 60, 38, 73, 45, 51,
56, 51, 46, 39, 56, 32, 44, 60, 51, 44, 63, 50, 46, 69, 53, 70, 33, 54, 55, 52
What is the mean? What is the range?
What is the median? What is the variance?
What is the mode? What is the standard deviation?
What is the shape of the distribution? What is the IQR?
What is the Q1?
What is the Q3?
What is the IQR?
What is the upper fence?
What is the lower fence?
Are there any outliers?
Example 3: Given the following types of data and sample sizes, list the measure of central tendency you would use and
explain why?
Sample of 50 Sample of 200
Hair color
Height
Weight
Parent’s Income
Number of Siblings
Age
Does sample size affect your decision?
Chapter 1: Exploring Data
SYM 4th
Edition Notes 12
Example 4: Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue.
Twenty-seven bars having a taste-test rating of at least “fair” were listed, and calories per bar was included. Calories vary
quite a bit partly because bars are not of uniform size. Just how many calories should an ice cream bar contain?
342 377 319 353 295 234 294 286 377 182 310
439 111 201 182 197 209 147 190 151 131 151
a) Construct a boxplot of the data
b) Determine if there are any outliers.
Example 5: The weights of 20 randomly selected juniors at MSHS are recorded below:
121 126 130 132 134 137 141 144 148 205
125 128 131 133 135 139 141 147 153 213
a) Construct a boxplot of the data
b) Determine if there are any outliers.
Example 6: Using the data from example #5
a) Change the weight from pounds to kilograms and add 2 kg (special uniform)
b) Get summary statistics and compare with example 5
c) Draw a box plot
Homework: pg 70-74; problems 79, 81, 85, 89, 91, 95, 99, 105
Chapter 1: Exploring Data
SYM 4th
Edition Notes 13
Chapter 1 Review
Objectives: Students will be able to:
Summarize the chapter
Define the vocabulary used
Know and be able to discuss all sectional and chapter knowledge objectives
Be able to do all sectional and chapter construction objectives
Successfully answer any of the review exercises
Vocabulary: None new
Summary:
Always include labels, scales, legend (if needed) and title on any graph you are asked to make
Supporting graphs (that you make as supporting evidence) don’t have to have as much detail
Numerical summaries do not fully describe the shape of a distribution; Always plot your data!
Median and IQR are resistant to outliers; mean and standard deviations are not
Boxplots do not tell you everything about the distribution (especially about the specific type)
IQR is a number (the length of the box part in the boxplot)
Describe all distributions by their Shape, Outliers, Center and Spread (SOCS)
o Shape – symmetric (uniform or mound-shape) or skewed (left or right); uni-, bi- or multi-modal
o Outliers – or any unusual data values or patterns
o Center – mean (symmetric) or median(skewed) or mode (for categorical data)
o Spread – tied to center; standard deviation or IQR (or range for categorical data)
Always use comparative language (much greater, less, about the same) when comparing distributions SOCS
Homework: pg 75-81; problems T1 1-15
Plot your data
Dotplot, Stemplot, Histogram
Interpret what you see
Shape, Outliers, Center, Spread
Choose numerical summary
x and s, or median and IQR
Chapter 1: Exploring Data
SYM 4th
Edition Notes 14
1. The upper or third quartile for grades on the first calculus test was 85%. Your friend, who has not taken statistics, scored
90% on the test. Explain to your friend how her grade compares to others in her class.
2. Suppose you have test scores of 72%, 91%, 86%, and 95% in your chemistry class. What score do you need to make on
the next test in order to have an 85% average?
3. In the computational formula for standard deviation, you sometimes use n and sometimes use (n – 1). Under what
circumstances should you use n?
4. Fill in the following blanks:
a. We studied two measures of central tendency, mean and median. Which of these is the more resistant measure?
_________________ Explain why this measure is more resistant.
b. We studied three measures of spread: standard deviation, interquartile range, and range. Which of these is the most
resistant measure? ________________
5. In an experiment designed to determine the effect of a drug on reaction time, a subject is asked to press a button
whenever a light flashes. The reaction times (in milliseconds) for ten trials are:
96 101 112 138 93 99 107 93 95 100
a. Make a stem and leaf plot to display this information. Be sure to include unit information (a legend).
b. What information about the distribution does the stem and leaf plot provide? Be thorough in your response.
6. Data were collected on a sample of Deerfield Academy students. Several of the variables are listed below. Next to each
variable, put all of the following words that correctly describe the variable:
Categorical quantitative discrete continuous
(a) Advisor ______________________________
(b) Height _______________________________
(c) Number of courses student is taking this term ______________________________________
7. A teacher returned the first test to the five students in a small class. She reported that the median score was 85 and the
mean score was 84. The student with the lowest score (62) realized that the teacher had incorrectly calculated her grade
and that the correct grade was 72. Assuming that this is still be the lowest score for the seminar students, when the
teacher recomputed the summary statistics, the median will equal _____________ and the mean will equal
________________ .
Chapter 1: Exploring Data
SYM 4th
Edition Notes 15
8. The histogram below displays weight increases (in pounds) for a sample of pigs
fed a certain diet. Assume that bars include right endpoints.
How many pigs were in this sample? ___________
Estimate the median weight increase for the pigs in this sample. __________
What proportion of these pigs had a weight increase exceeding 20 pounds?
_________________
Briefly (but completely) describe the shape of this distribution
9. As I drove through Connecticut several weeks ago, I obtained a sample of prices for a gallon of unleaded gasoline at
service stations I passed. Four of these are provided here: $3.09, $3.15, $3.19, $3.29. Use the definition and show work
below to find the mean and standard deviation of these prices. Round answers to the nearest cent.
Mean
Standard deviation
10. The Los Angeles Times reported interest rates for savings accounts at a sample of California banks. Summary statistics
are provided below:
Minimum = 3.15% Q1 = 3.25% Median = 3.31% Q3 = 3.33% Maximum = 4.35%
Determine whether the data set has any outliers (check for extremely low and high values). Show work and provide an
explanation to support your answer.
Chapter 1: Exploring Data
SYM 4th
Edition Notes 16
5 minute Reviews
Lesson 1-0:
1. Gender is an example of what type of variable?
2. Age is an example of what type of variable?
3. Your zip code is an example of what type of variable?
4. What was the percentage that was our boundary for something being unusual?
Lesson 1-1:
1. To organize data on two categorical variables use a:
2. Row totals and column totals are called:
3. When we fix the value of one categorical variable and look at the distribution of the other variable it is called:
4. A variable not in the data that influences variables in the collected data is called:
5. The four-steps in statistical analysis are:
Lesson 1-2A:
1. Dot plots and stem-plots have what advantages:
2. Dot plots and stem-plots are impractical when:
3. What pieces of SOCS can be seen in dot and stem-plots?
4. Compare the following distributions:
Office A Ages of Personnel Office B
1, 2, 6, 8 2 0, 8
0, 1, 1, 9, 9 3 2, 3, 5, 6, 7, 8
2, 2, 9 4 5, 7, 8, 9
Chapter 1: Exploring Data
SYM 4th
Edition Notes 17
Lesson 1-2B:
1. What 4 terms are used to describe data sets or distributions?
2. Which type of graph can our calculators do (bar or histogram)?
3. How many classes should a histogram have?
4. What needs to be looked for in time-series graphs?
5. What is the major difference between a histogram and a stem-plot?
6. Name a possible graphical error in a histogram
Lesson 1-3A:
1. What are the two quantitative measures of center?
2. When do we use one versus the other?
3. Which one is resistant to outliers?
4. Which measure of center is used for qualitative data?
5. Find the mean, median and mode of the following data set: 7, 15, 4, 8, 16, 17, 2, 5, 11, 8, 12, 6
Chapter 1: Exploring Data
SYM 4th
Edition Notes 18
Homework:
pg 7-8, problems 1, 3, 5, 7, 8
1. Cat: type of wood and water repellent, paint color Quan: paint thickness, weathering time
3. a. stat students completed survey b. Cat: gender, handedness, fav music; Quan: ht, homework time, total value of money
c. Female, right handed 58 inches tall, 60 minutes on homework, alternative music and 76 cents in change
5. Quan: percentage rates for retention and graduation rates; Cat: region of country, school type
7. B 8. C
pg 22-6, problems 12, 16, 20, 23, 24
12. You need to know total deaths to compute an “other” to get to 100%
16. a. Movie attendance seems to drop off as people get older; b. No, given percentages represent fractions of different age
groups rather than parts from a single whole. c. We don’t know how many people are in each age group
20. a. 5375 students total; 18.68% smoke b. marginal dist:
Neither One Both
Count 1356 2239 1780
% 25.23 41.66 33.12
23. Americans are much more likely to choose white or pearl and read, while Europeans are more likely to choose silver,
black or gray
24. b. heights of the bars in the first graph are different for every color except silver; biggest differences in white or pearl
Second graph shows black are most popular choice for luxury and white with SUVs
pg 42-50; problems 37, 41, 43, 49, 62, 63
37. Roughly symmetric, center of 6 and a range of 8 with no apparent outliers
41. a. tail must be to the left b. Fewer older coins in people’s pockets
43. Both distributions are roughly symmetric with a range of about 20 and a cluster to the left. Internal center is 21 while the
external is lower at 17.5. Ratings are on average higher for the internal
49. a. Most students seem to round to either 10 minute increments or ½ or 1 hour increments. Two students claimed to have
studied 5 and 6 hours (not likely) b. Yes; center for women is apx 175, which is greater than the 120 minutes for men
62. Should be near 10 for each of the six values (extremely rare for them all to be ten!!)
63. a. round-off error b. relative frequency diagrams would be best since much men than women c. Both histograms are
skewed to the right and generally women’s salaries are lower than men’s. Peak for women is between 20 and 25K, while
men’s peak is between 25 and 30K. Range is about the same.
Chapter 1: Exploring Data
SYM 4th
Edition Notes 19
pg 70-74; problems 79, 81, 85, 89, 91, 95, 99, 105
79. Mean is 85. Add scores and divide by 14
81. a. Median is the average of the 7th
and 8th
values (84+86)/2 = 85 b. Median is 84, but mean drops to 79.33. Mean is not
resistant to outliers
85. Total payroll = 30 Million. With Median you can’t calculate the sum of all 25 salaries!!
89. a. IQR = 91 – 78 = 13 (middle 50% has a spread of 13 points) b. No outliers. No values below LF (78 – 1.5*13 = 58.5)
or above UF (91 + 1.5*13 = 110.5)
91. Boxplot is skewed right with one high outlier. Half of students sent less than 10 texts a day, way less than the average of
58 texts per day (1742/30) claimed by the article.
95. a. Stock varied between -3.5% and 3% b. Median return for stocks was about 0.1% and for real estate about 0%. c.
Stock fund has much more variability (chances to make big positive or negative)
99. a. Since mean is much larger than the median, then it would appear to be skewed right. b. The stand dev = 21.70 and
represents the average distance between individual amounts and the mean. c. Q1 = 19.27 Q3 = 45.4 so IQR = 45.4 – 19.27 =
26.13. Any points outside the fences (LF = 19.27 – 1.5*26.13 = -19.925 and UF = 45.4 + 1.5*26.13 = 84.595) would be
outliers; since max value is 93.34, then at least one high outlier.
105. Women have a higher center (mean or median) than men. Women have one high side outlier, but still have a smaller
spread (by standard dev, IQR or range) than the men. With the exception of the outlier, women’s distribution is symmetric,
but men’s has more right skewness.
pg 75-81; problems T1 1-15
1. d
2. e
3. b
4. b
5. c
6. c
7. b
8. c
9. e
10. b
11. d
12. b. Q1 = 30 Q3 = 77; IQR = 77 – 30 = 47; 151 is an outlier because it falls outside the UF(77+1.5*47 = 147.5). No values
below 0 (LF = 30 – 1.5*47 = -40.5) so all above lower fence
13. d. There appears to be an association. Those with diabetes have a much higher rate of babies with birth defects than non
or pre diabetics
14. a. Longest any battery lasted was between 550-559 hours b. Brand X has a higher minimum lifetime (for people who are
risk averse) c. Brand Y has a higher median lifetime and would probably appeal to more people.
15. Both distributions are reasonably symmetric; AL has no outliers, but NL has two outliers (29, 31). All NL centers (mean
and median) are less than AL; but AL has more variability (because of higher standard deviation and range)
Chapter 1: Exploring Data
SYM 4th
Edition Notes 20
Mr. Starnes’s Infamous AP Statistics Survey
This is an anonymous survey for our AP Statistics class. Do not put your name on it or show it to anyone
else! I will not share information in any way that reveals who you are.
Amount of sleep you got last night: ____________ hours Gender: ________
# of siblings (brothers/sisters) you have: __________ Hair color: __________
Height: ___________(in inches) Cumulative GPA: __________
SAT Math score: ___________ SAT CR score: ___________
# of AP classes you are taking this year: __________ Birth date: ___________________
Favorite fast-food restaurant: _____________________ Left- or right-handed: ____________
Ounces of soda consumed yesterday: __________ Pulse rate: ___________
Days since you last consumed food from a fast-food restaurant: ___________
Do you believe that fast-food restaurants should be held responsible for obesity resulting from eating too much
fast-food? Circle: Yes or No
Time spent yesterday: on a computer __________
on the Internet __________
watching TV ___________
Thinking back to the most recent time you went to the bathroom; did you wash your hands immediately
afterward? Circle: Yes or No
Which of the following best describes what percent of the time you wash your hands immediately after using the
bathroom? 100% 75%–99% 50%–74% 25%–49% 0%–24%
Choose one of the following numbers. Then circle your choice. 1 2 3 4
Randomly choose one of the letters S or Q. Circle your choice.
Imagine a sequence of 10 tosses of a fair penny. Write down a sequence of outcomes that you believe is likely to
occur. Use H for heads and T for tails.
Guess your teacher’s age: _______