Statistics Unit
Lesson 1: Collecting Data
Learning Targets:
I know the meaning of “a sample from a population” and “a census of a population.” (S3.1.1)
I can distinguish between sample statistics and population parameters. (S3.1.1)
I can use samples to make inferences about populations, determine relationships, and interpret data. (S3.1.1)
I know the effect of replication on the precision of estimates. (S3.1.2)
I can identify possible sources of bias in data collection, sampling methods, and simple experiments. (S3.1.2)
I can explain the impact of bias on conclusions made from analysis of data (margin of sampling error). (S3.1.2)
Statistics: the branch of mathematics dealing with the collection, organization, analysis, and interpretation of information, called data.
Consider this situation: The medical lab tech gets an order for counting the number of white blood cells in a patient’s blood.
The “variable” is what might vary. It’s what can be classified or counted or measured.
In this case, the variable is the number of white blood cells.
The “population” is the set of all objects you want to study.
In this case, the population is “all the patient’s blood.”
If the lab tech takes out all the patient’s blood to analyze it, the patient will die. This is not what the lab tech wants. So she decides to use a “sample”.
A “sample” is the part of the population you can actually study.
In this case, the sample is taking out “some” of the patient’s blood.
Example 1: The Student Senate is counting the number of students wearing black and white for Spirit Week. There are 2200 total students enrolled at West Ottawa High School. The Student Senate President counts the number of students wearing black and white in a randomly selected classroom containing 30 students.
Population? ________________
Sample? ________________
Variable? ________________
Sometimes statisticians are not in the medical field. A political scientist might want to know what people think of a health care issue like giving swine flu immunizations.
In this case, taking blood won’t cut it.
Survey: give a questionnaire or gather answers to a question in an interview.
Census: a complete list of all the values in a population (e.g., the US Census).
Random Samples
A random sample is chosen so that every member of the population has an equal chance of being selected.
In our example, the blood draw would be of random blood cells.
This will enable us to estimate information about the population.
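One way to draw a random sample with software is sketched below in Python. It assumes, for illustration only, that the 2200 students from Example 1 are numbered 1 to 2200; the function name `simple_random_sample` is hypothetical. `random.sample` chooses without replacement, so every student has the same chance of being picked and nobody is picked twice.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n members chosen so that each member has an equal chance."""
    rng = random.Random(seed)      # seeding only makes the run repeatable
    return rng.sample(population, n)

# Hypothetical numbering: students 1 through 2200.
students = list(range(1, 2201))
chosen = simple_random_sample(students, 30, seed=1)
print(sorted(chosen))
```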
Bias
If a sample is not chosen randomly from a population, the data from the sample may not apply to the population.
If a sample is not random, and therefore not representative of the population, it is said to be biased.
Example 2: Determine whether each situation would produce a random sample. Write yes or no and explain your answer.
a. surveying students at the prom about whether they like to dance
b. polling every 5th person who walks into the mall about their favorite color
Margin of Sampling Error
If the percent of people in a sample responding in a certain way is p (written as a decimal) and the size of the sample is n, then 95% of the time the percent of the population responding in that same way will be between p - ME and p + ME, where

$$ME = 2\sqrt{\frac{p(1-p)}{n}}$$
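The formula translates directly into a short Python sketch; the function name `margin_of_error` is illustrative. It is applied here to the numbers from Example 3 below (p = 0.37, n = 120).

```python
from math import sqrt

def margin_of_error(p, n):
    """ME = 2 * sqrt(p(1 - p) / n), with p written as a decimal."""
    return 2 * sqrt(p * (1 - p) / n)

# Example 3: 37% of 120 students answered "yes".
p, n = 0.37, 120
me = margin_of_error(p, n)
low, high = p - me, p + me
print(f"ME = {me:.1%}, interval = {low:.1%} to {high:.1%}")
# ME = 8.8%, interval = 28.2% to 45.8%
```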
Example 3:
In a survey of 120 randomly selected students, 37% answered “yes” to lying to their parents in the past week.
What is the margin of error?
What does the margin of error mean?
$$ME = 2\sqrt{\frac{p(1-p)}{n}}$$
This margin of error means that with _____% confidence the actual percent of people who had lied to their parents in the past week is between _____% and _____%.
Your Turn 3:
In a survey of 240 randomly selected adults, 85% answered “no” to smoking in the past week.
What is the margin of error?
$$ME = 2\sqrt{\frac{p(1-p)}{n}}$$
This margin of error means that with _____% confidence the actual percent of people who had not smoked cigarettes in the past week is between _____% and _____%.
What does the margin of error mean?
Example 4: In a survey, 30% of the people surveyed said they had smoked cigarettes in the past week. The margin of error was 2%.
How many people were surveyed?
1) Substitute the numbers into the equation.
NOTE: Write ME as a decimal.
2) Solve for n:
i) Divide both sides by 2.
ii) Square both sides.
iii) Multiply both sides by n.
iv) Divide both sides by (ME/2)², the squared value from step ii.
$$ME = 2\sqrt{\frac{p(1-p)}{n}}$$
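The four solving steps amount to n = p(1 - p) ÷ (ME/2)². A Python sketch with an illustrative function name, checked on Example 4 (p = 0.30, ME = 0.02):

```python
def sample_size(p, me):
    """Solve ME = 2 * sqrt(p(1 - p) / n) for n, with p and ME as decimals."""
    half = me / 2                  # i) divide both sides by 2
    # ii) square both sides:        half**2 = p(1 - p) / n
    # iii) multiply both sides by n: n * half**2 = p(1 - p)
    # iv) divide both sides by half**2
    return p * (1 - p) / half ** 2

n = sample_size(0.30, 0.02)        # Example 4
print(round(n))                    # 2100
```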
Your Turn 4: In an earlier survey, 32% of the people surveyed said they did not complete all their math homework the past week. The margin of error was 4.5%.
How many people were surveyed?
I can use samples to make inferences about populations, determine relationships, and interpret data. (S3.1.1)
Capture-recapture method:
1. Capture, tag, and release.
2. Recapture and count.

$$\frac{\#\text{ tagged in recapture}}{\#\text{ in the recapture sample}} = \frac{\#\text{ tagged in the population}}{\text{total population}}$$
Statistics: Collecting Data
Replication: the repetition of an experiment or observation under the same or similar conditions. Replication adds information about the reliability of conclusions drawn from the data.
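Using the margin-of-error formula from this lesson, a small Python sketch shows why replication improves precision. It assumes, for illustration, that the same 37% result is observed as the number of observations is multiplied by 1, 4, and 16; because n sits under a square root, each fourfold increase cuts the margin of error in half.

```python
from math import sqrt

def margin_of_error(p, n):
    """ME = 2 * sqrt(p(1 - p) / n), with p as a decimal."""
    return 2 * sqrt(p * (1 - p) / n)

# Hypothetical: the same 37% result with 120, 480, and 1920 observations.
p = 0.37
for n in (120, 480, 1920):
    print(n, f"{margin_of_error(p, n):.1%}")
```

More replication gives a narrower interval around the estimate, that is, a more precise estimate.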
Example 5: In order to estimate the number of salmon in Little John Lake, L.J. captured and carefully tagged 62 salmon. He then released them. The next month, he caught 149 salmon, of which 23 were tagged. About how many salmon were in the lake?
$$\frac{\#\text{ tagged in recapture}}{\#\text{ in the recapture sample}} = \frac{\#\text{ tagged in the population}}{\text{total population}}$$
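Solving the capture-recapture proportion for the total population gives total = (# tagged in the population) × (# in the recapture sample) ÷ (# tagged in recapture). A Python sketch with an illustrative function name, checked on Example 5:

```python
def estimate_population(tagged_total, recapture_size, tagged_in_recapture):
    """Capture-recapture: tagged_in_recapture / recapture_size = tagged_total / N."""
    if tagged_in_recapture == 0:
        raise ValueError("no tagged animals recaptured; the estimate is undefined")
    return tagged_total * recapture_size / tagged_in_recapture

# Example 5: 62 salmon tagged, 149 caught later, 23 of those tagged.
print(round(estimate_population(62, 149, 23)))  # 402
```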
Your Turn 5: In order to estimate the number of perch in Sapphire Lake, Tim captured and carefully tagged 23 perch. He then released them. The next month, he caught 62 perch, of which 14 were tagged. About how many perch were in the lake?
I know the effect of replication on the precision of estimates.
Assignment:
Worksheet 1
Warm-Up (Health): In an earlier survey, 25% of the people surveyed said they had exercised in the past week. The margin of error was 3%.
a. What does the 3% indicate about the results?
This margin of error means that with _____% confidence the actual percent of people who had exercised in the past week is between _____% and _____%.
b. How many people were surveyed?
I can use samples to make inferences about populations, determine relationships, and interpret data. (S3.1.1)
Statistics Lesson 2: Tables, Bar Graphs, and Circle Graphs
Learning Targets:
I can read and interpret tables, bar graphs and circle graphs. (S1.1.1)
I can draw graphs to display data. (S1.1.1)
Example 1: Table Graph
Pie Charts/Circle Graphs
[Circle graph: Percent of Students Wanting Uniforms at School; sections: yes, no, unsure]
Using your knowledge of a circle, what percent do you think answered “yes” to the question of having uniforms at school?
Either “no” or “unsure”?
“Unsure” alone?
Example 2: Circle Graphs
1. If the workforce in 1973 was 105,200,000 individuals, how many of those were white-collar workers?
Farm workers?
2. Describe the change in farm labor from 1958 to 1973.
3. If the workforce in 1958 was 82,500,000, how many of these were blue-collar workers?
Your Turn 2: Circle Graphs
Bar Graphs:
1. One axis labels the categories or variables.
2. The other axis usually shows a numerical scale.
3. Categories are identified and labeled.
4. A legend is often given for clarity.
5. To portray relations between data accurately, numerical scales should begin at zero.
Example 3: Horizontal Bar Graph:
Questions:
1. How many chemical peels were done in 2001?
2. How many more Botox injections than collagen injections were done in 2001?
Your Turn 3: Bar Graph - Multiple Bars
Questions:
1. In what years did girls use the internet more than boys at Redwood?
2. What general trend(s) do you notice?
Statistics Lesson 3: Other Displays
Learning Targets:
I can calculate measures of spread for data sets.
I can use statistics to describe data sets or to compare
and contrast data sets.
I can read and interpret bar graphs and coordinate graphs.
I can draw graphs to display data.
3: Other Displays
Check out the scales on the graphs. Which gives a more accurate picture of the rate of change of the population? Why?
Average Rate of Change:
Average rate of change is the slope of the segment joining two points on the graph:
$$\frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$$
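A minimal Python sketch of the slope formula. The function name is illustrative, and the two points are made up for the example.

```python
def average_rate_of_change(x1, y1, x2, y2):
    """Slope of the segment from (x1, y1) to (x2, y2): (y2 - y1) / (x2 - x1)."""
    if x1 == x2:
        raise ValueError("the two x-values must differ")
    return (y2 - y1) / (x2 - x1)

# Hypothetical points: a population of 1000 in year 0 and 1600 in year 10.
print(average_rate_of_change(0, 1000, 10, 1600))  # 60.0 (people per year)
```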
Slopes on Intervals:
[Figure: three graphs showing a positive slope on an interval, a negative slope on an interval, and a zero slope on an interval]
[Line graph: Holland Population, 1999 to 2007; x-axis: Year, y-axis: Population]
When a graph slants upward as you read from left to right, its slope is positive. When the graph slants downward, its slope is negative. When a graph is horizontal, its slope is zero. When a graph has a positive slope on some interval, it is said to be increasing on that interval. Likewise, when a graph has a negative slope on an interval, it is said to be decreasing on that interval. When the slope is zero, the graph is said to be constant on that interval.
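The sign rule can be sketched in Python for a list of (x, y) points read off a graph: compute the slope between each pair of neighboring points and label the interval by its sign. The points below are hypothetical, and `describe_intervals` is an illustrative name.

```python
def describe_intervals(points):
    """Label each interval between consecutive (x, y) points by the sign of its slope."""
    labels = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        slope = (y2 - y1) / (x2 - x1)
        if slope > 0:
            labels.append("increasing")
        elif slope < 0:
            labels.append("decreasing")
        else:
            labels.append("constant")
    return labels

# Hypothetical readings, e.g. temperatures on days 1 to 4.
print(describe_intervals([(1, 50), (2, 55), (3, 55), (4, 48)]))
# ['increasing', 'constant', 'decreasing']
```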
Example (Back to Boston from example 1)
Calculate the average rate of change in the population of Boston in the time interval:
a) between 1850 and 1900
b) between 1950 and 1960
Stem & Leaf: Similar to a bar graph (the stems are like categories, and the number of leaves on a stem is the number of grades in that category). Unlike a bar graph, however, the individual data values are not lost. The value 68 is circled.
For example, you can clearly see the following:
- maximum
- minimum
- range (difference between highest and lowest)
- clusters (bunches of similar scores)
- outliers (scores very different from the rest)
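A stem-and-leaf plot can be built in a few lines of Python. This sketch assumes two-digit whole numbers (stem = tens digit, leaf = ones digit) and uses the test-score data that appears later in this unit; stems with no leaves are simply not printed. The function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group values by tens digit (stem), keeping each ones digit (leaf) in order."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    return dict(plot)

scores = [43, 52, 65, 66, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 78, 78, 78,
          78, 79, 80, 82, 85, 87, 87, 88, 89, 90, 90, 90, 92, 93, 94, 94, 98]
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
print("max:", max(scores), "min:", min(scores), "range:", max(scores) - min(scores))
```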
Back to Back Stem & Leaf:
range
# students
# in 80s
outliers?
Closure (Lesson 3)
The chart above shows daily temperatures in New York City.
a. What is the average rate of change between day 2 and day 3? Is that interval increasing or decreasing?
b. What is the average rate of change between day 5 and day 6? Is that interval increasing or decreasing?
Assignment
Worksheet 3
The pie chart above shows the ingredients used to make a sausage and mushroom pizza weighing 1.6 kg.
a. What ingredient was used the least?
b. How much cheese was used to make the pizza?
c. How much sausage was used to make the pizza?
I can read and interpret circle graphs.
Warm-Up: Make a stem-and-leaf plot showing the day of the month class members were born (e.g., December 17 would be a “17”). Write the leaf of your date on the post-it (for 17, write 7) and put it on the screen in the correct location.
0
1
2
3
Then find the maximum, minimum, and range of the data.
I can draw graphs to display data.
Statistics Lesson 4: Measures of Center
I can calculate measures of center for data sets. (S1.2.1)
I can use summation notation to represent a sum or mean. (intro) (S1.2.3)
I can describe relations between measures of center. (S1.1.1)
I can use statistics to describe data sets or to compare or contrast data sets. (S1.2.1)
Measures of Center (also called measures of central tendency): numbers that describe typical values in a data set.
Mean: the arithmetic average; the sum of the data divided by the number of items in the data set.
Example: my bank account over the months of
June $450
July $275
August $400
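The definition of the mean as a short Python sketch, applied to the three monthly balances above; the function name is illustrative.

```python
def mean(values):
    """Arithmetic average: the sum of the data divided by the number of items."""
    return sum(values) / len(values)

balances = [450, 275, 400]   # June, July, August
print(mean(balances))        # 375.0
```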
Median: the middle value of a set of data placed in increasing order
Example
same bank account: $450, $275, $400
What if the data set has an even number of values?
25, 30, 40, 52
Take the average of the two middle numbers!
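A Python sketch of the median rule, covering both the odd case (the bank account) and the even case (25, 30, 40, 52); the function name is illustrative.

```python
def median(values):
    """Middle value of the sorted data; average the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([450, 275, 400]))    # 400
print(median([25, 30, 40, 52]))   # 35.0
```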
Mode
Mode: the most common item in the data set.
When a mode exists, it is always a member of the data set (a data set can have more than one mode, or none).
Mode is not considered to be a measure of center of a data set because it could be an extreme value.
Back to Back Stem & Leaf
measure 1st hour 3rd hour
mean
median
mode
Calculator Steps: Find the STAT button.
1. Input the list of data.
2. It is easiest to use the default lists.
3. STAT --> CALC --> 1-Var Stats, then ENTER
Wacky Widget Company
Enter the lists and use your calculator to find the mean and median salary (or do it by hand!).
Mean salary:
Median salary:
Mean? Median? Mode? Which measure would be most meaningful in each situation?
a tailor stocking shirts
a teacher looking at exam results
a city council member budgeting local income tax
Sigma Notation: the sum of the x-sub-i’s as i goes from a to b.
$$\sum_{i=a}^{b} x_i$$
i = index (It indicates the position of a number in an ordered list.)
a = first number to evaluate
b = last number to evaluate
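Sigma notation is a loop in disguise. This Python sketch uses the values x1 through x6 from the problems that follow; since Python lists start counting at 0, it subtracts 1 from the lesson's index i. The helper name `sigma` is illustrative.

```python
# x_1 through x_6 from the sigma-notation problems (stored 0-based in Python).
x = [10, 12, 3, 5, 15, 20]

def sigma(x, a, b):
    """Sum of x_i for i = a, a + 1, ..., b, using the lesson's 1-based index i."""
    return sum(x[i - 1] for i in range(a, b + 1))

print(sigma(x, 1, 6))  # 65
print(sigma(x, 2, 4))  # 20
```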
Sigma Notation Problems:
Evaluate the following:
$$\sum_{i=2}^{6} x_i \qquad \sum_{i=1}^{5} x_i$$
$x_1 = 10$, $x_2 = 12$, $x_3 = 3$, $x_4 = 5$, $x_5 = 15$, $x_6 = 20$
6
3
2
23
1
ii
ii
x
x
Closure:
1. The table shows the number of nations represented in the Summer Olympic Games from 1960 through 2004. Find the mean, median, mode, and range of the data. Which do you think best represents the data? Explain.
Year   Nations
1960   83
1964   93
1968   112
1972   121
1976   92
1980   80
1984   140
1988   159
1992   169
1996   197
2000   199
2004   201
Mean:
Median:
Mode:
Range:
Which do you think best represents the data? Explain.
2. Evaluate the following.
a. b.
14,7,5,2,8,3 654321 xxxxxx
5
1iix
26
4iix
Warm-Up:
Find the mean, median, mode & range for the data giving the height of the varsity basketball team at Hill High:
# players   height (in.)
2           67
2           68
1           70
1           71
2           72
8           78
1           79
Mean:
Median:
Mode:
Range:
Which do you think best represents the data? Explain.
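When data come as a frequency table, like the heights above, one approach is to expand each row into one value per player and then apply the usual definitions. A Python sketch, with heights in inches as given; the variable names are illustrative.

```python
# Warm-up table: height (in.) -> number of players.
counts = {67: 2, 68: 2, 70: 1, 71: 1, 72: 2, 78: 8, 79: 1}

# Expand to one value per player, already in increasing order.
heights = [h for h, k in sorted(counts.items()) for _ in range(k)]
n = len(heights)
mean = sum(heights) / n
median = heights[n // 2]              # valid because n is odd here
mode = max(counts, key=counts.get)    # a tie would report only one of the modes
print(n, mean, median, mode, max(heights) - min(heights))
```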
Statistics Lesson 5: Quartiles, Percentiles & Box Plots
Learning Targets:
I can calculate measures of center and spread for data sets. (S1.2.3)
I can describe relations between measures of center and measures of spread. (S1.2.3)
I can use statistics to describe data sets or to compare and contrast data sets. (S1.2.3)
I can read and interpret box plots. (S1.1.1, S1.2.1, and S1.2.3)
I can draw graphs to display data. (S1.1.1)
When describing data it is useful to describe both central values (mean and median) and how much the data are spread out from the center (range and quartiles).
measures of center measures of spread
Rank-Ordered Data: data sequenced in order to help organize the information
First, examine the spread to look at how the data varies. Think about a number line.
Think about high/low, then think about middle values.
range = highest # - lowest #
Quartiles are values which divide a rank-ordered (from lowest to highest) set of data into four sets of approximately equal size
Minimum: Lowest #
1st Quartile (Q1 or lower quartile):
the median of the numbers below the data’s median (the middle of the lower half of the data)
MEDIAN (Q2)
3rd Quartile (Q3 or upper quartile):
the median of the numbers above the data’s median (the middle of the upper half of the data)
Maximum: Highest #
The 5-Number Summary gives a lot of information about the data set.
Interquartile Range
IQR = Q3 - Q1. It gives a measure of spread around the center of the data: the range in which you find the middle 50% of the data.
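A sketch of the 5-number summary and IQR in Python, following this lesson's rule that Q1 and Q3 are the medians of the numbers below and above the median (so for an odd count the median itself is left out of both halves). Other software may use a different quartile convention and report slightly different values. The function names are illustrative, and the salaries come from the Money Example that follows.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Min, Q1, median, Q3, max; Q1 and Q3 are medians of the lower and upper halves.

    When the count is odd, the overall median belongs to neither half.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Salaries from the Money Example, in thousands of dollars.
salaries = [16, 16, 18, 20, 20, 20, 25, 40, 40, 40, 40, 60, 60, 100, 250]
summary = five_number_summary(salaries)
print(summary)                            # (16, 20, 40, 60, 250)
print("IQR =", summary[3] - summary[1])   # IQR = 40
```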
Box Plot (Box and Whiskers): A box plot is a visual representation of the 5-number summary of a data set.
Steps:
1. draw a number line with min/max values
2. draw a rectangle with outside edges at Q1 and Q3 (these edges are often called hinges)
3. inside rectangle, draw vertical line at the median
4. draw segments from the middle of the hinges to the min and max values (whiskers)
A Money Example!
Job Title                  Salary
President                  $250,000
VP                         $100,000
Warehouse Supervisor       $60,000
Sales Supervisor           $60,000
Sales Representative NE    $40,000
Sales Representative NW    $40,000
Sales Representative SE    $40,000
Sales Representative SW    $40,000
Secretary to President     $25,000
Secretary to VP            $20,000
Warehouse Worker           $20,000
Warehouse Worker           $20,000
Custodian                  $18,000
Custodian                  $16,000
Custodian                  $16,000
Now, let’s make a box-and-whisker plot (salaries in thousands of dollars).
Min: __ Q1: ___ Median: ___ Q3: ___ Max: ___
16 16 18 20 20 20 25 40 40 40 40 60 60 100 250
The median is the middle value. The first quartile (Q1) is the median of the numbers below the median, and the third quartile (Q3) is the median of the numbers above the median.
[Box-and-whisker plot of the salaries on a scale from 0 to 250: Min 16, Q1 20, Median 40, Q3 60, Max 250]
“Reading” a Box Plot: What can you say about this data from reading the box plot?
Half of data is between _____________ & _____________
Minimum: _____________ Median: _____________ Maximum: _____________
P-th Percentile: The pth percentile of a set of numbers is a value in the set such that p percent of the numbers are less than or equal to that value.
In other words,
A percentile tells what percent of the values in the data set are less than or equal to the value you are considering.
Percentile measures position from the bottom.
To calculate a percentile
1. Count the spot where your value is in the data set (e.g., the fourth spot).
2. Divide that number by the total number of values.
3. Multiply by 100 and round.
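The three steps translate into a short Python sketch. Counting how many values are less than or equal to the one you are considering is the same as taking its last spot when it is repeated; rounding half up is an assumption about how "round" is meant. `percentile_rank` is a hypothetical helper name, and the scores are the test-score data used in this lesson.

```python
def percentile_rank(data, value):
    """Percent of values <= value, rounded to a whole number (the lesson's three steps)."""
    spot = sum(1 for v in data if v <= value)   # 1. count the spot (last spot if repeated)
    fraction = spot / len(data)                 # 2. divide by the number of values
    return int(fraction * 100 + 0.5)            # 3. multiply by 100 and round half up

scores = [43, 52, 65, 66, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 78, 78, 78,
          78, 79, 80, 82, 85, 87, 87, 88, 89, 90, 90, 90, 92, 93, 94, 94, 98]
print(percentile_rank(scores, 89))  # 77
```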
Find a percentile
Find the percentile rank of the value 89.
43 52 65 66 67 68 70 70 71 72 73 74 75 75 76 78 78 78 78 79 80 82 85 87 87 88 89 90 90 90 92 93 94 94 98
1. Count the spot where your value is in the data set _______
2. Divide that number by the total number of values. _____÷_____
3. Multiply by 100 and round. _____x100 ~ ______th percentile
Find a value in a data set at a particular percentile
What test score is at the 20th percentile?
43 52 65 66 67 68 70 70 71 72 73 74 75 75 76 78 78 78 78 79 80 82 85 87 87 88 89 90 90 90 92 93 94 94 98
Now we go backwards, from step 3 to step 1:
3. Divide by 100 ________
2. Times the total number _______
1. Count the spot ______,
so the test score at the 20th percentile is _____
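Going backwards can be sketched the same way. The lesson does not say how to round a spot that is not a whole number, so this sketch assumes rounding half up and keeps the spot between 1 and the number of values; `value_at_percentile` is a hypothetical name.

```python
def value_at_percentile(data, p):
    """Reverse the steps: p / 100 times the count gives the spot; read off that value."""
    ordered = sorted(data)
    spot = int(p * len(ordered) / 100 + 0.5)   # round half up (an assumed convention)
    spot = min(max(spot, 1), len(ordered))     # keep the spot inside the list
    return ordered[spot - 1]                   # spots are counted from 1

scores = [43, 52, 65, 66, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 78, 78, 78,
          78, 79, 80, 82, 85, 87, 87, 88, 89, 90, 90, 90, 92, 93, 94, 94, 98]
print(value_at_percentile(scores, 20))  # 70
```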
Your Turn:
a. Find the percentile rank of the value 73 in the data above.
b. Find what test score is at the 83rd percentile.
43 52 65 66 67 68 70 70 71 72 73 74 75 75 76 78 78 78 78 79 80 82 85 87 87 88 89 90 90 90 92 93 94 94 98
More practice with test scores
Your Turn:
1. What’s the mode? ________
2. What is the range? _________
3. Five number summary:
Min:___, Q1:___, Median____, Q3____, Max:____
4. What is the interquartile range (IQR)?
43 52 65 66 67 68 70 70 71 72 73 74 75 75 76 78 78 78 78 79 80 82 85 87 87 88 89 90 90 90 92 93 94 94 98
Outlier Formula: Outliers are values that are quite different from the rest of the data. If numbers are outside of these limits, they are outliers.
IQR = Q3 - Q1
Formula to find the limits:
Q3 + (1.5 x IQR) for upper outliers
Q1 – (1.5 x IQR) for lower outliers
Sketch the box plot: Find the outliers and sketch the box plot, indicating outliers with dots beyond the whiskers.
lower outliers upper outliers
Q1 – (1.5 x IQR) Q3 + (1.5 x IQR)
Are there any numbers OUTSIDE of the EXTENDED limits?
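A sketch of the outlier check in Python, assuming the same quartile rule as this lesson (medians of the lower and upper halves). Applied to the test-score data, it computes both limits and lists any scores outside them; the helper names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    return ordered[mid] if len(ordered) % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value left out if n is odd)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    upper_start = half + 1 if len(ordered) % 2 == 1 else half
    return median(ordered[:half]), median(ordered[upper_start:])

def find_outliers(values):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, plus the two limits."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    flagged = [v for v in sorted(values) if v < lower_limit or v > upper_limit]
    return flagged, lower_limit, upper_limit

scores = [43, 52, 65, 66, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 78, 78, 78,
          78, 79, 80, 82, 85, 87, 87, 88, 89, 90, 90, 90, 92, 93, 94, 94, 98]
print(find_outliers(scores))  # ([43], 44.0, 116.0)
```

The same function flags the $250,000 salary in the Money Example.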
Assignment
Worksheet 5
Complete this
(check the box plots for outliers!)
Warm-Up: Make a box plot for the following data: 2, 17, 18, 24, 16, 19, 25, 36, 27, 27, 18, 24
0 2 10 20 30 40
IQR = _______
Q3 + 1.5 x IQR for upper outliers __________________________
Q1 – 1.5 x IQR for lower outliers ___________________________
Five number summary: ____, ____, ____, ____, ____
Statistics Lesson 6: Histograms
Learning Targets:
I can calculate measures of spread for data sets. (S1.2.1 and S1.2.3)
I can use statistics to describe data sets or to compare and contrast data sets. (S1.1.2)
I can read and interpret histograms. (S1.1.1)
I can draw graphs to display data. (S1.1.1)
6. HISTOGRAMS
A special type of bar graph that:
- breaks values into non-overlapping intervals of equal width
- displays the number of values that fall into each interval
- frequency distributions have actual counts
- relative frequency distributions have percentages instead of actual counts
Garage Sale Example
[Histogram annotated: equal width, no gaps between bars, # in each interval]
Relative frequency table: A relative frequency histogram has the same shape and the same horizontal scale as the corresponding frequency histogram. The difference is that the vertical scale measures the relative frequencies, written as percentages, not frequency counts. (Each relative frequency can be read as a probability written as a %.)
[Histogram: Cost of college]
A well-made histogram tells something about the spread of the data, but seldom indicates the median or any other exact value.
Poorly made histogram: This histogram also represents the cost of college. What is wrong with this histogram?
How many intervals?_____
Is the width of the intervals appropriate?
Can you tell how many colleges cost ~ $2500?
Can you tell how many colleges cost ~ $5000?
What should have been done differently? __________________________________________________
Defining the edges of bars
Non-overlapping intervals!
Make a Histogram
43 52 65 66 67 68 70 70 71 72 73 74 75 75 76 78 78 78 78 79 80 82 85 87 87 88 89 90 90 90 92 93 94 94 98
Use our test score data from earlier to make a histogram to represent this information.
Interval        Frequency
40 < x ≤ 50     1
50 < x ≤ 60     1
60 < x ≤ 70     6
70 < x ≤ 80     13
80 < x ≤ 90     9
90 < x ≤ 100    5
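The frequency column can be checked with a short Python sketch that counts values in intervals of the form low < x ≤ high, so that each score lands in exactly one bar. The function name and its parameters (`start`, `width`, `bins`) are illustrative.

```python
def frequency_table(data, start, width, bins):
    """Count values in start < x <= start + width, then the next interval, and so on."""
    counts = []
    for k in range(bins):
        low = start + k * width
        high = low + width
        counts.append(sum(1 for x in data if low < x <= high))
    return counts

scores = [43, 52, 65, 66, 67, 68, 70, 70, 71, 72, 73, 74, 75, 75, 76, 78, 78, 78,
          78, 79, 80, 82, 85, 87, 87, 88, 89, 90, 90, 90, 92, 93, 94, 94, 98]
print(frequency_table(scores, 40, 10, 6))  # [1, 1, 6, 13, 9, 5]
```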
Your Turn
Use the following histogram of the number of items sold at a garage sale.
1. How many items were sold at the garage sale?
2. Which cost interval had the most sales?
3. How many items sold cost over $20?
Assignment:
Worksheet 6
Warm-Up: Use the following data to create a histogram. Remember to show interval borders using appropriate mathematical symbols.
11, 63, 45, 62, 19, 28, 33, 21, 15, 19, 10, 20, 18, 13, 25, 17, 27, 65, 59, 52, 22, 60
Math intervals
Freq.
1. Which interval has the most data? ____________________
2. Which interval contains the median? __________________
Statistics Lesson 7: Variance and Standard Deviation
Learning Targets:
Calculate measures of center and measures of spread for data sets. (S1.1.2 and S1.2.3)
Use statistics to describe data sets or to compare and contrast data sets. (S1.1.2)
Describe relations between measures of center and measures of spread. (S1.2.1 and S1.2.3)
Use summation notation to represent a sum, mean, variance or standard deviation. (S1.2.3)
7: Variance & Standard Deviation
The IQR is one measure of spread. The other measures of spread are variance and standard deviation.
- deviation: the difference between a data point and the mean
- variance: the average of the squared deviations. This means you find the deviations, square them, add them up, and divide (for a sample, divide by n - 1, as in the algorithm below).
- standard deviation: the square root of the variance. This tells how far the scores typically deviate from the mean.
Algorithm
To calculate variance and standard deviation for a data set with n numbers:
1. Calculate the mean.
2. Find the deviation (difference) of each value from the mean.
3. Square each deviation and add the squares.
4. Divide the sum of squared deviations by n-1.
5. Take the square root of the variance. This is the standard deviation.
Variance and Standard Deviation
variance: $s^2$     standard deviation: $s$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
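The five steps of the algorithm map one-to-one onto the Python sketch below, shown on data set a from Example 1; `sample_variance` is an illustrative name. Python's standard `statistics.variance` and `statistics.stdev` also divide by n - 1, so they serve as a cross-check.

```python
import statistics
from math import sqrt

def sample_variance(data):
    n = len(data)
    mean = sum(data) / n                       # 1. calculate the mean
    deviations = [x - mean for x in data]      # 2. deviation of each value
    total = sum(d ** 2 for d in deviations)    # 3. square each deviation and add
    return total / (n - 1)                     # 4. divide by n - 1

data = [4, 7, 11, 13, 15]
var = sample_variance(data)
sd = sqrt(var)                                 # 5. square root gives the standard deviation
print(var, round(sd, 4))                       # 20.0 4.4721
print(statistics.variance(data), statistics.stdev(data))  # library cross-check
```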
Example 1: You can do these calculations by hand (as with the dogs) or you can use your calculator.
Find the standard deviation of each data set below:
a. 4, 7, 11, 13, 15
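Table columns: $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
Calculate the mean: ________
Calculate the variance: ________
Calculate the standard deviation:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s = \sqrt{s^2}$$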
Your Turn
Find the standard deviation of each data set below:
b. 8, 9, 10, 11, 12
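Table columns: $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
Calculate the mean: ________
Calculate the variance: ________
Calculate the standard deviation:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s = \sqrt{s^2}$$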
Closure (stats lesson 7) Variance & Standard Deviation
1. If the variance is 39, find the standard deviation: _______
2. If the standard deviation is 15.2, find the variance: ______
3. Use the following data to answer parts 3a to 3c: 45, 41, 44, 48
3a. Calculate the mean: ________
Table columns: $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
3b. Calculate the variance:
3c. Calculate the standard deviation:
Assignment:
Worksheet 7
Normal Distribution
Find out more…
Take three minutes to visit these:
http://www.ms.uky.edu/~mai/java/stat/GaltonMachine.html
http://www.article19.com/shockwave/monte.htm
Find the standard deviation of the data set below:
42, 37, 52, 66, 39, 49
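Table columns: $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
1. Calculate the mean: ________
2. Calculate the variance: ________
3. Calculate the standard deviation:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s = \sqrt{s^2}$$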
Warm-Up
Statistics Lesson 8: Normal Distribution
Learning Targets:
Determine whether a set of data appears to be normally distributed or skewed. (S1.3.2)
Solve problems involving normally distributed data. (S1.3.3)
facts
also known as the “bell curve”
symmetric
extends to +/- infinity
area under the curve = 1
described by mean and standard deviation
empirical rule:
68% of data falls within one standard deviation of the mean
95% of data falls within two standard deviations of the mean
99.7% of data falls within three standard deviations of the mean
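A Python sketch of the empirical-rule arithmetic for Example 1 below (150 packages, mean 23, standard deviation 1). The dictionary of percentages is just the rule above; the last lines use `statistics.NormalDist` from the Python standard library to show that the rule's 68% is a rounded value of the exact area under the normal curve.

```python
from statistics import NormalDist

# Empirical rule: share of data within k standard deviations of the mean.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def share_above(k):
    """Share above mean + k SD: the curve is symmetric, so half of what lies outside."""
    return (1 - WITHIN[k]) / 2

packages = 150                                # Example 1: mean 23, SD 1
between_22_and_24 = packages * WITHIN[1]      # 22 to 24 is within 1 SD
more_than_25 = packages * share_above(2)      # 25 is 2 SDs above the mean
print(round(between_22_and_24, 2), round(more_than_25, 2))  # 102.0 3.75

# Cross-check against the exact normal curve.
exact = NormalDist(mu=23, sigma=1)
print(round(exact.cdf(24) - exact.cdf(22), 4))  # 0.6827
```

Since there cannot be 3.75 packages, that count would be reported as about 4.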
Example 1Students counted the number of candies in 150 small packages. They found that the number of candies per package was normally distributed with a mean of 23 candies per package and a standard deviation of 1 piece of candy.
a. About how many packages had between 22 and 24 candies?
b. About how many packages had more than 25 candies?
Your Turn: A fisherman counted the number of worms in 60 containers of bait. He found that the number of worms per container was normally distributed with a mean of 15 worms per container and a standard deviation of 2 worms.
a. What percent of the containers had between 13 and 17 worms?
b. What percent of the containers had between 11 and 15 worms?
c. About how many containers had fewer than 11 worms or more than 19 worms?
8: Skewness
Example 2: Determine whether the data in each of the following appear to be normally distributed, positively skewed, or negatively skewed.
Day      # of Absences
Mon      2
Tues     1
Wed      5
Thurs    9
Fri      8
Assignment
Worksheet 8