1: collecting data · 2018-09-09 · unit2 notes.notebook 1 november 17, 2015 unit 2 statistical...
TRANSCRIPT
unit2 notes.notebook
1
November 17, 2015
Unit 2 Statistical Analysis
Statistics is the area of Math that is all about 1: collecting, 2: analysing and 3: reporting about data.
1: Collecting data
Collecting data is done by performing surveys, either on the whole population or on just a sample of it.
→ the population is the total group being studied, whether people, animals or things.
→ usually information is collected only from a sample (part of the population).
→ collecting survey information from the whole population is called a census.
→ every sample risks being biased (prejudiced) in some way. We try to avoid bias by picking our sample at random.
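Picking a sample at random can be sketched in a few lines of Python. This is only an illustration: the class list `students`, the sample size of 5 and the helper name `simple_random_sample` are made up for the example and are not part of the notes.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size different members so that each one has an equal chance."""
    rng = random.Random(seed)  # a fixed seed only makes the example repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: a class of 30 students.
students = [f"student_{i}" for i in range(1, 31)]
sample = simple_random_sample(students, 5, seed=1)
print(sample)
```

Because every student has the same chance of being picked, no group is favoured on purpose, which is the point of sampling at random.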
2: Analysing Data
Once we have collected some information we have to start making sense of it all. This is done by analysing the data to look for patterns that can represent the large data set in a simple-to-understand way.
There are a few basic tools that can describe any data set:
Range → highest - lowest value
→ shows how spread out the data is.
Mean → the average of the scores
→ add all the numbers and divide by how many numbers you have (n).
Median → the middle number
→ rank all values in order and pick the middle one.
→ with an even number of values, take the average of the two middle values.
Mode → the most common value in the set.
→ there may be no mode, or there could be more than one mode, depending on how often each value repeats.
Mean, Mode and Median are all called Measures of Central Tendency, because they all try to find the normal, central type of score in a group of data.
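The four tools can be tried out with Python's standard `statistics` module. This is a minimal sketch: the quiz scores are made up for the example, the helper name `summarize` is hypothetical, and treating "every value equally common" as "no mode" is an assumed convention.

```python
from statistics import mean, median, multimode


def summarize(values):
    """Return the range and the three measures of central tendency."""
    ordered = sorted(values)  # first, rewrite the data in order
    modes = multimode(ordered)
    # Assumed convention: if every value is equally common, report no mode.
    if len(modes) > 1 and len(modes) == len(set(ordered)):
        modes = []
    return {
        "range": ordered[-1] - ordered[0],  # highest - lowest
        "mean": mean(ordered),              # total divided by n
        "median": median(ordered),          # middle value, or average of the two middle values
        "modes": modes,                     # may be empty, or hold more than one value
    }


# Made-up quiz scores, not taken from the notes.
quiz_scores = [7, 9, 4, 7, 10, 6, 9, 8]
print(summarize(quiz_scores))
```

For these made-up scores the range is 6, the mean and median are both 7.5, and the set is bimodal (7 and 9).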
First.... rewrite the data in order
Range (highest - lowest): 185 - 155 = 30 cm
Mode (most common): bimodal, 157 cm & 173 cm
Median (middle): 173 cm
Mean (average): 1870 ÷ 11 = 170 cm
Mean, Mode and Median all try to describe the full set of data by finding a 'normal' score. They are called the Measures of Central Tendency.
Each measure of central tendency can be accurate or messed up (skewed) depending on the data set.
The Problems of Central Tendency
Mode
→ there can be no mode, or multiple modes, in a set of data.
→ the mode could be a very high or very low score in the data, and not show the normal trend.
Mean
→ a couple of very high or low scores could pull the average up or down, even when all the other data is grouped close together.
Median
→ the median is always the middle number and is a little harder to skew, but if you had a lot of numbers in a low range and a lot of numbers in a high range, the median would likely be either high or low while the mean would be somewhere in the middle of the range.
The point is...... you can mess up any of the measures of central tendency so you need to do all three and compare in order to get a good understanding.
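A quick way to see a measure being skewed is to add one extreme score and compare. The marks below are made up for the illustration, and the helper name `centre` is hypothetical.

```python
from statistics import mean, median

# Made-up marks that are grouped close together.
typical = [72, 74, 75, 76, 78]
# The same marks plus one very low score.
with_outlier = typical + [15]


def centre(values):
    """Return (mean, median) so the two measures can be compared."""
    return mean(values), median(values)


print("without the low score:", centre(typical))
print("with the low score:   ", centre(with_outlier))
```

The single low score pulls the mean from 75 down to 65, while the median only moves from 75 to 74.5. Comparing the measures is what shows that something is off.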
practice: Mean, Mode & Median
Set 1: 45.6 54.6 44.6 46.5 66.4 54.6
All 3 are good measures of central tendency, but the mean is probably the best because it is in the center of the three.
Set 2: 120 320 330 220 202 210 230 320 210 201 310 330 240 210 330 230
The mode here has the problem of being split (bimodal), with one mode lower and one mode higher than the other two measures. Mean or Median would be the best to use here.
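The two practice sets can be checked with Python's `statistics` module. The data comes from the notes; the helper name `describe` and the rounding to two decimals are choices made only for this sketch.

```python
from statistics import mean, median, multimode

set_1 = [45.6, 54.6, 44.6, 46.5, 66.4, 54.6]
set_2 = [120, 320, 330, 220, 202, 210, 230, 320,
         210, 201, 310, 330, 240, 210, 330, 230]


def describe(data):
    """Return (mean, median, sorted list of modes), rounded to 2 decimals."""
    return round(mean(data), 2), round(median(data), 2), sorted(multimode(data))


for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    avg, mid, modes = describe(data)
    print(f"{name}: mean={avg} median={mid} mode(s)={modes}")
```

Set 1 gives a mean of 52.05, a median of 50.55 and a mode of 54.6, so the mean sits between the other two. Set 2 gives a mean of about 250.81, a median of 230 and two modes (210 and 330), one below and one above the other measures.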
Weighted Mean
used when not all pieces of data are equally important to the total; weighting each piece gives a fairer balance.
ex: polling across Canada would have to weight Ontario and Quebec more than PEI and NL because of their much larger populations.
* each category must be given a weight.
To calculate a weighted mean, you take the original score and multiply it by the weighting to get scores that can work together at the same
Province   Weighting        Score
Ont        .40         x    65      =
Que        .23         x    52      =
NL         .15         x    78      =
AB         .10         x    40      =
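The score x weighting step can be sketched in Python using the weightings and scores listed in the notes, taken in the order they appear. One step here is an assumption that the notes do not state: the listed weightings add up to 0.88 rather than 1, so the sketch divides the total of the weighted scores by the total weight, which is the usual way to finish a weighted mean. The function names are hypothetical.

```python
weights = [0.40, 0.23, 0.15, 0.10]  # weightings from the notes, in order
scores = [65, 52, 78, 40]           # scores from the notes, in order


def weighted_parts(scores, weights):
    """Multiply each score by its weighting."""
    return [score * weight for score, weight in zip(scores, weights)]


def weighted_mean(scores, weights):
    """Sum of the weighted scores divided by the total weight (assumed final step)."""
    return sum(weighted_parts(scores, weights)) / sum(weights)


print([round(part, 2) for part in weighted_parts(scores, weights)])
print(round(weighted_mean(scores, weights), 2))
```

The weighted scores come to 26.0, 11.96, 11.7 and 4.0, for a total of 53.66. Dividing by the total weight of 0.88 gives a weighted mean of about 60.98.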
A Trimmed Mean
There are times when you may want to remove some of the extreme scores from the data set so that you get a better picture of the normal scores. For example, this would suit a student who normally scores really well but failed one assignment.
The failing score would be an outlier for the data. By removing the outlier(s) the central tendency would best show the student's normal achievement.
Removing the outliers and then calculating the average is called a Trimmed Mean
Bob's scores on 3202 work....
92, 90, 86, 95, 100, 35, 91
The mean of the complete data set is.... _________
The 35 is an outlier for this group. We would trim the mean by removing the highest and lowest scores (sorry, but this means losing the 100 too!) and calculating the mean from the rest of the scores..... ________
NOTE!!!!
Trimming the mean should not change the mode or the median of the data. Because you took the top and bottom score, the middle number (median) should still be the same. And if your top or bottom number was the mode, it wasn't a very good mode to begin with!
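Bob's scores can be used to check a trimmed mean in Python. This is a minimal sketch: the function name `trimmed_mean` and its `trim` parameter (how many scores to drop from each end) are hypothetical names chosen for the example.

```python
from statistics import mean, median


def trimmed_mean(values, trim=1):
    """Drop the `trim` lowest and `trim` highest scores, then average the rest."""
    ordered = sorted(values)
    if len(ordered) <= 2 * trim:
        raise ValueError("not enough scores left after trimming")
    return mean(ordered[trim:len(ordered) - trim])


bob = [92, 90, 86, 95, 100, 35, 91]
trimmed_scores = sorted(bob)[1:-1]  # the scores left after losing the 35 and the 100

print("mean of all scores:", round(mean(bob), 2))
print("trimmed mean:      ", trimmed_mean(bob))
print("median before / after:", median(bob), median(trimmed_scores))
```

With all seven scores the mean is about 84.14. After trimming the 35 and the 100 it rises to 90.8, while the median stays at 91 both times.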
3: Reporting your findings
→ the best way to report on data is with a chart or graph.
→ any good graph should include:
scales (x-axis, y-axis)
a title at the top and a label on each axis.
the best style of graph for the data used.
[Diagram: a blank graph with "Graph Title" at the top and "scale and label" marked on both the x-axis and the y-axis]
Percentile Ranking
There is a way of reporting the scores of a group according to how they rank against the rest of the group, and not by how they actually scored. This is done by using a percentile.
Percentile: the percentage of scores that are below a certain mark.
The top score in the group is always the 100th percentile and the bottom score is always the 1st percentile. Even if someone scored 90% on the assignment, if it was the lowest mark in the class it ranks as the 1st percentile.
Example:
Alison was in the 80th percentile with a mark of 72%. This means that 80% of the people scored lower than her.
Percentile Rank: the percent of scores at or below the given score.
Percentiles are often used to compare results to the general population. For example, 10 minutes after you were born, you were measured in length and weight and ranked according to how you come out against the general size of all other Canadian babies. If you were in the 90th percentile you were a huge kid, and if you were in the 20th percentile you were probably a premature baby.
Any time you score close to the 50th percentile you are average. The median number is always exactly the 50th percentile.
To calculate the percentile, divide the number of scores that are lower than yours by the total number of scores in the data set, and then multiply by 100.
NOTE!!! It's best to rewrite the scores in order before working on percentiles.
Percentile = (number of scores lower than yours ÷ total number of scores (n)) x 100
Ex: 22, 47, 57, 65, 66, 84, 75, 80, 88, 89, 91, 98
Percentile rank (89)
To find the 25th percentile: take the median (middle) of the bottom half of the data.
To find the 75th percentile: take the median of the top half of the data.
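The percentile formula and the "median of each half" method can be sketched in Python using the example scores. The function names are hypothetical, and leaving the middle value out of both halves when n is odd is an assumed convention that the notes do not spell out.

```python
from statistics import median


def percentile_of(score, scores):
    """Number of scores lower than `score`, divided by n, times 100."""
    lower = sum(1 for s in scores if s < score)
    return lower / len(scores) * 100


def quartile_medians(scores):
    """Return (25th, 50th, 75th) percentiles as medians of the halves."""
    ordered = sorted(scores)                # rewrite the scores in order first
    n = len(ordered)
    bottom_half = ordered[: n // 2]
    top_half = ordered[(n + 1) // 2:]       # assumption: skip the middle value when n is odd
    return median(bottom_half), median(ordered), median(top_half)


marks = [22, 47, 57, 65, 66, 84, 75, 80, 88, 89, 91, 98]
print(percentile_of(89, marks))
print(quartile_medians(marks))
```

Nine of the twelve scores are lower than 89, so 89 sits at the 75th percentile by this formula. The halves give 61 for the 25th percentile, 77.5 for the median and 88.5 for the 75th percentile.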
Here is a percentile chart for all baby boys born in North America.
A newborn baby is compared to all other children, not to a weight or height goal.
How do you find the percentiles?
First write the data in ascending order....
the 'upper median' is the 75th percentile
the median is the 50th percentile
the 'lower median' is the 25th percentile
Section 2.3 Scatterplots.
a Scatterplot is a 2-axis graph which tries to compare 2 variables: the independent variable and the dependent variable.
each piece of data makes one point on the graph.
we look for a trend (pattern) in the data: "the line of best fit".
The Line of Best Fit.
doesn't have to include the origin
has equal #'s of points above / below the line
tries to fit the flow of the dots.
* you can predict values beyond the range of the data: extrapolating
* you can predict values that are within the range of the data: interpolating
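In class the line of best fit is drawn by eye. As a rough check, a computer can calculate one with the least-squares method, which is a swapped-in technique and not the method in the notes. The forearm and hand lengths below are made up for the example, and all function names are hypothetical.

```python
def best_fit(xs, ys):
    """Least-squares line through the points: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Read a value off the line."""
    return slope * x + intercept


def kind_of_prediction(x, xs):
    """Inside the range of the data is interpolating, outside is extrapolating."""
    return "interpolating" if min(xs) <= x <= max(xs) else "extrapolating"


# Made-up measurements in cm (independent: forearm, dependent: hand).
forearm = [22, 24, 25, 27, 30]
hand = [16, 17, 18, 19, 21]

slope, intercept = best_fit(forearm, hand)
for length in (26, 35):
    guess = predict(length, slope, intercept)
    print(length, round(guess, 1), kind_of_prediction(length, forearm))
```

A forearm of 26 cm is inside the measured range, so that prediction is interpolating. A forearm of 35 cm is beyond the data, so that prediction is extrapolating and should be trusted less.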
Describing the Pattern. (Correlation)
To describe the pattern we can use a correlation term, but we can also describe it in terms of the variables:
"As the forearm length (independent variable) increases, the hand length (dependent variable) increases."