1: collecting data · 2018-09-09 · unit2 notes.notebook 1 november 17, 2015 unit 2 statistical...

unit2 notes.notebook


November 17, 2015

Unit 2 Statistical Analysis

Statistics is the area of math that is all about 1: collecting, 2: analysing and 3: reporting on data.

1: Collecting data

Collecting data is done by performing surveys, either on the whole population or on just a sample of it.

→ the population is the total group being studied, whether people, animals or things.

→ usually information is collected only from a sample (part of the population).

→ collecting survey information from the whole population group is called a census.

* we must be careful that a sample is not biased (prejudiced) in some way. We try to avoid bias by picking our sample randomly.
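Random sampling can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey method from the notes: the population of 500 student ID numbers is hypothetical, and `simple_random_sample` is a made-up helper name. `random.Random.sample` picks without replacement, so every member has the same chance of being chosen and nobody is picked twice.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Hypothetical helper: pick k members of the population without replacement.

    Every member has an equal chance of being chosen, which is the idea
    behind avoiding bias by sampling randomly.
    """
    if not 0 <= k <= len(population):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # a fixed seed makes the example repeatable
    return rng.sample(population, k)


# Hypothetical population: 500 student ID numbers.
students = list(range(1, 501))
sample = simple_random_sample(students, 20, seed=1)
print(len(sample), sample[:5])
```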


2: Analysing Data

Once we have collected some information, we have to start trying to make sense of it all. This is done by analysing the data to look for patterns that can represent the large data set in a simple, easy-to-understand way.

There are a few basic tools that can represent any data:

Range → highest value − lowest value

→ shows how spread out the data is.

Mean → the average of the scores

→ add all the numbers and divide by how many numbers you have (n).

Median → the middle number

→ rank all values in order and pick the middle one.

→ with an even number of values, the median is the average of the two middle values.

Mode → the most common value in the set.

→ there may be no mode, or there could be more than one mode, depending on how often the values repeat.

Mean, mode and median are all called Measures of Central Tendency, because they all try to find the normal, central type of score in a group of data.
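The four tools above can be written out step by step in Python. This is a minimal sketch: the function names are hypothetical, each one follows the recipe in the notes rather than a library routine, and the scores are made up for illustration. (Python's standard `statistics` module also provides `mean`, `median` and `multimode`.)

```python
from collections import Counter


def data_range(values):
    # Range: highest value minus lowest value.
    return max(values) - min(values)


def mean(values):
    # Mean: add all the numbers and divide by how many there are (n).
    return sum(values) / len(values)


def median(values):
    # Median: rank the values in order and pick the middle one.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    # Mode: the most common value(s). Returns [] when no value repeats
    # (no mode) and several values when there is a tie (e.g. bimodal).
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


scores = [4, 8, 6, 5, 3, 8, 9, 5]  # hypothetical scores
print(data_range(scores), mean(scores), median(scores), modes(scores))
# prints: 6 6.0 5.5 [5, 8]
```

Returning an empty list for "no mode" is a convention chosen for this sketch; the notes only say there may be no mode.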


First, rewrite the data in order.

Range (highest − lowest): 185 − 155 = 30 cm

Mode (most common): bimodal, 157 cm & 173 cm

Median (middle): 173 cm

Mean (average): 1870 ÷ 11 = 170 cm


The Problems of Central Tendency

Mean, mode and median all try to describe the full set of data by finding a 'normal' score. They are called the Measures of Central Tendency.

Each measure of central tendency can be accurate or messed up (skewed) depending on the data set.

Mode

There can be no mode, or multiple modes, in a set of data.

The mode could be a very high or low score in the data, and not show the normal trend.

Mean

A couple of very high or low scores could pull the average up or down, even when all the other data is grouped close together.

Median

The median is always the middle number and is a little harder to skew, but if you had a lot of numbers in a low range and a lot of numbers in a high range, the median would likely be either high or low while the mean would be somewhere in the middle of the range.

The point is...... any of the measures of central tendency can be skewed, so you need to calculate all three and compare them to get a good understanding.
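The problem with the mean can be seen by adding one extreme score to a tight group. A minimal sketch using Python's standard `statistics` module; the scores are hypothetical.

```python
from statistics import mean, median

# Hypothetical scores that are grouped close together.
tight = [70, 72, 74, 75, 76, 78]
# The same scores plus one very low score (an outlier).
with_outlier = tight + [10]

print(round(mean(tight), 2), median(tight))     # prints: 74.17 74.5
print(mean(with_outlier), median(with_outlier))  # mean drops to 65, median is 74
```

The single low score drags the mean down by about 9 points, while the median barely moves.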


Practice: Mean, Mode & Median

Set 1: 45.6, 54.6, 44.6, 46.5, 66.4, 54.6

All 3 are good measures of central tendency here, but the mean is probably the best because it falls in the center of the three.

Set 2: 120, 320, 330, 220, 202, 210, 230, 320, 210, 201, 310, 330, 240, 210, 330, 230

The mode here has a problem: it is split into two modes, one lower and one higher than the other two measures. The mean or median would be the best to use here.
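The two practice sets can be checked with Python's standard `statistics` module, where `multimode` returns every value tied for most common. This is only a checking sketch; rounding to two decimals is a display choice.

```python
from statistics import mean, median, multimode

set_1 = [45.6, 54.6, 44.6, 46.5, 66.4, 54.6]
set_2 = [120, 320, 330, 220, 202, 210, 230, 320,
         210, 201, 310, 330, 240, 210, 330, 230]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    # sorted() so that tied modes are listed smallest first
    print(name,
          "mean:", round(mean(data), 2),
          "median:", round(median(data), 2),
          "mode(s):", sorted(multimode(data)))
```

Set 1 comes out with a mean of about 52.05, a median of 50.55 and a mode of 54.6, so the mean sits between the other two. Set 2 comes out with a mean of about 250.81, a median of 230 and two modes, 210 and 330, one on each side of the mean and median.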



Weighted Mean

When not all pieces of data are equally important to the total, a weighted mean gives each one its proper share.

ex: polling across Canada would have to weight Ontario and Quebec more than PEI and NL because of their much larger populations.

* each category must be given a weight.

To calculate a weighted mean, you take each original score and multiply it by its weighting, so that all the scores can work together.

Province   Weighting       Score
Ont        .40         x   65   =
Que        .23         x   52   =
NL         .15         x   78   =
AB         .10         x   40   =
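A weighted mean can be sketched in Python with the usual definition: multiply each score by its weight, add the products, then divide by the total of the weights. When the weights add up to 1, the division changes nothing and the weighted mean is just the sum of the weighted scores. `weighted_mean` is a hypothetical helper name, and the pairing of weightings with scores follows the row order of the provinces table.

```python
def weighted_mean(scores, weights):
    """Sum of (weight x score) divided by the sum of the weights."""
    if len(scores) != len(weights):
        raise ValueError("each score needs exactly one weight")
    total_weight = sum(weights)
    weighted_scores = [w * s for s, w in zip(scores, weights)]
    return sum(weighted_scores) / total_weight


# Scores and weightings from the provinces table (Ont, Que, NL, AB).
scores = [65, 52, 78, 40]
weights = [0.40, 0.23, 0.15, 0.10]

# The individual weighted scores: weighting x score for each row.
products = [w * s for s, w in zip(scores, weights)]
print([round(p, 2) for p in products])  # prints: [26.0, 11.96, 11.7, 4.0]
print(round(weighted_mean(scores, weights), 2))
```

The four weightings listed add up to .88 rather than 1, so here dividing by their total gives about 60.98 instead of the plain sum of the weighted scores, 53.66.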



A Trimmed Mean

There are times when you may want to remove some of the extreme scores from the data set so that you get a better picture of the normal scores. This works well for a student's marks when they normally score really well but failed one assignment.

The failing score would be an outlier for the data. By removing the outlier(s), the central tendency would best show the student's normal achievement.

Removing the outliers and then calculating the average is called a Trimmed Mean.

Bob's scores on 3202 work....

92, 90, 86, 95, 100, 35, 91

The mean of the complete data set is.... _________

The 35 is an outlier for this group. We would trim the mean by removing the highest and lowest scores (sorry, but this means losing the 100 too!) and calculating the mean of the rest of the scores..... ________

NOTE!!!!

Trimming the mean should not change the mode or the median of the data. Because you removed the top and bottom scores, the middle number (median) should still be the same. And if your top or bottom number was the mode, it wasn't a very good mode to begin with!
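The trimmed mean for Bob's scores can be checked with a short Python sketch. `trimmed_mean` is a hypothetical helper that sorts the scores and drops the `k` lowest and `k` highest (here `k=1`, matching the notes: the 35 and the 100 go).

```python
from statistics import mean, median


def trimmed_mean(values, k=1):
    """Mean after removing the k lowest and k highest values."""
    if len(values) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


bob = [92, 90, 86, 95, 100, 35, 91]  # Bob's scores from the notes
print(round(mean(bob), 2))  # full data set: 84.14
print(trimmed_mean(bob))    # 100 and 35 removed: 90.8

trimmed = sorted(bob)[1:-1]
print(median(bob), median(trimmed))  # prints: 91 91
```

The last line confirms the NOTE: the median is 91 both before and after trimming.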



3: Reporting your findings

→ the best way to report on data is with a chart or graph.

→ any good graph should include:

scales (x-axis, y-axis)

a title at the top and on each axis

the best style of graph for the data used.

(Diagram: a sample graph with the Graph Title at the top and a scale and label on both the x-axis and the y-axis.)



Percentile Ranking

Percentile ranking is a way of reporting the scores of a group according to how they rank against the rest of the group, rather than by what they actually scored. This is done using a percentile.

Percentile → the percentage of scores that are below a certain mark.

The top score in the group is always the 100th percentile and the bottom score is always the 1st percentile. Even if a student scored 90% on the assignment, if it was the lowest mark in the class, it ranks as the 1st percentile.

Example:

Alison was in the 80th percentile with a mark of 72%. This means that 80% of the people scored lower than she did.

Percentile Rank → the percentage of scores at or below a given score.

Percentiles are often used to compare results with the general population. For example, 10 minutes after you were born, your length and weight were measured and ranked against those of all other Canadian babies. If you were in the 90th percentile you were a huge kid, and if you were in the 20th percentile you were smaller than most babies.

Any time you score close to the 50th percentile you are average. The median number is always exactly the 50th percentile.

To calculate the percentile, divide the number of scores that are lower than yours by the total number of scores in the data set, and then multiply by 100.

NOTE!!! It's best to rewrite the scores in order before working on percentiles.

$$\text{Percentile} = \frac{\text{number of scores lower than yours}}{\text{total number of scores } (n)} \times 100$$

Ex: 22, 47, 57, 65, 66, 84, 75, 80, 88, 89, 91, 98

Percentile rank (89)
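The percentile formula can be sketched in Python by counting scores. `percentile` follows the formula in the notes (scores strictly lower than yours), and `percentile_rank` follows the "at or below" definition; both names are hypothetical. Counting does not need the list to be sorted, though sorting helps when working by hand.

```python
def percentile(score, data):
    """Percent of scores in the data that are lower than the given score."""
    lower = sum(1 for x in data if x < score)
    return lower / len(data) * 100


def percentile_rank(score, data):
    """Percent of scores in the data that are at or below the given score."""
    at_or_below = sum(1 for x in data if x <= score)
    return at_or_below / len(data) * 100


scores = [22, 47, 57, 65, 66, 84, 75, 80, 88, 89, 91, 98]
print(percentile(89, scores))                 # prints: 75.0 (9 of 12 scores are lower)
print(round(percentile_rank(89, scores), 1))  # prints: 83.3 (10 of 12 at or below)
```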

To find the 25th percentile: take the median (middle) of the bottom half of the data.

To find the 75th percentile: take the median of the top half of the data.
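The median-of-halves method for the 25th and 75th percentiles can be sketched in Python. `quartiles` is a hypothetical helper; with an odd number of values it leaves the middle value out of both halves, which is an assumption, since the notes only state the rule and some textbooks include the middle value.

```python
from statistics import median


def quartiles(data):
    """25th, 50th and 75th percentiles using the median-of-halves method.

    Assumption: with an odd number of values, the middle value is left
    out of both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(ordered), median(upper_half)


scores = [22, 47, 57, 65, 66, 84, 75, 80, 88, 89, 91, 98]
print(quartiles(scores))  # prints: (61.0, 77.5, 88.5)
```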



(Chart: percentile chart for all baby boys born in North America.)

A newborn baby is compared to all other children, not to a weight or height goal.



How do you find the percentiles?

First write the data in ascending order....

the 'upper median' is the 75th percentile

the median is the 50th percentile

the 'lower median' is the 25th percentile


Section 2.3 Scatterplots

A scatterplot is a two-axis graph that compares two variables: an independent variable and a dependent variable.

Each piece of data makes one point on the graph.

We look for a trend (pattern) in the data: "the line of best fit".

The Line of Best Fit

→ doesn't have to pass through the origin

→ has an equal number of points above and below the line

→ tries to follow the flow of the dots

* you can predict values beyond the range of the data: extrapolating

* you can predict values within the range of the data: interpolating

(Diagram: a scatterplot with the independent variable on the x-axis, the dependent variable on the y-axis, and a line of best fit with the "interpolate" region inside the data and the "extrapolate" region beyond it.)
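The notes draw the line of best fit by eye. A common computed alternative, swapped in here purely as an illustration, is the least-squares line; note that it will not always have exactly equal numbers of points above and below. The forearm and hand lengths are hypothetical, and `best_fit_line` and `predict` are made-up names.

```python
def best_fit_line(xs, ys):
    """Least-squares line y = m*x + b through the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be the same")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx          # slope
    b = y_bar - m * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return m, b


def predict(m, b, x):
    # Read a y value off the line for a given x.
    return m * x + b


# Hypothetical data: forearm length (cm) vs hand length (cm).
forearm = [22.0, 24.0, 25.0, 27.0, 28.0]
hand = [16.5, 17.8, 18.0, 19.4, 20.1]

m, b = best_fit_line(forearm, hand)
print(round(m, 3), round(b, 3))
print(round(predict(m, b, 26.0), 1))  # inside the data (22 to 28 cm): interpolating
print(round(predict(m, b, 35.0), 1))  # beyond the data: extrapolating
```

A prediction at 26 cm lies inside the data (interpolating), while one at 35 cm lies beyond it (extrapolating), so the second is generally less reliable.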


Describing the Pattern (Correlation)

To describe the pattern we can use a correlation term, but we can also describe it in terms of the variables:

"As the forearm length (independent variable) increases, the hand length (dependent variable) increases."
