TRANSCRIPT
© Boardworks Ltd 2004 1 of 43
Statistics
3 ESO BIL Mathematics
Contents
D1 Statistics
D1.1 Basic concepts
D1.2 Collecting data
D1.3 Organizing data
D1.4 Writing a statistical report
What is Statistics?
The science related to the collection, organization, interpretation and analysis of data is called Statistics.
A statistical study gives us information about a characteristic studied in a set of individuals called the population.
In order to study the characteristic you can choose a subset of members of the population, called a sample.
Each element of the population or of the sample is called an individual.
The characteristic we want to study is called the statistical variable.
Choosing the sample
When collecting data it is usually impractical to include every member of the group or population that is being investigated.
A sample is therefore chosen to represent the group that is being investigated.
How big should a sample be?
The sample should be as large as possible.
If the sample size is too small, then the results will be unrepresentative.
Choosing the sample
Suppose, for example, that you wish to investigate the favourite sports of 11 to 15 year-olds.
Would it be reasonable to question a sample of people outside a football ground following a game?
Can you suggest a better sample?
You would have to make sure that you ask equal numbers of girls and boys and that the sample is spread out across all age groups in the range.
Different kinds of data
Qualitative data is data that is non-numerical.
For example,
favourite football team,
eye colour,
birth place.
Sometimes qualitative data can contain numbers.
For example,
favourite number,
last digit in your telephone number,
most used bus route.
Quantitative data
Quantitative data is numerical. It can be discrete or continuous.
Discrete data can only take certain values.
For example,
shoe sizes, the number of children in a class, the number of sweets in a packet.
Continuous data comes from measuring and can take any value within a given range.
For example,
the weight of a banana, the time it takes for pupils to get to school, the height of 13 year-olds.
Summary of basic concepts
Ex.: Feeding habits of the infant population (3 to 6 years old)
Population: the set on which the study is carried out.
Children from 3 to 6 years old.
Sample: the subset of the population that is studied.
The set of children chosen to do the study.
Individual: every element of the population or the sample.
Every single boy and girl.
Statistical variable: the characteristic studied in the population.
The aspect studied: consumption of vegetables, fruit, meat, fish, etc.
Types of variables:
Qualitative (qualities). Ex: What kind of fruit do you eat?
Quantitative (numerical): discrete or continuous.
Ex: How many pieces a week? or What quantity, by weight, per week?
Contents
D1 Statistics
D1.1 Concepts
D1.2 Collecting data
D1.3 Organizing data
D1.4 Writing a statistical report
Deciding on the data
Data can be collected from a primary source or a secondary source.
Data from a primary source is data that you have collected yourself, for example:
From a survey or questionnaire of a group of people.
From an experiment involving observation, counting or measuring. In this case you use an observation sheet.
Data from a secondary source is data that you have collected from somewhere else, including the Internet, reference books or newspapers.
Designing a questionnaire
It is important to design a questionnaire so that:
People will co-operate and answer the questions honestly.
The answers to the questions can be analysed and presented.
The questions are not embarrassing or personal.
The questions, if possible, have a specific answer.
Designing a questionnaire
Make sure that questions are not embarrassing or personal.
For example, you need to think carefully about questions asking about age or income.
Do not ask: How old are you?
A better question is: Tick one box for your age group.
15-20    21-25    26-30    31+
Suggest a better question
How much do you weigh?
This is too personal; also, some people don't know their weight.
A better question would be:
Would you consider yourself to be:
Underweight    Average weight    Overweight
Designing a questionnaire
If possible, write questions so that they have a specific answer.
For example:
Did you see the Olympics on TV?
People could answer:
Yes
No
Not much
Only the best bits
Once a day
Sometimes
Designing a questionnaire
A better question would be:
How much of the Olympics coverage did you watch? Tick one box only.
None
Less than 1 hour a day
Between 1 and 2 hours a day
More than 2 hours a day
Every eventuality has been accounted for and the person answering the question cannot give another choice.
Designing a questionnaire
A scale can be used when asking for an opinion.
For example,
How would you rate the leisure facilities available in your local area? Tick one box only.
Excellent    Good    Satisfactory    Poor    Unsatisfactory
Suggest a better question
How many books did you read last month?
0-2    2-4    4-6
The intervals given overlap. Also, if a person has read more than 6 books there is nowhere to tick.
A better question would be:
How many books did you read last month? Tick one box.
0-2    3-5    6-8    9+
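The overlap problem in tick-box intervals can also be checked mechanically. The following Python sketch is only an illustration: the function name `check_intervals` and the idea of writing each box as a (low, high) pair of whole numbers are assumptions made for this example, not part of the original slides.

```python
def check_intervals(intervals):
    """Return a list of problems found in whole-number tick-box intervals.

    Each interval is a (low, high) pair with both ends included.
    """
    problems = []
    ordered = sorted(intervals)
    # Compare each box with the next one along.
    for (low1, high1), (low2, high2) in zip(ordered, ordered[1:]):
        if low2 <= high1:
            problems.append(f"{low1}-{high1} overlaps {low2}-{high2}")
        elif low2 > high1 + 1:
            problems.append(f"gap between {high1} and {low2}")
    return problems


# The boxes 0-2, 2-4, 4-6 share the values 2 and 4.
print(check_intervals([(0, 2), (2, 4), (4, 6)]))
# The boxes 0-2, 3-5, 6-8 neither overlap nor leave a gap.
print(check_intervals([(0, 2), (3, 5), (6, 8)]))
```

The check only covers closed boxes; an open box for larger answers still has to be added by hand so that every answer has somewhere to be ticked.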
Designing an observation sheet
An observation sheet can be used to record data that comes from counting, observing or measuring.
It can also be used to record responses to specific questions.
For example, to investigate a claim that the amount of TV watched has an impact on weight we can use the following:
age gender height (cm) weight (kg) hours of TV watched per week
Designing an observation sheet
For example, in our integrated unit we take the following data to test our physical condition:
BOUCHARD INDEX
                  Years    Weight    Height    B.I. = W/H
1st Evaluation
2nd Evaluation
RUFFIER DICKSON INDEX
                  P1    P2    P3    R.I.
1st Evaluation
2nd Evaluation
EUROFIT TEST
                  COURSE NAVETTE    LONG JUMP    ABDOMINALS    SPEED    FLEXIBILITY
1st Evaluation
2nd Evaluation
Contents
D1 Statistics
D1.1 Concepts
D1.2 Collecting data
D1.3 Organizing data
D1.4 Writing a statistical report
Using a tally chart
When collecting data that involves counting something we often use a tally chart.
For example, this tally chart can be used to record people’s favourite snacks.
favourite snack    tally    frequency
crisps                      13
fruit                       6
nuts                        3
sweets                      8
The tally marks are recorded as responses are collected, and the frequencies are then filled in.
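Counting responses into a tally chart can be sketched in Python with `collections.Counter`. The list of responses below is hypothetical (the slides give only the final frequencies), and one `|` per response is used in place of hand-drawn tally marks.

```python
from collections import Counter

# Hypothetical responses, in the order they were collected.
responses = ["crisps", "sweets", "crisps", "fruit", "nuts", "crisps", "sweets"]

# Counter adds one to a snack's count each time it appears.
tally = Counter(responses)

print(f"{'snack':<8}{'tally':<8}frequency")
for snack, frequency in tally.items():
    print(f"{snack:<8}{'|' * frequency:<8}{frequency}")
```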
Grouping discrete data
We take the temperatures during one month:
24 33 30 32 27 34
24 28 33 34 35 33
27 32 35 33 34 34
25 25 33 35 33 34
27 27 35 34 33 35
Number of data: 24
With these data we can make the frequency table:
Temperatures    tally    frequency
24                       2
25                       2
27                       4
33                       7
34                       6
35                       4
Can you find the mistake?
Grouping discrete data
To avoid this kind of mistake we add another row to calculate the total frequency, which must be the same as the number of data.
Temperatures       tally    frequency
24                          2
25                          2
27                          4
33                          7
34                          5
35                          4
Total frequency             24
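The total-frequency check can be sketched in Python. The function name `total_matches` is an assumption made for this example; the frequencies are the ones from the table with the mistake and from the corrected table.

```python
def total_matches(frequencies, number_of_data):
    """Check that the frequencies add up to the number of data values."""
    return sum(frequencies.values()) == number_of_data


# Frequencies from the table containing the mistake (34 counted 6 times).
with_mistake = {24: 2, 25: 2, 27: 4, 33: 7, 34: 6, 35: 4}
# Frequencies from the corrected table (34 counted 5 times).
corrected = {24: 2, 25: 2, 27: 4, 33: 7, 34: 5, 35: 4}

print(sum(with_mistake.values()), total_matches(with_mistake, 24))  # 25 False
print(sum(corrected.values()), total_matches(corrected, 24))        # 24 True
```

A matching total does not prove the table is right (two errors could cancel), but a total that does not match always means something was miscounted.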
Using a frequency table
Once data has been collected it is often organized into a frequency table.
For example, this frequency table shows the favourite take-away meals of a group of pupils:
Favourite take-away    Frequency
Pizza                  11
Fish and chips         7
Burgers                8
Indian                 5
Chinese                8
Grouping discrete data
A group of 20 people were asked how much change they were carrying in their wallets. These were their responses:
34p      £1.72    83p      £6.36
£4.07    £2.97    £3.53    6p
£9.54    34p      £1.68    50p
82p      £7.54    £1.09    £2.81
£2.43    46p      £1.70    £1.29
Each amount of money is different and the values cover a large range.
This type of data is usually grouped into equal class intervals.
Choosing appropriate class intervals
When choosing class intervals it is important that they include every value without overlapping and are of equal size.
For the following data:
34p      £1.72    83p      £6.36
£4.07    £2.97    £3.53    6p
£9.54    34p      £1.68    50p
82p      £7.54    £1.09    £2.81
£2.43    46p      £1.70    £1.29
We can use class sizes of £1:
£0.01 - £1.00, £1.01 - £2.00, £2.01 - £3.00, £3.01 - £4.00, £4.01 - £5.00, over £5.00.
The last class, over £5.00, is an open class interval.
Choosing appropriate class intervals
Complete the following frequency table for this data:
34p      £1.72    83p      £6.36
£4.07    £2.97    £3.53    6p
£9.54    34p      £1.68    50p
82p      £7.54    £1.09    £2.81
£2.43    46p      £1.70    £1.29
Amount of money (£)    Frequency
0.01 - 1.00            7
1.01 - 2.00            5
2.01 - 3.00            3
3.01 - 4.00            1
4.01 - 5.00            1
Over 5.00              3
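Sorting the twenty amounts into £1 classes can be sketched in Python. This is one possible approach, not the method prescribed by the slides: the amounts are written in pence (an assumption made here so that only whole numbers are compared), and the variable names are illustrative.

```python
# The twenty amounts of change, converted to pence.
amounts = [34, 172, 83, 636, 407, 297, 353, 6, 954, 34,
           168, 50, 82, 754, 109, 281, 243, 46, 170, 129]

labels = ["0.01 - 1.00", "1.01 - 2.00", "2.01 - 3.00",
          "3.01 - 4.00", "4.01 - 5.00", "Over 5.00"]
frequencies = [0] * len(labels)

for pence in amounts:
    # Each class is 100p wide; everything over 500p goes in the open class.
    index = min((pence - 1) // 100, len(labels) - 1)
    frequencies[index] += 1

for label, frequency in zip(labels, frequencies):
    print(f"{label:<14}{frequency}")

# The total frequency should equal the number of data values.
print("Total", sum(frequencies))
```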
Choosing appropriate class intervals
The size of the class intervals depends on the range of the data and the number of intervals required.
For the following data:
34p      £1.72    83p      £6.36
£4.07    £2.97    £3.53    6p
£9.54    34p      £1.68    50p
82p      £7.54    £1.09    £2.81
£2.43    46p      £1.70    £1.29
Explain why class sizes of £5 would be inappropriate.
Could we use a class size of 20p?
Grouping continuous data
Continuous data is usually grouped into equal class intervals.
What is wrong with the class intervals in this grouped frequency table showing lengths?
Length (cm)         Frequency
0 ≤ length ≤ 10
10 ≤ length ≤ 20
20 ≤ length ≤ 30
30 ≤ length
The class intervals overlap: a length of exactly 10 cm, 20 cm or 30 cm belongs to two classes.
The class intervals are written using the symbols ≤ and <.
Length (cm)         Frequency
0 ≤ length < 10
10 ≤ length < 20
20 ≤ length < 30
30 ≤ length
The last class, 30 ≤ length, is an open class interval.
Grouping continuous data
Continuous data is usually grouped into equal class intervals.
What is wrong with the class intervals in this grouped frequency table showing weights?
Weight (g)          Frequency
0 < weight < 10
10 < weight < 20
20 < weight < 30
30 < weight
There are gaps between the class intervals: a weight of exactly 10 g, 20 g or 30 g does not belong to any class.
Weight (g)          Frequency
0 ≤ weight < 10
10 ≤ weight < 20
20 ≤ weight < 30
30 ≤ weight
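Writing each class as low ≤ weight < high means that every value, including a value exactly on a boundary, falls into exactly one class. The Python sketch below illustrates this; the function name `weight_class` and the test weights are hypothetical, chosen only to sit on the class boundaries.

```python
def weight_class(weight):
    """Return the class for a weight in grams, using low <= weight < high."""
    if weight < 0:
        raise ValueError("weight cannot be negative")
    if weight >= 30:
        return "30 <= weight"  # open class interval
    low = int(weight // 10) * 10
    return f"{low} <= weight < {low + 10}"


# Hypothetical weights, including the boundary values 10, 20 and 30.
for w in [0, 9.9, 10, 20, 30]:
    print(w, "->", weight_class(w))
```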
Using two-way tables
A two-way table can be used to organize two sets of data.
For example, pupils from Years 7, 8 and 9 were asked what they usually did during their lunch break. This two-way table shows the results:
                      Year 7    Year 8    Year 9
Eat school dinners    35        29        38
Eat a packed lunch    42        34        32
Eat at home           19        22        18
Integrated unit
In your integrated unit you first need to collect the data on the observation sheet and then organize them using a frequency table.
RUFFIER DICKSON INDEX
Bouchard index Frequency
TOTAL:
Contents
D1 Statistics
D1.1 Concepts
D1.2 Collecting data
D1.3 Organizing data
D1.4 Processing data
Contents
Processing data
Finding the mode
Calculating the mean
Finding the median
Finding the range
Calculating statistics
Finding the mode
A dice was thrown ten times. These are the results:
What was the modal score?
3 is the modal score because it appears most often.
Finding the mode
The mode or modal value in a set of data is the data value that appears the most often.
For example, the number of goals scored by the local football team in the last ten games is:
2, 1, 2, 0, 0, 2, 3, 1, 2, 1.
The modal score is 2.
Is it possible to have more than one modal value?
Yes. If the scores were 2, 1, 2, 1, 0, 2, 3, 1, 2, 1, the modal scores would be 2 and 1.
Is it possible to have no modal value?
Yes
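The two sets of goal scores on this slide can be checked in Python. This sketch uses `statistics.multimode` from the standard library (available from Python 3.8), which returns every value that ties for the highest frequency; the third data set is hypothetical.

```python
from statistics import multimode

# Goals in the last ten games: 2 appears four times, more than any other score.
goals = [2, 1, 2, 0, 0, 2, 3, 1, 2, 1]
print(multimode(goals))       # [2]

# With one 0 changed to 1, the scores 2 and 1 both appear four times.
goals_tied = [2, 1, 2, 1, 0, 2, 3, 1, 2, 1]
print(multimode(goals_tied))  # [2, 1]

# Hypothetical data in which every value appears exactly once.
print(multimode([1, 2, 3]))   # [1, 2, 3]
```

When every value is returned, as in the last case, no value appears more often than the others, which is usually described as the data having no mode.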
Finding the mode from a frequency table
The mode is the only average that can be used for categorical or non-numerical data.
For example, 30 pupils are asked how they usually travel to school. The results are shown in a frequency table.
Method of travel    Frequency
Bicycle             6
On foot             8
Car                 2
Bus                 6
Train               3
What is the modal method of travel?
More children travel on foot than by any other method: its frequency, 8, is the highest.
Travelling on foot is therefore the modal method of travel.
Finding the mode from a frequency table
This frequency table shows the frequency of different length words in a given paragraph of text.
Word length    1    2    3    4    5    6    7    8    9    10
Frequency      3    16   12   16   7    3    11   6    2    1
What was the modal word length?
We need to look for the word lengths that occur most frequently.
For this data there are two modal word lengths: 2 and 4.
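Reading the mode from a frequency table means finding the highest frequency and then every value that has it. A minimal Python sketch, with the word-length table stored as a dictionary (the variable names are illustrative):

```python
# Word length -> frequency, as in the table.
word_lengths = {1: 3, 2: 16, 3: 12, 4: 16, 5: 7,
                6: 3, 7: 11, 8: 6, 9: 2, 10: 1}

# Step 1: find the highest frequency.
highest = max(word_lengths.values())
# Step 2: collect every word length with that frequency.
modes = [length for length, freq in word_lengths.items() if freq == highest]

print(highest, modes)  # 16 [2, 4]
```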
Finding the mode from a bar chart
This bar chart shows the scores in a science test:
[Bar chart: number of pupils (axis from 0 to 9) against marks out of ten (1 to 10)]
What was the modal score?
6 is the modal score because it has the highest bar.
Finding the mode from a pie chart
This pie chart shows the favourite food of a sample of people:
[Pie chart: sectors for chocolate, fruit, vegetables, sweets and other; sector values 78, 26, 18, 55 and 23]
What was the modal food type?
The biggest sector of the pie chart is for chocolate, so this is the modal food type.
Finding the modal class for continuous data
This grouped frequency table shows the times 50 girls and 50 boys took to complete one lap around a race track.
Frequency
Time (minutes:seconds) Boys Girls
2:00 ≤ t < 2:15 3 1
2:15 ≤ t < 2:30 7 6
2:30 ≤ t < 2:45 11 10
2:45 ≤ t < 3:00 13 9
3:15 ≤ t < 3:30 8 12
3:30 ≤ t < 3:45 7 10
3:45 ≤ t < 4:00 1 2
What is the modal class for the girls?
What is the modal class for the boys?
What is the modal class for the pupils regardless of whether they are a boy or a girl?
Calculating the mean
The mean is the most commonly used average.
To calculate the mean of a set of values we add together the values and divide by the total number of values.
Mean = sum of values ÷ number of values
For example, the mean of 3, 6, 7, 9 and 9 is
(3 + 6 + 7 + 9 + 9) ÷ 5 = 34 ÷ 5 = 6.8
Calculating the mean from a frequency table
The following frequency table shows the scores obtained when a dice is thrown 50 times.
What is the mean score?
Score                1    2    3    4    5    6    Total
Frequency            8    11   6    9    9    7    50
Score × Frequency    8    22   18   36   45   42   171
The mean score = 171 ÷ 50 = 3.42
Calculating the mean from a frequency table
To calculate the mean from a frequency table we multiply each score by its frequency, add together the results and divide by the total frequency.
Mean = sum of (score × frequency) ÷ sum of frequencies
In the previous example, the mean is
(8 + 22 + 18 + 36 + 45 + 42) ÷ (8 + 11 + 6 + 9 + 9 + 7) = 171 ÷ 50 = 3.42
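The same calculation can be sketched in Python, with the dice table stored as a dictionary from score to frequency (the variable names are illustrative):

```python
# Score -> frequency for the 50 throws of the dice.
scores = {1: 8, 2: 11, 3: 6, 4: 9, 5: 9, 6: 7}

# Sum of score x frequency (the bottom row of the table).
total_score = sum(score * freq for score, freq in scores.items())
# Sum of the frequencies (the number of throws).
total_frequency = sum(scores.values())

mean = total_score / total_frequency
print(total_score, total_frequency, mean)  # 171 50 3.42
```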
Problems involving the mean
A pupil scores 78%, 75% and 82% in three tests. What must she score in the fourth test to get an overall mean of 80%?
To get a mean of 80% the four marks must add up to
4 × 80% = 320%
The three marks that the pupil has so far add up to
78% + 75% + 82% = 235%
The mark needed in the fourth test is
320% – 235% = 85%
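The reasoning above (required total minus the marks so far) can be written as a short Python function. The name `score_needed` is an assumption made for this sketch.

```python
def score_needed(marks, target_mean):
    """Mark needed in one more test to reach the target mean."""
    # With one more test there will be len(marks) + 1 marks in total.
    required_total = target_mean * (len(marks) + 1)
    return required_total - sum(marks)


print(score_needed([78, 75, 82], 80))  # 85
```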
Finding the median
The median is the middle value of a set of numbers arranged in order. For example,
Find the median of
10, 7, 9, 12, 7, 8, 6.
Write the values in order:
6, 7, 7, 8, 9, 10, 12.
The median is the middle value, 8.
Finding the median
When there is an even number of values, there will be two values in the middle.
In this case, we have to find the mean of the two middle values.
For example,
Find the median of 56, 42, 47, 51, 65 and 43.
The values in order are:
42, 43, 47, 51, 56, 65.
There are two middle values, 47 and 51.
Finding the median
To find the number that is half-way between 47 and 51 we can add the two numbers together and divide by 2.
(47 + 51) ÷ 2 = 98 ÷ 2 = 49
Alternatively, find the difference between 47 and 51 and add half this difference to the lower number.
51 – 47 = 4
½ of 4 = 2
2 + 47 = 49
The median of 42, 43, 47, 51, 56 and 65 is 49.
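Both cases, an odd and an even number of values, can be handled by one short Python function. This is a sketch written for this example; Python's standard library also provides `statistics.median`, which does the same job.

```python
def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[middle]
    # Even number of values: take the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([10, 7, 9, 12, 7, 8, 6]))   # 8
print(median([56, 42, 47, 51, 65, 43]))  # 49.0
```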
Rogue values
The median is often used when there is a rogue value – that is, a value that is much smaller or larger than the rest.
What is the rogue value in the following data set: 192, 183, 201, 177, 193, 197, 4, 186, 179?
The rogue value is 4.
The mean of the data set is 168. This is not representative of the set because it is lower than almost all the data values.
To find the median, write the values in order:
4, 177, 179, 183, 186, 192, 193, 197, 201.
The median is the middle value, 186.
The median of the data set is not affected by the rogue value, 4.
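The effect of the rogue value can be seen by calculating the mean and the median with and without it. This Python sketch uses `mean` and `median` from the standard `statistics` module; removing the value 4 is an extra step added here for comparison and does not appear on the slide.

```python
from statistics import mean, median

data = [192, 183, 201, 177, 193, 197, 4, 186, 179]
print(mean(data), median(data))  # 168 186

# The same data with the rogue value left out.
without_rogue = [x for x in data if x != 4]
print(mean(without_rogue), median(without_rogue))  # 188.5 189.0
```

Leaving out the rogue value moves the mean by more than 20 but moves the median by only 3.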
Mean or median?
Would it be better to use the median or the mean to represent the following data sets?
34.2, 36.8, 29.7, 356, 42.5, 37.1? median
0.4, 0.5, 0.3, 0.8, 0.7, 1.0? mean
892, 954, 1026, 908, 871, 930? mean
3.12, 3.15, 3.23, 9.34, 3.16, 3.20? median
97.85, 95.43, 102.45, 98.02, 97.92, 99.38? mean
87634, 9321, 78265, 83493, 91574, 90046? median
Finding the range
The range of a set of data is a measure of how the data is spread across the distribution.
To find the range we subtract the lowest value in the set from the highest value.
Range = highest value – lowest value
What does it mean if the range is large?
When the range is large it tells us that the values vary widely in size.
What does it mean if the range is small?
When the range is small it tells us that the values are similar in size.
Contents
D2 Processing data
D2.1 Finding the mode
D2.2 Calculating the mean
D2.3 Finding the median
D2.4 Finding the range
D2.5 Calculating statistics
Remember the three averages and range
MODE: COMMON
MEAN: ADD, DIVIDE
MEDIAN: MIDDLE
RANGE: LARGEST, SMALLEST
The three averages and range
There are three different types of average:
MODE
most common
MEAN
sum of values ÷ number of values
MEDIAN
middle value
The range is not an average, but tells you how the data is spread out:
RANGE
largest value – smallest value
The three averages
Each type of average has its purpose and sometimes one is preferable to another.
The mode is easy to find and it eliminates some of the effects of extreme values. It is the only type of average that can be used for categorical (non-numerical) data.
The median is also fairly easy to find and has the advantage of being hardly affected by rogue values or skewed data.
The mean is the most difficult to calculate but takes into account all the values in the data set.
Calculating statistics
Look at the values on these five cards:
2 4 5 8 11
Choose three cards so that:
The mean is bigger than the median.
The median is bigger than the mean.
The mean and the median are the same.
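One way to explore this puzzle is to try every choice of three cards. The Python sketch below does this with `itertools.combinations`; the function name `classify` is an assumption made for this example. It compares the total with three times the median, which gives the same answer as comparing the mean with the median but avoids rounding.

```python
from itertools import combinations

cards = [2, 4, 5, 8, 11]


def classify(trio):
    """Say whether the mean or the median of three cards is bigger."""
    total = sum(trio)
    median = sorted(trio)[1]  # the middle of three ordered values
    # mean > median is the same as total > 3 x median.
    if total > 3 * median:
        return "mean > median"
    if total < 3 * median:
        return "median > mean"
    return "mean = median"


for trio in combinations(cards, 3):
    print(trio, classify(trio))
```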
Stem-and-leaf diagrams
Sometimes data is arranged in a stem-and-leaf diagram.
For example, this stem-and-leaf diagram shows the marks scored by 21 pupils in a maths test.
0
1
2
3
4
6
4
0
0
0
7
5
1
2
0
9
5
3
2
8
5
2
6
5
6
8
stem = tens, leaves = units
Find the median, mode and range for the data.
There are 21 data values so the median will be the 11th value, that is 25.
The mode is 32.
The range is 40 – 6, which is 34.
Stem-and-leaf diagrams