s1: chapter 4 representation of data
S1: Chapter 4 Representation of Data
Dr J Frost ([email protected]) www.drfrostmaths.com
Last modified: 20th September 2015
Overview
We’ll look at 3 different ways of presenting data, as well as ways of analysing them (including ‘skew’).
• BOX PLOTS. *NEW since GCSE!* Outliers.
• STEM AND LEAF. *NEW since GCSE!* Back to back stem and leaf diagrams.
• HISTOGRAMS. *NEW since GCSE!* Area is not necessarily equal to frequency.
Skew
Skew gives a measure of whether the values are more spread out above the median or below the median.
[Figure: two sketched distributions, frequency against height and frequency against weight, each with the mode, median and mean marked.]
We say this distribution has positive skew. (To remember, think that the ‘tail’ points in the positive direction.)
We say this distribution has negative skew.
Skew
Distribution | Skew
Salaries in the UK | High salaries drag mean up. So positive skew. Mean > Median
IQ | A symmetrical distribution, i.e. no skew. Mean = Median
Heights of people in the UK | Will probably be a nice ‘bell curve’, i.e. no skew. Mean = Median
Age of retirement | Likely to be people who retire significantly before the median age, but not many who retire significantly after. So negative skew. Mean < Median
Remember, think what direction the ‘tail’ is likely to point.
Skew based on mean/median
Suppose for some data we had calculated the mean and the median of the students’ marks, and the mean was less than the median.
Describe the skewness of the marks of the students, giving a reason for your answer. (2)
Negative skew (1st mark) because mean < median (2nd mark).
Bro Tip: If you ever forget which way the two go, just think of salaries! High values (i.e. a positive tail) drag up the mean but not the median. So it’s the position of the mean that determines skew.
Skew based on quartiles
Positive skew: $Q_3 - Q_2 > Q_2 - Q_1$ (The data is spread out more in the positive direction, so we have positive skew.)
Negative skew: $Q_2 - Q_1 > Q_3 - Q_2$
No skew: $Q_2 - Q_1 = Q_3 - Q_2$
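The quartile comparison can be sketched as a small function. This is only an illustration: the function name is made up here, and the sample quartiles (15, 17, 22) are the ones used in the box plot recap later in these notes.

```python
def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing the spread either side of the median."""
    upper_gap = q3 - q2  # spread between the median and the upper quartile
    lower_gap = q2 - q1  # spread between the lower quartile and the median
    if upper_gap > lower_gap:
        return "positive skew"
    if lower_gap > upper_gap:
        return "negative skew"
    return "no skew"


# Q1 = 15, Q2 = 17, Q3 = 22: the upper gap (5) exceeds the lower gap (2).
print(skew_from_quartiles(15, 17, 22))
```

In an exam answer the inequality itself is the justification, so it is worth writing out both differences before stating the skew.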
Example Exam Question
$Q_3 - Q_2 > Q_2 - Q_1$ (1st mark)
Therefore positive skew. (2nd mark)
Test Your Understanding
Available Data | Comment on skew (2 marks)
• Positive skew as
• Negative skew as
• Little/no skew as median and mean are roughly equal.
Calculating Skew
One measure of skew can be calculated using the following formula: (Important Note: this will be given to you in the exam if required)
$$\text{skew} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
When mean > median, mean < median, and mean = median, we can see this gives us a positive value, negative value, and 0 respectively, as expected.
Find the skew of the following teachers’ annual salaries:
£3, £3.50, £4, £7, £100
Mean = £23.50, Median = £4, Standard Deviation = £38.28
Skew = 1.53
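As a check on the salaries example, here is a minimal Python sketch of the formula. The function name is made up for illustration, and it assumes the standard deviation is the population form (dividing by n), which is what reproduces the £38.28 quoted.

```python
from statistics import mean, median, pstdev


def skew_coefficient(values):
    """Return 3 * (mean - median) / standard deviation."""
    # pstdev is the population standard deviation (divides by n, not n - 1).
    return 3 * (mean(values) - median(values)) / pstdev(values)


salaries = [3, 3.50, 4, 7, 100]
print(round(mean(salaries), 2))              # 23.5
print(median(salaries))                      # 4
print(round(pstdev(salaries), 2))            # 38.28
print(round(skew_coefficient(salaries), 2))  # 1.53
```

The single £100 value drags the mean well above the median, which is why the coefficient comes out strongly positive.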
Exercise 1
1. Using the available data in each case, state the skew (1 mark) and give a justification (1 mark).
a. Positive skew as
b. Mean , Median: Negative skew as
c. No skew as
d. Mean , Median: Positive skew as
e. Negative skew as
2. In each case state whether the mean or median would be a more appropriate average (1 mark), and give a reason (1 mark).
f. Median as the data is (positively) skewed.
g. Median , Mean: Median as the data is (negatively) skewed.
Stem and Leaf recap
Put the following measurements into a stem and leaf diagram:
4.7 3.6 3.8 4.7 4.1 2.2 3.6 4.0 4.4 5.0 3.7 4.6 4.8 3.7 3.2
2.5 3.6 4.5 4.7 5.2 4.7 4.2 3.8 5.1 1.4 2.1 3.5 4.2 2.4 5.1
1 | 4 (1)
2 | 1 2 4 5 (4)
3 | 2 5 6 6 6 7 7 8 8 (9)
4 | 0 1 2 2 4 5 6 7 7 7 7 8 (12)
5 | 0 1 1 2 (4)
Key: 2 | 1 means 2.1
Now find:
Mode = 4.7
Lower Quartile = 3.6
Median = 4.05
Upper Quartile = 4.7
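The diagram and the quartiles above can be reproduced with a short Python sketch. The helper names are made up for illustration, and the quartile rule is an assumption (the usual S1 convention: if n × fraction is whole, average that value and the next; otherwise round up).

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group one-decimal-place values into {stem: sorted leaves}."""
    rows = defaultdict(list)
    for v in values:
        tenths = round(v * 10)  # work in whole tenths to avoid float error
        rows[tenths // 10].append(tenths % 10)
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}


def s1_quartile(sorted_values, fraction):
    """Assumed S1 rule: whole position -> average with next; else round up."""
    position = len(sorted_values) * fraction
    if position == int(position):
        k = int(position)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[int(position)]  # rounded-up position as a 0-based index


measurements = [
    4.7, 3.6, 3.8, 4.7, 4.1, 2.2, 3.6, 4.0, 4.4, 5.0, 3.7, 4.6, 4.8, 3.7, 3.2,
    2.5, 3.6, 4.5, 4.7, 5.2, 4.7, 4.2, 3.8, 5.1, 1.4, 2.1, 3.5, 4.2, 2.4, 5.1,
]
diagram = stem_and_leaf(measurements)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves), f"({len(leaves)})")

ordered = sorted(measurements)
print(s1_quartile(ordered, 0.25))           # lower quartile: the 8th value
print(round(s1_quartile(ordered, 0.5), 2))  # median: mean of 15th and 16th
print(s1_quartile(ordered, 0.75))           # upper quartile: the 23rd value
```

With 30 values the positions are 7.5, 15 and 22.5, so the lower and upper quartiles are single values while the median is an average of two.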
Back-to-Back Stem and Leaf recap
The data below shows the pulse rate of boys and girls in a school.
Girls: 55 80 84 91 80 92 98 40 60 64 93 72 96 85 88 90 76 54 58 92 91 80 79
Boys: 80 60 91 65 67 81 75 46 72 71 74 57 64 60 50 68
Girls | Stem | Boys
0 | 4 | 6
8 5 4 | 5 | 0 7 9
4 0 | 6 | 0 0 4 5 7 8
9 8 6 2 | 7 | 1 2 4 5
8 5 4 0 0 0 | 8 | 0
8 6 2 2 1 0 | 9 | 1
Key: 0|4|6 means 40 for girls and 46 for boys.
Comment on the results.
The back-to-back stem and leaf diagram shows that boys’ pulse rates tend to be lower than girls’.
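Box Plot recap
Box Plots allow us to visually represent the distribution of the data.
Minimum | Lower Quartile | Median | Upper Quartile | Maximum
3 | 15 | 17 | 22 | 27
[Figure: box plot of the above data on a scale from 0 to 30.]
How is the IQR represented in this diagram? By the width of the box (upper quartile minus lower quartile, here 22 - 15 = 7).
How is the range represented in this diagram? By the distance between the ends of the two whiskers (maximum minus minimum, here 27 - 3 = 24).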
Outliers
An outlier is an extreme value.
More specifically, it’s generally a value more than 1.5 IQRs beyond the lower or upper quartile. (But you will be told in the exam if the rule differs from this.)
[Figure: box plot on a scale from 0 to 30 with the outlier boundaries marked; outliers lie beyond these points.]
Examples
Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
0, 3 | 21, 27 | 8 | 10 | 14
Draw a box plot to represent the above data.
Bro Exam Tip: You MUST show your outlier boundary calculations.
Outlier boundaries: 8 - (1.5 × 6) = -1 and 14 + (1.5 × 6) = 23, so 27 is an outlier.
When there’s an outlier at one end, there are two allowable places to put the end of the whisker:
• The maximum value that is not an outlier, 21 (I think this one makes most sense).
• OR the outlier boundary, 23.
Use one or the other (not both).
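The boundary calculation for this example can be sketched in Python as below. The function name is made up for illustration, and the 1.5 multiplier is the default rule stated earlier (an exam question may specify a different one).

```python
def outlier_boundaries(q1, q3, k=1.5):
    """Return (lower boundary, upper boundary) using k interquartile ranges."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = outlier_boundaries(8, 14)
print(low, high)  # -1.0 23.0

# Only the two smallest and two largest values are known from the question.
known_values = [0, 3, 21, 27]
outliers = [v for v in known_values if v < low or v > high]
print(outliers)  # [27]

# First allowable whisker end: the largest value that is not an outlier.
whisker_end = max(v for v in known_values if v <= high)
print(whisker_end)  # 21
```

The other allowable whisker end is simply the upper boundary itself, 23.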
Test Your Understanding (on your printed sheet)
Comparing Box Plots
[Figure: box plots comparing house prices of Croydon and Kingston-upon-Thames, on a scale from £100k to £450k.]
“Compare the prices of houses in Croydon with those in Kingston”. (2 marks)
For 1 mark, one of:
• The interquartile range of house prices in Kingston is greater than in Croydon.
• The range of house prices in Kingston is greater than in Croydon.
i.e. something spread related.
For 1 mark:
• The median house price in Kingston was greater than that in Croydon.
i.e. compare some measure of location (could be minimum, lower quartile, etc.)
Test Your Understanding (on your printed sheet)
Jan 2005 Q2
Exercise 2 (on your printed sheet)
Bar Charts vs Histograms
[Figure: a bar chart of frequency against shoe size (6 to 9), and a histogram of frequency density against height (1.0m to 1.8m).]
Bar Charts
• For discrete data.
• Frequency given by height of bars.
Histograms
• For continuous data. (Use this as a reason whenever you’re asked to justify use of a histogram.)
• Data divided into (potentially uneven) intervals.
• [GCSE definition] Frequency given by area of bars.*
• No gaps between bars.
* Not necessarily true. We’ll correct this in a sec.
Bar Charts vs Histograms
Still using the ‘incorrect’ GCSE formula:
$$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$$
Q1
Weight (w kg) | Frequency | Frequency Density
0 < w ≤ 10 | 40 | 4
10 < w ≤ 15 | 6 | 1.2
15 < w ≤ 35 | 52 | 2.6
35 < w ≤ 45 | 10 | 1
Q2
[Figure: histogram of frequency density (1 to 5) against height (m), from 10 to 50.]
Frequency = 15
Frequency = 30
Frequency = 40
Frequency = 25
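The Q1 table can be checked with a few lines of Python. This is just a sketch of frequency density = frequency ÷ class width; the function and variable names are made up for illustration.

```python
def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)


# (lower bound, upper bound, frequency) for each weight class in Q1
weight_classes = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]
densities = [frequency_density(lo, hi, f) for lo, hi, f in weight_classes]
print(densities)

# Going the other way (as in Q2): frequency = frequency density * class width
frequencies = [d * (hi - lo) for d, (lo, hi, _) in zip(densities, weight_classes)]
print(frequencies)
```

Reading a frequency off a histogram is the same calculation in reverse: multiply the bar’s height by its width.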
SKILL #1 :: Area = frequency?
Unlike at GCSE, the area of a bar is not necessarily equal to the frequency; they are just proportional.
There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s.
[Figure: histogram of frequency density (0 to 5) against time (s), with class boundaries at 9, 12 and 18.]
Total frequency is known; therefore find total area and hence the ‘scaling’.
Total area = 15 + 9 = 24
Area 24 → Freq 60 (so each unit of area is worth 2.5 runners)
Then use this scaling along with the desired area.
Area = 4 × 1.5 = 6
Area 6 → Freq 15
! Identify the scaling using a known area with known frequency (which may be total area/frequency or just one bar).
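The runners example can be sketched in Python as follows. The bar heights (5 and 1.5) are not stated in the text; they are inferred here from the quoted areas 15 and 9 and the class boundaries 9, 12 and 18, so treat them as an assumption. The function name is made up for illustration.

```python
def frequency_from_area(target_area, known_area, known_frequency):
    """Scale an area to a frequency using one known (area, frequency) pair."""
    return target_area * known_frequency / known_area


# (lower bound, upper bound, bar height); heights inferred from the areas 15 and 9
bars = [(9, 12, 5), (12, 18, 1.5)]
total_area = sum((hi - lo) * height for lo, hi, height in bars)
print(total_area)  # 24.0

# Times above 14s cover the part of the second bar from 14 to 18.
area_above_14 = (18 - 14) * 1.5
runners_above_14 = frequency_from_area(area_above_14, total_area, 60)
print(runners_above_14)  # 15.0
```

The same function works when the known pair is a single bar rather than the whole histogram.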
Test Your Understanding (on your printed sheet)
May 2012 Q5
A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.
[Figure 2: histogram of the car speeds.]
(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks)
M1 A1: Determine what one small square or one large square is worth (i.e. work out scaling).
M1 A1: Use this to find number of cars travelling >35mph.
Bro Tip: We can make the frequency density scale what we like.
(b) Estimate the value of the mean speed of the cars in the sample. (3 marks)
M1 M1: Use histogram to construct sum of speeds.
$$\frac{30 \times 12.5 + 240 \times 25 + \dots}{450}$$
A1: Correct value
$= 28.8$
Bro Tip: Whenever you are asked to calculate mean, median or quartiles from a histogram, form a grouped frequency table. Use your scaling factor to work out the frequency of each bar.
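Estimating a mean from a grouped frequency table can be sketched as below. The exam question’s full table is not reproduced in these notes, so this example reuses the weight table from the earlier frequency density question purely as an illustration; the function name is also made up.

```python
def grouped_mean(table):
    """Estimate the mean from (lower, upper, frequency) rows using midpoints."""
    total_frequency = sum(f for _, _, f in table)
    weighted_sum = sum((lo + hi) / 2 * f for lo, hi, f in table)
    return weighted_sum / total_frequency


# Illustrative data: the weight classes 0-10, 10-15, 15-35, 35-45
weight_table = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]
estimate = grouped_mean(weight_table)
print(round(estimate, 2))  # 18.29
```

Each class is represented by its midpoint, which is why the result is only an estimate of the true mean.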
(c) Estimate, to 1 decimal place, the value of the median speed of the cars in the sample. (2)
(d) Comment on the shape of the distribution. Give a reason for your answer. (2)
(e) State, with a reason, whether the estimate of the mean or the median is a better representation of the average speed of the traffic on the road. (2)
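A median estimate from grouped data is usually found by linear interpolation within the class containing the (n/2)th value. The sketch below assumes that method and, since the exam question’s table is not reproduced in these notes, uses the earlier weight table as illustrative data; the function name is made up.

```python
def grouped_median(table):
    """Estimate the median from (lower, upper, frequency) rows by interpolation."""
    n = sum(f for _, _, f in table)
    target = n / 2  # position of the median within the cumulative frequencies
    cumulative = 0
    for lo, hi, f in table:
        if f > 0 and cumulative + f >= target:
            # Move through this class in proportion to how far the target
            # lies beyond the cumulative frequency reached so far.
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("table has no data")


# Illustrative data: the weight classes 0-10, 10-15, 15-35, 35-45
weight_table = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]
median_estimate = grouped_median(weight_table)
print(round(median_estimate, 1))  # 18.1
```

Here n = 108, so the target is the 54th value; 46 values lie below 15, so the estimate is 8/52 of the way through the 15 to 35 class.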
SKILL #2 :: Gaps!
Weight (to nearest kg) | Frequency | F.D.
1-2 | |
3-6 | |
7-9 | |
[Figure: histogram of frequency density (0 to 2) on an axis marked from 1 to 10.]
Note that the gaps affect the class width! Since weights are given to the nearest kg, the class 3-6 really covers 2.5 to 6.5, so its width is 4, not 3.
Remember the frequency density axis is only correct to scale, so there may be some scaling. However in an exam scaling is unlikely to be required for F.D. if the F.D. scale is already given.
We set the scaling between area and frequency to be 1.
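The effect of the gaps on class width can be sketched in Python. The function name and the `precision` parameter are made up for illustration; the classes are the ones in the table above, recorded to the nearest 1 kg.

```python
def class_boundaries(lower, upper, precision=1):
    """True class boundaries for values rounded to the nearest `precision`."""
    half = precision / 2
    return lower - half, upper + half


stated_classes = [(1, 2), (3, 6), (7, 9)]
boundaries = [class_boundaries(lo, hi) for lo, hi in stated_classes]
widths = [upper - lower for lower, upper in boundaries]
print(boundaries)  # [(0.5, 2.5), (2.5, 6.5), (6.5, 9.5)]
print(widths)      # [2.0, 4.0, 3.0]
```

Once the boundaries are adjusted the bars touch, and frequency density is frequency divided by these true widths.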
Test Your Understanding (on your printed sheet)
Jan 2012 Q1
Bro Tip: Be careful that you use the correct class widths!
14
5
21 + 45 + 3 = 69
SKILL #3 :: Width and height on diagram
An exam favourite is to ask what width and height we’d draw a bar in a drawn histogram.
Q: The frequency table shows some running times. On a histogram the bar for 0-4 seconds is drawn with width 6cm and height 8cm. Find the width and height of the bar for 4-6 seconds.
[Table: Time (seconds) against Frequency.]
Strategy
! Bro Tip: Find the scaling for class width to drawn width and frequency density to drawn height.
Solution
For 0-4 bar: class width 4, frequency density 2. Scaling for width: 6 ÷ 4 = 1.5. Scaling for height: 8 ÷ 2 = 4.
4-6 bar: class width 2, frequency density 4.5, so drawn width = 2 × 1.5 = 3cm and drawn height = 4.5 × 4 = 18cm.
Test Your Understanding (on your printed sheet)
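The two scalings can be sketched in Python as below. The function name is made up, and the frequency density of 2 for the 0-4 bar is not quoted directly in the text; it is inferred from the stated height scaling of 4 and the drawn height of 8cm.

```python
def drawn_bar(ref_width_cm, ref_height_cm, ref_class_width, ref_density,
              class_width, density):
    """Return (drawn width, drawn height) in cm for another bar on the same diagram."""
    width_scale = ref_width_cm / ref_class_width  # cm per unit of class width
    height_scale = ref_height_cm / ref_density    # cm per unit of frequency density
    return class_width * width_scale, density * height_scale


# Reference bar 0-4: drawn 6cm wide and 8cm high, class width 4, density 2 (inferred).
# Target bar 4-6: class width 2, frequency density 4.5.
width_cm, height_cm = drawn_bar(6, 8, 4, 2, 2, 4.5)
print(width_cm, height_cm)  # 3.0 18.0
```

Keeping the width and height scalings separate avoids the common slip of scaling the frequency rather than the frequency density.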
Exercise 3 (on your printed sheet)
Q1
Answer: Distance is continuous.
Note the gaps in the class intervals!
4 / 5 = 0.8, 19 / 5 = 3.8, 53 / 10 = 5.3, ...
Q2
Q3
Q4 [June 2007 Q5]
Q5
Q6
Q7