S1: Chapter 4 Representation of Data Dr J Frost (jfrost@tiffin.kingston.sch.uk) Last modified: 25th September 2014

Upload: paulina-chisnell

Post on 14-Dec-2015


Page 1: S1: Chapter 4 Representation of Data

Dr J Frost (jfrost@tiffin.kingston.sch.uk)

Last modified: 25th September 2014

Page 2: Stem and Leaf recap

Put the following measurements into a stem and leaf diagram:

4.7 3.6 3.8 4.7 4.1 2.2 3.6 4.0 4.4 5.0
3.7 4.6 4.8 3.7 3.2 2.5 3.6 4.5 4.7 5.2
4.7 4.2 3.8 5.1 1.4 2.1 3.5 4.2 2.4 5.1

1 | 4                          (1)
2 | 1 2 4 5                    (4)
3 | 2 5 6 6 6 7 7 8 8          (9)
4 | 0 1 2 2 4 5 6 7 7 7 7 8    (12)
5 | 0 1 1 2                    (4)

Key: 2 | 1 means 2.1

Now find:

Mode = 4.7
Lower Quartile = 3.6
Median = 4.05
Upper Quartile = 4.7
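
The construction of the diagram can be sketched in Python. This is only an illustration: the function name `stem_and_leaf` and the choice to treat the whole-number part as the stem are assumptions for this example, not something the slides prescribe.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group one-decimal-place values into {stem: sorted list of leaves}."""
    rows = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 4.7 -> 47, sidesteps float truncation
        rows[tenths // 10].append(tenths % 10)
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}

data = [4.7, 3.6, 3.8, 4.7, 4.1, 2.2, 3.6, 4.0, 4.4, 5.0,
        3.7, 4.6, 4.8, 3.7, 3.2, 2.5, 3.6, 4.5, 4.7, 5.2,
        4.7, 4.2, 3.8, 5.1, 1.4, 2.1, 3.5, 4.2, 2.4, 5.1]

diagram = stem_and_leaf(data)
for stem, leaves in diagram.items():
    # Each row: stem, its leaves in ascending order, and the row count in brackets
    print(f"{stem} | {' '.join(map(str, leaves))}  ({len(leaves)})")
```

Sorting the leaves within each row is what makes it possible to read the median and quartiles straight off the diagram by counting along the rows.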

Page 3: Back-to-Back Stem and Leaf recap

The data below shows the pulse rate of boys and girls in a school.

Girls: 55 80 84 91 80 92 98 40 60 64 93 72 96 85 88 90 76 54 58 92 91 80 79

Boys: 80 60 91 65 67 81 75 46 72 71 74 57 64 60 50 68

      Girls |   | Boys
          0 | 4 | 6
      8 5 4 | 5 | 0 7 9
      6 4 0 | 6 | 0 0 4 5 7 8
    9 8 6 2 | 7 | 1 2 4 5
8 5 4 0 0 0 | 8 | 0
8 6 2 2 1 0 | 9 | 1

Key: 0 | 4 | 6 means 40 for girls and 46 for boys.

Comment on the results.

The back-to-back stem and leaf diagram shows that boys' pulse rates tend to be lower than girls'.
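
One hedged way to back up a comment such as "boys' pulse rates tend to be lower" is to compare a measure of location for the two groups. The sketch below uses the raw values as listed on the slide (girls first, then boys) and Python's standard `statistics.median`; the choice of the median as the comparison is an assumption made for this example.

```python
from statistics import median

# Raw pulse rates as listed on the slide
girls = [55, 80, 84, 91, 80, 92, 98, 40, 60, 64, 93, 72,
         96, 85, 88, 90, 76, 54, 58, 92, 91, 80, 79]
boys = [80, 60, 91, 65, 67, 81, 75, 46, 72, 71, 74, 57, 64, 60, 50, 68]

girls_median = median(girls)   # middle value of 23 sorted values
boys_median = median(boys)     # mean of the two middle values of 16

print("Girls' median:", girls_median)
print("Boys' median:", boys_median)
print("Boys lower on average:", boys_median < girls_median)
```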

Page 4: Box Plot recap

Box plots allow us to visually represent the distribution of the data.

Minimum | Lower Quartile | Median | Upper Quartile | Maximum
3       | 15             | 17     | 22             | 27

[Figure: box plot of these values on an axis from 0 to 30]

How is the IQR represented in this diagram? By the width of the box (lower quartile to upper quartile).

How is the range represented in this diagram? By the distance between the ends of the whiskers (minimum to maximum).

Page 5: Box Plots recap

Sketch a box plot to represent the given weights of cats:

5lb, 6lb, 7.5lb, 8lb, 8lb, 9lb, 12lb, 14lb, 20lb

Minimum | Maximum | Median | Lower Quartile | Upper Quartile
5       | 20      | 8      | 7.5            | 12

[Figure: box plot of these values on an axis from 0 to 24]
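
The five values needed for the box plot can be computed with a short sketch. The slides do not state which quartile rule they use, so the position rule below (take n × fraction; if it is whole, average that value with the next; otherwise round up) is an assumption, chosen because it reproduces the answers shown for the cat weights. The helper name `quartile` is hypothetical.

```python
import math

def quartile(sorted_values, fraction):
    """Quartile by an assumed position rule (1-based positions):
    whole-number position -> average that value and the next one,
    otherwise round the position up and take that value."""
    n = len(sorted_values)
    position = n * fraction
    if position == int(position):
        k = int(position)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[math.ceil(position) - 1]

weights = sorted([5, 6, 7.5, 8, 8, 9, 12, 14, 20])

summary = {
    "minimum": weights[0],
    "lower quartile": quartile(weights, 0.25),  # position 2.25 -> 3rd value
    "median": quartile(weights, 0.5),           # position 4.5  -> 5th value
    "upper quartile": quartile(weights, 0.75),  # position 6.75 -> 7th value
    "maximum": weights[-1],
}
print(summary)
```

Other quartile conventions exist (spreadsheets and statistics libraries often interpolate), so results from other tools may differ slightly from the values on the slide.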

Page 6: Outliers

An outlier is an extreme value.

More specifically, it is generally a value more than 1.5 IQRs below the lower quartile or more than 1.5 IQRs above the upper quartile. (But you will be told in the exam if the rule differs from this.)

[Figure: box plot on an axis from 0 to 30, with the regions past these two boundaries labelled "Outliers beyond this point"]

Page 7: Outliers

We can display outliers as crosses on a box plot. But if we have one, how do we display the marks for the minimum/maximum?

[Figure: two box plots on an axis from 0 to 30, the second with outliers marked as crosses]

Maximum point is not an outlier, so remains unchanged.

But we have points that are outliers at the lower end. The lower mark becomes the 'outlier boundary', rather than the minimum.

Page 8: Examples

Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
0, 3            | 21, 27         | 8              | 10     | 14

[Answer figure: box plot on an axis from 0 to 30]

Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
3, 7            | 20, 25, 26     | 12             | 13     | 16

[Answer figure: box plot on an axis from 0 to 30]
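
A minimal sketch of the 1.5 × IQR rule applied to the two examples above. The function names and the `multiplier` parameter are assumptions for illustration; the exam may specify a different rule, in which case only the multiplier (or the boundary formula) would change.

```python
def outlier_boundaries(q1, q3, multiplier=1.5):
    """Return (lower boundary, upper boundary) for the outlier rule."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def find_outliers(values, q1, q3, multiplier=1.5):
    """Values lying strictly beyond either boundary."""
    low, high = outlier_boundaries(q1, q3, multiplier)
    return [v for v in values if v < low or v > high]

# Example 1: quartiles 8 and 14, so IQR = 6
bounds_1 = outlier_boundaries(8, 14)                # (-1.0, 23.0)
outliers_1 = find_outliers([0, 3, 21, 27], 8, 14)   # [27]

# Example 2: quartiles 12 and 16, so IQR = 4
bounds_2 = outlier_boundaries(12, 16)                    # (6.0, 22.0)
outliers_2 = find_outliers([3, 7, 20, 25, 26], 12, 16)   # [3, 25, 26]

print(bounds_1, outliers_1)
print(bounds_2, outliers_2)
```

In the first example only 27 is drawn as a cross; in the second, 3, 25 and 26 are all outliers, so neither whisker ends at the true minimum or maximum.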

Page 9: Exercises

Page 58, Exercise 4B: Q2

Page 59, Exercise 4C: Q1, 2

Page 10: Comparing Box Plots

[Figure: box plots comparing house prices in Croydon and Kingston-upon-Thames, on an axis from £100k to £450k]

"Compare the prices of houses in Croydon with those in Kingston". (2 marks)

For 1 mark, one of:
• The interquartile range of house prices in Kingston is greater than in Croydon.
• The range of house prices in Kingston is greater than in Croydon.
i.e. something spread related.

For 1 mark:
• The median house price in Kingston was greater than that in Croydon.
i.e. compare some measure of location (could be minimum, lower quartile, etc.)

Page 11: Bar Charts vs Histograms

[Figure: bar chart of frequency against shoe size (6 to 9)]

[Figure: histogram of frequency density against height (1.0m to 1.8m)]

Bar Charts
• For discrete data.
• Frequency given by height of bars.

Histograms
• For continuous data. (Use this as a reason whenever you're asked to justify use of a histogram.)
• Data divided into (potentially uneven) intervals.
• [GCSE definition] Frequency given by area of bars.*
• No gaps between bars.

* Not actually true. We'll correct this in a sec.

Page 12: Bar Charts vs Histograms

Still using the 'incorrect' GCSE formula:

$\text{Frequency Density} = \dfrac{\text{Frequency}}{\text{Class Width}}$

Weight (w kg) | Frequency | Frequency Density
0 < w ≤ 10    | 40        | 4
10 < w ≤ 15   | 6         | 1.2
15 < w ≤ 35   | 52        | 2.6
35 < w ≤ 45   | 10        | 1

[Figure: histogram of frequency density (1 to 5) against height (m) (10 to 50); its four bars represent frequencies of 15, 30, 40 and 25]
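
The frequency density column of the weight table can be reproduced with a short sketch that uses the GCSE formula (frequency divided by class width). The tuple layout `(lower bound, upper bound, frequency)` is an assumption made for this example.

```python
# (lower bound, upper bound, frequency) for each weight class in kg
classes = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]

def frequency_density(lower, upper, frequency):
    """GCSE formula: frequency density = frequency / class width."""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in classes]

for (lower, upper, freq), fd in zip(classes, densities):
    print(f"{lower} < w <= {upper}: frequency {freq}, frequency density {fd}")
```

Note how the widest class (15 to 35) has the largest frequency but not the tallest bar, which is exactly why a histogram uses area rather than height.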

Page 13: Area = frequency?

The area of each bar in fact isn't necessarily equal to the frequency. Actually:

$\text{Area} \propto \text{Frequency}$

i.e.

$\text{Frequency} = k \times \text{Area}$

Similarly:

$\text{Frequency Density} \propto \dfrac{\text{Frequency}}{\text{Class Width}}$

However, we often let $k = 1$, so that the $\propto$ becomes an $=$, as we were allowed to assume at GCSE.

Page 14: The key to almost every histogram question…

…This diagram!

Area --(× k)--> Frequency

For a given histogram, there's some scaling to get from an area (whether the total area or the area of a particular bar) to the corresponding frequency. Once you've worked out this scaling, any subsequent areas you calculate can be converted to frequencies.

Page 15: Area = frequency?

There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s.

[Figure: histogram of frequency density (0 to 5) against time (s), with bars from 9 to 12 and from 12 to 18]

We first find what area represents the total frequency.

Total area = 15 + 9 = 24

Area 24 → Frequency 60

Then use this scaling along with the desired area.

Area = 4 × 1.5 = 6

Area 6 → Frequency 15
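
The two-step method (find the scaling from the total, then apply it to the wanted area) can be sketched as follows. The bar heights 5 and 1.5 are inferred from the areas quoted on the slide (15 over a width of 3, and 9 over a width of 6); the helper `area_between` is a hypothetical name.

```python
total_frequency = 60

# (start, end, height) for each bar; heights inferred from the quoted areas
bars = [(9, 12, 5), (12, 18, 1.5)]

def area_between(bars, low, high):
    """Total histogram area lying between low and high on the horizontal axis."""
    total = 0.0
    for start, end, height in bars:
        overlap = max(0.0, min(end, high) - max(start, low))
        total += overlap * height
    return total

total_area = area_between(bars, 9, 18)      # 15 + 9
k = total_frequency / total_area            # frequency represented by one unit of area
area_above_14 = area_between(bars, 14, 18)  # 4 x 1.5
runners_above_14 = k * area_above_14

print(total_area, k, area_above_14, runners_above_14)
```

Once `k` is known, any other region of the histogram can be converted to a frequency the same way.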

Page 16: Frequency Density = Frequency / Class width?

Weight (to nearest kg) | Frequency
1-2                    |
3-6                    |
7-9                    |

[Figure: histogram of frequency density (0 to 5) on an axis from 1 to 10]

Note the gaps! We can use the complete set of information in the first row combined with the bar to again work out the correct 'scaling'.
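
When values are recorded to the nearest kg, a class written as 3-6 really covers 2.5 to 6.5, so the bar is wider than the labels suggest. The sketch below shows that adjustment; the function name `class_boundaries` and its `precision` parameter are assumptions for illustration.

```python
def class_boundaries(lower_label, upper_label, precision=1):
    """Convert class labels for rounded data into true class boundaries.
    precision is the rounding unit (1 for 'to the nearest kg')."""
    half = precision / 2
    return lower_label - half, upper_label + half

labelled_classes = [(1, 2), (3, 6), (7, 9)]
widths = []
for low, high in labelled_classes:
    lower_bound, upper_bound = class_boundaries(low, high)
    widths.append(upper_bound - lower_bound)
    print(f"{low}-{high}: bar from {lower_bound} to {upper_bound}, "
          f"width {upper_bound - lower_bound}")
```

Using the label difference (for example 6 - 3 = 3) instead of the boundary width (4) is the mistake the "Note the gaps!" warning is guarding against.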

Page 17: May 2012

A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.

[Figure 2: histogram of the speeds, with the frequency density axis marked from 1 to 7]

(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks)

M1 A1: Determine what one small square or one large square is worth (i.e. work out scaling).

M1 A1: Use this to find number of cars travelling >35mph.

We can make the frequency density scale what we like.

Page 18: May 2012

A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.

(b) Estimate the value of the mean speed of the cars in the sample. (3 marks)

M1 M1: Use histogram to construct sum of speeds.

$\dfrac{30 \times 12.5 + 240 \times 25 + \dots}{450}$

A1: Correct value

$= 28.8$

Bro Tip: Whenever you are asked to calculate mean, median or quartiles from a histogram, form a grouped frequency table. Use your scaling factor to work out the frequency of each bar.

Page 19: May 2012

Speed | Midpoint | Frequency
10-15 | 12.5     | 30
20-30 | 25       | 240
30-35 | 32.5     | 90
35-40 | 37.5     | 30
40-45 | 42.5     | 60
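
The estimated mean from part (b) can be checked from the grouped table with a short sketch. It takes the midpoint of the 20-30 class as 25, matching the mean calculation quoted for this question; the variable names are assumptions for illustration.

```python
# (class midpoint in mph, frequency) for each bar of the histogram
table = [(12.5, 30), (25, 240), (32.5, 90), (37.5, 30), (42.5, 60)]

total_cars = sum(freq for _, freq in table)
sum_of_speeds = sum(mid * freq for mid, freq in table)
estimated_mean = sum_of_speeds / total_cars

# Cars in the classes starting at 35 mph or above (at least 5 mph over the limit)
cars_35_or_more = sum(freq for mid, freq in table if mid > 35)

print(total_cars, sum_of_speeds, round(estimated_mean, 1), cars_35_or_more)
```

The mean is only an estimate because every car in a class is treated as travelling at the class midpoint.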

Page 20: Jan 2012

14

5

Bro Tip: Be careful that you use the correct class widths!

21 + 45 + 3 = 69

Page 21: Jan 2008

M1 A1 B1 M1

A1: = 12 runners

Page 22:

Answer: Distance is continuous

Note the gaps in the class intervals!

4 / 5 = 0.8
19 / 5 = 3.8
53 / 10 = 5.3
...

Page 23: Jun 2007

35    15

(5 × 5) + 15 = 40

Page 24: Skew

Skew gives a measure of whether the values are more spread out above the median or below the median.

[Figure: frequency against height, with the mode, median and mean marked]

We say this distribution has positive skew. (To remember, think that the 'tail' points in the positive direction.)

[Figure: frequency against weight, with the mode, median and mean marked]

We say this distribution has negative skew.

Page 25: Skew

Distribution: Skew

Salaries in the UK: High salaries drag mean up. So positive skew. Mean > Median

IQ: A symmetrical distribution, i.e. no skew. Mean = Median

Heights of people in the UK: Will probably be a nice 'bell curve', i.e. no skew. Mean = Median

Age of retirement: Likely to be people who retire significantly before the median age, but not many who retire significantly after. So negative skew. Mean < Median

Remember, think what direction the 'tail' is likely to point.

Page 26: Exam Question

In the previous parts of a question you've calculated the mean and the median mark of students in a test.

(d) Describe the skewness of the marks of the students, giving a reason for your answer. (2)

1st mark: Negative skew

2nd mark: because mean < median

Page 27: Skew

Given the quartiles and median, how would you work out whether the distribution had positive or negative skew?

Positive skew: $Q_3 - Q_2 > Q_2 - Q_1$

Negative skew: $Q_3 - Q_2 < Q_2 - Q_1$

No skew: $Q_3 - Q_2 = Q_2 - Q_1$

Page 28: Exam Question

1st mark: $Q_3 - Q_2 > Q_2 - Q_1$

2nd mark: Therefore positive skew.
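
The quartile comparison used for the two marks can be sketched as a small function. The function name and the returned labels are assumptions for illustration; the first call uses the quartiles of the cat weights from the box plot recap, and the second uses made-up quartiles to show the negative case.

```python
def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing the two halves of the box (q2 is the median)."""
    upper_gap = q3 - q2
    lower_gap = q2 - q1
    if upper_gap > lower_gap:
        return "positive"
    if upper_gap < lower_gap:
        return "negative"
    return "none"

# Cat weights: Q1 = 7.5, median = 8, Q3 = 12, so 4 > 0.5
cats_skew = skew_from_quartiles(7.5, 8, 12)

# Hypothetical quartiles where the lower half of the box is wider (2 < 6)
example_skew = skew_from_quartiles(10, 16, 18)

print(cats_skew, example_skew)
```

In a written answer the inequality itself is the justification, and the conclusion ("therefore positive skew") earns the second mark.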

Page 29: Calculating Skew

One measure of skew can be calculated using the following formula (important note: this will be given to you in the exam if required):

$\dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

When mean > median, mean < median, and mean = median, this gives a positive value, a negative value, and 0 respectively, as expected.

Find the skew of the following teachers' annual salaries:

£3 £3.50 £4 £7 £100

Mean = £23.50
Median = £4
Standard Deviation = £38.28

Skew = 1.53
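
The salary example can be checked with Python's standard `statistics` module. One assumption to flag: the quoted standard deviation of £38.28 corresponds to the population standard deviation (dividing by n), so the sketch uses `pstdev` rather than `stdev`.

```python
from statistics import mean, median, pstdev

salaries = [3, 3.5, 4, 7, 100]

salary_mean = mean(salaries)      # 23.5
salary_median = median(salaries)  # 4
salary_sd = pstdev(salaries)      # population standard deviation (divide by n)

# Skew formula quoted above: 3 x (mean - median) / standard deviation
skew = 3 * (salary_mean - salary_median) / salary_sd

print(salary_mean, salary_median, round(salary_sd, 2), round(skew, 2))
```

The single value of £100 pulls the mean far above the median, which is why the coefficient comes out clearly positive.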

Page 30: S1: Chapter 4 Revision!

Page 31: Revision

Stem and leaf diagrams:
• Can you construct one, and write the appropriate key?
• Can you calculate mode, mean, median and quartiles?
• Can you assess skewness by using these above values?

Back-to-back stem and leaf diagrams:
• Can you construct one with appropriate key?
• Can you compare the data on each side?

1 | 4                          (1)
2 | 1 2 4 5                    (4)
3 | 2 5 6 6 6 7 7 8 8          (9)
4 | 0 1 2 2 4 5 6 7 7 7 7 8    (12)
5 | 0 1 1 2                    (4)

Key: 2 | 1 means 2.1

Mode = 4.7
Lower Quartile = 3.6
Median = 4.05
Upper Quartile = 4.7

Type of skew:

Reason:

Page 32: Revision

The data below shows the pulse rate of boys and girls in a school.

      Girls |   | Boys
          0 | 4 | 6
      8 5 4 | 5 | 0 7 9
      6 4 0 | 6 | 0 0 4 5 7 8
    9 8 6 2 | 7 | 1 2 4 5
8 5 4 0 0 0 | 8 | 0
8 6 2 2 1 0 | 9 | 1

Key: 0 | 4 | 6 means 40 for girls and 46 for boys.

Notice the values go outwards from the centre.

Comment on the results.

Boys' pulse rates tend to be lower than girls'.

Page 33: Revision

Histograms

Can you:
• Appreciate that the frequency density scale doesn't matter. This is why frequency is only proportional to area, and not equal to it.
• You often need to identify the scaling $k$. You might only be given the total frequency (in which case you need to find the total area of the histogram to find $k$). But if you know the frequency associated with a particular bar, just find the area of that single bar.
• If you don't care about the scaling, then let $k = 1$, so that area = frequency.
• Be incredibly careful about class widths (i.e. widths of boxes). If a class interval in the frequency table was written with gaps, you'd draw the bar between its class boundaries on the histogram, and use the resulting width (6 in this slide's example) as the width of the box.
• If you want to find the quartiles/median/mean, you need to first construct a grouped frequency table using the histogram.
• When asked to find the number of people with values in a certain range (e.g. with times between 10 and 15s) and it crosses multiple ranges/bars, it's easier to use the frequency table you've constructed from the histogram. Use linear interpolation where necessary.
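
Linear interpolation within a class can be sketched as below. This is purely an illustration: the slides do not work out a median for the May 2012 speed data, and the use of position n/2 for grouped data is an assumed convention. The function name `interpolate_in_class` is hypothetical.

```python
def interpolate_in_class(lower, upper, class_frequency, cumulative_before, position):
    """Estimate the value at a given position, assuming the values are
    spread evenly across the class from lower to upper."""
    fraction_into_class = (position - cumulative_before) / class_frequency
    return lower + fraction_into_class * (upper - lower)

# Speed table: 30 cars fall before the 20-30 mph class, which holds 240 cars.
# With 450 cars in total, position 450 / 2 = 225 falls inside that class.
median_estimate = interpolate_in_class(
    lower=20, upper=30, class_frequency=240,
    cumulative_before=30, position=450 / 2,
)
print(median_estimate)
```

The same function handles quartiles, or "how many below 27 mph" style questions run in reverse, once the cumulative frequencies are known.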

Page 34: Revision

M1 A1 B1 M1

A1: = 12 runners

Page 35: Revision

Given that an outlier is a value more than 1.5 IQRs outside the lower and upper quartiles…

Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
0, 3            | 21, 27         | 8              | 10     | 14

[Answer figure: box plot on an axis from 0 to 30]

Smallest values | Largest values | Lower Quartile | Median | Upper Quartile
3, 7            | 20, 25, 26     | 12             | 13     | 16

[Answer figure: box plot on an axis from 0 to 30]

Page 36: Revision

Skewness

You can determine skewness in three ways:
• Comparing quartiles: When $Q_3 - Q_2 > Q_2 - Q_1$, the right box in the box plot is wider, so it's positive skew. If a box plot is drawn, it should be immediately obvious!
• Comparing mean/median: When mean > median, large values have dragged up the mean, so there's a tail in the positive direction, and thus the skew is positive.
• Looking at the shape of the distribution. If there's a 'positive tail', the skew is positive.

When asked to justify your answer for skewness, you're expected to put something like either "$Q_3 - Q_2 > Q_2 - Q_1$" or "mean > median". You will always be given a formula if you have to calculate a value for skew. But for all formulae, 0 means no skew (i.e. a "symmetric distribution"), >0 means positive skew and <0 means negative skew.

$\text{Skew} = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

Find the skew of the following teachers' annual salaries:

£3 £3.50 £4 £7 £100

Mean = £23.50
Median = £4
Standard Deviation = £38.28

Skew = 1.53