S1: Chapter 4 Representation of Data Dr J Frost ([email protected]) www.drfrostmaths.com Last modified: 20 th September 2015


Page 1: S1:  Chapter 4 Representation of Data

S1: Chapter 4 Representation of Data

Dr J Frost ([email protected]) www.drfrostmaths.com

Last modified: 20th September 2015

Page 2: S1:  Chapter 4 Representation of Data

Overview
We’ll look at 3 different ways of presenting data, as well as ways of analysing them (including ‘skew’).

BOX PLOTS
*NEW since GCSE!* Outliers.

STEM AND LEAF
*NEW since GCSE!* Back to back stem and leaf diagrams.

HISTOGRAMS
*NEW since GCSE!* Area is not necessarily equal to frequency.

Page 3: S1:  Chapter 4 Representation of Data

Skew
Skew gives a measure of whether the values are more spread out above the median or below the median.

[Figure: two frequency distributions, one of Height and one of Weight, each with the mode, median and mean sketched on.]

When the ‘tail’ points in the positive direction, we say the distribution has positive skew.

When the ‘tail’ points in the negative direction, we say the distribution has negative skew.

Page 4: S1:  Chapter 4 Representation of Data

Skew

Distribution: Salaries in the UK.
Skew: High salaries drag the mean up, so positive skew. Mean > Median.

Distribution: IQ.
Skew: A symmetrical distribution, i.e. no skew. Mean = Median.

Distribution: Heights of people in the UK.
Skew: Will probably be a nice ‘bell curve’, i.e. no skew. Mean = Median.

Distribution: Age of retirement.
Skew: Likely to be people who retire significantly before the median age, but not many who retire significantly after. So negative skew. Mean < Median.

Remember, think what direction the ‘tail’ is likely to point.

Page 5: S1:  Chapter 4 Representation of Data

Skew based on mean/median
Suppose for some data we had calculated the mean and the median.

Describe the skewness of the marks of the students, giving a reason for your answer. (2)

Negative skew (1st mark)

because mean < median (2nd mark)

Bro Tip: If you ever forget which way the two go, just think of salaries! High values (i.e. a positive tail) drag up the mean but not the median. So it’s the position of the mean that determines skew.

Page 6: S1:  Chapter 4 Representation of Data

Skew based on quartiles

Positive skew: $Q_3 - Q_2 > Q_2 - Q_1$
(The data is spread out more in the positive direction, so we have positive skew.)

Negative skew: $Q_2 - Q_1 > Q_3 - Q_2$

No skew: $Q_2 - Q_1 = Q_3 - Q_2$
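The three quartile comparisons above can be sketched in a few lines of Python. This is only an illustration: the function name `skew_from_quartiles` and the sample quartile values are made up for the example and are not taken from the slides.

```python
def skew_from_quartiles(q1, q2, q3):
    """Classify skew by comparing the two halves of the interquartile range."""
    upper_gap = q3 - q2  # spread of the middle 50% above the median
    lower_gap = q2 - q1  # spread of the middle 50% below the median
    if upper_gap > lower_gap:
        return "positive skew"
    if lower_gap > upper_gap:
        return "negative skew"
    return "no skew"


# Hypothetical quartile values, chosen purely for illustration.
print(skew_from_quartiles(8, 10, 14))   # upper gap 4 > lower gap 2
print(skew_from_quartiles(10, 16, 18))  # lower gap 6 > upper gap 2
print(skew_from_quartiles(5, 10, 15))   # both gaps equal
```

In an exam answer you would still write out the inequality itself, since that is what earns the justification mark.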

Page 7: S1:  Chapter 4 Representation of Data

Example Exam Question

$Q_3 - Q_2 > Q_2 - Q_1$ (1st mark)

Therefore positive skew. (2nd mark)

Page 8: S1:  Chapter 4 Representation of Data

Test Your Understanding

Comment on skew (2 marks), given the available data:

Positive skew as
Negative skew as
Little/no skew as median and mean are roughly equal.

Page 9: S1:  Chapter 4 Representation of Data

Calculating Skew
One measure of skew can be calculated using the following formula: (Important Note: this will be given to you in the exam if required)

$\text{skew} = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

When mean > median, mean < median, and mean = median, we can see this gives us a positive value, negative value, and 0 respectively, as expected.

Find the skew of the following teachers’ annual salaries:

£3, £3.50, £4, £7, £100

Mean = £23.50
Median = £4
Standard Deviation = £38.28

Skew = 1.53
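As a quick check of the salary example, here is a minimal Python sketch. It assumes the standard deviation on the slide is the population standard deviation (dividing by n), which is what reproduces £38.28; the function name `skew_coefficient` is chosen here for illustration.

```python
from statistics import mean, median, pstdev


def skew_coefficient(values):
    """Return 3(mean - median) / standard deviation.

    Assumption: the population standard deviation (pstdev) is used.
    """
    return 3 * (mean(values) - median(values)) / pstdev(values)


salaries = [3, 3.50, 4, 7, 100]  # teachers' annual salaries in pounds

print(round(mean(salaries), 2))              # 23.5
print(median(salaries))                      # 4
print(round(pstdev(salaries), 2))            # 38.28
print(round(skew_coefficient(salaries), 2))  # 1.53
```

The single £100 salary drags the mean well above the median, which is why the coefficient comes out positive.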

Page 10: S1:  Chapter 4 Representation of Data

Exercise 1

1. Using the available data in each case, state the skew (1 mark) and give a justification (1 mark).

a. Positive skew as
b. Mean , Median Negative skew as
c. No skew as
d. Mean , Median Positive skew as
e. Negative skew as

2. In each case state whether the mean or median would be a more appropriate average (1 mark), and give a reason (1 mark).

f. Median as the data is (positively) skewed.
g. Median , Mean Median as the data is (negatively) skewed.

Page 11: S1:  Chapter 4 Representation of Data

Exercise 1

3


Page 12: S1:  Chapter 4 Representation of Data

Exercise 1

4

Page 13: S1:  Chapter 4 Representation of Data

Stem and Leaf recap

Put the following measurements into a stem and leaf diagram:

4.7 3.6 3.8 4.7 4.1 2.2 3.6 4.0 4.4 5.0 3.7 4.6 4.8 3.7 3.2
2.5 3.6 4.5 4.7 5.2 4.7 4.2 3.8 5.1 1.4 2.1 3.5 4.2 2.4 5.1

1 | 4                          (1)
2 | 1 2 4 5                    (4)
3 | 2 5 6 6 6 7 7 8 8          (9)
4 | 0 1 2 2 4 5 6 7 7 7 7 8    (12)
5 | 0 1 1 2                    (4)

Key: 2 | 1 means 2.1

Now find:

Mode = 4.7
Lower Quartile = 3.6
Median = 4.05
Upper Quartile = 4.7
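The recap above can be reproduced with a short Python sketch. The quartile rule used here is an assumption, not something stated on the slide: take the position n·k/4, round up if it is not a whole number, and otherwise average that item with the next one. The helper names `stem_and_leaf` and `quartile` are illustrative.

```python
from collections import Counter, defaultdict

measurements = [
    4.7, 3.6, 3.8, 4.7, 4.1, 2.2, 3.6, 4.0, 4.4, 5.0, 3.7, 4.6, 4.8, 3.7, 3.2,
    2.5, 3.6, 4.5, 4.7, 5.2, 4.7, 4.2, 3.8, 5.1, 1.4, 2.1, 3.5, 4.2, 2.4, 5.1,
]


def stem_and_leaf(values):
    """Group one-decimal-place values into {stem: sorted list of leaves}."""
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)  # work in whole tenths to avoid float error
        rows[tenths // 10].append(tenths % 10)
    return dict(rows)


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the assumed n*k/4 position rule."""
    data = sorted(values)
    whole, remainder = divmod(len(data) * k, 4)
    if remainder == 0:
        # Whole-number position: average that item and the next one.
        return (data[whole - 1] + data[whole]) / 2
    # Otherwise round the position up (index `whole` when counting from 0).
    return data[whole]


for stem, leaves in stem_and_leaf(measurements).items():
    print(stem, "|", " ".join(map(str, leaves)), f"({len(leaves)})")

mode = Counter(measurements).most_common(1)[0][0]
print("Mode =", mode)
print("Lower quartile =", quartile(measurements, 1))
print("Median =", quartile(measurements, 2))
print("Upper quartile =", quartile(measurements, 3))
```

With 30 values the positions are 7.5, 15 and 22.5, so the quartiles are the 8th and 23rd values and the median is the average of the 15th and 16th.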

Page 14: S1:  Chapter 4 Representation of Data

Back-to-Back Stem and Leaf recap

Girls
55 80 84 91 80
92 98 40 60 64
93 72 96 85 88
90 76 54 58 92
91 80 79

Boys
80 60 91 65 67
81 75 46 72 71
74 57 64 60 50
68

The data above shows the pulse rate of boys and girls in a school.

      Girls | Stem | Boys
          0 |  4   | 6
      8 5 4 |  5   | 0 7 9
      6 4 0 |  6   | 0 0 4 5 7 8
    9 8 6 2 |  7   | 1 2 4 5
8 5 4 0 0 0 |  8   | 0
8 6 2 2 1 0 |  9   | 1

Key: 0|4|6 means 40 for girls and 46 for boys.

Comment on the results.
The back-to-back stem and leaf diagram shows that boys’ pulse rates tend to be lower than girls’.

Page 15: S1:  Chapter 4 Representation of Data

Box Plot recap

Box Plots allow us to visually represent the distribution of the data.

Minimum: 3
Lower Quartile: 15
Median: 17
Upper Quartile: 22
Maximum: 27

[Figure: box plot of these five values drawn on an axis from 0 to 30, with the IQR and range marked.]

How is the IQR represented in this diagram? By the width of the box.

How is the range represented in this diagram? By the distance between the ends of the two whiskers.

Page 16: S1:  Chapter 4 Representation of Data

Outliers
An outlier is an extreme value.

More specifically, it is generally a value more than 1.5 × IQR below the lower quartile or above the upper quartile. (But you will be told in the exam if the rule differs from this.)

[Figure: box plot on an axis from 0 to 30, with the regions beyond these boundaries marked as containing outliers.]

Page 17: S1:  Chapter 4 Representation of Data

Examples

Smallest values: 0, 3
Largest values: 21, 27
Lower Quartile: 8
Median: 10
Upper Quartile: 14

Draw a box plot to represent the above data.

[Figure: axis from 0 to 30 on which the box plot is drawn.]

Outlier boundaries:
IQR = 14 − 8 = 6
Lower boundary = 8 − 1.5 × 6 = −1
Upper boundary = 14 + 1.5 × 6 = 23
So 27 is an outlier.

Bro Exam Tip: You MUST show your outlier boundary calculations.

When there’s an outlier at one end, there are two allowable places to put the end of the whisker:

The largest value that is not an outlier, 21 (I think this one makes most sense).

OR the outlier boundary, 23.

Use one or the other (not both).
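The boundary calculation for this example can be sketched in Python as follows. It assumes the usual 1.5 × IQR rule described on the previous slide; the function name `outlier_boundaries` and the variable names are illustrative.

```python
def outlier_boundaries(q1, q3, k=1.5):
    """Return (lower, upper) boundaries, k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = outlier_boundaries(8, 14)  # quartiles from the example
extremes = [0, 3, 21, 27]                 # smallest and largest values given

outliers = [x for x in extremes if x < lower or x > upper]
# First allowable whisker end: the largest value that is not an outlier.
whisker_end = max(x for x in extremes if x <= upper)

print(lower, upper)  # -1.0 23.0
print(outliers)      # [27]
print(whisker_end)   # 21
```

The second allowable whisker end is simply the upper boundary itself, 23.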

Page 18: S1:  Chapter 4 Representation of Data

Test Your Understanding


(on your printed sheet)

Page 19: S1:  Chapter 4 Representation of Data

Comparing Box Plots

[Figure: box plots comparing house prices in Croydon and Kingston-upon-Thames, on an axis from £100k to £450k.]

“Compare the prices of houses in Croydon with those in Kingston”. (2 marks)

For 1 mark, one of:
• The interquartile range of house prices in Kingston is greater than in Croydon.
• The range of house prices in Kingston is greater than in Croydon.
i.e. Something spread related.

For 1 mark:
• The median house price in Kingston was greater than that in Croydon.
i.e. Compare some measure of location (could be minimum, lower quartile, etc.)

Page 20: S1:  Chapter 4 Representation of Data

Test Your Understanding

Jan 2005 Q2

(on your printed sheet)


Page 21: S1:  Chapter 4 Representation of Data

Exercise 2


(on your printed sheet)

Page 22: S1:  Chapter 4 Representation of Data

Exercise 2 (on your printed sheet)


Page 23: S1:  Chapter 4 Representation of Data

Exercise 2 (on your printed sheet)


Page 24: S1:  Chapter 4 Representation of Data

Exercise 2 (on your printed sheet)


Page 25: S1:  Chapter 4 Representation of Data

Exercise 2 (on your printed sheet)


(Solutions to (d) and (e) on next slide)

Page 26: S1:  Chapter 4 Representation of Data

Exercise 2 (on your printed sheet)


Page 27: S1:  Chapter 4 Representation of Data

Exercise 2 (on your printed sheet)


Page 28: S1:  Chapter 4 Representation of Data

Bar Charts vs Histograms

[Figure: a bar chart of Frequency against Shoe Size (6 to 9), and a histogram of Frequency Density against Height (1.0m to 1.8m).]

Bar Charts
• For discrete data.
• Frequency given by height of bars.

Histograms
• For continuous data. (Use this as a reason whenever you’re asked to justify use of a histogram.)
• Data divided into (potentially uneven) intervals.
• [GCSE definition] Frequency given by area of bars.*
• No gaps between bars.

* Not necessarily true. We’ll correct this in a sec.

Page 29: S1:  Chapter 4 Representation of Data

Bar Charts vs Histograms
Still using the ‘incorrect’ GCSE formula:

$\text{Frequency Density} = \dfrac{\text{Frequency}}{\text{Class Width}}$

Q1

Weight (w kg) | Frequency | Frequency Density
0 < w ≤ 10    | 40        | 4
10 < w ≤ 15   | 6         | 1.2
15 < w ≤ 35   | 52        | 2.6
35 < w ≤ 45   | 10        | 1

Q2

[Figure: histogram of Frequency Density (1 to 5) against Height (m) (10 to 50), with four bars.]

Frequency = 15
Frequency = 30
Frequency = 40
Frequency = 25
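The frequency density column in Q1 can be checked with a small Python sketch of the GCSE formula (frequency divided by class width). The function and variable names are illustrative.

```python
def frequency_density(lower, upper, frequency):
    """GCSE formula: frequency divided by class width."""
    return frequency / (upper - lower)


# (lower bound, upper bound, frequency) for each weight class in Q1
weight_classes = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]

densities = [frequency_density(lo, hi, f) for lo, hi, f in weight_classes]

for (lo, hi, f), fd in zip(weight_classes, densities):
    print(f"{lo} < w <= {hi}: frequency {f}, frequency density {fd}")
```

Running the formula the other way (frequency = frequency density × class width) is what Q2 asks for.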

Page 30: S1:  Chapter 4 Representation of Data

SKILL #1 :: Area = frequency?

There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s.

[Figure: histogram of Frequency Density (0 to 5) against Time (s), with bars over 9 to 12 and 12 to 18.]

Total frequency is known; therefore find total area and hence the ‘scaling’.

Total area = 15 + 9 = 24
Area 24 → Freq 60, so the scaling is 60 ÷ 24 = 2.5.

Then use this scaling along with the desired area.

Area = 4 × 1.5 = 6
Area 6 → Freq 6 × 2.5 = 15

Unlike at GCSE, the area of a bar is not necessarily equal to the frequency; they are just proportional.

! Identify the scaling using a known area with known frequency (which may be total area/frequency or just one bar).
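A minimal Python sketch of the scaling idea for the runners example follows. The bar heights (5 for the 9 to 12 bar and 1.5 for the 12 to 18 bar) are an assumption inferred from the areas 15 and 9 quoted on the slide, since the histogram itself is not reproduced here; the function name `frequency_above` is illustrative.

```python
def frequency_above(bars, threshold, total_frequency):
    """Estimate the frequency above `threshold` from histogram bars.

    `bars` is a list of (lower, upper, height) tuples. Area is only
    proportional to frequency, so it is scaled by total_frequency / total_area.
    """
    total_area = sum((hi - lo) * h for lo, hi, h in bars)
    scaling = total_frequency / total_area
    # Area of the part of each bar lying above the threshold.
    area_above = sum(max(0, hi - max(lo, threshold)) * h for lo, hi, h in bars)
    return area_above * scaling


# Assumed bar heights, inferred from the stated areas 15 and 9.
bars = [(9, 12, 5), (12, 18, 1.5)]

print(frequency_above(bars, 14, 60))  # 15.0
```

The same function works when only one bar's frequency is known, provided you pass that bar alone together with its frequency to find the scaling.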

Page 31: S1:  Chapter 4 Representation of Data

Test Your Understanding (on your printed sheet)

May 2012 Q5

A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results.

[Figure 2: histogram of the recorded speeds, with the frequency density axis marked from 1 to 7.]

(a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks)

M1 A1: Determine what one small square or one large square is worth (i.e. work out scaling).

M1 A1: Use this to find number of cars travelling >35mph.

Bro Tip: We can make the frequency density scale what we like.

Page 32: S1:  Chapter 4 Representation of Data

Test Your Understanding (on your printed sheet)

(b) Estimate the value of the mean speed of the cars in the sample. (3 marks)

M1 M1: Use histogram to construct sum of speeds.

$\dfrac{30 \times 12.5 + 240 \times 25 + \dots}{450}$

A1: Correct value

$= 28.8$

Bro Tip: Whenever you are asked to calculate mean, median or quartiles from a histogram, form a grouped frequency table. Use your scaling factor to work out the frequency of each bar.
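Once a grouped frequency table has been formed, the mean is estimated from class midpoints. The sketch below illustrates this in Python; because the full speed table is not reproduced in this transcript, it uses the weight table from the earlier frequency density question purely as stand-in data, and the function name `estimated_mean` is illustrative.

```python
def estimated_mean(table):
    """Estimate the mean of grouped data using class midpoints.

    `table` is a list of (lower, upper, frequency) tuples.
    """
    total_frequency = sum(f for _, _, f in table)
    weighted_sum = sum(((lo + hi) / 2) * f for lo, hi, f in table)
    return weighted_sum / total_frequency


# Stand-in data: the weight classes from the earlier Q1 table,
# not the speeds from the exam question.
weights = [(0, 10, 40), (10, 15, 6), (15, 35, 52), (35, 45, 10)]

print(round(estimated_mean(weights), 1))
```

The numerator has exactly the form shown in the mark scheme: each frequency multiplied by its class midpoint, summed, then divided by the total frequency.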

Page 33: S1:  Chapter 4 Representation of Data

Test Your Understanding (on your printed sheet)

(c) Estimate, to 1 decimal place, the value of the median speed of the cars in the sample. (2)

(d) Comment on the shape of the distribution. Give a reason for your answer. (2)

(e) State, with a reason, whether the estimate of the mean or the median is a better representation of the average speed of the traffic on the road. (2)

Page 34: S1:  Chapter 4 Representation of Data

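SKILL #2 :: Gaps!

Weight (to nearest kg) | Frequency | F.D.
1-2
3-6
7-9

[Figure: histogram with Frequency Density (0 to 2) on the vertical axis and a horizontal axis marked from 1 to 10.]

Note that the gaps affect class width! Because weights are recorded to the nearest kg, the class 1-2 really covers 0.5 to 2.5, so its class width is 2 (not 1); likewise 3-6 has width 4 and 7-9 has width 3.

Remember the frequency density axis is only correct to scale, so there may be some scaling. However, in an exam scaling is unlikely to be required for F.D. if the F.D. scale is already given.

We set the scaling between area and frequency to be 1.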
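To see how gaps change the class widths, here is a small Python sketch. It assumes values are rounded to the nearest whole unit, so each stated class boundary moves out by half a unit; the function name `true_class` and its `precision` parameter are illustrative.

```python
def true_class(lower, upper, precision=1):
    """Return the real class boundaries when values are rounded to `precision`."""
    half = precision / 2
    return lower - half, upper + half


stated_classes = [(1, 2), (3, 6), (7, 9)]  # weights to the nearest kg

for lo, hi in stated_classes:
    real_lo, real_hi = true_class(lo, hi)
    print(f"{lo}-{hi}: {real_lo} to {real_hi}, class width {real_hi - real_lo}")

widths = [true_class(lo, hi)[1] - true_class(lo, hi)[0] for lo, hi in stated_classes]
```

Each bar is then drawn between its real boundaries, which is why the histogram has no gaps even though the table appears to.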

Page 35: S1:  Chapter 4 Representation of Data

Test Your Understanding (on your printed sheet)

Jan 2012 Q1

14

5

21 + 45 + 3 = 69

Bro Tip: Be careful that you use the correct class widths!

Page 36: S1:  Chapter 4 Representation of Data

SKILL #3 :: Width and height on diagram
An exam favourite is to ask what width and height we’d draw a bar in a drawn histogram.

Q: The frequency table shows some running times. On a histogram the bar for 0-4 seconds is drawn with width 6cm and height 8cm. Find the width and height of the bar for 4-6 seconds.

[Table: Time (seconds) against Frequency.]

! Bro Tip: Find the scaling for class width to drawn width and frequency density to drawn height.

Strategy
For the 0-4 bar: the class width is 4 and the drawn width is 6cm, so the scaling for width is 6 ÷ 4 = 1.5. The scaling for height (drawn height ÷ frequency density) is 4.

Solution
4-6 bar: class width 2, frequency density 4.5.
Width = 2 × 1.5 = 3cm. Height = 4.5 × 4 = 18cm.
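The two scalings can be applied mechanically, as in this Python sketch. The scalings 1.5 and 4 and the 4-6 bar's class width and frequency density are taken from the slide; the function name `drawn_size` is illustrative.

```python
def drawn_size(class_width, frequency_density, width_scale, height_scale):
    """Return the (width, height) in cm of a bar on the drawn histogram."""
    return class_width * width_scale, frequency_density * height_scale


# Scalings found from the 0-4 bar: 6cm drawn for a class width of 4,
# and a height scaling of 4 as stated on the slide.
width_scale = 6 / 4
height_scale = 4

width_cm, height_cm = drawn_size(2, 4.5, width_scale, height_scale)
print(width_cm, height_cm)  # 3.0 18.0
```

Keeping the two scalings separate matters because width depends on class width while height depends on frequency density, not on frequency.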

Page 37: S1:  Chapter 4 Representation of Data

Test Your Understanding (on your printed sheet)


Page 38: S1:  Chapter 4 Representation of Data

Q1

Exercise 3 (on your printed sheet)


Page 39: S1:  Chapter 4 Representation of Data

Exercise 3 (on your printed sheet)

Q2

Answer: Distance is continuous.

Note the gaps in the class intervals!
4 / 5 = 0.8
19 / 5 = 3.8
53 / 10 = 5.3
...

Page 40: S1:  Chapter 4 Representation of Data

Exercise 3 (on your printed sheet)


Q3

Page 41: S1:  Chapter 4 Representation of Data

Exercise 3 (on your printed sheet)

Q4 [June 2007 Q5]


Page 42: S1:  Chapter 4 Representation of Data

Exercise 3 (on your printed sheet)

Q5


Page 43: S1:  Chapter 4 Representation of Data

Exercise 3 (on your printed sheet)

Q6


Page 44: S1:  Chapter 4 Representation of Data

Exercise 3 (on your printed sheet)

Q7
