session 11 & 12

SESSION 11 & 12. Last Update: 3rd March 2011. Introduction to Statistics

Upload: liuz

Post on 24-Feb-2016



TRANSCRIPT

Page 1: SESSION 11 & 12

SESSION 11 & 12

Last Update: 3rd March 2011

Introduction to Statistics

Page 2: SESSION 11 & 12

Lecturer: Florian Boehlandt

University: University of Stellenbosch Business School

Domain: http://www.hedge-fund-analysis.net/pages/vega.php

Page 3: SESSION 11 & 12

Learning Objectives

1. (Cumulative Relative) Frequency tables revisited…

2. Catalogue of graphical representations at your disposal

3. Polygons and Ogives – Differentiation

Use this presentation as a guide. The contents are all relevant to your examination unless specified to the contrary!

Page 4: SESSION 11 & 12

Raw Data

1. Determine the number of class intervals. Sample size n = 25. Sturges’ formula:

# of intervals = 1 + 1.4 * LN(n)

# of intervals = 1 + 1.4 * LN(25)

# of intervals = 5.506 ≈ 6 (Round up to nearest integer)

2. Find the maximum and minimum observations:

Maximum = 22.5

Minimum = -6.8

Obs  Investment A
1    -4.4
2    5.8
3    10.4
4    1.1
5    -5.3
6    0.1
7    11.9
8    9.5
9    22.5
10   -2.3
11   -4.7
12   -6.8
13   2.5
14   1.4
15   5.5
16   7.3
17   4.9
18   13
19   -2.2
20   16.3
21   5.8
22   15.4
23   6.2
24   2.7
25   13.1
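The two steps above can be sketched in Python. This is only an illustration: the variable names are not part of the slides, and the coefficient 1.4 is taken from the formula as written here.

```python
import math

# Investment A returns, transcribed from the raw data table (n = 25)
returns = [-4.4, 5.8, 10.4, 1.1, -5.3, 0.1, 11.9, 9.5, 22.5, -2.3,
           -4.7, -6.8, 2.5, 1.4, 5.5, 7.3, 4.9, 13, -2.2, 16.3,
           5.8, 15.4, 6.2, 2.7, 13.1]

n = len(returns)
raw_intervals = 1 + 1.4 * math.log(n)     # math.log is the natural log (LN)
num_intervals = math.ceil(raw_intervals)  # round up to the nearest integer

maximum = max(returns)
minimum = min(returns)

print(n, round(raw_intervals, 3), num_intervals)
print(maximum, minimum)
```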

Page 5: SESSION 11 & 12

Raw Data

3. Calculate the class width:

Class width = (Max – Min) / # of intervals

Class width = (22.5 – (-6.8)) / 6

Class width = 4.883 ≈ 5 (Round up to nearest integer)

4. Determine the next lower integer value from the minimum. This is the starting value for the first class interval:

Minimum = -6.8 ≈ -7

Obs  Investment A
1    -4.4
2    5.8
3    10.4
4    1.1
5    -5.3
6    0.1
7    11.9
8    9.5
9    22.5
10   -2.3
11   -4.7
12   -6.8
13   2.5
14   1.4
15   5.5
16   7.3
17   4.9
18   13
19   -2.2
20   16.3
21   5.8
22   15.4
23   6.2
24   2.7
25   13.1
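As a rough check of the class width and starting value, the same arithmetic can be written in Python. The variable names are illustrative; the inputs are the maximum, minimum and number of intervals found earlier.

```python
import math

maximum, minimum, num_intervals = 22.5, -6.8, 6

raw_width = (maximum - minimum) / num_intervals  # 29.3 / 6
class_width = math.ceil(raw_width)               # round up to the nearest integer
start = math.floor(minimum)                      # next lower integer below the minimum

print(round(raw_width, 3), class_width, start)
```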

Page 6: SESSION 11 & 12

Class Intervals

Lower Bound  Upper Bound  Class Interval  Calculation
-7           -2           -7 to < -2      -7 + 5 = -2
-2           3            -2 to < 3       -2 + 5 = 3
3            8            3 to < 8        3 + 5 = 8
8            13           8 to < 13       8 + 5 = 13
13           18           13 to < 18      13 + 5 = 18
18           23           18 to < 23      18 + 5 = 23

5. Start with the lowest (integer) value = -7. Add the class width to calculate the upper bound. The combination of lower and upper bound gives the class interval (don’t forget the inequality to avoid overlaps). Continue in the same fashion until all required class intervals (here 6) are defined.

Page 7: SESSION 11 & 12

Midpoints

5. Calculate the midpoints of the class intervals:

Lower Bound  Upper Bound  Class Interval  Midpoints  Calculation
-7           -2           -7 to < -2      -4.5       (-2 + (-7)) / 2
-2           3            -2 to < 3       0.5        (3 + (-2)) / 2
3            8            3 to < 8        5.5        (8 + 3) / 2
8            13           8 to < 13       10.5       (13 + 8) / 2
13           18           13 to < 18      15.5       (18 + 13) / 2
18           23           18 to < 23      20.5       (23 + 18) / 2

midpoint = (Upper Bound + Lower Bound) / 2
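The construction of the class intervals and their midpoints can be sketched as follows, assuming the starting value -7, class width 5 and six intervals derived earlier. The names are illustrative only.

```python
start, class_width, num_intervals = -7, 5, 6

# Each interval is (lower bound, upper bound); the upper bound is exclusive.
intervals = [(start + i * class_width, start + (i + 1) * class_width)
             for i in range(num_intervals)]

# midpoint = (Upper Bound + Lower Bound) / 2
midpoints = [(upper + lower) / 2 for lower, upper in intervals]

for (lower, upper), mid in zip(intervals, midpoints):
    print(f"{lower} to < {upper}: midpoint {mid}")
```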

Page 8: SESSION 11 & 12

Tally

6. Sort all (return) observations into the class intervals (or bins). You may use a designated tally column to do so manually or use the FREQUENCY function in Excel (the results are integer values)

Lower Bound  Upper Bound  Class Interval  Midpoints  Tally
-7           -2           -7 to < -2      -4.5       ||||| |
-2           3            -2 to < 3       0.5        |||||
3            8            3 to < 8        5.5        ||||| |
8            13           8 to < 13       10.5       ||||
13           18           13 to < 18      15.5       |||
18           23           18 to < 23      20.5       |
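A minimal tallying sketch in plain Python follows; the function name `tally` and its parameters are illustrative. One detail is worth noting: observation 18 is printed as exactly 13. Under the strict "8 to < 13" rule it belongs to the interval 13 to < 18, whereas the table counts it in 8 to < 13. The table's counts are reproduced when the upper bound is treated as inclusive, which is how Excel's FREQUENCY function treats its bin values. The underlying value may also have been slightly below 13 before rounding; the slides do not say.

```python
returns = [-4.4, 5.8, 10.4, 1.1, -5.3, 0.1, 11.9, 9.5, 22.5, -2.3,
           -4.7, -6.8, 2.5, 1.4, 5.5, 7.3, 4.9, 13, -2.2, 16.3,
           5.8, 15.4, 6.2, 2.7, 13.1]
bounds = [-7, -2, 3, 8, 13, 18, 23]


def tally(values, bounds, right_closed):
    """Count the values falling into each class interval.

    right_closed=False: lower <= x < upper  (the "lower to < upper" rule)
    right_closed=True:  lower <  x <= upper (upper bound inclusive)
    Values outside all intervals are ignored.
    """
    counts = [0] * (len(bounds) - 1)
    for x in values:
        for i in range(len(counts)):
            lower, upper = bounds[i], bounds[i + 1]
            if right_closed:
                inside = lower < x <= upper
            else:
                inside = lower <= x < upper
            if inside:
                counts[i] += 1
                break
    return counts


left_closed_counts = tally(returns, bounds, right_closed=False)
right_closed_counts = tally(returns, bounds, right_closed=True)

print(left_closed_counts)   # strict "< upper bound" rule
print(right_closed_counts)  # upper bound inclusive, matches the table
```

The two results differ only in the fourth and fifth intervals, because 13 is the only observation that lies exactly on a class boundary.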

Page 9: SESSION 11 & 12

Observed Frequencies

7. Convert Tally column to observed Frequencies

Lower Bound  Upper Bound  Class Interval  Midpoints  Tally          Frequency
-7           -2           -7 to < -2      -4.5       ||||| |   →    6
-2           3            -2 to < 3       0.5        |||||     →    5
3            8            3 to < 8        5.5        ||||| |   →    6
8            13           8 to < 13       10.5       ||||      →    4
13           18           13 to < 18      15.5       |||       →    3
18           23           18 to < 23      20.5       |         →    1

Page 10: SESSION 11 & 12

Cumulative Frequencies

8. Calculate cumulative frequencies as the running subtotal of the frequency column

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Calculation
-7           -2           -7 to < -2      -4.5       6          6           6
-2           3            -2 to < 3       0.5        5          11          6 + 5 = 11
3            8            3 to < 8        5.5        6          17          11 + 6 = 17
8            13           8 to < 13       10.5       4          21          17 + 4 = 21
13           18           13 to < 18      15.5       3          24          21 + 3 = 24
18           23           18 to < 23      20.5       1          25          24 + 1 = 25
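The running subtotal can be sketched with `itertools.accumulate` from the Python standard library, using the observed frequencies from the table. The variable names are illustrative.

```python
from itertools import accumulate

frequencies = [6, 5, 6, 4, 3, 1]

# Running subtotal of the frequency column
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The last cumulative frequency should always equal the sample size n, which is a quick way to check the table.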

Page 11: SESSION 11 & 12

Relative Frequencies

9. Calculate the relative frequencies:

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Calculation
-7           -2           -7 to < -2      -4.5       6          6           0.24        6 / 25 = 0.24
-2           3            -2 to < 3       0.5        5          11          0.20        5 / 25 = 0.20
3            8            3 to < 8        5.5        6          17          0.24        6 / 25 = 0.24
8            13           8 to < 13       10.5       4          21          0.16        4 / 25 = 0.16
13           18           13 to < 18      15.5       3          24          0.12        3 / 25 = 0.12
18           23           18 to < 23      20.5       1          25          0.04        1 / 25 = 0.04

Relative Freq. = Frequency / n
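A short sketch of this formula in Python, assuming the observed frequencies from the table (names are illustrative):

```python
frequencies = [6, 5, 6, 4, 3, 1]
n = sum(frequencies)  # sample size, here 25

# Relative Freq. = Frequency / n
relative = [f / n for f in frequencies]

print([f"{r:.2f}" for r in relative])
```

The relative frequencies should add up to 1 (up to rounding).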

Page 12: SESSION 11 & 12

Cumulative Relative Frequencies

10. Calculate cumulative relative frequencies as the running subtotal of the relative frequency column

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.  Calculation
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24             0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44             0.24 + 0.20 = 0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68             0.44 + 0.24 = 0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84             0.68 + 0.16 = 0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96             0.84 + 0.12 = 0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00             0.96 + 0.04 = 1.00
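The cumulative relative frequencies can be sketched in Python as below. Note one deliberate deviation from the running subtotal shown in the table: this sketch divides the cumulative frequencies by n, which gives the same values but avoids accumulating rounding error when relative frequencies are summed one by one. The names are illustrative.

```python
from itertools import accumulate

frequencies = [6, 5, 6, 4, 3, 1]
n = sum(frequencies)

cumulative = list(accumulate(frequencies))

# Cumulative relative frequency = cumulative frequency / n
cum_rel = [round(c / n, 2) for c in cumulative]

print(cum_rel)
```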

Page 13: SESSION 11 & 12

Histogram – Data required

Select the class intervals as the horizontal axis (x-axis) and the observed frequencies as the vertical (y-axis). The height of the bars in the histogram should represent the observed frequencies for each class interval.

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00

Page 14: SESSION 11 & 12

Histogram

[Figure: histogram of Investment A. x-axis: Class Interval (-7 to < -2 through 18 to < 23); y-axis: Frequency (0 to 7).]

Page 15: SESSION 11 & 12

Frequency Polygon – Add Intervals

First, add two additional class intervals. These should have the same width as the other class intervals. Thus, they can be created by subtracting the class width from the lower bound of the first interval and adding the class width to the upper bound of the last class interval. Midpoints are calculated as before [(-12 + (-7)) / 2 = -9.5 and (23 + 28) / 2 = 25.5]. The observed frequencies are zero for both new intervals, as all observations fall within the old intervals.

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-12          -7           -12 to < -7     -9.5       0
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00
23           28           23 to < 28      25.5       0
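The points of the frequency polygon, including the two extra intervals, can be sketched as follows. The midpoints and frequencies are those of the table; the variable names are illustrative.

```python
midpoints = [-4.5, 0.5, 5.5, 10.5, 15.5, 20.5]
frequencies = [6, 5, 6, 4, 3, 1]
class_width = 5

# One extra interval on each side, with the same width and frequency zero
xs = [midpoints[0] - class_width] + midpoints + [midpoints[-1] + class_width]
ys = [0] + frequencies + [0]

polygon_points = list(zip(xs, ys))
print(polygon_points)
```

Shifting the first and last midpoint by one class width gives the same result as computing the midpoints of the new intervals directly.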

Page 16: SESSION 11 & 12

Frequency Polygon – Data required

Select the midpoints as the horizontal axis (x-axis) and the observed frequencies as the vertical (y-axis). Instead of bars use markers (x/y-coordinates). Draw a line through all markers.

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-12          -7           -12 to < -7     -9.5       0
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00
23           28           23 to < 28      25.5       0

Page 17: SESSION 11 & 12

Frequency Polygon

[Figure: frequency polygon of Investment A. x-axis: Midpoints (-9.5 to 25.5); y-axis: Frequency (0 to 7).]

Page 18: SESSION 11 & 12

Frequency Polygon – continued

Occasionally, the data have a predefined minimum and maximum. Consider the following frequency table of class marks in statistics:

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency
-10          5            -10 to < 5      -2.5       0
5            20           5 to < 20       12.5       3
20           35           20 to < 35      27.5       2
35           50           35 to < 50      42.5       6
50           65           50 to < 65      57.5       23
65           80           65 to < 80      72.5       12
80           95           80 to < 95      87.5       3
95           110          95 to < 110     102.5      0

Using the previous approach leads to midpoints (or results) and class intervals that are actually impossible. The logical maximum for class marks is 100; the logical minimum is 0!

Page 19: SESSION 11 & 12

Frequency Polygon – Data required

The solution is to include the maximum and minimum as two additional points of your frequency polygon (xy-coordinates: 100/0 and 0/0):

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency
0            5                            0          0
5            20           5 to < 20       12.5       3
20           35           20 to < 35      27.5       2
35           50           35 to < 50      42.5       6
50           65           50 to < 65      57.5       23
65           80           65 to < 80      72.5       12
80           95           80 to < 95      87.5       3
95           110                          100        0
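For data with a logical minimum and maximum, the polygon points can be sketched like this. The end points 0/0 and 100/0 replace the midpoints of the impossible extra intervals; the variable names are illustrative.

```python
midpoints = [12.5, 27.5, 42.5, 57.5, 72.5, 87.5]
frequencies = [3, 2, 6, 23, 12, 3]
logical_min, logical_max = 0, 100  # class marks cannot fall outside 0 to 100

# Close the polygon at the logical limits instead of at -2.5 and 102.5
polygon_points = ([(logical_min, 0)]
                  + list(zip(midpoints, frequencies))
                  + [(logical_max, 0)])

print(polygon_points)
```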

Page 20: SESSION 11 & 12

Frequency Polygon – Class Marks

[Figure: frequency polygon of Class Marks. x-axis: Midpoints (0, 12.5, 27.5, 42.5, 57.5, 72.5, 87.5, 100); y-axis: Frequency (0 to 25).]

Page 21: SESSION 11 & 12

Histogram – Freq. Polygon comb.

[Figure: histogram with frequency polygon overlaid, Investment A. x-axis: Class Interval (-12 to < -7 through 23 to < 28); y-axis: Frequency (0 to 7).]

Page 22: SESSION 11 & 12

Cum. Freq. Graph – Data required

Select the class intervals as the horizontal axis (x-axis) and the cumulative frequencies as the vertical (y-axis).

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00

Page 23: SESSION 11 & 12

Cumulative Frequency Graph

[Figure: cumulative frequency graph of Investment A. x-axis: Class Interval (-7 to < -2 through 18 to < 23); y-axis: Cumulative Frequency (0 to 30).]

Page 24: SESSION 11 & 12

less than Ogive – Add interval

For the less than Ogive graph, an additional data point is required. We can add an additional class interval "< -7". The observed frequency is zero for the new interval, as all observations fall within the old intervals.

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
n/a          -7           < -7            n/a        0
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00
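The data points of the less than Ogive can be sketched as follows, assuming the extra point sits at x = -7 with a cumulative frequency of zero. The variable names are illustrative.

```python
from itertools import accumulate

upper_bounds = [-2, 3, 8, 13, 18, 23]
frequencies = [6, 5, 6, 4, 3, 1]
first_lower_bound = -7  # upper bound of the added interval "< -7"

# No observation lies below -7, so the ogive starts at zero
xs = [first_lower_bound] + upper_bounds
ys = [0] + list(accumulate(frequencies))

ogive_points = list(zip(xs, ys))
print(ogive_points)
```

Each point (x, y) reads as "y observations are less than x".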

Page 25: SESSION 11 & 12

less than Ogive – Data required

Select the upper bounds as the horizontal axis (x-axis) and the cumulative frequencies as the vertical (y-axis).

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
n/a          -7           < -7            n/a        0
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00

Page 26: SESSION 11 & 12

less than Ogive

[Figure: less than Ogive of Investment A. x-axis: Upper Bounds (-7, -2, 3, 8, 13, 18, 23); y-axis: Cumulative Frequency (0 to 30).]

Page 27: SESSION 11 & 12

Standardising Data

It may be desirable to express data in terms of relative frequencies. These were calculated before and are contained in the table below (both discrete as well as cumulative). All graphs introduced so far can be based on relative frequency rather than observed frequency.

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00

Page 28: SESSION 11 & 12

Relative Frequency Polygon – Data required

Select the midpoints as the horizontal axis (x-axis) and the relative frequencies as the vertical (y-axis). Instead of bars use markers (x/y-coordinates). Draw a line through all markers. The relative frequencies for the additional class intervals are 0 (since the observed frequencies are 0). All that changes in comparison to the observed frequency polygon is the scale of the y-axis. The shape of the function remains the same.

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-12          -7           -12 to < -7     -9.5       0                      0.00
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00
23           28           23 to < 28      25.5       0                      0.00

Page 29: SESSION 11 & 12

Relative Frequency Polygon

[Figure: relative frequency polygon of Investment A. x-axis: Midpoints (-9.5 to 25.5); y-axis: Relative Frequency (0.00 to 0.30).]

Page 30: SESSION 11 & 12

OR Pie Chart – Data required

Select the class intervals as the categories for the pie slices and the relative frequencies as their corresponding values. The size of each slice should be representative of its proportion. Note that the additional categories have relative frequencies = 0.00. Thus, they may be omitted without altering the pie chart itself. Because pie charts are difficult to draw free-hand, they are not relevant to your examination!

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-12          -7           -12 to < -7     -9.5       0                      0.00
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00
23           28           23 to < 28      25.5       0                      0.00

Page 31: SESSION 11 & 12

Pie Chart

[Figure: pie chart of Investment A. Slices: -7 to < -2, -2 to < 3, 3 to < 8, 8 to < 13, 13 to < 18, 18 to < 23.]

Page 32: SESSION 11 & 12

Cumulative Relative Frequency Graph – Data required

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00

Select the class intervals as the horizontal axis (x-axis) and the cumulative relative frequencies as the vertical (y-axis).

Page 33: SESSION 11 & 12

Cumulative Relative Frequency Graph

[Figure: cumulative relative frequency graph of Investment A. x-axis: Class Interval (-7 to < -2 through 18 to < 23); y-axis: Cumulative Relative Frequency (0 to 1).]

Page 34: SESSION 11 & 12

less than Ogive (relative Freq.) – Data required

Select the upper bounds as the horizontal axis (x-axis) and the cumulative relative frequencies as the vertical (y-axis). As before, an additional class interval "< -7" is added. The associated cumulative relative frequency is 0.00 (since no observations fall below -7).

Lower Bound  Upper Bound  Class Interval  Midpoints  Frequency  Cum. Freq.  Rel. Freq.  Cum. Rel. Freq.
n/a          -7           < -7            n/a        0                                  0.00
-7           -2           -7 to < -2      -4.5       6          6           0.24        0.24
-2           3            -2 to < 3       0.5        5          11          0.20        0.44
3            8            3 to < 8        5.5        6          17          0.24        0.68
8            13           8 to < 13       10.5       4          21          0.16        0.84
13           18           13 to < 18      15.5       3          24          0.12        0.96
18           23           18 to < 23      20.5       1          25          0.04        1.00

Page 35: SESSION 11 & 12

less than Ogive (relative Freq.)

[Figure: less than Ogive (relative frequencies) of Investment A. x-axis: Upper Bounds (-7, -2, 3, 8, 13, 18, 23); y-axis: Cumulative Relative Frequency (0.00 to 1.00).]

Page 36: SESSION 11 & 12

less than Ogive (relative Freq.)

[Figure: less than Ogive (relative frequencies) of Investment A. x-axis: Upper Bounds (-7, -2, 3, 8, 13, 18, 23); y-axis: Cumulative Relative Frequency (0.00 to 1.00).]

P(X < 0%), i.e. the probability of negative performance, can be read off the ogive at x = 0. Here ≈ 0.31 or 31%.
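Reading a value off the ogive amounts to linear interpolation between two neighbouring points. The sketch below assumes the ogive is drawn with straight line segments; the function name `ogive_value` is illustrative. For x = 0 the interpolation gives 0.32, which is close to the value of about 0.31 read off the graph.

```python
upper_bounds = [-7, -2, 3, 8, 13, 18, 23]
cum_rel = [0.00, 0.24, 0.44, 0.68, 0.84, 0.96, 1.00]


def ogive_value(x, xs, ys):
    """Linearly interpolate the less than ogive at x."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        y0, y1 = ys[i], ys[i + 1]
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


# P(X < 0%): x = 0 lies between the points (-2, 0.24) and (3, 0.44)
p_negative = ogive_value(0, upper_bounds, cum_rel)
print(round(p_negative, 2))
```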

Page 37: SESSION 11 & 12

Why use Relative Frequencies?

To compare two datasets (e.g. Investment A and Investment B), the frequencies need to be standardised so that the frequency distributions are comparable. This is necessary because the sample sizes, class intervals and class widths may differ across samples.

Page 38: SESSION 11 & 12

Graphical Representations

Observed Frequencies

discrete: Histogram, Polygon

cumulative: Ogive

Relative Frequencies

discrete: Pie Chart, Polygon

cumulative: Ogive