frequency distribution statistics


Upload: jialin-wu

Post on 29-May-2017


TRANSCRIPT

Page 1: Frequency Distribution Statistics

FREQUENCY DISTRIBUTIONS

How to organize, present and analyze data

Content of 60s Pop Songs

[Chart categories: Yeah, Actual Lyrics, Baby, Oooh]

Page 2: Frequency Distribution Statistics

FREQUENCY DISTRIBUTIONS

Consider the following example:
How old is John?
How old is Mary?
How old is Frank?
…
How old am I?

Page 3: Frequency Distribution Statistics

EXAMPLE: DISCRETE VARIABLE

The example uses a sample of 40 values representing the ages of EHL students (in whole years, thus discrete).

40Ages manual: count the number of times each age appears in the sample and tally it on the given diagram.

Page 4: Frequency Distribution Statistics


ABSOLUTE FREQUENCY DISTRIBUTION

Here the y-values represent the frequency in absolute values

Page 5: Frequency Distribution Statistics


RELATIVE FREQUENCY DISTRIBUTION

Here the y-values represent the frequency in percentage

2/40 = 5%    4/40 = 10%    3/40 = 7.5%
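The counting step and the conversion to percentages can be sketched in Python. This is only an illustration: the `ages` list is a hypothetical mini-sample (the 40 EHL values are not reproduced in this transcript), and `frequency_table` is a helper name chosen here, not something from the course material.

```python
from collections import Counter


def frequency_table(values):
    """Return {value: (absolute frequency, relative frequency)}, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return {v: (counts[v], counts[v] / n) for v in sorted(counts)}


# Hypothetical ages, for illustration only (NOT the 40 EHL values)
ages = [19, 20, 20, 21, 21, 21, 22, 22, 23, 25]

table = frequency_table(ages)
for age, (n_i, f_i) in table.items():
    # n_i is the absolute frequency, f_i the share of the sample
    print(f"{age} years: n_i = {n_i}, f_i = {f_i:.1%}")
```

The relative frequencies always add up to 100%, which is a quick check that no value was missed while counting.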

Page 6: Frequency Distribution Statistics

THE MOST FREQUENT VALUE: THE MODE

The MODE is found with the Excel function MODE(range). Result: 21 years

There are eight 21-year-old students in this sample. This is the LARGEST frequency, so 21 years is the MODE.

The set of these eight 21-year-old students is called the MODAL CLASS

Page 7: Frequency Distribution Statistics


SPECIAL CASE

This frequency distribution has two (nearly equal) peaks: Bi-modal distribution
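For readers working outside Excel, the mode and the two-peak case can be sketched with Python's standard library. Both data lists below are hypothetical illustrations, not the course sample.

```python
from statistics import multimode

# Hypothetical ages with a single most frequent value
ages = [19, 20, 20, 21, 21, 21, 22, 22, 23, 25]

# Hypothetical sample with two equally high peaks
two_peaks = [20, 20, 20, 21, 24, 24, 24, 25]

# multimode returns every value that reaches the largest frequency
print(multimode(ages))
print(multimode(two_peaks))
```

Note that `multimode` only reports exact ties. Two "nearly equal" peaks, as on the slide, still have to be spotted on the chart.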

Page 8: Frequency Distribution Statistics

THE MEDIAN VALUE: A “DEMOCRATIC” VALUE

The median divides the data into two EQUAL parts:
50% of the data values are BELOW the MEDIAN value
50% of the data values are ABOVE the MEDIAN value

Excel function: MEDIAN(range)

Page 9: Frequency Distribution Statistics

POSITION OF THE MEDIAN

The MEDIAN value is 21.5 years (found with Excel). Notice that there are 20 students younger and 20 students older than the MEDIAN.

Page 10: Frequency Distribution Statistics

WHAT IS THE MEDIAN?

Median: the central data point of a data set after sorting.
If the data set has an odd number of values, it is literally the data value in the center of the sorted data set.
If the data set has an even number of values, it is the average of the two values closest to the center of the sorted data set.

Example: annual precipitation in Geneva between 1976 and 1993 (mm)

583 890 777 958 875 926 524 756 619 730 688 528 901 884 969 1258 850 939

After sorting:

524 528 583 619 688 730 756 777 850 875 884 890 901 926 939 958 969 1258

To find the position of the median: (n + 1) / 2

Here: (18 + 1) / 2 = 9.5, the 9.5th value out of 18, at the center of the data set
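The Geneva example can be checked with a short Python sketch. It uses the position rule (n + 1) / 2, which matches the 9.5 quoted for 18 values, and then averages the 9th and 10th sorted values. The variable names are chosen here for illustration.

```python
from statistics import median

# Annual precipitation in Geneva, 1976 to 1993 (mm), as listed on the slide
precipitation_mm = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                    730, 688, 528, 901, 884, 969, 1258, 850, 939]

ordered = sorted(precipitation_mm)
n = len(ordered)
position = (n + 1) / 2  # 1-based position of the median in the sorted list

if n % 2 == 1:
    # Odd count: the middle value itself
    by_hand = ordered[n // 2]
else:
    # Even count: average of the two values around the center
    by_hand = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(position, by_hand, median(precipitation_mm))
```

The hand computation and `statistics.median` agree on 862.5 mm, the average of 850 and 875.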

Page 11: Frequency Distribution Statistics

THE AVERAGE (AVG) VALUE: A “BALANCED” MEASURE

Symbol: $\bar{x}$

Formula: $\bar{x} = \frac{\sum x_i}{n}$

$x_i$: the values of the variable
$\sum$: SUM
$\sum x_i$: the SUM of ALL the given values
$n$ = number of values

Excel function: AVERAGE(range)

NB: In many textbooks the average is called the “mean”. This gives the honest average a poor image, so that term is not used in this course.
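As a worked instance of the formula, the sketch below applies it to the Geneva precipitation values used in the median example. The variable names are illustrative.

```python
# Annual precipitation in Geneva, 1976 to 1993 (mm), from the median example
precipitation_mm = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                    730, 688, 528, 901, 884, 969, 1258, 850, 939]

n = len(precipitation_mm)      # number of values
total = sum(precipitation_mm)  # the sum of all the given values
average = total / n            # sum divided by the number of values

print(total, n, round(average, 2))
```

Here the sum is 14655 mm over 18 years, so the average is about 814.17 mm, noticeably below the median of 862.5 mm.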

Page 12: Frequency Distribution Statistics

POSITION OF THE AVG

The AVG value is 21.65 years (found with Excel). This point on the Age axis can be considered the CENTROID of this distribution, hence the idea of a “balanced” value.

Page 13: Frequency Distribution Statistics

QUICK QUIZ

You surveyed 10 different families to see how many children they have. You obtained the following observations: 0, 0, 1, 1, 2, 2, 2, 3, 4, 5

Indicate whether each statement is true or false.
The mode is 5
The average is 2.5
The median is 2
The variable is quantitative
The variable is quantitative continuous

Page 14: Frequency Distribution Statistics

THE AVG OF CLASSIFIED DATA

When data are classified or grouped in any way, the average can be calculated with the following formula:

Formula: $\bar{x} = \frac{\sum n_i x_i}{n}$

$x_i$ = the value of the variable at the MIDDLE of the frequency class
$n_i$ = the value of the frequency

40Ages computer
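A minimal sketch of the grouped-data average, assuming the usual weighting of each class midpoint by its frequency. The classes below are hypothetical (they are not the "40Ages" data), and `grouped_average` is a name chosen for this illustration.

```python
def grouped_average(classes):
    """Average of classified data.

    classes: list of (lower bound, upper bound, absolute frequency).
    Each class is represented by its midpoint, weighted by its frequency.
    """
    n = sum(freq for _, _, freq in classes)
    weighted_sum = sum(((lower + upper) / 2) * freq
                       for lower, upper, freq in classes)
    return weighted_sum / n


# Hypothetical age classes, for illustration only
classes = [(18, 20, 6), (20, 22, 18), (22, 24, 12), (24, 26, 4)]

print(grouped_average(classes))
```

Because every value in a class is replaced by the class midpoint, the result is an approximation of the average computed on the raw values.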

Page 15: Frequency Distribution Statistics


SYMMETRICAL DISTRIBUTIONS

In perfectly symmetrical frequency distributions, the relative positions of MODE, MEDIAN and AVG coincide

Page 16: Frequency Distribution Statistics

ASYMMETRICAL DISTRIBUTIONS

In an asymmetrical frequency distribution, the relative positions of these three parameters appear as shown. This distribution is skewed to the right. The mirror image of this situation is also possible.

[Figure labels: AVG, MEDIAN, MODE]

Page 17: Frequency Distribution Statistics


THE RANGE OF A GROUP OF VALUES

Age distribution of 40 students
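The range is the difference between the largest and the smallest value. Since the 40 ages are not reproduced in this transcript, the sketch below uses the Geneva precipitation values from the median example instead.

```python
# Annual precipitation in Geneva, 1976 to 1993 (mm), from the median example
precipitation_mm = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                    730, 688, 528, 901, 884, 969, 1258, 850, 939]

smallest = min(precipitation_mm)
largest = max(precipitation_mm)
value_range = largest - smallest  # range = maximum minus minimum

print(smallest, largest, value_range)
```

A single extreme year (1258 mm) accounts for a large part of this range, which is why the range alone says little about how the other values are spread.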

Page 18: Frequency Distribution Statistics

QUICK QUIZ

From the following frequency distribution, indicate whether each statement is true or false.

[Chart: vertical axis from 0 to 60, horizontal axis from 1 to 17]

The distribution is left skewed
The mode is smaller than the median and the average
Mode = Median = Average
The mode is between 50 and 60
The average is higher than 5
The median is between 4 and 5

Page 19: Frequency Distribution Statistics

EXERCISE 1

You are given the sizes of the last 20 burgers sold at a fast-food restaurant. Answer the following questions.

What is the type of the variable “Burger Size”?
Compute the range.
Calculate the mode, median and average.
Classify the data into 4 classes and compute the frequency distribution.
Represent the relative frequency distribution graphically and comment on it.

Page 20: Frequency Distribution Statistics

QUICK QUIZ

The table below reports the number of clients that came to your restaurant over the last 50 days.

Compute the missing values (marked ?).

xi      ni    fi        Fi
25      5     10.00%    10.00%
26      ?     12.00%    ?
27      ?     ?         32.00%
28      9     18.00%    50.00%
29      11    22.00%    72.00%
30      ?     ?         ?
> 30    5     10.00%    100.00%

Indicate whether each statement is true or false.

x3 = 27 clients
The sample size is 50 clients
f4 = 18%: on 18% of the days, 28 clients came to your restaurant
The median is 28 clients
The average cannot be calculated
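The relationship between the three frequency columns can be sketched as follows, assuming ni is the absolute frequency, fi the relative frequency and Fi the cumulative relative frequency. The counts below are hypothetical on purpose, so the quiz table is left for the reader, and `frequency_columns` is an illustrative name.

```python
from itertools import accumulate


def frequency_columns(counts):
    """Return (relative, cumulative relative) frequencies for a list of counts."""
    n = sum(counts)
    relative = [c / n for c in counts]
    # Accumulate the counts first, then divide, to limit rounding drift
    cumulative = [running / n for running in accumulate(counts)]
    return relative, cumulative


# Hypothetical absolute frequencies, for illustration only
counts = [4, 6, 10]

relative, cumulative = frequency_columns(counts)
for n_i, f_i, F_i in zip(counts, relative, cumulative):
    print(f"n_i = {n_i:2d}   f_i = {f_i:7.2%}   F_i = {F_i:7.2%}")
```

Two checks follow from the definitions: the last cumulative value is always 100%, and each fi equals the difference between two consecutive Fi values.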

Page 21: Frequency Distribution Statistics

EXERCISE 2

Using data from the customer satisfaction feedback for one service, answer the following questions:

What is the type of the variable?
Compute the absolute and relative frequency distributions.
Graph the relative frequencies and comment on your results.

Page 22: Frequency Distribution Statistics


GRAPHICAL TOOLS

Use of different graphical representations depends on the nature (qualitative or quantitative) of the variable being studied.

Qualitative Variable

• Circle diagram
• Bar chart

Quantitative Variable

• Discrete
  • Bar chart
  • Stem and Leaf
  • Box Plot

• Continuous
  • Histogram
  • Density Curve
  • Box Plot

Page 23: Frequency Distribution Statistics


GRAPHICAL TOOLS: CIRCLE DIAGRAM

Represents the categories of the variable as sectors of a disc. The area of each sector is determined by an angle proportional to the observed frequency.

αi = 360° × fi
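The angle rule can be sketched in a few lines of Python. The satisfaction categories and counts are hypothetical, and `sector_angles` is a name chosen for this illustration.

```python
def sector_angles(counts):
    """Angle in degrees of each sector: 360 times the relative frequency."""
    n = sum(counts.values())
    return {category: 360 * count / n for category, count in counts.items()}


# Hypothetical qualitative data, for illustration only
counts = {"Satisfied": 20, "Neutral": 12, "Dissatisfied": 8}

angles = sector_angles(counts)
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, since the relative frequencies add up to 1.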

Page 24: Frequency Distribution Statistics


GRAPHICAL TOOLS: BAR CHART

Represents the various possible values of the variable according to their absolute or relative frequency.

Page 25: Frequency Distribution Statistics

GRAPHICAL TOOLS: STEM AND LEAF PLOTS

Annual precipitation in Geneva between 1976 and 1993 (mm):

583 890 777 958 875 926 524 756 619 730 688 528 901 884 969 1258 850 939

Procedure:
Separate each number into a stem and a leaf. Here, we choose the number of hundreds as the stem and the tens digit (after rounding to the nearest ten) as the leaf.
Group the numbers with the same stems.

Stem | Leaf
5 | 2 3 8
6 | 2 9
7 | 3 6 8
8 | 5 8 8 9
9 | 0 3 4 6 7
10 |
11 |
12 | 6

Remarks:
Stem and leaf plots show both the distribution of the data and the data values themselves.
The leaves are sorted in increasing order.
The most difficult step is the choice of scale: tens/hundreds; sometimes 5/50, 2/20, etc.
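The procedure can be reproduced with a short Python sketch. It assumes the leaves are obtained by rounding each value to the nearest ten (halves rounded up), which is what the plotted leaves suggest: 528 appears as 5 | 3 and 875 as 8 | 8. `stem_and_leaf` is an illustrative helper name.

```python
def stem_and_leaf(values, leaf_unit=10):
    """Return {stem: sorted leaves}, keeping empty stems between min and max.

    Each value is rounded to the nearest multiple of leaf_unit (halves up);
    the last digit of the rounded figure is the leaf, the rest is the stem.
    """
    rounded = sorted((v + leaf_unit // 2) // leaf_unit for v in values)
    plot = {stem: [] for stem in range(rounded[0] // 10, rounded[-1] // 10 + 1)}
    for r in rounded:
        plot[r // 10].append(r % 10)
    return plot


# Annual precipitation in Geneva, 1976 to 1993 (mm), as listed on the slide
precipitation_mm = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                    730, 688, 528, 901, 884, 969, 1258, 850, 939]

plot = stem_and_leaf(precipitation_mm)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stems 10 and 11 is a deliberate choice: the gap makes the isolated 1258 mm value stand out, as it does on the slide.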

Page 26: Frequency Distribution Statistics

QUICK QUIZ

As a marketing consultant, you observed 50 consecutive shoppers at a grocery store and recorded how much money each shopper spent in the store.

The following graph provides this information.

Reading scale: 1 | 0 represents 10 francs

0 | 2 7 7 8 9
1 | 0 1 2 3 3 4 4 4 5 5 5 5 7 7 8 8 9
2 | 0 0 1 1 1 1 4 6 7 9 9
3 | 1 2 3 3 4 5 6 8 9
4 | 1 4 6
5 | 2
6 | 2 4 4 9

Indicate whether each statement is true or false.

This graphical representation is called a histogram.
The average expenditure cannot be calculated.
The expenditures distribution is skewed to the left.
The median is at 21.

Page 27: Frequency Distribution Statistics

QUICK QUIZ

The scores of a team from the last Statistics quiz are given in the stem-and-leaf plot below. The quiz was graded out of 70 points.

Reading scale: 1 | 5 represents 15 points

1 | 0 7 9
2 | 1 1 3 6 8
3 | 0 1 3 5 6 7 7
4 | 1 1 1 2
5 |
6 | 9

Indicate whether each statement is true or false.

Team 2 is made up of 6 students.
The range of the scores is 59.
The highest obtained score is 70.
The median is 32.
40% of the students totaled less than 30 points.
The average cannot be calculated.
The variable is quantitative discrete.
25% of the students have more than 36 points.
The circle diagram could be a good graphical representation of the observations.

Page 28: Frequency Distribution Statistics


GRAPHICAL TOOLS: HISTOGRAM

Represents the distribution of the variable taking into account the frequency and amplitude of classes.

Distribution of employees' wages by salary class, Switzerland 2008

Monthly net salary, private and public sector (Confederation) together
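One common way to take both the frequency and the amplitude (width) of a class into account is to make the area of each bar, rather than its height, proportional to the frequency. The sketch below assumes that convention. The salary classes are hypothetical, not the Swiss 2008 figures, and `bar_heights` is an illustrative name.

```python
def bar_heights(classes):
    """Height of each histogram bar: relative frequency divided by class width.

    classes: list of (lower bound, upper bound, relative frequency).
    """
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Hypothetical salary classes in CHF with unequal widths, for illustration only
classes = [(0, 2000, 0.10), (2000, 3000, 0.30),
           (3000, 4000, 0.40), (4000, 8000, 0.20)]

heights = bar_heights(classes)
# Area = height times width, which gives back the relative frequency
areas = [h * (upper - lower) for h, (lower, upper, _) in zip(heights, classes)]

for (lower, upper, freq), h in zip(classes, heights):
    print(f"{lower}-{upper} CHF: frequency {freq:.0%}, height {h:.6f} per CHF")
```

With this convention the wide 4000-8000 class gets a low bar even though it holds 20% of the observations, so wide classes do not look artificially important.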

Page 29: Frequency Distribution Statistics

GRAPHICAL TOOLS: BOX PLOT

A great visual representation of many important characteristics of a data set.

Data needed:
Minimum and Maximum
Average
Median
First and Third quartiles (Q1 and Q3)
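The quantities needed for a box plot can be computed with Python's standard library, here on the Geneva precipitation values from the median example. Quartile conventions differ between tools, so treat this as a sketch: `statistics.quantiles` is used with its default "exclusive" method, and Excel or a textbook rule may give slightly different Q1 and Q3. `box_plot_data` is an illustrative name.

```python
from statistics import quantiles


def box_plot_data(values):
    """Return minimum, Q1, median, Q3, maximum and average of the values."""
    q1, q2, q3 = quantiles(values, n=4)  # default method: "exclusive"
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
        "average": sum(values) / len(values),
    }


# Annual precipitation in Geneva, 1976 to 1993 (mm), from the median example
precipitation_mm = [583, 890, 777, 958, 875, 926, 524, 756, 619,
                    730, 688, 528, 901, 884, 969, 1258, 850, 939]

summary = box_plot_data(precipitation_mm)
iqr = summary["q3"] - summary["q1"]  # interquartile range: width of the box

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"IQR: {iqr:.2f}")
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.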

Page 30: Frequency Distribution Statistics


BOX PLOT ILLUSTRATION

Page 31: Frequency Distribution Statistics

QUICK QUIZ

The box plot below represents Swiss Civil Aviation airport traffic in 2009.

From the box plot, indicate whether each statement is true or false.

75% of airports have an annual traffic lower than 100'000 flights.
Half of the airports have an annual traffic greater than 70'000 flights.
The skew is positive.
Two airports in particular have the most traffic.

Page 32: Frequency Distribution Statistics


GRAPH EXAMPLES

Page 33: Frequency Distribution Statistics

GRAPH EXAMPLES

In October 2012, a well-known newspaper reported that “the average salary in Switzerland is ranked 6th among the 29 countries used for the study”. Below is the reference graph published by the OFS (Office fédéral de la statistique). What can you conclude?

Page 34: Frequency Distribution Statistics

QUICK QUIZ

We would like to study the distribution of net monthly salary for Swiss employees in 2013. Relative frequencies per class are given in the table below:

Salary classification    Relative frequency
0-3000 CHF               2%
3000-4000 CHF            14%
4000-5000 CHF            24%
5000-6000 CHF            20%
6000-7000 CHF            13%
7000-8000 CHF            9%
8000 and more CHF        19%
Total                    100%

Given this information, indicate whether each statement is true or false.

The data cannot be graphically represented in terms of relative frequency because the last class “8000 and more” is open.
The most suitable graph is the circle diagram because the variable "Salary" is quantitative continuous.
A histogram would be the best graphical representation of the data.
The stem and leaf graph is not possible because the variable "Salary" is classified.

Page 35: Frequency Distribution Statistics

EXERCISE 3

The life cycle of 20 bulbs from the company Superligth SA was measured during an inspection. The results are given in the stem-and-leaf plot (see Excel file).

Find the quartiles of this distribution and compute the IQR.
Find the average life cycle, knowing that the sum of the leaves is 18800 hours.
Find the mode.

Page 36: Frequency Distribution Statistics

EXERCISE 4

Answer the following questions using the available exam grades distribution.

How many students attended the exam?
Compute the 5-number summary of the exam results.
What is the average grade?
Draw the graph of the distribution and comment on it.