146 11 categorical_data online

MATH& 146 Lesson 11 Section 1.6 Categorical Data

Upload: greg-kent

Posted on 07-Feb-2017



Frequency

The first step in organizing categorical data is to count the number of data values in each category of interest.

We can organize these counts (or frequencies) into a frequency table, which lists each category name together with its count.


Frequency

A class with 20 students had the following distribution of grades:

A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F

GRADE  FREQUENCY
A      3
B      5
C      3
D      6
F      3
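As a minimal sketch (assuming the grades are stored as a Python list of strings; the variable names are illustrative), `collections.Counter` from the standard library does this tallying in one step:

```python
from collections import Counter

# The 20 grades from the example, one string per student
grades = "A A A B B B B B C C C D D D D D D F F F".split()

# Counter maps each category to the number of times it occurs
freq = Counter(grades)

print("GRADE  FREQUENCY")
for grade in sorted(freq):  # sort so the table lists A, B, C, D, F
    print(f"{grade:<6} {freq[grade]}")
```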


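Relative Frequency

A relative frequency is the proportion of times a category occurs: the category's frequency divided by the total number of data values. For example, grade A occurs 3 times out of 20, so its relative frequency is 3/20 = 0.15. Relative frequencies can be written as fractions, decimals, or percents.

GRADE  FREQUENCY  RELATIVE FREQUENCY
A      3          0.15
B      5          0.25
C      3          0.15
D      6          0.30
F      3          0.15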
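Continuing the same illustrative sketch (again assuming the grades sit in a Python list), each relative frequency is the count divided by the total number of values, shown here both as a decimal and as a percent:

```python
from collections import Counter

grades = "A A A B B B B B C C C D D D D D D F F F".split()
freq = Counter(grades)
n = len(grades)  # total number of data values

# Relative frequency = category count / total count
rel_freq = {grade: freq[grade] / n for grade in sorted(freq)}

for grade, rf in rel_freq.items():
    print(f"{grade}  {freq[grade]}  {rf:.2f}  ({rf:.0%})")
```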


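Cumulative Relative Frequency

Cumulative relative frequency is the accumulation of the previous relative frequencies: each entry is the running total of the relative frequencies up to and including that category. For example, the cumulative relative frequency for B is 0.15 + 0.25 = 0.40.

GRADE  FREQUENCY  RELATIVE FREQUENCY  CUMULATIVE RELATIVE FREQUENCY
A      3          0.15                0.15
B      5          0.25                0.40
C      3          0.15                0.55
D      6          0.30                0.85
F      3          0.15                1.00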
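A sketch of the running total in Python, under the assumption that the categories are already listed in their natural order (A through F). It accumulates the counts first and divides by the total only at the end, rather than adding rounded decimals over and over:

```python
from itertools import accumulate

grades = ["A", "B", "C", "D", "F"]  # categories in their natural order
counts = [3, 5, 3, 6, 3]            # frequencies from the table
n = sum(counts)                     # 20 students

# Running totals of the counts: 3, 3+5, 3+5+3, ...
cum_counts = list(accumulate(counts))
# Convert each running total to a proportion of all values
cum_rel_freq = [c / n for c in cum_counts]

for g, c, crf in zip(grades, counts, cum_rel_freq):
    print(f"{g}  {c}  {c / n:.2f}  {crf:.2f}")
```

The last entry is always 1.00 (all of the data), which makes a quick check on the arithmetic.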


Example 1

Fifty part-time students were asked how many courses they were taking this term. The (incomplete) results are shown below:

# of Courses  Frequency  Relative Frequency  Cumulative Relative Frequency
1             30         0.6                 ___
2             15         ___                 ___
3             ___        ___                 ___

a. Fill in the blanks in the table.

b. What percent of students take exactly two courses?

c. What percent of students take at most two courses?
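One way to check the arithmetic for Example 1 is the short Python sketch below. It assumes, as the table's rows suggest, that every student takes 1, 2, or 3 courses, so the missing frequency is whatever remains of the 50 students:

```python
total = 50                           # students surveyed
freq = {1: 30, 2: 15}                # frequencies given in the table
freq[3] = total - freq[1] - freq[2]  # assumes 3 is the only remaining category

running = 0
rows = []
for courses in sorted(freq):
    running += freq[courses]
    # (courses, frequency, relative frequency, cumulative relative frequency)
    rows.append((courses, freq[courses], freq[courses] / total, running / total))

print("# of Courses  Frequency  Relative  Cumulative")
for courses, f, rel, cum in rows:
    print(f"{courses:<13} {f:<10} {rel:<9.2f} {cum:.2f}")
```

With these numbers the third row has a frequency of 5, so 30% of the students take exactly two courses (relative frequency 0.3) and 90% take at most two (cumulative relative frequency 0.9).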


Graphs of Categorical Data

There are two simple visual summaries used for categorical data:

Circle graphs (pie charts) show the amount of data that belongs to each category as a proportional part of the whole.

Bar graphs consist of bars that are separated from each other. The bars can be flat rectangles or three-dimensional rectangular boxes, and they can be vertical or horizontal.


Graphs of Categorical Data

To get a better sense of graphing categorical data, consider the following table about the Titanic. The table lists the number and percentage of people in each class on the Titanic's voyage.

CLASS   FREQUENCY  RELATIVE FREQUENCY
First   325        14.77%
Second  285        12.95%
Third   706        32.08%
Crew    885        40.21%
Total   2201       100.01%

(The percentages total 100.01% rather than 100% because each one is rounded to two decimal places.)
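To see where the 100.01% total comes from, here is a small Python sketch (the dictionary layout is just an illustrative choice) that recomputes each percentage from the counts and rounds it to two decimals:

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(counts.values())  # 2201 people aboard

# Percentage of the total in each class, rounded to two decimals as in the table
pct = {cls: round(100 * n / total, 2) for cls, n in counts.items()}

print(pct)
# Each value was rounded separately, so the sum can miss 100 slightly
print(round(sum(pct.values()), 2))
```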


Pie Charts

When you are interested in relative frequencies, a pie chart might be your display of choice. Pie charts slice the circle into pieces whose size is proportional to the fraction of the whole in each category.
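Because each slice is proportional to its fraction of the whole, its central angle is that fraction of 360 degrees. A minimal Python sketch using the Titanic class counts (variable names are illustrative):

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(counts.values())

# Central angle of a slice = (category count / total) * 360 degrees
angles = {cls: 360 * n / total for cls, n in counts.items()}

for cls, deg in angles.items():
    print(f"{cls:<7} {deg:6.1f} degrees")
```

The angles add up to 360 degrees, the whole circle, just as the relative frequencies add up to 1.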


Pie Charts

There are two rules to follow when creating a pie chart:

1) The pieces have to add up to 100%.
2) No person can be represented in more than one piece.

Bad pie chart (figure): the slices add up to 271%, even without an "Other" category.
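The two rules can be checked mechanically. The Python sketch below is illustrative only: the survey responses are hypothetical, each respondent's answers are stored as a set, and the 0.05 percentage-point tolerance for rounding is an assumed value, not part of the rules. One common way a pie chart ends up over 100% is a question that lets people pick more than one answer, which breaks rule 2 and, as a result, rule 1.

```python
from collections import Counter

def slice_percentages(responses):
    """Percent of respondents who chose each category."""
    n = len(responses)
    counts = Counter(choice for choices in responses for choice in choices)
    return {category: 100 * c / n for category, c in counts.items()}

def follows_pie_rules(responses, tolerance=0.05):
    # Rule 2: every person appears in exactly one slice
    one_slice_each = all(len(choices) == 1 for choices in responses)
    # Rule 1: slices add up to 100% (tolerance is an assumed rounding allowance)
    total = sum(slice_percentages(responses).values())
    return one_slice_each and abs(total - 100) <= tolerance

# Hypothetical survey: the second person chose two answers
responses = [{"Dog"}, {"Dog", "Cat"}, {"Fish"}]
print(round(sum(slice_percentages(responses).values())))  # 133
print(follows_pie_rules(responses))                       # False
```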


Example 2

Which set of percentages would best fit this pie chart?

A. 54%, 8%, 30%, 8%
B. 47%, 23%, 8%, 22%
C. 51%, 17%, 15%, 17%
D. 27%, 26%, 24%, 23%


Bar Charts

A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison. Notice that the bars are separated from one another.
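Plotting libraries draw bar charts directly, but the idea fits in a few lines of plain Python. This sketch (the function name and the 40-character width are illustrative choices) prints one horizontal bar per category, with bar lengths proportional to the counts so the categories can be compared side by side:

```python
def text_bar_chart(counts, width=40):
    """Return one text bar per category; the largest count gets `width` characters."""
    largest = max(counts.values())
    lines = []
    for label, n in counts.items():
        bar = "#" * round(width * n / largest)  # length proportional to the count
        lines.append(f"{label:<7}|{bar} {n}")
    return lines

titanic = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
for line in text_bar_chart(titanic):
    print(line)
```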


Pie Charts vs. Bar Charts

While pie charts are well known, they are not

typically as useful as other charts. It is generally

more difficult to compare group sizes in a pie chart

than in a bar chart, especially when categories

have nearly identical counts or proportions.


Example 3

Use the graphs to rank the categories from largest

to smallest.


Example 4

Which category is largest? Which is smallest?


The Titanic

Here is part of a data matrix about the passengers and crew aboard the Titanic. Each case (row) of the data table represents a person on board the ship.

Survived  Age    Sex     Class
Died      Adult  Male    Third
Survived  Adult  Male    Crew
Died      Child  Male    Third
Survived  Child  Female  First
Died      Adult  Male    Third
Died      Adult  Female  Crew


The Titanic

The problem with data matrices is that you can't see what's going on. And seeing is just what we want to do. We need ways to show the data so that we can see patterns, relationships, trends, and exceptions.


The Titanic

To look at two categorical variables together, we often arrange the counts in a two-way table. Here is a two-way table of those aboard the Titanic, classified according to class of ticket and whether or not they survived.

          Class
Survival  First  Second  Third  Crew  Total
Survived    203     118    178   212    711
Died        122     167    528   673   1490
Total       325     285    706   885   2201
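Building a two-way table from a data matrix amounts to counting each combination of values. A minimal Python sketch, using only the six rows of the data matrix excerpt shown earlier (so the counts are tiny compared with the full table):

```python
from collections import Counter

# The six rows from the data matrix excerpt: (Survival, Age, Sex, Class)
rows = [
    ("Died", "Adult", "Male", "Third"),
    ("Survived", "Adult", "Male", "Crew"),
    ("Died", "Child", "Male", "Third"),
    ("Survived", "Child", "Female", "First"),
    ("Died", "Adult", "Male", "Third"),
    ("Died", "Adult", "Female", "Crew"),
]

# Count each (Survival, Class) pair; Age and Sex are ignored here
cells = Counter((survival, cls) for survival, _age, _sex, cls in rows)

classes = ["First", "Second", "Third", "Crew"]
print("Survival ", classes)
for survival in ["Survived", "Died"]:
    # A Counter returns 0 for combinations that never occur
    print(f"{survival:<9}", [cells[(survival, c)] for c in classes])
```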


The Titanic

Because the table shows how the individuals are distributed along each variable, contingent on the value of the other variable, such a table is called a contingency table.


Contingency Tables

The margins of the table, both on the right and at the bottom, give totals. The bottom line is just the frequency table of the variable Class.

Class   Frequency
First   325
Second  285
Third   706
Crew    885
Total   2201
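The margins can be recomputed from the cells. A short Python sketch, assuming the Titanic counts are stored as one list per Survival row (an illustrative layout):

```python
classes = ["First", "Second", "Third", "Crew"]
# Cell counts from the Titanic two-way table, one list per Survival row
table = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}

# Right margin: the total of each row (the frequency table of Survival)
row_totals = {status: sum(counts) for status, counts in table.items()}
# Bottom margin: the total of each column (the frequency table of Class)
col_totals = [sum(column) for column in zip(*table.values())]
grand_total = sum(row_totals.values())

print(row_totals)                      # {'Survived': 711, 'Died': 1490}
print(dict(zip(classes, col_totals)))  # {'First': 325, 'Second': 285, 'Third': 706, 'Crew': 885}
print(grand_total)                     # 2201
```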


Contingency Tables

The right column of the table is the frequency table of the variable Survival.


Survival  Frequency
Survived  711
Died      1490
Total     2201


Contingency Tables

Each cell of the table gives the count for a combination of values of the two variables. For example, the cell in the Survived row and the Second column shows that 118 second-class passengers survived.

So what does the green highlighted cell show?


Row Proportions

The table below shows the row proportions for the Titanic data set. The row proportions are computed as the counts divided by their row totals.

          Class
Survival  First            Second           Third            Crew             Total
Survived  203/711 = .286   118/711 = .166   178/711 = .250   212/711 = .298   711/711 = 1.000
Died      122/1490 = .082  167/1490 = .112  528/1490 = .354  673/1490 = .452  1490/1490 = 1.000
Total     325/2201 = .148  285/2201 = .129  706/2201 = .321  885/2201 = .402  2201/2201 = 1.000
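The same computation in Python, as an illustrative sketch with the counts stored one list per row: each row is divided by its own total, and the Total row is built from the column sums.

```python
# Cell counts from the Titanic two-way table, one list per Survival row
table = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}
# Bottom "Total" row = column sums, computed before it is added to the table
total_row = [sum(column) for column in zip(*table.values())]
table["Total"] = total_row

# Row proportion = cell count / that row's total, rounded to 3 places
row_props = {
    status: [round(c / sum(counts), 3) for c in counts]
    for status, counts in table.items()
}

for status, props in row_props.items():
    print(f"{status:<9}", props)
```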


Row Proportions

So what does 203/711 = .286 (first column, first row) represent?

It is the proportion of survivors who were in first class: 28.6% of the 711 people who survived were first-class passengers.


Example 5

a) What does 167/1490 = .112 (second column, second row) represent in the table?

b) What does 885/2201 = .402 (fourth column, third row) represent in the table?


Column Proportions

A contingency table of the column proportions is computed in a similar way, where each column proportion is computed as the count divided by the corresponding column total.

          Class
Survival  First            Second           Third            Crew             Total
Survived  203/325 = .625   118/285 = .414   178/706 = .252   212/885 = .240   711/2201 = .323
Died      122/325 = .375   167/285 = .586   528/706 = .748   673/885 = .760   1490/2201 = .677
Total     325/325 = 1.000  285/285 = 1.000  706/706 = 1.000  885/885 = 1.000  2201/2201 = 1.000
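A sketch of the column proportions for the Survived row (the per-class survival rates), again with illustrative variable names: each count is divided by its column total rather than its row total.

```python
classes = ["First", "Second", "Third", "Crew"]
survived = [203, 118, 178, 212]
died = [122, 167, 528, 673]

# Column proportion for the Survived row = survivors / column total
survival_rate = {
    cls: round(s / (s + d), 3) for cls, s, d in zip(classes, survived, died)
}
# The Total column uses the grand totals in the same way
overall_rate = round(sum(survived) / (sum(survived) + sum(died)), 3)

print(survival_rate)  # {'First': 0.625, 'Second': 0.414, 'Third': 0.252, 'Crew': 0.24}
print(overall_rate)   # 0.323
```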


Example 6

a) What does 167/285 = .586 (second column, second row) represent in the table?

b) What does 711/2201 = .323 (fifth column, first row) represent in the table?


Column Proportions

In the table, the value 0.625 indicates that 62.5% of first-class passengers survived. This survival rate is much higher than that of second-class passengers (41.4%), third-class passengers (25.2%), or crew members (24.0%).


Column Proportions

Because these differences in survival rates between the classes are unlikely to be due to random chance alone, they provide evidence that the class and survival variables are associated. We say the two variables are dependent.


Example 7

A random sample of 100 pet owners was polled to see whether there was an association between gender and whether they preferred a dog or a cat. The results of the survey are below.

        Dog  Cat  Total
Male     40   10     50
Female   20   30     50
Total    60   40    100

Example 7 continued

a) Compute and interpret the column proportions.

b) Does there appear to be an association between gender and type of pet? Explain.


Example 8

There are 10 boys and 12 girls in Mr. Fleck's fourth-grade class and 15 boys and 18 girls in Mrs. Parker’s fourth-grade class. One student is randomly selected to be hall monitor.

a) Use this information to complete the contingency table below.

             Gender
Teacher      Boy   Girl  Total
Mr. Fleck    ___   ___   ___
Mrs. Parker  ___   ___   ___
Total        ___   ___   ___

Example 8 continued

b) Compute and interpret the row proportions.

c) Does there appear to be an association between the teacher and the student's gender? Explain.

             Gender
Teacher      Boy   Girl  Total
Mr. Fleck     10     12     22
Mrs. Parker   15     18     33
Total         25     30     55
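A sketch of the row-proportion check for the completed Mr. Fleck / Mrs. Parker table, using Python's `fractions.Fraction` so the comparison is exact rather than relying on floating-point equality. The dictionary layout and names are illustrative assumptions.

```python
from fractions import Fraction

# Completed table: teacher -> (boys, girls)
counts = {"Mr. Fleck": (10, 12), "Mrs. Parker": (15, 18)}
boys_total = sum(b for b, _ in counts.values())   # 25
girls_total = sum(g for _, g in counts.values())  # 30
counts["Total"] = (boys_total, girls_total)

# Row proportion of boys = boys / row total, kept as an exact fraction
prop_boys = {t: Fraction(b, b + g) for t, (b, g) in counts.items()}

for teacher, p in prop_boys.items():
    print(f"{teacher:<12} boys {p} ({float(p):.3f})   girls {1 - p}")

# In these data, identical row proportions mean the gender split
# does not change from one teacher to the other
print(len(set(prop_boys.values())) == 1)  # True
```

Every row reduces to 5/11 boys and 6/11 girls, the same split as the Total row.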