146 11 categorical_data online
![Page 1: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/1.jpg)
MATH& 146
Lesson 11
Section 1.6
Categorical Data
![Page 2: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/2.jpg)
Frequency
The first step in organizing categorical data is to count the number of data values in each category of interest.
We can organize these counts (or frequencies) into a frequency table, which records the category names and the total for each.
![Page 3: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/3.jpg)
Frequency
A class with 20 students had the following
distribution of grades:
A, A, A, B, B, B, B, B, C, C, C, D, D, D, D, D, D, F, F, F
GRADE FREQUENCY
A 3
B 5
C 3
D 6
F 3
![Page 4: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/4.jpg)
GRADE FREQUENCY RELATIVE FREQUENCY
A 3 0.15
B 5 0.25
C 3 0.15
D 6 0.30
F 3 0.15
Relative Frequency
A relative frequency is the proportion of times a
category occurs: its frequency divided by the total
number of data values. Relative frequencies can be
written as fractions, decimals, or percents.
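As a minimal sketch (not part of the original slides), the frequency and relative frequency columns for the grade data could be computed in Python as follows. The variable names are illustrative.

```python
from collections import Counter

# Grades for the class of 20 students, as listed in the lesson.
grades = "A A A B B B B B C C C D D D D D D F F F".split()

# Frequency: how many times each category occurs.
frequency = Counter(grades)

# Relative frequency: frequency divided by the total number of values.
total = len(grades)
relative_frequency = {grade: count / total for grade, count in frequency.items()}

for grade in sorted(frequency):
    print(grade, frequency[grade], f"{relative_frequency[grade]:.2f}")
```

Multiplying a relative frequency by 100 gives the same value as a percent.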
![Page 5: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/5.jpg)
GRADE FREQUENCY RELATIVE FREQUENCY CUMULATIVE RELATIVE FREQUENCY
A 3 0.15 0.15
B 5 0.25 0.40
C 3 0.15 0.55
D 6 0.30 0.85
F 3 0.15 1.00
Cumulative Relative Frequency
Cumulative relative frequency is the running total of the
relative frequencies up to and including the current category.
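The running total can be sketched in Python as shown here, starting from the frequencies in the grade table. This is one possible approach: it accumulates the counts first and divides once at the end, which is an implementation choice rather than something the slides prescribe.

```python
from itertools import accumulate

# Frequencies from the grade table (A, B, C, D, F).
grades = ["A", "B", "C", "D", "F"]
frequency = [3, 5, 3, 6, 3]
total = sum(frequency)

# Accumulate the counts first, then divide once, to limit rounding error.
running_counts = list(accumulate(frequency))
cumulative_relative = [count / total for count in running_counts]

for grade, value in zip(grades, cumulative_relative):
    print(grade, f"{value:.2f}")
```

The last cumulative relative frequency is always 1, because by then every data value has been counted.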
![Page 6: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/6.jpg)
Example 1
Fifty part-time students were asked how many courses
they were taking this term. The (incomplete) results
are shown below:

| # of Courses | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 1 | 30 | 0.6 | |
| 2 | 15 | | |
| 3 | | | |

a. Fill in the blanks in the table above.
b. What percent of students take exactly two courses?
c. What percent of students take at most two courses?
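One way to work through Example 1 is sketched below. It assumes that the three rows account for all 50 students and that the 0.6 in the first row is the relative frequency for one course (30/50); neither is stated outright on the slide.

```python
total = 50                       # fifty part-time students
frequency = {1: 30, 2: 15}       # counts given in the table

# Assumption: the remaining students are all taking three courses.
frequency[3] = total - sum(frequency.values())

relative = {k: f / total for k, f in frequency.items()}

# Cumulative relative frequency: running count divided by the total.
cumulative = {}
running = 0
for courses in sorted(frequency):
    running += frequency[courses]
    cumulative[courses] = running / total

print(frequency[3])               # students taking three courses
print(f"{relative[2]:.0%}")       # part b: exactly two courses
print(f"{cumulative[2]:.0%}")     # part c: at most two courses
```

"At most two" means one or two courses, so part c reads the cumulative column rather than the relative frequency column.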
![Page 7: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/7.jpg)
Graphs of Categorical Data
There are two simple visual summaries that are
used for categorical data:
Circle graphs (pie charts) show the amount of
data that belong to each category as a proportional
part of the whole.
Bar graphs consist of bars that are separated
from each other. The bars can be rectangles or
rectangular boxes, and they can be vertical or
horizontal.
![Page 8: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/8.jpg)
Graphs of Categorical Data
To get a better sense of graphing categorical data,
consider the following table about the Titanic. The
table lists the number and percentage of people in
each class aboard the Titanic.
CLASS FREQUENCY RELATIVE FREQUENCY
First 325 14.77%
Second 285 12.95%
Third 706 32.08%
Crew 885 40.21%
Total 2201 100.01%
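The total of 100.01% is a rounding effect: each percentage is rounded to two decimal places before the column is added. A short Python sketch reproduces this from the counts in the table; the dictionary layout is only an illustration.

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(counts.values())

# Percentages rounded to two decimal places, as in the table.
percents = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, pct in percents.items():
    print(f"{name}: {pct:.2f}%")

# Each rounded value is slightly off, so the rounded total need not be 100.
print(f"Sum of rounded percentages: {sum(percents.values()):.2f}%")
```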
![Page 9: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/9.jpg)
Pie Charts
When you are interested in relative frequencies, a
pie chart might be your display of choice.
Pie charts slice a circle into pieces whose sizes are
proportional to the fraction of the whole in each
category.
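"Proportional" here means that each slice's central angle is its relative frequency times 360 degrees. As a small sketch, using the Titanic class counts from the earlier table:

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}
total = sum(counts.values())

# Each slice's central angle is its share of the full 360 degrees.
angles = {name: 360 * n / total for name, n in counts.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles add up to 360 degrees, just as the relative frequencies add up to 1.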
![Page 10: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/10.jpg)
![Page 11: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/11.jpg)
Pie Charts
There are two rules to follow when creating a pie chart:
1) The pieces have to add up to 100%.
2) No person can be represented in more than one piece.
BAD PIE CHART: 271% even without an Other category.
![Page 12: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/12.jpg)
Example 2
Which set of percentages
would best fit this pie
chart?
A. 54%, 8%, 30%, 8%
B. 47%, 23%, 8%, 22%
C. 51%, 17%, 15%, 17%
D. 27%, 26%, 24%, 23%
![Page 13: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/13.jpg)
Bar Charts
A bar chart displays the distribution of a
categorical variable, showing the counts for each
category next to each other for easy comparison.
Notice that the bars are separated from one another.
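To illustrate the idea without any plotting library, here is a rough text-only bar chart of the Titanic class counts. The scale of one `#` per 25 people is an arbitrary choice made for this sketch.

```python
counts = {"First": 325, "Second": 285, "Third": 706, "Crew": 885}

SCALE = 25  # hypothetical choice: one '#' per 25 people


def bar(count, scale=SCALE):
    """Return a run of '#' characters whose length is proportional to count."""
    return "#" * round(count / scale)


# One bar per category, each on its own line so the bars stay separated.
for name, n in counts.items():
    print(f"{name:<7}{bar(n)} {n}")
```

Because every bar starts from the same baseline, comparing categories comes down to comparing lengths.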
![Page 14: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/14.jpg)
Pie Charts vs. Bar Charts
While pie charts are well known, they are not
typically as useful as other charts. It is generally
more difficult to compare group sizes in a pie chart
than in a bar chart, especially when categories
have nearly identical counts or proportions.
![Page 15: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/15.jpg)
Example 3
Use the graphs to rank the categories from largest
to smallest.
![Page 16: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/16.jpg)
Example 4
Which category is largest? Which is smallest?
![Page 17: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/17.jpg)
The Titanic
Here is part of a data matrix about the passengers
and crew aboard the Titanic. Each case (row) of
the data table represents a person on board the
ship.
Survived Age Sex Class
Died Adult Male Third
Survived Adult Male Crew
Died Child Male Third
Survived Child Female First
Died Adult Male Third
Died Adult Female Crew
![Page 18: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/18.jpg)
The Titanic
The problem with data matrices is that you can't
see what's going on. And seeing is just what we
want to do. We need ways to show the data so
that we can see patterns, relationships, trends,
and exceptions.
![Page 19: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/19.jpg)
The Titanic
To look at two categorical variables together, we
often arrange the counts in a two-way table. Here
is a two-way table of those aboard the Titanic,
classified according to class of ticket and whether
or not they survived.
| Survival \ Class | First | Second | Third | Crew | Total |
|---|---|---|---|---|---|
| Survived | 203 | 118 | 178 | 212 | 711 |
| Died | 122 | 167 | 528 | 673 | 1490 |
| Total | 325 | 285 | 706 | 885 | 2201 |
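A two-way table is a count of how often each pair of values occurs. The sketch below builds the cells from only the six cases shown in the earlier data matrix, so its counts are tiny compared with the full table of 2201 people.

```python
from collections import Counter

# The six cases shown in the data matrix: (survival, class) for each person.
cases = [
    ("Died", "Third"),
    ("Survived", "Crew"),
    ("Died", "Third"),
    ("Survived", "First"),
    ("Died", "Third"),
    ("Died", "Crew"),
]

# Each cell of a two-way table counts one combination of the two variables.
cells = Counter(cases)

classes = ["First", "Second", "Third", "Crew"]
for survival in ["Survived", "Died"]:
    row = [cells[(survival, c)] for c in classes]
    print(survival, row, sum(row))
```

Combinations that never occur in the data, such as any second-class case here, simply get a count of zero.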
![Page 20: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/20.jpg)
The Titanic
Because the table shows how the individuals are
distributed along each variable, contingent on the
value of the other variable, such a table is called a
contingency table.
![Page 21: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/21.jpg)
Contingency Tables
The margins of the table, both on
the right and at the bottom, give
totals. The bottom line is just the
frequency table of the variable
Class.
Class Frequency
First 325
Second 285
Third 706
Crew 885
Total 2201
![Page 22: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/22.jpg)
Contingency Tables
The right column of the table is the frequency table
of the variable Survival.
Survival Frequency
Survived 711
Died 1490
Total 2201
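Both margins can be recovered from the eight interior cells alone. A minimal Python sketch, with the cell counts copied from the Titanic contingency table:

```python
classes = ["First", "Second", "Third", "Crew"]
cells = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}

# Right margin: add across each row (frequency table of Survival).
row_totals = {survival: sum(counts) for survival, counts in cells.items()}

# Bottom margin: add down each column (frequency table of Class).
column_totals = dict(zip(classes, (sum(col) for col in zip(*cells.values()))))

grand_total = sum(row_totals.values())
print(row_totals)
print(column_totals)
print(grand_total)
```

Either margin adds up to the same grand total, which is a quick check that a table was copied correctly.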
![Page 23: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/23.jpg)
Contingency Tables
Each cell of the table gives the count for a
combination of values of the two variables. For
example, the highlighted cell shows that 118
second-class passengers survived.
So what does the green highlighted cell show?
| Survival \ Class | First | Second | Third | Crew | Total |
|---|---|---|---|---|---|
| Survived | 203 | 118 | 178 | 212 | 711 |
| Died | 122 | 167 | 528 | 673 | 1490 |
| Total | 325 | 285 | 706 | 885 | 2201 |
![Page 24: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/24.jpg)
Row Proportions
The table below shows the row proportions for
the Titanic data set. The row proportions are
computed as the counts divided by their row totals.
| Survival \ Class | First | Second | Third | Crew | Total |
|---|---|---|---|---|---|
| Survived | 203/711 = .286 | 118/711 = .166 | 178/711 = .250 | 212/711 = .298 | 711/711 = 1.000 |
| Died | 122/1490 = .082 | 167/1490 = .112 | 528/1490 = .354 | 673/1490 = .452 | 1490/1490 = 1.000 |
| Total | 325/2201 = .148 | 285/2201 = .129 | 706/2201 = .321 | 885/2201 = .402 | 2201/2201 = 1.000 |
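The row proportions can be reproduced with a short Python sketch. The counts come from the Titanic contingency table; the dictionary layout is just one convenient way to hold them.

```python
classes = ["First", "Second", "Third", "Crew"]
counts = {
    "Survived": [203, 118, 178, 212],
    "Died": [122, 167, 528, 673],
}

# Divide every count by the total of the row it sits in.
row_proportions = {
    survival: [n / sum(row) for n in row] for survival, row in counts.items()
}

for survival, proportions in row_proportions.items():
    cells = ", ".join(f"{c}: {p:.3f}" for c, p in zip(classes, proportions))
    print(f"{survival}: {cells}")
```

Each row of proportions adds up to 1, since every person in that row belongs to exactly one class.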
![Page 25: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/25.jpg)
Row Proportions
So what does 203/711 = .286 (first column, first
row) represent?
It corresponds to the proportion of survivors who
were in first class.
![Page 26: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/26.jpg)
Example 5
a) What does 167/1490 = .112 (second column,
second row) represent in the table?
b) What does 885/2201 = .402 (fourth column,
third row) represent in the table?
| Survival \ Class | First | Second | Third | Crew | Total |
|---|---|---|---|---|---|
| Survived | 203/711 = .286 | 118/711 = .166 | 178/711 = .250 | 212/711 = .298 | 711/711 = 1.000 |
| Died | 122/1490 = .082 | 167/1490 = .112 | 528/1490 = .354 | 673/1490 = .452 | 1490/1490 = 1.000 |
| Total | 325/2201 = .148 | 285/2201 = .129 | 706/2201 = .321 | 885/2201 = .402 | 2201/2201 = 1.000 |
![Page 27: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/27.jpg)
Column Proportions
A contingency table of the column proportions is
computed in a similar way, where each column
proportion is computed as the count divided by the
corresponding column total.
| Survival \ Class | First | Second | Third | Crew | Total |
|---|---|---|---|---|---|
| Survived | 203/325 = .625 | 118/285 = .414 | 178/706 = .252 | 212/885 = .240 | 711/2201 = .323 |
| Died | 122/325 = .375 | 167/285 = .586 | 528/706 = .748 | 673/885 = .760 | 1490/2201 = .677 |
| Total | 325/325 = 1.000 | 285/285 = 1.000 | 706/706 = 1.000 | 885/885 = 1.000 | 2201/2201 = 1.000 |
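For column proportions, each count is divided by the total of its own class. The sketch below computes the "Survived" row of the column-proportion table, plus the overall survival rate from the Total column; the variable names are illustrative.

```python
classes = ["First", "Second", "Third", "Crew"]
survived = [203, 118, 178, 212]
died = [122, 167, 528, 673]

# Divide each count by the total of the column (class) it sits in.
survival_rate = {}
for name, s, d in zip(classes, survived, died):
    column_total = s + d
    survival_rate[name] = s / column_total

# Overall rate from the Total column, for comparison.
overall = sum(survived) / (sum(survived) + sum(died))

for name, rate in survival_rate.items():
    print(f"{name}: {rate:.3f}")
print(f"Overall: {overall:.3f}")
```

The "Died" proportions are the complements: each one is 1 minus the survival rate for that class.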
![Page 28: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/28.jpg)
Example 6
a) What does 167/285 = .586 (second column,
second row) represent in the table?
b) What does 711/2201 = .323 (fifth column, first
row) represent in the table?
| Survival \ Class | First | Second | Third | Crew | Total |
|---|---|---|---|---|---|
| Survived | 203/325 = .625 | 118/285 = .414 | 178/706 = .252 | 212/885 = .240 | 711/2201 = .323 |
| Died | 122/325 = .375 | 167/285 = .586 | 528/706 = .748 | 673/885 = .760 | 1490/2201 = .677 |
| Total | 325/325 = 1.000 | 285/285 = 1.000 | 706/706 = 1.000 | 885/885 = 1.000 | 2201/2201 = 1.000 |
![Page 29: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/29.jpg)
Column Proportions
In the table, the value 0.625 indicates that 62.5%
of first-class passengers survived. This survival rate
is much higher than that of second-class
passengers (41.4%), third-class passengers
(25.2%), or crew members (24.0%).
![Page 30: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/30.jpg)
Column Proportions
Because these differences in survival rates
between the classes are unlikely to be due to random
chance alone, they provide evidence that the class
and survival variables are associated. We say the
two variables are dependent.
![Page 31: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/31.jpg)
Example 3
A random sample of 100 people who have pets was
polled to see if there is an association between
gender and whether they prefer a dog or
a cat. The results of the survey are below.
Dog Cat Total
Male 40 10 50
Female 20 30 50
Total 60 40 100
![Page 32: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/32.jpg)
Example 3 continued
a) Compute and interpret the column proportions.
b) Does there appear to be an association
between gender and type of pet? Explain.
Dog Cat Total
Male 40 10 50
Female 20 30 50
Total 60 40 100
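A possible way to carry out part a) in Python is sketched below. The nested-dictionary layout is an assumption of this sketch, not something the slides prescribe.

```python
table = {
    "Male": {"Dog": 40, "Cat": 10},
    "Female": {"Dog": 20, "Cat": 30},
}

# Column totals: how many respondents prefer each pet.
column_totals = {
    pet: sum(row[pet] for row in table.values()) for pet in ("Dog", "Cat")
}

# Column proportion: share of each gender within a pet preference.
for pet, total in column_totals.items():
    for gender, row in table.items():
        print(f"{gender} among those preferring {pet}: {row[pet] / total:.3f}")
```

If gender and pet preference were unrelated, we would expect the male share to be about the same in the Dog column as in the Cat column. How far apart the two shares are is the thing to look at in part b).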
![Page 33: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/33.jpg)
Example 4
There are 10 boys and 12 girls in Mr. Fleck's fourth
grade class and 15 boys and 18 girls in Mrs. Parker’s
fourth grade class. One student is randomly selected
to be hall monitor.
a) Use this information to complete the contingency
table below.
| Teacher \ Gender | Boy | Girl | Total |
|---|---|---|---|
| Mr. Fleck | | | |
| Mrs. Parker | | | |
| Total | | | |
![Page 34: 146 11 categorical_data online](https://reader034.vdocument.in/reader034/viewer/2022042906/5899b3ff1a28aba11e8b55fb/html5/thumbnails/34.jpg)
Example 4 continued
b) Compute and interpret the row proportions.
c) Does there appear to be an association between
teacher and student's gender? Explain.
Gender
Boy Girl Total
Mr. Fleck 10 12 22
Mrs. Parker 15 18 33
Total 25 30 55
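The row proportions for this table can be checked exactly with Python's `fractions` module. This is a sketch of one approach; using exact fractions instead of decimals is a choice made here so that equal proportions compare as equal.

```python
from fractions import Fraction

table = {
    "Mr. Fleck": {"Boy": 10, "Girl": 12},
    "Mrs. Parker": {"Boy": 15, "Girl": 18},
}

# Row proportion of boys: boys divided by that teacher's class size.
boy_share = {
    teacher: Fraction(row["Boy"], row["Boy"] + row["Girl"])
    for teacher, row in table.items()
}

for teacher, share in boy_share.items():
    print(f"{teacher}: {share} of the class are boys ({float(share):.3f})")

# Equal row proportions mean the gender mix does not depend on the teacher.
print(boy_share["Mr. Fleck"] == boy_share["Mrs. Parker"])
```

When the row proportions match across every row, as they do here, the table shows no sign of an association between the two variables.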