stats chapter 4
![Page 1: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/1.jpg)
Chapter 4
More about relationships between 2 variables
![Page 2: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/2.jpg)
4.1 TRANSFORMING TO ACHIEVE LINEARITY
![Page 3: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/3.jpg)
What if the scatterplot is not linear?
• Of course, not all data is linear!
• Our method in statistics will involve mathematically operating on one or both of the explanatory and response variables
• An inverse transformation will be used to create a non-linear regression model
• This will be a little “mathy”
![Page 4: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/4.jpg)
Transformations
• Before we begin transformations, remember that some well-known phenomena act in predictable ways
– E.g., when working with time and gravity, you should know that there is a square relationship between distance and time!
![Page 5: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/5.jpg)
The Basics
• The data from measurements (raw data) must be operated on.
• Apply the same mathematical transformation to every raw data value
– Ex. “Square every response”
• Use methods from the previous chapter to find the LSRL for the transformed data
• Analyze your regression to ensure the LSRL is appropriate
• Apply an inverse transformation on the LSRL to find the regression for the raw data.
![Page 6: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/6.jpg)
Example
Please refer to p. 265, exercise 4.2
Length (cm) Period (s)
16.5 0.777
17.5 0.839
19.5 0.912
22.5 0.878
28.5 1.004
31.5 1.087
34.5 1.129
37.5 1.111
43.5 1.290
46.5 1.371
106.5 2.115
![Page 7: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/7.jpg)
Example
• Data entered into L1 and L2
• Scatterplot
• Looks pretty good, right?
![Page 8: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/8.jpg)
Exercise
• LSRL
• Y = 0.6 + 0.015X, r = 0.991
• Residual Plot
• Perhaps we can do better!
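The calculator steps above can be reproduced by hand. The following is a minimal Python sketch, not the slide author's method: it fits the LSRL to the pendulum table using the textbook least-squares formulas. The helper name `lsrl` and the variable names are assumptions made for illustration.

```python
import math

# Pendulum data from the table above: length (cm) and period (s).
length = [16.5, 17.5, 19.5, 22.5, 28.5, 31.5, 34.5, 37.5, 43.5, 46.5, 106.5]
period = [0.777, 0.839, 0.912, 0.878, 1.004, 1.087, 1.129, 1.111, 1.290, 1.371, 2.115]


def lsrl(x, y):
    """Return (intercept, slope, r) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


a, b, r = lsrl(length, period)

# Residual = observed period minus the period predicted by the line.
residuals = [yi - (a + b * xi) for xi, yi in zip(length, period)]

print(f"y = {a:.3f} + {b:.4f}x, r = {r:.3f}")
for xi, res in zip(length, residuals):
    print(f"length {xi:6.1f} cm  residual {res:+.3f}")
```

Printing the residuals next to the lengths is a rough stand-in for the residual plot: a curved pattern in their signs suggests the straight line is not the best description.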
![Page 9: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/9.jpg)
Example
• L3 = L2^2 (square each period; a pendulum's period varies with the square root of its length)
• LinReg L1, L3
• Note that the value of r^2 has increased
• Note that the value of the residual of the last point has decreased
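For a pendulum, physics says the period grows like the square root of the length, so squaring the period should straighten the plot. The sketch below is one way to check that on the table's data, under that assumption. It compares r^2 before and after squaring the response, then undoes the transformation with a square root. The names `lsrl`, `period_sq` and `predict_period` are hypothetical.

```python
import math

# Pendulum data from the table above: length (cm) and period (s).
length = [16.5, 17.5, 19.5, 22.5, 28.5, 31.5, 34.5, 37.5, 43.5, 46.5, 106.5]
period = [0.777, 0.839, 0.912, 0.878, 1.004, 1.087, 1.129, 1.111, 1.290, 1.371, 2.115]


def lsrl(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)


# Fit on the raw data.
_, _, r2_raw = lsrl(length, period)

# Transform the response (square every period), then fit again.
period_sq = [t ** 2 for t in period]
a_sq, b_sq, r2_sq = lsrl(length, period_sq)


def predict_period(length_cm):
    """Inverse transformation: period^2 = a + b*length, so take the square root.

    Only valid where a + b*length is non-negative.
    """
    return math.sqrt(a_sq + b_sq * length_cm)


print(f"r^2 raw = {r2_raw:.4f}, r^2 with squared period = {r2_sq:.4f}")
print(f"predicted period at 106.5 cm: {predict_period(106.5):.3f} s")
```

The last function is the "inverse transformation" step from The Basics: the line is fitted to the transformed data, and the square root brings the prediction back to seconds.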
![Page 10: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/10.jpg)
Exponential Models
• Many natural phenomena are explained by an exponential model.
• Exponential models are marked by sharp growth or decay.
• Basic model: y = A·B^x
• For this transformation, you need to take the logarithm of the response data.
• You may use “log10” or “ln”, your choice.
– I prefer “ln” (of course)
![Page 11: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/11.jpg)
Exponential Models
After the transformation, we have the following linear model: ln(y) = a + b·x
1. ln(y) = a + b·x
2. e^ln(y) = e^(a + b·x)    exponentiate
3. y = e^a · e^(b·x)    property of exponents
4. Let ‘A’ = e^a and ‘B’ = e^b    redefine variables
5. y = A·B^x    this is our model
![Page 12: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/12.jpg)
Exponential Models
• Since this is an ‘applied math’ course, you need not remember how to derive the inverse transformation
• Whew
• BUT you do need to memorize: when ln(y) = a + b·x, y = A·B^x, where ‘A’ = e^a and ‘B’ = e^b
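To see the rule in action, here is a small Python sketch. The data are made up for illustration (generated exactly from y = 2·1.5^x), since the slide's own data set is not reproduced in this transcript. The helper `lsrl` is also an assumption. The sketch takes ln of the response, fits a line, and converts the intercept and slope back to A and B.

```python
import math

# Hypothetical data, generated exactly from y = 2 * 1.5**x for illustration.
x = [0, 1, 2, 3, 4, 5]
y = [2 * 1.5 ** xi for xi in x]


def lsrl(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Transform the response only: ln(y) = a + b*x.
ln_y = [math.log(yi) for yi in y]
a, b = lsrl(x, ln_y)

# Inverse transformation: A = e^a, B = e^b, so y = A * B**x.
A = math.exp(a)
B = math.exp(b)

print(f"ln(y) = {a:.4f} + {b:.4f}x")
print(f"y = {A:.4f} * ({B:.4f})^x")
```

Because the made-up data follow an exact exponential curve, the fit recovers A = 2 and B = 1.5 up to floating-point error. Real data would leave residuals worth checking.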
![Page 13: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/13.jpg)
Exponential Models
Let’s try this data
![Page 14: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/14.jpg)
Exponential Models
Take the ln of L2 (the response list) and store it in L3
![Page 15: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/15.jpg)
Exponential Models
These are our “transformed responses”
![Page 16: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/16.jpg)
Exponential Models
From our homescreen, we perform an LSRL
using the transformed data
![Page 17: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/17.jpg)
Exponential Models
We don’t have to store this regression for transformed
data
![Page 18: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/18.jpg)
Exponential Models
Take note of the values of ‘a’ and ‘b’
![Page 19: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/19.jpg)
Exponential Models
A quick look at the residuals
![Page 20: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/20.jpg)
Exponential Models
The values of the residuals are small ... no defined pattern
![Page 21: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/21.jpg)
Exponential Models
• Our regression model is exponential: y = A·B^x, where A = e^a and B = e^b
• y = e^0.701 · (e^0.184)^x
• Or y = 2.02 · (1.20)^x
![Page 24: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/24.jpg)
Exponential Models
Put our regression in Y1
![Page 25: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/25.jpg)
Exponential Models
Change Plot1 from a residual plot to a scatterplot
![Page 26: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/26.jpg)
Exponential Models
Looks pretty good, eh?
![Page 27: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/27.jpg)
Power Models
• These models are used when the rate of increase is less severe than in an exponential model, or if you suspect a ‘root’ model
• For this model, you will take the logarithms of both the explanatory variable and the response variable
![Page 28: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/28.jpg)
Power models
LSRL on transformed data yields: ln(y) = a + b·ln(x)
1. ln(y) = a + b·ln(x)
2. e^ln(y) = e^(a + b·ln(x))
3. y = e^a · e^ln(x^b)
4. y = e^a · x^b
5. Let ‘A’ = e^a
6. y = A·x^b
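The derivation above can be checked numerically. This is a minimal sketch on made-up data (generated exactly from y = 3·x^1.5, not the slide's data set). The helper `lsrl` is an assumption. Both lists are log-transformed, a line is fitted, and only the intercept needs converting back.

```python
import math

# Hypothetical data, generated exactly from y = 3 * x**1.5 for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [3 * xi ** 1.5 for xi in x]


def lsrl(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Transform BOTH variables: ln(y) = a + b*ln(x).
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]
a, b = lsrl(ln_x, ln_y)

# Inverse transformation: A = e^a, and the slope b becomes the power.
A = math.exp(a)

print(f"ln(y) = {a:.4f} + {b:.4f} ln(x)")
print(f"y = {A:.4f} * x^{b:.4f}")
```

Unlike the exponential case, the slope is used as is: it becomes the exponent of x, and only the intercept is exponentiated.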
![Page 29: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/29.jpg)
Power models
Let’s use this data to find a power model
![Page 30: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/30.jpg)
Power models
This time we need to transform both lists
![Page 32: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/32.jpg)
Power models
Transformed explanatory = L3
Transformed response = L4
![Page 33: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/33.jpg)
Power models
LSRL on transformed data
no need to store in Y1
![Page 34: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/34.jpg)
Power models
Take note of the values of ‘a’ and ‘b’
![Page 35: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/35.jpg)
Power models
A quick look at the residuals
![Page 36: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/36.jpg)
Power models
Note that we use the transformed explanatory variable
![Page 37: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/37.jpg)
Power models
No defined pattern
![Page 38: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/38.jpg)
Power models
Residuals are all small in size
![Page 39: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/39.jpg)
Power models
• When ln(y) = a + b·ln(x), y = A·x^b, where ‘A’ = e^a
• Our model is y = (e^1.31) · x^1.27
• Or y = 3.71 · x^1.27
![Page 42: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/42.jpg)
Power models
Regression in Y1
![Page 43: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/43.jpg)
Power models
Change from residual plot to scatterplot
![Page 44: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/44.jpg)
Power models
(notice L1 and L2)
![Page 45: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/45.jpg)
Power models
Looks pretty good!
![Page 46: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/46.jpg)
Power models
• Much like the exponential model, you only need to know how the transformed model becomes the model for the raw data.
• When ln(y) = a + b·ln(x), y = A·x^b, where ‘A’ = e^a
![Page 47: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/47.jpg)
Transformation thoughts
• Although this is not a major topic for the course, you still need to be able to apply these two transformations (exponential and power)
• Be sure to check the residuals for the LSRL on transformed data! You may have picked the wrong model :/
• If one model doesn’t work, try the other. I would start with the exponential model.
• Don’t transform into a cockroach. Ask Kafka!
![Page 48: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/48.jpg)
Assn 4.1
• pg 276 #5, 8, 9, 11, 12
![Page 49: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/49.jpg)
4.2 RELATIONSHIPS BETWEEN CATEGORICAL VARIABLES
![Page 50: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/50.jpg)
Marginal Distributions
• Tables that relate two categorical variables are called “Two-Way Tables”
– Ex 4.11 pg 292
• Marginal Distribution
– Very fancy term for “row totals and column totals”
– Named because the totals appear in the margins of the table. Wow.
• Often, each row or column total expressed as a percentage of the grand total is very informative
![Page 51: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/51.jpg)
Marginal Distributions
| Age Group | Female | Male | Total |
|---|---|---|---|
| 15-17 | 89 | 61 | 150 |
| 18-24 | 5668 | 4697 | 10365 |
| 25-34 | 1904 | 1589 | 3494 |
| 35 or older | 1660 | 970 | 2630 |
| Totals | 9321 | 7317 | 16639 |
![Page 52: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/52.jpg)
Marginal Distributions
• Column totals: the bottom “Totals” row (9321 female, 7317 male)
• Row totals: the right-hand “Total” column (150, 10365, 3494, 2630)
• Grand total: the bottom-right cell (16639)
![Page 55: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/55.jpg)
Marginal Distributions “Age Group”
• Row total / grand total: 150/16639 = 0.009 = 0.9%

| Age Group | Female | Male | Total | Marg. Dist. |
|---|---|---|---|---|
| 15-17 | 89 | 61 | 150 | 0.9% |
| 18-24 | 5668 | 4697 | 10365 | 62.3% |
| 25-34 | 1904 | 1589 | 3494 | 21.0% |
| 35 or older | 1660 | 970 | 2630 | 15.8% |
| Totals | 9321 | 7317 | 16639 | 100% |

• Adds to 100%
![Page 60: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/60.jpg)
Marginal Distributions “Gender”
| Age Group | Female | Male | Total |
|---|---|---|---|
| 15-17 | 89 | 61 | 150 |
| 18-24 | 5668 | 4697 | 10365 |
| 25-34 | 1904 | 1589 | 3494 |
| 35 or older | 1660 | 970 | 2630 |
| Totals | 9321 | 7317 | 16639 |
| Margin dist. | 56% | 44% | 100% |

Similarly for columns
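The marginal distributions can be computed directly from the cell counts. The sketch below is illustrative only, and the dictionary layout and variable names are assumptions. It recomputes the totals from the cells, so the 25-34 row total comes out as 3493 and the grand total as 16638, one less than the printed 3494 and 16639. The rounded percentages are unaffected.

```python
# Cell counts from the two-way table above (age group by gender).
counts = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}
genders = ["Female", "Male"]

# Totals in the margins of the table.
row_totals = {age: sum(cells.values()) for age, cells in counts.items()}
col_totals = {g: sum(cells[g] for cells in counts.values()) for g in genders}
grand_total = sum(row_totals.values())

# Marginal distribution = each margin total divided by the grand total.
age_marginal = {age: total / grand_total for age, total in row_totals.items()}
gender_marginal = {g: total / grand_total for g, total in col_totals.items()}

for age, share in age_marginal.items():
    print(f"{age:12s} {share:6.1%}")
for g, share in gender_marginal.items():
    print(f"{g:12s} {share:6.1%}")
```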
![Page 61: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/61.jpg)
Describing Relationships
• Some relationships are easier to see when we look at the proportions within each group
• These distributions are called “Conditional Distributions”
• To find a conditional distribution, express each cell as a percentage of its row or column total.
• Let’s look at the same table, and find the conditional distribution of gender, given each age group
![Page 62: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/62.jpg)
Conditional Distributions
• We will look at the conditional distribution for the 15-17 row
• Female cell: 89/150 (cell total / row total) = 59.3%
• Male cell: 61/150 (cell total / row total) = 40.7%

| Age Group | Female | Male | Total |
|---|---|---|---|
| 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%) |
| 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%) |
| 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%) |
| 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%) |
| Totals | 9321 (56%) | 7317 (44%) | 16639 (100%) |
![Page 69: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/69.jpg)
Conditional Distributions
• The table above has the complete conditional distributions for each row
• For an analysis of the effect of age groups, compare a row’s conditional distribution with the marginal distribution for the columns (56% female, 44% male)
• They should be close, unless there is an effect caused by the age group (?)
• Some rows, such as “35 or older” (63.1% / 36.9%), are not close to the marginal distribution!
![Page 75: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/75.jpg)
Conditional Distributions
• Based on the previous table, the distributions of “gender given age group” are not that different.
• We can see that the “35 or older” group seems to differ slightly from the overall trend.
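The row-by-row percentages can be reproduced with a few lines of Python. This is a sketch under the same assumed dictionary layout as an illustration, not a prescribed method. Each cell is divided by its own row total, and the result is compared with the overall share of females. Row totals are recomputed from the cells, so the 25-34 row uses 3493 rather than the printed 3494. The percentages agree to one decimal place.

```python
# Cell counts from the two-way table (age group by gender).
counts = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}

# Conditional distribution of gender given age group: cell / row total.
gender_given_age = {}
for age, cells in counts.items():
    row_total = sum(cells.values())
    gender_given_age[age] = {g: n / row_total for g, n in cells.items()}

# Marginal share of females, used as the baseline for comparison.
total_female = sum(cells["Female"] for cells in counts.values())
grand_total = sum(sum(cells.values()) for cells in counts.values())
female_marginal = total_female / grand_total

for age, dist in gender_given_age.items():
    gap = dist["Female"] - female_marginal
    print(f"{age:12s} female {dist['Female']:6.1%}  male {dist['Male']:6.1%}  "
          f"gap vs marginal {gap:+.1%}")
```

The printed gap column makes the slide's point concrete: the “35 or older” row sits furthest above the overall female share.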
![Page 76: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/76.jpg)
Conditional Distributions “age group given gender”
| Age Group | Female | Male | Total |
|---|---|---|---|
| 15-17 | 89 (1%) | 61 (0.8%) | 150 (0.9%) |
| 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%) |
| 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%) |
| 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%) |
| Totals | 9321 (100%) | 7317 (100%) | 16639 (100%) |

• Here is the same chart with the conditional distributions by gender…
• Is there a gender effect noticeable from this table?
![Page 80: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/80.jpg)
Conditional Distribution
Conclusions from the previous chart
• Females are more likely to be in the “35 or older” group and less likely to be in the “18 to 24” group
• Males are more likely to be in the “18 to 24” group and less likely to be in the “35 or older” group
• These differences appear slight. Are they actually “significant” with respect to the overall distribution?
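The comparison behind these conclusions can be sketched as follows. This time each cell is divided by its column total, which gives “age group given gender”. The layout and names are assumptions for illustration. The sketch only describes the differences and does not test whether they are statistically significant.

```python
# Cell counts from the two-way table (age group by gender).
counts = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}
genders = ["Female", "Male"]

# Column totals: how many females and how many males overall.
col_totals = {g: sum(cells[g] for cells in counts.values()) for g in genders}

# Conditional distribution of age group given gender: cell / column total.
age_given_gender = {
    g: {age: cells[g] / col_totals[g] for age, cells in counts.items()}
    for g in genders
}

# Difference between the female and male shares in each age group.
difference = {
    age: age_given_gender["Female"][age] - age_given_gender["Male"][age]
    for age in counts
}

for age in counts:
    print(f"{age:12s} female {age_given_gender['Female'][age]:6.1%}  "
          f"male {age_given_gender['Male'][age]:6.1%}  "
          f"difference {difference[age]:+.1%}")
```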
![Page 81: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/81.jpg)
Conditional Distribution
• No single graph portrays the form of the relationship between categorical variables.
• No single numerical measure (such as correlation) summarizes the strength of the association.
![Page 82: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/82.jpg)
Simpson’s Paradox
• Associations that hold true for all of several groups can reverse direction when the data are combined to form a single group.
• EX 4.15 pg 299
• This phenomenon is often the result of an “unaccounted for” variable.
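A small numerical illustration may help. The counts below are illustrative only and are not claimed to be the data of the textbook exercise. They describe two hypothetical hospitals, with patients split by condition on arrival. The names `outcomes`, `by_condition` and `overall` are likewise assumptions. Hospital A has the lower death rate within each condition group, yet the higher death rate once the groups are combined.

```python
# (deaths, patients) by hospital and patient condition: illustrative counts only.
outcomes = {
    "Hospital A": {"good": (6, 600), "poor": (57, 1500)},
    "Hospital B": {"good": (8, 600), "poor": (8, 200)},
}

# Death rate within each condition group.
by_condition = {
    hospital: {cond: deaths / patients for cond, (deaths, patients) in groups.items()}
    for hospital, groups in outcomes.items()
}

# Death rate after combining the groups (ignoring patient condition).
overall = {}
for hospital, groups in outcomes.items():
    total_deaths = sum(deaths for deaths, _ in groups.values())
    total_patients = sum(patients for _, patients in groups.values())
    overall[hospital] = total_deaths / total_patients

for hospital in outcomes:
    print(f"{hospital}: good {by_condition[hospital]['good']:.1%}, "
          f"poor {by_condition[hospital]['poor']:.1%}, "
          f"overall {overall[hospital]:.1%}")
```

The reversal comes from the variable left out of the combined table: Hospital A treats far more patients in poor condition, and that group has a higher death rate at both hospitals.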
![Page 83: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/83.jpg)
Assignment 4.2
• Pg 298 #23-25, 29, 31-35
![Page 84: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/84.jpg)
4.3 ESTABLISHING CAUSATION
![Page 85: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/85.jpg)
Different Relationships
• Suppose two variables (X and Y) have some correlation
– i.e. when X increases in value, Y increases as well
– One of the following relationships may hold.
![Page 86: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/86.jpg)
Different Relationships
Causation
• In this relationship, the explanatory variable is somehow affecting the response variable.
• In most instances, we are looking to find evidence of a causal relationship
![Page 87: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/87.jpg)
Different Relationships
Causation
![Page 88: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/88.jpg)
Different Relationships
Common Response
• In this relationship, both X and Y are correlated with a third (unknown) variable (Z).
• Ex. When Z increases, X increases and Y increases.
• Unless we know about Z, it appears as though X and Y have a causal relationship.
![Page 89: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/89.jpg)
Different Relationships
Common Response
![Page 90: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/90.jpg)
Different Relationships
Confounding
• X and Y have correlation,
• An (often unknown) third variable ‘Z’ also has correlation with Y
• Is X the explanatory variable, or is Z the explanatory variable, or are they both explanatory variables?
![Page 91: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/91.jpg)
Different Relationships
Confounding
![Page 92: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/92.jpg)
Causation
• The best way to establish causation is with a carefully designed experiment
– Possible ‘lurking variables’ are controlled
• Experiments cannot always be conducted
– Many times, they are costly or even unethical
• Some guidelines need to be established in cases where an observational study is the only method to measure variables.
![Page 93: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/93.jpg)
Causation- some criteria
• Association is strong
• Association is consistent (among different studies)
• Larger values of the explanatory variable are associated with stronger responses
• The alleged cause precedes the effect in time
• The alleged cause is plausible
![Page 94: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/94.jpg)
Assignment 4.3
• Pg 312 #41, 45, 50, 51
![Page 95: Stats chapter 4](https://reader033.vdocument.in/reader033/viewer/2022061204/547f26a1b4af9ff0438b4621/html5/thumbnails/95.jpg)
Chapter 4 Review
• #37, 53, 54, 57