NOMINAL RESPONSES: BASELINE-CATEGORY LOGIT MODELS (Agresti 7.1)

Kathy Fung and Lin Zhang

Statistics 6841 Project

Winter 2005

23/4/19 2

Objective

Introduction of NOMINAL RESPONSES (BASELINE-CATEGORY LOGIT MODELS)

The Concept and Example


Model Definition
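For reference, a sketch of the standard form of the model, assuming the usual textbook notation: J response categories, category J taken as the baseline, and x the vector of explanatory variables. The symbols are assumptions of this sketch, not taken from the slide.

```latex
\log \frac{\pi_j(\mathbf{x})}{\pi_J(\mathbf{x})}
  = \alpha_j + \boldsymbol{\beta}_j' \mathbf{x},
  \qquad j = 1, \ldots, J-1,
\qquad \text{where } \pi_j(\mathbf{x}) = P(Y = j \mid \mathbf{x}),
\quad \sum_{j=1}^{J} \pi_j(\mathbf{x}) = 1 .
```

Each of the J−1 logits has its own intercept and slope vector; the choice of baseline category changes the parameter values but not the fitted probabilities.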


Some Notes:

• With categorical predictors, X² and G² goodness-of-fit statistics provide a model check when data are not sparse.

• When an explanatory variable is continuous or the data are sparse, such statistics are still valid for comparing nested models differing by relatively few terms.


Alligator Food Choice Example


SAS code of Table 7.1

* SAS for Baseline-Category Logit Models with Alligator Data in Table 7.1;

data gator;
  infile 'K:\CSU Hayward\Stat 6841\project\gator.txt';
  input lake gender size food count;
run;

* Baseline-category (generalized) logit model, food=1 as the reference category;
proc logistic;
  freq count;
  class lake size / param=ref;
  model food(ref='1') = lake size / link=glogit aggregate scale=none;
run;

* Same main-effects model fitted with CATMOD, populations defined by lake, size, and gender;
proc catmod;
  weight count;
  population lake size gender;
  model food = lake size / pred=freq pred=prob;
run;


Output

The LOGISTIC Procedure

Model Information

Data Set                      WORK.GATOR
Response Variable             food
Number of Response Levels     5
Frequency Variable            count
Model                         generalized logit
Optimization Technique        Fisher's scoring

Number of Observations Read   80
Number of Observations Used   56
Sum of Frequencies Read       219
Sum of Frequencies Used       219

Response Profile

Ordered            Total
Value     food     Frequency
1         1        94
2         2        61
3         3        19
4         4        13
5         5        32

Logits modeled use food=1 as the reference category.

NOTE: 24 observations having nonpositive frequencies or weights were excluded since they do not contribute to the analysis.


Output

Class Level Information

Class   Value   Design Variables
lake    1       1  0  0
        2       0  1  0
        3       0  0  1
        4       0  0  0
size    1       1
        2       0

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.


Output

Deviance and Pearson Goodness-of-Fit Statistics

Criterion   Value     DF   Value/DF   Pr > ChiSq
Deviance    17.0798   12   1.4233     0.1466
Pearson     15.0429   12   1.2536     0.2391

Number of unique profiles: 8

Model Fit Statistics

            Intercept   Intercept and
Criterion   Only        Covariates
AIC         612.363     580.080
SC          625.919     647.862
-2 Log L    604.363     540.080

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   64.2826      16   <.0001
Score              57.2475      16   <.0001
Wald               49.7584      16   <.0001
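The fit statistics above can be cross-checked by hand. The sketch below is an illustrative check, not part of the SAS run: it assumes the model has 20 parameters (4 logits, each with 1 intercept, 3 lake terms, and 1 size term), that SC uses the sum of frequencies (219) as the sample size, and it uses a closed-form chi-square tail probability that is valid only for even degrees of freedom. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^i / i! for i = 0 .. df/2 - 1, building each term from the last.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Goodness-of-fit p-values (values copied from the output, df = 12).
p_deviance = chi2_sf_even_df(17.0798, 12)
p_pearson = chi2_sf_even_df(15.0429, 12)

# Assumed parameter count: 4 logits x (1 intercept + 3 lake + 1 size).
n_params = 4 * (1 + 3 + 1)
neg2logl_null, neg2logl_full = 604.363, 540.080

lr_stat = neg2logl_null - neg2logl_full               # likelihood-ratio statistic
lr_df = n_params - 4                                  # null model keeps 4 intercepts
aic_full = neg2logl_full + 2 * n_params               # AIC = -2 log L + 2p
sc_full = neg2logl_full + n_params * math.log(219)    # SC = -2 log L + p log(n)

print(p_deviance, p_pearson, lr_stat, lr_df, aic_full, sc_full)
```

Under these assumptions the computed values agree with the printed output up to rounding, which supports reading the 16 df of the global test as the 16 non-intercept parameters.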


Output

Type 3 Analysis of Effects

               Wald
Effect   DF    Chi-Square   Pr > ChiSq
lake     12    35.4890      0.0004
size      4    18.7593      0.0009

Analysis of Maximum Likelihood Estimates

                                   Standard   Wald
Parameter   food   DF   Estimate   Error      Chi-Square   Pr > ChiSq
Intercept   2      1    -1.5490    0.4249     13.2890      0.0003
Intercept   3      1    -3.3139    1.0528      9.9081      0.0016
Intercept   4      1    -2.0931    0.6622      9.9894      0.0016
Intercept   5      1    -1.9043    0.5258     13.1150      0.0003
lake 1      2      1    -1.6583    0.6129      7.3216      0.0068
lake 1      3      1     1.2422    1.1852      1.0985      0.2946
lake 1      4      1     0.6951    0.7813      0.7916      0.3736


Output

Analysis of Maximum Likelihood Estimates (continued)

                                   Standard   Wald
Parameter   food   DF   Estimate   Error      Chi-Square   Pr > ChiSq
lake 1      5      1     0.8262    0.5575      2.1959      0.1384
lake 2      2      1     0.9372    0.4719      3.9443      0.0470
lake 2      3      1     2.4583    1.1179      4.8360      0.0279
lake 2      4      1    -0.6532    1.2021      0.2953      0.5869
lake 2      5      1     0.00565   0.7766      0.0001      0.9942
lake 3      2      1     1.1220    0.4905      5.2321      0.0222
lake 3      3      1     2.9347    1.1161      6.9131      0.0086
lake 3      4      1     1.0878    0.8417      1.6703      0.1962
lake 3      5      1     1.5164    0.6214      5.9541      0.0147
size 1      2      1     1.4582    0.3959     13.5634      0.0002
size 1      3      1    -0.3513    0.5800      0.3668      0.5448
size 1      4      1    -0.6307    0.6425      0.9635      0.3263
size 1      5      1     0.3316    0.4483      0.5471      0.4595


Output

Odds Ratio Estimates

                         Point      95% Wald
Effect          food     Estimate   Confidence Limits
lake 1 vs 4     2         0.190     0.057     0.633
lake 1 vs 4     3         3.463     0.339    35.343
lake 1 vs 4     4         2.004     0.433     9.266
lake 1 vs 4     5         2.285     0.766     6.814
lake 2 vs 4     2         2.553     1.012     6.437
lake 2 vs 4     3        11.685     1.306   104.508
lake 2 vs 4     4         0.520     0.049     5.490
lake 2 vs 4     5         1.006     0.219     4.608
lake 3 vs 4     2         3.071     1.174     8.032
lake 3 vs 4     3        18.815     2.111   167.717
lake 3 vs 4     4         2.968     0.570    15.447
lake 3 vs 4     5         4.556     1.348    15.400
size 1 vs 2     2         4.298     1.978     9.339
size 1 vs 2     3         0.704     0.226     2.194
size 1 vs 2     4         0.532     0.151     1.875
size 1 vs 2     5         1.393     0.579     3.354


Output

The CATMOD Procedure

Data Summary

Response            food
Weight Variable     count
Data Set            GATOR
Frequency Missing   0
Response Levels     5
Populations         16
Total Frequency     219
Observations        56

Population Profiles

Sample   lake   size   gender   Sample Size
-------------------------------------------
 1       1      1      1        13
 2       1      1      2        26
 3       1      2      1         7
 4       1      2      2         9
 5       2      1      1         5
 6       2      1      2        15
 7       2      2      1        26
 8       2      2      2         2
 9       3      1      1        12
10       3      1      2        12
11       3      2      1        28
12       3      2      2         1
13       4      1      1        27
14       4      1      2        14
15       4      2      1        12
16       4      2      2        10


Output

Response Profiles

Response   food
----------------
1          1
2          2
3          3
4          4
5          5

Maximum Likelihood Analysis

Maximum likelihood computations converged.

Maximum Likelihood Analysis of Variance

Source             DF   Chi-Square   Pr > ChiSq
--------------------------------------------------
Intercept           4   70.39        <.0001
lake               12   35.49        0.0004
size                4   18.76        0.0009
Likelihood Ratio   44   52.48        0.1784


Output

Analysis of Maximum Likelihood Estimates

                 Function              Standard   Chi-
Parameter        Number     Estimate   Error      Square   Pr > ChiSq
----------------------------------------------------------------------
Intercept        1           1.1514    0.2343     24.14    <.0001
                 2           0.4317    0.2737      2.49    0.1147
                 3          -0.6795    0.3818      3.17    0.0751
                 4          -0.9745    0.4049      5.79    0.0161
lake        1    1          -0.2391    0.3458      0.48    0.4892
            1    2          -1.9977    0.4946     16.31    <.0001
            1    3          -0.6556    0.6071      1.17    0.2802
            1    4           0.1736    0.5654      0.09    0.7589
            2    1           0.5814    0.5061      1.32    0.2506
            2    2           1.4184    0.5250      7.30    0.0069
            2    3           1.3810    0.6279      4.84    0.0278
            2    4          -0.3542    0.9153      0.15    0.6988
            3    1          -0.9293    0.3836      5.87    0.0154
            3    2           0.0925    0.3910      0.06    0.8131
            3    3           0.3467    0.5130      0.46    0.4991
            3    4          -0.1240    0.5830      0.05    0.8316
size        1    1          -0.1658    0.2241      0.55    0.4595
            1    2           0.5633    0.2525      4.98    0.0257
            1    3          -0.3414    0.3257      1.10    0.2945
            1    4          -0.4811    0.3564      1.82    0.1770
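The CATMOD estimates look different from the LOGISTIC ones because the two procedures parameterize the same model differently. The sketch below reconciles the size effect under two assumptions about CATMOD's defaults: predictors use effect (deviation-from-mean) coding, so the size 1 versus size 2 contrast is twice the printed coefficient, and the last response level (food=5) is the baseline of each generalized logit. The helper name `size_effect_vs_food1` is hypothetical.

```python
# CATMOD size=1 coefficients, one per function k = log(pi_k / pi_5), k = 1..4
# (copied from the CATMOD output).
catmod_size = {1: -0.1658, 2: 0.5633, 3: -0.3414, 4: -0.4811}

# LOGISTIC size=1 coefficients for log(pi_j / pi_1), j = 2..5
# (reference coding, copied from the LOGISTIC output).
logistic_size = {2: 1.4582, 3: -0.3513, 4: -0.6307, 5: 0.3316}


def size_effect_vs_food1(j):
    """Size 1 vs size 2 effect on log(pi_j / pi_1), rebuilt from CATMOD estimates."""
    b_j = catmod_size.get(j, 0.0)   # food 5 is the assumed CATMOD baseline: coefficient 0
    b_1 = catmod_size[1]
    # Effect coding: size 1 gets +b, size 2 gets -b, so the contrast is 2b.
    return 2.0 * (b_j - b_1)


for j, expected in logistic_size.items():
    print(j, round(size_effect_vs_food1(j), 4), expected)
```

The rebuilt contrasts match the LOGISTIC size estimates to within rounding of the printed coefficients, which is consistent with both procedures fitting the same main-effects model.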


Table 7.2


Some Test Results for Table 7.2

• The data are sparse, 219 observations scattered among 80 cells. Thus, G² is more reliable for comparing models than for testing fit.

• The statistics
    • G²[( ) | (G)] = 2.1 and
    • G²[(L + S) | (G + L + S)] = 2.2,
  each based on df = 4, suggest simplifying by collapsing the table over gender. (Other analyses, not presented here, show that adding interaction terms including G does not improve the fit significantly.)

• The G² and X² values for the collapsed table indicate that both L and S have effects.
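To see why G² values near 2 on 4 df argue for dropping gender, the tail probabilities can be computed directly. This is a minimal sketch using the closed-form chi-square upper tail for df = 4; the function name `chi2_sf_df4` is hypothetical and the formula applies only to 4 degrees of freedom.

```python
import math


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with exactly 4 df."""
    half = x / 2.0
    return math.exp(-half) * (1.0 + half)


# Likelihood-ratio comparisons quoted on the slide, each on df = 4.
p_gender_alone = chi2_sf_df4(2.1)      # G^2[( ) | (G)]
p_gender_given_ls = chi2_sf_df4(2.2)   # G^2[(L + S) | (G + L + S)]

print(round(p_gender_alone, 2), round(p_gender_given_ls, 2))
```

Both p-values are roughly 0.7, far above conventional significance levels, so the gender terms add little and collapsing over gender is reasonable.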


Table 7.3


Table 7.4


Prediction Equation for Log Odds of Selecting Invertebrates Instead of Fish

$$\log(\hat{\pi}_I / \hat{\pi}_F) = -1.55 + 1.46\,s - 1.66\,z_H + 0.94\,z_O + 1.12\,z_T$$

• where s = 1 for size ≤ 2.3 meters and 0 otherwise,

• z_H is a dummy variable for Lake Hancock (z_H = 1 for alligators in that lake and 0 otherwise), and

• z_O and z_T are dummy variables for Lakes Oklawaha and Trafford.

• Size of alligators has a noticeable effect. For a given lake, for small alligators the estimated odds that primary food choice was invertebrates instead of fish are exp(1.46) = 4.3 times the estimated odds for large alligators;

• the Wald 95% confidence interval is exp[1.46 ± 1.96(0.396)] = (2.0, 9.3).

• The lake effects indicate that the estimated odds that the primary food choice was invertebrates instead of fish are relatively higher at Lakes Trafford and Oklawaha and relatively lower at Lake Hancock than they are at Lake George.
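The size odds ratio and its Wald interval can be reproduced from the unrounded LOGISTIC estimates (size 1, food 2: estimate 1.4582, standard error 0.3959). A small sketch, assuming the normal critical value 1.96 quoted on the slide; variable names are illustrative.

```python
import math

# Size effect for the invertebrate-vs-fish logit, copied from the LOGISTIC output.
estimate, std_error = 1.4582, 0.3959
z = 1.96  # approximate 97.5th percentile of the standard normal

odds_ratio = math.exp(estimate)
ci_lower = math.exp(estimate - z * std_error)
ci_upper = math.exp(estimate + z * std_error)

print(round(odds_ratio, 1), round(ci_lower, 1), round(ci_upper, 1))
```

The result matches the "size 1 vs 2, food 2" row of the Odds Ratio Estimates table (4.298, with limits 1.978 and 9.339) up to rounding. The interval is built on the log-odds scale and then exponentiated, which is why it is not symmetric around 4.3.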


Further Estimate Calculation


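Estimating Response Probabilities (Model)

The equation that expresses multinomial logit models directly in terms of response probabilities is

$$\pi_j(\mathbf{x}) = \frac{\exp(\alpha_j + \boldsymbol{\beta}_j' \mathbf{x})}{1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \boldsymbol{\beta}_h' \mathbf{x})},$$

with $\alpha_J = 0$ and $\boldsymbol{\beta}_J = \mathbf{0}$ for the baseline category $J$, so that the baseline probability is the reciprocal of the denominator.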


Estimating Response Probabilities (Results)

• From Table 7.4 the estimated probability that a large alligator in Lake Hancock has invertebrates as the primary food choice is

$$\hat{\pi}_I = \frac{e^{-1.55 - 1.66}}{1 + e^{-1.55 - 1.66} + e^{-3.31 + 1.24} + e^{-2.09 + 0.70} + e^{-1.90 + 0.83}} = 0.023$$

• The estimated probabilities for reptile, bird, other, and fish are 0.072, 0.141, 0.194, and 0.570.
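These five probabilities can be recomputed from the LOGISTIC estimates. The sketch below assumes the food codes map as 1 = fish, 2 = invertebrate, 3 = reptile, 4 = bird, 5 = other, and that lake 1 is Lake Hancock; both mappings are inferred from the signs and ordering in the slides rather than stated in the output. A large alligator has s = 0, so only the intercept and the lake 1 coefficient enter each logit.

```python
import math

# (intercept, lake-1 coefficient) for each logit against fish, from the LOGISTIC output.
# Food labels are an assumed mapping of codes 2..5.
coef = {
    "invertebrate": (-1.5490, -1.6583),
    "reptile": (-3.3139, 1.2422),
    "bird": (-2.0931, 0.6951),
    "other": (-1.9043, 0.8262),
}

# Linear predictor for a large alligator (size dummy = 0) in Lake Hancock (lake 1).
linear = {food: a + b for food, (a, b) in coef.items()}

# Baseline category (fish) contributes exp(0) = 1 to the denominator.
denom = 1.0 + sum(math.exp(v) for v in linear.values())
probs = {food: math.exp(v) / denom for food, v in linear.items()}
probs["fish"] = 1.0 / denom

for food, p in probs.items():
    print(food, round(p, 3))
```

Rounded to three decimals, the output agrees with the values quoted on the slide, and the five probabilities sum to 1 by construction.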


Quality vs. Quantity


Summary and Conclusion
