
Bradley-Terry Model Analysis of Cat Food Recipes

Hongjie Deng¹, Daniel R. Jeske² and Ted Younglove³

Department of Statistics, University of California, Riverside, CA 92521
¹Graduate Student, ²Faculty and Director of Collaboratory, ³Manager of Collaboratory

Introduction

The Del Monte Pet Products Division of Del Monte Foods conducted palatability studies of dry cat food, wet cat food, and cat treats using paired comparison consumption tests.

The Statistical Consulting Collaboratory at the University of California, Riverside was consulted to improve the analysis of the paired comparison experiments, focusing initially on the experiments that used dry cat food.

Our goal was to apply Bradley-Terry modeling and analysis techniques to the experimental data. In particular, we wanted to estimate a quality score for each food recipe, test whether the scores differ significantly, and explore the power of alternative paired comparison designs.

Data

          Food A  Food B  Food C  Food D  Food E
Food A    -       20      22      20      1
Food B    9       -       6       7       1
Food C    8       24      -       8       2
Food D    10      23      21      -       3
Food E    29      29      27      27      -

Note: Cell (i, j) = number of cats that preferred food i (row) over food j (column).

Bradley-Terry Model

Suppose there are t treatments in an experiment involving paired comparisons, and each pair of treatments is compared by n different judges, indexed by k = 1, ..., n.

Define $p_{ij} = \Pr(\text{treatment } i \text{ is preferred over treatment } j)$, so that $p_{ji} = 1 - p_{ij}$.

Define $r_{ijk}$ = rank of the i-th treatment when compared with the j-th treatment by judge k ($r_{ijk} = 1$ if treatment i is preferred and $r_{ijk} = 2$ otherwise).

The saturated likelihood function is

$$L(p_{12}, \ldots, p_{t-1,t}) = \prod_{i<j} \prod_{k=1}^{n} p_{ij}^{\,2 - r_{ijk}} \, (1 - p_{ij})^{\,2 - r_{jik}}$$

The Bradley-Terry link function is

$$p_{ij} = \frac{e^{v_i}}{e^{v_i} + e^{v_j}},$$

where $v_i$ = true rating of treatment i.

Rank treatments based on

$$p_i = \frac{1}{t-1} \sum_{j=1,\, j \ne i}^{t} p_{ij}.$$

Estimation Of True Ratings $v_i$

Maximum likelihood estimates of the true ratings $v_i$ were obtained using the R software package and are presented below.

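Diet      v̂_i       p̂_i       Rank
Food E    2.3024    0.9413    1
Food A    0         0.5317    2
Food D    -0.2132   0.4802    3
Food C    -0.7307   0.3535    4
Food B    -1.4367   0.1933    5

Note: The ratings are identified only up to an additive constant, so Food A is fixed at $\hat v_A = 0$; $\hat p_i$ is the ranking score $p_i$ evaluated at the estimates.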
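The poster does not show the R call used for the fit. As a hedged illustration only, the sketch below computes Bradley-Terry maximum likelihood estimates from the Data table in plain Python using the minorization-maximization (MM) iteration on $\pi_i = e^{v_i}$. This is a swapped-in fitting algorithm, not necessarily what the R software did, and the names `WINS` and `fit_bradley_terry` are our own. Food A is fixed at 0 to match the table.

```python
import math

# Data table, foods ordered A, B, C, D, E:
# WINS[i][j] = number of cats that preferred food i over food j.
FOODS = ["A", "B", "C", "D", "E"]
WINS = [
    [0, 20, 22, 20, 1],
    [9, 0, 6, 7, 1],
    [8, 24, 0, 8, 2],
    [10, 23, 21, 0, 3],
    [29, 29, 27, 27, 0],
]


def fit_bradley_terry(wins, tol=1e-12, max_iter=10_000):
    """Bradley-Terry MLEs via the MM iteration on pi_i = exp(v_i).

    Returns ratings v with v[0] = 0, since only differences v_i - v_j are
    identified. Assumes every food wins at least once and that the pairs
    compared form a connected graph (both hold for this table).
    """
    t = len(wins)
    n = [[wins[i][j] + wins[j][i] for j in range(t)] for i in range(t)]
    total_wins = [sum(row) for row in wins]
    pi = [1.0] * t
    for _ in range(max_iter):
        # MM update: pi_i <- W_i / sum_{j != i} n_ij / (pi_i + pi_j)
        new_pi = [
            total_wins[i] / sum(n[i][j] / (pi[i] + pi[j]) for j in range(t) if j != i)
            for i in range(t)
        ]
        # The likelihood is scale-invariant, so rescale to pi_A = 1 (v_A = 0).
        new_pi = [p / new_pi[0] for p in new_pi]
        change = max(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break
    return [math.log(p) for p in pi]


v_hat = fit_bradley_terry(WINS)
for food, v in sorted(zip(FOODS, v_hat), key=lambda pair: pair[1], reverse=True):
    print(f"Food {food}: v_hat = {v:.4f}")
```

Because the Bradley-Terry MLE is unique for a table like this one, the printed ratings should agree with the table above up to rounding, whichever fitting algorithm is used.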

Estimates Of $p_{ij}$

          Food A  Food B  Food C  Food D  Food E
Food A    -       0.81    0.67    0.55    0.09
Food B    0.19    -       0.33    0.23    0.02
Food C    0.33    0.67    -       0.37    0.05
Food D    0.45    0.77    0.63    -       0.07
Food E    0.91    0.98    0.95    0.93    -

Note: Cell (i, j) = estimated probability that a cat prefers food i (row) over food j (column).
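To make the link between the two tables concrete, the following sketch (ours, not part of the original analysis) plugs the reported ratings into the Bradley-Terry link to recompute each $\hat p_{ij}$ and the ranking score $\hat p_i$, the average of $\hat p_{ij}$ over the $t - 1$ other foods. `V_HAT` holds the rounded estimates from the table of true ratings, so the last digit may occasionally differ.

```python
import math

# Rounded MLEs of the true ratings from the poster (Food A fixed at 0).
V_HAT = {"A": 0.0, "B": -1.4367, "C": -0.7307, "D": -0.2132, "E": 2.3024}
FOODS = sorted(V_HAT)


def preference_prob(v_i, v_j):
    """Bradley-Terry link e^{v_i} / (e^{v_i} + e^{v_j}), in the equivalent form 1 / (1 + e^{v_j - v_i})."""
    return 1.0 / (1.0 + math.exp(v_j - v_i))


p_hat = {(i, j): preference_prob(V_HAT[i], V_HAT[j]) for i in FOODS for j in FOODS if i != j}

# Ranking score p_i: average preference probability against the t - 1 other foods.
p_bar = {i: sum(p_hat[i, j] for j in FOODS if j != i) / (len(FOODS) - 1) for i in FOODS}

for i in FOODS:
    cells = "  ".join("  - " if i == j else f"{p_hat[i, j]:.2f}" for j in FOODS)
    print(f"Food {i}: {cells}   p_i = {p_bar[i]:.4f}")

ranking = sorted(FOODS, key=p_bar.get, reverse=True)
print("Ranking:", " > ".join(f"Food {f}" for f in ranking))
```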

Multiple Comparison Procedure

To identify which food recipes differ with respect to cat preference, the procedure compares each observed difference $|\hat v_i - \hat v_j|$ with the distribution of a statistic Q computed from hypothetical tables of data generated under the null hypothesis $H_0: v_1 = v_2 = \cdots = v_t$.

Algorithm

1. Randomly generate a table of data $10^4$ times (see "How To Randomly Generate A Table Of Data" below).
2. For each table, calculate the estimates $\hat v_i^*$ ($i = 1, \ldots, t$).
3. For each table, obtain $Q = \max_{1 \le i < j \le t} |\hat v_i^* - \hat v_j^*|$.
4. For each $(i, j)$ pair, calculate the Monte-Carlo p-value $= \#\{Q > |\hat v_i - \hat v_j|\} / 10^4$.
5. Compare each Monte-Carlo p-value with $\alpha = 0.05$ to reach a conclusion.

Histogram Of Q

$\Pr(Q > 0.6512) = 0.05$
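The algorithm can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: panel sizes are read from the table-generation layout below, each simulated table is refitted with an MM iteration (our choice of fitting method), and 1,000 tables are used instead of $10^4$ to keep the run short. Because the $\hat v_i^*$ are identified only up to a constant, Q is simply the range $\max_i \hat v_i^* - \min_i \hat v_i^*$. `N`, `simulate_q` and `mc_p_value` are hypothetical names.

```python
import math
import random

# Panel sizes n_ij (foods ordered A, B, C, D, E), as in the table-generation layout.
N = [
    [0, 29, 30, 30, 30],
    [29, 0, 30, 30, 30],
    [30, 30, 0, 29, 29],
    [30, 30, 29, 0, 30],
    [30, 30, 29, 30, 0],
]


def random_table(n, rng):
    """One table under H0: C_ij ~ Bin(n_ij, 0.5) for i < j and C_ji = n_ij - C_ij."""
    t = len(n)
    wins = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i + 1, t):
            c = sum(rng.random() < 0.5 for _ in range(n[i][j]))
            wins[i][j], wins[j][i] = c, n[i][j] - c
    return wins


def fit_bt(wins, tol=1e-10, max_iter=10_000):
    """Bradley-Terry MLEs via the MM iteration; returns v with v[0] = 0.

    Assumes every food wins at least once, which is virtually certain here.
    """
    t = len(wins)
    total_wins = [sum(row) for row in wins]
    pi = [1.0] * t
    for _ in range(max_iter):
        new_pi = [
            total_wins[i]
            / sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j]) for j in range(t) if j != i)
            for i in range(t)
        ]
        new_pi = [p / new_pi[0] for p in new_pi]
        done = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if done:
            break
    return [math.log(p) for p in pi]


def simulate_q(n, n_tables, seed=0):
    """Simulated null distribution of Q = max_{i<j} |v*_i - v*_j| (the range of v*)."""
    rng = random.Random(seed)
    return [max(v) - min(v) for v in (fit_bt(random_table(n, rng)) for _ in range(n_tables))]


def mc_p_value(observed_diff, q_values):
    """Monte-Carlo p-value: fraction of simulated Q exceeding |v_i - v_j|."""
    return sum(q > observed_diff for q in q_values) / len(q_values)


q_values = sorted(simulate_q(N, n_tables=1000))
print("Estimated 95th percentile of Q:", round(q_values[int(0.95 * len(q_values))], 4))
print("p-value, Food A vs Food D:", mc_p_value(0.2132, q_values))
print("p-value, Food A vs Food C:", mc_p_value(0.7307, q_values))
```

The simulated 95th percentile should land in the neighborhood of the reported 0.6512; it will not match exactly with a different random number stream and fewer tables.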

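Model Goodness Of Fit

Test Results

$H_0$: Bradley-Terry model fits the data
$H_a$: Bradley-Terry model does not fit the data

Test statistic: $-2\left(\log L_{H_0} - \log L(\hat p_{12}, \ldots, \hat p_{t-1,t})\right)$, where $L(p_{12}, \ldots, p_{t-1,t})$ is the saturated likelihood function and $L_{H_0}$ is the saturated likelihood function reduced by the Bradley-Terry link function, both evaluated at their maximum likelihood estimates.

Test result: p-value = 0.1093. Do not reject $H_0$ at $\alpha = 0.05$.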
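As a check on how the test statistic is formed, the sketch below computes it from the Data table and the reported ratings. With the saturated estimates $\hat p_{ij} = C_{ij}/n_{ij}$ the statistic reduces to $2\sum O \log(O/E)$ over all off-diagonal cells, where $E_{ij} = n_{ij}\hat p_{ij}$ is the expected count under the Bradley-Terry fit. The chi-square reference distribution with $t(t-1)/2 - (t-1) = 6$ degrees of freedom is our assumption, since the poster does not state it; with that choice the computed p-value comes out close to the reported 0.1093. `expected_counts` and `chi2_sf_even_df` are illustrative names.

```python
import math

# Data table (foods A..E) and the reported ratings (Food A fixed at 0).
WINS = [
    [0, 20, 22, 20, 1],
    [9, 0, 6, 7, 1],
    [8, 24, 0, 8, 2],
    [10, 23, 21, 0, 3],
    [29, 29, 27, 27, 0],
]
V_HAT = [0.0, -1.4367, -0.7307, -0.2132, 2.3024]


def expected_counts(wins, v):
    """E_ij = n_ij * p_ij with the Bradley-Terry link p_ij = 1 / (1 + exp(v_j - v_i))."""
    t = len(wins)
    return [
        [0.0 if i == j else (wins[i][j] + wins[j][i]) / (1.0 + math.exp(v[j] - v[i])) for j in range(t)]
        for i in range(t)
    ]


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


t = len(WINS)
expected = expected_counts(WINS, V_HAT)
# -2(log L_H0 - log L_saturated) = 2 * sum O * log(O / E); cells with O = 0 contribute 0.
g2 = 2.0 * sum(
    WINS[i][j] * math.log(WINS[i][j] / expected[i][j])
    for i in range(t) for j in range(t) if i != j and WINS[i][j] > 0
)
df = t * (t - 1) // 2 - (t - 1)  # 10 free p_ij in the saturated model versus 4 free ratings
p_value = chi2_sf_even_df(g2, df)
print(f"statistic = {g2:.3f}, df = {df}, p-value = {p_value:.4f}")
```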

How Well The Model Fits

Observed frequencies and (expected frequencies):

          Food A        Food B        Food C        Food D        Food E
Food A    -             20 (23.43)    22 (20.25)    20 (16.59)    1 (2.73)
Food B    9 (5.57)      -             6 (9.91)      7 (6.82)      1 (0.70)
Food C    8 (9.75)      24 (20.09)    -             8 (10.83)     2 (1.33)
Food D    10 (13.41)    23 (23.18)    21 (18.17)    -             3 (2.24)
Food E    29 (27.27)    29 (29.30)    27 (27.67)    27 (27.76)    -

Expected frequencies are $n_{ij}\hat p_{ij}$, where $n_{ij}$ is the number of cats in the panel comparing foods i and j; for each pair the two expected frequencies therefore sum to $n_{ij}$.

Power Analysis

Two Alternative Designs

D1: 10 panels comparing all pairs of recipes, with 30 cats each.
D2: 4 panels comparing (A,B), (B,C), (C,D) and (D,E), with 75 cats each.

Both designs involve 300 cat evaluations in total (10 × 30 = 4 × 75). D2 uses the minimum number of comparisons needed to estimate all the ratings and is motivated by being a simpler experiment to manage.

Power Comparison

$H_0$: $v_1 = v_2 = v_3 = v_4 = v_5$ (i.e., no difference among recipes), where $v_1, \ldots, v_5$ are the ratings of Foods A to E
$H_a$: not $H_0$

Power of a 5% test of $H_0$ under each design, estimated from 1000 simulated data sets, is presented below. The first row, with all ratings equal, gives the empirical size of the test.

True Ratings          Power (D1)    Power (D2)
(1,1,1,1,1)           0.055         0.050
(1,1,1,1,1.2)         0.111         0.084
(1,1,1,1.2,1.2)       0.180         0.095
(1,1,1.2,1.2,1.2)     0.178         0.078
(1,1,1,1,1.5)         0.544         0.378
(1,1,1,1.5,1.5)       0.743         0.338
(1,1,1.5,1.5,1.5)     0.776         0.383
(1,1,1,1,1.8)         0.946         0.769
(1,1,1,1.8,1.8)       0.992         0.754
(1,1,1.8,1.8,1.8)     0.993         0.797
(1,1.2,1.2,1.2,1.2)   0.122         0.074
(1,1.5,1.5,1.5,1.5)   0.554         0.365
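The poster does not say which 5% test was used in the power simulation. The sketch below assumes a likelihood-ratio test of $H_0$ against the Bradley-Terry model, referred to a chi-square distribution with $t - 1 = 4$ degrees of freedom, and treats the "true ratings" as the values of $v_1, \ldots, v_5$ in the link function. The design encodings `D1` and `D2` and all function names are illustrative, and only two scenarios with 300 data sets each are run to keep it quick.

```python
import math
import random

# Each design is a list of panels (i, j, cats); foods 0..4 stand for A..E.
D1 = [(i, j, 30) for i in range(5) for j in range(i + 1, 5)]
D2 = [(0, 1, 75), (1, 2, 75), (2, 3, 75), (3, 4, 75)]
CHI2_4DF_CRIT = 9.487729  # 95th percentile of chi-square with 4 df


def simulate_wins(design, v, rng):
    """Draw one data set: in panel (i, j) each cat prefers i with the Bradley-Terry probability."""
    wins = [[0] * 5 for _ in range(5)]
    for i, j, cats in design:
        p_ij = 1.0 / (1.0 + math.exp(v[j] - v[i]))
        c = sum(rng.random() < p_ij for _ in range(cats))
        wins[i][j] += c
        wins[j][i] += cats - c
    return wins


def fit_bt(wins, tol=1e-8, max_iter=20_000):
    """Bradley-Terry MLEs via the MM iteration; pairs never compared contribute nothing."""
    t = len(wins)
    total_wins = [sum(row) for row in wins]
    pi = [1.0] * t
    for _ in range(max_iter):
        new_pi = [
            total_wins[i]
            / sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j]) for j in range(t) if j != i)
            for i in range(t)
        ]
        new_pi = [p / new_pi[0] for p in new_pi]
        done = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if done:
            break
    return [math.log(p) for p in pi]


def log_lik(wins, v):
    """Bradley-Terry log-likelihood of a win table, using log p_ij = -log(1 + exp(v_j - v_i))."""
    return sum(
        wins[i][j] * -math.log1p(math.exp(v[j] - v[i]))
        for i in range(5) for j in range(5) if i != j and wins[i][j] > 0
    )


def lr_power(design, v_true, n_sims=1000, seed=1):
    """Fraction of simulated data sets in which the LR test rejects H0 at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        wins = simulate_wins(design, v_true, rng)
        lr = 2.0 * (log_lik(wins, fit_bt(wins)) - log_lik(wins, [0.0] * 5))
        rejections += lr > CHI2_4DF_CRIT
    return rejections / n_sims


SCENARIOS = [(1, 1, 1, 1, 1), (1, 1, 1, 1, 1.8)]
results = {r: (lr_power(D1, r, n_sims=300), lr_power(D2, r, n_sims=300)) for r in SCENARIOS}
for r, (p1, p2) in results.items():
    print(r, f"D1 power ~ {p1:.3f}", f"D2 power ~ {p2:.3f}")
```

With 300 data sets per scenario the estimates are rough; setting `n_sims=1000` matches the number of simulated data sets used for the table.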

[Photos: Cinnamon, Wheaties, Lana]

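How To Randomly Generate A Table Of Data

          Food A      Food B              Food C              Food D              Food E
Food A    -           C12 = Bin(29, 0.5)  C13 = Bin(30, 0.5)  C14 = Bin(30, 0.5)  C15 = Bin(30, 0.5)
Food B    29 - C12    -                   C23 = Bin(30, 0.5)  C24 = Bin(30, 0.5)  C25 = Bin(30, 0.5)
Food C    30 - C13    30 - C23            -                   C34 = Bin(29, 0.5)  C35 = Bin(29, 0.5)
Food D    30 - C14    30 - C24            29 - C34            -                   C45 = Bin(30, 0.5)
Food E    30 - C15    30 - C25            29 - C35            30 - C45            -

Each C_ij (i < j) is an independent Binomial(n_ij, 0.5) draw, where n_ij is the number of cats in the panel that compared foods i and j and 0.5 reflects $H_0$; the lower-triangle entries are the complements n_ij - C_ij.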
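A single null table, laid out as above, can be drawn with the Python standard library alone. In this sketch (the names `N` and `null_table` are illustrative) the upper triangle holds the binomial draws C_ij and the lower triangle their complements n_ij - C_ij; a sum of Bernoulli(0.5) indicators stands in for a binomial sampler.

```python
import random

FOODS = ["A", "B", "C", "D", "E"]
# Panel sizes n_ij from the layout above (n_AB = n_CD = n_CE = 29, all other pairs 30).
N = {("A", "B"): 29, ("A", "C"): 30, ("A", "D"): 30, ("A", "E"): 30,
     ("B", "C"): 30, ("B", "D"): 30, ("B", "E"): 30,
     ("C", "D"): 29, ("C", "E"): 29, ("D", "E"): 30}


def null_table(rng):
    """Return {(i, j): count} with C_ij ~ Bin(n_ij, 0.5) and C_ji = n_ij - C_ij."""
    table = {}
    for (i, j), n in N.items():
        c = sum(rng.random() < 0.5 for _ in range(n))  # one Binomial(n, 0.5) draw
        table[i, j] = c
        table[j, i] = n - c
    return table


table = null_table(random.Random(2016))
print("        " + "".join(f"Food {f}  " for f in FOODS))
for i in FOODS:
    cells = "".join(f"{'-' if i == j else table[i, j]:>6}  " for j in FOODS)
    print(f"Food {i}  {cells}")
```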

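Food E   Food A   Food D   Food C   Food B

Note: Foods are listed from highest to lowest estimated rating. At α = 0.05, Food A and Food D are not significantly different, nor are Food D and Food C: their differences $|\hat v_i - \hat v_j|$ (0.2132 and 0.5175) fall below the critical value 0.6512, whereas every other pair differs by more than 0.6512.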
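Because a Monte-Carlo p-value falls below 0.05 essentially when $|\hat v_i - \hat v_j|$ exceeds the 95th percentile of Q (up to simulation error), the pairwise conclusions can be read off the critical value 0.6512. The short sketch below does this with the reported ratings; `V_HAT` and `Q_CRIT` are our own names.

```python
from itertools import combinations

# Reported ratings (Food A fixed at 0) and the critical value with Pr(Q > 0.6512) = 0.05.
V_HAT = {"A": 0.0, "B": -1.4367, "C": -0.7307, "D": -0.2132, "E": 2.3024}
Q_CRIT = 0.6512

ranked = sorted(V_HAT, key=V_HAT.get, reverse=True)  # highest rating first
not_different = [
    (a, b) for a, b in combinations(ranked, 2) if abs(V_HAT[a] - V_HAT[b]) <= Q_CRIT
]
print("Rank order:", " ".join(f"Food {f}" for f in ranked))
print("Not significantly different:", not_different)
```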

Conclusion

Although D2 is simpler to manage, it has less power than D1: the information lost by using fewer panels is not compensated for by using more cats in each panel. In addition, contrasts under D1 have equal precision, while under D2 they do not; precision under D2 drops for recipes that are not compared directly.

Comparison Of Standard Deviation Of Contrasts

Contrast    Std. Dev. (D1)  Std. Dev. (D2)
v1 - v2     0.23            0.24
v1 - v3     0.23            0.33
v1 - v4     0.24            0.42
v1 - v5     0.24            0.48
v2 - v3     0.23            0.24
v2 - v4     0.23            0.34
v2 - v5     0.24            0.41
v3 - v4     0.24            0.23
v3 - v5     0.24            0.33
v4 - v5     0.24            0.24

Based on 1000 simulations under $H_0$. The results are relatively insensitive to the values assumed for the true $v_i$.
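The pattern in this table can be anticipated without simulation. Under $H_0$ every $p_{ij} = 1/2$, so each cat in panel (i, j) contributes information $p_{ij}(1 - p_{ij}) = 1/4$ about $v_i - v_j$; inverting the resulting Fisher information, with $v_1$ fixed at 0, gives large-sample standard deviations of the contrasts. This asymptotic approximation is our addition for illustration, not the poster's simulation, so it comes out slightly below the simulated values (for example 0.46 rather than 0.48 for $v_1 - v_5$ under D2). The design encodings and function names are illustrative.

```python
import math

# Designs as lists of panels (i, j, cats); indices 0..4 stand for v1..v5.
D1 = [(i, j, 30) for i in range(5) for j in range(i + 1, 5)]
D2 = [(0, 1, 75), (1, 2, 75), (2, 3, 75), (3, 4, 75)]


def fisher_information(design, t=5):
    """Information for v under H0: each cat adds 1/4 * (e_i - e_j)(e_i - e_j)^T."""
    info = [[0.0] * t for _ in range(t)]
    for i, j, cats in design:
        w = 0.25 * cats
        info[i][i] += w
        info[j][j] += w
        info[i][j] -= w
        info[j][i] -= w
    return info


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (for small, nonsingular matrices)."""
    size = len(matrix)
    aug = [row[:] + [float(r == c) for c in range(size)] for r, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def contrast_sds(design, t=5):
    """Large-sample SD of v_i - v_j for every pair, with v1 fixed at 0 to remove the free constant."""
    reduced_cov = invert([row[1:] for row in fisher_information(design, t)[1:]])
    cov = [[0.0] * t for _ in range(t)]
    for a in range(1, t):
        for b in range(1, t):
            cov[a][b] = reduced_cov[a - 1][b - 1]
    return {
        f"v{i + 1}-v{j + 1}": math.sqrt(cov[i][i] + cov[j][j] - 2.0 * cov[i][j])
        for i in range(t) for j in range(i + 1, t)
    }


sd1, sd2 = contrast_sds(D1), contrast_sds(D2)
for name in sd1:
    print(f"{name}: D1 {sd1[name]:.2f}  D2 {sd2[name]:.2f}")
```

Under D2 the approximate variance of $v_i - v_j$ grows in proportion to how many panels separate the two recipes in the chain, which is why the directly compared pairs keep roughly the D1 precision.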

Special Thanks To: Javier Suarez and Hua Yu of the UCR Statistical Consulting Collaboratory. Graduate Students of the Fall 2005 offering of STAT 293: Yingtao Bi, Mike Huang, Steward Huang, Sungsu Kim, Scott Lesch, Rupam Pal, Jose Sanchez, Jason Wilson, Rui Xiao, Karen Huaying Xu, and Qi Zhang.

[Figure: Histogram of Q (x-axis: Q, 0.0 to 1.2; y-axis: Frequency, 0 to 2500)]

Experimental Design

The experiment was conducted using a colony of 300 cats, male and female, of various breeds and ages. Each cat in a panel of 30 randomly selected cats was offered two bowls, each containing a different food, on each of two days. On the first day, the first food of the pair (call it A) was placed on the left and the second (B) on the right; on the second day, the left-right orientation was reversed.

The relative amounts of foods A and B that each cat consumed over the two days were used to determine which food it preferred. In this poster, we show how to analyze the data and select the food recipe that is most attractive to the cats.
