Generalized Linear Mixed Model English Premier League Soccer – 2003/2004 Season

Upload: agnes-lester

Post on 17-Dec-2015


Page 1

Generalized Linear Mixed Model

English Premier League Soccer – 2003/2004 Season

Page 2

Introduction

• English Premier League Soccer (Football)
    • 20 Teams – Each plays all others twice (home/away)
    • Games consist of two halves (45 minutes each)
    • No overtime
    • Each team is on offense and defense for 38 games (38 first halves and 38 second halves)
    • Response Variable: Goals in a half
    • Potential Independent Variables
        • Fixed Factors: Home Dummy, Half2 Dummy, Game# (1-38)
        • Random Factors: Offensive Team, Defensive Team
    • Distribution of Response: Poisson?

Page 3

Preliminary Summary

Team                Off Goals  Def Goals    Team              Off Goals  Def Goals
Arsenal                    73         26    Southampton              44         45
Aston Villa                48         44    Wolverhampton            38         77
Blackburn                  51         59    Birmingham               43         48
Charlton                   51         51    Bolton                   48         56
Everton                    45         57    Chelsea                  67         30
Leeds United               40         79    Fulham                   52         46
Liverpool                  55         37    Leicester City           48         65
Manchester United          64         35    Manchester City          55         54
Newcastle                  52         40    Middlesbrough            44         52
Tottenham                  47         57    Portsmouth               47         54

Half2   Goals
0         461
1         551

Home    Goals
0         440
1         572

[Figure: Goals by Game Order. Total Goals (y-axis, 0 to 45) versus Game Order (x-axis, 0 to 40); DW = 2.03335]

Page 4

Summary of Previous Slide

• Teams vary extensively on offense and defense
    • Offense: min=38, max=73, mean=50.6, SD=8.85
    • Defense: min=26, max=79, mean=50.6, SD=13.75
    • Strong negative correlation between off/def: r=-0.80
• Home Teams outscore Away Teams 1.3:1
• Second Half outscores First Half 1.2:1
• No evidence of autocorrelation in total goals scored over weeks, Durbin-Watson Stat = 2.03
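The summary figures above can be recomputed from the Preliminary Summary table. The following is a minimal sketch using only the Python standard library; the variable names and the helper `pearson` are illustrative and not part of the original slides.

```python
from math import sqrt
from statistics import mean, stdev

# (offense goals, defense goals) per team, copied from the Preliminary Summary table
goals = {
    "Arsenal": (73, 26), "Aston Villa": (48, 44), "Blackburn": (51, 59),
    "Charlton": (51, 51), "Everton": (45, 57), "Leeds United": (40, 79),
    "Liverpool": (55, 37), "Manchester United": (64, 35), "Newcastle": (52, 40),
    "Tottenham": (47, 57), "Southampton": (44, 45), "Wolverhampton": (38, 77),
    "Birmingham": (43, 48), "Bolton": (48, 56), "Chelsea": (67, 30),
    "Fulham": (52, 46), "Leicester City": (48, 65), "Manchester City": (55, 54),
    "Middlesbrough": (44, 52), "Portsmouth": (47, 54),
}
off = [o for o, _ in goals.values()]
dfn = [d for _, d in goals.values()]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

off_mean, off_sd = mean(off), stdev(off)   # sample SD (n - 1 denominator)
def_mean, def_sd = mean(dfn), stdev(dfn)
r_off_def = pearson(off, dfn)

home_to_away = 572 / 440    # goals by home teams / goals by away teams
half2_to_half1 = 551 / 461  # second-half goals / first-half goals
```

The standard deviations match the slide only with the sample (n - 1) denominator, which is what `statistics.stdev` uses.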

Page 5

“Marginal Analysis” – No Team Effects

• Break Down Goals by Home/Half2 (380 Games)

Goals        Home1     Road1     Home2     Road2
Mean        0.6921    0.5211    0.8132    0.6368
Variance    0.6886    0.5141    0.9122    0.6277

Obs freqs    Home1     Road1     Home2     Road2
0              192       223       175       198
1              127       124       130       133
2               48        26        56        41
3               12         6        10         6
4                1         1         8         1
5                0         0         1         1
6+               0         0         0         0

Exp freqs    Home1     Road1     Home2     Road2
0           190.20    225.68    168.51    201.00
1           131.64    117.59    137.03    128.01
2            45.55     30.64     55.71     40.76
3+           12.61      6.09     18.75     10.23

Chi-Sq       Home1     Road1     Home2     Road2
0           0.0171    0.0318    0.2497    0.0449
1           0.1633    0.3493    0.3604    0.1946
2           0.1314    0.7014    0.0015    0.0014
3+          0.0120    0.1350    0.0034    0.4846
Sum         0.3238    1.2175    0.6151    0.7256
df               2         2         2         2
CV(.05)      5.991     5.991     5.991     5.991
P-value     0.8505    0.5440    0.7353    0.6957

Corr         Home1     Road1     Home2     Road2
Home1       1.0000   -0.0445    0.0970    0.1184
Road1      -0.0445    1.0000    0.1079    0.0460
Home2       0.0970    0.1079    1.0000   -0.0794
Road2       0.1184    0.0460   -0.0794    1.0000

Page 6

Summary of Previous Slide

• Means (Variances) for 4 Half Types:
    • Home/1st Half: Mean = 0.692 Variance = 0.689
    • Away/1st Half: Mean = 0.521 Variance = 0.514
    • Home/2nd Half: Mean = 0.813 Variance = 0.912
    • Away/2nd Half: Mean = 0.637 Variance = 0.628
    • Thus, means and variances in strong agreement
• Chi-Square Statistics for testing for Poisson:
    • Df = (4 categories-1)-(1 Parameter estimated) = 2
    • P-values all exceed 0.50 (.8505, .5440, .7353, .6957)
    • Goals scored consistent with Poisson Distribution
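As a worked illustration of the chi-square test for the Poisson distribution, the sketch below recomputes the Home/1st Half column (expected frequencies, test statistic, degrees of freedom, P-value) from its observed frequencies. The function name `poisson_gof` is hypothetical, and the closed-form tail probability `exp(-x/2)` is only valid because the test has 2 degrees of freedom.

```python
from math import exp, factorial

def poisson_gof(obs_counts, n_cells=3):
    """Chi-square goodness-of-fit test against a Poisson distribution.

    obs_counts[k] is the number of halves with k goals. Counts of
    n_cells or more goals are pooled into one cell (here "3+").
    """
    n = sum(obs_counts)
    mean = sum(k * c for k, c in enumerate(obs_counts)) / n
    expected = [n * exp(-mean) * mean**k / factorial(k) for k in range(n_cells)]
    expected.append(n - sum(expected))  # pooled upper cell
    observed = list(obs_counts[:n_cells]) + [sum(obs_counts[n_cells:])]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1          # categories - 1 - one estimated mean
    p_value = exp(-chi_sq / 2)          # exact upper tail only for df == 2
    return mean, expected, chi_sq, df, p_value

# Observed Home/1st Half frequencies for 0, 1, 2, 3, 4, 5, 6+ goals
home1 = [192, 127, 48, 12, 1, 0, 0]
mean1, exp1, chi1, df1, p1 = poisson_gof(home1)
```

The same call with the Road1, Home2, or Road2 column should reproduce the other three columns of the table up to rounding.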

Page 7

Observed & Expected Counts

[Figure: observed and expected Frequency (y-axis, 0 to 250) of 0, 1, 2, and 3+ goals, in four panels: Home/1st Half, Away/1st Half, Home/2nd Half, Away/2nd Half]

Page 8

Generalized Linear Models

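• Dependent Variable: Goals Scored
• Distribution: Poisson
• Link Function: log
• Independent Variables: Home, Half2 Dummy Variables
• Models:

$$\text{Model 1: } \log(E(Y)) = \alpha_0 + \alpha_{Home}\,Home + \alpha_{Half2}\,Half2$$

$$\text{Model 2: } \log(E(Y)) = \alpha_0 + \alpha_{Home}\,Home + \alpha_{Half2}\,Half2 + \alpha_{Home*Half2}\,Home{*}Half2$$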

Model fit using generalized linear model software packages
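For this particular data set the two models can also be checked by hand, because every Home/Half2 cell contains the same number of halves (380). In that balanced case the no-interaction Poisson fit reproduces the marginal totals, so its estimates have a closed form, and the interaction model is saturated, so its estimates are logs of cell means. The sketch below is only a hand check under that assumption, not a replacement for GLM software (which fits iteratively); the cell totals are derived from the marginal frequency table, and all names are illustrative.

```python
from math import log

# Total goals per half type, derived from the marginal frequency table
# (e.g. Home/1st: 127*1 + 48*2 + 12*3 + 1*4 = 263). Each cell has 380 halves.
N_HALVES = 380
totals = {("home", 1): 263, ("away", 1): 198, ("home", 2): 309, ("away", 2): 242}

grand = sum(totals.values())
home_total = totals[("home", 1)] + totals[("home", 2)]
away_total = grand - home_total
half2_total = totals[("home", 2)] + totals[("away", 2)]
half1_total = grand - half2_total

# Model 1 (no interaction): fitted cell totals are row total * column total / grand total
model1 = {
    "intercept": log(away_total * half1_total / grand / N_HALVES),
    "home": log(home_total / away_total),
    "half2": log(half2_total / half1_total),
}

# Model 2 (with interaction): saturated, so fitted means equal observed cell means
m = {cell: total / N_HALVES for cell, total in totals.items()}
model2 = {
    "intercept": log(m[("away", 1)]),
    "home": log(m[("home", 1)] / m[("away", 1)]),
    "half2": log(m[("away", 2)] / m[("away", 1)]),
    "home*half2": log(m[("home", 2)] * m[("away", 1)]
                      / (m[("home", 1)] * m[("away", 2)])),
}
```

With unequal cell sizes or additional covariates these shortcuts no longer apply and an iterative fit is needed.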

Page 9

Parameter Estimates / Model Fit – Model 1

Distribution                      Poisson
Link Function                     Log
Dependent Variable                goals
Number of Observations Read       1520
Number of Observations Used       1520

Criteria For Assessing Goodness Of Fit

Criterion              DF         Value     Value/DF
Deviance             1517     1650.4574       1.0880
Scaled Deviance      1517     1650.4574       1.0880
Pearson Chi-Square   1517     1549.2570       1.0213
Scaled Pearson X2    1517     1549.2570       1.0213
Log Likelihood               -1411.0226

Algorithm converged.

Page 10

Parameter Estimates / Model Fit – Model 1

Analysis Of Parameter Estimates

                          Standard    Wald 95% Confidence      Chi-
Parameter   DF  Estimate     Error          Limits            Square   Pr > ChiSq
Intercept    1   -0.6397    0.0588    -0.7549    -0.5245      118.48       <.0001
home         1    0.2624    0.0634     0.1381     0.3866       17.12       <.0001
half2        1    0.1783    0.0631     0.0546     0.3020        7.98       0.0047
Scale        0    1.0000    0.0000     1.0000     1.0000

NOTE: The scale parameter was held fixed.

Page 11

Parameter Estimates / Model Fit – Model 2

Criteria For Assessing Goodness Of Fit

Criterion              DF         Value     Value/DF
Deviance             1516     1650.3613       1.0886
Scaled Deviance      1516     1650.3613       1.0886
Pearson Chi-Square   1516     1549.7072       1.0222
Scaled Pearson X2    1516     1549.7072       1.0222
Log Likelihood               -1410.9745

Algorithm converged.

Page 12

Parameter Estimates / Model Fit – Model 2

Analysis Of Parameter Estimates

                           Standard    Wald 95% Confidence      Chi-
Parameter    DF  Estimate     Error          Limits            Square   Pr > ChiSq
Intercept     1   -0.6519    0.0711    -0.7912    -0.5126       84.15       <.0001
home          1    0.2839    0.0941     0.0995     0.4683        9.10       0.0026
half2         1    0.2007    0.0958     0.0129     0.3885        4.39       0.0363
home*half2    1   -0.0395    0.1274    -0.2891     0.2101        0.10       0.7566
Scale         0    1.0000    0.0000     1.0000     1.0000

NOTE: The scale parameter was held fixed.

Page 13

Testing for Home/Half2 Interaction

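• H0: No Home x Half2 Interaction ($\alpha_{Home*Half2} = 0$)
• HA: Home x Half2 Interaction ($\alpha_{Home*Half2} \neq 0$)
• Test 1 – Wald Test
• Test 2 – Likelihood Ratio Test

Wald Test:

$$\text{T.S.: } X^2_{obs} = \left(\frac{\hat{\alpha}_{Home*Half2}}{SE(\hat{\alpha}_{Home*Half2})}\right)^2 = \left(\frac{-0.0395}{0.1274}\right)^2 = 0.0961 \qquad P = P(\chi^2_1 \geq 0.0961) = 0.7566$$

Likelihood Ratio Test:

$$\text{T.S.: } X^2_{obs} = -2\log(\text{likelihood}(H_0)) - \left(-2\log(\text{likelihood}(H_A))\right) = -2(-1411.0226) - \left(-2(-1410.9745)\right) = 0.0962 \qquad P = P(\chi^2_1 \geq 0.0962) = 0.7564$$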
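Both interaction tests can be reproduced from the reported estimate, standard error, and log likelihoods. The sketch below uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals `erfc(sqrt(x/2))`; the helper name `chi2_sf_1df` is an assumption of this example.

```python
from math import erfc, sqrt

def chi2_sf_1df(x):
    """Upper-tail probability P(chi-square with 1 df >= x)."""
    return erfc(sqrt(x / 2))

# Wald test: squared ratio of the interaction estimate to its standard error
est, se = -0.0395, 0.1274
wald = (est / se) ** 2
wald_p = chi2_sf_1df(wald)

# Likelihood ratio test: difference in -2 log likelihood, Model 1 (H0) vs Model 2 (HA)
loglik_h0 = -1411.0226
loglik_ha = -1410.9745
lr = -2 * loglik_h0 - (-2 * loglik_ha)
lr_p = chi2_sf_1df(lr)
```

The two tests agree closely here, as expected for a large sample with a small effect.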

Page 14

Testing for Main Effects for Home & Half2

• Wald tests only reported here (both effects are very significant)

• Tests based on Model 1 (no interaction model)

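Home Effect: $H_0: \alpha_{Home} = 0 \qquad H_A: \alpha_{Home} \neq 0$

$$\text{T.S.: } X^2_{obs} = \left(\frac{0.2624}{0.0634}\right)^2 = 17.13 \qquad P = P(\chi^2_1 \geq 17.13) < .0001$$

Half2 Effect: $H_0: \alpha_{Half2} = 0 \qquad H_A: \alpha_{Half2} \neq 0$

$$\text{T.S.: } X^2_{obs} = \left(\frac{0.1783}{0.0631}\right)^2 = 7.98 \qquad P = P(\chi^2_1 \geq 7.98) = .0047$$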

Page 15

Interpreting the GLM

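Model:

$$E(Y \mid Home, Half2) = e^{\alpha_0 + \alpha_{Home} Home + \alpha_{Half2} Half2}$$

$$\text{Away/Half1 } (Home=0, Half2=0): \quad E(Y) = e^{\alpha_0}$$

$$\text{Home/Half1 } (Home=1, Half2=0): \quad E(Y) = e^{\alpha_0 + \alpha_{Home}} = e^{\alpha_0} e^{\alpha_{Home}}$$

$$\text{Away/Half2 } (Home=0, Half2=1): \quad E(Y) = e^{\alpha_0 + \alpha_{Half2}} = e^{\alpha_0} e^{\alpha_{Half2}}$$

$$\text{Home/Half2 } (Home=1, Half2=1): \quad E(Y) = e^{\alpha_0 + \alpha_{Home} + \alpha_{Half2}} = e^{\alpha_0} e^{\alpha_{Home}} e^{\alpha_{Half2}}$$

Estimated Means:

$$\text{Away/Half1: } e^{\hat{\alpha}_0} = e^{-0.6397} = 0.5275$$

$$\text{Home/Half1: } e^{\hat{\alpha}_0 + \hat{\alpha}_{Home}} = e^{-0.6397 + 0.2624} = 0.53(1.30) = 0.686$$

$$\text{Away/Half2: } e^{\hat{\alpha}_0 + \hat{\alpha}_{Half2}} = e^{-0.6397 + 0.1783} = 0.53(1.20) = 0.630$$

$$\text{Home/Half2: } e^{\hat{\alpha}_0 + \hat{\alpha}_{Home} + \hat{\alpha}_{Half2}} = e^{-0.6397 + 0.2624 + 0.1783} = 0.686(1.20) = 0.820$$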
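The estimated means follow directly from exponentiating the linear predictor of Model 1. A minimal sketch, with illustrative names, using the reported coefficients:

```python
from math import exp

# Model 1 estimates on the log scale (from the parameter estimates table)
coef = {"intercept": -0.6397, "home": 0.2624, "half2": 0.1783}

def expected_goals(home, half2):
    """Estimated mean goals in a half; home and half2 are 0/1 dummies."""
    return exp(coef["intercept"] + coef["home"] * home + coef["half2"] * half2)

means = {
    "Away/Half1": expected_goals(0, 0),
    "Home/Half1": expected_goals(1, 0),
    "Away/Half2": expected_goals(0, 1),
    "Home/Half2": expected_goals(1, 1),
}

# Multiplicative effects on the goal rate
home_multiplier = exp(coef["home"])    # about 1.30
half2_multiplier = exp(coef["half2"])  # about 1.20
```

Because the link is log, each dummy multiplies the expected goal rate by a constant factor rather than adding a constant amount.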

Page 16

Incorporating Random (Team) Effects

• Teams clearly vary in terms of offensive and defensive skills (see slide 3)

• Since many factors are inputs into team abilities (players, coaches, chemistry), we will treat team offensive and defensive effects as Random

• There will be 20 random offensive effects (one per team) and 20 defensive effects

Page 17

Random Team Effects

• All effects are on log scale for goals scored

• Offense Effects: $o_i \sim NID(0, \sigma_o^2)$

• Defense Effects: $d_i \sim NID(0, \sigma_d^2)$

• In the estimation process we assume $COV(o_i, d_i) = 0$, which seems a stretch (but we can still “observe” the covariance of the estimated random effects)

Page 18

Mixed Effects Model

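• Fixed Effects: Intercept, Home, Half2 ($\alpha$)
• Random Effects: Offteam, Defteam ($\beta$)
• Conditional Model (on Random Effects)

$$\log(\mu_{ijkl}) = \alpha_0 + \alpha_{Home} Home_i + \alpha_{Half2} Half2_j + \beta_{Off,k} + \beta_{Def,l}$$

$$i = 1,2 \qquad j = 1,2 \qquad k = 1,\dots,20 \qquad l = 1,\dots,20$$

$$Home_1 = 0,\ Home_2 = 1 \qquad Half2_1 = 0,\ Half2_2 = 1$$

$$\alpha_0 \equiv \text{Intercept} \qquad \alpha_{Home} \equiv \text{Home Effect} \qquad \alpha_{Half2} \equiv \text{2nd Half effect}$$

$$\beta_{Off,k} \equiv \text{Offense Effect for Team } k \qquad \beta_{Def,l} \equiv \text{Defense Effect for Team } l$$

$$\beta_{Off,k} \sim NID(0, \sigma_o^2) \qquad \beta_{Def,l} \sim NID(0, \sigma_d^2) \qquad COV(\beta_{Off,k}, \beta_{Def,l}) = 0$$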

Page 19

Model in Matrix Notation - Example

$$Y = \mu + e \qquad g(\mu) = \log(\mu) = X\alpha + Z\beta = X\alpha + Z_O\beta_O + Z_D\beta_D$$

League has 3 Teams: A, B, C

Order of Entry of Games: A@B, A@C, B@C, B@A, C@A, C@B

Order of Entry of Scores within Game: Home/1st, Away/1st, Home/2nd, Away/2nd

3 Offense Effects, 3 Defense Effects, 24 Observations

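$$\alpha = \begin{pmatrix} \alpha_0 \\ \alpha_{Home} \\ \alpha_{Half2} \end{pmatrix} \qquad \beta_O = \begin{pmatrix} \beta_{OA} \\ \beta_{OB} \\ \beta_{OC} \end{pmatrix} \qquad \beta_D = \begin{pmatrix} \beta_{DA} \\ \beta_{DB} \\ \beta_{DC} \end{pmatrix}$$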

Page 20

Model – Based on 3 Teams

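$$y = \mu + e \qquad g(\mu) = X\alpha + Z\beta = X\alpha + Z_O\beta_O + Z_D\beta_D$$

                        X                Z_O          Z_D
Game  Score      Int  Home  Half2     A  B  C      A  B  C
A@B   Home/1st     1     1      0     0  1  0      1  0  0
A@B   Away/1st     1     0      0     1  0  0      0  1  0
A@B   Home/2nd     1     1      1     0  1  0      1  0  0
A@B   Away/2nd     1     0      1     1  0  0      0  1  0
A@C   Home/1st     1     1      0     0  0  1      1  0  0
A@C   Away/1st     1     0      0     1  0  0      0  0  1
A@C   Home/2nd     1     1      1     0  0  1      1  0  0
A@C   Away/2nd     1     0      1     1  0  0      0  0  1
B@C   Home/1st     1     1      0     0  0  1      0  1  0
B@C   Away/1st     1     0      0     0  1  0      0  0  1
B@C   Home/2nd     1     1      1     0  0  1      0  1  0
B@C   Away/2nd     1     0      1     0  1  0      0  0  1
B@A   Home/1st     1     1      0     1  0  0      0  1  0
B@A   Away/1st     1     0      0     0  1  0      1  0  0
B@A   Home/2nd     1     1      1     1  0  0      0  1  0
B@A   Away/2nd     1     0      1     0  1  0      1  0  0
C@A   Home/1st     1     1      0     1  0  0      0  0  1
C@A   Away/1st     1     0      0     0  0  1      1  0  0
C@A   Home/2nd     1     1      1     1  0  0      0  0  1
C@A   Away/2nd     1     0      1     0  0  1      1  0  0
C@B   Home/1st     1     1      0     0  1  0      0  0  1
C@B   Away/1st     1     0      0     0  0  1      0  1  0
C@B   Home/2nd     1     1      1     0  1  0      0  0  1
C@B   Away/2nd     1     0      1     0  0  1      0  1  0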
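The three matrices for the 3-team example can be generated mechanically from the stated order of games and of scores within a game. The sketch below is one way to do it in plain Python; the variable names are illustrative, and "A@B" is read as team A playing away at team B.

```python
teams = ["A", "B", "C"]

# Games in order of entry, written as (away team, home team): A@B, A@C, B@C, B@A, C@A, C@B
games = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "A"), ("C", "A"), ("C", "B")]

X, Z_O, Z_D = [], [], []
for away, home in games:
    # Scores within a game: Home/1st, Away/1st, Home/2nd, Away/2nd
    for half2 in (0, 1):
        for is_home in (1, 0):
            offense, defense = (home, away) if is_home else (away, home)
            X.append([1, is_home, half2])                    # Intercept, Home, Half2
            Z_O.append([int(t == offense) for t in teams])   # offense team indicator
            Z_D.append([int(t == defense) for t in teams])   # defense team indicator
```

Each row of `Z_O` and `Z_D` has exactly one 1, marking which team's offense effect and which team's defense effect enter that observation's linear predictor.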

Page 21

Sequence of Potential Models

1. No fixed or random effects (common mean)

2. Fixed home and second half effects, no random effects

3. Fixed home and second half effects, random offense team effects

4. Fixed home and second half effects, random defense team effects

5. Fixed home and second half effects, random offense and defense team effects

Page 22

Results – Estimates (P-Values)

Model  Intercept        Home            Half2           σ²o              σ²d             σ²Res    -2lnL    AIC      BIC
1      -.407 (.0001)    N/A             N/A             N/A              N/A             1.044    5001.9   5003.9   5009.3
2      -.6397 (.0001)   .2624 (.0001)   .1783 (.0052)   N/A              N/A             1.0213   4992.3   4994.3   4999.6
3      -.6413 (.0001)   .2624 (.0001)   .1783 (.0050)   .01004 (.143*)   N/A             1.0099   4985.6   4989.6   4991.6
4      -.6592 (.0001)   .2624 (.0001)   .1783 (.0040)   N/A              .0588 (.012*)   0.9630   4958.6   4962.6   4964.6
5      -.6605 (.0001)   .2624 (.0001)   .1783 (.0039)   .0084 (.162*)    .0549 (.012*)   0.9531   4951.9   4957.9   4960.9

* Based on Z-test, not the preferred Likelihood Ratio Test

• H0: σ²o = 0 vs HA: σ²o > 0 TS: 4958.6-4951.9=6.7 P=0.5P(χ²1 ≥ 6.7)=.005

• Based on AIC, BIC, Model with both offense and defense effects is best

• No interaction found between team effects and home or half2
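The likelihood ratio test for the offense variance component compares the -2lnL of Model 4 (defense effects only) with that of Model 5 (offense and defense effects). Because the null value of a variance lies on the boundary of the parameter space, the slide halves the chi-square tail probability. A minimal sketch of that arithmetic, with illustrative variable names:

```python
from math import erfc, sqrt

neg2lnl_defense_only = 4958.6   # Model 4
neg2lnl_both = 4951.9           # Model 5

ts = neg2lnl_defense_only - neg2lnl_both

# P(chi-square with 1 df >= ts) equals erfc(sqrt(ts / 2)); halve it for the
# one-sided test of a variance component on the boundary (variance = 0)
p_value = 0.5 * erfc(sqrt(ts / 2))
```

The resulting P-value of about .005 is far smaller than the Z-test P-value of .162 reported in the table for the same variance component, which is why the slide calls the likelihood ratio test the preferred one.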

Page 23

Goodness of Fit

• We test whether the Poisson GLMM is an appropriate model by means of the Scaled Deviance
• H0: Model Fits HA: Model Lacks Fit
• Deviance = 1570.7
• DF = N-#fixed parms = 1520-3=1517
• P-value=P(χ² ≥ 1570.7)=0.1646
• No Evidence of Lack-of-Fit*
• * If we use Scaled Deviance, we do reject, where scaled deviance=1570.7/0.9531=1647.9
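The deviance P-value can be checked approximately without statistical software. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is a technique swapped in for this example (the slides do not say how the P-value was computed); it is accurate for large degrees of freedom such as 1517. The function name is hypothetical.

```python
from math import erfc, sqrt

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x).

    Wilson-Hilferty: (X/df)^(1/3) is roughly normal with
    mean 1 - 2/(9 df) and variance 2/(9 df).
    """
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / sqrt(2 / (9 * df))
    return 0.5 * erfc(z / sqrt(2))   # standard normal upper tail

df = 1520 - 3
p_deviance = chi2_sf_wilson_hilferty(1570.7, df)

scaled_deviance = 1570.7 / 0.9531
p_scaled = chi2_sf_wilson_hilferty(scaled_deviance, df)
```

Under this approximation the unscaled deviance gives a P-value near 0.165 (no evidence of lack of fit), while the scaled deviance gives a P-value of roughly 0.01, consistent with the slide's remark that the scaled version does reject.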

Page 24

Best Linear Unbiased Predictors (BLUPs)

Estimated Team (Random) Effects
(Teams with High Defense values Allow More Goals)

Team            Off Effect  Def Effect    Team                Off Effect  Def Effect
Arsenal             0.1284     -0.4016    Leicester City         -0.0120      0.2112
Aston Villa        -0.0170     -0.0873    Liverpool               0.0240     -0.2018
Birmingham         -0.0469     -0.0262    Manchester City         0.0281      0.0649
Blackburn           0.0049      0.1333    Manchester United       0.0775     -0.2348
Bolton             -0.0142      0.0914    Middlesbrough          -0.0398      0.0335
Charlton            0.0030      0.0205    Newcastle               0.0065     -0.1516
Chelsea             0.0941     -0.3255    Portsmouth             -0.0208      0.0630
Everton            -0.0325      0.1046    Southampton            -0.0414     -0.0724
Fulham              0.0079     -0.0549    Tottenham              -0.0201      0.1050
Leeds United       -0.0582      0.3758    Wolverhampton          -0.0712      0.3529

Estimated Fixed Effects

Parameter   Estimate
Intercept    -0.6605
Home          0.2624
Half2         0.1783

For each Half_ijkl compute exp{-0.6605 + 0.2624·HOME_i + 0.1783·HALF2_j + o_k + d_l} as the BLUP, where o_k is the offense effect of the scoring team and d_l is the defense effect of its opponent
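As an example of the BLUP formula, the sketch below predicts goals for two halves using the estimated fixed effects and the tabulated team effects for Arsenal and Leeds United. The function name `blup` and the choice of teams are illustrative.

```python
from math import exp

fixed = {"intercept": -0.6605, "home": 0.2624, "half2": 0.1783}

# Estimated random effects for two teams, copied from the BLUP table
off_effect = {"Arsenal": 0.1284, "Leeds United": -0.0582}
def_effect = {"Arsenal": -0.4016, "Leeds United": 0.3758}

def blup(offense, defense, home, half2):
    """Predicted goals in a half for `offense` scoring against `defense`."""
    eta = (fixed["intercept"] + fixed["home"] * home + fixed["half2"] * half2
           + off_effect[offense] + def_effect[defense])
    return exp(eta)

# Second half of a game played at Arsenal
arsenal_home_2nd = blup("Arsenal", "Leeds United", home=1, half2=1)
leeds_away_2nd = blup("Leeds United", "Arsenal", home=0, half2=1)
```

Under these estimates Arsenal is predicted to score about 1.33 goals in that half and Leeds United about 0.39, which shows how the offense effect of one team and the defense effect of the other combine on the log scale.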

Page 25

Comparison of BLUPs with Actual Scores

• For Each Team Half, we have Actual and BLUP
• Correlation Between Actual & BLUP = 0.2655
• Concordant Pairs of Halves (One scores higher on both Actual and BLUP than other) = 452471
• Discordant Pairs of Halves = 355617
• “Gamma” = (452471-355617)/(452471+355617)=0.1199
• Evidence of Some Positive Association Between actual and predicted scores
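The gamma statistic is the difference between concordant and discordant pairs divided by their sum, with tied pairs ignored. The sketch below shows the calculation for the counts on the slide and, on a small made-up data set (the `toy_` values are invented purely for illustration), how concordant and discordant pairs are counted.

```python
def goodman_kruskal_gamma(concordant, discordant):
    """Gamma = (C - D) / (C + D); pairs tied on either variable are excluded."""
    return (concordant - discordant) / (concordant + discordant)

def count_pairs(actual, predicted):
    """Count concordant and discordant pairs by brute force, O(n^2)."""
    conc = disc = 0
    n = len(actual)
    for i in range(n):
        for j in range(i + 1, n):
            s = (actual[i] - actual[j]) * (predicted[i] - predicted[j])
            if s > 0:
                conc += 1      # same ordering on actual and predicted
            elif s < 0:
                disc += 1      # opposite ordering
    return conc, disc

gamma_slide = goodman_kruskal_gamma(452471, 355617)

# Invented toy data: actual goals and predicted means for four halves
toy_actual = [0, 1, 2, 0]
toy_pred = [0.4, 0.7, 0.6, 0.5]
toy_conc, toy_disc = count_pairs(toy_actual, toy_pred)
toy_gamma = goodman_kruskal_gamma(toy_conc, toy_disc)
```

With 1520 halves there are 1,154,460 possible pairs, so the 808,088 concordant and discordant pairs on the slide imply that the remaining pairs are tied, mostly because actual goals take only a few distinct values.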

Page 26

"Distribution" of BLUPs by Actual Goals Scored

[Figure: Normal Density (y-axis, 0 to 3) versus BLUP (x-axis, 0 to 1.4), with one series for each actual goals category: 0, 1, 2, 3+]

Sources:

Data: SoccerPunter.com

Methods:

Littell, Milliken, Stroup, Wolfinger (1996). “SAS System for Mixed Models”

Wolfinger, R. and M. O’Connell (1993). “Generalized Linear Mixed Models: A Pseudo-Likelihood Approach,” J. Statist. Comput. Simul., Vol. 48, pp. 233-243.

Page 27

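SAS Code

/* Read one record per team-half; columns are fixed-width */
data one;
  infile 'engl2003d.dat';
  input hteam $ 1-20 rteam $ 21-40 goals 47-48 half2 56 home 64 round 71-73;
  /* The scoring side is the offense; its opponent is the defense */
  if home=1 then do; offteam=hteam; defteam=rteam; end;
  else do; offteam=rteam; defteam=hteam; end;
run;

/* Load the GLIMMIX macro (pseudo-likelihood fitting of the GLMM) */
%include 'glmm800.sas';

/* Note: the call reads data=two, but only data set one is created in the code shown */
%glimmix(data=two, procopt=method=reml,
  stmts=%str(
    class offteam defteam;
    model goals = home half2 / s;
    random offteam defteam / s;
  ),
  error=poisson, link=log);

run;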