using soccer goals to motivate the poisson process

7

Click here to load reader

Upload: crumpiteer

Post on 12-Nov-2014

977 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Using Soccer Goals to Motivate the Poisson Process

CHUUsing Soccer Goals to Motivate the Poisson Process

Using Soccer Goals to Motivate the Poisson Process

Singfat CHUNUS Business School

National University of Singapore1 Business Link

SINGAPORE 117592

[email protected]

ABSTRACT

In introducing analytic queuing models, textbooksrarely justify the assumption of the exponential andthe Poisson distributions. This paper demonstrateshow a real-life phenomenon familiar to many students,namely the occurrences of soccer goals, can drivethe ideas. An Excel worksheet is made available forfurther analysis.

Editor’s note: This is a pdf copy of an html documentwhich resides athttp://ite.pubs.informs.org/Vol3No2/Chu/

1 INTRODUCTION

A standard topic in OR and OM courses is the man-agement of queues. Analytics developed for tractablequeuing models assume random arrivals according toa Poisson process with constant rateλ . Equivalently,inter-arrival times are said to be independent andto follow an exponential distribution with mean 1 /λ .The appropriateness of the exponential and Poissondistributions, their linkage and their properties whichlead to simple analytics, often escape our students astextbooks rarely provide empirical evidence to justifythem.

Recently, Grossman (1999) expounded on the utilityof process-driven spreadsheet queuing simulation tohelp students experience queue dynamics. Inter-arrivaltimes and service times (Figure 2, in the paper) aregenerated according to parameters specified by theuser. To appreciate the practicality of the simulationexercise, the user must therefore have a feel for therelevance of the exponential distribution used to gener-ate the inter-arrival times. This simulation approach,coupled with some analytical insights, is undoubtedly

fruitful to the pedagogy.

Experience has shown that analytical concepts arebest driven by real-life illustrations. An early ex-ample of this pedagogical paradigm is Lafleur et al.(1972) who report a student project which modeledthe number of individuals encountered during theapproximately 30 seconds required to walk through adormitory entrance as a Poisson distribution. Anotherexample is Schmuland (2001), who uses the Poissonmodel to explain the phenomena of bursts in sharkattacks and the scoring patterns of ice hockey legendWayne Gretzky.

Based on his observation that the Poisson distri-bution provides a good fit for goals scored in icehockey games, Berry (2000) assumes an exponentialdistribution for the times between goals to estimate thestrategic time to ”pull the goalie” when a team is downin a game. This problem was recently re-visited byZaman (2002) from a Markov Chain angle.

The link between the Poisson and exponentialdistributions can only be illustrated empirically if dataon the sequential timing of the random occurrencesare available. Call centers, for example, often collectstatistics on the number of customer calls receivedand not necessarily the times of individual calls. Thesame situation may prevail at toll booths where onlythe rates of car arrivals may be recorded. Likewise,Larsen and Marx (1981) have an interesting case studyon the incidence of war outbreaks during the period1500-1931. But they also do not provide data on wheneach war started.

This paper makes available a dataset on the tim-ing of a competitive phenomenon, namely the scoringof soccer goals in a sequence of 232 games. Soccer,with its global following, provides an excellent channelto sell the Poisson and the exponential distributionsto our students. The paper will demonstrate theirrelevance, their inter-relationship, their properties andtheir implication towards a better understanding ofthe soccer game. The occurrences of touchdownsin American football or home runs in baseball insequential games played by a particular team or playercan also lend themselves to a similar analysis.

INFORMS Transaction on Education3:2 (62-68) 62 c©INFORMS ISSN: 1532-0545

Page 2: Using Soccer Goals to Motivate the Poisson Process

CHUUsing Soccer Goals to Motivate the Poisson Process

2 THE WORLD CUP DATASET

The World Cup tournament pitching the best nationalteams is the ultimate in the soccer world. It is heldevery four years. South Korea and Japan co-hosted thelatest tournament between 31 May and 30 June 2002.That tournament involved 32 countries, which were ini-tially divided into 8 groups of four. In this first stage,each country played against each of its 3 group peers.The top 2 countries within each group then advancedto a knockout second stage. Ultimately, Brazil beatGermany 2-0 in the 64th and final game.

In 1998, the same tournament format was used inFrance. The host country was ultimately crowned asthe champion. However, when Italy and USA respec-tively hosted the World Cup in 1990 and 1994, only 24countries were showcased. In these two gatherings, therespective winners, Germany and Brazil, emerged after52 games.

Data on all the goal occurrences in these four WorldCup tournaments are available atFifa’s World Cupwebsite. Previous tournaments are ignored as the web-site does not provide information on the sequence ofgames. As such, they would not help in the illustrationof the exponential distribution.

This paper therefore focuses on the 232 games playedin the 1990-2002 World Cup tournaments. Only thegoals scored in the 90 minutes regulation time are con-sidered. This leaves out goals scored in extra time or inpenalty shoot-outs. A regular soccer game consists oftwo halves scheduled for 45 minutes. However, injurytime is often added at the end of each half to compen-sate for game stoppages arising from player injuries.The extent of injury time in each game is unfortu-nately not available. For consistency in the analysis,a goal scored in injury time, say at the 92nd minute,is recorded as occurring at the 90th minute. This isbecause until 1998, goals scored in added times werealways recorded at either the 45th or 90th minute. Thismay have some effect on the fit of the Poisson and ex-ponential distributions to the data.

The first game in Italy90 saw Cameroon scoring a sin-gle goal against Argentina at the 67th minute. In thesecond game, Romania scored 2 goals against the thenSoviet Union at the 42nd and 57th minutes. The time tothese goals are respectively 65(= (90−67)+42) and15 (= 57−42) minutes. Proceeding in the same way,574 inter-goal times were obtained by the end of the2002 Final game (game 232). Figure 1 illustrates thecomputation of the time between goals.

Figure 1: Illustration of time between goals

3 MOTIVATING THE EXPONENTIAL DIS-TRIBUTION

3.1 Mean and standard deviation coincide

The suitability of an exponential fit to the 574 inter-goal time intervals can be demonstrated in many ways.For example, a special property of the exponential dis-tribution is that its mean equals its standard deviation.These two statistics were found to be respectively 36.25and 36.68 minutes. These compare favorably againsttheir common theoretical expectation, namely 90/λ or36.31 minutes whereλ = 575/232 is the mean number

of goals scored per 90-minutes regulation game.

3.2 Chi-square test for the exponential fit

A more thorough validation is to compare the empir-ical distributions of the times between goals againstan exponential distribution with the estimated mean.For example, the theoretical probability of observingan inter-goal time interval between 0 and 10 minutesis 0.2407(= 1− exp(λ ∗ 10/90)). This implies thatamong 574 inter-goal intervals, we would expect about138 (= 574∗ .2407) to lie between 0 and 10 minutes.

INFORMS Transaction on Education3:2 (62-68) 63 c©INFORMS ISSN: 1532-0545

Page 3: Using Soccer Goals to Motivate the Poisson Process

CHUUsing Soccer Goals to Motivate the Poisson Process

Inter-goal Actual Empirical Theoretical ExpectedDuration (minutes) Probability Probability

0-10 144 0.2504 0.2407 13810-20 106 0.1843 0.1828 10520-30 86 0.1496 0.1388 8030-40 52 0.0904 0.1054 6040-50 46 0.0800 0.0800 4650-60 27 0.0470 0.0607 3560-70 35 0.0626 0.0461 2670-80 16 0.0278 0.0350 2080-90 22 0.0383 0.0266 1590-100 12 0.0209 0.0202 12100-110 3 0.0052 0.0153 9110-120 3 0.0052 0.0116 7120-130 6 0.0104 0.0088 5

130 or more 16 0.0278 0.0279 16Total 574 1 1 574

Table 1: Time between goals

Figure 2: Time between goals

The actual number of intervals was 144. Table 1 reportssimilar calculations for other intervals while Figure 2provides a graphical comparison.

The exponential fit to the actual distribution of timebetween goals can be assessed by a chi-square test.This returns a p-value of 0.2008 thereby suggestingthat the suitability of the exponential fit to the data cannot be rejected. This interfacing with Statistics is aneffective means to illustrate the link between differentdisciplines to the students.

3.3 Memoryless Property

The dataset also allows for a demonstration that the ex-ponential distribution lacks memory. Mathematically,it is written as P(T> t + s — T> t) = P(T> s), whereT is the time between consecutive goals. This impliesthat if the last goal occurred t minutes ago, then thechance of observing the next goal beyond the next sminutes only depends on s and bears no relationshipto t. In other words, the relative chance is a functionof the length of the time interval s irrespective of therelative location of the origin t.

INFORMS Transaction on Education3:2 (62-68) 64 c©INFORMS ISSN: 1532-0545

Page 4: Using Soccer Goals to Motivate the Poisson Process

CHUUsing Soccer Goals to Motivate the Poisson Process

Using Table 1 we find, for example, that empiricallyP(T > 10) = 1-144/574 = 0.7491 while P(T> 20 — T> 10) = (574 - 144 - 106) / (574 - 144) = 0.7534 andlikewise, P(T> 30 — T > 20) = 0.7346 etc. The em-pirical probabilities are again close to their theoreticalvalue of 0.7593, assuming thatλ = 575/232.

Alternatively, a soccer-centric approach to motivatethe memoryless property of the exponential distribu-tion is to ask the following, ”Given that the last goalwas scored 20 minutes ago, what is the chance of ob-serving a goal within the next 10 minutes?” The abovecalculations indicate that this probability is about 0.25.

More examples can be developed from Table 1 to illus-trate that no matter where a new origin is positioned,the exponential distribution regenerates itself as if theorigin is at time 0. Intuitively, this implies that the rel-ative variability or the coefficient of variation in both

the conditional and unconditional distributions mustbe the same. The exponential distribution satisfies thisrequirement since its mean and standard deviation co-incide in theory.

3.4 Independence of inter-goal times

Finally, another assumption in queuing theory is thatinter-arrival times are independent. To justify this, weshow that the autocorrelation function of the times be-tween goals does not depart significantly from 0. Thisis indeed the case as none of the autocorrelations listedin Table 2 lie outside a band of plus or minus twice thestandard error i.e. 2/

√574 or 0.0835.

The first-order or serial correlation can be visualizedvia a scatter plot of consecutive inter-goal times, asdepicted in Figure 3.

Lag 1 2 3 4 5 6 7 8 9Autocorrelation -0.0112 0.0199 -0.0564 0.0063 0.0058 0.0065 -0.0301 0.0249 -0.0387

Table 2: Autocorrelations for time between goals

Figure 3: Serial correlation of time between goals

4 ILLUSTRATING THE POISSON DISTRI-BUTION

4.1 Mean and Variance coincide

If the times between goals are exponentially dis-tributed, then the number of goals within a fixed period

should follow a Poisson distribution. The estimatedrate λ is 575/232 or about 2.4784 goals per 90 min-utes regulation game. The variance of the number ofgoals per game is found to be 2.4584. This is in closeconformance to the theoretical result that the mean andthe variance of a Poisson distribution coincide. The

INFORMS Transaction on Education3:2 (62-68) 65 c©INFORMS ISSN: 1532-0545

Page 5: Using Soccer Goals to Motivate the Poisson Process

CHUUsing Soccer Goals to Motivate the Poisson Process

fit of the Poisson distribution can be further assessedby benchmarking a theoretical frequency distributionof goals per game against the actual frequencies.

4.2 Chi-square test for Poisson Fit

The expected number of games with x goals is obtainedby multiplying the Poisson probability of x goals by

232, the total number of matches. The results are dis-played in Table 3. A chi-square test fails to reject thevalidity of the Poisson fit (p-value=0.9745).

A visual contrast of the actual distribution of goalsagainst the Poisson fit is provided in Figure 4.

INFORMS Transaction on Education3:2 (62-68) 66 c©INFORMS ISSN: 1532-0545

#Goals Actual Empirical Theoretical Expected# Games Probability Probability # Games

0 19 0.0819 0.0839 191 49 0.2112 0.2079 482 60 0.2586 0.2576 603 47 0.2026 0.2128 494 32 0.1379 0.1319 315 18 0.0776 0.0654 15

6 or more 7 0.0302 0.0406 9Total 232 1 1 232

Table 3: Actual and theoretical Distribution of goals over 90 minute intervals

Figure 4: Actual and expected distribution of goals

Page 6: Using Soccer Goals to Motivate the Poisson Process

CHUUsing Soccer Goals to Motivate the Poisson Process

4.3 Is the Poisson rate constant?

One property of the Poisson process is that the num-ber of events within alternative time intervals is alsoPoisson but with a proportionally adjusted mean. Thevalidity of this property can be ascertained in Table 4which contrasts the actual and expected distributions

of goals in different time intervals fixed at 45, 15, 10,5 and 1 minutes. Generally, the discrepancies betweenthe actual and expected frequencies are statistically in-significant. The Poisson property of coincident meanand variance is also evident for the different time inter-vals.

Time Intervals (mins)45 15 10 5 1

#Goals Actual Expected Actual Expected Actual Expected Actual Expected Actual Expected0 141 134 910 921 1576 1585 3629 3639 20305 203131 157 167 397 380 453 437 520 501 575 5672 100 103 77 79 55 60 26 343 48 43 8 11 4 6 1 24 16 135 2 3

Total 464 1392 2088 4176 20880Mean 1.2392 1.2392 0.4131 0.4131 0.2754 0.2754 0.1377 0.1377 0.0275 0.0275

Variance 1.2580 1.2392 0.3878 0.4131 0.2639 0.2754 0.1327 0.1377 0.0267 0.0275p-value

forchi- 0.88 0.66 0.60 0.22 0.74

squaretest of

fit

Table 4: Actual and theoretical Distribution of goals over alternative time intervals. p-values obtained after consol-idating classes such that their expected count is 5 or more.

Tables 3 and 4 are useful for contrasting empirical andtheoretical probabilities pertaining to the number ofgoals. For instance, one can read off the empiricalprobability of 2 goals within 10 minutes as 55/2088 or0.0263. The probability under an ideal Poisson processwould have been 0.0288.

Soccer fans may question the constancy of the rateof goals production as follows: If a goal has just beenscored, does this alter the rate of goals scored in theremainder of the game? The empirical evidence pointsinterestingly to NO. In other words, the rate of goalproduction stays constant. Out of the 232 games, therewere 213 with at least one goal scored. The mean timeto the first goal was 33.71 minutes. This translates toa mean production of 2.7 goals over the 90 minutesduration of a game. Following the first goal, a total

of 362 goals were scored i.e. at a mean production of2.7173 goals over 90 minutes. The mean rates of goalproduction before and after the first goal do not appearto be different. In layman term, the result suggests thatafter a goal is scored, changes in strategy by the teamsare effectively neutralized. As a result, the total rate ofgoal scoring stays constant. Students could be asked toinvestigate whether this result still holds after say a 1-1or a 2-0 score.

4.4 Axioms of the Poisson Process

Table 4 also lends itself to the illustration of the axiomsof the Poisson process, namely

1. P{1 goal in interval (t, t+∆t)} = λ * ∆t + o(∆t)

2. P{0 goal in interval (t , t+∆t)} = 1 - λ * ∆t + o(∆t)

INFORMS Transaction on Education3:2 (62-68) 67 c©INFORMS ISSN: 1532-0545

Page 7: Using Soccer Goals to Motivate the Poisson Process

CHUUsing Soccer Goals to Motivate the Poisson Process

3. The numbers of goals in non-overlapping time pe-riods are independent.

As the goal occurrences are recorded in minutes, theminimal time period that can be studied is 1 minute.From Table 4, the empirical probability of a goal in a 1minute time interval is 575/22880, which is the product

of λ =575/232 and∆t=1/90. This compares favorablywith theoretical probability of 0.02716. As there wasno 1-minute interval with 2 or more goals scored, boththe first and second axioms are verified. The autocor-relation function in Table 5 supports the third axiomof independence in the number of goals in these 22880non-overlapping 1-minute time intervals.

Lag 1 2 3 4 5 6 7 8 9Autocorrelation -0.0122 -0.0086 0.0021 -0.0033 0.0128 -0.0051 0.0003 0.0092 0.0033

Table 5: Autocorrelation for number of goals in 1-minute time intervals (margin of error=0.0138)

5 CONCLUSION

This contribution demonstrates that the Poisson and ex-ponential distributions can be convincingly illustratedusing an application of interest to many soccer fansworldwide. Students have an easier time grasping theconcepts because they can see how they arise in a do-main for which they may have a deep passion. As abonus, the findings provide them with some possiblynon-intuitive insights into the game of soccer. It is awin-win for both educators and students. They are in-vited to probe the dataset to investigate further prob-lems of interest to them.

ACKNOWLEDGEMENT

I thank the two referees whose helpful feedback con-siderably improved the paper.

REFERENCES

Berry S. (2000), “A Statistician Reads the Sports Pages:My Triple Crown,” Chance, Vol. 13, No. 3, pp. 56-61.

Grossman T.A. (1999), “Spreadsheet Modeling andSimulation Improves Understanding of Queues,”Inter-faces, Vol. 29, No. 3, pp. 88-103.

Lafleur M.S., P.F. Henrichsen, P.C. Landry, and R.B.Moore (1972), “The Poisson Distribution: An Ex-perimental Approach to Teaching Statistics,”PhysicsTeacher, Vol. 10, pp. 314-321.

Larsen R.J., and M.L. Marx (1981),An Introductionto Mathematical Statistics and its Applications, Pren-tice Hall, p. 148.(Englewood Cliffs)

Schmuland B. (2001), “Shark Attacks and the PoissonApproximation,” (http://www.pims.math.ca/pi/issue4/page12-14.pdf)

Zaman Z. (2002), “Coach Markov Pulls Goalie Pois-son,”Chance, Vol. 14, No. 2, pp. 31-35.

INFORMS Transaction on Education3:2 (62-68) 68 c©INFORMS ISSN: 1532-0545