  • 7/29/2019 0804.1110


    arXiv:0804.1110v2 [physics.soc-ph] 28 Jul 2008

    Understanding Baseball Team Standings and Streaks

    C. Sire (1) and S. Redner (1, 2)

    (1) Laboratoire de Physique Théorique - IRSAMC, CNRS, Université Paul Sabatier, 31062 Toulouse, France

    (2) Center for Polymer Studies and Department of Physics, Boston University, Boston, Massachusetts 02215, USA

    Can one understand the statistics of wins and losses of baseball teams? Are their consecutive-game winning and losing streaks self-reinforcing or can they be described statistically? We apply the Bradley-Terry model, which incorporates the heterogeneity of team strengths in a minimalist way, to answer these questions. Excellent agreement is found between the predictions of the Bradley-Terry model and the rank dependence of the average number of team wins and losses in major-league baseball over the past century when team strengths are taken to be uniformly distributed over a finite range. Using this uniform strength distribution, we also find very good agreement between model predictions and the observed distribution of consecutive-game team winning and losing streaks over the last half-century; however, the agreement is less good for the previous half-century. The behavior of the last half-century supports the hypothesis that long streaks are primarily statistical in origin with little self-reinforcing component. The data further show that the past half-century of baseball has been more competitive than the preceding half-century.

    PACS numbers: 89.75.-k, 02.50.Cw

    I. INTRODUCTION

    The physics of systems involving large numbers of interacting agents is currently a thriving field of research [1]. One of its many appeals lies in the opportunity it offers to apply precise methods and tools of physics to the realm of soft science. In this respect, biological, economic, and a large variety of human systems present many examples of competitive dynamics that can be studied qualitatively or even quantitatively by statistical physics. Among them, sports competitions are particularly appealing because of the large amount of data available, their popularity, and the fact that they constitute almost perfectly isolated systems. Indeed, most systems considered in econophysics [2] or evolutionary biology [3] are strongly affected by external and often unpredictable factors. For instance, a financial model cannot predict the occurrence of wars or natural disasters which dramatically affect financial markets, nor can it include the effect of many other important external parameters (China's GDP growth, German exports, Google's profit...). On the other hand, sport leagues (soccer [4], baseball [5], football [6]...) or tournaments (basketball [7, 8], poker [9]...) are basically isolated systems that are much less sensitive to external influences. Hence, despite their intrinsic human nature, which actually contributes to their appeal, competitive sports are particularly suited to quantitative theoretical modeling. In this spirit, this work is focused on basic statistical features of game outcomes in Major-League baseball.

    In Major-League baseball, and indeed in any competitive sport, the main observable is the outcome of a single game: who wins and who loses. Then at the end of a season, the win/loss record of each team is fundamental. As statistical physicists, we are not concerned with the fates of individual teams, but rather with the average win/loss record of the 1st, 2nd, 3rd, etc. teams, as well as the statistical properties of winning and losing streaks. We concentrate on major-league baseball to illustrate statistical properties of game outcomes because of the large amount of available data [10] and the near constancy of the game rules during the so-called modern era that began in 1901.

    For non-US readers or for non-baseball fans, during the modern era of major-league baseball, teams have been divided into the nearly-independent American and National leagues [11]. At the end of each season a champion of each league is determined (the best team in each league prior to 1961, and the winner of league playoffs subsequently), and the two league champions play in the World Series to determine the overall champion. As the data will reveal, it is also useful to separate the 1901-60 early modern era, with a 154-game season and 16 teams, and the 1961-2005 expansion era, with a 162-game season in which the number of teams expanded in stages to its current value of 30, to highlight systematic differences between these two periods. Our data is based on the 163674 regular-season games that have occurred between 1901 and the end of the 2005 season (72741 between 1901-60 and 90933 between 1961-2005).

    While the record of each team can change significantly from year to year, we find that the time-averaged win/loss record of the $r$th-ranked team as a function of rank $r$ is strikingly regular. One of our goals is to understand the rank dependence of this win fraction. An important outcome of our study is that the Bradley-Terry (BT) competition model [12, 13] provides an excellent account of the team win/loss records. This agreement between the data and theory is predicated on using a specific form for the distribution of team strengths. We will argue that the best match to the data is achieved by using a uniform distribution of team strengths in each season.


    Another goal of this work is to understand the statistical features of consecutive-game team winning and losing streaks. The existence of long streaks of all types of exceptional achievement in baseball, as well as in most competitive sports, has been well documented [14] and continues to be the source of analysis and debate among sports fans. For long consecutive-game team winning and team losing streaks, an often-invoked theme is the notion of reinforcement: a team that is on a roll is more likely to continue winning, and vice versa for a slumping team on a losing streak. The question of whether streaks are purely statistical or self-reinforcing continues to be vigorously debated [15]. Using the BT model and our inferred uniform distribution of team strengths, we compute the streak length distribution. We find that the theoretical prediction agrees extremely well with the streak data during 1961-2005. However, there is a slight discrepancy between theory and the tail of the streak distribution during 1901-60, suggesting that non-statistical effects may have played a role during this early period.

    As a byproduct of our study, we find clear evidence that baseball has been more competitive during 1961-2005 than during 1901-60, a feature that has been found previously [16]. The manifestation of this increased competitiveness is that the range of team records and the length of streaks were narrower during the latter period. This observation fits with the general principle [17] that outliers become progressively rarer in a highly competitive environment. Consequently, extremes of achievement become less and less likely to occur.

    II. STATISTICS OF THE WIN FRACTION

    A. Bradley-Terry Model

    Our starting point to account for the win/loss records of all baseball teams is the BT model [12, 13] that incorporates the heterogeneity in team strengths in a natural and simple manner. We assume that each team has an intrinsic strength $x_i$ that is fixed for each season. The probability that a team of strength $x_i$ wins when it plays a team of strength $x_j$ is simply

    \[ p_{ij} = \frac{x_i}{x_i + x_j}. \tag{1} \]

    Thus the winning probability depends continuously on the strengths of the two competing teams [18]. When two equal-strength teams play, each team has a 50% probability to win, while if one team is much stronger, then its winning probability approaches 1.
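    As a minimal illustration of Eq. (1), the sketch below evaluates the single-game winning probability for two strengths. The function name `bt_win_probability` is a hypothetical label chosen here, not something taken from the paper.

```python
def bt_win_probability(x_i: float, x_j: float) -> float:
    """Probability that a team of strength x_i beats a team of strength x_j, Eq. (1)."""
    if x_i <= 0 or x_j <= 0:
        raise ValueError("team strengths must be positive")
    return x_i / (x_i + x_j)


# Equal strengths give an even game; only the ratio of strengths matters.
print(bt_win_probability(1.0, 1.0))
print(bt_win_probability(2.0, 6.0), bt_win_probability(1.0, 3.0))
```

    Because only the ratio of the two strengths enters, rescaling every strength by a common factor leaves all probabilities unchanged.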

    The form of the winning probability of Eq. (1) is quite general. Indeed, we can replace the team strength $x_i$ by any monotonic function $f(x_i)$. The only indispensable attribute is the ordering of the team strengths. Thus the notion of strength is coupled to the assumed form of the winning probability. If we make a hypothesis about one of these quantities, then the other is no longer a variable that we are free to choose, but an outcome of the model. In our analysis, we adopt the form of the winning probability in Eq. (1) because of its simplicity. Then the only relevant unknown quantity is the probability distribution of the $x_i$'s. As we shall see in the next section, this distribution of team strengths can then be inferred from the season-end win/loss records of the teams, and a good fit to the data is obtained when assuming a uniform distribution of team strengths. Because only the ratio of team strengths is relevant in Eq. (1), we take team strengths to be uniformly distributed in the range $[x_{\min}, 1]$, with $0 \le x_{\min} \le 1$. Thus the only model parameter is the value of $x_{\min}$.

    For uniformly distributed team strengths $\{x_j\}$ that lie in $[x_{\min}, 1]$, the average winning fraction for a team of strength $x$ that plays a large number of games $N$, with equal frequencies against each opponent, is

    \[
    \begin{aligned}
    W(x) &= \frac{1}{N} \sum_{j=1}^{N} \frac{x}{x + x_j} \\
         &\to \frac{x}{1 - x_{\min}} \int_{x_{\min}}^{1} \frac{dy}{x + y}
          = \frac{x}{1 - x_{\min}} \ln\!\left( \frac{x + 1}{x + x_{\min}} \right),
    \end{aligned}
    \tag{2}
    \]

    where we assume $N \to \infty$ in the second line. We then transform from strength $x$ to scaled rank $r$ by $x = x_{\min} + (1 - x_{\min}) r$, with $r = 0, 1$ corresponding to the weakest and strongest team, respectively (Fig. 1). This result for the win fraction is one of our primary results.
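    The closed form of Eq. (2) is easy to evaluate directly. The sketch below is an illustration, with hypothetical function names, that maps a scaled rank to a strength, evaluates the win fraction, and checks numerically that the league-wide average is 1/2, as it must be when every game has exactly one winner.

```python
import math


def win_fraction(r: float, xmin: float) -> float:
    """Infinite-season average win fraction of Eq. (2) at scaled rank r in [0, 1]."""
    x = xmin + (1.0 - xmin) * r  # strength of the team at scaled rank r
    return x / (1.0 - xmin) * math.log((x + 1.0) / (x + xmin))


def mean_win_fraction(xmin: float, n: int = 10_000) -> float:
    """Midpoint-rule average of W over r in [0, 1]."""
    return sum(win_fraction((k + 0.5) / n, xmin) for k in range(n)) / n


for xmin in (0.278, 0.435):
    print(xmin, win_fraction(0.0, xmin), win_fraction(1.0, xmin), mean_win_fraction(xmin))
```

    The printed end-point values are those of the infinite-season, infinite-league limit, so they need not coincide with the observed extremes of the finite-season data.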

    FIG. 1: Average win fraction $W(r)$ versus scaled rank $r$ for 1901-60 and 1961-2005. For these periods, the dashed lines are simulation results for the BT model with $x_{\min} = 0.278$ and 0.435, respectively. The solid curves represent Eq. (2), corresponding to simulations for an infinitely long season and an infinite number of teams.

    FIG. 2: Convergence of $W(r)$ versus scaled rank $r$ as a function of season length for 1961-2005, using $x_{\min} = 0.435$ and 30 teams. The circles and the thick dashed curve are the baseball data and the corresponding BT model data for an $n = 162$ game season. The thin dashed lines are model data for a season of $n = 300$, 500, and 1000 games averaged over 100000 seasons. The full line corresponds to the model for an infinitely long season with 30 teams. Finally, the + symbols give the result of Eq. (2), which corresponds to an infinite-length season and an infinite number of teams.

    To check the prediction of Eq. (2), we start with a value of $x_{\min}$ and simulate $10^4$ periods of a model baseball league that consists of: (i) 16 teams that play 60 seasons of 154 games (corresponding to 1901-60) and (ii) 30 teams that play 45 seasons of 162 games (1961-2005), with uniformly distributed strengths in $[x_{\min}, 1]$ for both cases, but with different values of $x_{\min}$. Using the winning probability $p_{ij}$ of Eq. (1), we then compute the average win fraction $W(r)$ of each team as a function of its scaled rank $r$. We then incrementally update the value of $x_{\min}$ to minimize the difference between the simulated values of $W(r)$ and those from game win/loss data. Nearly the same results are found if each team plays every opponent with equal probability or equally often, as long as the number of teams and the number of games are not unrealistically small. The BT model, with each team playing each opponent with the same probability, gives very good fits to the data by choosing $x_{\min} = 0.278$ for the period 1901-60, and $x_{\min} = 0.435$ for 1961-2005 (Fig. 1). If the actual game frequencies in each season are used to determine opponents, $x_{\min}$ changes slightly, to 0.289 for 1901-60, but remains unchanged for 1961-2005.
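    The simulation described above can be sketched in a few lines. This is a simplified illustration rather than the authors' code: it assumes an even number of teams, pairs teams at random in every round (so each opponent is met with equal probability), and uses the hypothetical name `simulate_win_fraction_by_rank`. Fitting $x_{\min}$ would then amount to repeating the run for several trial values and keeping the one whose curve is closest to the data.

```python
import random


def simulate_win_fraction_by_rank(n_teams, n_games, xmin, n_seasons, seed=0):
    """Average win fraction by end-of-season rank (weakest record first)."""
    rng = random.Random(seed)
    totals = [0.0] * n_teams
    for _ in range(n_seasons):
        # Fresh uniform strengths in [xmin, 1] for every season.
        strengths = [rng.uniform(xmin, 1.0) for _ in range(n_teams)]
        wins = [0] * n_teams
        order = list(range(n_teams))
        for _ in range(n_games):
            # One round: shuffle and pair neighbours, so every team plays once.
            rng.shuffle(order)
            for a, b in zip(order[0::2], order[1::2]):
                p_a = strengths[a] / (strengths[a] + strengths[b])  # Eq. (1)
                if rng.random() < p_a:
                    wins[a] += 1
                else:
                    wins[b] += 1
        # Rank teams by their record, not by their hidden strength.
        for rank, w in enumerate(sorted(wins)):
            totals[rank] += w / n_games
    return [t / n_seasons for t in totals]


w = simulate_win_fraction_by_rank(n_teams=16, n_games=154, xmin=0.278, n_seasons=100)
print([round(v, 3) for v in w])
```

    Ranking by record rather than by strength is what produces the finite-season broadening at the two ends of the curve.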

    Despite the fact that the number of teams has increased from 16 to 30 since 1961, the range of win fractions is larger in the early era (0.32-0.67) than in the expansion era (0.36-0.63), a feature that indicates that baseball has become more competitive. This observation accords with the notion that the pressure of continuous competition, as in baseball, gradually diminishes the likelihood of outliers [17]. Given the crudeness of the model and real features that we have ignored, such as home-field advantage (approximately 53% for the past century and slowly decreasing with time), imbalanced playing schedules, and in-season personnel changes due to trades and player injuries, the agreement between the data and simulations of the BT model is satisfying.

    It is worth noting in Fig. 1 that the win fraction data and the corresponding numerical results from simulations of the BT model deviate from the theoretical prediction given in Eq. (2) when $r \to 0$ and $r \to 1$. This discrepancy is simply a finite-season effect. As shown in Fig. 2, when we simulate the BT model for progressively longer seasons, the win/loss data gradually converge to the prediction of Eq. (2).

    The present model not only reproduces the average win record $W(r)$ over a given period, but it also correctly explains the season-to-season fluctuation $\sigma^2(r)$ of the win fraction, defined as

    \[ \sigma^2(r) \equiv \frac{1}{Y} \sum_{j=1}^{Y} \left( W(r) - W_j(r) \right)^2, \tag{3} \]

    where $W_j(r)$ is the winning fraction of the $r$th-ranked team during the $j$th season,

    \[ W(r) = \frac{1}{Y} \sum_{j=1}^{Y} W_j(r) \]

    is the average win fraction of the $r$th-ranked team, and $Y$ is the number of years in the period. These fluctuations are the largest for extremal teams (and minimal for average teams). There is also an asymmetry of $\sigma(r)$ with respect to $r = 1/2$. Our simulations of the BT model with the optimal $x_{\min}$ values that were determined previously by fitting to the win fraction quantitatively reproduce these two features of $\sigma(r)$.
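    Eq. (3) is a per-rank variance over seasons. A minimal sketch, assuming the data are arranged as one rank-ordered list of win fractions per season (the layout and the name `rank_fluctuations` are choices made here for illustration):

```python
def rank_fluctuations(seasons):
    """Return (mean, variance) per rank, following Eq. (3).

    seasons[j][r] is the win fraction of the r-th ranked team in season j.
    """
    n_years = len(seasons)
    n_ranks = len(seasons[0])
    mean = [sum(s[r] for s in seasons) / n_years for r in range(n_ranks)]
    var = [
        sum((mean[r] - s[r]) ** 2 for s in seasons) / n_years
        for r in range(n_ranks)
    ]
    return mean, var


# Toy data: two seasons of a two-team league (illustrative numbers only).
mean, var = rank_fluctuations([[0.4, 0.6], [0.3, 0.7]])
print(mean, var)
```

    Taking the square root of each variance gives the quantity plotted against rank.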

    FIG. 3: Season-to-season fluctuation $\sigma(r)$ versus scaled rank $r$ for 1901-60 and for 1961-2005. The dashed lines are numerical simulations of the BT model for $10^4$ periods with the same $x_{\min}$ as in Fig. 1.

    In addition to the finite-season effects described above, another basic consequence of the finiteness of the season is that the intrinsically strongest team does not necessarily have the best win/loss record. That is, the average win fraction $W$ does not necessarily increase with team strength. By luck, a strong team can have a poor record or vice versa. It is instructive to estimate the number of games $G$ that need to be played to ensure that the win/loss record properly reflects team strength. The difference in the number of wins of two adjacent teams in the standings is proportional to $G(1 - x_{\min})/T$, namely, the number of games times their strength difference; the latter is proportional to $(1 - x_{\min})/T$ for a league that consists of $T$ teams. This systematic contribution to the difference should significantly exceed random fluctuations, which are of the order of $\sqrt{G}$. Thus we require

    \[ G \gg \left( \frac{T}{1 - x_{\min}} \right)^{2} \tag{4} \]

    for the end-of-season standings to be ordered by team strength. Fig. 2 and Fig. 3 illustrate the fact that this effect is more important for the top-ranked and bottom-ranked teams. During the 1901-60 period, when major-league baseball consisted of independent American and National leagues, $T = 8$, $G = 154$, and $x_{\min} \approx 0.3$, so that the season was just long enough to resolve adjacent teams. Currently, however, the season length is insufficient to resolve adjacent teams. The natural way to deal with this ambiguity is to expand the number of teams that qualify for the post-season playoffs, which is what is currently done.
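    The criterion of Eq. (4) can be checked with simple arithmetic. In the sketch below the function name `games_needed` is hypothetical, and the 15-team league used for the expansion era is an assumption made here (30 teams split evenly over two leagues); the text does not state a per-league team count for that period.

```python
def games_needed(n_teams: int, xmin: float) -> float:
    """Right-hand side of Eq. (4): the scale (T / (1 - x_min))^2 that G must exceed."""
    return (n_teams / (1.0 - xmin)) ** 2


# Early era: one 8-team league with x_min of about 0.3; compare with G = 154.
print(games_needed(8, 0.3))

# Expansion era: ASSUMED 15 teams per league with x_min = 0.435; compare with G = 162.
print(games_needed(15, 0.435))
```

    Since Eq. (4) is an order-of-magnitude condition with unspecified prefactors, the comparison indicates a trend rather than a sharp threshold.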

    B. Applicability of the Bradley-Terry Model

    Does the BT model with uniform team strengths provide the most appropriate description of the win/loss data? We perform several tests to validate this model.

    First, as mentioned in the previous section, the assumption (1) for the winning probability can be recast more generally as

    \[ p_{ij} = \frac{f(x_i)}{f(x_i) + f(x_j)}, \tag{5} \]

    so that, for an arbitrary $f$, the substitution $X_i = f(x_i)$ reduces Eq. (5) to the original winning probability in Eq. (1). Hence the crucial model assumption is the separability of the winning probability. In particular, the BT model assumes that $p_{ij}/p_{ji} = p_{ij}/(1 - p_{ij})$ is only a function of characteristics of team $i$, divided by characteristics of team $j$. One consequence of this separability is the detailed-balance relation

    \[ \frac{p_{ik}}{1 - p_{ik}} \, \frac{p_{kj}}{1 - p_{kj}} = \frac{p_{ij}}{1 - p_{ij}}, \tag{6} \]

    for any triplet of teams. This relation quantifies the obvious fact that if team A likely beats B, and B likely beats C, then A is likely to beat C. Since we do not know the actual $p_{ij}$ in a given baseball season, we instead consider

    \[ z_{ij} = \frac{W_{ij}}{G_{ij} - W_{ij}}, \tag{7} \]

    where $W_{ij}$ is the number of wins of team $i$ against $j$, and $G_{ij}$ is the number of games they played against each other in a given season. If seasons were infinitely long, then $z_{ij} \to p_{ij}/(1 - p_{ij})$, and hence

    \[ z_{ik} z_{kj} = z_{ij}. \tag{8} \]
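    The detailed-balance relation follows because the odds $p_{ij}/(1 - p_{ij})$ reduce to the strength ratio $x_i/x_j$. The sketch below illustrates this with exact BT probabilities and shows the empirical ratio of Eq. (7); the function names `odds` and `z_ratio` are hypothetical.

```python
def odds(x_i: float, x_j: float) -> float:
    """p_ij / (1 - p_ij) for the BT probability of Eq. (1); equals x_i / x_j."""
    p = x_i / (x_i + x_j)
    return p / (1.0 - p)


def z_ratio(wins_ij: int, games_ij: int) -> float:
    """Empirical odds of Eq. (7) from a head-to-head season record."""
    if wins_ij <= 0 or wins_ij >= games_ij:
        # z is zero or infinite when one team sweeps the series.
        raise ValueError("z_ij is undefined for a swept series")
    return wins_ij / (games_ij - wins_ij)


x_a, x_b, x_c = 1.0, 0.7, 0.4  # illustrative strengths
print(odds(x_a, x_b) * odds(x_b, x_c), odds(x_a, x_c))  # the two sides of Eq. (6)
print(z_ratio(13, 22))  # e.g. 13 wins in a 22-game series
```

    With finite $G_{ij}$ the empirical ratios are noisy, so Eq. (8) holds only approximately for real or simulated seasons.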

    FIG. 4: Comparison of the detailed-balance relation Eq. (8) for baseball data to the results of the BT model over $10^4$ periods (dashed lines), where each period corresponds to the results of all baseball games during either 1901-60 (triangles) or 1961-2005 (circles). The $x_{\min}$ values are the same as in Fig. 1. The straight lines are guides for the eye, with slope 0.63 for the data for 1901-60 and 0.30 for 1961-2005.

    FIG. 5: Dependence of $\ln(z_{ik} z_{kj})$ vs $\ln(z_{ij})$ on season length for the 1961-2005 period. All $G_{ij}$'s are multiplied by $M = 5$, 10, 100 (steepening dot-dashed lines). The thick dashed line corresponds to $M = 10^4$ and is indistinguishable from a linear dependence with unit slope.

    To test the detailed-balance relation Eq. (8), we plot $\ln(z_{ik} z_{kj})$ as a function of $\ln(z_{ij})$ from game data, averaged over all team triplets $(i, j, k)$ and all seasons in a given period (Fig. 4). We discard events for which $W_{ij} = G_{ij}$ or $W_{ij} = 0$ (team $i$ won or lost all games against team $j$). Our simulations of the BT model over $10^4$ realizations of the 1901-60 and 1961-2005 periods, with the same $G_{ij}$ as in actual baseball seasons and with the optimal values of $x_{\min}$ for each period, are in excellent agreement with the game data. Although $z_{ik} z_{kj}$ in the figure has a sublinear dependence on $z_{ij}$ (slope much less than 1 in Fig. 4), the slope progressively increases and ultimately approaches the expected linear relation between $z_{ik} z_{kj}$ and $z_{ij}$ as the season length is increased (Fig. 5). We implement an increased season length by multiplying all the $G_{ij}$ by the same factor $M$. Notice also that $\ln(z_{ik} z_{kj})$ versus $\ln(z_{ij})$ for the 1901-60 period has a larger slope than for 1961-2005 because the $G_{ij}$'s are larger in the former period ($G_{ij} = 22$) than in the latter ($G_{ij}$ in the range 5-19).

    This study of game outcomes among triplets of teams provides a detailed and non-trivial validation for the BT form Eq. (1) for the winning probability. As a byproduct, we learn that cyclic game outcomes, in which team A beats B, B beats C, and C beats A, are unlikely to occur.

    C. Distribution of Team Strengths

    Thus far, we have used a uniform distribution of team strengths to derive the average win fraction for the BT model. We now determine the most likely strength distribution by searching for the distribution that gives the best fit to the game data for $W(r)$ by minimizing the deviation between the data and the simulated form of $W(r)$. Here the deviation $\Delta$ is defined as

    \[ \Delta^2 = \frac{\sum_r \left[ W(r) - W(r; \rho) \right]^2}{\sum_r W(r)^2}, \tag{9} \]

    where $W(r; \rho)$ is the winning fraction in simulations of the BT model for a trial distribution $\rho(x)$ in which the actual game frequencies $G_{ij}$ were used in the simulation, and $W(r)$ is the game data for the winning fraction.

    We assume that the two periods 1901-60 and 1961-2005 are long enough for $W(r)$ to converge to its average value. We parameterize the trial strength distribution as a piecewise linear function of $n$ points, $\{\rho(y_i)\}$, with $y_i \in [0, 1]$ and $y_n \equiv 1$. We then perform Monte Carlo (MC) simulations, in which we update the $y_i$ and $\rho_i = \rho(y_i)$ by small amounts in each step to reduce $\Delta$. Specifically, at each MC step, we select one value of $i = 1, \ldots, n$, and

      • with probability 1/2, adjust $y_i$ (except $y_n = 1$) by $\pm u\, \delta y/10$, where $\delta y$ is the spacing between $y_i$ and its nearest neighbor, and $u$ is a uniform random number between 0 and 1;

      • with probability 1/2, update $\rho(y_i)$ by $\pm u\, \rho(y_i)/10$.

    If $\Delta$ decreases as a result of this update, then $y_i$ or $\rho(y_i)$ is set to its new value; otherwise the change in the parameter value is rejected. We choose $n = 8$, which is large enough to obtain a distribution with significant features and for which typically 1000-2000 MC steps are sufficient for convergence. A larger $n$ greatly increases the number of MC steps necessary to converge and also increases the risk of being trapped in a metastable state because the size of the phase space grows exponentially with $n$. To check that this algorithm does not get trapped in a metastable state, we started from several different initial states and found virtually identical final distributions (Fig. 6). The MC-optimized distribution for each period is remarkably close to uniform, as shown in this figure.
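    The accept-only-if-better update described above is a greedy random search. The sketch below shows that control flow on a toy problem. It is deliberately simplified and is not the authors' procedure: it perturbs only a list of values (not the abscissas $y_i$), and the objective is a stand-in quadratic instead of a deviation computed from BT simulations. The name `greedy_mc` and the toy target are assumptions made for illustration.

```python
import random


def greedy_mc(values, objective, n_steps, rng):
    """Greedy random search: perturb one entry by +/- u * value / 10 per step."""
    values = list(values)
    best = objective(values)
    for _ in range(n_steps):
        i = rng.randrange(len(values))   # pick one parameter at random
        u = rng.random()                 # uniform random number in [0, 1)
        sign = rng.choice((-1.0, 1.0))
        trial = list(values)
        trial[i] += sign * u * trial[i] / 10.0
        err = objective(trial)
        if err < best:                   # keep the move only if the deviation decreases
            values, best = trial, err
    return values, best


# Toy objective (ASSUMPTION): squared distance to a fixed target vector.
target = [0.5, 1.5, 1.0, 2.0]


def toy_deviation(v):
    return sum((a - b) ** 2 for a, b in zip(v, target))


fitted, err = greedy_mc([1.0] * 4, toy_deviation, 2000, random.Random(0))
print(fitted, err)
```

    Because rejected moves are simply discarded, such a search can stall in a local minimum, which is why restarting from different initial states is a useful check.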

    FIG. 6: Optimized strength distributions $\rho(x)$ versus $x$ for 1901-60 (triangles) and 1961-2005 (circles), together with the optimal uniform distributions (dashed). For 1961-2005, we also show the final distributions starting from $y_i$'s equally spaced between $y_1 = 0.1$ and $y_8 = 1$ with the distribution $\rho$: (a) uniform on [0.1, 1] (open circles), and (b) a symmetric V-shape on [0.1, 1] (full circles).

    FIG. 7: Comparison of the winning fraction $W(r)$ versus scaled rank $r$ extracted from the actual baseball data (symbols) to the model with a constant $\rho(x)$ (dashed lines), and with the optimal log-normal distribution $\rho(x)$ (full lines).

    Although the optimal distributions are visually not uniform, the small difference in the relative errors, the closeness of $y_1$ and $x_{\min}$, and the imperceptible difference in the $r$ dependence of $W(r)$ for the uniform and optimized strength distributions suggest that a uniform team strength distribution on $[x_{\min}, 1]$ describes the game data quite well.

    For completeness, we also considered the conventionally-used log-normal distribution of team strengths [5, 19]:

    \[ \rho(x) = \frac{1}{\sqrt{2\pi\kappa^2}\, x} \exp\!\left[ -\frac{1}{2\kappa^2} \left( \ln\frac{x}{\bar{x}} + \frac{\kappa^2}{2} \right)^{2} \right]. \tag{10} \]

    With the normalization convention of Eq. (10), the average team strength is simply $\bar{x}$, which can be set to any value due to the invariance of $p_{ij}$ with respect to the transformation $x \to \lambda x$. Hence, the only relevant parameter is the width $\kappa$. Using the same MC optimization procedure described above, we find that a log-normal ansatz for the strength distribution with optimal parameter $\kappa$ gives a visually inferior fit of the winning fraction in both periods compared to the uniform strength distribution, especially for $r$ close to 1 (see Fig. 7). The relative error for the log-normal distribution is also a factor of 6 and 3 larger, respectively, than for the optimal distribution in the 1901-60 and 1961-2005 periods. However, we do reproduce the feature that the optimal log-normal distribution for 1961-2005 is narrower ($\kappa = 0.238$) than that for 1901-60 ($\kappa = 0.353$), indicating again that baseball is more competitive in the second period than in the first.
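    A log-normal strength whose average is fixed at a chosen value can be drawn with the standard library. The sketch below assumes the parameterization in which the logarithm of the strength is Gaussian with standard deviation equal to the width and with its mean shifted down by half the squared width, so that the average strength equals the chosen value. The names `sample_lognormal_strength`, `mean_strength`, and `width` are hypothetical labels.

```python
import math
import random


def sample_lognormal_strength(rng, mean_strength: float, width: float) -> float:
    """Draw one team strength from a log-normal law with the given mean and log-width."""
    # Shifting the log-mean by -width^2 / 2 makes E[x] equal to mean_strength.
    mu = math.log(mean_strength) - 0.5 * width ** 2
    return rng.lognormvariate(mu, width)


rng = random.Random(0)
for width in (0.353, 0.238):  # widths quoted for 1901-60 and 1961-2005
    draws = [sample_lognormal_strength(rng, 1.0, width) for _ in range(100_000)]
    print(width, sum(draws) / len(draws))
```

    Since the winning probability depends only on strength ratios, the choice of mean strength is immaterial and only the width affects the simulated records.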

    III. WINNING AND LOSING STREAK STATISTICS

    We now turn to the distribution of consecutive-game winning and losing streaks. Namely, what are the probabilities $W_n$ and $L_n$ to observe a string of $n$ consecutive wins or $n$ consecutive losses, respectively? Because of its emotional appeal, streakiness in a wide variety of sports continues to be vigorously researched and debated [15, 20, 21]. In this section, we argue that independent game outcomes that depend only on relative team strengths describe the streak data for the period 1961-2005 quite well. The agreement is not as good for the period 1901-60 and suggests that non-statistical effects may have played a role in the longest streaks.

    Historically, the longest team winning streak (with ties allowed) in major-league baseball is 26 games, achieved by the 1916 New York Giants in the National League over a 152-game season [22]. The record for a pure winning streak since 1901 (no ties) is 21 games, set by the Chicago Cubs in 1935 in a 154-game season, while the American League record is a 20-game winning streak by the 2002 Oakland Athletics over the now-current 162-game season. Conversely, the longest losing streak since 1901 is 23, achieved by the 1961 Philadelphia Phillies in the National League [23], and the American League losing-streak record is 21 games, set by the Baltimore Orioles at the start of the 1988 season. For completeness, the list of all winning and all losing streaks of $\ge 15$ games is given in the appendix.

    FIG. 8: Distribution of winning/losing streaks $P_n$ versus $n$ since 1901 on a semi-logarithmic scale for 1901-60 and 1961-2005. The dashed curves are the result of simulations with $x_{\min} = 0.278$ and $x_{\min} = 0.435$ for the two respective periods. The smooth curves are streak data from randomized win/loss records, and the dotted curve is $2^{-n}$.

    Fig. 8 shows the distribution of team winning and losing streaks in major-league baseball since 1901. Because these winning and losing streak distributions are virtually identical for $n \le 15$, we consider $P_n = (W_n + L_n)/2$, the probability of a winning or a losing streak of length $n$ (Fig. 8). It is revealing to separate the streak distributions for 1901-60 and 1961-2005. Their distinctness is again consistent with the hypothesis that baseball is becoming more competitive. In fact, exceptional streaks were much more likely during 1901-60 than after 1961. Of the 55 streaks of $\ge 15$ games, 27 occurred between 1901-30, 13 between 1931-60, and 15 after 1960 [24].

    The first point about the streak distributions is that they decay exponentially with $n$, for large $n$. This behavior is a simple consequence of the following bound: consider a baseball league that consists of teams with either strengths $x = 1$ or $x = x_{\min} > 0$, and with games only between strong and weak teams. Then the distribution of winning streaks of the strong teams decays as $(1 + x_{\min})^{-n}$; this represents an obvious upper bound for the streak distribution in a league where team strengths are uniformly distributed in $[x_{\min}, 1]$.

    We now apply the BT model to determine the form of the consecutive-game winning and losing streak distributions. Using Eq. (1) for the single-game outcome probability, the probability that a team of strength $x$ has a streak of $n$ consecutive wins is

    \[ P_n(x) = \prod_{j=1}^{n} \frac{x}{x + x_j} \; \frac{x_0}{x + x_0} \; \frac{x_{n+1}}{x + x_{n+1}}. \tag{11} \]

    The product gives the probability for $n$ consecutive wins against teams of strengths $x_j$, $j = 1, 2, \ldots, n$ (some factors possibly repeated), while the last two factors give the probability that the 0th and the $(n+1)$st games are losses to terminate the winning streak at $n$ games. Assuming a uniform team strength distribution $\rho(x)$, and for the case where each team plays the same number of games with every opponent, we average Eq. (11) over all opponents and then over all teams. The first average gives:

    \[ \langle P_n(x) \rangle_{\{x_j\}} = x^n \left\langle \frac{1}{x + y} \right\rangle^{n} \left\langle \frac{y}{x + y} \right\rangle^{2}, \tag{12} \]

    with

    \[ \left\langle \frac{1}{x + y} \right\rangle = \frac{1}{1 - x_{\min}} \ln\!\left( \frac{x + 1}{x + x_{\min}} \right), \qquad \left\langle \frac{y}{x + y} \right\rangle = 1 - \frac{x}{1 - x_{\min}} \ln\!\left( \frac{x + 1}{x + x_{\min}} \right) \]

    for a uniform distribution of team strengths in $[x_{\min}, 1]$.

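    Here we use the fact that each team strength is independent, so that the product in Eq. (11) factorizes. We now average over the uniform strength distribution, to find, for the team-averaged probability to have a streak of $n$ consecutive wins,

    \[ P_n = \frac{1}{1 - x_{\min}} \int_{x_{\min}}^{1} f(x)\, e^{n g(x)}\, dx, \tag{13} \]

    where

    \[ f(x) = \left[ 1 - \frac{x}{1 - x_{\min}} \ln\!\left( \frac{x + 1}{x + x_{\min}} \right) \right]^{2}, \qquad g(x) = \ln x + \ln\!\left[ \frac{1}{1 - x_{\min}} \ln\!\left( \frac{x + 1}{x + x_{\min}} \right) \right]. \]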

    Since $g(x)$ monotonically increases with $x$ within $[x_{\min}, 1]$, the integral in Eq. (13) is dominated by the behavior near the maximum of $g(x)$ at $x = 1$ for large $n$. Performing the integral by parts [25], the leading behavior is

    \[ P_n \sim e^{n g(1)}, \tag{14} \]

    with

    \[ g(1) = -\ln(1 - x_{\min}) + \ln \ln\!\left( \frac{2}{1 + x_{\min}} \right). \]

    As expected, $P_n$ decays exponentially with $n$, but with a decay rate that decreases as teams become more heterogeneous (decreasing $x_{\min}$). In the limit of equal-strength teams, the most rapid decay of the streak probability arises, $P_n = 2^{-n}$, while the widest disparity in team strengths, $x_{\min} = 0$, leads to the slowest possible decay $P_n \sim (\ln 2)^n \approx (0.693)^n$.
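    The integral of Eq. (13) and the asymptotic decay factor of Eq. (14) can be compared numerically. The sketch below uses a plain midpoint rule; the function names and the grid size are choices made here for illustration and are not taken from the paper.

```python
import math


def decay_base(xmin: float) -> float:
    """exp(g(1)) from Eq. (14): the limiting ratio P_(n+1) / P_n for large n."""
    return math.log(2.0 / (1.0 + xmin)) / (1.0 - xmin)


def streak_probability(n: int, xmin: float, n_grid: int = 20_000) -> float:
    """Midpoint-rule evaluation of the team-averaged streak probability, Eq. (13)."""
    h = (1.0 - xmin) / n_grid
    total = 0.0
    for k in range(n_grid):
        x = xmin + (k + 0.5) * h
        a = math.log((x + 1.0) / (x + xmin)) / (1.0 - xmin)  # <1/(x+y)>
        total += (1.0 - x * a) ** 2 * (x * a) ** n           # f(x) * exp(n g(x))
    return total * h / (1.0 - xmin)


for xmin in (0.278, 0.435):
    ratio = streak_probability(41, xmin) / streak_probability(40, xmin)
    print(xmin, decay_base(xmin), ratio)
```

    The ratio of successive streak probabilities approaches the asymptotic factor from below, since the prefactor of the exponential also falls off slowly with streak length.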

    We simulated the streak distribution $P_n$ using the same methodology as that for the win/loss records; related simulations of streak statistics are given in Refs. [19, 21]. Taking $x_{\min} = 0.435$ for 1961-2005, the same value as that used in simulations of the win/loss records, we find a good match to the streak data for this period. The apparent systematic discrepancy between data and theory for $n \ge 17$ is illusory because streaks do not exist for every value of $n$. Moreover, the number of streaks of length $n \ge 17$ is only eight, so that fluctuations are quite important.

    For the 1901-60 period, if we use $x_{\min} = 0.278$, the data for $P_n$ is in excellent agreement with theory for $n < 17$. However, for $n$ in the range 17-22, the data is roughly a factor of 2 greater than that given by the analytical solution Eq. (14) or by simulations of the BT model. Thus the tail of the streak distribution for this early period appears to disagree with a purely statistical model of streaks. Again, the number of events for a given $n \ge 17$ is 5 or less, compared to a total number of 70000 winning and losing streaks during this period. Hence one cannot exclude the possibility that the observed discrepancy for $n \ge 17$ is simply due to lack of statistics.

    Finally, we test for the possible role of self-reinforcement on winning and losing streaks. To this end, we take each of the 2166 season-by-season win/loss histories for each team and randomize them $10^5$ times. For each such realization of a randomized history, we compute the streak distribution and superpose the results for all randomized histories. The large amount of data gives streak distributions with negligible fluctuations up to $n = 30$ and which extend to $n = 44$ and 41 for the two successive periods. More strikingly, these streak distributions based on randomized win/loss records are virtually identical to the simulated streak data as well as to the numerical integration of Eq. (13), as shown in Fig. 8.
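    The randomization test can be sketched as follows: count the streaks in a season's win/loss sequence, then shuffle the same sequence many times and accumulate the streak counts. This is an illustration under simplifying assumptions: streaks are taken to be maximal runs of identical outcomes, including runs that touch the start or end of the season, and the example sequence is made up. The function names are hypothetical.

```python
import random
from collections import Counter


def streak_counts(outcomes):
    """Count maximal runs of identical outcomes, keyed by run length."""
    counts = Counter()
    run, prev = 0, None
    for o in outcomes:
        if o == prev:
            run += 1
        else:
            if run:
                counts[run] += 1   # close the previous run
            run, prev = 1, o
    if run:
        counts[run] += 1           # close the final run
    return counts


def randomized_streak_counts(outcomes, n_shuffles, rng):
    """Accumulate streak counts over shuffled copies of one win/loss history."""
    seq = list(outcomes)
    total = Counter()
    for _ in range(n_shuffles):
        rng.shuffle(seq)           # keeps the win/loss totals, destroys the ordering
        total.update(streak_counts(seq))
    return total


history = "WWLWLLLWWWLW"            # made-up record, for illustration only
print(streak_counts(history))
print(randomized_streak_counts(history, 1000, random.Random(0)))
```

    Shuffling preserves each team's season record while removing any memory between consecutive games, so a surplus of long streaks in the real ordering relative to the shuffled ones would signal self-reinforcement.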

    IV. SUMMARY

    To conclude, the Bradley-Terry (BT) competition model, in which the outcome of any game depends only on the relative strengths of the two competing teams, quantitatively accounts for the average win/loss records of Major-League baseball teams. The distribution of team strengths that gives the best match to these win/loss records was found to be quite close to uniform over a range $[x_{\min}, 1]$, with $x_{\min} \approx 0.28$ for the early modern era of 1901–1960 and $x_{\min} \approx 0.44$ for the expansion era of 1961–2005. This same BT model also reproduces the season-to-season fluctuations of the win/loss records. An important consequence of the BT model is the existence of a nontrivial detailed-balance relation, which we verified with satisfying accuracy. We consider this verification to be a quite stringent test of the theory.

    The same BT model was also used to account for the distribution of team consecutive-game winning and losing streaks. We found excellent agreement between the prediction of the BT model and the streak data for $n < 17$ for both the 1901–60 and 1961–2005 periods. However, the tail of the streak distribution for the 1901–60 period with $n \geq 17$ is less accurately described by the BT theory, and the mechanism behind the discrepancy remains an open question, although it could well originate from a lack of statistics. We also provided evidence that self-reinforcement plays little role in streaks, as randomizations of the actual win/loss records produce streak distributions that are indistinguishable from the streak data except for the $n \geq 17$ tail during the 1901–60 period. We also showed that the optimal team strength distribution is narrower for the period 1961–2005 than for 1901–60. This narrowing shows that baseball competition is becoming keener, so that outliers in team performance over an entire season, as quantified by win/loss records and lengths of winning and losing streaks, are less likely to occur.

    We close by emphasizing the parsimonious nature of our modeling. The only assumed features are the Bradley-Terry form Eq. (2) for the outcome of a single game, and the uniform distribution of the winning probabilities, controlled by the single free parameter $x_{\min}$. All other model features can then be inferred from the data. While we have ignored many aspects of baseball that ought to play some role (the strength of a team changing during a season due to major trades of players and/or injuries, home-field advantage, etc.), the agreement of the win fraction data and the streak data with the predictions of the Bradley-Terry model is extremely good. It will be worthwhile to apply the approaches of this paper to other major sports to learn about possible universalities and idiosyncrasies in the statistical features of game outcomes.

    Acknowledgments: SR thanks Guoan Hu for data collection assistance and Jim Albert for literature advice, and acknowledges financial support from NSF grant DMR0535503 and Université Paul Sabatier.

    [1] See, e.g., W. Weidlich, Sociodynamics: A Systematic Approach to Mathematical Modelling in Social Sciences (Harwood Academic Publishers, 2000); M. Lässig and A. Valleriani (eds.), Biological Evolution and Statistical Physics (Springer, Berlin, 2002); M. Newman, A.-L. Barabási, and D. J. Watts, The Structure and Dynamics of Networks (Princeton University Press, 2006).

    [2] J.-P. Bouchaud and M. Potters, Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management (Cambridge University Press, 2003).

    [3] J. Krug and C. Karl, Physica A 318, 137 (2003); K. Jain and J. Krug, J. Stat. Mech. P04008 (2005).

    [4] E. Bittner, A. Nussbaumer, W. Janke, and M. Weigel, Europhys. Lett. 78, 58002 (2007); Nature 441, 793 (2006).

    [5] J. Albert and J. Bennett, Curve Ball: Baseball, Statistics, and the Role of Chance in the Game (Springer, New York, 2001).

    [6] J. Park and M. E. J. Newman, J. Stat. Mech. P10014 (2005).

    [7] E. Ben-Naim, S. Redner, and F. Vazquez, Europhys. Lett. 77, 30005 (2007).

    [8] E. Ben-Naim and N. W. Hengartner, Phys. Rev. E 76, 026106 (2007).

    [9] C. Sire, J. Stat. Mech. P08013 (2007).

    [10] The data presented here were obtained from www.shrpsports.com.

    [11] However, since 1997 a small amount of interleague play during the regular season has been introduced.

    [12] E. Zermelo, Mathematische Zeitschrift 29, 435 (1929).

    [13] R. A. Bradley and M. E. Terry, Biometrika 39, 324 (1952).

    [14] See, e.g., http://en.wikipedia.org/wiki/List_of_MLB_individual_streaks.

    [15] R. C. Vergin, J. of Sport Behavior 23 (2000).

    [16] E. Ben-Naim, F. Vazquez, and S. Redner, Journal of Quantitative Analysis in Sports 2, No. 4, Article 1 (2006).

    [17] S. J. Gould, Full House: The Spread of Excellence from Plato to Darwin (Three Rivers Press, New York, 1996).

    [18] In contrast, in Ref. [16], the winning probability was taken to be independent of the relative strengths of the two teams; the stronger team won with a fixed probability $p$ and the weaker won with probability $1 - p$.

    [19] B. James, J. Albert, and H. S. Stern, Chance 6, 17 (1993).

    [20] T. Gilovich, R. Vallone, and A. Tversky, Cognitive Psychology 17, 295 (1985).

    [21] J. Albert, Chance 17, 37 (2004).

    [22] http://answers.yahoo.com/question/index?qid=1006053108634. This record is slightly tainted because of a tie during this streak, and ties are no longer allowed to occur; every game that is tied at the end of the regulation 9 innings must continue until one team wins.

    [23] http://en.wikipedia.org/wiki/List_of_worst_MLB_season_records.

    [24] Moreover, three of the post-1960 15 game losing streaks occurred during the initial year of necessarily weak expansion teams because they were stocked with the weakest players from established teams (1962 NY Mets, 1969 Montreal Expos, 1972 Texas Rangers).

    [25] C. M. Bender and S. A. Orszag, Advanced Mathematical Methods for Scientists and Engineers (McGraw-Hill, New York, 1978), section 6.3.
