Effects of Travel Distance on Away Team Win Percentage in the NFL
Boston College
Econometric Methods Professor Cox
5/5/2016
Kyle Waters Yi Zhang
Donald Tynion George Acevedo
Abstract: Does traveling long distances have a negative impact on team performance in the
National Football League (NFL)? Longer travel brings travel fatigue and jet lag, physiological
conditions that can harm an NFL player’s performance. This would imply an advantage for
teams that travel less and play in familiar time zones. Yet the distribution of travel is grossly
unequal among teams: some teams can accumulate over five times more distance throughout a
season than others. An existing analysis of NFL win-loss records from 1978-1987 suggests that
longer distances traveled to away games with multiple time zone changes influence game
outcomes. Previous studies acknowledge the need for more robust research on the subject in
both method and implementation. A well-structured econometric model with accurate data on
distance traveled and time zone change provides a mechanism to clarify and confirm previous
studies. After controlling for various qualities of both team and stadium, the predicted
probability of the away team winning decreases by about 3.5 percentage points as the distance
to the home opponent increases by 1000 km. Further investigation indicates that the magnitude
of this finding changes under certain circumstances, though the negative impact remains clear.
With the NFL currently exploring an expansion team for London, England, these conclusions
suggest that a London-based team would be at a major disadvantage against United States-based
opponents.
I. Introduction
The idea of a home-field advantage¹ is widely cited. The theory suggests that teams playing on
the road are at a major disadvantage: they play in a stadium with a crowd rooting against them,
unfamiliar stadium characteristics, and referees who are, in some cases, biased in favor of the
home team. Does a major portion of this advantage, though, originate from the travel that away
teams must undergo to compete? Anyone who has traveled long distances by plane can testify to
the detrimental impact of travel fatigue. Many people have also experienced jet lag, where the
individual's body clock is out of sync with the local time after crossing one or more time zones.
These factors have real, detrimental effects on human health and can be especially problematic
for professional athletes. Some NFL teams, though, travel much more than others. The Seattle
Seahawks and San Francisco 49ers are often the most frequent fliers. Together, these two teams
traversed 88,536 kilometers in 2003, while the Philadelphia Eagles and New York Giants covered
a mere 31,003 kilometers. If distance traveled decreases the odds of winning, west coast teams
such as the Seahawks and 49ers would have a legitimate grievance.
In 2013, researchers from Stanford University² investigated whether NFL game outcomes
depend upon changes in time zone. Their study, titled “The Impact of Circadian Misalignment on
Athletic Performance in Professional Football Players,” looked at 40 years of data and found that
west coast teams playing east coast teams at night held a big advantage. That study, though, did
not account for the actual distance traveled and instead focused on start time and time zone.
This paper adds a variable for distance traveled alongside start time and time zone.
Our main finding is that, holding all else equal, an increase in distance of 1000 kilometers lowers
the predicted probability of the away team winning by about 3.5 percentage points. To reach
this conclusion, we estimated five separate ordinary least squares models, each including
different control variables. Our final model indicates that distance traveled has a nonlinear effect
on win percentage. The predicted win percentage decreases sharply when teams travel more
than 3600 kilometers. Interestingly, this threshold aligns with the average distance needed to
cross three time zones. Put differently, the decrease in the away team's chance of winning is
larger when traveling 1000 extra kilometers from a distance of 3000 km than from a distance of
2000 km.
The model’s estimate of the effect of distance on predicted win percentage changes when we
consider other variables that may be correlated with distance traveled. Our model omits a
potentially important determinant of home-field advantage: unfamiliar weather conditions,
which can impair an away team's odds of winning. Our model attempts to account for this
deficiency by adding a dummy variable for whether the game is played in a dome. We observe
that when the away team plays in a dome, the effects of travel distance on team performance are
significantly less negative. Furthermore, some teams recognize the downsides of increased travel
and attempt to mitigate them by staying in the time zone of their road games when they play
away in back-to-back weeks.
¹ Bill Barnwell (2012), NFL’s Frequent-Flier Phenomenon
² Roger S. Smith; Bradley Efron; Cheri D. Mah; Atul Malhotra (2013), The Impact of Circadian Misalignment on Athletic Performance in Professional Football Players
The basic conclusion that distance traveled negatively impacts team performance is significant
when considering the NFL’s potential plans to expand into London, England. If distance traveled
across multiple time zones truly has a major negative effect on performance, it would be best for
the NFL to drop the idea. It would also make sense to optimize divisional arrangements on
geographical distance to minimize travel.
II. Data
The dataset we created to test the relationship between travel distance and win percentage was
specifically tailored to fit the purposes of this study. The dataset compiles the data from every
game of every season between 2000 and 2003. For the purpose of our analysis, all games are
observed from the perspective of the away team.
The table below summarizes the basics of the dataset:
TABLE 1
Summary Statistics

| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| Win (𝑤𝑖𝑛, 1 if the observed team wins, 0 if loses) | .4195076 | .4937123 | 0 | 1 |
| Team Talents & Performance | | | | |
| Rest days since the last game (𝑟𝑒𝑠𝑡8) [a] | .1439394 | .3511946 | 0 | 1 |
| MVP (𝑚𝑣𝑝, 1 if the observed team has MVP, 0 if not) | .0445076 | .2063176 | 0 | 1 |
| MVP Candidate (𝑚𝑣𝑝𝑐𝑎𝑛𝑑) [b] | .1410985 | .3482881 | 0 | 1 |
| # of offensive pro-bowlers of the observed team (𝑝𝑏𝑜𝑓𝑓1) | 1.293561 | 1.550721 | 0 | 6 |
| # of offensive pro-bowlers of the opponent (𝑝𝑏𝑜𝑓𝑓2) | 1.310606 | 1.572911 | 0 | 6 |
| # of defensive pro-bowlers of the observed team (𝑝𝑏𝑑𝑒𝑓1) | 1.334280 | 1.357094 | 0 | 6 |
| # of defensive pro-bowlers of the opponent (𝑝𝑏𝑑𝑒𝑓2) | 1.340909 | 1.351591 | 0 | 6 |
| Stadium | | | | |
| Stadium Age (𝑠𝑡𝑑𝑚𝑎𝑔𝑒) | 26.05492 | 11.73357 | 0 | 57 |
| Attendance (𝑎𝑡𝑡𝑑95) [c] | .6922348 | .4617875 | 0 | 1 |
| Dome (𝑑𝑜𝑚𝑒, 1 if the stadium is indoor, 0 if not) | .2367424 | .4252843 | 0 | 1 |
| Mile High (𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ, 1 if in Denver, 0 if not) | .0303030 | .1715010 | 0 | 1 |
| Time & Distance | | | | |
| Crossing 3 time zones to East & Early afternoon game (𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦, 1 if yes, 0 if not) [d] | .0388258 | .1932710 | 0 | 1 |
| Crossing 3 time zones to East & Night game (𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡, 1 if yes, 0 if not) [d] | .0113636 | .1060432 | 0 | 1 |
| Distance (𝑑𝑖𝑠𝑡) [e] | 1518.366 | 1022.705 | 0 | 4376.407 |
| # of Observations | 1056 | | | |

a. 𝑟𝑒𝑠𝑡8 is a dummy that = 1 if # of rest days since the last game <= 8, zero otherwise
b. 𝑚𝑣𝑝𝑐𝑎𝑛𝑑 is a dummy that = 1 if the observed team has MVP candidate (MVP excluded) in that season, zero otherwise
c. 𝑎𝑡𝑡𝑑95 is the dummy that = 1 if the average attendance rate of that season > 95%, zero otherwise
d. Early afternoon game is defined as before 3pm (EST), normally around 1pm; Night game is defined as after 6pm (EST), normally around 9pm; the reference group is Late afternoon game (3pm – 6pm, EST)
e. 𝑑𝑖𝑠𝑡 in Kilometers
The main source for the dataset is the website Pro-Football-Reference, which compiles the dates
and times of all games dating back to 1920, along with win-loss results, game performance (in
terms of both offensive and defensive statistics), and the expected points set before each game.
The attendance data collected is not an exact figure for each individual game, but an annual
average for each stadium. It is expressed as a dummy variable in the model, with 95% capacity
as the cutoff point, to help account for crowd noise and other effects. It is worth pointing out
that this dummy variable, though listed under the category of stadium, also offers a proxy for the
strength of the opponent, because better teams’ home games have higher attendance.
For consistency, the distance traveled is assumed to be identical to the distance from stadium to
stadium. The distance data is computed based on the Haversine Formula, which accounts for the
curvature of the earth. The latitude and longitude given for each stadium are used as inputs to the
calculation. It is further assumed in the dataset that the away team travels directly to the home
team’s stadium before each game and directly back to their home stadium after each game.
Layovers, delays, and distance to and from each airport are not taken into consideration.
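The stadium-to-stadium calculation described above can be sketched as follows. This is a minimal illustration of the haversine formula, not the authors' actual code: the Earth radius constant and the city-center coordinates are assumptions standing in for the stadium latitudes and longitudes used in the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant, not taken from the paper


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-center coordinates, used only as stand-ins for stadium locations.
san_francisco = (37.77, -122.42)
philadelphia = (39.95, -75.17)

one_way = haversine_km(*san_francisco, *philadelphia)
# The dataset assumes a direct trip out and a direct trip back, so a season total
# would add twice the one-way distance for each away game.
print(f"one-way: {one_way:.0f} km, round trip: {2 * one_way:.0f} km")
```

Because the formula works on a sphere, small differences from published airline distances are to be expected.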
One problem with the dataset might be the lack of observations, which is a tradeoff made
between building a first-hand, tailor-made, reliable dataset and borrowing datasets that may not
best suit the needs of the paper. The collection of time and distance variables accounted for a
large portion of the workload. A larger dataset across more seasons with more observations
would most likely yield a more accurate representation of the relationship between distances
traveled and win percentage.
III. Theoretical Considerations
The fundamental consideration is a potential negative relationship between travel distance and
win percentage for the away team, driven by travel fatigue. Travel fatigue is identified as
disorientation caused by changes in climate and natural lighting, dehydration due to cabin air,
restricted choice of food, and limited space for exercise or movement, all of which are expected
to inhibit athletic performance. Symptoms of travel fatigue can be eased by sleep, which is one
reason we control for the rest-day dummy variable (𝑟𝑒𝑠𝑡8).
However, the distance-win relationship is complicated by a number of other factors, which
include changes of time zone, the starting time of the game, and stadium conditions.
One of the key factors is jet lag caused by traveling across the time zones in the country, ranging
from -3 to 3. Winning percentage is expected to decrease for the away teams as travel distance
increases because the air travel causes jet lag when the visiting team travels across time zones.
Moreover, jet lag could have an incremental effect on the distance effect - travelling from west to
east is expected to be different from travelling from north to south in terms of the negative
performance impact, given the same level of distance.
Starting time is another factor to consider. The effects of different starting times will vary
because of travel fatigue and jet lag. A 1pm game in Eastern Standard Time (EST) at FedEx
Field, Washington will likely affect the Washington Redskins differently from the Arizona
Cardinals, who have just traveled more than 3000 km for the game.
Given the complexity of the time and distance factors, the report focuses on one special case:
the away team crossing 3 time zones from west to east, that is, from Pacific Standard Time (PST)
to EST. If the San Francisco 49ers visit Philadelphia and play the Eagles in a 1pm game, the body
clock of the 49ers is supposedly still at 10am when the game kicks off. If the game starts at 9pm,
the body clock of the 49ers will be at 6pm.
These two scenarios are expected to yield different outcomes.
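The body-clock arithmetic and the two time-zone dummies can be sketched as below. The function names are hypothetical, and the cutoffs follow the note to Table 1 (early before 3pm EST, night after 6pm EST); how the authors coded kickoffs exactly on a boundary is not stated, so the boundary handling here is an assumption.

```python
def body_clock_hour(local_kickoff_hour, zones_crossed_east):
    """Hour (0-23) on the visiting team's home clock when the game kicks off."""
    return (local_kickoff_hour - zones_crossed_east) % 24


def kickoff_slot(kickoff_hour_est):
    """Classify a kickoff (24-hour clock, EST) as 'early', 'late', or 'night'."""
    if kickoff_hour_est < 15:      # before 3pm
        return "early"
    if kickoff_hour_est <= 18:     # 3pm to 6pm: the reference group
        return "late"
    return "night"                 # after 6pm


def tz3e_dummies(zones_crossed_east, kickoff_hour_est):
    """Return (tz3eearly, tz3enight) for one away-team observation."""
    crossed_three_east = zones_crossed_east == 3
    slot = kickoff_slot(kickoff_hour_est)
    return int(crossed_three_east and slot == "early"), int(crossed_three_east and slot == "night")


# The 49ers-at-Eagles example from the text: 1pm and 9pm EST kickoffs.
print(body_clock_hour(13, 3), tz3e_dummies(3, 13))
print(body_clock_hour(21, 3), tz3e_dummies(3, 21))
```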
These theories will be explored in depth in the later sections.
IV. Descriptive Findings
The first consideration is whether there is any evidence that “home-field advantage” influences
athletic performance. A clear answer surfaces when comparing home and away win percentages
in the seasons from 2000 to 2003. The home team wins 58% of the time in these years, indicating
a strong home-field advantage. A simple difference-of-means test confirms that home teams win
more often on average, with a p-value that rounds to zero. Whether this advantage stems from
total distance traveled or time zones crossed by the away team, however, requires further
analysis.
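The paper does not spell out the exact form of its test, so the sketch below swaps in a one-sample z-test of the home win proportion against 0.5, using only the sample size and away-team win rate reported in Table 1. Treating every game the away team did not win as a home win is an assumption made for this illustration.

```python
import math

n_games = 1056               # number of observations reported in Table 1
away_win_rate = 0.4195076    # mean of the win dummy reported in Table 1

# Assumption: any game not won by the away team counts as a home win.
home_win_rate = 1 - away_win_rate

# Standard error of a sample proportion under the null hypothesis p = 0.5.
se_null = math.sqrt(0.25 / n_games)
z = (home_win_rate - 0.5) / se_null

# Two-sided p-value from the normal approximation.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.1e}")
```

A p-value this small prints as 0.0000 at four decimal places, which is consistent with the reported result.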
Comparing game outcomes from 2000 to 2003 shows an inverse relationship between travel
distance and win percentage: longer travel, on average, is associated with a lower win percentage.
Away teams that traveled more than the average 1518 kilometers to their home opponents in
these years win their matchup 39.4% of the time. Teams that traveled less than 1518 km, on the
other hand, win 43.6% of the time. While this difference seems significant on the surface, control
variables for important confounding factors such as strength of opponent and size of the crowd
cheering for the home team, for example, must be taken into account.
This trend becomes apparent with the implementation of a smoothed relationship between
distance traveled and win percentage. The graph below depicts a helpful visualization of the
downward trend suggested by the simple conditional mean discussed above.
GRAPH 1
There is an inverse relationship between distance traveled and win percentage. However, it
seems that predicted win percentage increases as travel increases around 3000 kilometers. This
seemingly contradictory trend reappears even with more intricate econometric modeling. It is
difficult to distinguish whether this trend has meaning or is simply noise in the data.
Jet lag, as opposed to travel fatigue, requires the crossing of time zones. When investigating the
basic data for wins and time zone change, it appears that the more time zones crossed, the lower
the win percentage. Away teams crossing three time zones either to the west or east had a win
percentage of 37.6% compared to a win percentage of 45.4% for those teams that stayed in their
familiar time zone. Another intriguing comparison emerged when looking at the win percentage
of those teams that travel three time zones to the west compared to those teams that travel three
time zones to the east. Teams that travel east to west three time zones win 40% of the time while
teams that travel three time zones west to east win 35.2% of the time.
Starting time is a significant explanation for this finding. Most NFL games start in the early
afternoon, which particularly hurts the athletic performance of west coast teams playing on the
east coast. In order to fit in pre-game routines for a 1pm EST game, players wake around 9am.
Yet this feels like 6am to players accustomed to Pacific Standard Time. The issue is well known:
west coast teams such as the San Francisco 49ers have argued against early afternoon games on
the east coast for years. The graph below summarizes these findings with a smoothed
relationship. Each dot indicates a single time zone change, positive for west to east and
negative for east to west. The middle dot represents win percentage with 0 time zones changed.
GRAPH 2
Hence, the graph is consistent with the finding that time-zone change decreases athletic
performance, and that eastward time zone change decreases performance more than westward
time zone change.
Another interesting trend arises when observing away team win percentage at domed stadiums
versus non-domed stadiums. Away teams that play their opponent at a domed stadium win
45.2% of the time while away teams that play at outdoor stadiums win 40.9% of the time. A
possible explanation for this finding is the unfamiliar weather conditions the teams face when
traveling on the road. For instance, when the Miami Dolphins come north to play the New
England Patriots at the non-domed Gillette Stadium, they face drastically cooler weather most of
the time. This observation is important when considering the effects of distance on performance
as the farther a team travels, the more likely they are to face unfamiliar weather conditions. This
finding, however, requires a similar test of robustness with the addition of control variables
before drawing conclusions.
Another somewhat related factor is unfamiliar altitude effects or altitude sickness. Athletic
performance peaks near sea level, where oxygen levels are 20.9% and air pressure is higher,
allowing for easier respiration³. Away teams in Denver, Colorado playing at Sports Authority
Field, however, have to adjust to lower air pressures when playing at an altitude of 5280 feet.
Not surprisingly, those teams playing the Denver Broncos in Colorado in the four seasons from
2000 through 2003 performed very poorly. Away teams had a win percentage of 28.1% when
playing the Broncos at home. Of course, one must consider that over these four years the
Broncos enjoyed a winning record of 39-25 and made the playoffs in two of these seasons. Also,
the dedicated Broncos fans might assert that they provide the best home-field advantage for their
team by packing the stadium every game. Nevertheless, it is an important consideration when
looking at the true effects of distance on predicted win percentage.
³ Sports Authority Field, Elevation 5,280 Feet above Sea Level
V. Empirical Model and Results
The dependent variable is a dummy equal to 1 if the away team wins and 0 otherwise. The
explanatory variable of interest is distance traveled. Control variables are also included in each
model to account for any positive or negative effects that they have on the dependent variable.
The base model, Regression 1, uses Ordinary Least Squares (OLS) regression of a win-loss
dummy variable on the travel distance, while controlling for other variables including stadium
condition, team talent, team performance, etc. The basic relationship of interest is as follows:
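\[
\begin{aligned}
\mathit{win} = {} & \beta_0 + \beta_1\,\mathit{rest8} + \beta_2\,\mathit{stdmage} + \beta_3\,\mathit{attd95} + \beta_4\,\mathit{mvp} + \beta_5\,\mathit{mvpcand} + \beta_6\,\mathit{pboff1} + \beta_7\,\mathit{pboff2} \\
& + \beta_8\,\mathit{pbdef1} + \beta_9\,\mathit{pbdef2} + \beta_{10}\,\mathit{dist}
\end{aligned}
\]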
Regression 1 solely investigates the distance-win relationship assuming linearity. The Linear
Probability Model (LPM) is chosen over the logit model for both simplicity and a clearer
economic interpretation of the coefficients.
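To make the estimation approach concrete, here is a minimal sketch of a linear probability model with a single regressor and a heteroscedasticity-robust standard error, the kind of robust error the paper reports using for all its models. It is not the authors' specification: it has no control variables, the data are made up, and the HC0 form of the robust variance is an assumption, since the paper does not say which variant it uses.

```python
import math


def lpm_simple(x, y):
    """OLS of a 0/1 outcome y on one regressor x, with an HC0 robust SE for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # HC0 sandwich variance of the slope in a one-regressor model with an intercept.
    robust_var = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, residuals)) / sxx ** 2
    return intercept, slope, math.sqrt(robust_var)


# Made-up toy data: distance in thousands of km and a 0/1 away-win indicator.
dist_thousand_km = [0.3, 0.8, 1.2, 1.9, 2.5, 3.1, 3.8, 4.2]
away_win = [1, 0, 1, 1, 0, 0, 1, 0]

b0, b1, se_b1 = lpm_simple(dist_thousand_km, away_win)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, robust t = {b1 / se_b1:.2f}")
```

In an LPM the slope is read directly as the change in win probability per unit of the regressor, which is the interpretability advantage the text mentions.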
Regression 2 explores a potential non-linear (polynomial) distance-win relationship by
including higher powers of the variable 𝑑𝑖𝑠𝑡, up to the fifth. Two interaction terms,
𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 and 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡, are included in Regression 3 in order to investigate
whether playing indoors (𝑑𝑜𝑚𝑒) or playing in high-altitude Denver (𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ) changes the
effect of distance captured by the coefficient on the distance variable (𝛽10).
The comprehensive model, Regression 4, studies the effect of changing time zone and starting
time, with a specific focus on the extreme cases where West teams travel 3 time zones to East
and compete in an early afternoon (~1pm) or a night game (~9pm), and their effect on the win
rate and the distance’s impact. The comprehensive model in our study is as follows:
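\[
\begin{aligned}
\mathit{win} = {} & \beta_0 + \beta_1\,\mathit{rest8} + \beta_2\,\mathit{stdmage} + \beta_3\,\mathit{attd95} + \beta_4\,\mathit{mvp} + \beta_5\,\mathit{mvpcand} + \beta_6\,\mathit{pboff1} + \beta_7\,\mathit{pboff2} \\
& + \beta_8\,\mathit{pbdef1} + \beta_9\,\mathit{pbdef2} + \beta_{10}\,\mathit{dist} + \beta_{11}\,\mathit{tz3eearly} + \beta_{12}\,\mathit{tz3enight} \\
& + \beta_{13}\,\mathit{tz3eearly} \times \mathit{dist} + \beta_{14}\,\mathit{tz3enight} \times \mathit{dist} + \beta_{15}\,\mathit{dome} \times \mathit{dist} + \beta_{16}\,\mathit{milehigh} \times \mathit{dist}
\end{aligned}
\]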
For comparison purposes, Regression 5 further develops Regression 4 by incorporating the
polynomial terms from Regression 2.
All models employ robust standard errors given the strong evidence of heteroscedasticity. Both
the Breusch-Pagan test (p-value = 0.0023) and the White test (p-value = 0.0000) strongly reject
the assumption of homoscedasticity. The final results of the robust OLS regressions are
summarized below:
TABLE 2
Regression Results [a]
Dependent Variable – 𝑾𝒊𝒏

| Variable | Regression 1 | Regression 2 | Regression 3 | Regression 4 | Regression 5 |
|---|---|---|---|---|---|
| 𝑟𝑒𝑠𝑡8 | 0.0314 (0.76) | 0.0346 (0.85) | 0.0326 (0.79) | 0.0329 (0.80) | 0.0355 (0.86) |
| 𝑚𝑣𝑝 | 0.272*** (3.83) | 0.276*** (3.87) | 0.274*** (3.84) | 0.276*** (3.86) | 0.280*** (3.89) |
| 𝑚𝑣𝑝𝑐𝑎𝑛𝑑 | 0.167*** (3.83) | 0.169*** (3.85) | 0.171*** (3.94) | 0.168*** (3.88) | 0.170*** (3.88) |
| 𝑝𝑏𝑜𝑓𝑓1 | 0.0128 (1.26) | 0.0133 (1.30) | 0.0122 (1.20) | 0.0126 (1.24) | 0.0143 (1.39) |
| 𝑝𝑏𝑜𝑓𝑓2 | -0.0424*** (-4.91) | -0.0419*** (-4.83) | -0.0423*** (-4.85) | -0.0420*** (-4.80) | -0.0414*** (-4.72) |
| 𝑝𝑏𝑑𝑒𝑓1 | 0.0579*** (5.19) | 0.0550*** (4.85) | 0.0576*** (5.19) | 0.0574*** (5.14) | 0.0537*** (4.74) |
| 𝑝𝑏𝑑𝑒𝑓2 | -0.0367*** (-3.26) | -0.0397*** (-3.47) | -0.0328*** (-2.89) | -0.0311*** (-2.71) | -0.0333*** (-2.85) |
| 𝑠𝑡𝑑𝑚𝑎𝑔𝑒 | -0.00232* (-1.79) | -0.00219* (-1.68) | -0.00232* (-1.73) | -0.00239* (-1.78) | -0.00229* (-1.70) |
| 𝑎𝑡𝑡𝑑95 | -0.116*** (-3.37) | -0.117*** (-3.37) | -0.128*** (-3.67) | -0.126*** (-3.57) | -0.127*** (-3.58) |
| 𝑑𝑖𝑠𝑡 | -0.0000324** (-2.34) | -0.000964* (-1.87) | -0.0000371*** (-2.67) | -0.0000353** (-2.21) | -0.0010698** (-2.05) |
| 𝑑𝑖𝑠𝑡² | | 0.00000131* (1.86) | | | 0.00000145** (2.03) |
| 𝑑𝑖𝑠𝑡³ | | -7.59e-10* (-1.86) | | | -8.50e-10** (-2.04) |
| 𝑑𝑖𝑠𝑡⁴ | | 1.90e-13* (1.82) | | | 2.14e-13** (2.00) |
| 𝑑𝑖𝑠𝑡⁵ | | -1.71e-17* (-1.77) | | | -1.92e-17* (-1.93) |
| 𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 | | | 0.0000446** (2.07) | 0.0000445** (2.04) | 0.0000537** (2.41) |
| 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡 | | | 0.0000252 (0.55) | 0.0000245 (0.53) | 0.0000360 (0.75) |
| 𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦 | | | | 0.485 (0.68) | 0.987 (1.26) |
| 𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦 ∗ 𝑑𝑖𝑠𝑡 | | | | -0.000147 (-0.73) | -0.000303 (-1.37) |
| 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 | | | | 2.339*** (2.78) | 2.762*** (2.68) |
| 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 ∗ 𝑑𝑖𝑠𝑡 | | | | -0.000599*** (-2.60) | -0.000731*** (-2.60) |
| 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 | 0.580*** (10.98) | 0.779*** (5.81) | 0.575*** (10.65) | 0.570*** (10.36) | 0.798*** (5.93) |
| # of observations | 1056 | 1056 | 1056 | 1056 | 1056 |
| R² | 0.1366 | 0.1397 | 0.1404 | 0.1427 | 0.1481 |

a. T-statistics in parentheses, * p<0.10, ** p<0.05, *** p<0.01
In Regression 1, the negative and statistically significant distance coefficient indicates a
downward-sloping distance-win relationship, reaffirming the inverse relationship between
distance traveled and win percentage in our descriptive findings.
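Because the model is a linear probability model, the distance coefficient can be read directly as a change in predicted win probability per kilometer. As a worked illustration using the coefficients reported in Table 2 (for Regression 4 this applies to a game in which all interaction dummies equal zero; the rounding is ours):

```latex
\[
\Delta \widehat{P}(\mathit{win} = 1) = \hat{\beta}_{10} \, \Delta \mathit{dist}
\]
\[
\text{Regression 1: } -0.0000324 \times 1000 = -0.0324 \quad (\approx 3.2 \text{ percentage points})
\]
\[
\text{Regression 4: } -0.0000353 \times 1000 = -0.0353 \quad (\approx 3.5 \text{ percentage points})
\]
```

The Regression 4 value corresponds to the headline figure of roughly 3.5 per 1000 km.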
Regression 2 attempts to approximate the non-linearity by employing a distance polynomial of
degree 5. This turns out to be a relatively good fit, as all of the distance coefficients are
significant at the 10% level, though not at the 5% level. To analyze the polynomial distance-win
relationship, the fitted line with 95% confidence intervals is plotted below:
GRAPH 3
Observe that the downtrend is notably evident on two tails, namely, [0, 900] and [3700, 4300].
The two intervals approximately match the mean distances of traveling within a time zone and
crossing 3 time zones in our descriptive findings — suggesting the possibility that changing time
zone has an effect on distance-win relationship (to be investigated in Regression 4).
Note that in Regression 2 the coefficient of 𝑑𝑖𝑠𝑡 moves further from zero while its precision
declines. The low precision, i.e. the large standard errors of Regression 2, is partly due to the
inclusion of high-order terms (e.g. 𝑑𝑖𝑠𝑡⁵). The effect is particularly pronounced when distance
is large.
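One way to read the polynomial is through its derivative, which gives the marginal effect of distance at a given point. The sketch below uses the Regression 2 distance coefficients as printed in Table 2. Those values are rounded to about three significant digits, and high powers of distance magnify that rounding, so the numbers it prints should be taken as rough indications rather than a reproduction of the fitted curve in Graph 3.

```python
# Regression 2 coefficients on dist, dist^2, ..., dist^5 as printed in Table 2 (rounded).
REG2_DIST_COEFS = [-0.000964, 1.31e-6, -7.59e-10, 1.90e-13, -1.71e-17]


def marginal_effect(dist_km, coefs=REG2_DIST_COEFS):
    """Derivative of the fitted polynomial: sum over k of k * b_k * dist^(k-1)."""
    return sum(k * b * dist_km ** (k - 1) for k, b in enumerate(coefs, start=1))


# Approximate change in predicted win probability per extra 1000 km at several distances.
for d in (0, 500, 2000, 3000, 4000):
    print(f"{d:>5} km: {1000 * marginal_effect(d):+.4f} per extra 1000 km")
```

With these rounded inputs the slope is negative at short distances and again near 4000 km, in line with the two downward tails described above.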
In Regression 3, the distance coefficient is statistically significant at the 1% level, with a
larger absolute t statistic, after adding the interaction terms 𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 and 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡,
which suggests omitted variable bias in the coefficients of Regression 1.
The plotted graph of distance-win relationship, conditional on 𝑑𝑜𝑚𝑒 is presented below:
GRAPH 4
The positive and significant coefficient of 𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 confirms the positive effect of playing
indoors – especially after a long-distance travel, because playing an indoor game helps the away
team to avoid unfamiliar weather conditions. This may suggest that a significant amount of the
negative distance effect on team performance can be explained by weather conditions.
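As a worked illustration of how the interaction term enters, the slope of predicted win probability with respect to distance in Regression 3 depends on the dome dummy. The calculation uses the point estimates from Table 2 with the Denver dummy set to zero and ignores sampling uncertainty. The coefficient subscripts are labels chosen here for readability, not the paper's numbering.

```latex
\[
\frac{\partial \widehat{\mathit{win}}}{\partial \mathit{dist}}
  = \hat{\beta}_{\mathit{dist}} + \hat{\beta}_{\mathit{dome} \times \mathit{dist}} \, \mathit{dome}
\]
\[
\mathit{dome} = 0: \quad -0.0000371 \text{ per km} \quad (\approx -0.037 \text{ per 1000 km})
\]
\[
\mathit{dome} = 1: \quad -0.0000371 + 0.0000446 = +0.0000075 \text{ per km} \quad (\approx +0.008 \text{ per 1000 km})
\]
```

On these point estimates, the outdoor distance penalty is essentially offset when the game is played in a dome.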
The insignificance of 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡 means the model finds no evidence that the distance
effect on the away team differs when the game takes place at Sports Authority Field in Denver.
To address the issues of changing time zone and starting time, Regression 4 includes two
dummies 𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦 and 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 and their interaction terms with 𝑑𝑖𝑠𝑡.
The plotted graph of distance-win relationship, conditional on 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 , is presented below:
GRAPH 5
The results suggest that when traveling west to east, playing a night game does provide an
advantage (positive coefficient of 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡), which partly offsets the drawbacks of
long-distance travel. The advantage diminishes, however, as 𝑑𝑖𝑠𝑡 increases (negative
coefficient of 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 ∗ 𝑑𝑖𝑠𝑡).
Admittedly, the predicted win probability when 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 = 1 greatly exceeds 1 (100%) and
thus may provide little reference value. Three features of our analysis help to explain this:
first, the use of the LPM instead of the logit model; second, a small sample size (12 observations
across 2000 – 2003 when 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 = 1); and third, the unavailability of samples when 𝑑𝑖𝑠𝑡
is small (which leads to an enormous standard error when approaching the Y-axis, as shown
above). In order to fulfill the condition of crossing 3 time zones, the travel distance usually
needs to be around 3600 km.
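The interplay between the night-game dummy and its interaction with distance can be checked with a short calculation. The sketch below uses only the two Regression 4 point estimates from Table 2; the function and variable names are hypothetical, and the result ignores the wide standard errors that come with only 12 such games.

```python
B_NIGHT = 2.339            # coefficient on tz3enight, Regression 4
B_NIGHT_DIST = -0.000599   # coefficient on tz3enight * dist, Regression 4


def night_game_shift(dist_km):
    """Shift in predicted win probability from setting tz3enight = 1 at a given distance."""
    return B_NIGHT + B_NIGHT_DIST * dist_km


# Distance at which the two terms cancel and the night-game shift reaches zero.
break_even_km = -B_NIGHT / B_NIGHT_DIST

for d in (3600, 3900, 4300):
    print(f"{d} km: {night_game_shift(d):+.3f}")
print(f"break-even distance: {break_even_km:.0f} km")
```

Evaluated near zero distance, the shift is close to the full 2.339, which shows how the straight line extrapolates to probabilities above 1 outside the range where such games actually occur.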
On the other hand, the model finds no observable effect of an early afternoon game on game
outcomes.
Regression 5 serves as a comparison by reintroducing the polynomial into the comprehensive
model. The coefficients shift and are more significant than in Regression 2. This indicates that
when we continue to control for 𝑑𝑜𝑚𝑒, 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ, time zone, and starting time variables, the
nonlinear distance-win relationship still potentially exists.
VI. Extensions
Addressing the question of whether travel distance affects team performance ultimately comes
down to accounting for all aspects of home-field advantage. When important variables are left
out of the model, part of their effect is attributed to distance traveled; with them included, we
would expect the individual contribution from distance traveled to be more subdued. One such
variable discussed in this paper is unfamiliar weather conditions. The addition of this variable
in some form would most likely result in a more reliable estimate of the effect of increased
travel on team performance. A hypothesized way of including this variable would be to
use the U.S. Department of Energy’s climate zone map and assign a zone to each team. If the
team plays a game outside of their zone, they are assigned a one and zero otherwise. We would
also need to take into account the date of the game—games played in September, when it is still
summer, would include fewer weather discrepancies than games played in late November and
December.
Additionally, there may be alternative methods for controlling for talent of team which we may
find to be more reliable. One way could be to use data from the video game Madden NFL, which
gives ratings on the team’s offense and defense on a scale of 1 to 100. A complication, though,
would arise when looking for statistics on team ratings before 1995, as the Madden Series was
not around earlier than that.
VII. Conclusion
Based on the analysis above, traveling longer distances seems to have a negative impact on team
performance in the NFL, and the model we created confirms many of our prior suspicions. Some
west coast teams travel over twice as much as their east coast counterparts and are therefore at
a disadvantage. Using this model, we find that the predicted probability of an away team
winning decreases by about 3.5 percentage points as the travel distance to another stadium
increases by 1000 km.
Based on our model, one would conclude that the best way to create a fairer league would be to
redraw the divisions to minimize travel distance. With the NFL growing year after year and
pulling in billions of dollars in revenue, one would think that even the smallest amount of data
that can be used to predict the outcome of a game would be put to use. This, however, is not
true in the real world. An outside observer might wonder why the Dallas Cowboys are in the
NFC East with a team like the New York Giants.
While the model points to this as a faulty setup, creating the fairest competition possible is not
the core pursuit of the NFL. The goal of the NFL has been, and always will be, profit. The Dallas
Cowboys vs. the New York Giants has always been one of the largest rivalries in sports, and for
reasons like that, the NFL thrives. The NFL caters to fans and is rewarded with massive cash
flows. The league has recently floated the idea of a London-based team. Our model helps to
predict the negative implications that this would have for upholding fair competition. Yet a
London-based team would bring in another huge market of consumers and likely lead to an
increase in revenue for the league itself. Models such as the one we created can offer some
insight in predicting how a team will perform in a season, but unless the findings are striking
enough to leave viewers themselves discontented, it is very likely these statistics will remain
what they are: just statistics.
VIII. Appendix
A. Distance Matrix, Using 2003 Stadium Locations
B. Pro Bowlers Per Team (2000-2003)
C. Time Zone