effects of travel distance on away team win percentage in the nfl


Effects of Travel Distance on Away Team Win Percentage in the NFL

Boston College

Econometric Methods, Professor Cox

5/5/2016

Kyle Waters, Yi Zhang, Donald Tynion, George Acevedo


Abstract: Does traveling long distances have a negative impact on team performance in the National Football League (NFL)? Longer travel brings travel fatigue and jet lag, physiological conditions that can impair an NFL player's performance. This would imply an advantage for teams that travel less and play in familiar time zones. Yet travel is distributed very unequally across teams: some teams accumulate over five times more distance in a season than others. Existing analysis of NFL win-loss records from 1978-1987 suggests that longer distances traveled for away games, combined with multiple time zone changes, influence game outcomes. Previous studies acknowledge the need for more robust research on the subject, in both method and implementation. A well-structured econometric model with accurate data on distance traveled and time zone changes provides a way to clarify and confirm previous findings. After controlling for various team and stadium characteristics, the away team's predicted probability of winning decreases by about 3.5 percentage points for every additional 1000 km it must travel to reach its home opponent. Further investigation suggests that the magnitude of this effect can change under certain circumstances, though the negative impact remains clear. With the NFL currently exploring an expansion team in London, England, these conclusions suggest that a London-based team would be at a major disadvantage against United States-based opponents.


I. Introduction

The idea of a home-field advantage¹ is widely cited. The theory holds that teams playing on the road are at a major disadvantage: they face a crowd rooting against them, unfamiliar stadium characteristics, and, in some cases, referees biased in favor of the home team. Does a major portion of this advantage, though, originate from the travel that away teams must undergo to compete? Anyone who has traveled long distances by plane can testify to the detrimental impact of travel fatigue. Many people have also experienced jet lag, in which the body clock is out of sync with the local time after crossing one or more time zones. These factors have real, detrimental effects on human health and can be especially complicating for professional athletes. Some NFL teams, though, travel much more than others. The Seattle Seahawks and San Francisco 49ers are often the most frequent fliers: together, these two teams traveled a combined 88,536 kilometers in 2003, while the Philadelphia Eagles and New York Giants covered a mere 31,003 kilometers. If distance traveled decreases the chances of winning, West Coast teams such as these two would have a legitimate grievance.

In 2013, researchers from Stanford University² investigated whether NFL game outcomes depend on time zone changes. Their study, “The Impact of Circadian Misalignment on Athletic Performance in Professional Football Players,” analyzed 40 years of data and found that West Coast teams playing East Coast teams at night held a substantial advantage. That study, however, did not account for the actual distance traveled; instead, it focused on start time and time zone. This paper adds distance traveled as a variable alongside start time and time zone.

Our main finding is that, holding all else equal, an increase in travel distance of 1000 kilometers lowers the away team's predicted probability of winning by about 3.5 percentage points. To reach this conclusion, we employed a number of empirical methods, focusing on ordinary least squares (linear probability) regressions for five separate models, each including a different set of control variables. Our final model indicates that distance traveled has a nonlinear effect on win percentage: the predicted win percentage drops sharply when teams travel more than 3600 kilometers. Interestingly, this threshold aligns with the average distance needed to cross three time zones. Put differently, an extra 1000 kilometers lowers the away team's predicted chance of winning more when starting from 3000 km than when starting from 2000 km.

The model's estimate of the effect of distance on the predicted win probability changes when we control for other variables that may be correlated with distance traveled. Our model omits a potentially important determinant of home-field advantage: unfamiliar weather conditions, which can impair away teams' chances of winning. The model attempts to account for this deficiency by interacting distance with a dummy variable for whether the game is played in a dome. We observe that when the away team plays in a dome, the effect of travel distance on team performance is significantly less negative. Furthermore, some teams recognize the downsides of increased travel and try to mitigate them by remaining in the time zone of their road games when they play on the road in back-to-back weeks.

¹ Bill Barnwell (2012), NFL’s Frequent-Flier Phenomenon

² Roger S. Smith; Bradley Efron; Cheri D. Mah; Atul Malhotra (2013), The Impact of Circadian Misalignment on Athletic Performance in Professional Football Players

The basic conclusion that distance traveled hurts team performance is relevant to the NFL’s potential plans to expand into London, England. If travel across multiple time zones truly has a major negative effect on performance, it would be best for the NFL to drop the idea. It would also make sense to draw divisional alignments based on geographic distance to minimize travel.


II. Data

The dataset we created to test the relationship between travel distance and win percentage was

specifically tailored to fit the purposes of this study. The dataset compiles the data from every

game of every season between 2000 and 2003. For the purpose of our analysis, all games are

observed from the perspective of the away team.

The table below summarizes the basics of the dataset:

TABLE 1

Summary Statistics

Variable Mean Std. Dev. Min Max
Win (𝑤𝑖𝑛, 1 if the observed team wins, 0 if it loses) .4195076 .4937123 0 1
Team Talent & Performance
Rest days since the last game (𝑟𝑒𝑠𝑡8)ᵃ .1439394 .3511946 0 1
MVP (𝑚𝑣𝑝, 1 if the observed team has the MVP, 0 if not) .0445076 .2063176 0 1
MVP Candidate (𝑚𝑣𝑝𝑐𝑎𝑛𝑑)ᵇ .1410985 .3482881 0 1
# of offensive Pro Bowlers of the observed team (𝑝𝑏𝑜𝑓𝑓1) 1.293561 1.550721 0 6
# of offensive Pro Bowlers of the opponent (𝑝𝑏𝑜𝑓𝑓2) 1.310606 1.572911 0 6
# of defensive Pro Bowlers of the observed team (𝑝𝑏𝑑𝑒𝑓1) 1.334280 1.357094 0 6
# of defensive Pro Bowlers of the opponent (𝑝𝑏𝑑𝑒𝑓2) 1.340909 1.351591 0 6
Stadium
Stadium Age (𝑠𝑡𝑑𝑚𝑎𝑔𝑒) 26.05492 11.73357 0 57
Attendance (𝑎𝑡𝑡𝑑95)ᶜ .6922348 .4617875 0 1
Dome (𝑑𝑜𝑚𝑒, 1 if the stadium is indoor, 0 if not) .2367424 .4252843 0 1
Mile High (𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ, 1 if in Denver, 0 if not) .0303030 .1715010 0 1
Time & Distance
Crossing 3 time zones to East & Early afternoon game (𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦, 1 if yes, 0 if not)ᵈ .0388258 .1932710 0 1
Crossing 3 time zones to East & Night game (𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡, 1 if yes, 0 if not)ᵈ .0113636 .1060432 0 1
Distance (𝑑𝑖𝑠𝑡)ᵉ 1518.366 1022.705 0 4376.407
# of Observations 1056

a. 𝑟𝑒𝑠𝑡8 is a dummy that = 1 if the # of rest days since the last game <= 8, zero otherwise.
b. 𝑚𝑣𝑝𝑐𝑎𝑛𝑑 is a dummy that = 1 if the observed team has an MVP candidate (MVP excluded) in that season, zero otherwise.
c. 𝑎𝑡𝑡𝑑95 is a dummy that = 1 if the average attendance rate of that season > 95%, zero otherwise.
d. Early afternoon game is defined as before 3pm (EST), normally around 1pm; Night game is defined as after 6pm (EST), normally around 9pm; the reference group is Late afternoon game (3pm to 6pm, EST).
e. 𝑑𝑖𝑠𝑡 in kilometers.
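The dummy definitions in footnotes c and d can be made concrete with a short sketch. It assumes a hypothetical per-game record whose field names (`season_attendance`, `kickoff_hour_est`, `tz_crossed_east`) are illustrative rather than the authors' actual column names; following the footnote's "3pm to 6pm" reference group, a kickoff at exactly 3pm or 6pm is treated as late afternoon.

```python
from dataclasses import dataclass


@dataclass
class AwayGame:
    """Hypothetical per-game record, seen from the away team's perspective."""
    season_attendance: float  # home stadium's season-average attendance / capacity
    kickoff_hour_est: float   # kickoff on a 24-hour clock, Eastern time (13.0 = 1pm)
    tz_crossed_east: int      # time zones crossed; positive = west to east


def attd95(g: AwayGame) -> int:
    """Footnote c: 1 if the season-average attendance rate exceeds 95%."""
    return int(g.season_attendance > 0.95)


def kickoff_slot(g: AwayGame) -> str:
    """Footnote d: early before 3pm EST, night after 6pm EST, otherwise late afternoon."""
    if g.kickoff_hour_est < 15:
        return "early"
    if g.kickoff_hour_est > 18:
        return "night"
    return "late"  # reference group, 3pm to 6pm inclusive


def tz3e_dummies(g: AwayGame) -> tuple:
    """(tz3eearly, tz3enight): crossed three zones eastward AND kicked off early / at night."""
    east3 = g.tz_crossed_east == 3
    slot = kickoff_slot(g)
    return int(east3 and slot == "early"), int(east3 and slot == "night")
```

For example, a 1pm EST game for a team that crossed three time zones eastward, `AwayGame(0.97, 13.0, 3)`, yields `tz3e_dummies(...) == (1, 0)`.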

The main source for the dataset is the website Pro-Football-Reference, which compiles the dates and times of all games dating back to 1920, along with win-loss results, game performance (both offensive and defensive statistics), and the expected points set before each game. The attendance data is not an exact measure for each individual game but an annual average for each stadium. It enters the model as a dummy variable with 95% of capacity as the cutoff, to help account for crowd noise and related effects. Although this dummy is grouped with the stadium variables, it also serves as a proxy for opponent strength, because better teams’ home games draw higher attendance.

For consistency, the distance traveled is assumed to equal the distance from stadium to stadium. Distances are computed with the haversine formula, which gives the great-circle distance between two points on a sphere and thus accounts for the curvature of the Earth. The latitude and longitude of each stadium are the inputs to the calculation. The dataset further assumes that the away team travels directly to the home team’s stadium before each game and directly back to its own stadium afterward; layovers, delays, and the distance to and from each airport are not taken into consideration.
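A minimal Python sketch of the haversine calculation described above follows. It assumes coordinates in decimal degrees and a spherical Earth with a mean radius of 6371 km; the paper does not state which radius was used, so results may differ slightly from the dataset's values. The one-way stadium-to-stadium distance would correspond to 𝑑𝑖𝑠𝑡.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp so rounding can never push sqrt(a) slightly above 1
    central_angle = 2 * math.asin(min(1.0, math.sqrt(a)))
    return radius_km * central_angle


# One degree of longitude along the equator:
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```

In practice each stadium's latitude and longitude would be passed in pairs to build a distance matrix such as the one in Appendix A.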

One limitation of the dataset is its relatively small number of observations. This reflects a tradeoff between building a first-hand, tailor-made, reliable dataset and borrowing existing datasets that may not suit the needs of this paper; collecting the time and distance variables accounted for a large portion of the workload. A larger dataset covering more seasons would most likely yield more precise estimates of the relationship between distance traveled and win percentage.


III. Theoretical Considerations

The fundamental hypothesis is a negative relationship between travel distance and the away team’s win percentage, driven by travel fatigue. Travel fatigue refers to the disorientation and discomfort caused by changes in climate and natural lighting, dehydration from cabin air, restricted food choices, and limited space for exercise or movement, all of which are expected to inhibit athletic performance. Because sleep eases the symptoms of travel fatigue, we control for rest with the dummy variable 𝑟𝑒𝑠𝑡8.

However, the distance-win relationship is complicated by a number of other factors, including time zone changes, the starting time of the game, and stadium conditions.

One of the key factors is jet lag caused by crossing time zones; within the continental United States, the change ranges from -3 to +3 time zones. Winning percentage is expected to decrease for away teams as travel distance increases, because air travel across time zones causes jet lag. Moreover, jet lag could add to the effect of distance: for the same distance, traveling from west to east is expected to hurt performance differently than traveling from north to south, which involves little or no time zone change.

Starting time is another consideration, since its effects will vary with travel fatigue and jet lag. A 1pm Eastern Standard Time (EST) game at FedEx Field near Washington may affect the Washington Redskins differently than the Arizona Cardinals, who have just traveled more than 3000 km for the game.

Given the complexity of the time and distance factors, the report strategically focuses on one special case: the effect on an away team that crosses three time zones from west to east, that is, from Pacific Standard Time (PST) to EST. If the San Francisco 49ers visit Philadelphia to play the Eagles in a 1pm game, the 49ers’ body clocks still read 10am at kickoff. If the game starts at 9pm, their body clocks read 6pm. These two scenarios are expected to yield different outcomes.
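The body-clock arithmetic in this example can be checked with Python's standard `datetime` module. The sketch uses fixed standard-time offsets (UTC-5 for EST, UTC-8 for PST), an assumption that ignores daylight saving time; since both zones shift together, the three-hour gap is unchanged. The kickoff date is a placeholder.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets; daylight saving time is ignored in this sketch.
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")


def body_clock(kickoff: datetime, home_zone: timezone) -> datetime:
    """Kickoff expressed in the away team's home time zone (a proxy for its body clock)."""
    return kickoff.astimezone(home_zone)


early_kickoff = datetime(2003, 11, 2, 13, 0, tzinfo=EST)  # placeholder 1pm EST game
night_kickoff = datetime(2003, 11, 2, 21, 0, tzinfo=EST)  # placeholder 9pm EST game

print(body_clock(early_kickoff, PST).hour)  # 10, i.e. 10am for the 49ers
print(body_clock(night_kickoff, PST).hour)  # 18, i.e. 6pm
```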

These theories will be explored in depth in the later sections.


IV. Descriptive Findings

The first consideration is whether there is any evidence that “home-field advantage” influences athletic performance. A clear answer surfaces when comparing home and away win percentages in the 2000 to 2003 seasons: the home team wins 58% of the time, indicating a strong home-field advantage. A simple difference-of-means test confirms that home teams win more on average, with a p-value of approximately zero. Whether this advantage stems from the total distance traveled or the time zones crossed by the away team, however, requires further analysis.
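A minimal sketch of such a test follows. The paper does not give its exact implementation, so this uses an equivalent formulation under one assumption: with no ties, the home and away win indicators in each game sum to one, so testing whether home teams win more than away teams reduces to testing whether the home win rate exceeds 0.5. The inputs are the rounded figures from the text (58% of 1056 games).

```python
import math


def home_win_z_test(home_win_rate: float, n_games: int) -> tuple:
    """One-sided z-test of H0: p_home = 0.5 against H1: p_home > 0.5."""
    se = math.sqrt(0.5 * 0.5 / n_games)          # standard error under H0
    z = (home_win_rate - 0.5) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value


z, p = home_win_z_test(0.58, 1056)
```

With these inputs z is about 5.2, so the p-value is far below 0.001, consistent with the near-zero p-value reported above.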

Comparing game outcomes from 2000 to 2003 reveals an inverse relationship between travel distance and win percentage: longer travel, on average, results in a lower win percentage. Away teams that traveled more than the average of 1518 kilometers to their home opponents won 39.4% of the time, while teams that traveled less than 1518 km won 43.6% of the time. Although this difference looks meaningful on the surface, important confounding factors, such as the strength of the opponent and the size of the crowd cheering for the home team, must be taken into account.
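The split at the sample-mean distance can be sketched as below, with a pooled two-proportion z-test added as one simple way to gauge whether a gap like 39.4% vs. 43.6% exceeds chance variation; the paper does not report such a test, and the toy data are illustrative only. Games exactly at the mean are assigned to the "near" group, an arbitrary choice.

```python
import math
from statistics import mean


def split_by_mean_distance(dists, wins):
    """Compare away-team win rates above vs. at-or-below the mean travel distance."""
    cutoff = mean(dists)
    far = [w for d, w in zip(dists, wins) if d > cutoff]
    near = [w for d, w in zip(dists, wins) if d <= cutoff]
    p_far, p_near = mean(far), mean(near)
    # Pooled two-proportion z-test of H0: p_far == p_near
    pooled = mean(wins)
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(far) + 1 / len(near)))
    z = (p_far - p_near) / se
    return cutoff, p_far, p_near, z


# Toy data (km traveled, 1 = away win), illustrative only
dists = [500, 600, 700, 3000, 3500, 4000]
wins = [1, 1, 0, 0, 0, 1]
cutoff, p_far, p_near, z = split_by_mean_distance(dists, wins)
```

Here the cutoff is 2050 km, the far group wins one game in three, the near group two in three, and z is negative.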

This trend also appears in a smoothed relationship between distance traveled and win percentage. The graph below visualizes the downward trend suggested by the simple conditional means discussed above.

GRAPH 1


There is an inverse relationship between distance traveled and win percentage. However, the predicted win percentage seems to rise again as travel approaches 3000 kilometers. This seemingly contradictory trend reappears even with more intricate econometric modeling, and it is difficult to tell whether it is meaningful or simply noise in the data.
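The paper does not say which smoother produced Graph 1. As one common choice, a Nadaraya-Watson kernel regression with a Gaussian kernel estimates the win rate at each distance as a locally weighted average of game outcomes; the bandwidth (in km) is a tuning parameter assumed here, and wider bandwidths give smoother curves.

```python
import math


def kernel_smooth(xs, ys, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each grid point (Gaussian kernel)."""
    fitted = []
    for g in grid:
        # Weight each game by how close its distance is to the grid point
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        if total == 0.0:  # grid point far from every observation
            fitted.append(float("nan"))
            continue
        fitted.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return fitted
```

With hypothetical lists `dists` (km) and `wins` (0/1), a call such as `kernel_smooth(dists, wins, range(0, 4401, 100), bandwidth=400.0)` would trace a curve comparable in spirit to Graph 1; the shape near 3000 km can be sensitive to the bandwidth choice.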

Jet lag, as opposed to travel fatigue, requires the crossing of time zones. When investigating the

basic data for wins and time zone change, it appears that the more time zones crossed, the lower

the win percentage. Away teams crossing three time zones either to the west or east had a win

percentage of 37.6% compared to a win percentage of 45.4% for those teams that stayed in their

familiar time zone. Another intriguing comparison emerged when looking at the win percentage

of those teams that travel three time zones to the west compared to those teams that travel three

time zones to the east. Teams that travel east to west three time zones win 40% of the time while

teams that travel three time zones west to east win 35.2% of the time.

Starting time may explain much of this finding. Most NFL games start in the early afternoon, which particularly hurts the performance of West Coast teams playing on the East Coast. To fit in pre-game routines for a 1pm EST game, players wake up around 9am, which feels like 6am to players accustomed to Pacific Standard Time. This effect is well known: West Coast teams such as the San Francisco 49ers have argued against early afternoon games on the East Coast for years. The graph below summarizes these findings with a smoothed relationship. Each dot represents one value of time zone change, positive for west-to-east travel and negative for east-to-west travel; the middle dot represents the win percentage with no time zone change.

GRAPH 2


Hence, the graph is consistent with the finding that time-zone change decreases athletic

performance, and that eastward time zone change decreases performance more than westward

time zone change.

Another interesting trend arises when observing away team win percentage at domed stadiums

versus non-domed stadiums. Away teams that play their opponent at a domed stadium win

45.2% of the time while away teams that play at outdoor stadiums win 40.9% of the time. A

possible explanation for this finding is the unfamiliar weather conditions the teams face when

traveling on the road. For instance, when the Miami Dolphins come north to play the New

England Patriots at the non-domed Gillette Stadium, they face drastically cooler weather most of

the time. This observation is important when considering the effects of distance on performance

as the farther a team travels, the more likely they are to face unfamiliar weather conditions. This

finding, however, requires a similar test of robustness with the addition of control variables

before drawing conclusions.

Another somewhat related factor is unfamiliar altitude, or altitude sickness. Athletic performance peaks near sea level, where air pressure, and therefore the partial pressure of oxygen, is higher, allowing for easier respiration (the oxygen share of air is about 20.9% at any altitude). Away teams playing the Denver Broncos at Sports Authority Field, however, have to adjust to lower air pressure at an altitude of 5280 feet³. Not surprisingly, teams visiting the Broncos in Colorado in the four seasons from 2000 through 2003 performed very poorly, winning only 28.1% of the time. Of course, over these four years the Broncos enjoyed a winning record of 39-25 and made the playoffs in two of these seasons. Also, dedicated Broncos fans might assert that they provide the best home-field advantage for their team by packing the stadium every game. Nevertheless, altitude is an important consideration when estimating the true effect of distance on predicted win percentage.

³ Sports Authority Field, Elevation 5,280 Feet above Sea Level


V. Empirical Model and Results

The dependent variable is a dummy for whether the away team wins. The key explanatory variable is distance traveled. Control variables are included in each model to capture other factors that raise or lower the away team’s chances of winning.

The base model, Regression 1, uses an ordinary least squares (OLS) regression of the win-loss dummy on travel distance, controlling for stadium conditions, team talent, team performance, and related factors. The basic relationship of interest is as follows:

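$$\mathit{win} = \beta_0 + \beta_1\,\mathit{rest8} + \beta_2\,\mathit{stdmage} + \beta_3\,\mathit{attd95} + \beta_4\,\mathit{mvp} + \beta_5\,\mathit{mvpcand} + \beta_6\,\mathit{pboff1} + \beta_7\,\mathit{pboff2} + \beta_8\,\mathit{pbdef1} + \beta_9\,\mathit{pbdef2} + \beta_{10}\,\mathit{dist}$$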

Regression 1 investigates the distance-win relationship assuming linearity. We use a linear probability model (LPM) rather than a logit model, both for simplicity and because LPM coefficients have a clear economic interpretation as changes in the probability of winning.
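Because the LPM is linear, the effect of distance can be read directly from its coefficient. As a worked example, take the Regression 4 estimate of 𝛽10 from Table 2; given that model's interaction terms, it applies to outdoor games outside Denver that do not involve a three-time-zone eastward trip:

$$\frac{\partial \Pr(\mathit{win}=1)}{\partial\,\mathit{dist}} = \beta_{10}, \qquad \Delta\widehat{\Pr}(\mathit{win}=1) = \hat\beta_{10} \times 1000 = -0.0000353 \times 1000 = -0.0353$$

That is, a drop of about 3.5 percentage points per additional 1000 km. The same calculation with the Regression 1 estimate (-0.0000324) gives about 3.2 percentage points.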

Regression 2 explores a potentially nonlinear distance-win relationship by adding higher powers of 𝑑𝑖𝑠𝑡 (up to 𝑑𝑖𝑠𝑡⁵). Regression 3 adds two interaction terms, 𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 and 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡, to investigate whether playing indoors (𝑑𝑜𝑚𝑒) or playing at high altitude in Denver (𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ) changes the effect of distance (𝛽10).

The comprehensive model, Regression 4, studies the effects of time zone changes and starting time. It focuses on the extreme cases in which West Coast teams travel three time zones east and play either an early afternoon (~1pm) or a night (~9pm) game, and on how these cases affect both the win rate and the impact of distance. The comprehensive model in our study is as follows:

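$$\mathit{win} = \beta_0 + \beta_1\,\mathit{rest8} + \beta_2\,\mathit{stdmage} + \beta_3\,\mathit{attd95} + \beta_4\,\mathit{mvp} + \beta_5\,\mathit{mvpcand} + \beta_6\,\mathit{pboff1} + \beta_7\,\mathit{pboff2} + \beta_8\,\mathit{pbdef1} + \beta_9\,\mathit{pbdef2} + \beta_{10}\,\mathit{dist} + \beta_{11}\,\mathit{tz3eearly} + \beta_{12}\,\mathit{tz3enight} + \beta_{13}\,(\mathit{tz3eearly} \times \mathit{dist}) + \beta_{14}\,(\mathit{tz3enight} \times \mathit{dist}) + \beta_{15}\,(\mathit{dome} \times \mathit{dist}) + \beta_{16}\,(\mathit{milehigh} \times \mathit{dist})$$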

For comparison purposes, Regression 5 further develops Regression 4 by incorporating the

polynomial terms from Regression 2.

All models employ robust standard errors given the strong evidence of heteroscedasticity, which is expected in a linear probability model because the error variance, p(1 - p), depends on the regressors. Both the Breusch-Pagan test (p-value = 0.0023) and the White test (p-value = 0.0000) strongly reject the null hypothesis of homoscedasticity. The final results of the robust OLS regressions are summarized below:

TABLE 2

Regression Resultsᵃ

Dependent Variable: 𝑾𝒊𝒏

Variable            Regression 1  Regression 2  Regression 3  Regression 4  Regression 5
𝑟𝑒𝑠𝑡8               0.0314        0.0346        0.0326        0.0329        0.0355
                    (0.76)        (0.85)        (0.79)        (0.80)        (0.86)
𝑚𝑣𝑝                 0.272***      0.276***      0.274***      0.276***      0.280***
                    (3.83)        (3.87)        (3.84)        (3.86)        (3.89)
𝑚𝑣𝑝𝑐𝑎𝑛𝑑             0.167***      0.169***      0.171***      0.168***      0.170***
                    (3.83)        (3.85)        (3.94)        (3.88)        (3.88)
𝑝𝑏𝑜𝑓𝑓1              0.0128        0.0133        0.0122        0.0126        0.0143
                    (1.26)        (1.30)        (1.20)        (1.24)        (1.39)
𝑝𝑏𝑜𝑓𝑓2              -0.0424***    -0.0419***    -0.0423***    -0.0420***    -0.0414***
                    (-4.91)       (-4.83)       (-4.85)       (-4.80)       (-4.72)
𝑝𝑏𝑑𝑒𝑓1              0.0579***     0.0550***     0.0576***     0.0574***     0.0537***
                    (5.19)        (4.85)        (5.19)        (5.14)        (4.74)
𝑝𝑏𝑑𝑒𝑓2              -0.0367***    -0.0397***    -0.0328***    -0.0311***    -0.0333***
                    (-3.26)       (-3.47)       (-2.89)       (-2.71)       (-2.85)
𝑠𝑡𝑑𝑚𝑎𝑔𝑒             -0.00232*     -0.00219*     -0.00232*     -0.00239*     -0.00229*
                    (-1.79)       (-1.68)       (-1.73)       (-1.78)       (-1.70)
𝑎𝑡𝑡𝑑95              -0.116***     -0.117***     -0.128***     -0.126***     -0.127***
                    (-3.37)       (-3.37)       (-3.67)       (-3.57)       (-3.58)
𝑑𝑖𝑠𝑡                -0.0000324**  -0.000964*    -0.0000371*** -0.0000353**  -0.0010698**
                    (-2.34)       (-1.87)       (-2.67)       (-2.21)       (-2.05)
𝑑𝑖𝑠𝑡²                             0.00000131*                               0.00000145**
                                  (1.86)                                    (2.03)
𝑑𝑖𝑠𝑡³                             -7.59e-10*                                -8.50e-10**
                                  (-1.86)                                   (-2.04)
𝑑𝑖𝑠𝑡⁴                             1.90e-13*                                 2.14e-13**
                                  (1.82)                                    (2.00)
𝑑𝑖𝑠𝑡⁵                             -1.71e-17*                                -1.92e-17*
                                  (-1.77)                                   (-1.93)
𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡                                     0.0000446**   0.0000445**   0.0000537**
                                                (2.07)        (2.04)        (2.41)
𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡                                 0.0000252     0.0000245     0.0000360
                                                (0.55)        (0.53)        (0.75)
𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦                                                     0.485         0.987
                                                              (0.68)        (1.26)
𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦 ∗ 𝑑𝑖𝑠𝑡                                              -0.000147     -0.000303
                                                              (-0.73)       (-1.37)
𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡                                                     2.339***      2.762***
                                                              (2.78)        (2.68)
𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 ∗ 𝑑𝑖𝑠𝑡                                              -0.000599***  -0.000731***
                                                              (-2.60)       (-2.60)
𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡            0.580***      0.779***      0.575***      0.570***      0.798***
                    (10.98)       (5.81)        (10.65)       (10.36)       (5.93)
# of observations   1056          1056          1056          1056          1056
R²                  0.1366        0.1397        0.1404        0.1427        0.1481

a. Robust t-statistics in parentheses; * p<0.10, ** p<0.05, *** p<0.01
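The estimation approach described before Table 2, OLS with heteroscedasticity-robust standard errors plus a Breusch-Pagan test, can be sketched with NumPy as below. The paper does not state its software or exact variants, so two assumptions apply: the robust standard errors use the HC1 small-sample correction, and the Breusch-Pagan statistic is the n·R² (Koenker) form, compared against a chi-square distribution with k - 1 degrees of freedom. The data in the usage lines are synthetic.

```python
import numpy as np


def ols_hc1(X: np.ndarray, y: np.ndarray):
    """OLS coefficients with HC1 heteroscedasticity-robust standard errors.

    X must include a column of ones for the intercept.
    """
    n, k = X.shape
    xtx = X.T @ X
    xtx_inv = np.linalg.inv(xtx)
    beta = np.linalg.solve(xtx, X.T @ y)
    resid = y - X @ beta
    meat = X.T @ (X * (resid ** 2)[:, None])        # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv * (n / (n - k))  # HC1 scaling
    return beta, np.sqrt(np.diag(cov)), resid


def breusch_pagan_lm(X: np.ndarray, resid: np.ndarray) -> float:
    """n * R^2 from regressing squared residuals on X (Koenker form)."""
    e2 = resid ** 2
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    ssr = np.sum((e2 - X @ gamma) ** 2)
    sst = np.sum((e2 - e2.mean()) ** 2)
    return X.shape[0] * (1.0 - ssr / sst)


# Synthetic LPM-style data: win probability falls linearly with distance.
rng = np.random.default_rng(0)
n = 1056
dist = rng.uniform(0, 4400, n)
win = (rng.uniform(size=n) < 0.5 - 0.00004 * dist).astype(float)
X = np.column_stack([np.ones(n), dist])
beta, se, resid = ols_hc1(X, win)
lm = breusch_pagan_lm(X, resid)
```

A chi-square p-value for `lm` could then be obtained with, for example, `scipy.stats.chi2.sf(lm, X.shape[1] - 1)`. In statsmodels, `OLS(...).fit(cov_type="HC1")` and `statsmodels.stats.diagnostic.het_breuschpagan` provide tested equivalents.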

In Regression 1, the negative and statistically significant distance coefficient indicates a downward-sloping distance-win relationship, reaffirming the inverse relationship between distance traveled and win percentage found in our descriptive findings.

Regression 2 approximates the nonlinearity with a fifth-degree polynomial in distance. All five distance coefficients are significant at the 10% level, though none is significant at the 5% level. To analyze the polynomial distance-win relationship, the fitted line with 95% confidence intervals is plotted below:

GRAPH 3

The downward trend is most evident at the two tails, [0, 900] km and [3700, 4300] km. These two intervals roughly match the mean distances of traveling within a time zone and of crossing three time zones in our descriptive findings, suggesting that time zone changes may affect the distance-win relationship (investigated in Regression 4).

Note that in Regression 2 the coefficient on 𝑑𝑖𝑠𝑡 moves further from zero while its precision declines. The low precision, i.e., the large standard errors in Regression 2, is partly due to the inclusion of high-order terms (e.g., 𝑑𝑖𝑠𝑡⁵), which are strongly correlated with the lower powers of 𝑑𝑖𝑠𝑡. The effect is particularly pronounced when distance is large.
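The collinearity among powers of 𝑑𝑖𝑠𝑡 can be illustrated with a short sketch. It uses an evenly spaced grid of distances over roughly the sample range rather than the actual data, so the numbers are illustrative; correlations between successive powers come out above 0.95, one reason the individual polynomial coefficients are imprecisely estimated. Orthogonal polynomial bases are a common remedy. Rescaling distance (for example, to thousands of km) improves numerical conditioning but leaves these correlations unchanged.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def successive_power_correlations(distances, max_degree=5):
    """Correlation between dist^k and dist^(k+1) for k = 1 .. max_degree - 1."""
    powers = [[d ** k for d in distances] for k in range(1, max_degree + 1)]
    return [pearson(powers[k], powers[k + 1]) for k in range(max_degree - 1)]


grid = [float(d) for d in range(0, 4401, 10)]  # illustrative grid in km, not the sample
for k, r in enumerate(successive_power_correlations(grid), start=1):
    print(f"corr(dist^{k}, dist^{k + 1}) = {r:.3f}")
```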

In Regression 3, after adding the interactions 𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 and 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡, the distance coefficient is statistically significant at the 1% level with a larger absolute t-statistic, which suggests that the distance coefficient in Regression 1 may suffer from omitted variable bias.

The distance-win relationship, conditional on 𝑑𝑜𝑚𝑒, is plotted below:

GRAPH 4

The positive and significant coefficient on 𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 is consistent with a positive effect of playing indoors, especially after long-distance travel, because an indoor game lets the away team avoid unfamiliar weather conditions. This suggests that a significant part of the negative distance effect on team performance may be explained by weather conditions.
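A worked example makes the size of this offset concrete. Using the Regression 3 estimates in Table 2, the implied distance slope for dome games outside Denver is the sum of the 𝑑𝑖𝑠𝑡 and 𝑑𝑜𝑚𝑒 ∗ 𝑑𝑖𝑠𝑡 coefficients:

$$\left.\frac{\partial \widehat{\Pr}(\mathit{win}=1)}{\partial\,\mathit{dist}}\right|_{\mathit{dome}=1} = \hat\beta_{\mathit{dist}} + \hat\beta_{\mathit{dome}\times\mathit{dist}} = -0.0000371 + 0.0000446 = 0.0000075$$

That is roughly +0.75 percentage points per 1000 km, so in domed stadiums the estimated distance penalty essentially vanishes; a formal test of whether this sum differs from zero is not reported in Table 2.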

The insignificance of 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ ∗ 𝑑𝑖𝑠𝑡 suggests that playing at Sports Authority Field in Denver does not measurably change the effect of distance on the away team.

To address time zone changes and starting time, Regression 4 includes two dummies, 𝑡𝑧3𝑒𝑒𝑎𝑟𝑙𝑦 and 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡, and their interaction terms with 𝑑𝑖𝑠𝑡.

The distance-win relationship, conditional on 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡, is plotted below:

GRAPH 5

The results suggest that, when traveling west to east, playing a night game provides some advantage (positive coefficient on 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡), but the negative coefficient on 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 ∗ 𝑑𝑖𝑠𝑡 means that this advantage diminishes as 𝑑𝑖𝑠𝑡 increases.
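Using the Regression 4 estimates in Table 2, the contribution of the night-game terms at a given distance, and the distance at which it reaches zero, can be worked out directly:

$$\hat\beta_{12} + \hat\beta_{14}\,\mathit{dist} = 2.339 - 0.000599\,\mathit{dist}, \qquad 2.339 - 0.000599\,\mathit{dist}^{*} = 0 \;\Rightarrow\; \mathit{dist}^{*} = \frac{2.339}{0.000599} \approx 3905\ \text{km}$$

At the typical three-time-zone distance of about 3600 km the term is roughly +0.18; beyond about 3905 km it turns negative, consistent with the advantage shrinking as 𝑑𝑖𝑠𝑡 grows.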

Admittedly, the predicted win probability when 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 = 1 greatly exceeds 1 (100%) at short distances and thus may offer little reference value. Three factors help explain this: first, the use of the LPM, whose predictions, unlike those of a logit model, are not bounded between 0 and 1; second, the small sample size (12 observations across 2000 to 2003 with 𝑡𝑧3𝑒𝑛𝑖𝑔ℎ𝑡 = 1); and third, the absence of observations with small 𝑑𝑖𝑠𝑡, which leads to enormous standard errors as the curve approaches the Y-axis, as shown above. To satisfy the condition of crossing three time zones, the travel distance usually needs to be around 3600 km.

On the other hand, the model finds no statistically significant effect of an early afternoon game on game outcomes.

Regression 5 serves as a comparison by reintroducing the polynomial into the comprehensive model. Compared with Regression 2, the polynomial coefficients are slightly larger in magnitude and more significant (𝑑𝑖𝑠𝑡 through 𝑑𝑖𝑠𝑡⁴ are now significant at the 5% level). This indicates that even after controlling for 𝑑𝑜𝑚𝑒, 𝑚𝑖𝑙𝑒ℎ𝑖𝑔ℎ, time zone, and starting time variables, a nonlinear distance-win relationship may still exist.


VI. Extensions

Addressing whether travel distance affects team performance ultimately comes down to accounting for all aspects of home-field advantage. If important variables correlated with distance are left out of the model, the distance coefficient absorbs part of their effect; once they are included, we would expect the individual contribution of distance traveled to be more subdued. One such variable discussed in this paper is unfamiliar weather conditions. Adding this variable in some form would most likely yield a more reliable estimate of the effect of increased travel on team performance. One possible approach would be to use the U.S. Department of Energy’s climate zone map to assign a zone to each team, then code a dummy equal to one if the team plays a game outside its zone and zero otherwise. We would also need to account for the date of the game: games played in September, when it is still summer, involve fewer weather discrepancies than games played in late November and December.
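The proposed weather variable could be coded along the lines of the sketch below. Everything in it is hypothetical: the zone labels are placeholders rather than actual Department of Energy assignments for NFL cities, treating November through January as the weather-sensitive part of the season is an assumption drawn from the paragraph above, and switching the indicator off for domed stadiums follows the paper's reasoning that indoor games avoid unfamiliar weather.

```python
# Hypothetical climate-zone labels per team (placeholders, not actual DOE assignments).
TEAM_ZONE = {"MIA": "zone_hot", "NE": "zone_cold"}

LATE_SEASON_MONTHS = {11, 12, 1}  # assumption: late-season games carry larger weather gaps


def unfamiliar_weather(away: str, home: str) -> int:
    """1 if the away team plays outside its own climate zone, 0 otherwise."""
    return int(TEAM_ZONE[away] != TEAM_ZONE[home])


def weather_exposure(away: str, home: str, month: int, dome: bool) -> int:
    """Out-of-zone AND late-season AND outdoors: a candidate regressor or interaction."""
    return int(unfamiliar_weather(away, home) == 1
               and month in LATE_SEASON_MONTHS
               and not dome)
```

Under these placeholder labels, Miami visiting New England outdoors in December would be coded 1, while the same matchup in September would be coded 0.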

Additionally, there may be more reliable ways to control for team talent. One option is to use data from the video game Madden NFL, which rates each team’s offense and defense on a scale of 1 to 100. A complication would arise for seasons before 1995, however, as Madden team ratings are not available before then.


VII. Conclusion

Based on the analysis above, traveling longer distances seems to have a negative impact on team performance in the NFL, and the model we created confirms many of our prior suspicions. Some West Coast teams travel over twice as far as their East Coast counterparts and may therefore be at a clear disadvantage. Our model indicates that an away team’s predicted probability of winning decreases by about 3.5 percentage points for every additional 1000 km it travels to the opposing stadium.

Based on our model, one would conclude that the best way to create a fairer league would be to redraw the divisions to minimize travel distance. With the NFL growing year after year and pulling in billions of dollars in revenue, one might expect the league to act on even the smallest piece of data that helps predict game outcomes. In practice, this is not the case: an outside observer might wonder why the Dallas Cowboys are in the NFC East with a team like the New York Giants.

While our model points to this as a flawed setup, creating the fairest possible competition is not the NFL’s core pursuit. The NFL’s goal has been, and always will be, profit. The Dallas Cowboys vs. the New York Giants has always been one of the biggest rivalries in sports, and for reasons like that, the NFL thrives: it caters to fans and is rewarded with massive cash flows. The league has recently floated the idea of a London-based team. Our model helps predict the negative implications this would have for fair competition. Yet a London-based team would open another huge market of consumers and likely increase revenue for the league itself. Models such as ours can offer some insight into how a team will perform in a season, but unless the findings are striking enough to make fans themselves dissatisfied, these statistics will very likely remain just that: statistics.


VIII. Appendix

A. Distance Matrix, Using 2003 Stadium Locations

B. Pro Bowlers Per Team (2000-2003)


C. Time Zone
