PRODUCTION PROCESS IN THE NBA:
A FORMULA FOR A SUCCESSFUL TEAM
A THESIS
Presented to
The Faculty of the Department of Economics and Business
The Colorado College
In Partial Fulfillment of the Requirements for the Degree
Bachelor of Arts
By
Jigmei Dorji
May 2016
Economics
Abstract
Achieving success in the National Basketball Association is not only a priceless and
historic feat, but teams that have success in the playoffs and regular season also benefit
from financial bonuses. This paper estimates a production function for professional
basketball teams, and uses the results to determine significant areas of focus that are
positively and negatively associated with regular season win percentage. A Cobb-
Douglas production function and multi-variable Ordinary Least Squares regression
models are applied to data collected from the 2010-11 through 2014-15 seasons in the
National Basketball Association. The results are also applied to successful teams in the
playoffs in order to determine how regular season results translate to the playoffs. The
resulting estimates indicate that successful NBA teams over the last five seasons have
focused on shooting efficiently, keeping opponent shooting percentages low, rebounding,
forcing turnovers at a high rate, and building their teams through the draft.
KEYWORDS: Correlation, Econometrics, Multicollinearity, Multiple Variable Model,
Ordinary Least Squares, Regression, Cobb Douglas, Production Function, Production
Measurement, Sports
JEL CODES: C1, C3, D24, L83
ON MY HONOR, I HAVE NEITHER GIVEN NOR RECEIVED
UNAUTHORIZED AID ON THIS THESIS
Jigmei Dorji
Signature
TABLE OF CONTENTS
ABSTRACT
INTRODUCTION
Financial Incentive
Area of Focus
LITERATURE REVIEW
Cobb-Douglas Production Functions
Basketball Analytics
THEORETICAL FRAMEWORK
Graph 1: Output and Marginal Product of Input X
DATA AND METHODOLOGY
Recent Trends in Basketball Statistics
Methodology
REGRESSION RESULTS AND ANALYSIS
Table 1: Net Rating Regression Results
Graph 2: Net Rating vs. Winning Percentage
Table 2: Play Style Regression Results
Table 2.1: Offensive Rating Regression Results
Table 2.1a: Field Goal Percentage vs. Assists
Table 2.2: Defensive Rating Regression Results
Table 3: Fixed Team Effects Regression Results
Table 3.1: Summarization of Win Percentage by Conference
Table 4: Play Style and Fixed Team Effects Regression Results
Table 4.1: Play Style and Fixed Team Effects Beta Coefficients
Table 4.2: Summarization of Inputs and Output
Table 4.3: Play Style and Fixed Team Effects VIFs
CONCLUSION
REFERENCES
Introduction
Every year from October to June for 82 games (with more successful teams
playing close to 100 including the playoffs), 30 teams fight for supremacy in the National
Basketball Association (NBA). But only one team can call itself the champion of
the league at the end of the season. Over the last five seasons, four different franchises
have claimed the Larry O’Brien trophy – the Dallas Mavericks, the Miami Heat (twice),
the San Antonio Spurs, and most recently, the Golden State Warriors. The current decade
has been relatively balanced compared to historical trends. Only ten different teams
have won a title since 1975, which suggests that the league has been
enjoying a spell of competitive balance over the last five seasons.
Led by eventual Most Valuable Player (MVP) Stephen Curry, the 2014-15
Warriors were able to jump out to an early lead in the standings despite playing in an
incredibly competitive Western Conference. Golden State finished the season with a
league-leading record of 67 wins and 15 losses and defeated the Cleveland Cavaliers in
the NBA Finals for their first championship since 1975. Although the supremacy that
Golden State exhibited during the 2015 season is now indisputable, did the Warriors
exhibit distinct types of advantages that allowed them to dominate over the rest of the
NBA? What types of effects did the style of play have on the winning percentage of the
team? How did the personnel affect the record of the team? Did the fixed effects that the
franchise has implemented off the court have an impact on the record on the floor?
Financial Incentive
Winning a championship is the unmistakable goal of any NBA franchise and the
players involved. Not only is claiming the Larry O’Brien trophy and hanging up the title
banner priceless and intangibly significant, but teams that have success in the playoffs
benefit from monetary bonuses as well. After winning the 2014 title, the San Antonio
Spurs were awarded over $2 million in bonuses from the NBA, not including the trophy
and championship rings, while the runner-up Miami Heat were also awarded $1.5
million.
The valuations of franchises also increase significantly after winning the
championship. According to Forbes, recent NBA champions have gained an average of
30% in team value after raising the trophy. Teams also raise ticket prices significantly
during their Finals run. In 2015, the average Finals ticket prices in Golden State ran over
$1,200 while the average ticket prices in Cleveland were over $1,300. Compared to the
regular season, when the Warriors charged an average ticket price of $327 and the Cavs
charged an average price of $258, merely playing in the Finals has provided a tangible
financial benefit. In addition, teams that make the playoffs are awarded nearly $200,000
as a bonus, while teams that reach the conference finals are awarded over $380,000. The
team that finishes the regular season with the best record in the NBA and in each
conference is also awarded well over $300,000 as a bonus.
Area of Focus
This study focuses on the past five seasons in the NBA. In particular,
I plan to observe the characteristics of teams since the 2010-2011 season and determine
the inputs that have significantly affected regular season success in the past five
years of NBA basketball. Given the makeup of the last four title-winning teams, I expect
that accurate three-point shooting, assisted field goals, and three-point defense
are indicators of successful team basketball in the NBA. These team
characteristics are related to the style of play that teams consciously employ, but are also
a result of the type of players that each team has available. As for fixed team effects, I
expect that building through the draft and free agency will have a positive
relationship with winning percentage.
I plan to use an extensive list of independent variables in the model, including
variables measuring the output of the team, statistics measuring the style of play the team
utilizes (such as percentage of total shots from three-point range), and additional
variables measuring fixed team effects (such as average attendance and the conference of
the team). After running several regressions and determining significant variables
through the model, the models should be able to describe successful teams from the past
five years through style of play as well as the areas of focus that the team exhibits
through their statistics. After determining the significant inputs in the model, the results
should also be able to predict characteristics of successful teams in the future.
Literature Review
The article “Who is ‘Most Valuable’? Measuring the Player’s Production of Wins
in the National Basketball Association” by David Berri, published in Managerial and
Decision Economics (Berri, 1999), focused on linking individual performance to team wins
in the NBA. Berri found that although having the MVP certainly helps to produce wins, having
multiple efficient and productive teammates – particularly in the Playoffs – was the key
factor in the 1997-1998 season. The model that Berri used to determine each player’s
production of wins was:
Production of wins = (PM + TF + TDF – PA + TA) * total mins (2.1)
Berri first calculated each of the inputs individually, and then combined the factors into
the equation above. His inputs were per-minute player production (PM), per-minute team
tempo factor (TF), per-minute team defensive factor (TDF), average per-minute
production at position (PA), and average player’s per-minute production (TA). The
results that the model produced indicated that one dominant player per team is not
enough to have success against Playoff competition.
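As a minimal sketch, equation (2.1) can be written as a small Python function. The function name and the per-minute values below are hypothetical and are not taken from Berri (1999); they only illustrate how the five per-minute terms are combined and then scaled by minutes played.

```python
def production_of_wins(pm, tf, tdf, pa, ta, total_minutes):
    """Equation (2.1): combine the per-minute terms, then scale by minutes played."""
    return (pm + tf + tdf - pa + ta) * total_minutes


# Hypothetical per-minute values for a single player (illustrative only).
wins = production_of_wins(
    pm=0.0050,   # per-minute player production
    tf=0.0002,   # per-minute team tempo factor
    tdf=0.0003,  # per-minute team defensive factor
    pa=0.0040,   # average per-minute production at the player's position
    ta=0.0020,   # average player's per-minute production
    total_minutes=3000,
)
print(round(wins, 2))
```

Because every term is measured per minute, a player's estimated wins scale directly with playing time.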
Fiona Carmichael, Dennis Thomas, and Robert Ward’s article “Team
Performance: The Case of English Premiership Football” (Carmichael, Thomas, & Ward,
2000) utilized a linear production function where the individual match results were
determined by various input variables. However, the types of independent variables that the
authors used differ slightly from those in the other studies used as background
for this thesis. Team performance is still the
variable utilized as output in this scenario, but statistics such as difference in shots on
target, difference in percentage of all successful passes, difference in number of red
cards, difference in clearances, blocks, and interceptions, and the difference in
cumulative team goal differences before the game in question, as well as a number of
other statistics were categorized as inputs for the production function. The results found
that player skills such as accurate and efficient shooting and passing, as well as defensive
skills such as tackles, clearances, and blocks were all significant independent variables in
determining team performance in the Premiership.
José M. Sánchez Santos, Pablo Castellanos García, and Jesus A. Dopico Castro
used data from the Spanish league in their article “The Production Process in Basketball:
Empirical Evidence from the Spanish League” (Santos, Garcia, & Castro, 2006). The
authors found that factors such as home-court advantage, field goal and free throw
percentage, keeping turnovers and fouls in check, and defensive rebounds had the highest
marginal effects on the probability of winning a particular game. The authors used two
different models to estimate the probability of winning a game in the Spanish ACB
League. Their first model considered the statistics of the home team versus the away
team in each game in relative terms, finding that the home team won in nearly 62% of the
observed games and that the means for the home team in shooting percentage, assists, and
total rebounds were higher than those for the visiting team. Their second model specifically
analyzed the influence of home-court advantage on the probability of winning, and used
separate variables for the home team and the visitor. The second model produced
significant results similar to those of the first model.
Nate Silver of ESPN’s FiveThirtyEight utilized analytic methods in his article
“Every NBA Team’s Chance of Winning A Title by 2019” (Silver, 2014), using current
performance of the team, average age of the team, and the talent level of the best player
on the team to estimate which team had the brightest outlook in the near future. Silver
found that the Golden State Warriors, Los Angeles Clippers, and the Cleveland Cavaliers
were the three teams with the highest probability to win at least one championship by
2019, based on the three factors listed above. When calculating the average age of the
team as well as measuring the talent level of the best player on the team, Silver used
projected wins added – a statistic based on a combination of Win Shares and Player
Efficiency Rating (PER) – to weight the team’s average age by performance to help
determine the relative age of the best players on each team, and to determine how many
projected wins each team’s best player accounted for.
Cobb-Douglas Production Functions
Thomas A. Zak, Cliff J. Huang and John J. Siegfried used a production function
in their article “Production Efficiency: The Case of Professional Basketball” (Zak,
Huang, & Siegfried, 1979). Using a Cobb-Douglas production function and data for
individual games during the 1976-77 NBA season, the authors formulated a production
frontier and estimated the impact of various inputs used in the production process. The
variables included in the model were ratio of field goal percentages, ratio of free throw
percentages, ratio of offensive and defensive rebounds, ratio of assists, ratio of personal
fouls, ratio of steals and blocks, ratio of turnovers, and a binary dummy variable for
location (home versus away games), while using the ratio of the final scores as the
dependent variable. The empirical results from their production function found that the
output was most responsive to field-goal percentage, free-throw percentage, and
rebounding. Other variables significantly affecting output were turnovers and personal
fouls. Based on their results, teams playing at home held an observed advantage over
their visiting opponents.
“An Empirical Estimation of a Production Function: The Case of Major League
Baseball” by Charles E. Zech (Zech, 1981) also uses a Cobb-Douglas production function
in order to estimate production of victories by a team in Major League Baseball. Zech’s
model used the major skills involved in baseball, such as batting average, home runs,
stolen bases, ratio of strikeouts to walks, total fielding chances, as well as years with the
same manager and manager win percentage, to describe team success in Major League
Baseball. Based on the results, hitting for average is by far the most important factor
contributing to team success, which contradicts the conventional wisdom that pitching is
the most important factor in baseball. The author then used the results to measure the
most valuable player (MVP) in the American League and National League in the MLB in
a particular year by empirically determining the value that each player brings to the team.
Zech used each player’s marginal product, calculated by computing each team’s batting
average, home runs, etc. without each player and using the values in the production
function to determine the number of expected victories the team would have
accomplished without the player. Using the difference between the two values as the
player’s marginal product, the model was able to determine which player added the most
wins to each team in the 1977 season.
Basketball Analytics
Basketball Analytics by Stephen Shea and Christopher Baker (Shea & Baker,
2013) provided insight into the rapidly growing world of basketball analytics and was
used as background research and knowledge for this thesis. Shea and Baker used
traditional statistics to create new stats that attempt to measure players and teams in new
and more effective ways. A notable statistic introduced in Basketball Analytics is
Offensive Efficiency (OE), measured as:
OE = (FG + A) / (FGA – ORB + A + TO). (2.2)
Offensive Efficiency is defined as a percentage variable because the formula produces a
higher result when made field goals, assists, and offensive rebounds are higher, while
missed field goals and turnovers bring the resulting value lower. Offensive Efficiency, as
well as total points and total assists, is used to create Efficient Offensive Production
(EOP). EOP is used to describe total offensive production but also accounts for efficiency
of the player or team.
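To make equation (2.2) concrete, the following is a minimal Python sketch of Offensive Efficiency. The function name and the per-game team line are hypothetical values chosen for illustration, not figures from Shea and Baker (2013).

```python
def offensive_efficiency(fg, ast, fga, orb, tov):
    """Equation (2.2): OE = (FG + A) / (FGA - ORB + A + TO)."""
    denominator = fga - orb + ast + tov
    if denominator <= 0:
        raise ValueError("FGA - ORB + A + TO must be positive")
    return (fg + ast) / denominator


# Hypothetical per-game team line (illustrative only).
baseline = offensive_efficiency(fg=38, ast=22, fga=84, orb=11, tov=14)

# Holding everything else fixed, extra turnovers lower OE and
# extra offensive rebounds raise it.
more_turnovers = offensive_efficiency(fg=38, ast=22, fga=84, orb=11, tov=18)
more_rebounds = offensive_efficiency(fg=38, ast=22, fga=84, orb=15, tov=14)

print(round(baseline, 3), round(more_turnovers, 3), round(more_rebounds, 3))
```

The comparison mirrors the description in the text: turnovers enter only the denominator, while offensive rebounds are subtracted from it.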
Shea and Baker also introduced a defensive statistic that “accounts for defensive
contributions beyond blocks or steals” (Shea and Baker, 2013) called Defensive Stops
Gained (DSG). Statistics such as DSG are valuable because defensive
statistics have traditionally been lacking compared to offensive statistics. DSG is
measured using net effective field goal percentage, net offensive rebound percentage,
and net turnover percentage, and includes several positive per-game constants.
However, the most important statistic Shea and Baker introduced is Approximate
Value (AV). AV is calculated by adding together Defensive Points Saved (or Defensive
Stops Gained * 2) and Efficient Offensive Production in order to describe total
contribution to the team from each player. The resulting statistic is comparable to Player
Efficiency Rating (PER) and Wins Produced (WP) as the most complete measurement of
a player’s total performance.
Stephen Shea’s Basketball Analytics: Spatial Tracking (Shea, 2014) was also used
in this thesis as background knowledge and research. Using the new spatial player
tracking data collected by SportVU, Shea expands on his previous work and describes
new ways to measure performance. Shea shows that the most efficient regions on the
floor to shoot from are the corner three and the restricted area near the basket through
effective field goal percentage, and proves that catch and shoot attempts are far more
efficient than pull-up attempts. Shea was able to predict a team’s effective field goal
percentage and overall offensive efficiency through utilization of drives and catch and
shoot corner threes, indicating that drives and kick-outs to corner threes combined have a
positive effect on offensive efficiency (Shea, 2014).
Shea was also able to quantify the spacing in an offense and the stretch of a
defense, and showed the effects that both have in game situations. Using the Miami Heat
and San Antonio Spurs of 2013 and 2014 (both teams made the NBA Finals both
seasons) as examples, spacing was shown to be beneficial on the offensive end, as long as
efficient shooters were on the floor to draw the defense away from the basket. On the
other hand, stretching the defense proved to be disastrous for the defensive team, as
drawing defenders further from the rim allows the offense more space to perform drives
to the restricted area and other actions detrimental to the integrity of the defense.
Theoretical Framework
In this paper, the theoretical background is focused on the production frontier and
the corresponding production function. In terms of an economic or production theory
view, a basketball team can be compared to a competitive firm. Each team has a different
vision of what makes a successful team, and in this case, production can be seen as winning
percentage while the various statistics and variables take the place of the traditional
production inputs. The form of the function is the Cobb-Douglas production function:
$$Q = A X_1^{a} X_2^{b} \cdots X_n^{x} \qquad (3.1)$$
where n is the number of variables involved in the production function, A is a positive
constant, and a, b, and x are the exponents of the function. The Cobb-Douglas form has
several advantages for this type of study, especially since the exponents give relevant
information concerning returns to scale. To see this, suppose $Q_1 = A X_1^{a} X_2^{b}$
and the firm doubles the amount of both inputs. The enterprise then produces
$Q_2 = A (2X_1)^{a} (2X_2)^{b} = 2^{a+b} A X_1^{a} X_2^{b}$. Thus, output increases by a
factor of $Q_2/Q_1 = (2^{a+b} A X_1^{a} X_2^{b}) / (A X_1^{a} X_2^{b}) = 2^{a+b}$.
If a + b > 1, the firm (or team, in this case) is
experiencing increasing returns to scale. If the sum of the exponents is equal to 1,
the team is exhibiting constant returns to scale. Finally, if a + b < 1, the team is
experiencing decreasing returns to scale.
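The doubling argument can be checked numerically. The sketch below is illustrative only: the constant, input levels, and exponents are hypothetical, and the helper names are not part of the thesis.

```python
def cobb_douglas(a_const, inputs, exponents):
    """Q = A * X1^a * X2^b * ... for matching lists of inputs and exponents."""
    q = a_const
    for x, e in zip(inputs, exponents):
        q *= x ** e
    return q


def returns_to_scale(exponents, tol=1e-9):
    """Classify returns to scale from the sum of the exponents."""
    total = sum(exponents)
    if total > 1 + tol:
        return "increasing"
    if total < 1 - tol:
        return "decreasing"
    return "constant"


# Hypothetical two-input example with a = 0.6 and b = 0.7.
exponents = [0.6, 0.7]
q1 = cobb_douglas(1.5, [4.0, 9.0], exponents)
q2 = cobb_douglas(1.5, [8.0, 18.0], exponents)  # both inputs doubled
ratio = q2 / q1                                 # should equal 2 ** (a + b)

print(ratio, 2 ** sum(exponents), returns_to_scale(exponents))
```

The ratio depends only on the sum of the exponents, not on the constant A or the starting input levels.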
Graph 1: Output and Marginal Product of Input X
[Graph: two panels, Output (Q) plotted against Input X and Marginal Product of Input X plotted against Input X]
Another appealing property of the Cobb-Douglas form is the marginal products.
In the Cobb-Douglas production function form, the marginal products of each input
depend on the levels of other inputs. If the function is still $Q_1 = A X_1^{a} X_2^{b}$, the marginal
product of $X_1$ would be:
$$MP_{X_1} = \partial Q / \partial X_1 = a A X_1^{a-1} X_2^{b}, \qquad (3.2)$$
while the marginal product of $X_2$ would be equivalent to
$$MP_{X_2} = \partial Q / \partial X_2 = b A X_1^{a} X_2^{b-1}. \qquad (3.3)$$
Once the partial derivatives are written out, we can see that the marginal product of
one input depends not only on the level of that input
but also on the values of the other inputs. This is important for this particular study
because in basketball, the number of shots taken and subsequent field goals made by a
team depends on the number of opportunities to possess the ball through rebounds,
forcing turnovers, etc.
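A short numerical check illustrates this dependence. The sketch assumes a two-input Cobb-Douglas function with hypothetical parameter values (A = 2, a = 0.4, b = 0.5) and compares the power-rule derivative with respect to the first input against a finite-difference approximation.

```python
A, a, b = 2.0, 0.4, 0.5  # hypothetical parameters, for illustration only


def output(x1, x2):
    """Two-input Cobb-Douglas output Q = A * X1^a * X2^b."""
    return A * x1 ** a * x2 ** b


def marginal_product_x1(x1, x2):
    """Analytic marginal product of X1: a * A * X1^(a-1) * X2^b."""
    return a * A * x1 ** (a - 1) * x2 ** b


def numeric_marginal_product_x1(x1, x2, h=1e-6):
    """Central finite-difference approximation of dQ/dX1."""
    return (output(x1 + h, x2) - output(x1 - h, x2)) / (2 * h)


analytic = marginal_product_x1(10.0, 5.0)
numeric = numeric_marginal_product_x1(10.0, 5.0)

# The same level of X1 has a larger marginal product when X2 is larger.
mp_low_x2 = marginal_product_x1(10.0, 5.0)
mp_high_x2 = marginal_product_x1(10.0, 20.0)

print(analytic, numeric, mp_low_x2, mp_high_x2)
```

In basketball terms, an extra unit of one input (for example, shot attempts) is worth more when complementary inputs (for example, possessions gained through rebounds) are plentiful.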
The Cobb-Douglas production function also captures elasticity in a convenient
manner. Elasticity is defined as the percentage change in one variable in response to a
given percentage change in another variable while holding all other relevant variables
constant. In this particular form, the elasticity of output with respect to each input is
simply that input’s exponent. For example, in a traditional Cobb-Douglas production function
$Q = A X_1^{a} X_2^{b}$, if exponent a = 0.2, a 1% increase in $X_1$ would lead to approximately a
0.2% increase in output Q. Finally, the Cobb-Douglas form is a widely used specification
when dealing with production functions. This makes the Cobb-Douglas form familiar to
many and therefore relatively simple to interpret. For this model, the variables not
already expressed as percentages are transformed by the natural log in order to undo
the exponentiation of the Cobb-Douglas production function. This process allows the
exponents to be estimated as regression coefficients, and also permits the coefficients to
be interpreted as the elasticity for each respective input. Taking the natural log also
makes the function linear rather than multiplicative in its inputs.
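The log transformation can be sketched with a small ordinary least squares fit. This is only an illustration under stated assumptions: the data are synthetic and noiseless, generated from a hypothetical function with A = 3, a = 0.2, and b = 0.7, and the two-regressor OLS is solved by hand rather than with the software used in the thesis.

```python
import math


def fit_log_linear(x1, x2, q):
    """OLS of ln(q) on ln(x1) and ln(x2) with an intercept; returns (ln_A, a, b)."""
    l1 = [math.log(v) for v in x1]
    l2 = [math.log(v) for v in x2]
    lq = [math.log(v) for v in q]
    n = len(lq)
    m1, m2, mq = sum(l1) / n, sum(l2) / n, sum(lq) / n
    d1 = [v - m1 for v in l1]
    d2 = [v - m2 for v in l2]
    dq = [v - mq for v in lq]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1q = sum(u * v for u, v in zip(d1, dq))
    s2q = sum(u * v for u, v in zip(d2, dq))
    det = s11 * s22 - s12 * s12  # zero only if the logged inputs are collinear
    slope_a = (s1q * s22 - s2q * s12) / det
    slope_b = (s2q * s11 - s1q * s12) / det
    intercept = mq - slope_a * m1 - slope_b * m2
    return intercept, slope_a, slope_b


# Synthetic data from Q = 3 * X1^0.2 * X2^0.7 (hypothetical values, no noise).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
q = [3.0 * u ** 0.2 * v ** 0.7 for u, v in zip(x1, x2)]

ln_a_hat, a_hat, b_hat = fit_log_linear(x1, x2, q)

# Elasticity check: a 1% rise in X1 changes Q by roughly a_hat percent.
pct_change_q = (1.01 ** a_hat - 1) * 100

print(a_hat, b_hat, math.exp(ln_a_hat), pct_change_q)
```

With real data the fit would not be exact, but the interpretation is the same: each slope on a logged input is the estimated elasticity of output with respect to that input.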
Data and Methodology
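For this project in particular, the functional form of the model will be:
$$
\begin{aligned}
Q = X \,&(\mathrm{OR}^{a})(\mathrm{3P\%}^{b})(\mathrm{3PAPG}^{c})(\mathrm{PITPPG}^{d})(\mathrm{RPG}^{e})(\mathrm{APG}^{f})(\mathrm{FBPPG}^{g})(\mathrm{TOVPG}^{h})(\mathrm{FT\%}^{i}) \\
&(\mathrm{FTAPG}^{j})(\mathrm{FG\%}^{k})(\mathrm{\%TS3}^{l})(\mathrm{\%3FGMA}^{m})(\mathrm{C3\%}^{n})(\mathrm{DR}^{o})(\mathrm{OFG\%}^{p})(\mathrm{O3P\%}^{q})(\mathrm{OTOVPG}^{r}) \\
&(\mathrm{STLPG}^{s})(\mathrm{NR}^{t})(\mathrm{DRAFT}^{u})(\mathrm{TRADE}^{v})(\mathrm{FA}^{w})(\mathrm{FANS}^{x})(\mathrm{AGE}^{y})(\mathrm{CONF}^{z})\, U, \qquad (4.1)
\end{aligned}
$$
where: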
Q = regular season win percentage,
OR = offensive rating, or number of points scored per 100 possessions,
3P% = 3-point percentage,
3PAPG = 3-point attempts per game,
PITPPG = points in the paint per game,
RPG = rebounds per game,
APG = assists per game,
FBPPG = fast break points per game,
TOVPG = turnovers per game,
FT% = free-throw percentage,
FTAPG = free-throw attempts per game,
FG% = field-goal percentage,
%TS3 = percent of total shots from 3-point territory,
%3FGMA = percent of 3-point field goals made assisted by a teammate,
C3% = corner 3-point percentage,
DR = defensive rating, or number of points allowed per 100 possessions,
OFG% = opponent’s field-goal percentage,
O3P% = opponent’s 3-point percentage,
OTOVPG = opponent’s turnovers per game,
STLPG = steals per game,
NR = net rating, or the difference in offensive and defensive rating, or OR – DR,
DRAFT = number of players acquired through the draft or draft rights trade,
TRADE = number of players acquired through trade,
FA = number of players acquired through free agency,
FANS = average attendance per game,
AGE = average age of the team,
CONF = dummy variable describing the conference that each team plays in,
and X as a positive constant and U as the error term.
An extensive list of traditional, advanced, and shooting statistics for individual
players, teams, and opponents are tabulated and recorded by the NBA. The database on
NBA.com, as well as the data located on basketball-reference.com and
basketball.realgm.com, provided the data for this project. Data was gathered on the NBA
regular seasons from the 2010-11 season through the 2014-15 season, using per game
averages for the majority of statistics and percentages for shooting statistics.
In the model, I decided to use regular season win percentage as the indicator of
team success, or the output in the production function. Using win percentage rather than
other methods of capturing team success, such as the ratio of final scores or absolute
score differences, is important because it describes the success of the team over the
course of the entire season while showing the consistency of the team. Using win
percentage also accounts for a team’s playing style and does not differentiate between
high and low scoring teams.
Offensive, defensive, and net ratings are important statistics to tabulate, as they
also account for a team’s playing style. Offensive rating is the number of
points scored per 100 possessions and defensive rating is the number of points allowed
per 100 possessions, so by definition neither differentiates between fast- and slow-paced
teams. Each rating is a measurement of efficiency, and ESPN’s John Hollinger refers to them
as offensive and defensive efficiency. Net rating is the difference
between offensive and defensive rating, and measures a team’s point
differential per 100 possessions. Over the 150 observations (30 teams over 5 seasons), the
minimum net rating was -15.5, posted by the historically bad Charlotte
Bobcats in the lockout-shortened 2012 season. The highest net rating over the last five
seasons was 11.4, posted in the 2015 season by the defending champion Golden
State Warriors.
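The pace-neutral nature of these ratings can be illustrated with a short sketch. The two teams and their per-game numbers are hypothetical, and the sketch assumes for simplicity that a team and its opponents use the same number of possessions.

```python
def rating(points, possessions):
    """Points per 100 possessions (offensive or defensive rating)."""
    return 100.0 * points / possessions


def net_rating(points_for, points_against, possessions):
    """Net rating: offensive rating minus defensive rating (OR - DR)."""
    return rating(points_for, possessions) - rating(points_against, possessions)


# Hypothetical per-game averages: a fast-paced and a slow-paced team.
fast_team = {"points_for": 110.0, "points_against": 105.0, "possessions": 100.0}
slow_team = {"points_for": 99.0, "points_against": 94.5, "possessions": 90.0}

fast_net = net_rating(**fast_team)
slow_net = net_rating(**slow_team)

print(fast_net, slow_net)
```

The fast team scores 11 more points per game than the slow team, yet both have the same offensive, defensive, and net ratings, which is why the ratings are treated as measures of efficiency rather than volume.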
Teams in the NBA have increasingly utilized the 3-point shot over the last five
seasons, as the average 3-point attempts per team have gone up in each successive year
from 18 attempts per game in 2011 to 22.4 per game in 2015. However, teams have been
shooting relatively similar percentages over the last five seasons, as league-wide averages
have stayed steady around 35% from beyond the 3-point line. As previously mentioned, I
expect that shooting an above-average 3-point percentage, shooting a high volume of 3-
point shots, and keeping opposing 3-point percentage down are all positive descriptors of
successful team performance. However, when running the model through the regression,
the coefficient estimate for 3-point percentage is expected to be the elasticity of non-
corner 3-point percentage, as corner 3-point percentage is also included as an input.
3-point percentage from the corner is also expected to be an important factor in
determining win percentage, as the corner three is the shortest 3-point shot and therefore
the most efficient shot from 3-point range. Corner threes are 22 feet from the rim, while
3-point shots above the break around the rest of the arc are 23 feet, 9 inches. Compared to
the league-wide average 3-point percentage of 35%, the average corner 3-point
percentage over the last five years has been 38.5%.
Assisted 3-point shots are also expected to be an indicator of successful team
play, as a high number of assisted shots tend to lead to either catch-and-shoot 3-point
shots, open attempts, or both, and generally lead to higher percentage shots. The average
percent of 3-point shots that have been assisted by teammates over the last five years is
about 85%.
The mean for points in the paint has stayed steady over the last five seasons at
just over 41 points per game. Even so, points in the paint are a strong indicator of a
successful offense, as shots close to the basket are the most efficient and effective shots an
offense can generate on a consistent basis.
Rebounds and assists per game are traditional statistics and historically strong
indicators of team success. Offensive rebounds give teams extra possessions, while
rebounds on the defensive end help to end possessions and start offense. Assists are good
signs of ball movement on the offensive side and tend to lead to higher-percentage shots
for teammates. The average number of rebounds per game per team has increased slightly
throughout the five successive seasons in question from 41.4 to 43.3, indicating a slight
increase in pace – or possessions per game – during the same timespan. Meanwhile,
assists per game have had slight peaks and valleys over the last half-decade but have
remained steady at just under 22 assists per game.
Turnovers, opponent’s turnovers, steals, and fast-break points per game are
closely related and are very important in determining easy shots for teammates and for
the opposition as well. Steals are closely correlated with forced turnovers, as every
steal constitutes a turnover, but forced turnovers are the broader category because they
also include dead-ball turnovers. Steals, however, force a live-ball turnover and tend to
lead to fast-break points in transition. I anticipate that forcing turnovers, steals, and
fast-break points have a positive impact on win percentage. On the other hand, a team’s own
turnovers give the same advantage to the opposition and should have a negative impact on
team success.
I anticipate that free throw percentage and volume of free throw attempts are
positive indicators of team success, as drawing fouls places the opposition in foul trouble.
The average free-throw percentage over the last five seasons has been about 75%.
A team that shot two free throws on each of 100 possessions at that rate would score
150 points, an offensive rating of 150, which would be the highest offensive rating in history.
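The arithmetic behind the figure of 150 can be spelled out in a few lines. The sketch assumes two free-throw attempts per possession, which is an assumption added here for illustration; the function name is hypothetical.

```python
def free_throw_only_rating(ft_pct, attempts_per_possession=2):
    """Points per 100 possessions if every possession ends in free throws.

    Assumes a fixed number of attempts per possession (two by default),
    each worth one point when made.
    """
    return 100 * attempts_per_possession * ft_pct


league_average_rating = free_throw_only_rating(0.75)
print(league_average_rating)
```

Under the same assumption, any free-throw percentage above 50% yields more than one point per possession, which is why drawing fouls is treated as a high-value outcome.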
Field-goal percentage is expected to have a positive impact on win percentage, and
opponent’s field-goal percentage a negative one. In a
hypothetical situation when all other variables are equal, a team that shoots a higher
percentage from the field than its opposition has a tangible advantage.
Although efficiency statistics such as true shooting percentage and effective field
goal percentage can be useful as well, the results can be skewed when using such
statistics. True shooting percentage and effective field goal percentage are composed of
other statistics (such as field goals, free throw percentage, field goals attempted, and three
point shots made) and when combined with the original statistics in the ordinary least
squares regression model, can bias the results. These types of statistics also do not
differentiate between teams that place focus on scoring in the paint against teams that
shoot a high volume of outside shots. For example, if Team A shot 40 of 100 while
making 20 three-point shots and Team B shot 50 of 100 with zero three-pointers made,
both teams would end up with 100 points from the same amount of attempts. Although
this scenario is highly improbable, similar situations can occur and the results have no
way of telling us the style of play that the team employs.
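The Team A and Team B comparison can be verified directly. The sketch uses the standard definition of effective field goal percentage, (FG + 0.5 × 3PM) / FGA, and the shot totals from the example in the text; the function names are illustrative.

```python
def points_from_field(fg_made, three_made):
    """Points from field goals: 2 per two-pointer, 3 per three-pointer."""
    two_made = fg_made - three_made
    return 2 * two_made + 3 * three_made


def effective_fg_pct(fg_made, three_made, fg_attempted):
    """Effective field goal percentage: (FG + 0.5 * 3PM) / FGA."""
    return (fg_made + 0.5 * three_made) / fg_attempted


# Team A: 40 of 100 with 20 threes. Team B: 50 of 100 with no threes.
team_a = {"fg_made": 40, "three_made": 20, "fg_attempted": 100}
team_b = {"fg_made": 50, "three_made": 0, "fg_attempted": 100}

points_a = points_from_field(team_a["fg_made"], team_a["three_made"])
points_b = points_from_field(team_b["fg_made"], team_b["three_made"])
efg_a = effective_fg_pct(**team_a)
efg_b = effective_fg_pct(**team_b)
fg_pct_a = team_a["fg_made"] / team_a["fg_attempted"]
fg_pct_b = team_b["fg_made"] / team_b["fg_attempted"]

print(points_a, points_b, efg_a, efg_b, fg_pct_a, fg_pct_b)
```

Both teams score 100 points and post the same effective field goal percentage, even though their raw field-goal percentages and shot selection differ, so the combined statistic cannot distinguish the two styles of play.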
The average attendance per game should have a positive relationship with regular
season winning percentage. However, this may not be a factor of causation but rather of
correlation, as teams that win tend to draw a larger crowd. On the other hand, teams
playing at home have historically held an advantage over their visiting opponents due to
“superior performance by the home team and not preferential treatment by officials”
(Zak, Huang, & Siegfried, 1979). Chicago has led the league in attendance in all of the
last five years at an average of nearly 22,000 per game and has ranked fourth in the
league in win percentage with an average of over 65% over the same span.
The average age is expected to have a positive relationship with success until a
certain point – around 29 – and then is expected to diminish as players age. The
championship teams over the last five seasons have had varying mixes of players
regarding roster composition. The LeBron James-led Miami Heat from 2011 to 2014
were built primarily through free agency, as the Heat had the most free agents in the
league when building rosters from 2011-2013 and ranked second and third in 2014 and
2015, respectively. On the other hand, the San Antonio Spurs have maintained a
consistent core of players acquired on draft day, ranking in the top 10 in the league in the
last five years.
Recent Trends in Basketball Statistics
The recent partnership between the NBA and SportVU has dramatically expanded
the range of statistics available to the public, as the system provides new precise data that
would not be possible to gather without the use of SportVU camera technology and
tracking software. As a result, basketball is experiencing a renaissance of sorts with data
and statistics. Prior to player tracking, capturing the dynamic movements within a
basketball game was nearly impossible due to the fluidity and complexity of actions that
take place on the floor. The camera technology that SportVU has now implemented in
every NBA arena follows the ball and every player on the court, providing real-time
player and ball positioning and utilizing advanced statistical algorithms to derive
previously unavailable statistics.
Before the NBA’s partnership with SportVU, the available statistics were particularly
lacking on the defensive side of the game, as steals and blocks have
historically been poor or neutral indicators of defensive ability. Steals and
blocks are still important to record, as they effectively end an opponent’s offensive
possession and can potentially generate transition offense. However, placing too much
importance on them can incentivize players to gamble rather than prevent their man from
getting to the basket or box out for defensive rebounds. As the main objective on the
defensive side of the ball is to prevent the opposition from getting open shots that lead to
made baskets, the player tracking system from SportVU now categorizes defensive
presence with statistics such as opponent’s contested field goal percentage and rim
protection, as well as keeping up with traditional statistics.
The NBA’s partnership with SportVU has revolutionized statistics for basketball,
as many new figures are now available to teams as well as the public. However, player
tracking data is only available from the 2013-2014 NBA season onward, as the SportVU
camera technology was only implemented into all NBA arenas in 2013. Because this project
uses data from the 2010-11 season through the 2014-15 season, it
unfortunately cannot include player tracking statistics. However, future research can be
conducted with this new technology, as similar ideas can be used with player tracking
data to discover new areas of importance that have not been previously categorized.
Methodology
There are four models in place within the larger construct of this project. All four
models have regular season winning percentage from 2010-11 to 2014-15 as the
dependent variable. The first model is purely used as a reference to the succeeding
models, and consists of net rating as the only independent variable. The second model
will be constrained to the play style statistics, while the third model will consist of the
fixed team effects. The final model will tie both play style and team effects together in
order to gain a picture of a successful franchise, on and off the court.
The final three models will be run through OLS regression several times, using F-
tests in order to determine the significance of the independent variables. Multicollinearity
tests will also be performed, as the large set of interrelated variables suggests that
collinearity is potentially present within the inputs. Although multicollinearity
does not violate OLS assumptions, acknowledging it where present is important, as it can
inflate the standard errors of the coefficient estimates and make the individual
estimates unstable.
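As a minimal sketch of the estimation step described above, the following fits an OLS
regression with an intercept and reports the R-squared value. It assumes NumPy is
available. The data are synthetic and noise-free, generated only to show that the
routine recovers known coefficients; they are not the thesis dataset, and the function
name is illustrative.

```python
import numpy as np


def fit_ols(X, y):
    """Return (coefficients with the intercept first, R-squared) for y ~ 1 + X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend the constant term
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)           # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical data: 150 observations, two inputs, no noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 2))
y_demo = 0.5 + 0.03 * X_demo[:, 0] - 0.02 * X_demo[:, 1]

coef_demo, r2_demo = fit_ols(X_demo, y_demo)
print(coef_demo, r2_demo)
```

A statistical package would additionally report standard errors, t-scores, and the
F-test, which this sketch omits.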
Regression Results and Analysis
After the natural logarithm is used to transform the variables that are not already
expressed as percentages (such as rebounds per game, points in the paint, average
attendance, etc.), the ordinary least squares coefficient on each logged input describes
the response of the dependent variable to a proportional change in that input. Because the
dependent variable is left in levels, a one percent increase in the input corresponds to a
change of roughly the coefficient divided by 100. However, certain inputs (net
rating, players acquired on draft day, and players acquired via trade) have at least one
negative or zero value within the dataset. Therefore, the original values of these
inputs must be used instead of the logged form, as the natural log of
a negative number or zero is undefined.
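The transformation rule above can be sketched as follows. The values are hypothetical
and chosen only to show one input that can be logged and one that cannot; the function
name is illustrative.

```python
import math


def log_transform(values):
    """Return natural logs, or None if any value is zero or negative."""
    if any(v <= 0 for v in values):
        return None  # ln is undefined here, so the input stays in levels
    return [math.log(v) for v in values]


# Hypothetical values for illustration only.
rebounds_per_game = [41.5, 43.2, 44.8]   # strictly positive, can be logged
net_rating = [-3.1, 0.0, 5.4]            # contains negative and zero values

logged_rebounds = log_transform(rebounds_per_game)
logged_net_rating = log_transform(net_rating)

print(logged_rebounds)
print(logged_net_rating)
```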
The first model consists of winning percentage as the dependent variable and
net rating as the single independent variable. Net rating is an exceptionally good predictor
of winning, as it describes the difference between offensive and defensive efficiency,
that is, the difference between points scored per 100 possessions and
points allowed per 100 possessions. This first model will be used primarily as a reference
point for the ensuing models.
Table 1: Net Rating Regression Results
WP Coefficient: Std. Error: t-score: P-value: 95% CI:
NR 0.0293 0.0006 48.30 0.000 0.0281, 0.0305
X 0.4999 0.0031 159.53 0.000 0.4938, 0.5062
R-squared: 0.9403   Adjusted R2: 0.9399   Residual: 0.2181   Prob > F: 0.0000
Note: WP = winning percentage, NR = net rating, X = positive constant, CI = confidence interval
First, we look to the p-value of the F-test. As shown in Table 1, the p-value is
0.0000, indicating that the overall model is statistically significant. In the dataset,
winning percentage is recorded as a decimal to three places (50% is recorded as
0.500, for example). As shown in Table 1, the coefficient for net rating is 0.0293. This
value indicates that for every one-point increase in net rating, winning percentage increases
by 2.93 percentage points (from 0.500 to 0.5293, to continue the example). The R-squared
value of the model is 0.9403 and the adjusted R-squared value is 0.9399,
indicating that the model explains approximately 94 percent of the variability of the data
around the mean. As shown below, net rating and winning percentage have a very strong
positive relationship, and the data points fit the regression line remarkably well.
Graph 2: Net Rating vs. Win Percentage
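The fitted line from Table 1 can be applied directly. The sketch below uses the reported
constant and net rating coefficient; the offensive and defensive ratings passed in are
hypothetical values chosen for illustration, and the function names are not from the
original analysis.

```python
INTERCEPT = 0.4999  # constant X reported in Table 1
SLOPE = 0.0293      # coefficient on net rating reported in Table 1


def net_rating(offensive_rating: float, defensive_rating: float) -> float:
    """Points scored minus points allowed, both per 100 possessions."""
    return offensive_rating - defensive_rating


def predicted_win_pct(nr: float) -> float:
    """Predicted regular season winning percentage as a decimal."""
    return INTERCEPT + SLOPE * nr


# Hypothetical team: scores 105.0 and allows 101.0 points per 100 possessions.
nr_example = net_rating(105.0, 101.0)
wp_example = predicted_win_pct(nr_example)

print(round(wp_example, 4))
```

With a net rating of +4.0, the fitted line predicts a winning percentage of about 0.617.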
The second model is constrained to the play style variables. The objective behind
restricting this model to the play style variables is to help define and describe
on-the-court success while setting aside the actions taking place behind the scenes. It is
important to differentiate the effects of play style variables from the fixed effects of the
franchise before combining the two in the final model, as teams can potentially have
success on the floor without having established fixed team variables in place. Another
reason for keeping this model constrained to play style inputs is to help determine if the
fixed team effects are important and significant to team success on the floor. This model
is attempting to describe the effect that coaching has on winning, as these play style
variables are mostly determined by coaching decisions.
Table 2: Play Style Regression Results
WP Coefficient: Std. Error: t-score: P-value: 95% CI:
3P% -0.0496 0.3046 -0.16 0.871 -0.6522, 0.5529
FT% 0.0386 0.1585 0.24 0.808 -0.2749, 0.3521
FG% -1.3520 0.7773 -1.74 0.084 -2.8899, 0.1859
%TS3 0.8745 0.5786 1.51 0.133 -0.2702, 2.0192
%3FGMA 0.1261 0.0974 1.29 0.198 -0.0665, 0.3187
C3P% -0.1051 0.1457 -0.72 0.472 -0.3932, 0.1831
OFG% -0.3627 0.6610 -0.55 0.584 -1.6705, 0.9450
O3FG% 0.1481 0.2876 0.51 0.607 -0.4209, 0.7171
logOR 3.5094 0.4504 7.79 0.000 2.6184, 4.4004
log3PAPG -0.2086 0.1310 -1.59 0.114 -0.4677, 0.0506
logPITPPG 0.1296 0.0697 1.85 0.066 -0.0088, 0.2669
logRPG -0.2308 0.2002 -1.15 0.251 -0.6269, 0.1653
logAPG 0.0648 0.0593 1.09 0.277 -0.0526, 0.1821
logFBPPG 0.0003 0.0220 0.01 0.990 -0.0432, 0.0437
logTOVPG 0.0179 0.0884 0.20 0.840 -0.1569, 0.1927
logFTAPG -0.0413 0.0509 -0.81 0.419 -0.1421, 0.0595
logDR -3.0305 0.3121 -9.71 0.000 -3.648, -2.4131
logOTOVPG -0.0786 0.1116 -0.70 0.482 -0.2994, 0.1421
logSTLPG -0.0173 0.0719 -0.24 0.811 -0.1595, 0.1249
X -0.1566 1.9754 -0.08 0.937 -4.0647, 3.7515
R-squared: 0.9470   Adjusted R2: 0.9393   Residual: 0.1936   Prob > F: 0.0000
Note: 3P% = 3-point percentage, FT% = free throw percentage, FG% = field goal percentage, %TS3 =
percent of total shots from 3, %3FGMA = percent of assisted 3-point field goals made, C3P% = corner 3-
point percentage, OFG% = opponent field goal percentage, O3FG% = opponent 3-point percentage, logOR
= natural log of offensive rating, log3PAPG = natural log of 3-point attempts per game, logPITPPG =
natural log of points in the paint per game, logRPG = natural log of rebounds per game, logAPG = natural
log of assists per game, logFBPPG = natural log of fast break points per game, logTOVPG = natural log of
turnovers per game, logFTAPG = natural log of free throw attempts per game, logDR = natural log of
defensive rating, logOTOVPG = natural log of opponent turnovers per game, logSTLPG = natural log of
steals per game
As shown in Table 2, the p-value of the F-test is 0.0000, indicating that the overall
model is statistically significant. Similar to the previous model, the R-squared and
adjusted R-squared values are 0.9470 and 0.9393, respectively, indicating that the data
fits the regression line and that the model explains much of the variability around the
mean. Despite this fit, most of the variables have P-values above 0.05, indicating that
they are not significant at the 95% level. The natural log of offensive rating and the
natural log of defensive rating are the only two inputs that are significant to winning
percentage in this second model, and are statistics that model efficient offenses and
defenses. However, many of the other inputs involved in the model are pieces that are
needed to have high offensive ratings and low defensive ratings. Therefore, if we break
down the play style variables into offensive and defensive categories, the resulting
models should yield significant variables that describe offensive and defensive rating.
Table 2.1: Offensive Rating Regression Results
OR Coefficient: Std. Error: t-score: P-value: 95% CI:
3P% 34.3029 5.8727 5.84 0.000 22.6894, 45.9166
FT% 21.8690 2.7661 7.91 0.000 16.3989, 27.3392
FG% 129.4964 8.1827 15.83 0.000 113.3145, 145.6782
%TS3 24.7104 9.9754 2.48 0.014 4.9834, 44.4375
%3FGMA -1.8888 2.0296 -0.93 0.354 -5.9024, 2.1249
C3P% -5.8564 3.0983 -1.89 0.061 -11.9836, 0.2706
log3PAPG -0.0408 2.3165 -0.02 0.986 -4.6217, 4.5401
logPITPPG 5.6671 1.2556 4.51 0.000 3.1842, 8.1500
logRPG 9.2363 1.9745 4.68 0.000 5.3317, 13.1410
logAPG -3.1921 1.2620 -2.53 0.013 -5.6877, -0.6965
logFBPPG 0.1197 0.4331 0.28 0.783 -0.7367, 0.9761
logTOVPG -15.5971 1.1179 -13.95 0.000 -17.8078, -13.3865
logFTAPG 7.8618 0.8213 9.57 0.000 6.2377, 9.4859
X -15.2168 8.3950 -1.81 0.072 -31.8183, 1.3848
R-squared: 0.9467   Adjusted R2: 0.9416   Residual: 98.0228   Prob > F: 0.0000
The p-value of the F-test is 0.0000, signifying that the overall model is
statistically significant. The R-squared and adjusted R-squared values have stayed
relatively similar to Table 2, indicating that the model still explains much of the
variability around the regression mean. However, many of the inputs in Table 2.1 have p-
values less than 0.05, indicating that these variables are statistically significant at the 95%
level. The coefficient for each variable indicates the estimated change in offensive
rating given a one-unit change in that variable, holding all other variables in the model
constant. For the variables transformed by the natural log, a one percent increase in the
variable corresponds to a change in offensive rating of approximately the coefficient
divided by 100.
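As a sketch of how a coefficient on a logged input translates into offensive rating
points, the function below multiplies the coefficient by the change in the natural log of
the input. It assumes, as the magnitudes in Table 2.1 suggest, that offensive rating
enters the regression in levels rather than logs; the function name is illustrative.

```python
import math


def change_in_levels(coefficient: float, pct_increase: float) -> float:
    """Change in a levels outcome when a logged input rises by pct_increase percent."""
    return coefficient * math.log(1.0 + pct_increase / 100.0)


# Coefficient on logTOVPG from Table 2.1.
delta_or = change_in_levels(-15.5971, 1.0)

print(round(delta_or, 4))
```

A one percent increase in turnovers per game therefore corresponds to a decline of
roughly 0.16 points of offensive rating, which is close to the coefficient divided by 100.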
The percent of three-pointers made off an assist has a negative coefficient estimate, but
with a p-value of 0.354 it is statistically insignificant for offensive rating. This result is
somewhat unexpected, as assisted three-point makes should theoretically lead to
uncontested three-point attempts, which tend to be a mark of a good offense. Perhaps a
better variable to tabulate in future research would be the number of uncontested
three-point attempts a team is able to generate per game, as this new variable would be
able to describe both ball movement and spacing of an offense. However, this type of
variable is produced through SportVU’s player tracking system (explained in Data and
Methodology) and is unavailable for the entire period of this study.
Three-point attempts per game are also insignificant to offensive rating, as the
natural log of three-point attempts per game has a p-value of 0.986. This result
can be explained: bad offenses are prone to taking a high volume of low-percentage
three-point attempts, while teams with a dominant inside presence can establish
a good offense without a high volume of three-point attempts. An example of a bad
offense that takes a high volume of low percentage three-point attempts is the 2015
Philadelphia 76ers. Although Philadelphia has been accused on numerous occasions of
tanking during the present in order to build for the future, the fact remains that Philly had
the 11th highest rate of three-point attempts per game over all teams during the last five
seasons at over 26 attempts per game while generating a paltry 93 offensive rating, good
for second worst over the same time span. On the other hand, the Memphis Grizzlies of
2011 attempted only 11.3 three-pointers per game, which was the lowest mark over the
last five seasons. However, the twin towers of Marc Gasol and Zach Randolph helped
lead Memphis to an offensive rating of 104.4 during the 2011 season, an above-average
mark over the last five seasons. In fact, Memphis has averaged only 13.4 three-point
attempts per game over the given time frame – lowest in the league – while maintaining a
league-average offensive rating.
Fast break points are also an insignificant input to offensive rating, with a p-value
of 0.783. This result can be explained through each team’s preference of style, as the
Denver Nuggets, Houston Rockets and Golden State Warriors have combined for 7 of the
15 highest totals in fast break points. On the other hand, the New York Knicks and
Brooklyn Nets have combined for 6 of the 10 lowest totals in fast break points per game
over the last five seasons, all to varying degrees of success.
The coefficient on corner three-point percentage is both negative and insignificant at the
95% level (albeit narrowly, as the p-value is 0.061), which is the most unexpected result
obtained from Table 2.1. A potential explanation for the input’s lack of significance to offensive
rating is that shooting percentage from the corner does not account for the number of
attempts, particularly open attempts, that an offense is able to generate. In hindsight, a
variable that could potentially have more success predicting offensive rating is the total
number of corner threes and uncontested corner threes that a team attempts per game.
The total number of attempts a team is able to produce from the corner should be
indicative of an offense that spreads the floor and moves the ball, as good defenses tend
to focus on defending the corner three.
Non-corner three-point percentage, field goal percentage, and free throw
percentage are all significant at the 99.9% level and all three inputs have highly positive
coefficient estimates. The coefficient results indicate that shooting percentages have very
strong influences on offensive rating, with field goal percentage being the best indicator.
This is not a new result, as good offenses are historically dependent on being able to
score at an efficient percentage from the field and from the free throw line.
Percent of total shots attempted from three-point range has a positive coefficient
of 24.7, indicating that teams that place greater emphasis on three-point attempts in their
shot selection tend to be associated with a more efficient offense. Compared to total
attempts per game from three, this input is achieved through a team’s gameplan and
accounts for the pace and total shot attempts. The variable is also statistically significant
at the 98.6% level, as the p-value is 0.014.
The natural log of points in the paint has a positive coefficient estimate of 5.7,
indicating that a one percent increase in points in the paint is associated with an
increase of approximately 0.057 points in offensive rating (the coefficient divided by
100, since offensive rating is measured in levels). This result makes sense from a
basketball perspective, as points in the paint come from the closest shots to the basket a
team can produce and are therefore the most efficient. The variable is also statistically
significant at the 99.9% level.
Rebounds per game also have a strong positive connection with offensive rating,
as the coefficient for the natural log of rebounds per game is 9.2. The result denotes
a strong association between rebounds and offensive efficiency, as a one percent
increase in rebounds per game is associated with an increase of approximately 0.092
points in offensive rating.
The coefficient for the natural log of assists per game is negative, which is an
unanticipated result. Theoretically, assists would be presumed to have a positive link to
offensive rating, as assists tend to create high percentage shot attempts for teammates.
However, assists have been a thorn in the side of the analytics movement in basketball, as
assist percentage (percentage of field goals assisted) has historically had little influence
on field goal percentage (Ziller, 2013). Conceptually, assists should have a significant
and strong positive effect on a team’s shooting percentage.
Table 2.1a: Field Goal Percentage vs. Assists
FGP Coefficient: Std. Error: t-score: P-value: 95% CI:
logAPG 0.1013 0.0152 6.65 0.000 0.0712, 0.1314
X 0.1410 0.0469 3.01 0.003 0.0483, 0.2337
R-squared: 0.2298   Adjusted R2: 0.2246   Residual: 0.0285   Prob > F: 0.0000
Although assists are significant to field goal percentage, the coefficient estimate
for the natural log of assists per game is only 0.1. This result indicates that for every
one percent increase in assists per game, field goal percentage increases by roughly 0.1
percentage points. While the result is indeed positive, the effect is minimal.
Turnovers per game have a strong negative association with offensive rating, and
the variable is also highly significant. This result is to be expected, as a turnover
results in a loss of possession and forfeits a shot attempt. Free throw attempts per game,
on the other hand, have a strong positive association with offensive rating and are also
highly significant. Teams that attempt a high number of free throws per game are troublesome
for opposing defenses to deal with, as free throws place opposing players in foul trouble,
which tends to limit their minutes. Free throws are also a very efficient source of offense,
as the league-average free throw percentage has hovered around 75% over the last five seasons.
Table 2.2: Defensive Rating Regression Results
DR Coefficient: Std. Error: t-score: P-value: 95% CI:
OFG% 190.9438 9.2621 20.62 0.000 172.6366, 209.251
O3P% 25.8783 8.2924 3.12 0.002 9.4877, 42.2688
logRPG 1.4735 2.5516 0.58 0.564 -3.5698, 6.5169
logOTOVPG -7.9471 2.9019 -2.72 0.007 -13.6830, -2.2111
logSTLPG 0.6417 1.9624 0.33 0.744 -3.2374, 4.5203
X 25.8060 7.5349 3.42 0.001 10.9127, 40.6993
R-squared: 0.8643   Adjusted R2: 0.8596   Residual: 210.9558   Prob > F: 0.0000
Table 2.2 describes defensive rating as the dependent variable through defensive
statistics. The p-value of the F-test is 0.0000, indicating that the overall model is
statistically significant. The R-squared value is 0.8643, which is lower than the values for
net rating and offensive rating but is still a relatively high value. The adjusted R-squared
is 0.8596, meaning that approximately 86% of the variability of defensive rating is
accounted for by the model, even after taking the number of predictor variables in the
model into account.
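The adjustment for the number of predictors can be reproduced from the reported
R-squared. The sketch below assumes 150 observations (30 teams over five seasons, the
observation count shown in Table 4.2) and the five predictors listed in Table 2.2. The
overall F statistic is computed from the same quantities; the function names are
illustrative.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """R-squared penalized for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r2: float, n_obs: int, n_predictors: int) -> float:
    """Overall F statistic for the null that every slope is zero."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n_obs - n_predictors - 1))


N_OBS = 150  # assumption: 30 teams x 5 seasons, as in Table 4.2

adj_table_2_2 = adjusted_r2(0.8643, N_OBS, 5)
f_table_2_2 = f_statistic(0.8643, N_OBS, 5)

print(round(adj_table_2_2, 4))
print(f_table_2_2)
```

Under these assumptions the computed adjusted R-squared rounds to the 0.8596 reported in
Table 2.2, and the large F statistic is consistent with the reported Prob > F of 0.0000.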
Opponent field goal percentage and opponent three-point percentage are the two
strongest predictors of defensive rating, and are both statistically significant at the 95%
level or higher. These two results, compared to steals per game – which is statistically
insignificant – imply that limiting high-percentage shots and keeping the shooting
percentages of opposing teams low is more important to a stingy defense than steals.
As shown above, rebounds per game are not significant statistically. Although the
coefficient is positive, the effect rebounds have on defensive rating is negligible. The
variable for rebounds is total rebounds, and is therefore used in both offensive rating and
defensive rating regression models.
Blocks were not used in this study, as the traditional statistic that groups all
blocks into one category has generally been a poor indicator of defensive prowess.
Blocks do not account for a team’s shooting percentage at the rim, nor do they account
for the distance from the basket of the blocked shots attempted. Blocks are also assumed
to begin transition offense or guarantee possession for the team doing the blocking, but
according to Nylon Calculus, “57.2% of all blocked shots were recovered by the defense”
(Willard, 2015). However, expanding blocks into multiple new statistics has the potential
to be a strong descriptor of effective defense with future research.
The third model is constrained to the fixed team effects. The objective behind
keeping this model to the fixed team effects is to determine if play style variables are
important and significant to successful team basketball in the NBA. The question that this
model is attempting to answer is: does the coach or the general manager have a higher
level of accountability when it comes to success on the floor? This third model will
attempt to answer this question by categorizing variables that occur off the court and
determine if these inputs are significant to regular season winning percentage.
Table 3: Fixed Team Effects Regression Results
WP Coefficient: Std. Error: t-score: P-value: 95% CI:
CONF 0.0435 0.0194 2.24 0.026 0.0052, 0.0819
DRAFT 0.0501 0.0165 3.04 0.003 0.0176, 0.0827
TRADE 0.0327 0.0161 2.03 0.044 0.0009, 0.0646
FA 0.0309 0.0159 1.96 0.053 -0.0003, 0.0623
logFANS 0.3216 0.0919 3.50 0.001 0.1399, 0.5031
logAGE 1.1451 0.1821 6.29 0.000 0.7852, 1.5051
X -7.0172 0.8785 -7.99 0.000 -8.7537, -5.2807
R-squared: 0.4755   Adjusted R2: 0.4535   Residual: 1.9170   Prob > F: 0.0000
Note: CONF = dummy variable (0 for eastern conference, 1 for western conference), DRAFT = number of
players acquired on draft day, TRADE = number of players acquired via trade, FA = number of players
acquired via free agency, logFANS = natural log of average attendance per game, logAGE = natural log of
average age of team
As shown above, the p-value of the F-test is 0.0000, indicating that the overall
model is statistically significant. The R-squared and adjusted R-squared values are
0.4755 and 0.4535 respectively, signifying that the model explains just over 45% of the
variability around the mean. Although the R-squared values are approximately half of those
of the previous two models, each input is statistically significant at the 95%
level (excluding FA, which is significant at the 94.7% level). The coefficient for
conference shows that Western Conference teams on average outperform
their Eastern Conference counterparts. Table 3.1 also shows this result using basic
descriptive statistics:
Table 3.1: Summarization of Win Percentage by Conference
Eastern Conference:
Variable: # of Obs. Mean Std. Dev. Min. Max.
WP 75 0.4681 0.1580 0.106 0.805
Western Conference:
Variable: # of Obs. Mean Std. Dev. Min. Max.
WP 75 0.5318 0.1496 0.195 0.817
Shifting the focus back to Table 3, the coefficient value for players acquired on
draft day is the highest amongst the roster composition variables, and is estimated at
0.0501. This result indicates that for every additional player on the roster originally
acquired on the respective player’s draft day – whether the player was drafted by his
current team or if the team traded for his draft rights – winning percentage is estimated to
increase by 5.01 percentage points (from 0.500 to 0.5501, or 50% to 55.01%).
Conceptually, this makes sense from a basketball perspective, as drafting and developing
a higher number of players allows for a higher level of continuity and
familiarity between the players and within each team’s respective system.
The San Antonio Spurs are an excellent example of building through the draft and
having continued success. Over the years, the Spurs have made smart draft night
decisions and acquired players such as Tim Duncan, Manu Ginobili, Tony Parker, and
Kawhi Leonard, and have been the most successful team over the last five seasons,
winning over 72% of their games. The number of players acquired through free agency
and the number of players acquired via trade also both have positive coefficient estimates
and are relatively similar in value, but neither variable affects winning percentage at the
same rate as players acquired through the draft.
The coefficient estimate for the natural log of average attendance is 0.3216,
indicating that for every one percent increase in attendance, winning percentage increases
by roughly 0.0032, or about 0.32 percentage points. However, based on the results, the
model is unable to specify whether a higher winning percentage causes higher average
attendance or whether the two variables are merely positively correlated.
The natural log of age has a coefficient estimate of 1.1451, indicating that when the
average age of a team increases by 1 percent, the winning percentage of the team is
expected to increase by roughly 0.0115, or about 1.15 percentage points. This result also
makes theoretical sense on a
basketball level, as teams with more veterans are more experienced and are more likely to
be competing for a championship. However, the resulting estimate is not indicating that a
team exclusively made up of old veteran players is expected to be more successful. The
span of average ages over the last five years ranges from 23.2 to 31.3, so the coefficient is
estimating success within this range.
The final model is a combination of play style variables and fixed team effects.
Combining the play style and team effects variables should provide insight into the
structure and focus of successful franchises. This final model is attempting to describe
how the front office, coaching staff, and roster collaborate to result in accomplishments
on the court.
Table 4: Play Style and Fixed Team Effects Regression Results
WP Coefficient: Std. Error: t-score: P-value: 95% CI:
3P% 1.1469 0.3729 3.08 0.003 0.4089, 1.8847
FT% 0.7179 0.1787 4.02 0.000 0.3644, 1.0715
FG% 3.2704 0.6178 5.29 0.000 2.0481, 4.4929
%TS3 3.8434 0.7135 5.39 0.000 2.4316, 5.2552
%3FGMA 0.2354 0.1307 1.80 0.074 -0.0232, 0.4940
C3P% -0.3448 0.1969 -1.75 0.082 -0.7345, 0.0449
OFG% -2.9839 0.5703 -5.23 0.000 -4.1125, -1.8554
O3P% -0.6903 0.3859 -1.79 0.076 -1.4539, 0.0734
log3PAPG -0.7743 0.1676 -4.62 0.000 -1.1061, -0.4426
logPITPPG 0.1839 0.0969 1.90 0.060 -0.0079, 0.3758
logRPG 0.9821 0.2213 4.44 0.000 0.5442, 1.4199
logAPG -0.0531 0.0806 -0.66 0.511 -0.2125, 0.1064
logFBPPG -0.0197 0.0314 -0.63 0.531 -0.0818, 0.0424
logTOVPG -0.5454 0.0738 -7.39 0.000 -0.6914, -0.3994
logFTAPG 0.1053 0.0546 1.93 0.056 -0.0027, 0.2133
logOTOVPG 0.3979 0.1425 2.79 0.006 0.1159, 0.6799
logSTLPG -0.0470 0.0985 -0.48 0.634 -0.2419, 0.1478
logFANS 0.0436 0.0471 0.92 0.357 -0.0496, 0.1368
logAGE 0.3763 0.0999 3.77 0.000 0.1787, 0.5739
DRAFT 0.0163 0.0081 2.00 0.047 0.0002, 0.0324
TRADE 0.0076 0.0078 0.98 0.329 -0.0078, 0.0230
FA 0.0071 0.0076 0.94 0.350 -0.0079, 0.0220
X -4.8313 1.048 -4.61 0.000 -6.9068, -2.7559
R-squared: 0.9055   Adjusted R2: 0.8891   Residual: 0.3455   Prob > F: 0.0000
The p-value of the F-test is 0.0000, indicating that the overall model is
statistically significant. The R-squared and adjusted R-squared values are 0.9055 and
0.8891 respectively, which indicates that the model explains approximately 90% of the
variability around the regression mean. The value of the constant X
describes the predicted winning percentage if all inputs equal zero, which has little
practical meaning for this particular model.
In order to compare the relative strength of each input to one another, we can take
the beta scores of each variable. The beta scores are measured in standard deviations
rather than the units of the original variables, which allows the relative strength of each
variable within the model to be compared. The beta scores are essentially the regression
coefficients that would result if the output and inputs were transformed into standard
scores, or z-scores. As shown in Table 4.1, the variable with the highest positive
coefficient estimate and beta coefficient is the percent of total shots taken that were
three-point attempts. However, the input with the most negative beta coefficient is the
natural log of three-point attempts per game.
Although the results for the highest and lowest coefficient values are initially
confusing, this can potentially be explained by the shot selection of each team.
Conceptually, a team taking a higher number of three-point attempts is more likely to be
taking a higher percentage of lower quality shots, as a higher number of attempts signals
that the team is playing at a faster pace (or number of possessions per game) and is taking
a higher number of total shot attempts. High pace rankings in the past have not translated
to success in the playoffs, as only the 1981-82 Los Angeles Lakers and the 2014-15
Golden State Warriors have ranked in the top five in pace since 1978 and won the
championship. A team that takes a higher number of three-point attempts each game is
also likely to be trailing in the game and attempting to make a
comeback. However, a team that incorporates a higher percentage of threes into the total
shots taken per game is theoretically not rushing to take quick shots but instead is
choosing to focus on three-point attempts, rather than midrange shots, which have
historically been less efficient.
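The distinction between volume and share can be made concrete with a small sketch.
Three-point attempts per game equal the share of shots taken from three multiplied by
total field goal attempts, so with the share held fixed, more attempts can only come from
more total shots. The two teams below are hypothetical and the function name is
illustrative.

```python
import math


def three_point_attempts(three_point_share: float, field_goal_attempts: float) -> float:
    """Three-point attempts per game implied by shot share and total attempts."""
    return three_point_share * field_goal_attempts


# Hypothetical teams that take the same number of threes per game.
deliberate_team = three_point_attempts(0.30, 80.0)  # higher share, fewer total shots
fast_team = three_point_attempts(0.25, 96.0)        # lower share, more total shots

# In logs the product splits into a share term and a total-attempts term.
log_attempts = math.log(fast_team)
log_share_plus_log_total = math.log(0.25) + math.log(96.0)

print(deliberate_team, fast_team)
```

Because the combined model includes both the share of shots from three and the log of
attempts, the coefficient on attempts is estimated with the share held constant, which is
consistent with the interpretation that it captures total shot volume rather than shot
selection.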
Below, Table 4.1 describes each independent variable and the associated beta
coefficient. As mentioned previously, the coefficients are measured in standard
deviations, which allows the variables to be compared amongst one another.
Table 4.1: Play Style and Fixed Team Effects Beta Coefficients
WP Beta coefficients:
3P% 0.1468
FT% 0.1345
FG% 0.3291
%TS3 1.1412
%3FGMA 0.0612
C3P% -0.0660
OFG% -0.2725
O3P% -0.0666
log3PAPG -0.9914
logPITPPG 0.1071
logRPG 0.2877
logAPG -0.0253
logFBPPG -0.0265
logTOVPG -0.2463
logFTAPG 0.0720
logOTOVPG 0.2072
logSTLPG -0.0350
logFANS 0.0336
logAGE 0.1646
DRAFT 0.1971
TRADE 0.1018
FA 0.0931
Since the beta coefficients are measured in standard deviations, a one standard
deviation increase in three-point percentage results in an estimated increase of 0.1468
standard deviations in win percentage (as shown in Table 4.1). In order to translate
these standardized effects back into the original units, Table 4.2 summarizes the
list of statistics used in the combined model of play style variables and fixed team effects
and shows the standard deviations and means of each input as well as the output. The actual
effects that the beta values describe are based on the values in Tables 4.1 and 4.2.
As shown, one standard deviation in three-point percentage is equivalent to 0.0201, or
approximately two percentage points. Therefore, when three-point percentage increases
by two percentage points, winning percentage is expected to increase by 0.1468 standard
deviations. Since the standard deviation of winning percentage is 0.1566, or 15.66
percentage points, a one standard deviation increase in three-point percentage is expected
to increase winning percentage by approximately 2.3 percentage points
(0.1468 × 0.1566 ≈ 0.023).
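The link between Tables 4, 4.1, and 4.2 can be sketched as follows. A beta coefficient is
the unstandardized coefficient multiplied by the standard deviation of the input and
divided by the standard deviation of the output. The numbers are the three-point
percentage entries from those tables; the function names are illustrative, and small
differences from the reported beta reflect rounding in the published values.

```python
def standardized_beta(coefficient: float, sd_input: float, sd_output: float) -> float:
    """Convert an unstandardized coefficient into a beta coefficient."""
    return coefficient * sd_input / sd_output


def effect_of_one_sd(beta: float, sd_output: float) -> float:
    """Change in the output, in original units, for a one-SD rise in the input."""
    return beta * sd_output


SD_WIN_PCT = 0.1566  # standard deviation of WP, Table 4.2
SD_3P_PCT = 0.0201   # standard deviation of 3P%, Table 4.2

beta_3p = standardized_beta(1.1469, SD_3P_PCT, SD_WIN_PCT)  # coefficient from Table 4
effect_3p = effect_of_one_sd(0.1468, SD_WIN_PCT)            # beta from Table 4.1

print(round(beta_3p, 4), round(effect_3p, 4))
```

The recomputed beta lands within rounding error of the 0.1468 in Table 4.1, and the
implied effect is about 0.023, or 2.3 percentage points of winning percentage.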
Table 4.2: Summarization of Inputs and Output
Variable      # of Obs.   Mean      Std. Dev.   Min       Max
WP            150         0.5001    0.1566      0.106     0.817
3P%           150         0.3538    0.0201      0.295     0.403
FT%           150         0.7554    0.0293      0.66      0.828
FG%           150         0.4526    0.0158      0.408     0.501
%TS3          150         0.2437    0.0465      0.136     0.392
%3FGMA        150         0.8476    0.0407      0.715     0.934
C3P%          150         0.3848    0.0299      0.319     0.467
OFG%          150         0.4525    0.0143      0.419     0.487
O3P%          150         0.3548    0.0151      0.308     0.411
log3PAPG      150         2.9792    0.2005      2.4248    3.4874
logPITPPG     150         3.7266    0.0911      3.5086    4.0604
logRPG        150         3.7449    0.0459      3.6082    3.8607
logAPG        150         3.0758    0.0746      2.9178    3.3105
logFBPPG      150         2.5728    0.2106      2.1282    3.0397
logTOVPG      150         2.6700    0.0707      2.4159    2.8736
logFTAPG      150         3.1338    0.1071      2.8094    3.4372
logOTOVPG     150         2.6691    0.0816      2.4248    2.8565
logSTLPG      150         2.0272    0.1165      1.7047    2.2618
logFANS       150         9.7595    0.1207      9.5079    10.0061
logAGE        150         3.2789    0.0685      3.1442    3.4436
DRAFT         150         5.0467    1.8943      0         10
TRADE         150         4.4600    2.0939      0         11
FA            150         5.1600    2.0598      1         10
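The conversion from beta coefficients to win percentage points can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original estimation: the values are copied from Tables 4.1 and 4.2 for three of the inputs, and the names one_sd_effect and raw_slope are hypothetical helpers introduced here. The second helper assumes the usual relationship between a beta coefficient and an unstandardized slope (beta times the ratio of the output's standard deviation to the input's).

```python
# Illustrative sketch: values copied from Tables 4.1 and 4.2.
# Function names are hypothetical helpers, not part of the thesis.

SD_WP = 0.1566  # standard deviation of win percentage (Table 4.2)

# input -> (beta coefficient from Table 4.1, std. dev. from Table 4.2)
INPUTS = {
    "3P%": (0.1468, 0.0201),
    "FG%": (0.3291, 0.0158),
    "OFG%": (-0.2725, 0.0143),
}


def one_sd_effect(name):
    """Change in win percentage (as a fraction) for a one standard
    deviation increase in the named input."""
    beta, _sd_x = INPUTS[name]
    return beta * SD_WP


def raw_slope(name):
    """Change in win percentage per one-unit change in the input,
    assuming slope = beta * sd(WP) / sd(input)."""
    beta, sd_x = INPUTS[name]
    return beta * SD_WP / sd_x


for name in INPUTS:
    print(f"{name}: {100 * one_sd_effect(name):+.1f} percentage points per SD")
```

For three-point percentage this reproduces the figure of roughly 2.3 percentage points discussed in the text.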
Table 4.3 below lists the variance inflation factors (VIFs) of each input in
descending order. A VIF measures how strongly an input is linearly related to the other
inputs in the regression. Multicollinearity can be a problem, particularly when the
number of inputs is high (as in this model), since closely related independent variables
inflate the standard errors of the individual coefficient estimates and make those
estimates imprecise. VIFs above 10 may be especially indicative of multicollinearity,
while a tolerance (1/VIF) close to one means that collinearity is not an issue. As shown
below, the number of three-point attempts per game and the percentage of total shots
taken from three-point territory are very closely related (each has a VIF above 60) and
are likely affected by multicollinearity. The numbers of players acquired through trades,
free agency, and the draft, each with a VIF above 10, are also potentially affected.
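As a rough illustration of how a VIF is obtained, the sketch below regresses each input on all of the other inputs and computes 1 / (1 - R²) from that auxiliary regression. It is a minimal sketch under stated assumptions, not the procedure used to produce the estimates in this paper: the data are synthetic, and the function name variance_inflation_factors is a hypothetical helper.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_obs x n_vars).

    Each column is regressed on the remaining columns plus an
    intercept, and VIF = 1 / (1 - R^2) of that regression.
    """
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    vifs = []
    for j in range(n_vars):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Synthetic data (NOT the thesis data): b is nearly a copy of a,
# while c is unrelated to both.
rng = np.random.default_rng(0)
a = rng.normal(size=150)
b = a + 0.1 * rng.normal(size=150)
c = rng.normal(size=150)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
tolerances = 1.0 / vifs  # tolerance is the reciprocal of the VIF
print(vifs, tolerances)
```

In this synthetic example the two nearly identical columns receive large VIFs while the unrelated column stays close to one, which mirrors the pattern between three-point attempts and the share of shots taken from three.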
Although multicollinearity is clearly present in this model, each of the collinear
variables contributes individual value to the overall model and will therefore remain in
it. While multicollinearity makes individual coefficients harder to interpret in
regression models such as this one, its presence does not violate the Ordinary Least
Squares assumptions (provided the collinearity is not perfect), does not affect the
overall fit of the model, and does not result in inadequate prediction estimates.
Table 4.3: Play Style and Fixed Team Effects Variance Inflation Factors (VIFs)
Variable      VIF      Tolerance (1/VIF)
log3PAPG      61.89    0.016158
%TS3          60.30    0.016585
TRADE         14.54    0.068771
FA            13.26    0.075431
DRAFT         13.02    0.076787
logTOVPG       7.40    0.135161
logSTLPG       7.21    0.138732
logRPG         5.65    0.177144
FG%            5.19    0.192593
logPITPPG      4.27    0.233973
OFG%           3.64    0.274424
3P%            3.06    0.326633
logAGE         2.56    0.390164
logFBPPG       2.39    0.418371
logAPG         1.98    0.505467
C3P%           1.91    0.523245
logFTAPG       1.87    0.534404
O3P%           1.87    0.536135
logFANS        1.77    0.564224
%3FGMA         1.55    0.645506
FT%            1.51    0.664381
logOTOVPG      1.49    0.670415
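The rule of thumb that a VIF above 10 signals multicollinearity can be applied directly to the values in Table 4.3. The short sketch below copies the VIFs from the table and lists the inputs that exceed the threshold; the variable names and the threshold constant are illustrative choices, not part of the original analysis.

```python
# VIFs copied from Table 4.3 (descending order).
VIFS = {
    "log3PAPG": 61.89, "%TS3": 60.30, "TRADE": 14.54, "FA": 13.26,
    "DRAFT": 13.02, "logTOVPG": 7.40, "logSTLPG": 7.21, "logRPG": 5.65,
    "FG%": 5.19, "logPITPPG": 4.27, "OFG%": 3.64, "3P%": 3.06,
    "logAGE": 2.56, "logFBPPG": 2.39, "logAPG": 1.98, "C3P%": 1.91,
    "logFTAPG": 1.87, "O3P%": 1.87, "logFANS": 1.77, "%3FGMA": 1.55,
    "FT%": 1.51, "logOTOVPG": 1.49,
}

VIF_THRESHOLD = 10  # rule-of-thumb cutoff used in the text

# Inputs whose VIF exceeds the cutoff, in table order.
flagged = [name for name, vif in VIFS.items() if vif > VIF_THRESHOLD]
print(flagged)
```

The flagged inputs are the two three-point volume measures and the three roster-construction counts, matching the variables singled out in the discussion of multicollinearity.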
Conclusion
As shown in Tables 4 and 4.1, the model found that the independent variables with
the greatest statistically significant positive effects on win percentage are shooting
percentages (including field goal, three-point, and free throw percentages), the percent
of total shots taken as three-point attempts, total rebounds, forced turnovers, increased
age and experience (to a certain limit), and the number of players acquired through the
draft. On the other hand, the inputs with the most detrimental effects on winning are
opponent field goal percentage, total three-point attempts per game, and turnovers.
In order to tell whether these effects have translated to success in the playoffs,
the inputs that significantly affected regular season win percentage are compared against
the teams that ranked highly in each statistic over the last five seasons. Three-point
percentage translates very well to playoff success, as three of the top six teams over
the last five seasons are the 2015 Golden State Warriors (2), the 2014 San Antonio Spurs
(3), and the 2013 Miami Heat (6). All three of these teams were eventually crowned NBA
Champions in their respective seasons. Field goal percentage is also crucial to playoff
success, as six of the top ten teams were NBA Finalists in their respective seasons. All
ten of the top ten teams also finished with a win percentage of 65% or above. The top ten
in free throw percentage is dominated by the Oklahoma City Thunder teams of 2013, 2011,
2014, and 2012 and the Portland Trail Blazers teams of 2014, 2011, 2015, and 2012.
Although neither team has won a title within this span, both have posted strong regular
season records: OKC has averaged a win percentage of over 70%, for an average record of
58 wins and 24 losses, during the four seasons above, and Portland has averaged a very
respectable win percentage of 57.25%, for an average record of 47 wins and 35 losses.
Percentage of total shots from three was estimated to be the strongest predictor
of win percentage in the regular season amongst the independent variables in Tables 4
and 4.1, and has had a positive effect on playoff success as well. Twenty of the top 25
teams ranked by this statistic were playoff teams, but the 2015 Golden State Warriors
(13) were the only team in this set to win the title. However, the 2015 Cleveland
Cavaliers (6) and the 2014 Miami Heat (21) were both participants in their respective
Finals, and the 2015 Houston Rockets (1) and 2015 Atlanta Hawks (9) finished as
conference finalists.
Rebounds per game is a category that is clearly important to team success and was
found to have a positive effect on regular season winning percentage. However, the 2015
Golden State Warriors were the only team in the last five seasons to rank in the top ten
in rebounding during the season and win the title in the same year. Rebounding is a
statistic on which teams place differing amounts of importance, as it is particularly
dependent on the personnel and capabilities of the team. For example, the Miami Heat
ranked last in the league in rebounding in each of the last three seasons, but won the
NBA Finals in 2013 and finished as the runner-up in 2014. For a team like Miami that does
not emphasize rebounding, the alternative is to focus on different aspects of the game in
order to make up for its lack of rebounding.
Forcing turnovers has had a positive effect on winning percentage because teams
that force a high number of turnovers give themselves a better opportunity to win the
game based on the sheer number of possessions. Just as turning the ball over decreases
the number of possessions available to even attempt a shot, forcing turnovers has the
same effect on the opposition while creating extra possessions for the team. The 2015
Golden State Warriors and the 2013 and 2014 Miami Heat each finished in the top five in
forced turnovers per game in their respective seasons; the Warriors and the 2013 Heat
won the NBA championship, while the 2014 Heat finished as the runner-up. Conversely,
turning the ball over was found to have a negative effect on winning, since turnovers
concede possessions.
Due to the increased experience that comes with established NBA veterans, a team
with an average age of 29-31 has a positive effect on winning. An older team also likely
indicates that its players are fully developed and in their primes, rather than bringing
the inexperience of youthful players. Older teams have been successful in the playoffs in
the last five seasons, as the 2011 Dallas Mavericks, the 2012 and 2013 Miami Heat, and
the 2014 San Antonio Spurs all ranked in the top five in average age during their
respective seasons, and all four teams went on to win the title.
The San Antonio Spurs have ranked in the top ten in the league for each of the
last five seasons in the number of players on the roster acquired through the draft and are
essentially the model for consistency and success, averaging a win percentage of 73%, or
nearly 60 wins per year, over the last five seasons. A potential reason for their success is
the number of valuable players that the Spurs acquired on draft day, such as Tim Duncan,
Tony Parker, Manu Ginobili, and Kawhi Leonard.
Other teams that have had a consistently high number of players acquired on draft
day are: the Oklahoma City Thunder, who have also ranked in the top ten every season
over the last five and have averaged a win percentage of 68%, or nearly 56 wins per
season; the Chicago Bulls, who have finished first, twelfth, fifth, and tenth in the league
over the last four seasons and have a win percentage of 62.5%, or over 51 wins per
season; the Portland Trail Blazers, who have finished first twice, fifth, and second in four
of the last five seasons, and finished with an average win percentage of 57% in those four
seasons; and the Golden State Warriors, who finished amongst the bottom ten teams in
the league in 2011 and 2012 (and won 43.9% and 34.8% of their games, respectively),
but have finished in the top ten over the last three seasons and have won an average of
67% of their games during this span.
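The win totals quoted alongside the win percentages above follow from multiplying each percentage by the length of the regular season. The sketch below shows that arithmetic under the assumption of an 82-game season; the 2011-12 season was shortened to 66 games, so the per-season figures should be read as approximations. The constant and function names are hypothetical.

```python
GAMES_PER_SEASON = 82  # assumption: full-length regular season


def wins_per_season(win_pct, games=GAMES_PER_SEASON):
    """Convert a win percentage (as a fraction) into wins per season."""
    return win_pct * games


# Average win percentages quoted in the text.
quoted = {
    "San Antonio Spurs": 0.73,
    "Oklahoma City Thunder": 0.68,
    "Chicago Bulls": 0.625,
}

for team, win_pct in quoted.items():
    print(f"{team}: {wins_per_season(win_pct):.1f} wins per season")
```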
Holding opponents to a low field goal percentage was found to be the best
indicator of a strong defense (as far as traditional statistics can describe) and also
has a significantly positive effect on regular season win percentage. This statistic is
also very indicative of playoff success, as all four of the championship teams in the
last five seasons have finished in the top ten in opponent field goal percentage during
their respective title-winning seasons.
Based on the results found in the model and displayed in Table 2.1, an efficient
offense must shoot the ball well from the field, as well as from the free throw line and
three-point line, take a high percentage of its total shots as three-point attempts,
rebound the ball well, limit turnovers, and get to the free throw line often. A strong
defensive team forces a high number of turnovers while keeping the field goal percentage
of the opposition low, as shown in Table 2.2. Meanwhile, as shown in Table 3, teams in
the Western Conference have historically outperformed their Eastern Conference
counterparts, teams that build through the draft have had higher winning percentages
than teams that focus more on building through free agency and trades, and teams with
more age and experience tend to perform at a higher level than those made up of younger
and less experienced players.
Based on the regression results from these models, a prototype for teams to follow
should be the standards set by the Golden State Warriors and San Antonio Spurs. The
Spurs in particular have set a standard of excellence dating back to their first NBA
title in 1999 and continuing through their fifth championship in 2014. Despite the
relatively small market of San Antonio (compared to cities such as New York, Los
Angeles, and Chicago), the Spurs have managed to build a dynasty through smart drafting,
strong defense, efficient shooting, and a collection of effective and established
veteran players.
On the other hand, Golden State has experienced a meteoric rise from mediocrity
over the last five seasons, as the Warriors have leapt from winning 35% of their games
in 2012 to winning 82% of their games in 2015 (and have only continued to improve in
2016). The Warriors have managed to improve so drastically through devastating
three-point shooting, dominant and chaotic defense, and surprisingly strong rebounding.
Although Golden State has mimicked San Antonio’s process of building through the draft,
a key difference between the two is average age. San Antonio has had an average age of
nearly 29 over the last five seasons, while the Warriors have had an average age of just
over 25, suggesting that the Warriors will remain a threat to win the title for several
years. Based on the results, management and coaching have had a very strong influence on
the success of both of these teams, as management is in charge of personnel and
drafting, while the coaching staff has been able to optimize the focus on the floor for
each set of players.
References
Berri, D. (December, 1999). Who Is ‘Most Valuable?’ Measuring the Player’s
Production of Wins in the National Basketball Association. Managerial and
Decision Economics, 20 (8). Retrieved from
http://onlinelibrary.wiley.com/doi/10.1002/1099-1468(199912)20:8%3C411::AID-
MDE957%3E3.0.CO;2-G/pdf.
Carmichael, F., Thomas, D., Ward, R. (January, 2000). Team Performance: The
Case of English Premiership Football. Managerial and Decision Economics, 21 (1).
Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/1099-
1468(200001/02)21:1%3C31::AID-MDE963%3E3.0.CO;2-Q/pdf.
Santos, J., Garcia, P., Castro, J. (July, 2006). The Production Process in Basketball:
Empirical Evidence from the Spanish League. International Association of Sports
Economists, Working Paper Series, 06-11. Retrieved from
http://college.holycross.edu/RePEc/spe/Santos_Basketball.pdf.
Silver, N. (2014). Every Team’s Chance of Winning a Title by 2019.
FiveThirtyEight. Retrieved from http://fivethirtyeight.com/features/every-nba-teams-
chance-of-winning-a-title-by-2019/.
Zak, T., Huang, C., Siegfried, J. (July, 1979). Production Efficiency: The Case of
Professional Basketball. The Journal of Business, 52(8). Retrieved from http://0-
www.jstor.org.tiger.coloradocollege.edu/stable/pdf/2352368.pdf?acceptTC=true.
Zech, C. (Fall, 1981). An Empirical Estimation of a Production Function: The Case
of Major League Baseball. The American Economist, 25(2). Retrieved from
http://www.jstor.org/stable/25603335?seq=1#page_scan_tab_contents.
46
Ziller, T. (April, 2013). How Important Are Assists in the NBA? SBNation.
Retrieved from http://www.sbnation.com/2013/4/10/4208428/nba-assists-shooting-
knicks-heat-bulls.
Willard, J. (September, 2015). Shot Blocking Details: Mining 19 Years of Play-by-
Play Data. Nylon Calculus. Retrieved from
http://nyloncalculus.com/2015/09/21/shot-blocking-details-mining-19-years-of-play-
by-play-data/.
Shea, S., Baker, C. (2013). Basketball Analytics. St. Louis, MO: CreateSpace
Independent Publishing Platform.
Shea, S. (2014). Basketball Analytics: Spatial Tracking. St. Louis, MO: CreateSpace
Independent Publishing Platform.
Oliver, D. (2004). Basketball on Paper. Dulles, VA: Potomac Books, Inc.
Perloff, J. (2008). Microeconomics: Theory and Applications with Calculus. Boston,
MA: Pearson Education, Inc.
www.basketball-reference.com.
www.basketball.realgm.com.
www.nba.com.
www.forbes.com/nba-valuations.