PRODUCTION PROCESS IN THE NBA:

A FORMULA FOR A SUCCESSFUL TEAM

A THESIS

Presented to

The Faculty of the Department of Economics and Business

The Colorado College

In Partial Fulfillment of the Requirements for the Degree

Bachelor of Arts

By

Jigmei Dorji

May 2016

Economics

Abstract

Achieving success in the National Basketball Association is not only a priceless and

historic feat, but teams that have success in the playoffs and regular season also benefit

from financial bonuses. This paper estimates a production function for professional

basketball teams, and uses the results to determine significant areas of focus that are

positively and negatively associated with regular season win percentage. A Cobb-

Douglas production function and multi-variable Ordinary Least Squares regression

models are applied to data collected from the 2010-11 through 2014-15 seasons in the

National Basketball Association. The results are also applied to successful teams in the

playoffs in order to determine how regular season results translate to the playoffs. The

resulting estimates indicate that successful NBA teams over the last five seasons have

focused on shooting efficiently, keeping opponent shooting percentages low, rebounding,

forcing turnovers at a high rate, and building their teams through the draft.

KEYWORDS: Correlation, Econometrics, Multicollinearity, Multiple Variable Model,

Ordinary Least Squares, Regression, Cobb Douglas, Production Function, Production

Measurement, Sports

JEL CODES: C1, C3, D24, L83

ON MY HONOR, I HAVE NEITHER GIVEN NOR RECEIVED

UNAUTHORIZED AID ON THIS THESIS

Jigmei Dorji

Signature

TABLE OF CONTENTS

ABSTRACT

INTRODUCTION

    Financial Incentive

    Area of Focus

LITERATURE REVIEW

    Cobb-Douglas Production Functions

    Basketball Analytics

THEORETICAL FRAMEWORK

    Graph 1: Output and Marginal Product of Input X

DATA AND METHODOLOGY

    Recent Trends in Basketball Statistics

    Methodology

REGRESSION RESULTS AND ANALYSIS

    Table 1: Net Rating Regression Results

    Graph 2: Net Rating vs. Winning Percentage

    Table 2: Play Style Regression Results

    Table 2.1: Offensive Rating Regression Results

    Table 2.1a: Field Goal Percentage vs. Assists

    Table 2.2: Defensive Rating Regression Results

    Table 3: Fixed Team Effects Regression Results

    Table 3.1: Summarization of Win Percentage by Conference

    Table 4: Play Style and Fixed Team Effects Regression Results

    Table 4.1: Play Style and Fixed Team Effects Beta Coefficients

    Table 4.2: Summarization of Inputs and Output

    Table 4.3: Play Style and Fixed Team Effects VIFs

CONCLUSION

REFERENCES

Introduction

Every year from October to June for 82 games (with more successful teams

playing close to 100 including the playoffs), 30 teams fight for supremacy in the National

Basketball Association (NBA). But only one team can call itself the champion of

the league at the end of the season. Over the last five seasons, four different franchises

have claimed the Larry O’Brien trophy – the Dallas Mavericks, the Miami Heat (twice),

the San Antonio Spurs, and most recently, the Golden State Warriors. The current decade

has been relatively balanced when compared to historical trends. The NBA

has only had ten different teams win a title since 1975, indicating that the league has been

enjoying a spell of competitive balance in the last five seasons.

Led by eventual Most Valuable Player (MVP) Stephen Curry, the 2014-15

Warriors were able to jump out to an early lead in the standings despite playing in an

incredibly competitive Western Conference. Golden State finished the season with a

league-leading record of 67 wins and 15 losses and defeated the Cleveland Cavaliers in

the NBA Finals for their first championship since 1975. Although the supremacy that

Golden State exhibited during the 2015 season is now indisputable, did the Warriors

exhibit distinct types of advantages that allowed them to dominate over the rest of the

NBA? What types of effects did the style of play have on the winning percentage of the

team? How did the personnel affect the record of the team? Did the fixed effects that the

franchise has implemented off the court have an impact on the record on the floor?

Financial Incentive

Winning a championship is the unmistakable goal of any NBA franchise and the

players involved. Not only is claiming the Larry O’Brien trophy and hanging up the title

banner priceless and intangibly significant, but teams that have success in the playoffs

benefit from monetary bonuses as well. After winning the 2014 title, the San Antonio

Spurs were awarded over $2 million in bonuses from the NBA, not including the trophy

and championship rings, while the runner-up Miami Heat were also awarded $1.5

million.

The valuations of franchises also increase significantly after winning the

championship. According to Forbes, recent NBA champions have gained an average of

30% in team value after raising the trophy. Teams also raise ticket prices significantly

during their Finals run. In 2015, the average Finals ticket prices in Golden State ran over

$1,200 while the average ticket prices in Cleveland were over $1,300. Compared to the

regular season, when the Warriors charged an average ticket price of $327 and the Cavs

charged an average price of $258, merely playing in the Finals has provided a tangible

financial benefit. In addition, teams that make the playoffs are awarded nearly $200,000

as a bonus, while teams that reach the conference finals are awarded over $380,000. The

team that finishes the regular season with the best record in the NBA and in each

conference is also awarded well over $300,000 as a bonus.

Area of Focus

This study focuses on the past five seasons in the NBA. In particular,

I plan to observe the characteristics of teams since the 2010-2011 season and determine

the inputs that have been significant in affecting regular season success in the past five

years of NBA basketball. Given the makeup of the last four title-winning teams, I expect

that accurate three-point shooting, assisted field goals, and three-point defense are

indicators of successful team basketball in the NBA. These team

characteristics are related to the style of play that teams consciously employ, but are also

a result of the type of players that each team has available. As far as fixed team effects, I

expect that teams that build through the draft and free agency will have positive

relationships with winning percentage.

I plan to use an extensive list of independent variables in the model, including

variables measuring the output of the team, statistics measuring the style of play the team

utilizes (such as percentage of total shots from three-point range), and additional

variables measuring fixed team effects (such as average attendance and the conference of

the team). After running several regressions and determining significant variables

through the model, the models should be able to describe successful teams from the past

five years through style of play as well as the areas of focus that the team exhibits

through their statistics. After determining the significant inputs in the model, the results

should also be able to predict characteristics of successful teams in the future.

Literature Review

The article “Who is ‘Most Valuable’? Measuring the Player’s Production of Wins

in the National Basketball Association” by David Berri, published in Managerial and

Decision Economics (Berri, 1999), focused on linking individual performance to team wins in the

NBA. Berri found that although having the MVP certainly helps to produce wins, having

multiple efficient and productive teammates – particularly in the Playoffs – was the key

factor in the 1997-1998 season. The model that Berri used to determine each player’s

production of wins was:

Production of wins = (PM + TF + TDF – PA + TA) * total mins (2.1)

Berri first calculated each of the inputs individually, and then combined the factors into

the equation above. His inputs were per-minute player production (PM), per-minute team

tempo factor (TF), per-minute team defensive factor (TDF), average per-minute

production at position (PA), and average player’s per-minute production (TA). The

results that the model produced indicated that one dominant player per team is not

enough to have success against Playoff competition.
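As an illustration only, equation (2.1) can be written as a short function. The parameter names simply mirror the abbreviations above, and the input values in the usage line are hypothetical; they are not taken from Berri (1999).

```python
def production_of_wins(pm, tf, tdf, pa, ta, total_minutes):
    """Evaluate equation (2.1) for a single player.

    pm  : per-minute player production
    tf  : per-minute team tempo factor
    tdf : per-minute team defensive factor
    pa  : average per-minute production at the player's position
    ta  : average player's per-minute production
    """
    per_minute_wins = pm + tf + tdf - pa + ta
    return per_minute_wins * total_minutes


# Hypothetical inputs, chosen only to show the arithmetic.
example = production_of_wins(0.005, 0.001, 0.001, 0.004, 0.002, 3000)
```

Because the per-minute term is multiplied by total minutes, a player with modest per-minute production can still accumulate many wins by playing heavy minutes.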

Fiona Carmichael, Dennis Thomas, and Robert Ward’s article “Team

Performance: The Case of English Premiership Football” (Carmichael, Thomas, & Ward,

2000) utilized a linear production function where the individual match results were

determined by various input variables. However, the authors’ inputs differ from those in

the other studies used as background for this thesis in the types of independent

variables chosen. Team performance is still the

variable utilized as output in this scenario, but statistics such as difference in shots on

target, difference in percentage of all successful passes, difference in number of red

cards, difference in clearances, blocks, and interceptions, and the difference in

cumulative team goal differences before the game in question, as well as a number of

other statistics were categorized as inputs for the production function. The results found

that player skills such as accurate and efficient shooting and passing, as well as defensive

skills such as tackles, clearances, and blocks were all significant independent variables in

determining team performance in the Premiership.

José M. Sánchez Santos, Pablo Castellanos García, and Jesus A. Dopico Castro

used data from the Spanish league in their article “The Production Process in Basketball:

Empirical Evidence from the Spanish League” (Santos, Garcia, & Castro, 2006). The

authors found that factors such as home-court advantage, field goal and free throw

percentage, keeping turnovers and fouls in check, and defensive rebounds had the highest

marginal effects on the probability of winning a particular game. The authors used two

different models to estimate the probability of winning a game in the Spanish ACB

League. Their first model considered the statistics of the home team versus the away

team in each game in relative terms, finding that the home team won in nearly 62% of the

observed games and that the means for the home team in shooting percentage, assists, and

total rebounds were higher than those of the visiting team. Their second model specifically

analyzed the influence of home-court advantage on the probability of winning, and used

separate variables for the home team and the visitor. The second model wound up finding

similar significant results to their first model.

Nate Silver of ESPN’s FiveThirtyEight utilized analytic methods in his article

“Every NBA Team’s Chance of Winning A Title by 2019” (Silver, 2014), using current

performance of the team, average age of the team, and the talent level of the best player

on the team to estimate which team had the brightest outlook in the near future. Silver

found that the Golden State Warriors, Los Angeles Clippers, and the Cleveland Cavaliers

were the three teams with the highest probability to win at least one championship by

2019, based on the three factors listed above. When calculating the average age of the

team as well as measuring the talent level of the best player on the team, Silver used

projected wins added – a statistic based on a combination of Win Shares and Player

Efficiency Rating (PER) – to weight the team’s average age by performance to help

determine the relative age of the best players on each team, and to determine how many

projected wins each team’s best player accounted for.

Cobb-Douglas Production Functions

Thomas A. Zak, Cliff J. Huang and John J. Siegfried used a production function

in their article “Production Efficiency: The Case of Professional Basketball” (Zak,

Huang, & Siegfried, 1979). Using a Cobb-Douglas production function and data for

individual games during the 1976-77 NBA season, the authors formulated a production

frontier and estimated the impact of various inputs used in the production process. The

variables included in the model were ratio of field goal percentages, ratio of free throw

percentages, ratio of offensive and defensive rebounds, ratio of assists, ratio of personal

fouls, ratio of steals and blocks, ratio of turnovers, and a binary dummy variable for

location (home versus away games), while using the ratio of the final scores as the

dependent variable. The empirical results from their production function found that the

output was most responsive to field-goal percentage, free-throw percentage, and

rebounding. Other variables significantly affecting output were turnovers and personal

fouls. Based on their results, teams playing at home held an observed advantage over

their visiting opponents.

“An Empirical Estimation of a Production Function: The Case of Major League

Baseball” by Charles E. Zech (Zech, 1981) also uses a Cobb-Douglas production function

in order to estimate production of victories by a team in Major League Baseball. Zech’s

model used the major skills involved in baseball, such as batting average, home runs,

stolen bases, ratio of strikeouts to walks, total fielding chances, as well as years with the

same manager and manager win percentage, to describe team success in Major League

Baseball. Based on the results, hitting for average is by far the most important factor

contributing to team success, which contradicts the conventional wisdom that pitching is

the most important factor in baseball. The author then used the results to measure the

most valuable player (MVP) in the American League and National League in the MLB in

a particular year by empirically determining the value that each player brings to the team.

Zech used each player’s marginal product, calculated by computing each team’s batting

average, home runs, etc. without each player and using the values in the production

function to determine the number of expected victories the team would have

accomplished without the player. Using the difference between the two values as the

player’s marginal product, the model was able to determine which player added the most

wins to each team in the 1977 season.

Basketball Analytics

Basketball Analytics by Stephen Shea and Christopher Baker (Shea & Baker,

2013) provided insight into the rapidly growing world of basketball analytics and was

used as background research and knowledge for this thesis. Shea and Baker used

traditional statistics to create new stats that attempted to measure players and teams in new

and more effective ways. A notable statistic introduced in Basketball Analytics is

Offensive Efficiency (OE), measured as:

OE = (FG + A) / (FGA – ORB + A + TO). (2.2)

Offensive Efficiency is defined as a percentage variable because the formula produces a

higher result when made field goals, assists, and offensive rebounds are higher, while

missed field goals and turnovers bring the resulting value lower. Offensive Efficiency, as

well as total points and total assists, is used to create Efficient Offensive Production

(EOP). EOP is used to describe total offensive production but also accounts for efficiency

of the player or team.
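A minimal sketch of equation (2.2) is given below. The function and argument names are illustrative, and the box-score line in the example is hypothetical rather than drawn from Shea and Baker (2013).

```python
def offensive_efficiency(fg, ast, fga, orb, tov):
    """Offensive Efficiency as written in equation (2.2):
    OE = (FG + A) / (FGA - ORB + A + TO)."""
    denominator = fga - orb + ast + tov
    if denominator <= 0:
        raise ValueError("denominator of OE must be positive")
    return (fg + ast) / denominator


# Hypothetical team game line: 40 FG, 22 AST, 85 FGA, 10 ORB, 14 TO.
baseline = offensive_efficiency(fg=40, ast=22, fga=85, orb=10, tov=14)

# Same line with two extra offensive rebounds and two fewer turnovers.
improved = offensive_efficiency(fg=40, ast=22, fga=85, orb=12, tov=12)
```

Offensive rebounds enter the denominator with a negative sign and turnovers with a positive sign, so the second line produces a higher value than the first.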

Shea and Baker also introduced a defensive statistic that “accounts for defensive

contributions beyond blocks or steals” (Shea and Baker, 2013) called Defensive Stops

Gained (DSG). Developing statistics such as DSG is important, as defensive

statistics have traditionally been lacking compared to offensive statistics. DSG is

measured by using net effective field goal percentage, net offensive rebound percentage,

and net turnover percentage, and includes several positive per-game constants.

However, the most important statistic Shea and Baker introduced is Approximate

Value (AV). AV is calculated by adding together Defensive Points Saved (or Defensive

Stops Gained * 2) and Efficient Offensive Production in order to describe total

contribution to the team from each player. The resulting statistic is comparable to Player

Efficiency Rating (PER) and Wins Produced (WP) as the most complete measurement of

a player’s total performance.

Stephen Shea’s Basketball Analytics: Spatial Tracking (Shea, 2014) was also used

in this thesis as background knowledge and research. Using the new spatial player

tracking data collected by SportVU, Shea expands on his previous work and describes

new ways to measure performance. Shea shows that the most efficient regions on the

floor to shoot from are the corner three and the restricted area near the basket through

effective field goal percentage, and proves that catch and shoot attempts are far more

efficient than pull-up attempts. Shea was able to predict a team’s effective field goal

percentage and overall offensive efficiency through utilization of drives and catch and

shoot corner threes, indicating that drives and kick-outs to corner threes combined have a

positive effect on offensive efficiency (Shea, 2014).

Shea was also able to quantify the spacing in an offense and the stretch of a

defense, and showed the effects that both have in game situations. Using the Miami Heat

and San Antonio Spurs of 2013 and 2014 (both teams made the NBA Finals both

seasons) as examples, spacing was shown to be beneficial on the offensive end, as long as

efficient shooters were on the floor to draw the defense away from the basket. On the

other hand, stretching the defense proved to be disastrous for the defensive team, as

drawing defenders further from the rim allows the offense more space to perform drives

to the restricted area and other actions detrimental to the integrity of the defense.

Theoretical Framework

In this paper, the theoretical background is focused on the production frontier and

the corresponding production function. In terms of an economic or production theory

view, a basketball team can be compared to a competitive firm. Each team has a different

vision of what makes a team successful, and in this case, production can be seen as winning

percentage while the various statistics and variables take the place of the traditional

production inputs. The form of the function is the Cobb-Douglas production function:

$$Q = A X_1^{a} X_2^{b} \cdots X_n^{x} \tag{3.1}$$

where n is the number of variables involved in the production function, A is a positive

constant, and a, b, and x are the exponents of the function. The Cobb-Douglas form has

several advantages for this type of study, especially since the exponents give relevant

information concerning returns to scale. Breaking down the derivation of the Cobb-Douglas

form: if $Q_1 = A X_1^{a} X_2^{b}$, and the firm doubles the amount of both variables, the

enterprise produces

$$Q_2 = A (2X_1)^{a} (2X_2)^{b} = 2^{a+b} A X_1^{a} X_2^{b}.$$

Thus, the output increases by

$$\frac{Q_2}{Q_1} = \frac{2^{a+b} A X_1^{a} X_2^{b}}{A X_1^{a} X_2^{b}} = 2^{a+b}.$$

If a + b > 1, the firm (or team, in this case) is

experiencing increasing returns to scale. If the sum of the exponents is equivalent to 1,

the team is exhibiting constant returns to scale. Finally, if a + b < 1, the team is going

through decreasing returns to scale.
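The scaling argument can be checked numerically. The sketch below assumes a two-input function with arbitrary illustrative exponents (a = 0.6, b = 0.7); none of these values come from the estimates in this thesis.

```python
def cobb_douglas(x1, x2, A=1.0, a=0.6, b=0.7):
    """Two-input Cobb-Douglas output Q = A * x1**a * x2**b."""
    return A * x1 ** a * x2 ** b


def scale_ratio(x1, x2, factor=2.0, **params):
    """Ratio Q2/Q1 when both inputs are multiplied by `factor`."""
    scaled = cobb_douglas(factor * x1, factor * x2, **params)
    return scaled / cobb_douglas(x1, x2, **params)


def returns_to_scale(a, b, tol=1e-12):
    """Classify returns to scale from the sum of the exponents."""
    total = a + b
    if abs(total - 1.0) <= tol:
        return "constant"
    return "increasing" if total > 1.0 else "decreasing"


# Doubling both inputs multiplies output by 2**(a + b), whatever the input levels.
ratio = scale_ratio(3.0, 5.0, a=0.6, b=0.7)
```

The ratio does not depend on the starting input levels, which is why the sum of the exponents alone determines the returns to scale.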

Graph 1: Output and Marginal Product of Input X

[Graph 1: two panels, Output (Q) plotted against Input X and Marginal Product of Input X plotted against Input X]

Another appealing property of the Cobb-Douglas form is the marginal products.

In the Cobb-Douglas production function form, the marginal products of each input

depend on the levels of other inputs. If the function is still $Q_1 = A X_1^{a} X_2^{b}$, the marginal

product of X1 would be:

$$MP_{X_1} = \frac{\partial Q}{\partial X_1} = a A X_1^{a-1} X_2^{b}, \tag{3.2}$$

while the marginal product of X2 would be equivalent to

$$MP_{X_2} = \frac{\partial Q}{\partial X_2} = b A X_1^{a} X_2^{b-1}. \tag{3.3}$$

After being broken down into partial derivatives, we can see that the marginal product of

one input depends not only on the level of the input in question,

but also on the value of the other inputs. This is important for this particular study

because in basketball, the number of shots taken and subsequent field goals made by a

team depends on the number of opportunities to possess the ball through rebounds,

forcing turnovers, etc.
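The dependence of one input's marginal product on the level of the other input can be illustrated with a short sketch. The exponents (a = 0.4, b = 0.5) and input levels are arbitrary assumptions, and the analytic derivative used is the standard one, a * A * X1**(a - 1) * X2**b.

```python
def output(x1, x2, A=1.0, a=0.4, b=0.5):
    """Two-input Cobb-Douglas output Q = A * x1**a * x2**b."""
    return A * x1 ** a * x2 ** b


def marginal_product_x1(x1, x2, A=1.0, a=0.4, b=0.5):
    """Analytic partial derivative dQ/dX1 = a * A * x1**(a - 1) * x2**b."""
    return a * A * x1 ** (a - 1) * x2 ** b


def numeric_marginal_product_x1(x1, x2, h=1e-6, **params):
    """Central finite-difference approximation of dQ/dX1."""
    upper = output(x1 + h, x2, **params)
    lower = output(x1 - h, x2, **params)
    return (upper - lower) / (2 * h)


# Same level of X1, two different levels of X2.
mp_low_x2 = marginal_product_x1(10.0, 20.0)
mp_high_x2 = marginal_product_x1(10.0, 40.0)
```

With X1 held fixed, raising X2 raises the marginal product of X1, which is the interaction described in the paragraph above.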

The Cobb-Douglas production function also captures elasticity in a convenient

manner. Elasticity is defined as the percentage change in one variable in response to a

given percentage change in another variable while holding all other relevant variables

constant. In this particular form, the elasticity can also be interpreted as the exponents of

the respective inputs. For example, in a traditional Cobb-Douglas production function

$Q = A X_1^{a} X_2^{b}$, if exponent a = 0.2, a 1% increase in X1 would lead to approximately a

0.2% increase in output Q. Finally, the Cobb-Douglas form is a widely used specification

when dealing with production functions. This makes the Cobb-Douglas form familiar to

many and therefore relatively simple to interpret. For this model, the variables not

already categorized in percent values are transformed by the natural log in order to undo

the exponentiation of the Cobb-Douglas production function. This process allows the

exponents to be interpreted as regression coefficients, and also permits the coefficients to

be interpreted as the elasticity for each respective input. Using the natural log also

generates a function that is linear in the logged variables rather than multiplicative.
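A minimal sketch of the log transformation follows, using simulated noise-free data rather than the NBA data analyzed in this thesis. The helper names `ols` and `simulate_and_fit`, the true parameter values, and the pure-Python normal-equations solver are all assumptions made for illustration; an actual analysis would normally rely on a statistics package.

```python
import math
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for row in range(k):
            if row != col:
                factor = m[row][col]
                m[row] = [vr - factor * vc for vr, vc in zip(m[row], m[col])]
    return [m[i][k] for i in range(k)]


def simulate_and_fit(n=200, A=2.0, a=0.2, b=0.7, seed=0):
    """Generate Q = A * X1**a * X2**b and regress ln Q on ln X1 and ln X2."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0)
        q = A * x1 ** a * x2 ** b
        rows.append([1.0, math.log(x1), math.log(x2)])  # intercept, ln X1, ln X2
        y.append(math.log(q))
    intercept, a_hat, b_hat = ols(rows, y)
    return math.exp(intercept), a_hat, b_hat


A_hat, a_hat, b_hat = simulate_and_fit()
```

Because ln Q = ln A + a ln X1 + b ln X2, the slope coefficients of the log regression are the exponents themselves, and therefore the elasticities.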

Data and Methodology

For this project in particular, the functional form of the model will be:

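$$
\begin{aligned}
Q = X \,&(OR)^{a}(3P\%)^{b}(3PAPG)^{c}(PITPPG)^{d}(RPG)^{e}(APG)^{f}(FBPPG)^{g}(TOVPG)^{h}(FT\%)^{i}(FTAPG)^{j}(FG\%)^{k} \\
&(\%TS3)^{l}(\%3FGMA)^{m}(C3\%)^{n}(DR)^{o}(OFG\%)^{p}(O3P\%)^{q}(OTOVPG)^{r}(STLPG)^{s}(NR)^{t}(DRAFT)^{u} \\
&(TRADE)^{v}(FA)^{w}(FANS)^{x}(AGE)^{y}(CONF)^{z}\, U,
\end{aligned}
\tag{4.1}
$$

where: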

Q = regular season win percentage,

OR = offensive rating, or number of points scored per 100 possessions,

3P% = 3-point percentage,

3PAPG = 3-point attempts per game,

PITPPG = points in the paint per game,

RPG = rebounds per game,

APG = assists per game,

FBPPG = fast break points per game,

TOVPG = turnovers per game,

FT% = free-throw percentage,

FTAPG = free-throw attempts per game,

FG% = field-goal percentage,

%TS3 = percent of total shots from 3-point territory,

%3FGMA = percent of 3-point field goals made assisted by a teammate,

C3% = corner 3-point percentage,

DR = defensive rating, or number of points allowed per 100 possessions,

OFG% = opponent’s field-goal percentage,

O3P% = opponent’s 3-point percentage,

OTOVPG = opponent’s turnovers per game,

STLPG = steals per game,

NR = net rating, or the difference in offensive and defensive rating, or OR – DR,

DRAFT = number of players acquired through the draft or draft rights trade,

TRADE = number of players acquired through trade,

FA = number of players acquired through free agency,

FANS = average attendance per game,

AGE = average age of the team,

CONF = dummy variable describing the conference that each team plays in,

with X as a positive constant and U as the error term.

An extensive list of traditional, advanced, and shooting statistics for individual

players, teams, and opponents is tabulated and recorded by the NBA. The database on

NBA.com, as well as the data located on basketball-reference.com and

basketball.realgm.com, provided the data for this project. Data was gathered on the NBA

regular seasons from the 2010-11 season through the 2014-15 season, using per game

averages for the majority of statistics and percentages for shooting statistics.

In the model, I decided to use regular season win percentage as the indicator of

team success, or the output in the production function. Using win percentage rather than

other methods of capturing team success, such as the ratio of final scores or absolute

score differences, is important because it describes the success of the team over the

course of the entire season while showing the consistency of the team. Using win

percentage also accounts for a team’s playing style and does not differentiate between

high and low scoring teams.

Offensive, defensive, and net ratings are important statistics to tabulate, as they

also account for a team’s playing style. Offensive rating is equivalent to the number of

points scored per 100 possessions and defensive rating is the number of points allowed

per 100 possessions, which by definition does not differentiate between fast and slow

paced teams. Each statistic is a measurement of efficiency, and is referred to as

offensive/defensive efficiency by ESPN’s John Hollinger. Net rating is the difference

between offensive and defensive rating, and is the measurement of a team’s point

differential per 100 possessions. Over the 150 observations (30 teams over 5 seasons), the

minimum net rating was -15.5, posted by the historically bad Charlotte

Bobcats in the lockout-shortened 2012 season. The highest net rating over the last five

seasons was 11.4, posted in the 2015 season by the defending champion Golden

State Warriors.
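The per-100-possession ratings can be sketched as follows. The sketch assumes possession counts are supplied directly and that a team and its opponent use the same number of possessions; the point and possession totals are hypothetical.

```python
def rating(points, possessions):
    """Points per 100 possessions."""
    return 100.0 * points / possessions


def net_rating(points_for, points_against, possessions):
    """Offensive rating minus defensive rating (OR - DR)."""
    return rating(points_for, possessions) - rating(points_against, possessions)


# Hypothetical slow-paced and fast-paced teams with the same efficiency.
slow_team = net_rating(points_for=110, points_against=100, possessions=100)
fast_team = net_rating(points_for=132, points_against=120, possessions=120)
```

Both hypothetical teams post the same net rating even though the second plays 20% more possessions, which is the sense in which the ratings do not differentiate between fast and slow paced teams.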

Teams in the NBA have increasingly utilized the 3-point shot over the last five

seasons, as the average 3-point attempts per team have gone up in each successive year

from 18 attempts per game in 2011 to 22.4 per game in 2015. However, teams have been

shooting relatively similar percentages over the last five seasons, as league-wide averages

have stayed steady around 35% from beyond the 3-point line. As previously mentioned, I

expect that shooting an above-average 3-point percentage, shooting a high volume of 3-

point shots, and keeping opposing 3-point percentage down are all positive descriptors of

successful team performance. However, when running the model through the regression,

the coefficient estimate for 3-point percentage is expected to be the elasticity of non-

corner 3-point percentage, as corner 3-point percentage is also included as an input.

3-point percentage from the corner is also estimated to be an important factor in

determining win percentage, as the corner three is the shortest 3-point shot and therefore

the most efficient shot from 3-point range. Corner threes are 22 feet from the rim, while

3-point shots above the break around the rest of the arc are 23 feet, 9 inches. Compared to

the average 3-point percentage league-wide of 35%, the average corner 3-point

percentage over the last five years has been 38.5%.

Assisted 3-point shots are also expected to be an indicator of successful team

play, as a high number of assisted shots tend to lead to either catch-and-shoot 3-point

shots, open attempts, or both, and generally lead to higher percentage shots. The average

percent of 3-point shots that have been assisted by teammates over the last five years is

about 85%.

The mean for points in the paint has stayed steady over the last five seasons at

just over 41 points per game. Nevertheless, points in the paint are a strong indicator of a

successful offense as shots close to the basket are the most efficient and effective shots an

offense can generate on a consistent basis.

Rebounds and assists per game are traditional statistics and historically strong

indicators of team success. Offensive rebounds give teams extra possessions, while

rebounds on the defensive end help to end possessions and start offense. Assists are good

signs of ball movement on the offensive side and tend to lead to higher-percentage shots

for teammates. The average number of rebounds per game per team has increased slightly

throughout the five successive seasons in question from 41.4 to 43.3, indicating a slight

increase in pace – or possessions per game – during the same timespan. Meanwhile,

assists per game have had slight peaks and valleys over the last half-decade but have

remained steady at just under 22 assists per game.

Turnovers, opponent’s turnovers, steals, and fast-break points per game are

closely related and are very important in determining easy shots for teammates and for

the opposition as well. Although steals are closely correlated with forced turnovers – as a

steal constitutes a turnover – forced turnovers make up more than steals and also include

dead-ball turnovers. However, steals force a live-ball turnover and tend to lead to fast-

break points in transition. I anticipate that forcing turnovers, steals, and fast-break points

have a positive impact on win percentage. On the other hand, turnovers lead to the same

advantage for the opposition and should have a negative impact on team success.

I anticipate that free throw percentage and volume of free throw attempts are

positive indicators of team success, as drawing fouls places the opposition in foul trouble.

The average free-throw percentage over the last five seasons has been about 75%,

indicating that a team shooting two free throws on each of 100 possessions would score

about 150 points (100 × 2 × 0.75), an offensive rating of 150, which would be the highest offensive rating in history.

Field-goal shooting (offensively and defensively) is expected to have a positive

(or negative, in opposing field goal percentage) impact on win percentage. In a

hypothetical situation where all other variables are equal, a team that shoots a higher

percentage from the field than its opposition has a tangible advantage.

Although efficiency statistics such as true shooting percentage and effective field

goal percentage can be useful as well, the results can be skewed when using such

statistics. True shooting percentage and effective field goal percentage are composed of

other statistics (such as field goals, free throw percentage, field goals attempted, and three

point shots made) and when combined with the original statistics in the ordinary least

squares regression model, can bias the results. These types of statistics also do not

differentiate between teams that focus on scoring in the paint and teams that

shoot a high volume of outside shots. For example, if Team A shot 40 of 100 while

making 20 three-point shots and Team B shot 50 of 100 with zero three-pointers made,

both teams would end up with 100 points from the same number of attempts. Although

this scenario is highly improbable, similar situations can occur and the results have no

way of telling us the style of play that the team employs.
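The Team A and Team B example can be worked through in a few lines. The sketch uses the standard definition of effective field goal percentage, eFG% = (FGM + 0.5 × 3PM) / FGA; the function names are illustrative.

```python
def effective_fg_pct(fgm, three_pm, fga):
    """eFG% = (FGM + 0.5 * 3PM) / FGA."""
    return (fgm + 0.5 * three_pm) / fga


def points_from_field(fgm, three_pm):
    """Two points per made field goal plus one extra point per made three."""
    return 2 * fgm + three_pm


def share_of_points_from_three(fgm, three_pm):
    """Fraction of field-goal points that came from three-point shots."""
    return 3 * three_pm / points_from_field(fgm, three_pm)


# Team A: 40 of 100 with 20 threes. Team B: 50 of 100 with no threes.
efg_a = effective_fg_pct(fgm=40, three_pm=20, fga=100)
efg_b = effective_fg_pct(fgm=50, three_pm=0, fga=100)
three_share_a = share_of_points_from_three(fgm=40, three_pm=20)
three_share_b = share_of_points_from_three(fgm=50, three_pm=0)
```

Both teams score 100 points with an identical effective field goal percentage, yet one takes 60% of its field-goal points from three-point range and the other none, so the efficiency statistic alone cannot separate the two styles of play.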

The average attendance per game should have a positive relationship with regular

season winning percentage. However, this may not be a factor of causation but rather of

correlation, as teams that win tend to draw a larger crowd. On the other hand, teams

playing at home have historically held an advantage over their visiting opponents due to

“superior performance by the home team and not preferential treatment by officials”

(Zak, Huang, & Siegfried, 1979). Chicago has led the league in attendance in all of the

last five years at an average of nearly 22,000 per game and has ranked fourth in the

league in win percentage with an average of over 65% over the same span.

The average age is expected to have a positive relationship with success until a

certain point – around 29 – and then is expected to diminish as players age. The

championship teams over the last five seasons have had varying mixes of players

regarding roster composition. The LeBron James-led Miami Heat from 2011 to 2014

were built primarily through free agency, as the Heat had the most free agents in the

league when building rosters from 2011-2013 and ranked second and third in 2014 and

2015, respectively. On the other hand, the San Antonio Spurs have maintained a

consistent core of players acquired on draft day, ranking in the top 10 in the league in the

last five years.

Recent Trends in Basketball Statistics

The recent partnership between the NBA and SportVU has dramatically expanded

the range of statistics available to the public, as the system provides new precise data that

would not be possible to gather without the use of SportVU camera technology and

tracking software. As a result, basketball is experiencing a renaissance of sorts with data

and statistics. Prior to player tracking, capturing the dynamic movements within a

basketball game was nearly impossible due to the fluidity and complexity of actions that

take place on the floor. The camera technology that SportVU has now implemented in

every NBA arena follows the ball and every player on the court, providing real-time

player and ball positioning and utilizing advanced statistical algorithms to derive

previously unavailable statistics.

Prior to the NBA’s partnership with SportVU, the available statistics were particularly

lacking on the defensive side of the game, as steals and blocks have

historically been poor or neutral indicators of defensive ability. Although steals and

blocks are still important to track, as they effectively end an opponent’s offensive

possession and can potentially generate transition offense, placing importance on steals

and blocks can incentivize players to gamble rather than prevent their man from

getting to the basket or box out for defensive rebounds. As the main objective on the

defensive side of the ball is to prevent the opposition from getting open shots that lead to

made baskets, the player tracking system from SportVU now categorizes defensive

presence with statistics such as opponent’s contested field goal percentage and rim

protection, as well as keeping up with traditional statistics.

The NBA’s partnership with SportVU has revolutionized statistics for basketball,

as many new figures are now available to teams as well as the public. However, player

tracking data is only available from the 2013-2014 NBA season onward, as the SportVU

camera technology was only implemented into all NBA arenas in 2013. As this project is

dealing with data from the 2010-11 season through the 2014-15 season, this project will

unfortunately be devoid of player tracking statistics. However, future research can be

conducted with this new technology, as similar ideas can be used with player tracking

data to discover new areas of importance that have not been previously categorized.

Methodology

There are four models in place within the larger construct of this project. All four

models have regular season winning percentage from 2010-11 to 2014-15 as the

dependent variable. The first model is purely used as a reference to the succeeding

models, and consists of net rating as the only independent variable. The second model

will be constrained to the play style statistics, while the third model will consist of the

fixed team effects. The final model will tie both play style and team effects together in

order to gain a picture of a successful franchise, on and off the court.

The final three models will be run through OLS regression several times, using

F-tests in order to determine the significance of the independent variables. Multicollinearity

tests will also be performed, as the large set of interrelated variables suggests that

collinearity is potentially present within the inputs. Although multicollinearity

does not violate OLS assumptions or bias the coefficient estimates, acknowledging it if

present is important, as it inflates the standard errors and makes individual estimates unstable.
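
As a minimal sketch of one common multicollinearity diagnostic, the variance inflation factor (VIF) of an input can be computed by regressing that input on all of the others and taking 1 / (1 − R²). The code below uses NumPy and synthetic data; the function name and the data are assumptions for illustration, not the thesis dataset or the software actually used.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (observations in rows)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression of column j on an intercept and the other columns.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic illustration: column c is nearly a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)
```

In this toy case the two nearly identical columns receive very large VIFs while the unrelated column stays close to one.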

Regression Results and Analysis

After utilizing the natural logarithm to transform the variables that are not already

categorized as percent values (such as rebounds per game, points in the paint, average

attendance, etc.), each regression coefficient on a logged input can be interpreted as a

semi-elasticity: because the dependent variable is kept in levels, a one percent increase in

the input is associated with a change of roughly one hundredth of the coefficient in the

dependent variable. However, certain inputs (net rating, players acquired on draft day, and

players acquired via trade) contain at least one negative or zero value within the dataset.

Therefore, the original values for these inputs must be used instead of the logged form of

the variables, as the natural log of a negative number or zero is undefined.
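
The transformation rule described above can be sketched as follows. The helper name and the sample values are hypothetical and are used only to show the rule, not to reproduce the dataset.

```python
import math


def log_if_positive(values):
    """Log-transform a series only if every value is strictly positive.

    Returns (series, was_logged). Series containing zero or negative
    values are returned unchanged, since ln(x) is undefined for x <= 0.
    """
    if all(v > 0 for v in values):
        return [math.log(v) for v in values], True
    return list(values), False


# Hypothetical values for illustration only.
rebounds_per_game = [41.2, 43.5, 44.8]  # strictly positive, so logged
net_rating = [-3.1, 0.0, 5.2]           # contains zero and negatives, kept in levels

print(log_if_positive(rebounds_per_game))
print(log_if_positive(net_rating))
```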

The first model consists of winning percentage as the dependent variable and

net rating as the single independent variable. Net rating is an exceptionally good predictor

of winning, as it describes the difference between offensive and defensive efficiency.

This can also be seen as the difference between points scored per 100 possessions and

points allowed per 100 possessions. This first model will be used primarily as a reference

point for the ensuing models.

Table 1: Net Rating Regression Results

WP Coefficient: Std. Error: t-score: P-value: 95% CI:

NR 0.0293 0.0006 48.30 0.000 0.0281, 0.0305

X 0.4999 0.0031 159.53 0.000 0.4938, 0.5062

R-squared:

0.9403

Adjusted R2:

0.9399

Residual:

0.2181

Prob > F:

0.0000

Note: WP = winning percentage, NR = net rating, X = positive constant, CI = confidence interval

First, we look to the p-value of the F-test. As shown in Table 1, the p-value is

0.0000, indicating that the overall model is statistically significant. In the dataset, the

values for winning percentage are recorded to three decimal places (50% is recorded as

0.500, for example). As shown in Table 1, the coefficient for net rating is 0.0293. This

value indicates that for every point increase in net rating, winning percentage increases

by 2.93 percentage points (from 0.500 to 0.5293, to continue the example). The R-

squared value of the model is at 0.9403 and the adjusted R-squared value is 0.9399,

indicating that the model explains approximately 94 percent of the variability of the data

around the mean. As shown below, net rating and winning percentage have a very strong

positive relationship, and the data points fit the regression line remarkably well.

Graph 2: Net Rating vs. Win Percentage
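
As an illustration of how the Table 1 estimates can be used, the sketch below plugs a net rating into the fitted line. The intercept and slope are the Table 1 values; the 82-game season length is an assumption added here (the lockout-shortened 2011-12 season had fewer games), and the function names are illustrative.

```python
INTERCEPT = 0.4999  # constant X from Table 1
SLOPE = 0.0293      # net rating coefficient from Table 1
GAMES = 82          # assumption: standard 82-game regular season


def predicted_win_pct(net_rating):
    """Fitted winning percentage (0-1 scale) for a given net rating."""
    return INTERCEPT + SLOPE * net_rating


def predicted_wins(net_rating, games=GAMES):
    """Fitted winning percentage converted into wins over a season."""
    return predicted_win_pct(net_rating) * games


for nr in (-5.0, 0.0, 1.0, 5.0):
    print(nr, round(predicted_win_pct(nr), 4), round(predicted_wins(nr), 1))
```

Under these assumptions a team with a net rating of +5 is predicted to win about 53 games, while a team at zero is predicted to finish almost exactly at .500.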

The second model is constrained to the play style variables. The objective behind

keeping this particular model to the play style variables is to help define and describe

on-the-court success while setting aside the actions taking place behind the scenes. It is

important to differentiate the effects of play style variables from the fixed effects of the

franchise before combining the two in the final model, as teams can potentially have

success on the floor without having established fixed team variables in place. Another

reason for keeping this model constrained to play style inputs is to help determine if the

fixed team effects are important and significant to team success on the floor. This model

is attempting to describe the effect that coaching has on winning, as these play style

variables are mostly determined by coaching decisions.

Table 2: Play Style Regression Results

WP Coefficient: Std. Error: t-score: P-value: 95% CI:
3P% -0.0496 0.3046 -0.16 0.871 -0.6522, 0.5529

FT% 0.0386 0.1585 0.24 0.808 -0.2749, 0.3521

FG% -1.3520 0.7773 -1.74 0.084 -2.8899, 0.1859

%TS3 0.8745 0.5786 1.51 0.133 -0.2702, 2.0192

%3FGMA 0.1261 0.0974 1.29 0.198 -0.0665, 0.3187

C3P% -0.1051 0.1457 -0.72 0.472 -0.3932, 0.1831

OFG% -0.3627 0.6610 -0.55 0.584 -1.6705, 0.9450

O3FG% 0.1481 0.2876 0.51 0.607 -0.4209, 0.7171

logOR 3.5094 0.4504 7.79 0.000 2.6184, 4.4004

log3PAPG -0.2086 0.1310 -1.59 0.114 -0.4677, 0.0506

logPITPPG 0.1296 0.0697 1.85 0.066 -0.0088, 0.2669

logRPG -0.2308 0.2002 -1.15 0.251 -0.6269, 0.1653

logAPG 0.0648 0.0593 1.09 0.277 -0.0526, 0.1821

logFBPPG 0.0003 0.0220 0.01 0.990 -0.0432, 0.0437

logTOVPG 0.0179 0.0884 0.20 0.840 -0.1569, 0.1927

logFTAPG -0.0413 0.0509 -0.81 0.419 -0.1421, 0.0595

logDR -3.0305 0.3121 -9.71 0.000 -3.648, -2.4131

logOTOVPG -0.0786 0.1116 -0.70 0.482 -0.2994, 0.1421

logSTLPG -0.0173 0.0719 -0.24 0.811 -0.1595, 0.1249

X -0.1566 1.9754 -0.08 0.937 -4.0647, 3.7515

R-squared:

0.9470

Adjusted R2:

0.9393

Residual:

0.1936

Prob > F:

0.0000

Note: 3P% = 3-point percentage, FT% = free throw percentage, FG% = field goal percentage, %TS3 =

percent of total shots from 3, %3FGMA = percent of assisted 3-point field goals made, C3P% = corner 3-

point percentage, OFG% = opponent field goal percentage, O3FG% = opponent 3-point percentage, logOR

= natural log of offensive rating, log3PAPG = natural log of 3-point attempts per game, logPITPPG =

natural log of points in the paint per game, logRPG = natural log of rebounds per game, logAPG = natural

log of assists per game, logFBPPG = natural log of fast break points per game, logTOVPG = natural log of

turnovers per game, logFTAPG = natural log of free throw attempts per game, logDR = natural log of

defensive rating, logOTOVPG = natural log of opponent turnovers per game, logSTLPG = natural log of

steals per game

As shown in Table 2, the p-value of the F-test is 0.0000, indicating that the overall

model is statistically significant. Similar to the previous model, the R-squared and

adjusted R-squared values are 0.9470 and 0.9393, respectively, indicating that the data

fits the regression line and that the model explains much of the variability around the

mean. Despite this fit, most of the variables have P-values above 0.05, indicating that

they are not significant at the 95% level. The natural log of offensive rating and the

natural log of defensive rating are the only two inputs that are significant to winning

percentage in this second model, and are statistics that model efficient offenses and

defenses. However, many of the other inputs involved in the model are pieces that are

needed to have high offensive ratings and low defensive ratings. Therefore, if we break

down the play style variables into offensive and defensive categories, the resulting

models should yield significant variables describing offensive and defensive rating.

Table 2.1: Offensive Rating Regression Results

OR Coefficient: Std. Error: t-score: P-value: 95% CI:

3P% 34.3029 5.8727 5.84 0.000 22.6894, 45.9166

FT% 21.8690 2.7661 7.91 0.000 16.3989, 27.3392

FG% 129.4964 8.1827 15.83 0.000 113.3145, 145.6782

%TS3 24.7104 9.9754 2.48 0.014 4.9834, 44.4375

%3FGMA -1.8888 2.0296 -0.93 0.354 -5.9024, 2.1249

C3P% -5.8564 3.0983 -1.89 0.061 -11.9836, 0.2706

log3PAPG -0.0408 2.3165 -0.02 0.986 -4.6217, 4.5401

logPITPPG 5.6671 1.2556 4.51 0.000 3.1842, 8.1500

logRPG 9.2363 1.9745 4.68 0.000 5.3317, 13.1410

logAPG -3.1921 1.2620 -2.53 0.013 -5.6877, -0.6965

logFBPPG 0.1197 0.4331 0.28 0.783 -0.7367, 0.9761

logTOVPG -15.5971 1.1179 -13.95 0.000 -17.8078, -13.3865

logFTAPG 7.8618 0.8213 9.57 0.000 6.2377, 9.4859

X -15.2168 8.3950 -1.81 0.072 -31.8183, 1.3848

R-squared:

0.9467

Adjusted R2:

0.9416

Residual:

98.0228

Prob > F:

0.0000

The p-value of the F-test is 0.0000, signifying that the overall model is

statistically significant. The R-squared and adjusted R-squared values have stayed

relatively similar to Table 2, indicating that the model still explains much of the

variability around the regression mean. Unlike in Table 2, many of the inputs in Table 2.1 have

p-values less than 0.05, indicating that these variables are statistically significant at the 95%

level. The coefficient value for each variable indicates the estimated change

in offensive rating given a one unit change in that variable (or, for the variables

transformed by the natural log, roughly one hundredth of the coefficient for a one percent change),

given that all other variables in the model are held constant.
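
As a rough sketch of how the Table 2.1 coefficients translate into changes in offensive rating, the code below assumes that offensive rating enters the regression in levels (the table labels the dependent variable OR rather than logOR) and that percentage inputs are stored on a 0 to 1 scale. The function names are illustrative.

```python
import math


def level_log_effect(beta, pct_change):
    """Change in y when x rises by pct_change percent, for y = a + beta * ln(x)."""
    return beta * math.log(1.0 + pct_change / 100.0)


def level_level_effect(beta, unit_change):
    """Change in y when x rises by unit_change, for y = a + beta * x."""
    return beta * unit_change


# Coefficients taken from Table 2.1.
print(level_log_effect(5.6671, 1))       # +1% points in the paint: about +0.056 rating points
print(level_log_effect(9.2363, 1))       # +1% rebounds per game: about +0.092 rating points
print(level_log_effect(-15.5971, 1))     # +1% turnovers per game: about -0.155 rating points
print(level_level_effect(129.4964, 0.01))  # +1 percentage point of FG%: about +1.29 rating points
```

Read this way, a one percent change in a logged input moves offensive rating by about one hundredth of its coefficient, not by the coefficient itself.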

Percent of 3-pointers made from an assist has a negative coefficient estimate and

a p-value of 0.354, and is therefore insignificant to offensive rating. This result is

somewhat unexpected, as assisted three-point makes should theoretically lead to

uncontested three-point attempts, which tend to be a mark of a good offense. Perhaps a

better variable to tabulate in future research would have been the number of uncontested

three-point attempts a team is able to generate per game, as this new variable would be

able to describe both ball movement and spacing of an offense. However, this type of

variable is produced through SportVU’s player tracking system (explained in Data and

Methodology) and is unavailable for the entire period of this study.

Three-point attempts per game are also insignificant to offensive rating, as the

natural log of 3-point attempts per game has a p-value of 0.986. This result

can be explained, as bad offenses are liable to take three-point attempts at a

low-percentage, high-volume rate, while teams with a dominant inside presence can establish

a good offense without a high volume of three-point attempts. An example of a bad

offense that takes a high volume of low percentage three-point attempts is the 2015

Philadelphia 76ers. Although Philadelphia has been accused on numerous occasions of

tanking during the present in order to build for the future, the fact remains that Philly had

the 11th highest rate of three-point attempts per game over all teams during the last five

seasons at over 26 attempts per game while generating a paltry 93 offensive rating, good

for second worst over the same time span. On the other hand, the Memphis Grizzlies of

2011 attempted only 11.3 three-pointers per game, which was the lowest mark over the

last five seasons. However, the twin towers of Marc Gasol and Zach Randolph helped

lead Memphis to an offensive rating of 104.4 during the 2011 season, an above-average

mark over the last five seasons. In fact, Memphis has only averaged 13.4 three-point

attempts per game over the given time frame – lowest in the league – while maintaining a

league-average offensive rating.

Fast break points are also an insignificant input to offensive rating, as the p-value

is 0.783. This result can be explained through each team’s preference of style, as the

Denver Nuggets, Houston Rockets and Golden State Warriors have combined for 7 of the

15 highest totals in fast break points. On the other hand, the New York Knicks and

Brooklyn Nets have combined for 6 of the 10 lowest totals in fast break points per game

over the last five seasons, all to varying degrees of success.

Corner three-point percentage is both negative and insignificant at the 95% level

(albeit just barely, as the p-value is 0.061), which is the most unexpected result obtained

from Table 2.1. A potential explanation for the input’s lack of significance to offensive

rating is that shooting percentage from the corner does not account for the number of

attempts, particularly open attempts, that an offense is able to generate. In hindsight, a

variable that could potentially have more success predicting offensive rating is the total

number of corner threes and uncontested corner threes that a team attempts per game.

The total number of attempts a team is able to produce from the corner should be

indicative of an offense that spreads the floor and moves the ball, as good defenses tend

to focus on defending the corner three.

Non-corner three-point percentage, field goal percentage, and free throw

percentage are all significant at the 99.9% level and all three inputs have highly positive

coefficient estimates. The coefficient results indicate that shooting percentages have very

strong influences on offensive rating, with field goal percentage being the best indicator.

This is not a new result, as good offenses are historically dependent on being able to

score at an efficient percentage from the field and from the free throw line.

Percent of total shots attempted from three-point range has a positive coefficient

of 24.7, indicating that teams that place greater emphasis on three-point attempts in their

shot selection tend to be associated with a more efficient offense. Compared to total

attempts per game from three, this input is achieved through a team’s gameplan and

accounts for the pace and total shot attempts. The variable is also statistically significant

at the 98.6% level, as the p-value is 0.014.

The natural log of points in the paint has a positive coefficient estimate of 5.7,

indicating that for every one percent increase in points in the paint, offensive

rating is estimated to increase by approximately 0.057 points. This result makes sense from a

basketball perspective, as points in the paint are the closest shots to the basket a team can

produce and are therefore the most efficient. The variable is also statistically significant

at the 99.9% level.

Rebounds per game also have a strong positive connection with offensive rating,

as the coefficient for the natural log of rebounds per game is 9.2. The result denotes

a strong association between rebounds and offensive efficiency, as a one percent

increase in rebounds per game is associated with an increase of roughly 0.092 points in offensive rating.

The coefficient for the natural log of assists per game is negative, which is an

unanticipated result. Theoretically, assists would be presumed to have a positive link to

offensive rating, as assists tend to create high percentage shot attempts for teammates.

However, assists have been a thorn in the side of the analytics movement in basketball, as

assist percentage (percentage of field goals assisted) has historically had little influence

on field goal percentage (Ziller, 2013). Conceptually, assists should have a significant

and strong positive effect on a team’s shooting percentage.

Table 2.1a: Field Goal Percentage vs. Assists

FGP Coefficient: Std. Error: t-score: P-value: 95% CI:

logAPG 0.1013 0.0152 6.65 0.000 0.0712, 0.1314

X 0.1410 0.0469 3.01 0.003 0.0483, 0.2337

R-squared:

0.2298

Adjusted R2:

0.2246

Residual:

0.0285

Prob > F:

0.0000

Although assists are significant to field goal percentage, the coefficient estimate

for the natural log of assists per game is 0.1. This result indicates that for every one percent

increase in assists per game, field goal percentage increases by roughly 0.1 percentage points. While the result is

indeed positive, the effect is minimal.

Turnovers per game have a strong negative association with offensive rating, and

the variable is also highly significant. This result is to be expected, as turnovers result in a

loss of possession and forfeit a shot attempt. Free throw attempts per game, on the other

hand, have a strong positive association with offensive rating and are also highly significant.

Teams that attempt a high number of free throws per game are troublesome for opposing

defenses to deal with, as free throws place opposing teams in foul trouble, which tends to

limit minutes of the guilty players. Free throws are also a very efficient source of offense,

as the league average has hovered around 75% over the last five seasons.

Table 2.2: Defensive Rating Regression Results

DR Coefficient: Std. Error: t-score: P-value: 95% CI:

OFG% 190.9438 9.2621 20.62 0.000 172.6366, 209.251

O3P% 25.8783 8.2924 3.12 0.002 9.4877, 42.2688

logRPG 1.4735 2.5516 0.58 0.564 -3.5698, 6.5169

logOTOVPG -7.9471 2.9019 -2.72 0.007 -13.6830, -2.2111

logSTLPG 0.6417 1.9624 0.33 0.744 -3.2374, 4.5203

X 25.8060 7.5349 3.42 0.001 10.9127, 40.6993

R-squared:

0.8643

Adjusted R2:

0.8596

Residual:

210.9558

Prob > F:

0.0000

Table 2.2 models defensive rating as the dependent variable using defensive

statistics. The p-value of the F-test is 0.0000, indicating that the overall model is

statistically significant. The R-squared value is 0.8643, which is lower than the values for

net rating and offensive rating but is still a relatively high value. The adjusted R-squared

is 0.8596, meaning that approximately 86% of the variability of defensive rating is

accounted for by the model, even after taking the number of predictor variables in the

model into account.
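
The adjustment for the number of predictors can be reproduced with the standard adjusted R-squared formula, 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below assumes n = 150 observations for every model (the count reported in Table 4.2) and takes k as the number of inputs listed in each table, excluding the constant; the function name is illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (constant excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_OBS = 150  # assumption: observation count from Table 4.2 applies to all models

# (table, reported R-squared, number of predictors counted from the table rows)
models = [
    ("Table 1", 0.9403, 1),
    ("Table 2.2", 0.8643, 5),
    ("Table 3", 0.4755, 6),
    ("Table 4", 0.9055, 22),
]

for name, r2, k in models:
    print(name, round(adjusted_r_squared(r2, N_OBS, k), 4))
```

Under these assumptions the computed values agree with the adjusted R-squared figures reported in the tables to four decimal places.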

Opponent field goal percentage and opponent three-point percentage are the two

strongest predictors of defensive rating, and are both statistically significant at the 95%

level or higher. These two results, compared to steals per game – which is statistically

insignificant – imply that limiting high-percentage shots and keeping the shooting

percentages of opposing teams low is more important to a stingy defense than steals.

As shown above, rebounds per game are not significant statistically. Although the

coefficient is positive, the effect rebounds have on defensive rating is negligible. The

variable for rebounds is total rebounds, and is therefore used in both offensive rating and

defensive rating regression models.

Blocks were not used in this study, as the traditional statistic that groups all

blocks into one category has generally been a poor indicator of defensive prowess.

Blocks do not account for a team’s shooting percentage at the rim, nor do they account

for the distance from the basket of the blocked shots attempted. Blocks are also assumed

to begin transition offense or guarantee possession for the team doing the blocking, but

according to Nylon Calculus, “57.2% of all blocked shots were recovered by the defense”

(Willard, 2015). However, expanding blocks into multiple new statistics has the potential

to be a strong descriptor of effective defense with future research.

The third model is constrained to the fixed team effects. The objective behind

keeping this model to the fixed team effects is to determine if play style variables are

important and significant to successful team basketball in the NBA. The question that this

model is attempting to answer is: does the coach or the general manager have a higher

level of accountability when it comes to success on the floor? This third model will

attempt to answer this question by categorizing variables that occur off the court and

determine if these inputs are significant to regular season winning percentage.

Table 3: Fixed Team Effects Regression Results

WP Coefficient: Std. Error: t-score: P-value: 95% CI:
CONF 0.0435 0.0194 2.24 0.026 0.0052, 0.0819

DRAFT 0.0501 0.0165 3.04 0.003 0.0176, 0.0827

TRADE 0.0327 0.0161 2.03 0.044 0.0009, 0.0646

FA 0.0309 0.0159 1.96 0.053 -0.0003, 0.0623

logFANS 0.3216 0.0919 3.50 0.001 0.1399, 0.5031

logAGE 1.1451 0.1821 6.29 0.000 0.7852, 1.5051

X -7.0172 0.8785 -7.99 0.000 -8.7537, -5.2807

R-squared:

0.4755

Adjusted R2:

0.4535

Residual:

1.9170

Prob > F:

0.0000

Note: CONF = dummy variable (0 for eastern conference, 1 for western conference), DRAFT = number of

players acquired on draft day, TRADE = number of players acquired via trade, FA = number of players

acquired via free agency, logFANS = natural log of average attendance per game, logAGE = natural log of

average age of team

As shown above, the p-value of the F-test is 0.0000, indicating that the overall

model is statistically significant. The R-squared and adjusted R-squared values are

0.4755 and 0.4535 respectively, signifying that the model explains between 45% and 48% of the

variability around the mean. Although the R-squared values are approximately half of those of the

previous two models, each input is statistically significant at the 95%

level (excluding FA, which is significant at the 94.7% level). The coefficient for

conference shows that Western Conference teams on average perform at a higher level

than their Eastern Conference counterparts. Table 3.1 also shows this result using basic

descriptive statistics:

Table 3.1: Summarization of Win Percentage by Conference

Eastern Conference:

Variable: # of Obs. Mean Std. Dev. Min. Max.

WP 75 0.4681 0.1580 0.106 0.805

Western Conference:

Variable: # of Obs. Mean Std. Dev. Min. Max.

WP 75 0.5318 0.1496 0.195 0.817

Shifting the focus back to Table 3, the coefficient value for players acquired on

draft day is the highest amongst the roster composition variables, and is estimated at

0.0501. This result indicates that for every additional player on the roster originally

acquired on the respective player’s draft day – whether the player was drafted by his

current team or if the team traded for his draft rights – winning percentage is estimated to

increase by 5.01 percentage points (from 0.500 to 0.5501, or 50% to 55.01%).

Conceptually, this makes sense from a basketball perspective, as drafting

and developing a higher number of players allows for a higher level of continuity and

familiarity between the players and within each team’s respective system.

The San Antonio Spurs are an excellent example of building through the draft and

having continued success. Over the years, the Spurs have made smart draft night

decisions and acquired players such as Tim Duncan, Manu Ginobili, Tony Parker, and

Kawhi Leonard, and have been the most successful team over the last five seasons,

winning over 72% of their games. The number of players acquired through free agency

and the number of players acquired via trade also both have positive coefficient estimates

and are relatively similar in value, but neither variable affects winning percentage at the

same rate as players acquired through the draft.

The coefficient estimate for the natural log of average attendance is 0.3216,

indicating that for every one percent increase in attendance, winning percentage is estimated

to increase by roughly 0.32 percentage points. However, based on the results, the model is unable to specify whether a

higher winning percentage causes a higher average attendance or whether the two

variables are merely positively correlated.

The natural log of age has a coefficient estimate of 1.1451, indicating that when the

average age of a team increases by 1 percent, the winning percentage of the team is

expected to increase by roughly 1.15 percentage points. This result also makes theoretical sense on a

basketball level, as teams with more veterans are more experienced and are more likely to

be competing for a championship. However, the resulting estimate is not indicating that a

team exclusively made up of old veteran players is expected to be more successful. The

span of average ages over the last five years ranges from 23.2 to 31.3, so the coefficient is

estimating success within this range.

The final model is a combination of play style variables and fixed team effects.

Combining the play style and team effects variables should provide insight into the

structure and focus of successful franchises. This final model is attempting to describe

how the front office, coaching staff, and roster collaborate to result in accomplishments

on the court.

Table 4: Play Style and Fixed Team Effects Regression Results

WP Coefficient: Std. Error: t-score: P-value: 95% CI:
3P% 1.1469 0.3729 3.08 0.003 0.4089, 1.8847

FT% 0.7179 0.1787 4.02 0.000 0.3644, 1.0715

FG% 3.2704 0.6178 5.29 0.000 2.0481, 4.4929

%TS3 3.8434 0.7135 5.39 0.000 2.4316, 5.2552

%3FGMA 0.2354 0.1307 1.80 0.074 -0.0232, 0.4940

C3P% -0.3448 0.1969 -1.75 0.082 -0.7345, 0.0449

OFG% -2.9839 0.5703 -5.23 0.000 -4.1125, -1.8554

O3P% -0.6903 0.3859 -1.79 0.076 -1.4539, 0.0734

log3PAPG -0.7743 0.1676 -4.62 0.000 -1.1061, -0.4426

logPITPPG 0.1839 0.0969 1.90 0.060 -0.0079, 0.3758

logRPG 0.9821 0.2213 4.44 0.000 0.5442, 1.4199

logAPG -0.0531 0.0806 -0.66 0.511 -0.2125, 0.1064

logFBPPG -0.0197 0.0314 -0.63 0.531 -0.0818, 0.0424

logTOVPG -0.5454 0.0738 -7.39 0.000 -0.6914, -0.3994

logFTAPG 0.1053 0.0546 1.93 0.056 -0.0027, 0.2133

logOTOVPG 0.3979 0.1425 2.79 0.006 0.1159, 0.6799

logSTLPG -0.0470 0.0985 -0.48 0.634 -0.2419, 0.1478

logFANS 0.0436 0.0471 0.92 0.357 -0.0496, 0.1368

logAGE 0.3763 0.0999 3.77 0.000 0.1787, 0.5739

DRAFT 0.0163 0.0081 2.00 0.047 0.0002, 0.0324

TRADE 0.0076 0.0078 0.98 0.329 -0.0078, 0.0230

FA 0.0071 0.0076 0.94 0.350 -0.0079, 0.0220

X -4.8313 1.048 -4.61 0.000 -6.9068, -2.7559

R-squared:

0.9055

Adjusted R2:

0.8891

Residual:

0.3455

Prob > F:

0.0000

The p-value of the F-test is 0.0000, indicating that the overall model is

statistically significant. The R-squared and adjusted R-squared values are 0.9055 and

0.8891 respectively, which indicates that the model explains approximately 90% of the

variability around the regression mean. The coefficient value for the constant X

describes the predicted value if the inputs all equal zero, which has little practical

meaning for this particular model.

In order to compare the relative strength of each input to one another, we can take

the beta scores of each variable. The beta scores are measured in standard deviations

rather than the units of the original variables, which allows the relative strength of each

variable within the model to be compared. The beta scores are essentially the regression

coefficients that would result if the output and inputs were transformed into standard scores, or z-scores. As

shown in Table 4.1, the variable with the highest positive coefficient estimate and beta

coefficient is the percent of total shots taken that were three-point attempts. However, the

input with the most negative beta coefficient is the natural log of three-point attempts

per game.
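
The link between the raw coefficients and the beta scores can be sketched as beta = b × (SD of input) / (SD of output). The code below applies this to three inputs, using the coefficients from Table 4 and the standard deviations from Table 4.2; the function name is illustrative, and small differences from Table 4.1 are expected because the published inputs are rounded.

```python
def beta_coefficient(b, sd_x, sd_y):
    """Standardized (beta) coefficient from a raw coefficient and standard deviations."""
    return b * sd_x / sd_y


SD_WP = 0.1566  # standard deviation of winning percentage (Table 4.2)

# input: (raw coefficient from Table 4, standard deviation from Table 4.2)
inputs = {
    "3P%": (1.1469, 0.0201),
    "%TS3": (3.8434, 0.0465),
    "log3PAPG": (-0.7743, 0.2005),
}

betas = {name: beta_coefficient(b, sd, SD_WP) for name, (b, sd) in inputs.items()}
for name, beta in betas.items():
    print(name, round(beta, 4))
```

The recomputed values land within rounding error of the Table 4.1 entries (0.1468, 1.1412, and -0.9914).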

Although the results for the highest and lowest coefficient values are initially

confusing, this can potentially be explained by the shot selection of each team.

Conceptually, a team taking a higher number of three-point attempts is more likely to be

taking a higher percentage of lower quality shots, as a higher number of attempts signals

that the team is playing at a faster pace (or number of possessions per game) and is taking

a higher number of total shot attempts. High pace rankings in the past have not translated

to success in the playoffs, as only the 1981-82 Los Angeles Lakers and the 2014-15

Golden State Warriors have ranked in the top five in pace since 1978 and won the

championship. A team that takes a higher number of three-point attempts each game is

also likely to be trailing in the overall score of the game and is attempting to make a

comeback. However, a team that incorporates a higher percentage of threes into the total

shots taken per game is theoretically not rushing to take quick shots but instead is

choosing to focus on three-point attempts, rather than midrange shots, which have

historically been less efficient.

Below, Table 4.1 describes each independent variable and the associated beta

coefficient. As mentioned previously, the coefficients are measured in standard

deviations, which allows the variables to be compared amongst one another.

Table 4.1: Play Style and Fixed Team Effects Beta Coefficients

WP Beta coefficients:

3P% 0.1468

FT% 0.1345

FG% 0.3291

%TS3 1.1412

%3FGMA 0.0612

C3P% -0.0660

OFG% -0.2725

O3P% -0.0666

log3PAPG -0.9914

logPITPPG 0.1071

logRPG 0.2877

logAPG -0.0253

logFBPPG -0.0265

logTOVPG -0.2463

logFTAPG 0.0720

logOTOVPG 0.2072

logSTLPG -0.0350

logFANS 0.0336

logAGE 0.1646

DRAFT 0.1971

TRADE 0.1018

FA 0.0931

Since the beta coefficients are measured in standard deviations, a one-standard-deviation

increase in three-point percentage is associated with an estimated increase of 0.1468

standard deviations in win percentage (as shown in Table 4.1). To translate these

standardized effects into actual units, Table 4.2 lists the statistics used in the combined

model of play style variables and fixed team effects and shows the mean and standard

deviation of each input as well as the output. The actual effect implied by each beta

coefficient follows from the values in Tables 4.1 and 4.2 together. As shown, one standard

deviation in three-point percentage is equivalent to 0.0201, or approximately two

percentage points. Therefore, when three-point percentage increases by two percentage

points, winning percentage is expected to increase by 0.1468 standard deviations, or

14.68 percent of a standard deviation. Since the standard deviation of winning percentage

is 0.1566, or 15.66 percentage points, a one-standard-deviation increase in three-point

percentage of about two percentage points is expected to raise winning percentage by

0.1468 × 0.1566 ≈ 0.023, or 2.3 percentage points.
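
The conversion from a beta coefficient to actual units can be sketched in a few lines. The example below uses the three-point percentage values from Tables 4.1 and 4.2; the function names are illustrative assumptions, and the raw-slope formula assumes the beta coefficients were obtained by standardizing both the input and the output in the usual way.

```python
# Values transcribed from Tables 4.1 and 4.2.
BETA_3P = 0.1468   # beta (standardized) coefficient on 3P%
SD_3P = 0.0201     # standard deviation of 3P%
SD_WP = 0.1566     # standard deviation of win percentage (WP)


def effect_of_one_sd(beta, sd_y):
    """Change in the output, in raw units, for a one-SD increase in the input."""
    return beta * sd_y


def raw_slope(beta, sd_x, sd_y):
    """Unstandardized slope implied by a beta coefficient: b = beta * sd_y / sd_x."""
    return beta * sd_y / sd_x


one_sd_effect = effect_of_one_sd(BETA_3P, SD_WP)
slope = raw_slope(BETA_3P, SD_3P, SD_WP)

print(f"One-SD rise in 3P% changes WP by {one_sd_effect:.4f}")
print(f"Implied change in WP per unit of 3P%: {slope:.3f}")
```

Under these assumptions, the one-standard-deviation effect is about 0.023, which matches the 2.3 percentage point figure in the text.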

Table 4.2: Summarization of Inputs and Output

Variable: # of Obs. Mean Std. Dev. Min Max

WP 150 0.5001 0.1566 0.106 0.817

3P% 150 0.3538 0.0201 0.295 0.403

FT% 150 0.7554 0.0293 0.66 0.828

FG% 150 0.4526 0.0158 0.408 0.501

%TS3 150 0.2437 0.0465 0.136 0.392

%3FGMA 150 0.8476 0.0407 0.715 0.934

C3P% 150 0.3848 0.0299 0.319 0.467

OFG% 150 0.4525 0.0143 0.419 0.487

O3P% 150 0.3548 0.0151 0.308 0.411

log3PAPG 150 2.9792 0.2005 2.4248 3.4874

logPITPPG 150 3.7266 0.0911 3.5086 4.0604

logRPG 150 3.7449 0.0459 3.6082 3.8607

logAPG 150 3.0758 0.0746 2.9178 3.3105

logFBPPG 150 2.5728 0.2106 2.1282 3.0397

logTOVPG 150 2.6700 0.0707 2.4159 2.8736

logFTAPG 150 3.1338 0.1071 2.8094 3.4372

logOTOVPG 150 2.6691 0.0816 2.4248 2.8565

logSTLPG 150 2.0272 0.1165 1.7047 2.2618

logFANS 150 9.7595 0.1207 9.5079 10.0061

logAGE 150 3.2789 0.0685 3.1442 3.4436

DRAFT 150 5.0467 1.8943 0 10

TRADE 150 4.4600 2.0939 0 11

FA 150 5.1600 2.0598 1 10

Below in Table 4.3, the variance inflation factors (VIFs) of each input are

listed in descending order. VIFs measure the degree of multicollinearity among the

regression variables. Multicollinearity can be a problem, particularly when the number of

inputs is high (as in this model), since independent variables that are closely related

inflate the standard errors of the individual coefficient estimates and make them less

precise. VIFs above 10 are commonly taken to indicate serious multicollinearity, while a

tolerance (1/VIF) close to one means that collinearity is not an issue. As shown below,

the number of three-point attempts and the percentage of total shots taken from three-point

territory are very closely related and are likely affected by multicollinearity. The numbers

of players acquired through trades, free agency, and the draft are also potentially affected

by multicollinearity.

Although multicollinearity is clearly present in this model, the collinear variables

each contribute individual value to the overall model and will therefore remain in it.

Although multicollinearity remains a concern for regression models such as this one,

imperfect multicollinearity does not violate the Ordinary Least Squares assumptions: it

does not affect the overall fit of the model, nor does it result in inadequate prediction

estimates.

Table 4.3: Play Style and Fixed Team Effects Variance Inflation Factors (VIFs)

Variable: VIF: Tolerance (1/VIF):

log3PAPG 61.89 0.016158

%TS3 60.30 0.016585

TRADE 14.54 0.068771

FA 13.26 0.075431

DRAFT 13.02 0.076787

logTOVPG 7.40 0.135161

logSTLPG 7.21 0.138732

logRPG 5.65 0.177144

FG% 5.19 0.192593

logPITPPG 4.27 0.233973

OFG% 3.64 0.274424

3P% 3.06 0.326633

logAGE 2.56 0.390164

logFBPPG 2.39 0.418371

logAPG 1.98 0.505467

C3P% 1.91 0.523245

logFTAPG 1.87 0.534404

O3P% 1.87 0.536135

logFANS 1.77 0.564224

%3FGMA 1.55 0.645506

FT% 1.51 0.664381

logOTOVPG 1.49 0.670415
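
The relationship between the two columns of Table 4.3 can be illustrated with a short sketch. It relies on the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one input on all of the others; the helper names and the choice to show only the six largest VIFs are assumptions made for illustration. Tolerances recomputed from the rounded VIFs may differ from the table in the last decimal place.

```python
# Six largest VIFs transcribed from Table 4.3.
VIFS = {
    "log3PAPG": 61.89, "%TS3": 60.30, "TRADE": 14.54,
    "FA": 13.26, "DRAFT": 13.02, "logTOVPG": 7.40,
}

VIF_THRESHOLD = 10  # rule of thumb cited in the text


def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the VIF."""
    return 1.0 / vif


def r_squared_from_vif(vif):
    """Invert VIF = 1 / (1 - R^2) to recover the auxiliary-regression R^2."""
    return 1.0 - 1.0 / vif


def vif_from_r_squared(r2):
    """VIF implied by the R^2 of regressing one input on the others."""
    return 1.0 / (1.0 - r2)


# Inputs whose VIF exceeds the rule-of-thumb threshold.
flagged = [name for name, vif in VIFS.items() if vif > VIF_THRESHOLD]

for name, vif in VIFS.items():
    print(f"{name:>9}  VIF={vif:6.2f}  tolerance={tolerance_from_vif(vif):.6f}  "
          f"R^2={r_squared_from_vif(vif):.4f}")
print("Above threshold:", flagged)
```

Read this way, a VIF near 60 implies that roughly 98 percent of the variation in that input is explained by the other inputs in the model.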

Conclusion

As shown in Tables 4 and 4.1, the model found that independent variables with

the greatest statistically significant positive effect on win percentage are shooting

percentages (including field goal, three-point, and free throw percentages), percent of

total shots taken as three-point attempts, total rebounds, forcing turnovers, increased age

and experience (to a certain limit), and the number of players acquired through the draft.

On the other hand, the inputs with the most detrimental effects on winning are

opponent field goal percentage, total three-point attempts per game, and turnovers.

To assess whether these effects translate to success in the playoffs, each input that

significantly affected regular season win percentage is compared against the teams that

ranked highest in that statistic over the last five seasons. Three-point percentage

translates very well to playoff success, as three of the top six teams over the last five

seasons are the 2015 Golden State Warriors (2), 2014 San Antonio Spurs (3), and the 2013

Miami Heat (6). All three of these teams were eventually crowned NBA Champions in their

respective seasons. Field goal percentage

is also crucial to playoff success, as six of the top ten teams were NBA Finalists in their

respective seasons. All ten of the top ten teams also finished with a win percentage of

65% or above. The top ten in free throw percentage is dominated by the Oklahoma City

Thunder teams from 2013, 2011, 2014, and 2012 and the Portland Trail Blazers of 2014,

2011, 2015, and 2012. Although neither of the two teams has won a title within this span,

both have posted strong regular season records: OKC has averaged a win percentage of

over 70%, for an average record of 58 wins and 24 losses during the four seasons above,

and Portland has averaged a very respectable win percentage of 57.25%, for an average

record of 47 wins and 35 losses.

The percentage of total shots taken from three was found to be the strongest predictor

of win percentage in the regular season amongst the independent variables in Tables 4

and 4.1, and has had a positive effect on playoff success as well. Twenty of the top 25 teams

ranked by this statistic were playoff teams, but the 2015 Golden State Warriors (13) were

the only team in this set to win it all. However, the 2015 Cleveland Cavaliers (6) and the

2014 Miami Heat (21) were both participants in their respective Finals, and the 2015

Houston Rockets (1) and 2015 Atlanta Hawks (9) finished as conference finalists.

Rebounding is a category that is clearly important to team success and was found

to have a positive effect on regular season winning percentage. However, the 2015

Golden State Warriors were the only team in the last five seasons to rank in the top ten

in rebounding during the season and win the title in the same year. Teams place differing

amounts of importance on rebounding, as it is particularly dependent on the personnel

and capabilities of the team. For example, the Miami Heat ranked last in the league in

rebounding in each of the last three seasons, but won the NBA Finals in 2013 and finished

as the runner-up in 2014. For a team like Miami that does not emphasize rebounding, the

alternative is to focus on different aspects of the game in order to make up for its lack

of rebounding.

Forcing turnovers has had a positive effect on winning percentage because teams

that force a high number of turnovers give themselves a better opportunity to win the

game based on the sheer number of possessions. Just as turning the ball over decreases

the number of possessions on which a team can even attempt a shot, forcing turnovers has

the same effect on the opposition while creating extra possessions for the team. The 2015

Golden State Warriors and the 2013 and 2014 Miami Heat all finished in the top five in

forced turnovers per game in their respective seasons; the 2015 Warriors and 2013 Heat

won the NBA championship, and the 2014 Heat reached the Finals. Conversely, turning

the ball over was found to have a negative effect on winning, since turnovers concede

possessions.

Due to the experience that comes with older NBA veteran players, a team with an

average age of 29-31 sees a positive effect on winning. Having an older team also

likely indicates that the players are fully developed and in their primes, without the

inexperience that younger players bring to a team. Older teams have been successful in

the playoffs in the last five seasons, as the 2011 Dallas Mavericks, 2012 and 2013 Miami

Heat, and the 2014 San Antonio Spurs all ranked in the top five in average age during

their respective seasons, and all four teams went on to win the title in those seasons.

The San Antonio Spurs have ranked in the top ten in the league for each of the

last five seasons in the number of players on the roster acquired through the draft and are

essentially the model for consistency and success, averaging a win percentage of 73%, or

nearly 60 wins per year, over the last five seasons. A potential reason for their success is

the number of valuable players that the Spurs acquired on draft day, such as Tim Duncan,

Tony Parker, Manu Ginobili, and Kawhi Leonard.

Other teams that have had a consistently high number of players acquired on draft

day are: the Oklahoma City Thunder, who have also ranked in the top ten every season

over the last five and have averaged a win percentage of 68%, or nearly 56 wins per

season; the Chicago Bulls, who have finished first, twelfth, fifth, and tenth in the league

over the last four seasons and have a win percentage of 62.5%, or over 51 wins per

season; the Portland Trail Blazers, who have finished first twice, fifth, and second in four

of the last five seasons, and finished with an average win percentage of 57% in those four

seasons; and the Golden State Warriors, who finished amongst the bottom ten teams in

the league in 2011 and 2012 (and won 43.9% and 34.8% of their games, respectively),

but have finished in the top ten over the last three seasons and have won an average of

67% of their games during this span.
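
The win totals quoted alongside these win percentages can be reproduced with a simple conversion. The sketch below assumes a full 82-game regular season for every year, which is an approximation because the 2011-12 season was shortened to 66 games; the function name `expected_wins` is illustrative.

```python
GAMES_PER_SEASON = 82  # assumption: a full regular season schedule


def expected_wins(win_pct, games=GAMES_PER_SEASON):
    """Convert a win percentage (as a fraction) into wins over a season."""
    return win_pct * games


# Average win percentages quoted in the text for draft-built teams.
AVERAGE_WIN_PCT = {
    "San Antonio Spurs": 0.73,
    "Oklahoma City Thunder": 0.68,
    "Chicago Bulls": 0.625,
}

for team, pct in AVERAGE_WIN_PCT.items():
    print(f"{team}: {expected_wins(pct):.2f} wins per {GAMES_PER_SEASON} games")
```

These figures line up with the "nearly 60", "nearly 56", and "over 51" wins per season described above.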

Holding opponents to a low field goal percentage was found to be the best

indicator of a strong defense (as far as traditional statistics can describe) and also has a

significantly positive effect on regular season win percentage. This statistic is also

very indicative of playoff success, as all four franchises that won a championship in the

last five seasons finished in the top ten in opponent field goal percentage during their

respective title-winning seasons.

Based on the results found in the model and displayed in Table 2.1, an efficient

offense must shoot the ball well from the field, as well as the free throw line and

three-point line, take a high percentage of its total shots as three-point attempts, rebound

the ball well, limit turnovers, and get to the free throw line often. A strong defensive team

forces a high number of turnovers while keeping the field goal percentage of the

opposition low, as shown in Table 2.2. Meanwhile, teams in the Western Conference

were shown to historically outperform their Eastern Conference counterparts, teams that

build through the draft have had higher winning percentages than teams that focus more

on building through free agency and through trades, and teams with more age and

experience tend to perform at a higher level than those made up of younger and less

experienced players, as shown in Table 3.

Based on the regression results from these models, the prototype for teams to follow

is the standard set by the Golden State Warriors and San Antonio Spurs. The Spurs in

particular have set a standard of excellence dating back to their first NBA title in 1999

and continuing through their fifth championship in 2014. Despite the relatively small

market of San Antonio (compared to cities such as New York, Los Angeles, and Chicago),

the Spurs have managed to build a dynasty through smart drafting, strong defense,

efficient shooting, and a collection of effective and established veteran players.

On the other hand, Golden State has experienced a meteoric rise from mediocrity

over the last five seasons, as the Warriors have enjoyed a leap from winning 35% of their

games in 2012 to winning 82% of their games in 2015 (and only continuing to improve in

2016). The Warriors have improved this drastically through

devastating three-point shooting, dominant and chaotic defense, and surprisingly strong

rebounding. Although Golden State has mimicked San Antonio’s process of building

through the draft, a key difference between the two is the average age. San Antonio has

had an average age of nearly 29 over the last five seasons, while the Warriors have an

average age of just over 25, suggesting that the Warriors will remain a threat to win the

title for several years. Based on the results, management and coaching have had a very

strong influence on the success of both of these teams, as management is in charge of

personnel and drafting, while the coaching staff has been able to optimize the focus on

the floor for each set of players.
