Sports Analytics in the Era of Big Data and Data Science
![Page 1: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/1.jpg)
SPORTS ANALYTICS IN THE ERA OF BIG DATA AND DATA SCIENCE
KONSTANTINOS PELECHRINIS
@kpelechrinis
https://412sportsanalytics.wordpress.com
![Page 2: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/2.jpg)
![Page 3: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/3.jpg)
DATA-DRIVEN COACHES?
![Page 4: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/4.jpg)
DATA-DRIVEN FRONT OFFICES?
![Page 5: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/5.jpg)
WHY NOW?
➤ Data analysis & the use of statistics are not new in sports!!
➤ Now we have the technology to collect much more detailed information about the game
➤ Detailed box score
➤ Play-by-play data
➤ Player tracking
![Page 6: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/6.jpg)
TRACKING
![Page 7: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/7.jpg)
RESOURCES
Some of the examples
are taken from this book
![Page 8: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/8.jpg)
SPORT MARKETS
➤ A typical business or firm operates with the objective of profit maximization
➤ This might not be the case for the owner of a professional sports team!!
➤ For profit year by year
➤ Maximize wins
➤ Capital appreciation
![Page 9: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/9.jpg)
SPORT MARKETS
➤ Becoming the dominant player is not the goal in the sports industry
➤ If a team were assured of victory in almost any competition, the whole league would be of little, if any, interest
➤ Competitive balance
➤ Salary cap!
➤ Draft!
![Page 10: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/10.jpg)
SPORT MARKETS
[Figure: scatter plot of the 30 MLB teams. x-axis: Team Payroll (Millions of Dollars), 50 to 250; y-axis: Percentage of Games Won, 40 to 60.]
Correlation coefficient = 0.26, p-value = 0.16!
Only 6% of the win/loss percentage is explained by the payroll differences!
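The "percentage explained" quoted on this slide is the square of the correlation coefficient. A minimal sketch of that calculation follows; the payroll and win-percentage values are hypothetical placeholders, not the data behind the slide's plot.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical payrolls (millions of dollars) and win percentages.
# These are NOT the values plotted on the slide.
payroll = [60.0, 85.0, 110.0, 150.0, 230.0]
win_pct = [48.0, 55.0, 45.0, 52.0, 54.0]

r = pearson_r(payroll, win_pct)
r_squared = r ** 2  # share of the variance in win percentage explained by payroll
print("r = %.2f, r^2 = %.2f" % (r, r_squared))
```

A small r squared means payroll alone says little about how often a team wins, which is the point the slide makes.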
![Page 11: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/11.jpg)
RANKING TEAMS
➤ Team performance is central to sports data science
➤ Ratings and rankings
➤ Challenges
➤ Imbalance in team schedules
➤ Win/loss percentage does not consider strength of schedule
![Page 12: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/12.jpg)
RANKING TEAMS
➤ Network-based solution
➤ Win/loss directed network
➤ PageRank
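One way to sketch the network idea: every game adds a directed edge from the loser to the winner, and PageRank then rewards teams that beat highly rated opponents. The damping factor, the iteration count, and the game results below are assumptions chosen for illustration; the slides do not specify them.

```python
def pagerank(wins, damping=0.85, iterations=100):
    """Power-iteration PageRank on a win/loss network.

    wins: list of (winner, loser) pairs. Each game adds an edge
    loser -> winner, so a win over a strong team passes on more score.
    """
    teams = sorted({team for game in wins for team in game})
    out_edges = {team: [] for team in teams}
    for winner, loser in wins:
        out_edges[loser].append(winner)

    n = len(teams)
    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        new_rank = {team: (1.0 - damping) / n for team in teams}
        for team in teams:
            targets = out_edges[team]
            if targets:
                share = damping * rank[team] / len(targets)
                for winner in targets:
                    new_rank[winner] += share
            else:
                # Undefeated team (no outgoing edges): spread its score evenly.
                for other in teams:
                    new_rank[other] += damping * rank[team] / n
        rank = new_rank
    return rank

# Hypothetical results: (winner, loser)
games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
ratings = pagerank(games)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Because score flows along edges, two teams with the same record can end up with different ratings depending on whom they beat.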
![Page 13: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/13.jpg)
RANKING TEAMS
![Page 14: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/14.jpg)
RANKING TEAMS
![Page 15: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/15.jpg)
RANKING TEAMS
➤ Unidimensional scaling
➤ Matrix of how many times each team beats the other
➤ Transform to proportions, average across rows or columns, and standardize the result
➤ Automatic adjustment for schedule strength
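The steps above can be sketched as follows. The head-to-head counts are hypothetical, and the sketch assumes every team has met at least one opponent and that the averaged proportions are not all equal (otherwise the standard deviation is zero).

```python
from statistics import mean, pstdev

def scale_teams(wins):
    """wins[a][b] = number of times team a beat team b (hypothetical input)."""
    teams = sorted(wins)
    avg_prop = {}
    for a in teams:
        props = []
        for b in teams:
            if a == b:
                continue
            played = wins[a].get(b, 0) + wins[b].get(a, 0)
            if played:  # skip pairs that never met
                props.append(wins[a].get(b, 0) / played)
        # Row average of win proportions: each opponent counts once.
        avg_prop[a] = mean(props)
    values = list(avg_prop.values())
    mu = mean(values)
    sigma = pstdev(values)
    # Standardize to z-scores (mean 0, standard deviation 1).
    return {team: (p - mu) / sigma for team, p in avg_prop.items()}

# Hypothetical head-to-head win counts.
head_to_head = {
    "A": {"B": 3, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 0, "B": 2},
}
scores = scale_teams(head_to_head)
order = sorted(scores, key=scores.get, reverse=True)
print(order)
```

Averaging proportions per opponent, rather than pooling all games, keeps a team from inflating its score by playing a weak opponent many times.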
![Page 16: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/16.jpg)
RANKING TEAMS
[Figure: Ranking Score (y-axis, 0 to 600) for each of the 30 NBA teams, ATL through WAS.]
![Page 17: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/17.jpg)
COACHING DECISIONS
➤ Evidence-based coaching
➤ Go for the 4th down or not?
➤ Go for the 2-point conversion or take the chip-shot extra point?
➤ Shoot for three to win or shoot for two to tie the game?
➤ …
➤ We can now quantify the rationality of coaches!
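As an illustration of how such a decision can be quantified, consider the last question in the list: down by two on the final possession, shoot a three to win or a two to tie? The sketch below compares win probabilities under assumed shooting and overtime percentages; all numbers are hypothetical.

```python
def win_prob_shoot_three(p_three):
    """Down by two on the last possession: a made three wins the game."""
    return p_three

def win_prob_shoot_two(p_two, p_overtime_win):
    """A made two only ties the game; the team must then also win overtime."""
    return p_two * p_overtime_win

# Hypothetical probabilities, for illustration only.
p3, p2, p_ot = 0.35, 0.50, 0.50
go_for_three = win_prob_shoot_three(p3)
go_for_two = win_prob_shoot_two(p2, p_ot)
best = "three" if go_for_three > go_for_two else "two"
print(best, go_for_three, go_for_two)
```

With real shot and overtime data in place of these placeholders, the same comparison shows how often a coach's actual choice matches the higher-probability option.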
![Page 18: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/18.jpg)
COACHING DECISIONS
![Page 19: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/19.jpg)
COACHING DECISIONS
OR
![Page 20: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/20.jpg)
COACHING DECISIONS
$E[p] = 2 \cdot P(\text{two-point conversion succeeds}) - 1 \cdot P(\text{extra-point kick succeeds})$
[Figure: Expected Point Gain (y-axis, -0.50 to 0.50) for each NFL team, ARI through WAS, with a numeric label per team.]
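Assuming the expected point gain compares a two-point try with the one-point kick, it can be computed per team from two success rates. The function name and the rates below are hypothetical and are not taken from the slide's chart.

```python
def expected_point_gain(p_two_point, p_extra_point):
    """Expected points from a two-point try minus expected points from the kick."""
    return 2.0 * p_two_point - 1.0 * p_extra_point

# Hypothetical success rates per team: (two-point try, extra-point kick).
team_rates = {
    "TEAM_A": (0.55, 0.94),
    "TEAM_B": (0.40, 0.96),
}
gains = {team: expected_point_gain(p2, p1) for team, (p2, p1) in team_rates.items()}
for team in sorted(gains):
    print("%s %+.2f" % (team, gains[team]))
```

A positive value suggests the team would gain points on average by going for two; a negative value favors the kick.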
![Page 21: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/21.jpg)
COACHING DECISIONS
![Page 22: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/22.jpg)
COACHING DECISIONS
[Figure: expected points gained (y-axis, -2 to 3) versus distance to the goal line on 4th down (x-axis, 0 to 100), with a "Touchback" label.]
![Page 23: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/23.jpg)
COMPUTATIONAL GAME MODELS
![Page 24: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/24.jpg)
COMPUTATIONAL GAME MODELS
[Figure: ratio r (y-axis, -1.0 to 1.0) by quarter, Q1 to Q4.]
[Figure: turnover density (y-axis, 0.00 to 0.04) versus time in minutes (x-axis, 0 to 60), one curve per quarter, Q1 to Q4.]
![Page 25: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/25.jpg)
COMPUTATIONAL GAME MODELS
[Diagram components: Historical game data; Bootstrap (B samples); Correlation matrix; Logistic Regression Model; samples $x_{11}, \dots, x_{B1}$ and $x_{12}, \dots, x_{B2}$; probabilities $P_1$ and $P_2$; comparison of $P_1$ and $P_2$; hypothesis test $H_0: P_1 = P_2$ versus $H_1: P_1 \neq P_2$; p-value.]
[Figure: prediction accuracy (y-axis, 0.00 to 1.00) by week (x-axis, 8 to 17) for 2014 and 2015; annotated mean accuracies: 0.627, 0.787, 0.517, 0.6.]
![Page 26: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/26.jpg)
LEAGUE CHANGES
➤ Can we predict and/or evaluate the impact of a rule change?
➤ What if we move the three point line further away?
➤ What was the impact of the new PAT rule?
➤ Will the new touchback rule give an advantage to the offense?
![Page 27: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/27.jpg)
LEAGUE CHANGES
Should the 3-point line be moved further away?
![Page 28: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/28.jpg)
LEAGUE CHANGES
![Page 29: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/29.jpg)
LEAGUE CHANGES
![Page 30: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/30.jpg)
SPORTS MARKETING
➤ Sports are part of the entertainment market
➤ Marketing decisions can always benefit from good data!
➤ What price should the ticket have?
➤ What team-branded merchandise should you sell?
➤ Does a swag promotion justify a higher ticket price?
➤ What is the best strategy for national branding?
➤ …
![Page 31: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/31.jpg)
SPORTS MARKETING
➤ Case study: Consumer preferences for Dodger Stadium seating
➤ Conjoint analysis
➤ Product profiles
➤ Consumers rank the products
➤ Ranking reveals their preference
![Page 32: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/32.jpg)
SPORTS MARKETING
Part-worths (i.e., regression coefficients) reflect
the strength of consumer preferences
for each level of each product attribute.
![Page 33: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/33.jpg)
SPORTS MARKETING
➤ Can we use these results to assess a consumer's willingness to pay for tickets?
➤ $20 tickets have a part-worth of 3.25, while $95 tickets have a part-worth of -3.50.
➤ The difference in part-worth is 3.25 - (-3.50) = 6.75, which corresponds to a price difference of $75
➤ 1 part-worth is worth $75 / 6.75 = $11.11 to the consumer
➤ For this consumer, the part-worth differential between a loge seat and a field seat is 2.75
➤ This consumer is willing to pay 2.75 * 11.11 = $30.55 more for a field seat than for a loge seat
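The willingness-to-pay arithmetic can be checked with a few lines of Python. The function name is made up for this sketch; the prices and part-worths are the ones quoted on the slide.

```python
def dollars_per_part_worth(price_low, worth_low, price_high, worth_high):
    """Dollar value of one part-worth unit, from two price levels and their part-worths."""
    return (price_high - price_low) / (worth_low - worth_high)

# Slide values: $20 ticket -> 3.25, $95 ticket -> -3.50.
rate = dollars_per_part_worth(20.0, 3.25, 95.0, -3.50)  # 75 / 6.75
premium = 2.75 * rate  # loge-to-field part-worth differential is 2.75
print("$%.2f per part-worth, $%.2f premium" % (rate, premium))
```

Carrying the unrounded rate through gives a premium of about $30.56; the slide's $30.55 comes from multiplying by the rounded $11.11.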
![Page 34: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/34.jpg)
PROMOTING BRANDS & PRODUCTS
![Page 35: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/35.jpg)
PROMOTING BRANDS & PRODUCTS
= a* + b* + c* + d
![Page 36: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/36.jpg)
PROMOTING BRANDS & PRODUCTS
![Page 37: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/37.jpg)
DATA SOURCES
➤ There are various websites where you can get data
➤ Mainly aggregate statistics, box scores, etc.
![Page 38: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/38.jpg)
DATA SOURCES
➤ Flexibility → play-by-play data
➤ Major leagues provide an API
➤ Sports enthusiasts have created libraries to access them
Case study: nflgame in Python
https://github.com/BurntSushi/nflgame
![Page 39: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/39.jpg)
DATA SOURCES
>>> import nflgame
>>> games = nflgame.games(2015, week=1, kind='REG')
>>> games
[<nflgame.game.Game object at 0x107652210>, <nflgame.game.Game object at 0x107652310>,
 <nflgame.game.Game object at 0x107652410>, <nflgame.game.Game object at 0x107652510>,
 <nflgame.game.Game object at 0x107652610>, <nflgame.game.Game object at 0x107652710>,
 <nflgame.game.Game object at 0x107652810>, <nflgame.game.Game object at 0x107652910>,
 <nflgame.game.Game object at 0x107652a10>, <nflgame.game.Game object at 0x107652b10>,
 <nflgame.game.Game object at 0x107652c10>, <nflgame.game.Game object at 0x107652d10>,
 <nflgame.game.Game object at 0x107652e10>, <nflgame.game.Game object at 0x107652f10>,
 <nflgame.game.Game object at 0x107d02050>, <nflgame.game.Game object at 0x107d02150>]
>>> games[0].home
u'NE'
>>> games[0].away
u'PIT'
>>> games[0].score_home
28
>>> games[0].score_away
21
![Page 40: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/40.jpg)
DATA SOURCES
>>> for i in games[0].drives:
...     print i
...
PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG
NE (Start: Q1 09:40, End: Q1 07:41) Punt
PIT (Start: Q1 07:41, End: Q1 03:14) Punt
NE (Start: Q1 03:14, End: Q2 11:11) Touchdown
PIT (Start: Q2 11:11, End: Q2 08:38) Missed FG
NE (Start: Q2 08:38, End: Q2 04:01) Touchdown
PIT (Start: Q2 04:01, End: Q2 00:03) Field Goal
NE (Start: Q2 00:03, End: Q2 00:00) End of Half
NE (Start: Q3 15:00, End: Q3 10:37) Touchdown
PIT (Start: Q3 10:37, End: Q3 06:43) Touchdown
NE (Start: Q3 06:43, End: Q3 04:15) Punt
PIT (Start: Q3 04:15, End: Q4 11:39) Field Goal
NE (Start: Q4 11:39, End: Q4 09:20) Touchdown
PIT (Start: Q4 09:20, End: Q4 08:29) Punt
NE (Start: Q4 08:29, End: Q4 07:29) Punt
PIT (Start: Q4 07:29, End: Q4 07:00) Interception
NE (Start: Q4 07:00, End: Q4 02:59) Punt
PIT (Start: Q4 02:59, End: Q4 00:02) Touchdown
NE (Start: Q4 00:02, End: Q4 00:00) End of Game
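Drive summaries in this printed form are easy to aggregate. The sketch below tallies drive outcomes per team by parsing the printed lines with a regular expression. It does not call nflgame at all, and the regex and function name are assumptions based only on the output format shown on this slide.

```python
import re
from collections import Counter

# Matches lines such as "PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG".
DRIVE_RE = re.compile(
    r"^(\w+) \(Start: (Q\d \d{2}:\d{2}), End: (Q\d \d{2}:\d{2})\) (.+)$"
)

def tally_drive_results(lines):
    """Count (team, result) pairs from printed drive summaries."""
    counts = Counter()
    for line in lines:
        match = DRIVE_RE.match(line.strip())
        if match:
            team, _start, _end, result = match.groups()
            counts[(team, result)] += 1
    return counts

# First four drives from the output on this slide.
drive_lines = [
    "PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG",
    "NE (Start: Q1 09:40, End: Q1 07:41) Punt",
    "PIT (Start: Q1 07:41, End: Q1 03:14) Punt",
    "NE (Start: Q1 03:14, End: Q2 11:11) Touchdown",
]
counts = tally_drive_results(drive_lines)
for (team, result), n in sorted(counts.items()):
    print(team, result, n)
```

Feeding in every drive of a season this way would give per-team rates of touchdowns, punts, and turnovers per drive.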
![Page 41: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/41.jpg)
DATA SOURCES
>>> plays = nflgame.combine_plays(games)
>>> for p in plays:
...     print p
...
(NE, NE 35, Q1) S.Gostkowski kicks 65 yards from NE 35 to end zone, Touchback.
(PIT, PIT 20, Q1, 1 and 10) (15:00) De.Williams right tackle to PIT 38 for 18 yards (D.Hightower).
(PIT, PIT 38, Q1, 1 and 10) (14:21) B.Roethlisberger pass short right to A.Brown pushed ob at PIT 47 for 9 yards (D.Hightower).
(PIT, PIT 47, Q1, 2 and 1) (14:04) De.Williams right guard to NE 49 for 4 yards (J.Collins; M.Brown).
(PIT, NE 49, Q1, 1 and 10) (13:26) B.Roethlisberger pass short right to H.Miller to NE 35 for 14 yards (J.Mayo).
(PIT, NE 35, Q1, 1 and 10) (12:42) (Shotgun) De.Williams right guard to NE 24 for 11 yards (J.Collins).
(PIT, NE 24, Q1, 1 and 10) (12:05) A.Brown sacked at NE 32 for -8 yards (M.Brown).
(PIT, NE 32, Q1, 2 and 18) (11:20) (Shotgun) De.Williams right end pushed ob at NE 28 for 4 yards (D.Hightower). PENALTY on PIT-M.Gilbert, Offensive Holding, 10 yards, enforced at NE 32 - No Play.
(PIT, NE 42, Q1, 2 and 28) (10:53) W.Johnson right guard to NE 36 for 6 yards (R.Ninkovich). NE-D.Easley was injured during the play. He is Out.
(PIT, NE 36, Q1, 3 and 22) (10:28) (Shotgun) B.Roethlisberger pass short right to H.Miller to NE 26 for 10 yards (P.Chung; M.Butler).
(PIT, NE 26, Q1, 4 and 12) (9:44) J.Scobee 44 yard field goal is No Good, Wide Right, Center-G.Warren, Holder-J.Berry.
(NE, NE 34, Q1, 1 and 10) (9:40) (Shotgun) T.Brady pass short left to J.Edelman pushed ob at NE 47 for 13 yards (W.Gay). PENALTY on NE-N.Solder, Unnecessary Roughness, 15 yards, enforced between downs.
(NE, NE 32, Q1, 1 and 10) (9:14) (Shotgun) T.Brady pass short left to D.Lewis to NE 44 for 12 yards (J.Harrison).
(NE, NE 44, Q1, 1 and 10) (9:00) (No Huddle, Shotgun) T.Brady pass short left to D.Lewis ran ob at PIT 43 for 13 yards.
(NE, PIT 43, Q1, 1 and 10) (8:31) (No Huddle, Shotgun) T.Brady pass incomplete short right to R.Gronkowski.
(NE, PIT 43, Q1, 2 and 10) (8:27) T.Brady pass incomplete deep right to D.Amendola.
(NE, PIT 43, Q1, 3 and 10) (8:22) (Shotgun) T.Brady sacked at PIT 43 for 0 yards (B.Dupree).
(NE, PIT 43, Q1, 4 and 10) (7:48) R.Allen punts 36 yards to PIT 7, Center-J.Cardona, fair catch by A.Brown.
(PIT, PIT 7, Q1, 1 and 10) (7:41) De.Williams left guard to PIT 13 for 6 yards (A.Branch; G.Grissom).
(PIT, PIT 13, Q1, 2 and 4) (7:07) De.Williams left tackle to PIT 12 for -1 yards (C.Jones).
(PIT, PIT 12, Q1, 3 and 5) (6:26) (Shotgun) B.Roethlisberger pass short left to A.Brown pushed ob at PIT 22 for 10 yards (D.McCourty).
(PIT, PIT 22, Q1, 1 and 10) (5:54) De.Williams right guard to PIT 26 for 4 yards (R.Ninkovich). PENALTY on PIT-K.Beachum, Illegal Formation, 5 yards, enforced at PIT 22 - No Play.
(PIT, PIT 17, Q1, 1 and 15) (5:29) (Shotgun) B.Roethlisberger pass short right to A.Brown to PIT 20 for 3 yards (J.Collins).
(PIT, PIT 20, Q1, 2 and 12) (4:48) B.Roethlisberger sacked at PIT 14 for -6 yards (D.Hightower).
(PIT, PIT 14, Q1, 3 and 18) (4:03) (Shotgun) B.Roethlisberger pass deep left to H.Miller to PIT 31 for 17 yards (D.McCourty; T.Brown).
(PIT, PIT 31, Q1, 4 and 1) (3:25) J.Berry punts 50 yards to NE 19, Center-G.Warren. D.Amendola to NE 34 for 15 yards (V.Williams). PENALTY on NE-M.Slater, Illegal Block Above the Waist, 10 yards, enforced at NE 20.
(NE, NE 10, Q1, 1 and 10) (3:14) D.Lewis left tackle to NE 18 for 8 yards (W.Allen).
(NE, NE 18, Q1, 2 and 2) (2:40) D.Lewis up the middle to NE 19 for 1 yard (M.Mitchell).
(NE, NE 19, Q1, 3 and 1) (2:05) T.Brady up the middle to NE 20 for 1 yard (L.Timmons; S.McLendon).
(NE, NE 20, Q1, 1 and 10) (1:14) D.Lewis left end pushed ob at NE 25 for 5 yards (L.Timmons). PENALTY on NE-N.Solder, Offensive Holding, 10 yards, enforced at NE 20 - No Play.
(NE, NE 10, Q1, 1 and 20) (:45) (Shotgun) T.Brady pass short left to A.Dobson to NE 19 for 9 yards (W.Gay).
(NE, NE 19, Q1, 2 and 11) (:12) (Shotgun) T.Brady pass short left to J.Edelman to NE 28 for 9 yards (C.Allen).
….
![Page 42: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/42.jpg)
What does all this mean for me?
![Page 43: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/43.jpg)
Work = Fun
![Page 44: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/44.jpg)
BUT…
➤ Good understanding of the fundamentals of statistics and probability
➤ Ability to work with APIs and data
➤ Python, R, MySQL
➤ And, of course, domain knowledge
![Page 45: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/45.jpg)
![Page 46: Sports Analytics in the Era of Big Data and Data Science](https://reader034.vdocument.in/reader034/viewer/2022050613/58aace011a28ab2f728b5eed/html5/thumbnails/46.jpg)