Columbia University Baseball Analytics Case Competition
The Three Most Valuable Position Players in MLB
University of Florida
Ronnie Socash, RJ Walsh
Tanner Crouch, Danny Lueck
Table of Contents
1. Preliminary Process – Find the original pool of players
2. Statistical Analysis – Projecting Future Performance
3. Market Value – Comparable players; predict future contracts
4. Risks – Identify potential pitfalls
5. Identification – Five Most Valuable Position Players
6. Case Study – Analysis of the final cut
7. The Final Three
Preliminary Process
• Step 1: Identify the original pool of players
  – Position player WAR leaders
  – Future and recent top prospects
• Step 2: Build a database of 20 possible players
  – Name, Team, Age, Seasons Played, Games Played, WAR over last three seasons (if applicable)
Projecting Future WAR
R^2 = 0.91
Database of players from 1986 to 2016
WAR = -3.9352 + 0.36*(age) - 0.0062*(age^2)
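As a rough sketch, the aging curve can be evaluated directly to obtain the year-over-year multipliers used in the projections. The linear coefficient is taken here as 0.36, which is an assumption: it is the value that approximately reproduces the 1.049 and 1.036 multipliers quoted in the Nolan Arenado example. Because the coefficients on the slide are rounded, the match is only approximate. The function names are illustrative.

```python
def curve_war(age):
    """Fitted aging curve with the rounded slide coefficients (0.36 is assumed)."""
    return -3.9352 + 0.36 * age - 0.0062 * age ** 2


def yearly_multiplier(age_from, age_to):
    """Ratio of the curve at two ages: the 'slope change' applied to a player's WAR."""
    return curve_war(age_to) / curve_war(age_from)


print(round(yearly_multiplier(24, 25), 3))
print(round(yearly_multiplier(25, 26), 3))
```

With these rounded coefficients the curve peaks near age 29, so the multipliers stay above 1.0 for players in their mid-twenties and shrink toward 1.0 as that age approaches.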
Statistical Analysis
Apply the percentage change from year to year to each player
Use the average of the last two seasons of WAR for each player
Example: Nolan Arenado
Age at 2017 Opening Day: 25
2015 WAR: 4.5
2016 WAR: 5.2
(5.2 + 4.5) / 2 = 4.85, or about 4.9 WAR
2017 projection: 4.9 * (slope change from “24 to 25”)
= 4.9 * (1.049073) = 5.1 WAR
2018 projection: 5.1 * (slope change from “25 to 26”)
= 5.1 * (1.036205) = 5.3 WAR
Process repeated through the 2021 season
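The Arenado arithmetic can be sketched as a short loop that compounds each season's multiplier onto the two-year average. Only the two multipliers quoted on the slide are used; the variable names are illustrative, and the unrounded baseline of 4.85 is carried forward rather than 4.9.

```python
# Inputs quoted in the Nolan Arenado example.
war_2015, war_2016 = 4.5, 5.2
multipliers = {2017: 1.049073, 2018: 1.036205}  # year-over-year changes from the slide

baseline = (war_2015 + war_2016) / 2  # 4.85 before rounding

projections = {}
current = baseline
for year in sorted(multipliers):
    current *= multipliers[year]  # compound each season's change
    projections[year] = current

for year, war in projections.items():
    print(year, round(war, 1))  # 2017 5.1, then 2018 5.3
```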
Converting WAR Into Dollars
$7 million per marginal win in 2017
5% increase each year.
Nolan Arenado:
– 2017: 5.1 predicted WAR * ($7,000,000) = $35,616,028.35
– 2018: 5.3 predicted WAR * ($7,350,000) = $38,750,766.66
– Continue through 2021
– Add all five dollar figures to calculate the total five-year production value
– Over next 5 seasons: 26.7 WAR and $207,276,574.20 Production Value
(Dollar figures are computed from unrounded WAR.)
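A minimal sketch of the WAR-to-dollars conversion, assuming the price of a win compounds at 5% per year from the $7 million base. The helper names are hypothetical. The WAR inputs are rebuilt from the unrounded 4.85 baseline and the two quoted multipliers, which reproduces the 2017 figure and comes within a few dollars of the 2018 figure (the small gap presumably reflects extra digits in the original multipliers).

```python
BASE_PRICE = 7_000_000  # dollars per marginal win in 2017, from the slide
GROWTH = 0.05           # 5% annual increase, from the slide


def price_per_win(year):
    """Dollar value of one win in a given season."""
    return BASE_PRICE * (1 + GROWTH) ** (year - 2017)


def production_value(war_by_year):
    """Map each season's projected WAR to its dollar value."""
    return {year: war * price_per_win(year) for year, war in war_by_year.items()}


war_2017 = 4.85 * 1.049073
war_2018 = war_2017 * 1.036205
values = production_value({2017: war_2017, 2018: war_2018})

for year, dollars in values.items():
    print(f"{year}: ${dollars:,.2f}")
```

Extending the dictionary through 2021 and summing the values would give the five-year production value.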
Projecting Future Salary
Step 1: Identify Comparable Players
Comparable Player Qualifications:
- Same position
- Within 3 years of age at time of breakout/peak
- Peak/breakout seasons within 1.5 WAR
League Projected Salary Increase
As with the Qualifying Offer calculation, we took the average of the top 125 contracts for each of the last 10 years.
We then fit a linear model to predict future increases.
• R^2 value of 0.96
• Avg. Salary = -$1.14B + $573,300*(year)
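The linear salary model can be evaluated as below. This is a sketch using the rounded coefficients from the slide. Because the intercept is given only to three significant figures, the absolute dollar levels are approximate, while the year-to-year increase of $573,300 is not affected by that rounding.

```python
INTERCEPT = -1.14e9  # dollars, as rounded on the slide
SLOPE = 573_300      # dollars per calendar year


def projected_avg_salary(year):
    """Projected average of the top 125 contracts for a given season."""
    return INTERCEPT + SLOPE * year


for year in range(2017, 2022):
    print(year, f"${projected_avg_salary(year):,.0f}")
```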
Example: Manny Machado
Comparable Players: Pablo Sandoval and Kyle Seager
• Machado entered 1st year of arbitration after 6.8 WAR season
• Sandoval entered 1st year of arbitration after 5.3 WAR season
• Seager entered 1st year of arbitration after 5.4 WAR season
• Ages: Machado (24), Sandoval (25), Seager (27)
• All play third base
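The three comparable-player qualifications can be sketched as a simple filter and checked against the Machado example. The `Player` class and `is_comparable` function are hypothetical names, and treating both thresholds as inclusive is an assumption (Machado and Sandoval differ by exactly 1.5 WAR, Machado and Seager by exactly 3 years).

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    position: str
    age: int             # age listed on the slide
    breakout_war: float  # WAR in the season before the first arbitration year


def is_comparable(target, other, max_age_gap=3, max_war_gap=1.5):
    """Apply the three qualifications; thresholds are assumed to be inclusive."""
    return (
        other.name != target.name
        and other.position == target.position
        and abs(other.age - target.age) <= max_age_gap
        # Small tolerance guards against floating-point error at the boundary.
        and abs(other.breakout_war - target.breakout_war) <= max_war_gap + 1e-9
    )


machado = Player("Manny Machado", "3B", 24, 6.8)
pool = [
    Player("Pablo Sandoval", "3B", 25, 5.3),
    Player("Kyle Seager", "3B", 27, 5.4),
]
comps = [p.name for p in pool if is_comparable(machado, p)]
print(comps)
```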
Projecting Arbitration
• Use the comparable player salaries during similar career points
• Calculate the percentage of the QO that players made in those years
• Adjust the current player’s salary to reflect the percent change
Final Adjustment
If Machado’s production were equal to Seager’s or Sandoval’s, we would expect Machado to receive the same percentage of the QO.
To adjust for the difference in production, we then multiplied the current player’s projected salary by the percent difference in WAR over the three previous seasons.
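One possible reading of the arbitration steps, as a sketch only. All dollar and WAR inputs below are hypothetical placeholders, not figures from the presentation, and interpreting the "percent difference in WAR" as the ratio of the current player's three-year WAR to the comparable player's is an assumption.

```python
def pct_of_qo(comp_salary, qo_that_year):
    """Share of the Qualifying Offer that a comparable player earned."""
    return comp_salary / qo_that_year


def projected_arb_salary(comp_salary, comp_qo, projected_qo,
                         player_war_3yr, comp_war_3yr):
    """Scale the comparable's QO share to the projected QO, then adjust for WAR."""
    base = pct_of_qo(comp_salary, comp_qo) * projected_qo
    war_adjustment = player_war_3yr / comp_war_3yr  # assumed reading of the adjustment
    return base * war_adjustment


# HYPOTHETICAL inputs, for illustration only.
example = projected_arb_salary(
    comp_salary=5_000_000,
    comp_qo=15_000_000,
    projected_qo=18_000_000,
    player_war_3yr=15.0,
    comp_war_3yr=12.0,
)
print(f"${example:,.0f}")
```

With several comparable players, the same calculation could be run per comparable and averaged.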
Sample Size Analysis
A balancing act between younger and older players
Younger: can be paid at a discount, but have less of a track record
Older: command more money, but have a proven track record
Variance formula
Mean = (career WAR) / (career seasons played)
Divided by (# of games played) - 1
Fielding Profile
Player   UZR                   DRS
Lindor   20.8                  17
Betts    17.8                  32
Bryant   5.3 (3B) / 6.2 (OF)   9
Trout    -0.3                  6
Seager   10.6                  0
The players that derive the most value from their defense are Francisco Lindor and Mookie Betts.
FanGraphs uses UZR as the main defensive component of its WAR, and all five players are within about 21 runs of each other.
Assuming one win is worth 10 runs, defense would likely account for a gap of at most about two wins among these players.
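The two-win ceiling follows from the spread in the UZR column, which a short sketch makes explicit. The values are copied from the fielding table, with both of Bryant's positional figures included; the 10-runs-per-win rate is the assumption stated above.

```python
RUNS_PER_WIN = 10  # conversion rate assumed on the slide

uzr = {
    "Lindor": 20.8,
    "Betts": 17.8,
    "Bryant (3B)": 5.3,
    "Bryant (OF)": 6.2,
    "Trout": -0.3,
    "Seager": 10.6,
}

spread_runs = max(uzr.values()) - min(uzr.values())  # best minus worst defender
spread_wins = spread_runs / RUNS_PER_WIN

print(f"UZR spread: {spread_runs:.1f} runs, about {spread_wins:.1f} wins")
# prints: UZR spread: 21.1 runs, about 2.1 wins
```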
Defensive Metric Variability
According to the FanGraphs glossary, UZR is highly variable: it carries an error range of about five runs in either direction, so a UZR of +10 could reflect a true value anywhere from +5 to +15.
Because FanGraphs’ WAR relies on defensive metrics that are less precise, we believe that offensive performance should be weighted more heavily than defense when predicting future performance.
2016 Player Profiles
Player   Soft %   Med %   Hard %   wRC+   wOBA
Trout    12.0     46.3    41.7     171    .418
Bryant   17.0     42.7    40.3     149    .396
Seager   12.7     47.6    39.7     137    .372
Betts    17.4     49.2    33.4     135    .379
Lindor   17.2     55.2    27.5     112    .340
Mike Trout has the best wRC+, Hard Hit %, and wOBA
Within this five-player sample, Hard Hit % and wRC+ appear to be positively correlated.
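To put a number on that relationship, the Pearson correlation can be computed from the Hard % and wRC+ columns of the table. This is a sketch, not part of the original analysis, and five players is a very small sample, so the coefficient is indicative at best.

```python
from math import sqrt

# Values from the 2016 player profiles, in the order
# Trout, Bryant, Seager, Betts, Lindor.
hard_hit = [41.7, 40.3, 39.7, 33.4, 27.5]
wrc_plus = [171, 149, 137, 135, 112]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(hard_hit, wrc_plus)
print(round(r, 2))
```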
Accounting for Changes in the Game
In late May, ESPN reported that MLB’s Competition Committee agreed to raise the strike zone.
The bottom of the strike zone would move up to the top of the hitter’s knees from its current position at “the hollow beneath the kneecap.”
With fastball velocity continuing to increase, we believe that this will result in more pitchers challenging hitters up in the zone.
The “bottom of the zone” will no longer belong to pitchers.
Tying It All Together
We believe that Corey Seager and Kris Bryant are going to be the most positively affected by an upward shift in the strike zone.
Francisco Lindor is the player most likely to be negatively affected by the strike zone shift.
We do not believe Mike Trout will be particularly affected in any drastic way due to his five full seasons of high-level performance. If pitchers have not figured him out by now, there is no indication that they will.
#3: Mike Trout
1. Total Surplus Value: $236,765,429.42
2. Four years of Team Control
3. Best Hard Hit %, wOBA, and wRC+
4. Projected 50.7 WAR from 2017-2021
#2: Kris Bryant
1. Total Surplus Value: $258,096,445.52
2. Five years of Team Control
3. 2nd best Hard Hit % and 2nd highest wRC+
4. Projected 41.4 WAR from 2017-2021
#1: Corey Seager
1. Total Surplus Value: $311,915,858.51
2. Five years of Team Control
3. 3rd best Hard Hit % and 3rd highest wRC+
4. Projected 47.2 WAR from 2017-2021