
Chapter Five – Offensive Evaluation

The objective of this chapter is to describe methods that have been proposed for measuring and predicting individual players’ offensive performance, independent of the performance of their teammates.

Introduction

For years, when I opened the Sunday sports section of the newspaper during the baseball season, I quickly turned to the columns listing (almost) up-to-date performance data for that year’s teams and regular players. Both teams and players were ranked by batting average, explicitly suggesting that batting average was the best measure of offensive performance. However, a team does not win a game because it has a higher batting average in that game than their opponent. A team wins a game because it scores more runs than their opponent. A game in which the team with more hits scores fewer runs and loses is not a rarity. The point is that the goal of a team’s offense is not to achieve a gaudy batting average, but to score a lot of runs.

Of course, the offensive skill measured by batting average, the ability to get hits on batted balls, is a major contributing factor to prolific run scoring. However, there are two other offensive skills that also make significant contributions to run scoring. One is the ability to coax bases on balls from opposing pitchers, and the other is the ability to turn one’s hits into extra bases. One thing that is noteworthy about these latter two skills is that the variation among regular players is far greater than that for batting average. In this day and age, regular players rarely hit below .240 or above .360; thus, there is about a fifty percent difference between high- and low-end base-hitting skill. In contrast, the most successful power hitters knock out 80 extra base hits annually, whereas the least achieve less than half that amount. Even more striking, there are players who regularly get 120 walks per year while others get only 20, a 600 percent difference. It stands to reason that a player who gets 80 extra base hits or 120 walks is making a far greater contribution to team run scoring than a player who gets 30 extra base hits or 20 walks. Therefore, any serious measure of offensive performance must include all three of these skills.

There is a further problem with using batting average as an indicator of player skill. Batting average is largely a function of the number of singles that a player hits, and that number can vary widely from year to year. As a consequence, BA is not as consistent from year to year as other offensive indices. Studies of year-to-year correlations in batting average include Schall and Smith (2000b; an average correlation of only .38 for batting averages in a data set including virtually all twentieth century position players with at least 50 at bats, with a range of .18 [1990-1991] to .63 [1905-1906]), Harder (1991; .32 across 1977 through 1980), Panas (2010; .43 for 2000-2001, 2002-2003, and 2004-2005), and Baumer and Zimbalist (2014; .41 for 1995 through 2001). In contrast, Harder noted year-to-year correlations of .67 for home runs per at bat, and Schutz (1995), examining performance during four five-year periods (1928-1932, 1948-1952, 1968-1972, and 1988-1992) for each batter with at least 100 at bats during every year in one of those periods, uncovered considerably more consistency in a set of power-related indices (home runs, slugging average, RBIs) than average-related indices (batting and on-base averages and runs scored).

In fact, there seems to be a clear demarcation between indices that are and are not consistent across seasons. Using as their data set all batters with at least 200 at bats in consecutive seasons between 1969 and 2003, someone writing in the 2004 Baseball Prospectus noted strong year-to-year correlations for some basic performance indices (strikeouts, .84; stolen bases, .83; walks and home runs, each .75) but lower ones for others (singles and batting outs, each .58; hits, .45; triples, .41; doubles, .38). Along the same lines, based on batters with at least 350 plate appearances in at least two consecutive seasons between 2002 and 2004, J. C. Bradbury and David Gassko (2006) replicated the above for strikeouts (.84), walks (.71), home runs (.76), and home runs per fly ball (.77; see Panas, 2010, for almost identical numbers in the 2000-2005 interim), along with hit by pitches (.71), ground balls (.73), and fly balls (also .73), although surprisingly neither line drives (.10) nor home runs per line drive (.37) showed such consistency. However, the year-to-year correlations for singles and doubles/triples resulting from fly balls (.03 and .22 respectively), ground balls (.16 and .14 respectively), or line drives (.11 and .14 respectively) were quite low. In a follow-up using 2003 through 2006 data with the same 350-plate-appearance inclusion rule, Gassko (2007b) replicated much of the above, along with showing that consistency for batting average on outfield flies (.52) was somewhat lower, and for batting average on ground balls (.22) far lower, than for hitting flies or grounders in the first place. Along the same lines, Baumer and Zimbalist (2014) noted correlations of only .34 for batting average on balls in play but .74 for isolated power, with the latter strongly influenced by number of home runs. Some of their other numbers (.84 for strikeouts per at bat, .77 for walks per at bat) were also consistent with the just-discussed earlier work. The implication is that batter skill influences whether they hit the ball and how they hit it, but, assuming it is a batted ball in play, batter skill has relatively little influence on whether it becomes a hit. Baumer and Zimbalist make a nice argument in that vein: batting average is a function of three influences - home runs per at bat, strikeouts per at bat, and hits on balls in play per at bat - and while the first two of these are fairly stable from year to year, the latter unstable component is responsible for about 70 percent of the batting average figure. In the same vein, Wolfersberger and Yaspan (2015) cited Russell Carleton’s finding that measures tend to stabilize faster, and thus be better predictors, to the extent that the relevant outcome is under batter control. Thus, strikeout rates stabilize at about 60 plate appearances, whereas the triple-slash measures SA (320 PAs), OBA (460 PAs), and BA (910 PAs) take far longer.

In an analogous vein, Willie Runquist (1999) computed a reliability figure for batting measures, based on the proportion of variance among players that was not random, as measured by comparing pooled variance among at bats within players to variance across players for 1996. Note the consistency between these figures and those above despite the difference in computational method:

Measure   BA    OBA   SA    ISO   OPS   SB
A.L.      .52   .73   .73   .82   .68   .92
N.L.      .37   .68   .70   .73   .64   .67

Measure   1B    2B    3B    HR    BB
A.L.      .64   .41   .39   .84   .85
N.L.      .58   .38   .46   .80   .87

It is a shame that Willie did not include strikeouts in the analysis, as the reliability would probably have been high. McShane, Braunstein, Piette, and Jensen (2011) considered the across-season consistency of 50 diverse offensive measures for 1575 players across 8598 player-seasons between 1974 and 2006 and determined that indicators of contact rate (specifically, strikeouts per plate appearance), speed (Bill James’s Speed Score index, described in the Strategy chapter), power (isolated power, although a couple of home runs measures also sufficed), batting eye (walks per plate appearance) and ground ball versus fly ball tendencies (number of either per plate appearance) were the most consistent, rather than measures relevant to batting average per se.

Jim Albert (2005, 2006b, 2007) tried to distinguish how much of each performance measure is based on ability versus luck, with findings quite consistent with Bradbury/Gassko. Jim proposed an analytic technique conceptually analogous to the types of parametric statistical procedures social scientists work with, which decomposes the total variance in performance across batters into two components: the explainable “systematic” variance, in this case representing batters’ true skill, and the unexplainable “error” variance, in this case representing random fluctuation/luck. In the 2005 piece, through analyzing standard offensive statistics for 2003 batters, Jim observed batter strikeouts to be the most strongly ability-based, followed by home run rate, ability to get walks, batting average on balls in play, overall batting average, rate of doubles + triples, and, last and based the most on luck, ability to hit singles. These conclusions for SO, HR, BB, and BABIP were basically replicated for 2005 in the 2006b article. The crux of the matter is that both OBA and SA are more closely associated with runs scored than batting average is. All of these considerations have led the Baseball Prospectus group to refer collectively to strikeouts, walks, and home runs as the Three True Outcomes, because they measure actual batter skill far more accurately than any measure influenced by batting average on balls in play.

For this reason, indices that combine batting average with measures of power and walking skill have been proposed. The two best known are on-base average, which combines the ability to get hits with the ability to get walks (and be hit by pitches, which a few players truly excel at), and slugging average, which fuses the ability to get hits with the ability to get extra bases on those hits. (As these figures are typically presented in the form of averages [e.g., getting on base 4 times in 10 plate appearances yields an OBA of .400] and not percentages [in which case it would be 40.0%], I will refer to them throughout this book by the former term rather than the latter despite popular usage. I will be doing the same for pitcher and team winning averages and fielding averages.) The roots of on-base average go back to an 1879 index called “reached first base,” but it was resurrected, so to speak, by the Dodgers’ 1950s statistician Allan Roth and detailed in an article attributed to his boss Branch Rickey that was published in 1954 and described below. Its modern popularity can be attributed to Pete Palmer’s (1983a) discussion. Baumer and Zimbalist (2014) calculated a year-to-year correlation of .53 for OBA, which is higher than for BA. Earlier, Ben Baumer (2008) had demonstrated that the reason on-base average is more closely associated with runs scored than batting average is that the impact of hits per ball in play, which varies considerably from year to year due to random processes, is greater on BA than on OBA.

Incidentally, in the same article Roth/Rickey introduced Isolated Power, measured by bases gained beyond first base on hits (1 for doubles, 2 for triples, and 3 for home runs), which is a purer measure of raw power than slugging average and very consistent; in the study mentioned earlier, Panas (2010) noted a year-to-year correlation of .76 and, as noted earlier, Baumer/Zimbalist observed it at .74. It is a problematic measure nonetheless: as Reuter (1982b) showed, it decreases with every single, because a single increases the denominator by one at bat but has no impact on the numerator. He proposed the following adjustment:

doubles + 2 (triples) + 3 (home runs)
divided by
at bats minus hits

which is analogous to

slugging average minus batting average
divided by
1 minus batting average
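
To make the algebra concrete, here is a minimal Python sketch (not taken from Reuter; the function and variable names are illustrative). It computes standard isolated power and Reuter’s adjusted version, and expresses the latter through slugging and batting average as well.

# Sketch: isolated power and Reuter's adjustment (names are illustrative).
def iso(doubles, triples, homers, at_bats):
    """Standard isolated power: extra bases per at bat."""
    return (doubles + 2 * triples + 3 * homers) / at_bats

def reuter_iso(hits, doubles, triples, homers, at_bats):
    """Reuter's adjustment: extra bases per at bat that did not produce a hit,
    so a single no longer drags the figure down."""
    return (doubles + 2 * triples + 3 * homers) / (at_bats - hits)

def reuter_iso_from_averages(slugging, batting):
    """The same adjustment expressed via slugging and batting average."""
    return (slugging - batting) / (1 - batting)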

Indices that combine all three skills, either by summing slugging and on-base average (on-base plus slugging, or OPS) or multiplying them, are even more closely related to run scoring and, happily, are becoming more commonly referred to in newspaper and on-air reporting with each passing season. Any serious attempt at measuring offensive performance must begin with some combination of the three major skills. Rickey/Roth did produce such a measure, the Batting Rating, consisting of OBA + ¾ ISO, the fraction representing a slightly smaller relative impact for power. However, OPS’s year-to-year correlation in Baumer and Zimbalist’s (2014) analysis was .57, again not particularly high. Nonetheless, as we will see soon, Rickey/Roth were very much on the right track.

Note that I’ve thus far not said anything about stolen bases as an offensive weapon. This is because in general stolen bases do not have much of an impact on run scoring. There are occasional exceptions when a truly outstanding base thief is in his prime, but these are the exceptions rather than the rule. Nonetheless, stolen bases and caught stealing should be included in a comprehensive index for measuring a player’s offensive performance.

Perhaps, however, this entire tactic is wrongheaded. If the goal is to score runs, and one wants to measure a player’s contribution to a team’s success in this area, then perhaps one should consider a player’s runs batted in and/or runs scored. Indeed, players with high totals in those areas are usually significant contributors to their team’s offense. For this reason, “runs produced” (RBI plus runs scored minus home runs) has been used as a performance measure from time to time. However, note the very first sentence in this chapter; we wish to measure player offensive ability independently of their teammates’. A player’s runs batted in and runs scored totals are fundamentally dependent on their teammates. With the exception of home runs, a player can neither score nor drive in a run unless there is another player to respectively drive them in or be on base when they hit. There are a lot of examples of players whose individual performance was quite ordinary but who boasted gaudy runs batted in totals due to batting with an inordinate number of base runners, or runs scored totals due to batting in front of particularly prolific hitters. There are also quite a few cases of players with excellent individual indices but relatively few runs scored (due to poor hitters behind them) or RBIs (common among leadoff hitters). In truth, a player’s number of runs scored and runs batted in are both strongly influenced by their lineup position. For this reason, it makes more sense to evaluate a player’s runs scored and runs batted in in the context of opportunity by, for example, dividing RBI by the number of baserunners aboard during a batter’s plate appearances.

All of the above is not to say that there is no skill involved in hitting singles; to say that there is not is to imply that the difference between Tony Gwynn and Rob Deer was due to luck. In addition, a majority of most players’ hits are singles. For this reason, hitting for average must be included in any serious attempt at measuring offensive performance, along with the other fundamental offensive skills of hitting for power, collecting walks and hit by pitches, and stealing bases without being caught, while ignoring raw runs scored and runs batted in totals. In the following sections, I will describe a large number of these attempts, classify them according to their most basic assumptions, and report on studies that have compared their degree of association with run scoring in an attempt to determine their relative value. I have divided them into the following categories:

1 – Methods based on run expectancy tables. In short, these use run expectancies from the set of base-out situations to estimate the relative values of a set of offensive measures (e.g., home runs, walks, stolen bases) and then use these values to compute an overall measure of offensive performance through a regression equation. In a regression equation relevant to offense in baseball, each of the offensive measures is multiplied by a weight (the regression coefficient) indicating the contribution that measure makes to the index representing the batter’s offensive performance. The products are then summed. In addition, a constant is usually either added to or subtracted from this sum. The final total is the number of runs credited to the batter. The form of the equation is

Number of runs = constant + (weight) home runs + (weight) walks + (weight) steals … [other indices such as number of singles, doubles, triples, hit by pitches, etc.]

Weights can be negative, resulting in products that subtract from predicted runs; this would be the case for number of times caught stealing and number of times grounding into double plays. As with all measures, in order to evaluate their accuracy, one can compare the number of runs that the formula predicts for given players with their actual totals; the closer the prediction, the better the measure.

2 – Bottom-up regression methods. These differ from top-down regression methods (to be described next) by calculating values (regression coefficients) for each measure (e.g., home runs) through determining the relationship between each measure and total run production while holding the other measures (singles, doubles, etc.) constant. One typically begins with the total number of singles, doubles, etc. hit by entire leagues in a given season, along with total runs scored. One can then calculate regression coefficients representing the impact of each offensive measure relative to runs scored. The resulting equation is then used to estimate the offensive performance of individual players. Bottom-up regression methods run into two problems. First, using an equation that works well for league run production to evaluate individual players presumes that team run production is a simple additive function of player offensive performance; in other words, add up the players and you have the team. This is almost but not quite true, because there will be at least a small “assembly effect” for team offensive performance over and above what the players provide individually. A player who gets on base 40 percent of the time will use up fewer outs and thus allow a greater number of at bats for subsequent hitters in the lineup than a player who gets on base 30 percent of the time, allowing those subsequent hitters opportunities to score additional runs for their team. Thus the impact of the former on team run production is found not only in their own performance but also in that of subsequent hitters. As such, a player on a good hitting team will have more plate appearances and thus produce more runs than the same player on a poor hitting team. The accuracy of these methods can be excellent, however, which demonstrates that assembly effects are generally fairly small. They tend to be a problem for teams with either unusually high or low levels of offensive performance.

A second problem with both run expectancy and bottom-up regression methods is often overlooked by those who propose and advocate them. The regression coefficients in any bottom-up regression method are based on data from a specific era in baseball. Eras of course differ significantly in their run-scoring environment, and this has a big impact on the relative size of the coefficients. For example, a home run is “really” worth the number of runs that it scores, anywhere from 1 to 4. In an era with a relatively high on-base average, the typical home run will be hit with more runners aboard and thus be worth more runs than homers in low on-base average times. Analogously, a single will be worth more in a high on-base average environment because it is more likely to either drive in a run or end with its batter scoring on a later hit than in a low on-base average environment. It is not unusual for those who propose bottom-up regression methods in particular to claim that their method is better than others, using as evidence the relative predictive performance among the methods compared. This “demonstration” is circular and demonstrates nothing, because by definition any bottom-up regression method will be the best for representing the data on which it is based. It will certainly do worse than others when applied to different seasons. Tom Tango has discussed this issue at some length in articles on the tangotiger.com website.

3 – Top-down regression methods. The first two methods begin with data and compute values for offensive measures based on these data. Top-down methods begin with a theoretical conception of the game and base regression equations on that conception. Compared with the previous two types, they will usually be somewhat poorer at representing offensive performance for a given season, but they are easier to understand and use and, relatively speaking, should be less affected by differences in baseball era.

4 – A team of a given player. This is a poor title for a method that imagines that all of a team’s at bats were assigned to a specific player, and estimates the number of runs that team would then score.

5 – Comparison to replacement level. Rather than provide a measure of a player’s performance independently of other players, these attempt to estimate the number of runs or wins an offensive player would contribute to their team relative to a “replacement level player.” A replacement level player is an abstract representation of the weakest regular player in major league baseball, for whom there is likely to be a better player either on a major league bench or a high-level minor league team who, in an ideal world, would replace him.

6 – A set of miscellaneous methods, some of value and some not.

I’ve not attempted here to be totally complete; see Malcolm (1999) for some not included here.


Methods Based on Run Expectancy Tables

Run Producing Average and its Progeny

In the chapter on the inning, I presented George Lindsey’s (1963) table relevant to the run expectancy for various base-out situations. Lindsey noted later in that article how one can use these data in order to calculate the overall number of runs that a given event will supply on average. At that time, Lindsey presented the following equation:

.41 “single” + .82 double + 1.06 triple + 1.42 home run

and used it to evaluate the 1960 performance of Harvey Kuenn, Al Kaline, Rocky Colavito, and Harmon Killebrew. However, Lindsey’s “single” included all circumstances in which hitters reach first base, in so doing not distinguishing actual singles from walks and hit by pitches. In later reviews of his work (1977, 1994), Lindsey named this method Run Producing Average (RPA) and credited walks and HBPs with .33 runs and singles with .46 runs.

Mark Pankin’s (1978) Offensive Performance Average (OPA) piggybacks on and purposely simplifies Lindsey’s work. His equation is

Singles + 2 (doubles) + 2.5 (triples) + 3.5 (home runs) + .8 (walks and hit by pitches) + .5 (stolen bases)
divided by
at bats + walks + hit by pitches

Over the years 1965 through 1975, OPA measured at the team level correlated with team runs per game at .957, significantly higher than batting average (.794), slugging average (.906), and a top-down method proposed by Earnshaw Cook called DX and described below (.891), and about as well as a “team of a given player” method by Cover and Keilers, Offensive Earned Run Average, also described below (.958). Mark also mentioned the possibility of adding to the numerator coefficients for outs at bat (at bats minus hits; -.65) and caught stealing (-.75), which together raised the correlation to .963.

The most well-known offense evaluation method belonging in this category is Pete Palmer’s Batting Runs, due to its featured status in Thorn and Palmer (1984). Pete got the coefficients through simulations based on all major league games played between 1901 and 1977. The original formula was:

.46 (singles) + .80 (doubles) + 1.02 (triples) + 1.40 (home runs) + .33 (walks + hit by pitches) + .30 (stolen bases) - .60 (caught stealing) – .25 (outs at bat) - .50 (outs on base)

When working with older data, caught stealing and outs on base are often unavailable. Given that stolen bases are not useful without the corresponding caught stealing information, the formula became:

.47 (singles) + .78 (doubles) + 1.09 (triples) + 1.40 (home runs) + .33 (walks + hit by pitches) – .25 (outs at bat)

In the 2006 Baseball Encyclopedia, Pete acknowledged that the value of each event would differ between high and low offense contexts. With more data to work with, his formula changed to:

.47 (hits) + .85 (doubles) + 1.02 (triples) + 1.40 (home runs) + .33 (walks + hit by pitches) – (ABF x outs at bat)

with ABF standing for “average batting factor.” ABF can be computed by dividing the formula above (except for that final term) by outs at bat (see Costa, Huber, and Saccoman, 2008 for this newer version). In a personal communication, Pete told me that the replacement of singles with hits freed him from having to calculate the number of singles whenever he used the formula.

Due to the absence of detailed data, all of the above (as is much of this work) had to be based on the assumption that the likelihood of events is independent of base-out situation, which is obviously false. In 2017, Pete Palmer used 1946-2015 Retrosheet data to determine the current formula:

.453 (single) + .752 (double) + 1.038 (triple) + 1.413 (home runs) + .31 (unintentional walks) + .157 (intentional walks) -.241 (outs)

The figure for unintentional walks is an estimate, as Pete actually provided a combined intentional/unintentional value of .298. The reason that intentional walks are worth so much less than unintentional ones is that the former tend to occur in circumstances in which their impact on runs is smaller, particularly with runners on second, third, or both of those bases; IBBs are issued in more than two percent of plate appearances in those situations (the highest being second and third with one out, at more than 12 percent) but in fewer than one percent of plate appearances in all other situations (thanks to a personal communication from Pete helping to explain all of this).
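
To make the arithmetic concrete, here is a minimal Python sketch that applies the 2017 coefficients quoted above to a hypothetical season line; the stat-line field names and the example numbers are illustrative assumptions, not anything Palmer published.

# Sketch: linear-weights Batting Runs using the 2017 coefficients quoted above.
WEIGHTS_2017 = {
    "singles": 0.453, "doubles": 0.752, "triples": 1.038, "home_runs": 1.413,
    "unintentional_walks": 0.31, "intentional_walks": 0.157, "outs": -0.241,
}

def batting_runs(stat_line):
    """Sum of (event count x run weight) over the events in the formula."""
    return sum(WEIGHTS_2017[event] * stat_line.get(event, 0) for event in WEIGHTS_2017)

# Example: a hypothetical 600-PA season line (168 hits, 60 walks, 372 outs at bat).
season = {"singles": 110, "doubles": 30, "triples": 3, "home_runs": 25,
          "unintentional_walks": 55, "intentional_walks": 5, "outs": 372}
print(round(batting_runs(season), 1))   # -> 39.0 runs for this line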

Phil Birnbaum (1999a), based on simulations using the team-as-a-given-player method, i.e., seeing how many runs would be scored by a lineup of Barry Bondses versus a lineup of Mario Mendozas and then comparing that total to the Batting Runs estimate for that player transformed into a runs-per-game total, concluded that Batting Runs works well for the average player (those who, if given all their team’s at bats, would score from 3 to 7 runs per game) but underpredicts the very poor (Mendoza) and very good (Bonds). Soon after (2000b), Phil showed that the formula does not do a good job of predicting the outcomes of games; it badly underpredicts runs scored when a team scores more than 11 or fewer than 2 runs, and overpredicts run totals in between. As will be noted later in this chapter, this is also the case for other such indices. This is because, as described earlier, run scoring is not a linear function of hitting. For example, it would not be surprising for a team to score one run if it got five hits. But maintaining that five-to-one ratio quickly becomes absurd. Two runs scored on ten hits does happen, but is noticeably underproductive. How about three runs on fifteen hits? Four runs on twenty hits? Runs happen when hits (and walks, and extra bases) do not occur randomly over innings but are bunched together.

One can divide Batting Runs for a given season by the calculated number of runs it takes for a team to win an additional game that season (which varies a bit from year to year, but is usually about 10) to get Batting Wins, the number of team wins the player can be said to be responsible for. You can then combine it with Fielding Wins and Baserunning Wins to get Total Player Rating, which is described in the Overall Evaluation chapter.

Gary Skoog’s (1987) Value Added approach is very simple (see discussion by Tom Ruane, 2005a). He used the Palmer run expectancy table, but the method would work just as well with an alternative. Basically, one takes the run potential at the end of a plate appearance, subtracts from it the run potential at the beginning of the plate appearance, and adds any runs that scored during the PA. If the result is positive, the player has contributed to run scoring, and if it is negative, the player has damaged run scoring. Each inning, the lead-off hitter is charged with .454 as the before-PA value, which is the mean run potential for no baserunners/no outs. The batter making the third out in an inning ends with a 0, meaning that they cannot have a positive contribution unless a run scored during the inning-ending event. It is important to remember that one cannot simply use the run potential at the beginning of a plate appearance when making these calculations, because various events can occur during a PA that change the base-out situation (SB, CS, WP, PB, balk). Instead, one must use the run potential just before the “event” (e.g., walk, hit, out) that ends the PA. Stolen bases and caught stealing are credited to the baserunner. Getting on base on an error is not credited to the batter. The batter does get credit for baserunners getting extra bases on hits (e.g., first to third on a single), which Skoog was not comfortable with and invited discussion by interested analysts. Jim Albert (2001) recreated the Skoog method using 1987 National League data gathered by Project Scoresheet, used it to estimate team run scoring per game, and then compared those estimates to actual team runs per game using the root mean square error (RMSE) as a goodness-of-fit measure. Its RMSE was .067, compared to .121 for Batting Runs, .202 for Bill James’s Runs Created (described later), .212 for OPS, and .242 for OBA.
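
A minimal sketch of that bookkeeping for a single plate appearance, assuming a run-expectancy table (here called run_exp, keyed by base state and outs) is available; the names and the example are illustrative, not Skoog’s or Ruane’s code.

# Sketch of Skoog-style value added for one plate appearance. run_exp is
# assumed to map (base_state, outs) -> expected runs for the rest of the inning.
def value_added(run_exp, state_before_event, state_after_pa, runs_scored,
                made_third_out=False):
    """Run potential after the PA, minus run potential just before the event
    that ended the PA (so mid-PA steals, wild pitches, and the like are set
    aside), plus any runs that scored on that event."""
    after = 0.0 if made_third_out else run_exp[state_after_pa]
    return after - run_exp[state_before_event] + runs_scored

# Example: a bases-empty, nobody-out home run is worth exactly one run, since
# the base-out situation is unchanged:
#   value_added(run_exp, ("empty", 0), ("empty", 0), 1) == 1.0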

Lieff (1998) proposed a method intended to simplify Skoog’s (who commented positively about it in an afterword), which I will further simplify in my description. He defined a function he called “C” as the run potential for a given base-out situation corrected for the impact of the number of outs on expected runs for the rest of the inning. The values of C for the 24 base-out situations are as follows:

Bases Occupied      0 Outs    1 Out     2 Outs
None                -.454     .600      1.602
First               .217      1.371     2.488
Second              -.068     1.150     2.349
Third               -.277     .952      2.315
First and Second    .620      1.961     3.240
First and Third     .361      1.761     3.203
Second and Third    .054      1.478     3.036
Loaded              .746      2.303     3.899

In addition, one needs the C function for the end of the inning: 2.547 if no one on base, 3.547 if one runner, 4.547 if two runners, and 5.547 if three runners. To determine the number of runs for which a given player is responsible over a season:
1 – for each of the 24 base-out situations, multiply the number of plate appearances for the base-out situation before the event occurred by (C + 1)
2 – sum these 24 products
3 – for each of the 24 base-out situations and 4 inning-ended situations, multiply the number of plate appearances for the base-out situation after the event occurred by C
4 – sum these 28 products
5 – subtract the results of step 4 from the results of step 2.
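
Read as a computation, the five steps amount to the following minimal Python sketch (the container names are illustrative), which assumes the C values above have been loaded into lookup tables.

# Sketch of the five-step Lieff calculation. c_table maps each of the 24
# base-out situations to the C values above; c_end maps runners left on base
# when an inning ends (0-3) to the inning-ended values (2.547 ... 5.547).
def lieff_runs(before_counts, after_counts, inning_end_counts, c_table, c_end):
    """before_counts / after_counts: plate appearances per base-out situation
    just before and just after the batter's events; inning_end_counts: events
    that ended an inning, keyed by the number of runners left on base."""
    step_2 = sum(n * (c_table[s] + 1) for s, n in before_counts.items())
    step_4 = (sum(n * c_table[s] for s, n in after_counts.items())
              + sum(n * c_end[r] for r, n in inning_end_counts.items()))
    return step_2 - step_4   # step 5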

Tom Tango (Tango, Lichtman, and Dolphin, 2006) took his estimates of the run value of events as described in the Inning chapter and multiplied each coefficient by 1.15, resulting in the following equation for weighted On-Base Average (wOBA):

.72 (non-intentional walks) + .75 (hit by pitches) + .90 (singles) + .92 (reach base on error) + 1.24 (doubles) + 1.56 (triples) + 1.95 (home runs)
divided by
plate appearances

The point of the 1.15 is that it calibrates wOBA such that it can be interpreted as one interprets OBA, with .340 denoting about average, .400 excellent, and below .300 poor. Tom also included an easier to compute approximation:

2 (on-base average) + slugging average
divided by
3

This works fairly well because, as we shall see soon, OBA is actually close to two times as critical in run scoring as SA. Baumer and Zimbalist (2014) included wOBA in their work, revealing a year-to-year correlation of .57, virtually the same as for OPS. Tom has come to realize that the weights are dependent on run environment and, if you want an exact figure, you need to use weights specific to a given season.
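
In code, the published weights and the shortcut look like this; a minimal sketch with illustrative argument names for the season totals involved.

# Sketch of the wOBA formula quoted above, plus the quick approximation.
def woba(uibb, hbp, singles, reached_on_error, doubles, triples, homers, pa):
    numerator = (0.72 * uibb + 0.75 * hbp + 0.90 * singles
                 + 0.92 * reached_on_error + 1.24 * doubles
                 + 1.56 * triples + 1.95 * homers)
    return numerator / pa

def woba_approximation(oba, slugging):
    """Tango's shortcut: (2 x on-base average + slugging average) / 3."""
    return (2 * oba + slugging) / 3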

FanGraphs has adopted wOBA and has used it as the basis for their weighted Runs Above Average (wRAA) index. To calculate it for a given player, one would
1 – subtract league average wOBA from that player’s wOBA,
2 – divide by a seasonally-adjusted wOBA scale coefficient (1.26 for 2011), which gives you a value for each plate appearance, and
3 – multiply by plate appearances.
By construction, league average wRAA is always 0. Then, using the 10-runs-equal-1-win formula, FanGraphs uses wRAA to represent batting in their version of WAR (fWAR), as discussed in the Overall Evaluation chapter.
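
Those three steps reduce to a couple of lines; a sketch, with the 2011 scale value used only as an illustrative default.

# Sketch of the three wRAA steps described above.
def wraa(player_woba, league_woba, plate_appearances, woba_scale=1.26):
    runs_per_pa = (player_woba - league_woba) / woba_scale   # steps 1 and 2
    return runs_per_pa * plate_appearances                   # step 3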

Baseball Prospectus goes one more step, turning wRAA into True Average (TAv), which replaces runs above average with an index scaled like batting average to make interpretation easier. Traditionally, .260 has been taken as league average BA, so the computation was to
1 – divide wRAA by plate appearances, in so doing going back to the results of step 2 of the wRAA calculation,
2 – multiply by .9, and then
3 – add .260.
More recently, BP has used league average BA for each season rather than .260.

Tom Tango based Weighted Runs Created (wRC) on wOBA, defined by

([player wOBA – league wOBA] divided by a seasonal adjustment called “wOBA scale”)
plus

([league runs scored/league plate appearances] X player plate appearances)

wRC is very similar to wRAA. In addition, we have a park and league adjusted version of wRC, wRC+, defined by


(the results of the numerator for wRC, plus [what I will call A, defined as league runs scored divided by league plate appearances])
plus
(A – [park factor X A])
all divided by
(league wRC divided by [league plate appearances minus those by pitchers])
all multiplied by 100
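
Spelled out as a computation, that definition looks like the following sketch; wraa_per_pa stands for the per-plate-appearance quantity from step 2 of the wRAA calculation, which is how I read “the numerator for wRC” above, and the remaining names are illustrative.

# Sketch of the wRC+ definition above. a = league runs per plate appearance;
# league_wrc_per_pa is league wRC per plate appearance, excluding pitchers.
def wrc_plus(wraa_per_pa, a, park_factor, league_wrc_per_pa):
    numerator = (wraa_per_pa + a) + (a - park_factor * a)
    return numerator / league_wrc_per_pa * 100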

The FanGraphs website https://www.fangraphs.com/library/offense/wrc/ provides a summary of wRC and wRC+, along with the following evaluation scale (I changed the second rating term):

Rating           wRC   wRC+      Rating           wRC   wRC+
Excellent        105   160       Below Average    60    80
Good             90    140       Poor             50    75
Above Average    75    115       Awful            40    60
Average          65    100

As for miscellaneous run expectancy methods, D’Angelo (2010) describes what he calls Markov runs. Imagine that a player gets all of his team’s at bats. The number of runs that team would score in a game on average as computed through Markov analysis is that player’s Markov runs. D’Angelo cites Jeff Sagarin’s work with USA Today in computing Markov runs for players during the season. According to an abstract, Robinson (2017) used simulated run distributions to estimate players’ offensive value to their team. Saavedra, Powers, McCotter, Porter, and Mucha (2010) concocted a statistically-sophisticated evaluation system based on the run expectancy for specific batter-pitcher matchups. They presented findings using all Retrosheet data between 1954 and 2008. The results of their model correlated almost perfectly (.96) with an index based on overall run expectancy, implying that in practice their method is more trouble than it is worth. This truly is a methodological study only.

There are problems with all of these methods that Pudaite (1988) pointed out in the context of the following approach.

Player Win Average and Player Game Percentage

Player Win Averages were proposed by two brothers, Eldon and Harlan Mills, as a measure for singling out clutch players as distinct from their overall performance in a self-authored 1970 book (their work is described in detail in Jim Albert and Jay Bennett’s far-easier-to-find Curve Ball [2001], and is available on line at https://trace.tennessee.edu/utk_harlan/6/). As I will note below, it fails at this goal but was a significant milestone in the advancement of evaluation techniques for both batters and pitchers. The Mills brothers began with play-by-play data for the entire 1969 season, and used it as raw data for computer simulations of thousands of games, resulting in the odds of both home and away team winning based on half-inning, base-out situation, and run differential between teams; a total of more than 8000 possible game situations. (Pete Palmer and John Thorn noted in The Hidden Game of Baseball that these odds would be untrustworthy for situations that appear relatively rarely, due to poor sample size in the original data set.) Next, the brothers calculated the change in odds for winning and losing (which of course are complementary) resulting from every play during the 1969 season, assigning Win Points to the player responsible for increasing their team’s odds and Loss Points to the player decreasing theirs. Through multiplying the change in win probabilities by 20, the points are computed on a 1000 point scale, so that a one-percent increase in odds counts as 10 Win Points. To repeat their example, Bobby Thomson’s home run heard around the world would earn Thomson 1472 Win Points (the odds of a Giants win were about 26 percent before the play and 100 percent afterwards) and Ralph Branca 1472 Loss Points. Fielding errors are charged as Loss Points to the fielder rather than the pitcher. In addition, a fielder who makes a particularly good defensive play on a hard hit ball may be given Win Points for the out rather than the pitcher. A seasonal index, the Player Win Average, is computed by dividing the player’s total Win Points by the sum of their total Win and Loss Points, such that 1.0 would be the best possible performance and 0 the worst. Based on their 1969 data, it seems that the realistic range is from .7 to .3.

As I noted above, this method does not succeed in distinguishing clutch players from overall good performers. The authors believed it did because Win and Loss Points are greater for events at the most critical times in the game. However, as I will discuss at length in the next chapter, in order to achieve this goal, one would have to demonstrate a reliable difference in player performance between clutch and non-clutch situations. The Mills brothers did not do so. If one looks at their 1969 Player Win Average leaders, one sees the best hitters of the day; for example, for National League hitters with at least 300 at bats, Willie McCovey ranked first, Pete Rose second, Dick Allen third, Rico Carty fourth, Willie Stargell fifth, Roberto Clemente sixth, and Henry Aaron seventh. For starting pitchers, the top seven were Larry Dierker, Tom Seaver, Jerry Koosman, Bob Gibson, Juan Marichal, Phil Niekro, and Steve Carlton. That’s 10 Hall of Famers (if one includes Rose) out of the top 14. Nonetheless, the idea of evaluating players based on the direct impact of their performance on winning and losing is sound.

Pudaite (1988), in a thoughtful discussion of the method, made additional criticisms. Pitchers are given total credit or blame for defense whether or not fielders make plays on balls they ought to, and batters get total credit or blame for baserunner advancement whether the baserunner is or is not fast enough to get extra bases on hits. In addition, the method assumes that subsequent hitters are league average, with a nice example of why that is a problem: with a runner on third and two outs, a walk to the number 8 batter is worth less than usual if the next batter is a weak-hitting pitcher unlikely to knock the baserunner home. The baserunner and subsequent-hitter issues are relevant not only here but to all well-known run expectancy methods, including those already described here.

Jay Bennett and John Flueck’s (1992) Player Game Percentage (also detailed in Curve Ball) was an explicit development of the concept behind Player Win Averages. It differed as follows: First, rather than use a 1000 point scale, Bennett and Flueck used the actual change in probabilities following from a game event as their index. Second, rather than dividing by total number of points, Bennett and Flueck divided by the number of games played. Third, rather than estimates based on computer simulations, Bennett and Flueck employed the Lindsey run expectancy charts as the basis for direct computations of changes in probabilities. Their method allows for a determination of the impact of a player on a team’s won-loss record compared to an average player through adding Player Game Percentage to the team’s record. Bennett (1993) used Player Game Percentages in a 100 game simulation of Joe Jackson’s 1919 World Series performance to reveal that, if anything, Jackson performed better in “clutch” situations than one would have expected by chance during the Series, despite his acceptance of the gamblers’ dirty money.
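
Stripped to its core, the bookkeeping is simply the following sketch, which assumes the win-probability estimates themselves come from elsewhere (e.g., a Lindsey-style table); the names are illustrative.

# Sketch of the Player Game Percentage bookkeeping. Each entry in
# win_prob_changes is the team's win probability after an event the player
# was responsible for, minus the probability just before it.
def player_game_percentage(win_prob_changes, games_played):
    return sum(win_prob_changes) / games_played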

Keith Woolner (2006; see also Nate Silver, 2006c) used his concept of Win Expectancy, which was originally designed for computing the odds for winning a game given the game situation (as discussed in The Inning chapter), to show how batters could be evaluated through its use; the method is analogous to the previous two.

Bottom-Up Regression Methods

What I call a bottom-up regression method is one in which the weights for each component are calculated based on their relative contribution to the total number of runs scored by a team or league over a given set of years. I call them bottom-up because they start with the data and then work up to the regression equation. As mentioned earlier, any bottom-up regression method will always be based on a particular period of time, and will be an extremely accurate predictor of player production during that period of time but not necessarily for any other. Thus, any claims about the superiority of a bottom-up regression method must be taken with a grain of salt. Further, when it is required, the need to add or subtract a constant to make the equation work lends inconvenience and complexity to the method and decreases its elegance.

The first bottom-up regression method was quite literally more than half a century before its time. F. C. Lane was a well-known baseball writer in the earlier decades of the 20th century. In two 1917 articles in Baseball Magazine during the years of the first World War, Lane laid out the same arguments that a statistical baseball researcher would make today. At the beginning of this series, Lane explained why batting average is a poor measure of offensive performance. It considers all hits of equivalent value, and to consider all hits equal is to consider all coins to be of equal value whether a penny or a half dollar. In truth, “A hit is valuable in so far as it results in a score. The entire aim of a baseball team at bat is to score runs.” Therefore, different types of hits should be weighted according to their value in scoring runs. Of course, sometimes a single leads to multiple runs and other times to none, so “The only rule to be applied is the average value of a hit in terms of runs produced under average conditions during a season” (all quotes from page 54 of the January 1917 issue).

Lane took careful measure of the impact of each hit during 62 games he observed during the 1916 season. Based on the odds of the batter, baserunners, and future baserunners scoring (in the case of force plays, in essence replacing the batter with a teammate on base), Lane estimated that a single is worth 45.7% of a run, a double 78.6% of a run, a triple 115% of a run, and a home run 155.1% of a run. The relative size of these estimates is remarkably close to current calculations. In a follow-up (1917, March), Lane extolled the importance of walks and calculated their impact as 25.4% of a run.

As noted, it took more than half a century for another bottom-up method to appear. In 1977, Steve Mann (listed in Furtado, 1999b) proposed his Run Productivity Average, which is based on the value of each event leading to the player under evaluation either scoring a run or driving a run in. It has the following formula:

.51 (singles) + .82 (doubles) + 1.38 (triples) + 2.63 (home runs) + .25 (walks) + .15 (stolen bases) - .25 (caught stealing)

divided by
plate appearances

with .016 then added to the total. The regression coefficients look like others described here except for home runs, which is so much higher because a batter gets both an RBI and a run scored for homering. But as a homer by itself produces only one run, the formula is misleading in that regard.

Bennett and Flueck’s (1983) Expected Run Production Average is a classic bottom-up regression method, in the sense that it nicely illustrates the method. The researchers used data from 1969 to 1976 to compute the following regression equation:

-.67 + .499 (singles) + .728 (doubles) + 1.265 (triples) + 1.449 (home runs) + .353 (walks) + .362 (hit by pitches) + .126 (stolen bases) + .394 (sacrifice flies) - .305 (grounded into double plays) - .085 (outs at bat).

Caught stealing and sacrifice bunts were originally included in this list but, as they did not improve predictive accuracy over these ten, they were not included in the equation, as is proper for regression.
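
For readers who want to see what such a fit involves, here is a minimal sketch (not Bennett and Flueck’s code) that regresses team-season runs on event totals by ordinary least squares; dropping non-contributing predictors, as they did, would be a separate step.

import numpy as np

# Sketch of fitting bottom-up linear weights from team-season totals.
# Each row of event_counts holds one team-season's totals of singles,
# doubles, triples, home runs, walks, and so on; runs holds that
# team-season's runs scored.
def fit_linear_weights(event_counts, runs):
    """Ordinary least squares with an intercept; returns (constant, weights)."""
    X = np.asarray(event_counts, dtype=float)
    y = np.asarray(runs, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # prepend the constant term
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[0], coefs[1:]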

Paul Johnson’s fairly well known Estimated Runs Produced is a top-down method to be described later. Two later bottom-up versions apparently were described in the 1991 STATS Baseball Scoreboard book and appear in Furtado (1999b):


.48 (singles) + .80 (doubles) + 1.12 (triples) + 1.44 (home runs) + .32 (walks + hit by pitches) + .16 (stolen bases) - .10 (outs)

and

.318 (total bases) + .333 (walks – intentional walks + hit by pitches – caught stealing – grounded into double plays) + .25 (hits) + .2 (stolen bases) - .085 (at bats)

Blass (1992), based on 284 team-seasons between 1976 and 1986, computed the following:

.32 (walks) + .49 (hit by pitches) + .52 (singles) + .63 (doubles) + .85 (triples) + 1.38 (home runs) + .17 (stolen bases) – .21 (grounded into double play and caught stealing combined) + .69 (sacrifice flies) - .9 (out at plate)

The -.21 seems too high; but anyway, Blass claimed that the formula accounted for 84 percent of variance in team runs.

Furtado’s (1999a) Extrapolated Runs (XR), a mainstay in the long-running Big Bad Baseball Annual, came in several different versions. The “basic” version is:

.34 (walks) + .5 (singles) + .72 (doubles) + 1.04 (triples) + 1.44 (home runs) + .18 (stolen bases) – .32 (caught stealing) – .096 (at bats minus hits)

Dividing XR by plate appearances gives you Extrapolated Average (XAVG; defined in Walker, 1999b). The average in that era seemed to be .125, with only a handful of players over .200. That would obviously change depending on the general offensive environment. If we multiply .125 by 600 plate appearances per season, we end up with 75 XR as a league average. Using a measure of extrapolated runs per 27 outs, Baumer and Zimbalist (2014) concluded that its year-to-year correlation was .61.

Phil Birnbaum (1999a), in an article in which he examined Batting Runs, Estimated Runs Produced, and Runs Created (a very important top-down method to be described shortly), proposed the aptly named Ugly Weights, which was purposely intended to be as accurate as possible across the entire spectrum of performance at the expense of any simplicity and of making any intuitive sense whatsoever. It is:

.46 (singles) + .80 (doubles) + 1.02 (triples) + 1.4 (home runs) + .33 (walks) + .3 (stolen bases) - .5 (caught stealing)
– {[.687 (batting average) – 1.188 (batting average squared) + .152 (isolated power squared) – 1.288 (walks divided by at bats)(batting average) - .049 (batting average)(isolated power) + .271 (batting average)(isolated power)(walks divided by at bats) + .459 (walks divided by at bats) - .552 (walks divided by at bats squared) - .018] outs}


In his simulation, it works well all the way from 1 to 15 runs per game. But, for the same reason I discussed earlier for Batting Runs, Phil showed (2000b) that it works poorly for individual games, overpredicting when teams score up to eleven runs and underpredicting when they score more than eleven.

Scott Berry (2000c) is a statistician who for many years wrote a column on baseball statistics for a journal called Chance. Scott performed a multiple regression for nine American League teams for 1998 and found team runs to be predicted by:

.34 (walks) + .49 (singles) + .72 (doubles) + 1.14 (triples) + 1.51 (home runs) + .26 (stolen bases) – .14 (caught stealing) – .10 (out at plate)

Later (2006a), based on data for all teams from 1995 to 2004, Berry came up with

.30 (walks) + .59 (singles) + .71 (doubles) + .91 (triples) + 1.48 (home runs) + .27 (stolen bases) – .20 (caught stealing) – .14 (out at plate)

This latter version accounted for 92.7 percent of the variance in team runs scored with a standard deviation of 24.3 runs.

In his second book on ranking hitters, Schell (2005) proposed what he called Event Specific Batting Runs based on data from 1947 through 2003:

.32 (walks + hit by pitches) + .52 (singles) + .80 (doubles) + 1.11 (triples) + 1.52 (home runs) -.273 (at bats minus hits)

Jim Albert (2007) calculated a model based on independent estimates of the Three True Outcomes (strikeouts, walks, home runs) along with batting performance on balls in play. Based on all 30 teams in 2005:

-3.2 + 13.2 (walks per plate appearance) – 12.3 (strikeouts per at bat) + 40.9 (home runs per batted ball) + 24.5 (singles + doubles + triples per batted ball in play).

The regression accounted for more variance (.83) than both OPS (.77) and Runs Created (.82) for 2005, and generally did so for earlier seasons back to 1950. Distinguishing doubles and triples from singles only improved prediction by 1 percent.

Total Base Average (TBA; Heeren & Palmer, 2011) is Dave Heeren’s attempt to evaluate a batter according to the mean number of bases that each event advances either him or any baserunners. It consists of

5.5 (homers) + 4.5 (triples) + 3.3 (doubles) + 1.8 (singles) + 1.4 (unintentional walks and hit by pitches) + 1.2 (sacrifice bunts and sacrifice flies) + (intentional walks and stolen bases) – (caught stealing and ground into double plays)

It is very easy to find serious fault with this formula. It clearly overvalues every event that costs outs, even if they advance baserunners and particularly if they do not. Equating sacrifice bunts, which cost runs, with sacrifice flies, which earn them, is indefensible. Heeren claimed that a given player’s TBA will be fairly close to his OPS; perhaps 50 points higher if he is a prolific and efficient base stealer and 50 points lower for players guilty of too many of the two negative events.

Lanning (2010), in an article on the impact of integration on team profits during the early 1950s, applied a method he called Expected Runs Produced per At Bat (ERP), with the formula

.16 X ([3 X singles] + [5 X doubles] + [7 X triples] + [9 X home runs] + [2 X walks] + stolen bases – [.61 X outs made])

I cannot find any other mention of this method. The author referred to Paul Johnson’s Estimated Runs Produced, but that is an obviously different method. In any case, it looks like a mix of top down (weights for most of the factors) and bottom-up (the weight for outs made, plus the .16) computations.

For all of these, Reuter (1982a) made the interesting point that the ballpark relative values of different offensive events are 1 for walks, 2 for singles, 3 for doubles, 4 for triples, and 5 for home runs. As such, the following formula describes them in general:

walks + 2 (singles) + 3 (doubles) + 4 (triples) + 5 (home runs)
divided by

plate appearances

which translates to

walks + hits + total bases
divided by
plate appearances

which would be close to OPS if slugging average used PA as the denominator rather than at bats.
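As a quick check of that equivalence, here is a small Python sketch (variable names are mine) showing that the 1-2-3-4-5 weighting reduces to walks plus hits plus total bases:

def reuter_rate(walks, singles, doubles, triples, home_runs, plate_appearances):
    # 1-2-3-4-5 weighting of on-base events, per plate appearance
    return (walks + 2*singles + 3*doubles + 4*triples + 5*home_runs) / plate_appearances

def walks_hits_total_bases_rate(walks, singles, doubles, triples, home_runs, plate_appearances):
    hits = singles + doubles + triples + home_runs
    total_bases = singles + 2*doubles + 3*triples + 4*home_runs
    return (walks + hits + total_bases) / plate_appearances

# The two forms agree for any inputs, e.g. a hypothetical season line:
assert abs(reuter_rate(60, 110, 30, 3, 25, 600) -
           walks_hits_total_bases_rate(60, 110, 30, 3, 25, 600)) < 1e-12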

Top-Down Regression Methods

A top-down regression method differs from a bottom-up method in that weights for the performance indices used to predict run production are either dispensed with or based on the analyst's intuitions about their impact. I call these top-down because they start with the analyst's "theory" of offense and then work down to performance data. A top-down method will always be less accurate than a bottom-up method during the period of time on which the weights in the bottom-up method were based. However, top-down methods will generalize across all eras of baseball, be easier to use, and be more elegant than bottom-up methods.

Top-down methods are not new. Travis Hoke (1935) reported on one he devised that he claimed to have used to compute daily figures for Branch Rickey when Rickey was the Browns’ “secretary,” back around 1913 and 1914. Hoke’s system was weighted such that the opportunity to reach first base counted for one, to move either oneself or another player from first to second counted for two, second to third counted for three, and third to home counted for four. For example, a player coming up with bases empty has the opportunity for ten points, which he will get if he homers. The player coming up with bases loaded has the opportunity for ten points for himself, nine for the runner on first, seven for the runner on second, and four for the runner on third. The actual advancement based on what happens during the at bat would then be divided by this opportunity index to obtain Hoke’s offensive measure.
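A short sketch of the bookkeeping Hoke describes, assuming (my reading of the passage, not Hoke's own ledger) that a batter's opportunity points depend only on which bases are occupied:

# Potential advancement points: 1 to reach first, 2 for first-to-second,
# 3 for second-to-third, 4 for third-to-home.
def opportunity_points(runner_on_first, runner_on_second, runner_on_third):
    points = 1 + 2 + 3 + 4                     # the batter himself can always earn ten
    if runner_on_first:
        points += 2 + 3 + 4                    # nine more for that runner
    if runner_on_second:
        points += 3 + 4                        # seven
    if runner_on_third:
        points += 4                            # four
    return points

print(opportunity_points(False, False, False))  # 10, bases empty
print(opportunity_points(True, True, True))     # 30, bases loaded

Hoke's measure is then the advancement points actually earned divided by this opportunity figure.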

I should also mention On-Base Plus Slugging (OPS) here, which is simply on-base average plus slugging average. It correlates well with run scoring and is simple to understand. It was to all intents and purposes introduced in Thorn and Palmer (1984) and, due to the combination of simplicity and surprising accuracy, has become the most well-known of the newer measures. The following descriptors make sense:

OPS     Descriptor             OPS     Descriptor
1200    Historically Great     800     Decent
1100    Great                  700     Okay
1000    Excellent              600     Poor
900     Very good              500     Terrible

These descriptors are position-specific to an extent; a first baseman at 700 is not okay, and a really good fielding catcher or shortstop can get away with 600.

OPS is misleading in a way, because its use assumes that OBA and SA have an equal relationship with run scoring. They do not; most analysts have concluded that OBA is the more strongly related, probably (as Mark Pankin, 2005, argued) because staying away from outs (keeping OBA up) is more important in run scoring than gaining extra bases (the point of considering SLG). There is disagreement about how much more important OBA is. At the extremes, Lee (2011), using Korean League data, put it at between two and three times more important, while Mark Pankin (2005), based on the American majors in 2001, put it at about 2. Hakes and Sauer (2006) likewise noted OBA to have twice the impact on team winning average that SLG has. Wang (2006), based on data from 1960 through 2005, originally claimed a ratio of about 1.8, whereas Coffin and Cowgill (2005) computed 1.9 between 1987 and 2004, with the ratio between OBA and isolated power at 2.5.


For these reasons, Aaron Gleeman (see Panas, 2010) has proposed a Gross Production Average in which OBA is multiplied by 1.8 before being added to SA, with the sum then divided by 4, which results in an index approximating BA in its range. In contrast, Phil Birnbaum (2005a), based on 1987 and 1988, computed the ratio as just 1.2, but raised that to 1.5 (2005b) in response to Mark's work.
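A minimal sketch of Gleeman's Gross Production Average as just described (the function name and inputs are mine):

def gross_production_average(oba, slg):
    # Weight on-base average 1.8 times as heavily as slugging, then scale
    # the sum into a batting-average-like range.
    return (1.8 * oba + slg) / 4

# A hypothetical .350 OBA / .450 SLG hitter comes out at .270:
print(gross_production_average(0.350, 0.450))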

Importantly, Wang (2007) demonstrated that the ratio is era dependent: about 1.5 between 1959 and 1993, 2.2 between 1994 and 2000, and 1.6 between 2001 and 2006. Note that these estimates pretty much reconcile the differences among the previous numbers, and they make sense if we can assume that during the "steroids era" home runs were more plentiful and as a consequence individually relatively less valuable. Mark Pankin (2004), using methods based on the impact of transitions among base-out situations, examined the relative association of walks and extra bases on hits (which isolated power measures) with scoring and found the former to have been almost two times as important for the 1984-1992 interval. In a follow-up (2006), Mark examined the issue year-by-year starting with 1900 and uncovered an extremely strong correlation of .92 between the ratio of on-base average to slugging average and the overall scoring environment, such that the ratio was at its lowest (1.53) in 1968 (the "year of the pitcher") and 1908 (the trough of the dead-ball era) and highest in 1930 (when the National League as a whole averaged .300) and 2000 (the peak of the steroid era). Pete Palmer (2009) looked at a wider range of seasons, with the ratio about 1.8 for the 2000 to 2008 period but 2 to 1 before then back to 1920, and 1 to 1 during the 1900-1920 deadball period (when home runs were very scarce and so more valuable when they did occur). As Phil pointed out in his response, piggybacking on Mark's explanation for why OBA is more important, as offense goes up teams get on base more often, leading to run scoring independently of extra base hits.

There are, however, two arithmetic problems with OPS that may have biased all of these previous analyses in favor of OBA. First, as D'Angelo (2010) noted, the two components have different denominators: at bats for slugging average and plate appearances for on-base average. Although not quite apples and oranges, the two denominators will differ by the sum of walks and hit by pitches. Second, they have different ranges; OBA cannot be greater than one whereas SLG can be as high as four. Barry Codell (1992) proposed what he called the Diamond Weight explicitly to correct for the denominator problem with OPS described earlier. It is simply times reached base plus total bases, divided by plate appearances. As such, it double counts the first base on any hit. The most critical response came from Deli (2012). In order to counteract these arithmetic problems, Deli normalized his data (1980-2007) before regressing runs scored on OBA and SA. His calculation resulted in the conclusion that SLG was more important than OBA overall (regression coefficients were .526 and .438 respectively), for 17 of the 28 seasons, and for 24 of the 30 teams across that span.

In any case, Total Baseball sometimes adjusts OPS for park and league and normalizes it to an average of 100. In order to interpret the resulting index, Adjusted Production (PRO+), one can read the difference between a batter's figure and 100 as the percentage better or worse relative to average that the batter achieved. So a PRO+ of 110 means (approximately) 10 percent better than league average; one of 90, 10 percent worse. Finally, in order to come up with an index more indicative of skill rather than whether hits happen to fall in during a given season, J. C. Bradbury (2006) computed a Predicted On-base Plus Slugging (PrOPS) by giving hitters "credit" for the number of grounders, flies, and liners they hit, each multiplied by the bases that each of these batted-ball types would net on average in the ballpark in which they were hit. This allowed Bradbury to see which players were particularly lucky or unlucky on batted balls during the 2002-2005 stretch. PrOPS for a given year had a slightly greater correlation with OPS the next year than did OPS in that given year, an example of luck evening out over time. Top-down regression methods can be divided into subcategories, which I will cover in turn:

Bases Gained Per Opportunity

This subcategory includes indices that divide bases gained by the opportunity to gain them, most simply plate appearances. Perhaps the simplest of these is Maher’s (1977) Offensive Average, which has been used in quite a few academic studies. It consists of

total bases + walks + stolen bases
divided by
plate appearances

Tenbarge (1996) proposed a simpler version without stolen bases that he called Earned-Based Average (EBA), along with a correction for different offensive contexts through dividing by the league average EBA. Gilbert (1993) proposed a more complicated version (Bases per Plate Appearance) by adding hit by pitches to, and subtracting caught stealing and grounded into double plays from, the Offensive Average numerator.

Another early example is D’Esopo and Lefkowitz’s (1977) Scoring Index:

hits + walks + hit by pitches + errors – double plays
divided by
at bats + walks + hit by pitches + sacrifices

which is basically a jazzed up version of on-base average; note that extra bases on hits are not included. As a consequence, the leading Scoring Index in the National League during 1959 belonged to Joe Cunningham, who led that league in OBA that season. As it does not consider power, it obviously does not do a good job of evaluating offense.

Clay Davenport's Equivalent Average (EqA) has been a mainstay in the Baseball Prospectus annuals since they began. It is sort of a cross between a top-down and bottom-up system, in that it was designed in response to data but purposely kept as simple a bases-gained-per-opportunity method as the data allowed. There are multiple versions; here is one from 1999 and 2001, beginning with the following top-down raw index:

hits + total bases + stolen bases + 1.5 walks
divided by
at bats + walks + caught stealing + (stolen bases divided by 3)

The point of the 1.5 is to try to put the on-base and slugging components of the formula on an equal footing. This raw index is then adjusted for home ballpark and then transformed into a measure with an intended mean of .260 that can be interpreted in terms of a batting average, i.e., an EqA of .300 is very good and one of .240 is poor, making it easy for the average fan to interpret. As Davenport readily admits, it makes little theoretical sense. Davenport has a method for turning EqA into runs (Equivalent Runs) which is similarly ad hoc. Austin (1998) demonstrated that EqAs are not much different than OPS divided by three unless the player has an extreme skill (e.g., is a very prolific basestealer) or plays in an unusual run scoring environment. Austin proposed some corrections to OPS/3 to bring it closer to EqA, resulting in something he called QEqA.
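A sketch of the raw index just described; the park adjustment and the rescaling to a .260 mean are left aside:

def raw_eqa(hits, total_bases, stolen_bases, walks, at_bats, caught_stealing):
    # Raw bases-gained-per-opportunity index, before park adjustment and rescaling
    numerator = hits + total_bases + stolen_bases + 1.5 * walks
    denominator = at_bats + walks + caught_stealing + stolen_bases / 3
    return numerator / denominator

# A hypothetical good regular's season line:
print(round(raw_eqa(hits=180, total_bases=290, stolen_bases=15, walks=70,
                    at_bats=600, caught_stealing=6), 3))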

On-Base Times Slugging

A large number of very significant top-down indices are based on what Dick Cramer (1977) claimed to be a "fundamental empirical relationship" in baseball play: that runs scored across a league approximately equal total plate appearances multiplied by slugging average multiplied again by on-base average. Although undoubtedly correct, Dick's insight raises the following complication: a batter's job is to accumulate as many bases and as few outs as possible in the opportunities afforded. Thus in a rough sense individual contributions are additive – a double adds a base to a single, a triple adds a base to a double. What is multiplicative is team scoring. Consider a simplified baseball world in which a team scores if and only if Player A and Player B both get hits in an inning. Two hits mean a run, so our prediction equation would be:

Runs = .5 Hits

But does that mean that one hit means half a run? It would in the abstract world of statistical baseball research, where three hits would equal a run and a half. This is fine for the evaluation of individual players, as the player who did get a hit deserves some credit.

But this does not work in the real world of team run scoring, where not only is it useless for neither Player A nor Player B to get a hit, it is just as useless for one of them to get a hit while the other does not. Now, what mathematical relationship reflects this fact? The answer is multiplication rather than addition, because in multiplication 1 hit times 0 hits means 0 runs, which is the truth of the matter. In contrast, 1 hit times 1 hit is the only way to get 1 run. So offense at the team level is multiplicative because, other than homers, it takes more than one player to succeed. The implication of this is that when you evaluate additive player models in the real world of multiplicative run scoring, you are bound to run into problems. This may be part of the reason why so many of our better models break down in extreme cases, where the effects of multiplicity are strongest.

Cramer pioneered offensive evaluation methods based on this relationship. The first, Batter Run Average (BRA), was devised independently by Dick and Pete Palmer (1974). A simple form of BRA consists of on-base average multiplied by slugging average; a more complex version divides stolen bases by two, subtracts caught stealing from that, and adds this result to the times reached base in the numerator of the on-base average equation. As described in the Strategy chapter, the reason for dividing stolen bases by two before the subtraction is that the loss in expected runs from each caught stealing approximately counterbalances two successful steals. Dick and Pete determined that the average runs a batter contributes to his team in a given plate appearance equals BRA minus (.7 X BRA); when multiplied by plate appearances in a season, the result estimates the runs a batter has added during the season from his own effort. The authors make the point that this number is in one sense an underestimate, as a higher on-base average provides subsequent batters more opportunity to contribute, and perhaps the batter should receive some credit for this achievement also. Dick and Pete computed a ratio of player BRA over league BRA to measure the relative excellence of hitters; not surprisingly, at that time Ruth (2.66) and Williams (2.66) led the way by fairly large margins.
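A minimal sketch of the simple form of BRA and the player-to-league ratio used to compare hitters (function names and the sample figures are mine):

def batter_run_average(on_base_average, slugging_average):
    # Simple form of BRA: the product of the two rate stats.
    return on_base_average * slugging_average

def relative_bra(player_oba, player_slg, league_oba, league_slg):
    # Ratio of a player's BRA to his league's.
    return (batter_run_average(player_oba, player_slg) /
            batter_run_average(league_oba, league_slg))

# Illustrative figures, not the published calculation:
print(round(relative_bra(0.474, 0.690, 0.340, 0.400), 2))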

Batter Win Average (BWA) was a "further refinement" of BRA credited to Cramer (1977) alone. BWA added to the numerator of the on-base average the following additional terms: ½ (times on base on errors) – (2 X grounded into double plays). The result of this computation is then compared with what a league-average player for the season would achieve with the same number of plate appearances as the player under examination. Batter Win Average is, then, relative to the average player in the league. The 1969 league leaders were Willie McCovey (+.130) and Harmon Killebrew (+.109); the league worsts were Gil Garrido (NL, -.071) and Tim Cullen (AL, -.0).

Cramer (1987) called his last attempt Runs Contributed. It is as follows:

([hits + walks + hit by pitches] + .5 [stolen bases] – caught stealing – 1.5 [grounded into double plays]) X slugging average

This one should be interpreted as actual runs, such that an index of 100 is excellent and 75 about average.

In his 1966 book, Cook proposed a different Scoring Index (abbreviated as DX for some unknown reason), which is basically on-base average exclusive of extra base hits multiplied by ways of getting extra bases. Specifically, for each player individually, it consists of the probabilities rather than the raw numbers for each of the following:

singles + walks + hit by pitches + opposition errors – (2 X sacrifice bunts)
multiplied by
extra base hits + stolen bases
divided by
raw number of plate appearances

Cook wished to subtract the probability of caught stealing from the second term, but given the absence of readily-available caught stealing data at that time, he went without it. Cook found a regression equation based on team DX to predict team scoring within two percent. However, others have found DX to work poorly relative to other performance measures. For example, as described above, Mark Pankin noted a correlation at the team level between DX and runs scored of .891, which is actually worse than that for slugging average (.906).

Runs Created

Bill James first introduced Runs Created (RC) in the self-published 1978 Baseball Abstract (page 103) and discussed it in some detail in the first conventionally published edition (1982, pages 5-10). There are quite a few different versions of RC, all predicated on the basic notion that a player’s offensive performance is best represented by the ability to get on base multiplied by the ability to achieve extra bases. Theoretically, Bill argued that run scoring is predicated on getting on base (represented by hits plus walks) and advancing runners, and used total bases as a stand-in for the latter. This double-counts hits, as he admitted, but as hits always advance runners whereas walks don’t, there is some rationale for giving them more weight. If we add the impact of attempted steals to this, we get the following “basic” formula:

(hits + walks – caught stealing) X (total bases + .7 [stolen bases])
divided by
at bats + walks + caught stealing

Stolen bases get less value because by themselves they never advance other runners. The exact figure of .7 is a guesstimate that Bill claimed worked well.
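A sketch of the basic formula as given (input names are mine):

def basic_runs_created(hits, walks, total_bases, at_bats, stolen_bases=0, caught_stealing=0):
    on_base = hits + walks - caught_stealing
    advancement = total_bases + 0.7 * stolen_bases
    opportunities = at_bats + walks + caught_stealing
    return on_base * advancement / opportunities

# A roughly league-average hypothetical season line; comes out around 76 runs:
print(round(basic_runs_created(hits=150, walks=50, total_bases=230, at_bats=550,
                               stolen_bases=10, caught_stealing=5), 1))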

By 1983 (pages 232-235), Bill was using a more technical version that he claimed to be somewhat more accurate:

(hits + walks + hit by pitches – caught stealing) X (total bases + .65 [stolen bases + sacrifice hits + sacrifice flies])

divided by
at bats + walks + sacrifice hits + sacrifice flies + caught stealing + hit by pitches + grounded into double plays

Bill was using another technical version in time for the 1986 Abstract:

(hits + walks + hit by pitches – caught stealing – grounded into double plays)
X
(total bases + .26 [walks + hit by pitches – intentional walks] + .52 [sacrifice bunts + sacrifice flies + stolen bases])
all divided by
plate appearances

with the .52 and .26 computed empirically bottom-up. There are a series of other technical versions, each relevant to the types of data readily available during different historical eras (James, 1986a). In addition, Lee Sinins proposed Runs Created Above Average (RCAA), comparing a given player to the league-average hitter.

Paul Johnson (1985) was apparently the first to notice that Runs Created works quite well for the average range of players but significantly underpredicts the very poor players, i.e. those combining low on-base averages with poor slugging averages, and overpredicts the very good, i.e. those combining highs in both those measures. This is probably because of the multiplicity "problem" described above. A top-down version of Paul's Estimated Runs Produced received a lot of attention right after it was proposed, partly because Bill James introduced it in that year's Abstract despite the fact that it seemed to perform better than Runs Created. According to Phil Birnbaum's (1999a) simulation it works well in the 3 to 7 run range but underpredicts both above and below that. It is basically a top-down procedure due to its theoretical derivation (positive offense minus negative offense), but an "impure" one in the sense that it adds empirically-deduced weights:

(2 X [total bases + walks + hit by pitches] + hits + stolen bases)
minus
(.605 X [at bats + caught stealing + grounded into double plays – hits])
all multiplied by .16
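A sketch of Johnson's formula as quoted:

def estimated_runs_produced(total_bases, walks, hit_by_pitch, hits, stolen_bases,
                            at_bats, caught_stealing, gidp):
    positive = 2 * (total_bases + walks + hit_by_pitch) + hits + stolen_bases
    negative = 0.605 * (at_bats + caught_stealing + gidp - hits)
    return 0.16 * (positive - negative)

# The same hypothetical season line used earlier; again roughly 76 runs:
print(round(estimated_runs_produced(total_bases=230, walks=50, hit_by_pitch=5,
                                    hits=150, stolen_bases=10, at_bats=550,
                                    caught_stealing=5, gidp=12), 1))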

An unabashedly bottom-up replacement was described earlier.

Phil Birnbaum (1999a), in a replication based on the simulations described earlier, noted that RC works less well than Batting Runs, predicting well in the 4 to 6 run per game range but with the problems just described outside of that interval. Because Bill recognized this problem, there was one final version that surfaced in 1998, in which the advancement part of the equation became:

(total bases + .24 [walks + hit by pitches – intentional walks] + .62 [stolen bases] + .5 [sacrifice hits + sacrifice flies] – .03 [strikeouts])


What is notable is the continued movement here toward a bottom-up formula. In addition, one then adjusts for the relationship between the results of the basic RC and the RC of a hypothetical replacement level player (which he arbitrarily defined as .3 for the getting-on-base part and .375 for the advancement part of the numerator), and then corrects for the hitter's specific success at homering with runners on base and hitting with runners in scoring position, resulting in "preliminary RC." After computing the preliminary RC for everyone on the hitter's team, the analyst sums these and then compares the sum to how much the team actually scored, showing whether the team over- or underperformed the RC projection. Each player is then either credited with their share of any overperformance or charged with their share of any underperformance, no matter whether the player actually was responsible, and despite the fact that this stand-in for clutch hitting should already have been dealt with in the adjustments for hitting with baserunners just described. I suspect that it is more accurate than earlier versions, but it has traded off simplicity and ease of interpretation and included hard-to-defend adjustments as part of the bargain. Fortado (1999), with help from other BBBAers, has described this version in detail and provided on-target criticisms of its assumptions. Phil (2000b) also showed that it overpredicts run scoring by a bit in the one to six run range, and by a lot beyond that.

Dan Levitt (2003a) calculated a correlation of .68 between RCs for 1991 and 1992, which as described earlier is actually lower than for quite a few individual measures. It was a bit lower when using only the first (.56) or second (.64) half of 1991, showing that, contrary to some commentary, the next season is predicted more accurately by an entire season than by only its second half. His sample size, however, was relatively small, just 109 players with at least 200 at bats in both the first and second halves of 1991 and 250 at bats in 1992.

Another move toward turning Runs Created into a bottom-up procedure was proposed by David H. Robinson (1987a). David's work is predicated on taking the theoretical basis for RC (on base times slugging) seriously, conceiving of the formula through algebraic reformulation as:

(At Bats) X (Slugging Average) X (On-Base Average)

This requires some "poetic license" so to speak, because the denominator for RC is plate appearances, the same as OBA, while David used at bats. Anyway, David examined what would happen when one estimated both RC and Estimated Runs Produced using OBA and SA data specific to different base-out situations as measured for 78 teams in 1984, 1985, and 1986 and presented in the Elias annuals. He ended up determining that SA with runners on base and OBA with bases empty were the best estimators of team run production, which makes sense if we assume that a batter's job is to get on base if no one is on and, if someone is on, to drive them in. This led David to a new conception of RC as based on two types of plate appearances, those with bases empty and those with runners on, and with different weights for both SA and OBA based on their relative importance in each type. One would then compute RC in the following three steps:
1 – Compute a "true" slugging average, one that weights what happens with runners on base more than with bases empty, by

(.15 X SA with bases empty) + (.85 X SA with runners on)

2 – Compute a "true" on-base average, one that weights what happens with bases empty more than with runners on, by

(.70 X OBA with bases empty) + (.30 X OBA with runners on)

3 – Multiply the results of steps 1 and 2 together, and then multiply again by the number of at bats.
David ended the article with revised SA, OBA and RC figures for nine prominent hitters of the day, about half of which were higher and half lower than the original.
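The three steps, as a short sketch (the situational averages are hypothetical inputs supplied by the caller):

def robinson_runs_created(at_bats, sa_bases_empty, sa_runners_on,
                          oba_bases_empty, oba_runners_on):
    true_slugging = 0.15 * sa_bases_empty + 0.85 * sa_runners_on    # step 1
    true_on_base = 0.70 * oba_bases_empty + 0.30 * oba_runners_on   # step 2
    return at_bats * true_slugging * true_on_base                   # step 3

print(round(robinson_runs_created(550, 0.430, 0.450, 0.340, 0.360), 1))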

Heipp (2005) described how one can estimate the run value of individual events (singles, doubles, etc.) in the context of Runs Created. David Smyth (1987) came up with a version that estimates the number of runs that a given player would add to or subtract from the performance of an average team. Smyth’s more important achievement, Base Runs, will be described below.

Dan Levitt (2005) used runs created as the basis for his Pennants Added Above Average (PAAAv). Conceptually, Dan defined it as "…the increased (or decreased) probability of a randomly selected team reaching the post season when substituting the subject player for a typical player on that randomly chosen team" (page 15). The details can be found in Dan's article; the basic idea is to
1 – Take the number of runs created for the player being examined,
2 – See how it changes the winning average (Dan used both the actual and the Pythagorean estimate (see Team Evaluation chapter), depending on the circumstance) for each team in the league if that player's runs created were used rather than the actual average runs created per player on the given team, then
3 – Use the known probability of a team with a certain winning average qualifying for the post season (based on 1901 through 2002, except for the 1981 and 1994 strike seasons) to compute the change in those probabilities due to using the given player's RC rather than the average player's, and finally
4 – Add those up across all the teams in the league.
Michael Wolverton (2002) came up with an analogous concept, but through a different method: subtract the probability that the player in question's team would have made the playoffs/won the pennant without him from the probability that it would with him. It does not appear that he thought to include the probability that a typical player would affect those probabilities in his calculations, which I believe to be a mistake.

Bases Gained Per Out Made

This subcategory includes indices that replace plate appearances with outs made in the denominator. Albert and Bennett (2001) noted an advantage of this subcategory over bases gained per opportunity: the number of outs per game is more consistent across games than the number of at bats or plate appearances. Barry Codell's (1979) Base-Out Percentage (BOP) is an early example of this type of index:

total bases + walks + hit by pitches + steals + sacrifice hits + sacrifice flies
divided by
(at bats – hits) + sacrifice hits + sacrifice flies + caught stealing + double plays

A very similar one is Norm Hitzges and Dave Lawson’s (1994) Total Offensive Production Rating:

total bases + walks + hit by pitches + steals + sacrifice hits + sacrifice flies – caught stealing – grounded into double plays

divided by
(at bats – hits) + sacrifice hits + sacrifice flies + caught stealing + grounded into double plays

They also proposed normalizing for league average and adjusting for home ball park, with a method analogous to Palmer’s.

Total Average is the brainchild of popular writer Tom Boswell, and its popularity is partly due to his reputation. It originally appeared in the January 1981 issue of Inside Sports, and is well discussed in Thorn and Palmer (1984). It consists of:

total bases + walks + steals + hit by pitches
divided by
(at bats – hits) + caught stealing + grounded into double plays

Base Runs (reviewed by Heipp, 2001) is a uniquely well-thought-out procedure proposed by David Smyth and described on a number of on-line sites (see, for example, Tom Tango's defense of it on his website; this description is largely based on his). It deals very effectively with the multiplicity problem. It does have a bottom-up feature, however, that requires additional computational work. We begin with two observations. First, a plate appearance can end with the batter hitting a home run and thus scoring, which is a good thing for their team, with the batter getting on base and hopefully being driven in by someone else, which is also a good thing, or with the batter making an out, which is a bad thing. We want to ignore batters driving in runs because, as the first two possibilities account for all runs, they are redundant. We also want to separate the effect of home runs from the effect of other ways to score, because at the level of the individual player only the latter requires the help of other players. Second, if a player gets on base, they can either score, which is a good thing for their team, be left on base, which is a bad thing, or make an out on the bases, which is a bad thing. Thus the "score rate" (odds of scoring a run) can be calculated by dividing the number of times a batter scores when already on base (thus excluding home runs) by the total number of times the batter is on base (the sum of the three possibilities listed in the second observation).

All of this implies that we can calculate a run value for a player based on the number of times that player gets on base, the odds of that player scoring given being on base, and the number of home runs the player hits. Base runs are then theoretically equivalent to:

(times on base X score rate) + home runs

If we have a good measure of the score rate, we are in business. The problem is that we may have to estimate it using readily available data. Smyth did it as follows:

advancement = (1.4 X total bases) – (0.6 X hits) – (3.0 X home runs) + (0.1 X walks)
outs = at bats minus hits
score rate = advancement divided by (advancement plus outs)

with the regression-type rates calculated bottom-up.

As he did for Runs Created, Heipp (2005) described how one can estimate the run value of individual events (singles, doubles, etc.) in the context of this method.
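Putting Smyth's pieces together in a short sketch (I treat times on base as hits plus walks minus home runs, per the two observations above, and use the estimated score rate in place of the true one):

def base_runs(hits, walks, total_bases, home_runs, at_bats):
    # Smyth's estimated advancement, outs, and score rate, as quoted above.
    advancement = 1.4 * total_bases - 0.6 * hits - 3.0 * home_runs + 0.1 * walks
    outs = at_bats - hits
    score_rate = advancement / (advancement + outs)
    times_on_base = hits + walks - home_runs   # runners who need someone else's help to score
    return times_on_base * score_rate + home_runs

# Rough hypothetical team-season totals:
print(round(base_runs(hits=1450, walks=550, total_bases=2300, home_runs=170, at_bats=5550)))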

A Team of a Given Player

Cover and Keilers's (1977) Offensive Earned Run Average (OERA) was perhaps the first method that evaluated a player's offense by estimating the number of runs that player would score if he was given all his team's plate appearances and those plate appearances were statistically independent, i.e., he performed the same way regardless of base-out situation. Although the authors state that it is ideally calculated from the outcomes of the batter's actual sequence of plate appearances in a season (i.e., how many runs would score given his first, second, third, etc. PA in a game until he made three outs, at which point another "inning" would begin with the next PA), these data were at that time unavailable; now, it could be done. Given that unavailability, their main task was to propose equations to represent the expected results of this sort of sequence across a season. It requires the following set of assumptions:
1 – Sacrifices are not included.
2 – Errors are counted as outs.
3 – Runners do not advance on outs.
4 – All singles advance baserunners two bases and doubles advance them three.
5 – There are no double plays; although not stated, I assume that one out is counted.

Their method makes use of the following insight: if a batter alternated only walks and outs, then at most there would be six fewer runs scored than plate appearances, because when an inning ended three plate appearances would have been consumed for outs and three would have loaded the bases (one needs four walks to score a run). Analogous reasoning and the assumptions just mentioned result in a limit of five fewer runs than plate appearances for singles, four for doubles and triples, and three for home runs. I will leave the form and remainder of the reasoning behind their formulas (one each for walks, singles, doubles/triples, and home runs) to the interested and mathematically sophisticated reader (a set of which I am not a member). The bottom line: at the time of the article, not surprisingly, Ted Williams and Babe Ruth were in a virtual tie (13.20 and 13.19 respectively) for the highest lifetime OERA, followed again not surprisingly by Gehrig, Foxx, and Greenberg. For the 1975 season, Joe Morgan led the way at 11.01, thanks to a .327 BA and .468 OBA (reported as .471 in the sixth Total Baseball). As noted earlier, Mark Pankin calculated a correlation of .958 at the team level between OERA and runs scored.
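Since play-by-play data now make this sort of calculation possible, here is a hedged sketch that simulates the simplified game Cover and Keilers describe directly, under their assumptions, rather than using their closed-form equations (the per-plate-appearance probabilities in the example are hypothetical):

import random

def simulate_oera(outcome_probs, games=20000, rng=None):
    # Average runs per nine-inning game for a lineup made up of one player, under the
    # assumptions above: errors as outs, no advancement on outs, singles advance
    # runners two bases, doubles three.
    rng = rng or random.Random(0)
    events, weights = zip(*outcome_probs.items())
    total_runs = 0
    for _ in range(games):
        for _inning in range(9):
            outs, bases = 0, [False, False, False]   # first, second, third
            while outs < 3:
                event = rng.choices(events, weights)[0]
                if event == "out":
                    outs += 1
                elif event == "walk":
                    # a walk forces runners along one base at a time
                    if bases[0] and bases[1] and bases[2]:
                        total_runs += 1
                    elif bases[0] and bases[1]:
                        bases[2] = True
                    elif bases[0]:
                        bases[1] = True
                    bases[0] = True
                else:
                    advance = {"single": 2, "double": 3, "triple": 3, "home run": 4}[event]
                    batter_base = {"single": 0, "double": 1, "triple": 2, "home run": 3}[event]
                    new_bases = [False, False, False]
                    for base, occupied in enumerate(bases):
                        if occupied:
                            if base + advance >= 3:
                                total_runs += 1
                            else:
                                new_bases[base + advance] = True
                    if batter_base >= 3:
                        total_runs += 1
                    else:
                        new_bases[batter_base] = True
                    bases = new_bases
    return total_runs / games

# Hypothetical per-plate-appearance probabilities for a very good hitter:
print(round(simulate_oera({"out": 0.55, "walk": 0.12, "single": 0.21,
                           "double": 0.06, "triple": 0.01, "home run": 0.05}), 2))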

OERA has caught the attention of Japanese researchers. Katsunori (2001) modified OERA through the addition of a component for base stealing. Interestingly, this modification led to a lower OERA for most members of a sample of highly productive Japanese hitters. Sueyoshi, Ohnishi, and Kinase (1999) predated Katsunori by including not only stealing but sacrifices and double plays, in a method pairing OERA with data envelopment analysis. This method also resulted in a reordering of relative ratings.

Bill James (1983, page 6) has proposed a couple of different relevant methods. Runs Created per Game (RC/27) is an estimate of how many runs a team would score in a specific season if a given player had every plate appearance and every game ran nine innings, assuming league average pitching and fielding. One takes the player's Runs Created figure for a given season, multiplies it by 25.5 (the average number of outs for each side per game, differing from 27 due to outs made on the bases plus incomplete ninth innings), and then divides by the number of outs that player made during that season.

One can then take this figure and assign that player an Offensive Won-Lost Percentage (OWP; introduced in the 1981 Baseball Abstract, also see page 6 of the 1983 Abstract) for the season as follows:
1 – Square the Runs Created per Game figure.
2 – Square the average runs per game per team for that season.
3 – Sum the results of steps 1 and 2.
4 – Divide the result of step 1 by the result of step 3. The resulting figure is the estimated winning average of a team made up of just that player for the season. It is analogous to the Pythagorean equation for estimating expected team winning average based on the number of runs it scores and gives up (see the Team Evaluation chapter). In short, it corrects for the fact that extremely good teams win, and extremely poor teams lose, blow-outs a disproportionate number of times.
5 – Divide the number of outs the player made that season by 25.5, which gives the number of games for which the player is responsible for all the outs.
6 – Multiply the result of step 5 by the result of step 4, giving you the number of wins the player is responsible for.
7 – Subtract the result of step 6 from the result of step 5 to get the number of losses the player is responsible for.
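The steps above, in a short sketch (inputs are hypothetical):

def offensive_won_lost(runs_created_per_game, league_runs_per_game, outs_made):
    winning_average = runs_created_per_game ** 2 / (
        runs_created_per_game ** 2 + league_runs_per_game ** 2)    # steps 1-4
    offensive_games = outs_made / 25.5                             # step 5
    wins = offensive_games * winning_average                       # step 6
    losses = offensive_games - wins                                # step 7
    return winning_average, wins, losses

owp, wins, losses = offensive_won_lost(6.5, 4.5, 420)
print(round(owp, 3), round(wins, 1), round(losses, 1))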

Neil Munro (1984) argued that players with extremely high on-base averages make fewer outs, thus are credited with fewer offensive games and end up being shortchanged by OWP. He suggested replacing the game-wide 25 or 25.5 outs in an offensive game with a measure specific to the player under examination, his plate appearances divided by 38 (the average number of PA per game). For whatever reason, a later version of OWP changed the offensive-games step from a per-game to a per-out basis; see Tom Tango's tangotiger website for details. I would also add that it would be a bit more accurate to replace the average runs per game per team used in step 2 with the average runs per game for league teams other than the player's own.

Brock Hanke (1998) described Offensive Wins Above Replacement, a variant. One subtracts the replacement level OWP of .350 from a given hitter’s OWP, providing an estimate of the difference in the contribution of that player and a random replacement. This figure is then multiplied by the number of “offensive games” for which the player is responsible. This was originally defined as the number of outs for which the player was responsible divided by 27, which makes conceptual sense, but later by the average of total team plate appearances earned by the player.

A newer James creation, Theoretical Team Runs Created (see Panas, 2010), compares the Runs Created by a league-average team in a given year with that by a team with eight average players and the player under examination, with a correction for the proportion of games in which the player actually participated. It is based on a method called Marginal Lineup Value (MLV) created by David Tate and described by Tom Tango on his website. The Prospectus folks use a Positional MLV that compares to the average hitter at the given player's position rather than the overall league average.

Scahill (1990), in asking whether Babe Ruth would have been more valuable as an outfielder or pitcher for the 1920-1924 Yankees, presented a different approach to this issue that is logically sensible but has several unfortunate problems. The first, based on an unfortunate assumption made by Scully (1974), was to use slugging average as the sole indicator of offensive performance and rely on Scully's finding that a one point increase in team SA increases team winning average by .92 points. Scahill then multiplied Ruth's SA over that five-year interim (.777) by the proportion of team at bats Ruth had (.112) and then again by .92, resulting in .080, representing the contribution Ruth made to team winning average with his slugging. The team winning average over the interim was .622, so subtracting .080 leaves .542, the team winning average with neither Ruth nor a replacement player, which translates to 61 fewer victories, or 12 per season, which despite the problems makes sense as the contribution of a historically great player to his team.


The second step was adding the value of a replacement player. Here lies the second problem in Scahill's analysis, as he used the team's average slugging average without Ruth to represent this player rather than a replacement level right fielder. Anyway, that average was .412, so multiplying it by .112 and again by .92 gives .042. Adding that to .542 gives you .584, the team winning average with the replacement player. This is 32 wins over the interim more than .542 would provide, or more than 6 wins per season. We know now that six wins per season is not an average player; it is the performance of an All-Star. In any case, subtracting that from 61 means that Ruth supplied the Yankees 29 wins more than the replacement player, or 6 per season, which is in actuality a gross underestimate.
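Scahill's arithmetic, laid out as a short script (the figures are those quoted above, assuming 154-game seasons over the five years):

scully_coefficient = 0.92      # winning-average points per point of team slugging average
ruth_sa, ruth_ab_share = 0.777, 0.112
team_winning_average = 0.622
games = 154 * 5                # 1920 through 1924

ruth_contribution = ruth_sa * ruth_ab_share * scully_coefficient                # about .080
without_ruth = team_winning_average - ruth_contribution                         # about .542
replacement_sa = 0.412                                                          # team SA without Ruth
replacement_contribution = replacement_sa * ruth_ab_share * scully_coefficient  # about .042
with_replacement = without_ruth + replacement_contribution                      # about .584

wins_from_ruth = ruth_contribution * games                                      # about 61
wins_from_replacement = replacement_contribution * games                        # about 32
print(round(wins_from_ruth - wins_from_replacement))                            # about 29 over the five seasons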

The third step would be to compute how many wins Ruth as a pitcher would have supplied over the pitcher who would have been replaced. Scahill did not make this calculation, but his discussion reveals the third problem; a reliance on winning average as the indicator of pitcher quality, and the assumption that Ruth would have exactly duplicated his winning average as a Red Sox pitcher (.650) as a Yankee.

Dallas Adams (1987b) perhaps inadvertently came up with an index of this type when attempting to devise an alternative method to Runs Created. Dallas input Babe Ruth's 1920 performance into all nine batting order positions, and concluded that a team of 1920 Ruths would have averaged 18.29 runs per game. This equates to 194 runs created, which is 17 fewer than James's then-current runs created formula had estimated. Dallas also simulated overall American League run production that year, finding an average of 4.36 (the actual figure was considerably higher, 4.76; personal communication from Cliff Blau). Putting 18.29 and 4.36 into the Pythagorean equation for estimating team winning average (see the Team Evaluation chapter) resulted in a Ruthian offensive winning average of .971 (James had come up with .934).

Using the results of simulation of game play, Beaudoin (2013) calculated an index that he called Number of Runs Generated per Game (NRGG) which is basically a team-of-a-given-player measure.

Replacement Level

The concept of replacement level probably first appeared in Bill James's 1979 Baseball Abstract (pages 84-85). Bill's concept of replacement level would be the performance of what he called "freely available talent," basically a career AAA-level player whom a team could obtain at relatively little cost. I would propose a somewhat different interpretation. We would need to presume that we can rank order all the players at a position in terms of their expected offensive performance. Said simply, given that there are currently 30 teams in the major leagues, replacement level is the expected performance of the 31st best player at a given position (for starting pitchers, the 151st). If a team's starter at a position is at or below replacement level, it means that somewhere there is another player at that position who is better and the team should want to replace its player with that better one.

No matter its exact definition, the reason that replacement level is a more useful comparison level than the average is that the latter underestimates the value that an average player can provide if it is consistent over a significant stretch. Using the average implies that a player exactly at average makes no contribution to the team’s performance, and a player a little below average actually hurts the team. But the following example, based on one used by Keith Woolner (2006c), shows the fallacy here. Let us presume that the Macon Maulers have the #1 shortstop as their regular and the Savannah Sluggers have the #15 best, i.e., just about average, shortstop. The former is capable of a .320 EqA and the latter a .260. The Maulers are obviously better off than the Sluggers at the shortstop position. However, suppose that #1 shortstop is hurt after 100 plate appearances, and their replacement is #31, in other words exactly at replacement level, and capable of a .230 EqA, which he delivers in the season’s remaining 500 PAs. If we combine the .320 in 100 PAs with the .230 in 500 PAs, we get a season’s contribution of .245 at shortstop for the Maulers. The Sluggers thus received more value from their average shortstop than did the Maulers from the combination. In fact, the Columbus Crushers, with their #18 aka below average shortstop (EqA = .250), also received more. In general, you can get more value from average performance over a long period of time than excellent performance over a short stint.
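The arithmetic in the example, for anyone who wants it spelled out:

def blended_eqa(segments):
    # Plate-appearance-weighted EqA across a list of (eqa, plate_appearances) stints.
    total_pa = sum(pa for _, pa in segments)
    return sum(eqa * pa for eqa, pa in segments) / total_pa

maulers = blended_eqa([(0.320, 100), (0.230, 500)])   # star hurt, replacement fills in
sluggers = blended_eqa([(0.260, 600)])                # average shortstop all year
print(round(maulers, 3), round(sluggers, 3))          # .245 versus .260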

Several of the methods discussed thus far provide an index that is calibrated around the average player. Given this fact, a player with a long but slightly below average career may be viewed as having had a less valuable career for his teams than a player with a short but above average career. Due to its relative fame, Batting Runs has been the most discussed method making this assumption, so I will use it as an example. If you look at the list of the top 300 in career Batting Runs included in The Hidden Game of Baseball, despite his over 3000 hits and 900 stolen bases against just over 300 times caught stealing, you will find Lou Brock at number 294, behind, among other stalwarts, Johnny Grubb at number 256. Granted, Brock had slightly lower lifetime on-base and slugging averages than Grubb. But he compiled them over more than 6000 additional plate appearances. I don't think that many people who follow baseball believe that Grubb's career offensive contributions to his teams were greater than Brock's. It may well be that Grubb spent more of his career providing above average offense than Brock, but Brock certainly provided more seasons with performance above what a replacement level player would have contributed. If one could come up with a measure of offense in which zero equals replacement level rather than average performance, then Brock would rank well ahead of Grubb.

One controversy surrounding the concept of replacement level is attempting to estimate exactly what it is. In his original discussion, Bill James estimated that replacement level was probably about 15 percent below the average offensive performance in a given league, and this guess was pretty close to what it is now believed to be. Woolner presented a very complicated formula that can be used to compute it, but it is much easier and generally sufficient to define replacement level as about 80 percent of the average offensive performance at a given position in a given season for most positions. There are, however, arguments about four of them. One argument claims higher replacement levels for catcher and shortstop (85 percent) and lower ones for first base and designated hitter (75 percent). These figures come from the general performance exhibited by backup players at each position. The probable reason for these exceptions is that the defensive responsibilities of the former are significant enough that good fielding/poor hitting catchers or shortstops often serve as regulars with bad fielding/good hitting backups, decreasing the overall offensive difference between regulars and backups across the majors. In contrast, teams will put up with really bad fielding from starting first basemen if they hit very well, in which case they often carry really good fielding first basemen who don't hit much to serve as late-inning defensive subs, increasing the starter/sub difference. One can, however, reverse the argument, claiming that replacement level is actually lower (75 percent) for shortstop and catcher under the assumption that a team is willing to go with relatively worse offensive performance at these positions if it has a good fielder, and 85 percent for first base and designated hitter for the opposite reason.

In a different context, Brock Hanke (1999) proposed an Offensive Winning Percentage of .350 as a ballpark estimate that he believed does a decent job of distinguishing major league starters from backups. As Hanke readily admitted, there is some arbitrariness about these figures, and we would want a more trustworthy estimate. Phil Birnbaum (1994) tried something simple, but it did not work very well. Phil classified every season in the 20th century up to his analysis totaling at least 100 games and 300 at bats (i.e., regular playing time) into categories defined by RC/27 (6.75 or more, 6.25 to 6.74, and so on down in .50 increments to 2.24 and lower), and then tried to determine cutoffs distinguishing between categories that resulted in a greater than versus less than 50 percent chance of the player not playing regularly during the next season. It worked fairly well for third base (from 38 percent for 2.25-2.74 to 67 percent for 2.24 and lower) and second base (from 33 percent for 2.75-3.24 to 53 percent for 2.25-2.74) but not elsewhere, and the findings lack intuitive sense (second base should be lower than third base given that defense is valued more highly there, not higher). In general, Phil's hope was that obvious drops in the odds of maintaining a regular position would define replacement levels, but for positions other than these two there was a fairly smooth decline instead. Phil added Fielding Runs (see the fielding chapter on that) to RC/27 to see if considering defense and offense together would help, but it didn't have much impact. The method just is not subtle enough to determine replacement levels.

An unpublished paper from about 2000 by Sky Andrecheck (n.d.) proposing an evaluation method called Benefit Value described an interesting approach to replacement level, although apparently the author did not realize that this was what he was doing, given some of his relevant comments ("replacement value sounds good in theory, but nobody can be sure how bad that poor player actually is - making it an impractical measure"; page 1). Andrecheck began with the by-then well-known criticism of Batting Runs that the average player is not worthless, but nonetheless used that measurement as the basis for his procedure. To evaluate the offense of a given player, Andrecheck suggested that one compare the Batting Runs for that player to those of the players who played the same position for other teams over the same number of games. To use his example, in 1999 Mark McGwire was credited with 63 Batting Runs in 151 games. Let us imagine that McGwire was a Red that season. Then the Reds' actual starter, Sean Casey, would back him up, and Casey's actual backup, Hal Morris, would not have been on the team. In actuality, Casey played in 148 games and Morris played in 25 (totaling 173). In Andrecheck's system, McGwire is given credit for his actual number of games (151), and so is credited with all of Casey's games and 3 of Morris's, while Casey is given credit for the other 22 of Morris's games (maintaining the 173 total). McGwire would be credited with his 63 Batting Runs, charged with the Batting Runs that Casey would have amassed in the 126 games he no longer is credited with (in other words, 126/148 of his actual total), which turns out to be 24, and also charged for all of Morris's Batting Runs for the season, as he would not have played at all; although, as this turned out to be -2, the 2 is in this case added on. Thus McGwire as a Red would have been worth 63 - 24 - (-2) or 41 runs. In essence, a Reds team with McGwire and Casey would have scored 41 more runs than the actual Reds team with Casey and Morris. One then does the same for each other team in the majors and averages across them. McGwire's average across teams (his Benefit Value) was 70. The reason that this is a way of looking at runs above replacement level is that McGwire on the Reds replaces Morris (and whoever is the weakest player at the same position on each of the other teams). Looked at another way, suppose we were to compute Morris's value. He is a weaker hitter than, and would not get to replace, McGwire. Similarly, he is a weaker hitter than, and would not get to replace, Casey. A player gets 0 points for a given team if he would not replace anyone on it. Now Morris might get points for the Cardinals if he were a stronger hitter than McGwire's backup. But the point is that the worst hitter at a given position in baseball will replace no one and get 0; the second worst will replace only that worst player and receive only the points separating the two, and so on.
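The Reds piece of Andrecheck's example, as arithmetic (Casey's full-season Batting Runs figure is my back-calculation from the 24-run charge, not a number given in the paper):

mcgwire_br, mcgwire_games = 63, 151
casey_br, casey_games = 28, 148      # back-calculated so that the 126-game charge comes to about 24
morris_br, morris_games = -2, 25

casey_games_kept = (casey_games + morris_games) - mcgwire_games   # the 22 Morris games Casey keeps
casey_games_lost = casey_games - casey_games_kept                 # 126
charge_for_casey = casey_br * casey_games_lost / casey_games      # about 24
benefit_vs_reds = mcgwire_br - charge_for_casey - morris_br       # 63 - 24 - (-2), about 41
print(round(benefit_vs_reds))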

Walker and Furtado (1999) describe a representative method. It starts with Brock Hanke’s assumption that a team of replacement level offensive players would, defense being equal, win 35 percent of its games, as would a team of replacement level defensive players, offense being equal. One then uses the Pythagorean theorem for predicting team performance based on the number of runs scored and given up (see the Team Evaluation chapter); their version is as follows:

(Runs scored)^1.83
divided by
(Runs scored)^1.83 + (Runs allowed)^1.83

If a team were exactly at league average in runs scored and allowed, the equation would predict a .500 winning average. In order for the equation to predict a winning average of .350 for a team allowing an average number of runs, its runs scored would have to be deflated by .713; they call this the "runs deflator." In other words, a replacement level offense would score 71.3 percent of the runs of an average offense.
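A quick numerical check of the .713 figure, solving for the deflator that drives an otherwise average team down to a .350 winning average under the 1.83 exponent:

exponent = 1.83
# With runs scored deflated by d and runs allowed left at average, the Pythagorean
# winning average becomes d**1.83 / (d**1.83 + 1); set that equal to .350 and solve.
deflator = (0.350 / 0.650) ** (1 / exponent)
print(round(deflator, 3))   # about .713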

Walker and Furtado went on to compute a player's Extrapolated Wins as an index of offensive performance relative to the replacement player, via the following steps:

1 – You need the player's league's total number of
A – Extrapolated Runs,
B – Plate appearances, and
C – Outs made, including those made at bat plus double plays hit into and times caught stealing.

2 – Compute the following league indices:
A – The league "out percentage" (actually a rate): outs made divided by plate appearances,
B – The league Extrapolated Average, computed, as mentioned earlier, by dividing league Extrapolated Runs by plate appearances,
C – Outs made divided by the number of teams, in other words expected outs per team,
D – Plate appearances divided by the number of teams, in other words expected PA per team, and
E – The expected number of Extrapolated Runs a team would score, via

(Expected outs per team divided by league out percentage)
multiplied by
League Extrapolated Average

3 – Compute the player's out percentage, Extrapolated Average, and the proportion of his team's plate appearances the player consumes.

4 – Depending on whether the player makes more or fewer outs than the league average, the player might provide fewer or more opportunities for those after him in the batting order to get plate appearances. We use the following formula

Team outs
divided by
(Player out percentage times player's proportion of team PA) plus (league out percentage times [1 minus player's proportion of team PA])

to get an adjusted number of team plate appearances. If we subtract the average number of team plate appearances from this adjusted number, we get the difference in team PAs from average due to that player. If the player makes more (fewer) outs than league average, then the team has lost (gained) PAs because of that player.

5 – Now we need the player's contribution to the team's run production. So we
A – Compute an adjusted team Extrapolated Average via

(Team XAVG times player's proportion of team PA) + (league XAVG times [1 minus player's proportion of team PA])

B – Compute a figure for the player's contribution to run scoring by multiplying the result of Step 4 by the result of Step 5A.

This gives you the predicted number of runs the player should be credited with. You can then compare it with the league average runs scored per team to see how many runs better or worse the player was than league average.

6 – Next, to compare the average player with the offensive value of a replacement level player, credit the replacement level player with the same number of plate appearances as the given player, and then multiply that by the average Extrapolated Average. The replacement player is only worth .713 of that average, which comes to about 66 runs. The upshot is that replacement level was 26 runs below average in 1998. This figure is useful for any player for that league-season.

7 – You can recompute Steps 4 and 5 for the replacement player, using the real player's proportion of team PA and the values for the out percentage and XAVG of the replacement player (described in the Appendices to Walker and Furtado's article). I would instead use the 26 runs below average figure as the replacement player's XR.

8 – One last use of the Pythagorean theorem:

(Given player’s XR)1.83

divided by(Given player’s XR)1.83 + (Replacement player’s XR)1.83

which gives us the winning average when the given player replaces a replacement level player on an otherwise average team. The difference between the result and .500 is then multiplied by 162 to provide additional wins the player provides over replacement level, which they authors call Extrapolated Wins. As an aid in interpretation, the highest figure in 1998 was Mark McGwire’s 11.63; a rating of 5 would have placed a player at approximately 20th place in either league.
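
To make the sequence concrete, here is a rough sketch of Steps 2 through 8 in code. The variable names are mine, I read the two quantities in the Step 8 Pythagorean formula as the team run estimates produced by Steps 5 and 7, and for lack of the appendix values I give the replacement hitter the league out percentage; none of this should be taken as Walker and Furtado’s actual implementation.

PYTH_EXP = 1.83          # Pythagorean exponent used by Walker and Furtado
RUNS_DEFLATOR = 0.713    # a replacement-level offense scores 71.3% of average

def extrapolated_wins(player_xr, player_pa, player_outs,
                      lg_xr, lg_pa, lg_outs, n_teams, games=162):
    # Step 2: league indices
    lg_out_pct = lg_outs / lg_pa                   # league outs per PA
    lg_xavg = lg_xr / lg_pa                        # league XR per PA
    team_outs = lg_outs / n_teams                  # expected outs per team
    team_pa = lg_pa / n_teams                      # expected PA per team

    # Step 3: player indices
    p_out_pct = player_outs / player_pa
    p_xavg = player_xr / player_pa
    p_share = player_pa / team_pa                  # share of team PA consumed

    def team_runs_with(out_pct, xavg):
        # Steps 4 and 5: adjust team PA for the hitter's out-making, then
        # blend his XAVG with the league's XAVG for the rest of the lineup.
        adj_pa = team_outs / (out_pct * p_share + lg_out_pct * (1 - p_share))
        adj_xavg = xavg * p_share + lg_xavg * (1 - p_share)
        return adj_pa * adj_xavg

    # Steps 6 and 7, simplified: the replacement hitter produces 71.3% of the
    # league XAVG; his out percentage is set to the league rate here because
    # the article's appendix values are not reproduced in this chapter.
    with_player = team_runs_with(p_out_pct, p_xavg)
    with_replacement = team_runs_with(lg_out_pct, lg_xavg * RUNS_DEFLATOR)

    # Step 8: Pythagorean winning average of the two team run estimates,
    # converted to wins over replacement across a full schedule.
    win_avg = with_player ** PYTH_EXP / (
        with_player ** PYTH_EXP + with_replacement ** PYTH_EXP)
    return (win_avg - 0.5) * games

Plugging plausible 1998 league totals and McGwire-like numbers into this reading lands in the neighborhood of the 11.63 figure quoted above, which is why I read Step 8 at the team level rather than as the player’s raw XR.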

Rachel Heacock (2017, 2018) used an investment tool called the capital asset pricing model (CAPM), which estimates volatility in performance, on position player data (she gave no further details on this) to estimate the risk in “investing” in the player. As she describes it, CAPM allows an analyst to distinguish the “systematic risk” inherent in investing, as based on the volatility either of the market as a whole or a given sector of it, from the “unsystematic risk” specific to the target of the investment. The model gives you an index (beta) for systematic risk, where average risk is defined as a beta equaling one. In baseball, risk is represented by the extent to which a player’s upcoming performance may be expected to deviate from that predicted. If the team as a whole gets better or worse, then the player’s performance is expected to improve or worsen, analogously to the systematic risk involved in a specific stock given the performance of the market. A regression line with the intercept value equaling the relevant major league replacement level on a given performance metric (Rachel used wRC+) can be computed; see the original for details. This allows the determination of a “team-specific replacement level,” in terms of expected wRC+, that captures the relationship between a replacement level player and expected team performance; the better the team, the higher the team-specific replacement level. You can then compare predicted player performance to potential risk. Players with predicted performance greater than the calculated risk are keepers (above team-specific replacement level); those with the former below the latter are not (below team-specific replacement level). The author reminds us in both articles that the regression lines, and therefore the performance/risk relationship, are team-specific, which complicates computing the risk of adding a player to one team based on performance with another team. In the 2018 article, Rachel discussed some improvements to the model, including using logistic regression to allow for multiple factors predicting an up versus down decision on keeping a player, and adding estimates of risk unaccounted for by past player performance.

All versions of Wins Above Replacement (WAR; see the Overall Evaluation chapter) need some definition of replacement level for computation. One of the best known of these methods is Baseball Reference’s rWAR, in which replacement level is defined as 20.5 runs below average per 600 plate appearances. If all players (including pitchers) were at replacement level, the team winning average would be .294, or about 48 wins. The sum of all players’ WARs for a given season should be approximately equal to the number of wins over that 48-win baseline that the team achieved. Another well-known method is Keith Woolner’s Value Over Replacement Player (VORP). In research reported in the 2002 Baseball Prospectus, Woolner calculated the difference between the mean equivalent average for the starter (defined as whoever had the most plate appearances) at every position for every team and the mean EqA for every other player at the same position on the same team between 1893 and 1998. In general, the backups were 80 percent as productive as the starters, with the exception of catchers (85%) and first basemen (75%); this is where the relevant argument above came from. In 2005, Woolner described a more sophisticated version with a computation method exploiting his work on Win Expectancy. To compute VORP, one first takes the average runs per out (Woolner used Equivalent Runs) at a given position and multiplies it by the value for that position (as just noted, .8 except for catcher and first base), which provides run production at replacement level. Then one compares that figure with the run production of the given player to see how many runs the player is above or below that replacement level.
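
A minimal sketch of that 2005-style VORP computation, as described here, might look like the following; the function and variable names are mine, and the positional multipliers are just the rough starter/backup ratios quoted above.

# Positional replacement multipliers from Woolner's findings quoted above;
# every position not listed uses the general 80 percent figure.
REPLACEMENT_MULTIPLIER = {"C": 0.85, "1B": 0.75}

def vorp(player_runs, player_outs, lg_runs_per_out_at_pos, position):
    mult = REPLACEMENT_MULTIPLIER.get(position, 0.80)
    repl_runs_per_out = lg_runs_per_out_at_pos * mult
    # Runs a replacement-level player would produce in the same number of
    # outs, compared with what the player actually produced.
    return player_runs - repl_runs_per_out * player_outs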

There is more to say about evaluating players via replacement level, and various methods for representing wins above replacement are listed in the Overall Evaluation chapter.

HITf/x- and PITCHf/x-Based Methods

Data from recorded events on the field, specifically HITf/x and PITCHf/x, have been available since 2008, and they have opened up a goldmine of potential applications for evaluating batting, pitching, and fielding. PITCHf/x came first, but after the 2008 season Peter Jensen suggested to those overseeing the data that what it included could be adapted to measure the speed, distance, and both horizontal and vertical angles (the latter now called launch angle) of batted ball trajectories, plus the location at which they land either inside or outside of the playing field. Thanks to these data, we now have fairly objective measures of the type of batted ball (fly, liner, grounder, popup) and its exit velocity upon being hit, and the combination of the two can be quite instructive. (For those wanting to explore the technicalities of the PITCHf/x data, see Mike Fast’s 2007 glossary of terminology.)

The basic framework for an overall and, to the best of my knowledge, at the time unnamed measure of offensive performance using HITf/x data was proposed by Jensen in 2009. A bottom-up linear weights concept, it includes strikeouts (weighted at –.29 based on 2005-2008 data), unintentional (.32) and intentional (.09) walks, and hit by pitches (.34). The highlights, so to speak, are the HITf/x indices:

1 – bat speed, divided into six categories: less than 80 mph, 80-85 mph, 85-90 mph, 90-95 mph, 95-100 mph, and over 100 mph.
2 – vertical angle, divided into eleven categories: less than –5 degrees, –5 to 0 degrees, 0 to 5 degrees, and so on until 35-40 degrees, and more than 40 degrees.
3 – horizontal angle, divided into three simple categories: pulled, center, and opposite.

The idea is to assign a weight to each of the 6 X 11 X 3, or 198, combinations of these categories. Jensen included an attached spreadsheet with such weights, based only on April 2009 games.
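
A sketch of how such a framework might be organized in code follows. The bin edges come from Jensen’s categories above, but the single batted-ball weight shown is a placeholder, since his actual 198 weights live in the attached spreadsheet; everything else (names, structure) is my own assumption.

import bisect

K_WT, UBB_WT, IBB_WT, HBP_WT = -0.29, 0.32, 0.09, 0.34

SPEED_EDGES = [80, 85, 90, 95, 100]                       # mph -> six bins
VANGLE_EDGES = [-5, 0, 5, 10, 15, 20, 25, 30, 35, 40]     # degrees -> eleven bins
HANGLES = ("pulled", "center", "opposite")                # three bins

# batted_ball_weights[(speed_bin, vangle_bin, hangle)] -> run value;
# filled here with a single made-up entry purely for illustration.
batted_ball_weights = {(4, 7, "pulled"): 1.2}

def batted_ball_value(speed_mph, vangle_deg, hangle):
    speed_bin = bisect.bisect_right(SPEED_EDGES, speed_mph)
    vangle_bin = bisect.bisect_right(VANGLE_EDGES, vangle_deg)
    return batted_ball_weights.get((speed_bin, vangle_bin, hangle), 0.0)

def batter_runs(strikeouts, ubb, ibb, hbp, batted_balls):
    # batted_balls: iterable of (speed_mph, vangle_deg, hangle) tuples
    total = strikeouts * K_WT + ubb * UBB_WT + ibb * IBB_WT + hbp * HBP_WT
    total += sum(batted_ball_value(s, v, h) for s, v, h in batted_balls)
    return total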

Mike Fast obtained the 2008 data for 124,000 batted balls and performed the first detailed examination of exit velocity and launch angle. He reported a massive number of findings in two articles (2011a, 2011b); I will summarize. The following are batting averages on balls in play launched at given vertical angles, estimated by me off of a diagram: Extreme negative angles, between –80 degrees and –40 degrees from horizontal, in other words balls beaten into the ground, resulted in paltry BABIPs of about .200. Those in the –30 to –20 degree range were even less effective, with BABIP at about .100, obviously weak grounders. After that, BABIP exploded to .200 by –10 degrees, .350 by 0 degrees (straight horizontal), .650 by 10 degrees, and peaked around .800 at 12 degrees, the ideal for batters. It then declined just as quickly, to .550 by 20 degrees and .200 by 30 degrees, after which we are in popup territory and tiny BABIPs. Home runs have a narrow range, starting at about 20 degrees, peaking in the high 20s, and ending right around 40 degrees. Note that BABIP is greatest a good 10-15 degrees lower than the ideal home run area; Andrew Perpetua (2017a) made the important point that the 10 to 20 degree area is where exit velocity is typically highest.

As for the effect of exit velocity, Mike presented another diagram and a table for the ideal 12 degree angle. I will again summarize:

Less than 60 mph – grounders about sixty percent of the time. When grounders, 8 percent are hit so weakly that they land in infield no-man’s lands and become hits, and 2 percent dribble into the outfield. If weak flies, only about 8 percent are lucky enough to drop in front of outfielders for hits.

60-80 mph – about 5/8ths are flies. For grounders, the infield hit percentage drops to 5½ percent, but 10½ percent get into the outfield. If flies, 25½ percent become hits, as most are still easy fly outs.

80-90 mph – the fly/grounder proportion remains about the same. Infield hits remain at about 5 percent, but now 27½ percent are hit hard enough to get through. As for flies, now 36½ percent become hits, with some getting between the gaps and down the line for extra bases. And for the first time we have home runs, accounting for about 4½ percent of these flies.

Over 90 mph – the fly/grounder breakdown is still 5/8ths for the former. Grounders only become infield hits 3½ percent of the time, but now 44 percent get into the outfield as hits. As for flies, 71 percent now become hits, many for extra bases and almost 18 percent for four bags.

All of this equates to the following results for batting average, once again estimates based on his relevant diagram:

Miles per hour   Batting average      Miles per hour   Batting average
20               .100                 80               .350
40               .050                 90               .500
60               .150                 100              .600
70               .200                 110              .650

Critically, Mike then split the data into half seasons and calculated that the correlation across batters between halves was .76. This is strong evidence that exit velocity is, not surprisingly, a skill.
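
For readers who want to run this kind of reliability check on their own data, here is a rough sketch, assuming a pandas DataFrame of batted balls with batter, game_date, and exit_velocity columns; the column names, the July 1 split date, and the absence of a minimum batted-ball cutoff are my simplifications, not Fast’s method.

import pandas as pd

def split_half_correlation(df, midpoint="2008-07-01"):
    # Average exit velocity per batter in each half of the season,
    # then correlate the two sets of per-batter means.
    first = df[df["game_date"] < midpoint].groupby("batter")["exit_velocity"].mean()
    second = df[df["game_date"] >= midpoint].groupby("batter")["exit_velocity"].mean()
    both = pd.concat([first, second], axis=1, keys=["h1", "h2"]).dropna()
    return both["h1"].corr(both["h2"])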

This type of HITf/x data analysis continues to this day. The following is based on the 2014 season and was presented online by Tony Blengino (2015). We begin by defining the four types of batted balls by the launch angle at which they are hit, along with the mean BA and SA for each type:

Type       Vertical angle when hit   Proportion of batted balls   Mean batting average   Mean slugging average
Popup      >50 degrees               7.7%                         .015                   .019
Fly        20-50 degrees             28.0%                        .275                   .703
Liner      5-20 degrees              20.9%                        .661                   .869
Grounder   <5 degrees                43.4%                        .245                   .267

Next, we combine the type of hit with exit velocity. For fly balls, Tony noted BA/SA figures of .560 and 1.884 for flies hit at greater than 92½ miles per hour and .077 and .148 for flies from 75 to 90 miles per hour. Although Fast’s diagrams show that there is a fine gradation among the exact velocities, the overall message is stark in both studies – exit velocity really matters, and, as the author put it, when age or injury lowers a batter’s average from the first category to the second, “it is where many careers go to die.”

Turning to liners, Tony is unclear about the normal fate of hard-hit (97.5 mph) ones, although it takes that much speed to get a home run out of a liner. Below that, batters were better off between 75 and 80 mph (.739 BA, .820 SA) than between 87½ and 90 (.637 BA, .825 SA), likely because the latter more often reach an outfielder’s glove. Nonetheless, any liner is good; a slower one (65 to 70 mph) still gets you a .547 BA and .579 SA. Below that, it usually ends up caught (.218 BA, .238 SA).

Finally, grounders. One going at least 95 mph (Ichiro’s apparent goal) flies through the infield (.532 BA) and sometimes through the outfield to the wall (.583 SA), but one under 70 mph is a piece of cake for the infield (.116 BA, .123 SA). In between, either you find a hole or you don’t (.340 BA, .375 SA).
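
The launch-angle boundaries in the table above translate directly into a simple classifier; this is only a sketch that ignores borderline cases.

def batted_ball_type(launch_angle_deg):
    # Cutoffs taken from Blengino's 2014 table above.
    if launch_angle_deg < 5:
        return "grounder"
    if launch_angle_deg < 20:
        return "liner"
    if launch_angle_deg <= 50:
        return "fly"
    return "popup"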

An overall measure of offense for specific players based on average exit velocity should be possible. Andrew Perpetua (2017b), with the help of Eno Sarris, worked on one, which was a relative measure of player exit velocity as compared with the league average specific to the type and location of pitches faced. Andrew has also determined that, not surprisingly, pitches in the strike zone have faster exit velocities when batted than those outside of the zone. Putting the two points together, it seems that a hitter’s average exit velocity would be based partly on plate discipline (limiting the balls one puts in fair territory to pitches in the strike zone) along with factors such as swing speed. In addition, Healey (2017a), using HITf/x data, computed an overall wOBA for each combination of vertical and horizontal angle for every batted ball in fair territory during 2014. Measured for each player, this work provides the potential for creating an index for batter evaluation based solely on the angles at which batted balls are hit.

HITf/x’s counterpart, PITCHf/x, provides data on pitch location, speed, and type. This information provides the opportunity to display data on batting prowess against pitches differing on these variables. As Joe Sheehan explained in a leap day post (2008), one can divide the strike zone and the area around it into zones (he has nine within the strike zone and sixteen surrounding it) and compute run expectancy values for pitches within each; for detailed work, one can subdivide the data by type of matchup (e.g., lefty batter against righty pitcher), specific counts, or types of pitches.
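
A minimal sketch of the binning idea follows. The strike-zone bounds, the 3x3 interior grid, the single catch-all bucket for pitches outside the zone (Sheehan used sixteen outer zones), and the notion of attaching a run-value change to each pitch are my assumptions about data layout, not his actual implementation.

from collections import defaultdict

def zone_of(px, pz, left=-0.83, right=0.83, bottom=1.5, top=3.5):
    # px is horizontal location in feet from the middle of the plate,
    # pz is height in feet; returns a (row, col) cell or "out".
    if not (left <= px <= right and bottom <= pz <= top):
        return "out"
    col = min(2, int(3 * (px - left) / (right - left)))
    row = min(2, int(3 * (pz - bottom) / (top - bottom)))
    return (row, col)

def run_value_by_zone(pitches):
    # pitches: iterable of (px, pz, run_value_change) tuples
    totals, counts = defaultdict(float), defaultdict(int)
    for px, pz, rv in pitches:
        z = zone_of(px, pz)
        totals[z] += rv
        counts[z] += 1
    return {z: totals[z] / counts[z] for z in totals}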

Starting with Joe’s article, and in subsequent graphic representations often seen in the media, the 25 zones are displayed with indices (BA is a popular one) for outcomes. It turns out that these data are not particularly reliable, with a split-halves correlation of only .30 for 2007-2011 data (Mike Fast, 2011d), at least partly due to small sample sizes in each zone even with all these data. In the same essay Joe noted that a continuous measure would be better than his 25 bins, and Dave Allen (2009, 2010) diagrammed steps in that direction. One result has been the introduction of heat maps, visual representations of the performance of batters on pitches in different locations inside and outside of the strike zone. These use gradations of color to stand for differences in batter performance, with red corresponding to the hottest hitting (over .300 BA), then orange (about .280), yellow (.260), green (.230), light blue (.200), and dark blue (below that) for the coldest. Dave Allen (2010) used heat maps to demonstrate various outcomes for different pitch types (changeup, four- and two-seam fastball, curve, slider) for the four left/righthanded pitcher/batter combinations. Several efforts have been made at providing more precise and accurate methods for computing heat maps than what was first available (Baumer & Draghicescu, 2010; Cross & Sylvan, 2018). One point made by Cross and Sylvan is that the differences among players, although easily discernible and very important for scouting/evaluation, are dwarfed by tendencies general across the majors, such as (not surprisingly) worse performance on pitches in the four sectors at the corners of the plate than in the other five within the strike zone. In addition, Gore and Snapp (2011) introduced the idea of a measure of batter proficiency based on this type of data, which they called the Swing Quality Metric, providing as an example a detailed analysis of Derrek Lee’s 2009 and 2010 performance to explain why it differed so much between the two years.

Current data also allow a composite measure of how solidly a batted ball has been struck. Tom Tango has defined a Barrel as a batted ball that, when classified along with analogous batted balls, results in a batting average of at least .500 and a slugging average of at least 1.500 (see http://m.mlb.com/glossary/statcast/barrel for discussion). It is defined according to the interaction between exit velocity and launch angle, in that the greater the velocity, the further from the “ideal” launch angle a batted ball can be while still maintaining that classification. At 98 miles per hour, launch angle must be within a narrow range of 26 to 30 degrees to qualify. For every one mile per hour increase, the allowable range of launch angles expands by two or three degrees; at 116 mph, it is very wide (8 to 50 degrees). An equation that can help define it is:

(launch_speed * 1.5 - launch_angle) >= 117
and (launch_speed + launch_angle) >= 124
and launch_angle <= 50
and launch_speed >= 98
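
Expressed as code, that shortcut rule reads as follows; keep in mind the official Barrel classification is the velocity-by-angle lookup described above, and this is only the approximate inequality version just quoted.

def is_barrel(launch_speed, launch_angle):
    # Approximate Barrel rule: speed in mph, angle in degrees.
    return (launch_speed * 1.5 - launch_angle >= 117
            and launch_speed + launch_angle >= 124
            and launch_angle <= 50
            and launch_speed >= 98)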

Distinctions have also been made among regular Barrels, Near-Barrels, and Perfect Barrels. Russell Carleton (2016) determined that in 2015 and 2016, the Barrel definition was met by only 12.9 percent of batted balls and 36.9 percent of extra base hits. But turning the idea around, 58.8 percent of barrels resulted in homers, and an additional 20.1 percent doubles or triples. The correlation between barrels and homers across players was an almost perfect .93 and for doubles/triples a still noteworthy .58.

Deserved Runs Created Plus (DRC+) is Baseball Prospectus’s latest entry into the mix, which they (of course) claim outperforms all others (Judge, 2018a), although it correlates at over .9 with them (Main, 2018b). I assume it uses HITf/x data. I will have no more to say about it until I know how it works.

Never Was’s

Listed here are a number of miscellaneous methods, none of which have been at all influential. Some are of passing interest, others not.


Of Passing Interest

Anderson and Sharp (1997) proposed a method that they called the Composite Batter Index, or CBI for short. They used a statistical procedure called data envelopment analysis, which basically provides an analog of a variance-accounted-for measure between a group of independent and dependent variables. Anderson and Sharp searched for the best predictive combination of walks, singles, doubles, triples, and home runs in order to produce an index of relative performance among players in the same league during the same season. The CBI score represents the percentage of plate appearances that would be required for the best virtual player to produce at least as much as the player studied (pages 153-154). A score of .8 means that some other hitter or combination of hitters in the league could have produced at least as many of each type of hit in 20 percent fewer plate appearances. The term “virtual” is due to the fact that the performance representing the league best could be based on combinations of hitter performances (e.g., that of the season’s best power hitter plus best singles hitter plus best walker). Thus, CBI is normalized not to the average, as are Palmer’s methods, but rather to the virtual league leader’s performance. Mazur (1994) produced an analogous but far less valid procedure, based on the apples-and-oranges indices of batting average, home runs, and runs batted in.

Koop (2002) proposed a method that compares players to one another based on three measures: singles/walks/hit by pitches, doubles/triples, and home runs. Based on all players with at least 200 at bats during a season from 1995 through 1999, Koop’s analysis resulted in three lists, one comparing power hitters (topped by Gary Sheffield, Frank Thomas, and Barry Bonds), one on-base average hitters (led by Jason Kendall, Chuck Knoblauch, and Tony Gwynn), and one hitters with a range of skills but excelling in none (Edgar Martinez, Mark Grace, and John Olerud). Overall lists are possible, but require assumptions concerning the relative importance of the three defining measures.

Not of Passing Interest

Here I will mention a few methods that are of no value. I have included them here because they have been published in academic journals, although they should not have been. Lanoue and Revetta (1993) presented a method in which players are scaled according to five idiosyncratically chosen indices of totally different scales (in order of, in their judgment, importance: runs produced, extra base hits, batting average, stolen bases, and walks), and these scales, weighted by judged importance, are then combined into an overall ranking. Wittkowski, Song, Anderson, and Daniels (2008) proposed a method based on combining rank orderings of batting average, slugging average, on-base plus slugging, and a fourth, idiosyncratic index, all of which, due to their great redundancy, provide relatively little independent information. The resulting measure, which they called a u-score, was correlated at an almost perfect .986 with Batting Runs scores for 2003 batters. Despite their claim that their measure does a particularly good job of singling out players with specialized skills, I see no point to it.

I mentioned data envelopment analysis above, a method that has unfortunately resulted in work that is usually trite and often misleading. John Ruggiero’s (2010) relevant effort is sadly no exception. Based on standard offensive categories, Ruggiero determined that the best offensive performers in 2009 were players who were close to the league lead in these categories. However, his method basically equated rankings in each category, such that Denard Span’s 10 triples made him as “productive” as Prince Fielder. Need I say any more?

In addition, in the past Baseball Research Journal and even Baseball by the Numbers included articles on performance metrics that are analogously of no value. I have chosen not to describe them here.

Comparisons among Methods

Given the existence of all of these measures, it is not surprising that effort has been directed at determining their relative effectiveness. Such efforts are fraught with difficulty. First, it is critical to remember that any bottom-up method worth its salt using coefficients based on a specific sample of years will by its very nature be quite accurate for those years, but the coefficients may not work at all for other samples of seasons. Many researchers have proposed their own specific formula and then evaluated it against other formulas using the same sample of seasons that they used in computing their own. Such a demonstration is useless and for all intents and purposes tautological. Second, by their very nature, methods based on run expectancy tables and bottom-up regression techniques will be more accurate over a given set of years than methods based on top-down regression techniques, just because the former have been designed for accuracy rather than theoretical elegance.

A further problem has to do with lineup position. Tom Hanrahan (2008a) computed the average number of runners on base for each lineup position over the course of a game. Most positions ranged from .62 (2nd) to .74 (4th), but the leadoff spot lagged well behind at .50. Tom then performed a million-game simulation using typical performance at each lineup position and compared that to performance with an exceptional batter placed in different lineup positions to see the relative impact on scoring. The upshot was that this impact was about ten percent less for the leadoff position than for the others. The implication is that our evaluation procedures do not control for lineup position, implicitly assuming that offensive events are worth the same no matter where in the lineup they occur. As such, they overvalue leadoff hitters by about 10 percent, because their hits net fewer runs than hits from other positions. An easy response is that leadoff hitters should not be blamed for the fact that there are on average fewer people on base when they are up.

A better idea is to evaluate classes of methods to see which class works better. As mentioned earlier, a top-down regression method will probably be less accurate than a bottom-up method or a run expectancy method but more accurate than an index based on just one or two aspects of offense, such as batting average, on-base average, or slugging average. Wyers (2009) offered a thoughtful such attempt, showing not only that but why a method such as Base Runs will be more accurate than Runs Created or Extrapolated Runs, using a data set big enough (all Retrosheet data from 1956 through 2007) to counter the problem of formulas designed for a specific sample of years.

Below, I present comparisons among methods offered by Pete Palmer (Thorn & Palmer, 1984), John Jarvis (1998b), and Jim Furtado (1999c), all based on the standard deviation of the difference between the number of runs a given method predicts a team would score and the number of runs it actually scored in a year. Pete based his analysis on data from 1946 through 1982. Jarvis’s data included all that was available on-line at the time: American League 1967, 1980 to 1986, and 1992 to 1996. Furtado applied data from 1955 through 1997. For reasons described earlier, the fact that both Pete and Jim found their own methods to be the best of all can be taken with a grain of salt.

Jarvis / Palmer / Furtado
Non-normalized Batting Runs: 21.9 / 19.8 / 22.1
Estimated Runs Produced (Johnson): 22.0 / 24.3
Runs Created, Technical Version: 22.7 / 24.6 / 22.2
Expected Run Production Average: 23.1 / 24.5
Offensive Earned Run Average: 23.8
Batter Run Average, Basic Version: 24.8 / 24.4 / 24.9
Runs Created, Basic Version: 25.1 / 25.8 / 25.9
Total Average: 26.3 / 31.1 / 29.0
Base-Out Percentage: 26.7 / 30.9
Cook’s DX: 26.9 / 26.4 / 31.2
Batter Win Average: 29.3
On-Base Plus Slugging: 35.0 / 20.4 / 41.4
D’Esopo/Lefkowitz’s Scoring Index: 39.6
Branch Rickey’s Version of OPS: 35.4 / 41.0 / 35.3
Isolated Power: 49.8 / 50.8
On-Base Average: 50.9 / 53.0 / 48.7
Normalized Linear Weights: 59.2 / 22.3
Slugging Percentage: 88.9 / 39.9 / 38.0
Batting Average: 96.7 / 54.8 / 49.4
Extrapolated Runs: 20.9
Offensive Performance Average: 39.7
Equivalent Average: 41.3

Baumer and Zimbalist (2014) went after the same fish with different bait: correlations with run scoring for each season from 1954 through 2011. Their findings: .81 for isolated power, .82 for batting average, .89 for on-base average, .91 for slugging average, .94 for wOBA, and .95 for OPS and extrapolated runs per 27 outs. In other words, as unreliable as some of these measures are across years, all of them (and the skills they measure) are involved in run production. Walk rate was also related to scoring, although more moderately at .42. In contrast, some of the more skill-based measures (strikeout rate, .09; strikeout-to-walk ratio, negative .20) are not directly related to run scoring.

Home runs have been a particular problem for evaluation methods. A hitter’s offensive value from scoring on his own home runs gets mixed up in most methods with his offensive value from scoring on hits by other hitters. As pointed out when discussing Base Runs above, the processes are fundamentally different. Thus the value assigned to home runs can be drastically off at extremes of power. On his tangotiger website, Tom Tango showed that in an environment with a very large number of home runs, Linear Weights badly underestimates and Runs Created badly overestimates the actual run value of home runs. Base Runs, in contrast, did very well, probably because it separates home runs from other ways of scoring. Wyers’s (2009) findings, using Extrapolated Runs rather than linear weights, were basically the same.

Clifford Blau (1999b) came up with two creative methods for comparing four evaluation methods: Runs Created, Estimated Runs Produced, Extrapolated Runs, and Ugly Weights. The first was to use analogous indices (i.e., various types of hits given up, walks allowed, batters retired, etc.) for pitchers and compare the estimated runs given up from those indices to actual runs allowed. Overall, ERP and XR outperformed the others, although not surprisingly UW did better for the best and worst quartiles. The second was to create indices from the results of multiple data sets of either 18 high scoring or 18 low scoring games. This time, ERP and XR excelled for high offenses and UW for low. RC never performed as well as at least one other method, but it was not necessarily the worst of all.

Ballpark Adjustments

There are good hitter’s parks and good pitcher’s parks. This fact has long been noted, but for decades it was not considered in player or team evaluation. Pete Palmer (Palmer, 1978) was apparently the first to take the issue seriously. Pete’s method (Park Factor) is to calculate a season-specific ballpark factor as a correction for both hitting and pitching. A quick-and-dirty way of doing this is:

1 – Runs scored and allowed at home divided by number of home games.
2 – Runs scored and allowed on the road divided by number of road games.
3 – Divide 1 by 2. Gives an index in which a totally neutral park would be a 1, a hitter’s park greater than 1, a pitcher’s park less than 1.
4 – To continue the quick and dirty, divide a player’s overall offensive performance (such as batter runs or runs created) by this park factor index, and the result would give you a park-neutral number that can be used to compare players to one another.
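
In code, the quick-and-dirty version amounts to a few lines; the function names and the example numbers are mine, not Pete’s.

def park_factor(home_rs, home_ra, home_games, road_rs, road_ra, road_games):
    home_rate = (home_rs + home_ra) / home_games     # step 1
    road_rate = (road_rs + road_ra) / road_games     # step 2
    return home_rate / road_rate                     # step 3: >1 favors hitters

def park_neutral_runs(player_runs, pf):
    return player_runs / pf                          # step 4

# Example with made-up totals: a team scoring/allowing 420/400 at home over
# 81 games and 380/390 on the road gets a factor of about 1.06.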


This results in an index of greater than 1 for good hitter’s parks and less than 1 for good pitcher’s parks. You can then divide measures such as batting average and earned run average by these indices, making them smaller in a good hitter’s park and larger in a good pitcher’s park, as you would want. If instead you want to normalize them on a scale of 100, you would divide the player’s BA by league average BA, divide by the park factor, and then multiply by 100 to get it to scale. This method does not, however, work simply for adjusting Pete’s Batting and Pitching Runs, because they are deviations from average. The Baseball Prospectus folks make adjustments to their projections in analogous ways.

Pete (Palmer, 1983c; see also Thorn and Palmer, 1984) made several adjustments to increase accuracy. First, because dividing both numerator and denominator by games can lead to slight problems if, for example, a team is much better at home (and so doesn’t bat much in the ninth inning) than on the road (thus batting a lot in the ninth), he replaced “games” with “innings.” Second, based on a point made by Jim Reuter (1983), Pete realized the following: the idea is to have a denominator that is universal for a specific league and year, such that the denominator will be the same for every relevant ballpark in that league/year. But that doesn’t work. For example, if a team plays in a good hitter’s park, the impact of that park will be missing from the denominator, resulting in a lower away effect than for teams playing in a poor hitting park (where the denominator includes the good hitting park but not the poor one). Third, the denominator makes the analogous presumption that the pitchers every team faces are identically effective, which is obviously false and will lead to overestimates for a team if its pitchers are particularly good (leaving the denominator biased toward poor pitching) and underestimates if the staff is bad; this is critical when using Park Factor to adjust offensive and pitching indices. Fourth, a team may have purposely designed its roster to take advantage of its home ballpark’s idiosyncrasies, which could be taken as a bias in the numerator. For these sorts of reasons, Pete decided to limit the formula to runs by the opposition.

The method still has a number of problems. One is that it assumes a totally balanced schedule, but that has not existed since divisions were formed. Willie Runquist (1995) made a number of other good points about this technique. First, as also noted by Bill Felber (2005), Park Factor will fluctuate a lot from season to season due to random variation. As a consequence, ideally you would want to merge several years of data for a more stable estimate. But that brings its own problem, because the method is based on relative run-scoring environments across teams and so assumes teams stay in the same ballpark across seasons. If, for example, a couple of teams move from good hitting parks to good pitching parks between seasons, or adjust their parks to encourage offense, then the other ballparks will, without any changes of their own, look like better (or worse) hitting parks than previously. It also assumes no change either in team personnel, which is obviously untenable, or, as Bill Felber (2005) suggested, team strategy, such as a conscious decision between emphasizing small-ball and the three-run homer. Thus one is stuck between the proverbial rock and hard place when it comes to measurement; random year-to-year fluctuation suggests the need for multiple years, but the other factors imply a one-year time frame.

Both Willie and Bill Felber made the further point that including home team performance can lead to measurement bias. A very good home team’s performance is concentrated in its 81 home games but diluted across all the road games, which could result in overestimation of the home field. In addition, Willie and Bill James (1983a) pointed out that Pete’s method ignores the possibility of specific player/ballpark interactions, in the sense that some players are uniquely suited to some ballparks and not others (think Dante Bichette in Colorado compared to his time in Anaheim and Milwaukee during what would normally have been his peak years) whereas it might not matter much for other players. Finally, the method only works at the seasonal level, in which batters play both home and away, and not if, for example, one wants to recalibrate home performance measures alone.

Both Willie and G. Jay Walker (1999a) made another useful point relevant to such adjustments. If one is chiefly concerned with a ball-park (sic) estimate of adjustment criteria, then methods such as Palmer’s, in which an offensive index is calculated and then adjusted, are sufficient. But if one is interested in more precision, then one should make the adjustments for each of the factors that make up the index first and then combine them into a total index. This is because total ballpark effects are actually a composite of different factors. For example, as is well known, some parks favor home runs while others depress them. Some favor extra base hits other than home runs due to spacious outfields and artificial turf. Those with poorer lighting depress batting averages and increase strikeouts. Michael Schell did just that in his 2005 ranking of players on specific offensive indices (summarized below), and his Event Specific Batting Runs index described earlier was computed with these adjustments included.

And finally, both the older and the newer ballparks, i.e. those outside the cookie-cutter generation, tend to be asymmetric, such that right- and lefthanded hitters on the same team may be differentially advantaged or disadvantaged. In a relevant demonstration, Ron Selter (2002) estimated the impact of asymmetry on adjustments for National League parks in use from 1927 through 1937. For example, Philadelphia’s Baker Bowl has been thought to be biased toward lefties; Selter’s data support that thought, with adjustments that differed by .11 for BA, .08 for OBA, and a whopping .20 for SA.

A group of researchers including Carl Morris (Acharya et al., 2008) discerned one last problem with that formula: inflationary bias. I use their example to illustrate. Assume a two-team league in which Team A’s ballpark “really” has a factor of 2 and Team B’s park a “real” factor of .5. That means four times as many runs should be scored in the first as in the second. Now assume that this holds true, and that in two-game series at each park each team scores a total of eight runs at A’s home and two runs at B’s. If you plug these numbers into the basic formula, you get

1 – (8 + 8) / 2 = 8 for A; (2 + 2) / 2 = 2 for B
2 – (2 + 2) / 2 = 2 for A; (8 + 8) / 2 = 8 for B
3 – 8 / 2 = 4 for A; 2 / 8 = .25 for B

figures that are twice as extreme as they should be. The authors proposed a simultaneous solving of a series of equations controlling for team offense and defense, with the result representing the number of runs above or below league average the home park would give up during a given season. Using Retrosheet data from 2000 to 2006 for each league separately (despite interleague play muddying the waters) and, based on 2006, a 5000-game simulation, the authors found their method to be somewhat more accurate and, in particular, less biased than the basic formula. They noted how their method also allows for comparisons of what specific players would accomplish in a neutral ballpark and how a given player’s performance would change if moving from one home ballpark to another.

Sample Size

It happens every April. Some guy who has been a .270 hitter for his entire career hits .400 that month, and the pundits claim that he has finally found himself. At the same time, a long-time .320 hitter hits .200, and everyone thinks he’s washed up. At the end of the year, the first player is back around .270 and the second near .320 again. Analogous events occur in September; a decent player on a playoff-bound team goes off on a tear for a couple of weeks and the analysts are convinced that he will carry the team through the playoffs. Come the post-season, he reverts to his customary decency. The point is that a lot of people do not understand that the results of batting performance over a relatively small number of at bats cannot be trusted as representative of what the player will achieve over the long haul.

The sample size problem was first noted in print by the pioneering baseball researcher George Lindsey (1959), who presented a diagram showing the 90 percent confidence interval for .300 and .240 batting averages in the range of 50 to 600 at bats. Lindsey presented several relevant cases in his text; for example, assuming this confidence interval, a .300 hitter can be anywhere between .225 and .375 after 100 at bats and could hit as low as .165 over a week. Abelson (1985) presented a good demonstration of the problem in an essay dedicated to the general question of accounting for variance. He demonstrated that, given a mean batting average of .270 and a standard deviation of .025, which is a reasonable estimate for a 500 at bat season (see just below), the amount of variance in the odds of getting a hit in a given at bat that is accounted for by a batter’s ability to hit for average can be computed by

(.025)^2 / [(.270)(1 – .270)]


which equals .00317; in other words less than a third of one percent. To use his then-relevant example, this implies that the difference between George Brett (who, at .320, would be 2 standard deviations above the norm) and Lenn Sakata (at .220, 2 s.d.’s below) would be four times this figure, or about 1.3 percent. The catch is that we ought to be comparing batters based on not one but rather hundreds of at bats, in which case this small difference adds up to something large. But over 10 or 20 at bats, it matters not. Incidentally, Mead (1984) calculated the standard deviation for batting averages for a 500 at bat season to be 20 points. Other standard deviation estimates by Mead were 6 home runs for a player averaging 36, 30 points in ERA for 250 innings pitched, and 6 wins for a team in a 162 game schedule.
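
These orders of magnitude are easy to verify yourself if you are willing to treat each at bat as an independent coin flip with the batter’s true average as the probability of a hit, which is the approximation these estimates implicitly rest on.

from math import sqrt

p, at_bats = 0.270, 500
sd_season = sqrt(p * (1 - p) / at_bats)          # ~0.020, Mead's "20 points"
variance_explained = 0.025 ** 2 / (p * (1 - p))  # ~0.0032, Abelson's figure

print(round(sd_season, 3), round(variance_explained, 4))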

Getting back specifically to the “April” examples in the first paragraph, the tendency for batting performance that is out of the ordinary to revert toward the ordinary over a period of time is a well-known statistical phenomenon known as “regression toward the mean.” To demonstrate it, Scott Berry (2004a) noted that of the ten left-handed hitting players (no idea why this restriction) with the highest batting averages at the end of May 2003, eight had seen their averages decrease by the end of the season, while eight of the ten with the lowest had seen theirs increase. The Appendix in Tango, Lichtman, and Dolphin (2006) describes a method for estimating expected full-season performance based on a small sample.

Predicting Performance

An issue in prediction is the tendency for players who have had particularly good or poor seasons to revert closer to average the next year. This is another example of regression to the mean, to which Bill James gave the more fanciful title of the Plexiglass Principle. It occurs as a result of the “laws” of randomness; when a player has a randomly good (or bad) year, it is unlikely that it will be followed by another randomly good (or bad) year and likely that the player will have a normal year. In a statistical demonstration, Schall and Smith (2000b) performed a regression with 1998 batting averages as the independent variable and 1999 batting averages as the dependent variable for players with at least 50 at bats. The results indicated a regression coefficient of .378, meaning that a player who hit 50 points over the league mean in 1998 would be predicted to hit 19 points over that mean in 1999 (and analogously for hitters 50 points under the mean). Clearly this would be a particularly good (poor) hitter for average, but not as good (bad) as his 1998 average would make you think. Schall and Smith used this information in models that improved prediction of one year’s batting average from the previous season when compared to models that did not consider regression toward the mean.
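
Schall and Smith’s coefficient can be turned directly into a back-of-the-envelope prediction rule; the .265 league mean below is a stand-in value of mine, not theirs.

def predicted_next_ba(this_year_ba, league_mean=0.265, coefficient=0.378):
    # Shrink this year's deviation from the league mean by the coefficient.
    return league_mean + coefficient * (this_year_ba - league_mean)

# A hitter 50 points above a .265 league mean (.315) projects to about .284,
# i.e., roughly 19 points above the mean, as described above.
print(round(predicted_next_ba(0.315), 3))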

For prolific home run hitters, Scott Berry (1999b) proposed a model that considered three years of data along with regression toward the mean, such that a player’s number of home runs from the previous year is discounted to 65 percent, from two years previous to 35 percent, and from three years previous to 15 percent. In addition, the model included variability for the number of games played in a season. For example, the resulting model predicted 59 home runs for Mark McGwire in 1999 with a standard deviation of 10.2, an 86 percent chance of hitting 50 home runs, a 52 percent chance of hitting 60 (which, as Cliff Blau pointed out to me, is strange; there should be only 50 percent of cases over the mean), and a 16 percent chance of breaking his own record with 71. A year later (2000b), Berry analyzed the accuracy of his model during 1999 for the 25 most prolific home run hitters in 1998, and noted, in its support, that the error of his predictions approximated a normal distribution. Nonetheless, just assuming that this group would hit the same number in 1999 as they did in 1998 would have led to predictions just about as accurate as his. Next, Berry (2002b) used the distribution suggested by the variability estimation component of the model to calculate the odds of extreme increases in home run performance from year to year, as an informal examination of whether there is evidence of steroid use as a possible cause. He settled on 40 seasons in which an increase would have occurred by chance less than once in one thousand times for players with at least 1000 previous at bats. The most unlikely increase was Kirby Puckett’s jump from 4 home runs in 1986 to 31 home runs in 1987, with a likelihood of occurrence of 2 in 100 million. Next was Brady Anderson’s 50 home runs in 1996, well into a career that otherwise topped out at 21 and following a season of 16; this had odds of 6½ in 10 million. Turning to the steroid issue, Sammy Sosa’s jump from 36 to 66 homers between 1997 and 1998 had odds of 4 in ten thousand; Barry Bonds’s from 49 to 73 between 2000 and 2001 had odds of 1 in ten thousand. Although small, the latter was almost exactly the same as Luis Gonzalez’s rise from 31 to 57 the same year, and was about the same as or less than, for example, jumps by Lou Gehrig (16 to 47 between 1926 and 1927; 4 in one million), Hank Greenberg (40 to 58 between 1937 and 1938; 1½ in 100 thousand), Stan Musial (19 to 39 between 1947 and 1948; 4 in ten thousand), and Carl Yastrzemski (16 to 44 between 1966 and 1967; 4 in 100 thousand). In addition, although there were nine extreme jumps during the steroid-ridden 1990s, there were, among far fewer players, ten in the 1970s and seven during the 1940s. The moral of the story: we cannot use big year-to-year jumps in home run production as evidence either for or against steroid use.

Quite a few researchers have attempted to determine the accuracy of within-season predictions of performance results based on early season levels. A particularly interesting one proposed by Bradley Efron and Carl Morris is described in a relatively accessible manner in a 1977 article; a 1975 version is far more technical. If you have a sample of players who have the same given number of at bats, you can get a better estimate of the end-of-season batting average for each of them using the following formula:

(Mean of current batting averages for entire sample) + ([current batting average for that player minus mean for entire sample] X regression-toward-the-mean weight)


with that weight calculated by

1 minus {([number of players in the sample minus 3] X the sampling variance of an individual player’s current average) divided by (the sum across players of [current batting average for that player minus mean for entire sample] squared)}
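
For those who want to try this, here is a sketch of the shrinkage just described, assuming every player in the sample has the same number of at bats and approximating the sampling variance of an individual average by p(1 – p) divided by at bats; Efron and Morris actually worked with transformed averages, so treat this as illustrative rather than a reproduction of their calculation.

def shrink_batting_averages(averages, at_bats):
    k = len(averages)
    grand_mean = sum(averages) / k
    # Binomial approximation to the sampling variance of one player's average
    sampling_var = grand_mean * (1 - grand_mean) / at_bats
    ss_dev = sum((a - grand_mean) ** 2 for a in averages)
    weight = 1 - (k - 3) * sampling_var / ss_dev   # fraction of the deviation kept
    return [grand_mean + weight * (a - grand_mean) for a in averages]

With a sample as spread out as their 18 players after 45 at bats, the retained weight comes out to roughly .2, which is the roughly 80 percent shrinkage described just below.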

The general statistical method this was based on was proposed by a mathematician named Charles Stein and is known as Stein’s estimator. Efron and Morris demonstrated this method by examining the batting averages through their first 45 at bats in 1970 of 18 very diverse players, including Roberto Clemente’s .400 and Thurman Munson’s .178. The procedure shrunk each player’s BA toward the sample mean by about 80 percent, leaving Clemente at .294 and Munson at .247. Clemente actually ended at .346 and Munson at .318; but across the 18 players, the estimated BAs were closer to the actual BAs than the first-45-at-bat performances were. The reason the estimates were as far off as they were is that Efron and Morris did not use a random sample of players but chose players with widely diverse early BAs; a more representative sample would have resulted in less shrinkage.

Jensen, McShane, and Wyner (2009a) presented a method geared specifically for home runs, although it could also be used for other indices. It estimates a career trajectory for a player based on the player’s past performance and the trajectories of former players at the same fielding position; i.e., a second baseman would be modeled through examining other second basemen. The authors purposely tried to keep the method as simple as possible, using only the player’s age, number of at bats, and team as variables along with past home runs and position. One unique move they made was to distinguish between “elite” and “non-elite” home run hitters at each position, and include regression to the mean for each of these groups separately. To compare the method’s predictions with those for PECOTA and MARCEL, they used overall 1990-2005 data to project 2006 home run totals for the 118 batters who had averaged at least 2.5 home runs per 100 at bats in any season with 300 or more at bats during that interim. Their system was the most accurate predictor for a plurality of players, but overall was a bit less accurate than PECOTA due to very poor predictions for a few.

Several responses to their work (Albert & Birnbaum, 2009; Glickman, 2009; Quintana & Muller, 2009) included the following criticisms:

1 – The use of the player’s team conflates the home ball field effect and the impact of the player’s own team, which could have been mitigated by using Retrosheet data instead of overall seasonal data.
2 – The distinction between elite and non-elite players is artificial, as the distribution is not bi-modal but rather continuous.
3 – Estimating player performance against only others sharing the same position makes no sense.
4 – They failed to adjust for the increase in overall home run rate between 1990 and 2005.
5 – The use of only a small subset of players leaves open the question of how accurate their method is for the bulk of them.

In a rejoinder, Jensen et al. (2009b) accepted many of these issues but claimed that adding year-by-year corrections for the increase in homer rate barely improved prediction, and that the use of player position acts as a proxy for player speed and size differences. The latter is an interesting point given that PECOTA relies on these aspects in its predictions.

Brentnell, Crowder, and Hand (2011) used Efron and Morris’s data set to compare the accuracy of a number of different estimation methods, including both Efron and Morris’s and their own. Jim Albert (2016) developed and compared methods for projecting BA and OBA based on dividing them into component parts (for BA, strikeouts per at bat, home runs per batted ball, and hits per ball in play; for OBA, these three plus walks plus hit by pitches per plate appearance), estimating performance on each, and then combining them into final projections. In so doing, he replicated the fact described at the beginning of this chapter that differences among players in home runs and strikeouts are more stable over time than those for hits on balls in play. Gu and Koenker (2017) used 2002 to 2011 data to predict 2012 batting averages for 344 players with at least 40 at bats in that latter season and 3½ prior major league seasons. Adding an age trajectory improved prediction accuracy. Xie, Kou, and Brown (2012) and Feng and Dicker (2018) presented Bayesian methods for predicting the second half of the season from the first half. Other statisticians proposing models for predicting in-season performance include Jiang and Zhang (2010), Muralidharan (2010), Martin (2015), and Weinstein, Ma, Brown, and Zhang (2018).
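
As an illustration of the component approach Albert uses (though not his actual model), the pieces recombine into a batting average in a straightforward way if we ignore sacrifice flies and other small categories.

def batting_average(k_per_ab, hr_per_batted_ball, hits_per_bip):
    # Share of at bats that produce a batted ball (in play or over the fence)
    batted = 1 - k_per_ab
    return batted * (hr_per_batted_ball +
                     (1 - hr_per_batted_ball) * hits_per_bip)

# e.g., a 20% strikeout rate, 5% homers per batted ball, and a .300 BABIP
print(round(batting_average(0.20, 0.05, 0.300), 3))   # about .268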

How much data do you need to do a good estimation? Using Bayesian methods analogous to Stein’s estimator with 2005 data, Lawrence Brown (2008) demonstrated that batting averages over the first half of the season do a poor job of predicting BA over the second half without some “shrinkage” toward the average of every batter’s first half performance. Using one month is useless, for the same reason. In contrast, BA for five months does a good job of predicting September performance. Neal, Tan, Hao, and Wu (2010) examined several models for predicting batting and on-base averages for the second half of 2005, with the following conclusions: not surprisingly, a larger sample size (all of 2004 plus the first half of 2005) leads to more accurate predictions than a smaller sample size (only the first half of 2005), and as the number of at bats on which the predictions are based affects prediction accuracy, including this number in predictive models improves accuracy.

A thoughtful take on the prediction question for batting averages was provided by Burnson (2003). He reasoned that BA is a product of three components: the percentage of at bats in which the batter hits a fair ball, the ability to get extra bases on hits, and speed. Burnson then estimated regression equations for each league predicting batting average from indices for the following three measures: “contact percentage,” a weighted ratio of extra base hits per at bat, and James’s speed score index. This is a weird mix, as the three are on wildly different scales, but their combination did a better job of predicting BA for the second half of 2002 than first-half 2002 BA did for 79% of the batters in his sample (the criteria for inclusion were not stated), and equations based on full years (2000 and 2001) were more accurate predictors for 55% of batters the next year (2001 and 2002, respectively).
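
The following is only a structural sketch of the kind of regression Burnson describes; the player rows and any coefficients that come out of them are made up for illustration, not his.

import numpy as np

X = np.array([   # contact%, extra-base-hit ratio, speed score (hypothetical rows)
    [0.82, 0.085, 5.1],
    [0.76, 0.110, 3.9],
    [0.88, 0.060, 6.2],
    [0.79, 0.095, 4.4],
    [0.84, 0.075, 5.6],
    [0.81, 0.090, 4.8],
])
y = np.array([0.285, 0.262, 0.301, 0.270, 0.290, 0.277])   # batting averages

design = np.column_stack([np.ones(len(X)), X])      # add an intercept column
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)  # ordinary least squares fit
predicted = design @ coefs                          # fitted batting averages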

In 2010, Bill James came up with a quick and dirty method for predicting whether a batter with at least 400 PAs will, based on one year’s performance, hit better the next year. Without going into detail, it is based on:

1 – The tendency for performance to regress toward the mean, through comparing the first year’s OPS and runs scored plus RBI to career indices;
2 – A version of batting average on balls in play, given that unusually low/high BABIPs are more often than not products of sustained bad/good luck on batted balls unlikely to repeat the next season;
3 – Walk/strikeout ratios compared to OPS, because when a good figure for the first index is matched with a bad number for the latter, it is also likely bad-luck driven, and analogously for bad BB/K ratios linked with good OPS;
4 – Player age, as younger hitters tend to get better and older hitters get worse; and
5 – Speed score (based on triples, stolen bases, and ability to stay out of double plays), because fast players tend to age well.

Bill displayed evidence that his index predicts fairly accurately, which is not surprising given his judicious choice of criteria.

I cannot leave this section without mentioning a paper by Griffith Chung (n.d.). Based on a sample of 155 players whose rookie years were between 1983 and 1987, Griffith demonstrated that the standard deviations for how much each player’s rookie year batting average, slugging average, and linear weights estimates differed from their analogous third-, fifth-, and seventh-year performance were so large compared with the means of these differences that rookie data cannot be used to predict later performance. Griffith was in 8th grade at the time, using statistical methods that I did not learn until my junior year in college. Unfortunately, I have seen no evidence that he has maintained his interest in sabermetric research.

Projection Systems

Some projection systems have been proposed that are worthy of note. Bill James presented a series of such methods that use a position player’s performance during the previous three or four seasons and project the rest of the player’s career. The first, Brock2, is described in Bill’s 1985 Abstract, first in general (pages 16-20) and then in enough detail for readers to recreate it (pages 301-305). With regular players, the system first uses the player’s age, position, and league average batting performance to determine whether the player is still above replacement level. If yes, it continues projecting as before, with improving performance up to age 27 and reductions starting afterward. If the player is projected as below replacement level, then it begins reducing playing time, leading to the end of his career in a couple of years. Bill’s examples reveal the system to be not particularly accurate for younger players (not surprising, given all the influences that can turn out to affect a young player’s career trajectory) and more successful for older players. Bill unveiled projections using later versions (Brock4, Brock6) in subsequent years (see Bill’s 1987a essay, in which he discussed considerations relevant to its upgrading).

Unfortunately, the Brock methods don’t seem to work very well. Based on all complete careers between 1911 and 2000 which encompassed at least six seasons and 200 at bats, Jarvis (2002) examined Brock2 and found that not only did it underpredict career achievements, it was less accurate at projecting annual performance data than using a given player’s data from the previous season. When revising the program, perhaps Bill overcompensated, because after studying projections for 200 players between 1950 and 1980, Dallas Adams concluded that Brock4 overpredicted most career indices by twenty to forty percent. Not surprisingly, as there was more past data to work with, projections for more experienced players were more accurate than those for the less experienced.

Nate Silver’s (Silver, 2003, 2006b) Player Empirical Comparison and Optimization Test Algorithm (always referred to by its acronym PECOTA) is a projection method used by the Prospectus group, and its details are proprietary and thus not public. Given Silver’s later work with his fivethirtyeight.org, my guess is that it is a simulation model run through a large number of iterations in order to provide a distribution of possible performances for the next year for a given player. It seems to be more accurate than the Brock sequence, although how much so is unknown. It projects a player’s past performance on to future performance as follows:

First, the establishment of an expected baseline performance trajectory based on a three-year weighted average (both major and minor league, normalized for ballpark and league effects), using indices relevant to power, contact, batting eye, and speed, along with player position, age, height, weight, and usage. That baseline includes subtle interactions among components that are ignored by other methods, including those between batting average and absence of strikeouts, between doubles and homers, between walks and isolated power, and between stolen bases and singles. This allows, for example, a role for doubles in predicting future home run totals by young players.

Second, inspired by Bill James’s “Similarity Scores” (see The Overall Evaluation chapter for this), the development of a set of similar players based on a wide selection of components, some based on production (for position players, he listed isolated power, batting average, walk and strikeout rates; for pitchers, the three basic DIPS measures), some on usage (career length, annual plate appearances, batters faced for pitchers, “and so on”), and “phenotypic attributes” (handedness, height and weight) along with playing position. For the latter, the similar players need not field the same

Page 58: Chapter Five – Offensive Evaluation · Web viewChapter Five – Offensive Evaluation The objective of this chapter is to describe methods that have been proposed for measuring and

58

position as the one under examination, but if not, they tend to be from “similar” positions (his example, shortstops will be compared to a second basemen before left fielders).

Third, calculate how well the set of similars performed in the year of interest relative to their predicted baseline, and then use that relationship to predict how the player under examination would perform relative to his baseline.
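
Since PECOTA's internals are proprietary, the following is only a toy sketch of the three-step logic just described (weighted baseline, nearest comparables, adjustment by the comparables' actual-to-baseline ratio); the attribute set, weights, and numbers are all invented for illustration and are not Silver's.

import numpy as np

def baseline(last3_rates, weights=(3, 4, 5)):
    # Weighted three-year baseline rate; the most recent season (last element) is weighted heaviest.
    w = np.array(weights, dtype=float)
    return float(np.dot(last3_rates, w) / w.sum())

def project_from_similars(player_attrs, player_baseline, history, k=3):
    # history: list of (attrs, baseline_that_year, actual_that_year) for past players.
    # Find the k nearest comparables by attribute distance and scale the player's
    # baseline by how those comparables actually performed relative to theirs.
    dists = [np.linalg.norm(np.asarray(player_attrs) - np.asarray(a)) for a, _, _ in history]
    nearest = np.argsort(dists)[:k]
    ratios = [history[i][2] / history[i][1] for i in nearest]
    return player_baseline * float(np.mean(ratios))

# Invented attributes (age, isolated power, walk rate) and wOBA-like rates.
player_attrs = (26, 0.180, 0.09)
player_base = baseline([0.330, 0.345, 0.350])
history = [
    ((27, 0.175, 0.10), 0.340, 0.348),
    ((25, 0.190, 0.08), 0.335, 0.330),
    ((26, 0.160, 0.09), 0.320, 0.325),
    ((31, 0.120, 0.07), 0.310, 0.295),
]
print(round(project_from_similars(player_attrs, player_base, history), 3))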

It also estimates an expected variance in performance in a given year, which allowed Silver to provide estimates of the odds of performance 20 percent above (Breakout) or below (Collapse) the projection. PECOTA has led to the following insights: players combining multiple skills (pairings such as speed and power or good batting eye and batting average) tend to maintain performance longer than those with only one of these skills; players manning more difficult positions have relatively shorter careers, particularly second basemen, probably due to greater risk of injury; shorter hitters have more trouble developing power whereas taller players have more difficulty with plate discipline; and high strikeout rates early on signal worse player development. There is also a version of PECOTA for pitchers that has received little attention; all that Silver mentioned is that future performance for pitchers with high walk rates is harder to predict than that for control artists. PECOTA has continued to be adjusted based on new insights since its creation, and although the claims of the creators of this and competing systems cannot be trusted, it does seem to be a bit more accurate than many of its competitors.

MARCEL is Tom Tango’s system, which is also based on a weighted average of the previous three years of performance, but includes regression to the mean to the extent that past data are scarce. Tom’s system works as follows (taken from http://www.tangotiger.net/archives/stud0346.shtml):1 – To project a given season, use data from the previous three seasons with relative weights of 5, 4, and 3 and weighted by number of plate appearances, an attempted control for random variation as that decreases with more PA.2 – Add this to the average figures for the majors (minus pitchers’ contributions to those averages) over these seasons, also weighted 5, 4, and 3 and normalized to 1200 PA.3 – Determine projected plate appearances based on the following formula:

(Previous season PA * .5) + (Season before PA * .1) + 2004 – Adjust the results of step 2 in terms of the projected PA.5 – Perform an age adjustment on those results to control for career trajectories. To quote from the website: “If over 29, AgeAdj = (age - 29) * .003. If under 29, AgeAdj = (age - 29) * .006.”6 – Perform one more adjustment, against the previous season’s averages.Kudos for Tom for making the details public, in contrast with the proprietary nature of its competitors.
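
Because Tango published the steps, a compact sketch is possible. The code below follows steps 1, 2, 3, and 5 as given above (steps 4 and 6 are omitted), and it applies the age adjustment in the conventional direction (younger players improve, older players decline), which is my reading of the quoted formulas rather than something spelled out in the excerpt.

def marcel_rate(player_pa, player_rate, league_rate, age):
    # Toy MARCEL-style projection of one rate stat (e.g., HR per PA).
    # player_pa, player_rate, league_rate: last three seasons, most recent first.
    weights = (5, 4, 3)

    # Step 1: weight each season by both the 5/4/3 factor and its plate appearances.
    num = sum(w * pa * r for w, pa, r in zip(weights, player_pa, player_rate))
    den = sum(w * pa for w, pa in zip(weights, player_pa))

    # Step 2: add 1200 PA worth of the 5/4/3-weighted league average.
    lg = sum(w * r for w, r in zip(weights, league_rate)) / sum(weights)
    rate = (num + 1200 * lg) / (den + 1200)

    # Step 3: projected plate appearances.
    proj_pa = 0.5 * player_pa[0] + 0.1 * player_pa[1] + 200

    # Step 5: age adjustment, applied multiplicatively; the direction (improvement
    # for under-29, decline for over-29) is an interpretation on my part.
    if age > 29:
        rate *= 1 - (age - 29) * 0.003
    elif age < 29:
        rate *= 1 + (29 - age) * 0.006
    return rate, proj_pa

# Example: a 26-year-old's home runs per plate appearance.
rate, pa = marcel_rate(player_pa=[620, 580, 300],
                       player_rate=[0.045, 0.040, 0.038],
                       league_rate=[0.031, 0.030, 0.029],
                       age=26)
print(f"projected HR/PA: {rate:.3f} over {pa:.0f} PA, about {rate * pa:.0f} HR")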

Getting to those competitors, ZiPS is an acronym for the sZymborski Projection System, named after its creator Dan Szymborski and applied at the Baseball Think Factory. It uses the past four seasons for players between the ages of 24 and 38, with relative weights of 8, 5, 4, and 3, and the previous three seasons for younger and older players (where performance trajectories are steeper). Even with this, projections are weighted by player type to control for the trajectory differences between, for example, speed versus power players and power versus finesse pitchers. Steamer was created as a school project in the fall of 2008 by high school teacher Jared Cross and two of his students, Dash Davidson and Peter Rosenbloom. Rather than fixed weights for past seasons, Steamer uses regression analysis of past data to set the relative weight of each. There are a ton of others: Vladimir, CAIRO (described as an adjustment of MARCEL), KATOH, OLIVER (used in the Hardball Times series), and one developed by Baseball Info Solutions and used in the annual Bill James Handbook. Daniel Calzada's (2018) DeepBall is a more recent addition, using a neural network formalism that includes ballpark effects and estimates of missing data based on particular players' ratios of grounders/flies/popups/liners; the author claimed that it outperformed MARCEL, the only competitor public enough for him to test against.

Minors to Majors Projections

Another myth that needs to be busted is the idea that there is no relationship between minor league and major league batting performance. In fact, one can predict major league performance in one year from minor league performance in the previous year as well as one can predict the former from major league performance in the previous year. Bill James (1985, pages 5-12) once again was responsible for this insight, which at that time he considered the most important finding of his career. His method for "translating" a hitter's AAA performance in a given year into major league performance in the same year, or Major League Equivalency (MLE), works as follows:
1 – Take the sum of runs scored and given up by the hitter's minor league team and the same sum for the major league team the hitter would be playing with, and divide the former by the latter. This gives you a measure of the relative run-scoring environments of the two teams without considering the inherent difference between AAA and the majors.
2 – Multiply that ratio by .82, which Bill proposed as that inherent difference. He called the product of this multiplication the "m factor."
3 – Multiply the player's runs scored and RBI by the m factor.
4 – Adjust the number of hits, doubles, triples, homers, walks, and strikeouts through specific formulas for each (see the original essay for them) related in various ways to either the m factor or its square (the "M factor").
5 – Adjust the latter results for specific ballpark effects (also listed in the essay).
The first three steps are sketched below.
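
The first three steps translate directly into code; the sketch below uses illustrative team run totals, and steps 4 and 5 (the hit-type and park formulas) are left to the original essay.

def mle_m_factor(minor_rs, minor_ra, major_rs, major_ra, league_gap=0.82):
    # Steps 1 and 2: ratio of total runs (scored plus allowed) in the two
    # environments, discounted by the inherent AAA-to-majors gap of .82.
    return (minor_rs + minor_ra) / (major_rs + major_ra) * league_gap

def translate_r_and_rbi(runs, rbi, m):
    # Step 3: scale the hitter's runs scored and RBI by the m factor.
    return round(runs * m), round(rbi * m)

# Illustrative numbers: a hitter on a high-scoring AAA club, translated to a
# lower-scoring major league team's environment.
m = mle_m_factor(minor_rs=780, minor_ra=750, major_rs=700, major_ra=690)
print(f"m factor: {m:.3f}; translated (R, RBI): {translate_r_and_rbi(95, 88, m)}")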

The Baseball Prospectus people, particularly Clay Davenport, have always put a lot of effort into this issue, based on Equivalent Average. Clay (2002a) included a short piece in the BP annual summarizing the overall decline in offensive performance for jumping one level (short-season low A to middle A to high A to AA to AAA to majors) between seasons. Overall, the decline is less for younger players than for older, probably because their skill level is increasing at a quicker rate and thus offsetting the better caliber of play in the higher level. Ignoring the details, hitters below age 25 maintain more than 90% of their performance as they make these jumps. There is one exception, however, as the AAA to majors jump decreases performance by more than 10%, belying the claim that the AA to AAA leap is the toughest.

Dan Levitt (n.d.) explained his own system in an unpublished essay. His goal was to predict player performance in the average MLB ballpark at age 25. You start with player performance and then weight it by the player’s age, level of league, and specific ballpark. Level of league numbers are as follows:

AAA     .78
AA      .75
High A  .65
Low A   .60

Note the big jump from high A to AA and how the further jump to AAA is so small. Age adjustments are as follows:

18  1.56
19  1.43
20  1.33
21  1.24
22  1.17
23  1.10
24  1.05
25  1.00
26   .96
27   .93
28   .90
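
Combining the two tables above is straightforward if one assumes, as the sketch below does, that the level and age factors simply multiply a rate stat; the park adjustment and any other details of Levitt's method are omitted.

LEVEL_FACTOR = {"AAA": 0.78, "AA": 0.75, "High A": 0.65, "Low A": 0.60}
AGE_FACTOR = {18: 1.56, 19: 1.43, 20: 1.33, 21: 1.24, 22: 1.17, 23: 1.10,
              24: 1.05, 25: 1.00, 26: 0.96, 27: 0.93, 28: 0.90}

def project_to_age_25_mlb(rate_stat, level, age):
    # Scale a minor league rate stat by the level factor and the age factor.
    # Treating the two factors as multiplicative is an assumption; Levitt's essay
    # describes the weighting but the exact arithmetic is not reproduced here.
    return rate_stat * LEVEL_FACTOR[level] * AGE_FACTOR[age]

# A 21-year-old posting a .900 OPS in AA projects, very roughly, to:
print(round(project_to_age_25_mlb(0.900, "AA", 21), 3))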

These are of course averages and don't work for a lot of players.

Academically and far less usefully, Henry and Hulin (1987) noted that the (unfortunately terribly flawed) runs produced measure in players' last minor league season correlated .3 with that in their first major league season, .2 with that in their second, and less than that afterward. In addition, using what seems to be a sample of 94 position players with 10 years of experience at the end of 1984, they examined every paired comparison of years (years 1 and 2, 2 and 3, 1 and 3, etc.) up to years 1 and 9. They discerned correlations in the .5 range between adjacent seasons. These correlations decreased slowly but steadily as time intervals increased, down to about .3 for seasons five years apart and .2 for seasons seven years apart.

Evaluating Specific Skills

Plate Discipline


Russell Carleton (2007) performed a very thoughtful study concerning the concept of plate discipline, which I can only describe in brief. We often measure discipline by looking at the ratio of walks to strikeouts, but this ratio conflates two different capabilities: the ability to recognize which pitches to swing at and which to take, and the ability to put a ball in play (or homer, which to simplify Carleton's argument I will include in that category) given the decision to swing. Carleton attempted to get at these abilities using what data was available: Retrosheet data from 1993 through 1998 for every player season with more than 100 plate appearances (2426 in all), allowing him to distinguish balls, called and swinging strikes, foul balls, and balls hit into play. Russell borrowed two concepts from signal detection theory. The first is "sensitivity," the ability to make few errors in one's decisions. In baseball, sensitivity is represented by the ability to recognize strikes one should swing at versus balls one should take. In detail, it is the "hit rate" (pitches that should have been swung at, measured as the proportion of pitches swung at that were hit into fair territory) minus the "false alarm rate" (pitches that should have been taken, measured by the number of swinging strikes divided by swinging strikes plus balls). The larger the number, the better the batter is at deciding when swinging is versus is not a good idea. Russell recognized that the definition for false alarm rate will misclassify called strikes purposely taken by the batter in search of a more hittable pitch. The second concept, "response bias," is represented by the bias toward swinging versus not. It consisted of the proportion of pitches that should have been swung at that were hit (versus swung at and missed) paired with the proportion of pitches that should have been taken that were (versus called strikes). The notion here is to measure how often batters swing in the first place. Players could be very high in this measure (swing too often) or very low (not swing enough). See the article for details, including how Carleton handled foul balls.
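
A rough rendering of the two measures from Retrosheet-style pitch counts, with foul-ball handling and the exact response-bias formula simplified relative to the original article, looks something like this:

def plate_discipline(balls, called_strikes, swinging_strikes, fouls, in_play):
    # Sensitivity: hit rate (swings that put the ball into fair territory) minus
    # false alarm rate (swinging strikes relative to swinging strikes plus balls).
    swings = swinging_strikes + fouls + in_play
    hit_rate = in_play / swings
    false_alarm = swinging_strikes / (swinging_strikes + balls)
    sensitivity = hit_rate - false_alarm

    # The two proportions Carleton pairs for response bias; the way he combines
    # them (and his treatment of fouls) is simplified away here.
    swing_success = in_play / (in_play + swinging_strikes)
    take_success = balls / (balls + called_strikes)
    return sensitivity, (swing_success, take_success)

# Illustrative single-season pitch counts for one batter.
print(plate_discipline(balls=900, called_strikes=400, swinging_strikes=180,
                       fouls=420, in_play=430))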

These two measures had a very small statistical relationship in the data and so measured different things. Both were also consistent over time for players (intraclass correlations of .72 for sensitivity and .81 for response bias), implying they are real skills. Both correlated about .5 with strikeout/walk ratio, again implying two differing but significant skills, and sensitivity correlated .22 with age, meaning that players improve their judgment with experience. Carleton listed some players that were very high and very low in both. Vladimir Guerrero was an interesting case, as he was the most sensitive (he made contact when he swung more than others) but had the worst response bias in the direction of swinging too often. Scott Hatteberg had the worst response bias in terms of not swinging enough. Finally, Carleton examined how his measures predicted strikeout and walk rates in stepwise multiple regression equations. In order of importance, strikeout rate was decreased by contact rate, "good decision rate" (the ratio of pitches that were either taken or put into play), and, surprisingly, swing percentage, and, again surprisingly, was increased by two-strike fouls (apparently giving the pitcher another chance to strike the batter out). Again in order of importance, walk rate was decreased by swing percentage and contact rate and increased by good decision rate and two-strike fouls.

Retrosheet does not have data on pitch location, which limited Carleton's options in this regard. The real measure we would want of sensitivity would compare pitches in the strike zone that were swung at versus taken for strikes with pitches outside of the strike zone that were taken for balls versus swung at. At the time Russell did his work, PITCHf/x was just starting; now we can do the type of study we (Russell included, I'm sure) would really want. An early attempt was Dan Fox's (2007) Fish and Eye rates. To Fish is to swing at a pitch outside of the actual strike zone, with an average of 32 or 33 percent, whereas an Eye is a pitch taken in the strike zone (excepting 3-0 often-automatic-take counts), with an average of 25 to 27 percent. Dan also proposed Square (percentage of true strikes swung at and made contact with, averaging about 87%) and Bad Ball (percentage of true balls made contact with, averaging about 73%). Fouls are included, although Dan noted that there is an argument that they should not be. In any case, Fish and Eye plus their reciprocals allow for a standard signal detection formula giving us a true plate discipline figure.
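
One way to turn Fish and Eye into a single discipline number is the classic d-prime from signal detection theory; this is my reading of what a "standard signal detection formula" might look like here, not Fox's published computation.

from statistics import NormalDist

def discipline_d_prime(fish_rate, eye_rate):
    # Treat swinging at an in-zone pitch as a hit (1 minus the Eye rate) and
    # swinging at an out-of-zone pitch as a false alarm (the Fish rate), then
    # take the difference of the corresponding normal quantiles.
    z = NormalDist().inv_cdf
    return z(1.0 - eye_rate) - z(fish_rate)

# League-average-ish inputs from the text: Fish around .32, Eye around .26.
print(round(discipline_d_prime(0.32, 0.26), 2))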

Panas (2010) described work by Appleman which uncovered a consistent relationship among batting average, swinging at pitches in the strike zone, not swinging at pitches outside of the strike zone, the percentage of swings on which contact was made, and, not surprisingly given all of this, walk/strikeout ratio.

Home Runs

Fellingham and Fisher (2018) built on a model proposed by Scott Berry and associates (Berry, Reese & Larkey, 1999a), which will be discussed in the Offensive Issues chapter, to predict home run production. They used the data for every player with 50 or more at bats in at least six seasons between 1871 and 2008 to build their model and 22 players from the years 2009 to 2016 to test it. After correcting for home park and seasonal tendencies, the batters were (if I understand correctly) grouped into 55 clusters depending on the details of their career trajectories. They ended up with two different models, one placing 90.1% and the other 80.2% of the 131 total seasons across those 22 players within a 95% confidence interval.

Runs Batted In

As a measure, runs batted in represents a critically important skill (after all, runs win games), but evaluating a player's ability through RBI runs into serious bias due to markedly different opportunities, based on the number of base runners characteristically on base when players bat. This has led to attempts to measure and properly weight those differences. Cy Morong (2002; a corrected version of the article was later published online) computed the following regression equation for predicting runs batted in per at bat, using data for all players with at least 6000 plate appearances between 1987 and 2001:


RBI per at bat = .187 × (RBI opportunities per at bat, counting 1 for the batter plus 1 for each base runner, for a maximum of 5) + .196 × (batting average) + .468 × (isolated power) - .303

This is a very powerful equation, accounting for 97.4 percent of the variance. Note that isolated power is easily the most important factor in RBI per at bat. Also notice the importance of RBI opportunity, about equivalent to BA; the difference in opportunities between a cleanup hitter and a leadoff man can be as great as 150 in a season. With the most proficient players approximating .2 RBI per at bat, this could add up to a difference of 25 or 30 RBI in a season, evidence that batting order position is a significant factor in the opportunity for and thus total number of runs batted in.
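
Applying the equation makes the opportunity effect concrete. In the sketch below the coefficients come from Morong's equation, while the opportunity, batting average, and isolated power inputs are merely illustrative.

def predicted_rbi_per_ab(opps_per_ab, batting_avg, isolated_power):
    # Morong's regression for RBI per at bat, coefficients as given in the text.
    return (0.187 * opps_per_ab + 0.196 * batting_avg
            + 0.468 * isolated_power - 0.303)

# A cleanup hitter sees more runners (plus himself) per at bat than a leadoff
# man with the same batting average and isolated power.
for label, opps in (("cleanup", 1.95), ("leadoff", 1.70)):
    rate = predicted_rbi_per_ab(opps, batting_avg=0.270, isolated_power=0.200)
    print(f"{label}: {rate:.3f} RBI/AB, about {rate * 600:.0f} RBI over 600 AB")

With these inputs the gap between the two hitters comes out to roughly 28 RBI over 600 at bats, in line with the 25-to-30 figure above.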

Tom Hanrahan (2014) offered a simpler fix for the bias problem by dividing RBI by "effective outs": the number of outs a batter makes multiplied by the number of base runners on during his at bats. The 2013 National League average for the result of this division was .090; the league leader was Freddie Freeman (.176), with league RBI leader Paul Goldschmidt right behind (.169).

Strikeouts

In the Inning and Game chapter I cited work that reveals that strikeouts are more costly than other types of outs, and in this chapter I mentioned that Bill James gave strikeouts a specific and negative role in his 1998 version of Runs Created. Dick O'Brien (1983) demonstrated an implication of this fact. He examined annual data from 1970 to 1982 for all batters with 20 or more home runs, and found that the mean ratio of strikeouts to home runs was lower for the 137 in the sample with 100 or more RBIs (exactly 3) than for the 289 with fewer than 100 RBIs (3.64), with a small (28) subsample of the latter with fewer than 70 RBIs even worse (3.95). This method is problematic as it is confounded with RBI opportunity, which Dick realized in time to perform a much better analysis in 1986 based on 1984-1985 data from Elias. This time he compared strikeouts per at bat with the percentage of batters above the 28 percent mean for percentage of runners driven in from scoring position. A table displaying the relevant data for each strikeout/at bat hundredth unit from .05 to .22 (the range with sample sizes of 9 or greater for each unit) reveals an obvious correlation, which in an approximate test (one would need the actual data for each batter to do it correctly) I calculated at a very strong .77. In other words, the lower the strikeout rate, the better the hitter was at driving in runners from scoring position. In addition, perhaps it was a fluke based on small sample sizes, but there was a very clear breakpoint between the batters at or better than the average SO/AB ratio of .16 and those above it. Every one of the hundredth units at .16 or better had at least 56 percent of its batters above that mean, whereas every one of the hundredth units at .17 or worse had at most 48 percent, with the figures on either side of the .16/.17 breakpoint starkly different (65 percent versus 47 percent, with sample sizes of 45 and 34 respectively).


Rankings

There have been a lot of attempts specifically designed to rank-order batters, but I will limit my discussion to just one. One of the more publicized, and, I might add, more intelligent was Michael Schell's (1999b, 2005). In his first book, Schell attempted to estimate a context-free ordering of batting average through four transformations:
1 – Wanting to ignore the inevitable late-career drop-off that is bound to occur to everyone who plays too long, Schell limited the analysis to the first 8000 at bats of a career. A few players (e.g., Roberto Clemente) with BAs that did not dip in their later years are shortchanged here.
2 – An adjustment for the significant historic changes in batting average, to equalize the hit-happy 1920s with the hit-scarce 1960s, a topic covered in depth in the Offensive Issues chapter. In short, BAs are normalized against a batting average of .255 in most instances. As overall BAs with a designated hitter are higher than with pitchers batting, the figure was raised to .263 for the American League since 1973.
3 – An adjustment for overall league talent. Another issue discussed in detail in the Offensive Issues chapter: there is very good reason to believe that the talent base from which the major leagues chose during the 1800s and early 1900s was considerably smaller than at present, resulting in a replacement level significantly lower than it is now. As they were playing against talent that was overall worse, the better players had the opportunity to shine more than at present.
4 – An adjustment for home ballpark.
I shall skip the statistical details of the adjustments, which I found generally defensible if not always ideal (as Schell himself would be the first to admit). The final list seems intuitively valid. Modern stars Tony Gwynn (rising from #16 in raw BA to the very top spot), Rod Carew (#28 to #3), and Wade Boggs (#23 to #9) get their deserved rewards; Harry Heilmann (#10 to #47) and Bill Terry (#13 to #31) get cut down to size. (I am curious where Ichiro would stand.)

Schell also gave an adjusted ranking for OBA, with Ted Williams ending up king. In his second book, with improved adjustment procedures, he redid these two lists and supplied additional ones for doubles (Stan Musial was #1), triples (Lance Johnson), home runs (the Babe, by a wide margin), runs scored (Rickey Henderson), RBI (the Babe), walks (Max Bishop!), strikeouts (Gwynn), steals (Henderson), SA (the Babe), and overall offense (the Babe first, the Splendid Splinter second, and everyone else far behind).

And Finally…

Petersen, Stanley, and associates have published three papers (Petersen, Jung, & Stanley, 2008; Petersen, Jung, Yang, and Stanley, 2011; Petersen, Penner, & Stanley, 2011) demonstrating that the distribution across players of career totals in "statistics of longevity," such as career at bats, hits, home runs, and runs batted in, approximates a power curve, with about 10 times as many players with, for example, 40 career home runs as with 400, and 10 times as many with 4 as with 40.
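
Put as a formula (my paraphrase of the cited result), a tenfold drop in the number of players for every tenfold increase in the career total corresponds to a power law with an exponent of about one over those ranges:

N(x) \propto x^{-\alpha}, \qquad \frac{N(40)}{N(400)} \approx 10 \;\Longrightarrow\; \alpha \approx \frac{\log 10}{\log(400/40)} = 1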

References

Abelson, Robert (1985). A variance explanation paradox: When a little is a lot. Psychological Bulletin, Vol. 97 No. 1, pages 129-133.

Acharya, Robit A., Alexander J. Ahmed, Alexander N. D’Amour, Haibo Lu, Carl N. Morris, Bradley D. Oglevee, Andrew W. Peterson, and Robert N. Swift (2008). Improving major league baseball park factor estimates. Journal of Quantitative Analysis in Sports, Vol. 4 Issue 2, Article 4.

Adams, Dallas (1987b). Using a baseball simulator program to calculate batter runs created. Baseball Analyst, Vol. 31, pages 17-20.

Albert, Jim (2001). Using play-by-play baseball data to develop a better measure of batting performance. Retrieved from www.math.bgsu.edu/~albert/papers/rating_paper2

Albert, Jim (2005). A baseball hitter’s batting average: Does it represent ability or luck? Stats, vol. 42.

Albert, Jim (2006b). A breakdown of a batter’s plate appearance – four hitting rates. By the Numbers, Vol. 16 No. 1, pages 23-30.

Albert, Jim (2007). Hitting in the pinch. In Jim Albert and Ruud H. Koning (Eds.), Statistical thinking in sports (pages 111-134). Boca Raton, FL: Chapman & Hall/CRC

Albert, Jim (2016). Improved component predictions of batting and pitching measures. Journal of Quantitative Analysis in Sports, Vol. 12 Issue 2, pages 73-85.

Albert, Jim and Jay Bennett (2001). Curve Ball. New York: Copernicus Books.

Albert, Jim and Phil Birnbaum (2009). Comment on article by Jensen et al. Bayesian Analysis, Vol. 4 No. 4, pages 653-660.

Allen, Dave (2009). Run value by pitch location. Available at www.baseballanalysts.com/archives/2009/03/run_value_by_pi.php

Allen, Dave (2010). Where was that pitch? In Dave Studenmund (Prod.), Hardball Times Baseball Annual 2010 (pages 159-166). Skokie, IL: Acta Sports.

Anderson, Timothy R., and Gunter P. Sharp (1997). A new measure of baseball batters using DEA. Annals of Operations Research, Vol. 73, pages 141-155.

Andrecheck, Sky (n.d.). Benefit value: Evaluating a player's worth. Unpublished.

Austin, Tom (1998). Call me QEQA. In Don Malcolm, Brock J. Hanke, Ken Adams, and G. Jay Walker (Eds.), The Big Bad Baseball Annual (pp. 29-31). Indianapolis: Masters Press.

Baumer, Ben S. (2008). Why on-base percentage is a better indicator of future performance than batting average: An algebraic proof. Journal of Quantitative Analysis in Sports, Vol. 4 Issue 2, Article 3.

Baumer, Ben and Dana Draghicescu (2010). Mapping batter ability using spatial statistical techniques. Presented at the annual JSM American Statistical Association conference and available at https://www.semanticscholar.org/paper/Mapping-Batter-Ability-in-Baseball-Using-Spatial-Baumer-Draghicescu/ccbba766ed516f24404ea58347541aa0739709e0

Baumer, Benjamin and Andrew Zimbalist (2014). The sabermetric revolution: Assessing the growth of analytics in baseball. Philadelphia: University of Pennsylvania Press.

Beaudoin, David (2013). Various applications to a more realistic baseball simulator. Journal of Quantitative Analysis in Sports, Vol. 9 Issue 3, pages 271-283.

Begly, John P., Michael S. Guss, Theodore S. Wolfson, Siddharth A. Mahure, Andrew

Bennett, Jay M. (1993). Did Shoeless Joe Jackson throw the 1919 World Series? The American Statistician, Vol. 47 No. 4, pages 241-250.

Bennett, Jay M. and John A. Flueck (1983). An evaluation of major league baseball offensive performance models. The American Statistician, Vol. 37 No. 1, pages 76-82.

Bennett, Jay M. and John A. Flueck (1992). Player game percentage. Proceedings of the American Statistical Association Section on Statistics in Sports, pages 64-66.

Berry, Scott M. (1999b). How many will Big Mac and Sammy hit in ’99? Chance, Vol. 12 No. 2, pages 51-55.

Berry, Scott M. (2000b). A nasty question. Chance, Vol. 13 No. 3, pages 60-61.

Berry, Scott M. (2000c). Modeling offensive ability in baseball. Chance, Vol. 13 No. 4, pages 56-59.

Berry, Scott M. (2002b). A juiced analysis. Chance, Vol. 15 No. 4, pages 50-53.

Berry, Scott M. (2004a). The black cat unmasked. Chance, Vol. 17 No. 1, pages 53-56.

Berry, Scott M. (2006a). Budgets and baseball concave or convex: Winning the salary game. Chance, Vol. 19 No. 1, pages 57-60.

Birnbaum, Phil (1994). When does a player lose his job? By the Numbers, Vol. 6 No. 3, pages 6-10.

Birnbaum, Phil (1999a). Bias in run statistics. By the Numbers, Vol. 9 No. 2, pages 22-29.

Birnbaum, Phil (2000b). Run statistics don't work for games. By the Numbers, Vol. 10 No. 3, pages 16-19.

Birnbaum, Phil (2005a). Is OBP really worth three times as much as SLG? By the Numbers, Vol. 15 No. 2, pages 9-11.

Birnbaum, Phil (2005b). Phil Birnbaum responds [to Mark Pankin]. By the Numbers, Vol. 15 No. 4, page 14.

Blass, Asher A. (1992). Does the baseball labor market contradict the human capital model of investment? Review of Economics and Statistics, Vol. 74 No. 2, pages 261-268.

Blau, Clifford (1999b). Measuring the accuracy of runs formulas for players. By the Numbers, Vol. 9 No. 3, pages 31-33.


Blengino, Tony (May 5, 2015). Data: The Basics. Available at www.fangraphs.com/blogs/getting-the-most-out-of-batted-ball-data-part-1-the-basics

Bradbury, J. C. (2006). Giving players their PrOPS: A Platonic measure of hitting. In Aaron Gleeman and Dave Studenmund (Eds.), The Hardball Times Baseball Annual (pages 161-167). Skokie, IL: Acta Sports.

Bradbury, J. C., and David Gassko (2006). Do players control batted balls? In Aaron Gleeman and Dave Studenmund (Eds.), The Hardball Times Baseball Annual (pages 154-160). Skokie, IL: Acta Sports.

Brentnell, Adam R., Martin J. Crowder and David J. Hand (2011). Approximate repeated-measures shrinkage. Computational Statistics and Data Analysis, Vol. 55 No. 2, pages 1150-1159.

Brown, Lawrence D. (2008). In-season prediction of batting averages: A field test of empirical Bayes and Bayes methodologies. Annals of Applied Statistics, Vol. 2 No. 1, pages 113-152.

Burnson, John (2003). Expected batting average. In Ron Shandler (Ed.), Baseball Forecaster 2003 (pages 7-8). Roanoke: Shandler Enterprises.

Calzada, Daniel (2018). Deepball: Modeling expectation and uncertainty in baseball with recurrent neural networks. https://www.retrosheet.org/Research/Calzada/CALZADA-THESIS-2018.pdf

Carleton, Russell (2007). Is walk the opposite of strikeout? By the Numbers, Vol. 17 No. 1, pages 3-9.

Carleton, Russell (2016). Getting to the bottom of the barrel. https://www.baseballprospectus.com/news/article/30584/baseball-therapy-getting-to-the-bottom-of-the-barrel/

Carleton, Russell A. (2017). The Shift. Chicago, IL: Triumph Books.

Chung, Griffith (n.d.). The predictive value of baseball players' rookie batting statistics. Unpublished paper.

Codell, Barry (1979). The base-out percentage: Baseball's newest yardstick. Baseball Research Journal, No. 8, pages 35-39.


Codell, Barry F. (1992). DW: A way to truly weigh the diamond! By the Numbers, Vol. 5 No. 1, pages 4-6.

Coffin, Donald A. and Bruce W. Cowgill (2005). The relative value of OBA and SLG: Another view. By the Numbers, Vol. 15 No. 4, pages 17-20.

Cook, Earnshaw with Wendell R. Garner (1966). Percentage Baseball. Cambridge, MA: MIT Press.

Costa, Gabriel B., Michael R. Huber, and John T. Saccoman (2008). Understanding sabermetrics. Jefferson, NC: McFarland.

Cover, Thomas M. and Carroll W. Keilers (1977). An offensive earned-run average for baseball. Operations Research, Vol. 25 No. 5, pages 729-741; plus errata, Vol. 27 No. 1, page 207.

Cramer, Richard D. (1977). Do clutch hitters exist? Baseball Research Journal, No. 6, pages 74-78.

Cramer, Dick (1987). Why teams win and lose: Player par-run performance in 1985 & 1986. In Bill James, John Dewan, and Project Scoresheet (1987), The Great American Baseball Stat Book. 1st ed (pages 535-537). New York: Ballantine Books.

Cramer, Richard D. and Pete Palmer (1974). The batter’s run average (BRA). Baseball Research Journal, No. 3, pages 50-57.

Cross, Jared and Dana Sylvan (2018). Modeling spatial batting ability using a known covariance matrix. Journal of Quantitative Analysis in Sports. Vol 14 No. 3, pages 155-168.

D’Angelo, John P. (2010). Baseball and Markov chains: Power hitting and power series. Notices of the American Mathematical Society, Vol. 57 No. 4, pages 490-495.

Davenport, Clay (2002a). Doing it over. In Joe Sheehan (Ed.), Baseball Prospectus 2002 (pages 467-469). Dulles, VA: Brassey’s.

Deli, Daniel (2012). Assessing the relative importance of inputs to a production function: Getting on base versus hitting for power. Journal of Sports Economics, Vol. 14 No. 2, pages 203-217.

D’Esopo, Donato A. and Benjamin Lefkowitz (1977). The distribution of runs in the game of baseball. In Shaul P. Ladany and Robert E. Machol (Eds.), Optimal strategies in sports (pages 55-62). Amsterdam: North-Holland.

Efron, Bradley and Carl Morris (1975). Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association, Vol. 70 No. 350, pages 311-319.

Efron, Bradley and Carl Morris (1977, May). Stein’s paradox in statistics. Scientific American, pages 119-127.

Fast, Mike (2007). Glossary of the Gameday pitch fields. https://fastballs.wordpress.com/2007/08/02/glossary-of-the-gameday-pitch-fields/


Fast, Mike (2011a). Who controls how hard the ball is hit? https://www.baseballprospectus.com/news/article/15532/spinning-yarn-who-controls-how-hard-the-ball-is-hit/

Fast, Mike (2011b). How does quality of contact relate to BABIP? https://www.baseballprospectus.com/article/15562/spinning-yarn-how-does-quality-of-contact-relate-to-babip/

Felber, Bill (2005). The Book on the Book. New York: St. Martin's Press.

Fellingham, Gilbert W., and Jared D. Fisher (2018). Predicting home run production in major league baseball using a Bayesian semiparametric model. The American Statistician, Vol. 72 No. 3, pages 253-264.

Feng, Long and Lee H. Dicker (2018). Approximate nonparametric maximum likelihood for mixture models: A convex optimization approach to fitting arbitrary multivariate mixing distributions. Computational Statistics and Data Analysis, Vol. 122, pages 80-91.

Fox, Dan (2007). The return of the fish eye. https://www.baseballprospectus.com/news/article/6705/schrodingers-bat-the-return-of-the-fish-eye/

Furtado, Jim (1999a). Introducing XR. In Don Malcolm, Brock J. Henke, Ken Adams and G. Jay Walker, 1999 Big Bad Baseball Annual (pp. 479-484). Chicago: NTC Press.

Furtado, Jim (1999b). Why do we need another player evaluation method? In Don Malcolm, Brock J. Henke, Ken Adams and G. Jay Walker, 1999 Big Bad Baseball Annual (pp. 465-474). Chicago: NTC Press.

Furtado, Jim (1999c). Measuring accuracy in run estimation methods. In Don Malcolm, Brock J. Henke, Ken Adams and G. Jay Walker, 1999 Big Bad Baseball Annual (pp. 485-488). Chicago: NTC Press.

Gassko, David (2007b). Do players control batted balls? (Part Two). In David Studenmund (Ed.), 2007 Hardball Times Baseball Annual (pages 156-160). Skokie, IL: ACTA Sports.

Glickman, Mark E. (2009). Comment on article by Jensen et al. Bayesian Analysis, Vol. 4 No. 4, pages 661-664.

Gore, Ross and Cameron Snapp (2011). A major league baseball (MLB) Swing Quality Metric. MIT Sloan Sports Analytics Conference 2011.

Gu, Jiaying and Roger Koenker (2017). Empirical Bayesball remixed: Empirical Bayes methods for longitudinal data. Journal of Applied Econometrics, Vol. 32, pages 575-599.

Hakes, John K. and Raymond D. Sauer (2006). An economic evaluation of the Moneyball hypothesis. Journal of Economic Perspectives, Vol. 13, pages 173-185.

Hanke, Brock J. (1998). WAR redeclared. In Don Malcolm, Brock J, Hanke, Ken Adams, and G. Jay Walker (Eds.), The 1998 Big Bad Baseball Annual (pages 495-503). Indianapolis: Masters Press.


Hanrahan, Tom (2008a). How leadoff hitters are sabermetrically overrated. By the Numbers, Vol. 18 No. 4, pages 4-7.

Hanrahan, Tom (2014). Giving context to RBI. By the Numbers, Vol. 24 No. 1, pages 5-9.


Harder, Joseph W. (1991). Equity theory versus expectancy theory: The case of major league baseball free agents. Journal of Applied Psychology, Vol. 76 No. 3, pages 458-464.

Harder, Joseph W. (1992). Play for pay: Effects of inequity in a pay-for-performance context. Administrative Science Quarterly, Vol. 37 No. 2, pages 321-335.

Heacock, Rachel (2017). Applying asset pricing theory to MLB. https://www.fangraphs.com/tht/applying-asset-pricing-theory-to-mlb/

Heacock, Rachel (2018). What Wall Street can teach us about baseball players. 2018 Hardball Times. https://www.fangraphs.com/tht/tht-annual-2018/what-wall-street-can-teach-us-about-baseball-players/

Healey, Glenn (2017a). Learning, visualizing, and assessing a model for the intrinsic value of a batted ball. IEEE Access, Vol. 5, pages 13811-13822

Heeren, Dave and Pete Palmer (2011). Basic Ball. Haworth, NJ: St. Johann Press.

Heipp, Brandon (2001). A promising new run estimator – Base Runs. By the Numbers, Vol. 11 No. 3, pages 18-19.

Heipp, Brandon (2005). Finding implicit linear weights in run estimators. By the Numbers, Vol. 15 No. 4, pages 9-12.

Henry, Rebecca A. and Charles L. Hulin (1987). Stability of skilled performance across time: Some generalizations and limitations on utilities. Journal of Applied Psychology, Vol. 72 No. 3, pages 457-462.

Hitzges, Norm and Dave Lawson (1994). Essential baseball 1994. New York: Penguin.

Hoke, Travis (1935). The base in baseball. Esquire, Vol. 4 No. 4, pages 67 and 140.

James, Bill (1978). The 1978 Baseball Abstract. Lawrence, KS: Bill James.

James, Bill (1979). The 1979 Baseball Abstract. Lawrence, KS: Bill James.

James, Bill (1981). The 1981 Baseball Abstract. Lawrence, KS: Bill James.

James, Bill (1982). The 1982 Bill James Baseball Abstract. New York: Ballantine Books.

James, Bill (1983). The 1983 Bill James Baseball Abstract. New York: Ballantine Books.

James, Bill (1983a). {Untitled note}. Baseball Analyst, No. 6, page 20.

James, Bill (1985). The 1985 Bill James Baseball Abstract. New York: Ballantine Books.

James, Bill (1986). The 1986 Bill James Baseball Abstract. New York: Ballantine Books.

James, Bill (1986a). The Bill James Historical Baseball Abstract. New York: Villard Books.

James, Bill (1987a). Research in progress. Baseball Analyst, No. 33, pages 10-15.

James, Bill (2010). Strong seasons leading index. In Dave Studenmund (Producer), The Hardball Times Baseball Annual 2010 (pages 76-85). Skokie, IL: Acta Sports.

Jarvis, John F. (1998b). A survey of baseball player performance evaluation methods. Retrieved from http://knology.net/johnfjarvis/runs_survey.html


Jarvis, John F. (2002). Career summaries and projections. Presented at the 2002 SABR convention and retrieved from http://knology.net/johnfjarvis/cftn.html

Jensen, Peter (2009). Using HITf/x to measure skill. https://www.fangraphs.com/tht/using-hitf-x-to-measure-skill/

Jensen, Shane, Blakeley B. McShane and Abraham J. Wyner (2009a). Hierarchical Bayesian modeling of hitting performance in baseball. Bayesian Analysis, Vol. 4 No. 4, pages 631-652.

Jensen, Shane, Blakeley B. McShane and Abraham J. Wyner (2009b). Rejoinder. Bayesian Analysis, Vol. 4 No. 4, pages 669-674.

Jiang, Wenhua and Cun-Hui Zhang (2010). Empirical Bayes in-season prediction of baseball batting averages. In James O. Berger, T. Tony Cai, Iain Johnstone, and Lawrence D. Brown (Eds.), Borrowing strength: Theory powering applications (pages 263-273). Beachwood, OH: Institute of Mathematical Statistics

Johnson, Paul (1985). Estimated runs produced. In Bill James, The 1985 Bill James Baseball Abstract. New York: Ballantine Books.

Judge, Jonathan (2018a). The performance case for DRC+. https://www.baseballprospectus.com/news/article/45383/the-performance-case-for-drc/

Katsunori, Ano. (2001). Modified offensive earned-run average with steal effect for baseball. Applied Mathematics and Computation, Vol. 120, pages 279-288.

Koop, Gary (2002). Comparing the performance of baseball players: A multiple-output approach. Journal of the American Statistical Association, Vol. 97 No. 459, pages 710-720.

Lane, F. C. (1917, January). Why the system of batting averages should be reformed. Baseball Magazine, pages 52-60.

Lane, F. C. (1917, March). The base on balls. Baseball Magazine, pages 93-95.

Lanning, Jonathan A. (2010). Productivity, discrimination, and lost profits during baseball's integration. Journal of Economic History, Vol. 70 No. 4, pages 964-988.

Lanoue, M. R. and J. J. Mevetta Jr. (1993). An analytic hierarchy approach to major league baseball offensive performance ratings. Mathematical Computer Modelling, Vol. 17 Nos. 4 & 5, pages 195-209.

Lee, Young Hoon (2011). Is the small-ball strategy effective in winning games? A stochastic frontier production approach. Journal of Productivity Analysis, Vol. 35, pages 51-59.

Levitt, Dan (2003a). The predictive value of half-season statistics. By the Numbers, Vol. 13 No. 1, pages 5-6.

Levitt, Dan (2005). Beyond player wins: Calculating individual player pennants added. By the Numbers, Vol. 15 No. 1, pages 15-20

Levitt, Dan (n.d.). Projecting batting ability from minor league statistics: A complete methodology.

Lieff, Matthew E. (1998). Simplified method for run creation measurement. In Bill James, Don Zminda, and Project Scoresheet (Eds.), The 1998 Great American Baseball Stat Book (pages 615-618). New York: Villard Books.

Lindsey, G. R. (1959). Statistical data useful for the operation of a baseball team. Operations Research, Vol. 7, pages 197-207.

Lindsey, George R. (1963). An investigation of strategies in baseball. Operations Research, Vol. 11, pages 477-501.

Maher, C. (1977, November). Batting average: A true gauge of a hitter’s value? Baseball Digest, pages 85-91.

Mains, Rob (2018b). Comparing DRC+, OPS+, and wOBA+. https://www.baseballprospectus.com/news/article/45445/comparing-drc-ops-and-wrc/

Malcolm, Don (1999). The changing face of competition. In Don Malcolm, Brock J. Henke, Ken Adams and G. Jay Walker, 1999 Big Bad Baseball Annual (pp. 22-23). Chicago: NTC Press.

Martin, Ryan (2015). Asymptotically optimal nonparametric empirical Bayes via predictive recursion. Communications in Statistics: Theory and Methods, Vol. 44, pages 268-299.

McShane, Blakeley B., Alexander Braunstein, James Piette, and Shane T. Jensen (2011). A hierarchical Bayesian variable selection approach to major league baseball hitting metrics. Journal of Quantitative Analysis in Sports, Vol. 7 Issue 4, Article 2.

Mills, Eldon G., and Harlan D. Mills (1970). Player Win Averages. South Brunswick, NJ: A. S. Barnes.

Morong, Cyril (2002). RBIs, opportunities and power hitting. Baseball Research Journal, No. 31, pages 98-101, with corrections retrieved from http://cyrilmorong.com/WEB.htm

Munro, Neil (1984). Batters’ offensive wins and losses. Baseball Analyst, No. 14, pages 12-17.

Muralidharan, Omkar (2010). An empirical Bayes mixture method for effect size and false discovery rate. Annals of Applied Statistics, Vol. 4 No. 1, pages 422-438.

Neal, Dan, James Tan, Feng Hao, and Samuel S. Wu (2010). Simply better: Using regression models to estimate major league batting average. Journal of Quantitative Analysis in Sports, Vol. 6 Issue 3, Article 12.

O’Brien, Dick (1983). Power hitters strikeout/home run ratios. Baseball Analyst, No. 8, pages 16-17.

O’Brien, Dick (1986). Cloudland revisited. Baseball Analyst, No. 26, pages 2-5.

Palmer, Pete (1978). Home park effects on performance in the American League. Baseball Research Journal, No. 7, pages 50-60.

Palmer, Pete (1983a). On-base average. In L. Robert Davids (Ed.), Insider’s baseball (pages 210-214). New York: Scribner’s.

Palmer, Pete (1983c). Adjusted home park factors. Baseball Analyst, No. 6, pages 1-4.

Palmer, Pete (1983d). Distribution of runs. Baseball Analyst, No. 7, pages 19-20.


Palmer, Pete (2009). McCracken and Wang revisited. By the Numbers, Vol. 19 No. 1, pages 9-13.

Palmer, Pete (2017). Calculating skill and luck in major league baseball. Baseball Research Journal, Vol. 46 No. 1, pages 56-60.

Panas, Lee (2010). Beyond Batting Average. Self-published.

Pankin, Mark D. (1978). Evaluating offensive performance in baseball. Operations Research, Vol. 26 No. 4, pages 610-619.

Pankin, Mark (2004). Relative value of on-base pct. and slugging avg. Presented at the annual SABR convention and available at http://www.pankin.com/baseball.htm

Pankin, Mark (2005). More on OBP vs. SLG. By the Numbers, Vol. 15 No. 4, pages 13-15.

Pankin, Mark (2006). Additional on-base worth 3x additional slugging? Presented at the 2006 SABR convention and available at the Retrosheet research page.

Perpetua, Andrew (2017a). Beware of launch angle. https://www.fangraphs.com/fantasy/beware-of-launch-angle/

Perpetua, Andrew (2017b). Adjusting exit velocity for pitch speed and location. https://www.fangraphs.com/fantasy/adjusting-exit-velocity-for-pitch-speed-and-location/

Petersen, Alexander M., Woo-Sung Jung and H. Eugene Stanley (2008). On the distribution of career longevity and the evolution of home-run prowess in professional baseball. Europhysics Letters, Vol. 83, Article 50010.

Petersen, Alexander M., Woo-Sung Jung, Jae-Suk Yang, and H. Eugene Stanley (2011, January 4). Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proceedings of the National Academy of Sciences, Vol. 108 No. 2, pages 18-23.

Petersen, A. M., O. Penner and H. E. Stanley (2011). Methods for detrending success metrics to account for inflationary and deflationary factors. European Physical Journal B, Volume 79, pages 67-78.

Pudaite, Paul R. (1988). Player Win Averages: An extended book review. Baseball Analyst, No. 38, pages 2-7.

Quintana, Fernando A. and Peter Muller (2009). Comment on article by Jensen et al. Bayesian Analysis, Vol. 4 No. 4, pages 665-668.

Reuter, Jim (1982a). More on the “true” slugging percentage. Baseball Analyst, No. 3, pages 2-6.

Reuter, Jim (1982b). Thoughts on isolated power. Baseball Analyst, No. 4, pages 23-24.

Reuter, Jim (1983). Home park factors. Baseball Analyst, No. 5, page 6.

Rickey, Branch (1954, August 2). Goodby to some old baseball ideas. Life, pages 78-86 and 89.

Robinson, David H. (1987). The analysis of run potential. In Project Scoresheet, The Great American Baseball Statbook, first edition (pages 514-516). New York: Ballantine Books.

Robinson, David H. (1987a). Improving the Runs Created formula. Baseball Analyst, No. 3, pages 2-6.

Ruane, Tom (2005a). The Value Added approach to evaluating performance. Available at http://www.retrosheet.org/Research/RuaneT/valueadd_art.htm

Ruggiero, John (2010). Frontiers in major league baseball. New York: Springer.

Runquist, Willie (1995). Baseball By the Numbers. Jefferson, NC: McFarland.

Runquist, Willie (1999). Reliability of statistics. By the Numbers, Vol. 9 No. 4, pages 7-10.

Saavedra, Serguei, Scott Powers, Trent McCotter, Mason A. Porter, and Peter J. Mucha (2010). Mutually-antagonistic interactions in baseball networks. Physica A, Vol. 389, pages 1131-1141.

Scahill, Edward M. (1990). Did Babe Ruth have a comparative advantage as a pitcher? Journal of Economic Education, Vol. 21 No. 4, pages 403-410.

Schall, Teddy and Gary Smith (2000b). Do baseball players regress toward the mean? The American Statistician, Vol. 54 No. 4, pages 231-235.

Schell, Michael J. (1999b). Baseball’s All-Time Best Hitters. Princeton, NJ: Princeton University Press.

Schell, Michael J. (2005). Baseball’s All-Time Best Sluggers. Princeton, NJ: Princeton University Press.

Schutz, Robert W. (1995). The stability of individual performance in baseball: An examination of four 5-year periods. American Statistical Association, Proceedings of the Section on Statistics in Sports, pages 39-44.

Scully, Gerald W. (1974). Pay and performance in major league baseball. American Economic Review, Vol. 64 No. 6, pages 915-930.

Selter, Ron (2002). Batting and asymmetric ballparks: A study of NL ballparks 1927-1937. Presented at the 2002 SABR convention.

Sheehan, Joe P. (February 29, 2008). Locational run values. Available at www.baseballanalysts.com/archives/2008/02/lwts_by_locatio.php

Silver, Nate (2003). Introducing PECOTA. In Gary Huckabee, Chris Kahrl, and Dave Pease (Eds.), Baseball Prospectus 2003 (pages 507-514). Dulles, VA: Brassey’s.

Silver, Nate (2006b). Why was Kevin Maas a bust? In Jonah Keri (Ed.) Baseball Between the Numbers (pages 253-271). New York: Basic Books.

Silver, Nate (2006c). Is David Ortiz a clutch hitter? In Jonah Keri (Ed.) Baseball Between the Numbers (pages 14-34). New York: Basic Books.

Skoog, Gary R. (1987). Measuring runs created: The value added approach. In Bill James, 1987 Baseball Abstract pages 280-285. New York: Ballantine Books.

Smyth, David (1987). A new framework for assessing individual offensive performance. Baseball Analyst, No. 29, pages 6-8.


Sueyoshi, Toshiyuki, Kenji Ohnishi, and Youichi Kinase (1999). A benchmark approach for baseball evaluation. European Journal of Operational Research, Vol. 115 No. 3, pages 429-448.

Tango, Tom, Mitchel G. Lichtman and Andrew E. Dolphin (2006). The Book: Playing the Percentages in Baseball. TMA Press.

Tenbarge, Lawrence (1996). Earned-base average. Baseball Research Journal, No. 25, pages 133-135.

Thorn, John, and Pete Palmer (1984). The Hidden Game of Baseball. Garden City, NY: Doubleday.

Walker, G. Jay (1999a). Creating well-adjusted statistics. In Don Malcolm, Brock J. Henke, Ken Adams and G. Jay Walker, 1999 Big Bad Baseball Annual (pp. 488-493). Chicago: NTC Press.

Walker, G. Jay (1999b). Extrapolated average. In Don Malcolm, Brock J. Henke, Ken Adams and G. Jay Walker, 1999 Big Bad Baseball Annual (p. 493). Chicago: NTC Press.

Walker, G. Jay and Jim Furtado (1999). Extrapolated wins. In Don Malcolm, Brock J. Henke, Ken Adams and G. Jay Walker, 1999 Big Bad Baseball Annual (pp. 494-503). Chicago: NTC Press.

Wang, Victor (2006). The OBP/SLG ratio: What does history say? By the Numbers, Vol. 16 No. 3, pages 3-4.

Wang, Victor (2007). A closer look at the OBP/SLG ratio. By the Numbers, Vol. 17 No. 1, pages 10-14.

Weinstein, Asaf, Zhuang Ma, Lawrence D. Brown & Cun-Hui Zhang (2018).

Wittkowski, Knut M., Tingting Song, Kent Anderson, and John E. Daniels (2008). U-scores for multivariate data in sports. Journal of Quantitative Analysis in Sports, Vol. 4 Issue 3, Article 7.

Wolfersberger, Jesse and Matthew Yaspan (2015). Trying to quantify recency effect. In Dave Studenmund and Paul Swydan (Prods.), The 2015 Hardball Times Baseball Annual (pages 360-367). FanGraphs.

Wolverton, Michael (2002). The problem with “peak.” In Joseph S. Sheehan (Ed.), Baseball Prospectus (pages 470-475). Washington, DC: Brassey.

Woolner, Keith (2002). Understanding and measuring replacement level. In Joe Sheehan (Ed.), Baseball Prospectus 2002 (pages 455-466). Washington, DC: Brasseys.

Woolner, Keith (2005). An analytical framework for win expectancy. In Baseball Prospectus 2005 (pages 520-533). New York: Workman.

Woolner, Keith (2006). Adventures in win expectancy. In Steven Goldman (Ed.), Baseball Prospectus 2006 (pages 506-511). New York: Workman.

Woolner, Keith (2006c). Why is Mario Mendoza so important? In Jonah Keri (Ed.), Baseball Between the Numbers (pages 157-173). New York, NY: Basic Books.

Wyers, Colin (2009). The best run estimator. In Dave Studenmund (Producer), The Hardball Times Baseball Annual (pages 209-215). Skokie, IL: Acta Sports.


Xie, Xianchao, S. C. Kou, and Lawrence D. Brown (2012). SURE estimates for a heteroscedastic hierarchical model. Journal of the American Statistical Association, Vol. 107 No. 500, pages 1465-1479.