Baseball Between The Numbers

Posted on: December 9th, 2011 by

“Boy, Davis looks fast this year.”
“Yeah, but that speed’s overrated.”
“Well, he had an EqA of .320…”
“Sure, but his VORP was only a quarter of a win.”
“And at 29, with no power, his PECOTA has him falling off pretty quickly. Just like the Rockies to overpay for a guy past his prime.”

Sounds like a foreign language. Well, except for the part about the Rockies. Welcome to the world of Modern Baseball Analysis: the world of Baseball Between the Numbers, the world that tells you just how much better Babe Ruth was than Barry Bonds.

Started by Bill James in the late 1970s, and brought to the World Beyond Rotisserie by Moneyball, modern baseball analysis tries to provide a more accurate and insightful picture of the game than traditional statistics can. Traditional statistics are extremely context-dependent. A batter’s home ballpark, the defense behind a pitcher, who hits in front of a batter, or even the era in which he played can all skew a player’s numbers to the point where they tell us virtually nothing about his performance on the field. Modern baseball analysis tries to account for these variables, and to introduce new ways of measuring how a player contributes to (or detracts from) his team’s success.

For offense, the usual technique is to convert a player’s contributions into extra runs for his team: the number of runs he contributes beyond what a replacement player, the kind available for the league minimum off the waiver wire, would provide. Convert these runs into wins at a rate of roughly one win per 10 runs, and we get a better sense of how much the player is helping or hurting his team. Another technique looks at situational hitting: given the inning, the score, the runners on base, and the number of outs, it calculates how much each thing a player does increases (or decreases) his team’s chance of winning the game. A leadoff shot in the first obviously means less than a homer in the ninth, even though both happen with the score tied.
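Both techniques boil down to simple arithmetic once the inputs exist. Here is a minimal sketch, assuming the rough 10-runs-per-win rule of thumb from the paragraph above; the function names are hypothetical, and the win probabilities fed to the second function are assumed to come from a win-expectancy table (indexed by inning, score, base state, and outs) that is not shown here.

```python
RUNS_PER_WIN = 10.0  # rough rule of thumb; real analyses adjust this by era and park


def runs_to_wins(runs_above_replacement: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Convert runs contributed over a replacement-level player into wins."""
    return runs_above_replacement / runs_per_win


def win_probability_added(wp_before: float, wp_after: float) -> float:
    """Change in the batting team's chance of winning caused by one play.

    Both inputs are probabilities in [0, 1], looked up in an assumed
    win-expectancy table for the game state before and after the play.
    """
    if not (0.0 <= wp_before <= 1.0 and 0.0 <= wp_after <= 1.0):
        raise ValueError("win probabilities must lie in [0, 1]")
    return wp_after - wp_before


if __name__ == "__main__":
    # A hypothetical player worth 25 runs over replacement is worth about 2.5 wins.
    print(runs_to_wins(25))  # 2.5
```

The situational approach explains the leadoff-shot example: the same home run produces a much larger `wp_after - wp_before` in a tied ninth than in a tied first, because the table's probabilities swing harder late in the game.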

Offensive statistics like RBIs and runs scored are too dependent on the situation, and thus on teammates. After all, if the bases are empty, even the best player can drive in only one run. Still, there’s only one batter at a time, so offense can at least be credited player by player. Defense and pitching are much more intertwined, and that’s where the real action is right now. One of the best articles in the book attempts to isolate a pitcher’s performance. Beginning with the question of why pitchers seem so inconsistent from year to year, we find that the fault, dear Brutus, isn’t in our stars but in ourselves. Statistics that take the defense out of the picture turn out to be quite stable from year to year: measured only by home runs, strikeouts, and walks, pitchers are remarkably consistent. It’s usually the defense behind them that varies.
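One straightforward way to test a stability claim like this, offered as an assumption about method rather than the book’s exact procedure, is to correlate each pitcher’s rate in one season with the same rate the next season. An r near 1 means the stat carries over from year to year; an r near 0 means it is mostly noise or context. The helper names `pearson_r` and `year_to_year_r` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


def year_to_year_r(stat_by_pitcher: Dict[str, Tuple[float, float]]) -> float:
    """Correlate one stat across consecutive seasons.

    stat_by_pitcher maps a pitcher's name to (season_1_value, season_2_value),
    for example strikeouts per nine innings in two straight years.
    """
    pairs = list(stat_by_pitcher.values())
    return pearson_r([first for first, _ in pairs], [second for _, second in pairs])
```

Running `year_to_year_r` separately on strikeout, walk, and home-run rates, and then on a defense-dependent rate such as hits allowed on balls in play, is how one would check whether the first group really is the more stable one.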

Teasing out individual defensive performance is even harder. Why penalize a second baseman when he’s trying to turn a double play with Marv Throneberry at first? New stats such as Range Factor (how many balls does a player get to?) and Batting Average on Balls in Play (BABIP) help separate the defense from the pitcher.
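For readers who want to compute these themselves, here is a sketch using the commonly published forms of both stats, not anything specific to the book: BABIP (batting average on balls in play) as (H - HR) / (AB - K - HR + SF), and Range Factor per nine defensive innings as 9 x (PO + A) / innings. Some sources compute Range Factor per game instead. The function names are hypothetical.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Works for a batter's line or for the hits and outs a pitcher allowed.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def range_factor(putouts: int, assists: int, innings: float) -> float:
    """Range Factor per nine defensive innings: 9 * (PO + A) / innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9.0 * (putouts + assists) / innings
```

Home runs and strikeouts are removed from BABIP because the fielders never get a chance at them; what remains is the portion of a pitcher’s line that depends heavily on the defense.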

In general, the statistical analysis is superb, using the proper tools for the various correlations. Every once in a while, though, the authors do make mistakes. In an early article examining RBI vs. VORP (Value Over Replacement Player), the analyst uses a ratio of RBI/VORP to demonstrate how misleading RBI is, showing the best and worst RBI/VORP seasons over the last 34 years. Since VORP can be 0 (or even negative), he arbitrarily adds 10 to every VORP on the list. This is just a terrible way to make the comparison: the choice of 10 is arbitrary, a different offset reshuffles the list, and a VORP of -10 still divides by zero. A better solution would have been to put VORP in the numerator, since RBI can never be negative, and for any regular’s season it is never zero.
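The problem with the offset is easy to demonstrate. The sketch below uses two made-up seasons, purely for illustration, and the helper names are hypothetical. With an offset of 10, season B ranks as the "more misleading" RBI season; with an offset of 20, the order flips, even though neither season changed. VORP/RBI needs no offset: it stays defined whenever RBI is positive, and for positive VORP it simply reverses the RBI/VORP order, so the same seasons still land at the extremes.

```python
from typing import Callable, List, Tuple

Season = Tuple[str, int, float]  # (label, RBI, VORP)


def rbi_per_vorp_offset(rbi: int, vorp: float, offset: float = 10.0) -> float:
    """The ratio described in the review: RBI / (VORP + offset)."""
    denominator = vorp + offset
    if denominator == 0:
        raise ZeroDivisionError("VORP + offset is zero")
    return rbi / denominator


def vorp_per_rbi(rbi: int, vorp: float) -> float:
    """The suggested alternative: VORP / RBI, defined whenever RBI > 0."""
    if rbi <= 0:
        raise ValueError("RBI must be positive")
    return vorp / rbi


def rank(seasons: List[Season], key: Callable[[int, float], float]) -> List[str]:
    """Return season labels sorted from highest to lowest key value."""
    ordered = sorted(seasons, key=lambda s: key(s[1], s[2]), reverse=True)
    return [label for label, _, _ in ordered]


if __name__ == "__main__":
    # Two hypothetical seasons, chosen only to show the offset problem.
    seasons = [("A", 100, 0.0), ("B", 45, -6.0)]
    print(rank(seasons, lambda r, v: rbi_per_vorp_offset(r, v, 10.0)))  # ['B', 'A']
    print(rank(seasons, lambda r, v: rbi_per_vorp_offset(r, v, 20.0)))  # ['A', 'B']
    print(rank(seasons, vorp_per_rbi))  # ['A', 'B']
```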

Where traditional statistics count things, these new ones calculate them. Even ERA still counts actual runs – the argument is over who deserves the “blame” for allowing any given run to score. The new analysis produces statistics based on the presumptive worth of a given act. How many runs can it be expected to produce? How many runs is a player worth to his team? How much closer to victory does a certain strategy bring a team? These invariably involve judgment calls, and so it’s not surprising that the counter-intuitive – or at least iconoclastic – results they produce have been rejected by long-time insiders.

One of the chief joys of reading Bill James was the writing itself, the almost blog-like style he had developed. Never mind the latest statistical toy or gee-whiz discovery; the simple pleasure of his prose was enough to keep me re-reading his Abstracts all year. I still remember some of his writing more than 20 years later. By that standard, Baseball Between the Numbers falls short. The joy seems buried under the intelligence.

Another shortcoming is the lack of a single glossary collecting the actual equations. Many of the equations are listed, but they are scattered: some are in the text, some in the glossary, some in the end notes, and some fairly simple ones, like Equivalent Average, aren’t listed at all. That’s too bad, because this superb introduction to the state of the art leaves the reader eager to pursue some research on his own. While a CD probably doesn’t make much sense, a simple website registration, like the ones O’Reilly and other technical publishers offer, could make both data and formulas available to customers.

The book contains some fine individual studies. One analysis of Coors Field struck particularly close to home, with some surprising results. Another article uses Mario Mendoza, of the famous Mendoza Line, as the motivation to examine replacement-level talent. A couple of closely reasoned pieces take apart baseball economics, from the value of a stadium to the value of an MVP shortstop. “What Does Mike Redmond Know About Tom Glavine” is a reminder of the perils of small sample sizes in all sorts of situations. Even the first article, arguing for replacing RBI with VORP, is still terrific, despite bobbling that ratio grounder at the end. Its fellow bookend, looking at successful playoff teams, as opposed to winning regular-season teams, helps explain why Billy Beane can’t win a playoff series.
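The small-sample peril is easy to quantify under a textbook simplification, not necessarily the method used in the Redmond article: treat each at-bat as an independent trial with the same true hit probability p, so an observed batting average over n at-bats has a binomial standard error of sqrt(p(1 - p) / n). The function name is hypothetical.

```python
import math


def batting_average_standard_error(true_average: float, at_bats: int) -> float:
    """Binomial standard error of an observed batting average.

    Assumes every at-bat is an independent trial with the same hit probability.
    """
    if not 0.0 <= true_average <= 1.0:
        raise ValueError("true_average must lie in [0, 1]")
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return math.sqrt(true_average * (1.0 - true_average) / at_bats)


if __name__ == "__main__":
    for n in (20, 100, 600):
        se = batting_average_standard_error(0.300, n)
        # Under a normal approximation, about 95% of observed averages fall within 2 SE.
        low, high = 0.300 - 2 * se, 0.300 + 2 * se
        print(f"{n:>4} AB: a .300 hitter usually lands between {low:.3f} and {high:.3f}")
```

For a true .300 hitter, 20 at-bats give a standard error of about .102, so the two-standard-error band runs from roughly .095 to .505; over 600 at-bats it narrows to about .263 to .337. A batter’s line against one particular pitcher is almost always the 20-at-bat case.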

If you want to see sharp, finely tuned analysis applied to your favorite game, this is the book for you. And if Babe Ruth played today? Think 900 home runs.

