For me, it was the numbers. I didn’t really discover baseball until I was in college, in the 80s, when a roommate left a copy of that year’s Bill James Baseball Abstract in the suite. I was smitten. It wasn’t just the statistics – it was the analysis and the writing.
James understood that baseball statistics were individual performance in a team context and knew how to separate the two. He knew how to make the numbers tell a story. It wasn’t just playing games with numbers, it was bringing insight into the game to the numbers. Every article, every year, had something fresh and surprising, putting all those statistics together in a different way. Most importantly, he knew how to write.
James stopped writing the Abstract because he found that, while he still loved the game, he had run out of ways of surprising people. But it left me in an awful fix. At that time, there was virtually no free historical or current data available on a large scale. Current data was there, every Tuesday and Wednesday, on the USA Today sports page. I might be able to punch in a few numbers and come up with the predicted wins for the teams, or with the runs created for a few players. But other than that, there wasn’t much out there.
What a difference a decade makes.
In true Army of Davids fashion, James also organized a fans’ response to the Elias Bureau’s hoarding of official stats. He developed an easy-to-learn, fairly-easy-to-use scoring system, and then helped organize a network of fans to score games and collect the stats on their own. This had the effect of breaking the Elias monopoly in stages, to the point where even Elias itself uses the Project Scoresheet scoring system.
There are now publicly available real-time and historical databases online. There are free, publicly available stats packages and database software to help analyze them. Fortunately, Baseball Hacks can help you locate, set up, and use those first two ingredients. You still have to provide the third ingredient – analytical skill – on your own, though.
The name sounds like a John Kruk at-bat, but in fact, it’s a brisk and illuminating tutorial in how to load those public databases onto your own computer’s database, and then how to access them for statistical analysis.
While Adler does spend some time on Excel, his tools of choice are MySQL and a powerful stats package known as R. Both have the advantages of being free and open-source, so freelancers are continually developing new plugins for them. Unlike Access, MySQL doesn’t have a built-in GUI, though Adler does guide you to free downloads of officially sanctioned tools.
For both R and MySQL, Adler guides you completely, if somewhat briefly, through a generic installation and basic configuration. He also shows how to set up and populate a MySQL database with those fat, juicy stats, and then how to get R to talk to MySQL. In fact, the bulk of the discussion is technical, rather than baseball-focused, making me wonder if the book isn’t a little bit of a Trojan horse, designed to sneak in technical competence under cover of sports fanaticism.
The book is probably about the right length for the subject, and it assumes a minimum of knowledge of both subjects, geared to someone thirsty for more. The subject obviously appeals to fantasy-league seamheads, many of whom have some technical background to start off with. The problem is, the cool tool-building sometimes comes at the expense of baseball exposition.
Adler shows a fair number of cool stats and tools, like linear weights and runs created. But in keeping with the rest of the book, he does so with a minimum of exposition, leaving behind much of the analysis that created these tools in the first place, and that makes them so compelling. In true geek fashion, he also shows you how to set up your own fantasy league management system. With so many online tools available for just that purpose, the pages might have been better spent explaining the Favorite Toy or Win Shares.
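For readers who haven't met it, the basic version of James's runs created is simple enough to sketch in a few lines. What follows is my own illustration in Python, not code from the book (Adler works in R and MySQL). The function name and the season line are made up for the example, and the formula is the simplest published form, (H + BB) × TB / (AB + BB), not any of the later refinements.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs created: (H + BB) * TB / (AB + BB)."""
    opportunities = at_bats + walks
    if opportunities == 0:
        # No at-bats or walks: nothing to estimate.
        return 0.0
    return (hits + walks) * total_bases / opportunities


# Hypothetical season line, invented purely for illustration.
season = {"hits": 180, "walks": 70, "total_bases": 300, "at_bats": 550}

rc = runs_created(**season)
print(f"Runs created: {rc:.1f}")  # prints "Runs created: 121.0"
```

The arithmetic is the easy part. The compelling part, and the part the book mostly skips, is why on-base events multiplied by advancement and divided by opportunities should track actual runs so well.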
Still, it’s a substantial achievement to put the pieces together as cleanly as he has. Adler’s had to cover a lot of diverse ground here, and he does it fairly efficiently. He’s also provided enough documentation so that the interested reader knows where to look. And someone who wants to use R to analyze, say, the stock market will know where to start.
Once he finishes setting up that fantasy league.