Stataesthetics

All it took was a split second for Barry Bonds to hit the home run that froze time in AT&T Park and left the Nationals twiddling their thumbs while fireworks lit up the sky above San Francisco. Bonds had broken Hank Aaron’s longstanding lifetime home run record, turning an otherwise more or less average (for Bonds) season into a glittering career capstone. “That folks, is baseball history, and I feel privileged to have seen it,” said one commentator as homer number 756 unfolded in replay. Friends and family rushed the field, and Bonds’s godfather Willie Mays took the microphone to congratulate Bonds on the JumboTron. There was a pre-recorded message from Aaron, a pitch-by-pitch analysis of the at-bat. There was replay after replay. “And here it is, folks.” Crack. The fan who snagged Bonds’s home run ball later sold his memento for just over $750,000.

Michael Phelps holds eighteen Olympic gold medals, more golds than any other athlete in any sport and twice the number held by the next most successful gold medalists. In 2008, he broke Mark Spitz’s 1972 record for the most first-place finishes in a single Games. His most historic Olympic moment? A strong contender is precisely that record-breaking eighth win in Beijing, an otherwise unremarkable race in which Jason Lezak easily held off Australian Eamon Sullivan to secure the 4×100 medley relay for the United States. I mean, for Phelps. As one announcer put it, Lezak kept Sullivan at bay “to give Phelps the greatest single performance in Olympic Games history.” Gold medal number eight. Every seat in the Water Cube was filled for the event.

That’s the bizarre beauty of numbers in sports, the perverse pleasure of repetition and accumulation, of pseudo-objective achievement and competitive measurement, the strange allure of one more than the last. What’s so exciting about seeing Lance Armstrong win his seventh Tour de France—as opposed to his sixth or fifth or third or first? Nothing in particular. Yet researchers surveying one European television market concluded that the final stage of the 2005 race cornered the largest viewing audience of any Tour from 1997 to 2010. What’s the most memorable moment in baseball history? According to fans polled by MLB.com, it was Cal Ripken’s 2131st consecutive game. When the game became official at 9:20 p.m. on the night of 6 September 1995, the sold-out crowd at Camden Yards reportedly cheered for 22 minutes straight.

Where but in the world of sport does a simple linear statistic—one more than the last—demand spectacle, evoke drama, inspire awe? And where but in the world of sport can the beauty of a single number be tarnished as instantly? Only three months after his record-breaking home run, Bonds was indicted for perjury and obstruction of justice in connection with the BALCO scandal. ESPN packaged the story in a boxing metaphor: “Bonds’ already-teetering legacy has taken a full body shot.” Images of Bonds hitting his famous homer turn poignant, captioned in a past tense laden with regret, and the “already-teetering” star becomes a spectacle of another sort. Ditto Ben Johnson, Mark McGwire, Lance Armstrong, and so on—athletes whose gorgeously improbable statistical triumphs lost their sheen once it seemed that the humanly impossible was just inhumanly enhanced.

The aesthetic appeal of enumerative facts incorporates a moral calculus, a value that’s purely contextual, and while a special kind of magnetism emanates from one more than the last, that magnetism carries with it a particular fragility. Additive statistics are easy to consume and to commemorate, but their meanings are highly dependent on situating frameworks. No special importance inheres in the numerals 756, 8, 7, or 2131. And since these numbers don’t speak for themselves, their meanings quickly shift once a new record-breaker crashes the scene or when a once-admired star falls from grace.

Enter the new analytics.

The new analytics—advanced statistics, computational analysis, sabermetrics, Moneyball, whatever—are designed to eliminate guesswork and affective fallacy from sports predictions, to fine-tune draft picks, and to more accurately represent a player’s value. In brief, to mitigate uncontrolled contextual variables. What do you get when you replace a “traditional” statistic like earned run average (ERA), which is highly dependent on variables outside the pitcher’s control (quality of fielding, for example) with a defense-independent pitching statistic (DIPS) derived from Vörös McCracken’s deceptively simple 2001 observation that one ought to differentiate a pitcher’s work from the work of his teammates? Surely you get a more virtuously balanced sense for the division of fielding labor. Arguably, though, you also get a better sense of a pitcher’s individual ranking in the league, since fielding-independent statistics eliminate the influence of teammates. Let’s look at Justin Verlander to see if it works.

Verlander, the unanimous winner of the 2011 American League Cy Young Award, posted a 2.40 ERA in 2011, the lowest in the league. He pitched 251 innings, struck out 250 batters, gave up just 67 earned runs, and posted 24 wins, not only taking the Cy Young but capturing the elusive pitching Triple Crown (lowest ERA, most wins, most strikeouts) and the AL MVP. Shane Ryan, writing for Grantland, has called Verlander a “statistical darling,” which sounds great, since his traditional stats handily match up with his public stature and, hence, with his image as the most valuable pitcher in the AL. Is Verlander, though, a sabermetric statistical darling? Again, what happens when we replace traditional measures like ERA with sabermetric statistics that dissociate fielding-dependent numbers like wins and runs from defense-independent ones like strikeouts?
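For anyone who wants to check that ERA by hand, here is a rough sketch. ERA is conventionally nine times earned runs divided by innings pitched. The snippet takes the 67 runs quoted above to be earned runs, which is an assumption on my part, and the function name is purely illustrative.

```python
def earned_run_average(earned_runs, innings_pitched):
    """ERA: earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings_pitched


# Figures quoted in the text; treating the 67 runs as earned is an assumption.
verlander_era = earned_run_average(earned_runs=67, innings_pitched=251)
print(round(verlander_era, 2))  # 2.4
```

The arithmetic lands on the 2.40 quoted above, which is about all a traditional stat asks of its audience.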

To measure defense-independent pitching statistics, the analytics site FanGraphs uses a “fielding independent pitching” (FIP) formula, which weighs home runs, walks, and hit batters against strikeouts per inning pitched and then adds a league-wide constant to put the result on the ERA scale. According to FanGraphs, Verlander posted a 2.99 FIP in 2011, only about 1.25 times his conventional ERA. ESPN doesn’t publish its DIPS formula, but according to its calculations, Verlander posted a 3.13 DIPS for 2011, about 1.3 times his ERA. The FanGraphs FIP ranking drops Verlander to second in the league. By ESPN’s lights, he comes in third. Both sites rank CC Sabathia above him. So who’s the better pitcher? Who’s more valuable to his team and in the eyes of the league? Sabathia finished fourth in Cy Young voting and was fourteenth in line for the MVP. Choosing the grounds upon which to compare pitchers is tough enough using crude measures; advanced metrics, with their carefully qualified givens and constants and their frequently mysterious proprietary algorithms, don’t necessarily simplify matters.
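To make the FIP arithmetic concrete, here is a minimal sketch of the formula in its commonly published form: thirteen times home runs, plus three times walks and hit batters, minus two times strikeouts, all over innings pitched, plus a league constant. The constant of 3.10 and the stat line below are illustrative assumptions, not FanGraphs’ 2011 figures or any real pitcher’s season. The only real numbers are the ERA, FIP, and DIPS values quoted above.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """Fielding independent pitching in its commonly published form."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line and an illustrative league constant (assumptions).
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0, constant=3.10)
print(round(example_fip, 3))  # 3.225

# Ratios of the figures quoted in the text to the 2.40 ERA.
era, fangraphs_fip, espn_dips = 2.40, 2.99, 3.13
fip_ratio = round(fangraphs_fip / era, 2)
dips_ratio = round(espn_dips / era, 2)
print(fip_ratio)   # 1.25
print(dips_ratio)  # 1.3
```

Notice that nothing in the function mentions hits on balls in play, wins, or runs: that omission is the whole point of a defense-independent measure.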

In spite (or perhaps because) of this, we love them. And not just in conventionally statistic-driven games like baseball. Advanced analytics are quickly becoming status quo in the professional sporting world at large, owing at least in part to Nate Silver’s meteoric ascension in the wake of the 2008 presidential election. And then there was Moneyball (I mean the Brad Pitt jam, but I suppose you might also think of the book). But to find ourselves an Ur-statistician, we have to look to Bill James, whose originally self-published The Bill James Baseball Abstract became one of the most influential ventures in the history of sports, shifting the analytic paradigm irrevocably away from fuzzy notions of heroism and toward hard-boiled numerical fact. Ben McGrath wrote an excellent story on James for The New Yorker way back in 2003, just a year before the Boston Red Sox, with the newly-hired Bill James on board, took home their first World Series win since the Babe Ruth era. “The Red Sox,” McGrath wrote, in what has since become a ubiquitous idiom, “have not merely sided with the brainiacs; they’ve enlisted the help of the founding nerd.”

The recent popularity of nerdom in sports (possibly of the poseur variety, but more importantly of the number-crunching sort) has given rise of course to a debate about the nature of athletic value, about the measures of virtuosity, and about, in short, the aesthetics of statistics. Home-spun romantics face off against algorithmically clever data-farmers, nostalgia-sopped print traditionalists against digi-savvy socially-networked Grantlanders.

All this became painfully clear in the run-up to the 2012 American League Most Valuable Player award, during which baseball analysts young and old bandied about the virtues of sabermetrics and conventional stats, betting close odds on whether new or old numerical regimes would triumph in a race between statistically-sophisticated holistic measures and improbable accumulation. It got contentious. In the end, Mike Trout’s sabermetrically-superior performance wasn’t enough to overshadow Miguel Cabrera’s improbable hitting Triple Crown, the first in baseball since 1967. In the end, in fact, the vote wasn’t close. Miguel Cabrera took 22 of 28 first-place votes, and his boring old batting average (.330), home runs (44), and RBI (139) earned him the league’s most valuable title. Cabrera, by the way, earns $21 million a year. The Angels just renewed Mike Trout’s rookie contract for $510,000.

Jonah Keri sums up Trout’s undervaluation two ways: in terms of human affective error (players aren’t good statisticians, but Baseball Writers’ Association voters tend to trust their authority) and in terms of carefully weighted statistics accounting for factors other than raw hitting (like park differences, sure, but also base-running and fielding). Most analysts, however, turned in Trout’s defense to a measure thus far conspicuously absent from this discussion: wins above replacement (WAR). WAR measures the number of wins with which a player can be credited above the number of wins a minimally-valued replacement might reasonably be expected to contribute to the team. WAR is non-standard, so like defense-independent pitching statistics, different publications calculate it differently. In general, though, WAR collates hitting, base-running, and fielding; accounts for park conditions, player position, and league averages; and weighs the resulting figure against a fixed value denoting the cost of replacing the player in question with a scrub. Like WAR itself, fixed replacement value is calculated differently in different venues; the replacement value FanGraphs uses, for example, is 20 runs below average per 600 plate appearances, or (20/600) × PA. Sounds simple, right?
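Here is a deliberately crude sketch of how that replacement term might enter the sum. Only the rate of 20 runs per 600 plate appearances comes from the figure quoted above. The function names, the single lumped “runs above average” input, and the ten-runs-per-win conversion (a common rule of thumb) are my own assumptions, and this is not how FanGraphs or anyone else actually computes WAR.

```python
def replacement_runs(plate_appearances, runs_per_600_pa=20):
    """Runs by which a replacement-level player trails average over these PA."""
    return runs_per_600_pa * plate_appearances / 600


def wins_above_replacement(runs_above_average, plate_appearances, runs_per_win=10):
    """Toy WAR: runs above replacement converted to wins (assumed 10 runs per win)."""
    runs_above_replacement = runs_above_average + replacement_runs(plate_appearances)
    return runs_above_replacement / runs_per_win


# Hypothetical player: 30 runs better than average over a 600-PA season.
print(replacement_runs(600))            # 20.0
print(wins_above_replacement(30, 600))  # 5.0
```

Whatever goes into the inputs, what comes out is a single number on a single scale.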

At the end of the day, it is. And that’s exactly why statophiles like Keri avoid mentioning WAR when they want to make the mathematically objective case for an undervalued player like Mike Trout. Calculating wins above replacement might be a complicated maneuver in and of itself, but once the value’s calculated, it’s merely—you know what’s coming—additive. One more than. For all its behind-the-scenes sophistication, the most debated metric in the sabermetric toolbox reduces mathematical complexity to an easily digested linear scale.

Further, since the detailed formulas governing WAR stats are frequently proprietary, the layperson has to rely on outside authorities to make the calculations rather than crunching the numbers herself. In 2003, to give a salient example, Nate Silver sold his Player Empirical Comparison and Optimization Test Algorithm (PECOTA) to Baseball Prospectus, which markets data to major league insiders as well as to fantasy baseball enthusiasts. While historical PECOTA stats are publicly accessible, current predictions are big business, and the formulas themselves remain confidential. And once someone else has done the number-crunching, you only need rudimentary addition to bat this fancy stat around the happy hour table. According to ESPN, my statistical digest of choice, Justin Verlander posted an impressive 8.3 WAR in 2011; Sabathia came in second with 7.0 wins. Verlander’s clearly the better pitcher. Done.

The aesthetics of the new analytics and the aesthetics of one more than the last are at heart the same. It doesn’t matter that one comes with hipster frames and a humming MacBook while the other languishes in front of a sputtering 90s Dell waiting for the DSL to come back online. Stataesthetics boils down to one-upmanship.

But what about the spectacle? you might ask right about now. What about the fireworks and the streamers and the banners unfurled from the B&O Warehouse behind Camden Yards? I’ve never heard 22 minutes of applause for a WAR (ignore if you can the inescapable stupidity of this humorless acronym)…

And no, of course you haven’t. But the absence of fireworks isn’t the absence of celebration, just as to laud the abstract quality of commitment isn’t to say that commitment can’t be measured. Ultimately, celebrating 2131 consecutive instances of participation does just that. It takes a quantified measure—a measure of commitment, of team spirit, of devotion, of sheer good health, perhaps, but a quantified measure nonetheless.

Even the most resolutely committed luddites and devoted romantics among us, after all, revert now and then to less than high-minded standards of quantifiable comparison. Take Rick Reilly’s self-consciously dumb equation of the Packers with the Steelers, for example: “They’re the exact same team! Consider: Both teams have no cheerleaders.” Or if you want something less tongue-in-cheek, more properly qualitative and moralizing, take this one-to-one comparison of Roger Federer and Tiger Woods on the grounds of greatness in the category of Life:

The question was “Who’s greater?” Greatness is more than what happens on stage. Both men are great athletes, but greatness is as greatness does. Greatness is also in the way you carry it, the way you treat fans and colleagues and waiters. I’ve never met anybody in tennis more polite and giving and generous than Federer. [ . . . ] At the same time, I’ve never met anybody in golf less interested in others than Woods.

The conceit here—greatness measured by the Golden Rule—suggests that real success is something qualitatively alien to the metrics of accomplishment, that moral greatness matters more than wins, more than records, more than comebacks. Real success, Reilly reminds us, depends on the company you keep and on how you keep it, and in order to be truly great, an athlete must measure up as a sportsman. But hold the phone. Because even here, the ghostly imprint of a statistical table hovers in the background of this metaphor: “more polite,” “less interested.” Do you get the sense Reilly would measure such things if he could? So we shouldn’t be surprised to see the whole conceit come crashing down around a number in the end. “Tiger Woods,” Reilly concludes, “still has 9½ years to rise to the place Federer is now. I hope he does.” Who’s greater? Federer. By 9.50 years’ kindness (YK).

At both ends of the sports analysis spectrum, achievement and natural ability are assessed according to quantified standards, even if at the parodic end the numerical scales applied are blissfully arbitrary or irrelevant (number of cheerleaders, years of kindness). Measurement is what makes sports matter, what makes them mean something, and measurement is always already about context, no matter how computationally complex, how objectively neutral, how rationally unemotional. Sorting out the contexts is a pleasure in itself, and that’s a beautiful thing. The aesthetics of counting justify our enjoyment in the otherwise mindless activities of whacking a ball with a stick or running in circles until we’re too tired to stand or jumping off something as tall as a building. That’s what sporting is all about.

But we might notice, too, that the justification of secret joys goes the other way as well, which is, I suppose, just to point out how the mindlessness of jumping through hoops conveniently authorizes what otherwise might have been merely an embarrassing penchant for numbers.

Katie Muth teaches contemporary literature and culture in the United Kingdom, where from time to time she crawls out of bed at odd hours to consume American sporting events.

Great piece! I think you’re right about the aesthetic aspect. What’s interesting is how often that gets obscured in the interest of constructing polemics. But without insane energy put into creating the metrics and what-have-yous to support the arguments, you wouldn’t have the beauty. It’s all a fast-spinning merry-go-round that goes nowhere but is really fun to ride.

— Steve T.    Apr 27, 12:42 AM    #
