Tuesday, May 11, 2010

Rethinking OPS

So, I've been thinking about baseball stats lately. Of course, I spend a lot of my time thinking about baseball stats. Particularly, what I was thinking about, though, is Bill James's reason for creating Win Shares-- he wanted a simple number (integer, actually, but I'm not that picky, since even a percentage could be converted to a larger number and rounded, if one were so inclined) to define players. That's what I'm after, too. Specifically, what I've looked into was OBP, SLG, and OPS.

For a long time, I've felt that those three statistics were really key in determining the value of a player. Now, that's probably the most obvious thing ever, but we have to start somewhere. Anyway, here's what I've thought about those stats:

OBP: Perfect. Well, as close to perfection as anything in the history of baseball prior to sabermetrics, anyway. This is a fantastic way of valuing offensive contribution for three reasons. First, we know that on-base percentage is more closely tied to run-scoring than slugging percentage. Second, most "sluggers" have high OBPs anyway. And third, it actually measures contribution on a per-plate-appearance basis, which is extremely valuable in assessing a batter before he comes to the plate.

SLG: Positive and negative. In my humble opinion, not as strong as OBP for a few reasons. First of all, it's much more affected by ballpark than OBP (although both are certainly affected, it's undeniable that SLG is affected more). That's an issue I'm not really ready to touch. But there are other problems. Second, it's inadequate to value a batter before he comes to bat, because SLG does not factor in ways of getting on base other than hitting the ball. Third, while it certainly does an excellent job of covering most offensive situations, it does not cover the stolen base, which is an incredibly important thing, particularly to a few players (Rickey Henderson, Lou Brock, Tim Raines, Maury Wills, Ty Cobb, Vince Coleman, etc.) who are incorrectly undervalued without it.

OPS: You know what's funny about OPS? For a long time, it's been the Holy Grail of stats. Sure, it's simple and crude, but it's easy and it gets the job done. Well, the point here is accuracy, not just adequacy. In my opinion, the first major flaw of OPS, which is really obvious to me, anyway, is also the biggest. If you imagine OBP and SLG as fractions, rather than as decimals, it's pretty obvious that the denominators (that's the bottom number, in case you've forgotten) are different. While OBP is over PA, SLG is over AB. That's an issue for me. So, I tried to correct this problem. Here was my process:

I first went about with the theory that SLG was the right idea, but that it merely needed to be tweaked, so I decided to incorporate on-base (and steals, actually) in the best way I knew how. I "fixed" the denominator to be "total offensive chances (TOC)," that is, PA+SBA, where SBA is stolen base attempts (SB+CS). Pretty easy, huh? Then I had to fix the numerator. I called this "distance travelled," because it measures the distance an offensive player moved himself with his bat, his eye, or his legs (or any part of his body that happened to be hit by a pitch). That looks like this: TB+BB+HBP+SB.

I was later informed that I should have been accounting for GDP, which I should have. (Just to name-drop, the person who suggested this was none other than Sean Forman of baseballreference.com-- I sent this to him the day I figured it out, and he looked at it because he's a wonderful human being. It was also his idea to include CS. He mentioned Ks as well, but I've decided that doesn't really jibe with what I'm going for, so I ignored that suggestion while accepting his others.) Initially, I subtracted GDP from the numerator, but decided it was actually better described as a wasted offensive chance than as the removal of a total base, so I added it to the denominator instead.

The only downside to this approach is on the micro-level (because in the course of a season, it all works out): if a player did nothing but ground into double plays, he would have a value of "0," when anyone can see that his value is, in fact, negative. I would, though, like to point out that this problem is also present in OBP, SLG, and OPS, and my goal wasn't to create a perfect stat (not yet, at least)-- merely a better one. Anyway, when you divide the whole thing out, I call it "average distance travelled (ADT)," and it looks like this:

[TB+BB+HBP+SB]/[PA+SBA+GDP]
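For anyone who'd rather let a computer do the dividing, here is a minimal Python sketch of the ADT formula above. The function name `adt` and its parameter names are just my own labels for illustration. For the sample inputs, Biggio's 1997 BB, HBP, SB/SBA, and GDP are the figures quoted later in this post; the TB (310) and PA (744) values are his line as I recall it, so treat those two as assumptions.

```python
def adt(tb, bb, hbp, sb, pa, sba, gdp=0):
    """Average distance travelled: (TB+BB+HBP+SB) / (PA+SBA+GDP)."""
    distance = tb + bb + hbp + sb   # bases gained by bat, eye, body, or legs
    chances = pa + sba + gdp        # offensive chances, with each GDP as a wasted one
    if chances == 0:
        raise ValueError("player has no offensive chances")
    return distance / chances


# Craig Biggio, 1997 (TB and PA are assumed values, not taken from this post)
biggio_adt = adt(tb=310, bb=84, hbp=34, sb=47, pa=744, sba=57, gdp=0)
print(f"{biggio_adt:.3f}")  # 0.593
```

GDP defaults to zero here only because those data are missing for a lot of older seasons; that is a convenience, not a claim that the player never hit into one.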

The vast majority of the time, this number will be pretty dern similar to SLG. Personally, I like doing all this extra work, though. Other than the extra work, the only real downside is trying to factor in CS, since those data are not always available. Anyway, I thought this would be a passable substitute for OPS, since it factored both statistics into the equation. I did this two summers ago, and have only looked at it a couple of times since then.

Yesterday, though, I had a breakthrough. I knew from the beginning that what I had really done was fix slugging percentage. What I didn't realize until yesterday was that this statistic, in and of itself, was not enough to replace OPS. I had thought that modifying SLG to incorporate on-base events would fix the reason SLG wasn't sufficient by itself. What I now think I did (by accident, I'll admit) is fix a standard error in OPS-- or at least remove a small chunk of that error. For those who don't know, OPS actually slightly overvalues SLG. What we know from research is that OBP is actually slightly more valuable than SLG. That is, it's more important to get on base often than it is to go far on the basepaths. Since I've "corrected" SLG, what I still needed to do was add in OBP. The only issue with OBP, as it would normally be used, is that simply adding it to ADT would create the same fundamental problem of OPS I railed against earlier-- the different denominator. So, I fixed that issue, again by incorporating steals, since they represent additional offensive chances-- in other words, chances to make outs or gain bases. There's the much higher success rate of stolen base attempts to be contended with here, but, for the vast majority of players, that won't really skew their numbers too deceptively. Anyway, the new on-base (nOBP) formula looks like this:

[H+BB+HBP+SB]/[PA+SBA+GDP]
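Here is the same idea as a rough Python sketch for nOBP. The name `n_obp` and the parameter names are mine, made up for illustration. The sample line is Biggio's 1997 again: the BB, HBP, SB/SBA, and GDP figures are the ones quoted later in this post, while H (191) and PA (744) are from memory and should be treated as assumptions.

```python
def n_obp(h, bb, hbp, sb, pa, sba, gdp=0):
    """New on-base: (H+BB+HBP+SB) / (PA+SBA+GDP)."""
    successes = h + bb + hbp + sb   # times the player reached, or took, a base
    chances = pa + sba + gdp        # same denominator the ADT formula uses
    if chances == 0:
        raise ValueError("player has no offensive chances")
    return successes / chances


# Craig Biggio, 1997 (H and PA are assumed values, not taken from this post)
biggio_nobp = n_obp(h=191, bb=84, hbp=34, sb=47, pa=744, sba=57, gdp=0)
print(f"{biggio_nobp:.3f}")  # 0.444
```

The point of sharing the denominator with ADT is that the two numbers can then be added without mixing PA-based and AB-based rates.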

Obviously, combining the two makes a newer formula for OPS. I call it "expected offensive contribution (EOC)," and it looks like this:

[(H+BB+HBP+SB)+(TB+BB+HBP+SB)]/[PA+SBA+GDP]

or, more simply:

[H+TB+2BB+2HBP+2SB]/[PA+SBA+GDP]
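To convince myself the simplified version really is just nOBP plus ADT, here is a small Python sketch that checks it with exact fractions. The `Season` class and its field names are hypothetical, invented for this example. The inputs are Rickey Henderson's 1982 line as I recall it (none of the raw counts appear in this post, so all of them are assumptions); they do reproduce the .469/.544/1.013 shown for him further down.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Season:
    h: int
    tb: int
    bb: int
    hbp: int
    sb: int
    pa: int
    sba: int
    gdp: int = 0

    @property
    def chances(self) -> int:
        # Shared denominator: PA + SBA + GDP
        return self.pa + self.sba + self.gdp

    def n_obp(self) -> Fraction:
        return Fraction(self.h + self.bb + self.hbp + self.sb, self.chances)

    def adt(self) -> Fraction:
        return Fraction(self.tb + self.bb + self.hbp + self.sb, self.chances)

    def eoc(self) -> Fraction:
        # Simplified form: (H + TB + 2BB + 2HBP + 2SB) / (PA + SBA + GDP)
        numerator = self.h + self.tb + 2 * (self.bb + self.hbp + self.sb)
        return Fraction(numerator, self.chances)


# Rickey Henderson, 1982 (all counts assumed, recalled rather than quoted)
henderson = Season(h=143, tb=205, bb=116, hbp=2, sb=130, pa=656, sba=172, gdp=5)

# Exact arithmetic, so the identity holds without any rounding fuzz.
assert henderson.n_obp() + henderson.adt() == henderson.eoc()

line = "/".join(f"{float(x):.3f}" for x in
                (henderson.n_obp(), henderson.adt(), henderson.eoc()))
print(line)  # 0.469/0.544/1.013
```

I used `Fraction` rather than floats so the equality check is exact; for everyday use, plain division would be fine.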

It kind of looks complicated compared to OBP, SLG, or OPS, but I really think it's a simpler and much more elegant estimate of how much a player has contributed. Here are some examples of some well-known seasons, and what they look like with the traditional measures (OBP/SLG/OPS), and my modified measures (nOBP/ADT/EOC). First, some that change very little:

Babe Ruth, 1920: .532/.847/1.379; .526/.862/1.388

Robin Yount, 1982: .379/.578/.957; .377/.589/.966

Hank Aaron, 1959: .401/.636/1.037; .397/.643/1.040
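Ruth's EOC above assumes zero GDP, since those data aren't available for 1920. Here is a rough Python sketch of how sensitive his number is to that assumption: it looks for the smallest GDP total that would pull his EOC below his OPS. The function names are mine, and the raw counts are Ruth's 1920 line as I recall it (they are not given in this post, so treat them as assumptions); the 1.379 OPS is the figure listed above.

```python
def eoc(h, tb, bb, hbp, sb, pa, sba, gdp=0):
    """Expected offensive contribution: (H+TB+2BB+2HBP+2SB) / (PA+SBA+GDP)."""
    return (h + tb + 2 * (bb + hbp + sb)) / (pa + sba + gdp)


def min_gdp_to_trail_ops(line, ops, limit=50):
    """Smallest GDP count at which EOC drops below OPS, or None if not found."""
    for gdp in range(limit + 1):
        if eoc(**line, gdp=gdp) < ops:
            return gdp
    return None


# Babe Ruth, 1920 (counts assumed, recalled rather than quoted in this post)
RUTH_1920 = dict(h=172, tb=388, bb=150, hbp=3, sb=14, pa=616, sba=28)
RUTH_OPS = 1.379  # traditional OPS as listed above

print(f"{eoc(**RUTH_1920):.3f}")                  # 1.388
print(min_gdp_to_trail_ops(RUTH_1920, RUTH_OPS))  # 5
```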

Interestingly, if Ruth had grounded into as few as 5 double plays (those data are not available), he would actually be worse by EOC than by OPS. Anyway, here are some of the players with a more “normal” difference:

George Brett, 1980: .454/.664/1.118; .455/.680/1.135

Albert Pujols, 2003: .439/.667/1.106; .435/.693/1.128

Sammy Sosa, 2001: .437/.737/1.174; .433/.761/1.193

This seems to be (for high-caliber players who are more traditional power/average hitters) about the normal rate of change—somewhere in the .015-.020 range. Here are some guys with big changes:

Ted Williams, 1941: .553/.735/1.287; .542/.783/1.325

Lou Gehrig, 1936: .478/.696/1.174; .475/.748/1.223

Mickey Mantle, 1957: .512/.665/1.177; .518/.737/1.255

As you may notice, the huge difference here is that these guys walked a lot, and that added plate discipline makes them much, much more valuable. The common thread you’ll notice in the next group is uncommon plate discipline combined with big base-stealing numbers. That shouldn’t be a surprise, since SBs are weighted so heavily by this system. Anyway, here are some guys who gain .100 or more:

Craig Biggio, 1997: .415/.501/.916; .444/.593/1.037

Rickey Henderson, 1982: .398/.382/.780; .469/.544/1.013

Ty Cobb, 1915: .486/.487/.973; .620/.714/1.334

There were a lot of other guys I could have picked here (Maury Wills in 1962 had a difference of .164; Vince Coleman in 1987 had a difference of .192; heck, Ron Hunt, virtually on the strength of HBP alone, moved up .097 in 1971), but I thought this was an interesting sample. Biggio’s is truly fascinating, as it’s powered by the big, fat goose-egg in GDP, a ridiculous 34 HBP, 47/57 in SB, and enough patience to draw 84 walks. An all-time underrated season. But enough about that. As you see, Cobb’s difference is over 350 points (.361)! It’s really interesting, actually, that by traditional OPS, Cobb’s best season is 1911 at 1.088, followed closely by 1925 at 1.066. That may sound suspicious to many of you baseball fans out there, since 1925 was well after Cobb’s prime. However, he did have limited PAs that season (490). Anyway, people who are familiar with Cobb’s career may know that he had an excellent 1915 season, although his OPS that season was only .973, well short of those two marks. However, throw in 96 steals (caught 38 times), 118 walks, and 10 HBP, and suddenly we have an interesting discussion. Here are Cobb’s numbers from 1925 and 1915:

1925: .468/.598/1.066; .469/.646/1.115

This is a sizeable difference, as measured by EOC. However, the next one is unbelievable:

1915: .486/.487/.973; .620/.714/1.334

I think it’s safe to say that Cobb was a better player in 1915 than in 1925, and this number supports that.

Another interesting case is Hank Aaron’s. Here are what I think are his two best seasons—1959 and 1963 (even though they’re printed above, I’ll repeat them side-by-side):

1959: .401/.636/1.037; .397/.643/1.040

1963: .391/.586/.977; .407/.629/1.037

Wow. What a difference, huh? Obviously, the stolen bases are huge (31 compared to 8), but so too are HBP (4 to 0), and his improvement in GDP (from 19 down to 11). Likely, though, it’s the walks that make the biggest difference. With only 51 walks in 1959, Aaron wasn’t setting the world on fire. However, his 78 in 1963 make a huge difference. Anyway, the 60-point gap in OPS shrinks to just 3 points of EOC (1.040 vs. 1.037), more accurately depicting these two seasons.

Finally, because I found this amusing, I think I’ll share it with you. In conducting this research today, I sampled 41 seasons, and no two had a smaller difference in EOC (.002) than two A’s players of the 1980s: Rickey Henderson in 1982, and Mark McGwire in 1987. Sorry to re-hash Henderson, but I want you to see the comparison:

Henderson, 1982: .398/.382/.780; .469/.544/1.013

McGwire, 1987: .370/.618/.987; .367/.649/1.015

I found this hilarious. Can you imagine a funnier pair than the top SB season of the decade (and the century, for that matter) alongside the top HR season of the decade? You have to love baseball.
