Saturday, September 29, 2012

Fun with Runs Created

I've been rather prolific lately.  It's been fun while it's lasted, though I don't know how much longer that'll be.

Regardless, when I was looking at all of that Chipper Jones stuff yesterday, I couldn't help but start looking at Runs Created.  Why?  Well, when I was thinking "What might Chipper Jones have 1500 of?" one of the things that crossed my mind was RC.  And, in fact, Chipper does have over 1500 RC.

For the uninitiated, Runs Created has many versions, but the basic version is this:

RC = (H + BB) * TB / (AB + BB)


It's a handy formula, because it only takes four variables. Just for funsies, I checked seven teams this year, to see how closely these factors matched with what we'd expect them to be. I tried to get teams from different types of parks, overachievers, underachievers, good teams, and bad. And I didn't want to do all 30 teams. So anyway, you can see how ridiculously accurate RC is. The team is listed, and then actual Runs/RC:

Milwaukee:  752/748
Pittsburgh:  639/626
Los Angeles Dodgers: 616/606
Colorado Rockies:  745/777
New York Mets:  639/640
New York Yankees:  765/789
Boston Red Sox: 721/714

So you can see that Colorado's been a pretty extreme example... but they've been awful, so it makes sense that they've been underachieving what components would have you believe they'd do.

Anyway, RC doesn't actually work great for individual hitters.  The reason is that the formula assumes that these four factors (AB, H, BB, TB) actually interact with one another.  Well, obviously, Chipper Jones doesn't just interact with his own AB/H/BB/TB... he actually interacts with other people's... and those other people weren't as good as Chipper Jones.  So RC naturally overestimates the abilities of good hitters.  But who cares?  It's still fun, and it's a good, summative way to look at basically all parts of hitting while still being easy enough to calculate yourself.  So keep this in your pocket, because it's fun to pull out sometimes.
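
If you'd rather let a computer do the arithmetic, here's a minimal sketch of basic RC in Python, using the usual (H + BB) * TB / (AB + BB) form. The function name and the sample team line are made up for illustration; they aren't any real club's numbers.

```python
def runs_created(ab: int, h: int, bb: int, tb: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical team season, NOT taken from the post or from any real team.
team = {"ab": 5500, "h": 1450, "bb": 500, "tb": 2300}

rc = runs_created(**team)
print(f"Estimated runs: {rc:.1f}")
```

Swap in a real team's AB, H, BB, and TB and compare the result to the runs they actually scored.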

And on to the crux of this post...

So, after goofin' around with RC a little, I looked at the leaderboard on Baseball-Reference (mad shout-out to them... that's where I've gotten the stats for the last few posts I've done, but I completely forgot to credit them - sorry, Sean and Neil!), and realized that they used a more complex version of RC*.  No biggy.  I just used their top 100 players, and recalculated the "basic" version of RC for each player.

*Under normal circumstances, the various "technical" versions of RC don't differ that much from one another, but they do for a couple of people.  Specifically, they crush the guys who have mad base-stealing skills.  Barry Bonds loses 246 Runs using the basic version, and so does Joe Morgan - seriously, both lost 246, on the nose.  Rickey Henderson lost - get this - 334 runs.  Those are HUGE differences, and I'm sorry to have to not include them.  But it does take away from the beautiful simplicity of RC the more you add.  And most other players were affected by 40 runs or fewer.  That sounds huge, but since we're dealing with guys who, for the most part, had 20-year careers, we're talking 2 R/year, which is pretty insignificant.  74/100 were affected by 60 runs or fewer, and no one had 60 or more runs added by using the less-technical version.  Besides, while it does affect the number, it only rarely has a significant effect on the order of players, so I decided to do this as simply as possible.

Anyway, this is really just for fun.  So I calc'd it, and looked at the results.  Interesting, no doubt.  But then I decided to look at it as a rate stat.  Actually, as a ratio stat, because I didn't want to estimate PAs, or use ABs, or use AB+BB (which would have been fine, I guess).  So I used batting outs (AB-H).  My question was, who has "hit" .300, using RC/O (that's runs created/out)?

Well, of these 100 guys, fewer than 1/3 of them. Ruth tops the list, with nearly 1 RC for every 2 outs (.495). Hank Aaron "hit" exactly .300 by this calculation. As always, the list was populated by pre-integration guys, and guys who played in the Selig Era. Every one of the 32 guys fits into one of those two categories, or is Hank Aaron, Willie Mays, or Mickey Mantle. I was expecting Mike Schmidt, but he finished lower at .273 than (get this) Will Clark (.274). In case you're curious, the worst finisher by rate was Lou Brock, who "hit" .198. Now, I know what you're thinking - "but Brock was a basestealer! So he probably got robbed by using the basic formula!" Nope. Brock lost only 46 runs by using the basic version of RC. Still a hefty load, yes, but only enough to push him up from #100 to #98. So, yeah.

But then I thought, well, why don't I take the rate, and multiply it by the raw number.  That should give a nice compromise.  Of course, it does do this.  If you're into algebra, write out the equation, look at the cancelling, and be amused.  I was.  But it doesn't really matter, because it pretty much looks right.  Anyway, by this reckoning, here are the top ten hitters of all-time, balancing rate and total.  They're presented as Runs Created/RCRate/RC*RCRate, with the third of those being the organizing principle.

Babe Ruth 2733/.495/1352
Ted Williams 2347/.468/1091
Barry Bonds 2646/.383/1013
Lou Gehrig 2250/.426/959
Stan Musial 2551/.348/887
Ty Cobb 2510/.346/870
Jimmie Foxx 2119/.386/818
Rogers Hornsby 2030/.387/786
Hank Aaron 2576/.300/772
Willie Mays 2333/.307/716

Something about having Aaron and Mays next to one another feels really right about this.
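
For anyone who wants to play along at home, here's a rough sketch of how the three numbers in each line relate: the rate is RC divided by batting outs (AB - H), and the third number is RC times that rate, which is just RC squared over outs. The function names and the career line are hypothetical, not any real player's totals.

```python
def rc_per_out(rc: float, ab: int, h: int) -> float:
    """RC/O: runs created per batting out, with outs counted as AB - H."""
    return rc / (ab - h)


def rate_times_total(rc: float, ab: int, h: int) -> float:
    """RC * (RC/O), which simplifies to RC**2 / (AB - H)."""
    return rc * rc_per_out(rc, ab, h)


# Hypothetical career line, for illustration only.
career_rc, career_ab, career_h = 2000.0, 9000, 2700

rate = rc_per_out(career_rc, career_ab, career_h)
combined = rate_times_total(career_rc, career_ab, career_h)
print(f"{career_rc:.0f}/{rate:.3f}/{combined:.0f}")
```

Because the combined number squares RC, it leans harder on career totals than the rate alone does.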

In case you're curious, since we've been talking Chipper lately, he ranks #19, for now. I say "for now" not just because he's active, but because he's two spots behind A-Rod, one ahead of Todd Helton, and two ahead of Jim Thome. Actually by B-R RC, RC rate, or RC*RCRate, Thome and Jones end up right next to each other, so I guess they belong together. But anyway, there could still be some movement among those guys, even by the end of the season, so nothing's really set in stone there. Manny Ramirez (#12) is the highest ranking "active" player. Among "active" players who actually are active, Albert Pujols tops the list (#14). The lowest ranking player among these 100 was Steve Finley. At #99 was Lou Brock. Derek Jeter ranks 1 point behind Lance Berkman (432-433). I wouldn't have guessed that. Edgar Martinez ranks at #37 - and people say he's not a HOFer. Milwaukee's own Al Simmons ranks #22. Speaking of Milwaukee, Robin Yount is all over the map. He ranks #55 by RC (behind another Milwaukee connection, Eddie Mathews), by rate he ranks #98 (ahead of Finley and Brock), and by overall, he ranks #90. Frankly, it's not bad for a SS, I think. Molly fares better, #56 overall; and since I mentioned Mathews, he's one spot ahead of Molitor.

So, that's pretty much it, I guess.

The Greatness of Chipper

Well, Cybermetrics got me thinking again with Cy's post over there today.

Players with .300 Avg, 1000 XBH, 1500 BB, 150 SB, 1500 RBI:

Chipper Jones

That's the whole list.  So that's pretty cool.  But here are some other exclusive lists of which he's a part.

.300/.400/.500 career guys:

Dan Brouthers
Ty Cobb
Jimmie Foxx
Lou Gehrig
Hank Greenberg
Harry Heilmann
Todd Helton
Rogers Hornsby
Shoeless Joe Jackson
Chipper Jones
Edgar Martinez
Stan Musial
Lefty O'Doul
Mel Ott
Albert Pujols
Manny Ramirez
Babe Ruth
Tris Speaker
Frank Thomas
Joey Votto
Larry Walker
Ted Williams

1500 RBI, 1500 R, 1500 BB (the 1500 Club, as it were)

Barry Bonds
Lou Gehrig
Chipper Jones
Mickey Mantle
Stan Musial
Mel Ott
Babe Ruth
Mike Schmidt
Jim Thome
Ted Williams
Carl Yastrzemski

And, of course, both groups combined (which gives us totals and rates combined):

Lou Gehrig
Chipper Jones
Stan Musial
Mel Ott
Babe Ruth
Ted Williams

Honestly, I know all about the "let's make a list fallacy," as Bill James called it, but I nonetheless find it impressive that one can create a group with just these six names on it.  I mean, the other five are generally regarded as some of the best pure hitters of all time (although Mel Ott is often left out of such discussions).  And only one of them played his entire career post-integration:  the great Chipper Jones.

Friday, September 28, 2012

Answer to Trivia Question

Over at Cybermetrics, there's a post asking two trivia questions. First, who are the only two players with 1200+ RBI, 250+ HR, 300+ SB, and 3000+ Hits?

Easy.  Got it on my first guesses.  Willie Mays and Derek Jeter.

But that's actually the lesser question.  The primary question is, "Through 1990, Who Were The Only 3 Players With 1000+ RBIs, 250+ HRs, 250+ SBs and 2500+ Hits?"

Willie Mays
Joe Morgan
Vada Pinson

Since 1990 (since it's only seven more guys):
Craig Biggio
Barry Bonds
Andre Dawson
Steve Finley
Derek Jeter
Gary Sheffield
Robin Yount

Actually, both Yount and Paul Molitor narrowly miss on the first question.  Yount finished 29 SBs shy... Molly missed by 26 HRs.

For the second trivia question, I noticed that including RBI actually has no bearing on the answer to the question. As to the first question, Biggio misses by 25 RBI. I guess Biggio and Jeter are pretty closely tied in my mind. I've watched them both throughout my life, both are middle infielders, both are part of a core "group" of players (Bags and Bigg in Houston, the "Core Four" in New York), both are associated with just one team (though, who knows, Jeter may finish his career elsewhere). It doesn't surprise me to see them this close statistically on such a list. The only real difference is that the media had a positive effect on Jeter's popularity; the media was more or less neutral on Biggio, though it could be argued that they had an adverse effect, by not appreciating what he did well. Anyway, that's that.

Thursday, September 27, 2012

ERA+ Estimators, All-Time Greats

So, after my last post (which I realize just went live this morning), I decided it would be fun to look at some of the great seasons of all time.  If you need explanations of what the numbers mean, see that other post.  Here are the numbers:

Bob Gibson, 1968:  206/106/628
Doc Gooden, 1985:  322/137/892
Greg Maddux, 1995:  235/91/494
Walter Johnson, 1910:  428/181/1585
Steve Carlton, 1972:  256/136/887
Curt Schilling, 2002:  328/196/852
Roger Clemens, 1997:  361/165/954
Pedro Martinez, 1999:  520/223/1111
Hal Newhouser, 1946:  867/156/2540
Randy Johnson, 2001:  320/207/1008
Justin Verlander, 2009:  228/112/548
Sandy Koufax, 1965:  300/207/1008
Lefty Grove, 1930: 4913/145/14299 (!!!!!)
Dizzy Dean, 1933:  945/120/2770
Cy Young, 1905:  317/123/1018

What's funny is,  last year, no one broke 1000 in the final category (and I doubt anyone will this year).  Basically, all of these guys either won the Cy Young in the year noted, or there was no CY, but the pitcher at hand was considered the best in the league.  There are a couple of exceptions, however:  Schilling '02 and Verlander '09.

Verlander lost out to Zack Greinke, who posted a 234/109/536 line, which is very, very similar to Verlander's.  Schilling in '02 lost out to teammate Randy Johnson, who posted a 300/175/780 line, which is just a hair below what Schilling did that year.  Neither of these could really be considered a "bad" choice, particularly Greinke over Verlander in '09, since they were nearly identical by this measure.

These columns are, of course, somewhat interesting.  What I found myself most drawn to was the middle column - it's SO-BB over average (if an average pitcher faced the same number of batters).  Both Randy Johnson and Sandy Koufax had differences of 207 over an average pitcher, but both pale in comparison to Pedro Martinez's 223 in 1999.

As for the other two columns, the first functions a bit like ERA+: it shows how much better the player was than the league rate. Obviously, Lefty Grove's 1930 line is absolutely ridiculous. So are Martinez's, Newhouser's, and Dean's, when you look at them. But the others? Well, they're not actually that far off from some of the best ERA+ seasons of all-time. High-200s, low-300s seems pretty normal for an historically great season. So it doesn't even seem weird.

The final column, which is the rate/league rate multiplied by innings pitched, serves to take workload into account, so that a relief pitcher's line can be compared (somewhat) to a starting pitcher's.  Well, again, Grove is off the charts.  But most of the other numbers are pretty tame, actually.

So, the point is, I really like this stat line, and I think I'll return to it once in a while to check up on how various pitchers are doing in the coming years.

ERA+ Estimator

Tom Tango's been doing some brilliant stuff lately.  That links to an article at The Hardball Times, at which there is a discussion of (and hyperlink to) an article from THT last week about ERA estimators.  Anyway, as it turns out, the best way to estimate ERA is (K-BB)/BF.  That's it.  Strikeouts, minus walks.  Then divide by batters faced.

This is very, very interesting stuff, indeed. Because I have an idea for how I may look at Cy Young voting differently in light of this fact. Essentially, it makes sense to me to take a player's (K-BB)/BF, and divide it by the league rate. That is, it becomes (K-BB)/BF+, for all intents and purposes, or Est+ (as in "estimated+"). Here's last year's AL:

15655 SO
6949 BB
86425 BF
For a quotient of:  .1007

And, in that order, here are some of the leading candidates for last year's AL CY:

Verlander - 250/57/969
Sabathia - 230/61/985
Weaver - 198/56/926
Haren - 192/33/953
Wilson - 206/74/915
Hernandez - 222/67/964

Overall, here are their personal quotients, over the league quotient (of .1007, if you recall), times 100:

Verlander - 198
Sabathia - 170
Weaver - 152
Haren - 165
Wilson - 143
Hernandez - 159
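
Here's a small sketch of that calculation, using the league totals and the SO/BB/BF lines quoted above. The function names are my own invention, and this version uses the unrounded league quotient rather than .1007, so a result can differ by a point from the truncated figures in the list.

```python
# League totals and pitcher lines (SO, BB, BF) as quoted in the post.
LG_SO, LG_BB, LG_BF = 15655, 6949, 86425

pitchers = {
    "Verlander": (250, 57, 969),
    "Sabathia": (230, 61, 985),
    "Weaver": (198, 56, 926),
    "Haren": (192, 33, 953),
    "Wilson": (206, 74, 915),
    "Hernandez": (222, 67, 964),
}


def k_bb_rate(so: int, bb: int, bf: int) -> float:
    """(SO - BB) / BF, the estimator described in the post."""
    return (so - bb) / bf


def est_plus(so: int, bb: int, bf: int, lg_rate: float) -> float:
    """Pitcher's (SO - BB) / BF over the league rate, times 100."""
    return 100 * k_bb_rate(so, bb, bf) / lg_rate


lg_rate = k_bb_rate(LG_SO, LG_BB, LG_BF)
for name, line in pitchers.items():
    print(f"{name}: {est_plus(*line, lg_rate):.1f}")
```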

Now, if you wanted to multiply these times IP to get a playing time factor, that would be fine by me. But just know that this might be a way to look at things in the future. Or another way to look at it would be to look at the raw total difference between the pitcher at hand and an average pitcher. In other words, the league rate of (SO-BB)/BF, multiplied by the individual at hand's BF, and then subtracted from the individual's SO-BB. In mathematical terms, it would look something like this ("pl" for "player," "lg" for "league"):

(SO_pl - BB_pl) - BF_pl * (SO_lg - BB_lg) / BF_lg


For the aforementioned pitchers, that would yield these results (numbers truncated, rather than rounded; rate*IP in parentheses):

Verlander - 95 (496)
Sabathia - 69 (404)
Weaver - 48 (358)
Haren - 62 (394)
Wilson - 39 (319)
Hernandez - 57 (373)

Clear-cut in favor of Verlander, right?
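
As a sanity check, here's a sketch of the "over average" number: the pitcher's SO-BB, minus what a league-average pitcher would have piled up facing the same number of batters. The function name is made up; the inputs are the 2011 AL totals and pitcher lines quoted earlier in this post.

```python
def k_bb_above_average(so: int, bb: int, bf: int,
                       lg_so: int, lg_bb: int, lg_bf: int) -> float:
    """(SO - BB) minus the league (SO - BB)/BF rate applied to the pitcher's BF."""
    lg_rate = (lg_so - lg_bb) / lg_bf
    return (so - bb) - lg_rate * bf


# 2011 AL totals as quoted in the post.
LEAGUE = (15655, 6949, 86425)

verlander = k_bb_above_average(250, 57, 969, *LEAGUE)
sabathia = k_bb_above_average(230, 61, 985, *LEAGUE)

# int() truncates toward zero, matching the post's "truncated, rather than rounded".
print(int(verlander), int(sabathia))
```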

So why do I bring this up?  I bring it up because of the Cy Young race in the NL right now.  As it stands, here are the numbers for some of the best candidates:

Dickey - 151/52/331
Gonzalez - 134/32/260
Cueto - 112/13/237
Kimbrel - 358/67/215
Chapman - 305/65/211
Kershaw - 151/52/321

No, those numbers for Kimbrel and Chapman are not misprints. They're seriously that much better than the league... albeit in limited numbers. However, as you can see, the difference in total numbers is not that great, as they rank ahead of all but one guy, who's out on the periphery...

Cliff Lee - 174/71/346

Seriously.  The guy with the 6-8 record is blowing people away.  He's actually having a great year; his team's just not winning.  The Phillies' offense is terrible, and it's getting taken out on Cliff Lee, even though he's probably been the best starting pitcher in the NL this season.

Now, I don't think a number like this would ever go "mainstream," but it's interesting to think that, if it did, Cliff Lee might have a shot at the NL Cy Young. But as it stands right now, I can't imagine he's on too many people's radar screens.

Anyway, here's the AL as it stands right now, just in case you're curious:

Sabathia - 155/49/285
Hernandez - 153/54/338
Verlander - 162/66/374
Scherzer - 191/81/352
Price - 154/50/315
Weaver - 117/13/211
Darvish - 139/35/256
Sale - 158/50/298
Shields - 145/46/308
Rodney - 180/24/126
Nathan - 232/37/142

Now, obviously, the season isn't over yet, so we could see some movement on both of these leaderboards.  But as it stands right now, I believe my vote for AL Cy Young, strictly on the basis of these stats, would go the way of Justin Verlander.  He's pitched more innings than Scherzer, who's pitched better.  They'd be 1-2, with King Felix in third, since I view the third of these columns as most significant.  In the NL, I desperately want RA Dickey to win the Cy Young.  I've loved the guy since he was with the Twins and I was living in Minnesota.  But I can't shake the nagging suspicion that Cliff Lee is actually the NL's best pitcher.  So I suppose it'll be like last year's NL MVP, in which I voted for Matt Kemp, because he was the best player... but I wanted Ryan Braun to win, because he's my favorite player.  Likewise, I'll be rooting for RA Dickey, but when the time comes for the IBAs (Internet Baseball Awards - you should vote if you never have before!), as things stand right now, I'm voting for Cliff Lee.

Monday, September 24, 2012

Cabrera vs. Trout

So, everyone's weighed in on this already.  Like Brian Kenny.  And Joe Posnanski.  And a bunch of others.  What do I have to say?  Well, the "statheads" are right, of course.  Trout's been better than Cabrera this year, and by quite a bit.  That's true whether or not Cabrera wins the Triple Crown.

But I'm not actually here to debate the two (in spite of the title), because to my mind, it's pretty cut-and-dried who's been better.  I'm here to celebrate Miggy Cabrera.

As some of you who read older posts may have noticed, I love the Triple Crown. I understand that it's pretty meaningless in the grand scheme of things, but it's so flippin' fun, I just can't help but love it. But, as everyone knows, there's a HUGE amount of luck in the Triple Crown. Like take Yaz's Triple Crown: 44/121/.326. Good numbers, all. But at no point in the 1990s would he have led the league in ANY of those categories. At all. So while it's true that he won the Triple Crown, that season wouldn't have in a lot of other years.

Now, I know I'm cherry picking here, and projecting, and not adjusting for era.  But Miggy's final numbers this year may be:  45/140/.330!  That's incredible.  Do you know how many times that's been done in MLB history?  17 times.  That's it.  By 7 different players.  Here's the list:

Babe Ruth, 1921:  59/171/.378 (no MVP awarded)
Babe Ruth, 1926:  47/146/.372 (ineligible for MVP)
Lou Gehrig, 1927: 47/175/.373 (Won MVP)
Babe Ruth, 1927:  60/164/.356 (ineligible for MVP)
Babe Ruth, 1929: 46/154/.345 (no MVP awarded)
Babe Ruth, 1930:  49/153/.359 (no MVP awarded)
Hack Wilson, 1930: 56/191/.356 (no MVP awarded)
Babe Ruth, 1931:  46/163/.373 (Did Not Win MVP)
Lou Gehrig, 1931: 46/184/.341 (Did Not Win MVP)
Jimmie Foxx, 1932:  58/169/.364 (Won MVP)
Jimmie Foxx, 1933: 48/163/.356 (Won MVP)
Lou Gehrig, 1934:  49/165/.363 (Did Not Win MVP, Won Triple Crown)
Lou Gehrig, 1936: 49/152/.354 (Won MVP)
Joe DiMaggio, 1937:  46/167/.346 (Did Not Win MVP)
Jimmie Foxx, 1938: 50/175/.349 (Won MVP)
Hank Greenberg, 1938:  58/146/.340 (Did Not Win MVP)
Todd Helton, 2001:  49/146/.336 (Did Not Win MVP)
Miguel Cabrera, 2012?

So, you can see, players have not won the MVP with this kind of season before. In fact, in some ways, Gehrig's 1934 is most similar: team missed the playoffs (as the Tigers may), but Gehrig reached these milestones AND won the Triple Crown... but he lost the MVP. As you can see, there have been 17 of these seasons. But in six of them, there was no MVP awarded, or the player who accomplished the feat was ineligible. That leaves 11 seasons. Of those 11, two occurred in the same season as another. That leaves nine seasons in which a player COULD have won the MVP, finishing with a 45/140/.330 line. Someone won with that line 5 times - just over half (and keep in mind that we're not including near-misses, like Albert Belle in 1998, or Ted Kluszewski in 1954, which would lower the percentage even more). So, while Cabrera is putting up a fantastic season, it still wouldn't be unprecedented for him to not win the award; and that's true even if he did win the Triple Crown.
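
If you want to check the counting, here's a quick sketch that tallies the list above. The status labels ("none", "ineligible", "won", "lost") are my own shorthand for the parenthetical notes. Grouping by year is a shortcut that only works because the same-year pairs left after filtering (1931 and 1938) were in the same league.

```python
# (player, year, MVP status) as listed in the post.
seasons = [
    ("Babe Ruth", 1921, "none"),
    ("Babe Ruth", 1926, "ineligible"),
    ("Lou Gehrig", 1927, "won"),
    ("Babe Ruth", 1927, "ineligible"),
    ("Babe Ruth", 1929, "none"),
    ("Babe Ruth", 1930, "none"),
    ("Hack Wilson", 1930, "none"),
    ("Babe Ruth", 1931, "lost"),
    ("Lou Gehrig", 1931, "lost"),
    ("Jimmie Foxx", 1932, "won"),
    ("Jimmie Foxx", 1933, "won"),
    ("Lou Gehrig", 1934, "lost"),
    ("Lou Gehrig", 1936, "won"),
    ("Joe DiMaggio", 1937, "lost"),
    ("Jimmie Foxx", 1938, "won"),
    ("Hank Greenberg", 1938, "lost"),
    ("Todd Helton", 2001, "lost"),
]

# Seasons where an MVP was awarded and the player was eligible.
contested = [s for s in seasons if s[2] in ("won", "lost")]

# Distinct years among those: each is one award that could have gone to such a line.
award_years = {year for _, year, _ in contested}
wins = sum(1 for s in contested if s[2] == "won")

print(len(seasons), len(contested), len(award_years), wins)
print(f"Share of awards won: {wins / len(award_years):.3f}")
```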

One final note:  all of these seasons occurred in the 1920s and 1930s, the best eras in history for baseball offense, and before integration, good minor league systems, and proper scouting; well, besides the Helton season, which occurred in perhaps the most run-rich environment (outside of the Baker Bowl in the early '30s) in baseball history.  So if Cabrera does it this year, we may actually say that it's a totally unique season in baseball history.  And he deserves to be celebrated for that.  He just doesn't deserve the MVP.

Saturday, September 22, 2012

The OPSBI Fallacy

Yes, I realize I'm a year too late to this one.  But the MVP talk is a-startin', and I just know some idiot sportswriter is going to reference this "stat" when explaining their vote, so I thought it was worth another look.  I realize it's been torn apart by saberists before.  But I wanted to take a different look at OPSBI.

For those who don't know, OPSBI was created by Jim Bowden, the former Reds/Nats GM. The basic idea is this: you take the player's on-base percentage, and add his slugging percentage (thus, the OPS). Now, eliminate the decimal point (or multiply by 1000, however you prefer to think about it). Then, add the player's RBI. That's it.
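
In code, that recipe is about as short as it sounds. This is just a sketch of the steps as described above; the function name and the sample slash line are hypothetical.

```python
def opsbi(obp: float, slg: float, rbi: int) -> int:
    """OPS with the decimal point dropped (OPS * 1000), plus RBI."""
    return round(1000 * (obp + slg)) + rbi


# Hypothetical hitter: .400 OBP, .550 SLG, 100 RBI.
print(opsbi(0.400, 0.550, 100))
```

Note that 950 points of OPS and 100 RBI are simply added together, even though one is a rate and the other is a count.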

So, first, I want to point out why this is stupid.  First of all, it doesn't measure anything.  At all.  It arbitrarily adds a rate stat (two rate stats, actually) to a counting stat, without any consideration for why.  It doesn't correlate to anything.  As we know, OPS has a reasonable correlation with run-scoring.  RBI represent actual runs batted in (though not runs scored).  So, you're adding something which correlates to run-scoring with something that actually is one-half of run-scoring.  Now, someone clever could probably make the argument that if you added Runs to this, you'd solve some of that problem.  But here's the question - why add anything to OPS at all?  I don't get it.

Anyway, Bill James says, "For a statistic to have value, it has to be meaningful with reference to something other than its own formula" (The New Bill James Historical Abstract, in the player comment on Craig Biggio).  OPSBI fails that test.

Of course, there are still defenders out there.  I don't feel like looking for articles right now, but I remember reading at least two of them last offseason.  Here's the thing they'll say, more or less:  "Who cares if it doesn't measure anything - it gives the right answer!"  Okay, well, in my mind, it gives the "right" answer - that is, it affirms (much of the time) the conclusion sportswriters have already made.  But I personally believe that I can throw out a lot of other BS stats that will do the same thing (more or less).  Anyway, I'll be looking at the last five years of data (not including 2012, of course, since we're still underway), and looking at the top MVP-finishers (non-pitchers only) for each season, and comparing them by different metrics.  Those metrics are:

MVP Finish - where did the player finish in subjective MVP voting?
OPSBI - of course.
RunAvg - (R+RBI)/AB
ButTheKitchenSink*Games - Games*(RBI+R+TB+BB+HBP+SB)/(3*AB) ; I already posted about this before.
ButTheKitchenSink(rate) - (RBI+R+TB+BB+HBP+SB)/(3*AB) ; same thing, but as a rate stat (scaled to batting average); really, what this is, is RunAvg+BattingAvg+SecondaryAvg, and divided by three so that it looks like the player's batting average.
StolenHomes - SB+HR
TripleCrowns - (3*HR)+((1000*Avg-100)/2)+(RBI)
HitByBallSacs - because I couldn't resist:  HBP+BB+SF+SH
rWAR - because, wouldn't it be fun if I used a metric that actually seemed to represent real value?
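
For the curious, here's a sketch of the made-up metrics in that list, written exactly as the formulas are given above. The class name, field names, and the sample stat line are hypothetical; the line isn't any real player's season.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    g: int
    ab: int
    r: int
    h: int
    hr: int
    rbi: int
    tb: int
    bb: int
    hbp: int
    sb: int
    sf: int
    sh: int

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def run_avg(self) -> float:
        """RunAvg: (R + RBI) / AB."""
        return (self.r + self.rbi) / self.ab

    @property
    def btks_rate(self) -> float:
        """ButTheKitchenSink(rate): (RBI+R+TB+BB+HBP+SB) / (3*AB)."""
        total = self.rbi + self.r + self.tb + self.bb + self.hbp + self.sb
        return total / (3 * self.ab)

    @property
    def btks_games(self) -> float:
        """ButTheKitchenSink*Games: the rate multiplied by games played."""
        return self.g * self.btks_rate

    @property
    def stolen_homes(self) -> int:
        """StolenHomes: SB + HR."""
        return self.sb + self.hr

    @property
    def triple_crowns(self) -> float:
        """TripleCrowns: 3*HR + (1000*Avg - 100)/2 + RBI."""
        return 3 * self.hr + (1000 * self.avg - 100) / 2 + self.rbi

    @property
    def hit_by_ball_sacs(self) -> int:
        """HitByBallSacs: HBP + BB + SF + SH."""
        return self.hbp + self.bb + self.sf + self.sh


# Hypothetical season, for illustration only.
hitter = BattingLine(g=150, ab=600, r=100, h=180, hr=30, rbi=110,
                     tb=320, bb=60, hbp=5, sb=15, sf=5, sh=0)
print(f"{hitter.run_avg:.3f} {hitter.btks_games:.1f} {hitter.btks_rate:.3f} "
      f"{hitter.stolen_homes} {hitter.triple_crowns:.1f} {hitter.hit_by_ball_sacs}")
```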

One last note before presenting:  in 2008, Manny Ramirez only played 53 games in the NL, so only his NL stats are included.  Also, I hope this formats okay.  Here come the charts:

2011 NL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Braun           1   1105    .391  57.9   .386      66    343.0        66   7.7
Kemp            2   1112    .400  63.7   .395      79    366.0        87   7.8
Fielder         3   1101    .378  62.2   .384      39    345.5       123   4.3
Upton           4    986    .326  54.2   .341      52    294.5        82   5.7
Pujols          5   1005    .352  50.0   .340      46    322.5        72   5.1

2011 AL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Ellsbury        1   1033    .339  54.9   .347      71    329.5        69   8.0
Bautista        2   1159    .405  64.6   .433      52    340.0       142   7.7
Granderson      3   1035    .437  62.3   .400      66    332.0       108   5.3
Cabrera         4   1138    .378  62.3   .387      32    337.0       116   7.3
Cano            5   1000    .356  52.1   .327      36    325.0        58   5.2

2010 NL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Votto           1   1137    .400  60.4   .403      53    349.0       101   6.7
Pujols          2   1129    .397  63.6   .400      56    358.0       113   7.3
C. Gonzalez     3   1091    .388  53.3   .367      60    353.0        49   5.8
A. Gonzalez     4   1005    .318  52.8   .330      31    362.0       101   4.1
Tulowitzki      5   1044    .391  44.6   .365      38    306.5        59   6.5

2010 AL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Hamilton        1   1144    .376  49.6   .373      40    343.5        53   8.4
Cabrera         2   1168    .432  61.4   .409      41    366.0       100   6.1
Cano            3   1023    .339  52.3   .327      32    326.5        70   7.8
Bautista        4   1119    .409  66.3   .412      63    362.0       114   6.6
Konerko         5   1088    .365  54.1   .363      39    345.0        83   4.3

2009 NL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Pujols          1   1236    .456  72.6   .454      63    392.5       132   9.4
H. Ramirez      2   1060    .359  53.9   .357      51    325.0        76   7.1
Howard          3   1072    .399  59.5   .372      53    370.5        87   3.5
Fielder         4   1155    .413  65.9   .407      48    382.5       128   6.0
Tulowitzki      5   1022    .355  54.6   .362      52    304.5        85   6.3

2009 AL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Mauer           1   1107    .363  50.9   .369      32   3028.5        83   7.6
Teixeira        2   1070    .369  56.7   .363      41    346.0        98   5.1
Jeter           3    937    .273  46.3   .302      48    269.0        82   6.4
Cabrera         4   1045    .326  53.4   .334      40    333.0        74   4.7
Morales         5   1032    .343  50.8   .334      37    329.0        56   4.0

2008 NL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Pujols          1   1230    .412  63.5   .429      44    368.5       117   9.0
Howard          2   1027    .411  59.0   .364      49    367.5        90   1.5
Braun           3    994    .324  49.3   .326      51    322.5        52   4.3
M. Ramirez      4   1285    .476  25.3   .478      19    285.0        42   3.4
Berkman         5   1092    .397  62.9   .396      47    320.0       111   6.6

2008 AL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Pedroia         1    952    .308  48.1   .306      37    280.0        73   6.8
Morneau         2   1002    .363  53.7   .330      23    325.0        89   3.9
Youkilis        3   1073    .383  52.9   .365      32    329.0        83   6.0
Mauer           4    949    .341  46.4   .318      10    267.0        97   5.3
Quentin         5   1065    .408  50.8   .391      43    316.0        89   5.1

2007 NL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Rollins         1    969    .325  53.5   .331      71    314.0        62   6.0
Holliday        2   1149    .404  60.2   .381      47    379.0        77   5.8
Fielder         3   1132    .398  63.2   .400      52    363.0       108   3.4
Wright          4   1070    .364  60.4   .377      64    329.5       107   8.1
Howard          5   1112    .435  59.2   .411      48    364.0       119   2.8

2007 AL       MVP  OPSBI  RunAvg  BTKS  BTKSr  StoHos  3Crowns  BallSacs  rWAR
Rodriguez       1   1223    .513  73.6   .466      78    421.0       125   9.2
Ordonez         2   1168    .430  60.9   .388      32    376.5        83   6.9
Guerrero        3   1075    .373  53.1   .354      29    341.0        86   4.3
Ortiz           4   1183    .424  62.6   .420      38    353.0       118   6.1
Lowell          5    999    .338  48.2   .313      24    324.0        64   4.6

So, what often happens with a chart like this is, people either skip it and wait for the conclusion, or they read it and go, "so what?"  If you're in the former group, that's annoying, because charts take the most time to make.  So authors are upset at you for not reading the chart.  If that's you, go back and look at them.  We'll wait.

Okay, now that everyone's caught up, what the heck does any of this mean?

Well, if the goal of OPSBI is to correlate to a player's true value, we have to ask, "how do we best measure a player's true value?"  Thus, the first and last columns of the chart.  The last, WAR, is a statistical measure.  The first, MVP voting, is a completely subjective measure.  If OPSBI is so good at perceiving value, we'd really like it to have some correlation to one of these two or the other.

The problem is, it doesn't.  Not at all.  So either both the MVP voters and WAR are wrong, while OPSBI is the true best measure... or OPSBI is crap.  Now, I'll admit freely that both WAR and voters take defense into account, which OPSBI doesn't... still though, that's not (for the most part) how MVPs are won, or how WAR is decided, since offense bears so much more weight.

For instance, if OPSBI is such a good measure, why would Jacoby Ellsbury have finished above Jose Bautista last year in MVP voting? Bautista was the better offensive player... even by these made up metrics. I just don't get what OPSBI is doing that's different from... ANY of these other things I made up. Seriously. Well, maybe not HitByBallSacs, but that's just too hilarious to not include. Anyway, the others do just as good a job predicting WAR and predicting MVPs... maybe better. So why OPSBI? I see no reason, since it doesn't even reflect voter tendencies.
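
If you want to put a number on "just as good," one rough way is a rank correlation. Here's a sketch using Spearman's rho on the 2011 NL rows from the chart, comparing OPSBI and StolenHomes against rWAR. The helper functions are my own, they assume no ties, and five players is far too small a sample to prove anything; the same comparison could be run on every table.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# 2011 NL rows from the chart: Braun, Kemp, Fielder, Upton, Pujols.
opsbi = [1105, 1112, 1101, 986, 1005]
stolen_homes = [66, 79, 39, 52, 46]
rwar = [7.7, 7.8, 4.3, 5.7, 5.1]

print(f"OPSBI vs rWAR:       {spearman(opsbi, rwar):.2f}")
print(f"StolenHomes vs rWAR: {spearman(stolen_homes, rwar):.2f}")
```

In this one table, the joke stat lines up with rWAR at least as well as OPSBI does.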