I'm Not Mad, I'm Just Disappointed: 2012

Wednesday, December 26, 2012

Next in MMA

So ESPN is back at it again with its "Next" magazine issue. For those who do not know, each year ESPN puts out an issue starring the athletes it sees as the future of their respective sports. For example, Cam Newton was the winner last year for being "Next" in football (not sure how that has worked out so far). I typically have an issue whenever ESPN tries to create a list or a bracket of any sort. I looked at their nominees for who is next in each sport, and in typical ESPN fashion they made some real head-scratching choices. Though I could hammer ESPN for choosing Johnny Football as a nominee for "Next" in football over Russell Wilson, or Breanna Stewart for "Next" in basketball over Damian Lillard, I decided to take my frustrations out on their list of who is "Next" in fighting.

As I have gotten older, my love for combat sports has grown to the point where the knowledge I used to store about football, basketball, baseball, and hockey has been pushed out to make room for MMA. No longer do I pay close attention to recruiting classes or even care who wins what award in college sports. Now I wonder who will be the next great fighter to enter the 145 pound class in the UFC and whether the 125 pound division is sustainable with a lack of well-known fighters. Gone are the days of wondering who will be the next great defensive end to challenge the greatness of Bruce Smith and Reggie White. Now I wonder if anyone can eclipse the standard set by Anderson Silva and GSP. ESPN is clearly the gold standard in sports and is the voice of the sports world, but ESPN has always lacked knowledge of combat sports, especially MMA. So when I saw their list of who is "Next" in fighting, needless to say, I found it extremely underwhelming.

Of the five fighters on their list, two were MMA fighters. I was impressed that they included Michael Chandler. He is a star in the Bellator ranks and one of the top 10 fighters in arguably the deepest weight class in MMA, the 155 pound division. After Chandler, however, ESPN listed Alexander Gustafsson, which is just an okay choice. Gustafsson has been relevant in the UFC (the biggest organization in MMA) for more than three years now. I'm not sure what ESPN considers "Next," but in my book Gustafsson is more of a now guy than a next guy. After the two MMA fighters, the next three fighters were boxers. Now, I'm not a big boxing fan, but the guys they listed (Broner, Canelo, and Price) are all well-established boxers who hold multiple titles (which I understand isn't very difficult; I think you get a title just for turning professional). The fact that ESPN decided to list more boxers than MMA fighters is a clear sign of their ignorance of the popularity of MMA.

Their ignorance is evident every time the topic of MMA is brought up on one of their shows. The talking heads on ESPN always brush off the topic as if they were asked to discuss an obscure Olympic sport like curling or ribbon dancing. However, the UFC alone has gone from being exclusively on Spike TV to having headlining shows on FOX, FX and Fuel TV. The Bellator brand has shows on Spike TV, MTV, and MTV2. UFC fighter Jon Jones has his own line of training gear with Nike. Oh, and his line of gear was brought in after Nike dropped Manny Pacquiao (this was before he was left sleeping on the mat by Marquez). MMA is the fastest-growing sport in the world, with MMA gyms popping up all over in each country. MMA is the biggest individual sport in Brazil and in Canada. It's also considered the most popular combat sport in all European countries and in Asia. Given all this, ESPN has to get with the times, especially if they are going to try to provide commentary on combat sports. With this in mind, I think it's my job to provide the correct insight on who is actually "Next" in fighting, specifically in MMA. So here we go!

5) John Dodson: The five-foot-three, 125 pound American could be the face of a struggling division in the UFC. The 125 pound division gets criticized for not having enough heavy hitters. Though the fights are fast paced, the masses always like a good knockout artist. Enter John Dodson. In his three UFC fights he has two spectacular knockout victories. His post-fight celebrations of flips and acrobatics also add to his appeal. Dodson will be fighting Demetrious "Mighty Mouse" Johnson on January 26 in Chicago for the championship. This fight will be the headlining bout on a card that will be on FOX. This could be the stage that vaults Dodson into mainstream popularity.

4) Anthony "Showtime" Pettis: There is no Milwaukee bias in this choice. Anthony Pettis has been slowed by some injuries, but he could be the most exciting fighter in the deepest weight class in the UFC. The 155 pounder has explosive kicks and amazing grappling. Milwaukee is surprisingly becoming a hotbed for training UFC fighters at the Duke Roufus gym, and Anthony Pettis is their best prospect. The 25 year old has a huge fight against Donald Cerrone on the same card as John Dodson. A victory on this card should put him in line for a title fight against the current 155 pound king, Ben Henderson.

3) Dustin Poirier: The 23 year old Louisiana native has been on a tear ever since he was introduced to the UFC. He had a minor setback against Chan Sung Jung, but his submission defeat to the Korean Zombie came after he had been dominating the first round with timely strikes. With only two losses under his belt and a recent bounce-back victory over former TUF winner Jonathan Brookins, Poirier is on track to be a serious threat to current 145 pound king Jose Aldo. With time, the American Top Team prospect should be fighting for a title sooner rather than later.

2) Chris Weidman: He could arguably be number one on this list, but the guy in front is just a little bit more dominant. However, this 185 pounder is the most serious threat to the MMA king, Anderson Silva. His dominant performance over Mark Munoz sent a message to every fighter in the 185 pound division. The 28 year old has yet to lose a fight and has avoided the injury bug for the most part. Another dominant performance against another top 5 contender should put him in line to take on Silva. A victory over Silva would make Weidman the first American to hold the middleweight title since Rich Franklin was relevant in the mid-2000s.

1) Rory MacDonald: We thought there would never be another GSP, but here he is. If anything, he's the evil GSP. Unlike GSP, this Canadian does not care for fan approval. His robotic post-fight interviews remind many fans of Ivan Drago. However, his skill set is completely unstoppable. He has run through every opponent that has been placed in front of him. His lone loss, to former #1 contender Carlos Condit, came in the very last round when he was knocked out after dominating Condit for the previous two rounds. He will have a chance to avenge that loss at UFC 158. The 23 year old could be the most dominant fighter in the UFC over the next five years. Rory MacDonald has all of the striking and wrestling of his mentor GSP, and he has time on his side. He's a bigger version of GSP and has way better kicks than GSP ever had. Canada has arguably the most rabid MMA fan base, and they also have the next big thing in Rory MacDonald.

Monday, December 3, 2012

The New WARSCOR

So, you all may remember WARSCOR, my attempt at a career rating system which could be my version of Adam Darowski's wWAR. Of course, there's also the fact that Adam has rolled out his newest thing: The Hall of Stats. Well, I got jealous. One of my biggest criticisms of wWAR was that it was too arbitrary: it had cutoffs in weird spots, and for no reason. The Hall of Stats has cutoffs at replacement and average.

WARSCOR has the advantage of being adaptable to Win Shares, rWAR, fWAR, the now-defunct gWAR (which I miss a whole lot, because it used DRA for defense), and WARP (although that's my least favorite of the group, because it doesn't have full historical stats). And since I don't actually run a website with it, I am free to just do what I need for a specific project, rather than figure everything out for every player in history.

But, of course, Adam had to go and come up with something better. I realized that, perhaps, average is a better comparison than replacement. Hmph. It's tough to say. But here's what I did. I ran the WARSCOR system using replacement level. Then I did the exact same thing with Wins Above Average (baseball-reference-style - let's not get ahead of ourselves TOO much). Then, I took the geometric mean. Easy as that. Here it was for Adam's list of the top 9 3B not in the Hall of Fame, with their WARSCOR, WAASCOR, and the Composite:

Stan Hack, 38.4; 21.0; 27.1
Heinie Groh, 37.3; 22.8; 28.3
Ron Cey, 39.3; 23.2; 29.2
Robin Ventura, 39.8; 23.6; 29.7
Darrell Evans, 41.4; 23.5; 30.0
Buddy Bell, 44.6; 27.9; 34.4
Sal Bando, 45.5; 28.4; 35.0 (35.954)
Graig Nettles, 46.2; 28.1; 35.0 (34.958)
Ken Boyer, 46.9; 28.0; 35.1

It's really only when you have a distance of at least 1.0 that you can start to say there's even a remote distinction between the players. Therefore, you can see that Bando, Nettles, and Boyer can't be meaningfully distinguished from one another, although it's probably not outrageous to say that they're significantly better than Bell. But it's also pretty clear that Evans falls into the lower group. There's a rather HUGE gap between Evans and Bell, and if we were deciding who goes into the Hall of Fame, this might be a good place to draw the line.
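For anyone who wants to see the combining step spelled out, here is a minimal sketch of the geometric mean of a replacement-based score and an average-based score. The function name and the sample inputs are mine, not part of the system, and the rounded figures in the list above won't necessarily reproduce the posted composites, since the exact inputs behind them aren't shown here.

```python
import math


def composite(warscor: float, waascor: float) -> float:
    """Geometric mean of the replacement-based and average-based scores."""
    if warscor < 0 or waascor < 0:
        # A geometric mean is not defined for negative inputs; how a
        # below-average career should be handled is left open here.
        raise ValueError("geometric mean needs non-negative inputs")
    return math.sqrt(warscor * waascor)


# Hypothetical inputs, not taken from the list above.
print(round(composite(40.0, 22.5), 1))  # 30.0
```

The geometric mean punishes lopsided pairs more than a plain average would: the same two inputs average out to 31.25, but combine to 30.0 here.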

So, there you have it. A new, even more convoluted system. But one that I'm pretty proud of, and would stand behind.

Thursday, November 15, 2012

2012 MVP

Today is MVP day. Happy last day of the 2012 season!

Seriously, it's all hot stove from here on out, so let's take a moment to savor this moment. Savor it.

Aaaaaaaahhhh.

But just because 2012 is ending doesn't mean we can't have more people discussing the MVP debate, right? Here's a comment I posted over at Seamheads this morning.

-----------
In my opinion, there is a solid, non-sabermetric argument to be made for Trout over Cabrera. First, you just have to suggest that the currency of baseball is runs.
Then, you can look at a good, non-sabermetric stat like Runs Produced (RBI+R-HR). Trout had 182; Cabrera had 204. Or you can go more simply and say that you get half-credit for each run, and half-credit for each RBI. In other words, (R+RBI)/2. 106 for Trout; 124 for Cabrera. That’s either a difference of 18 runs, or 22. Let’s just hedge our bets and use the larger, 22-run advantage for Cabrera. The question becomes, is it possible that Trout’s baserunning+fielding yielded 22 runs? Well, he pulled back 5 home runs this year. That’s five runs right there, easy as pie. We’re down to 17 runs. Let’s look at baserunning. Let’s say that every CS eliminates 2 SB. And let’s count each SB as 1/4 run. In math, that’s (SB-2CS)/4. That’s not very much. For Cabrera, that gives us (4-2*1)/4=.5. So the difference is back up to 17.5 runs. For Trout, we get (49-2*5)/4= 9.75. So the difference is now down to 7.75 runs. Are you really going to argue that non-stolen base baserunning and defense (which, keep in mind, we’ve only accounted for in terms of home runs stolen) is less than 8 runs? Didn’t think so.
This is the best non-saber case I can make for Trout. I think it holds, as much as any such argument could. It shows Cabrera to be “up” by 7.75 runs – but that doesn’t account for the slew of extra outs he made, or the fact that the Angels foolishly left Trout in the minors for a month, or the fact that the Angels won one more game than the Tigers, or account for differences in ballpark. Frankly, I don’t see how you can even make the case for Cabrera in light of this. But that’s my opinion.
------------
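The running tally in that comment can be retraced in a few lines. This is only a sketch of the arithmetic as stated there; the function and variable names are my own.

```python
def stolen_base_runs(sb: int, cs: int) -> float:
    """(SB - 2*CS) / 4: each CS cancels two SB, each net SB counts 1/4 run."""
    return (sb - 2 * cs) / 4


cabrera_rp, trout_rp = 204, 182      # Runs Produced (RBI + R - HR)

gap = cabrera_rp - trout_rp          # Cabrera's starting edge: 22
gap -= 5                             # five home runs Trout pulled back: 17
gap += stolen_base_runs(4, 1)        # Cabrera's base stealing: 17.5
gap -= stolen_base_runs(49, 5)       # Trout's base stealing: 7.75

print(gap)  # 7.75
```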

Now, perhaps it was unfair of me to "double-count" stolen bases; after all, shouldn't those have been counted in Runs Produced already? Probably, so that was a boo-boo by me. Anyway, we're left with a 17-run difference - or, disregarding the homer-robbing exploits of Trout, a 22-run difference.

Is it possible that Mike Trout saved 22 more defensive runs than Miguel Cabrera? I would say that it's pretty darn possible. 22 is a lot of runs, but it's definitely not inconceivable. And when you take into account that Trout is an electric defender, while Cabrera is adequate-at-best, I don't think it's unrealistic. But let's say that he didn't. What about some of the other points I made?

For example, who created more outs? That's easy to figure - and this will help fix the issue with playing time. You take batting outs (AB-H), and you add caught stealings and grounding into double plays. The whole thing is: AB-H+CS+GDP
Cabrera: 622-205+1+28=446 outs
Trout: 559-182+5+7=389 outs
So Cabrera created (as we said) 22 extra runs, in 57 extra outs. Is that good or bad?

Well, the entire AL this year created 17217 runs, using up 59932 outs. In other words, for every out, an average player created .287 runs. Why is this important? Well, what if we credit Trout for some of his missed playing time by giving him only AVERAGE performance for those additional 57 outs? If we do that, we take 17217/59932*57=16 runs. That means, if Trout had played as an average player, instead of as MIKE TROUT, for the difference in playing time between him and Cabrera, we would have expected Trout to be only 6 runs behind Cabrera. Six runs. Remember those 5 homers we talked about earlier? Yeah. Add those in, and the two are within a run of each other. And that's still not factoring in the majority of their difference on defense.
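Here's a small sketch of the outs-and-playing-time adjustment, for anyone who wants to rerun it. The stat lines and league totals are the ones quoted above; the function and variable names are mine, and the league-average credit is the simplifying assumption the argument itself makes.

```python
def outs_created(ab: int, h: int, cs: int, gdp: int) -> int:
    """Batting outs plus caught stealing plus double plays grounded into."""
    return ab - h + cs + gdp


cabrera_outs = outs_created(ab=622, h=205, cs=1, gdp=28)
trout_outs = outs_created(ab=559, h=182, cs=5, gdp=7)
extra_outs = cabrera_outs - trout_outs

AL_RUNS_PRODUCED = 17217
AL_OUTS = 59932
runs_per_out = AL_RUNS_PRODUCED / AL_OUTS

# Credit Trout with league-average production over the extra outs.
credit = runs_per_out * extra_outs
adjusted_gap = 22 - credit

print(cabrera_outs, trout_outs, extra_outs)
print(round(runs_per_out, 3), round(credit, 1), round(adjusted_gap, 1))
```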

Now, again, I can see someone saying, Well, you can't just "make up" for lost time like that - Trout wasn't playing, and that's that. To an extent, obviously, that's true. So let's look at a ridiculously similar MVP race, involving a part-time outfielder who was a better defender and baserunner, and compare him with... MIGUEL CABRERA.

In 2010, Josh Hamilton of the Rangers and Miguel Cabrera of the Tigers were in a very similar boat. Cabrera led the league in RBI (batting .328 and hitting 38 HR; this year, he batted .330 with 44 HR - so, basically, the exact same year) and played 150 games (161 this year). Josh Hamilton, on the other hand, played in only 133 games (Trout played 139 this year). In 2010, Miguel Cabrera produced 199 runs, topping the AL. Hamilton produced 163 runs - a difference of 36. Which, if you're scoring at home, gives Hamilton a BIGGER gap, and LESS playing time; and yet, people happily voted him the MVP. And the outs gap was only 44. So basically, Cabrera produced one run for every extra out he created. Giving Hamilton the extra 44 outs at a league-average rate for 2010 gives him only 13 runs, which still leaves him 23 runs behind - bigger than Trout's gap was, even BEFORE we adjusted for playing time! And since Josh Hamilton's 2010 defense is not 20 runs better than Mike Trout's 2012 defense, we have to conclude, I think, that voters in 2012 and voters in 2010 are not applying consistent reasoning.
So why did Hamilton win? Because of the batting title? Because his team made the playoffs? Because of his defense? Well, Trout's D is better, he practically won the batting title, and his team won more games than Cabrera's. So why Hamilton in 2010 but not Trout in 2012?

All I know is, I can make the argument that the difference between Trout and Cabrera, even before adjusting for defense or ballpark, is a handful of runs rather than 22. And I can also make the argument that we've had a Trout-Cabrera situation before, and resolved it in favor of our "surrogate" Trout. Yet, it seems to be a problem this year.

By the way, I made this case for Trout without the use of WAR. You don't need it, because it's obvious that Trout had the better year. I used a vastly inferior offensive statistic, which actually makes the gap between Trout and Cabrera look bigger than it really is. I hardly touched on defense. The fact of the matter is, Miguel Cabrera had a wonderful year. He was one of the 5 best players in the AL this year, probably one of the top 3, and maybe even one of the top 2. But he wasn't as good as Mike Trout. As I've said before, he'll win the MVP, and that's fine. But there's no doubt in my mind that it's also incorrect.

******
Addendum to original post:
Another way to think of this might be the following. What percentage of Miguel Cabrera's value comes from defense? 0%? A negative percentage? Let's be absurdly generous and say that 10% of Miguel Cabrera's value is from defense.
And now, how about Mike Trout. What percentage of Trout's value comes from defense? 30%? Let's say 20, to be conservative.

Well, if we assume those figures to be true, and even if we stick with Runs Produced as the offensive model, we're left with this math:
Cabrera = (10/9) * 204 = 226.7
Trout = (5/4) * 182 = 227.5

First of all, I swear I picked those numbers out of thin air, and did not specifically engineer them for Trout to come out on top. But really, they probably don't look that unreasonable. So, again, I get Trout as the MVP. In spite of less playing time, he still created, by this "measure" more total runs than Cabrera. Again, I just can't avoid the conclusion that Cabrera did not have as good of a year as Trout did.
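One way to read "X% of a player's value comes from defense" is that defense is that share of his total, so offense is the rest and total = offense / (1 - share). Under that reading, a 10% share scales offense by 10/9 and a 20% share by 5/4. A quick sketch, with a function name of my own choosing and the same out-of-thin-air percentages:

```python
def total_value(offense_runs: float, defense_share: float) -> float:
    """Scale offensive runs up so that defense is `defense_share` of the total."""
    if not 0 <= defense_share < 1:
        raise ValueError("defense share must be in [0, 1)")
    return offense_runs / (1 - defense_share)


cabrera = total_value(204, 0.10)
trout = total_value(182, 0.20)

print(f"Cabrera {cabrera:.1f}, Trout {trout:.1f}")  # Cabrera 226.7, Trout 227.5
```

The margin is thin, so the result is sensitive to the shares you pick; nudging Cabrera's share up or Trout's down by a point or two flips it.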

Friday, November 2, 2012

How to Make Football a Better Game

I can hear you thinking it. "But David - football is already the most popular game in America. How are you going to make it better?"

What if I said that I could eliminate the most boring play in the game, make football higher-scoring, and make virtually every possession have at least one edge-of-your-seat play? Would you be interested?

Here's how it works:

First, in my football dream world, kickoffs and punts would be eliminated. Every drive starts from the 20, and you get four downs - if you don’t convert, sucks to be you. I think the 20 is a good spot to start because it's not SO far back, but it's not so far up that teams can just play conservatively back and forth, gaining one or two first downs and then turning the ball over on downs. If you are facing fourth-and-four from the 26 yard line and you MISS it - you're screwed. The other team starts in GREAT field position. Thus, the game becomes higher scoring. If teams can't punt, even when they're in horrible field position, they have to get more creative and take more risks, and that results in more turnovers, more spectacular plays, and a better game overall.

Second, there's a change in OT. Overtime will still be sudden-death. Home team has a choice of ball or wind. Why? Because they're the home team. Whoever starts on offense starts from their own 35. Why the 35? Because if they fail to convert, their opponents start with the ball less than half a field away. This yardline could be changed if the 35 is too problematic. Hopefully, though, the field position and wind disadvantages and the possession advantage even one another out. If defense stops the offense, good for them.

Third, the extra point is eliminated. Every TD is worth 7 points. However, if you'd like to get an "extra" point, you can. All you have to do is wager one of your 7 points. In other words, you'd get one down from the three yard-line. If you made it, you'd have 8 points. If you missed it, you'd LOSE one of your 7, and you'd only have 6. It makes a game-tying TD ACTUALLY tie the game, most of the time. And, if you tie on the last play of the game, you can choose to win or lose, right there. No time to think about overtime or not. You either win, or go home.

Fourth, I wouldn't eliminate the kicking game entirely. The only vestige of the kicking game I appreciate is the field goal. The rest of it can just go. But I also support progressive field goal scoring (2 points for a field goal of 20-29 yards, 3 points for a field goal of 30-39 yards, etc.). It rewards strong-legged, accurate kickers. And it creates some interesting scenarios, like this vignette:

You're the home team. There's a decent-speed wind with you, for now. It’s fourth and one from the 39, with your team down by 5 with two minutes to play, and you have Sebastian Janikowski (or a similarly strong-legged kicker). Do you:

a. Go for the conversion, and try to keep moving the ball, trying to score while running out the clock?
b. Kick a field goal from the standard distance (~17 yards farther than your yardline), which would be a 5-pointer to tie the game, but leave time on the clock?
c. Have your holder line up an extra four yards back and kick a 6-pointer to take the lead while leaving time on the clock?

To me, that would all make football a MORE exciting and interesting game, rather than what it is now. Defense matters more, because there's no onside-kick to fall back on. Another scenario, with the same team: If you're down by two TDs with 3 minutes left, and you score, now you're down by one TD. If your defense creates a four-and-out, they turn the ball over to you inside their own 30! You're in great position to score again. And, if you do, you can choose to go for the extra point, at which point you'll either win the game, or lose. And if you choose to play for overtime, you now have to choose if you want the ball, at which point you'll have to fight the wind even if you're in "normal" range for your strong-legged kicker, or you can give the ball to the other team in the hope that your D can stop them again. It's a boatload more strategy, more second-guessing of coaches, fewer gimmicks, fewer scary special teams plays that cause injuries, more scoring, defense is more important, there's more appreciation for strong-legged kickers, and there's a freed-up roster spot because no team carries a punter anymore. It's pretty much the best of all possible worlds.
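To make the scoring rules concrete, here's a rough sketch of the progressive field goal scale and the touchdown wager. All the names are mine. The proposal doesn't say what a kick under 20 yards is worth, so the 1 point used below is an assumption, and the conversion rates in the loop are hypothetical.

```python
def field_goal_points(distance_yards: int) -> int:
    """Progressive scoring: 20-29 yards -> 2, 30-39 -> 3, and so on.

    Kicks under 20 yards are not covered by the proposal; giving them
    1 point is an assumption that extends the pattern downward.
    """
    if distance_yards <= 0:
        raise ValueError("distance must be positive")
    return max(1, distance_yards // 10)


def kick_distance(yard_line: int, extra_setback: int = 0) -> int:
    """Kick length: about 17 yards beyond the line of scrimmage, plus any
    extra yards the holder deliberately backs up."""
    return yard_line + 17 + extra_setback


def expected_touchdown_points(p_convert: float, wager: bool) -> float:
    """Average points per touchdown: a flat 7, or 8-or-6 when wagering."""
    if not 0.0 <= p_convert <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not wager:
        return 7.0
    return 8 * p_convert + 6 * (1 - p_convert)


# The vignette: fourth and one from the 39.
standard = kick_distance(39)        # 56 yards
backed_up = kick_distance(39, 4)    # 60 yards
print(field_goal_points(standard), field_goal_points(backed_up))  # 5 6

# The wager breaks even at a 50% conversion rate.
for p in (0.4, 0.5, 0.6):
    print(p, round(expected_touchdown_points(p, wager=True), 2))
```

On average the wager only pays if you convert from the three more than half the time, which is part of what makes the end-of-game decision interesting.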

What do you think? Do you have answers for the posited scenarios? Do you think these would be good changes?

Wednesday, October 31, 2012

Triple Crown Part IV

It's been a long time since one of my Triple Crown posts. And, now that the season's over (congratulations, Giants), I think it's time for another - especially since we actually saw a Triple Crown this year! Very exciting stuff. As I've mentioned in at least two posts, I support Mike Trout for MVP... but that doesn't mean that Miguel Cabrera shouldn't be celebrated for an outstanding season!

One of my old sawhorses is looking at alternative Triple Crowns. There's an article at the Hardball Times about the old-school-new-school debate. Frankly, I'm sick of it, because it seems to me that it's mostly settled - but then again, I hang around in "new school" circles on these here internets. And while I understand that it's not true for most people, my guess is that, in 20-30 years, baseball fandom will look quite different from how it looks now. Kids who have grown up with WAR and WPA and REW will not mind those stats. Anyway, the reason I'm bringing this up at all is that the aforementioned Hardball Times article has an interesting comment from a reader. That reader (TomH) says the following:

"WAR is too complicated for a stattha most will be able to get a handle on. We ought to start but getting old-schoolies to acknowledge that OBP is far batter than AVG, and that scoring is as important as driving them in. That the ‘triple crown’ (TC) wasn’t drawn up by Moses (pre-Ruth it was irrelevant), and a better measure of offesnive breadth (what the TC tries ot capture) would be OBP, R and RBI, or OBP, SLG, and R+RBI, or OBP, HR, and R+RBI."

I left it as was, typos and all. So while I think "stattha" means "stat that," one can never be sure. Anyway, he makes a good point: if the TC tries to measure anything (my response would be that it's artificially constructed, and doesn't try to do anything at all), it's breadth of offensive statistics. Not to mention the idea that there's one stat which is an "average" and two counting stats - and one of those stats involves power, the other, runs. Anyway, I got really curious about the ideas he mentioned as possibilities for a "replacement" Triple Crown. So let's go through them, one-by-one, since 1893.

OBP/R/RBI
Mike Schmidt (PHI), 1981 - .435/78/91
Carl Yastrzemski (BOS), 1967 - .418/112/121
Frank Robinson (BAL), 1966 - .410/122/122
Ted Williams (BOS), 1949 - .490/150/159
Stan Musial (STL), 1948 - .450/135/131
Ted Williams (BOS), 1947 - .499/125/114
Ted Williams (BOS), 1942 - .499/141/137
Ted Williams (BOS), 1941 - .553/135/120
Babe Ruth (NYY), 1926 - .516/139/146
Babe Ruth (NYY), 1923 - .545/151/131
Rogers Hornsby (STL), 1922 - .459/141/152
Babe Ruth (NYY), 1921 - .512/177/171
Rogers Hornsby (STL), 1921 - .458/131/126
Babe Ruth (NYY), 1920 - .532/158/137
Babe Ruth (NYY), 1919 - .456/103/114
Gavvy Cravath (PHI), 1915 - .393/89/115
Sherry Magee (PHI), 1910 - .445/110/123
Ty Cobb (DET), 1909 - .431/116/107
Nap Lajoie (PHA), 1901 - .463/145/125

OBP/SLG/R+RBI
Albert Pujols (STL), 2009 - .443/.658/259
Todd Helton (COL), 2000 - .463/.698/285
Larry Walker (COL), 1997 - .452/.720/273
Frank Thomas (CHW), 1994 - .487/.729/207
Barry Bonds (SFG), 1993 - .458/.677/252
Barry Bonds (PIT), 1992 - .456/.624/212
Mike Schmidt (PHI), 1981 - .435/.644/169
Joe Morgan (CIN), 1976 - .444/.576/228
Dick Allen (CHW), 1972 - .420/.603/203
Carl Yastrzemski (BOS), 1970 - .452/.592/227
Willie McCovey (SFG), 1969 - .453/.656/227
Carl Yastrzemski (BOS), 1967 - .418/.622/233
Frank Robinson (BAL), 1966 - .410/.637/244
Willie Mays (SFG), 1965 - .398/.645/234
Duke Snider (BRO), 1956 - .399/.598/213
Ted Williams (BOS), 1951 - .464/.556/235
Ralph Kiner (PIT), 1951 - .452/.627/233
Ted Williams (BOS), 1949 - .490/.650/309
Stan Musial (STL), 1948 - .450/.702/266
Ted Williams (BOS), 1947 - .499/.634/239
Ted Williams (BOS), 1946 - .497/.667/265
Ted Williams (BOS), 1942 - .499/.648/278
Ted Williams (BOS), 1941 - .553/.735/255
Jimmie Foxx (BOS), 1938 - .462/.704/314
Lou Gehrig (NYY), 1936 - .478/.696/319
Lou Gehrig (NYY), 1934 - .465/.706/293
Chuck Klein (PHI), 1933 - .422/.602/221
Babe Ruth (NYY), 1926 - .516/.737/285
Rogers Hornsby (STL), 1925 - .489/.756/276
Babe Ruth (NYY), 1924 - .513/.729/264
Babe Ruth (NYY), 1923 - .545/.764/282
Rogers Hornsby (STL), 1922 - .459/.722/293
Babe Ruth (NYY), 1921 - .512/.846/348
Rogers Hornsby (STL), 1921 - .458/.639/257
Babe Ruth (NYY), 1920 - .532/.847/295
Rogers Hornsby (STL), 1920 - .431/.559/190
Babe Ruth (NYY), 1919 - .456/.637/217
Ty Cobb (DET), 1917 - .444/.570/209
Gavvy Cravath (PHI), 1915 - .393/.510/204
Ty Cobb (DET), 1909 - .431/.517/223
Honus Wagner (PIT), 1909 - .420/.489/192
Honus Wagner (PIT), 1908 - .415/.542/209
Honus Wagner (PIT), 1907 - .408/.513/180
Nap Lajoie (CLE), 1904 - .413/.546/194
Honus Wagner (PIT), 1904 - .423/.520/172
Nap Lajoie (PHA), 1901 - .463/.643/270

OBP/HR/R+RBI
Albert Pujols (STL), 2009 - .443/47/259
Larry Walker (COL), 1997 - .452/49/273
Barry Bonds (SFG), 1993 - .458/46/252
Mike Schmidt (PHI), 1981 - .435/31/169
Dick Allen (CHW), 1972 - .420/37/203
Harmon Killebrew (MIN), 1969 - .427/49/226
Willie McCovey (SFG), 1969 - .453/45/227
Carl Yastrzemski (BOS), 1967 - .418/44/233
Frank Robinson (BAL), 1966 - .410/49/244
Willie Mays (SFG), 1965 - .398/52/234
Duke Snider (BRO), 1956 - .399/43/213
Ralph Kiner (PIT), 1951 - .452/42/233
Ted Williams (BOS), 1949 - .490/43/309
Ted Williams (BOS), 1947 - .499/32/239
Ted Williams (BOS), 1942 - .499/36/278
Ted Williams (BOS), 1941 - .553/37/255
Mel Ott (NYG), 1938 - .442/36/232
Lou Gehrig (NYY), 1936 - .478/49/319
Lou Gehrig (NYY), 1934 - .465/49/293
Chuck Klein (PHI), 1933 - .422/28/221
Babe Ruth (NYY), 1926 - .516/47/285
Rogers Hornsby (STL), 1925 - .489/39/276
Babe Ruth (NYY), 1924 - .513/46/264
Babe Ruth (NYY), 1923 - .545/41/282
Rogers Hornsby (STL), 1922 - .459/42/293
Babe Ruth (NYY), 1921 - .512/59/348
Babe Ruth (NYY), 1920 - .532/54/295
Babe Ruth (NYY), 1919 - .456/29/217
Gavvy Cravath (PHI), 1915 - .393/24/204
Ty Cobb (DET), 1909 - .431/9/223
Nap Lajoie (PHA), 1901 - .463/14/270
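
If you'd like to hunt for alternative Triple Crowns yourself, the check is simple: find the player who leads the league in every category of the chosen trio. Below is a rough sketch with a made-up two-player league; the player names and numbers are purely hypothetical, the function names are mine, and ties count as leading, which is my assumption.

```python
from typing import Dict, List

Season = Dict[str, Dict[str, float]]  # player name -> {stat name: value}


def stat_value(stats: Dict[str, float], category: str) -> float:
    """Look up a category; names like 'R+RBI' are summed from their parts."""
    return sum(stats[part] for part in category.split("+"))


def crown_winners(season: Season, categories: List[str]) -> List[str]:
    """Players who lead (or tie for the lead in) every listed category."""
    winners = set(season)
    for category in categories:
        best = max(stat_value(s, category) for s in season.values())
        winners &= {
            player
            for player, s in season.items()
            if stat_value(s, category) == best
        }
    return sorted(winners)


# Hypothetical league, for illustration only.
toy_league: Season = {
    "Player A": {"OBP": 0.450, "HR": 40, "R": 120, "RBI": 130},
    "Player B": {"OBP": 0.400, "HR": 35, "R": 125, "RBI": 110},
}

print(crown_winners(toy_league, ["OBP", "R", "RBI"]))       # []
print(crown_winners(toy_league, ["OBP", "HR", "R+RBI"]))    # ['Player A']
```

The toy league shows why the three lists above differ: the same season can win one version of the crown and miss another.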

Well, that was fun. Hope you liked it.

Monday, October 22, 2012

2012 IBAs

Hurray! Here we are again, ladies and gentlemen. It's IBA (Internet Baseball Awards) time! That means, for the second straight year, I'll be publishing my ballot (here's last year's). So, before I break it all down, here are my award winners:

AL Manager

1. Bob Melvin

2. Buck Showalter

3. Robin Ventura

Three guys who did a great job putting up better years than were expected of their teams. I think Baltimore is more likely a fluke than Oakland, and I think the Oakland job (with all those rookie pitchers) was, in some ways, the more difficult, and it was (turns out) in the more difficult division. So, as long as this is the "who's the manager of the team that overachieved the most?" award, I'm gonna go Melvin. Although I don't fault anyone for the logic that, "If you had said before the season started that the O's would finish with 93 wins and in the postseason, I would have handed my vote to Showalter that day." He's a fine choice, Ventura's a fine choice. Pretty much anyone's a fine choice, so long as he's not Bobby Valentine or Ned Yost.

NL Manager

1. Bud Black

2. Davey Johnson

3. Mike Matheny

Seriously, Manager of the Year is a stupid award. We simply don't know enough about managers to evaluate them well. But the Padres were WAY better than I thought they'd be. Probably because of Chase Headley. But maybe because of Bud (or as baseball-reference.com calls him, "Buddy") Black. Johnson led the Nationals (seriously, the WASHINGTON NATIONALS) to the playoffs behind the best pitchers in baseball. Again, I don't know how much that was him, but maybe some of it. Matheny gets added because, frankly, who saw the Cards being basically exactly the same team as last year, only without any of the players or LaRussa? I'm tempted to put Matheny at #1, but I think this whole award is a crapshoot, so he wound up third. Meh.

AL Rookie of the Year

1. Mike Trout

2. Yoenis Cespedes

3. Yu Darvish

4. Matt Moore

5. Will Middlebrooks

The Mike Trout Show begins. Cespedes and Darvish were great. Moore succeeded in spite of high expectations. Middlebrooks was one of the few bright spots for a bad Red Sox team. All good players. And there well could have been a few A's pitchers on here, too.

NL Rookie of the Year

1. Bryce Harper

2. Wade Miley

3. Norichika Aoki

4. Martin Maldonado

5. Zack Cozart

Harper or Miley... Harper or Miley. I go with Harper here, because he's more likely to have the better career. Aoki and Cozart are a little bit of a stretch, because they're both "old" rookies. Aoki came out of pretty much nowhere though, and completely made up for Nyjer Morgan's regression to... Nyjer Morgan, and allowed for Corey Hart to move to 1B. Now, were I not a Brewers fan, I wouldn't know that. But I am. So there. Maldonado replaced Jonathan Lucroy well. And he made George Kottaras cuttable (<----not a word). For people who think I'm being a homer, yes. But at least I found a way to keep Mike Fiers off the ballot, even though he may be my favorite of the crop (although I LOVE all three of the Brew Crew rookies).

AL Cy Young

1. Justin Verlander

2. David Price

3. Felix Hernandez

4. Max Scherzer

5. Chris Sale

Verlander, with the exception of his W-L record, was EXACTLY the same pitcher this year as last year. David Price this year was not as good as Justin Verlander last year. How anyone can vote Price the Cy winner is beyond my understanding. That said, Price was phenomenal. King Felix was himself. That's pretty much a 3rd-place vote, guaranteed. We'll see how he adjusts to new field dimensions in Seattle next year, when the fences get moved in. My guess? His ERA goes up, but he's basically the same guy. It'll be interesting to watch. Scherzer was great, and underrated, too, for the Tigers this year. Lots and lots of strikeouts. Chris Sale had a nice year. Don't know if he'll repeat it, but it doesn't matter. There were lots of other great guys (Matt Harrison, Jake Peavy, Hiroki Kuroda, Yu Darvish, Jered Weaver, and that whole fleet of A's pitchers). I like these five best.

NL Cy Young

1. RA Dickey

2. Cliff Lee

3. Clayton Kershaw

4. Stephen Strasburg

5. Craig Kimbrel

There's really no easy answer, and Dickey's the best story. I really wanted to vote for Cliff Lee, because it would have been awesome for a player who was that unlucky to win the award. And Lee was outstanding - just unlucky. Strasburg is my more "unconventional" choice. He was great. The 160-inning thing was not his fault. And yes, that's throwing Craig Kimbrel a bone at the end. He was too good not to get a vote. I don't think I'm being controversial for leaving off Gio Gonzalez and Johnny Cueto. I just don't think they were as good as the other four starters. They were more valuable than Kimbrel, but it's not his fault how he gets used, and I think he deserves a sympathy vote more than they need one for getting to 20 "wins." Frankly, though, if you were to tell me that I'm a fool for leaving them off because they were both better than Dickey... well... I don't know that you'd be wrong. Unlike last year, which had three deserving winners, this year has no candidate at all who stands out. Also, shout-outs to Kris Medlen and Jordan Zimmermann, whom I also really wanted to vote for. And it would have been nice to toss a vote to Aroldis Chapman, as well. So that's 10 guys, none of whom could really be "wrong." Yup. That's 2012 in the NL.

*A quick aside, before we get to the MVP votes. No pitchers on my ballots this year, except Verlander. While a lot of guys had great years, there were too many inseparable offensive players, in my opinion, and no compelling reason to give a vote to a pitcher. That's not good logic. I know that. I know it, and I don't care. It's my ballot. Go cast your own. {On second thought, don't. Because voting ends tonight, Monday the 22nd}

AL Most Valuable Player

1. Mike Trout

2. Miguel Cabrera

3. Robinson Cano

4. Adrian Beltre

5. Alex Gordon

6. Austin Jackson

7. Ben Zobrist

8. Josh Willingham

9. Justin Verlander

10. Joe Mauer

Yeah, I don't really want to have "the talk." You know - the one about whether Trout was better than Cabrera. They both had great seasons. Trout was better. Cabrera won the Triple Crown. Both seasons aren't likely to be forgotten. When (not if) Cabrera wins the BBWAA MVP award, it will be no great injustice. As in many previous years, the best player may not win. As in most previous years, a deserving player is winning. This is not Fingers in '81, or Hernandez in '84, or Eckersley in '92, or even an equally bad choice not involving a relief pitcher. It'll be a great player getting a great award. Miguel Cabrera has probably been the second-best player in the AL for like 4 years in a row... it's just been someone different on top each year. It's no great sin that he'll win. Heck, I'm happy for both guys.

As for my down-the-ballot choices, what stinks is that this Trout-Cabrera hubbub has distracted from a lot of great seasons by other players. Cano had a wacky year, but I'm willing to file the weird RBI totals as just bad luck this year, because Cano's been (historically) a good clutch hitter, not a bad one. Beltre's always been great. Gordon had a breakout year. Jackson and Zobrist have been themselves: great in the field, solid with the bat. Jackson continues to be a BABIP monster - maybe the greatest player of all time, adjusting for era, at getting on with balls in play. Verlander gets a ninth-place vote. There are two Twins in my top ten. That looks wrong. But then you look at Willingham's and Mauer's years. They were great. Almost any team would be lucky to have them.

NL Most Valuable Player

1. Ryan Braun

2. Andrew McCutchen

3. Buster Posey

4. Chase Headley

5. Yadier Molina

6. David Wright

7. Jason Heyward

8. Michael Bourn

9. Aramis Ramirez

10. Joey Votto

Remember how Votto had the award locked up before he got hurt? I honor that with a 10th-place vote. For me, the MVP was Braun. Dealing with all the steroids BS (you don't know what he did or didn't do, no matter what you think), hearing that he'd regress without Fielder, and basically carrying what was a BAD club offensively and keeping them competitive for half the season makes him deserve the award, in my opinion. But really, 1-6 were all the same player. Pick one. You're not wrong. Heyward and Bourn... it's too close to call between the two of them. I like both, but I'm giving Heyward the nod. As for Ramirez, here's a neat little fact for you, going away:

Most XBH (extra-base-hits) in MLB this year:

Miguel Cabrera - 84

Robinson Cano - 82

Ryan Braun - 80

Albert Pujols - 80

Aramis Ramirez - 80

Well, I hope you like the ballot. Got any comments? Feel free to post a link to yours - I'd love to read it!

Saturday, October 6, 2012

Postseason Baseball

Who's excited? I know I am. If you're not, watch this. It gave me chills.

Thursday, October 4, 2012

Win Estimators, or Why Baseball and Football are Different

What would it take for you to believe that a Major League Baseball team went undefeated? I mean in terms of runs scored and allowed. Like, if those were the only two pieces of information you had, what would they need to be (or be like) for you to believe a season was undefeated? I think, for me, it would take something pretty incredible - like allowing no runs for the entire season (or, like 10). Because, even if a team scored like 4000 runs, but gave up 500... well, don't you think it's possible that they'd lose, I don't know... 2 or 3 games? The basic Pythagorean formula (Runs squared over (runs squared plus runs allowed squared)) says a team like that would go 159.5-2.5 (I guessed two or three games before doing the calculation, by the way, so I'm pretty proud of that guess). And that seems about right, doesn't it? I mean, even with a run differential that big, you'd still expect to lose a game or two. Which, in the scheme of a 162-game season, is nothing. But the point is, the run differential it would take to believe in an undefeated baseball season is astronomical. I mean, this hypothetical team, which averages a 25-3 game, could hypothetically lose 12-11 twice or thrice during the year, right?

But football is fundamentally different, because it takes place in a small sample size. For example, if I told you a team scored 400 points on the season, and gave up only 50, well, you'd assume they went undefeated. And you'd probably be right. The Pythagorean formula agrees with this one, because it predicts this team to go 15.75-.25... so yeah. They'd probably go undefeated.

But what if we double their points allowed? What if they scored 400, and gave up 100? That's an average score of 25-6... but would they go undefeated? The Pythagorean formula says they'd go 15-1. My guess is, in an NFL where the average team scores 22 points/game (close to the historical average, and in fact just behind the average for 2011 of 22.2), that's probably about right. But, frankly, in a league where an average team scores 22 ppg, 400 points isn't that many (average team would score 352). So we'd expect the offense to fail once in the season, even if the defense is tough.
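For anyone who wants to check those hypotheticals, here is a minimal sketch of the basic Pythagorean expectation as described above. The function name and the `exponent` parameter are my own; the exponent is left adjustable only because other values are sometimes used, and 2 is what the description above calls for.

```python
def pythagorean_wins(scored, allowed, games, exponent=2.0):
    """Expected wins from points (or runs) scored and allowed.

    Basic form: scored^2 / (scored^2 + allowed^2), times games played.
    """
    s = scored ** exponent
    a = allowed ** exponent
    return games * s / (s + a)


# The three hypothetical teams discussed above.
baseball = pythagorean_wins(4000, 500, 162)    # about 159.5 wins
football_50 = pythagorean_wins(400, 50, 16)    # about 15.75 wins
football_100 = pythagorean_wins(400, 100, 16)  # about 15.06 wins

print(round(baseball, 1), round(football_50, 2), round(football_100, 2))
```

Note that the result only reaches the full number of games when `allowed` is exactly zero, which is the whole problem with using it on a 16-game season.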

Anyway, why are we talking about this? I mean, who cares?

Well, I do. Because here are some real numbers. I'm going to list the team, their points scored/allowed, Pythag. record, and then actual record. Here goes:

2007 Patriots - 589-274; 13.8-2.2; 16-0

1985 Bears - 456-198; 14.1-1.9; 15-1

1998 Vikings - 556-296; 13.1-2.9; 15-1

1972 Dolphins - 385-171; 12.2-1.8; 14-0

2008 Lions - 268-517; 2.8-13.2; 0-16

1976 Buccaneers - 125-412; .8-13.2; 0-14

What you see here is that it's basically impossible, by the Pythagorean formula, to ever expect an undefeated season in the NFL . . . or a winless one. The reason is a quirk of the Pythagorean formula: PSsq/(PSsq+PAsq) will only yield 0 if the team scores no points, and will only yield 1 (that is, 100%) if the team allows no points. But the truth is, teams do go undefeated. So it makes no sense to use a quadratic formula when we know that football doesn't quite work that way.

So what do I suggest we do about this? Well, it's a pretty easy solution, actually. You go linear. And how does one do that? Like so:

Use the information we already have.

Figure out the number of points/game.

That's all you need.

Take the team's points differential. Divide by 2*(ppg), where ppg is the league-average points per team per game. Add to half of the number of games in a season. That's it.

For example, in 2007, all NFL teams scored 11104 points. If we divide that by 32 (number of teams), by 16 (number of games), and then multiply by two (because two teams play in each game), we get 43.375 as the combined points per game (that's the 2*(ppg) figure). The Patriots that year had a points differential of (589-274=)315 points. 315/43.375=7.26 wins, plus 8 (a half-season's worth) = 15.26 wins. So, by my formula, we'd expect the 2007 Patriots to have gone 15.3-.7... which is much closer to their actual record of 16-0 than the Pythagorean expectation, which gave them less than 14 wins (13.8).
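The same Patriots arithmetic as a short sketch, in case you want to plug in other teams. The function and argument names are made up for illustration; the inputs are the 2007 figures quoted above.

```python
def linear_wins(points_for, points_against, league_points, teams, games):
    """Linear win estimate: half a season plus differential / combined ppg."""
    per_team_ppg = league_points / teams / games
    combined_ppg = 2 * per_team_ppg  # two teams score in every game
    differential = points_for - points_against
    return games / 2 + differential / combined_ppg


# 2007 Patriots: 589 scored, 274 allowed, 11104 league points, 32 teams, 16 games.
patriots_2007 = linear_wins(589, 274, 11104, 32, 16)
print(round(patriots_2007, 2))  # about 15.26
```

Because it is a straight line, nothing stops the estimate from going below zero or above the number of games played. That is a feature here, not a bug: it is what lets the method "believe" in a 16-0 or 0-16 team.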

Here are the expectations for the other teams I mentioned:

1985 Bears - 14.0-2.0

1998 Vikings - 14.1-1.9

1972 Dolphins - 12.3-1.7

2008 Lions - 2.3-13.7

1976 Bucs - -.4-14.4

Yes, that is a negative expectation of wins for the 1976 Buccaneers. They were that bad. In every case, this linear method comes closer to the team's actual record (for the Vikings, it's one full win closer!), except the 1985 Bears, which my method misses by .1 wins more than the classic way of doing it. Frankly, I don't really see how anyone could use the Pythagorean method when one could do this, which is just as easy, works the same for middle-of-the-pack teams, and works significantly better for teams at the periphery.

Saturday, September 29, 2012

Fun with Runs Created

I've been rather prolific lately. It's been fun while it's lasted, though I don't know how much longer that'll be.

Regardless, when I was looking at all of that Chipper Jones stuff yesterday, I couldn't help but start looking at Runs Created. Why? Well, when I was thinking "What might Chipper Jones have 1500 of?" one of the things that crossed my mind was RC. And, in fact, Chipper does have over 1500 RC.

For the uninitiated, Runs Created has many versions, but the basic version is this:

RC=((H+BB)*TB)/(AB+BB)

It's a handy formula, because it only takes four variables. Just for funsies, I checked seven teams this year, to see how closely these factors matched with what we'd expect them to be. I tried to get teams from different types of parks, overachievers, underachievers, good teams, and bad. And I didn't want to do all 30 teams. So anyway, you can see how ridiculously accurate RC is. The team is listed, and then actual Runs/RC:

Milwaukee: 752/748
Pittsburgh: 639/626
Los Angeles Dodgers: 616/606
Colorado Rockies: 745/777
New York Mets: 639/640
New York Yankees: 765/789
Boston Red Sox: 721/714

So you can see that Colorado's been a pretty extreme example... but they've been awful, so it makes sense that they've been underachieving what components would have you believe they'd do.
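If you want to run the basic version yourself, here is a minimal sketch using the standard basic form, (H+BB)*TB/(AB+BB). The team line below is entirely made up for illustration (round numbers, not any real club), so only the shape of the calculation matters.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical team season line, invented for this example.
team = {"h": 1400, "bb": 500, "tb": 2300, "ab": 5500}
team_rc = runs_created(**team)
print(round(team_rc))  # about 728
```

The numerator is "times on base" multiplied by "bases advanced", and the denominator is roughly opportunities, which is why four columns from a stat sheet are all it takes.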

Anyway, RC doesn't actually work great for individual hitters. The reason is that the formula assumes that these four factors (AB, H, BB, TB) actually interact with one another. Well, obviously, Chipper Jones doesn't just interact with his own AB/H/BB/TB... he actually interacts with other people's... and those other people weren't as good as Chipper Jones. So RC naturally overestimates the abilities of good hitters. But who cares? It's still fun, and it's a good, summative way to look at basically all parts of hitting while still being easy enough to calculate yourself. So keep this in your pocket, because it's fun to pull out sometimes.

And on to the crux of this post...

So, after goofin' around with RC a little, I looked at the leaderboard on Baseball-Reference (mad shout-out to them... that's where I've gotten the stats for the last few posts I've done, but I completely forgot to credit them - sorry, Sean and Neil!), and realized that they used a more complex version of RC*. No biggy. I just used their top 100 players, and recalculated the "basic" version of RC for each player.

*Under normal circumstances, the various "technical" versions of RC don't differ that much from one another, but they do for a couple of people. Specifically, they crush the guys who have mad base-stealing skills. Barry Bonds loses 246 Runs using the basic version, and so does Joe Morgan - seriously, both lost 246, on the nose. Rickey Henderson lost - get this - 334 runs. Those are HUGE differences, and I'm sorry to have to not include them. But it does take away from the beautiful simplicity of RC the more you add. And most other players were affected by 40 runs or fewer. That sounds huge, but since we're dealing with guys who, for the most part, had 20-year careers, we're talking 2 R/year, which is pretty insignificant. 74/100 were affected by 60 runs or fewer, and no one had 60 or more runs added by using the less-technical version. Besides, while it does affect the number, it only rarely has a significant effect on the order of players, so I decided to do this as simply as possible.

Anyway, this is really just for fun. So I calc'd it, and looked at the results. Interesting, no doubt. But then I decided to look at it as a rate stat. Actually, as a ratio stat, because I didn't want to estimate PAs, or use ABs, or use AB+BB (which would have been fine, I guess). So I used batting outs (AB-H). My question was, who has "hit" .300, using RC/O (that's runs created/out)?

Well, of these 100 guys, fewer than 1/3 of them. Ruth tops the list, with nearly 1 RC for every 2 outs (.495). Hank Aaron "hit" exactly .300 by this calculation. As always, the list was populated by pre-integration guys, and guys who played in the Selig Era. Every one of the 32 guys fits into one of those two categories, or is Hank Aaron, Willie Mays, or Mickey Mantle. I was expecting Mike Schmidt, but he finished lower at .273 than (get this) Will Clark (.274). In case you're curious, the worst finisher by rate was Lou Brock, who "hit" .198. Now, I know what you're thinking - "but Brock was a basestealer! So he probably got robbed by using the basic formula!" Nope. Brock lost only 46 runs by using the basic version of RC. Still a hefty load, yes, but only enough to push him up from #100 to #98. So, yeah.

But then I thought, well, why don't I take the rate, and multiply it by the raw number. That should give a nice compromise. Of course, it does do this. If you're into algebra, write out the equation, look at the cancelling, and be amused. I was. But it doesn't really matter, because it pretty much looks right. Anyway, by this reckoning, here are the top ten hitters of all-time, balancing rate and total. They're presented as Runs Created/RCRate/RC*RCRate, with the third of those being the organizing principle.

Babe Ruth 2733/.495/1352
Ted Williams 2347/.465/1091
Barry Bonds 2646/.383/1013
Lou Gehrig 2250/.426/959
Stan Musial 2551/.348/887
Ty Cobb 2510/.346/870
Jimmie Foxx 2119/.386/818
Rogers Hornsby 2030/.387/786
Hank Aaron 2576/.300/772
Willie Mays 2333/.307/716

Something about having Aaron and Mays next to one another feels really right about this.
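Here is a rough sketch of the rate and the combined number, using Ruth as the check. The RC figure (2733) is from the list above; the career at-bats and hits (8399 and 2873) are the commonly listed totals as I recall them, so treat those two inputs as an assumption. The function names are mine.

```python
def rc_rate(rc, ab, h):
    """Runs created per batting out, with outs taken as AB - H."""
    return rc / (ab - h)


def rc_combined(rc, ab, h):
    """Rate times raw total, which works out to RC squared over outs."""
    return rc * rc_rate(rc, ab, h)


# Babe Ruth: RC from the list above; AB and H are assumed career totals.
ruth_rate = rc_rate(2733, 8399, 2873)
ruth_combined = rc_combined(2733, 8399, 2873)
print(round(ruth_rate, 3), round(ruth_combined))  # 0.495 1352
```

Multiplying by the rate is the same as squaring RC before dividing by outs, so the combined number rewards big totals twice and penalizes outs once.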

In case you're curious, since we've been talking Chipper lately, he ranks #19, for now. I say "for now" not just because he's active, but because he's two spots behind A-Rod, one ahead of Todd Helton, and two ahead of Jim Thome. Actually by B-R RC, RC rate, or RC*RCRate, Thome and Jones end up right next to each other, so I guess they belong together. But anyway, there could still be some movement among those guys, even by the end of the season, so nothing's really set in stone there. Manny Ramirez (#12) is the highest ranking "active" player. Among "active" players who actually are active, Albert Pujols tops the list (#14). The lowest ranking player among these 100 was Steve Finley. At #99 was Lou Brock. Derek Jeter ranks 1 point behind Lance Berkman (432-433). I wouldn't have guessed that. Edgar Martinez ranks at #37 - and people say he's not a HOFer. Milwaukee's own Al Simmons ranks #22. Speaking of Milwaukee, Robin Yount is all over the map. He ranks #55 by RC (behind another Milwaukee connection, Eddie Mathews), by rate he ranks #98 (ahead of Finley and Brock), and by overall, he ranks #90. Frankly, it's not bad for a SS, I think. Molly fares better, #56 overall; and since I mentioned Mathews, he's one spot ahead of Molitor.

So, that's pretty much it, I guess.

The Greatness of Chipper

Well, Cybermetrics got me thinking again with Cy's post over there today.

Players with .300 Avg, 1000 XBH, 1500 BB, 150 SB, 1500 RBI:

Chipper Jones

That's the whole list. So that's pretty cool. But here are some other exclusive lists of which he's a part.

.300/.400/.500 career guys:

Dan Brouthers
Ty Cobb
Jimmie Foxx
Lou Gehrig
Hank Greenberg
Harry Heilmann
Todd Helton
Rogers Hornsby
Shoeless Joe Jackson
Chipper Jones
Edgar Martinez
Stan Musial
Lefty O'Doul
Mel Ott
Albert Pujols
Manny Ramirez
Babe Ruth
Tris Speaker
Frank Thomas
Joey Votto
Larry Walker
Ted Williams

1500 RBI, 1500 R, 1500 BB (the 1500 Club, as it were)

Barry Bonds
Lou Gehrig
Chipper Jones
Mickey Mantle
Stan Musial
Mel Ott
Babe Ruth
Mike Schmidt
Jim Thome
Ted Williams
Carl Yastrzemski

And, of course, both groups combined (which gives us totals and rates combined):

Lou Gehrig
Chipper Jones
Stan Musial
Mel Ott
Babe Ruth
Ted Williams

Honestly, I know all about the "let's make a list fallacy," as Bill James called it, but I nonetheless find it impressive that one can create a group with just these six names on it. I mean, the other five are generally regarded as some of the best pure hitters of all time (although Mel Ott is often left out of such discussions). And only one of them played his entire career post-integration: the great Chipper Jones.

Friday, September 28, 2012

Answer to Trivia Question

Over at Cybermetrics, there's a post asking two trivia questions. First, who are the only two players with 1200+ RBI, 250+ HR, 300+ SB, and 3000+ Hits?

Easy. Got it on my first guesses. Willie Mays and Derek Jeter.

But that's actually the lesser question. The primary question is, "Through 1990, Who Were The Only 3 Players With 1000+ RBIs, 250+ HRs, 250+ SBs and 2500+ Hits?"

Answer:

Willie Mays

Joe Morgan

Vada Pinson

Since 1990 (since it's only seven more guys):

Craig Biggio

Barry Bonds

Andre Dawson

Steve Finley

Derek Jeter

Gary Sheffield

Robin Yount

Actually, both Yount and Paul Molitor narrowly miss on the first question. Yount finished 29 SBs shy... Molly missed by 16 HRs.

For the second trivia question, I noticed that including RBI actually has no bearing on the answer to the question. As to the first question, Biggio misses by 25 RBI. I guess Biggio and Jeter are pretty closely tied in my mind. I've watched them both throughout my life, both are middle infielders, both are part of a core "group" of players (Bags and Bigg in Houston, the "Core Four" in New York), both are associated with just one team (though, who knows, Jeter may finish his career elsewhere). It doesn't surprise me to see them this close statistically on such a list. The only real difference is that the media had a positive effect on Jeter's popularity; the media was more or less neutral on Biggio, though it could be argued that they had an adverse effect, by not appreciating what he did well. Anyway, that's that.

Thursday, September 27, 2012

ERA+ Estimators, All-Time Greats

So, after my last post (which I realize just went live this morning), I decided it would be fun to look at some of the great seasons of all time. If you need explanations of what the numbers mean, see that other post. Here are the numbers:

Bob Gibson, 1968: 206/106/628
Doc Gooden, 1985: 322/137/892
Greg Maddux, 1995: 235/91/494
Walter Johnson, 1910: 428/181/1585
Steve Carlton, 1972: 256/136/887
Curt Schilling, 2002: 328/196/852
Roger Clemens, 1997: 361/165/954
Pedro Martinez, 1999: 520/223/1111
Hal Newhouser, 1946: 867/156/2540
Randy Johnson, 2001: 320/207/1008
Justin Verlander, 2009: 228/112/548
Sandy Koufax, 1965: 300/207/1008
Lefty Grove, 1930: 4913/145/14299 (!!!!!)
Dizzy Dean, 1933: 945/120/2770
Cy Young, 1905: 317/123/1018

What's funny is, last year, no one broke 1000 in the final category (and I doubt anyone will this year). Basically, all of these guys either won the Cy Young in the year noted, or there was no CY, but the pitcher at hand was considered the best in the league. There are a couple of exceptions, however: Schilling '02 and Verlander '09.

Verlander lost out to Zack Greinke, who posted a 234/109/536 line, which is very, very similar to Verlander's. Schilling in '02 lost out to teammate Randy Johnson, who posted a 300/175/780 line, which is just a hair below what Schilling did that year. Neither of these could really be considered a "bad" choice, particularly Greinke over Verlander in '09, since they were nearly identical by this measure.

These columns are, of course, somewhat interesting. What I found myself most drawn to was the middle column - it's SO-BB over average (if an average pitcher faced the same number of batters). Both Randy Johnson and Sandy Koufax had differences of 207 over an average pitcher, but both pale in comparison to Pedro Martinez's 223 in 1999.

As for the other two columns, the first functions a bit like ERA+: it shows how much better the player was than the league rate. Obviously, Lefty Grove's 1930 line is absolutely ridiculous. So are Martinez's, Newhouser's, and Dean's, when you look at them. But the others? Well, they're not actually that far off from some of the best ERA+ seasons of all-time. High-200s, low-300s seems pretty normal for an historically great season. So it doesn't even seem weird.

The final column, which is the rate/league rate multiplied by innings pitched, serves to take workload into account, so that a relief pitcher's line can be compared (somewhat) to a starting pitcher's. Well, again, Grove is off the charts. But most of the other numbers are pretty tame, actually.

So, the point is, I really like this stat line, and I think I'll return to it once in a while to check up on how various pitchers are doing in the coming years.

ERA+ Estimator

Tom Tango's been doing some brilliant stuff lately. That links to an article at The Hardball Times, at which there is a discussion of (and hyperlink to) an article from THT last week about ERA estimators. Anyway, as it turns out, the best way to estimate ERA is (K-BB)/BF. That's it. Strikeouts, minus walks. Then divide by batters faced.

This is very, very interesting stuff, indeed. Because I have an idea for how I may look at Cy Young voting differently in light of this fact. Essentially, it makes sense to me to take a player's (K-BB)/BF, and divide it by the league rate. That is, it becomes (K-BB)/BF+, for all intents and purposes, or Est+ (as in "estimated+"). Here's last year's AL:

15655 SO
6949 BB
86425 BF
For a quotient of: .1007

And, in that order, here are some of the leading candidates for last year's AL CY:

Verlander - 250/57/969
Sabathia - 230/61/985
Weaver - 198/56/926
Haren - 192/33/953
Wilson - 206/74/915
Hernandez - 222/67/964

Overall, here are their personal quotient, over the league quotient (of .1007, if you recall), times 100:

Verlander - 198
Sabathia - 170
Weaver - 152
Haren - 165
Wilson - 143
Hernandez - 159
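Here is a small sketch of that Est+ calculation, using the league totals and SO/BB/BF lines given above. The function and variable names are my own. It works from the unrounded league rate, so the results land within a point of the figures listed rather than matching every one exactly.

```python
# 2011 AL league totals, as given above.
LG_SO, LG_BB, LG_BF = 15655, 6949, 86425


def k_bb_rate(so, bb, bf):
    """(K - BB) / BF for a pitcher or a whole league."""
    return (so - bb) / bf


def est_plus(so, bb, bf, lg_rate):
    """Pitcher's (K - BB)/BF over the league rate, scaled so 100 is average."""
    return 100 * k_bb_rate(so, bb, bf) / lg_rate


lg_rate = k_bb_rate(LG_SO, LG_BB, LG_BF)

# (SO, BB, BF) for each candidate, as listed above.
candidates = {
    "Verlander": (250, 57, 969),
    "Sabathia": (230, 61, 985),
    "Weaver": (198, 56, 926),
    "Haren": (192, 33, 953),
    "Wilson": (206, 74, 915),
    "Hernandez": (222, 67, 964),
}

est = {name: est_plus(*line, lg_rate) for name, line in candidates.items()}
for name, value in est.items():
    print(f"{name}: {value:.1f}")
```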

Now, if you wanted to multiply these times IP to get a playing time factor, that would be fine by me. But just know that this might be a way to look at things in the future. Or another way to look at it would be to look at the raw total difference between the pitcher at hand and an average pitcher. In other words, the league rate of (SO-BB)/BF, multiplied by the individual at hand's BF, and then subtracted from the individual's SO-BB. In mathematical terms, it would look something like this ("pl" for "player," "lg" for "league"):

plSO-plBB-(plBF(lgSO-lgBB)/lgBF)

For the aforementioned pitchers, that would yield these results (numbers truncated, rather than rounded; rate*IP in parentheses):

Verlander - 95 (496)
Sabathia - 69 (404)
Weaver - 48 (358)
Haren - 62 (394)
Wilson - 39 (319)
Hernandez - 57 (373)

Clear-cut in favor of Verlander, right?
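The raw-difference version, plSO-plBB-(plBF(lgSO-lgBB)/lgBF), can be sketched the same way. Again the names are mine, and the inputs are the 2011 AL numbers quoted above. Truncating rather than rounding, as noted, reproduces the six differences listed (95 for Verlander down to 39 for Wilson).

```python
import math

# 2011 AL league totals, as given above.
LG_SO, LG_BB, LG_BF = 15655, 6949, 86425


def k_bb_above_average(so, bb, bf):
    """Pitcher's SO - BB minus what a league-average pitcher gets over the same BF."""
    expected = bf * (LG_SO - LG_BB) / LG_BF
    return (so - bb) - expected


# (SO, BB, BF) for each candidate, as listed above.
candidates = {
    "Verlander": (250, 57, 969),
    "Sabathia": (230, 61, 985),
    "Weaver": (198, 56, 926),
    "Haren": (192, 33, 953),
    "Wilson": (206, 74, 915),
    "Hernandez": (222, 67, 964),
}

# Truncate toward zero, matching the "truncated, rather than rounded" note.
above_avg = {name: math.trunc(k_bb_above_average(*line))
             for name, line in candidates.items()}
print(above_avg)
```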

So why do I bring this up? I bring it up because of the Cy Young race in the NL right now. As it stands, here are the numbers for some of the best candidates (Est+, then SO-BB over an average pitcher, then rate times IP):

Dickey - 151/52/331
Gonzalez - 134/32/260
Cueto - 112/13/237
Kimbrel - 358/67/215
Chapman - 305/65/211
Kershaw - 151/52/321

No, those numbers for Chapman and Kimbrel are not misprints. They're seriously that much better than the league... albeit in limited numbers. However, as you can see, the difference in total numbers is not that great, as they rank ahead of all but one guy, who's out on the periphery...

Cliff Lee - 174/71/346

Seriously. The guy with the 6-8 record is blowing people away. He's actually having a great year; his team's just not winning. The Phillies' offense is terrible, and it's getting taken out on Cliff Lee, even though he's probably been the best starting pitcher in the NL this season.

Now, I don't think a number like this would ever go "mainstream," but it's interesting to think that, if it did, Cliff Lee might have a shot at the NL Cy Young. But as it stands right now, I can't imagine he's on too many people's radar screens.

Anyway, here's the AL as it stands right now, just in case you're curious:

Sabathia - 155/49/285
Hernandez - 153/54/338
Verlander - 162/66/374
Scherzer - 191/81/352
Price - 154/50/315
Weaver - 117/13/211
Darvish - 139/35/256
Sale - 158/50/298
Shields - 145/46/308
Rodney - 180/24/126
Nathan - 232/37/142

Now, obviously, the season isn't over yet, so we could see some movement on both of these leaderboards. But as it stands right now, I believe my vote for AL Cy Young, strictly on the basis of these stats, would go the way of Justin Verlander. He's pitched more innings than Scherzer, who's pitched better. They'd be 1-2, with King Felix in third, since I view the third of these columns as most significant. In the NL, I desperately want RA Dickey to win the Cy Young. I've loved the guy since he was with the Twins and I was living in Minnesota. But I can't shake the nagging suspicion that Cliff Lee is actually the NL's best pitcher. So I suppose it'll be like last year's NL MVP, in which I voted for Matt Kemp, because he was the best player... but I wanted Ryan Braun to win, because he's my favorite player. Likewise, I'll be rooting for RA Dickey, but when the time comes for the IBAs (Internet Baseball Awards - you should vote if you never have before!), as things stand right now, I'm voting for Cliff Lee.

Monday, September 24, 2012

Cabrera vs. Trout

So, everyone's weighed in on this already. Like Brian Kenny. And Joe Posnanski. And a bunch of others. What do I have to say? Well, the "statheads" are right, of course. Trout's been better than Cabrera this year, and by quite a bit. That's true whether or not Cabrera wins the Triple Crown.

But I'm not actually here to debate the two (in spite of the title), because to my mind, it's pretty cut-and-dried who's been better. I'm here to celebrate Miggy Cabrera.

As some of you who read older posts may have noticed, I love the Triple Crown. I understand that it's pretty meaningless in the grand scheme of things, but it's so flippin' fun, I just can't help but love it. But, as everyone knows, there's a HUGE amount of luck in the Triple Crown. Like take Yaz's Triple Crown: 44/121/.326. Good numbers, all. But at no point in the 1990s would he have led the league in ANY of those categories. At all. So while it's true that he won the Triple Crown, that season wouldn't have in a lot of other years.

Now, I know I'm cherry picking here, and projecting, and not adjusting for era. But Miggy's final numbers this year may be: 45/140/.330! That's incredible. Do you know how many times that's been done in MLB history? 17 times. That's it. By 7 different players. Here's the list:

Babe Ruth, 1921: 59/171/.378 (no MVP awarded)
Babe Ruth, 1926: 47/146/.372 (ineligible for MVP)
Lou Gehrig, 1927: 47/175/.373 (Won MVP)
Babe Ruth, 1927: 60/164/.356 (ineligible for MVP)
Babe Ruth, 1929: 46/154/.345 (no MVP awarded)
Babe Ruth, 1930: 49/153/.359 (no MVP awarded)
Hack Wilson, 1930: 56/191/.356 (no MVP awarded)
Babe Ruth, 1931: 46/163/.373 (Did Not Win MVP)
Lou Gehrig, 1931: 46/184/.341 (Did Not Win MVP)
Jimmie Foxx, 1932: 58/169/.364 (Won MVP)
Jimmie Foxx, 1933: 48/163/.356 (Won MVP)
Lou Gehrig, 1934: 49/165/.363 (Did Not Win MVP, Won Triple Crown)
Lou Gehrig, 1936: 49/152/.354 (Won MVP)
Joe DiMaggio, 1937: 46/167/.346 (Did Not Win MVP)
Jimmie Foxx, 1938: 50/175/.349 (Won MVP)
Hank Greenberg, 1938: 58/146/.340 (Did Not Win MVP)
Todd Helton, 2001: 49/146/.336 (Did Not Win MVP)
Miguel Cabrera, 2012?

So, you can see, players have missed out on the MVP with this kind of season before. In fact, in some ways, Gehrig's 1934 is most similar: team missed the playoffs (as the Tigers may), but Gehrig reached these milestones AND won the Triple Crown... but he lost the MVP. As you can see, there have been 17 of these seasons. But in six of them, there was no MVP awarded, or the player who accomplished the feat was ineligible. That leaves 11 seasons. Of those 11, two occurred in the same season as another. That leaves nine seasons in which a player COULD have won the MVP, finishing with a 45/140/.330 line. Someone won with that line 5 times - just over half (and keep in mind that we're not including near-misses, like Albert Belle in 1998, or Ted Kluszewski in 1954, which would lower the percentage even more). So, while Cabrera is putting up a fantastic season, it still wouldn't be unprecedented for him to not win the award; and that's true even if he did win the Triple Crown.

One final note: all of these seasons occurred in the 1920s and 1930s, the best eras in history for baseball offense, and before integration, good minor league systems, and proper scouting; well, besides the Helton season, which occurred in perhaps the most run-rich environment (outside of the Baker Bowl in the early '30s) in baseball history. So if Cabrera does it this year, we may actually say that it's a totally unique season in baseball history. And he deserves to be celebrated for that. He just doesn't deserve the MVP.

Saturday, September 22, 2012

The OPSBI Fallacy

Yes, I realize I'm a year too late to this one. But the MVP talk is a-startin', and I just know some idiot sportswriter is going to reference this "stat" when explaining their vote, so I thought it was worth another look. I realize it's been torn apart by saberists before. But I wanted to take a different look at OPSBI.

For those who don't know, OPSBI was created by Jim Bowden, the former Reds/Nats GM. The basic idea is this: you take the player's on-base percentage, and add his slugging percentage (thus, the OPS). Now, eliminate the decimal point (or multiply by 1000, however you prefer to think about it). Then, add the player's RBI. That's it.

So, first, I want to point out why this is stupid. First of all, it doesn't measure anything. At all. It arbitrarily adds a rate stat (two rate stats, actually) to a counting stat, without any consideration for why. It doesn't correlate to anything. As we know, OPS has a reasonable correlation with run-scoring. RBI represent actual runs batted in (though not runs scored). So, you're adding something which correlates to run-scoring with something that actually is one-half of run-scoring. Now, someone clever could probably make the argument that if you added Runs to this, you'd solve some of that problem. But here's the question - why add anything to OPS at all? I don't get it.

Anyway, Bill James says, "For a statistic to have value, it has to be meaningful with reference to something other than its own formula" (The New Bill James Historical Abstract, in the player comment on Craig Biggio). OPSBI fails that test.

Of course, there are still defenders out there. I don't feel like looking for articles right now, but I remember reading at least two of them last offseason. Here's the thing they'll say, more or less: "Who cares if it doesn't measure anything - it gives the right answer!" Okay, well, in my mind, it gives the "right" answer - that is, it affirms (much of the time) the conclusion sportswriters have already made. But I personally believe that I can throw out a lot of other BS stats that will do the same thing (more or less). Anyway, I'll be looking at the last five years of data (not including 2012, of course, since the season is still underway), and looking at the top MVP-finishers (non-pitchers only) for each season, and comparing them by different metrics. Those metrics are:

MVP Finish - where did the player finish in subjective MVP voting?
OPSBI - of course.
RunAvg - (R+RBI)/AB
ButTheKitchenSink*Games - Games*(RBI+R+TB+BB+HBP+SB)/(3*AB) ; I already posted about this before.
ButTheKitchenSink(rate) - (RBI+R+TB+BB+HBP+SB)/(3*AB) ; same thing, but as a rate stat (scaled to batting average); really, this is RunAvg+BattingAvg+SecondaryAvg, divided by three so that it looks like a player's batting average.
StolenHomes - SB+HR
TripleCrowns - (3*HR)+((1000*Avg-100)/2)+(RBI)
HitByBallSacs - because I couldn't resist: HBP+BB+SF+SH
rWAR - because, wouldn't it be fun if I used a metric that actually seemed to represent real value?
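
The made-up metrics in the list above can be sketched in a few lines of Python. The `StatLine` class, its field names, and the sample numbers are assumptions for illustration only (not a real player's season); the formulas are copied from the list as written.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    """One player-season. Field names are hypothetical, chosen for this sketch."""
    games: int
    ab: int
    runs: int
    rbi: int
    tb: int
    bb: int
    hbp: int
    sb: int
    hr: int
    sf: int
    sh: int
    avg: float


def run_avg(s: StatLine) -> float:
    return (s.runs + s.rbi) / s.ab


def btks_rate(s: StatLine) -> float:
    # ButTheKitchenSink as a rate stat, scaled to look like a batting average.
    return (s.rbi + s.runs + s.tb + s.bb + s.hbp + s.sb) / (3 * s.ab)


def btks_games(s: StatLine) -> float:
    # The same rate multiplied by games played.
    return s.games * btks_rate(s)


def stolen_homes(s: StatLine) -> int:
    return s.sb + s.hr


def triple_crowns(s: StatLine) -> float:
    return 3 * s.hr + (1000 * s.avg - 100) / 2 + s.rbi


def hit_by_ball_sacs(s: StatLine) -> int:
    return s.hbp + s.bb + s.sf + s.sh


# Invented stat line, purely to exercise the formulas.
sample = StatLine(games=150, ab=600, runs=100, rbi=110, tb=330, bb=70,
                  hbp=5, sb=15, hr=30, sf=5, sh=0, avg=0.300)
print(run_avg(sample), btks_rate(sample), btks_games(sample))
print(stolen_homes(sample), triple_crowns(sample), hit_by_ball_sacs(sample))
```

None of these needs more than one line of arithmetic, which is rather the point.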

One last note before presenting: in 2008, Manny Ramirez only played 53 games in the NL, so only his NL stats are included. Also, I hope this formats okay. Here come the charts:

2011 NL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Braun	1	1105	.391	57.9	.386	66	343.0	66	7.7
Kemp	2	1112	.400	63.7	.395	79	366.0	87	7.8
Fielder	3	1101	.378	62.2	.384	39	345.5	123	4.3
Upton	4	986	.326	54.2	.341	52	294.5	82	5.7
Pujols	5	1005	.352	50.0	.340	46	322.5	72	5.1

2011 AL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Ellsbury	1	1033	.339	54.9	.347	71	329.5	69	8.0
Bautista	2	1159	.405	64.6	.433	52	340.0	142	7.7
Granderson	3	1035	.437	62.3	.400	66	332.0	108	5.3
Cabrera	4	1138	.378	62.3	.387	32	337.0	116	7.3
Cano	5	1000	.356	52.1	.327	36	325.0	58	5.2

2010 NL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Votto	1	1137	.400	60.4	.403	53	349.0	101	6.7
Pujols	2	1129	.397	63.6	.400	56	358.0	113	7.3
C. Gonzalez	3	1091	.388	53.3	.367	60	353.0	49	5.8
A. Gonzalez	4	1005	.318	52.8	.330	31	362.0	101	4.1
Tulowitzki	5	1044	.391	44.6	.365	38	306.5	59	6.5

2010 AL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Hamilton	1	1144	.376	49.6	.373	40	343.5	53	8.4
Cabrera	2	1168	.432	61.4	.409	41	366.0	100	6.1
Cano	3	1023	.339	52.3	.327	32	326.5	70	7.8
Bautista	4	1119	.409	66.3	.412	63	362.0	114	6.6
Konerko	5	1088	.365	54.1	.363	39	345.0	83	4.3

2009 NL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Pujols	1	1236	.456	72.6	.454	63	392.5	132	9.4
H. Ramirez	2	1060	.359	53.9	.357	51	325.0	76	7.1
Howard	3	1072	.399	59.5	.372	53	370.5	87	3.5
Fielder	4	1155	.413	65.9	.407	48	382.5	128	6.0
Tulowitzki	5	1022	.355	54.6	.362	52	304.5	85	6.3

2009 AL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Mauer	1	1107	.363	50.9	.369	32	3028.5	83	7.6
Teixeira	2	1070	.369	56.7	.363	41	346.0	98	5.1
Jeter	3	937	.273	46.3	.302	48	269.0	82	6.4
Cabrera	4	1045	.326	53.4	.334	40	333.0	74	4.7
Morales	5	1032	.343	50.8	.334	37	329.0	56	4.0

2008 NL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Pujols	1	1230	.412	63.5	.429	44	368.5	117	9.0
Howard	2	1027	.411	59.0	.364	49	367.5	90	1.5
Braun	3	994	.324	49.3	.326	51	322.5	52	4.3
M. Ramirez	4	1285	.476	25.3	.478	19	285.0	42	3.4
Berkman	5	1092	.397	62.9	.396	47	320.0	111	6.6

2008 AL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Pedroia	1	952	.308	48.1	.306	37	280.0	73	6.8
Morneau	2	1002	.363	53.7	.330	23	325.0	89	3.9
Youkilis	3	1073	.383	52.9	.365	32	329.0	83	6.0
Mauer	4	949	.341	46.4	.318	10	267.0	97	5.3
Quentin	5	1065	.408	50.8	.391	43	316.0	89	5.1

2007 NL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Rollins	1	969	.325	53.5	.331	71	314.0	62	6.0
Holliday	2	1149	.404	60.2	.381	47	379.0	77	5.8
Fielder	3	1132	.398	63.2	.400	52	363.0	108	3.4
Wright	4	1070	.364	60.4	.377	64	329.5	107	8.1
Howard	5	1112	.435	59.2	.411	48	364.0	119	2.8

2007 AL	MVP	OPSBI	RunAvg	BTKS	BTKSr	StoHos	3Crowns	BallSacs	rWAR
Rodriguez	1	1223	.513	73.6	.466	78	421.0	125	9.2
Ordonez	2	1168	.430	60.9	.388	32	376.5	83	6.9
Guerrero	3	1075	.373	53.1	.354	29	341.0	86	4.3
Ortiz	4	1183	.424	62.6	.420	38	353.0	118	6.1
Lowell	5	999	.338	48.2	.313	24	324.0	64	4.6

So, what often happens with a chart like this is, people either skip it and wait for the conclusion, or they read it and go, "so what?" If you're in the former group, that's annoying, because charts take the most time to make. So authors are upset at you for not reading the chart. If that's you, go back and look at them. We'll wait.

Okay, now that everyone's caught up, what the heck does any of this mean?

Well, if the goal of OPSBI is to correlate to a player's true value, we have to ask, "how do we best measure a player's true value?" Thus, the first and last columns of the chart. The last, WAR, is a statistical measure. The first, MVP voting, is a completely subjective measure. If OPSBI is so good at perceiving value, we'd really like it to have some correlation to one or the other of these two.

The problem is, it doesn't. Not at all. So either both the MVP voters and WAR are wrong, while OPSBI is the true best measure... or OPSBI is crap. Now, I'll admit freely that both WAR and voters take defense into account, which OPSBI doesn't... still though, that's not (for the most part) how MVPs are won, or how WAR is decided, since offense bears so much more weight.

For instance, if OPSBI is such a good measure, why would Jacoby Ellsbury have finished above Jose Bautista last year in MVP voting? Bautista was the better offensive player... even by these made-up metrics. I just don't get what OPSBI is doing that's different from... ANY of these other things I made up. Seriously. Well, maybe not HitByBallSacs, but that's just too hilarious to not include. Anyway, the others do just as good a job predicting WAR and predicting MVPs... maybe better. So why OPSBI? I see no reason, since it doesn't even reflect voter tendencies.
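
If you'd like to put a number on that for a single chart, here is a rough sketch using the 2011 AL table above. It ranks the five players by a metric and compares those ranks with the MVP finish using Spearman's rank correlation. That statistic is just one reasonable choice, the helper names are invented, and the simple formula assumes no ties (which holds for these two columns).

```python
def ranks_desc(values):
    """Rank 1 goes to the largest value. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman(rank_a, rank_b):
    """Spearman's rho from two tie-free rank lists of equal length."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# 2011 AL rows from the chart: Ellsbury, Bautista, Granderson, Cabrera, Cano.
mvp_finish = [1, 2, 3, 4, 5]
opsbi_2011_al = [1033, 1159, 1035, 1138, 1000]
stolen_homes_2011_al = [71, 52, 66, 32, 36]

rho_opsbi = spearman(ranks_desc(opsbi_2011_al), mvp_finish)
rho_stohos = spearman(ranks_desc(stolen_homes_2011_al), mvp_finish)
print(round(rho_opsbi, 2), round(rho_stohos, 2))
```

On this one table, OPSBI comes out at about 0.3 against the MVP finish, while StolenHomes comes out at about 0.8. Five players in one league-season is far too small a sample to prove anything, but it does show how easily a throwaway stat can match the voters better than OPSBI does.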
