Monday, January 30, 2012

What Is a Compiler?

As you'll note from my last post, I've been playing around with a group of 500 players and assessing their worthiness for the Hall of Fame. One of the things that has always irked me about the Hall is when people talk about "compilers." Seriously - what's a compiler? Someone who sticks around forever and keeps putting up decent stats. Why is that bad? Isn't it good to have a long career? Well, of course it is. I do understand the perspective that if a guy gets 100 hits a year for 30 years to get to 3000, it's not the same as a guy who got 200 hits a year for ten years, then 150 a year for five, then 100 more before retiring short of 3000. But usually, when people argue against "compilers," it's really just code for, "The stats don't match my opinions, so I'll throw out a derogatory term to slight the player, instead of reconsidering my underlying assumptions." And that's really not what you want, especially from Hall of Fame voters.

Anyway, I don't just want to rant against Hall voters (though, really - who doesn't enjoy that?). What I'd like to propose is a quick mathematical model to see who the "compilers" really are. As in my last post, I'm using WAR as calculated on The Baseball Gauge. The method is simple: Take "peak" to be the player's ten best seasons. Then do some quick division: peak/career. Since I already had a pool of Hall-of-Fame-type players, I thought to just do the calculation quickly on them. I looked at the bottom thirty players - those for whom peak value was 68.4% or less of their career value. Here they are, in descending order:

Warren Spahn
Al Kaline
Lou Whitaker
Gaylord Perry
Stan Musial
Red Ruffing
Honus Wagner
Phil Niekro
Pete Rose
Bert Blyleven
Jack Quinn
Mel Ott
Greg Maddux
Rickey Henderson
Tris Speaker
Frank Robinson
Hoyt Wilhelm
Jim O'Rourke
Barry Bonds
Tommy John
Babe Ruth
Willie Mays
Ty Cobb
Roger Clemens
Dennis Eckersley
Don Sutton
Nolan Ryan
Cap Anson
Cy Young
Hank Aaron

That's right. The #1 compiler of all time is . . . Hank Aaron? Well, actually, it makes a lot of sense. Aaron was lauded for his consistency as a player, even when you adjust for ballpark, era, etc. So only 59% of his career value is wrapped up in his peak. Likewise, the next three players had such long and excellent careers that they can't be blamed for putting up value in those other seasons. There are some other odd outliers, too. Eckersley, for example, shows up because most of his value came as a starter, but his seasons as a reliever were so good that they were quite valuable, too. So he makes the list, but probably not in the way the term is meant. So here are the classic "compilers," as the argument is usually made, who show up in this cursory survey (Player, Rank; Peak WAR, Career WAR, Peak/Career):

Don Sutton, #6; 52.0, 86.8, .599
Tommy John, #11; 37.4, 59.7, .626
Bert Blyleven, #21; 64.9, 97.0, .669
Pete Rose, #22; 54.1, 80.3, .674
Phil Niekro, #23; 68.7, 101.4, .678
Gaylord Perry, #27; 65.2, 95.5, .683
Lou Whitaker, #28; 45.8, 67.1, .683

So, it actually appears that there may be some validity to this criticism of these players, after all. In particular, there is a real question about John, because his peak was pretty unspectacular. However, I would say that, excepting an extreme case like his, the "compiler" argument doesn't hold water, because most of the greatest players of all time were compilers, too. I guess I just don't see, "You have a lot in common with Hank Aaron" as being that bad of a criticism, is all.
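If you want to play with the peak/career split yourself, here is a minimal Python sketch of the calculation described above: sort a player's seasons by WAR, call the ten best the "peak," and divide by the career total. The function name `peak_share` and the seasonal numbers are made up for illustration; they are not from The Baseball Gauge. The last line just re-checks Don Sutton's published totals from the list above.

```python
def peak_share(season_wars, peak_seasons=10):
    """Return (peak WAR, career WAR, peak/career) for a list of seasonal WAR values.

    "Peak" here means the sum of the player's best `peak_seasons` seasons,
    as in the post. Assumes career WAR is positive.
    """
    ranked = sorted(season_wars, reverse=True)  # best season first
    peak = sum(ranked[:peak_seasons])
    career = sum(ranked)
    return peak, career, peak / career


# Hypothetical 15-season career (illustrative numbers, not real data).
hypothetical = [6.0, 5.5, 5.0, 5.0, 4.5, 4.0, 4.0, 3.5,
                3.0, 3.0, 2.5, 2.0, 2.0, 1.5, 1.0]
peak, career, share = peak_share(hypothetical)
print(f"peak={peak:.1f} career={career:.1f} share={share:.3f}")

# Re-check a published line: Don Sutton, 52.0 peak WAR, 86.8 career WAR.
print(round(52.0 / 86.8, 3))
```

The lower the share, the more of a "compiler" the player is by this definition. A player with ten or fewer seasons always comes out at exactly 1.0.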

Thanks to The Baseball Gauge for the data (see above for the link).

Saturday, January 28, 2012

Baseball Hall of Fame Rankings

For a while, I've been envisioning a system to rank players by Hall of Fame worthiness. There are a couple of problems, though. First, I had no access to good data. Second, the market is already flooded with good systems - Bill James's Hall of Fame Monitor and Hall of Fame Standards (which rank likelihood of entering the Hall, not merit), Jay Jaffe's JAWS system at Baseball Prospectus, Mike Hoban's CAWS system at Seamheads, and Adam Darowski's wWAR system at Baseball Think Factory. They each use different mathematical models and bases - James uses standard, "newspaper" statistics; Jaffe uses Baseball Prospectus' WARP (Wins Above Replacement Player); Hoban uses Bill James's Win Shares; and Darowski uses rWAR (or, as some would call it, bWAR) - the system invented by Rally (Sean Smith) and the most common WAR system, thanks to being hosted at baseball-reference.com. While the systems are different (and I don't really want to get into the nitty-gritty here - go to their sites if you're really interested), they reach largely the same conclusions. So why would I want to do it myself when I don't have the data and there's already good stuff out there?

Well, for one, I have always found value in doing something myself, even if others have already come up with a way to do things. Second, these systems all do some things with which I disagree. Third, I found a way to get the data I needed. Over at one of the sites I frequent, The Baseball Gauge, the proprietor, Dan Hirsch, has all of his data free for download. I actually needed a little extra help, but he's super nice about stuff; we corresponded over e-mail, and he gave me what I needed. He developed a WAR system over there (Base Runs for offense; Runs Saved, similar to Win Shares, for defense; DIPS 2.0 for pitching), and it's free to use.

So now that I had the data, what was my problem with the other systems? Well, several of them (and others online) use some arbitrary cutoffs - something along the lines of [peakWAR+careerWAR]/2. That's the basic formula. Of course, there's really nothing wrong with that. However, how does one define "peak" WAR? 4 seasons? 5 seasons? 7 seasons? 10 seasons? I've seen all of those iterations. So I went to work on the problem. Here are the basic principles to which I stuck through this process.

1.) 10 seasons is the key. Why? Well, it's not arbitrary, for one. The Baseball Hall of Fame requires 10 seasons played for entry. Since that's one of the few rules, it makes sense to me to stick to it.

2.) Big seasons are better than consistency. Imagine two guys with 8 WAR over two years. One of them has 7 WAR one year, 1 the next. The second guy has 4 each year. I prefer the first guy, because with a season that big, you're almost guaranteed to make the playoffs, and have a shot at the World Series. With the second guy, well, lots of guys manage 4 WAR. That may not help the team win. And sure, the first guy may not help at all the second year (with only 1 WAR, he'd be a sub-average player), but, as they say, "Flags fly forever" - in other words, the goal is to win, and you can't take away a pennant. So the more a player helps to win a pennant, the more it counts. Of course, I'm not counting actual pennants, so I'm going with the seasons that give you the best chance of winning pennants - the biggest seasons.

3.) More different things will give a better estimate than just one thing. I used three different systems, each consisting of two parts. I'll explain.

So, here's the actual system. First, rank the seasons, descending from best WAR to worst. Then...

#1: Add up total WAR. (=A)
#2: Add up total WAR in top ten seasons. (=B)
#3: Add up total WAR, counting the top season 30 times, the next season 29 times, the next season 28 times, the next season 27 times, etc., until you've exhausted all the player's seasons. Then divide by 30. (=C)
#4: Add up total WAR in top ten seasons, counting the top season 10 times, the next season 9 times, the next season 8 times, the next season 7 times, etc., until you've used up all ten seasons. Then divide by 10. (=D)
#5: Add up total WAR, counting the top season 45 times, the next season 44 times, the next season 43 times, the next season 42 times, etc., until you've exhausted all the player's seasons. Then divide by 45. (=E)
#6: Add up total WAR in top ten seasons, counting the top season 15 times, the next season 14 times, the next season 13 times, the next season 12 times, etc., until you reach the tenth season, which will count six times. Then divide by 15. (=F)
#7: You'll now have six numbers, any of which could be the best indicator. So now, we take the harmonic mean of the six numbers:

6/[(1/A)+(1/B)+(1/C)+(1/D)+(1/E)+(1/F)]

Voila!
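For the code-minded, steps #1 through #7 can be sketched in Python roughly like this. It's my reading of the recipe above, not Dan's code or anything official. The function names are made up, and I'm assuming that a career longer than the top weight (say, more than 30 seasons in step #3) simply gets a weight of zero for the extra seasons, and that all six numbers are positive (the harmonic mean breaks down otherwise).

```python
def weighted_total(ranked, top_weight, limit=None):
    """Count the best season top_weight times, the next top_weight - 1 times,
    and so on, then divide by top_weight.

    `ranked` must already be sorted best-to-worst. `limit` restricts the sum
    to the first `limit` seasons (used for the top-ten variants).
    """
    seasons = ranked if limit is None else ranked[:limit]
    # Assumption: weights stop at zero rather than going negative.
    total = sum(war * max(top_weight - i, 0) for i, war in enumerate(seasons))
    return total / top_weight


def harmonic_mean(values):
    """Harmonic mean: n / (1/x1 + ... + 1/xn). Requires positive values."""
    return len(values) / sum(1.0 / v for v in values)


def hof_score(season_wars):
    """Combine the six totals A-F from steps #1-#6 using step #7."""
    ranked = sorted(season_wars, reverse=True)
    a = sum(ranked)                            # 1: career WAR
    b = sum(ranked[:10])                       # 2: top ten seasons
    c = weighted_total(ranked, 30)             # 3: weights 30, 29, 28, ...
    d = weighted_total(ranked, 10, limit=10)   # 4: top ten, weights 10..1
    e = weighted_total(ranked, 45)             # 5: weights 45, 44, 43, ...
    f = weighted_total(ranked, 15, limit=10)   # 6: top ten, weights 15..6
    return harmonic_mean([a, b, c, d, e, f])


# The two hypothetical 8-WAR players from principle 2.
print(hof_score([7.0, 1.0]))  # one big season
print(hof_score([4.0, 4.0]))  # two steady seasons
```

Both players have the same A and B, but every weighted total is a bit higher for the 7-and-1 guy, so he comes out slightly ahead, which is exactly what principle 2 asks for.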

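For example, here are some lists. In each row, the numbers after the player's name are A through F, in that order, followed by the harmonic mean. The top 11 at third base: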

schmimi01 Mike Schmidt 94.7 70.9 75.5 41.3 81.9 51.2 63.9
matheed01 Eddie Mathews 88.2 68.9 71.1 40.7 76.8 50.1 61.6
boggswa01 Wade Boggs 76.5 61.9 62.8 38.5 67.3 46.3 55.8
brettge01 George Brett 78.0 59.4 63.1 36.6 68.1 44.2 54.5
jonesch06 Chipper Jones 71.9 52.8 56.7 31.2 61.7 38.4 48.1
bakerfr01 Frank Baker 56.5 54.4 49.6 35.4 51.9 41.7 47.0
santoro01 Ron Santo 56.1 53.4 48.9 34.1 51.3 40.6 46.0
hackst01 Stan Hack 58.6 49.4 48.9 30.8 52.1 37.0 44.0
evansda01 Darrell Evans 59.1 45.5 47.8 28.4 51.6 34.1 41.7
mcgrajo01 John McGraw 48.0 46.9 42.8 32.4 44.5 37.3 41.2
rolensc01 Scott Rolen 53.7 47.7 45.2 28.7 48.0 35.0 41.1
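As a quick sanity check on how to read these rows, here is a small Python sketch that takes Mike Schmidt's line from the list above and recomputes the last number from the six before it. It assumes the six numbers after the name are A through F in order and the seventh is the harmonic mean; the parsing is just an illustration.

```python
def harmonic_mean(values):
    """Harmonic mean: n / (1/x1 + ... + 1/xn). Requires positive values."""
    return len(values) / sum(1.0 / v for v in values)


row = "schmimi01 Mike Schmidt 94.7 70.9 75.5 41.3 81.9 51.2 63.9"

# The last seven fields are the numbers; everything before is ID and name.
numbers = [float(x) for x in row.split()[-7:]]
*components, published = numbers  # A-F, then the published final score

recomputed = harmonic_mean(components)
print(round(recomputed, 1), published)
```

The recomputed value agrees with the published 63.9 to one decimal place, so the column reading holds up. Because the harmonic mean leans toward the smallest inputs, the final number sits closer to D and F than to career WAR.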

The top 12 at catcher (because I was part of the discussion over at Baseball: Past and Present):

benchjo01 Johnny Bench 75.2 63.5 63.0 39.8 67.1 47.7 56.7
cartega01 Gary Carter 68.7 60.2 57.6 37.0 61.3 44.7 52.5
berrayo01 Yogi Berra 75.2 58.4 60.4 34.6 65.3 42.6 52.3
piazzmi01 Mike Piazza 61.0 54.7 52.2 35.3 55.1 41.8 48.3
dickebi01 Bill Dickey 67.5 52.9 54.9 32.7 59.1 39.4 48.1
cochrmi01 Mickey Cochrane 58.7 53.8 50.0 33.0 52.9 40.0 46.2
fiskca01 Carlton Fisk 67.8 47.4 53.2 29.5 58.1 35.5 44.8
hartnga01 Gabby Hartnett 64.4 47.5 51.3 29.1 55.6 35.2 43.8
rodriiv01 Ivan Rodriguez 65.2 47.0 51.2 28.3 55.9 34.5 43.4
torrejo01 Joe Torre 55.3 46.2 46.2 29.5 49.2 35.1 41.6
simmote01 Ted Simmons 52.5 47.9 45.3 28.6 47.7 35.0 41.0
tenacge01 Gene Tenace 48.0 44.3 41.4 28.5 43.6 33.7 38.6

How about the only 3 DHs I checked, because that's a short list:

molitpa01 Paul Molitor 64.9 49.9 51.8 28.8 56.2 35.9 44.4
martied01 Edgar Martinez 57.9 49.5 48.1 29.2 51.4 36.0 42.9
baineha01 Harold Baines 36.9 28.1 30.0 17.4 32.3 21.0 25.9

One last one. How about top 11 CF:

cobbty01 Ty Cobb 153.2 95.1 112.3 57.3 126.0 69.9 91.4
speaktr01 Tris Speaker 138.5 88.5 103.7 52.3 115.3 64.4 83.9
mantlmi01 Mickey Mantle 120.4 87.8 95.1 53.4 103.6 64.8 81.1
mayswi01 Willie Mays 136.5 85.0 101.0 49.0 112.8 61.0 80.4
dimagjo01 Joe DiMaggio 80.8 69.3 67.6 42.0 72.0 51.1 60.7
griffke02 Ken Griffey 74.4 59.6 60.2 35.5 65.0 43.8 52.9
hamilbi01 Billy Hamilton 70.8 61.2 58.9 36.0 62.9 44.4 52.8
snidedu01 Duke Snider 62.5 54.5 52.8 34.9 56.0 41.4 48.4
wynnji01 Jimmy Wynn 57.4 55.5 50.1 35.3 52.6 42.1 47.4
careyma01 Max Carey 67.7 52.6 54.5 31.3 58.9 38.4 47.2
edmonji01 Jim Edmonds 59.2 52.0 49.7 31.9 52.9 38.6 45.3

Hope that was interesting. I'd love to hear thoughts about this kind of stuff.

As one final thing, I'd just like to thank Dan Hirsch for all his help, and to recommend The Baseball Gauge to anyone out there reading this. It's a great resource.

Tuesday, January 10, 2012

The Baseball Hall of Fame, 2013 Preview

Long time no write!

As of my last writing, Ron Santo was the newest inductee to the Baseball Hall of Fame. Well, on his heels yesterday came the announcement that Barry Larkin would be joining him. Congrats to Barry. My all-time favorite non-Brewers player is going into the Hall! He's really the first player I've ever had any emotional attachment to who got the call to Cooperstown, so I'm pretty excited (although that excitement is just a bit muted by the fact that his election this year was pretty inevitable).

Anyway, what better time to talk about next year's ballot than today? Is it a bit disrespectful to the inductee? Nah - yesterday was his day, and he gets an even bigger day this summer. Rather, I think that, with one of the most loaded ballots in history coming up next year, it's about time we discussed that. So what will next year's ballot look like?

Here are the holdovers from the ballot:

Jack Morris
Jeff Bagwell
Lee Smith
Tim Raines
Alan Trammell
Edgar Martinez
Fred McGriff
Larry Walker
Mark McGwire
Don Mattingly
Dale Murphy
Rafael Palmeiro
Bernie Williams

And here are the (notable/vote-able) guys who are going to appear on the ballot for the first time:

Barry Bonds
Roger Clemens
Mike Piazza
Sammy Sosa
Curt Schilling
Craig Biggio
Kenny Lofton
Upgrade-of-Jack-Morris-with-worse-press . . . I mean, David Wells

Then, the handful of guys who deserve a couple of votes next year:

Julio Franco
Steve Finley
Reggie Sanders
Jeff Cirillo
Shawn Green
Maybe the all-time record holder for most fingers, Antonio Alfonseca

Compared to this year's first-year class, or as I like to call them, "Bernie Williams and the Pips," it's pretty amazing how stacked next year will be. One of the big predictions people are making is that, because Jack Morris garnered 2/3 of the vote this year, his election is inevitable. Well, I could actually see him taking a step back next year because of the loaded ballot. And I'm not really sure that ANYONE will garner election next year. If I had to guess, it would be Morris, Schilling, or Biggio (in no particular order) who would be most likely, because of the steroid stain on the others. And the next year's not much better. It will add:

Greg Maddux
Frank Thomas
Tom Glavine
Jeff Kent
Mike Mussina
Luis Gonzalez
Eric Gagne
Moises Alou
Kenny Rogers

All of those guys should get SOME votes, and I would venture to say that Maddux is automatically a first-ballot guy.

Anyway, what I really want to write about is what my ballot would have been/would be this year, next year, and the year after.

First, my strategy. I think the Hall voters are too stingy. Were I one of them, I'd vote for ten guys every year until there weren't even 10 remotely electable guys left. Sure, I might use one of those ten as a "courtesy" vote, but I'd fill my ballot. This year:

Barry Larkin
Jeff Bagwell
Tim Raines
Alan Trammell
Edgar Martinez
Fred McGriff
Larry Walker
Mark McGwire
Rafael Palmeiro
Brad Radke

My tenth vote could have just as easily gone to Dale Murphy, but I would have shown Bradke some love. Anyway, with Larkin (and Radke) disappearing from the ballot and the aforementioned new class, here is my preliminary ballot for next year:

Barry Bonds
Roger Clemens
Jeff Bagwell
Tim Raines
Alan Trammell
Sammy Sosa
Curt Schilling
Craig Biggio
Edgar Martinez
Mark McGwire

That was really, really tough. And like I said, I don't really expect anyone on next year's ballot to get elected. So that creates my 2014 ballot:

Barry Bonds
Roger Clemens
Greg Maddux
Jeff Bagwell
Frank Thomas
Tim Raines
Craig Biggio
Curt Schilling
Mike Piazza
Mark McGwire

That's right, I left Alan Trammell, Edgar Martinez, Sammy Sosa, Fred McGriff, Larry Walker, Rafael Palmeiro, Tom Glavine, Mike Mussina, Kenny Lofton, and Dale Murphy (all of whom I see as HOF guys) off the ballot. That's ten guys - a FULL BALLOT. So if no one gets elected next year, which seems reasonable, the backlog becomes pretty much untenable. I don't know that ANYONE ever gets elected again, because there may always be too many bodies stuck in the doorway, so no one can get through. It will be interesting to see how all of this plays out.