Friday, December 6, 2013

Overrated and Underrated in MLB

Over at HHS today, Bryan O'Connor posted a query about who the most over- and underrated players in MLB were.  And specifically, he asked that we (if we were so inclined) devise a method for measuring it.  I came up with separate lists for position players and pitchers, and I want to post the comments I wrote as I came up with my method over here on my own blog.  Why have something so fun only floating around on OTHER people's sites?

Here's what I wrote:

-------------

To mimic fantasy baseball, I took five categories: HR, R, RBI, SB, and H (I took H instead of BA, because I wanted them all to be cumulative). Then I set up a fake fantasy scoring system: 1 pt per hit, 2 pts per RBI or R, 5 pts per HR or SB. This way, everything is (sort of) scaled to H, such that 200 H=100 RBI=100 R=40 HR=40 SB, and a 200 H, 100 RBI, 100 R, 40 HR, 40 SB season is worth 1000 points (that’s a pretty awesome season, I think). Then, for one season, I would divide the number by 2500 – because 1000/2500=.400, which is a batting average representatively awesome enough that it goes with the stats I posted earlier.

Then, I took WAR/25 (in other words, the above season would be seen as equal to a 10 WAR season). You can quibble with my number sense or not, but the goal was to come up with a system, right?
Anyway, for a running three year total, I still just added everything together and divided, but by 7500 and 75, respectively, so the results were scaled to one another. Then I just subtracted the second column from the first (I tried dividing; that didn’t work very well because of negative numbers). I used all players with 1000 PAs in the last 3 years. This made for 226 hitters Fangraphs gave me in the Custom Report I generated for this project.
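
For anyone who wants to replay the hitter arithmetic, here is a minimal sketch of the calculation described above. The function and parameter names are hypothetical; only the weights (1/2/2/5/5) and the divisors (2500 and 25 per season) come from the text.

```python
# Sketch of the hitter "overrated" score. Names are made up for illustration;
# the weights and divisors are the ones stated in the post.

def fantasy_points(h, rbi, r, hr, sb):
    """Fake fantasy score: 1 per H, 2 per RBI or R, 5 per HR or SB."""
    return h + 2 * rbi + 2 * r + 5 * hr + 5 * sb

def overrated_score(h, rbi, r, hr, sb, war, seasons=1):
    """Fantasy rate minus WAR rate; positive means 'overrated'."""
    fantasy_rate = fantasy_points(h, rbi, r, hr, sb) / (2500 * seasons)
    war_rate = war / (25 * seasons)
    return fantasy_rate - war_rate

# The "pretty awesome" reference season from the text.
ideal = fantasy_points(h=200, rbi=100, r=100, hr=40, sb=40)
print(ideal)                                            # 1000
print(overrated_score(200, 100, 100, 40, 40, war=10))   # 0.0
```

For the three-year version, pass three-year totals with `seasons=3`, which reproduces the 7500 and 75 divisors.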

The Most Overrated Players:
Adam Dunn
Mark Reynolds
Eric Hosmer
Raul Ibanez
Alex Rios
Michael Young
Nelson Cruz
Ichiro Suzuki
Delmon Young
Rajai Davis

The Most Underrated Players
Buster Posey
Mike Trout
Yadier Molina
Joey Votto
Carlos Ruiz
AJ Ellis
Joe Mauer
Evan Longoria
Matt Carpenter
Ben Zobrist

Now, if you ask me, that list looks pretty darn close to right, as far as who sabermetrically-minded folk see as the superstars of baseball, as opposed to what fantasy-focused folk see. This worked pretty well. I’m gonna try something with pitchers, and see what I can do. Be right back…

 -------------

And my second comment (this one has been corrected, since I made an error when I initially posted it on HHS):

-------------


I’m back!

I did something similar for pitchers.

The categories I used were IP, W, SV, SO, ER, BB, and H. I shaped them into five “buckets,” and all had to be cumulative, just like the hitters. The five buckets I used were IP, 10*W+5*SV, SO, IP-ER, and IP*2-(BB+H). They also had to be scaled to one another, so just as the hitters were scaled to H, the pitchers are scaled to IP. The “ideal” season (worth 1000 “points,” just like with the hitters) is this: 200 IP, 20 W or 40 SV, 200 SO, 50 ER (2.25 ERA), 200 BB+H (1.000 WHIP). I used 180 IP minimum, so that the report was roughly the same size as the hitters (I had 226 hitters, 239 pitchers).
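
A rough sketch of the pitcher buckets, for anyone who wants to check the arithmetic. The function name is hypothetical, and the split of the "ideal" 200 BB+H into 50 BB and 150 H is an assumption made only so the example runs; the bucket definitions are the ones listed above.

```python
# Sketch of the five pitcher buckets. Only the bucket formulas come from the
# post; the function name and the BB/H split below are illustrative.

def pitcher_points(ip, w, sv, so, er, bb, h):
    buckets = [
        ip,               # innings pitched
        10 * w + 5 * sv,  # wins and saves
        so,               # strikeouts
        ip - er,          # run prevention
        2 * ip - (bb + h) # baserunner prevention
    ]
    return sum(buckets)

# "Ideal" starter line from the text: 200 IP, 20 W, 200 SO, 50 ER, 200 BB+H.
starter = pitcher_points(ip=200, w=20, sv=0, so=200, er=50, bb=50, h=150)
print(starter)  # 950
```

With the buckets exactly as listed, that line sums to 950 rather than 1000, because the IP-ER bucket contributes 150 instead of 200. A 40-save version of the same line gives the same total, since 5*40 equals 10*20.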

I’m not as confident in this method as I am with the hitters; especially using Fangraphs WAR, rather than some combination of B-R and Fangraphs, which would be my preference. Nonetheless, the list actually looks pretty good, I think. So here they are.

The Most Overrated Pitchers
Bronson Arroyo
Ervin Santana
RA Dickey
Tim Lincecum
Yovani Gallardo
Jeremy Hellickson
James Shields
Ian Kennedy
Jason Vargas
Kyle Lohse

The Most Underrated Pitchers
Matt Harvey
David Robertson
Phil Coke
Roy Oswalt
Aaron Cook
Kevin Millwood
Javier Vazquez
Chris Carpenter
Matt Belisle
Hyun-Jin Ryu

There are a lot of relief pitchers on that bottom list. Before you go about saying I’ve overvalued saves, only three of those guys (Holland with 67, Papelbon with 98 and Kimbrel with 138) have more than 4 saves in the last three years. And when I changed the formula to be SV/3 instead of SV/2, it was still the same 10 names – the order only changed slightly.

Again, I’m not as confident about this as the position player list. I don’t know that it’s accurate, but it’s certainly one way of looking at it.

-------------

Specifically, I wanted to post a couple of things here that I thought about as I did this little project.

First, what are some factors that lead to over- or underrating players?

1.  Park factors - Hitters in hitters' parks are overrated; likewise for pitchers in pitchers' parks.
2.  Guys who provide a lot of defensive value - This is obvious; it's not even measured in the former measure.
3.  Positional scarcity - Also not taken into account is position.  To be fair, I generalized to "fantasy players," and fantasy players are usually VERY aware of positional scarcity, so that's not entirely fair.  The mainstream public, though, is not.
4.  DIPS theory - Using Fangraphs WAR, this is going to have a huge impact on who's rated well or poorly.  RA Dickey's numbers suffer from this in particular.  He may actually be a bit overrated (since his performance last year was so bad), but Fangraphs openly acknowledges that FIP-based WAR underrates knuckleballers, so Dickey suffers.
5.  Outs used - Leadoff hitters who don't walk a lot can accumulate MANY more at-bats than players who hit lower in the lineup and take the occasional walk.  Since all of the stats used in the former calculation are at bat-based (well, technically RBI and R aren't, but RBI in particular come pretty scarcely on BBs), that's a major factor.  Plus, if you imagine two guys, each with the same five "basic" stats, wouldn't you prefer one who did all that in 500 ABs, but with 100 BBs and no CS if the other guy also had 600 PA, but no walks and 15 CS?  The first would have created 300 outs; the second would have created 415.  It's an ENORMOUS difference, and it's often not thought of.
6.  Guys who hit lots of doubles - Doubles are the ugly stepsister of hits.  They're better than singles, but the "H" and "HR" columns tell you nothing about them.  Ditto for three-baggers; there are just fewer of those, so they're less of a thing to worry about.  Guys who slap a bunch of those hits, though, are tremendously underrated by the "newspaper" stats.
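
The outs comparison in item 5 can be checked with a few lines. This is only a sketch: it assumes both players have 200 hits (the figure implied by the 300 and 415 totals), and it ignores HBP, sacrifices, and double plays. The function name is made up for illustration.

```python
# Sketch of the "outs used" comparison from item 5.
# Assumption: both hypothetical players collect 200 hits.

def outs_made(pa, bb, h, cs):
    ab = pa - bb            # at-bats, ignoring HBP and sacrifices
    return (ab - h) + cs    # batting outs plus caught stealing

patient = outs_made(pa=600, bb=100, h=200, cs=0)
hacker = outs_made(pa=600, bb=0, h=200, cs=15)
print(patient, hacker, hacker - patient)  # 300 415 115
```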

One last thought I should share is just the formulas for each, in long form.

For hitters:

(H+2*RBI+2*R+5*HR+5*SB) / 2500

For pitchers:

(IP*4+SO+10*W+5*SV-ER-H-BB) / 2500

The idea is that these are equivalent; they're probably not.  But it's really just for fun, so don't worry about it too much!  Any thoughts?

Saturday, November 30, 2013

WARSCOR 3.1 & the 2014 Hall of Fame Ballot

I just recently rolled out WARSCOR 3.0, so why would I mention WARSCOR 3.1?  Because I made a tweak.  The tweak is so minor, in fact, that it would probably more appropriately be called WARSCOR 3.0.0.1, but I'm not going to make so many updates (I hope) that that will be necessary.  So WARSCOR 3.1 it is.  The only change between WARSCOR 3.0 and 3.1 is multiplying by a constant at the end.  Why?  Because I think career WAR numbers have actually developed a sort of currency in the world of baseball stats today.  And WARSCORs are always lower than career WAR numbers, so it's hard to tell what they mean.  What does a "40" mean?  Well, instead of leaving it as it is, we're going to multiply everything by 1.618 at the end.  Why?  Because phi doesn't get as much love as pi, even though it's also a cool irrational number.  But also because that worked out really well as the number - it brought everything pretty well in line with what it needed to be.

So anyway, I'm now presenting all 36 candidates for the BBWAA Hall of Fame vote, as well as the 6 players on the Veterans Committee ballot (I'll put the VC nominees in italics, so you can tell them apart).  I'll post each player's WARSCOR, career WAR, HOF monitor and HOF standards.  The last two of these were defined by Bill James in The Politics of Glory (also known as What Happened to the Hall of Fame?) as good measures of a candidate's qualifications for the Hall of Fame.  In HOFm, 100 means (roughly) a Hall of Famer, while 120 signifies a virtual lock.  In HOFs, 45-ish is around the average HOF player, so scores around 35 or above merit consideration, while a score in the 60s signifies a virtual lock.  Without further ado, here are the candidates for the Baseball Hall of Fame and Museum in Cooperstown, NY, for 2014:

Barry Bonds:  131.9, 162.6, 340, 76
Roger Clemens:  115.8, 139.2, 332, 73
Greg Maddux:  93.5, 104.8, 254, 70
Curt Schilling:  80.6, 80.7, 171, 46
Jeff Bagwell:  78.3, 79.6, 150, 59
Mike Mussina:  76.0, 82.7, 121, 54
Larry Walker:  72.6, 72.4, 148, 58
Frank Thomas:  71.7, 73.6, 194, 60
Alan Trammell:  70.5, 70.3, 118, 40
Edgar Martinez:  68.8, 68.1, 132, 50
Tom Glavine:  67.6, 74.0, 176, 52
Tim Raines:  67.0, 68.8, 90, 47
Craig Biggio:  66.7, 64.8, 169, 57
Rafael Palmeiro:  66.0, 71.8, 178, 57
Sammy Sosa:  65.0, 58.3, 202, 52
Mark McGwire:  64.6, 62.0, 170, 42
Mike Piazza:  64.6, 59.1, 207, 62
(Joe Torre:  57.6, 57.3, 96, 40)
Jeff Kent:  56.6, 55.0, 122, 51
Tommy John:  55.8, 62.2, 112, 44
Fred McGriff:  55.2, 52.4, 100, 48
Kenny Rogers:  54.5, 51.2, 66, 29
Ted Simmons:  53.9, 50.3, 124, 44
Luis Gonzalez:  53.4, 51.2, 103, 48
Dave Parker:  50.6, 40.0, 124, 42
Don Mattingly:  50.3, 42.2, 134, 34
Jack Morris:  48.2, 43.9, 122, 39
Dave Concepcion:  44.0, 40.1, 106, 29
Moises Alou:  42.9, 39.8, 80, 44
Steve Garvey:  41.6, 37.5, 130, 32
Ray Durham:  37.7, 33.7, 64, 33
Lee Smith:  31.4, 29.3, 135, 13
Dan Quisenberry:  31.2, 24.8, 77, 19
Hideo Nomo:  29.4, 21.7, 24, 14
Paul Lo Duca:  24.4, 18.0, 21, 26
Richie Sexson:  23.6, 17.8, 46, 21
Armando Benitez:  23.0, 19.2, 73, 14
Sean Casey:  22.2, 16.3, 38, 19
Mike Timlin:  21.0, 19.4, 49, 8
Jacque Jones:  16.8, 11.4, 8, 12
JT Snow:  16.8, 11.7, 16, 16
Eric Gagne:  16.3, 11.7, 46, 17
Todd Jones:  14.6, 10.4, 78, 3

Some thoughts:  WARSCOR 3.1 ranks the candidates (even the ones at the bottom of the list) within two spots of where JAWS ranks them compared to one another, with the exception of Tom Glavine.  JAWS is much more bullish on Glavine than I am, but that's only because (in my opinion) JAWS's use of a 7-year peak gives one a more favorable impression of Glavine's peak than is merited.  I thought that was kinda neat, though. ...  You may have noticed that Joe Torre is included, in spite of not being on the ballot as a player.  I thought it would be fun to include him, to see where he shook out.  By WARSCOR, he's a better bet as a player than everyone on the Vets ballot!  He should have received MUCH more consideration than he did.  It's a shame that he didn't; that being said, though, he will get in as a manager, so at least he's got that going for him. ...  Can you believe the difference between the HOFs and HOFm for Lee Smith?  CRAZY!  If you have any familiarity with those metrics, you'll know that HOFs rewards career accomplishments, while HOFm looks at a career on a season-by-season basis.  Usually, they're in pretty good agreement - but that's the most extreme difference I've ever seen.  That HOFm score (135) is very good - first-ballot-electee kind of good - but that HOFs score (13) is more of an off-the-ballot-with-only-one-vote-in-the-first-year kind of score.  Insane difference.  No wonder he's hovered around 50% forever - measured one way, he obviously deserves it - measured the other, he's not worth a second look.  Interesting. ...  There are eighteen (EIGHTEEN) players on the BBWAA ballot who are more qualified via WARSCOR than anyone from the VC ballot.  It's been said before, but - can you say "logjam?" ...  Did you know that the link between Bonds and steroids starts with 1999, and he played 13 seasons before that?  Did you know that Roger Clemens' association with steroids starts with his time in Toronto, and that he also played 13 seasons before that?  
If you took only their first 13 seasons, Bonds would have a WARSCOR of 100.0, and Clemens' would be 88.3.  They would still be the best and third-best players on the ballot, respectively.  They will eventually get in, because a Hall of Fame without those two is a bit of a farce. ...  Did you see how close some of the WARSCOR numbers are to the career WAR numbers of some of these players?  For Schilling, Walker, Trammell, and Torre, the number is within 0.3 of their actual career WAR.  It seems to break down below 40 WAR and above 80, but when we talk borderline HOF players, we're almost always in the 40-80 range, so that's where it's most important for it to "work."  Besides, the 1.618 multiplier is pretty irrelevant - it's just to make things look more "normal" to people who are familiar with WAR, so regardless of the fact that it seems to break down a bit, you know that a player with a WARSCOR<40 is not really a candidate, and a player >80 is a shoo-in.  And since it's really designed to check HOF candidacy, it's a good measure, I think. ...  WARSCOR probably underrates catchers.  The catcher-adjustment in WAR does a good job for single seasons.  But it's not really designed to compensate for the toll catching takes on a body over the course of a career.  It's quite reasonable to argue that Piazza, Torre (if you're counting him), and Simmons (also Lo Duca, but he's really a non-factor in this discussion) should rank higher.  I wouldn't quibble with someone who argued that.

I think that does it for now.  Any thoughts?

As always, thanks to baseball-reference.com for the stats!

Monday, October 21, 2013

Best NFL Teams Ever, Part III: 1970-2012

And here we are:  post AFL-NFL merger.  Basically, this is the part of history football fans are generally familiar with.  Let's get straight to it.  Here's the 1970s:

1973 Los Angeles Rams, .885
1976 Pittsburgh Steelers, .880
1972 Miami Dolphins, .877
1975 Pittsburgh Steelers, .865
1970 Minnesota Vikings, .856
1973 Miami Dolphins, .854
1975 Minnesota Vikings, .841
1971 Dallas Cowboys, .839
1973 Dallas Cowboys, .828
1977 Los Angeles Rams, .824

Okay, be honest:  raise your hand if you thought the 1972 Dolphins were going to be the top team of the decade.  It's fine if you didn't.  But seriously, I'm SUPER impressed if you had the Rams with TWO of the top-ten teams of the decade.  The Rams did make one Super Bowl, but that was in 1979, when they scored only 14 more points than they allowed and went 9-7.  Both years, 1973 and 1977, they were upset in the  first round of the playoffs.  And in 1973, they actually did have the best record in the league:  12-2 - which is the record my system predicts, more or less (12.4-1.6).

The Vikings also had two great teams this decade, and made the Super Bowl thrice; just not in their best years, appearing in 1973, 1974, and 1976 (they also lost the Big Game in 1969).  The stars never seemed to align for the Vikes in the 1970s... or otherwise.

The 1972 Dolphins went undefeated, but weren't that great of a team.  In my opinion, calling them even a top-five all-time team is preposterous, and there's certainly an argument to be made via this model that they're not a top-20 team.  They just got lucky enough to win one-and-a-half more than they were expected.
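
To turn one of these percentages back into an expected record, multiply by the schedule length. A quick sketch (the helper name is hypothetical; the 14-game schedule is what the 12-2 record above implies for the early 1970s):

```python
# Sketch: convert an expected winning percentage back to a W-L record.

def expected_record(win_pct, games):
    wins = win_pct * games
    return wins, games - wins

rams_w, rams_l = expected_record(0.885, 14)
print(f"{rams_w:.1f}-{rams_l:.1f}")   # 12.4-1.6

# 1972 Dolphins: actual wins (14) minus expected wins.
dolphins_w, _ = expected_record(0.877, 14)
print(round(14 - dolphins_w, 1))      # 1.7
```

That 1.7 is in the same neighborhood as the "one-and-a-half" quoted above.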

The 1980s:

1985 Chicago Bears, .874
1984 San Francisco 49ers, .865
1984 Miami Dolphins, .817
1983 Washington Redskins, .799
1987 San Francisco 49ers, .798
1989 San Francisco 49ers, .786
1988 Minnesota Vikings, .767
1986 Chicago Bears, .751
1980 Philadelphia Eagles, .747
1981 Philadelphia Eagles, .722

Remember how, in the 1980s, the Raiders won two Super Bowls, and all the rest were won by NFC teams?  Well, this nearly-all-NFC top-ten may give an indication why that was.  The 1980s had probably the most parity of any decade.  It's actually crazy in how many seasons the teams were jammed pretty closely together.  In the 1960s, there were more teams with a .900 "record" than there were .800 teams in the 1980s!  The reason this is interesting, I think, is that you often hear the 1985 Bears and the 1984 'Niners and the 1989 'Niners in discussions of greatest ever teams.  Only the top two teams of the 1980s would have even made the 1970s top ten.  It seems to me that the most dominant teams of the 1980s simply weren't that dominant relative to their peers, at least in the regular season.  That being said, the 1985 Bears were a pretty special team.  They are a reasonable group to have in a discussion of the best-ever teams, as are the 1984 49ers.  But the fact of the matter remains, neither of those teams can stack up to the sheer dominance of earlier teams, like the 1968 Colts or 1962 Packers, or the later dominance of some teams from the 1990s or the 2000s.  Actually, the 1980s look a lot like the 2010s.  The only difference is, the 2010s aren't even half over, and have plenty of time for a few dominant teams to sneak in.

The 1990s:

If I asked you to guess the best team of the 1990s, I can guess that you'd think of a few teams:  perhaps the 1992, '93, or '95 Cowboys.  Maybe you're sneaky, and you know how great the '94 49ers were.  Perhaps you remember 1998:  the year of five truly dominant teams, particularly Denver and Minnesota.  Maybe you like a team that was basically the 1998 Vikings 2.0:  the 1999 Rams, the Greatest Show on Turf.  Or maybe you favor the all-around dominance of the 1996 Packers.  Do you know which one was best?  Take a look:

1991 Washington Redskins, .929
1999 St. Louis Rams, .926
1998 Minnesota Vikings, .882
1996 Green Bay Packers, .876
1992 San Francisco 49ers, .825
1994 San Francisco 49ers, .822
1993 San Francisco 49ers, .797
1995 San Francisco 49ers, .789
1998 Denver Broncos, .782
1997 Denver Broncos, .779

The 1991 Washington Redskins are a team that I often worry history will somehow forget.  They weren't dynastic.  They played a little worse than their points scored/allowed total should have indicated (I have them at 14.87 wins; basically, they should have gone 15-1).  They rolled through the playoffs, thrashing Atlanta 24-7, crushing Detroit 41-10, and very solidly handling a very good Bills team, 37-24.  It was an excellent team, but the year before they were 10-6, the year after 9-7.  And they were sandwiched in the 49ers-Cowboys era of dominance, which makes them forgettable - even if they were the best team of the bunch.
The four 49ers squads above rank as the #2, 3, 5, and 6 49ers teams of the 1980s-1990s dynasty.  It's actually quite possible that, in spite of only winning one Super Bowl, the 49ers were better in the 1990s than they were in the 1980s.  That's insane to think about, considering they won four titles in the 1980s.
The 1997 Broncos were supposed to lose the Super Bowl to Green Bay, who was coming off a title in 1996.  The AFC hadn't won a Super Bowl since the 1983 season, when the LA Raiders defeated the heavily-favored Redskins.  What all the pundits ignored, though, was that the 1997 Broncos were a better team than the Packers.  The 1998 Broncos get more press because they started off 13-0; what no one ever tells you is that they were, from a point-differential perspective, more or less the exact same team as the year before - only 3 one-thousandths of a point different.
The 1996 Packers are a team that I have often, in barroom-type arguments, argued were more or less the equal of the 1985 Bears.  I used to make this claim in spite of not having done this research.  Those Bears outperformed their expected record by a game; the Packers underperformed theirs.  But they profile, basically, as exactly the same.

The 2000s:

2007 New England Patriots, .954
2001 St. Louis Rams, .856
2005 Indianapolis Colts, .791
2006 San Diego Chargers, .785
2005 Seattle Seahawks, .774
2000 Oakland Raiders, .772
2007 Indianapolis Colts, .771
2006 Chicago Bears, .760
2004 New England Patriots, .757

If you're surprised by the top team of the 2000s, you weren't paying attention to the teams of the era.  I'm quite certain that, even if I included playoffs, I would reach the same conclusion:  the 2007 Patriots were the best team of the decade, bar none.  And, if you're into making timeline adjustments when ranking teams, there's an extremely reasonable argument that the 2007 Pats are the greatest team ever.  The only other team since 1943 to best the Pats' .954 mark is the 1946 Browns of the AAFC.  And if you don't want to count them, that's fine.  The only team who's particularly close to the Pats is the 1968 Colts, at .949.
Of all the various top-tens I've shown, this one had the best rate of getting to the Super Bowl:  half of these teams made the Big Game.  They have the worst rate of winning it; only one team did (the 2004 Pats).
The gap between the best team of the decade and the 3rd-best is astronomical, with the #2 team closer to #3 than #1.  The near-100-point-gap between 1st and 2nd is also, far and away, the largest of any decade.  The 1999 Rams were much closer to the 2007 Pats than the 2001 Rams were.
Much like the 49ers of the 1990s being superior to the 49ers of the 1980s, there's ALREADY an argument that the Patriots teams of the 2010s will have been better than the Patriots teams of the 2000s, even if they go completely without a title.

The 2010s:

Admittedly, there's not much to write home about here... yet.  We're still waiting for our most dominant teams, which I assume will be coming later.  Here's the top-5 so far:

2010 New England Patriots, .791
2011 Green Bay Packers, .783
2012 Denver Broncos, .764
2011 New England Patriots, .741
2012 Seattle Seahawks, .729

Yup, Denver and Seattle were the two best teams last year.
I can't help but think that a Green Bay-New England matchup in 2011 would have made for a great Super Bowl.  Not that the Giants-Pats game was a bad one.  It just would have been interesting.
I think most people would have guessed the 2011 Packers as the top team of the decade so far, since they went 15-1.  Of course, they actually profiled to be a 12.5-win team, not a 15-win team.

So far, the best team of the current season is the Denver Broncos (in spite of their first loss, to the Colts yesterday), who are +101 on the season.  The undefeated Chiefs are at +88.  I can run the percentages as of today, before the Monday Night game.  There have been 106 games this year.  There have been 4896 points scored.  That's 46.188 per game - the highest scoring season since 1965, if the trend were to continue.  The Broncos are +101 through 7 games, which profiles to 5.6867 wins in 7 games, a percentage of .812.  The Chiefs are at .772.  So it's possible that Denver is headed towards being the best team of the decade so far.  Only time will tell.
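
The mid-season arithmetic above can be reproduced with a short sketch. The function name is hypothetical, and the Chiefs' 7 games played is an assumption (it is the value that reproduces the .772 quoted above); the point differentials and league totals are the ones in the paragraph.

```python
# Sketch of the in-season percentage: point differential divided by league
# points per game, plus half the games played, divided by games played.

def expected_win_pct(point_diff, games_played, league_points, league_games):
    league_ppg = league_points / league_games
    expected_wins = point_diff / league_ppg + games_played / 2
    return expected_wins / games_played

broncos = expected_win_pct(101, 7, league_points=4896, league_games=106)
chiefs = expected_win_pct(88, 7, league_points=4896, league_games=106)  # 7 games assumed
print(round(broncos, 3))  # 0.812
print(round(chiefs, 3))   # 0.772
```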

WEEK 15 UPDATE:
It's been a couple weeks since I last updated this, which was in week 11.  Whoops.  We're officially 224 games into the 256-game NFL season, so just two weeks remain.  At the moment, there have been 10634 points scored in the NFL this year.  That's 47.5 per game (one of the highest numbers of all-time; maybe THE highest number of all-time; I haven't checked for a while).  Here are the top seven "winning percentages" as of week 15:

Seattle Seahawks - .763
Denver Broncos - .745
Kansas City Chiefs - .717
San Francisco 49ers - .682
Carolina Panthers - .681
New Orleans Saints - .634
Cincinnati Bengals - .620

I should really schedule-adjust these rankings, but I'm not going to do that just yet, as it would be a crapton of work, and I do this all manually.  As it stands, Denver is 12 points behind Seattle; we'll see if one of them can manage two blowouts in the last two weeks to go down as the "team of the decade."

Sunday, October 20, 2013

Best NFL Teams Ever, Continued: 1943-1969

This era of NFL history is oft forgotten, and it's a shame.  Football fans, for some reason, think of history as beginning with the Super Bowl.  But it just plain didn't.  And it's unfortunate that they think that way.  In this era, I'm going to start looking at best teams by decade, because I think that'll be more fun than just lumping everything together.  So, the first "decade" will be 1943-1949.  But that's only 7 years, you say.  Well, keep in mind that we covered 1940-1942 in the last post.  But even so, the 1943-1949 "decade" covers 11 seasons, because the AAFC days were four years long, meaning there were two seasons each year from 1946-1949.  So we're still covering 11 "years" in this group!  Without further ado, the best teams of 1943-1949:

1946 Cleveland Browns, 1.018
1949 Philadelphia Eagles, .922
1948 Chicago Bears, .896
1948 Philadelphia Eagles, .889
1948 San Francisco 49ers, .887
1947 Cleveland Browns, .879
1949 San Francisco 49ers, .869
1943 Chicago Bears, .868
1945 Philadelphia Eagles, .868
1949 Cleveland Browns, .828

As I'm sure you noticed, the Browns from three out of the four AAFC years made the list.  Also, in the last post, I said that no teams should have "won" more games than they played outside of the 1920-1942 era.  Well, obviously I was wrong, because the 1946 Browns so thoroughly dominated the competition that they deserve a spot in that group, as well.  Of course, it wasn't actually the NFL, so maybe you'll forgive my mistake.  Anyway, the Eagles had probably their best decade ever in the 1940s.  Which is why it's a real shame for Eagles fans that NFL fans so quickly forget this era of pro football.  You may also have noticed that teams 3-5 all played in the same year:  1948.  San Francisco, obviously, was in a different league than the other two, so never played them.  The 'Niners didn't even make the playoffs in their league; despite having the better point differential, the 'Niners went 12-2, while Cleveland went undefeated, and got to play a Buffalo team that went 7-7 in the regular season for the title.  Cleveland won, and that San Francisco team was forgotten.  You'll also notice that the 1948 Cleveland team is the only AAFC Cleveland team not to make the top ten.  They were #11.  In the NFL in 1948, Philadelphia won the championship, but not over Chicago.  Much like the 'Niners in the AAFC, the Bears (10-2) didn't even win their division, so the Cardinals (11-1) were the losers to the Eagles.  That's just how it goes sometimes.

The 1950s:

1953 Cleveland Browns, .860
1951 Cleveland Browns, .840
1954 Cleveland Browns, .831
1958 Baltimore Colts, .828
1950 Cleveland Browns, .801
1956 Chicago Bears, .801
1950 Los Angeles Rams, .785
1952 Detroit Lions, .784
1954 Detroit Lions, .782
1953 Chicago Bears, .765

Holy Cleveland!  Again, much like Philadelphia in the 1940s, Cleveland's Golden Age for pro football was the forgotten 1950s.  And that's a shame.  Cleveland won NFL titles in 1950, 1954, and 1955, and probably also had the best team in the league in 1951 and 1953.  In their first six years in the NFL, only Detroit in 1952 managed to beat them both head-to-head, and in cumulative points.  Detroit also won back-to-back titles in 1952 and 1953, and had a team just as good in 1954... but they were crushed 56-10 by the Browns.  Even so, this was the Lions' best decade.  And it's been forgotten.  You'll notice a theme:  for these teams who had their best years in these "forgotten" eras of the NFL's past, they haven't won a title since.  It's time to start celebrating history; we may not see a title in Cleveland or Detroit for a looooong time otherwise!

The 1960s:

Basically, the 1960s gets a break.  People kinda start to think of this as the "modern" game, mostly because the Lombardi Packers dominated the decade, both before and after the Super Bowl began.  This allows people to think of it as more or less the same game, so you'll sometimes see NFL "historians" reference the 1960s as being part of the "real" history of the NFL... even though there hadn't been any real changes between the game of the 1950s and the game of the 1960s.  Anyway, keep in mind that the 1960s includes 10 years of the AFL; therefore, there were 20 "seasons" in the 1960s.  So here's the list:

1968 Baltimore Colts, .949
1962 Green Bay Packers, .927
1968 Dallas Cowboys, .927
1969 Minnesota Vikings, .920
1961 Houston Oilers, .900
1967 Oakland Raiders, .872
1961 New York Giants, .865
1968 Oakland Raiders, .856
1966 Dallas Cowboys, .839
1967 Los Angeles Rams, .830
1964 Baltimore Colts, .829
1968 Kansas City Chiefs, .825
1967 Baltimore Colts, .820
1969 Kansas City Chiefs, .808
1966 Green Bay Packers, .783
1960 Cleveland Browns, .780
1961 Green Bay Packers, .779
1966 Kansas City Chiefs, .774
1963 New York Giants, .773
1963 Green Bay Packers, .765

I was extremely surprised by two things Packers-related.  First, I didn't realize that Lombardi's Packers had so frequently outperformed their point differentials.  It may have been luck.  It may have been something about the team.  I doubt you'd find another team as successful as they who so often outperformed superior teams.  Second, I was certain that the 1962 Packers would rank as the best team of the decade.  But it was, in fact, the infamous 1968 Colts (who famously lost to the Jets in Super Bowl III) who took the honor of "team of the decade."  This puts a whole new spin on the idea of just how big of an upset that game was.  The five teams above .900 are the most by any decade since 1943.  I wonder if that'll hold.

Another unfortunate side-effect of people forgetting the pre-Super-Bowl-era is that the great teams of the AFL, like the 1961 Oilers, are basically forgotten.  Also, one more Packers-related thing:  the team won 5 NFL titles in 7 years.  But their 4th best team of that stretch is one that didn't:  1963, the #20 team of the decade.

Well, that's it for post #2.  We'll see if I can fit in everything since 1970 in one post.  Catch you later.

Best NFL Teams Ever: A Mathematical System

I've laid out on this blog before why I think using the Pythagorean formula for NFL team records is stupid.  The season's too short, and going with a linear, rather than quadratic, model shows the results pretty perfectly, ESPECIALLY at the extremes (i.e. Pythagorean will never predict winless or undefeated teams, and not even 1-15 or 15-1 teams, yet they happen ALL THE TIME, relatively speaking).  So I've devised this simple formula:

(TeamPointsScored-TeamPointsAllowed)/(AveragePointsInAnNFLGame)

Then you add that total to a .500 record, however many games that may be.

For the last variable, you take the average number of TOTAL points of an NFL game (usually about 44), not just the average for one team.  It's really simple.  

For example, the Pythagorean formula gives the 2007 Pats an expected W/L in the regular season of 13.8-2.2.  In my method, the Pats scored 589 and allowed 274, for a differential of 315.  In 2007, there were 256 regular season games played and 11104 points scored.  That's a total of 43.375 PPG.  315/43.375=7.26.  Then we add a half-season's worth of wins (8), and we get an expected W/L record of 15.26-0.74... WAY closer to their actual 16-0.

Anyway, I was thinking about this again, and thinking how it would be a good way to compare teams over time.  Except, of course, for the schedule-length issue.  So, I just take the answer and divide by the number of games in a season to get an expected winning percentage.  And that's what we'll go with.  It lets us ask, totally objectively:  "to what extent did this team dominate their opponents?"
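
Here is a minimal sketch of the whole method applied to the 2007 Pats numbers above. The function names are hypothetical. The Pythagorean version is included only for comparison, and its exponent of 2.37 is an assumption (a value commonly used for the NFL); it happens to reproduce the 13.8 wins quoted earlier.

```python
# Sketch of the linear expected-wins method versus a Pythagorean estimate.

def linear_expected_wins(points_for, points_against, games, league_ppg):
    # Point differential over league points per game, added to a .500 record.
    return (points_for - points_against) / league_ppg + games / 2

def pythagorean_expected_wins(points_for, points_against, games, exponent=2.37):
    # Exponent is an assumed value, not something specified in the post.
    pf = points_for ** exponent
    pa = points_against ** exponent
    return games * pf / (pf + pa)

league_ppg = 11104 / 256  # 43.375 total points per game in 2007
linear = linear_expected_wins(589, 274, games=16, league_ppg=league_ppg)
pyth = pythagorean_expected_wins(589, 274, games=16)

print(round(linear, 2))       # 15.26
print(round(linear / 16, 3))  # 0.954, the expected winning percentage
print(round(pyth, 1))         # 13.8
```

Dividing by the number of games is the only step needed to put 12-, 14-, and 16-game seasons on the same scale.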

I was initially going to publish the complete list of teams I did.  I took one to five teams from each year in NFL/AFL/APFA/AAFC history.  I'm pretty sure each franchise is represented at least once.  I wound up with 241 teams on the list.  Now, I'm pretty positive they're not the top 241 teams of all-time.  I probably missed as many as 50 or 60 that might be better than the worst team on this list.  But I'm absolutely CERTAIN that the top teams of all-time are represented.

Anyway, I'm not going to publish the full list because it's long.  So I'll start with this post in which I'll look at the early, pre-modern days of the NFL and APFA.  For me, the "modern" NFL starts in 1943.  That's basically when scoring reaches modern levels, and we stop having teams projected to win more than 100% of their games.  Anyway, in this post, I'll have 1920-1942.  In my next post, I'll detail 1943-1969, which includes the AFL and AAFC days.  And my final post will be about the true "modern" NFL, from 1970 to the present.  Here we go.

Before 1950, there is a problem with estimating winning percentages with my method.  That problem is that you will get winning percentages over 1.000.  It happens in more or less every single season.  Sometimes more than one team will be projected to have gone undefeated.  This happens because the gap between the "haves" and "have-nots" is too wide.  If I had looked for the worst teams of all-time, they would all have been from this era; likewise with the best.

You have to keep in mind a few things about this era:  not every team played the same number of games each year (at least until 1936; that's when every team started playing the same number of games, and there was an actual, organized schedule).  Not every team was in the league from year to year.  This creates HUGE gaps between the best and the worst.  Ties were extremely common.  For much of the era, there was no bowl game at the end of the season, so a champion was simply crowned.  It was a mess.  But I present to you (again, based on regular-season only) the greatest teams of the early days of the NFL (including its days as the APFA in 1920 and '21), from 1920-1942:

1923 Canton Bulldogs, 1.695
1925 Pottsville Maroons, 1.559
1922 Rock Island Independents, 1.494
1924 Cleveland Bulldogs, 1.407
1920 Buffalo All-Americans, 1.327
1942 Chicago Bears, 1.320
1929 New York Giants, 1.285
1922 Canton Bulldogs, 1.271
1921 Buffalo All-Americans, 1.269
1927 New York Giants

These ten teams are the greatest ten teams in NFL history, by this method.  Again, this is why we need to split everything that happened up to 1942 separately from the rest of NFL history.  There are a total of 27 teams in NFL history that were projected to win more games than they actually played; all of them are from this era.

You may also have noticed that these team names are not familiar.  Just because these teams were great in their own times, doesn't mean that they stuck around.  Canton/Cleveland, as you can see, was a dynasty.  The Bears of 1942 played in what was the closest to the modern NFL of any of these, and not just because they were the most recent of the bunch.  In 1942, in the average NFL game, the two teams combined for 32.38 points per game.  Scoring was under 30 points per game from 1920 to 1938.  Then in 1939, it reached 30, and hung right around there until 1942.  Then, in 1943, there was an explosion of offense, leading to a 39.65 points per game total.  The lowest it has been since then was in 1977, when it dipped as low as 34.35 points per game.  In other words, still higher than every year up to 1942.

Well, that's all I've got in terms of a history lesson for you tonight.  I'm just gonna keep posting until I get bored, so don't be surprised if there's another one up shortly!

Friday, October 18, 2013

2013 Awards: My Picks

I made picks at the 1/3 mark, and at the 2/3 mark.  I couldn't decide when to make my final pick.  Should I do it before the BBWAA announced?  After?  Release my picks on the same days as theirs?

Well, I don't have the patience necessary to wait that long.  But even if I did, I already posted my Internet Baseball Awards ballot over at Baseball Prospectus.  So I figured, with that done today, and while I'm clichedly sitting in a coffee shop (actually a Barnes & Noble) on my computer, it's a perfect time for blogging.  Especially about baseball, since I just did the awards ballot!  Anyway, on to the picks, and the justifications:

As I did in my last article, I'll list my pick from the 1/3 mark, my pick from the 2/3 mark, and who I predict will actually win the award, when the time comes.

AL Manager of the Year
My Ballot:
1.  John Farrell, BOS
2.  Terry Francona, CLE
3.  Bob Melvin, OAK
1/3 Pick:  Joe Girardi, NYY
2/3 Pick:  Farrell
Predicted BBWAA Winner:  Farrell or Francona

Personally, I think that Farrell will be the winner.  Worst-to-first is basically always a recipe for the MOY award.  And Farrell did a great job.  The pitching was outstanding, but it was mostly the 'pen.  Which means no-name guys.  Which means that most people say things like, "Well, besides Pedroia and Ellsbury and an over-the-hill Ortiz, who d'they got?"  And they won anyway.  Francona's work in Cleveland is also impressive.  Bob Melvin, winning with Josh Donaldson as his best player... wow.  He won't get enough credit, because Oakland made the postseason last year.  But he should have strong consideration, too.  I'm going with Farrell, because... well... I have no idea how to evaluate managers, and his team went worst-to-first.  I'm not beyond simplistic analysis when it's called for by the situation.

NL Manager of the Year
My Ballot:
1.  Clint Hurdle, PIT
2.  Mike Matheny, STL
3.  Don Mattingly, LAD
1/3 Pick:  Hurdle
2/3 Pick:  The manager whose team wins the Central
Predicted BBWAA Winner:  Hurdle

Pittsburgh didn't end up winning the Central.  Matheny's Cardinals did, just like they do every year.  I'm so sick of the Cardinals.  But, credit where credit's due:  people thought the end of the LaRussa/Pujols era spelled the end of the Cards.  And yet, here they are, in the NLCS again.  Still, the story in Pittsburgh is too good.  It's not so much that Hurdle deserves the award over Matheny as it is that Pirates fans deserve as many awards as we can send their way.  Mattingly gets a vote because of Yasiel Puig.  I don't know how else to put that.  I guess I have to say that going with the flashy, foreign rookie was an inspired choice with his job on the line.  Baseball managers are, by nature, conservative creatures; they don't shake up the status quo unless necessary, and when necessary, they tend to go with "safe" over high-risk, high-reward.  And yet, here was Mattingly, with his job on the line, going with an unproven player.  And it worked out so well that the Dodgers were the best team in baseball once Puig came up (Kershaw and Greinke sure didn't hurt, though)!

AL Rookie of the Year
My Ballot:
1.  Wil Myers, CF, TBR
2.  Nick Franklin, 2B, SEA
3.  David Lough, LF, KCR
4.  Jose Iglesias, CF, BOS/DET
5.  Brad Miller, SS, SEA
1/3 Pick:  None
2/3 Pick:  Myers
Predicted BBWAA Winner:  Myers
I hated this year's AL rookie class.  There's a bit of promise, but no dominant, no-doubt winner.  I went with Myers, because he was the best prospect of the bunch.  There was a lot of WAR-based voting for this one, because no one made a big enough splash.  That's how a good-not-great-half-season by Miller wound up on my ballot.  Iglesias is up there because of the name.  Lough did a nice job filling in for Alex Gordon in left for Kansas City.

NL Rookie of the Year
My Ballot:
1.  Yasiel Puig, RF, LAD
2.  Jose Fernandez, SP, MIA
3.  Julio Teheran, SP, ATL
4.  Hyun-jin Ryu, SP, LAD
5.  Shelby Miller, SP, STL
1/3 Pick:  Miller
2/3 Pick:  Puig
Predicted BBWAA Winner:  Fernandez

Now THIS was a rookie class to remember.  The pitching!  My oh my!  Fernandez, Teheran, Ryu and Miller were all outstanding!  And yet, I had them all behind Puig.  Obviously, this should be a two-horse race between Fernandez and Puig.  Fernandez posted the better overall numbers, but Puig was churning out wins at a faster clip.  I voted that way this time, but it's by no means the "right" way to have voted.  I expect Fernandez to be the winner, and if/when he does, a deserving player will have won.  It's a lot like the last couple of NL MVP votes:  lots of players having MVP-type seasons.  There aren't "wrong" choices in years like that.  Just lots of right ones.  When awards season goes wrong, it becomes about calling the "other" side stupid for their choices.  When awards season goes right, it's about celebrating the performances of people who brought us a tremendous amount of enjoyment, just by watching them play a beautiful game.  This rookie class will be an exciting one for years to come.  You just can't help but feel excited.

AL Cy Young
My Ballot:
1.  Max Scherzer, SP, DET
2.  Anibal Sanchez, SP, DET
3.  Chris Sale, SP, CWS
4.  Felix Hernandez, SP, SEA
5.  Hisashi Iwakuma, SP, SEA
1/3 Pick:  Hernandez
2/3 Pick:  Hernandez
Predicted BBWAA Winner:  Scherzer

It's been close among all these guys all year.  Pleasantly, like the NL ROY race, this is just a time to celebrate great performances by great players.  The argument for a guy like Iwakuma is easy to make.  Even an argument for Yu Darvish, who didn't even make my ballot, is easy to make.  There were a lot of excellent pitching performances, but nothing like the Verlander and Halladay performances of recent years, such that there was a no-doubt winner.  But Scherzer has the best "traditional" stats, and when the sabermetrics are pretty even, I have no problem with looking at the traditionals, so he will win.  In fact, I didn't use the "traditional" stats (ERA and W-L record) at all, and I still picked Scherzer as the winner.  But it's a lot closer than the W-L record would indicate.

NL Cy Young
My Ballot:
1.  Clayton Kershaw, SP, LAD
2.  Cliff Lee, SP, PHI
3.  Adam Wainwright, SP, STL
4.  Matt Harvey, SP, NYM
5.  Jose Fernandez, SP, MIA
1/3 Pick:  Kershaw
2/3 Pick:  Kershaw
Predicted BBWAA Winner:  Kershaw

And we have our first (and only!) unanimous, season-long choice!  Kershaw led the league in ERA and strikeouts.  He led in WAR.  He was, quite simply, the best pitcher in the NL this season.  Wainwright and Lee seem to be in always-a-bridesmaid mode.  That's unfortunate, because they're both outstanding pitchers. Lee, at least, has won a Cy Young.  Wainwright may never be so lucky.  Speaking of Lee, did you even notice how quietly excellent he was this year?  222 innings, 222 K, a 2.87 ERA in a park that was about 3% higher scoring than average this year.  Very quiet, very solid.  He won't get much support, but that's a shame.  And Harvey may have given Kershaw a run for his money, but his injury prevented that discussion.  Fernandez, again, could be considered for the second spot, but in a year like this, it wasn't to be.  The field is just too deep.

AL Most Valuable Player
My Ballot:
1.  Mike Trout, LF, LAA
2.  Josh Donaldson, 3B, OAK
3.  Miguel Cabrera, 3B, DET
4.  Robinson Cano, 2B, NYY
5.  Chris Davis, 1B, BAL
6.  Max Scherzer, SP, DET
7.  Evan Longoria, 3B, TBR
8.  Manny Machado, 3B, BAL
9.  Anibal Sanchez, SP, DET
10.  Dustin Pedroia, 2B, BOS
1/3 Pick:  A debate between Cabrera and Trout
2/3 Pick:  A debate between Cabrera and Trout
Predicted BBWAA Winner: Cabrera

Okay, if you want to count this one as unanimous all the way through the year, go ahead.  I did mention Trout in each one.  And, after all, what more can be said about the Millville Meteor (besides the fact that his is the best nickname in MLB in decades)?  He's too incredible.
By the way, if you were going to be impressed by the 3 Tigers on this list, just look at the number of third basemen:  4 of the top 8 players in the American League!  It reminds me of the A-Rod-Jeter-Garciaparra days, when the league was just stacked with shortstops.  Hopefully, these guys all continue the way they're going now.
Davis, Cabrera, and Trout have gotten the majority of the press.  And I don't want to go on too long about any one of these guys.  But they're all wonderful.  The hardest spot for me was the 10th.  I considered a couple of pitchers, but went with Pedroia because at least those pitchers get Cy votes; without this, Pedroia's got nada, and that didn't seem fair to his excellent season.

NL Most Valuable Player

My Ballot:
1.  Andrew McCutchen, CF, PIT
2.  Carlos Gomez, CF, MIL
3.  Clayton Kershaw, SP, LAD
4.  Matt Carpenter, 2B, STL
5.  Paul Goldschmidt, 1B, ARI
6.  Cliff Lee, SP, PHI
7.  Joey Votto, 1B, CIN
8.  Adam Wainwright, SP, STL
9.  Yadier Molina, C, STL
10.  Troy Tulowitzki, SS, COL
1/3 Pick:  Gomez
2/3 Pick:  Gomez
Predicted BBWAA Winner:  McCutchen

Just as there were three Tigers in the AL top-9, so too there are three Cardinals in the NL top-9.  It's therefore no surprise to see both of those teams in their respective LCSs.  A lot of these NL guys flew under the radar:  Carpenter, Goldschmidt, and Tulo, in particular.  But nice seasons from them all.
Like the AL, I debated some pitchers for the last spot or two.  But Molina and Tulo would only get votes here, while the pitchers got something for the Cy, so I went with position players here.
As for the winner, I went with McCutchen.  Gomez got hurt shortly after the 2/3 column I wrote.  And, ultimately, that's the difference between him and McCutchen for me.  By the way, I've been voting in the IBAs for three years now, and I've had a Brewer in the top-2 each year:  Braun in 2nd two years ago (to Matt Kemp), Braun as my personal winner last year, and Gomez in 2nd this year, to McCutchen.  These aren't just homer picks; the team has actually had players that good.  That would have been unthinkable when I was young, growing up watching the terrible, mid-90s and early-00s Brewers.  But this has been a fun almost-decade to be a fan of this particular mid-market team.  I hope things only improve from here.

Any agreement or disagreement?  Want to tell me why I'm wrong?  I'd love to hear from you, so please comment!

A special thanks in this article to The Baseball Gauge, Baseball-Reference, Fangraphs, and Baseball Prospectus.  All were invaluable.  And if this column interested you at all, please consider following our RSS.  Also, consider joining Baseball Prospectus.  Even a free membership there gets you a daily e-mail with free content, as well as a vote in the IBAs.

Saturday, September 28, 2013

WARSCOR Revisited

I like tweaking.  I've tweaked WARSCOR a bit since its inception.  And I've never really been satisfied.  Because while I've done good work with it, and I think it has (in a lot of ways) improved, there are two things I've been really unhappy with.  The first is that each revision of WARSCOR has made it more complex than the previous iterations.  As if it weren't bad enough that I'm using a system (Wins Above Replacement) that's controversial, I'm also further complicating it by using averages that aren't just finding the standard mean, which is confusing.  And then I'm using Wins Above Average which, although a more intuitive measure than WAR, is still confusing and confounding.  The second thing is that the system is a little arbitrary.  I mean, there are the two categories of "career" and "peak."  And each category has three subcategories:  total value, vanishing value (first number times x, second number times x-1, third number times x-2, etc.), and a vanishing value that starts at a higher number, so that the differences between season n and season n+1 are closer together.  Career, obviously, is not arbitrary; but for peak, I chose 10 years.  I claimed this wasn't arbitrary, because it's the minimum requirement for the Hall of Fame.  That's just stupid.  It's arbitrary.  And what's worse, the numbers I chose for the vanishing coefficient (starting at 30 and 45 for career, and 10 and 15 for peak, respectively) are arbitrary, as well.  And then I compound the whole thing by doing it again with WAA - and taking yet another odd average!  The main point of this whole exercise was to develop a formula in which I had confidence.  And while I like the results being put out by the current iteration of WARSCOR, the arbitrariness of the whole system is something I find really irksome.  So it's, more or less, back to the drawing board that I went.  And I finally came to a solution that I found palatable.

Let's start with the stuff I got right.  Number one, boiling a whole career down to one number that can serve as a quick reference point; that's good - but WAR already does that on its own.  So, number two, weighting peak and career differently, and giving them input into the said one number - that's really good.  That, we have to keep.  Number three, I liked that I sorted a player's career, starting with his best season, leading to his worst.  Number four, the idea of the vanishing coefficient is salvageable, and definitely does something for balancing peak and career values.  Number five, making sure to remember that any one-number system is, by nature, the start of the conversation, and not the end is the most important lesson of all.  We're definitely sticking with that one.  But everything else is fair game.

First of all, I'm scrapping WAA from the formula altogether.  There's no reason to include it.  I understand why a lot of people (people for whom I have a great deal of respect, as well!) want to use it to derive peak value (Adam Darowski at the Hall of Stats and Tom Tango over at his blog are both proponents of this line of thinking).  And many of those thinkers (including both of the aforementioned ones) believe in only including WAA for positive seasons.  That's not really my boat, because I think you have to account for value if it happened, for good or for ill.  Nonetheless, while I respect these other people, I can't help but think that if there's value in being between replacement and average, we must account for that.  Anyway, I think WAR will work just fine.  Most of all, there's the frustration many people (like me) feel in that, for example, with WAA, a perfectly average pitcher gets 0 WAA for 200 average innings, while a September call-up who throws 4 decent innings could easily have something like a 0.2 WAA.  That's just wrong, and I want the Tommy Johns and Jamie Moyers and Craig Counsells of this world to get credit where credit is due.  It's hard to be a MLB player, particularly to be one above replacement level!

Second of all, I never liked that there were six categories getting averaged.  That kinda defeats the purpose of the whole "coming up with a system" idea.  I mean, you want to simplify things, not make them more convoluted than ever.  My thinking at the time went something like the idea of crowdsourcing:  get a bunch of different but similar measures together, and ask them to spit out a number; that number will be better for having had many inputs.  There's some amount of wisdom in that, but not enough to inspire the level of confidence for which I was hoping.  First of all, six isn't nearly enough inputs.  I should have had 60 if I wanted any confidence.  But 60 is too many, so that wouldn't work.  So we're scrapping the idea of six categories, too.  We're going down to one category.  It must include peak and career, but just one category.

Finally, we come to the vanishing coefficients.  Ah the vanishing coefficients.  Actually, they were a good idea.  It's a thing that Bill James does a lot in his work.  I remember an article in the New Bill James Historical Baseball Abstract about the great pitching rotations of all time, and he scored them by giving one point for each Win Share by the top starter, two for each WS by the #2 guy, 3 for each for the #3, etc.  The idea was that the more balanced the rotation, the higher the score - basically, so you don't end up saying the 1985 Mets were the greatest rotation of their generation because Doc Gooden was so good that, no matter who else was in the rotation, they'd come out on top.  Anyway, the vanishing coefficients were designed to mimic that.  But I had to pick arbitrary points.  And I didn't like that.  Not only did I not like it because it was arbitrary, though.  For example, one of them for the whole career started like this:  45(x1)+44(x2)+43(x3)....  This is terrible, because 44/45 is not the same as 43/44 - so the relationships between the terms aren't consistent.  So that would have to be fixed.

And it is!  We're keeping the vanishing coefficient idea, because that's how we can be sure to include "peak" in the "one-number" that I'm coming up with.  The idea is actually really simple:  to keep the coefficients in relationship, I'm going to make the relationship non-linear (by raising a constant to successive powers) and use decimals to make sure that the weights start at one.  Like this:  x1*0.8^0+x2*0.8^1+x3*0.8^2...; which essentially translates to 1(x1)+.8(x2)+.64(x3)..., which I like a lot better.  There's still sort of a linear component (because the power is going up by one for each term), but the weights themselves shrink geometrically:  each one is exactly 0.8 times the one before it, so the relationships between the terms stay consistent.  The only thing left to do was to choose the constant.
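
To see why the old coefficients bothered me and the new ones don't, here is a small Python sketch comparing the ratio between each weight and the one before it. The helper name is made up for illustration, and I've only carried each list out five terms.

```python
def successive_ratios(weights):
    """Ratio of each weight to the one before it."""
    return [later / earlier for earlier, later in zip(weights, weights[1:])]


# Old-style coefficients: start at 45 and count down by one
linear = [45, 44, 43, 42, 41]

# New-style coefficients: a constant raised to the powers 0, 1, 2, ...
geometric = [0.8 ** n for n in range(5)]

print("linear:   ", [round(r, 4) for r in successive_ratios(linear)])
print("geometric:", [round(r, 4) for r in successive_ratios(geometric)])
```

The countdown list gives a slightly different ratio at every step, while the powers-of-a-constant list gives the same ratio every time.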

For the constant, I chose 0.9 - and there are at least two reasons for this.  Number one, "9" is the great number of baseball numerology:  nine innings, nine fielders, nine in the batting order, a forfeit is recorded as 9-0, etc., etc.  It's a wonderful thing.  But more importantly, I tried some other ones.  I tried .8, and it came out WAY overvaluing peak.  It basically said that if Mike Trout puts up 7 WAR next year, he'd have a better HOF case than Yogi Berra.  That's a little messed up, in my opinion, great as Trout has been and as much as catchers always pose problems for this kind of system.  So I tried numbers closer to 1, and 0.99 and 0.95 both kept putting things out that were much, much too close to the order in which players appear, simply based on career WAR.  Actually, using 0.9, the resultant order is pretty close to the order that WARSCOR 2.0 put out.  But that wasn't the goal, per se - but it was nice to see.  So here's what we do for WARSCOR 3.0:

Take the WAR accumulated by a player in each season of his career.  Sort from greatest to least.  Number them, starting with 0.  We will call this number n.  The numerical value for the WAR of each season we will call x, with x1 representing the best season, x2 the second best (as I have done in the entirety of this post so far).  So the formula is simply x1*0.9^0 + x2*0.9^1 + x3*0.9^2 . . . , with each season's WAR multiplied by 0.9^n.

Here's a sample player.  Johnny Bench played 17 seasons in the Major Leagues, accumulating 75.2 WAR (baseball-reference version).  In order from greatest to least, with the first term being numbered "0," they were as follows:

0.  8.6
1.  7.8
2.  7.5
3.  6.6
4.  6.1
5.  5.6
6.  5.0
7.  5.0
8.  4.7
9.  4.6
10.  4.5
11.  4.1
12.  3.3
13.  1.1
14.  1.1
15.  0.0
16.  -0.5

For Bench's career, that means we do:

8.6*0.9^0 + 7.8*0.9^1 + 7.5*0.9^2 + 6.6*0.9^3 + 6.1*0.9^4 + 5.6*0.9^5 + 5.0*0.9^6 + 5.0*0.9^7 + 4.7*0.9^8 + 4.6*0.9^9 + 4.5*0.9^10 + 4.1*0.9^11 + 3.3*0.9^12 + 1.1*0.9^13 + 1.1*0.9^14 + 0*0.9^15+ -.5*0.9^16 = 46.9
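
If you'd rather not do that by hand, the whole calculation fits in a few lines of Python. This is a minimal sketch of the formula as described in this post; the function name and the `decay` parameter are my own labels, not anything official.

```python
def warscor(season_wars, decay=0.9):
    """Sort seasons from best to worst, number them from 0,
    and weight season n by decay ** n."""
    ranked = sorted(season_wars, reverse=True)
    return sum(war * decay ** n for n, war in enumerate(ranked))


# Johnny Bench's seasonal WAR, as listed above
bench = [8.6, 7.8, 7.5, 6.6, 6.1, 5.6, 5.0, 5.0, 4.7,
         4.6, 4.5, 4.1, 3.3, 1.1, 1.1, 0.0, -0.5]

print(round(warscor(bench), 1))
```

Because the function sorts the seasons itself, you can feed it a career in chronological order and get the same answer.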

Compare that to Bench's teammate, Pete Rose, who played 24 seasons, accumulating 79.4 WAR - just more than Bench.  I'll spare you the list, but suffice it to say that Rose's total was 46.8 - just a hair below Bench, instead of above him.  Bench's stronger peak outweighs Rose's longer hangaround value.  And if you think Rose's really negative years are dinging him here (his four worst seasons were .4, .9, 1.1, and 2.1 - a total of 4.5 - Wins Below Replacement), those seasons cost him a total of only .4 WARSCOR points - not even close to the 4.5 that he's dinged by just using standard WAR - and yet, they're still accounted for.  And yes, those are enough to make up the difference between Rose and Bench.  So while they do impact how these two rank, it's still an interesting exercise, don't you think?
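
Here's a quick Python sketch of how little those bad years weigh. It uses only the four season values quoted above, and it assumes they sit in the last four slots of Rose's 24-season list once it's sorted from best to worst (numbers 20 through 23, counting from 0). The variable names are just for illustration.

```python
DECAY = 0.9
TOTAL_SEASONS = 24

# Rose's four worst seasons, from least bad to worst
worst_four = [-0.4, -0.9, -1.1, -2.1]

# Counting from 0, the last four slots of a 24-season career are 20..23
first_slot = TOTAL_SEASONS - len(worst_four)

weighted_cost = sum(war * DECAY ** (first_slot + i)
                    for i, war in enumerate(worst_four))
raw_cost = sum(worst_four)

print(f"cost in plain WAR: {raw_cost:.1f}")
print(f"cost in WARSCOR:   {weighted_cost:.2f}")
```

The later a season falls in the sorted list, the smaller its weight, so a long decline phase still counts against a player but only barely.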

It's simple, it's elegant, it does exactly the job that WARSCOR was intended to do.  I now firmly believe that WARSCOR is every bit as sensible as any other HOF measure out there:  JAWS, CAWS, the Hall of Stats - any of them.  I'll take WARSCOR 3.0 as my pick.

Sunday, August 18, 2013

A Little Late, but CLOSE to on Time!

Remember how I did awards 1/3 of the way through the season?  Well, in theory, I was going to do awards 2/3 of the way through the season.  However, in the last 7 weeks, I have a) started a new job, b) moved into our first house, and c) had to meet approximately 6 million people, which takes a bit of time.  Needless to say, updating this blog, which is often the first thing that falls by the wayside in my life, fell victim yet again.  But here we are, about 120 games in - which means we're at (roughly) the 3/4 mark.  In my opinion, 75% is close enough to 67% that we're going to call this a win.

Anyway, without further ado, here are the awards as I see them.  I will also place my choice from the previous article, so that you can compare them without having to switch back and forth to the old post.  Finally, I'll make my best guess as to who wins each award.

AL Manager of the Year:  John Farrell, BOS
1/3 Choice:  Joe Girardi, NYY
Prediction:  John Farrell, BOS

As everyone knows, there are two ways to win the coach/manager of the year in major professional sports.  You can either lead a team so dominant that it's impossible not to give you the award, or you can surprise people.  As I see it, there is no one dominant enough in the AL to fall into the first category.  But in the second, there would be two choices:  John Farrell in Boston and Ron Washington in Texas.  Washington lost key players in the offseason.  You may have heard of the Hamilton fellow.  The Angels were going to be on the rise, and the A's had made big strides.  The Mariners even spent a good part of the year in the hunt.  But they have fallen off, and while the A's are still close, virtually NO ONE expected Texas to compete.  But that pales in comparison to the odds faced by Farrell before the season started.  A huge but aging payroll.  An "ace" in John Lackey who is, at this moment, exactly 1 game over .500 in FOUR YEARS in Boston.  A last place finish.  A clubhouse that was supposedly embattled.  A dump of star-level talent including Adrian Gonzalez and Carl Crawford.  It was an impossible situation, particularly for a manager who had a reputation as a guy who just couldn't get a team "over the hump," based on two years of managing an up-and-down Toronto team.  So what did he do?  Stepped in and brought the team to first place, that's what.  And not just first, but the best record in the American League.  Sounds like a Manager of the Year to me.

AL Rookie of the Year:  I still have no idea
1/3 Choice:  None
Prediction:  Probably Wil Myers, OF, TBR

Good luck with this one.  On the one hand, the only person with a sizable sample is probably Jose Iglesias.  But it's not too often that you see a guy traded midseason win a major award like that.  So I don't think it's going to happen.  On the other hand, whom do you vote for?  It's wide open, as far as I'm concerned, but Myers is the biggest name, so he'll probably get the win.  And when it's all said and done, he'll probably get my vote, too.  But there's too much season left and too many things could happen for me to make any sort of choice, much less a prediction.

AL Cy Young:  Felix Hernandez, SEA
1/3 Choice:  Hernandez
Prediction:  Max Scherzer, DET

Last year, I thought that Max Scherzer was probably the best pitcher in the AL.  I didn't have the stones to say it publicly, but it's what I thought.  He's AGAIN putting up stellar numbers.  I may actually change my mind on this one.  But midseason, I don't put too much thought into these awards, and Felix has been outstanding.  Leading the league in innings, and STILL with an ERA under 2.5?  Whew.  That's impressive.  Like 2010, he doesn't have the gaudy wins number.  Unlike 2010, he doesn't have the nasty losses number.  Like 2010, he's a workhorse.  Unlike 2010, he's not leading the league in ERA (he's second).  Hiroki Kuroda, Anibal Sanchez, and Chris Sale (this year's Cliff Lee, who was last year the best pitcher in the NL but had a terrible record) would also be inspired choices.  It's a year where you can't go wrong.  But Scherzer has the best W-L, so he'll win.  And that's not necessarily wrong.

AL MVP:  A debate between Miguel Cabrera, 3B, DET and Mike Trout, OF, LAA
1/3 Choice:  A debate between Miguel Cabrera, 3B, DET and Mike Trout, OF, LAA
Prediction:  Cabrera

"Here we go again."  I'm not the first person to write those words about this debate.  I won't be the last.  It's certainly a lot "closer" this year than last year, particularly without Cabrera having the Triple Crown.  However, Trout hasn't been quite as good, either.  Neither has dominated the other.  Well, Cabrera did for the first part of the season, but Trout snuck up by consistently outplaying him since June.  Either way, the poetically just thing would be for each of them to have an MVP after this astonishing two year stretch.  That won't happen, because Cabrera will win.  And it will just get filed away in my mind like all of Albert Pujols' if-only-Bonds-had-been-in-the-AL-2nd-place-MVP-finishes.

NL Manager of the Year:  The manager whose team wins the Central
1/3 Choice:  Mike Matheny, STL
Prediction:  The manager whose team wins the Central

That's totally a cop-out, I know.  But it's the truth.  Everyone thought Cinci had it in the bag.  They'll make the playoffs, sure.  But either St. Louis or Pittsburgh is going to win that division.  And the winning manager will have been a shock, and he'll deserve the win.  I also wouldn't be surprised if it were Fredi Gonzalez, but I said before the season that the Braves were the most talented team in baseball, so I can't imagine casting a vote for their manager when they have the best record.  Don Mattingly will get support here.  That's good, and justified.  But just remember:  people were calling for his head early in the season.  That was dumb, because he wasn't that bad.  But he's not THIS good either (though Yasiel Puig might be).  Any of those choices would be fine, but it's awfully tough to root against Pittsburgh and, by extension, Hurdle (especially because this season, unlike past ones, my Brewers aren't in the hunt).

NL Rookie of the Year:  Yasiel Puig, OF, LAD
1/3 Choice:  Shelby Miller, SP, STL
Prediction:  Puig

At the 1/3 marker, I wrote, "It's totally crazy, by the way, that I actually considered Yasiel Puig's one week of play to make him the winner in this category.  He's been that scary good."  It hasn't let up.  He's been a monster.  He's obscuring fine seasons by the aforementioned Miller, Jose Fernandez (SP, MIA), Julio Teheran (SP, ATL), Nolan Arenado (3B, COL), among others.  It's a much deeper (and better) field than their American League counterparts, and it's a shame that only one of them will walk away with a ROY, while someone in the AL will grudgingly have to get one.

NL Cy Young:  Clayton Kershaw, LAD
1/3 Choice:  Kershaw
Prediction:  Kershaw

He leads the league in innings.  He leads in ERA.  He pitched the most, and he's prevented the most runs.  I said a lot more at the 1/3 point, but I think that about sums up my thoughts at this juncture.  Oh yeah:  and he's getting hotter as the season goes on.  Scary.

NL MVP:  Carlos Gomez, OF, MIL
1/3 Choice:  Gomez
Prediction:  Andrew McCutchen, OF, PIT

McCutchen and Gomez are both great choices.  At the moment, Baseball-Reference has each with 6.5 WAR.  I'm sticking with Gomez.  Not because he's a Brewer (at least I like to THINK it's not because he's a Brewer) but because he's been more consistent.  McCutchen has a 24-point advantage in OPS+.  If there were a "Def+" metric, I don't doubt Gomez would have a similar one over McCutchen.  Gomez has produced in a lineup that has been totally lackluster, with the exception of the outstanding Jean Segura.  Unfortunately, Gomez will get no support.  However, McCutchen has been great, and has been in this discussion for two years.  It will be a pleasure to see him get his due, even though I believe that I would have cast a vote for Gomez, if I had the opportunity.

Thanks to Baseball-Ref, to Fangraphs, and to The Baseball Gauge.  They're the three best sports statistics sites on the web, and they're all devoted to America's Pastime.

Sound off below on why I'm an idiot for my choices, or my predictions.

Monday, June 10, 2013

1/3 of the Way Through...

It's been a LOOOONG time since I've updated.  Of course, I probably start 80% of my posts here that way.  But that's life for you.  Anyway, I just noticed today in Jonah Keri's power rankings that it's week 10 of 30 in this MLB season.  And while others will give you midseason awards, I think 1/3-of-the-way-through-the-season awards make more sense.  Baseball's built in threes:  three strikes, three outs, nine innings, nine fielders, nine hitters in the order, 162 games (which is 3*3*3*3*2, so there's plenty of threes right there), three, three, three.  The All-Star break is three days.  There are three divisions in each league.  Seriously, this can go on forever.  So why celebrate the halfway point when we could celebrate the 1/3 point?  Of course, every team has already played more than 1/3 of their games, but who cares?  So here are my award winners for 2013, to this point in the season.

AL Manager of the Year:  Joe Girardi, NYY

You'll see this theme re-tread in the NL comment for this category, but seriously:  what are the Yanks doing as contenders?  For at least 7 years, I've been hearing that this would be the year the Yankees were done.  Not so, it seems. Apparently, it's never to be.  Or something.  I don't know what Joe Girardi is putting in the water up there (insert 'roids joke here), but they're way outperforming expectations.  Tip o' the hat to John Farrell and Ron Gardenhire for inspired jobs, as well.

AL Rookie of the Year:  Hell if I know.

I haven't the foggiest idea.  Enlighten me in the comments.  I don't think anyone's been good enough at this point in the season to merit the award.  I guess we could just find some way to give it to Mike Trout again, maybe.  If Hisashi Iwakuma is still eligible, it's him in a landslide.  Otherwise, I'm just not sure.  Thankfully, there are another hundred-or-so games in which someone could separate from the pack.

AL Cy Young:  Felix Hernandez, RHP, SEA

Yeesh... this one gets tougher and tougher every year.  Could it be Buchholz?  Yeah.  Could it be Chris Sale?  Maybe.  Hisashi Iwakuma?  Quite probably.  I think it's one of the guys in Seattle, and so I'm giving it up for Hernandez here, and letting track record be the tie-breaker when everything else is so hard to separate (as you'll see in the NL section). 

AL MVP:  A debate between Miguel Cabrera, 1B, DET and Mike Trout, OF, LAA

 Obviously, this has been written about already.  And I'm not talking about last year.  I mean that this very year, we're talking about the same two players:  one, the greatest hitter in a generation, the other nearly as good, but with the baserunning and defense to make him a true five-tool player.  Chris Davis is obviously in the mix here, as well, but I'd go with one of the more proven players at this point.  Cabrera's been better so far, I think, but Trout was better last year and Cabrera won the award, so perhaps an inversion would be poetic justice.  Nonetheless, at this point, I'm going to just split the award right in two.  And perhaps next season, we can just rename the AL MVP trophy as the Cabrera/Trout Trophy.


NL Manager of the Year:  Mike Matheny, STL

I hate the Cardinals.  Unless you're a Cardinals fan, you should, too.  They're WAY too good, WAY too often.  They've completely overperformed three years running now, and I'm sick of it.  But then again, if you looked at their roster on opening day and said, "Well, that's OBVIOUSLY the best team in MLB," you were lying.  And yet, against all odds, they ARE the best team.  Kirk Gibson deserves strong consideration here, as well.  And if you're into rewarding people for doing what you expected, perhaps Fredi Gonzalez deserves the nod.  I said before the season started that the Braves were the best team in the NL.  I expected Washington wouldn't live up to what they did last year, and I'm glad to say I've been right about that one so far.  But Gonzalez has done a good job managing Atlanta, in spite of underwhelming performances from BJ Upton and Jason Heyward.

NL Rookie of the Year:  Shelby Miller, RHP, STL

Miller's been great.  Cardinals are yucky.  I have no more to say on this subject.  It's totally crazy, by the way, that I actually considered Yasiel Puig's one week of play to make him the winner in this category.  He's been that scary good.

NL Cy Young:  Clayton Kershaw, LHP, LAD

What more is there to say about Kershaw?  He's been outstanding yet again.  The Koufax comparison, which has been noted, is not as outrageous as it sounds.  Kershaw is the best pitcher in the NL for the third year running.  It's not just the run environment in LA, either.  He's third in the league in ERA+.  And while there are two players with lower ERAs than Kershaw's 1.93, he's done it in 93 innings, which is also third in the league.  If you want to give it to Adam Wainwright, He-Who-Must-Not-Give-Walks, be my guest.  If you think that Cliff Lee is again the league's best pitcher (as I did last year), more power to you.  You could pick wunderkind Matt Harvey.  If you think it's my ROY pick, I begrudge you not.  But I'm sticking with Kershaw for now.

NL MVP:  Carlos Gomez, CF, MIL

Homer pick?  No.  (As an aside, though, Milwaukee's outfield and the left side of their infield all have legitimate arguments as the best in the NL at their positions; it's just that the rest of the team has been SO dreadful that the Crew are still a last-place team.)  Gomez has been the best player in the NL this year.  Baseball-Reference has him as MLB's only 4-win player at this point in the year.  Fangraphs has him at 3.8, one tenth of a win behind Tulowitzki.  As always, Gomez has the glove and baserunning to merit high praise.  But unlike before, the bat has been there, too.  He leads the Brewers with 11 HR (yeah - he has more than Braun, Ramirez, Lucroy, or Weeks).  He's got 12 SB.  He could well have a 30-30 year - the kind of year the Mets expected when they were so desperate to hold onto him that only the greatest pitcher on the planet (Johan Santana) was worth giving him up for.  He still doesn't walk, but Gomez is a .300 hitter (thanks to an expectedly-high BABIP).  I've gotta think that, at age 27 (no surprise there), Gomez has finally put it all together and is ready to be the player he was always capable of being:  the best one in the National League.

Thanks, as always, to Baseball-Reference and Fangraphs for data and for awesome leaderboards, which helped in the construction of this post.

Got a beef with my picks?  Sound off below!

Wednesday, January 16, 2013

A Simpler Way to WAR (long post)

Since discovering Wins Above Replacement, I've had basically two ambitions.  The first was to create an uber-stat which I could use to combine peak and career weight when having a Hall of Fame discussion.  I believe I've already done that with WARSCOR, particularly in its new revision (though it's admittedly much more convoluted than the Hall of Stats or JAWS methodologies).  The second goal I've had is to make calculating WAR a simpler process.  In other words, to be able to calculate it myself quickly and efficiently.  Well, I'm here to say that, while I haven't quite "done it," I've gotten much, much closer.  I now have a very good (and reasonable, I think) way to calculate WAR for offense and for pitching.  Defense, not so much, but that's okay (and we'll explore what that might look like, if there ever were such a thing, at the end of this post).  This is a start.  It doesn't get all the way to WAR, but it does give you a way to compare offensive players to pitchers by giving them a "won-lost" record which matches up with a pitcher won-lost record - in other words, there are 162 decisions for the pitchers, but also 162 "decisions" for the hitters.

You may recall from my last post that I used ERA+ to give relievers a "record."  Well, we're not going to worry about Fibonacci wins like I did in that post (though you'd certainly be welcome to play with them, if you feel like it).  But the rest of the methodology stays the same, pretty much.  Except that now, we're going to be comparing to replacement level.  So, what is a reasonable replacement level?  How about, just picking something out of the air, .310?  If you'd like to use something different, you're welcome to it.  Just follow the same steps, only with a different product side of the equation.  We have to find out that, if a team's run prevention and run scoring are equally bad (compared to average), what would they be for a .310 winning percentage?  The equation looks like this:

(100-x)^2 / ((100-x)^2 + (100+x)^2) = 31/100

100(100-x)^2 = 31(100-x)^2 + 31(100+x)^2
69(100-x)^2 = 31(100+x)^2
69(10000-200x+x^2) = 31(10000+200x+x^2)
690000-13800x+69x^2 = 310000+6200x+31x^2
380000-20000x+38x^2 = 0

Solve that equation, and you get (roughly) 20.  Actually, using 20, you get a winning percentage of .307, but that's good enough for me.
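If you'd rather let a computer do the quadratic, here is a minimal Python sketch.  The function names are made up for illustration, and it assumes the target winning percentage is below .500.

```python
import math

def pythag_pct(scored_index, allowed_index):
    """Pythagorean winning percentage (exponent 2) from two indexes where 100 is average."""
    return scored_index ** 2 / (scored_index ** 2 + allowed_index ** 2)

def replacement_offset(target_pct):
    """Smaller root x of (100-x)^2 / ((100-x)^2 + (100+x)^2) = target_pct.

    Assumes target_pct is below .500, so the x^2 coefficient is nonzero.
    """
    # Cross-multiplying and collecting terms gives a*x^2 + b*x + c = 0.
    # For target_pct = .310 this is 38x^2 - 20000x + 380000 = 0, divided by 100.
    a = 1 - 2 * target_pct
    b = -200.0
    c = 10000 * a
    disc = b * b - 4 * a * c
    # The larger root is far above 100 and has no baseball meaning, so take the smaller one.
    return (-b - math.sqrt(disc)) / (2 * a)

x = replacement_offset(0.310)
print(round(x, 2))                               # roughly 19.74
print(round(pythag_pct(100 - 20, 100 + 20), 4))  # 0.3077 when x is rounded to 20
```

Rounding x up to 20 is what moves the winning percentage from .310 down to about .307.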

So, basically, since we have measures that can tell us how to compare players to average (where 100 is average) that are consistent (if not perfectly linearly related) to run scoring, we can actually tease out individual records from this exercise.  We want to know what a player would be like if he were on an average team.  We could actually do it with a replacement-level team instead, but average will work just as well.

So now, for pitchers, we figure out the record.  It's easy:  divide the number of innings pitched by 9.  This will be the number of decisions.  Then take 10000/(ERA+) [or, alternately, just use ERA-, which needn't be adjusted].  Take this number and insert it for "x" in this formula:

100^2 / (100^2 + x^2)

Now, multiply by the number of decisions.  This gives you a number of wins.  You can get "losses" by subtracting wins from decisions, if you felt so inclined.

So, let's look at two pitchers:  Justin Verlander in 2011 (251 IP, 172 ERA+) and Justin Verlander in 2012 (238.1 IP, 160 ERA+).

2011:
10000/172 = 58
100^2/(100^2+58^2)=.748
251/9=27.9
.748*27.9=20.9
2011 Verlander, by this method, "went" 20.9-7.0

2012:
10000/160 = 63
100^2/(100^2+63^2)=.716
238/9=26.4
.716*26.4=18.9
2012 Verlander, by this method, "went" 18.9-7.5
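The pitcher steps above can be sketched in a few lines of Python.  This is only an illustration: the function name is hypothetical, and because it carries full precision instead of rounding at each step, the results can differ from the hand arithmetic by about a tenth of a win.

```python
def pitcher_record(innings, era_plus):
    """Return (wins, losses) for a pitcher placed on a team with an average offense."""
    decisions = innings / 9              # one "decision" per nine innings
    era_minus = 10000 / era_plus         # flip ERA+ so that lower is better
    win_pct = 100 ** 2 / (100 ** 2 + era_minus ** 2)
    wins = win_pct * decisions
    return wins, decisions - wins

# Same inputs as the worked examples (238 innings for 2012, as in the arithmetic above).
for year, innings, era_plus in [(2011, 251, 172), (2012, 238, 160)]:
    wins, losses = pitcher_record(innings, era_plus)
    print(year, round(wins, 1), round(losses, 1))
```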

So, let's do hitters.  They're pretty much the same, except that we use OPS+ or wRC+, and we're swapping the numerator and denominator.  The two stats do different things:  wRC+ will include SB and some other offensive events (including sacrifices, double plays, etc.); OPS+ will only consider hitting proper.  But they're basically the same.  Still, I'll show them separately, because they're just different enough to cause a kerfuffle.  For OPS+:

First, we take batting outs (AB-H) and divide by 25.5 to get the number of "decisions."  Then, we just plug like we did last time, with OPS+ standing in for x, but this time, it looks like this:

x^2 / (x^2 + 100^2)

Then, we multiply by "decisions."  Here are two hitters, Miguel Cabrera in 2011 (572 AB, 197 H, 179 OPS+), and Miguel Cabrera in 2012 (622 AB, 205 H, 165 OPS+):

2011:
572-197=375
375/25.5=14.7
179^2/(179^2+100^2)=.762
.762*14.7=11.2
2011 Cabrera, by this method, "went" 11.2-3.5

2012:
622-205=417
417/25.5=16.4
165^2/(165^2+100^2)=.731
.731*16.4=12.0
2012 Cabrera, by this method, "went" 12.0-4.4
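Here is the OPS+ version of the same idea as a short Python sketch.  The function name and the `outs_per_decision` parameter are hypothetical; the 25.5 default is just the divisor used in the steps above.

```python
def hitter_record_ops(at_bats, hits, ops_plus, outs_per_decision=25.5):
    """Return (wins, losses) for a hitter placed on a team with average run prevention."""
    batting_outs = at_bats - hits
    decisions = batting_outs / outs_per_decision
    # For hitters the player's index goes in the numerator (higher is better).
    win_pct = ops_plus ** 2 / (ops_plus ** 2 + 100 ** 2)
    wins = win_pct * decisions
    return wins, decisions - wins

for year, at_bats, hits, ops_plus in [(2011, 572, 197, 179), (2012, 622, 205, 165)]:
    wins, losses = hitter_record_ops(at_bats, hits, ops_plus)
    print(year, round(wins, 1), round(losses, 1))
```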

-------------------
Aside:
I'm gonna take this opportunity to say a word about replacement level.  A replacement level player on an average team will be considerably better than replacement level.  That just makes sense, doesn't it?  Since we're only comparing offense or defense, and making the other average, we will get the overall to be higher than .307, which is what we used as replacement level.  By this method, a replacement level offensive player would be stuck into this formula:

80*80/(80*80+100*100)=.390

A pitcher actually has a different replacement level, for this exercise, since:
100*100/(100*100+120*120)=.410
Personally, I don't see this as any reason to really care, because no one pitches enough innings for this to even make up a full win.  If you feel differently, please feel free to do the math to normalize this discrepancy.  Otherwise, keep in mind that this is just a fun, silly exercise by a person who only took one math class in college.

As you can see, the result is not .307, but .390.  Of course, this means that, comparing, say, 2012 Miguel Cabrera to replacement level, we'd do (replacement level) * (number of "decisions"), and then subtract that number from Cabrera's own wins.  In other words:

.390*16.4=6.4
12.0-6.4=5.6 "Wins Above Replacement"

Of course, there's another alternative.  We could have put Cabrera on a team that gave up runs at a replacement-level rate, and then simply subtracted wins at a rate of .307, our initial rate.  Like this:

165*165/(165*165+120*120)=.654
.654*16.4=10.7
.307*16.4=5.0
10.7-5.0=5.7 "Wins Above Replacement"

I've been working off the first method, but if you were to work by the second method, I wouldn't begrudge you.  It's probably actually a little better.  A little cleaner for the comparison to replacement, anyway.  But it's up to you.
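For anyone who wants to compare the two approaches side by side, here is a rough Python sketch.  All of the names are hypothetical, and the defaults (80 for replacement offense, 120 for replacement run prevention, .307 for the replacement winning percentage) are simply the numbers used in this aside.  Carrying full precision lands within about a tenth of a win of the hand-rounded figures.

```python
def pythag_pct(scored_index, allowed_index):
    """Pythagorean winning percentage (exponent 2) from two indexes where 100 is average."""
    return scored_index ** 2 / (scored_index ** 2 + allowed_index ** 2)

def war_on_average_team(ops_plus, decisions, repl_offense=80):
    """First method: player and replacement hitter both play in front of average pitching."""
    player_wins = pythag_pct(ops_plus, 100) * decisions
    replacement_wins = pythag_pct(repl_offense, 100) * decisions
    return player_wins - replacement_wins

def war_on_replacement_team(ops_plus, decisions, repl_allowed=120, repl_pct=0.307):
    """Second method: player hits for a team with replacement-level run prevention."""
    player_wins = pythag_pct(ops_plus, repl_allowed) * decisions
    return player_wins - repl_pct * decisions

decisions = (622 - 205) / 25.5   # 2012 Cabrera's batting outs over 25.5
print(round(war_on_average_team(165, decisions), 2))
print(round(war_on_replacement_team(165, decisions), 2))
```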
End of Aside
-------------------

Finally, we'll look at what it looks like if you use wRC+, instead of the baseball-reference stats.  We'll use two players:  Ryan Braun in 2011 and Ryan Braun in 2012.  In this method, we look at all the outs the offensive player made, instead of just batting outs.  So the formula looks like this.

First, we figure total outs, by taking batting outs (AB-H), like before, and adding GDP, SH, SF, and CS.  Then we divide that by 27 to get "decisions."  The rest of the formula is identical to the OPS+ version.  So here's Brauny.

2011:
563-187+9+3+0+6=394
394/27=14.6
173^2/(173^2+100^2)=.750
.750*14.6=11.0
2011 Braun, by this method, "went" 11.0-3.6

2012:
598-191+12+5+0+7=431
431/27=16.0
162^2/(162^2+100^2)=.724
.724*16.0=11.6
2012 Braun, by this method, "went" 11.6-4.4
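The wRC+ version differs only in how outs are counted and in the divisor.  A minimal Python sketch, with hypothetical names, where `other_outs` holds the extra out counts in the same order they appear in the arithmetic above:

```python
def hitter_record_wrc(at_bats, hits, other_outs, wrc_plus, outs_per_decision=27):
    """Return (wins, losses) using total outs (batting outs plus the other out events)."""
    total_outs = (at_bats - hits) + sum(other_outs)
    decisions = total_outs / outs_per_decision
    win_pct = wrc_plus ** 2 / (wrc_plus ** 2 + 100 ** 2)
    wins = win_pct * decisions
    return wins, decisions - wins

braun = [
    (2011, 563, 187, (9, 3, 0, 6), 173),
    (2012, 598, 191, (12, 5, 0, 7), 162),
]
for year, at_bats, hits, other_outs, wrc_plus in braun:
    wins, losses = hitter_record_wrc(at_bats, hits, other_outs, wrc_plus)
    print(year, round(wins, 1), round(losses, 1))
```

Because nothing is rounded along the way, the output can sit a tenth off the hand-worked records.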

So, there you go.  You can see that we can, pretty easily, produce a "pitcher-like" record for an offensive player.  Obviously, it's not quite on the same scale, since even top players end up with under 20 "decisions," making comparisons difficult.  But it's still fun, I think, to look at.

So now, we get to imagination land.  How would I change this, if I could, to make it more like actual WAR?  Well, first of all, I would want a defensive system.  What we'd need to develop, of course, is a system by which we measured, basically, the number of "plays" that a player made (or perhaps better, runs saved on plays made, or whatever), relative to the expected number for his position, just as we would have for OPS+ or ERA-.  Once we have that number relative to 100, just as we do for other parts of the game, we can determine the number of decisions and then the number of wins.

Of course, it would be silly to have a number of defensive wins and losses that equalled 162, as well as a number of offensive and pitching wins and losses.  So what do we do about it?  Well, for my money, we would divide the offensive number by two, the defensive number by six, and the pitching number by three.  That would give us a much better basis for comparison.  We could actually see this already.  Take Cabrera in 2011 and Verlander that same year.  If we cut Cabrera's record in half, we get 5.6-1.8; take a third of Verlander's and we get 7.0-2.3.  Those are a lot more comparable, and would be even more so if we were able to add in Cabrera's defensive wins and losses.  We'd see something much different from what we're used to seeing.
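A quick Python sketch of that split, using hypothetical names.  One thing worth noticing is that the three shares (1/2, 1/6, 1/3) add up to one, so a team's offensive, defensive, and pitching decisions would total 162 again after scaling.

```python
# Share of a team's decisions assigned to each side of the game (from the paragraph above).
SHARE = {"offense": 1 / 2, "defense": 1 / 6, "pitching": 1 / 3}

def scale_record(wins, losses, side):
    """Scale a 162-decision-basis record down to that side's share of the game."""
    share = SHARE[side]
    return wins * share, losses * share

cabrera = scale_record(11.2, 3.5, "offense")      # 2011 Cabrera, OPS+ method
verlander = scale_record(20.9, 7.0, "pitching")   # 2011 Verlander
print([round(v, 2) for v in cabrera])
print([round(v, 2) for v in verlander])
```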

There is, of course, one problem that I must address.  If one were to implement the system I just suggested with, say, a shortstop who was slightly below average fielding and hitting (let's say a 99 in each), he would grade out as a below-average player.  However, everyone knows that a shortstop who is basically average defensively and basically average offensively is a HUGE asset.  This is why, perhaps, it would be good in creating a defensive system to compare the runs saved, not to position, but to all positions on the field.  That would make shortstops automatically very valuable, while it would make first basemen very low in value.  But that's just an idea.  As far as I know, there's no such stat out there, so maybe that's another project for me.  But I doubt it.

So, that's my big brainstorm.  If you made it this far, wow.  Just wow.  Because I'm really, really impressed.  It was a ridiculously long post.  But I hope you enjoyed it.  Suggestions?

Thanks to baseball-reference and fangraphs for the stats in today's post!

Wednesday, January 2, 2013

Happy New Year... And Hall of Fame Relievers

It's Baseball Hall of Fame season, which is typically a very active time for me on this blog.  Well, it's been a really, REALLY busy few weeks, so I haven't done as much as I'd like.  But suffice it to say that I'm disappointed that it's likely no one will get elected to the Hall this year.  I mean, I'm not the kind of person who says that Jack Morris should make the Hall of Fame.  But here's the thing:  Jack Morris was better at baseball than most people are at ANYTHING, and yet people scream and shout about how he doesn't belong.  Now, if I had a ballot, I would not vote for Morris.  I don't think he's good enough to make the Hall.  But if he did, I'd be very, very happy for him and for all the Tigers fans out there who have seen so many players who are above the Hall benchmark fail to be elected.

But anyway, today, Poz posted an article that discusses the candidates he didn't vote for, but who merit more consideration.  Well, I'm happy to say that one of the sections struck a chord with me:  the section on Lee Smith.  I thought to myself, would I vote for Lee Smith?  Now, with a ballot as crowded as this year's, the answer is "no."  But, if there were unlimited slots, would I?  I don't know.  So, I devised a way to figure it out.

People (like Poz) talk about how saves are too one-dimensional a stat.  I agree.  Especially when we're comparing people to starters (as we do in HOF voting).  So how do we account for this?  I think it's actually pretty easy.

First, we look at only two statistics for the pitcher:  ERA+ and Innings Pitched.  Normally, I'm more of a fan of ERA-, but I'll use the more commonly-known baseball-reference stat (speaking of which:  all stats courtesy of that wonderful site).  And I use Batters Faced for most of the silly little things I do with pitchers, but in this case, IP is necessary.

Anyway, we first convert ERA+ to ERA-, which is easy, and necessary.  ERA+ measures how much higher the league ERA was than the pitcher's (adjusted for ballpark).  What we need to know is the inverse (in other words, how much lower was the pitcher's ERA than the league, adjusted for ballpark).  Here it is:
10000/ERA+
It's that easy.  So we then have that number.  And we'll figure out a Pythagorean winning percentage, based on an average offense.  It looks like this:
100^2/(ERA-^2+100^2)
Now, we have a winning percentage.  Let's keep that in our back pockets.

Next, we take the innings pitched, and we divide by nine.  Why?  Because, roughly every nine innings, there's a decision.  Look at individual pitchers (starters, preferably), if you want.  Divide their career innings by nine.  Usually, you'll find that they have roughly nine times as many innings pitched as decisions.  If that's not good enough proof for you, go ahead and pick a random team in history.  Divide their number of Innings Pitched by the number of Games Played.  You will usually find that the answer hovers between 8.8 and 9.2 - which is good enough for me to just call it nine.

So anyway, we now have a number of "decisions" and a "winning percentage."  Now, just multiply them together.  That gives us a number of "pitcher wins" for these players who usually don't really have those to look at!

This gives us a nice starting point, actually.  But we can go a step further, of course.  We simply take the decisions, and subtract the wins.  That gives us losses, because that's important to know, too.  Then, we use one of my favorite Bill James tools:  Fibonacci wins.  We take:
Wins*Winning%+(Wins-Losses).  This helps us account for both the raw total of wins and the percentage of the time the player won.

Anyway, I did this for eleven relievers, who are considered among the best of all-time.  Why eleven?  Because these are the eleven relievers who are either in the Hall of Fame, or for whom I have heard a Hall of Fame argument.  Here they are, presented with their "record," as well as Fibonacci wins (and ordered by the latter).

Hoyt Wilhelm:  171.2-79.2; 209.1
Dennis Eckersley:  209.4-155.6; 173.9
Mariano Rivera:  109.7-25.8; 172.6
Goose Gossage:  123.3-77.7; 121.3
Billy Wagner:  78.0-22.3; 116.4
John Franco:  90.8-47.7; 102.6
Rollie Fingers:  111.6-77.5; 99.9
Lee Smith:  91.0-52.2; 96.6
Dan Quisenberry:  78.9-37.0; 95.6
Trevor Hoffman:  80.5-40.5; 93.6
Bruce Sutter:  75.1-40.6; 83.3
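As a check on the Fibonacci wins formula, here is a small Python sketch that recomputes the last number on a few of the lines above from the "record" alone.  The function name is hypothetical, and since the records are already rounded to a tenth, the recomputed values can be off by a tenth.

```python
def fibonacci_wins(wins, losses):
    """Wins * Winning% + (Wins - Losses), as described above."""
    win_pct = wins / (wins + losses)
    return wins * win_pct + (wins - losses)

# Records copied from the list above.
records = {
    "Hoyt Wilhelm": (171.2, 79.2),
    "Dennis Eckersley": (209.4, 155.6),
    "Lee Smith": (91.0, 52.2),
}
for name, (wins, losses) in records.items():
    print(name, round(fibonacci_wins(wins, losses), 1))
```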

Obviously, this is overly simplistic.  It takes a lot to say that you can boil things down to one number (as much as we all try to do it).  But at the end of the day, when it comes to the Hall of Fame, there are only two options:  in or out.  That's a binary decision.  Binaries are numbers.  So you have to be able to put a number on it.  And this is a pretty good place to start, if you ask me.

As you can tell, innings pitched is skewed for Eckersley because of his years as a starter.  But so what?  He did that pitching, as well.  And when you factor it all in, he's roughly as good as Mariano, which sounds about right to me.  Wilhelm's HUGE number of innings keeps him at the top of the group, which sounds about right to me.  And frankly, I'm not sure if I could vote for anyone below Mariano - the gap seems to be in roughly the 150 Fibonacci win area.

But, back to the topic at hand, which is Lee Smith.  Fingers' induction has been much-maligned by many people.  But seeing Fingers, Wagner, and John Franco ahead of Lee Smith makes me fairly certain of this much:  I don't think I could vote for him.  He deserves to be remembered, so, like Jack Morris, I would never begrudge his election.  But, also like Morris, I just don't think the Hall of Fame is big enough to include not only Lee Smith, but all of the players who were better or roughly his equal.  I just don't think anyone wants a Hall of Fame with 10 relief pitchers - not yet, anyway.  Maybe in another 50 years, but not right now.  And if Smith is still one of the 10 best relievers of all-time in 50 years, then we can talk about it.  But for now, it's a no.