Thursday, October 4, 2012

Win Estimators, or Why Baseball and Football are Different

What would it take for you to believe that a Major League Baseball team went undefeated? I mean in terms of runs scored and allowed. If those were the only two pieces of information you had, what would they need to look like for you to believe a season was undefeated?  For me, it would take something pretty incredible, like allowing no runs for the entire season (or maybe 10).  Because even if a team scored 4000 runs but gave up 500... well, don't you think it's possible that they'd lose, I don't know, 2 or 3 games?  The basic Pythagorean formula (runs scored squared, divided by the sum of runs scored squared and runs allowed squared) says a team like that would go 159.5-2.5 (I guessed two or three games before doing the calculation, by the way, so I'm pretty proud of that guess).  And that seems about right, doesn't it?  Even with a run differential that big, you'd still expect to lose a game or two.  Which, in the scheme of a 162-game season, is nothing.  But the point is, the run differential it would take to believe in an undefeated baseball season is astronomical.  This hypothetical team, which wins its average game about 25-3, could still lose 12-11 two or three times during the year, right?

But football is fundamentally different, because its season is a much smaller sample: 16 games instead of 162.  For example, if I told you a team scored 400 points on the season and gave up only 50, well, you'd assume they went undefeated.  And you'd probably be right.  The Pythagorean formula agrees with this one: it predicts this team to go 15.75-0.25.  So yeah, they'd probably go undefeated.

But what if we double their points allowed?  What if they scored 400 and gave up 100?  That's an average score of about 25-6, but would they go undefeated?  The Pythagorean formula says they'd go 15-1.  My guess is that, in an NFL where the average team scores 22 points per game (close to the historical average, and just behind the 2011 average of 22.2), that's probably about right.  Frankly, in a league where an average team scores 22 ppg, 400 points isn't that many (an average team would score 352).  So we'd expect the offense to fail once in the season, even if the defense is tough.
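
For anyone who wants to check the arithmetic in these hypotheticals, here is a minimal Python sketch of the basic Pythagorean formula with an exponent of 2. The function name pythag_wins and its parameters are my own labels for illustration, not anything official.

```python
def pythag_wins(scored, allowed, games, exponent=2.0):
    """Expected wins from the basic Pythagorean formula.

    win% = scored^x / (scored^x + allowed^x), scaled by games played.
    """
    s = scored ** exponent
    a = allowed ** exponent
    return games * s / (s + a)


# The three hypothetical teams from the text: (label, scored, allowed, games)
examples = [
    ("Baseball, 4000-500", 4000, 500, 162),
    ("Football, 400-50", 400, 50, 16),
    ("Football, 400-100", 400, 100, 16),
]

for label, scored, allowed, games in examples:
    wins = pythag_wins(scored, allowed, games)
    print(f"{label}: {wins:.2f}-{games - wins:.2f}")
```

The three results should come out to roughly 159.5-2.5, 15.75-0.25, and 15.06-0.94, matching the records quoted above.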

Anyway, why are we talking about this?  I mean, who cares?

Well, I do.  Because here are some real numbers.  I'm going to list the team, their points scored/allowed, Pythag. record, and then actual record.  Here goes:
2007 Patriots - 589-274; 13.8-2.2; 16-0
1985 Bears - 456-198; 14.1-1.9; 15-1
1998 Vikings - 556-296; 13.1-2.9; 15-1
1972 Dolphins - 385-171; 12.2-1.8; 14-0
2008 Lions - 268-517; 2.8-13.2; 0-16
1976 Buccaneers - 125-412; .8-13.2; 0-14
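
A side note on the Pythagorean column: plugging these point totals into the formula with an exponent of exactly 2 gives lower win totals than the ones listed (about 13.2 for the 2007 Patriots, for instance). The listed figures line up with an exponent of about 2.37, a value commonly cited for football. That exponent is an inference from the arithmetic, not something stated in this post. A quick Python sketch under that assumption (pythag_wins is just an illustrative name):

```python
def pythag_wins(scored, allowed, games, exponent=2.37):
    """Pythagorean expected wins.

    ASSUMPTION: exponent=2.37 is inferred from the listed records;
    the post itself describes the formula as "squared".
    """
    s = scored ** exponent
    a = allowed ** exponent
    return games * s / (s + a)


# (team, points scored, points allowed, games played), copied from the list above
teams = [
    ("2007 Patriots", 589, 274, 16),
    ("1985 Bears", 456, 198, 16),
    ("1998 Vikings", 556, 296, 16),
    ("1972 Dolphins", 385, 171, 14),
    ("2008 Lions", 268, 517, 16),
    ("1976 Buccaneers", 125, 412, 14),
]

for name, scored, allowed, games in teams:
    wins = pythag_wins(scored, allowed, games)
    print(f"{name}: {wins:.1f}-{games - wins:.1f}")
```

Whichever exponent you pick, the formula only reaches a perfect or winless record when one of the two point totals is zero.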

What you see here is that it's basically impossible, by the Pythagorean formula, to ever expect an undefeated season in the NFL... or a winless one.  The reason is a quirk of the formula: PSsq/(PSsq+PAsq) yields a winning percentage of 0 only if the team scores no points, and 100 percent only if the team allows no points.  But the truth is, teams do go undefeated.  So it makes no sense to use a quadratic formula when we know that football doesn't quite work that way.

So what do I suggest we do about this?  Well, it's a pretty easy solution, actually.  You go linear.  And how does one do that?  Like so:
Use the information we already have.
Figure out the number of points/game.
That's all you need.
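Take the team's point differential.  Divide by the league's combined points per game, meaning the points scored by both teams in an average game (2 × the average team's ppg).  Add the result to half the number of games in a season.  That's it.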

For example, in 2007, NFL teams combined to score 11,104 points.  If we divide that by 32 (number of teams), then by 16 (number of games), and then multiply by two (because two teams play in each game), we get 43.375 combined points per game.  The Patriots that year had a point differential of 589-274 = 315 points.  315/43.375 = 7.26 wins, plus 8 (a half-season's worth) = 15.26 wins.  So, by my formula, we'd expect the 2007 Patriots to have gone 15.3-0.7, which is much closer to their actual record of 16-0 than the Pythagorean expectation, which gave them fewer than 14 wins (13.8).
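
Here is the same calculation as a short Python sketch, in case you want to try it on other teams. The function names are my own, made up for illustration; the only inputs are the league's total points, the number of teams and games, and the team's points scored and allowed.

```python
def league_points_per_game(total_points, teams, games_per_team):
    """Average combined points scored by both teams in one game."""
    # total_points / teams / games_per_team is the per-team average;
    # doubling it gives the combined total for a single game.
    return total_points / teams / games_per_team * 2


def linear_wins(scored, allowed, games, points_per_game):
    """Linear win estimate: half the schedule plus differential / combined ppg."""
    return games / 2 + (scored - allowed) / points_per_game


# 2007 NFL: 11104 total points, 32 teams, 16 games each
ppg_2007 = league_points_per_game(11104, 32, 16)

# 2007 Patriots: 589 points scored, 274 allowed
wins = linear_wins(589, 274, 16, ppg_2007)
print(f"Combined points per game: {ppg_2007}")
print(f"2007 Patriots: {wins:.1f}-{16 - wins:.1f}")
```

Note that nothing in this sketch keeps the estimate between zero and the number of games played, so a bad enough team can come out with negative expected wins.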

Here are the expectations for the other teams I mentioned:
1985 Bears - 14.0-2.0
1998 Vikings - 14.1-1.9
1972 Dolphins - 12.3-1.7
2008 Lions - 2.3-13.7
1976 Buccaneers - -0.4-14.4

Yes, that is a negative expectation of wins for the 1976 Buccaneers.  They were that bad.  In every case, this linear method comes closer to the team's actual record (for the Vikings, it's one full win closer!), except the 1985 Bears, whom my method misses by 0.1 wins more than the classic way of doing it.  Frankly, I don't really see how anyone could use the Pythagorean method when one could do this, which is just as easy, works the same for middle-of-the-pack teams, and works significantly better for teams at the extremes.
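
To make that comparison easy to eyeball, here is a small Python sketch that lines up how far each method misses the actual win total. Every number is copied from the lists in this post; the compare helper is just an illustrative name.

```python
# (team, actual wins, Pythagorean wins, linear wins), all taken from the post
estimates = [
    ("2007 Patriots", 16, 13.8, 15.3),
    ("1985 Bears", 15, 14.1, 14.0),
    ("1998 Vikings", 15, 13.1, 14.1),
    ("1972 Dolphins", 14, 12.2, 12.3),
    ("2008 Lions", 0, 2.8, 2.3),
    ("1976 Buccaneers", 0, 0.8, -0.4),
]


def compare(rows):
    """Return (team, pythag_miss, linear_miss), each an absolute miss in wins."""
    return [
        (team, round(abs(actual - pythag), 1), round(abs(actual - linear), 1))
        for team, actual, pythag, linear in rows
    ]


for team, pythag_miss, linear_miss in compare(estimates):
    closer = "linear" if linear_miss < pythag_miss else "Pythagorean"
    print(f"{team}: Pythagorean off by {pythag_miss}, "
          f"linear off by {linear_miss} ({closer} is closer)")
```

With these six teams, the Bears should be the only row where the Pythagorean estimate is closer.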
