There's a good article by Phil Birnbaum giving a common-sense explanation of why ten additional runs in a team's run totals turn one loss into a win. You can find it through tangotiger's insidethebook.com website if you're interested.
I've been fooling around with the Pythagorean formula and some other predictors, and I got a surprising result using simple run differential (Runs - Runs Against) as a predictor.
I pulled out all 154-game seasons (actually 152-156 games) and regressed WL% on the differential (D), which gave
WL% = .500 + .000653D
with an R-square of 91.1% and a standard error of the estimate of .02912, or about 4.5 wins over 154 games.
In other words, with a differential of 0 (R = RA) you start at .500, of course, and each additional run of differential adds .000653 to WL%. Multiplying .000653 by 154 games gives .100562 wins per run, so an increase of 10 runs is worth 1.00562 more wins.
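The conversion from a WL% slope to wins per run, and from there to runs per win, is just multiplication and a reciprocal. Here is a small sketch of that arithmetic; the function names are illustrative, and the only numbers used are the slope and season length from the post.

```python
# Convert a regression slope (WL% per run of differential) into
# wins per run and runs per win over a season of `games` games.
def wins_per_run(slope: float, games: int) -> float:
    """Extra wins per extra run of differential over a season."""
    return slope * games


def runs_per_win(slope: float, games: int) -> float:
    """Runs of differential needed for one extra win."""
    return 1 / wins_per_run(slope, games)


if __name__ == "__main__":
    per_run = wins_per_run(0.000653, 154)
    print(f"{per_run:.6f} wins per run")            # prints 0.100562 wins per run
    print(f"{10 * per_run:.5f} wins per 10 runs")   # prints 1.00562 wins per 10 runs
    print(f"{runs_per_win(0.000653, 154):.2f} runs per win")  # prints 9.94 runs per win
```

The same call with a slope of .000644 and 162 games gives .104328 wins per run, or roughly 9.6 runs per win.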
The R-square of 91.1% isn't great, but it's about the same as the 91.8% for the 1.81-exponent version of the Pythagorean predictor.
Regressing wins themselves on D spells it out:
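For side-by-side comparison, here is a sketch of the two predictors: the Pythagorean form R^x / (R^x + RA^x) with the 1.81 exponent, and the linear form .500 + .000653D from the 154-game fit. The run totals in the demo loop are hypothetical values chosen only to show how the two track each other; they are not data from the post.

```python
# Pythagorean and linear run-differential predictors of winning percentage.
def pythagorean_pct(runs: float, runs_against: float, exponent: float = 1.81) -> float:
    """Pythagorean winning percentage: R^x / (R^x + RA^x)."""
    r = runs ** exponent
    ra = runs_against ** exponent
    return r / (r + ra)


def linear_pct(runs: float, runs_against: float, slope: float = 0.000653) -> float:
    """Linear predictor .500 + slope * (R - RA), default slope from the 154-game fit."""
    return 0.500 + slope * (runs - runs_against)


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only.
    for r, ra in [(700, 700), (750, 650), (800, 600)]:
        print(r, ra, round(pythagorean_pct(r, ra), 3), round(linear_pct(r, ra), 3))
```

Near R = RA the two agree closely; they drift apart as the differential grows, because the Pythagorean form is bounded between 0 and 1 while the linear one is not.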
W = 76.6 + .100D
With the same number of runs as runs against, you'd expect to win half your games, about 77, and each additional run adds a tenth of a game.
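Both fits are ordinary least-squares regressions of one variable (WL% or wins) on D. A minimal sketch in plain Python follows; `fit_linear` and its returned field names are illustrative, not anything from the post, and you have to supply the team-season data yourself, since the post doesn't say where its data came from. It reports both the standard error of the estimate and the standard error of the slope, which are easy to confuse in regression output.

```python
# Ordinary least squares for y = a + b*D, with R-square and standard errors.
from math import sqrt


def fit_linear(ds: list[float], ys: list[float]) -> dict[str, float]:
    """Fit y = a + b*D by least squares and return summary statistics."""
    n = len(ds)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_d = sum(ds) / n
    mean_y = sum(ys) / n
    sxx = sum((d - mean_d) ** 2 for d in ds)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(ds, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_d
    sse = sum((y - (intercept + slope * d)) ** 2 for d, y in zip(ds, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    mse = sse / (n - 2)  # residual variance with n - 2 degrees of freedom
    return {
        "intercept": intercept,
        "slope": slope,
        "r_square": 1 - sse / sst,
        "std_error": sqrt(mse),               # standard error of the estimate
        "slope_std_error": sqrt(mse / sxx),   # standard error of the slope
    }
```

Pass each team's D as `ds` and its WL% (or its wins) as `ys`; the WL% version corresponds to the first equation and the wins version to the second.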
For 162-game seasons (160-164 games), the R-square is a little smaller, 87.8%, but the equations make just as much sense:
WL% = .500 + .000644D (.000644*162 = .104328)
W = 80.9 + .105D
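As a quick consistency check, multiplying each WL% equation through by the season length should roughly reproduce the separately fitted wins equation. This sketch does that using only the coefficients quoted above; the `FITS` table and the function name are illustrative.

```python
# Scale each WL% equation by season length and compare with the fitted W equation.
FITS = {
    # games: (WL% slope, fitted W intercept, fitted W slope), as quoted in the post
    154: (0.000653, 76.6, 0.100),
    162: (0.000644, 80.9, 0.105),
}


def implied_wins_equation(games: int, pct_slope: float) -> tuple[float, float]:
    """Multiply WL% = .500 + b*D by G to get W = G/2 + G*b*D."""
    return 0.500 * games, pct_slope * games


if __name__ == "__main__":
    for games, (pct_slope, w_int, w_slope) in FITS.items():
        implied_int, implied_slope = implied_wins_equation(games, pct_slope)
        print(f"{games} games: implied W = {implied_int:.1f} + {implied_slope:.4f}D"
              f" vs fitted W = {w_int:.1f} + {w_slope:.3f}D")
```

The implied equations are W = 77.0 + .1006D and W = 81.0 + .1043D, within half a win of the fitted intercepts and within a thousandth of a win per run of the fitted slopes.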
The unexpected support for the 10-run-1-win rule is just icing. What surprised me was how well an utterly simple-minded linear expression explained data that we are pretty sure is not linear, over a wide range of eras, conditions, team quality, and run averages. If a rough back-of-the-envelope approximation can do this well, it sets the bar pretty high for formulas that actually claim to explain something.