The 'ten runs = 1 win' rule of thumb

  • The 'ten runs = 1 win' rule of thumb

    There's a good article by Phil Birnbaum giving a common-sense explanation why an additional ten runs in a team's record changes one loss to a win. You can access it from tangotiger's insidethebook.com website if you're interested.

    I've been fooling around with the Pythagorean formula and some other predictors, and I got a surprising result with simple run differential as a predictor: Runs - Runs Against.

    I pulled out all 154-game seasons (actually 152-156) and regressed WL% against the differential (D) and got

    WL% = .500 + .000653D

    with an R-square of 91.1% and a standard error for the slope of .02912.

    In other words, with a differential of 0, R = RA, you start off at .500, of course, and then an increase of one run in the differential is worth an increase of .000653 in WL%. If you multiply .000653 times 154, you get .100562. So if you have an increase of 10 runs, you get 1.00562 more wins.

    That R-square isn't great, but it's about the same as the 1.81 exponent version of the Pythagorean predictor, 91.8%.
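    The two predictors can be compared side by side in a few lines. This is only a sketch: the (R, RA) pairs are made-up team totals, and the coefficients are the fitted values quoted above.

```python
# A sketch comparing the fitted line above with the 1.81-exponent
# Pythagorean predictor.  The (R, RA) pairs are made-up team totals.

def linear_wpct(r, ra):
    """WL% = .500 + .000653 * (R - RA), the regression above."""
    return 0.500 + 0.000653 * (r - ra)

def pythagorean_wpct(r, ra, exponent=1.81):
    """Pythagorean expectation with a 1.81 exponent."""
    return r**exponent / (r**exponent + ra**exponent)

games = 154
for r, ra in [(700, 650), (800, 700), (650, 700)]:
    lin = games * linear_wpct(r, ra)
    pyth = games * pythagorean_wpct(r, ra)
    print(f"R={r} RA={ra}: linear {lin:.1f} wins, Pythagorean {pyth:.1f}")
```

    For mid-range differentials the two agree to within about a win, which is consistent with their nearly identical R-squares.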

    Regressing on wins themselves spells it out:

    W = 76.6 + .100D

    With the same number of runs as runs against, you'd expect to win half your games, about 77, and each additional run adds a tenth of a game.

    For 162-game seasons (160-164), the R-squares are a little smaller, 87.8%, but the equations make just as much sense:

    WL% = .500 + .000644D (.000644*162 = .104328)

    W = 80.9 + .105D

    The unexpected support for the 10-run-1-win rule is just icing. What surprised me was how well an utterly simple-minded linear expression explained data that we are pretty sure is not linear, over a wide range of eras, conditions, team quality, and run averages. If a rough back-of-the-envelope approximation can do this well, it sets the bar pretty high for formulas that actually claim to explain something.
    Indeed the first step toward finding out is to acknowledge you do not satisfactorily know already; so that no blight can so surely arrest all intellectual growth as the blight of cocksureness.--CS Peirce

  • #2
    Are you saying that 10 runs equals 1 win regardless of the run environment? I don't see a run environment in your data. If a team scores 700 and allows 700 in 154 games then is 10 runs added worth the same as a team that scores 800 and allows 800, or scores 600 and allows 600?

    In my first metric 14 years ago I used runs per team per game X 2 to be my standard unit of production, so in a league with 4.25 runs per game, 8.5 would be a "game worth of runs" even though it didn't quite produce a net win.

    I'll point out a few interesting things:

    1) I believe you'll find that a run "saved" is worth a little more than a run "added". Look at the effect of a team that scores 700 and allows 700, and then save them 100 runs allowed, and give them 100 runs added.
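    Point (1) can be checked directly under a Pythagorean model. A minimal sketch, assuming a 1.81 exponent and the 700/700 team suggested above:

```python
# Sketch of point (1): under a Pythagorean model (1.81 exponent,
# hypothetical 700-run/700-run team over 154 games), 100 runs
# prevented are worth slightly more than 100 runs added.

def pyth_wins(r, ra, games=154, exponent=1.81):
    """Expected wins from a Pythagorean model."""
    return games * r**exponent / (r**exponent + ra**exponent)

base  = pyth_wins(700, 700)   # R = RA baseline: 77.0 wins
saved = pyth_wins(700, 600)   # 100 fewer runs allowed
added = pyth_wins(800, 700)   # 100 more runs scored

print(f"base {base:.2f}, 100 saved {saved:.2f}, 100 added {added:.2f}")
```

    The saved-runs team comes out ahead of the added-runs team by roughly a win and a half, so the asymmetry is real but modest at this scale.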

    2) Does a team do better if they save 154 runs concentrated into a few games than if they are spread out? Take a team that scores 450 and allows 350 in 77 games, and then scores and allows 350 in 77 games. Compare that team to one that scores 800 and allows 700 over 154 games.

    3) Even though it takes more runs to add a win to a good team, those wins may have a higher marginal value in terms of getting to the post season. It might take 11 runs to add a win to a 90 win team, but that win is more likely to get that team to the post season than adding a win to an 81 win team.

    Comment


    • #3
      Here is the article. Thought it would be nice for everybody to have access to it right away. First time it has made sense in a practical way to me!

      http://sabermetricresearch.blogspot....als-1-win.html
      1885 1886 1926 1931 1934 1942 1944 1946 1964 1967 1982 2006 2011

      1887 1888 1928 1930 1943 1968 1985 1987 2004 2013

      1996 2000 2001 2002 2005 2009 2012 2014 2015


      The Top 100 Pitchers In MLB History
      The Top 100 Position Players In MLB History

      Comment


      • #4
        Thank you, Matthew C.

        Brett:

        No, the only environmental consideration was games played.

        Let me point out that if someone asked me for a formula that would account for 90% of the variation in w-l from 1904 to the present, I would not in a thousand years come up with

        W = .5G + .1D.

        It's too crude to work, and it is mathematically all wrong for reasons you point out:

        Saving a run is more productive than gaining a run because R/(R-X) > (R+X)/R. This is actually a special case of the run environment issue.

        No doubt differences in run distribution will affect w-l with differential held constant.

        For what it's worth, the result for teams from 1961 to 1972 is WL% = .5 + .0007D, and for teams from 1995 to 2005, WL% = .5 + .000613D

        In terms of wins, it's W = 80.43 + .113D and = 80.94 + .099D respectively, with R-squares around .89 for all. Pretty close, for extremes.

        I should add, and you'll have to take my word for this unless you do it yourself, that the regressions look pretty good. First of all, they're unbiased, and the residuals are pretty normally distributed.

        If you look at the run differentials for the high and low scoring periods, the means of course are about the same, roughly zero. But the low scoring period is highly skewed, with a median of 14.5, while the median for the high scoring period is .5 runs. The minima are about the same, around -330, but the maximum for the low scoring period is 262, for high scoring, 309. The low scoring standard deviation is 106 runs, the high scoring 115.

        These disparities, plus, as the Birnbaum article indicates, the uselessness of most extra runs, may explain why the rule of thumb holds as well as it does.

        As I said, it's a quick and filthy procedure, and I wouldn't bother refining it.

        Its interest and value to me is as a benchmark for theoretically motivated formulas. If it can stand beside the refined Pythagorean formula, then that doesn't speak well for the latter, because we know the former is theoretically unsound and depends for its good results on the behavior of the data within that range.

        In other words, if a theory can't cover the data with significantly better than 90% R square, we don't really need it, do we?
        Last edited by Jackaroo Dave; 05-02-2012, 08:44 PM.

        Comment


        • #5
          Keep in mind that WAR does not actually convert 10 runs into 1 win. WAR takes a player's real runs added (which don't appear anywhere) and then converts them to a standard environment where 10 runs happens to be worth very close to 1 win for an average team. The "runs" you see in WAR are therefore "standardized" runs. Is it possible to find which run setting would create a perfect .5G + .1D = W formula?
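          The question of which run setting makes .5G + .1D = W come out exactly can be explored numerically. A sketch under a Pythagorean model with a 1.81 exponent (illustrative only; these are not anyone's official WAR constants):

```python
# Sketch: in what run environment is one extra run worth exactly
# 0.1 wins for an average (R = RA) team?  Assumes a Pythagorean
# model with a 1.81 exponent over a 162-game season.

def marginal_wins_per_run(rpg, games=162, exponent=1.81):
    """Finite-difference value of one extra run for an R = RA team."""
    total = rpg * games                      # season runs scored = allowed
    def wins(r, ra):
        return games * r**exponent / (r**exponent + ra**exponent)
    return wins(total + 1, total) - wins(total, total)

# Bisect on runs per team per game until one run is worth 0.1 wins.
lo, hi = 2.0, 8.0
for _ in range(60):
    mid = (lo + hi) / 2
    if marginal_wins_per_run(mid) > 0.1:     # runs too valuable: env too low
        lo = mid
    else:
        hi = mid
print(f"~{(lo + hi) / 2:.2f} runs per team per game")
```

          The answer lands near 4.5 runs per team per game, i.e. about 9 total runs per game, which is close to real run environments and helps explain why the 10-runs-per-win constant works so often.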


          An advantage of the run to wins added formula over a Pythagorean formula is that runs are quantized. You either get one or none. The Pythagorean formula assumes that the run variable is continuous, which it nearly becomes over a long period of time.

          Anyway as I wrote earlier, I don't think we need to convert runs to wins. We just need to convert runs to "game equivalents". In 1997 for example there were 4.77 runs per game per team, or 9.54 runs per game.
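          The conversion described here is just a division. A short sketch using the 1997 figures quoted above:

```python
# Sketch of the "game equivalents" conversion: divide a run total by
# twice the league's runs per team per game (the 1997 figure quoted
# above: 4.77 per team, 9.54 per game).

runs_per_team_game = 4.77
game_worth_of_runs = 2 * runs_per_team_game   # 9.54 runs ~ one game of scoring

def game_equivalents(runs, unit=game_worth_of_runs):
    """Convert a run total into 1997 'game equivalents'."""
    return runs / unit

# e.g. a player 50 runs above average contributed ~5.2 game equivalents
print(f"{game_equivalents(50):.2f}")
```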

          Comment


          • #6
            Originally posted by brett View Post
            Keep in mind that WAR does not actually convert 10 runs into 1 win. WAR takes a player's real runs added (which don't appear anywhere) and then converts them to a standard environment where 10 runs happens to be worth very close to 1 win for an average team. The "runs" you see in WAR are therefore "standardized" runs. Is it possible to find which run setting would create a perfect .5G + .1D = W formula?


            An advantage of the run to wins added formula over a Pythagorean formula is that runs are quantized. You either get one or none. The Pythagorean formula assumes that the run variable is continuous, which it nearly becomes over a long period of time.

            Anyway as I wrote earlier, I don't think we need to convert runs to wins. We just need to convert runs to "game equivalents". In 1997 for example there were 4.77 runs per game per team, or 9.54 runs per game.
            James does this over and over in his Historical Abstract. This is his favorite hammer when he tries to convince you someone from the 1960s was a better hitter than someone from the 1930s. So and so put up as many runs as the average team would expect to score in 20 or 30 or whatever games. It makes across-era comparisons a little easier to make.
            Your Second Base Coach
            Garvey, Lopes, Russell, and Cey started 833 times and the Dodgers went 498-335, for a .598 winning percentage. That’s equal to a team going 97-65 over a season. On those occasions when at least one of them missed his start, the Dodgers were 306-267-1, which is a .534 clip. That works out to a team going 87-75. So having all four of them added 10 wins to the Dodgers per year.
            http://www.youtube.com/watch?v=p5hCIvMule0

            Comment


            • #7
              Originally posted by Matthew C. View Post
              Here is the article. Thought it would be nice for everybody to have access to it right away. First time it has made sense in a practical way to me!

              http://sabermetricresearch.blogspot....als-1-win.html
              Thanks for the link!

              I follow his logic and it's clever, I have to admit. The only flaw is that it misses the point regarding the 22.5% of games decided by 1 run (and conversely the 77.5% which aren't), and thus the importance of 10 runs.

              The more teams score on average, the less likely that games will be decided by 1 run margins. This can be shown casually by just looking at the 1-run stats in base-ref by league by season comparing high scoring to low scoring environments.

              However, simple logic will do the same: in an extremely low scoring environment (think soccer), where teams often don't score at all, the most typical wins will be 1-run differences. Typical scores would be 1-0, 1-1, or 2-0, since each score would by itself be rare.

              In extremely high scoring environments, (think junior high baseball), scoring would occur often and in bunches. The odds of a game being decided by 1-run are drastically reduced the more runs that are being scored. Scores like 15-14, 13-12, 18-17, would be flukes, not common.

              What the writer of the article said is true only to the extent that total scoring already approximates 10 runs per game (it actually averaged slightly below that in the 1990s, the period he chose).
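              The scoring-level argument can be made concrete with a crude model: treat the two teams' scores as independent Poisson draws (ignoring ties, extra innings, and in-game strategy) and compute the exact chance of a one-run margin at different scoring rates.

```python
# Crude model of one-run-game frequency vs. scoring level: both
# teams' scores are independent Poisson draws (this ignores ties,
# extra innings, and strategic play, so it's only illustrative).
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def one_run_game_prob(lam, max_runs=60):
    """P(|X - Y| = 1) for independent X, Y ~ Poisson(lam)."""
    return 2 * sum(poisson_pmf(k, lam) * poisson_pmf(k + 1, lam)
                   for k in range(max_runs))

for lam in (1.0, 4.5, 9.0):
    print(f"{lam:.1f} runs/team/game: "
          f"P(one-run game) = {one_run_game_prob(lam):.3f}")
```

              Even in this toy model the one-run probability falls steadily as the scoring rate rises, matching the soccer-versus-junior-high intuition above.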

              The other issue of course is that there's an underlying assumption that teams act the same way when they are ahead by 10 runs, tied or behind by 10 runs. Since this is pretty much not true (teams do play differently), one can't completely use the scoring stats to determine the 'real' value of 10 runs. This isn't to say that players 'play' differently, although to some extent, stats will tell us that some do. For certain, however, managers do 'manage' differently in how they use the bullpen, the starters, what calls they make and what choices they exhibit.
              Last edited by drstrangelove; 05-23-2012, 02:35 PM.
              "It's better to look good, than be good."

              Comment
