Pitching Pythagorean W-L% (?)

  • Tyrus4189Cobb
    Gator wrastler
    • May 2007
    • 5586

    Pitching Pythagorean W-L% (?)

    I'm just sort of thinking aloud here. I really enjoy Pythagorean W-L% because it tends to be surprisingly accurate. Furthermore you can use the same equation in 1889 or 1922 or 1975 or 2010 unlike the nonsense that accompanies linear weights.

    Is there something similar out there (or possible) that could be computed for a pitcher given his run support and runs allowed? You could go a step further and adjust for team defense, but I just want to know if there's an empirical way to calculate the base % using only runs.
    "Allen Sutton Sothoron pitched his initials off today."--1920s article
  • Pere
    One to Cross 'Em Up
    • Oct 2007
    • 2515

    #2
    You mean to project a pitcher's W/L record?

    The difference, obviously, is that pitchers typically appear in a number of games that 'don't count' toward their personal records--a team's RS and RA are always in 'decisions.'

    Still, what happens if one just does the math anyway? Presumably the correlation is reduced in the later years of shorter starts and specialty pitching.
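A minimal sketch of "just doing the math anyway", assuming the usual team formula with an exponent of 2 is applied unchanged to a pitcher's run support and runs allowed. The function name is made up for illustration, and the 108/76 inputs are the Ben Sheets 2008 figures quoted later in this thread.

```python
def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("need at least one run scored or allowed")
    return rs / (rs + ra)


if __name__ == "__main__":
    # 108 runs of support, 76 runs allowed (figures quoted later in the thread)
    print(round(pythagorean_wpct(108, 76), 3))  # 0.669
    # Equal runs for and against always give .500
    print(pythagorean_wpct(100, 100))  # 0.5
```

Whether the exponent of 2 is the right one for a single pitcher's innings is an open question here; it is exposed as a parameter so it can be tuned.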

    • Bothrops Atrox
      IDC/ZRC/NJC*/*
      • Feb 2005
      • 31771

      #3
      Originally posted by Tyrus4189Cobb
      I'm just sort of thinking aloud here. I really enjoy Pythagorean W-L% because it tends to be surprisingly accurate. Furthermore you can use the same equation in 1889 or 1922 or 1975 or 2010 unlike the nonsense that accompanies linear weights.

      Is there something similar out there (or possible) that could be computed for a pitcher given his run support and runs allowed? You could go a step further and adjust for team defense, but I just want to know if there's an empirical way to calculate the base % using only runs
      Tom Tango has done something like that over at The Book.
      1885 1886 1926 1931 1934 1942 1944 1946 1964 1967 1982 2006 2011

      1887 1888 1928 1930 1943 1968 1985 1987 2004 2013

      1996 2000 2001 2002 2005 2009 2012 2014 2015


      The Top 100 Pitchers In MLB History
      The Top 100 Position Players In MLB History

      • drstrangelove
        Registered User
        • Apr 2011
        • 3765

        #4
        Originally posted by Tyrus4189Cobb
        I'm just sort of thinking aloud here. I really enjoy Pythagorean W-L% because it tends to be surprisingly accurate. Furthermore you can use the same equation in 1889 or 1922 or 1975 or 2010 unlike the nonsense that accompanies linear weights.

        Is there something similar out there (or possible) that could be computed for a pitcher given his run support and runs allowed? You could go a step further and adjust for team defense, but I just want to know if there's an empirical way to calculate the base % using only runs
        I've done stuff like that:

        1) take a pitcher's season for runs allowed
        2) adjust runs allowed and project expected run support on a new team (e.g., take Feller from 1946 and put him on the 1998 Yankees, adjusting runs allowed for team defense, park and league factor, etc.)
        3) project a w-l record using the theorem

        I expected it would not actually work well in application. Then I took the pitcher's game logs, randomly applied the changes for runs allowed, slapped the revised game log on the other team's actual game schedule. The actual W-L came out really close to the projected one. I did this for multiple seasons (13) and I was off in total by .3%. Individual seasons were off by a larger % (7.5%.)

        Overall, using that unscientific approach, I think pitcher-specific luck can be as much as +/- 7% over a selection of 13 seasons, but it averages out over time.
        Last edited by drstrangelove; 11-18-2012, 11:17 PM.
        "It's better to look good, than be good."

        • Jackaroo Dave
          Registered User
          • Jan 2012
          • 1228

          #5
          Or you can use ERA+ and 100* as the inputs, get the WL%, and multiply it times decisions to get the number of wins the pitcher "should have had." It's not, of course, but it does sort of level the playing field:

          W% = (ERA+)^2/(100+ERA+)

          *Edit: Sorry, I was thinking of ERA+ as a percentage.
          Last edited by Jackaroo Dave; 11-19-2012, 02:13 AM.
          Indeed the first step toward finding out is to acknowledge you do not satisfactorily know already; so that no blight can so surely arrest all intellectual growth as the blight of cocksureness.--CS Peirce

          • Tyrus4189Cobb
            Gator wrastler
            • May 2007
            • 5586

            #6
            Originally posted by drstrangelove
            1) take a pitcher's season for runs allowed
            2) adjust runs allowed and project expected run support on a new team (e.g., take Feller from 1946 and put him on the 1998 Yankees, adjusting runs allowed for team defense, park and league factor, etc.)
            3) project a w-l record using the theorem.
            I'm following you but I don't know the math to match your description. Could you give me an example?

            Originally posted by Jackaroo Dave
            Or you can use ERA+ and 100* as the inputs, get the WL%, and multiply it times decisions to get the number of wins the pitcher "should have had." It's not, of course, but it does sort of level the playing field:

            W% = (ERA+)^2/(100+ERA+)

            *Edit: Sorry, I was thinking of ERA+ as a percentage.
            Computing for 2011 John Lackey, a 6.41 ERA (67 ERA+) who managed a 12-12 record due to Boston's explosive offense behind him

            PythW-L%= 67^2/(100+67)=26.88

            Based on what you're saying, his W-L% "should" have been 0.269?
            "Allen Sutton Sothoron pitched his initials off today."--1920s article

            • brett
              Registered User
              • Jul 2006
              • 13897

              #7
              BBRef player value lists WAA WL%, which is the W/L percentage a team would hypothetically have in games in which the player played. I am not sure if it is situational. If you take a player's innings divided by 9 and then give him half that total plus his WAA, it might work. For example, Gooden in 1985 pitched 276 2/3 innings, or 30.74 games' worth of innings. Half of that is 15.37, plus he was 9.8 WAA, which if added to 15.37 would give him approx 25.2 wins and 5.6 losses.
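The arithmetic in that suggestion can be sketched as follows. This assumes an average pitcher wins exactly half of his "games' worth" of innings, so WAA is added to the wins and subtracted from the losses; the function name is hypothetical and nothing here comes from BBRef itself.

```python
def waa_record(innings_pitched, waa):
    """Turn innings and wins above average into a W-L line."""
    games = innings_pitched / 9.0      # nine-inning game equivalents
    average_wins = games / 2.0         # a .500 pitcher over those games
    wins = average_wins + waa
    losses = average_wins - waa
    return wins, losses


if __name__ == "__main__":
    # Gooden 1985: 276 2/3 innings and 9.8 WAA, as given in the post
    wins, losses = waa_record(276 + 2 / 3, 9.8)
    print(f"{wins:.1f}-{losses:.1f}")  # 25.2-5.6
```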

              • Tyrus4189Cobb
                Gator wrastler
                • May 2007
                • 5586

                #8
                Originally posted by brett
                BBRef player value lists WAA WL% which is the W/L percentage a team would hypothetically have in games in which the player played. I am not sure if it is situational. If you take a player's innings divided by 9 and then give him half that total plus his WAA it might work. For example Gooden in 1985 pitched 276 2/3 innings or 30.74 games worth of innings. Half of that is 15.37, plus he was 9.8 WAA which if added to 15.37 would give him approx 25.2 wins and 5.5 losses.
                That's a hypothetical team, though. PythW-L deals with something he "should" have given his actual team. Plus it's probably easier because there are only so many runs to go around which is what makes regular Pyth so reasonably accurate.
                "Allen Sutton Sothoron pitched his initials off today."--1920s article

                • Tyrus4189Cobb
                  Gator wrastler
                  • May 2007
                  • 5586

                  #9
                  Does anyone know how to get the run support for a pitcher in only the innings he pitched? Could run support per innings (RS/IP) work? It is the "runs scored per 27 outs while the pitcher was in the game as the pitcher."
                  Last edited by Tyrus4189Cobb; 11-19-2012, 01:44 PM.
                  "Allen Sutton Sothoron pitched his initials off today."--1920s article

                  • Jackaroo Dave
                    Registered User
                    • Jan 2012
                    • 1228

                    #10
                    Originally posted by Tyrus4189Cobb
                    Computing for 2011 John Lackey, a 6.41 ERA (67 ERA+) who managed a 12-12 record due to Boston's explosive offense behind him

                    PythW-L%= 67^2/(100+67)=26.88

                    Based on what you're saying, his W-L% "should" have been 0.269?
                    Actually, I messed up again. Simplest route: ERA+ is the league ERA divided by the individual ERA expressed as a percentage. So the league ERA is .67 of Lackey's. So by the pythagorean formula, a team scoring .67 as many runs as the opposing team (e.g. Bozox vs league) would be expected to win .67^2/(1 + .67^2) = .3098, not what I told you before.

                    I'm really sorry for messing it up, because now it seems complicated, but it's really back of the envelope stuff and provides a limited insight about run support and league run environment.

                    Let's call ERA+ as a decimal fraction ERA%. Then just square ERA%, add 1, and divide into the original square.
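As a rough sketch of that recipe: treat ERA+ / 100 as the ratio of runs "scored" to runs "allowed" and plug it into the Pythagorean formula with exponent 2. The function names are made up for illustration, and the 24 decisions come from the 12-12 record mentioned above.

```python
def era_plus_wpct(era_plus):
    """W% implied by ERA+ via ratio^2 / (1 + ratio^2), where ratio = ERA+ / 100."""
    ratio = era_plus / 100.0
    squared = ratio ** 2
    return squared / (1.0 + squared)


def expected_wins(era_plus, decisions):
    """Wins the pitcher 'should have had' over a given number of decisions."""
    return era_plus_wpct(era_plus) * decisions


if __name__ == "__main__":
    # Lackey 2011: 67 ERA+, 24 decisions (12-12)
    print(round(era_plus_wpct(67), 4))      # 0.3098
    print(round(expected_wins(67, 24), 1))  # 7.4
```

A league-average pitcher (ERA+ of 100) comes out at exactly .500, which is a quick sanity check on the form of the equation.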
                    Indeed the first step toward finding out is to acknowledge you do not satisfactorily know already; so that no blight can so surely arrest all intellectual growth as the blight of cocksureness.--CS Peirce

                    • drstrangelove
                      Registered User
                      • Apr 2011
                      • 3765

                      #11
                      Originally posted by Tyrus4189Cobb
                      I'm following you but I don't know the math to match your description. Could you give me an example?
                      I'll use the example I made up. (Also, doing this from memory, so yell if you see a mistake.)

                      1) take Feller's actual record for 1946.

                      2) calculate what his runs allowed would have been had he played for the Yankees in 1998. (This is not meant to be perfect, just a way to model how well the pyth works in real-world usage. As you can see later, there is no adjustment for actual teams played or actual parks played, since that is not the purpose.) To do this step, take Feller's 1946 ERA+, recalculate it to a 1998 ERA using the AL league ERA, adjust for the current PF, multiply by innings to get earned runs, then adjust for the Yankees' actual unearned run % in 1998 to gross up to runs allowed.

                      3) take Feller's actual 1946 game log and adjust runs allowed up or down as needed to force the recalculated runs allowed in step 2 above. You now have an 'adjusted' game log.

                      4) Take the Yankees' actual team game log, compare it to Feller's, and determine wins / losses on a game-by-game basis. This has to be done inning by inning. That is the adjusted 'real' W-L record.

                      5) Take the calculated runs from step 2 (noting the innings pitched), divide and create a RPG allowed and a calculated number of games (e.g., 372 innings = 41.333 games.)

                      6) Take the actual total runs scored by NYY in 1998 and divide by the number of games played to create RPG for.

                      7) Multiply both RPG for and RPG against by the calculated number of games in step 5. You now have runs for, runs against and games (e.g., 41.333) to calculate a pyth. W-L.

                      I have found that the calculation in #7 is, on average, extremely close to the actual figure derived in #4. That is, a pitcher's expected wins closely follow his actual wins.
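Steps 5 through 7 can be sketched roughly as below, assuming an exponent of 2 and that the adjusted runs allowed from step 2 are already in hand. The function name is hypothetical, and every input in the example except the 372 innings is an illustrative placeholder, not a real Feller or Yankees figure.

```python
def projected_record(innings_pitched, runs_allowed, team_runs, team_games,
                     exponent=2.0):
    """Pythagorean W-L over a pitcher's 'calculated games' (innings / 9)."""
    games = innings_pitched / 9.0          # step 5: calculated number of games
    rpg_against = runs_allowed / games     # step 5: RPG allowed
    rpg_for = team_runs / team_games       # step 6: team RPG for
    runs_for = rpg_for * games             # step 7: scale both to the same games
    runs_against = rpg_against * games
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    wpct = rf / (rf + ra)
    return wpct * games, (1.0 - wpct) * games


if __name__ == "__main__":
    # 372 innings is the example from step 5; the run totals are made up.
    wins, losses = projected_record(
        innings_pitched=372, runs_allowed=110, team_runs=900, team_games=162
    )
    print(f"{wins:.1f}-{losses:.1f}")
```

The game-by-game comparison in step 4 needs actual game logs, so it is left out of this sketch.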

                      My sample is only a few hundred games, so I'm not certain it's accurate, but I think it might be.


                      1) if the sample was sufficient, it implies that given a random pattern of starting pitcher game performances and a given average RPG for the pitcher's team, it should be possible to estimate accurately the pitcher's W-L record over a sufficient number of seasons.

                      2) it implies that events such as, leaving a game with a lead, then not winning, getting knocked out early but not losing, leaving behind in the 7th but winning, etc., are simply a series of random events with a net sum in W-L of around zero.

                      3) it implies that the pyth theorem is more robust than some may think. It doesn't work just on completed games. It works on calculated games, if you will, simulated games as in parts of xx number of games equaling a calculated number of 'whole' games.

                      Obviously, the smaller the sample, the more likely you'll see a divergence from the expectation. There is luck involved, but not nearly as much from my small sample as I expected.
                      Last edited by drstrangelove; 11-19-2012, 06:15 PM.
                      "It's better to look good, than be good."

                      • Tyrus4189Cobb
                        Gator wrastler
                        • May 2007
                        • 5586

                        #12
                        This is more complicated than I thought it would be, but I appreciate the feedback.

                        I've been toying with a different approach using logarithms based solely on runs allowed and run support during the pitcher's stay (derived from RA/IP, unless there's something better out there). Using just these matches the traditional ones. I'm only looking for what the pitcher's win-loss% should have been given his runs allowed and run support. His actual performance has nothing to do with the number in my approach except for his ability to prevent runs. Correct me if I'm wrong, but drstrangelove's method (did you invent that?) looks more like a neutralized stat instead of something that "should" be based on what occurred.

                        I've spent less than two hours working on this. In that short time, the best formula I've derived to compute W-L% solely from RA and RS is (logRA-logRS)^-0.85. In 2008, Ben Sheets allowed 76 runs with a support of 108 runs. His actual W-L% was .591; my formula gives .548. That's pretty close.

                        The fatal flaw with my equation is how it plays out for extremes. Pitchers who perform really well or poorly still end up with a w-l in the .500s because the exponent is so close to one. The formula only makes sense for pitchers hovering around a certain skill (like 95-135 ERA+).

                        I'm wondering if I should include league or teams runs since pythagorean W-L% is able to use runs, of which there are so many to go around. A team that scores 100 runs and allows 100 runs (400 and 400, 232 and 232, whatever) has a theoretical w-l% of .500.
                        "Allen Sutton Sothoron pitched his initials off today."--1920s article

                        • drstrangelove
                          Registered User
                          • Apr 2011
                          • 3765

                          #13
                          Originally posted by Tyrus4189Cobb
                          This is more complicated than I thought it would be, but I appreciate the feedback.

                          I've been toying with a different approach using logarithms based solely on runs allowed and run support during the pitcher's stay (derived from RA/IP, unless there's something better out there). Using just these matches the traditional ones. I'm only looking for what the pitcher's win-loss% should have been given his runs allowed and run support. His actual performance has nothing to do with the number in my approach except for his ability to prevent runs. Correct me if I'm wrong, but drstrangelove's method (did you invent that?) looks more like a neutralized stat instead of something that "should" be based on what occurred.
                          Yes, technically I did invent this method, fwiw. I've been building models of seasons, game logs, etc. for quite a while. I think it just occurred to me one day that if you could convert a season from one era to another, you could convert each game within the season. The logical step was to see what happens when you overlay the conversion onto a real season.

                          The pythagorean theorem application was something I did before I started re-doing logs, but was only converting seasons. Once I started doing logs, comparing the two methods was a logical step.

                          I think I am doing a couple different things, but for this thread, I'd focus on the application step of the pitcher's log to the team log. What I think it means is that, e.g., if you take Koufax's 1965 season, as is, and apply it game for game, inning for inning, to the 1962 Mets, the 1927 Yankees, the 1966 Senators, the 1917 Red Sox, and let's say 12 other random teams, that you can get an overall W-L record that will closely match a calculated W-L record based solely upon his total runs allowed and a calculated runs scored for his team.
                          Last edited by drstrangelove; 11-19-2012, 08:35 PM.
                          "It's better to look good, than be good."

                          • Jackaroo Dave
                            Registered User
                            • Jan 2012
                            • 1228

                            #14
                            Originally posted by Tyrus4189Cobb
                            This is more complicated than I thought it would be, but I appreciate the feedback.

                            I've spent less than two hours working on this. In that short time, the best formula I've derived to compute W-L% solely from RA and RS is (logRA-logRS)^-0.85. In 2008, Ben Sheets allowed 76 runs with a support of 108 runs. His actual W-L% was .591; my formula gives .548. That's pretty close.

                            The fatal flaw with my equation is how it plays out for extremes. Pitchers who perform really well or poorly still end up with a w-l in the .500s because the exponent is so close to one. The formula only makes sense for pitchers hovering around a certain skill (like 95-135 ERA+).
                            Any further progress, Tyrus?

                            I have a question, maybe helpful, maybe not. If you are working on RA/RS by taking the log and getting (logRA-logRS), shouldn't you also log the exponent and use (-0.85)*(logRA-logRS) to get the equivalent of (RA/RS)^(-0.85)? (Of course that may not be what you're trying to do at all, in which case I apologise for once again adding some dumb-ass.)

                            Frankly, I've never seen the form (logX-logY)^Z and don't know what it would work out to. But there are a lot of things I've never seen and don't understand.

                            As far as it not working for extremes goes, didn't Tom Tiger Tango observe that the pythagorean theorem works only in the narrow range you speak of?
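A quick sketch of the log identity raised above: multiplying the log difference by the exponent and then exponentiating gives the same number as raising the ratio RA/RS to that exponent. The function names are invented for illustration. Note that log(RA) - log(RS) is negative whenever RA < RS, so raising that difference itself to a fractional power is not a real number, and that a power of the ratio is a ratio, not a winning percentage.

```python
import math


def ratio_power(runs_allowed, run_support, k=-0.85):
    """(RA / RS)^k computed directly."""
    return (runs_allowed / run_support) ** k


def ratio_power_via_logs(runs_allowed, run_support, k=-0.85):
    """Same quantity via logs: exp(k * (log RA - log RS))."""
    log_diff = math.log(runs_allowed) - math.log(run_support)
    return math.exp(k * log_diff)


if __name__ == "__main__":
    # Ben Sheets 2008 figures from the thread: 76 RA, 108 RS
    direct = ratio_power(76, 108)
    via_logs = ratio_power_via_logs(76, 108)
    print(math.isclose(direct, via_logs))  # True
    print(direct > 1.0)                    # True: a ratio, not a W-L%
```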
                            Indeed the first step toward finding out is to acknowledge you do not satisfactorily know already; so that no blight can so surely arrest all intellectual growth as the blight of cocksureness.--CS Peirce

                            • Tyrus4189Cobb
                              Gator wrastler
                              • May 2007
                              • 5586

                              #15
                              Originally posted by Jackaroo Dave
                              Any further progress, Tyrus?

                              I have a question, maybe helpful, maybe not. If you are working on RA/RS by taking the log and getting (logRA-logRS), shouldn't you also log the exponent and use (-0.85)*(logRA-logRS) to get the equivalent of (RA/RS)^(-0.85)? (Of course that may not be what you're trying to do at all, in which case I apologise for once again adding some dumb-ass.)

                              Frankly, I've never seen the form (logX-logY)^Z and don't know what it would work out to. But there are a lot of things I've never seen and don't understand.

                              As far as it not working for extremes goes, didn't Tom Tiger Tango observe that the pythagorean theorem works only in the narrow range you speak of?
                              Not much progress since, but I'm still dabbling with it. The bolded part made me laugh.

                              I'm not sure what Tango said about the extremes. I've only read his stuff about WAR and wOBA. From what I can tell, the extremes are impossible to compensate for if we use only RS and RA. The exponent, being so close to one, neutralizes the numbers to somewhere in the .500s. On the one hand, pitchers who do well by allowing fewer runs are hurt by the bigger difference created in the subtraction of logs. On the other hand, pitchers who sucked and/or had a lot of run support (Milt Pappas 1966, John Lackey 2011) benefit from the increased gap in log of Runs Allowed - log Runs Scored. If Tango mentioned the extremes, I definitely see why.

                              I'm going to tinker with your suggestion of "logging" the exponent and whatnot. Believe me, I'm such an amateur statistician that there is no way you can sound like a dumb-ass to me unless you bring up touchdowns or bogeys.
                              "Allen Sutton Sothoron pitched his initials off today."--1920s article
