I've read a number of articles about Bill James's Pythagorean Theorem win estimator. Usually they are trying to tweak the exponent to make the theorem estimate team winning percentage with less error. What these articles never really explain is what use a win estimator could possibly have. After all, we already know how many games the team won. When is there ever a situation where we know how many runs a team has scored and given up but don't know how many wins it had? So I am putting the question to you all: What use could the Pythagorean Theorem possibly have?
Pythagorean Theorem
-
It has significantly more predictive value than wins or winning percentage when it comes to figuring out how a team is likely to perform the next season. It's a better representation of the team's actual abilities than wins are, because a significant part of how many games you win is tied up in how your runs happen to cluster.
That said, my research into probability theory (the FSIA) suggests that there is also less predictive value in seasonal run totals by themselves than there is in looking at how teams performed against each of the other teams they faced (you can find discussion of the FSIA in previous threads here in the sabermetrics forum).
The Pythagorean equation also helps give direct mathematical significance to replacement level (I use it to define margins in terms of winning percentages for use in PCA ratings calculations). Team wins don't really help you in the analysis of margins or of individual players; the Pythagorean estimate does.
-
Matt - I understand how runs scored and runs allowed could be elements in a formula that predicts a team's present talent level and future success. What I don't understand is how showing a correlation between winning percentage and that formula (be it the Pythagorean theorem or some other combination of runs scored and runs allowed) is supposed to justify that predictive ability. If you are spending a lot of time and effort adjusting the exponent from 2 to 1.83, or to some variable exponent, in order to REDUCE the difference between the number obtained from the Pythagorean theorem and actual winning percentage, how can you then turn around and say that the Pythagorean number is better at predicting future success BECAUSE it is different from actual winning percentage? Don't you see the logical contradiction in that?
Comment
-
What I would say is that it is good at converting runs into wins in predictions. Meaning, if I look at a team and think that it will score 800 runs and allow 700 runs, I can use the pyth to convert that into wins and losses with a converter that is reasonably accurate. To me the number used is like linear weights: it's a league average, and like linear weights it can be off in specific situations. I remember doing them once and found the coefficients to be different for some teams and run environments in the same year.
Anyway, back to usefulness: for conversion of runs to wins I think it is useful, and I think it is better than doing it "free hand." I have yet to find anything that is more accurate than the pyth. It's not perfect, but it gets you in the ballpark, which I guess isn't good enough if you are doing surgery, but for most of us it is.
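For anyone who wants to try the conversion described above, here is a minimal Python sketch. The function names and the 162-game default are my own assumptions for illustration; the formula is the standard Pythagorean form, with the exponent left as a parameter so you can compare 2 against 1.83.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_record(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Convert a run projection into a rounded (wins, losses) record."""
    win_pct = pythagorean_win_pct(runs_scored, runs_allowed, exponent)
    wins = round(games * win_pct)
    return wins, games - wins


if __name__ == "__main__":
    # The hypothetical 800 scored / 700 allowed team from the post above.
    for exp in (2.0, 1.83):
        wins, losses = projected_record(800, 700, exponent=exp)
        print(f"exponent {exp}: {wins}-{losses}")
```

With these inputs the two exponents land about one win apart, which is the sort of "in the ballpark" difference the exponent tweaking is fighting over.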
Comment
-
The other thing is that it's easier to predict RS and RA with smaller error margins, because they're built from larger samples than wins are and are therefore less prone to luck factors (although still prone).
And the reason we're trying to line up Pythagorean W% estimates with real-world wins is that we're assuming that if we get them lined up closely, we'll have found something of more predictive use, because it has a better handle on the logical truths about how run scoring produces wins. For instance, a point that is often made is that if the league scores one run every game (both sides combined), then whoever scores the run wins the game, and the Pythagorean exponent MUST be one. So you'll see variable Pythagoreans that take that logical truth into account: as run scoring decreases, the exponent converges toward one.
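Here is a rough sketch of what a variable exponent can look like. I'm assuming the form usually called Pythagenpat, where the exponent is runs per game (both teams combined) raised to a power of about 0.287. That constant and the function names are assumptions on my part, not something from this thread.

```python
def variable_exponent(runs_scored, runs_allowed, games, power=0.287):
    """Exponent that grows with the run environment (assumed Pythagenpat form)."""
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** power


def variable_win_pct(runs_scored, runs_allowed, games):
    """Pythagorean estimate using the run-environment-dependent exponent."""
    x = variable_exponent(runs_scored, runs_allowed, games)
    rs = runs_scored ** x
    ra = runs_allowed ** x
    return rs / (rs + ra)


if __name__ == "__main__":
    # One run per game, both sides combined: a team that scores the run in
    # 100 of its 162 games must have won exactly those 100 games.
    print(variable_exponent(100, 62, 162))
    print(variable_win_pct(100, 62, 162), 100 / 162)
```

At one run per game the exponent comes out to exactly one, so the estimate is just the team's share of the runs, which matches the logical requirement described above.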
Comment
-
That's basically what we are talking about. Take what we think the players will do, convert that into runs, tally them up, and then use the pyth to come up with a winning %. I did this very thing for the Cubs and I have them projected at around 85-89 wins, with the hopeful (best case scenario) side of me seeing around 91 wins. Of course that excludes any major changes in the Cubs' roster before or during the season.
Comment
-
Yeah, I was bored last night so I did that same thing for the AL East. The projections I came up with were:
Boston: 97-65
New York: 96-66
Toronto: 95-67
Baltimore: 84-78
Tampa Bay: 69-93
I think I'll do the AL West next.
Hey umpire, he's got a horse in his pocket!!
Comment
-
Originally posted by SABR Matt
My AL West projections...
Oakland: 96-66
Anaheim: 87-75
Seattle: 83-79
Texas: 79-83
Mine came out pretty similar with the A's being the only real big difference.
Oakland: 92-70
Anaheim: 90-72
Seattle: 85-77
Texas: 80-82
Comment
-
Interesting projections. You realize, of course, that you have by default projected the AL Central to play 114 games below .500 (unless the American League has a very lopsided winning percentage in interleague play)? That's quite a collapse from the 6 games below .500 that they were at last year. You may want to rethink things a little.
Comment
-
Hasn't the American League had a very good winning % against the NL lately?
For instance last year the Indians went 15-3, and the White Sox went 12-6. Though I think the AL as a whole went 20 games over .500 against the NL last year. And I think overall it might be very close in terms of history.
Comment
-
Originally posted by misterdirt
Interesting projections. You realize, of course, that you have by default projected the AL Central to play 114 games below .500
Comment
-
I think I can answer that. In a 14-team league each team plays 162 games, so there are 2268 team-games in all (1134 actual games). Every game must have a winner and a loser, so the league's combined record has 1134 wins and 1134 losses. Now of course interleague play throws a monkey wrench into this, but let's ignore that for a second.
Right now you have two divisions with a total record of 788-670, leaving 810 decisions for the Central. So their record, based on the remaining wins and losses, would be 346-464, or 118 games below .500.
Interleague complicates things because it accounts for about 250 games in which a win and a loss are not both added to the same league; each league gets only the win or the loss. For instance, last year the AL picked up 20 more wins than losses from those games. The Indians took 15 games from the NL, and because they did it against the NL, no AL team received those losses. Without interleague play every league would have an overall record of exactly .500; with interleague play it's possible to have a record higher or lower than .500. For instance, the AL last year had a .504 winning % because of interleague.
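If you want to check the bookkeeping yourself, here is a small Python sketch using the projections posted earlier in the thread. It ignores interleague play, as above, and the names (`remainder_for_central` and so on) are just ones I made up for illustration.

```python
GAMES_PER_TEAM = 162
AL_TEAMS = 14

# Projected (wins, losses) as posted earlier in the thread.
AL_EAST = {"Boston": (97, 65), "New York": (96, 66), "Toronto": (95, 67),
           "Baltimore": (84, 78), "Tampa Bay": (69, 93)}
AL_WEST = {"Oakland": (92, 70), "Anaheim": (90, 72),
           "Seattle": (85, 77), "Texas": (80, 82)}
AL_WEST_SABR_MATT = {"Oakland": (96, 66), "Anaheim": (87, 75),
                     "Seattle": (83, 79), "Texas": (79, 83)}


def remainder_for_central(*divisions):
    """Wins and losses left over for the Central if the league nets out to .500."""
    wins = sum(w for division in divisions for w, _ in division.values())
    losses = sum(l for division in divisions for _, l in division.values())
    league_wins = AL_TEAMS * GAMES_PER_TEAM // 2  # 1134 wins, 1134 losses
    return league_wins - wins, league_wins - losses


if __name__ == "__main__":
    for label, west in (("second set", AL_WEST), ("SABR Matt", AL_WEST_SABR_MATT)):
        wins, losses = remainder_for_central(AL_EAST, west)
        print(f"West from {label}: Central {wins}-{losses}, "
              f"{losses - wins} games below .500")
```

The two sets of West projections account for both the 114 and the 118 figures quoted in this thread.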
Comment
-
Right now, in your two divisions, you really only have Tampa Bay with a losing record. That means the AL Central would have to be a really bad division to let all those other teams win all those games, because according to your records they are not going to be beating each other. They have to be beating someone, and so far it looks like that would be the AL Central and the NL.
Comment