Pythagorean Theorem
02-28-2006, 09:25 PM
I've read a number of articles about Bill James's Pythagorean Theorem win estimator. Usually they are trying to tweak the exponent to make the theorem estimate team winning percentage with less error. What these articles never really explain is what use a win estimator could possibly have. After all, we already know how many games the team won. When is there ever a situation where we know how many runs a team has scored and given up but don't know how many wins it had? So I am putting the question to you all: What use could the Pythagorean Theorem possibly have?
02-28-2006, 09:45 PM
It has significantly more predictive value than wins/winning percentage have when it comes to figuring out how a team is likely to perform the next season...it's a better representation of the team's actual abilities than wins are (because a significant part of how many games you win is tied up into how your runs happen to cluster)...
Although my research into probability theory (The FSIA) suggests that there is also less predictive value in seasonal runs all by themselves than there is in looking at how teams performed versus each of the other teams they faced (you can find discussion on the FSIA in previous threads here in the sabermetrics forum).
The pythagorean equation also helps in giving direct mathematical significance to replacement level (I use it to define margins in terms of winning percentages for use in PCA ratings calculations). Team wins don't really help you in the analysis of margins or in the analysis of individual players...the pyth...does.
02-28-2006, 10:28 PM
Matt - I understand how runs scored and runs allowed could be elements in a formula that could predict a team's present talent level and future success. What I don't understand is how showing a correlation between winning percentage and that formula (be it the Pythagorean theorem or some other combination of runs scored and runs allowed) is supposed to justify that predictive ability. If you are spending a lot of time and effort adjusting the exponent from 2 to 1.83, or to some variable exponent, in order to REDUCE the difference between the number obtained from the Pythagorean theorem and actual winning percentage, how can you then turn around and say that the Pythagorean number is better at predicting future success BECAUSE it is different from actual winning percentage? Don't you see the logical contradiction in that?
02-28-2006, 11:11 PM
What I would say is that it is good at converting runs into wins in predictions. Meaning if I look at a team and think that it will score 800 runs and allow 700 runs, using the pyth I could convert that into wins and losses with a converter that is reasonably accurate. To me the exponent used is like linear weights: it's a league average, and like linear weights it can be off in specific situations. I remember fitting them once and found the exponents to be different for some teams and run environments in the same year.
Anyway, back to usefulness: for conversion of runs to wins I think it is useful, and I think it is better than doing it "free hand." I have yet to find anything that is more accurate than pyth. It's not perfect but it gets you in the ballpark, which I guess isn't good enough if you are doing surgery, but for most of us it is.
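To make the 800/700 example concrete, here is a minimal sketch of the conversion, assuming the usual form of the estimator, W% = RS^x / (RS^x + RA^x), and a 162-game season. The function names and the default exponent are illustrative choices, not anything from the posts above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Scale the estimated winning percentage to a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Compare the classic exponent (2) with the often-quoted 1.83.
    for x in (2.0, 1.83):
        wins = projected_wins(800, 700, exponent=x)
        print(f"exponent {x}: about {wins:.1f} wins")
```

Either exponent puts the 800/700 team at roughly 91 to 92 wins, which shows how little the choice between 2 and 1.83 moves a single-team estimate.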
03-01-2006, 08:43 AM
The other thing is that it's easier to predict RS and RA with smaller error margins because they're larger sample sizes than wins and therefore less prone to luck factors (although still prone)...
And the reason we're trying to line up pythagorean W% estimates with real-world wins is that we're assuming that if we get it lined up closely, we'll have found something that is of more predictive use because it has a better handle on the logical truths about how run scoring produces wins. For instance, a point that is often made is that if the league scores one run every game (both sides combined), then whoever scores the run...wins the game and the pythagorean exponent MUST be one, so you'll see variable pythagoreans that take into account the logical truth that as run scoring decreases, the exponent converges toward one.
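One way to sketch a variable exponent of this kind is the form often called Pythagenpat, where the exponent is runs per game (both teams combined) raised to a small power. The 0.287 constant below is an assumption taken from commonly quoted versions of that formula, not something stated in this thread, and the function names are made up for illustration. The point is only that at one run per game the exponent comes out to exactly one.

```python
def variable_exponent(runs_scored, runs_allowed, games, power=0.287):
    """Exponent that grows with the run environment; equals 1 at one run per game.

    `power` is an assumed constant (values near 0.28-0.29 are commonly quoted).
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    return runs_per_game ** power


def win_pct(runs_scored, runs_allowed, games):
    """Pythagorean estimate using the run-environment-dependent exponent."""
    x = variable_exponent(runs_scored, runs_allowed, games)
    rs, ra = runs_scored ** x, runs_allowed ** x
    return rs / (rs + ra)


if __name__ == "__main__":
    # Degenerate league: exactly one run per game, so runs scored equal wins.
    print(variable_exponent(100, 62, 162), win_pct(100, 62, 162))
    # A more typical run environment (800 scored, 700 allowed in 162 games).
    print(variable_exponent(800, 700, 162), win_pct(800, 700, 162))
```

In the one-run-per-game case a team that scores 100 of the 162 runs is estimated at 100/162, which is exactly its share of the wins; in the 800/700 case the exponent lands between 1.8 and 2.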
03-02-2006, 10:43 AM
Couldn't one also use the pyth, combined with player projections for the upcoming season (like the ones ESPN does), to come up with a win/loss prediction for the upcoming season?
03-02-2006, 10:51 AM
That's basically what we are talking about. Take what we think players will do, convert it into runs, tally them up, and then use pyth to come up with a winning %. I did this very thing for the Cubs and I have them projected at around 85-89 wins, with the hopeful (best case scenario) side of me seeing around 91 wins. Of course that excludes any major changes in the Cubs roster before or during the season.
03-02-2006, 01:41 PM
Yeah, I was bored last night so I did that same thing for the AL East. The projections I came up with were:
New York: 96-66
Tampa Bay: 69-93
I think I'll do the AL West next.
03-02-2006, 04:28 PM
My AL West projections...
03-02-2006, 07:01 PM
My AL West projections...
Mine came out pretty similar with the A's being the only real big difference.
03-04-2006, 09:14 AM
Interesting projections. You realize, of course, that you have by default projected the AL Central to play 114 games below .500 (unless the American League has a very lopsided winning percentage in interleague play)? That's quite a collapse from the 6 games below .500 level that they were at last year. You may want to rethink things a little.
03-04-2006, 09:33 AM
Hasn't the American League had a very good winning % against the NL lately?
For instance last year the Indians went 15-3, and the White Sox went 12-6. Though I think the AL as a whole went 20 games over .500 against the NL last year. And I think overall it might be very close in terms of history.
03-04-2006, 11:31 AM
Interesting projections. You realize, of course, that you have by default projected the AL Central to play 114 games below .500
I'm curious, how did you arrive at this conclusion?
03-04-2006, 11:48 AM
I think I can answer that. In a 14 team league there will be 2268 team-games played (14 teams x 162 games). Every game must have a winner and a loser, so there will be 1134 wins and 1134 losses. Now of course interleague throws a monkey wrench into this a bit, but let's ignore that for a second.
Right now you have two divisions with a total record of 788-670, leaving 810 games for the Central. So their record based on the remainder of wins and losses would be 346-464, or 118 games below .500.
Interleague complicates things because it accounts for about 250 games in which a win and a loss would not both be added to each league; only a win or a loss would be added. For instance, last year the AL added 20 extra wins to its totals without adding 20 extra losses. The Indians took 15 games from the NL, and because they did it against the NL, no AL team received those losses. Without interleague play every league would have an overall record of .500; with interleague play it's possible to have a record higher or lower than .500. For instance, the AL last year had a .504 winning % because of interleague.
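The bookkeeping in this post can be sketched as follows. The figures are the ones quoted above (14 teams, 162 games each, a combined 788-670 for the East and West). Reading "20 games over .500" as the AL's interleague wins minus losses is an assumption, and the function names are invented for the example.

```python
TEAMS = 14
GAMES_PER_TEAM = 162


def remaining_record(known_wins, known_losses, interleague_margin=0):
    """Record left over for the teams not yet projected.

    interleague_margin is league-wide interleague wins minus losses
    (assumed interpretation); 0 means the league finishes at exactly .500.
    """
    team_games = TEAMS * GAMES_PER_TEAM  # 2268 team-games in total
    league_wins = (team_games + interleague_margin) // 2
    league_losses = team_games - league_wins
    return league_wins - known_wins, league_losses - known_losses


def league_win_pct(interleague_margin):
    """Overall league winning percentage for a given interleague margin."""
    team_games = TEAMS * GAMES_PER_TEAM
    return (team_games + interleague_margin) / 2 / team_games


if __name__ == "__main__":
    print(remaining_record(788, 670))          # no interleague edge
    print(remaining_record(788, 670, 20))      # league 20 games over .500
    print(round(league_win_pct(20), 3))
```

With no interleague edge the Central is left at 346-464, which is 118 games under .500. If the league repeated a +20 interleague margin, the league as a whole would sit at about .504 and the Central's remainder would improve only to 356-454.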
03-04-2006, 11:55 AM
Right now for your two divisions you really only have Tampa with a losing record, which means that the AL Central would have to be a really bad division to enable all those other teams to win all those games, because according to your records they are not going to be beating each other. So they have to be beating someone, and so far it looks like it would be the AL Central and the NL.
03-04-2006, 01:42 PM
Well...I don't have the AL East doing quite as well as he does...
NYY - 92-70
BOS - 88-74
TOR - 85-77
BAL - 79-83
TBD - 65-97
But I do think the east and west will win a LOT of games against the NL and against the AL Central this year...Cleveland and Chicago are the only teams in the central I'm liking much...and Kansas City could be 60 games under .500 all by itself.
03-04-2006, 08:22 PM
Last year the AL Central was 222-228 when playing outside of its division. Matt's projections, while being far more reasonable than Kaliber's, would still require them to drop to 200-250 this year. Kaliber expects them to play at 166-284, an out-of-division rate that no division has ever come close to achieving. While the AL Central has previously played worse than 200-250, in 2002 and 2003, the direction that it has been heading has been toward more wins, not fewer. It had 186 in 2002, 190 in 2003, 205 in 2004, and 222 last year.
03-04-2006, 09:23 PM
I expect a drop this year because (a) Chicago played way over its head last year, (b) the AL East and AL West got noticeably better this year, (c) almost all of the significant talent that either came from outside the majors or crossed from the NL to the AL ended up in the East or the West...not the Central, and (d) last year the Central played a relatively soft IL schedule...this year they're getting the NL Central essentially full time...and I think the NL Central is a very good division this year outside of Pittsburgh and Cincy.
03-04-2006, 09:35 PM
The NL East I think is still the best division in the league. I think they have beaten the snot out of the other divisions for a couple or three years now. On top of that I believe the NL Central the last couple of years has also gotten beat consistently in interleague play. Last year I believe the Central got saved by the West but I would have to check the results to be sure. Anyway what I'm saying is that I think in terms of quality the NL Central will be somewhat weak. Meaning I don't think we will see a 100 win juggernaut, and it could end up being an 89 win team taking the crown. I think Cincy and the Pirates will be bad, with a potential for the Cubs or Astros underachieving to a losing season, and the Brewers the wild card with their youth and pitching staff. So to me the NLC has the potential to be really bad, with I think the Cards about the only team safe enough to have a winning season.
In other words I don't think the NL Central is going to be a hard foe for the AL Central, but will Cleveland go 15-3 again? I doubt it.
03-04-2006, 10:05 PM
The NL East is a soft division. They may have won a lot in IL play, but the Marlins are TERRIBLE this year instead of being decent, the Phillies are, if anything, a little worse this year, the Braves are probably going to lose a few more games this year, and the Nats were playing way over their head last year in the first half...I don't think that was at any time, or is now...the strongest division in baseball.
I think you'll see the ALC take a huge step back and the NLC take a step forward (Milwaukee got better, Houston got a tad worse, St. Louis is the same, and the Cubs are about the same...plus I think Pittsburgh will be better, though still bad).
Just my humble opinion at this point.
03-05-2006, 09:05 AM
It's probable that big declines in out-of-division wins are due more to injuries to key players than to the outcome of off season player movement. Logically, most of the factors that determine team quality are relatively stable from season to season. Poor teams tend to remain poor, cheap owners tend to remain cheap, smart GMs tend to remain smart, good development programs tend to remain good. When a team makes its off season deals it tries to receive value for value: a player that will help its team to become better for a player whose absence will hurt its team the least. Sometimes this works out, sometimes it doesn't, but for the division as a whole these decisions probably even out pretty well. The biggest exception would be a small market team that loses a star player to free agency because it can't match the star's open market salary value. But the big thing that can't be planned for is injuries to key players.
As an example, I believe that the NL West in 2005 has the current record for biggest decline in wins out-of-division in a single year since the change to the unbalanced schedule (I admit that I haven't done a complete study on this, just a cursory examination). Their wins out-of-division declined from 210 in 2004 to 192 in 2005. The most likely reason for this is the injuries to the Los Angeles lineup, with a resulting overall drop in team wins of 22 (from 93 to 71), and the injury to Barry Bonds, with a resulting overall drop in team wins of 16 (from 91 to 75). So my best guess is that the AL Central will NOT experience a drop from 222 wins to 200 wins unless one or more teams in that division have injuries to several key players.
03-05-2006, 09:58 AM
I don't think you've done enough of a look to make that conclusion stick.
First of all...the average team mobility (in terms of how much teams move up or down in W% from season to season) is around .089 the last decade. That means if you played .500 ball last year...there's just as good a chance you'll play .589 or .411 ball this year...your belief that "good teams remain good and bad teams remain bad" is not supported by statistical evidence. In fact quite the opposite can often be true.
I'll grant you that normally, off-season player transactions are probably less important in that mobility than catastrophic injury (the run of injuries to Mariner pitching prospects, for instance, has had a cumulative effect on the Mariners since 2002 and has led indirectly to their recent collapse...the 2003 Angels, the team that followed up the world champions, were riddled with catastrophic injuries to players like Glaus and their record showed it).
But all I need to do is look at the Florida Marlins to disprove this theory that good teams stay good and bad teams stay bad. The success of teams tends to ebb and flow...just because the AL Central has improved in the last few seasons doesn't mean decline is unlikely.
03-05-2006, 03:25 PM
Matt - You do this all the time and it is really annoying. You either don't bother to read the words I have actually written in a post, or you choose to ignore them, or you are too ignorant to know the difference between what I have actually said and what you subsequently argue that I am supposed to have said. Please try to read the words that I write. I NEVER said that good teams stay good and bad teams stay bad. I said that many of the factors that make a good team good and a bad team bad stay stable from year to year. There is a big difference between those two statements. And we were talking about performance outside the division, not overall record. Since so many of the games are played against division opponents, changes in other division teams can magnify changes in a team's overall record. For instance Arizona went from 51 wins in 2004 to 77 wins in 2005 in overall record. How many of those extra 26 wins came from victories outside the division? Only 4 1/2; the other 21 1/2 came within the division from taking advantage of the problems that LA, SF and other division teams were having.
I did err in my previous post in saying that "teams making their off season deals try to receive value for value". That is actually still true but my implication was that the "value for value" was in terms of something that would eventually translate into on field performance. While that is still true for most teams, there are exceptions like the Florida ownership, who have a philosophy of stripping a team of its on field value to line their own pockets. This has occurred a few times in the history of professional baseball, but thankfully most owners want to field a competitive team.
You say that my conclusions aren't supported by statistical evidence but you won't do the work to show that there is evidence to contradict them. That evidence would be that there are other divisions that have had single year out-of-division declines in wins greater than the 18 that I cited for the NL West from 2004 to 2005, and that the biggest factor in those declines came from something other than injuries to key players. Making a straw man argument about overall changes in record is irrelevant to anything I said in my previous post.
03-05-2006, 04:29 PM
Perhaps when you calm down and stop acting like a complete wanker...you'll note this line:
"Poor teams tend to stay poor"
I read that as meaning bad teams stay bad...I understand now that you meant FINANCIALLY poor...but that's not how it came across.
Whatever it is you think I'm doing all the time, I still don't believe it's logically consistent that there isn't divisional mobility...divisions change strength pretty routinely in this game...and IMHO, the other AL divisions changed more for the good than the AL Central. If I'm wrong, I'll cheerfully admit it at the end of the year.