I like them both, but much prefer BR; it's much more user-friendly, for one thing.
And it is much easier to find out what the numbers mean. In BP, for example, they have EqA and normalized stats. I don't understand why they need both.
Baseball Reference and Baseball Prospectus, ugh.
retrosheet.org acquired them on microfilm from the Hall of Fame...the Hall acquired them for its hall of records some time back...I'm not sure from whom.
Originally posted by SABR Matt: From 1882 to 1956, there exists a set of files that were recorded on square paper by league officials meticulously recording the goings-on of each game.
BTW, this sounds very cool!
From 1882 to 1956, there exists a set of files that were recorded on square paper by league officials meticulously recording the goings-on of each game. These records aren't as detailed as I might like, but for hitters they include AB, H, 2B, 3B, HR, BB, K, SB, CS, PO, A, E, fielding DP and for pitchers IP, H, R, ER, K, BB, HR, Decisions, Saves, PO, A, E, fielding DP for every game in major league history from 1882 onward.
It's not PBP, but it tells you exactly when each guy played and what he did that day. That level of information opens the door to MASSIVELY more detailed park analyses, strength of schedule analysis and other similar works. We're hoping it's done before the start of the 2008 regular season, at which point the data would become available at retrosheet.
Matt,
You've piqued my curiosity. Can you provide more detail on the daily summary project? I'm guessing this is data at a game-by-game level as opposed to play-by-play for the pre-Retrosheet years. Would that be accurate?
Essentially this is what uberstats do, Tom...except they don't do it at a game-by-game level, which is how I'd like to do it once the daily summary project is finished (my partner in evil has 20 data entry teams hacking away at the 90 years' worth of data as we speak...funded by David Smith and retrosheet.org, so you can thank them for their support). These daily summary files will be an absolute BONANZA for day-by-day sabermetric analysis, the improvement of context analysis and sabermetric fun in general.
Right, Matt is suggesting that once you've done what you have to at the game level, then runs disappear, and all you have left are win% and "games" (with games being determined by whatever method is appropriate).
Similarly, we can do the same thing with RC+ (runs created indexed to 100, with runs created being whatever your favorite method is) and "games". Again, once you have that, runs disappear, and you are left with your index.
That's why I don't advocate aggregating in terms of RS though...I recommend aggregate W% figures weighted by PT and then converting to a run figure.
As I understand Matt's "offensive equivalencies", he looks at a pitching staff that allows 4 runs in a 5 run context, and says that should produce a .610 W% (or whatever it is exactly; I just used a Pyth exponent of 2). In a 5 run context, an offense that scored 6.25 runs would also produce a .610 W%, so 6.25 is the "offense equivalent" performance of the pitching staff.
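To make the arithmetic concrete, here is a rough sketch of that calculation as I read it. The function names are mine, not Matt's, and the exponent of 2 is just the one assumed in the post above.

```python
def pythag_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean W% for a team scoring and allowing the given runs per game."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def offensive_equivalent(runs_allowed, context):
    """Runs an offense would need to score, against an average staff allowing
    `context` runs, to match the W% of a staff allowing `runs_allowed` with an
    average offense behind it.

    Setting x^e / (x^e + c^e) equal to c^e / (c^e + ra^e) gives x = c^2 / ra,
    so the answer does not depend on the exponent chosen.
    """
    return context ** 2 / runs_allowed


staff_wpct = pythag_wpct(5.0, 4.0)          # average offense, staff allows 4
equiv_runs = offensive_equivalent(4.0, 5.0)  # 6.25
print(round(staff_wpct, 3), equiv_runs, round(pythag_wpct(equiv_runs, 5.0), 3))
```

Note that the equivalent works out to context squared over runs allowed whatever Pythag exponent you pick, so the choice of 2 only affects the W% itself.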
The W%s will work, I agree. The equivalent runs scored figures will be subject to the same issues when you aggregate them as the 1/ERA+ numbers are.
I should also add that this will barely make any difference anyway. We are really arguing on the technicalities here.
But, the br ERA+ problem is a math problem, and is wrong.
I think what Matt is saying, is equivalent to Brett's position of weighting the index by playing time.
That is, if you turn everything into a win%, you get the same scale for every game (or split or season). So, you can play at Coors one day, where the run scale is 6 runs per team, and play at the Astrodome another day where the run scale is 3 runs per team. But now, instead of adding each player's RC27 in the numerator and each league's run context in the denominator, you first convert each game into a win% (or in Brett's case, into an index).
Matt's point is that each game should be its own universe, and it shouldn't disproportionately impact the ERA+ or RC+ or what-have-you as you add more games.
So, index each game. Determine the weight of that index for the game.
Repeat for all games.
The index would either be a Pythag-based win%, or just the standard ERA+ type of scales.
(With the understanding, always, that ERA+ and other types of +constructors should be player divided by league.)
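A minimal sketch of the "index each game, weight it, repeat" recipe, using the Pythag-based win% as the index. Everything here is illustrative: the function names, the exponent of 2, the equal weights, and the hitter's run rates are made up. Only the 6-run Coors and 3-run Astrodome contexts come from the example above.

```python
def pythag_index(player_rate, league_rate, exponent=2.0):
    """Pythag-based win% for a hitter creating runs at player_rate per game
    in a game whose run context is league_rate per team."""
    p = player_rate ** exponent
    lg = league_rate ** exponent
    return p / (p + lg)


def aggregate_index(games, index_fn=pythag_index):
    """games: list of (weight, player_rate, league_rate), one entry per game.
    Index each game on its own scale, then take the weighted average."""
    total_weight = sum(w for w, _, _ in games)
    return sum(w * index_fn(p, lg) for w, p, lg in games) / total_weight


# Hypothetical hitter who is 25% above the run context in both parks.
games = [
    (1.0, 7.50, 6.0),  # Coors, 6 runs per team
    (1.0, 3.75, 3.0),  # Astrodome, 3 runs per team
]
print(round(aggregate_index(games), 4))
```

Because each game is indexed before anything is added up, the high-scoring Coors game carries no more weight than the Astrodome game unless you give it more playing time.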
If you put both offense and defense into pythagorean context, the problem is gone, Patriot.
On the pitching side, offensive equivalence begins with pythag...which is why I said it solved the problem.
How does putting things in "offensive equivalents" solve Brett's issue when it exists for hitters as well?
Suppose Player A creates 5 runs in a league with an average of 4, then 6 in a league with an average of 6. His relative figures are 1.25 and 1, which by Brett's weighting comes to 1.125. Aggregating the runs directly instead gives (5+6)/(4+6) = 1.1.
Player B creates 6.25 against an average of 5, then 6 against an average of 6. His relative figures are 1.25 and 1, which again come to 1.125 by Brett's weighting. Aggregating directly gives (6.25+6)/(5+6) = 1.114.
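Here is the same comparison as a quick sketch, assuming equal playing time in both seasons (which is what the 1.125 figure implies). The helper names are made up for illustration.

```python
def mean_of_ratios(seasons):
    """Average the per-season relative figures (equal playing time assumed)."""
    return sum(rc / lg for rc, lg in seasons) / len(seasons)


def ratio_of_sums(seasons):
    """Add up runs created and league averages first, then divide."""
    return sum(rc for rc, _ in seasons) / sum(lg for _, lg in seasons)


# (runs created, league average) for each season
player_a = [(5.0, 4.0), (6.0, 6.0)]
player_b = [(6.25, 5.0), (6.0, 6.0)]

for name, seasons in (("A", player_a), ("B", player_b)):
    print(name, round(mean_of_ratios(seasons), 3), round(ratio_of_sums(seasons), 3))
```

Both players come out identical when the seasons are indexed first, but the straight aggregate separates them purely because of the run environment attached to the 1.25 season.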
You don't have that environment problem if you turn everything into offense equivalence. Pythag logic applies.
In fact, I might be inclined to rate pitchers the same way I rate teams...PythagenMatt.
How about a pitcher at Coors? Let's say he pitches 90 innings, with an ERA of 3.0, while the rest of the league does 6.0, and away from Coors in 90 IP his ERA is 2.5 and the rest of the league is 4.0.
So, his Coors ERA+ is 200 (or 50% my way) and his away ERA+ is 160 (or 62.5% my way).
A straight average of 50% and 62.5% would give you 56.25%, which is an ERA+ of 177.8.
However, his seasonal ERA is 2.75 in a league of 5.00, meaning an ERA+ of 181.8 (or 55.0% my way).
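For anyone who wants to check those numbers, a short sketch follows. The tuple layout and function names are mine, and the "league" ERA for the season is assumed to be the IP-weighted average of the two split contexts, which is how the 5.00 above comes out.

```python
# (innings pitched, pitcher ERA, league ERA) for each split
splits = [
    (90.0, 3.0, 6.0),  # at Coors
    (90.0, 2.5, 4.0),  # away from Coors
]


def split_level_ratio(splits):
    """IP-weighted average of ERA / lgERA, computed split by split."""
    total_ip = sum(ip for ip, _, _ in splits)
    return sum(ip * era / lg for ip, era, lg in splits) / total_ip


def seasonal_ratio(splits):
    """Seasonal ERA divided by seasonal league ERA (both IP-weighted)."""
    total_ip = sum(ip for ip, _, _ in splits)
    era = sum(ip * era for ip, era, _ in splits) / total_ip
    lg = sum(ip * lg for ip, _, lg in splits) / total_ip
    return era / lg


for label, ratio in (("split level", split_level_ratio(splits)),
                     ("seasonal", seasonal_ratio(splits))):
    print(label, round(100 * ratio, 2), round(100 / ratio, 1))
```

The two methods differ by about four points of ERA+ even though nothing changed except the level at which the ratio was taken.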
So, it's not even a question of the year-to-year change in run environment causing "problems", but simply the change in run environment for each start causing problems.
Good job on Brett for bringing the issue up. You really need to do this at the game level. If you don't have that, then at some split level. If you don't have that, then at the seasonal level. If you don't have that, then the way br.com does it.
Whatever you do though, you cannot base it on ERA+, but rather its reciprocal.
