Baseball Reference and Baseball Prospectus, ugh.


  • four tool
    replied
    I like them both, but I much prefer BR; for one thing, it's far more user friendly.
    It is also much easier to find out what the numbers mean. BP, for example, has both EQA and normalized stats, and I don't understand why they need both.


  • SABR Matt
    replied
    retrosheet.org acquired them on microfilm from the Hall of Fame...the Hall acquired them for its hall of records some time back...I'm not sure from whom.


  • weskelton
    replied
    Originally posted by SABR Matt
    From 1882 to 1956, there exists a set of files that were recorded on square-paper by league officials meticulously recording the goings-on of each game.
    What is your source for the "square-sheets"? Are these the binders that ICI created while compiling the first Macmillan in the '60s, which I believe are stored in the library in Cooperstown?

    BTW, this sounds very cool!


  • SABR Matt
    replied
    From 1882 to 1956, there exists a set of files that were recorded on square-paper by league officials meticulously recording the goings-on of each game. These records aren't as detailed as I might like, but for hitters they include AB, H, 2B, 3B, HR, BB, K, SB, CS, PO, A, E, fielding DP and for pitchers IP, H, R, ER, K, BB, HR, Decisions, Saves, PO, A, E, fielding DP for every game in major league history from 1882 onward.

    It's not PBP, but it tells you exactly when each guy played and what he did that day. That level of information opens the door to MASSIVELY more detailed park analyses, strength-of-schedule analysis and other similar work. We're hoping it's done before the start of the 2008 regular season, at which point the data would become available at retrosheet.


  • weskelton
    replied
    Matt,

    You've piqued my curiosity. Can you provide more detail on the daily summary project? I'm guessing this is data at a game-by-game level, as opposed to play-by-play, for the pre-retrosheet years. Would that be accurate?


  • SABR Matt
    replied
    Essentially this is what uberstats do, Tom...except they don't do it at a game-by-game level, which is how I'd like to do it once the daily summary project is finished (my partner in evil has 20 data entry teams hacking away at the 90 years' worth of data as we speak...funded by David Smith and retrosheet.org, so you can thank them for their support). These daily summary files will be an absolute BONANZA for day-by-day sabermetric analysis, the improvement of context analysis and sabermetric fun in general.


  • Tango Tiger
    replied
    Right, Matt is suggesting that once you've done what you have to at the game level, runs disappear, and all you have left are win% and "games" (with games being determined by whatever method is appropriate).

    Similarly, we can do the same thing with RC+ (runs created indexed to 100, with runs created being whatever your favorite method is) and "games". Again, once you have that, runs disappear, and you are left with your index.


  • SABR Matt
    replied
    That's why I don't advocate aggregating in terms of RS though...I recommend aggregating W% figures weighted by PT and then converting to a run figure.


  • Patriot
    replied
    As I understand Matt's "offensive equivalencies", he looks at a pitching staff that allows 4 runs in a 5 run context, and says that should produce a .610 W% (or whatever it is exactly; I just used a Pyth exponent of 2). In a 5 run context, an offense that scored 6.25 runs would also produce a .610 W%, so 6.25 is the "offense equivalent" performance of the pitching staff.

    The W%s will work, I agree. The equivalent runs scored figures will be subject to the same issues when you aggregate them as the 1/ERA+ numbers are.
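
The arithmetic in this post can be checked with a short sketch. It assumes the plain Pythagorean formula with a fixed exponent of 2, as the post does; the function names are hypothetical and are not Matt's actual method.

```python
def pythag_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean W% with a fixed exponent (2 here, as in the post)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def offensive_equivalent(runs_allowed, context):
    """Runs an offense must score in `context` to match the W% of a
    staff that allows `runs_allowed` in the same context.

    Setting the two Pythagorean W%s equal gives
    rs / context = context / runs_allowed for any fixed exponent.
    """
    return context ** 2 / runs_allowed


staff_wpct = pythag_wpct(5.0, 4.0)          # staff allows 4 in a 5-run context
equiv_runs = offensive_equivalent(4.0, 5.0)  # 6.25
offense_wpct = pythag_wpct(equiv_runs, 5.0)  # same W% as the staff

print(round(staff_wpct, 3), equiv_runs, round(offense_wpct, 3))
```

Both W% values come out to about .610, matching the figures quoted above.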


  • Tango Tiger
    replied
    I should also add that this will barely make any difference anyway. We are really arguing on the technicalities here.

    But, the b-r ERA+ problem is a math problem, and is wrong.


  • Tango Tiger
    replied
    I think what Matt is saying is equivalent to Brett's position of weighting the index by playing time.

    That is, if you turn everything into a win%, you get the same scale for every game (or split or season). So, you can play at Coors one day, where the run scale is 6 runs per team, and play at the Astrodome another day where the run scale is 3 runs per team. But now, instead of adding each player's RC27 in the numerator and each league's run context in the denominator, you first convert each game into a win% (or in Brett's case, into an index).

    Matt's point is that each game should be its own universe, and it shouldn't disproportionately impact the ERA+ or RC+ or whathaveyou as you add more games.

    So, index each game. Determine the weight of that index for the game.
    Repeat for all games.

    The index would either be a Pythag-based win%, or just the standard ERA+ type of scales.

    (With the understanding, always, that ERA+ and other types of +constructors should be player divided by league.)
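
A minimal sketch of the "index each game, weight it, repeat" recipe, using the Pythag-based win% as the index. The two park run scales (6 and 3 runs per team) come from the post; the pitcher's run rates, the innings, the exponent of 2, the 4.5-run reference context and every name below are assumptions for illustration only.

```python
# Hypothetical game log: (run context per team, pitcher's runs allowed per 9, innings).
games = [
    (6.0, 3.0, 7.0),  # a start at Coors
    (3.0, 3.0, 9.0),  # a start at the Astrodome
]

EXPONENT = 2.0  # assumed fixed Pythagorean exponent


def game_wpct(context, runs_allowed, exponent=EXPONENT):
    """Index one game as a Pythagorean W% against that game's run context."""
    return context ** exponent / (context ** exponent + runs_allowed ** exponent)


def weighted_wpct(game_log):
    """Average the per-game W% indexes, weighted by innings pitched."""
    total_ip = sum(ip for _, _, ip in game_log)
    return sum(game_wpct(c, ra, EXPONENT) * ip for c, ra, ip in game_log) / total_ip


def runs_allowed_equivalent(wpct, reference_context, exponent=EXPONENT):
    """Convert an aggregate W% back into a runs-allowed rate in one chosen context."""
    return reference_context * ((1.0 - wpct) / wpct) ** (1.0 / exponent)


wpct = weighted_wpct(games)
ra_equiv = runs_allowed_equivalent(wpct, reference_context=4.5)
print(round(wpct, 5), round(ra_equiv, 2))
```

Runs only reappear in the last step, when the aggregate W% is translated into a single reference context.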


  • SABR Matt
    replied
    If you put both offense and defense into pythagorean context, the problem is gone, Patriot.

    On the pitching side, offensive equivalence begins with pythag...which is why I said it solved the problem.


  • Patriot
    replied
    How does putting things in "offensive equivalents" solve Brett's issue when it exists for hitters as well?

    Suppose Player A creates 5 runs in a league with an average of 4, then 6 in a league with an average of 6. His relative figures are 1.25 and 1, which by Brett's weighting comes to 1.125. Aggregating the totals instead gives (5+6)/(4+6) = 1.1.

    Player B creates 6.25 against an average of 5, then 6 against an average of 6. His relative figures are 1.25 and 1, which again come to 1.125 by Brett's weighting. Aggregating the totals gives (6.25+6)/(5+6) = 1.114.
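
The two players can be compared side by side with a quick sketch. It assumes equal playing time in both periods, which the example implies but does not state; the helper names are made up for illustration.

```python
# (runs created, league average) for each of two periods, from the example above.
player_a = [(5.0, 4.0), (6.0, 6.0)]
player_b = [(6.25, 5.0), (6.0, 6.0)]


def mean_of_ratios(periods):
    """Index each period first, then average the indexes (equal weights assumed)."""
    return sum(player / league for player, league in periods) / len(periods)


def ratio_of_sums(periods):
    """Add up the runs first, then divide by the summed league averages."""
    return sum(p for p, _ in periods) / sum(lg for _, lg in periods)


for name, periods in (("A", player_a), ("B", player_b)):
    print(name, round(mean_of_ratios(periods), 3), round(ratio_of_sums(periods), 3))
```

Both players get the same index when each period is indexed first, but different figures when the raw runs are summed, which is the discrepancy the example is pointing at.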


  • SABR Matt
    replied
    You don't have that environment problem if you turn everything into offense equivalence. Pythag logic applies.

    In fact, I might be inclined to rate pitchers the same way I rate teams...PythagenMatt.


  • Tango Tiger
    replied
    How about a pitcher at Coors? Let's say he pitches 90 innings, with an ERA of 3.0, while the rest of the league does 6.0, and away from Coors in 90 IP his ERA is 2.5 and the rest of the league is 4.0.

    So, his Coors ERA+ is 200 (or 50% my way) and his away ERA+ is 160 (or 62.5% my way).

    A straight average of 50% and 62.5% would give you 56.25%, which is an ERA+ of 177.8.

    However, his seasonal ERA is 2.75 in a league of 5.00, meaning an ERA+ of 181.8 (or 55.0% my way).

    So, it's not even a question of the year-to-year change in run environment causing "problems", but simply the change in run environment for each start causing problems.

    Good job on Brett for bringing the issue up. You really need to do this at the game level. If you don't have that, then at some split level. If you don't have that, then at the seasonal level. If you don't have that, then the way b-r.com does it.

    Whatever you do though, you cannot base it on ERA+, but rather on its reciprocal.
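
The Coors example can be reproduced in a few lines. This is only a sketch of the arithmetic in the post, with innings used as the weight; the variable and function names are hypothetical.

```python
# Split data from the example: innings, pitcher ERA, league ERA.
splits = [
    {"ip": 90.0, "era": 3.0, "lg_era": 6.0},  # at Coors
    {"ip": 90.0, "era": 2.5, "lg_era": 4.0},  # away from Coors
]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


innings = [s["ip"] for s in splits]

# "My way": pitcher ERA divided by league ERA, i.e. the reciprocal of ERA+/100.
ratios = [s["era"] / s["lg_era"] for s in splits]
mean_ratio = weighted_mean(ratios, innings)
era_plus_from_ratios = 100.0 / mean_ratio

# Seasonal approach: pool the ERAs first, then form a single ERA+.
season_era = weighted_mean([s["era"] for s in splits], innings)
season_lg_era = weighted_mean([s["lg_era"] for s in splits], innings)
season_era_plus = 100.0 * season_lg_era / season_era

# Averaging the ERA+ values themselves gives yet another answer.
era_plus_values = [100.0 * s["lg_era"] / s["era"] for s in splits]
mean_of_era_plus = weighted_mean(era_plus_values, innings)

print(round(era_plus_from_ratios, 1), round(season_era_plus, 1), round(mean_of_era_plus, 1))
```

Averaging the split-level ratios gives 177.8, the pooled seasonal figure is 181.8, and averaging ERA+ directly gives 180.0, so the three methods disagree even within one season.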
