Significant problem with the baseball-databank database.

  • Significant problem with the baseball-databank database.

    We've been over and over the rule book and determined that in fact no defensive out can be recorded that shouldn't be awarded to some fielder as a putout...even Ks (C), interference (C), obstruction/fan interference (nearest fielder), baserunner makes contact with live ball (nearest fielder), baserunner passes another baserunner (nearest fielder)...you name it...we checked...

    And yet the most recent version of the baseball-databank release...and presumably all others that preceded it...does not show IPOuts matching total POs for every team. In fact, over the course of the game's history, there have been 6607 more IPOuts than fielding POs...

    I know this seems like something kinda minor to be concerned about, but the errors appear as recently as 1999, and in a few cases got WAY out of hand (there's a difference of 126 outs for one of the teams in 1889, for example).
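For anyone who wants to reproduce this kind of check, here is a rough sketch of the comparison in Python with an in-memory SQLite database. The table names (`pitching`, `fielding`), the column names (`yearID`, `teamID`, `IPouts`, `PO`) and every row are assumptions made up for illustration, so adjust them to whatever your copy of the database actually uses. The one design point worth keeping is that each table is summed per team-season before the join, so a team's pitchers and fielders don't multiply each other's rows.

```python
import sqlite3

# Toy stand-in for the real database; all names and numbers here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pitching (playerID TEXT, yearID INT, teamID TEXT, IPouts INT);
CREATE TABLE fielding (playerID TEXT, yearID INT, teamID TEXT, POS TEXT, PO INT);
INSERT INTO pitching VALUES
    ('p1', 1900, 'AAA', 300), ('p2', 1900, 'AAA', 120), ('p3', 1900, 'BBB', 400);
INSERT INTO fielding VALUES
    ('f1', 1900, 'AAA', '1B', 250), ('f2', 1900, 'AAA', 'C', 160),
    ('f3', 1900, 'BBB', '1B', 400);
""")

# Aggregate each table per team-season first, then join the two totals
# and keep only the team-seasons where they disagree.
QUERY = """
SELECT p.yearID, p.teamID, p.outs_pitched, f.putouts,
       p.outs_pitched - f.putouts AS diff
FROM (SELECT yearID, teamID, SUM(IPouts) AS outs_pitched
      FROM pitching GROUP BY yearID, teamID) AS p
JOIN (SELECT yearID, teamID, SUM(PO) AS putouts
      FROM fielding GROUP BY yearID, teamID) AS f
  ON p.yearID = f.yearID AND p.teamID = f.teamID
WHERE p.outs_pitched <> f.putouts
ORDER BY ABS(p.outs_pitched - f.putouts) DESC
"""

mismatches = conn.execute(QUERY).fetchall()
for row in mismatches:
    print(row)
```

With the toy rows above, only the made-up team 'AAA' is reported, because its pitchers have 420 outs against 410 putouts.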

    This poses a problem for someone like me who is trying to make it so that all outs are accounted for in a model of team defense...

    *sigh*
    Last edited by SABR Matt; 01-29-2006, 09:29 PM.

  • #2
    Well, whatever the reason is, the same discrepancy shows up at BRef and Retrosheet. Something is getting lost. For instance, I looked at the 1961 Senators (first one with PBP), and Retrosheet says the OF'ers recorded 1054 outs, but its breakdown by outfield position comes to 2 outs short of that. All told, Retrosheet and BRef are missing 22 outs for the 1961 Senators.

    But if we look at the PBP, all the outs are accounted for. Comparing Retrosheet's totals to the PBP data for the 1961 Senators, I see that Retrosheet has done this (+ means too many outs given to that position, - means not enough given):
    Pitcher: +3
    Catcher: -6
    First base: -17
    Second base: -10
    Third base: +9
    Shortstop: -3
    Outfield by position right on, but as a whole: +2
    Put it all together and Retrosheet and BRef are short 22-24 outs (24 from the infield positions alone, 22 once the outfield's extra 2 is counted).
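The arithmetic behind that 22-24 range can be checked in a few lines. This is just a sketch that adds up the per-position differences listed above; the position abbreviations used as dictionary keys are my own shorthand.

```python
# Per-position differences for the 1961 Senators, as listed in the post
# (+ means too many outs credited, - means too few).
infield_diffs = {"P": 3, "C": -6, "1B": -17, "2B": -10, "3B": 9, "SS": -3}
outfield_total_diff = 2  # outfield is right by position but +2 as a whole

infield_net = sum(infield_diffs.values())
overall_net = infield_net + outfield_total_diff

print(infield_net)   # -24
print(overall_net)   # -22
```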

    The next step would be to look at which fielders are being shorted or overcredited and possibly that can explain why it is doing this.



    • #3
      so...the PBP event files are correct...but something about the loading software that adds them up into team records for Retrosheet is screwing up and not counting outs correctly?

      Someone over there needs to look at this.

      According to my query, most of the teams are correct or very very close to correct...(within a few outs) from 1971 on...EXCEPT in the period from 1993 to 1998 where no PBP records exist...

      Damnit...I need those years to be filled in...stupid frackin' lawyers...



      • #4
        I dug deeper into the 1961 Senators and found that they under-reported Cottier's outs by 7, Johnson's by 2 and O'Connell's by 1.


        This is a pretty big flub on Retrosheet's part, one that can have an impact on many people's research.



        • #5
          I sent a message to both Retrosheet and BRef. I got a response from BRef that basically said there is nothing they can do.
          Retrosheet does not proof the fielding data in their files, so they
          don't update any changes based on that. The record is rife with
          errors like this. Balancing before the 1980's is nearly impossible
          and the original scorebooks are no help as they often have errors in
          them. Fielding data is very much a moving target.
          It doesn't really answer the concern, because when you think about it they probably have most of the needed information in their event files. All they need to do is update the totals based on the event files, but for some reason they don't.

          On top of that, neither site really cautions you about this. When displaying splits, Retrosheet will warn you that the data is incomplete, yet it makes no such announcement about its fielding data. BRef doesn't warn you at all.



          • #6
            Let me just say that the problems you guys have noted with retrosheet are just the beginning. I really can't say much more, but trust me, I wouldn't trust retrosheet for really specific endeavors. The PBP data has numerous problems.



            • #7
              Originally posted by digglahhh
              Let me just say that the problems you guys have noted with retrosheet are just the beginning. I really can't say much more, but trust me, I wouldn't trust retrosheet for really specific endeavors. The PBP data has numerous problems.
              This is rather discouraging.

              I've been told by some sources who prefer to remain nameless that SABR is the only website with back-end data that's even remotely accurate because they're the only people working with Pete Palmer...but other than that...the data is loaded with problems...makes me quite a bit annoyed.



              • #8
                Geez guys we are not electing a Pope or planning a coup of Cuba.



                • #9
                  ?

                  You of all people should know why data integrity is important to someone like me. I can kludge over it if I have to, but this error pattern doesn't inspire much confidence in the final accuracy of anything I produce.



                  • #10
                    I'm talking about the
                    "I've been told by some sources who prefer to remain nameless"
                    and
                    "I really can't say much more,"



                    • #11
                      Oh...I suspect we're both kinda talking about the same thing, and the reason for the guardedness is a feud between some folks over credit for research....it's a messy situation unfortunately, but best not to drag out the details here.



                      • #12
                        Could it be the zero or NULL thing again?

                        I found that for the years before sacrifices were recorded, some player team records have been filled in with NULLs, some with zeros. SELECT queries sometimes treat records differently according to how the columns are filled. I think zeros are better than NULLs in most cases. A pre-sack player with a zero in the sack column gets compared with post-sack players more fairly IMHO.
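To make the NULL-versus-zero point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `batting` table, the `SH` column name and the three rows are hypothetical, invented only to show how standard SQL aggregates and comparisons skip NULLs but count zeros; other database engines follow the same basic rules, though I've only checked this against SQLite.

```python
import sqlite3

# Hypothetical table: one player with NULL sacrifices, one with 0, one with 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (playerID TEXT, SH INT)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?)",
    [("a", None), ("b", 0), ("c", 4)],
)

# AVG ignores the NULL row, so it averages only 0 and 4.
(avg_sh,) = conn.execute("SELECT AVG(SH) FROM batting").fetchone()

# COUNT(*) counts every row; COUNT(SH) counts only non-NULL values.
count_all, count_sh = conn.execute(
    "SELECT COUNT(*), COUNT(SH) FROM batting"
).fetchone()

# A comparison against NULL is never true, so player 'a' is not returned.
below_one = conn.execute(
    "SELECT playerID FROM batting WHERE SH < 1"
).fetchall()

# COALESCE treats NULL as zero at query time without rewriting the data.
(avg_filled,) = conn.execute(
    "SELECT AVG(COALESCE(SH, 0)) FROM batting"
).fetchone()

print(avg_sh, count_all, count_sh, below_one, avg_filled)
```

Note that SUM also skips NULLs, so a plain team total is the same whether missing values are stored as NULL or as zero; the difference shows up in averages, counts, filters, and any row-level arithmetic such as adding two columns together.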
