
Baseball Hacks?


  • Baseball Hacks?

    Anybody check out "Baseball Hacks," the new O'Reilly book? I've got it and am struggling through a few things. There are some errata on the publisher's site, and there are a few places where you need to be creative to look at data through 2005 instead of 2004, which (I guess because of the publishing schedule) is what a lot of the samples use.

    So I thought I'd start a new thread to see what everybody else's experiences with the book are like.

    The weirdest problem I've had so far is in hack #14. When I follow the instructions (I'm sure I'm doing something wrong), I end up downloading a ton of zip files that I can't open--when I click on them to unzip I get a message that they're unreadable and may be corrupt.

  • #2
    From the preview on Amazon it sounds like an interesting book. A primer for how to manipulate databases using baseball stats. Always a troublesome thing. I might have to check it out if I can find it on the cheap.


    • #3
      Wow...sounds worth a purchase...I've ordered a copy just to see if he can teach me something new/useful.


      • #4
        I got it last week, it's a good book. Worth a purchase.

        In terms of hack #14, you can just manually get the files from retrosheet.

        There are a few sample hacks on O'Reilly's website (there are 75 of them in the book):


        • #5
          yeah...just so you guys know...I have the book now...reading through it first before I attempt anything...but I gotta say...


          VERY important book for anyone wanting to get and make use of baseball data of all forms. This coming from a guy who has spent his time self-teaching in the ways of manipulating the baseball-databank.


          • #6
            I've started reading the book, and while the overall idea is good, it is in fact peppered with mistakes: mistakes that, if you are not fluent with the systems, will cost you time figuring things out on your own.
            For instance, in Hack #11 there are at least 3 mistakes that I spotted. He uses runs instead of hits for average and TB. His code for AVG looks like this: [R]/[AB], when obviously it should look like this: [H]/[AB]. The biggest mistake in this hack, though, is in step three. He has you picking "first" in the position category. This won't return the position with the most games per season. Instead it shows the first available position based on number assignment: 3 is first base, 4 is second base, and so on. So if a player played 4 games at second and 125 games at OF, this step will return 2B as the primary position. What one actually has to do is set up a query in which you have playerID, yearID, and G, and then select max for games. This identifies the position with the most games played in that season. You can then do another query in which you attach the position and everything else he describes in that hack.
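The fix described in this post can be sketched outside of the database as well. The following is a minimal Python illustration, not the book's code: the rows, the player IDs, and the function names `primary_positions` and `batting_average` are all hypothetical, and only the column names playerID, yearID, and G come from the post.

```python
# Hypothetical fielding rows: (playerID, yearID, POS, G). All values are made up.
fielding = [
    ("playera01", 2005, "2B", 4),
    ("playera01", 2005, "OF", 125),
    ("playerb01", 2005, "1B", 90),
    ("playerb01", 2005, "3B", 40),
]


def primary_positions(rows):
    """Return {(playerID, yearID): POS}, choosing the position with the most games."""
    best = {}  # (playerID, yearID) -> (games, position)
    for player_id, year_id, pos, games in rows:
        key = (player_id, year_id)
        # Keep the row with the highest games total, not the first one seen.
        if key not in best or games > best[key][0]:
            best[key] = (games, pos)
    return {key: pos for key, (games, pos) in best.items()}


def batting_average(hits, at_bats):
    """AVG is hits over at-bats (H/AB), not runs over at-bats."""
    return hits / at_bats if at_bats else 0.0


positions = primary_positions(fielding)
print(positions[("playera01", 2005)])
```

With the made-up rows above, the player with 4 games at second and 125 in the outfield comes back as OF, which is the behavior picking "first" fails to give.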


            • #7
              Yes...there are mistakes and problems...some of them are crippling. It took me a LONG time to figure out that some of his commands are UNIX SPECIFIC and you have to download a series of programs for WINDOWS if you want to be able to use his hacks...for example. I still think it's worth having.


              • #8
                Ubi... there is an Errata on their website that explains all the typos.... it's useful to have.

                Matt..... what UNIX specific tools are you talking about? I don't remember any.


                • #9
                  the unzip utility is NOT native to Windows...you have to get it...it's native only to UNIX.

                  The rm command is part of the GNU32 package that is NOT part of windows...there were a couple of other commands that were in a set I downloaded to correct the problem that don't leap out at me.
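For anyone who would rather not install extra command-line tools, the unzip-then-delete step can be approximated with Python's standard library. This is only a sketch under assumptions: the function name `extract_and_remove` and the paths are hypothetical, and it is not the book's script. It also checks that the file really is a zip archive before touching it.

```python
import os
import zipfile


def extract_and_remove(zip_path, dest_dir):
    """Extract zip_path into dest_dir, then delete the archive.

    Returns the list of extracted member names, or None if the file is not
    a readable zip archive (in which case the file is left in place).
    """
    if not zipfile.is_zipfile(zip_path):
        return None
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    # Stands in for the "rm" step: only runs after a successful extraction.
    os.remove(zip_path)
    return names
```

A call such as `extract_and_remove("some_download.zip", "events")` (both names are placeholders) returns `None` instead of raising when the download is not a real zip file.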

                  Also...his code to make sense of PBP Event files doesn't translate correctly from a native unix environment...you have to change a couple of lines of text or it converts straight line feeds (\n) to carriage return/line feed pairs (\r\n), which adds to the file size and causes the files to be unusable.
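The line-ending problem can be illustrated with a small Python sketch. It assumes the damage is plain CRLF pairs where bare line feeds were expected; the function names `to_unix_newlines` and `convert_file` are hypothetical and this is not the change made to the book's code.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to bare LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path, dst_path):
    """Rewrite a file with LF-only line endings."""
    # Binary mode keeps Python from translating line endings on its own.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_newlines(data))


sample = b"line one\r\nline two\r\n"
cleaned = to_unix_newlines(sample)
print(len(sample), len(cleaned))
```

Each converted line loses one byte, which matches the observation that the extra carriage returns add to the file size.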


                  • #10
                    ah, I see. Being tailored to a UNIX environment makes some sense, since O'Reilly is the publisher, and they do a lot of UNIX stuff.



                    • #11
                      I've tried to do the hack with mysql, but I cannot get it to install correctly under Windows XP. Anyone have any tricks for installing this thing?

                      I'm actually very computer savvy; I have a computer science degree (well, an Associate's). But still, I cannot get the thing running.


                      • #12

                        you can't get mysql installed or something associated with the mysql hack?


                        • #13
                          Cult: try installing InstantRails. That'll give you a complete install of Apache, MySQL, PHP.
                          Author of THE BOOK -- Playing The Percentages In Baseball


                          • #14
                            not from the Hack book, but a tidbit I thought I would share with you all.

                            Let's say you are perusing Retrosheet and you see some data on a page you wish to play with, but you are not all that familiar with manipulating the raw data, or simply do not wish to jump through all those hoops when somebody else has already jumped through them for you. Well, Microsoft Excel and Internet Explorer allow you to import most data directly into Excel with a right click of the mouse. This will work for most pages, and it will put all the data in nice little boxes from which you can play with the data. Some of the data doesn't translate so well, though. For instance, let's say you wish to use Todd Helton's career record at each ballpark. If you try to simply import the data, it won't work. So what you have to do is copy the data into Microsoft Word, save it as a text document, and then open that file in Excel. Excel will translate the data into an Excel format, and presto, you can once again play with the data to your heart's content.

                            I know, nothing major, but it does allow one to use the data presented on the internet without having to know how to manipulate databases and PbP code, or to play with useful data that wasn't in a file format.
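The "save as text, then open in Excel" route works because the columns get split on the whitespace between them. A rough Python equivalent is sketched below, assuming columns are separated by tabs or by two or more spaces; the sample text is made up (it is not Retrosheet data) and the function names `text_table_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import re


def text_table_to_rows(text):
    """Split each non-empty line into columns on tabs or runs of 2+ whitespace."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # A single space stays inside a cell, so "Park One" is one column.
            rows.append(re.split(r"\t+|\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up placeholder table, standing in for text copied from a web page.
sample = """
Park        G    AB    H
Park One    10   40    12
Park Two    5    18    4
"""

rows = text_table_to_rows(sample)
print(rows_to_csv(rows))
```

Splitting on two or more spaces is a guess at how such pages are laid out; a page that separates columns with single spaces would need a different rule.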


                            • #15
                              One of the hacks in the book is essentially that.

                              Getting data from web pages using Microsoft Excel Internet Utilities.

