Sabermetric Book


  • As Andy explains it to me, it's not the average PA, but:
    3 / (1/PA1+1/PA2+1/PA3)

    So, if you had 3 guys, 10 PA, 100 PA, 1000 PA, the "Average" would be 27.
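    A quick sketch of that calculation in Python (the function name is mine, not from the post):

```python
# Harmonic mean of sample sizes: n divided by the sum of reciprocals.
def harmonic_mean(samples):
    """n / (1/PA1 + 1/PA2 + ... + 1/PAn)."""
    return len(samples) / sum(1.0 / x for x in samples)

pa = [10, 100, 1000]
print(round(harmonic_mean(pa)))  # 27
```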
    Author of THE BOOK -- Playing The Percentages In Baseball

    Comment


    • Is that n / (SUM(1/PA_i))?

      I don't quite get why that's accurate.

      Comment


      • Pardon the interrogation, Tom...I am just trying to understand your methods fully (and I don't have my copy of THE BOOK on hand) to see if they make sense to me and to see if they can be applied to a different (but similar) mission.

        Comment


        • I just try to scrape by on what I learned myself from The Book's appendix. Andy does this for a living, so I put a lot of faith in his knowledge.

          The basic idea is that if you have 10 PA and 1000 PA, the amount of variance won't be the same as two samples of 505 PA each; rather, it would be equivalent to two players with 19.8 PA each.

          For example, the variance from 10 PA for a mean of .500 (.5*.5/10) plus the variance from 1000 PA (.5*.5/1000) would be the same as (.5*.5/19.8) + (.5*.5/19.8).

          Dropping all the .5*.5 terms, we are left with:
          1/10 + 1/1000 = 1/19.8 + 1/19.8
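          A quick numeric check of that identity (variable names are mine):

```python
# Check the claim: for p = .500, the sum of the binomial variances at
# 10 PA and 1000 PA equals the variance of two players each at the
# harmonic mean (~19.8 PA) of the two sample sizes.
p = 0.5
hm = 2 / (1 / 10 + 1 / 1000)        # harmonic mean of 10 and 1000
lhs = p * p / 10 + p * p / 1000     # variance at 10 PA plus variance at 1000 PA
rhs = 2 * (p * p / hm)              # two "equivalent" players at hm PA each
print(round(hm, 1))                 # 19.8
print(abs(lhs - rhs) < 1e-12)       # True
```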
          Author of THE BOOK -- Playing The Percentages In Baseball

          Comment


          • That makes sense.

            so the "mean" PA is actually a weighted harmonic mean with general form:

            SUM(i = 1 to z) n_i / SUM(i = 1 to z) (n_i / X_i)

            Where:

            n_i is the number of players with X_i plate appearances
            z is the number of unique PA counts in history

            If you have 5 data points instead of three and two of them had 100 PA while the other three had 1, 10 and 1000 respectively, your mean would be:

            5 / (1/1 + 1/10 + 2/100 + 1/1000) = 5/1.121 = 4.46

            That seems a rather incredible claim.

            I need to see if that actually works on a realistic sample.
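            The general weighted form can be sketched like this (function name is mine):

```python
# Weighted harmonic mean over PA groups: total players divided by the
# sum of (players in group / group's PA count).
def weighted_harmonic_mean(groups):
    """groups: iterable of (count, pa) pairs."""
    total = sum(count for count, _ in groups)
    return total / sum(count / pa for count, pa in groups)

# The five-player example: one each at 1, 10, and 1000 PA, plus two at 100 PA.
groups = [(1, 1), (1, 10), (2, 100), (1, 1000)]
print(round(weighted_harmonic_mean(groups), 2))  # 4.46
```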

            Comment


            • OK...doing it that way, I get the average PA all time is 11.071.

              Which carries an expected variance of .335 * .665 / 11.071 or 0.020...I guess that makes sense...

              Very counterintuitive that the average PA all time should be 11.1, but...the numbers work out...

              Comment


              • That explains why the standard deviation of BA was so much higher when I included part time (<100 AB) seasons than when I didn't...because the all time average PA goes way down so the error term goes way up.

                Omitting seasons of 100 or fewer PA, my all-time average PA is 265.6, making the variance of OBP .335 * .665 / 265.6 or 0.0008 (the standard deviation drops from 0.141 to 0.028).
                Last edited by SABR Matt; 11-01-2007, 03:49 PM.
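                A small check of those binomial figures, assuming OBP behaves like a binomial proportion with p = .335 (function name is mine; the quoted figures appear truncated rather than rounded):

```python
import math

# Expected binomial standard deviation of OBP at a given "average" PA.
def expected_sd(p, pa):
    return math.sqrt(p * (1 - p) / pa)

print(round(expected_sd(0.335, 11.071), 2))  # 0.14  (all seasons included)
print(round(expected_sd(0.335, 265.6), 3))   # 0.029 (part-time seasons omitted)
```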

                Comment


                • Originally posted by Tango Tiger
                  As Andy explains it to me, it's not the average PA, but:
                  3 / (1/PA1+1/PA2+1/PA3)

                  So, if you had 3 guys, 10 PA, 100 PA, 1000 PA, the "Average" would be 27.
                  In this thread on your site, you use the average BIP of all the pitchers, which was ~3000. Using the above method on the Google Docs spreadsheet, I get an "average" of ~1500. Which "average" is correct?

                  Comment


                  • The correct "average" is 1490 (not 2910). But a strange thing happened when I started running the correlations: the 2910 average actually yielded the consistent results according to the BIP correlation equation (which uses 3700). I really ought to revisit that.
                    Author of THE BOOK -- Playing The Percentages In Baseball

                    Comment


                    • When I look at pitchers with 5000 to 7000 BIP, the average, either way, is around 5900 (as we'd expect, since the range of the numbers is very close). The SD is 1.50, which yields "5000" in the equation where I had "3700".

                      When I look at pitchers with 2500 to 5000 BIP, the averages are close either way you calculate it (3406 and 3548). In this case the SD is 1.4, which implies an r of close to .50. Meaning that the mean, around 3500, is also what goes in the correlation equation (where I'd have 3700).

                      When I look at pitchers with 500 to 2500 BIP, the averages are close (1039, 1290). The SD is 1.2, meaning the correlation equation gets around 2500, instead of the 3700.

                      As you can see, the correlation equation should have a value of x somewhere between 2500 and 5000 in the BIP/(BIP+x) equation. And the only way for me to get that is to figure the BIP average the usual way and get 2910, even though the "right" way gives me the "wrong" answer.
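                      One way to read that arithmetic: in the form r = BIP/(BIP + x), an observed correlation at a given BIP level pins down x. A minimal sketch (function name is mine; the numbers use the r close to .50 at ~3500 BIP figure from above):

```python
# Solve r = bip / (bip + x) for the regression constant x.
def regression_constant(bip, r):
    return bip * (1 - r) / r

# With r = .50 around 3500 BIP, x comes out equal to the mean itself.
print(regression_constant(3500, 0.5))  # 3500.0
```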
                      Author of THE BOOK -- Playing The Percentages In Baseball

                      Comment


                      • Perhaps it's just me, but if the correct method is giving you a different answer than you expected, maybe your expectation is wrong.

                        Comment


                        • I just realized (and correct me if I'm wrong here Tom)...

                          Back to the whole regression to the mean discussion...

                          The error variance found in that methodology using the binomial distribution needs to be unweighted. I usually calculate weighted standard deviations, but the expected error (with an average playing time found in that inverted way you showed above) assumes no weighting.

                          When I calculated the average PA with the SUM(n)/SUM(n/X) method, I got 11-ish, and an error variance in OBP of roughly 0.02, which is a standard deviation of 0.141. That is the standard deviation you'd expect if you didn't weight by playing time, so that the guy with 1 PA counts just as much as the guy with 750. The observed weighted standard deviation was 0.065, which is WAY lower than 0.141; that wouldn't happen unless the weighted standard deviation naturally screened out some of the variance caused by small samples getting into the distribution.
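                          The weighted-vs-unweighted distinction can be sketched with toy data (all values below are illustrative, not the actual historical sample):

```python
import math

# Weighted vs. unweighted standard deviation.  Weighting by PA shrinks
# the spread because tiny-sample seasons contribute almost no weight.
def sd(values, weights=None):
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return math.sqrt(var)

# Two noisy short-season lines plus three stable full-time lines.
obp = [0.000, 0.600, 0.330, 0.340, 0.335]
pa  = [1,     2,     600,   650,   700]
print(sd(obp) > sd(obp, pa))  # True: PA-weighting screens out small-sample noise
```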

                          Comment
