Sabermetric Book


  • SABR Matt
    replied
    I just realized (and correct me if I'm wrong here Tom)...

    Back to the whole regression to the mean discussion...

    The error variance found in that methodology using the binomial distribution needs to be unweighted. I usually calculate weighted standard deviations, but the expected error (with an average playing time found in that inverted way you showed above) assumes no weighting.

    When I calculated the average PA with the 1/SUM(n/X) method I got 11-ish, and an error variance in OBP of roughly 0.02, which is a standard deviation of 0.141. That's the standard deviation you'd expect if you didn't weight by playing time, so that the guy with 1 PA counts just as much as the guy with 750. The observed weighted standard deviation was 0.065, which is WAY lower than 0.141, and that wouldn't happen unless the weighted standard deviation naturally screened out some of the variance caused by small samples getting into the distribution.
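
    A rough Python sketch of that comparison, on made-up data (the .335 league OBP is the figure used elsewhere in this thread); it's only meant to illustrate the weighted-vs-unweighted gap, not to reproduce the actual sample:

    # Expected (unweighted) binomial SD at the harmonic-mean PA vs. the
    # PA-weighted SD actually observed, on a fake population of league-average hitters.
    import math, random

    random.seed(1)
    p = 0.335                                              # assumed league OBP
    pas = [random.choice([1, 5, 25, 100, 300, 600]) for _ in range(1000)]
    obps = [sum(random.random() < p for _ in range(n)) / n for n in pas]

    harm_pa = len(pas) / sum(1.0 / n for n in pas)         # the "inverted" average PA
    expected_sd = math.sqrt(p * (1 - p) / harm_pa)         # what binomial error alone predicts

    mean_u = sum(obps) / len(obps)
    sd_unweighted = math.sqrt(sum((x - mean_u) ** 2 for x in obps) / len(obps))

    w = sum(pas)
    mean_w = sum(o * n for o, n in zip(obps, pas)) / w
    sd_weighted = math.sqrt(sum(n * (o - mean_w) ** 2 for o, n in zip(obps, pas)) / w)

    print(expected_sd, sd_unweighted, sd_weighted)         # the weighted SD comes out far smaller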



  • SABR Matt
    replied
    Perhaps it's just me, but if the correct method is giving you a different answer than you expected, maybe your expectation is wrong.



  • Tango Tiger
    replied
    When I look at pitchers with 5000 to 7000 BIP, the average, either way, is around 5900 (as we'd expect, since the range of the numbers is very close). The SD is 1.50, which yields "5000" in the equation where I had "3700".

    When I look at pitchers with 2500 to 5000 BIP, the averages are close either way you calculate it (3406 and 3548). In this case the SD is 1.4, which implies an r of close to .50. Meaning that the mean, around 3500, is also what goes in the correlation equation (where I'd have 3700).

    When I look at pitchers with 500 to 2500 BIP, the averages are close (1039, 1290). The SD is 1.2, meaning the correlation equation gets around 2500, instead of the 3700.

    As you can see, the correlation equation should have a value of x somewhere between 2500 and 5000 in the BIP/(BIP+x) equation. And the only way for me to get that is to figure the BIP average the usual way and get 2910, even though the "right" way gives me the "wrong" answer.
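
    To make the algebra explicit: if r = BIP/(BIP + x), then x = BIP * (1 - r)/r. A tiny Python sketch of that (the numbers plugged in below are just the ones read off this post, nothing more):

    # Solving the regression equation r = BIP / (BIP + x) for x, and back again.
    def x_from_r(mean_bip, r):
        return mean_bip * (1 - r) / r

    def r_from_x(mean_bip, x):
        return mean_bip / (mean_bip + x)

    # Middle bucket: mean BIP ~3500 with r close to .50 puts x right around 3500.
    print(x_from_r(3500, 0.50))    # 3500.0
    # For comparison, keeping the 3700 constant at that same mean implies r of about .49.
    print(r_from_x(3500, 3700))    # ~0.486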



  • Tango Tiger
    replied
    The correct "average" is 1490 (not 2910). But, it was a strange thing when I started running the correlations. The 2910 average actually yielded the consistent results according to the BIP correlation equation (that uses 3700 in that equation). I really ought to revist that.



  • Vogon Poet
    replied
    Originally posted by Tango Tiger:
    As Andy explains it to me, it's not the average PA, but:
    3 / (1/PA1+1/PA2+1/PA3)

    So, if you had 3 guys, 10 PA, 100 PA, 1000 PA, the "Average" would be 27.
    In this thread on your site, you use the average BIP of all the pitchers, which was ~3000. Using the above method on the Google Docs spreadsheet, I get an "average" of ~1500. Which "average" is correct?



  • SABR Matt
    replied
    That explains why the standard deviation of BA was so much higher when I included part-time (<100 AB) seasons than when I didn't: the all-time average PA goes way down, so the error term goes way up.

    Omitting seasons of 100 PA or fewer, my all-time average PA is 265.6, making the variance of OBP .335 * .665 / 265.6, or 0.0008 (the standard deviation drops from 0.141 to 0.028).
    Last edited by SABR Matt; 11-01-2007, 03:49 PM.
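
    A quick check of that arithmetic in Python (nothing here beyond the binomial variance p*(1-p)/n and the two average-PA figures quoted in this thread):

    import math

    p = 0.335
    for avg_pa in (11.071, 265.6):       # "average" PA with and without the short seasons
        var = p * (1 - p) / avg_pa
        print(avg_pa, round(var, 4), round(math.sqrt(var), 3))
    # -> roughly 0.02 / 0.14 with the short seasons in, 0.0008 / 0.03 with them out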



  • SABR Matt
    replied
    OK...doing it that way, I get an all-time average PA of 11.071.

    Which carries an expected variance of .335 * .665 / 11.071 or 0.020...I guess that makes sense...

    Very counterintuitive that the average PA all time should be 11.1, but...the numbers work out...



  • SABR Matt
    replied
    That makes sense.

    so the "mean" PA is actually a geometric mean with general form:

    [SUM(i = 1 to z) n_i] / [SUM(i = 1 to z) n_i / X_i]

    Where:

    n_i refers to the number of players with X_i plate appearances
    z refers to the number of unique PA counts in history

    If you have 5 data points instead of three and two of them had 100 PA while the other three had 1, 10 and 1000 respectively, your mean would be:

    5 / (1/1 + 1/10 + 2/100 + 1/1000) = 5/1.121 = 4.46

    That seems a rather incredible claim.

    I need to see if that actually works on a realistic sample.
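
    A small Python sketch of that general form, using the worked 5-point example above as a check (the second sample is made up, only to mimic a more realistic mix of playing time):

    # Weighted harmonic mean of PA: SUM(n_i) / SUM(n_i / X_i),
    # where n_i players each had X_i plate appearances.
    def mean_pa(counts):                 # counts maps a PA value X_i to a player count n_i
        players = sum(counts.values())
        return players / sum(n / x for x, n in counts.items())

    print(mean_pa({1: 1, 10: 1, 100: 2, 1000: 1}))   # 4.46, the example above
    # Made-up mix: the cup-of-coffee guys drag the "average" way down, as described.
    print(mean_pa({5: 300, 50: 250, 200: 200, 450: 150, 650: 100}))   # ~15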



  • Tango Tiger
    replied
    I just try to scrape by on what I taught myself from The Book's appendix. Andy does this for a living, so I put a lot of faith in his knowledge.

    The basic idea is that if you have 10 PA and 1000 PA, the amount of variance won't be the same as for two samples of 505 PA each; it would be equivalent to two players with 19.8 PA each.

    For example, the variance from 10 PA at a mean of .500, (.5*.5/10), plus the variance from 1000 PA, (.5*.5/1000), would be the same as (.5*.5/19.8) + (.5*.5/19.8)

    Dropping all the .5*.5 terms, we are left with:
    1/10 + 1/1000 = 1/19.8 + 1/19.8
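
    A quick Python check of that equivalence (nothing beyond the arithmetic in this post):

    # Two players with 10 and 1000 PA carry the same total binomial variance
    # as two players with n_eq PA each, where n_eq is the harmonic mean (~19.8).
    pas = [10, 1000]
    n_eq = len(pas) / sum(1 / n for n in pas)
    p = 0.5
    print(sum(p * (1 - p) / n for n in pas))    # 0.02525
    print(len(pas) * p * (1 - p) / n_eq)        # 0.02525, the same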



  • SABR Matt
    replied
    Pardon the interrogation, Tom...I am just trying to understand your methods fully (and I don't have my copy of THE BOOK on hand) to see if they make sense to me and to see if they can be applied to a different (but similar) mission.



  • SABR Matt
    replied
    Is that n / (SUM(1 / PA_i))?

    I don't quite get why that's accurate.



  • Tango Tiger
    replied
    As Andy explains it to me, it's not the average PA, but:
    3 / (1/PA1+1/PA2+1/PA3)

    So, if you had 3 guys, 10 PA, 100 PA, 1000 PA, the "Average" would be 27.
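
    In Python, just as a sanity check on the 27 figure (only the formula quoted above, nothing from The Book itself):

    # k / (1/PA_1 + ... + 1/PA_k), i.e. the harmonic mean of the PA counts.
    pas = [10, 100, 1000]
    print(len(pas) / sum(1 / n for n in pas))   # ~27.03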



  • SABR Matt
    replied
    Incidentally, the average PA for a batter-season all time is 182. Are you sure you want to base calculations on that? That doesn't make any sense. The binomial standard deviation for 182 PA is about 0.035, and the all-time standard deviation (I just calculated it) is 0.059.

    I think this data needs to be normalized before we start taking standard deviations of it.



  • SABR Matt
    replied
    Also...there's significant collinearity between PA and OBP: higher OBP means more PA. Isn't it rather self-defeating to base your binomial error on the average batter's playing time when that playing time is biased by how he performs?



  • SABR Matt
    replied
    Explain to me how it doesn't do the same thing?

    The concept is the same...the goal being to define the sample size at which random error would fully define the observed variance...

    What you're doing makes sense, but it makes zero sense to me to find some average PA. PA aren't distributed normally or binomially or anything close to that: there are 500 guys in a 1,000-player season who get fewer than 200 plate appearances and maybe 50 who get more than 600. It makes no sense at all to act as though an average PA value means anything.

