OK...doing it that way, I get that the all-time average PA is 11.071.
Which carries an expected binomial variance of .335 * .665 / 11.071 (i.e., p * (1 - p) / n), or 0.020...I guess that makes sense...
Very counterintuitive that the all-time average PA should be 11.1, but...the numbers work out...
That explains why the standard deviation of BA was so much higher when I included part-time (<100 AB) seasons than when I didn't: the all-time average PA goes way down, so the error term goes way up.
Omitting seasons of 100 PA or fewer, my all-time average PA is 265.6, making the expected variance of OBP .335 * .665 / 265.6, or 0.0008 (the standard deviation drops from 0.141 to 0.028).
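A minimal sketch of the arithmetic above, assuming the "inverted" average being discussed is the harmonic mean of the PA counts (n divided by the sum of 1/PA). The season PA values below are made up for illustration; only the .335 OBP comes from the thread:

```python
# Hypothetical season PA counts -- illustration only.
pa = [1, 5, 80, 300, 550, 700]

# "Inverted" average PA: n / sum(1/PA_i), i.e. the harmonic mean.
# A single 1-PA season drags this way down, which is the point above.
n = len(pa)
avg_pa = n / sum(1 / x for x in pa)

# Expected binomial error variance in OBP at that average PA,
# using the league OBP of .335 from the text: p * (1 - p) / n.
p = 0.335
err_var = p * (1 - p) / avg_pa
err_sd = err_var ** 0.5

print(round(avg_pa, 1), round(err_var, 4), round(err_sd, 3))
```

Note how the single 1-PA season pulls the harmonic mean down to ~5 even though the arithmetic mean of these PA counts is ~273, which is exactly the 11.1-vs-265.6 effect described above.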
In this thread on your site, you use the average BIP of all the pitchers, which was ~3000. Using the above method on the Google Docs spreadsheet, I get an "average" of ~1500. Which "average" is correct?
Originally Posted by Tango Tiger
The correct "average" is 1490 (not 2910). But something strange happened when I started running the correlations. The 2910 average is actually what yielded consistent results with the BIP correlation equation (which uses 3700). I really ought to revisit that.
When I look at pitchers with 5000 to 7000 BIP, the average, either way, is around 5900 (as we'd expect, since the range of the numbers is so narrow). The SD is 1.50, which yields "5000" in the equation where I had "3700".
When I look at pitchers with 2500 to 5000 BIP, the averages are close either way you calculate it (3406 and 3548). In this case the SD is 1.4, which implies an r of close to .50. Meaning that the mean, around 3500, is also what goes in the correlation equation (where I'd have 3700).
When I look at pitchers with 500 to 2500 BIP, the averages are close (1039, 1290). The SD is 1.2, meaning the correlation equation gets around 2500, instead of the 3700.
As you can see, the correlation equation should have the value of somewhere between x=2500 and 5000, in the BIP/(BIP+x) equation. And the only way for me to get that is to figure the BIP average the usual way, and get 2910, even though the "right" way gives me the "wrong" answer.
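The reliability form referenced above can be inverted to show where each group's x comes from. This is a sketch, assuming the equation is r = BIP / (BIP + x) as written; the `implied_x` helper is my own name, and the 3500/.50 numbers come from the 2500-5000 BIP group in the post:

```python
# Reliability form from the post: r = BIP / (BIP + x).
# Solving for x: x = BIP * (1 - r) / r.
def implied_x(bip, r):
    """Back out the x a group 'wants' given its mean BIP and observed r."""
    return bip * (1 - r) / r

# The 2500-5000 BIP group has a mean around 3500 and an implied r of ~.50,
# which puts x right at the group mean -- as the post notes.
print(implied_x(3500, 0.50))  # → 3500.0
```

The same inversion applied to the other groups is what produces the x ≈ 5000 and x ≈ 2500 figures quoted above.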
Perhaps it's just me, but if the correct method is giving you a different answer than you expected, maybe your expectation is wrong.
I just realized (and correct me if I'm wrong here Tom)...
Back to the whole regression to the mean discussion...
The error variance found in that methodology via the binomial distribution needs to be compared against an unweighted observed standard deviation. I usually calculate weighted standard deviations, but the expected error (with the average playing time found in that inverted way you showed above) assumes no weighting.
When I calculated the average PA with the 1/SUM(n/X) method, I got 11-ish, and an error variance in OBP of roughly 0.02, which is a standard deviation of 0.141. That's the standard deviation you'd expect if you didn't weight by playing time, so that the guy with 1 PA counts just as much as the guy with 750. The observed weighted standard deviation was 0.065, WAY lower than 0.141, which wouldn't happen unless the weighting naturally screened out some of the variance caused by small samples getting into the distribution.
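The weighted-vs-unweighted gap is easy to see in a simulation. In the sketch below every "player" has the same true OBP, so all observed spread is binomial noise; the player counts and PA values are made up, and only the .335 true rate comes from the thread:

```python
import random

# Every player has the same true OBP, so all spread is binomial noise.
# Small-PA players inflate the unweighted SD; PA-weighting screens
# most of that out -- the effect described in the post.
random.seed(1)
p_true = 0.335
pa_list = [1, 2, 5, 10, 650, 600, 700, 550] * 50  # tiny + full seasons

obp = []
for pa in pa_list:
    times_on = sum(random.random() < p_true for _ in range(pa))
    obp.append(times_on / pa)

def sd(vals, wts):
    """Weighted standard deviation (weights of 1 give the unweighted SD)."""
    tot = sum(wts)
    mean = sum(v * w for v, w in zip(vals, wts)) / tot
    var = sum(w * (v - mean) ** 2 for v, w in zip(vals, wts)) / tot
    return var ** 0.5

unweighted = sd(obp, [1] * len(obp))
weighted = sd(obp, pa_list)
print(round(unweighted, 3), round(weighted, 3))  # unweighted >> weighted
```

The unweighted SD here is dominated by the 1-2 PA players (whose binomial SD is near 0.47), while the weighted SD is driven by the full seasons, mirroring the 0.141-vs-0.065 gap described above.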