# Sabermetric Book

• 11-01-2007, 04:45 PM
SABR Matt
OK...doing it that way, I get the average PA all time is 11.071.

Which carries an expected variance of .335 * .665 / 11.071 or 0.020...I guess that makes sense...

Very counterintuitive that the average PA all time should be 11.1, but...the numbers work out...
• 11-01-2007, 04:46 PM
SABR Matt
That explains why the standard deviation of BA was so much higher when I included part time (<100 AB) seasons than when I didn't...because the all time average PA goes way down so the error term goes way up.

Omitting seasons of 100 or fewer PA, my all time average PA is 265.6, making the variance of OBP .335 * .665 / 265.6 or about 0.0008 (standard deviation drops from 0.141 to about 0.029).
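For anyone who wants to check the arithmetic in the last two posts, here is a minimal Python sketch. It assumes the .335 league OBP and the two "average PA" figures quoted above; the function name `binomial_variance` is just an illustrative label, not anything from the thread.

```python
import math


def binomial_variance(p, n):
    """Expected variance of an observed rate when the true rate is p over n trials."""
    return p * (1 - p) / n


LEAGUE_OBP = 0.335       # rate used in the posts above
AVG_PA_ALL = 11.071      # "average" PA with every season included
AVG_PA_REGULARS = 265.6  # "average" PA after dropping the short seasons

var_all = binomial_variance(LEAGUE_OBP, AVG_PA_ALL)
var_regulars = binomial_variance(LEAGUE_OBP, AVG_PA_REGULARS)

# The standard deviation is the square root of the variance.
sd_all = math.sqrt(var_all)
sd_regulars = math.sqrt(var_regulars)

print(f"all seasons:     variance {var_all:.4f}, SD {sd_all:.3f}")
print(f"regular seasons: variance {var_regulars:.4f}, SD {sd_regulars:.3f}")
```

The only design point is that the error term scales with 1/n, so a small "average" PA inflates the expected variance very quickly.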
• 11-01-2007, 06:21 PM
Vogon Poet
Quote:

Originally Posted by Tango Tiger
As Andy explains it to me, it's not the average PA, but:
3 / (1/PA1+1/PA2+1/PA3)

So, if you had 3 guys, 10 PA, 100 PA, 1000 PA, the "Average" would be 27.

In this thread on your site, you use the average BIP of all the pitchers, which was ~3000. Using the above method on the Google Docs spreadsheet, I get an "average" of ~1500. Which "average" is correct?
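The quoted formula is a harmonic mean. A minimal sketch, using the three-player example from the quote (the helper name `harmonic_mean` is my own label; Python's `statistics.harmonic_mean` should give the same result):

```python
def harmonic_mean(values):
    """N divided by the sum of reciprocals, i.e. N / (1/PA1 + 1/PA2 + ... + 1/PAN)."""
    return len(values) / sum(1 / v for v in values)


pa = [10, 100, 1000]  # the three players from the quoted example

arithmetic = sum(pa) / len(pa)  # the usual average
harmonic = harmonic_mean(pa)    # the "average" described in the quote

print(f"arithmetic mean: {arithmetic:.1f}")
print(f"harmonic mean:   {harmonic:.1f}")
```

The harmonic mean is dominated by the smallest samples, which is why the two "averages" can differ by so much.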
• 11-02-2007, 11:05 AM
Tango Tiger
The correct "average" is 1490 (not 2910). But, it was a strange thing when I started running the correlations. The 2910 average actually yielded the consistent results according to the BIP correlation equation (that uses 3700 in that equation). I really ought to revisit that.
• 11-02-2007, 11:30 AM
Tango Tiger
When I look at pitchers with 5000 to 7000 BIP, the average, either way, is around 5900 (as we'd expect since the range of the numbers is very narrow). The SD is 1.50, which yields "5000" in the equation where I had "3700".

When I look at pitchers with 2500 to 5000 BIP, the averages are close either way you calculate it (3406 and 3548). In this case the SD is 1.4, which implies an r of close to .50. Meaning that the mean, around 3500, is also what goes in the correlation equation (where I'd have 3700).

When I look at pitchers with 500 to 2500 BIP, the averages are close (1039, 1290). The SD is 1.2, meaning the correlation equation gets around 2500, instead of the 3700.

As you can see, the correlation equation should have the value of somewhere between x=2500 and 5000, in the BIP/(BIP+x) equation. And the only way for me to get that is to figure the BIP average the usual way, and get 2910, even though the "right" way gives me the "wrong" answer.
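A rough sketch of how the BIP/(BIP+x) equation behaves, assuming it gives the correlation r directly. The function names are hypothetical, and the step from an observed SD to an r is not shown because the post doesn't spell it out.

```python
def correlation(bip, x):
    """r = BIP / (BIP + x), the form of the equation in the post above."""
    return bip / (bip + x)


def solve_x(bip, r):
    """Rearrange r = BIP / (BIP + x) to get x = BIP * (1 - r) / r."""
    return bip * (1 - r) / r


# When r is .50, x equals the BIP figure itself, which is why a group
# averaging about 3500 BIP with r near .50 implies x of about 3500.
x_mid_group = solve_x(3500, 0.50)

# How far apart the two candidate "averages" land with x = 3700.
r_harmonic = correlation(1490, 3700)
r_arithmetic = correlation(2910, 3700)

print(f"x for 3500 BIP at r = .50: {x_mid_group:.0f}")
print(f"r at 1490 BIP: {r_harmonic:.2f}, r at 2910 BIP: {r_arithmetic:.2f}")
```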
• 11-02-2007, 08:34 PM
SABR Matt
Perhaps it's just me, but if the correct method is giving you a different answer than you expected, maybe your expectation is wrong.
• 11-07-2007, 06:24 PM
SABR Matt
I just realized (and correct me if I'm wrong here Tom)...

Back to the whole regression to the mean discussion...

The error variance found in that methodology using the binomial distribution needs to be unweighted. I usually calculate weighted standard deviations, but the expected error (with an average playing time found in that inverted way you showed above) assumes no weighting.

When I calculated the average PA with the N/SUM(1/PA) method I got 11-ish, and an error variance in OBP of roughly 0.02. That is a standard deviation of 0.141, which is the standard deviation you'd expect if you didn't weight it by PT, so that the guy with 1 PA is just as significant as the guy with 750.

The observed weighted standard deviation was 0.065, which is WAY lower than 0.141. That wouldn't happen unless the weighted standard deviation naturally screened out some of the variance caused by small samples getting into the distribution.
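To illustrate the weighted versus unweighted point, here is a small sketch. The (PA, OBP) pairs are made up purely for illustration and are not real player data; the function names are likewise hypothetical.

```python
import math

# Hypothetical (PA, OBP) pairs, invented for illustration only.
players = [(1, 1.000), (1, 0.000), (5, 0.200),
           (300, 0.340), (600, 0.330), (750, 0.350)]


def unweighted_sd(data):
    """Every player counts equally, so the 1 PA guy matters as much as the 750 PA guy."""
    rates = [obp for _, obp in data]
    mean = sum(rates) / len(rates)
    return math.sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))


def weighted_sd(data):
    """Each player's squared deviation is weighted by his PA."""
    total_pa = sum(pa for pa, _ in data)
    mean = sum(pa * obp for pa, obp in data) / total_pa
    return math.sqrt(sum(pa * (obp - mean) ** 2 for pa, obp in data) / total_pa)


print(f"unweighted SD: {unweighted_sd(players):.3f}")
print(f"weighted SD:   {weighted_sd(players):.3f}")
```

With these toy numbers the tiny samples drive the unweighted spread, while weighting by PA mostly screens them out.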