Your last post seems a much more thoughtful and balanced response. I agree with everything in it. I also use RE tables as a basis for linear weights to be able to project a player's future performance. I don't know whether Tango uses empirical data or a Markov chain to produce the RE tables that he has presented. I use empirical data aggregated over 3 seasons and do not regress the data to any larger aggregation. There is absolutely no difference between the linear weights that Tango generates from his RE table and the linear weights that I generate from my RE table.
I suspect that you have not actually gone through the process of creating linear weights from an RE table, or you would understand why there would not be a difference. Even if you have anomalous values in some of the base-out states because of small sample size for those base-out states, the resulting linear weights are unaffected. This is because when you are determining the value of an offensive event (for example a single), you are multiplying the change in base-out state value by the number of occurrences of that base-out state, adding the resulting values over all the base-out states, and dividing by the total number of occurrences of the event. Since the number of occurrences of the anomalous base-out states is small (that is why you are having the anomalies to begin with), their effect on the total run value of a single does not register within the level of precision used in linear weights. Try calculating a linear weights value with a manually altered value for the man on third, 0 outs state and you will see what I mean.
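For anyone who wants to try that experiment, here is a minimal sketch of the calculation in Python. Everything in it is an assumption for illustration only: the state labels, the run expectancy values, and the counts of singles are made-up toy numbers, not taken from Tango's table or mine, and the function name `linear_weight` is hypothetical. A real calculation would cover all 24 base-out states, and an altered state value would also show up wherever that state is the end state of a play.

```python
# Toy sketch: run value of a single from an RE table, before and after
# manually altering one rarely occurring base-out state.
# All numbers below are invented for illustration, not real data.

def linear_weight(transitions, run_exp):
    """Average run value of an event.

    transitions: list of (start_state, end_state, runs_scored, count)
    run_exp:     dict mapping base-out state -> run expectancy
    """
    total_value = 0.0
    total_count = 0
    for start, end, runs, count in transitions:
        # Change in run expectancy plus any runs that scored on the play.
        change = run_exp[end] - run_exp[start] + runs
        total_value += change * count
        total_count += count
    return total_value / total_count

# Hypothetical RE values, keyed as "bases/outs".
run_exp = {"---/0": 0.50, "1--/0": 0.90, "12-/0": 1.50, "--3/0": 1.40}

# Hypothetical singles: (start, end, runs scored, number of occurrences).
singles = [
    ("---/0", "1--/0", 0, 950),
    ("1--/0", "12-/0", 0, 40),
    ("--3/0", "1--/0", 1, 10),   # the rare state
]

baseline = linear_weight(singles, run_exp)

# Manually alter the man on third, 0 outs value to mimic an anomaly.
altered_exp = dict(run_exp)
altered_exp["--3/0"] = 1.70
altered = linear_weight(singles, altered_exp)

print(f"baseline: {baseline:.3f}  altered: {altered:.3f}")
```

With these toy inputs, a 0.30 run shift in the rare state moves the value of the single by only a few thousandths of a run, because that state accounts for 10 of the 1,000 singles in the sample.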
So there is really no gain in regressing to try to remove those anomalies for those who create linear weights from RE tables. There is, however, a loss for people like Ub who are looking at the team level for explanations of how a team might be better or worse at creating runs. Regressing to remove the anomalies removes just the data he is looking for, i.e. the differences between that team and an average team. He still must investigate whether those differences in data are based on actual differences within the team or just on sample size, but at least he has a starting point for his investigations.

