Regression Question
05-29-2006, 08:19 PM
When designing a regression that analyzes historical team performance, what variables are traditionally included as controls?
Right now I am planning on including payroll and managerial experience. Are there any other variables I should include? I considered including opponents' winning percentage, but there don't seem to be any good measures of that available...
Thanks in advance.
05-29-2006, 08:59 PM
What are you trying to find, specifically?
The teams that were "best managed"? The teams that were best run from the GM position? If you just want to know something about how good the teams on the field were, then neither of your proposed controls makes sense. In the context of how "great" the modern Yankees are, it doesn't matter that they cost 200 million dollars to employ...that only matters if you're trying to evaluate their front office.
05-29-2006, 09:58 PM
Like SABR Matt said, right now you need to whittle down the experiment. If you want to find out who the best teams were overall, that shouldn't be too hard: runs, RBIs, OBP, etc.
But if you'd like to go beyond that, to managerial experience as you stated, then you have to take payroll and revenue into account. Like Matt said, decide exactly what angle you're coming from before you start picking controls, because right now you're covering too many aspects to fit properly into one experiment.
05-30-2006, 12:55 AM
My experiment (which was discussed in another recent post) is to examine how sabermetric teams have performed in comparison to traditional, scout-based teams.
I will be assigning a dummy variable for whether a team is sabermetric (I know some people object to such a black-and-white classification, but please see my response in my other recent post).
The sabermetric variable is the variable of interest but, as stated above, I would like to know what other variables I should control for. Payroll and managerial experience seem obvious. What else is normally used in this sort of situation?
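For anyone following along, the setup described here (a 0/1 sabermetric dummy plus payroll and managerial experience as controls) could be sketched roughly as below. This is only an illustration: every number is invented, the outcome is assumed to be winning percentage, and the names `ols`, `teams`, and `TRUE` are hypothetical. The fake outcome is built from known coefficients so the fit can be checked; a real study would use actual team data and a statistics package that also reports standard errors.

```python
# Illustrative only: every number below is invented, not real team data.

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows (each a list of floats, with a leading 1.0 for the
    intercept); y is a list of floats. Returns the list of coefficients.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Each tuple: (sabermetric dummy, payroll in $ millions, manager's years).
teams = [
    (1, 55.0, 3.0), (0, 120.0, 10.0), (1, 60.0, 1.0), (0, 80.0, 5.0),
    (0, 200.0, 12.0), (1, 90.0, 7.0), (0, 45.0, 2.0), (1, 70.0, 15.0),
]
X = [[1.0, float(s), p, m] for s, p, m in teams]

# Fake winning percentages generated from known (made-up) coefficients.
TRUE = [0.40, 0.03, 0.001, 0.002]
y = [sum(b * x for b, x in zip(TRUE, row)) for row in X]

coefs = ols(X, y)
for name, b in zip(["intercept", "sabermetric", "payroll", "mgr_exp"], coefs):
    print(f"{name:12s} {b: .4f}")
```

The coefficient on the dummy is the estimated difference in winning percentage between sabermetric and traditional teams, holding payroll and managerial experience fixed.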
05-30-2006, 01:04 AM
Hmm...I think some sort of strength of schedule assessment is required...the tricky part is in defining how to calculate that. I have a method, but it's not trivial or simple to reproduce (heck...I can't even write the program that does it...Randy Fiato (Admin of MLBCenter) wrote the program that solved the matrix that found the strength of schedule for every team).
05-30-2006, 01:30 AM
Thanks for the info Matt. Is there any way I can access strength of schedule data? Also, do you recommend using any other control variables?
05-30-2006, 01:39 AM
SoS data is available at ESPN's website...they have something called RPI which measures the SoS...dig around in the standings pages and you'll find it.
I think they saved RPI info going back to 2004? It's a start anyway...the way the SoS component of RPI works is...the winning percentage faced by a team is 2/3 the weighted winning percentage of the teams they played (weighted by the number of games they played against each team) plus 1/3 the weighted W% of the teams our team's OPPONENTS faced. Kinda complicated to calculate, but that's how it's done.
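The 2/3 and 1/3 weighting described above can be sketched in a few lines. This is a rough illustration, not ESPN's actual code: the three teams, their records, and the schedule are invented, the function names are hypothetical, and it uses raw season winning percentages without any further adjustments a published RPI might apply.

```python
# Invented three-team league; records are (wins, losses).
records = {"A": (12, 8), "B": (15, 15), "C": (13, 17)}

# schedule[team][opponent] = number of games played against that opponent.
schedule = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 10, "C": 20},
    "C": {"A": 10, "B": 20},
}

def win_pct(team):
    wins, losses = records[team]
    return wins / (wins + losses)

def opp_win_pct(team):
    """Games-weighted average winning percentage of a team's opponents."""
    games = schedule[team]
    total = sum(games.values())
    return sum(n * win_pct(opp) for opp, n in games.items()) / total

def opp_opp_win_pct(team):
    """Games-weighted average of the opponents' own opponent W%."""
    games = schedule[team]
    total = sum(games.values())
    return sum(n * opp_win_pct(opp) for opp, n in games.items()) / total

def strength_of_schedule(team):
    # 2/3 opponents' W% plus 1/3 opponents' opponents' W%, as described.
    return (2 / 3) * opp_win_pct(team) + (1 / 3) * opp_opp_win_pct(team)

sos = {team: strength_of_schedule(team) for team in schedule}
for team, value in sorted(sos.items()):
    print(f"{team}: {value:.3f}")
```

With real data, `records` and `schedule` would be filled from a season's game log, and the resulting value for each team could go into the regression as the strength-of-schedule control.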
05-30-2006, 01:41 AM
Other possible control variables...I'm trying to think of other things that might interfere with this study...perhaps the average ages of the teams? No matter how well you run your organization...you go through cycles where you get too old and have to reload from the bottom...