
Thread: Correlation Between Stats and Runs, etc.

  1. #1
    Join Date
    Mar 2006
    Location
    Kentucky
    Posts
    800

    Correlation Between Stats and Runs, etc.

    What are the best stats in terms of their correlation to winning %, runs scored, or runs allowed? I.e., what batting stat most closely correlates with runs scored? What pitching stat most closely correlates with K rate, BB rate, HR rate, BF, etc.? What fielding stat (preferably individual) most closely correlates with balls in play converted to outs, and thus with runs allowed? I hope you understand where I'm going with this. Thanks in advance.

  2. #2
    Join Date
    Aug 2005
    Posts
    13,874
    Blog Entries
    2
    Hopefully this will show
    Code:
    	TB	AVG	OBP	SLG	SEC	OPS	SLOB	AV+SE	A*SEC	OBP3+S	OP1.8+S	AV+SE/2	Eff	RC	OBP*1.8	ISO	ISO+OBP	R
    TB	1.000																	
    AVG	0.838	1.000																
    OBP	0.805	0.855	1.000															
    SLG	0.991	0.811	0.806	1.000														
    SEC	0.812	0.534	0.790	0.848	1.000													
    OPS	0.974	0.862	0.906	0.981	0.867	1.000												
    SLOB	0.969	0.862	0.915	0.975	0.868	0.999	1.000											
    AV+SE	0.913	0.747	0.901	0.933	0.961	0.964	0.965	1.000										
    A*SEC	0.908	0.732	0.894	0.929	0.966	0.959	0.962	0.999	1.000									
    OBP3+S	0.928	0.879	0.965	0.933	0.856	0.985	0.988	0.961	0.955	1.000								
    OP1.8+S	0.954	0.874	0.940	0.960	0.865	0.996	0.997	0.966	0.960	0.997	1.000							
    AV+SE/2	0.940	0.851	0.935	0.948	0.899	0.986	0.987	0.985	0.980	0.988	0.991	1.000						
    Eff	0.966	0.771	0.865	0.982	0.919	0.987	0.985	0.975	0.973	0.961	0.977	0.971	1.000					
    RC	0.977	0.885	0.907	0.970	0.837	0.992	0.993	0.948	0.944	0.981	0.990	0.979	0.970	1.000				
    OBP*1.8	0.805	0.855	1.000	0.806	0.790	0.906	0.915	0.901	0.894	0.965	0.940	0.935	0.865	0.907	1.000			
    ISO	0.920	0.580	0.653	0.947	0.888	0.893	0.884	0.888	0.892	0.816	0.856	0.853	0.944	0.864	0.653	1.000		
    ISO+OBP	0.958	0.752	0.863	0.976	0.930	0.983	0.981	0.978	0.977	0.957	0.973	0.968	0.999	0.964	0.863	0.946	1.000	
    R	0.929	0.834	0.904	0.926	0.853	0.960	0.962	0.944	0.940	0.960	0.964	0.963	0.944	0.962	0.904	0.831	0.941	1.000
    Basically, what you want to look at is the bottom row (R), which gives each stat's correlation with runs scored. By the way, this covers all teams from 1962 to 2003 with strike-shortened seasons removed: any season in which teams didn't schedule or didn't play a full schedule was removed.
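    Each number in the bottom row is a Pearson correlation between a team stat and runs (R) across team-seasons. A minimal Python sketch of that calculation follows; the team numbers are made up for illustration and are not real seasons. It also shows why the OBP*1.8 row repeats the OBP row exactly: multiplying a stat by a constant does not change its correlation with anything.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up team numbers for illustration only; these are NOT real seasons.
team_obp = [0.320, 0.335, 0.310, 0.345, 0.328]
team_runs = [720, 790, 680, 830, 745]

r_obp = pearson_r(team_obp, team_runs)
# Scaling OBP by 1.8 leaves the correlation unchanged.
r_obp_scaled = pearson_r([1.8 * x for x in team_obp], team_runs)
print(f"r(OBP, R) = {r_obp:.3f}, r(1.8*OBP, R) = {r_obp_scaled:.3f}")
```

    The weight only starts to matter once OBP is added to another stat, as in the OP1.8+S column.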

  3. #3
    Join Date
    Mar 2006
    Location
    Kentucky
    Posts
    800
    What is SLOB? Thanks by the way.

  4. #4
    Join Date
    Aug 2005
    Posts
    13,874
    Blog Entries
    2
    SLOB is slugging percentage times on-base percentage (SLG * OBP). SLOB is basically runs created.
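    To see why SLOB is "basically runs created": Bill James's basic Runs Created is (H + BB) * TB / (AB + BB). If OBP is simplified to (H + BB) / (AB + BB), ignoring HBP and SF, then OBP * SLG * AB is algebraically the same expression. A small Python sketch checking this on a hypothetical team line:

```python
def basic_runs_created(ab, h, bb, tb):
    """Bill James's basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def slob(ab, h, bb, tb):
    """OBP * SLG, with OBP simplified to (H + BB) / (AB + BB), ignoring HBP and SF."""
    obp = (h + bb) / (ab + bb)
    slg = tb / ab
    return obp * slg


# Hypothetical team batting line, for illustration only.
ab, h, bb, tb = 5500, 1450, 550, 2300
rc = basic_runs_created(ab, h, bb, tb)
# SLOB times at-bats reproduces basic RC (up to floating-point rounding).
print(round(rc, 1), round(slob(ab, h, bb, tb) * ab, 1))
```

    That identity is also why the SLOB and RC columns in the table track each other so closely.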

  5. #5
    Ub, *great* presentation.

    And, interestingly, 1.8 * OBP + SLG has the highest correlation to runs scored (.964). But, as we can see, even plain old OPS (.960) works fine.
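    The two measures are nearly interchangeable across the league, but they can still rank individual teams differently. A short Python sketch with two hypothetical teams, one built on getting on base and one built on power:

```python
def ops(obp, slg):
    """Plain OPS: OBP + SLG."""
    return obp + slg


def weighted_ops(obp, slg, obp_weight=1.8):
    """OPS variant that counts OBP obp_weight times (1.8 by default, as in the thread)."""
    return obp_weight * obp + slg


# Two hypothetical teams as (OBP, SLG): A gets on base more, B hits for more power.
team_a = (0.350, 0.400)
team_b = (0.310, 0.450)

print("OPS:        ", round(ops(*team_a), 3), round(ops(*team_b), 3))
print("1.8*OBP+SLG:", round(weighted_ops(*team_a), 3), round(weighted_ops(*team_b), 3))
```

    Plain OPS puts team B ahead (.760 vs. .750), while the 1.8-weighted version puts team A ahead.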

  6. #6
    Join Date
    Mar 2006
    Location
    Kentucky
    Posts
    800
    How do you make these charts? And can you make one for pitching statistics?

  7. #7
    Join Date
    May 2005
    Location
    Western Massachusetts
    Posts
    11,560
    Thanks a lot Ubi! I was surprised to see that many of the extremely advanced measures weren't much better than plain OPS, and that SLG actually correlated better than OBP. That's contrary to what I thought was true. Thanks!

  8. #8
    Join Date
    Aug 2005
    Posts
    13,874
    Blog Entries
    2
    Quote Originally Posted by Mr. Red
    How do you make these charts? And can you make one for pitching statistics?

    The Lahman database, Excel, and the forum's [code] tag for posting it here.

    I looked into this a long time ago, I think in 2004 or so, and saved it. I never looked into pitching, but I would guess the same applies there if the data are available: teams with the lowest OBP and SLG allowed should allow the fewest runs.
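    For anyone who would rather script the data step than do it in Excel, here is a minimal Python sketch that reads team rows from a Lahman-style Teams.csv and computes team OBP and SLG. It assumes the column names yearID, AB, H, 2B, 3B, HR, BB, HBP, and SF, and it treats 1972, 1981, 1994, and 1995 as the strike-shortened seasons; both are assumptions to check against the copy of the database you download. The in-memory sample uses made-up numbers.

```python
import csv
import io

# Assumption: these are the strike-shortened seasons to drop.
STRIKE_SEASONS = {1972, 1981, 1994, 1995}


def _num(row, key):
    """Read an integer column, treating blanks (e.g. missing HBP/SF) as 0."""
    value = (row.get(key) or "").strip()
    return int(value) if value else 0


def team_seasons(csv_file, first_year=1962, last_year=2003):
    """Yield rows in the year range, skipping the assumed strike-shortened seasons."""
    for row in csv.DictReader(csv_file):
        year = int(row["yearID"])
        if first_year <= year <= last_year and year not in STRIKE_SEASONS:
            yield row


def obp_slg(row):
    """Team OBP and SLG from a Lahman-style Teams row."""
    ab, h, bb = _num(row, "AB"), _num(row, "H"), _num(row, "BB")
    hbp, sf = _num(row, "HBP"), _num(row, "SF")
    doubles, triples, hr = _num(row, "2B"), _num(row, "3B"), _num(row, "HR")
    tb = h + doubles + 2 * triples + 3 * hr  # each hit counts once, extra bases added on
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    return obp, slg


# Tiny in-memory stand-in for Teams.csv (made-up numbers, not real seasons).
sample = io.StringIO(
    "yearID,teamID,R,AB,H,2B,3B,HR,BB,HBP,SF\n"
    "1975,AAA,700,5500,1450,250,30,150,550,,\n"
    "1981,BBB,400,3600,900,150,20,80,350,20,25\n"
)
for row in team_seasons(sample):
    obp, slg = obp_slg(row)
    print(row["yearID"], row["teamID"], row["R"], f"{obp:.3f}", f"{slg:.3f}")
```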

  9. #9
    Join Date
    Aug 2005
    Posts
    13,874
    Blog Entries
    2
    Quote Originally Posted by 538280
    Thanks a lot Ubi! I was surprised to see that many of the extremely advanced measures weren't much better than plain OPS, and that SLG actually correlated better than OBP. That's contrary to what I thought was true. Thanks!

    SLG correlating higher than OBP surprises most people, but if you think about it, it makes sense. SLG partly contains an element of OBP, since it counts hits, plus it tells you what kind of hits they were. OBP, by contrast, treats everything like a single and only counts hits and walks (plus HBP and so on).
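    A quick Python sketch of that point, using two hypothetical hitters with the same hits, walks, HBP, at-bats, and sacrifice flies but a very different mix of hit types: their OBPs are identical, and only SLG tells them apart.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


# Two hypothetical hitters, 180 hits each, with different hit types.
contact = dict(singles=150, doubles=20, triples=5, hr=5)
power = dict(singles=100, doubles=35, triples=5, hr=40)
ab, bb, hbp, sf = 600, 60, 5, 5

for name, hits in (("contact", contact), ("power", power)):
    h = sum(hits.values())
    print(name, f"OBP {obp(h, bb, hbp, ab, sf):.3f}", f"SLG {slg(ab=ab, **hits):.3f}")
```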

    OPS correlating so highly is one of the reasons OPS got pushed and became popular. It is extremely simple to find and, for the amount of work needed, extremely accurate.

  10. #10
    Even if SLG correlates higher than OBP, it's 1.8 OBP + SLG that correlates the highest, not OBP + SLG.

    Also note that correlation is often misunderstood. If every team is built so that its OBP falls around the .320 to .340 mark, the correlation can't approach 1.00 as closely as it would for teams built so that their HR totals are clustered.

    The reason K/PA or K/BFP has a high year-to-year correlation is that the spread in true talent is so wide.
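    The range effect can be seen on synthetic data (not baseball data). In this Python sketch, y follows x exactly except for the same +/-5 wobble at every point. Over the full range of x the correlation is very high; keep only a narrow band of x, as if every team were built into the same small window, and the correlation drops even though the underlying relationship is unchanged.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic, deterministic data: y = x plus a fixed +/-5 wobble.
xs = list(range(100))
ys = [x + (5 if x % 2 == 0 else -5) for x in xs]
r_full = pearson_r(xs, ys)

# Keep only x from 40 to 59: same relationship, much narrower range.
middle = [(x, y) for x, y in zip(xs, ys) if 40 <= x <= 59]
r_narrow = pearson_r([x for x, _ in middle], [y for _, y in middle])

print(f"full range r = {r_full:.3f}, narrow range r = {r_narrow:.3f}")
# full range r = 0.985, narrow range r = 0.731
```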

  11. #11
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    Your point about clustering is definitely true, Tango. When I was doing all of my work with correlations, I noticed that if a distribution had clusters of data points at radically different places on the X/Y scatter plot, you'd get a line connecting the clusters with a very high correlation that meant nothing at all (because within any cluster the line tells you nothing).

    Gotta watch out when you have huge ranges of possible outcomes...the range itself can determine your correlation.
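    A tiny synthetic example of that effect (made-up points, not baseball data): within each of two clusters x and y are completely unrelated, yet pooling the clusters produces a correlation near 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_of(points):
    """Correlation of the x and y coordinates of a list of (x, y) points."""
    return pearson_r([x for x, _ in points], [y for _, y in points])


# Two tight clusters far apart; inside each one, x tells you nothing about y.
cluster_a = [(0, 1), (1, 0), (0, 0), (1, 1)]
cluster_b = [(10, 11), (11, 10), (10, 10), (11, 11)]

print(f"within A: {r_of(cluster_a):.3f}")
print(f"within B: {r_of(cluster_b):.3f}")
print(f"pooled:   {r_of(cluster_a + cluster_b):.3f}")
# within A: 0.000, within B: 0.000, pooled: 0.990
```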

  12. #12
    Join Date
    Mar 2006
    Location
    Kentucky
    Posts
    800
    Quote Originally Posted by Ubiquitous
    SLG correlating higher than OBP surprises most people, but if you think about it, it makes sense. SLG partly contains an element of OBP, since it counts hits, plus it tells you what kind of hits they were. OBP, by contrast, treats everything like a single and only counts hits and walks (plus HBP and so on).
    Wouldn't it make the most sense to make a stat that was total bases (including BB, etc.) per plate appearance? And what is Lahman's database?
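    The stat proposed here is easy to write down. A minimal Python sketch, with a hypothetical function name; it assumes PA = AB + BB + HBP + SF + SH (catcher's interference ignored) and counts each walk or HBP as one base, exactly like a single.

```python
def bases_per_pa(singles, doubles, triples, hr, bb, hbp, ab, sf=0, sh=0):
    """Hypothetical stat: (TB + BB + HBP) per plate appearance.

    PA is approximated as AB + BB + HBP + SF + SH (an assumption).
    Walks and HBP count as one base each, the same as a single.
    """
    tb = singles + 2 * doubles + 3 * triples + 4 * hr
    pa = ab + bb + hbp + sf + sh
    return (tb + bb + hbp) / pa


# Hypothetical season line, for illustration only.
print(round(bases_per_pa(singles=110, doubles=30, triples=3, hr=25,
                         bb=70, hbp=5, ab=550, sf=5, sh=0), 3))
```

    Whether a walk really is worth as much as a single is the kind of question the run-value approaches in the next reply are built to answer.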

  13. #13
    What makes sense is Linear Weights and BaseRuns. Everything else doesn't. The Lahman database is a free historical baseball database that you can download at http://www.baseball1.com, or you can see it implemented at http://www.baseball-reference.com.
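    BaseRuns models scoring as A * B / (B + C) + D: baserunners (A) times the share of them who score, B / (B + C), plus home runs (D). A minimal Python sketch using one commonly cited simple version of the formula; more detailed versions add HBP, SB, CS, GIDP, and so on, with different coefficients.

```python
def base_runs(ab, h, tb, hr, bb):
    """Basic BaseRuns estimate: A * B / (B + C) + D."""
    a = h + bb - hr                                       # baserunners other than home runs
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement factor
    c = ab - h                                            # batting outs
    d = hr                                                # home runs always score
    if b + c == 0:
        return float(d)
    return a * b / (b + c) + d


# Hypothetical team line, for illustration only.
print(round(base_runs(ab=5500, h=1450, tb=2300, hr=150, bb=550), 1))
```

    One appeal of this structure is that it behaves sensibly at the extremes: a line made up entirely of home runs yields exactly one run per homer, and a line with no baserunners yields zero.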

  14. #14
    Join Date
    Apr 2006
    Location
    body in Chicago, heart in Cleveland
    Posts
    215
    Given the high correlation values across the board (and the statistical and conceptual collinearity), has anyone ever done a stepwise regression to start isolating which of the factors actually matter?
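    Stepwise regression repeatedly adds or drops predictors depending on how much each one improves a least-squares fit, so its core step is an ordinary multiple regression. The Python sketch below is only that core step, not a full stepwise procedure: it fits runs = b0 + b1*OBP + b2*SLG using the normal equations and no external libraries. The data are synthetic, built from known coefficients so the fit can be checked. Because OBP and SLG are themselves correlated (.806 in the table), coefficients fitted to real team data can be unstable.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are (nearly) collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from X'X b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)


# Synthetic data generated from known coefficients (NOT real team data).
obp = [0.300, 0.320, 0.340, 0.310, 0.330, 0.350]
slg = [0.400, 0.450, 0.410, 0.440, 0.380, 0.470]
runs = [-800 + 2800 * o + 1600 * s for o, s in zip(obp, slg)]

b0, b1, b2 = ols([obp, slg], runs)
print(round(b0, 1), round(b1, 1), round(b2, 1), "OBP/SLG weight ratio:", round(b1 / b2, 2))
```

    With real team data, the ratio b1 / b2 is one way to ask how heavily OBP should be weighted relative to SLG, the same question behind the 1.8 * OBP + SLG column.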

  15. Preventing Runs and correlation

    This might be worthy of a new thread, but what is the correlation between some pitching stats and ERA (e.g., SO, BB, GB/FB/LD/IFFB%, HBP, etc.)?

  16. #16
    pizzacutter, you should read more about Linear Weights.
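    Linear Weights credits each event with an average run value and adds them up. A minimal Python sketch; the run values below are illustrative assumptions, roughly in the range of published linear-weights tables, and real tables differ by era and by author. In Pete Palmer-style Batting Runs, the out value is set so that a league-average hitter totals about zero, which is why the out weight here is negative.

```python
# Illustrative run values per event. These are assumptions, not any one
# published table; real values vary by era and source.
RUN_VALUES = {
    "single": 0.47,
    "double": 0.78,
    "triple": 1.09,
    "hr": 1.40,
    "bb": 0.33,
    "hbp": 0.33,
    "out": -0.25,  # batting outs (AB - H); assumed value
}


def linear_weights_runs(counts, run_values=RUN_VALUES):
    """Sum each event count times its run value (KeyError for unknown events)."""
    return sum(run_values[event] * n for event, n in counts.items())


# Hypothetical season line, for illustration only.
season = {"single": 110, "double": 30, "triple": 3, "hr": 25,
          "bb": 70, "hbp": 5, "out": 380}
print(round(linear_weights_runs(season), 1))
```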

  17. Quote Originally Posted by Ubiquitous
    [Full correlation table from post #2 quoted here]
    What is the correlation between runs and EQA, and between runs and VORP?

  18. #18
    They are all the same:

    http://www.baseballprospectus.com/ar...articleid=2596

    Choose the smarter measure, or the easiest measure. Don't choose the one that gets you .001 more of r.

  19. Quote Originally Posted by Tango Tiger
    They are all the same:

    http://www.baseballprospectus.com/ar...articleid=2596

    Choose the smarter measure, or the easiest measure. Don't choose the one that gets you .001 more of r.
    So if you compare .928 to .962, EQA is not the most accurate?

  20. #20
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    EQA doesn't correlate to run scoring because it doesn't model a league's actual run scoring. It models an imaginary league in which (theoretically) a .260 EQA is league average and a .280 EQA hitter in one league is just as productive as a .280 EQA hitter in another. It doesn't model the hitter's actual runs created; it's what he would do in a league like the all-time average league.

  21. Quote Originally Posted by SABR Matt
    EQA doesn't correlate to run scoring because it doesn't model a league's actual run scoring. It models an imaginary league in which (theoretically) a .260 EQA is league average and a .280 EQA hitter in one league is just as productive as a .280 EQA hitter in another. It doesn't model the hitter's actual runs created; it's what he would do in a league like the all-time average league.
    Okay. Am I reading this correctly when I say it can't be compared to the other stats mentioned in the table above?

  22. #22
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    I believe so yes.

  23. Quote Originally Posted by SABR Matt
    I believe so yes.
    So, in other words, we can't figure out how accurate it is :rolleyes

  24. #24
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    Not by comparing it directly to run scoring. You can compare EQA to something more like a relative run-scoring rate, but not to the other runs-created estimators.
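    One possible reading of a "relative run-scoring rate" (an assumption; the post doesn't define it) is a team's runs per out divided by its league's runs per out, so that 1.0 means league average, the same kind of league-relative scale EQA is built on. A minimal Python sketch with a hypothetical function name and made-up numbers:

```python
def relative_run_rate(team_runs, team_outs, league_runs, league_outs):
    """Team runs per out divided by league runs per out (1.0 = league average)."""
    if team_outs <= 0 or league_outs <= 0 or league_runs <= 0:
        raise ValueError("outs and league runs must be positive")
    return (team_runs / team_outs) / (league_runs / league_outs)


# Hypothetical: 800 runs in 4,300 outs, in a 14-team league averaging
# 720 runs per 4,300 outs.
print(round(relative_run_rate(800, 4300, 720 * 14, 4300 * 14), 3))
```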

  25. Quote Originally Posted by SABR Matt
    Not by comparing it directly to run scoring. You can compare EQA to something more like a relative run-scoring rate, but not to the other runs-created estimators.
    So is it the most accurate of all the statistics?
