#1
Correlation Between Stats and Runs, etc.
What are the best stats in terms of their correlation to winning %, runs scored, or runs allowed? I.e., what batting stat most closely correlates with runs scored? What pitching stat most closely correlates with K rate, BB rate, HR rate, BF, etc.? What fielding stat (preferably individual) most closely correlates with balls in play converted to outs, and thus runs allowed? I hope you understand where I'm going with this. Thanks in advance.
#2
Hopefully this will show
Code:
              TB     AVG     OBP     SLG     SEC     OPS    SLOB   AV+SE   A*SEC  OBP3+S OP1.8+S AV+SE/2     Eff      RC OBP*1.8     ISO ISO+OBP       R
TB         1.000
AVG        0.838   1.000
OBP        0.805   0.855   1.000
SLG        0.991   0.811   0.806   1.000
SEC        0.812   0.534   0.790   0.848   1.000
OPS        0.974   0.862   0.906   0.981   0.867   1.000
SLOB       0.969   0.862   0.915   0.975   0.868   0.999   1.000
AV+SE      0.913   0.747   0.901   0.933   0.961   0.964   0.965   1.000
A*SEC      0.908   0.732   0.894   0.929   0.966   0.959   0.962   0.999   1.000
OBP3+S     0.928   0.879   0.965   0.933   0.856   0.985   0.988   0.961   0.955   1.000
OP1.8+S    0.954   0.874   0.940   0.960   0.865   0.996   0.997   0.966   0.960   0.997   1.000
AV+SE/2    0.940   0.851   0.935   0.948   0.899   0.986   0.987   0.985   0.980   0.988   0.991   1.000
Eff        0.966   0.771   0.865   0.982   0.919   0.987   0.985   0.975   0.973   0.961   0.977   0.971   1.000
RC         0.977   0.885   0.907   0.970   0.837   0.992   0.993   0.948   0.944   0.981   0.990   0.979   0.970   1.000
OBP*1.8    0.805   0.855   1.000   0.806   0.790   0.906   0.915   0.901   0.894   0.965   0.940   0.935   0.865   0.907   1.000
ISO        0.920   0.580   0.653   0.947   0.888   0.893   0.884   0.888   0.892   0.816   0.856   0.853   0.944   0.864   0.653   1.000
ISO+OBP    0.958   0.752   0.863   0.976   0.930   0.983   0.981   0.978   0.977   0.957   0.973   0.968   0.999   0.964   0.863   0.946   1.000
R          0.929   0.834   0.904   0.926   0.853   0.960   0.962   0.944   0.940   0.960   0.964   0.963   0.944   0.962   0.904   0.831   0.941   1.000
#3
What is SLOB? Thanks, by the way.
#5
Ub, *great* presentation.
And, interestingly, 1.8 * OBP + SLG has the highest correlation to runs scored (.964, vs. .960 for OPS). But, as we can see, even plain old OPS works fine.
__________________
Author of THE BOOK -- Playing The Percentages In Baseball
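The ranking in the runs (R) row can be checked directly. The sketch below copies the R-row values from the correlation table in post #2 into a dictionary and sorts them; the variable and function names are only illustrative.

```python
# Correlation of each stat with team runs scored (R), copied from the
# R row of the correlation table in post #2.
R_ROW = {
    "TB": 0.929, "AVG": 0.834, "OBP": 0.904, "SLG": 0.926,
    "SEC": 0.853, "OPS": 0.960, "SLOB": 0.962, "AV+SE": 0.944,
    "A*SEC": 0.940, "OBP3+S": 0.960, "OP1.8+S": 0.964,
    "AV+SE/2": 0.963, "Eff": 0.944, "RC": 0.962, "OBP*1.8": 0.904,
    "ISO": 0.831, "ISO+OBP": 0.941,
}


def rank_by_correlation(row):
    """Return (stat, r) pairs sorted from highest to lowest r."""
    return sorted(row.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for stat, r in rank_by_correlation(R_ROW):
        print(f"{stat:8s} {r:.3f}")
```

Sorting puts OP1.8+S (.964) first, with AV+SE/2 (.963), SLOB and RC (.962) right behind and OPS at .960, so the whole top group sits within .004 of r.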
#6
How do you make these charts? And can you make one for pitching statistics?
#7
Thanks a lot Ubi! I was surprised to see that many of the extremely advanced measures weren't much better than plain OPS, and that SLG actually correlated better than OBP. That's contrary to what I thought was true. Thanks!
#8
Quote:
Lahman database, Excel, and the code tag for posting here. I looked into this a long time ago, I think in 2004 or so, and had saved it. I never looked into pitching, but I would guess the same approach would apply to pitching if the data were available: teams with the lowest OBP and SLG allowed would allow the fewest runs.
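The same kind of calculation can be done in Python instead of Excel. A minimal sketch follows, assuming team-season rows with Lahman-style column names (R, AB, H, 2B, 3B, HR, BB, HBP, SF). The column names, the helper names, and the choice to treat blank or "NA" values as zero are assumptions for illustration; older seasons may not record HBP or SF at all.

```python
import csv


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def num(row, key):
    """Numeric field from a CSV row; blank, missing or 'NA' count as 0 (assumption)."""
    try:
        return float(row.get(key) or 0)
    except ValueError:
        return 0.0


def team_rates(row):
    """OBP, SLG, OPS and 1.8*OBP + SLG for one team-season row."""
    ab, h, bb = num(row, "AB"), num(row, "H"), num(row, "BB")
    hbp, sf = num(row, "HBP"), num(row, "SF")
    doubles, triples, hr = num(row, "2B"), num(row, "3B"), num(row, "HR")
    singles = h - doubles - triples - hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    return {"OBP": obp, "SLG": slg, "OPS": obp + slg, "OP1.8+S": 1.8 * obp + slg}


def correlations_with_runs(rows):
    """Pearson r between each rate stat and runs scored (R) across team-seasons."""
    rates = [team_rates(row) for row in rows]
    runs = [num(row, "R") for row in rows]
    return {stat: pearson([t[stat] for t in rates], runs) for stat in rates[0]}


def load_rows(path):
    """Read team-season rows from a CSV file with a header line."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, `correlations_with_runs(load_rows("Teams.csv"))` would correlate each stat with R across every row in the file. The file name is an assumption about where a Lahman export was saved, and restricting the rows to seasons where HBP and SF are actually recorded is probably wise.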
#9
Quote:
SLG correlating higher than OBP surprises most people, but if you think about it, it makes sense. SLG partly contains an element of OBP, plus it tells you what kind of hits they were, whereas OBP simply treats everything like a single and only counts hits and walks (plus HBP and so on). OPS correlating so highly is one of the reasons OPS got pushed and became popular: it is extremely simple to find and, for the amount of work needed, extremely accurate.
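The point that OBP counts every time on base the same while SLG weighs the kind of hit shows up with two made-up stat lines that have identical OBP. The formulas are the standard ones, OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = TB / AB; the class name and both hitters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """A simple batting line; all counts are plain integers."""
    ab: int
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int = 0
    sf: int = 0

    @property
    def hits(self):
        return self.singles + self.doubles + self.triples + self.hr

    @property
    def total_bases(self):
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    def obp(self):
        # Every hit, walk and HBP counts the same, whatever kind of hit it was.
        return (self.hits + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self):
        # Hits are weighted by bases, so the type of hit matters.
        return self.total_bases / self.ab


# Two hypothetical hitters with the same number of hits and walks.
slap_hitter = BattingLine(ab=500, singles=140, doubles=10, triples=0, hr=0, bb=50)
power_hitter = BattingLine(ab=500, singles=80, doubles=35, triples=5, hr=30, bb=50)
```

Both lines produce an OBP of .364, but SLG separates them (.320 vs. .570), which is the extra information SLG carries.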
#10
Even if SLG correlates higher than OBP, it's 1.8 OBP + SLG that correlates the highest, not OBP + SLG.
Also note that correlation is often misunderstood. If every team is built so that its OBP falls around the .320 to .340 mark, the regression won't be able to approach 1.00 as well as it would for a stat like HR, where HR are clustered on certain teams and team totals are spread widely. K/PA (or K/BFP) has a high year-to-year correlation because the spread of the true distribution is so wide.
__________________
Author of THE BOOK -- Playing The Percentages In Baseball
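The restriction-of-range point can be illustrated with simulated data: the same underlying relationship between x and y gives a much lower r when only a narrow band of x values is observed. The numbers are synthetic (uniform x, normal noise, fixed seed) and are not meant to resemble real team OBP.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=2000, seed=1):
    """y = x + noise, with x spread uniformly over a wide range."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


xs, ys = simulate()
r_full = pearson(xs, ys)

# Keep only points whose x falls in a narrow band, as if every team
# had been built to land in the same small range.
narrow = [(x, y) for x, y in zip(xs, ys) if 4.0 <= x <= 6.0]
r_narrow = pearson([x for x, _ in narrow], [y for _, y in narrow])
```

Nothing about the relationship changes between the two calculations; only the spread of x does, which is why a stat whose values are bunched together across teams can look weaker than it is.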
#11
Your point about clustering is definitely true, Tango. When I was doing all of my work through correlations, I noticed that if you had a distribution with clusters of data points at radically different places on your X/Y scatter, you'd get a line connecting the clusters with a very high correlation that meant nothing at all (because within any cluster the line tells you nothing).
Gotta watch out when you have huge ranges of possible outcomes: the range itself can determine your correlation.
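The clustering trap can be reproduced the same way: two clouds of points with no relationship inside either cloud still produce a large overall r, because the fitted line simply connects the two cluster centers. The data and helper names are synthetic and illustrative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def cluster(center, n, rng):
    """n points around (center, center) with independent x and y noise."""
    return [(center + rng.gauss(0.0, 1.0), center + rng.gauss(0.0, 1.0)) for _ in range(n)]


def r_of(points):
    return pearson([p[0] for p in points], [p[1] for p in points])


rng = random.Random(7)
low = cluster(0.0, 500, rng)
high = cluster(10.0, 500, rng)

r_all = r_of(low + high)  # large, driven by the gap between clusters
r_low = r_of(low)         # near zero: no relationship within a cluster
r_high = r_of(high)       # near zero as well
```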
#12
Quote:
#13
What makes sense is Linear Weights and BaseRuns. Everything else doesn't. The Lahman database is a free historical baseball database that you can download at http://www.baseball1.com, or you can see its implementation at http://www.baseball-reference.com.
__________________
Author of THE BOOK -- Playing The Percentages In Baseball
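BaseRuns in its basic, widely published form estimates runs as A * B / (B + C) + D, where A is baserunners other than home runs, B is an advancement factor, C is outs and D is home runs. A minimal sketch of that simple version follows. The coefficients are the commonly quoted ones for the basic formula; fuller versions add terms (HBP, SB, CS, GIDP, IBB) and re-tune the constants, so treat this as an illustration rather than a definitive implementation.

```python
def base_runs(ab, h, tb, hr, bb):
    """Basic BaseRuns estimate: A * B / (B + C) + D."""
    a = h + bb - hr                                      # baserunners, HR excluded
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02  # advancement factor
    c = ab - h                                           # outs
    d = hr                                               # runs that score automatically
    return a * b / (b + c) + d
```

Two sanity checks fall out of the structure: a line made up only of home runs returns exactly one run per homer (A is zero and D counts them), and a line of nothing but outs returns zero.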
#14
Given the high correlation values across the board (and the statistical and conceptual collinearity), has anyone ever done a stepwise regression to start isolating which of the factors actually matter?
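One way to attack the collinearity question is forward stepwise selection: start with no predictors, add whichever remaining stat raises R² the most, and stop when the gain falls below a threshold. Below is a plain-Python sketch of that procedure using ordinary least squares via the normal equations. The threshold, the function names and the synthetic demo data are assumptions; with stats as collinear as OBP, SLG and OPS, the selected set can change with small changes in the data, and exact combinations (OPS alongside both OBP and SLG) make the normal equations singular, so they should be left out.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(predictors, y, min_gain=0.01):
    """Greedily add the predictor that most increases R^2.

    predictors maps a stat name to its list of values. Returns the
    (name, R^2 after adding it) pairs in the order they were chosen.
    """
    chosen, path, current = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {
            name: r_squared([predictors[c] for c in chosen] + [col], y)
            for name, col in remaining.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:
            break
        chosen.append(best)
        current = scores[best]
        path.append((best, current))
        del remaining[best]
    return path


def synthetic_demo(n=200, seed=3):
    """Made-up data: y depends strongly on x0, weakly on x2, not on x1."""
    rng = random.Random(seed)
    cols = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x0", "x1", "x2")}
    y = [3 * a + c + rng.gauss(0, 0.5) for a, c in zip(cols["x0"], cols["x2"])]
    return cols, y
```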
#15
Preventing Runs and correlation
This might be worthy of a new thread, but what is the correlation between some pitching stats and ERA (i.e., SO, BB, GB/FB/LD/IFFB%, HBP, etc.)?
__________________
2009 World Series Champions, The New York Yankees
#16
pizzacutter, you should read more about Linear Weights.
__________________
Author of THE BOOK -- Playing The Percentages In Baseball
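Linear weights turn each event into a run value and add them up, which works just as well for the events a pitcher allows. The weights below are illustrative values of roughly the size commonly published for runs relative to average; they are assumptions, real weights shift with the league and era, and strikeouts are simply counted as ordinary outs here.

```python
# Illustrative run values per event, relative to an average team.
# These numbers are assumptions; real weights depend on league and era.
WEIGHTS = {
    "1B": 0.46,
    "2B": 0.80,
    "3B": 1.02,
    "HR": 1.40,
    "BB": 0.33,
    "HBP": 0.33,
    "OUT": -0.25,
}


def linear_weights_runs(events, weights=WEIGHTS):
    """Sum count * run value over every event type in `events`.

    For a pitcher, `events` holds what he allowed, so a negative total
    means fewer runs than an average pitcher would be expected to allow
    over the same batters (under these assumed weights).
    """
    return sum(count * weights[event] for event, count in events.items())
```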
#17
Quote:
__________________
2009 World Series Champions, The New York Yankees
#18
They are all the same:
http://www.baseballprospectus.com/ar...articleid=2596
Choose the smarter measure, or the easiest measure. Don't choose the one that gets you .001 more of r.
__________________
Author of THE BOOK -- Playing The Percentages In Baseball
#19
Quote:
__________________
2009 World Series Champions, The New York Yankees
#20
EQA doesn't correlate directly with run scoring because it is not modeling the actual run scoring of a league; it is modeling an imaginary league in which (theoretically) .260 is the league-average EQA and a .280 EQA hitter in one league is just as productive as a .280 EQA hitter in another. It doesn't model the actual runs created by the hitter; it is what he would do in a league like the all-time average league.
#21
Quote:
__________________
2009 World Series Champions, The New York Yankees
#22
I believe so, yes.
#23
Quote:
__________________
2009 World Series Champions, The New York Yankees
#24
Not by comparing it directly to run scoring. You can compare EQA to something more like relative RS rate, but not to the other RC estimators.
#25
Quote:
__________________
2009 World Series Champions, The New York Yankees