Baseball Fever  

  #1  
Old 08-10-2006, 03:51 PM
Mr. Red Mr. Red is offline
Registered User
 
Join Date: Mar 2006
Location: Kentucky
Posts: 112
Correlation Between Stats and Runs, etc.

What are the best stats, in terms of their correlation to winning %, runs scored, or runs allowed? I.e., what batting stat most closely correlates with runs scored? What pitching stat most closely correlates with K rate, BB rate, HR rate, BF, etc.? What fielding stat (preferably individual) most closely correlates with balls in play converted to outs and thus runs allowed? I hope you understand where I'm going with this. Thanks in advance.
  #2  
Old 08-10-2006, 11:36 PM
Ubiquitous Ubiquitous is offline
stats moderator
 
Join Date: Aug 2005
Posts: 10,272
Blog Entries: 2
Hopefully this will show
Code:
	TB	AVG	OBP	SLG	SEC	OPS	SLOB	AV+SE	A*SEC	OBP3+S	OP1.8+S	AV+SE/2	Eff	RC	OBP*1.8	ISO	ISO+OBP	R
TB	1.000																	
AVG	0.838	1.000																
OBP	0.805	0.855	1.000															
SLG	0.991	0.811	0.806	1.000														
SEC	0.812	0.534	0.790	0.848	1.000													
OPS	0.974	0.862	0.906	0.981	0.867	1.000												
SLOB	0.969	0.862	0.915	0.975	0.868	0.999	1.000											
AV+SE	0.913	0.747	0.901	0.933	0.961	0.964	0.965	1.000										
A*SEC	0.908	0.732	0.894	0.929	0.966	0.959	0.962	0.999	1.000									
OBP3+S	0.928	0.879	0.965	0.933	0.856	0.985	0.988	0.961	0.955	1.000								
OP1.8+S	0.954	0.874	0.940	0.960	0.865	0.996	0.997	0.966	0.960	0.997	1.000							
AV+SE/2	0.940	0.851	0.935	0.948	0.899	0.986	0.987	0.985	0.980	0.988	0.991	1.000						
Eff	0.966	0.771	0.865	0.982	0.919	0.987	0.985	0.975	0.973	0.961	0.977	0.971	1.000					
RC	0.977	0.885	0.907	0.970	0.837	0.992	0.993	0.948	0.944	0.981	0.990	0.979	0.970	1.000				
OBP*1.8	0.805	0.855	1.000	0.806	0.790	0.906	0.915	0.901	0.894	0.965	0.940	0.935	0.865	0.907	1.000			
ISO	0.920	0.580	0.653	0.947	0.888	0.893	0.884	0.888	0.892	0.816	0.856	0.853	0.944	0.864	0.653	1.000		
ISO+OBP	0.958	0.752	0.863	0.976	0.930	0.983	0.981	0.978	0.977	0.957	0.973	0.968	0.999	0.964	0.863	0.946	1.000	
R	0.929	0.834	0.904	0.926	0.853	0.960	0.962	0.944	0.940	0.960	0.964	0.963	0.944	0.962	0.904	0.831	0.941	1.000
Basically what you want to look at is the bottom line, which is the correlation of each stat to runs. This, by the way, was for all teams from 1962 to 2003 with strike-shortened seasons removed. If a full schedule wasn't scheduled or wasn't played in a given year, that season was removed.
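
For anyone who wants to reproduce a row like the bottom one, here is a minimal Python sketch of the calculation: a Pearson correlation between each stat column and runs. The table above was built in Excel, so this is not the method actually used; the function names (`pearson_r`, `correlations_with`) and the five team-season rows are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlations_with(target, columns):
    """Return {stat name: r of that stat's column against the target column}."""
    return {name: pearson_r(values, target) for name, values in columns.items()}


# Invented team-season values, for illustration only (not real data).
team_stats = {
    "OBP": [0.310, 0.325, 0.340, 0.318, 0.352],
    "SLG": [0.380, 0.410, 0.445, 0.395, 0.460],
}
runs = [640, 710, 820, 675, 880]

# One entry per stat, analogous to the "R" row of the table.
bottom_row = correlations_with(runs, team_stats)
for name, r in bottom_row.items():
    print(f"{name}\t{r:.3f}")
```

With real data, each list would hold one value per team-season, all in the same order as `runs`.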
  #3  
Old 08-10-2006, 11:54 PM
Mr. Red Mr. Red is offline
Registered User
 
Join Date: Mar 2006
Location: Kentucky
Posts: 112
What is SLOB? Thanks by the way.
  #4  
Old 08-10-2006, 11:57 PM
Ubiquitous Ubiquitous is offline
stats moderator
 
Join Date: Aug 2005
Posts: 10,272
Blog Entries: 2
SLOB is slugging percentage times on-base percentage (SLG × OBP). SLOB is basically runs created.
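
A small sketch of why SLOB tracks runs created so closely. Assuming a simplified OBP of (H + BB) / (AB + BB), with HBP and sacrifice flies ignored, SLOB multiplied by at-bats is algebraically the same as Bill James's basic Runs Created, (H + BB) × TB / (AB + BB). The function names and the team totals below are hypothetical.

```python
def simple_obp(h, bb, ab):
    """On-base percentage ignoring HBP and sacrifice flies (a simplification)."""
    return (h + bb) / (ab + bb)


def slg(tb, ab):
    """Slugging percentage: total bases per at-bat."""
    return tb / ab


def slob(h, bb, ab, tb):
    """SLOB: slugging percentage times on-base percentage."""
    return simple_obp(h, bb, ab) * slg(tb, ab)


def basic_runs_created(h, bb, ab, tb):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical team-season totals, not taken from any real team.
h, bb, ab, tb = 1450, 550, 5500, 2300

team_slob = slob(h, bb, ab, tb)
team_rc = basic_runs_created(h, bb, ab, tb)

print(f"SLOB      = {team_slob:.4f}")
print(f"SLOB * AB = {team_slob * ab:.1f}")
print(f"basic RC  = {team_rc:.1f}")
```

Because at-bats vary little from one full-season team to the next, SLOB and basic RC end up ranking teams almost identically under these assumptions.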
  #5  
Old 08-11-2006, 08:03 AM
Tango Tiger Tango Tiger is offline
Registered User
 
Join Date: Mar 2006
Posts: 2,015
Ub, *great* presentation.

And, interestingly, 1.8 * OBP + SLG has the highest correlation to runs scored. But, as we can see, even the plain old OPS works fine.
  #6  
Old 08-11-2006, 08:11 AM
Mr. Red Mr. Red is offline
Registered User
 
Join Date: Mar 2006
Location: Kentucky
Posts: 112
How do you make these charts? And can you make one for pitching statistics?
  #7  
Old 08-11-2006, 09:34 AM
538280 538280 is offline
Prophet of Rage
 
Join Date: May 2005
Location: Western Massachusetts
Posts: 11,576
Thanks a lot Ubi! I was surprised to see that many of the extremely advanced measures weren't much better than plain OPS, and that SLG actually correlated better than OBP. That's contrary to what I thought was true. Thanks!
  #8  
Old 08-11-2006, 09:38 AM
Ubiquitous Ubiquitous is offline
stats moderator
 
Join Date: Aug 2005
Posts: 10,272
Blog Entries: 2
Quote:
Originally Posted by Mr. Red
How do you make these charts? And can you make one for pitching statistics?

Lahman database, Excel, and the code tag here.

I looked into this a long time ago, I think in 2004 or so, and had saved it. I never looked into pitching, but I would guess the same would apply to pitching if the data were available: teams with the lowest OBP and SLG allowed would allow the fewest runs.
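
A rough Python equivalent of the spreadsheet step, as a sketch only. It assumes the Lahman Teams table has been exported to CSV with the column names `yearID`, `teamID`, `G`, `R`, `AB`, `H`, `2B`, `3B`, `HR`, and `BB`; check these against your copy. The `min_games=160` cutoff is an assumed stand-in for "played a full schedule", not the exact rule used for the table, and the function name and three sample rows are invented.

```python
import csv
import io


def load_team_rows(csv_file, first_year=1962, last_year=2003, min_games=160):
    """Read team-season rows and derive OBP, SLG and OPS for full seasons.

    OBP here is the simplified (H + BB) / (AB + BB); HBP and SF are ignored.
    """
    rows = []
    for rec in csv.DictReader(csv_file):
        year = int(rec["yearID"])
        if not (first_year <= year <= last_year):
            continue  # outside the period being studied
        if int(rec["G"]) < min_games:
            continue  # shortened season (assumed cutoff)
        ab, h, bb = int(rec["AB"]), int(rec["H"]), int(rec["BB"])
        doubles, triples, hr = int(rec["2B"]), int(rec["3B"]), int(rec["HR"])
        # Each hit is worth one base, plus the extra bases on 2B, 3B and HR.
        tb = h + doubles + 2 * triples + 3 * hr
        obp = (h + bb) / (ab + bb)
        slg = tb / ab
        rows.append({
            "year": year,
            "team": rec["teamID"],
            "R": int(rec["R"]),
            "OBP": obp,
            "SLG": slg,
            "OPS": obp + slg,
        })
    return rows


# Invented rows standing in for an exported Teams file.
sample = io.StringIO(
    "yearID,teamID,G,R,AB,H,2B,3B,HR,BB\n"
    "1994,AAA,115,500,3900,1050,200,20,110,400\n"
    "1998,BBB,162,800,5600,1500,290,30,170,580\n"
    "1960,CCC,154,700,5300,1400,230,40,130,500\n"
)
team_rows = load_team_rows(sample)
for row in team_rows:
    print(row["year"], row["team"], f"{row['OBP']:.3f}", f"{row['SLG']:.3f}")
```

To run it on a real export, pass an open file instead of `sample`, for example `open("Teams.csv", newline="")`, where the file name is whatever your download uses.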
  #9  
Old 08-11-2006, 09:41 AM
Ubiquitous Ubiquitous is offline
stats moderator
 
Join Date: Aug 2005
Posts: 10,272
Blog Entries: 2
Quote:
Originally Posted by 538280
Thanks a lot Ubi! I was surprised to see that many of the extremely advanced measures weren't much better than plain OPS, and that SLG actually correlated better than OBP. That's contrary to what I thought was true. Thanks!

SLG correlating higher than OBP surprises most, but if you think about it, it makes sense. SLG in part has an element of OBP in it, plus it tells you what kind of hits they were. Whereas OBP simply treats everything like a single and only counts hits and walks (plus HBP and so on).

OPS correlating so high is one of the reasons why OPS got pushed and became popular. It is extremely simple to find and, for the amount of work needed to obtain it, extremely accurate.
  #10  
Old 08-11-2006, 09:59 AM
Tango Tiger Tango Tiger is offline
Registered User
 
Join Date: Mar 2006
Posts: 2,015
Even if SLG correlates higher than OBP, it's 1.8 OBP + SLG that correlates the highest, not OBP + SLG.

Also note that there is a lack of understanding of correlation. If all the teams are built so that the team OBP is all around the .320 to .340 mark, the regression won't be able to approach 1.00 as well as it would for teams that are built where HR are clustered.

The reason that K/PA or K/BFP has a high year-to-year correlation is because the true distribution is so high.
  #11  
Old 08-11-2006, 10:04 AM
SABR Matt SABR Matt is offline
Hunter of Objective Truth
 
Join Date: May 2005
Location: Where all students live...nowhere.
Posts: 8,710
Your point about clustering is definitely true, Tango. I noticed when I was doing all of my work through correlations that if you had a distribution where there were clusters of data points at radically different places on your X/Y scatter, you'd get a line connecting your clusters with a very high correlation that meant nothing at all (because within any cluster the line tells you nothing).

Gotta watch out when you have huge ranges of possible outcomes...the range itself can determine your correlation.
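
The clustering effect is easy to reproduce with toy numbers. In this Python sketch (invented data, hypothetical names), two clusters sit far apart on the scatter. Inside each cluster X and Y are uncorrelated by construction, yet the pooled correlation comes out very high because the gap between the clusters dominates.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Cluster A: low X, low Y. The Y pattern is chosen so that r is zero.
cluster_a_x = [1, 2, 3, 4, 5]
cluster_a_y = [3, 1, 4, 1, 3]

# Cluster B: the same shape shifted 20 units up and to the right.
cluster_b_x = [x + 20 for x in cluster_a_x]
cluster_b_y = [y + 20 for y in cluster_a_y]

r_within_a = pearson_r(cluster_a_x, cluster_a_y)
r_within_b = pearson_r(cluster_b_x, cluster_b_y)
r_pooled = pearson_r(cluster_a_x + cluster_b_x, cluster_a_y + cluster_b_y)

print(f"within cluster A: {r_within_a:.3f}")
print(f"within cluster B: {r_within_b:.3f}")
print(f"both pooled:      {r_pooled:.3f}")
```

The pooled figure says only that cluster B sits above and to the right of cluster A; it says nothing about how Y moves with X for points inside either cluster.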
  #12  
Old 08-11-2006, 01:34 PM
Mr. Red Mr. Red is offline
Registered User
 
Join Date: Mar 2006
Location: Kentucky
Posts: 112
Quote:
Originally Posted by Ubiquitous
SLG correlating higher than OBP surprises most, but if you think about it, it makes sense. SLG in part has an element of OBP in it, plus it tells you what kind of hits they were. Whereas OBP simply treats everything like a single and only counts hits and walks (plus HBP and so on).
Wouldn't it make the most sense to make a stat that was total bases (including BB, etc.) per plate appearance? And what is Lahman's database?
  #13  
Old 08-11-2006, 01:50 PM
Tango Tiger Tango Tiger is offline
Registered User
 
Join Date: Mar 2006
Posts: 2,015
What makes sense is Linear Weights and BaseRuns. Everything else doesn't. The Lahman database is a free historical baseball database that you can download at http://www.baseball1.com, or you can see its implementation at http://www.baseball-reference.com.
  #14  
Old 08-14-2006, 09:43 PM
pizzacutter pizzacutter is offline
Registered User
 
Join Date: Apr 2006
Location: body in Chicago, heart in Cleveland
Posts: 215
Given the high correlation values across the board (and the statistical and conceptual collinearity), has anyone ever done a stepwise regression to start isolating which of the factors need to be isolated?
__________________
Statistically Speaking

The plural of anecdote is not data.
  #15  
Old 08-15-2006, 06:37 AM
Mariano_Rivera Mariano_Rivera is offline
Joba Rules
 
Join Date: Mar 2006
Posts: 5,836
Preventing Runs and correlation

This might be worthy of a new thread, but what is the correlation between some pitching stats and ERA (i.e., SO, BB, GB/FB/LD/IFFB%, HBP, etc.)?
  #16  
Old 08-15-2006, 08:31 AM
Tango Tiger Tango Tiger is offline
Registered User
 
Join Date: Mar 2006
Posts: 2,015
pizzacutter, you should read more about Linear Weights.
  #17  
Old 09-15-2006, 03:04 PM
Mariano_Rivera Mariano_Rivera is offline
Joba Rules
 
Join Date: Mar 2006
Posts: 5,836
What is the correlation between runs and EQA, and between runs and VORP?
  #18  
Old 09-15-2006, 03:11 PM
Tango Tiger Tango Tiger is offline
Registered User
 
Join Date: Mar 2006
Posts: 2,015
They are all the same:

http://www.baseballprospectus.com/ar...articleid=2596

Choose the smarter measure, or the easiest measure. Don't choose the one that gets you .001 more of r.
  #19  
Old 09-15-2006, 05:06 PM
Mariano_Rivera Mariano_Rivera is offline
Joba Rules
 
Join Date: Mar 2006
Posts: 5,836
Quote:
Originally Posted by Tango Tiger
They are all the same:

http://www.baseballprospectus.com/ar...articleid=2596

Choose the smarter measure, or the easiest measure. Don't choose the one that gets you .001 more of r.
So if you compare .928 to .962, EQA is not the most accurate?
  #20  
Old 09-15-2006, 05:12 PM
SABR Matt SABR Matt is offline
Hunter of Objective Truth
 
Join Date: May 2005
Location: Where all students live...nowhere.
Posts: 8,710
EQA doesn't correlate to run scoring because it is not modeling the actual run scoring of a league...it is modeling an imaginary league in which (theoretically) a .260 EQA is the league average and a .280 EQA hitter in one league is just as productive as a .280 EQA hitter in another. It doesn't model the actual runs created by the hitter...it's what he would do in a league that is like the all-time average league.
  #21  
Old 09-15-2006, 05:44 PM
Mariano_Rivera Mariano_Rivera is offline
Joba Rules
 
Join Date: Mar 2006
Posts: 5,836
Quote:
Originally Posted by SABR Matt
EQA doesn't correlate to run scoring because it is not modeling the actual run scoring of a league...it is modeling an imaginary league in which (theoretically) a .260 EQA is the league average and a .280 EQA hitter in one league is just as productive as a .280 EQA hitter in another. It doesn't model the actual runs created by the hitter...it's what he would do in a league that is like the all-time average league.
Okay. Am I reading this correctly when I say it can't be compared to the other stats mentioned in the table above?
  #22  
Old 09-15-2006, 06:07 PM
SABR Matt SABR Matt is offline
Hunter of Objective Truth
 
Join Date: May 2005
Location: Where all students live...nowhere.
Posts: 8,710
I believe so yes.
  #23  
Old 09-15-2006, 06:13 PM
Mariano_Rivera Mariano_Rivera is offline
Joba Rules
 
Join Date: Mar 2006
Posts: 5,836
Quote:
Originally Posted by SABR Matt
I believe so yes.
So in other words, we can't figure out how accurate it is :rolleyes
  #24  
Old 09-15-2006, 07:25 PM
SABR Matt SABR Matt is offline
Hunter of Objective Truth
 
Join Date: May 2005
Location: Where all students live...nowhere.
Posts: 8,710
Not by comparing it directly to run scoring. You can compare EQA to something more like relative RS Rate...but not to the other RC estimators.
  #25  
Old 09-15-2006, 07:41 PM
Mariano_Rivera Mariano_Rivera is offline
Joba Rules
 
Join Date: Mar 2006
Posts: 5,836
Quote:
Originally Posted by SABR Matt
Not by comparing it directly to run scoring. You can compare EQA to something more like relative RS Rate...but not to the other RC estimators.
So is it the most accurate of all the statistics?