BA and OPS

  • dominik
    replied
    Originally posted by Ubiquitous View Post
    Linear weights is based on the data so you really shouldn't use linear weight numbers for say a .450/.550 environment for a .225/.325 environment.

    Baseruns, by the way, is supposed to be able to handle any environment.
    I would say even in a very wrong environment linear weights are still a lot more precise than the SLG coefficients. the value of a double might change from .7 to .75 but it never has twice the value of a single.

    I think it is pretty safe to say that if SLG is equal the guy with the higher BA and lower ISO will produce more. Of course a double is better than a single but a double is never worth 2 singles (to achieve the same SLG with singles as with doubles you need twice as many singles).

    So ISO does compensate for BA but not all the way.

    the really great hitters of course had great BA and great ISO.


  • Jackaroo Dave
    replied
    Originally posted by drstrangelove View Post
    Interesting. More to read...
    I ran a linear regression on Runs, using as predictors BB, 1B, 2B, 3B, HR, SB, and CS, to see how the coefficients would compare with linear weights. I used MLB teams from 1997 to 2006, as that's the data I have.

    R = -521 + .352*BB + .609*1B + .713*2B + 1.14*3B + 1.48*HR + .129*SB - .163*CS.

    Linear weights: .70*BB, .90*1B, 1.25*2B, 1.6*3B, 2.00*HR, .25*SB, -.50*CS

    (Counterintuitively, you can just ignore the -521.) (The linear weights come from Matt Klaassen at beyondtheboxscore.com. Google wOBA, linear weights, and a bunch of years, and you'll get a long list of linear weights and conversion coefficients, and a good explanation.)

    The regression Rsquare value was 91.4%, meaning that 91.4% of the variation in runs could be "accounted for" or "explained by" variation in the 7 predictor variables. Except for CS, all the variables were significant, and except for SB, the significant variables had p <= 0.00. The residuals (differences between predicted and actual runs scored) were roughly normal in distribution, and only a few cases were tagged as anomalous.

    I think the Rsquare is rather low, considering how much information is included in the predictor variables, but otherwise it looked Ok to me.

    Why the coefficients are so different from the linear weights, I cannot explain, only note that they have two different interpretations.

    In regression, a coefficient of, say, .713 for a double means that if you hold all the other variables constant, an increase of one double means an increase of .713 runs, on average. In linear weights, a weight of 1.25 means that given the run expectancy for situation A, the run expectation for situation B = situation A followed by a double, will be 1.25 runs higher than that of situation A.
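To make the regression reading concrete, here is a rough Python sketch that plugs a team line into the fitted equation from this post. The coefficients are the ones quoted above; the team totals and the name `predicted_runs` are made up for illustration and do not come from the 1997-2006 data.

```python
# Coefficients from the regression quoted in this post (MLB teams, 1997-2006).
INTERCEPT = -521.0
COEF = {"BB": 0.352, "1B": 0.609, "2B": 0.713, "3B": 1.14,
        "HR": 1.48, "SB": 0.129, "CS": -0.163}


def predicted_runs(team):
    """Apply the fitted linear equation to a dict of team season totals."""
    return INTERCEPT + sum(COEF[k] * team[k] for k in COEF)


# Hypothetical team season totals (illustrative only, not real data).
team = {"BB": 550, "1B": 1000, "2B": 290, "3B": 30,
        "HR": 170, "SB": 100, "CS": 40}

base = predicted_runs(team)

# Add one double while holding every other count constant.
one_more_double = {**team, "2B": team["2B"] + 1}
delta = predicted_runs(one_more_double) - base

print(f"predicted runs: {base:.1f}")
print(f"effect of one extra double: {delta:.3f}")
```

For this made-up team the equation gives roughly 780 runs, and the extra double adds the .713 coefficient. The intercept cancels out of that difference, which is one way to read the remark that the -521 can be ignored when comparing coefficients.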

    I wish I could do better. I'm working on it and appreciate any help.


  • drstrangelove
    replied
    Originally posted by Jackaroo Dave View Post
    Linear weights are based on differences in average run expectancies in different situations, based on play-by-play data.

    Runs created in its original form was based on its pretty durn good predictive power and its simplicity, as far as I know.

    I don't THINK either is based on multiple linear regression. That doesn't speak to your main point, however.

    (apologies for the cuts in your post)
    Interesting. More to read...
    Last edited by drstrangelove; 02-21-2012, 06:04 PM.


  • Jackaroo Dave
    replied
    Originally posted by drstrangelove View Post
    I know you've thought about this and probably studied this extensively, so I don't mean to disagree, just to disagree . . .
    As I understand them, they [LW, RC] are both multiple regression models, and such models are notoriously inaccurate if you try to apply them to data outside the base range IF the regression line was created using a limited range of data.

    Does this not apply to both?
    Linear weights are based on differences in average run expectancies in different situations, based on play-by-play data.

    Runs created in its original form was based on its pretty durn good predictive power and its simplicity, as far as I know.

    I don't THINK either is based on multiple linear regression. That doesn't speak to your main point, however.

    (apologies for the cuts in your post)


  • dominik
    replied
    Originally posted by Joltin' Joe View Post
    I can't agree that it takes the guesswork out of it. WAR and Win Shares are good stats but they are not absolute and it is one person's guess work of how valuable something is. It is not a cut and dry stat like Usain Bolt's 100 meter time.





    Yes I agree that BA is the least important of the three but my point was that when OBPs and SAs are that close, the significant delta in the BA does come into play as a "tie breaker".
    I would say that BA does get enough credit. all the superstar hitters hit for high average:

    -ruth
    -williams
    -gehrig
    -pujols
    -bonds
    -manny

    they all hit well above .300 in their good years. out of the 10 highest OPS+ guys only 2 do not have a .300 career average (bonds at .299 and mantle at .298). most are over .330. the prototypical inner circle HOFer is not a patient pure slugger. It is a .330 hitter that slugs for power and also walks quite a lot.

    that high walk, high homer, low BA guy like jim thome usually doesn't get a lot of credit.


    for comparison I will post wOBA for 2 quite similar seasons of mantle and hornsby (by OPS).

    hornsby 1927: 1.035 OPS, .361 BA, .448 OBP, .586 SLG
    mantle 1958: 1.035 OPS, .304 BA, .443 OBP, .592 SLG


    of course mantle's ISO is higher (.288 vs. .225) and hornsby's BA is a lot higher.

    wOBA

    Mantle: .452
    Hornsby: .471

    So it seems like the higher BA is worth slightly more than the higher ISO. this makes sense of course since XBH are better than singles but not by as much as SLG might make you think.
    SLG says that a single is worth only 25% of a HR and 50% of a double. linear weights however say that a single is worth 70% of a double and about 40% or so of a HR.

    that means the higher ISO does equal out the higher BA somewhat but not quite. the higher BA guy will produce slightly more runs.
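The comparison can be checked with a few lines of Python. This is only a sketch: the linear weights are the set quoted elsewhere in this thread (.90 for a single, 1.25 for a double, 2.00 for a HR), other sources and run environments give somewhat different numbers, and the helper name `single_relative_to` is made up for illustration.

```python
# Total-base weights that SLG implicitly assigns to each hit type.
slg_weights = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}

# Linear weights quoted elsewhere in this thread; treat them as one example
# set, since the exact values depend on the source and the run environment.
linear_weights = {"1B": 0.90, "2B": 1.25, "3B": 1.60, "HR": 2.00}


def single_relative_to(weights, event):
    """Value of a single as a fraction of the value of `event`."""
    return weights["1B"] / weights[event]


for event in ("2B", "HR"):
    slg_share = single_relative_to(slg_weights, event)
    lw_share = single_relative_to(linear_weights, event)
    print(f"single vs {event}: SLG {slg_share:.0%}, linear weights {lw_share:.0%}")
```

With these weights a single comes out at about 72% of a double and 45% of a HR, against 50% and 25% under SLG.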
    Last edited by dominik; 02-21-2012, 04:20 PM.


  • drstrangelove
    replied
    Originally posted by Ubiquitous View Post
    Linear weights is based on the data so you really shouldn't use linear weight numbers for say a .450/.550 environment for a .225/.325 environment.

    Baseruns, by the way, is supposed to be able to handle any environment.
    Ok, so I need to study baseruns and become more familiar with that. Cool...more to learn is good!

    Thx


  • Ubiquitous
    replied
    Linear weights is based on the data so you really shouldn't use linear weight numbers for say a .450/.550 environment for a .225/.325 environment.

    Baseruns, by the way, is supposed to be able to handle any environment.


  • drstrangelove
    replied
    I know you've thought about this and probably studied this extensively, so I don't mean to disagree, just to disagree.

    Is it not true that both RC and LW suffer from the same problem of trying to fit data that 'normally' occurs in a somewhat narrow range, while then extrapolating the findings to data that falls firmly outside the base range? RC, luckily or unluckily, estimates runs scored quite well. LW accomplishes the same.

    I don't find it odd that the results are contradicted by the two methods once one leaves the cozy confines of the base data and tries to apply it to virgin raw information that looks nothing like the base. Neither method is really based upon those data ranges.

    If this is completely wrong, then I'm sure you can explain why, but it's concerning to me. As I understand them, they are both multiple regression models, and such models are notoriously inaccurate if you try to apply them to data outside the base range IF the regression line was created using a limited range of data.

    Does this not apply to both?


  • Ubiquitous
    replied
    It may be the best tool if you desire the nut to be everywhere but beyond that not really.

    I doubt there is any metric out there or ever will be that is free of error but runs created is a fluke.


  • leewileyfan
    replied
    Originally posted by Ubiquitous View Post
    RC is an extremely simple and coincidental metric which shouldn't really be used as anything other than a blunt instrument.
    I am quite honestly NOT being cynical or snide when I ask this: Sometimes, isn't the blunt instrument the best tool to use when trying to crack open a tough nut?

    I have studied and worked with LWTS, BSR and looked into sabermetric glossaries for alternative methods to approach batter run creation. After all is said and done, one I stumbled upon by trial and error suggests that adding Total Bases + Bases on Balls, then multiplying by BA gives a pretty darn good estimate of Runs Created. I imagine it's a matter of how much exactitude one wants in his numbers; but then, I believe it's fair to ask, just how sure one is of his precise conclusions once he has reached them?
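As a quick illustration of that shortcut, here is a small Python sketch that puts (TB + BB) * BA next to the basic Runs Created formula, (H + BB) * TB / (AB + BB). The thread does not say which version of RC is meant, so the basic version is an assumption; the batting line and the function names are also made up for illustration.

```python
def tb_bb_times_ba(ab, hits, total_bases, walks):
    """Shortcut described in the post: (TB + BB) * BA."""
    return (total_bases + walks) * (hits / ab)


def basic_runs_created(ab, hits, total_bases, walks):
    """Basic Runs Created: (H + BB) * TB / (AB + BB). Assumed version."""
    return (hits + walks) * total_bases / (ab + walks)


# Hypothetical batting line (illustrative only, not a real player).
line = dict(ab=600, hits=180, total_bases=300, walks=60)

shortcut = tb_bb_times_ba(**line)
rc = basic_runs_created(**line)
print(f"(TB + BB) * BA: {shortcut:.1f}")
print(f"basic RC:       {rc:.1f}")
```

For this one line the two land within about a run of each other (108 against roughly 109), though a single made-up example says nothing about how they compare across real players.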

    Is there, or has there ever been a study, sabermetrically regressed and vetted, that has taken an acceptable team/league/decade/historic formula that has been applied and found EXACT in conforming from player level to actual team results without any margin of error?


  • Ubiquitous
    replied
    Originally posted by drstrangelove View Post

    Same data, but with RC compiled. In the RC formula, higher BA is better. I haven't studied this, so I don't have an opinion which is right, but it will make for some interesting time for me at least over some coffee. (Maybe a couple pots of coffee!)

    I don't want to debate which is right since both methods have experienced and intelligent advocates. It's interesting though.
    RC is an extremely simple and coincidental metric which shouldn't really be used as anything other than a blunt instrument.


  • leewileyfan
    replied
    The model is a self-fulfilling prophecy. The guy with the highest BA is presumed to have the same OB% and SLG as the guy batting around .190. One can create an arithmetic model to suit any point under debate.

    In the context of recent remarks posted here, I tried to construct a reasonable pair of batters with very comparable stats OTHER than their respective BAs, which, in themselves are fairly well apart but not so extreme as to forbid reasonable comparisons.

    Player "Visitor" posts these numbers:

    AB 600
    H 162
    BA .270
    HR 30
    3B 0
    2B 18
    1B 114
    BB 73
    TB 270

    Player "Home" puts up these numbers:

    AB 600
    Hits 197
    BA .328
    HR 12
    3B 4
    2B 29
    1B 152
    BB 37
    TB 270

    The hour grows late, so I penalized Home a single. There is nothing in the figures that would indicate DPs batted into by either player. One has 48 extra base hits; the other 45. The big disparity would seem to be HRs; but then the question is legitimately raised WHEN and under what circumstances game/conditions each of those 18 big hits were belted. In a full season, there's enough random chance in PA where 18 "bombs" may not be all that telling, especially when the trailer is collecting 35 more hits on his side of the ledger.
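For anyone who wants the rate stats behind the two lines, here is a short Python sketch that derives them from the counts listed above. The OBP is a simplification, since hit-by-pitch and sacrifice flies are not given and are ignored here; the class name `BattingLine` is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int

    @property
    def hits(self):
        return self.singles + self.doubles + self.triples + self.hr

    @property
    def total_bases(self):
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def ba(self):
        return self.hits / self.ab

    @property
    def obp(self):
        # Simplified: HBP and sacrifice flies are not in the post, so they are ignored.
        return (self.hits + self.bb) / (self.ab + self.bb)

    @property
    def slg(self):
        return self.total_bases / self.ab


# Counts copied from the two lines in this post.
visitor = BattingLine(ab=600, singles=114, doubles=18, triples=0, hr=30, bb=73)
home = BattingLine(ab=600, singles=152, doubles=29, triples=4, hr=12, bb=37)

for name, p in (("Visitor", visitor), ("Home", home)):
    print(f"{name}: H={p.hits} TB={p.total_bases} "
          f"BA={p.ba:.3f} OBP={p.obp:.3f} SLG={p.slg:.3f}")
```

Under that simplification both players slug .450, while OBP comes out near .349 for Visitor and .367 for Home.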
    Last edited by leewileyfan; 03-14-2012, 09:57 PM.


  • drstrangelove
    replied
    Code:
    PA	AB	H	2B	3B	HR	AVG	BB	SO	TB	OBP	SLG	LH	Ex/LH	OPS	Outs	RC
    660	640	200	30	4	8.1	.313	20		262	.333	.410	42	62	.743	440	91
    660	620	180	30	4	12.1	.290	40		254	.333	.410	46	74	.743	440	89
    660	600	160	30	4	16	.267	60		246	.333	.410	50	86	.743	440	87
    660	580	140	30	4	19.9	.241	80		238	.333	.410	54	98	.743	440	85
    660	560	120	30	4	23.9	.214	100		230	.333	.410	58	110	.744	440	83
    660	540	100	30	4	27.8	.185	120		221	.333	.410	62	121	.743	440	82
    Same data, but with RC compiled. In the RC formula, higher BA is better. I haven't studied this, so I don't have an opinion which is right, but it will make for some interesting time for me at least over some coffee. (Maybe a couple pots of coffee!)

    I don't want to debate which is right since both methods have experienced and intelligent advocates. It's interesting though.
    Last edited by drstrangelove; 02-20-2012, 08:19 PM.


  • Ubiquitous
    replied
    Tangotiger on BA/OBP/SLG

    The following table presents 6 players with the same OBA, same SLG, but widely differing batting averages.

    Essentially, as the walks and HR go up, I decrease the hits. In this way, I force the OBA and SLG to match, while varying the batting average.

    We see that not considering the batting average in your OPS metric will have an effect of +/- 2 runs. We again see that Linear Weights, as expected, is an almost perfect match. BaseRuns comes to within 1 run of the true value.
    Code:
    PA 	AB 	H 	2B 	3B 	HR 	BB 	Outs 	AVG 	OBA 	SLG 	OPS 	BsR 	BsR/440 	BsR+/- 	LWTS 	Team diff
    660 	640 	200 	30 	4 	8.1 	20 	440 	0.313 	0.333 	0.41 	0.743 	74.6 	74.6 	-1 	-1.8 	-1.7
    660 	620 	180 	30 	4 	12.1 	40 	440 	0.29 	0.333 	0.41 	0.743 	75.1 	75.1 	-0.5 	-0.9 	-0.8
    660 	600 	160 	30 	4 	16 	60 	440 	0.267 	0.333 	0.41 	0.743 	75.6 	75.6 	0 	0 	0
    660 	580 	140 	30 	4 	19.9 	80 	440 	0.241 	0.333 	0.41 	0.743 	76.1 	76.1 	0.5 	0.9 	0.9
    660 	560 	120 	30 	4 	23.9 	100 	440 	0.214 	0.333 	0.41 	0.743 	76.6 	76.6 	1.1 	1.8 	1.8
    660 	540 	100 	30 	4 	27.8 	120 	440 	0.185 	0.333 	0.41 	0.743 	77.2 	77.2 	1.6 	2.6 	2.7
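The way the table is built can be sketched in Python: fix PA, doubles and triples, trade hits for walks so that H + BB stays at 220, then solve for the HR total that keeps SLG at .410. This is a reconstruction from the numbers in the table, not the original author's code, and the names `build_row` and `ON_BASE` are made up for illustration.

```python
PA = 660
ON_BASE = 220            # H + BB is held constant, so OBA = 220 / 660 = .333
TARGET_SLG = 0.41
DOUBLES, TRIPLES = 30, 4


def build_row(walks):
    """Return the batting line that keeps OBA and SLG fixed for `walks`."""
    ab = PA - walks
    hits = ON_BASE - walks
    # TB = H + 2B + 2*3B + 3*HR must equal SLG * AB; solve for HR.
    hr = (TARGET_SLG * ab - hits - DOUBLES - 2 * TRIPLES) / 3
    return {"AB": ab, "H": hits, "BB": walks,
            "HR": round(hr, 1), "AVG": hits / ab}


rows = [build_row(bb) for bb in range(20, 121, 20)]
for r in rows:
    print(f"AB={r['AB']} H={r['H']} BB={r['BB']} HR={r['HR']} AVG={r['AVG']:.3f}")
```

The HR values this produces line up with the HR column in the table, which is why the home run totals are fractional.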
    As for the batting average thing, I suppose that's another myth. It's pretty clear that given two guys with the same OBA and SLG, you want the guy with the LOWER BA (though in reality, we're not talking about much difference).

    I just tried with a weird environment (OBA/SLG of .393/.493), and in this case, the higher the BA, the more runs scored. I then tried the other way, with .289/.351, and this time the LOWER the BA, the more runs scored.

    The "break-even" point seems to be about .360/.450. That is, at that level, the change in batting average (and I checked from .200 to .340) made zero change to the run production of the team.

    RC has its own problems, magnified substantially when the HR/H or HR/PA becomes out of whack. RC does not model run scoring at all: it just got lucky that it looks like it models it. If you've got a computer, there's zero reason to use RC, when you've got BsR (unless you want to propose a model that's better).

    I don't really care about the different denominators. The whole thing of OPS centers around: more good, less bad. The more walks, the more hits, the more TB, the less outs, the better the number. There's nothing inherent in OPS that ensures that the balance is proper. It's just plain old luck that for the run environment of MLB, that it works out that way.

    Believe me, if the run environment was half what it is today, or double what it is, there'd be some other "quick" estimator that would get lucky to model run creation.


  • leewileyfan
    replied
    Originally posted by SavoyBG View Post
    Just looked at it, and it's not really clear which player is better. But what is clear is that there's no strong evidence that we should automatically prefer the higher batting average when both players are roughly equal in OBP and SLG%. Usually the higher batting average guy will use up more outs along the way, including hitting into more DPs, since he puts the ball in play more often and would be less of a flyball hitter than the Evans or Thome type.

    I checked, and Madlock hit into DPs more often than Evans, including one season where Madlock led the league with 25 GIDPs.
    A couple of questions being begged there. Completing an arc that ties high batting averages to a tendency to hit into more DP's is one. Between the ground ball and fly ball scenarios, there are grass-cutters and line drives aplenty. Then too, there are high average hitters who achieve that by GDP avoidance.

    We can paint idealized pictures to color debating points; but if one is inclined to hold others' feet to the task of statistical support, the idealized generalizations must stand up to similar scrutiny.
