Thread: BPro's VORP vs. RARP

  1. BPro's VORP vs. RARP

    Anyone have a good understanding of the difference between Baseball Prospectus' VORP and RARP (runs above replacement position -- found in their EqA report)? Here's the glossary explanation:

    "Runs Above Replacement, Position-adjusted. A statistic that compares a hitter's Equivalent Run total to that of a replacement-level player who makes the same number of outs and plays the same position. A "replacement level" player is one who has .736 times as many EqR as the average for the position; that corresponds to a .351 winning percentage. Used when fielding data is unavailable."

    Is that replacement level higher than VORP's? The RARP numbers for the top players seem to run about 20 runs lower than their VORP.

    RARP includes as many aspects of offense as VORP, right? It's EqR.
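    The glossary definition above amounts to a short calculation. Below is a minimal Python sketch of it, assuming the replacement player's EqR is scaled from the positional average on a per-out basis (the glossary says the replacement player "makes the same number of outs"). The `rarp` helper and all sample numbers are hypothetical, and BP's actual procedure (including any league translation) is not published.

```python
# Minimal sketch of RARP as the BP glossary describes it (hypothetical helper).
# Assumption: the replacement player's EqR rate is 73.6% of the positional
# average EqR per out, applied to the hitter's own outs.

REPLACEMENT_FRACTION = 0.736  # from the glossary definition


def rarp(eqr: float, outs: float, pos_eqr: float, pos_outs: float) -> float:
    """Runs above a replacement player at the same position making the same outs."""
    pos_rate = pos_eqr / pos_outs                      # positional average EqR per out
    replacement_eqr = REPLACEMENT_FRACTION * pos_rate * outs
    return eqr - replacement_eqr


# Hypothetical shortstop: 100 EqR in 400 outs; shortstops average 0.15 EqR per out.
print(round(rarp(100.0, 400.0, pos_eqr=1500.0, pos_outs=10000.0), 1))  # 55.8
```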

  2. VORP has a replacement level of something like .270, I seem to recall... Tango or someone else can correct me if I'm wrong. I am almost certain that RARP uses a higher replacement level than VORP.

  3. Since BP doesn't tell you exactly what they are doing, it is hard to say exactly what the discrepancy is, but VORP uses a higher replacement level. According to the "Baseball Between the Numbers" glossary, VORP uses 75% of the positional average for 1B/DH, and 80% for all other positions except catcher (85%) and pitcher (100%).

    The difference probably comes from the run estimators. RARP uses EQR as Sky said; EQR is essentially a linear weights formula. VORP uses MLV, which is Runs Created-based.

    EDIT: While what I said above is still true and a factor, it appears as if the biggest difference is that EQA/EQR are translated into their ideal league. I don't know that for sure, but if you total up all the EQR for the league, you get 21620, while the leagues have actually scored a combined 23161 runs, a 7% discrepancy. I don't have a problem personally with the "translated" stats, so long as they clearly indicate that is what they are doing, and so people don't judge them in the wrong context. They don't clearly indicate much of anything over there.
    Last edited by Patriot; 09-30-2007 at 02:49 PM.
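    The percentages cited above from the "Baseball Between the Numbers" glossary can be expressed as a lookup table. This Python sketch applies them to a positional average for the same playing time; the helper names, the idea of feeding in a single "positional average runs" figure, and the sample numbers are assumptions for illustration, since BP does not publish its VORP procedure.

```python
# VORP replacement fractions as cited from the "Baseball Between the Numbers"
# glossary. The helpers below are hypothetical illustrations, not BP's code.

VORP_FRACTION = {"C": 0.85, "1B": 0.75, "DH": 0.75, "P": 1.00}
DEFAULT_FRACTION = 0.80  # all other positions


def replacement_runs(position: str, pos_avg_runs: float) -> float:
    """Runs a replacement player would produce, given what an average player
    at the position produces in the same playing time."""
    return VORP_FRACTION.get(position, DEFAULT_FRACTION) * pos_avg_runs


def runs_above_replacement(position: str, player_runs: float,
                           pos_avg_runs: float) -> float:
    """Player's runs minus the position-specific replacement baseline."""
    return player_runs - replacement_runs(position, pos_avg_runs)


# Same hypothetical hitter (90 runs) against a 70-run positional average,
# only to show the effect of the fractions (real averages differ by position).
for pos in ("C", "1B", "SS", "P"):
    print(pos, round(runs_above_replacement(pos, 90.0, 70.0), 1))
```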

  4. The ideal league is a sort of average of the whole modern baseball era... which is why there are more runs in the real world now than in the ideal league (this is an offense-heavy time for baseball). I still don't like BP's use of this ideal, nonexistent league.

  5. Quoting Patriot's post 3 above:
    Haha, so while VORP uses a higher replacement level (mostly 80% versus 73%), RARP numbers turn out lower due to neutralizing the run environment? Awesome. Just awesome.

    Can somebody please create their own VORP stat and make it publicly available? Don't make me do it. No, really, don't make me do it.
    Last edited by skyking162; 10-01-2007 at 07:52 AM. Reason: typo

  6. Quote, originally posted by skyking162:
    Can somebody please create their own VORP stat and make it publicly available? Don't make me do it. No, really, don't make me do it.
    I always post spreadsheets on my website that include runs above a "replacement hitter at the position", which is basically what VORP is. I use 73% and a ten-year positional average to set the baseline. It usually takes me about a week after the end of the season to get around to it, though.

    I just wish BP would be more transparent in what they do. They don't have to give their formulas, even--just explain, clearly, what each method does and what the units mean, etc. If people like Sky can't tell the difference between RARP and VORP easily, then what about the average person? They're going to be completely confused.
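    Patriot's baseline (73% of a ten-year positional average, with Runs Created park-adjusted first, as he explains in post 8) can be sketched as follows. This is a hedged Python illustration, not his spreadsheet: dividing by a run park factor and measuring the baseline per out are assumptions, and every name and number is hypothetical.

```python
# Hypothetical sketch of "runs above a replacement hitter at the position".
# Assumptions: RC is park-adjusted by dividing by a run park factor, and the
# replacement baseline is 73% of the long-term positional runs per out.

REPLACEMENT_FRACTION = 0.73


def rar_at_position(rc: float, outs: float, park_factor: float,
                    pos_runs_per_out: float,
                    fraction: float = REPLACEMENT_FRACTION) -> float:
    """Runs above a replacement hitter at the same position.

    rc               -- the player's Runs Created (ERP, for example), unadjusted
    park_factor      -- run park factor, 1.00 = neutral
    pos_runs_per_out -- ten-year average runs per out at the player's position
    """
    rc_neutral = rc / park_factor                  # park-adjust before comparing
    replacement = fraction * pos_runs_per_out * outs
    return rc_neutral - replacement


# Hypothetical: 110 RC, 420 outs, PF 1.05, position averages 0.17 runs per out.
print(round(rar_at_position(110.0, 420.0, 1.05, 0.17), 1))
```

    The `fraction` argument plays the role of the "*.73" in the spreadsheet, so trying a higher baseline such as 0.80 is a one-argument change.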

    BPro also has an excess of stats running around, and they're not consistent. WARP should be VORP plus fielding and other stuff, but it's not. So now they have SuperVORP, but it's not publicly available. EqA, MLVr, and other stuff like that just get in the way of each other. It's like presenting HRs, 2Bs, 3Bs, XBHs, SLG, OPS, and ISO to make a case about power. Just use the best stat. If you need different stats for different situations, that's fine, but the stats shouldn't overlap if the situations don't. And I can understand continuing to calculate something even after it's been replaced by a newer, better model, but have a separate section for stuff like that.

    Patriot, do you include any park factors in your RAR(P)? How did you arrive at 73%? I always heard BPro's 80% replacement level was too low...

  8. Quote, originally posted by skyking162:
    Patriot, do you include any park factors in your RAR(P)? How did you arrive at 73%? I always heard BPro's 80% replacement level was too low...
    Sure, I park adjust Runs Created (actually ERP) before plugging it in. I use 73% not because I have any particular preference for it, but because it seems (or at least seemed to be when I started doing it) a standard value that a lot of people used. The Big Bad Baseball Annual always used a .350 OW%, which is equivalent to 73%; Bill James used .350 in 1984 before he stopped doing mathematical rankings; Clay D. as mentioned above uses 73%.

    Personally, I do think that a higher baseline is probably better...80% corresponds to a .390 OW%. Tango, unless I am mistaken, uses .380 for position players, which is ~78%. Of course, if anyone doesn't like 73%, they can always just go into the spreadsheet and change "*.73" to "*.8" or whatever value they prefer (although I'm not sure if you can do this in a google spreadsheet, which is how I'm going to do everything this year).
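    The percentage-to-OW% pairings in this post (73% ≈ .350, 80% ≈ .390, .380 ≈ 78%) line up with a Pythagorean record using an exponent of 2, which also turns the glossary's 73.6% into .351. The Python sketch below assumes that exponent; it is a check on the arithmetic, not a claim about how any of the cited sources compute it.

```python
# Convert between a hitter's run rate (as a fraction of average) and an
# offensive winning percentage. Assumption: Pythagorean exponent of 2.


def ow_pct(ratio: float, exponent: float = 2.0) -> float:
    """OW% for a hitter who creates `ratio` times the average run rate."""
    return ratio ** exponent / (ratio ** exponent + 1.0)


def ratio_for_ow_pct(wpct: float, exponent: float = 2.0) -> float:
    """Inverse of ow_pct: run-rate fraction that yields the given OW%."""
    return (wpct / (1.0 - wpct)) ** (1.0 / exponent)


for r in (0.73, 0.736, 0.78, 0.80):
    print(f"{r:.1%} of average -> OW% {ow_pct(r):.3f}")
print(f"OW% .380 -> {ratio_for_ow_pct(0.380):.1%} of average")
```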

  9. Thanks, Patriot, looking forward to your numbers, although I might try to tackle it myself.

    Two more questions, somewhat related:

    1. Is there any theoretical difference between translating a player's line to a neutral setting and then comparing to replacement level, versus comparing a player's raw line to a dynamic runs-per-win ratio that depends on the ballpark? There shouldn't be, right? I find a lot of people resisting park effects for MVP awards because they think PFs introduce a "what-if" component. But in the most basic park adjustment, you're really just saying how valuable a certain line was IN ITS CONTEXT, without any "what-ifs". Right?

    2. USSMariner has a year-end look at player value, with steps 1 and 2 today. Dave follows these steps:

    Step 1: compare all players' hitting to average (linear weights)
    Step 2: compare all player positions to average position (via rough #s from Tango)
    Step 3: compare all players' fielding to positional average

    That's pretty solid if you want to use league-average as your baseline. If you want to convert to replacement level, where do you do it?

  10. Right, I use around .380 for nonpitchers and starters, and .470 for relievers. That combination will give you a team win% of .300, which I think is the level that has the most basis in reality, and seems to be commonly accepted.

    I could live with a level as low as .250 (which would imply levels of .350 for nonpitchers and starters, and .440 for relievers).

    ***

    A .380 level implies a 74% RC level at the INDIVIDUAL player level, while a .350 level implies a 67% RC level.

    This is how I calculate it for the .380 level. Team-level .380 hitting implies 8.65 nonpitchers per team (excluding pitchers, including DH and PH) at .486 each. That is, each is -.014 wins per game relative to .500; times 8.65, that gives -.12 wins compared to .500, or .380 overall. A team that scores 4.85 runs and allows 5.00 will win .486 of its games, according to PythagenPat.

    So, the baseline player is 0.15 runs per game below average, or 24 runs per 162 games below average. At 5 runs per game and 9 hitters per game, the average hitter creates 5*162/9 = 90 runs. 24 below that is 66, and 66/90 = 73%. If you use 8.65 hitters instead, you get 74%.
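    Tango's steps can be reproduced in a few lines. In this Python sketch the 4.85-versus-5.00 gap is taken from the post rather than solved for, and the PythagenPat exponent formula (runs per game raised to 0.287) is an assumed, commonly quoted version; with it, 4.85 versus 5.00 comes out near .485, within rounding of the post's .486.

```python
# Reproduces the arithmetic for the .380 replacement level.
# Assumption: PythagenPat exponent = (RS + RA per game) ** 0.287.


def pythagenpat(rs: float, ra: float, z: float = 0.287) -> float:
    """Expected winning percentage from runs scored/allowed per game."""
    x = (rs + ra) ** z
    return rs ** x / (rs ** x + ra ** x)


TEAM_LEVEL = 0.380       # replacement level for a team's hitters
HITTERS_PER_TEAM = 8.65  # nonpitchers per team (incl. DH and PH)

# Wins per game each replacement hitter costs vs. an average hitter (~.014).
per_hitter_deficit = (0.500 - TEAM_LEVEL) / HITTERS_PER_TEAM
print(f"each hitter: {0.500 - per_hitter_deficit:.3f}, "
      f"4.85 vs 5.00: {pythagenpat(4.85, 5.00):.3f}")


def replacement_fraction(hitters_per_game: float, runs_gap: float = 0.15,
                         rpg: float = 5.0, games: int = 162) -> float:
    """Replacement runs as a fraction of an average hitter's runs."""
    avg_hitter_runs = rpg * games / hitters_per_game   # 90 runs with 9 hitters
    gap_per_season = runs_gap * games                  # ~24 runs
    return (avg_hitter_runs - gap_per_season) / avg_hitter_runs


print(f"9 hitters: {replacement_fraction(9):.0%}, "
      f"8.65 hitters: {replacement_fraction(8.65):.0%}")
```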

    Anything at the 70-75% level is fine with me.

    However, I wouldn't do ERP, unless you redistribute the excess/missing outs. I prefer LWTS plus lgRuns/lgPA * PA to get the Runs Created figure.

  11. On Sky's point #1, there's really no difference, particularly if you use RPW = RPG as your run/win converter. If you use a different r/w converter, there will be a small discrepancy, but not a whole lot.

    Suppose you have a player who creates .3 runs/out and makes 400 outs against a league average of .2 in a park with a 1.2 PF, in a league where there are 10 total runs per game. Method A is to park adjust the player's contribution, and say that he "would have created" .3/1.2 = .25 runs/out in a neutral park. So his runs above average is now (.25-.2)*400 = +20. At 10 r/w, that makes his WAA +2.

    The other approach would be to adjust the league average, and so in this park the league will score .2*1.2 = .24 r/o. Now the player's RAA is (.3-.24)*400 = +24. But the r/w in this park is 1.2*10 = 12, and so his WAA is 24/12 = +2.
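    The two methods above can be checked side by side. In this Python sketch (function names hypothetical), Method A deflates the player's rate by the park factor, while Method B inflates both the league rate and runs per win by it. Algebraically the PF cancels, so the two agree for any inputs as long as runs per win scale with the park.

```python
def waa_neutralize_player(rate: float, outs: float, lg_rate: float,
                          pf: float, rpw: float) -> float:
    """Method A: translate the player to a neutral park, then compare."""
    return (rate / pf - lg_rate) * outs / rpw


def waa_in_context(rate: float, outs: float, lg_rate: float,
                   pf: float, rpw: float) -> float:
    """Method B: scale the league rate and runs-per-win up to the park."""
    return (rate - lg_rate * pf) * outs / (rpw * pf)


# Numbers from the example: .3 r/o, 400 outs, .2 league r/o, PF 1.2, 10 r/w.
print(round(waa_neutralize_player(0.3, 400, 0.2, 1.2, 10), 6),
      round(waa_in_context(0.3, 400, 0.2, 1.2, 10), 6))
```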

    Some people in an MVP discussion might want to look at the player's contribution in his actual run context, and that would make a difference. If you have two players of equal run contribution in equal parks, but one plays on a team that scores or allows more runs, his run contribution will create fewer wins for his team than the other player's. Some people might want to look at the actual team construction to determine value, and park factors can mess that up.

    But if you assume that a player's value shouldn't depend on his teammates' contributions, only on the context in which he performs, using a run PF or using a context-based r/w converter will have the same effect.

    On Sky's #2, the easiest thing to do would be to introduce the replacement level on offense. Whether this is the best thing to do is open for debate, but you shouldn't be off by too much.
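    One way to read this suggestion is to compute runs above average with Dave's three steps and then add, on the offensive side only, the gap between an average and a replacement hitter over the player's PA. The Python sketch below assumes a 73% baseline and a league runs-per-PA figure; the function name and all inputs are hypothetical.

```python
def runs_above_replacement(batting_raa: float, positional_adj: float,
                           fielding_raa: float, pa: float,
                           lg_runs_per_pa: float,
                           replacement_fraction: float = 0.73) -> float:
    """Shift an average-based total (steps 1-3) to a replacement baseline."""
    runs_above_average = batting_raa + positional_adj + fielding_raa
    # An average hitter produces lg_runs_per_pa * pa runs in these PA; a
    # replacement hitter produces replacement_fraction of that.
    replacement_gap = (1.0 - replacement_fraction) * lg_runs_per_pa * pa
    return runs_above_average + replacement_gap


# Hypothetical: +10 batting, +5 position, -3 fielding, 600 PA, .12 league R/PA.
print(round(runs_above_replacement(10, 5, -3, 600, 0.12), 2))
```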

  12. Quote, originally posted by Tango Tiger:
    However, I wouldn't do ERP, unless you redistribute the excess/missing outs. I prefer LWTS plus lgRuns/lgPA * PA to get the Runs Created figure.
    I like that figure too, as it's the "RC+". But I assume you are going to then make the comparison to the replacement player on the basis of PA, no? If so, the math works out as equivalent for runs above average and just a bit off in the replacement case (around +/-3 runs for all hitters).

    I of course have no objection to the idea that it would be better to use a custom LW for the season in question than a general method like ERP.

  13. What's the meaning of LWTS + lg(LWTS/PA)*PA? Is LWTS in this case relative to league-average or just total runs created?

  14. Also, Patriot, I was going through your PRAR numbers and had a question about the positional multipliers -- why is DH the same as 1B? Are your numbers totally empirical?

  15. Quote, originally posted by skyking162:
    Also, Patriot, I was going through your PRAR numbers and had a question about the positional multipliers -- why is DH the same as 1B? Are your numbers totally empirical?
    Yep, totally empirical and based on offense. I make a mental adjustment of knocking off another 5-10 runs for a full-time DH, but I have decided to keep the published figure strictly empirical. I also have no qualms with the research of Tango and others showing that center field is a plus defensive position, and thus center fielders may be a bit underrated; the opposite may be true for second base.

    So my figures have all of the problems that go along with offensive positional adjustments. So do those of BP, and more, since they base the comparison on the one-year performance of hitters at each position, while I take a more long-term approach. In some years shortstops hit better than second basemen, for instance, but over the long haul this is never the case (not that it couldn't happen, but it hasn't yet in the history of baseball).

    LWTS + Lg(R/PA)*PA is Runs Created plus the value of avoiding outs. As you pointed out in the Rollins thread, straight RC does not fully incorporate the negative effects of making outs. However, I don't believe that it makes much of a difference when you convert to value against a baseline, because if you take (RC/Out-Lg(R/Out))*Out, it will be equal to (Tango's RC/PA - Lg(R/PA))*PA. Once you include outs in the RC number as Tango does, using outs as the rate denominator is inappropriate.

  16. Co-sign on wanting better explanations. I STILL don't understand Expected Wins, which I believe is supposed to be basic. What exactly is an expected win? What does it mean if your actual wins are higher or lower?

    I like BP, maybe more than some of the more intelligent guys here (that should tell me something), but I agree. PLEASE tell me what I'm looking at.

    I paid 5 bucks to read your PECOTA predictions. The least you could do is a better job of telling me how exactly FRAR works and what the heck it's representing.


    I'm glad EqA is made public, and I do like it. It bothers me that it doesn't seem to represent any tangible percentage of anything (at least not that I understand), but I like its weights and using it to compare, in a rough way, who is the better hitter.

  17. EqA is Linear Weights, but with 5 extra unnecessary steps in the middle. Here:
    http://www.insidethebook.com/ee/inde...o_complicated/

    You can also find it in the future by googling:
    "eqa so darn complicated"

    ***

    Your BP-centric questions can be answered by BP people. They are usually quick to respond. Nice bunch of guys, even if a couple are extremely stubborn. Then again, I'm extremely stubborn, and I'm not sure I'm any nicer.

    ***

    LWTS plus lgRuns/lgPA * PA:

    Remember that LWTS is runs above the average player. That second term simply tells you how many runs an average player created, given the number of PA your player had.

    So, the first term is runs above the average player, and the second term is runs for the average player. Adding the two gives you total runs created.

    It's better than straight RC, because it handles the out better.
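    The formula can be written directly. In this Python sketch the LWTS value is taken as an input (computing it needs a set of linear weights not given here), and the league totals are hypothetical. It also shows why comparing this figure against the league rate on a per-PA basis simply returns LWTS.

```python
def runs_created_plus(lwts: float, pa: float, lg_runs: float, lg_pa: float) -> float:
    """Total runs = runs above average (LWTS) + runs an average hitter makes in PA."""
    avg_hitter_runs = lg_runs / lg_pa * pa
    return lwts + avg_hitter_runs


def runs_above_average_per_pa(rc: float, pa: float, lg_runs: float, lg_pa: float) -> float:
    """(RC/PA - lgR/PA) * PA: the PA-based comparison to an average hitter."""
    return (rc / pa - lg_runs / lg_pa) * pa


# Hypothetical: +25 LWTS in 650 PA, league with 800 runs in 6,400 PA.
rc = runs_created_plus(25.0, 650.0, 800.0, 6400.0)
print(rc, round(runs_above_average_per_pa(rc, 650.0, 800.0, 6400.0), 6))
```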

  18. Is the main point of EqA just to create a stat that looks like a batting average, so that less stat-minded fans can digest it?

  19. Quoting Tango Tiger's explanation of LWTS plus lgRuns/lgPA * PA in post 17 above:
    That makes sense. LWTS is runs compared to average, and the second term converts the level of average from 0 to total runs. Thanks.

  20. Do you have to park-adjust the second part of LWTS + lg(runs/PA)*PA? Later in the value calculation you're going to penalize a player for hitting in Coors, so don't you need to reward him in both parts of this step, or is the LWTS bonus enough? Or maybe the fact that players in hitters' parks rack up extra PAs is the only "extra credit" needed, and it's already built in?

