BA and OPS


  • #31
    Originally posted by drstrangelove View Post
    Interesting. More to read...
    I ran a linear regression on Runs, using as predictors BB, 1B, 2B, 3B, HR, SB, and CS, to see how the coefficients would compare with linear weights. I used MLB teams from 1997 to 2006, as that's the data I have.

    R = -521 + .352*BB + .609*1B + .713*2B + 1.14*3B + 1.48*HR + .129*SB - .163*CS.

    Linear weights: .70*BB, .90*1B, 1.25*2B, 1.6*3B, 2.00*HR, .25*SB, -.50*CS.

    (Counterintuitively, you can just ignore the -521.) (The linear weights come from Mark Klaassen beyondtheboxscore.com. Google wOBA, linear weights, and a bunch of years, and you'll get a long list of linear weights and conversion coefficients, and a good explanation.)

    The regression R-squared was 91.4%, meaning that 91.4% of the variation in runs could be "accounted for" or "explained by" variation in the 7 predictor variables. Every variable except CS was significant, and every significant variable except SB had a reported p-value of 0.00. The residuals (differences between predicted and actual runs scored) were roughly normal in distribution, and only a few cases were flagged as anomalous.

    I think the R-squared is rather low, considering how much information is included in the predictor variables, but otherwise it looked OK to me.

    Why the coefficients are so different from the linear weights, I cannot explain, only note that they have two different interpretations.

    In regression, a coefficient of, say, .713 for a double means that if you hold all the other variables constant, an increase of one double means an increase of .713 runs, on average. In linear weights, a weight of 1.25 means that if situation B is situation A followed by a double, the run expectancy for situation B will be 1.25 runs higher than that of situation A.
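To put the two sets of numbers side by side, here is a minimal Python sketch using only the coefficients quoted above. The team totals are made up for illustration (they are not from the 1997 to 2006 data), and the function names are mine. Since the two sets may not be on the same scale, the sketch compares each event to a single rather than comparing raw totals.

```python
# Coefficients copied from the post: the regression fit and the linear weights.
REGRESSION = {"BB": 0.352, "1B": 0.609, "2B": 0.713, "3B": 1.14,
              "HR": 1.48, "SB": 0.129, "CS": -0.163}
REGRESSION_INTERCEPT = -521
LINEAR_WEIGHTS = {"BB": 0.70, "1B": 0.90, "2B": 1.25, "3B": 1.6,
                  "HR": 2.00, "SB": 0.25, "CS": -0.50}


def weighted_sum(coefs, line, intercept=0.0):
    """Intercept plus coefficient * count for every event in coefs."""
    return intercept + sum(coefs[event] * line[event] for event in coefs)


def relative_to_single(coefs):
    """Express every coefficient as a multiple of the single's coefficient."""
    return {event: round(value / coefs["1B"], 2) for event, value in coefs.items()}


# Hypothetical team season totals (an assumption, not real data).
team = {"BB": 550, "1B": 1000, "2B": 300, "3B": 30, "HR": 170, "SB": 100, "CS": 40}

predicted_runs = weighted_sum(REGRESSION, team, REGRESSION_INTERCEPT)
print(f"Regression prediction for the made-up team: {predicted_runs:.1f} runs")

print("Event values relative to a single:")
reg_rel = relative_to_single(REGRESSION)
lw_rel = relative_to_single(LINEAR_WEIGHTS)
for event in REGRESSION:
    print(f"  {event:>2}: regression {reg_rel[event]:5.2f}   linear weights {lw_rel[event]:5.2f}")
```

On this view a double is worth about 1.17 singles in the regression and about 1.39 singles in the linear weights, so the two disagree less in relative terms than the raw coefficients suggest.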

    I wish I could do better. I'm working on it and appreciate any help.
    Indeed the first step toward finding out is to acknowledge you do not satisfactorily know already; so that no blight can so surely arrest all intellectual growth as the blight of cocksureness.--CS Peirce



    • #32
      Originally posted by Ubiquitous View Post
      Linear weights is based on the data, so you really shouldn't use linear weight numbers from, say, a .450/.550 environment in a .225/.325 environment.

      Baseruns, by the way, is supposed to be able to handle any environment.
      I would say that even in a badly mismatched environment, linear weights are still a lot more precise than the SLG coefficients. The value of a double might change from .7 to .75, but it never has twice the value of a single.

      I think it is pretty safe to say that if SLG is equal, the guy with the higher BA and lower ISO will produce more. Of course a double is better than a single, but a double is never worth 2 singles (to achieve the same SLG with singles as with doubles, you need twice as many singles).

      So ISO does compensate for BA but not all the way.
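The argument above can be sketched with two extreme hitters who have the same SLG. This is only an illustration under assumptions: the two hitters are invented, and the event values are the .90 (single) and 1.25 (double) linear weights quoted earlier in the thread, which may not fit every run environment.

```python
# Event values taken from the linear weights quoted earlier in the thread.
WEIGHT_SINGLE = 0.90
WEIGHT_DOUBLE = 1.25


def batting_average(singles, doubles, at_bats):
    return (singles + doubles) / at_bats


def slugging(singles, doubles, at_bats):
    # Total bases per at-bat: a single counts 1 base, a double counts 2.
    return (singles + 2 * doubles) / at_bats


def weighted_value(singles, doubles):
    return WEIGHT_SINGLE * singles + WEIGHT_DOUBLE * doubles


AT_BATS = 500
# Two hypothetical hitters with identical SLG.
singles_hitter = {"singles": 200, "doubles": 0}   # all BA, no ISO
doubles_hitter = {"singles": 0, "doubles": 100}   # half the hits, all doubles

for name, h in (("singles hitter", singles_hitter), ("doubles hitter", doubles_hitter)):
    ba = batting_average(h["singles"], h["doubles"], AT_BATS)
    slg = slugging(h["singles"], h["doubles"], AT_BATS)
    value = weighted_value(h["singles"], h["doubles"])
    print(f"{name}: BA {ba:.3f}, SLG {slg:.3f}, ISO {slg - ba:.3f}, weighted value {value:.1f}")
```

Both hitters slug .400, but with these weights the singles hitter comes out ahead because each double is credited with less than two singles. The singles hitter also makes 100 fewer outs, which this simple sum does not even count.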

      The really great hitters, of course, had great BA and great ISO.
      I now have my own non commercial blog about training for batspeed and power using my training experience in baseball and track and field.



      • #33
        Not sure that this will add much to the discussion, but shoot me...I like to talk about numbers!

        If you look at baseball and want to 'predict' scoring, you have at least two theoretical options:

        1) find actual baseball results (hopefully a LOT of them and hopefully under the same conditions regarding rules, parks, etc.), run a regression and declare the formula viable. It can and should be tested on similar data that was not part of the set you used, to check its predictive value.

        2) create a statistical model of baseball using actual data. E.g., if a runner is on second, the batter hits a single, the odds are xx% of the time that the runner will score, stop at third, stop at second, the batter will stop at first, make it to second on a throw, get thrown out at second, either runner will advance on an error, etc.

        Obviously, option #2 is insane for modeling a league, but valuable if one wants to model a single situation like: runner on first, no outs, average hitter at the plate and on deck, do I sac bunt or swing away?

        I'm still not up to speed on base runs, but ultimately, all regression models are strong at modeling large amounts of data, but much weaker when one applies them to small amounts of data (e.g., 1 home run) or aberrant data (e.g., a .180 batting average with .550 slugging average). In reading on base runs, it apparently has weaknesses as well, but I can't say if they are due to the same flaws.

        Notwithstanding any of this: models are not reality, and their validity only comes from whether they do what they purport to do, and their value from whether we need them to do that at all.
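A bare-bones version of the second approach, for a single situation, might look like the sketch below. The outcome table and its probabilities are placeholders I invented to show the mechanics; a real model would estimate them from play-by-play data and would also track what happens after the play.

```python
# Hypothetical outcome table for one situation: runner on second, batter singles.
# The probabilities are placeholders, NOT measured frequencies.
OUTCOMES = [
    # (description, probability, runs scored on the play)
    ("runner scores, batter stops at first", 0.55, 1),
    ("runner scores, batter takes second on the throw", 0.10, 1),
    ("runner stops at third, batter stops at first", 0.30, 0),
    ("runner thrown out at home, batter stops at first", 0.05, 0),
]


def check_probabilities(outcomes, tol=1e-9):
    """Raise if the outcome probabilities do not sum to 1."""
    total = sum(prob for _, prob, _ in outcomes)
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, expected 1.0")


def expected_runs(outcomes):
    """Probability-weighted average of runs scored on the play itself."""
    check_probabilities(outcomes)
    return sum(prob * runs for _, prob, runs in outcomes)


runs_on_play = expected_runs(OUTCOMES)
print(f"Expected runs on the play (hypothetical table): {runs_on_play:.2f}")
```

Comparing two tactics, such as bunting versus swinging away, would mean building one table per tactic and comparing the expected values, which is why this approach scales badly to a whole league.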

        It's unlikely that any model can tell us whether one player is really 'better' than another unless we start with the assumption that we already know what 'better' looks like. Models that tell us that a player's stats are what we expect will produce 'x' number of runs are great, but still we all know that a homer with the bases loaded is not the same as a solo homer. Many (all) of the models we use ignore that and assume homers all have the same value because they are assumed to be random. I'm not sure that I think that's true, and I'm fairly certain I haven't seen anyone prove it's true.

        Better players are possibly (simply) players who play better than other players all the time, or they may be players who play the same as other players most of the time, but are much better than other players when it's more important (e.g., bases loaded, team behind). I don't know that many models try to model that second behavior.

        Down from my soap box!

        I think BA, OBP and Spct are too simple to be taken at face value when players are fairly close in quality, even after normalization. They are better than counting stats, but there's a deeper level we likely should add by joining in situational performance.
        Last edited by drstrangelove; 02-22-2012, 12:36 PM.
        "It's better to look good, than be good."



        • #34
          Originally posted by dominik View Post
          I think it is pretty safe to say that if SLG is equal the guy with the higher BA and lower ISO will produce more.
          Not according to the numbers.



          • #35
            Where are the real stat pros when we need them?


            • #36
              Someone asked for an example, using real MLB hitters in actual seasons of play. I decided to do just that. Listed below are thirteen Major League players, same league, same season and the numbers they posted.

              I will try to provide all information that might allow posters to apply whatever run creation or other formula they favor to evaluate the players, maybe even list them, top-to-bottom in run productivity or overall batting value. Meanwhile, I have applied my own RC formula. Then I dug deeper and calculated wOBA for each, with some surprising differences.

              Here are the players and their numbers:

              A. 610 PA; 507 AB; 151 H; 102 BB; 294 TB = .298/.416/.580

              B. 580 PA; 513 AB; 165 H; 60 BB; 240 TB = .322/.399/.468

              C. 644 PA; 581 AB; 156 H; 51 BB; 282 TB = .269/.330/.485

              D. 565 PA; 494 AB; 160 H; 61 BB; 210 TB = .324/.401/.425

              E. 687 PA; 590 AB; 153 H; 92 BB; 269 TB = .259/.360/.456

              F. 703 PA; 584 AB; 149 H; 113 BB; 251 TB = .255/.376/.430

              G. 632 PA; 546 AB; 155 H; 80 BB; 249 TB = .284/.377/.456

              H. 629 PA; 501 AB; 159 H; 121 BB; 233 TB = .317/.452/.465

              I. 653 PA; 569 AB; 169 H; 82 BB; 247 TB = .297/.386/.434

              J. 653 PA; 552 AB; 152 H; 95 BB; 264 TB = .275/.385/.478

              K. 615 PA; 518 AB; 161 H; 82 BB; 263 TB = .311/.406/.508

              L. 632 PA; 538 AB; 149 H; 81 BB; 279 TB = .277/.377/.519

              M. 643 PA; 585 AB; 199 H; 50 BB; 343 TB = .340/.396/.586

              There they are. Real season. Real Players.
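For anyone who wants to check or extend these lines, here is a small Python sketch that recomputes BA, SLG, and ISO from the counts listed above (BA = H/AB, SLG = TB/AB, ISO = SLG - BA). The variable and function names are mine. OBP is left out on purpose: it needs HBP and sacrifice flies, which are not part of the listed data.

```python
# Player lines as posted: (PA, AB, H, BB, TB).
PLAYERS = {
    "A": (610, 507, 151, 102, 294),
    "B": (580, 513, 165, 60, 240),
    "C": (644, 581, 156, 51, 282),
    "D": (565, 494, 160, 61, 210),
    "E": (687, 590, 153, 92, 269),
    "F": (703, 584, 149, 113, 251),
    "G": (632, 546, 155, 80, 249),
    "H": (629, 501, 159, 121, 233),
    "I": (653, 569, 169, 82, 247),
    "J": (653, 552, 152, 95, 264),
    "K": (615, 518, 161, 82, 263),
    "L": (632, 538, 149, 81, 279),
    "M": (643, 585, 199, 50, 343),
}


def rate_stats(pa, ab, h, bb, tb):
    """Return (BA, SLG, ISO) computed from raw counts."""
    ba = h / ab
    slg = tb / ab
    iso = slg - ba          # equivalently (TB - H) / AB
    return ba, slg, iso


STATS = {name: rate_stats(*line) for name, line in PLAYERS.items()}

for name, (ba, slg, iso) in STATS.items():
    print(f"{name}: BA {ba:.3f}  SLG {slg:.3f}  ISO {iso:.3f}")
```

Sorting `STATS` by any of the three columns gives a quick first look at who leans on average and who leans on power.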



              • #37
                Originally posted by leewileyfan View Post
                Someone asked for an example, using real MLB hitters in actual seasons of play. I decided to do just that. Listed below are thirteen Major League players, same league, same season and the numbers they posted.
                I actually know what year and league this is because I recognized the line from the first player! Does this mean I read and watch too much baseball? Being a Yankee fan, and this year being very historic, did make it a little easy, but still....
                Last edited by Joltin' Joe; 02-23-2012, 09:58 PM.



                • #38
                  Originally posted by Joltin' Joe View Post
                  I actually know what year and league this is because I recognized the line from the first player! Does this mean I read and watch too much baseball? :ooo: Although being a Yankee fan and this year being very historic did make it a little easy, but still....
                  Well, no problem if you'd like to evaluate each by whatever metrics you like. No harm done. However, if we might keep the season secret for the time being we might get some evaluating responses.

                  You have an edge. You can look up all the data to calculate wOBA if you like. [I did].



                  • #39
                    Originally posted by leewileyfan View Post
                    Well, no problem if you'd like to evaluate each by whatever metrics you like. No harm done. However, if we might keep the season secret for the time being we might get some evaluating responses.

                    You have an edge. You can look up all the data to calculate wOBA if you like. [I did].
                    Leewileyfan, what particular questions or issues prompted your posting these data lines? I realize, of course, that anything interesting will be of interest, but are we still looking to see whether BA is underrated, or comparing a bunch of evaluative tools, and if so to what end, or did you have something else in mind?

                    You mentioned some surprising results. Do they have to do with your version of runs created? How about a little hint?

                    There are some amazing batting lines in here, by the way, and the two from this year that are most so aren't even included.


                    • #40
                      sorted by wOBA

                      Code:
                      	wOBA
                      A	0.448
                      M	0.440
                      H	0.429
                      K	0.416
                      L	0.409
                      B	0.400
                      J	0.397
                      D	0.388
                      E	0.388
                      G	0.384
                      I	0.381
                      F	0.376
                      C	0.370



                      • #41
                        Originally posted by Jackaroo Dave View Post
                        Leewileyfan, what particular questions or issues prompted your posting these data lines? I realize, of course, that anything interesting will be of interest, but are we still looking to see whether BA is underrated, or comparing a bunch of evaluative tools, and if so to what end, or did you have something else in mind?

                        You mentioned some surprising results. Do they have to do with your version of runs created? How about a little hint?
                        There are two threads running concurrently here, both having to do with batter value [Player A and Player B] and the various approaches to looking at power, average and the gamut of sabermetric formulas for everything.

                        Take note of Ubiquitous' entry for the players by wOBA. Others may use LWTS, BsR, or various RC formulas.

                        That's what I had in mind. Evaluate these players by your favorite metric[s]. We could have a great discussion. It will all revolve around what each poster thinks is significant. [Good question. Thanks.]



                        • #42
                          Originally posted by leewileyfan View Post
                          However, if we might keep the season secret for the time being we might get some evaluating responses.
                          Yes of course. I am looking forward to some insightful analysis by you and other members of this site.



                          • #43
                            Ubiquitous, in Post #40 above, calculated wOBA for the thirteen players and listed them in declining order by wOBA. I, too, worked wOBA on them, getting different results (I readily admit the possibility that either I erred on inputs, or that we used different weightings). The standing order is essentially the same, with a few minor changes. Column 1.

                            wOBA


                            A. .402
                            M. .379
                            H. .371
                            K. .358
                            L. .355
                            J. .349
                            B. .339
                            G. .335
                            E. .333
                            F. .332
                            I. .329
                            D. .323
                            C. .321
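One way to put a number on "essentially the same standing order" is to compare the two wOBA orderings directly. The sketch below is my own illustration: it takes the order from Post #40 and the order above, reports how many places each player moves, and computes a simple Spearman rank correlation. It treats the .388 tie between D and E in Post #40 as listed, which the simple formula does not strictly allow for.

```python
# Orderings copied from the two wOBA lists (best to worst).
ORDER_POST_40 = list("AMHKLBJDEGIFC")
ORDER_POST_43 = list("AMHKLJBGEFIDC")


def rank_shifts(first, second):
    """Places each player moves from the first ordering to the second
    (positive means the player ranks lower in the second list)."""
    return {player: second.index(player) - first.index(player) for player in first}


def spearman_rho(first, second):
    """Spearman rank correlation for two orderings without ties."""
    n = len(first)
    d_squared = sum(shift ** 2 for shift in rank_shifts(first, second).values())
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


SHIFTS = rank_shifts(ORDER_POST_40, ORDER_POST_43)
MOVED = {player: shift for player, shift in SHIFTS.items() if shift != 0}
RHO = spearman_rho(ORDER_POST_40, ORDER_POST_43)

print("Players who change position:", MOVED)
print(f"Spearman rank correlation: {RHO:.3f}")
```

The top five are identical in both lists; the movement is all in the lower half, with D shifting the most.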

                            Then, one version of RC [shortcut: (TB+BB)*BA]:

                            RC shortcut

                            M. 134
                            A. 118
                            H. 112
                            K. 107
                            L. 100
                            J. 99
                            I. 98
                            B. 97
                            F. 93
                            E. 93
                            G. 93
                            C. 90
                            D. 88
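The shortcut is easy to reproduce from the posted counts. The sketch below is a minimal Python version with names of my choosing; it uses unrounded BA (H/AB), so a few values can differ by a run from a hand calculation that rounds BA to three places first.

```python
# Posted counts per player: (AB, H, BB, TB).
COUNTS = {
    "A": (507, 151, 102, 294), "B": (513, 165, 60, 240),
    "C": (581, 156, 51, 282),  "D": (494, 160, 61, 210),
    "E": (590, 153, 92, 269),  "F": (584, 149, 113, 251),
    "G": (546, 155, 80, 249),  "H": (501, 159, 121, 233),
    "I": (569, 169, 82, 247),  "J": (552, 152, 95, 264),
    "K": (518, 161, 82, 263),  "L": (538, 149, 81, 279),
    "M": (585, 199, 50, 343),
}


def rc_shortcut(ab, h, bb, tb):
    """Runs created shortcut: (TB + BB) * BA, with BA = H / AB."""
    return (tb + bb) * h / ab


RC = {name: rc_shortcut(*line) for name, line in COUNTS.items()}

# Print from highest to lowest.
for name, value in sorted(RC.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}. {value:5.1f}")
```

Because the formula multiplies by BA, it rewards high-average hitters more than an OBP-based version would, which is worth keeping in mind when comparing it with the wOBA order.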

                            Then, shuffling them again, not by a metric but by actual MVP votes that season (MVP vote standing in brackets; N = no MVP votes):

                            1. A [5th]
                            2. M [8th]
                            3. H [10th]
                            4. K [11th]
                            5. L [14th]
                            6. D [15th]
                            7. E [23rd]
                            8. B [25th]

                            C, F, G, I and J = N
                            Last edited by leewileyfan; 02-24-2012, 03:28 PM.



                            • #44
                              My favorite quick-and-dirty metric when I'm mobile is SLOB: SLG X OBA. It's usually close to OPS, but when OPS are equal, it favors the batter whose component scores are closer together, i.e. almost always the player with higher OBP (appropriately). To get runs created, just multiply by at-bats. .200 is a Hall of Fame score; for comparison, .333 X .420 = .140.

                              A .241
                              M .232
                              H .210
                              K .206
                              L .195
                              B .187
                              J .183
                              G .172
                              D .170
                              I .167
                              E .164
                              F .161
                              C .160

                              No big surprises. The order of the last 4 or 5 seems pretty arbitrary, as they are within .012 of one another.
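SLOB is simple enough to sketch in a few lines of Python. The function names are mine, the inputs are the rounded OBP and SLG figures from the player lines, and the two equal-OPS hitters are invented to show the "closer together" property. Third-decimal differences from the list above are possible depending on rounding.

```python
def slob(obp, slg):
    """SLOB: on-base percentage times slugging percentage."""
    return obp * slg


def runs_created_from_slob(obp, slg, at_bats):
    """Rough runs created estimate: SLOB multiplied by at-bats."""
    return slob(obp, slg) * at_bats


# Player A from the thread: .416 OBP, .580 SLG, 507 AB.
slob_a = slob(0.416, 0.580)
rc_a = runs_created_from_slob(0.416, 0.580, 507)
print(f"Player A: SLOB {slob_a:.3f}, estimated runs created {rc_a:.0f}")

# Two hypothetical hitters with the same .900 OPS.
balanced = slob(0.400, 0.500)   # components closer together
lopsided = slob(0.350, 0.550)   # same sum, further apart
print(f"Equal OPS: balanced {balanced:.4f} vs lopsided {lopsided:.4f}")
```

For a fixed sum, a product is largest when the two factors are equal, and since OBP is normally the smaller number, the hitter with the higher OBP wins the tiebreak.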


                              • #45
                                Anyone care to chip in with LWTS or BsR or some other metric?
