Is FIP/xFIP a better stat than ERA?

  • Is FIP/xFIP a better stat than ERA?

    Many people would rather talk about FIP than ERA.

    But how useful is that stat? I know FIP ignores non-home-run hits, and xFIP goes further by ignoring a pitcher's actual HRs (it substitutes the number of HRs expected from his fly balls), so the purpose is to strip out fielding and luck.
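For reference, here is a minimal Python sketch of the commonly published (FanGraphs-style) FIP and xFIP formulas. It shows what the question describes: non-HR hits never enter the formula, and xFIP swaps actual HRs for fly balls times a league HR/FB rate. The constant (3.10 here) is an illustrative placeholder, since the real value is recalculated each season so that league FIP equals league ERA. Some versions treat intentional walks differently, and the season line in the usage example is hypothetical.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from HR, walks, HBP, strikeouts and innings.

    ip must be decimal innings (outs / 3), not box-score notation such as 6.1.
    constant is an illustrative placeholder; the real one changes each season.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, lg_hr_per_fb, bb, hbp, k, ip, constant=3.10):
    """xFIP: FIP with actual HR replaced by fly balls * league HR/FB rate."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


if __name__ == "__main__":
    # Hypothetical 200-inning season, not any real pitcher.
    print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200), 3))  # 3.225
    print(round(xfip(fly_balls=250, lg_hr_per_fb=0.10, bb=50, hbp=5, k=200, ip=200), 3))  # 3.55
```

In this made-up line the pitcher allowed fewer HRs (20) than his fly balls would predict at a 10% league HR/FB rate (25), so his xFIP comes out higher than his FIP.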

    Does an FIP/xFIP lower than the ERA always mean that you are unlucky? And does this even out over a career? Have there been guys who were lucky or unlucky over a whole career, or does FIP match ERA after several years?
    I now have my own non-commercial blog about training for bat speed and power, drawing on my training experience in baseball and track and field.

  • #2
    Originally posted by dominik
    It depends on when we can start taking BABIP and LOB%/sequencing "seriously". As a rule of thumb, r=.5 for both BABIP and LOB% at about 7-8 seasons. So with fewer than 7-8 seasons, I'd take FIP. After that, I'd take a defense-adjusted ERA (or RA) over FIP. The point is, regardless of the sample size, we have to weed out defense as much as possible. Actually doing that is very hard, of course.

    Now unadjusted ERA vs. FIP after 7-8 years? Not sure. And FIP vs. xFIP? HR/FB also has an r=.5 point, so I would want to switch from xFIP to FIP sometime around that point.
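One common way to model the "r=.5 point" is r = n / (n + k), where n is the sample size and k is the sample at which r reaches .5. That functional form is an assumption here (a standard sabermetric convention, not something stated in the post). Below, k is set to 7.5 seasons to mirror the 7-8 season rule of thumb, and the hypothetical `preferred_stat` helper encodes the "FIP until r=.5, defense-adjusted ERA after" rule.

```python
def reliability(n, k):
    """Share of an observed rate treated as signal after a sample of size n.

    Assumes r = n / (n + k); n and k must be in the same units (seasons here).
    """
    return n / (n + k)


def preferred_stat(seasons, k_seasons=7.5):
    """Hypothetical helper: lean on FIP until the r=.5 point is reached."""
    if reliability(seasons, k_seasons) < 0.5:
        return "FIP"
    return "defense-adjusted ERA"


if __name__ == "__main__":
    for seasons in (1, 3, 7.5, 12):
        print(seasons, round(reliability(seasons, 7.5), 2), preferred_stat(seasons))
```

The same two functions would cover the xFIP-to-FIP switch by plugging in the sample size at which HR/FB reaches r=.5, which the post doesn't give.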

    As far as luck and ERA vs. FIP go, there are many guys who ALWAYS outperform their FIP. Think Glavine, Cain, Buehrle, etc. And some always underperform. Think Lieber, Vazquez, and Greinke. Randomness alone would produce some guys who always outperform or underperform (if 7,000 people each flipped a coin 20 times, at least a few would get close to 20 heads and a few would get close to 20 tails), but so many pitchers outperform or underperform that randomness cannot explain it all. That is because there is some skill involved in BABIP and LOB%; it just takes a large sample size to find and quantify it.
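The coin-flip analogy can be checked exactly with the binomial distribution. This sketch counts how many of 7,000 people flipping a fair coin 20 times would be expected to reach 18+ or 17+ heads by luck alone; by symmetry the same numbers apply to tails. The thresholds are an illustrative choice of what "close to 20" means.

```python
from math import comb


def expected_extreme_flippers(flips, min_heads, people):
    """Expected number of people with at least min_heads heads in `flips` fair flips."""
    ways = sum(comb(flips, h) for h in range(min_heads, flips + 1))
    return people * ways / 2 ** flips


if __name__ == "__main__":
    print(round(expected_extreme_flippers(20, 18, 7000), 2))  # 1.41
    print(round(expected_extreme_flippers(20, 17, 7000), 2))  # 9.02
```

So pure luck does produce "a few" extreme cases; the argument in the post is that real pitchers show more persistent over- and underperformers than a null model like this predicts.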

    Also, keep in mind that BABIP and sequencing are not the only things FIP neglects. GIDPs, pitcher defense, hit batsmen, wild pitches, and controlling the running game are all factors that can add up. Each of these is a smallish factor on its own, but together they can add up to a large factor for pitchers who master many of them. Take Glavine, who was above average in all of these areas. Add those to a good BABIP (even after his defense is considered) and those famous situational splits, and we can see why Glavine was a Hall of Fame pitcher despite a decent, but not great, FIP. The bottom line is, if a pitcher seems to consistently outperform or underperform his peripherals, look behind the numbers to see if we can find out why and whether it is sustainable.

    How useful is it? When pitchers like Damian Moss, Jaime Garcia or Jonathan Sanchez have great rookie seasons, FIP will correlate with future performance better than ERA. FIP is a good predictive tool for young pitchers.
    Last edited by Bothrops Atrox; 07-12-2012, 04:16 PM.
    1885 1886 1926 1931 1934 1942 1944 1946 1964 1967 1982 2006 2011

    1887 1888 1928 1930 1943 1968 1985 1987 2004 2013

    1996 2000 2001 2002 2005 2009 2012 2014 2015


    The Top 100 Pitchers In MLB History
    The Top 100 Position Players In MLB History

    • #3
      Excellent, articulate explanation^.
      UI2
      BTB

      • #4
        Originally posted by Matthew C.


        Very nice, thank you (coming from one who is statistically challenged). I would think there are FIP factors that a player just cannot get away from, which will affect his FIP (defense, park factor, etc.).
        My blog - http://sandlotwisdom.blogspot.com/

        • #5
          Whenever FIP/xFIP is discussed I am reminded of this article.

          An Open Letter to the Viceroy of Stats
          By Grant Brisbee


          Dear Viceroy of Stats,

          First off, thank you for the stats. If I were to do a line graph comparing my love for baseball and the rise of the internet, the two lines would start rising dramatically around 1996 without a single dip. The stats are a big part of that. One of my favorite things in the world is feeling superior to other people. Now when someone references RBI, I know I’m objectively better than them in every capacity. You can’t buy that feeling, and I have stats to thank. Plus, when people argue about "sabermetrics" vs. "sabremetrics", it reminds me of the Northern Conservative Baptist Great Lakes Region Council of 1912 joke, and that’s always a good thing.

          But I also remember those early days of the internet stats. No-hit, all-glove wizards were not tolerated. Teams and GMs who signed players like Royce Clayton, Rey Sanchez, and Mike Bordick were mocked without mercy. The new stats, though, tell us that some of those guys had pretty valuable seasons. Jose Vizcaino, for whom I had a strong distaste in 1997, was actually a 2+ win player that year. Well, I’ll be. This isn’t to suggest that because the methods of evaluation have changed, people should discount every innovation because it’s likely to be considered wrong in a decade. Of course not.

          It might not be a bad idea, though, always to assume that stats are likely to contain some measure of imperfection. When I see single-season WAR totals used with a dogmatic certainty, it makes me uneasy. I have a feeling that the formula for WAR will be updated and tinkered for years, if not decades, because it’s surely tricky to combine hitting stats with something as variable as single-season fielding stats to produce a single number. Yet there’s a small faction among us who likes to use single-season WAR as a blunt object. It feels like some folks -- certainly not most or all -- use the stat without the spirit of intellectual curiosity with which it was created.

          So I’ve searched for the most diplomatic way to phrase this, and I think I’ve arrived at something that’s fair, honest, and non-combative. Here goes: Matt Cain is good, and people who use xFIP as a blunt object can shut their yap holes. The idea of normalizing ERA to account for luck with balls put in play is a fine one. Trying to normalize home runs per fly ball is a good idea too. Assuming that the current construct will work as an infallible predictive tool for every single pitcher in professional baseball right now? Not my favorite idea.

          Matt Cain has outperformed his FIP for four straight seasons. He has probably benefited from some measure of luck, especially in 2009, when he beat the mark by a full run. The traditional stat, ERA, indicates that Matt Cain is an elite pitcher. FIP suggests that Cain is merely very good. That’s a fair debate. Pitchers can do that sort of thing for an entire career, but they’re the exceptions, not the rules. The burden of proof would probably be on the person suggesting that Cain is elite.

          However, xFIP suggests that Matt Cain is an innings-eater of the most ordinary capacity, like a Jon Garland or a Joe Blanton. Matt Cain’s career xFIP is 4.43, and aaaaaaany day now, his ERA will regress to meet that mark. Some people pounce on that, and they froth at the mention of Matt Cain as a top pitcher. And I’m forced to react like a troglodyte, mentioning that a) I’VE TOTALLY WATCHED, LIKE, EVERY ONE OF HIS STARTS, AND MY EYES ARE MORE BETTER THAN YOUR STATS, and b) but his ERA! I don’t like both of those arguments. I can link to a study by the wizard who actually invented FIP, which acknowledges that there could be outliers like Cain when calculating xFIP, but because the math hurts my brain, I can’t do anything but appeal to his authority.

          It feels like with some folks, you get "Matt Cain’s xFIP is this. His ERA is that. The difference means there is something wrong with Matt Cain." I would like more, "Matt Cain’s xFIP is this. His ERA is that. Maybe there’s something that makes this happen every year." That’s all. I would just like the small, vocal minority to use stats like WAR, FIP, and xFIP as useful tools, not divinely inspired scripture just yet. Please command them to do so with your powers as Viceroy of Stats.

          I would like to end this open letter by noting that Matt Cain did not allow an earned run this postseason, and contrary to popular belief, that performance has tremendous predictive value. I predict that in 20 years, Matt Cain’s performance in the 2010 playoffs will still have been totally awesome.
          http://www.mccoveychronicles.com/201...ceroy-of-stats
          Last edited by Honus Wagner Rules; 07-13-2012, 12:19 PM.
          Strikeouts are boring! Besides that, they're fascist. Throw some ground balls - it's more democratic.-Crash Davis

          • #6
            Thanks for the explanation. Can you spell out for me what "r = .5" means here? I understand r to represent the coefficient of correlation, generally, but I do not know if that is the case here, and if so, which variables are being correlated or why .5 is a crucial value. Thanks a lot.
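In this kind of analysis, r is usually the ordinary Pearson correlation between the same stat measured for the same group of pitchers in two separate samples of equal size (for example, odd and even batters faced, or one set of seasons against another). When r is .5, a pitcher who is 20 points above average in one sample is expected to be about 10 points above average in the other (assuming both samples have similar spreads), which is why .5 marks the halfway point for regression. A minimal sketch, with made-up toy numbers:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy BABIPs for four hypothetical pitchers in two separate samples.
    first_sample = [0.280, 0.300, 0.310, 0.290]
    second_sample = [0.295, 0.300, 0.305, 0.290]
    print(round(pearson_r(first_sample, second_sample), 2))  # 0.8
```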
            Indeed the first step toward finding out is to acknowledge you do not satisfactorily know already; so that no blight can so surely arrest all intellectual growth as the blight of cocksureness.--CS Peirce

            • #7
              Originally posted by Jackaroo Dave
              "r" is the correlation between a pitcher's stat in one sample and the same stat in another sample of the same size, and it tells us how much to regress. So when r=.5, we would want to regress the particular data 50% to average. For BABIP, it takes about 3,700 PAs for a pitcher's BABIP to be half luck and half skill. Well, statistically, at least. It is more accurate to say it is the point at which we are 50% confident of his BABIP skill.
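Here is a minimal sketch of what "regress 50% to average" means in practice, assuming the common r = n / (n + k) form with k = 3,700 PA (the BABIP figure above) and a league BABIP of .300 (an illustrative value). The pitcher numbers are hypothetical.

```python
def regress_to_mean(observed, league_mean, n, k):
    """Shrink an observed rate toward the league mean.

    r = n / (n + k) is the weight kept on the observed rate;
    the remaining 1 - r is placed on the league mean.
    """
    r = n / (n + k)
    return league_mean + r * (observed - league_mean)


if __name__ == "__main__":
    # .270 BABIP over 3,700 PA: r = .5, so the estimate lands halfway to .300.
    print(round(regress_to_mean(0.270, 0.300, n=3700, k=3700), 3))  # 0.285
    # Same .270 over only 600 PA: r is about .14, so most of the gap disappears.
    print(round(regress_to_mean(0.270, 0.300, n=600, k=3700), 3))  # 0.296
```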

              • #8
                Originally posted by Honus Wagner Rules
                Whenever FIP/xFIP is discussed I am reminded of this article.

                http://www.mccoveychronicles.com/201...ceroy-of-stats

                The thing often missed is this: given thousands of pitchers with significant longevity, randomness would predict some number of pitchers to get lucky by beating their peripherals four, five, even fifteen times in a row. But we also know that there are many more pitchers with BABIP and LOB% skill than randomness would predict. So we know that there is BABIP skill.
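The "lucky streak" baseline is easy to put a number on under a deliberately simple null model: each season, a pitcher beats his FIP with probability .5, independently of every other season. The pool of 1,000 pitchers is hypothetical. Counts of real streakers well beyond these numbers are the kind of evidence for skill the post describes.

```python
def expected_streakers(pitchers, seasons, p_beat=0.5):
    """Expected pitchers beating FIP in every one of `seasons` seasons by luck alone.

    Assumes each season is an independent event with probability p_beat.
    """
    return pitchers * p_beat ** seasons


if __name__ == "__main__":
    for seasons in (4, 5, 15):
        print(seasons, expected_streakers(1000, seasons))
```

Under these assumptions, luck alone gives about 62 four-season streaks and 31 five-season streaks out of 1,000 pitchers, but essentially no fifteen-season streaks (about 0.03).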

                So how do we know if Cain is one of the "lucky" ones or a true BABIP reducer? 1. Look at his stuff or anything else that would indicate skill, i.e., his batted-ball profile, etc. 2. Regress to the mean based on his PAs; what you get will be the best guess at his true BABIP ability. There is plenty there to indicate that he has an impact on balls in play, and once you regress his BABIP to league average based on his PAs, it is still well below league average (i.e., much better than average).

                The article does a great job of expressing that the Cains and Glavines of the world are the exceptions and do not make FIP "useless" or "garbage." FIP is a great predictor of future performance for a majority of pitchers. We are now looking into the makeup and repertoires of pitchers to see how they are able to outperform their peripherals year after year after year.

                • #9
                  Originally posted by Matthew C.
                  Thank you, Matthew. I will work on this.

                  My general reservation is that BABIP skill may vary, so I'm not sure that "HIS BABIP skill" corresponds to some value to be 50% confident about. I suppose that is where xFIP or xBABIP (if that exists) comes in.

                  I've noticed that baseball statisticians who regress data to the mean tend not to use confidence intervals. Is this just a coincidence, or does one supplant the other? (Sorry to pester you; obviously I've got a lot of work to do on my own).
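For comparison, here is what a confidence interval for a single pitcher's BABIP looks like, using the simple normal-approximation (Wald) interval; the Wilson interval behaves better in small samples, and the hit and ball-in-play counts are hypothetical. One way to see the difference: the interval describes how uncertain the observed rate is, while regression to the mean produces a single best guess that also leans on the league average, so the two answer different questions rather than one replacing the other.

```python
from math import sqrt


def babip_interval(hits_on_bip, balls_in_play, z=1.96):
    """Approximate confidence interval (95% when z=1.96) for an observed BABIP.

    Uses the normal (Wald) approximation: p +/- z * sqrt(p * (1 - p) / n).
    """
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    p = hits_on_bip / balls_in_play
    half_width = z * sqrt(p * (1 - p) / balls_in_play)
    return p - half_width, p + half_width


if __name__ == "__main__":
    # The same .300 BABIP observed over 500 and over 5,000 balls in play.
    for hits, bip in ((150, 500), (1500, 5000)):
        low, high = babip_interval(hits, bip)
        print(f"{bip} BIP: {low:.3f} to {high:.3f}")
```

The interval runs from roughly .260 to .340 after 500 balls in play and narrows to roughly .287 to .313 after 5,000, which is the law-of-large-numbers effect in numeric form.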

                  • #10
                    double post
                    Last edited by Jackaroo Dave; 07-14-2012, 10:59 AM.

                    • #11
                      Originally posted by Jackaroo Dave
                      I've noticed that baseball statisticians who regress data to the mean tend not to use confidence intervals. Is this just a coincidence, or does one supplant the other? (Sorry to pester you; obviously I've got a lot of work to do on my own).
                      I have noticed that too, but am not sure why, exactly. I am not a statistician. I really dislike math, actually. It reminds me of the time I went to the doctor with a health concern and they told me I probably had something but refused to officially diagnose it.

                      And you are right, there is a lot about BABIP skill that we are still unsure about, so we are dealing with generalities and assumptions to a large degree.

                      • #12
                        Originally posted by Jackaroo Dave
                        The thing is, the more innings you have pitched, the more the BABIP value will reflect the real expected value. This is called the law of large numbers. I'm no mathematician either, but usually the smaller the sample size, the larger the fluctuations in the values.

                        If a pitcher has pitched 2 seasons, with a BABIP of .220 one year and .330 the next, we cannot really say much about his BABIP skill, and you would be much better off using league-average BABIP as a reference.

                        However, after 10 seasons or so we can say a lot more about his true BABIP. And at the end of the career, BABIP is of course 100% skill.

                        • #13
                          Originally posted by dominik
                          Thank you, Dominik. I understand your well-expressed points, but I retain my reservations.

                          You equate the career BABIP average with player skill, and I understand that since the value is no longer a variable, but a constant, there is no variation to account for. But I would say that career BABIP is a measure of performance, which is a combination of skill and random variation, not a reliable measure of pure skill.

                          Suppose you have a balanced coin and you flip it a hundred times, getting 48 heads and 52 tails. (Or you flip it ten million times and get 5,000,382 heads and 4,999,618 tails, whatever.) Then you never flip it again. Are you telling me that the coin's ability to generate heads was .48 (or .5000382 or whatever)? I would say no: the probability is .5 by definition, and the relative frequency is .48, which is not the same thing.

                          The law of large numbers (in one of its forms) says that as the number of trials approaches infinity, the relative frequency approaches the true probability, but a player's career is well short of infinite. Of course, as the number of innings or batters faced increases, the difference due to random variation becomes proportionally minute and the player's output becomes a better predictor, but the difference never vanishes.
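The point about the difference shrinking proportionally but never vanishing can be made concrete with the binomial standard deviation: for n fair flips, the typical miss in the head count is sqrt(n * p * (1 - p)), which grows with n, while the typical miss in the proportion is sqrt(p * (1 - p) / n), which shrinks. This sketch uses 100 flips and 10,000,000 flips (the total implied by 5,000,382 heads plus 4,999,618 tails).

```python
from math import sqrt


def coin_spread(flips, p=0.5):
    """Standard deviation of the head count and of the head proportion."""
    sd_count = sqrt(flips * p * (1 - p))
    sd_share = sqrt(p * (1 - p) / flips)
    return sd_count, sd_share


if __name__ == "__main__":
    for flips in (100, 10_000_000):
        sd_count, sd_share = coin_spread(flips)
        print(f"{flips:,} flips: typical miss {sd_count:,.0f} heads, {sd_share:.5f} as a share")
```

At 100 flips the typical miss is 5 heads (.05 as a share); at ten million it is about 1,581 heads but only about .00016 as a share. An excess of 382 heads sits well inside that typical miss, which is why the relative frequency still isn't the probability.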

                          Hence my puzzlement about whether "true talent" or "his [actual, permanent] BABIP skill" correspond to anything but a theoretical construct.

                          In particular, with a coin or a random number generator, the probability of an outcome is fixed throughout the instrument's career. For a player, ability varies like the weather: for example, Yaz at the end of his career did much better in the first half of a season than in the second. So suppose he is playing in June: his ability is going up compared to last September, but it's going down compared to 3 years ago. And today is a day game, so it's going up, but he's facing a left-hander, so it's going down. In what way does it make sense to talk about a true level of skill for Yaz? (Plus, even instantaneously, when the ball hits the bat, it's not just his skill level at that moment but a host of random things he can't control that help determine the outcome.)

                          Although these points are not news to anybody, as I said, these things still honestly puzzle me and I'm grateful for any help.

                          • #14
                            Originally posted by dominik
                            ...and at the end of the career BABIP is of course 100% skill.
                            I am glad you put the smiley face, as I suspect you know that isn't true! In order to have to regress only 15% to the mean, one would need something like 17,000+ PAs. So even for Steve Carlton, we would still want to regress his BABIP 15%-20% to the mean to get the closest estimate of his true skill. No player has played or will play long enough to weed out all of the potential luck or doubt. So we can never truly identify 100% skill for BABIP or any other event. For events over which players have more control (like K's, for example), we can get pretty close.
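Under the common r = n / (n + k) convention (an assumption; the post doesn't state its model), the sample needed to regress only a fraction f of the way to the mean is n = k * (1 - f) / f. Using the k of about 3,700 PA quoted earlier in the thread for BABIP gives the numbers below; they bracket the 17,000+ PA in the post, though the exact values depend entirely on k.

```python
def pa_for_regression(regress_fraction, k):
    """Sample size at which only regress_fraction of the gap to the mean is removed.

    Assumes r = n / (n + k), so regress_fraction = 1 - r = k / (n + k).
    """
    if not 0 < regress_fraction < 1:
        raise ValueError("regress_fraction must be between 0 and 1")
    return k * (1 - regress_fraction) / regress_fraction


if __name__ == "__main__":
    for fraction in (0.50, 0.20, 0.15):
        print(f"regress {fraction:.0%}: about {pa_for_regression(fraction, 3700):,.0f} PA")
```

That works out to 3,700 PA for 50%, 14,800 PA for 20% and roughly 21,000 PA for 15%, so the 15%-20% range for a long career corresponds to roughly 15,000 to 21,000 PA under this model.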

                            One person once said, no great player is really as good as they look and no horrible player is really as bad as they look. The true talent curve is a little flatter than the performance curve.

                            • #15
                              Originally posted by Matthew C.
                              I can see how this is working out, and once I learn the operational definition of "true talent" (in context, of course) and where numbers like "15% regression requires about 17K PAs" come from, I'll be on my way!
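One common way to give "true talent" an operational definition, and to get a figure like k, is a variance decomposition: take the spread of a rate across many pitchers who each have the same sample size n, subtract the spread that binomial luck alone would produce, and treat what is left as true-talent spread. The sample size at which luck and talent contribute equally is k. The thread doesn't say whether its figures were derived exactly this way, and the inputs below (an observed SD of .0218 across pitchers with 500 balls in play each, league BABIP .300) are made up, chosen so that k lands in the same low-thousands range as the thread's figure.

```python
from math import sqrt


def talent_sd_and_k(observed_sd, league_rate, n):
    """Estimate true-talent spread and the regression constant k.

    observed_sd: SD of the rate across players who each have n trials.
    Binomial luck alone contributes league_rate * (1 - league_rate) / n of
    variance; whatever is left over is attributed to real talent differences.
    k is the sample size at which talent and luck variance are equal (r = .5).
    """
    luck_var = league_rate * (1 - league_rate) / n
    talent_var = observed_sd ** 2 - luck_var
    if talent_var <= 0:
        raise ValueError("observed spread is no larger than luck alone")
    k = league_rate * (1 - league_rate) / talent_var
    return sqrt(talent_var), k


if __name__ == "__main__":
    talent_sd, k = talent_sd_and_k(observed_sd=0.0218, league_rate=0.300, n=500)
    print(f"true-talent SD about {talent_sd:.4f}, k about {k:,.0f}")
```

With these toy inputs the true-talent SD is only about .0074 (versus .0218 observed), which is the "true talent curve is flatter than the performance curve" point in numeric form.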