Fiato/Souders Matrix - Technical Paper

  • Fiato/Souders Matrix - Technical Paper

    For some of the harder-core numbers guys around here who might be interested in the specific details of a new sabermetric tool I've been honing for some time now...take a look at this URL (it links you to a PDF white paper on the inner workings of the Fiato/Souders Matrix).

    Tango, Pizza Cutter, Ubi etc...you might have interest

    Bill_Burgess...I don't recommend it to you.



    I'm "priming" the sabermetrics crowds in preparation for SABR 38...my talk will be on this new tool so I figured I'd start getting the word out a little early.
    Last edited by SABR Matt; 05-21-2008, 07:39 AM.

  • #2
    For the record, I am not David Smyth. I don't mind being confused with him, since he's a genius, but I doubt he'd like the connotations of that...



    • #3
      My sincere apologies, Pat...there was a write-up of various pythag estimators that claimed Patriot was David Smyth, but I will correct your name if you will let me know whom to credit. The last thing I want is to give the wrong guy credit for work you did that is still considered the gold standard.

      I could not find an article actually written by you that had your name.
      Last edited by SABR Matt; 05-20-2008, 09:53 PM.



      • #4
        I followed along pretty well...I took computational linear algebra twice, and still have the book on my desk.

        I had thought the matrix approach was best for Park Factors, but didn't remember enough to try to design it, and I see how you do PF in context of the offenses, defenses and starting pitchers as well, all considered in one large matrix.

        You individualized the starting pitcher in the matrix, and I agree that he is the one player in the game who has the largest proportional effect on the outcome, but instead of games started, would innings be a better weighting factor? Starters don't go as deep anymore, so bullpens are worth more now than in the '60s and before.
        Baseball Prospectus articles
        FanGraphs articles
        MVN Statistically Speaking articles
        Seam Heads articles



        • #5
          Yes...innings would be a better weighting factor (or...percent of game since games are the weighting factor for everything else) but we don't have exact innings pitched counts for the starters prior to the play by play era. When the daily summary project is concluded and we have IP counts for each game, I intend to replace the RS values with the runs they actually allowed and the game weights with partial game weights.
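One way to picture the "percent of game" weight is as the share of the team's defensive outs that the starter recorded. This is only a sketch under that assumption: the function name `partial_game_weight` and the 27-out default are hypothetical and are not taken from the F/S Matrix paper.

```python
def partial_game_weight(starter_outs, team_outs=27):
    """Fraction of the team's defensive outs recorded by the starter.

    Hypothetical helper: the 27-out default assumes a regulation
    nine-inning game; pass the real total for shortened or
    extra-inning games.
    """
    if team_outs <= 0:
        raise ValueError("team_outs must be positive")
    if not 0 <= starter_outs <= team_outs:
        raise ValueError("starter_outs must be between 0 and team_outs")
    return starter_outs / team_outs


# A starter who goes six innings (18 outs) of a nine-inning game
# would carry two thirds of a game in the weighting.
six_inning_weight = partial_game_weight(18)
print(round(six_inning_weight, 3))
```

Counting outs rather than innings avoids the ambiguity of box-score notation such as 6.1 for six and one third innings.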



          • #6
            Editorial note:

            I am speaking with my development team about the possibility of adding a fifth dimension to the matrix...that being the home plate umpire dimension. Baseball Prospectus' dataset suggests the possibility that the home plate ump may have just as large an impact on scoring as the parks, which forces me to consider it...nice that Retrosheet has complete umpire records back to 1871...that's pretty cool.
            Last edited by SABR Matt; 05-21-2008, 11:46 AM.



            • #7
              Originally posted by SABR Matt
              Editorial note:

              I am speaking with my development team about the possibility of adding a fifth dimension to the matrix...that being the home plate umpire dimension. Baseball Prospectus' dataset suggests the possibility that the home plate ump may have just as large an impact on scoring as the parks, which forces me to consider it...nice that Retrosheet has complete umpire records back to 1871...that's pretty cool.
              Off the top of my head I don't think the effect would be as large: there are so many ways a park can influence the outcomes, whereas the umpire will vary on a few ball/strike counts. But I do think it is worthwhile to include, at the least to see just how much of an influence there is.

              I normally use the James Function in observed value vs expected value situations. His original function was to take the true level of the batter (or team A) and the true level of the pitcher, combined in context of the league mean, to calculate the expected result. I solved for R instead, treating that as the observed value, and the opponent as the expected value, again with the league mean, to calculate the true level.
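A minimal sketch of the idea, assuming "the James Function" here means Bill James's log5 method written in its odds-ratio form with a league mean. The function names are made up for illustration, and the inversion is my reading of "solved for R instead", not necessarily the exact algebra used above.

```python
def odds(p):
    """Convert a rate in (0, 1) to odds."""
    return p / (1.0 - p)


def from_odds(o):
    """Convert odds back to a rate."""
    return o / (1.0 + o)


def log5_expected(level_a, level_b, league_mean):
    """Forward direction: expected rate when two true levels meet.

    odds(expected) = odds(A) * odds(B) / odds(league)
    """
    return from_odds(odds(level_a) * odds(level_b) / odds(league_mean))


def log5_true_level(observed, opponent, league_mean):
    """Inverse direction: back out a true level from an observed rate.

    odds(A) = odds(observed) * odds(league) / odds(opponent)
    """
    return from_odds(odds(observed) * odds(league_mean) / odds(opponent))


# Round trip with illustrative (not real) rates.
expected = log5_expected(0.320, 0.240, 0.260)
recovered = log5_true_level(expected, 0.240, 0.260)
print(round(expected, 4), round(recovered, 4))
```

When the opponent is exactly league average, the forward function returns the first level unchanged, which is a handy sanity check.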

              One logical complication is when there are several factors in the observed value. If you want to look at umpires calling balls and strikes, you have the expected value of the batters in his games, but also the expected values of the pitchers. You have to combine them first before you have one expected value for the umpire. Here is where a matrix might be the superior method, but there is a reason I took computational linear algebra twice, and I didn't do much better the second time around. Then I became a Geography major, so now I make maps on computers, and still have to do coordinate transformation matrices.



              • #8
                Yep...this is why we need to include the umpire data...to see if the umps are having a big impact or if it just looks that way in the BP data because they were drawing weird non-random samples from the pitchers, batters, and parks.



                • #10
                  Nice work Tom...

                  And that's about what I expected. It looks like the range of ERAs supports a significant umpire run factor even if a lot of it is noise. The F/S Matrix will factor out the noise caused by uneven umpiring schedules (some umps might get more bad pitching) and parks, which will be important information.



                  • #11
                    Right, I wouldn't want to settle on ERA for all the obvious reasons, and I'm presuming you are just saying ERA as that was the red flag that alerted you for it, and really you are looking at the components (as granular as your data allows).

                    The swing from a strike to a ball is around .15 to .20 runs or so per pitch. So, we can see that if there's a 3 or 4 pitch difference between the extreme and the average, we're talking about a substantial (for our purposes anyway) difference of around 0.50 in your ERA.
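The arithmetic behind that estimate, spelled out as a quick sketch. It assumes the 3 or 4 pitch difference is per nine innings, which the post implies but does not state; the names are illustrative only.

```python
# Run value of flipping one call between strike and ball (from the post).
RUNS_PER_CALL_LOW = 0.15
RUNS_PER_CALL_HIGH = 0.20


def run_swing(extra_calls, runs_per_call):
    """Runs gained or lost from a number of flipped calls.

    Assumption: extra_calls is counted per nine innings, so the
    result reads directly as an ERA-scale difference.
    """
    return extra_calls * runs_per_call


for calls in (3, 4):
    low = run_swing(calls, RUNS_PER_CALL_LOW)
    high = run_swing(calls, RUNS_PER_CALL_HIGH)
    print(f"{calls} calls: {low:.2f} to {high:.2f} runs")
```

The four combinations span 0.45 to 0.80 runs, so "around 0.50" sits at the conservative end of that range.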
                    Author of THE BOOK -- Playing The Percentages In Baseball



                    • #12
                      I'm just saying ERA because over at detectovision.com they brought the BP umpire study to my attention earlier this season and again this week and that was the stat they were quoting. I never use ERA to rate pitching or umpiring...just sort of like Voros McCracken did with his DIPS ERA, you look at the range of ERAs and immediately realize that it would be impossible to duplicate that range without the umps playing a role.

                      The F/S Matrix uses runs as its "truth" but it's going to factor out all of the other noise that usually poisons ERA.



                      • #13
                        I read the umpire thread Tango linked...I think I'll run some numbers myself.

                        I plan on SELECT HomeTeam, VisTeam, Site, Year, Batter, Pitcher, HP_Ump, then the various pitch outcomes.

                        With this selection, I can get league mean, do multiyear park factors, then mean and normalize both batters and pitchers, finally giving the expected value for each umpire, comparing their expected value to their observed.
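A rough sketch of the first step of that plan, run through Python's built-in `sqlite3` module so it is self-contained. The `pitches` table, its column names, and the rows are all invented for illustration. It only compares each umpire's observed called-strike rate to the league mean; the park, batter, and pitcher normalization described above is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema mirroring the SELECT list in the post.
conn.execute("""
    CREATE TABLE pitches (
        home_team TEXT, vis_team TEXT, site TEXT, year INTEGER,
        batter TEXT, pitcher TEXT, hp_ump TEXT,
        called_strike INTEGER  -- 1 = called strike, 0 = called ball
    )
""")

# Made-up rows: ump_a calls 3 of 4 taken pitches strikes, ump_b calls 1 of 4.
rows = [
    ("SEA", "OAK", "SEA03", 2008, "b1", "p1", "ump_a", 1),
    ("SEA", "OAK", "SEA03", 2008, "b2", "p1", "ump_a", 1),
    ("SEA", "OAK", "SEA03", 2008, "b3", "p2", "ump_a", 1),
    ("SEA", "OAK", "SEA03", 2008, "b4", "p2", "ump_a", 0),
    ("BOS", "NYA", "BOS07", 2008, "b5", "p3", "ump_b", 1),
    ("BOS", "NYA", "BOS07", 2008, "b6", "p3", "ump_b", 0),
    ("BOS", "NYA", "BOS07", 2008, "b7", "p4", "ump_b", 0),
    ("BOS", "NYA", "BOS07", 2008, "b8", "p4", "ump_b", 0),
]
conn.executemany("INSERT INTO pitches VALUES (?,?,?,?,?,?,?,?)", rows)

# League mean called-strike rate over every pitch in the table.
league_rate = conn.execute(
    "SELECT AVG(called_strike) FROM pitches"
).fetchone()[0]

# Observed rate per home plate umpire.
ump_rates = dict(conn.execute(
    "SELECT hp_ump, AVG(called_strike) FROM pitches GROUP BY hp_ump"
).fetchall())

# Observed minus league mean; a fuller version would subtract an
# expected rate built from the batters, pitchers, and park instead.
diffs = {ump: rate - league_rate for ump, rate in ump_rates.items()}
for ump in sorted(diffs):
    print(ump, round(diffs[ump], 3))
```

Swapping the league mean for a per-umpire expected rate is where the batter and pitcher levels would have to be combined first.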

                        I'm starting to love SQL



                        • #14
                          The F/S Matrix would be impossible without MySQL...very powerful language...very easy to learn.



                          • #15
                            You must be watching different umps than the ones calling games against the Ms...because the Mariners' hitters have been very frustrated this year with lots of high strike calls. LOL

