Handling errors in run estimation

  • Handling errors in run estimation

    Hello, statheads!

    I'm new to this site, and relatively new to sabermetrics (though I read some of Bill James's books as a kid in the '80s).

    I've been working on analyzing the results of the first season (2007) of the Israel Baseball League (IBL - see http://www.israelbaseballleague.com), a professional league which played a 40-game season this past summer. Some of my findings are available on my blog: http://biblemetrics.blogspot.com.

    As you can see from my recent posts there, I'm having trouble reconciling the conventional (MLB) run estimation formulas with the IBL stats. About 20% more runs were scored than the predictions.

    I suspect that the main element missing is errors. IBL games featured about three times as many errors as in the majors. But none of the run estimators seem to take errors into account.

    Has anyone done any work in this direction? I would think that error rates must also be high in the minors - do the run estimators have the same problems there?

    Thanks,

    The Iblemetrician
    Israel
    http://biblemetrics.blogspot.com
    The blog for Israel Baseball League analysis

  • #2
    Wow...I had no idea they were taking an interest in baseball in the Middle East.

    FWIW, some run estimators do take errors into account. Linear Weights, if you use one of the more detailed equations, has error terms. I'm guessing though that errors have a different value in a pre-major-league environment like that because, much like softball, errors would probably result in many more bases advanced by the batter and other baserunners than they do in the major leagues. What kind of information do you have at your disposal regarding the IBL? Do they track just the basics or do you have play-by-play records? If you've got something more detailed than your basic 1B, 2B, 3B, BB, K, HR, RBI, R type stats, I'd like to know. Even if you don't, whatever stats you have from the IBL would serve as a potentially valuable window into measuring league quality...a league as new and (no offense to the Israelis) "young" developmentally as the IBL could teach us a lot about the earmarks of sub-major-league baseball.


    • #3
      Wow, this is really neat. I'd heard about IBL earlier this year, but it's fascinating to think about applying sabr work to such a novel system. I'm subscribing to your blog.

      With the run estimators, the problem is probably that you're dealing with a massively different run environment than MLB's, which means that any linear weights based on MLB numbers will miss pretty badly (as you noted on your blog).

      Seems like base runs would be your best bet moving forward, as it's supposed to hold up well in extreme/unusual environments because it's based on a decent model of run scoring. And you can modify it when you need to include additional terms--and once you have a good model of run scoring, you can generate your own custom linear weights.

      Patriot has a nice writeup on base runs that should be helpful (as well as other run estimators):
      http://gosu02.tripod.com/id108.html

      Here's a "full" version of base runs from his article:
      A = H + W + HB - HR - CS - DP
      B = .777*S + 2.61*D + 4.29*T + 2.43*HR + .03*(W + HB - IW) - .747*IW + 1.30*SB + .13*CS + 1.08*SH + 1.81*SF + .70*DP - .04*(AB - H)
      C = AB - H + SH + SF
      D = HR

      Where base runs = A * B/(B+C) + D
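For anyone who wants to try this on their own numbers, here is a minimal Python sketch of the formula as quoted above. The function name `base_runs` and the argument names are my own, and it assumes W means total walks (intentional ones included), which is how the `W + HB - IW` term reads. Treat it as a transcription to check against Patriot's article, not a reference implementation.

```python
def base_runs(ab, h, doubles, triples, hr, bb=0, hbp=0, ibb=0,
              sb=0, cs=0, sh=0, sf=0, dp=0):
    """Estimate runs with the "full" BaseRuns version quoted above.

    Argument names are illustrative; any count not supplied defaults to 0.
    """
    singles = h - doubles - triples - hr
    outs_at_bat = ab - h

    # A: baserunners (times on base, minus homers and runners erased)
    a = h + bb + hbp - hr - cs - dp
    # B: advancement of those baserunners
    b = (0.777 * singles + 2.61 * doubles + 4.29 * triples + 2.43 * hr
         + 0.03 * (bb + hbp - ibb) - 0.747 * ibb
         + 1.30 * sb + 0.13 * cs + 1.08 * sh + 1.81 * sf + 0.70 * dp
         - 0.04 * outs_at_bat)
    # C: outs
    c = outs_at_bat + sh + sf
    # D: automatic runs (home runs)
    d = hr

    # Guard against an empty stat line, where B + C would be zero
    if b + c == 0:
        return float(d)
    return a * b / (b + c) + d
```

If you only have the basic box-score columns (AB, H, 2B, 3B, HR, BB), the remaining terms simply stay at zero.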

      Now I am totally unqualified to be advising on this, as I've never tried to manipulate such an equation before. But it seems to me that errors would result in additional men on base (the "A" term) as well as additional advancement of runners (the "B" term).

      So you'd definitely add reached-base-on-errors to the A term (if you don't have reached-base-on-errors, and just total errors, you might need to add a coefficient here to approximate reached-base-on-errors).

      You'd add errors to the B term as well, though you'll have to futz around with the coefficient to find something that works (I'd probably start by equating them to a single).
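A rough sketch of that adjustment in Python, assuming you have already computed the A, B, C and D components from the formula above. Everything here is a placeholder to experiment with: `roe_share` (the fraction of total errors that put the batter on base) is something you would have to estimate from play-by-play data, and `roe_weight` just starts at the single coefficient (.777) as suggested. The function name is made up for illustration.

```python
def base_runs_with_errors(a, b, c, d, errors, roe_share, roe_weight=0.777):
    """BaseRuns with a crude error adjustment.

    a, b, c, d -- the four BaseRuns components, computed beforehand
    errors     -- total errors committed against this offense
    roe_share  -- assumed fraction of errors that are reached-on-error
    roe_weight -- assumed B coefficient per reached-on-error
                  (defaults to the single weight as a starting point)
    """
    roe = roe_share * errors           # approximate reached-on-error count
    a_adj = a + roe                    # extra baserunners
    b_adj = b + roe_weight * roe       # extra advancement
    return a_adj * b_adj / (b_adj + c) + d
```

One thing to keep in mind when tuning: a batter who reaches on an error is still charged an at-bat without a hit, so those plate appearances are already sitting in AB - H (the C term). Whether to subtract them from C as well is a judgment call that this sketch leaves alone.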

      I'm sure others might have much better suggestions on how to go about this, but this might get you started.
      -j
      ---
      My blog: On Baseball and the Reds


      • #4
        The run value of an error on which the batter reaches base in MLB is about .02 runs higher than that of a single. So, as a general rule, just count an error as a single.

        I would start with http://www.tangotiger.net/markov.html to see how the IBL compares. The constraints of the system (it has no outs on the bases) get more exposed the lower the quality of the league (where outs on the bases are likely far more frequent).
        Author of THE BOOK -- Playing The Percentages In Baseball


        • #5
          Tango...I don't think you can assume that a reach-on-error in the IBL will be similar to a single. In weaker leagues, errors are usually more spectacular and result in compounding errors (think about a softball game where one bad throw often leads to another). I think an error is going to be worth a lot more relative to a single in a weak, error-prone league.


          • #6
            Originally posted by SABR Matt:
            Wow...I had no idea they were taking an interest in baseball in the middle east.
            Well, I wouldn't go that far. Most of the fans either grew up in North America or lived there for a while - or their parents did. Only 10% of the players are Israeli; the rest are mostly Americans, followed by Dominicans and a surprisingly talented contingent of Australians. I don't see baseball competing with soccer or basketball here anytime in my lifetime, at least.

            Do they track just the basics or do you have play by play records?
            One thing the league did well was recordkeeping. I have the play-by-play game logs, though I have yet to write the software to process them. For now, I'm working from the box scores which lump all errors together as E. Hopefully at some point I'll be able to break down reached-on-error, not to mention other juicy data like flyballs and groundballs, taking the extra base, etc.

            Even if you don't, whatever stats you have from the IBL would serve as a potentially valuable window into measuring league quality...a league as new and (no offense to the Israelis) "young" developmentally as the IBL could teach us a lot about the earmarks of sub-major-league baseball.
            One question I'd certainly like to look at is how to assess the play level of the league. Is there any reliable way to do that other than to look at players who have played in both the IBL and other leagues, and compare their performance levels? Is it possible to assess play levels without a broader context - like looking at pitcher control, maybe?
            http://biblemetrics.blogspot.com
            The blog for Israel Baseball League analysis


            • #7
              Thanks to everyone for the suggestions on the formulas. Looks like I could use some data on the actual consequences of errors in the league - how many batters reached base, how many runners advanced. Interesting that the conventional stats don't distinguish between those very different situations.
              http://biblemetrics.blogspot.com
              The blog for Israel Baseball League analysis


              • #8
                Indeed...conventional statistics have a severe failing when it comes to recording the real meaning of errors.

                If you have a complete PBP record, there's a LOT you can do in terms of linear analysis...I recommend you spend your time working on a way to process that information, because once you process that, you can compare the rates at which "sloppy" events occur to similar rates in the majors throughout history. HBP, WP, SB, E etc...I'm willing to bet that the IBL is very similar to 19th century baseball...a PBP record of the IBL could give us tools to better understand our own past.


                • #9
                  So I took the visual cue from the "powered by mlb.com" logo and went digging. I found all of the PBP data for the Israel Baseball League in Gameday format at the following location...

                  http://gd2.mlb.com/components/game/ind/year_2007/

                  Now if anyone around here has a working version of Adler's Hack for parsing the 2007 Gameday data, we'd be in business.


                  • #10
                    Originally posted by weskelton:
                    I found all of the PBP data for the Israel Baseball League in Gameday format
                    Amazing! Shows how out of touch I am - I don't even know what to do with Gameday format. Until now, I've been parsing the web pages downloaded straight from the IBL's site.

                    Thanks for the discovery.
                    http://biblemetrics.blogspot.com
                    The blog for Israel Baseball League analysis


                    • #11
                      It worked - the full BaseRuns formula yielded good estimates without any tweaking on my part. Details are in my latest blog post. Thanks again for the pointers.

                      A related question: Does anyone know where I can find league-average summary stats for current minor leagues?
                      http://biblemetrics.blogspot.com
                      The blog for Israel Baseball League analysis


                      • #12
                        baseball-reference.com has minor league data now...including league summary information...for all leagues 1992-2007.


                        • #13
                          I actually did a little data massaging of my own using the numbers that Iblemetrician had posted on his blog...



                          Basically I took the percentage of extra errors in the IBL that were above and beyond what would have been expected in MLB and turned them into singles, following Tango's suggestion. I also treated the HBPs as BBs. Then I plugged the revised run environment into Tango's Markov modeler. My values were as follows...

                          AB=837, H=262, 2B=39, 3B=2, HR=25, BB=163, K=165

                          The result was a run environment that would score 8.083 runs per 27 outs (outs = AB - H). When treating the extra errors as singles (not outs), the IBL actually scored at a rate of 7.889 runs per 27 outs. This compares to a Markov value of only 6.341 runs per 27 outs using the un-massaged data.
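For readers who want to check the rate arithmetic, here is a small Python sketch. It takes outs to be AB - H, as above; the function names are made up for illustration, and the run total is backed out from the quoted rate rather than taken from the box scores.

```python
def runs_per_27_outs(runs, ab, h):
    """Scoring rate per 27 outs, counting outs as AB - H."""
    return 27.0 * runs / (ab - h)


def implied_runs(rate, ab, h):
    """Invert the rate: the run total a given runs/27 outs figure implies."""
    return rate * (ab - h) / 27.0


def pct_diff(value, baseline):
    """Percentage by which value differs from baseline."""
    return 100.0 * (value - baseline) / baseline


ab, h = 837, 262                           # adjusted values quoted above
actual_runs = implied_runs(7.889, ab, h)   # back-calculated, not a quoted stat
gap_raw = pct_diff(7.889, 6.341)           # actual vs. un-massaged Markov
gap_adjusted = pct_diff(8.083, 7.889)      # adjusted Markov vs. actual
```

With those inputs the un-massaged model falls short of actual scoring by about 24%, while the error-adjusted model overshoots by about 2.5%.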

                          So yes, it does seem that the errors account for a good deal of the error.
