
question about Extrapolated Runs


  • #16
    The Hidden Game formula was published in ... The Hidden Game. And its weights are very close to Matt's model, and to the ones that I have published, whether on my site or in The Book.

    Pete Palmer was a genius.
    Author of THE BOOK -- Playing The Percentages In Baseball



    • #17
      Whoops...I misread Tango's numbers...thought he was saying the HG numbers were the ones he listed. My apologies.

      And yes...Pete Palmer was indeed a genius...he had a lot less to work with than we do and he got very close to a lot of truths which is really quite impressive.



      • #18
        Originally posted by Tango Tiger:
        The Hidden Game formula was published in ... The Hidden Game.
        Go figure. I am very aware of the Linear Weight values that come from The Hidden Game. I just didn't recall that Palmer had actually originally specified it as (.16*H)+(.31*TB). I thought that was just something you had observed.



        • #19
          No, that was his shorthand that he used after presenting the "era" numbers.
          Author of THE BOOK -- Playing The Percentages In Baseball



          • #20
            Tom,

            Just broke out my copy of THG and re-read the primer chapter on Linear Weights and there it was, just like you said. Interestingly enough, Palmer actually presents the condensed version as (.47 * H) + (.31 * (TB-H)), which is mathematically equivalent to what you presented and looks a lot like the thing Matt and I cooked up (albeit with preferable weights).

            The other construct is actually similar to the original presentation of Johnson's ERP which was (.16 * H) + (.32 * TB).
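The equivalence of the two Hidden Game forms can be checked numerically. What follows is a minimal Python sketch that assumes only the coefficients quoted in this thread; the function names are illustrative and do not come from The Hidden Game or from the ERP write-up.

```python
# Total bases contributed by one hit of each type.
BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def shorthand(h, tb):
    """Shorthand quoted earlier in the thread: (.16 * H) + (.31 * TB)."""
    return 0.16 * h + 0.31 * tb


def condensed(h, tb):
    """Condensed form from the primer chapter: (.47 * H) + (.31 * (TB - H))."""
    return 0.47 * h + 0.31 * (tb - h)


def erp_hits(h, tb):
    """Hit portion of the original ERP presentation: (.16 * H) + (.32 * TB)."""
    return 0.16 * h + 0.32 * tb


def per_event_weights(formula):
    """Run value implied for a single hit of each type (H = 1, TB = bases)."""
    return {event: round(formula(1, bases), 2) for event, bases in BASES.items()}


if __name__ == "__main__":
    for name, formula in [("shorthand", shorthand),
                          ("condensed", condensed),
                          ("ERP hits", erp_hits)]:
        print(name, per_event_weights(formula))
```

Both Hidden Game forms imply .47 / .78 / 1.09 / 1.40 for the single, double, triple and home run, while the ERP version implies .48 / .80 / 1.12 / 1.44.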

            Now, about XR... you had stated that it was the result of regression. While it did start with regression, Furtado also claimed to have included helpings of empirical analysis, peer review and comparison to other models in concocting his final stew. So now I'm wondering, if the net result of the XR weights and LWTS for 1b,2b,3b,hr are so similar for the time period in question, why didn't he just concede to conventional wisdom? It makes me wonder how much of XR's accuracy would be lost by substituting the LWTS values for these events as opposed to the XR versions.

            I think I just found a way to kill the next few hours.



            • #21
              It looks to me that he went half-way between logic/empirical and regression. If you are looking to lower your RMSE on the data you are basing your equation on, then regression will certainly help you... but it won't help with the out-of-sample data, where logic will win.
              Author of THE BOOK -- Playing The Percentages In Baseball



              • #22
                So, here's what I did... I took the HG weights and scaled them proportionately, so that the difference between the sums of the weighted values for 1b/2b/3b/hr in XR and LWTS was zeroed out for the 1955-1997 time period. Here are the values I ended up with...

                1B: .48
                2B: .80
                3B: 1.11
                HR: 1.43

                I then substituted these weights in the XR formula and re-ran the tests that Furtado had originally done in the 1999 BBBA. The results...

                The revised XR formula suffered degradation from an RMSE of 20.9 to 21.3 for the entire 1955-1997 time period. Likewise, there was a comparable drop in accuracy for each of the individual decade tests 1955-59, 1960's, 1970's, 1980's and 1990-97.

                What does it all mean? Probably not very much, but it does seem to suggest that there was method to the madness in deriving the original XR weights, as strange as they seem. If nothing else, XR does give us a level of accuracy to shoot for in measuring run estimators for teams in the normal range.
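The scaling step and the accuracy measure described in this post can be sketched roughly as follows. This is an illustration, not the actual calculation. The HG weights are the per-event values implied by the condensed Hidden Game formula quoted earlier in the thread. The XR hit weights are assumed from the commonly cited form of Furtado's formula and are not stated anywhere in this thread. The event totals are made-up placeholders, not 1955-1997 data.

```python
import math

HG_WEIGHTS = {"1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40}
# Assumption: hit weights from the commonly cited XR formula.
XR_WEIGHTS = {"1B": 0.50, "2B": 0.72, "3B": 1.04, "HR": 1.44}

# Hypothetical event totals, for illustration only.
TOTALS = {"1B": 1000, "2B": 270, "3B": 35, "HR": 150}


def weighted_sum(weights, totals):
    """Runs credited to hits: sum of weight * count over the four hit types."""
    return sum(weights[event] * totals[event] for event in weights)


def scale_to_match(source, target, totals):
    """Scale every source weight by one factor so both weighted sums agree."""
    factor = weighted_sum(target, totals) / weighted_sum(source, totals)
    return {event: weight * factor for event, weight in source.items()}


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual run totals."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    scaled = scale_to_match(HG_WEIGHTS, XR_WEIGHTS, TOTALS)
    print({event: round(weight, 2) for event, weight in scaled.items()})
```

The `rmse` function here is the plain root-mean-square error; the BBBA tests may have defined it slightly differently (for example, with a different divisor).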



                • #23
                  No, you are wrong.

                  All the regression does is best-fit the data for the sample, but it doesn't work in the out-of-sample. I once took the Red Sox data, and I ended up with a regression where the run value of the triple was higher than that of a HR.

                  What you have to do is derive the components in your sample, and test it in the out-of-sample.

                  Otherwise, why not go all the way, and use just regression, where you'll end up with a run value of .66 for the double?
                  Author of THE BOOK -- Playing The Percentages In Baseball



                  • #24
                    http://www.knology.net/~johnfjarvis/runs_survey.html

                    See tables 2 and 4 above.

                    If you are looking for the lowest RMSE, then you want the run value of the double as .63, just 0.15 runs more than a single!

                    The only way to test, is to develop the equation in your sample, and test out-of-sample.
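The in-sample versus out-of-sample point can be illustrated with a small simulation. This is a sketch on synthetic data, not a reproduction of any study mentioned here. The "true" weights stand in for logically derived values, and the event ranges, noise level and sample sizes are arbitrary assumptions. The least-squares fit uses the normal equations with a hand-rolled solver so the example needs only the standard library.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, runs):
    """Least-squares weights (no intercept) from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, runs)) for i in range(k)]
    return solve(xtx, xty)


def rmse(weights, rows, runs):
    """Root-mean-square error of the linear estimate against actual runs."""
    squared = [(sum(w * x for w, x in zip(weights, r)) - y) ** 2
               for r, y in zip(rows, runs)]
    return math.sqrt(sum(squared) / len(squared))


def simulate(n, true_weights, rng):
    """Synthetic team seasons: random 1B/2B/3B/HR counts plus noisy runs."""
    rows, runs = [], []
    for _ in range(n):
        row = [rng.randint(900, 1100), rng.randint(220, 320),
               rng.randint(15, 50), rng.randint(90, 220)]
        rows.append(row)
        runs.append(sum(w * x for w, x in zip(true_weights, row)) + rng.gauss(0, 25))
    return rows, runs


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [0.47, 0.78, 1.09, 1.40]
    train_x, train_y = simulate(40, true_w, rng)   # the sample
    test_x, test_y = simulate(400, true_w, rng)    # the out-of-sample
    fitted = fit(train_x, train_y)
    print("fitted weights:", [round(w, 2) for w in fitted])
    print("in-sample  fitted vs true:", rmse(fitted, train_x, train_y),
          rmse(true_w, train_x, train_y))
    print("out-sample fitted vs true:", rmse(fitted, test_x, test_y),
          rmse(true_w, test_x, test_y))
```

By construction the fitted weights can never do worse than the true weights on the sample they were fitted to. On the fresh sample that guarantee disappears, and with a small training sample the fitted weights will typically lose some accuracy there.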

                    However, note that each coefficient in the regression itself has a fairly wide confidence interval.

                    Finally, why test on aggregated team data, where you'll only have a few thousand data points, when you can test at the game or even inning level, where you will have tens of thousands of data points or more?
                    Author of THE BOOK -- Playing The Percentages In Baseball



                    • #25
                      Originally posted by Tango Tiger:
                      If you are looking for the lowest RMSE, then you want the run value of the double as .63, just 0.15 runs more than a single!
                      What is it about regression that would give rise to the undervaluing of the double - assuming there are enough data points to yield highly significant coefficient estimates?

                      Is it that regression is estimating the average value of each game event, whereas you're interested in the marginal values in a given offensive environment?


                      Originally posted by Tango Tiger:
                      Finally, why test on aggregated team data, where you'll only have a few thousand data points, when you can test at the game or even inning level, where you will have tens of thousands of data points or more.
                      I've been looking at generating weights for the Israel Baseball League, with only 6 teams and 41 games each. To get meaningful results, I have in fact gone down to the inning level, which gives me over 1600 independent data points. Since run scoring actually takes place on the inning level, not the game level, I expect it should also yield more accurate estimates of event values, since an event in inning 1 shouldn't be able to create runs in inning 2.
                      http://biblemetrics.blogspot.com
                      The blog for Israel Baseball League analysis



                      • #26
                        The point is that a few thousand data points don't give you a "significant" estimate, as you are trying to imply. The range for the .62 run value of a double at the 95% confidence interval is probably something like .62 +/- .20 runs or something. Couple that with the .47 run value of the single (with its smaller interval), and it's statistically possible that the run value of the single is higher than the double, according to these regressions. (And the triple being higher than HR, possibly.)
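To put a rough number on "statistically possible", one can treat each coefficient as a normal estimate and ask how likely it is that the true double value sits below the true single value. This sketch makes strong assumptions: the two estimates are treated as independent (regression coefficients usually are not), and the single's half-width of .05 is a made-up placeholder for "smaller interval". Only the .62 +/- .20 and .47 figures come from the post.

```python
from statistics import NormalDist

# z-value for a two-sided 95% interval (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def prob_first_below_second(est_a, half_a, est_b, half_b):
    """P(true a < true b) for independent normal estimates.

    half_a and half_b are the half-widths of 95% confidence intervals.
    """
    se_a = half_a / Z95
    se_b = half_b / Z95
    se_diff = (se_a ** 2 + se_b ** 2) ** 0.5
    # The difference a - b is normal; we want the mass below zero.
    return NormalDist(mu=est_a - est_b, sigma=se_diff).cdf(0.0)


if __name__ == "__main__":
    # Double: .62 +/- .20 (from the post); single: .47 +/- .05 (assumed).
    print(prob_first_below_second(0.62, 0.20, 0.47, 0.05))
```

Under these assumptions the chance comes out at roughly 8%: not likely, but far from negligible.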

                        The problem you have at the team/season level is that you have two huge biases: park and team. Rather than each "team" being a random sample of games, it's the same actual players all lumped together. Therefore, each team is biased by the identity of its players. On top of which, those players play in the same park half the time, which introduces another bias.

                        It's basically a joke that we even bother to run regressions like this.

                        As you correctly point out, runs are scored at the inning level. Why in the world anyone would aggregate 1458 innings into a single data point is beyond me.
                        Author of THE BOOK -- Playing The Percentages In Baseball



                        • #27
                          Interesting. The article you linked to doesn't give confidence levels.

                          For my IBL data, using a total of 1633 half-innings as separate data points, and the standard spreadsheet regression function, I get what appear to be significant coefficient values for all the non-rare game events. (Keep in mind that this is a much higher run scoring environment than the MLB.)

                          Out: -.156 +- .010
                          Single: .598 +- .016
                          Double: .845 +- .035
                          Triple: 1.221 +- .127
                          HR: 1.435 +- .042
                          Walk: .482 +- .018
                          HBP: .520 +- .038
                          Reach-on-error: .522 +- .062
                          Error without reach: .302 +- .050
                          Stolen base: .092 +- .026
                          Caught stealing: -.281 +- .058
                          GIDP: -.280 +- .056

                          Obviously, some of those error terms are too high for comfort.

                          I'm somewhat reassured, though, by the internal consistency among the figures: the same values for caught stealing and double plays, for example, and for hit-by-pitch and reach-on-error. The weight for triples may be dodgy, and I'm inclined to declare by fiat that a triple is the average of a double and a home run.

                          A double comes out to .247 more than a single.

                          Error terms for sacrifices and intentional walks are higher:

                          Sac fly: .404 +- .069
                          Sac hit: .105 +- .082
                          Intentional walk (adjustment to walk values): -.168 +- .114
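The consistency checks mentioned in this post are simple arithmetic on the listed values, sketched below. The estimates and error terms are copied from the list above. Summing two error terms to judge a gap is a conservative convention chosen for this illustration, since the post does not say whether the "+-" figures are standard errors or interval half-widths.

```python
# (estimate, error term) pairs copied from the post above.
IBL = {
    "single": (0.598, 0.016),
    "double": (0.845, 0.035),
    "triple": (1.221, 0.127),
    "hr": (1.435, 0.042),
    "hbp": (0.520, 0.038),
    "roe": (0.522, 0.062),
    "cs": (-0.281, 0.058),
    "gidp": (-0.280, 0.056),
}


def gap(a, b):
    """Difference between two estimates and the sum of their error terms."""
    (est_a, err_a), (est_b, err_b) = IBL[a], IBL[b]
    return round(est_a - est_b, 3), round(err_a + err_b, 3)


def triple_by_fiat():
    """Triple set to the midpoint of the double and home run estimates."""
    return round((IBL["double"][0] + IBL["hr"][0]) / 2, 3)


if __name__ == "__main__":
    print("double - single:", gap("double", "single"))
    print("roe - hbp:", gap("roe", "hbp"))
    print("gidp - cs:", gap("gidp", "cs"))
    print("triple by fiat:", triple_by_fiat())
```

The midpoint works out to 1.140, which is 0.081 below the regression value of 1.221 and so within the triple's own error term of .127.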


                          Should I be suspicious about these numbers just because they come from an (admittedly imperfect) regression?

                          On the other hand, the use of constant linear weights for game events is inherently inconsistent with the true nature of run creation. How much precision can we demand from a flawed method?
                          http://biblemetrics.blogspot.com
                          The blog for Israel Baseball League analysis



                          • #28
                            Actually IBL...your values for the main offensive events look about right for a league with such a high run scoring rate.



                            • #29
                              http://www.tangotiger.net/customlwts.html

                              Looking at your weights, I'd be suspicious of how close the walk and single are. Reached on error should be very close to, or higher than, a single, and not close to a HBP.

                              How many runs per 27 outs does your league score? Looking at your weights, and my link above, I'll guess about 8?
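Runs per 27 outs is just the league's total runs scaled to a nine-inning game's worth of outs. A minimal sketch, with a hypothetical function name and placeholder inputs rather than IBL totals:

```python
def runs_per_27_outs(runs, outs):
    """Scale a run total to a nine-inning game's worth of outs (27)."""
    if outs <= 0:
        raise ValueError("outs must be positive")
    return 27.0 * runs / outs


if __name__ == "__main__":
    # Placeholder totals, not actual league data.
    print(runs_per_27_outs(400, 1350))
```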
                              Author of THE BOOK -- Playing The Percentages In Baseball



                              • #30
                                Originally posted by Tango Tiger:
                                No, you are wrong.
                                Huh? I wasn't aware that I had actually said anything that I could be wrong about, as I was simply stating facts and the results of my experiment. But if you say so, then I probably am.

                                My efforts here were just an exercise in curiosity. I wanted to see how much damage was done to the accuracy of XR by substituting the more conventional values into XR. I was optimistic that the difference would have been negligible, but I don't think I would call it that.

                                I was working within the confines of the original tests more out of convenience for comparison (or maybe laziness). I don't disagree with anything you've said here about more effective ways to test the accuracy of estimators.

                                Now about those 2B values...

                                In "Curve Ball" there is a chapter on the development of event weights using Linear Regression. After starting with a series of weights derived simply from 1B, 2B, 3B, HR, BB and SB, they start adding other events into the mix. They are ultimately able to reduce the RMSE by adding in SF. But it turns out that the SF has a positive weight greater than anything other than a 3B or HR, which is absolutely ludicrous. They suggest that it is most likely saying 1) something about the definition of the stat itself (i.e. the only official stat other than the HR that guarantees a run) or 2) something about the situational nature of the stat (runner on 3B, less than 2 out).

                                Like the Iblemetrician, I too am curious about what the regression numbers might be telling us. I don't think that either 1) or 2) from above apply to the 2B. Perhaps there's something about the nature of teams that hit very few or very many 2B's. What do you think?
