
Introduction to Sabermetrics


  • Introduction to Sabermetrics

    I'm very interested in learning about sabermetrics. I've found a lot of stuff on the internet through searches, but would like to cut down some time weeding through the pile. Can you please recommend some good sources that may be better suited for the basics and a beginner like myself? I plan to visit my local book store to sample the baseball section for info on sabermetrics. Thanks for any help you can provide in pointing me in the right direction.

    Perfection is not attainable. But, if we chase perfection, we can catch excellence. -- Vince Lombardi

  • #2
    Moneyball is a good start. as are articles by Rob Neyer at is another good one.
    as is

    then there's always just asking questions here or at
    RIP Dimebag, Mitch, John, & Grey Cat


    Angel of Death
    Monarch to the kingdom of the dead
    Infamous butcher,
    Angel of Death


    • #3
      I would also recommend you read the articles posted at and and ask questions about what you don't understand here or within those sites...the best way to learn is to read a lot and talk to other baseball analysts...that's how I learned.


      • #4
        Read the Bill James Annual Baseball Abstracts. They're a bit hard to find, but one can learn much from them.
        Strikeouts are boring! Besides that, they're fascist. Throw some ground balls - it's more democratic.-Crash Davis


        • #5
          I would say read Total Baseball as well, if they're still publishing it.


          • #6
            Go down to your library and head for the baseball section. If you truly live in Anderson, SC, then your library has the 1984 Bill James Baseball Abstract. Check that out, plus a host of other books your library system has.

            Rob Neyer would have been a great start, but his archives and current stuff are no longer free.


            • #7
              Well that's pretty sucky...that Neyer has gone corporate...

              But I understand it...I will eventually be corporate as well...sort of...I'll eventually be selling what I have.


              • #8
                Neyer is great for beginners but outdated for anybody who has been reading his, or really anybody else's, stuff for a while. It's odd: most people who know Neyer and his work are vets at this, and they are also the ones least likely to spend money on his articles. On the flip side, it's the new guys who are most likely to enjoy his stuff, yet they are not likely to buy his column since they have never had any experience with his work.

                I like Neyer but his stuff stopped becoming really interesting a few years back for me.


                • #9
                  Agreed on that Cubbie.

                  Neyer is like a crutch for sabermetric newbies.


                  • #10
                    Thanks to all for the help. My interest in sabermetrics started after I read Michael Lewis' Moneyball last week. I have a general understanding of some basic metrics like OPS, K:BB, K/9, and BB/9. I'm just now starting to read about stuff like Defense Independent metrics and PECOTA projection systems on the Baseball Prospectus site. I've also been checking out the Baseball Think Factory, The Baseball Cube, and Baseball HQ.

                    Thanks Cubbieinexile about the tip of checking out the library. I do indeed live in Anderson, SC. That is an excellent idea. I will pay a visit to the library before the bookstore. A great way to keep the cost down.

                    Also, I plan to hang around the Baseball Fever site. I found this site after doing a search for sabermetrics.
                    Perfection is not attainable. But, if we chase perfection, we can catch excellence. -- Vince Lombardi


                    • #11
                      Go me...Go Cubbie...Go us for discussing sabermetrics often enough to get listed in online searches for that term.


                      • #12

                        Willie Runquist (who I was not then, nor am now familiar with) wrote an excellent book titled Baseball by the Numbers: How Statistics Are Collected, What They Mean, and How They Reveal the Game. It may require some understanding of math, but in general Runquist explains the concepts very well. I would say it is recommended, if not required, reading. I was lucky enough to have found it at my college library. For baseball books, I usually just search the library catalog for any baseball book and then go to that section and look through all the books. Don't forget reference works. My college library has two editions of Total Baseball in the reference section.

                        In addition to that, Baseball Prospectus has archived a series of articles for the newly interested. Then there are the HighBoskageHouse White Papers, which should give you a good idea of what, how, and why some things are done. HighBoskageHouse is Eric Walker's site, the same Eric Walker mentioned all too briefly in Moneyball as having written the "pamphlet" that Alderson gave to Beane.

                        I should not forget to mention Diamond Mind and Tom Tippett. Their site has a series of studies, as well. And Tangotiger, and a site called Sabernomics, which has a number of baseball studies. If you Google enough, you'll eventually find a helluva lot of sites dedicated to baseball research. The Hardball Times is another one.

                        Going through all of that would see you off to a running start, I'd imagine.

                        Edit: Yikes! Right on the heels of SABR Matt and I forget to mention At Home Plate, which has a series of articles under the name Serious Baseball that are archived, among other things.
                        Last edited by therealnod; 06-25-2005, 12:30 PM.


                        • #13
                          Most of that is pretty advanced...start with the basics first.

                          But yes, therealnod has some very good ideas for where to go when you feel comfortable with the Jamesian sabermetrics...the basic stuff.


                          • #14
                            Also that book is in the library in South Carolina.


                            • #15
                              Bowling Green University has a good explanation of what sabermetrics is. You can read the whole article to find out about some basic metrics, or just read the intro to get an idea of what exactly sabermetrics is. It seems that a link won't take you to the correct place, so I'll just give you the whole article right here:

                              An Introduction to Sabermetrics

                              by Jim Albert

                              What is Sabermetrics?

                              Sabermetrics is the mathematical and statistical analysis of baseball records. To understand the field of sabermetrics, one first should be familiar with the game of baseball. This sport is one of the most popular games in the United States; it is often called the national pastime. Baseball began in the eastern United States in the mid-1800's. Professional baseball started near the end of the 19th century; the National League was founded in 1876 and the American League in 1900. Currently in the United States, there are 28 professional teams in the American and National Leagues and millions of people watch games in ballparks and on television.

                              The game of baseball

                              The game of baseball is played between two teams, each consisting of nine players. The nine players are a pitcher, a catcher, first baseman, second baseman, shortstop, third baseman, left fielder, center fielder and right fielder. A game of baseball consists of nine innings. One inning is divided into two halves; in the top half of the inning, one team plays in the field and the second team comes to bat, and in the bottom half, the teams reverse roles. The team that is batting during a particular half-inning is trying to score runs. The team with the higher number of runs at the end of the nine innings is the winner of the game.

                              During an inning, a player on the team in the field, called a pitcher, throws a baseball toward a player of the team at-bat, called the batter. The batter will try to hit the ball using a wooden stick (called a bat) in a location out of the reach of the players in the field. By hitting the ball, the batter has the opportunity to run around four bases that lie in the field. If a player advances around all of the bases, he has scored a run. If a batter hits a ball that can be caught, or that can be thrown to first base before he runs to that base, then he is said to be out, and cannot score a run. A batter is also out if he fails to hit the baseball three times or if three good pitches (called strikes) have been thrown. The objective for the batting team during an inning is to score as many runs as possible before obtaining three outs.

                              The basic batting statistics

                              One notable aspect of the game of baseball is the wealth of numerical information that is recorded about the game. The effectiveness of batters and pitchers is typically assessed by particular numerical measures. The usual measure of hitting effectiveness for a player is the batting average which is computed by dividing the number of hits by the number of at-bats. This statistic gives the proportion of opportunities (at-bats) in which the batter succeeds (gets a hit). The batter with the highest batting average during a baseball season is called the best hitter that year. Batters are also evaluated on their ability to reach one, two, three, or four bases on a single hit; these hits are called respectively singles, doubles, triples, and home runs. The slugging average is computed by dividing the total number of bases (in short, total bases) by the number of opportunities. Since it weights hits by the number of bases reached, this measure reflects the ability of a batter to hit a long ball for distance. The most valued hit in baseball is the home run where a player advances four bases on one hit. The number of home runs is recorded for all players and the batter with the largest number of home runs at the end of the season is given special recognition.
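
As a rough illustration of the two averages described above, here is a minimal Python sketch. The function names and the season line are made up for the example; only the definitions (hits over at-bats, total bases over at-bats) come from the article.

```python
def batting_average(hits, at_bats):
    """Proportion of at-bats in which the batter gets a hit."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def total_bases(singles, doubles, triples, home_runs):
    """Each hit counts for the number of bases the batter reaches."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging_average(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return total_bases(singles, doubles, triples, home_runs) / at_bats


# Hypothetical season line, invented for illustration only.
singles, doubles, triples, home_runs, at_bats = 100, 30, 5, 15, 500
hits = singles + doubles + triples + home_runs

print(f"{batting_average(hits, at_bats):.3f}")  # 0.300
print(f"{slugging_average(singles, doubles, triples, home_runs, at_bats):.3f}")  # 0.470
```

Two hitters with the same batting average can have very different slugging averages, which is the gap the later sections try to close.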

                              The basic pitching statistics

                              A number of statistics are also used in the evaluation of pitchers. For a particular pitcher, one counts the number of games in which he was declared the winner or loser and the number of runs allowed. Pitchers are usually rated in terms of the average number of "earned" runs allowed for a nine inning game. Other statistics are useful in understanding pitching ability. A pitcher records a strikeout when the batter fails to hit the ball in the field and records a walk when he throws four inaccurate pitches (balls) to the batter. A pitcher who can throw the ball very fast can record a high number of strikeouts. A pitcher who is "wild" or relatively inaccurate will record a large number of walks.

                              Better measure of hitting ability -- runs created

                              One goal of sabermetrics is to find good measures of hitting and pitching performance. Bill James (1982) compares the batting records of two players from different eras, Johnny Pesky and Dick Stuart. Pesky was a batter who hit for a high batting average but hit few home runs. Stuart, in contrast, had a modest batting average, but hit a high number of home runs. Who was the more valuable hitter? James argues that a hitter should be evaluated by his ability to create runs for his team. From an empirical study of a large collection of team hitting data, he established the following formula for predicting the number of runs scored in a season based on the number of hits, walks, at-bats, and total bases recorded in a season.

                              $$\text{RUNS} = \frac{(\text{HITS} + \text{WALKS}) \times (\text{TOTAL BASES})}{\text{AT-BATS} + \text{WALKS}}$$

                              This formula reflects two important aspects in scoring runs in baseball. The number of hits and walks of a team reflects the team's ability to get runners on base. The number of total bases of a team reflects the team's ability to move runners that are already on base. This runs created formula can be used at an individual level to compute the number of runs that a player creates for his team. In 1942, Johnny Pesky had 620 at-bats, 205 hits, 42 walks, and 258 total bases; using the formula, he created 96 runs for his team. Dick Stuart in 1960 had 532 at-bats with 160 hits, 34 walks, and 309 total bases for 106 runs created. The conclusion is that Stuart in 1960 was a slightly better hitter than Pesky in 1942 since he created a few more runs for his team.
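
The Pesky and Stuart figures can be checked with a few lines of Python. This is only a sketch of the basic runs created formula as written above; the function name is made up, and Stuart's line is taken as 160 hits in 532 at-bats.

```python
def runs_created(hits, walks, at_bats, total_bases):
    """Basic runs created: (hits + walks) * total bases / (at-bats + walks)."""
    chances = at_bats + walks
    if chances <= 0:
        raise ValueError("at_bats + walks must be positive")
    return (hits + walks) * total_bases / chances


# Season lines as quoted in the article.
pesky_1942 = runs_created(hits=205, walks=42, at_bats=620, total_bases=258)
stuart_1960 = runs_created(hits=160, walks=34, at_bats=532, total_bases=309)

print(round(pesky_1942), round(stuart_1960))  # 96 106
```

The first factor is the on-base part and the second is the advancement part, so a low-average slugger and a high-average singles hitter can land close together.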

                              Linear weights

                              An alternative approach to evaluating batting performance is based on a linear weights formula. George Lindsey (1963) was the first person to assign run values to each event that could occur while a team was batting. By the use of recorded data from baseball games and probability theory, he developed the formula

                              RUNS = (.41) 1B + (.82) 2B + (1.06) 3B + (1.42) HR

                              where 1B, 2B, 3B, and HR are respectively the number of singles, doubles, triples, and home runs hit in a game. One notable aspect of this formula is that it recognizes that a batter creates a run three ways. There is a direct run potential when a batter gets a hit and gets on base. In addition, the batter can advance runners that are already on base. Also, by not getting an out, the hitter allows a new batter a chance of getting a hit, and this produces an indirect run potential. Thorn and Palmer (1993) present a more sophisticated version of the linear weights formula which predicts the number of runs produced by an average baseball team based on all of the offensive events recorded during the game. Like James' runs created formula, the linear weights rule can be used to evaluate a player's batting performance.
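
Lindsey's formula is just a weighted sum, so it is easy to sketch in Python. The weights below are the ones quoted above; the dictionary keys, function name, and hit counts are invented for the example.

```python
# Run values per hit type, as quoted from Lindsey (1963).
LINDSEY_WEIGHTS = {"1B": 0.41, "2B": 0.82, "3B": 1.06, "HR": 1.42}


def linear_weights_runs(hit_counts):
    """Weighted sum of hit counts; hit types missing from the dict count as zero."""
    return sum(weight * hit_counts.get(event, 0)
               for event, weight in LINDSEY_WEIGHTS.items())


# Hypothetical hit totals, for illustration only.
example = {"1B": 100, "2B": 30, "3B": 5, "HR": 20}
print(f"{linear_weights_runs(example):.1f}")  # 99.3
```

Note that a home run is worth about three and a half singles under these weights, not four, which is one way this differs from slugging average.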

                              Runs to wins

                              Although scoring runs is important in baseball, the basic objective is for a team to score more runs than its opponent. To learn about the relationship between runs scored and the number of wins, James (1982) looked at the number of runs produced, the number of runs allowed, the number of wins and the number of losses during a season for a large number of recent major league teams. James noted that the ratio of a team's wins to losses was approximately equal to the square of the ratio of runs scored to the runs allowed. Equivalently,

                              $$\frac{\text{WINS}}{\text{WINS} + \text{LOSSES}} = \frac{\text{RUNS}^2}{\text{RUNS}^2 + \text{OPPOSITION RUNS}^2}$$

                              This relationship can be used to measure a batter's performance in terms of the number of wins that he creates for his team.
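
Here is a small Python sketch of James' runs-to-wins relationship. The function names, the 162-game default, and the 800/700 run totals are assumptions made for the example, not numbers from the article.

```python
def expected_win_pct(runs_scored, runs_allowed):
    """Winning percentage from the squares of runs scored and runs allowed."""
    rs2 = runs_scored ** 2
    ra2 = runs_allowed ** 2
    if rs2 + ra2 == 0:
        raise ValueError("at least one run total must be nonzero")
    return rs2 / (rs2 + ra2)


def expected_wins(runs_scored, runs_allowed, games=162):
    """Expected wins over a season; 162 games is an assumed schedule length."""
    return games * expected_win_pct(runs_scored, runs_allowed)


# Hypothetical team: 800 runs scored, 700 allowed.
print(f"{expected_win_pct(800, 700):.3f}")  # 0.566
print(f"{expected_wins(800, 700):.1f}")     # 91.8
```

To put a single batter in terms of wins, one rough approach is to add his runs created to an otherwise average team's runs scored and see how much the expected win total moves.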

                              Better measure of pitching ability

                              Sabermetrics has also developed better ways of evaluating pitching ability. The standard pitching statistics, the number of wins and the earned runs per game (ERA) are flawed. The number of wins of a pitcher can just reflect the fact that he pitches for a good offensive (run scoring) team. The ERA does measure the rate of a pitcher's efficiency, but it does not tell you about the actual benefit of this pitcher over an entire season. Thorn and Palmer (1993) developed the pitching runs formula

                              $$\text{PITCHING RUNS} = \text{Innings Pitched} \times \frac{\text{League ERA}}{9} - \text{ER}$$

                              The factor (League ERA/9) measures the average runs allowed per inning for all teams in the league. This value is multiplied by the number of innings pitched by that pitcher --- this product represents the number of runs that pitcher would allow over the season if he was average. Last, one subtracts the actual earned runs (ER) the pitcher allowed for that season. If the pitching runs is larger than 0, then this pitcher is better than average. This new measure appears to be useful in measuring the efficiency and durability of a pitcher.
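
A quick Python sketch of the pitching runs calculation, next to plain ERA for comparison. The function names and the pitcher's line (225 innings, 80 earned runs, league ERA of 4.00) are hypothetical.

```python
def era(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


def pitching_runs(innings_pitched, earned_runs, league_era):
    """Runs an average pitcher would allow in the same innings, minus actual earned runs."""
    league_runs_per_inning = league_era / 9
    return innings_pitched * league_runs_per_inning - earned_runs


# Hypothetical pitcher, for illustration only.
print(f"{era(80, 225):.2f}")                  # 3.20
print(f"{pitching_runs(225, 80, 4.00):.1f}")  # 20.0
```

Unlike ERA, the result grows with innings pitched, so a good pitcher who throws a full season rates ahead of an equally good one who throws half of it.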

                              Player game percentage

                              Good measures of hitting, pitching, and fielding performance of baseball players have been developed. However, these statistics do not directly measure a player's contribution to a win for his team. Bennett and Flueck (1984) used data from two baseball seasons to estimate the probability the home team wins a game given the run differential (the home team runs minus visiting team runs), the half inning (top or bottom of the inning), the number of outs, and the on-base situation. Using these estimated probabilities, one can see how the probability of winning changes for each game event. One can measure a player's contribution to winning a game by summing the changes in win probabilities for each play in which the player has participated. This statistic, called the Player Game Percentage, was used by Bennett (1993) to evaluate the batting performance of Joe Jackson. This player was banished from baseball for allegedly throwing the 1919 World Series. A statistical analysis using the Player Game Percentage showed that Jackson played to his full potential during this series.
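
The summing step behind Player Game Percentage can be sketched like this in Python. The win probabilities below are invented; in the real method they would come from the Bennett and Flueck estimates for each game situation, and how credit is shared among the players involved in a play is not covered here.

```python
from collections import defaultdict


def sum_win_probability_changes(plays):
    """Add up (win probability after - before) for each player's plays.

    `plays` is a list of (player, wp_before, wp_after) tuples, with both
    probabilities taken from the point of view of the player's own team.
    """
    totals = defaultdict(float)
    for player, wp_before, wp_after in plays:
        totals[player] += wp_after - wp_before
    return dict(totals)


# Hypothetical plays and probabilities, for illustration only.
plays = [
    ("Batter A", 0.50, 0.58),  # single with a runner on
    ("Batter B", 0.58, 0.52),  # strikeout
    ("Batter A", 0.52, 0.71),  # late home run
]
totals = sum_win_probability_changes(plays)
print({player: round(value, 2) for player, value in totals.items()})
```

A hit in a close game late moves the probability far more than the same hit in a blowout, which is exactly what the counting statistics miss.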

                              Making adjustments

                              People are often interested in comparing batters or pitchers from different eras. In making these comparisons, it is important to view batting or pitching statistics in the context in which they were achieved. For example, Bill Terry led the National League in 1930 with a batting average of .401, a mark that has been surpassed since by only one hitter. In 1968 Carl Yastrzemski led the American League in hitting with an average of .301. It appears on the surface that Terry was the clearly superior hitter. However, when viewed relative to the hitters that played during the same time, both hitters were approximately 27 percent better than the average hitter (Thorn and Palmer, 1993). The hitting accomplishments of Terry in 1930 and Yastrzemski in 1968 were actually very similar. Likewise, there are significant differences in hitting in different ball parks, and hitting statistics need to be adjusted for the ball park played to make accurate comparisons between players.
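
The adjustment is a simple ratio of the player's average to the league's. The article doesn't give the league averages, so the Python sketch below back-calculates what they would have to be for the 27 percent figure to hold; those implied numbers are derived here, not looked up, and the function names are made up.

```python
def relative_average(player_avg, league_avg):
    """Player's batting average as a multiple of the league average."""
    if league_avg <= 0:
        raise ValueError("league_avg must be positive")
    return player_avg / league_avg


def implied_league_average(player_avg, relative):
    """League average that would make the player `relative` times the league."""
    if relative <= 0:
        raise ValueError("relative must be positive")
    return player_avg / relative


# 1.27 means "27 percent better than the average hitter", as quoted above.
terry_league = implied_league_average(0.401, 1.27)
yaz_league = implied_league_average(0.301, 1.27)
print(f"{terry_league:.3f} {yaz_league:.3f}")  # 0.316 0.237
```

So a .401 in a league hitting around .316 and a .301 in a league hitting around .237 stand out by the same margin.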

                              Learning from selected data

                              Watching a baseball game raises questions that motivate interesting statistical analyses. During the broadcast of a game, a baseball announcer will typically report selected hitting data for a player. For example, it may be reported that Barry Bonds has 10 hits in his most recent 20 at-bats. What have you learned about Bonds' batting average on the basis of this information? Clearly, Bonds' batting average can't be as large as 10/20 = .500 since this data was chosen to maximize the reported percentage. Casella and Berger (1994) construct the likelihood function for a player's true batting average on the basis of this selected information and find the maximum likelihood estimate. They conclude that this selected data only provides a little insight into the "complete data" batting average that is obtained from batting records over the entire season.


                              Another interesting question is on the existence of streakiness in hitting data. During a season it is observed that some ballplayers will experience periods of "hot" hitting where they will get a high proportion of hits. Other hitters will go through slumps or periods of hitting with very few hits. But these periods of hot and cold hitting may be just a reflection of the natural variability observed in coin tossing. Is there statistical evidence for a "hot hand" among baseball hitters where the probability of obtaining a hit is dependent on recent at-bats? Albright (1993) looked at a large collection of baseball hitting data and used a number of statistics such as the number of runs to detect streakiness in hitting data. His main conclusion was that there is little statistical evidence generally for a hot hand in baseball hitting.
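
"Runs" here means unbroken stretches of hits or of outs, not runs scored. One standard way to use that count is the Wald-Wolfowitz runs test, sketched below in Python. Whether this matches Albright's exact statistic is an assumption, and the short at-bat sequence is made up.

```python
import math


def count_runs(outcomes):
    """Number of unbroken stretches of identical outcomes (1 = hit, 0 = out)."""
    if not outcomes:
        return 0
    return 1 + sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)


def runs_test_z(outcomes):
    """Z-score of the observed run count against a no-streakiness baseline.

    A clearly negative value means fewer, longer stretches than chance
    would give, which is what a hot hand would look like.
    """
    n_hits = sum(1 for o in outcomes if o)
    n_outs = len(outcomes) - n_hits
    n = n_hits + n_outs
    if n_hits == 0 or n_outs == 0:
        raise ValueError("need at least one hit and one out")
    expected = 2 * n_hits * n_outs / n + 1
    variance = 2 * n_hits * n_outs * (2 * n_hits * n_outs - n) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short for the test")
    return (count_runs(outcomes) - expected) / math.sqrt(variance)


# Hypothetical sequence of at-bats: hit, hit, out, out, hit.
sequence = [1, 1, 0, 0, 1]
print(count_runs(sequence))  # 3
print(runs_test_z(sequence))
```

The normal approximation only means something for long sequences, so a real check would use a full season of at-bats rather than five.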

                              Situational data

                              Currently there is great interest among fans and the media in situational baseball data. The hitting performance of batters is recorded for a number of different situations, such as day versus night games, on grass fields and artificial turf fields, against pitchers who throw right-handed and left-handed, and during home and away games. There are two basic questions in the statistical analysis of this type of data. First, are there particular situations that can explain a significant amount of variation in the hitting data? Second, are there ballplayers that perform particularly well or poorly in a given situation? Albert (1994) analyzed a large body of published situational data and used Bayesian hierarchical models to combine data from a large group of players. His basic conclusion is that there do exist some important situations. For example, batters hit on average 20 points higher when facing a pitcher of the opposite arm, and hit 8 points higher when they are playing in their home ballpark. However, there is generally little statistical evidence for individual differences in these situational effects.


                              Major league baseball is currently divided into six divisions and one goal of any team is to finish first in its division. Suppose that part of the season has been completed. Using the teams' records from this partial season, is it possible to predict accurately the winners of the divisions? Barry and Hartigan (1993) use a choice model for the probability that a team wins an individual game. This model allows for different strengths between the teams, different home advantages, and team strengths that can change randomly with time. The authors use this model to simulate the results of future baseball games and estimate the probability that each team will win its division.

                              Currently, major league baseball games are recorded in very fine detail. Information about every single ball pitched, fielded and hit during a game is noted, creating a large database of baseball statistics. This database is used in a number of ways. Public relations departments of teams use the data to publish special statistics about their players. The statistics are used to help determine the salaries of major league ballplayers. Specifically, statistical information is used as evidence in salary arbitration, a legal proceeding which sets salaries. A number of teams have employed full-time professional statistical analysts and some managers use statistical information in deciding on strategy during a game. Bill James and other baseball statisticians have shown that it is possible to answer a variety of questions about the game of baseball by means of statistical analyses.


                              Albert, J. (1994), "Exploring baseball hitting data: what about those breakdown statistics?", Journal of the American Statistical Association, 89, 1066-1074.

                              Albright, S. C. (1993), "A statistical analysis of hitting streaks in baseball," Journal of the American Statistical Association, 88, 1175-1183.

                              Barry, D., and Hartigan, J. A. (1993), "Choice Models for Predicting Divisional Winners in Major League Baseball," Journal of the American Statistical Association, 88, 766-774.

                              Bennett, J. M. (1993), "Did Shoeless Joe Jackson Throw the 1919 World Series?", The American Statistician, 47, 241-250.

                              Bennett, J. M. and Flueck, J. A. (1984), "Player Game Percentage", in Proceedings of the Social Statistics Section, American Statistical Association, 378-380.

                              Casella, G. and Berger, R. (1994), "Estimation With Selected Binomial Information or Do You Really Believe That Dave Winfield Is Batting .471?", Journal of the American Statistical Association, 89, 1080-1090.

                              James, B. (1982), The Bill James Baseball Abstract, New York: Ballantine Books.

                              Lindsey, G. (1963), "An Investigation of Strategies in Baseball," Operations Research, 11, 447-501.

                              Thorn, J. and Palmer, P. (1993), Total Baseball, New York: Harper Collins.

                              The books listed in the bibliography would all be good purchases if you're interested in the field of sabermetrics.
                              Last edited by 538280; 06-26-2005, 07:35 PM.

