**OVERVIEW**

One of the first missions of almost every sabermetrician is to determine a preferred strategy for rating the performance of baseball teams and players while keeping in mind the many complicating factors that distort statistics like wins and losses and run differentials.

There is a host of data available today that makes analysis of teams possible, but some understanding of the dynamic way in which those statistics combine to produce wins and losses is required, and this is not a simple matter. Empirical analysis has for years centered on the idea that averages tell enough of the story to be used as the backbone of any system designed to adjust raw statistics to account for the context in which they occurred. This document will explore the problems with empirical sabermetrics and introduce a new tool designed to bridge the gap between the intrinsic skill of the players and the real world statistics that define them.

**REVIEW OF EMPIRICAL METHODS** (empirical or traditional analysis includes my own work in the field...proto-PCA, for example)

Up until this moment, all documented analyses of player and team value have proceeded in a straightforward, logical fashion, going from point A to point B to point C in order.

*A) Rate the offensive context of the league.*

This has commonly been done through some variant of looking directly at the league average run scoring rate. If 10,000 runs scored in 2500 games, then the assumption was made that it was a 4 R/G league...that the other factors would essentially cancel out and that the average scoring rate would be fully explanatory.

*B) Rate the offensive context of the park as it relates to the league.*

Even in the most sophisticated of modern traditional park adjustments, this boils down to a direct comparison of the scoring rate in a given park (the home team and its opponents combined) and the scoring rates of that team and its opponents on the road. The best of methods is iterative...adjusting and readjusting to account for the effect each park has on the net park effect of "the road"...but these are not commonly used or published. What is typically available at ESPN.com or any source for baseball statistics is a simple ratio between the run scoring rate of the home park and the run scoring rate of everyone else.

*C) Combine the league and park contexts to come up with an average expectation to produce runs for each player and team.*

Because of the way traditional park factors are calculated (a ratio...the most natural thing to do with two sets of data that the statistician is trying to compare), the normal method for blending league and park statistics into one number (a R/G or R/O or R/PA type statistic) is to multiply the league scoring rate by the park adjustment for each team and use that as a basis for comparison.
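To make steps A through C concrete, here is a minimal sketch of the traditional pipeline in Python. The function names and the park numbers (5.2 R/G at home, 4.3 on the road) are hypothetical illustrations, not real statistics.

```python
# Sketch of the traditional A -> B -> C pipeline described above.
# All numbers are hypothetical illustrations, not real statistics.

def league_rate(runs, games):
    """Step A: league offensive context as a simple R/G average."""
    return runs / games

def park_factor(home_rpg, road_rpg):
    """Step B: ratio-style park factor (100 = neutral)."""
    return 100.0 * home_rpg / road_rpg

def expected_rpg(lg_rpg, pf):
    """Step C: blend league and park multiplicatively."""
    return lg_rpg * pf / 100.0

lg = league_rate(10_000, 2_500)   # the 4 R/G league from the example above
pf = park_factor(5.2, 4.3)        # a hypothetical hitter-friendly park
print(round(lg, 2), round(pf, 1), round(expected_rpg(lg, pf), 2))
```

The multiplicative blend in step C is exactly the step we will take issue with below.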

**PROBLEMS WITH TRADITIONAL CONTEXTUAL ANALYSIS**

*1) Missing Elements*

There has always been an assumption in empirical sabermetrics that the variation in run scoring at the league level can be entirely explained by the league. If the league scores five runs a game compared to an average league that scores 4.75 runs a game, there is an implicit assertion made that the league and only the league is responsible for that change. When you put the league context with the park adjustment, it is assumed that those two things combine to fully explain what we should expect from average talent in the same conditions.

There's a serious problem with that claim, however. It is fairly evident from just a quick glance at the rosters of the teams as the years pass that the balance of talent changes. Some years the pitching is a little better than others. Some years the hitting is a little better. It should be pretty clear looking at the rosters from 1999 that the hitting was better than the pitching.

Top Hitters from 1999 in no particular order

Alex Rodriguez

Sammy Sosa

Mark McGwire

Barry Bonds

Ken Griffey Jr

Edgar Martinez

Jason Giambi

Manny Ramirez

Vlad Guerrero

Mike Piazza...etc etc

Top Pitchers from 1999 in no particular order

Pedro Martinez

Roger Clemens

Greg Maddux

Randy Johnson

Curt Schilling

Kevin Brown

Mike Mussina

... ... ... ... uh ...

Obviously I'm leaving some names off and some of you reading this can fill in both lists with more detail, but it seems clear to me that there was a greater depth of hitting talent in 1999 than there was pitching talent.

To assume that 1968 was a great defensive year only because the league made it easier is to rip off the incredible depth of pitching and fielding talent and give too much credit to a mediocre crop of hitters by major league standards.

When park factors are calculated, there are two elements that are commonly ignored.

A) The opponents a team faces are not necessarily neutral competition. The late 90s Cleveland Indians did not face a league average offense overall when they played their games at Jacobs Field. Those Indians were a cut above the rest with the bat, which means the rest were a cut below normal by definition. Traditional park factors make no effort to account for this.

B) Players make adjustments to the parks in which they play. Some players do this better than others, but the personnel that play in any given park have a direct impact on how that park APPEARS to play (how offensively friendly it is). Some front offices do a great job acquiring players that maximize their potential because they are good matches for the home park. The 1998 Yankees had a lot of left handed hitters up and down the line-up, turning a normally neutral park into a hitter's haven...for the Yankees. The 2001 Mariners filled their outfield with defensively gifted players and loaded up on flyball pitchers to take advantage of the dead air in center.

*2) Runs are cumulative, not multiplicative.*

Traditional analysis as we have covered above includes a step where the league context is multiplied by a park adjustment to come up with a new expectation to score runs. But contexts shouldn't be multiplied like that. If the park makes the offensive environment more conducive to run scoring, it does not do so by multiplying the danger...it does so by ADDING runs to the scoreboard. Additive adjustments are less prone to the vagaries of small sample sizes, generally more stable, and more intuitive. They also translate more logically to player level analysis. If the park is adding a run per game (27 outs), one can easily see how it affects the players that play there. A multiplicative park factor will affect higher run scoring contexts more severely than lower scoring periods. If the league average R/G is 4 and the park factor is 120, then we are claiming it increases run scoring by 20 percent (0.8 runs). If the run scoring environment is changed to 5 R/G, the park didn't change at all, but one of two things is true...either the multiplicative factor remains 120 (and the park therefore adds 1 R/G)...or the amount the park adds to run scoring doesn't change (and the multiplicative factor drops to 116).
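The arithmetic in that paragraph is easy to verify with a few lines of Python (the helper names here are mine, purely for illustration):

```python
# Reproducing the arithmetic above: a multiplicative factor of 120 in a
# 4 R/G league adds 0.8 R/G.  If the league climbs to 5 R/G, either the
# factor stays 120 (and the park now "adds" 1.0 R/G), or the added runs
# stay fixed at 0.8 (and the implied factor drops to 116).

def runs_added(lg_rpg, factor):
    """Runs per game a multiplicative factor adds at a given league rate."""
    return lg_rpg * (factor / 100.0 - 1.0)

def implied_factor(lg_rpg, added):
    """The multiplicative factor implied by a fixed additive adjustment."""
    return 100.0 * (lg_rpg + added) / lg_rpg

print(round(runs_added(4.0, 120), 3))      # 0.8 R/G
print(round(runs_added(5.0, 120), 3))      # 1.0 R/G
print(round(implied_factor(5.0, 0.8), 1))  # 116.0
```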

It seems evident to us that the park effect should not depend on the offensive environment...the park has the impact it has...whether it's the deadball era or the rabbitball of 1930, if the park adds a run a game, it does it either way (unless of course it's doing that run adding by being homer friendly, in which case it's not likely to help deadball hitters much!).

*3) The Denominator is Wrong*

Traditional contextual analysis includes adjustments that take the form of a series of fractions of the form (Runs per Run). Park factors are formed by a ratio of the run scoring rate at home over the run scoring rate on the road. League contexts are set in essence by the ratio of the league's run scoring rate to the all time average scoring rate. The contexts themselves are not attached to anything...they're unitless multipliers that blow up with small sample sizes. In reality, any context has an increasing impact on a player or team the longer they spend in that context. And as we all know, time in the game of baseball is measured in outs, or when outs are not available, games. The denominator of any contextual adjustment should take the form of R/Out or R/G. As soon as you change the denominator to that form, it becomes very easy to see that contexts add together to explain run scoring changes.

*4) The elements that come together to explain runs are completely inter-related.*

Traditional sabermetric analysis proceeds from A to B to C, without stopping to fully appreciate how dependent each step of the analysis is on the steps that come before and after it. In order to know how the league impacted scoring, we need to know how the parks, the teams, and the players impacted the league...in order to know how the parks impacted scoring, we need information about the league, the teams, and the players...etc. What is needed is some sort of system of equations where each variable is considered as it relates to the others.

**THE LAW OF SUCCESSION**

As noted earlier, sabermetricians fight a constant battle with small sample sizes. Even a full major league season includes match-ups that only recur 6-12 times between pairs of teams. Getting information from these match-ups requires a more useful method than simply taking the statistics at face value. There is a wing of statistical analysis known as Bayesian probability. The general idea behind Bayesian probability is that we cannot assume we have seen an entire distribution simply because we have all of the available data. Just because two teams face each other 10 times and one of the teams wins all ten doesn't mean there is a 100% probability that the successful team will win the next game, or that if those ten games were replayed under identical conditions the results would be the same.

The Bayesian model starts with the assumption that every team, every park, every league is average and forces the statistics to prove or disprove this assumption, one run at a time. This methodology is the driving force in our analysis, and the idea came to us (myself and Randy Fiato...a programmer of great skill and tenacity and a budding sabermetrician in his own right) by way of a Dr. Colley of Princeton University, who used Bayesian probability to mathematically explain the success of college football teams and rank them (his matrix is still used today as a part of the BCS ranking system). His system is somewhat simpler: all football fields are the same dimension, he doesn't have to deal with a changing timeline, and his method deals only with ordinally ranking football teams, so a certain level of precision is not necessary to achieve the desired accuracy in the rankings. But beyond cosmetic differences, our approach relies on the same central theory - the law of succession.

Rather than assume that without any data present no conclusion can be drawn, we assume that in the absence of data, one conclusion MUST be drawn...that being that future events will occur at the average pace until proven otherwise.
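As a minimal sketch, Laplace's rule of succession captures the idea. (The actual FSIA centers its prior on the all-time average rather than on 0.5, so treat this as schematic, not as our exact formula.)

```python
# A minimal sketch of the law of succession (Laplace's rule): start from
# an assumption of averageness and let the data pull the estimate away
# from it one observation at a time.

def succession_estimate(successes, trials):
    """P(next success) = (s + 1) / (n + 2): 0-for-0 gives 0.5, not 0/0."""
    return (successes + 1) / (trials + 2)

print(succession_estimate(0, 0))    # 0.5 -- no data, assume average
print(succession_estimate(10, 10))  # ~0.917, NOT 1.0, after a 10-0 sweep
```

Note how a 10-0 sweep still leaves meaningful probability on the losing side, which is exactly the hedge against small samples described above.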

**THE MATRIX**

The unifying idea behind the Fiato-Souders Intrinsic Analysis Matrix can be summed up in one equation.

For any team: (ARSPG + OIRAAPG - DIRAAPG + LIRAAPG + PIRAAPG + OPR - DPR) = Actual Runs Scored - Actual Runs Allowed (both of which could be accurately predicted using only the components that apply)

Holy Acronyms, Batman! I think we need a decoder ring!

ARSPG -> Alltime Runs Scored per Game (per side)...this turns out to be approximately 4.76 Runs/Game/Side, excluding 1871-1875, which were not even slightly major league caliber baseball and would unnecessarily throw off the alltime scoring average. All additional contextual adjustments are relative to an "average" league.

OIRAAPG -> Offensive Intrinsic Runs Above Average Per Game...this term represents how many runs per game above the alltime scoring average this team's offense could be expected to score in an average league against average competition in a neutral park.

DIRAAPG -> Defensive Intrinsic Runs Above Average Per Game...this term represents how many runs per game above the alltime scoring average this team's defense could be expected to allow in an average league against average competition in a neutral park.

LIRAAPG -> League Intrinsic Runs Above Average Per Game...this term represents how many runs per game above the alltime scoring average this league would produce given neutral parks and average players and teams.

PIRAAPG -> Park Intrinsic Runs Above Average Per Game...this term represents how many runs per game above the alltime scoring average would score in this park given average teams and players, and an average league.

OPR -> Offensive Park Reactions...this term represents how the offensive players on each team did relative to what would be expected of them given the intrinsic strengths of the parks in which they played. This is a little harder to put in words, but to put it as simply as possible...if the park favors pitchers, and your team has found a way to score at an above average clip, you're doing something that is not statistically expected of you and that needs to be accounted for separately.

DPR -> Defensive Park Reactions...same as above only with defensive players (pitchers and fielders).

That sounds like a lot...but here's how it goes together. Each individual variable in the matrix is reported (these variables include the team's unique reactions to each and every park in which they played...one at a time...each park...the league as a whole...and the offenses and defenses of each team in the majors in a specific year and league) and placed in a linear equation where every other variable upon which it depends is set to an all time average (the Law of Succession...the average assertion). This is done for every variable in the history of the game...and those variables number greater than 1,000 in each year of the modern era.

Those equations are placed on the left hand side of a linear system of equations. They're set equal to the real world results on the right hand side (in the league row, the runs scored in that league would be recorded...in the intrinsic offense row, that team's runs scored would be recorded...etc). This system of equations can be solved using matrix algebra and what comes out the other end of that process is a set of results explaining each variable.
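A toy version of that system, with made-up numbers and only four variables (two offenses, two parks) instead of the 1,000+ in the real matrix, might look like this in Python:

```python
import numpy as np

# Toy version of the linear system described above.  Each row says:
# observed R/G = all-time average + offense deviation + park deviation.
# The variables and run rates here are hypothetical; the real matrix
# has 1,000+ variables per season and many more equation types.

ARSPG = 4.76
# columns: [off_A, off_B, park_X, park_Y]
rows = np.array([
    [1.0, 0.0, 1.0, 0.0],   # team A batting in park X
    [1.0, 0.0, 0.0, 1.0],   # team A batting in park Y
    [0.0, 1.0, 1.0, 0.0],   # team B batting in park X
    [0.0, 1.0, 0.0, 1.0],   # team B batting in park Y
])
observed = np.array([6.0, 5.0, 5.0, 4.0])   # hypothetical R/G results
rhs = observed - ARSPG                      # deviations from average

# Solve the system; lstsq handles the rank deficiency (offense and park
# columns overlap) by returning the minimum-norm solution.
solution, *_ = np.linalg.lstsq(rows, rhs, rcond=None)
print(np.round(solution, 2))   # [off_A, off_B, park_X, park_Y]
```

In this toy case team A's offense and park X come out above average by the same amount, and the solved deviations reproduce the observed scoring exactly; the real system works the same way, just at vastly larger scale.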

**RESULTS OF FSIA CALCULATIONS**

*1) What do the results look like?*

Top Fifty Teams since 1900

(In terms of Intrinsic Run Differential per Game)

Code:

Year Team InRD/G
1939 NYA  2.265
1927 NYA  2.197
1902 PIT  2.000
1936 NYA  1.945
1931 NYA  1.869
2001 SEA  1.790
1998 NYA  1.772
1906 CHN  1.723
1937 NYA  1.702
1929 PHA  1.644
1905 NY1  1.592
1932 NYA  1.586
1942 NYA  1.567
1944 SLN  1.530
1942 SLN  1.525
1904 NY1  1.504
1935 CHN  1.492
1953 BRO  1.492
1931 PHA  1.490
1969 BAL  1.489
1901 PIT  1.475
1903 BOS  1.452
1911 PHA  1.448
1998 HOU  1.447
1934 DET  1.441
1948 CLE  1.439
2001 OAK  1.410
1975 CIN  1.401
1921 NYA  1.369
2002 ANA  1.365
1935 DET  1.359
1995 CLE  1.351
1938 NYA  1.339
1974 LAN  1.336
1912 BOS  1.336
1912 NY1  1.330
1949 BRO  1.330
1998 ATL  1.328
1910 PHA  1.325
1942 BRO  1.321
1932 PHA  1.315
1909 PIT  1.298
1999 ARI  1.295
1922 SLA  1.289
1905 CHN  1.288
1955 BRO  1.286
1950 NYA  1.268
1909 PHA  1.266
1953 NYA  1.262
1901 CHA  1.261

Bottom Fifty Teams since 1900

Code:

Year Team InRD/G
1909 WS1 -1.480
1904 BSN -1.486
1963 NYN -1.514
1940 PHI -1.522
1906 BSN -1.534
1951 SLA -1.536
1955 KC1 -1.540
1935 BSN -1.548
1974 SDN -1.559
1953 DET -1.565
1920 PHA -1.583
1901 CIN -1.586
1948 CHA -1.590
1910 SLA -1.604
1923 PHI -1.612
1908 SLN -1.618
1979 OAK -1.619
1937 SLA -1.628
1952 PIT -1.635
1924 BSN -1.636
1926 BOS -1.636
1909 BSN -1.640
1925 BOS -1.653
1956 WS1 -1.676
1969 SDN -1.682
1941 PHI -1.701
1942 PHI -1.702
1905 BRO -1.719
1928 PHI -1.726
1939 PHI -1.749
1919 PHA -1.764
1904 WS1 -1.767
1921 PHI -1.773
1945 PHI -1.775
1954 PIT -1.778
1962 NYN -1.783
2002 DET -1.784
1936 PHA -1.812
1916 PHA -1.834
1939 SLA -1.849
1938 PHI -1.873
1911 BSN -1.887
1903 SLN -1.901
1996 DET -1.925
1932 BOS -1.930
2004 ARI -1.940
1954 PHA -1.974
1939 PHA -2.009
1915 PHA -2.024
2003 DET -2.112

It should be noted that these intrinsic calculations included the intrinsic offenses and defenses of each team as well as the team's unique park reactions (because park reactions are a skill that should be accounted for when rating the merits of teams).

*2) Benefits of the FSIA*

A) This represents the first system that attempts to credit the players, at least in part, for helping to create the changes in the run scoring environment.

Typically, the credit awarded to the offenses and defenses (one way or the other depending on the conditions in the league) is on the order of 50-400 runs over the course of an entire season for an entire league, so the credit is relatively small, but certain extreme seasons like 1999 in the NL, or 1968 in the NL, or 1987 in the AL swing further (1999, for instance, gives almost as much credit to the hitters as to the league itself for the huge spike in offensive production).

B) Park adjustments are *significantly* more conservative and more stable over time compared to the ratio factors currently available. When you apply a ratio factor of 120 (the Coors Field effect) to a player season, you get a rather extreme result...when you apply a cumulative adjustment of one additional run expected every 27 batting outs to the same season (the park added roughly 160-180 runs each year to the scoring from both sides combined), the park's pull on the hitter's value will be somewhat muted (though still very real). FSIA park factors are significantly less prone to wild fluctuations from season to season and reflect our belief that most parks have a very minor effect on scoring, and that it's only a few extreme parks at either end of the spectrum that can really be counted on from year to year to have a certain impact. Stable park factors were made possible by switching to cumulative math, and by factoring out the unexpected fluctuations in the reactions of players to the parks (and thus neutralizing the home-team bias problem mentioned earlier).
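As a rough illustration of how a cumulative adjustment touches a player season (the out total and the one-run-per-game figure below are hypothetical, and in practice you would scale by the share of outs the player actually made in that park):

```python
# The cumulative adjustment described above, applied to a hypothetical
# hitter: one extra expected run per 27 batting outs in an extreme park.

def park_runs(batting_outs, runs_added_per_game, outs_per_game=27):
    """Expected park-added runs over a player's sample of batting outs."""
    return batting_outs * runs_added_per_game / outs_per_game

# A hypothetical full-time hitter who made 405 outs in the park
print(round(park_runs(405, 1.0), 1))   # 15.0 extra expected runs
```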

C) This represents the first complete effort to separate the intrinsic abilities of teams from their contexts, while being able to reproduce real-world statistics with a high degree of accuracy. One of the problems with Baseball Prospectus's EqR statistic is that while it is a fairly aggressive attempt to put all players on a level playing field, it does not in any way model actual run scoring (it's not intended to...it's a conceptualized ideal league environment based on the average EqA being .260), so it's not particularly useful for doing any kind of top-down win analysis (you can't use EqR to predict how many runs a team will score and allow). The FSIA not only places players on a level playing field...it models the real world too.

**THE ACCURACY OF REAL-WORLD MODELLING WITH THE FSIA**

Using run differential data totalled up for each league and season, we were able to determine a series of encouraging error statistics that we hope will make it clear that the FSIA is a highly accurate intrinsic analysis tool for use in real-world modelling.

First we tested its ability to accurately reproduce league run scoring results from the components. The largest discrepancy we found when comparing real-world run scoring totals to the FSIA-generated RS was 68 runs. The error range was -68 to +49. To put this in clearer terms, on a per game basis the error range was -0.030 to +0.027 R/G. In the worst case scenario, we're talking about *maybe* a 1% error (more likely closer to half a percent). The root-mean-square error (the standard deviation of the error) was a mere 8.7 runs, in an average league which scores something like 8,000-11,000 runs!!

Next we tested its ability to accurately predict runs scored and allowed by teams. We expected a larger error here, because the fewer games you have in a sample, the more the Law of Succession will play a part in pulling that sample variable toward the mean. This model will tend to underestimate the spread of run differentials in the case of extreme teams, partially because it is a proper statistical question whether we have seen the entire distribution of outcomes when the sample is reduced in size to 162 or 154 games (in most cases), and partially because in the case of extreme teams, we begin to run into a new error source which we are working toward correcting and which will be discussed in our future research plans below.

In any event, we did get a larger error here, but it was far smaller than even I had expected. The error range was -47 runs to +52 runs...or in terms of runs per game...-0.315 to +0.326 runs/game. In the worst case scenario we're looking at something like a 6-8% error, but this wasn't all that common.

The RMSE for team offenses was 14.7 R, and it was 20.4 R for team defenses. Given that the average team scores and allows about 770 runs over the course of major league history, the "typical" error is something more like 2-3%.
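For readers who want to reproduce these error statistics on their own data, the calculations are straightforward (the run totals below are invented purely for illustration):

```python
import math

# Sketch of the error statistics quoted above: error range and RMSE
# between modeled and actual run totals.  These numbers are made up.

actual  = [742, 698, 810, 655]
modeled = [750, 690, 805, 661]
errors  = [m - a for m, a in zip(modeled, actual)]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(min(errors), max(errors), round(rmse, 1))
```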

That error shouldn't really even be called error in full, since, particularly in the case of teams with shorter schedules or extreme teams, all laws of probability suggest that a center-pull is wise (there is an increased probability that what we've seen out of such a team is just a part of the distribution, and that if those games were replayed under identical conditions, a somewhat less severe result would occur).

**PRIMARY SOURCE OF REAL ERROR**

Aside from random chance and the center-pull inherent to Bayesian probability, the primary problem with the FSIA is that there is one somewhat incorrect assumption required to make it work. The FSIA is a system of LINEAR equations. But we already know from research done by Bill James that teams do not combine LINEARLY to produce wins and losses...and they probably don't combine linearly to produce runs either. Teams and the contexts in which they play combine very NEARLY linearly when the winning percentages of those contexts fall inside a range near .500 (.400 to .600 is considered the acceptable range of the linear assumption). The FSIA works very well for most of the variables it evaluates, but two kinds of variables are sometimes vulnerable to error: park reactions (which come from very small samples and are prone to random fluctuations that make them appear extreme and therefore fall outside the range where the linear assumption holds), and extremely good or poor teams.

**FUTURE RESEARCH**

Randy and I have already planned out the concepts for the final advancement of our intrinsic analysis and are beginning work on a non-linear solver for systems of equations following a form pioneered by Bill James called "log5". More details on the log5 system when we are ready with new results, but as you have seen, the FSIA is already very accurate in just about every case, and ready for application to player evaluation models like PCA.
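For reference, the log5 form as James defined it for head-to-head winning percentages looks like this; extending this non-linear shape to run scoring contexts is the part still under development:

```python
# The log5 form pioneered by Bill James: the probability that a team
# with true winning percentage p_a beats a team with p_b.

def log5(p_a, p_b):
    return (p_a * (1 - p_b)) / (p_a * (1 - p_b) + p_b * (1 - p_a))

print(round(log5(0.600, 0.500), 3))   # vs an average team: 0.600
print(round(log5(0.600, 0.400), 3))   # vs a weak team: 0.692
```

Notice that against a .500 opponent log5 reduces to the team's own percentage, which is exactly the near-linear middle range discussed above; the curvature only matters toward the extremes.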

Adding to our work on log5, we are beginning to strategize on how to improve the accuracy of dynamic linear weights...more details on that at a later time.

It should also be noted that the FSIA makes no attempt to correct for the strength of a league...that's another project entirely. We're working on ways to try to quantify the competitiveness and depth of a league as well, but that'll take some time.

I think I've written quite enough for one day...anyone still reading this...I salute you for taking the LOOONG time necessary to digest it all and I thank you for reading.

Thoughts? Quibbles? General wonderings?
