# Thread: History of the Game's Strength - The Era Difficulty Rating

1. ## History of the Game's Strength - The Era Difficulty Rating

I thought the history buffs might find this one interesting enough that I decided to post it here...the "sabermetrics" involved here are very light mathematically, so it fits in.

This is just experimental, because to properly scale my difficulty rating, I had to arbitrarily choose the marginal value you'll see in a moment...I'm working on ways to more rigorously define it.

OK...bear with me for a moment while I explain where I got this idea.

I've been looking for the LONGEST time for a way to objectively rate how "deep" or "difficult" a league was...

I never liked James' subjective timeline adjustment...it seemed WAY too simple. But how do you go about seeing how skilled the players within a league are as a group?

The idea came to me through a discussion I had with Randy Fiato (TKD) about what defines "bad baseball". It is intuitively obvious that when two bad teams face each other, the games will be sloppy more frequently..mistakes will be made in all aspects of the game. Pitching mistakes...hitting mistakes...fielding mistakes...baserunning blunders.

What will this look like statistically though? A classic idea proposed by sabermetricians in the 70s was to rate players based on standard deviations from the mean...it has been observed many times that the standard deviation of batting average has been fluctuating through time but trending down...(there's a famous paper on the disappearance of the .400 hitter that discusses this...the author's name escapes me for some reason).

Batting average is not however explanatory enough...what we want to know is...does the standard deviation of run scoring per side per game change with time the way it does for batting average? Are we cycling closer and closer to the mean as time advances?

A quick survey using retrosheet.org's game logs reveals that in fact standard deviation is changing with time...but perhaps not the way you might think. It became immediately apparent that the standard deviation of run scoring on a per game basis was directly dependent on the league average run scoring rate. In fact, an r^2 of 0.9301 exists between those two variables...low scoring leagues have small standard deviations...high scoring leagues have larger standard deviations.
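For anyone wanting to check that relationship on their own numbers, here is a minimal Python sketch of the r^2 calculation. It assumes each league-season has already been boiled down to a mean and a standard deviation of runs per side per game; the function name `r_squared` and the example values are made up for illustration and are not taken from the Retrosheet logs.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical league-season summaries (illustrative only, not real data)
league_mean_runs = [3.5, 4.0, 4.5, 5.0, 5.5]
league_sd_runs = [2.6, 2.9, 3.1, 3.4, 3.6]
print(round(r_squared(league_mean_runs, league_sd_runs), 4))
```

A value near 1 would mean the spread of run scoring is almost entirely explained by the scoring level, which is why raw standard deviation is a poor stand-in for league strength.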

Does this mean that high scoring leagues are "weaker"...less deep with talent? Of course not. It's hard to argue that the deadball era was a better level of play than today's game even with expansion, considering the player pool has expanded to include approximately 50 times more potential baseball players than it did back then, minor league scouting and development didn't exist in the deadball era, and the equipment and field conditions were often horrendous, making for sloppy games far more frequently than in today's major leagues.

This dependence on run scoring environment is not however the only problem with using standard deviation to rate the difficulty of a league or the players within the league. There is a fundamental logical flaw. The use of standard normal z scores presumes that the league and/or player distribution was normal...neither is the case.

The player distribution is pyramidal...the top 1% of the humans who play baseball make the major leagues (liberally...it might be closer to .001%)...if we could rate every baseballer from tee-ball to Japan to MLB to High School...the distribution of skill might be normal. Meanwhile, the distribution of runs scored per side per game in a league is the summation of a series of one-game match-ups...each match-up behaving according to the laws of probability as governed by the intrinsic strengths of both combatants...the result of that process is a non-normal, significantly skewed distribution...high extreme values will have an exaggeratedly large Z-score...shutouts are a sign of bad play too but there is a lower bound to how "bad" you can be in the non-scoring direction.

Given this lower bound...and the resulting tendency for variations in ability to manifest themselves in the rightward biasing direction (large numbers of high scoring games relative to the mean run scoring environment)...we fall back on MEASURING the skew of the league's RS distribution to get an idea about how erratic/weak that league was.

The positives...Skew is not dependent on the run scoring environment...it is never affected by the mean of a probability distribution. Skew is uni-directional...meaning the lower bound shouldn't interfere with an accurate measurement of positive skew (skew is defined to be positive when the longer tail of a distribution points to the right on a number line). Skewness also does not presume a distribution is normal. It describes how non-normal a distribution is.

Logically...skew tells you how frequently extremes occur...more extremes mean more variation in intrinsic team strengths...and therefore...a weaker league.

If the run scoring distribution were normal (had no skew) this would mean that there was ZERO variation in player ability across the league...this would be the "ideal" league...but we know this to be humanly impossible to achieve...nonetheless...it serves to demonstrate that more skew is a larger deviation from the ideal league.

Skewness of a distribution is easily measured:

SUM[ (x - u)^3 ]
--------------------
(n - 1) * s^3

Where x is the observed game/side runs scored, u is the league average runs scored per side per game, n is the number of game/sides within the league and s is the standard deviation of the distribution.

Placing the s term in the expression automatically scales the skew value so that higher scoring leagues, which will naturally have a wider range of run scoring outcomes, do not appear to have higher skew.
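For anyone who wants to try the formula at home, here is a minimal Python sketch. It assumes `runs` is a plain list with one entry per game/side, and that s is the sample standard deviation (n - 1 in the denominator); the post does not say which version of s was used, so treat that as a guess. The function name and the sample numbers are hypothetical.

```python
import math


def sample_skewness(runs):
    """Skewness of runs per side per game: SUM[(x - u)^3] / ((n - 1) * s^3)."""
    n = len(runs)
    if n < 2:
        raise ValueError("need at least two game/sides")
    u = sum(runs) / n  # league average runs per side per game
    # Assumption: s is the sample standard deviation (n - 1 denominator)
    s = math.sqrt(sum((x - u) ** 2 for x in runs) / (n - 1))
    if s == 0:
        raise ValueError("standard deviation is zero; skewness is undefined")
    return sum((x - u) ** 3 for x in runs) / ((n - 1) * s ** 3)


# Hypothetical run totals (illustrative only): mostly low scores, one blowout
example_runs = [0, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 15]
print(sample_skewness(example_runs))

# Doubling every score changes the mean and s but not the skew,
# which is the scaling effect of the s^3 term described above.
print(sample_skewness([2 * x for x in example_runs]))
```

The two printed values should agree up to floating-point rounding, and a perfectly symmetric set of scores would come out at zero.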

When I plotted skew of the run scoring distribution against time, what I found was a somewhat messy but nonetheless encouraging trend toward gradually decreasing skew with time. There was a lot of noise in the plot...probably because skew is heavily impacted by large outliers, so extreme games might have had a disproportionately large pull on skew...it therefore was necessary to smooth skew values.

I chose to use a normally weighted 7-year running mean of skew values for each league (normally weighted implies a larger emphasis on the center year...think of the shape of the bell curve) to smooth out the fluctuations...
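A normally weighted running mean of this kind could be sketched in Python as below. Two things here are assumptions rather than anything stated in the post: the width of the bell curve (`sigma=1.5` years is an arbitrary pick) and the handling of years near the start or end of a league's history, where this sketch simply uses whatever years exist and renormalizes the weights. The function names are hypothetical.

```python
import math


def gaussian_weights(window=7, sigma=1.5):
    """Bell-curve weights for a centered window, normalized to sum to 1."""
    half = window // 2
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_skew(skew_by_year, window=7, sigma=1.5):
    """Weighted running mean of one league's skew values.

    skew_by_year maps year -> raw skew. Years missing from the window
    are skipped and the remaining weights are renormalized (an assumption).
    """
    half = window // 2
    weights = gaussian_weights(window, sigma)
    smoothed = {}
    for year in skew_by_year:
        num = 0.0
        den = 0.0
        for offset, w in zip(range(-half, half + 1), weights):
            value = skew_by_year.get(year + offset)
            if value is not None:
                num += w * value
                den += w
        smoothed[year] = num / den  # den > 0 because the center year exists
    return smoothed


# Hypothetical raw skews for one league with a single noisy season
raw = {1950: 1.00, 1951: 1.02, 1952: 0.98, 1953: 1.40, 1954: 1.01, 1955: 0.99, 1956: 1.00}
for year, value in sorted(smooth_skew(raw).items()):
    print(year, round(value, 3))
```

The center year gets the largest weight, so a one-year spike is pulled back toward its neighbors without being erased.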

It makes sense to smooth the data because although players change from season to season...the overall strength of the league cannot possibly fluctuate by overly large amounts...there are hundreds of players in any given league...turnover from year to year is no larger than 5-10% so we should expect league strengths to change gradually except in extreme circumstances like during WWII.

I'm considering alternatives to this normally weighted running mean idea...I may for instance measure the skewness of a longer period of years than one...perhaps skew is more persistent if you include more than one year of data...either way...the smoothed values were eye popping and aligned very well with my expectations for where baseball was weak and where it was strong.

But this doesn't end the problem.

Assuming Smoothed skew is an appropriate measure of league strength, we need to put it in a form that allows strong leagues to score higher than weak leagues...and it would in fact be ideal if we got the scores to range from 0 to 1 so that they could be used multiplicatively...(for instance...if we rate 1872 as a 0.5 league...we would cut player wins in half in 1872 to get an idea of how many wins they'd be worth in a strong league)

We can make use of the exponential function here...it makes sense to use the exponential given that major league baseball represents the top of the baseball pyramid and the drop in skew value from typical leagues to great ones is likely to be large.

It also gives us the right range if used properly. Skewness can theoretically range from 0 to infinity in this case (it can't range negatively because of the lower bound at zero)...if we take a skewness of zero...e^0 = 1...if we take a skewness value approaching infinity e^large = large...ah but if we make that e^-skew...e^-0 is still 1, but e^-large is 1/(e^large), which asymptotically approaches zero.

One more step though...no baseball league...no matter how great...will ever have a skew of zero. Here's the nasty part where I have to arbitrarily pick a marginal skew value. This was just me visually examining the graph of smoothed skew with time and seeing what the skew appeared to be approaching (the overall curved trend appears to be leveling off slowly but surely).

I chose a value of 0.8 as the minimum skew...though I experimented with other values.

This was applied by simply subtracting 0.8 from each skew value obtained by the smoothing process before converting them with the exponential decay function.
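Put together, the conversion described above reads as rating = e^-(smoothed skew - 0.8). A minimal Python sketch under that reading follows; the function names are hypothetical, and the post does not say what happens if a smoothed skew ever dips below 0.8 (under this formula the rating would come out above 1).

```python
import math

MARGINAL_SKEW = 0.8  # the hand-picked minimum skew from the post


def era_difficulty_rating(smoothed_skew, marginal_skew=MARGINAL_SKEW):
    """Exponential-decay rating: 1.0 at the marginal skew, falling toward 0 as skew grows."""
    return math.exp(-(smoothed_skew - marginal_skew))


def implied_skew(rating, marginal_skew=MARGINAL_SKEW):
    """Invert the rating to recover the smoothed skew it corresponds to."""
    if rating <= 0:
        raise ValueError("rating must be positive")
    return marginal_skew - math.log(rating)


def adjusted_wins(player_wins, rating):
    """Multiplicative use of the rating, as in the 0.5-league example."""
    return player_wins * rating


print(era_difficulty_rating(0.8))        # a league sitting right at the marginal skew
print(round(implied_skew(0.528), 3))     # skew implied by a 0.528 rating
print(adjusted_wins(10, 0.5))            # 10 wins in a 0.5 league
```

If this reading is right, a rating of 0.528 corresponds to a smoothed skew of roughly 1.44 and a rating of 0.968 to roughly 0.83, so the whole historical range spans only about 0.6 units of skew.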

The end result is quite interesting to me...

Here are the top 20 most difficult leagues by this method:
Code:
```
Year	Lg	Strength
1984	AL	0.968
1985	AL	0.967
1997	AL	0.947
1995	AL	0.946
1996	AL	0.943
1998	AL	0.942
1986	AL	0.941
1983	AL	0.941
1983	NL	0.932
1933	AL	0.928
1934	AL	0.928
1999	AL	0.925
1994	AL	0.925
1982	NL	0.923
1937	AL	0.919
1935	AL	0.913
1938	AL	0.912
1936	AL	0.909
1987	AL	0.907
1962	AL	0.906
```
And the 20 weakest leagues
Code:
```
Year	Lg	Strength
1910	NL	0.691
1909	AL	0.690
1944	NL	0.688
1902	NL	0.687
1901	NL	0.683
1885	NL	0.682
1905	AL	0.679
1911	NL	0.675
1881	NL	0.666
1875	NA	0.665
1906	AL	0.663
1908	AL	0.654
1907	AL	0.651
1874	NA	0.637
1873	NA	0.614
1884	NL	0.612
1882	NL	0.589
1872	NA	0.578
1883	NL	0.560
1871	NA	0.528
```
The early deadball era looks to me to have been very weak competitively...though obviously not as bad as the old National Association...which plays like a modern AA or A league.

Thoughts from the peanut gallery?

2. Any way you could put this in simpler terms? I have no idea what you're trying to say. By "stronger" league, do you mean harder to excel offensively in? If that's the case, how can you have 5 seasons from 1995-present in the "hardest" list when this happened:

AVG MLB TEAM

YEAR - HR - ERA

2005 – 167 -- 4.29
2004 – 182 -- 4.47
2003 – 174 -- 4.41
2002 – 169 -- 4.28
2001 – 182 -- 4.41
2000 – 190 -- 4.77
1999 – 184 -- 4.71
1998 – 169 -- 4.46
1997 – 166 -- 4.39
1996 – 177 -- 4.61
1995 – 146 -- 4.45

3. --What would make you assume it means hardest to excel offensively? What league difficulty measures is how good the league is and as a result how hard it is to separate from the pack in that league.

4. This isn't a measure of difficulty to hit...to measure that one needs only the league's mean run scoring tendency.

This is a measure of the overall depth of the player pool and therefore the difficulty in being significantly better than average.

Both for hitters and for pitchers/fielders.

It can be measured using only the Run Scored Distribution because both the offense and the defense must contribute if a team is to score a large number of runs in a game...you have to both do your own hitting...and have the defense you face suck.

Primarily, the reason league difficulty expresses itself (at least...it appears to express itself) in a lack of extreme games is that when talent is spread more evenly...it's harder to bash a team's head in.

5. I'm surprised that '43 & '45 do not appear in the top 20 weakest list.

6. Originally Posted by leecemark
--What would make you assume it means hardest to excel offensively?
I didn't know, that's why I was asking. I'm not a big saber guy. In fact halfway through his original post I started seeing spots and had to get up for another beer

What league difficulty measures is how good the league is and as a result how hard it is to separate from the pack in that league.

Ok, that makes sense. If the game is easier for everyone offensively, then it would be harder to stand out from the pack. Makes sense that certain players would try to do something to stand out as well.

7. It's not so much that the game is "easier for everyone" offensively...it's more that everyone is so good offensively that the league context is being dragged upward...hence making it harder to be better than the standard...but yeah...you have the idea. It also does start to explain some of the desperation by today's players toward getting an edge...

Pitchers and hitters alike.

I'm sorry if my opening post was dense..I wasn't going for dense...just trying to fully explain where I got the idea for skewness research.

8. As for '43 and '45...fear not...the war did have a very noticeable upward pull on both AL and NL skews...the difference was that the AL started out a much stronger league than the NL so the deleterious effects of the player loss were not as sharply felt there...and '43 and '45 in the NL are both in the top 40 weakest leagues...the curve is pretty cool actually...a perfectly timed spike in skew (drop in difficulty) right during the war.

9. Originally Posted by SABR Matt
It's not so much that the game is "easier for everyone" offensively...it's more that everyone is so good offensively that the league context is being dragged upward...hence making it harder to be better than the standard...but yeah...you have the idea. It also does start to explain some of the desperation by today's players toward getting an edge...

Pitchers and hitters alike.

I'm sorry if my opening post was dense..I wasn't going for dense...just trying to fully explain where I got the idea for skewness research.
It wasn't dense, just over my head apparently. I was done at the r^2y or whatever you put. It's all good though.

I guess it's a matter of opinion as to why it's harder to stand out in today's game. I would say it's like restrictor plates in NASCAR. Everybody has a strong engine (body) and MLB has given them other small advantages that add up to create more offense than we've ever seen before. The top 5% talent can no longer rise further above because it's easier for the middle 40% to put up bigger numbers. It's all opinion apparently though.

10. well you're not disagreeing with me...

"Everyone has strong engines (bodies)..."

That's precisely it...the athletes are all better...batters, fielders, and pitchers are all better today than they were in 1920. The increase in offense is mostly created by the ball type and the fact that we've just had a period where the hitters were better than the pitchers (it goes in cycles...hitters dominated the 1880s and 1890s, pitchers owned the deadball era, hitters dominated in the 20s and 30s...pitchers dominated from the mid fifties to the early 90s...the hitters are back...that advantage is starting to wane though...pitching and defense are starting to balance now)...not to mention the smaller parks of today...

And BTW...it's not really true that this is the most offense we've ever had...there was more offense in 1930 and in 1894 than there is today.

11. Originally Posted by SABR Matt

And BTW...it's not really true that this is the most offense we've ever had...there was more offense in 1930 and in 1894 than there is today.
Yeah, these parks are ridiculous and so are all the other factors.

What are you basing that comment on, runs/game?

The average team hit 98 HR in '30, and the average ERA was 4.80

The average team hit 182 HR in 2004, average ERA was 4.47

12. Offense is offense...doesn't matter in this context how we came by it...I'm basing the comment on runs/game. Runs/game in the modern game are higher than normal, but these years don't match the 20s/30s spike for offensive proficiency.

13. Yes, Bill has a timeline adjustment that is simple, but it was also for a simple rating system. In the same book he has something like a ten or twelve factor timeline system, one that includes quality of play and the conditions of the playing field.

Also can you show us bottom 10 or 20 of only the 20th century? Perhaps a ranking of all the 20th century seasons. Thanks.

14. Sure.

Let me do something to avoid people getting too obsessed with exact ranks of modern seasons...

Let's focus on the last 100 years of the FL, AL and NL only...that's 202 leagues from 1905 to 2004

I'm going to give the leagues letter grades in order to break the sample up into "similar" groups. I don't want people going "how could you have year X in the NL 20 ranks below year Y in the AL!" when the numerical rank is very similar so I'll give you groupings...they will be ranked in reverse order of difficulty though for anyone curious about exact ordinal rank.

Higher Difficulty - higher Letter Grade
Code:
```
Year	Lg	Gr	EDR
1984	AL	A+	0.968
1985	AL	A+	0.967
1997	AL	A+	0.947
1995	AL	A+	0.946
1996	AL	A+	0.943
1998	AL	A+	0.942
1986	AL	A	0.941
1983	AL	A	0.941
1983	NL	A	0.932
1933	AL	A	0.928
1934	AL	A	0.928
1999	AL	A	0.925
1994	AL	A	0.925
1982	NL	A	0.923
1937	AL	A	0.919
1935	AL	A	0.913
1938	AL	A	0.912
1936	AL	A	0.909
1987	AL	A	0.907
1962	AL	A-	0.906
2000	AL	A-	0.904
1932	AL	A-	0.902
1961	AL	A-	0.901
1984	NL	A-	0.900
1982	AL	A-	0.899
1963	AL	B+	0.897
2004	NL	B+	0.897
2001	AL	B+	0.891
1960	AL	B+	0.887
1993	AL	B+	0.884
2002	AL	B+	0.882
2003	AL	B+	0.880
1939	AL	B+	0.880
1998	NL	B+	0.879
2004	AL	B+	0.878
1997	NL	B+	0.877
1981	NL	B	0.875
1964	AL	B	0.874
1988	AL	B	0.874
2003	NL	B	0.872
1941	AL	B	0.870
1940	AL	B	0.869
1959	AL	B	0.868
1999	NL	B	0.867
1961	NL	B	0.866
2000	NL	B	0.864
1926	AL	B	0.863
1962	NL	B	0.863
2001	NL	B	0.862
1931	AL	B	0.860
2002	NL	B	0.859
1981	AL	B	0.858
1960	NL	B	0.858
1996	NL	B	0.856
1989	AL	B	0.855
1942	AL	B	0.854
1963	NL	B	0.853
1985	NL	B	0.852
1975	AL	B	0.852
1927	AL	B-	0.851
1992	AL	B-	0.850
1964	NL	B-	0.848
1974	AL	B-	0.846
1965	AL	B-	0.844
1959	NL	B-	0.843
1990	AL	B-	0.842
1958	AL	B-	0.841
1991	AL	B-	0.841
1928	NL	B-	0.840
1980	AL	B-	0.839
1976	AL	C+	0.839
1965	NL	C+	0.838
1977	AL	C+	0.838
1995	NL	C+	0.837
1978	AL	C+	0.836
1992	NL	C+	0.836
1979	AL	C+	0.835
1993	NL	C+	0.833
1925	AL	C+	0.832
1994	NL	C+	0.831
1970	NL	C+	0.831
1973	AL	C+	0.830
1948	NL	C+	0.829
1971	AL	C+	0.829
1986	NL	C+	0.829
1943	AL	C	0.829
1916	NL	C	0.828
1927	NL	C	0.827
1971	NL	C	0.826
1929	NL	C	0.825
1947	NL	C	0.825
1914	AL	C	0.825
1915	NL	C	0.825
1958	NL	C	0.823
1966	AL	C	0.823
1972	AL	C	0.823
1928	AL	C	0.821
1991	NL	C	0.821
1930	AL	C	0.819
1914	NL	C	0.819
1970	AL	C	0.818
1980	NL	C	0.817
1944	AL	C	0.816
1949	NL	C	0.816
1946	AL	C	0.815
1913	AL	C	0.815
1966	NL	C	0.814
1945	AL	C	0.814
1947	AL	C	0.813
1955	NL	C	0.812
1956	NL	C	0.810
1975	NL	C	0.809
1972	NL	C	0.808
1969	NL	C	0.808
1951	NL	C	0.807
1987	NL	C	0.807
1940	NL	C	0.807
1967	AL	C-	0.807
1950	NL	C-	0.807
1957	NL	C-	0.806
1948	AL	C-	0.806
1915	FL	C-	0.806
1914	FL	C-	0.806
1976	NL	C-	0.805
1929	AL	C-	0.805
1952	NL	C-	0.805
1951	AL	C-	0.803
1969	AL	C-	0.803
1954	NL	C-	0.803
1941	NL	C-	0.802
1939	NL	C-	0.801
1953	NL	C-	0.800
1921	NL	D+	0.798
1974	NL	D+	0.798
1920	AL	D+	0.798
1968	AL	D+	0.797
1930	NL	D+	0.797
1973	NL	D+	0.795
1957	AL	D+	0.795
1917	NL	D+	0.795
1949	AL	D+	0.795
1952	AL	D+	0.791
1950	AL	D+	0.791
1921	AL	D	0.787
1922	NL	D	0.787
1977	NL	D	0.787
1990	NL	D	0.786
1926	NL	D	0.784
1920	NL	D	0.783
1988	NL	D	0.782
1912	AL	D	0.782
1967	NL	D	0.781
1968	NL	D	0.780
1938	NL	D	0.780
1919	AL	D	0.780
1946	NL	D	0.779
1913	NL	D	0.779
1915	AL	D	0.778
1979	NL	D	0.777
1924	AL	D	0.775
1934	NL	D	0.773
1931	NL	D	0.773
1923	NL	D	0.773
1978	NL	D	0.772
1933	NL	D	0.772
1932	NL	D	0.770
1989	NL	D-	0.768
1919	NL	D-	0.762
1924	NL	D-	0.762
1942	NL	D-	0.761
1911	AL	D-	0.761
1935	NL	D-	0.761
1925	NL	D-	0.760
1918	NL	D-	0.759
1953	AL	D-	0.757
1922	AL	D-	0.756
1937	NL	D-	0.755
1936	NL	D-	0.746
1956	AL	D-	0.746
1918	AL	D-	0.738
1923	AL	D-	0.736
1910	AL	D-	0.731
1908	NL	F	0.726
1954	AL	F	0.725
1909	NL	F	0.721
1945	NL	F	0.721
1916	AL	F	0.720
1907	NL	F	0.718
1906	NL	F	0.718
1955	AL	F	0.715
1912	NL	F	0.714
1905	NL	F	0.714
1917	AL	F	0.705
1943	NL	F	0.704
1910	NL	F	0.691
1909	AL	F	0.690
1944	NL	F	0.688
1905	AL	F	0.679
1911	NL	F	0.675
1906	AL	F	0.663
1908	AL	F	0.654
1907	AL	F	0.651
```
That's from 1905-2004...thoughts?

15. I see the Cubs' greatest run is during a time in the NL that you have ranked as an F.
Is that because the league was so bad or because the Cubs were so great? In other words, how much effect did the Cubs have on that rating? Is it possible that the team assembled so many of the better players of that era on one team that it made the whole league look lower skilled when in reality the talent was merely dispersed unevenly?

16. The way I've always adjusted is very non-scientific, but is:

=1.05-((R27-ERA)/R27)

17. Also, can you put an attachment on your post so that one can download that list? I would like to be able to separate out the leagues and put them in a timeline order. Thanks

18. I'm sure you put a lot of research into this, and it looks like it's returned reasonable results, but there is one major thing I must point out. I see you have the 1989 NL rated at a D-, and the 1989 AL rated at a B. Does that really make any sense? Why is it that in the modern era the AL should be so far ahead of the NL? That doesn't seem to make much sense to me at all. Could you perhaps explain to me why it comes out that way, or offer any reasonable explanation for it? To me, that could be a major flaw in this system. I can't imagine league quality between the leagues was ever that far apart, or if it was, not in the modern era.

19. Well...now I see the flaw in using letter grades...LOL

At least the way I've scaled them. The difference between a D- and a B isn't that huge...modern baseball from about 1950 on is very homogeneous...the spread of league difficulties is very small. I perhaps should have rated all of the teams in baseball history when I came up with those quick reference grades...and I definitely should have used a mean/standard deviation method instead of just breaking it up into proportions the way I did.

There was a significant difference between the AL and the NL in the late 80s though...quick...top of your head...name five great players who were at their best from 1985-1990...no cheating and looking guys up...

I did that little exercise when I was looking at the data to see if it made sense.

I obviously am not saying this is going to be perfect...nor would I claim this is the end of my research...I just found it worthy of posting here because I believe I'm "on the right track"...I do believe skewness of the run scoring distribution is going to be the key to seeing league difficulty.

I have more things I want to try to smooth the data more scientifically than I did here.

I'll convert the answers I have to this point into a text file and attach them so you can recreate the nifty graph I have of difficulty by league and year.

20. Originally Posted by BillyF29
The way I've always adjusted is very non-scientific, but is:

=1.05-((R27-ERA)/R27)
Interesting...you tried to capture the ERROR rate...the rate at which runs score on sloppy play...as your method for era adjusting...clever idea...not bad as a quick thumbnail.
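Written out as a function, the quoted spreadsheet formula might look like the Python sketch below. It assumes R27 means runs allowed per 27 outs and ERA is the league earned run average, which the post does not spell out; under that reading (R27 - ERA)/R27 is the share of runs that were unearned. The function name and the sample inputs are hypothetical.

```python
def quick_era_adjustment(r27, era, baseline=1.05):
    """Quoted formula: 1.05 - ((R27 - ERA) / R27).

    Assumption: r27 is runs allowed per 27 outs and era is earned run average,
    so (r27 - era) / r27 is the fraction of runs that were unearned.
    """
    if r27 <= 0:
        raise ValueError("r27 must be positive")
    unearned_share = (r27 - era) / r27
    return baseline - unearned_share


# Hypothetical inputs (illustrative only): 10% of runs unearned vs. 30% unearned
print(quick_era_adjustment(5.0, 4.5))
print(quick_era_adjustment(5.0, 3.5))
```

The more runs that score without being earned, the lower the adjustment, which matches the "error rate" reading above.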
