Thread: History of the Game's Strength - The Era Difficulty Rating

  1. #1
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900

    History of the Game's Strength - The Era Difficulty Rating

    I thought the history buffs might find this one interesting enough that I decided to post it here...the "sabermetrics" involved here are very light mathematically, so it fits in.

    This is just experimental, because to properly scale my difficulty rating, I had to arbitrarily choose the marginal value you'll see in a moment...I'm working on ways to more rigorously define it.

    OK...bear with me for a moment while I explain where I got this idea.

    I've been looking for the LONGEST time for a way to objectively rate how "deep" or "difficult" a league was...

    I never liked James' subjective timeline adjustment...it seemed WAY too simple. But how do you go about seeing how skilled the players within a league are as a group?

    The idea came to me through a discussion I had with Randy Fiato (TKD) about what defines "bad baseball". It is intuitively obvious that when two bad teams face each other, the games will be sloppy more frequently..mistakes will be made in all aspects of the game. Pitching mistakes...hitting mistakes...fielding mistakes...baserunning blunders.

    What will this look like statistically though? A classic idea proposed by sabermetricians in the 70s was to rate players based on standard deviations from the mean...it has been observed many times that the standard deviation of batting average has been fluctuating through time but trending down...(there's a famous paper on the disappearance of the .400 hitter that discusses this...the author's name escapes me for some reason).

    Batting average is not however explanatory enough...what we want to know is...does the standard deviation of run scoring per side per game change with time the way it does for batting average? Are we cycling closer and closer to the mean as time advances?

    A quick survey using retrosheet.org's game logs reveals that in fact standard deviation is changing with time...but perhaps not the way you might think. It became immediately apparent that the standard deviation of run scoring on a per game basis was directly dependent on the league average run scoring rate. In fact, an r^2 of 0.9301 exists between those two variables...low scoring leagues have small standard deviations...high scoring leagues have larger standard deviations.
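    A minimal sketch of that check, assuming you already have one list of runs-per-side values for each league-season. The three samples below are made up purely for illustration (they are not Retrosheet data), and the function names are hypothetical:

```python
import statistics

def mean_and_sd(runs):
    """League average and sample standard deviation of runs per side per game."""
    return statistics.fmean(runs), statistics.stdev(runs)

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Made-up runs-per-side samples for three hypothetical league-seasons.
league_seasons = {
    "low scoring": [0, 1, 1, 2, 2, 3, 3, 4, 5, 6],
    "average": [0, 1, 2, 3, 4, 4, 5, 6, 8, 10],
    "high scoring": [1, 2, 3, 4, 5, 6, 7, 9, 11, 14],
}

# One (mean, SD) pair per league-season, then correlate the two columns.
means, sds = zip(*(mean_and_sd(runs) for runs in league_seasons.values()))
print(f"r^2 between league mean and SD: {r_squared(means, sds):.3f}")
```

    With real game logs you would build one list per league-season and feed the resulting means and standard deviations to r_squared the same way.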

    Does this mean that high scoring leagues are "weaker"...less deep with talent? Of course not. It's hard to argue that the deadball era was a better level of play than today's game, even with expansion, considering the player pool has expanded to include approximately 50 times more potential baseball players than it did back then, minor league scouting and development didn't exist in the deadball era, and the equipment and field conditions were often horrendous, making for sloppy games far more frequently than in today's major leagues.

    This dependence on run scoring environment is not however the only problem with using standard deviation to rate the difficulty of a league or the players within the league. There is a fundamental logical flaw. The use of standard normal z scores presumes that the league and/or player distribution was normal...neither is the case.

    The player distribution is pyramidal...the top 1% of the humans who play baseball make the major leagues (liberally...it might be closer to .001%)...if we could rate every baseballer from tee-ball to Japan to MLB to High School...the distribution of skill might be normal. Meanwhile, the distribution of runs scored per side per game in a league is the summation of a series of one-game match-ups...each match-up behaving according to the laws of probability as governed by the intrinsic strengths of both combatants...the result of that process is a non-normal, significantly skewed distribution...high extreme values will have an exaggeratedly large Z-score...shutouts are a sign of bad play too but there is a lower bound to how "bad" you can be in the non-scoring direction.

    Given this lower bound...and the resulting tendency for variations in ability to manifest themselves in the rightward biasing direction (large numbers of high scoring games relative to the mean run scoring environment)...we fall back on MEASURING the skew of the league's RS distribution to get an idea about how erratic/weak that league was.

    The positives...Skew is not dependent on the run scoring environment...it is never affected by the mean of a probability distribution. Skew is uni-directional...meaning the lower bound shouldn't interfere with an accurate measurement of positive skew (skew is defined to be positive when the longer tail of a distribution points to the right on a number line). Skewness also does not presume a distribution is normal. It describes how non-normal a distribution is.

    Logically...skew tells you how frequently extremes occur...more extremes mean more variation in intrinsic team strengths...and therefore...a weaker league.
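    A toy simulation of that argument, under a loud assumption: runs per game are drawn from a Poisson distribution around each team's scoring rate. That model is not from the research described here, and it ignores the opposing defense entirely; it is only meant to show that mixing unequal team strengths (same league average) can raise skew. All names and numbers below are hypothetical:

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def skewness(values):
    """Sample skewness: SUM[(x - u)^3] / ((n - 1) * s^3)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sum((v - mean) ** 3 for v in values) / ((n - 1) * s ** 3)

def simulate_league(rng, team_rates, games_per_team):
    """Runs scored per game for every team, pooled into one league sample."""
    return [poisson(rng, rate) for rate in team_rates for _ in range(games_per_team)]

rng = random.Random(2005)  # fixed seed so the run is repeatable

# Both toy leagues average 4.5 runs per side per game.
even_league = simulate_league(rng, [4.5] * 8, 5000)
uneven_league = simulate_league(rng, [3.0] * 4 + [6.0] * 4, 5000)

print(f"even talent:   skew = {skewness(even_league):.3f}")
print(f"uneven talent: skew = {skewness(uneven_league):.3f}")
```

    Under this toy model the league with unevenly spread strength should show the larger skew, which is the direction the reasoning above predicts.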

    If the run scoring distribution were normal (had no skew) this would mean that there was ZERO variation in player ability across the league...this would be the "ideal" league...but we know this to be humanly impossible to achieve...nonetheless...it serves to demonstrate that more skew is a larger deviation from the ideal league.

    Skewness of a distribution is easily measured:

    SUM[(x - u)^3]
    --------------------
    (n - 1) * s^3

    Where x is the observed game/side runs scored, u is the league average runs scored per side per game, n is the number of game/sides within the league and s is the standard deviation of the distribution.

    Placing the s term in the expression automatically scales the skew value so that higher scoring leagues, which will naturally have a wider range of run scoring outcomes do not appear to have higher skew.
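    A minimal sketch of the formula above in Python, using the sample standard deviation (n - 1 in the denominator) for s, which is an assumption about which s is meant. The run values are made up for illustration:

```python
import math

def skewness(runs):
    """Skew of a runs-per-side sample: SUM[(x - u)^3] / ((n - 1) * s^3)."""
    n = len(runs)
    u = sum(runs) / n  # league average runs per side per game
    s = math.sqrt(sum((x - u) ** 2 for x in runs) / (n - 1))  # sample SD
    return sum((x - u) ** 3 for x in runs) / ((n - 1) * s ** 3)

# Hypothetical runs scored by one side in ten games.
runs = [0, 1, 2, 2, 3, 3, 4, 5, 7, 12]

# Doubling every score widens the spread but leaves the skew unchanged,
# because the s^3 term in the denominator scales along with the numerator.
doubled = [2 * x for x in runs]

print(f"skew of original sample: {skewness(runs):.4f}")
print(f"skew of doubled sample:  {skewness(doubled):.4f}")
```

    The two printed values should agree to within floating-point rounding, which is the scaling property the s term provides.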

    When I plotted skew of the run scoring distribution against time, what I found was a somewhat messy but nonetheless encouraging trend toward gradually decreasing skew with time. There was a lot of noise in the plot...probably because skew is heavily impacted by large outliers, so extreme games might have had a disproportionately large pull on skew...it therefore was necessary to smooth skew values.

    I chose to use a normally weighted 7-year running mean of skew values for each league (normally weighted implies a larger emphasis on the center year...think of the shape of the bell curve) to smooth out the fluctuations...
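    One way such a running mean could be sketched. The bell-curve width (sigma = 1.5 years) and the handling of the first and last three seasons (renormalizing over whichever years exist) are assumptions, since neither is specified above:

```python
import math

def gaussian_weights(half_width=3, sigma=1.5):
    """Bell-shaped weights for offsets -half_width..+half_width (7 years by default)."""
    return [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half_width, half_width + 1)]

def smooth(values, half_width=3, sigma=1.5):
    """Normally weighted running mean; the center year gets the largest weight."""
    weights = gaussian_weights(half_width, sigma)
    offsets = range(-half_width, half_width + 1)
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # near the ends, use only the years that exist
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed

# Made-up yearly skew values with one outlier season in the middle.
yearly_skew = [1.0, 1.0, 1.0, 1.6, 1.0, 1.0, 1.0]
print([round(v, 3) for v in smooth(yearly_skew)])
```

    The outlier season is pulled toward its neighbors rather than removed, which is the intended effect of the smoothing.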

    It makes sense to smooth the data because although players change from season to season...the overall strength of the league cannot possibly fluctuate by overly large amounts...there are hundreds of players in any given league...turnover from year to year is no larger than 5-10% so we should expect league strengths to change gradually except in extreme circumstances like during WWII.

    I'm considering alternatives to this normally weighted running mean idea...I may for instance measure the skewness of a longer period of years than one...perhaps skew is more persistent if you include more than one year of data...either way...the smoothed values were eye popping and aligned very well with my expectations for where baseball was weak and where it was strong.

    But this doesn't end the problem.

    Assuming Smoothed skew is an appropriate measure of league strength, we need to put it in a form that allows strong leagues to score higher than weak leagues...and it would in fact be ideal if we got the scores to range from 0 to 1 so that they could be used multiplicatively...(for instance...if we rate 1872 as a 0.5 league...we would cut player wins in half in 1872 to get an idea of how many wins they'd be worth in a strong league)

    We can make use of the exponential function here...it makes sense to use the exponential given that major league baseball represents the top of the baseball pyramid and the drop in skew value from typical leagues to great ones is likely to be large.

    It also gives us the right range if used properly. Skewness can theoretically range from 0 to infinity in this case (it can't range negatively because of the lower bound at zero)...if we take a skewness of zero...e^0 = 1...if we take a skewness value approaching infinity e^large = large...ah but if we make that e^-skew...-0 is still zero (so e^-0 is still 1), but -large implies 1/(e^large) which asymptotically approaches zero.

    One more step though...no baseball league...no matter how great...will ever have a skew of zero. Here's the nasty part where I have to arbitrarily pick a marginal skew value. This was just me visually examining the graph of smoothed skew with time and seeing what the skew appeared to be approaching (the overall curved trend appears to be leveling off slowly but surely).

    I chose a value of 0.8 as the minimum skew...though I experimented with other values.

    This was applied by simply subtracting 0.8 from each skew value obtained by the smoothing process before converting them with the exponential decay function.
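    The conversion described above, sketched in Python. The function names are hypothetical, and the sketch does not clamp the result, so a smoothed skew below 0.8 would produce a rating above 1:

```python
import math

MARGINAL_SKEW = 0.8  # the arbitrarily chosen minimum skew

def era_difficulty_rating(smoothed_skew, marginal_skew=MARGINAL_SKEW):
    """Exponential decay of the skew in excess of the marginal value."""
    return math.exp(-(smoothed_skew - marginal_skew))

def implied_skew(rating, marginal_skew=MARGINAL_SKEW):
    """Invert the conversion: the smoothed skew that would give this rating."""
    return marginal_skew - math.log(rating)

def adjust_wins(player_wins, rating):
    """Multiplicative use of the rating: scale wins by league difficulty."""
    return player_wins * rating

# A league sitting exactly at the marginal skew rates 1.0.
print(era_difficulty_rating(0.8))

# A hypothetical 0.5 league cuts a 10-win season to 5 wins.
print(adjust_wins(10, 0.5))
```

    implied_skew is only a convenience for reading a rating back into a smoothed skew value; it is not part of the method described above.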

    The end result is quite interesting to me...

    Here are the top 20 most difficult leagues by this method:
    Code:
    Year    Lg      Strength
    1984	AL	0.968
    1985	AL	0.967
    1997	AL	0.947
    1995	AL	0.946
    1996	AL	0.943
    1998	AL	0.942
    1986	AL	0.941
    1983	AL	0.941
    1983	NL	0.932
    1933	AL	0.928
    1934	AL	0.928
    1999	AL	0.925
    1994	AL	0.925
    1982	NL	0.923
    1937	AL	0.919
    1935	AL	0.913
    1938	AL	0.912
    1936	AL	0.909
    1987	AL	0.907
    1962	AL	0.906
    And the 20 weakest leagues:
    Code:
    Year    Lg      Strength
    1910	NL	0.691
    1909	AL	0.690
    1944	NL	0.688
    1902	NL	0.687
    1901	NL	0.683
    1885	NL	0.682
    1905	AL	0.679
    1911	NL	0.675
    1881	NL	0.666
    1875	NA	0.665
    1906	AL	0.663
    1908	AL	0.654
    1907	AL	0.651
    1874	NA	0.637
    1873	NA	0.614
    1884	NL	0.612
    1882	NL	0.589
    1872	NA	0.578
    1883	NL	0.560
    1871	NA	0.528
    The early deadball era looks to me to have been very weak competitively...though obviously not as bad as the old National Association...which plays like a modern AA or A league.

    Thoughts from the peanut gallery?

  2. #2
    Join Date
    Sep 2005
    Location
    Portland, OR
    Posts
    12,654
    Blog Entries
    1
    Any way you could put this in simpler terms? I have no idea what you're trying to say. By "stronger" league, do you mean harder to excel offensively in? If that's the case, how can you have 5 seasons from 1995-present in the "hardest" list when this happened:

    AVG MLB TEAM

    YEAR - HR - ERA

    2005 – 167 -- 4.29
    2004 – 182 -- 4.47
    2003 – 174 -- 4.41
    2002 – 169 -- 4.28
    2001 – 182 -- 4.41
    2000 – 190 -- 4.77
    1999 – 184 -- 4.71
    1998 – 169 -- 4.46
    1997 – 166 -- 4.39
    1996 – 177 -- 4.61
    1995 – 146 -- 4.45
    "Everyone left here, but I remain at my post, documenting my sports writers and photos. I don't do Ty Cobb anymore. I did for him everything I could do. Work will live on. Personalities will fade.

    Fever members come and go. Not relevant. Your documentations will live FOREVER, my brother. That outweighs all the Fever jack-asses. Ignore what you must, document all you can."
    - Bill Burgess

  3. #3
    Join Date
    Apr 2004
    Location
    mariners country
    Posts
    23,570
    --What would make you assume it means hardest to excel offensively? What league difficulty measures is how good the league is and as a result how hard it is to separate from the pack in that league.

  4. #4
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    This isn't a measure of difficulty to hit...to measure that one needs only the league's mean run scoring tendency.

    This is a measure of the overall depth of the player pool and therefore the difficulty in being significantly better than average.

    Both for hitters and for pitchers/fielders.

    It can be measured using only the Run Scored Distribution because both the offense and the defense must contribute if a team is to score a large number of runs in a game...you have to both do your own hitting...and have the defense you face suck.

    Primarily, the reason league difficulty expresses itself (at least...it appears to express itself) in a lack of extreme games is that when talent is spread more evenly...it's harder to bash a team's head in.

  5. #5
    Join Date
    Sep 2005
    Location
    Connecticut
    Posts
    1,964
    I'm surprised that '43 & '45 do not appear in the top 20 weakest list.

  6. #6
    Join Date
    Sep 2005
    Location
    Portland, OR
    Posts
    12,654
    Blog Entries
    1
    Quote Originally Posted by leecemark
    --What would make you assume it means hardest to excel offensively?
    I didn't know, that's why I was asking. I'm not a big saber guy. In fact halfway through his original post I started seeing spots and had to get up for another beer

    What league difficulty measures is how good the league is and as a result how hard it is to separate from the pack in that league.

    Ok, that makes sense. If the game is easier for everyone offensively, then it would be harder to stand out from the pack. Makes sense that certain players would try to do something to stand out as well.

  7. #7
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    It's not so much that the game is "easier for everyone" offensively...it's more that everyone is so good offensively that the league context is being dragged upward...hence making it harder to be better than the standard...but yeah...you have the idea. It also does start to explain some of the desperation by today's players toward getting an edge...

    Pitchers and hitters alike.

    I'm sorry if my opening post was dense..I wasn't going for dense...just trying to fully explain where I got the idea for skewness research.

  8. #8
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    As for '43 and '45...fear not...the war did have a very noticeable upward pull on both AL and NL skews...the difference was that the AL started out a much stronger league than the NL so the deleterious effects of the player loss was not as sharply felt there...and '43 and '45 in the NL are both in the top 40 weakest leagues...the curve is pretty cool actually...a perfectly timed spike in skew (drop in difficulty) right during the war.

  9. #9
    Join Date
    Sep 2005
    Location
    Portland, OR
    Posts
    12,654
    Blog Entries
    1
    Quote Originally Posted by SABR Matt
    It's not so much that the game is "easier for everyone" offensively...it's more that everyone is so good offensively that the league context is being dragged upward...hence making it harder to be better than the standard...but yeah...you have the idea. It also does start to explain some of the desperation by today's players toward getting an edge...

    Pitchers and hitters alike.

    I'm sorry if my opening post was dense..I wasn't going for dense...just trying to fully explain where I got the idea for skewness research.
    It wasn't dense, just over my head apparently. I was done at the r^2y or whatever you put. It's all good though.

    I guess it's a matter of opinion as to why it's harder to stand out in today's game. I would say it's like restrictor plates in NASCAR. Everybody has a strong engine (body) and MLB has given them other small advantages that add up to create more offense than we've ever seen before. The top 5% talent can no longer rise further above because it's easier for the middle 40% to put up bigger numbers. It's all opinion apparently though.

  10. #10
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    well you're not disagreeing with me...

    "Everyone has strong engines (bodies)..."

    That's precisely it...the athletes are all better...batters, fielders, and pitchers are all better today than they were in 1920. The increase in offense is mostly created by the ball type and the fact that we've just had a period where the hitters were better than the pitchers (it goes in cycles...hitters dominated the 1880s and 1890s, pitchers owned the deadball era, hitters dominated in the 20s and 30s...pitchers dominated from the mid fifties to the early 90s...the hitters are back...that advantage is starting to wane though...pitching and defense are starting to balance now)...not to mention the smaller parks of today...

    And BTW...it's not really true that this is the most offense we've ever had...there was more offense in 1930 and in 1894 than there is today.

  11. #11
    Join Date
    Sep 2005
    Location
    Portland, OR
    Posts
    12,654
    Blog Entries
    1
    Quote Originally Posted by SABR Matt

    And BTW...it's not really true that this is the most offense we've ever had...there was more offense in 1930 and in 1894 than there is today.
    Yeah, these parks are ridiculous and so are all the other factors.

    What are you basing that comment on, runs/game?


    The average team hit 98 HR in '30, and the average ERA was 4.80

    The average team hit 182 HR in 2004, average ERA was 4.47

  12. #12
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    Offense is offense...doesn't matter in this context how we came by it...I'm basing the comment on runs/game. Runs/game in the modern game are higher than normal, but these years don't match the 20s/30s spike for offensive proficiency.

  13. #13
    Join Date
    Aug 2005
    Posts
    14,096
    Blog Entries
    2
    Yes, Bill has a timeline adjustment that is simple, but it was also for a simple rating system. In the same book he has something like a ten or twelve factor timeline system, one that includes quality of play and the conditions of the playing field.

    Also can you show us bottom 10 or 20 of only the 20th century? Perhaps a ranking of all the 20th century seasons. Thanks.

  14. #14
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    Sure.

    Let me do something to avoid people getting too obsessed with exact ranks of modern seasons...

    Let's focus on the last 100 years of the FL, AL and NL only...that's 202 leagues from 1905 to 2004

    I'm going to give the leagues letter grades in order to break the sample up into "similar" groups. I don't want people going "how could you have year X in the NL 20 ranks below year Y in the AL!" when the numerical rank is very similar so I'll give you groupings...they will be ranked in reverse order of difficulty though for anyone curious about exact ordinal rank.

    Higher Difficulty - higher Letter Grade
    Code:
    Year    Lg      Gr      EDR
    1984	AL	A+	0.968
    1985	AL	A+	0.967
    1997	AL	A+	0.947
    1995	AL	A+	0.946
    1996	AL	A+	0.943
    1998	AL	A+	0.942
    1986	AL	A	0.941
    1983	AL	A	0.941
    1983	NL	A	0.932
    1933	AL	A	0.928
    1934	AL	A	0.928
    1999	AL	A	0.925
    1994	AL	A	0.925
    1982	NL	A	0.923
    1937	AL	A	0.919
    1935	AL	A	0.913
    1938	AL	A	0.912
    1936	AL	A	0.909
    1987	AL	A	0.907
    1962	AL	A-	0.906
    2000	AL	A-	0.904
    1932	AL	A-	0.902
    1961	AL	A-	0.901
    1984	NL	A-	0.900
    1982	AL	A-	0.899
    1963	AL	B+	0.897
    2004	NL	B+	0.897
    2001	AL	B+	0.891
    1960	AL	B+	0.887
    1993	AL	B+	0.884
    2002	AL	B+	0.882
    2003	AL	B+	0.880
    1939	AL	B+	0.880
    1998	NL	B+	0.879
    2004	AL	B+	0.878
    1997	NL	B+	0.877
    1981	NL	B	0.875
    1964	AL	B	0.874
    1988	AL	B	0.874
    2003	NL	B	0.872
    1941	AL	B	0.870
    1940	AL	B	0.869
    1959	AL	B	0.868
    1999	NL	B	0.867
    1961	NL	B	0.866
    2000	NL	B	0.864
    1926	AL	B	0.863
    1962	NL	B	0.863
    2001	NL	B	0.862
    1931	AL	B	0.860
    2002	NL	B	0.859
    1981	AL	B	0.858
    1960	NL	B	0.858
    1996	NL	B	0.856
    1989	AL	B	0.855
    1942	AL	B	0.854
    1963	NL	B	0.853
    1985	NL	B	0.852
    1975	AL	B	0.852
    1927	AL	B-	0.851
    1992	AL	B-	0.850
    1964	NL	B-	0.848
    1974	AL	B-	0.846
    1965	AL	B-	0.844
    1959	NL	B-	0.843
    1990	AL	B-	0.842
    1958	AL	B-	0.841
    1991	AL	B-	0.841
    1928	NL	B-	0.840
    1980	AL	B-	0.839
    1976	AL	C+	0.839
    1965	NL	C+	0.838
    1977	AL	C+	0.838
    1995	NL	C+	0.837
    1978	AL	C+	0.836
    1992	NL	C+	0.836
    1979	AL	C+	0.835
    1993	NL	C+	0.833
    1925	AL	C+	0.832
    1994	NL	C+	0.831
    1970	NL	C+	0.831
    1973	AL	C+	0.830
    1948	NL	C+	0.829
    1971	AL	C+	0.829
    1986	NL	C+	0.829
    1943	AL	C	0.829
    1916	NL	C	0.828
    1927	NL	C	0.827
    1971	NL	C	0.826
    1929	NL	C	0.825
    1947	NL	C	0.825
    1914	AL	C	0.825
    1915	NL	C	0.825
    1958	NL	C	0.823
    1966	AL	C	0.823
    1972	AL	C	0.823
    1928	AL	C	0.821
    1991	NL	C	0.821
    1930	AL	C	0.819
    1914	NL	C	0.819
    1970	AL	C	0.818
    1980	NL	C	0.817
    1944	AL	C	0.816
    1949	NL	C	0.816
    1946	AL	C	0.815
    1913	AL	C	0.815
    1966	NL	C	0.814
    1945	AL	C	0.814
    1947	AL	C	0.813
    1955	NL	C	0.812
    1956	NL	C	0.810
    1975	NL	C	0.809
    1972	NL	C	0.808
    1969	NL	C	0.808
    1951	NL	C	0.807
    1987	NL	C	0.807
    1940	NL	C	0.807
    1967	AL	C-	0.807
    1950	NL	C-	0.807
    1957	NL	C-	0.806
    1948	AL	C-	0.806
    1915	FL	C-	0.806
    1914	FL	C-	0.806
    1976	NL	C-	0.805
    1929	AL	C-	0.805
    1952	NL	C-	0.805
    1951	AL	C-	0.803
    1969	AL	C-	0.803
    1954	NL	C-	0.803
    1941	NL	C-	0.802
    1939	NL	C-	0.801
    1953	NL	C-	0.800
    1921	NL	D+	0.798
    1974	NL	D+	0.798
    1920	AL	D+	0.798
    1968	AL	D+	0.797
    1930	NL	D+	0.797
    1973	NL	D+	0.795
    1957	AL	D+	0.795
    1917	NL	D+	0.795
    1949	AL	D+	0.795
    1952	AL	D+	0.791
    1950	AL	D+	0.791
    1921	AL	D	0.787
    1922	NL	D	0.787
    1977	NL	D	0.787
    1990	NL	D	0.786
    1926	NL	D	0.784
    1920	NL	D	0.783
    1988	NL	D	0.782
    1912	AL	D	0.782
    1967	NL	D	0.781
    1968	NL	D	0.780
    1938	NL	D	0.780
    1919	AL	D	0.780
    1946	NL	D	0.779
    1913	NL	D	0.779
    1915	AL	D	0.778
    1979	NL	D	0.777
    1924	AL	D	0.775
    1934	NL	D	0.773
    1931	NL	D	0.773
    1923	NL	D	0.773
    1978	NL	D	0.772
    1933	NL	D	0.772
    1932	NL	D	0.770
    1989	NL	D-	0.768
    1919	NL	D-	0.762
    1924	NL	D-	0.762
    1942	NL	D-	0.761
    1911	AL	D-	0.761
    1935	NL	D-	0.761
    1925	NL	D-	0.760
    1918	NL	D-	0.759
    1953	AL	D-	0.757
    1922	AL	D-	0.756
    1937	NL	D-	0.755
    1936	NL	D-	0.746
    1956	AL	D-	0.746
    1918	AL	D-	0.738
    1923	AL	D-	0.736
    1910	AL	D-	0.731
    1908	NL	F	0.726
    1954	AL	F	0.725
    1909	NL	F	0.721
    1945	NL	F	0.721
    1916	AL	F	0.720
    1907	NL	F	0.718
    1906	NL	F	0.718
    1955	AL	F	0.715
    1912	NL	F	0.714
    1905	NL	F	0.714
    1917	AL	F	0.705
    1943	NL	F	0.704
    1910	NL	F	0.691
    1909	AL	F	0.690
    1944	NL	F	0.688
    1905	AL	F	0.679
    1911	NL	F	0.675
    1906	AL	F	0.663
    1908	AL	F	0.654
    1907	AL	F	0.651
    That's from 1905-2004...thoughts?

  15. #15
    Join Date
    Aug 2005
    Posts
    14,096
    Blog Entries
    2
    I see the Cubs greatest run is during a time in the NL that you have them ranked as an F.
    Is that because the league was so bad or because the Cubs were so great? In other words, how much effect did the Cubs have on that rating? Is it possible that the team assembled so many of the better players of that era on one team that it made the whole league look lower skilled when in reality the talent was merely dispersed unevenly?

  16. #16
    The way I've always adjusted is very non-scientific, but is:

    =1.05-((R27-ERA)/R27)
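
    The same spreadsheet formula written out in Python, as a sketch only. It assumes R27 means total runs allowed per 27 outs (the post doesn't define it), so that (R27 - ERA) / R27 is the share of runs that were unearned. The function name and the sample numbers are hypothetical:

```python
def quick_era_adjustment(r27, era):
    """1.05 minus the share of runs per 27 outs that were unearned."""
    unearned_share = (r27 - era) / r27
    return 1.05 - unearned_share

# Made-up league: 4.50 total runs per 27 outs, 4.00 of them earned.
print(round(quick_era_adjustment(4.50, 4.00), 3))
```

    A league with no unearned runs at all would come out at 1.05 under this formula, and the factor drops as the unearned share grows.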

  17. #17
    Join Date
    Aug 2005
    Posts
    14,096
    Blog Entries
    2
    Also, can you put an attachment on your post so that one can download that list? I would like to be able to separate out the leagues and put them in a timeline order. Thanks.

  18. #18
    Join Date
    May 2005
    Location
    Western Massachusetts
    Posts
    11,560
    I'm sure you put a lot of research into this, and it looks like it's returned reasonable results, but there is one major thing I must point out. I see you have the 1989 NL rated at a D-, and the 1989 AL rated at a B. Does that really make any sense? Why is it in a modern era that the AL should be so far ahead of the NL? That doesn't seem to make much sense to me at all. Could you perhaps explain to me why it comes out that way, or offer any reasonable explanation for that? To me, that could be a major flaw in this system. I can't imagine league quality between leagues was ever that far apart, or if it was, not in the modern era.

  19. #19
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    Well...now I see the flaw in using letter grades...LOL

    At least the way I've scaled them. The difference between a D- and a B isn't that huge...modern baseball from about 1950 on is very homogeneous...the spread of league difficulties is very small. I perhaps should have rated all of the teams in baseball history when I came up with those quick reference grades...and I definitely should have used a mean/standard deviation method instead of just breaking it up into proportions the way I did.

    There was a significant difference between the AL and the NL in the late 80s though...quick...top of your head...name five great players who were at their best from 1985-1990...no cheating and looking guys up...

    I did that little exercise when I was looking at the data to see if it made sense.

    I obviously am not saying this is going to be perfect...nor would I claim this is the end of my research...I just found it worthy of posting here because I believe I'm "on the right track"...I do believe skewness of the run scoring distribution is going to be the key to seeing league difficulty.

    I have more things I want to try to smooth the data more scientifically than I did here.

    I'll convert the answers I have to this point into a text file and attach them so you can recreate the nifty graph I have of difficulty by league and year.

  20. #20
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    Quote Originally Posted by BillyF29
    The way I've always adjusted is very non-scientific, but is:

    =1.05-((R27-ERA)/R27)
    Interesting...you tried to capture the ERROR rate...the rate at which runs score on sloppy play...as your method for era adjusting...clever idea...not bad as a quick thumbnail.

  21. #21
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    BTW...in answer to the question about those great Cubs teams...one team's success shouldn't theoretically have an extreme impact on the entire league's run scoring distribution...nonetheless...I think it's a little of both...I think there weren't very many great players in the NL in the deadball era...but the ones there were all managed to land on a few teams...creating some really AWFUL clubs...and some good ones. I think that SHOULD have a significant negative impact on how we view those '00s Cubs...yes they had good players...but we can't be as confident that they really were GREAT as we could be if they were doing their winning in a much tougher era.

    Think about it this way...the 2001 Mariners won 116 games in a league that was MUCH more demanding...who would you rather have...the '06 Cubs who won their games against horrible foes...or the '01 Mariners who contended with a couple of bad teams but largely well distributed talent?

  22. #22
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    Attachment...

    This also includes the statistical documentation you need to see how I got weighted (by league games) mean and standard deviation to calculate new letter grades...they're slightly less arbitrary and make a little more sense now...sorry about that...
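
    For anyone who wants to try the mean/standard deviation grading without the attachment, here is a rough sketch. The grade cutoffs (in standard deviations) and the game counts are placeholders invented for illustration, not the values in the attached file; only the four EDR values are taken from the list posted earlier:

```python
import math

def weighted_mean_sd(values, weights):
    """Mean and (population-style) standard deviation weighted by league games."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

# Hypothetical cutoffs: standard deviations above or below the weighted mean.
GRADE_BANDS = [(1.5, "A"), (0.5, "B"), (-0.5, "C"), (-1.5, "D")]

def letter_grade(edr, mean, sd):
    """Map an EDR to a letter using its distance from the mean in SD units."""
    z = (edr - mean) / sd
    for cutoff, grade in GRADE_BANDS:
        if z >= cutoff:
            return grade
    return "F"

# (label, EDR from the posted list, placeholder number of league games)
sample = [
    ("1984 AL", 0.968, 1100),
    ("1989 AL", 0.855, 1100),
    ("1989 NL", 0.768, 950),
    ("1907 AL", 0.651, 600),
]

mean, sd = weighted_mean_sd([s[1] for s in sample], [s[2] for s in sample])
for label, edr, _games in sample:
    print(label, letter_grade(edr, mean, sd))
```

    With the full list of leagues in place of the four-row sample, the grades depend on where the cutoffs are set, so treat the bands above as a starting point to tune.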

    If you guys want I can screencap my graph from Excel and post it here as a jpeg.
    Attached Files

  23. #23
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    I trust you know how to get it from text file to something you can play with in Excel, Ubiquitus...it's pretty straightforward. If anyone else wants to attempt it but doesn't know how to go from text file to excel file...let me know and I'll explain.

  24. #24
    Join Date
    Aug 2005
    Posts
    14,096
    Blog Entries
    2
    Unfortunately excel is on the fritz but after jumping through some hoops I got it to work for quattro.

    Anyway I don't know if I can buy the results or not.

    From 1947 on the quality of NL play is supposedly pretty mediocre, then smack dab in the middle of expansion it gets better, slides back down during the pitching era, then the Latin explosion happens and the quality slides back even further and doesn't get good again until the 1980s.

    Over in the American League after expansion the league stays mediocre to bad then again right around expansion gets good, falls back to mediocre for the pitching era then falls back to mediocre during the Latin expansion and again doesn't get good again until the '80s.

  25. #25
    Join Date
    May 2005
    Location
    Where all students live...nowhere.
    Posts
    8,900
    I quickly calculated a second EDR where instead of taking skew values for individual leagues and doing a normally weighted mean, I actually took the skew of all of the data in that seven year period.

    To avoid the changing means presenting a bias, I used the (x-u) values previously calculated for one year samples...so as to make each season independent of its run scoring mean...the result was nearly identical to the first effort except that rapid drops and increases were flattened out a little more and large gaps between leagues seemed to shrink some, particularly in the modern era.
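
    A sketch of how that pooled calculation could look. It assumes the pooled (x-u) values are simply concatenated across the seasons in the window and that one standard deviation is computed over the whole pool; the function name and data are made up:

```python
import math

def pooled_skew(seasons):
    """Skew of several seasons pooled together, each centered on its own mean."""
    deviations = []
    for runs in seasons:
        u = sum(runs) / len(runs)  # that season's own run scoring mean
        deviations.extend(x - u for x in runs)
    n = len(deviations)
    # The pooled deviations average zero by construction, so no re-centering.
    s = math.sqrt(sum(d * d for d in deviations) / (n - 1))
    return sum(d ** 3 for d in deviations) / ((n - 1) * s ** 3)

# Two made-up seasons with the same shape but very different scoring levels.
low_year = [0, 1, 2, 2, 3, 3, 4, 5, 7, 12]
high_year = [x + 10 for x in low_year]

print(f"pooled skew: {pooled_skew([low_year, high_year]):.4f}")
```

    Because each season is centered before pooling, shifting one season's scoring level up or down leaves the pooled skew unchanged. In practice the list would hold the seven seasons of the window rather than two.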

    The 1989 example brought up here for instance formerly had EDRs of .764 for the NL and .855 for the AL...the second EDR was .811 for the NL, .862 for the AL...still a noteworthy difference...though now instead of being a D+ vs a B-...it's a C vs. a B-.
