Standard deviation of team run differential

  • Standard deviation of team run differential

    Has anyone compiled standard deviations among team run differentials for MLB seasons?

    To be clear, let's say there are 8 teams in the league. For each team, take runs scored minus runs allowed. This is the variable that I would like to see the league standard deviation for.

    I think I read that people have looked at it as a sign of league quality, so I was hoping it has been done. If not, someone might like to take it as a project. Doesn't seem too extensive.

    Also, I saw a stat on BBRef: defensive efficiency. Is that just 1-BABIP?

  • #2
    Originally posted by brett View Post
    To be clear, let's say there are 8 teams in the league. For each team, take runs scored minus runs allowed. This is the variable that I would like to see the league standard deviation for.
    I'm not sure what this means. Subtracting RA from RS wouldn't be a measure of deviation. It's just a difference. Do you mean find the deviation based on those numbers?

    As I learned it, standard deviation is the units (measured in std devs) for which a figure deviates from the average. http://www.socialresearchmethods.net...s/statdes3.gif

    We would find the run differentials for each team, average them (will always be 0), then subtract the average from each sample (using absolute value because we are only measuring distance). Add all of these up, divide by the entire sample set minus one, and finish by taking the square root. Is this what you're looking for?

    1901 AL:
    Orioles score 761, allow 750 (+11)
    Americans score 759, allow 608 (+151)
    White Sox score 819, allow 632 (+187)
    Blues score 666, allow 831 (-165)
    Tigers score 742, allow 696 (+46)
    Brewers score 641, allow 828 (-187)
    Athletics score 805, allow 760 (+45)
    Senators score 683, allow 771 (-88)

    Average run differential: 0

    (11+151+187+165+46+187+45+88+0)=880
    divided by sample size minus one=110.
    Square root= about 10.5. Standard deviation is 10.5 runs.

    Is this what you mean?
    "Allen Sutton Sothoron pitched his initials off today."--1920s article



    • #3
      Originally posted by brett View Post

      Also, I saw a stat on BBRef: defensive efficiency. Is that just 1-BABIP?
      Yes. Not sure what they do with bunt singles or sacrifice flies, however.
      1885 1886 1926 1931 1934 1942 1944 1946 1964 1967 1982 2006 2011

      1887 1888 1928 1930 1943 1968 1985 1987 2004 2013

      1996 2000 2001 2002 2005 2009 2012 2014 2015


      The Top 100 Pitchers In MLB History
      The Top 100 Position Players In MLB History



      • #4
        Originally posted by Tyrus4189Cobb View Post
        I'm not sure what this means. Subtracting RA from RS wouldn't be a measure of deviation. It's just a difference. Do you mean find the deviation based on those numbers?

        As I learned it, standard deviation is the units (measured in std devs) for which a figure deviates from the average. http://www.socialresearchmethods.net...s/statdes3.gif

        We would find the run differentials for each team, average them (will always be 0), then subtract the average from each sample (using absolute value because we are only measuring distance). Add all of these up, divide by the entire sample set minus one, and finish by taking the square root. Is this what you're looking for?

        1901 AL:
        Orioles score 761, allow 750 (+11)
        Americans score 759, allow 608 (+151)
        White Sox score 819, allow 632 (+187)
        Blues score 666, allow 831 (-165)
        Tigers score 742, allow 696 (+46)
        Brewers score 641, allow 828 (-187)
        Athletics score 805, allow 760 (+45)
        Senators score 683, allow 771 (-88)

        Average run differential: 0

        (11+151+187+165+46+187+45+88+0)=880
        divided by sample size minus one=110.
        Square root= about 10.5. Standard deviation is 10.5 runs.

        Is this what you mean?
        That is the set of data I want, but you square each team's run differential, add the squares together, divide by the number of teams, and then take the square root, like this:

        1901 AL:
        Orioles score 761, allow 750 (+11), squared = 121
        Americans score 759, allow 608 (+151), squared = 22,801
        White Sox score 819, allow 632 (+187), squared = 34,969
        Blues score 666, allow 831 (-165), squared = 27,225 (all squares are positive)
        Tigers score 742, allow 696 (+46), squared = 2,116
        Brewers score 641, allow 828 (-187), squared = 34,969
        Athletics score 805, allow 760 (+45), squared = 2,025
        Senators score 683, allow 771 (-88), squared = 7,744
        Sum of all squared values = 131,970
        Divided by 8 = 16,496.25
        Square root = 128.4, so the standard deviation would be 128.4 runs.
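
The arithmetic above can be checked with a short script. This is only a sketch: the team figures are the 1901 AL numbers from this post, the name `run_diff_sd` is made up for illustration, and it divides by the number of teams (as described here) rather than by n-1.

```python
import math

# 1901 AL runs scored and runs allowed, copied from the post above.
AL_1901 = {
    "Orioles": (761, 750),
    "Americans": (759, 608),
    "White Sox": (819, 632),
    "Blues": (666, 831),
    "Tigers": (742, 696),
    "Brewers": (641, 828),
    "Athletics": (805, 760),
    "Senators": (683, 771),
}


def run_diff_sd(teams):
    """Square root of the mean squared (RS - RA), i.e. the SD about a mean of zero."""
    diffs = [rs - ra for rs, ra in teams.values()]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    print(round(run_diff_sd(AL_1901), 1))
```

Because the eight differentials sum to zero in this league, measuring the spread about zero and about the league mean gives the same answer.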



        • #5
          Yup, 128.4 is correct. I forgot to square the differentials. How does it relate to league quality?


          • #6
            League QUALITY? Well, in a completely open league, where anyone of any age could play, you would probably have a greater disparity between teams than in a highly filtered league with a high threshold. But there are lots of other factors at work. For example, high minor leagues might have a lower standard deviation of run differentials than majors because of a negative feedback effect: Once a player is established as outstanding, he leaves the league. And the worst players are replaced from below.

            In general, wins and losses line up pretty well with run differential, so it seems like you are asking whether balanced or unbalanced leagues are of higher quality. What is the lure of using run differential instead of just won-lost records?

            (edit) Sorry, Ty, I see you were asking brett, who would actually know the answer, not throwing it out for the group. Sorry, brett, for butting in.
            Last edited by Jackaroo Dave; 09-27-2012, 04:36 PM.
            Indeed the first step toward finding out is to acknowledge you do not satisfactorily know already; so that no blight can so surely arrest all intellectual growth as the blight of cocksureness.--CS Peirce



            • #7
              Maybe it has nothing to do with LQ. I'm interested anyway. What if the variation is large, though, because a few teams have a large share of the best players? It would suggest to me that this would tend to trap more value on their benches and in their minors, while the bottom teams might be using replacement-level guys. If you balance things out, there are fewer MLB-quality guys on benches or in the minors of good teams.
              Plus, if we are talking about the highest league (versus the minors), you aren't going to lose good players to a better league.

              Even if it has no direct connection, I think it would be interesting to see anyway.

              Anyone want to do it just for one year per decade to see if there are trends?



              • #8
                Originally posted by brett View Post
                Anyone want to do it just for one year per decade to see if there are trends?
                I'll do it. Gimme a bit


                • #9
                  Code:
                  1905	RS	RA	Diff	Diff^2
                  BOS	578	564	14	196
                  CHW	614	450	164	26896
                  CLE	563	587	-24	576
                  DET	511	608	-97	9409
                  NYH	588	622	-34	1156
                  PHA*	623	492	131	17161
                  SLB	509	608	-99	9801
                  WSH	560	623	-63	3969
                  Mean Diff^2				8645.5
                  Std Dev				92.98

                  1915	RS	RA	Diff	Diff^2
                  BOS*	668	499	169	28561
                  CHW	717	509	208	43264
                  CLE	539	669	-130	16900
                  DET	778	597	181	32761
                  NYY	583	588	-5	25
                  PHA	545	889	-344	118336
                  SLB	521	679	-158	24964
                  WSH	570	491	79	6241
                  Mean Diff^2				33881.5
                  Std Dev				184.07

                  1925	RS	RA	Diff	Diff^2
                  BOS	640	921	-281	78961
                  CHW	811	773	38	1444
                  CLE	783	817	-34	1156
                  DET	901	828	73	5329
                  NYY	706	774	-68	4624
                  PHA	831	713	118	13924
                  SLB	898	907	-9	81
                  WSH*	828	670	158	24964
                  Mean Diff^2				16310.375
                  Std Dev				127.71

                  1935	RS	RA	Diff	Diff^2
                  BOS	718	732	-14	196
                  CHW	738	750	-12	144
                  CLE	776	739	37	1369
                  DET*	919	665	254	64516
                  NYY	818	632	186	34596
                  PHA	710	869	-159	25281
                  SLB	718	930	-212	44944
                  WSH	823	902	-79	6241
                  Mean Diff^2				22160.875
                  Std Dev				148.87

                  1945	RS	RA	Diff	Diff^2
                  BOS	599	674	-75	5625
                  CHW	596	633	-37	1369
                  CLE	557	548	9	81
                  DET*	633	565	68	4624
                  NYY	676	606	70	4900
                  PHA	494	638	-144	20736
                  SLB	597	548	49	2401
                  WSH	622	562	60	3600
                  Mean Diff^2				5417
                  Std Dev				73.60

                  1955	RS	RA	Diff	Diff^2
                  BAL	540	754	-214	45796
                  BOS	755	652	103	10609
                  CHW	725	557	168	28224
                  CLE	698	601	97	9409
                  DET	775	658	117	13689
                  KCA	638	911	-273	74529
                  NYY*	761	569	192	36864
                  WSH	598	789	-191	36481
                  Mean Diff^2				31950.125
                  Std Dev				178.75

                  1965	RS	RA	Diff	Diff^2
                  BAL	641	578	63	3969
                  BOS	669	791	-122	14884
                  CAL	527	569	-42	1764
                  CHW	647	555	92	8464
                  CLE	663	613	50	2500
                  DET	680	602	78	6084
                  KCA	585	755	-170	28900
                  MIN*	774	600	174	30276
                  NYY	611	604	7	49
                  WSA	591	721	-130	16900
                  Mean Diff^2				11379
                  Std Dev				106.67

                  1975	RS	RA	Diff	Diff^2
                  BAL	682	553	129	16641
                  BOS*	796	709	87	7569
                  CAL	628	723	-95	9025
                  CHW	655	703	-48	2304
                  CLE	688	703	-15	225
                  DET	570	786	-216	46656
                  KCR	710	649	61	3721
                  MIL	675	792	-117	13689
                  MIN	724	736	-12	144
                  NYY	681	588	93	8649
                  OAK**	758	606	152	23104
                  TEX	714	733	-19	361
                  Mean Diff^2				11007.333
                  Std Dev				104.92

                  1985	RS	RA	Diff	Diff^2
                  BAL	818	764	54	2916
                  BOS	800	720	80	6400
                  CAL	732	703	29	841
                  CHW	736	720	16	256
                  CLE	729	861	-132	17424
                  DET	729	688	41	1681
                  KCR*	687	639	48	2304
                  MIL	690	802	-112	12544
                  MIN	705	782	-77	5929
                  NYY	839	660	179	32041
                  OAK	757	787	-30	900
                  SEA	719	818	-99	9801
                  TEX	617	785	-168	28224
                  TOR**	759	588	171	29241
                  Mean Diff^2				10750.143
                  Std Dev				103.68

                  1996	RS	RA	Diff	Diff^2
                  BAL***	949	903	46	2116
                  BOS	928	921	7	49
                  CAL	762	943	-181	32761
                  CHW	898	794	104	10816
                  CLE**	952	769	183	33489
                  DET	783	1103	-320	102400
                  KCR	746	786	-40	1600
                  MIL	894	899	-5	25
                  MIN	877	900	-23	529
                  NYY*	871	787	84	7056
                  OAK	861	900	-39	1521
                  SEA	993	895	98	9604
                  TEX**	928	799	129	16641
                  TOR	766	809	-43	1849
                  Mean Diff^2				15746.857
                  Std Dev				125.49

                  2005	RS	RA	Diff	Diff^2
                  BAL	729	800	-71	5041
                  BOS***	910	805	105	11025
                  CHW*	741	645	96	9216
                  CLE	790	642	148	21904
                  DET	723	787	-64	4096
                  KCR	701	935	-234	54756
                  LAA**	761	643	118	13924
                  MIN	688	662	26	676
                  NYY**	886	789	97	9409
                  OAK	772	658	114	12996
                  SEA	699	751	-52	2704
                  TBD	750	936	-186	34596
                  TEX	865	858	7	49
                  TOR	775	705	70	4900
                  Mean Diff^2				13235.143
                  Std Dev				115.04
                  Where * denotes pennant winner, ** denotes division winner, and *** denotes wild card. Ideally, lgRS-lgRA=0 (league runs scored should equal league runs allowed). Any discrepancies are due to missing RetroSheet data or interleague (IL) play. If you need any more, let me know. I made a program to calculate it with relative ease.
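
The program itself was not posted. A minimal sketch of how such a calculation might look, assuming Python and using only the 1905 and 1945 rows from the tables above (`SEASONS` and `summarize` are illustrative names, not the original code). It also reports the league's net differential, and the SD taken about the actual mean, since the two measures differ slightly when the league total is not zero.

```python
import math
import statistics

# (team, runs scored, runs allowed), copied from the 1905 and 1945 tables above.
SEASONS = {
    1905: [("BOS", 578, 564), ("CHW", 614, 450), ("CLE", 563, 587),
           ("DET", 511, 608), ("NYH", 588, 622), ("PHA", 623, 492),
           ("SLB", 509, 608), ("WSH", 560, 623)],
    1945: [("BOS", 599, 674), ("CHW", 596, 633), ("CLE", 557, 548),
           ("DET", 633, 565), ("NYY", 676, 606), ("PHA", 494, 638),
           ("SLB", 597, 548), ("WSH", 622, 562)],
}


def summarize(rows):
    """Return (league net differential, RMS about zero, SD about the actual mean)."""
    diffs = [rs - ra for _, rs, ra in rows]
    rms_about_zero = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    sd_about_mean = statistics.pstdev(diffs)  # population SD (divides by n)
    return sum(diffs), rms_about_zero, sd_about_mean


for year, rows in sorted(SEASONS.items()):
    total, rms, sd = summarize(rows)
    print(f"{year}: league diff {total:+d}, RMS about zero {rms:.2f}, SD about mean {sd:.2f}")
```

The "Std Dev" rows in the tables correspond to the RMS-about-zero figure. For 1905 the league differential comes to -8 rather than 0, so the two measures differ, though only in the third decimal place.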


                  • #10
                    Originally posted by brett View Post
                    Maybe it has nothing to do with LQ. I'm interested anyway. What if the variation is large, though, because a few teams have a large share of the best players? It would suggest to me that this would tend to trap more value on their benches and in their minors, while the bottom teams might be using replacement-level guys. If you balance things out, there are fewer MLB-quality guys on benches or in the minors of good teams.
                    Plus, if we are talking about the highest league (versus the minors), you aren't going to lose good players to a better league.
                    Oh, yeah, I suggested the high minors as an exception to the general rule that LQ varies inversely with competitive imbalance: The quality of play is poorer than in the big leagues, but the run diff SD is, maybe, smaller.

                    In his article on why there are no .400 hitters, Stephen Jay Gould suggests that it's because inter-player variation has shrunk, so a smaller BA gap now represents a larger SD difference. And this in turn is because baseball has evolved. Just like cars: there's much less variation in auto body shape, because they are all approaching the same aerodynamic ideal.


                    • #11
                      Run differential might track runs scored. I know it doesn't linearly--I was surprised how little difference in run diff there was between high- and low-scoring eras. But there is probably some effect. I would guess the absolute diff would increase for both the best and the worst teams.

                      Another possibly troublesome factor is that changes like expansion, contraction, integration, and war affect both quality and standard deviation: Expansion reduces overall quality by bringing in lower-level players, and by the same token increases individual variation. If the distribution of new players is not equitable (as in the 60s), then variation at the team level would increase. League contraction works in the opposite direction, and again, I would guess that some teams can take more advantage of this than others. (Ditto integration. War is tougher to figure out.)

                      So you have quality and variation moving in opposite directions, but neither one an independent variable, both dependent on big, sudden changes in league structure.

                      Not to horn in on this, but Ty, if it really is no trouble, why not run every year? For example, if we had 1913-16, we could look for any effects due to the Federal League.


                      • #12
                        Originally posted by Tyrus4189Cobb View Post
                        (AL run differential tables, 1905-2005, from post #9)
                        Random fluctuation, or does somebody see anything obvious that correlates with the 1915, 1945, and 1955 outliers?


                        • #13
                          A couple of other issues or points of interest: It probably doesn't matter unless one tries to work out a specific relationship between the variations in league quality and run differentials, but . . . .

                          Run differential values aren't independent. Obviously, a plus run for one team is a minus run for another. So you wind up with twice as many differential runs as there are actual runs, just as you do with wins and losses. I think counting a run scored as half a differential run, and a run allowed as minus half, just as a win counts as a half game, might make sense.

                          I'm a little antsy about measuring variation with the standard deviation, for the following reasons. The standard deviation is nice because the sample mean typically has an approximately normal distribution, whose own standard deviation, aka the standard error, is estimated by the sample standard deviation divided by the square root of the sample size. That helps you make inferences and estimates about the population mean. As a purely descriptive statistic, however, it is no better than the mean absolute deviation (the average of the absolute value of each team's difference from the mean).

                          But in this case, the sample mean is not a variable: it's a constant; it's always equal to the population mean, zero. The number of runs ahead is equal to the number of runs behind. The sample mean has no standard deviation, because it's a constant. Consequently, no kind of statistical work connecting Run Differential to another variable can be "parametric," i.e. based on the assumption that the sample mean is a normally distributed random variable with a certain mean and a certain standard deviation.
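
To make the comparison concrete, here is a small sketch that computes both descriptive measures for the 1901 AL differentials quoted earlier in the thread, and shows what the "half a differential run" scaling does to them. The helper name `mean_abs_dev` is made up for illustration; `statistics.pstdev` and `statistics.fmean` are standard-library functions.

```python
import statistics

# 1901 AL run differentials (RS - RA), from earlier in the thread.
diffs_1901 = [11, 151, 187, -165, 46, -187, 45, -88]


def mean_abs_dev(values):
    """Average absolute distance of each value from the mean."""
    m = statistics.fmean(values)
    return statistics.fmean([abs(v - m) for v in values])


# Counting each run as half a differential run, as suggested above.
half_runs = [d / 2 for d in diffs_1901]

print("MAD:", mean_abs_dev(diffs_1901), " SD:", round(statistics.pstdev(diffs_1901), 2))
print("MAD (half runs):", mean_abs_dev(half_runs), " SD (half runs):", round(statistics.pstdev(half_runs), 2))
```

The mean absolute deviation here is 110 runs, which is the figure post #2 arrived at before taking a square root. Halving every differential simply halves both measures, so the choice of scale does not change how seasons compare with one another.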


                          • #14
                            Originally posted by Jackaroo Dave View Post
                            Not to horn in on this, but Ty, if it really is no trouble, why not run every year? For example, if we had 1913-16, we could look for any effects due to the Federal League.
                            That will take some time because I have to input every year. I can't do it right now but I'll get on it ASAP.


                            • #15
                              Originally posted by Matthew C. View Post
                              Random fluctuation, or does somebody see anything obvious that correlates with the 1915, 1945, and 1955 outliers?
                              Well, oddly, I thought '45 would be high because of the war. Pull out x percent and bring up the next guy. Plus, didn't they deaden the ball?
