Looking for the Perfect “Batter” Recipe
I've attempted to apply the data I consider most important to a question raised in many threads: Who are the best hitters in MLB history? One thread debated right-handed hitters, so I decided to start with righties here.
I limit my own scope of history to the period 1901 – Present, so pre-1901 players are not included, except for two who had considerable careers in 1901 and after.
My primary focus is on runs created per plate appearance [RC/PA], for which I use Baseball Reference as the primary source. Here are the steps I’ve taken:
1. Realizing that batters perform in varying run scoring “climates,” I made a grid for each season and each league [1901 through 2011] into which I put the actual RC/PA rates.
2. I then scanned each player’s career and established a league-average RC/PA rate for the seasons that career covered.
3. I equated 550 PA to a “season” and derived total “seasons” from dividing TPA by 550.
4. I applied each player’s RC/PA to the 550 PA to get a “season” RC.
5. Doing the same application for league average RC for the season, I arrived at a league RC for 550 PA.
6. Subtracting 5 above from 4 above, we get the player’s RC margin over league average per “season.”
7. Multiplying 6 by “seasons” we get career batting runs against contemporary average.
8. I also wanted to get a sense of the degree [%] to which each player’s RC “rate” exceeded the average production of his MLB contemporaries over his career.
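For anyone who wants to rerun steps 3 through 8, here is a minimal Python sketch of the arithmetic. The names `career_line` and `CareerLine` are my own for illustration, and the 9834 PA figure is simply 17.88 "seasons" times 550, back-computed from the season count in this post rather than taken from an official total.

```python
from typing import NamedTuple

SEASON_PA = 550  # the post's convention: 550 PA = one "season"


class CareerLine(NamedTuple):
    pct_above_league: float       # step 8
    player_rc_550: float          # step 4
    league_rc_550: float          # step 5
    margin_550: float             # step 6
    seasons: float                # step 3
    career_runs_above_avg: float  # step 7


def career_line(rc_per_pa: float, league_rc_per_pa: float, total_pa: float) -> CareerLine:
    """Apply steps 3-8 to one player's career rates."""
    if league_rc_per_pa <= 0 or total_pa < 0:
        raise ValueError("league rate must be positive and PA non-negative")
    seasons = total_pa / SEASON_PA
    player_rc_550 = rc_per_pa * SEASON_PA
    league_rc_550 = league_rc_per_pa * SEASON_PA
    margin_550 = player_rc_550 - league_rc_550
    pct = (rc_per_pa / league_rc_per_pa - 1.0) * 100.0
    return CareerLine(pct, player_rc_550, league_rc_550, margin_550,
                      seasons, margin_550 * seasons)


# Killebrew's rates from this post; 9834 PA is an assumption (17.88 * 550).
killebrew = career_line(0.1633, 0.107984, 9834)
print(killebrew)
```

The sketch carries full precision through every step, so its last digits can differ slightly from figures that were rounded to two decimals along the way.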
Example: Harmon Killebrew had a career that spanned 1954-1975, with many seasons along the way having limited play, also tapering off near the end. I applied all partial seasons and arrived at a total of 17.88 “seasons,” made a bit more uniform [and convenient] by the convention of 550 PA = 1 “season.”
Killebrew’s own RC rate was .1633 per PA. The sum of all his career seasons took place in League play which averaged .107984 RC/PA.
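One way to arrive at a single career league rate like this is to weight each season's league RC/PA by the player's own PA in that season. That weighting is my assumption for the sketch below, and the three-season career in it is hypothetical, not Killebrew's actual record.

```python
def career_league_rate(seasons):
    """PA-weighted league RC/PA over a career.

    `seasons` is a list of (player_pa, league_rc_per_pa) pairs, one per season.
    Weighting by the player's PA is an assumption of this sketch.
    """
    total_pa = sum(pa for pa, _ in seasons)
    if total_pa <= 0:
        raise ValueError("career must contain at least one plate appearance")
    return sum(pa * rate for pa, rate in seasons) / total_pa


# Hypothetical career: a full season, a half season, and a cup of coffee.
example = [(600, 0.105), (300, 0.110), (100, 0.120)]
print(round(career_league_rate(example), 4))
```

With this weighting, a season of limited play pulls the career league rate less than a full season does.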
Killebrew: .1633*550 = 89.82 RC per 550 PA. His contemporaries were .107984*550 = 59.39, giving Killebrew +30.43 runs above contemporary average per season. Over 17.88 seasons, that totals about 544 batting runs above contemporary average.
I then condensed all of Killebrew’s career into a run expectancy matrix [devised by Tom Tangotiger] to get a run generation projection of 7.118 runs per 37 AB, the default model.
Tangotiger’s Run Expectancy Matrix was created by Markov chain regression analysis and uses At-Bats [rather than PA] as its primary input and driving force. However, since K’s and BB’s are factored in, there is an element of PA performance at play as well.
The product of all this is, I hope, a useful tool for posters who might appreciate a digest of hitter career production with a sense of competitive context provided. The data in each player’s summary line, I believe, allows various viewpoints to be considered.
The Killebrew Model:
Killebrew .1633>.10798 = +51.23%; 550= 89.82>59.39=+30.43 * 17.88 = 544.03
I did this for each of the players mentioned in this post.
The Players: [Final Entry = Career RC > Contemporary MLB AVG; Figure in brackets = run matrix runs projection, each career reduced to 37 AB model].
Ho. Wagner .1584>.10632 = +48.98%; 550= 87.12>58.48= +28.64*17.53= 502.06 [6.727]
Lajoie .1571>.10727= +46.46%; 550= 86.40>59.0= +27.4* 15.5= 411.27 [6.275]
Greenberg .2117>.1261= +67.87%; 550= 116.44>69.36= +47.08*10.98= 516.94 [9.223]
F. Robinson .1812>.10736= +68.8%; 550= 99.68>59.05= +40.63*21.35= 867.41 [7.534]
J. DiMaggio .2045>.12477= +63.9%; 550= 112.48>68.62= +43.86*13.95= 611.82 [8.457]
Killebrew .1633>.10798= +51.23%; 550= 89.82>59.39= +30.43*17.88= 544.03 [7.118]
Piazza .1779>.12093= +47.11%; 550= 97.86>65.61= +32.35*14.08= 455.55 [7.443]
Pujols .2197>.1175= +86.98%; 550= 120.83>64.64= +56.19*11= 618.09 [9.517]
Foxx .2201>.12792= +72.06%; 550= 121.06>70.36= +50.7*17.63= 893.79 [9.991]
H. Wilson .1854>.1229= +50.84%; 550= 101.97>67.6= +34.37*9.9= 340.26 [7.950]
Heilmann .1858>.12023= +54.54%; 550= 102.19>66.13= +36.06*14.85= 535.49 [8.186]
Mays .1895>.10993= +72.38%; 550= 104.23>60.46= +43.77*22.72= 994.45 [7.843]
Hornsby .2161>.115143= +87.68%; 550= 118.86>63.33= +55.53*15.645= 868.77 [9.817]
F. Thomas .1987>.126607= +56.94%; 550= 109.29>69.63 = +39.66*15.225= 603.82 [9.070]
H. Aaron .1831>.1058= +73.06%; 550= 100.71>58.19= +42.52*21.85= 929.06 [7.216]
V. Guerrero .1794>.12253= +46.41%; 550= 98.72>67.39= +31.33*13.56= 424.83 [7.452]
M. Ramirez .2051>.12703= +61.46%; 550= 112.81>69.87= +42.94*16.425= 705.29 [8.905]
Kiner .1829>.11581= +57.93%; 550= 100.58>63.70= +36.88*10= 368.80 [8.504]
“D.” Allen .1769>.10525= +68.08%; 550= 97.30>57.89= +39.41*11.4= 449.27 [7.340]
McGwire .1996>.12537= +59.20%; 550= 109.78>68.96= +40.82*12.375= 505.15 [8.491]
Bagwell .1896>.1186= +59.87%; 550= 104.27>65.23= +39.04*13.65= 532.90 [8.283]
A. Rodriguez .1942>.12694= +52.98%; 550= 106.80>69.82= +36.98*15.9= 587.98 [7.785]
M. Schmidt .1757>.10655= +64.90%; 550= 96.04>58.60= +37.44*15.71= 588.18 [7.361]
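Since every line above packs several derived figures into one string, a short script can recompute them from the three inputs [player RC/PA, league RC/PA, "seasons"]. This is only a sketch: the function name `recompute` is mine, and it carries full precision, so its results can differ from the listed figures by a few hundredths per season, and by a few tenths over a career, where I rounded along the way.

```python
SEASON_PA = 550  # 550 PA = one "season", as in the steps above

# (name, player RC/PA, league RC/PA, "seasons"), copied from three lines above
ROWS = [
    ("Greenberg", 0.2117, 0.1261, 10.98),
    ("Pujols", 0.2197, 0.1175, 11.0),
    ("Mays", 0.1895, 0.10993, 22.72),
]


def recompute(rate, league_rate, seasons):
    """Return (percent above league, margin per 550 PA, career runs above average)."""
    pct = (rate / league_rate - 1.0) * 100.0
    margin = (rate - league_rate) * SEASON_PA
    return pct, margin, margin * seasons


for name, rate, league_rate, seasons in ROWS:
    pct, margin, career = recompute(rate, league_rate, seasons)
    print(f"{name}: {pct:+.2f}%; per 550 PA {margin:+.2f}; career {career:.2f}")
```

A line whose recomputed figures land far from the listed ones is worth a second look for a typing slip.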
As for the bracketed run expectancy for each player, comparison with the Baseball Reference Runs Created/Game [RC/G] reveals a close correlation.
In turn, RC/G is the same as RC per 27 Outs, which at some point was defined as the runs a team made up entirely of this player would produce in a nine-inning game.
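The RC per 27 Outs idea is simple enough to sketch in a couple of lines of Python. The function name is mine, the sample numbers are hypothetical, and exactly which events count as outs is left to the caller, since I am not reproducing Baseball Reference's own definition here.

```python
def rc_per_27_outs(runs_created: float, outs_made: float) -> float:
    """Scale runs created to 27 outs, i.e. one nine-inning game's worth."""
    if outs_made <= 0:
        raise ValueError("outs_made must be positive")
    return 27.0 * runs_created / outs_made


# Hypothetical season: 100 runs created while making 400 outs.
print(rc_per_27_outs(100, 400))
```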
Hope somebody finds this all useful. It helped me sort my own beliefs a bit more sharply, with some players being better appreciated than I'd had them before. One player in particular, whom I always regarded as "pretty special," looms larger than ever, as I see it.