
Thread: Data available - Ratings and "Halls"

  1. #26

    Pitchers, edition 637 now available

    This concerns the second dataset described in the thread.

    637pitchers.txt is attached (below)

    Format and dimensions
    - one comma-delimited table, 638 x 22, where the first row gives field names
    NOTE. The format is comma-separated values, which is commonly indicated by the filename extension ".csv". Here the filename ends in ".txt" because BaseBall-Fever doesn't accept ".csv" attachments. Renaming the file to *.csv may help you remember the format, at least.
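
    For readers who don't use a spreadsheet, here is a minimal sketch of reading a table in this layout with Python's standard csv module. The function name read_table and the sample row are made up for illustration; only the field names come from the description in this post.

```python
import csv
import io

def read_table(source):
    """Return the rows of a comma-delimited table as dicts keyed by field name."""
    # skipinitialspace tolerates a blank after each comma, in case the file has them.
    return list(csv.DictReader(source, skipinitialspace=True))

# Made-up sample in the documented layout; not a real row from the file.
sample = io.StringIO(
    "playerID,nameFirst,nameLast,debut\n"
    "example01,Some,Pitcher,1901\n"
)
rows = read_table(sample)
print(len(rows), rows[0]["debut"])
```

    For the attachment itself, pass an open file instead of the sample, for example open("637pitchers.txt", newline=""). Every value comes back as text, so convert numeric fields before doing arithmetic.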

    - The Baseball Database for MS Access, "lahman5.4" covering 1871-2006. Multiple editions, annually updated, are distributed at Sean Lahman's Baseball Archive.
    - player pages at
    - player pages or 'DT cards' at

    Extensive Scope
    : 637 pitchers or major league pitching careers, 1871-2008.
    They include everyone with one of these career achievements,
    - at least 2000 innings (407 checked against career leaders by Innings at bb-ref)
    - at least 1500 innings and 109 ERA+ (239)
    - at least 1000 innings and 116 ERA+ (186 checked against career leaders by ERA+)
    They also include everyone from the phase one polls for "Top 100 Pitchers" at (346?).

    Intensive Scope
    • playerID, nameFirst, nameLast, debut
    • lahmanID, time, name
    • XIP, RA, DH, DR, DW, NRA, RAA, PRAA, PRAR, DERA
    • IP, ERA+, PA, OPS+
    • RpS (Reliefs per Start)

    playerID, nameFirst, nameLast, debut
    At the last minute, for convenience I included these four fields from lahman5.4, edited only by truncating [debut] at four characters, the debut year only. [playerID] rather than lahmanID (below) is used in that database to identify records in tables of season playing data. [playerID] is also used to define internet addresses for player pages at both baseball-reference and baseballprospectus.

    lahmanID, time, name

    [lahmanID] is the unique numerical identifier for people in the baseball-databank and in lahman5.4

    [time] represents mlb debut decade as an integer -3 to 10 meaning 1870s to 2000s
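
    A small sketch of decoding [time], assuming the code is linear between the two endpoints given above, so that 0 means the 1900s. The function name debut_decade is hypothetical.

```python
def debut_decade(time):
    """Convert the [time] code to the first year of the debut decade."""
    # Assumption: the code is linear, -3 -> 1870, 0 -> 1900, 10 -> 2000.
    if not -3 <= time <= 10:
        raise ValueError("time code outside the documented range -3..10")
    return 1900 + 10 * time

print(debut_decade(-3), debut_decade(10))
```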

    [name] is a version of the pitcher's name, surname followed by one space and some initial(s).

    XIP, RA, DH, DR, DW, NRA, RAA, PRAA, PRAR, DERA (blue: scanty coverage of lesser pitchers)

    The first ten fields of performance data are new sabrmetrics by Clay Davenport from the baseballprospectus player 'DT cards'. Three of them (blue) are present only in some records that I added or checked recently. Career coverage should be complete through 2008, but the sabrmetrics evolve and I systematically checked only active players after the 2008 season.

    [DERA] is a measure of pitching quality on a runs per 9 innings scale. Par is 4.50, so 450/DERA is an index on the ERA+ scale. I call it 'DERA+'. Compared with other normalized measures of pitcher runs allowed (or runs/9), the crucial difference is that Davenport uses his own attribution of runs to pitchers and fielders rather than using official earned and unearned runs. The other "normalizations" are essentially what other sabrmetricians use, adjusting for the team's run-scoring environment.
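
    The 'DERA+' index described above is a one-line calculation. This sketch uses a hypothetical function name dera_plus and made-up DERA values.

```python
def dera_plus(dera):
    """Index on the ERA+ scale: par DERA of 4.50 maps to 100."""
    return 450.0 / dera

print(dera_plus(4.50), dera_plus(3.00))
```

    A lower DERA gives a higher index, as with ERA+.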

    IP, ERA+, PA, OPS+
    The sources for IP, ERA+, PA, and OPS+ are player pages at and "the Baseball Database" lahman54. That database covers 1871-2006. I updated many active players thru 2008 mainly by checking against career leaders, IP and ERA+. For the active players I tried to remember to check innings at the baseballprospectus DT cards.

    [RpS] for 'Reliefs per Start' is a career statistic: games pitched in relief or ^Reliefs^ divided by ^Starts plus one^. Specifically, using official pitching statistics G and GS, that is (G-GS)/(GS+1). The highest values of RpS are the numbers of pitching games for a few relief pitchers who never started; the lowest value is zero for several starting pitchers who never relieved. RpS is derived from the 1871-2006 database without systematic update thru 2008.
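
    The RpS formula can be sketched as follows. The function name rps and the game totals are made up for illustration.

```python
def rps(g, gs):
    """Reliefs per Start: relief games divided by starts plus one."""
    return (g - gs) / (gs + 1)

print(rps(500, 0))    # a reliever who never started: RpS equals games pitched
print(rps(300, 300))  # a starter who never relieved: RpS is zero
```

    Adding one to the starts keeps the ratio defined for pitchers with no starts, which is why the top values equal career games pitched.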
    Last edited by Paul Wendt; 02-06-2009 at 06:26 PM. Reason: link, filename extension

  2. #27

    637 Pitchers: extension PA, correction OPS+

    I have revised the table of pitchers data that is attached to the preceding article, namely 637pitchers.txt (no change in filename).

    Now it provides career PA for all 637 pitchers, PA=0 if no plate appearances, and career OPS+ for everyone with PA>0.

    At one time I had entered OPS+ = -50 or -30 or 0 for many pitchers with small numbers of PA. A month or three ago I forgot about that and entered another batch or two of pitchers with their true OPS+, and only recently noticed the mixture. Oops.

    Considerations regarding small number of Plate Appearances are now left to the user.

    Leaders by career OPS+, current edition
    252 Gregg Olson, 5 pa
    207 Babe Ruth, 10616 pa
    166 Roberto Hernandez, 2 pa
    145 Terry Forster, 86 pa
    133 Bob Caruthers, 2906 pa
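
    Since small numbers of plate appearances are left to the user, here is a sketch of one way to apply a cutoff. The threshold of 100 PA is an arbitrary assumption for illustration, and the function name with_min_pa is hypothetical; the five rows are the leaders listed above.

```python
# (OPS+, name, PA) for the five leaders listed above.
LEADERS = [
    (252, "Gregg Olson", 5),
    (207, "Babe Ruth", 10616),
    (166, "Roberto Hernandez", 2),
    (145, "Terry Forster", 86),
    (133, "Bob Caruthers", 2906),
]

def with_min_pa(rows, min_pa):
    """Keep only the rows with at least min_pa plate appearances."""
    return [row for row in rows if row[2] >= min_pa]

qualified = with_min_pa(LEADERS, 100)  # 100 is an arbitrary cutoff
print([name for _, name, _ in qualified])
```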

    Trailers by career OPS+, current edition
    -100 John Lackey
    -100 sixteen others in this table

    No success at bat? (OPS+ = -100)
    The minimum possible value of OPS+ is -100 corresponding to "perfect" .000 onbase and slugging averages.
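
    To see where -100 comes from, here is a sketch using the commonly cited form of OPS+, 100 x (OBP/lgOBP + SLG/lgSLG - 1). I am assuming the career OPS+ in the table follows that form; park adjustments are left out, and the league averages below are made-up numbers.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """Commonly cited OPS+ formula, without park adjustment."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)

# Hypothetical league averages, for illustration only.
LG_OBP, LG_SLG = 0.330, 0.400

print(ops_plus(0.0, 0.0, LG_OBP, LG_SLG))        # no success at bat
print(ops_plus(LG_OBP, LG_SLG, LG_OBP, LG_SLG))  # a league-average batter
```

    With .000 onbase and slugging averages both ratios vanish, which leaves 100 x (-1) whatever the league averages are.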

    John Lackey is the leader in this table with no success in 22 career plate appearances through 2008; the other "flawless" batters in this table have 1 to 8 career PA. Some pitchers with OPS+ = -100 may be reasonably successful batters, given what they are asked or expected to do, if they have many successful sacrifice hits. Remarkably, Lackey has only seven strikeouts yet no sacrifice hits.

    No plate appearances (PA = 0, OPS+ is null)
    There are 13 pitchers in this table with no plate appearances and therefore null OPS+. Some users may prefer to enter zeroes.

    Mike Flanagan is the leader with 2770 innings pitched. He must be the all-time innings leader in the category, which is practically limited to American League careers 1973 to 1993. In seven World Series plate appearances Flanagan walked once.
    Last edited by Paul Wendt; 02-06-2009 at 06:23 PM. Reason: embellish, clarify "in this table"

  3. #28

    fielding games by team-season and position (new file)

    This is the third dataset described in the thread, first edition. The file is attached.
    This table reports playing time by fielding position at the team-season level. It covers every major league team-season 1871 to 2006 (more than 2500 team-seasons) and eight fielding positions {P, C, 1B, 2B, SS, 3B, OF, DH}.

    Format and dimensions
    - one comma-delimited table, 19790 x 6, where the first row gives field names
    NOTE. The format is comma-separated values, which is commonly indicated by the filename extension ".csv". Here the filename ends in ".txt" because BaseBall-Fever doesn't accept ".csv" attachments. Renaming the file may help you remember the format, at least. Change it to *.csv if you commonly use MS Excel.

    Explanation by example
    (lines 1 and 5 in the file)
    : yearID, lgID, teamID, POS, maxPlayer, sumPlayer
    : 1871, NA, BS1, C, 0.93, 1.16

    yearID, lgID, teamID, POS
    1871, NA, BS1, C

    The first four columns identify a major league team-season and a fielding position including designated hitter. The example is the 1871 Boston Red Stockings and the catcher position.

    maxPlayer, sumPlayer
    0.93, 1.16

    The last two columns give games fielding the position expressed as a share of team games, first for the team's most frequent fielder at the position (its "regular" in a sense) and then for everyone on the team in sum. The latter is one plus the frequency of in-game switches and substitutions at the fielding position.

    : One player, Cal McVey, fielded catcher in 93% of team games.
    : The team used 1.16 catchers per game and made 0.16 catcher changes per game.

    Complete Fielding Games (cFG)

    [maxPlayer] and [sumPlayer] imply floors under the team's and the leading player's complete fielding games expressed as a share of the season (cFG share).

    team's cFG share >= 2 - sumPlayer
    : 1871 BRS catcher cFG share >= 2 - 1.16 = 0.84

    regular's cFG share >= maxPlayer + 1 - sumPlayer
    : McVey's catcher cFG share >= 0.93 + 1 - 1.16 = 0.77
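
    The two floors can be sketched in code. My reading of why they hold is that each in-game change at the position spoils at most one complete game, and there are sumPlayer - 1 changes per game. The function names are hypothetical; the inputs are the 1871 Boston catcher figures from the example.

```python
def changes_per_game(sum_player):
    """In-game changes at the position per team game."""
    return sum_player - 1

def team_cfg_floor(sum_player):
    """Floor under the team's complete fielding games share."""
    return 2 - sum_player

def regular_cfg_floor(max_player, sum_player):
    """Floor under the regular's complete fielding games share."""
    return max_player + 1 - sum_player

# 1871 Boston, catcher: maxPlayer 0.93, sumPlayer 1.16
print(round(changes_per_game(1.16), 2))
print(round(team_cfg_floor(1.16), 2))
print(round(regular_cfg_floor(0.93, 1.16), 2))
```

    These are floors only. If one game sees two or more changes at the position, the true shares are higher.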

    The original Boston Red Stockings enjoyed at least 84% complete games fielding catcher including at least 77% complete games from Cal McVey (84% and 77% of the team season).

    Because pitcher is one of the fielding positions, the note on complete fielding games pertains to complete pitching games too.

    These data are useful for studying the distribution of fielding time across teams and history. Of course there is no information here concerning why a team's "regular" thirdbaseman played 80% of its games. Did he join the team after the first month, or frequently play shortstop, or spend 20% of the team season resting, suspended, and disabled?

    add: This is the third distinct dataset covered in the thread.
    Last edited by Paul Wendt; 02-06-2009 at 06:10 PM.

  4. #29

    3B in 1900

    Using the file attached to the preceding article:

    Here is thirdbase in 1900 when there were only 8 NL teams, the smallest year for the major leagues since 1881.

    teamID POS maxPlayer sumPlayer

    BRO 3B ; 0.82 1.02
    BSN 3B ; 0.99 1 -- Collins, the only fixture in the league
    CHN 3B ; 0.72 1
    CIN 3B ; 0.46 1 -- Steinfeldt, who played half third and half second
    NY1 3B ; 0.85 1.03
    PHI 3B ; 0.71 1.03
    PIT 3B ; 0.73 1.02
    SLN 3B ; 0.69 1.04 -- McGraw, a long holdout and sometimes injured

    Boston enjoyed great stability at third, with one man, Jimmy Collins, working 99% of its games. Only two other teams used even a 75% thirdbaseman, and one of the eight, Cincinnati, did not have any one man play half of its games there. Harry Steinfeldt subbed for Charlie Irwin at third, then replaced Joe Quinn, the short-term successor to Bid McPhee at second.

    Boston and two other teams never changed thirdbasemen during a game (sumPlayer=1). St Louis used 1.04 3Bmen per game, which means five or six in-game changes.
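
    The selections behind these remarks can be reproduced from the eight rows in this post. This is only a sketch; the variable names are mine.

```python
# (teamID, maxPlayer, sumPlayer) at 3B in 1900, copied from the table above.
THIRD_BASE_1900 = [
    ("BRO", 0.82, 1.02), ("BSN", 0.99, 1.00), ("CHN", 0.72, 1.00),
    ("CIN", 0.46, 1.00), ("NY1", 0.85, 1.03), ("PHI", 0.71, 1.03),
    ("PIT", 0.73, 1.02), ("SLN", 0.69, 1.04),
]

# Teams whose regular played at least 75% of team games at third.
regulars_75 = [t for t, m, _ in THIRD_BASE_1900 if m >= 0.75]
# Teams without even a half-time thirdbaseman.
no_half_timer = [t for t, m, _ in THIRD_BASE_1900 if m < 0.50]
# Teams that never changed thirdbasemen during a game.
never_changed = [t for t, _, s in THIRD_BASE_1900 if s == 1.00]

print(regulars_75, no_half_timer, never_changed)
```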
    Last edited by Paul Wendt; 02-06-2009 at 06:42 PM.

  5. #30

    first-year DH (1973)

    Again using the file attached to #28.
    I don't use Excel but I suppose that that program can execute these simple selections and averages.

    Here is the first DH season, 1973, when there were twelve AL teams.

    team use of the DH role, the first season
    yr lg tm pos maxPlayer sumPlayer
    1973 AL BAL DH ; 0.78 1.018 *
    1973 AL BOS DH ; 0.87 1.12
    1973 AL CAL DH ; 0.78 1.02 *
    1973 AL CHA DH ; 0.46 1.04 *
    1973 AL CLE DH ; 0.43 1.14
    1973 AL DET DH ; 0.73 1.27
    1973 AL KCA DH ; 0.22 1.16
    1973 AL MIN DH ; 0.87 1.07
    1973 AL ML4 DH ; 0.50 1.20
    1973 AL NYA DH ; 0.65 1.20
    1973 AL OAK DH ; 0.66 1.16
    1973 AL TEX DH ; 0.71 1.03 *

    1973 AL average ; 0.64 1.12

    Boston and Minnesota both used one man at DH in 87% of their games (Orlando Cepeda and Tony Oliva). That may not seem like much, but no other team used even an 80% DH and the league average was 64% of games at DH for the team leader.

    Four of the twelve teams (*) rarely replaced their designated hitters during the game, no more than 7 times in 162 games (1.04). On the other hand three teams led by Detroit (1.27) replaced the DH at least 20% of the time, which is more than thirty times during the season, Detroit more than forty times. The league-average 12% in-game substitution rate (1.12) seems high to me.
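
    For anyone who wants to repeat the selections and averages without a spreadsheet, here is a sketch using the twelve rows above. The 162-game season is the figure used in the text; the variable names are mine.

```python
# (team, maxPlayer, sumPlayer) at DH in 1973, copied from the table above.
DH_1973 = [
    ("BAL", 0.78, 1.018), ("BOS", 0.87, 1.12), ("CAL", 0.78, 1.02),
    ("CHA", 0.46, 1.04), ("CLE", 0.43, 1.14), ("DET", 0.73, 1.27),
    ("KCA", 0.22, 1.16), ("MIN", 0.87, 1.07), ("ML4", 0.50, 1.20),
    ("NYA", 0.65, 1.20), ("OAK", 0.66, 1.16), ("TEX", 0.71, 1.03),
]
GAMES = 162  # schedule length assumed in the text

avg_max = sum(m for _, m, _ in DH_1973) / len(DH_1973)
avg_sum = sum(s for _, _, s in DH_1973) / len(DH_1973)

# In-game DH changes per season implied by sumPlayer - 1.
changes = {team: (s - 1) * GAMES for team, _, s in DH_1973}
rare = [team for team, _, s in DH_1973 if s <= 1.04]
frequent = [team for team, _, s in DH_1973 if s >= 1.20]

print(round(avg_max, 2), round(avg_sum, 2))
print(rare, frequent)
```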

    Changing use of the DH role
    time period ; maxPlayer sumPlayer
    1973 AL average ; 0.64 1.12
    1974-79 average ; 0.54 1.17
    . . .
    2001-06 average ; 0.47 1.04

    The use of DH as a regular role for one player declined immediately after the first season. During the next six seasons 1974-79 the team-leading DHs filled that role in merely 54% of team games. There have been two relative heydays for the regular DH, 1981-1984 at about 64% of team games and 1991-96 at about the same.

    Presently teams make more DH changes between games than ever but they have now sharply reduced the number of in-game changes. During the last six seasons covered here 2001-2006 there was a DH change in merely 4% of games with 96% complete games (1.04).

    These notes entirely ignore use of the DH by National League teams in several interleague games every season.

    Ninety-Percent Designated Hitters
    In 34 seasons covered here we have seen only 29 "ninety-percent DHs" meaning players who filled the role in 90% of games during one team-season. The chart provides the annual numbers.

    1970s : - - - 0 2 1 0 1 2 1
    1980s : 0 3 5 2 0 1 0 0 1 0
    1990s : 1 1 0 1 3 1 0 0 1 0
    2000s : 1 0 0 0 0 1 0
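
    As a check on the chart, this sketch adds up the annual numbers and counts the seasons covered. The dashes for 1970-72, before the DH rule, are entered as None.

```python
# Annual counts of ninety-percent DHs, copied from the chart above.
NINETY_PERCENT_DH = {
    1970: [None, None, None, 0, 2, 1, 0, 1, 2, 1],
    1980: [0, 3, 5, 2, 0, 1, 0, 0, 1, 0],
    1990: [1, 1, 0, 1, 3, 1, 0, 0, 1, 0],
    2000: [1, 0, 0, 0, 0, 1, 0],
}

by_year = {
    decade + i: n
    for decade, counts in NINETY_PERCENT_DH.items()
    for i, n in enumerate(counts)
    if n is not None
}
total = sum(by_year.values())
seasons = len(by_year)
print(seasons, total)
```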

    What happened early in the 1980s?
    How many of the 29 ninety-percenters can you name?
    Last edited by Paul Wendt; 02-06-2009 at 07:20 PM.

  6. #31

    Pitchers, edition 848 now available

    This concerns the second dataset described in the thread, career data for pitchers.

    800pitchers.csv provides some career data for 848 pitchers.
    This new edition is available at for seven days.

    See the documentation for the preceding edition at #26 and 27 above.

    This one includes the current, spring 2009 edition of the ten "Advanced Pitching Statistics" by Clay Davenport, in ten columns "newXIP" to "newDERA". At the same time it covers about 200 more pitchers (details to follow).
    Last edited by Paul Wendt; 05-22-2009 at 08:14 PM.

  7. #32

    Pitchers, edition 848 now available

    The 848 pitchers include all 688 with 1500 or more innings, 1871 to 2008.

    Evidently there are 160 pitchers with fewer than 1500 career innings. The selection of those 160 is occasionally idiosyncratic but usually reflects fame or success in some sense.

    For some of the active pitchers with 1500 to 1999 innings, the Advanced Pitching Statistics by Davenport include part of the 2009 season.
    Last edited by Paul Wendt; 05-22-2009 at 08:26 PM.

  8. #33

    selected Rankings and Ratings for 1583 players

    This is the fourth dataset described in the thread.

    BJHOMCHONE.txt is attached. It's a smaller version of the big table introduced at post #1, which is not available here.

    This table covers 1583 players in 1584 rows and 25 columns. The filename extension is .txt but the format is csv. If you use Excel, rename it from *.txt to *.csv (which baseball-fever doesn't accept).

    The 1583 players are
    - all 1098 ranked by Bill James (2001) at nine fielding positions -- from the New BJHBA (print)
    - all 999 who rank in the Top 500 Pitchers or Top 500 Position Players by Sean Smith's WAR (2008) -- WAR at baseballprojection
    - all 240 in the Hall of Merit, and everyone who scored a point in the HOM annual elections 2008 to 2010 -- for example, 2010 results

    The name 'BJHOMCHONE' refers to Bill James, the Hall of Merit, and Sean Smith (chone). The central columns of the table provide
    - Bill James rankings (columns 5 to 7, or E-G in Excel)
    - Hall of Merit member rankings and recent election results (cols 8-16 or H-P)
    - Bill James' rating cWS, career win shares, for 1876-2001 (col 17 or Q)
    - Sean Smith's rating WAR, career wins above replacement, for 1871-2008 (cols 18-19 or R-S)

    I'm not sure when or whether I will provide more documentation than I did last month in the Hall of Merit discussion, "CHONE's WARP and the Hall of Merit" (#10, 70, 72).
    Last edited by Paul Wendt; 12-21-2009 at 08:23 AM. Reason: more references

  9. #34

    selected Rankings - some documentation

    brett asked about the Hall of Merit rankings yesterday in "Barry Larkin", which prompted me to post the table here. Over there I added this bit of pointed documentation (slightly revised).
    This 1583 x 25 table in csv format is attached at "Data available" (above).

    Columns 8-16 (Excel H-P) give Hall of Merit data, namely the results of 2008/2009 special ranking elections and the three annual elections 2008-2010.
    - 8-10, ranking all HOMers within fielding position (12 rankings including 4 for pitchers)
    - 11-12, ranking all HOMers who are not in Cooperstown within eligibility groups (4 rankings)
    - 14-16, scores or points in annual elections 2010 2009 2008 (3 official election results)

    - 18-19, Sean Smith's rating WAR, career wins above replacement, for 1871-2008
    Reference: WAR at baseballprojection
    : Smith has updated that site to cover 2009 and to mark HOM members in the Top 500 lists.
    : The two Top 500 lists feature batting/baserunning/fielding WAR and pitching WAR respectively. Neither one covers the whole player, except Position Players who never pitched in the majors.
    : column 18 (R) "WAR" in my table gives career b/b/f WAR or career pitching WAR, whichever is greater. There are 999 entries because John Ward is in both Top 500 lists.
    : column 19 (S) "WAR-alt" in my table gives career b/b/f WAR or career pitching WAR, whichever is lesser. There are about 200 entries that I looked up manually, primarily for leading Pitchers including everyone in the Top 500 down to George Uhle (WAR 34.5, WAR-alt 11.1, so whole player WAR=45.6). There may be a few dozen entries for secondary pitchers including 10 well-known and important secondary pitchers such as Babe Ruth.
    : Only the sum of the two columns rates the whole player.
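
    A sketch of rating the whole player from columns 18 and 19. Returning None when "WAR-alt" is blank is my assumption, since a blank there means the lesser figure was not looked up rather than that it is zero. The function name is hypothetical.

```python
def whole_player_war(war, war_alt=None):
    """Sum of the greater and lesser career WAR, or None if the lesser is missing."""
    if war_alt is None:
        # Assumption: a blank WAR-alt means "not looked up", not zero.
        return None
    return war + war_alt

# George Uhle, from the figures above: WAR 34.5, WAR-alt 11.1
print(round(whole_player_war(34.5, 11.1), 1))
```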
    Last edited by Paul Wendt; 12-21-2009 at 08:27 AM.

  10. #35
    Thank you fellas! This will be a nice complement to my Historical Win Shares Spreadsheet (all players, all years)
    Your Second Base Coach
    Garvey, Lopes, Russell, and Cey started 833 times and the Dodgers went 498-335, for a .598 winning percentage. That's equal to a team going 97-65 over a season. On those occasions when at least one of them missed his start, the Dodgers were 306-267-1, which is a .534 clip. That works out to a team going 87-75. So having all four of them added 10 wins to the Dodgers per year.
