# Thread: The PCA (DIPS-based) pitching metric.

1. ## The PCA (DIPS-based) pitching metric.

Given the level of skepticism about, and interest in, how I get my pitching rankings/ratings, I've decided to post the exact methodology I use to evaluate pitchers, both to seek feedback and in the interest of full disclosure.

This isn't a "simple" method, so it's going to take a bit to explain. If you're interested, please bear with me; I'll use a concrete pitcher record to illustrate.

Here is the pitching record I will analyze:
Tom Glavine

Focus your attention on his 2001 season. It's fairly instructive so we'll use it.

STEP ONE - PURE DIPS ANALYSIS
Statistics of interest at this point in the analysis:

IPOuts - 659
H - 213
K - 116
BB - 97
HR - 24
HBP - 2
WP - 2
BK - 0
PkO - 9
A - 40

From the IPOuts, H, HR and K data, we know that Glavine allowed 732 balls in play (IPOuts + H - K - HR = 659 + 213 - 116 - 24).
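A minimal Python sketch of the BIP count, using exactly the formula above. The helper name `balls_in_play` is hypothetical. Because the formula counts outs rather than plays, the second out of a double play (and any caught-stealing or pickoff outs recorded in IPOuts) is also counted as a ball in play.

```python
def balls_in_play(ip_outs: int, hits: int, strikeouts: int, home_runs: int) -> int:
    """Estimate balls in play as IPOuts + H - K - HR (the formula in the post).

    Every non-strikeout out and every non-HR hit counts as one ball in play,
    so a double play contributes two "BIP" and baserunning outs are included.
    """
    return ip_outs + hits - strikeouts - home_runs


# Glavine, 2001, using the figures from the post
glavine_bip = balls_in_play(ip_outs=659, hits=213, strikeouts=116, home_runs=24)
print(glavine_bip)  # 732
```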

I track pitcher assists because I am working on the assumption that if a pitcher gets a "touch" on defense, the ball must have been hit poorly, and that must be considered a success for the pitcher. The BABIP is probably WAY lower on balls the pitcher could potentially make a play on anyway. Note that pickoffs almost always result in an out (a pitcher-assisted out), so from this point forward we consider pickoffs part of pitcher assists.

In 2001 in the NL, pitchers recorded 2841 assists on 67240 balls in play. The National League batting average on balls in play not fielded by the pitcher was .312. Removing Glavine's 40 pitcher assists from his 732 BIP leaves 692 BIP not fielded by Glavine, so given normal skill at preventing BIP hits and an average defense, we expect him to allow 216 (.312 * 692) in-play hits.
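The expected-hits step as a short Python sketch (the helper name is hypothetical): it removes the pitcher-fielded balls and applies the league BABIP quoted above.

```python
def expected_in_play_hits(bip: int, pitcher_assists: int, league_babip: float) -> float:
    """Expected hits on balls the pitcher did not field, at league-average BABIP."""
    non_pitcher_bip = bip - pitcher_assists  # pitcher-fielded BIP are treated as outs
    return non_pitcher_bip * league_babip


# Glavine, 2001: 732 BIP, 40 pitcher assists, NL BABIP of .312 on non-pitcher BIP
expected = expected_in_play_hits(732, 40, 0.312)
print(round(expected, 1), round(expected))  # 215.9 216
```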

Doubles make up a little less than 23% of in-play hits, so 50 of his 216 in-play hits should be 2B.

Triples make up 2.4% of in-play hits in 2001 in the NL, so 5 of his 216 hits should be 3B.

That leaves 161 singles.
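The hit-type split can be sketched in Python as follows. The helper is hypothetical, and 0.23 is only a stand-in for "a little less than 23%" since the exact league share isn't given; with it, the rounding reproduces the 161/50/5 split above.

```python
def split_in_play_hits(hits: int, double_share: float, triple_share: float) -> dict:
    """Split expected in-play hits into 1B/2B/3B using league shares.

    Doubles and triples are rounded to whole hits; singles take the remainder
    so the three always sum back to `hits`.
    """
    doubles = round(hits * double_share)
    triples = round(hits * triple_share)
    return {"1B": hits - doubles - triples, "2B": doubles, "3B": triples}


# 0.23 is an assumed stand-in for "a little less than 23%"; 0.024 is the 2.4% from the post
print(split_in_play_hits(216, 0.23, 0.024))  # {'1B': 161, '2B': 50, '3B': 5}
```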

We also have to adjust his HRA to account for the impact Turner Field had on HR rates. I calculated a 5-year weighted park-HR adjustment of 1.00296 for 2001 (that's including all of the other parks in which the Braves played), meaning that the net average park in which the Braves played their games slightly increased the odds of a HR relative to the rest of the NL.

If we divide his 24 HRs by 1.00296 we get 23.9 or essentially still 24 so no change there.
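The park adjustment is a single division; a hypothetical helper makes the direction explicit (a factor above 1.0 means the parks inflated home runs, so dividing removes that boost).

```python
def park_neutral_hr(home_runs: float, park_hr_factor: float) -> float:
    """Remove the park effect: a factor above 1.0 means the parks inflated HR."""
    return home_runs / park_hr_factor


adjusted = park_neutral_hr(24, 1.00296)
print(round(adjusted, 2), round(adjusted))  # 23.93 24
```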

Glavine actually allowed 189 in-play hits (213 H - 24 HR). With an expectation to allow 27 more hits than that (216 - 189), he now has 27 fewer defense-independent IPOuts (659 - 27 = 632).
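A small Python sketch of this out adjustment (hypothetical helper name): every extra expected in-play hit replaces one out.

```python
def defense_independent_outs(ip_outs: int, hits: int, home_runs: int,
                             expected_in_play_hits: int) -> int:
    """Convert each extra expected in-play hit into one fewer out."""
    actual_in_play_hits = hits - home_runs
    extra_hits = expected_in_play_hits - actual_in_play_hits
    return ip_outs - extra_hits


print(defense_independent_outs(659, 213, 24, 216))  # 632
```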

That gives him this defense independent line:

IPOuts - 632
1B - 161
2B - 50
3B - 5
HR - 24
BB - 97
K - 116
HBP - 2
WP - 2
BK - 0
PkO - 9
A - 40

STEP TWO - ACCOUNTING FOR REAL IMPACTS ON BALLS IN PLAY

How much of the difference between league average BIP hit rates and Glavine's actual performance is the Braves' defense and how much is Glavine?

The Braves as a team allowed 4419 balls in play, of which 190 were fielded by pitchers for an assist, leaving 4229 BIP of interest and 1210 in-play hits. But Glavine affected those figures with his own pitching, so we need to take out his 692 non-pitcher-assisted BIP and 189 BIP hits, leaving us with these team totals:

3537 BIP
1021 In-Play Hits
.289 BABIP on non-pitcher-assisted BIP
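The team baseline above, sketched in Python with a hypothetical helper: strip out the pitcher-assisted balls and the pitcher's own BIP and hits, then take the remaining team BABIP.

```python
def team_babip_excluding(team_bip: int, team_pitcher_assists: int, team_in_play_hits: int,
                         pitcher_bip: int, pitcher_in_play_hits: int) -> tuple:
    """Team BABIP on non-pitcher-assisted BIP with one pitcher's own BIP removed."""
    bip = team_bip - team_pitcher_assists - pitcher_bip
    hits = team_in_play_hits - pitcher_in_play_hits
    return bip, hits, hits / bip


# Braves 2001 minus Glavine's 692 non-pitcher-assisted BIP and 189 in-play hits
bip, hits, babip = team_babip_excluding(4419, 190, 1210, 692, 189)
print(bip, hits, round(babip, 3))  # 3537 1021 0.289
```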

If Glavine were league average, pitching in front of this brilliant team defense, he'd allow 200 in-play hits (.289 * 692 BIP). He allowed 11 fewer than that.
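The comparison against the team baseline, as a hypothetical Python helper: expected hits at the team's BABIP, minus the hits the pitcher actually allowed.

```python
def hits_saved_vs_team(non_assisted_bip: int, team_babip: float, actual_in_play_hits: int) -> int:
    """How many fewer in-play hits the pitcher allowed than his team's defense predicts."""
    expected = round(non_assisted_bip * team_babip)
    return expected - actual_in_play_hits


print(hits_saved_vs_team(692, 0.289, 189))  # 11
```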

Now we need to "give back" some of the hits we added above.

8 singles and 3 doubles get taken off his existing defense-independent line (with the 11 outs added back to his IPOuts) to account for the fact that he shaved 11 hits off the board.
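The post doesn't say how the 11 hits were divided among hit types. One assumption that reproduces the 8 singles / 3 doubles / 0 triples split is to remove them in proportion to the Step One mix, sketched here with a hypothetical helper.

```python
def give_back_hits(line: dict, hits_saved: int) -> dict:
    """Remove `hits_saved` hits from a defense-independent line, in proportion
    to its existing 1B/2B/3B mix, and add the same number of outs back."""
    in_play_hits = line["1B"] + line["2B"] + line["3B"]
    doubles = round(hits_saved * line["2B"] / in_play_hits)
    triples = round(hits_saved * line["3B"] / in_play_hits)
    singles = hits_saved - doubles - triples  # singles absorb the rounding
    new_line = dict(line)
    new_line["1B"] -= singles
    new_line["2B"] -= doubles
    new_line["3B"] -= triples
    new_line["IPOuts"] += hits_saved  # each removed hit becomes an out
    return new_line


step_one = {"IPOuts": 632, "1B": 161, "2B": 50, "3B": 5, "HR": 24}
print(give_back_hits(step_one, 11))
# {'IPOuts': 643, '1B': 153, '2B': 47, '3B': 5, 'HR': 24}
```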

That leaves his hit counts and out counts at:

IPOuts - 643
1B - 153
2B - 47
3B - 5
HR - 24

The rest of the stats stay the same.

STEP THREE - TURNING THE NEW LINE INTO RC

Pitcher Assists are treated as automatic outs (given the usual linear weight for an out).

The original PCA analysis used less accurate linear weights than the ones I just finished calculating via multiple linear regression. Since I expect to implement this methodology with the more accurate LW data in the near future, I'll use the newer LW data in this example.

In 2001, the following LW existed in the NL:

In-Play Out (includes both FC and straight outs): -0.257
K: -0.281
1B: 0.460
2B: 0.771
3B: 1.060
HR: 1.370
BB (including both unintentional and intentional): 0.291
HBP: 0.328
WP: 0.279
BK: 0.292

To get runs allowed above average (RAAA) for Glavine, we just multiply the number of each of these events by its LW and add the results:

In-Play Outs (643 - 116 = 527): 527 * -0.257 = -135.44 +
K: 116 * -0.281 = -32.60 +
1B: 153 * 0.460 = 70.38 +
2B: 47 * 0.771 = 36.24 +
3B: 5 * 1.060 = 5.30 +
HR: 24 * 1.370 = 32.88 +
BB: 97 * 0.291 = 28.23 +
HBP: 2 * 0.328 = 0.66 +
WP: 2 * 0.279 = 0.56

Add those all up and you get 6.21. By this method, Tom Glavine allowed 6.21 more runs in 2001 than the average National League pitcher would have (a far cry from his defense-assisted 124 ERA+).
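The linear-weights sum, sketched in Python with the 2001 NL weights listed above (the dictionary and function names are hypothetical). Summing unrounded terms gives about 6.20; rounding each term to two decimals first, as in the list above, gives the 6.21 quoted.

```python
# 2001 NL linear weights, copied from the list in the post
NL_2001_LW = {
    "in_play_out": -0.257, "K": -0.281, "1B": 0.460, "2B": 0.771, "3B": 1.060,
    "HR": 1.370, "BB": 0.291, "HBP": 0.328, "WP": 0.279, "BK": 0.292,
}

# Glavine's adjusted 2001 line; in-play outs = IPOuts - K (pitcher assists and
# pickoffs stay in here, since they are treated as ordinary outs)
glavine = {
    "in_play_out": 643 - 116, "K": 116, "1B": 153, "2B": 47, "3B": 5,
    "HR": 24, "BB": 97, "HBP": 2, "WP": 2, "BK": 0,
}


def runs_above_average(line: dict, weights: dict) -> float:
    """Linear-weights runs allowed above average: sum of count * weight."""
    return sum(count * weights[event] for event, count in line.items())


exact = runs_above_average(glavine, NL_2001_LW)
# Rounding each term to two decimals before adding, as the post does
per_term = sum(round(n * NL_2001_LW[e], 2) for e, n in glavine.items())
print(round(exact, 2), round(per_term, 2))  # 6.2 6.21
```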

If you want an ERA-like number, you can convert 6.21 RAAA into a total RA figure using the NL RA/Out of 0.176 (which does include errors). At that rate, in Glavine's 643 defense-neutral IPOuts he should have allowed 113 defense-neutral runs (we're expecting a league-average rate of runs produced by errors here), and our estimate is that he actually allowed 119.21 (113 + 6.21) defense-neutral runs. That gives a DNRA (defense-neutral run average) of 5.01 (119.21 / 643 * 27 outs) against a league-average DNRA of 4.75 (0.176 * 27).

His DNRA+ would be (4.75 / 5.01) * 100, or 95: a slightly below-average pitcher in 2001.
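The DNRA and DNRA+ conversion as a short Python sketch (hypothetical helper). It carries the unrounded 113.17 league-rate runs instead of 113, which still rounds to the same 5.01 and 95.

```python
def dnra(raaa: float, ip_outs: int, league_ra_per_out: float, outs_per_game: int = 27) -> float:
    """Defense-neutral run average: (league-rate runs + RAAA) per 27 outs."""
    expected_runs = league_ra_per_out * ip_outs
    return (expected_runs + raaa) / ip_outs * outs_per_game


league_dnra = 0.176 * 27              # about 4.75
glavine_dnra = dnra(6.21, 643, 0.176)  # about 5.01
dnra_plus = league_dnra / glavine_dnra * 100
print(round(league_dnra, 2), round(glavine_dnra, 2), round(dnra_plus))  # 4.75 5.01 95
```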

There are two notable flaws in this that I aim to correct in future renditions where the data allow:

I didn't adjust for the pitcher's tendency to generate groundballs/flyballs and the expected changes that produces in the average result of a ball in play, especially the change in the double play rate.

I didn't account for the pitcher's impact on extra-base hits on balls in play compared to his team's extra-base hit rates. I didn't originally have that data, and I still don't have it prior to 1957, so the standing assumption must be that all pitchers allowed the same rate of extra-base hits per in-play hit prior to 1957. In the PBP era I can change the method to account for sensible impacts on ball-in-play XBH rates, though I expect those tweaks to be small most of the time.

Thoughts?

Criticisms? Any ideas you have that might improve the method, I'm open to suggestions.

2. Registered User (Join Date: May 2005; Posts: 382)
Here's Glavine's actual line for 2001:

495 outs
116 K
9 SB
7 CS
3 PkO
2 WP
87 BB
10 IBB
2 HBP
5 errors on play
1 FC
156 1B
32 2B
1 3B
24 HR
28 DP

Total BIP is 714, not 732. The difference is due to not counting double plays. His actual value as a pitcher is going to be much higher than you calculated, for several reasons. He has lower WP and PB rates than the average pitcher. Runners have a much lower SB success rate against him than against the average pitcher. He has a much lower rate of XBH than the average pitcher. I haven't calculated the exact number, but I suspect he also has a higher DP rate than the average pitcher.

I know you are trying to create a methodology that you can use for the years that you don't have PBP data, but you should calculate some of the years that you do have PBP data for both ways; using the actual numbers and your estimated numbers. That should show you the range of error that your estimates can create. Then you can try to fine tune your estimates to lower that error.

3. That is in the plans, MrD.

With XBH, though, you need to be careful, because the Braves' defensive strength during this time period was their outfield. Andruw Jones was a one-man double/triple wrecking crew.

The way I intend to handle XBH in the PBP era is by doing exactly what I did for in play hits...placing Glavine in an average context, then adjusting for his actual impact on XBH compared to how the Braves did in that regard when he was not pitching. The same will have to happen for DP% (per ball in play), because the Braves also turned a mean double play during that time period...not necessarily because of pitching skill entirely.

4. Success rate on SB is the other thing I haven't yet included, and that would need to be adjusted relative to the team as well. Some pitchers might have a low SB% because their catchers are really good at throwing out runners, although Glavine is probably not one of them since his catchers have generally sucked.

All of these things that are not pure-DIPS must be team-relative because the team is a biasing factor that needs to be considered.

5. Hmm...I just noticed your pickoff number is different than mine...trying to determine how I got the wrong number in my database...that's not good.

6. STEP 3:

In 2001, the following LW existed in the NL
If you are going for precision, the LWTS is not what you should use for pitchers or teams. BaseRuns is. (Or better yet, Markov.) If Clemens gives up 2.5 runs per game, it doesn't matter what the 4.50 RPG pitcher did in the same season. You want to know how a 2.5 RPG pitcher would do, given the base-advancement probabilities of that season.

I understand why you want to use LWTS. But, it's wrong. That said, you'll only be off by a little bit.

***

As for pickoffs, it depends how you treat it. I treat each of the Retro events as mutually exclusive. It's either a SB or CS or PK.

http://www.insidethebook.com/ee/inde...the_cs_and_pk/

7. Originally Posted by misterdirt
Here's Glavine's actual line for 2001:

32 2B
I was gonna say that Matt's estimate of 50 doubles sounded way too high for Glavine.
