# Thread: The PCA (DIPS-based) pitching metric.

1. OK...I am still trying to understand how to work out a pitcher-specific BsRLW set.

Let's say you know all of the defense independent info and you know the complete BsR formula.

Let's use Glavine's '01 DI line to exemplify this:
Code:
```
#    Type    Count
2    Out     446
3    K       116
4    SB      7
5    Indif   1
6    CS      5
8    PkO     2
9    WP      2
10   PB      0
11   BK      0
14   UBB     87
15   IBB     10
16   HBP     2
17   Intf.   0
18   ROE     9
19   FC      2
20   1B      148
21   2B      41
22   3B      2
23   HR      26
25   SH      11
26   SF      6
27   DP      30
28   TP      0
```
I would require a BsR formula that accommodates ALL of those event types...and not one event short of all of those event types.

As far as I know, no such formula exists...though if I'm wrong I'd appreciate being pointed in the right direction.

The A factor is Baserunners who we don't already know got gunned down on base...

H-HR+UBB+IBB+HBP+ROE-CS-DP-(2*TP) = A

C is the batting outs:

In-Play Outs + K + DP + TP + SF + SH + FC = C

D is HR as usual.
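As a rough sketch of the bookkeeping above, the A, C and D totals for Glavine's '01 line can be tallied in Python. The dictionary keys and the function name are just labels chosen for this example, and H is assumed to mean 1B + 2B + 3B + HR. The formulas are the ones proposed in this post, not a vetted BsR specification.

```python
# Glavine 2001 defense-independent event counts, copied from the table above.
counts = {
    "Out": 446, "K": 116, "SB": 7, "Indif": 1, "CS": 5, "PkO": 2,
    "WP": 2, "PB": 0, "BK": 0, "UBB": 87, "IBB": 10, "HBP": 2,
    "Intf": 0, "ROE": 9, "FC": 2, "1B": 148, "2B": 41, "3B": 2,
    "HR": 26, "SH": 11, "SF": 6, "DP": 30, "TP": 0,
}

def bsr_factors(n):
    """Return (A, C, D) using the definitions proposed in this post."""
    hits = n["1B"] + n["2B"] + n["3B"] + n["HR"]  # assumption: H = all hits
    # A: baserunners not already known to have been erased on the bases
    a = (hits - n["HR"] + n["UBB"] + n["IBB"] + n["HBP"] + n["ROE"]
         - n["CS"] - n["DP"] - 2 * n["TP"])
    # C: batting outs
    c = n["Out"] + n["K"] + n["DP"] + n["TP"] + n["SF"] + n["SH"] + n["FC"]
    # D: home runs
    d = n["HR"]
    return a, c, d

A, C, D = bsr_factors(counts)
print(A, C, D)  # 264 611 26
```

The B factor is left out on purpose, since its weights are exactly what is still undetermined at this point.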

B is a mystifying blizzard of weighted events. I have no idea how they're calculated, though from what I gather...it looks like sabermetricians just sort of guessed their way to them through trial and error or shaky efforts at linear best-fit modeling.

I don't quite get what I'm putting in the run column for the pitcher...because I'm using this to CALCULATE his run environment...so I don't get how I'm doing that if I need to KNOW it to do the calculation.

2. Registered User
Join Date
May 2005
Posts
382
Matt - In post #68 Beamer gave you the link to the place on Tango's site to the complete formula that uses all the information that you require. I believe that what you are calculating using this formula is runs for the pitcher that are equivalent to the runs that you are calculating by multiplying Glavine's stat line by your custom linear weights. No further conversion is necessary.

3. Oy...I'm an idiot...I had the link John gave me in my favorites even...LOL

I am still confused as to how you use the run environment created by the pitcher...to rate the run environment created by the pitcher.

4. Registered User
Join Date
Jul 2006
Posts
53
Matt, I outline the method to generate LWTS in post 72 (simple method ... but just extend it for the more complex formula).

To get the correct numbers for a particular pitcher what you need to do is to collate all the pitcher metrics (which you have adjusted for defense independence, I guess) and plug them into the BsR formula. Because pitchers are their own context this will give you the correct run environment.

Then do the LWTS generation method that I outline in post 72

5. OK...so if I put all of the event counts into the A, B, C, D elements, that will produce a total that is equivalent to the pitcher's RC?

6. BTW, my apologies if I'm being dense...I'm trying to make sure I have a COMPLETE understanding before I do anything with this.

7. Registered User
Join Date
Jul 2006
Posts
53
Yep -- that is correct. Try it for both the simple and complex formula. The numbers should be close. Also you sort of know what numbers you are looking for (1b = 0.47, 2b = 0.75, etc etc etc)

8. If the BsR formula gives me a RC equivalent...why do I even care what it says the LW are? It gives you the answer.

9. The most complete formula I have (The one Tango developed) doesn't include DP and TP...DP are pretty important, so I'm surprised Tango didn't include that.

10. Registered User
Join Date
Jul 2006
Posts
53
Yes, I see. And no out on base term either.

Perhaps you could account for DP by combining the out and caught stealing. Not ideal, I know, but I'd imagine it gives you the right coefficients.

11. Been down that road before...when I was using multilinear regression, in order to make things with small sample sizes give me sensible linear weights I started calling events the equivalent of combinations of other events.

DP = Out+CS
SF = Out+(HR-3B)
SH = Out+SB
3B = 2B+SB
etc

I did that for all of the less common events and ended up with 7 categories

1B
2B
HR
K
BB
Out

It kinda worked to produce more stable multilinear weights for those things but I'd rather not go down the road of making assumptions like that again. :\

12. Registered User
Join Date
Jul 2006
Posts
53
I'd imagine that the DP is the most robust out of those. Isn't that the only one you need to do? I'd ignore triple plays as they are so rare.

Perhaps Tango could give his thoughts on this?

13. Triple plays always have the same rough value. They have a LW of -1.5ish very consistently. I can work around that TP absence...but DPs I need to be in the equation because they happen often enough that including them will probably change all of those other constants enough to matter.

14. Registered User
Join Date
May 2005
Posts
382
The B values represent what effect the event has on scoring the base runners. A double play and a generic out are likely to move the baserunners along in about the same way. The fact that it removes one base runner is already accounted for in the C portion of BaseRuns. However, if you think it makes a difference Tango probably still has the program that he used to create the table. Perhaps he could give you a value for DP. The only other value that should change would be the value for the generic out.

15. Registered User
Join Date
Jul 2006
Posts
53
So what you are saying MisterD is that all you do is for a DP add an extra out; for a TP add two extra outs (in the C part of the equation). Does that account for the fact that by virtue of being on base they have a value??

If all pitchers have the same DP rate it doesn't matter. However, since Tango has explicitly excluded DP, you have to assume the value of this is already included in the B formula. Given that different pitchers have different DP rates (think ground v flyball pitchers) it could make a difference .... but I don't know for sure.

16. Registered User
Join Date
May 2005
Posts
382
Beamer - I am not an expert on BaseRuns by any means so my understanding might be way off. But what I am inferring from how Tango calculated his B values is that they are not pitcher specific. The pitcher affects the likelihood of each event occurring, but how that event affects the likelihood of a runner on base scoring (the B values) is determined mostly by the event itself, a little by the location of a hit, and in a very minor way by the average abilities of the runners on base. That is why the B values are determined by using a very large data set. The differences between individual pitchers' DP rates would be accounted for in their different C values just as the differences in their HR rates are accounted for in their D values and not in varying the HR value in the B value.

17. First off Matt, I (again) highly suggest you read Patriot's link on BaseRuns that I noted earlier. He gives you the exact nuts and bolts to be able to figure out the coefficients in the B equation. Warning: you better know calculus. Here it is again:
http://gosu02.tripod.com/id76.html

Once you have an understanding of what Patriot is doing, you can use the LWTS figures for any given year, and reverse engineer a BaseRuns equation. You will likely find that the coefficients of the B equation will not change much. That's the beauty of BaseRuns. Once you are convinced that the coefficients in the B equation are fairly stable, then simply create a standard BaseRuns equation for 1957-2005.

As well, it is not necessary to derive the LWTS values once you have BaseRuns. But, it's fun to do anyway, and you'll get these figures:
http://www.tangotiger.net/customlwts.html
Last edited by Tango Tiger; 08-15-2006 at 08:58 AM.

18. I've attempted on several occasions to read Patriot's BaseRuns articles as you suggested, Tango but found it illegible. His choices on how to name his variables and how to present his equations make them completely unreadable and therefore make the articles hard to follow.

And the link you provided me is a link to a dozen links...there are multiple articles on BsR there. I know my calculus, but his derivation of how to work out B coefficients doesn't make sense to me...so either I'm an idiot (not out of the realm of possibility) or his presentation is throwing me off.

19. Registered User
Join Date
Jul 2006
Posts
53
This is the badger you want:

http://members.cox.net/~harlowk22/br1.html

Errr ... I understand it, but I sure as hell wouldn't want to implement it.

Good luck!!

20. There are two articles devoted to BaseRuns. In the second one, he says the following:

Sometimes, though, you know what Linear Weights you would like to generate for the entity as a whole. If you do, you can find the exact B coefficient that you need to produce it for each event through this formula:
b = ((B + C)^2*(L - d) - B^2*a - B*C*a + A*B*c)/(A*C)
Where L is the Linear Weight value you want to get for the event in question. B here is the Exact B that you calculate from actual runs scored, A, C, and D, as you do not yet have B coefficients for each event and therefore cannot compute B. I have provided a spreadsheet which you can use to do this as well.
As well, he provides a spreadsheet at the end of the second article, with the formula already created for you! That sheet is named "Find B Value from known LW".
This is about as hand-holding as you can get. I think you need to slow down a little, and take greater care in reading. (Reminds me how you missed the "5", "6" on the Retrosheet site signifying unearned runs on the "how to use" page.)
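For anyone who would rather see the quoted formula as code than as a spreadsheet, here is a minimal sketch. It assumes the standard BaseRuns form R = A*B/(B+C) + D, and all function names and league totals below are hypothetical values made up for illustration, not real data. Lowercase a, c, d are the event's own contributions to A, C and D.

```python
def exact_b(runs, A, C, D):
    """Solve R = A*B/(B+C) + D for B, given actual runs scored."""
    scored = runs - D  # runs by baserunners, i.e. everything except the HR hitters themselves
    return scored * C / (A - scored)

def b_coefficient(L, a, c, d, A, B, C):
    """The formula quoted above: the b weight that reproduces linear weight L."""
    return ((B + C) ** 2 * (L - d) - B ** 2 * a - B * C * a + A * B * c) / (A * C)

def linear_weight(a, b, c, d, A, B, C):
    """Partial derivative of A*B/(B+C) + D with respect to one event."""
    return ((a * B + A * b) * (B + C) - A * B * (b + c)) / (B + C) ** 2 + d

# Hypothetical totals, chosen only to exercise the functions.
A, C, D, runs = 2000.0, 4000.0, 170.0, 750.0
B = exact_b(runs, A, C, D)

# A single adds one baserunner (a=1) and no out or HR (c=0, d=0); target LW of 0.47.
b_single = b_coefficient(0.47, a=1, c=0, d=0, A=A, B=B, C=C)

# Round trip: feeding b back into the derivative should return the target LW.
print(round(linear_weight(1, b_single, 0, 0, A, B, C), 6))
```

The round trip at the end is the useful check: if the b value is right, differentiating the BsR equation gives back the linear weight that was put in.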

21. Total derivatives...suddenly I feel like I'm back in my meteorology classes...LOL

This derivation could be expanded to include every event, but MySQL can't do calculus...so I have no idea how to actually implement it.
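One possible way around the calculus, offered here as an assumption and not something posted in this thread, is a "plus one" approach: add a single event to the line, recompute BsR, and take the difference. It only needs arithmetic, so the same idea should carry over to a SQL query. The B weights and the stat line below are placeholders invented for this sketch, not fitted values.

```python
# Per-event contributions to (A, B, C, D). The B weights are placeholders
# invented for this sketch, NOT fitted values.
EVENTS = {
    "1B":  (1, 0.8, 0, 0),
    "2B":  (1, 2.0, 0, 0),
    "BB":  (1, 0.1, 0, 0),
    "HR":  (0, 2.0, 0, 1),
    "Out": (0, 0.0, 1, 0),
}

def base_runs(counts):
    """BsR = A*B/(B+C) + D, with each factor a weighted sum of event counts."""
    A, B, C, D = (
        sum(EVENTS[e][i] * n for e, n in counts.items()) for i in range(4)
    )
    return A * B / (B + C) + D

def plus_one_weight(counts, event):
    """Change in BsR when one more `event` is added to the line."""
    bumped = dict(counts)
    bumped[event] = bumped.get(event, 0) + 1
    return base_runs(bumped) - base_runs(counts)

line = {"1B": 150, "2B": 40, "BB": 90, "HR": 25, "Out": 600}  # hypothetical line
for ev in EVENTS:
    print(ev, round(plus_one_weight(line, ev), 3))
```

The plus-one differences will not match the exact derivative to the last decimal, but for a full season's worth of events the gap should be small.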

22. Originally Posted by Tango Tiger
There are two articles devoted to BaseRuns. In the second one, he says the following:

As well, he provides a spreadsheet at the end of the second article, with the formula already created for you! That sheet is named

This is about as hand-holding as you can get. I think you need to slow down a little, and take greater care in reading. (Reminds me how you missed the "5", "6" on the Retrosheet site signifying unearned runs on the "how to use" page.)
I was reading the first article...the second article's title "Complete Reinvention" sounded like it was moving away from the point. Thanks for pointing that out. :\

I'm not sure I like the idea of taking assumed linear weights, putting them in an equation and going "aha! I got the correct linear weights!"...*shrug*

23. You are not doing that! You are putting in the correct LWTS for a particular run environment. This will create a standard BaseRuns equation from which you can generate custom LWTS values for almost any run environment.

What I suggest is the following:
take the 1962-1968 time period, and the 1994-2005 time period, and generate your LWTS equations using the standard RE process. They should be markedly different.

Then, using Patriot's spreadsheet, create the BaseRuns equation for each one. They should look somewhat similar.

24. And it was titled "Base Runs(total rewrite)". Even if "rewrite" and "reinvention" are the same thing, you should still have read it!

25. This would then be something like seeding a numerical forecasting model with some presupposed starting relationships in order to get access to a forecasted outcome for other conditions. Give it the LW value for a total population (I would get a net LW value for each event using the entire 1957-2005 database) and you can get b values, and from the final form of the BsR equation you get LWs for any environment.

I'll need to expand his spreadsheet to include every imaginable event type (he's only got 14 or 15 of the 25 types), and calculate a population LW for my entire dataset (I could use any one season, but I'm guessing the more data you put in, and the more precise your starting seed is, the more robust your BsR will be...much like the better your initialization of a numerical forecasting model, the more accurate the weather forecast).
