# Thread: Using R With Baseball Analysis

Hey everyone, I'm 1phillies fan382. How many of you use R for your baseball computations? If you do, post your code/graphs here (SAS, SPSS, and others count as well).

On another forum I post on, someone was saying that Cameron Rupp will hit 20 home runs next year (and that he and Andrew Knapp will combine for 30 home runs). I showed him how ridiculous and unlikely this was by looking at all players from age 20-27 who had a .163 ISO or greater with 732 ABs (PAs were too hard to come by in my data set), and then looking at how many of those players hit 20 home runs the next season. It turns out the percentage was ~22% since 1876. I thought about looking up how likely it would be for Rupp to hit 20 home runs AND Knapp to hit 10, but I just settled on the fact that only 5 teams in the last 3 years have received 30 home runs from their entire catching position. Here is my code. The second chart is the 43 players (of 197 possible players) to hit 20 or more home runs after having an ISO of .163 or greater in any of their age 20-27 seasons. I told this dude: look at the list, do any of these players sound like freaking Cameron Rupp to you?

[Two screenshots attached (not reproduced)]
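The original code was only posted as a screenshot. Below is a minimal Python sketch of the filter described in that post. It is not the poster's R code. It assumes one row per player-season (stints already summed), and the `Season` field names are hypothetical. The post doesn't make clear whether the AB cutoff is per season or cumulative, so the sketch treats it as a per-season minimum and leaves the value to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Season:
    # Hypothetical record layout: one row per player-season, stints already summed.
    player_id: str
    year: int
    age: int
    ab: int
    hr: int
    iso: float


def next_season_hr_rate(seasons, min_iso, min_ab, ages=range(20, 28), hr_goal=20):
    """Return (successes, qualifiers), counted per player.

    A player qualifies if any season at an age in `ages` has ISO >= min_iso
    and AB >= min_ab (AB cutoff treated as per-season: an assumption).
    A qualifier is a success if the season immediately following any
    qualifying season has at least `hr_goal` home runs. Players with no
    following season still count as qualifiers.
    """
    by_player = defaultdict(dict)
    for s in seasons:
        by_player[s.player_id][s.year] = s

    qualifiers = successes = 0
    for years in by_player.values():
        qualifying = [
            s for s in years.values()
            if s.age in ages and s.iso >= min_iso and s.ab >= min_ab
        ]
        if not qualifying:
            continue
        qualifiers += 1
        # Look up the calendar year right after each qualifying season.
        if any(
            s.year + 1 in years and years[s.year + 1].hr >= hr_goal
            for s in qualifying
        ):
            successes += 1
    return successes, qualifiers
```

Flipping the ISO test to "or less", as suggested later in the thread, only means changing `s.iso >= min_iso` to `s.iso <= min_iso` in the list comprehension.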

Nah, none of them look like Cameron Rupp to me... and that's why I don't like using historical comps to predict a player's future performance. Each player is unique and mutually exclusive.

Which data set are you using?

I always go with Lahman, but I'm always looking for something better.

What doesn't make sense about what I said? You're extrapolating data from other players in past seasons in order to try and make a prediction for another player's future season.

I toyed around with the Lahman package, but decided not to use it on this one and just used FanGraphs' sheets. I don't think Lahman contains ISO.
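Lahman doesn't need an ISO column, because ISO can be derived from counting stats it does carry (AB, 2B, 3B, HR). ISO is SLG minus AVG, which simplifies to (2B + 2·3B + 3·HR) / AB. The Lahman batting table has one row per stint, so a player traded mid-season needs his stints summed first. Below is a minimal Python sketch that assumes rows arrive as dicts keyed like the Lahman CSV (`playerID`, `yearID`, `AB`, `2B`, `3B`, `HR`). In the R `Lahman` package the doubles and triples columns appear as `X2B` and `X3B`.

```python
from collections import defaultdict


def season_totals(rows):
    """Collapse per-stint batting rows into one total per (playerID, yearID).

    Each row is a dict with Lahman-style keys: playerID, yearID, AB, 2B, 3B, HR.
    Stints are summed so that rate stats are computed on the full season.
    """
    totals = defaultdict(lambda: {"AB": 0, "2B": 0, "3B": 0, "HR": 0})
    for row in rows:
        season = totals[(row["playerID"], row["yearID"])]
        for stat in season:
            season[stat] += row[stat]
    return dict(totals)


def iso(ab, doubles, triples, hr):
    """Isolated power: SLG - AVG, which simplifies to (2B + 2*3B + 3*HR) / AB."""
    if ab == 0:
        return float("nan")  # undefined with no at-bats
    return (doubles + 2 * triples + 3 * hr) / ab
```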

Exactly. You don't notice a trend in certain types of players? Like there aren't many 34-year-old catchers or first basemen? Or that Prince Fielder and Ryan Howard were going to end up the same as Mo Vaughn and Greg Vaughn? Or do you just use those vaunted intuitions and "gut feelings" to predict how a player will do?

I'd consider myself a pretty stat-savvy guy. I agree that there is indeed a trend among certain players. However, these models don't account for the individual adjustments that exist on a player-to-player basis.

ahh.

I get it. I usually like to calculate my own stats, but I usually do it in SAS. When I do things in R, the calculations usually become a huge (or even huger) pain in the butt.

Wait, I guess I'm confused.

You're looking at those with an ISO of .163 or greater? Why not .163 or less?

Oh ****, you're right. I guess that doesn't make much sense. I did it last night at 4 AM, haha. I guess all mine shows is whether players are capable of an ISO that high or not.

Yeah, I think if you flip it you will find more garbage players. At least one journeyman catcher, is my guess.

So I redid it with ISO <= .163, and it turns out there are 742 players who qualify, with 139 of those hitting 20 home runs in their age-28 season. Thanks!
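For reference, here are the two rates quoted in this thread, using the counts from the posts. They match the ~22% from the first post and the 19% mentioned later.

$$
\frac{43}{197} \approx 0.218 \;\; (\text{ISO} \ge .163), \qquad \frac{139}{742} \approx 0.187 \;\; (\text{ISO} \le .163)
$$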

Here are the players. Again, do any of them look like they have career trajectories similar to Rupp's?

[Three screenshots attached (not reproduced)]

I'm not sure looking at it this way gets you where you want to go. We have a player whose track record is a bit too short to compare to anything.

If we only care about ISO to predict HRs, then who are we to say he can't have the same age-28+ trajectory as the players above (because we know better, of course)?

I looked at it slightly differently.

I took players who, through their 27th birthday, had zero 20-HR seasons and a career ISO through that point of .163 or under, and looked at which of them had at least one 20-HR season after their 27th birthday.
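Below is a minimal Python sketch of that career-to-date filter. It is an illustration, not the poster's actual code. It assumes one row per player-season as a dict with hypothetical keys (`player`, `age`, `AB`, `2B`, `3B`, `HR`), and it approximates "through their 27th birthday" as seasons with age <= 27, which is an assumption. Career ISO is computed from summed counting stats rather than by averaging seasonal ISOs, so longer seasons carry more weight.

```python
from collections import defaultdict


def late_power_bloomers(seasons, max_iso=0.163, cutoff_age=27, hr_goal=20):
    """Return {player: count of hr_goal+ HR seasons after cutoff_age}.

    Keeps players who, through cutoff_age (season age <= cutoff_age, an
    assumed definition), had no hr_goal+ HR season and a career ISO <= max_iso,
    and who then had at least one hr_goal+ HR season afterwards.
    """
    by_player = defaultdict(list)
    for s in seasons:
        by_player[s["player"]].append(s)

    result = {}
    for player, rows in by_player.items():
        early = [r for r in rows if r["age"] <= cutoff_age]
        late = [r for r in rows if r["age"] > cutoff_age]
        ab = sum(r["AB"] for r in early)
        # Skip players with no early at-bats or an early 20-HR season.
        if ab == 0 or any(r["HR"] >= hr_goal for r in early):
            continue
        # Career ISO through the cutoff, from summed extra bases.
        extra_bases = sum(r["2B"] + 2 * r["3B"] + 3 * r["HR"] for r in early)
        if extra_bases / ab > max_iso:
            continue
        late_count = sum(1 for r in late if r["HR"] >= hr_goal)
        if late_count:
            result[player] = late_count
    return result
```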

There were 145 players:

| Player | 20+ HR seasons after age 27 |
|---|---|
| Roberto Alomar | 3 |
| Sandy Alomar | 1 |
| George Altman | 2 |
| Cap Anson | 1 |
| Rich Aurilia | 4 |
| Bob Bailey | 3 |
| Clint Barmes | 1 |
| Earl Battey | 1 |
| Jose Bautista | 4 |
| Buddy Bell | 1 |
| Jay Bell | 3 |
| Craig Biggio | 8 |
| Aaron Boone | 1 |
| Bret Boone | 6 |
| Clete Boyer | 1 |
| Eddie Bressoud | 1 |
| Lou Brock | 1 |
| Hubie Brooks | 2 |
| Smoky Burgess | 1 |
| Marlon Byrd | 2 |
| Ken Caminiti | 4 |
| Bert Campaneris | 1 |
| Leo Cardenas | 1 |
| Chris Chambliss | 2 |
| Mickey Cochrane | 1 |
| Coco Crisp | 1 |
| Joe Cronin | 1 |
| Johnny Damon | 3 |
| Al Dark | 2 |
| Darren Daulton | 2 |
| Doug DeCinces | 5 |
| Mike Devereaux | 1 |
| Bill Dickey | 4 |
| Vince DiMaggio | 1 |
| Brian Downing | 6 |
| Ray Durham | 2 |
| Damion Easley | 3 |
| Bob Elliott | 3 |
| Jacoby Ellsbury | 1 |
| Kevin Elster | 1 |
| Carl Everett | 3 |
| Hoot Evers | 1 |
| Chico Fernandez | 1 |
| Steve Finley | 7 |
| Darrin Fletcher | 1 |
| Jack Fournier | 3 |
| Julio Franco | 1 |
| Jim Fregosi | 1 |
| Carl Furillo | 3 |
| Charlie Gehringer | 1 |
| Bernard Gilkey | 1 |
| Carlos Gomez | 1 |
| Alex Gonzalez | 1 |
| Luis Gonzalez | 7 |
| Bobby Grich | 2 |
| Tom Grieve | 1 |
| Marquis Grissom | 5 |
| Kelly Gruber | 2 |
| Carlos Guillen | 2 |
| Tommy Harper | 1 |
| Charlie Hayes | 1 |
| Von Hayes | 2 |
| Harry Heilmann | 1 |
| Ken Henderson | 1 |
| Jose Hernandez | 3 |
| Larry Herndon | 2 |
| Jim Hickman | 2 |
| Tommy Holmes | 1 |
| Brandon Inge | 2 |
| Randy Jackson | 1 |
| Davey Johnson | 1 |
| Eddie Joost | 2 |
| Corey Koskie | 2 |
| Joe Kuhel | 1 |
| Mike Lansing | 1 |
| Jeffrey Leonard | 3 |
| Sherm Lollar | 2 |
| Ernie Lombardi | 1 |
| John Lowenstein | 1 |
| Mike Macfarlane | 1 |
| Frank Malzone | 1 |
| Felix Mantilla | 1 |
| Edgar Martinez | 8 |
| Russell Martin | 1 |
| Frank McCormick | 1 |
| Brian McRae | 1 |
| Irish Meusel | 1 |
| Minnie Minoso | 4 |
| Bengie Molina | 1 |
| Paul Molitor | 1 |
| Rick Monday | 3 |
| Don Money | 1 |
| Joe Morgan | 4 |
| Phil Nevin | 4 |
| Ben Oglivie | 4 |
| Miguel Olivo | 1 |
| Lyle Overbay | 2 |
| Dustin Pedroia | 1 |
| Terry Pendleton | 2 |
| Tony Phillips | 1 |
| A. J. Pierzynski | 1 |
| Bill Robinson | 4 |
| Jimmy Rollins | 4 |
| Joe Rudi | 2 |
| Chris Sabo | 3 |
| Benito Santiago | 1 |
| Nate Schierholtz | 1 |
| Frank Schulte | 1 |
| David Segui | 1 |
| Andy Seminick | 2 |
| John Shelby | 1 |
| Norm Siebern | 1 |
| Roy Sievers | 9 |
| Harry Simpson | 1 |
| Duke Sims | 1 |
| Bob Skinner | 1 |
| Lonnie Smith | 1 |
| Eric Soderholm | 2 |
| Jim Spencer | 1 |
| Ed Sprague | 2 |
| Mike Stanley | 3 |
| Leroy Stanton | 1 |
| Terry Steinbach | 1 |
| B. J. Surhoff | 3 |
| Ed Taubensee | 1 |
| Alan Trammell | 2 |
| Mickey Vernon | 1 |
| Tillie Walker | 2 |
| Bob Watson | 1 |
| Lou Whitaker | 4 |
| Frank White | 2 |
| Ty Wigginton | 3 |
| Bernie Williams | 7 |
| Glenn Wright | 1 |
| Eddie Yost | 1 |
| Kevin Youkilis | 2 |
| Kevin Young | 3 |
| Michael Young | 4 |
| Todd Zeile | 4 |


Nice, man! Your methodology is indeed much better than mine. I was just trying to show how rare it is for a player with Rupp's profile (judged solely on ISO, which is of course flawed) to hit 20 home runs at age 28 (also flawed, because players could reach 20 home runs after age 28). It was rare, at 19%. I mainly had beef with the statement that Rupp and a rookie catcher will combine for 30 home runs next season.

Wow, I didn't realize just how much of a late bloomer Biggio was in the power department.

One other thing that's important to my analysis: of the 143 players who did hit 20 or more home runs after not doing so previously, 94 had a wRC+ of 120 or better in the next season, 78 had a wRC+ of 130 or better, and 54 were at 140 or better. The median wRC+ of the 143 players was 134. Cameron Rupp has a career wRC+ of 88.
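Below is a small Python sketch of the summary above: threshold counts plus the median. It doesn't compute wRC+ itself, which would come from FanGraphs. It assumes the wRC+ values are already in a list. The thread's 143 values aren't reproduced here, so the sample numbers are made up purely for illustration.

```python
from statistics import median


def wrc_plus_summary(values, thresholds=(120, 130, 140)):
    """Count how many wRC+ values are at or above each threshold, plus the median."""
    counts = {t: sum(1 for v in values if v >= t) for t in thresholds}
    return counts, median(values)


# Made-up sample values, not the thread's data.
sample = [88, 115, 125, 134, 141, 150]
counts, mid = wrc_plus_summary(sample)
print(counts, mid)  # {120: 4, 130: 3, 140: 2} 129.5
```

With an even number of values, `statistics.median` averages the two middle values, which is why the sample's median isn't one of the listed numbers.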

