Tuesday, March 16, 2010

Monte Carlo Simulation: A Lineup of Nine Joe Mauers!

In this post we will examine what would happen if you had an entire lineup made up of one player, i.e., a lineup of nine Joe Mauers or nine Ryan Howards. This problem has been tackled before, most notably with Bill James’ Runs Created statistic. However, Monte Carlo Simulation can give an even more accurate idea of how a lineup consisting entirely of one player would perform. Monte Carlo Simulation estimates a result by repeatedly playing out random events according to their probabilities, which makes it useful when there are far too many possible sequences of events to enumerate or calculate directly.
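For comparison, here is a minimal Python sketch of the basic version of Bill James’ Runs Created, RC = (H + BB) × TB / (AB + BB). Scaling it to 27 outs (the RC/27 figure) gives a rough runs-per-game estimate for a lineup of nine identical hitters, which is the same question this post’s simulation tries to answer. The stat line is hypothetical, and counting outs as AB − H is a simplifying assumption; later versions of Runs Created add terms for steals, HBP and other events.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Bill James' basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def rc_per_27_outs(rc: float, outs: int) -> float:
    """Scale Runs Created to a 27-out game: a rough runs-per-game
    estimate for a lineup of nine copies of the same hitter."""
    return rc * 27 / outs


# Hypothetical season line (not any real player's stats).
h, bb, tb, ab = 180, 100, 320, 550
outs = ab - h  # simplification: ignores GIDP, sacrifices and caught stealing
rc = runs_created_basic(h, bb, tb, ab)
print(round(rc, 1), round(rc_per_27_outs(rc, outs), 2))  # prints: 137.8 10.06
```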

An Excel spreadsheet can be used to perform the simulation: a player’s statistics are entered into the sheet so that the probability of each outcome can be calculated. There are 17 possible outcomes to each plate appearance: strikeout, walk, HBP, error, short/medium/long single, short/long double, triple, home run, ground out, ground into double play (with men on base), line drive/infield fly, and short/medium/long fly out. Innings are then simulated, and the number of runs scored per inning (and, by extension, per game) is recorded.

The obvious upside to this method is that it actually plays out innings rather than plugging numbers into a formula, so it should be much more accurate than an estimate based on plain season statistics. The downside is that there is currently no way to factor in stolen bases or a player’s ability to take the extra base on a hit. However, this should not have a significant effect on the study.

The players examined were the top 10 in Batting Value in 2009, according to fangraphs.com. A total of 1440 innings, or 160 nine-inning “games,” were simulated for each player. The results follow:
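The spreadsheet itself is not reproduced here, but the core idea can be sketched in Python under heavily simplified assumptions: a hypothetical probability table with eight outcome types instead of 17, fixed baserunner-advancement rules, and no errors or steals. Because all nine batters are the same player, it never matters who leads off an inning, so each inning can be simulated independently and runs per game is simply nine times the average runs per inning.

```python
import random

# Hypothetical per-plate-appearance probabilities for one hitter.
# A simplified outcome set, NOT the 17 outcomes used in the spreadsheet.
EXAMPLE_PROBS = {
    "strikeout": 0.15,
    "walk": 0.12,       # walks and HBP lumped together in this sketch
    "single": 0.16,
    "double": 0.06,
    "triple": 0.005,
    "home_run": 0.045,
    "ground_out": 0.24,
    "fly_out": 0.22,
}


def simulate_inning(probs: dict[str, float], rng: random.Random) -> int:
    """Play one inning with nine copies of the same hitter; return runs scored.

    Assumed rules: no steals, no advancement on outs, and every ground out
    with a runner on first and fewer than two outs is a double play.
    """
    events = list(probs)
    weights = list(probs.values())
    first = second = third = False
    outs = runs = 0
    while outs < 3:
        event = rng.choices(events, weights)[0]
        if event in ("strikeout", "fly_out"):
            outs += 1
        elif event == "ground_out":
            if first and outs < 2:
                outs += 2  # double play: batter and runner on first are out
                first = False
            else:
                outs += 1
        elif event == "walk":
            # Only forced runners move up.
            if first and second and third:
                runs += 1
            elif first and second:
                third = True
            elif first:
                second = True
            first = True
        elif event == "single":
            # Assumption: runners on second and third score; runner on first stops at second.
            runs += second + third
            first, second, third = True, first, False
        elif event == "double":
            # Assumption: runners on second and third score; runner on first stops at third.
            runs += second + third
            first, second, third = False, True, first
        elif event == "triple":
            runs += first + second + third
            first, second, third = False, False, True
        elif event == "home_run":
            runs += first + second + third + 1
            first = second = third = False
    return runs


def runs_per_game(probs: dict[str, float], n_innings: int = 1440, seed: int = 0) -> float:
    """Average runs per nine-inning game over n_innings simulated innings."""
    rng = random.Random(seed)
    total_runs = sum(simulate_inning(probs, rng) for _ in range(n_innings))
    return total_runs / (n_innings / 9)  # 1440 innings = 160 games


print(round(runs_per_game(EXAMPLE_PROBS, seed=2010), 2))
```

The names EXAMPLE_PROBS, simulate_inning and runs_per_game are hypothetical. To mimic the study, one would replace the probabilities with a real player’s per-plate-appearance rates and keep n_innings at 1440.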

Player             Runs/Game
Albert Pujols      10.24
Joe Mauer          9.90
Prince Fielder     8.71
Hanley Ramirez     8.45
Mark Teixeira      8.00
Ben Zobrist        7.83
Adrian Gonzalez    7.76
Miguel Cabrera     7.68
Derrek Lee         7.64
Ryan Braun         7.55


The results are not very surprising. Albert Pujols and Joe Mauer were the MVPs of their respective leagues and, by a considerable margin, the best offensive players in Major League Baseball. For perspective, the average number of runs scored per game in 2009 was 4.43 in the NL and 4.82 in the AL.

These numbers can be used to roughly gauge a player’s value. They also let us check which traditional statistics correlate most strongly with the Runs/Game figures from the Monte Carlo Simulation, and therefore which statistics matter most to a team’s success. The results will likely tell us nothing new, but they are interesting nonetheless.
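The correlations reported here are presumably ordinary Pearson correlation coefficients computed across the ten players (an assumption; the post does not name the method). A minimal sketch of that calculation, using a toy input rather than the players’ actual stats:

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of the same length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data: a perfectly linear increasing relationship gives r = 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Python 3.10 and later also provide statistics.correlation, which computes the same quantity.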

Two statistics had correlations above .9: OBP and OPS. Remember that the closer a correlation is to 1, the stronger the linear relationship between the two statistics.

Statistic    Correlation with R/G
AVG          0.579
OBP          0.928
SLG          0.796
OPS          0.922
Hits         0.244
HR           0.258
Walks        0.356


This should not surprise any good stat-minded baseball fan. On-Base Percentage has the highest correlation, with OPS just behind. Slugging Percentage is the only other statistically significant correlation, which is of little surprise as well. Here are the scatter plots of Average Runs/Game against OBP and against AVG. Note that “Average” is the simulated R/G from the table above.





As you can see, there is a fairly strong linear relationship between R/G and OBP, while the relationship between R/G and AVG is considerably more scattered.
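The claim that SLG is the only other statistically significant correlation can be checked with the standard t-test for a Pearson correlation, t = r√(n − 2)/√(1 − r²), using n = 10 players and the two-tailed 5% critical value for 8 degrees of freedom (about 2.306). This sketch assumes that test and that significance level; the post does not say which it used, and with only ten players it is a rough check. Under these assumptions only OBP, OPS and SLG clear the bar, which matches the text, while AVG (t ≈ 2.01) falls just short.

```python
import math

# Two-tailed critical t value for alpha = 0.05 with n - 2 = 8 degrees of freedom.
T_CRIT_DF8 = 2.306
N_PLAYERS = 10

# Correlations with simulated R/G, taken from the table in this post.
CORRELATIONS = {
    "AVG": 0.579, "OBP": 0.928, "SLG": 0.796, "OPS": 0.922,
    "Hits": 0.244, "HR": 0.258, "Walks": 0.356,
}


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


for stat, r in CORRELATIONS.items():
    t = t_statistic(r, N_PLAYERS)
    verdict = "significant" if abs(t) > T_CRIT_DF8 else "not significant"
    print(f"{stat:5s} r={r:.3f} t={t:.2f} {verdict}")
```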

Interesting Baseball Fact of the Day: Since 1961, the lowest single-season ERA for a pitcher who gave up more than one hit per inning belongs to Tommy John in 1977. Tommy John gave up 225 hits in 220.1 IP en route to a 2.78 ERA, a 1.25 WHIP, a 20-7 record, and a second-place finish in the Cy Young voting.
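Box-score innings pitched use a quirky notation: the digit after the point counts outs, so 220.1 IP means 220⅓ innings, not 220.1. A small sketch of that conversion (the function name is hypothetical) confirms that 225 hits in 220⅓ innings is just over one hit per inning:

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings pitched (e.g. "220.1") to true innings.

    The digit after the point counts outs (0, 1 or 2), not tenths of an inning.
    """
    whole, _, outs = ip.partition(".")
    extra_outs = int(outs or 0)
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings-pitched notation: {ip!r}")
    return int(whole) + extra_outs / 3


ip = innings_from_notation("220.1")  # 220 1/3 innings
hits = 225
print(round(hits / ip, 3))  # hits per inning: 1.021
```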
