Saturday, 12 January 2013

Markov Chains and Monte Carlo Simulation

I have been dabbling with fastpitch statistics for a number of years, using fairly simple formulae such as runs created and linear weights, as described in The Bill James Handbook 2013 and by Pete Palmer in The Hidden Game of Baseball.

Last winter, I started analyzing fastpitch statistics in a serious way.  I purchased two books that got me started.

The first book is Baseball Hacks: Tips & Tools for Analyzing and Winning with Statistics.  This book has many formulae and algorithms, as well as computer code, and explains how to use them to calculate sophisticated statistics.

The second book is The Book: Playing the Percentages in Baseball, which does not contain as many formulae but presents many results obtained with sophisticated computer programs and statistical analysis.

Two of the results in the first chapter of The Book were:

a. the average number of runs scored until the end of the inning for every combination of outs and runners on base, and

b. the probability of winning the game for every inning, score, outs and runners on base from the top of the first inning, no score, no outs, nobody on base to the bottom of the ninth inning with the score tied, two outs and the bases loaded.

I spent last winter writing computer programs to obtain similar results for fastpitch. The mathematical technique that I used is called Markov Chains.  The purpose of these calculations is to evaluate offensive strategies like bunting and stealing.
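To illustrate the kind of calculation involved, here is a minimal sketch of a run-expectancy table computed from a Markov chain over the 24 combinations of outs and runners.  It is not the program described above.  The event probabilities, the runner-advancement rules and the names advance and run_expectancy are all assumptions made for this example, and win probability is left out entirely.

```python
from itertools import product

# Hypothetical per-plate-appearance probabilities for an average batter.
# These are illustrative assumptions, not measured fastpitch data.
EVENT_PROBS = {"out": 0.68, "walk": 0.08, "single": 0.16,
               "double": 0.05, "triple": 0.01, "homer": 0.02}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, event):
    """Return (new_bases, runs) after a non-out event.

    bases is a tuple of three booleans: (first, second, third).
    Assumed rule: on a hit every runner moves as many bases as the
    batter; on a walk only forced runners move.
    """
    first, second, third = bases
    if event == "walk":
        if first and second and third:
            return (True, True, True), 1
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0
        return (True, second, third), 0
    step = HIT_BASES[event]
    new_bases = [False, False, False]
    runs = 0
    for index, occupied in enumerate(bases):
        if occupied:
            if index + step >= 3:
                runs += 1          # runner reaches home
            else:
                new_bases[index + step] = True
    if step == 4:
        runs += 1                  # batter scores on a home run
    else:
        new_bases[step - 1] = True
    return tuple(new_bases), runs


def run_expectancy(probs=EVENT_PROBS, sweeps=200):
    """Expected runs to the end of the inning for each (outs, bases) state.

    Solves E[s] = sum over events of p * (runs + E[next state]) by repeated
    sweeps; the third out is the absorbing state with value 0.
    """
    states = [(outs, bases) for outs in range(3)
              for bases in product((False, True), repeat=3)]
    expected = {state: 0.0 for state in states}
    for _ in range(sweeps):
        for outs, bases in states:
            total = 0.0
            for event, p in probs.items():
                if event == "out":
                    if outs + 1 < 3:
                        total += p * expected[(outs + 1, bases)]
                else:
                    new_bases, runs = advance(bases, event)
                    total += p * (runs + expected[(outs, new_bases)])
            expected[(outs, bases)] = total
    return expected


if __name__ == "__main__":
    table = run_expectancy()
    # Rough look at a sacrifice bunt: runner on first with no outs,
    # versus runner on second with one out.
    print(table[(0, (True, False, False))], table[(1, (False, True, False))])
```

Comparing the two printed values gives a rough, runs-only view of what a successful sacrifice bunt trades away.  A fuller treatment would extend the state to include inning and score so that win probability, rather than expected runs, is compared.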

The results are interesting but counter-intuitive.  For example, there were very few situations in the results in which a sacrifice bunt improved a team's probability of winning the game.  In other words, even if the sacrifice bunt is successful in a particular situation, the probability of winning the game is reduced afterwards.

This is a hard sell to fastpitch coaches and managers.  However, I think there is value in this result, especially since I did not even consider the difficulty of laying down a successful sacrifice bunt.

The limitation of the Markov Chain approach is that it must use averages.  It models two average teams playing each other, with each team made up entirely of average players.

Also, I was able to find the break-even point for attempting to steal a base, or attempting to take an extra base on a hit or an error, or advancing a base on a passed ball.  That is, if the estimated probability of successfully advancing to the next base is higher than the break-even point, it is advantageous to take the chance and attempt it.  If the estimated probability is not higher than the break-even point, then taking the next base should not be attempted.
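The break-even point follows from simple arithmetic.  If staying put is worth V_stay, a successful attempt is worth V_success and a failed attempt is worth V_failure, then the attempt pays off when p * V_success + (1 - p) * V_failure > V_stay, which rearranges to p > (V_stay - V_failure) / (V_success - V_failure).  The sketch below encodes that formula.  The function names and the example values are hypothetical and are not taken from my results.

```python
def break_even_probability(value_stay, value_success, value_failure):
    """Success probability at which attempting and staying are equally good.

    The three values can be win probabilities or expected runs, as long
    as all three use the same measure.
    """
    if value_success <= value_failure:
        raise ValueError("a successful attempt must be worth more than a failed one")
    return (value_stay - value_failure) / (value_success - value_failure)


def should_attempt(estimated_success, value_stay, value_success, value_failure):
    """True when the estimated chance of success beats the break-even point."""
    return estimated_success > break_even_probability(
        value_stay, value_success, value_failure)


if __name__ == "__main__":
    # Hypothetical win probabilities: 0.50 if the runner stays,
    # 0.60 after a successful steal, 0.30 after being thrown out.
    print(break_even_probability(0.50, 0.60, 0.30))
    print(should_attempt(0.75, 0.50, 0.60, 0.30))
```

With these made-up values the break-even point is two thirds, so a runner with an estimated 75% chance of success should go.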

Although this still considers two average teams playing each other, it does account for the specific capabilities of the baserunner and the defensive players through the estimated probability of success.

This winter, I took a different approach that allowed me to consider the individual characteristics of the players on a team.  This involved a technique called the Monte Carlo method.

I wrote a computer program to simulate seven offensive innings with a particular lineup of players, each with specific probabilities of producing different outcomes when they bat.  I used each player's season-average probabilities, so the probabilities differed from player to player.
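A minimal sketch of such a simulation might look like the following.  It is an illustration under simplifying assumptions, not my actual program: the six outcome types, the rule that runners advance as many bases as the batter, and the names play_event and simulate_game are all invented for the example.

```python
import random

EVENTS = ("out", "walk", "single", "double", "triple", "homer")
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def play_event(bases, event):
    """Return (new_bases, runs) for a non-out event.

    bases is (first, second, third) as booleans. Assumed rule: on a hit
    runners advance as many bases as the batter; a walk moves only
    forced runners.
    """
    first, second, third = bases
    if event == "walk":
        if first and second and third:
            return (True, True, True), 1
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0
        return (True, second, third), 0
    step = HIT_BASES[event]
    new_bases = [False, False, False]
    runs = 0
    for index, occupied in enumerate(bases):
        if occupied:
            if index + step >= 3:
                runs += 1
            else:
                new_bases[index + step] = True
    if step == 4:
        runs += 1
    else:
        new_bases[step - 1] = True
    return tuple(new_bases), runs


def simulate_game(lineup, rng, innings=7):
    """Simulate one team's offensive innings and return the runs scored.

    lineup is a list of dicts mapping each event in EVENTS to that
    batter's probability. The batting order carries over between innings.
    """
    runs = 0
    batter = 0
    for _ in range(innings):
        outs = 0
        bases = (False, False, False)
        while outs < 3:
            probs = lineup[batter % len(lineup)]
            weights = [probs[event] for event in EVENTS]
            event = rng.choices(EVENTS, weights=weights)[0]
            batter += 1
            if event == "out":
                outs += 1
            else:
                bases, scored = play_event(bases, event)
                runs += scored
    return runs


if __name__ == "__main__":
    # Hypothetical batters; real use would take each player's season rates.
    average = {"out": 0.68, "walk": 0.08, "single": 0.16,
               "double": 0.05, "triple": 0.01, "homer": 0.02}
    slugger = {"out": 0.62, "walk": 0.10, "single": 0.15,
               "double": 0.07, "triple": 0.01, "homer": 0.05}
    lineup = [average] * 3 + [slugger] + [average] * 5
    rng = random.Random(2013)
    results = [simulate_game(lineup, rng) for _ in range(10000)]
    print(sum(results) / len(results))
```

Passing in a seeded random.Random makes a run repeatable, which helps when comparing two lineups.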

I ran the simulation many times and calculated the average runs scored for seven innings.  Then I was able to do statistical analysis to find a confidence interval on the average number of runs.  With this, I was able to determine if the differences in the simulation results were significant or if they might have been caused by random elements of the simulation.
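The statistical step can be sketched as follows, assuming the number of simulated games is large enough for a normal approximation to be reasonable.  The function names and the choice of z = 1.96 (roughly a 95% interval) are assumptions for this example, and I am not claiming this is the exact test I used.

```python
import math
import statistics


def mean_confidence_interval(samples, z=1.96):
    """Return (mean, low, high) for the average of the samples.

    Uses the normal approximation: mean +/- z * standard error.
    """
    mean = statistics.mean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - z * std_error, mean + z * std_error


def compare_means(samples_a, samples_b, z=1.96):
    """Return (difference, low, high, significant) for mean(a) - mean(b).

    The difference is called significant when its interval excludes zero.
    """
    diff = statistics.mean(samples_a) - statistics.mean(samples_b)
    std_error = math.sqrt(
        statistics.variance(samples_a) / len(samples_a)
        + statistics.variance(samples_b) / len(samples_b))
    low, high = diff - z * std_error, diff + z * std_error
    return diff, low, high, not (low <= 0.0 <= high)


if __name__ == "__main__":
    # Hypothetical runs per game from two simulated batting orders.
    order_a = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]
    order_b = [2, 4, 3, 3, 5, 2, 4, 3, 2, 3]
    print(mean_confidence_interval(order_a))
    print(compare_means(order_a, order_b))
```

If the interval on the difference includes zero, the gap between two batting orders could plausibly be due to the random elements of the simulation, and more simulated games are needed before drawing a conclusion.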

I used the simulation to evaluate if one batting order was significantly better than another batting order.  I found some interesting results.  In most cases, the choice of which individual players are placed in the batting order makes a difference.  However, there is considerable leeway in which spot in the batting order they can be placed.  In a few cases, the same players may be in the batting order but the placement of the players in the order makes a significant difference.

I should also be able to consider the potential value of a successful bunt based on the individual characteristics of the players on the team.

The limitation of this approach is that it does not consider the defensive capabilities of the players on the team.  The inclusion of defensive capabilities in the simulation may have to be postponed until a future date.

