Monday, 4 February 2013

Monte Carlo Simulation Used to Compare Two Batting Orders

So far I have used Markov Chains based on statistics for a team of average players. Although I have been able to gain some insight, especially concerning the value of bunting, stealing or taking an extra base on a hit, I have not considered the capabilities of individual players on a particular team.

To consider the individual characteristics of the players on a particular team, I built a Monte Carlo simulation of seven offensive innings. The first study I can do with this simulation is to evaluate the average runs scored by a team in a game based on a particular batting order.

I can specify a batting order and run the simulation for any number of seven-inning games. For each game, I record the simulated number of runs. From these I can find the average number of runs per game for the batting order and a 95% confidence interval around the average. That is, if the whole simulation were repeated many times, about 19 out of 20 of the resulting intervals would contain the true long-run average for that batting order.
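A minimal Python sketch of this kind of simulation is shown below. It is not the simulation used for the results in this post. The outcome weights are invented placeholders rather than 2012 data, a nine-batter order is assumed, every runner is assumed to advance exactly as many bases as the batter, walks are lumped in with singles, and the interval is a normal approximation that treats each game as an independent sample. The names `simulate_game`, `mean_with_ci` and `AVERAGE_BATTER` are made up for the illustration.

```python
import math
import random
import statistics

# Hypothetical outcome weights per plate appearance, in the order
# (out, single or walk, double, triple, home run). Placeholder numbers only.
AVERAGE_BATTER = (0.68, 0.22, 0.06, 0.01, 0.03)


def simulate_game(order, rng, innings=7):
    """Return the runs scored in one simulated game for a batting order."""
    runs = 0
    batter = 0  # position in the order; carries over between innings
    for _ in range(innings):
        outs = 0
        runners = []  # base (1, 2 or 3) occupied by each runner
        while outs < 3:
            weights = order[batter % len(order)]
            batter += 1
            bases = rng.choices(range(5), weights=weights)[0]
            if bases == 0:
                outs += 1
                continue
            # Simplifying assumption: every runner advances `bases` bases.
            runners = [r + bases for r in runners] + [bases]
            runs += sum(1 for r in runners if r >= 4)
            runners = [r for r in runners if r < 4]
    return runs


def mean_with_ci(samples, z=1.96):
    """Return (lower, mean, upper) using a normal-approximation 95% interval."""
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean, mean + half_width


rng = random.Random(2013)  # fixed seed so the run is repeatable
order = [AVERAGE_BATTER] * 9  # assumed nine-batter order of identical batters
games = [simulate_game(order, rng) for _ in range(10 * 20)]  # 10 seasons of 20 games
lower, average, upper = mean_with_ci(games)
print(f"{lower:.3f}  {average:.3f}  {upper:.3f}")
```

To compare two batting orders with a sketch like this, each batter would get his own tuple of weights, and the same function would be run once per order.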

If I run the simulation for two batting orders, I can determine if the difference in the average number of runs between the two batting orders is statistically significant.

I used data from the 2012 season.
 
Consider two batting orders which I will call batting order 1 and 2.

By running batting order 1 through 10 seasons of 20 games, I obtained the following results for runs per game.

Lower Bound    Average    Upper Bound
6.188          6.919      7.651

In batting order 2, only the last batter in the order was changed. 
 
Batting order 2 produced the following results for 10 seasons of 20 games.

Lower Bound    Average    Upper Bound
5.920          6.500      7.079

Batting order 1 looks somewhat better than batting order 2 because it has a higher average number of runs per game.

However, the two confidence intervals overlap. Therefore, based on 10 seasons of 20 games, I cannot say that the difference in the average number of runs per game is statistically significant. That is, the average for batting order 1 may be larger than the average for batting order 2 purely by chance. To say that batting order 1 is better than batting order 2 with statistical significance, the lower bound of the confidence interval for batting order 1 would have to be larger than the upper bound for batting order 2. I can get this result by taking a larger sample of seasons (that is, simulating more seasons).
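To see why more simulated seasons help: under the usual normal approximation, the half-width of a 95% confidence interval is about 1.96 times the standard deviation divided by the square root of the number of games, so quadrupling the games halves the interval. The sketch below assumes a per-game standard deviation of 4 runs, which is a placeholder and not a figure from my simulation. The helper name `half_width` is also made up here.

```python
import math


def half_width(std_dev, n_games, z=1.96):
    """Half-width of a normal-approximation confidence interval for a mean."""
    return z * std_dev / math.sqrt(n_games)


ASSUMED_STD_DEV = 4.0  # hypothetical runs-per-game standard deviation
GAMES_PER_SEASON = 20

for seasons in (10, 20, 30, 40, 50):
    n_games = seasons * GAMES_PER_SEASON
    print(seasons, round(half_width(ASSUMED_STD_DEV, n_games), 3))
```

The intervals narrow slowly, which is why a small difference between two batting orders can need many simulated seasons to show up.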

Below is a table that shows the lower bound for batting order 1 and the upper bound for batting order 2 for varying numbers of 20-game seasons.

Seasons    Batting Order 1    Batting Order 2    Difference Statistically
           Lower Bound        Upper Bound        Significant
20         6.289              6.454              No
30         6.296              6.407              No
40         6.234              6.237              No
50         6.288              6.206              Yes

So in these runs, the difference first turns out to be statistically significant at 50 seasons.
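The criterion used in the table can be written as a small Python check. This is only an illustrative sketch: the function name `is_significant` is made up here, and the numbers are copied from the results in this post (the 10-season row comes from the two earlier tables).

```python
# (seasons, lower bound for batting order 1, upper bound for batting order 2)
rows = [
    (10, 6.188, 7.079),
    (20, 6.289, 6.454),
    (30, 6.296, 6.407),
    (40, 6.234, 6.237),
    (50, 6.288, 6.206),
]


def is_significant(lower_1, upper_2):
    """True when order 1's interval lies entirely above order 2's interval."""
    return lower_1 > upper_2


for seasons, lower_1, upper_2 in rows:
    gap = lower_1 - upper_2  # positive once the intervals no longer overlap
    verdict = "Yes" if is_significant(lower_1, upper_2) else "No"
    print(f"{seasons:>3} seasons: gap = {gap:+.3f} -> {verdict}")

# Smallest number of seasons in the table where the intervals separate.
first_significant = next(s for s, lo, hi in rows if is_significant(lo, hi))
print(first_significant)
```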

The major result is that Monte Carlo simulation can be used to evaluate, on the basis of statistical significance, whether one batting order is better than another.
