So far I have used Markov chains based
on statistics for a team of average players. Although I have been
able to gain some insight, especially concerning the value of
bunting, stealing, or taking an extra base on a hit, I have not
considered the capabilities of individual players on a particular team.
To account for the individual
characteristics of the players on a particular team, I built a Monte
Carlo simulation of seven offensive innings. The first study I can
do with this simulation is to estimate the average number of runs a team
scores in a game with a particular batting order.
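The post does not show the simulation itself. Below is a minimal sketch of a seven-inning Monte Carlo game simulator, not the author's model. It assumes each batter is described by hypothetical per-plate-appearance probabilities of an out, walk, single, double, triple, or home run, and it uses simplified base running: on a hit every runner moves as many bases as the batter, and on a walk only forced runners move. Stolen bases, double plays, sacrifices, and extra bases taken are ignored. The names `simulate_game`, `advance`, and `OUTCOMES`, and the probabilities in the usage example, are illustrative assumptions, not 2012 data.

```python
import random

# Outcome labels, in the same order as each batter's probability tuple.
OUTCOMES = ("out", "walk", "single", "double", "triple", "home_run")
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

def advance(bases, outcome):
    """Return (new_bases, runs) after a non-out outcome.

    bases is (first, second, third) as booleans. Simplified rules: on a
    walk only forced runners move; on a hit every runner moves as many
    bases as the batter.
    """
    first, second, third = bases
    if outcome == "walk":
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    step = HIT_BASES[outcome]
    # Runner positions 1-3, plus the batter starting at home plate (0).
    runners = [i + 1 for i, occupied in enumerate(bases) if occupied] + [0]
    new_bases = [False, False, False]
    runs = 0
    for base in runners:
        target = base + step
        if target >= 4:
            runs += 1
        else:
            new_bases[target - 1] = True
    return tuple(new_bases), runs

def simulate_game(lineup, innings=7, rng=None):
    """Simulate one game's offense and return the runs scored.

    lineup is a list of per-batter probability tuples ordered like
    OUTCOMES; the batting order carries over from inning to inning.
    """
    if rng is None:
        rng = random.Random()
    batter, runs = 0, 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            outcome = rng.choices(OUTCOMES, weights=lineup[batter])[0]
            batter = (batter + 1) % len(lineup)
            if outcome == "out":
                outs += 1
            else:
                bases, scored = advance(bases, outcome)
                runs += scored
    return runs

if __name__ == "__main__":
    # Hypothetical probabilities for an average hitter; they sum to 1.
    average_hitter = (0.68, 0.08, 0.16, 0.05, 0.01, 0.02)
    lineup = [average_hitter] * 9
    rng = random.Random(2012)
    games = [simulate_game(lineup, rng=rng) for _ in range(200)]
    print(sum(games) / len(games))
```

Because each batter has his own probability tuple, swapping two entries in `lineup` is all it takes to try a different batting order.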
I can specify a batting order and run
the simulation for any number of seven-inning games. For each game, I
record the simulated number of runs. From these I find the
average number of runs per game for the batting order and a 95%
confidence interval around the average. That is, if I repeated the
whole experiment many times, about 19 out of 20 of the intervals
constructed this way would contain the true long-run average number
of runs per game for that batting order.
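The post does not say exactly how its intervals are computed. One common approach, assumed here, is the normal approximation: the mean plus or minus 1.96 times the sample standard deviation divided by the square root of the number of games. The function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(runs_per_game, z=1.96):
    """Return (lower, mean, upper) for an approximate 95% CI of the mean.

    Uses mean +/- z * s / sqrt(n), where s is the sample standard
    deviation. This is reasonable for the hundreds of games simulated
    here; for small samples a t critical value would be more accurate.
    """
    n = len(runs_per_game)
    if n < 2:
        raise ValueError("need at least two games")
    mean = statistics.mean(runs_per_game)
    half_width = z * statistics.stdev(runs_per_game) / math.sqrt(n)
    return mean - half_width, mean, mean + half_width

if __name__ == "__main__":
    runs = [7, 3, 9, 6, 5, 11, 4, 8, 6, 10]  # hypothetical runs in ten games
    lower, mean, upper = mean_confidence_interval(runs)
    print(f"{lower:.3f} {mean:.3f} {upper:.3f}")
```

The half-width shrinks in proportion to one over the square root of the number of games, which is why simulating more seasons narrows the interval.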
If I run the simulation for two batting
orders, I can determine whether the difference in the average number of
runs between them is statistically significant.
I used data from the 2012 season.
Consider two batting orders, which I will call batting order 1 and batting order 2.
By running batting order 1 through 10
seasons of 20 games (200 games in all), I obtained the following
results for runs per game.
Lower Bound   Average   Upper Bound
6.188         6.919     7.651
In batting order 2, only the last batter in the order was changed.
Batting order 2 produced the following
results for 10 seasons of 20 games.
Lower Bound   Average   Upper Bound
5.920         6.500     7.079
Batting order 1 looks somewhat better
than batting order 2 because it has a higher average number of runs
per game.
However, the two confidence intervals
overlap. Therefore, based on 10 seasons of 20 games, I cannot say
that the difference in the average number of runs per game is
statistically significant. That is, the average for batting order 1
may be larger than the average for batting order 2 only by random
chance. To say with statistical significance that batting order 1 is
better than batting order 2, I require that the lower bound of the
confidence interval for batting order 1 be larger than the upper bound
for batting order 2. This rule is conservative: non-overlapping
intervals do imply a significant difference, but two intervals can
overlap slightly even when the difference is significant.
I can get a non-overlapping result by taking a larger sample of
seasons (that is, simulating more seasons), because each interval
narrows as the number of simulated games grows.
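The non-overlap rule can be checked directly from the two reported intervals. As a cross-check, the sketch below also computes an approximate two-sample z statistic for the difference in means. This is a swapped-in technique, not the method in the post, and it assumes each interval was built as mean plus or minus 1.96 standard errors and that the two batting orders were simulated independently. The function names are hypothetical.

```python
import math

def intervals_separate(ci_a, ci_b):
    """Non-overlap rule: A's lower bound is above B's upper bound.

    Each ci is a (lower, mean, upper) tuple.
    """
    return ci_a[0] > ci_b[2]

def difference_z(ci_a, ci_b, z=1.96):
    """Approximate z statistic for mean_a - mean_b.

    Assumes each interval is mean +/- z * SE and that the two batting
    orders were simulated independently, so the SEs add in quadrature.
    """
    se_a = (ci_a[2] - ci_a[0]) / (2 * z)
    se_b = (ci_b[2] - ci_b[0]) / (2 * z)
    return (ci_a[1] - ci_b[1]) / math.hypot(se_a, se_b)

if __name__ == "__main__":
    order1 = (6.188, 6.919, 7.651)  # batting order 1, 10 seasons of 20 games
    order2 = (5.920, 6.500, 7.079)  # batting order 2, 10 seasons of 20 games
    print(intervals_separate(order1, order2))      # False: the intervals overlap
    print(round(difference_z(order1, order2), 2))  # 0.88, well below 1.96

    # Hypothetical intervals that overlap, yet |z| exceeds 1.96.
    a, b = (5.0, 6.0, 7.0), (3.4, 4.4, 5.4)
    print(intervals_separate(a, b), difference_z(a, b) > 1.96)  # False True
```

For the 10-season results both checks agree that the difference is not significant. The last example shows why the non-overlap rule is conservative.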
Below is a table that shows the lower
bound for batting order 1 and the upper bound for batting order 2 for
a varying number of 20-game seasons.
Seasons   Batting Order 1 Lower Bound   Batting Order 2 Upper Bound   Difference Statistically Significant
20        6.289                         6.454                         No
30        6.296                         6.407                         No
40        6.234                         6.237                         No
50        6.288                         6.206                         Yes
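So, among the sample sizes I tried, the difference first becomes
statistically significant at 50 seasons (1,000 games per batting
order). At 40 seasons the intervals still overlap, though only barely.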
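The procedure behind the table can be sketched as a loop over increasing season counts that stops at the first count where the intervals no longer overlap. This sketch assumes the normal-approximation interval and the non-overlap rule, and it draws a fresh sample at each season count. Because the post's game simulator is not shown, `stand_in_game` is a hypothetical placeholder (runs drawn as the number of successes in 40 trials). In practice you would pass the seven-inning simulation for each batting order instead. All function names here are hypothetical.

```python
import math
import random
import statistics

def confidence_interval(values, z=1.96):
    """Return (lower, mean, upper) of an approximate 95% CI for the mean."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean, mean + half_width

def first_significant_season_count(sim_a, sim_b, season_counts, rng,
                                   games_per_season=20):
    """Return the first season count at which A's lower bound exceeds B's
    upper bound, or None if none of the counts is enough.

    sim_a and sim_b each take a random.Random and return one game's runs.
    A fresh sample is drawn for every season count.
    """
    for seasons in season_counts:
        games = seasons * games_per_season
        ci_a = confidence_interval([sim_a(rng) for _ in range(games)])
        ci_b = confidence_interval([sim_b(rng) for _ in range(games)])
        if ci_a[0] > ci_b[2]:
            return seasons
    return None

def stand_in_game(p, trials=40):
    """Hypothetical placeholder for the seven-inning simulation: runs are
    the number of successes in `trials` draws with probability p."""
    def game(rng):
        return sum(rng.random() < p for _ in range(trials))
    return game

if __name__ == "__main__":
    rng = random.Random(2012)
    order_1 = stand_in_game(0.17)  # mean 6.8 runs per game
    order_2 = stand_in_game(0.16)  # mean 6.4 runs per game
    print(first_significant_season_count(order_1, order_2,
                                         [10, 20, 30, 40, 50], rng))
```

Because every row is a new random sample, the season count at which significance first appears can shift from one run of the experiment to the next.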
The major result is that Monte Carlo
simulation can be used to evaluate, based on statistical
significance, whether one batting order is better than another.