Monday, 15 April 2013

Defensive Positioning of the Team

In an earlier post, I discussed how to find the best batting order.  However, that method neglected the defensive positioning of the players.

In this post, I will describe how to find the best lineup considering both offensive and defensive aspects of the team's performance.

First, I found the positions that each of the players on the team can play.  I wanted to find the best lineup and batting order for each of the three pitchers on the team.

For each of the three pitchers, I developed 100 random lineups that filled the 9 defensive positions along with the DH.  I did not consider the batting order at this point.  However, I ran the Monte Carlo simulation for each of the random lineups with a random batting order.  Then I screened out the lineups that did not provide sufficient average runs per game.   In this way, I had all of the defensive positions filled and still had a reasonably good average runs per game.

On examination of the lineups that were not screened out, I found that the same seven players appeared in every lineup.  They were the strongest offensive players.  I reordered the lineups so that these seven players would each have a specific place among the first seven positions in the batting order to maximize the average number of runs per game.  One of these players could possibly play DH.

The last three players in the lineup filled out the defensive positions.  So at this point, I had a number of lineups with batting orders that provided good offensive statistics.

I ran the Monte Carlo simulation to do a runoff of the batting orders to find the best offensive lineup for each pitcher while also having all of the defensive positions filled out.

Tuesday, 2 April 2013

Likelihood of Making the Final Four Tournament with an Eight Team League

In an earlier post, I predicted the chances of team G making the four team final weekend tournament based on the 7 team playoff format. The probability that I estimated for team G was 45%.

This probability would change in the 8 team league. The first round playoff format could have 1st play 8th, 2nd play 7th, 3rd play 6th and 4th play 5th.  The opponent that team G plays in the first round will depend on how the new team performs.

Teams A, B, C and D will be in the North Division.  Teams E, F, G and H will be in the South Division.

I will assume the new team is team A.  I used the Pythagorean Formula to determine the runs for and against for team A based on various values of their winning percentage.  Then I adjusted the runs for and against from the 2012 season for the other teams based on the results for team A.  I then calculated the winning percentage for the other teams in the league.

Then I found the final standings based on the winning percentage of each team.

The final standings would be as shown below based on the performance of team A.

Standing   Team A winning percentage
           0.400  0.450  0.500  0.550  0.600  0.650  0.700  0.750
1          E      E      E      E      E      E      E      A
2          B      B      B      B      A      A      A      E
3          C      C      C      A      B      B      B      B
4          G      G      A      C      C      C      C      C
5          F      F      G      G      G      G      G      G
6          H      A      F      F      F      F      F      F
7          A      H      H      H      H      H      H      H
8          D      D      D      D      D      D      D      D

So if team A has a winning percentage of 0.400 or 0.450, team G plays team F in the first round.  Based on team G's head-to-head winning percentage versus team F, the probability of winning the first round of the playoffs would be 0.500.

If team A has a 0.500 winning percentage, then team G would play team A in the first round.  Team G's head-to-head winning percentage versus team A is in this case expected to be 0.452.  Thus, the probability of team G winning the first round would be 0.411.

For all of the other cases, team G would play team C in the first round.  Team G's head-to-head winning percentage versus team C is expected to be 0.435.  Thus the probability of winning the first round of the playoffs would be 0.380.

The Effect on Regular Season Performance of Adding an Eighth Team to the League

In an earlier post, I discussed how I used the runs for and against to predict the win percentage of a team.

Then I described how the win percentages for two teams could be used to predict the results of head-to-head competition.

Finally, I used the head-to-head win percentage to predict the regular season results for a 7 team league.

This season an eighth team has joined the league. It is difficult to predict their win percentage.

The league will be divided into two divisions. The teams will play three games against the other teams in their division and two games against the teams in the other division for a 17 game schedule.

My predictions for runs for and against are unchanged for team G, namely 104 runs for and 120 runs against. Thus, the predicted win percentage for team G is 0.429.

I assumed that teams A, B, C and D were in Division I and teams E, F, G and H were in Division II. I also assumed the two divisions were identical. I calculated the percentile for the standings in each division, that is, 0.8, 0.6, 0.4 and 0.2.

Then I used the inverse normal distribution to calculate the win percentage for the teams in the divisions by finding the standard deviation that would make team G and team C have a win percentage of 0.429.
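
This calculation can be sketched in a few lines of Python. It is only a sketch: it assumes the league mean win percentage is 0.500, which is not stated explicitly in this post, and the names used are illustrative.

```python
from statistics import NormalDist

MEAN = 0.500             # assumed league-average win percentage
ANCHOR_PCT = 0.429       # predicted win percentage for team G (and team C)
ANCHOR_PERCENTILE = 0.4  # percentile of the 3rd place team in a division

std_normal = NormalDist()  # mean 0, standard deviation 1

# Solve MEAN + sd * z(0.4) = 0.429 for the standard deviation.
sd = (ANCHOR_PCT - MEAN) / std_normal.inv_cdf(ANCHOR_PERCENTILE)

def win_pct(percentile):
    """Win percentage at a given percentile of the fitted normal distribution."""
    return MEAN + sd * std_normal.inv_cdf(percentile)

for rank, percentile in enumerate([0.8, 0.6, 0.4, 0.2], start=1):
    print(rank, percentile, round(win_pct(percentile), 3))
```

The standard deviation is fixed by a single anchor point (team G at the 0.4 percentile), so the other three win percentages follow directly from it.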

The table below shows my predictions for the win percentage of the 8 teams.

Division I    Rank   Normal Percentile   Win Percent
A             1      0.80                0.736
B             2      0.60                0.571
C             3      0.40                0.429
D             4      0.20                0.264

Division II   Rank   Normal Percentile   Win Percent
E             1      0.80                0.736
F             2      0.60                0.571
G             3      0.40                0.429
H             4      0.20                0.264


I can now calculate the head-to-head performance of the 8 teams as shown in the table below.



Team  Win Pct  A      B      C      D      E      F      G      H
               0.736  0.571  0.429  0.264  0.736  0.571  0.429  0.264
A     0.736    0.500  0.677  0.788  0.886  0.500  0.677  0.788  0.886
B     0.571    0.323  0.500  0.639  0.788  0.323  0.500  0.639  0.788
C     0.429    0.212  0.361  0.500  0.677  0.212  0.361  0.500  0.677
D     0.264    0.114  0.212  0.323  0.500  0.114  0.212  0.323  0.500
E     0.736    0.500  0.677  0.788  0.886  0.500  0.677  0.788  0.886
F     0.571    0.323  0.500  0.639  0.788  0.323  0.500  0.639  0.788
G     0.429    0.212  0.361  0.500  0.677  0.212  0.361  0.500  0.677
H     0.264    0.114  0.212  0.323  0.500  0.114  0.212  0.323  0.500


Then I can predict the regular season performance of the 8 teams as shown in this table.


        A   B   C   D   E   F   G   H   Wins
A       0   2   2   3   1   1   2   2   13
B       1   0   2   2   1   1   1   2   10
C       1   1   0   2   0   1   1   1   7
D       0   1   1   0   0   0   1   1   4
E       1   1   2   2   0   2   2   3   13
F       1   1   1   2   1   0   2   2   10
G       0   1   1   1   1   1   0   2   7
H       0   0   1   1   0   1   1   0   4
Losses  4   7   10  13  4   7   10  13

Thus my prediction of the regular season results is

Div I    Wins   Losses   Win Percent
A        13     4        0.750
B        10     7        0.574
C        7      10       0.426
D        4      13       0.250

Div II   Wins   Losses   Win Percent
E        13     4        0.750
F        10     7        0.574
G        7      10       0.426
H        4      13       0.250
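
The season prediction can be reproduced with a short Python sketch. It assumes the schedule described above (3 games against each division rival and 2 games against each team in the other division) and uses the head-to-head formula from the earlier post; the function and variable names are illustrative. The win percentages in the table appear to come from the unrounded expected wins divided by 17, which is why 13 wins corresponds to 0.750 rather than 13/17.

```python
def head_to_head(a, b):
    """Probability that a team with win percentage a beats one with win percentage b."""
    return a * (1 - b) / (a * (1 - b) + (1 - a) * b)

# Predicted win percentages for the 8 teams.
win_pct = {"A": 0.736, "B": 0.571, "C": 0.429, "D": 0.264,
           "E": 0.736, "F": 0.571, "G": 0.429, "H": 0.264}
division = {team: ("I" if team in "ABCD" else "II") for team in win_pct}

def expected_wins(team):
    """Expected wins: 3 games vs. division rivals, 2 games vs. the other division."""
    total = 0.0
    for opponent in win_pct:
        if opponent == team:
            continue
        games = 3 if division[opponent] == division[team] else 2
        total += games * head_to_head(win_pct[team], win_pct[opponent])
    return total

GAMES = 17
season_pct = {team: expected_wins(team) / GAMES for team in win_pct}
for team, pct in season_pct.items():
    print(team, round(expected_wins(team)), round(pct, 3))
```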

Saturday, 30 March 2013

Predicting the First Round of the Playoffs

In this 7 team league, the first round of the playoffs involves the 2nd through 7th place teams. The 1st place team gets a bye into the final weekend double-knockout 4 team tournament involving the three winners of the first round.

In the first round, the 2nd place team plays the 7th place team, the 3rd place team plays the 6th place team, and the 4th place team plays the 5th place team. Each of these is a best-of-5 series.

In my last post, I made a prediction about the regular season standings. So in the first round of the playoffs, team B would play team F, team C would play team D and team E would play team G.

I am interested in the probability of team G winning the best-of-5 first round series and making it to the final weekend tournament.

Team G's regular season win percentage is estimated to be 0.429 and team E's regular season win percentage is estimated to be 0.456. 

So the probability of team G beating team E in one game of head-to-head competition is 0.473.

Team G could win the best-of-5 game series in 3, 4 or 5 games.

The probability of team G winning the series in 3 games is

[ 0.473 * 0.473 * 0.473 ] = 0.106.

The probability of team G winning the series in 4 games is

[ 3 * (1 – 0.473) * 0.473 * 0.473 * 0.473 ] = 0.167.

The probability of team G winning the series in 5 games is

[ 6 * (1 – 0.473) * (1 – 0.473) * 0.473 * 0.473 * 0.473 ] = 0.176.

Adding up these three probabilities, the total probability of team G winning the best-of-5 series against team E is

[ 0.106 + 0.167 + 0.176 ] = 0.449.

So there is a 45% chance that team G will make it to the final weekend tournament in 2013.
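
The same series calculation can be written as a small Python function. This is a sketch that assumes every game is independent with the same single-game win probability; the function name is illustrative.

```python
from math import comb

def series_win_prob(p, wins_needed=3):
    """Probability of winning a series given single-game win probability p.

    The series ends when one side reaches wins_needed wins
    (wins_needed=3 is a best-of-5 series).
    """
    q = 1 - p
    # The series winner always wins the last game; the loser's k wins can
    # fall anywhere among the earlier wins_needed - 1 + k games.
    return sum(comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
               for k in range(wins_needed))

print(round(series_win_prob(0.473), 3))  # team G vs. team E
```

For a best-of-5 series the three terms of the sum are the 3, 4 and 5 game cases, with coefficients 1, 3 and 6.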

Regular Season Predictions for 2013

I can estimate the head-to-head performance of two teams from their win percentages.

If a is the win percentage for Team A and b is the win percentage for Team B, then the probability of Team A beating Team B in one game can be estimated as a * (1 – b) / [ a * (1 – b) + (1 – a) * b ] .
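
As a quick check, the formula can be written as a Python function (the function name is illustrative):

```python
def head_to_head(a, b):
    """Probability that a team with win percentage a beats a team with win percentage b."""
    return a * (1 - b) / (a * (1 - b) + (1 - a) * b)

# Team G (0.429) against team E (0.456).
print(round(head_to_head(0.429, 0.456), 3))
```

Two teams with equal win percentages come out at 0.500, and the probabilities for A beating B and B beating A always sum to 1.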

In my last post, I estimated the win percentage for the 7 teams in the league based on the expected runs for and against for each team.

I can now estimate the probability of each team winning in head-to-head competition.

The table below shows these calculations.



Team  Win Pct  A      B      C      E      G      D      F
               0.765  0.562  0.521  0.456  0.429  0.425  0.364
A     0.765    0.500  0.717  0.749  0.795  0.812  0.815  0.851
B     0.562    0.283  0.500  0.541  0.605  0.631  0.634  0.692
C     0.521    0.251  0.459  0.500  0.565  0.592  0.596  0.656
E     0.456    0.205  0.395  0.435  0.500  0.527  0.531  0.595
G     0.429    0.188  0.369  0.408  0.473  0.500  0.504  0.568
D     0.425    0.185  0.366  0.404  0.469  0.496  0.500  0.564
F     0.364    0.149  0.308  0.344  0.405  0.432  0.436  0.500

Assuming a balanced 18 game schedule, the win-loss record for the 7 teams would be as follows


        A   B   C   E   G   D   F   Wins
A       0   2   2   2   2   2   3   14
B       1   0   2   2   2   2   2   10
C       1   1   0   2   2   2   2   9
E       1   1   1   0   2   2   2   8
G       1   1   1   1   0   2   2   8
D       1   1   1   1   1   0   2   7
F       0   1   1   1   1   1   0   6
Losses  4   8   9   10  10  11  12

Thus team G would be tied with team E for 4th place at the end of the regular season.

Improvement in Winning Percentage


In an earlier post, I calculated the expected runs that would be scored by a team using linear weights and an optimal assignment of plate appearances between the players on the team. I found that the expected number of runs scored would increase from 90 in 2012 to 104 in 2013.

In another earlier post, I calculated the expected number of runs allowed based on a linear model of the pitchers and an equitable distribution of innings for each pitcher. I found that the expected number of runs allowed in 2013 is 120.

Also in an earlier post, I described the Pythagorean formula to estimate the win percentage from the runs for and against.

If I assume that the other teams in the league do nothing to improve while team G optimizes the plate appearances and the pitchers' innings, I can estimate the number of runs for and against for the 7 teams in 2013 based on the runs for and against in 2012 and the improvement of team G.

Below is a table which shows the estimated win percentages for the 7 teams in 2013 based on the runs for and against estimated for 2013.


        2012        2013
Team    RF    RA    RF    RA    Win Pct
A       109   59    109   61    0.765
B       94    81    94    83    0.562
C       92    86    92    88    0.521
E       106   113   106   116   0.456
G       90    119   104   120   0.429
D       74    84    74    86    0.425
F       79    102   79    105   0.364

We can see that in 2013, team G would move to 5th place in the standings based on estimated win percentage.
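
For reference, here is a minimal Python sketch of the Pythagorean formula. It assumes the classic exponent of 2, which is not stated in this post, so the results may differ slightly from the table for some teams; the function name is illustrative.

```python
def pythagorean_win_pct(runs_for, runs_against, exponent=2.0):
    """Estimated win percentage from runs for and runs against.

    exponent=2.0 is an assumption; other exponents are sometimes used.
    """
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)

# Team G in 2013: 104 runs for, 120 runs against.
print(round(pythagorean_win_pct(104, 120), 3))
```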



Friday, 29 March 2013

Determining the Necessary Improvement to Move Up in the Standings

In an earlier post, I discussed the Pythagorean formula for estimating winning percentage from runs for and runs against.

In this post, I will use the Pythagorean formula to determine the necessary improvement the last place team in a league would need to make to move up in the standings.

There were seven teams in the league and I found the average and standard deviation of the winning percentages for the league. Then based on the rank, I found the percentile for the inverse normal distribution. Then based on the percentile and the average and standard deviation, I found the expected winning percentage for each of the teams in the league.

I assumed that the last place team would need to improve their offense and their defense equally to move up in the standings. That is, the runs for would need to increase and the runs against would need to decrease by the same amount. 

Rank   Percentile   Win Pct   Runs For   Runs Against   Change
1      0.88         0.636     119        90             29
2      0.75         0.580     113        96             23
3      0.63         0.538     109        100            19
4      0.50         0.500     105        105            15
5      0.38         0.462     101        108            11
6      0.25         0.420     96         113            6
7      0.13         0.364     90         119            0

Thus if the top four teams of the seven team league make the playoffs, the last place team from 2012 would have to increase their runs for by 15 and decrease their runs against by 15 in 2013.
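
The size of the required change can be found in closed form. The Python sketch below assumes a Pythagorean exponent of 2 and starts from the last place team's 90 runs for and 119 runs against; the function name is illustrative. Its results agree with the table to within about a run, with the differences presumably due to rounding.

```python
from math import sqrt

def required_change(runs_for, runs_against, target_pct):
    """Runs to add to runs_for and subtract from runs_against to reach target_pct.

    Solves (rf + d)^2 / ((rf + d)^2 + (ra - d)^2) = target_pct for d,
    assuming a Pythagorean exponent of 2.
    """
    ratio = sqrt(target_pct / (1 - target_pct))  # required (rf + d) / (ra - d)
    return (ratio * runs_against - runs_for) / (1 + ratio)

for target in [0.636, 0.580, 0.538, 0.500, 0.462, 0.420]:
    print(target, round(required_change(90, 119, target), 1))
```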

Wednesday, 27 March 2013

A Goal for Pitchers

A common idea is that successful pitchers stay ahead on the count.  That means pitching more strikes than balls.  It has been suggested by some baseball coaches that a pitcher should strive to pitch 60% strikes.  I collected some data from the recent International Softball Federation World Tournament that confirmed this suggested goal for pitchers.

I wanted to answer the question: how important is it to pitch a high percentage of strikes?  I found data from Major League Baseball on this subject.  There were 88 pitchers in the sample.  The average strike percentage was 64.0% and the average earned run average was 3.87.

I did a linear regression to determine the relationship between strike percentage and earned run average.  The significance of the relationship was very high.

The regression equation I found is:

Earned run average = 12.5 – 13.5 * strike percentage

Here are some predicted values for earned run average based on strike percentage using this equation.

Strike Percentage   Earned Run Average
70%                 3.06
65%                 3.73
60%                 4.41
55%                 5.08
50%                 5.76

From these results, I can see why coaches recommend that pitchers strive to throw at least 60% strikes.
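
The regression equation is easy to evaluate in Python. Note that the coefficients 12.5 and 13.5 are rounded, so this sketch can differ from the values listed above by about 0.01; the function name is illustrative.

```python
def predicted_era(strike_pct):
    """Predicted ERA from the fitted line; strike_pct is a fraction (0.60 = 60%)."""
    return 12.5 - 13.5 * strike_pct

for pct in [0.70, 0.65, 0.60, 0.55, 0.50]:
    print(f"{pct:.0%}  {predicted_era(pct):.2f}")
```

Each additional percentage point of strikes lowers the predicted earned run average by about 0.135.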


Sunday, 17 March 2013

Distributing Plate Appearances between Position Players

In my last post, I distributed the number of innings among a men's fastpitch softball team's pitching staff.

In this post, I will distribute the plate appearances for the position players on a men's fastpitch softball team.

In 2012, there were 492 regular season plate appearances distributed between 15 players.  I would like to distribute these 492 plate appearances for the 2013 regular season among the 14 players while maximizing the number of runs created by the team.

Recall the linear weights based runs created formula that I found for this men's fastpitch softball league.

Runs Created = 0.44*1b + 0.83*2b + 1.00*3b + 1.38*hr + 0.31*walks
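
As a small illustration, the formula can be coded as a Python function. The stat line in the example is hypothetical and is not taken from any player in this league.

```python
def runs_created(singles, doubles, triples, home_runs, walks):
    """Linear weights runs created for this men's fastpitch softball league."""
    return (0.44 * singles + 0.83 * doubles + 1.00 * triples
            + 1.38 * home_runs + 0.31 * walks)

# Hypothetical stat line (not from the post): 10 singles, 3 doubles,
# 1 triple, 2 home runs and 6 walks in 50 plate appearances.
rc = runs_created(10, 3, 1, 2, 6)
print(round(rc, 2), round(rc / 50, 2))
```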

I calculated the runs created per plate appearance for each of the 14 players.  Then I ranked the players by runs created per plate appearance.

I found the mean and standard deviation of the plate appearances for the team's players in the regular season for 2012.
 
Then I used the inverse normal probability distribution with the percentile found using the rank of the player’s runs created per plate appearance and the mean and standard deviation of the plate appearances for the team in 2012.  In this way, I could determine the ideal number of plate appearances for the 2013 season for each of the players.

Rank    Runs/PA   Norm Dist   New PA   New Runs
1       0.30      0.93        56       17
2       0.25      0.87        50       13
3       0.24      0.80        47       11
4       0.22      0.73        44       10
5       0.22      0.67        41       9
6       0.21      0.60        39       8
7       0.20      0.53        36       7
8       0.19      0.47        34       7
9       0.17      0.40        32       6
10      0.17      0.33        29       5
11      0.16      0.27        27       4
12      0.14      0.20        24       3
13      0.14      0.13        20       3
14      0.13      0.07        15       2

Total                         493      104

This team scored 90 runs in 2012.  So with this ideal distribution of plate appearances, they should be able to improve that to 104 runs in 2013.
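
The allocation can be sketched in Python as follows. The mean and standard deviation of the 2012 plate appearances are not reported in this post, so the values below are assumptions chosen only for illustration (a mean of 492/14 and a standard deviation of 13.7). The percentile formula (15 - rank)/15 is inferred from the normal distribution column of the table.

```python
from statistics import NormalDist

N_PLAYERS = 14
MEAN_PA = 492 / N_PLAYERS   # assumed mean; the post does not report it
SD_PA = 13.7                # assumed standard deviation, for illustration only

pa_dist = NormalDist(MEAN_PA, SD_PA)

def percentile(rank):
    """Percentile for a rank (1 = best runs created per plate appearance)."""
    return (N_PLAYERS + 1 - rank) / (N_PLAYERS + 1)

def new_plate_appearances(rank):
    """Plate appearances assigned to the player with the given rank."""
    return pa_dist.inv_cdf(percentile(rank))

# Runs created per plate appearance by rank, as listed in this post.
runs_per_pa = [0.30, 0.25, 0.24, 0.22, 0.22, 0.21, 0.20,
               0.19, 0.17, 0.17, 0.16, 0.14, 0.14, 0.13]

total_runs = sum(rate * new_plate_appearances(rank)
                 for rank, rate in enumerate(runs_per_pa, start=1))
print(round(total_runs))
```

Because the percentiles are symmetric around 0.5, the assigned plate appearances always sum to 14 times the mean, so the total stays at 492 whatever standard deviation is used.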

Friday, 15 March 2013

Balancing the Innings for a Pitching Staff


In my last post, I discussed Pete Palmer’s Linear Weights formula.  I showed how it could be used to estimate the number of runs produced by a men’s fastpitch softball team.

In this post, I will look at a similar idea of linear weights to evaluate a pitching staff.  Then I will use the linear weights to balance the innings assigned to each pitcher to minimize the runs given up by a men’s fastpitch softball team.

I took the pitching statistics for the primary pitchers in a local men’s fastpitch softball league.   I calculated various pitching statistics in terms of their values per inning.  Then I used multiple linear regression to estimate the runs allowed per inning pitched as a function of hits allowed (non-homeruns), walks (base on balls and hit by pitch), strikeouts, and homeruns allowed per inning pitched.

Here is the data that I used.

Pitcher   Hits   Walks   Strikeouts   Homeruns   Runs Allowed
1         1.05   0.42    1.32         0.08       0.58
2         1.23   0.57    0.91         0.16       1.25
3         1.49   0.47    1.24         0.17       1.18
4         0.79   0.26    1.44         0.12       0.32
5         0.83   0.68    1.43         0.00       0.48
6         1.03   0.34    1.52         0.08       0.76
7         1.09   0.34    1.07         0.10       0.73
8         1.13   0.54    1.02         0.17       1.22
9         1.42   0.46    0.46         0.07       1.05
10        1.31   0.20    0.76         0.06       0.71
11        1.09   0.76    1.27         0.15       0.97
12        1.19   0.65    1.19         0.11       0.98
13        0.46   0.33    1.77         0.06       0.40
14        1.15   0.66    0.66         0.16       1.23

The formula that I obtained from the linear regression is

Runs Allowed = 0.42*Hits + 0.55*Walks – 0.14*Strikeouts + 2.36*Homeruns

Pitchers 1, 2 and 3 are on the same team.  

I wanted to balance the number of innings between the three pitchers.  I found that one good way to do that was to equalize the runs allowed by each pitcher.

Here are the results.

Weight             0.42   0.55    -0.14        2.36
Games    Innings   Hits   Walks   Strikeouts   Homeruns   Runs Allowed
9        60        1.05   0.42    1.32         0.08       40
5        37        1.23   0.57    0.91         0.16       40
5        36        1.49   0.47    1.24         0.17       40
19       133                                              120

So the manager should plan to throw pitcher 1 for 60 innings or the equivalent of 9 games during the season.  Pitchers 2 and 3 would be expected to throw 37 and 36 innings respectively which represents approximately 5 games each.
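
The balancing step can be sketched in Python. It assumes the 133 total innings from the results and splits them so that each pitcher's expected runs allowed is the same, which makes the innings proportional to the inverse of each pitcher's predicted runs per inning. The variable names are illustrative.

```python
# Regression weights for runs allowed per inning.
WEIGHTS = {"hits": 0.42, "walks": 0.55, "strikeouts": -0.14, "homeruns": 2.36}

# Per-inning rates for pitchers 1, 2 and 3, from the data used in the regression.
pitchers = {
    1: {"hits": 1.05, "walks": 0.42, "strikeouts": 1.32, "homeruns": 0.08},
    2: {"hits": 1.23, "walks": 0.57, "strikeouts": 0.91, "homeruns": 0.16},
    3: {"hits": 1.49, "walks": 0.47, "strikeouts": 1.24, "homeruns": 0.17},
}
TOTAL_INNINGS = 133  # 19 games in the results

def runs_per_inning(stats):
    """Predicted runs allowed per inning from the linear weights."""
    return sum(WEIGHTS[key] * stats[key] for key in WEIGHTS)

rates = {p: runs_per_inning(s) for p, s in pitchers.items()}

# Equal runs allowed means innings * rate is the same for every pitcher,
# so each pitcher's innings are proportional to 1 / rate.
inverse_total = sum(1 / r for r in rates.values())
innings = {p: TOTAL_INNINGS / (r * inverse_total) for p, r in rates.items()}

for p in pitchers:
    print(p, round(innings[p]), round(innings[p] * rates[p], 1))
```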

The entire pitching staff would be expected to allow 120 runs during the season.

I can now use the expected offensive production of the batters on the team from the previous post and the expected runs allowed by the pitchers shown here in the Pythagorean formula to estimate the winning percentage of the team during the regular season.