Archive for the ‘Math’ Category

Bowling Simulation

Sunday, January 24th, 2010

I don’t have a particularly good question here.  I mainly just felt like writing a bowling simulation program and explaining the results.  If I had to form a question, I’d probably ask:  What does it take to become as good as a PBA bowler?

First off, I’ll ask the simple question:  how likely is someone to roll a 300 game?  This is easy to answer even without a simulation program.  Rolling 300 requires rolling 12 consecutive strikes.  So, if someone rolls a strike 99% of the time, then their chances of rolling 300 are 99% raised to the 12th power, or about 88.6%.  Of course, no one is that good.  Based on my simulations and the fact that top PBA bowlers average around 220, I’d estimate that PBA bowlers throw strikes about 60% of the time.  At that rate, they should expect to roll a perfect 300 about once every 459 games.  (Note:  I just looked it up, and the record for season strike percentage is 66.35%, which would mean a perfect 300 about once every 137 games, a big difference — but 66.35% is the record, not the typical PBA bowler.)  I estimate my strike percentage as closer to 20%, meaning I’d roll a 300 about once every 244 million games.  Ha — not gonna happen!
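Since the arithmetic is so simple, here it is as a few lines of Python (the strike percentages are the ones discussed above):

```python
# Odds of rolling a 300 game: all 12 strikes must land.
def perfect_game_odds(strike_pct):
    return strike_pct ** 12

for pct in (0.99, 0.6635, 0.60, 0.20):
    p = perfect_game_odds(pct)
    print(f"strike% = {pct:.2%}: P(300) = {p:.3g}, about 1 in {1 / p:,.0f}")
```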

Anyway, back to my simulation program.  I simulate a bowler using the following basic statistics:  for every 1000 first throws, how many times does the bowler get 10 pins down, 9 pins down, etc.  In addition, each bowler has a spare percentage — that is, of the times they don’t throw a strike, what are the chances they’ll convert the spare?  For simplicity’s sake, if they don’t convert the spare, I say that all lesser outcomes are equally likely; for example, if there are 2 pins remaining and their spare percentage is 70%, then they’ll knock down 1 of the remaining 2 pins 15% of the time, and neither of the pins 15% of the time.  I could of course make the simulation more realistic by making those numbers different, but that would make it more complicated, and I don’t think it affects the results much, especially for top bowlers, who rarely face more than 1 or 2 remaining pins and rarely miss spares.
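Here’s a minimal Python sketch of that per-frame model (the function names and structure are just illustrative, not my actual program):

```python
import random
from collections import Counter

def first_ball(pin_dist):
    """Sample the pins knocked down on the first ball.  pin_dist maps
    pins-down (0-10) to counts per 1000 first throws."""
    return random.choices(list(pin_dist), weights=list(pin_dist.values()))[0]

def second_ball(remaining, spare_pct):
    """Sample the second ball given `remaining` pins standing: convert the
    spare with probability spare_pct; otherwise every lesser outcome
    (0 .. remaining-1 pins) is equally likely, per the simplification above."""
    if random.random() < spare_pct:
        return remaining
    return random.randrange(remaining)

# Quick check of the 70% example above, with 2 pins remaining:
print(Counter(second_ball(2, 0.70) for _ in range(100_000)))
# roughly 70% of the time both pins, 15% one pin, 15% neither
```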

Ok, so with a little fiddling I was able to come up with a bowler that approximates a top PBA bowler.  If a bowler throws strikes 60% of the time, leaves 1 pin standing 25% of the time, leaves 2 pins standing 15% of the time, and has a spare percentage of 80%, then their average will be 223 (based on 100,000 runs of the simulation).  From what I found on the web, the single-season record for average is 228.04, so that’s pretty close.  I also looked up the single-season record for spare conversion percentage, which is 88.16%, so 80% might be typical of top PBA bowlers.
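To turn frames into an average you also need standard scoring (strike and spare bonuses, plus the extra balls in the tenth frame).  Here’s a self-contained sketch of how the whole simulation might look, with the PBA-like numbers above plugged in:

```python
import random

def roll_frame(pin_dist, spare_pct):
    """Return one fresh-rack frame as a list of pinfalls."""
    first = random.choices(list(pin_dist), weights=list(pin_dist.values()))[0]
    if first == 10:
        return [10]                                   # strike: one ball
    if random.random() < spare_pct:
        return [first, 10 - first]                    # spare converted
    return [first, random.randrange(10 - first)]      # uniform miss, as above

def simulate_game(pin_dist, spare_pct):
    """Roll ten frames (plus tenth-frame bonus balls) and score the game."""
    rolls = []
    for _ in range(9):
        rolls += roll_frame(pin_dist, spare_pct)
    tenth = roll_frame(pin_dist, spare_pct)
    rolls += tenth
    if sum(tenth) == 10:                  # strike or spare earns bonus balls
        bonus = 2 if tenth == [10] else 1
        while bonus > 0:
            extra = roll_frame(pin_dist, spare_pct)[:bonus]
            rolls += extra
            bonus -= len(extra)
    score, i = 0, 0
    for _ in range(10):                   # standard ten-pin scoring pass
        if rolls[i] == 10:                            # strike
            score, i = score + 10 + rolls[i + 1] + rolls[i + 2], i + 1
        elif rolls[i] + rolls[i + 1] == 10:           # spare
            score, i = score + 10 + rolls[i + 2], i + 2
        else:                                         # open frame
            score, i = score + rolls[i] + rolls[i + 1], i + 2
    return score

# 60% strikes, 25% leave 1 pin, 15% leave 2 pins, 80% spare conversion:
pba = {10: 600, 9: 250, 8: 150}
n = 100_000
print(sum(simulate_game(pba, 0.80) for _ in range(n)) / n)   # about 223
```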

As for me?  I estimate my average around 140 (my high is 187).  Someone with the following statistics would bowl an average of 142:  20% strikes, 20% leave 1 pin standing, 20% leave 2 pins standing, 10% leave 7 pins standing, 10% leave 6 pins standing, 10% leave 5 pins standing, 2% each for the five remaining pin counts (including gutter balls), and a spare percentage of 50%.  Of course there are other combinations that would result in an average of about 140, but that gives a rough idea of what my statistics might be (I don’t *think* I throw gutter balls on the first throw 2% of the time, but it made the numbers work out).

Also, what effect would it have on my average if I got better, improving my strike percentage to 25% and my spare percentage to 60%?  Then my average would go from 142 to 156.  The next time I go bowling (which is admittedly pretty rare), I’ll keep track of my actual stats and compare them with these numbers.

ERA Adjusted for the Number of Outs, Part 2

Wednesday, January 20th, 2010

I couldn’t help myself.  I decided to start writing a baseball simulation program.  Oh, there are already some programs available, some free, some for sale.  But it’s always more fun to do things on your own.  Plus, it was really quick and easy to do.

For the purposes of this question, I wrote a quick simulation program with just a few features.  Each batter has only the following properties:  walks per thousand plate appearances (PTPA), singles PTPA, doubles PTPA, triples PTPA, and home runs PTPA.  Everything else is an out (and an unproductive one at that).  As for baserunning, each player has the following properties:  how often they go from second to home on a single, how often they go from first to third on a single, and how often they go from first to home on a double.  Everything else is station-to-station baseball.  So, there’s a lot I don’t take into account — bunts, sacrifice flies, steals, double plays, errors, the list goes on and on.  But it’s a decent approximation for our purposes.

So, as an example, I created a team filled with hitters that approximate the 2009 San Francisco Giants:  67 walks PTPA, 165 singles PTPA, 47 doubles PTPA, 7 triples PTPA, and 21 home runs PTPA.  I filled in some guesses as to baserunning percentages.  Then I ran the simulation for 100,000 innings.  The result was an average of 3.82 runs per game, which is very close to the Giants’ actual RBI average of 3.78 per game (and under their runs per game of 4.05).  So that was validation to me that my simulation was fairly accurate.  I did the same thing with the 2009 Phillies, and it resulted in 4.96 runs per game, very close to their actual RBI average of 4.86 per game (and under their runs per game of 5.06).
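Here’s a sketch of how an inning simulator along these lines might look.  The hitting rates are the Giants-like numbers from above; the baserunning percentages below are placeholders (mine were guesses anyway), so the output won’t match my 3.82 exactly.  The start_outs parameter will come in handy in a moment:

```python
import random

# Per-1000-PA outcome rates (2009 Giants-like hitters, from above).
RATES = {"walk": 67, "single": 165, "double": 47, "triple": 7, "homer": 21}
RATES["out"] = 1000 - sum(RATES.values())   # everything else is an out

# Baserunning tendencies (placeholder guesses, not my original numbers).
SECOND_TO_HOME_ON_SINGLE = 0.60
FIRST_TO_THIRD_ON_SINGLE = 0.30
FIRST_TO_HOME_ON_DOUBLE = 0.45

def simulate_inning(start_outs=0):
    """Play one inning, returning runs scored.  start_outs models a
    pitcher who enters mid-inning."""
    outs, runs = start_outs, 0
    first = second = third = False
    while outs < 3:
        event = random.choices(list(RATES), weights=list(RATES.values()))[0]
        if event == "out":
            outs += 1
        elif event == "walk":                  # walks only force runners
            if first and second and third:
                runs += 1
            elif first and second:
                third = True
            elif first:
                second = True
            first = True
        elif event == "single":
            if third:
                runs, third = runs + 1, False
            if second:
                second = False
                if random.random() < SECOND_TO_HOME_ON_SINGLE:
                    runs += 1
                else:
                    third = True
            if first:
                if random.random() < FIRST_TO_THIRD_ON_SINGLE and not third:
                    third = True
                else:
                    second = True
            first = True
        elif event == "double":
            if third:
                runs, third = runs + 1, False
            if second:
                runs, second = runs + 1, False
            if first:
                first = False
                if random.random() < FIRST_TO_HOME_ON_DOUBLE:
                    runs += 1
                else:
                    third = True
            second = True
        elif event == "triple":
            runs += first + second + third     # everyone on base scores
            first = second = False
            third = True
        else:                                  # home run
            runs += 1 + first + second + third
            first = second = third = False
    return runs

n = 100_000
for start in (0, 1, 2):
    total = sum(simulate_inning(start) for _ in range(n))
    print(f"entering with {start} outs: {9 * total / n:.2f} runs per 9 innings")
```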

I then ran the same simulations but with each inning starting with 1 out, and then again where each inning started with 2 outs.  For the Giants, the opposing ERA went from 3.82 for normal innings to 3.10 for innings that started with 1 out, and 2.34 for innings that started with 2 outs.  For the Phillies, the numbers were 4.96, 4.15, and 3.26.

So, ERA for normal innings is about 1.2 times the ERA for innings starting with 1 out, and about 1.6 times the ERA for innings starting with 2 outs.

So, to calculate someone’s adjusted ERA, let’s define the following:

  • R = number of earned runs allowed
  • I0 = innings pitched after entering with 0 outs
  • I1 = innings pitched after entering with 1 out
  • I2 = innings pitched after entering with 2 outs
  • Note, for example, if a relief pitcher enters with 2 outs, then finishes that inning and then starts the next, then I2 = 1/3 and the remaining innings are attributed to I0.

Then the adjusted ERA = 9 * R / (I0 + I1/1.2 + I2/1.6)

So, what does this mean in practice?  Well, I am ashamed to say that I spent an hour of my life analyzing the 2009 statistics for Brandon Medders (a Giants reliever).  Why Medders?  I had to pick a reliever, and I just figured guys like Jeremy Affeldt and Brian Wilson mostly started innings.  So, Brandon Medders appeared 61 times, giving up 23 earned runs in 68 2/3 innings for an ERA of 3.01.  Of those innings, 57 2/3 innings were pitched after starting the inning (or entering with 0 outs), 8 2/3 were pitched after entering with 1 out, and 2 1/3 were pitched after entering with 2 outs.

So Brandon Medders’ adjusted ERA = 9 * 23 / (57.67 + 8.67/1.2 + 2.33/1.6) = 3.12
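In code form, the bookkeeping is trivial (the 1.2 and 1.6 divisors are the simulated ratios from above):

```python
def adjusted_era(earned_runs, i0, i1, i2):
    """Adjusted ERA: innings entered with 1 or 2 outs are discounted
    by the simulation-derived run-scoring ratios (1.2 and 1.6)."""
    return 9 * earned_runs / (i0 + i1 / 1.2 + i2 / 1.6)

# Brandon Medders, 2009: 23 earned runs over 68 2/3 innings.
print(adjusted_era(23, 57 + 2/3, 8 + 2/3, 2 + 1/3))   # about 3.12, vs. 3.01 raw
```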

So, just to summarize, his normal ERA was 3.01 and his adjusted ERA was 3.12.  Not a big difference, really.  If he had simply given up one more earned run, his ERA would have been 3.15.

So, in practice, it’s not much to worry about, but it was fun figuring it all out.  And now I have a baseball simulation program which I will probably refine and expand and use to answer other odd questions, like if a single player hits 30 home runs instead of 20, how many more wins does that translate into?

ERA Adjusted for the Number of Outs

Tuesday, January 12th, 2010

It occurred to me a few months ago that relief pitchers, everything else being equal, should have lower ERAs than starting pitchers.  This is because your ERA should be lower if you enter an inning with 1 or 2 outs (instead of starting the inning with no outs).  Now, in this day and age a lot of teams have a set-up man who will start the 8th inning, and the closer will start the 9th.  But, still, there are definitely many times when a relief pitcher will come into a game with 1 or 2 outs and men on base (or not).  Remember, though, that as far as his own ERA goes, it doesn’t matter whether there are men on base.  If he comes in with 3 men on base and 2 outs and gives up a grand slam, he’s only charged with 1 run.

To start with, I’ll do a relatively simple experiment that demonstrates that entering with 1 or 2 outs will lower your ERA.  For this demonstration I’ll simplify things.  Suppose a pitcher only walks or strikes out batters; there are no hits or stolen bases.  Simply put, the other team scores a run for every walk beyond the third in an inning (the fourth walk forces in a run, and so does each one after it).  Obviously a game is not this simple, but it’ll be fine for our demonstration purposes.

Suppose that a pitcher walks a batter once every x times.  Another way to put it is that for each batter, the walk probability is B = 1/x and the strikeout probability is K = 1 – B.  The question is, what’s the expected number of runs scored in each inning?  We’ll call this E.  Then:

E = 0 * probability(0 runs) + 1 * probability(1 run) + 2 * probability(2 runs) + …

E = 1 * probability(4 walks) + 2 * probability(5 walks) + …

To see where the coefficients come from:  an inning with exactly w walks ends on the third strikeout, so the last batter must strike out, and the w walks can fall anywhere among the first w + 2 batters.  That means probability(w walks) = ((w + 2) choose 2) * B^w * K^3, and w walks score w – 3 runs.  So:

E = 1 * (15B^4K^3) + 2 * (21B^5K^3) + … + r * ((r + 5) choose 2) * B^(r+3) * K^3 + …

E = ((K^3) / 2) * (sum for all r (from 1 to infinity) of (r * (r + 5) * (r + 4) * B^(r+3)))

Rather than trying some fancy math to simplify this, I used my Euclid calculator to calculate E for various values of B.

If B = 1/2, then E = approximately 0.9375 (for an ERA of 8.44).

If B = 2/5, then E = approximately 0.374 (for an ERA of 3.36).
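If you don’t happen to have a Euclid calculator handy, truncating the series in a few lines of Python gives the same numbers:

```python
def expected_runs(B, terms=200):
    """Truncated E = (K^3 / 2) * sum over r of r*(r+5)*(r+4)*B^(r+3)."""
    K = 1 - B
    s = sum(r * (r + 5) * (r + 4) * B ** (r + 3) for r in range(1, terms + 1))
    return (K ** 3 / 2) * s

for B in (1/2, 2/5):
    E = expected_runs(B)
    print(f"B = {B}: E = {E:.4f}, ERA = {9 * E:.2f}")
# B = 0.5: E = 0.9375, ERA = 8.44
# B = 0.4: E = 0.3738, ERA = 3.36
```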

Suppose that instead of entering with 0 outs, the pitcher enters with 2 outs.  Then the expected number of runs (F) in that 1/3 of an inning is:

F = 0 + 1 * (B^4K) + 2 * (B^5K) + … + r * (B^(r+3)K) + …

F = K*(sum for all r (from 1 to infinity) of r * B^(r+3))

If B = 1/2, then F = approximately 0.125.  Since F covers only a third of an inning, the corresponding ERA is 27 * F, or 3.375.

If B = 2/5, then F = approximately 0.043 (for an ERA of 1.152).
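This one doesn’t even need the calculator:  the sum of r * B^(r+3) is a standard geometric-style series equal to B^4 / (1 – B)^2, so F = K * B^4 / K^2 = B^4 / K.  A quick check:

```python
def expected_runs_two_outs(B):
    """Closed form for F = K * sum over r of r * B^(r+3) = B^4 / (1 - B)."""
    return B ** 4 / (1 - B)

for B in (1/2, 2/5):
    F = expected_runs_two_outs(B)
    print(f"B = {B}: F = {F:.4f}, ERA = {27 * F:.3f}")   # a third of an inning
# B = 0.5: F = 0.1250, ERA = 3.375
# B = 0.4: F = 0.0427, ERA = 1.152
```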

So, clearly, there’s a huge difference between entering with 0 outs and entering with 2 outs in this simplified example.

Of course, in a real game, the difference will probably not be as dramatic.  For example, a home run is a run whether there are 0 outs or 2 outs.  So if you did the same experiment I just did, but only allowed the pitcher strikeouts or home runs, then the ERA would be the same regardless of the number of outs.

I’m not sure how much difference there is in actual practice.  I am sure it is significant, but I don’t know how much.  A wild guess would be that if the average ERA is 4.50 when entering with 0 outs, then it might be 3.50 when entering with 1 out and 2.50 when entering with 2 outs.  But that’s just a wild, baseless guess.  If someone with more patience than me wanted to figure this out, they could simply scour the box scores of MLB games and crunch the data.  You’d probably have to go through a whole season to get statistically meaningful data.  This could also be done with some sort of simulation program to simulate a real baseball game.  That sounds like fun.  Maybe I’ll do that sometime.

Note:  I’ve only checked the above equations a couple times, so it’s entirely possible I may have made some errors, so feel free to point them out if you see any mistakes.