As I sat with my friend in the second deck of Dodger Stadium on September 4, 2017, we began to contemplate the impossible.
Heading into the bottom of the sixth inning, Robbie Ray had retired the first 15 batters he faced without allowing a baserunner. The Diamondbacks were up 2-0, J.D. Martinez having hit a two-run homer in the 4th.
Leading off the bottom of the 6th, Logan Forsythe would put a temporary end to our dream of seeing a historic baseball evening with a hard-hit single to center. Ray’s perfect game had suffered an early defeat.
Unlike most baseball nights, the dream of seeing history would later be resurrected by Kristopher Negron’s (remember him?) ninth-inning double to left, virtually guaranteeing Martinez would get his chance to hit his fourth home run of the night.
Unlike a perfect game, the four-home-run night builds: it is not until the third home run that you consider yourself a potential witness to history, whereas a perfect game is conspicuous from start to finish by the three consecutive zeros on the scoreboard.
With the scoreboard still showing those zeros in the fifth, I began to wonder: how rare is it to retire the first 15 batters of the game? And while the odds of a perfect game are easily found, surely if you’ve gone slightly more than halfway through a game without allowing a baserunner, your odds of finishing the night without one must be higher than the norm, right?
I couldn’t find this information anywhere, in spite of getting to Google Page #17 on a fairly specific search. As such, I decided to write a little computer program (a Python script, for you fellow nerds) which parsed retrosheet.org play-by-play data to figure out how rare the Robbie Ray 15-straight-to-start-the-game outing was, and to determine the odds of pitching a perfect game by the number of outs from the top of the first.
I took all the play-by-play files from retrosheet.org from 1930 onward and ran them through a parser. (Not all the games are complete, there may be small errors here and there which I didn’t account for, and I didn’t check for pitching changes.) The parser determined how many outs had been recorded when the first batter reached base. If, say, a walk happened with 2 outs in the first, I marked it as a “2”.
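As a rough illustration of that counting rule (a sketch only; the full script is at the end of the post), here is the idea applied to one team's list of plays. The function name `outs_before_first_baserunner` and the sample event strings are made up for the example. The first-letter test mirrors the one in the script: an event starting with H, T, D, S, W, or E counts as reaching base, `NP` rows are skipped, and everything else is treated as an out.

```python
# First letters of Retrosheet event codes treated as "batter reached base":
# H (home run / hit by pitch), T, D, S (triple, double, single), W (walk), E (error).
REACHED_PREFIXES = ("H", "T", "D", "S", "W", "E")

def outs_before_first_baserunner(events):
    """Count plays before the first batter reaches; every other play counts as an out."""
    outs = 0
    for event in events:
        if event.startswith(REACHED_PREFIXES):
            return outs
        if not event.startswith("NP"):  # "NP" (no play) rows mark substitutions, not plays
            outs += 1
    return outs

# Hypothetical sequence: groundout, strikeout, then a walk with two outs in the first.
print(outs_before_first_baserunner(["63", "K", "W", "8"]))  # 2
```

Anything the first-letter test misses (an intentional walk coded `IW`, for instance) would be counted as an out, which is one source of the small errors mentioned above.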
It turns out Robbie Ray’s 15 straight was a decently special event, occurring only 939 times in the 304,933 team play-by-plays analyzed (it’s an odd number since several bad play-by-play files were removed). If it were a batting average, seeing 15 in a row mowed down is akin to seeing a player with a .003 batting average get a hit. It happens about once every 162.5 games, so over a full season of watching one team you should expect to see it roughly once, from one dugout or the other.
However, to demonstrate the difficulty of pitching a perfect game, Ray only had a .019 chance of making history at that point. He had already pitched a game in the 99.7th percentile in terms of keeping runners off base from the start of the game, and he still had only a very small chance of making history.
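For anyone who wants to check the arithmetic, a minimal sketch using the two counts quoted above (the variable names are just for illustration):

```python
team_games = 304_933      # team play-by-plays analyzed
fifteen_straight = 939    # times the first 15 batters were retired

rate = fifteen_straight / team_games
team_games_per_event = team_games / fifteen_straight
games_per_event = team_games_per_event / 2   # every game has two teams pitching

print(round(rate, 4))                # 0.0031
print(round(team_games_per_event))   # 325
print(round(games_per_event, 1))     # 162.4
```

Halving the rounded 325 instead of the exact ratio gives the 162.5 quoted in the text.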
Statistically, a pitcher has about a 28% chance of getting through the first inning without allowing a baserunner, but he only increases his odds of pitching a perfect game to .0002.
The probabilities finally start looking better for a pitcher around the seventh inning. This makes sense: more perfect games are lost with 9 outs gone than with 8, and even with 18 outs gone than with 17, because of the pitcher’s slot in the National League (8 and 17 outs gone meaning two outs in the 3rd and 6th, respectively, with the ninth spot in the order due up). The pitcher will have gone through what should be the best part of the lineup three times and is on the home stretch.
The first time a pitcher has a better than 10% chance of a perfect game is with 21 outs gone, or through seven innings. From there, the odds begin increasing dramatically: 20.7% after 22, 29.5% after 23, 41.9% after 24, 51.4% after 25, and 62.1% after 26 outs. That’s right: 11 of the 29 times a pitcher got to 26 outs, batter number 27 reached base (including, remember, Yusmeiro Petit).
Though the sample size is very, very small, to the point where trying to divine meaning doesn’t make much sense, it’s still interesting that the 29 batters representing the final out of a perfect game have reached base a higher percentage of the time (.379) than the 304,933 leadoff batters (.345). Even more interesting, the biggest jump in historic probability occurs between outs 23 and 24. Batters have only broken up a perfect game 14 times in the 9th with zero or one out, compared to 11 times with two outs. In fact, the rarest situation is the perfect game broken up after one out in the 9th: the 26th batter, in the eighth spot in the order, has only made it on base 6 times in 35 possibilities, for a .171 OBP (well, an OBP which includes errors).
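Those late-inning percentages are just the 18 perfect games divided by the number of times a pitcher got that far. A small sketch, using the "Total occurrences" counts from the table below (the dictionary and function names are only for this example):

```python
# Times a team's pitching reached N outs without allowing a baserunner (from the table).
reached = {21: 125, 22: 87, 23: 61, 24: 43, 25: 35, 26: 29, 27: 18}
perfect_games = reached[27]

def chance_of_perfect_game(outs_recorded):
    """Share of bids still intact at this point that went on to become perfect games."""
    return perfect_games / reached[outs_recorded]

for n in range(21, 27):
    print(n, f"{chance_of_perfect_game(n):.1%}")
# 21 14.4%
# 22 20.7%
# 23 29.5%
# 24 41.9%
# 25 51.4%
# 26 62.1%
```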
If you see a pitcher retire the first three batters, that’s between a 1-in-3 and 1-in-4 occurrence. If he makes it through two innings, that’s approximately a 1-in-11. Retiring nine straight to start the game happens roughly once every 31 games – by that point, the pitcher has done better than 97.9% of all pitchers since 1930, in games without corrupted play-by-play data.
Another interesting statistic: going by the table, the on-base-with-error percentages are highest with 26 outs gone (.379), 2 (.355), 20 (.349), 3 (.347), and 0 (.345). Generally, with the exception of the number three hitter, the chance a batter will reach base in a given situation decreases as the game goes on, which is in line with what we’d expect with a pitcher capable of retiring 18 straight, until, of course, that final out.
Oh, and the Dennis Martinez 1991 perfecto is the only time in the play-by-play I looked at where both pitchers took perfectos into the sixth inning. Fortunately for Dennis, Mike Morgan gave up a no-out, sixth-inning hit to Tucson, Arizona native Ron Hassey, the only catcher in history to catch more than one perfect game.
So, your conclusion: enjoy the games where the pitcher gets into the fourth or fifth inning without allowing any baserunners. The fun’s likely to end soon, but you’ve still witnessed a fairly rare event. If a pitcher is perfect through seven, you’re unlikely to see a perfect game, but you’ve now got a 1-in-7 chance of witnessing history. And be nervous with two outs in the ninth, since the only person in the stadium who isn’t nervous is probably Eric Chavez.
I have included the table and the Python script below. (Sorry about the headers. An hour of CSS design didn’t do much.)
| Outs | Times ended | Total occurrences | Percent chance of perfect game | Percent chance of retiring X straight to start the game | Individual odds of having the game be broken up after this out | Percentile | Odds change between outs | OBP (with error) % by out | 1 in X games* |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 105159 | 304933 | .000 | 1.000 | .345 | .345 | .000 | .345 | 1 |
| 1 | 66685 | 199774 | .000 | .655 | .219 | .564 | .000 | .334 | 2 |
| 2 | 47259 | 133089 | .000 | .436 | .155 | .719 | .000 | .355 | 2 |
| 3 | 29771 | 85830 | .000 | .281 | .098 | .816 | .000 | .347 | 4 |
| 4 | 18385 | 56059 | .000 | .184 | .060 | .876 | .000 | .328 | 5 |
| 5 | 12020 | 37674 | .000 | .124 | .039 | .916 | .000 | .319 | 8 |
| 6 | 7902 | 25654 | .001 | .084 | .026 | .942 | .000 | .308 | 12 |
| 7 | 5202 | 17752 | .001 | .058 | .017 | .959 | .000 | .293 | 17 |
| 8 | 2760 | 12550 | .001 | .041 | .009 | .968 | .000 | .220 | 24 |
| 9 | 3246 | 9790 | .002 | .032 | .011 | .979 | .001 | .332 | 31 |
| 10 | 2185 | 6544 | .003 | .021 | .007 | .986 | .001 | .334 | 47 |
| 11 | 1476 | 4359 | .004 | .014 | .005 | .991 | .002 | .339 | 70 |
| 12 | 923 | 2883 | .006 | .009 | .003 | .994 | .003 | .320 | 106 |
| 13 | 604 | 1960 | .009 | .006 | .002 | .996 | .004 | .308 | 156 |
| 14 | 417 | 1356 | .013 | .004 | .001 | .997 | .006 | .308 | 225 |
| 15 | 261 | 939 | .019 | .003 | .001 | .998 | .007 | .278 | 325 |
| 16 | 190 | 678 | .027 | .002 | .001 | .998 | .010 | .280 | 450 |
| 17 | 107 | 488 | .037 | .002 | .000 | .999 | .010 | .219 | 625 |
| 18 | 116 | 381 | .047 | .001 | .000 | .999 | .021 | .304 | 800 |
| 19 | 73 | 265 | .068 | .001 | .000 | .999 | .026 | .275 | 1151 |
| 20 | 67 | 192 | .094 | .001 | .000 | 1.000 | .050 | .349 | 1588 |
| 21 | 38 | 125 | .144 | .000 | .000 | 1.000 | .063 | .304 | 2439 |
| 22 | 26 | 87 | .207 | .000 | .000 | 1.000 | .088 | .299 | 3505 |
| 23 | 18 | 61 | .295 | .000 | .000 | 1.000 | .124 | .295 | 4999 |
| 24 | 8 | 43 | .419 | .000 | .000 | 1.000 | .096 | .186 | 7091 |
| 25 | 6 | 35 | .514 | .000 | .000 | 1.000 | .106 | .171 | 8712 |
| 26 | 11 | 29 | .621 | .000 | .000 | 1.000 | .379 | .379 | 10515 |
| 27 | 18 | 18 | 1.000 | .000 | .000 | 1.000 | -1.000 | 1.000 | 16941 |
*divide by two to get the actual number, since we “count” each game twice: once for each team.
And the Python script:
#© 2018 John Holden
#You are free to use and distribute this script as long as you do not charge and credit is given to the author
#Script to determine, from Retrosheet play-by-play, how many outs were recorded before each team's first baserunner
import csv
import os

#printMe function -
#instead of printing all files to the console, just print a couple games to check to see it's working properly
#otherwise it gets super unwieldy super quickly
#also check to see what games don't catch and give you >27 outs
def printMe(printString):
    #activegame is an integer, so the first test never fires as written; widen the range to dump specific games
    if (activegame < 17 and activegame > 16) or firstout1 > 27 or firstout2 > 27:
        print(printString)

#define which variables we will be using
activegame = 0
onbase1 = 0 #flag for the away team to see if anyone has been on base
onbase2 = 0 #flag for the home team to see if anyone has been on base
firstout1 = 0 #variable to store how many outs there were when person reached base
firstout2 = 0
ider = '' #for display purposes - gets the ID of the game
outs = {} # dictionary to track the number of times a perfecto was broken up at a specific number of outs
#first letters of Retrosheet event codes counted as reaching base (home run/hit by pitch, triple, double, single, walk, error)
reached = ("H", "T", "D", "S", "W", "E")

for csvFilename in os.listdir('PerfectGame'):
    if csvFilename.endswith('.EVN') or csvFilename.endswith('.EVA'):
        #text mode with newline='' is what the csv module expects in Python 3 (the original 'rb' mode is Python 2 style)
        with open(os.path.join('PerfectGame', csvFilename), newline='') as csvfile:
            pbp = csv.reader(csvfile, delimiter=',', quotechar='"')
            for row in pbp:
                if row[0] == "id":
                    printMe("new game")
                    if activegame > 0:
                        outs[firstout1] = outs.get(firstout1, 0) + 1
                        outs[firstout2] = outs.get(firstout2, 0) + 1
                        if firstout1 > 25 or firstout2 > 25:
                            #print all the games that went into the ninth inning to make sure this is working properly
                            print(ider + " " + str(firstout1) + " " + str(firstout2))
                    #we have a new game going so reset all the variables
                    firstout1 = 0
                    firstout2 = 0
                    onbase1 = 0
                    onbase2 = 0
                    activegame = activegame + 1
                    ider = row[1]
                if row[0] == "play":
                    pbpres = row[6]
                    printMe(row)
                    #checks to see if any of the game data from Retrosheet matches getting on base
                    if onbase1 == 0 and row[2] == '0':
                        if pbpres.startswith(reached):
                            onbase1 = 1
                            printMe("game1 killed " + str(firstout1))
                        elif pbpres[:2] != "NP":
                            firstout1 = firstout1 + 1
                    #checks for home team - this could have been looped
                    if onbase2 == 0 and row[2] == '1':
                        if pbpres.startswith(reached):
                            onbase2 = 1
                            printMe("game2 killed " + str(firstout2))
                        elif pbpres[:2] != "NP":
                            firstout2 = firstout2 + 1

#games are only tallied when the next "id" row shows up, so tally the last game read here
if activegame > 0:
    outs[firstout1] = outs.get(firstout1, 0) + 1
    outs[firstout2] = outs.get(firstout2, 0) + 1
print(outs)
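One way the rest of the table can be rebuilt from nothing but the "Times ended" counts is sketched below. This is an illustration, not necessarily how the table above was produced; the dictionary keys are names chosen for the example, and the counts are copied from the table rather than from a fresh run of the script.

```python
# "Times ended" column from the table: index = outs recorded before the first baserunner
# (index 27 = nobody reached through 27 outs).
ended = [105159, 66685, 47259, 29771, 18385, 12020, 7902, 5202, 2760, 3246,
         2185, 1476, 923, 604, 417, 261, 190, 107, 116, 73,
         67, 38, 26, 18, 8, 6, 11, 18]

total = sum(ended)
perfect = ended[27]

rows = []
still_alive = total  # bids still intact with n outs recorded
for n, ended_here in enumerate(ended):
    rows.append({
        "outs": n,
        "total_occurrences": still_alive,
        "chance_of_perfect_game": perfect / still_alive,
        "chance_of_reaching": still_alive / total,
        "next_batter_reach_rate": ended_here / still_alive,  # the "OBP (with error)" column
        "one_in_x_team_games": total / still_alive,
    })
    still_alive -= ended_here

r = rows[15]
print(r["total_occurrences"], round(r["chance_of_perfect_game"], 3),
      round(r["next_batter_reach_rate"], 3), round(r["one_in_x_team_games"]))
# 939 0.019 0.278 325
```

Everything flows from a running subtraction: each row's "Total occurrences" is the previous row's total minus the bids that ended there.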