5.6 billion Aussie rules games, simulated on an off-the-shelf laptop.
In the work on leverage I’ve done over the past couple of months, I figured there’s a fixed winning percentage given the minute, the margin, and a team’s initial (estimated) winning percentage. I also figured those percentages could easily be estimated through simulation.
I simulated 10,000 games for each combination of minute, margin (within 12 goals), and initial winning percentage to get the estimated winning percentage at that point, building a master table of winning percentages anyone can use. While computers are amazing, lookup tables still definitely have their uses.
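In outline, the job is just three nested loops writing one CSV row per cell. Here’s a minimal sketch — the grid bounds and column names are my assumptions rather than the exact script (which is linked at the end of the post), and estimateWinPct() is sketched under Methodology below:

```php
<?php
// Sketch of the master-table job: one CSV row per (minute, margin,
// initial percentage) cell, 10,000 simulated games per cell.
// Grid bounds and column names are assumptions, not the real script.
$out = fopen('master_table.csv', 'w');
fputcsv($out, ['minute', 'margin', 'initial_pct', 'estimated_pct']);

foreach (range(1, 80) as $minute) {              // assuming an 80-minute clock
    foreach (range(-72, 72) as $margin) {        // within 12 goals either way
        foreach (range(1, 99) as $initialPct) {  // favourite's pre-game chance
            fputcsv($out, [$minute, $margin, $initialPct,
                           estimateWinPct($minute, $margin, $initialPct)]);
        }
    }
}
fclose($out);
```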
Methodology
The simulator made some basic assumptions, based on 2019 data scraped from afltables.com: there was a 52.4% chance of a scoring shot in any given minute, and a 6.8% chance of a second scoring shot in the same minute. I did not simulate a third scoring shot, which is possible but very rare; I only found two or three instances of it in the data set. I didn’t check whether a string of behinds generated those event chains, simply choosing to ignore the possibility completely.
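A per-minute loop is enough to capture that. Here’s a minimal sketch of one cell’s estimate: the 52.4% and 6.8% figures come from the data above, while the goal/behind split (53%) and the favourite’s per-shot share are stand-ins I’ve assumed for illustration, not the calibrated values (those are discussed next):

```php
<?php
// Estimate one cell of the table: play out the rest of the game
// 10,000 times from the given minute and margin.
function estimateWinPct(int $minute, int $margin, int $initialPct): float
{
    $wins = 0;
    for ($i = 0; $i < 10000; $i++) {
        $wins += simulateRest($minute, $margin, favShotShare($initialPct));
    }
    return $wins / 10000;
}

function simulateRest(int $minute, int $margin, float $favShare): int
{
    $goalProb = 0.53; // assumed share of scoring shots that are goals

    for ($m = $minute; $m <= 80; $m++) {         // assuming an 80-minute clock
        $shots = (mt_rand() / mt_getrandmax() < 0.524) ? 1 : 0;
        if ($shots === 1 && mt_rand() / mt_getrandmax() < 0.068) {
            $shots = 2; // second scoring shot; a third is ignored entirely
        }
        for ($s = 0; $s < $shots; $s++) {
            $points  = (mt_rand() / mt_getrandmax() < $goalProb) ? 6 : 1;
            $margin += (mt_rand() / mt_getrandmax() < $favShare) ? $points : -$points;
        }
    }
    return $margin > 0 ? 1 : 0; // favourite wins; a draw counts as a loss here
}

// Crude illustrative stand-in: the real mapping from pre-game percentage
// to scoring strength goes through expected goals, described below.
function favShotShare(int $initialPct): float
{
    return 0.5 + ($initialPct - 50) / 400;
}
```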
The simulator also translated the initial winning percentage into expected goals using a best-fit line from Squiggle AFL spreads. Unfortunately, this did not work well as the winning percentage increased, because I had less data for predicted blowouts, so I ended up using trial and error to figure out which values corresponded with the initial winning percentage. (If the initial winning percentage is 98%, the favourite should win roughly 9,800 out of every 10,000 simulated games, just as a team with a 51% chance to win should win roughly 5,100.)
I ended up using a fourth-order polynomial to generate expected goals for percentages greater than 82%, and the simulated results matched the initial win percentage very closely. An 80% win percentage with a tied score at the start of the game averaged out to 79.9% over 10,000,000 simulations (run as batches of 10,000), with a standard deviation of only 41 wins per batch, and no batch more than 1.5% away from 80%. Results were similar at the 90% mark, though skewed slightly above 90% and with a slightly lower standard deviation. Because of the margin of error involved, I would not use this data set to bet on any game below a certain threshold.
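The shape of that piecewise mapping is below; every coefficient is a deliberate placeholder rather than my fitted values, which live in the repository linked at the end of the post:

```php
<?php
// Shape of the percentage-to-expected-goals mapping: a best-fit line up
// to 82%, a fourth-order polynomial beyond it. All coefficients below
// are placeholders, not the fitted values.
const LINE_SLOPE     = 0.0;                       // placeholder
const LINE_INTERCEPT = 0.0;                       // placeholder
const POLY           = [0.0, 0.0, 0.0, 0.0, 0.0]; // c4..c0, placeholders

function expectedGoals(float $winPct): float // $winPct on a 0-100 scale
{
    if ($winPct <= 82.0) {
        return LINE_SLOPE * $winPct + LINE_INTERCEPT;
    }
    [$c4, $c3, $c2, $c1, $c0] = POLY;
    // Horner's rule for c4*x^4 + c3*x^3 + c2*x^2 + c1*x + c0
    return ((($c4 * $winPct + $c3) * $winPct + $c2) * $winPct + $c1) * $winPct + $c0;
}
```

In the simulation sketch above, this expected-goals figure is what a real favShotShare() would be calibrated against.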
I also wrote the simulator in PHP.
In 2012, I was tasked with taking a large data set from a website, parsing it for updates, and then emailing a spreadsheet to my boss, who would present it at a daily 8:30am meeting. I wrote a PHP script which worked quickly and wonderfully.
We hired someone with Silicon Valley credentials who asked what I was doing and why I wasn’t using Python to run the script. I told him it was because I didn’t know Python that well yet (true) and that PHP was an order of magnitude faster (also true). He ended up taking the project off my hands and rewriting it in Python, even though the rewrite conveyed no benefit whatsoever, apart, apparently, from the fact that it was in Python.
So I ended up writing this simulator in PHP as well. After letting it run overnight, it had almost finished; by my timing, Python would have taken several days. I then fed the finished data set into Python and graphed it.
Finally, I ended up simulating margins which could only be considered theoretical, such as being down by 12 goals in the first minute. I thought the information would be valuable in determining initial spreads given a win percentage: how many goals down does the favourite have to be to have a 50-50 chance of winning the match?
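The finished table answers that directly: at minute one, scan for the deficit whose estimated winning percentage sits closest to 50%. A sketch, assuming the CSV layout from earlier:

```php
<?php
// Scan the finished table for the first-minute deficit at which the
// favourite's estimated chance is closest to 50%. Assumes the
// master_table.csv layout from the earlier sketch.
function breakEvenMargin(string $csv, int $initialPct): ?int
{
    $best = null;
    $bestDist = INF;
    $fh = fopen($csv, 'r');
    fgetcsv($fh); // skip the header row
    while (($row = fgetcsv($fh)) !== false) {
        [$minute, $margin, $pct, $estimated] = $row;
        if ((int)$minute === 1 && (int)$pct === $initialPct && (int)$margin < 0) {
            $dist = abs((float)$estimated - 0.5);
            if ($dist < $bestDist) {
                $bestDist = $dist;
                $best = (int)$margin;
            }
        }
    }
    fclose($fh);
    return $best; // in points, e.g. -30 would mean five goals down
}

echo breakEvenMargin('master_table.csv', 85);
```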
Findings
Because the data is four-dimensional, it’s obviously not the easiest to graph as an entire data set, but if you hold one or two of the values constant, there are wonderful graphs to be made.
For instance, holding the margin constant at zero shows that the longer the underdog can keep the game tied, the less likely the favourite is to win. While that’s an obvious conclusion, what’s interesting to me is just how even the game becomes if it’s still tied in the last ten minutes. Also note the noise.
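If you want to pull that slice out of the table yourself, it’s a one-pass filter. A sketch, again assuming the CSV layout from earlier (the 80% favourite is an arbitrary choice):

```php
<?php
// Print the margin-zero slice: the favourite's estimated chance by
// minute while the game stays level. Same assumed layout as above.
$fh = fopen('master_table.csv', 'r');
fgetcsv($fh); // skip the header row
while (($row = fgetcsv($fh)) !== false) {
    [$minute, $margin, $pct, $estimated] = $row;
    if ((int)$margin === 0 && (int)$pct === 80) {
        echo "$minute,$estimated\n"; // minute,chance pairs, ready to graph
    }
}
fclose($fh);
```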
I found the initial winning percentages interesting as well. The simulator puts the spread for a team with an 85% initial winning percentage at 30 points, meaning it should win by five goals or more about 50% of the time. At 90%, the spread should be six goals, and by 99% it increases dramatically to a 12-goal initial spread. The 99% team would also still be favoured to win if it were down by five goals or fewer at halftime. Unfortunately, I don’t have betting data to check whether this holds true over time.
These are just a couple of the stories you can tell from the simulated data set – if you come up with any more on your own, I’d love to read about them.
Download
I’ve put both the PHP code and the final table up at https://github.com/johnpholden/afl-leverage-simulator. I just noticed the “home team” should be labeled “favourite” and the “away team” should be labeled “underdog” in the table.
If you end up using the table or code, or performing any sort of statistical analysis on this, please let me know.