AFL: Using leverage as a proxy for excitement

The historically boring 2019 grand final – and the analytical press demonstrating that, historically, most grand finals haven’t been the most exciting games – led me to wonder whether games could be ranked by how exciting they were.

The last few minutes of a close game are one of the many appeals of Aussie rules. Among sports which use a clock to keep time, the back-and-forth nature of a close AFL game makes for incredibly exciting finishes. Tracking the “excitement” of a game, then, could come down to measuring the game’s leverage.

I’m approaching the concept of leverage from the writings on kenpom.com – specifically Nic Reiner’s posts, now several years old, about tense NCAA tournament games and unbeaten teams. In the footy context, leverage measures how much is at stake in a given minute, defined here as: how much would the win probability change if either team kicked a goal in that specific minute?

To that end, I decided to write a simulation which would generate both a running winning percentage and a running leverage percentage.

A very similar calculation has already been done by Aflalytics in measuring the game’s tension. The findings from the simulator I wrote and the equation used over on that site end up matching quite closely.

I also want to highlight two things. First, I’m currently writing a footy simulator, Australian Football Coach. Second, in order to create more game data for grand finals, I recently captured the scoring plays from the 1990 and 1999 grand finals and sent them in to AFL Tables. If you have raw video of old AFL games (before 2001), consider helping add to the database of games we have full scoring summaries for – I’ll write a further blog post about capturing this data.

Collecting the data

Before the simulator could be written, I needed game data. I ended up writing a Python script which doesn’t automatically scrape afltables.com; instead, it takes the scoring table AFL Tables generates for each game – copied from the site and pasted into a text file – and turns that raw data into a format the simulator can use. More likely than not I simply haven’t stumbled upon a proper open database, but the script still has its uses. I’ll host the script on git if it’s of interest to anyone – please get in touch on Twitter (@thejohnholden) or through the Contacts page and I’ll put in the work to make it reusable.

Creating the simulator

While I fully admit a simulation pales in elegance to a proper equation, like the one used over at Aflalytics, I love writing simulators. My footy simulator is a collection of many different micro-simulations, and I’ve also done work writing quick-running AI simulations for some of the New Star games, especially the old and now hard to find New Star Tennis.

I needed the simulation’s inputs to be as simple as possible: a match file telling the simulator when points were scored, how, and by which team, plus an input winning percentage. The match events were generated by the script above; the winning percentages were taken from Squiggle AFL. I only ran 11 games to start – the 2017, 2018, and 2019 grand finals, and the remainder of the 2019 finals series – partially because the simulation’s a bit slow, and partially because I got bored manually collecting game data. There was a simulator to write!

The simulation worked similarly to the simulation I wrote to check the winning percentage of the GWS-Collingwood game. For each minute of the game, the simulation checks whether any scoring shots were kicked in real life, adds them to the score, and then runs three different simulations 10,000 times each: one using the current score, one where the winning team kicked an additional goal that minute, and one where the losing team kicked an additional goal that minute. To calculate leverage, I took the absolute difference between the winning percentages of the two goal-added simulations.

To simulate the games, I estimated the chance a scoring shot would be kicked in a given minute at approximately 50%. I then determined whether the scoring shot was a goal or a behind using the ratio of goals to behinds from the 2019 season – a 23-in-40 chance of a goal. Finally, I used the initial winning percentage to decide whether the winning or losing team kicked the scoring shot.
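Putting those pieces together, the per-minute loop and the leverage calculation can be sketched roughly like this – a minimal reconstruction, not the actual simulator; the function names, the margin-based bookkeeping, and the even 50/50 default shot share are mine:

```python
import random

# Stand-in for the full match simulator: plays out the remaining minutes
# from a given margin (positive = home in front) and reports whether the
# home side holds on. Each minute has a ~50% chance of a scoring shot,
# which is a goal 23 times in 40 (the 2019 ratio), kicked by the home
# side according to its share of scoring shots.
def home_wins_from(margin, minutes_left, home_shot_share=0.5):
    for _ in range(minutes_left):
        if random.random() < 0.5:
            points = 6 if random.random() < 23 / 40 else 1
            margin += points if random.random() < home_shot_share else -points
    return margin > 0

def win_prob(margin, minutes_left, runs=10_000):
    wins = sum(home_wins_from(margin, minutes_left) for _ in range(runs))
    return wins / runs

# Leverage for one minute: the gap between the "home kicks a goal right
# now" future and the "away kicks a goal right now" future.
def leverage(margin, minutes_left, runs=10_000):
    return abs(win_prob(margin + 6, minutes_left, runs)
               - win_prob(margin - 6, minutes_left, runs))
```

A goal in a tied game with two minutes left swings the result far more than the same goal with an hour still to play, which matches the intuition that late goals in close games carry the most weight.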

The latter part proved slightly difficult, as I needed to calculate how likely a team with a given win ratio would be to kick any given scoring shot. My initial calculations were incorrect, and I ended up having to graph the expected winning percentage against the game-generated winning percentage in LibreOffice Calc – it turns out the y-intercept is 37.9%, meaning a team with a 1% initial winning percentage will still generate 38% of the scoring shots in the simulation. Fortunately, it now simulates out properly. Perhaps the biggest issue with the simulation is that it cannot produce multiple scoring shots in a minute; eyeballing the raw data, this happens about once a game, so I’m not too concerned with the assumption.
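A hedged reconstruction of that calibration: the exact fit isn’t published, so this assumes a straight line through the 37.9% intercept and the obvious point where a 50% team takes half the shots.

```python
# Maps a team's pre-game win probability to the share of scoring shots it
# receives in the simulation. Assumed linear: a 50% team takes half the
# shots, and the y-intercept sits at 37.9%, so even a 1% team still
# generates roughly 38% of the scoring shots.
def shot_share(win_probability):
    slope = (0.5 - 0.379) / 0.5
    return 0.379 + slope * win_probability

shot_share(0.50)   # 0.5 – even teams split the shots evenly
shot_share(0.01)   # ≈ 0.381 – the "38%" figure above
```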

The results

Of our 11-game set, the most exciting game was the 2019 elimination final between Brisbane and GWS. The Giants had only a 40% win chance and ended up winning by three points after kicking the winning goal with five minutes left in the match. The Giants jumped out to a 24-0 lead by kicking the first four goals, but wound up behind 32-26 at quarter time. Despite that swing, the most interesting first quarter in the data set by average leverage wound up being the GWS-Bulldogs game in the first round of finals – only five goals were kicked, but with one exception the teams traded goals before GWS broke the game open in the third quarter. Second overall was the 2018 grand final, in part because the Brisbane-GWS game was much more back-and-forth: as Magpies fans are all too well aware, Collingwood jumped out to a healthy lead early in the 2018 decider, which dragged its leverage down.

The least interesting game actually wasn’t the 2019 grand final, even though that game had by far the most boring second half of any game in the data set. Because GWS kept the first quarter of the grand final relatively close, the West Coast-Essendon blowout actually had a lower leverage score, mostly because the Eagles jumped on the Bombers faster than the Tigers pulled away. Five of the 11 games had an average fourth quarter leverage of less than 1% – the 2017 and 2019 grand finals, the Richmond-Brisbane first round final, GWS-Bulldogs as mentioned above, and West Coast-Essendon.

Here are three examples of different types of games: the very close 2018 grand final, the historic blowout of the 2019 grand final, and the most average game of the 11, the Collingwood-Geelong game in which neither team scored for roughly 20 minutes. The red line is the leverage, the dotted line is the estimated winning percentage, and the x-axis is minutes since the start of the game.

Finally, here’s a table of all 11 simulated games, ranked by average quarter leverage and showing the average leverage within each quarter. Note the 0% Q4 leverage for the 2019 grand final – by that point, not even a hypothetical goal to either team would have moved the winning percentage:

Game               Avg Qtr Leverage   Q1       Q2       Q3       Q4
2019 EF BRI-GWS    25.02%             16.00%   18.92%   20.98%   44.18%
2018 GF            22.76%             12.35%   14.06%   22.79%   41.84%
2019 PF GWS-COL    20.45%             14.41%   18.01%   18.48%   30.89%
2019 PF RIC-GEE    19.53%             15.53%   17.82%   23.58%   21.17%
2019 EF GEE-WCE    15.36%             11.09%    8.24%   24.55%   17.56%
2019 QF COL-GEE    12.93%             15.99%   12.21%   14.53%    9.00%
2019 EF GWS-WB     12.12%             16.40%   15.87%   16.03%    0.18%
2017 GF            11.68%             15.91%   18.31%   11.76%    0.75%
2019 QF RIC-BRI    11.13%             16.39%   19.17%    8.40%    0.55%
2019 GF             5.97%             14.88%    8.28%    0.73%    0.00%
2019 EF WCE-ESS     5.02%             12.13%    3.86%    3.87%    0.21%

Next Steps

I’d like to generate leverage outcomes for more games, but I’m looking for winning percentages over a longer period of time (Squiggle only goes back to 2017). I also need to collect a larger data set of games to run. Ultimately I want to use the simulator to determine which players kick goals at the most opportune times (though this data set may already exist somewhere) and write a piece testing my hypothesis that grand finals tend not to be the most exciting games. If you have any other ideas about how this leverage simulator can be used, please get in touch.

MLB: The art of expected bases

I don’t quite remember which player statistics the Hillsboro Hops had up on their right field scoreboard during a Northwest League minor league game. One, for sure, was batting average; another was OPS.

OPS, of course, stands for On-base Plus Slugging: the sum of a player’s on-base percentage (how often the player reaches base safely) and slugging percentage (how many total bases the player averages per at bat).

My problem in that moment? I hate OPS. The number on the scoreboard had functionally no use to me, other than that the third and final stat – the one I don’t remember – was either slugging percentage or on-base percentage, so I could perform mental subtraction. In my time working in baseball as a statistics stringer, I’ve gravitated towards observational statistics over analytical statistics. This isn’t an argument against analytical statistics – as a data analyst, I think they’re very important – but they’re difficult to digest in the moment.

Sitting at the ballpark, I thought to myself – what I really want to know isn’t OPS, but rather expected bases. In any given plate appearance, how far down the base line would you expect the player to get?

The formula came to mind easily: (total bases + walks + hit by pitches) / (at bats + walks + hit by pitches) – functionally, slugging percentage, but with walks and hit by pitches added to both the numerator and the denominator.
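As code, the metric is a one-liner; the function name and the sample stat line below are mine, chosen only for illustration.

```python
def expected_bases(total_bases, walks, hbp, at_bats):
    """Expected bases per near-plate appearance: slugging percentage with
    walks and hit-by-pitches folded into both numerator and denominator."""
    return (total_bases + walks + hbp) / (at_bats + walks + hbp)

# A hypothetical stat line: 250 total bases, 80 walks, 5 HBP, 500 at bats.
expected_bases(250, 80, 5, 500)   # ≈ .573 bases per near-plate appearance

# Edge cases: a player who only ever walks sits at 1.000 (his OBP), and a
# player who only ever homers sits at 4.000 (his slugging).
expected_bases(0, 10, 0, 0)       # 1.000
expected_bases(40, 0, 0, 10)      # 4.000
```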

What excited me is that this number actually has meaning – instead of adding one number to another, the formula represents how many bases you could expect a player to gain in a given plate appearance. Calculating it for the National League, Christian Yelich far and away had the best mark of the 2019 season with .721 expected bases per near-plate appearance, whereas Miami’s Lewis Brinson was the outlier at the other end with only .282. (The NL average was .502 expected bases, or about half a base every near-plate appearance.)

What about sacrifices? I’ve excluded those, because in a sacrifice, it could be argued a player isn’t actually trying to reach base – instead, they’re trying to move another runner along at their own expense. As a result, sacrifices are treated as null, hence my near-plate appearance term used above.

I’ve looked around on the internet and haven’t found the formula anywhere, though I’d be very surprised if this were a novel idea. The concept isn’t perfect – a fraction of a base is admittedly an odd unit – but being able to visualise the fact that Christian Yelich gains an extra .219 bases per near-plate appearance over the league average means that, across a game, you could expect Yelich to reach one extra base more than the average player, which – let’s be honest – is incredible. That being said, reaching base means different things for different players. Ronald Acuña, for instance, scores 48% of the time he reaches base, in part due to his base-stealing ability, while the Reds’ Eugenio Suarez is 9th in expected bases but scores only 36% of the time he reaches base. I’m sure someone has looked at these scoring stats and how much they vary by player over seasons, but that’s a tangent – a player only has so much control over what happens after they leave the plate, and a metric which measures how many bases they would be expected to achieve purely as a result of their plate appearance makes logical sense to me.

Also, while a single may be a better result than a walk (since it allows baserunners to move more than one base and isn’t reliant on forces), I’m happy to let other metrics measure those outcomes. One of the oddities of OPS is that on-base percentage sort of gets lost, but with expected bases, walks and HBP are treated the same as a single, since the player achieved the same result, albeit in different ways: gaining first base. If a player walks in all ten of his plate appearances, his expected bases will be 1.000, matching his OBP; if he instead hits ten home runs, his expected bases will match his slugging at 4.000.

The ranking actually matches OPS fairly closely, with an r² of .96, which makes sense – OPS isn’t a terrible metric. However, there were some stark differences. The biggest gainers were players such as Derek Dietrich, Hunter Renfroe, and, at the bottom end of the rankings, Austin Riley, who all have an exceptionally high number of extra base hits. Players who dropped compared to their OPS include Howie Kendrick, who rarely walks and hits a lot of singles, and Bryan Reynolds. Pittsburgh’s Kevin Newman had the biggest overall drop: OPS ranked him 59th, but he finished only 91st in expected bases. Newman does not walk all that often and, with 75% of his hits going for singles, has a very low percentage of extra base hits. Interestingly, Ronald Acuña did not drop much in the rankings despite not really being an extra base hitter. Adding walks into the equation will hurt a player who typically hits doubles, but this could also be considered a feature of the metric – if someone hits triples consistently, a triple has a much greater value than a walk.

I’ve been thinking about this for a few months and I like it a lot. As I’ve said above, I’m sure someone else has thought of this somewhere, but I strongly prefer it as an alternative to OPS, especially because it’s a statistic which can be easily visualized.

AFL: It’s only a 50-50 game in the last couple minutes if it’s tied

While bathed in controversy, the GWS-Collingwood 2019 preliminary final was a clear case of “next goal wins.” The Pies had the last nine scoring shots, kicking 4.5 to 0.0 after the 5:30 mark of the 4th quarter. From 27:57 to the sound of the siren at 34:57, GWS held a 4-point lead.

Even with Collingwood’s momentum, predictive models had GWS favoured to win throughout the 4th quarter, even at a one-goal margin (and momentum is statistically overrated anyway). This was a curious problem for me, as the expectation would be that the match was 50-50 late on – especially after two late behinds meant another Collingwood goal would have given them the lead – but the computers always gave GWS at worst a 60% chance of winning. With some fairly simple simulation, I wanted to determine whether this was accurate: what chance does a team four points down have of winning the game at a specific time?

Four points is an interesting question, since it more or less removes the chance a behind would be scored that would influence the outcome of the game. Let’s look at some assumptions:
– 4739 goals were kicked in the first 206 games of the 2019 AFL season, or about 23 a game, nearly on the number.
– 3513 behinds were kicked, about 17 a game.
– At 23 goals/game, the odds a goal will be kicked in a given second are 23/4800 (the number of seconds in a match), or about 0.48% – roughly 1 goal every 3.5 minutes. Of course, the odds a goal will be scored in a given second vary wildly based on the pitch location of the ball, but this is a nice average for simulation purposes. (If a goal is scored in one second, the next second will have a 0% chance of producing a goal, but this was not built in.)
– Only one goal can be scored in a given second – an easy assumption, more of a rule, perhaps.

Assumptions in hand, let’s now take a single pinpoint of a match: the 17:00 mark of the 4th quarter, ignoring time on (TV time counting down from 3:00). The away team has a four point lead, and the teams are exactly equal in strength. Running our simulation 100,000 times shows the away team will win the game 74.4% of the time, with the home team kicking the goal(s) they need to win 25.6% of the time. In 40.7% of games, no goal is kicked by either team. If my math is right – away team win probability when a goal is kicked = (74.4% − 40.7%) / (100% − 40.7%) – that means in games where at least one goal is kicked, the team with the lead still wins 56.8% of the time.
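A minimal version of that simulation, under the assumptions listed above (the per-second structure and names are mine; behinds are ignored):

```python
import random

P_GOAL_PER_SECOND = 23 / 4800   # ~23 goals per 4800-second game

# Away team leads by 4; tick down the remaining seconds and see whether
# the lead survives. A 4-point lead shifted by multiples of 6 can never
# land on zero, so ties don't arise in this goals-only model.
def away_holds_lead(seconds_left, home_goal_share=0.5):
    margin = 4   # away team's lead
    for _ in range(seconds_left):
        if random.random() < P_GOAL_PER_SECOND:
            margin += -6 if random.random() < home_goal_share else 6
    return margin > 0

def away_win_rate(seconds_left, home_goal_share=0.5, runs=100_000):
    wins = sum(away_holds_lead(seconds_left, home_goal_share) for _ in range(runs))
    return wins / runs
```

Running `away_win_rate(180)` lands in the low-to-mid 70s percent-wise, in line with the 74.4% figure; passing `home_goal_share=0.57` pulls it down toward 70%.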

But what if the teams aren’t equal? The line on the Collingwood-GWS game was Collingwood -21.5, or about three and a half goals. That means Collingwood’s rough expected goals using the average goals/game calculated above would be approximately 13.25 to 9.75, or a 79.5 to 58.5 win (ignoring behinds). This would weight Collingwood as having a 57% chance to score any given goal in a game.
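That back-of-the-envelope conversion can be written out directly – a sketch, using the 3.5-goal rounding of the line from above:

```python
GOALS_PER_GAME = 23          # 2019 league average
line_in_goals = 21.5 / 6     # ≈ 3.58 goals; call it 3.5

# Split the 23 expected goals so the favourite wins by about the line:
fav_goals = (GOALS_PER_GAME + 3.5) / 2        # 13.25 expected goals
dog_goals = (GOALS_PER_GAME - 3.5) / 2        #  9.75 expected goals
fav_goal_share = fav_goals / GOALS_PER_GAME   # ≈ 0.576 – the "57%" above
```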

Running the simulation with Collingwood given a 57% chance of kicking each goal only decreases GWS’ odds of winning to about 69.9% from 74.4%. In this simulation, “only” 40.6% of games didn’t feature a goal – functionally the same as before. That number’s important, however: if no goal is kicked, GWS win! Even if Collingwood were weighted to kick 100% of the goals, GWS would still win the game no less than 40.6% of the time.

Given the assumptions baked into the simulation, Collingwood only win 50.7% of the games in which a goal is kicked. A true 50-50 game – but only if at least one goal is kicked!

Obviously there’s a lot of potential for error in this simulation, including not simulating behinds. The rate at which goals are scored may increase in high leverage situations, for one – I haven’t looked into leverage but would like to. Still, if your team is up by 4 with 3 minutes to play, and it feels like a “next goal wins,” it’s quite likely a “next goal wins”, but because of the high likelihood no goal is scored, it’s not a 50-50 game.

The odds of winning also climb steadily as the number of seconds remaining approaches zero. Given a starting 52-56 scoreline, at two minutes GWS’ win rate gets up to 76.5%; at one minute it’s 86%; at 30 seconds, 92%; and at 15 seconds remaining, interestingly, only 96.1% (as there’s about a 7% chance of a goal being kicked in any given 15-second span).

Of course, with the benefit of hindsight, if Collingwood had kicked a goal, the 2019 grand final would have been 99% more likely to be interesting to a neutral. That being said, the result does support the idea that a predictive model can correctly favour the leading team even when the game feels like a 50-50 proposition.