Maps: Automated Redistricting for QGIS

After mostly finishing up the QGIS redistricting tool, I realised that by reusing some of the code, I could come up with a QGIS plugin that could redistrict automatically.

This isn’t a novel concept: a quick search for automated redistricters brings up attempts to redistrict automatically using such mathematical devices as Markov chains, and even autoredistrict.org, free and open source software which does exactly this. So why try to solve the problem again using QGIS? Simply because it’s something I’ve wondered about for a while, and because I thought I could put it together quickly, which turned out to be the case.

If you want to jump in head first without reading further, you can download the plugin from https://github.com/johnpholden/qgisautoredistrict (and please throw a star on the repository if you find it useful). Whether it works or not, please let me know – I may have missed uploading something. Now, on to how this works!

This automated redistricter uses a flood-fill mechanism to assign districts. The workflow is fairly straightforward:
1. Pick an active district.
2. Pick a geography.
3. Has the district reached its target population? If not, assign the geography to the district.
4. Pick an adjacent geography and return to step #3 until the district has reached its target population.

In other words, the program picks a point and expands the district outwards until a target population gets reached.
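The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the plugin's actual code: the function name `grow_district`, the `neighbors` adjacency dictionary and the `population` lookup are all assumptions made for the example.

```python
import random

def grow_district(start, neighbors, population, assigned, district_id, target, rng=random):
    """Grow one district outward from `start` until it reaches `target` population.

    neighbors:  dict mapping a geography id to a list of adjacent ids (assumed)
    population: dict mapping a geography id to its population (assumed)
    assigned:   dict of geography id -> district id, updated in place
    """
    frontier = [start]
    total = 0
    while frontier and total < target:
        # Pick any geography on the frontier at random.
        geo = frontier.pop(rng.randrange(len(frontier)))
        if geo in assigned:
            continue  # already claimed (possibly by this district)
        assigned[geo] = district_id
        total += population[geo]
        # Unassigned neighbors become candidates for the next pick.
        frontier.extend(n for n in neighbors[geo] if n not in assigned)
    return total
```

The loop stops either when the target is reached or when the frontier runs dry, which is exactly the situation that leaves small unassigned pockets behind.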

For testing, my goal was to redistrict Connecticut into 151 state legislature seats as equally as possible. Why Connecticut? It’s geographically small but densely populated, so the data set is neither too big nor too small. There are about 67,000 polygons in the file.

The first attempt went fine, but I ran into several problems. The redistricter is very stupid, so by picking neighboring geographies at random and then stopping when the target population is met, it creates enclaves, or unassigned areas in between other districts. As a design rule, I ensured the program could go over or under the number of target districts, and I stopped the first run after over 600 districts had been created. Most of these districts contained a very small number of people, were completely inside a completed district, and typically were only one or two polygons in size.

So I set about creating an enclave checker. After another test, I realised I also needed to account for very small districts which nevertheless touched two or three different districts. So I added in some more logic: if a neighboring district has been assigned, remember that district. Then, if the district hasn’t reached a certain target population threshold, assign it to the neighboring district with the lowest population.

A simple bug caused this to fail initially: I forgot to tell the software not to add the current district to the list of neighbors. Because the current district always had the lowest population, the program kept trying to add the district to itself and never finished. Once I sorted this out, the software hit the number of target districts square on the nose. Some districts were far too small and some were far too big, but the overall result was relatively nice.
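A rough sketch of that merge step, with the self-exclusion in place, might look like the following. The names `merge_small_district`, `assignment`, `district_pop` and `min_pop` are hypothetical, and the plugin's real data structures may well differ.

```python
def merge_small_district(district, assignment, neighbors, district_pop, min_pop):
    """Fold an undersized district into its least-populated neighboring district.

    Returns the district it was merged into, or None if nothing changed.
    """
    if district_pop[district] >= min_pop:
        return None
    members = [g for g, d in assignment.items() if d == district]
    # Remember every neighboring district, skipping unassigned geographies
    # and, crucially, the district itself.
    candidates = {
        assignment[n]
        for g in members
        for n in neighbors[g]
        if n in assignment and assignment[n] != district
    }
    if not candidates:
        return None
    # Lowest population wins; the district id breaks ties deterministically.
    target = min(candidates, key=lambda d: (district_pop[d], d))
    for g in members:
        assignment[g] = target
    district_pop[target] += district_pop.pop(district)
    return target
```

Dropping the `assignment[n] != district` condition reproduces the bug: the tiny district is always its own least-populated neighbor.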

The initial result – geographic boundaries were ignored, so the districts look a bit splattered onto the map.

However, the software remained stupid in two different ways. First, the software treated as a neighbor any polygon which touched another polygon at only a point, like Arizona and Colorado or Utah and New Mexico. This led to some interesting – and topologically incorrect – districts.
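To make the difference between a corner touch and a shared edge concrete, here is a small pure-Python sketch. It assumes each polygon is a plain list of `(x, y)` vertices and that neighboring polygons use identical vertices along a shared boundary. That is a simplification: a real plugin would lean on the GIS library's geometry predicates instead. The function names are made up for the example.

```python
def edges(ring):
    """Return the undirected edges of a polygon given as a list of (x, y) vertices."""
    result = set()
    for i in range(len(ring)):
        a, b = ring[i], ring[(i + 1) % len(ring)]
        if a != b:  # skip the zero-length edge of an explicitly closed ring
            result.add(frozenset((a, b)))
    return result

def shares_edge(poly_a, poly_b):
    """True if the two polygons have at least one full edge in common."""
    return bool(edges(poly_a) & edges(poly_b))

def touches_only_at_point(poly_a, poly_b):
    """True if the polygons share a vertex but no edge (the Four Corners case)."""
    return bool(set(poly_a) & set(poly_b)) and not shares_edge(poly_a, poly_b)

# Three unit squares: southwest, southeast and northeast of the point (1, 1).
sw = [(0, 0), (1, 0), (1, 1), (0, 1)]
se = [(1, 0), (2, 0), (2, 1), (1, 1)]
ne = [(1, 1), (2, 1), (2, 2), (1, 2)]
```

Here `sw` and `se` share an edge and are true neighbors, while `sw` and `ne` meet only at the corner `(1, 1)`.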

Second, the software didn’t take geography into account at all. The first attempt at a five-district Connecticut meant the first district spread out happily along the entire coastline, ultimately looking like some sort of gerrymandered serpent. (Spoiler: The second attempt also had this problem, but that’s because I used tracts, not counties, as the geography. It didn’t know to stop at the county line and start back-filling, as you’ll see.)

I needed to fix both of these things. First, I changed the neighbor checker to only include polygons which shared an edge or (sigh) overlapped, which fortunately didn’t make this any slower. I also decided to add in a geography field. If the program finds a neighbor which does not share the active geography field, it assigns it to a list. Only once the program has exhausted all of the polygons in the active geography will it go on to the next geography.
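The geography-field rule can be sketched as a small selection function. This is only an illustration under assumed names (`next_polygon`, `frontier`, `deferred`, `geography`), and the tract ids in the usage note are invented.

```python
def next_polygon(frontier, deferred, geography, active_geo):
    """Return (polygon, active_geo), preferring polygons in the active geography.

    frontier:  list of candidate polygon ids, consumed from the front
    deferred:  list of candidates set aside because they lie in another geography
    geography: dict mapping a polygon id to its geography value (e.g. a county)
    """
    while frontier:
        poly = frontier.pop(0)
        if geography[poly] == active_geo:
            return poly, active_geo
        deferred.append(poly)  # outside the active geography: set it aside
    if deferred:
        # The active geography is exhausted, so move on to the next one.
        frontier.extend(deferred)
        deferred.clear()
        poly = frontier.pop(0)
        return poly, geography[poly]
    return None, active_geo
```

With a frontier of `["t2", "t1", "t3"]`, where `t2` sits in New Haven and the other two in Fairfield, an active geography of Fairfield yields `t1`, then `t3`, and only then `t2`.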

I also decided to make the program smarter by choosing a neighbor directionally. This led to a very interesting map, where districts were created longitudinally, as the program would always pick the westernmost available district instead of spreading out normally, so I decided to revert this addition. However, the program can still cycle through the direction it uses to pick the next available geography each time it starts a new district – I’m not sure how useful this will be, but it does add a bit of variety to the software.
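For illustration, directional picking can be as simple as comparing centroids. The sketch below assumes each candidate has an `(x, y)` centroid with x increasing eastward and y increasing northward; `pick_directional`, `DIRECTIONS` and the order of the cycle are assumptions for the example, not the plugin's actual behaviour.

```python
from itertools import cycle

DIRECTIONS = ("west", "north", "east", "south")  # hypothetical order

def pick_directional(candidates, centroid, direction):
    """Return the candidate whose centroid lies furthest in `direction`."""
    axis = 0 if direction in ("west", "east") else 1
    sign = 1 if direction in ("west", "south") else -1
    return min(candidates, key=lambda g: sign * centroid[g][axis])

# One direction per new district, repeating once all four have been used.
direction_cycle = cycle(DIRECTIONS)
```

Always passing `"west"` gives the longitudinal stripes described above, while drawing from `direction_cycle` for each new district varies the starting bias.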

Always picking the next adjacent westernmost district led to a map that looks almost like it’s crying.

Unfortunately, the final product isn’t perfect. In the final run, the software generates 152 districts for Connecticut where 151 is the target, and only 93 of these districts are within 2% of being a “perfect” district of equal population. Many of these districts are far too large as a result of the enclave assigner, with 50 of the 59 districts out of the 2% range being above their target population.
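A check along these lines can be written in a few lines. The sketch below assumes the "perfect" population is the total population divided by the target number of districts, which is a guess at how the figure is defined. The function name and the sample numbers in the usage note are invented.

```python
def deviation_report(district_pops, target_districts, tolerance=0.02):
    """Split districts into within-tolerance, over and under lists.

    district_pops:    dict mapping a district id to its population
    target_districts: the intended number of districts (assumed divisor)
    """
    ideal = sum(district_pops.values()) / target_districts
    within, over, under = [], [], []
    for district, pop in district_pops.items():
        deviation = (pop - ideal) / ideal
        if abs(deviation) <= tolerance:
            within.append(district)
        elif deviation > 0:
            over.append(district)
        else:
            under.append(district)
    return ideal, within, over, under
```

For example, four districts of 1,000, 1,010, 1,100 and 890 people with a target of four give an ideal of 1,000, two districts within 2%, one over and one under.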

The final result – these districts look properly compact, except for the one which greedily gobbled up ocean
The final result near Hartford with an overlay, showing a decent amount of respected geographic boundaries

Unfortunately, this isn’t the easiest fix. The software doesn’t yet know whether reassigning a mesh block that already has an assignment will split its district in two, and the initial attempt to fix this produced the most abstract-art map of all, running districts down zero-population freeway medians, which I intentionally kept in. One district, low on people and desperately needing to bulk up, found the median of Connecticut State Route Two and decided it should connect Glastonbury with Colchester while never being wider than about an eighth of a mile on either side of the highway. Other small districts did the same.
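One possible way to answer the "will this split the district?" question is a breadth-first search over the blocks that would remain. The sketch below is not part of the plugin. It assumes a `neighbors` adjacency dictionary, and the name `stays_contiguous` is made up for the example.

```python
from collections import deque

def stays_contiguous(members, neighbors, removed):
    """True if the district is still one connected piece after `removed` leaves it."""
    remaining = set(members) - {removed}
    if not remaining:
        return False  # removing the last block would delete the district
    start = next(iter(remaining))
    seen = {start}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for n in neighbors[block]:
            if n in remaining and n not in seen:
                seen.add(n)
                queue.append(n)
    # Connected only if the search reached every remaining block.
    return seen == remaining
```

On a district of three blocks in a row, removing an end block keeps it contiguous, while removing the middle one splits it. Running this check before every reassignment costs a search per candidate block, so on 67,000 polygons it may need to be limited to blocks on a district's border.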

A discarded result from alpha testing – the bright yellow district connects disparate areas using a freeway median. North Carolina would be proud

The easiest thing to do would be to remove zero-population blocks completely from the map by merging them with neighbors, but I want to make sure this works out of the box.

In the congressional simulation, four districts were between 3,500 and 34,000 voters too large, and the remaining district was 74,130 voters too small. This isn’t good enough if you’re trying to generate perfect districts, but it’s a great starting point for users who want to create a congressional district map from something other than scratch.

The final congressional map. The red district is the one that is 10% short of its voters target, but all in all, not a terrible result despite the interestingly coastal district.

It’s also important to note that at no point does this software use partisan indexes to create districts. It’s based purely on geography, as I believe districts represent both places and people, and that while we should strive to create competitive districts, we should also strive to create districts that people can easily understand geographically. It’s much easier to say “I represent people on this side of Princes Highway and to the west of Springvale Road” than it is to describe pretty much any district anywhere in the United States. But enough of the soapbox.

I’m probably going to continue working on this, but not immediately. However, I did want to open source the code on this in case others are interested in playing around with the formulas.

The next step will be to figure out how to reassign mesh blocks for districts which are over their population target (my first attempt failed) and to add a second geography column to avoid situations such as the one seen in the Connecticut plan, where the first district expands along the shoreline unconstrained by a change in counties. If the geography column looked at both county and tract, the checker would have stopped at the Fairfield County/New Haven County border, there would have been one partial district for Fairfield County and one district for New Haven County, and we all would have been a lot happier.

Once again, the plugin can be found at https://github.com/johnpholden/qgisautoredistrict. May your results be interesting!