On our recent driving trip to Yellowstone and Montana, I had lots of time to think about random things while behind the wheel. One question I wondered about: of the major US Interstates, which two come the closest to each other without actually intersecting? My guess was that it's some place on the East Coast, but due to my general lack of knowledge of East Coast highways, I had no idea which two they are.
Being a huge dork, I decided to figure it out.
In principle, it's not a very difficult thing to figure out. The steps are:
- Get the latitude and longitude coordinates for a number of points along each of the interstates.
- Determine which interstates intersect and eliminate those pairs.
- Put the coordinates for the interstates into a kD-Tree, which makes the nearest-neighbor searches fast, and use it to find the smallest distance between each pair of non-intersecting highways.
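To make the last step concrete, here is a minimal sketch of a kD-Tree nearest-neighbor search in plain Python. It is an illustration under a few assumptions, not the code used for this post: each highway is assumed to be a list of (latitude, longitude) pairs in degrees, the points are mapped onto a unit sphere so that ordinary straight-line distance can stand in for great-circle distance, and the Earth is treated as a sphere of radius 6371 km. The function names are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def build_kdtree(points, depth=0):
    """Build a 3-D kD-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, depth=0, best=None):
    """Return (squared chord distance, point) for the tree point nearest target."""
    if node is None:
        return best
    point, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, target))
    if best is None or d2 < best[0]:
        best = (d2, point)
    axis = depth % 3
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Only search the far side if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best


def closest_approach_km(road_a, road_b):
    """Smallest great-circle distance between sampled points of two roads."""
    tree = build_kdtree([to_xyz(lat, lon) for lat, lon in road_b])
    best_d2 = min(nearest(tree, to_xyz(lat, lon))[0] for lat, lon in road_a)
    chord = math.sqrt(best_d2)
    # A chord of length c on the unit sphere subtends an angle of 2*asin(c/2).
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, chord / 2))
```

Calling `closest_approach_km(points_for_one_road, points_for_another)` for every non-intersecting pair and keeping the minimum gives the answer. The result is only as good as the spacing of the sampled points, since the search compares points, not the line segments between them.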
The first step proved to be the hardest. I decided to use the data from the OpenStreetMap (OSM) project. This is a Google Maps-like website that is editable by anyone in the world, similar to Wikipedia. It will not give you directions like other mapping services, but it contains the geographical location of a wide variety of items, including, importantly (as the name suggests), roads. I looked into using the OSM APIs, but as far as I could tell, either the APIs didn't do what I needed in an efficient way, or the servers were down. So I simply downloaded the 82 GB XML dataset (5 GB compressed for download) for the United States.
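A file that size can't be loaded into memory in one go, so it has to be read as a stream. The following is a rough sketch of how that could look with Python's standard library; it is not the code I actually ran. It relies on the OSM XML layout, where `<node>` elements carry `id`, `lat`, and `lon` attributes and `<way>` elements list their nodes as `<nd ref="...">` children alongside `<tag k="..." v="...">` children. The assumption that interstates are tagged with a `ref` value like `I 70` is mine, and ways carrying several refs at once are simply skipped here. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: interstate ways carry a tag such as ref="I 70".
INTERSTATE_REF = re.compile(r"^I[ -]?(\d+)$")


def top_level_elements(path):
    """Yield each direct child of the root element, freeing memory as we go."""
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # the opening root tag
    depth = 0
    for event, elem in context:
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                yield elem
                root.clear()  # drop the elements that were already handled


def interstate_points(path):
    """Return {interstate number: [(lat, lon), ...]} from an OSM XML file."""
    # Pass 1: find the ways tagged as interstates and note which nodes they use.
    way_nodes = {}
    for elem in top_level_elements(path):
        if elem.tag != "way":
            continue
        tags = {t.get("k"): t.get("v") for t in elem.iter("tag")}
        match = INTERSTATE_REF.match(tags.get("ref", ""))
        if match:
            refs = [nd.get("ref") for nd in elem.iter("nd")]
            way_nodes.setdefault(int(match.group(1)), []).extend(refs)
    wanted = {ref for refs in way_nodes.values() for ref in refs}

    # Pass 2: look up coordinates for only those nodes.
    coords = {}
    for elem in top_level_elements(path):
        if elem.tag == "node" and elem.get("id") in wanted:
            coords[elem.get("id")] = (float(elem.get("lat")), float(elem.get("lon")))

    return {num: [coords[r] for r in refs if r in coords]
            for num, refs in way_nodes.items()}
```

Two passes are used because nodes appear before the ways that reference them, and keeping every node's coordinates in memory just in case would defeat the purpose of streaming.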
Begin rant – feel free to skip to the next paragraph. I loathe XML. Any time that you have an 82 GB text file (apparently it's 200+ GB for the whole world) as your main distribution method, you're doing something wrong. Doing this project, I learned as little about XML as I could to get just what I needed out of the file. Apparently the authoritative data is kept in a real database, but it appears that you cannot download the data as a database. They do have a binary format description, but I can't find a link to download the data in that format. Furthermore, the world doesn't need yet another binary format. For example, they do not discuss endianness for their binary format on that wiki page, which is a big issue with binary formats. There are many other quality formats they could use (SQLite or HDF5). The binary format has a distinct Not Invented Here feel to it, which is nearly always a bad thing. Anyway, back to the main point of my rant. I don't care that the 82 GB XML file compresses down to 5 GB. Reading an 82 GB text file when you're searching for just a fraction of that data takes a long, long time, and is completely unnecessary. Every time I encounter XML it wastes my time in myriad ways. This time was no different. End rant.
I'll spare you the full details and samples of my low-quality Python code, but I munged the interstate data into a SQLite file, which distilled the data from 82 GB to 19 MB. Yes, that's over 4,000 times smaller, or nearly four orders of magnitude. Then I used the much more convenient (and fast) SQLite file to build lists of interstate coordinates, which were fed into the kD-Tree for the nearest-neighbor searches. The results are shown below. Note that there is no I-50 or I-60, and I eliminated I-45 from consideration because it's entirely within Texas and therefore is not "major" in my opinion. I eliminated Hawaii's H-1 for the same reason. I have included links to maps showing the great-circle path between the nearest points of each pair of highways. For highways that intersect, the link goes to one of the (more or less random) points of intersection.
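For a sense of what the SQLite step might look like, here is a small sketch using Python's built-in `sqlite3` module. The table layout (`points` with `interstate`, `seq`, `lat`, `lon` columns) and the function names are assumptions made up for this example, not the schema of the actual 19 MB file.

```python
import sqlite3


def save_points(conn, points_by_interstate):
    """Store {interstate number: [(lat, lon), ...]} in a single table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS points (
               interstate INTEGER NOT NULL,
               seq        INTEGER NOT NULL,  -- position along the list
               lat        REAL    NOT NULL,
               lon        REAL    NOT NULL
           )"""
    )
    rows = [(num, seq, lat, lon)
            for num, pts in points_by_interstate.items()
            for seq, (lat, lon) in enumerate(pts)]
    conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?)", rows)
    # The index lets one highway's points be pulled out without a full scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_points_interstate ON points (interstate)"
    )
    conn.commit()


def load_points(conn, interstate):
    """Return the stored (lat, lon) list for one interstate, in order."""
    cur = conn.execute(
        "SELECT lat, lon FROM points WHERE interstate = ? ORDER BY seq",
        (interstate,),
    )
    return cur.fetchall()
```

With something like this, `conn = sqlite3.connect("interstates.db")` (a hypothetical filename) followed by `load_points(conn, 70)` returns the coordinate list for one highway in a fraction of a second, with no need to touch the XML again.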
Finally, we can see the answer I was looking for. Interstates 70 and 95 come within 5 kilometers of each other in Baltimore, at the terminus of I-70, but do not intersect. My suspicion that it would be somewhere in the East was correct, so I have that to feel good about.
P.S. If you really, really want to see the code I used for this, I can share it, but I'll have to pull out the hamsters that have taken up residence in it. They're attracted to dusty, littered places, you know.