This script reads in all the (Latitude, Longitude) pairs from the file(s). It also creates an empty list of ‘unique points,’ points separated by a distance set by the user. It then puts each point into the unique points list as a new point, or increases the count on an existing point depending on the separation distance. Here is a 1D example:
Start with a list of all the points, [1, 5, 10, 3, 77, 74, 34, 45], and our unique list, which is empty: []. Let’s say our distance between points is set to 20. Start with 1: the unique list is empty, so 1 gets added straight away: [[1,1]]. Next comes 5, which is only 4 away from 1, so the count on 1 is increased to 2 and the unique list looks like [[1,2]]. 10 and 3 are also within 20 of 1, so now the list is [[1,4]]. 77 is far away, so it gets added: [[1,4],[77,1]]. 74 is added to 77. 34 is more than 20 away from both 1 and 77, so it becomes a new point, and 45 gets added to 34. Our final unique list is [[1,4],[77,2],[34,2]]. This list is a mathematical way of saying we spend most of our time near 1, and equal parts near 77 and 34. The code does this in 2D, but the idea is the same as in 1D.
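The 1D walk-through can be sketched in a few lines of Python. This is only an illustration, not the code from the script: the function name `unique_points_1d` is made up, and it assumes a point joins the first unique point it falls within range of (rather than the nearest one).

```python
def unique_points_1d(points, dist_unique):
    """Greedy 1D grouping; each entry of the result is [point, count]."""
    unique = []
    for p in points:
        for entry in unique:
            # Assumption: join the first unique point within range.
            if abs(p - entry[0]) <= dist_unique:
                entry[1] += 1
                break
        else:
            # No existing unique point was close enough, so start a new one.
            unique.append([p, 1])
    return unique


print(unique_points_1d([1, 5, 10, 3, 77, 74, 34, 45], 20))
# [[1, 4], [77, 2], [34, 2]]
```

Note that the result depends on the order of the input, since the first point seen in an area becomes the unique point for that area.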
There’s a small wrinkle in all of this. Applied directly, the procedure above is what is called an O(N²) operation, because every point may have to be compared against every other point. So if you double the number of inputs, you quadruple the amount of work involved. I have nearly 400,000 way points in my database, and 400,000² is 160 billion. Doing 160 billion calculations of this variety, even on a modern computer, might take a day or two. Which is a bit ridiculous, don’t you think?
So instead, as I read in the points, I calculate their distance from a central point, and bin them based on this distance. Then I do the O(N²) matching calculations inside each bin. It will depend on your choices, explained below, but this reduces the number of items in each bin to thousands, and changes the running time to perhaps a few minutes. A good portion of that time is simply reading the data from disk.
The result is a little different using binning, but it’s a small price to pay for not waiting a long time for the results.
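Here is a rough sketch of the binning idea, under several assumptions that are not taken from the script: distances are computed with the haversine formula on a spherical Earth of radius 6,371 km, and the helper names `haversine_m` and `binned_unique` are invented for this example. The parameter names mirror the variables described below.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def binned_unique(points, center, radial_binsize, dist_max, dist_unique):
    """Bin points into rings around center, then match only within each ring."""
    bins = defaultdict(list)
    for lat, lon in points:
        d = haversine_m(center[0], center[1], lat, lon)
        if d <= dist_max:  # ignore anything outside the analyzed radius
            bins[int(d // radial_binsize)].append((lat, lon))

    unique = []  # entries are [lat, lon, count]
    for index in sorted(bins):
        local = []
        for lat, lon in bins[index]:
            for entry in local:
                if haversine_m(entry[0], entry[1], lat, lon) <= dist_unique:
                    entry[2] += 1
                    break
            else:
                local.append([lat, lon, 1])
        unique.extend(local)
    return unique
```

In this sketch, two nearby points on opposite sides of a ring boundary are never compared, so they end up as two separate unique points. That is one way the binned result can differ from the full calculation.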
You’ll probably want to edit a few variables in the Python file for your own needs. Use any decent text editor (so probably not made by Microsoft). The Python distribution comes with one that does syntax highlighting. Look for this:
# ---- DEFAULT VALUES ----- #
# distance between radial bins in meters
radial_binsize = 200
# distance between farthest points in meters
dist_max = 1000000
# distance between unique points in meters
dist_unique = 100
# Set your own central point
# [SetLat,SetLon] = [32,-117]
# read in from XML file or Training History.gtc?
# put in 'XML' or 'gtc'
# Don't worry, on windows it should default to XML on its own.
read_in_type = 'gtc'
# mac os x only, set a specific gtc file, comment to use default location
#gtc_file = '/your/random/path/Training Center.gtc'
radial_binsize: This is the size of the annuli that surround the central point. Think of a bulls-eye, with concentric circles. This isn’t terribly important, but it should be bigger than dist_unique, and significantly smaller than dist_max.
dist_max: This is the radius (in meters) of the analyzed points. If you wish to analyze all your points, make it large, like the distance between the most remote places you’ve used your GPS. There is no major time hit in making this too large. If you wish to only consider one area, you can make this smaller, in conjunction with a manual [SetLat,SetLon].
dist_unique: This is the most important preference. It sets the size of the ‘halo’ around each unique point. The distance between unique points is no smaller than twice this value. Set it small and you’ll have much finer detail, but many more points. Having too many points will make the mapping portion break. For example, I initially found 40,000 unique points and that failed for me. Setting this value larger netted me 14,000 points, which works (albeit slowly, especially on Google Earth).
[SetLat,SetLon]: If you like, you can set a preferred central starting point. This is only important if you wish to restrict your search to a specific area, and have set dist_max accordingly. If you don’t care about this, you can comment it out with a ‘#’. It is commented out by default. Note: You can achieve nearly the same result by only exporting a few activities from GTC and running in XML mode.
read_in_type: Mac OS X only: If you want to read in XML (.tcx) formatted files, enter ‘XML’ here. If you want to use your GTC database, leave it as shown above. This preference should have no effect on Windows.
gtc_file: Mac OS X only: Set a manual location of the Garmin Training Center file. Comment it out if you want to use the default location. It is commented out by default.
Look for this near the top of the file:
# read in from XML file or Training History.gtc?
# put in 'XML' or 'gtc'
read_in_type = 'XML'
Like it says, you’ll need to have ‘XML’ there (the single quotes are important). On Windows, it should automatically default to XML mode, but you can change it to be sure. Next, place the Python script in the same directory as all your .tcx files. Then run the script by double-clicking on it.
If you’re trying to use this script on data from a random GPS, XML is probably the easiest way to make it work. You’ll probably have to do a bit of code wrangling, however.
There should be no setup necessary (besides setting the variables, see above) if you want to run GTCMashup.py on your entire history. If you want to run on only a part of your history, you’ll either need to run using XML mode on a set of XML files, or set [SetLat,SetLon] and an appropriate dist_max.
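For instance, restricting the analysis to one area might look like the fragment below in the defaults section. The 20 km radius is a made-up value for illustration, and the coordinates are just the ones from the commented-out default.

```python
# Hypothetical settings: only analyze points within 20 km of (32, -117)
dist_max = 20000
[SetLat,SetLon] = [32,-117]
```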
The program outputs the unique list as a text file, “UniquePoints.txt.” This is formatted in three comma-separated columns, Lat, Lon, and ln(count+2). I chose logarithmic instead of linear because places I go often, I go really often, and I want to compress the scale and have more sensitivity at the low end. This file is formatted in a way gpsvisualizer.com can read.
You should upload your data to this page. In my experience, the settings shown here are good starting points. If you have more than 300 points, Google Maps will probably not work very well.
The XML mode assumes that the XML has lines like this, with the Latitude before Longitude. If you convert your data into this format, it should work. Alternatively, you could edit the script to search a text file of your preferred format.
The GTC mode assumes that the units stored in the database for Lat/Lon are multiplied by 11930464.71 over standard units. So (32,-117) in real units is (381774870.72,-1395864371.07) in the database. I don’t have access to Garmin’s internal code, so I can’t be sure this is true, but in my testing it seems to work. I would love to know if this is incorrect.
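The conversion can be written out as below. The function names are invented for this example. As an observation, the factor agrees with 2³¹/180 to the precision given, which is the "semicircle" unit Garmin uses for positions in some of its other formats. Whether the GTC database really stores semicircles is an assumption here, not something confirmed.

```python
GTC_UNITS_PER_DEGREE = 11930464.71  # factor quoted above; close to 2**31 / 180


def degrees_to_gtc(degrees):
    """Convert a latitude or longitude in degrees to database units."""
    return degrees * GTC_UNITS_PER_DEGREE


def gtc_to_degrees(value):
    """Convert a database latitude or longitude back to degrees."""
    return value / GTC_UNITS_PER_DEGREE


print(round(degrees_to_gtc(32), 2), round(degrees_to_gtc(-117), 2))
print(round(gtc_to_degrees(381774870.72), 6))
```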
Since this counts by point frequency, places where you move slowly will have a bit of a positive bias over places where you go fast. There are ways around this, and I may fix that in the future.
Here is another script to extract data out of your Mac OS X GTC file. I didn’t use that script as the base of this code, but it did inspire me to add the SQLite stuff.
At some point I think I’d like to improve this code to actually build a network of sections and nodes, with directionality. So I could build a map showing how many times I’ve gone up a hill, or how many times I’ve passed a certain intersection.
More immediately, at some point I should clean up some of my inconsistencies in the code, like how I’m not using the Points class everywhere I should.