How I Got a 327x Speedup of Some Python Code

I don't usually post code stuff on this blog, but I had some fun working on this and I wanted to share!

My colleague, Abhi, is processing data from an instrument collecting actual data from the real world (you should know that this is something I have never done!) and is having some problems with how long his analysis is taking. In particular, it is taking him longer than a day to analyze a day's worth of data, which is clearly an unacceptable rate of progress. One step in his analysis is to take the data from all ~30,000 frequency channels of the instrument, and calculate an average across a moving window 1,000 entries long. For example, if I had the list:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

and I wanted to find the average over a window of length three, I'd get:

[1, 2, 3, 4, 5, 6, 7, 8]

Notice that I started by averaging [0, 1, 2]=>1, as this is the first full window of three values. Similarly, the last entry is [7, 8, 9]=>8. Here is a simplified version of how Abhi was doing it (all of the code snippets shown here are also available [here][1]):

def avg0(arr, l):
    # arr - the array of values
    # l - the length of the window to average over
    new = []
    for i in range(arr.size - l + 1):
        s = 0
        for j in range(i, l + i):
            s += arr[j]
        new.append(s / l)
    return new
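As a quick sanity check, here is the toy example from above run through a self-contained copy of the function (Python 3 here, so the divisions produce floats):

```python
import numpy as np

def avg0(arr, l):
    # arr - the array of values; l - the window length
    new = []
    for i in range(arr.size - l + 1):
        s = 0
        for j in range(i, l + i):
            s += arr[j]
        new.append(s / l)
    return new

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print([float(v) for v in avg0(a, 3)])
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```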

This function is correct, but when we time this simple function (using an IPython Magic) we get:

# a is [0, 1, 2, ..., 29997, 29998, 29999]
a = np.arange(30000)
%timeit avg0(a, 1000)
1 loops, best of 3: 3.19 s per loop

Which isn't so horrible in the absolute sense -- 3 seconds isn't that long compared to how much time we all waste per day. However, I neglected to mention above that Abhi must do this averaging for nearly 9,000 chunks of data per collecting day. This means that this averaging step alone takes about 8 hours of processing time per day of collected data. This is clearly taking too long!
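The back-of-the-envelope arithmetic, assuming a round 9,000 chunks per day (the exact count varies):

```python
chunks_per_day = 9000       # approximate chunks collected per day
seconds_per_chunk = 3.19    # measured time for avg0 on one chunk
hours = chunks_per_day * seconds_per_chunk / 3600
print(round(hours, 1))  # 8.0
```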

The next thing I tried is to take advantage of Numpy, which is a Python package that enables much faster numerical analysis than with vanilla Python. In this function I'm essentially doing the same thing as above, but using Numpy methods, including pre-allocating space, summing values using the accelerated Numpy .sum() function, and using the xrange() iterator, which is somewhat faster than plain range():

def avg1(arr, l):
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    for i in xrange(arr.size - l + 1):
        new[i] = arr[i:l+i].sum() / l
    return new
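The toy example confirms avg1 gives the same answer (a self-contained Python 3 copy, with range in place of xrange; note that because the output array reuses the integer dtype, the averages come out as integers):

```python
import numpy as np

def avg1(arr, l):
    # preallocate the output and let NumPy's C-level sum() do the inner loop
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    for i in range(arr.size - l + 1):
        new[i] = arr[i:l+i].sum() / l
    return new

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(avg1(a, 3))  # [1 2 3 4 5 6 7 8]
```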

This provides a healthy speedup of about a factor of eight:

%timeit avg1(a, 1000)
1 loops, best of 3: 405 ms per loop

But this still isn't fast enough; we've gone from 8 hours to just under an hour. We'd like to run this analysis in at most a few minutes. Below is an improved method that takes advantage of a deque (a double-ended queue, from Python's collections module), which lets us keep a running sum by cheaply appending new values at one end and discarding old ones from the other as the window slides along.

from collections import deque

def avg2(arr, l):
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    d = deque()
    s = 0
    i = 0
    for value in arr:
        d.append(value)
        s += value
        if len(d) == l:
            new[i] = s / l
            i += 1
        elif len(d) > l:
            s -= d.popleft()
            new[i] = s / l
            i += 1
    return new
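Again, a quick check against the toy example (a self-contained Python 3 copy; as with avg1, the integer dtype of the output truncates the averages):

```python
import numpy as np
from collections import deque

def avg2(arr, l):
    # running sum: push new values on the right, pop old ones off the left
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    d = deque()
    s = 0
    i = 0
    for value in arr:
        d.append(value)
        s += value
        if len(d) == l:
            new[i] = s / l
            i += 1
        elif len(d) > l:
            s -= d.popleft()
            new[i] = s / l
            i += 1
    return new

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(avg2(a, 3))  # [1 2 3 4 5 6 7 8]
```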

And here we can see that we've cut our time down by another factor of 4:

%timeit avg2(a, 1000)
10 loops, best of 3: 114 ms per loop

This means that from 8 hours we're now down to roughly 17 minutes. Getting better, but still not great. What else can we try? I've been looking for an excuse to try out Numba, which is a tool that is supposed to help speed up numerical analysis in Python, so I decided to give it a shot. What makes Numba attractive is that with a single additional line, Numba can take a function and dramatically speed it up by compiling it to fast machine code on the fly. So let's try this on the first averaging function:

from numba import jit, int32

@jit(argtypes=[int32[:], int32], restype=int32[:])
def avg0_numba(arr, l):
    new = []
    for i in range(arr.size - l + 1):
        s = 0
        for j in range(i, l + i):
            s += arr[j]
        new.append(s / l)
    return np.array(new)

In the line beginning with @jit, all I have to do is describe the input and the output types, and it handles the rest. And here's the result:

%timeit avg0_numba(a, 1000)
10 loops, best of 3: 21.6 ms per loop

What is incredible here is that not only is this roughly 5 times faster than the queue method above, it's a ridiculous 147 times faster than the original method, and only one line has been added. We've now reduced 8 hours to just over 3 minutes. Not bad!

Let's try this on the second averaging method, which if you recall, was substantially better than the original method:

@jit(argtypes=[int32[:], int32], restype=int32[:])
def avg1_numba(arr, l):
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    for i in xrange(arr.size - l + 1):
        new[i] = arr[i:l+i].sum() / l
    return new
%timeit avg1_numba(a, 1000)
1 loops, best of 3: 688 ms per loop

That's interesting! This is actually slower than the un-optimized version of avg1 -- most likely because Numba cannot compile the NumPy slicing and .sum() call, and silently falls back to a much slower object mode. Let's see if Numba can speed up the queue method:

@jit(argtypes=[int32[:], int32], restype=int32[:])
def avg2_numba(arr, l):
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    d = deque()
    s = 0
    i = 0
    for value in arr:
        d.append(value)
        s += value
        if len(d) == l:
            new[i] = s / l
            i += 1
        elif len(d) > l:
            s -= d.popleft()
            new[i] = s / l
            i += 1
    return new
%timeit avg2_numba(a, 1000)
10 loops, best of 3: 77.5 ms per loop

This is somewhat better than before, but still not as fast as avg0_numba, which comes in at roughly 20ms. But what if I really try hard to optimize the queue method by using only Numpy arrays?

@jit(argtypes=[int32[:], int32], restype=int32[:])
def avg2_numba2(arr, l):
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    d = np.empty(l + 1, dtype=arr.dtype)
    s = 0
    i = 0
    left = 0
    right = 0
    full = False
    for j in xrange(arr.size):
        d[right] = arr[j]
        s += arr[j]
        right = (right + 1) % (l+1)
        if not full and right == l:
            new[i] = s / l
            i += 1
            full = True
        elif full:
            s -= d[left]
            left = (left + 1) % (l+1)
            new[i] = s / l
            i += 1
    return new
%timeit avg2_numba2(a, 1000)
100 loops, best of 3: 9.77 ms per loop
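Before celebrating, it's worth checking that the ring-buffer bookkeeping is right. Stripped of the @jit decorator, the same logic in plain Python reproduces the toy example:

```python
import numpy as np

def avg2_plain(arr, l):
    # same ring-buffer logic as avg2_numba2, minus the decorator
    new = np.empty(arr.size - l + 1, dtype=arr.dtype)
    d = np.empty(l + 1, dtype=arr.dtype)  # fixed-size circular buffer
    s = 0
    i = 0
    left = 0
    right = 0
    full = False
    for j in range(arr.size):
        d[right] = arr[j]
        s += arr[j]
        right = (right + 1) % (l + 1)
        if not full and right == l:
            # the first full window of l values
            new[i] = s / l
            i += 1
            full = True
        elif full:
            # drop the oldest value, then emit the next average
            s -= d[left]
            left = (left + 1) % (l + 1)
            new[i] = s / l
            i += 1
    return new

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(avg2_plain(a, 3))  # [1 2 3 4 5 6 7 8]
```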

A ha! That's even faster, and our 3.19s are now down to 9.77ms, an improvement of 327 times. The original 8 hours are now reduced to less than two minutes. I hope you've enjoyed this as much as I have!

Update:

After chatting with a couple of my colleagues, this appears to be the fastest way to do this kind of operation:

from scipy.signal import fftconvolve
import numpy as np
a = np.arange(30000)
b = np.ones(1000) / 1000.
%timeit fftconvolve(a, b, 'valid')
100 loops, best of 3: 6.25 ms per loop
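On the toy data, fftconvolve with a ones/l kernel in 'valid' mode reproduces the same moving average, up to floating-point round-off:

```python
import numpy as np
from scipy.signal import fftconvolve

a = np.arange(10, dtype=float)
b = np.ones(3) / 3.0
out = fftconvolve(a, b, 'valid')
print(np.allclose(out, [1, 2, 3, 4, 5, 6, 7, 8]))  # True
```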

Bye Bye Google Reader

Here's what I see when I look at my Google Reader stats:

Since May 22, 2007 you have read a total of 291,781 items.

That count will go no higher.

Google has decided to retire Reader on July 1st. Reluctantly, I have decided that I will not be using Reader on my six-year anniversary. As a test, for the last week I have been exclusively using tt-rss, installed on this web server, as a replacement. So far it has been working pretty well. It's not a perfect replacement for Reader, but Reader isn't a perfect replacement for what Reader used to be, either. And unlike Reader, since it's self-hosted, no one will be taking it away from me.

If you want an account on my server, and you are someone I know personally, don't hesitate to ask!

Update: I no longer run tt-rss. As of late, I have been using Feedly.


Yet Another Snowy Picture

You really need to click on the image and see the full-sized version!

Tour de France


Christmas in Boulder, 2012

Melissa and Stephen at Dowdy Draw

We have a tradition (2010, 2011) of going for a hike on Christmas eve in Boulder. This year we decided to keep it easy and go for a stroll/hike (Melissa calls it a "strike") at Dowdy Draw open space south of Boulder.

Melissa and Stephen with the mountains

Yesterday evening it started snowing, and by this morning there was roughly eight inches of fluffy snow on the ground, giving us a true white Christmas!

A snowy panorama of Boulder


Snowy NIST

Snowy NIST

It's been over five months since I last posted on this blog! It seems appropriate that I come back with the counterpoint to the last photo I posted, this time showing the snowy Flatirons fronted by NIST.

Who knows when I'll post next!


Foggy Flatirons & New Commute

Cloudy Flatirons

Since we have moved to our new house at the other end of town, my ride to campus is a bit different. In some ways I don't like it as much as my old route because it is next to Broadway for most of the way. Broadway is a very busy street, it's noisy and unpleasant. My old route did go for a short while on a busy street, but most of the time it was on roads in quiet neighborhoods or on bike paths.

Note that I wrote "next to Broadway" above. In fact, my new route is almost entirely on grade-separated bike paths/sidewalks (it's legal here for adults to ride on sidewalks in some places) or frontage roads next to Broadway, and I actually never ride on Broadway. The big bonus on my new route is that the views are far superior. Above is a photo of the view looking over NIST towards the fog-enshrouded Flatirons. They are not often covered in clouds like this, but it had rained heavily the day before and the air was still humid. Also note that the grass in the foreground is brown. Before our heavy rains we had two weeks of record-setting heat. It's been a feast-or-famine summer in Boulder.


Tour Pool 2012

Tour de France

Just a quick note: It's time again for the 2012 edition of the Tour de France pool hosted by Kris Wells. I am the scorekeeper, which I have been for several years.


Working Conditions

Pic

Of late, there has been a great deal of construction in and around my office. Over the past two weeks my office cluster has been repainted, its doors refinished, window shades replaced, some carpet replaced, and some wires and pipes installed and moved. There is still more work to be done.

Beginning yesterday, and continuing for the next four to six weeks, the scoreboards for the stadium are being replaced. Pictured above is a crane removing pieces of the old one (notice the asymmetry in solid black/shaded regions). This crane is more or less just outside my window, which, as you can imagine, is kind of neat for the first five minutes, and then gets tiresome and annoying after that. The construction has also closed off a large amount of space in front of the offices, which is inconvenient for us, and they even cut down two perfectly healthy trees, presumably to make space for the crane (insert tree-sitter joke here).

As a matter of fact, the conduit carrying the control wires for the new scoreboards passes by my office. The next time Stanfurd plays the Buffs, don't be surprised if - wait, I'm saying too much.


Some Walker Ranch Pictures

Here are a few pictures from my ride today at Walker Ranch.

Tour de France

Above: Such a lovely trail!

Tour de France

The wildflowers are out in force.

Tour de France

When I was taking pictures, the California Zephyr rolled by on its westward journey to Emeryville. The full-sized image shows the train better than the preview. This is the train that Melissa and I took last Thanksgiving, and I remember seeing the Walker Ranch trail from the train.


Giant Cowboy & Longs Peak

Cowboy Longs Peak

Cowboy Longs Peak

Yes, that's a giant cowboy in front of someone's home, with Longs Peak in the background.


24 Minutes By Bicycle From Our Front Door

Bow Mountain Road

It's too bad the smell of fresh mountain air, cleaned by a recent rain/snowstorm, can't be captured and transmitted digitally.


My Colorado Rides

Below is a kernel-density map showing where and how often I've ridden my bike around Boulder in the last year and a half. I've posted things like this before for my rides in California. The color map (the legend is in the bottom-left corner) indicates that anywhere there's a green dot, I've cycled through at most a few times, and anything between blue and purple marks my usual routes. There are a few red points, mainly centered around where I live, which correspond to places I've been through a few hundred times.

Looking at the map, I'm a bit surprised at the amount of exploring I've done out in the plains. I didn't realize I had done that much out there. I also want to note that most of the dead-end segments are due to geography - the roads end where the segments do.

Colorado Rides


Valmont View

Valmont

Valmont View

Taken here on Saturday the 25th of February.


Snowshoeing Bear Creek Trail

Boulder

Here are a few photos I took on my snowshoeing hike in the mountains behind NCAR. As always, click to enlarge an image. Unfortunately, I broke one of my nearly brand new snowshoes. Luckily, I was nearly back to the car when it happened. By the way, today's high was -7C (20F).

Above is a panorama of some lesser Flatirons.

Boulder

The narrowly beaten path between the Aspens.

Boulder

A snowy view. Gotta love Boulder!


Boulder Creek in February (Again!)

Boulder Creek

It's presently snowing. We've received around half a foot since last night. This is on top of the 22 inches that fell late last week. Boulder is a very snowy place right now!

This is the thirteenth photo I've posted in this series, and was taken only two days short of a full year after I began the project. I think I will cease taking photos of Boulder Creek every month. However, I will continue to post photos that highlight the natural beauty and change of seasons around Boulder.
