Mannheim Steamroller - Christmas In The Aire

This week the top un-reviewed album is not Christmas In The Aire by Mannheim Steamroller; it is a Garth Brooks album. As I've already discussed, Garth Brooks doesn't allow his catalog on most music streaming services. I don't care enough about Garth Brooks to expend effort to listen to him, so we'll drop down to the #4 album.

I was getting a little bit worried that I would not have any Christmas music to review this year. We're just over a week away from Christmas and by this time last year I had reviewed two (1, 2) Christmas albums. And whoo boy is this some kind of Christmas music.

I wasn't previously familiar with Mannheim Steamroller, or at least if I've heard their music before, I never made the name association. Outside of deliberate parodies or joke albums (like this K9 Tunes Christmas album, which looks like it's AI generated, so it's triply bad), this is the absolute worst kind of Christmas music. It is highly electronic, uninspired, and super lame. It's the kind of music that would be played in a 1980s comedy film that takes place at a ski resort during the slapstick scenes of people falling down the mountain. Just simply horrible. Avoid this album, and I suspect everything Mannheim Steamroller has ever recorded. I'm not brave enough to find out by listening to more of their albums.

I want to mention the use of the extra "e" on "Aire." It appears that Mannheim Steamroller uses "Aire" as their virtual trademark. Many of their albums are named "Fresh Aire." I'm glad this differentiates from the better NPR Fresh Air program, but the extra "e" is still stupid. It reminds me of a local road here in Boulder I often ride on my bike, Olde Stage Road. There's a sign near the road that omitted the extra "e" (because, duh, it makes no sense) which some concerned citizen fixed in the most hilarious and cheap way possible:

Olde Stage Road


The Beatles - Anthology 1

The most remarkable thing about this week's #1 album is that it featured the first new Beatles song in 25 years, Free as a Bird. The rest of the album consists of outtakes, live versions, and short spoken interview clips from band members and people associated with the band.

Like the last Beatles album I reviewed, this is not for casual listening, and it clocks in at over two hours long. You would not want this to be your introduction to The Beatles. In fact, I would urge anyone to listen to the entirety of their catalog before listening to any of the Anthology albums. It's important to have the context of what these songs ultimately sounded like when The Beatles perfected them.

This is not an album that needs to be listened to often and repeatedly. There's a reason why The Beatles worked as hard as they did to perfect their songs, and the polished versions are what you'll want to hear again and again. In the end, because it's the freakin' Beatles, it is worth checking out when you're in the right mood and have the time.


Waiting To Exhale Soundtrack

This week's top un-reviewed album, sitting at #1, is by R. Kelly, and I'm not going to review that for obvious reasons. Instead I'll drop down to #4 for the soundtrack for the film Waiting to Exhale.

If you know anything about me, you should be able to guess that I have never seen the movie, and as far as I know I have never listened to the soundtrack before. The soundtrack is full of heavy-hitters like Whitney Houston, Toni Braxton, Aretha Franklin, and Mary J. Blige. Combined with the popularity of the movie, the album sold quite well, was reviewed positively, and won multiple awards.

I think I kinda sorta remember hearing a few of the songs off the album, but it's been years since I heard any of them. This is one of those albums that I can tell has quality music but I don't care about it. It does nothing for me so I can only offer a neutral opinion with no suggestion one way or the other.

Futurama Neutral


ETL Pipeline Improvements

One of my primary responsibilities at my current job is ownership of the ETL pipeline that brings in the data upon which we run our business. Every day it processes hundreds of gigabytes of data, cleaning and normalizing it, and outputting it in several different forms.

For a few years I have been using a Hadoop-based mrjob pipeline on top of AWS EMR. It replaced a hugely expensive (I kid you not) Redshift-based pipeline that I didn't write. It's been very reliable. For the last year or two the only failures have been when AWS's systems have had issues, something I can't do anything about. Despite this reliability, it hasn't been perfect. The biggest problem is that it's expensive to run. There are a few reasons why it's been so expensive:

  • EMR adds a 25% upcharge on all resources used. It is reasonable for AWS to charge something because EMR is a useful service with added value. However, in my opinion, 25% is too high for what it does. AWS already gets paid plenty because you're using their other services underneath it (mainly EC2 virtual computers and EBS block storage), so the extra mark-up feels like a big cash-grab. It is, of course, entirely possible to run Hadoop on AWS without using EMR. But it is a hassle, and it's likely that AWS has figured out that 25% is the inflection point between too expensive and not worth the hassle

  • Hadoop isn't the most efficient way of doing things. Newer tools, notably Spark, have surpassed Hadoop in both speed and features. I originally used Hadoop because I wasn't happy with how Spark needed quite a bit more memory than Hadoop for a similar operation, but over time I became less and less satisfied with my inability to speed up certain parts of the process

  • AWS offers spot compute instances, which are virtual computers that are significantly cheaper than on-demand instances. The difference is that on-demand nodes are yours as long as you want them, while spot instances can be taken away at any time with only two minutes warning.

    One of Hadoop's killer features is that during a MapReduce cycle, if one or more worker nodes go away (for whatever reason, including spot node removal), in most cases it can recover and redo any lost results.

    However, the EMR pipeline used a multi-step MapReduce process. Unfortunately, Hadoop cannot recover lost results from earlier, fully completed MapReduce cycles. This means that in order to run the pipeline, it had to have enough on-demand instances that the data that needed to be preserved across cycles fit on them. This raised the cost considerably when compared to an all-spot pipeline run

Earlier this year I decided it was time to start looking at how to rewrite the pipeline to dramatically lower costs. I saw two possible ways forward:

  • Choose a modern, high performance tool like Spark/PySpark or Dask that would still run on EMR but hopefully would be much faster

  • Abandon EMR entirely (and its 25% surcharge) and write something that could run completely on spot instances and use S3 for storage, which is four to five times cheaper than EBS

After some thought, I decided that the second option was the better choice. If I could figure it out, it offered the best possible outcome. The pipeline runs once per day, meaning that it has 24 hours to finish before the next run needs to start. Ultimately, high performance was less important than lowering costs.

I have been using Polars quite a bit and have been (mostly1) impressed with its speed and functionality. It is written in Rust, which is one of the fastest programming languages. Polars has a Python-facing API as a first-class member of the project. Rust has an ever-growing library of packages that I've found are high-quality and well documented (in contrast to my experience with cough R packages). The crucial difference between Polars and Hadoop/Spark/Dask is that Polars runs on only one node at a time (it can and does use all the CPU cores), while all of the others can run on multiple nodes. If I could figure out how to slice up the pipeline into chunks that would work on separate instances, I believed I could use Polars in place of Hadoop.

Jumping to the end of the story, I was able to convert the pipeline to use Polars, to great success. I use a simple pattern for each step. An orchestration process builds a list of work which is submitted to SQS. An EC2 spot fleet is created which launches workers that consume the work. The input and output of the work are stored on S3. If a spot node goes away mid-work, the interrupted work will be picked up by a different node once its message becomes visible on the queue again. The workers send success or failure messages to a callback queue on SQS, which is monitored by the orchestration process. Once the work for a step is done (i.e. all work has generated a callback), the orchestration process kills the spot fleet and continues to the next step (or sends an error message for a human to figure out).
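The pattern above can be sketched in miniature. This is my own simplified illustration, not the actual pipeline code: an in-memory queue.Queue stands in for the two SQS queues, the worker runs inline instead of on a spot fleet, and all names are invented for the example.

```python
import queue

def run_step(work_items, worker):
    """Submit work, process it, and wait for a callback per item.

    A toy stand-in for the orchestration pattern: in the real system
    the queues would be SQS (via boto3), workers would run on EC2 spot
    instances, and interrupted messages would reappear on the work
    queue after their visibility timeout expires.
    """
    work_q = queue.Queue()      # stand-in for the SQS work queue
    callback_q = queue.Ueue() if False else queue.Queue()  # callback queue

    for item in work_items:
        work_q.put(item)

    # "Workers" drain the queue, reporting success or failure back.
    while not work_q.empty():
        item = work_q.get()
        try:
            callback_q.put(("success", item, worker(item)))
        except Exception as exc:
            callback_q.put(("failure", item, exc))

    # The orchestrator proceeds only once every item has a callback.
    results = [callback_q.get() for _ in work_items]
    failures = [r for r in results if r[0] == "failure"]
    if failures:
        raise RuntimeError(f"{len(failures)} work items failed")
    return [r[2] for r in results]

print(run_step([1, 2, 3], lambda x: x * x))  # prints [1, 4, 9]
```

The real orchestrator also has to create and tear down the spot fleet around each step, but the submit/consume/callback loop is the core of it.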

The bottom line is that the cost has dropped by roughly 85%, primarily due to the following reasons:

  • Polars has the concept of LazyFrames, which are data objects that are not realized in memory, and on which no computation is performed, until Polars is told to do so. Operations and filtering can be applied to them, and Polars can do the work in parallel with efficiency tricks that increase overall speed without loading the whole dataset into memory at once. The combination of sink_parquet with PartitionByKey is effectively a MapReduce operation that is much faster than Hadoop on similar hardware

  • AWS has "regions" and "availability zones (AZs)", which are the physical locations where cloud compute happens. Each AZ is a distinct data center (or a cluster of nearby data centers) within a region. When running an EMR job, you are restricted to a single AZ, largely because AWS charges for cross-AZ data transfer, and EMR jobs are very loquacious across the network. There's also increased network latency between AZs. Running an EMR job in multiple AZs would hugely impact performance and cost.

    Because the new pipeline reads from and saves data to S3, and there are no cross-AZ charges for accessing S3 within a region, it doesn't matter which AZ the workers run in. This means that the spot fleets can target all AZs within the region, unlike EMR

  • When launching an EC2 fleet, you must specify one or more launch templates, which describe how to launch each instance in terms of OS and installed software. AWS EC2 offers instances using x86 processors from Intel and AMD, and ARM instances using AWS Graviton processors. Conveniently, the pipeline doesn't require any processor-specific features. Therefore, I created two AMIs, one for each of x86 and ARM, which allows the spot fleet to target any and all of Intel, AMD, and Graviton instances

  • The pipeline requirements for each step basically come down to the number of CPU cores and the amount of RAM, more or less of each depending on what the step is doing. The upshot of all of the above is that for a given step, all the pipeline cares about is the resources of the node, not what kind of node it is. Of course, not all instances are the same speed, but the cost of an instance is roughly proportional to its speed, so it all works out. This means that for a given step, across all AZs and EC2 instance types, there can be over 100 distinct resource combinations to pick from. This basically guarantees spot availability at all times

  • The pipeline uses a fair number of User Defined Functions. Polars supports UDFs written in Python, Numba, and Rust using PyO3. By using the latter two, basically all of the inner loops and heavy computation in the pipeline happens in compiled C or Rust. This, in my opinion, is a really nice way of doing things. Let Python handle moving data around and high-level stuff, and run all the heavy computation in compiled code.

Overall, I'm very pleased about the results of this work. The goal was to save money, and it has done that. I wasn't expecting 85% savings (I'm not sure what I was hoping for), but I feel quite good about that.

  1. Polars definitely has some frustrating issues, like this one or this one ↩︎

Alice in Chains - Alice in Chains

It is unfortunate that I had a thirteen-year gap in my listening project. If I had not stopped, or had restarted earlier, I might have reviewed Alice in Chains' two best recordings: the 1992 album Dirt and the 1994 EP Jar of Flies. They are two of my favorite albums/EPs; I have listened to tracks off them nearly 700 times.

In stark contrast, I have barely listened to the eponymous Alice in Chains album. Listening to this album again I am not upset at my lack of plays. It is nowhere near as good as Dirt and Jar of Flies. On Tidal, of the top thirty tracks by plays for Alice in Chains, only three songs are off the album Alice in Chains. I do not have a Spotify account so I can only see the top ten, but there are zero songs off Alice in Chains on that list.

I can't endorse listening to the album Alice in Chains, but I can strongly suggest giving Dirt and Jar of Flies a play. Dirt, in particular, is one of the greatest grunge albums of all time. The first chord and lyric of the first track, Them Bones, hit you hard and grab your attention like few other songs do.


Tha Dogg Pound - Dogg Food

This week's #1 album, Dogg Food, is by the Snoop (Doggy) Dogg-adjacent Tha Dogg Pound. Apparently the album has done quite well, having sold over two million copies as of 1996. Despite this success, I can't say that I recall ever hearing any of the songs before.

I have no strong opinions about this album. It's pretty typical hip hop from the era, nothing outstanding, and nothing horrible. It didn't grab my attention at all. I'll probably never listen to it again.


Smashing Pumpkins - Mellon Collie and the Infinite Sadness

This week (thirty years ago) one of the all time great albums hit #1 on the charts in its first week of sales. It would go on to sell over 10 million copies, making it one of the best selling albums of all time. And deservedly so. Mellon Collie and the Infinite Sadness by The Smashing Pumpkins is one of my favorite albums. According to last.fm, I have played a song off the album over 600 times.

I remember when the lead single off the album, Bullet with Butterfly Wings, hit the radio. Of course it's a banger, but what I remember thirty years later is mishearing the line "despite all my rage, I am still just a rat in a cage." I could make out the "despite all my rage" part, but I couldn't quite figure out the second half. I think I had some nonsense words there, but it's been so long since I learned the correct words that I've forgotten what I thought the words were!

Another single off the album, 1979, was cool because that's the year I was born. The song is about entering adolescence, and in 1995 I was in the throes of adolescence myself. A great coincidence!

All of the big singles off this album still get plenty of radio play, but the whole album deserves to be listened to in its entirety. Listening to it one more time for this project was not a chore or unpleasant. Indeed, it was a pleasure, and I look forward to listening to the album again and again for years to come. You should listen to the album today, and again and again.


Augment Code

Last month the company I work for purchased a subscription to Augment Code for all its developers. Augment Code is an AI coding engine that is broadly similar to Claude Code, which I played around with a few months ago. You can read the linked post, but the summary is that I came away mostly skeptical about AI coding. However, I am not a luddite, and am willing to learn new tools and try things again, so I installed the Visual Studio Code Augment plugin and have been giving AI coding another shot.

What I've learned is that AI coding agents are more useful than I previously gave them credit for, provided you give them small, well-defined jobs. Asking one to do too much, which perhaps I did in my earlier blog post, is not (yet) what it's good at. Augment Code has a few modes. There is a chat box in which you can conversationally interact with the AI, asking it questions or giving it tasks to complete. The agent will also give autocomplete suggestions as you type new code, which I'd say are mostly helpful, but not always. Sometimes the suggestions are just plain wrong, but a few times the suggestions have been subtly wrong, which is dangerous.

Here are some example tasks that Augment Code has been at least 90% successful at:

  • In one of my Python projects, I asked it to create a file that tries to import all the packages used in the project, and output which packages do and do not import successfully. This project uses a few custom & private packages I keep elsewhere, so a requirements.txt file doesn't work with pip. Having a file I can run to quickly check I installed all the packages is useful. It did a pretty good job of this, but it did miss one import from one file, probably because the import wasn't near the top of the file (which is an anti-pattern, but it's there for reasons)
  • It is quite good at adding type hinting to Python projects. You can ask it to "add type hinting to all functions in all Python files in this directory" and it does it
  • I have a PyO3 project that it successfully threaded/parallelized using Rayon. I had to be very specific about how the inputs and outputs would change, but with that it did a good job. It wasn't perfect. Instead of using lightweight vector slices, it was creating new vectors for each chunk of parallel work, which involves copying memory. When I suggested a change, it did a good job fixing that oversight
  • I am an unapologetic user of Mercurial. The rest of the world uses Git. Kind of like Mac and Windows, Mercurial and Git are 99% the same in what you can do with them, but they differ in methods and style. In fact, there is a Mercurial plugin that allows perfect 1:1 interoperability between the two, which I use. Sometimes I don't want to bother installing Mercurial on a temporary virtual instance that already has Git installed, and I want to do something quick in Git. I've asked the AI agent to translate a Mercurial command to Git and it's done a fine job
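The import-checker idea from the first bullet can be sketched roughly like this. This is my own minimal version, not the file Augment Code actually generated; it walks the AST of each file (so it catches imports that aren't at the top, the case Augment missed) and tries importing each top-level module name:

```python
import ast
import importlib
from pathlib import Path

def collect_imports(path):
    """Return the set of top-level module names imported by a .py file."""
    tree = ast.parse(Path(path).read_text())
    names = set()
    for node in ast.walk(tree):  # walks the whole tree, not just module top level
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names

def check_imports(names):
    """Try importing each name; return (ok, failed) sorted lists."""
    ok, failed = [], []
    for name in sorted(names):
        try:
            importlib.import_module(name)
            ok.append(name)
        except ImportError:
            failed.append(name)
    return ok, failed

# Example run against a made-up set of module names.
ok, failed = check_imports({"json", "definitely_not_a_real_package"})
print("ok:", ok)
print("failed:", failed)
```

In practice you'd feed it every .py file in the project via collect_imports and filter out the project's own modules before checking.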

It appears that my company has a (grandfathered) $50/mo/user plan, which has jumped to $60/mo/user for new purchases. I would say that it has saved me enough time to justify that price. The real question is: how much does the service actually cost? The AI industry is spending so much money that "to recoup their existing and announced investments, AI companies will have to bring in $2 trillion (every 2-3 years), more than the combined revenue of Amazon, Google, Microsoft, Apple, Nvidia and Meta." It feels like the early days of Uber, where the fares were subsidized by venture capital, and once most competitors were vanquished, prices went up. To reach $2 trillion in revenue, how high would prices have to go? There are currently a bit over 8 billion humans on earth, which means that over 3 years, the whole AI industry would need to take in about $80 from each and every person on earth per year. That is a ridiculous number and will not be reached any time soon.

My opinion is that AI does have some value, but not nearly as much as it costs in real terms. I'll use it if it works for me. But I won't rely on it.


Janet Jackson - Design of a Decade 1986/1996

Rising to #3, Design of a Decade 1986/1996 is a greatest-hits album by Janet Jackson.

When it comes to the music, I don't care about it. I've heard a few of the songs before, likely contemporaneously, and also since 1995. Janet Jackson had been a big star for years, so of course I've heard some of her music. It's solid pop music, I concede, but it's not for me.

What I find more interesting is the name of the album: Design of a Decade 1986/1996. What does that even mean? I do not believe that Janet Jackson planned (designed) a whole decade of music and recorded it with the target of a greatest hits collection. Also, the title is factually wrong! The album was released in early October 1995, nearly three months before 1996 even started. 1986 to 1995 represents ten years (aka a decade, count it out). Putting 1996 in the title is both illegitimate and numerically incorrect. The only thing it has going for it is alliteration, which isn't much (lots of words start with the same letter as other words). It's a stupid name. Stupid!

My verdict: This album can be left to the dustbin of history.


Green Day - Insomniac

After last week's album, some Green Day is very welcome. Coming off their smash hit Dookie, Insomniac was a return to harder, less poppy punk music. Debuting at #2, it never reached #1 and didn't sell anywhere near as well as Dookie (although few albums do).

I don't know why, but I haven't listened to Insomniac much over the years. I do like punk music, and listening to it now, it's squarely in my wheelhouse of musical taste. I think I'll add it to my rotation and revisit it more often. This is a win, and exactly why I do this project. I suggest that you check out Insomniac, too!


Mariah Carey - Daydream

Sigh, it's another Mariah Carey album. I'm pretty sure this is the third Mariah Carey album I've done. This is gonna hurt.

Rocketing to #1 upon release, Daydream had three singles reach number one on the US Billboard Hot 100 list. Because of the popularity of the songs off this album, I certainly recall hearing them on the radio.

I don't miss hearing the songs as frequently on the radio, and I didn't appreciate hearing them again. I get that Mariah Carey is very popular and successful, but I will be happy to never review one of her albums again. It's not for me.


AC/DC - Ballbreaker

Ballbreaker by AC/DC debuted at #4 on the album sales chart.

There's nothing particularly special about this album. None of the songs are amazing; the lead track, Hard as a Rock, was the best-performing single and only hit #33 in the US. That being said, none of the songs are terrible, which across a whole album is somewhat unusual. That's probably because the band had over 20 years of experience.

As far as it goes, the market agrees with me. Next week the album drops to #9, then #12, and so on. This is a common pattern for older, established acts on the charts. They sell well to start, but there's no long tail of people discovering them to keep album sales high.

If you're into AC/DC, check it out, but don't expect much.


Tim McGraw - All I Want

Debuting at #4, All I Want is the second Tim McGraw album I've reviewed. I closed my previous review with the sentence "Finally, I will not be listening to this album again," and I have stuck to that promise.

In a very real sense, All I Want is really no different than Not A Moment Too Soon. The second line of the first song is "Finally own a car that doesn't break down on the freeway." He wasn't pushing any musical boundaries with this release. He had a formula that worked well 17 months prior, and he stuck to it. I'll begrudgingly admit that the biggest hit off the album, I Like It, I Love It, is good fun.

Overall, my opinion of this hasn't changed, and like before, I will not be listening to this album again.


Red Hot Chili Peppers - One Hot Minute

It's been a while since I reviewed an album by any artist(s) I actually like, so reviewing a Red Hot Chili Peppers album is a big relief. As of this writing they are #8 in my ranking of listens by artist.

One Hot Minute (the #4 album this week) was released four years after RHCP's best album, Blood Sugar Sex Magik, which was always going to be a tough act to follow. One Hot Minute has never been my favorite RHCP album. It isn't bad at all, but compared to Blood Sugar Sex Magik it was always going to pale. Interestingly, the internet ranks One Hot Minute all over the place, from worst, to fourth, and spots in between. I'm not knowledgeable enough about the RHCP corpus of work to say where One Hot Minute fits.

There were a few decent hits off of the album, including Warped, My Friends, and Aeroplane, all of which I remember hearing contemporaneously on the radio, and sometimes still do on "classic rock" stations (a thought that makes me feel old!).

This album is probably worth your time, as are all RHCP albums.


Silverchair - Frogstomp

I think there is only one song off of this week's #9 album by Silverchair that I've heard before. Tomorrow off of Frogstomp got a decent amount of airplay when it came out. The rest of the album seemed new to me, and my last.fm history pretty much confirms it.

Silverchair definitely rode the tails of the Grunge wave, and they did a good enough job of it to sell a couple million copies of Frogstomp. I don't think I will increase my Silverchair consumption a great deal, but perhaps I'll think of it more often when I'm in the mood for some Grunge. You might consider the same!