Supercomputers
Faithful readers will remember me posing with my favorite supercomputer about a year ago. Datastar is going to be turned off in a few months. When it was turned on three years ago, it was the 35th fastest computer in the world; it has since slipped to 473rd. Despite the fact it's no longer the fastest thing around, it works wonderfully, and as I write this, there are at least sixty people logged onto this machine. Everyone I know loves Datastar and wishes it weren't going to be turned off. I am starting to move my work and attention to the newer machines. They are faster and have many more processors, which makes queue times short (queue time being how long a job I request waits before it runs).
A few months ago, Ranger was turned on. It is a Sun cluster in Texas with 63,000 AMD CPU cores. It is currently ranked fourth fastest in the world. Datastar has only 2528 CPUs (but those are real CPUs, while Ranger has multi-core chips, which in reality aren't as good). By raw numbers, Ranger is an order of magnitude better than Datastar, except that Ranger doesn't work very well. Many different people are seeing memory leaks using vastly different codes. These codes work well on other machines. I have yet to be able to run anything at all on Ranger. For all intents and purposes, Ranger is useless to me right now.
If you look at the top of the list of supercomputers, you'll see that a machine called Roadrunner is the fastest in the world. Notice that it is made up of both AMD Opteron and IBM Cell processors. The Cell processor is the one inside the PlayStation 3. Having two kinds of chips adds a layer of complexity, which makes the machine less useful. The Cell processor is a vector processor, which is only awesome for very specially written code. The machine is fast, but it's also highly unusable. I don't have access to it because it's a DOE machine, but a colleague has tried it and says he got under 0.1% of peak theoretical speed out of it. Other people were seeing similar numbers. No one ever gets 100% from any machine, but 0.1% is terrible.
Computers two and three on the list are DOE machines, so I don't have access to them. On the near horizon is a machine called Kraken, in Tennessee. It's being upgraded right now, but when it's complete it will be very similar to, but faster than, Jaguar, which is currently the fifth fastest computer on the list. It is a Cray XT4 that runs AMD Opteron chips. I got to use Kraken recently while it was still an XT3, and it was awesome. Unlike Ranger, it actually works. As an XT4, it should be even faster than Ranger. It will also have a great tape backup system, unlike Ranger.
I am predicting that Kraken will become my new favorite supercomputer, replacing Datastar. However, I think it's a shame that Datastar is being turned off even though it's still very useful and popular. When it's turned off to make way for machines like Ranger and Roadrunner(*), that's just stupid.
(*) The pots of money for Ranger, Datastar and Roadrunner are different, but you get the point. Supercomputers aren't getting better; in some cases, they're getting worse!
Hawaiian GT-40
Parked in front of my building this morning is a Hawaiian Ford GT, which retails for the suggested price of $139,995. Someone has to be serious about their car to bring it all the way from Hawaii. They even have driving gloves that match the seat belts.
This is the first time I've seen one parked; I had never realized how deep the vents behind the radiator are in the hood.
Like all expensive cars, the motor has its own window.
My Lacerated Digit
Be careful around those sharp coconut milk can tops, or else you could cut your finger. Then you'd have to go to the emergency room to get your finger glued and a tetanus shot.
I'm just saying, I think it's best if you take my word for it.
Music History Graph
Above is a small part of my music listening history as reported to last.fm over the last year and a half. Time is plotted left to right, overall number of tracks by the width of the shape, and the colors represent individual artists. I used LastGraph3, which if given your user name will make a set of graphs from your data.
If you click on the image above, you'll see the full history. It looks like I go through periods where I listen to a fair bit of music, and then stop, and start again. I think there's a fair amount of smoothing of the data. I think my history would look even more jagged without smoothing.
I like plots like this because they show multidimensional data using colors and shapes in an intelligent way. Of course the classic example is Minard's famous depiction of Napoleon's 1812 Russian campaign. I think everyone should have to learn how to make good plots, and understand how to read one. When I was a TA, I constantly had to remind students of the point of making graphs - I think nearly all of them felt it was busy work rather than a way to organize and visualize data; a way to recognize a physical effect.
Just like significant figure errors (I am bothered enough by those to contact newspaper reporters: I've done it in the past), I cringe at the sight of misleading or poorly organized graphs. The worst offenders tend to use Excel, whose plots are instantly recognizable as probably being garbage. I also dislike the USA Today charts and many plots seen on the various network evening news shows. Too much artistic influence from graphic artists (no offense K.P.!), and not enough substance.
New Hosting Service, Again
Two and a half years ago I moved my website off my father's computer at home to Site5. For a while it was great, especially compared to serving a website over a cable modem connection. However, over the last year or two it's gotten progressively worse, something I discussed in this post about a year ago. Also over a year ago, Site5 promised to move everyone to new servers. It hasn't happened, and my service has gone steadily downhill.
My first two-year prepaid period with Site5 was up in December last year, and I seriously thought about moving. I looked at other shared hosting companies, but I felt I would probably have the same problems on a new shared host. I looked into hybrid solutions, but those didn't seem a guaranteed improvement either. I liked the idea of Virtual Private Servers (VPS), but I couldn't find one with enough disk space in my budget.
A few months ago, my lab mate Rick pointed me towards s3fs, which intrigued me. s3fs puts your data on Amazon S3, but allows the data to appear to be local to the server, like another hard drive. You pay for only what you use with S3, and it has virtually unlimited space. Suddenly, a VPS hosting solution fit into my budget. I could pay for a VPS with less disk space than I needed, but still get the power of VPS. It was also an upgrade because now my family and I could upload as much data as we wanted, and it would be much more secure from disk failure than before.
This website and other sites that were on the old server are now being hosted on a machine from linode.com. I'm using their lowest option, which has 10GB of space. I installed Ubuntu Hardy Heron, which seems like a solid Linux distribution. s3fs has proven to be reliable and fast enough, although it's much slower than having the data on a local disk. Using Apache rewrites, my father and I have made it such that when a web browser asks for items on a page that exist on S3, the request goes straight there instead of through this server, which saves lots of time. I've also figured out how to shoehorn Gallery2 into using S3.
So far I am very happy with the new server.
A Rainbow over Mesa
Today it rained nearly all day long. As the sun was setting I saw this rainbow over the Mesa Apartments. I looked for gold - no luck. So I'm staying in grad school.
Ride History Mashup
Below are some mashups showing the frequency density of where I have ridden my bikes in the last three years. (I never use my GPS on the track, so velodrome riding is not represented here. And besides, that isn't very interesting.) The circles on the maps represent a place I have passed through, and the color how many times. Red means many, perhaps as many as 5000 times for the area near my apartment, and blue means few, as few as once. That 5000 doesn't mean I've done 5000 rides, it means that there are 5000 GPS waypoints in the 100 meter radius circle around that particular point. As waypoints are recorded closer than 100m apart, the same ride could have multiple waypoints inside each circle. Also note that the circles on the map are much larger than 100m.
Click on each for a larger view.
I made a Google Earth KMZ file containing all the points. If you open it, be patient as it will take a bit of time to load. Download it here.
I'm planning on posting the code here, as I think other people might enjoy this fun bit of code. But I want to clean it up a bit before I make it public.
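In the meantime, here is a minimal sketch of the counting idea described above. It is not the code behind these maps. It assumes waypoints are plain (latitude, longitude) pairs in degrees and brute-forces a haversine distance check for every pair; the function names and the sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_nearby(center, waypoints, radius_m=100.0):
    """Count the waypoints that fall within radius_m of center."""
    return sum(1 for w in waypoints if haversine_m(center, w) <= radius_m)


# Hypothetical waypoints: three close together and one a few km away.
waypoints = [
    (32.8700, -117.2500),
    (32.8702, -117.2501),
    (32.8705, -117.2500),
    (32.9000, -117.2500),
]

# One count per waypoint; a large count would be drawn red, a count of 1 blue.
density = [count_nearby(w, waypoints) for w in waypoints]
```

Checking every pair like this gets slow once there are many thousands of waypoints, so a real version would probably bin the points on a grid first.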
Uncle Dubya’s Money
I finished my taxes over the weekend and I discovered that in 2007 I made a whopping three dollars more than in 2006. Three dollars is three dollars, right? Not so fast: according to this page, I actually made an inflation-adjusted 5% less in 2007 than in 2006. Good times for grad students!
Actually, it's not quite that simple. In 2006 I was paid more for a higher-earning TA position (66% vs. 50%, for those of you in-the-know), and in 2007 my investments covered that difference. So unless the economy collapses (it's possible!) I can count on my investments continuing to pay off in the long run.
TurboTax told me that I can expect a $600 check from the Treasury. I am sophisticated enough to realize that money is never free. As the government is already running a huge debt, this largesse is just more debt. And who gets to pay for this? Why, I will! Depending on the next few congresses and presidents, it could be sooner or later, less or more, but in the end the $600 is not going to be free to me.
So I give my dear readers a list of what I can do with Uncle Dubya's Money:
- Save/invest the money to prepare for the eventual increase in taxes that has to come.
- Buy some more RC helicopter stuff. I haven't flown my helicopter since December, mostly because I've been forcing myself to focus on school, but also to keep myself from spending too much money on it. With this money I could upgrade the parts, buy a bunch of spares, or buy a whole new toy! Like a nitro-powered RC car.
- Buy a flat-screen television. But then I'd have a nice TV, and I could watch what on it? I'd have to pay for cable or satellite, or buy a Blu-Ray player.
- Convince Melissa to use her $600 with mine and we could do something really fun, like a trip to Europe. Except then we'd be spending all our money on either airfare or in a foreign country, which isn't the point of the money, right? And what with today's weak dollar, $1200 buys approximately a nice dinner in Europe. At McDonald's. Maybe Burger King.
- I think now is the time to enter the housing market. The $600/$1200 could go for a down-payment on a nice condo here. And in a couple years we can sell it for huge profits due to the raging San Diego real estate market. A good idea? Maybe not.
Anza Borrego Desert State Park
I've posted some photos from Anza Borrego Desert State Park. Melissa and I drove out to see the wildflowers and go for a hike. As you can see above, both of my hiking boots lost their soles. I had them re-soled a few months ago and hadn't used them since. They started falling apart only a little bit into the hike and we shortened the walk when it became clear the damage was progressing. For the last third I had to high-step it to keep the front of the soles from catching on the ground. One sole fell apart completely meters from the car. Happily, I didn't drive in my hiking boots so I had another pair to wear the rest of the day.
We also sorta kinda saw a bighorn sheep on the hillside. In the photo the sheep's body is facing you with its head turned to the left (your left). It was hard to see it in the shadow of the rock; I didn't see it until Melissa pointed it out at home. I took the photo on blind faith that everyone wasn't lying to me.
Glider Port
Today is especially windy, so we went out to the glider port to see what was flying. We found a bunch of model airplane gliders and one hang glider, seen above. If I wanted another expensive hobby, hang gliding looks awfully fun!
OpenMP
I'm all about graphs lately!
The graph above shows the speedup that a few OpenMP statements can give with very little effort. OpenMP is a simple way to parallelize a C/C++ program so that it runs on many processors at once. However, unlike MPI, which can run across many different machines (like a cluster), an OpenMP program can only run on one computer at a time. Since most new machines have multiple processors (or cores), OpenMP is quite useful.
I've added a couple dozen OpenMP statements to the code I'm working on. The blue line shows how long (in seconds) it took me to run a test problem on between one and 32 processors. The green line shows the speedup: the single-processor run time divided by the run time on that many processors. It is very typical of parallel programs that the speedup isn't linear and flattens out at high thread counts. This small test problem deviates from linear at 16 processors; when I do a real run (which will be much larger and the parallelization more efficient) I may see nearly linear speedups all the way to 32 processors.
I think it's pretty neat how with very little effort I was able to significantly speed up my code.
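For anyone who wants to make the same kind of plot, here is a small sketch of how the speedup ratio can be computed from a set of timings. The numbers are invented for illustration and are not the measurements from my graph; the `speedup` and `efficiency` helpers are just names I picked here.

```python
def speedup(times):
    """Map thread count -> speedup relative to the single-thread time."""
    t1 = times[1]
    return {n: t1 / t for n, t in times.items()}


def efficiency(times):
    """Speedup divided by thread count; 1.0 would be perfectly linear scaling."""
    return {n: s / n for n, s in speedup(times).items()}


# Hypothetical wall-clock times in seconds, NOT the data from the graph above.
times = {1: 320.0, 2: 160.0, 4: 80.0, 8: 40.0, 16: 25.0, 32: 20.0}

s = speedup(times)
e = efficiency(times)
for n in sorted(times):
    print(f"{n:2d} threads: speedup {s[n]:5.2f}, efficiency {e[n]:4.2f}")
```

With these made-up timings the speedup is linear through 8 threads and then flattens, which is the general shape described above.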
2008 Tour of California
...or rather, Tour of the Bay Area, Central Coast and North of L.A.
2006, 2007, 2008 route maps
In the three editions of the ToC so far, the closest it has come to San Diego is Long Beach, basically 2 hours away. That stage was on the fairly boring Long Beach Grand Prix course. This year, the first four (of a total eight) stages are within two hours of the central Bay Area. Each is interesting. The closest stage this year to La Jolla is over two hours away, if the traffic is good in LA (ha!).
I am forced to wonder if the northward tilt of the ToC makes good business sense. While the Bay Area is quite large at over 7 million people, the greater Los Angeles area and San Diego County together account for nearly 20 million people. Perhaps cycling is more popular in Northern California, but it would have to be roughly three times as popular per capita to make business sense. Additionally, the weather is generally better in Southern California, which would make the riders happier and the spectator turnout higher. Furthermore, Amgen, the title sponsor, is headquartered in Thousand Oaks.
I certainly don't think a race through downtown LA or San Diego is practical, but there are many roads in both areas that would make an excellent part of the race. I should know: I have ridden my bike on many roads I could recommend to the race organizers.
I will follow the race all the same, but I wish the organizers would bring the race near me at least one of these years (before the race evaporates, like every major American race eventually does).
Monte-Carlo Whoopass
Don't worry about the physical meaning of the two plots below:
Taken from Baldry et al. (2004), figure 3 (plot 7).
My plot of entirely fake data that means almost nothing.
Just notice that the two peaks are pretty much in the same places on both graphs, 1.5 and 2.2. The first graph shows physical data (stars) and a double-Gaussian fit (light solid line). The second graph is the result of my using Monte-Carlo fitting to make entirely fake data using the first curve. The real graph has over 10,000 items to make that smooth distribution, while with only about 100 items Monte-Carlo is already starting to look like the real thing. Of course, it will take many more items to capture the smoothness and the "long tail" on each end.
I just wanted to share because the whole thing I wrote, which includes a simple function integration (for normalization), worked on my first try.
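For the curious, here is a rough sketch of the general idea, not the code I actually wrote. It assumes rejection sampling as the Monte-Carlo method and a trapezoid rule for the normalization integral. Only the peak positions (1.5 and 2.2) come from the plots; the widths, the relative weight, the range and every function name are made up.

```python
import math
import random


def double_gaussian(x, mu1=1.5, sigma1=0.3, mu2=2.2, sigma2=0.15, weight=0.6):
    """Unnormalized sum of two Gaussian bumps (widths and weight are made up)."""
    g1 = math.exp(-0.5 * ((x - mu1) / sigma1) ** 2)
    g2 = math.exp(-0.5 * ((x - mu2) / sigma2) ** 2)
    return weight * g1 + (1.0 - weight) * g2


def integrate(f, lo, hi, steps=1000):
    """Trapezoid-rule integral of f from lo to hi."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps))
    return total * h


def sample(f, lo, hi, f_max, n, rng):
    """Draw n values from the density proportional to f by rejection sampling."""
    out = []
    while len(out) < n:
        x = rng.uniform(lo, hi)
        # Keep x with probability f(x) / f_max.
        if rng.uniform(0.0, f_max) <= f(x):
            out.append(x)
    return out


lo, hi = 0.0, 3.5
norm = integrate(double_gaussian, lo, hi)


def pdf(x):
    """The double Gaussian scaled so it integrates to 1 over [lo, hi]."""
    return double_gaussian(x) / norm


rng = random.Random(42)  # fixed seed so the fake data is repeatable
# Upper bound on pdf: the maximum over a fine grid, padded by 1%.
f_max = 1.01 * max(pdf(lo + i * (hi - lo) / 1000) for i in range(1001))
fake = sample(pdf, lo, hi, f_max, 100, rng)
```

Histogramming `fake` gives the kind of lumpy two-peaked plot shown above. Raising the sample count from 100 smooths it out and starts to fill in the tails.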