Setting Hadoop Node Types on AWS EMR

Amazon Web Services (AWS) offers dozens (if not over one hundred) different services. I probably use about a dozen of them regularly, including Elastic MapReduce (EMR), which is their platform for running big-data things like Hadoop and Spark.

EMR runs your work on EC2 instances, and you can pick which kind(s) you want when the job starts. You can also pick the "lifecycle" of these instances. This means you can pick some instances to run as "on-demand" where the instance is yours (barring a hardware failure), and other instances to run as "spot" which costs much less than on-demand but AWS can take away the instance at any time.

Luckily, Hadoop and Spark are designed to work on unreliable hardware, and if previously done work is unavailable (e.g. because the instance that did it is no longer running), Hadoop and Spark can re-run that work. This means that you can use a mix of on-demand and spot instances that, as long as AWS doesn't take away too many spot instances, will run the job for lower cost than otherwise.

A big issue with running Hadoop on spot instances is that multi-stage Hadoop jobs save some data between stages that can't be redone. This data is stored in HDFS, which is where Hadoop stores (semi)permanent data. Because we don't want this data going away, we need to run HDFS on on-demand instances, and not run it on spot instances. Hadoop handles this by having two kinds of "worker" instances: "CORE" instances that run HDFS and have the important data, and "TASK" instances that do not run HDFS and store only easily reproduced data. Both types share in the computational workload; the difference is what kind of data is allowed to be stored on them. It makes sense, then, to confine "TASK" instances to spot nodes.

The trick is to configure Hadoop such that the instances themselves know what kind of instance they are. Figuring this out was harder than it should have been because AWS EMR doesn't auto-configure nodes to work this way; the user needs to configure the job to do this, including running scripts on the instances themselves.

I like to run my Hadoop jobs using mrjob, which makes developing and running Hadoop jobs with Python easier. I assume that this can be done outside of mrjob, but it's up to the reader to figure out how to do that.

There are three parts to this. The first two are Python scripts that are run on the EC2 instances, and the third is modifying the mrjob configuration file. The Python scripts should be uploaded to an S3 bucket because they will be downloaded to each Hadoop node (see the bootstrap actions below).

With the changes below, you should be able to run a multi-step Hadoop job on AWS EMR using spot nodes and not lose any intermediate work. Good luck!

make_node_labels.py

This script tells YARN which kinds of instances (node labels) are available. It only needs to run once.

#!/usr/bin/python3
import subprocess
import time

def run(cmd):
    proc = subprocess.Popen(cmd,
        stdout = subprocess.PIPE,
        stderr = subprocess.PIPE,
    )
    stdout, stderr = proc.communicate()

    return proc.returncode, stdout, stderr

if __name__ == '__main__':

    # Wait for the yarn command to be installed. `which` exits non-zero
    # until the command is on the PATH.
    code, out, err = run(['which', 'yarn'])
    while code != 0:
        time.sleep(5)
        code, out, err = run(['which', 'yarn'])

    # Now we wait for things to be configured
    time.sleep(60)

    # Now set the node label types. No shell is involved, so the argument
    # is passed as-is and must not carry its own quote characters.
    code, out, err = run(["yarn",
        "rmadmin",
        "-addToClusterNodeLabels",
        "CORE(exclusive=false),TASK(exclusive=false)"])

get_node_label.py

This script tells Hadoop what kind of instance this is. It is called by Hadoop and run as many times as needed.

#!/usr/bin/python3
import json
k='/mnt/var/lib/info/extraInstanceData.json'
with open(k) as f:
    response = json.load(f)
    print("NODE_PARTITION:", response['instanceRole'].upper())
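To try the label logic away from a cluster, it can be wrapped in a function that takes the file path as an argument. This is only a sketch: the names node_partition_line and demo are hypothetical, the sample JSON is made up, and the only thing assumed about the real file is the instanceRole key that the script above reads.

```python
import json
import os
import tempfile


def node_partition_line(path):
    """Return the line that get_node_label.py would print for this file."""
    with open(path) as f:
        info = json.load(f)
    # Same field as the script above; upper-cased so it matches the
    # CORE/TASK labels registered by make_node_labels.py.
    return "NODE_PARTITION: " + info["instanceRole"].upper()


def demo():
    # Hypothetical stand-in for /mnt/var/lib/info/extraInstanceData.json
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump({"instanceRole": "task"}, f)
    try:
        return node_partition_line(f.name)
    finally:
        os.remove(f.name)


if __name__ == "__main__":
    print(demo())
```

Running a check like this before uploading the script to S3 is cheaper than finding a typo after the cluster has started.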

mrjob.conf

This is not a complete mrjob configuration file. It shows the essential parts needed for setting up CORE/TASK nodes. You will need to fill in the rest for your specific situation.

runners:
  emr:

    instance_fleets:
    - InstanceFleetType: MASTER
      TargetOnDemandCapacity: 1
      InstanceTypeConfigs:
      - InstanceType: (smallish instance type)
        WeightedCapacity: 1
    - InstanceFleetType: CORE
      # Some nodes are launched on-demand which prevents the whole job from
      # dying if spot nodes are yanked
      TargetOnDemandCapacity: NNN (count of on-demand cores)
      InstanceTypeConfigs:
      - InstanceType: (bigger instance type)
        BidPriceAsPercentageOfOnDemandPrice: 100
        WeightedCapacity: (core count per instance)
    - InstanceFleetType: TASK
      # TASK means no HDFS is stored so loss of a node won't lose data
      # that can't be recovered relatively easily
      TargetOnDemandCapacity: 0
      TargetSpotCapacity: MMM (count of spot cores)
      LaunchSpecifications:
        SpotSpecification:
          TimeoutDurationMinutes: 60
          TimeoutAction: SWITCH_TO_ON_DEMAND
      InstanceTypeConfigs:
      - InstanceType: (bigger instance type)
        BidPriceAsPercentageOfOnDemandPrice: 100
        WeightedCapacity: (core count per instance)
      - InstanceType: (alternative instance type)
        BidPriceAsPercentageOfOnDemandPrice: 100
        WeightedCapacity: (core count per instance)

    bootstrap:
    # Download the Python scripts to the instance
    - /usr/bin/aws s3 cp s3://bucket/get_node_label.py /home/hadoop/
    - chmod a+x /home/hadoop/get_node_label.py
    - /usr/bin/aws s3 cp s3://bucket/make_node_labels.py /home/hadoop/
    - chmod a+x /home/hadoop/make_node_labels.py
    # nohup runs this until it quits on its own
    - nohup /home/hadoop/make_node_labels.py &

    emr_configurations:
      - Classification: yarn-site
        Properties:
          yarn.node-labels.enabled: true
          yarn.node-labels.am.default-node-label-expression: CORE
          yarn.node-labels.configuration-type: distributed
          yarn.nodemanager.node-labels.provider: script
          yarn.nodemanager.node-labels.provider.script.path: /home/hadoop/get_node_label.py

Finale for last.fm

Since 2006 I've engaged in a bit of navel-gazing and tracked my music listening using last.fm. By running an application on your computer or phone, or linking a streaming service to last.fm (e.g. Tidal and Spotify support this), the service keeps track of what songs you listen to. The act of recording a song listen is called a "scrobble." However, it's always kind of bugged me that you can't scrobble in all situations, like when listening to an internet radio station or terrestrial radio.

Finale

Recently I discovered the Finale app that does exactly this: it not only allows you to manually enter songs, it also includes a Shazam-like ability to listen to songs, identify them for you, and, if you like, scrobble them. It has a "continuously listen" mode which, despite the name, means that once a minute it wakes up, listens for any new song you're listening to, and scrobbles it for you.

Somewhat related, I would like to recommend The Colorado Sound, a music radio station operated by a local NPR affiliate. It's as commercial-free as NPR is (limited to "supported by" messages) and has a very wide and eclectic selection of music. Using the Finale app while I listen to their internet stream allows me to keep track of what they're playing, and if I hear something I like, I can revisit that song/artist. You should check it out!


Solar Water Monitoring Page

Water Heater Monitoring

Today I just published a page that monitors my solar water heater. Every fifteen minutes the Raspberry Pi that controls the system pushes updated metrics to this website and triggers a refresh of the plots on the page. There's a diagram and description of how the whole system works. Go ahead and check it out!


Bye Bye Bitbucket

A few days ago Bitbucket announced that they were ending support for Mercurial and will only support Git starting in about a year. I am a happy Mercurial user. It was the first DVCS I learned, and I have had no reason to switch away from it. Git is more popular than Mercurial, yes, but with useful plugins like hg-git, I never need to use Git even when I am working on a Git repository. Ultimately, the difference between Git and Mercurial is kind of like the difference between Windows and Mac (in that order). One is more popular than the other, but when it comes down to featureset and capabilities, it's very close. Like Windows vs. Mac, the main difference is the experience of using them, and I feel Mercurial is the clear winner.

I only use Bitbucket for their Mercurial support, so I decided to waste no time and abandon them entirely and immediately. In their place I am self-hosting an install of the community (open source) edition of Rhodecode. There are a couple of my repositories publicly viewable here. So far it seems like it has more than enough features to replace Bitbucket.

Update: As of this writing, I turned my self-hosted code server off because I'm lazy.

I feel I should point out that this serves as a reminder to me (and to you, dear reader) that cloud services cannot be relied upon. Companies, even if you are paying them, can stop serving you. This is at least the second time I have had to replace a cloud service with a self-hosted solution. A few years ago I replaced Google Reader with tt-rss. Like I wrote previously, because my new Mercurial hosting service is self-hosted, no one will be taking it away from me. And that is good.

Finally, I don't know why anyone would choose Bitbucket now to host their code repositories. Without Mercurial, all it has is Git, and Github (which has obviously been Git-only from the start) has already won:

Github

Bitbucket


State Fair Trip Planning

Some time ago I saw this blog post where Randal S. Olson used a genetic algorithm to compute an optimized path to visit various landmarks across the United States. I have used genetic algorithms professionally for a few things, and I found this application very fun and clever. As part of the blog post, Mr. Olson linked to the Jupyter notebook he used to calculate the optimized road trip. I am a heavy user of Python and the Jupyter notebook, so this intrigued me.

I got the idea to try to apply this kind of thinking to the challenge of visiting the various state fairs that typically happen in the mid- to late-summer across the US. In fact, I got the idea to do this before the summer started. Indeed, some of the fairs analyzed below have already ended. But I have a good excuse! My second child was born very recently, which has decreased my free time and increased the rate of brain cell death due to lack of sleep. These methods can be applied to future years by simply modifying the start/end dates where all the fairs are listed (in the notebook). Below are the results of my investigations.

The code behind these results can be found here.

Rules

  • We only attend fairs in whole-day chunks. In reality, it's possible to arrive in town and attend the fair on the same day. But to keep this analysis simpler, we'll assume that travel days and fair attendance days are separate.
  • We drive 12 hours a day. This is obviously something that comes down to personal preference, but I feel that 12 hours is fairly doable if you have more than one driver. Besides, who wants to attend all these fairs by themselves? What this rule means is that if it takes 12 hours and one second to get from point A to point B, the algorithm will count it as a two-day drive. This is obviously not realistic, but it keeps things simple.
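The 12-hour rule is just a ceiling division. Here is a minimal sketch of it; the function name travel_days is my own for this illustration and is not taken from the notebook.

```python
HOURS_PER_DAY = 12  # daily driving limit from the rule above
SECONDS_PER_DAY = HOURS_PER_DAY * 3600


def travel_days(drive_seconds):
    """Whole travel days needed, rounding any partial day up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-drive_seconds // SECONDS_PER_DAY)


if __name__ == "__main__":
    print(travel_days(12 * 3600))      # exactly 12 hours
    print(travel_days(12 * 3600 + 1))  # 12 hours and one second
```

A drive of exactly 12 hours counts as one day, while one extra second tips it into a second day, which is the unrealistic-but-simple behavior described above.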

When Do Fairs Happen?

Below is a Gantt chart that shows when the various fairs happen. The state name is on the y-axis, and time is on the x-axis. Except for Florida and Nevada, all the fairs happen in a fairly congested time frame between July and November.

(Figure: Gantt chart of state fair dates, with the state name on the y-axis and time on the x-axis.)

Quick note: As can be seen above, the Florida and Nevada state fairs take (took) place far earlier in the year than all the other fairs. Attending them doesn't conflict with any other fair, so I leave them out of my analyses until the end, or I just don't include them.

How to Spend the Most Days at a State Fair

I'm first going to answer the question of how to spend the most days at a state fair in 2016. This is the way to eat the most fried things, ride the most rides, and see the most hog races in one summer. The winning strategy here is to get to a state fair and not leave it until you have to (e.g. it ends, or it's advantageous to leave for another fair), and also spend the least amount of time on the road between fairs.

For this analysis, we will not use a genetic algorithm, but rather a directed graph (digraph). This kind of graph is a network of nodes and edges, where the edges link nodes in an allowed direction of travel. In this case, the nodes will be fair attendance days, and the edges will be transitions. Some of the transitions will simply return to the same fair for consecutive attendance days, and other transitions will be travel to a new fair that covers one or more actual days.
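To make the digraph idea concrete, here is a toy sketch. It is not the notebook's code: the three fairs, their open-day ranges, and the travel times are all made up, and the names successors and most_days are hypothetical. Because every edge moves forward in time, the graph has no cycles, so the longest path can be found with a memoized recursion.

```python
from functools import lru_cache

# Hypothetical toy data: first and last open day of each fair (inclusive),
# numbered from day 0, and whole travel days between each pair of fairs.
FAIRS = {"A": (0, 3), "B": (2, 6), "C": (5, 9)}
TRAVEL = {("A", "B"): 1, ("B", "A"): 1, ("B", "C"): 2, ("C", "B"): 2,
          ("A", "C"): 3, ("C", "A"): 3}


def successors(fair, day):
    """Edges out of the node (fair, day): stay another day, or travel."""
    first, last = FAIRS[fair]
    if day + 1 <= last:
        yield fair, day + 1  # consecutive attendance day at the same fair
    for other, (o_first, o_last) in FAIRS.items():
        if other == fair:
            continue
        # Travel days fall between attendance days; if the other fair has
        # not opened yet, wait for its first day.
        arrival = max(day + 1 + TRAVEL[fair, other], o_first)
        if arrival <= o_last:
            yield other, arrival


@lru_cache(maxsize=None)
def most_days(fair, day):
    """Most attendance days on any path that starts at (fair, day)."""
    return 1 + max((most_days(*nxt) for nxt in successors(fair, day)),
                   default=0)


# Starting on a fair's first day is never worse than starting later.
best = max(most_days(fair, FAIRS[fair][0]) for fair in FAIRS)

if __name__ == "__main__":
    print(best)
```

In this toy case the ten calendar days shrink to seven attendance days, because one travel day is lost between A and B and two between B and C.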

After a bunch of math and stuff (see the notebook!) we end up with 128 days that we can attend a fair. The diagram below describes the strategy of how to do this. Here's how to follow the diagram:

  1. Because they are temporally isolated, Florida and Nevada are to be attended first. We attend each fair for their full duration.
  2. Next we go to California. We see that there is an edge that goes back to California, and one that goes to Ohio. What this means is that we should stay in California as long as we can or want to, but we should travel directly to Ohio in time for that fair to be open.
  3. Likewise for Ohio and Indiana, we should stay at those fairs as long as we can/want to, and then travel to the next fair while it is still running.
  4. For Kentucky, after we have spent enough days there, we have an option of going to Minnesota or Nebraska. Whichever way we go here determines what fairs we can attend later on. For example, if we go to Nebraska, we can't go to New Mexico.
  5. Follow the rest of the tree until we end up in Louisiana.

Most Days on the Road

What if you really like driving, but only if you have a destination? Are you some kind of weirdo? Let's see how we can attend state fairs all summer, but spend most of our time driving.

Doing some analysis similar to the above, we come up with a method that puts us at state fairs for only 43 days. Note that I'm not including Florida or Nevada in this count. This method has added 69 extra days of driving!

The strategy diagram below shows how to maximize driving time. It is quite a bit more complicated than the one above. It shows (as might be expected) that the optimal strategy here is to travel between distant fairs as much as possible, even going back and forth between fairs that are open at the same time. If you can travel to another fair far away, do it! For example, near the top there are links between Delaware and North Dakota going both ways. This means that to maximize traveling time, we should go to the Delaware fair, then North Dakota, and then return to Delaware again. After that we may travel back to North Dakota (if it's still open), or head over to Montana or Maine. Inspecting the diagram we'll notice some fair pairs that are very distantly separated (OR<->AK, WA<->TN), which again, makes sense if we are trying to maximize driving time. Also notice that out of the many possible paths, the WY->RI->AK segment is always the best strategy, which is hardly surprising, considering how long of a trip just those two segments are.

Whether or not this kind of strategy is a good idea is a very good question. But here it is. I dare anyone to try it!

Most Unique Fairs in One Summer (with the Lowest Driving Time)

This is the question of how to attend the greatest number of unique fairs in a single summer. What if we want to sample the character of as many fairs as possible? Which state fair has the best fried food? How might we go about that?

The work by Mr. Olson that inspired me to play around with state fairs used a genetic algorithm to determine a fairly efficient way to visit landmarks across the US. After much thought, I decided that a genetic algorithm would not work (easily) for this question. The reason is that the algorithm does various mutations, substitutions, and combinations that would be difficult to implement when time-ordered events limit choices.

For example, let's say we have a short segment in our path:

CA->DE->OH

and the genetic algorithm randomly decides to modify the Delaware step to a new fair. It can't just pick any fair because the new fair has to be open around the time of the California fair. The new fair also has to be located such that we can travel to Ohio afterwards. Chances are that most new fair modifications wouldn't work at all.

Maybe the algorithm could replace the Delaware fair with a fair that works for California but not Ohio, and then randomly pick the next fair (or fairs) to replace Ohio (and the ones after that), fixing the chain until there's no more conflict. To me, that is not much different from just brute-forcing the problem, because this would effectively destroy much of the individual's genome and would likely reduce its fitness.

This problem is probably as hard as the Traveling Salesman Problem. There are some differences between our problem here and the TSP, but my gut tells me that the two problems are similarly difficult:

  • We do not need to visit all the possible states. For example, it's reasonable to expect that visiting Alaska will be very unlikely in any route that optimizes the number of unique fairs visited due to its prohibitive travel time.
  • It's okay to visit a state fair more than once, most likely on consecutive days, but there's no rule that prevents us from coming back later if there are no other fairs we haven't already visited. This is a good thing: it means more fried food!
  • Fairs are strictly ordered, which limits our available choices at any given time. It also requires us to visit a fair at a certain time which might limit some other options we might have pursued further down the line.

As daunting as this all sounds, a partially-random brute force method is easy to implement. Running it for five minutes gives us the recommended schedule below (in "State + Date" format), which results in visiting 38 state fairs (or 40 if we throw in Florida and Nevada). There's no guarantee that 40 states is the most that we can travel to in one calendar year; only a comprehensive search can prove that (which is a very, very expensive problem to solve). However, considering the size of our nation, and that there are two fairs that are impractical to attend (Alaska and Hawaii), and one that doesn't exist (Pennsylvania), 40/47 isn't bad at all.

'CA+2016-07-16',
 'DE+2016-07-21',
 'ND+2016-07-25',
 'OH+2016-07-28',
 'MT+2016-08-01',
 'WI+2016-08-04',
 'ME+2016-08-07',
 'NJ+2016-08-09',
 'MO+2016-08-12',
 'IL+2016-08-14',
 'WY+2016-08-17',
 'IA+2016-08-19',
 'IN+2016-08-21',
 'KY+2016-08-23',
 'NY+2016-08-25',
 'MN+2016-08-28',
 'NE+2016-08-30',
 'CO+2016-09-01',
 'OR+2016-09-04',
 'WA+2016-09-06',
 'ID+2016-09-08',
 'UT+2016-09-10',
 'NM+2016-09-12',
 'TN+2016-09-15',
 'KS+2016-09-17',
 'MA+2016-09-20',
 'CT+2016-09-22',
 'OK+2016-09-25',
 'VA+2016-09-28',
 'GA+2016-09-30',
 'TX+2016-10-02',
 'GA+2016-10-04',
 'MS+2016-10-06',
 'AZ+2016-10-09',
 'NC+2016-10-13',
 'SC+2016-10-15',
 'AR+2016-10-17',
 'AZ+2016-10-20',
 'AZ+2016-10-21',
 'AZ+2016-10-22',
 'AZ+2016-10-23',
 'AZ+2016-10-24',
 'LA+2016-10-27',
 'AL+2016-10-29'
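For the curious, a partially-random brute force search can look something like the sketch below. This is not the notebook's code. The four fairs, their open-day ranges, and the flat two-day travel time are invented for illustration, and the names random_itinerary and search are hypothetical.

```python
import random

# Hypothetical toy data: open-day ranges (inclusive, numbered from day 0).
FAIRS = {"CA": (0, 6), "DE": (3, 8), "OH": (7, 12), "IN": (10, 14)}
TRAVEL_DAYS = 2  # assumption: every pair of fairs is two travel days apart


def random_itinerary(rng):
    """Build one feasible schedule by hopping to random reachable fairs."""
    fair = rng.choice(sorted(FAIRS))
    day = FAIRS[fair][0]
    schedule = [(fair, day)]
    while True:
        options = []
        for other, (first, last) in FAIRS.items():
            # Earliest attendance day at the other fair after traveling.
            arrival = max(day + 1 + TRAVEL_DAYS, first)
            if other != fair and arrival <= last:
                options.append((other, arrival))
        if not options:
            return schedule  # nothing reachable is still open
        fair, day = rng.choice(sorted(options))
        schedule.append((fair, day))


def unique_fairs(schedule):
    return len({fair for fair, _ in schedule})


def search(tries, seed=0):
    """Keep the random schedule that visits the most distinct fairs."""
    rng = random.Random(seed)
    best = []
    for _ in range(tries):
        candidate = random_itinerary(rng)
        if unique_fairs(candidate) > unique_fairs(best):
            best = candidate
    return best


if __name__ == "__main__":
    print(search(2000))
```

Like the real search, this only reports the best schedule it happened to stumble on; it proves nothing about whether a better one exists.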

Anything Else?

Please leave a comment or send me a note (my email is easy to find) with any ideas.


The Closest Non-Intersecting US Interstates

On our recent driving trip to Yellowstone and Montana, I had lots of time to think about random things while behind the wheel. One of them was to wonder: of the major US Interstates, which two come the closest without actually intersecting? My guess was that it's some place on the East Coast, but due to my general lack of knowledge of East Coast highways, I had no idea which two they would be.

Being a huge dork, I decided to figure it out.

It's actually not a very difficult thing to figure out. The steps are:

  1. Get the latitude and longitude coordinates for a number of points along each of the interstates.
  2. Determine which interstates intersect and eliminate those pairs.
  3. Put the coordinates for the interstates into a kD-Tree which will perform the search that determines the distance between non-intersecting highways in a fast way.
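The distance part of the steps above can be sketched with nothing but the standard library. To be clear, this is a brute-force stand-in and not what I ran: it compares every pair of points, which is exactly the work a kD-Tree avoids. The function names and the sample coordinates are made up for the illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_approach(road_a, road_b):
    """Smallest distance between any sampled point on each road."""
    return min(great_circle_km(p, q) for p in road_a for q in road_b)


if __name__ == "__main__":
    # Two made-up "roads", each a short list of (lat, lon) samples.
    road_a = [(0.0, 0.0), (0.0, 5.0)]
    road_b = [(1.0, 5.0), (3.0, 3.0)]
    print(round(closest_approach(road_a, road_b), 2))
```

With many thousands of points per highway the all-pairs loop gets slow, which is why a nearest-neighbor structure is worth the trouble.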

It turns out that the first step proved to be the hardest. I decided to use the data from the Open Street Map (OSM) project. This is a Google Maps-like website that is editable by anyone in the world, similar to Wikipedia. It will not give you directions like other mapping services, but it contains the geographical location of a wide variety of items, including and importantly (as the name suggests) roads. I looked into using the OSM APIs, but as far as I could tell either the APIs didn't do what I needed in an efficient way, or the servers were down. So I simply downloaded the 82 GB XML (5 GB compressed for download) dataset for the United States.

Begin rant (feel free to skip to the next paragraph). I loathe XML. Any time that you have an 82 GB text file (apparently it's 200+ GB for the whole world) as your main distribution method, you're doing something wrong. Doing this project I learned as little about XML as I could to get just what I needed out of the file. Apparently the authoritative data is kept in a real database, but it appears that you cannot download the data as a database. They do have a binary format description, but I can't find a link to download the data in that format. Furthermore, the world doesn't need yet another binary format. For example, they do not discuss endianness for their binary format on that wiki page, which is a big issue with binary formats. There are many other quality formats they could use (SQLite or HDF5). The binary format has a distinct Not Invented Here feel to it, which is nearly always a bad thing. Anyway, back to the main point of my rant. I don't care that the 82 GB XML file compresses down to 5 GB. Reading an 82 GB text file when you're searching for just a fraction of that data takes a long, long time, and is completely unnecessary. Every time I encounter XML it wastes my time in myriad ways. This time was no different. End rant.

I'll spare you the full details and samples of my low-quality Python code, but I munged the interstate data into a SQLite file, which distilled the data from 82 GB to 19 MB. Yes, that's nearly four orders of magnitude smaller. Then I used the much more convenient (and fast) SQLite file to build lists of interstate coordinates, which were fed into the kD-Tree for the nearest neighbor searches. The results are shown below. Note that there is no I-50 or I-60, and I eliminated I-45 from consideration because it's entirely within Texas, and therefore is not "major" in my opinion. I eliminated Hawaii's H-1 for the same reason. I have included links to maps showing the great circle between the nearest points of the highways. For highways that intersect, the link goes to one of the (more or less random) points of intersection.
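To give a flavor of the SQLite step without digging out the real code, here is a minimal sketch. The table layout, the coordinates function, and the three sample rows are all invented for this example; they are placeholders, not real OSM data and not my actual schema.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; the real thing
# would be a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (interstate INTEGER, lat REAL, lon REAL)")

# Made-up placeholder rows standing in for points extracted from the XML.
sample = [(70, 39.30, -76.71), (95, 39.27, -76.67), (95, 39.29, -76.60)]
conn.executemany("INSERT INTO points VALUES (?, ?, ?)", sample)
conn.commit()


def coordinates(conn, interstate):
    """Return the list of (lat, lon) tuples stored for one interstate."""
    rows = conn.execute(
        "SELECT lat, lon FROM points WHERE interstate = ? ORDER BY rowid",
        (interstate,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(coordinates(conn, 95))
```

Pulling one highway's points is then a single indexed-style query instead of another pass over the giant XML file.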

Finally, we can see the answer I was looking for. Interstates 70 and 95 come within 5 kilometers of each other in Baltimore at the terminus of 70, but do not intersect. My suspicion that it was somewhere in the East was correct, so I have that to feel good about.

Closest approach distances between major interstates in kilometers.
X 95 90 85 80 75 70 65 55 40 35 30 25 20 15 10
5 3338 2877 2826 690 2741 2484 142 1760 1822 910 1237
10 1195 178 544 558 91 321
15 2860 2423 2029 2009 1839 1272 1489 436 1093
20 804 764 615 173 283
25 2185 1730 1653 1453 1239 629 749
30 1062 865 610 758 643 458 486 191
35 1404 985 569 516 358
40 587 550 347
55 806 340 319 37
65 465 101
70 5 152 234 105
75 12
80 413
85 582
90

p.s. If you really, really want to see the code I used for this, I can share it, but I'll have to pull out the hamsters that have taken residence in it. They're attracted to dusty littered places, you know.


SciPy 2009 at Caltech

I'm at the SciPy 2009 conference at Caltech in Pasadena today and yesterday. It is an amazing collection of nerds and questionable facial hair styles. There have been some interesting talks.

I like this "snub cube" fountain:

Snub Cube

The SciPy crowd:

SciPy Crowd

A fountain in front of the building the conference is in:

Fountain


New Hosting Service, Again

Two and a half years ago I moved my website off my father's computer at home to Site5. For a while it was great, especially compared to serving a website over a cable modem connection. However, over the last year or two it's gotten progressively worse, something I discussed in this post about a year ago. Also over a year ago, Site5 promised to move everyone to new servers. It hasn't happened, and my service has gone steadily downhill.

My first two-year prepaid period with Site5 ended in December last year, and I seriously thought about moving. I looked at other shared hosting companies, but I felt I would probably have the same problems on a new shared host. I looked into hybrid solutions, but that too didn't seem a guaranteed improvement. I liked the idea of Virtual Private Servers (VPS), but I couldn't find one with enough disk space in my budget.

A few months ago, my lab mate Rick pointed me towards s3fs, which intrigued me. s3fs puts your data on Amazon S3, but allows the data to appear to be local to the server, like another hard drive. You pay for only what you use with S3, and it has virtually unlimited space. Suddenly, a VPS hosting solution fit into my budget. I could pay for a VPS with less disk space than I needed, but still get the power of a VPS. It was also an upgrade because now my family and I could upload as much data as we wanted, and it would be much more secure from disk failure than before.

This website and other sites that were on the old server are now being hosted on a machine from linode.com. I'm using their lowest option, which has 10GB of space. I installed Ubuntu Hardy Heron, which seems like a solid Linux distribution. s3fs has proven to be reliable and fast enough, although it's much slower than having the data on a local disk. Using Apache rewrites, my father and I have made it such that when a web browser asks for items on a page that exist on S3, the request goes there instead of to this server, which saves lots of time. I've also figured out how to shoehorn Gallery2 into using S3.

So far I am very happy with the new server.


Yahoo! Mail Tries, and Misses

I have written thrice (1, 2, 3) in the past about the new Yahoo! mail interface, the Ajaxed interface to Yahoo! mail. It is incredible how slowly they make improvements to it. It's not like Yahoo! cares what I say, but of the points I raised over two years ago in my first post, they still haven't all been fixed.

But Yahoo! may be trying harder. There is now a preference to add the greater-than signs on replied-to messages:

Yahoo Mail

Which is great. Until you try to use it. Here is a message I sent myself:

Yahoo Mail

Here is what I get when I hit "reply" (this is a screen shot of the compose window, the text is editable):

Yahoo Mail

Yes, each and every word of the message I'm replying to gets its own line. But it gets worse! Here's what I get when I send the replied message without touching anything:

Yahoo Mail

Here each word of the replied-to message gets its own line, separate from the greater-than signs. I hope this is just a simple bug (I will submit a bug report about this), but this is simply ridiculous.


Yahoo! Mail Beta still stinks… less

Three months ago I wrote that Yahoo! Mail Beta still stinks. I said that Yahoo! had fixed two of my five main quibbles with their newest email interface. Sure, they fixed two, but they were the ones I cared least about.

Lo and behold, Yahoo! came out with an updated version of Mail Beta a month ago, and more recently my server farm received the update. Let's see how Yahoo! fares this round!

  1. Fixed-width fonts. Huzzah! Numbah one gets addressed. This is big. Fixing this almost is enough for me to start using Beta every day. But only almost. Yahoo!, you get a nice green check:

    [green check]

  2. Message replying format. Nope. Nothing new here. Same, lame behavior as before. Give us some freedom, Yahoo!. Stop putting the minority with good etiquette down! This earns you a Big Red X, and red is never a good color for anything.

    [red X]

  3. Message quoting. Nope, again. There is still no way to differentiate the message I'm replying to and what I've written. Another BRX.

    [red X]

Yahoo!, you're getting beat up, down, left and right by Google. They just took YouTube out from under you this week! Shape up!


[W]hack a Phone

E815

I've had my Motorola E815 phone for about a year. It has bluetooth, a camera, and other cool features, like many modern phones. Bluetooth enables you to transfer photos, movies and ringtones on and off a phone. All these things seem like a useful feature set for a phone. However, Verizon, the carrier for my current phone, disabled a number of bluetooth features, such as file transfer.

Why would Verizon intentionally cripple a phone? Money, of course. Turning off the file transfer abilities of a phone means that if I want to get the photos I take using my phone onto my personal computer, I need to use their non-free services. Also, if I want to get a new ringtone, I need to pay for it (often over a buck for 30 seconds of music!), instead of uploading a simple, free MIDI file I found on the internet.

Ever since I got my phone, I've been aware that it is easily hacked to allow all bluetooth functions, but doing so requires a special USB cable. I never got around to buying the $10 cable and hacking the phone, until now.

Phone

The camera actually takes fairly decent photos in a wide range of light conditions considering it's just a pinhole lens. It also takes short movies, with sound. I can also upload MIDI sound files, or even MP3s, and make them my ringtone. I've always wanted to pull out my camera phone whenever I saw something cool, but never bothered because I couldn't get the photos off the phone for free. Now that I can, I'll be more willing to snap a pic of whatever.

Phone


Expedite my PhD

Six Displays!

Do you want to help me get my PhD faster? If so, go on over to this website and buy me one Zenview six screen multidisplay. I'd also like you to buy me a computer capable of running all the monitors.

It's only your money. We're talking about my education here!


Yahoo! Mail Beta still stinks

Seven months ago I wrote a post covering my likes and dislikes (mainly dislikes) of Yahoo! Mail beta. It's time to revisit it and see if Yahoo! has done anything in that time. I had five points of contention:

  1. Fixed-width fonts. There is still no option for showing/composing messages in fixed-width font. Starting off weakly, Yahoo!

    [red X]

  2. Message replying format. It still puts my signature at the top of the message. Again, they should provide the option of putting it where I want it. Uh oh, another red x.

    [red X]

  3. Message quoting. This also has not been fixed yet. Since I use good email etiquette, having no differentiation between what I'm replying to and writing is not an option.

    [red X]

  4. Signature new lines. Finally, something they've fixed. Of course, without fixed-width font, my signature still looks wrong.

    [green check]

  5. Bugs. Perhaps there are other bugs that have gone unfixed, but the one I identified in the previous post has been fixed. So you get a green check, Yahoo!

In sum, Yahoo! is batting .400, which is an excellent baseball average, but a poor one for things so simple to fix. Moreover, the list is in roughly descending order of importance to me. Therefore, Yahoo! is batting more like .150, having tackled none of the things crucial to me.

I've tried to tell Yahoo! about these shortcomings. I've submitted feature requests to the appropriate place several times over the last seven months. I can't believe I'm the only one with these concerns. But since nothing has changed, perhaps I am.

Intel MacBook, Parallels & Garmin GPS

Just a few days ago, I replaced my four-year-old white 500 MHz G3 iBook with a shiny widescreen white 1.83 GHz Intel Core Duo MacBook (the black one seemed altogether silly to spend extra money on). The laptop is a very nice machine. My home machine is a 20" iMac G5 with a widescreen, and I've gotten used to the extra real estate, so I like the fact that the MacBook has one too. Really, the main reason I bought it is because my iBook had decayed to the point that it was only good for websurfing. I wanted a machine I could use at school, and the iBook just couldn't cut it (I definitely tried to make the iBook work!).

Tour de France

Since Apple is switching to Intel chips, the world of Windows is now available using either Boot Camp or Parallels. I have a copy of Windows 2000, and since I believe that Boot Camp only works with Windows XP, I am not going to try that out. Also, Boot Camp makes your machine dual-boot, which means only one OS at a time and no interaction between the two. Parallels is the more attractive option: it allows you to run Windows alongside Mac OS X. The Windows world lives inside of an application that runs on Mac OS X. Choosing to interact with Windows is no more difficult than switching applications. Also, Parallels works with basically any Intel-compatible operating system, so I could use my Win2000 install disk.

After much trial and tribulation (mainly related to the fact that my Win2000 is an upgrade version, not a full install) I got Win2000 installed using Parallels on my MacBook. After installing the myriad security updates, I installed the software for my Garmin Forerunner 301. You see, Garmin (right now) only makes software for their gadgets for Windows. They've promised to make a Mac OS X version of the software I use by Spring 2006 (they have two and a half weeks). Obviously, waiting around for that to be released will just waste my time, so I was hoping that I could use this whole setup to run the Windows software on my MacBook to talk to my GPS. Sadly, it doesn't work. It's clear it almost works, since Win2000 notices when I plug in the GPS, but the Garmin stuff can't quite talk to the GPS. The situation seems exactly the same as when I tried using Virtual PC on my G5 over a year ago.

All in all, I like the laptop, I like Parallels, and I'm displeased with Garmin. I'm using a beta 30-day activation key with Parallels, and I'm unsure if I'll buy the full version ($40 for pre-ordering). I really try to stay away from Windows applications. Right now the only app I do want to run is the Garmin stuff, and it doesn't look like that's going to work.

Sim City 4

About a month ago I received my iPod settlement gift certificate worth $50. The money was only good at the Apple store. As you can find out for yourself, there's not much for $50 on the Apple store, save for a few iPod accessories. I didn't really want any iPod accessories, and besides, most wouldn't fit my second generation iPod. Of course Apple was hoping that I'd just go ahead and use the $50 as an excuse to buy a quad-G5 desktop system for $3,300 (make that $3,250).

Sim City 4

Look for the one-way streets, wide boulevards, and train tracks.

Above the iPod accessories, the lowest price items are software titles. I browsed through them and discovered that there was a Mac OS X version of Sim City 4. I've always liked the Sim City series, starting with the original Sim City. I like the planning of a city's infrastructure. As the "mayor" of the city, the player has to lay roads, freeways, railways, subways, power lines and water pipes. The mayor also has to balance a budget of expenditures and taxes (more on that later). The goal of the game can be as simple as building the largest city possible, or the most aesthetically pleasing, or one with the "happiest" residents, or one with the best finances. My goal has usually been to have a high population combined with a happy population.

Sim City 4

Look for the railroad crossing guards, various species of trees and the accurate railroad "Y".

The newest version has many improvements over the previous versions, Sim City 2000 and Sim City 3000. For one, this version supports regions, whereby you can build dozens of cities that are neighbors on one big map. Each city is independent in that you can only edit one at a time, but cities influence neighbors by way of jobs & trade. One of the biggest improvements has been in the graphics. This new game has beautiful graphics with a very high level of detail. There are something like six levels of zoom. The closest one is so close you can see individual "sims" (people) on the streets (look for them on the above image of the train station).

Sim City 4

See if you can find the county fair & golf course.

Sim City 4

One of the keys to a well-functioning city is the transportation system. Sims are very touchy about how long it takes to drive to their job. Sim City 4 introduces a very useful tool that allows you to see what kind of traffic goes where through your transportation system. In the picture linked on the right, you can see all the commute traffic that goes through a train station in the center of town (in the lower-left of the picture). You can trace foot traffic onto a passenger train, then off at a different station and on foot to a job site.

I must admit that one key element of Sim City, collecting taxes and balancing a budget, I've never really liked. I either cheat to get the cash to run the town, or find some other way to make money, like building a magic building found on the internet. I'm no libertarian -- I just don't really care too much about the fiscal part of the game. If I were really going to play the game for real, I would not cheat. But I don't want to play it for real, so I feel no guilt.

One nice thing about this game is there is no death & violence, no princess to save, and I can stop the game at any time without losing my progress.
