Using Amazon S3 with Gallery2
Here are step-by-step instructions on how I integrated Gallery2 with Amazon S3. They assume you already have a working Gallery2 installation, and that you use the default rewrite pattern for ‘Download Item’ in your Gallery2 Rewrites preferences, which is “d/%itemId%-%serialNumber%/%fileName%”.
Big warning! - back your stuff up (photos and database) before you try this. I would strongly suggest trying this first on a test gallery to see if it works before moving your real gallery. I did!
I have put some details and discussion on this page.
- Download and build s3fs on your server. It uses /etc/mime.types to determine file types by extension when uploading to S3. You need to either edit that file, or make a copy and change the source (s3fs.cpp) to point to your copy, such that the line for jpegs looks like this:
image/jpeg dat jpeg jpg jpe
You need to add the ‘dat’ extension. Also make sure there are no other mimetypes with ‘dat’ associated with them. In my tests, uploading gifs, pngs and jpgs with the extension .dat to S3 with mimetype image/jpeg works for all modern browsers.
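Before mounting anything, it can help to confirm that the edited file really maps ‘dat’ to image/jpeg and to nothing else. Here is a minimal Python 3 sketch of such a check. The function names and the sample text are made up for illustration; on a real server you would read your own mime.types file (or your copy of it) instead of the inline sample.

```python
def extensions_by_type(mime_types_text):
    """Parse mime.types-style text into {mime_type: [extensions]}."""
    mapping = {}
    for raw in mime_types_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        mapping.setdefault(mime_type, []).extend(extensions)
    return mapping


def types_claiming(mapping, extension):
    """Return every MIME type that lists the given extension, sorted by name."""
    return sorted(t for t, exts in mapping.items() if extension in exts)


# Hypothetical excerpt of an edited mime.types file.
sample = """\
image/gif   gif
image/jpeg  dat jpeg jpg jpe
image/png   png
"""

claimed = types_claiming(extensions_by_type(sample), "dat")
print(claimed)
```

If the printed list contains anything other than the single entry image/jpeg, another line in the file still claims the ‘dat’ extension and should be edited.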
If this task is beyond your skill set, this Gallery2 conversion isn’t something you should attempt! Sorry.
- Make an account at Amazon AWS. It costs you nothing to make and maintain an unused AWS account, so even if you don’t get any of this working it won’t cost you anything.
- Make a bucket on S3. On Mac OS X I use S3 Browser; there is also a Firefox S3 plugin, and there are S3 tools for Windows. You’ll need your access credentials from Amazon AWS, which are two long strings of characters (the access key ID and the secret access key).
- Make a folder on your server which will be the mount point for your bucket. It doesn’t matter where it is, but it probably shouldn’t be inside your web root (public_html or similar).
- If you’re lucky, FUSE is already built into your kernel. If it is, a command like this should work (I like to put it in a small shell script):
#!/bin/sh
/path/to/s3fs bucketname /path/to/bucket/mountpoint -o default_acl=public-read -o allow_other -o accessKeyId=XXXXXXXXXXXXXXXX -o secretAccessKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -o use_cache=/tmp
The “-o default_acl=public-read” bit is important, as it allows the objects that are uploaded to S3 to be publicly read, which is the whole point! I’m not 100% sure if “-o allow_other” is needed. I think it is, because I was having problems when it wasn’t turned on, but it would be better if it weren’t needed, as it allows anyone on your machine to edit files in your bucket. The “-o use_cache=/tmp” option specifies a local cache, which will be made in /tmp/bucketname. It is optional, but it is a good idea.
If this command doesn’t work, either you should build the FUSE kernel module and install that, or explore your options. Depending on your hosting situation you might try asking customer support to do it. I can pretty much guarantee that if you’re on a popular shared hosting server (like Dreamhost, Site5, Host Monster), FUSE won’t be installed. It doesn’t hurt to ask!
You can unmount an s3fs disk with the command ‘fusermount -u /path/to/s3fs/mountpoint’.
Note: if you run s3fs as above, your secret AWS info will appear to anyone who runs ‘ps’ on your machine. There is the option of using an external file (see the s3fs page), but it defaults to /etc/passwd-s3fs which may not be a place you can or want to put it. If you want to use the external file option, you could edit the source to point to a different file.
- Copy your entire g2data/albums and g2data/cache/derivative folders to your s3 mount point, i.e. cp -r g2data/albums /path/to/bucket/mountpoint/, which will put your albums folder at /path/to/bucket/mountpoint/albums/ (and derivative similarly) (see note about this). Transfer time will depend on your server’s network connection, the amount of data and the number of files. There is a large per-file transaction penalty. For example, it took me 4 hours to transfer 26,000 files that totaled 2.7GB, while 5,000 files/4GB took about an hour. (See note on details page.)
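To get a feel for the per-file penalty, the two transfers quoted above can be fitted to a crude model in which total time is (files × cost per file) + (gigabytes × cost per gigabyte). This is only a back-of-the-envelope sketch in Python 3: it assumes the model holds and that the two reported runs are representative, and two data points give no way to check either assumption.

```python
# The two transfers reported above: (number of files, size in GB, hours taken).
runs = [(26000, 2.7, 4.0), (5000, 4.0, 1.0)]

(n1, g1, t1), (n2, g2, t2) = runs

# Solve  n*a + g*b = t  for a (hours per file) and b (hours per GB)
# with Cramer's rule on the 2x2 system.
det = n1 * g2 - g1 * n2
hours_per_file = (t1 * g2 - g1 * t2) / det
hours_per_gb = (n1 * t2 - n2 * t1) / det


def estimate_hours(files, gigabytes):
    """Rough transfer time under the fitted two-term model."""
    return files * hours_per_file + gigabytes * hours_per_gb


print("seconds per file: %.2f" % (hours_per_file * 3600))
print("minutes per GB:   %.1f" % (hours_per_gb * 60))
print("10,000 files, 1 GB: about %.1f hours" % estimate_hours(10000, 1.0))
```

With these numbers the fit comes out at roughly half a second per file and about four minutes per gigabyte, so for a gallery of many small images the file count, not the total size, dominates the transfer time.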
- (I’m not sure if this is necessary.) In your php.ini file, or perhaps in your .htaccess file, you’ll want to add your bucket mount point path to your open_basedir. It should look something like this:
open_basedir = "/path/to/public_html:/path/to/g2data:/tmp:/usr/bin:/path/to/bucket/mountpoint"
You can make sure this is working by making a php file in your public_html folder with this content:
<?php phpinfo(); ?>
If you then open that file using your browser, you can see if the open_basedir has been set correctly (do a search for that word).
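If you would rather check the value programmatically, the sketch below (Python 3) takes an open_basedir string as copied from the phpinfo() page and reports whether a given path falls under one of its entries. It treats every entry as a directory, which is a simplification of how PHP itself matches entries, and the function name is made up for illustration.

```python
import posixpath


def covered_by_open_basedir(open_basedir, path):
    """Return True if path lies inside one of the colon-separated directories."""
    target = posixpath.normpath(path)
    for entry in open_basedir.split(":"):
        if not entry:
            continue
        base = posixpath.normpath(entry)
        # Match the directory itself or anything below it.
        if target == base or target.startswith(base.rstrip("/") + "/"):
            return True
    return False


# The example value from above, with its placeholder paths.
value = "/path/to/public_html:/path/to/g2data:/tmp:/usr/bin:/path/to/bucket/mountpoint"
print(covered_by_open_basedir(value, "/path/to/bucket/mountpoint/albums"))
```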
- At this point it is a good idea to make sure your gallery is still functional. As the only thing up to here that might affect Gallery2 is the open_basedir, it should still work, but it doesn’t hurt to check. From this point on you’ll need to check that your gallery still works after every step.
- Inside the g2data folder, execute these commands, which makes Gallery2 start using the s3 files:
mv albums albums-local
ln -s /path/to/bucket/albums albums
mv cache/derivative cache/derivative-local
ln -s /path/to/bucket/derivative cache/derivative
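A quick way to confirm the swap took effect is to ask where the two directories now point. This is a small Python 3 sketch, not part of the original procedure; the function name is hypothetical and /path/to/g2data is a placeholder for your own g2data folder.

```python
import os


def link_targets(g2data):
    """Map each relinked Gallery2 directory to its symlink target (None if it is not a symlink)."""
    result = {}
    for rel in ("albums", os.path.join("cache", "derivative")):
        full = os.path.join(g2data, rel)
        result[rel] = os.readlink(full) if os.path.islink(full) else None
    return result


# Placeholder path; on a real server both values should be paths inside the s3fs mount.
print(link_targets("/path/to/g2data"))
```

A value of None means that directory is still a plain local folder (or does not exist) rather than a link into the bucket.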
- Check that your gallery works. If not, take a look at your server error log, and make sure you followed all the steps above. Depending on what files s3fs has locally cached, images may load slower than normal because they are coming from s3.
- In your cgi-bin directory, put this python script:
#!/usr/bin/python
# Stephen Skory
# stephenskory@yahoo.com
# Part of a hack to make Gallery2 work with Amazon's S3 service

# uncomment this below if you want to do some debugging.
#print "Content-Type: text/plain"
#print

# User edited values
AWSBase = 'http://bucketname.s3.amazonaws.com/' # yes trailing slash
albumsAWSBase = AWSBase + 'albums/' # yes trailing slash
derivativeAWSBase = AWSBase + 'derivative/' # yes trailing slash
s3fsBase = '/path/to/bucket/' # yes trailing slash
galleryBase = '/gallery2/' # your web location for gallery2, minus your domain name
# if your gallery is at http://mysite.com/gallery2/main.php
# put /gallery2/ here, with the trailing slash
table_prefix = 'g2_'
column_prefix = 'g_'
refreshTime = '180' # days
myHost = 'localhost' # 99% of the time
myUser = 'database_username'
myPasswd = 'database_password'
myDB = 'database_name'
# --- EDIT ABOVE --- #


# get the needed modules
import cgi, time, sys, re, MySQLdb, urllib2
#import cgitb; cgitb.enable() # for debugging
from path import path # for 'touch'ing files, requires path.py: http://www.jorendorff.com/articles/python/path/

# if the image is a full sized image, get the pieces of its pathname
def fullSizedImage(numbers):
    path = []
    # loop over the numbers in its location
    for number in numbers:
        if (number != '7'): # 7 is the top album
            db.query("""SELECT %spathComponent from %sFileSystemEntity where g_id=%s""" \
                % (column_prefix,table_prefix,number))
            r = db.store_result()
            # store the results in path
            path.append(r.fetch_row()[0][0])
    # get the file name
    db.query("""SELECT %spathComponent from %sFileSystemEntity where g_id=%s""" \
        % (column_prefix,table_prefix,line))
    r = db.store_result()
    for row in r.fetch_row(1):
        path.append(row[0])
    # make the path to the full S3 version
    string = ''
    string = "/".join("%s" % part for part in path)
    string = albumsAWSBase + string
    return string

# if the id is for a reduced size image, it's easy to make the S3 location
def reducedSizeImage(line):
    if (int(line) <= 100): # the id=100 is a weird case I'm not 100% on...
        string = derivativeAWSBase + '0/%s/' % line[0]
        string = string + line.strip() + '.dat'
    else:
        string = derivativeAWSBase + '%s/%s/' % (line[0], line[1])
        string = string + line.strip() + '.dat'
    return string

# we call this after we've formed a S3 location to see if it's there already.
# if it is and it's new enough, we return 0 and preserve the S3 version
# otherwise we return 1 and will create a new local location
def chooseRefreshedVersion(Location):
    # try to get the file off of S3
    req = urllib2.Request(Location)
    try:
        url_handle = urllib2.urlopen(req)
    # if it's not there, return 1
    except urllib2.HTTPError:
        return 1
    # otherwise, get the file headers
    headers = url_handle.info()
    # get the date of last change for the S3 version
    last_modified = headers.getheader("Last-Modified")
    time_mod = time.mktime(time.strptime(last_modified, "%a, %d %b %Y %H:%M:%S %Z"))
    # now
    time_now = time.mktime(time.localtime())
    # calculate the difference
    timeDiff = time_now - time_mod
    # calculate the refresh time in seconds
    # (refreshTime is a string above, so convert it before multiplying)
    rTimeSecs = 60*60*24*int(refreshTime)
    # if the S3 file is too old, return 1
    # but also we'll 'touch' it to 'refresh' it. If it needs regeneration, Gallery will do it
    # because we're going through the 'official' channel, so an extra touch won't hurt it
    if (timeDiff > rTimeSecs):
        # take off the s3 address part
        localFileName = sb.subn('',Location)[0]
        # add the local stuff
        localFileName = s3fsBase + localFileName
        # create f under path framework
        f = path(localFileName)
        # touch it
        f.touch()
        return 1
    # if the S3 version is not too old, or if it exists on S3, return 0
    return 0

# Establish a database connection
# yeah, yeah, it's database specific. You get what you pay for!
db = MySQLdb.connection(host=myHost, user=myUser, passwd=myPasswd, db=myDB)

# Make some regular expressions
# to find the numbers from parentSequence
p = re.compile('\d+')
# to strip off stuff that's not a number
D = re.compile('\D')
# for updating refresh time
sb = re.compile(AWSBase)

# get the values from the request
inputValue = cgi.FieldStorage()
# get the ID
if(inputValue.has_key('id')):
    id = inputValue['id'].value
    line = inputValue['id'].value
# get the serial number
if(inputValue.has_key('sn')):
    sn = inputValue['sn'].value
# get the file name
if(inputValue.has_key('fn')):
    fn = inputValue['fn'].value

# clean the line of anything but digits, i.e. SQL injection queries
line = D.subn('',line)[0]

# if line is too long, or there's nothing, let's just exit 'cause something's wrong
# 20 is an arbitrary number
if (len(line) > 20 or len(line) == 0):
    sys.exit()

# Make the SQL query
db.query("""SELECT %sparentSequence FROM %sItemAttributesMap WHERE g_itemId=%s""" \
    % (column_prefix,table_prefix,line))
r = db.store_result()

string = ''

# if the id queried is in ItemAttributesMap, it's a full sized image,
# on error (as in nothing returned), we know it's a resized version, and make the path
try:
    numbers = p.findall(r.fetch_row()[0][0])
except IndexError:
    string = reducedSizeImage(line)

# here, if we haven't filled 'string' it's a full sized image
if (len(string) == 0):
    string = fullSizedImage(numbers)

# after making the S3 version, we test to see if it's fresh or not, or even on S3
if (chooseRefreshedVersion(string)):
    string = '%smain.php?g2_view=core.DownloadItem&g2_itemId=%s&g2_serialNumber=%s&g2_fileName=%s' \
        % (galleryBase,id,sn,fn)

# after all that, we forward the client to the correct address
print 'Location: %s' % string
print
Here is a direct download link for rewrite.py.
Note that I wrote it in python and that it’s mysql-specific. I have some notes on that here.
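The part of the script that needs no database is the mapping from a resized image’s id to its S3 address. As an illustration only, here is that mapping restated as a standalone Python 3 function (rewrite.py itself is Python 2). It mirrors the logic of reducedSizeImage() in the script; the function name, the constant and the extra digit check are additions for this sketch, and the bucket name is the same placeholder used in the script.

```python
import re

# Placeholder bucket, as in rewrite.py.
DERIVATIVE_BASE = "http://bucketname.s3.amazonaws.com/derivative/"


def derivative_url(item_id, base=DERIVATIVE_BASE):
    """Build the S3 URL of a resized image from its numeric id string."""
    digits = item_id.strip()
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError("item id must contain only digits")
    if int(digits) <= 100:
        # Small ids live under 0/<first digit>/, as in the original script.
        folder = "0/%s/" % digits[0]
    else:
        # Everything else lives under <first digit>/<second digit>/.
        folder = "%s/%s/" % (digits[0], digits[1])
    return base + folder + digits + ".dat"


print(derivative_url("93953"))
print(derivative_url("42"))
```

Comparing the printed addresses with the folders that actually exist under derivative/ in your bucket is a cheap way to confirm the layout before wiring up the rewrite rule.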
Here are the lines you may need to change:
–Line 11 (AWSBase): Change bucketname to your bucket.
–Lines 12+13 (albumsAWSBase, derivativeAWSBase): The locations of albums and derivative relative to your s3fs mount point root.
–Line 14 (s3fsBase): The full path of your s3fs mount point.
–Line 15 (galleryBase): The web-root relative path to gallery2.
–Lines 18+19 (table_prefix, column_prefix): Change these to your settings for Gallery2’s database structure. They’re set to the defaults here.
–Line 20 (refreshTime): This controls how often the rewrite script sends the picture request to the normal core.DownloadItem channel, which should keep resized images fresh. Even if it doesn’t need to be refreshed, the image is ‘touched’ so it appears to be renewed.
–Lines 21-24 (myHost, myUser, myPasswd, myDB): Your mysql database settings.
You can uncomment line 30 (the cgitb import) and/or lines 7 & 8 (the text/plain header) to debug the script. Don’t forget to ‘chmod a+xr rewrite.py’.
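The refresh decision itself is just a date comparison against the Last-Modified header that S3 returns. Here is a hedged Python 3 sketch of that comparison, separate from the script and using only the standard library. The function name is made up, and it assumes the header is in the usual HTTP date format.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def needs_refresh(last_modified, refresh_days, now=None):
    """Return True if the S3 copy is older than refresh_days."""
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Treat a header without zone information as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - modified > timedelta(days=refresh_days)


# Example with a fixed "now" so the result does not depend on today's date.
fixed_now = datetime(2008, 12, 31, tzinfo=timezone.utc)
print(needs_refresh("Wed, 25 Jun 2008 21:09:00 GMT", 180, now=fixed_now))
```

Working in timezone-aware UTC datetimes avoids any dependence on the server’s local timezone, although against a 180-day window a few hours either way would rarely matter.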
- Download path.py and put it in cgi-bin. This provides a convenient ‘touch’ mechanism I use in rewrite.py.
- Test to see if the cgi is working. Navigate to your gallery and pick any image, and look at its address. If it’s something like ‘http://mysite.com/gallery2/d/93953-3/Kids.jpg,’ in your browser you should type ‘http://mysite.com/cgi-bin/rewrite.py?id=93953&sn=3&fn=Kids.jpg’ and your image should come up from either s3 or your website. If it doesn’t, try turning on the debugging stuff in the cgi, and check to see you’ve put the right values in.
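If you want to test several images, converting addresses by hand gets tedious. The Python 3 sketch below does the same conversion mechanically, assuming the default ‘Download Item’ pattern and a script at /cgi-bin/rewrite.py as described above; the function and constant names are illustrative only.

```python
import re
from urllib.parse import urlsplit

# Same shape as the default pattern d/%itemId%-%serialNumber%/%fileName%.
DOWNLOAD_PATH = re.compile(r"/d/([0-9]+)-([0-9]+)/([^/?]+)$")


def cgi_test_url(image_url, cgi_path="/cgi-bin/rewrite.py"):
    """Turn a Gallery2 download URL into the matching rewrite.py test URL."""
    parts = urlsplit(image_url)
    match = DOWNLOAD_PATH.search(parts.path)
    if match is None:
        raise ValueError("not a Gallery2 download URL: %r" % image_url)
    item_id, serial, file_name = match.groups()
    return "%s://%s%s?id=%s&sn=%s&fn=%s" % (
        parts.scheme, parts.netloc, cgi_path, item_id, serial, file_name)


print(cgi_test_url("http://mysite.com/gallery2/d/93953-3/Kids.jpg"))
```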
- In gallery2/modules/rewrite/templates make a backup copy of Htaccess.tpl. You’re supposed to make a ‘local’ directory and edit gallery2/modules/rewrite/templates/local/Htaccess.tpl, but that didn’t work for me, so I edited the original version.
In whichever version of Htaccess.tpl works for you, you’ll want it to look something like this:
{*
 * $Revision: 15835 $
 * If you want to customize this file, do not edit it directly since future upgrades
 * may overwrite it. Instead, copy it into a new directory called "local" and edit that
 * version. Gallery will look for that file first and use it if it exists.
 *}
# BEGIN Url Rewrite section
# (Automatically generated. Do not edit this section)
<IfModule mod_rewrite.c>
{if $Htaccess.needOptions}
    Options +FollowSymlinks
{/if}
    RewriteEngine On

    RewriteBase {$Htaccess.rewriteBase}

    RewriteCond %{ldelim}REQUEST_FILENAME{rdelim} -f [OR]
    RewriteCond %{ldelim}REQUEST_FILENAME{rdelim} -d [OR]
    RewriteCond %{ldelim}REQUEST_FILENAME{rdelim} gallery\_remote2\.php
    RewriteCond %{ldelim}REQUEST_URI{rdelim} !{$Htaccess.matchBaseFile}$
    RewriteRule . - [L]

{foreach from=$Htaccess.rules item=rule}
{if !empty($rule.conditions)}
{foreach from=$rule.conditions item="condition"}
    RewriteCond %{ldelim}{$condition.test}{rdelim} {$condition.pattern}{if !empty($condition.flags)} [{$condition.flags|@implode:","}]{/if}

{/foreach}
{/if}
{if $rule.substitution != '/gallery2/main.php?g2_view=core.DownloadItem&g2_itemId=%1&g2_serialNumber=%2&g2_fileName=%3'}
    RewriteRule . {$rule.substitution}{if !empty($rule.flags)} [{$rule.flags|@implode:","}]{/if}
{else}
    RewriteRule . /cgi-bin/rewrite.py?id=%1&sn=%2&fn=%3 [QSA,L]
{/if}

{/foreach}
</IfModule>

# END Url Rewrite section
The lines that are different are 30, and 32 through 34. Together they replace the one rewrite rule used for retrieving images, which is what lets us s3-ize the gallery. Make sure the string on the right side of the if statement on line 30 starts with the path where your Gallery2 install lives (/gallery2/ here).
- Now go to your gallery site admin, URL Rewrite preferences, and click ’save’ at the bottom. Then take a look at your gallery2/.htaccess file. It should look like this:
# BEGIN Url Rewrite section
# (Automatically generated. Do not edit this section)
<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteBase /gallery2/

    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d [OR]
    RewriteCond %{REQUEST_FILENAME} gallery\_remote2\.php
    RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
    RewriteRule . - [L]

    RewriteCond %{THE_REQUEST} /gallery2/v/([^?]+)/slideshowapplet\.html(\?.|\ .)
    RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
    RewriteRule . /gallery2/main.php?g2_view=slideshowapplet.SlideshowApplet&g2_path=%1 [QSA,L]
    RewriteCond %{THE_REQUEST} /gallery2/d/([0-9]+)-([0-9]+)/([^/?]+)(\?.|\ .)
    RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
    RewriteRule . /cgi-bin/rewrite.py?id=%1&sn=%2&fn=%3 [QSA,L]

    RewriteCond %{THE_REQUEST} /gallery2/v/([^?]+)(\?.|\ .)
    RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
    RewriteRule . /gallery2/main.php?g2_path=%1 [QSA,L]
</IfModule>

# END Url Rewrite section
Line 19 is the difference between your typical gallery setup and this s3-ized gallery.
- Now go to your gallery and see if it works. If it does you’ve successfully integrated S3 with Gallery2! You may have to clear the browser’s cache, or try a different browser to make sure that images are coming from S3. Installing the Live HTTP Headers plugin for Firefox helps with being sure where things are coming from. If things aren’t coming from S3, or your gallery is broken, take a look at your error.log, or turn on debugging in rewrite.py, or make sure your gallery2/.htaccess file looks the way it should.
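If you would rather check from a script than from a browser plugin, one request that does not follow redirects is enough to see where an image address leads. The Python 3 sketch below is an illustration under two assumptions: that your site is served over plain http, and that your web server turns the script’s Location header into an HTTP redirect when it points at S3 and answers directly when it points back at main.php. All names here are hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def classify_response(status, location, bucket_host):
    """Describe where an image request ended up, given the raw response."""
    if status in (301, 302, 303, 307) and location:
        host = urlsplit(location).netloc
        return "S3" if host == bucket_host else "redirected elsewhere"
    if status == 200:
        return "your server"
    return "unexpected"


def probe(image_url, bucket_host):
    """Issue one HEAD request without following redirects and classify the answer."""
    parts = urlsplit(image_url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("HEAD", target)
        response = conn.getresponse()
        return classify_response(
            response.status, response.getheader("Location"), bucket_host)
    finally:
        conn.close()


# Example call (placeholders from the text above, not run here):
# probe("http://mysite.com/gallery2/d/93953-3/Kids.jpg", "bucketname.s3.amazonaws.com")
```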
- You should add a crontab entry to delete files periodically from your s3fs local cache (if you’re using one), otherwise you could fill up the disk. Put something like this in your crontab:
0 0 * * * find /tmp/bucketname/ \! -type d -atime +3 -exec rm -f {} \;
The line above will remove every file (not directory) from /tmp/bucketname that hasn’t been accessed in three days, and it checks at midnight every night.
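If you want to preview what such a cleanup would delete before trusting it to cron, the same selection can be sketched in Python 3. This is an approximation, not a drop-in replacement: find rounds file ages down to whole days, so the exact cutoff differs slightly, and the function name is made up for this example.

```python
import os
import time


def stale_files(root, days, now=None):
    """Yield non-directory entries under root last accessed more than `days` days ago."""
    reference = time.time() if now is None else now
    cutoff = reference - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat so that a symlink is judged by itself, not by its target.
            if os.lstat(full).st_atime < cutoff:
                yield full


# Preview only; nothing is deleted:
# for stale in stale_files("/tmp/bucketname", 3):
#     print(stale)
```

Once the preview looks right, swapping the print for os.remove(stale) would make it do the same job as the find command.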
- You should also figure out how to automatically mount your s3fs disk at machine boot. If you have root access, you can put a line like this into /etc/rc.local:
sudo -u username /path/to/s3fs/mountscript.sh
The username is whatever you’re using for your website. If you’re using suexec and/or userdirs, it would be that username. If you’re using the web daemon (www-data or similar), it should be that user. If you have the ‘-o allow_other’ option with s3fs, it actually doesn’t matter which user mounts the disk, but if you want to use ‘fusermount -u’ as non-root, I think you have to mount s3fs with the same user.
Another option is to write a crontab job that checks to see if s3fs is mounted (perhaps by trying to read a file on S3, or ‘grepping’ the mounttab). If it’s not mounted, it tries to mount the disk.
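Such a watchdog can be very small. The following Python 3 sketch checks whether the mount point is currently a mount and runs the mount script if it is not. It assumes os.path.ismount can see the s3fs mount from the account running the job, which may not hold without ‘-o allow_other’; the function name and the paths in the usage comment are placeholders.

```python
import os
import subprocess


def ensure_mounted(mountpoint, mount_command,
                   is_mounted=os.path.ismount, run=subprocess.call):
    """Run mount_command if mountpoint is not mounted; return True if mounted afterwards."""
    if is_mounted(mountpoint):
        return True
    run(mount_command)
    return is_mounted(mountpoint)


# Example call from a script run by cron (placeholder paths):
# ensure_mounted("/path/to/bucket/mountpoint", ["/path/to/s3fs/mountscript.sh"])
```

The check and the command runner are passed in as parameters so the logic can be tried out without touching a real mount.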
- That’s pretty much it. You can now remove your albums-local and derivative-local folders to free up some disk space. If you have problems, post a comment below and I’ll see if I can help you.
- If you ever want to turn off the S3-directed rewrites, you should restore the Htaccess.tpl file to the original, and go to Gallery2->Site Admin->URL Rewrite and hit ’save.’ Now all images go through your server.
- Bonus! This process works fine with WPG2, a popular way to integrate Gallery2 into Wordpress. And all your old links to photos will still work, which is super cool.

June 24th, 2008 at 7:54 am
Thanks for the wonderful writeup. This is a great idea. I’m going to try it out on my own site.
I’ve got a scenario at home where I host a personal gallery site for my family on a limited bandwidth server. I like having the local master copy of all the pictures in my house under my control, but I am also limited bandwidth wise.
I was wondering if you could set up a scenario like so:
local directory - /gallery2/albums
remote mount - /s3fs/gallery2/albums
Let Gallery continue to use /gallery2/albums as its content location, but have an rsync job keeping /gallery2/albums & /s3fs/gallery2/albums in sync. Then use the rewrite rule to have clients access the /s3fs/gallery2/albums version.
The advantage would be new uploads go to the local server first & you still have that local master copy, but the clients can access the pictures via s3.
Do you think there would be any problems doing it that way?
June 25th, 2008 at 9:09 pm
I don’t see any problem with that right now. Since local space isn’t an issue, it’s a fine idea keeping a local copy. I’d also consider putting cache/derivative on S3 as many requests come from that directory.
August 30th, 2008 at 7:34 pm
Hi,
Am I missing something obvious here? Aside from the double bandwidth hit, why not just mount your g2data/albums on s3 (using the S3FS linked above), and be done with it in one step? What’s the problem with that method?
August 30th, 2008 at 9:06 pm
There’s nothing wrong with keeping your data on S3 and using s3fs so that all your data goes through the server. However, I’ve used gallery in this exact setup as part of testing and it’s noticeably and frustratingly slower. The whole image has to be downloaded by the server before it can start relaying it to the web client (unless, of course, the image is cached already on the server). As you browse the gallery, pages will render in your web browser, but the images will appear several seconds later, the time depending on the size of the image. It’s a big cut in the responsiveness of your gallery.
The idea is that Amazon S3 has a massive ‘cloud’ setup with huge bandwidth, greater than any single server could ever have. As much as possible of the data-intensive parts of a page should come directly from S3, and bypass the bandwidth-limited server.
August 31st, 2008 at 11:36 pm
That makes sense; I’d assumed, however, that the S3->webhost download should be quite fast (I haven’t yet moved to a VPS (planned for next month), and my shared host doesn’t support FUSE, but most transfers between high capacity sites average at least 2 MBps (not Mbps)). I plan to try this with copying just the albums to S3; my derivatives use up relatively little space, and my goal is not to speed up distribution (although I don’t want to slow it down) but to provide a cost-effective way to host all of my pictures.
Nice work!
September 1st, 2008 at 11:04 am
Ok, I just tried your full solution on my home Linux machine, and it worked great!
I think, however, that I’m going to stick with the mounted FS idea. Even though the permissions shouldn’t offer a core.DownloadItem to an unauthorized user, anyone who has one will be able to download it or email it to someone without authorization (in my context, I’m more worried about an accidental release of a private image, and less of a deliberate attempt by a user to try and view images when not logged in). Perhaps running a minimal subset of gallery on Amazon’s EC2 can help that…
In any event, nice!
September 1st, 2008 at 6:01 pm
Mike,
Cool! I’m happy that it’s working for you. If you have any suggestions on how I can make any of the steps clearer, let me know.