Using Amazon S3 with Gallery2

Update, 23 November 2013: I am no longer using this method on this website, and I have not updated these instructions in several years. They should still work, but your mileage may vary.

Note: For Gallery 3, there is apparently a method for storing images on Amazon S3; see here. I have not personally tried it yet.

Here are step-by-step instructions on how I integrated Gallery2 with Amazon S3. This assumes you already have a working Gallery2 installation, and that you have the default rewrite pattern for 'Download Item' in your Gallery2 Rewrites preferences, which is "d/%itemId%-%serialNumber%/%fileName%".

Big warning! Back your stuff up (photos and database) before you try this. I would strongly suggest trying this first on a test gallery to see that it works before moving your real gallery. I did!

I have put some details and discussion on this page.

  1. Download and build s3fs on your server. It uses /etc/mime.types to determine file types by extension when uploading to S3. You need to either edit that file, or make a copy and change the source (s3fs.cpp) to point to your copy, such that the line for JPEGs looks like this:
    image/jpeg                                      dat jpeg jpg jpe

    You need to add the 'dat' extension. Also make sure there are no other mime types with 'dat' associated with them (a quick check is shown below). In my tests, uploading GIFs, PNGs and JPEGs with the extension .dat to S3 with mime type image/jpeg works in all modern browsers.
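
    A quick check, assuming the stock /etc/mime.types location, is to grep for the extension as a whole word; only the image/jpeg line you edited should list it:

    # show every line that mentions the 'dat' extension, with line numbers
    grep -nw 'dat' /etc/mime.types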

    If this task is beyond your skill set, this Gallery2 conversion isn’t something you should attempt! Sorry.

  2. Make an account at Amazon AWS. It costs you nothing to make and maintain an unused AWS account, so even if you don’t get any of this working it won’t cost you anything.
  3. Make a bucket on S3. On Mac OS X I use S3 Browser; there is also a Firefox S3 plugin, and there are Windows S3 tools. You'll need your access info from Amazon AWS, which is a pair of long strings of characters (an access key ID and a secret access key).
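    If you prefer the command line, the s3cmd tool (my suggestion here; nothing later depends on it) can create the bucket right from your server:
    # 'bucketname' is a placeholder; 's3cmd --configure' prompts for your two AWS keys
    s3cmd --configure
    s3cmd mb s3://bucketname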
  4. Make a folder on your server which will be the mount point for your bucket. It doesn’t matter where it is, but it probably shouldn’t be inside your web root (public_html or similar).
  5. If you're lucky, FUSE is already built into your kernel. If it is, a command like this should work (I like to put it in a small shell script):
    #!/bin/sh
    /path/to/s3fs bucketname /path/to/bucket/mountpoint -o default_acl=public-read -o allow_other -o accessKeyId=XXXXXXXXXXXXXXXX -o secretAccessKey=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -o use_cache=/tmp

    The "-o default_acl=public-read" bit is important, as it allows the objects uploaded to S3 to be publicly read, which is the whole point! I'm not 100% sure whether "-o allow_other" is needed. I think it is, but overall it would be better if it weren't needed, since it allows anyone on your machine to edit files in your bucket; I was having problems when it wasn't turned on. A local cache is specified by "-o use_cache=/tmp" and will be made in /tmp/bucketname; it is optional, but a good idea.

    If this command doesn't work, you should either build and install the FUSE kernel module yourself, or explore your options. Depending on your hosting situation you might try asking customer support to do it. I can pretty much guarantee that if you're on a popular shared hosting server (like Dreamhost, Site5 or Host Monster), FUSE won't be installed, but it doesn't hurt to ask!

    You can unmount an s3fs disk with the command 'fusermount -u /path/to/s3fs/mountpoint'.

    Note: if you run s3fs as above, your secret AWS info will be visible to anyone who runs 'ps' on your machine. There is the option of using an external file (see the s3fs page), but it defaults to /etc/passwd-s3fs, which may not be a place you can or want to put it. If you want to use the external-file option, you can edit the source to point to a different file.
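
    If you do use the external file, the format s3fs expects (to the best of my recollection) is a single line of accessKeyId:secretAccessKey, and the file should not be readable by others:

    # assumed format and placeholder path for the s3fs credentials file
    echo 'XXXXXXXXXXXXXXXX:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' > /path/to/passwd-s3fs
    chmod 600 /path/to/passwd-s3fs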

  6. Copy your entire g2data/albums and g2data/cache/derivative folders to your S3 mount point, i.e. cp -r g2data/albums /path/to/bucket/mountpoint/, which will put your albums folder at /path/to/bucket/mountpoint/albums/ (and derivative similarly; see note about this, and the commands spelled out below). Transfer time will depend on your server's network connection, the amount of data and the number of files; there is a large per-file transaction penalty. For example, it took me 4 hours to transfer 26,000 files that totaled 2.7GB, while 5,000 files/4GB took about an hour. (See note on details page.)
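    Here are both copies, plus an optional file-count sanity check (the paths follow the placeholder examples above):
    cp -r g2data/albums /path/to/bucket/mountpoint/
    cp -r g2data/cache/derivative /path/to/bucket/mountpoint/
    # the two counts should match once the copy is complete
    find g2data/albums -type f | wc -l
    find /path/to/bucket/mountpoint/albums -type f | wc -l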
  7. (I’m not sure if this is necessary.) In your php.ini file, or perhaps in your .htaccess file, you’ll want to add your bucket mount point path to your open_basedir. It should look something like this:
    open_basedir = "/path/to/public_html:/path/to/g2data:/tmp:/usr/bin:/path/to/bucket/mountpoint"

    You can make sure this is working by making a php file in your public_html folder with this content:

    <?php
    phpinfo();
    ?>

    If you then open that file in your browser, you can see whether open_basedir has been set correctly (search the page for that word).

  8. At this point it is a good idea to make sure your gallery is still functional. Since the only thing up to here that might affect Gallery2 is the open_basedir, it should still work, but it doesn't hurt to check. From this point on you'll need to check that your gallery still works after every step.
  9. Inside the g2data folder, execute these commands, which make Gallery2 start using the S3 files (a quick verification follows):
    mv albums albums-local
    ln -s /path/to/bucket/albums albums
    mv cache/derivative cache/derivative-local
    ln -s /path/to/bucket/derivative cache/derivative
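    A quick look at the new links confirms they point at the s3fs mount:
    # both entries should show up as symlinks to directories under your mount point
    ls -ld albums cache/derivative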
  10. Check that your gallery works. If not, take a look at your server error log, and make sure you followed all the steps above. Depending on which files s3fs has locally cached, images may load slower than normal because they are coming from S3.
  11. In your cgi-bin directory, put this Python script:
    #!/usr/bin/python
    # Stephen Skory
    # stephenskory@yahoo.com
    # Part of a hack to make Gallery2 work with Amazon's S3 service
     
    # uncomment this below if you want to do some debugging.
    #print "Content-Type: text/plain"
    #print
     
    # User edited values
    AWSBase = 'http://bucketname.s3.amazonaws.com/' # yes trailing slash
    albumsAWSBase = AWSBase + 'albums/' # yes trailing slash
    derivativeAWSBase = AWSBase + 'derivative/' # yes trailing slash
    s3fsBase = '/path/to/bucket/' # yes trailing slash
    galleryBase = '/gallery2/' # your web location for gallery2, minus your domain name
                               # if your gallery is at http://mysite.com/gallery2/main.php
                               # put /gallery2/ here, with the trailing slash
    table_prefix = 'g2_'
    column_prefix = 'g_'
    refreshTime = 180 # days (an integer, so the age comparison below works)
    myHost = 'localhost' # 99% of the time
    myUser = 'database_username'
    myPasswd = 'database_password'
    myDB = 'database_name'
     
    # --- EDIT ABOVE --- #
     
    # get the needed modules
    import cgi, time, sys, re, MySQLdb, urllib2
    #import cgitb; cgitb.enable() # for debugging
    from path import path # for 'touch'ing files, requires path.py: http://www.jorendorff.com/articles/python/path/
     
    # if the image is a full sized image, get the pieces of its pathname
    def fullSizedImage(numbers):
        path = []
        # loop over the numbers in its location
        for number in numbers:
            if (number != '7'): # 7 is the top album
                db.query("""SELECT %spathComponent from %sFileSystemEntity where g_id=%s""" \
                % (column_prefix,table_prefix,number))
                r = db.store_result()
                # store the results in path
                path.append(r.fetch_row()[0][0])
        # get the file name
        db.query("""SELECT %spathComponent from %sFileSystemEntity where g_id=%s""" \
        % (column_prefix,table_prefix,line))
        r = db.store_result()
        for row in r.fetch_row(1):
            path.append(row[0])
     
        # make the path to the full S3 version
        string = ''
        string = "/".join("%s" % part for part in path)
        string = albumsAWSBase + string
        return string
     
    # if the id is for a reduced size image, it's easy to make the S3 location
    def reducedSizeImage(line):
        if (int(line) <= 100): # the id=100 is a weird case I'm not 100% on...
            string = derivativeAWSBase + '0/%s/' % line[0]
            string = string + line.strip() + '.dat'
        else:
            string = derivativeAWSBase + '%s/%s/' % (line[0], line[1])
            string = string  + line.strip() + '.dat'
        return string
     
    # we call this after we've formed a S3 location to see if it's there already.
    # if it is and it's new enough, we return 0 and preserve the S3 version
    # otherwise we return 1 and will create a new local location
    def chooseRefreshedVersion(Location):
        # try to get the file off of S3
        req = urllib2.Request(Location)
        try:
            url_handle = urllib2.urlopen(req)
        # if it's not there, return 1
        except urllib2.HTTPError:
            return 1
        # otherwise, get the file headers
        headers = url_handle.info()
        # get the date of last change for the S3 version
        last_modified = headers.getheader("Last-Modified")
        time_mod = time.mktime(time.strptime(last_modified, "%a, %d %b %Y %H:%M:%S %Z"))
        # now
        time_now = time.mktime(time.localtime())
        # calculate the difference
        timeDiff = time_now - time_mod
        # calculate the refresh time in seconds
        rTimeSecs = 60*60*24*refreshTime
        # if the S3 file is too old, return 1
        # but also we'll 'touch' it to 'refresh' it. If it needs regeneration, Gallery will do it
        # because we're going through the 'official' channel, so an extra touch won't hurt it
        if (timeDiff > rTimeSecs):
            # take off the s3 address part
            localFileName = sb.subn('',Location)[0]
            # add the local stuff
            localFileName = s3fsBase + localFileName
            # create f under path framework
            f = path(localFileName)
            # touch it
            f.touch()
            return 1
        # if the S3 version is not too old, or if it exists on S3, return 0
        return 0
     
    # Establish a database connection
    # yeah, yeah, it's database specific. You get what you pay for!
    db = MySQLdb.connection(host=myHost,
                    user=myUser,
                    passwd=myPasswd,
                    db=myDB)
     
    # Make some regular expressions
    # to find the numbers from parentSequence
    p = re.compile('\d+')
    # to strip off stuff that's not a number
    D = re.compile('\D')
    # for updating refresh time
    sb = re.compile(AWSBase)
     
    # get the values from the request
    inputValue = cgi.FieldStorage()
     
    # get the ID
    if(inputValue.has_key('id')):
        id = inputValue['id'].value
        line = inputValue['id'].value
     
    # get the serial number
    if(inputValue.has_key('sn')):
        sn = inputValue['sn'].value
     
    # get the file name
    if(inputValue.has_key('fn')):
        fn = inputValue['fn'].value
     
    # clean the line of anything but digits, i.e. SQL injection queries
    line = D.subn('',line)[0]
     
    # if line is too long, or there's nothing, let's just exit 'cause something's wrong
    # 20 is an arbitrary number
    if (len(line) > 20 or len(line) == 0):
        sys.exit()
     
    # Make the SQL query
    db.query("""SELECT %sparentSequence FROM %sItemAttributesMap WHERE g_itemId=%s""" \
    % (column_prefix,table_prefix,line))
    r = db.store_result()
    string = ''
    # if the id queried is in ItemAttributesMap, it's a full sized image,
    # on error (as in nothing returned), we know it's a resized version, and make the path
    try:
        numbers = p.findall(r.fetch_row()[0][0])
    except IndexError:
        string = reducedSizeImage(line)
    # here, if we haven't filled 'string' it's a full sized image
    if (len(string) == 0):
        string = fullSizedImage(numbers)
     
    # after making the S3 version, we test to see if it's fresh or not, or even on S3
    if (chooseRefreshedVersion(string)):
        string = '%smain.php?g2_view=core.DownloadItem&g2_itemId=%s&g2_serialNumber=%s&g2_fileName=%s' \
        % (galleryBase,id,sn,fn)
     
    # after all that, we forward the client to the correct address
    print 'Location: %s' % string
    print

    Here is a direct download link for rewrite.py.

    Note that I wrote it in Python and that it's MySQL-specific. I have some notes on that here.

    Here are the lines you may need to change:
    –Line 11: Change bucketname to your bucket.
    –Lines 12+13: The locations of albums and derivative relative to your s3fs mount point root.
    –Line 14: The full path of your s3fs mount point.
    –Line 15: The web-root-relative path to Gallery2.
    –Lines 18+19: Change these to your settings for Gallery2's database structure. They are set to the defaults here.
    –Line 20: This controls how often the rewrite script sends the picture request through the normal core.DownloadItem channel, which should keep resized images fresh. Even if an image doesn't need to be refreshed, it is 'touched' so it appears to be renewed.
    –Lines 21-24: Your MySQL database settings.

    You can uncomment lines 7 & 8 and/or line 30 to debug the script. Don't forget to 'chmod a+xr rewrite.py'.

  12. Download path.py and put it in cgi-bin. This provides a convenient ‘touch’ mechanism I use in rewrite.py.
  13. Test to see if the CGI is working. Navigate to your gallery, pick any image, and look at its address. If it's something like 'http://mysite.com/gallery2/d/93953-3/Kids.jpg', then in your browser you should type 'http://mysite.com/cgi-bin/rewrite.py?id=93953&sn=3&fn=Kids.jpg' and your image should come up from either S3 or your website. If it doesn't, try turning on the debugging stuff in the CGI, and check that you've put in the right values. A command-line check is shown below.
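    You can also check the redirect from the command line; a HEAD request with curl (using the hypothetical URL above) should show a Location header pointing at either S3 or main.php:
    curl -sI 'http://mysite.com/cgi-bin/rewrite.py?id=93953&sn=3&fn=Kids.jpg' | grep -i '^Location'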
  14. In gallery2/modules/rewrite/templates make a backup copy of Htaccess.tpl. You’re supposed to make a ‘local’ directory and edit gallery2/modules/rewrite/templates/local/Htaccess.tpl, but that didn’t work for me, so I edited the original version.

    In whichever version of Htaccess.tpl works for you, you'll want it to look something like this:

    {*
     * $Revision: 15835 $
     * If you want to customize this file, do not edit it directly since future upgrades
     * may overwrite it.  Instead, copy it into a new directory called "local" and edit that
     * version.  Gallery will look for that file first and use it if it exists.
     *}
    # BEGIN Url Rewrite section
    # (Automatically generated.  Do not edit this section)
    <IfModule mod_rewrite.c>
    {if $Htaccess.needOptions}
        Options +FollowSymlinks
    {/if}
        RewriteEngine On
     
        RewriteBase {$Htaccess.rewriteBase}
     
        RewriteCond %{ldelim}REQUEST_FILENAME{rdelim} -f [OR]
        RewriteCond %{ldelim}REQUEST_FILENAME{rdelim} -d [OR]
        RewriteCond %{ldelim}REQUEST_FILENAME{rdelim} gallery\_remote2\.php
        RewriteCond %{ldelim}REQUEST_URI{rdelim} !{$Htaccess.matchBaseFile}$
        RewriteRule .   -   [L]
     
    {foreach from=$Htaccess.rules item=rule}
    {if !empty($rule.conditions)}
    {foreach from=$rule.conditions item="condition"}
        RewriteCond %{ldelim}{$condition.test}{rdelim} {$condition.pattern}{if !empty($condition.flags)}   [{$condition.flags|@implode:","}]{/if}
     
    {/foreach}
    {/if}
        {if $rule.substitution != '/gallery2/main.php?g2_view=core.DownloadItem&g2_itemId=%1&g2_serialNumber=%2&g2_fileName=%3'}
        RewriteRule .   {$rule.substitution}{if !empty($rule.flags)}   [{$rule.flags|@implode:","}]{/if}
        {else}
        RewriteRule .    /cgi-bin/rewrite.py?id=%1&sn=%2&fn=%3 [QSA,L]
        {/if}
     
    {/foreach}
    </IfModule>
     
    # END Url Rewrite section

    The lines that are different are 30, and 32 through 34. This changes the one rewrite rule for retrieving images, which is what allows us to S3-ize the gallery. Make sure the right-hand side of the if statement on line 30 matches where your Gallery2 install is.

  15. Now go to your gallery site admin, URL Rewrite preferences, and click 'save' at the bottom. Take a look at your gallery2/.htaccess file; it should look like this:
    # BEGIN Url Rewrite section
    # (Automatically generated.  Do not edit this section)
    <IfModule mod_rewrite.c>
        RewriteEngine On
     
        RewriteBase /gallery2/
     
        RewriteCond %{REQUEST_FILENAME} -f [OR]
        RewriteCond %{REQUEST_FILENAME} -d [OR]
        RewriteCond %{REQUEST_FILENAME} gallery\_remote2\.php
        RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
        RewriteRule .   -   [L]
     
        RewriteCond %{THE_REQUEST} /gallery2/v/([^?]+)/slideshowapplet\.html(\?.|\ .)
        RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
            RewriteRule .   /gallery2/main.php?g2_view=slideshowapplet.SlideshowApplet&g2_path=%1   [QSA,L]    
        RewriteCond %{THE_REQUEST} /gallery2/d/([0-9]+)-([0-9]+)/([^/?]+)(\?.|\ .)
        RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
            RewriteRule .    /cgi-bin/rewrite.py?id=%1&sn=%2&fn=%3 [QSA,L]
     
        RewriteCond %{THE_REQUEST} /gallery2/v/([^?]+)(\?.|\ .)
        RewriteCond %{REQUEST_URI} !/gallery2/main\.php$
            RewriteRule .   /gallery2/main.php?g2_path=%1   [QSA,L]    
    </IfModule>
     
    # END Url Rewrite section

    Line 19 is the difference between your typical gallery setup and this S3-ized gallery.

  16. Now go to your gallery and see if it works. If it does, you've successfully integrated S3 with Gallery2! You may have to clear the browser's cache, or try a different browser, to make sure that images are coming from S3. Installing the Live HTTP Headers plugin for Firefox helps you verify where things are coming from. If things aren't coming from S3, or your gallery is broken, take a look at your error.log, turn on debugging in rewrite.py, or make sure your gallery2/.htaccess file looks the way it should.
  17. You should add a crontab entry to delete files periodically from your s3fs local cache (if you’re using one), otherwise you could fill up the disk. Put something like this in your crontab:
    0       0       *       *       *       find /tmp/bucketname/ \! -type d -atime +3 -exec rm -f {} \;

    The line above will remove every file (not directory) from /tmp/bucketname that hasn’t been accessed in three days, and it checks at midnight every night.

  18. You should also figure out how to automatically mount your s3fs disk at machine boot. If you have root access, you can put a line like this into /etc/rc.local:
    sudo -u username /path/to/s3fs/mountscript.sh

    The username is whatever you're using for your website. If you're using suexec and/or userdirs, it would be that username; if you're using the web daemon (www-data or similar), it should be that user. If you use the '-o allow_other' option with s3fs, it actually doesn't matter which user mounts the disk, but if you want to use 'fusermount -u' as non-root, I think you have to mount s3fs as that same user.

    Another option is to write a cron job that checks whether s3fs is mounted (perhaps by trying to read a file on S3, or by grepping the mount table). If it's not mounted, it re-runs the mount script.
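
    Here is a rough sketch of that cron-job idea, using the grep approach (the paths are the same placeholders used above):

    #!/bin/sh
    # if the s3fs mount point does not appear in the mount table, run the mount script again
    if ! grep -qs '/path/to/bucket/mountpoint' /etc/mtab; then
        /path/to/s3fs/mountscript.sh
    fi

    A crontab entry like '*/5 * * * * /path/to/check-s3fs.sh' would run the check every five minutes.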

  19. That’s pretty much it. You can now remove your albums-local and derivative-local folders to free up some disk space. If you have problems, post a comment below and I’ll see if I can help you.
  20. If you ever want to turn off the S3-directed rewrites, restore the Htaccess.tpl file to the original, then go to Gallery2->Site Admin->URL Rewrite and hit 'save.' All images will then go through your server again.
  21. Bonus! This process works fine with WPG2, a popular way to integrate Gallery2 into WordPress. And all your old links to photos will still work, which is super cool.

Responses to “Using Amazon S3 with Gallery2”

  1. Thanks for the wonderful writeup. This is a great idea. I’m going to try it out on my own site.

    I’ve got a scenario at home where I host a personal gallery site for my family on a limited bandwidth server. I like having the local master copy of all the pictures in my house under my control, but I am also limited bandwidth wise.

    I was wondering if you could set up a scenario like so:

    local directory – /gallery2/albums
    remote mount – /s3fs/gallery2/albums

    Let Gallery continue to use /gallery2/albums as its content location, but have an rsync job keeping /gallery2/albums & /s3fs/gallery2/albums in sync. Then use the rewrite rule to have clients access the /s3fs/gallery2/albums version.

    The advantage would be new uploads go to the local server first & you still have that local master copy, but the clients can access the pictures via s3.

    Do you think there would be any problems doing it that way?

  2. I don’t see any problem with that right now. Since local space isn’t an issue, it’s a fine idea keeping a local copy. I’d also consider putting cache/derivative on S3 as many requests come from that directory.
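
    For what it's worth, an untested sketch of the sync job you describe might be as simple as a one-way rsync from the local master copy up to the s3fs mount (the paths are yours from above):

    # --size-only avoids re-uploading files just because s3fs reports different timestamps
    rsync -rv --size-only /gallery2/albums/ /s3fs/gallery2/albums/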

  3. Mike says:

    Hi,

    Am I missing something obvious here? Aside from the double bandwidth hit, why not just mount your g2data/albums on s3 (using the S3FS linked above), and be done with it in one step? What’s the problem with that method?

  4. There's nothing wrong with keeping your data on S3 and using s3fs so that all your data goes through the server. However, I've used Gallery in this exact setup as part of testing, and it's noticeably and frustratingly slower. The whole image has to be downloaded by the server before it can start relaying it to the web client (unless, of course, the image is already cached on the server). As you browse the gallery, pages will render in your web browser, but the images will appear several seconds later, with the delay depending on the size of the image. It's a big cut in the responsiveness of your gallery.

    The idea is that Amazon S3 has a massive 'cloud' setup with huge bandwidth, greater than any single server could ever have. As much of the data-intensive part of a page as possible should come directly from S3, bypassing the bandwidth-limited server.

  5. Mike says:

    That makes sense; I’d assumed, however, that the S3->webhost download should be quite fast (I haven’t yet moved to a VPS (planned for next month), and my shared host doesn’t support FUSE, but most transfers between high capacity sites average at least 2 MBps (not Mbps)). I plan to try this with copying just the albums to S3; my derivatives use up relatively little space, and my goal is not to speed up distribution (although I don’t want to slow it down) but to provide a cost-effective way to host all of my pictures.

    Nice work!

  6. Mike says:

    Ok, I just tried your full solution on my home Linux machine, and it worked great!

    I think, however, that I’m going to stick with the mounted FS idea. Even though the permissions shouldn’t offer a core.DownloadItem to an unauthorized user, anyone who has one will be able to download it or email it to someone without authorization (in my context, I’m more worried about an accidental release of a private image, and less of a deliberate attempt by a user to try and view images when not logged in). Perhaps running a minimal subset of gallery on Amazon’s EC2 can help that…

    In any event, nice!

  7. Mike,

    Cool! I’m happy that it’s working for you. If you have any suggestions on how I can make any of the steps clearer, let me know.

  8. Dave says:

    The jorendorff website, home of the path.py script, has been offline for the last couple of days.

    I think this link is to the same script, at a different location.

    http://pypi.python.org/pypi/path.py/

    If I’m wrong, of course, please delete this post.

  9. That is indeed the right script. Thanks for the link.

    I could put a copy of the script on my website, but I was trying to keep credit where it’s due.

  10. Fajju says:

    Has anyone tried this with other network storages? Mosso?

  11. I have no knowledge of whether someone has or hasn't. It does seem perfectly possible; there is a Mosso FUSE client, which would do the same job here as s3fs. You just need a FUSE implementation for the other storage service.

  12. Genia Bezman says:

    Hi.
    I've checked out Gallery's source code and saw that images are output to the browser in only two locations.
    One is gallery2/modules/core/DownloadItem.inc in the function sendFile (which is the usual template used to output images).
    The other is gallery2/modules/core/classes/Gallery.class in the function fastDownload.
    The idea I had is to replace the image-outputting code (which sends HTTP headers and reads the file from disk while writing it to the browser) with a simple Location header pointing at the Amazon URL of the image. This way you get all the checks that Gallery2 does for you, but the end result is the image being transferred from S3.
    This is of course only theory for now, but I'll try it later today and see if it works out.
    If (that's a huge if) it indeed does work, then there is no longer any need to send HEAD requests to Amazon to check if the file is in place (or to query the database, for that matter), since by the time you reach the DownloadItem page, you're already outputting something from disk. So get rid of the disk and tell the browser to get it elsewhere :)
    Should work for everything, including derivative files (they are created before DownloadItem's sendFile method).

    Thanks for the great post (and the nicely drawn diagram on the other detailed page)!

  13. Hi Genia,

    I’m open to any ideas to make this thing better, of course. Good luck!

  14. Genia Bezman says:

    Good news!
    This method works (both with s3fs and with jungledisk). I’m currently experimenting with creating time-limited URLs for accessing the files (which would make private items actually private). Amazon S3 supports this (don’t remember what they call it, it’s just a URL that expires after a set amount of time, signed using a digest generated based on your amazon public and private keys) and I found a PHP class that gives a nice interface for creating such URLs (http://undesigned.org.za/2007/10/22/amazon-s3-php-class)

    As soon as I have something completely working, I’ll publish the code and give a URL to a demo site.

    This method should lighten the load on the server considerably, since the rewrite.py file and all the associated hackery need not be called anymore; requests go straight to Amazon's servers.

    If anyone else is interested in this solution, please tell me :)

  15. Mike Miller says:

    Genia,

    That sounds really interesting. Right now, I’m using s3, but mounted via S3FS (my derivatives are local, but the full sizes are on s3), since I want the image firewall. If there was some way to access amazon directly while preserving some security, that would be great (my security needs are more along the lines of not leaking URLs through referer headers and less for actual “security”, so a temporary URL that expired after a few minutes would be plenty).

    Looking forward…

  16. Genia Bezman says:

    Hi.
    Further testing shows this method works extremely well (and really really really fast).

    The fact that images are sent through amazon’s server (since your server just redirects the user to the temporary URL) makes things much faster. If your server is getting hammered, you won’t be serving lengthy requests (images might be big, after all).

    Currently my implementation is very hackish (I'm ashamed to show it; strange variables everywhere and strange files lying around), but the concept seems to work really well.

    I really wish I could make this into a gallery2 module, but it seems that one of the functions I’d need to modify is not modifiable through the module system, which is a problem. (Doesn’t seem Gallery.class can be changed in custom modules).

    If anyone has ideas, please tell me.

    Mike, if you are interested in what I'm doing, feel free to contact me through email (genia4@gmail.com). I'm also available on Google Talk (same address). I can explain in more detail what I did (and provide code snippets).

  17. Steve,

    I think your work is great. We do use access control for images; this is the right way for us to use S3.

    I'm rewriting your solution to work with MediaWiki (MW). I'm also looking to see what other people may already have done. Your code and instructions are great.

    Like Gallery2, MW creates thumbnails for every image. We have this configured to be done when the image is originally stored. People can specify the size of the thumbnails in their image tags. A thumbnail in that sense is just a page-specific scaling of the image and can get pretty large. Thumbnails are rebuilt if none is available for a page when it’s rendered.

    By using your model, are Gallery’s thumbnails also being stored on s3? If not, did you mess around with them to see if it made a difference?

    I’m not so cheap that I’m trying to move the diskspace for the thumbnails to s3 because of cost. I’m more interested in getting as much storage out of the local box so that I can also migrate my php apps to ec2.

    I’m experimenting with this on a local set of boxes today to see how it works. In MW, the storage of thumbnails is done in the same base directory as the fat image files. I want to see if I can just remote the whole directory. We have a few dozen separate MW instances to relocate. It will be a lot cleaner to have the same model with s3 in the mix.

    Thanks again for your work and thanks, in advance, for any comments.

  18. Bill,

    In Gallery2, the thumbnails are functionally treated the same as all the other re-sized copies of the image. So yes, they are on S3.

    If you’re moving a whole directory to S3, and the directory structure is more straightforward than the image referencing scheme for Gallery2, you may not even need a cgi script to redirect requests. You may be able to use only .htaccess URL rewrites.

    I am not familiar with MediaWiki, so I don't know how much guidance I can give there. I wish you luck!

  19. [...] another solution from Stephen Skory to use AWS S3 for Gallery2. His version seems to be sophisticated but there are quite some steps to [...]

  20. Werner says:

    This is a real interesting project. I was looking for a lightweight solution without database accesses and an easy setup. I tried around and found a solution to put my images on S3 and on Cloudfront.

    The modifications to make on the web server are minimal:
    - only modify 2 lines in Gallery2's .htaccess

    But there are some drawbacks, and it's only good for galleries that are not updated by several people.

    Have a look here:

    http://www.flughafen-kurier.ch/blog/2009/07/24/how-to-use-s3-and-cloudfront-for-gallery2/

    Comments and improvements are welcome.

  21. Werner,

    It seems to me that your solution works only when the original webserver has enough disk space for all your media files. When that is the case, and one is willing to keep the S3/CloudFront copies up to date manually (or perhaps with an rsync script), your method will work.

    However, if the situation is like mine, where the webserver doesn’t have enough disk space for all your files, I don’t think it will work.

    That being said, I very much appreciate the link and the comment. Thanks!

  22. [...] using Amazon AWS (EC2, Elastic MapReduce, S3), I already use S3 for the blogs file hosting (Using s3fs and a perl script to redirect some queries to it), and really like the [...]

  23. [...] Using Gallery2 with Amazon’s S3 service (link) [...]

  24. Seamus Ryan says:

    Stephen,

    I was wondering if you’ve had a chance to try this solution with the Gallery3 beta.

    I've been looking for a way to host my gallery2 files on S3 for a long time, but I am also keen to upgrade to Gallery3, especially as it appears to be reaching a Beta4 or RC1 stage.

    Is your S3 solution compatible with Gallery3?

    Thanks.

  25. Seamus,

    I have not had a chance to play with G3, so I don't know how well this method would work with it. Of course it would have to be modified, but I don't know in what way. It is also possible that G3 changed things in such a way that an S3 plugin would be easier to write.

    I’m sorry I can’t be more helpful!

  26. przemek says:

    hi,
    thanks for your work Stephen, this is great!
    Unfortunately I have a problem with .flv files located on S3; it works for images, but not for movies (.flv file types).
    I think all .flv files are being fetched from 'my server', but I can see a connection established between 'my server' and the S3 service.
    But NO connection between the browser on the workstation/client and S3…
    So the .flv files are on S3, and when one wants to view them, they are fetched by Apache and only then served to the browser.

  27. przemek,

    Have you looked at the HTTP connections in detail? You might try this plugin in Firefox:

    https://addons.mozilla.org/en-US/firefox/addon/3829

    Also, you should make sure that .flv is correctly set in /etc/mime.types. I haven’t ever messed with .flv file types in Gallery, so I’m sorry I can’t be more helpful! Good luck!

  28. przemek says:

    hi,
    thanks for your answer. Finally I commented out these lines:
    # after making the S3 version, we test to see if it's fresh or not, or even on S3
    #if (chooseRefreshedVersion(string)):
    #    string = '%smain.php?g2_view=core.DownloadItem&g2_itemId=%s&g2_serialNumber=%s&g2_fileName=%s' \
    #    % (galleryBase,id,sn,fn)

    and it seems to be working now… With chooseRefreshedVersion enabled it was changing the S3 link back into my server's one, don't know why…

  29. sprite1 says:

    THANKS :)

  30. Tom Chiverton says:

    BTW, you need to escape the S3 URL correctly, at least here in chooseRefreshedVersion():

    req = urllib2.Request(urllib.quote(Location, safe=":/"))

    Add urllib to the imports, obviously.

