How to Get Started with Amazon CloudFront and Amazon S3
By Matthew Sacks

S3 and CloudFront

Amazon S3 is Amazon’s cloud storage service. It stores files in “the cloud” through an API that can be used from Java, C#, Perl, PHP, and other languages.

Why do I care?

S3 is a viable, inexpensive option for storing files off site. It is useful for off-site backups as well as for running applications in the cloud. S3 is billed on a pay-per-use basis and offers a simple way to use an enterprise storage infrastructure without paying hundreds of thousands or millions of dollars for a datacenter.

CloudFront is Amazon’s content delivery network. It takes files stored in S3 and distributes them across Amazon’s global network of edge servers. The advantage of using CloudFront is that wherever a request for a web site or file originates, the response is served from an edge server close to the requester, which improves response times.

This tutorial assumes that the reader already has an Amazon S3 and Amazon CloudFront account available here.

Getting started with S3

S3 is simply an API layer that allows for storing files in Amazon’s cloud. To interact with S3, tools are needed to store, retrieve, and manage files there. The simplest way to manage files in S3 that I have found is a tool called s3cmd. There are other tools out there, and a developer can write their own utilities for interacting with the S3 APIs and embed them in an application; however, s3cmd is simple and easy to use, which makes it my favorite.

Using s3cmd

The easiest way to use S3 is with the command-line S3 tools available at http://s3tools.logix.cz/s3cmd. If there isn’t an application that needs to interact programmatically with the AWS API, then S3 Tools are the way to go. They are designed for off-site backups and other manual data storage tasks on S3. Since version 0.9.9, s3cmd also supports CloudFront, so no other tools are needed to set up a distribution. For more information on how to use s3cmd for CloudFront, please see the s3cmd CloudFront Howto.

Setting up s3cmd

Step 1) Configure s3cmd to work with an AWS account

gfizzle:s3cmd-0.9.8.4 msacks$ s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3
Access Key: 9999999999999999
Secret Key: 9999999999999999

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: MyPassword
Path to GPG program [/usr/local/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can’t be used if you’re behind a proxy
Use HTTPS protocol [No]:

On some networks all Internet access must go through a HTTP proxy.
Try setting it here if you can’t conect to S3 directly
HTTP Proxy server name:

New settings:
Access Key: 9999999999999999
Secret Key: 9999999999999999
Encryption password: 0000
Path to GPG program: /usr/local/bin/gpg
Use HTTPS protocol: False
HTTP Proxy server name:
HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait…
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works…
Success. Encryption and decryption worked fine :-)

Save settings? [y/N] y
Configuration saved to ‘/Users/msacks/.s3cfg’

Step 2) Creating an S3 Bucket
Before any files can be stored, a bucket must be created using the following command:

gfizzle:s3cmd-0.9.8.4 msacks$ s3cmd mb s3://large-images-bitsource
Bucket ‘large-images-bitsource’ created

The list of buckets can be shown using the following command (pass a bucket URI such as s3://large-images-bitsource to list that bucket’s contents instead):

gfizzle:s3cmd-0.9.8.4 msacks$ s3cmd ls
2008-12-29 06:56  s3://large-images-bitsource

Step 3) Upload Files into the bucket

To upload a single file:

s3cmd put <OPTIONS> <FILENAME> <BUCKET>

For example:

s3cmd put IMG_0113.JPG s3://large-images-bitsource

To upload multiple files:
When using s3cmd to upload files to the CDN, you can set them as public at upload time, rather than going through and setting them as public manually after they are uploaded. By default, files that are uploaded without access permissions specified are set as private.

gfizzle:large-images msacks$ for i in *; do s3cmd put --acl-public --guess-mime-type "$i" "s3://large-images-bitsource/$i"; done
File ‘IMG_0113.JPG’ stored as s3://large-images-bitsource/IMG_0113.JPG (1092680 bytes in 20.5 seconds, 52.05 kB/s) [1 of 1]
File ‘IMG_0114.JPG’ stored as s3://large-images-bitsource/IMG_0114.JPG (2593819 bytes in 52.5 seconds, 48.20 kB/s) [1 of 1]
File ‘IMG_0115.JPG’ stored as s3://large-images-bitsource/IMG_0115.JPG (229671 bytes in 2.3 seconds, 97.50 kB/s) [1 of 1]
File ‘IMG_0116.JPG’ stored as s3://large-images-bitsource/IMG_0116.JPG (4154633 bytes in 86.3 seconds, 47.00 kB/s) [1 of 1]
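
The upload loop above can also be wrapped in a small reusable function. This is only a sketch: the function names s3_dest and upload_public and the S3CMD override variable are my own hypothetical helpers, not part of s3cmd. Only the put, --acl-public and --guess-mime-type options shown earlier are assumed to exist.

```shell
# Build the destination URI for a local file: s3://<bucket>/<basename>.
# (s3_dest is a hypothetical helper name, not an s3cmd feature.)
s3_dest() {
    bucket=$1
    file=$2
    printf 's3://%s/%s\n' "$bucket" "$(basename "$file")"
}

# Upload every regular file in a directory as publicly readable.
# Setting S3CMD=echo prints the commands instead of running them (dry run).
upload_public() {
    bucket=$1
    dir=${2:-.}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue    # skip subdirectories and unmatched globs
        ${S3CMD:-s3cmd} put --acl-public --guess-mime-type "$f" "$(s3_dest "$bucket" "$f")"
    done
}
```

For example, running upload_public large-images-bitsource . from inside the image directory uploads each file there. Quoting the file name keeps names containing spaces intact, which a loop over the output of ls does not.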

Amazon CloudFront

Once files are in S3, they can be distributed to the Amazon CDN with CloudFront. There are a few methods readily available for managing files with CloudFront.

Before a distribution can be created with CloudFront, a publicly readable S3 bucket must exist to serve as its origin. Make sure to choose a DNS-friendly bucket name; otherwise, attempting to create a distribution with cfcurl.pl (covered below) will fail with an error like the following:

<ErrorResponse xmlns="http://cloudfront.amazonaws.com/doc/2008-06-30/"><Error><Type>Sender</Type><Code>InvalidArgument</Code><Message>The parameter Origin does not refer to a valid S3 bucket.</Message><Detail/></Error><RequestID>1b423e6-d748-47bd-8774-ae3d32c5b2</RequestID></ErrorResponse>
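
A bucket name can be sanity-checked before the bucket is created. The sketch below is a rough check only. The rules it encodes (3 to 63 characters; lowercase letters, digits, dots and hyphens; a letter or digit at each end) are my assumption of what “DNS-friendly” means here, and is_dns_friendly is a hypothetical helper name.

```shell
# Return 0 if the bucket name looks DNS-friendly, 1 otherwise.
# The rules below are assumptions based on common hostname conventions.
is_dns_friendly() {
    # 3-63 characters; lowercase letters, digits, dots, hyphens;
    # must start and end with a letter or digit
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$' || return 1
    # reject empty labels and labels that start or end with a hyphen
    case $1 in
        *..*|*.-*|*-.*) return 1 ;;
    esac
    return 0
}
```

For example, is_dns_friendly large-images-bitsource succeeds, while a name such as Large_Images is rejected because of the capital letters and the underscore.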

To create a publicly readable S3 bucket, make sure files are set to public when uploaded. The following s3cmd command can be used to upload files with public permissions to an S3 bucket:

s3cmd put --acl-public --guess-mime-type file.jpg s3://large-images-bitsource/file.jpg

As each file is stored in the S3 bucket a public URL will be returned to the command line:

File ‘IMG_0113.JPG’ stored as s3://large-images-bitsource/IMG_0113.JPG (1460099 bytes in 29.1 seconds, 49.04 kB/s) [1 of 1]
Public URL of the object is: http://large-images-bitsource.s3.amazonaws.com/IMG_0113.JPG

This URL does not serve the file through the CloudFront CDN; the file is served from a single static location. A few more steps are needed to distribute files globally through the CloudFront CDN.
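
The public S3 URL follows directly from the s3:// URI, as the s3cmd output above suggests. A minimal sketch, assuming the http://bucket.s3.amazonaws.com/key pattern shown in that output (s3_public_url is a hypothetical helper name):

```shell
# Convert s3://bucket/key into the public (non-CDN) S3 URL.
# Assumes the URI contains both a bucket and a key.
s3_public_url() {
    uri=${1#s3://}        # strip the scheme, leaving bucket/key
    bucket=${uri%%/*}     # text before the first slash
    key=${uri#*/}         # text after the first slash
    printf 'http://%s.s3.amazonaws.com/%s\n' "$bucket" "$key"
}
```

This is handy for checking that an object is publicly readable before a distribution is created for its bucket.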

Serving files from the CloudFront Content Delivery Network

To serve files through the CloudFront CDN there are a number of requirements that must be met:

1)    S3 buckets must be present to be served through CloudFront
2)    Contents of the bucket which are to be served must have public access permissions
3)    A CloudFront distribution must be created

Amazon and the development community around Amazon Web Services have made APIs available so that developers can create their own tools or integrate AWS with their applications. For standard users who simply want to upload files to the CDN manually, there are many ready-made utilities for working with S3 and CloudFront. Two of the more common ones are covered here: the Windows CloudFront GUI tool and cfcurl.pl, a command-line utility for working with CloudFront. cURL can also be used directly against the CloudFront APIs, although this can be rather tedious and time consuming.

Managing CloudFront using the Windows GUI tool

Step 1) Download the CloudFront GUI manager from http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1855

Step 2) Enter your account name which can be any arbitrary name, and then your access key id and secret access key, which are available under the “Access Identifiers” section of your AWS account.

Step 3) Once logged in, you can manage S3 buckets, and make files public or private. Files must be set to public access within S3 before serving them from the CloudFront CDN.

To make a file public within an S3 bucket, select the bucket in the CloudFront Manager and click the Lock Icon to set object permissions. If the desired object is already locked, set its access permissions to public. Now it can be served from the CloudFront CDN.

Step 4) Serve files through the CloudFront CDN

Click the “Amazon CloudFront” tab in the CloudFront Manager.
In order to serve files, we must first create a distribution.

Step 5) Click the create new distribution icon, then select the bucket to serve through the CloudFront CDN from the “Origin/Bucket” drop-down menu. Check the enabled box to make sure that the distribution is serve-able through the CDN and click Save. A message box should appear if the changes were successful.

A domain name should now be assigned to that bucket that will appear under the “Distributions” tab, at which point files will begin to propagate through the CDN.

Once propagation is complete, the status message box will show the status of file propagation throughout the CDN.

Now the files in the bucket can be accessed through the CDN using the domain name assigned to the distribution, following this convention: http://<distribution domain name>/filename.file

For example, if I have a file called IMG_0113.JPG in a bucket called large-images-bitsource, and the distribution was assigned the domain name d3f9whzii7ukkm.cloudfront.net, the file will now be accessible from the CloudFront CDN using the following URL:
http://d3f9whzii7ukkm.cloudfront.net/IMG_0113.JPG

Using the cfcurl.pl Perl script to manage CloudFront

The main function of cfcurl.pl is to call curl with the proper arguments and HTTP headers for working with the CloudFront API. There aren’t many (or any) examples of how to use cfcurl.pl by itself, and using it still requires referring to the CloudFront API guide. A basic tutorial on creating a CloudFront distribution is given here to get you started.

This method is for the more advanced user and assumes they are familiar with the curl utility (http://curl.haxx.se/). There are plenty of good tutorials on using curl on the Internet already, so if you want to familiarize yourself with it I suggest the curl mailing lists, which are free (http://curl.haxx.se/mail/).

Step 1) Download cfcurl.pl

Amazon provides cfcurl.pl, a Perl script used in conjunction with curl, at http://docs.amazonwebservices.com/AmazonCloudFront/latest/GettingStartedGuide/. The simple-to-use Windows GUI tool isn’t exactly ideal for programmers and system engineers (except lazy ones), so the script is the better fit for them.

Step 2) Create an .aws-secrets file in the user’s home directory

Example:
%awsSecretAccessKeys = (
    # primary account
    primary => {
        id => '',
        key => '',
    },
);

NOTE: Any number of access keys can be stored in a .aws-secrets file.
NOTE: Make sure that the .aws-secrets file is readable ONLY by the user executing the command, otherwise it will throw the following error message:

gfizzle:~ msacks$ I refuse to read your credentials from /Users/msacks/.aws-secrets as this file is readable by, writable by or owned by someone else. Try chmod 600 /Users/msacks/.aws-secrets at ./cfcurl.pl line 66.

[1]+  Exit 255                ./cfcurl.pl --keyname=primary https://cloudfront.amazonaws.com/2008-06-30/distribution

Simply change the permissions as the error message suggests, and that should solve the problem:
gfizzle:~ msacks$ chmod 600 .aws-secrets
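
The permission check can be scripted so it runs before every cfcurl.pl call. This is a sketch under my own assumptions: secure_secrets is a hypothetical helper, and it compares the mode string from ls rather than using stat, because stat options differ between Linux and Mac OS X.

```shell
# Make sure the secrets file exists and is readable/writable by its owner only.
# Defaults to ~/.aws-secrets; another path can be passed as the first argument.
secure_secrets() {
    f=${1:-$HOME/.aws-secrets}
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    mode=$(ls -ld "$f" | cut -c1-10)    # e.g. -rw-r--r--
    if [ "$mode" != "-rw-------" ]; then
        chmod 600 "$f"                  # owner read/write, nothing for anyone else
    fi
}
```

Calling secure_secrets with no arguments tightens the permissions only when they are looser than owner read/write, and reports an error if the file is missing.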

Step 3) Creating the XML configuration file
cfcurl.pl requires an XML file containing information about the distribution that is to be created.

Here is an example XML file for creating a distribution with cfcurl.pl, called large-images-dist.xml:

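<?xml version="1.0" encoding="UTF-8"?>
<DistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2008-06-30/">
  <Origin>large-images-bitsource.s3.amazonaws.com</Origin>
  <CallerReference>20080111165200</CallerReference>
  <Comment>Large Images Distribution</Comment>
  <Enabled>true</Enabled>
</DistributionConfig>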

All of the tags provided should stay the same except the <Origin> and <CallerReference> tags. The <Origin> tag takes the format bucketname.s3.amazonaws.com.
<CallerReference> is usually a date stamp in the format YYYYMMDDHHMMSS and is used to create a unique identifier for the request.
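
The configuration file can be generated rather than typed by hand. The sketch below makes several assumptions: the function names are hypothetical, the element names other than <Origin> and <CallerReference> are my recollection of the 2008-06-30 API, and the namespace is the one that appears in the error response earlier in this article. Check it against the CloudFront API guide before relying on it.

```shell
# Produce a CallerReference as a UTC date stamp: YYYYMMDDHHMMSS (14 digits).
caller_reference() {
    date -u +%Y%m%d%H%M%S
}

# Write a distribution config for <bucket> with <comment> to <output file>.
# Element names besides Origin and CallerReference are assumptions.
write_dist_config() {
    bucket=$1
    comment=$2
    out=$3
    cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<DistributionConfig xmlns="http://cloudfront.amazonaws.com/doc/2008-06-30/">
  <Origin>${bucket}.s3.amazonaws.com</Origin>
  <CallerReference>$(caller_reference)</CallerReference>
  <Comment>${comment}</Comment>
  <Enabled>true</Enabled>
</DistributionConfig>
EOF
}
```

For example, write_dist_config large-images-bitsource "Large Images Distribution" large-images-dist.xml writes a file like the one above with a fresh CallerReference each time.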

Step 4) Create the distribution
Execute the following command to create the distribution:

./cfcurl.pl --keyname <KEYNAME> -- -i -X POST -H "Content-Type: text/xml; charset=UTF-8" --upload-file <FILENAME.xml> https://cloudfront.amazonaws.com/2008-06-30/distribution

For example, to create a distribution from large-images-dist.xml, execute the following command:

./cfcurl.pl --keyname primary -- -i -X POST -H "Content-Type: text/xml; charset=UTF-8" --upload-file large-images-dist.xml https://cloudfront.amazonaws.com/2008-06-30/distribution

A response like the following should appear, with an XML body indicating that the distribution has been created:

HTTP/1.1 100 Continue

HTTP/1.1 201 Created
ETag: E1RV04P3KJLNB0
Location: https://cloudfront.amazonaws.com/2008-06-30/distribution/E305ZEE4P6JK87
Content-Type: text/xml
Transfer-Encoding: chunked
Date: Sun, 11 Jan 2009 23:34:46 GMT
Server: CloudFront
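
<Distribution xmlns="http://cloudfront.amazonaws.com/doc/2008-06-30/">
  <Id>E305ZEE4P6JK87</Id>
  <Status>InProgress</Status>
  <LastModifiedTime>2009-01-11T23:34:46.811Z</LastModifiedTime>
  <DomainName>d3f9whzii7ukkm.cloudfront.net</DomainName>
  <DistributionConfig>
    <Origin>large-images-bitsource.s3.amazonaws.com</Origin>
    <CallerReference>20080111165200</CallerReference>
    <Comment>Large Images Distribution</Comment>
    <Enabled>true</Enabled>
  </DistributionConfig>
</Distribution>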
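
The distribution ID and the cloudfront.net domain name are the two values worth keeping from this response. A rough sketch for pulling a value out of the XML with sed follows. xml_value is a hypothetical helper, the tag names Id and DomainName are assumed, and a real XML parser would be more robust.

```shell
# Print the text of the first <tag>...</tag> element found on standard input.
# Works only for simple elements without nested tags or attributes.
xml_value() {
    tag=$1
    sed -n "s:.*<${tag}>\([^<]*\)</${tag}>.*:\1:p" | head -n 1
}
```

For example, piping the saved response body through xml_value DomainName prints the host name to use in CDN URLs, and xml_value Id prints the distribution ID.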

To check on the status of the distribution execute the following command:
gfizzle:~ msacks$ ./cfcurl.pl --keyname primary -- https://cloudfront.amazonaws.com

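<DistributionList xmlns="http://cloudfront.amazonaws.com/doc/2008-06-30/">
  <MaxItems>100</MaxItems>
  <IsTruncated>false</IsTruncated>
  <DistributionSummary>
    <Id>E305ZEE4P6JK87</Id>
    <Status>Deployed</Status>
    <LastModifiedTime>2009-01-11T23:34:46.811Z</LastModifiedTime>
    <DomainName>d3f9whzii7ukkm.cloudfront.net</DomainName>
    <Origin>large-images-bitsource.s3.amazonaws.com</Origin>
    <Comment>Large Images Distribution</Comment>
    <Enabled>true</Enabled>
  </DistributionSummary>
</DistributionList>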

Notice that the Distribution <Status> tag shows “Deployed” for the newly created distribution. This means that files in the respective S3 bucket are now being served from the CloudFront CDN.
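
Deployment can take a while, so the status check is worth automating. The sketch below polls a command until its output contains a Deployed status. wait_until_deployed and the TRIES and DELAY variables are hypothetical names of my own, and the command to run is passed in as arguments.

```shell
# Run the given command repeatedly until its output reports a Deployed status.
# TRIES (default 30) and DELAY in seconds (default 60) control the polling.
wait_until_deployed() {
    tries=${TRIES:-30}
    delay=${DELAY:-60}
    while [ "$tries" -gt 0 ]; do
        if "$@" | grep -q '<Status>Deployed</Status>'; then
            return 0                    # distribution is live
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"              # wait before asking again
        fi
    done
    return 1                            # gave up: still not deployed
}
```

A possible invocation, using the URL from the Location header returned when the distribution was created, is wait_until_deployed ./cfcurl.pl --keyname primary -- https://cloudfront.amazonaws.com/2008-06-30/distribution/E305ZEE4P6JK87. Polling a single distribution avoids a false positive from another distribution in the list that is already deployed.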

Try accessing a file through the cloudfront.net domain name returned in the response above to confirm that files are properly being served from the CDN:
http://d3f9whzii7ukkm.cloudfront.net/IMG_0458.JPG

Testing Files in the Cloud

Try doing a traceroute from multiple geographic locations and see what you get for a response. You will notice that from different geographic locations there will be different IP addresses for the CloudFront edge server responding to the request, and hence, the beauty of a content delivery network: better response times regardless of the geographic location.

Further Reading:

There are many other things that can be done with CloudFront using the cfcurl.pl script or the CloudFront REST API. To get more information about the CloudFront API, read the CloudFront developer guide.

Read the Amazon CloudFront and S3 Performance Review here: http://www.thebitsource.com/2009/01/28/amazon-cloudfront-and-s3-performance-reviewed-on-the-bitsource/

  1. cloudberryman Says:

    if you are on Windows and don’t have advanced programming skills you can try CloudBerry Explorer for Amazon S3 and CloudFront. It is a FREEWARE. http://cloudberrylab.com/

  2. mludvig Says:

    Hi,
    Thanks for mentioning s3cmd in your article. Since version s3cmd 0.9.9 it supports CloudFront as well so there’s no need for any other tools to set up the distribution. It’s as simple as

    s3cmd cfcreate s3://your.bucket.name

    Have a look at the s3cmd cloudfront howto for details.

    BTW Would you mind updating your blogpost with this info please?

  3. MatthewSacks Says:

    I have included that s3cmd now works for managing CloudFront as well. Thanks for the update!
