Rackspace Mosso’s CloudFiles Tutorial and Review

CloudFiles is the latest offering in Mosso’s (NYSE:RAX) suite of cloud computing services. It is a cloud-based file storage and content distribution service featuring a simple-to-use web console, an API, and the power of Rackspace’s cloud computing server resources. CloudFiles also offers a CDN, powered by Limelight, that distributes files stored in the cloud storage system for increased availability and file-serving performance.

I performed a review and audit of the performance and usability of the CloudFiles service. The results and data are included in this article for those considering a cloud file storage system and content delivery network.

Mosso’s Pitch


Cloud Files is an on-line, user (and developer) friendly, dynamically expanding storage system. You can store as much as you want and you only pay for what you use. Individual files can be as large as 5GB or as small as a single byte.

We also make it easy to serve your content with Limelight Networks’ CDN service. This allows you to take advantage of a proven world-class content distribution network that is affordable and easy to use.

Uploading Files to CloudFiles

Files can be uploaded either by logging in to the Mosso CloudFiles control panel and using the web interface, or through the CloudFiles API from the programming language of your choice. The API can also be used to modify files.

Storing and distributing a file using CloudFiles is simple and straightforward:
Store files via the web console or the CloudFiles API (some client code is provided).
Enable the files to be distributed through the CDN by activating the container, which sets its files as “public”.
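
The two steps above can be sketched with the Python client used in the demo application later in this article. This is a minimal sketch, not Mosso's reference code: publish_file is a hypothetical helper, and make_public() and public_uri() are assumed to be available on the container object in your version of the client library. The calls create_container, create_object, and load_from_filename are the same ones the demo application uses.

```python
import os


def publish_file(conn, container_name, path):
    """Upload one local file and return the URL it should be served from.

    `conn` is expected to be a connection object from the CloudFiles
    Python client, e.g. cloudfiles.get_connection('user', 'myAPIKey').
    """
    short_name = os.path.basename(path)

    # Step 1: store the file in a container.
    container = conn.create_container(container_name)
    obj = container.create_object(short_name)
    obj.load_from_filename(path)

    # Step 2: mark the container public so the CDN can serve its objects.
    # make_public() and public_uri() are assumptions about the client API.
    container.make_public()
    return container.public_uri() + "/" + short_name
```

The helper only needs an object that behaves like a CloudFiles connection, so it can be tried against a stub before pointing it at a real account.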

Creating a Container for Distributing Files Amongst the Cloud

Using the CloudFiles API

For users who need to upload a large number of files, or to programmatically manipulate files stored and distributed through the Limelight CDN, the web interface alone falls short. Mosso has included a simple-to-use API for interacting with the CloudFiles storage and CDN. As a demonstration of the API, I created a simple application that uploads a set of files matching a certain type into a container on CloudFiles storage.

Documentation and code samples are available on the CloudFiles site for Java, PHP, Python, and C#, along with a simple REST interface (usable with curl or wget) for accessing the CloudFiles API. The API itself is well-designed and simple to use, although the various use cases in the documentation section could be fleshed out a bit more.
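
As a rough illustration of the REST route, the sketch below builds (but does not send) an authentication request using only the Python standard library. Several things here are assumptions rather than facts from this review: build_auth_request is a hypothetical helper, the header names X-Auth-User and X-Auth-Key should be checked against the CloudFiles developer guide, and the URL is a placeholder that must be replaced with the documented endpoint.

```python
import urllib.request


def build_auth_request(auth_url, username, api_key):
    """Build (but do not send) a CloudFiles-style authentication request."""
    headers = {
        "X-Auth-User": username,  # assumed header name for the account user
        "X-Auth-Key": api_key,    # assumed header name for the API access key
    }
    return urllib.request.Request(auth_url, headers=headers, method="GET")


if __name__ == "__main__":
    # Placeholder URL: substitute the endpoint given in the CloudFiles docs.
    req = build_auth_request("https://auth.example.invalid/v1.0",
                             "user", "myAPIKey")
    print(req.get_method(), req.full_url)
    print(dict(req.header_items()))
```

Keeping request construction separate from sending makes it easy to inspect exactly what would go over the wire before using a real API key.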

A Demo Application to Upload Multiple Files using the CloudFiles API (Written in Python):

These steps assume that an API access key has already been generated, which is a necessary prerequisite for accessing the CloudFiles API remotely from applications. Once an API access key is generated, a user can upload, delete, or modify files in their cloud files store and activate files to be distributed through the Limelight CDN, which makes manipulating files in the cloud much more flexible.

Step 1) Get the CloudFiles Client Code in your language of choice. Java, PHP, Python and C# packages are currently available for download.

I chose Python to create an application that interacts with the CloudFiles API. To get the client code, log into the Cloud Files Console and navigate to Support->Developer Resources->CloudFiles Client Code->Python.

Step 2) Build the CloudFiles Libraries for the Python API

gfizzle:python-cloudfiles-1.2.0 msacks$ python setup.py build
running build
running build_py
creating build
creating build/lib
creating build/lib/cloudfiles
copying cloudfiles/__init__.py -> build/lib/cloudfiles
copying cloudfiles/authentication.py -> build/lib/cloudfiles
copying cloudfiles/connection.py -> build/lib/cloudfiles
copying cloudfiles/consts.py -> build/lib/cloudfiles
copying cloudfiles/container.py -> build/lib/cloudfiles
copying cloudfiles/errors.py -> build/lib/cloudfiles
copying cloudfiles/storage_object.py -> build/lib/cloudfiles
copying cloudfiles/utils.py -> build/lib/cloudfiles

Step 3) Create a Python application (I used a simple text editor such as vi)

vi uploadPDFs.py

Step 4) Import the CloudFiles module and other dependencies. This application was created from an example in the CloudFiles Python Documentation included in the CloudFiles Python Client Code. It is explained in comments denoted by a #.

# Note: written for Python 2 (the commands module and the print statement
# do not exist in Python 3).
import cloudfiles
import sys, os, string, commands

# Establish a CloudFiles connection
conn = cloudfiles.get_connection('user', 'myAPIKey')

# Create a new container object
myPDFS_container = conn.create_container('pdf_container')

# Store a bunch of file names in a list to be uploaded to the CDN later.
# The pattern is a shell wildcard that matches the files to be selected for uploading.
pattern = "*.pdf"
# commandStr holds the Unix find command that produces the full path of each file
# to be uploaded. In this case it is all PDFs in the directory /Users/msacks/Desktop/pdfs
commandStr = "find /Users/msacks/Desktop/pdfs/" + pattern
# Capture the command output and split it on newlines to get one path per entry
outputArr = commands.getoutput(commandStr)
findResults = string.split(outputArr, "\n")

# Loop through each absolute path to the file to be uploaded in findResults.
for item in findResults:
    # Index 5 is the file name only because the search directory is four levels deep
    shortName = string.split(item, "/")
    print shortName[5]
    print "Uploading item: " + item
    # Use only the short name of the file for naming the file as it will be stored in
    # CloudFiles; otherwise, the filename will be the absolute path to the file
    myItem = myPDFS_container.create_object(shortName[5])
    # Set metadata attributes
    myItem.content_type = "application/x-pdf"
    # Upload the file to the container
    myItem.load_from_filename(item)
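
The file-selection part of the script can also be done without shelling out to find. The following is a minimal sketch for Python 3 and is not part of the CloudFiles client: select_uploads is a hypothetical helper that uses the standard glob module to match files and os.path.basename to derive the short name, which avoids depending on the path having a fixed number of components.

```python
import glob
import os


def select_uploads(directory, pattern="*.pdf"):
    """Return (short_name, absolute_path) pairs for files matching pattern."""
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    return [(os.path.basename(p), os.path.abspath(p))
            for p in matches if os.path.isfile(p)]


if __name__ == "__main__":
    # The directory is the one used in the demo script; adjust as needed.
    for short_name, path in select_uploads("/Users/msacks/Desktop/pdfs"):
        print("Would upload %s as %s" % (path, short_name))
```

Each pair could then be handed to create_object(short_name) and load_from_filename(path) in the same way as in the demo script.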

Testing Environment Summary

The testing environment was intended to simulate a standard web site hosting environment: a standard Drupal website with images of various sizes. Each page was monitored independently and represented some aspect of website performance as it would occur in the real world.

Independent performance tests were also done to evaluate the ad-hoc performance of a single file download from two distinct geographic locations in the continental United States. These were done using dedicated servers in large datacenters connected directly to a burstable backbone Internet connection. The summary of these testing nodes is as follows:

Testing Host 1:
ISP: Net2Ez
Location: 365 Main Datacenter in El Segundo, CA

Testing Host 2:
ISP: Slicehost
Location: Saint Louis, Missouri


Keynote’s global monitoring service was used for trending and comparing cloud performance over time using their 24/7 monitoring network.


Alertra was used for server availability monitoring.

The Test Web Site

The web site used for testing CloudFiles and the CloudFiles Content Delivery Network was a sub-site of my personal website, matthewsacks.com, which is on a shared hosting plan with Godaddy. It is intended to simulate a regular shared hosting site that any individual might set up. Performance on such sites is generally lower than on sites hosted on private servers or in a company’s datacenter, and they represent the lowest end of web hosting. Even so, the testing results can also be taken into consideration for a high-volume website in a large datacenter. Testing was sampled at a smaller scale for the sake of time, and the results are unlikely to differ much from those of a test website set up in a large datacenter; the only thing that might change is the scale of the results.

The structure of the web site is as follows. Four pages were set up for analyzing performance. The first page is a collection of large files, consisting of three images ranging from 1.5 to 3 megabytes in size. The second page is a collection of three small images ranging from 59KB to 293KB. A copy of each of these two pages was created, so that one version serves its images from the local shared hosting and the other serves them from the CDN, to demonstrate the difference between a site with files served from CloudFiles and a standard website. A summary of the testing web site structure is as follows:

Node 1 Large Images Local Hosting
Node 4 Large Images Hosted on CloudFiles
Node 2 Small Images Local Hosting
Node 3 Small Images Hosted on CloudFiles

Cloud Platform Performance Testing Results

The performance was evaluated in two ways: ad-hoc and time-based. The initial website performance tests were configured using the Keynote Internet Testing Environment (KITE), a free utility available for download that works with Keynote’s global performance monitoring network. Once scripts were created in KITE, they were uploaded to Keynote’s global monitoring network to trend the performance of the test pages over time. Many Fortune 100 companies use the Keynote monitoring network to monitor their global networks, and this service is not free. I chose Keynote to gather time-based statistics because of their large global network and experience with website performance monitoring.

Baseline Network Test

In order to provide an accurate view of the results, I measured the ad-hoc transfer speeds of the two testing nodes independently before proceeding with any time-based performance trending. An independent download of the JBoss application server from Sourceforge yielded very different download times from the two nodes. When measuring performance from two distinct geographic locations, baseline download speeds must be taken into consideration, because peering becomes a factor in the response times for these specific tests.

The first set of tests shows general response times using traceroute, followed by a download of a 7.7MB image from local, hosted storage. The traceroutes demonstrate general network statistics; the 7.7MB image is then downloaded from each testing host, originating from El Segundo, CA (LAX) and Saint Louis, MO (STL). Phoenix (PHX) is the location of the Godaddy hosting data center where the main website’s files are stored, and it is the primary testing target from LAX and STL.

Baseline Traceroute Statistics – Locally Hosted Files

St Louis (STL) Routing Map to Godaddy (PHX)

root@stl-server:~# traceroute matthewsacks.com
traceroute to matthewsacks.com (208.109.225.188), 30 hops max, 40 byte packets
1 173-45-224-2.slicehost.net (173.45.224.2) 4.001 ms 4.001 ms 4.001 ms
2 (209.20.79.210) 4.001 ms 0.000 ms 0.000 ms
3 (209.20.79.225) 0.000 ms 0.000 ms 0.000 ms
4 ge-6-13-115.car1.StLouis1.Level3.net (4.79.132.225) 0.000 ms 0.000 ms 0.000 ms
5 ae-4-4.ebr1.Denver1.Level3.net (4.69.132.182) 16.001 ms 16.001 ms 16.001 ms
6 ae-2.ebr2.Dallas1.Level3.net (4.69.132.106) 32.002 ms 32.002 ms 32.002 ms
7 ae-82-82.csw3.Dallas1.Level3.net (4.69.136.146) 32.002 ms ae-62-62.csw1.Dallas1.Level3.net (4.69.136.138) 44.003 ms 44.003 ms
8 ae-71-71.ebr1.Dallas1.Level3.net (4.69.136.125) 36.002 ms ae-61-61.ebr1.Dallas1.Level3.net (4.69.136.121) 36.002 ms ae-71-71.ebr1.Dallas1.Level3.net (4.69.136.125) 36.002 ms
9 ae-8-8.car1.Phoenix1.Level3.net (4.69.133.29) 56.004 ms 60.004 ms 60.004 ms
10 THE-GO-DADD.car1.Phoenix1.Level3.net (4.53.104.2) 60.004 ms 60.003 ms 60.003 ms
11 ip-208-109-112-153.ip.secureserver.net (208.109.112.153) 60.003 ms 60.003 ms 60.003 ms
12 ip-208-109-112-142.ip.secureserver.net (208.109.112.142) 60.003 ms 60.003 ms 56.004 ms
13 ip-216-69-188-33.ip.secureserver.net (216.69.188.33) 56.004 ms 56.004 ms 60.003 ms

LAX to PHX Traceroute

C:\Documents and Settings\Administrator\Desktop\curl-7.19.2-ssl-sspi-zlib-static-bin-w32>tracert matthewsacks.com

Tracing route to matthewsacks.com [208.109.225.188]
over a maximum of 30 hops:

1 <1 ms <1 ms 1 ms 64.93.77.65
2 <1 ms <1 ms <1 ms cr01-7-1.lax4.net2ez.com [64.93.64.13]
3 <1 ms <1 ms <1 ms br01-1-1.lax4.net2ez.com [64.93.64.73]
4 <1 ms <1 ms <1 ms cr01-1-1.lax6.net2ez.com [64.93.64.30]
5 1 ms <1 ms 1 ms TenGigabitEthernet2-4.ar4.LAX2.gblx.net [64.208.17.5]
6 145 ms 174 ms 10 ms 64.210.13.110
7 10 ms 9 ms 9 ms ip-208-109-112-194.ip.secureserver.net [208.109.112.194]
8 10 ms 9 ms 10 ms ip-216-69-188-33.ip.secureserver.net [216.69.188.33]
9 10 ms 10 ms 10 ms ip-208-109-225-188.ip.secureserver.net [208.109.225.188]
10 10 ms 10 ms 10 ms ip-208-109-225-188.ip.secureserver.net [208.109.225.188]

Trace complete.
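
To compare the two routes at a glance, the round-trip times of the final hop can be pulled out of the raw output. This is a minimal sketch, assuming output shaped like the two traces above; last_hop_rtts is a hypothetical helper, and the regular expression only handles the "N ms" and "<N ms" forms that appear here.

```python
import re

# Matches "60.003 ms", "10 ms" and "<1 ms". The "<" is dropped, so "<1 ms"
# is counted as 1 ms.
RTT = re.compile(r"<?(\d+(?:\.\d+)?) ms")


def last_hop_rtts(trace_output):
    """Return the round-trip times (in ms) reported on the last hop line."""
    hop_lines = [line for line in trace_output.splitlines()
                 if re.match(r"\s*\d+\s", line) and RTT.search(line)]
    if not hop_lines:
        return []
    return [float(value) for value in RTT.findall(hop_lines[-1])]
```

Applied to the traces above, this would give roughly 56 to 60 ms for the last hop from STL and 10 ms from LAX, which is the baseline difference the download tests need to be read against.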

Targeted Download Response Time Tests: Local Hosting

During the targeted tests, I found that a 3.2MB benchmark file was not enough: the download from the CloudFiles CDN happened so quickly that it didn’t even register statistics with curl. To address this issue, I created a 7.7MB 24-bit PNG image and used this standalone image for testing on both the CDN and local storage. No one in their right mind would ever put such a large image on a website, but it was needed in order to get the statistics to register with curl in terms of seconds (instead of milliseconds). The 7.7MB file was the smallest file I could benchmark with due to the high-bandwidth connections at the St. Louis and El Segundo datacenters, so testing individual small images would have been futile. I left the page-based benchmarking up to Keynote and Alertra.

Single Image Download Tests on Locally Hosted File

From LAX to PHX

C:\Documents and Settings\Administrator\Desktop\curl-7.19.2-ssl-sspi-zlib-static-bin-w32>curl.exe http://matthewsacks.com/cloudfilesdemo/large_images/whistler.png -o whistler.pngcls
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7928k  100 7928k    0     0  4572k      0  0:00:01  0:00:01 --:--:-- 4654k

From STL to PHX

root@stl-server:~# curl http://matthewsacks.com/cloudfilesdemo/large_images/whistler.png -o whistler.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7928k  100 7928k    0     0  1148k      0  0:00:06  0:00:06 --:--:-- 1226k

As expected, downloads from PHX are considerably faster from LAX than from STL when accessing a file stored on local storage: roughly four times the average download speed (4572k versus 1148k per second). This is a common problem when a single location is used for storing content and files and response times are important.

Baseline Network Tests: CloudFiles

I did a traceroute to the CDN from the LAX and STL locations and received the expected results: from LAX the CDN address resolved to a Limelight node in LAX, and from STL it resolved to a node in Chicago (ORD).

From LAX to the CloudFiles CDN:
C:\Documents and Settings\Administrator\Desktop\curl-7.19.2-ssl-sspi-zlib-static-bin-w32>tracert cdn.cloudfiles.mosso.com

Tracing route to rackspace.vo.llnwd.net [208.111.144.47]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms 64.93.77.65
2 <1 ms <1 ms <1 ms cr02-7-5.lax4.net2ez.com [64.93.64.49]
3 <1 ms <1 ms <1 ms br02-1-1.lax4.net2ez.com [64.93.64.77]
4 <1 ms <1 ms <1 ms pr01-1-8.lax4.net2ez.com [64.93.64.90]
5 10 ms 1 ms 1 ms llnw.com.any2ix.crgwest.com [206.223.143.16]
6 1 ms 1 ms 1 ms cds435.lax.llnw.net [208.111.144.47]

From STL to the CloudFiles CDN:
root@stl-server:~# traceroute cdn.cloudfiles.mosso.com

traceroute to cdn.cloudfiles.mosso.com (68.142.73.39), 30 hops max, 40 byte packets
1 173-45-224-2.slicehost.net (173.45.224.2) 0.000 ms 0.000 ms 0.000 ms
2 (209.20.79.210) 0.000 ms 0.000 ms 0.000 ms
3 (209.20.79.225) 0.000 ms 0.000 ms 0.000 ms
4 ge-6-13-115.car1.StLouis1.Level3.net (4.79.132.225) 0.000 ms 0.000 ms 0.000 ms
5 ae-11-11.car2.StLouis1.Level3.net (4.69.132.186) 0.000 ms 0.000 ms 0.000 ms
6 ae-4-4.ebr2.Chicago1.Level3.net (4.69.132.190) 8.000 ms 16.001 ms 16.001 ms
7 ae-23-52.car3.Chicago1.Level3.net (4.68.101.39) 188.011 ms ae-23-56.car3.Chicago1.Level3.net (4.68.101.167) 188.011 ms 188.011 ms
8 glbx-level3-te.Chicago1.level3.net (4.68.110.194) 8.000 ms 4.000 ms 8.000 ms
9 64.215.29.250 (64.215.29.250) 12.001 ms 12.001 ms 12.001 ms
10 ve6.fr3.ord.llnw.net (69.28.172.41) 8.001 ms 8.001 ms 8.001 ms
11 cds110.ord.llnw.net (68.142.73.39) 8.001 ms 8.001 ms 8.001 ms

Targeted Test 1:
Testing 10 whistler.png downloads from Slicehost St. Louis Datacenter

With CloudFiles:
Command:
root@stl-server:~# i=1; while [ $i -lt 11 ]; do curl http://cdn.cloudfiles.mosso.com/c5621/whistler.png -o whistler.png; i=$[ $i+1 ]; done

Without CloudFiles
Command:
root@stl-server:~# i=1; while [ $i -lt 11 ]; do curl http://www.matthewsacks.com/cloudfilesdemo/large_images/whistler.png -o whistler.png; i=$[ $i+1 ]; done
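
The shell loops above can be approximated in Python when per-download timings are wanted rather than curl's progress meter. This is a minimal sketch using only the standard library; fetch and time_downloads are hypothetical names, and the fetch and clock parameters exist only so the loop can be exercised without a network connection.

```python
import time
import urllib.request


def fetch(url):
    """Download url and return the number of bytes received."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def time_downloads(url, runs=10, fetch=fetch, clock=time.perf_counter):
    """Download url `runs` times and return a list of elapsed seconds."""
    timings = []
    for _ in range(runs):
        start = clock()
        fetch(url)
        timings.append(clock() - start)
    return timings
```

Calling time_downloads once with the CDN URL and once with the locally hosted URL from the commands above would give two lists of ten timings, whose minimum and mean can then be compared.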

Test Data:

Targeted Test 2:
Testing whistler.png downloads from 365Main El Segundo Datacenter

With CloudFiles:
Command:
C:\Documents and Settings\Administrator\Desktop\curl-7.19.2-ssl-sspi-zlib-static-bin-w32>curl.exe http://cdn.cloudfiles.mosso.com/c5621/whistler.png -o whistler.png

Without CloudFiles:
Command:
C:\Documents and Settings\Administrator\Desktop\curl-7.19.2-ssl-sspi-zlib-static-bin-w32>curl.exe http://www.matthewsacks.com/cloudfilesdemo/large_images/whistler.png -o whistler.png

Test Data:

From LAX using the CloudFiles CDN, the downloads happen so fast that curl cannot even register the download times, but they can be calculated from the data rate. At a download rate of approximately 11 MB/second, the estimated load time for the 7.7 MB file is 0.7 seconds!
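
The estimate is simply file size divided by data rate. A minimal sketch of that arithmetic follows; estimated_seconds is a hypothetical helper, and the calculation ignores connection setup and latency, so it is a lower bound rather than a measurement.

```python
def estimated_seconds(size, rate):
    """Estimate transfer time; size and rate must use the same unit."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return size / rate


# Figures quoted in the text: a 7.7 MB file at roughly 11 MB/s from the CDN.
print(round(estimated_seconds(7.7, 11.0), 1))

# For comparison, the locally hosted download from LAX reported by curl:
# 7928k transferred at an average of 4572k per second.
print(round(estimated_seconds(7928, 4572), 2))
```

The second figure uses the curl output from the local hosting test, so the two results put the CDN and the origin server on the same footing.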

Page Responsiveness Over Time

The following Keynote graphs demonstrate page performance over the course of about one week.

Large Images: With CloudFiles vs. Hosted Only
Node 1 is hosted on Godaddy Only
Node 4 is hosted on Godaddy with Images being served from CloudFiles

Keynote Graph: Large Files. Cloud vs. Local Hosting. Performance and availability.

Keynote Graph Large Files. Cloud vs. Local

Keynote Graph: Large Files Geographic Location Breakdown. CloudFiles vs. Local Hosting

The page response times show great improvements for large files when using CloudFiles versus local hosting. CloudFiles brings the page load time down to a consistent 3.87 seconds regardless of geographic location, whereas with only local hosting the page load time varies from 2.9 to 10.6 seconds depending on how close the request is to the origin.

Small Images: With CloudFiles vs. Hosted Only
Node 2 is hosted on Godaddy only
Node 3 is hosted on Godaddy with images being served from CloudFiles

Keynote Graph: Small Files Performance and Availability. CloudFiles vs. Local Hosting.

Keynote Graph Small Files. CloudFiles vs. Local Hosting.

Although small image files did not show as dramatic a performance gain as large files, serving them from CloudFiles is still faster than pure local hosting (except in the case of Boston Verizon). The smaller gain is most likely because the majority of the load time is spent on the dynamic request for the Drupal page and not on the actual content. In the world of websites that generate money, page load times tend to have a direct impact on revenue.

Server Availability
Server Availability monitoring was provided by Alertra. There were no reported service outages during the time of the test.

Conclusions

Cons:
Separate registration process to access the forums.
Documentation could use some improvements (it’s a bit slim)
Uploading large files through the web-console proved to be a bit difficult as my session expired before I could finish uploading.
Still in beta, so there are a few kinks still to be worked out.

Pros:
Simple API
Performance Gains of almost 4 seconds for large images versus local hosting
Better performance of about 1 second for small files for locations far from the origin
Statistics Updated in near real time
Convenient integration between file storage and content network distribution
Good knowledge base and documentation.

Wish List:
A way to view near real time usage – for high end users
HTTP access logs
More visibility into the performance and diagnostics of edge servers

What was interesting was that even though the close proximity of LAX to PHX made for impressive response times, serving the images from CloudFiles provided even faster response times. In terms of performance, which was a primary focus of my evaluation, CloudFiles gets an A+. Granted, performance was only measured in the United States. In the future I would like to perform some global testing to get a broader view of service performance and availability, but this tends to cost much more for the auditor.

Pricing seems fairly reasonable. I don’t have much to compare the pricing to, but I did an estimate on the CloudFiles site for serving 100GB per month with 500,000 requests, and it came out to less than $50 per month.

A pricing sheet and calculator for the CloudFiles service is available at http://www.mosso.com/pricingfiles.jsp.

CloudFiles proves to be a simple and easy-to-use way to store files on the Internet and distribute them for performance or high availability (or both). Rackspace’s Mosso has done another great job rolling out a cloud-based service, now for storage and file serving. When performance, ease of use, and cost are factors, CloudFiles makes the grade in all three.

  1. Rob La Gesse Says:

    Thanks for the in-depth review.

    A couple of notes: if you are on a Mac, the daily builds of CyberDuck now include Cloud Files support, so uploading even hundreds of files at a time is easy: http://update.cyberduck.ch/nightly/

    Also - the forum issue is corrected :)

    Nice work,

    Rob
