09/30/08

Rails Hosting: Review of Slicehost vs. EC2

It’s a goal of mine to write a series of reviews of all the major plugins and services that have gone into the creation of Bonanzle. Previously, I reviewed Fiveruns and gave it a “thumbs down,” which gave me the blues, since I’d like Fiveruns to be the killer app for monitoring Rails performance. Unfortunately, though two or three different Fiveruns salespeople have noticed Bonanzle and told me I should use Fiveruns, none of them has gotten back to me with a promise to make it easier to use after I pointed them to my review. But I digress. Today we discuss Slicehost.

Synopsis

Slicehost has been very good to Bonanzle. After a short and bad experience with another Rails hosting provider that gave only limited shell access, we started using Slicehost almost a year ago, first with a 256 MB slice to host our main Bonanzle server. A Slicehost “slice” is their name for a server partition, very much like an EC2 server instance (I’ll get to a comparison of the two shortly). When you sign up for a slice, you can pick from a number of sizes: 256 MB, 512 MB, 1 GB, 2 GB, or 4 GB. When you set up your slice, you can choose from a variety of operating systems to have pre-installed (including almost all the flavors of Ubuntu). You have full shell access on any slice you set up, so you have essentially the same freedom in configuring your server that you’d have if it were sitting in your basement. If you add more slices later, you can copy the disk image of an existing slice as a starting point for the new one (as long as backups are turned on for the slice whose image you want to copy). This has been very convenient for us, since it saves us from repeatedly installing basics like MySQL and Apache on each new slice.

How We’ve Used It

From that initial 256 MB slice a year ago, we have grown to seven slices ranging in size from 512 MB to 4 GB. As mentioned above, it’s very convenient to bring a new slice up to speed using the disk image of an old one. It’s also very fast: after we’ve requested a new slice, it has taken anywhere from 30 minutes to a couple of hours to get it created.

Uptime

None of our slices have gone down in a year of use. That’s nice.

Performance

The bigger your slice, the more CPU you get in times of contention. According to the support staff I’ve spoken with, the servers are quad-core 64-bit 2 GHz Opteron machines, and a 4 GB slice gets up to 25% of the CPU cycles in times of contention (which are rare). The share halves with each step down in slice size (e.g., a 2 GB slice would get 1/8 of the cycles).
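The halving rule above can be expressed as a small calculation. This is a sketch, assuming the five published sizes and that each step down means exactly half the previous share (which matches the 2 GB = 1/8 example); the constant and method names are illustrative, not anything Slicehost publishes.

```ruby
# Contention-time CPU share per slice size, as described by Slicehost support:
# a 4 GB slice gets up to 25% of the host, and each smaller size gets half
# the share of the size above it. (Illustrative sketch, not a Slicehost API.)
SLICE_SIZES_MB = [256, 512, 1024, 2048, 4096].freeze
MAX_SHARE = 0.25 # 4 GB slice: 1 of the host's 4 cores

def contention_share(size_mb)
  position = SLICE_SIZES_MB.index(size_mb)
  raise ArgumentError, "unknown slice size: #{size_mb} MB" unless position

  steps_below_max = SLICE_SIZES_MB.length - 1 - position
  MAX_SHARE / (2**steps_below_max)
end

SLICE_SIZES_MB.each do |size|
  share = contention_share(size)
  printf("%4d MB slice: %6.3f%% of host CPU (1/%d)\n", size, share * 100, (1 / share).round)
end
```

Under this reading, a 256 MB slice is guaranteed 1/64 of the host during contention, while a 4 GB slice is guaranteed one full core.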

In terms of practical speed, we’re currently serving about 50,000 pages/day, mostly non-cached, on a site with a lot of interactive features and image processing. We’re doing this on one 4 GB slice that currently runs 8 mongrels AND the MySQL server itself. Most pages load in less than a second; creating images takes longer. Good enough for me for now.

Compared to EC2

The closest comparable EC2 offering to a 4 GB Slicehost slice:

7.5 GB of memory, $288.00/month – 850 GB of instance storage, 0 GB bandwidth included in price, 4 EC2 Compute Units

Compare to Slicehost:

4 GB of memory, $280.00/month (about $250/month with the automatic 10% discount) – 160 GB HD, 1,600 GB bandwidth included in price, the equivalent of 2 EC2 Compute Units (i.e., one 2 GHz core) during resource contention, more otherwise

EC2 jumps out to an early lead, offering nearly twice as much computing power and RAM for about $38 more per month than Slicehost’s discounted price. But Slicehost catches up quickly when you consider bandwidth and storage:

EC2 bandwidth = $0.10-$0.17 per GB transferred. Slicehost = up to 1600 GB transfer free.

That is, if you were to use all of your slice’s bandwidth, you’d save something in the neighborhood of $250/month vs. Amazon. For storage, Amazon offers more space by default, but it makes no guarantee that your instance storage won’t evaporate at any time, which is why it also offers Elastic Block Store (EBS), which is intended to be your “real” disk when operating an EC2 instance. EBS costs $0.10 per GB per month plus $0.10 per million I/O requests, which Amazon estimates adds up to about $26/month more for a “medium sized web site.”

When you add up the total costs, assuming you were going to use your storage and bandwidth, Slicehost offers about half the memory and half the computing power, but it does so at less than half the price of EC2. And a 4GB Slicehost slice is no small computing organism. As mentioned above, it’s serving 50k daily pages of dynamic content and getting by well enough (except when it comes to image creation, which can take 5-10 seconds to process including thumbnails).
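The cost comparison above can be checked with a little arithmetic. This sketch assumes the slice’s full 1,600 GB allowance is used and billed entirely at the top of EC2’s quoted range ($0.17/GB), plus Amazon’s ~$26/month EBS estimate; the method and constant names are illustrative, so swap in your own numbers.

```ruby
# Monthly cost comparison using the figures quoted in this post.
# Assumptions: all 1,600 GB of transfer is used, EC2 bills it at the given
# per-GB rate, and EBS adds Amazon's ~$26/month estimate for a medium site.
TRANSFER_GB = 1600

def ec2_monthly(rate_per_gb)
  instance = 288.00
  bandwidth = TRANSFER_GB * rate_per_gb
  ebs = 26.00
  instance + bandwidth + ebs
end

SLICEHOST_MONTHLY = 250.00 # 4 GB slice after the 10% discount; transfer included

ec2_total = ec2_monthly(0.17)
printf("EC2:       $%.2f/month\n", ec2_total)
printf("Slicehost: $%.2f/month\n", SLICEHOST_MONTHLY)
printf("Slicehost is %.0f%% of the EC2 cost\n", 100 * SLICEHOST_MONTHLY / ec2_total)
```

At the bottom of the quoted range ($0.10/GB), the EC2 total drops to about $474, so Slicehost comes out at a bit more than half the EC2 price rather than less; the “less than half” figure depends on the higher rate.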

Where does EC2 win?

Still, there are a number of advantages to EC2. The first is that 4 GB (the size I’ve been discussing) is the largest slice Slicehost currently lists, whereas Amazon offers a couple of instance types with significantly more computing power and memory. This alone is a reason we will probably need to switch to EC2 in the not-too-distant future, since at peak traffic we’re currently pushing our slice to its limits. Update: The Slicehost support team informs me that they also have 8 GB and 15.5 GB slices available by request. Both of these unlisted, larger slices come with correspondingly 2x or 4x more HD space and processing power (and, of course, cost).

Another annoying limitation of Slicehost is that all traffic is throttled at 10 Mbps. That isn’t a “low” amount per se (Wikipedia says that 8-12 Mbps is equivalent to a “medium to high-definition digital channel with DVD quality data,” i.e., about 1 megabyte of transfer per second), but it is not conducive to high-traffic, image-heavy sites, and it is annoying that the throttle is set at the same level regardless of slice size. Update: The Slicehost support team informs me that this limit can be adjusted by request. I asked them to double our bandwidth allowance, and they had it done within an hour.
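For a sense of scale, converting the throttle from megabits to megabytes shows where the “about 1 meg per second” figure comes from. A sketch, assuming decimal units (1 MB = 1,000,000 bytes, 1 GB = 1,000 MB) and a 30-day month of continuous, saturated transfer, which no real site sustains; the method names are illustrative.

```ruby
# Converts a link throttle in megabits/second to megabytes/second, and
# estimates the most data a fully saturated link could move in a month.
# Assumes decimal units and a 30-day month.
def mbps_to_megabytes_per_second(mbps)
  mbps / 8.0 # 8 bits per byte
end

def max_monthly_transfer_gb(mbps, days: 30)
  mbps_to_megabytes_per_second(mbps) * 86_400 * days / 1000.0 # seconds/day, MB -> GB
end

[10, 20].each do |mbps|
  printf("%2d Mbps = %.2f MB/s, at most %.0f GB per 30-day month\n",
         mbps, mbps_to_megabytes_per_second(mbps), max_monthly_transfer_gb(mbps))
end
```

At 10 Mbps a saturated link would move about 3,240 GB in a month, roughly twice the 1,600 GB allowance; doubling the throttle to 20 Mbps raises that ceiling to about 6,480 GB.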

Where does Slicehost win?

Firstly, there are the cost wins described above if you are hosting a site that uses lots of bandwidth.

Secondly, I get the sense (from documents I’d previously read but can no longer locate) that one’s instance storage is far less likely to evaporate with Slicehost. I know it’s never happened to us in the year we’ve been hosted there, whereas I recall reading that EC2 makes no guarantee that its instance storage will be available at a given time. If anyone can cite where I might have read this, I’d love more details.

Another great feature of Slicehost that’s easy to underestimate is the availability of their help. They have a Slicehost chat room that is staffed by a handful of Slicehost employees during all normal business hours (Update: and non-normal hours too… I was talking to them at 3 AM last night about the progress of our resize to an 8GB slice. There were two Slicehost employees manning the chat window at that hour (!)). I’ve ended up visiting this chat room on numerous occasions when I want instant answers to my questions, and I’ve found the people in the chat room to be very knowledgeable and patient. Getting good support at Amazon is very expensive ($100-$400 per month, or more, for a service Slicehost provides free of charge).

Also, I’ve found that our slice almost always has more than the “guaranteed” CPU cycles available: according to top, our slice regularly uses more than “100%” (i.e., more than one of the four cores, which is what’s guaranteed with a 4 GB slice).

Final Summary

I hope to continue adding to this article as I gain experience with the two services. As mentioned above, we have stuck exclusively to Slicehost so far, but if our site gets into the millions of uniques we might end up making the move to EC2. Update: I did some research on EC2 recently and was pretty surprised at how esoteric its documentation is (see the section on creating your own AMI if you need to lull yourself to sleep), so I’d just as soon stay at Slicehost, where there are fewer proprietary concepts involved. For people deciding today where to host, I’d pick Slicehost if you’re looking for high configurability, less learning about proprietary concepts, more human support, and lower, more predictable costs. I’d pick EC2 if you already know how to use it or are planning to run a complex, scalable architecture in which you want to swap in more servers on a whim. I’d imagine EC2 is pretty easy to get up and running with some of the pre-configured AMIs (I haven’t researched it, but I’m sure there’s one for Rails). But then again, Slicehost makes it pretty damn easy to get Rails rolling too, since you can follow any of the kajillion tutorials about setting up Rails on an Ubuntu machine. (Or you can use mod_rails, which from what I’ve heard is pretty much plug-and-play.)

Stay tuned for updates, and if you have comparable experience with either, please post it below!

09/12/08

Rails Mysql Indexes: Step 1 in Pitiful to Prime Performance

Like any breathing Rails developer, I love blogging about performance. I do it all the time. I’ve done it here, here, and quite famously, here.

But one thing I haven’t done is blog about Rails performance from the perspective of experience. Tripling in traffic for a few months in a row has a way of changing that.

So now I’m a real Rails performance guy. Ask me anything about Rails performance, and I’ll tell you to get back to me in a couple of months, because this ain’t exactly yellowpages.com that I’m running here. BUT, here are the lessons and facts from our first few months of operation:

  • One combined Rails server+MySQL slice at Slicehost is handling about 3,000 daily visits and 30,000 daily pageviews (on a highly real-time, interactive site) with relative ease. Almost all pageviews take less than 2 seconds, and most take less than 1.
  • Memcached saves our ass repeatedly
  • Full text searching (we’re using Thinking Sphinx) saves our ass repeatedly
  • BackgroundRb will ruin your life, cron-scheduled rake tasks will save it
  • Database ain’t nothing but a chicken wing with indexing

Now, there are five salient observations to take from a growing site, but did you notice that the last one is the one I chose to single out in the title of this post? Why? Because if I had called this entry “Rails Performance Blog,” your eyes would glaze over and you wouldn’t be able to read through the hazy glare.

Why else? Because the day I spent indexing our tables was the only time in the history of Bonanzle that I will ever bring about a sitewide 2x-3x performance increase in about four hours. God damn, that was a fantastic day. I spent the second half of it writing airy musings to my girlfriend and anyone else who would listen about how much fun web sites are to program. Then I drank beer and went rafting. Those who haven’t indexed their DB lately: don’t you hate me and want to be like me more than ever before?

Well, I can’t help you with the former, but the latter, that we can work on.

  1. Download Query Analyzer
  2. Delete your development.log file. Start your site in development mode. Go to your slowest page. Open your development.log file in an editor that can automatically update as the file changes.
  3. Look through the queries your Rails site is making. Any query where the “type” column reads “ALL” is one where the database scans every row of the table to satisfy it. Hundreds of rows? OK, whatever. Thousands of rows? Ouch. Tens of thousands of rows (or more)? Your request might never be heard from again.
  4. Create indexes to make those “ALL”s go away. Adding an index in Rails is the simplest thing ever. In a migration: add_index :table_name, :column_name and you’re done. remove_index :table_name, :column_name and you’re undone.
  5. Observe that, at least for MySQL, queries that filter on more than one attribute in the WHERE clause (e.g., select * from items where status = 'active' and hidden = false) are still slow even if you create separate indexes for “status” and “hidden.” Why? My understanding is that MySQL will often use only one of those single-column indexes and then check the other condition row by row. But I don’t know the exact details, nor do I care. What I do know is that add_index :items, [:status, :hidden] creates a compound (multi-column) index that gets you back to log(n) time for queries with compound WHERE clauses.
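Steps 4 and 5 can be illustrated with a toy model in plain Ruby. This is only an analogy, not how MySQL stores indexes (it uses B-trees): a sorted array plus binary search stands in for the index, and the items rows, the Item struct, and the CompoundIndex class are all hypothetical. It contrasts examining every row (the “ALL” case) with jumping straight to the matching [status, hidden] entries.

```ruby
# Toy model of a table scan vs. a compound index. MySQL actually uses
# B-trees; here a sorted array searched with Array#bsearch_index stands in
# for the tree. The Item rows and column names are hypothetical.
Item = Struct.new(:id, :status, :hidden)

STATUSES = %w[active sold expired].freeze
ITEMS = (1..30_000).map { |i| Item.new(i, STATUSES[i % 3], i.even?) }.freeze

# The "type = ALL" case: every row is examined to answer the query.
def full_scan(rows, status, hidden)
  rows.select { |r| r.status == status && r.hidden == hidden }
end

# Compound index on [status, hidden]: entries sorted by key, so the first
# match is found by binary search in O(log n) and the rest follow it.
class CompoundIndex
  def initialize(rows)
    @entries = rows.map { |r| [key_for(r.status, r.hidden), r] }
                   .sort_by { |key, r| [key, r.id] }
  end

  def lookup(status, hidden)
    key = key_for(status, hidden)
    start = @entries.bsearch_index { |entry_key, _| (entry_key <=> key) >= 0 }
    return [] unless start

    # Walk forward only while the key still matches.
    matches = []
    i = start
    while i < @entries.length && @entries[i][0] == key
      matches << @entries[i][1]
      i += 1
    end
    matches
  end

  private

  # Booleans aren't ordered in Ruby, so store hidden as 0/1 in the key.
  def key_for(status, hidden)
    [status, hidden ? 1 : 0]
  end
end

index = CompoundIndex.new(ITEMS)
scanned = full_scan(ITEMS, "active", false)
looked_up = index.lookup("active", false)
puts "#{scanned.size} rows; same result: #{scanned.map(&:id) == looked_up.map(&:id)}"
```

In a real app the index itself is just the one-line migration from step 5; the toy only shows why a compound key lets the database skip straight to the rows where both conditions match.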

Now, if you’re like me or the 50 people on the Rails wiki and forums who have just learned about this crazy, wonderful thing called “indexes,” your first question is, “Indexing sounds pretty bangin’. Why not just index the hell out of everything?”

Answer: Them indexes aren’t immaculately conceived, son. Every index you create has to be built and maintained, so the more indexes you create, the more overhead there is in inserting, updating, or deleting records in your table. Of course, most queries on most sites are reads, so a well-chosen index repays its extra write time 10x or more, but if you were to go buck wild and index the farm, you probably wouldn’t be much better off on balance than if you had indexed nothing at all. See why downloading Query Analyzer was the first step?

The general rule of thumb for indexes is that almost every foreign key should be indexed, as should any column you regularly search or sort on. That’s worked well for us. For tables with fewer than 500 rows, I usually get lazy and don’t do any indexing, and that seems fine. But assuredly, if you’re working on a table with 1,000 or more rows and you’re querying on columns that aren’t indexed, you are 15 minutes away from a beer-enabling, management-impressing performance optimization that would make Ferris Bueller proud.
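The foreign-key half of that rule of thumb is mechanical enough to script. The sketch below works on a hypothetical, hard-coded schema description (in a real Rails app you would build it from the database instead) and prints the add_index lines a migration would need. It treats a column as already covered if it is the leftmost column of an existing index, since MySQL can use a compound index for lookups on its leftmost column.

```ruby
# Finds *_id (foreign key) columns that aren't the leftmost column of any
# index, and emits the add_index migration lines to fix them.
# The schema below is a hypothetical example.
SCHEMA = {
  items: {
    columns: %w[id user_id category_id status hidden title],
    indexes: [%w[user_id], %w[status hidden]]
  },
  orders: {
    columns: %w[id item_id buyer_id total],
    indexes: []
  }
}.freeze

def missing_foreign_key_indexes(schema)
  schema.flat_map do |table, info|
    # A compound index on [a, b] can serve lookups on a alone.
    leading_columns = info[:indexes].map(&:first)
    info[:columns]
      .select { |col| col.end_with?("_id") && !leading_columns.include?(col) }
      .map { |col| "add_index :#{table}, :#{col}" }
  end
end

puts missing_foreign_key_indexes(SCHEMA)
```

The output lines can be pasted into a migration as-is; columns you search or sort on (like status above) still need a judgment call with Query Analyzer rather than a script.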

09/02/08

Change ACL of Amazon S3 Files in Recursive Batch

We’re in the process of moving our images to be served off of S3, and I wanted to share a quick recommendation I came across this evening while trying to make our presently private S3 image files public. The answer is Bucket Explorer. You certainly won’t mistake it for a high-budget piece of UI mastery, but it is surprisingly capable of doing many things that have been troublesome for me with the Firefox S3 plugin (which is a major pain to even get working with Firefox 2 (which is a major pain to upgrade to Firefox 3… I upgraded for a month and spent 5 hours trying to figure out why some pages seemed to randomly freeze indefinitely before giving up and downgrading (my best guess was that it was Flash-related))), or the AWS-S3 gem, or the other free S3-browsing web service I found somewhere or other.

In addition to providing a capable, FTP-like interface for one’s S3 buckets, it can also get stats on directories, do the aforementioned recursive batch permission setting, delete buckets (the S3 gem won’t let me, even with the :force => true option), and handle a bunch of other tasks. Probably most importantly (for me), it works! Tra-lee!

It’s $50 to buy, and once it finishes changing permissions on the roughly 20,000 image files it’s currently batch-processing, I would seriously consider paying for it. For the time being, I’m on a fully functional 30-day trial.