Google I/O wrapped up last week with a tremendous number of data-related announcements. Today's post focuses on Google Compute Engine (GCE), Google's answer to Amazon's Elastic Compute Cloud (EC2), which allows you to create and run virtual compute instances within Google's cloud. We have spent a good amount of time talking about GCE in the past, in particular benchmarking it against EC2 here, here, here, and here. The main GCE announcement at I/O was, of course, the fact that now **anyone** and **everyone** can try out and use GCE. Yes, GCE instances now support up to 10 terabytes per disk volume, which is a BIG deal. However, the fact that GCE will use minute-by-minute pricing, which might not seem incredibly significant on the surface, is an absolute game changer.
Let's say that I have a job that will take a thousand instances each a little over an hour to finish (a total of just over a thousand "instance hours"). I launch my thousand instances, run the needed job, and then shut down my cloud 61 minutes later. Let's also assume that Amazon and Google both charge about the same amount, say $0.50 per instance per hour (a relatively safe assumption), and that Amazon's and Google's instances have the same computational horsepower (this is not true; see my benchmark results). As Amazon charges by the hour, Amazon would charge me for two hours per instance, or $1000.00 total (1000 instances x $0.50 per instance per hour x 2 hours per instance), whereas Google would only charge me $508.33 (1000 instances x $0.50 per instance per hour x 61/60 hours per instance). In this circumstance, Amazon's hourly billing has almost doubled my costs, and the impact can get far worse.
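The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the assumptions already stated in this post (a flat $0.50 per instance per hour on both clouds); the function names are made up for the example and the 10-minute minimum charge is ignored because the run is longer than that.

```python
import math

RATE_PER_HOUR = 0.50  # assumed price per instance-hour, same on both clouds


def hourly_cost(instances, minutes, rate=RATE_PER_HOUR):
    """Bill every instance for each started hour."""
    return instances * math.ceil(minutes / 60) * rate


def per_minute_cost(instances, minutes, rate=RATE_PER_HOUR):
    """Bill every instance for each started minute."""
    return instances * math.ceil(minutes) * rate / 60


# 1000 instances running for 61 minutes
print(hourly_cost(1000, 61))                # 1000.0
print(round(per_minute_cost(1000, 61), 2))  # 508.33
```

The only difference between the two functions is the unit that gets rounded up, and that rounding is where the entire price gap comes from.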
If I want to mitigate the overcharge, I can run the job with fewer instances but for a longer time. One option would be to run 100 instances for just over 10 hours each. This setup would then cost me $550 (100 instances x 11 hours per instance x $0.50 per instance per hour). If I am exceedingly price sensitive, I could run a single instance for 1,001 hours and get the same job completed at a total cost of $500.50. At this point, I am only getting overcharged $0.50 but, if you are willing to wait 1,000 hours for your results, why use the cloud at all?
OK, now let's say completing the task is incredibly important to you and time is of the essence. In this case, let's throw 5,000 instances at the problem, which now takes just over 12 minutes to solve (let's call this 13 minutes). Running these 5,000 instances in GCE would cost $541.67 (5000 instances x 13/60 hours per instance x $0.50 per hour per instance) whereas the same run in Amazon would cost $2500 (5000 instances x 1 hour per instance x $0.50 per hour per instance)!
With GCE, I don't have to worry about this overcharge until I hit the 10-minute minimum charge window. Thus, whenever I use GCE, I can simply throw as many instances as possible at the problem without a second thought, as the price is going to wind up about the same either way. Or, put another way, look at the best case that GCE provides (I get my job done in 13 minutes for about $540) whereas for about the same amount of money ($550), Amazon completes this task in just over 10 hours. Which one would you choose?
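To see how the two billing schemes behave as the instance count grows, here is a rough sketch that sweeps over a few cluster sizes. It assumes the job parallelizes perfectly (real jobs rarely do), reuses the $0.50 rate and the 61,000 instance-minutes of work from the running example, and applies the 10-minute minimum to the per-minute scheme. The function names are hypothetical.

```python
import math

RATE = 0.50                # dollars per instance-hour (this post's assumption)
TOTAL_MINUTES = 1000 * 61  # instance-minutes of work in the running example


def run_minutes(instances):
    # Assumes perfect parallel scaling; partial minutes are rounded up.
    return math.ceil(TOTAL_MINUTES / instances)


def hourly_bill(instances):
    # Hourly scheme: every started hour is charged in full.
    return instances * math.ceil(run_minutes(instances) / 60) * RATE


def per_minute_bill(instances, minimum=10):
    # Per-minute scheme with a minimum charge window, in minutes.
    billed = max(run_minutes(instances), minimum)
    return instances * billed * RATE / 60


for n in (100, 1000, 5000, 10000):
    print(n, run_minutes(n), hourly_bill(n), round(per_minute_bill(n), 2))
```

Under these assumptions the per-minute bill stays between roughly $508 and $542 all the way up to 5,000 instances and only starts to climb once the run drops below the 10-minute minimum (the 10,000-instance row), while the hourly bill grows with every instance added past the point where the job fits in a single hour.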
This is the true beauty of the cloud. GCE's pricing scheme incentivizes users to take full advantage of the cloud (massive parallelization for bursty computation) whereas Amazon's does not. When using GCE, I will spin up as many instances as it takes to get the job done as fast as possible. With Amazon, I will not, because of the billing overcharges. Even with all other things equal, GCE wins every time. Once users get used to getting immediate results, they won't go back.
My guess is that Amazon changes its hourly billing practices sooner rather than later.