Google Compute Engine vs Amazon EC2 Part 2: Synthetic CPU and Memory Benchmarks

by seansmall

Testing Assumptions

In the last article, I examined pricing and feature differences between Google Compute Engine and Amazon EC2 instance types. Now it is time to test that article's key assumption, that Google Compute Engine Units are equivalent to Amazon EC2 Compute Units. The results may surprise you.

The Competitors

In the Google Compute Engine corner is the n1-standard-4, tested both with and without ephemeral storage. In the other, relatively crowded corner are three contenders from Amazon: the second-generation m3.xlarge, the classic m1.xlarge, and the hi1.4xlarge. As reported by the benchmarking software:

[Table: instance specifications as reported by the benchmarking software]

Note that GCE instances use a Google-compiled and modified Linux kernel, but otherwise the distribution looks like Ubuntu 12.04. Also, all instances used identical Java versions:

java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Benchmark Software

Two benchmark suites, focused on CPU and memory performance, were used: components of the Phoronix Test Suite, plus SciMark v2.1.1.1 and Java SciMark v2.0. SciMark consists of five computational kernels: fast Fourier transform (FFT), Gauss-Seidel relaxation, sparse matrix multiply, Monte Carlo integration, and dense LU factorization.
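To make the kernel list a little more concrete, here is a minimal Python sketch of a Monte Carlo integration kernel in the spirit of SciMark's: it estimates pi by counting random points that fall inside a quarter circle. The function names, seed, and sample count are illustrative choices, not SciMark's own code. SciMark also reports a composite score which, as I understand it, is the average of the five per-kernel Mflops ratings; `composite_score` sketches that assumption.

```python
import random
from statistics import mean


def monte_carlo_pi(num_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling the unit square and counting points
    that land inside the quarter circle x^2 + y^2 <= 1."""
    rng = random.Random(seed)  # fixed, arbitrary seed for reproducible runs
    under_curve = 0
    for _ in range(num_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            under_curve += 1
    # The quarter circle covers pi/4 of the unit square's area.
    return 4.0 * under_curve / num_samples


def composite_score(kernel_mflops: dict[str, float]) -> float:
    """Assumed composite: the plain average of the per-kernel Mflops ratings."""
    return mean(kernel_mflops.values())


print(monte_carlo_pi(200_000))
```

The real SciMark kernels are written in C and Java; a pure-Python loop like this is far slower and only illustrates the computation, not the benchmark's numbers.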

CPU Benchmark Results

Both the Java and non-Java SciMark benchmarks tell a similar story (higher scores indicate better performance).  In all tests, the n1-standard-4 and the m3.xlarge top the charts, trading the performance crown back and forth by small margins.  The m1.xlarge trails the pack by a very significant margin.



Curiously, the GCE instance wins all of the regular Java SciMark 2.0 tests, but the m3.xlarge wins the same suite of tests when larger data sizes, designed to exceed the CPU cache, are used.

From the Phoronix suite, three computationally intensive tests were chosen: the LAMMPS molecular dynamics simulation v1.0, parallel BZIP2 compression 1.1.6, and a ray tracer (POV-Ray 3.6.1). Each test reports its run time in seconds. To simplify comparison and visualization, all values were normalized by the longest run time for that test (in every case, the m1.xlarge's). The values are therefore shown as percentages rather than seconds, and lower is better.
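The normalization described above amounts to dividing each run time by the slowest one and scaling to 100. A minimal sketch, with made-up timings used purely for illustration (they are not the measured results):

```python
def normalize_to_slowest(run_times: dict[str, float]) -> dict[str, float]:
    """Express each run time as a percentage of the slowest run time.

    The slowest instance scores 100; lower percentages are better.
    """
    slowest = max(run_times.values())
    return {name: 100.0 * t / slowest for name, t in run_times.items()}


# Hypothetical run times in seconds, for illustration only.
example = {"m1.xlarge": 120.0, "m3.xlarge": 60.0, "n1-standard-4": 57.0}
print(normalize_to_slowest(example))
# prints {'m1.xlarge': 100.0, 'm3.xlarge': 50.0, 'n1-standard-4': 47.5}
```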

The n1-standard-4 edged out the m3.xlarge in both POV-Ray and LAMMPS but lost to the m3.xlarge in the BZIP2 compression test. Not surprisingly, the hi1.4xlarge, with 16 cores, destroyed all comers in the parallel BZIP2 test.
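Why does core count matter so much here? Parallel bzip2 tools such as pbzip2 split the input into blocks, compress each block independently, and concatenate the resulting bzip2 streams, so throughput scales roughly with the number of cores. The Python sketch below illustrates that idea; the chunk size and worker count are arbitrary assumptions. Threads are used on the assumption that CPython's `bz2` module releases the GIL while compressing; a `ProcessPoolExecutor` is the safer choice if that does not hold.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bzip2(data: bytes, chunk_size: int = 900_000, workers: int = 4) -> bytes:
    """Compress independent chunks concurrently and join the bzip2 streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


payload = b"cloud benchmark " * 200_000  # about 3.2 MB of sample input
packed = parallel_bzip2(payload)
# bz2.decompress accepts concatenated multi-stream input (Python 3.3+).
print(bz2.decompress(packed) == payload)
```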


Memory Benchmark Results

Memory benchmark results show that more robust metrics will be necessary to truly compare cloud computing capabilities. The line plot below shows memory speed benchmark results for the n1-standard-4 GCE instance and the m3.xlarge, m1.xlarge, and hi1.4xlarge Amazon instances.


Each benchmark was run four times on both the n1-standard-4 and the m3.xlarge, and this is where the real story lies. Notice that the GCE instance shows little performance variability across tests. In contrast, the m3.xlarge competes nearly evenly with the GCE instance in most (but not all) tests, but shows performance drops of up to 40%. There seems to be some validity to Google's claim that GCE offers more consistent performance than its competitors. Interestingly, this benchmark took the longest wall-clock time of all the synthetic tests.
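One way to put numbers on run-to-run consistency is to report, for each instance, the coefficient of variation across repeated runs and the drop from the best run to the worst. The sketch below assumes higher scores are better (as with memory bandwidth) and needs at least two runs; the sample scores are hypothetical, chosen only to contrast a steady instance with one that dips by 40%.

```python
from statistics import mean, stdev


def run_variability(scores: list[float]) -> dict[str, float]:
    """Summarize repeated benchmark runs where a higher score is better.

    cv_percent: sample standard deviation as a percentage of the mean.
    max_drop_percent: how far the worst run fell below the best run.
    """
    best, worst = max(scores), min(scores)
    avg = mean(scores)
    return {
        "mean": avg,
        "cv_percent": 100.0 * stdev(scores) / avg,
        "max_drop_percent": 100.0 * (best - worst) / best,
    }


# Hypothetical scores (e.g. MB/s) from four runs each, for illustration only.
steady = run_variability([5000.0, 5000.0, 5000.0, 5000.0])
spiky = run_variability([5000.0, 4900.0, 3000.0, 5000.0])
print(steady["max_drop_percent"], spiky["max_drop_percent"])
# prints 0.0 40.0
```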


In terms of short-term number crunching, the m3.xlarge and the n1-standard-4 seem similarly capable, trading small wins across the numerous benchmarks. Memory speed tells a very different story: the GCE instance holds a small but consistent lead in raw speed and a large lead in consistency of performance. For lengthy processor-intensive tasks, this difference could be significant.

As neither of these services is free, let's return to pricing, since not all compute units appear to be equal. The GCE n1-standard-4 costs $0.480 per hour without ephemeral storage or $0.552 per hour with it. In comparison, the m1.xlarge costs $0.520 per hour, while the m3.xlarge costs $0.580 per hour and is only available without ephemeral storage. All prices were current as of 1/20/2013.

At these price points, the original m1.xlarge looks significantly overpriced, and one must wonder when Amazon will either phase it out or drastically alter its pricing. Even though the second-generation m3 instances were launched only on 11/1/2012, the story is similar: the comparable GCE instance offers approximately the same number-crunching performance and better, and more importantly more consistent, memory performance at roughly a 17% discount ($0.48 vs. $0.58 per hour; put the other way, the m3.xlarge costs about 21% more).
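Whether the price gap between the n1-standard-4 and the m3.xlarge reads as about 17% or about 21% depends on which instance is taken as the baseline. This small sketch computes both from the hourly prices quoted above; the function name is illustrative.

```python
def price_comparison(cheaper: float, pricier: float) -> tuple[float, float]:
    """Return (discount, premium) in percent for two hourly prices.

    discount: how much less the cheaper option costs, relative to the pricier one.
    premium: how much more the pricier option costs, relative to the cheaper one.
    """
    gap = pricier - cheaper
    return 100.0 * gap / pricier, 100.0 * gap / cheaper


# Hourly prices (USD) quoted in the article, current as of 1/20/2013.
n1_standard_4 = 0.48
m3_xlarge = 0.58
discount, premium = price_comparison(n1_standard_4, m3_xlarge)
print(f"discount {discount:.1f}%, premium {premium:.1f}%")
# prints "discount 17.2%, premium 20.8%"
```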

The question that needs to be asked is what happens when computational performance is measured not for a few seconds or minutes, but for hours or days at a time, a common situation in high-performance computing and big data. This is where I believe Google may have a very significant advantage, and I look forward to investigating it in my next article.


Anecdotally, in my experience GCE instances are ready for use **much** faster than EC2 instances. The difference was quite noticeable, but I did not attempt to quantify it.