GCE vs EC2 Part 3: Benchmarks from Serial and Multithreaded Java Applications

Welcome to part 3 of an examination of Google Compute Engine and Amazon Elastic Compute Cloud for cluster computing.


In part 1, I looked at how similarly Google and Amazon position their instance types and at the characteristics that distinguish each, including cost.

In part 2, I looked at the first set of benchmarks testing the compute and memory capabilities of individual instances, learning that Amazon and Google compute units are not the same.

Since the second post, Amazon ratcheted up the level of competition by offering a 20% price drop on some instances that have exact equivalents within Google Compute Engine.

The Benchmark

In this entry, I offer some additional individual-instance benchmarks using one of, if not the, de facto standard benchmarks for examining the performance of MPI clusters: the NAS Parallel Benchmarks (NAS = NASA Advanced Supercomputing). In NASA's words:

The NAS Parallel Benchmarks (NPB) are a small set of programs designed to help evaluate the performance of parallel supercomputers. The benchmarks are derived from computational fluid dynamics (CFD) applications and consist of five kernels and three pseudo-applications in the original "pencil-and-paper" specification (NPB 1). The benchmark suite has been extended to include new benchmarks for unstructured adaptive mesh, parallel I/O, multi-zone applications, and computational grids.  Problem sizes in NPB are predefined and indicated as different classes. Reference implementations of NPB are available in commonly-used programming models like MPI and OpenMP (NPB 2 and NPB 3).

Note that you do have to jump through some hoops to sign up and download the benchmark source code, but the process only takes a few minutes.

The NAS benchmark is up to version 3.3.1. However, a slew of problems compiling this latest version prompted me to turn to version 3.0, which contains the most recent Java port of the benchmark. The Java port compiled easily and, as I am interested in Java application performance due to a current research project, this was fine by me. I still hope to go back and run the full NAS benchmark suite on a cluster.

Note that the Java version is only sufficient to run serial and multithreaded benchmarks, exercising the serial and multithreaded capabilities of single instances and not MPI clusters. If you would like more information about the Java port of the NAS Benchmarks, a detailed paper is available here.

The eight benchmarks used in this test are the ones originally specified in NPB 1. They mimic the computation and data movement in computational fluid dynamics applications.

Five Computational Kernels:

  • IS - Integer Sort, random memory access
  • EP - Embarrassingly Parallel
  • CG - Conjugate Gradient, irregular memory access and communication
  • MG - Multi-Grid on a sequence of meshes, long- and short-distance communication, memory intensive
  • FT - discrete 3D fast Fourier Transform, all-to-all communication

Three Pseudo-Applications:

  • BT - Block Tri-diagonal solver
  • SP - Scalar Penta-diagonal solver
  • LU - Lower-Upper Gauss-Seidel solver

Each of these benchmarks can be run at several predefined problem sizes, called classes:

  • Class S: small, for quick test purposes
  • Class W: workstation size (a 1990s workstation; now likely too small)
  • Classes A, B, C: standard test problems; ~4X size increase going from one class to the next
  • Classes D, E, F: large test problems; ~16X size increase from each of the previous classes
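To get a feel for how quickly the classes grow, here is a rough sketch that compounds the approximate factors quoted above into a size relative to class A. The function name `relative_size` is my own, and classes S and W are left out because no growth factor is given for them.

```python
# Approximate growth factors from the class descriptions:
# ~4x per step for B and C, ~16x per step for D, E, and F.
GROWTH = {"B": 4, "C": 4, "D": 16, "E": 16, "F": 16}
ORDER = ["A", "B", "C", "D", "E", "F"]


def relative_size(npb_class: str) -> int:
    """Return the approximate problem size of a class relative to class A."""
    if npb_class not in ORDER:
        raise ValueError(f"no growth factor known for class {npb_class!r}")
    size = 1
    # Multiply the step factors for every class from B up to the requested one.
    for name in ORDER[1:ORDER.index(npb_class) + 1]:
        size *= GROWTH[name]
    return size


if __name__ == "__main__":
    for name in ORDER:
        print(f"Class {name}: ~{relative_size(name)}x class A")
```

These are only order-of-magnitude estimates, but they make it clear why the larger classes get expensive to run on hourly-billed instances.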

All tests were run on both the new second-generation Amazon m3.xlarge instance ($0.50 per hour as of 2/14/2013) and the Google n1-standard-4 instance ($0.48 per hour as of 2/14/2013).
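For reference, the hourly price gap between the two instances is small. The sketch below works out that gap and a prorated cost for a run of a given length. The helper names and the 90-minute runtime are hypothetical, and the prorated figure is an assumption for illustration only: it ignores each provider's actual billing granularity.

```python
# Hourly prices quoted in the text (as of 2/14/2013).
AWS_M3_XLARGE_PER_HOUR = 0.50
GCE_N1_STANDARD_4_PER_HOUR = 0.48


def price_gap_percent(baseline_price: float, other_price: float) -> float:
    """Percent by which other_price is cheaper than baseline_price."""
    return (baseline_price - other_price) / baseline_price * 100.0


def prorated_cost(price_per_hour: float, runtime_seconds: float) -> float:
    """Cost of a run if billing were exactly proportional to runtime (an assumption)."""
    return price_per_hour * runtime_seconds / 3600.0


if __name__ == "__main__":
    gap = price_gap_percent(AWS_M3_XLARGE_PER_HOUR, GCE_N1_STANDARD_4_PER_HOUR)
    print(f"GCE is {gap:.1f}% cheaper per hour")
    # Hypothetical 90-minute benchmark session, not a measured runtime.
    session_seconds = 90 * 60
    print(f"AWS: ${prorated_cost(AWS_M3_XLARGE_PER_HOUR, session_seconds):.2f}")
    print(f"GCE: ${prorated_cost(GCE_N1_STANDARD_4_PER_HOUR, session_seconds):.2f}")
```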


In the results below, data classes S, W, A, and B were used.

The first plot, courtesy of ggplot2, shows serial performance across all tests and data classes S, W, and A.  Serial results were not computed for class B due to time and cost considerations.

Serial Performance Tests


Across the tested data sizes and all serial tests, the GCE n1-standard-4 took the performance crown. Taking the Amazon instance as the baseline, the GCE instance bests it by an average of:

  • 9.0% for IS;
  • 10.6% for CG;
  • 18.5% for MG;
  • 26.6% for FT;
  • 18.1% for BT;
  • 19.3% for SP;
  • and 19.1% for LU.
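One way averages like these can be computed is sketched below: take the percent improvement over the baseline for each data class, then average across classes. This assumes the metric is wall-clock time, where lower is better. The timings in the example are made up for illustration and are not the measured results.

```python
from statistics import mean


def percent_faster(baseline_seconds: float, candidate_seconds: float) -> float:
    """Percent by which the candidate's runtime beats the baseline's runtime."""
    return (baseline_seconds - candidate_seconds) / baseline_seconds * 100.0


def average_percent_faster(baseline: dict, candidate: dict) -> float:
    """Average the per-class improvement; both dicts map class name -> seconds."""
    return mean(percent_faster(baseline[c], candidate[c]) for c in baseline)


if __name__ == "__main__":
    # Hypothetical runtimes in seconds for one benchmark, keyed by data class.
    aws = {"S": 1.0, "W": 10.0, "A": 100.0}
    gce = {"S": 0.9, "W": 8.0, "A": 85.0}
    print(f"GCE is {average_percent_faster(aws, gce):.1f}% faster on average")
```

Averaging the per-class percentages weights each class equally, so a long class A run does not drown out the short class S run.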


Multithreaded Performance Tests 


Results are somewhat more muddled for the multithreaded tests, where the Amazon m3.xlarge pulls out wins in the Integer Sort (IS) and the memory-intensive Multi-Grid (MG) tests. Quantifying this with the Amazon instance as a baseline, we see the following average percent differences in performance, with the faster platform in parentheses:

  • 2.0% IS (AWS)
  • 3.5% CG (GCE)
  • 7.0% MG (AWS)
  • 18.4% FT (GCE)
  • 11.9% BT (GCE)
  • 11.1% SP (GCE)
  • 12.6% LU (GCE)
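Because the winner changes from test to test here, each entry pairs a magnitude with the faster platform. A minimal sketch of that labeling follows, again assuming runtimes where lower is better. The function name `label_result` and the sample timings are hypothetical.

```python
def label_result(benchmark: str, aws_seconds: float, gce_seconds: float) -> str:
    """Format a result as '<magnitude>% <benchmark> (<winner>)' with AWS as baseline."""
    diff = (aws_seconds - gce_seconds) / aws_seconds * 100.0
    if diff > 0:
        winner = "GCE"  # GCE finished sooner than the AWS baseline
    elif diff < 0:
        winner = "AWS"  # AWS finished sooner
    else:
        winner = "tie"
    return f"{abs(diff):.1f}% {benchmark} ({winner})"


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to show both outcomes.
    print(label_result("MG", 100.0, 107.0))
    print(label_result("FT", 50.0, 40.0))
```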

Stay tuned as more benchmarks are on their way ...