
Weekly Round-Up: Big Data Projects, OpenGeo, Coca-Cola, and Crime-Fighting

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have four fascinating articles, with topics ranging from big data projects to Coca-Cola. In this week's round-up:

  • 5 Big Data Projects That Could Impact Your Life
  • CIA Invests in Geodata Expert OpenGeo
  • How Coca-Cola Takes a Refreshing Approach to Big Data
  • Fighting Crime with Big Data

5 Big Data Projects That Could Impact Your Life

Our first piece this week is a Mashable article listing five interesting data projects. They range from one that estimates transit times in NYC to one that tracks homicides in DC and one that illustrates the prevalence of HIV in the United States. All are great examples of people doing interesting things with data that is becoming increasingly available.

CIA Invests in Geodata Expert OpenGeo

A while back, the CIA spun off a strategic investment arm called In-Q-Tel to invest in data and technologies that could benefit the intelligence community. This week, it was announced that In-Q-Tel has invested in geodata startup OpenGeo. This GigaOM article provides a little detail about the company and what it does, and also lists some of the other companies In-Q-Tel has invested in so far.

How Coca-Cola Takes a Refreshing Approach to Big Data

This is an interesting Smart Data Collective article about Coca-Cola and how they use data to drive their decisions and maintain a competitive advantage. The article describes multiple ways the company uses big data and analytics, from interacting with their Facebook followers to the formulas for their soft drinks.

Fighting Crime with Big Data

Our final piece this week is an article about how the analytics platform provider Palantir helps investigators find the patterns in data that uncover white-collar crime, which is usually well hidden. The article contains multiple quotes from Palantir's legal counsel, Ryan Taylor, about how the company works with crime-fighting agencies and the methods it employs to bring these criminals to justice.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Visualizing Web Scale Geographic Data in the Browser in Real Time: A Meta Tutorial

Visualizing geographic data is a task many of us face in our jobs as data scientists. Often, we must visualize vast amounts of data (tens of thousands to millions of data points), and we need to do so in the browser, in real time, to reach the widest possible audience. We also often want to do it with free and/or open-source software. Luckily for us, Google offered a series of fascinating talks at this year's (2013) Google I/O that show one way of solving this problem. Even better, the talks cover every stage of the problem: from cleaning the data at scale using legacy C++ code, to providing low-latency yet web-scale data storage, and finally to rendering efficiently in the browser. Not surprisingly, Google's approach leans heavily on Google's own technology stack, but we won't hold that against them.



All the Ships in the World: Visualizing Data with Google Cloud and Maps (36 minutes)

The first talk gives an overview of where the data comes from and of the Google cloud services that make up the system architecture responsible for cleaning, storing, and serving the data fast enough for real-time queries. This video is very useful for understanding how the different technology layers (browser, database, virtual instances, etc.) interact efficiently.

Description: Tens of thousands of ships report their position at least once every 5 minutes, 24 hours a day. Visualizing that quantity of data and serving it out to large numbers of people takes lots of power both in the browser and on the server. This session will explore the use of Maps, App Engine, Go, Compute Engine, BigQuery, Big Store, and WebGL to do massive data visualization.
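To put the session description's numbers in perspective, here is a quick back-of-the-envelope calculation. It is a sketch under stated assumptions: every ship reports exactly once per 5-minute interval (the stated minimum, so real volumes can only be higher), and 10,000 ships stands in as a hypothetical lower bound for "tens of thousands." The function name `minReportsPerDay` is our own.

```typescript
// Back-of-the-envelope estimate of daily ship position reports.
// Assumption: each ship reports exactly once per interval (the stated minimum),
// so actual volumes can only be higher than this.
function minReportsPerDay(ships: number, intervalMinutes = 5): number {
  const minutesPerDay = 24 * 60; // 1440
  // Whole reporting intervals in one day at the slowest allowed rate.
  const reportsPerShip = Math.floor(minutesPerDay / intervalMinutes);
  return ships * reportsPerShip;
}

// Hypothetical lower bound for "tens of thousands of ships".
const perDay = minReportsPerDay(10000);
console.log(`At least ${perDay} reports per day`); // prints "At least 2880000 reports per day"
```

In other words, even at the low end a single day of positions runs to millions of points, which is why both the server side and the browser side have to be built for scale.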

Google Maps + HTML5 + Spatial Data Visualization: A Love Story (60 minutes)

The second talk discusses, in code-level detail, how to render vast geographic datasets (up to a few million data points) using JavaScript in the browser. One of the keys to visualizing data at this scale is handing much of the heavy rendering work to the computer's graphics processing unit (GPU) through relatively simple vertex and fragment shaders. Brendan Kenny, the speaker, explains how he uses CanvasLayer, available from his GitHub account, to sync a WebGL canvas containing the data with version 3 of the Google Maps JavaScript API. In short, he renders one layer for the map and one layer for the data, and the two layers must pan and zoom in lockstep. He even walks through excellent examples showing how individual shaders work on the GPU.

Description: Much if not most of the world’s data has a geographic component. Data visualizations with a geographic component are some of the most popular on the web. This session will explore the principles of data visualization and how you can use HTML5 - particularly WebGL - to supplement Google Maps visualizations.
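The map/data layer synchronization Brendan Kenny describes boils down to two coordinate conversions. The sketch below is our own illustration, not code from CanvasLayer or the Maps API: it projects latitude/longitude into Web Mercator "world" coordinates (the 256x256 space Google Maps uses at zoom 0) and then into pixel offsets on an overlay canvas. The names `latLngToWorld` and `worldToCanvasPixel`, and the `topLeft` parameter (the world coordinate of the canvas's top-left corner), are hypothetical.

```typescript
// Hypothetical helpers illustrating how a data overlay stays aligned with the map.
// World coordinates follow the Web Mercator convention used by Google Maps:
// at zoom 0 the whole world fits in a single 256x256 tile.
const TILE_SIZE = 256;

interface Point {
  x: number;
  y: number;
}

// Project latitude/longitude (in degrees) into world coordinates.
function latLngToWorld(lat: number, lng: number): Point {
  // Clamp sin(lat) because Mercator y diverges toward the poles.
  const sinLat = Math.min(Math.max(Math.sin((lat * Math.PI) / 180), -0.9999), 0.9999);
  return {
    x: TILE_SIZE * (0.5 + lng / 360),
    y: TILE_SIZE * (0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI)),
  };
}

// Convert a world point to a pixel position on an overlay canvas.
// Each zoom level doubles the scale; panning shifts the top-left corner.
function worldToCanvasPixel(p: Point, zoom: number, topLeft: Point): Point {
  const scale = Math.pow(2, zoom);
  return {
    x: (p.x - topLeft.x) * scale,
    y: (p.y - topLeft.y) * scale,
  };
}

// Project once, then redo only the cheap scale-and-shift when the map moves.
const center = latLngToWorld(0, 0); // { x: 128, y: 128 }
console.log(worldToCanvasPixel(center, 2, { x: 100, y: 100 })); // { x: 112, y: 112 }
```

Keeping the projection separate from the scale-and-shift is the point of this split: the trigonometry runs once per data point, while panning and zooming only change `zoom` and `topLeft`. In a WebGL version, that second step would typically move into the vertex shader as a matrix.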


As a bit of background, Brendan leverages a number of technologies you might not be familiar with, including three.js and WebGL. Three.js is a convenient wrapper around WebGL (among other things) and can greatly simplify getting up and running with 3D in the browser. As one excellent tutorial puts it:

I have used Three.js for some of my experiments, and it does a really great job of abstracting away the headaches of getting going with 3D in the browser. With it you can create cameras, objects, lights, materials and more, and you have a choice of renderer, which means you can decide if you want your scene to be drawn using HTML 5's canvas, WebGL or SVG. And since it's open source you could even get involved with the project. But right now I'll focus on what I've learned by playing with it as an engine, and talk you through some of the basics.

WebGL is one mechanism for rendering three-dimensional data in the browser and is based on OpenGL ES 2.0. Wikipedia describes it as follows:

WebGL (Web Graphics Library) is a JavaScript API for rendering interactive 3D graphics and 2D graphics[2] within any compatible web browser without the use of plug-ins. WebGL is integrated completely into all the web standards of the browser allowing GPU accelerated usage of physics and image processing and effects as part of the web page canvas. WebGL elements can be mixed with other HTML elements and composited with other parts of the page or page background.[3] WebGL programs consist of control code written in JavaScript and shader code that is executed on a computer's Graphics Processing Unit (GPU). WebGL is designed and maintained by the non-profit Khronos Group.
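To make "control code in JavaScript plus shader code on the GPU" concrete, here is a minimal sketch in TypeScript. The two GLSL ES 1.00 shaders draw each data point as a small square point sprite, and `buildMapMatrix` builds the 4x4 matrix the vertex shader would receive as a uniform, mapping world coordinates to WebGL clip space (-1 to 1 on each axis). The attribute and uniform names (`worldCoord`, `mapMatrix`), the point size, the color, and both helper functions are our own assumptions, not code from the talks. The WebGL calls that compile the shaders and upload the matrix (for example `gl.uniformMatrix4fv`) are omitted so the example runs without a browser.

```typescript
// Vertex shader: runs once per data point on the GPU.
// Assumption: points arrive as world coordinates in the `worldCoord` attribute;
// if only x and y are supplied, WebGL fills in z = 0 and w = 1.
const VERTEX_SHADER = `
  attribute vec4 worldCoord;
  uniform mat4 mapMatrix;
  void main() {
    gl_Position = mapMatrix * worldCoord;
    gl_PointSize = 4.0;
  }
`;

// Fragment shader: runs once per covered pixel and picks its color.
const FRAGMENT_SHADER = `
  precision mediump float;
  void main() {
    gl_FragColor = vec4(0.8, 0.2, 0.1, 1.0);
  }
`;

// Build a column-major matrix (WebGL's layout) mapping world coordinates to
// clip space for a canvas of the given size, zoom level, and world coordinate
// of the canvas's top-left corner.
function buildMapMatrix(
  width: number,
  height: number,
  zoom: number,
  topLeftX: number,
  topLeftY: number,
): Float32Array {
  const scale = Math.pow(2, zoom);
  const sx = (2 * scale) / width; // world units -> clip units along x
  const sy = (-2 * scale) / height; // negative: canvas y grows downward, clip y grows upward
  const tx = -sx * topLeftX - 1; // top-left corner lands at clip x = -1
  const ty = -sy * topLeftY + 1; // top-left corner lands at clip y = +1
  return new Float32Array([
    sx, 0, 0, 0,
    0, sy, 0, 0,
    0, 0, 1, 0,
    tx, ty, 0, 1,
  ]);
}

// CPU mirror of `mapMatrix * worldCoord` for a point with z = 0, w = 1,
// useful for checking the matrix without a GPU.
function toClip(m: Float32Array, x: number, y: number): [number, number] {
  return [m[0] * x + m[4] * y + m[12], m[1] * x + m[5] * y + m[13]];
}

const m = buildMapMatrix(256, 256, 0, 0, 0);
console.log(toClip(m, 128, 128)); // the world's center lands at clip (0, 0)
```

Folding zoom, pan, and the pixel-to-clip conversion into one matrix means the per-frame JavaScript work is a single uniform update, while the per-point multiplication happens in parallel on the GPU.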