ggplot2

Data Visualization: From Excel to ???

So you're an Excel wizard. You make the best graphs and charts Microsoft's classic product has to offer, and you expertly integrate them into your business operations.  Lately you've studied up on the latest uses of data visualization and dashboards for taking your business to the next level. You tried to emulate them with Excel, maybe with some help from the Microsoft cloud, but it just doesn't work the way you'd like it to.  How do you transition your business away from this stalwart of the late 20th century?

If you believe you can transition your business operations to incorporate data visualization, you're likely already gathering raw data, maintaining basic information, and making projections, all of which eventually feed an analysis of alternatives and a final decision for internal and external clients.  It's not just about using the latest tools and techniques, either: your operational upgrades must actually make daily execution easier for you and your colleagues, otherwise it's just an academic exercise.

Google Docs

Google Docs has some advantages over desktop Excel: it lives in the cloud, has built-in sharing, and offers a wider selection of visualization options. My favorite is that you can reference and integrate multiple sheets from multiple users to create a multi-user network of spreadsheets.  If you have a good JavaScript programmer on hand you can even define custom functions, which is nice for particularly lengthy calculations, since spreadsheet formulas tend to be cumbersome.  A step further, you could use Google Docs as a database for input to R, which can then drive dashboards for the team on a Shiny Server.  Bottom line: Google makes it flexible, allowing you to pivot when necessary, but it can take time to master.
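As a rough sketch of pulling a shared spreadsheet into R, the snippet below builds a CSV export link and reads it with base R. It rests on two assumptions: that the sheet is shared so anyone with the link can view it, and that Google's CSV export URL follows the pattern shown. The helper names and "YOUR_SHEET_ID" are placeholders, not part of any package.

[code lang="R"]
# ASSUMPTION: a link-shared sheet can be exported as CSV through this URL pattern.
# sheet_csv_url() and read_shared_sheet() are hypothetical helper names.
sheet_csv_url <- function(sheet_id, gid = 0) {
  sprintf("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s",
          sheet_id, gid)
}

# Read one tab of the spreadsheet into a data frame
read_shared_sheet <- function(sheet_id, gid = 0) {
  read.csv(sheet_csv_url(sheet_id, gid), stringsAsFactors = FALSE)
}

# "YOUR_SHEET_ID" is a placeholder; swap in the ID from your own sheet's address
url <- sheet_csv_url("YOUR_SHEET_ID")
print(url)

# With a real ID, the data frame could then feed a plot or a Shiny dashboard:
# sales <- read_shared_sheet("YOUR_SHEET_ID")
[/code]

The network call is left commented out because it only works against a real, shared sheet.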

Tableau Server

Tableau Server is a great option if you want to share information across all users in your organization, have access to a plethora of visualization tools, use your mobile device, set up dashboards, and keep your information secure.  The question is, how big is your organization?  Tableau Server will cost you $1000/user, with a minimum of 10 users, plus 20% yearly maintenance.  If you're a small shop, it's likely that your internal operations are straightforward and can be outlined to someone new in a good presentation, so Tableau is like grabbing the whole toolbox to hang a picture: it may be more than you need.  If you're a larger organization, Tableau may accelerate your business in ways you never thought of before.
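To see what those numbers mean for a given team size, here is a back-of-the-envelope calculation. It assumes the 20% maintenance is charged on the license price and billed for every year after the first, which is one reading of the pricing above; tableau_cost() is a made-up helper, not anything Tableau provides.

[code lang="R"]
# Hypothetical helper: total cost of ownership under the pricing quoted above.
# ASSUMPTION: maintenance = 20% of the license price, billed each year after the first.
tableau_cost <- function(users, years = 1, per_user = 1000,
                         min_users = 10, maintenance_rate = 0.20) {
  licensed    <- max(users, min_users)   # you pay for at least 10 seats
  license     <- licensed * per_user
  maintenance <- maintenance_rate * license * (years - 1)
  license + maintenance
}

small_shop <- tableau_cost(users = 4,  years = 3)   # 4 people still pay for 10 seats
larger_org <- tableau_cost(users = 50, years = 3)

print(c(small_shop = small_shop, larger_org = larger_org))
print(small_shop / 4)   # cost per actual user in the small shop
[/code]

Under these assumptions the four-person shop pays for six seats it never uses, which is the toolbox-to-hang-a-picture problem in dollar terms.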

Central Database

There are a number of database options, including Amazon Relational Database Service (RDS) and Google App Engine.  Plenty of open source solutions work with either. They take more time to set up, but with these approaches you're committing to a future.  As you gain more clients and gather more data, you may want to go back to that data to discover insights you know are there from your experience in gathering it.  Querying the database is a simple function call from R, and the results you like can be set up as a dashboard using a number of different languages.  You may expand your services or hire new employees, and still want easy access to your historical data to set up new dashboards for daily operations.  Even old dashboards may need an overhaul, and being able to access the data from a standard system, as opposed to coordinating a myriad of spreadsheets, makes pivoting much easier.
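Here is a minimal sketch of what that function call can look like, using the DBI package. An in-memory SQLite database (via RSQLite) stands in for a hosted database so the example runs anywhere; the orders table and its contents are invented for illustration, and a real setup would pass its own driver and credentials to dbConnect().

[code lang="R"]
library(DBI)

# In-memory SQLite is a stand-in for the hosted database (assumption for this sketch)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table of client orders, invented for illustration
orders <- data.frame(
  client = c("acme", "acme", "globex"),
  amount = c(100, 250, 75)
)
dbWriteTable(con, "orders", orders)

# The "simple function call": send SQL, get a data frame back
totals <- dbGetQuery(
  con,
  "SELECT client, SUM(amount) AS total FROM orders GROUP BY client ORDER BY client"
)
print(totals)

dbDisconnect(con)
[/code]

Because the result is an ordinary data frame, it can go straight into a plot or a dashboard.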

Centralized vs. Distributed

Google Docs is very much a distributed system where different users have different permissions, whereas a centralized database will restrict most people to using your operational system according to your prescription.  So when do you consolidate into a single system, and when do you give people the flexibility to use their data as they see fit?  It depends, of course, mostly on the shelf life of the data. If the data is no good next week, be flexible. If it is your company's gold, make sure it is in a safe, organized, centralized place.  You may want to let employees access that gold for their daily purposes, and classic spreadsheets may be all they need for that. But when you've made considerable effort to get the unique data you have, keep it in a database system you know you can easily come back to when necessary.

Data Visualization: Graphics with ggplot2

Basic plots in R using standard packages like lattice work for most situations where you want to see trends in small data sets, such as your simulation variables, which makes sense considering lattice has its roots in Bell Labs' S language.  However, when we need to summarize and communicate our work to those primarily interested in the "forest" perspective, we use tools like ggplot2.  In other words, the difference between lattice and ggplot2 is the difference between understanding data and drawing pictures.

You can learn all about ggplot2 by downloading the R package and reading, but even Hadley Wickham, author of ggplot2, thinks going through the R help documentation will "drive you crazy!"  To alleviate stress, we've compiled references, examples, documentation, blogs, books, groups, and commentary from practitioners who use ggplot2 regularly. Enjoy.

ggplot2 is an actively maintained open-source chart-drawing library for R based upon the principles of the "Grammar of Graphics", thus the "gg".  The Grammar of Graphics was written for statisticians, computer scientists, geographers, research and applied scientists, and others interested in visualizing data.  A ggplot2 plot can be described as layers composed of a data set, mappings and aesthetics (position, shape, size, color), statistical transforms, and scales.  To better wrap our minds around how this applies to ggplot2, we can take Hadley's tour, or attend one of his events.  The overall goal is to automate graphical processes and put more resources at our fingertips; below are some great works from practitioners.
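To make those components concrete, here is a small sketch that labels each piece of the grammar in a single plot. It uses the mtcars data set that ships with R; the choice of variables and labels is ours, purely for illustration.

[code lang="R"]
library(ggplot2)

p <- ggplot(mtcars,                                   # the data set
            aes(x = wt, y = mpg,                      # position mappings
                colour = factor(cyl))) +              # colour aesthetic
  geom_point(size = 2) +                              # layer 1: raw points
  stat_smooth(method = "lm", formula = y ~ x,         # layer 2: a statistical
              se = FALSE) +                           #   transform (linear fit)
  scale_y_continuous(name = "Miles per gallon") +     # scale for the y position
  scale_colour_discrete(name = "Cylinders")           # scale for the colour mapping

print(p)
[/code]

Swapping any one piece, such as a different data set, mapping, or scale, leaves the rest of the specification untouched, which is the main appeal of the grammar.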

London Bike Routes

The London bike routes image is built with three layers: building polygons, waterways and lakes, and bike routes.  The route data is the position of each route together with a count of the bikes using it, shown as line thickness and intensity of yellow, which contrasts nicely with the black and grey of the city map.  I enjoy this visualization because you can imagine yourself trying to get around on a bicycle in London.
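The three-layer structure can be sketched with toy data. Everything below is invented (coordinates, counts, and column names) and only illustrates how each layer brings its own data set; it is not the code behind the original image. The linewidth aesthetic assumes ggplot2 3.4.0 or later.

[code lang="R"]
library(ggplot2)

# Toy stand-ins for the three data sets; all values are invented
buildings <- data.frame(
  x  = c(0, 1, 1, 0,  2, 3, 3, 2),
  y  = c(0, 0, 1, 1,  2, 2, 3, 3),
  id = rep(c("a", "b"), each = 4)
)
river <- data.frame(x = c(0, 1.5, 3), y = c(1.5, 1.2, 1.6))
routes <- data.frame(
  x     = c(0, 1, 2, 3,  0.5, 1.5, 2.5),
  y     = c(0.5, 1.4, 1.8, 3,  3, 2, 0.5),
  route = rep(c("r1", "r2"), times = c(4, 3)),
  bikes = rep(c(120, 30),    times = c(4, 3))
)

p <- ggplot() +
  # layer 1: building polygons in dark grey
  geom_polygon(data = buildings, aes(x, y, group = id), fill = "grey30") +
  # layer 2: waterways in lighter grey
  geom_path(data = river, aes(x, y), colour = "grey50", linewidth = 3) +
  # layer 3: routes, with bike counts mapped to thickness and intensity
  geom_path(data = routes,
            aes(x, y, group = route, linewidth = bikes, alpha = bikes),
            colour = "yellow", lineend = "round") +
  coord_equal() +
  theme_void() +
  theme(plot.background = element_rect(fill = "black"))

print(p)
[/code]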

Raman Spectroscopic Grading of Gliomas

The background of this work is the classification of tumour tissues using their Raman spectra. A detailed discussion can be found in C. Beleites et al.  Gliomas are the most frequent brain tumours, and astrocytomas are their largest subgroup. These tumours are treated by surgery, but the exact borders of the tumour are barely visible, hence the need for new tools that help the surgeon find the tumour border. A grading scheme is given by the World Health Organization (WHO).

twitteR Package

Curious about your influence on Twitter?  Want to see how your messages resonate within and outside your network?  One great website goes through many examples of using the twitteR package in R, including the following ggplot2 code, which creates the accompanying chart:

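[code lang="R"]
library(ggplot2)

# df is assumed to be the tweet data frame built in the tutorial, with an rt column
rt <- data.frame(rt = na.omit(df$rt))

ggplot(rt, aes(x = rt)) +
  geom_bar() +
  # theme() and element_text() replace the retired opts() and theme_text()
  theme(axis.text.x = element_text(angle = -90, size = 6)) +
  xlab(NULL)
[/code]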

The ggplot2 interface is interesting because you combine pieces with the + operator, which is how the Grammar of Graphics concept of layers shows up in the code.

This final example, Sentencing Data for Local Courts, breaks the data up by the demographics of those committing different classes of crimes.  As above, the R code is very simple and follows the layering paradigm:

[code lang="R"]
# iw: the sentencing data frame, with AGE, sex and Offence_type columns
ggplot(iw, aes(AGE, fill = sex)) +
  geom_bar() +
  facet_wrap(~Offence_type)
[/code]
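Since the sentencing data set is not included here, the following is a self-contained version of the same pattern you can run as-is. The data are simulated at random and the offence categories are invented, so the picture says nothing about real courts; only the column names follow the example above.

[code lang="R"]
library(ggplot2)

set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for the sentencing data; values are random, not real
iw_toy <- data.frame(
  AGE          = sample(18:70, 300, replace = TRUE),
  sex          = sample(c("F", "M"), 300, replace = TRUE),
  Offence_type = sample(c("Driving", "Theft", "Assault"), 300, replace = TRUE)
)

p <- ggplot(iw_toy, aes(AGE, fill = sex)) +  # data plus mappings
  geom_bar() +                               # one stacked bar per age
  facet_wrap(~Offence_type)                  # one panel per offence type

print(p)
[/code]

facet_wrap() repeats the same layer specification once per offence type, so adding a category to the data adds a panel without any change to the code.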