Data Visualization: New Shiny Packages & Products

Over the past few weeks and months we've been exploring Shiny, the new web application framework for R: how to develop in it, what its potential is, and what's new. As expected, Shiny web apps are getting very sophisticated, and thankfully they are making professional products built by data scientists a reality. Where insights gained from analysis once required long and intense meetings, we now have beautiful interactive Shiny web apps. None of them would be possible without the developers who are building R interfaces to their favorite JavaScript libraries. In many ways, working through R lets us do unique things, because we're in an environment where we can easily manipulate, massage, and restructure our data. Winston Chang, one of the elite developers building these R JavaScript packages, has put together a demo app that brings together Shiny with Gridster, JustGage, and Highcharts. Shiny comes with basic functions to create side panels and main panels, but those functions are not very configurable and the result can look amateurish, which can reflect poorly on the analysis behind it. Shiny is good, however, at modularizing its input and output displays. Wouldn't it be nice if we, or anyone we build our apps for, could rearrange them as we saw fit? Thank you, Winston.

Gridster - JustGage - Highcharts

Here's the short version: Gridster provides the boxes that JustGage, Highcharts, or your R charts live in. In a little more detail, Gridster is "the mythical drag-and-drop multi-column jQuery grid plugin that allows building intuitive draggable layouts from elements spanning multiple columns... made by Ducksboard." JustGage generates and animates nice, clean gauges, and Highcharts is an interactive HTML5/JavaScript charting library. What's nice is that these are all independent, which always makes coding easier. The example provided just happens to be the combination Winston chose. One could just as easily fill the boxes with other output, such as ggplot2 or igraph plots, geoPlot, or your own JavaScript package.

New Potential

The real value here is that your users can manipulate representations of your data on the fly. We do our best to get into our users' heads and present the data in a clean, simple way, but sometimes there is too much to simplify while still keeping the meaning. There is a constant balance between depth and breadth. If users can rearrange the positions of different charts, they can lay the information out in the way they absorb it fastest. If they can also replace plots with new ones, drawn from your estimation of what's potentially interesting, they can customize their view. Your Shiny app becomes a plug-and-play system designed to give users flexibility based on their interests.

Shiny Products

My question recently has been what kinds of products are ultimately possible with this new Shiny framework. Initially my concern was scalability, but I quickly found it's easy to run a Shiny Server on Amazon EC2, which can be scaled up as your user base grows. The next question was integration: gathering external data and publishing to external databases. Shiny's reactive functions address this, although the result can still be a little convoluted. Interactivity was addressed very quickly through Shiny's built-in use of JavaScript and the development of Shiny D3 packages. The question really comes down to this: who uses R and would want to develop products? We are an insight apart, in that we are constantly looking for a better way to bridge the gap between what we see and what is useful, shareable, consumable, etc. For instance, we can now show the downstream impacts of a component's reliability in manufacturing, correlations between consumer purchasing behaviors, or real-time resource needs for a dispatch center. These Shiny apps are like copper wire across a voltage potential: they let the power of insight flow.

Data Visualization: rCharts

We've discussed the advantages of presenting your work in R as an interactive visualization using Shiny a few times, and the next obvious step has been interactive charts. Let me introduce rCharts and Slidify, created by Ramnath Vaidyanathan (ramnathv on GitHub). As is increasingly the case, these tools are all about how quickly you can begin creating your own work and ideas; it's about putting the power in your hands.

First things first, you'll want to install Slidify from Ramnath's GitHub account with the devtools package, using install_github('ramnathv/slidify'). Older devtools releases used the form install_github('slidify','ramnathv'). You also need to install a number of other packages, including rChartsNYT, some of which are only on his GitHub account, and you'll need R 3.0.0 or later. I generally don't like going through these details, because a few Google searches will get you what you need. It only took me about 30 minutes to update and install everything, after running into a few errors and making some coffee.

That being said, as always the goal here is to Democratize Data, and what better way than to begin with America's pastime? I must object, though, because his demo starts with the Boston Red Sox selected in the side panel. Fixing this grievous error is simple: go to the 'ui.R' file and change the 'selected' argument of the 'selectInput' function from 'Boston Red Sox' to 'New York Yankees' ... Aaaahh, much better!
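Here is a minimal sketch of that one-argument change. It assumes a hypothetical input ID "team" and a short hypothetical list of teams. The demo's actual ID and choices, which it builds from the Lahman data, may differ. The sketch renders the input to HTML so you can see which option Shiny marks as selected.

```r
library(shiny)

# Hypothetical team list; the real demo derives its choices from its data.
teams <- c("Boston Red Sox", "New York Yankees", "Tampa Bay Rays")

# 'selected' decides which choice is shown when the app first loads.
team_input <- selectInput(
  inputId  = "team",             # hypothetical input ID
  label    = "Choose a team",
  choices  = teams,
  selected = "New York Yankees"  # previously "Boston Red Sox"
)

# Render to HTML; the chosen option carries the 'selected' attribute.
html <- as.character(team_input)
```

Only the default changes. Users can still pick any team from the list.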

Some details are provided in this great tutorial. You can recreate and extend its HTML5 slides with Slidify, starting from the R Markdown file "index.Rmd", using these commands: slidify('index.Rmd'); system('open index.html'). The 'open' command is macOS-specific; browseURL('index.html') works on any platform. You know everything is working when you can recreate this New York Times app with the command "runApp('app')" and both the tutorial and the interactive chart show up in your browser.

The Shiny code is very simple, with 17 and 19 SLOC for ui.R and server.R respectively. This is primarily due to the new rCharts functions 'showOutput' and 'renderChart', which were written for Shiny, and the 'rPlot' function, which uses the PolyChartsJS library to create interactive visualizations. Beyond that, you only need to know how to write the tooltip arguments in JavaScript. The code can be this short because the input 'team_data', defined in global.R and pulled from the Lahman baseball database, is a data frame, and rCharts lets the tooltip arguments operate directly on the data frame's variables. In other words, you can create a lot of work for yourself if you don't set your data up right in the first place.
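To make "set your data up right" concrete, here is a base-R sketch that shapes data the way a tooltip-friendly chart wants it. There is one row per team-season, and one column per quantity the chart or its tooltip will show. The rows, the numbers, and the strikeouts-per-game (SOG) and league-average columns are all hypothetical illustrations. They are not the demo's actual team_data or real Lahman values.

```r
# Hypothetical team-season rows (illustrative numbers, not real Lahman data).
raw <- data.frame(
  name   = c("New York Yankees", "Boston Red Sox",
             "New York Yankees", "Boston Red Sox"),
  yearID = c(2011, 2011, 2012, 2012),
  G      = c(100, 100, 100, 100),   # games played
  SO     = c(600, 700, 650, 750)    # strikeouts
)

# One derived column per quantity the chart or its tooltip will display.
raw$SOG <- round(raw$SO / raw$G, 2)

# League-wide average per season, joined back so every row carries it.
league <- aggregate(SOG ~ yearID, data = raw, FUN = mean)
names(league)[2] <- "leagueSOG"
team_data <- merge(raw, league, by = "yearID")
```

Once every value a tooltip might need is a named column, the chart code can refer to columns by name instead of recomputing anything in JavaScript.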

Again, the goal here is to create interactive presentations of your data easily, and rCharts with Shiny provides that, as long as you begin with organized data. This seems completely reasonable to me. I myself have trouble speaking intelligently on a subject if I don't have the information organized in my own mind, so why would I expect R to do better?

Data Visualization: From Excel to ???

So you're an Excel wizard: you make the best graphs and charts Microsoft's classic product has to offer, and you expertly integrate them into your business operations. Lately you've studied up on the latest uses of data visualization and dashboards for taking your business to the next level. You tried to emulate them with Excel, maybe with some help from the Microsoft cloud, but it just doesn't work the way you'd like it to. How do you transition your business away from this stalwart of the late 20th century?

If you believe you can transition your business operations to incorporate data visualization, you're likely gathering raw data, maintaining basic information, and making projections. All of this is eventually used in an analysis of alternatives and a final decision for internal and external clients. In addition, it's not just about using the latest tools and techniques. Your operational upgrades must actually make daily execution easier for you and your colleagues; otherwise it's just an academic exercise.

Google Docs

Google Docs has some advantages over desktop Excel: it's in the cloud, it has built-in sharing, and it offers a wider selection of visualization options. My favorite, though, is that you can reference and integrate multiple sheets from multiple users to create a multi-user network of spreadsheets. If you have a good JavaScript programmer on hand, you can even define custom functions. That's handy for particularly lengthy calculations, since spreadsheet formulas tend to be cumbersome. Going a step further, you could use Google Docs as a data store feeding R, which can then drive dashboards for the team on a Shiny Server. Bottom line: Google makes it flexible, allowing you to pivot when necessary, but it can take time to master.

Tableau Server

Tableau Server is a great option for sharing information across all users in your organization. You get access to a plethora of visualization tools, mobile access, dashboards, and security for your information. The question is, how big is your organization? Tableau Server will cost you $1000 per user, with a minimum of 10 users, plus 20% yearly maintenance. If you're a small shop, your internal operations are likely straightforward enough to outline to someone new in a good presentation. In that case, Tableau is like grabbing the whole toolbox to hang a picture: it may be more than necessary. If you're a larger organization, Tableau may accelerate your business in ways you never thought of before.
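To put the "how big is your organization?" question in numbers, here is a rough cost sketch using the figures quoted above: $1000 per user, a 10-user minimum, and 20% yearly maintenance. The helper name is hypothetical. The sketch also assumes that maintenance is 20% of the license fee and is billed every year, including the first. Actual licensing terms may differ.

```r
# Hypothetical helper: rough Tableau Server cost from the figures above.
# Assumption: maintenance is 20% of the license fee, billed every year
# including the first; real pricing terms may differ.
tableau_cost <- function(users, years) {
  licensed    <- max(users, 10)            # 10-user minimum
  license     <- 1000 * licensed           # one-time fee, $1000 per user
  maintenance <- license * 20 / 100 * years
  license + maintenance
}

tableau_cost(5, 1)    # 5 users still pay for 10: 10000 + 2000 = 12000
tableau_cost(25, 3)   # 25000 + 3 * 5000 = 40000
```

Even a five-person shop pays the ten-seat minimum. That is the "whole toolbox to hang a picture" problem in dollar terms.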

Central Database

There are a number of database options, including Amazon Relational Database Service (RDS) and Google App Engine. There are many open source solutions for either, and they take more time to set up, but with these approaches you're committing to a future. As you gain more clients and gather more data, you may want to query it for insights you know are there from your experience gathering it. That is a simple function call from R, and results you like can be set up as a dashboard using a number of different languages. You may expand your services or hire new employees, and still want easy access to your historical data to set up new dashboards for daily operations. Even old dashboards may need an overhaul. Being able to access the data from a standard system, as opposed to coordinating a myriad of spreadsheets, makes pivoting much easier.
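Here is a sketch of what that "simple function call from R" can look like with the DBI package. An in-memory SQLite database (via RSQLite) stands in for a hosted database. With Amazon RDS you would swap in the matching driver, such as RPostgres or RMariaDB, and real connection details. The table and its contents are hypothetical.

```r
library(DBI)

# In-memory SQLite stands in for a hosted database such as Amazon RDS;
# a real deployment would use a driver like RPostgres or RMariaDB instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical historical table: one row per client engagement.
dbWriteTable(con, "engagements", data.frame(
  client = c("A", "B", "A", "C"),
  year   = c(2012L, 2012L, 2013L, 2013L),
  hours  = c(40, 25, 60, 10)
))

# One query returns a data frame that is ready to feed a dashboard.
by_client <- dbGetQuery(con,
  "SELECT client, SUM(hours) AS total_hours
     FROM engagements
    GROUP BY client
    ORDER BY total_hours DESC")

dbDisconnect(con)
```

The same dbGetQuery() call works unchanged when the connection points at a production database. That consistency is the "standard system" advantage over a pile of spreadsheets.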

Centralized vs Distributed

Google Docs is very much a distributed system, in which different users have different permissions. A centralized database, by contrast, restricts most people to using your operational system as you prescribe. So when do you consolidate into a single system, and when do you give people the flexibility to use their data as they see fit? It depends, of course. It depends on how long the data stays useful. If the data is no good next week, be flexible; if it's your company's gold, make sure it's in a safe, organized, centralized place. You may want to let employees access your company's gold for their daily purposes, and classic spreadsheets may be all they need for that. But when you've put considerable effort into gathering the unique data you have, keep it in a safe place, in a database system you know you can easily come back to when necessary.

Data Visualization: Reactive Functions in Shiny

We now know that Shiny for R is a powerful tool for data scientists to display their work quickly and easily to a broad audience. So let's get into the nitty-gritty of what it takes to create Shiny visualizations. We're not going to dig deep into syntax (unless I want to scare everyone off). Instead, let's focus on Shiny's basic structures and why they come naturally to those of us who aren't web programmers by trade.

All Shiny applications have two basic files, ui.R and server.R, and the relationship between these two files determines which functions you use.

Static Pages

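A static page is the most basic architecture to start with. It can be written using only these basic functions, named as in the Shiny release this post was written against (later releases renamed the reactive*() output functions to render*(), e.g. reactivePlot() became renderPlot() and reactiveUI() became renderUI()):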

  • ui.R
    • shinyUI()
    • pageWithSidebar()
    • sidebarPanel()
    • mainPanel()
  • server.R
    • shinyServer()
    • output$variable <- reactiveText()
    • output$image <- reactivePlot()

All we've set up is a control panel on the left-hand side and the plot outputs on the right-hand side. You can prompt the user to upload a file with "fileInput()", used within "sidebarPanel()". To begin with, though, it may be easier to use a file you're familiar with and that won't throw you any curve balls.
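For reference, here is a minimal sketch of the same static layout written with current Shiny names. It uses fluidPage() with sidebarLayout() in place of pageWithSidebar(), and renderPlot() in place of reactivePlot(). The app is defined in a single script via shinyApp(). It uses R's built-in mtcars data set, so there are no file-upload surprises. The input and output IDs are hypothetical.

```r
library(shiny)

# Controls on the left, plot on the right, using the built-in mtcars data.
ui <- fluidPage(
  titlePanel("Static page sketch"),
  sidebarLayout(
    sidebarPanel(
      # Hypothetical input ID "column": which variable to plot.
      selectInput("column", "Column to plot", choices = names(mtcars))
    ),
    mainPanel(
      plotOutput("histogram")   # filled by output$histogram in the server
    )
  )
)

server <- function(input, output, session) {
  output$histogram <- renderPlot({
    req(input$column)           # wait until the input has a value
    hist(mtcars[[input$column]], main = input$column, xlab = input$column)
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app in a browser
```

Splitting ui and server into ui.R and server.R, as described above, works the same way. The single-script form just keeps the example self-contained.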

You can simply display the table of values using "tableOutput()". However, the goal here is to use the side panel to explore your data, and visualizing the table is much more effective than just displaying it. You can prompt the user for basic yes/no information through "radioButtons()", ask them to choose specific columns with "selectInput()", or let them select multiple values using "checkboxGroupInput()".
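The sketch below wires two of those widgets to a table. A checkboxGroupInput() picks columns, and a yes/no radioButtons() filters rows; both feed a tableOutput() through one reactive subset. The input IDs and the "automatic only" filter are hypothetical choices for illustration. The code uses current function names (renderTable() rather than reactiveTable()).

```r
library(shiny)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      # Multiple selection: which columns to show.
      checkboxGroupInput("cols", "Columns", choices = names(mtcars),
                         selected = c("mpg", "cyl")),
      # Yes/no question: restrict to automatic transmissions (am == 0)?
      radioButtons("auto_only", "Automatic transmissions only?",
                   choices = c("No" = "no", "Yes" = "yes"))
    ),
    mainPanel(tableOutput("preview"))
  )
)

server <- function(input, output, session) {
  # Reactive subset; any output that needs it can reuse it.
  subset_data <- reactive({
    req(input$cols)
    rows <- if (identical(input$auto_only, "yes")) mtcars$am == 0 else TRUE
    mtcars[rows, input$cols, drop = FALSE]
  })
  output$preview <- renderTable(head(subset_data()))
}
```

A plot output could read subset_data() just as easily as the table does, which is the point of exploring through the side panel.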

If you're not a web programmer by trade, and especially if you're used to scripts that run from top to bottom, note that functions in ui.R are always linked to something in server.R by the "input" and "output" variables, literally. Shiny reserves "input" for passing information from ui.R to server.R, and "output" for passing results back to ui.R. server.R uses the full references "input$inVariable" and "output$outVariable", but ui.R only needs the name, the "outVariable" portion of "output$outVariable". So, for example, you may create a plot in server.R:

output$imagePlot <- reactivePlot(function() {
  # plotting code goes here
})

But ui.R only uses:

div(class="span6", plotOutput("imagePlot")),

I threw the div() function in there just to show the html relationship.
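Here is a small end-to-end sketch of that naming link, written with current Shiny names. ui refers to "n", "imagePlot", and "caption" only as strings, and the server reads input$n and fills output$imagePlot and output$caption. The div(class="span6") above is a Bootstrap 2 class from the Shiny of that era. The sketch assumes the current equivalent, column(6, ...) inside fluidRow(). The slider and caption are hypothetical additions for illustration.

```r
library(shiny)

# ui names things only by ID strings...
ui <- fluidPage(
  sliderInput("n", "Sample size", min = 10, max = 500, value = 100),
  fluidRow(column(6, plotOutput("imagePlot"))),  # half-width column
  textOutput("caption")
)

# ...and the server reads input$<ID> and fills output$<ID>.
server <- function(input, output, session) {
  output$imagePlot <- renderPlot({
    req(input$n)
    hist(rnorm(input$n), main = NULL)   # input$n comes from sliderInput("n")
  })
  output$caption <- renderText({
    paste("Histogram of", input$n, "random normal draws")
  })
}
```

If an ID string in ui has no matching output$<ID> in the server, or the reverse, nothing errors; the element simply stays empty. Mismatched names are worth checking first when a plot doesn't appear.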

Dynamic Pages

There are a couple of ways to make the pages more dynamic. We have already given the user control over what data is used in the output graphs, but we can also dynamically choose which options we present to the user. Based on initial choices (such as the type of data), we can change the range of control functions (such as slider bars). We can also change the type of graph that's produced, based on the input data and configuration.

To have the configuration panel change we use:

output$variable <- reactiveUI(function() {
  # code that builds the UI element goes here
})

Shiny keeps track of that output variable, and if it is used in ui.R, this reactiveUI function is called first to build it. I have seen race conditions, so don't have too many functions feeding back on themselves. To create a slider, say to limit the values of a joint PDF, place the "sliderInput()" function within reactiveUI:

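output$slideRange <- reactiveUI(function() {
  # code that computes the limits lo and hi from the data goes here
  # ("range", lo and hi are placeholder names)
  sliderInput("range", "Range:", min = lo, max = hi, value = c(lo, hi))
})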

Then render the output in ui.R with this simple call within sidebarPanel():

uiOutput("slideRange")

The same approach works for changing radio buttons based on the header of an uploaded file, or for adding completely new sections to sidebarPanel(). Creating dynamic chart types is easy: just include conditional statements in server.R based on the "input" variables. If we're working with a fully populated dataset, we might create nice histograms or a heat map; if it's sparsely populated, we might create a force graph or social graph. There must be a corresponding function in ui.R for each chart, however, so you are limited by how you set up the page to begin with. Your three charts may change their look, but there will always be three.
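Here is a sketch of both ideas using current names: renderUI()/uiOutput(), the successors to reactiveUI(). The slider's limits follow whichever data set the user picks. The server then chooses a chart type from the data that remains. The two built-in data sets, the IDs, and the 30-point threshold for switching from a histogram to a strip chart are hypothetical stand-ins for the "fully vs sparsely populated" rule described above.

```r
library(shiny)

# Two built-in numeric vectors stand in for user data.
datasets <- list(cars = cars$dist, faithful = faithful$eruptions)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("dataset", "Data set", choices = names(datasets)),
      uiOutput("slideRange")      # placeholder the server fills in
    ),
    mainPanel(plotOutput("chart"))  # one output slot; its look can change
  )
)

server <- function(input, output, session) {
  values <- reactive({
    req(input$dataset)
    datasets[[input$dataset]]
  })

  # Slider limits are rebuilt whenever the chosen data set changes.
  output$slideRange <- renderUI({
    lo <- floor(min(values()))
    hi <- ceiling(max(values()))
    sliderInput("range", "Range", min = lo, max = hi, value = c(lo, hi))
  })

  output$chart <- renderPlot({
    req(input$range)
    x <- values()
    x <- x[x >= input$range[1] & x <= input$range[2]]
    # Hypothetical rule: histogram for many points, strip chart for few.
    if (length(x) >= 30) hist(x, main = input$dataset) else stripchart(x)
  })
}
```

The single plotOutput("chart") slot illustrates the limitation noted above: the chart's type can change, but the number of output slots is fixed by the ui.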

If many graphs depend on a single set of calculations, it may be prudent to use "reactiveValues()" coupled with the "observe()" function, so you don't have to call the same function multiple times. Shiny keeps track of what has changed and what hasn't, so all you have to do is reference the variable, and Shiny makes sure it's up to date. When you're operating on a large data set, this is essential for a real-time interface. If we were displaying the psychological results of the Three Stooges, we might create a reactive variable like the following for our reactiveUI() functions to reference:

v <- reactiveValues(Names = c("Larry", "Moe", "Curly"), sane = NULL)
observe({
  # calculations on v$Names that update v$sane
})
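A related tool, swapped in here rather than taken from the approach above, is a reactive() expression. It caches its result and recomputes only when something it reads has changed, so several graphs can share one calculation. The sketch below shows that caching from the R console: isolate() lets you read a reactive outside a running app. The name-length calculation and the call counter are hypothetical illustrations.

```r
library(shiny)

# Counter to show how often the shared calculation actually runs.
calls <- new.env()
calls$n <- 0

stooges <- reactiveValues(Names = c("Larry", "Moe", "Curly"))

# Hypothetical "expensive" shared calculation, cached by reactive().
name_lengths <- reactive({
  calls$n <- calls$n + 1
  nchar(stooges$Names)
})

# Outside an app, isolate() lets us read reactives from the console.
first  <- isolate(name_lengths())
second <- isolate(name_lengths())   # served from the cache; no recompute
```

Inside an app, every render function that calls name_lengths() shares that one cached result until stooges$Names changes.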

As a note for posterity: though we haven't listed many functions, a good data viz is simple and leads the user by your design. Take the proverbial "eye chart" approach and you'll be lucky if the person dives in at all. Show too little, and the user won't 'play' with the viz and explore their own data. This can be likened to gamification, although I'm not advocating Farmville for data science. I remember reading as a kid that the original Super Mario Bros. for Nintendo was designed to be difficult, yet to let me win often enough that I couldn't put it down. I'm not sure who Bowser's equivalent is in your data, but if you're looking to make an impression on your users, remember that I still know where all the secret warp levels are.