Data Visualization: Shiny Clusters

Clustering is about recognizing associations between data points, which we can easily visualize using different graph layout structures (Fruchterman-Reingold, circle, etc.).  Exploring data is about understanding how different data associations change the overall structure of the data corpus.  With hundreds of data fields and no specific rules on how data may or may not be related, it is up to the user to declare an association and verify their instincts through the resulting data viz.  As data scientists, we are often expected to have The Answer, so when we present our work the audience may not be so willing to question it.  This is where the value of RStudio Shiny becomes clear.  Just as Salman Khan, of Khan Academy, recognized that his cousins would rather listen to his lectures on YouTube, where they are free to rewind and fast forward without being rude, our audience may want to experiment with the data associations and the overall structure on their own.  Shiny lets data scientists build an interactive clustering process, an alternative to boring PowerPoint presentations, that leaves the audience free to ask their own questions of the data.  People tend to remember better when they're part of the process, and our ultimate goal is to make an impression.

Data Science DC had an event a while back on clustering where Dr. Abhijit Dasgupta presented on unsupervised clustering.  The approaches outlined in the good doctor's presentation presume the data to be based on rules between data points in the set.  However, we can also introduce declaration or repudiation of associations, where the user declares each data field to be associative, to act as a filter on the associations between other fields, or to be left out entirely.

This is important because when looking for patterns in the data, if we compare everything to everything else we may get the proverbial 'hairball' cluster, where everything is mutually connected.  This is useless if we're trying to find structure for a decision algorithm, where separation and distinction are key.
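To make the idea of declared associations concrete, here is a minimal sketch in base R.  Everything in it is an assumption made for illustration: the `records` data, the field names, the helper `build_edges`, and the rule itself (two records are linked when they match on every field declared associative, a filter decides which records take part, and excluded fields are simply never passed in).  It is one possible reading of the approach, not a definitive implementation.

```r
# Hypothetical records; in practice these would come from your own data set.
records <- data.frame(
  id      = 1:6,
  region  = c("east", "east", "west", "west", "east", "west"),
  product = c("a", "b", "a", "a", "b", "c"),
  year    = c(2012, 2012, 2012, 2013, 2013, 2013),
  stringsAsFactors = FALSE
)

# Hypothetical helper: link two records when they share a value in every
# associative field, after dropping the records rejected by the filter.
build_edges <- function(data, associate, keep = rep(TRUE, nrow(data))) {
  data <- data[keep, , drop = FALSE]
  if (nrow(data) < 2) {
    return(data.frame(from = integer(0), to = integer(0)))
  }
  pairs <- t(combn(nrow(data), 2))   # every unordered pair of row positions
  same <- rep(TRUE, nrow(pairs))
  for (field in associate) {
    same <- same & data[[field]][pairs[, 1]] == data[[field]][pairs[, 2]]
  }
  data.frame(from = data$id[pairs[same, 1]], to = data$id[pairs[same, 2]])
}

# No declared association: everything links to everything (the hairball).
nrow(build_edges(records, associate = character(0)))                  # 15

# Declare region as associative: two separate groups emerge.
nrow(build_edges(records, associate = "region"))                      # 6

# Same association, with year acting as a filter on which records take part.
nrow(build_edges(records, "region", keep = records$year == 2013))     # 1
```

The resulting edge list is the kind of input a graph layout expects, so each change in the declared roles shows up directly as a change in the picture.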

RStudio Shiny gives you the power to easily build interactive cluster exploration visualizations, as web apps, in R.  Shiny uses reactive functions to pass inputs and outputs between the ui.R and server.R files in the application directory.  Programming a new app takes a little getting used to, as writing a sequential R script is different from web programming in R; for instance, assigning a value to the output structure in server.R doesn't necessarily mean it's available to pass to the reactive function a few lines down.  To keep things simple you have to use the right type of 'reactive function' on the server.R side or div function on the ui.R side, but the structure is simple and the rest of coding in R remains exactly the same.  Shiny Server gives you the power to host your web app in the cloud, but be warned that large applications on Amazon EC2 micro instances may run Very Slowly.  That is Amazon's business model, and understandable: they want you to upgrade now that you know the potential they offer.
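As a rough sketch of that structure, the example below puts the ui.R and server.R halves into a single script using `shinyApp()`.  The `edges` data, the input ids (`layout`, `min_weight`), and the output id (`graph`) are made up for illustration, and the igraph package is an assumption, since any graph plotting package would do.  The layout function names are those of recent igraph releases; older releases spell them differently.

```r
library(shiny)
library(igraph)

# Hypothetical edge list; a real app would derive this from its own data.
edges <- data.frame(
  from   = c("a", "a", "b", "c", "d", "e"),
  to     = c("b", "c", "c", "d", "e", "f"),
  weight = c(3, 1, 2, 1, 3, 2)
)

# The ui.R half: a control panel and one plot.
ui <- fluidPage(
  titlePanel("Cluster explorer"),
  sidebarLayout(
    sidebarPanel(
      selectInput("layout", "Layout",
                  choices = c("Fruchterman-Reingold" = "fr",
                              "Circle" = "circle")),
      sliderInput("min_weight", "Minimum edge weight",
                  min = 1, max = 3, value = 1, step = 1)
    ),
    mainPanel(plotOutput("graph"))
  )
)

# The server.R half.
server <- function(input, output, session) {
  # A reactive expression holds the shared intermediate result. It is
  # re-evaluated when input$min_weight changes, and other reactive code
  # reads it by calling filtered_graph().
  filtered_graph <- reactive({
    kept <- edges[edges$weight >= input$min_weight, ]
    graph_from_data_frame(kept, directed = FALSE)
  })

  # output$graph is only a destination for the rendered plot; other server
  # code should read filtered_graph(), not output$graph.
  output$graph <- renderPlot({
    g <- filtered_graph()
    coords <- if (input$layout == "fr") layout_with_fr(g) else layout_in_circle(g)
    plot(g, layout = coords)
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive R session, start it with: runApp(app)
```

Keeping the filtered graph in its own reactive expression is the design choice that sidesteps the pitfall above: values that several outputs need live in `reactive()`, while `output$...` is only ever written to.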

Using the tools Shiny provides, you will end up with a control panel and a series of graphs; how many and where is up to you.  The difference between Shiny and, say, Tableau is the ability to process data on the back end, which is where you interpret the user's selections and dynamically update the visualization presented on the webpage.  There is some flexibility in the UI to better guide the user experience, but if you want to have truly interactive graphs you'll have to incorporate JavaScript... another post for another Friday.