Visualization

Moderating The World IA Data Viz Panel

This weekend was my introduction to moderating an expert panel since switching careers to become a data science consultant. The panel was organized by Lisa Seaman of Sapient and consisted of Andrew Turner of Esri, Amy Cesal of the Sunlight Foundation, and Maureen Linke and Brian Price of USA Today. We had roughly an hour to talk, present information, and engage the audience. You can watch the full panel discussion thanks to the excellent work of Lisa Seaman and the World IA Day organizers, but there's a bit of backstory that I think is interesting.

In the spring of 2013 Amy Cesal helped create the DVDC logo (seen on the right), so it was nice to have someone I'd already worked with. Similarly, Lisa had attended a few DVDC events and asked me to moderate because she'd enjoyed them so much. By itself it's not exactly surprising that Lisa attended some DVDC events and went with someone she'd met, but common sense isn't always so common. If you google "Data Viz" or "Data Visualization" and focus on local DC companies, experts, speakers, etc., you'll find some VERY accomplished people, but there's more to why people reach out. You have to know how people work together, and you can only know that by meeting them and discussing common interests, which is a tenet of all the DC2 programs.

Now that the sappy stuff is out of the way, I wanted to share some thoughts on running the panel. I don't know about you, but I fall asleep whenever the moderator simply asks a question and each panelist answers in turn. The first response can be interesting, but each subsequent response builds little on the one before; there's no conversation. This can go on for one, maybe two go-rounds, but any more than that and the moderator is just being lazy, doesn't know the panelists, doesn't know the material, or all of the above. A good conversation builds on each response, and if it drifts away from the original question the moderator can jump in, but resetting too often by effectively re-asking the question is robotic and defeats the purpose of having everyone together in one place.

Heading this potential disaster off at the pass, Lisa scheduled a happy hour, hopefully to give us a little liquid courage and create a natural discourse. I did my homework, read about everyone on the panel, and started imagining how everyone's expertise and experience overlapped: accuracy vs. communicating information; managing investigative teams vs. design iteration; building industry tools vs. focused and elegant interfaces; D3.js vs. Raphael. The result: a conversation, which is what we want from a panel, isn't it?

SynGlyphX: Hello and Thank You DC2!

The following is a sponsored post brought to you by a supporter of two of Data Community DC's five meetups.

Hello and Thank You DC2!

This week was my, and my company's, introduction to Data Community DC (DC2). We could not have asked for a more welcoming reception. We attended and sponsored both Tuesday's DVDC event on Data Journalism and Thursday's DSDC event on Geospatial Data Analysis. They were both pretty exciting, and timely, events for us.

As I mentioned, I'm new to DC2 and new to the "data as a science" community. Don't get me wrong: while I'm new to DC2, I've been awash in data my entire career. I started as a young consultant reconciling discrepancies in the databases of a very early client-server implementation. Basically, I had to make sure that all the big department store orders on the server were in sync with the home-delivery client application. It was a lot of manual reconciling that ultimately led me to write code to semi-automatically reconcile the two databases. Eventually (I think) they solved the technical issues that led to the client-server databases being out of sync.

More recently, I was working for a company with a growing professional services organization. The company typically hired new employees after a contract was signed, but the new professional services work involved short project durations. If we waited to hire, the project would be over before someone started. We developed a probability-adjusted portfolio analysis approach to compare the supply of available resources (which is always changing as people finish projects, get extended, or leave the organization) against demand (which is always changing as well), and that enabled us to determine a range of positions and skillsets to hire for in a defined timeframe.

In both instances, it was data science that drove effective decision making.  Sure, you can apply some “gut” to any decision, but having some data science behind you makes the case much stronger.

So, I was fascinated to listen to the journalists discuss how they are applying data analysis to help: 1) support existing story lines; and 2) develop new story lines. Nathan's presentation on analyzing AIS data was interesting (and a bit timely, as we had just gotten a verbal win with a client for similar, though not exactly the same, work).

I know the power of data to solve complex business, operational, and other problems. With our new company, SynGlyphX, we are focused on helping people both visualize and interact with their data. We live in a world with sight and three dimensions. We believe that by visualizing the data (unstructured, filtered, analyzed, any kind of data), we can help people leverage the power of the brain to identify patterns, spot trends, and detect anomalies. We joined DC2 to get to know folks in the community, generate some awareness for our company, and get your feedback on what we are doing. Thank you all for welcoming us and our company, SynGlyphX, to the community. We appreciated everyone's interest in the demonstrations of our interactive visualization technology. Our website traffic was up significantly last week, so I am hoping this is a sign that you were interested in learning more about us. Additionally, I have heard from a number of you since the events, and welcome hearing from more.

Here's my call to action. I encourage you to tweet us your answer to the following question: "Why do you find it helpful to visually interact with your data?"

See you at upcoming events.

Mark Sloan

About the Author:

As CEO of SynGlyphX, Mark brings over two decades of experience.  Mark began his career at Accenture, co-founded the global consulting firm RTM Consulting, and served as Vice President and General Manager of Convergys’ Consulting and Professional Services Group.

Mark has an M.B.A. from The Wharton School of the University of Pennsylvania and a B.S. in Civil Engineering from the University of Notre Dame. He is a frequent speaker at industry events and has served as an Advisory Board Member for the Technology Professional Services Association (now the Technology Services Industry Association, TSIA).

General Assembly & DC2 Scholarship

The DC2 mission statement emphasizes that "Data Community DC is an organization committed to connecting and promoting the work of data professionals..." Ultimately, we see DC2 becoming a hub for data scientists interested in exploring new material, advancing their skills, collaborating, starting a business with data, mentoring others, teaching classes, changing careers, etc. Education is clearly a large part of any of these interests, and while DC2 has held a few workshops and is sponsored by organizations like Statistics.com, we knew we could do more, so we partnered with General Assembly and created a GA & DC2 scholarship specifically for members of Data Community DC.

For our first scholarship we landed on Front End Web Development and User Experience, which we naturally announced first at Data Viz DC.  How does this relate to data science?  As I was happy to rebut Mr. Gelman in our DC2 blog post reply, sometimes I would love to have a little sandbox where I get to play with algorithms all day, but then again that is exactly what I ran away from in 2013 by becoming an independent data science consultant; I don't want a business plan I'm not a part of dictating what I can play with.  Enter Web Dev and UX.  As Harlan Harris, organizer of DSDC, notes in his Venn diagram of what makes a data scientist, and as Tony Ojeda later emphasizes, programming is a natural and necessary part of being a data scientist.  In other words, there's this thing called the interwebs that has more data than you can shake a stick at, and if you can't operate in that environment then, as a data scientist, you're asking someone else to do that heavy lifting for you.

Over the next month we'll be choosing the winners of the GA DC2 Scholarship, and if you'd like to see any other scholarships in the future please leave your thoughts in the comments below or tweet us.

Happy Thanksgiving!

Eclipse Foundation LocationTech DC Tour

[Image: LocationTech DC tour map]

Interested in open source software for geospatial systems? Join us on November 14th at GWU for an evening of tech talks about location-aware open source technologies.

This month, the Eclipse Foundation's LocationTech working group is hosting a series of events in six cities, concluding with Washington, DC on November 14th. We'll gather at The George Washington University for a round of invigorating talks in the early evening, followed by drinks and networking at a local watering hole.

Speakers include:

  • Juan Marin, CTO of Boundless (formerly OpenGeo)
  • Eric Gundersen, CEO of MapBox
  • Joshua Campbell, GIS Architect at Humanitarian Information Unit, U.S. Department of State

When: Thursday, November 14, 2013
Where: Elliott School of International Affairs, GWU
Time: 6pm to 9pm (followed by drinks at CIRCA in Foggy Bottom)

The event is free but space is limited. Register today at http://tour.locationtech.org/

About LocationTech

LocationTech is the Eclipse Foundation's industry working group focusing on location-aware technologies. Members of LocationTech are also full-fledged members of the Eclipse Foundation. Eclipse is a vendor-neutral community for individuals and organizations who wish to collaborate on commercially-friendly open source software. The Eclipse Foundation is a not-for-profit, member-supported corporation that hosts technology projects and helps cultivate both an open source community and an ecosystem of complementary products and services.

US Government Contracting: Year 2013 Infographic by GovTribe

Below is a guest post and infographic from GovTribe, a DC startup that creates products that turn open government data into useful and understandable information. The hōrd iPhone app by GovTribe lets you understand the world of government contracting in real time.


Our latest release, hōrd 2.0, has way more data than our initial release. We spent the last six months building a completely new approach for consuming, processing, and making sense of government data from multiple sources. The iPhone app now provides insight and capability not available anywhere else. Our efforts have also given us pretty robust visibility into how the government behaves and where it allocates its resources. So we thought we'd share.

This post is the first in a series that GovTribe plans to publish. Our purpose is to find some signal in all that noisy data, and to provide some clear, interesting, and maybe even useful information about the world of federal government contracting.

We thought a good place to start, with just over a month left in fiscal year 2013, was a look back at what's been happening since October 1, 2012. Soon to come: Agency Insight. In this series we'll take a deeper look at individual agency activity. Stay tuned - and feedback is always appreciated.

GovTribe FY13 in Review

The Open Source Report Card - A Fun Data Project Visualizing your GitHub Data

Do you contribute to Open Source projects?

Do you have a GitHub account?

If so, keep reading.  Dan Foreman-Mackey built the Open Source Report Card, a fantastic web application that simply asks you for your GitHub username and then gives you back some of your own data from the GitHub timeline in a fun and entertaining fashion.  It reminds me a lot of DC2's own Data Science Survey, but Dan's ability to build a web app far exceeds my own. Further, Dan does a nice job of describing what he has done and how he did it. Enjoy!

[Screenshot: the Open Source Report Card]

Here is a bit from his site:

Every day, many thousands of open source contributions are made on GitHub by developers around the world. This data is publicly available through the API and—even more conveniently—on the GitHub Archive. This is generally a pretty fun dataset to play with but it is particularly exciting for hackers because we get to play with data that describes our own behavior! Last year, shortly after the full event stream was publicly released, the first annual GitHub data challenge produced some sick data visualizations and it's clear that people at GitHub have been thinking about how to Use The Data For Good™.

The one graph that is especially awesome in all sorts of surprising ways is the contributions heat map on every user's profile page. What sets this apart from the other visualizations that already exist on the site? It makes a general statement about one specific user. It lets a developer have a global view of their contributions, skills and habits. This ends up being extremely motivating because it lets the developer see their progress in real time. With this in mind, it seemed like a good idea to provide a more complete set of global statistics summarizing the hacker personality of any GitHub user.
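If you want to poke at the same raw material yourself, the public events endpoint of the GitHub API is an easy starting point. Below is a minimal sketch in R (the username is a placeholder and the jsonlite package is assumed; this is not how the Report Card itself is built) that pulls a user's recent public events and tabulates them by type:

    library(jsonlite)

    # Placeholder username: substitute your own GitHub handle
    user <- "octocat"
    url  <- paste0("https://api.github.com/users/", user, "/events/public")

    # The public events endpoint returns the most recent events as JSON;
    # unauthenticated requests are rate limited, so keep the calls light
    events <- fromJSON(url)

    # Count events by type (PushEvent, IssuesEvent, PullRequestEvent, ...)
    table(events$type)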

 

 

Data Visualization: Sweave

So you're a data scientist (statistician, physicist, data miner, machine learning expert, AI guy, etc.) and you have the enviable challenge of communicating your ideas and your work to people who have not followed you down your rabbit hole.  Typically this involves first getting the data, writing your code, honing the analysis, distilling the pertinent information and graphs/charts, and then organizing it into a presentable format (document, presentation, etc.).  Interactive visualizations are really cool, and if done right they allow the user to explore the data and the implications of your analysis on their own time.  Unfortunately, interactive visualizations require extra effort: once you're done with your analysis you have to repurpose your functions so they work within a framework such as Shiny.  For those of us who simply want a nice presentable document to compile once we've finished our work, I introduce you to Sweave.

Sweave is not built specifically for RStudio; it is built for R to create LaTeX documents, but naturally RStudio has built it right in and created a great interface.  This is both a positive and a negative: it's so easy that you don't need to know precisely how the whole mechanism produces a PDF file, which becomes an issue when your document doesn't compile and you need to debug it.  Sweave is its own language of sorts, with blocks for evaluating an R session, blocks for plain English, and a markup tag style of its own that gives the document formatting instructions (title, body, size, figure dimensions, etc.).  In principle it's easy to understand, but as with any new language it has its own syntax, its own unwritten rules, plenty of Google searches despite well-written tutorials, videos, and books, compatibility issues with different versions of R, and of course moments of throwing your hands in the air in confusion.
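To make that structure concrete, here is a minimal sketch of a Sweave (.Rnw) file; the title and chunk contents are placeholders, but the mix of LaTeX markup, <<>>= chunk headers, and @ chunk terminators is the general shape that R (or RStudio's Compile PDF button) turns into a PDF:

    \documentclass{article}
    \title{A Minimal Sweave Report}

    \begin{document}
    \maketitle

    Plain English goes in the \LaTeX{} portion of the file.

    <<echo=TRUE>>=
    # An R chunk, delimited by <<>>= and @, echoed into the document
    x <- rnorm(100)
    summary(x)
    @

    <<fig=TRUE, echo=FALSE>>=
    # A figure chunk: Sweave includes the resulting plot in the PDF
    hist(x, main = "A histogram from inside the report")
    @

    \end{document}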

Once you get the hang of it you can fit your normal data analysis into the framework Sweave provides, and you end up telling a story with your work as you work.  Having good comments has always been a staple of writing code, whatever the language, but there is always pushback because comments are mixed in with the code, requiring the reader to understand the flow of the code and how the functions and scripts work together.  Mathematica is a great example of code and presentation working together, but unfortunately it is not free.

Sweave will, however, change your style and will make you break up your analysis into digestible chunks for the target reader.  For example, when I am analyzing details of some dataset and/or debugging my functions, I will produce many more graphs than are necessary for the end reader.  Perhaps the makers of Sweave worked similarly, and purposefully made an R code block in Sweave print only the first plot from that block, forcing you to choose your plots carefully.  You can get around this by exporting ggplot2 figure objects from your code as a list variable and plotting them using the "grid.arrange()" function from the "gridExtra" package, but this is not something you might normally do.  This is how Sweave draws you into its style (don't forget to resize your figure: "<<fig=TRUE, echo=FALSE, height=10>>=" and "\setkeys{Gin}{width=0.9\textwidth}", the kittens will be fine), but the bottom line is that if you can make Sweave part of your routine, you can produce beautiful reports from your R comments and code; maybe it will even help me better remember, years from now, what that set of functions buried in my computer does, but I can only speculate.
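As a sketch of that workaround (assuming the ggplot2 and gridExtra packages are installed; the plots themselves are just placeholders), a single figure chunk can collect ggplot objects in a list and draw them together with grid.arrange():

    <<fig=TRUE, echo=FALSE, height=10>>=
    library(ggplot2)
    library(gridExtra)

    # Build several ggplot objects and store them in a list
    plots <- list(
      ggplot(mtcars, aes(wt, mpg)) + geom_point(),
      ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
    )

    # do.call() hands the whole list to grid.arrange(), which draws the
    # plots as a single figure, so Sweave's one-plot-per-figure-chunk
    # behavior still captures everything
    do.call(grid.arrange, c(plots, list(ncol = 1)))
    @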

Visualizing Web Scale Geographic Data in the Browser in Real Time: A Meta Tutorial

Visualizing geographic data is a task many of us face in our jobs as data scientists. Often we must visualize vast amounts of data (tens of thousands to millions of data points), we need to do so in the browser in real time to ensure the widest possible audience for our efforts, and we often want to do it using free and/or open software. Luckily for us, Google offered a series of fascinating talks at this year's (2013) I/O that show one particular way of solving this problem. Even better, Google discusses all aspects of the problem: from cleaning the data at scale using legacy C++ code, to providing low-latency yet web-scale data storage, and finally to rendering efficiently in the browser.  Not surprisingly, Google's approach leverages a lot of Google's own technology stack, but we won't hold that against them.

[Screenshot: All the Ships in the World visualization]

 

All the Ships in the World: Visualizing Data with Google Cloud and Maps (36 minutes)

The first talk walks through an overview of where the data comes from and the collection of Google cloud services that compose the system architecture responsible for cleaning, storing, and serving the data fast enough to do real time queries. This video is very useful for understanding how the different technology layers (browser, database, virtual instances, etc) can efficiently interact.

Description: Tens of thousands of ships report their position at least once every 5 minutes, 24 hours a day. Visualizing that quantity of data and serving it out to large numbers of people takes lots of power both in the browser and on the server. This session will explore the use of Maps, App Engine, Go, Compute Engine, BigQuery, Big Store, and WebGL to do massive data visualization.

https://www.youtube.com/watch?feature=player_embedded&v=MT7cd4M9vzs
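As a rough sense of the scale involved (my own back-of-envelope numbers, not figures from the talk): if, say, 50,000 ships each report a position every 5 minutes, that is already well over ten million position reports a day before any history is accumulated.

    # Back-of-envelope arithmetic with illustrative numbers only
    ships           <- 50000          # "tens of thousands" of ships
    reports_per_day <- 24 * 60 / 5    # one report every 5 minutes = 288/day
    ships * reports_per_day           # 14,400,000 position reports per day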

Google Maps + HTML5 + Spatial Data Visualization: A Love Story (60 minutes)

The second talk discusses in code-level detail how to render vast geographic data (up to a few million data points) using JavaScript in the browser.  One of the keys to enabling such large-scale data visualization is to pass much of the complex and large-scale rendering work to the computer's graphics processing unit (GPU) through the use of relatively simple vertex and fragment shaders.  Brendan Kenny, the speaker, explains how he uses CanvasLayer, available from his GitHub (https://github.com/brendankenny), to sync a WebGL canvas containing the data to Google Maps version 3. Basically, he renders one layer for the map and one layer for the data, and these two layers must move and scale in a synchronized fashion.  He even dives into excellent examples showing the workings of individual shaders running on the GPU.

Description: Much if not most of the world’s data has a geographic component. Data visualizations with a geographic component are some of the most popular on the web. This session will explore the principles of data visualization and how you can use HTML5 - particularly WebGL - to supplement Google Maps visualizations.

https://www.youtube.com/watch?feature=player_embedded&v=aZJnI6hxr-c

Background

As a bit of background, Brendan leverages a number of technologies that you might not be familiar with, including three.js and WebGL. Three.js is a nice wrapper for WebGL (among other things) and can greatly simplify the process of getting up and running with 3D in the browser.  From the excellent tutorial here:

I have used Three.js for some of my experiments, and it does a really great job of abstracting away the headaches of getting going with 3D in the browser. With it you can create cameras, objects, lights, materials and more, and you have a choice of renderer, which means you can decide if you want your scene to be drawn using HTML 5's canvas, WebGL or SVG. And since it's open source you could even get involved with the project. But right now I'll focus on what I've learned by playing with it as an engine, and talk you through some of the basics.

WebGL is one mechanism for rendering three-dimensional data in the browser and is based on OpenGL ES 2.0. Wikipedia describes it as:

WebGL (Web Graphics Library) is a JavaScript API for rendering interactive 3D graphics and 2D graphics within any compatible web browser without the use of plug-ins. WebGL is integrated completely into all the web standards of the browser allowing GPU accelerated usage of physics and image processing and effects as part of the web page canvas. WebGL elements can be mixed with other HTML elements and composited with other parts of the page or page background. WebGL programs consist of control code written in JavaScript and shader code that is executed on a computer's Graphics Processing Unit (GPU). WebGL is designed and maintained by the non-profit Khronos Group.

Data Visualization: The Double Edged Data Sword

Can we use data visualization, and perhaps data avatars, to build a better community?  If you've ever been part of making the rules for an organization, you may be familiar with the desire to write a rule for every scenario that may arise, to codify how the organization expects things to happen in a specific context.  For a small group this may work, as you can resolve most issues with a simple conversation and only broad rules need be codified (roles, responsibilities, etc.).  However, we're also familiar with the draconian rules that arise as a result of some crazy thing that one person did.  One could argue, "What choice do we have?", because once the size of a group grows and communication becomes a combinatorial challenge (you can't talk to everyone about everything all the time), we need the rule of law.  Laws provide everyone a common reference they can relate to their individual context and use to govern their daily conduct, but so can data.  The difference between our modern, sensor-laden, interconnected world and the opaque world of previous generations is data and information.  We have a much greater potential to be aware of the context beyond our immediate senses, and thereby better understand the consequences of our actions, but we can't reach that potential unless we can visualize that data.

There will always be human issues; people interpret data and information differently, which is why we must "trust but verify" and, when necessary, use data to revisit people's reasoning.  Data visualization is what allows us to be aware of "the context beyond our immediate senses", and this premise holds whether you're in the moment or looking for a deeper understanding.

When we're in the moment we presumably want to make better decisions and need "decision support".  To make decisions in the moment, the information must be "readily available" or we're forced to make decisions without it.  Consider how new data might change basic decisions throughout your day; in fact, businesses are taking advantage of this, and coffee shops now provide public transit information on tablets behind the counter so people know they have time for that extra latte.

Conversely, if we want to understand how events unfolded we rely on our observations, possibly the observations of those we trust, and fill the space between with our experience and assumptions about the world.  Different people make different observations, and sometimes we can piece together a more precise picture of our shared experience, but the more precise the observations the more unique the situation, and because we need laws that provide "a common reference" we write laws for the most general cases.  The consequence: an officer can write us a ticket for jaywalking even though there are no cars for miles, or we are afraid to help those around us because it may implicate us.  Bottom line: the more observations we have, the less we are forced to assume.

"Decision Support" and "Trust but Verify" are the two sides of the double edged data sword, and it's this give and take that forces gradual adoption by people, organizations, governments, etc.  Almost universally people want transparency into why events unfolded, but do not necessarily want information about them made widely available.  The most notorious of these examples involves Julian Assange of WikiLeaks and Edward Snowden of the recent NSA leaks, in these cases the US Government wants information without having their information disclosed.  Conversely many of us believe governments would run better with more transparency, but there is a proper balance.

On a less controversial and more personal level, I use financial tracking software to help me plan budgets and generally live within my means; I use GPS to help me understand my workout routines; I use event tools to plan my Data Visualization DC events; I generally allow third-party applications access to my data in exchange for new and better services.  Each of these services comes with some sort of data visualization and analytics, and these visualizations and analytics are essential to my personal decision support.  I enjoy the services, but it is interesting to also suddenly see advertisements about the thing I shared, tweeted, emailed, etc., earlier that day or week.  On the one hand I'm glad advertisements have become more meaningful, but what are the consequences of the double-edged data sword?

I would like to be able to revisit my personal information for innocuous reasons, to remember where I've been, what actions I took, who I talked to, who I shared what with, etc.  There are more compelling reasons too, trust but verify reasons, as the data could prove I was at work, was conducting work, met with that client, didn't waste the money, couldn't be liable for an accident, etc.; I'd like to generally have the power to confirm statements and claims I make using my personal data.

Unfortunately I don't own what's recorded about me, my personal data; typically the third party owns that data.  Theoretically I can get that information piecemeal, going to each service provider and manually recording it (a former Justice Department lawyer even suggested the wide use of FOIA), but we data scientists and visualizers know that if you can't automate the data collection and visualization then it's really not practical.  In other words, without data visualization we can't hear the proverbial tree falling in the woods.  So until there is a FOIA app on my smartphone, basically an API for my personal data guaranteed by the government for the people, we can't visualize "the context beyond our immediate senses" for ourselves and others, and the other edge of the data sword will always be difficult to defend against.

The Value of the Data Scientist - A Data Science DC Event Review

Earlier this month the Data Science DC Meetup hosted researchers and staff from the Sunlight Foundation for a presentation on bringing clarity to flat text data using advanced analysis and data visualization. There is a wealth of raw, unstructured data available to the public and obtainable via the methods outlined under the Freedom of Information Act (FOIA). This data includes disclosure forms pertaining to lobbying activities before the Federal Government by private enterprises in the United States, as well as comprehensive summaries of pending, passed, and rejected legislation before the United States Congress.

Lee Drutman, a Senior Fellow at Sunlight, and his colleagues took on the project of pulling a subset of this data out of its flat, unstructured form, and transforming it into meaningful content, illustrating the relationships between lobbying activity by industry sector and various immigration reform bills. In this way, the team at Sunlight has provided “a comprehensive and interactive guide to the web of interests with something at stake”, detailing “who these interests are, what they care about, and how intensely they are likely to lobby to get what they want” (1). They’ve named their project Untangling the Webs of Immigration Reform.

Lee opened the informative session with detailed information on the underlying unstructured data they had to work with: how it’s generated, how it’s collected, statistics on the total volume of data, and other descriptive measures.  Given the volume and complexity of the data, Lee’s team decided a network analysis would best bring the data to life.

Before we can detail the relationships between lobbying activities by sector and immigration reform bills, we need to first understand the relationships among the various immigration bills themselves. Zander Furnas, Research Fellow at Sunlight, described how the team used Latent Semantic Analysis to identify similarities between bills along the lines of immigration reform subtopics. He and his team represented each bill as a vector, using the bill summaries (the actual bills are too long for practical use) to construct the corpus.  Once each bill was represented as a vector, the team compared the vectors for similarity and used the resulting similarity matrix to drive a specific type of clustering called Hierarchical Agglomerative Clustering (HAC), with Ward's method as the linkage criterion to produce more uniform clusters.
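As a rough illustration of that pipeline (not the Sunlight team's actual code; the toy bill summaries, term weighting, and number of latent dimensions below are all placeholders), in R you can build a term-document matrix from the summaries, reduce it with a truncated SVD for latent semantic analysis, compute cosine similarities between bills, and hand the resulting distances to hierarchical clustering with Ward's linkage:

    # Toy bill summaries standing in for the real corpus
    summaries <- c(
      bill1 = "expand visas for seasonal agricultural workers",
      bill2 = "agricultural guest worker visa program reform",
      bill3 = "increase border security funding and enforcement",
      bill4 = "border enforcement and security technology funding"
    )

    # Simple term-document matrix of word counts
    words <- strsplit(tolower(summaries), "\\s+")
    vocab <- sort(unique(unlist(words)))
    tdm   <- sapply(words, function(w) table(factor(w, levels = vocab)))

    # Latent semantic analysis: keep only k latent dimensions of the SVD
    k        <- 2
    sv       <- svd(tdm)
    docs_lsa <- diag(sv$d[1:k]) %*% t(sv$v[, 1:k])   # bills in latent space

    # Cosine similarity between bills, converted to a distance matrix
    norms   <- sqrt(colSums(docs_lsa^2))
    cos_sim <- crossprod(docs_lsa) / (norms %o% norms)
    d       <- as.dist(1 - cos_sim)

    # Hierarchical agglomerative clustering with Ward's linkage
    hc <- hclust(d, method = "ward.D2")
    plot(hc, labels = names(summaries))   # dendrogram of related bills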

The output produced by the HAC analysis was a dendrogram, which was then combined with the data on lobbying activities using a network diagram. Zander described how the Sunlight team configured the network diagram as a bipartite graph (two types of nodes), with the first node type representing industry sectors and the second representing individual bills. Edges run from the first type of node to the second, i.e. from an industry sector to a bill it lobbied on, and the edge weight is determined by the quantity of lobbying activity, measured in the number of lobbying reports. This basic network diagram, while technically accurate, produces a confusing and unappealing visualization, as shown below (2).

Network Graph, first run, unfiltered. Source: Sunlight Foundation.

Zander detailed the filtering methods that the team used to streamline the data represented in the network diagram, including the exclusion of sectors that were below a minimum threshold of lobbying activity. This exclusion resulted in a k-core subgraph of the original, with a degeneracy of three. The updated version of the graph produced by k-core filtration was cleaner but still very visually cluttered. The group then turned to their toolbox of layout algorithms in order to rearrange the data in such a way as to produce a more meaningful yet simple visualization. Using the OpenOrd layout provided by Gephi (the Sunlight team's open source visualization tool of choice), Zander was able to produce a visualization with tighter clusters. He refined the network graph further using weighted nodes (sectors that had more lobbying activity are larger compared to those that had less; bills that were lobbied on more are larger compared to those lobbied on less).
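A minimal sketch of that graph construction and filtering step in R using the igraph package (the sector/bill edge list and report counts below are invented, and Sunlight's actual workflow used Gephi rather than igraph):

    library(igraph)

    # Toy edge list: one row per (sector, bill) pair, weighted by the
    # number of lobbying reports
    edges <- data.frame(
      sector = c("Agribusiness", "Agribusiness", "Tech", "Tech", "Construction"),
      bill   = c("S.744", "H.R.2131", "S.744", "H.R.2131", "S.744"),
      weight = c(120, 30, 85, 60, 10)
    )

    # Bipartite graph: sectors are one node type, bills the other
    g <- graph_from_data_frame(edges, directed = FALSE)
    V(g)$type <- V(g)$name %in% edges$bill   # FALSE = sector, TRUE = bill

    # k-core style filtering: drop low-activity nodes (Sunlight used a
    # degeneracy of three; this toy graph only supports a 2-core)
    g_filtered <- induced_subgraph(g, vids = which(coreness(g) >= 2))

    # Size nodes by total lobbying weight and use a force-directed layout
    V(g_filtered)$size <- strength(g_filtered) / 10
    plot(g_filtered, layout = layout_with_fr(g_filtered))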

Amy Cesal, graphic designer at Sunlight Foundation, became involved in the project at that stage at Zander’s invitation. She was tasked with applying her visual design skills to beautify the final output of Zander’s network graph. She walked the audience through her decision making process with regard to color palette, flat graphs, cluster isolation and breakdown, and overall design strategy.

The final graph incorporating Amy’s work is shown below: (1)

Network Graph, final draft, filtered. Source: Sunlight Foundation.

Following the presentation, the Sunlight team participated in an open and excellent Q&A session with the audience, addressing specific questions on their project and their company.

It's important to highlight the kind of visual analytic work that Lee, Zander, and Amy are conducting at the Sunlight Foundation because it illustrates that it's not enough to have access to pertinent data (every organization today has jumped on the big data bandwagon and begun amassing large volumes of unstructured data); you also need skilled data scientists who can pull the unstructured data off the flat page and transform it into a meaningful visualization for concerned audiences. Data scientists who understand the advanced mathematics and work involved in analysis and also understand how to relate to and present information to a non-technical audience are extremely valuable in the data science community.

Audio is available for download in mp3 format here.

References:

(1) http://sunlightfoundation.com/blog/2013/03/25/immigration/

(2) https://docs.google.com/presentation/d/1W7fwNLDgwlnkNQe2vvrat_LSzixKeloTtrSUt2HK9EQ/edit?usp=sharing