Data Visualization: The Data Industry

In any industry you provide either a service or a product, and data science is no exception.  Many of the people who make up the data science workforce have been rebranded from statistician, physicist, algorithm developer, computer scientist, biologist, or any other profession whose product is meaning systematically derived from information.  Data scientists differ from these earlier professions in that they operate across verticals rather than diving ever deeper down a single rabbit hole.

What defines a business in the Data Science Industry?

There are a lot of companies doing very cool work that revolves around information, but data science has a specific meaning.  Science is the intellectual and practical activity encompassing the systematic study of structure and behavior, and Data Science focuses on the structure and behavior of whatever dataset you choose.  The line between a business in the data science industry and everyone else is whether it is searching for axioms in your data: irreducible truths about the information at hand.  Other businesses may use known truths in a product or service, but wielding the results of a science does not make you a master or philosopher of that science.  A welder is not a chemist, a chip manufacturer is not a physicist, a farmer is not a geneticist, a civil engineer is not a materials scientist, a gamifier is not a psychologist, and a politician is not a sociologist.

In organizing data science events I've seen a trend in the demographic of data science, aka the data demographic: people are either new to it, contributing to it, standardizing it, consulting for it, or trading it.  We can use this characterization to review products and services, and ultimately to understand who would use what's being sold, data scientists or otherwise, and whether the business has a chance.

Data Demographics

Let's focus on one product or service for each demographic.

New Recruits

People who are new to data science, or who like to stay in touch with new developments, want summary information.  They are sniffing around the edges and aren't ready to go too far down any one rabbit hole.  Blogs are a bridge: they are a megaphone for those who've been able to materialize their thoughts, and a communication medium for those who want to understand those thoughts.  Flowing Data focuses on data visualization and has a ranked list of the best it has found.  'Success' in a blog can be hard to measure.  Plenty of writers gain value simply from expressing their thoughts and engaging with a like-minded audience, online and off.  Nathan Yau, who runs Flowing Data, may be profitable, depending on blog traffic, in that he's using the blog to draw attention to his books Data Points and Visualize This, alongside a tasteful CPM or affiliate model with Tableau and a few others.  I've highlighted this blog because of the excellent Twitter visualizations he features.


Contributors are practitioners.  They are a special group because they understand the details of the underlying systems and can recreate them if necessary; they are actors dug deep within the data science industry.  Contributors need and create tools and techniques that help them execute on a day-to-day basis, and whatever expedites this process is as good as gold.  Concurrent has a great buzzy summary of their primary product, Cascading.  I see it as an infrastructure and environment that lets data scientists focus on analytics and applications without having to become Java, Hadoop, and web deployment experts in the process.  Data scientists are interested in the analyses at their fingertips, not the infrastructure that enables them, just as a driver is interested in wielding performance, not becoming a mechanic.  In that same vein, they may know about the infrastructure, but it's not the focus.



Distillers take a good idea, break it down, streamline it, and package it.  Distillers are looking for automation, real-time performance, mathematical approximations, simplification, anything that helps get the job done and no more.  Infegy has successfully simplified sentiment analysis in their flagship product Social Radar, which enables real-time monitoring of social media for market analysis.

Want to know if that press release is being discussed in a positive light, or if that VP gaffe is as bad as you're imagining?  This tool can help.  These uses are not themselves data science applications, but there is certainly data science and data visualization under the hood of Social Radar, and it is a great example of what products are possible with the right architecture of data science techniques.


Consulting from the others' point of view is another term for "exploring options".  Cubetto Mind is an iOS application based on the Mind Mapping diagram technique.  The app is useful for highlighting the relationships within complex groups, and for showing which relationships, existing or not, might make a difference in group dynamics.

So imagine consulting for a new client's organization.  There are a number of different players, they all have different priorities, and they may not see how your newfangled data science ideas are ultimately in their interest.  Mind mapping can show the interdependencies within an organization, and how your proposals affect those dependencies.


Valuing and investing in the data science industry is about supply and demand awareness, and in data science, as in most industries, both are best provided by people.  To generate good leads and connect with the right people you need a way to filter the signal from the noise, analyze alternatives, and come to a conclusion.  Social Action was developed by the UMD Human Computer Interaction Lab and is designed to explore human networks by filtering relationships based on common attributes and displaying the results as force-directed graph visualizations.
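
To make the filtering idea concrete, here is a minimal Python sketch of linking people who share a common attribute.  It is not Social Action's code or interface.  The people, the attribute names, and the helper shared_attribute_edges are all hypothetical, made up purely for illustration.

```python
from itertools import combinations

# Hypothetical people and attributes, invented for illustration only.
people = {
    "Ana": {"city": "DC", "skill": "statistics"},
    "Ben": {"city": "DC", "skill": "visualization"},
    "Cy": {"city": "NYC", "skill": "statistics"},
    "Dee": {"city": "NYC", "skill": "visualization"},
}


def shared_attribute_edges(people, attribute):
    """Return one edge for every pair of people with the same value for `attribute`."""
    edges = []
    for a, b in combinations(sorted(people), 2):
        value_a = people[a].get(attribute)
        # Skip people who lack the attribute so missing values never match.
        if value_a is not None and value_a == people[b].get(attribute):
            edges.append((a, b))
    return edges


# Filtering on a different attribute yields a different network.
city_edges = shared_attribute_edges(people, "city")
skill_edges = shared_attribute_edges(people, "skill")
print(city_edges)
print(skill_edges)
```

Each list of edges is the kind of input a force-directed layout could draw, and switching the attribute changes which clusters appear.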

Compiling the initial data might be difficult and time-consuming, but for people who can realistically value entrepreneurs in data science this should come naturally, or they should have the right resources at their disposal.  Once the data is loaded, recognizing the next lead is about knowing which metrics you believe matter in measuring data scientists and their work.