data visualization

Notes on a Meetup

This is a guest post by Catherine Madden (@catmule), a lifelong doodler who realized a few years ago that doodling, sketching, and visual facilitation can be immensely useful in a professional environment. The post consists of her notes from the most recent Data Visualization DC Meetup. Catherine works as the lead designer for the Analytics Visualization Studio at Deloitte Consulting, designing user experiences and visual interfaces for visual analytics prototypes. She prefers Paper by Fifty Three and their Pencil stylus for digital note taking. (Click on the image to open full size.)

Developments with Data Visualization DC

From its inception, Data Visualization DC has been actively searching for new ways to engage its audience, create a more integrated, self-sustaining community, and be a more valuable part of Data Community DC.  Most recently we've invested some of our sponsorship funds with Event Central to cover our events with great photography and video.  Event Central has also been working with MoDevDC, Action Design DC, and others to create a series of excellent content unique to the Washington DC technical community, and has recently completed the video from the last DVDC event, "Visualizing Christmas & Visual Storytelling", which itself is the result of community participation by SynGlyphX, Plot.ly, Visual.ly, Developfor, and a few volunteers.

Our goal with the local and at large data visualization community is to keep our finger on the pulse, engage with top talent, promote great content, facilitate relationships, and organize resources.  As a result of working with volunteers on Visualizing Christmas we will soon be able to offer workshops in data viz, something we've wanted to do since demand spiked after Andy Trice presented at nclud last summer.

In a similar vein, Antonio has been doing an excellent job putting these videos together, and in time we hope to have a resource where people can easily navigate through these videos via time tags based on subject, discussion, presenter, etc.  Once again we'll reach out to the community: those interested in helping write that algorithm can join DVDC and DC2 during periodic coworking sessions at 1776, or at any coworking space interested in hosting us, where we'll get into all the details and other ongoing Data Community DC projects.

Plenty more to discuss, but I'd like to keep this post short(er), so join us at DVDC and let's go down the rabbit hole!

SynGlyphX: Hello and Thank You DC2!

The following is a sponsored post brought to you by one of the supporters of two of Data Community's five meetups.

Hello and Thank You DC2!

This week was my, and my company’s, introduction to Data Community DC (DC2).  We could not have asked for a more welcoming reception.  We attended and sponsored both Tuesday’s DVDC event on Data Journalism and Thursday’s DSDC event on GeoSpatial Data Analysis.  They were both pretty exciting, and timely, events for us.

As I mentioned, I’m new to DC2 and new to the “data as a science” community.  Don’t get me wrong, while I’m new to DC2 I’ve been awash in data my entire career.  I started as a young consultant reconciling discrepancies in the databases of a very early Client-Server implementation.  Basically, I had to make sure that all the big department store orders on the server were in sync with the home delivery client application.  It took a lot of manual reconciling, which ultimately led to me writing code to semi-automatically reconcile the two databases.  Eventually (I think) they solved the technical issues that led to the Client-Server databases being out of sync.

More recently, I was working for a company with a growing professional services organization.  The company typically hired new employees after a contract was signed, but the new professional services work involved short project durations.  If we waited to hire, the project would be over before someone started.  We developed a probability-adjusted portfolio analysis approach to compare the supply of available resources (which is always changing as people finish projects, get extended, or leave the organization) with demand (which is always changing as well).  That enabled us to determine a range of positions and skillsets to hire for in a defined timeframe.
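As a rough illustration of what a probability-adjusted comparison of supply and demand can look like, here is a minimal Python sketch. It is not the approach the company actually used; the function names, the (headcount, probability) representation, and the sample figures are all assumptions made for the example.

```python
def expected_headcount(items):
    """Sum headcount weighted by the probability that it materializes.

    items: list of (headcount, probability) pairs (hypothetical format).
    """
    return sum(headcount * probability for headcount, probability in items)


def hiring_gap(demand, supply):
    """Expected demand minus expected supply; a positive value suggests hiring."""
    return expected_headcount(demand) - expected_headcount(supply)


# Hypothetical pipeline: two deals that may or may not close.
demand = [(4, 0.5), (2, 0.25)]
# Hypothetical bench: three people who have a 50% chance of rolling off their project.
supply = [(3, 0.5)]

gap = hiring_gap(demand, supply)
print(gap)
```

Running the same calculation under optimistic and pessimistic probabilities would give a range of positions rather than a single number.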

In both instances, it was data science that drove effective decision making.  Sure, you can apply some “gut” to any decision, but having some data science behind you makes the case much stronger.

So, I was fascinated to listen to the journalists discuss how they are applying data analysis to help:  1) support existing story lines; and 2) develop new story lines.  Nathan’s presentation on analyzing AIS data was interesting (and a bit timely as we had just gotten a verbal win for a client on doing similar type work, similar, but not exactly the same).

I know the power of data to solve complex business, operational, and other problems.  With our new company, SynGlyphX, we are focused on helping people both visualize and interact with their data.  We live in a world with sight and three dimensions.  We believe that by visualizing the data (unstructured, filtered, analyzed, any kind of data), we can help people leverage the power of the brain to identify patterns, spot trends, and detect anomalies.  We joined DC2 to get to know folks in the community, generate some awareness for our company, and get your feedback on what we are doing.  Thank you all for welcoming us and our company, SynGlyphX, to the community.  We appreciated everyone’s interest in the demonstrations of our interactive visualization technology.  Our website traffic was up significantly last week, so I am hoping this is a sign that you were interested in learning more about us.  Additionally, I have heard from a number of you since the events, and welcome hearing from more.

Here’s my call to action: I encourage you to tweet us your answer to the following question:  “Why do you find it helpful to visually interact with your data?”

See you at upcoming events.

Mark Sloan

About the Author:

As CEO of SynGlyphX, Mark brings over two decades of experience.  Mark began his career at Accenture, co-founded the global consulting firm RTM Consulting, and served as Vice President and General Manager of Convergys’ Consulting and Professional Services Group.

Mark has an M.B.A. from The Wharton School of the University of Pennsylvania, and a B.S. in Civil Engineering from the University of Notre Dame. He is a frequent speaker at industry events and has served as an Advisory Board Member for the Technology Professional Services Association (now the Technology Services Industry Association, TSIA).

General Assembly & DC2 Scholarship

The DC2 mission statement emphasises that "Data Community DC is an organization committed to connecting and promoting the work of data professionals...". Ultimately we see DC2 becoming a hub for data scientists interested in exploring new material, advancing their skills, collaborating, starting a business with data, mentoring others, teaching classes, changing careers, etc. Education is clearly a large part of any of these interests, and while DC2 has held a few workshops and is sponsored by organizations like Statistics.com, we knew we could do more, so we partnered with General Assembly and created a GA & DC2 scholarship specifically for members of Data Community DC.

For our first scholarship we landed on Front End Web Development and User Experience, which we naturally announced first at Data Viz DC.  How does this relate to data science?  As I was happy to rebut Mr. Gelman in our DC2 blogpost reply, sometimes I would love to have a little sandbox where I get to play with algorithms all day.  Then again, this is exactly what I ran away from in 2013 in becoming an independent data science consultant: I don't want a business plan I'm not a part of dictating what I can play with.  Enter Web Dev and UX.  As Harlan Harris, organizer of DSDC, mentions in his Venn diagram on what makes a data scientist, and as Tony Ojeda later emphasizes, programming is a natural and necessary part of being a data scientist.  In other words, there's this thing called the interwebs that has more data than you can shake a stick at, and if you can't operate in that environment then as a data scientist you're asking someone else to do that heavy lifting for you.

Over the next month we'll be choosing the winners of the GA DC2 Scholarship, and if you'd like to see any other scholarships in the future please leave your thoughts in the comments below or tweet us.

Happy Thanksgiving!

Eclipse Foundation LocationTech DC Tour


Interested in open source software for geospatial systems? Join us on November 14th at GWU for an evening of tech talks about location-aware open source technologies.

This month, the Eclipse Foundation's LocationTech working group is hosting a series of events in six cities, concluding with Washington, DC on November 14th. We'll gather at The George Washington University for a round of invigorating talks in the early evening, followed by drinks and networking at a local watering hole.

Speakers include:

  • Juan Marin, CTO of Boundless (formerly OpenGeo)
  • Eric Gundersen, CEO of MapBox
  • Joshua Campbell, GIS Architect at Humanitarian Information Unit, U.S. Department of State

When: Thursday, November 14, 2013
Where: Elliott School of International Affairs, GWU
Time: 6pm to 9pm (followed by drinks at CIRCA in Foggy Bottom)

The event is free but space is limited. Register today at http://tour.locationtech.org/

About LocationTech

LocationTech is the Eclipse Foundation's industry working group focusing on location-aware technologies. Members of LocationTech are also full-fledged members of the Eclipse Foundation. Eclipse is a vendor-neutral community for individuals and organizations who wish to collaborate on commercially-friendly open source software. The Eclipse Foundation is a not-for-profit, member-supported corporation that hosts technology projects and helps cultivate both an open source community and an ecosystem of complementary products and services.

US Government Contracting: Year 2013 Infographic by GovTribe

Below is a guest post and infographic from GovTribe, a DC startup that creates products that turn open government data into useful and understandable information. The hōrd iPhone app by GovTribe lets you understand the world of government contracting in real time.


Our latest release, hōrd 2.0, has way more data than our initial release. We spent the last six months building a completely new approach for consuming, processing, and making sense of government data from multiple sources. The iPhone app now provides insight and capability not available anywhere else. Our efforts have also given us pretty robust visibility into how the government behaves and where it allocates its resources. So we thought we'd share.

This post is the first in a series that GovTribe plans to publish. Our purpose is to find some signal in all that noisy data, and to provide some clear, interesting, and maybe even useful information about the world of federal government contracting.

We thought a good place to start, with just over a month left in fiscal year 2013, was a look back at what's been happening since October 1, 2012. Soon to come: Agency Insight. In this series we'll take a deeper look at individual agency activity. Stay tuned - and feedback is always appreciated.

GovTribe FY13 in Review

Weekly Round-Up: Industrial Internet, Business Culture, Visualization, and Beer Recommendations

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from the Industrial Internet to beer recommendations. In this week's round-up:

  • The Googlization Of GE
  • 10 Qualities a Data-Friendly Business Culture Needs
  • Interview with Miriah Meyer - Microsoft Faculty Fellow and Visualization Expert
  • Recommendation System in R

The Googlization Of GE

This is an interesting Forbes article about GE, the Internet of Things (which it calls the Industrial Internet), and how they are trying to be to that space what Google has become to the consumer data space.

10 Qualities a Data-Friendly Business Culture Needs

Running a data-driven organization requires not only having the right talent, tools, and infrastructure to meet the organization's objectives. It also requires a data-friendly culture, which is the premise for this article. The author identifies 10 qualities that can make for a better environment to foster innovative data-driven processes.

Interview with Miriah Meyer - Microsoft Faculty Fellow and Visualization Expert

This post is part of Jeff Leek's interview series on his Simply Stats blog. This week Jeff interviewed Miriah Meyer, who is an expert on data visualization. The interview includes questions about her work, background, influences, and advice she has for data scientists about visualization.

Recommendation System in R

This is a fun blog post about putting together a beer recommendation system using the R statistical programming language. The author walks us through the processes he followed, includes snippets of the code he used, and even shows off the resulting app where you choose a beer you like and it recommends other beers that are similar to it.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Event Review: Data Visualization DC - Visualizing HTML5

In this May/June edition of Data Visualization DC (DVDC) we took our first step and experimented with a new interactive format, which I'm happy to say went very well.  In short, we started with the standard Data Community DC (DC2) style introductions, Andy Trice gave his presentation on Adobe's work with HTML5, and we finished by breaking into two groups to play with visualization-focused and code-focused examples.  The last half of this format grew out of the standard DC2 lecture format, and was inspired by the enthusiastic requests of the DVDC members.  DC2 could continue to host lecture-style events for the foreseeable future, and they will likely always be a good introduction for new members whether they're attending DVDC, DSDC, DSMD, DBDC, or SPDC, but we know there is more we can do to engage with DC's data community, and the question has been, "How do we decide on an approach?"

Before I go any further, if you attended the event we would appreciate your feedback using this quick survey.

We wanted to experiment with this format based on the strong feedback we received from both the visualization and code focused groups.  The approach was simple: set the context using Andy's presentation, then break into two groups that catered to coders and visualizers, and maybe have some feedback from the two groups to wrap things up; Data Drinks follows every DC2 event.

The first surprise was the number of people interested in the detailed coding examples; we had only prepared for about six people.  Some people in the visually interactive portion did access the presentation and interactive material, but we quickly learned that despite the tailored link "bit.ly/DVDC_HTML5_Ex" shared on the event page and my personal Twitter handle, it was not a smooth process.  Most simply discussed the presentation, with the most activity centered on Andy himself, who was feverishly answering questions from as many people as could gather around.

If there were a slight adjustment I could have made, it would have been to have a larger table set up in another area for the people interested in the coding examples, but of course not all spaces have this kind of room.  I also would have started promoting the interactive material much further in advance; it is asking a little too much of people to pay attention during a presentation and find the example they're interested in at the same time.  More broadly, however, the strong interest in two different types of interactive events begs for a follow-up workshop, hackathon, or both, which we are currently planning with Andy and nclud.

Again, if you attended the event, we would of course love your feedback using this quick survey.

Data Visualization: From Excel to ???

So you're an Excel wizard: you make the best graphs and charts Microsoft's classic product has to offer, and you expertly integrate them into your business operations.  Lately you've studied up on all the latest uses for data visualization and dashboards in taking your business to the next level, which you tried to emulate with Excel and maybe some help from the Microsoft cloud, but it just doesn't work the way you'd like it to.  How do you transition your business from the stalwart of the late 20th century?

If you believe you can transition your business operations to incorporate data visualization, you're likely gathering raw data, maintaining basic information, and making projections, all of which are eventually used in an analysis of alternatives and a final decision for internal and external clients.  In addition, it's not just about using the latest tools and techniques; your operational upgrades must actually make it easier for you and your colleagues to execute daily, otherwise it's just an academic exercise.

Google Docs

There are some advantages to using Google Docs over desktop Excel: it's in the cloud, it has built-in sharing capabilities, and it offers a wider selection of visualization options.  My favorite, though, is that you can reference and integrate multiple sheets from multiple users to create a multi-user network of spreadsheets.  If you have a good JavaScript programmer on hand you can even define custom functions, which can be nice when you have particularly lengthy calculations, as spreadsheet formulas tend to be cumbersome.  A step further, you could use Google Docs as a database for input to R, which can then be used to set up dashboards for the team using a Shiny Server.  Bottom line, Google makes it flexible, allowing you to pivot when necessary, but it can take time to master.

Tableau Server

Tableau Server is a great option if you want to share information across all users in your organization, have access to a plethora of visualization tools, utilize your mobile device, set up dashboards, and keep your information secure.  The question is, how big is your organization?  Tableau Server will cost you $1000/user, with a minimum of 10 users, and 20% yearly maintenance.  If you're a small shop, it's likely that your internal operations are straightforward and can be outlined to someone new in a good presentation, so Tableau is like grabbing the whole toolbox to hang a picture: it may be more than you need.  If you're a larger organization, Tableau may accelerate your business in ways you never thought of before.
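To make the pricing concrete, here is a small Python sketch of the arithmetic. It uses only the figures quoted above ($1000 per user, a 10-user minimum, 20% yearly maintenance) and assumes that maintenance is charged each year as a percentage of the up-front license cost; that reading, and the function name, are assumptions for the example, so check current pricing before budgeting.

```python
def tableau_server_cost(users, years, price_per_user=1000,
                        min_users=10, maintenance_percent=20):
    """Estimate total cost in dollars over a number of years.

    Assumes (not confirmed by any price list) that yearly maintenance is a
    fixed percentage of the one-time license cost.
    """
    licensed_users = max(users, min_users)        # the 10-user minimum applies
    license_cost = licensed_users * price_per_user
    maintenance = license_cost * maintenance_percent * years // 100
    return license_cost + maintenance


# A 4-person shop still pays for 10 seats.
print(tableau_server_cost(users=4, years=3))   # 16000
# A 25-person team over the same period.
print(tableau_server_cost(users=25, years=3))  # 40000
```

Under these assumptions the small shop pays $4000 per actual user over three years, against $1600 per user for the larger team, which is the toolbox-to-hang-a-picture point in numbers.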

Central Database

There are a number of database options, including Amazon Relational Database Service (RDS) and Google App Engine.  There are a lot of open source solutions using either, and it will take more time to set up, but with these approaches you're committing to a future.  As you gain more clients and gather more data, you may want to access that data to discover insights you know are there from your experience in gathering it.  This is a simple function call from R, and results you like can be set up as a dashboard using a number of different languages.  You may expand your services or hire new employees, but still want to easily access your historical data to set up new dashboards for daily operations.  Even old dashboards may need an overhaul, and being able to access the data from a standard system, as opposed to coordinating a myriad of spreadsheets, makes pivoting much easier.
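The idea of pulling historical data out of a central database with one function call can be sketched as follows. The post has R in mind; this example uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a hosted service, and the table name, columns, and sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few hypothetical order records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (client TEXT, year INTEGER, revenue INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("acme", 2012, 120), ("acme", 2013, 150), ("globex", 2013, 90)],
    )
    conn.commit()
    return conn


def revenue_by_client(conn, year):
    """Return {client: total revenue} for one year, ready to feed a dashboard."""
    rows = conn.execute(
        "SELECT client, SUM(revenue) FROM orders "
        "WHERE year = ? GROUP BY client ORDER BY client",
        (year,),
    ).fetchall()
    return dict(rows)


conn = build_demo_db()
print(revenue_by_client(conn, 2013))  # {'acme': 150, 'globex': 90}
```

Because every dashboard asks the same database the same kind of question, rebuilding or replacing a dashboard does not require tracking down the spreadsheets behind it.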

Centralized vs Distributed

Google Docs is very much a distributed system where different users have different permissions, whereas setting up a centralized database will restrict most people to using your operational system according to your prescription.  So when do you consolidate into a single system and when do you give people the flexibility to use their data as they see fit?  It depends, of course, on how long the data stays relevant: if the data is no good next week then be flexible; if this is your company's gold then make sure the data is in a safe, organized, centralized place.  You may want to allow employees to access your company's gold for their daily purposes, and classic spreadsheets may be all they need for that, but when you've made considerable effort to get the unique data you have, make sure it's in a safe place and use a database system you know you can easily come back to when necessary.