
Analyzing Social Media Networks using NodeXL

This is a guest post from Marc Smith, Chief Social Scientist at Connected Action Consulting Group, and a developer of NodeXL, an Excel-based system for (social) network analysis. Marc will be leading a workshop on NodeXL, offered through Data Community DC, on Wednesday, November 13th. If what follows piques your interest, please register. Parts of this post appeared first on connectedaction.net.

I am excited to have the opportunity to present a NodeXL workshop with Data Community DC on November 13th at 6pm in Washington, D.C.

In this session I will describe the ways NodeXL can simplify the process of collecting, storing, analyzing, visualizing and publishing reports about connected structures. NodeXL supports the exploration of social media with import features that pull data from personal email indexes on the desktop, Twitter, Flickr, YouTube, Facebook and WWW hyperlinks. NodeXL allows non-programmers to quickly generate useful network statistics and metrics and create visualizations of network graphs. Filtering and display attributes can be used to highlight important structures in the network. Innovative automated layouts make creating quality network visualizations simple and quick.

Figure: 5 steps for Social Media Network Analysis (2013-SMRF-NodeXL-SNA)

For example, this map of the connections among the people who recently tweeted about the DataCommunityDC Twitter account was created with just a few clicks and no coding:

DataCommunityDC Twitter NodeXL SNA Map and Report for Tuesday, 05 November 2013 at 15:15 UTC

This graph represents a network of 67 Twitter users whose recent tweets contained “DataCommunityDC”, taken from a data set limited to a maximum of 10,000 tweets. The network was obtained from Twitter on Tuesday, 05 November 2013 at 15:15 UTC. The tweets in the network were tweeted over the 7-day, 16-hour, 4-minute period from Monday, 28 October 2013 at 22:38 UTC to Tuesday, 05 November 2013 at 14:42 UTC. There is an edge for each “replies-to” relationship in a tweet. There is an edge for each “mentions” relationship in a tweet. There is a self-loop edge for each tweet that is not a “replies-to” or “mentions”.
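To make the edge rules above concrete, here is a minimal Python sketch of how such an edge list could be derived from a batch of tweets. This is not NodeXL's own code (NodeXL does this for you inside Excel); the field names `author`, `reply_to` and `mentions`, the function name `tweet_edges` and the "tweet" label for self-loops are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source user, target user, relationship)


def tweet_edges(tweets: List[Dict]) -> List[Edge]:
    """Build an edge list following the three rules described in the text.

    Each tweet is a dict with hypothetical keys:
      author   -- the user who wrote the tweet
      reply_to -- the user being replied to, or None
      mentions -- a list of other users mentioned in the tweet
    """
    edges: List[Edge] = []
    for tweet in tweets:
        author = tweet["author"]
        reply_to = tweet.get("reply_to")
        mentions = tweet.get("mentions") or []

        # One edge for each "replies-to" relationship.
        if reply_to:
            edges.append((author, reply_to, "replies-to"))

        # One edge for each "mentions" relationship.
        for mentioned in mentions:
            edges.append((author, mentioned, "mentions"))

        # A self-loop for a tweet that is neither a reply nor a mention.
        if not reply_to and not mentions:
            edges.append((author, author, "tweet"))
    return edges


# Made-up sample data, not taken from the DataCommunityDC network.
sample = [
    {"author": "alice", "reply_to": "bob", "mentions": []},
    {"author": "carol", "reply_to": None, "mentions": ["alice", "bob"]},
    {"author": "dave", "reply_to": None, "mentions": []},
]
edges = tweet_edges(sample)
for edge in edges:
    print(edge)
```

The self-loops matter: they keep people who tweeted about the topic without addressing anyone in the network as isolated nodes, rather than dropping them.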

The network has been segmented into groups (“G1, G2, G3…”) and each group is labeled with the words most frequently used in the tweets from the people in that group. The size of each Twitter user’s profile picture represents the log scaled value of their follower count.
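As a rough illustration of log scaling, the sketch below maps follower counts onto an image size. The exact formula and size range NodeXL uses are not given in this post, so the function name `log_scaled_size`, the use of `log(1 + followers)` and the 20 to 100 size range are assumptions made for the example.

```python
import math


def log_scaled_size(followers: int, max_followers: int,
                    min_size: float = 20.0, max_size: float = 100.0) -> float:
    """Map a follower count to an image size on a logarithmic scale.

    log1p(x) = log(1 + x) is used so that zero followers is well defined.
    The size range is an assumed, illustrative choice.
    """
    if max_followers <= 0:
        return min_size
    fraction = math.log1p(followers) / math.log1p(max_followers)
    return min_size + fraction * (max_size - min_size)


# Made-up follower counts: the largest account gets the largest picture,
# but a 100x gap in followers does not become a 100x gap in size.
counts = [0, 10, 1000, 100000]
sizes = [log_scaled_size(c, max(counts)) for c in counts]
for count, size in zip(counts, sizes):
    print(count, round(size, 1))
```

Log scaling is a sensible choice here because follower counts are heavily skewed: with linear scaling, one very popular account would shrink everyone else to a dot.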

Analysis of the network location of each participant reveals the people in key locations in the network, people at the “center” of the graph:

For more examples, please see the NodeXL Graph Gallery at: http://nodexlgraphgallery.org/Pages/Default.aspx

Data Visualization: From Excel to ???

So you're an Excel wizard: you make the best graphs and charts Microsoft's classic product has to offer, and you expertly integrate them into your business operations. Lately you've studied up on the latest uses of data visualization and dashboards for taking your business to the next level, and you've tried to emulate them with Excel and maybe some help from the Microsoft cloud, but it just doesn't work the way you'd like it to. How do you transition your business away from this stalwart of the late 20th century?

If you believe you can transition your business operations to incorporate data visualization, you're likely gathering raw data, maintaining basic information, and making projections, all of which eventually feed an analysis of alternatives and a final decision for internal and external clients. It's also not just about using the latest tools and techniques: your operational upgrades must actually make daily execution easier for you and your colleagues, otherwise it's just an academic exercise.

Google Docs

Google Docs has some advantages over desktop Excel: it's in the cloud, it has built-in sharing capabilities, and it offers a wider selection of visualization options. My favorite is that you can reference and integrate multiple sheets from multiple users to create a multi-user network of spreadsheets. If you have a good JavaScript programmer on hand, you can even define custom functions, which is nice for particularly lengthy calculations, since spreadsheet formulas tend to be cumbersome. Going a step further, you could use Google Docs as a database feeding R, which can then drive dashboards for the team through a Shiny Server. Bottom line: Google makes it flexible, allowing you to pivot when necessary, but it can take time to master.

Tableau Server

Tableau Server is a great option if you want to share information across all users in your organization, access a plethora of visualization tools, use your mobile device, set up dashboards, and keep your information secure. The question is, how big is your organization? Tableau Server will cost you $1000/user, with a minimum of 10 users, plus 20% yearly maintenance. If you're a small shop, your internal operations are likely straightforward enough to outline to someone new in a good presentation, so Tableau is like grabbing the whole toolbox to hang a picture: it may be more than you need. If you're a larger organization, Tableau may accelerate your business in ways you never thought of before.
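To get a feel for what those numbers add up to, here is a small back-of-the-envelope calculation in Python. It uses only the figures quoted above ($1000/user, 10-user minimum, 20% yearly maintenance). How the maintenance is actually billed is not spelled out here, so the sketch assumes it is 20% of the license total for each year of maintenance you pay for; the function name `tableau_server_cost` is made up for the example.

```python
PRICE_PER_USER = 1000      # dollars, from the figure quoted in the text
MINIMUM_USERS = 10         # smallest license you can buy
MAINTENANCE_PERCENT = 20   # yearly maintenance, as a percent of the license


def tableau_server_cost(users: int, maintenance_years: int) -> int:
    """Estimate total cost in dollars under the assumptions stated above.

    Assumes maintenance is charged on the license total once per year
    of maintenance; the real billing terms may differ.
    """
    billed_users = max(users, MINIMUM_USERS)
    license_cost = billed_users * PRICE_PER_USER
    yearly_maintenance = license_cost * MAINTENANCE_PERCENT // 100
    return license_cost + yearly_maintenance * maintenance_years


# A 5-person shop still pays for 10 users.
print(tableau_server_cost(5, 0))    # 10000
# 10 users plus three years of maintenance.
print(tableau_server_cost(10, 3))   # 16000
# 25 users plus one year of maintenance.
print(tableau_server_cost(25, 1))   # 30000
```

The minimum is what hurts a small shop: under these assumptions a five-person team pays the same as a ten-person team, which doubles the effective per-user price.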

Central Database

There are a number of database options, including Amazon Relational Database Service (RDS) and Google App Engine. There are a lot of open source solutions built on either. They take more time to set up, but with these approaches you're committing to a future. As you gain more clients and gather more data, you may want to query that data to discover the insights your experience in gathering it tells you are there. This is a simple function call from R, and results you like can be set up as a dashboard using a number of different languages. You may expand your services or hire new employees, and still want easy access to your historical data to set up new dashboards for daily operations. Even old dashboards may need an overhaul, and being able to access the data from a standard system, as opposed to coordinating a myriad of spreadsheets, makes pivoting much easier.
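The "simple function call" idea looks roughly like the sketch below. It is written in Python rather than R, and it uses an in-memory SQLite database as a stand-in for a hosted central database so that it runs anywhere; the `sales` table, its columns, the sample rows and the function name `revenue_by_client` are all invented for illustration.

```python
import sqlite3
from typing import Dict


def revenue_by_client(conn: sqlite3.Connection) -> Dict[str, int]:
    """Summarize historical data with one query against the central store."""
    rows = conn.execute(
        "SELECT client, SUM(amount) FROM sales "
        "GROUP BY client ORDER BY client"
    ).fetchall()
    return dict(rows)


# Stand-in for the central database: an in-memory SQLite instance
# loaded with made-up historical records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (client TEXT, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("acme", 2012, 100), ("acme", 2013, 150), ("globex", 2013, 75)],
)
conn.commit()

summary = revenue_by_client(conn)
print(summary)  # {'acme': 250, 'globex': 75}
```

The point is less the query itself than where the data lives: every new dashboard calls the same function against the same store, instead of each one stitching together its own set of spreadsheets.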

Centralized vs Distributed

Google Docs is very much a distributed system in which different users have different permissions, whereas a centralized database restricts most people to using your operational system the way you prescribe. So when do you consolidate into a single system, and when do you give people the flexibility to use their data as they see fit? It depends, of course, mostly on how long the data stays valuable. If the data is no good next week, be flexible. If it is your company's gold, make sure it lives in a safe, organized, centralized place. You may want to let employees access the company's gold for their daily purposes, and classic spreadsheets may be all they need for that. But when you've made considerable effort to get the unique data you have, keep it in a safe place and use a database system you know you can easily come back to when necessary.