A Review of the July 2013 Data Science DC Meetup: Lightning Talks!

Guest blogger Jenna Dutcher is the community relations manager for UC Berkeley's datascience@berkeley degree, the first and only online Master of Information and Data Science. Follow datascience@berkeley on Twitter and Facebook for news and updates.

Data science as a field lends itself to in-depth discussions of a specific topic; by their very nature, however, meetups encourage brief summaries of a subject or subjects, with no time to delve into the nitty-gritty details. In July, Data Science DC got the best of both worlds by bringing together eight wildly intelligent speakers, each presenting on a topic of his or her choice at either a novice or an expert level, with one caveat: each speaker had only eight minutes to deliver a condensed “Lightning Talk.”

These talks ran the gamut in both subject and complexity, covering everything from using data science to improve credit-scoring methods (Dhruv Sharma, FDIC) to personal takeaways from previous workshops (Kevin Coogan, Amalgamood). While all eight speakers were engaging, three of the Lightning Talks stood out to me, thanks to both their polished presentations and their intriguing topics.


First up was Jon Schwabish, an economist at the Congressional Budget Office. His presentation, “If You Give a Nerd a Number,” walked the audience through “the graphic continuum,” demonstrating how many different options there are for data visualization, and how quickly these representations can spiral out of control when left to data enthusiasts. After all, he argued, if you give a nerd a number, he might want a table to go with it, and once he sees patterns in the numbers, he may turn them into a chart, morph them into shapes, add dimensions and colors and so on.

Schwabish’s talk was perfectly suited to this format: it was short, snappy and pitched at an introductory level accessible to all attendees, even those just dipping their toes in the water at their first data science meetup. You can check out more of Jon’s work at policyviz.com.

Elena Zheleva also spoke to the audience about an accessible topic: social networks. More specifically, she discussed incentivized sharing in social networks, a marketing concept that has been fine-tuned by daily deals company LivingSocial, where Elena works as a data scientist.

What does this mean? As it turns out, the psychology of sharing online is the same as the psychology of sharing offline: it all comes down to “jumping on the bandwagon.” However, social media presents one key difference: rather than receiving information from a single source, users on social platforms share among themselves, passing information back and forth from a variety of sources. Because each user influences his or her connections, companies want customers to share, hoping this will convince those connections to convert as well.

Zheleva explained that sharing on social networks takes different forms, including incentivized and non-incentivized sharing. LivingSocial has made a name for itself with its Me+3 sharing feature, which provides a post-purchase incentive for buyers who convince three friends to purchase the same deal. Of course, these sharing initiatives are all well and good in theory, but they must be measured like any other business move. To assess the Me+3 feature, the data scientists at LivingSocial asked themselves three questions:

  1. Is this incentive working?
  2. Can we offer better incentives and increase profit?
  3. Can we answer these questions without launching expensive large-scale social experiments?

The researchers defined their success metric as whether the incentivized (Me+3) program led to a lift in shares. In the end, there was a 46- to 58-percent lift in shares due to the incentive, which led LivingSocial to institute the practice across its deals. Using the knowledge gained from this experiment, the team can continue to refine the program by simulating behavior over time with the general network model Me+N, to see whether a different incentive or number of friends might further increase performance. Sound interesting? LivingSocial is currently hiring data scientists! You can reach out to Elena at elena.zheleva@livingsocial.com. [Note that Elena has requested her slides not be made public, but you can listen to her presentation in the audio below. -ed]
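The talk did not spell out how LivingSocial computed its lift figures, so the following is only a rough illustration of the general idea. A relative lift in sharing is commonly computed by comparing the share rate (shares per buyer) of buyers who were offered the incentive against that of buyers who were not. The sketch below assumes that kind of treated-versus-control comparison; the function name `relative_lift` and the toy counts are hypothetical and are not LivingSocial's data.

```python
def relative_lift(treated_shares: int, treated_buyers: int,
                  control_shares: int, control_buyers: int) -> float:
    """Relative lift in share rate of an incentivized group over a control group.

    Hypothetical helper: share rate = shares / buyers, and
    lift = (treated_rate - control_rate) / control_rate.
    """
    if treated_buyers <= 0 or control_buyers <= 0:
        raise ValueError("both groups need at least one buyer")
    treated_rate = treated_shares / treated_buyers
    control_rate = control_shares / control_buyers
    if control_rate == 0:
        raise ValueError("control share rate is zero; lift is undefined")
    return (treated_rate - control_rate) / control_rate


# Toy counts, invented for illustration (not LivingSocial data):
# 150 shares from 1,000 incentivized buyers vs. 100 shares from 1,000 others.
print(f"{relative_lift(150, 1000, 100, 1000):.0%}")  # prints 50%
```

A real evaluation would also need to make sure the two groups are comparable (for example, through randomization), which is exactly the cost the team's third question was trying to avoid.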

The penultimate speaker, Natalie Robb of WaveLength Data Analytics, introduced the concept of data exhaust: data generated as a byproduct of normal operations. In the case she described, the exhaust came from heavy broadband usage (network traffic) in two Canadian cities, Montreal and Toronto. While thousands of variables are at play, network administrators can really home in on only a few, including subscriber ID/IP address, type of application or network, inbound and outbound traffic (measured in bytes) and the date and time of the broadband usage.
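WaveLength's actual schema and tooling were not described, so to make the listed fields concrete, here is a hypothetical record layout built only from them. One natural first query over such records is total traffic per hour of day. The sketch below sums inbound and outbound bytes by hour and picks the busiest hour; the class name `UsageRecord`, the helper functions and the three sample rows are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsageRecord:
    """One row of hypothetical broadband 'data exhaust'."""
    subscriber_id: str   # subscriber ID or IP address
    application: str     # type of application or network, e.g. "video", "p2p"
    bytes_in: int        # inbound traffic in bytes
    bytes_out: int       # outbound traffic in bytes
    timestamp: datetime  # date and time of the measured usage


def traffic_by_hour(records):
    """Sum inbound + outbound bytes for each hour of the day (0-23)."""
    totals = defaultdict(int)
    for r in records:
        totals[r.timestamp.hour] += r.bytes_in + r.bytes_out
    return dict(totals)


def peak_hour(records):
    """Return the hour of day with the most total traffic."""
    totals = traffic_by_hour(records)
    return max(totals, key=totals.get)


# Made-up sample rows, for illustration only.
records = [
    UsageRecord("sub-1", "web", 2_000, 500, datetime(2013, 7, 1, 9, 15)),
    UsageRecord("sub-2", "video", 90_000, 3_000, datetime(2013, 7, 1, 21, 5)),
    UsageRecord("sub-1", "p2p", 40_000, 60_000, datetime(2013, 7, 2, 21, 40)),
]
print(peak_hour(records))  # 21
```

Grouping by `timestamp.hour` merges all days together; to compare weekdays with weekends, one could key on `(timestamp.weekday(), timestamp.hour)` instead.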

Robb’s company used these stats to determine whether there was a particular time of day when people used more bandwidth. To assess this properly, Internet users were divided into three categories:

  1. “Barely Users”: just paying the bill and adding to profit margin.
  2. “Power Users”: the most dynamic Internet users; traditionally the fastest-growing group and key to market development, these users make up 15 percent of the user base and use about 32 percent of all traffic.
  3. “Hogs”: 5 percent of the total users, who consume nearly 44 percent of all traffic; they destroy network performance for others, and make it difficult for companies to evenly distribute services.
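The talk did not say exactly how users were assigned to these groups. One simple way to reproduce a split like this, assuming rank-based cutoffs, is to sort subscribers by total traffic, label the top 5 percent as hogs and the next 15 percent as power users, and then measure what share of traffic each group actually consumes. The function names and the toy usage figures below are hypothetical.

```python
from collections import Counter, defaultdict


def segment_users(traffic_by_user, hog_frac=0.05, power_frac=0.15):
    """Label users by traffic rank: top hog_frac 'hog', next power_frac 'power', rest 'barely'."""
    ranked = sorted(traffic_by_user, key=traffic_by_user.get, reverse=True)
    n_hogs = round(len(ranked) * hog_frac)
    n_power = round(len(ranked) * power_frac)
    labels = {}
    for rank, user in enumerate(ranked):
        if rank < n_hogs:
            labels[user] = "hog"
        elif rank < n_hogs + n_power:
            labels[user] = "power"
        else:
            labels[user] = "barely"
    return labels


def traffic_share(traffic_by_user, labels):
    """Fraction of total traffic consumed by each segment."""
    total = sum(traffic_by_user.values())
    shares = defaultdict(float)
    for user, volume in traffic_by_user.items():
        shares[labels[user]] += volume / total
    return dict(shares)


# Toy data: 20 hypothetical subscribers with made-up monthly gigabytes.
usage = {f"sub-{i}": gb for i, gb in enumerate(
    [300, 120, 110, 90, 30, 25, 20, 20, 15, 15, 12, 10, 10, 8, 8, 5, 5, 4, 2, 1])}
labels = segment_users(usage)
print(Counter(labels.values()))
print({k: round(v, 2) for k, v in traffic_share(usage, labels).items()})
```

With rank-based cutoffs, the group sizes are fixed in advance and the traffic shares are what gets measured; an alternative would be fixed usage thresholds (for example, gigabytes per month), which would let group sizes differ between cities.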

In sum, WaveLength Data Analytics found that the penetration of “hog” users was roughly equal in the two cities, but that Montreal had many more “barely users,” whereas Toronto was home to more power users (the up-and-comers).

So, what accounts for these discrepancies? For starters, the two cities differ in language: with much more English content on the web, it stands to reason that the English-speaking residents of Toronto would get more use out of their broadband than Montreal's largely French-speaking population. Toronto users were also streaming more, gaming more, watching more videos online and downloading illegal content (P2P) “like crazy,” as Natalie put it.

This last point was the key practical finding for Robb’s team. By indexing P2P traffic to overall traffic, the researchers established a baseline ratio. Identified P2P abusers were then assessed individually by comparing their total P2P traffic against their total traffic. If a user's ratio turned out to be above average, the broadband company could put a bandwidth cap on that user, controlling P2P traffic, reducing the impact of “hogs” and ensuring that all customers have a good experience.
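The baseline-and-compare step described above can be sketched in a few lines. This is a minimal illustration, not WaveLength's method: it assumes the baseline is the network-wide ratio of P2P bytes to all bytes, and that a user is flagged when his or her own P2P ratio exceeds that baseline. The function name and the three toy subscribers are hypothetical, and the decision to impose a cap is left out.

```python
def flag_p2p_heavy_users(p2p_bytes, total_bytes):
    """Return (baseline, flagged users whose P2P share of traffic exceeds it).

    baseline = total P2P bytes / total bytes across all users (the "index").
    Both dicts are keyed by subscriber ID; missing P2P entries count as zero.
    """
    network_total = sum(total_bytes.values())
    if network_total == 0:
        return 0.0, []
    baseline = sum(p2p_bytes.values()) / network_total
    flagged = []
    for user, total in total_bytes.items():
        if total == 0:
            continue  # no traffic, nothing to compare
        ratio = p2p_bytes.get(user, 0) / total
        if ratio > baseline:
            flagged.append(user)
    return baseline, sorted(flagged)


# Toy numbers, invented for illustration.
p2p = {"sub-a": 80, "sub-b": 10, "sub-c": 0}
total = {"sub-a": 100, "sub-b": 100, "sub-c": 100}
baseline, flagged = flag_p2p_heavy_users(p2p, total)
print(baseline, flagged)  # 0.3 ['sub-a']
```

The post says only "above average," which could also mean the mean of the per-user ratios. The aggregate ratio used here weights heavy users more, so the two baselines can differ when traffic is very unevenly distributed.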

Audience response to the Data Science DC Lightning Talks meetup was overwhelmingly positive, and I observed some very involved, enthusiastic discussions both before and after the event. For someone newer to the technical side of the data science field, this meetup provided exactly what I needed: a wide-ranging introduction to some truly engaging topics, speakers and peers. It’s a format I hope is repeated many times in the future.

Did you miss the Data Science DC Lightning Talks meetup? While you wait for the next one, you can still check out the audio and slides of these Lightning Talks, included below. Note that Jon Schwabish has separately edited his slides together with the audio into a video.

Jon's talk:

[youtube=http://www.youtube.com/watch?v=39D2Khtb3Ws]

Audio:

DSDC Lightning Talks - July 2013 (mp3)

Public Slides: