
Former Obama for America and LivingSocial Data Scientists Show Off Their Startups - Data Innovation DC - Next Week!

Welcome back! As a few people have mentioned, DIDC has been missing in action for January and February, and for that we must apologize. We had an amazing sequence of events planned for the last two months that fell through at the last minute. Now, however, we are back on track and excited to bring you our first event of the year, featuring talks from Erek Dyskant of BlueLabs, Brian Muller of OpBandit, and more! We don't just encourage you to check out our Data Innovation DC Meetup; we want you to participate. We pride ourselves on interactive talks and an extended period of mingling and discussion both before and after the event. There will be food and drink aplenty, and Cooley LLP offers us an incredible space to hold the meetup. Later this spring we will be featuring both Cyber Security and Business Intelligence focused events. If you know of anyone who might be interested in presenting, please let me know at sayhitosean [@]

Event Details

Register Here!

Date: 3/26/2014

6:00 - 6:45pm: Networking
6:45 - 6:50pm: Introduction
6:50 - 8:00pm: Speakers

Speaker Bios

Erek Dyskant

Erek is a co-founder of BlueLabs, an analytics and technology company dedicated to using data science techniques to improve social good. Prior to BlueLabs, he was the Technology Lead for Geospatial Analytics at Obama for America. Erek finds that many of the best applications of big data processes not only start with huge datasets but also result in large, nuanced predictions, such as individual-level behavioral likelihoods, hyperlocal climate change predictions, patient-level treatment recommendations, or locations where election-related violence may (or may not) have occurred. He works to expand the role of data-driven decisions beyond strategic decisions made in boardrooms to tactical ones made by field organizers, case managers, and community health workers.

Brian Muller

Brian is the co-founder and CTO of OpBandit, a company that helps publishers and marketers increase their online engagement with responsive content delivery. Prior to founding OpBandit, he was the Lead Data Scientist at LivingSocial, where he founded the data science team and oversaw the creation and growth of a big data infrastructure and the teams necessary to support it, all while the customer base grew from thousands to over 70 million users. Before that, he worked as the Web Director for Foreign Policy magazine under the Washington Post. Brian has also worked as an adviser and consultant for a variety of groups, including PBS, the Carnegie Foundation, and the State of South Carolina. He has an MS in Biomedical Sciences and has spent time in academia at the Medical University of South Carolina and the Johns Hopkins University School of Medicine, focusing on squeezing meaningful information out of vast quantities of genomic data.

General Assembly & DC2 Scholarship

The DC2 mission statement emphasizes that "Data Community DC is an organization committed to connecting and promoting the work of data professionals...". Ultimately, we see DC2 becoming a hub for data scientists interested in exploring new material, advancing their skills, collaborating, starting a business with data, mentoring others, teaching classes, changing careers, and so on. Education is clearly a large part of all of these interests. DC2 has held a few workshops and is sponsored by several organizations, but we knew we could do more, so we partnered with General Assembly to create a GA & DC2 scholarship specifically for members of Data Community DC.

For our first scholarship we landed on Front End Web Development and User Experience, which we naturally announced first at Data Viz DC. How does this relate to data science? As I was happy to argue in our DC2 blog post replying to Mr. Gelman, sometimes I would love to have a little sandbox where I get to play with algorithms all day. Then again, that is exactly what I ran away from in 2013 when I became an independent data science consultant: I don't want a business plan I'm not a part of dictating what I can play with. Enter Web Dev and UX. As Harlan Harris, organizer of DSDC, shows in his Venn diagram of what makes a data scientist, a point Tony Ojeda later emphasized, programming is a natural and necessary part of being a data scientist. In other words, there's this thing called the interwebs that has more data than you can shake a stick at, and if you can't operate in that environment, then as a data scientist you're asking someone else to do the heavy lifting for you.

Over the next month we'll be choosing the winners of the GA DC2 Scholarship. If you'd like to see other scholarships in the future, please leave your thoughts in the comments below or tweet us.

Happy Thanksgiving!

The Data Scientist Algorithm

The following is a guest post. What is the make-up of a data scientist? Is it all about the amount of knowledge one possesses? The specific area of study? To answer these questions, Software Advice, a website that researches and compares BI tools, decided to examine the top performers on Kaggle, the largest data scientist community in the world.

Kaggle offers an online platform that allows companies to connect with data analysts from around the world, who then compete in the company’s big data challenge (often for prize money or a job). Below are the findings from the analysis of the top 100 prize-winning Kaggle performers (as of October 15, 2013).

Stay in School

Educational background was correlated with success in competitions, and depth of study was a common trait of top-level winners: over 80 percent of the top 100 Kaggle users held a Master's degree or higher, and 35 percent held a Ph.D.


Don’t Stress the Major

As expected, the top areas of study among these data superstars included computer science and mathematics. While these fields are no surprise, others suggested that a diverse background was as common as a conventional one: areas of study ranged from economics to philosophy and even law.



This analysis indicates there is no "formula" for creating the ideal data scientist; these data "wizards" come from all walks of life. If anything, practice does make perfect: the analysis found a strong correlation between the number of contests entered and the number of competition wins.
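To make "a strong correlation" a little more concrete, here is a minimal sketch of the Pearson correlation coefficient, assuming that is the kind of measure meant; the summary does not say which statistic the report used. The numbers below are hypothetical toy values for illustration only, not data from the Kaggle analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and each sequence's sum of squares
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy numbers, NOT taken from the Software Advice report
contests_entered = [5, 12, 20, 33, 47]
competitions_won = [0, 1, 1, 3, 4]
print(round(pearson_r(contests_entered, competitions_won), 3))
```

A value near +1 means competitors who entered more contests also tended to win more; on its own, a correlation does not show that entering more contests causes more wins.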

Read the full report on Software Advice’s Business Intelligence blog, Plotting Success.    

Weekly Round-Up: Statisticians, Build Smart DC, Kirk Borne, and Treating Parkinson's

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have four fascinating articles on topics ranging from collecting building data to treating Parkinson's. In this week's round-up:

  • Statisticians: An Endangered Species?
  • Washington DC Launches Real-time Building Energy Data Project
  • Time Spent with Kirk Borne
  • Michael J. Fox Foundation Points Big Data At Parkinson's

Statisticians: An Endangered Species?

Our first piece this week is an interesting post on the Revolution Analytics blog about how statisticians are perceived and how that relates to data science. The post was inspired by an American Statistical Association magazine article that portrayed statisticians as being left in the dust by the big data movement. The author notes his surprise at how little the article mentioned R, and suggests that contributing to the statistical programming language may be a good way for statisticians to keep playing an important role in data science.

Washington DC Launches Real-time Building Energy Data Project

Our next piece is a GigaOM article about a project called Build Smart DC, which launched last week. The project monitors energy data from city-owned buildings at 15-minute intervals, giving management a much more granular view of energy use in those properties than ever before. This will allow them to monitor trends and make data-driven decisions that lead to more efficient energy consumption. The article also discusses the startup driving the program and some other cities with similar projects in place.

Time Spent with Kirk Borne

Our third piece is an interesting short interview with Kirk Borne. Kirk is a Professor of Astrophysics and Computational Science at George Mason University and has been one of the most influential Big Data advocates on Twitter in recent years. He talks to the interviewer about astrophysics, big data, and data science education.

Michael J. Fox Foundation Points Big Data At Parkinson's

Our final article this week is an InformationWeek piece about how the Michael J. Fox Foundation ran a Kaggle competition to see whether data scientists could help identify patients who had Parkinson's and track increases and decreases in symptoms among patients with the disease. The article highlights the winning team, some of the methods they used to build their predictive models, and how they were able to acquire the domain knowledge that ultimately helped them win.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Read Our Other Round-Ups

Weekly Round-Up: Data Science Roles, Technology Stacks, Predictive Analytics, and Michael Jordan

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have four fascinating articles on topics ranging from data science technology stacks to Michael Jordan. In this week's round-up:

  • Five Roles You Need on Your Big Data Team
  • Choosing a Data Science Technology Stack
  • 12 Predictive Analytics Screw-ups
  • What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Five Roles You Need on Your Big Data Team

Our first piece this week is an HBR article about the different roles you need when building a data science team. Data science is a very broad field and because of this, it's difficult to find someone who has all the skills that fall under its umbrella. This article attempts to break down the skill sets into more specific roles that can work together to really create value for an organization. The article lists the different roles, describes them, and also talks about the kind of culture you need to develop in order to get everyone in the organization on board and on the same page.

Choosing a Data Science Technology Stack

This is an interesting blog post about different data science technology stacks and how we as data scientists go about choosing the one that works best for us. The author points out that a data science stack has several layers (sourcing the data, storing it, exploring it, modeling it, etc.) and that there are several technological options available for each layer. The post examines these options and even includes a survey in which you can enter the technologies you use for each layer. When the survey is complete, those who participated will be emailed the results.

12 Predictive Analytics Screw-ups

This is a ComputerWorld article about some of the pitfalls you would do well to avoid when performing predictive analytics. To compile the list, the author interviewed experts at three data science consulting firms (Elder Research, Abbott Analytics, and Prediction Impact) about the mistakes they encounter. Take a look through them and see how many you've encountered yourself!

What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Our final piece this week is a Forbes article that uses Michael Jordan and other sports examples to drive home points about big data and how we use it in business. The author starts by drawing a parallel between the decisions managers need to make these days about new technologies, opportunities, and employees and the task of evaluating Michael early in his career, when his athletic potential wasn't as obvious. In the rest of the article, he writes about the processes we go through and the data we look at when evaluating a situation and making decisions, and about how big data and advances in technology improve our ability to do these things over time.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Mid Maryland Data Science Kickoff Event Review

On Tuesday, January 29th, nearly 90 academics, professionals, and data science enthusiasts gathered at JHU APL for the kick-off meetup of the new Mid-Maryland Data Science group. With samosas on their plates and sodas in hand, members filled the air with conversations about their work and interests. After their meal, members were ushered into the main auditorium and the presenters took their place at the front.


Greetings and Mission

by Jason Barbour & Matt Motyka

Jason and Matt kicked off the talks with an introduction to the group. Motivated by both the growth of data science and the vast opportunities created by powerful free tools and open access to data, they described their interest in creating a local group that would help grow the Maryland data science community. Drawing on their background as software developers with analytic experience, Jason and Matt next described their seven keys to a successful analytic, including infrastructure, people, data, model, and presentation. Lastly, they presented metrics on the interests and experience of the members.

The Rise of Data Products

by Sean Murphy

With excitement and passion, Sean took the stage to show that now is the gold rush for data products. Defining a data product and cycling through several well-known examples, Sean explained how these products bring social, financial, or environmental value by combining data and algorithms. Consumers want data, and the tools and infrastructure needed to supply this demand are available either for free or at extremely low cost. Data scientists can now harness this stack of tools to provide the data products that consumers crave. As Sean succinctly stated, it is a great time to work with data.

The article version of the talk can be found here.

The Variety of Data Scientists

by Harlan Harris

A full-fledged data scientist himself, Harlan followed Sean by presenting his research into what the term "data scientist" really means. Using the results of a data scientist survey, Harlan listed several skill groupings that provide a shorthand for the variety of skills data scientists possess: programming, stats, math, business, and machine learning/big data. Next, Harlan showed that the diverse backgrounds of data scientists can be more accurately categorized into four types: data businessperson, data creative, data researcher, and data engineer. With this breakdown, Harlan demonstrated that the data science community is actually composed of individuals with a variety of interests and skills.

Cloudera Impala - Closing the near real time gap in BIGDATA

by Wayne Wheeles

A true cyber security evangelist, Wayne Wheeles presented how Cloudera's Impala was able to make near-real-time security analysis a reality. With his years of experience in cyber security and his prior work with big data technologies, Wayne was given unique access to Cloudera's latest tool. Through his testing and analysis, he concluded that Impala offered a significant improvement in performance and could become a vital tool in cyber security.

After the last presentation, more than a dozen members joined us at nearby Looney's Pub to end the night with a few beers and snacks. To everyone's surprise, Donald Miner of EMC Greenplum offered to pick up the tab! You can follow him on Twitter or LinkedIn.

If you missed this first event, don't worry: the next one is coming up on March 14th in Baltimore. Check it out here.