TensorFlow's DC Introduction

The Hello, TensorFlow! post introducing the basic workings of the TensorFlow deep learning framework, up now in O'Reilly's Data, AI, and Learning sections, is a product of the local data community.

Aaron Schumacher, one of the Data Science DC organizers and an employee of Arlington-based Deep Learning Analytics, wrote the article with the support of many local reviewers, including feedback from members of the DC Machine Learning Journal Club.

Aaron will be giving a talk on the material of Hello, TensorFlow! on Wednesday, June 29, as part of the Deep Dive into TensorFlow meetup to be hosted at Sapient in Arlington. It should be a great opportunity to explore and discuss this new and exciting tool!

Introducing Women Data Scientists DC

Last month, a new meetup group for women data scientists in the DC area was started by Mandi Traud and Jackie Kazil. 

Women Data Scientists DC is a meetup group for women data scientists, women who want to be data scientists, and supporters of women in data science. Their monthly meetings will include presentations by data scientists, networking events, mentoring opportunities, and workshops to learn new data science skills.

Co-founders Jackie Kazil and Mandi Traud launched the group on July 9th with two members; by the next day, it had more than 85 members and was still growing!

Here's what the co-founders said individually when asked about how and why they decided to start this group. 

DC NLP September 2014 Meetup Announcement: Natural Language Processing for Assistive Technologies

Curious about techniques and methods for applying data science to unstructured text? Join us at the DC NLP September Meetup!

This month, we're joined by Kathy McCoy, Professor of Computer & Information Science and Linguistics at the University of Delaware. Kathy is also a consultant for the National Institute on Disability and Rehabilitation Research (NIDRR) at the U.S. Department of Education. Her research focuses on natural language generation and understanding, particularly for assistive technologies, and she'll be giving a presentation on Replicating Semantic Connections Made by Visual Readers for a Scanning System for Nonvisual Readers.

Welcome to DataKind DC!

Harlan Harris is the President and a co-founder of Data Community DC, and is a long-time fan of DataKind. Last week, DataKind, the nonprofit that connects pro-bono data and tech folks with nonprofits in need of data help, announced the first regional chapters, in the UK, Bangalore, Dublin, Singapore, San Francisco, and best of all (we think!), Washington, DC!

Announcing Discussion Lists! First up: Deep Learning

Data Community DC is pleased to announce a new service to the area data community: topic-specific discussion lists! In this way we hope to extend the successes of our Meetups and workshops by providing a way for groups of local people with similar interests to maintain contact and have ongoing discussions. Our first discussion list will be on the topic of Deep Learning.

The following is a guest post from John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC Meetup.

A while back, there was this blog post about Deep Learning. At the end, we asked readers about their interest in hands-on Deep Learning tutorials.


The results are in, and the survey went to 11. And as in all data science, context matters--and this eleven is decidedly less inspiring than Nigel Tufnel’s eleven. That said, ten out of eleven respondents wanted a hands-on Deep Learning tutorial, and eight respondents said they would register for a tutorial even if it required hardware approval or enrollment in a hardware tutorial. But interest in practical hands-on Deep Learning workshops appears to be highly nonuniform. One respondent said they’d drive from hundreds of miles away for these workshops, but of the 3000+ data scientists in DC’s data and analytics community, presumably more local, only eleven total responded with interest.

In short, the survey was a bust.

So it’s still not clear what the area data community wants out of Deep Learning, if anything, but since April I’ve gotten plenty of questions from plenty of people about Deep Learning on everything from hardware to parameter tuning, so I know there’s more interest than what we got back on the survey. Since a lot of these questions are probably shared, a discussion list might help us figure out how we can best help the most members get started in Deep Learning.

So how about a Deep Learning discussion list? If you’re a local and want to talk about Deep Learning, sign up here:

For the record, this discussion list was Harlan’s original suggestion. If you’re looking to take away any rules of thumb here, a simple one is “just agree with whatever Harlan says.” Tommy Jones and I will run this discussion list for now. To be clear, this list caters to the specific Deep Learning interests of data enthusiasts in the DC area. For a bigger community, there’s always the Deep Learning Google+ page, and individual mailing lists and git repos for specific Deep Learning codebases, like Caffe, pylearn2, and Torch7.

In the meantime, I was happy to see some Deep Learning interest from Christo Kirov at DC NLP’s Open Mic night. And NLP data scientists need not watch Deep Learning developments from the sidelines anymore; some recent motivating results in the NLP space have been summarized in a tutorial by Richard Socher. I’m not qualified to say whether these are the kind of historic breakthroughs we’ve recently seen in speech recognition and object recognition, but it’s worth taking a look at what's happening out there.

Win Free eCopies of Social Media Mining with R

This is a sponsored post by Richard Heimann. Rich is Chief Data Scientist at L-3 NSS and recently published Social Media Mining with R (Packt Publishing, 2014) with co-author Nathan Danneman, also a Data Scientist at L-3 NSS Data Tactics. Nathan has been featured at recent Data Science DC and DC NLP meetups. Nathan Danneman and Richard Heimann have teamed up with DC2 to organize a giveaway of their new book, Social Media Mining with R.

Over the next two weeks, five lucky winners will win a digital copy of the book. Please keep reading to find out how you can be one of the winners and learn more about Social Media Mining with R.

Overview: Social Media Mining with R

Social Media Mining with R is a concise, hands-on guide with several practical examples of social media data mining and a detailed treatise on inference and social science research that will help you in mining data in the real world.

Whether you are an undergraduate who wishes to get hands-on experience working with social data from the Web, a practitioner wishing to expand your competencies and learn unsupervised sentiment analysis, or simply someone interested in social data analysis, this book will prove to be an essential asset. No previous experience with R or statistics is required, though having knowledge of both will enrich your experience. With this book, readers will:

  • Learn the basics of R and all the data types
  • Explore the vast expanse of social science research
  • Discover more about data potential, the pitfalls, and inferential gotchas
  • Gain an insight into the concepts of supervised and unsupervised learning
  • Familiarize yourself with visualization and some cognitive pitfalls
  • Delve into exploratory data analysis
  • Understand the minute details of sentiment analysis

How to Enter?

All you need to do is share your favorite effort in social media mining or more broadly in text analysis and natural language processing in the comments section of this blog. This can be some analytical output, a seminal white paper or an interesting commercial or open source package! In this way, there are no losers as we will all learn. 

The first five commenters will win a free copy of the eBook. (DC2 board members and staff are not eligible to win.) Share your public social media accounts (Twitter, LinkedIn, etc.) in your comment, or email after posting.

A New Type of Meet Up Event?

Come join us the day after Memorial Day for a new type of Meet Up. In the past, Data Innovation DC and Data Community DC have brought in fascinating speakers discussing data products and services that have already been built or data sets that are now available for public consumption. This Tuesday, we are changing things up as part of the National Day of Civic Hacking. Our goal is to have individuals and teams interested in building commercially viable data products attend and listen to experts who know firsthand the data problems that consumers of US Census data are having. Simply put, we are trying to line up problems that other people (also known as potential customers) will pay to have solved. As a massive added bonus, if your team can put something together before the end of next weekend, you may be able to attract national-level press interest.

Some of the bios for our Tuesday Panelists are below. If you are interested in attending for free, please register here.

Andy Hait

Andrew W. Hait serves as the Data Product and Data User Liaison in the Economic Planning and Coordination Division at the U.S. Census Bureau. With over 26 years of service at the Bureau, Andy oversees the data products and tools and coordinates data user training for the Economic Census and the Census Bureau’s other economic survey programs. He is also the lead geographic specialist in the Economic Programs directorate. Andy is the Census Bureau’s inside man for understanding our customers’ needs.

Judith Johnson (Remote)

Judith K. Johnson joins us from the Small Business Administration-funded Small Business Development Center’s (SBDC) National Information Clearinghouse, where she serves as Lead Librarian. She monitors daily incoming operations, provides business information research, and reviews research completed by staff before it is distributed to SBDC advisors located nationwide. Ms. Johnson also provides preliminary patent and trademark searches and trains staff and SBDC advisors. She comes to the panel with a strong handle on the data needs of entrepreneurs and business owners.

Matthew Earls

Matthew Earls, M.U.R.P., is a GIS Analyst at Carson Research Consulting (CRC). His work primarily revolves around the Baltimore DataMind. Mr. Earls is also responsible for managing social media (e.g., Facebook and Twitter) for the DataMind as well as the DataMind blog. He provides assistance with data visualization and mapping for other CRC projects as needed.

Dr. Taj Carson

Dr. Taj Carson is the CEO and founder of Carson Research Consulting (CRC), a research and evaluation firm based in Baltimore. She has been working in the field of evaluation since 1997 and specializes in research and evaluation that can be used to improve organizations and program performance. She is also the creator and driving force behind the Baltimore DataMind, an interactive online mapping tool that allows users to visualize various socio-economic data for Baltimore City at the neighborhood level.

Kim Pierson (Remote)

Kim Pierson is a Senior Data Analyst with ProvPlan in Providence, Rhode Island. She has 6 years of experience in data analysis, geospatial information, and data visualization. She works with community organizations, non-profits, government agencies, and national organizations to transform data into information that supports better decision making, strengthens communities, and promotes a more informed populace. She specializes in urban-data analysis including demographic, education, health, public safety, and Census data. She has worked on web-based data and mapping applications including the RI Community Profiles, RI DataHUB, and ArcGIS Viewer for Flex applications. She holds an M.A. degree in Urban and Regional Planning from the University of Illinois.


The Pragmatic Hackathon - Lean Customer Development for Data Products with the US Census Bureau

Interested in starting a company? It is summertime, the time for sequels. Our first event with the US Census Bureau was such a success that we are having a follow-up event as part of the National Day of Civic Hacking.

In our first Census event, Census data experts talked about the data that the US Census Bureau has available and how it could be used to start a company. During that event, we learned that a number of Census data consumers have real problems with the data they consume; these companies could use help, and that represents a legitimate business opportunity.

At this event, we are going to bring in actual Census data consumers to discuss their data-related problems. Why? Because customer development and finding the data-product market fit are the hard parts of starting a company. By providing access to potential customers who have very specific problems around open data sets, we are trying to lower the barriers for enterprising individuals and teams to start companies. We sincerely hope that teams will form to address these issues and potentially commercialize their solutions.

If you want, think of this as a practical hackathon. Instead of spending the weekend building a small application or website or data visualization, spend a few hours understanding real, addressable business problems that can be commercialized. We will leave it to you and your team to build the solution on your own time but will still provide drinks and the pizza.

Oh yeah, time to mention the last carrot we have to dangle. We will be an official part of the National Day of Civic Hacking. That means that industrious teams that jump in to solve a problem and assemble something interesting by the end of the weekend could get access to national-level press.

Questions? Please email Sean Murphy or reach him via Twitter @SayHiToSean.

Event Recap: Jawbone UP & Behavior Change

Guest post blogger Jenna Dutcher is the community relations manager for UC Berkeley’s datascience@berkeley degree – the first and only online Master of Information and Data Science. Follow datascience@berkeley on Twitter and Facebook for news and updates.

May’s meeting of Action Design DC was sponsored by Fluencia and Hello Wallet and featured a talk by Kelvin Kwong of Jawbone. Kwong is a product manager for the UP band, a role that left him well prepared to speak to the group about Jawbone UP: Designing for Exercise and More. But what is the “more” in this scenario? Kwong explained that Jawbone UP tracks the rhyming actions of food, mood, sleep, and eat[ing], but their true purpose goes further than just data collection. In fact, the company’s bread and butter is in behavior change, the act of getting people to do the things Jawbone knows they want to do.

The company doesn’t do this by guessing or “intuition”; rather, they employ the latest behavioral science in their quest to turn intention into action.  “No matter where we're starting, we all want to be better,” Kwong said, and Jawbone has taken it upon themselves to make this into a reality.  Human nature often leaves a gap between intention and action; for example, people may want to eat healthier, finish their course of antibiotics, or quit smoking, but those actions are often more difficult than they seem at first glance.  Many don’t follow through.  The goal of lifestyle wearables is to drive behavior change.  They aim to do this with a three-step process, by tracking, understanding, and acting.

  • Track - The first step is to build something wearable, a device that customers will want to sport daily.  Constant monitoring will allow Jawbone to gather your data, which is a necessary first step before any behaviors can be changed.
  • Understand - Once data has been collected, scientists get to work comparing and contrasting, looking for correlations in the data that might hint towards causation. More on this below.
  • Act -  Here, Kwong says, scientists face an interesting dilemma: now that we understand you, how do we compel you to do the things you said you want to do?  Forty-seven percent of daily decisions are made unconsciously. Think about all of the tiny decisions we have in life that we don’t even think about, like taking the stairs instead of the elevator, or walking rather than driving your car; this is the area where the UP band wants to have a real impact.  Once scientists understand those ingrained habits, they can make an effort to change them.

What are the principles behind this? Kwong explained that we’re at an inflection point in behavioral science. There has been a lot of theory to lay the groundwork of this science, and now companies like Jawbone must figure out the practical applications: how can Jawbone scientists apply these tenets to encourage behavior change for their user base? In past decades, a lot of the research has been applied to consumer science, helping marketers to advertise and upsell.

Jawbone doesn’t want to market, however; instead, they want to make an impactful difference in their customers’ lives.  Put a different way, Jawbone wants to create episodic interventions that lead to longitudinal interventions.  Luckily for them, they have a large pool of research subjects to draw from.  Rather than working with the tens or hundreds of subjects an academic researcher might have access to, Jawbone has data from hundreds of thousands of users at their fingertips.

Using this dataset, they conducted the biggest sleep study ever performed, on an aggregated 80 million nights of sleep, and observed interesting trends like gender and age gaps in amount of time slept.  While academic researchers have noted this in the past, it was never feasible to measure it across age groups, simply due to the difficulties of collecting a large enough sample from each group.  Using the UP band’s aggregate user data, however, Jawbone’s scientists were able to note a wide gap in average time spent sleeping between men and women at a college age, and a narrowing of this gap in retirement.

Let’s look at another real example: Jawbone’s extensive dataset allows their researchers to tie self-reported data (do you sleep with multiple pillows? Do you share a bed with a partner?  Is your mobile phone in the room during slumber?) to actual data (does your tracker show eight hours of sleep? Five times awakened?) and spot patterns in this aggregated dataset (for example, people who share a bed with a partner average 35 minutes more sleep each night).  After they notice these correlations, they can begin to apply a narrative and generate hypotheses. In this scenario, the presence of a partner illuminates a decision point, a sort of physically present bedtime reminder.

Kwong also demonstrated UP’s “Today I Will” feature, which is a very real manifestation of these behavior change efforts. This feature of the UP band sends intelligent suggestions to users based on their behavior, and holds them accountable for completing the tasks (e.g., a daily increase in steps, sleep, or water intake). Research has shown that if you say you’re going to do something, the likelihood is much higher that you’ll complete that task.

This is what’s known as the “foot in the door” technique - in Jawbone’s case, if you ask someone to use the Today I Will feature and announce their daily goal, they’re much more likely to hit that target. In fact, users who opt in to the Today I Will feature go to bed 23 minutes earlier than those who have not pledged to modify this behavior. In addition, there’s a 72% increased likelihood that these people will go to bed early enough to hit their sleep goal.

These behavioral change results have also been replicated with an activity goal.  Thanksgiving is one of the days where Jawbone users take the fewest daily steps (too busy eating turkey and catching up with family!).  In one experiment, the UP band’s Today I Will feature simply told users that they were less likely to get their steps on this day, and challenged them to get more steps.  The result?  Opt-ins walked 20% more on Thanksgiving than on their average days simply because they had publicly announced this goal.  The “foot in the door” technique is successful for one key reason: reaching your expressed goal allows you to remain congruent with your self image.

Above all else, Jawbone wants to create a product that solves a true human need and can be fully integrated into a user’s life.  Audience members raised questions about the accuracy of a wrist-based step tracker, but Kwong assured them that the product has been subjected to innumerable tests in an effort to increase step count accuracy.  More interestingly, he said, the future of activity tracker success isn't in incremental accuracy gains, but rather in what can be done with the data collected.  As he said, a few hundred steps gained or lost won’t make a difference in the long run; it’s the behavioral influences brought on by challenges like Today I Will that let Jawbone know they’re really on to something special with the UP band.

Want more information? Kelvin can be found on Twitter at @kelvinskwong.