Building Data Apps with Python Workshop Returns on June 6th

Building Data Apps with Python Workshop Returns on June 6th

Data products are usually software applications that derive their value from data by leveraging the data science pipeline and generate data through their operation. They aren’t apps with data, nor are they one time analyses that produce insights - they are operational and interactive. The rise of these types of applications has directly contributed to the rise of the data scientist and the idea that data scientists are professionals “who are better at statistics than any software engineer and better at software engineering than any statistician.”

These applications have been largely built with Python. Python is flexible enough to develop extremely quickly on many different types of servers and has a rich tradition in web applications. Python contributes to every stage of the data science pipeline including real time ingestion and the production of APIs, and it is powerful enough to perform machine learning computations. In this class we’ll produce a data product with Python, leveraging every stage of the data science pipeline to produce a book recommender.

DC NLP November 2014 Meetup Announcement: Introduction to Topic Modeling with LDA

Curious about techniques and methods for applying data science to unstructured text? Join us at the DC NLP November Meetup!

This month's event features an overview of Latent Dirichlet Allocation and probabilistic topic modeling.

Topic models are a family of models to estimate the distribution of abstract concepts (topics) that make up a collection of documents. Over the last several years, the popularity of topic modeling has swelled. One model, Latent Dirichlet Allocation (LDA), is especially popular.

Notes on a Meetup

This is a guest post by Catherine Madden (@catmule), a lifelong doodler who realized a few years ago that doodling, sketching, and visual facilitation can be immensely useful in a professional environment. The post consists of her notes from the most recent Data Visualization DC Meetup. Catherine works as the lead designer for the Analytics Visualization Studio at Deloitte Consulting, designing user experiences and visual interfaces for visual analytics prototypes. She prefers Paper by Fifty Three and their Pencil stylus for digital note taking. (Click on the image to open full size.)

October Starts Off Right: A Full Month of Events for Women in Tech

This is a guest post by Shannon Turner, a software developer and founder of Hear Me Code, offering free, beginner-friendly coding classes for women in the DC area. In her spare time she creates projects like Shut That Down and serves as a mentor with Code for Progress. 

Over 200 women were in attendance for the DC Fem Tech Tour de Code Kickoff party held at Google Thursday night.  DC Fem Tech, a collective of over 25 women in tech organizations, collaborates to run events and support the women in DC's tech community.

DC NLP October 2014 Meetup Announcement: Automated Query Parsing, and Fact Checking with Truth Teller

Curious about techniques and methods for applying data science to unstructured text? Join us at the DC NLP October Meetup!

This month features an introduction to the art of automated query parsing, and a discussion of a WaPo app that automates some of the tedium of fact checking.

Tony Maull is a Senior Director of Enterprise at DataRPM. He will discuss the differences between computational search and content search. His primary focus is how the computation can be relied upon when a natural language question can be asked any number of ways but still needs to drive a consistently accurate answer.

Social Network Analysis with Python Workshop on November 22nd

Data Community DC and District Data Labs are hosting a full-day Social Network Analysis with Python workshop on Saturday November 22nd.  For more info and to sign up, go to  Register before October 31st for an early bird discount!

Social networks are not new, even though websites like Facebook and Twitter might make you want to believe they are; and trust me- I’m not talking about Myspace! Social networks are extremely interesting models for human behavior, whose study dates back to the early twentieth century. However, because of those websites, data scientists have access to much more data than the anthropologists who studied the networks of tribes!

Fast Data Applications with Spark & Python Workshop on November 8th

Data Community DC and District Data Labs are excited to be hosting a Fast Data Applications with Spark & Python workshop on November 8th  For more info and to sign up, go to  There’s even an early bird discount if you register before October 17th!

Hadoop has made the world of Big Data possible by providing a framework for distributed computing on economical, commercial off-the-shelf hardware. Hadoop 2.0 implements a distributed file system, HDFS, and a computing framework, YARN, that allows distributed applications to easily harness the power of clustered computing on extremely large data sets. Over the past decade, the primary application framework has been MapReduce - a functional programming paradigm that lends itself extremely well to designing distributed applications, but carries with it a lot of computational overhead.

DC NLP September 2014 Meetup Announcement: Natural Language Processing for Assistive Technologies

Curious about techniques and methods for applying data science to unstructured text? Join us at the DC NLP September Meetup!

This month, we're joined by Kathy McCoy, Professor of Computer & Information Science and Linguistics at the University of Delaware. Kathy is also a consultant for the National Institute on Disability and Rehabilitation Research (NIDRR) at the U.S. Department of Education. Her research focuses on natural language generation and understanding, particularly for assistive technologies, and she'll be giving a presentation on Replicating Semantic Connections Made by Visual Readers for a Scanning System for Nonvisual Readers.