Big Data Analysis with Topic Models: Human Interaction, Streaming Computation, and Social Science Applications from DC NLP

Curious about techniques and methods for applying data science to unstructured text?

Now that we've hit our stride, November's meetup will be at the same bat-time (second Wednesday of the month at 6:30 pm) and same bat-channel (Stetson's Famous Bar & Grill in Adams Morgan). Please join us for a captivating presentation, stimulating conversation, and refreshing libations.

At our next meetup on Wednesday, November 13, we've got a special guest: Prof. Jordan Boyd-Graber from UMIACS, who'll be presenting a single hour-long talk.


Big Data Analysis with Topic Models: Human Interaction, Streaming Computation, and Social Science Applications


A common information need is to understand large, unstructured datasets: millions of e-mails during e-discovery, a decade's worth of scientific correspondence, or a day’s tweets. In the last decade, topic models have become a common tool for navigating such datasets. This talk investigates the foundational research that makes successful tools for these data exploration tasks possible: how to know when you have an effective model of the dataset; how to correct bad models; how to scale to large datasets; and how to detect framing and spin using these techniques. After introducing topic models, I argue that traditional measures of topic model quality, borrowed from machine learning, are inconsistent with how topic models are actually used. In response, I describe interactive topic modeling, a technique that enables users to impart their insights and preferences to models in a principled, interactive way. I will then address the computational and statistical limits of existing approaches and show how streaming topic models, with an "infinite vocabulary", can be applied to real-world online datasets. Finally, I’ll discuss ongoing collaborations with political scientists to use these techniques to detect spin and framing in political and online interactions.

The DC NLP meetup group is for anyone in the Washington, D.C. area working in, or interested in learning about, Natural Language Processing (NLP). Our meetings will be an opportunity for folks to network, give presentations about their work or research projects, learn about the latest advancements in our field, and exchange ideas or brainstorm. Topics may include computational linguistics, machine learning, text analytics, data mining, information extraction, speech processing, sentiment analysis, and much more.

For more information and to RSVP, please visit:

Follow us on Twitter: @DCNLP