Hadoop for Data Science: A Data Science MD Recap

Hadoop logo On October 9th, Data Science MD welcomed Dr. Donald Miner as its speaker to talk about doing data science work and how the hadoop framework can help. To start the presentation, Don was very clear about one thing: hadoop is bad at a lot of things. It is not meant to be a panacea for every problem a data scientist will face.

With that in mind, Don spoke about the benefits that hadoop offers data scientists. Hadoop is a great tool for data exploration. It can easily handle filtering, sampling and anti-filtering (summarization) tasks. When speaking about these concepts, Don expressed the benefits of each and included some anecdotes that helped to show real world value. He also spoke about data cleanliness in a very Baz Luhrmann Wear Sunscreen sort of way, offering that as his biggest piece of advice.

Don then transitioned to the more traditional data science problems of classification (including NLP) and recommender systems.

The talk was very well received by DSMD members. If you missed it, check out the video:

http://www.youtube.com/playlist?list=PLgqwinaq-u-Mj5keXlUOrH-GKTR-LDMv4

Our next event will be November 20th, 2013 at Loyola University Maryland Graduate Center starting at 6:30PM. We will be digging deeper into the daily lives of 3 data scientists. We hope you will join us!