We asked presenters at the various Big Data Week events here in the DC area to send us any books, articles, or blog posts they recommend in connection with their presentations. We hope you find this list of resources useful. (And we will add to it if additional speakers provide suggestions!)
Apr. 22nd -- INFORMS MD -- Getting Started with Big Data Analytics
Brian Keller recommends:
- The Yahoo Hadoop tutorial -- a great starting point for learning about Hadoop and MapReduce.
- The book MapReduce Design Patterns -- as a reference.
- R Bloggers -- for exposure to a wide variety of analytic techniques.
Apr. 22nd -- Data Visualization DC -- Big Data Visualization
Abhijit Dasgupta recommends:
- Nathan Yau's book, Visualize This, and his blog, Flowing Data.
- Hadley Wickham's BigVis paper, to be discussed at the event.
Ben Shneiderman recommends several introductory texts:
- Two books of his (with others): Analyzing Social Media Networks with NodeXL, and Interactive Visualization: Insight through Inquiry
- A collection: Mastering the Information Age: Solving Problems with Visual Analytics (pdf)
- Manuel Lima's Visual Complexity: Mapping Patterns of Information
Apr. 23rd -- Data Science DC -- Natural Language Processing and Big Data
Ben Bengfort suggests:
- The NLTK book: Natural Language Processing with Python
- A set of NLTK introductory slides (pdf)
- Michael Noll's Hadoop Streaming with Python tutorials
Thomas Rindflesch recommends several papers related to his work on Semantic MEDLINE, including:
Apr. 23rd -- Big Data DC -- Challenges of Visualizing and Exploring Big Data
Will Gorman lists a number of important tools and technologies used in big data systems and data visualization:
- D3, Data Lake, Database Cracking (pdf), Dremel, Drill, Druid, Google BigQuery, Impala, Parquet, Spark, Stinger
Apr. 24th -- Data Business DC -- Big Data Infrastructure
Charles Scyphers of Oracle recommends, for the business and non-technical attendees:
- The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century
- Planning For Big Data E-Book (free)
- Beautiful Data: The Stories Behind Elegant Data Solutions
- How to Measure Anything: Finding the Value of Intangibles in Business
Tom Zeng of Intridea suggests the following resources, which are more technical and geared toward developers and technical managers:
- NoSQL (focus on MongoDB and Riak) related: MongoDB in Action, MongoDB Applied Design Patterns
- Some Ruby gems used by Intridea: MongoDB, Riak, MongoDB and Riak
- Local environments for learning and exploring Hadoop (Pig/Hive/Impala/HBase): Hortonworks Sandbox, Cloudera QuickStart Virtual Machine
And Jim Fiori of MapR recommends:
- Yahoo!'s Hadoop tutorial
Please note that Data Community DC is an Amazon Affiliate. Thus, if we recommend a particular book and link to it on Amazon, and you click on that link and then buy the book, we get a very small percentage of the proceeds (and eventually retire to a very small island in the Caribbean :) ).