Ensemble Learning Reading List

Tuesday's Data Science DC Meetup features GMU graduate student Jay Hyer's introduction of Ensemble Learning, a core set of Machine Learning techniques. Here are Jay's suggestions for readings and resources related to the topic. Attend the Meetup, and follow Jay on Twitter at @aDataHead! Also note that all images contain Amazon Affiliate links and will result in DC2 getting a small percentage of the proceeds should you purchase the book. Thanks for the support!

L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees. Chapman and Hall.CRC, Boca Raton, FL, 1984.

This book does not cover ensemble methods, but is the book that introduced classification and regression trees (CART), which is the basis of Random Forests. Classification trees are also the basis of the AdaBoost algorithm. CART methods are an important tool for a data scientist to have in their skill set.

L. Breiman. Random Forests. Machine Learning, 45(1):5-32, 2001.

This is the article that started it all.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd ed. Springer, New York, NY, 2009.

This book is light on application and heavy on theory. Nevertheless, chapters 10, 15 & 16 give very thorough coverage to boosting, Random Forests and ensemble learning, respectively. A free PDF version of the book is available on Tibshirani’s website.

G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning: with Apllications in R, Springer, New York, NY, 2013.

As the name and co-authors imply, this is an introductory version of the previous book in this list. Chapter 8 covers, bagging, Random Forests and boosting.

Y. Freund and R.E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting Journal of Computer and System Sciences, 55(1): 119-139, 1997.

This is the article that introduced the AdaBoost algorithm.

G. Seni, and J. Elder. Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions. Morgan & Claypool Publishers, USA, 2010.

This is a good book with great illustrations and graphs. There is also a lot of R code too!

Z.H. Zhou. Ensemble Methods: Foundations and Algorithms. Chapman & Hall/CRC, 2012.

This is an excellent book the covers ensemble learning from A-Z and is well suited for anyone from an eager beginner to a critical expert.