Data Mining & Machine Learning with R on October 31st

Data Community DC and District Data Labs are hosting a Data Mining & Machine Learning with R workshop on Saturday, October 31st, from 9am to 5pm. Register before October 17th for an early bird discount!


R is a powerful language for statistical computing. A prolific user community backs R with an extensive library of packages. If you can think of it, somebody has probably already written a package for it. R also has a superb IDE, RStudio, which facilitates reproducible research.

This course is for people with some R programming experience. It gives an overview of statistical modeling and machine learning in R. We will focus on a small subset of algorithms. Students will learn where to find more machine learning libraries in R.

What You Will Learn

This course introduces R's capabilities for regression, classification, clustering, and association rule mining. Many machine learning algorithms exist, and a single class can only cover a small subset of them.

We will focus on:

  • Linear and logistic regression
  • Decision tree and SVM classifiers
  • Hierarchical and K-means clustering
  • Similarity networks for association rule mining
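
To give a rough flavor of these topics, here is a minimal sketch in base R. It is an illustration only, not course material: the built-in mtcars and iris datasets, the chosen predictors, and the choice of three clusters are all assumptions made for the example.

```r
# Illustrative sketch only: built-in datasets and base R (stats) functions.
# Dataset, predictor, and cluster-count choices are assumptions for the example.
set.seed(42)

# Linear regression: fuel economy as a function of weight and horsepower.
lin_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic regression: transmission type (am is coded 0/1) as a function of weight.
log_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Standardize the four numeric iris measurements before clustering.
iris_scaled <- scale(iris[, 1:4])

# K-means clustering with three centers and several random restarts.
km <- kmeans(iris_scaled, centers = 3, nstart = 20)

# Agglomerative hierarchical clustering, cut into three groups.
hc <- hclust(dist(iris_scaled), method = "complete")
hc_groups <- cutree(hc, k = 3)

# Inspect the fitted coefficients and compare clusters with the known species.
print(coef(lin_fit))
print(coef(log_fit))
print(table(km$cluster, iris$Species))
print(table(hc_groups, iris$Species))
```

Decision trees, SVMs, and association rules need add-on packages from CRAN, so they are left out of this sketch.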

Course Outline

  • Reproducible research: setting up an RStudio project and file structure.
  • Review of R, RStudio, and R Markdown.
  • CRAN task view: machine learning
  • Linear regression methods.
  • Generalized linear models, focusing on logistic regression.
  • Decision trees and random forests.
  • Support vector machines.
  • Training, testing, and k-fold cross-validation.
  • CRAN task view: clustering
  • Agglomerative hierarchical clustering.
  • K-means and k-medoids clustering.
  • Constructing similarity networks.
  • Association rule mining.
  • Final project: construct a reproducible data analysis using R Markdown and the techniques covered.

After this course you will have used several supervised and unsupervised machine learning methods. Where possible, you will understand how to evaluate these methods. You will have stored your work using reproducible research techniques. This will allow you to revisit your work (and publish it on the web if you'd like).

Instructor: Tommy Jones

Tommy is a statistician, mathematician, or data scientist, depending on the problem or audience. He holds an MS in mathematics and statistics from Georgetown University and a BA in economics from the College of William and Mary. He is the Director of Data Science at Impact Research.

Tommy has previously performed economic and statistical modeling and analysis at the Science and Technology Policy Institute, the Federal Reserve Board, and the Institute for the Theory and Practice of International Relations. He has expertise in regression analyses, time series modeling and forecasting, natural language processing, data mining, and other quantitative techniques.

Register on the DDL Website