Event Review: Carl Morris Symposium on Large-Scale Data Inference

This is a guest blog post by area statistician Jerzy Wieczorek. Jerzy is an active contributor to the local community through his blog, Civil Statistician; his Twitter account, @civilstat; and Meetups, where he recently talked about interactive map visualizations in R. He is also active in pro bono statistical consulting for non-profits and government agencies: internationally via Statistics Without Borders, and locally via DataKind's DataCorps project with DC Action for Children. Thanks, Jerzy!

The 2nd Symposium on Large-Scale Data Inference took place last Thursday in Silver Spring. The symposium, organized by Social & Scientific Systems, focused on the intersection of statistics and data visualization and honored Carl Morris as the keynote speaker. Dr. Morris, whose work on Empirical Bayes methods and hierarchical models underlies many advances in large-scale data analysis, spoke about using multilevel modeling to reduce false positives in situations where you expect to see regression to the mean. He explained why "you’ll do better as a frequentist if you use Bayesian methods." The other speakers included:

  • Mark Hansen, on data-driven art installations, teaching data journalism, and looking at the data from all perspectives ("Everyone types the name of an object in R and watches all 20 billion lines scroll by at least once, right?"),
  • Di Cook, on "visual inference," i.e. informal hypothesis testing by trying to pick the real data plot out of a lineup of fake data plots (a rough sketch of the idea follows this list); and testing this approach on Mechanical Turk vs. on colleagues ("It’s impressive that statisticians have got good visual skills…"),
  • Rob Kass, on statistical modeling in cognitive neuroscience and the role of statistical thinking in general (use models to decompose variation into knowledge vs. uncertainty, and analyze the modeling procedures themselves),
  • Chris Volinsky, on using mobile phone data in city planning, useful metrics for classifying/clustering such structured data, and the practice of doing data visualization as a team ("Physical manifestations beat small screens"),
  • a final panel discussion, on defining big data (starting with "Big data is whatever can’t be read into R easily"); the statistician's role on a research team; distinctions between statistics and other quantitative sciences; tools and skills that statisticians should learn; and when to optimize carefully vs. just trying things out.
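
As a quick illustration of Cook's lineup idea, here is a minimal base-R sketch: the real scatterplot is hidden among null panels whose y-values have been permuted, and a viewer is asked to pick the odd panel out. The simulated data, the 20-panel layout, and the permutation null below are illustrative choices on my part, not Cook's exact setup; her nullabor R package implements the protocol properly.

```r
# A toy "lineup": hide the real scatterplot among 19 null plots
# in which y has been permuted, breaking any x-y relationship.
# (Illustrative sketch only; see the nullabor package for the real thing.)
set.seed(42)
n <- 50
real <- data.frame(x = rnorm(n))
real$y <- 0.5 * real$x + rnorm(n)   # a weak but genuine relationship

n_panels <- 20
true_pos <- sample(n_panels, 1)     # panel where the real data will hide

op <- par(mfrow = c(4, 5), mar = c(2, 2, 2, 1))
for (i in seq_len(n_panels)) {
  # permute y for null panels; keep the real pairing in the true panel
  y_i <- if (i == true_pos) real$y else sample(real$y)
  plot(real$x, y_i, pch = 19, cex = 0.6, xlab = "", ylab = "", main = i)
}
par(op)

# Informal test: if viewers consistently pick this panel,
# the structure in the real data is unlikely to be noise.
cat("The real data are in panel", true_pos, "\n")
```

If viewers (colleagues, or Mechanical Turk workers as in the experiments Cook described) point to the true panel more often than the 1-in-20 chance rate, that agreement plays the role of a small p-value.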

See civilstat.com for more detailed notes on each talk: part 1 (Morris and Hansen), part 2 (Cook and Kass), and part 3 (Volinsky and the panel).

The symposium's small, comfortable scale made it easy to chat with the speakers and fellow attendees during breaks. There was also a student poster session during the lunch break, showcasing work in dynamic graphics and hierarchical modeling. For anyone who wants broad, accessible insight into recent research and underlying themes in statistics, big data, and visualization, I highly recommend this symposium series.