
Simulation and Predictive Analytics

This is a guest post by Lawrence Leemis, a professor in the Department of Mathematics at The College of William & Mary. A front-page article over the weekend in the Wall Street Journal indicated that the number one profession of interest to tech firms is the data scientist, someone who combines analytic, computing, and domain skills to detect signals in data and use them to advantage. Although the terms are squishy, the push today is for "big data" skills and "predictive analytics" skills, which allow firms to leverage the deluge of data that is now accessible.

I attended the Joint Statistical Meetings last week in Boston and I was impressed by the number of talks that referred to big data sets and also the number that used the R language. Over half of the technical talks that I attended included a simulation study of one type or another.

The two traditional legs of the scientific method, namely theory and experimentation, have been joined by computation as a third leg. Sitting at the center of computation is simulation, which is the topic of this post. Simulation is a useful tool when analytic methods fail because of mathematical intractability.

The questions that I will address here are how Monte Carlo simulation and discrete-event simulation differ and how they fit into the general framework of predictive analytics.

First, how do Monte Carlo and discrete-event simulation differ? Monte Carlo simulation is appropriate when the passage of time does not play a significant role. Probability calculations involving playing cards, dice, and coins, for example, can be carried out by Monte Carlo.
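As a small illustration of the Monte Carlo idea, the Python sketch below estimates the probability that two fair dice sum to seven by rolling them many times. The function name, trial count, and seed are assumptions chosen for this example. The exact answer, 1/6, is known here only because the problem is simple enough to check by hand.

```python
import random


def estimate_prob_sum_is_seven(num_trials: int, seed: int = 12345) -> float:
    """Estimate P(sum of two fair dice == 7) by repeated random sampling."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    hits = 0
    for _ in range(num_trials):
        # One replication: roll two independent fair dice.
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 7:
            hits += 1
    # The fraction of replications that hit the event estimates its probability.
    return hits / num_trials


if __name__ == "__main__":
    estimate = estimate_prob_sum_is_seven(100_000)
    print(f"estimate = {estimate:.4f}, exact = {1 / 6:.4f}")
```

Notice that no clock appears anywhere in the code: each replication is independent of the others, and the order in which they are run does not matter.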

Discrete-event simulation, on the other hand, has the passage of time as an integral part of the model. The classic application areas for discrete-event simulation are queuing, inventory, and reliability. As an illustration, a mathematical model for a queue with a single server might consist of (a) a probability distribution for the time between arrivals to the queue, (b) a probability distribution for the service time at the queue, and (c) an algorithm for placing entities in the queue (first-come, first-served is the usual default). A discrete-event simulation can be coded in any algorithmic language, although the coding is tedious. Because of the complexities of coding a discrete-event simulation, dozens of languages have been developed to ease implementation of a model.
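The single-server queue described above can be sketched as a next-event simulation in Python. This is a minimal sketch under stated assumptions: interarrival and service times are taken to be exponential (the post does not specify distributions), the queue discipline is first-come, first-served, and the function and parameter names are made up for illustration.

```python
import heapq
import random
from collections import deque


def simulate_single_server_queue(arrival_rate: float, service_rate: float,
                                 num_customers: int, seed: int = 12345) -> float:
    """Return the average delay in queue over num_customers (>= 1) customers.

    Assumptions (not from the post): exponential interarrival and service
    times, first-come, first-served order, and an initially empty system.
    """
    rng = random.Random(seed)
    calendar = []           # event calendar: a heap of (event_time, kind)
    waiting = deque()       # arrival times of customers waiting in line
    server_busy = False
    arrived = served = 0
    total_delay = 0.0

    heapq.heappush(calendar, (rng.expovariate(arrival_rate), "arrival"))

    while served < num_customers:
        # Advance the simulation clock to the next scheduled event.
        clock, kind = heapq.heappop(calendar)
        if kind == "arrival":
            arrived += 1
            if arrived < num_customers:
                # Schedule the next arrival.
                next_time = clock + rng.expovariate(arrival_rate)
                heapq.heappush(calendar, (next_time, "arrival"))
            if server_busy:
                waiting.append(clock)   # join the back of the line
            else:
                server_busy = True      # start service at once (zero delay)
                done = clock + rng.expovariate(service_rate)
                heapq.heappush(calendar, (done, "departure"))
        else:  # departure
            served += 1
            if waiting:
                # The customer at the front of the line begins service.
                total_delay += clock - waiting.popleft()
                done = clock + rng.expovariate(service_rate)
                heapq.heappush(calendar, (done, "departure"))
            else:
                server_busy = False

    return total_delay / num_customers


if __name__ == "__main__":
    avg = simulate_single_server_queue(0.5, 1.0, 200_000)
    print(f"average delay in queue: {avg:.3f}")
```

With an arrival rate of 0.5 and a service rate of 1.0, the steady-state mean delay for this particular (M/M/1) model is known analytically to be 1.0, so a long run should land near that value. The three ingredients from the text map directly onto the code: the two `expovariate` calls are distributions (a) and (b), and the `deque` implements queue discipline (c). Unlike the Monte Carlo case, the simulation clock and the ordering of events drive everything.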

The field of predictive analytics leans heavily on tools from data mining to identify patterns and trends in a data set. Once an appropriate question has been posed, these patterns and trends in explanatory variables (often called covariates) are used to predict future behavior of variables of interest. There is both an art and a science in predictive analytics. The science side includes the standard tools associated with mathematics, computation, probability, and statistics. The art side consists mainly of making appropriate assumptions about the mathematical model constructed for predicting future outcomes. Simulation is used primarily for verification and validation of the mathematical models associated with a predictive analytics model. It can be used to determine whether the probabilistic models are reasonable and appropriate for a particular problem.

Two sources for further training in simulation are a workshop in Catonsville, Maryland, on September 12-13, taught by Barry Lawson (University of Richmond) and me, and the Winter Simulation Conference (December 7-10, 2014) in Savannah.

Weekly Round-Up: Data Science Roles, Technology Stacks, Predictive Analytics, and Michael Jordan

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have four fascinating articles on topics ranging from data science technology stacks to Michael Jordan. In this week's round-up:

  • Five Roles You Need on Your Big Data Team
  • Choosing a Data Science Technology Stack
  • 12 Predictive Analytics Screw-ups
  • What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Five Roles You Need on Your Big Data Team

Our first piece this week is an HBR article about the different roles you need when building a data science team. Data science is a very broad field and because of this, it's difficult to find someone who has all the skills that fall under its umbrella. This article attempts to break down the skill sets into more specific roles that can work together to really create value for an organization. The article lists the different roles, describes them, and also talks about the kind of culture you need to develop in order to get everyone in the organization on board and on the same page.

Choosing a Data Science Technology Stack

This is an interesting blog post about different data science technology stacks and how we as data scientists go about choosing one that works best for us. The author points out that there are several layers to a data science stack - sourcing the data, storing it, exploring it, modeling it, etc. - and that there are several technological options available for each layer. The post examines these different options and even has a survey where you can enter the technologies you use for each layer. When the survey is complete, those who participated will be emailed the results.

12 Predictive Analytics Screw-ups

This is a ComputerWorld article about some of the pitfalls you would do well to avoid when performing predictive analytics. To come up with this list, the author interviewed experts at three data science consulting firms - Elder Research, Abbott Analytics, and Prediction Impact - about the different mistakes they encounter. Take a look through them and see how many you've encountered yourself!

What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Our final piece this week is a Forbes article that uses Michael Jordan and other sports examples to drive home points about big data and how we use it in business. The author starts out by drawing a parallel between the decisions managers need to make these days about new technologies, opportunities, and employees and the task of evaluating Michael in his early days, when his athletic potential wasn't as obvious. The rest of the article covers the processes we go through and the data we look at when evaluating a situation and making decisions, and how big data and advances in technology improve our ability to do these things over time.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.
