Risks of Predictive Analytics

Based on an analysis of more than a half million public posts on message boards, blogs, social media sites and news sources, IBM predicts that ‘steampunk,’ a sub-genre inspired by the clothing, technology and social mores of Victorian society, will be a major trend to bubble up, and take hold, of the retail industry. --Jan. 14, 2013 IBM Press Release

Really? Is this a good idea? Not steampunk fashion -- that's clearly a bad idea. But publicizing this data-driven prediction -- is that a good idea? Could this press release actually cause an increase in rayguns and polished brass driving goggles?

I think this illustrates one of two important, potentially negative consequences of making and communicating statistical predictions. The first risk is that making predictions may sway people to follow the predictions. The second risk is that making predictions may sway people to inaction and complacency. Both of these risks may need to be actively managed to prevent advanced predictive modeling from causing more harm than good.

Recently, none other than Nate Silver indicated that if he thought his predictive models of elections were swaying the results, he would stop publishing them. There are longstanding questions about bandwagon and "back the winner" effects in polling and voting. If your predictions are widely seen as accurate, as Silver's are, then your statements may increase votes for the perceived winner and decrease them for the perceived loser. It's well known that more people report, after the fact, that they voted for a winning candidate than actually did so.

There are other ways that prediction can drive outcomes in unpredicted or undesired ways, especially when predictions are tied to action. If your predictive model estimates increased automobile traffic between two locations, and you build a highway to speed that traffic, then the "induced demand" effect (added capacity causes increased use) will almost certainly prove your predictive model correct, even if the model was predicting only noise. The steampunk prediction may fall into this category, sadly.

The other problem is exemplified by sales forecasts. If your predictions are read by the people whose effort is needed to realize the forecast results, the forecasts may be less likely to come true. Your predictions are probably based on a number of assumptions, including that the sales team is putting in the same type of effort that they did last month or last year. But if forecast results are perceived as a "done deal," that assumption will be violated. A prediction is not a target, and should not be seen or communicated as such.

How can these problems be mitigated? In some cases, by better communication strategies. Instead of providing a point estimate of sales ("we're going to make $82,577.11 next week!"), you may be better off providing the endpoints of an 80% or 90% prediction interval: "if we slack off, we could make as little as $60,000, but if we work hard, we could make as much as $100,000." Of course, if you have the sort of data where you can include sales effort as a predictor, you can do even better than that.
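To make the interval idea concrete, here is a minimal sketch in Python. It assumes you already have a set of simulated draws from your forecast distribution; the normal distribution, the $15,000 spread, and the function name prediction_interval are all made up for illustration and are not from any real sales model.

```python
import random
import statistics


def prediction_interval(samples, low_pct=10, high_pct=90):
    """Return the (low, high) empirical percentiles of forecast samples.

    The defaults (10th and 90th percentiles) give an 80% interval.
    """
    if not (1 <= low_pct < high_pct <= 99):
        raise ValueError("percentiles must satisfy 1 <= low_pct < high_pct <= 99")
    # quantiles(n=100) returns 99 cut points: the 1st through 99th percentiles.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[low_pct - 1], cuts[high_pct - 1]


# Hypothetical forecast draws: centered on the point estimate from the text,
# with an assumed (invented) standard deviation of $15,000.
rng = random.Random(42)
forecast_samples = [rng.gauss(82_577.11, 15_000.0) for _ in range(10_000)]

low, high = prediction_interval(forecast_samples)
print(f"80% interval: ${low:,.0f} to ${high:,.0f}")
```

Reporting the two endpoints instead of the single middle number is the whole trick. For a 90% interval, pass low_pct=5 and high_pct=95.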

Another trick to keeping people motivated is to let them beat their targets most but not all of the time. How do you do this? Consider providing the 20th percentile of a forecast distribution as the target. If your model is well-calibrated, those targets will be met or beaten about 80% of the time. There is extensive psychological and business research on the best way to set goals, and my (limited) understanding of it is that people who think they are doing well, but with room for improvement, are best engaged.
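The 20th-percentile target can be sketched the same way. The example below is a toy simulation, not a real forecasting setup: it assumes a perfectly calibrated model by drawing the "actual" outcomes from the same invented distribution as the forecast, and the names percentile_target and hit_rate are hypothetical.

```python
import random
import statistics


def percentile_target(forecast_samples, pct=20):
    """Return the pct-th empirical percentile of the forecast samples."""
    if not (1 <= pct <= 99):
        raise ValueError("pct must be between 1 and 99")
    cuts = statistics.quantiles(forecast_samples, n=100, method="inclusive")
    return cuts[pct - 1]


def hit_rate(target, outcomes):
    """Fraction of outcomes that met or beat the target."""
    return sum(outcome >= target for outcome in outcomes) / len(outcomes)


rng = random.Random(0)

# Hypothetical forecast distribution (invented mean and spread).
forecast = [rng.gauss(82_577.11, 15_000.0) for _ in range(10_000)]
target = percentile_target(forecast, pct=20)

# Assumption: the model is well-calibrated, so real outcomes follow
# the same distribution the forecast was drawn from.
outcomes = [rng.gauss(82_577.11, 15_000.0) for _ in range(10_000)]
rate = hit_rate(target, outcomes)

print(f"target: ${target:,.0f}, met or beaten {rate:.0%} of the time")
```

If the real hit rate drifts far from 80%, that is a hint that either the model is miscalibrated or the forecast itself has changed how people behave, which is the risk this post is about.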

Returning to the upcoming steampunk sartorial catastrophe, perhaps IBM should have exercised some professional judgement, as Nate Silver seems to be doing, and just kept their big blue mouth shut on this one.