
Weekly Round-Up: Ford's Data, Apple's iWatch, Wavii's Acquisition, and Fighting Malaria

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles on topics ranging from how Ford is leveraging data to improve its operations to combating malaria using data from cell phones. In this week's round-up:

  • How Data is Changing the Car Game for Ford
  • How Apple's iWatch Will Push Big Data Analytics
  • Google Bags Another Machine Learning Startup
  • Researchers Use Data from Cell Phones to Combat Outbreaks

How Data is Changing the Car Game for Ford

This GigaOM article describes how Ford Motor Company is using data to build better cars and better customer experiences. It goes into some detail on both, including data products that ship with some Ford vehicles and give drivers data about their car's performance. The author also quotes some of the people in charge of Ford's data efforts on internal data processes and on the changes the company has had to make to become more data-driven.

How Apple's iWatch Will Push Big Data Analytics

This is a Smart Data Collective article about what Apple's rumored iWatch could mean for Big Data. According to the article, the watch will be able to capture data about where you've been, what you've eaten, how many calories you've burned, and how you've slept, among other things. The author gives examples of products currently on the market (such as Nike's Fuelband and the Fitbit Ultra) that have expanded the amount of data that can be collected from individuals, and opines that Apple's smart watch will capture a significant share of this market. He also predicts that this will change the world of big data analytics and gives some examples of why he believes so.

Google Bags Another Machine Learning Startup

Google acquired the machine learning startup Wavii this week, and this Wired article has some of the details about the startup, the acquisition, and how Wavii's technology may be used inside Google. The article mentions that there was a bidding war between Apple and Google for the company, so hopefully Google will be able to make this victory pay off in the near future.

Researchers Use Data from Cell Phones to Combat Outbreaks

This is an MIT Technology Review article about how epidemiologists at Harvard have been able to track the spread of diseases such as malaria by studying data generated by cell phone towers in Kenya. Using this data, they can track movement to and from regions of the country known to have a high infection rate and feed that information into predictive models that forecast how the diseases may spread. The article goes into much more detail and is a fascinating and informative read.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Risks of Predictive Analytics

[Image: Coiffed Ray-Gun]

Based on an analysis of more than a half million public posts on message boards, blogs, social media sites and news sources, IBM predicts that ‘steampunk,’ a sub-genre inspired by the clothing, technology and social mores of Victorian society, will be a major trend to bubble up, and take hold, of the retail industry. --Jan. 14, 2013 IBM Press Release

Really? Is this a good idea? Not steampunk fashion -- that's clearly a bad idea. But publicizing this data-driven prediction -- is that a good idea? Could this press release actually cause an increase in rayguns and polished brass driving goggles?

I think this illustrates one of two important, potentially negative consequences of making and communicating statistical predictions. The first risk is that predictions may sway people to follow them. The second risk is that predictions may lull people into inaction and complacency. Both of these risks may need to be actively managed to prevent advanced predictive modeling from causing more harm than good.

Recently, none other than Nate Silver indicated that if he thought his predictive models of elections were swaying the results, he would stop publishing them. There are longstanding questions about bandwagon and "back the winner" effects in polling and voting. If your predictions are widely seen as accurate, as Silver's are, then your statements may increase votes for the perceived winner and decrease them for the perceived loser. It's well known that more people report, after the fact, that they voted for a winning candidate than actually did so.

There are other ways that prediction can drive outcomes in unpredicted or undesired ways, especially when predictions are tied to action. If your predictive model estimates increased automobile traffic between two locations, and you build a highway to speed that traffic, then the "induced demand" effect (added capacity causes increased use) will almost certainly prove your predictive model correct, even if the model was predicting only noise. The steampunk prediction may fall into this category, sadly.

The other problem is exemplified by sales forecasts. If your predictions are read by the people whose effort is needed to realize the forecast results, the forecast may be less likely to come true. Your predictions are probably based on a number of assumptions, including that the sales team is putting in the same type of effort that they did last month or last year. But if forecast results are perceived as a "done deal," that assumption will be violated. A prediction is not a target, and should not be seen or communicated as such.

How can these problems be mitigated? In some cases, by better communications strategies. Instead of providing a point estimate of sales ("we're going to make $82,577.11 next week!"), you may be better off providing the endpoints of an 80% or 90% prediction interval: "if we slack off, we could make as little as $60,000, but if we work hard, we could make as much as $100,000." Of course, if you have the sort of data where you can include sales effort as a predictor, you can do even better than that.
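As a rough illustration of reporting a range instead of a point estimate, here is a minimal Python sketch. It assumes you already have simulated outcomes from some forecast model. The normal shape and the $15,000 spread are made up for the example, and `percentile` and `prediction_interval` are hypothetical helper names, not part of any particular library.

```python
import random


def percentile(samples, pct):
    """Empirical percentile (0-100), interpolating linearly between sorted values."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def prediction_interval(samples, level=0.80):
    """Central interval that contains `level` of the simulated outcomes."""
    tail = (1.0 - level) / 2.0 * 100.0
    return percentile(samples, tail), percentile(samples, 100.0 - tail)


# Hypothetical forecast distribution: the center is the point estimate quoted
# in the text; the normal shape and the $15,000 spread are assumptions.
rng = random.Random(0)
simulated_sales = [rng.gauss(82577.11, 15000.0) for _ in range(10000)]

low, high = prediction_interval(simulated_sales, level=0.80)
print(f"80% prediction interval: ${low:,.0f} to ${high:,.0f}")
```

The two endpoints are what you would communicate to the team in place of the single number. With a real model, the simulated draws would come from its forecast distribution rather than from a made-up normal.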

Another trick for keeping people motivated is to let them beat their targets most but not all of the time. How do you do this? Consider providing the 20th percentile of a forecast distribution as the target. If your model is well-calibrated, those targets will be met 80% of the time. There is extensive psychological and business research on the best way to set goals, and my (limited) understanding of it is that people who think they are doing well, but with room for improvement, are best engaged.
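The 80% figure can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: weekly sales are normal with a made-up spread, and the model is perfectly calibrated, meaning the actual result is drawn from the same distribution as the forecast. The function names are hypothetical.

```python
import random


def percentile(samples, pct):
    """Empirical percentile (0-100), interpolating linearly between sorted values."""
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def simulate_hit_rate(target_pct=20.0, weeks=2000, draws=200, seed=1):
    """Fraction of simulated weeks in which actual sales meet the target.

    Each week the target is the `target_pct` percentile of that week's
    forecast draws. Calibration is assumed: the actual outcome comes from
    the same (hypothetical) distribution the forecast was drawn from.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(weeks):
        mean = rng.uniform(60000.0, 100000.0)  # assumed weekly sales level
        spread = 12000.0                       # assumed forecast uncertainty
        forecast = [rng.gauss(mean, spread) for _ in range(draws)]
        target = percentile(forecast, target_pct)
        actual = rng.gauss(mean, spread)
        if actual >= target:
            hits += 1
    return hits / weeks


rate = simulate_hit_rate(target_pct=20.0)
print(f"Share of weeks the 20th-percentile target was met: {rate:.1%}")
```

If the real hit rate drifts far from 80%, that is a sign either that the model is miscalibrated or that the forecast itself is changing how much effort people put in.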

Returning to the upcoming steampunk sartorial catastrophe, perhaps IBM should have exercised some professional judgement, as Nate Silver seems to be doing, and just kept their big blue mouth shut on this one.