Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have four fascinating articles on topics ranging from sports analytics to drug side effects. In this week's round-up:
- How Numbers Can Reveal Hidden Truths About Sports
- How and Why LinkedIn is Becoming an Engineering Powerhouse
- Introducing Kaggle Connect: Data Science Consulting via Kaggle
- Unreported Drug Side Effects Found Using Internet Search Data
This article covers a sports analytics study done at MIT that examines which factors matter in the success of field goals. The study analyzes 11,896 NFL field goal attempts from 2000 through 2011 and debunks some common misconceptions, such as the belief that calling a time-out before the kick to put extra pressure on the kicker will increase the likelihood of a missed field goal. The article also gives a brief history of sports analytics and mentions another interesting study about the value of flexibility in baseball roster construction.
This GigaOM article follows the changes in LinkedIn's data infrastructure over the last five years. This includes setting up the company's Hadoop infrastructure, their Voldemort distributed database, a scheduler for batch processes called Azkaban, a message broker system named Kafka, and the company's new Espresso database. The architecture combines online, offline, and nearline systems that each perform the necessary functions as efficiently as possible and allow the company to continue to scale effectively.
This is an interesting post from the Kaggle blog introducing the company's new offering, called Kaggle Connect. Connect is a consulting platform that helps match top competitors in Kaggle competitions with companies that need machine learning and predictive analytics projects completed. The post says that the intent behind the platform is to create a McKinsey in the Cloud, not a Mechanical Turk for Data Science. The post goes on to describe the platform in more detail and includes a map of where on the planet the Connect participants reside.
This is an interesting NY Times article about how a group of scientists from Microsoft, Stanford, and Columbia were able to detect evidence of unreported prescription drug side effects before the warning system used by the Food and Drug Administration did. They were able to do this by mining data from the search engines of Google, Microsoft, and Yahoo. The article goes on to mention the drugs that were analyzed in the study and provides more details about some of the group's findings.
That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.