Weekly Round-Up: Data Analysis Tools, M2M, Machine Learning, and Naming Babies

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data analysis tools to naming babies. In this week's round-up:

  • Data Analysis Tools Target Non-experts
  • How M2M Data Will Dominate the Big Data Era
  • What Hackers Should Know About Machine Learning
  • Knowledge Engineering Applied to Baby Names

Data Analysis Tools Target Non-experts

Our first piece this week is an O'Reilly Strata article about some of the data analysis tools that are coming to market to give business users the analytics they need to make decisions. The article highlights several tools from a variety of companies and sorts them into three categories according to what they help you do. The article also includes links to all the companies' websites so that, if you're anything like me, you can check out every single one of them.

How M2M Data Will Dominate the Big Data Era

The Internet of Things is getting a lot of attention these days, partly because of the amount of data produced when one connected device communicates with another. This is known as machine-to-machine (M2M) data, and this Smart Data Collective article describes where much of this data may come from and how much of it could potentially be generated.

What Hackers Should Know About Machine Learning

Our third piece is a Fast Company interview with Drew Conway, co-author of the must-own book Machine Learning for Hackers. In the interview, Drew answers questions about why developers should learn machine learning, the biggest knowledge gaps they need to overcome, and the differences between a machine learning project and a development project. (Editor's Note: the image to the left links to Amazon, and if you buy the book through it, we get a small cut of the proceeds. Buy enough books through this link, and we retire to an island.)

Knowledge Engineering Applied to Baby Names

Our final piece this week is a blog post about how a company called Nameling is holding a contest to improve the algorithms behind its baby name recommendation engine. Choosing a good name for your baby matters a great deal to parents, since a bad one almost certainly leads to ridicule and tears. It will be interesting to see the results of the contest, as well as what kinds of names the recommendation engine spits out.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Read Our Other Round-Ups

Weekly Round-Up: Hadoop, Big Data vs. Analytics, Process Management, and Palantir

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Hadoop to business process management. In this week's round-up:

  • To Hadoop or Not to Hadoop?
  • What’s the Difference Between Big Data and Business Analytics?
  • What Big Data Means to BPM
  • How A Deviant Philosopher Built Palantir

To Hadoop or Not to Hadoop?

Our first piece this week is an interesting blog post about what sorts of data operations Hadoop is and isn't good for. The post can serve as a useful guide when you are deciding whether Hadoop suits what you plan to do with your data. It is organized into 5 categories of things to consider and, for each category, offers a series of questions you can ask yourself to help with the decision.

What’s the Difference Between Big Data and Business Analytics?

This is an excellent post on Cathy O'Neil's Mathbabe blog about how she distinguishes big data from business analytics. Cathy argues that what most people consider big data is really business analytics (on arguably large data sets) and that big data, in her opinion, consists of automated intelligent systems that algorithmically know what to do and need very little human intervention. She goes into more detail about the differences between the two, including some examples to drive home her point.

What Big Data Means to BPM

Continuing on the subject of intelligent systems performing business processes, our third piece this week is a Data Informed article about big data's effect on business process management. The article is an interview with Nathaniel Palmer, a veteran BPM practitioner and author. In the interview, Palmer answers questions about emerging trends in business process management, how big data is affecting its practices, and what changes it is bringing about.

How A Deviant Philosopher Built Palantir

Our last piece this week is a Forbes article about Palantir, an analytics software company that works with federal intelligence agencies and is funded by In-Q-Tel, the CIA's investment fund. The article describes the company's CEO, what the company does and whom it does it for, and delves into some of Palantir's history. Overall, the article offers a revealing look at a very interesting company.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Machine Learning, DIY Data Scientists, Games, and Helping Couples Conceive

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from machine learning to helping couples conceive. In this week's round-up:

  • Jeff Hawkins: Where Open Source and Machine Learning Meet Big Data
  • The Rise Of The DIY Data Scientist
  • Why Games Matter to Artificial Intelligence
  • Three Questions for Max Levchin About His New Startup

Jeff Hawkins: Where Open Source and Machine Learning Meet Big Data

Our first piece this week is an InfoWorld article about Jeff Hawkins, the machine learning work he and his company have been doing, and the open source project they recently released on GitHub. The project is called the Numenta Platform for Intelligent Computing (NuPIC), and its goal is to let others embed machine intelligence into their own systems. The article includes a short interview with Jeff and a link to the GitHub page where the project resides.

The Rise Of The DIY Data Scientist

This is an interesting Fast Company article about how Kaggle competition winners tend to be self-taught. The author interviews Kaggle's chief scientist, Jeremy Howard, about this phenomenon and about other findings on data scientists drawn from Kaggle's competitions. The questions cover where the winners are from, how they learned data science, and which machine learning algorithms they use.

Why Games Matter to Artificial Intelligence

This post on the IBM Research blog is an interview with Dr. Gerald Tesauro about the significance of games in the field of artificial intelligence. Dr. Tesauro was the IBM research scientist who taught Watson how to play Jeopardy. In the interview, he explains that games are an ideal training ground for machines because they simplify real life. He goes on to answer questions about how that prepares machines for other real-world problems, what he's currently working on, what Watson is doing these days, and where else machine learning can be used.

Three Questions for Max Levchin About His New Startup

Our final piece this week is an MIT Technology Review article about PayPal co-founder Max Levchin's new startup, Glow. Many people are having children later in life these days, and one downside is that many couples have trouble conceiving. Levchin has developed an iPhone app that uses data to help couples identify the optimal time for conception. In this brief interview, Levchin talks about what they are doing, why, and the degree of accuracy they hope to achieve.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Statisticians, Build Smart DC, Kirk Borne, and Treating Parkinson's

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from collecting building data to treating Parkinson's. In this week's round-up:

  • Statisticians: An Endangered Species?
  • Washington DC Launches Real-time Building Energy Data Project
  • Time Spent with Kirk Borne
  • Michael J. Fox Foundation Points Big Data At Parkinson's

Statisticians: An Endangered Species?

Our first piece this week is an interesting blog post on the Revolution Analytics blog about how statisticians are perceived and how that relates to data science. The post was inspired by an American Statistical Association Magazine article that portrayed statisticians as being left in the dust of the big data movement. The author notes how surprised he was at how little the article mentioned R, and suggests that contributing to the statistical programming language may be a good way for statisticians to keep playing an important role in data science.

Washington DC Launches Real-time Building Energy Data Project

Our next piece is a GigaOM article about Build Smart DC, a project that launched last week. The project monitors energy data from city-owned buildings at 15-minute intervals, giving management a much more granular view of energy use in the properties than ever before. This will allow them to monitor trends and make data-driven decisions that lead to more efficient energy consumption. The article also talks about the startup driving the program and some other cities that have similar projects in place.

Time Spent with Kirk Borne

Our third piece is an interesting short interview with Kirk Borne. Kirk is a Professor of Astrophysics and Computational Science at George Mason University and has been one of the most influential Big Data advocates on Twitter in recent years. He talks to the interviewer about astrophysics, big data, and data science education.

Michael J. Fox Foundation Points Big Data At Parkinson's

Our final article this week is an InformationWeek piece about how the Michael J. Fox Foundation held a Kaggle competition to see whether data scientists could help identify patients who had Parkinson's and track increases and decreases in symptoms among patients with the disease. The article highlights the winning team, some of the methods they used to build their predictive models, and how they were able to acquire the domain knowledge that ultimately helped them win the competition.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Data Science Roles, Technology Stacks, Predictive Analytics, and Michael Jordan

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data science technology stacks to Michael Jordan. In this week's round-up:

  • Five Roles You Need on Your Big Data Team
  • Choosing a Data Science Technology Stack
  • 12 Predictive Analytics Screw-ups
  • What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Five Roles You Need on Your Big Data Team

Our first piece this week is an HBR article about the different roles you need when building a data science team. Data science is a very broad field and because of this, it's difficult to find someone who has all the skills that fall under its umbrella. This article attempts to break down the skill sets into more specific roles that can work together to really create value for an organization. The article lists the different roles, describes them, and also talks about the kind of culture you need to develop in order to get everyone in the organization on board and on the same page.

Choosing a Data Science Technology Stack

This is an interesting blog post about different data science technology stacks and how we as data scientists choose the one that works best for us. The author points out that a data science stack has several layers (sourcing the data, storing it, exploring it, modeling it, etc.) and that several technologies are available for each layer. The post examines these options and even includes a survey in which you can enter the technologies you use for each layer. When the survey is complete, participants will be emailed the results.

12 Predictive Analytics Screw-ups

This is a ComputerWorld article about some of the pitfalls you would do well to avoid when performing predictive analytics. To come up with the list, the author interviewed experts at 3 data science consulting firms (Elder Research, Abbott Analytics, and Prediction Impact) about the mistakes they encounter. Take a look through them and see how many you've encountered yourself!

What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Our final piece this week is a Forbes article that uses Michael Jordan and other sports examples to drive home points about big data and how we use it in business. The author starts by drawing a parallel between the decisions managers now need to make about new technologies, opportunities, and employees and the task of evaluating Michael Jordan early in his career, when his athletic potential wasn't as obvious. The rest of the article covers the processes we go through and the data we look at when evaluating a situation and making decisions, and how big data and advances in technology improve our ability to do these things over time.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Big Data Projects, OpenGeo, Coca-Cola, and Crime-Fighting

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from big data projects to Coca-Cola. In this week's round-up:

  • 5 Big Data Projects That Could Impact Your Life
  • CIA Invests in Geodata Expert OpenGeo
  • How Coca-Cola Takes a Refreshing Approach to Big Data
  • Fighting Crime with Big Data

5 Big Data Projects That Could Impact Your Life

Our first piece this week is a Mashable article listing 5 interesting data projects. They range from one that predicts transit times in NYC to one that tracks homicides in DC to one that illustrates the prevalence of HIV in the United States. All are great examples of people doing interesting things with data that is becoming increasingly available.

CIA Invests in Geodata Expert OpenGeo

A while back, the CIA spun off a strategic investment arm called In-Q-Tel to invest in data and technologies that could benefit the intelligence community. This week, it was announced that In-Q-Tel has invested in geodata startup OpenGeo. This GigaOM article provides a little detail about the company and what it does, and also lists some of the other companies In-Q-Tel has invested in so far.

How Coca-Cola Takes a Refreshing Approach to Big Data

This is an interesting Smart Data Collective article about Coca-Cola and how they use data to drive their decisions and maintain a competitive advantage. The article describes multiple ways the company uses big data and analytics, from interacting with their Facebook followers to the formulas for their soft drinks.

Fighting Crime with Big Data

Our final piece this week is an article about how analytics platform provider Palantir helps investigators use data to find the patterns that uncover white-collar crime, which is usually well hidden. The article contains multiple quotes from Palantir's legal counsel, Ryan Taylor, about how the company works with crime-fighting agencies and what methods it employs to bring these criminals to justice.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Data Science Metro Map, Big Data Workers, Prescriptive Analytics, and Knewton

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from big data workers to educational recommendation algorithms. In this week's round-up:

  • Becoming a Data Scientist – Curriculum via Metromap
  • The Growing Need for Big Data Workers: Meeting the Challenge With Training
  • How Prescriptive Analytics Could Harness Big Data to See the Future
  • Q&A With Knewton’s David Kuntz, Maker of Algorithms

Becoming a Data Scientist – Curriculum via Metromap

For those of you who want to start learning data science but don't know where to begin, this blog post literally maps it out for you. The author has taken the broad subject of data science and created a train map similar to those found in major cities with public transportation. The tracks of data science are depicted as different-colored train lines, and the subjects within each track are depicted as stops along those lines. Very interesting and definitely worth a look!

The Growing Need for Big Data Workers: Meeting the Challenge With Training

This is a Wired article about how the need for big data workers is growing as more and more data needs to be collected, organized, analyzed, and acted upon. The article discusses the challenges of educating people and highlights a few efforts to meet them, such as those of IBM, Big Data University, and DeveloperWorks.

Speaking of data science education, Data Community DC is hosting a Natural Language Processing Basics workshop on July 27th and there are still a few seats left. You can view details and sign up here.

How Prescriptive Analytics Could Harness Big Data to See the Future

Our third piece this week is about prescriptive analytics and how organizations can use it to make data-driven improvements in their operations. The article defines prescriptive analytics, contrasts it with the more commonly used descriptive and predictive analytics, and provides some examples of how it can be useful.

Q&A With Knewton’s David Kuntz, Maker of Algorithms

Our final piece this week is an article about a company called Knewton and the interesting work it does. Knewton designs recommendation systems for educational products, which customize the learning experience and tailor it to the individual student. In this article, the author interviews David Kuntz, Knewton's Vice President of Research, about how its technology works, what kinds of things it can do, and what this means for the future of education.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Data Scientist Types, Data Protection, Travel, and Jay-Z

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data scientist types to data-collecting music apps. In this week's round-up:

  • What Kind Of Data Scientist Are You?
  • Evernote’s Three Laws of Data Protection
  • Big Data Analysis Drives Revolution In Travel
  • Samsung and Jay-Z Accused of Using New Album to Mine Customer Data

What Kind Of Data Scientist Are You?

Our first article this week is a Fast Company piece about the new ebook authored by our very own Harlan Harris, Marck Vaisman, and Sean Murphy. The ebook argues that there are actually multiple types of data scientists and describes the combination of skills and experience each type tends to have. The article provides an overview, some excerpts and graphics, and a link to the ebook as well.

Evernote’s Three Laws of Data Protection

This is a Smart Data Collective article about Evernote's stance on data protection and how it differs from that of other companies. Evernote is one of the most popular note-taking apps on the market, essentially letting you keep a copy of your brain in the cloud, where you can access it from anywhere and recall things your real brain may have forgotten. That being the case, the privacy of its users' data is of great importance to the company.

Big Data Analysis Drives Revolution In Travel

Our third piece this week is an InformationWeek article about how data is revolutionizing the travel industry. We've all had to endure the frustrations that often come along with getting from point A to point B. This article highlights several companies and explains how they are using data to operate more efficiently and improve customer experiences.

Samsung and Jay-Z Accused of Using New Album to Mine Customer Data

Our final piece this week is a Time article about how Samsung and rapper Jay-Z offered early access to Jay-Z's new album, Magna Carta Holy Grail, through an app on select Samsung mobile devices. The apparent intent was to collect data about the kinds of customers who would want the album before its official release date. The article describes some of the data the app requested and how the requests have raised questions about why Samsung and Jay-Z would need to collect that kind of data.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

The State of Recommender Technology

Reblogged with permission from Cobrain.

[Figure: social network graph]

So let’s start with the big idea that is the reason we are all here: recommendation engines. If you are reading this, you have probably already overcome the mental hurdle of the massive design and implementation challenge that recommendation engines represent; otherwise, I can’t imagine why you would have signed up! Or perhaps you don’t know what a massive design and implementation challenge recommendation engines represent. Either way, you’re in the right place: this post is an introduction to the state of the technology of recommendation systems.

Well, sort of. Here is a working state of the technology. Academia has created a series of novel machine learning and predictive algorithms that would allow scarily accurate trend analysis, recommendations, and predictions, given the right unbiased, supervised training sets of sufficient magnitude. Commercial applications in very specific domains have leveraged these insights and extremely large data sets to produce interesting results when an application is first released, but have found that the quality of these predictions decreases rapidly over time. Companies with even larger data sets that have tackled other algorithmic challenges involving supervised training sets (Google, for example) have avoided current recommender systems because of their domain specificity, and have yet to find a generic enough application.

To sum up:

Recommendation engines are really, really hard, and you need a whole heckuva lot of data to make them work.

Now go build one.

Don’t despair though! If it wasn’t hard, everyone would be doing it! We’re here precisely because we want to leverage existing techniques on interesting and novel data sets, but also to continue to push forward the state of the technology. In the process we will probably learn a lot and hopefully also provide a meaningful experience for our users. But before we get into that, let’s talk more generically about the current generation of recommender systems.

Who Does it Well?

The current big boys in the recommendation space are Amazon, Netflix, Hunch (now owned by eBay), Pandora, and Goodreads. I strongly encourage you to understand how these guys operate and what they do to create domain-specific recommendations. For example, the domains of Goodreads, Netflix, and Pandora are books, movies, and music, respectively. Recommending inside a particular domain allows you to leverage external knowledge resources that either solve scarcity issues or allow ontological reasoning, which can add a more accurate layer on top of the pure graph analyses that recommenders typically perform.

Amazon and Hunch seem to be more generic, but in fact they also have domain qualification. Amazon has the data set of all SKU-level transactions through its massive e-commerce site. Even so, Amazon has spent 10 years and a lot of money perfecting how to rank various member behaviors. Because its data is Amazon-specific, Amazon can leverage Amazon-only trends and purchasing behaviors, and it is still working on perfecting them. Hunch doesn’t have an item-specific domain but rather a system-specific one, using social and taste-making graphs to propose recommendations inside the context of social networks.

Speaking of Amazon’s decade-long effort to create a decent recommender with tons of data, I hope you’ve heard of the Netflix Prize. Netflix was so desperate for a better recommendation algorithm that it instituted an X Prize-style contest for a new algorithm for recommending movies in particular. In fact, the test methodology of the Netflix Prize has become a standard for evaluating movie recommendations, and since 2009 (when the prize was awarded) other algorithm sets have actually achieved better results, most notably Filmaster.
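The Netflix Prize scored entries by root mean squared error (RMSE) between predicted and actual ratings on held-out data, and lower is better. Below is a minimal sketch of that metric. The ratings and both "systems" are hypothetical toy values, not Netflix data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out ratings (1-5 stars) and two systems' predictions.
actual = [4, 3, 5, 2, 1]
baseline = [3.5, 3.5, 3.5, 3.5, 3.5]   # always predicts the same rating
candidate = [4.2, 2.8, 4.6, 2.5, 1.4]  # a hypothetical "smarter" model

print(rmse(baseline, actual))   # 1.5
print(rmse(candidate, actual))  # about 0.36
```

With these toy numbers, the constant baseline scores 1.5 and the candidate scores about 0.36, so the candidate would rank higher on a leaderboard that uses this metric.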

Given what these companies have tried to do, we can speak of the state of the technology more generically as follows: an “adequate” recommender system comprises the following items:

  1. An unbiased, non-scarce data set of sufficient size
  2. A suite of machine learning and predictive algorithms that traverse that data set
  3. Knowledge resources to apply transformations on the results of those algorithms

Pandora is a great example of this. They have undertaken an intensive project to detail a “music genome”, an ontological breakdown of a piece of music. The genome itself is the knowledge resource. The genomic analysis of a large number of pieces of music, taken in aggregate, is the unbiased, non-scarce data set of sufficient size. Finally, the suite of recommendation algorithms that Pandora applies to these two sets generates ranked recommendations that are interesting.
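Pandora's actual algorithms aren't described here. Still, the general pattern of ranking items by similarity over a knowledge resource can be sketched as simple content-based filtering. The sketch below assumes each song is described by a few hypothetical, analyst-scored attributes standing in for a "genome". It ranks the rest of the catalog by cosine similarity to a seed song.

```python
import math

# Hypothetical attribute vectors, each value scored 0-1 by an analyst.
# Attribute order: acoustic, minor_key, vocal_harmony, electric_guitar.
# These are illustrative stand-ins, not Pandora's real genome.
catalog = {
    "song_a": [0.9, 0.2, 0.8, 0.1],
    "song_b": [0.8, 0.3, 0.7, 0.0],
    "song_c": [0.1, 0.9, 0.2, 0.9],
    "song_d": [0.7, 0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_similar(seed, catalog, top_n=3):
    """Rank every other item by the similarity of its attributes to the seed's."""
    seed_vec = catalog[seed]
    scores = [(item, cosine(seed_vec, vec))
              for item, vec in catalog.items() if item != seed]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


for item, score in rank_similar("song_a", catalog):
    print(f"{item}: {score:.3f}")
```

Seeded with song_a, the sketch ranks song_b first and song_d second, with song_c a distant last. Here the knowledge resource (the attribute definitions) does the heavy lifting, and the algorithm itself is trivial.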

Types of Recommenders

Without getting into a formal description of recommenders, I do want to list a few of the common types of recommendation systems that exist within domain-specific contexts. To do this, I need to describe the two basic classes of algorithms that power these systems:

  1. Collaborative Filtering: recommendations based on shared behavior with other people or things. E.g. if you and I bought a widget, and I also bought a sprocket, it is likely that you would also like a sprocket.
  2. Expert Adaptive or Generative Systems: recommendations based on shared traits of people or things, or on rules about how things interact with each other in a non-behavioral way. E.g. if you play football and live in Michigan, this particular pair of cleats is great in the snow.
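Here is a toy sketch of the two classes, reusing the examples from the list. It assumes purchase histories are stored as sets and that an expert's knowledge can be written as simple trait-matching rules. The users, items, and rule format are all hypothetical, and real systems use far larger data and richer models.

```python
from collections import Counter

# Hypothetical purchase histories keyed by user.
purchases = {
    "you": {"widget"},
    "me": {"widget", "sprocket"},
    "them": {"gizmo"},
}


def collaborative_recs(user, purchases):
    """User-based collaborative filtering over purchase sets.

    Items bought by users who share purchases with `user` are scored by the
    Jaccard similarity between the two users' purchase sets.
    """
    mine = purchases[user]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared behavior, so no evidence either way
        similarity = overlap / len(mine | theirs)
        for item in theirs - mine:
            scores[item] += similarity
    return [item for item, _ in scores.most_common()]


# Hypothetical expert rules: (required traits, recommended item).
RULES = [
    ({"sport": "football", "state": "Michigan"}, "snow cleats"),
    ({"sport": "soccer"}, "shin guards"),
]


def rule_based_recs(profile, rules):
    """Expert/generative style: return items whose conditions all match the traits."""
    return [item for conditions, item in rules
            if all(profile.get(key) == value for key, value in conditions.items())]


print(collaborative_recs("you", purchases))                                 # ['sprocket']
print(rule_based_recs({"sport": "football", "state": "Michigan"}, RULES))  # ['snow cleats']
```

The collaborative version needs no knowledge of what a sprocket is, only shared behavior. The rule-based version needs no behavior at all, only someone who knows which cleats work in Michigan snow.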

In the world of recommenders, we are trying to create a semantic relationship between people and things, so we can discuss person-centric and item-centric approaches within each of these classes of algorithms. That gives us four main types of recommenders!

  1. Personalized Recommendations: a person-centric, expert adaptive model based on the person’s previous behavior or traits.
  2. Social/Collaborative Recommendations: a person-centric collaborative filtering model based on the past behavior of people similar to you, either because of shared traits or shared behavior. Note that the clustering of similar people can fall into either algorithm set, but the recommendations come from collaborative filtering.
  3. Ontological Reasoned Recommendations: an item-centric expert adaptive system that uses rules and knowledge mined with machine learning approaches to determine an inter-item relational model.
  4. Basket Recommendations: an item-centric collaborative filtering algorithm that uses inter-item relationships like “purchased together” to create recommendations.
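Basket recommendations are the easiest of the four to sketch. The minimal version below assumes orders are available as sets of item IDs (the baskets are hypothetical). It simply counts how often each pair of items appears in the same order. Real systems often normalize these raw counts so that very popular items don't dominate every list, which this sketch does not do.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical shopping baskets: one set of item IDs per order.
baskets = [
    {"widget", "sprocket", "grease"},
    {"widget", "sprocket"},
    {"widget", "sprocket", "chain"},
    {"widget", "grease"},
    {"sprocket", "chain"},
]


def build_co_occurrence(baskets):
    """For every pair of items, count how many baskets contain both."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def bought_together(item, co, top_n=2):
    """Items most often purchased alongside `item`, highest count first."""
    return [other for other, _ in co[item].most_common(top_n)]


co = build_co_occurrence(baskets)
print(bought_together("widget", co))  # ['sprocket', 'grease']
```

For "widget", the sketch returns sprocket (three shared baskets) ahead of grease (two). The model is purely item-centric: no information about who placed each order is used.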

Keep in mind, however, that these types of recommenders and classes are very loose and there is a lot of overlap!

Conclusion

Now that large-scale search has been dramatically improved and artificial intelligence knowledge bases are being constructed with a reasonable degree of accuracy, it is widely held that the next step in true AI will be effective trend and prediction analysis. Methodologies for dealing with Big Data have evolved to make this possible, and many large companies are rushing toward predictive systems with varying degrees of success. Recent approaches have revealed that near-real-time, large-data, domain-specific efforts yield interesting, if not truly predictive, results.

The overwhelming challenge is not just in engineering architectures that traverse graphs extremely well (see the picture at the top of this post), but also in finding a unique combination of data, algorithms, and knowledge that will give our applications a chance to provide truly scary, inspiring results to our users. Even though this might be a challenge, there are four very promising approaches that we can leverage within our own categories.

Stay tuned for more on this topic soon!

Weekly Round-Up: Data Scientists, Startups, Big Data Leaders, and Einstein

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data scientist job descriptions to contrasting big data and genius. In this week's round-up:

  • It's a Bird! It's a Plane! No, It's Just a Data Scientist.
  • Meet the Startups Making Machine Learning an Elementary Affair
  • What the Companies Winning at Big Data Do Differently
  • What Would Big Data Think of Einstein?

It's a Bird! It's a Plane! No, It's Just a Data Scientist.

This week, we start off with a Smart Data Collective article about how typical data scientist job descriptions tend to be an unrealistic wishlist of everything the hiring organization thinks a data scientist is. The article notes how poorly defined the term data scientist is and how it covers at least two roles, data management and data analytics, each of which takes up a substantial amount of a person's time.

Meet the Startups Making Machine Learning an Elementary Affair

Next up, we have a GigaOM article about startups that are trying to make machine learning tools that business users can use. The article lists 5 startups and talks a little about what each one does and what they're trying to produce.

What the Companies Winning at Big Data Do Differently

This Bloomberg article examines a survey by Tata Consultancy Services of large companies with substantial investments in big data technologies and explains how the companies getting a high return on those investments differ from the companies that are not.

What Would Big Data Think of Einstein?

Our final article this week is a BBC piece that asks what happens to genius and big ideas in a world where big data gets so much attention. The author argues that coming up with answers becomes relatively easy once you have the data and know what you want to measure. The problem is that this approach looks backward and neglects the creativity and imagination it takes to look toward the future.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Read Our Other Round-Ups

Weekly Round-Up: Industrial Internet, Business Culture, Visualization, and Beer Recommendations

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from the Industrial Internet to beer recommendations. In this week's round-up:

  • The Googlization Of GE
  • 10 Qualities a Data-Friendly Business Culture Needs
  • Interview with Miriah Meyer - Microsoft Faculty Fellow and Visualization Expert
  • Recommendation System in R

The Googlization Of GE

This is an interesting Forbes article about GE, the Internet of Things (which GE calls the Industrial Internet), and how the company is trying to become to that space what Google has become to consumer data.

10 Qualities a Data-Friendly Business Culture Needs

Running a data-driven organization requires not only the right talent, tools, and infrastructure to meet the organization's objectives but also a data-friendly culture, which is the premise of this article. The author identifies 10 qualities that make for a better environment for fostering innovative, data-driven processes.

Interview with Miriah Meyer - Microsoft Faculty Fellow and Visualization Expert

This post is part of Jeff Leek's interview series on his Simply Stats blog. This week Jeff interviewed Miriah Meyer, who is an expert on data visualization. The interview includes questions about her work, background, influences, and advice she has for data scientists about visualization.

Recommendation System in R

This is a fun blog post about building a beer recommendation system using the R statistical programming language. The author walks us through the process he followed, includes snippets of his code, and even shows off the resulting app, where you choose a beer you like and it recommends other, similar beers.
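The post's own code is in R and isn't reproduced here. To illustrate the general "recommend beers similar to the one you like" idea, here is a minimal item-to-item sketch in Python. It assumes a tiny, made-up ratings table and uses cosine similarity computed over the users who rated both beers; the beer names, users, ratings, and the `recommend` helper are all hypothetical, and the original app may measure similarity differently.

```python
import math

# Hypothetical ratings: beer -> {user: rating}. Illustrative data only.
RATINGS = {
    "Pale Ale": {"ann": 5, "bob": 4, "cat": 1},
    "IPA":      {"ann": 4, "bob": 5, "cat": 2},
    "Stout":    {"ann": 1, "bob": 2, "cat": 5},
    "Porter":   {"ann": 2, "bob": 1, "cat": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two beers, using only users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def recommend(liked, ratings=RATINGS, top_n=2):
    """Return the top_n other beers most similar to the `liked` beer."""
    scores = [
        (other, cosine_similarity(ratings[liked], ratings[other]))
        for other in ratings
        if other != liked
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [beer for beer, _ in scores[:top_n]]


print(recommend("Pale Ale"))
```

Running it prints `['IPA', 'Porter']`. Cosine similarity on raw ratings is about the simplest choice; mean-centering each user's ratings first (so-called adjusted cosine) is a common refinement when users rate on different scales.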

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Computer Vision, Machine Learning, Benchmarking, and R Packages

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from computer vision to popular R packages. In this week's round-up:

  • Google Explains How AI Photo Search Works
  • Matter Over Mind in Machine Learning
  • Principles of ML Benchmarking
  • A List of R Packages, By Popularity

Google Explains How AI Photo Search Works

This is an interesting blog post about how Google recently enhanced its image search using computer vision and machine learning algorithms. The post describes in layman's terms how the algorithms work and how they classify pictures. It also links to Google's research blog, where the original announcement was made.

Matter Over Mind in Machine Learning

This is a post on the BigML blog about the work of Dr. Kiri Wagstaff from NASA's Jet Propulsion Laboratory. The post highlights a specific paper in which she argues that, instead of aiming for incremental, abstract improvements in machine learning processes, we should focus on results that translate into measurable impact for society at large. The author explains in more detail what that means, plays devil's advocate a little, and links to Wagstaff's paper for those who would like to read more.

Principles of ML Benchmarking

This is a post on the Wise.io blog about how to benchmark machine learning algorithms. It is structured as a thought exercise: the author starts by considering the purpose of benchmarking, why we should do it, and what our goals should be. From there, he derives a logical set of benchmarking guidelines. The post lists each guiding principle along with steps you can take to make sure you abide by it.

A List of R Packages, By Popularity

Our last article this week is a post on the Revolution Analytics blog that lists the top R packages in order of popularity. Some of the most popular include plyr, digest, ggplot2, and colorspace. Check out the list, see where your favorite packages rank, and maybe discover some useful packages you didn't know about!

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: NSA, Data Science History, Best Practices, and Robot Writers

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from the NSA's data collection practices to machines writing for the CIA. In this week's round-up:

  • Under the Covers of the NSA’s Big Data Effort
  • A Very Short History Of Data Science
  • 7 Habits of Highly Successful Big Data Pioneers
  • CIA Invests in Narrative Science and Its Automated Writers

Under the Covers of the NSA’s Big Data Effort

This is an interesting article about the technologies the NSA uses in its data collection practices and what the agency can and can't do with them. The article also speculates about how much data the NSA is able to collect and analyze.

A Very Short History Of Data Science

For those interested in how data science originated and has progressed up to the present day, this Forbes article is a worthwhile read. It starts in 1962 with John W. Tukey's paper "The Future of Data Analysis" and walks you through major milestones in the field up to September 2012, when Tom Davenport and DJ Patil declared data scientist the sexiest job of the 21st century.

7 Habits of Highly Successful Big Data Pioneers

In the spirit of the 7 Habits of Highly Effective People, this Smart Data Collective article lists 7 habits for succeeding as a big data practitioner. The habits listed range from planning properly and making wise financial decisions when evaluating technologies to being flexible and adaptable when obstacles present themselves.

CIA Invests in Narrative Science and Its Automated Writers

This is an interesting article about a company called Narrative Science, whose services will be used by the CIA and the broader intelligence community in the near future. The company's product automatically transforms data into sentences and is currently used to write sports summaries from box scores and earnings reports from stock data.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Big Data ROI, Statistics, GE, and China

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Big Data's return on investment to its progress in China. In this week's round-up:

  • Big Data ROI Still Tough To Measure
  • What Statistics Should Do About Big Data
  • GE CEO Jeff Immelt’s Big Data Bet
  • In China, Big Data Is Becoming Big Business

Big Data ROI Still Tough To Measure

This is an article about how difficult it is to measure the return on investment of big data solutions. Given all the hype in the media, business leaders naturally want to know whether their investments in these solutions are paying off. The article goes on to describe some of the complexities involved and talks about some of the obstacles that will have to be overcome in order for business leaders to feel more satisfied with the solutions they invest in.

What Statistics Should Do About Big Data

This is a blog post by Jeff Leek continuing the recent discussion about the role of statistics in big data. Jeff describes what he thinks some of the issues raised in earlier conversations boil down to and then gives his thoughts on what statisticians need to do to avoid being left out of the big data discussion. He concludes with a list of things he'd like to see come out of these discussions to help the discipline progress to the next level.

GE CEO Jeff Immelt’s Big Data Bet

This is a summary of GE CEO Jeff Immelt's interview at the D11 conference this past week, which centered around how data collected from sensors can make machines more efficient - what GE calls the Industrial Internet. The article provides some examples of where GE is trying to implement these practices and explains why it's important for GE to be doing this. If you'd like to see the full interview, you can find the video here.

In China, Big Data Is Becoming Big Business

Our last article this week is a Bloomberg BusinessWeek piece about how big data is progressing in China. Because China is such a large country and an increasing number of its citizens are using technology, the quantity of data generated there is growing rapidly. The article discusses how data scientists will be in high demand there in the near future and how both the government and businesses are building infrastructure to support their data needs.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: WibiData, Big Data Trends, Analytics Processes, and Human Trafficking

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Big Data trends to using data to fight human trafficking. In this week's round-up:

  • WibiData Gets $15M to Help It Become the Hadoop Application Company
  • 7 Big Data Trends That Will Impact Your Business
  • Want Better Analytics? Fix Your Processes
  • How Big Data is Being Used to Target Human Trafficking

WibiData Gets $15M to Help It Become the Hadoop Application Company

It was announced this week that Cloudera co-founder Christophe Bisciglia's new company, WibiData, has raised $15 million in a Series B round of financing. WibiData is looking to become a dominant player in the market by selling software that lets companies build consumer-facing applications on Hadoop. This article has additional details about the company and what they are trying to do.

7 Big Data Trends That Will Impact Your Business

We're all interested in what the future holds for data science and Big Data, and this article identifies 7 trends that the author thinks will continue to develop in the years ahead. General themes among the trends include predictions about platforms, structure, and programming languages.

Want Better Analytics? Fix Your Processes

In order to succeed in running a data-driven organization, you must have the proper analytical business processes in place so that any insights derived from your efforts can be applied to improving operations. In this article, the author proposes 5 principles to ensure analytics are used correctly and deliver the results the organization wants.

How Big Data is Being Used to Target Human Trafficking

Our last article this week is a piece about Google's recent announcement that it will partner with other organizations to use data analytics to help fight human trafficking. Part of the effort involves aggregating previously dispersed data, and another part consists of developing algorithms to identify patterns and better predict trafficking trends. The article lists additional details about the project.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Google's Quantum Computer, Data Science vs. Statistics & BI, Business Computing, and Detecting Terrorism Networks

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Google's new quantum computer to detecting terrorist networks. In this week's round-up:

  • Google Buys a Quantum Computer
  • Statistics vs. Data Science vs. BI
  • Could Business Computing Be Done by Users Without Technical Experience?
  • Can Math Models Be Used to Detect Terrorism Networks?

Google Buys a Quantum Computer

With the ever-increasing amount and complexity of data out there, companies at the edge of technology are starting to look for faster and more efficient ways to process, analyze, and use the data available to them. That is what Google seems to be working toward: it has purchased a quantum computer and is partnering with NASA to find ways to apply quantum computing to machine learning. This article has more details about how Google plans to use it and which other companies are also looking into quantum computing.

Statistics vs. Data Science vs. BI

This is an interesting Smart Data Collective article that attempts to differentiate between statistics, data science, and business intelligence. The author is a statistician, but he ultimately feels that "data scientist" more accurately describes the work he does, which is what led him to make the comparison. Check it out and see how much you agree or disagree with his descriptions of each.

Could Business Computing Be Done by Users Without Technical Experience?

This is an article about business computing, how most of it is done in traditional spreadsheet programs, and the difficulties and challenges that have come with that. The author describes where spreadsheets are useful and where they fall short. At the end, he introduces a desktop BI solution called esCalc that attempts to address many of these shortcomings and explains how it does so.

Can Math Models Be Used to Detect Terrorism Networks?

This article is about a paper published last month in the SIAM Journal on Discrete Mathematics. The subject of the paper was disrupting information flow in complex real-world networks, such as terrorist organizations. The article describes the similarities between terrorist networks and other hierarchical organizations and even some social networks. The article also talks about the type of model the authors are using and how the model works.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Open Data Order, Data Discovery, Andrew Ng, and Connected Devices

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Open Data to connected devices. In this week's round-up:

  • Open Data Order Could Save Lives, Energy Costs And Make Cool Apps
  • Four Types of Discovery Technology
  • Andrew Ng and the Quest for the New AI
  • Our Connected Future

Open Data Order Could Save Lives, Energy Costs And Make Cool Apps

This is a TechCrunch article about President Obama's recent Open Data Order, an executive order intended to make more government agency data openly available for analysis. The article goes on to describe some of the ways open data has been used in the past and links to Project Open Data's GitHub page, where you can find more details.

Four Types of Discovery Technology

This Smart Data Collective post discusses the value of discovery in data analytics and business. The author claims there are four types of discovery for business analytics (event discovery, data discovery, information discovery, and visual discovery), and he explains each one and the differences between them in some detail.

Andrew Ng and the Quest for the New AI

This is an interesting Wired piece about Andrew Ng, best known as the Stanford machine learning professor who also co-founded Coursera. The article talks about Ng's background and interest in artificial intelligence as well as some of the deep learning projects he is working on. It goes on to explain a little about what deep learning is and how it may evolve in the future.

Our Connected Future

Our final piece this week is a GigaOM article about connected devices and how they will become more prevalent in the future. The article highlights some very interesting devices, explains what they do, and describes how they are being used. The article also talks about the data that can be collected from connected devices such as these and different ways that this data can be used.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Big Data Value, Education, Social Data Analysis, and Saving the Planet

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Big Data's impact on education to using data to reduce global violence. In this week's round-up:

  • The Value of Big Data Isn't the Data
  • Big Data Will Revolutionize Learning
  • Data Analysis Should Be a Social Event
  • Using Big Data to Save the Planet

The Value of Big Data Isn't the Data

This is a Harvard Business Review blog post by Kris Hammond, CTO of Narrative Science and a Northwestern faculty member, about where he believes the value in Big Data lies. Hammond proposes that the value is in getting machines to conduct the data analysis we need and to communicate their findings in an intuitive way. In the post, he explains in more detail why he believes this is so valuable and provides explanations and diagrams outlining the steps that can be taken to put these processes in place.

Big Data Will Revolutionize Learning

This interesting Smart Data Collective article is about how technology now allows us to capture information about virtually everything that happens in education and what this means for its future. The possibilities include customizing content for individual students, reducing dropout rates, and enhancing the overall learning experience, all resulting in improved student outcomes. The article discusses each of these briefly and describes how they are being, and will continue to be, implemented.

Data Analysis Should Be a Social Event

This is another interesting HBR article, this one advocating a more social approach to solving data analysis problems. The authors urge us to use an approach familiar to anyone who has attended a data dive or hackathon: get a group of people with different perspectives together to brainstorm ideas about how best to solve the problem at hand. The article points out that this approach doesn't just work well at hackathons; it has also been implemented with great success at companies.

Using Big Data to Save the Planet

Our final article this week is a Slashdot piece about how the U.S. State Department is partnering with groups from around the world and using data analytics to help reduce violence in countries where it is a major problem. According to the article, they are using an analytics tool named Senturion to track data that can be obtained from social networks, economic data, and other sources to provide output that can help determine what types of resources are necessary on the ground in those troubled countries. The article mentions some of the countries where this analytics system is helping to identify conflict trends and also provides some examples of specific initiatives it is providing assistance with.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Ford's Data, Apple's iWatch, Wavii's Acquisition, and Fighting Malaria

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from how Ford is leveraging data to improve its operations to combating malaria using data from cell phones. In this week's round-up:

  • How Data is Changing the Car Game for Ford
  • How Apple's iWatch Will Push Big Data Analytics
  • Google Bags Another Machine Learning Startup
  • Researchers Use Data from Cell Phones to Combat Outbreaks

How Data is Changing the Car Game for Ford

This is a GigaOM article about how Ford Motor Company is using data to build better cars and better customer experiences. The article goes into some detail about how the company is doing both, for example by offering data products with some of its vehicles that give owners information about their car's performance. The author also quotes some of the people in charge of Ford's data efforts about internal data processes and some of the changes the company has had to make to become more data-driven.

How Apple's iWatch Will Push Big Data Analytics

This is a Smart Data Collective article about what Apple's rumored iWatch could mean for Big Data. According to the article, the watch will be able to capture data about where you've been, what you've eaten, how many calories you've burned, and how you've slept, among other things. The author gives examples of products already on the market (such as Nike's FuelBand and the Fitbit Ultra) that have expanded the amount of data that can be collected from individuals and argues that Apple's smart watch will capture a significant share of this market. He also predicts that this will change the world of big data analytics and gives some examples of why he believes this.

Google Bags Another Machine Learning Startup

Google acquired machine learning startup Wavii this week, and this Wired article has some details about the startup, the acquisition, and how Wavii's technology may be used inside Google. The article mentions that Apple and Google were in a bidding war for the company, so hopefully Google will be able to make this victory pay off in the near future.

Researchers Use Data from Cell Phones to Combat Outbreaks

This is an MIT Technology Review article about how epidemiologists at Harvard have been able to track the spread of diseases such as malaria by studying data generated from cell phone towers in Kenya. Using this data, they can track movement to and from regions of the country known to have high infection rates and feed that information into predictive models that forecast how the diseases may spread. The article goes into much more detail and is a fascinating, informative read.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Weekly Round-Up: Probabilistic Programming, Tech Startups, Data Viz Elements, and Super Mario Bros.

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from probabilistic programming to machines playing video games. In this week's round-up:

  • What is Probabilistic Programming?
  • 5 Ways for Tech Start-Ups to Attract Analytics Talent
  • The Three Elements of Successful Data Visualizations
  • AI Solves Super Mario Bros and Other NES Games

What is Probabilistic Programming?

This is an interesting O'Reilly article introducing probabilistic programming. The article talks about what probabilistic programming is, how it differs from regular high-level programming, and intuitively explains how it works. The author also explains how he believes the technology's development will progress and the impact it will have on data science and other technologies.

5 Ways for Tech Start-Ups to Attract Analytics Talent

For those looking to hire analytical talent, this article provides some practical pointers for hiring a data scientist. These pointers focus on some of the softer skills that are necessary to really excel in these types of roles and also on structuring an environment where your data scientists are properly motivated to do their absolute best work.

The Three Elements of Successful Data Visualizations

This is a Harvard Business Review article about the elements necessary for making great data visualizations. The article highlights three: understanding the audience, setting up a framework, and telling a story. It explains in a little more detail why each of these is important.

AI Solves Super Mario Bros and Other NES Games

This article is about an interesting and fun application of machine learning: teaching a machine to solve video games. It revolves around a paper by computer scientist Tom Murphy describing how he accomplished this using lexicographic orderings. The article covers Murphy's research and how he went about it, and it links to his paper for those who would like more in-depth reading on the subject.
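Roughly, Murphy's approach watches a recorded human playthrough, looks for lexicographic orderings over bytes of the game's memory whose values go up as the player makes progress, and then searches for button inputs that make those orderings increase. As a simplified sketch of just the core check, here is some Python that tests whether a candidate ordering over memory addresses never decreases across a playthrough. It is not Murphy's implementation: the 4-byte "RAM", the snapshots, the meaning assigned to each address, and the helper names are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical 4-byte "RAM" snapshots from a made-up playthrough.
# Assume address 2 holds the level number and address 0 the player's
# position within the level (illustrative only, not real NES addresses).
SNAPSHOTS: List[List[int]] = [
    [10, 7, 1, 0],
    [40, 3, 1, 0],
    [5, 9, 2, 0],  # new level: position resets, level number goes up
]


def project(snapshot: Sequence[int], ordering: Sequence[int]) -> Tuple[int, ...]:
    """Read the bytes at the given addresses, most significant address first."""
    return tuple(snapshot[addr] for addr in ordering)


def lex_less_equal(a: Sequence[int], b: Sequence[int], ordering: Sequence[int]) -> bool:
    """True if snapshot a <= snapshot b under the lexicographic ordering."""
    # Python compares tuples lexicographically: the first differing element decides.
    return project(a, ordering) <= project(b, ordering)


def consistent_with_playthrough(
    snapshots: Sequence[Sequence[int]], ordering: Sequence[int]
) -> bool:
    """True if the ordering never decreases between consecutive snapshots."""
    return all(
        lex_less_equal(prev, curr, ordering)
        for prev, curr in zip(snapshots, snapshots[1:])
    )


print(consistent_with_playthrough(SNAPSHOTS, [2, 0]))  # True: (1, 10) <= (1, 40) <= (2, 5)
print(consistent_with_playthrough(SNAPSHOTS, [0]))     # False: position drops from 40 to 5
```

The example shows why the ordering matters: position alone drops when a new level starts, but ranking the level byte ahead of the position byte yields a measure that keeps going up, which is the kind of signal a game-playing search can then try to maximize.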

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.
