data business

Endgame hosts DIDC's Data and Cyber Security August Event

Our (lucky number) 13th DIDC meetup took place at the spacious offices of Endgame in Clarendon, VA. Endgame very graciously provided incredible gourmet pizza (and beer) for all those who attended.

Beyond such excellent beverages and food, attendees were treated to four separate and compelling talks. For those of you who could not attend, a little information about the talks and speakers, along with contact information and the slides, is below!

Selling Data Science: Validation

We are all familiar with the phrase "We cannot see the forest for the trees", and this certainly applies to us as data scientists.  We can become so involved with what we're doing, what we're building, the details of our work, that we don't know what our work looks like to other people.  Often we want others to understand just how hard it was to do what we've done, just how much work went into it, and sometimes we're vain enough to want people to know just how smart we are.

So what do we do?  How do we validate one action over another?  Do we build the trees so others can see the forest?  Must others know the details to validate what we've built, or is it enough that they can make use of our work?

We are all made equal by our limitation to 24 hours in a day, and we must choose what we listen to and what we don't, what we focus on and what we don't.  The people who make use of our work must do the same.  A well-known philosophical thought experiment, often attributed to George Berkeley, asks, "If a tree falls in the woods and no one is around to hear it, does it make a sound?"  If we explain all the details of our work, and no one gives the time to listen, will anyone understand?  To what will people give their time?

Let's suppose that we can successfully communicate all the challenges we faced and overcame in building our magnificent ideas (as if anyone would sit still that long).  What then?  Thomas Edison is famous for saying, “I have not failed. I've just found 10,000 ways that won't work,” but today we buy lightbulbs that work.  Who remembers all the details about the different ways he failed?  "It may be important for people who are studying the thermodynamic effects of electrical currents through materials." Ok, it's important to that person to know the difference, but for the rest of us it's still not important.  We experiment, we fail, we overcome, thereby validating our work so that others don't have to.

Better to teach a man to fish than to provide for him forever, but there are an infinite number of ways to successfully fish.  Some approaches may be nuanced in their differences, but others may be so wildly different they're unrecognizable, unbelievable, and invite incredulity.  The catch (no pun intended) is that methods are valid because they yield measurable results.

It's important to catch fish, but success is neither consistent nor guaranteed, and groups of people may fish together so that after sharing their bounty everyone is fed.  What if someone starts using this unrecognizable and unbelievable method of fishing?  Will the others accept this "risk" and share their fish with those who won't use the "right" fishing technique, their technique?  Even if it works the first time, that may simply be a fluke, they say, and we certainly can't waste any more resources "risking" hungry bellies, now can we?

So does validation lie in the method or the results?  If you're going hungry you might try a new technique, or you might have faith in what's worked until the bitter end.  If a few people can catch plenty of fish for the rest, let the others experiment.  Maybe you're better at making boats, so both you and the fishermen prosper.  Perhaps there's someone else willing to share the risk because they see your vision, your combined efforts giving you both a better chance at validation.

If we go along with what others are comfortable with, they'll provide fish.  If we have enough fish for a while, we can experiment and potentially catch more fish in the long run.  Others may see the value in our experiments and provide us fish for a while until we start catching fish.  In the end you need fish, and if others aren't willing to give you fish you have to get your own fish, whatever method yields results.

Weekly Round-Up: Hadoop, Big Data vs. Analytics, Process Management, and Palantir

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Hadoop to business process management. In this week's round-up:

  • To Hadoop or Not to Hadoop?
  • What’s the Difference Between Big Data and Business Analytics?
  • What Big Data Means to BPM
  • How A Deviant Philosopher Built Palantir

To Hadoop or Not to Hadoop?

Our first piece this week is an interesting blog post about what sorts of data operations Hadoop is and isn't good for. The post can serve as a useful guide when trying to figure out whether or not you should use Hadoop to do what you're thinking of doing with your data. It is organized into 5 categories of things you should consider and contains a series of questions you can ask yourself for each of the categories to help with your decision-making.

What’s the Difference Between Big Data and Business Analytics?

This is an excellent post on Cathy O'Neil's Mathbabe blog about how she distinguishes big data from business analytics. Cathy argues that what most people consider big data is really business analytics (on arguably large data sets) and that big data, in her opinion, consists of automated intelligent systems that algorithmically know what to do and need very little human interference. She goes into more detail about the differences between the two, including some examples to drive home her point.

What Big Data Means to BPM

Continuing on the subject of intelligent systems performing business processes, our third piece this week is a Data Informed article about big data's effect on business process management. The article is an interview with Nathaniel Palmer, a veteran BPM practitioner and author. In the interview, Palmer answers questions about what kinds of trends are emerging in business process management, how big data is affecting its practices, and what changes are being brought about because of it.

How A Deviant Philosopher Built Palantir

Our last piece this week is a Forbes article about Palantir, an analytics software company that works with federal intelligence agencies and is funded by In-Q-Tel - the CIA's investment fund. The article describes the company's CEO, what the company does, and who it does it for, and delves into some of Palantir's history. Overall, the article provides an interesting look at an unusual company.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Read Our Other Round-Ups

Weekly Round-Up: Machine Learning, DIY Data Scientists, Games, and Helping Couples Conceive

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from machine learning to helping couples conceive. In this week's round-up:

  • Jeff Hawkins: Where Open Source and Machine Learning Meet Big Data
  • The Rise Of The DIY Data Scientist
  • Why Games Matter to Artificial Intelligence
  • Three Questions for Max Levchin About His New Startup

Jeff Hawkins: Where Open Source and Machine Learning Meet Big Data

Our first piece this week is an InfoWorld article about Jeff Hawkins, the machine learning work that he and his company have been doing, and the open source project they've recently released on GitHub. The project's name is the Numenta Platform for Intelligent Computing (NuPIC) and its goal is to allow others to embed machine intelligence into their own systems. The article has a short interview with Jeff and a link to the GitHub page where the project resides.

The Rise Of The DIY Data Scientist

This is an interesting Fast Company article about how Kaggle competition winners tend to be self-taught. The author of the article interviews Kaggle's chief scientist Jeremy Howard about this phenomenon and other interesting findings about data scientists derived from Kaggle's competitions. Some of the questions inquire about where the winners are from, how they learned data science, and what machine learning algorithms they use.

Why Games Matter to Artificial Intelligence

This blog post on the IBM Research blog is an interview with Dr. Gerald Tesauro about the significance of games in the Artificial Intelligence field. Dr. Tesauro was the IBM research scientist who taught Watson how to play Jeopardy. In the interview, he explains how games make an ideal training ground for machines because they tend to simplify real life. He goes on to answer questions about how that prepares the machines for transitioning to other real-world problems, what he's currently working on, what Watson is doing these days, and where else machine learning can be used.

Three Questions for Max Levchin About His New Startup

Our final piece this week is an MIT Technology Review article about PayPal co-founder Max Levchin's new startup called Glow. A lot of people are having children later in life these days and one downside of this is that many couples have trouble trying to conceive. Levchin has developed an iPhone app that uses data to help couples identify the optimal time for conception. In this brief interview, Levchin talks about what they are doing, why, and the degree of accuracy they hope to achieve.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: Data Science Roles, Technology Stacks, Predictive Analytics, and Michael Jordan

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data science technology stacks to Michael Jordan. In this week's round-up:

  • Five Roles You Need on Your Big Data Team
  • Choosing a Data Science Technology Stack
  • 12 Predictive Analytics Screw-ups
  • What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Five Roles You Need on Your Big Data Team

Our first piece this week is an HBR article about the different roles you need when building a data science team. Data science is a very broad field and because of this, it's difficult to find someone who has all the skills that fall under its umbrella. This article attempts to break down the skill sets into more specific roles that can work together to really create value for an organization. The article lists the different roles, describes them, and also talks about the kind of culture you need to develop in order to get everyone in the organization on board and on the same page.

Choosing a Data Science Technology Stack

This is an interesting blog post about different data science technology stacks and how we as data scientists go about choosing one that works best for us. The author points out that there are several layers to a data science stack - sourcing the data, storing it, exploring it, modeling it, etc. - and there are several technological options available for each layer. The post examines these different options and even has a survey where you can enter the technologies you use for each layer. When the survey is complete, those who participated will be emailed the results.

12 Predictive Analytics Screw-ups

This is a ComputerWorld article about some of the pitfalls you would do well to avoid when performing predictive analytics. To come up with this list, the author interviewed experts at 3 data science consulting firms - Elder Research, Abbott Analytics, and Prediction Impact - about the different mistakes they encounter. Take a look through them and see how many you've encountered yourself!

What Michael Jordan Can Teach Us About Big Data, Strategy And Innovation

Our final piece this week is a Forbes article that uses Michael Jordan and other sports examples to drive home points about big data and how we use it in business. The author starts out by drawing a parallel between the types of decisions managers need to make these days about new technologies, opportunities, and employees and the evaluation of Michael in his early days, when his athletic potential wasn't as obvious. He continues through the rest of the article writing about the processes we go through, the data we look at in our attempts to evaluate a situation and make appropriate decisions, and how big data and advances in technology improve our abilities to do all these things over time.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: Big Data Projects, OpenGeo, Coca-Cola, and Crime-Fighting

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from big data projects to Coca-Cola. In this week's round-up:

  • 5 Big Data Projects That Could Impact Your Life
  • CIA Invests in Geodata Expert OpenGeo
  • How Coca-Cola Takes a Refreshing Approach to Big Data
  • Fighting Crime with Big Data

5 Big Data Projects That Could Impact Your Life

Our first piece this week is a Mashable article listing 5 interesting data projects. The projects range from one that projects transit times in NYC to one that tracks homicides in DC to one that illustrates the prevalence of HIV in the United States. All are great examples of people doing interesting things with data that is becoming increasingly available.

CIA Invests in Geodata Expert OpenGeo

A while back, the CIA spun off a strategic investment arm called In-Q-Tel to make investments in data and technologies that could benefit the intelligence community. This week, it was announced that they have invested in geo-data startup OpenGeo. This GigaOM article provides a little detail about the company and what they do and also lists some of the other companies In-Q-Tel has invested in thus far.

How Coca-Cola Takes a Refreshing Approach to Big Data

This is an interesting Smart Data Collective article about Coca-Cola and how they use data to drive their decisions and maintain a competitive advantage. The article describes multiple ways the company uses big data and analytics, from interacting with their Facebook followers to the formulas for their soft drinks.

Fighting Crime with Big Data

Our final piece this week is an article about how analytics platform provider Palantir helps investigators find patterns to uncover white-collar crime, which is usually hidden using data. The article contains multiple quotes from Palantir's legal counsel Ryan Taylor about how they work with crime-fighting agencies and what methods they employ to bring these criminals to justice.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: Data Scientist Types, Data Protection, Travel, and Jay-Z

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data scientist types to data-collecting music apps. In this week's round-up:

  • What Kind Of Data Scientist Are You?
  • Evernote’s Three Laws of Data Protection
  • Big Data Analysis Drives Revolution In Travel
  • Samsung and Jay-Z Accused of Using New Album to Mine Customer Data

What Kind Of Data Scientist Are You?

Our first article this week is a Fast Company piece about the new ebook our very own Harlan Harris, Marck Vaisman, and Sean Murphy authored. The ebook is about how there are actually multiple types of data scientists and the different combinations of skills and experience each type tends to have. The article provides some overview, some excerpts and graphics, and a link to the ebook as well.

Evernote’s Three Laws of Data Protection

This is a Smart Data Collective article about Evernote's stance on data protection and how it differs from other companies. Evernote is one of the most popular note-taking apps on the market, essentially letting you keep a copy of your brain out in the cloud where you can access it from anywhere and remember things your real brain may have forgotten. That being the case, the privacy of their users' data is of great importance to them.

Big Data Analysis Drives Revolution In Travel

Our third piece this week is an InformationWeek article about how data is revolutionizing the travel industry. We've all had to endure the frustrations that often come along with getting from point A to point B. This article highlights several companies and explains how they are using data to operate more efficiently and improve customer experiences.

Samsung and Jay-Z Accused of Using New Album to Mine Customer Data

Our final piece this week is a Time article about how Samsung and rapper Jay-Z offered early access to Jay's new album Magna Carta Holy Grail through an app on select Samsung mobile devices. The intent seemed to be to collect some data about the types of customers that would want access to the album before the official release date. This article describes some of the data the app requested and talks about how the scope of that collection has raised some eyebrows.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: Data Scientists, Startups, Big Data Leaders, and Einstein

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data scientist job descriptions to contrasting big data and genius. In this week's round-up:

  • It's a Bird! It's a Plane! No, It's Just a Data Scientist.
  • Meet the Startups Making Machine Learning an Elementary Affair
  • What the Companies Winning at Big Data Do Differently
  • What Would Big Data Think of Einstein?

It's a Bird! It's a Plane! No, It's Just a Data Scientist.

This week, we start off with a Smart Data Collective article about how typical data scientist job descriptions tend to be composed of an unrealistic wishlist of things the hiring organization thinks a data scientist is. The article mentions how the term data scientist is very unclear in nature and how it is made up of at least two roles - data management and data analytics - both of which take up a substantial amount of a person's time.

Meet the Startups Making Machine Learning an Elementary Affair

Next up, we have a GigaOM article about startups that are trying to make machine learning tools that business users can use. The article lists 5 startups and talks a little about what each one does and what they're trying to produce.

What the Companies Winning at Big Data Do Differently

This Bloomberg article examines a survey done by Tata Consultancy Services on large companies with substantial investments in big data technologies and explains what the differences are between companies that are getting a high return on these investments and companies that are not.

What Would Big Data Think of Einstein?

Our final article this week is a BBC piece that asks what happens to genius and big ideas in a world where big data gets so much attention. The author says that coming up with answers becomes relatively easy once you have the data and you know what you want to measure. The problem with this is that it focuses on looking backward and not on the creativity and imagination it takes to look toward the future.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: Industrial Internet, Business Culture, Visualization, and Beer Recommendations

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from the Industrial Internet to beer recommendations. In this week's round-up:

  • The Googlization Of GE
  • 10 Qualities a Data-Friendly Business Culture Needs
  • Interview with Miriah Meyer - Microsoft Faculty Fellow and Visualization Expert
  • Recommendation System in R

The Googlization Of GE

This is an interesting Forbes article about GE, the Internet of Things (which GE calls the Industrial Internet), and how the company is trying to be to that space what Google has become to the consumer data space.

10 Qualities a Data-Friendly Business Culture Needs

Running a data-driven organization requires more than having the right talent, tools, and infrastructure to meet the organization's objectives. It also requires a data-friendly culture, which is the premise of this article. The author identifies 10 qualities that can make for a better environment to foster innovative data-driven processes.

Interview with Miriah Meyer - Microsoft Faculty Fellow and Visualization Expert

This post is part of Jeff Leek's interview series on his Simply Stats blog. This week Jeff interviewed Miriah Meyer, who is an expert on data visualization. The interview includes questions about her work, background, influences, and advice she has for data scientists about visualization.

Recommendation System in R

This is a fun blog post about putting together a beer recommendation system using the R statistical programming language. The author walks us through the processes he followed, includes snippets of the code he used, and even shows off the resulting app where you choose a beer you like and it recommends other beers that are similar to it.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: Big Data ROI, Statistics, GE, and China

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Big Data's return on investment to its progress in China. In this week's round-up:

  • Big Data ROI Still Tough To Measure
  • What Statistics Should Do About Big Data
  • GE CEO Jeff Immelt’s Big Data Bet
  • In China, Big Data Is Becoming Big Business

Big Data ROI Still Tough To Measure

This is an article about how difficult it is to measure the return on investment of big data solutions. Given all the hype in the media, business leaders naturally want to know whether their investments in these solutions are paying off. The article goes on to describe some of the complexities involved and talks about some of the obstacles that will have to be overcome in order for business leaders to feel more satisfied with the solutions they invest in.

What Statistics Should Do About Big Data

This is a blog post by Jeff Leek continuing the recent discussions about the role of statistics in big data. Jeff writes about his understanding of what some of the issues raised in previous conversations boil down to and then provides his thoughts about what statisticians need to do in order to not get left out of the big data discussion. He concludes the post with a list of things he'd like to see come out of these discussions that would help the discipline progress to the next level.

GE CEO Jeff Immelt’s Big Data Bet

This is a summary of GE CEO Jeff Immelt's interview at the D11 conference this past week, which centered around how data collected from sensors can make machines more efficient - what GE calls the Industrial Internet. The article provides some examples of where GE is trying to implement these practices and explains why it's important for GE to be doing this. If you'd like to see the full interview, you can find the video here.

In China, Big Data Is Becoming Big Business

Our last article this week is a Bloomberg BusinessWeek piece about how big data is progressing in China. The fact that it is such a large country and the fact that an increasing number of its citizens are using technology means that the quantity of data generated is rapidly increasing. This article talks about how data scientists will be in high demand there in the near future and how both government and businesses are working on building infrastructure that can support their data needs.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.
