What is data science? You may or may not have heard of it before, because it is a relatively new term. It has existed in its current form for only 10 years, and widespread use began just a few years ago. But what if I associate it with machine learning and data mining? Does an idea start to take shape?
Data science is a somewhat nebulous term with many definitions, but those definitions share a common root: data science is the process of manipulating data via mathematical and scientific methods in order to accomplish an objective. The outcome of that process could be called a data analytic, and its results are then presented to an audience.
Data science is a rapidly growing field, spurred on by the big data revolution, and is hot right now.
And that is why the Mid-Maryland Data Science (MMDS) meetup was created. Data science will be a fundamental part of the next big culture shift, but active participation requires knowledge. MMDS is an avenue to obtain that knowledge.
So what is next? If the outcome of the data science process is a data analytic, then it is time to look at the pieces required for a successful analytic.
These are the focus areas of MMDS. Events and speakers will address one or more of these areas across many different industries.
Without adequate infrastructure, an analytic can never exist. Many common infrastructures today use Hadoop, MapReduce, HBase, Accumulo, R and Python, among many other technologies. Most of these are geared toward big data, but data analytics are not limited to big data sets.
People are the second component: the human infrastructure. The best data science teams are not made up solely of engineers, statisticians or any other single group. They are technological melting pots composed of many skill sets. The common trait their members share is the desire to investigate data and discover correlations, both obvious and hidden.
Data is the currency of the data scientist, and it is paramount to their survival. Ideally, data that has been prepared and cleaned is unbiased and consistent. In the real world, this typically means that the data is consistent and biased in known ways. When the biases affecting the data are known, models can take those biases into account and attempt to mitigate them.
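As a rough illustration of what "taking a known bias into account" can mean, suppose a sample over-represents one group and the true population shares are known. One common correction, post-stratification weighting, weights each group's sample mean by its population share instead of its share of the sample. This is a minimal sketch of that single technique, not a general bias-correction method; the function name, group labels and values are hypothetical.

```python
from collections import defaultdict


def poststratified_mean(records, population_share):
    """Estimate a population mean from a sample with a known group bias.

    records: list of (group, value) pairs from the (biased) sample.
    population_share: dict mapping each group to its known share of the
    population; the shares are assumed to sum to 1.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        totals[group] += value
        counts[group] += 1

    estimate = 0.0
    for group, share in population_share.items():
        if counts[group] == 0:
            # Without any observations a group's mean cannot be estimated.
            raise ValueError(f"no sample records for group {group!r}")
        # Weight the group's sample mean by its true population share.
        estimate += share * totals[group] / counts[group]
    return estimate


# Hypothetical sample: urban records are over-represented 8 to 2,
# although the population is assumed to be split 50/50.
records = [("urban", 10.0)] * 8 + [("rural", 20.0)] * 2
naive = sum(value for _, value in records) / len(records)
corrected = poststratified_mean(records, {"urban": 0.5, "rural": 0.5})
print(naive, corrected)
```

Here the naive mean is pulled toward the over-sampled group (12.0), while the weighted estimate reflects the assumed 50/50 population (15.0).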
The point of a model is to simulate a real-world situation. The better the model is at achieving that goal, the more helpful a data analytic will be. But models are not right for every situation. If the goal is to rank a list of items based solely on which item appears the most times, then a model is not needed. Part of the model creation process is determining whether a model is needed at all.
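To make the ranking example concrete: ordering items purely by how often they appear is a tally and a sort, with nothing to fit or train. A minimal Python sketch using the standard library's `collections.Counter`; the function name and the sample list are hypothetical, for illustration only.

```python
from collections import Counter


def rank_by_frequency(items):
    """Return (item, count) pairs, most frequent first.

    No model is involved: Counter tallies occurrences and most_common()
    sorts by count. Items with equal counts keep the order in which
    they were first seen.
    """
    return Counter(items).most_common()


# Hypothetical data, for illustration only.
checkins = ["cafe", "gym", "cafe", "park", "cafe", "gym"]
for item, count in rank_by_frequency(checkins):
    print(item, count)
```

Running this prints `cafe 3`, `gym 2` and `park 1`, one per line. A model only becomes worth building when the ranking has to depend on more than raw counts.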
The two examples shown were a Nate Silver presentation of the political party of the winning presidential candidate in each state over the past 50 years, and a video documenting Foursquare check-ins in New York City before, during and after Superstorm Sandy. Both are wonderful examples of presentation enhancing data to tell a more compelling story.
Just creating a data analytic is often not enough. It must also incorporate some sort of feedback mechanism, which helps improve the model and therefore the analytic. Users are a dynamic group that can change over time, and an appropriate feedback mechanism helps the analytic change with them.
This slide was current as of the night before the presentation. At the time of this writing, MMDS now has 250 members.
Now let's see what skills and aspirations MMDS members have.
The business category contains CEOs, CTOs, investors, company founders and startup consultants, among others. Scientists are also a varied group, including data scientists as well as computer scientists. At first glance, listing computer scientists separately from developers/engineers looks like a mistake, but members chose carefully between the two categories, so the distinction was kept for this metric.
MMDS members already have many skills that directly apply to the phases of a successful analytic. MMDS meetups will help round out the rest.