data startups

SynGlyphX: Hello and Thank You DC2!

The following is a sponsored post brought to you by one of the supporters of two of Data Community's five meetups.


This week was my, and my company’s, introduction to Data Community DC (DC2).  We could not have asked for a more welcoming reception.  We attended and sponsored both Tuesday’s DVDC event on Data Journalism and Thursday’s DSDC event on GeoSpatial Data Analysis.  They were both pretty exciting, and timely, events for us.

As I mentioned, I’m new to DC2 and new to the “data as a science” community.  Don’t get me wrong: while I’m new to DC2, I’ve been awash in data my entire career.  I started as a young consultant reconciling discrepancies in the databases of a very early Client-Server implementation.  Basically, I had to make sure that all the big department store orders on the server were in sync with the home delivery client application.  That meant a lot of manual reconciling, which ultimately led to me writing code to semi-automatically reconcile the two databases.  Eventually (I think) they solved the technical issues that led to the Client-Server databases being out of sync.

More recently, I was working for a company with a growing professional services organization.  The company typically hired new employees after a contract was signed, but the new professional services work involved short project durations.  If we waited to hire, the project would be over before someone started.  We developed a probability-adjusted portfolio analysis approach to compare the supply of available resources (which is always changing as people finish projects, get extended, or leave the organization) with demand (which is always changing as well).  That enabled us to determine a range of positions and skillsets to hire for in a defined timeframe.
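For readers who want to see what a probability-adjusted comparison of supply and demand can look like, here is a minimal sketch. It is not the model we built; the function names, the win probabilities, and the availability figures are all hypothetical, and it assumes each deal and each person can be treated as an independent yes/no event.

```python
import random


def simulate_hiring_gap(opportunities, staff, trials=10_000, seed=0):
    """Monte Carlo sketch of probability-adjusted supply vs. demand.

    opportunities: list of (win_probability, people_needed) pairs (hypothetical)
    staff: list of availability probabilities, one per current employee (hypothetical)
    Returns the sorted list of simulated gaps (demand minus supply).
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Demand: people needed by the deals that are won in this trial.
        demand = sum(n for p, n in opportunities if rng.random() < p)
        # Supply: employees who turn out to be free in this trial.
        supply = sum(1 for p in staff if rng.random() < p)
        gaps.append(demand - supply)
    gaps.sort()
    return gaps


def percentile(sorted_values, q):
    """Simple index-based percentile of an already sorted list (0 <= q <= 1)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


# Hypothetical pipeline: three deals and four people who may roll off projects.
opportunities = [(0.9, 3), (0.5, 4), (0.2, 6)]
staff = [0.8, 0.6, 0.3, 0.1]

gaps = simulate_hiring_gap(opportunities, staff)
low = max(0, percentile(gaps, 0.10))
high = max(0, percentile(gaps, 0.90))
average_gap = sum(gaps) / len(gaps)

print(f"Plan to hire between {low} and {high} people")
print(f"Average shortfall across trials: {average_gap:.1f}")
```

The output is a range rather than a single number, which is the point: it gives a defensible band of positions to recruit for before any one contract is signed.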

In both instances, it was data science that drove effective decision making.  Sure, you can apply some “gut” to any decision, but having some data science behind you makes the case much stronger.

So, I was fascinated to listen to the journalists discuss how they are applying data analysis to help:  1) support existing story lines; and 2) develop new story lines.  Nathan’s presentation on analyzing AIS data was interesting (and timely, as we had just gotten a verbal win with a client for similar, though not identical, work).

I know the power of data to solve complex business, operational, and other problems.  With our new company, SynGlyphX, we are focused on helping people both visualize and interact with their data.  We live in a world with sight and three dimensions.  We believe that by visualizing the data (unstructured, filtered, analyzed, any kind of data), we can help people leverage the power of the brain to identify patterns, spot trends, and detect anomalies.  We joined DC2 to get to know folks in the community, generate some awareness for our company, and to get your feedback on what we are doing.  Thank you all for welcoming us and our company, SynGlyphX, to the community.  We appreciated everyone’s interest in the demonstrations of our interactive visualization technology.  Our website traffic was up significantly last week, so I am hoping this is a sign that you were interested in learning more about us.  Additionally, I have heard from a number of you since the events, and welcome hearing from more.

Here’s my call to action: I encourage you to tweet us your answer to the following question:  “Why do you find it helpful to visually interact with your data?”

See you at upcoming events.

Mark Sloan

About the Author:

As CEO of SynGlyphX, Mark brings over two decades of experience.  Mark began his career at Accenture, co-founded the global consulting firm RTM Consulting, and served as Vice President and General Manager of Convergys’ Consulting and Professional Services Group.

Mark has an M.B.A. from The Wharton School of the University of Pennsylvania, and a B.S. in Civil Engineering from the University of Notre Dame. He is a frequent speaker at industry events and has served as an Advisory Board Member for the Technology Professional Services Association (now the Technology Services Industry Association, TSIA).

Selling Data Science: Validation

We are all familiar with the phrase "We cannot see the forest for the trees", and this certainly applies to us as data scientists.  We can become so involved with what we're doing, what we're building, the details of our work, that we don't know what our work looks like to other people.  Often we want others to understand just how hard it was to do what we've done, just how much work went into it, and sometimes we're vain enough to want people to know just how smart we are.

So what do we do?  How do we validate one action over another?  Do we build the trees so others can see the forest?  Must others know the details to validate what we've built, or is it enough that they can make use of our work?

We are all made equal by our limitation to 24 hours in a day, and we must choose what we listen to and what we don't, what we focus on and what we don't.  The people who make use of our work must do the same.  A well-known philosophical thought experiment, often attributed to George Berkeley, asks, "If a tree falls in the woods and no one is around to hear it, does it make a sound?"  If we explain all the details of our work, and no one gives the time to listen, will anyone understand?  To what will people give their time?

Let's suppose that we can successfully communicate all the challenges we faced and overcame in building our magnificent ideas (as if anyone would sit still that long).  What then?  Thomas Edison is famous for saying, “I have not failed. I've just found 10,000 ways that won't work.”  But today we buy lightbulbs that work, and who remembers all the details about the different ways he failed?  "It may be important for people who are studying the thermodynamic effects of electrical currents through materials."  OK, it's important to that person to know the difference, but for the rest of us it's still not important.  We experiment, we fail, we overcome, thereby validating our work so that others don't have to.

Better to teach a man to fish than to provide for him forever, but there are an infinite number of ways to successfully fish.  Some approaches may be nuanced in their differences, but others may be so wildly different that they're unrecognizable, unbelievable, and invite incredulity.  The catch (no pun intended) is that methods are valid because they yield measurable results.

It's important to catch fish, but success is neither consistent nor guaranteed, so groups of people may fish together so that after sharing their bounty everyone is fed.  What if someone starts using this unrecognizable and unbelievable method of fishing?  Will the others accept this "risk" and share their fish with those who won't use the "right" fishing technique, their technique?  Even if it works the first time, that may simply be a fluke, they say, and we certainly can't waste any more resources "risking" hungry bellies, now can we?

So does validation lie in the method or the results?  If you're going hungry you might try a new technique, or you might have faith in what's worked until the bitter end.  If a few people can catch plenty of fish for the rest, let the others experiment.  Maybe you're better at making boats, so both you and the fishermen prosper.  Perhaps there's someone else willing to share the risk because they see your vision, your combined efforts giving you both a better chance at validation.

If we go along with what others are comfortable with, they'll provide fish.  If we have enough fish for a while, we can experiment and potentially catch more fish in the long run.  Others may see the value in our experiments and provide us fish for a while until we start catching fish.  In the end you need fish, and if others aren't willing to give you fish you have to get your own fish, whatever method yields results.

Weekly Round-Up: Machine Learning, DIY Data Scientists, Games, and Helping Couples Conceive

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from machine learning to helping couples conceive. In this week's round-up:

  • Jeff Hawkins: Where Open Source and Machine Learning Meet Big Data
  • The Rise Of The DIY Data Scientist
  • Why Games Matter to Artificial Intelligence
  • Three Questions for Max Levchin About His New Startup

Jeff Hawkins: Where Open Source and Machine Learning Meet Big Data

Our first piece this week is an InfoWorld article about Jeff Hawkins, the machine learning work that he and his company have been doing, and the open source project they've recently released on GitHub. The project's name is the Numenta Platform for Intelligent Computing (NuPIC), and its goal is to allow others to embed machine intelligence into their own systems. The article has a short interview with Jeff and a link to the GitHub page where the project resides.

The Rise Of The DIY Data Scientist

This is an interesting Fast Company article about how Kaggle competition winners tend to be self-taught. The author of the article interviews Kaggle's chief scientist Jeremy Howard about this phenomenon and other interesting findings about data scientists derived from Kaggle's competitions. Some of the questions ask where the winners are from, how they learned data science, and what machine learning algorithms they use.

Why Games Matter to Artificial Intelligence

This blog post on the IBM Research blog is an interview with Dr. Gerald Tesauro about the significance of games in the Artificial Intelligence field. Dr. Tesauro was the IBM research scientist who taught Watson how to play Jeopardy. In the interview, he explains how games tend to be an ideal training ground for machines because they tend to simplify real life. He goes on to answer questions about how that prepares the machines for transitioning to other real-world problems, what he's currently working on, what Watson is doing these days, and where else machine learning can be used.

Three Questions for Max Levchin About His New Startup

Our final piece this week is an MIT Technology Review article about PayPal co-founder Max Levchin's new startup called Glow. A lot of people are having children later in life these days and one downside of this is that many couples have trouble trying to conceive. Levchin has developed an iPhone app that uses data to help couples identify the optimal time for conception. In this brief interview, Levchin talks about what they are doing, why, and the degree of accuracy they hope to achieve.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.

Read Our Other Round-Ups

Weekly Round-Up: Ford's Data, Apple's iWatch, Wavii's Acquisition, and Fighting Malaria

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from how Ford is leveraging data to improve their operations to combating malaria using data from cell phones. In this week's round-up:

  • How Data is Changing the Car Game for Ford
  • How Apple's iWatch Will Push Big Data Analytics
  • Google Bags Another Machine Learning Startup
  • Researchers Use Data from Cell Phones to Combat Outbreaks

How Data is Changing the Car Game for Ford

This is a GigaOM article about how Ford Motor Company is using data to build better cars and better customer experiences. The article goes into some detail about how the company is doing both of these things, such as creating data products that are available to consumers with some of their automobiles that provide them with data about their car's performance. The author goes on to quote some of the folks in charge of the data efforts at Ford about internal data processes and some of the changes the company has had to make in order to become more data-driven.

How Apple's iWatch Will Push Big Data Analytics

This is a Smart Data Collective article about what Apple's rumored iWatch could mean for Big Data. According to the article, the watch will be able to capture data about where you've been, what you've eaten, how many calories you've burned, and how you've slept, among other things. The author provides some examples of products currently on the market (such as Nike's Fuelband and the Fitbit Ultra) that have expanded the amount of data that can be collected from individuals, and opines that Apple's smart watch will capture a significant share of this market. He also predicts that this will change the world of big data analytics, and he provides some examples of why he believes this.

Google Bags Another Machine Learning Startup

Google acquired the machine learning startup Wavii this week, and this Wired article has some of the details about the startup, the acquisition, and how Wavii's technology may be used inside of Google. The article mentions that there was a bidding war between Apple and Google for the company, so hopefully Google will be able to make this victory pay off in the near future.

Researchers Use Data from Cell Phones to Combat Outbreaks

This is an MIT Technology Review article about how epidemiologists at Harvard have been able to track the spread of diseases such as malaria by studying data generated from cell phone towers in Kenya. Using this data, they can track movement to and from regions of the country they know have a high infection rate and feed that information into predictive models that can forecast how the diseases may spread. The article goes into much more detail and is a fascinating and informative read.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Weekly Round-Up: CIA Big Data, Unifying Mean/Median/Mode, New Data Startups, and Naked Statistics

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from data startups to statistics lessons. In this week's round-up:

  • CIA Presentation on Big Data
  • Modes, Medians and Means: A Unifying Perspective
  • A Couple New Notable Data Startups
  • Naked Statistics: Stripping the Dread From the Data

CIA Presentation on Big Data

This is a Business Insider article about the presentation made by CIA Chief Technology Officer Ira "Gus" Hunt at GigaOM's Structure data conference in New York. The presentation was about how the agency plans to capture, store, and use the vast amounts of data it is able to collect. The article includes some highlights of the talk and a link to Hunt's slides from the presentation. The video and transcript of the entire talk can be found on GigaOM's website.

Modes, Medians and Means: A Unifying Perspective

This is a post published earlier this week on the blog of John Myles White, co-author of Machine Learning for Hackers, where he tackles the task of explaining the relationships between mean, median, and mode, noting that this particularly important topic is usually excluded from introductory statistics courses. His explanation of the relationships between the three summary statistics comes across as intuitive and very well structured. For those who have a grasp of basic statistics, this post will definitely help you understand things a little more deeply.
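To give a flavor of the idea, here is a small sketch of one standard way to unify the three statistics: each is the single number that minimizes a different total discrepancy from the data (count of mismatches for the mode, sum of absolute differences for the median, sum of squared differences for the mean). The code is our own illustration rather than anything taken from the post, and the data set, the candidate grid, and the function names are made up for the example.

```python
from statistics import mean, median, mode


def total_discrepancy(data, s, power):
    """Sum of |x - s|**power over the data; power 0 counts mismatches."""
    if power == 0:
        return sum(1 for x in data if x != s)
    return sum(abs(x - s) ** power for x in data)


def best_summary(data, candidates, power):
    """Brute-force search for the candidate with the smallest discrepancy."""
    return min(candidates, key=lambda s: total_discrepancy(data, s, power))


# Made-up data chosen so that the three statistics all differ.
data = [1, 1, 2, 4, 12]
# Candidate summaries: a grid from 0.0 to 12.0 in steps of 0.1.
candidates = [i / 10 for i in range(0, 121)]

for power, name, builtin in [(0, "mode", mode), (1, "median", median), (2, "mean", mean)]:
    found = best_summary(data, candidates, power)
    print(f"power {power}: grid search finds {found}, {name} is {builtin(data)}")
```

A brute-force grid search is used only to keep the example transparent; it shows that the three familiar statistics fall out of the same recipe with a different penalty each time.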

A Couple New Notable Data Startups

This week, I came across a couple of articles about new startups in the data space that should be interesting to watch grow. The first was a TechCrunch article about Fivetran, a company that wants to reinvent spreadsheets so that they can handle the more modern data analysis tasks that have outpaced the functionality of traditional spreadsheets. Fivetran is backed by Paul Graham's startup incubator, Y Combinator, and the article provides an overview of the problems they are trying to solve and how they are trying to solve them.

The second data startup article was about Wise.io, a company that is trying to provide machine learning as a service to the masses. The article talks about what they're trying to accomplish, where they got the idea from, and some of their sources of revenue (they are bootstrapped and already profitable).

Naked Statistics: Stripping the Dread From the Data

This is an interesting review of the recently released book Naked Statistics by Charles Wheelan on the Economist website. The book aims to strip away the complexity and explain statistics intuitively by using language, examples, and humor that most people can identify with. The review describes some of the specific examples used in the book to illustrate statistical concepts, comments on some of the other ways Wheelan has chosen to deliver the material, and highlights some of the things you will learn from reading the book.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.
