Event Recap: Tandem NSI Deal Day (Part 1)

This is a guest post by John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC Meetup. Tandem NSI is a public-private partnership between Arlington Economic Development and Amplifier Ventures. According to the TNSI website, the partnership is intended to foster a vibrant technology ecosystem that combines entrepreneurs, university researchers and students, national security program managers and the supporting business community. I attended the Tandem NSI Deal Day on May 7; this post is a summary of a few discussions relevant to DC2.

The format of Deal Day was a collection of speakers and panel discussions from both successful entrepreneurs and government representatives from the Arlington area, including:

  • Introductions by Arlington County Board Chairperson, Jay Fisette, and Arlington House Representative Jim Moran;
  • Current trends in mergers and acquisitions and business acquisitions for national security product startups;
  • “How to Hack the System,” a discussion with successful national security product entrepreneurs;
  • “Free Money,” in which national security agency program managers told us where they need research done by small business and how you can commercialize what you learn; and
  • “What’s on the Edge,” in which national security program managers told us where they have cutting edge opportunities for entrepreneurs that are on the edge of today’s tech, and will be the basis of tomorrow’s great startups.

There were two DC2-relevant themes from the day that I’ve distilled: the pros and cons of starting a tech business in the DC region, and the specific barriers to entry of which entrepreneurs focusing on obtaining federal contracts should be aware when operating in our region. This post will focus on the first theme; the second will be discussed in Part 2 of the recap, later this week.

Startups in the DC Metropolitan Statistical Area vs. “The Valley”

A lot of discussion focused on starting up a tech company here in the DC MSA (which includes Washington, DC; Calvert, Charles, Frederick, Montgomery and Prince George’s counties in MD; and Arlington, Fairfax, Loudoun, Prince William, and Stafford counties as well as the cities of Alexandria, Fairfax, Falls Church, Manassas and Manassas Park in VA) versus the Valley. Most of the panelists and speakers had experience starting companies in both places, and there were pros and cons to both. Here's a brief summary in no particular order.

DC MSA Startup Pros

  • Youth! According to Jay Fisette, Arlington has the highest percentage of 25-34 year olds in America.
  • Education. Money magazine called Arlington the most educated city in America.
  • Capital. The concentration of many high-end government research sponsors--the National Science Foundation, Defense Advanced Research Projects Agency, Intelligence Advanced Research Projects Agency, the Office of Naval Research, etc.--can provide early-stage, non-dilutive research investment.
  • Localized impact. Entrepreneurial aims are often US-centric, rather than global.
  • A mission-focused talent pool.
  • A high concentration of American citizens and cleared personnel.
  • Local government support. As an example, initiatives like ConnectArlington provide more secure broadband for Arlington companies.

DC MSA Startup Cons

  • Localized impact. Entrepreneurial aims are often US-centric, rather than global. (Yes, this appears on both lists!)
  • Heavy regulations. Federal Acquisition Regulations (FAR) and Defense Contract Audit Agency accounting requirements can complicate the already difficult task of starting a business.
  • Bureaucracy. It’s DC. It’s a fact.
  • Extremely complex government organization with significant personnel turnover.
  • Less experienced “product managers.”

Silicon Valley Startup Pros

  • Venture capitalists and big corporations are “throwing money at you” in the tech space.
  • Plenty of entrepreneurial breadth.
  • Plenty of talent in productization.
  • Plenty of experience in commercial projects.
  • Very liquid and competitive labor market--which is great for individual employees.
  • Aims are often global, rather than US-centric.
  • Compensation is unconstrained by government regulation.
  • Great local higher education infrastructure: Berkeley, UCSF, the National Labs, Stanford...

Silicon Valley Startup Cons

  • Very liquid and competitive labor market--which means building a loyal, talented team can be a struggle.
  • VCs and big corporation investments are unsustainably frothy.
  • Less talent in or exposure to federal contracting.
  • A smaller pool of American citizens and cleared personnel.

Check back later this week to find out what TNSI Deal Day panelists had to say about stumbling blocks to obtaining federal contracts!

SynGlyphX: Hello and Thank You DC2!

The following is a sponsored post brought to you by one of the supporters of two of Data Community DC's five meetups.

Hello and Thank You DC2!

This week was my, and my company’s, introduction to Data Community DC (DC2).  We could not have asked for a more welcoming reception.  We attended and sponsored both Tuesday’s DVDC event on Data Journalism and Thursday’s DSDC event on GeoSpatial Data Analysis.  They were both pretty exciting, and timely, events for us.

As I mentioned, I’m new to DC2 and new to the “data as a science” community. Don’t get me wrong: while I’m new to DC2, I’ve been awash in data my entire career. I started as a young consultant reconciling discrepancies in the databases of a very early client-server implementation. Basically, I had to make sure that all the big department store orders on the server were in sync with the home delivery client application. It was a lot of manual reconciling that ultimately led to me writing code to semi-automatically reconcile the two databases. Eventually (I think) they solved the technical issues that led to the client-server databases being out of sync.

More recently, I was working for a company with a growing professional services organization. The company typically hired new employees after a contract was signed, but the new professional services work involved short project durations. If we waited to hire, the project would be over before someone started. We developed a probability-adjusted portfolio analysis approach to compare the supply of available resources (which is always changing as people finish projects, get extended, or leave the organization) against demand (which is always changing as well). That enabled us to determine a range of positions and skillsets to hire for in a defined timeframe.
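The original approach isn't spelled out in detail, but the core idea — weighting uncertain demand and uncertain supply by their probabilities and hiring against the expected gap — can be sketched in a few lines. Every number, name, and field below is invented for illustration:

```python
# Hypothetical sketch of a probability-adjusted staffing analysis:
# expected demand from a weighted pipeline of deals vs. expected supply
# of consultants likely to roll off projects.

def expected_headcount(pipeline):
    """Sum headcount needs, weighted by each deal's win probability."""
    return sum(deal["heads"] * deal["win_prob"] for deal in pipeline)

def expected_supply(bench, rolloffs):
    """People on the bench now, plus those likely to roll off projects."""
    return bench + sum(person["prob_free"] for person in rolloffs)

pipeline = [
    {"deal": "A", "heads": 4, "win_prob": 0.6},
    {"deal": "B", "heads": 2, "win_prob": 0.9},
]
rolloffs = [{"name": "x", "prob_free": 0.5}, {"name": "y", "prob_free": 0.8}]

demand = expected_headcount(pipeline)   # 4*0.6 + 2*0.9 = 4.2
supply = expected_supply(1, rolloffs)   # 1 + 0.5 + 0.8 = 2.3
hiring_gap = max(0.0, demand - supply)  # ≈ 1.9 positions to open now
```

A real version would repeat this over a rolling time window and across skillsets, but the expected-value comparison is the heart of it.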

In both instances, it was data science that drove effective decision making.  Sure, you can apply some “gut” to any decision, but having some data science behind you makes the case much stronger.

So, I was fascinated to listen to the journalists discuss how they are applying data analysis to help: 1) support existing story lines; and 2) develop new story lines. Nathan’s presentation on analyzing AIS data was interesting, and a bit timely, as we had just gotten a verbal win for a client on similar (though not exactly the same) work.

I know the power of data to solve complex business, operational, and other problems.  With our new company, SynGlyphX, we are focused on helping people both visualize and interact with their data.  We live in a world with sight and three dimensions.  We believe that by visualizing the data (unstructured, filtered, analyzed, any kind of data), we can help people leverage the power of the brain to identify patterns, spot trends, and detect anomalies.  We joined DC2 to get to know folks in the community, generate some awareness for our company, and to get your feedback on what we are doing.  Thank you all for welcoming us and our company, SynGlyphX, to the community.  We appreciated everyone’s interest in the demonstrations of our interactive visualization technology.  Our website traffic was up significantly last week, so I am hoping this is a sign that you were interested in learning more about us.  Additionally, I have heard from a number of you since the events, and welcome hearing from more.

Here’s my call to action: I encourage you to tweet us your answer to the following question:  “Why do you find it helpful to visually interact with your data?”

See you at upcoming events.

Mark Sloan

About the Author:

As CEO of SynGlyphX, Mark brings over two decades of experience.  Mark began his career at Accenture, co-founded the global consulting firm RTM Consulting, and served as Vice President and General Manager of Convergys’ Consulting and Professional Services Group.

Mark has a M.B.A. from The Wharton School of the University of Pennsylvania, and a B.S. in Civil Engineering from the University of Notre Dame. He is a frequent speaker at industry events and has served as an Advisory Board Member for the Technology Professional Services Association (now Technology Services Industry Association (TSIA)).

General Assembly & DC2 Scholarship

The DC2 mission statement emphasizes that "Data Community DC is an organization committed to connecting and promoting the work of data professionals..." Ultimately, we see DC2 becoming a hub for data scientists interested in exploring new material, advancing their skills, collaborating, starting a business with data, mentoring others, teaching classes, changing careers, and more. Education is clearly a large part of any of these interests, and while DC2 has held a few workshops and is sponsored by organizations like Statistics.com, we knew we could do more. So we partnered with General Assembly and created a GA & DC2 scholarship specifically for members of Data Community DC.

For our first scholarship we landed on Front End Web Development and User Experience, which we naturally announced first at Data Viz DC.  How does this relate to data science?  As I was happy to rebut Mr. Gelman in our DC2 blog post reply, sometimes I would love to have a little sandbox where I get to play with algorithms all day. But then again, that is exactly what I ran away from in 2013 when I became an independent data science consultant; I don't want a business plan I'm not a part of dictating what I can play with.  Enter Web Dev and UX.  As Harlan Harris, organizer of DSDC, notes in his Venn diagram of what makes a data scientist, which Tony Ojeda later emphasized, programming is a natural and necessary part of being a data scientist.  In other words, there's this thing called the interwebs that has more data than you can shake a stick at, and if you can't operate in that environment, then as a data scientist you're asking someone else to do that heavy lifting for you.

Over the next month we'll be choosing the winners of the GA DC2 Scholarship, and if you'd like to see any other scholarships in the future please leave your thoughts in the comments below or tweet us.

Happy Thanksgiving!

Selling Data Science: Validation

We are all familiar with the phrase "We cannot see the forest for the trees," and this certainly applies to us as data scientists.  We can become so involved with what we're doing, what we're building, and the details of our work that we don't know what our work looks like to other people.  Often we want others to understand just how hard it was to do what we've done, just how much work went into it, and sometimes we're vain enough to want people to know just how smart we are.

So what do we do?  How do we validate one action over another?  Do we build the trees so others can see the forest?  Must others know the details to validate what we've built, or is it enough that they can make use of our work?

We are all made equal by our limitation to 24 hours in a day, and we must choose what we listen to and what we don't, what we focus on and what we don't.  The people who make use of our work must do the same.  The old philosophical thought experiment asks, "If a tree falls in the woods and no one is around to hear it, does it make a sound?"  If we explain all the details of our work, and no one gives the time to listen, will anyone understand?  To what will people give their time?

Let's suppose that we can successfully communicate all the challenges we faced and overcame in building our magnificent ideas (as if anyone would sit still that long); what then?  Thomas Edison is famous for saying, “I have not failed. I've just found 10,000 ways that won't work.”  But today we buy lightbulbs that work; who remembers all the details of the different ways he failed?  "They may be important to someone studying the thermodynamic effects of electrical currents through materials."  Fine, those details matter to that person, but for the rest of us they're still not important.  We experiment, we fail, we overcome, thereby validating our work so that others don't have to.

Better to teach a man to fish than to provide for him forever, but there are an infinite number of ways to successfully fish.  Some approaches may be nuanced in their differences, but others may be so wildly different that they're unrecognizable, unbelievable, and invite incredulity.  The catch (no pun intended) is that methods are valid because they yield measurable results.

It's important to catch fish, but success is neither consistent nor guaranteed, and groups of people may fish together so that after sharing their bounty everyone is fed.  What if someone starts using an unrecognizable and unbelievable method of fishing?  Will the others accept this "risk" and share their fish with those who won't use the "right" fishing technique, their technique?  Even if it works the first time, that may simply be a fluke, they say, and we certainly can't waste any more resources "risking" hungry bellies, now can we?

So does validation lie in the method or the results?  If you're going hungry you might try a new technique, or you might have faith in what's worked until the bitter end.  If a few people can catch plenty of fish for the rest, let the others experiment.  Maybe you're better at making boats, so both you and the fishermen prosper.  Perhaps there's someone else willing to share the risk because they see your vision, your combined efforts giving you both a better chance at validation.

If we go along with what others are comfortable with, they'll provide fish.  If we have enough fish for a while, we can experiment and potentially catch more fish in the long run.  Others may see the value in our experiments and provide us fish for a while until we start catching fish.  In the end you need fish, and if others aren't willing to give you fish you have to get your own fish, whatever method yields results.

Want to Learn Data Science? Get a Personal Tutor from SageBourse, a Hot New Startup in DC

SageBourse is a tutoring marketplace for data science and programming. It's essentially a way for people who want to enhance their data science and programming skills to get personalized instruction from others in the community who are knowledgeable about those subjects, either in person or online. The process is pretty simple. Tutors sign up and choose which subjects they know well enough to teach. When someone requests a lesson in a subject, the tutors who can teach that subject get notified and have the opportunity to bid on the lesson. Once the student chooses which bid they'd like to accept, they are connected with the tutor so they can coordinate a time that works best for both of them. When the lesson is over, SageBourse takes care of collecting payment from the student and paying the tutor.
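The sign-up, notify, bid, and match steps above amount to a simple marketplace workflow. Here is a toy model of that flow; SageBourse's actual implementation is not public, so every class and field here is invented:

```python
# Toy model of a tutoring-marketplace matching flow:
# tutors register subjects, lesson requests notify eligible tutors,
# and the student picks one of the resulting bids.

class Marketplace:
    def __init__(self):
        self.tutors = {}  # tutor name -> set of subjects they can teach

    def sign_up(self, tutor, subjects):
        self.tutors[tutor] = set(subjects)

    def request_lesson(self, subject):
        """Return (i.e., 'notify') every tutor who teaches the subject."""
        return [t for t, subs in self.tutors.items() if subject in subs]

    def accept_bid(self, bidders, choice):
        """Student picks a bidder; the match is made."""
        return bidders[choice]

m = Marketplace()
m.sign_up("alice", ["python", "statistics"])
m.sign_up("bob", ["statistics"])

eligible = m.request_lesson("statistics")  # both tutors are notified
tutor = m.accept_bid(eligible, 0)          # student accepts the first bid
```

A real system would also carry bid prices, scheduling, and the payment step, but the request-and-bid core is the part that distinguishes this from a fixed-price listing site.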

This provides those who want to learn data science and programming an easy, affordable, and efficient way to do so. It also provides those who can teach these subjects the ability to leverage their knowledge to make some extra money.

According to the Founder, DC2's very own Tony Ojeda:

There's an increasing amount of data being made available every single day.  I strongly believe that the more people we arm with the skills necessary to turn that data into information, and then do something useful with that information, the better off we will be.  Organizations are becoming more data-driven and the jobs of tomorrow are going to reflect that.  Technology is advancing and if your skills aren't advancing with it, you risk getting left behind.  I created SageBourse to help give people the ability to not only catch up with technology, but get ahead of it.

An Introduction to TechBreakfast and Why You Should Care

I want to introduce you to TechBreakfast, a rapidly growing meetup group that hosts events in Baltimore, Columbia, DC, and Northern VA.  So why the introduction? Don't you already have enough Meetup groups to go to? DC2 has at least two good reasons for doing so. First, TechBreakfast demos early-stage technology companies from the area. While entrepreneurship and data are different subjects, they are closely related, and the data revolution probably would not be happening if it weren't for the tech startup scene.  Thus, we thought you might just be interested in this area.  While potentially relevant content is a good start, we don't mention any and all tech meetups that come our way. In this writer's humble opinion, TechBreakfast is one of the best-run and most enjoyable meetups I have been to ... and I have been to hundreds of such events. Keep reading if you want to learn more and discover some exciting upcoming events.


Want to see cool new technology? Want to interact with other cool techies, startups, and business folks? Have some time in the morning? Then come to TechBreakfast, a monthly breakfast in Baltimore, Columbia, DC, and Northern Virginia where entrepreneurs, techies, developers, designers, business people, and other interested folks see showcases of cool new technology in a demo format and interact with each other. "Show and Tell for Adults" is what we usually say. No boring presentations or speakers who drone on. This is a "show and tell" format where we tell presenters to show me, don't tell me about the great things they are working on. Each TechBreakfast begins at 8:00 AM and goes until 10:00 AM (although people usually hang around later).  This event is FREE! Thank our sponsors when you see them!

  • Wed. Feb. 27, 2013: Baltimore TechBreakfast - Featuring Vince Talbert Success Story - Bill Me Later. Featuring awesome technology companies showcasing their innovations in a demo-format and a Success Stories guest. Presenters for this installment: Vince Talbert, Platfolio, Sexual Health Innovations, OpiaTalk, and ChefTabl. FREE. Location: DLA Piper, Baltimore, MD. Register and info at http://www.meetup.com/TechBreakfast/events/97731722/
  • Fri. Mar. 1, 2013: Insurance BizWorkshop - Just because you are a startup or a small business doesn't mean that things can't go wrong. And when those things go wrong... they can really go wrong. The beauty of insurance is that it's there to protect you when things go wrong. And it doesn't always need to cost an arm and a leg. In the Insurance BizWorkshop on March 1, 2013, we'll bring one of the area's most advanced and notable firms in the insurance field to help you figure out what you need, how little or much coverage you need to cover those risks, and how to save a bundle doing it. Indeed, cover your butt for pennies on the dollar. Cost is $15 if you register by Feb. 27, 2013. More information and register at http://www.meetup.com/BizWorkshop/events/105199632/.
  • Wed. Mar. 6, 2013: NoVA TechBreakfast - Featuring awesome technology companies showcasing their innovations in a demo-format. Presenters for this installment: Nanobird, OpiaTalk, Omic Biosystems, Workman, Stormpins. FREE. Location: AOL Fishbowl, Reston, VA. Register and info at http://www.meetup.com/TechBreakfast/events/103533882/
  • Tue. Mar. 12, 2013: Columbia TechBreakfast - Featuring awesome technology companies showcasing their innovations in a demo-format. Presenters for this installment: Thycotic, Gruply, RackTop Systems, MeetLocalBiz, Light Point Security. FREE. Location: Loyola Columbia, Columbia, MD. Register and info at http://www.meetup.com/TechBreakfast/events/97737422/

The Rise of Data Products

by Sean Murphy

I had the great opportunity to present at the kick-off event for the Mid-Maryland Data Science Meetup on "The Rise of Data Products". Below is the talk captured in images and text.

Update: You can also download the audio here and follow along.

Tonight's talk is focused on capturing what I see as a new (or continuing) Gold Rush, and I could not be more excited about it.

Before we can talk about the Rise of Data products, we need to define a Data Product.  Hilary Mason provides the following definition: "a data product is a product that is based on the combination of data and algorithms."  To flesh this definition out a bit, here are some examples.

1) LinkedIn has a well known data science team and highlighted below is one such data product - a vanity metric indicating how many times you have appeared in searches and how many times people have viewed your profile. While some may argue that this is more of a data feature than product, I am sure it drives revenue as you have to pay to find out who is viewing your profile.

2) Google's search is the consummate data product. Take one part giant web index (the data), add the PageRank algorithm (the algorithms), and you have a ubiquitous data product.
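The "algorithms" half of that pairing is surprisingly compact. Here is a toy PageRank power iteration over a made-up four-page web; Google's production system is vastly more elaborate, but the core idea fits in a few lines:

```python
# Toy PageRank via power iteration: each page repeatedly distributes its
# rank to the pages it links to, damped by a teleportation factor.

def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start uniform
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)          # split rank among out-links
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

# a tiny invented web graph: page -> pages it links to
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
# "c" collects links from every other page, so it ends up ranked highest
```

The data (the link graph) and the algorithm (the iteration) are useless apart; together they rank the web, which is exactly Mason's definition in miniature.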

3) Last, but not least, is Hipmunk. This company lets users search flight data and visualize the results in an easy-to-understand fashion. Additionally, Hipmunk attempts to quantify the pain entailed by different flights (those 3 layovers add up) in an "agony" metric.
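Hipmunk's actual "agony" formula is proprietary, but the shape of such a metric is easy to sketch: a weighted blend of price, duration, and layovers. The weights and numbers below are invented purely for illustration:

```python
# A hypothetical "agony"-style score: lower is better. Weights convert
# hours and layovers into price-equivalent pain (all values invented).

def agony(price_usd, duration_hrs, layovers,
          w_price=1.0, w_time=20.0, w_stops=50.0):
    return (w_price * price_usd
            + w_time * duration_hrs
            + w_stops * layovers)

nonstop   = agony(price_usd=400, duration_hrs=5.0, layovers=0)   # 500.0
three_hop = agony(price_usd=250, duration_hrs=11.0, layovers=3)  # 620.0
# the cheaper three-layover itinerary scores as more agonizing overall
```

The point of such a metric is that sorting by it answers the question travelers actually have ("which flight will hurt least?") rather than the one raw data answers ("which is cheapest?").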

So let's try a slightly different definition - a data product is the combination of data and algorithms that creates value--social, financial, and/or environmental in nature--for one or more individuals.

One can argue that data products have been around for some time and I would completely agree. However, the point of this talk is why are they exploding now?

I would argue that it is all about supply and demand. And, for this brief 15-minute talk (a distillation of a much longer talk), I am going to constrain the data product supply issue to the availability and cost of the tools required to explore data and the infrastructure required to deliver data products. On the demand side, I am going to do a "proof by example," complete with much arm waving, to show that today's mass market consumers want data.

On the demand side, let's start with something humans have been doing ever since they came down from the trees: running.

With a small sensor embedded in the shoe (not the only way these days), Nike+ collects detailed information about runners and simply cannot give enough data back to its customers. In terms of this specific success as evidence of general data product demand, Nike+ users have logged over 2 billion miles as of 1/29/2013.

As further evidence of mass-market data desire, 23andMe has convinced nearly a quarter million people to spit into a little plastic cup, seal it up, mail it off, and get their DNA sequenced. 23andMe then gives the data back to the user in the form of a genetic profile, complete with relative genetic disease risks and clear, detailed explanations of those numbers.

And finally there is Google Maps, or GPS in general: merging complex GIS data with sophisticated algorithms to compute optimal paths and estimated times of arrival. Who doesn't use this data product?

In closing, the case for overwhelming data product demand is strong ::insert waving arms:: and made stronger by the fact that our very language has become sprinkled with quasi stat/math terms.  Who would ever have thought that pre-teens would talk about something "trending"?

Let's talk about the supply side of the equation now, starting with the tools required to explore data.

Then: Everyone's "favorite" old-school tool, Excel, costs a few hundred dollars depending on many factors.

Now: Google Docs has a spreadsheet where 100 of your closest friends can simultaneously edit your data while you watch in real time.

And the cost, FREE.

Let's take a step past spreadsheets and rapidly prototype some custom algorithms using Matlab (Yes, some would do it in C but I would argue that most can do it faster in Matlab). The only problem here is that Matlab ain't cheap. Beware when a login is required to get even individual license pricing.

Now, you have Python and a million different modules to support your data diving and scientific needs. Or, for the really adventurous, you can jump to the very forward looking, wickedly-fast, big-data ready, Julia. If a scientific/numeric programming language can be sexy, it would be Julia.

And the cost, FREE.

Let's just say you want to work with data frames and some hardcore statistical analyses. For a number of years you have had SAS, Stata, and SPSS, but these tools come at an extremely high cost. Now, you have R. And it's FREE.

Yes, an amazing set of robust and flexible tools for exploring data and prototyping data products can now be had for the low, low price of free, which is a radical departure from the days of yore.

Now that you have an amazing result from using your free tools, it is time to tell the world.

Back in the day (think Copernicus and Galileo), you would write a letter containing your amazing results (your data product), which would then take a few months to reach a colleague (your market). This was not a scalable infrastructure.


Contemporary researchers push their findings out through the twisted world of peer-reviewed publications ... where the content producers (researchers) often have to pay to get published while someone else makes money off of the work. Curious. More troubling is the fact that these articles are static.

Now, if you want to reach a global audience, you can pick up a CMS like WordPress or a web framework such as Rails or Django and build an interactive application.  Oh yeah, these tools are free.

So the tools are free and now the question of infrastructure must be addressed. And before we hit infrastructure, I need to at least mention that over used buzz word, "big data."

In terms of data products, "big data" is interesting for at least the simple reason that having more data increases the odds of having something valuable to at least someone.

Think of it this way, if Google only indexed a handful of pages, "Google" would never have become the verb that it is today.

If you noticed the pattern of tools getting cheaper, we see the exact same trend with data stores. Whether your choice is relational or NoSQL, big or little data, you can have your pick for FREE.
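One concrete instance of a free data store: SQLite ships inside Python's standard library, so a relational database costs exactly nothing to try. The table and numbers below are made up to show the round trip:

```python
# A zero-cost relational data store: SQLite, bundled with Python itself.
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
conn.execute("CREATE TABLE runs (user TEXT, miles REAL)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",        # parameterized inserts
    [("ann", 3.1), ("ann", 6.2), ("bo", 13.1)],
)
total = conn.execute(
    "SELECT SUM(miles) FROM runs WHERE user = 'ann'"
).fetchone()[0]                              # total ≈ 9.3 miles
conn.close()
```

Swap in PostgreSQL, MySQL, MongoDB, or Redis and the price tag stays the same: zero.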


With data stores available for the low cost of nothing, we need actual computers to run everything. Traditionally, one bought servers, which cost an arm and a leg; don't forget siting requirements and maintenance, among other costs. Now Amazon's EC2 and Google Compute Engine allow you to spin up a cluster of 100 instances in a few minutes. Even better, with Heroku, sitting on top of Amazon, you can stand up any number of different data stores in minutes.

Why should you be excited? Because the entire tool set and the infrastructure required to build and offer world-changing data products is now either free or incredibly low cost.

Let me put it another way. Imagine if Ford started giving away car factories, complete with all required car parts, to anyone with the time to make cars!!!!!

Luckily, there are such individuals who will put this free factory to work. These "data scientists" understand the entire data science stack, or pipeline. They can, by themselves, take raw data all the way to a product ready to be consumed globally (or at least make a pretty impressive prototype). While these individuals are relatively rare now, this state will change. Such an opportunity will draw a flood of individuals, and that rate will only increase as the tools become simpler to use.

Let's make the excitement a bit more personal and go back to that company with a lovable logo, Hipmunk.

If I remember the story correctly, two guys at the end of 2010 taught themselves Ruby on Rails and built what would become the Hipmunk we know and love today.

Two guys.

Three months.

Learned to Code.

And, by the way, Hipmunk has $20.2 million in funding 2 years later!

It is a great time to work with data.