Selling Data Science: Validation

We are all familiar with the phrase "We cannot see the forest for the trees", and this certainly applies to us as data scientists.  We can become so involved with what we're doing, what we're building, the details of our work, that we don't know what our work looks like to other people.  Often we want others to understand just how hard it was to do what we've done, just how much work went into it, and sometimes we're vain enough to want people to know just how smart we are.

So what do we do?  How do we validate one action over another?  Do we build the trees so others can see the forest?  Must others know the details to validate what we've built, or is it enough that they can make use of our work?

We are all made equal by our limitation to 24 hours in a day, and we must choose what we listen to and what we don't, what we focus on and what we don't.  The people who make use of our work must do the same.  A philosophical thought experiment often attributed to George Berkeley asks, "If a tree falls in the woods and no one is around to hear it, does it make a sound?"  If we explain all the details of our work, and no one gives the time to listen, will anyone understand?  To what will people give their time?

Let's suppose that we can successfully communicate all the challenges we faced and overcame in building our magnificent ideas (as if anyone would sit still that long).  What then?  Thomas Edison is famous for saying, “I have not failed. I've just found 10,000 ways that won't work.”  But today we buy lightbulbs that work, and who remembers all the details about the different ways he failed?  "It may be important for people who are studying the thermodynamic effects of electrical currents through materials."  OK, it's important to that person to know the difference, but for the rest of us it's still not important.  We experiment, we fail, we overcome, thereby validating our work so that others don't have to.

Better to teach a man to fish than to provide for him forever, but there are an infinite number of ways to successfully fish.  Some approaches may be nuanced in their differences, but others may be so wildly different they're unrecognizable, unbelievable, and invite incredulity.  The catch is (no pun intended) that methods are valid because they yield measurable results.

It's important to catch fish, but success is neither consistent nor guaranteed, so groups of people may fish together and share their bounty so that everyone is fed.  What if someone starts using this unrecognizable and unbelievable method of fishing?  Will the others accept this "risk" and share their fish with those who won't use the "right" fishing technique, their technique?  Even if it works the first time, that may simply be a fluke, they say, and we certainly can't waste any more resources "risking" hungry bellies, now can we?

So does validation lie in the method or the results?  If you're going hungry you might try a new technique, or you might have faith in what's worked until the bitter end.  If a few people can catch plenty of fish for the rest, let the others experiment.  Maybe you're better at making boats, so both you and the fishermen prosper.  Perhaps there's someone else willing to share the risk because they see your vision, your combined efforts giving you both a better chance at validation.

If we go along with what others are comfortable with, they'll provide fish.  If we have enough fish for a while, we can experiment and potentially catch more fish in the long run.  Others may see the value in our experiments and provide us fish for a while until we start catching fish.  In the end you need fish, and if others aren't willing to give you fish you have to get your own fish, whatever method yields results.

National Day of Civic Hacking Events in DC, MD, and VA

We've written before about Hackathons and Data Dives. Now, I'd like to bring your attention to a coordinated set of events happening the weekend of June 1st, called the National Day of Civic Hacking. The official web site describes it as:

A National Event that will take place 06/01-02/2013 and will bring together citizens, software developers, and entrepreneurs across the nation to collaboratively create, build and invent using publicly-released data, code and technology to solve challenges relevant to our neighborhoods, our cities, our states and our country.

There are national challenges, mostly from Federal agencies looking for citizen help in using technology to solve problems. To take the first example from the Challenges list, the new Consumer Financial Protection Bureau is looking for teams to create tools that leverage their data of consumer complaints about financial products and companies. And there are local projects too, using data and problems posed by local governments and civic organizations.

The DC event, called The DC Hack for Change Day, will primarily be about local problems and local data. Coordinated by Code for DC, the local brigade of Code for America, developers, data people, and domain experts will be working on topics including the DC legal code, DC K-12 education data, and DC budget data. (I'm coordinating the education data project.) The event is filled up, but I'd encourage people interested in participating in this sort of thing to join the waitlist, then join Code for DC and help to continue these efforts in the weeks and months ahead. (One of the goals of hackathons, after all, is to energize people to work on projects that will grow over time!)

For people outside the district, or who heard about this too late to join the DC event, there are several other outstanding events nearby:

These are all amazing opportunities to get involved with important problems, meet others, learn a lot, and start to build analyses and solutions that could make a difference! We hope that you can attend!

Hackathons and DataDives

The Data Events DC calendar currently shows three Hackathons and DataDives over the next few months, and at least one other will be posted shortly. But what is a Hackathon or a DataDive? How are they different? And why would a data professional be interested? Hackathons are intense events, usually held over several days, where people try to creatively solve problems and prototype products, often software products. They usually have stakeholders or sponsors with problems to solve and prize money to award. Participants see presentations about the event's goals, then self-organize into teams to work. After a couple of days (and often nights) of work, fueled typically by caffeine and pizza, the teams present their work and a winner is announced.

DataDives are similar, but are focused on analysis of data rather than development of products. The term was coined by the folks at DataKind a few years ago when they were trying to distinguish their events, where statisticians and data scientists team up to work pro bono on nonprofit data, from often entrepreneurial Hackathons.

So why would you, as a data professional, want to lose an entire weekend to one of these? Here are a few reasons:

  • DataDives for nonprofits and Open Data Hackathons for the public sector are a great way to give back to the community. You have great skills -- put them to use for more than just a paycheck.
  • These events are insanely good networking opportunities. You won't just be swapping business cards, you'll be working side-by-side with people in different industries, with different skill sets, in a way that's otherwise impossible.
  • You'll learn a lot -- about technology, about data, about the problem domain. Expect to have new tools in your toolkit by the end of the weekend.
  • The experience of creatively exploring a problem or a data set in a team environment, with intense time pressure, can be very fun and rewarding on its own.
  • Projects don't always just last 36 hours. Many Hackathons turn into real products, commercial or open-source. And many DataDives turn into positive long-term engagements with nonprofits. Check out the amazing results that started at a DC DataDive last year!
  • Adrenaline and caffeine are great drugs. Free pizza/food.
  • Who knows -- you might win something!

So what's coming up?

Next weekend, January 26th-27th, is a Hackathon focused around issues of domestic violence in Central America. Many of the needs are around technology and software development, but there are some data-focused projects as well. The event is at sites across Central America, as well as here in DC, at the World Bank.

The Bicoastal Datafest, February 2nd-3rd, is organized in part by the Sunlight Foundation, a DC nonprofit focused on government transparency. Participants will dive into campaign contribution data to suggest ways of understanding and communicating the role of money in politics. The DataDive is in NYC and at Stanford, and virtual participants are welcome too. There's plenty of time for a DC-based team to find nonvirtual space as well -- contact us if you'd like to organize this!

Open Data Day is an international hackathon around government data on Saturday, February 23rd, with events held around the world. The DC event sold out, and the organizers went looking for a larger venue to accommodate the amazing number of interested people. Update: registration is once again open for the DC event!

Check back soon for an announcement around another DC-centric DataDive, this one to be held in March, dealing with international development data! And even more hackathons are expected in coming months. Subscribe to this blog or our events calendar, or follow DC2 on Twitter to be sure not to miss any announcements.

Had a good experience around a Hackathon or DataDive? Know of an event we don't yet have listed? Please post a comment!

Hack//Meat or Data Community DC Members go to NYC

This past weekend, five intrepid souls (Robert Vesco, Valerie Coffman, Harlan Harris, Octavian Geagla, and ) from Data Community DC converged on New York City for the first Hack//Meat.

So, what is Hack//Meat? An effort to bring

"together technologists, entrepreneurs, creatives, policy experts, non-profit leaders and industry executives to develop technologies and tools that help democratize meat. Over the course of the weekend, “steakholders” will work with teams to rapidly prototype innovative solutions to business or consumer education challenges in the way meat is produced, processed, distributed, sold and consumed."

From Friday night to Sunday at 4pm, the team slaved away (with special technical awards going to Octavian and Valerie), first brainstorming solutions to the problem of data communication and then building out a prototype, HashMeat.org. Overall, we left NYC well stuffed with meat and having had a fantastic time. Keep reading if you would like a few more details on our project.

The Problem: Meat Monopolies

While everyone knows about the obvious monopolies (duopolies or oligopolies) such as Comcast and Verizon in the cable industry, many may not know of the radical consolidation that has occurred in the meat industry. The team's challenge was to unlock public policy results and data too often buried in PDF files and rarely-read glossy reports.

The Solution: #HashMeat

Telling a story to motivate activism is crucial. Transitioning that storytelling from the cramped confines of PDFs and PowerPoints to the dynamic possibilities of the web and low-barrier activism is the key to kickstarting viral loops via social networks.

HashMeat.org is an HTML5 website powered by jQuery and Twitter that provides a framework for rapidly deploying relevant infographics while drastically reducing the friction for engaged citizens to challenge their government representatives to effect change.

Using HTML5's geolocation functionality, #HashMeat uses your latitude and longitude (and Sunlight Labs' services) to identify the appropriate Senators and Representatives. With each unique infographic, #HashMeat allows the civic-minded viewer to send context-sensitive tweets to these representatives, calling for change. Instrumental to HashMeat is the global leaderboard (still under development) that visualizes each individual's influence.
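To make the tweeting step concrete, here is a minimal sketch of how a pre-filled tweet link could be composed once the representatives are known. It is not the team's code: the site itself ran on jQuery in the browser, while this sketch is written in Python for brevity. It assumes Twitter's web-intent URL is used for the pre-filled tweet, the function name `build_tweet_url` and the handles are hypothetical, and the geolocation and Sunlight Labs lookup are left out entirely.

```python
from urllib.parse import urlencode

# Assumption: the pre-filled tweet is opened through Twitter's web-intent URL.
TWEET_INTENT_URL = "https://twitter.com/intent/tweet"


def build_tweet_url(handles, message):
    """Return a link that opens a pre-filled tweet mentioning each handle.

    `handles` stands in for the result of the location-based lookup,
    which is not shown here.
    """
    # Normalize handles so each one carries exactly one leading "@".
    mentions = " ".join("@" + handle.lstrip("@") for handle in handles)
    text = f"{mentions} {message}".strip()
    # urlencode percent-escapes "@" and "#" and turns spaces into "+".
    return TWEET_INTENT_URL + "?" + urlencode({"text": text})


if __name__ == "__main__":
    # Hypothetical handles and message, for illustration only.
    print(build_tweet_url(
        ["SenExampleOne", "@RepExampleTwo"],
        "Please act on meat industry consolidation. #HashMeat",
    ))
```

Each infographic could supply its own message, which is what would make the tweets context-sensitive; the handles change with the viewer's location.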

The Future:

This advocacy story is only the beginning. The site serves as a possible design concept for different advocacy movements without a voice. The gaps this solution has helped to bridge can serve as an example to other campaigns that are struggling to maintain a social presence or lack provisions with which their audience can take meaningful political action.