Data Innovation DC

Endgame hosts DIDC's Data and Cyber Security August Event

Our (lucky number) 13th DIDC meetup took place at the spacious offices of Endgame in Clarendon, VA. Endgame very graciously provided incredible gourmet pizza (and beer) for all those who attended.

Beyond the excellent food and beverages, attendees were treated to four separate and compelling talks. For those of you who could not attend, a little information about the talks and speakers (including contact information) is below, along with the slides!

DIDC Lean Data Product Development with the US Census Bureau - Debrief and Video

Thank you

I want to thank everyone for attending DIDC's May Meetup event, Lean Data Product Development with the US Census Bureau. This was our first attempt at helping bring potential data product needs to our audience and, based on audience feedback, it will not be our last. That being said, we would love your thoughts on how we could further improve such events in the future.

I want to add a massive thanks not only to our in-person and online panelists, but also to Logan Powell who was a major force in both organizing this event and also acting as the emcee and guiding the conversation.

Video of the Event

If you missed it, a video of the panel and event is available here:

https://www.youtube.com/watch?v=bWWbk5E1Jzg

Information Resources

Finally, below are some follow-up links for those interested.

From Judith K. Johnson, Lead Librarian SBDCNet

From Sara Schnadt

A New Type of Meet Up Event?

Come join us the day after Memorial Day for a new type of Meet Up. In the past, Data Innovation DC and Data Community DC have brought in fascinating speakers discussing data products and services that have already been built or data sets that are now available for public consumption. This Tuesday, we are changing things up as part of the National Day of Civic Hacking. Our goal is to have individuals and teams interested in building commercially viable data products attend and listen to experts strongly familiar with the data problems that consumers of US Census data are having.  Simply put, we are trying to line up problems that other people (also known as potential customers) will pay to have solved.  As a massive added bonus, if your team can put something together before the end of next weekend, you may be able to attract national-level press interest.

Some of the bios for our Tuesday Panelists are below. If you are interested in attending for free, please register here.

Andy Hait

Andrew W. Hait serves as the Data Product and Data User Liaison in the Economic Planning and Coordination Division at the U.S. Census Bureau.  With over 26 years of service at the Bureau, Andy oversees the data products and tools and coordinates data user training for the Economic Census and the Census Bureau’s other economic survey programs. He is also the lead geographic specialist in the Economic Programs directorate.  Andy is the Census Bureau’s inside man for understanding our customers’ needs.

Judith Johnson (Remote)

Judith K. Johnson joins us from the Small Business Administration-funded Small Business Development Center’s (SBDC) National Information Clearinghouse, where she serves as Lead Librarian. She monitors daily incoming operations, provides business information research, and reviews research completed by staff before distribution to SBDC advisors located nationwide.  Ms. Johnson also provides preliminary patent or trademark searches and trains staff and SBDC advisors.  She comes to the panel with a strong handle on entrepreneur / business owner data needs.

Matthew Earls

Matthew Earls, M.U.R.P., is a GIS Analyst at Carson Research Consulting (CRC). His work primarily revolves around the Baltimore DataMind. Mr. Earls is also responsible for managing social media (e.g., Facebook and Twitter) for the DataMind as well as the DataMind blog. He provides assistance with data visualization and mapping for other CRC projects as needed.

Dr. Taj Carson

Dr. Taj Carson is the CEO and founder of Carson Research Consulting (CRC), a research and evaluation firm based in Baltimore. Dr. Carson has been working in the field of evaluation since 1997 and specializes in research and evaluation that can be used to improve organizations and program performance. She is also the creator and driving force behind the Baltimore DataMind, an interactive online mapping tool that allows users to visualize various socio-economic data for Baltimore City at the neighborhood level.

Kim Pierson (remote)

Kim Pierson is a Senior Data Analyst with ProvPlan in Providence, Rhode Island. She has 6 years of experience in data analysis, geospatial information, and data visualization.  She works with community organizations, non-profits, government agencies, and national organizations to transform data into information that supports better decision making, strengthens communities, and promotes a more informed populace.  She specializes in urban-data analysis including demographic, education, health, public safety, and Census data. She has worked on web-based data and mapping applications including the RI Community Profiles, RI DataHUB, and ArcGIS Viewer for Flex applications. She holds an M.A. degree in Urban and Regional Planning from the University of Illinois.

 

The Pragmatic Hackathon - Lean Customer Development for Data Products with the US Census Bureau

Interested in starting a company? It is summertime, the time for sequels. Our first event with the US Census Bureau was such a success that we are having a follow-up event as part of the National Day of Civic Hacking.

In our first Census Event, we had Census data experts come and talk about the data that the US Census Bureau has available and how it could potentially be used to start a company. During that event, it emerged that a number of Census data consumers have real problems with the data they consume; these companies could use help, and that represents a legitimate business opportunity.

At this event, we are going to bring in actual Census data consumers to discuss their data-related problems. Why? Because customer development and finding the data-product market fit are the hard parts of starting a company. By providing access to potential customers who have very specific problems around open data sets, we are trying to lower the barriers for enterprising individuals and teams to start companies. We sincerely hope that teams will form to address these issues and potentially commercialize their solutions.

If you want, think of this as a practical hackathon. Instead of spending the weekend building a small application or website or data visualization, spend a few hours understanding real, addressable business problems that can be commercialized. We will leave it to you and your team to build the solution on your own time but will still provide drinks and the pizza.

Oh yeah, time to mention the last carrot we have to dangle. We will be an official part of the National Day of Civic Hacking. That means that industrious teams that jump in to solve a problem and can assemble something interesting by the end of the weekend could get access to national-level press.

Questions? Please email Sean Murphy through MeetUp.com or reach out via Twitter @SayHiToSean.

Former Obama For America and Living Social Data Scientists Show Off Their Startups - Data Innovation DC - Next Week!

Welcome Back! As a few people have mentioned, DIDC has been missing in action for January and February and, for that, we must apologize. We had an amazing sequence of events planned for the last two months that fell through at the last minute. Now, however, we are back on track and are excited to bring you our first event of the year featuring amazing talks from Erek Dyskant of BlueLabs, Brian Muller of OpBandit, and more! Not only do we encourage you to check out our Data Innovation DC Meetup, we want you to participate. We pride ourselves on having interactive talks and an extensive period of mingling and discussion both before and after the event. We will have food and drink aplenty, and Cooley LLP offers us a simply incredible space to hold the meetup. Later this spring we will be featuring both Cyber Security and Business Intelligence focused events. If you know of anyone who might be interested in presenting, please let me know at sayhitosean [@] gmail.com.

Event Details

Register Here!

Date: 3/26/2014

6:00 - 6:45pm Networking

6:45 - 6:50pm Introduction

6:50 - 8:00pm Speakers

Speaker Bios

Erek Dyskant

Erek is a Co-Founder at BlueLabs, an analytics and technology company dedicated to using data science techniques for improving social good.  Prior to BlueLabs, he was the Technology Lead for Geospatial Analytics at Obama for America. Erek finds that many of the best applications of big data processes not only start with huge datasets, but result in large, nuanced predictions such as individual-level behavioral likelihoods, hyperlocal climate change predictions, patient-level treatment recommendations, or locations where election-related violence may (or may not) have occurred.  He works to expand the role of data-driven decisions beyond strategic decisions made in boardrooms to tactical ones made by field organizers, case managers, and community health workers.

Brian Muller

Brian is the Co-Founder and CTO of OpBandit, a company that helps publishers and marketers increase their online engagement with responsive content delivery. Prior to founding OpBandit, he was the Lead Data Scientist at LivingSocial. While at LivingSocial, he founded the data science team and oversaw the creation and growth of a big data infrastructure and the teams necessary to support it - all while the customer base grew from thousands to over 70 million users. Before that, he worked as the Web Director for Foreign Policy Magazine under the Washington Post. Brian has also worked as an adviser and consultant for a variety of groups including PBS, the Carnegie Foundation, and the State of South Carolina. He has an MS in the Biomedical Sciences, and has spent time in academia working for the Medical University of South Carolina and Johns Hopkins University School of Medicine, focused on squeezing meaningful information out of vast quantities of genomic data.

DIDC MeetUp Review - The US Census Bureau Pushes Data

Data Community DC is excited to welcome Andrea to our host of bloggers. Andrea's impressive bio is below and she will be bringing energy, ideas, and enthusiasm to the Data Innovation DC organizational team.

Census Data is cool?

At least that’s what everyone discovered at last night’s Data Innovation DC MeetUp. The U.S. Census Bureau came in to "reverse pitch" their petabytes of data to a group of developers, data scientists and data-preneurs at Cooley LLP in Downtown DC.

First off, let's offer a massive thanks to the US Census Bureau that sent five of their best and brightest to come engage the community long into the evening and late night hours. Who specifically did they send? Just take a look at the impressive list below:

(Image: list of US Census Bureau contacts)

Editor's note - a special thank you to Logan Powell who made this entire event possible.

And they brought the fantastic Jeremy Carbaugh (jcarbaugh [at] sunlightfoundation.com) from the Sunlight Foundation, an organization working on making census data (and other government data) interesting, fun, and mobile. They have this sweet app called Sitegeist. You give it a location and it gives you impressive stats such as the history of the place, how many people are baby making, or just living the bachelor lifestyle; it even connects to Yelp and wunderground too, just in case you need the weather and a place to grab a brewski while you’re at it. Further, Eric at the Census Bureau made a great point for everyone out there in real estate. You can use this app to show potential buyers how the demographics in the area have changed, good school districts, income levels, number of children per household, etc. You know you’ll look good whipping out that tablet and showing them ;)

By the way, Sunlight created a very convenient Python wrapper for the Census API; you can pip it off of PyPI and check out the source on GitHub here (a round of applause for our Sunlight folks!). Did I mention that they are a non-profit doing this with far less funding than many others out there?

Sitegeist is nice but exactly how accessible is the Census Data?  I am glad you asked. The Census Bureau offers two approaches, their American FactFinder and their API, both easy to use. The FactFinder is good for perusing what you may find interesting before actually grabbing the data for yourself. The API is like the Twitter version 1 API. You get a key and use stateless HTTP GET requests to pull the data via the web. For those non-API folks, I’ll be posting a how-to shortly.
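To make the "get a key and send a GET request" idea concrete, here is a minimal sketch in Python using only the standard library. Treat the details as assumptions rather than gospel: the dataset path `2010/dec/sf1`, the variable `P001001` (total population), and the `YOUR_KEY` placeholder are illustrative choices of mine, and the helper names `build_census_url`, `rows_to_records`, and `fetch_records` are hypothetical. Check the Census API documentation for the datasets and variables you actually want.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the Census Data API (assumed here; confirm in the official docs).
BASE_URL = "https://api.census.gov/data"


def build_census_url(dataset, variables, geography, api_key):
    """Build a stateless GET URL: which variables, for which geography, with which key."""
    query = urlencode(
        {"get": ",".join(variables), "for": geography, "key": api_key},
        safe=",:*",  # keep commas, colons and wildcards readable in the URL
    )
    return f"{BASE_URL}/{dataset}?{query}"


def rows_to_records(rows):
    """Turn a list of rows (first row is the header) into a list of dicts."""
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def fetch_records(url):
    """Send the GET request and parse the JSON response (requires network and a valid key)."""
    with urlopen(url) as response:
        return rows_to_records(json.load(response))


# Illustrative request: total population and name for every state.
url = build_census_url("2010/dec/sf1", ["P001001", "NAME"], "state:*", "YOUR_KEY")
print(url)
```

Swap in your own key and call `fetch_records(url)` to pull the data. Each request carries everything the server needs in the URL itself, which is what "stateless" means here.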

The Census Bureau also has its own fun mobile app called America's Economy.

Alright, so we’ve got some data, we’ve got some ways to get it, but what’s up with the reverse pitch thing? This was the best part, as everyone had awesome ideas.

Some questions included:

Can we blend WorldBank and Federal Reserve Bank data to get meaningful results?

This came from a guy who was already building some nice apps around WB and Fed data. The general consensus was "yes," a lot of business value can come from that, but they need folks like us to come up with use-cases. So, thoughts? Please comment and tinker away.

What about the geospatial aspects of the data?

There were a lot of questions around the GIS mapping data and some problems with drilling down on the geospatial data to block sizes or very small lots of land. People seem really interested in getting this data for things like understanding how diseases spread, patterns of migration, etc. The Census folks said that with the longer-term surveys you can definitely get down to the block level but, because boundaries and borders can be defined differently across the nation, it is very difficult to normalize the data. Another use-case? A herculean effort? Hmm... food for thought. Also, shortly after the event, someone posted this on geo-normalization in Japan. Thanks Logan!

Editor's note: More information on US Census Grids can be found here.

How does Census data help commercial companies?

There was a great established use case where the Census helped Target Retail understand their demographic. That blew me away. The gov’t and a private retail company working to make a better profit, a better product? This definitely got my creative juices flowing, hopefully it will get everyone out there cogitating too.

https://www.youtube.com/watch?v=jgsdQxTv5kY

or, check out this case study from the National Association of Homebuilders:

http://www.youtube.com/watch?v=CBDmE5Nj0BY

and last but not least, an example of Census data helping disaster relief (not really commercial but Logan didn't get a chance to show all of his videos):

http://www.youtube.com/watch?v=PaEu8-xH9LE

We finally had people talking about the importance of longitudinal studies.

What is different now for our nation in terms of demographics, culture, and geography from 20-30-50 years ago? Just imagine some really cool heat map or time series visualization of how Central Park in NY or Rock Creek in DC has changed…yes I am saying this so someone actually goes out and gives that one a go. Don’t worry you can take the credit ;)

Oh, and I almost forgot: due to obvious privacy issues, a lot of the data is pre-processed so you can’t stalk your ex-boss/boyfriend/girlfriend. But, listen up! If you are in school and doing research and want to get your hands on the microdata, you can apply. Go to this link and check it out (http://www.census.gov/ces/rdcresearch/howtoapply.html). For those of you stuck on a thesis topic in any domain that may need information about society, cough cough, nudge nudge ...

So there you have it, these are the kinds of meetups happening at Data Innovation DC. I don’t know about you, but I definitely have a new perspective on government data. I also feel a little more inclined to open my door when those census folk drop by and give them real answers.

Please comment as you see fit and send me questions.  Also, JOIN Data Innovation DC and check out Data Community DC with all of its other related data meetup groups. Let us know what kind of information you want to know about and what issues/topics you want us to address.

I’m new to the blog/review game but will continue to review meetups and some hot topics, podcasts etc. that I think need to be checked out. Let me know if you want me to speak to anything in particular.

Women in Data - A Special Event on Monday, September 23rd

Data Innovation DC is excited to bring you a very special event: Women in Data.  As many Meetup.com members may have noticed, tech-oriented events, including many offered by Data Community DC, tend to have male-dominated audiences. While much has been said about this imbalance in the traditional STEM fields and computer science, not much discussion has occurred in the burgeoning area of data science but there is clearly a need. This state of affairs is especially unfortunate as a number of the "feeder" fields--areas of traditional academic focus that provide a great foundation for data science--have near equal numbers of men and women (or even more women). Thus, unlike computer science, there are a ton of talented female practitioners already practicing what many would consider data science but they are not using the name.  Imagine how much stronger our growing data community could be if this group of women who are currently consultants, analysts, economists, epidemiologists, sociologists, political scientists, anthropologists and many more, joined our ranks.

Keeping focused on the solution, Data Innovation DC felt that if we showcased the very impressive data-related accomplishments of three current data practitioners who just happen to be women, we could attract more women to our MeetUp and entreat them to join and improve our community.

The event will take place Monday, the 23rd of September at the fantastic office of Cooley LLP at 1299 Pennsylvania Ave, Washington DC (very metro accessible). The event will start at 6pm with some food and drink and networking with speakers starting close to 7pm. A list of speakers follows and more event details are here; if you have any thoughts or comments, please post them below.

Speakers

Dr. Jane L. Snowdon is Director and Chief Innovation Officer for IBM U. S. Federal Government in Washington D.C.  She has responsibility for defining and driving strategy, and designing new solutions that address client mission requirements through innovation and technology adoption.

Prior to this role, Jane was a Senior Manager and Research Staff Member in the Department of Strategy and Worldwide Technical Operations at the IBM T. J. Watson Research Center in Yorktown Heights, NY.  She was responsible for jointly leading the direction of IBM’s overall Research strategy across twelve global labs and the 2013 Global Technology Outlook.

Jane has been a leader in IBM for developing strategies and driving research efforts worldwide to create innovative solutions for smarter buildings as part of IBM’s Smarter City initiative for which she received an IBM Research Division Outstanding Technical Achievement Award.  Jane was instrumental in defining a partnership with Columbia, CUNY, and NYU for research collaboration, which was announced by Mayor Bloomberg, to help address New York City’s energy challenges.  She initiated and successfully completed two projects on energy analytics for large portfolios of buildings with 1,200 K-12 public schools in a major city and a university campus of 60 buildings covering a combined 150 million square feet.

Dr. Marie desJardins is a Professor in the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County, where she has been a member of the faculty since 2001. Her research is in artificial intelligence, focusing on the areas of machine learning, multi-agent systems, planning, interactive AI techniques, information management, reasoning with uncertainty, and decision theory.

Dr. desJardins received her Ph.D. in 1992 from the University of California, Berkeley. Her dissertation research focused on developing techniques to allow autonomous agents to learn useful models of their environments without being supervised by a human designer. From 1991-2001, she was a member of the research staff at SRI International in Menlo Park, California.

Dr. desJardins has published over 100 scientific papers in journals, conferences, and workshops. She is an Associate Editor of the Journal of Artificial Intelligence Research and the Journal of Autonomous Agents and Multi-Agent Systems, a member of the editorial board of AI Magazine, and the Program Cochair for AAAI-13. She is an ACM Distinguished Member and a AAAI Senior Member.

Dr. desJardins is active in teaching, mentoring, and advising students. She was named one of UMBC's ten "Professors Not to Miss" in 2011, and is regularly sought out to give invited talks to student groups. She has been involved with the AAAI/SIGART Doctoral Consortium as chair, co-chair, mentor, and/or reviewer from 2001-2012. She has advised 11 Ph.D. students (10 to completion), 25 M.S. students (22 to completion), over 50 undergraduate researchers, and four high school student interns.

Melinda Wittstock is a serial entrepreneur, media-tech executive, ‘recovering’ journalist and evangelist on all things data-driven, mobile and social. Her passion is cracking the code of 'what' and 'who' you can trust in the fast-flowing torrent of social data. Melinda is replicating with algorithms and processes what she did as an award-winning investigative journalist to be first and right: Verifeed is all about detecting patterns to put data in context, qualifying source reputation and filtering for relevance and impact.  She began her quest by founding NewsiT, which has grown from its original vision as a crowd-sourcing mobile app into the ever more sophisticated Verifeed social intelligence platform for filtering real-time insights, actionable information and credible influencers.

Melinda thrives on putting the seemingly impossible into practice, a trait first recognized by her grandmother who called her “disruptive” at age 5 when she knocked on doors to secure $100 in pre-payments for her first business. Over the years since, she’s developed interactive websites and tools, driven ratings and traffic, created award-winning content, developed new models for news and content creation, managed teams large and small, sold content and advertising, and won awards for her print and broadcast journalism. At 22 she was busy breaking big stories as a business and media correspondent of the Times of London, and never looked back – whether anchoring, producing or creating programming for Financial Times/CNBC Europe, BBC World TV, and ABC World News Now. In 2002, she launched Capitol News Connection, an award-winning independent production company supplying public radio stations, TV outlets and newspapers with ‘localized’ news from Congress. Melinda is also a mom of two great young kids (and a golden retriever) and she’s become an acolyte of what can only be described as the agile methodology of ‘work-life’ integration.

You can also find her on LinkedIn, Twitter (@NewsiTnews | @veriate), and Facebook.

Building a Business on Open Government Data

This is a guest post featuring an impressive data-oriented startup in the DC Area. GovTribe is a small, DC-area startup with a big goal: we want to impact the world of federal government contracting through smart, accessible technology. While perhaps not the sexiest of industries to focus on, government contracting is a roughly $500 billion market. Supported by woefully antiquated (and bafflingly expensive) technology, it also relies heavily on open government data. To us, this creates an environment primed for disruption. Given current trends in data-driven decision-making and technology consumerization, how could such a market remain unchallenged by scrappy startups like us?

The founders of GovTribe spent nearly a decade in the business of federal government contract services. Working for a big four management consulting firm, we toiled away in the daily pursuit and execution of government contracts. Our experiences, both bad and good, helped to inform our first product: hōrd.  In short, the hōrd iPhone app provides an easy, affordable, and portable method for quickly understanding what is happening in the world of government contracting. We move beyond the walled gardens of protected email distribution lists and expensive websites. With contract data from 130 federal agencies, hōrd enables users to subscribe to real-time government contract activity feeds. And we make all of this natively available on a mobile device for the first time.

In order to deliver this service, we mine and process a significant amount of open government data from multiple sources. Through a combination of proprietary code, helpful third-party services, and incalculable hours in an Arlington, VA basement, we parse and add value to free, publicly available government data. And the response from the industry has been good.

GovTribe recently attended a Meetup hosted by Data Innovation DC that focused on creating value from government data. It was great to hear from other entrepreneurs also proceeding into the open data market. They're taking the leap that value can be delivered on top of free government data, and that viable business models can be developed around that concept. As the collective open government data set continues to grow, this trend will and must continue. In light of this, we thought we'd share a few lessons that informed our product vision and market approach.

  • The Data is Secondary to the Problem Being Solved. A lot of government data has been made public. Now what? How can it be incorporated into current decision-making or processes? What new activities can be enabled or feats accomplished that were not possible before? Simple access to the data, even with the slickest of interfaces, will be of little use to the majority of consumers. Action-oriented services supported by open government data are where the real value lies. What problems do you want to address?
  • Customers Don’t Care That You Mine Open Government Data. Save the “built on open government data” pitch for the VCs. Customers only care about how your product makes their lives easier and more efficient. Early on, the open government data angle was a significant aspect of our branding. We were proud of our accomplishments in the challenging world of semi-structured and unstructured data and thought our customers would care. They didn't, so much.
  • You Don’t Have to Be a Data Scientist Or Programmer to Do This. Perhaps you have an interesting idea that could be brought to life by remixing open government data but don't consider yourself to be a developer or a data wonk. Some might see this as a reason to shelve that idea and go back to watching season two of Game of Thrones (again).  Fear not. The amount of freely available online resources, technologies, and machine-readable (trust us, this matters) data sources grows exponentially by the day. Sure, the learning curve may seem a bit steep, but with a bit of work and persistence anyone can get into the game. And frankly, there's plenty of room on the field.

GovTribe has big plans, and they include going deeper into the data. We will soon be looking for folks who are interested in data science, software development, and the promise of open government data.  If you are interested in what we do and want to learn more, drop us a line over your favorite social media platform or send us an email at whatshappening@govtribe.com. We believe there is a bright future ahead for the data community writ large and are doing our best to be a part of it.

 

GovTribe is an Arlington, Virginia-based firm that specializes in turning open government data into products real people can use. Incorporated in 2012 by three former Big Four government consulting alums, the GovTribe team strives to improve the enterprise IT experience through business applications built and priced for the end user.

 

 

ConnecTech, the DC Government, Small Business Innovation Research, and Data Community DC

One way that smaller and startup firms can become more specialized is to enhance their service offerings through research and development (R&D). While R&D typically requires internal investment, the federal government has a program in place that will award grants or contracts to small businesses to pursue R&D efforts on its behalf. The Small Business Innovation Research (SBIR) program is a highly competitive program that encourages domestic small businesses to engage in Federal Research/Research and Development (R/R&D) that has the potential for commercialization.

How It Works

The SBIR program awards contracts or grants to small businesses in three phases. Phase I is typically an award of $150,000 for six months to establish the technical merit, feasibility, and commercial potential of the proposed R/R&D effort. Phase II is typically an award of $1,000,000 for two years to continue the R/R&D efforts initiated in Phase I, usually toward a refined prototype. Phase III is full commercialization. The SBIR program does not fund Phase III work. However, for some federal agencies, Phase III may involve continuing, non-SBIR funded R&D or production contracts for products, processes or services intended for use by the U.S. Government.

ConnecTech

The District of Columbia recently began a new program called ConnecTech that aims at engaging more District businesses with the SBIR program through a variety of offerings. Primarily, ConnecTech will provide training to entrepreneurs and companies interested in the SBIR program. The training sessions are focused on topics that will help better position firms for a successful Phase III transition even before submitting the Phase I bid. This includes teaching companies how best to identify topics that are likely to be transitioned to Phase III and selecting the right partners for the R&D effort.

Still Not Convinced?

Ultimately, with groups like ConnecTech offering SBIR support, there is little reason not to participate in the program. In addition to commercialization potential, here are three more reasons why your firm should consider the SBIR program:

  1. Non-Dilutive Capital: For startups and small companies, taking on additional capital can mean dilution. SBIR funding is neither equity nor debt, so it is an excellent vehicle for companies to raise capital and further validate their business model.
  2. Reduced Overhead: SBIR efforts require a Principal Investigator (PI) to lead the research. Some firms believe this must be a person with an academic background and a PhD; however, that is not the case. Many times a firm’s Chief Technology Officer or a Sr. Engineer can serve as PI and have some of their hours allocated to the SBIR project instead of another cost center such as overhead.
  3. New Client Relationships and Commercial Work: SBIR projects can represent a way into new clients without the long window that is typically required. Additionally, a requirement of most SBIR bids is a commercialization plan that is focused on the private sector. As a part of developing the plan, the company will be creating a pathway to private sector business that can be expanded and create differentiation away from the federal market.

Topics for Data Community DC

Here is a list of topics currently open from various agencies that may be interesting to Data Community DC blog readers:

The SBIR program leverages the agility and creativity of America’s small businesses and fosters the kind of innovation that business requires. It provides a pathway that allows entrepreneurial firms to create their next opportunity and cultivates a corporate culture focused on outside-the-box thinking. With so many resources available to provide SBIR support, more and more companies are beginning to understand the full capabilities the program offers, and yours should as well.

Please feel free to contact Philip Reeves, Manager of Small Business Technology and Innovation at the DC Department of Small and Local Business Development, with questions: http://dslbd.dc.gov/connectech.

 

 

Why Aren't There More Open Data Startups?

This post is a guest reblog (with permission; originally posted 1/19/2011) by Tom Lee, the Director of Sunlight Labs and recent speaker at Data Innovation DC.

It's a question I'm seeing asked more and more: by press, by Gov 2.0 advocates, and by the online public. Those of us excited by the possibilities of open data have promised great things. So why is BrightScope the only government data startup that anyone seems to talk about? I think it's important that those of us who value open data be ready with an answer to this question. But part of that answer needs to address the misperceptions built into the query itself.

There Are Lots of Open Data Businesses

BrightScope is a wonderful example of a business that sells services built in part on publicly available data. They've gotten a lot of attention because they started up after the Open Government Directive, after data.gov -- after Gov 2.0 in general -- and can therefore be pointed to as a validation of that movement.

But if we want to validate the idea of public sector information (PSI) as a useful foundation for businesses in general, we can expand our scope considerably. And if we do, it's easy to find companies that are built on government data: there are databases of legal decisions, databases of patent information, Medicare data, resellers of weather data, business intelligence services that rely in part on SEC data, GIS products derived from Census data, and many others.

Some of these should probably be free, open, and much less profitable than they currently are*. But all of them are examples of how genuinely possible it is to make money off of government data. It's not all that surprising that many of the most profitable uses of PSI emerged before anyone started talking about open data's business potential. That's just the magic of capitalism! This stuff was useful, and so people found it and commercialized it. The profit motive meant that nobody had to wait around for people like me to start talking about open formats and APIs. There are no doubt still efficiencies to be gained in improving and opening these systems, but let's not be shocked if a lot of the low-hanging commercial fruit turns out to have already been picked.

Still, surely there are more opportunities out there. A lot of new government data is being opened up. Some of it must be valuable... right?

Government Does What The Market Won't

Well, sure. Much of it is extremely valuable. But it may not be valuable to entrepreneurs. To understand why, we need to get a little philosophical. What does government do? It provides public goods: things of value that the market is not able to adequately supply on its own. A standing army and public schools and well-policed streets and clean water are all things that are useful to society as a whole, but which the market can't be relied upon to provide automatically. So we organize government as a structure that can provide those kinds of things, and which will make sure that everyone can benefit from them in a way that's fair.

These are not ideal conditions under which to start a business: the fact that the government is the one collecting a particular type of data may mean that no one is interested in buying it -- a natural market for the data doesn't exist in the way that it does for, say, sports scores or stats about television viewership. And, even if you create a business that takes advantage of the subsidy represented by government involvement (data collected at taxpayer expense, resold at low, low prices!), your long-term prospects may still be poor since there's no way to deny competitors access to the same subsidy**. Someone else can come along and undercut you, and there's nothing you can do about it except be better and cheaper. That's great for the consumer, but not so great for people hoping to start a lucrative business. (Those who think BrightScope is a counterexample should have a closer look at their about page: they utilize a mix of public data, data that they laboriously capture themselves, and data bought from subscription services.)

Data's Real Value Can Be Hard To Measure

I'll be glad to see more open data startups -- and to be clear, I think we will see more. But the open data movement will be important regardless of whether any IPOs come out of it.

There are lots of types of value that are difficult to measure. If the IRS puts forms online, taxpayers have to spend less time waiting in line at the post office. If Census data reveals where a retailer's new store should go, it can mean profits for shareholders and more jobs for the community. If scientific data's openness allows more researchers to engage with a question, it can lead to better conclusions, better policies and better outcomes. If regulatory data about companies is public, it can give firms an incentive to self-police and help markets price things correctly.

All of these are real benefits, but they can be difficult or impossible to calculate -- and tough for a startup to monetize. Still, this is where I think the really exciting benefits to open data are likely to be found. If government data helps entrepreneurs make money, that's great. If it makes our country work better, that's fantastic.

* Historically, many gov data vendors have made money off of the data's artificial scarcity -- a legacy that we must unravel, even though doing so will be politically difficult: openness's benefits to the public will probably mean less revenue for the vendors.

** There shouldn't be, anyway -- in practice, public/private partnerships often fall short of this goal.