Event Recap: Tandem NSI Deal Day (Part 2)

This is the second part of a guest post by John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC Meetup. Tandem NSI is a public-private partnership between Arlington Economic Development and Amplifier Ventures. According to the TNSI website, the partnership is intended to foster a vibrant technology ecosystem that combines entrepreneurs, university researchers and students, national security program managers and the supporting business community. I attended the Tandem NSI Deal Day on May 7; this post is a summary of a few discussions relevant to DC2.

In part one, I discussed the pros and cons of starting a tech business in the DC region; in this post, I'll discuss the specific barriers to entry of which entrepreneurs focusing on obtaining federal contracts should be aware when operating in our region, as well as ideas for how interested members of our community can get involved.

Barriers to innovation and entrepreneurship for federal contractors

One of the first talks of the day came from SpaceX's Deputy General Counsel, David Harris. It captured in one slide an issue all small technology companies operating in the federal space face, namely the FAR (Federal Acquisition Regulations). Specifically, David simply counted the number of clauses in different types of contracts, including standard Cooperative Research and Development Agreements, Contract Service Level Agreement Property Licenses, SpaceX's Form LSA, and a commercial off-the-shelf procurement contract. Each of these contracts generally has 12 to 27 clauses. As a bottom line, he compared these to the number of clauses in a traditional FAR fixed-price contract with one cost-plus Contract Line Item Number: more than 200 clauses. In discussion, there was even a suggestion that the federal government might want to reexamine how it does business with smaller technology companies to encourage innovators to spend time innovating rather than parsing legalese. The tacit message was that the FAR may go too far. Add to the FAR the requirements of the Defense Contract Audit Agency and sometimes months-long contracting delays, and you have created a heavy legal and accounting burden on innovators.

Peggy Styer of Blackbird also told a story about how commitment to mission and successful execution for the government can sometimes narrow the potential market for a business. A paraphrase of Peggy's story: It's good to be focused on mission, but there can be strategic conflict between commercial and government success. As an example, when they came under fire in theatre, special ops forces were once expected to carry a heavy tracking device the size of a car battery and run for their lives into the desert where a rescue team could later find and retrieve them. Blackbird miniaturized a tracking device with the same functionality, which made soldiers on foot faster and more mobile, improving survivability. The US government loved the device. But they loved it so much they asked Blackbird to sell to the US government exclusively (and not to commercialize it for competitors). This can put innovators for the government in a difficult position with a smaller market than they might have expected in the broader commercial space.

Dan Doney, Chief Innovation Officer at the Defense Intelligence Agency, described a precedent “culture” of the “man on the moon” success that was in many ways a blueprint for how research is still conducted in the federal government. Specifically, putting a man on the moon was a project of a scale and complexity only our coordinated US government could manage in the 1960s. To accomplish the mission, the US government collected requirements, matched requirements with contractors, and systematically filled them all. And that was a tremendous success. However, almost 50 years later, a slavish focus on requirements may be the problem, Dan argued. Dan described “so much hunger” among our local innovative entrepreneurs to solve mission-critical problems that, in order to exploit it, the government needs to eliminate the “friction” from the system. Dan argued eliminating that “friction” has been shown to get enormous results faster and cheaper than traditional contracting models. He continued: “our innovation problems are communication problems,” pointing out that Broad Agency Announcements (how the US government often announces project needs) are terrible abstractions of problems to be solved. The overwhelming jumble of legalese that has nothing to do with technical work was also discussed as a barrier for technical minds: just finding the technical nugget the BAA is really asking for is an exhausting search across all the FedBizOpps announcements.

There was also a brief discussion of how contracts can become inflexible handcuffs that focus contractors on “hitting their numbers” on the tasks a PM originally thought they should solve at the time of contracting, even when it becomes clear in the course of a program that a contractor should now be solving other, more relevant problems. In essence, contractors are asked to ask and answer relevant research questions, and research is executed with contracts, but those contracts often become counterproductively inflexible for asking and answering research questions.

What can DC2 do?

  1. I only recognized three DC2 participants at this event. With a bigger presence, we could be a more active and relevant part of the discussion on how to incentivize government to make better use of its innovative entrepreneurial resources here in the DMV.
  2. Deal Day provided a forum to hear from both successful entrepreneurs and the government side. These panels documented some strategies for how some performers successfully navigated those opportunities for their businesses. What Deal Day didn’t offer was a chance to hear from small innovative startups on what their particular needs are. Perhaps DC2 could conduct a survey of its members to inform future Tandem NSI discussions.

Event Recap: Tandem NSI Deal Day (Part 1)

This is a guest post by John Kaufhold. Dr. Kaufhold is a data scientist and managing partner of Deep Learning Analytics, a data science company based in Arlington, VA. He presented an introduction to Deep Learning at the March Data Science DC Meetup. Tandem NSI is a public-private partnership between Arlington Economic Development and Amplifier Ventures. According to the TNSI website, the partnership is intended to foster a vibrant technology ecosystem that combines entrepreneurs, university researchers and students, national security program managers and the supporting business community. I attended the Tandem NSI Deal Day on May 7; this post is a summary of a few discussions relevant to DC2.

The format of Deal Day was a collection of speakers and panel discussions from both successful entrepreneurs and government representatives from the Arlington area, including:

  • Introductions by Arlington County Board Chairperson, Jay Fisette, and Arlington House Representative Jim Moran;
  • Current trends in mergers and acquisitions and business acquisitions for national security product startups;
  • “How to Hack the System,” a discussion with successful national security product entrepreneurs;
  • “Free Money,” in which national security agency program managers told us where they need research done by small business and how you can commercialize what you learn; and
  • “What’s on the Edge,” in which national security program managers told us where they have cutting edge opportunities for entrepreneurs that are on the edge of today’s tech, and will be the basis of tomorrow’s great startups.

There were two DC2-relevant themes from the day that I’ve distilled: the pros and cons of starting a tech business in the DC region, and the specific barriers to entry of which entrepreneurs focusing on obtaining federal contracts should be aware when operating in our region. This post will focus on the first theme; the second will be discussed in Part 2 of the recap, later this week.

Startups in the DC Metropolitan Statistical Area vs. “The Valley”

A lot of discussion focused on starting up a tech company here in the DC MSA (which includes Washington, DC; Calvert, Charles, Frederick, Montgomery and Prince George’s counties in MD; and Arlington, Fairfax, Loudoun, Prince William, and Stafford counties as well as the cities of Alexandria, Fairfax, Falls Church, Manassas and Manassas Park in VA) versus the Valley. Most of the panelists and speakers had experience starting companies in both places, and there were pros and cons to both. Here's a brief summary in no particular order.

DC MSA Startup Pros

  • Youth! According to Jay Fisette, Arlington has the highest percentage of 25-34 year olds in America.
  • Education. Money magazine called Arlington the most educated city in America.
  • Capital. The concentration of many high-end government research sponsors--the National Science Foundation, Defense Advanced Research Projects Agency, Intelligence Advanced Research Projects Agency, the Office of Naval Research, etc.--can provide early-stage, non-dilutive research investment.
  • Localized impact. Entrepreneurial aims are often US-centric, rather than global.
  • A mission-focused talent pool.
  • A high concentration of American citizens and cleared personnel.
  • Local government support. As an example, initiatives like ConnectArlington provide more secure broadband for Arlington companies.

DC MSA Startup Cons

  • Localized impact. Entrepreneurial aims are often US-centric, rather than global. (Yes, this appears on both lists!)
  • Heavy regulations. Federal Acquisition Regulations (FAR) and Defense Contract Audit Agency accounting requirements can complicate the already difficult task of starting a business.
  • Bureaucracy. It’s DC. It’s a fact.
  • Extremely complex government organization with significant personnel turnover.
  • Less experienced “product managers.”

Silicon Valley Startup Pros

  • Venture capitalists and big corporations are “throwing money at you” in the tech space.
  • Plenty of entrepreneurial breadth.
  • Plenty of talent in productization.
  • Plenty of experience in commercial projects.
  • Very liquid and competitive labor market--which is great for individual employees.
  • Aims are often global, rather than US-centric.
  • Compensation is unconstrained by government regulation.
  • Great local higher education infrastructure: Berkeley, UCSF, National Labs, Stanford...

Silicon Valley Startup Cons

  • Very liquid and competitive labor market--which means building a loyal, talented team can be a struggle.
  • VC and big-corporation investments are unsustainably frothy.
  • Less talent in or exposure to federal contracting.
  • A smaller pool of American citizens and cleared personnel.

Check back later this week to find out what TNSI Deal Day panelists had to say about stumbling blocks to obtaining federal contracts!

Energy Education Data Jam

This is a guest post by Austin Brown, co-organizer of the Data Jam, and senior analyst with the National Renewable Energy Laboratory. DC2 urges you to check this out. (And if you participate, please let us know how it goes!) The Office of Energy Efficiency and Renewable Energy (EERE) at the U.S. Department of Energy (DOE) is hosting an “Energy Education Data Jam,” which will take place on Thursday, March 27, 2014, from 9am to 4pm, in Washington D.C. This is an event that could really benefit from the participation of some more talented developers, data folks, and designers.

The event features presentations from a great set of experts and innovators: Aneesh Chopra, former White House CTO; Dr. Ed Dieterle, Gates Foundation; Dr. Jan DeWaters, Clarkson University; Dr. Cindy Moss, Discovery Education; and Diane Tucker, Wilson Center.

In the growing ecosystem of energy-related data jams and hackathons, this one will be distinct in that it is targeted toward improving the general understanding of the basics of energy in the U.S., which we have identified as a key obstacle to sensible long-term progress in energy.  We hope that what emerges from this data jam will be applicable to learners of any age – from preschool to adult learners.

EERE is working to amplify our approach to help improve energy understanding, knowledge, and decision-making. To address the measured gap in America's energy literacy, we plan to unite energy experts with the software, visualization, and development communities. This single-day event will bring developers and topic experts together with the goal of creating innovative products and partnerships to directly address energy literacy going forward.

The goal of the data jam is to catalyze development of tools, visualizations, and activities to improve energy literacy by bringing together:

  • Developers and designers who understand the problems presented by the energy literacy gap, and have a desire to bring about change
  • Educators with knowledge of how students learn, how energy is taught, and ideas about how we can bridge the energy literacy gap
  • Energy experts with a high-level understanding of the energy economy and who are capable of deconstructing complicated energy data
  • Energy foundations and nonprofits committed to clean energy and an understanding that education can be the first step towards a clean energy economy

No prior experience in energy education is required – just an innovative mindset and a readiness to try to change the thinking on spreading the word about energy.

If you have any questions or would like to RSVP, please send an email. You can also RSVP through Eventbrite. This event will strive for participation from a number of different backgrounds and expertise and, as such, space will be limited. We ask that you kindly respond as soon as possible. Lunch will be provided.

August 2013 Data Science DC Event Review: Confidential Data

This is a guest post by Brand Niemann, former Sr. Enterprise Architect at EPA, and Director and Sr. Data Scientist at Semantic Community. He previously wrote a blog post for DC2 about the upcoming SOA, Semantics, and Data Science conference. The August Data Science DC Meetup provided the contrasting views of a data scientist and a statistician to a controversial problem about the use of "restricted data".

Open Government Data can be restricted because of the Open Data Policy of the US Federal Government as outlined at

  • Public Information: All datasets accessed through are confined to public information and must not contain National Security information as defined by statute and/or Executive Order, or other information/data that is protected by other statute, practice, or legal precedent. The supplying Department/Agency is required to maintain currency with public disclosure requirements.
  • Security: All information accessed through is in compliance with the required confidentiality, integrity, and availability controls mandated by Federal Information Processing Standard (FIPS) 199 as promulgated by the National Institute of Standards and Technology (NIST) and the associated NIST publications supporting the Certification and Accreditation (C&A) process. Submitting Agencies are required to follow NIST guidelines and OMB guidance (including C&A requirements).
  • Privacy: All information accessed through must be in compliance with current privacy requirements including OMB guidance. In particular, Agencies are responsible for ensuring that the datasets accessed through have any required Privacy Impact Assessments or System of Records Notices (SORN) easily available on their websites.
  • Data Quality and Retention: All information accessed through is subject to the Information Quality Act (P.L. 106-554). For all data accessed through, each agency has confirmed that the data being provided through this site meets the agency's Information Quality Guidelines.
  • Secondary Use: Data accessed through do not, and should not, include controls over its end use. However, as the data owner or authoritative source for the data, the submitting Department or Agency must retain version control of datasets accessed. Once the data have been downloaded from the agency's site, the government cannot vouch for their quality and timeliness. Furthermore, the US Government cannot vouch for any analyses conducted with data retrieved from

Federal Government Data is also governed by the Principles and Practices for a Federal Statistical Agency Fifth Edition:

Statistical researchers are granted access to restricted Federal Statistical and other data on condition that their public disclosure will not violate the laws and regulations associated with these data, otherwise the fundamental trust involved with the collection and reporting of these data is violated and the data collection methodology is compromised.

Tommy Shen, a data scientist and the first presenter, commented afterwards: "One of the reasons I agreed to present yesterday is that I fundamentally believe that we, as a data science community, can do better than sums and averages; that instead of settling for the utility curves presented to us by government agencies, can expand the universe of the possible information and knowledge that can be gleaned from the data that your tax dollars and mine help to collect without making sacrifices to privacy."

Daniell Toth, a mathematical statistician, described the methods he uses in his work for a government agency as follows:

  • Identity
    • Suppression; Data Swapping
  • Value
    • Top-Coding; Perturbation;
    • Synthetic Data Approaches
  • Link
    • Aggregation/Cell Suppression; Data Smearing

His slides include examples of each method and he concluded:

  • Protecting data always involves a trade-off of utility
  • You must know what you are trying to protect
  • We discussed a number of methods – the best depends on the intended use of the data and what you are protecting
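
To make the utility trade-off concrete, here is a minimal sketch of three of the methods named above: top-coding, perturbation, and cell suppression. It is not taken from Daniell's slides. The function names, the cap, the noise range, the threshold, and the sample values are all made-up assumptions, and real disclosure control (for example, complementary cell suppression) involves considerably more than this.

```python
import random


def top_code(values, cap):
    """Top-coding: report every value above `cap` as `cap`."""
    return [min(v, cap) for v in values]


def perturb(values, scale, seed=0):
    """Perturbation: add uniform noise in [-scale, scale] to each value."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return [v + rng.uniform(-scale, scale) for v in values]


def suppress_small_cells(counts, threshold):
    """Cell suppression: hide (None) any cell whose count is below `threshold`.

    This is primary suppression only; it does not handle cells that could be
    recovered from published row or column totals.
    """
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Hypothetical sample data, invented for illustration.
incomes = [32_000, 48_000, 51_000, 75_000, 2_400_000]
cell_counts = {"A": 41, "B": 3, "C": 17}

capped = top_code(incomes, cap=150_000)
print(capped)
print(sum(incomes) / len(incomes), sum(capped) / len(capped))
print(suppress_small_cells(cell_counts, threshold=5))
```

With these made-up incomes, top-coding at 150,000 hides the outlier but pulls the mean from 521,200 down to 71,200, which is the kind of utility loss the first conclusion refers to.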

My comment was that the first speaker needs to employ the services of a professional statistician who knows how to anonymize and/or aggregate data while preserving its statistical properties. The second speaker, in turn, needs to explain that decision makers in the government have access to the raw data and detailed results, and that the public needs to work with available open government data and lobby their Congressional Representatives to support legislation like the Data Act of 2013.

Also of note, SAS provides simulated statistical data sets for training and the Data Transparency Coalition has a conference on September 10th, Data Transparency 2013, to discuss ways to move forward.

Overall, excellent Meetup! I suggest we have event host CapitalOne Labs speak at a future Meetup to tell us about the work they do and especially their recent acquisition of Bundle to advance their big data agenda. "Bundle gives you unbiased ratings on businesses based on anonymous credit card data."

For more, see the event slides and audio:

Cloud SOA Semantics and Data Science Conference

This is a guest post prepared by the SOA, Semantics, & Data Science conference for the Data Community DC Blog to provide introduction/context to the types of technology that the conference focuses on, and its applications.

Featuring 15th SOA for E-Gov Conference:

Conference Title: Cloud: SOA, Semantics, & Data Science
Theme: The Changing Landscape of Federal Information Technology
Dates: 9/10/2013 to 9/11/2013
Location: The Waterford, Springfield, VA
Contact: Tammy Kicker

Federal organizations are racing to capitalize on social, mobile and cloud computing trends to provide solutions for their agency mission needs.  At the same time, there is great pressure to spend less while improving capability, service, cost and flexibility.

This event is an open knowledge exchange forum for communities of practice in Cloud, SOA, Semantics, and Data Science.  It brings together thought leaders and experts from the federal and business communities to continue the conversation around best practices in advancing SOA, semantic technology and data science within the Cloud construct.

The event builds on the successes of two previous events: the Service-Oriented Architecture (SOA) e-Government Conferences and the Department of Defense SOA and Semantic Technology Conferences.

This event is focused on SOA, Cloud Computing, Semantics Technology and Data Analytics.

SOA uses data as a service, which in turn requires dealing effectively with semantics.  Data science is used to process and analyze the data for those semantics to extract information.  Given the recent pronouncement by Dominic Sale, OMB (invited Keynote) that "all content is data", this conference is especially timely and focused.

Presenters and panelists will examine the benefits of governance frameworks and approaches Federal agencies are pursuing to increase the maturity and efficiency of their SOA, Cloud, Semantic Technology and Data Science.

The types of technology the conference focuses on, and their applications, are summarized in the table below.


Speaker | Technology | Applications | Comments
Brand Niemann | Data Science Data Visualization Tools (12 Leaders and Challengers) | OMB Analytic Data Sets and Public Data Sets for DC Data Science Community | Director and Senior Data Scientist, Semantic Community and Founder of Federal SOA Community of Practice
Dominic Sale (invited) | TBD | TBD | OMB Chief of Data Analytics & Reporting
Steve Woodward | Cloud Computing in Canada | New Agency | CEO, Cloud Perspectives
David S. Linthicum | Cloud and SOA Convergence | Your Enterprise | Cloud Computing Thought Leader, Executive, Consultant, Author, and Speaker
Denzil Wasson | Semantics | Cloud and SOA for Government | Chief Technology Officer, Everware-CBDI
Vendor Showcase | Multiple | Multiple | Always a Favorite at These Conferences
Use Cases and Pilots | Cray Graph Computer | Semantic Medline - National Library of Medicine & White House OSTP’s NITRD Federal Big Data Senior Steering WG | Discovery of Disease Cause and Effect
Geoffrey Charles Fox | Cyberinfrastructure | Enabling e-Government, e-Business and e-More Or Less Anything | Associate Dean for Graduate Studies & Research, Distinguished Professor of Computer Science and Informatics, Indiana University
Dennis Wisnosky | Semantics | Mainstream | Former DoD Business Mission Area CTO, member of the Enterprise Data Management Council
Michaela Iorga | Cloud Computing and Security | Government and Industry | Senior Security Technical Lead for Cloud Computing and Chair, NIST Cloud Computing Security WG
Use Cases and Pilots | Semantics & Other | Mission/Business Transformation Needs | Four Applications for Department of Veterans Affairs and Other Organizations

Your participation and suggested contributions are welcomed to continue to build and sustain this unique community of communities of practice to improve the delivery of government services in support of the US Federal Digital Government Strategy and Open Government Data Initiatives.


Brand Niemann, former Senior Enterprise Architect and Data Scientist with the US EPA, completed 30 years of federal service in 2010. Since then he has worked as a data scientist for a number of organizations, produced data science products for a large number of data sets, and published data stories for Federal Computer Week, Semantic Community and AOL/Breaking Government.

Data Science MD Discusses Health IT at Shady Grove

For its June meetup, Data Science MD explored a new venue, The Universities at Shady Grove. And what better way to venture into Montgomery County than to spend an evening discussing one of its leading sectors. That's right, an event all about healthcare. And we tackled it from two different sides.

The night started with a presentation from Gavin O'Brien from NIST's National Cybersecurity Center of Excellence. He spoke about creating a secure mobile Health IT platform that would allow doctors and nurses to share relevant pieces of information in a manner that is secure and follows all guidelines and policies governing how health data must be handled. Gavin's presentation focused on securing 802.11 links, as opposed to cellular or other types of wireless links, because this is a good first step and is immediately practical when deployed within one building like a hospital. Gavin discussed the technological challenges, from encrypting data during transmission, rather than sending it in the clear where it can be intercepted, to creating Access Control Lists so that only the correct people see a patient's data. As his talk progressed, one thought was constantly in the back of my mind: how can this architecture be put in place to provide the protection for the data that the policies stipulate while still allowing the data to be distributed so that analytics can be run on it? For instance, a hospital should be interested in trends among patients in its care, such as whether patients all had complications after receiving the same family of drugs or a specific drug (perhaps from the same batch), when patients have the most problems and therefore require the most attention, and when a bacterium or virus may be loose in the hospital, further complicating patients' ailments. The architecture may allow these types of analytics, but they were not specifically discussed during Gavin's presentation. If you have any ideas about how a compliant architecture can support these analytics, or about potential problems with running analytics, please provide a comment to this post.
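
As a toy illustration of the Access Control List idea mentioned above, the sketch below attaches a list of permitted readers to each record and checks it before returning any data. This is not the NCCoE architecture from Gavin's talk. The class, field, and function names and the staff and patient IDs are all hypothetical, and a real system would also need authentication, auditing, and encryption in transit.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    patient_id: str
    notes: str
    # Hypothetical ACL: the set of staff IDs allowed to read this record.
    allowed_readers: set = field(default_factory=set)


def read_record(record, staff_id):
    """Return the record's notes only if `staff_id` is on its ACL."""
    if staff_id not in record.allowed_readers:
        raise PermissionError(f"{staff_id} may not read {record.patient_id}")
    return record.notes


# Made-up example data.
record = PatientRecord("patient-001", "post-op day 2, stable", {"nurse-17", "dr-42"})

print(read_record(record, "nurse-17"))  # permitted reader
try:
    read_record(record, "clerk-03")     # not on the ACL
except PermissionError as err:
    print("denied:", err)
```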

The final speaker of the night was Uma Ahluwalia, the Director of Health and Human Services for Montgomery County. Uma spoke about the various avenues that county residents have to report problems, noting that their needs often cross many different segments of health and human services, which usually requires their stories to be retold each time. According to her vision, a resident/patient could report their problem to any one of six segments and then all of the segments could see the information without the patient having to reiterate their story over and over again. One big problem with this solution is that data would be shared across many groups, giving county workers access to more information than they should have according to health regulations. However, Montgomery County sees each segment as a part of one organization, and therefore the data can be shared internally among all employees within that organization. While this should help with reducing the amount of time patients need to retell their story, it still does not provide an open platform for data scientists. However, Uma also had a potential solution to that problem: volunteers. Volunteers can sign non-disclosure agreements allowing them access to see patient data to help create useful analytics, thereby opening the problem space to many more minds in the hopes of creating truly revolutionary analytics. Perhaps you will be the next great mind that unlocks the meaning behind a current social issue.

Finally, Data Science MD needs to acknowledge a few key people and groups that contributed to this meetup. Mary Lowe and Melissa Marquez from the Universities at Shady Grove were instrumental in making this happen, helping to secure the room and providing the food and A/V setup. Dan Hoffman, the Chief Innovation Officer for Montgomery County, also provided a great deal of support to make this happen. Finally, John Rumble, a DSMD member, took the lead in getting DSMD beyond the Baltimore/Columbia corridor. Thanks so much to all of these key people.

If you want to catch up on previous meetups, please check out our YouTube channel.

Please check out our July meetup, where we discuss analysis techniques in Python and R at Betamore.

Weekly Round-Up: Open Data Order, Data Discovery, Andrew Ng, and Connected Devices

Welcome back to the round-up, an overview of the most interesting data science, statistics, and analytics articles of the past week. This week, we have 4 fascinating articles ranging in topics from Open Data to connected devices. In this week's round-up:

  • Open Data Order Could Save Lives, Energy Costs And Make Cool Apps
  • Four Types of Discovery Technology
  • Andrew Ng and the Quest for the New AI
  • Our Connected Future

Open Data Order Could Save Lives, Energy Costs And Make Cool Apps

This is a TechCrunch article about President Obama's recent Open Data Order, an executive order intended to make more government agency data openly available for analysis. The article goes on to talk about some of the ways open data has been used in the past and has a link to Project Open Data's Github page where you can find more details.

Four Types of Discovery Technology

This Smart Data Collective post talks about the value of discovery in data analytics and business. The author claims there are four types of discovery for business analytics - event discovery, data discovery, information discovery, and visual discovery - and he goes into some detail explaining each one and the differences between them.

Andrew Ng and the Quest for the New AI

This is an interesting Wired piece about Andrew Ng, best known as the Stanford machine learning professor who also co-founded Coursera. The article talks about Ng's background and interest in artificial intelligence as well as some of the deep learning projects he is working on. It goes on to explain a little about what deep learning is and how it may evolve in the future.

Our Connected Future

Our final piece this week is a GigaOM article about connected devices and how they will become more prevalent in the future. The article highlights some very interesting devices, explains what they do, and describes how they are being used. The article also talks about the data that can be collected from connected devices such as these and different ways that this data can be used.

That's it for this week. Make sure to come back next week when we’ll have some more interesting articles! If there's something we missed, feel free to let us know in the comments below.


Data Unconference: The Sunlight Foundation's 5th Annual Transparency Camp

DC2 would like to invite you to Sunlight Foundation’s 5th annual TransparencyCamp on May 4th and 5th at the George Washington University’s Marvin Center, Washington, DC. Early bird registration for TransparencyCamp 2013 is still open until March 1, 2013, so register today!

For the last five years, we've gathered together a variety of journalists, policy creators, technologists, concerned citizens, academics, watchdogs, and others to build community, share best practices, and problem-solve challenges to work in the transparency arena. Last year, we hosted over 400 people from over 30 countries and 26 US states. This year, we’re expecting around 500 participants with even more participation from attendees across the country and abroad. Please check out for a preview of what to expect at this year’s unconference.

  • What: TransparencyCamp 2013
  • Where: George Washington University (Marvin Center) 800 21st St NW,  Washington, DC 20052
  • When: May 4-5, 2013

Also, lunch is provided by DC’s awesome food trucks. Click here to register for TransparencyCamp now.

Plus check out videos here and here from past TCamps and be sure to come with ideas, share the registration link with your friends, and tweet #TCamp13!