Managing Open Transportation Data at the U.S. Department of Transportation

Dennis D. McDonald, Ph.D. Dennis is an independent management consultant based in Alexandria, Virginia. His experience includes consulting company ownership and management, database publishing and data transformation projects, managing the consolidation of large systems, open data, statistical research, corporate IT strategy, and IT cost analysis. Clients have included the U.S. Department of Veterans Affairs, the U.S. Environmental Protection Agency, the National Academy of Engineering, the World Bank, and the National Library of Medicine. He has worked as a project manager, analyst, and researcher in the U.S. and in Europe, Egypt, and China. Contact him via his web site, via email, or at @ddmcd on Twitter.

On May 27, I attended a symposium in Washington, D.C. sponsored by Data Innovation DC at the Georgetown University School of Continuing Studies. The topic was “Get Moving with Data - The US Department of Transportation and its Data.” Presentations were made by U.S. Department of Transportation speakers including Dan Morgan, the first Chief Data Officer at USDOT.

Morgan and the others walked us through the variety of data sets generated and published by the Department.

"Published" is the key word here. Many of these data are available online, many in a standardized form accompanied by metadata dictionaries, and in some cases, there are APIs facilitating data access and use. Some of the data sets discussed included highway data (all roads and highway performance, bridge inventory, freight analysis), research data sets for intelligent, connected vehicles, the National Transit Database, and data on aviation safety, roadway fatalities and crashes, product recalls, and transportation statistics.

Of special interest to me were comments by the presenters on how the Department’s data management practices are changing as, in this day of “open data,” use and reuse of data is being encouraged.

What follows are my own comments on some of the key points I took away from the meeting regarding:

  • Changing data management practices

  • Continuing importance of partnerships

  • Encouraging reuse of data

  • Funding calculations

  • Increasing importance of real-time data

  • Data specialization and data generalization

Changing data management practices

Some of the data sets generated and published by DOT go back to the 1940s and are based on data manually gathered and submitted on a regular monthly basis by state and local agencies. Over time, some data have come to be gathered more frequently (if funding and cooperation are available). In some cases, heavy use is still being made of phone- and fax-based data collection.

Some data are gathered automatically via in-road sensors. Other data on road conditions are gathered photographically.

It's a real mix. Efforts are currently underway via the ARNOLD system to integrate all highway-related data into a single network model, incorporating geolocation, road condition, traffic, incident, weather, and other data elements.

It’s a massive effort. As a former number cruncher, I’m seriously impressed.

Continuing importance of partnerships

DOT doesn’t generate all this data by itself, but depends on the cooperation of many state and local entities to supply and update data.

This partnership model is one of the first things you learn about Federal open data management efforts, regardless of whether you are discussing roadway passenger traffic volume, incidents and accidents, expenditures, miles driven, or headcounts. Data ultimately originate at the state and local level, and usage occurs at all levels. It’s useful to keep this "partnership" concept in mind when applying a "data management lifecycle" model to tracking and managing data from the time of origination to publication, use in modeling or calculations, updating, and retirement. Given the multiple stakeholders involved (and the multiple political interest groups, FAA data being a very good example), "managing" how all these parties work together is a major DOT concern. How DOT does this has to change with the times as data management and access methods, along with policies, continue to evolve.

Encouraging reuse of data

DOT staff want their data to be used and reused, and not just specifically for legislated funding apportionment. DOT encourages use of data by a variety of means including "hackathons" to promote interest among analytically oriented innovators and entrepreneurs. DOT also encourages innovative use of its data by commercial ventures including the publishing of FAA data for private pilot iPads and the analysis of truck incident data patterns by insurance companies.

Such uses may not always be specified by DOT’s enabling legislation. One of the great things about “open data” is that such data is available for use and combination in original ways with other data.

In some ways this is similar to what NOAA is doing with its big data project, where major cloud vendors are encouraged to support both public access and data reuse. One major difference, though, is NOAA’s massive data volume compared with Transportation’s.

Funding calculations

Some data gathered and published by DOT are specifically designed for calculating how Federal funds are to be allocated. This requirement is both a blessing and a curse.

Blessing-wise, this means that data collection and publishing efforts can evolve to become dependable, reliable, and sustainable (although there may be occasional hiccups introduced by periodic funding sequesters).

A curse is that this focus on the data needs of specific programs may limit the resources that DOT can devote to seeking out and encouraging innovative and potentially commercially viable uses of DOT data. The end result of such resource limitations might be that, by adhering to legislated program priorities, the public could be losing out if new or innovative data uses aren't being surfaced. Again, making such data “open” is one way to encourage innovative usage.

Increasing importance of real-time data

Transportation data that may have been collected on a biennial or annual basis decades ago might now be collected annually or monthly. Other data are being collected more frequently or in near real-time.

For some applications, the argument for more frequent collection is straightforward: increased accuracy of data for users. More frequent collection of data, of course, generates increased costs at every stage, from collection through processing, storage, and release.

Not all types of data and data uses can justify the expense of more frequent or real-time collection. Deciding where to place priorities becomes a complex matter that raises interesting governance, management, policy, and technology issues. It might also call for a more open and transparent process for making such decisions so that fairness and objectivity can be maintained.

Data specialization and data generalization

DOT data cover a variety of specialized and generalized topics. Everyone can appreciate the significance of data describing traffic and accidents on the ground or in the air, but the language used to describe the data may vary widely by specialty and in how easily it can be understood. This places a strong emphasis on availability of good documentation describing DOT data and metadata. DOT does provide much documentation along with its data files.

Viewed strategically, the mix of standards, terminology, semantics, and vocabularies places a premium on managing data across the board as a strategic asset in ways that are aligned with national priorities. At the same time, there is the need to maintain data quality and utility for the many vertical specialties represented by DOT programs. These are reasons why DOT and other organizations now have a Chief Data Officer, so that both within-department and cross-department goals and objectives can be supported efficiently.

Discussion

One thing I found refreshing about the DOT presentations was the enthusiasm of the program managers for "their data." Coming from a background in research and statistics, I find this both appropriate and inspiring. Maintaining data quality and utility requires professional skills and disciplines. The focus on analytics and visualization was impressive.

Still, one topic that wasn't addressed much during the presentations -- if at all -- was the overall governance and management of the Department’s data-related operations. This is the aspect of "big data" and "open data" that fascinates me, given the need to coordinate the many stakeholders involved. How data-related processes permeate data-intensive organizations such as DOT might argue for a "flatter" data management architecture where participants and influencers all along the nodes of various data management lifecycles are able -- and encouraged -- to collaborate, share information, and work together.

This need for collaboration is a pretty basic requirement where stakeholders and decision makers are distributed throughout an organization. At the same time, it’s not uncommon for hierarchically structured bureaucracies to resist changes. Sometimes the resulting stability is good -- and sometimes it’s bad.

In the case of the data management operations at organizations like DOT, the dedication of its IT and data professionals will be an important force for good. However, as more changes are demanded in how data are generated, managed, and released, it’s not only the IT managers, data administrators, programmers, and analysts who have to work together to make things happen. All impacted departments and budgets need to be involved in planning, implementation, and oversight as more data are generated, standardized, released, and supported.

This calls for more collaboration and coordination than can be accomplished via a series of quarterly or even monthly meetings among department heads. Such challenges are not unique to the Federal Government. All large organizations desiring to take a more strategic position in how data -- the lifeblood of organization processes -- are managed and released will have to address such governance issues.
