ryanwold.net - notes from CodeForOakland

Data within and across public agencies & Getting it out

A core component of an effective open government strategy is publishing system data on a regular basis.

To get started on an initiative like this, one of the first steps in scoping the effort should be a data inventory: a listing of every computer system that holds data for an organization. Basically, which system holds what, and how is it used? From payroll to HR to finance to CRM systems, valuable data is being stored. A centralized index of that data can help prioritize publishing efforts and identify opportunities to improve systems and workflow processes. Thus, it is important to inventory all data within an organization.
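As a rough illustration of "which system holds what, and how is it used," here is a minimal Python sketch of an inventory with one record per system. The field names, the example systems, and the ranking rule (systems without personal data first, then systems holding more datasets) are hypothetical assumptions chosen for illustration, not a prescribed method.

```python
from dataclasses import dataclass, field


@dataclass
class SystemRecord:
    """One entry in a data inventory (hypothetical fields)."""
    name: str                                           # system name, e.g. "Payroll"
    owner: str                                          # department responsible for the system
    datasets: list[str] = field(default_factory=list)   # data the system holds
    use: str = ""                                       # how the data is used day to day
    contains_personal_data: bool = False                # would need redaction before publishing


def publishing_candidates(inventory: list[SystemRecord]) -> list[str]:
    """Order systems for publishing under an assumed heuristic:
    systems without personal data first, then those holding more datasets."""
    ranked = sorted(
        inventory,
        key=lambda s: (s.contains_personal_data, -len(s.datasets)),
    )
    return [s.name for s in ranked]


inventory = [
    SystemRecord("Payroll", "Finance", ["salaries", "positions"], "pay staff", True),
    SystemRecord("Finance", "Finance", ["appropriations", "expenditures", "revenues"], "track spending"),
    SystemRecord("CRM", "Public Works", ["service requests"], "respond to residents", True),
]
```

With this example inventory, `publishing_candidates(inventory)` returns `["Finance", "Payroll", "CRM"]`: the finance system holds no personal data, so it comes first. Even a spreadsheet with the same columns would serve the same purpose.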

It seems obvious that public data is of interest to the public. For research, curiosity, and accountability purposes, it is important to track data across public organizations as well. Comparing data across organizations provides context for identifying patterns that could lead to better public services or greater government efficiency.

When it comes to getting data out of public agencies, I often hear about the desire for sophisticated data catalogs, and there are many duplicative efforts along these lines (even though http://nationaldatacatalog.com has been taken offline). I'm hesitant to call for an additional layer of software to publish datasets. Rather, I'm an advocate for low-barrier solutions.

Government data is typically stored inside systems, and more specifically, inside databases. Open data can simply be exports of the data contained within those systems. Much more can be done than this; however, by initially settling for a low-barrier solution, public agencies can avoid the burden of developing and maintaining APIs around their data. The last thing we need is for 38,000 public agencies, many of which do not have websites, to be expected to provide consumable APIs. So, when it comes to data, go the easy route: cronjobs and database dumps to .csv files (or the like). Open data should be a natural artifact of public business. It should not require significant effort to manage or provide. Leave it to the #opengov community to build atop the data and make use of it.
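Here is a minimal sketch of the "cronjobs and database dumps" route, assuming the source system is a SQLite database (real agency systems often run other database engines, whose own export tools may be simpler still). The function name `dump_tables_to_csv`, the script name, and the paths in the crontab comment are hypothetical. It uses only the Python standard library and writes each table to its own .csv file with a header row.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path

# Example crontab entry (hypothetical script and paths), running nightly at 02:00:
# 0 2 * * * /usr/bin/python3 /opt/opendata/dump_to_csv.py


def dump_tables_to_csv(db_path: str, out_dir: str) -> list[Path]:
    """Export every user table in a SQLite database to <out_dir>/<table>.csv.

    Returns the list of CSV files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # closing() guarantees the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        # List tables, skipping SQLite's internal sqlite_* tables.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        for table in tables:
            quoted = table.replace('"', '""')  # escape quotes in the identifier
            cursor = conn.execute(f'SELECT * FROM "{quoted}"')
            header = [col[0] for col in cursor.description]
            path = out / f"{table}.csv"
            with path.open("w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)   # column names as the first row
                writer.writerows(cursor)  # stream every row straight from the query
            written.append(path)
    return written
```

Scheduling a small script that calls this function with cron makes publishing a side effect of normal operations. No API has to be designed or maintained, and anyone downstream can consume the CSV files directly.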