There is a lot of talk about gov 2.0 (two-oh), "transparency", and open government. For transparency, the end game is to have agencies manage and publish complete, accurate, and standardized datasets.
There has been talk of a GitHub for data. I'm a bit wary of versioning datasets, given the difficulty of tracking valid branches and forks. I believe the further data gets from its source and the more it is manipulated, the less credible it becomes. The software that stores the data and provides an interface to it is designed for a specific structure, or "schema".
Versioning data content is sometimes necessary (legislation/bills), and sometimes it is not (financial transactions). There is also versioning of data formats (e.g., two new fields were added to an extract).
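To make the format-versioning point concrete, here is a minimal sketch (all field names and the `SCHEMAS` table are hypothetical) of pairing each record with an explicit schema version, so a consumer can detect when an extract's format has changed:

```python
# Hypothetical schema registry: version 2 adds two new fields to the extract.
SCHEMAS = {
    1: ["date", "amount"],
    2: ["date", "amount", "agency", "category"],
}

def matches_schema(record, schema_version):
    """Return True if the record carries exactly the fields its declared schema version expects."""
    expected = SCHEMAS[schema_version]
    return sorted(record) == sorted(expected)

record_v1 = {"date": "2010-01-15", "amount": 1200.50}
print(matches_schema(record_v1, 1))  # True
print(matches_schema(record_v1, 2))  # False: missing the two new fields
```

Publishing the schema version alongside the data is what lets downstream developers handle a format change gracefully instead of silently mis-parsing the feed.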
It seems worthwhile to develop a working feedback loop that enables agencies to improve their raw data feeds at the source: in-house.
Perhaps feedback can be provided through comments/dialogue attached to datasets (as data.gov allows) - this could serve as a form of social data normalization, since 'standards' are what developers are seeking and what will enable government to become a true platform. If comments could be tied both to a specific dataset and to a generic working standard, I think that would be quite useful for those who consume data and those who create it.