My partner Kin Lane recently received a Knight Foundation prototype grant for a project he’s calling Adopta.Agency. The idea is to build upon President Obama’s open data initiative so that federal datasets are actually useful: clean and (ideally) machine-readable.
Adopting datasets strikes me as particularly important when it comes to the Department of Education, which publishes a lot of data (although only 300 datasets have made it to the data.gov website), but often in proprietary or hard-to-parse formats (PDFs, PowerPoint slides, or Microsoft Word files) or in strangely formatted spreadsheets whose columns and filenames lack consistency, let alone clarity, from year to year.
(Let’s pause to note the irony of a department that demands the collection of more and more data from schools and students – a cornerstone of its calls for “accountability” – but that cannot figure out how to manage or release its own data in open, useable formats.)
Kin’s project does not involve any new technology or platform. Rather, he’s created a blueprint that uses GitHub to host the data, the roadmap for the project, Q&As, and issues. The blueprint (ideally) contains all the pieces you’ll need to get started. (A more detailed although admittedly very preliminary How-To is here.) Kin’s reasons for choosing GitHub are severalfold: you can easily fork and contribute to projects. Also, GitHub Pages makes it simple to spin up a website for each project (or “repository” in GitHub lingo).
I’ve started two projects using Department of Education data: one will clean up the datasets gathered for the My Brother’s Keeper initiative, and the other will identify datasets pertaining to education technology and make them machine-readable. (Of the 17 datasets that you can find with the keywords “education” and “technology,” only 3 are in open formats.)
The data in both of these collections certainly need to be cleaned up (OMG do they ever), and I’d like to see them expanded upon as well. (Indeed, one of the criticisms of the My Brother’s Keeper initiative is that its “gender exclusive focus” ignores the needs of black girls; consider my project an opportunity for a “fork” that includes data pertaining to girls as well.)
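To give a rough sense of what that kind of cleanup might involve, here is a minimal Python sketch that normalizes inconsistent spreadsheet headers and trims stray whitespace before writing the data back out as CSV. It is not part of Kin’s blueprint; the `CANONICAL` mapping, the column names, and the function names are all hypothetical, invented only for illustration.

```python
import csv
import io
import re

# Hypothetical mapping from messy header variants to one canonical name.
# A real project would build this by inspecting each year's files.
CANONICAL = {
    "school_yr": "school_year",
    "schoolyear": "school_year",
    "st": "state",
}


def normalize_header(name: str) -> str:
    """Lowercase a header and collapse punctuation/whitespace to underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
    return CANONICAL.get(slug, slug)


def clean_csv(text: str) -> str:
    """Return CSV text with normalized headers, trimmed cells, no blank rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([normalize_header(h) for h in rows[0]])
    for row in rows[1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip rows that contain only empty cells
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()
```

Calling `clean_csv` on the text of each year’s file would, under these assumptions, give every year the same column names, which is the kind of year-over-year consistency the original spreadsheets lack.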
I’m interested in helping address some of the problems with Department of Education data. (First order of business: calling them out on the shoddiness of their efforts so far.) More broadly, I’m also interested in exploring the use of GitHub as part of a “Reclaim Your Domain” strategy – that is, how do we make it easier for folks to develop and control their data and their digital identity? Government data is our data, after all. And finally, how do we lower the barriers to entry so that “open data” and “open government” initiatives and the like don’t simply concentrate power in the hands of a technical elite? How can we create templates for data-curious projects – such as Kin’s Adopta.Agency blueprint – so that it’s as easy as possible to get up and running with a similar effort?