Tuesday, February 23, 2010

datapkg: data/knowledge packaging

Via the Open Knowledge Foundation Blog, we're excited to see that datapkg is maturing. This is a command-line tool for discovering, installing and sharing data packages. If you are familiar with Linux, the idea is to do for data what apt-get/aptitude/dpkg do for programs, with registries like CKAN playing the same role for data that Debian and other distributions play for programs. For example, with datapkg installed, you can do something like this:


$ datapkg search ckan:// economics
stw_thesaurus_for_economics -- STW Thesaurus for Economics
energy-stern-review-economics-climate-change -- The Stern Review -- The Economics of Climate Chanage
repec -- Research Papers in Economics
unstats -- United Nations Statistical Databases
usa_bls_employment -- USA Employment status of the civilian noninstitutional population, 1940 to date
eurostat-gfs -- Eurostat - Government Finance Statistics (GFS) Data
numbrary -- Numbrary
economagic -- Economagic Economic Time Series
pl-budget -- Poland - Ministry of Finance - Budget
econ-alfred -- ALFRED: ArchivaL Federal Reserve Economic Data
ehnet -- Economic History Services Databases
eu-cohesion-beneficiaries-ie -- EU Cohesion Beneficiaries - Ireland
nl-statistics -- Netherlands - Statistics
econ-gdp-historical -- World Population, GDP and Per Capita GDP, 1-2003 AD
esfdb -- European State Finance Database
econ-fraser -- FRASER - Federal Reserve Archival System for Economic Research
fi-budget -- Finland - Valtiovarainministeriö - Budget
econ-fred -- Federal Reserve Economic Data
ceprdata -- CEPR Data

$ datapkg info ckan://econ-gdp-historical
## Package: econ-gdp-historical

name: econ-gdp-historical
title: World Population, GDP and Per Capita GDP, 1-2003 AD
version: None
license: Non-OKD Compliant::Other
author: None
author_email: None
maintainer: None
maintainer_email: None
url: http://www.ggdc.net/maddison/
download_url: http://www.ggdc.net/maddison/Historical_Statistics/horizontal-file_03-2007.xls
notes: ### Author

Angus Maddison

### Openness: Not open

* No license
* Plus following statement attached to link to data: "Last update: March 2007, copyright Angus Maddison"

### Format

* xls (excel)
tags: ['economic', 'history', 'gdp', 'license-not-specified', 'data']
extras: {}

$ datapkg install ckan://econ-gdp-historical .
Registering ...
Created on disk at: ./econ-gdp-historical
Downloading package resources ...
horizontal-file_03-2007.x 100% |=========================| 1.5 MB 00:03

$ ls -R
.:
econ-gdp-historical

./econ-gdp-historical:
horizontal-file_03-2007.xls metadata.txt


In other words, it gets a whole lot easier to discover and install data in an organized fashion. Better still, this opens the door to propagating changes and updates sanely, a topic near to our hearts. Good stuff!
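
The `datapkg search` listing shown earlier prints one `name -- title` pair per line, so it is easy to consume from a script. Here is a minimal sketch, assuming that output format stays as shown. The function name `parse_search_output` is our own and is not part of datapkg. Note that the split happens only on the first ` -- `, because titles can contain that separator too (the Stern Review entry does).

```python
def parse_search_output(text: str) -> dict[str, str]:
    """Map package name -> title for lines shaped like 'name -- title'.

    Assumes the line format printed by `datapkg search` in the session above;
    parse_search_output is a hypothetical helper, not a datapkg API.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or " -- " not in line:
            continue  # skip blank lines or lines that don't match the format
        # Split on the first separator only: titles may contain " -- " as well
        name, title = line.split(" -- ", 1)
        packages[name] = title
    return packages


# A few lines copied verbatim from the search results above
sample = """\
stw_thesaurus_for_economics -- STW Thesaurus for Economics
energy-stern-review-economics-climate-change -- The Stern Review -- The Economics of Climate Chanage
repec -- Research Papers in Economics
"""

pkgs = parse_search_output(sample)
print(pkgs["energy-stern-review-economics-climate-change"])
```

If datapkg is installed and on your PATH, you could feed the function live output instead of a pasted sample, for example with `subprocess.run(["datapkg", "search", "ckan://", "economics"], capture_output=True, text=True).stdout`.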
