Sunday, February 28, 2010

Big list of directories

The "links to other directories" section on this blog has been growing steadily over time, as Noemi digs up more and more directories related in some way to what we've been calling the "rooted economy." Here's a snapshot of the list, for those following by RSS or email:


If you know of a directory that isn't on the list yet, please share!

Saturday, February 27, 2010

Best practice for open data, a reading list

It can be hard to convince data-sharers that data being "freely available" on a web page isn't the end of the argument about whether that data can be reused. Here's a collection of postings I've found helpful in understanding the issues.
For comparison purposes, it is worth looking at the history of software repositories. For example, Debian has 20,000+ packages within it (depending on how you count), covering every kind of software under the sun (and beyond: stargazers should check out the "stellarium" package). A typical package will depend on a dozen or so other packages, which in turn depend on others. It is a massive work of aggregation. There are huge technical challenges, but underlying the solution is the Debian social contract and their Free Software Guidelines. Here are the guidelines in full:

  1. Free Redistribution

    The license of a Debian component may not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license may not require a royalty or other fee for such sale.

  2. Source Code

    The program must include source code, and must allow distribution in source code as well as compiled form.

  3. Derived Works

    The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

  4. Integrity of The Author's Source Code

    The license may restrict source-code from being distributed in modified form _only_ if the license allows the distribution of patch files with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software. (This is a compromise. The Debian group encourages all authors not to restrict any files, source or binary, from being modified.)

  5. No Discrimination Against Persons or Groups

    The license must not discriminate against any person or group of persons.

  6. No Discrimination Against Fields of Endeavor

    The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

  7. Distribution of License

    The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.

  8. License Must Not Be Specific to Debian

    The rights attached to the program must not depend on the program's being part of a Debian system. If the program is extracted from Debian and used or distributed without Debian but otherwise within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the Debian system.

  9. License Must Not Contaminate Other Software

    The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be free software.

  10. Example Licenses

    The GPL, BSD, and Artistic licenses are examples of licenses that we consider free.

There's a lot going on here, and not all of it applies to data. But the desired outcome could perhaps be summarized as:
  • We want stuff that can be collected, mixed, and redistributed to others, who can in turn do the same.
  • We only want stuff that expressly permits such activity, and where the owners don't have any requirements that make such activity excessively complicated.
While data and software are different, projects like the DCP (and many others) would like to see this same outcome achieved for data. A challenge is that there is so much data out there that is "freely available" on a random website, but doesn't expressly permit aggregation, or state any requirements on allowed use. And yet, almost certainly, there are implicit requirements (no commercial use! no use by organizations my members disapprove of!) which may or may not have legal force but certainly ought to be respected by a well-behaved aggregator, even if the only way to respect them without splitting up the "data commons" too much would be by omitting the data entirely.
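To make the two criteria above concrete, here is a minimal sketch of how an aggregator might decide what goes into a data commons. It assumes each dataset carries a single license identifier, or none at all. The allow-list of "open" licenses and the dataset names are hypothetical choices for illustration, not an official list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical allow-list of licenses that expressly permit collecting,
# mixing and redistributing. Which licenses belong here is an assumption.
OPEN_LICENSES = {"odc-pddl", "odc-by", "cc0", "cc-by", "cc-by-sa"}


@dataclass
class Dataset:
    name: str
    license: Optional[str]  # None means no license was stated at all


def may_aggregate(ds: Dataset) -> bool:
    """Include a dataset only if its license expressly permits reuse."""
    if ds.license is None:
        # "Freely available" but silent on reuse: implicit requirements may
        # exist, so a well-behaved aggregator leaves it out.
        return False
    return ds.license.lower() in OPEN_LICENSES


def build_commons(datasets: List[Dataset]) -> Tuple[List[str], List[str]]:
    """Split datasets into those we can aggregate and those we omit."""
    included = [d.name for d in datasets if may_aggregate(d)]
    omitted = [d.name for d in datasets if not may_aggregate(d)]
    return included, omitted


if __name__ == "__main__":
    sample = [
        Dataset("co-op-directory", "CC-BY-SA"),     # hypothetical
        Dataset("scraped-web-list", None),          # hypothetical
        Dataset("members-only-list", "non-commercial"),  # hypothetical
    ]
    included, omitted = build_commons(sample)
    print("included:", included)  # included: ['co-op-directory']
    print("omitted:", omitted)    # omitted: ['scraped-web-list', 'members-only-list']
```

The design choice worth noticing is that a dataset with no stated license is treated exactly like one with a restrictive license: silence is not permission, so the conservative move is to leave it out.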

Tuesday, February 23, 2010

Linking with International Projects

There are a few key "data commons"-like projects that I think we should be following, connecting with, and perhaps collaborating with. A primary question for the DCP, I think, is: to what extent can we draw from and learn from (possibly adapt) the technologies of these projects (all open source) to serve DCP's goals? Are there things that DCP can contribute to these projects?

Mapping databases:

http://www.zoes.it/ (Italian)
http://www.solidarius.com.br/ (Portuguese)
http://www.fbes.org.br/index.php?option=com_content&task=view&id=3748&Itemid=215 (Portuguese)

Brazil also has a site that provides social networking for its SE enterprises:

http://cirandas.net/ (Portuguese)

It uses this application:

http://noosfero.org/Noosfero (English)

This last application is built in Ruby on Rails, like the DCP application. Could DCP integrate it into our current work and have a readily usable social networking/online commerce application up and running fairly quickly?

datapkg: data/knowledge packaging

Via the Open Knowledge Foundation Blog, we're excited to see datapkg maturing. This is a command-line tool for discovering, installing and sharing data packages. If you are familiar with Linux, the idea is to do for data what apt-get/aptitude/dpkg do for programs, with registries like CKAN playing the role for data that Debian or other distributions do for programs. For example, with datapkg installed, one can do something like this:


$ datapkg search ckan:// economics
stw_thesaurus_for_economics -- STW Thesaurus for Economics
energy-stern-review-economics-climate-change -- The Stern Review -- The Economics of Climate Chanage
repec -- Research Papers in Economics
unstats -- United Nations Statistical Databases
usa_bls_employment -- USA Employment status of the civilian noninstitutional population, 1940 to date
eurostat-gfs -- Eurostat - Government Finance Statistics (GFS) Data
numbrary -- Numbrary
economagic -- Economagic Economic Time Series
pl-budget -- Poland - Ministry of Finance - Budget
econ-alfred -- ALFRED: ArchivaL Federal Reserve Economic Data
ehnet -- Economic History Services Databases
eu-cohesion-beneficiaries-ie -- EU Cohesion Beneficiaries - Ireland
nl-statistics -- Netherlands - Statistics
econ-gdp-historical -- World Population, GDP and Per Capita GDP, 1-2003 AD
esfdb -- European State Finance Database
econ-fraser -- FRASER - Federal Reserve Archival System for Economic Research
fi-budget -- Finland - Valtiovarainministeriö - Budget
econ-fred -- Federal Reserve Economic Data
ceprdata -- CEPR Data

$ datapkg info ckan://econ-gdp-historical
## Package: econ-gdp-historical

name: econ-gdp-historical
title: World Population, GDP and Per Capita GDP, 1-2003 AD
version: None
license: Non-OKD Compliant::Other
author: None
author_email: None
maintainer: None
maintainer_email: None
url: http://www.ggdc.net/maddison/
download_url: http://www.ggdc.net/maddison/Historical_Statistics/horizontal-file_03-2007.xls
notes: ### Author

Angus Maddison

### Openness: Not open

* No license
* Plus following statement attached to link to data: "Last update: March 2007, copyright Angus Maddison"

### Format

* xls (excel)
tags: ['economic', 'history', 'gdp', 'license-not-specified', 'data']
extras: {}

$ datapkg install ckan://econ-gdp-historical .
Registering ...
Created on disk at: ./econ-gdp-historical
Downloading package resources ...
horizontal-file_03-2007.x 100% |=========================| 1.5 MB 00:03

$ ls -R
.:
econ-gdp-historical

./econ-gdp-historical:
horizontal-file_03-2007.xls metadata.txt


In other words, discovering and installing data in an organized fashion gets a whole lot easier. Even more exciting, this opens the door to propagating changes and updates in a sane way, a topic near to our hearts. Good stuff!
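Because the `datapkg info` output is plain `key: value` text, a script could check a package's license before pulling it into an aggregate. The sketch below is a hypothetical helper, not part of datapkg. It assumes top-level fields always start at the beginning of a line with a lowercase name, which holds for the sample session but is not a documented format, and its "is this open?" test is a rough heuristic.

```python
import re

# Assumption: top-level fields look like "lowercase_name: value", as in the
# sample output. Continuation lines of multi-line fields (such as notes) do
# not match and are skipped. This is not a documented datapkg format.
FIELD = re.compile(r"^([a-z_]+): ?(.*)$")


def parse_info(text: str) -> dict:
    """Collect the top-level fields of `datapkg info`-style output."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def looks_open(fields: dict) -> bool:
    """Heuristic: open only if a license is given, it is not marked
    non-compliant, and the tags do not say the license is unspecified."""
    license_text = fields.get("license", "")
    tags = fields.get("tags", "")
    return (
        license_text not in ("", "None")
        and "Non-OKD" not in license_text
        and "license-not-specified" not in tags
    )


# Trimmed copy of the `datapkg info ckan://econ-gdp-historical` output.
SAMPLE = """\
## Package: econ-gdp-historical

name: econ-gdp-historical
title: World Population, GDP and Per Capita GDP, 1-2003 AD
license: Non-OKD Compliant::Other
url: http://www.ggdc.net/maddison/
notes: ### Author

Angus Maddison

### Openness: Not open
tags: ['economic', 'history', 'gdp', 'license-not-specified', 'data']
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["license"])   # Non-OKD Compliant::Other
    print(looks_open(info))  # False
```

For the Maddison data, both signals agree that the package is not open, which matches the "Openness: Not open" note in its metadata.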