Data Commoners: May 2009

Thursday, May 21, 2009

data.gov goes live

The US Federal Government is putting some more of its data out there, at data.gov. Their full list of raw data at the time of writing comes to 47 "catalogs" with formats including XML, CSV, KML, ESRI, and a few RSS feeds. It is not very extensive at the moment compared to existing sources, but a good start. The site is clearly separated into the raw data catalog on one side, and the tools they have for exploring the data on the other. This means that third-parties could come along and make better tools with access to the same data, benefiting everyone. Within the data commons, we're following a similar model, where the data commons "repository" holds all the data, and our own online tools and API will not be privileged in any way. If someone can come along and visualize our data better than we can, then that would be great!

Monday, May 18, 2009

Why and How: Making Data Open

Our similarly-named-but-unrelated-peers, the Open Data Commons, have posted a guide to making your data open, briefly explaining why you might want to do it, and how to do it right.

Here's their take on why open data is important:

[Open data is] crucial because open data is so much easier to break-up and recombine, to use and reuse.

And their take on the need for clear licensing:

Licensing is important because it removes uncertainty. Without a license you don’t know where you, as a user, stand: when are you allowed to use this data? Are you allowed to give to others? To distribute your own changes, etc?

Tuesday, May 12, 2009

Generating spreadsheets online

One of the advantages we hope the data commons will bring to data sharers is that we'll handle all the tedious format conversion issues that often raise the cost of collaboration.

Smaller organizations often prefer to work with spreadsheets rather than databases, so we are working to support export and import to spreadsheet formats, such as that of Excel.

The free and open source spreadsheet program gnumeric comes with a command-line utility ssconvert ("SpreadSheet convert") which is almost ideal for automating such conversions. For example, it can take the primitive CSV (comma separated value) format which is easy to generate from any source, and convert it to all the various Excel formats (and lots of other formats too).

A patch from a data commoner (myself) to support merging multiple workbooks was recently accepted by the developers of gnumeric. This is typical of how the free and open source software community works: someone benefiting from a public good extends it to meet a need they have, then contributes the extension for the benefit of all. This model is at the heart of what the data commons project wants to bring to the cooperative economy.

Monday, May 11, 2009

A site to watch: Farm 2 Local

I just ran across Farm2Local, another budding start-up that aims to make it easier for farmers to find buyers and consumers to find fresh, local produce. Perhaps we can learn from them.

Saturday, May 9, 2009

Data Genius

"Don't be data rich and knowledge poor." Now that's a tagline I find intriguing. Living Naturally is one of NCGA's partner organizations, a "supplier of software and solutions to retailers in the natural products industry, including store automation." The sophistication of the data environment for some types of enterprises, including food co-ops, is a little intimidating for me. How can we make sure that what the Data Commons provides is of sufficient value and can integrate into the rest of their operations seamlessly?

Tuesday, May 5, 2009

Featured Directory: The .Coop Directory

Cooperatives are one of the few types of businesses to have their own Top-Level Domain suffix, .coop. The organization that grants these domain names has been improving its directory of listed organizations, as a service to domain holders and in an effort to popularize the .coop suffix.
Features that they are advertising:

.coop domain holders can Claim and Customize their listings
listings are geo-tagged, so you can search by geography
multimedia: you can add a photo, video and logo to your listing; you can also upgrade (pay?) to add more photos and videos, or a custom map or directory
sharing widgets to point people to the directory, such as via Facebook and Myspace

These seem like fairly good ideas. It will be fun to see if they want to become a Data Sharing Organization someday.

UPDATE: The .coop Directory uses a company's software to display the directory listings on a map. We're a little disappointed at the closed and proprietary nature of the software. Coops can do better!

Retailer-Supplier Data-sharing

I don't know if the Data Commons will ever get this specific. Here's a story we were alerted to about how a retailer (Food Lion) is working with a lot of its suppliers to exchange real-time (or nearly) data on sales, inventory, etc. This helps get a clearer idea of what the stores have, what customers are buying, when a product is out-of-stock, and so on. This kind of data may not be our primary focus at the moment, but who knows where we'll end up?

Saturday, May 2, 2009

SQLFairy schema translation

The SQLFairy project, which has one of the most ... unusual logos around, produces a nifty command-line tool called sqlt. This tool can map the structure of a database from one format to another, including SQL databases, Excel spreadsheets, comma-separated tables, etc. Here's the full list of formats it can read from ("Parsers") and write to ("Producers"):

Parsers: Access, DB2, DB2-Grammar, DBI, DBI-DB2, DBI-MySQL, DBI-Oracle, DBI-PostgreSQL, DBI-SQLServer, DBI-SQLite, DBI-Sybase, DBIx-Class, Excel, MySQL, Oracle, PostgreSQL, SQLServer, SQLite, Storable, Sybase, XML, XML-SQLFairy, YAML, xSV

Producers: ClassDBI, DB2, DBIx-Class-File, DiaUml, Diagram, Dumper, GraphViz, HTML, Latex, MySQL, Oracle, POD, PostgreSQL, SQLServer, SQLite, Storable, Sybase, TT-Base, TT-Table, TTSchema, XML, XML-SQLFairy, YAML

It translates the structure of data (in SQL: CREATE, ALTER) and not the data itself (in SQL: INSERT, UPDATE, DELETE). On Ubuntu/Debian systems, this tool is available in the "sqlfairy" package. Let's try a quick test on the following Excel spreadsheet (well, actually an OpenOffice spreadsheet, but saved in Excel format):

If we save this as sqlfairy.xls and run sqlt like this:

sqlt --from Excel sqlfairy.xls --to MySQL

we get:

--
-- Created by SQL::Translator::Producer::MySQL
-- Created on Sat May 2 14:30:57 2009
--
SET foreign_key_checks=0;

--
-- Table: `Accounts`
--
CREATE TABLE `Accounts` (
`Account` integer(3) NOT NULL DEFAULT '',
`First_Name` char(4) DEFAULT '',
`Last_Name` char(5) DEFAULT '',
`Balance` integer(4) DEFAULT '',
PRIMARY KEY (`Account`)
);

SET foreign_key_checks=1;

Not a bad start. If you're translating from a mysql database, you can either connect to it live using the DBI parser, or dump your database first like this:

mysqldump --user=USER --password=PASS DATABASE_NAME --lock-tables=false --no-data > dump.sql

And then convert it like this (here we convert to sqlfairy's own xml format):

sqlt --from MySQL dump.sql -t XML-SQLFairy > dump.xml