A blog to communicate about the Data Commons Project and keep track of progress creating the Data Commons Cooperative, a hybrid worker- and consumer-owned cooperative providing data services to members of the rooted economy.
Thursday, October 15, 2009
Solidarity Economy Maps
Monday, June 29, 2009
Open Database License, version 1 is out
This license is an "Attribution and Share-Alike" license, doing for data/databases what Creative Commons has done for media.
Thursday, June 4, 2009
DCP work in the latest version of Gnumeric
Thursday, May 21, 2009
data.gov goes live
Monday, May 18, 2009
Why and How: Making Data Open
Here's their take on why open data is important:
[Open data is] crucial because open data is so much easier to break-up and recombine, to use and reuse.
And their take on the need for clear licensing:
Licensing is important because it removes uncertainty. Without a license you don’t know where you, as a user, stand: when are you allowed to use this data? Are you allowed to give to others? To distribute your own changes, etc?
Tuesday, May 12, 2009
Generating spreadsheets online
Smaller organizations often prefer to work with spreadsheets rather than databases, so we are working to support export and import to spreadsheet formats, such as that of Excel.
The free and open source spreadsheet program Gnumeric comes with a command-line utility, ssconvert ("SpreadSheet convert"), which is almost ideal for automating such conversions. For example, it can take the primitive CSV (comma-separated value) format, which is easy to generate from any source, and convert it to all the various Excel formats (and lots of other formats too).
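Here's a minimal sketch of that pipeline in Python, assuming ssconvert is installed and on the PATH: write the CSV with the standard csv module (which takes care of quoting commas inside names), then hand it to ssconvert, which normally picks the output format from the target file's extension. The function names are just for illustration.

```python
import csv
import io
import shutil
import subprocess


def rows_to_csv(header, rows):
    """Serialize a header plus rows of values to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # quotes fields containing commas
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def ssconvert_command(csv_path, xls_path):
    """Argument list for converting csv_path to xls_path with ssconvert."""
    return ["ssconvert", csv_path, xls_path]


def convert(csv_path, xls_path):
    """Run ssconvert if it is installed; return False instead of failing when it isn't."""
    if shutil.which("ssconvert") is None:
        return False
    subprocess.run(ssconvert_command(csv_path, xls_path), check=True)
    return True
```

For example, write the output of rows_to_csv(["Name", "City"], rows) to members.csv, then call convert("members.csv", "members.xls"). Passing an argument list to subprocess (rather than a shell string) avoids trouble with spaces in file names.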
A patch from a data commoner (myself) to support merging multiple workbooks was recently accepted by the developers of Gnumeric. This is typical of how the free and open source software community works: someone benefiting from a public good extends it to meet a need they have, then contributes the extension for the benefit of all. This model is at the heart of what the data commons project wants to bring to the cooperative economy.
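Merging from a script might look like the following, assuming your ssconvert build has the --merge-to option (check ssconvert --help, since older releases won't have it). The helper functions are hypothetical; the first only assembles the command line.

```python
import shutil
import subprocess


def merge_command(output_path, input_paths):
    """ssconvert arguments that merge several same-format workbooks into one.

    Assumes the installed ssconvert supports --merge-to.
    """
    if not input_paths:
        raise ValueError("need at least one input workbook")
    return ["ssconvert", "--merge-to=" + output_path, *input_paths]


def merge(output_path, input_paths):
    """Run the merge if ssconvert is installed; return whether it ran."""
    if shutil.which("ssconvert") is None:
        return False
    subprocess.run(merge_command(output_path, input_paths), check=True)
    return True
```

For example, merge("all-regions.xls", ["east.xls", "west.xls"]) should combine the sheets of both files into one workbook.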
Monday, May 11, 2009
A site to watch: Farm 2 Local
Saturday, May 9, 2009
Data Genius
Tuesday, May 5, 2009
Featured Directory: The .Coop Directory
Features that they are advertising:
- .coop domain holders can Claim and Customize their listings
- listings are geo-tagged, so you can search by geography
- multimedia: you can add a photo, video and logo to your listing; you can also upgrade (pay?) to add more photos and videos, or a custom map or directory
- sharing widgets to point people to the directory, such as via Facebook and Myspace
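Geo-tagging is what makes "search by geography" possible. We don't know how the .coop Directory implements it; here's a minimal Python sketch of one common approach, a radius search using the haversine great-circle distance, assuming each listing stores latitude and longitude in decimal degrees. The Listing class and the coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Listing:
    name: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(listings, lat, lon, radius_km):
    """Listings within radius_km of (lat, lon), nearest first."""
    scored = [(haversine_km(lat, lon, item.lat, item.lon), item) for item in listings]
    scored.sort(key=lambda pair: pair[0])  # sort on distance only
    return [item for dist, item in scored if dist <= radius_km]
```

Scanning every listing is fine for thousands of entries; a large directory would want a spatial index instead.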
UPDATE: The .coop Directory uses a company's software to display the directory listings on a map. We're a little disappointed at the closed and proprietary nature of the software. Coops can do better!
Retailer-Supplier Data-sharing
Saturday, May 2, 2009
SQLFairy schema translation
Parsers: Access, DB2, DB2-Grammar, DBI, DBI-DB2, DBI-MySQL, DBI-Oracle, DBI-PostgreSQL, DBI-SQLServer, DBI-SQLite, DBI-Sybase, DBIx-Class, Excel, MySQL, Oracle, PostgreSQL, SQLServer, SQLite, Storable, Sybase, XML, XML-SQLFairy, YAML, xSV
Producers: ClassDBI, DB2, DBIx-Class-File, DiaUml, Diagram, Dumper, GraphViz, HTML, Latex, MySQL, Oracle, POD, PostgreSQL, SQLServer, SQLite, Storable, Sybase, TT-Base, TT-Table, TTSchema, XML, XML-SQLFairy, YAML
It translates the structure of data (in SQL: CREATE, ALTER) and not the data itself (in SQL: INSERT, UPDATE, DELETE). On Ubuntu/Debian systems, this tool is available in the "sqlfairy" package. Let's try a quick test on the following Excel spreadsheet (well, actually an OpenOffice spreadsheet, but saved in Excel format):
If we save this as sqlfairy.xls and run sqlt like this:
sqlt --from Excel sqlfairy.xls --to MySQL
we get:
--
-- Created by SQL::Translator::Producer::MySQL
-- Created on Sat May 2 14:30:57 2009
--
SET foreign_key_checks=0;
--
-- Table: `Accounts`
--
CREATE TABLE `Accounts` (
`Account` integer(3) NOT NULL DEFAULT '',
`First_Name` char(4) DEFAULT '',
`Last_Name` char(5) DEFAULT '',
`Balance` integer(4) DEFAULT '',
PRIMARY KEY (`Account`)
);
SET foreign_key_checks=1;
Not a bad start. If you're translating from a mysql database, you can either connect to it live using the DBI parser, or dump your database first like this:
mysqldump --user=USER --password=PASS DATABASE_NAME --lock-tables=false --no-data > dump.sql
And then convert it like this (here we convert to sqlfairy's own xml format):
sqlt --from MySQL dump.sql -t XML-SQLFairy > dump.xml
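To see roughly where column definitions like integer(3) and char(4) can come from, here's a small Python sketch that infers a type and width from each column's values. This is our guess at the idea, not SQLFairy's actual code (which is Perl), and the sample rows used to exercise it are hypothetical.

```python
def looks_like_integer(value):
    """True for strings such as '42' or '-40'."""
    return value.isdigit() or (value.startswith("-") and value[1:].isdigit())


def infer_column(values):
    """Guess a MySQL-style type from one column's string values."""
    width = max((len(v) for v in values), default=1)
    if values and all(looks_like_integer(v) for v in values):
        return f"integer({width})"
    return f"char({width})"


def create_table(table, header, rows):
    """CREATE TABLE text for the given header and rows, keyed on the first column."""
    columns = list(zip(*rows)) if rows else [() for _ in header]
    lines = [
        f"  `{name}` {infer_column(list(values))}"
        for name, values in zip(header, columns)
    ]
    lines.append(f"  PRIMARY KEY (`{header[0]}`)")
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(lines) + "\n);"
```

Sizing columns to the widest value seen is exactly why generated schemas need a human look: the next account holder may well have a six-letter first name.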
Thursday, April 30, 2009
Merging and maintaining bibliographic databases
There is a definite parallel between maintaining and merging library catalogs, and what the data commons aims to do. Wish we had an equivalent of the Library of Congress...
All fields become optional, all relationships many-to-many
Important to remember as we try to build a data commons that will last a long, long time.
- All Fields Become Optional: As your dataset grows, exceptions creep in. There’s not enough research time to fill in all your company profiles, there’s one guy in Guam when you expected everyone to be in a U.S. state, there’s data missing from the page you’re scraping, you have to pull updates from a new source...
- All Relationships Become Many-to-Many: Some guy works in DC but lives in Virginia, so he needs two Locations. A new type of incoming email needs to be shoveled out to different feeds. A state has both a primary and a caucus. Someone eventually realizes categories never really were mutually exclusive...
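In schema terms, both lessons mean: make columns nullable unless you truly can't live without them, and model relationships through a join table from the start. A minimal sketch using Python's built-in sqlite3 module; the table names and the sample person are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,  -- the only field we insist on
    title TEXT,           -- optional: often unknown, or out of date
    email TEXT            -- optional
);
CREATE TABLE location (
    id     INTEGER PRIMARY KEY,
    city   TEXT,
    region TEXT           -- a U.S. state, Guam, or anywhere else
);
-- Join table: one person can have many locations, and vice versa.
CREATE TABLE person_location (
    person_id   INTEGER NOT NULL REFERENCES person(id),
    location_id INTEGER NOT NULL REFERENCES location(id),
    role        TEXT NOT NULL,  -- e.g. 'work' or 'home'
    PRIMARY KEY (person_id, location_id, role)
);
"""


def locations_for(conn, person_name):
    """All (city, region, role) rows linked to a person, ordered by role."""
    return conn.execute(
        """SELECT l.city, l.region, pl.role
           FROM person AS p
           JOIN person_location AS pl ON pl.person_id = p.id
           JOIN location AS l ON l.id = pl.location_id
           WHERE p.name = ?
           ORDER BY pl.role""",
        (person_name,),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical record: no title or email yet, which the schema allows.
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Pat')")
    conn.executemany(
        "INSERT INTO location (id, city, region) VALUES (?, ?, ?)",
        [(1, "Washington", "DC"), (2, "Arlington", "VA")],
    )
    conn.executemany(
        "INSERT INTO person_location VALUES (?, ?, ?)",
        [(1, 1, "work"), (1, 2, "home")],
    )
    return locations_for(conn, "Pat")
```

The guy who works in DC but lives in Virginia is just two rows in person_location, with no change to the schema.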
Use Case: Aggravation
- Two different ways of writing the company name
- Two different ways of writing the same address, and a third address of unknown reliability
- A misspelling of the town name
- The perennial East Coast problem of having to tell Excel that zip codes starting with a zero need to be treated as text
- No entries in the third column, "Title" -- and the probability that any listed titles could already be out of date
- The sense of futility that fixing these problems once will not mean they are fixed forever
It's comforting, in a way, to know that the Data Commons would make at least one person's life easier: mine.
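A couple of these problems can be chipped away at mechanically. Here's a minimal Python sketch, assuming US zip codes and a hand-picked (hypothetical) list of name suffixes to ignore when matching company names. It won't settle which of two addresses is right; it only makes duplicates easier to spot.

```python
import re

# Hypothetical list of words to ignore when comparing organization names.
SUFFIXES = {"inc", "llc", "co", "corp", "coop", "cooperative"}


def normalize_zip(value):
    """Restore leading zeros a spreadsheet stripped from a US zip code.

    Accepts 5-digit zips and ZIP+4; returns None if it doesn't look like a zip.
    """
    s = str(value).strip()
    if s.endswith(".0"):  # a numeric cell may come back as 1060.0
        s = s[:-2]
    m = re.fullmatch(r"(\d{1,5})(?:-(\d{4}))?", s)
    if not m:
        return None
    zip5 = m.group(1).zfill(5)
    return f"{zip5}-{m.group(2)}" if m.group(2) else zip5


def name_key(name):
    """Crude matching key: lowercase words, minus punctuation and common suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower().replace("co-op", "coop"))
    return " ".join(w for w in words if w not in SUFFIXES)
```

Two rows whose name_key and normalize_zip agree are good candidates for a human to merge.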
Open Database License, new draft out
The license does a good job of using copyright to maintain freedoms, as free software licenses do. It does not address "privacy rights / data protection rights over information in the contents." I wonder whether such rights could be used, like copyright, as a means to maintain data freedom? In other words, as part of the privacy / data protection terms, agreeing to maintain freedom would be a requirement.
The Open Database Licence (ODbL) is a licence agreement intended to allow users to freely share, modify, and use this Database while maintaining this same freedom for others. Many databases are covered by copyright, and therefore this document licenses these rights. Some jurisdictions, mainly in the European Union, have specific rights that cover databases, and so the ODbL addresses these rights, too. Finally, the ODbL is also an agreement in contract for users of this Database to act in certain ways in return for accessing this Database.
Monday, April 27, 2009
A Shared Directory of Local Food
While browsing the Food Routes website, which is dedicated to promoting local food buying, I ran across this description of their database:
In collaboration with eatwellguide.org, FoodRoutes brings you an online map that can help you find locally-produced food near you. This map combines multiple directories from organizations around the nation into one powerful database. In the directory, you'll find descriptions, phone numbers, addresses, web sites, crop lists, and directions all to make local food purchasing that much easier.
I would love to know how they combine all those directories and keep them up to date, and whether the Data Commons Project can help them do that better.
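We don't know how FoodRoutes actually does it, but here's one simple way such a merge could work, sketched in Python: treat records with the same normalized name and 5-digit zip as the same farm, fill in missing fields from whichever directory has them, and remember every source so updates can flow back. The record layout and directory names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    name: str
    zip: str
    phone: Optional[str] = None
    website: Optional[str] = None
    sources: Set[str] = field(default_factory=set)  # directories that list it


def match_key(name, zip_code):
    """Same farm if the lowercased words of the name and the 5-digit zip agree."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(words), zip_code[:5]


def merge_directories(directories: Dict[str, List[dict]]) -> List[Entry]:
    """Combine {directory_name: [record, ...]} into one de-duplicated list.

    The first non-empty phone or website seen wins; a real system would
    want timestamps to decide which value is newer.
    """
    merged: Dict[tuple, Entry] = {}
    for source, records in directories.items():
        for rec in records:
            key = match_key(rec["name"], rec.get("zip", ""))
            entry = merged.get(key)
            if entry is None:
                entry = merged[key] = Entry(rec["name"], rec.get("zip", ""))
            for attr in ("phone", "website"):
                if getattr(entry, attr) is None and rec.get(attr):
                    setattr(entry, attr, rec[attr])
            entry.sources.add(source)
    return list(merged.values())
```

Keeping the set of sources is what would let a correction made in one directory be offered back to the others.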
Thursday, April 23, 2009
An Example of a Shared Repository
I'm interested in their funding sources. Two years ago, they got a 4-year, $4.9 million grant from the Gordon and Betty Moore Foundation. "The Gordon and Betty Moore Foundation, established in 2000, seeks to advance environmental conservation and cutting-edge scientific research around the world and improve the quality of life in the San Francisco Bay Area. The Foundation’s Science Program seeks to make a significant impact on the development of provocative, transformative scientific research, and increase knowledge in emerging fields." Could we learn something from Fedora's application for the grant?
Big Data
- All saved versions: http://download.wikimedia.org/enwiki/
- Latest files: http://download.wikimedia.org/enwiki/latest/
- The main file is "enwiki-latest-pages-articles.xml.bz2"
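The pages-articles file is large, so it's worth processing it as a stream rather than unpacking it to disk first. A minimal Python sketch, assuming the usual MediaWiki export layout of page elements that each contain a title; the source argument can be a filename or an open binary file.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def page_titles(source, limit=None):
    """Yield page titles from a bzip2-compressed MediaWiki XML dump, streaming."""
    with bz2.open(source, "rb") as f:
        events = ET.iterparse(f, events=("start", "end"))
        _, root = next(events)  # the root element of the dump
        count = 0
        for event, elem in events:
            if event != "end" or local_name(elem.tag) != "page":
                continue
            yield next(
                (child.text for child in elem if local_name(child.tag) == "title"),
                None,
            )
            root.clear()  # discard finished pages so memory use stays flat
            count += 1
            if limit is not None and count >= limit:
                return
```

Calling list(page_titles("enwiki-latest-pages-articles.xml.bz2", limit=10)) is a quick way to peek at a download without reading the whole thing.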
The DMOZ open directory (like Yahoo's directory, but volunteer created and under a free license) is downloadable in RDF format at http://rdf.dmoz.org/.
Of course, there's lots more data out there, but this does give a sense of one way in which "Big Data" may be distributed. What I like:
- It is really easy to get the free data, just like it is easy to get free software.
- The data is in a good format to use, just like free software source code.
- Rights to the data are granted in a clear and free license, just like free software.
- What's missing: there's no equivalent of software "patches." Let me explain. If you improve a piece of code someone else wrote, you can automatically generate the "difference" between the original and your revised version, send that difference (called a "patch") to the original author, who can then evaluate it and, if they like it, merge it automatically with their code (even if they've made their own non-overlapping changes in the meantime). That's patching in software. Now what happens if you improve pages you downloaded from Wikipedia? I guess you go to the site and try typing your changes in; there's no way I can see to submit something like a patch. And without a patching mechanism, there's no basis for distributed development of the data, as happens with free software.
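Here's what a data "patch" could look like if a directory were kept as a text file with one record per line. Python's difflib.unified_diff produces the same format as diff -u, which the standard patch utility can apply. The file name and records are hypothetical.

```python
import difflib


def data_patch(original_rows, revised_rows, name="directory.csv"):
    """Unified diff between two versions of a one-record-per-line data file."""
    return "".join(
        difflib.unified_diff(
            [row + "\n" for row in original_rows],
            [row + "\n" for row in revised_rows],
            fromfile="a/" + name,
            tofile="b/" + name,
        )
    )


old = ["id,name,zip", "1,Hilltop Farm,1060", "2,Brookside Orchard,01002"]
new = ["id,name,zip", "1,Hilltop Farm,01060", "2,Brookside Orchard,01002"]
print(data_patch(old, new))  # one removed line, one added line, with context
```

Keeping records sorted by a stable id matters here: if every export reorders the rows, every diff touches the whole file and the patch stops being useful.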
Wednesday, April 22, 2009
Products and Services
The DCP has grown out of previous projects, including the Grassroots Economic Organizing "Economy of Hope" directory and the Regional Index of Cooperation (www.find.coop). The idea initially was simply to build an open, shared, comprehensive and accurate catalogue of the (small-c) cooperative economy -- including coops and credit unions, but also land trusts, local currencies, employee-owned companies, community-supported agriculture, and so on. This effort is linking up with similar efforts worldwide of cataloguing the cooperative/solidarity/social economy. (There are different terms, none of which is completely satisfactory -- we've been playing with using the term "rooted" economy. More on that later.)
What we would like to build is:
1. a big repository of data: a database with names of companies, organizations, and individuals, contact info, descriptions of their products/services, etc.
2. a website that is one of many clients* of the big repository, that would display the info, permit searches, display results on an interactive map, maybe provide more value-added reports for a fee, etc.
3. a set of tools & protocols that allow fast, efficient, relatively easy merging and cleaning of data to and from the repository.
* Other clients would be members of the Data Commons Cooperative, and as such, subscribers to periodic updates from the repository, as well as suppliers of their own updates back to the repository (somewhat in the style of the AP news story cooperative, where members both contribute and use stories, and also non-members can sign up to just be users for a fee).
The idea here is that if an organization changes its contact info, say, or has an announcement, that change would be picked up in one place and broadcast to all the places that have an interest in it, instead of arriving piecemeal as each place goes through and updates its own database, or the organization having to somehow contact everyone to tell them about its changes.
For example, a new worker coop formed in Western Mass. would be of interest to at least the Valley Alliance of Worker Coops, the ECWD, the USFWC, NCBA, NASCO, CFNE, SEN, GEO, MASSEIO, any industry or sector-specific networks that it might belong to, etc. etc., not to mention potential clients or suppliers. So, without getting too long-winded about all the possibilities, that's what we'd like to create, in a way that actually creates value for our users and would be financially sustainable.
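In software terms this is a publish/subscribe arrangement. A minimal in-memory Python sketch: each member registers a filter describing the changes it cares about, and a change published once is delivered to every matching member. The class names and filters are hypothetical; a real repository would persist changes and deliver them over the network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Change:
    org: str
    kind: str     # e.g. "worker coop"
    region: str   # e.g. "Western Mass."
    fields: dict  # updated contact info, an announcement, etc.


@dataclass
class Repository:
    subscribers: List[Tuple[str, Callable[[Change], bool]]] = field(default_factory=list)
    inboxes: Dict[str, List[Change]] = field(default_factory=dict)

    def subscribe(self, member, wants):
        """Register a member with a filter describing the changes it cares about."""
        self.subscribers.append((member, wants))

    def publish(self, change):
        """Record a change once and deliver it to every interested member."""
        delivered = []
        for member, wants in self.subscribers:
            if wants(change):
                self.inboxes.setdefault(member, []).append(change)
                delivered.append(member)
        return delivered
```

For instance, if the Valley Alliance of Worker Coops subscribes with a filter for worker coops in Western Mass. and the USFWC subscribes to all worker coops, one published change for a new Western Mass. worker coop lands in both inboxes, and nowhere else.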
Early Adopters
Since we are gearing up to become a consumer-owned coop, this early "sweat equity" investment would count for most or all of the equity investment needed to join the coop as a member. And the better the feedback we get, the better we can make our products and services, and the more valuable membership becomes.
Monday, April 13, 2009
Open Everything NYC
From johndbritton:
Open Everything is a global conversation about the art, science and spirit of 'open'. It gathers people using openness to create and improve software, education, media, philanthropy, architecture, neighbourhoods, workplaces and the society we live in: everything. It's about thinking, doing and being open.
Good to see conversations like this happening! It seems like the theme of openness is cropping up more and more in every field, but the opportunities for communicating between those fields are few and far between.
Open Everything NYC will take place on Saturday 18 April 2009 at the UNICEF headquarters in the United Nations Plaza, NYC. The event will run the full day, registration will open at 8:00AM and things will be in full swing by 9:00AM.
The event will be 100% free and open to the public on a first come first serve basis, online pre-registration is required. The main hall can hold up to 250 guests.
The event will consist of two keynote presentations (one opening & one closing) each of about 1 hour in duration. In the time between the two keynotes attendees will be in control of the program (Barcamp style). There will be a number of conference rooms available for individuals to hold talks & discussions on topics they see fit. Past events have included topics such as Open Publishing, Open Education, Government Transparency, Open Access, Open Research Data, Creative Commons, Open Hardware, and more.
Tuesday, April 7, 2009
International Mapping & Database Projects
The Brazilian solidarity economy directory (called "Solidarius") has been in development since 2005 and, after two phases of participatory "mapping" of enterprises, now lists over 22,000 initiatives and is developing powerful information tech features to increase the usefulness of its database to grassroots economic movements.
Here's the link to Solidarius.
This directory includes basic and advanced search features, a virtual marketplace for solidarity economy products and services, educational and informational resources, a network-building facilitation feature, and an integrated social currency that facilitates exchange among solidarity economy consumers and producers.
The software (which we need to learn more about in terms of technical specifications) is, as far as I understand, open source.
The Quebec database is another informative project, though my lack of French makes it difficult to fully explore. Here's the link. One aspect of this project is that it is a "portal" rather than simply a directory. They intend to create a kind of "one stop shopping" for information on the "social economy." The directory, then, is placed in the context of news, job offers, events postings, and an online commerce feature.
The Brazilian database developers are currently working with those in Quebec on a system through which both databases would "talk to each other." I don't know the details of this project.
Mapping Solidarity Economy Networks
Today: the concept of "mapping networks." One of the potentially powerful applications of a comprehensive relational database of cooperative/solidarity economy initiatives is that we could begin to "map" the concrete economic relationships between them: supply chains, product distribution routes and markets. A dynamic analysis of these relationships could allow solidarity economy (SE) enterprises to visually understand new possibilities for building economic relationships across sectors and geographical regions. We would be able to understand where our relational strengths lie, map the patterns of the networks to better understand their topology, and see where the "holes" are that could be filled.
This is a way of thinking that works effectively for some capitalist firms and production networks, so why not for solidarity networks?
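As a first, very simple step toward that kind of analysis, trading relationships can be treated as a graph and split into connected clusters. A minimal Python sketch using a plain breadth-first search rather than any particular network-analysis package; the enterprise names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(trades):
    """Undirected adjacency sets from (buyer, supplier) pairs."""
    graph = defaultdict(set)
    for buyer, supplier in trades:
        graph[buyer].add(supplier)
        graph[supplier].add(buyer)
    return graph


def clusters(graph):
    """Connected components: enterprises linked by any chain of trade."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search from this enterprise
            node = queue.popleft()
            group.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups
```

With trades between a bakery, a grain coop and a cafe on one side and a food coop and a dairy CSA on the other, this yields two separate clusters. Any two clusters with no trade between them mark a candidate "hole" worth a closer look.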
Here are some links:
Monday, April 6, 2009
First Post: We got some money!
We also don't want to forget all of the individual and organizational donors who gave to the DCP during our fundraising campaign last year. Thanks for believing in us. We are on the job.