Who owns (y)our data?
The panel I most wish I had been at the Web 2.0 Summit to see is the one on Open Data (see “Web 2.0 Confab Takes Aim at Closed Platforms” and “Google CEO Eric Schmidt: We Would Never Trap User Data”). In Marc Hedlund’s summary:
Whenever people talk about the new wave of web applications like Flickr and del.icio.us, the idea of users contributing their data to a pool of information on a site – photos on Flickr, bookmarks on del.icio.us, and so on – always comes up. Open data is about the next step – what then? What happens to my information once I share it on a web site, and what can I do to control it?
I’ve been thinking about this a lot since OSCON ’06 earlier this summer, where the issue was raised most clearly by Tim O’Reilly in one of the opening keynotes.
He pointed out that in a world shifting toward services, open source licenses are in one sense obsolete. In using a service like Google, the lack of openness of their source code is in essence irrelevant for the end user. (Of course, the cost effectiveness of Google’s infrastructure is based on the usage of open source, and Google does provide patronage for a number of open source projects.)
Given a shift toward web-based services over installed desktop applications (see “Data is the next Intel Inside”), he asked: what kind of new licenses (and new definitions) will be necessary for open data and open services?
In other words, what will be the equivalent of the Open Source Definition and the OSI Approved Licenses for web applications which never distribute source code?
Tim Bray, in a blog post following OSCON, offered the following definition:
I think any online service can call itself “Open” if it makes, and lives up to, this commitment: Any data that you give us, we’ll let you take away again, without withholding anything, or encoding it in a proprietary format, or claiming any intellectual-property rights whatsoever.
That’s a fantastic statement of a core principle. I think the “without . . . encoding it in a proprietary format” clause may be too restrictive – why not allow re-encoding for use within the application, so long as the data is always also made available in an open format?
Hedlund, whose summary of the issues I quoted above, goes on in the same post to outline the “Data Bill of Rights” that Wesabe (his company) announced as the basis for their application (which has not launched yet, but you can request access to the preview):
- You can export and/or delete your data from Wesabe whenever you want.
- Your data is your data, not ours. Our job is to help you understand and act on your data.
- We’ll keep all of your data online and accessible for as long as you have an account. No “archive access” charges.
- Any data you want us to keep private, we will.
- If a question comes up not covered by these rights, we will answer it remembering that your data belongs to you.
This seems like a great beginning for a definition of open data. I don’t know how many application providers would be willing to accept the “No ‘archive access’ charges” notion, though they do say “as long as you have an account.” I think it would be fine to enable some kind of content expiration in many cases – for example, “we will keep data you provide for a reasonable period of time” (which could be spelled out in detail for particular applications).
A few other interesting attempts at answers to the problem:
- The Talis Community License, which “is intended to guarantee your freedom to use, share and modify data and to preserve the availability and accessibility of such data for the wider community”
- MoveMyData.org, which is based on the assertion that “if you can’t move it it’s not really yours.” Really just a discussion at this point, but an interesting proposal for an application which would support users moving content between services.
Eric Schmidt (Google CEO) was also quoted on the subject at the Summit:
“. . . we would never trap user data.” . . . Schmidt was asked if users could get all of their search history and export it to Yahoo. “We would like to do that, as long as it is authenticated…. If users can switch it keeps us honest.”
(via Between the Lines).
“The more we can let people move their data around … the better off we’ll be” (via TechWeb)
In other words, while philosophically Google is in agreement, nothing is in place today for access to such data.
(An interesting and early discussion of the issue with respect to Gmail is “Information ownership: it’s not yours until you can move it,” which addressed the question of data ownership in Gmail back in April 2004.)
But what about “our data” – data which is not owned by any single individual contributor but results from collaboration?
What should a site like LinkedIn do, since in part the value of the site is based on the interaction between:
- Data I own (my own profile, job history, etc.)
- Data I co-own along with others (my direct connections – surely they “own” the fact that we are connected as much as I do), and
- Data owned by other people (my direct connections’ links to other people – my “extended network”).
A quick look through their user agreement (which, like most users, I probably never read while collecting all these links) turns up the following:
You understand that by posting materials on the LinkedIn website or otherwise providing materials to LinkedIn, you are granting to LinkedIn Corporation a royalty-free, perpetual, irrevocable license to use this information in the course of offering the LinkedIn service. Furthermore, you understand that LinkedIn retains the right to reformat, excerpt, or translate any materials submitted by you.
In other words, I don’t own even my own profile, really – I have given them an irrevocable license to use it in the course of offering their service.
Remember the exchange in Fast Times at Ridgemont High where Spicoli challenges Mr. Hand?
Mr. Hand: Mr. Spicoli, you’re on dangerous ground here. You’re causing a major disturbance in my class and on my time.
Jeff Spicoli: I’ve been thinking about this, Mr. Hand. If I’m here… and you’re here… doesn’t that make it our time?
If data is created by multiple users, doesn’t that make it our data? Should I partially own that data?
If an application provider funded the infrastructure which created that added value of linking people together, do they own that data?
If they don’t, who does?