digital odyssey 2012: open data

Post on 29-Jun-2015

Open Data is Dead! Long Live Open Data!

MJ Suhonos

June 8, 2012

The web and openness

2009: The Next Web

• TED talk on the 20th anniversary of the WWW

• Idea of WWW born of frustration

• Unrealized potential due to

incompatibility

• Virtual documentation system on the

Internet

“vague, but exciting”

A new way of thinking

• CD-ROMs already had isolated

hyperlinking

• Later done "on the side, as a play project"

• Made everything openly and freely

available

A grassroots movement

• People started doing things that

weren't imagined originally

• Network effect: more involvement =

more new, interesting, useful things

• Most valuable thing was the

community

Openness Movements

• About community and culture

building

• Based around a new way of thinking

• Facilitated by a new technology

Openness Movements

• Open Access: 1997 (SPARC)

• Open Source: 1998 (Open Source

Summit)

Old ideas rebooted

• Both actually go back to about 1910

• New movements based on the idea

of non-rivalry (digital reproduction)

• Facilitated by the Internet and WWW

The value of data

• Data is only useful when someone

does something with it

• No data = zero possibilities

• More unrealized potential

RawDataNow!

Gold stars of Open Data

1. Make your stuff openly available on the web ★

2. Make it available as structured data ★★

e.g. Excel instead of PDF

3. Use a non-proprietary format ★★★

e.g. CSV instead of Excel
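As a minimal sketch of the third star, the snippet below writes a few catalogue-like rows to CSV and reads them back using only the Python standard library. The rows, field names, and the `to_csv` helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical catalogue rows; identifiers and titles are made up.
rows = [
    {"id": "rec-001", "title": "An Example Title", "year": "1999"},
    {"id": "rec-002", "title": "Another Title, with a comma", "year": "2004"},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "title", "year"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)

# Reading the data back needs nothing beyond a generic CSV parser.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(csv_text)
```

Because CSV is plain text, any tool or language can consume the result without vendor software, which is the point of preferring it over a spreadsheet format.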

2010: TPL Open Data

• First project was to submit the entire

catalogue to the Internet Archive

• 2.5 million MARC records, about 2GB

http://archive.org/details/

marc_toronto_public_library

Open catalogue data

• 2/3 stars for binary MARC format ★★

• Downloaded 89 times since 2010

• U of T: 5400 times, UPEI 2900 times

• TPL is hands-off: no updates, no

license
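To illustrate why binary MARC counts as structured data yet is harder to reuse than CSV, here is a rough sketch that builds and then parses one tiny record, assuming the ISO 2709 layout used by binary MARC (24-byte leader, 12-byte directory entries, control-character terminators). The record contents and both helper functions are hypothetical; real work would normally rely on an established MARC library.

```python
FIELD_END = b"\x1e"    # field terminator
RECORD_END = b"\x1d"   # record terminator
SUBFIELD = b"\x1f"     # subfield delimiter


def build_record(fields):
    """Build a minimal ISO 2709 record from (tag, data_bytes) pairs."""
    directory = b""
    body = b""
    for tag, data in fields:
        chunk = data + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(body):05d}".encode("ascii")
        body += chunk
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


def parse_record(raw):
    """Return (declared_length, [(tag, data_bytes), ...]) for one record."""
    length = int(raw[0:5])    # leader positions 0-4: record length
    base = int(raw[12:17])    # leader positions 12-16: base address of data
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        field_len = int(entry[3:7])
        start = int(entry[7:12])
        # Slice out the field, dropping its trailing terminator.
        data = raw[base + start:base + start + field_len - 1]
        fields.append((tag, data))
    return length, fields


record = build_record([
    ("001", b"rec-001"),
    ("245", b"10" + SUBFIELD + b"aAn Example Title"),
])
length, fields = parse_record(record)
title = fields[1][1].split(SUBFIELD)[1][1:].decode("ascii")
print(length, fields, title)
```

The data is fully machine-readable, but only to software that already knows the leader and directory conventions, which fits the two-star rating on the slide.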

2009-2010

OCLC record use policy

• Trying to protect their business model by preventing sharing

• Deliberately exploited uncertainty of

legality

• Librarians argued vocally for public

domain

• Policy retracted and changed (not

defensible)

Circling the wagons

• Libraries have the power to fight

back

• Best counter-strategy is to release

the data

• Need the ability to work together

somehow

Linked Data

Linked Data

• Technical framework for data

interoperability

• A common language for sharing data

and relations online

• More unrealized potential due to

massive incompatibility & “siloing”

A new way of thinking

• Fundamentally differs from

conceptualization underlying data

formats of the 20th century

• From concept of "records" as bounded sets, to an unbounded set of "statements"
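The shift from records to statements can be sketched as follows. This is only an illustration: the record, the identifiers such as `book:1`, and the property names are all made up, and `record_to_statements` is a hypothetical helper rather than part of any library.

```python
def record_to_statements(record_id, record):
    """Flatten one bounded record into independent
    (subject, property, value) statements."""
    statements = set()
    for prop, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            statements.add((record_id, prop, v))
    return statements


# A "record": a closed bundle of fields about one thing.
record = {
    "title": "An Example Title",
    "creator": "Example, Author",
    "subject": ["Libraries", "Open data"],
}

statements = record_to_statements("book:1", record)

# The set is unbounded: a new statement about the same subject can be
# added later without editing the original record.
statements.add(("book:1", "translationOf", "book:2"))

for s in sorted(statements):
    print(s)
```

Each statement stands on its own, so statements from different sources about the same subject can simply be pooled.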

Based on a new technology

• Same principles and mechanisms as

WWW

– URIs for names, HTTP for retrieval, plus

RDF

• Still organized facts about things, but

infinitely more flexible structure
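As a rough sketch of "URIs for names, plus RDF", the function below writes statements in the N-Triples serialization of RDF. The `example.org` URIs are placeholders chosen for illustration; the predicate URIs are Dublin Core terms. The `n_triple` helper is hypothetical and handles only plain string literals and URI objects.

```python
def n_triple(subject, predicate, obj, obj_is_uri=False):
    """Format one statement as a line of N-Triples."""
    if obj_is_uri:
        o = f"<{obj}>"
    else:
        # Escape the characters that N-Triples requires in literals.
        escaped = (
            obj.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        o = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {o} ."


BOOK = "http://example.org/book/1"       # placeholder URI naming a book
PERSON = "http://example.org/person/7"   # placeholder URI naming a person
DCT = "http://purl.org/dc/terms/"

lines = [
    n_triple(BOOK, DCT + "title", "An Example Title"),
    n_triple(BOOK, DCT + "creator", PERSON, obj_is_uri=True),
]
print("\n".join(lines))
```

Because the creator is named by a URI rather than a text string, anyone else can make further statements about the same person and have them line up.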

“vague, but exciting”

Why Linked Data?

• Breaking data out of silos by pointing to

and linking between other databases

• Formulate questions for which no answer

exists on the current WWW

• Anyone can contribute unique expertise in

a form that can be reused and recombined
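The idea of breaking data out of silos can be illustrated with a small sketch: two independent sources share a URI, and only the combined set can answer a question that neither answers alone. All URIs, property names, and data here are invented for the example.

```python
# Source 1: a library catalogue (hypothetical data).
library = {
    ("http://example.org/book/1", "title", "An Example Title"),
    ("http://example.org/book/1", "creator", "http://example.org/person/7"),
}

# Source 2: a separate biographical database (hypothetical data).
biographies = {
    ("http://example.org/person/7", "name", "Example, Author"),
    ("http://example.org/person/7", "birthPlace", "Toronto"),
}


def titles_by_birthplace(statements, place):
    """Titles of books whose creator was born in the given place."""
    people = {s for s, p, o in statements if p == "birthPlace" and o == place}
    books = {s for s, p, o in statements if p == "creator" and o in people}
    return sorted(o for s, p, o in statements if p == "title" and s in books)


# Because both sources name the person with the same URI,
# merging is a plain set union.
merged = library | biographies

print(titles_by_birthplace(library, "Toronto"))  # the catalogue alone
print(titles_by_birthplace(merged, "Toronto"))   # the linked data
```

The catalogue alone returns nothing, since it knows no birthplaces; the merged set returns the title, because the shared URI connects the two sources.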

“The coolest thing to do to your data will be thought of by

someone else.”

Open Data

Open Data

• Legal and policy framework for data interoperability

• Clarifies the terms and purposes of

data use

• Allows for a spectrum of licensing

options

– see Creative Commons

Open Data definition

“freely usable, reusable and

redistributable, subject, at most,

to the requirements to attribute

and share-alike”

http://opendefinition.org/okd/

Database hugging

• People don't want to let go of their data:

– until it's perfect or complete or "finished"

– because data is raw and unpolished and

ugly

– because “we know better than everyone

else”

– something unforeseeably terrible might

happen

Misconception #1

• Open Data will destroy/compromise

quality

– Already a lot of high-quality data being

created outside of libraries

– Our MARC records aren't actually that

great

Misconception #2

• Open Data will reveal our mistakes/problems

– everyone's data is messy; that’s its nature

– what if someone were able to clean it up

for you?

Misconception #3

• Open Data will facilitate competition

– new and useful tools are good, even

ones that involve money

– what if someone does a better job with

our data than we do?

Misconception #4

• Open Data is a loss of control

– if you deliberately make it available, you

can set the (legal) terms of its use

– requires thinking about / dealing with

legal stuff

An increasing trend

• 2012: Canada Post Files Copyright Lawsuit Over Crowd-sourced Postal Code Database

http://geocoder.ca/?sued=1

1. take down the openly-licensed

database

2. pay damages on lost business

($5500/year)

New library business model

1. Sell access to library catalogue data

2. Sue every organization that makes

bibliographic data available for free

e.g. Internet Archive, Amazon, Library of

Congress

3. Profit!

Open Data vs. Linked Data

• Open Data does not have to be Linked Data

• Linked Data does not have to be Open

• But the potential of both is best realized when data is published as Open Linked Data

Open Linked Data

[Diagram labels: Linked Data, Open Data]

Gold stars of Open Linked Data

1. Make your stuff openly available on the web ★

2. Make it available as structured data ★★

3. Use a non-proprietary format ★★★

4. Use URIs to identify your things ★★★★

5. Link to other people’s things using URIs ★★★★★
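The five-star scale above can be read as a cumulative checklist, and the sketch below encodes it that way. The `Dataset` class, its field names, and the `stars` function are hypothetical names introduced for illustration; treating each star as requiring all earlier ones is an assumption about how the scale is meant to be applied.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open: bool       # 1. openly available on the web
    structured: bool        # 2. available as structured data
    non_proprietary: bool   # 3. in a non-proprietary format
    uses_uris: bool         # 4. URIs identify its things
    links_out: bool         # 5. links to other people's things via URIs


def stars(d):
    """Count stars, stopping at the first criterion that is not met."""
    criteria = [
        d.on_web_open,
        d.structured,
        d.non_proprietary,
        d.uses_uris,
        d.links_out,
    ]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


# The binary MARC dump as rated earlier in the talk: open and structured.
marc_dump = Dataset(True, True, False, False, False)
# A fully linked, openly licensed dataset.
linked = Dataset(True, True, True, True, True)

print(stars(marc_dump), stars(linked))
```

Stopping at the first unmet criterion means a dataset that uses URIs but sits behind a paywall earns no stars, which matches the ordering of the list.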

Libraries & The Semantic Web

2011: Library Linked Data

• W3C Library Linked Data incubator

group

• Panel of invited librarians,

academics, experts

• “to help increase global

interoperability of library data on the

Semantic Web”

• Final report produced October 2011

A struggle for relevancy

• "library" = all cultural heritage & memory

institutions (archives, museums)

• Natural extension to the collaborative sharing

models historically employed by libraries

• In a position to provide trusted metadata for

resources of long-term cultural importance

Major goals for libraries

1. Foster discussion about Open Data and

rights management issues

2. Develop library standards that are

compatible with Linked Data

3. Apply library experience in curation and

long-term preservation to Open Linked

Data

A discussion about Open Data

• Data can have unclear and untested rights

issues that hinder their release as Open Data

• Seek agreement with owners about licensing;

consider the impact of usage restrictions

• Establish institutional policies for data sharing

and licensing

Issues with library standards

• Data is expressed primarily in natural-

language text

• Technology changes depend on vendor

systems development

• Data is not integrated with web resources

• Designed only for the library community

Benefits of Open Linked Data

• Will be able to use mainstream solutions

• Can give libraries a wider choice of vendors

and developers to recruit from and interact

with

• Much larger community to provide IT support

• Smaller institutions can make themselves

more visible and connected

Already going mainstream

• National libraries of Sweden, Hungary,

Germany, France, the British Library, L of C

• BNB: 2.6 million records as 85 million RDF

statements, public domain license

• Cities of Vancouver, Edmonton, Ottawa, and

Toronto have created grassroots @g4open
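For a sense of scale, the BNB figures quoted above imply roughly how many statements a single record turns into. The two numbers come from the slide; the variable names are just for this sketch.

```python
records = 2_600_000       # BNB records, figure from the slide
statements = 85_000_000   # RDF statements, figure from the slide

per_record = statements / records
print(f"about {per_record:.1f} statements per record")
```

On these figures each bibliographic record corresponds to a few dozen separate, individually reusable statements.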

In Summary

Now is the time

• Missed opportunities before

• Don’t often get a second chance

• Major opportunity here for libraries to

catch up and become leaders online

Open Data Now!

• Remember the 5 stars of Open Linked Data

1. Choose a license, keep control of the

rights

2. Release the data – just get it out

there

Thanks!

@mjsuhonos

mj@suhonos.ca

http://mj.suhonos.ca
