digital odyssey 2012: open data
Open Data is Dead! Long Live Open Data!
MJ Suhonos
June 8, 2012
①
The web and openness
2009: The Next Web
• TED talk on the 20th anniversary of
the WWW
• Idea of WWW born of frustration
• Unrealized potential due to
incompatibility
• Virtual documentation system on the
Internet
“vague, but exciting”
A new way of thinking
• CD-ROMs already had isolated
hyperlinking
• Later done "on the side, as a play
project"
• Made everything openly and freely
available
A grassroots movement
• People started doing things that
weren't imagined originally
• Network effect: more involvement =
more new, interesting, useful things
• Most valuable thing was the
community
Openness Movements
• About community and culture
building
• Based around a new way of thinking
• Facilitated by a new technology
Openness Movements
• Open Access: 1997 (SPARC)
• Open Source: 1998 (Open Source
Summit)
Old ideas rebooted
• Both actually go back to about 1910
• New movements based on the idea
of non-rivalry (digital reproduction)
• Facilitated by the Internet and WWW
The value of data
• Data is only useful when someone
does something with it
• No data = zero possibilities
• More unrealized potential
RawDataNow!
Gold stars of Open Data
1. Make your stuff openly available on the web ★
2. Make it available as structured data ★★
e.g. Excel instead of PDF
3. Use a non-proprietary format ★★★
e.g. CSV instead of Excel
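As a rough illustration of the third star, the sketch below writes a few catalogue-style rows to CSV using only Python's standard library. The rows, the field names, and the `to_csv` helper are hypothetical and invented for illustration; they are not taken from any real catalogue.

```python
import csv
import io

# Hypothetical catalogue rows, invented for illustration only.
records = [
    {"title": "Example Title One", "author": "Doe, Jane", "year": "2001"},
    {"title": "Example Title Two", "author": "Roe, Richard", "year": "2010"},
]

def to_csv(rows, fieldnames):
    """Serialize a list of dicts as CSV text, a non-proprietary format."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv(records, ["title", "author", "year"])
print(csv_text)

# Any CSV-aware tool can read the data back without special software.
round_trip = list(csv.DictReader(io.StringIO(csv_text)))
```

Values that contain commas, such as the author names here, are quoted automatically by the `csv` module, so the file stays machine-readable.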
2010: TPL Open Data
• First project was to submit the entire
catalogue to the Internet Archive
• 2.5 million MARC records, about 2GB
http://archive.org/details/marc_toronto_public_library
Open catalogue data
• 2/3 stars for binary MARC format ★★
• Downloaded 89 times since 2010
• U of T: 5400 times, UPEI 2900 times
• TPL is hands-off: no updates, no
license
2009-2010
OCLC record use policy
• Trying to protect their business
model by preventing sharing
• Deliberately exploited uncertainty of
legality
• Librarians argued vocally for public
domain
• Policy retracted and changed (not
defensible)
Circling the wagons
• Libraries have the power to fight
back
• Best counter-strategy is to release
the data
• Need the ability to work together
somehow
②
Linked Data
Linked Data
• Technical framework for data
interoperability
• A common language for sharing data
and relations online
• More unrealized potential due to
massive incompatibility & “siloing”
A new way of thinking
• Fundamentally differs from
conceptualization underlying data
formats of the 20th century
• From concept of "records" as
bounded sets, to an unbounded set
of "statements"
Based on a new technology
• Same principles and mechanisms as
WWW
– URIs for names, HTTP for retrieval, plus
RDF
• Still organized facts about things, but
infinitely more flexible structure
“vague, but exciting”
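The shift from bounded "records" to an unbounded set of "statements" can be sketched in plain Python, with each statement held as a (subject, predicate, object) tuple. This is only an illustration of the idea, not an RDF implementation: the example.org URIs, the literal values, and the helper functions are hypothetical, while the predicate URIs are Dublin Core Terms and FOAF properties.

```python
# A minimal sketch: data as a set of (subject, predicate, object) statements.
# The example.org URIs and literal values are hypothetical.
BOOK = "http://example.org/book/1"
PERSON = "http://example.org/person/42"
TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/terms/creator"
NAME = "http://xmlns.com/foaf/0.1/name"

# Statements published by a library about a book.
library_statements = {
    (BOOK, TITLE, "An Example Title"),
    (BOOK, CREATOR, PERSON),
}

# Statements published elsewhere about the same person URI.
other_statements = {
    (PERSON, NAME, "Jane Doe"),
}

def objects(graph, subject, predicate):
    """Return the sorted objects of every statement matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

def creator_names(graph, book):
    """Follow creator links from a book, then look up each creator's name."""
    names = []
    for person in objects(graph, book, CREATOR):
        names.extend(objects(graph, person, NAME))
    return names

# Combining sources is a set union: there are no record boundaries to reconcile.
combined = library_statements | other_statements

print(creator_names(library_statements, BOOK))  # [] (the library alone has no name)
print(creator_names(combined, BOOK))            # ['Jane Doe']
```

Because both sources use the same URI for the person, the question "who wrote this book?" only becomes answerable once the two sets of statements are combined.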
Why Linked Data?
• Breaking data out of silos by pointing to
and linking between other databases
• Formulate questions for which no answer
exists on the current WWW
• Anyone can contribute unique expertise in
a form that can be reused and recombined
“The coolest thing to do to your data will be thought of by
someone else.”
③
Open Data
Open Data
• Legal and policy framework for data
interoperability
• Clarifies the terms and purposes of
data use
• Allows for a spectrum of licensing
options
– see Creative Commons
Open Data definition
“freely usable, reusable and
redistributable, subject, at most,
to the requirements to attribute
and share-alike”
http://opendefinition.org/okd/
Database hugging
• People don't want to let go of their
data:
– until it's perfect or complete or
"finished"
– because data is raw and unpolished and
ugly
– because “we know better than everyone
else”
– something unforeseeably terrible might
happen
Misconception #1
• Open Data will destroy/compromise
quality
– Already a lot of high-quality data being
created outside of libraries
– Our MARC records aren't actually that
great
Misconception #2
• Open Data will reveal our
mistakes/problems
– everyone's data is messy, that’s its
nature
– what if someone were able to clean it up
for you?
Misconception #3
• Open Data will facilitate competition
– new and useful tools are good, even
ones that involve money
– what if someone does a better job with
our data than we do?
Misconception #4
• Open Data is a loss of control
– if you deliberately make it available, you
can set the (legal) terms of its use
– requires thinking about / dealing with
legal stuff
An increasing trend
• 2012: Canada Post Files Copyright
Lawsuit Over Crowd-sourced Postal
Code Database
http://geocoder.ca/?sued=1
1. take down the openly-licensed
database
2. pay damages on lost business
($5500/year)
New library business model
1. Sell access to library catalogue data
2. Sue every organization that makes
bibliographic data available for free
e.g. Internet Archive, Amazon, Library of
Congress
3. Profit!
Open Data vs. Linked Data
• Open Data does not have to be
Linked Data
• Linked Data does not require it to be
Open
• But the potential of both is best
realized when data is published as
Open Linked Data
Open Linked Data
[Diagram labels: Linked Data, Open Data]
Gold stars of Open Linked Data
1. Make your stuff openly available on the web ★
2. Make it available as structured data ★★
3. Use a non-proprietary format ★★★
4. Use URIs to identify your things ★★★★
5. Link to other people’s things using URIs ★★★★★
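The five stars are cumulative: each one assumes the ones before it. A minimal sketch of that scoring logic follows. The `Dataset` class, its field names, and the sample values are assumptions made for illustration, not part of any official rating tool.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are assumptions)."""
    openly_on_web: bool           # star 1: openly available on the web
    structured: bool              # star 2: structured data
    non_proprietary_format: bool  # star 3: non-proprietary format
    uses_uris: bool               # star 4: URIs identify things
    links_to_others: bool         # star 5: links to other people's things

def star_rating(dataset):
    """Count stars in order, stopping at the first criterion that is not met."""
    criteria = [
        dataset.openly_on_web,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.links_to_others,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars

# Structured and on the web, but in a format that needs special tooling.
two_star = Dataset(True, True, False, False, False)
# Meets every criterion.
five_star = Dataset(True, True, True, True, True)
# Uses URIs and links, but is not structured: later stars do not count.
skips_a_step = Dataset(True, False, True, True, True)

print(star_rating(two_star), star_rating(five_star), star_rating(skips_a_step))
```

Stopping at the first unmet criterion reflects the ordering of the list: URIs and links earn nothing extra if the data is not already structured and openly available.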
④
Libraries & The Semantic Web
2011: Library Linked Data
• W3C Library Linked Data incubator
group
• Panel of invited librarians,
academics, experts
• “to help increase global
interoperability of library data on the
Semantic Web”
• Final report produced October 2011
A struggle for relevancy
• "library" = all cultural heritage & memory
institutions (archives, museums)
• Natural extension to the collaborative sharing
models historically employed by libraries
• In a position to provide trusted metadata for
resources of long-term cultural importance
Major goals for libraries
1. Foster discussion about Open Data and
rights management issues
2. Develop library standards that are
compatible with Linked Data
3. Apply library experience in curation and
long-term preservation to Open Linked
Data
A discussion about Open Data
• Data can have unclear and untested rights
issues that hinder their release as Open Data
• Seek agreement with owners about licensing;
consider the impact of usage restrictions
• Establish institutional policies for data sharing
and licensing
Issues with library standards
• Data is expressed primarily in natural-language text
• Technology changes depend on vendor
systems development
• Data is not integrated with web resources
• Designed only for the library community
Benefits of Open Linked Data
• Will be able to use mainstream solutions
• Can give libraries a wider choice of vendors
and developers to recruit from and interact
with
• Much larger community to provide IT support
• Smaller institutions can make themselves
more visible and connected
Already going mainstream
• National libraries of Sweden, Hungary,
Germany, France, the British Library, L of C
• BNB: 2.6 million records as 85 million RDF
statements, public domain license
• Cities of Vancouver, Edmonton, Ottawa, and
Toronto have created grassroots @g4open
⑤
In Summary
Now is the time
• Missed opportunities before
• Don’t often get a second chance
• Major opportunity here for libraries to
catch up and become leaders online
Open Data Now!
• Remember the 5 stars of Open
Linked Data
1. Choose a license, keep control of the
rights
2. Release the data – just get it out
there