Open Data and CKAN data catalogues
Open Data & coding data.gov.uk
David Read
Open Knowledge Foundation
Contents
The context: Linked Open Data
Our data catalogue: CKAN
data.gov.uk using CKAN
Discussion
Open Data
Data is expensive to create
But think of the mutual benefits of it being open
Accessible
Allowed to use and republish
Without restriction
Science
UEA criticised for a "culture of withholding information."
CC-BY-SA http://commons.wikimedia.org/wiki/User:ChrisO
Geographic data
Also Haiti, iPhone cycle map
Public data
Allowing with JobCentresPlus, was highlighted as an innovative use of government data
Linking data
Dr. Hans Rosling, Professor of Global Health, Karolinska Institute, Sweden (TED talk)
But linking data is even more powerful
Health and economic data
Linking data 2
Geonames: lat/long of place names
DBpedia: munge of Wikipedia content
e.g. Where do footballers in the premiership come from?
Linked data
Note: Google Maps here. Google have built their business on being very good not only at search, but at linking data too. The map has restaurants, travel directions, traffic, related ads. This profits them, but what about the rest of society?
Opening government data
Transparency --> effectiveness
Labour and Conservatives agree (!)
with Cambridge economists:
Making government datasets public will bring a £6bn boost to the UK economy
(We have paid for it...)
Open Data and
Open Software
Zero cost
Good performance
Principles: Many hands make light work / natural selection / wisdom of crowds / standing on the shoulders of giants
Not a proprietary format
No supplier lock-in
One way to achieve what hundreds of organised and motivated Google programmers do?
Infrastructure | Software | Data
Licence | GPL | PDDL, ODbL, ODC-By (OKF 2007-); isitopendata.org (OKF 2009-)
Modules/Linking | Lib, egg | Spreadsheet, database, RDF/OWL
Human Discovery | | CKAN (OKF 2008-)
Automatic Distribution | apt-get, CPAN, easy_install | CKAN datapkg (OKF 2008-)
Hosting | Sourceforge, PyPI, bitbucket | archive.org / knowledgeforge.net
Community | freshmeat | data.gov.uk email list closest?
Installing Linux packages: a really sophisticated system that downloads lots of modules which then work together.
Someone might combine a couple of datasets, may well do some cleaning, produce a graph, but doesn't give back the data.
Also: Scraperwiki
Open Knowledge Foundation
Aim: promote Open Knowledge
Founded 2004 as a 'not for profit' organisation
Strong connections with Cambridge University
A key director: Rufus Pollock
Volunteer driven
Create software tools (CKAN, KnowledgeForge), organise conferences, write licences, create visuals & mash-ups (Where Does My Money Go, Open Shakespeare), run campaigns (Panton Principles)
Introducing... CKAN
Comprehensive Knowledge Archive Network ...well... a fancy Data Catalog
CKAN is a registry or catalogue system for datasets or other "knowledge" resources. CKAN aims to make it easy to find, share and reuse open content and data, especially in ways that are machine automatable.
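CKAN data model
Data Package: name, title, version, url, author, licence, notes, extras
Tag: name
Resource: url, format, description, hash
Group: name, title, description
(In the diagram, * marks the "many" end of a relationship between these entities.)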
Core metadata is based on the Debian package format. No dependencies are shown here, but we do have those too.
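The entities on this slide can be sketched as plain Python dataclasses. This is only an illustration of the field lists shown above, not CKAN's actual model code: the way tags, resources and groups are attached to a package (simple lists) is an assumption made for the sketch, and the group name "example-group" is hypothetical. The example values come from the COINS record shown later in the API slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Resource:
    # Fields as listed on the slide.
    url: str
    format: str = ""
    description: str = ""
    hash: str = ""


@dataclass
class Package:
    # Core metadata fields as listed on the slide.
    name: str
    title: str = ""
    version: Optional[str] = None
    url: str = ""
    author: str = ""
    licence: str = ""
    notes: str = ""
    extras: Dict[str, str] = field(default_factory=dict)
    # Assumption for this sketch: a Tag only has a name, so store names.
    tags: List[str] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Group:
    name: str
    title: str = ""
    description: str = ""
    # Assumption for this sketch: a group refers to packages by name.
    packages: List[str] = field(default_factory=list)


coins = Package(
    name="coins-data",
    title="COINS data",
    url="http://data.gov.uk/dataset/coins",
    author="HM Treasury (UK Government)",
)
group = Group(name="example-group")  # hypothetical group
group.packages.append(coins.name)
```

Each instance gets its own empty lists and dict through `default_factory`, so adding a tag to one package does not leak into another.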
Wiki
API
REST
$ curl http://ckan.net/api/rest/package
["2000-us-census-rdf", "32000-naples-florida-businesses-kml", "aaoe-87", "acawiki", "adb-sdbs", "addgene", "adopt-a-roadside",
Search
$ curl "http://ckan.net/api/search/resource?url=.fr&all_fields=1"
{"count": 6, "results": [{"id": "819c811c-7afc-4d4f-a7f8-aca0b2a84df5", "package_id": "0ad0dbb9-e1b7-43d6-9fae-ca92a889e871", "url": "http://www.frst.govt.nz/funding/futurefunding", "format": "Spreadsheet", "description": "Future funding (FRST): Spreadsheet", "hash": "", "position": 0}, {"id": ...
$ curl http://ckan.net/api/rest/package/coins-data
{"id": "78eccf9d-d5b3-4dbd-8ada-6801cfd7e4c8", "name": "coins-data", "title": "COINS data", "version": null, "url": "http://data.gov.uk/dataset/coins", "author": "HM Treasury (UK Government)", "author_email": null, "maintainer": null, "maintainer_email": null, "notes": "### About\r\n\r\nThe UK Government's HM Trea...
Can also update via API.
We also have Python, PHP, Drupal, WordPress and other clients to help access the API.
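The curl calls above can be reproduced from Python with only the standard library. The following is a minimal sketch, not one of the official clients mentioned on the slide: the helper names are made up for illustration, and the base URL is the one used in the slides, which may no longer be served. Nothing is fetched until `fetch_json` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as used in the slides; assumed here, and may no longer be live.
BASE_URL = "http://ckan.net/api"


def package_list_url(base=BASE_URL):
    """URL that lists all package names (first curl example)."""
    return f"{base}/rest/package"


def package_url(name, base=BASE_URL):
    """URL of one package record (third curl example)."""
    return f"{base}/rest/package/{name}"


def resource_search_url(base=BASE_URL, **params):
    """URL for a resource search (second curl example)."""
    return f"{base}/search/resource?{urlencode(params)}"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def summarise_package(record):
    """Pick out a few core metadata fields from a package record."""
    return {key: record.get(key) for key in ("name", "title", "author", "url")}
```

For example, `summarise_package(fetch_json(package_url("coins-data")))` would return the name, title, author and url of the COINS record, provided the server still answers at that address.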
datapkg
Getting a data package
$ datapkg index-add file:///....
$ datapkg update
$ datapkg search "military spending"
military: Military Spending 1890-1914
military-norm: Military Spending 1890-1914 (normalized)
$ datapkg install military-norm
Downloading military-norm and dependencies.
$ datapkg plot military
$ datapkg create military-uk-usa table.csv Military spending UK vs USA
$ datapkg register military-uk-usa
Upload derivative data
CKAN communities
Europe: Austria, Hungary, Germany, Italy, Finland, Netherlands, France, Norway
North America: Colorado, Canada
Australasia: New Zealand
Lobbying governments, or just to collect known datasets.
Groups like ownership and personalisation of the site.
Sharing metadata
ckan.net
canada.ckan.net
it.ckan.net
no.ckan.net
data.gov.no
data.gov.it
clone/push/pull/merge/reject changes
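One way to picture pull/merge/reject between two catalogue instances is a field-by-field comparison of the same package record on each side. The sketch below is purely illustrative and is not how CKAN implements metadata sharing: the function names and the `accept` callback are hypothetical, and records are assumed to be plain dictionaries like the JSON returned by the API.

```python
def diff_records(local, remote):
    """Return {field: (local_value, remote_value)} for fields that differ."""
    fields = sorted(set(local) | set(remote))
    return {
        f: (local.get(f), remote.get(f))
        for f in fields
        if local.get(f) != remote.get(f)
    }


def merge_records(local, remote, accept):
    """Pull remote changes into a copy of the local record.

    accept(field, local_value, remote_value) decides per field whether the
    remote change is merged (True) or rejected (False).
    """
    merged = dict(local)
    for f, (old, new) in diff_records(local, remote).items():
        if not accept(f, old, new):
            continue  # rejected: keep the local value
        if f in remote:
            merged[f] = remote[f]
        else:
            merged.pop(f, None)  # field was removed on the remote side
    return merged


# Illustrative records; the shortened local title is made up for the example.
local = {"name": "coins-data", "title": "COINS"}
remote = {
    "name": "coins-data",
    "title": "COINS data",
    "author": "HM Treasury (UK Government)",
}
# Accept only title changes, reject everything else.
merged = merge_records(local, remote, lambda f, old, new: f == "title")
```

A real implementation would also need revision history to tell which side changed a field, which this sketch leaves out.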
Architecture
Drupal front-end
Pylons front-end (genshi, routes, repoze.who)
Vdm - Versioned Domain Model
Postgres
REST & Search APIs
sqlalchemy
view
controller
Formalchemy
Data import scripts
Search
Export scripts
repoze.who
model
Atom feeds
carrot
pyamqp
sqlalchemy-migrate
blinker
data.gov.uk
Gordon Brown invited Tim Berners-Lee for exciting digital plans
David Cameron supportive
Run by Cabinet Office, aided by The National Archives
Raw Data Now, then improve and link
COI team produce Drupal front-end with OKFN producing CKAN back-end
Measuring success
Stats: users, number of datasets, per department, big wins: Ordnance Survey, COINS, top public salaries
Creation of visualisations, apps, linked data, news stories, companies - £6bn
CKAN similar goals
What do you think?
Software Learnings
Pylons: flexible, organised, powerful to customise
Formalchemy: tough to get beyond basics (had to read lots of code), but really neat, flexible & powerful system
Pip, virtualenv, nose: use happily
Drupal interfacing: Drupal modules rely on internal model
CKAN futures
More metadata fields and guidance / control
INSPIRE geographic bounding boxes
Improve navigating datasets to help linking data
Improving RDF catalog
Keep goal of supporting automated linking data
Suggestions please!
Project learnings
Open source, trac, email discussion
Good for getting feedback and people involved
Slightly worrying
Easy to get flooded with requests
Easy to criticise high load on launch
Civil servants surprisingly happy to open data
Questions