Open Data and CKAN Data Catalogues

Upload: david-read

Post on 16-Apr-2017

Category:

Technology


Open Data & coding data.gov.uk

David Read

Open Knowledge Foundation

Contents

The context: Linked Open Data

Our data catalogue: CKAN

data.gov.uk using CKAN

Discussion

Open Data

Data is expensive to create

But think of the mutual benefits of it being open

Accessible

Allowed to use and republish

Without restriction

Science

UEA criticised for a "culture of withholding information."

CC-BY-SA http://commons.wikimedia.org/wiki/User:ChrisO

Geographic data

Also Haiti, iPhone cycle map

Public data

Allowing with JobCentresPlus, was highlighted as an innovative use of government data

Linking data

Dr. Hans Rosling, Professor of Global Health, Karolinska Institute, Sweden (TED talk)

But linking data is even more powerful

Health and economic data

Dr. Hans Rosling, Professor of Global Health, Karolinska Institute, Sweden (TED talk)

Linking data 2

GeoNames: lat/long of place names

DBpedia: munge of Wikipedia content

e.g. Where do footballers in the premiership come from?
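A question like this is answered by joining two datasets on a shared key. The sketch below is only an illustration of that idea: the player records are invented placeholders standing in for DBpedia-derived birthplaces, the small place table stands in for a GeoNames extract (coordinates are approximate), and the function name is hypothetical.

```python
# Toy stand-in for a GeoNames extract: place name -> (latitude, longitude).
# Coordinates are approximate.
PLACES = {
    "London": (51.51, -0.13),
    "Manchester": (53.48, -2.24),
}

# Invented placeholder records standing in for DBpedia-derived birthplaces.
PLAYERS = [
    {"player": "Player A", "birthplace": "London"},
    {"player": "Player B", "birthplace": "Manchester"},
    {"player": "Player C", "birthplace": "Unknown Town"},  # no match in PLACES
]


def link_birthplaces(players, places):
    """Join each player record to coordinates via the shared place name."""
    linked = []
    for record in players:
        coords = places.get(record["birthplace"])
        if coords is None:
            # Unmatched names are skipped here; real linking would need
            # name reconciliation or shared identifiers.
            continue
        lat, lon = coords
        linked.append({**record, "lat": lat, "lon": lon})
    return linked


linked_players = link_birthplaces(PLAYERS, PLACES)
```

Matching on bare place names is fragile, which is one reason linked data prefers shared identifiers (URIs) over free-text labels.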

Linked data

Note: Google Maps here. Google have built their business on being very good not only at search, but at linking data too. The map has restaurants, travel directions, traffic, related ads. This profits them, but what about the rest of society?

Opening government data

Transparency --> effectiveness

Labour and Conservatives agree (!) with Cambridge economists:
Making government datasets public will bring a £6bn boost to the UK economy

(We have paid for it...)

Open Data and Open Software

Zero cost

Good performance

Principles: Many hands make light work / natural selection / wisdom of crowd / on shoulders of giants

Not a proprietary format

No supplier lock-in

One way to achieve what hundreds of organised and motivated Google programmers do?

Infrastructure | Software | Data
Licence | GPL | PDDL, ODbL, ODC-By (OKF 2007-); isitopendata.org (OKF 2009-)
Modules/Linking | Lib, egg | Spreadsheet, database, RDF/OWL
Human Discovery | | CKAN (OKF 2008-)
Automatic Distribution | apt-get, CPAN, easy_install | CKAN, datapkg (OKF 2008-)
Hosting | Sourceforge, PyPI, bitbucket | archive.org / knowledgeforge.net
Community | freshmeat | data.gov.uk email list closest?

Installing Linux packages: a really sophisticated system of downloading lots of modules that work together

Someone might combine a couple of datasets, may well do some cleaning, produce a graph, but doesn't give back the data.

Also: Scraperwiki

Open Knowledge Foundation

Aim: promote Open Knowledge

Founded 2004 as a 'not for profit' organisation

Strong connections with Cambridge University

A key director: Rufus Pollock

Volunteer driven

Create software tools (CKAN, KnowledgeForge), organise conferences, licenses, create visuals & mash-ups (Where Does My Money Go, Open Shakespeare), campaigns (Panton Principles)

Introducing... CKAN

Comprehensive Knowledge Archive Network ...well... a fancy Data Catalog

CKAN is a registry or catalogue system for datasets or other "knowledge" resources. CKAN aims to make it easy to find, share and reuse open content and data, especially in ways that are machine automatable.

CKAN data model

Data Package: name, title, version, url, author, licence, notes, extras

Tag: name

Resource: url, format, description, hash

Group: name, title, description

Core metadata based on the Debian package. No dependencies shown here, but we do have that too.
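The entities and fields on this slide can be sketched as plain Python dataclasses. This is an illustration only, not CKAN's actual model code: the class layout, defaults and the choice to hold tags, resources and groups as lists on the package are assumptions about how the diagram's links read. The example values come from the COINS package shown in the API slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tag:
    name: str


@dataclass
class Resource:
    url: str
    format: str = ""
    description: str = ""
    hash: str = ""


@dataclass
class Group:
    name: str
    title: str = ""
    description: str = ""


@dataclass
class Package:
    """A data package carrying the core metadata fields listed on the slide."""
    name: str
    title: str = ""
    version: Optional[str] = None
    url: Optional[str] = None
    author: Optional[str] = None
    licence: Optional[str] = None
    notes: str = ""
    extras: Dict[str, str] = field(default_factory=dict)
    # Assumed relationships: a package links to many tags, resources and groups.
    tags: List[Tag] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)
    groups: List[Group] = field(default_factory=list)


coins = Package(
    name="coins-data",
    title="COINS data",
    url="http://data.gov.uk/dataset/coins",
    author="HM Treasury (UK Government)",
)
```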

Wiki

API

REST

$ curl http://ckan.net/api/rest/package
["2000-us-census-rdf", "32000-naples-florida-businesses-kml", "aaoe-87", "acawiki", "adb-sdbs", "addgene", "adopt-a-roadside",

Search

$ curl "http://ckan.net/api/search/resource?url=.fr&all_fields=1"
{"count": 6, "results": [{"id": "819c811c-7afc-4d4f-a7f8-aca0b2a84df5", "package_id": "0ad0dbb9-e1b7-43d6-9fae-ca92a889e871", "url": "http://www.frst.govt.nz/funding/futurefunding", "format": "Spreadsheet", "description": "Future funding (FRST): Spreadsheet", "hash": "", "position": 0}, {"id": ...

$ curl http://ckan.net/api/rest/package/coins-data
{"id": "78eccf9d-d5b3-4dbd-8ada-6801cfd7e4c8", "name": "coins-data", "title": "COINS data", "version": null, "url": "http://data.gov.uk/dataset/coins", "author": "HM Treasury (UK Government)", "author_email": null, "maintainer": null, "maintainer_email": null, "notes": "### About\r\n\r\nThe UK Government's HM Trea...

Can also update via API.

Also have python, php, Drupal, Wordpress and other clients to help access API.
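The curl calls above can be mirrored in a few lines of standard-library Python. This is a minimal sketch, not one of the official clients: the helper names are hypothetical, the host and paths are copied from the slides, and whether that host still serves this API today is not something the slides confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and paths as shown on the slides; availability today is an assumption.
BASE_URL = "http://ckan.net/api"


def package_url(name=None, base_url=BASE_URL):
    """URL of the package register, or of one package when a name is given."""
    url = f"{base_url}/rest/package"
    return f"{url}/{name}" if name else url


def resource_search_url(base_url=BASE_URL, **params):
    """URL for a resource search, e.g. url=".fr", all_fields=1."""
    return f"{base_url}/search/resource?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def resource_urls(search_result):
    """Pull the resource URLs out of a decoded search response."""
    return [item["url"] for item in search_result.get("results", [])]
```

For example, `fetch_json(package_url("coins-data"))` would request the same record as the last curl call, and `resource_urls(fetch_json(resource_search_url(url=".fr", all_fields=1)))` would list the URLs from the search shown above.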

datapkg

Getting a data package

$ datapkg index-add file:///....
$ datapkg update
$ datapkg search "military spending"
military: Military Spending 1890-1914
military-norm: Military Spending 1890-1914 (normalized)
$ datapkg install military-norm
Downloading military-norm and dependencies.
$ datapkg plot military

$ datapkg create military-uk-usa table.csv Military spending UK vs USA
$ datapkg register military-uk-usa

Upload derivative data

CKAN communities

Europe: Austria, Hungary, Germany, Italy, Finland, Netherlands, France, Norway

North America: Colorado, Canada

Australasia: New Zealand

Lobbying governments, or just to collect known datasets.

Groups like ownership and personalisation of the site.

Sharing metadata

ckan.net

canada.ckan.net

it.ckan.net

no.ckan.net

data.gov.no

data.gov.it

clone/push/pull/merge/reject changes

Architecture

Drupal front-end

Pylons front-end (genshi, routes, repoze.who)

Vdm - Versioned Domain Model

Postgres

REST & Search APIs

sqlalchemy

view

controller

Formalchemy

Data import scripts

Search

Export scripts

repoze.who

model

Atom feeds

carrot, pyamqp

sqlalchemy-migrate, blinker

data.gov.uk

Gordon Brown invited Tim Berners-Lee for exciting digital plans

David Cameron supportive

Run by Cabinet Office, aided by The National Archives

Raw Data Now, then improve and link

COI team produce Drupal front-end with OKFN producing CKAN back-end

Measuring success

Stats: users, number of datasets, per department, big wins: Ordnance Survey, Coins, top public salaries

Creation of visualisations, apps, linked data, news stories, companies - £6bn

CKAN similar goals

What do you think?

Software Learnings

Pylons: flexible, organised, powerful to customise

Formalchemy: tough to get beyond basics (had to read lots of code), but really neat, flexible & powerful system

Pip, virtualenv, nose: use happily

Drupal interfacing: Drupal modules rely on internal model

CKAN futures

More metadata fields and guidance / control

INSPIRE geographic bounding boxes

Improve navigating datasets to help linking data

Improving RDF catalog

Keep goal of supporting automated linking data

Suggestions please!

Project learnings

Open source, trac, email discussion

Good for getting feedback and people involved

Slightly worrying

Easy to get flooded with requests

Easy to criticise high load on launch

Civil servants surprisingly happy to open data

Questions
