
Uploaded by augusto-herrmann-batista on 14-Jul-2015


Open source data catalog

An overview of CKAN

Augusto Herrmann

Open Knowledge Brazil

IV Moscow Urban Forum

CKAN Overview | Augusto Herrmann

Topics covered in this presentation

• Introduction

○ what is CKAN

○ who uses it

○ feature tour

• Features of CKAN

• Data publishing


• Under the hood

○ installation and maintenance

• Site administration

• Directions (where to find stuff)


Time constraints

• pick and choose topics accordingly

• I’ll be quick, but will address questions


by Moyan Brenn


First, a quick poll

• Who is familiar with:

○ the concepts of open data

○ browsing open data catalogs

○ including data in CKAN catalogs

○ installing CKAN

○ developing / theming CKAN

by sean dreilinger

What is it?


What is it?

Comprehensive

Knowledge

Archive

Network

by degreezero2000


What is it?

Open source software for open data catalogs

by Steven de Costa


GNU Affero GPL v3 licence (AGPL)

● if you offer it as software-as-a-service (SaaS), including modified versions, you must also make the source code available

https://github.com/ckan/ckan

more than 7 years old

more than 80 developers


● stores metadata, not the data itself (in principle)

● makes it easy to find data

● keeps documentation about the data at hand

by Reeding Lessons


● data must be available on the internet at a permanent URL

○ directly linkable

by Dave Winer


● data must be available on the internet at a permanent URL

○ no captcha!

by LuChOeDu


● structured data

○ no tables inside PDF or DOC files

■ common offenders: statistical bulletins, official press

○ no tables as images

by Petras Gagilas


● open formats

○ common formats: CSV, JSON, XML, RDF

● open licences

○ “Open data and content can be freely used, modified, and shared by anyone for any purpose” - opendefinition.org

○ examples: CC BY 4.0, CC BY-SA 4.0, ODbL, OGL

by Jonathan Grey


Who makes it?

● Open Knowledge
http://okfn.org
http://br.okfn.org

● Community of developers
http://github.com/ckan/ckan

● Governance: CKAN Association
http://ckan.org/about/association


Who uses it?


Who uses it?

● national governments

● local and regional governments

● parliaments

● civil society (e.g. community instances)

● research institutions (open research data)

more at: http://ckan.org/instances


Who uses it?

National governments


data.gov.uk


United Kingdom

Source code: https://github.com/datagovuk


data.gov


USA


dados.gov.br


Brazil

Source code: http://dev.dados.gov.br/codigo/dev/tema-ckan


and many other countries


● Argentina

● Australia

● Austria

● Canada

● Germany

● Iceland

● Ireland

● Italy

● Japan

● Mexico

● Netherlands

● Norway

● Romania

● Slovakia

● Sweden

● Switzerland

● Uruguay

Riley Kaminer

Who uses it?

City governments


dados.recife.pe.gov.br


Recife, PE, Brazil

Source code: http://dados.recife.pe.gov.br/source/ckan_dados_recife_20140828.zip


data.rio.rj.gov.br


Rio de Janeiro, RJ, Brazil


datapoa.com.br


Porto Alegre, RS, Brazil


data.buenosaires.gob.ar


Buenos Aires, Argentina


opendata.caceres.es


Cáceres, Spain


data.kk.dk


Copenhagen, Denmark

Who uses it?

Community instances


datahub.io


Open Knowledge


hubofdata.ru


OpenGovData.ru


Internationalization (i18n)

● available in 53 languages

● languages that are 99% or more complete in version 2.2:
○ Bulgarian
○ Catalan
○ Czech
○ Dutch
○ French
○ Finnish
○ German
○ Italian
○ Japanese
○ Norwegian
○ Portuguese (Brazil)
○ Spanish
○ Swedish


by Eric Andresen


Russian localization

● 92% completed for version 2.2

● translation of version 2.3 will soon begin

● join the localization team:
○ collaborative translation platform: Transifex
○ https://www.transifex.com/projects/p/ckan/language/ru/


Features

by Jereme Rauckman


Catalog and search data

● catalog data through the web interface, using the API or with harvesting tools

● search all metadata fields

● faceted search

○ by organization, tag, format, license

● data is organized into “datasets” and “resources”
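As a sketch of how faceted search maps onto the API, the Python snippet below builds a `package_search` URL for the CKAN 2.x action API (`/api/3/action/`). The site URL and the organization name are hypothetical placeholders; the facet fields (`organization`, `tags`, `res_format`, `license_id`) are among those CKAN's default search page offers, and the filter uses Solr's `field:"value"` syntax in the `fq` parameter.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical site; CKAN 2.x exposes the action API under /api/3/action/.
BASE_URL = "https://ckan.example.org"


def package_search_url(base_url, query="", filters=None, facets=None, rows=10):
    """Build a package_search URL that combines free text with facet filters.

    filters: facet field -> value, turned into a Solr filter query (fq),
             e.g. {"tags": "transport"} becomes 'tags:"transport"'.
    facets:  fields whose value counts should be returned with the results.
    """
    params = {"q": query, "rows": rows}
    if filters:
        params["fq"] = " ".join(f'{field}:"{value}"' for field, value in filters.items())
    if facets:
        # facet.field is passed as a JSON-encoded list of field names.
        params["facet.field"] = json.dumps(facets)
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


# "city-transport" is a hypothetical organization name.
url = package_search_url(
    BASE_URL,
    query="bus stops",
    filters={"organization": "city-transport", "res_format": "CSV"},
    facets=["organization", "tags", "res_format", "license_id"],
)
print(url)
print(parse_qs(urlparse(url).query))  # the decoded parameters
```

Opening such a URL returns the matching datasets together with value counts for each requested facet, which is what the facet sidebar in the web interface displays.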


Find related data

● related or similar resources are registered in the same dataset (e.g. the same data in a different format, or the same data for different time periods, etc.)


Find relevant metadata

● title

● description

● unique identifier

● author and maintainer

● license

● website or source page for the data

● groups, tags, organizations

● format (for the resource)

● other (including custom ones)


Preview data

● preview a sample of the resource as a table, chart, map, etc.

● interactive: e.g. tables are sortable by column, chart axes can be set to any column, etc.

● uses the recline.js data visualization library


Handle geospatial data

● through the ckanext-spatial extension

● visualize geo data on a map (e.g. contours of plazas and parks)

● search for data inside a bounding box selected by the user as part of a search query
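A minimal Python sketch of a bounding-box search URL, assuming a hypothetical CKAN site with ckanext-spatial's spatial search enabled. The extension's search widget passes the box as an `ext_bbox` parameter in the form `minx,miny,maxx,maxy` (WGS84 longitude/latitude); the coordinates below are only a rough, illustrative box around Rio de Janeiro.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def bbox_search_url(base_url, min_lon, min_lat, max_lon, max_lat, query=""):
    """Build a dataset search URL restricted to a bounding box.

    Assumes ckanext-spatial is installed and reads the box from ext_bbox
    as "minx,miny,maxx,maxy" (WGS84 longitude/latitude).
    """
    if not (-180 <= min_lon < max_lon <= 180 and -90 <= min_lat < max_lat <= 90):
        raise ValueError("bounding box must have min < max within WGS84 limits")
    bbox = ",".join(str(v) for v in (min_lon, min_lat, max_lon, max_lat))
    return f"{base_url}/dataset?{urlencode({'q': query, 'ext_bbox': bbox})}"


# Approximate box around Rio de Janeiro, for illustration only.
url = bbox_search_url("https://ckan.example.org", -43.8, -23.1, -43.1, -22.7, query="parks")
print(url)
print(parse_qs(urlparse(url).query))
```

The range check rejects swapped corners early, which is a common mistake when copying coordinates by hand.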


See a dataset’s change history

● track changes to a dataset

● see who did what and when


Organize datasets by organization

● each organization can manage its own data in the catalog and authorize which users can edit it

● each organization gets its own page in the catalog, giving visibility to the data it publishes

● organization is also available as a search facet


Organize datasets into groups

● another way to link related datasets

● useful for thematic classification

● also available as a search facet


Organize datasets with tags

● free-form tags defined by users (editors)

● also usable for searching


Custom themes

● simple customization (colors, layout of the main page, portal title, etc.) can be done by the site administrator through the user interface

● for deeper customization, use the extension programming interface (Python) and develop custom templates (Jinja2)


Extensible

● programming interface for creating extensions

● extension repository: extensions.ckan.org

● many extensions available, with varying degrees of maturity

James Petts


FileStore and DataStore

● built-in extensions

● FileStore: allows uploading files and storing them in CKAN, instead of just linking to a URL

● DataStore: allows querying data through an API, even “joining” data from different resources

○ also comes with the DataPusher service, which loads each newly registered file into the DataStore
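To make the DataStore "joining" idea concrete, here is a hedged Python sketch. In the DataStore, each resource's data lives in a PostgreSQL table named after the resource id, and the `datastore_search_sql` action (where the site enables it) accepts a read-only SQL `SELECT`. The site URL, the resource ids and the column names below are hypothetical placeholders; real resource ids are UUIDs.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "https://ckan.example.org"  # hypothetical site with DataStore enabled


def datastore_search_url(base_url, resource_id, q=None, limit=5):
    """Simple query against a single resource's DataStore table."""
    params = {"resource_id": resource_id, "limit": limit}
    if q:
        params["q"] = q  # full-text search across the table's fields
    return f"{base_url}/api/3/action/datastore_search?{urlencode(params)}"


def join_sql(left_id, right_id, key, columns):
    """SQL joining two resources on a shared column.

    Table names are resource ids, so they must be double-quoted in SQL.
    """
    cols = ", ".join(columns)
    return f'SELECT {cols} FROM "{left_id}" a JOIN "{right_id}" b ON a."{key}" = b."{key}"'


def datastore_sql_url(base_url, sql):
    return f"{base_url}/api/3/action/datastore_search_sql?{urlencode({'sql': sql})}"


# Hypothetical resource ids and columns, for illustration only.
sql = join_sql("res-population", "res-schools", "district",
               ['a."district"', 'a."population"', 'b."schools"'])
print(sql)
print(datastore_sql_url(BASE_URL, sql))
print(datastore_search_url(BASE_URL, "res-population", q="Centro"))
```

Listing the columns explicitly avoids duplicate names in the result, because every DataStore table also carries its own `_id` column.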


DRs Kulturarvsprojekt


Harvesting

● metadata can be harvested from another portal using the ckanext-harvest extension

● after a (configurable) interval, data newly catalogued or modified in the source portal shows up in the harvesting portal
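A harvesting cycle can be pictured as "compare, then create or update". The Python sketch below is not ckanext-harvest's code, whose harvesters run separate gather, fetch and import stages. It only illustrates the idea, assuming each portal is summarized as a mapping from dataset name to its `metadata_modified` timestamp, a field CKAN keeps for every dataset. Deletions are left out for brevity, and the snapshots are hypothetical.

```python
from datetime import datetime


def plan_harvest(remote, local):
    """Return (to_create, to_update) dataset names for one harvest run.

    remote, local: dicts mapping dataset name -> metadata_modified (ISO 8601).
    """
    to_create, to_update = [], []
    for name, modified in remote.items():
        if name not in local:
            to_create.append(name)  # newly catalogued in the source portal
        elif datetime.fromisoformat(modified) > datetime.fromisoformat(local[name]):
            to_update.append(name)  # modified in the source since the last harvest
    return sorted(to_create), sorted(to_update)


# Hypothetical snapshots of the source portal and the harvesting portal.
remote = {
    "bus-stops": "2015-07-10T08:00:00",
    "parks": "2015-07-14T09:30:00.123456",
    "schools": "2015-07-01T00:00:00",
}
local = {
    "parks": "2015-07-12T10:00:00",
    "schools": "2015-07-01T00:00:00",
}
print(plan_harvest(remote, local))  # (['bus-stops'], ['parks'])
```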


by Martin Pettitt


Feedback

● there are extensions that let users comment on a specific dataset

● stimulates discussion about, and improvement of, the data


Access by API

● uses HTTP requests (pseudo-RESTful)

● consumes and returns metadata in JSON format

● any operation you can do through the UI (e.g. searching) can also be done programmatically

● by passing an API key with your requests, you can overcome access throttling limitations and perform the same read and write operations your user is allowed to do via the UI

● useful for processing and cataloguing data in large volumes (e.g. applying a fix to many datasets in a batch, adding many similar resources to a dataset, etc.)
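A minimal sketch of API access from Python using only the standard library. It assumes the CKAN 2.x action API at `/api/3/action/`, which answers with a JSON envelope containing `success` and `result`, and accepts the user's API key in the `Authorization` header. The site URL, dataset name and key are placeholders. The hypothetical `relicense` helper shows the kind of batch fix mentioned above, using the `package_show` and `package_update` actions.

```python
import json
import urllib.request


def action_request(base_url, action, payload=None, api_key=None):
    """Prepare (without sending) a call to a CKAN action API endpoint.

    With a payload the request becomes a JSON POST; without one it is a GET.
    """
    headers = {}
    if api_key:
        headers["Authorization"] = api_key  # identifies the user; needed for writes
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{base_url}/api/3/action/{action}", data=data, headers=headers)


def call(request):
    """Send the request and unwrap CKAN's {"success": ..., "result": ...} envelope."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]


def relicense(base_url, api_key, names, license_id):
    """Hypothetical batch fix: set the same licence on several datasets."""
    for name in names:
        dataset = call(action_request(base_url, "package_show", {"id": name}, api_key))
        dataset["license_id"] = license_id
        call(action_request(base_url, "package_update", dataset, api_key))


# Placeholders: hypothetical site, dataset and API key.
req = action_request("https://ckan.example.org", "package_show",
                     {"id": "bus-stops"}, api_key="my-api-key")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the requests easy to inspect before running a batch against a live catalog.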


by Andrea Vallejos

Cataloguing data on CKAN


Datasets and resources

● resources can be data files, API entry points, query examples, extended data documentation, etc.

● a resource has exactly one format and URL

● datasets can have one or more resources

● as a general guideline, the following can be catalogued under the same dataset:

○ resources that are representations of the same data in various formats

○ resources that are about the same data but in different time periods

○ resources that are about the same data but in different regional spans


Datasets and resources

● a dataset has

○ a single source (URL for a source page of the data)

○ a single license

○ a single author

○ a single maintainer

○ a single organization (or none)

○ a set of groups that applies to the whole dataset

○ a set of tags that applies to the whole dataset
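The dataset and resource rules above can be summarized as a small data model. This Python sketch uses dataclasses with field names chosen for readability; it is not CKAN's internal schema, which names some fields differently (for example, the source page is stored as `url`). The example dataset, URLs and names are hypothetical and follow the guideline of keeping the same data in several formats and years under one dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    url: str      # exactly one URL per resource ...
    format: str   # ... and exactly one format
    name: str = ""


@dataclass
class Dataset:
    # Single-valued properties that apply to the whole dataset.
    name: str
    source_url: str
    license_id: str
    author: str
    maintainer: str
    organization: Optional[str] = None  # a single organization, or none
    # Sets that apply to the whole dataset.
    groups: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)

    def formats(self):
        """Distinct formats available across the dataset's resources."""
        return sorted({r.format.upper() for r in self.resources})


# Hypothetical example: the same data in two formats and for two years.
bus_stops = Dataset(
    name="bus-stops",
    source_url="http://example.org/transport/bus-stops",
    license_id="odc-odbl",
    author="Transport Department",
    maintainer="IT Department",
    organization="city-transport",
    tags=["transport"],
    resources=[
        Resource("http://example.org/bus-stops-2015.csv", "csv", "Bus stops 2015"),
        Resource("http://example.org/bus-stops-2015.json", "json", "Bus stops 2015"),
        Resource("http://example.org/bus-stops-2014.csv", "CSV", "Bus stops 2014"),
    ],
)
print(bus_stops.formats())  # ['CSV', 'JSON']
```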


Organizations

● only organization editors (or admins) can create datasets in it

● users can create datasets in any organization for which they are editors

● organization admins can invite existing or new users to the organization and assign them a role (member, editor or administrator)
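The organization roles can be sketched as a permission table. This is a simplified model, not CKAN's actual authorization code: members can read the organization's private datasets, editors can also create and edit datasets, and admins can also manage members. The payload helper mirrors the parameters of CKAN's `organization_member_create` action (`id`, `username`, `role`); the organization and user names are hypothetical.

```python
# Simplified capabilities per organization role (a sketch, not CKAN's auth functions).
ROLE_CAPABILITIES = {
    "member": {"read_private"},
    "editor": {"read_private", "create_dataset", "edit_dataset"},
    "admin": {"read_private", "create_dataset", "edit_dataset", "manage_members"},
}


def can(role, action):
    """True if a user with this organization role may perform the action."""
    return action in ROLE_CAPABILITIES.get(role, set())


def member_payload(organization, username, role):
    """Body for an organization_member_create call, validating the role first."""
    if role not in ROLE_CAPABILITIES:
        raise ValueError(f"unknown role: {role!r}")
    return {"id": organization, "username": username, "role": role}


print(can("editor", "create_dataset"), can("member", "create_dataset"))  # True False
print(member_payload("city-transport", "maria", "editor"))
```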


Creating a new dataset

● Click “add a new dataset”

○ on the dataset search screen; or

○ on the organization screen, for an organization in which you are an editor or admin


Creating a new dataset

● CKAN will ask for the following basic metadata:

○ title

○ description

○ tags

○ license

○ organization (if you’re an editor in more than one organization)

● when finished, click “Next: add data” to include resources


Including resources

● select “link to file”, “link to an API” or “upload a file” (if FileStore is enabled)

● type in the name, description and format

● if you have other resources to include, select “save & add another”

● after including all resources, click “next: additional info”


Additional dataset information

● “visibility”: “public” means the dataset can be seen by any site visitor; “private” means it is visible to members of the organization only

● “author” / “author e-mail”: person or organization responsible for producing the data

● “maintainer” / “maintainer e-mail”: person or organization technically responsible for keeping the data available

● optional custom fields

● press “finish” to create the dataset
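For comparison with the web form, this Python sketch maps the fields collected in the steps above onto a `package_create` API payload. The keys used (`name`, `title`, `notes` for the description, `tags` as a list of `{"name": ...}`, `license_id`, `owner_org`, `private`, `author`, `maintainer`, `resources`, and `extras` for free-form custom fields) are standard CKAN dataset fields. The form dictionary, its keys and its values are hypothetical.

```python
import json


def package_create_payload(form):
    """Translate web-form style fields into a package_create request body."""
    payload = {
        "name": form["name"],                    # unique identifier, used in the URL
        "title": form["title"],
        "notes": form.get("description", ""),    # CKAN stores the description as "notes"
        "tags": [{"name": tag} for tag in form.get("tags", [])],
        "license_id": form["license_id"],
        "owner_org": form["organization"],
        "private": form.get("visibility", "public") == "private",
        "resources": form.get("resources", []),
    }
    for key in ("author", "author_email", "maintainer", "maintainer_email"):
        if form.get(key):
            payload[key] = form[key]
    # Optional custom fields become key/value "extras".
    payload["extras"] = [
        {"key": key, "value": value} for key, value in form.get("custom_fields", {}).items()
    ]
    return payload


# Hypothetical form contents.
payload = package_create_payload({
    "name": "bus-stops",
    "title": "Bus stops",
    "description": "Location of all bus stops in the city.",
    "tags": ["transport", "buses"],
    "license_id": "odc-odbl",
    "organization": "city-transport",
    "visibility": "private",
    "maintainer": "IT Department",
    "custom_fields": {"update_frequency": "monthly"},
    "resources": [{"url": "http://example.org/bus-stops.csv", "format": "CSV", "name": "Bus stops"}],
})
print(json.dumps(payload, indent=2))
```

Sending this body with an editor's API key to `/api/3/action/package_create` should have the same effect as completing the three form steps. A private dataset always needs an organization, which is why `owner_org` is filled from the form.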


Under the hood

BiblioArchives / LibraryArchives


System Architecture

• Usually sits alongside a CMS (e.g. Drupal or WordPress)

• WSGI application that can be plugged into Apache (mod_wsgi), nginx, etc.

• PostgreSQL database (metadata, access control, etc.)

• Apache Solr (for indexing and searching)

• Other components (depending on the installed and in-use extensions)
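A "WSGI application", as listed in the architecture above, is a Python callable that a server such as Apache with mod_wsgi invokes once per request. CKAN's real application is far larger; this minimal Python sketch only shows the interface, and calls the application directly the way a server would.

```python
def application(environ, start_response):
    """Minimal WSGI callable: receives the request environ, returns the body."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of byte strings


# Calling it directly, as a WSGI server would (no server needed).
def fake_start_response(status, headers):
    print(status, headers)


print(b"".join(application({"PATH_INFO": "/dataset"}, fake_start_response)))
```

By default mod_wsgi looks for a callable named `application` in the script that a `WSGIScriptAlias` directive points to, which is why the function carries that name.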


Installing CKAN

• Supported operating system: Ubuntu (12.04, 64-bit)

• Other possible operating systems:

○ Debian

○ CentOS

○ Red Hat

○ Windows (version 1.8 of CKAN) http://www.hackneyworkshop.com/2012/03/30/ckan-on-windows/

○ OS X


Installing CKAN

• Types of installation

○ Ubuntu 12.04 64-bit server package

○ source code

○ using Docker


Package install

● Requirements: Ubuntu 12.04 64-bit server

● installs CKAN and DataPusher (for DataStore)

● Steps:
1. Install the CKAN package and its dependencies
2. Install PostgreSQL and Solr
3. Restart Apache and Nginx

sudo apt-get update

sudo apt-get install -y nginx apache2 libapache2-mod-wsgi libpq5

wget http://packaging.ckan.org/python-ckan_2.2_amd64.deb

sudo dpkg -i python-ckan_2.2_amd64.deb

sudo apt-get install -y postgresql solr-jetty

sudo service apache2 restart

sudo service nginx restart


Source code install

● the sequence of commands depends on the operating system

○ detailed instructions for each are available in:

https://github.com/ckan/ckan/wiki/How-to-Install-CKAN

1. install dependency packages
2. install CKAN packages into a Python virtualenv
3. configure the Postgres database
4. create a CKAN configuration file (production.ini)
5. configure Solr
6. create database tables
7. configure DataStore (optional)
8. link to who.ini (Repoze.who configuration file)


Docker install

● Requirement: have Docker installed and configured

● set of 3 commands

● Docker downloads images automatically (can take a long time)

$ docker run -d --name db ckan/postgresql

$ docker run -d --name solr ckan/solr

$ docker run -d -p 80:80 --link db:db --link solr:solr ckan/ckan


Initial configuration

• Create a site administrator user

paster sysadmin add seanh -c /etc/ckan/default/production.ini

• Create other users if necessary

• Edit production.ini (for instance to configure the site name)

ckan.site_title = Open data portal


Other maintenance commands

• Rebuild search index

paster --plugin=ckan search-index rebuild --config=/etc/ckan/std/std.ini

• Create and remove users

paster --plugin=ckan user add exampleuser --config=/etc/ckan/std/std.ini

paster --plugin=ckan user remove exampleuser --config=/etc/ckan/std/std.ini


CKAN site administration


Simple customization

http://<my-ckan-url>/ckan-admin/config/

● some simple customizations can be made by the site administrator through the UI:

○ site title and description

○ color scheme

○ intro text, about text and others

○ custom css


User registration

● by default, user self-registration is enabled

● to disable it (e.g. to avoid spam), change a flag in the .ini file

ckan.auth.create_user_via_web = False


Registering new groups and organizations

● by default, creating new organizations is enabled for all editors

● to disable this, change a flag in the .ini file

ckan.auth.user_create_organizations = False

● likewise for groups, using the ckan.auth.user_create_groups flag

● note: the site administrator can always create groups and organizations, regardless of these flags
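To check how a site is configured, this Python sketch reads the registration and creation flags from the `[app:main]` section of a CKAN `.ini` file with the standard `configparser` module. The fallback values assume the defaults documented for CKAN 2.x at the time (all three flags enabled); verify them against your version's configuration docs. The example file content is hypothetical.

```python
import configparser

# Flags discussed on these slides, with their assumed CKAN 2.x default values.
AUTH_DEFAULTS = {
    "ckan.auth.create_user_via_web": True,
    "ckan.auth.user_create_organizations": True,
    "ckan.auth.user_create_groups": True,
}

EXAMPLE_INI = """
[DEFAULT]
debug = false

[app:main]
use = egg:ckan
ckan.site_title = Open data portal
ckan.auth.create_user_via_web = False
ckan.auth.user_create_organizations = False
"""


def read_auth_flags(ini_text):
    """Return the effective value of each flag, falling back to the defaults."""
    # interpolation=None: CKAN .ini files contain %(here)s-style values
    # that we do not need to expand here.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("app:main"):
        return dict(AUTH_DEFAULTS)
    section = parser["app:main"]
    return {key: section.getboolean(key, fallback=default)
            for key, default in AUTH_DEFAULTS.items()}


for flag, enabled in read_auth_flags(EXAMPLE_INI).items():
    print(flag, "=", enabled)
```

With the example file, self-registration and organization creation come out disabled, while group creation keeps its default.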


Manage users

● look for the user in http://<my-ckan-url>/user/

● when logged in as admin, you see a “manage” button on the user profile

● the admin can edit the profile, change the password or delete the user


Directions

by Nick Page


Documentation

http://docs.ckan.org

There are specific manuals for specific audiences:

● End user (editor)

● Site administrator

● Maintainer


Documentation

There are also manuals for specific subjects:

● API guide

● Extending guide

● Theming guide

● Contributing guide

by John Haslam


Where to get help

On mailing lists:

● CKAN Global User Group: https://groups.google.com/forum/#!forum/ckan-global-user-group

● ckan-dev: https://lists.okfn.org/mailman/listinfo/ckan-dev

by Upupa4me


Where to get help

On IRC chat:

server: irc.freenode.net

channel: #ckan

by Garry Knight


Where to get help

Paid support:

● hosting with an SLA

● deployment and maintenance

● support, consultancy, training

by glasseyes view


Where to try CKAN

demo.ckan.org

● free for experimentation, cataloguing data and getting to know CKAN

● content is periodically wiped out

by Horia Varlan


Where to register datasets

datahub.io

● community instance

● as an individual, if you don’t have your own CKAN instance, this is an option

● e.g. data that has been cleaned up as a result of a hackathon


Questions?

thank you

спасибо

[email protected]

[email protected]