CKAN as an open-source data management solution for open data

CKAN: an open-source data management solution for open data. Ivan Ermilov

Upload: aims-agricultural-information-management-standards

Post on 09-Aug-2015


TRANSCRIPT

CKAN: an open-source data management solution for open data

Ivan Ermilov

AKSW Research Group

http://aksw.org

My experience with CKAN

● PublicData.eu portal
  o Crowd-sourcing CSV2RDF mappings
● LODStats
  o Version 1: crawling datahub.io (CKAN)
  o Version 2: CKAN aggregator for data.gov, publicdata.eu and datahub.io
  o Version 2: Crawled all three portals and published the data on datahub.io

CKAN IS NOT a file storage!

Why CKAN?

● An open source platform
  o Relatively easy to deploy
  o Provides a rich set of features for free
● Data management
● Community involvement

Who uses CKAN?

● All major open governments
  o Canada (open.canada.ca): 244,238 datasets
  o The U.S. (data.gov): 131,348 datasets
  o Europe (publicdata.eu): 47,863 datasets
● And some other communities:
  o Semantic Web community (datahub.io): 9,509 datasets

CKAN architecture

CKAN Pros/Cons

● Pros
  o Organizes your data in a structured way
  o Has an extension to support DCAT (only for datasets)
  o Provides an API to digest your data
● Cons
  o The data model does not work for all use cases (e.g. DBpedia)
  o No strict guidelines for dataset publishing

CKAN functionality

● Publishing metadata
● Exposing metadata (API/front-end)
● Access control for users/organizations
● Additional functionality via plugins

CKAN extensions/plugins

● Data preview and visualization
● CKAN + DCAT
● Extension that adds the Disqus commenting system to CKAN
● Simple API dataset hits counter

Full list is available at: http://extensions.ckan.org/

CKAN deployment

● From source
● OS package (e.g. as a Debian package)
● Docker image

Official guide: http://docs.ckan.org/en/latest/maintaining/installing/index.html

CKAN Multi-Tier Deployment

CKAN API

● Well documented
● Covers everything you can do with the web interface
  o You can write your own web interface
● Various API clients
  o ckanclient (Python) - official
  o Ruby, PHP, Java, Node.js, Perl, R

https://github.com/ckan/ckan/wiki/CKAN-API-Clients

CKAN API methods

● Retrieving data
● Creating new data
● Updating existing data
● Deleting existing data
● Data is: packages, resources, groups, tags, users etc.

http://docs.ckan.org/en/latest/api/index.html
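The four kinds of calls listed above can be sketched as requests against the action API. This is a minimal sketch, not the official client: it assumes the version 3 action endpoint under `/api/3/action/`, an API key passed in the `Authorization` header for write operations, and a hypothetical instance URL, key and dataset name. The requests are only built here, not sent.

```python
import json
import urllib.request


def build_action_request(base_url, action, payload=None, api_key=None):
    """Build (but do not send) a POST request for one CKAN action."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Write operations need credentials; reads on public data do not.
        headers["Authorization"] = api_key
    data = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Hypothetical instance, key and dataset name, for illustration only.
BASE = "http://ckan.example.org"
KEY = "my-api-key"

requests_by_operation = {
    "retrieve": build_action_request(BASE, "package_show", {"id": "my-dataset"}),
    "create": build_action_request(BASE, "package_create", {"name": "my-dataset"}, KEY),
    # Illustrative payload: in practice, fetch the dataset with package_show,
    # modify it, and send the whole dict back to package_update.
    "update": build_action_request(
        BASE, "package_update", {"id": "my-dataset", "title": "New title"}, KEY
    ),
    "delete": build_action_request(BASE, "package_delete", {"id": "my-dataset"}, KEY),
}

for operation, request in requests_by_operation.items():
    print(operation, request.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without a CKAN instance.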

CKAN API: Examples

● Get package list
  o http://demo.ckan.org/api/3/action/package_list
  o Disabled for data.gov
● Get one package
  o http://demo.ckan.org/api/3/action/package_show?id=adur_district_spending
● ckan.logic.action.get.organization_show
  o api/3/action/organization_show?id=...
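The example URLs above follow one pattern: base URL, `/api/3/action/`, the action name, then query parameters. A small sketch of building such URLs and unpacking the JSON envelope the action API returns (a `success` flag plus a `result` field). The helper names `action_url` and `unwrap` are made up for this example, and the sample response is canned rather than fetched.

```python
import json
from urllib.parse import urlencode


def action_url(base_url, action, **params):
    """Return the GET URL for a CKAN action with optional query parameters."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def unwrap(response_text):
    """Parse an action API response and return its 'result' payload."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError(f"CKAN API error: {body.get('error')}")
    return body["result"]


print(action_url("http://demo.ckan.org", "package_list"))
print(action_url("http://demo.ckan.org", "package_show", id="adur_district_spending"))

# Canned response in the shape the action API uses (not a real reply).
sample = '{"success": true, "result": ["adur_district_spending"]}'
print(unwrap(sample))
```

To query a live instance, the text passed to `unwrap` would come from something like `urllib.request.urlopen(url).read().decode("utf-8")`.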

Use Case: LODStats

● Aggregate CKAN instances via API
● Filter the results down to related datasets only
● Build an application on top of it
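The aggregate-then-filter steps above can be sketched as follows. This is not the LODStats code: it assumes that "related" means a dataset with at least one RDF-like resource, that the format labels in `RDF_FORMATS` are the ones of interest, and that datasets arrive as dicts with `name` and `resources` keys, as the action API returns them. The portal URLs and datasets are canned so the sketch runs offline.

```python
# Assumption: these format labels mark a resource as RDF-like.
RDF_FORMATS = {"rdf", "rdf+xml", "turtle", "ttl", "n-triples", "nt", "n3", "sparql"}


def is_related(dataset):
    """True if any resource of the dataset carries an RDF-like format label."""
    formats = (
        (res.get("format") or "").strip().lower()
        for res in dataset.get("resources", [])
    )
    return any(fmt in RDF_FORMATS for fmt in formats)


def aggregate(portals, fetch_datasets):
    """Yield (portal, dataset name) for related datasets across all portals.

    fetch_datasets is any callable mapping a portal URL to its dataset dicts,
    for example a wrapper around the package_search action.
    """
    for portal in portals:
        for dataset in fetch_datasets(portal):
            if is_related(dataset):
                yield portal, dataset["name"]


# Canned data standing in for API responses (hypothetical portals).
CANNED = {
    "http://portal-a.example.org": [
        {"name": "air-quality", "resources": [{"format": "CSV"}]},
        {"name": "gazetteer", "resources": [{"format": "RDF"}, {"format": "CSV"}]},
    ],
    "http://portal-b.example.org": [
        {"name": "thesaurus", "resources": [{"format": "Turtle"}]},
        {"name": "no-resources", "resources": []},
    ],
}

related = list(aggregate(CANNED, CANNED.get))
print(related)
```

Keeping the fetch step as a parameter means the same filter can run over several CKAN instances without knowing how each one is queried.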

Use Case: CSV2RDF

● Integrated with a particular CKAN instance
● Aggregates all CSV files from the instance
● Provides an interface for CSV2RDF conversion
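Collecting every CSV file from one instance could look roughly like this. It is a sketch, not the CSV2RDF implementation: it assumes datasets are paged through with the `package_search` action (`rows` and `start` parameters, a result with `count` and `results`), and that CSV resources are recognized by their `format` field. The `fake_search` function and its data are stand-ins so the example runs without a CKAN instance.

```python
def iter_datasets(search, page_size=100):
    """Yield all datasets by paging through a package_search-style callable."""
    start = 0
    while True:
        result = search(rows=page_size, start=start)
        datasets = result["results"]
        if not datasets:
            break
        yield from datasets
        start += len(datasets)
        if start >= result["count"]:
            break


def csv_resources(datasets):
    """Yield (dataset name, resource URL) for every resource labeled CSV."""
    for dataset in datasets:
        for res in dataset.get("resources", []):
            if (res.get("format") or "").strip().lower() == "csv":
                yield dataset["name"], res["url"]


# Canned datasets standing in for a real instance (hypothetical URLs).
_FAKE = [
    {"name": "budget", "resources": [
        {"format": "CSV", "url": "http://example.org/budget.csv"}]},
    {"name": "map", "resources": [
        {"format": "SHP", "url": "http://example.org/map.zip"}]},
    {"name": "spend", "resources": [
        {"format": "csv", "url": "http://example.org/spend.csv"},
        {"format": "PDF", "url": "http://example.org/spend.pdf"}]},
]


def fake_search(rows, start):
    """Mimic the shape of a package_search result over the canned data."""
    return {"count": len(_FAKE), "results": _FAKE[start:start + rows]}


found = list(csv_resources(iter_datasets(fake_search, page_size=2)))
print(found)
```

The format field is free text filled in by publishers, so matching on it is a heuristic; a real harvester might also check file extensions or content types.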

Thank you for your attention!

Presented by Ivan Ermilov.
LinkedIn: https://www.linkedin.com/in/iermilov
Email: [email protected]: earthquakesan