ckan 2.0: harvesting from other sources

Upload: chengjen-lee

Post on 21-May-2015

Category:

Technology


TRANSCRIPT

  • 1. ckan 2.0: Harvesting from other sources
       Internship @ Academia Sinica, Report #3
       Presenter: Cheng-Jen Lee (Sol)
       Email: cjlee AT iis.sinica.edu.tw
       This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Taiwan License.

  • 2. Agenda
       • Harvesters
           • Usage: manually and automatically
           • Custom harvester
           • Some issues
       • Linked Data and RDF
       Oct 13, 2014

  • 3. Harvesters
       • ckanext-harvest
           • Remote harvesting extension
       • Source types
           • CSW
           • csv/xls
           • WAF
           • custom

  • 4. Harvesters

  • 5. Harvesters: Usage (manually)

       (pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer -c /etc/ckan/default/production.ini
       (pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer -c /etc/ckan/default/production.ini
       (pyenv) $ paster --plugin=ckanext-harvest harvester run -c /etc/ckan/default/production.ini

  • 6. Harvesters: Usage (automatically)
       • Supervisor (for the gather and fetch consumers)
       • Cron (for run)
       • Supervisor (with profile)

       $ sudo supervisorctl reread
       $ sudo supervisorctl add ckan_gather_consumer
       $ sudo supervisorctl add ckan_fetch_consumer
       $ sudo supervisorctl start ckan_gather_consumer
       $ sudo supervisorctl start ckan_fetch_consumer

  • 7. Harvesters: Custom harvester
       • We can implement the harvester interface to perform harvesting operations
       • The process takes place in three steps:
           • gather: get the identifiers
           • fetch: fetch the contents
           • import: create the ckan package (dataset)
       • Implementation
           • https://github.com/u10313335/ckanext-harvest/blob/master/ckanext/harvest/harvesters/srdaharvester.py

  • 8. Harvesters: Harvesting Interface

       from ckan.plugins.core import SingletonPlugin, implements
       from ckanext.harvest.interfaces import IHarvester


       class MyHarvester(SingletonPlugin):
           implements(IHarvester)

           def get_original_url(self, harvest_object_id):
               '''
               :param harvest_object_id: HarvestObject id
               :returns: A string with the URL to the original document
               '''

           def gather_stage(self, harvest_job):
               '''
               :param harvest_job: HarvestJob object
               :returns: A list of HarvestObject ids
               '''

           def fetch_stage(self, harvest_object):
               '''
               :param harvest_object: HarvestObject object
               :returns: True if everything went right, False if errors were found
               '''

           def import_stage(self, harvest_object):
               '''
               :param harvest_object: HarvestObject object
               :returns: True if everything went right, False if errors were found
               '''

  • 9. Harvesters: Some issues
       • Titles with non-ASCII characters
       • Useless update check
       • TGOS CSW: fails in the gather stage
           • Caused by OWSLib
       • Harvest sources vary
           • The extension has to be modified to harvest them properly
       • Modified version available
           • On GitHub: https://github.com/u10313335/ckanext-harvest

  • 10. Linked Data and RDF
       • Resource Description Framework
           • a family of W3C specifications
           • a metadata data model
           • based on XML and URIs
       Source: http://techserviceslibrary.blogspot.tw/2011/04/rdf-resource-description.html

  • 11. Linked Data and RDF
       • Vocabularies
           • DCAT and Dublin Core
       • Two ways to get RDF metadata

       curl -L -H "Accept: application/rdf+xml" http://thedatahub.org/dataset/gold-prices
       curl -L http://thedatahub.org/dataset/gold-prices.rdf

  • 12. Documents
       • Read the Docs: https://readthedocs.org/projects/ckan-docs-tw/

  • 13. Thanks for your attention! Any questions?
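
The three steps named on slide 7 (gather, fetch, import) and the return conventions listed on slide 8 can be illustrated without a CKAN installation. The following is only a minimal standalone sketch: `ToyHarvester`, `HarvestObject` (a stand-in with assumed fields), `REMOTE_DOCUMENTS` and `run_harvest` are hypothetical names invented for illustration and are not part of ckanext-harvest. For simplicity the gather step here returns objects directly, whereas the real interface returns a list of HarvestObject ids.

```python
import json

# Hypothetical in-memory "remote source": identifier -> raw document.
REMOTE_DOCUMENTS = {
    "doc-1": '{"title": "Gold prices"}',
    "doc-2": '{"title": "Rainfall 2014"}',
}


class HarvestObject:
    """Stand-in for a harvest object (assumed fields, not the real model)."""

    def __init__(self, guid):
        self.guid = guid
        self.content = None


class ToyHarvester:
    """Illustrates the gather / fetch / import flow on in-memory data."""

    def __init__(self, remote):
        self.remote = remote
        self.datasets = {}  # plays the role of the CKAN package store

    def gather_stage(self):
        # Gather: collect the remote identifiers, one object per identifier.
        return [HarvestObject(guid) for guid in sorted(self.remote)]

    def fetch_stage(self, harvest_object):
        # Fetch: retrieve the raw content; False signals an error.
        content = self.remote.get(harvest_object.guid)
        if content is None:
            return False
        harvest_object.content = content
        return True

    def import_stage(self, harvest_object):
        # Import: turn the fetched content into a dataset record.
        if harvest_object.content is None:
            return False
        try:
            record = json.loads(harvest_object.content)
        except ValueError:
            return False
        if not isinstance(record, dict):
            return False
        self.datasets[harvest_object.guid] = {
            "name": harvest_object.guid,
            "title": record.get("title", harvest_object.guid),
        }
        return True


def run_harvest(harvester):
    """Run the three stages in order and count successful imports."""
    imported = 0
    for obj in harvester.gather_stage():
        if harvester.fetch_stage(obj) and harvester.import_stage(obj):
            imported += 1
    return imported
```

Calling `run_harvest(ToyHarvester(REMOTE_DOCUMENTS))` walks every identifier through all three stages. In the real extension the stages are driven by the gather and fetch consumers shown on slides 5 and 6 rather than by a single loop.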
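
The two curl commands on slide 11 differ only in how RDF is requested: one sends an `Accept` header (content negotiation), the other appends `.rdf` to the dataset URL. A rough Python equivalent using only the standard library might look like the sketch below. The helper names are hypothetical, and the sketch makes no assumption about whether the host from the slide still answers these requests.

```python
import urllib.request

# Dataset URL taken from the slide.
DATASET_URL = "http://thedatahub.org/dataset/gold-prices"


def rdf_request_by_header(dataset_url):
    """Build a request for RDF/XML via the Accept header (first curl command)."""
    return urllib.request.Request(
        dataset_url, headers={"Accept": "application/rdf+xml"}
    )


def rdf_request_by_suffix(dataset_url):
    """Build a request for the '.rdf' variant of the URL (second curl command)."""
    return urllib.request.Request(dataset_url.rstrip("/") + ".rdf")


def fetch(request, timeout=30):
    """Send the request; urllib follows redirects by default, like curl -L."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the request is kept separate from sending it, so the header and URL can be inspected without network access; `fetch(rdf_request_by_header(DATASET_URL))` would perform the actual download.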