Metadata curation: hands-on session
CMDI and Metadata Curation Task Forces
CLARIN Centre & Developers meeting, 4-5 June 2018
Utrecht, The Netherlands
CLARIN 1
Prerequisites
• Java 1.8 - https://java.com/en/download/
• Internet connection
Menu
1. View your records in the VLO
2. View your harvest (and its log)
3. Get your records
   1. From the tarball
   2. Harvest them
4. Curation module
   1. Look at the website
   2. Run locally
5. CMDI best practices
   1. Check your profiles
   2. Check your records
6. Structural queries
   1. Load your records/validation reports into BaseX
   2. Some useful XQueries
7. Inspect the mapping
8. VLO
9. Fixing problems, but where?
10. What’s missing?
View your records in the VLO
• Filter the records based on your endpoint:
  - _oaiEndpointURI:
    • https://vlo.clarin.eu/search?q=_oaiEndpointURI:https://clarin-pl.eu/oai/request
  - Endpoints? centres.clarin.eu/oai_pmh
• Filter the records based on a profile:
  - _componentProfile:
    • https://vlo.clarin.eu/search?q=_componentProfile:LINDAT_CLARIN
    • Note: use the profile name instead of its ID!
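The two filter URLs above follow the same pattern, so they can be generated rather than typed. A minimal Python sketch, assuming the query syntax is exactly what the example URLs show; the function name `vlo_search_url` is made up for illustration and is not part of the VLO.

```python
from urllib.parse import quote

VLO_SEARCH = "https://vlo.clarin.eu/search"


def vlo_search_url(field, value):
    """Build a VLO search URL that filters on a single field."""
    # Keep ':' and '/' readable so endpoint URLs look like the example above;
    # other reserved characters (spaces, '&', '#', ...) are percent-encoded.
    return f"{VLO_SEARCH}?q={field}:{quote(value, safe=':/')}"


print(vlo_search_url("_oaiEndpointURI", "https://clarin-pl.eu/oai/request"))
print(vlo_search_url("_componentProfile", "LINDAT_CLARIN"))
```

Remember to pass the profile name, not its ID, for `_componentProfile`.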
View your harvest (and its log)
• Not in production yet, but local preview
  - will replace https://vlo.clarin.eu/data/
• Paged lists
• Filter on endpoints and/or records
• See the log of a harvest
Get your records
1. From the tarball
   1. https://vlo.clarin.eu/data/resultsets/
   2. tar xjf clarin.tar.bz2 results/cmdi/DANS_CMDI_Provider
   3. Note: just clicking the tarball might freeze your Mac!
2. Harvest them
   1. https://github.com/clarin-eric/oai-harvest-manager/releases
   2. Edit providers section of resources/config-test.xml
   3. run-harvester.sh workdir=`pwd` resources/config-test.xml
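If `tar` is not at hand, the selective extraction in step 1 can be sketched in Python. This is an illustration only: the function name `extract_provider` is hypothetical, and it assumes the tarball is bzip2-compressed and laid out as `results/cmdi/<provider>/...`, as in the `tar` command above.

```python
import tarfile


def extract_provider(tarball, prefix, dest="."):
    """Extract only the members under `prefix` from a bzip2-compressed tarball.

    Returns the names of the extracted members.
    """
    prefix = prefix.rstrip("/")
    with tarfile.open(tarball, "r:bz2") as archive:
        # Reading the member list scans the whole archive, which can take
        # a while for a full harvest result set.
        wanted = [
            member
            for member in archive.getmembers()
            if member.name == prefix or member.name.startswith(prefix + "/")
        ]
        archive.extractall(path=dest, members=wanted)
    return [member.name for member in wanted]
```

For example, `extract_provider("clarin.tar.bz2", "results/cmdi/DANS_CMDI_Provider")` would unpack only that provider's records into the current directory.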
Curation module
1. Look at the website
   1. https://clarin.oeaw.ac.at/curate/
2. Run locally
   1. https://github.com/clarin-eric/clarin-curation-module
   2. curation.jar (goo.gl/Cx4h3N)
   3. Create your own specific copy of config.properties
   4. java -jar curation.jar -config config.properties -c -path results/cmdi/ARCHE
CMDI best practices
1. https://www.clarin.eu/content/cmdi-best-practice-guide
2. Schematron rules (schematron.com)
   1. https://github.com/TheLanguageArchive/SchemAnon/releases
   2. Also supported by oXygen or other XML editors
   3. Also easy to define your own rules
3. Check your profiles
   1. Identify the profiles you’re using
      1. https://github.com/clarin-eric/FindProfiles/releases
      2. java -jar findProfiles.jar -e=xml clarin/results/cmdi/The_Language_Archive/
   2. wget -O profile.xml https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.x/profiles/clarin.eu:cr1:p_1505397653795/xml && java -jar SchemAnon.jar https://raw.githubusercontent.com/clarin-eric/cmdi-toolkit/develop/src/main/resources/toolkit/sch/cmd-component-best-practices.sch profile.xml
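When several profiles need checking, the Component Registry URL from the `wget` step can be derived from each profile ID. A small Python sketch, assuming the URL pattern is the one shown in that command; `profile_xml_url` and `download_profile` are illustrative names, not part of any CLARIN tool.

```python
from urllib.request import urlretrieve

REGISTRY = "https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.x"


def profile_xml_url(profile_id):
    """Return the Component Registry URL of a profile's XML specification."""
    return f"{REGISTRY}/profiles/{profile_id}/xml"


def download_profile(profile_id, target="profile.xml"):
    """Fetch the profile specification (needs an internet connection)."""
    path, _headers = urlretrieve(profile_xml_url(profile_id), target)
    return path


print(profile_xml_url("clarin.eu:cr1:p_1505397653795"))
```

The downloaded `profile.xml` can then be passed to SchemAnon as in the command above.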
CMDI best practices
4. Check the records
   1. java -jar SchemAnon.jar https://raw.githubusercontent.com/clarin-eric/cmdi-toolkit/develop/src/main/resources/toolkit/sch/cmd-record-best-practices.sch clarin/results/cmdi/The_Language_Archive/ xml
   2. Note: use the -s option to save the SVRL report
5. Validate the records
   1. https://github.com/clarin-eric/cmdi-instance-validator/releases
   2. cmdi-validator results/cmdi/IMS_Repository/
   3. Note: use the -s option to use another Schematron file
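A saved SVRL report can be summarised with a few lines of Python. This sketch assumes a standard SVRL document in which each violation is a `svrl:failed-assert` element with a `svrl:text` child; the function name `failed_asserts` is hypothetical, and the exact shape of the reports written by SchemAnon has not been checked here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"


def failed_asserts(svrl_xml):
    """Count the failed assertions in an SVRL report, grouped by message."""
    root = ET.fromstring(svrl_xml)
    counts = Counter()
    for failure in root.iter(f"{{{SVRL_NS}}}failed-assert"):
        text = failure.findtext("svrl:text", default="", namespaces={"svrl": SVRL_NS})
        # Collapse line breaks and indentation inside the message.
        counts[" ".join(text.split())] += 1
    return counts
```

Calling `failed_asserts(open("report.svrl").read()).most_common()` (with `report.svrl` standing in for your own report file) lists the most frequent problems first.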
Structural queries
1. Load your records/validation reports into BaseX
   1. basex.org or brew install basex
   2. Create a new database and import your records/reports
2. XQuery (w3.org/XML/Query)
   declare namespace cmd="http://www.clarin.eu/cmd/1";
   declare namespace svrl="http://purl.oclc.org/dsdl/svrl";
   …
   - goo.gl/CEZtTm
Notes
1. You can use the namespace wildcard (*:element) to deal with (many) profile specific namespaces
2. You can use base-uri() to get the filename of a matching record
3. BaseX has useful modules, but also FunctX (xqueryfunctions.com)
4. A problem that occurs often might be a candidate for a Schematron rule
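Notes 1 and 2 have a rough counterpart outside BaseX. The Python sketch below is an analogue, not a translation of the queries linked above: `{*}name` in ElementTree (Python 3.8+) plays the role of `*:name`, and the file path plays the role of `base-uri()`. The function name `find_elements` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_elements(records_dir, local_name):
    """Yield (file path, element text) for elements with this local name.

    The element may be in any namespace (or none). Only descendants of the
    root element are searched, not the root itself.
    """
    for path in sorted(Path(records_dir).rglob("*.xml")):
        root = ET.parse(path).getroot()
        for element in root.findall(f".//{{*}}{local_name}"):
            yield path, (element.text or "").strip()
```

For instance, `find_elements("results/cmdi/ARCHE", "MdProfile")` would list the profile ID next to each record file, assuming the records carry it in an `MdProfile` header element.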
Inspect the mapping
1. Identify the profiles you’re using
   1. https://github.com/clarin-eric/FindProfiles/releases
   2. java -jar findProfiles.jar -e=xml clarin/results/cmdi/The_Language_Archive/
2. Inspect the mapping
   1. https://github.com/clarin-eric/VLO-mapping
   2. https://cmdi.clarin.eu/mapping/
VLO
1. Curation VLO (to be updated)
   1. https://vlo.minerva.arz.oeaw.ac.at/vlo
2. Request an import in the beta VLO
   1. [email protected]
3. Do a local VLO import
   1. https://gitlab.com/CLARIN-ERIC/compose_vlo#run-the-importer-to-ingest-cmdi-metadata-into-the-vlo
4. Run the importer on one record
   1. https://github.com/clarin-eric/VLO/blob/master/vlo-importer/src/main/java/eu/clarin/cmdi/vlo/importer/MetadataMapper.java
      • CLASSPATH="vlo-importer-4.2-SNAPSHOT-importer.jar" java eu.clarin.cmdi.vlo.importer.MetadataMapper -c VloConfig.xml -r test.xml
Fixing problems, but where?
• Your records
  - Typos in your records
  - Inconsistencies in your records
    • Consider adopting a common (CLARIN/CLAVAS) vocabulary
  - Facet mapping problems
    • Can you fix them in your profile(s)?
    • Or provide feedback to the Metadata Curation TF ([email protected])
  - Value mapping problems
    • Provide feedback to the Metadata Curation TF ([email protected])
• Others' records
  - report them via the VLO feedback button
What’s missing?
• OAI Viewer
  - history
  - include a local curation run
  - general log
  - mail to technical contact when the number of harvested records drops
• VLO importer
  - report showing the mappings applied
• VLO
  - _componentProfileURI
    • profile name might not be unique!
  - Center facet
    • filter to all records from one center, possibly multiple endpoints
  - show original value
• More?
Questions
• Metadata Curation Taskforces
  - [email protected]
• CMDI Taskforce
  - [email protected]
• CMDI first aid kit
  - clarin.eu/sites/default/files/CMDI-first-aid-kit.pdf
• Menzo Windhouwer
  - [email protected]