LOD KOS Marcia Zeng Linked Conservation Data (LCD) Terminology Workshop
Stanford University, June 6-7, 2019
LOD = Linked Open Data; KOS = Knowledge Organization Structures/Systems
outline
A. KOS vocabs published as LOD
1. Trend: publishing KOS as LOD
2. A closer look in a LOD thesaurus
B. Usages of LOD KOS
1. For LOD Dataset Producers
2. For Website Developers
3. For Vocabulary Producers
4. For [end user] Researchers
machine-readable
Web-browsable
machine-processable
[Diagram labels: LOD KOS; Website Developers; Thesaurus editor; Conservator; Conservation scientist; Dataset Producer]
Marcia Zeng - Linked Conservation Data,
Stanford University, June 6-7, 2019
http://nkos.slis.kent.edu/KOS_taxonomy.htm
A quick review
A-1. Trend: publishing KOS as LOD
Semantic Web standards (these enable LOD KOS)
• SKOS
• OWL
• RDFS
• SPARQL
• RDF
LOD KOS
• Many KOS schemes have been turned into
  • OWL ontologies, or
  • SKOS-ified datasets;
• Such datasets are usually available
  • as data dumps, or
  • through SPARQL endpoints.
In the BARTOC registry (The Basel Register of Thesauri, Ontologies & Classifications)
KOS registered:
- (2016-05): 1,836
- (2017-08): 2,753; RDF: 297
- (2019-05-14): 2,983; RDF: 437; SKOS: 431
In the Datahub registry (for LOD datasets, not limited to KOS)
LOD KOS registered - (2019-06-01): 1,133 found, with tags of:
• “thesaurus” (80)
• “classification” (478)
• “taxonomy” (37)
• “ontology” (531)
• “terminology” (39)
• “glossary” (13)
• “name authority” (92)
(Note: some are tagged with multiple categories. Some have multiple editions.)
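The note about multiple tags can be checked with simple arithmetic. The sketch below only sums the tag counts listed above and compares them with the reported total; it assumes each of the 1,133 datasets carries at least one of the listed tags.

```python
# Tag counts as listed on the slide (Datahub, accessed 2019-06-01).
tag_counts = {
    "thesaurus": 80,
    "classification": 478,
    "taxonomy": 37,
    "ontology": 531,
    "terminology": 39,
    "glossary": 13,
    "name authority": 92,
}
datasets_found = 1133  # total reported on the slide

tag_total = sum(tag_counts.values())

# Tag assignments beyond one per dataset (assumes every dataset has >= 1 listed tag).
extra_tag_assignments = tag_total - datasets_found

print(tag_total, extra_tag_assignments)
```

The tag total exceeds the number of datasets found, which is consistent with some datasets carrying more than one tag.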
https://old.datahub.io/dataset (accessed 2019-06-01)
http://bartoc.org/ (accessed 2019-05-14)
LOD KOS in the Datahub - Examples
Standardized domain KOS
o AGROVOC
o Art and Architecture Thesaurus (AAT)
o ICONCLASS - Multilingual Thematic Classification
o English Heritage Monument Types Thesaurus & a series of thesauri for cultural heritage
o Medical Subject Headings (MeSH)
o Gene Ontology
o STW Thesaurus for Economics
o & dozens for biomedicine
Language- and culture-specific KOS
§ Traditional Korean Medicine Ontology
§ Art and Architecture Thesaurus-Taiwan
§ National Diet Library of Japan (NDL) Authorities
§ & more
General-purpose KOS
• Library of Congress Subject Headings (LCSH)
• EuroVoc
• Faceted Application of Subject Terminology (FAST)
• Universal Decimal Classification (UDC) Summary
• Library of Congress Classification
• National Diet Library of Japan subject headings
Name-authority types of KOS
• Getty Thesaurus of Geographic Names (TGN)
• Union List of Artist Names (ULAN)
• FAO geopolitical ontology
• VIAF (Virtual International Authority File)
• & several national libraries’ name authorities
Data of a LOD KOS are expressed as RDF triples and may be encoded using any concrete RDF syntax.
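To illustrate "any concrete RDF syntax", here is a minimal sketch that writes one triple in two syntaxes, N-Triples and Turtle, using plain string formatting. The concept URI for AGROVOC "rice" and the helper names are assumptions made for illustration; a real project would normally use an RDF library, and this sketch does not escape special characters in literals.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# One RDF triple: subject, predicate, object (a language-tagged literal).
subject = "http://aims.fao.org/aos/agrovoc/c_6599"  # assumed concept URI for "rice"
predicate = SKOS + "prefLabel"
label, lang = "rice", "en"


def to_ntriples(s: str, p: str, text: str, lang: str) -> str:
    """Serialize the triple as one N-Triples line (full IRIs, no prefixes)."""
    return f'<{s}> <{p}> "{text}"@{lang} .'


def to_turtle(s: str, p: str, text: str, lang: str) -> str:
    """Serialize the same triple as Turtle, abbreviating the SKOS namespace."""
    if not p.startswith(SKOS):
        raise ValueError("this sketch only abbreviates SKOS predicates")
    local_name = p[len(SKOS):]
    return f'@prefix skos: <{SKOS}> .\n<{s}> skos:{local_name} "{text}"@{lang} .'


nt = to_ntriples(subject, predicate, label, lang)
ttl = to_turtle(subject, predicate, label, lang)
print(nt)
print(ttl)
```

Both outputs carry the same statement; only the encoding differs.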
Images captured 2019-05-14
Available in various formats
http://id.loc.gov/authorities/subjects/sh2007006251
http://vocabularies.unesco.org/browser/en/about
Image captured 2019-05-14
http://vocabularies.unesco.org/sparql-form/
Image captured 2019-05-14
A-2. A closer look in a LOD thesaurus…
Normal view on the web
http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/page/c_6599
Image captured 2019-05-14
[Callouts on the screenshot: Broader/Narrower; Labels in different languages; URI; LOD format]
SKOS-ified
[Diagram: concepts connected by skos:broader (four links) and skos:related (one link)]
c_6211 products @en
c_8171 plant products @en
c_1474 cereals @en
c_6599 rice @en
c_7552 sweet corn @en
c_14385 soft corn @en
c_15500 corn starch @en
Relationships expressed in SKOS
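As a minimal sketch of how skos:broader links can be followed by a program, the example below walks upward from "rice" to its top concept. The specific chain (rice, cereals, plant products, products) is an assumed reading of the diagram; the transcript does not say which concepts each link connects, and the function name is hypothetical.

```python
# skos:broader links, child -> parent (assumed reading of the diagram).
broader = {
    "c_6599": "c_1474",  # rice -> cereals
    "c_1474": "c_8171",  # cereals -> plant products
    "c_8171": "c_6211",  # plant products -> products
}
pref_label_en = {
    "c_6599": "rice",
    "c_1474": "cereals",
    "c_8171": "plant products",
    "c_6211": "products",
}


def ancestors(concept_id: str) -> list:
    """Follow skos:broader upward until a concept has no broader concept."""
    chain = []
    seen = {concept_id}
    while concept_id in broader:
        concept_id = broader[concept_id]
        if concept_id in seen:  # guard against accidental cycles in the data
            break
        seen.add(concept_id)
        chain.append(concept_id)
    return chain


path = [pref_label_en[c] for c in ancestors("c_6599")]
print(" > ".join(reversed(path)))
```

Because each link is an explicit, typed statement, the same traversal works on any SKOS vocabulary without knowing its subject matter.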
RDF triples - view in HTML table
http://aims.fao.org/aos/agrovoc/c_6599.html
Image captured 2019-05-14
RDF triples - view as RDF/XML
http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/page/c_6599 (normal view on the web)
Image captured 2019-05-14
B-1. For LOD Dataset Producers
LOD KOS vocabularies enable their data to become 4-star and 5-star Linked Open Data.
Source: Tim Berners-Lee, https://www.w3.org/DesignIssues/LinkedData.html
The options and actions related to KOS in LOD dataset production (for LOD Dataset Producers)
• No controlled values → controlled
  • Need to populate controlled vocabs in a dataset
• Controlled, but local → use standard or popular KOS
  • Need to map to standard vocabs
• Controlled, standard vocab, but not on LOD → use LOD KOS
  • Need to use LOD vocabs (with URIs)
• Situation: Dealing with semi-structured and unstructured data that have no controlled values for the named entities and topics.
• Task: create LOD datasets from scratch.
These situations can be found everywhere, in data about
o places
o persons
o institutions
o events
o objects, concepts
o … …
Examples:
• Digitized materials, textual or non-textual, in silos
• Archival finding aids
• Oral history transcripts
• Merged local files
Milestones:
1. Identify the entities.
2. Put the entities into structured data.
3. Clean up the newly structured data, with local control.
[Diagram labels: search & browse; My data; Metadata; Repository records; RDF graphs; LOD]
Image source: Zeng, 2012
Milestones:
1. Identify the entities.
2. Put the entities into structured data.
3. Clean up the newly structured data, with local control.
4. Encode the entities with standardized KOS vocabularies (as strings).
5. Obtain URIs for names of entities provided by the LOD KOS datasets.
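The last two milestones (strings first, then URIs) can be sketched as a small lookup. This is only an illustration under stated assumptions: the lookup table is a hypothetical local file, the two AAT IDs are the ones that appear later in these slides, and the URI pattern `http://vocab.getty.edu/aat/<id>` is assumed. Real reconciliation would typically use a tool such as OpenRefine or a vocabulary's own service.

```python
from typing import Optional

# Hypothetical local lookup: cleaned label -> AAT numeric ID.
AAT_IDS = {
    "drinking vessels": "300194567",
    "costume by function": "300212133",
}
AAT_BASE = "http://vocab.getty.edu/aat/"  # assumed URI pattern for AAT concepts


def normalize(value: str) -> str:
    """Local clean-up step: trim, collapse whitespace, lowercase."""
    return " ".join(value.split()).lower()


def to_uri(value: str) -> Optional[str]:
    """Return the concept URI for a string value, or None if it is unmatched."""
    concept_id = AAT_IDS.get(normalize(value))
    return AAT_BASE + concept_id if concept_id else None


records = ["  Drinking   Vessels ", "costume by function", "unknown term"]
enriched = [(value, to_uri(value)) for value in records]
for value, uri in enriched:
    print(repr(value), "->", uri)
```

Unmatched values stay visible as `None`, so they can be reviewed by hand rather than silently dropped.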
The 5-star datasets in the LOD Cloud indicate the essential role of LOD KOS vocabularies.
Source: Annotated by the author on the LOD CLOUD 2014-08-30 image
B-1 Summary: For LOD Dataset Producers
LOD KOS vocabularies
⁃ are the source of http URIs/IRIs for named entities and concepts used in data-transformation;
⁃ empower the owners of data to
  ⁃ convert and publish their data under the LOD principles, with high quality and trustworthy linkages in RDF triples;
  ⁃ transform anyone’s database into LOD datasets, reaching 4 and 5 stars;
  ⁃ create machine-understandable and machine-processable data for any users, machine or human.
The openly available, well-established, and constantly maintained vocabularies are invaluable engines for the LOD datasets.
B-2. For Website Developers
Case: MoMA
https://www.moma.org/artists/1364
Images captured 2018-06-18
Example:
• (my) ID → (their) URI(s)
Front-end: (my) ID
Back-end: “sameAs” (their) URIs
Results: Found websites for this ‘thing’ on the web. [ ulan: 500009365 ]
Images captured 2019-06-01
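The "(my) ID to (their) URI" idea can be sketched as a single linking statement. Everything specific here is an assumption made for illustration: that the ULAN ID shown above belongs to the artist on the MoMA page, that ULAN URIs follow the pattern `http://vocab.getty.edu/ulan/<id>`, and that owl:sameAs is the predicate (a website might instead use schema.org's sameAs).

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri: str, external_uri: str, predicate: str = OWL_SAME_AS) -> str:
    """Return one N-Triples line linking a local identifier to an external URI."""
    return f"<{local_uri}> <{predicate}> <{external_uri}> ."


# Assumed pairing of the two identifiers shown on the slides.
my_id = "https://www.moma.org/artists/1364"
their_uri = "http://vocab.getty.edu/ulan/500009365"

triple = same_as_triple(my_id, their_uri)
print(triple)
```

Storing such links in the back-end is what lets the front-end page point users to other websites about the same 'thing'.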
Instructions for consolidation using tools such as OpenRefine: https://www.getty.edu/research/tools/vocabularies/garcia_open_refine.pdf
Case: A checklist of Museums and Collections with Maya Inscriptions
http://mayawoerterbuch.de/museumscollections/
Made by the Interdisciplinary Dictionary of Classic Mayan (Textdatenbank und Wörterbuch des Klassischen Maya) research center at the University of Bonn.
One of the corpus databases has been constructed for objects that are now housed in museums and collections.
The website provides a resource listing all museums and collections with Mayan inscriptions worldwide.
A webpage about a museum includes the identifiers of Getty TGN, ULAN, and GeoNames.
Case: Europeana
Semantic enrichment
[Screenshot callouts: original; enriched by linking to other LOD KOS]
Case: Florentine Drawings
http://florentinedrawings.itatti.harvard.edu/catalog/0001157-Berenson
“The amount of energy put into translating (from Italian to English) and standardizing the metadata for each record in this resource cannot be overlooked, as Getty’s Art & Architecture Thesaurus, GeoNames and Virtual International Authority File (VIAF) terms will help streamline the research process.”
-- https://arlisna.org/publications/multimedia-technology-reviews/1229-the-drawings-of-the-florentine-painters
B-3. For Vocabulary Producers
who are involved in the development and enrichment of KOS
- LOD approaches lead to unconventional processes and results.
[We will see more tomorrow morning. Here are just two examples.]
• Mapping with a source vocabulary
• Enrich contents (concepts, labels, relationships)
  • Leaf nodes, microthesaurus, shared bridges, etc.
• Obtain URIs from a source vocabulary
  • Consolidation using tools such as OpenRefine
Enriching the KOS-at-hand
Additional references, bookmarks, and demos are available on a website.
• Notes and bookmarks are available at: http://metadataetc.org/LOD/interoperability.html
Case: FAST
With the correct coding of properties, a FAST controlled term
• is related to a real-world entity
• allows humans to gather more information about the entity that is being described
Generating a microthesaurus
• Creating new value vocabularies for a particular project’s products by extracting the components from a comprehensive KOS vocabulary
• A widely used approach today as LOD gains momentum
(Instructions for AAT-based microthesaurus available at: http://metadataetc.org/LOD/6hands-on-Microthesauri-from-AAT.pdf )
Microthesaurus: a designated subset of a thesaurus that is capable of functioning as a complete thesaurus
-- ISO25964-2:2013
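Extracting a microthesaurus amounts to taking one concept and everything beneath it. The sketch below shows that idea on a toy hierarchy; only the ID 300194567 ("drinking vessels") comes from these slides, while the `ex:` identifiers and the function name are hypothetical placeholders, not real AAT data.

```python
from collections import deque

# Hypothetical hierarchy: concept -> list of narrower concepts.
narrower = {
    "300194567": ["ex:cups", "ex:goblets"],  # drinking vessels (ID from the slides)
    "ex:cups": ["ex:teacups"],
    "ex:goblets": [],
    "ex:teacups": [],
    "ex:plates": [],  # outside the chosen branch
}


def microthesaurus(root: str) -> dict:
    """Collect the root and all its descendants, keeping their narrower links."""
    subset = {}
    queue = deque([root])
    while queue:
        concept = queue.popleft()
        if concept in subset:  # already visited (handles polyhierarchy)
            continue
        children = narrower.get(concept, [])
        subset[concept] = list(children)
        queue.extend(children)
    return subset


subset = microthesaurus("300194567")
print(sorted(subset))
```

Because the subset keeps its internal hierarchy, it can still be browsed and used like a complete, smaller thesaurus.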
Microthesaurus
https://www.canada.ca/en/heritage-information-network/services/collections-documentation-standards/chin-guide-museum-standards/vocabulary-data-value.html
Image captured 2019-04-16
The units (facets, hierarchies) were recommended to be used.
[Screenshot callout: microthesauri]
Case: UNESCO
http://vocabularies.unesco.org/sparql-form/
Image captured 2019-05-14
Case: AAT
300212133 <costume by function>
http://vocab.getty.edu/sparql
How can I get the dataset? How long will it take?
Steps:
Go to the Getty Vocab LOD SPARQL endpoint: http://vocab.getty.edu/sparql
1. Choose ‘Queries’.
2. Choose "Descendants of a Given Parent" from the template, click. → Now, the template's text will show on the right.
3. Click ‘SPARQL’ to get the query text up.
4. Submit.
AAT descendants of 300194567 "drinking vessels".
Got the dataset in 2 seconds!
Download in a format you like.
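For readers who want to script a similar request, here is a hedged sketch that builds a "descendants" query and a request URL without sending anything. It is not the text of the Getty template: it uses a generic SKOS property path (`skos:broader+`), and it assumes the endpoint accepts a `query` parameter as described in the SPARQL 1.1 Protocol, that AAT concepts have URIs of the form `http://vocab.getty.edu/aat/<id>`, and that the data expose skos:broader and skos:prefLabel.

```python
from urllib.parse import urlencode

ENDPOINT = "http://vocab.getty.edu/sparql"  # endpoint named on the slide


def descendants_query(parent_uri: str) -> str:
    """Generic SPARQL 1.1 query: concepts reachable from parent via skos:broader+."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        f"  ?concept skos:broader+ <{parent_uri}> ;\n"
        "           skos:prefLabel ?label .\n"
        "}"
    )


def request_url(query: str) -> str:
    """Encode the query as a GET request URL (assumed SPARQL 1.1 Protocol style)."""
    return ENDPOINT + "?" + urlencode({"query": query})


# 300194567 is the "drinking vessels" ID shown on the slide.
query = descendants_query("http://vocab.getty.edu/aat/300194567")
url = request_url(query)
print(query)
print(url)
```

The resulting URL can be passed to any HTTP client; the response format and any rate limits depend on the endpoint.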
The query can also include scope note, etc.
B-4. For Researcher end-users
LOD KOS products can become knowledge bases and provide semantic-rich discoveries.
Extend the functionality of the KOS beyond being the “value vocabularies”
Beyond being a ‘vocabulary’
Using well-developed KOS products, high-quality and relevant knowledge bases are now available for researchers to use easily.
• LOD KOS can be used for
  • obtaining special graphs or datasets for very complicated questions, and
  • revealing unknown relationships.
Could a LOD KOS dataset be considered
• as a knowledge base?
• as the foundation of a network analysis?
• as the building blocks of a framework for research in humanities and science?
Example: Universal Protein Resource (UniProt)
http://sparql.uniprot.org/
Example: Getty Vocabularies: LOD
http://vocab.getty.edu/queries#Top-level_Subjects
One can obtain special RDF graphs or datasets for very complicated questions, and reveal unknown relationships.
Demo: Look for castles around The Netherlands (within the boundary of 50.787185 3.389722 53.542265 7.169019)
Steps: http://vocab.getty.edu/ => Queries
(1) Go to 4.18.
(2) Click on the SPARQL sign for 4.18.
(3) Submit.
(4) Download the datasets in a selected format.
Additional:
(5) Click on any castle’s ID.
(6) Open the single data record for this concept. Download the dataset as you wish.
(7) You may click on the Website to see its normal html view.
[Labels on the result screenshot: Europe; Netherlands; North Holland]
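The bounding-box idea behind this demo can be sketched locally. The example assumes the four numbers are read as south latitude, west longitude, north latitude, east longitude; the sample points are hypothetical, approximate city coordinates, and the sketch ignores boxes that cross the antimeridian. It is not the Getty query itself.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        """True if the point lies inside the box (edges included)."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Coordinates from the demo, read as south-west corner then north-east corner.
NETHERLANDS = BoundingBox(50.787185, 3.389722, 53.542265, 7.169019)

# Hypothetical sample points: name -> (latitude, longitude), approximate.
places = {"Amsterdam": (52.37, 4.90), "Paris": (48.86, 2.35)}

inside = [name for name, (lat, lon) in places.items() if NETHERLANDS.contains(lat, lon)]
print(inside)
```

A geographic name authority with coordinates lets the same filter run over thousands of places, which is what the query template does on the server side.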
Demo: Look for caves on or around the ancient Silk Road
“caves” within bounding box (24.75083 28.95778 43.80722 108.92861)
• Got a dataset of over 200 caves spread across various countries, all done within a few minutes.
• Each URI also brings the full data for each cave and other related information.
• The dataset is available for downloading in various formats.
At the same query templates page: http://vocab.getty.edu/queries
Find the section for ULAN.
• There are many interesting query examples.
Name authorities offer foundational structured data for network analyses.
Liberate users from unfamiliar query languages
Example: Online Coins of the Roman Empire (OCRE), http://numismatics.org/ocre/
All coin types from Augustus to Zeno (representing five centuries of Roman imperial numismatics) have been published.
OCRE incorporated 107,000+ physical coins related to these coin types from 21 different datasets.
These datasets originate from large collections as well as smaller civic or university museums, archaeological databases, and the Domuztepe excavations published through OpenContext (https://opencontext.org/), which publishes research data on the web (Gruber 2017). [Figure 22a and 22b]
• Browse by ontological classes
• Modeling in an ontology (formed in classes, properties, relationships)
• Following Linked Data principles
• Using RDF triples for entities
• Querying in SPARQL language
http://numismatics.org/ocre/
http://numismatics.org/ocre/id/ric.6.lon.66
• Auto-generated data related to an object, map, and quantitative analysis.
Visualize your queries on-the-fly
How? http://wiki.numismatics.org/numishare:visualize
B-4 Summary
For Researcher end-users, LOD KOS products can become knowledge bases and provide semantic-rich discoveries.
• There is great and endless potential in LOD KOS.
• The semantically rich structures and high-quality controlled vocabularies can now be used in an integrated and innovative manner, beyond simply existing as controlled vocabularies or standardized name authorities.
• LOD KOS datasets should be considered as:
  • knowledge bases,
  • the foundation of network analyses,
  • the building blocks of a framework for research in humanities and science.
Conclusions
“The whole is greater than the sum of its parts” - Aristotle*
• Although it is possible to use each available component of KOS independently, the real power lies in the skillful coordination of all.
• The Semantic Web standards such as SKOS, OWL, RDFS, and SPARQL have paved the way for conventional KOS to become LOD datasets.
• There have been tremendous and continuous needs for KOS of all kinds, across domains, and worldwide.
• In the 21st century, the opportunities for using the semantic-rich LOD KOS are much greater than ever before due to the fact that:
  • LOD KOS data are machine-understandable, -processable, and -actionable (instead of just being machine-readable);
  • the Semantic Web connects things instead of strings.
*https://www.goodreads.com/author/quotes/2192.Aristotle
THANK YOU!