agrovoc gacs working group
TRANSCRIPT
Links
• Agrovoc online: www.fao.org/agrovoc
• Download:
  – https://aims-fao.atlassian.net/wiki/display/AGV/Releases
• VB 2.0 Sandbox: http://202.73.13.50:55481/vocbench/ (only Agrovoc)
• JIRA: https://aims-fao.atlassian.net/
• SPARQL endpoint:
  – http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc
• Agrontology: http://aims.fao.org/aos/agrontology
• VOID file for Agrovoc LOD:
  – http://aims.fao.org/aos/agrovoc/void.ttl
04/15/23 2
People
• Caterina Caracciolo (coordination, data)
• Sarah Dister (communication, editors)
• Lavanya Kiran (content analysis)
• Sachit Rajbhandari (VB, data)
• Armando Stellato (VB, data)
• Andrea Turbati (VB, data)
• Fabrizio Celli (Agris)
• Valeria Pesce (AgriDrupal, AgriVivo)
Outline
• A bit of history of Agrovoc
• Agrovoc now
• Access
• Maintenance
• Users
• Data publication workflow
• Editorial workflow
• Copyright, license
• Agrovoc structure
• Walkaround in Agrovoc following VB
• Agrovoc by domain: scientific taxonomies and names, geographical entities
• LOD
• Mapping
AGROVOC from the beginning
• 1980s: AGROVOC on paper, only 3-4 language versions
• 1990s: AGROVOC moves to a database. More language versions are added.
  – Central database, DB dump sent to partners for translation -> limitations in maintenance and sharing
2000s
• Move to semantic technologies
• Need to streamline maintenance and improve sharing over the web
• Adopted format is OWL
• Development of Workbench
Late 2000s
• Clear need for a distinction between the conceptual level and the terminological level: notion of a concept scheme
• Collaboration between FAO and ICRISAT (KSI, Knowledge Sharing and Innovation Group)
  – Top concepts reduced from 918 to 25
  – Around 85,000 term relations revised
  – Non-hierarchical relationships refined by semantic relations
  – Ca. 4,000 non-preferred terms changed to preferred terms
2010 until now
• RDF is a widely accepted standard
• SKOS is the RDF vocabulary for thesauri
• OWL shows limitations with thesaurus structure and multilinguality
• Workbench becomes VocBench
• VocBench adopted by Biotechnology Glossary, used for bibliographic data
AGROVOC 2014
• AGROVOC RDF/SKOS (SKOS-XL)
  – for download
  – “live” through SPARQL endpoint and web services
• LOD: linked to 13 vocabularies
• VocBench v2.1 out in a few days
Some figures
• Total number of concepts: ~32,000
• 20 languages published
  – 4 under development
• 25 top concepts
• Maximum depth of hierarchy: 14
How to look at AGROVOC
• A terminological resource
• A domain model
  – Provides a view on how concepts are related to one another, e.g. agriculture is an “activity”, not, say, a “science”, a “technique”, or a “religion”…
• An RDF/SKOS resource and a linked data set for use in web-based applications
Strengths of AGROVOC
• Multilinguality
• Number of (institutional) users
• Experience and work done towards use in an open data environment
Agrovoc Online
• Browse/search
• New tool under development: a SKOS explorer, Drupal module
  – http://dev-skos-explorer.gotpantheon.com/skos_explorer
For download
• RDF-SKOS
  – Agrovoc only, aka Agrovoc Core
  – Agrovoc LOD
• Other formats are no longer updated
  – relational DB, XML, ...
• SPARQL endpoint
• Web services
“live” access
• SPARQL endpoint
  – More used than we thought, as per the survey
• Web services
  – In fact less used than we thought…
  – Certainly used by big institutions/libraries, but each of them counts as a single user for us…
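As an illustration of “live” access, here is a minimal sketch of how a client might prepare a request to the SPARQL endpoint. The endpoint address is the one from the Links slide and may no longer be reachable; the helper names are hypothetical, and the query assumes that plain skos:prefLabel triples are available in the store. Nothing is sent over the network here; the code only builds the query and the GET URL.

```python
from urllib.parse import urlencode

# Endpoint as listed on the Links slide (assumption: it accepts standard
# SPARQL protocol GET requests with a "query" parameter).
ENDPOINT = "http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc"


def build_label_query(concept_uri: str, lang: str = "en") -> str:
    """Return a SPARQL query asking for the preferred label of one concept."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{concept_uri}> skos:prefLabel ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = build_label_query("http://aims.fao.org/aos/agrovoc/c_12332")
url = build_request_url(query)
```

The resulting URL could then be fetched with any HTTP client, asking for a SPARQL results format in the Accept header.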
In the past
• Institutions translating AGROVOC would get a dump of the DB, then:
• Translations
  – workflow decided internally by the institution (who translates, who revises, ...) and totally implicit
• Data sent back to FAO for inclusion in the master copy
Now
• Data is managed within VocBench
  – Web-based
  – Implements a formalized editorial workflow
  – Editors may get rights on languages
• Agrontology
  – URI: http://aims.fao.org/aos/agrontology
• Imported vocabularies are maintained elsewhere…
Manual - automatic
• Contribution to Agrovoc (new concepts, translations) is largely manual
  – Around 2012, a couple of languages were first proposed by an automatic translation tool (company provided), then revised manually
• Some tests were done on automatic extraction of relations from text; we may continue on that in the future
• Mappings are automatically extracted, then validated
VB 2.0
• Re-engineered RDF backend, based on the RDF management platform Semantic Turkey
• Support for different triple stores
• Extension mechanism based on OSGi
• Multi-scheme management: several skos:ConceptSchemes can be developed for the same dataset, providing different views on the data
• Statistics module: provides summary information about the loaded data
• Export module: for exporting all or part of the content of a project according to several existing RDF serialization standards
• Load data module: for loading bulk data serialized in some RDF serialization standard
• Ontology Import Management (Administration-->Ontologies): to owl:import ontologies to be used as property vocabularies for the modeled thesauri
• New tabs under the concept view for covering the SKOS-XL standard extensively (note, notations)
VB 2.1 – out in a few days
• A completely rebuilt installation mechanism - headache-free
• Self-installing DB, with auto-updating scripts
• Wizard-driven system configuration, with import/export of configuration profiles
• SPARQL module: query/update content directly through the SPARQL query language for RDF; syntax completion and highlighting
• Multi-scheme management: concepts can now be shared among different schemes
• RSS feeds for all editing actions
Editors
• So far, one focal point per language
  – See Agrovoc web site
• Now, we are moving to editing responsibility per language, per domain
Editing rights in VocBench
• Now editors get rights by language
• Future: maybe also by domain (or similar notion?)
Users of Agrovoc
• Libraries, information management, …
• From the survey, also software developers, translators, managers, researchers
Communication with users
• Through website (form)
• Direct email [email protected]
• AIMS bulletin
• Agrovoc Google group
  – Just started, not yet active
Support for users
• For editors:
  – VB support (user manual, video tutorials)
  – Syntax-oriented guidelines for editors
  – Would like to have more domain-oriented guidelines, also explaining the use of Agrontology
Tools and hosting
• Mostly at Mimos Berhad (Malaysia)
  – Editing: VocBench
  – SPARQL endpoint: AllegroGraph
  – LOD content negotiation, Pubby for producing HTML
04/15/23 KISAF, Rome 33
“master” data -> download
1. Extraction of data from VocBench
2. Data preparation for publication
   1. Agrovoc Core
   2. Agrovoc LOD
      1. Add reference to VOID file, per concept
      2. Add data of Agrovoc version
      3. For LOD, add triples
3. Load files on download site
“master” data -> dynamic access
• Agrovoc LOD into triple store for external access
Basics
• Formally defined: all editorial activities happen inside VB (as opposed to via email, phone, ...)
• Roles of users
• Status of elements
Status of elements
• Draft
• Revised
• Validated
• Published
• Proposed deprecated
• Deprecated
Validation phase
• Dedicated module: Validation
• See also module: Recent Changes, RSS
• Also, History of concepts, terms
Copyright
• FAO languages: copyright stays with FAO
  – English, French, Spanish, Arabic, Russian, Chinese
• Other languages: copyright stays with the institution that authored the version
• Not exactly defined how it will work with distributed authorship, e.g. by domain
  – Provenance?
License
• Agrovoc may be used for free by anyone
• Would like to have a CC 3.0 license
  – Although this is not official FAO policy
Concepts and terms
• Distinction between conceptual level and terminological level
• Concepts are represented by terms
Terms
• Thesaurus
  – maize UF corn (maize)
  – corn (maize) USE maize
• Concept scheme
  – maize: preferred term / preferred label
  – corn (maize): non-preferred term / alternative label
SKOS
• Terms are turned into “labels”…
  – skos:prefLabel “maize”@en
  – skos:altLabel “corn (maize)”@en
  – skos:prefLabel “Mais”@it
  – skos:altLabel “granoturco”@it
  – …
• … of the same concept = skos:Concept
• A concept is identified and represented by a URI: http://aims.fao.org/aos/agrovoc/c_12332
• …and located, as a URL…
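The labels on the SKOS slide can be pictured as one record per concept, keyed by language. The sketch below is only an illustration: the class and field names are hypothetical, and the rule of one preferred label per language follows SKOS rather than anything specific to VocBench.

```python
from dataclasses import dataclass, field

NAMESPACE = "http://aims.fao.org/aos/agrovoc/"


@dataclass
class Concept:
    uri: str
    pref_labels: dict = field(default_factory=dict)  # language -> one label
    alt_labels: dict = field(default_factory=dict)   # language -> list of labels

    def labels(self, lang):
        """All labels in one language, preferred label first."""
        pref = [self.pref_labels[lang]] if lang in self.pref_labels else []
        return pref + self.alt_labels.get(lang, [])


# The maize example from the slide: one URI, labels in several languages.
maize = Concept(
    uri=NAMESPACE + "c_12332",
    pref_labels={"en": "maize", "it": "Mais"},
    alt_labels={"en": ["corn (maize)"], "it": ["granoturco"]},
)
```

The point of the design is that the URI, not any label, identifies the concept; labels are attached per language.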
Remarks
• Same hierarchy for all languages
  – Historical reasons: English was the first language, the others were added as translations
• Multiple parents allowed
  – Ca. 1200 concepts with more than one parent
• Max. depth of hierarchy = 14
[Figure: example hierarchy. Concepts shown: c_6211 Products @en, c_8171 Plant products @en, c_1474 Cereals @en, c_12332 Maize @en, c_7552 Sweet corn @en, c_14385 Soft corn @en, c_15500 corn starch @en. Edges are labelled skos:broader (four) and skos:related (one).]
BT/NT in Agrovoc
• In tree-like format:
  – [Products]
    • [Plant products]
      – [Cereals*]
        » [Rice*, Paddy]
• In standard thesaurus-like format:
  – [Rice*] BT [Cereals*]
• In SKOS (simplified):
  – http://aims.fao.org/aos/agrovoc/c_12332 skos:broader http://aims.fao.org/aos/agrovoc/c_6599
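To make the BT/NT reading concrete, the following sketch walks skos:broader links from a concept up to its top concept. It uses English preferred labels instead of URIs for readability, and only the Products > Plant products > Cereals > Rice chain from the tree above; the function name is hypothetical. Because multiple parents are allowed, it returns every path rather than a single one.

```python
# skos:broader edges, child -> list of parents (multiple parents are allowed).
BROADER = {
    "Rice": ["Cereals"],
    "Cereals": ["Plant products"],
    "Plant products": ["Products"],
}


def paths_to_top(concept, broader=BROADER):
    """Return every chain of broader concepts from `concept` up to a top concept."""
    parents = broader.get(concept, [])
    if not parents:
        # No broader concept recorded: this is a top concept.
        return [[concept]]
    paths = []
    for parent in parents:
        for tail in paths_to_top(parent, broader):
            paths.append([concept] + tail)
    return paths
```

Calling `paths_to_top("Rice")` yields the single chain from Rice up to Products; a concept with two parents would yield two chains.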
For sake of readability...
• I am using the English preferred label to talk about a concept
  – As in: [Rice*]
  – Instead of using its URI
• Preferred and alternative labels are written as: [Rice*, Paddy]
Thesaurus hierarchies
• Sometimes close to is-a relations:
  – BT Plant products
    • Maize
      – NT Dent maize
      – NT Sweet corn
• Sometimes close to containment or part-of:
  – BT Europe
  – BT Southern Europe
    • Italy
      – NT Abruzzi
      – NT Sicily
Alphanumeric URIs
http://aims.fao.org/aos/agrovoc/c_12332
• To be language independent
• How to choose one language over another?
• A label in that language may not be available
URIs of concepts…
… existing before the conversion to SKOS
  – URI is formed by appending:
    • Namespace +
    • c_ +
    • the term code of the Agrovoc term
About Agrovoc term codes
• In the DB, terms were given a double key: a code for the term plus a code for the language. All preferred terms of the same concept therefore had the same term code and different language codes
• Codes had no fixed length
URIs of concepts…
… created after the conversion to SKOS
  – URI is formed by appending:
    • Namespace +
    • c_ +
    • 13-digit automatically generated code
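Both URI patterns come down to the same concatenation: namespace, then "c_", then a code. The sketch below shows this; the function names are hypothetical. The slides do not say how the 13-digit code is generated, so `new_code` is purely an assumption: it uses the current time in milliseconds, which happens to give 13 digits.

```python
import time

NAMESPACE = "http://aims.fao.org/aos/agrovoc/"


def concept_uri(code):
    """Namespace + "c_" + code (old term code or new generated code)."""
    return f"{NAMESPACE}c_{code}"


def new_code(now_seconds=None):
    """Hypothetical generator: a 13-digit code from the time in milliseconds."""
    if now_seconds is None:
        now_seconds = time.time()
    return str(int(now_seconds * 1000))


# Concept that existed before the conversion: built from its old term code.
maize_uri = concept_uri(12332)
```

Since old term codes had no fixed length, the two kinds of URI can be told apart only by convention, not by structure.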
Remarks
• Great difference in development of the hierarchy under each top concept
• Not necessarily a problem…
  – No “thematic roots”
    • Plants and animals under organisms
    • Agriculture is under activities (economic activity)
    • Forestry under subject
    • Food under product
AGROVOC data
• All terminological and domain information
  – Concepts
  – Terms
  – Relationships between concepts, or between terms
• URIs within namespace: http://aims.fao.org/aos/agrovoc/
  – E.g. URI of concept [maize]: http://aims.fao.org/aos/agrovoc/c_12332
Vocabularies used in Agrovoc - 1
• Vocabulary here means the collection of properties used to describe concepts, terms and relations
• SKOS: to express concepts, BT/NT, RT, and matches with other vocabularies
• SKOS-XL: to express labels (to be able to make statements about labels)
Vocabularies used in Agrovoc - 2
• VOID: to describe the dataset
• dcterms: for date of creation and modification
• foaf: for images
Vocabularies used in Agrovoc - 3
• Agrontology: for AGROVOC-specific properties and relations
  – http://aims.fao.org/aos/agrontology
A look at Agrovoc through VocBench 2.0
• Note that some things present in the interface of VocBench are not really used in Agrovoc...
• In the following, we follow the tabs used to show information about concepts
Terms
• All terms available for that concept
• Terms are clickable; more info about the term is shown
  – The possibility of making statements about terms is provided by SKOS-XL (more on this later)
Definitions
• VB allows for more than one definition
  – Can specify source, URL
  – Expressed with skos:definition
• Each definition may have more than one translation
• Agrovoc only has single definitions
  – The value is a URI, like agrovoc:c_def_1328252885416
• Mostly added after conversion to SKOS, ~1200
Notes – general
• This tab collects two types of notes that Agrovoc inherited from its “thesaurus-time”
• Scope notes
  – To define the scope of applicability of concepts
• Editorial notes
  – To keep track of some editorial information
Scope note
• Rendered by skos:scopeNote
  – Its value is a string (just text)
  – May be given in various languages
  – Used to define the scope of applicability of a concept. In old Agrovoc it was not possible to give definitions, so Scope Notes were often used to provide definitions
Editorial note
• Rendered as skos:editorialNote
  – In the old Agrovoc thesaurus there was no way to keep the author/year of a scientific name, so editors often used Editorial Notes.
  – E.g. <http://aims.fao.org/aos/agrovoc/c_39617> skos:prefLabel “Aulopus filamentosus”@en; skos:editorialNote “Author: (Bloch 1792)”@en
  – Plan is to have a proper encoding of author/year of scientific names
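One possible first step towards a proper encoding of author and year is to parse the existing editorial notes. The sketch below assumes that notes follow the pattern of the one example given, “Author: (Bloch 1792)”; the pattern and the function name are hypothetical, and real notes may well vary in ways this does not cover.

```python
import re

# Assumed note shape: "Author: (Name 1792)", with optional parentheses
# and an optional comma before the four-digit year.
NOTE_PATTERN = re.compile(
    r"^Author:\s*\(?(?P<author>.+?),?\s+(?P<year>\d{4})\)?$"
)


def parse_author_note(note):
    """Return (author, year) from an editorial note, or None if it does not match."""
    match = NOTE_PATTERN.match(note.strip())
    if match is None:
        return None
    return match.group("author"), int(match.group("year"))
```

Notes that do not match return `None`, so they can be set aside for manual review rather than converted wrongly.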
Attributes
• Datatype properties of concepts
  – = the value of the property is a word, or rather, a string
• Currently, Agrontology includes:
  – isSpatiallyIncludedInState (not actually used)
  – isSpatiallyIncludedInCity
  – isHoldBy (<sic>, not actually used)
  – isPartOfSubvocabulary
Remarks
• Some are currently under examination as part of the geographical domain (ref. Otakar Čerba)
  – isSpatiallyIncludedInState (no occurrences)
  – isSpatiallyIncludedInCity
• Notion of list of concepts
  – agrontology:isPartOfSubvocabulary
  – Ideally we would like to use something more standard
• isHoldBy is not used in Agrovoc
Notation
• Meant to keep codes of concepts
  – skos:notation
• Not used in Agrovoc (codes were given to terms)
Concepts relationships
• Non-hierarchical relationships of a given concept
• In thesauri, only RT, between terms
• In concept schemes, same notion with skos:related
  – same vagueness as RT, but between concepts
• Other, more specific “related” relations are also possible
Recap
• In thesaurus, only RT relation
• At some point, the RTs were “refined”
  – ~160 (including inverses)
• Now:
  – Number of relations has been reduced; further reduction under evaluation
  – The Agrontology vocabulary collects Agrovoc relations
    • Defined as an extension of skos:related
Example of Agrovoc relationships
[Oryza sativa*] agrontology:produces [Rice*, Paddy]
From which it can be inferred:
[Rice*, Paddy] agrontology:isProducedBy [Oryza sativa*]
[Rice*, Paddy] skos:related [Oryza sativa*]
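The two inferences in this example can be sketched as simple rules over triples: add the inverse statement, then add skos:related for any relation defined as an extension of it. The tables below cover only the one property pair from the slide, with the inverse written as agrontology:isProducedBy (an assumed spelling); the function name is hypothetical, and a real system would read these definitions from Agrontology itself.

```python
# Inverse pairs and sub-properties of skos:related, limited to the slide's example.
INVERSE_OF = {"agrontology:produces": "agrontology:isProducedBy"}
SUBPROPERTY_OF_RELATED = {"agrontology:produces", "agrontology:isProducedBy"}


def infer(triples):
    """Return the given triples plus inverse and skos:related statements."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            inferred.add((obj, inverse, subject))
    for subject, predicate, obj in list(inferred):
        if predicate in SUBPROPERTY_OF_RELATED:
            # skos:related is symmetric, so it holds in both directions.
            inferred.add((subject, "skos:related", obj))
            inferred.add((obj, "skos:related", subject))
    return inferred


facts = infer([("Oryza sativa", "agrontology:produces", "Rice")])
```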
Agrontology
• Visualized in module Relationship (called Properties in VB 2.1)
• Also available as HTML from:
  – http://aims.fao.org/aos/agrontology
History
• The list of actions performed on the concept
• Some data is kept in Agrovoc, i.e. date of creation, last update
• The changes performed between creation and last update are stored in VB only
Image
• A pointer to an image available online, one may give name, source and URL of the source.
• http://xmlns.com/foaf/0.1/depiction
Hierarchy
• Meant to give a quick grasp of the position of the selected concept in the hierarchy
  – Only the parents from the current concept up to its top concept
The idea behind
• A way to make lists of concepts
• Visualized in tab Attribute (of concepts)
• Expressed by predicate agrontology:isPartOfSubvocabulary
  – Value is a string (the name of the list)
  – You may think of it as a flag assigned to a concept
• Subvocabularies may be defined by administrators
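Since a subvocabulary is just a named flag on a concept, rebuilding the lists amounts to grouping concepts by flag value. The sketch below shows this; the function name is hypothetical and the sample assignments are illustrative guesses based on the geographical examples in these slides, not data taken from Agrovoc.

```python
from collections import defaultdict


def group_by_subvocabulary(assignments):
    """Group (concept, subvocabulary name) pairs into one set of concepts per list."""
    lists = defaultdict(set)
    for concept, name in assignments:
        lists[name].add(concept)
    return dict(lists)


# Illustrative values of agrontology:isPartOfSubvocabulary (assumed, not real data).
sample = [
    ("Italy", "Geographical country level"),
    ("Europe", "Geographical above country level"),
    ("Sicily", "Geographical below country level"),
    ("Abruzzi", "Geographical below country level"),
]
lists = group_by_subvocabulary(sample)
```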
Subvocabularies currently defined
• Chemicals (644 concepts)
• Geographical country level (247)
• Geographical above country level (246)
• Geographical below country level (522)
• Fishery related terms (259)
Remarks
• Geographical vocabularies are currently under examination:
  – Country level
  – Above country level
  – Below country level
Terms in SKOS
• In RDF/SKOS view, terms are labels of concepts (strings)
• Labels are strings
[Figure: SKOS-XL example. Concept agrovoc:c_12332 is linked by skosxl:prefLabel to the label agrovoc:xl_en_1299486843709, which has skosxl:literalForm “maize”@en. Also shown: dcterms:created 1981-01-26 and agrontology:hasStatus Published.]
Terms in SKOS-XL
• SKOS-XL is an extension of SKOS
• Labels are really objects, about which one can make statements
Agrovoc is in SKOS-XL
• Because we want to make statements about terms…
  – E.g. creation date, status (draft, ..., published)
• skosxl:prefLabel for descriptors
• skosxl:altLabel for non-descriptors
• But skos:prefLabel and skos:altLabel are also generated for various purposes
  – E.g. Pubby
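A minimal sketch of the difference: in SKOS-XL a label is an object with its own URI and properties, and the plain SKOS label can be derived from it. The class, field and function names below are hypothetical; the values are those of the maize label shown earlier, with creation date and status assumed to belong to the label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XLLabel:
    uri: str           # the label is a resource in its own right
    literal_form: str  # skosxl:literalForm
    lang: str
    created: str       # dcterms:created
    status: str        # agrontology:hasStatus


def to_plain_skos(concept_uri, label, preferred=True):
    """Flatten an XL label into the plain SKOS triple (e.g. for tools like Pubby)."""
    predicate = "skos:prefLabel" if preferred else "skos:altLabel"
    return (concept_uri, predicate, f'"{label.literal_form}"@{label.lang}')


maize_label = XLLabel(
    uri="agrovoc:xl_en_1299486843709",
    literal_form="maize",
    lang="en",
    created="1981-01-26",
    status="Published",
)
plain = to_plain_skos("agrovoc:c_12332", maize_label)
```

The flattening loses the creation date and status, which is why the XL form is kept as the master representation.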
Linguistic information as attribute of terms
• Singular, plural forms
  – agrontology:hasSingular, agrontology:hasPlural
• Spelling variants
  – agrontology:hasSpellingVariant
  – E.g. hemophilia, haemophilia: different labels?
• Membership of a subvocabulary of terms
  – agrontology:hasTermType
The idea behind
• A way to make lists of terms, just like the subvocabularies of concepts
Subvocabularies of terms
• Expressed as an attribute of a term, i.e. with predicate agrontology:hasTermType
• In VocBench: tab Attribute of a term
• Subvocabularies may be defined by administrators
# terms per subvocabularies (en)
• Taxonomic terms for plants: 4297
• Taxonomic terms for animals: 14809
• Taxonomic terms for bacteria: 7133
• Common name for viruses: 50
• Taxonomic terms for viruses: 5136
• Common name for plants: 8000
• Common name for bacteria: 29
• Acronym: 867
• Common name for animals: 4613
• Taxonomic terms for fungi: 17830
• Common name for fungi: 144
Remark
• More than one membership is possible
• But now mostly single membership
  – in the old MySQL maintenance tool it was not possible to assign two attributes
Information in Agrovoc
• For concepts:
  – Date of creation
    • dct:created
  – Date of last update
    • dct:modified
• For terms:
  – Date of creation and last update
    • As above
Information in VocBench
• Extra information for management purposes
  – User, action, change
  – See VB modules Recent Changes and Validation
Multilinguality
• Labels are marked with two-letter ISO 639-1 language codes
  – Languages as recognized by linguists
  – en for English, es for Spanish in general, independently of where it is spoken
Language variations by countries
• English in UK, USA, Australia, ...
• Spanish in Spain, Argentina, Venezuela…
• Portuguese in Portugal, Brazil
• This may be expressed by combining ISO two-letter language codes with ISO country codes
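Combining a language code with a country code gives tags of the familiar form used in RDF language tags, such as pt-BR. The helper below is a small sketch of that combination; the function name is hypothetical, and the casing (lowercase language, uppercase country) is the usual convention rather than something stated in the slides.

```python
def language_tag(language, country=None):
    """Build a tag such as "pt-BR" from a language code and an optional country code."""
    tag = language.lower()
    if country:
        tag += "-" + country.upper()
    return tag


# Portuguese as used in Brazil versus Portuguese in general.
brazilian = language_tag("pt", "br")
generic = language_tag("pt")
```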
Terminological variations by regions
• Names for concrete objects, or objects of large use (food, plants, animals, …) tend to vary considerably “within a language”, e.g. by country/state/region/…
  – They also tend to stay over time… e.g., indigenous words used after the indigenous language is no longer spoken…
• Reflection on limitations of the code-oriented approach
Multilinguality
• A more flexible way to support multilinguality?
  – Accommodate the “area of use” of a given name
    • Where area may be a country, a political or geographical region, or an aggregation of these
    • E.g. palta is the Quechua name for avocado, used in Argentina, Bolivia, Chile, Peru, Uruguay…
• Connection between linguistic and geographical information
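One way to picture the “area of use” idea is to attach a set of areas to each name instead of a single country code. This is only a sketch of the proposal, not an existing Agrovoc structure; the class and function names are hypothetical, and the palta data is the example from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionalName:
    name: str
    concept: str
    areas: frozenset  # countries, regions, or aggregations of these


def names_used_in(area, names):
    """Return the names whose area of use includes the given area."""
    return [n.name for n in names if area in n.areas]


palta = RegionalName(
    name="palta",
    concept="avocado",
    areas=frozenset({"Argentina", "Bolivia", "Chile", "Peru", "Uruguay"}),
)
```

Linking each area to a geographical concept, rather than a string, would give the connection between linguistic and geographical information mentioned above.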
Where in the hierarchy
• Most concepts for animals and plants are under top concept organisms*
  – ~2/3 of total concepts in AGROVOC
In Agrovoc thesaurus
• Only BT/NT hierarchies of taxa
  – E.g. [Oryza*] BT [Poaceae]
• No specification of ranks
  – E.g. Oryza is a genus
• No formal way to specify other “attributes” of a scientific name
  – See the use of Editorial notes
Elements in a scientific taxonomy and their names
• In case more than one name is available for a concept, the binominal name is the preferred label
Taxonomies in Agrovoc now
• Mostly, skos:broader
  – Oryza skos:broader Poaceae
• Also, some pairs of taxa are connected by a number of relations in Agrontology
  – e.g. agrontology:includesSubGroup
  – Note that they are subproperties of skos:related
  – Plan is to remove them
Ranks in Agrovoc now
• Rendered by agrontology:hasTaxonomicLevel
  – Oryza glaberrima* hasTaxonomicLevel species (taxa)*
• Values for hasTaxonomicLevel are concepts under:
  – Groups*
    » Taxa*
Scientific names
• Binominal nomenclature
  – Hibiscus rosa-sinensis, Linnaeus 1753
  – Common names: Hibiscus, Chinese hibiscus, …
What language tag for scientific names?
• Some people say binominal names are Latin…
  – No! Rather, latinate…
• In Agrovoc, they are repeated in all languages, with the corresponding language tag
• This is for historical reasons: it was the only way to have scientific names in each language version
Scientific names in AGROVOC
• Are all labels of skos:Concepts
• May be preferred or alternative labels
  – General guideline: the scientific name is preferred under a scientific taxonomy
  – In case more than one scientific name is available for the same entity, the non-preferred one is a synonym of the other
How to mark scientific names - 1
• Using a subvocabulary of terms
• Predicate agrontology:hasTermType
  – Current possible values: Taxonomic name for [animals | Bacteria | Fungi | Plants | viruses], and also Common names for …
• Does not require that common names are available
How to “mark” scientific names - 2
• agrontology:hasScientificName
• Applies to terms
• Links together the scientific and common name of the same concept, in the same language
Remarks
• Membership to a subvocabulary is a unary notion
• Our goal: unary property to mark that a term is a scientific name, together with author and year
Background
• At the time of the revision of Agrovoc there was the idea to have separate hierarchies for scientific and common sense taxonomies
  – Connected by skos:related
  – Under a taxonomic hierarchy, the preferred term is always a scientific name
Separate hierarchies for scientific and common sense taxonomies
* figure taken from earlier revision work on AGROVOC
Remark
• Sometimes no clear separation between scientific taxonomies and “common sense” taxonomies
  – Example: Felidae (= biological family of cats, next slide)
    • Appears within a scientific taxonomy
    • But its subconcepts are not scientific
Remark 1
• It is not always possible to have clear 1-1 correspondences between scientific and common sense taxonomies
  – Common names are not always available
    • One common name may be used for very many…
    • They are often just the same as scientific names, or similar…
Remark 2
• Different classifications are proposed (-> scientific taxonomies).
  – Taxonomies also change over time
• In AGROVOC, attempt to use a concept-to-concept (C-C) relation agrontology:formerlyIncludedIn
  – Agavaceae* agrontology:formerlyIncludedIn Liliaceae
  – …satisfactory?...
How they are expressed
• agrontology:hasSubvocabulary
• Two vocabularies:
  – Geographical country level = 247 concepts
  – Geographical above country level
Geopolitical information
• Agrovoc has no notion of time; it includes entities that no longer exist, e.g.:
  – Czechoslovak Socialist Republic
  – German Democratic Republic
Physical geography
• Geographical above country level = 246
• A variety of notions…
  – Americas
  – Baltic States
  – Islamic countries
  – Lake Kivu
  – Atlantic Ocean
  – Yellow Sea
Geographical relations
• spatiallyIncludes
  – Guyana
  – Cuyuni River
• A variety of notions…
  – Americas
  – Baltic States
  – Islamic countries
  – Lake Kivu
  – Atlantic Ocean
  – Yellow Sea
Agrovoc on geographical entities
• Currently under revision by Otakar Čerba (visiting scientist @FAO)
Mapping activities @Agrovoc
• Intense activity around 2011-2012
• Process managed internally
  – Candidate mappings automatically generated (using both publicly available implementations and algorithms implemented in-house)
  – Manual evaluation
    • Done by one colleague, Gudrun
  – Publication
VB and mapping
• Development happens within SemaGrow
  – Project deadline 2015
  – But because of an internal deadline, it might be ready in Nov 2014, first version at least
• What’s needed… ongoing…
  – Improved multi-project management capabilities (in Semantic Turkey), to access data from a different project, mediated by a dedicated access control policy
  – Mapping engine
More info
SemaGrow project www.semagrow.eu
Deliverable D3.2.1- Techniques for Ontology Alignment
http://www.semagrow.eu/sites/default/files/D3.2.1-Techniques%20for%20Ontology%20Alignment.pdf