SKOS-2-HIVE GWU workshop

Upload: justin-tucker

Post on 17-Jan-2016


Page 1: SKOS-2-HIVE GWU workshop. Introductions Hollie White hcwhite1@email.unc.edu Jane Greenberg janeg@email.unc.edu

SKOS-2-HIVE GWU workshop

Page 2:

Introductions

Hollie White hcwhite1@email.unc.edu

Jane Greenberg janeg@email.unc.edu

Page 3:

Morning Session Schedule

Introductions

Section 1: Characterizing Knowledge Organization Structures

Section 2: Thesauri and What They Represent

BREAK

Section 3: From Thesauri to SKOS

Section 4: From SKOS to HIVE

Exploring HIVE

Page 4:

Section 1: Characterizing knowledge organization structures

Page 5:

Types of knowledge organization structures

From least to most structure

Term lists

Controlled vocabularies

Thesauri

Taxonomy

Ontology

Page 6:

Languages for aboutness

Indexing languages: Terminological tools

Thesauri (CV – controlled vocabulary)

Subject headings lists

Authority files for named entities (people, places, structures, organizations)

Classification / Classificatory systems

Keyword lists

Natural language systems (broad interpretation)

Page 7:

Term lists

Controlled but semi-unstructured list

Term List in practice

http://library.lib.asu.edu/search/y

Page 8:

Authority files

Standardization of names, subjects, and titles for easier identification and interoperability of information

Authority Files:

http://authorities.loc.gov/

Page 9:

Thesauri

Less-structured and structured thesauri

Lexical semantic relationships

Composed of indexing terms/descriptors

Descriptors - representations of concepts

Concepts - units of meaning

Page 10:

Thesaurus basics

Preferred terms vs. non-preferred terms

--ex. dress vs. clothing

Semantic relations between terms

--broader, narrower, related

How to apply terms (guidelines, rules)

Scope notes

Page 11:

Common thesaural identifiers

SN Scope Note Instruction, e.g. don’t invert phrases

USE Use (another term in preference to this one)

UF Used For

BT Broader Term

NT Narrower Term

RT Related Term

Page 12:

Controlled Vocabularies

(less structured thesauri also referred to as subject heading lists)

Library of Congress Subject Headings (LCSH)

Sears Subject Headings

Medical Subject Headings (MeSH)

http://www.nlm.nih.gov/mesh/MBrowser.html

Page 13:

Thesauri

Thesaurus in practice

ERIC

NBII

http://thesaurus.nbii.gov/portal/server.pt

NASA thesaurus

http://www.sti.nasa.gov/thesfrm1.htm

Page 14:

Taxonomy

First used by Carl von Linné (Linnaeus) for zoological classification.

A grouping of terms representing topics or subject categories. A taxonomy is typically structured so that its terms exhibit hierarchical relationships to one another, between broader and narrower concepts.

taxonomy == a subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy (Garshol 2004)

Page 15:

Ontology

In general (in the LIS domain):

a tool to help organize knowledge

a way to convey or represent a class (or classes) of things, and relationships among the class/es.

No exact definition…this comes from the community you are coming from

Page 16:

KOS used in Digital Libraries

Looked at 269 online digital libraries and collections

KOS used:

Locally developed taxonomy (113)

LCSH (78)

Author list (34)

Thesauri (26)

Alphabetical listing (20)

Geographic arrangement (16)

Shiri, A. and Chase-Kruszewski, S. (2009) Knowledge organization systems in North American digital library collections. Program: Electronic Library and Information Systems, 43 (2), pp. 121-139.

Page 17:

Discussion:

Think about your own organization.

What type of controlled vocabularies, thesauri, and ontologies does your organization use for everyday work?

How do these vocabulary choices help you meet the goals of your institution?

Page 18:

Organizing Knowledge

Organization Structures

Page 19:

Hodge's Types of Knowledge Organization Systems

Term Lists: Authority Files, Glossaries, Gazetteers, Dictionaries

Classifications and Categories: Subject Headings, Classification Schemes, Taxonomies, and Categorization Schemes

Relationship Lists: Thesauri, Semantic Networks, Ontologies

Hodge, G. (2000) Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. http://www.clir.org/pubs/abstract/pub91abst.html

Page 20:

McGuinness, D. L. (2003). Ontologies Come of Age. In Fensel et al., Spinning the Semantic Web. Cambridge: MIT Press, p. 175 (see also pp. 181 and 189).

Page 21:

Greenberg's Ontology Continuum

Classical view of ILS languages, from least to most structure: keyword lists, simple CV, thesauri/deeper thesauri, taxonomies, low-level ontologies (WordNet), full/intricate ontologies (OWL)

Page 22:

(http://jodi.tamu.edu/Articles/v04/i04/Smith/#section12)

Page 23:

http://www.semantic-conference.com

Page 24:

Section 2: Thesauri and what they represent

Page 25:

Examples of different types of "thesauri"

Cook's Thesaurus

http://www.foodsubs.com/

BZZURKK! Thesaurus of Champions

http://epe.lac-bac.gc.ca/100/200/300/ktaylor/kaboom/bzzurkk.htm

General Multilingual Environmental Thesaurus

http://www.eionet.europa.eu/gemet

Page 26:

Common thesaural identifiers

SN Scope Note Instruction, e.g. don’t invert phrases

USE Use (another term in preference to this one)

UF Used For

BT Broader Term

NT Narrower Term

RT Related Term

Page 27:

Syndetic Relationships

Hierarchical

Equivalent

Associative

Page 28:

Hierarchical

Level of generality – both preferred terms

BT (broader term)

Birthday cakes

BT Cakes

NT (narrower term)

Cakes

NT Birthday cakes

…remember inheritance
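A minimal Python sketch of the reciprocal BT/NT bookkeeping and of "inheritance" read as following BT links upward. The dictionaries, the function names, and the extra "Baked goods" level are illustrative assumptions, not part of the workshop material.

```python
from collections import defaultdict

# Hypothetical in-memory thesaurus: term -> set of related terms.
broader = defaultdict(set)
narrower = defaultdict(set)

def add_bt(term, broader_term):
    """Record 'term BT broader_term' together with the reciprocal NT entry."""
    broader[term].add(broader_term)
    narrower[broader_term].add(term)

def all_broader(term):
    """Collect every ancestor of a term by following BT links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in broader[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

add_bt("Birthday cakes", "Cakes")
add_bt("Cakes", "Baked goods")  # invented extra level, for illustration only

ancestors = all_broader("Birthday cakes")
```

Recording both directions in one call keeps the BT and NT entries from drifting apart, which is the consistency a hand-built thesaurus has to maintain manually.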

Page 29:

Equivalent

When two or more terms represent the same concept

One is the preferred term (descriptor), where all the information is collected

The other is the non-preferred and helps the user to find the appropriate term

Page 30:

Equivalent

• Non-preferred term USE Preferred term

– Biological diversification

USE Biodiversity

• Preferred term UF (used for) Non-preferred term

– Biodiversity

UF Biological diversification
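The USE/UF pair can be sketched as a lookup table plus its inverse. This is a small Python illustration only; the table and function names are assumptions introduced for the example.

```python
# Hypothetical USE table: non-preferred term -> preferred term.
use = {"Biological diversification": "Biodiversity"}

def used_for(use_table):
    """Derive the reciprocal UF listing (preferred term -> non-preferred terms)."""
    uf = {}
    for non_preferred, preferred in use_table.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return uf

def resolve(term, use_table):
    """Return the preferred term for an entry term; preferred terms map to themselves."""
    return use_table.get(term, term)

uf = used_for(use)
```

Here `resolve` plays the role of the USE reference: a searcher who enters the non-preferred term is led to the descriptor where the information is collected.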

Page 31:

Associative

One preferred term is related to another preferred term

Non-hierarchical

“See also” function

In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy

Page 32:

Associative

• Related Terms (RT) can be used to show these links within the thesaurus

– Bed

RT Bedding

– Paint Brushes

RT Painting

– Vandalism

RT Hostility

– Programming

RT Software

Page 33:

Exercise: Thesauri Building

• Montages

• Digital photographs

• Illustrations

• Pictures

• Photographic prints

• Drawings

• Photographs

• Daguerreotypes

• Negatives

Page 34:

Where to start:

Look at the overall offering

Determine the aboutness

Identify the "root" element or broadest term

Identify groups/categories of information

Start structuring based on the syndetic relations you know

Create hierarchies based on the semantic relations

Use the appropriate identifiers to show the relationships

Page 35:

Section 3: From Thesauri to SKOS

Page 36:

Simple Knowledge Organization Systems

Classical view of ILS languages, from least to most structure: keyword lists, simple CV, thesauri/deeper thesauri, taxonomies, low-level ontologies (e.g. WordNet), full/intricate ontologies (e.g. OWL)

SKOS

Page 37:

Example 1: web view of NBII entry

Page 39:

Descriptive Markup

"the markup is used to label parts of the document rather than to provide specific instructions as to how they should be processed. The objective is to decouple the inherent structure of the document from any particular treatment or rendition of it. Such markup is often described as "semantic"."

--from Wikipedia

Page 40:

Markup Languages

"is a system for annotating a text in a way which is syntactically distinguishable from that text."

Using tags:

<tag>content to be rendered</tag>

Or a keyword in brackets to distinguish texts

--from Wikipedia

Page 41:

HTML

Hypertext Markup Language

--language used to mark up webpages

--both descriptive and processing

Page 42:

HTML encoding

<!doctype html>

<html>

<head>

<title>Hello HTML</title>

</head>

<body>

<p>Hello World!</p>

</body>

</html>

Page 43:

NBII in HTML

<a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_9',null,[['synonym','Heterozygotes']]);">Heterozygotes</a></td>
<td class="valign"><table><tbody id="result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:tbody_element">
<tr class="odd"><td class="type">BT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:0:j_id_jsp_1679715049_18',null,[['synonym','Genotypes']]);">Genotypes</a></td></tr>
<tr class="even"><td class="type">NT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:1:j_id_jsp_1679715049_18',null,[['synonym','Carriers (genetics)']]);">Carriers (genetics)</a></td></tr>
<tr class="odd"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:2:j_id_jsp_1679715049_18',null,[['synonym','Heterozygosity']]);">Heterozygosity</a></td></tr>
<tr class="even"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:0:j_id_jsp_1679715049_14:3:j_id_jsp_1679715049_18',null,[['synonym','Homozygotes']]);">Homozygotes</a></td></tr>
<tr class="odd"><td class="type">SC</td><td class="synonym">LSC Life Sciences</td></tr>
</tbody></table></td></tr>
<tr class="even"><td class="valign"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_9',null,[['synonym','Homozygotes']]);">Homozygotes</a></td>
<td class="valign"><table><tbody id="result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:tbody_element">
<tr class="odd"><td class="type">BT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:0:j_id_jsp_1679715049_18',null,[['synonym','Genotypes']]);">Genotypes</a></td></tr>
<tr class="even"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:1:j_id_jsp_1679715049_18',null,[['synonym','Heterozygotes']]);">Heterozygotes</a></td></tr>
<tr class="odd"><td class="type">RT</td><td class="synonym"><a href="#" onclick="return oamSubmitForm4178('result','result:j_id_jsp_1679715049_7:1:j_id_jsp_1679715049_14:2:j_id_jsp_1679715049_18',null,[['synonym','Homozygosity']]);">Homozygosity</a></td></tr>
<tr class="even"><td class="type">SC</td><td class="synonym">LSC Life Sciences</td></tr>
</tbody></table></td></tr>
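The HTML view carries the thesaurus relationships only as table cells, so a program has to infer them from presentation classes. A rough Python sketch of that recovery, using the standard-library `html.parser`, might look like this. The `RelationParser` class and the shortened `sample` rows are assumptions for illustration; the rows keep only the `type`/`synonym` cell structure seen in the NBII markup.

```python
from html.parser import HTMLParser

class RelationParser(HTMLParser):
    """Collect (relation type, term) pairs from td.type / td.synonym cells."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # class of the <td> being read, if any
        self.pending_type = None   # last relation code seen (BT, NT, RT, ...)
        self.text = []
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_class = dict(attrs).get("class")
            self.text = []

    def handle_data(self, data):
        if self.current_class in ("type", "synonym"):
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag != "td":
            return
        value = "".join(self.text).strip()
        if self.current_class == "type":
            self.pending_type = value
        elif self.current_class == "synonym" and self.pending_type:
            self.relations.append((self.pending_type, value))
            self.pending_type = None
        self.current_class = None

# Shortened, hypothetical rows modeled on the markup above.
sample = (
    '<tr class="odd"><td class="type">BT</td>'
    '<td class="synonym"><a href="#">Genotypes</a></td></tr>'
    '<tr class="even"><td class="type">NT</td>'
    '<td class="synonym"><a href="#">Carriers (genetics)</a></td></tr>'
)

parser = RelationParser()
parser.feed(sample)
relations = parser.relations
```

The amount of state tracking needed here is the practical argument for moving to XML and then SKOS, where the relationship is named by the tag itself.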

Page 44:

XML

Extensible Markup Language

--Created by the World Wide Web Consortium (W3C).

--Used to mark up documents on the internet or electronic documents.

--Users get to describe the tags that are used and define how they are used.

Page 45:

XML encoding

Page 46:

NBII in XML

<CONCEPT>

<DESCRIPTOR>Zygotes</DESCRIPTOR>

<UF>Ookinetes</UF>

<BT>Ova</BT>

<NT>Oocysts</NT>

<RT>Hemizygosity</RT>

<RT>Reproduction</RT>

<RT>Zygosity</RT>

<SC>ASF Aquatic Sciences and Fisheries</SC>

<SC>LSC Life Sciences</SC>

<STA>Approved</STA>

<TYP>Descriptor</TYP>

<INP>2007-08-14</INP>

 <UPD>2007-08-14</UPD>

</CONCEPT>
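Because the XML tags name the relationships directly, reading the record is short. The following Python sketch uses the standard-library `xml.etree.ElementTree`; the `read_concept` helper and the shortened copy of the record are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Zygotes record, kept to the relationship elements.
record = """<CONCEPT>
  <DESCRIPTOR>Zygotes</DESCRIPTOR>
  <UF>Ookinetes</UF>
  <BT>Ova</BT>
  <NT>Oocysts</NT>
  <RT>Hemizygosity</RT>
  <RT>Reproduction</RT>
  <RT>Zygosity</RT>
</CONCEPT>"""

def read_concept(xml_text):
    """Return the descriptor plus its UF/BT/NT/RT values as lists."""
    root = ET.fromstring(xml_text)
    entry = {"descriptor": root.findtext("DESCRIPTOR")}
    for tag in ("UF", "BT", "NT", "RT"):
        entry[tag] = [element.text for element in root.findall(tag)]
    return entry

zygotes = read_concept(record)
```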

Page 47:

RDF

Resource Description Framework

“is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats”

--from Wikipedia

Page 48:

RDF data model

is similar to Entity-Relationship or Class diagrams

makes statements about resources in subject-predicate-object expressions called "triples"

subject = resource

predicate = a trait or aspect of the resource; it expresses a relationship between the subject and the object

Page 49:

The sky has the color blue

RDF triple:

a subject denoting "the sky"

a predicate denoting "has the color"

an object denoting "blue"
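As a rough illustration, the sky triple can be held in a three-field record and written out as one N-Triples line. The `Triple` type, the `to_ntriples` helper, and the `example.org` URIs are placeholders invented for this sketch.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI naming the relationship
    obj: str        # here a plain literal value

# The example.org URIs are placeholders, not real identifiers.
sky = Triple("http://example.org/sky", "http://example.org/hasColor", "blue")

def to_ntriples(triple):
    """Serialize a triple with URI subject/predicate and a plain literal object."""
    return f'<{triple.subject}> <{triple.predicate}> "{triple.obj}" .'

line = to_ntriples(sky)
```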

Page 50:

OWL

Web Ontology Language

--knowledge representation language for expressing ontologies, grounded in formal logic

Page 51:

SKOS

Family of languages used to describe thesauri, controlled vocabulary, subject headings, and taxonomies.

Page 52:

NBII in SKOS/RDF

<rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Zygotes">

<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>

<skos:inScheme rdf:resource="http://thesaurus.nbii.gov/nbii#conceptScheme"/>

<skos:altLabel>Ookinetes</skos:altLabel>

<skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Ova"/>

<skos:narrower rdf:resource="http://thesaurus.nbii.gov/nbii#Oocysts"/>

<skos:prefLabel>Zygotes</skos:prefLabel>

<skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Hemizygosity"/>

<skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Reproduction"/>

<skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Zygosity"/>

<skos:scopeNote>ASF Aquatic Sciences and Fisheries LSC Life Sciences</skos:scopeNote>

</rdf:Description>
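The SKOS/RDF record can be read with the same standard-library XML parser, provided the fragment sits inside an `rdf:RDF` element that declares the `rdf` and `skos` namespaces. The wrapper element, the shortened record, and the `local_name` helper below are assumptions added for this Python sketch; a dedicated RDF library would be the usual choice for real data.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Shortened record wrapped in an rdf:RDF element so the prefixes resolve.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Zygotes">
    <skos:prefLabel>Zygotes</skos:prefLabel>
    <skos:altLabel>Ookinetes</skos:altLabel>
    <skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Ova"/>
    <skos:narrower rdf:resource="http://thesaurus.nbii.gov/nbii#Oocysts"/>
  </rdf:Description>
</rdf:RDF>"""

# ElementTree stores namespaced attributes as "{namespace}name".
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

def local_name(uri):
    """Return the part of a concept URI after the '#'."""
    return uri.rsplit("#", 1)[-1]

root = ET.fromstring(doc)
concept = root.find("rdf:Description", NS)
pref_label = concept.findtext("skos:prefLabel", namespaces=NS)
alt_labels = [el.text for el in concept.findall("skos:altLabel", NS)]
broader = [local_name(el.get(RDF_RESOURCE)) for el in concept.findall("skos:broader", NS)]
```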

Page 53:

Basic SKOS Tags

skos:Concept

skos:prefLabel

skos:altLabel

skos:broader

skos:narrower

skos:related

Page 54:

SKOS tags

• SN Scope Note = skos:scopeNote

• USE Use = skos:prefLabel

• UF Used For = skos:altLabel

• BT Broader Term = skos:broader

• NT Narrower Term = skos:narrower

• RT Related Term = skos:related

Each entry term has a skos:Concept
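The mapping above can be sketched as a lookup table applied to one thesaurus entry. In this Python illustration the `SKOS_FOR` table and the `to_skos` helper are assumptions. USE is left out of the table because a USE reference sits on the non-preferred term; in this sketch the descriptor itself supplies skos:prefLabel.

```python
# Thesaurus identifier -> SKOS property, following the mapping on the slide.
SKOS_FOR = {
    "SN": "skos:scopeNote",
    "UF": "skos:altLabel",
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}

def to_skos(descriptor, relations):
    """Turn (identifier, value) pairs for one descriptor into (property, value) pairs."""
    statements = [("rdf:type", "skos:Concept"), ("skos:prefLabel", descriptor)]
    for identifier, value in relations:
        statements.append((SKOS_FOR[identifier], value))
    return statements

zygotes_skos = to_skos("Zygotes", [("UF", "Ookinetes"), ("BT", "Ova"), ("NT", "Oocysts")])
```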

Page 55:

Terms vs. Concepts?

Example: Table

Lexical level : Table

Conceptual level :

Page 56:

What is a SKOS Concept?

Zygotes

BT Ova

NT Oocysts

RT Hemizygosity

RT Reproduction

RT Zygosity

UF Ookinetes

All these relationships make up a SKOS concept

Page 57:

Projects Using SKOS:

Library of Congress

http://id.loc.gov/authorities/search/

Europeana

http://www.europeana.eu/portal/

HIVE

http://ils.unc.edu/mrc/hive/

Page 58:

EXPERIMENTING WITH SKOS

Instructions: SKOS tags can easily be mapped to identifiers found in traditional thesauri. For this activity, try mapping basic SKOS tags to a TGM: Subject Terms excerpt.

Page 59:

Section 4: From SKOS to HIVE

Page 60:

Overview

• HIVE: Helping Interdisciplinary Vocabulary Engineering

• Motivation: Dryad repository

• HIVE: Goals, status, and design

• A scenario

• Usability

• Conclusion and questions

Page 61:

HIVE model

<AMG> approach for integrating discipline CVs

Model addressing CV cost, interoperability, and usability constraints (interdisciplinary environment)

Page 62:

Motivation

Page 64:

Partner Journals

American Society of Naturalists: American Naturalist

Ecological Society of America: Ecology, Ecological Letters, Ecological Monographs, etc.

European Society for Evolutionary Biology: Journal of Evolutionary Biology

Society for Integrative and Comparative Biology: Integrative and Comparative Biology

Society for Molecular Biology and Evolution: Molecular Biology and Evolution

Society for the Study of Evolution: Evolution

Society for Systematic Biology: Systematic Biology

Commercial journals: Molecular Ecology; Molecular Phylogenetics and Evolution

Page 65:

Dryad’s workflow ~ low burden submission

Page 66:

Vocabulary needs for Dryad

• Vocabulary analysis – 600 keywords, Dryad partner journals

• Vocabularies: NBII Thesaurus, LCSH, the Getty’s TGN, ERIC Thesaurus, Gene Ontology, ITIS (10 vocabularies)

• Facets: taxon, geographic name, time period, topic, research method, genotype, phenotype…

• Results

– 431 topical terms, exact matches: NBII Thesaurus, 25%; MeSH, 18%

– 531 terms (research method and taxon): LCSH, 22% found exact matches, 25% partial

• Conclusion: Need multiple vocabularies

Page 67:

Goals, status, and design

Page 68:

HIVE... as a solution

• Address CV (controlled vocabulary) cost, interoperability, and usability constraints

• COST: Expensive to create, maintain, and use

• INTEROPERABILITY: Developed in silos (structurally and intellectually)

• USABILITY: Interface design and functionality limitations have been well documented

Page 69:

HIVE Goals

• Automatic metadata generation approach that dynamically integrates discipline-specific controlled vocabularies encoded with the Simple Knowledge Organization System (SKOS)

• Provide efficient, affordable, interoperable, and user friendly access to multiple vocabularies during metadata creation activities

• A model that can be replicated -> model and service

Three phases of HIVE:

1. Building HIVE

- Vocabulary preparation

- Server development

- Primate Life Histories Working Group

- Wood Anatomy and Wood Density Working Group

2. Sharing HIVE

- Continuing education (empowering information professionals)

3. Evaluating HIVE

- Examining HIVE in Dryad

Page 70:

HIVE Partners

Vocabulary Partners

Library of Congress: LCSH

the Getty Research Institute (GRI): TGN (Thesaurus of Geographic Names)

United States Geological Survey (USGS): NBII Thesaurus, Integrated Taxonomic Information System (ITIS)

Agrovoc Thesaurus

Advisory Board

Jim Balhoff, NESCent

Libby Dechman, LCSH

Mike Frame, USGS

Alistair Miles, Oxford, UK

William Moen, University of North Texas

Eva Méndez Rodríguez, University Carlos III of Madrid

Joseph Shubitowski, Getty Research Institute

Ed Summers, LCSH

Barbara Tillett, Library of Congress

Kathy Wisser, Simmons

Lisa Zolly, USGS

WORKSHOP HOSTS: Columbia Univ.; Univ. of California, San Diego; Univ. of North Texas; Universidad Carlos III de Madrid, Madrid, Spain

Page 71:

HIVE Construction

• HIVE stores millions of concepts from different vocabularies, and makes them available on the Web via simple HTTP

– Vocabularies are imported into HIVE using SKOS/RDF format

• HIVE is divided into two different modules:

1. HIVE Core

– SKOS/RDF storage and management (SESAME/Elmo)

– SMART HIVE: Automatic Metadata Extraction and Topic Detection (KEA++)

– Concept Retrieval (Lucene)

2. HIVE Web

– Web user interface (GWT, Google Web Toolkit)

– Machine-oriented interface (SOAP and REST)

Page 76:

A scenario

HIVE for scientists, depositors

HIVE for information professionals: curators, professional librarians, archivists, museum catalogers

Page 77:

Meet Amy

Amy Zanne is a botanist.

Like every good scientist, she publishes.

Page 79:

Amy

• Amy Zanne is a botanist.

• Like every good scientist, she publishes.

• She deposits data in Dryad.

Page 80:

Dryad’s workflow ~ low burden submission

Page 85:

Usability

LS and IS students (32 students)

- Understanding HIVE: 3.8 on 5 pt. scale

- Ease of navigation: 4.5

- Concept cloud a good idea: 3.3

- Represent document accurately: 2.0 (simple HIVE), 3.3 (smart HIVE)

Advisory board (10 members)

- Systems/technical folks want integration w/systems, Getty - EAD

- Librarians/KO folks want to see term relationships

- Like tag cloud, want relevance percentages

- Color, placement of box, labels..

White 2009-2010; HIVE Team 2009-2010

Page 86:

Usability

Huang, 2010

Page 87:

System usability and flow metrics

Huang, 2010

Page 88:

Challenges

Building vs. doing/analysis

• Source for HIVE generation, beyond abstracts

Combining many vocabularies during the indexing/term matching phase is difficult, time consuming, inefficient.

• NLP and machine learning offer promise

Interoperability = dumbing down ontologies

Proof-of-concept / illustrate the differences between HIVE and other vocabulary registries (NCBO and OBO Foundry)

General large team logistics, and having people from multiple disciplines (also the ++)

Page 89:

Summary and next steps

Open source, customizable, SKOS, + hybrid metadata generation

Research and evaluation

Team project relating to Dryad

Hollie White: dissertation

Lesley Skalla: master’s paper

Craig Willis: MeSH/SKOS conversion

Curator interface design

Workshop evaluation

User’s and developer’s groups on “Google Groups”

• Long Term Ecological Research (LTER) Network (http://www.lternet.edu/)

Page 90:

Exploring HIVE

http://hive.nescent.org

Page 91:

Questions / Comments

Hollie White

hcwhite1@email.unc.edu

Ryan Scherle

Jane Greenberg

janeg@email.unc.edu

Craig Willis