www.ontopedia.net O N T O P E D I A The Identity of Everything Creating Topic Maps + Topic Maps and Knowledge Organization Steve Pepper [email protected] Oslo University College, 2007-09-15


www.ontopedia.net

O N T O P E D I A
The Identity of Everything

Creating Topic Maps
+ Topic Maps and Knowledge Organization

Steve Pepper [email protected]

Oslo University College, 2007-09-15


Course agenda

Week 37 – 09-08 Introduction to Topic Maps – Part 1
Week 38 – 09-15 Creating a topic map
Week 39 – 09-22 Introduction to Topic Maps – Part 2
Week 42 – 10-13 Ontology-driven editing
Week 43 – 10-20 The machinery of Topic Maps
Week 46 – 11-10 (Semantic Web)
Week 48 – 11-24 (Ontologies)

Terminology:
– Topic Maps: The technology and the standard
– topic maps: The artefacts (documents) we create


Today’s agenda

Quick recap: basic concepts and building blocks

Topic Maps and Knowledge Organization
– Metadata, taxonomies, thesauri, faceted classification

Interchange syntaxes
– XTM, LTM and CTM

Demo: Creating a topic map using LTM
– Pay close attention...


Recap: Core concepts

A pool of information or data, and a knowledge layer consisting of

• Topics
– a set of topics representing the key subjects of the domain in question

• Associations
– representing relationships between subjects

• Occurrences
– links to information that is somehow relevant to a given subject

= The TAO of Topic Maps

[Diagram: a knowledge layer above an information layer. Topics: Puccini, Tosca, Lucca, Madame Butterfly. Association labels: "composed by" (twice), "born in".]


Recap: Basic building blocks

Basic building blocks are
– Topics: e.g. “Puccini”, “Lucca”, “Tosca”
– Associations: e.g. “Puccini was born in Lucca”
– Occurrences: e.g. “http://www.opera.net/puccini/bio.html is a biography of Puccini”

Each of these constructs can be typed
– Topic types: “composer”, “city”, “opera”
– Association types: “born in”, “composed by”
– Occurrence types: “biography”, “street map”, “synopsis”
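The three building blocks and their types can be sketched as plain data structures. This is only an illustration: the class names, field names and the `related` helper are assumptions made for this sketch, not part of the Topic Maps standard or of any Topic Maps engine.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Topic:
    id: str
    name: str
    types: List[str] = field(default_factory=list)   # topic types, e.g. "composer"

@dataclass
class Association:
    type: str            # association type, e.g. "born-in"
    players: List[str]   # ids of the topics taking part

@dataclass
class Occurrence:
    topic: str           # id of the topic the resource is relevant to
    type: str            # occurrence type, e.g. "biography"
    locator: str         # address of the information resource

topics = {
    "puccini": Topic("puccini", "Puccini", ["composer"]),
    "lucca": Topic("lucca", "Lucca", ["city"]),
    "tosca": Topic("tosca", "Tosca", ["opera"]),
}
associations = [
    Association("born-in", ["puccini", "lucca"]),
    Association("composed-by", ["tosca", "puccini"]),
]
occurrences = [
    Occurrence("puccini", "biography", "http://www.opera.net/puccini/bio.html"),
]

def related(topic_id):
    """Return (association type, other player) pairs for one topic."""
    return [(a.type, p)
            for a in associations if topic_id in a.players
            for p in a.players if p != topic_id]
```

Calling `related("puccini")` walks the associations and returns the pairs for Lucca and Tosca, which is the kind of navigation the knowledge layer is meant to support.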


Topic Maps and Knowledge Organization

Keywords & controlled vocabularies
Taxonomies, thesauri & classifications
Indexes & glossaries
Ontologies


Bibliographic languages

Work language
– Author language
– Title language
– Edition language
– Subject language
  – Classification language
  – Index language

Document language
– Production language
– Carrier language
– Location language

Svenonius, Elaine (2000): The Intellectual Foundation of Information Organization. Cambridge, MA: MIT Press (p.54)

Work languages
– “Work languages describe information entities, their intellectual (as opposed to physical) attributes, and relationships among them.” (p.87)

Document languages
– “A document is a particular space-time embodiment of information: a document language describes and provides access to this embodiment.” (p.107)

Subject languages
– “A subject language is used to depict what a document is about.” (p.127)


Two perspectives

Works have tended to be conflated with documents
– So in practice there have been two kinds of language

Document languages
– describe the work and its manifestations
– document-centric (or resource-centric), e.g.
  – document metadata (Dublin Core)
  – bibliographic records (MARC)

Subject languages
– describe the subject space in which the work exists
– subject-centric, e.g.
  – thesauri, taxonomies (ICD)
  – classification schemes (LCSH, DDC)
  – faceted classification (Colon)


Metadata

“Data about data”
– Information about documents
– e.g. author, title, publisher, date, format, keywords

Useful for managing the content
– Especially suitable for librarians

Somewhat useful for searching
– Especially for experts

Less useful for end-users
– the user starts out wanting to know more about a subject
– traditional metadata, however, focuses on the document
– if aboutness is provided at all, it gets squeezed into a single field

Example record:
Title: Creating Topic Maps
Author: Steve Pepper
Date: 2007-09-13
Format: appl/ppt
Keywords: topic maps, syntax, knowledge organization


Keywords

Primitive form of subject-based classification
– The keywords are used to describe the subject
– Cheap and simple… Folksonomies and tagging.

But also problematic because authors
– misspell keywrods,
– use different keywords/terms/tags for the same thing, and
– use keywords that make no sense

Secondary problem
– No way for the user to find out what keywords have been used

A keyword is a topic name


Controlled vocabularies

Solution: create a list of legal keywords!
– Requires somewhere to keep the list, and a process for new terms

Benefits
– Solves problems of misspelling and duplicates (synonyms)

Disadvantages
– Introduces some overhead (a flat list is difficult to manage)
– Users can still search using the wrong terms
– Users (and authors) still have difficulty finding terms

A controlled vocabulary is a well-defined set of topics with one name per topic
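A minimal sketch of how a controlled vocabulary might be enforced when a document is tagged. The vocabulary contents and the `check_keywords` function are assumptions for illustration; the legal terms are borrowed from the metadata example earlier in these slides.

```python
# Hypothetical list of legal keywords (one name per topic).
CONTROLLED_VOCABULARY = {"topic maps", "syntax", "knowledge organization"}

def check_keywords(keywords):
    """Split proposed keywords into those on the list and those that are not."""
    accepted, rejected = [], []
    for keyword in keywords:
        term = keyword.strip().lower()   # normalise case and surrounding spaces
        if term in CONTROLLED_VOCABULARY:
            accepted.append(term)
        else:
            rejected.append(term)
    return accepted, rejected
```

A misspelt keyword ends up in the rejected list instead of silently entering the index. The check does nothing for a user who searches with a term that was never on the list, which is the disadvantage noted above.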


Taxonomies

Organize the keywords into a tree
– Most general at the top, more specific further down
– Common structure used by Yahoo!, etc.
– The folder metaphor: file systems, email, favourites

Requires relationships between terms
– Relationships state that one term is more specific than another
– Advantage: terms somewhat easier to find
– Disadvantage: the real world does not fit neatly into a hierarchy

A taxonomy is a set of topics related through a specific type of hierarchical association
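A taxonomy can be sketched as a single "broader term" relationship between terms. The terms below and the `ancestors` helper are hypothetical and exist only to show the one hierarchical association type at work.

```python
# Each term points to its single broader term (hypothetical example data).
BROADER = {
    "opera": "music",
    "music": "culture",
    "football": "sport",
    "cricket": "sport",
}

def ancestors(term):
    """Follow the broader-term links from a term up to the top of the tree."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain
```

Because every term has at most one broader term, a subject that belongs in two places has to be forced into one of them, which illustrates the disadvantage listed above.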


Thesauri

Like a taxonomy, but with some extensions
– Also better defined: there are ISO standards for thesauri

Relationship types:
– BT Broader term
– NT Narrower term
– USE Preferred term
– UF Non-preferred terms
– RT Related term
– SN Scope note

A thesaurus is a set of topics related through particular, predefined association types
– BT/NT (hierarchical) and RT (untyped, associative)
– (Scope notes are a kind of occurrence)
– (USE and UF represent multiple names for the same concept/topic)
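The mapping from thesaurus relationships to topic map constructs can be sketched as follows. The entries, the dictionary layout and both function names are assumptions for illustration and are not taken from any real thesaurus or standard.

```python
# Hypothetical thesaurus entries keyed by preferred term.
THESAURUS = {
    "opera": {"BT": ["music"], "NT": [], "RT": ["libretto"],
              "UF": ["lyric drama"], "SN": "Staged drama set to music."},
    "music": {"BT": [], "NT": ["opera"], "RT": [], "UF": [], "SN": ""},
}

def preferred_term(term):
    """Resolve a term to its preferred form (USE), or None if it is unknown."""
    if term in THESAURUS:
        return term
    for preferred, entry in THESAURUS.items():
        if term in entry["UF"]:
            return preferred
    return None

def to_topic(term):
    """View one entry as a topic with names, associations and occurrences."""
    entry = THESAURUS[term]
    associations = (
        [("broader-term", t) for t in entry["BT"]]
        + [("narrower-term", t) for t in entry["NT"]]
        + [("related-term", t) for t in entry["RT"]]
    )
    occurrences = [("scope-note", entry["SN"])] if entry["SN"] else []
    return {
        "names": [term] + entry["UF"],   # USE and UF are names of one topic
        "associations": associations,
        "occurrences": occurrences,
    }
```

Here `preferred_term("lyric drama")` resolves to "opera", and `to_topic("opera")` carries the non-preferred term as a second name and the scope note as an occurrence.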


Faceted classification

Invented by S. R. Ranganathan in the 1930s
– Defines a number of facets or dimensions
– Defines a set of terms within each facet
– Sometimes these terms are arranged in a taxonomy
– Documents are classified against each facet separately

A faceted classification is a collection of topic “hierarchies”
– Each “hierarchy” contains topics whose names are used as terms within a particular facet
– XFML: An XML interchange syntax for faceted classification inspired by Topic Maps
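Classifying documents against each facet separately can be sketched like this. The facet names, the documents and the `find` function are hypothetical and serve only to show that the facets are independent dimensions.

```python
# Hypothetical documents, each classified once per facet.
DOCUMENTS = {
    "tosca-synopsis": {"genre": "opera", "country": "italy"},
    "lucca-street-map": {"genre": "map", "country": "italy"},
    "oslo-guide": {"genre": "guide", "country": "norway"},
}

def find(**facets):
    """Return the documents that match every requested facet value."""
    return sorted(
        doc for doc, classification in DOCUMENTS.items()
        if all(classification.get(facet) == value
               for facet, value in facets.items())
    )
```

A query may constrain one facet (`find(country="italy")`) or several at once (`find(genre="opera", country="italy")`), and each constraint narrows the result independently of the others.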


Expressivity progression

Topic maps
– use any types, properties, and relationships you like

Faceted classification
– multiple vocabularies, taxonomies or thesauri (one per facet)

Thesauri
– more formal taxonomy; still no topic types; two association types

Taxonomy
– terms arranged in a hierarchy; no topic types; single association type

Controlled vocabulary, folksonomies
– just a list of terms; no topic types; no associations

[Diagram labels alongside the list, top to bottom: open model, fixed model, no model]


Document-centric approaches

Traditional metadata is document-centric
– Provides substantial descriptive power for documents
– Allows connection into subject-based classification
– Crucial for the management of content
– However, users are most interested in the subjects

Taxonomies, thesauri, and faceted classification are also document-centric
– These are methods for subject-based classification
– They provide hardly any descriptive power for subjects


Subject-centric approaches

Topic maps are subject-centric
– They provide great descriptive power for subjects
– Good as finding aids, because subjects are what users care about

Documents can be treated as subjects
– This enables topic maps to capture metadata as well
– It also enables topic maps to stitch metadata and subject-based classification together into one seamless whole

Topic Maps is the knowledge model par excellence:
– A subject-centric knowledge model that encompasses every other kind of knowledge organization model
– Topic Maps can therefore be used to relate and combine taxonomies, indexes, thesauri, classifications, etc.


Syntaxes

XTM, LTM and CTM

What are they?

When should I use which?


Topic Maps Syntaxes

HyTM (HyTime Topic Maps)
– Original syntax, expressed in terms of SGML and HyTime
– No longer part of ISO 13250

XTM (XML Topic Maps Syntax)
– Later, XML-based syntax, recently moved to version 2.0
– Easy to understand but very verbose

LTM (Linear Topic Map Notation)
– Defined by Ontopia in 2001 and supported by other products
– A simple ASCII syntax for rapid prototyping

CTM (Compact Topic Maps Syntax)
– ISO standard replacement for LTM
– Complete draft exists, but no implementations yet


Topic Map – XTM 1.0 Syntax

<!ELEMENT topicMap ( topic | association | mergeMap )* >
<!ATTLIST topicMap
  id          ID     #IMPLIED
  xmlns       CDATA  #FIXED 'http://www.topicmaps.org/xtm/1.0/'
  xmlns:xlink CDATA  #FIXED 'http://www.w3.org/1999/xlink'
  xml:base    CDATA  #IMPLIED >

<?xml version="1.0" encoding="ISO-8859-1"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- topics, associations, and mergeMap elements go here -->
</topicMap>


Topic Map – LTM Syntax

/* topics, associations, and occurrences go here */


Topic – XTM 1.0 Syntax

<!ELEMENT topic ( instanceOf*, subjectIdentity?, ( baseName | occurrence )* )>
<!ATTLIST topic id ID #REQUIRED>

<topic id="italy">
  ...
</topic>

<topic id="puccini">
  ...
</topic>


Topic – LTM Syntax

[topic-id]

[italy]

[puccini]


Topic Name – XTM 1.0 Syntax (1 of 2)

<!ELEMENT baseName ( scope?, baseNameString, variant* ) >
<!ATTLIST baseName id ID #IMPLIED >

<!ELEMENT baseNameString ( #PCDATA ) >
<!ATTLIST baseNameString id ID #IMPLIED >

<!ELEMENT variant ( parameters, variantName?, variant* ) >
<!ATTLIST variant id ID #IMPLIED >

<!ELEMENT variantName ( resourceRef | resourceData ) >
<!ATTLIST variantName id ID #IMPLIED>


Topic Name – XTM 1.0 Syntax (2 of 2)

<topic id="la-boheme">
  <baseName>
    <baseNameString>La Bohème</baseNameString>
    <variant>
      <parameters>
        <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
      </parameters>
      <variantName>
        <resourceData>Bohème, La</resourceData>
      </variantName>
    </variant>
  </baseName>
</topic>


Topic Name – LTM Syntax

[topic-id = basename; sortname?; dispname?]

[la-boheme = "La Bohème"; "Bohème, La"]


Topic Type – XTM 1.0 Syntax

Use <instanceOf> subelement

<topic id="opera">
  ...
</topic>

<topic id="tosca">
  <instanceOf>
    <topicRef xlink:href="#opera"/>
  </instanceOf>
</topic>

<topic id="boito">
  <instanceOf>
    <topicRef xlink:href="#composer"/>
  </instanceOf>
  <instanceOf>
    <topicRef xlink:href="#librettist"/>
  </instanceOf>
</topic>


Topic Type – LTM Syntax

[topic-id : topic-type]

[tosca : opera]

[boito : composer librettist]


Occurrence – XTM 1.0 Syntax

Use <occurrence> subelement
– external/internal resources: <resourceRef> or <resourceData>

<!ELEMENT occurrence ( instanceOf?, scope?, ( resourceRef | resourceData ) )>
<!ATTLIST occurrence id ID #IMPLIED>

<topic id="la-boheme">
  <occurrence>
    <instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
    <resourceRef xlink:href="http://www.opera.it/Opere/La-Boheme/La-Boheme.html"/>
  </occurrence>
  <occurrence>
    <instanceOf><topicRef xlink:href="#premiere-date"/></instanceOf>
    <resourceData>1896 (1 Feb)</resourceData>
  </occurrence>
</topic>


Occurrence – LTM Syntax

{topic-id, occurrence-type, [URL | data]}

{la-boheme, homepage, "http://www.opera.it/Opere/La-Boheme/La-Boheme.html"}

{la-boheme, premiere-date, [[1896 (1 Feb)]]}


Topic – Complete XTM 1.0 Syntax

<topic id="la-boheme">
  <instanceOf><topicRef xlink:href="#opera"/></instanceOf>
  <baseName>
    <baseNameString>La Bohème</baseNameString>
    <variant>
      <parameters>
        <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
      </parameters>
      <variantName><resourceData>Boheme, La</resourceData></variantName>
    </variant>
  </baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
    <resourceRef xlink:href="http://www.opera.it/Opere/La-Boheme/La-Boheme.html"/>
  </occurrence>
  <occurrence>
    <instanceOf><topicRef xlink:href="#premiere-date"/></instanceOf>
    <resourceData>1896 (1 Feb)</resourceData>
  </occurrence>
</topic>


Topic – Complete LTM Syntax

[la-boheme : opera = "La Bohème"; "Boheme, La"]

{la-boheme, homepage, "http://www.opera.it/Opere/La-Boheme/La-Boheme.html"}

{la-boheme, premiere-date, [[1896 (1 Feb)]]}
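When topic data already lives in a program, LTM lines of this shape can be generated rather than typed. The sketch below is an assumption-laden illustration: `ltm_topic` and `ltm_occurrence` are hypothetical helpers, they cover only the constructs shown on this slide, and they do not escape quotes or brackets inside values.

```python
def ltm_topic(topic_id, types, name, sort_name=None):
    """Format a topic as [id : types = "name"; "sortname"]."""
    header = topic_id
    if types:
        header += " : " + " ".join(types)
    header += f' = "{name}"'
    if sort_name:
        header += f'; "{sort_name}"'
    return f"[{header}]"

def ltm_occurrence(topic_id, occurrence_type, value, is_url):
    """Format an occurrence; URLs are quoted, inline data uses [[...]]."""
    body = f'"{value}"' if is_url else f"[[{value}]]"
    return "{" + f"{topic_id}, {occurrence_type}, {body}" + "}"

lines = [
    ltm_topic("la-boheme", ["opera"], "La Bohème", "Boheme, La"),
    ltm_occurrence("la-boheme", "homepage",
                   "http://www.opera.it/Opere/La-Boheme/La-Boheme.html", True),
    ltm_occurrence("la-boheme", "premiere-date", "1896 (1 Feb)", False),
]
print("\n".join(lines))
```

The three generated lines correspond to the three LTM statements above, which shows how little syntax LTM needs compared with the equivalent XTM.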


Association – XTM 1.0 Syntax

<!ELEMENT association (instanceOf?, scope?, member+)>
<!ATTLIST association id ID #IMPLIED>
<!ELEMENT member (roleSpec?, (topicRef | ...)+) >
<!ATTLIST member id ID #IMPLIED>
<!ELEMENT roleSpec (topicRef | ...) >

<association>
  <instanceOf><topicRef xlink:href="#composed-by"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#composer"/></roleSpec>
    <topicRef xlink:href="#puccini"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#work"/></roleSpec>
    <topicRef xlink:href="#tosca"/>
  </member>
</association>


Association – LTM Syntax

assoc-type ( role-player, role-player, ... )

composed-by( puccini , tosca )

Note 1: There can be more than two role-players in an association. We’ll talk about that next week.

Note 2: The above is an oversimplification due to the fact that we have not yet talked about role types. We’ll do that next week.

The exact syntax should be as follows:

assoc-type ( role-player : role-type, role-player : role-type, ... )

composed-by( puccini : composer, tosca : work )

When omitted, the role type will be assumed to be identical to the type of the role-playing topic. This can be a useful short-hand and we will use it for now, but it is not always what you want...
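The short and full association forms differ only in whether each role player carries a role type. The helper below is a hypothetical sketch that formats both forms; it does not model what an LTM processor does when the role type is left out.

```python
def ltm_association(assoc_type, *roles):
    """Format an LTM association.

    Each role is either a player id (short form) or a
    (player id, role type) tuple (full form).
    """
    parts = []
    for role in roles:
        if isinstance(role, tuple):
            player, role_type = role
            parts.append(f"{player} : {role_type}")
        else:
            parts.append(role)
    return f"{assoc_type}( {', '.join(parts)} )"

short_form = ltm_association("composed-by", "puccini", "tosca")
full_form = ltm_association("composed-by",
                            ("puccini", "composer"), ("tosca", "work"))
```

Writing the role types out, as in `full_form`, keeps the meaning of each player explicit and avoids relying on the default described above.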


Subject Identity – XTM 1.0 Syntax

<!ELEMENT topic (instanceOf*, subjectIdentity?,...)>
<!ELEMENT subjectIdentity (resourceRef?, (topicRef | subjectIndicatorRef)*) >

<!-- Refer to a resource as subject: -->
<topic id="foo">
  <subjectIdentity>
    <resourceRef xlink:href="http://www.ontopia.net"/>
  </subjectIdentity>
  <baseName>
    <baseNameString>The Ontopia Website</baseNameString>
  </baseName>
</topic>

<!-- Refer to a subject indicator: -->
<topic id="bar">
  <subjectIdentity>
    <subjectIndicatorRef xlink:href="http://www.ontopia.net/about.html"/>
  </subjectIdentity>
  <baseName>
    <baseNameString>Ontopia</baseNameString>
  </baseName>
</topic>


Subject Identity – LTM Syntax

[topic-id = names %subject-address-URL]
[topic-id = names @subject-indicator-URL]

/* Refer to a resource as subject: */
[foo = "The Ontopia Website" %"http://www.ontopia.net"]

/* Refer to a subject indicator: */
[bar = "Ontopia" @"http://www.ontopia.net/about.html"]


Scope – XTM 1.0 Syntax

<!-- "scope" subelements on baseName, occurrence, and association (also "parameters" on variantName) -->

<topic id="composed-by">
  <baseName>
    <baseNameString>composed by</baseNameString>
  </baseName>
  <baseName>
    <scope><topicRef xlink:href="#composer"/></scope>
    <baseNameString>composer of</baseNameString>
  </baseName>
</topic>

<topic id="la-boheme2">
  <baseName>
    <baseNameString>La Bohème (Leoncavallo)</baseNameString>
  </baseName>
  <baseName>
    <scope><topicRef xlink:href="#leoncavallo"/></scope>
    <baseNameString>La Bohème</baseNameString>
  </baseName>
</topic>


Scope – LTM syntax

(name or occurrence or association) / scoping-topic(s)

[composed-by = "composed by" = "composer of" / composer ]

[la-boheme1 = "La Bohème (Puccini)" = "La Bohème" / puccini ]

[la-boheme2 = "La Bohème (Leoncavallo)" = "La Bohème" / leoncavallo ]


Demo: Creating a topic map


Home assignment

1. Prerequisites
– You have installed Java and the OKS Samplers
– You know the basics of LTM: http://www.ontopia.net/download/ltm.html

2. Create your first topic map
– Decide what domain you want to cover
– Write LTM in a text editor (Notepad, TextPad, emacs, ...)
– Keep it in its own directory
– Copy to .../apache-tomcat/webapps/omnigator/WEB-INF/topicmaps for testing in the Omnigator
– Use Reload function


Your own topic map

Choose something that really interests you
– It’s much more fun than something boring!

Some ideas:
– Sport (football, cricket, ...)
– Culture (music, film, literature, theatre, ...)
– Study courses
– Project management
– Conference website
– Languages
– Geography

This first topic map is your own personal one
– The next one will be a group project for term assessment

Requirements:
– Minimum 4 topic types, 4 association types, 4 occurrence types
– Minimum 10 topics, 20 associations, 10 occurrences
– Send to [email protected] by Monday 29 September


Next lecture

Monday September 22
Same time, same place

Agenda
– Advanced features (roles, scope, identity, reification)
– Help with home assignment