introduction to the semantic webelite.polito.it/files/courses/01lhv/2009/1-knowledge.pdf ·...
TRANSCRIPT
Introduction tothe Semantic Web
F. Corno, L. Farinetti - Politecnico di Torino 2
Semantic Web
Web second generationWeb 3.0
“Conceptual structuring of the Web in an explicit machine-readable way” (Tim Berners-Lee)In other words…
…let the machine do most of the work!!!
http://www.w3.org/2001/sw/
F. Corno, L. Farinetti - Politecnico di Torino 3
“Official” introductionThe Semantic Web is a web of data. There is lots of data we all use every day, and its not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?Why not? Because we don’t have a web of data. Because data is controlled by applications, and each application keeps it to itself
F. Corno, L. Farinetti - Politecnico di Torino 4
Example
F. Corno, L. Farinetti - Politecnico di Torino 5
“Official” introductionThe Semantic Web is about two thingsIt is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents.It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing
F. Corno, L. Farinetti - Politecnico di Torino 6
An example …
How can a machine distinguish the meanings … ?
“I am a professor of computer science.”
“I am a professor of computer science, you may think. Well,…”
F. Corno, L. Farinetti - Politecnico di Torino 7
Key principlesThe Semantic Web is the Web
Same base technologies, evolutionaryDecentralized (incomplete, inconsistent)
Provide explicit statements regarding web resources
Authors, original information providersIntermediaries (humans and/or machines)
Information consumers determine consequences of the statements
Distributed ‘reasoning’
F. Corno, L. Farinetti - Politecnico di Torino 8
1989: WWW original proposal
F. Corno, L. Farinetti - Politecnico di Torino 9
Technology stack (old: pre-2008)
F. Corno, L. Farinetti - Politecnico di Torino 10
Technology stack (current: 2008)
F. Corno, L. Farinetti - Politecnico di Torino 11
Goal of the semantic Web
The Semantic Web will enable machines to COMPREHEND semantic documents and data, NOT human speech and writing
Then, how???Semantic Web foundation: metadata
Metadata andMetadata Standards
F. Corno, L. Farinetti - Politecnico di Torino 13
Resource and description
ResourceContent, format, …Access method dependent on format (I can read it if I “know” its language)
Resource descriptionIndependent of the format (I can read “people’s comments” about the resource… provided that I know the language in which the comment is written)
F. Corno, L. Farinetti - Politecnico di Torino 14
Resource and descriptiondescription
resource
this resource was created on April 14th, 2009
the title of this resource is
“Introduction to the Semantic
Web”
the author of this resourceis L. Farinetti
this resource is related to computer science,
knowledge representation and metadata
the quality of this resource
is high, according to
F. Corno
this resource is suitable for PhD students
F. Corno, L. Farinetti - Politecnico di Torino 15
Resource and descriptionResource
Content, format, …Access method dependent on format (I can read it if I “know” its language)
Standardization (i.e. common language for applications) ???
Practically impossible …Huge amount of existing informationHundreds of human languagesHundreds of computer languages (other word for formats)
F. Corno, L. Farinetti - Politecnico di Torino 16
Resource and descriptionResource description
Independent of the format (I can read “people’s comments” about the resource… provided that I know the language in which the comment is written)
Standardization (i.e. common language for applications) ???
FeasibleSmaller amount of information, possibly newSolution: define a standard language for writing comments (“metadata” in semantic web terminology)
F. Corno, L. Farinetti - Politecnico di Torino 17
Resource and description
this resource was created on April 14th, 2009
the title of this resource is
“Introduction to the Semantic
Web”
the author of this resourceis L. Farinetti
this resource is related to computer science,
knowledge representation and metadata
the quality of this resource
is high, according to
F. Corno
this resource is suitable for PhD students
Metadata
Field name = field value
F. Corno, L. Farinetti - Politecnico di Torino 18
Resource and descriptiondescription
resource
Date =2009-04-14
Title =“Introduction to
the Semantic Web”
Author =L. Farinetti
Topic ={computer science,
knowledge representation,
metadata}
Quality = high
Level = PhD students
Rated by F. CornoRated by F. Corno
F. Corno, L. Farinetti - Politecnico di Torino 19
Semantic Web main tasksMetadata annotation
Description of resources using standard languages
SearchRetrieve relevant information according to user’s query / interest / intentionUse metadata (and possibly content) in a “smart” way (i.e. “reasoning” about the meaning of annotations)
F. Corno, L. Farinetti - Politecnico di Torino 20
Meaningful metadata annotations
Common language for describing resourcesResource description standards
Common language for description field namesMetadata standards
Common language for description field valuesMetadata standards + controlled vocabularies
Semantically rich descriptions to support searchKnowledge representation techniques, ontologies
F. Corno, L. Farinetti - Politecnico di Torino 21
Common language for describing resources
F. Corno, L. Farinetti - Politecnico di Torino 22
Common language for describing resources
Resource Description Framework (RDF)Resource = URI (retrievable, or not)RDF is structured in statements
A statement is a tripleSubject – predicate – objectSubject: a resourcePredicate: a verb / property / relationshipObject: a resource, or a literal string
F. Corno, L. Farinetti - Politecnico di Torino 23
Common language for describing resources
Diagram:
Simple RDF assertion (triple):
triple (hasAuthor, URI, L.Farinetti)triple (hasAuthor, URI, L.Farinetti)
URIURI L.FarinettiL.FarinettihasAuthor
Author =L. Farinetti
F. Corno, L. Farinetti - Politecnico di Torino 24
Common language for describing resources
RDF in XML syntax:
Author =L. Farinetti
<RDF xmlns=“http://www.w3.org/TR/ … ” ><Description about=“http://www.polito.it/semweb/intro”>
<Author>L.Farinetti</Author></Description>
</RDF>
<RDF xmlns=“http://www.w3.org/TR/ … ” ><Description about=“http://www.polito.it/semweb/intro”>
<Author>L.Farinetti</Author></Description>
</RDF>
F. Corno, L. Farinetti - Politecnico di Torino 25
Common language for field names
Title = ...
Problem
Author = …
Creator, Maker, Contributor …
Synonymy
Topic = …
Topics, Subject, Subjects, Argument, Arguments
Singular / plural
Level = …
Difficult to clearly define concept in a few words
Educational level, destination, suitability, …
Date = …
Date of creation, date of last modification, date of revision, …
Different concepts: need for more details
F. Corno, L. Farinetti - Politecnico di Torino 26
Common language for field names
Solution: metadata standardsMany standardization bodies are involvedStandards may be general
e.g. Dublin Core (DC)or may depend on goal, context, domain, …
e. g. educational resources (IEEE LOM), multimedia resources (MPEG- 7), images (VRA), people (FOAF, IEEE PAPI), geospatial resources (GSDGM), bibliographical resources (MARC, OAI), cultural heritage resources (CIDOC CRM)
F. Corno, L. Farinetti - Politecnico di Torino 27
Metadata standards examples
F. Corno, L. Farinetti - Politecnico di Torino 28
Dublin Core
Dublin Core Metadata Element Set (DCMES)
Building blocks to define metadata for the Semantic Web15 elements, or categories, general enough to describe most of the published resourcesExtra elements and element refinements
F. Corno, L. Farinetti - Politecnico di Torino 29
DC metadata element set
F. Corno, L. Farinetti - Politecnico di Torino 30
Example of description using Dublin Core (in RDF)
A paper in the “Ariadne” journal
F. Corno, L. Farinetti - Politecnico di Torino 31
Common language for field values
ProblemsValue type
Title =“Introduction to
the Semantic Web”
type = string
Date =2009-04-14
type = date
Author =L. Farinetti
type = string“standard” format? Laura Farinetti, FarinettiLaura, Farinetti L., …
F. Corno, L. Farinetti - Politecnico di Torino 32
Common language for field values
ProblemsValue typeValue restrictions?
freedom vs shared understanding
Quality = high
High, medium, low?1 to 5?any value?
Level = PhD students
any value?list of possible values?
Topic ={computer science,
knowledge representation,
metadata}
any value?any number of values?
F. Corno, L. Farinetti - Politecnico di Torino 33
Common language for field values
Solution: metadata standards + controlled vocabulariesMetadata standards
Only some, and partiallyControlled vocabularies
Explicit list of possible values
F. Corno, L. Farinetti - Politecnico di Torino 34
Examples from IEEE LOM
1484.12.1 - 2002 Learning Object Metadata (LOM) Standard
Developed by the IEEE Learning Technology Standards Committee (LTSC)
Standard to describe the “Learning Objects” in order to guarantee their interoperability
F. Corno, L. Farinetti - Politecnico di Torino 35
Examples from IEEE LOM
F. Corno, L. Farinetti - Politecnico di Torino 36
Examples from IEEE LOM
F. Corno, L. Farinetti - Politecnico di Torino 37
Examples from IEEE LOM
F. Corno, L. Farinetti - Politecnico di Torino 38
… + controlled vocabularies
A closed list of named subjects, which can be used for classificationMetadata field values arerestricted to a list of terms(selected by experts)
Topic ={computer science,
informatics, knowledge
representation, metadata}
F. Corno, L. Farinetti - Politecnico di Torino 39
Semantically rich descriptions to support search
F. Corno, L. Farinetti - Politecnico di Torino 40
Semantically rich descriptions to support search
http://dictybase.org/db/html/help/GO.htmlhttp://dictybase.org/db/html/help/GO.html
Topic ={metabolism, …}
KnowledgeRepresentation
F. Corno, L. Farinetti - Politecnico di Torino 42
Need for knowledge representation
Semantically rich descriptions need “understanding” the meaning of a resource and the domain related to the resource
Disambiguation of termsShared agreement on meaningsDescription of the domain, with concepts and relations among concepts
F. Corno, L. Farinetti - Politecnico di Torino 43
Example: Dublin Core metadataMetadata of a single paper
F. Corno, L. Farinetti - Politecnico di Torino 44
ProblemsTitle usually offers good clues, but
it does not necessarily mention all names of all subjects the user is interested init may presuppose knowledge the user does not actually possess
Subject is meant to convey precisely what the document is about, but
much depends on how extensive the set of keywords is, whether all related subjects are mentioned, and whether too many subjects are listed
Metadata does not say much about “how related” a resource is to a given subject
F. Corno, L. Farinetti - Politecnico di Torino 45
Search results for “topic maps”
F. Corno, L. Farinetti - Politecnico di Torino 46
Problems
Authors were free to define their own subject keywords
Results are not “about” topic maps, but “related to” topic mapsIf an author forgets to list “topic maps”, his paper will never be found
F. Corno, L. Farinetti - Politecnico di Torino 47
Subject-based classificationAny form of content classification that groups objects by their subjects
e.g the use of keywords to classify papers Metadata fields describe what the objects are about by listing discrete subjects inside a subject-based classificationImportant: difference between describing the objects being classified and describing the subjects used to classify them
Metadata describe objects Subject- based classification is the approach to describe subject
F. Corno, L. Farinetti - Politecnico di Torino 48
Subject-based classification ...“On those remote pages it is written that animals are divided into:a. those that belong to the Emperor b. embalmed ones c. those that are trained d. suckling pigse. mermaids f. fabulous ones g. stray dogs h. those that are included in this classificationi. those that tremble as if they were mad j. innumerable ones k. those drawn with a very fine camel's hair brush l. others m. those that have just broken a flower vase n. those that resemble flies from a distance"
From The Celestial Emporium of Benevolent Knowledge, Borges
F. Corno, L. Farinetti - Politecnico di Torino 49
Subject-based classification techniques
Controlled vocabulariesTaxonomiesThesauriFaceted classificationOntologiesFolksonomiesOthers
… Most come from library science
F. Corno, L. Farinetti - Politecnico di Torino 50
Controlled vocabularyA closed list of named subjects, which can be used for classificationComposed of terms: particular name for a particular concept
similar to keywordsTerms are not concepts
A single term may be the name of one or more conceptsA single concept may have multiple namesAmbiguity avoided by forbidding duplicate terms
F. Corno, L. Farinetti - Politecnico di Torino 51
Controlled vocabulary
GoalPrevent authors from defining terms that are meaningless, too broad or too narrowPrevent authors from misspelling Prevent different authors from choosing slightly different forms of the same term
The simplest form of controlled vocabulary is a list of terms (or “pick list”)
Topic ={computer science,
knowledge representation, mtadata, RDF,
topic navigation maps}topic maps
F. Corno, L. Farinetti - Politecnico di Torino 52
Controlled vocabularyReduce ambiguity inherent in normal human languages Solve the problems of homographs, homonyms, synonyms and polysemes by ensuring
That each concept is described using only one authorized term That each authorized term in the controlled vocabulary describes only one concept
F. Corno, L. Farinetti - Politecnico di Torino 53
Problems solved
Synonymdifferent words with identical or very similar meanings
F. Corno, L. Farinetti - Politecnico di Torino 54
Problems solved
Synonymdifferent words with identical or very similar meanings
close“Will you please close that door!”
“The tiger was now so close that I could smell it...”
pupilstudent
opening in the iris of the eye
axes('æk.səz) plural of axe
('æk.siz) plural of axis
F. Corno, L. Farinetti - Politecnico di Torino 55
Problems solved
Synonymdifferent words with identical or very similar meanings
student and pupil (noun) buy and purchase (verb) sick and ill (adjective)
to get
take (I'll get the drinks)
become (she got scared)
wood
understand (I get it)
a piece of a tree
a geographical area with many trees
F. Corno, L. Farinetti - Politecnico di Torino 56
Controlled vocabulary examples
Practically no “real” examplesWith very little extra effort: taxonomies and thesauri!
Circuit theoryElectronic circuitsMicrowave technology Electron tubes Semiconductor materials and devicesDielectric materials and devicesMagnetic materials and devicesSuperconducting materials and devices…
Circuit theoryElectronic circuitsMicrowave technology Electron tubes Semiconductor materials and devicesDielectric materials and devicesMagnetic materials and devicesSuperconducting materials and devices…
BloodCord bloodErythrocyte Leukocyte BasophilEosynophilLymphoblastLymphocyte MonocyteNeutrophil…
BloodCord bloodErythrocyte Leukocyte BasophilEosynophilLymphoblastLymphocyte MonocyteNeutrophil…
F. Corno, L. Farinetti - Politecnico di Torino 57
Taxonomy
Subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy
Dates back to Carl Linnæus’s work on zoological and botanical classification (18th century)
F. Corno, L. Farinetti - Politecnico di Torino 58
TaxonomyAllow related terms to be grouped together
It is clear that “topic maps” and “XTM” are relatedEasier to classify documentsEasier to choose search keywords
F. Corno, L. Farinetti - Politecnico di Torino 59
Taxonomies and metadata
Metadata are stored as usual with the resourceThe “subject” will contain only controlled termsControlled terms belong to a hierarchy, shared by all papers
F. Corno, L. Farinetti - Politecnico di Torino 60
Taxonomy example: INSPEC
http://www.theiet.org/publishing/inspec/index.cfmhttp://www.theiet.org/publishing/inspec/index.cfm
F. Corno, L. Farinetti - Politecnico di Torino 61
Taxonomy example: INSPEC
F. Corno, L. Farinetti - Politecnico di Torino 62
INSPEC journal article database
INSPEC journal article database
F. Corno, L. Farinetti - Politecnico di Torino 63
Taxonomy example: anatomy terms
http://www.cbil.upenn.edu/anatomy.php3http://www.cbil.upenn.edu/anatomy.php3
F. Corno, L. Farinetti - Politecnico di Torino 64
Taxonomy example
F. Corno, L. Farinetti - Politecnico di Torino 65
Taxonomy example
http://www.acm.org/class/1998/ccs98.htmlhttp://www.acm.org/class/1998/ccs98.html
F. Corno, L. Farinetti - Politecnico di Torino 66
Taxonomy limitsOnly two kinds of relationships between terms
Parent = broader termChild = narrower term
topic navigation mapssynonym
no more in use
difference?
synonymXML topic map
difference?
F. Corno, L. Farinetti - Politecnico di Torino 67
Thesaurus
Extends taxonomiessubjects are arranged in a hierarchy
Other statements can be made about the subjectsTwo ISO standards
ISO2788 for monolingual thesauriISO5964 for multilingual thesauri
F. Corno, L. Farinetti - Politecnico di Torino 68
Thesaurus relationshipsBT – broader term
Refers to a term with wider or less specific meaningSome systems allow multiple BTs for one term, while others do notInverse property: NT - narrower termA taxonomy only uses BT and NT
SN – scope noteString explaining its meaning within the thesaurusUseful when the precise meaning of the term is not obvious from context
F. Corno, L. Farinetti - Politecnico di Torino 69
Thesaurus relationshipsUSE
Another term that is to be preferred instead of this termImplies that the terms are synonymousInverse property: UF
TT – top termThe topmost ancestor of this termThe BT of the BT of the BT...
RT – related termA term that is related to this term, without being a synonym of it or a broader/narrower term
F. Corno, L. Farinetti - Politecnico di Torino 70
Thesaurus example
http://www.ukat.org.uk/thesaurus/http://www.ukat.org.uk/thesaurus/
F. Corno, L. Farinetti - Politecnico di Torino 71
Thesaurus example
http://www.swinburne.edu.au/corporate/registrar/rms/keywords.htmhttp://www.swinburne.edu.au/corporate/registrar/rms/keywords.htm
F. Corno, L. Farinetti - Politecnico di Torino 72
http://www.loc.gov/cds/lcsh.htmlhttp://www.loc.gov/cds/lcsh.html
Thesaurus exampleLibrary of Congress Subject Heading
F. Corno, L. Farinetti - Politecnico di Torino 73
Faceted classification
Proposed by S.R. Ranganathan in the ‘30sFacets are the different axes along which documents can be classifiedEach facet contains a number of terms
Usually with a thesaurus organizationUsually a term belongs to one facet onlyA document is classified by selecting one term from each facet
F. Corno, L. Farinetti - Politecnico di Torino 74
Faceted classification example
http://flamenco.berkeley.edu/http://flamenco.berkeley.edu/
F. Corno, L. Farinetti - Politecnico di Torino 75
Advantages
Multi-dimensionalityPersistenceScalabilityFlexibility
http://freeable.polito.it/http://freeable.polito.it/
F. Corno, L. Farinetti - Politecnico di Torino 76
Ontology
Model for describing the world that consists of a set of types, properties, and relationships Extends the other subject-based classification approaches
Has open vocabulariesHas open relationship types (not just BT/NT, RT and USE/UF)
F. Corno, L. Farinetti - Politecnico di Torino 77
Ontology structure
ConceptsRelationships
Is-aOther
Instances
F. Corno, L. Farinetti - Politecnico di Torino 78
Ontology example
http://onto.stanford.edu:8080/wino/index.jsphttp://onto.stanford.edu:8080/wino/index.jsp
F. Corno, L. Farinetti - Politecnico di Torino 79
FolksonomyInternet-mediatedsocial environmentsTags compiledthrough social taggingSocial tagging
Decentralized practice where individuals and groups create, manage and share tags to annotate digital resources in an online social environmentGenerally characterized by non-standard tagging
Knowledge Management - Intro 80
Example (flickr - del.icio.us)
F. Corno, L. Farinetti - Politecnico di Torino 81
Other subject-based techniques
Synonym ringsConnect together a set of terms as being equivalent for search purposeSimilar to UF/USE relationship of thesauri, but no preferred term
F. Corno, L. Farinetti - Politecnico di Torino 82
Other subject-based techniques
Authority fileSimilar to a synonym ring, but consists of UF/USE relationships instead of synonym relationshipsOne term in each synonym ring is indicated as the preferred term for that subjecte.g. Library ofCongress NameAuthority File
F. Corno, L. Farinetti - Politecnico di Torino 83
Subject-based classification summary
Terminology is rarely used in a consistent way
Controlled vocabularies are thesauri, thesauri are ontologies, …
Terminology is rarely used in a consistent way
Controlled vocabularies are thesauri, thesauri are ontologies, …
http://www.iesr.ac.uk/profile/vocabs/index.html/#CtrldVocabsListhttp://www.iesr.ac.uk/profile/vocabs/index.html/#CtrldVocabsList
F. Corno, L. Farinetti - Politecnico di Torino 84
Subject-based classification summary
F. Corno, L. Farinetti - Politecnico di Torino 85
Coming next …
More on RDFMore on Ontologies