Bhargabi Chakrabarti, Reshmi De

OUTLINE
1. Foundations of Semantic Web
2. RDF
3. RDFS
4. OWL
5. OWL2
6. Semantic Web Layer Cake
7. RIF

What is Semantic? The word semantic itself implies meaning or understanding. As such, the fundamental difference between Semantic Web technologies and other technologies related to data (such as relational databases or the World Wide Web itself) is that the Semantic Web is concerned with the meaning and not the structure of data.

Why do we need the Semantic Web? Consider a typical web page. Its markup consists of:
- rendering information (e.g., font size and colour)
- hyperlinks to related content
The semantic content is accessible to humans but not (easily) to computers.

What information can we see?
WWW2002: The Eleventh International World Wide Web Conference
Sheraton Waikiki Hotel, Honolulu, Hawaii, USA, 7-11 May 2002
1 location, 5 days: learn, interact
Registered participants coming from Australia, Canada, Chile, Denmark, France, Germany, Ghana, Hong Kong, India, Ireland, Italy, Japan, Malta, New Zealand, the Netherlands, Norway, Singapore, Switzerland, the United Kingdom, the United States, Vietnam, Zaire
Register now
On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event
Speakers confirmed:
- Tim Berners-Lee: Tim is the well known inventor of the Web
- Ian Foster: Ian is the pioneer of the Grid, the next generation internet

What information can a machine see

Solution: XML markup with meaningful tags?

Machine sees

Solution: to enable machine processing, there can be two approaches:
- Smarter machines
- Smarter data

Approach 1: Smarter machines. Teach computers to understand the meaning of Web data. This is the Artificial Intelligence (AI) approach:
- Natural language processing
- Image recognition
- Etc.

Approach 2: Smarter data. Make data easier for machines to understand by expressing meaning in a machine-processable format (example: metadata). This is the Semantic Web approach: injecting more metadata so that data becomes structured.

The Current Web: minimal machine-processable information (dumb links). Resources are linked together, forming the Web. There is no distinction between kinds of resources or between kinds of links that connect resources.

The Semantic Web: an extension of the current Web with more machine-processable information. To give meaning to resources and links, new standards and languages are being investigated and developed. The rules and descriptive information made available by these languages allow the type of resources on the Web and the relationships between resources to be characterized individually and precisely.

Why is machine processing difficult? Two key problems: Problem 1: Ambiguity Problem 2: Language complexity

Ambiguity
"David Booth has VIN #2745534."
- Which "David Booth"?
- Vehicle #2745534? Vinyl siding order #2745534?
Need to identify things unambiguously, in a uniform, Web-friendly way.

Kinds of things to identify. Three kinds of things in the universe:
1) Web resources
2) Non-Web resources: physical objects, e.g., cars, people, houses, etc.
3) Abstract concepts: sizes, colors, verbs, "love", etc.; "Creator" (e.g., the creator of a document); "Airline reservation"

Unambiguously identifying Web resources
Solution (trivial): URLs, e.g., http://www.example.org/index.html

Unambiguously identifying physical objects
Many human systems exist:
- Vehicle Identification Numbers (VIN)
- Product serial numbers
- Employee numbers
Problems: too many formats, and most are not global in scope.
Solution: convert to URIs, e.g., http://www.example.com/employeeid/85740

Unambiguously identifying abstract concepts
Solution: use URIs.
Problem: which URIs? We need to agree on a common vocabulary.
Solution: Ontology

URI In computing, a uniform resource identifier (URI) is a string of characters used to identify a name or a resource. URIs can be classified as locators (URLs), as names (URNs), or as both. A uniform resource name (URN) functions like a person's name, while a uniform resource locator (URL) resembles that person's street address. In other words: the URN defines an item's identity, while the URL provides a method for finding it.

Ontology: "Formal description of concepts and their relationships". In other words:
- a vocabulary of terms: "book", "publication", "greyhound", "dog"
- and their relationships: "book is-a-kind-of publication", "greyhound is-a-kind-of dog"

Ontology
- Vocabulary + Structure = Taxonomy
- Taxonomy + Relationships, Constraints, Rules = Ontology
- Ontology + Instances = Knowledge Base

Structure of an Ontology. Ontologies typically have two distinct components:
1) Names for important concepts in the domain
- Elephant is a concept whose members are a kind of animal
- Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
- Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years
2) Background knowledge/constraints on the domain
- Adult_Elephants weigh at least 2,000 kg
- All Elephants are either African_Elephants or Indian_Elephants
- No individual can be both a Herbivore and a Carnivore

Dublin Core: one well-known ontology. It defines 15 basic terms for documents and publishing, including "title", "creator", "subject", and "publisher". Each term is unambiguously identified by a URI, e.g., http://purl.org/dc/elements/1.1/creator

Ontology Languages
Wide variety of languages for "Explicit Specification":
- Graphical notations: semantic networks, Topic Maps (see http://www.topicmaps.org/), UML, RDF
- Logic based: Description Logics (e.g., OIL, DAML+OIL, OWL), Rules (e.g., RuleML, LP/Prolog), First Order Logic (e.g., KIF), conceptual graphs, (syntactically) higher order logics (e.g., LBase), non-classical logics (e.g., Flogic, Non-Mon, modalities)
- Probabilistic/fuzzy
Degree of formality varies widely. Increased formality makes languages more amenable to machine processing (e.g., automated reasoning).

What is the Purpose of RDF? The purpose of RDF (Resource Description Framework) is to give a standard way of specifying data "about" something. Here's an example of an XML document that specifies data about China's Yangtze river:
[XML listing; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]
In words: "Here is data about the Yangtze River. It has a length of 6300 kilometers. Its startingLocation is western China's Qinghai-Tibet Plateau. Its endingLocation is the East China Sea."

XML --> RDF. Modify the following XML document (Yangtze.xml) so that it is also a valid RDF document (Yangtze.rdf):
[Yangtze.xml listing; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]
"convert to"
[Yangtze.rdf listing; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]

The RDF Format
[Yangtze.rdf listing; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]
Callouts:
1. RDF provides an ID attribute for identifying the resource being described.
2. The ID attribute is in the RDF namespace.
3. Add the "fragment identifier symbol" to the namespace.

The RDF Format (cont.)
[Yangtze.rdf listing; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]
Callouts:
1. Identifies the type (class) of the resource being described.
2. Identifies the resource being described. This resource is an instance of River.
3. These are properties, or attributes, of the type (class).
4. Values of the properties.

Namespace Convention (Best Practice)
Question: Why was "#" placed onto the end of the namespace, e.g., xmlns="http://www.geodesy.org/river#"?
Answer: RDF is very concerned about uniquely identifying things: uniquely identifying the type (class) and uniquely identifying the properties.
If we concatenate the namespace with the type then we get a unique identifier for the type, e.g.,
- http://www.geodesy.org/river#River
If we concatenate the namespace with a property then we get a unique identifier for the property, e.g.,
- http://www.geodesy.org/river#length
- http://www.geodesy.org/river#startingLocation
- http://www.geodesy.org/river#endingLocation
Thus, the "#" symbol is simply a mechanism for separating the namespace from the type name and the property name.
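The concatenation described on this slide can be sketched in a few lines of Python. This is only an illustration: the helper name `term_uri` is an assumption introduced here, not part of RDF or of any library.

```python
RIVER_NS = "http://www.geodesy.org/river#"

def term_uri(namespace, local_name):
    """Concatenate a namespace and a local name into one identifier.

    term_uri is a hypothetical helper used only for this illustration.
    """
    return namespace + local_name

for name in ("River", "length", "startingLocation", "endingLocation"):
    print(term_uri(RIVER_NS, name))

# Without the trailing "#", the boundary between namespace and name is lost.
print(term_uri("http://www.geodesy.org/river", "length"))
```

The last line shows why the separator matters: without it the namespace and the property name run together into a single path segment.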

rdf:ID. The value of rdf:ID is a "relative URI". The "complete URI" is obtained by concatenating the URL of the XML document with "#" and then the value of rdf:ID, e.g.,
[Yangtze.rdf listing; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]
Suppose that this RDF/XML document is located at this URL: http://www.china.org/geography/rivers. Thus, the complete URI for this resource is: http://www.china.org/geography/rivers#Yangtze

xml:base. On the previous slide we showed how the URL of the document provided the base URI. Depending on the location of the document is brittle: it will break if the document is moved, or is copied to another location. A more robust solution is to specify the base URI in the document, e.g.,
[Yangtze.rdf listing with xml:base; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]
Resource URI = concatenation(xml:base, '#', rdf:ID)
= concatenation(http://www.china.org/geography/rivers, '#', "Yangtze")
= http://www.china.org/geography/rivers#Yangtze
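As a rough sketch of the rule above, the following Python resolves a resource URI from xml:base and rdf:ID using only the standard library. The RDF/XML string is an assumed reconstruction of the slide's listing (the exact markup is not preserved here), and `resource_uri` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Assumed reconstruction of the listing; the markup layout is illustrative.
DOC = """<River rdf:ID="Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#"
       xml:base="http://www.china.org/geography/rivers">
  <length>6300 kilometers</length>
</River>
"""

def resource_uri(rdf_xml, document_url=None):
    """Return base + '#' + rdf:ID, preferring xml:base over the document URL."""
    root = ET.fromstring(rdf_xml)
    base = root.attrib.get("{%s}base" % XML_NS, document_url)
    if base is None:
        raise ValueError("no xml:base attribute and no document URL given")
    return base + "#" + root.attrib["{%s}ID" % RDF_NS]

print(resource_uri(DOC))
```

Because the base is read from the document itself, the result stays the same wherever the file is copied; `document_url` is only a fallback for documents without xml:base.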

rdf:about. Instead of identifying a resource with a relative URI (which then requires a base URI to be prepended), we can give the complete identity of a resource. However, we use rdf:about, rather than rdf:ID, e.g.,
[Yangtze.rdf listing using rdf:about; values: 6300 kilometers, western China's Qinghai-Tibet Plateau, East China Sea]

The RDF Format value...

Advantage of using the RDF Format. You may ask: "Why should I bother designing my XML to be in the RDF format?" Answer: there are numerous benefits:
- The RDF format, if widely used, will help to make XML more interoperable. Tools can instantly characterize the structure: "this element is a type (class), and here are its properties".
- The RDF format gives you a structured approach to designing your XML documents. The RDF format is a regular, recurring pattern.
- It enables you to quickly identify weaknesses and inconsistencies of non-RDF-compliant XML designs. It helps you to better understand your data!
- You reap the benefits of both worlds: you can use standard XML editors and validators to create, edit, and validate your XML, and you can use the RDF tools to apply inferencing to the data.
- It positions your data for the Semantic Web (network effect, interoperability)!

Disadvantage of using the RDF Format
- Constrained: the RDF format constrains you on how you design your XML (i.e., you can't design your XML in any arbitrary fashion).
- RDF uses namespaces to uniquely identify types (classes), properties, and resources. Thus, you must have a solid understanding of namespaces.
- Another XML vocabulary to learn: to use the RDF format you must learn the RDF vocabulary.

Triple -> resource/property/value
- http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#length (property) of 6300 kilometers (value)
- http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#startingLocation (property) of western China's... (value)
- http://www.china.org/geography/rivers#Yangtze (resource) has a http://www.geodesy.org/river#endingLocation (property) of East China Sea (value)

The RDF Format = triples! The fundamental design pattern of RDF is to structure your XML data as resource/property/value triples! The value of a property can be a literal (e.g., length has a value of 6300 kilometers). The value of a property can also be a resource (e.g., property-A has a value of Resource-B, property-B has a value of Resource-C, and so on down to a literal such as Value-C). Notice that the RDF design pattern is an alternating sequence of resource-property. This pattern is known as "striping".
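To make the resource/property/value pattern concrete, here is a minimal Python sketch that reads a River description and lists its triples. The RDF/XML string is an assumed reconstruction built from the URIs and values quoted on these slides, and `extract_triples` and `qname_to_uri` are hypothetical helper names; a real RDF parser handles many more syntax forms than this.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed reconstruction of Yangtze.rdf; the markup layout is illustrative.
YANGTZE_RDF = """<?xml version="1.0"?>
<River rdf:about="http://www.china.org/geography/rivers#Yangtze"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://www.geodesy.org/river#">
  <length>6300 kilometers</length>
  <startingLocation>western China's Qinghai-Tibet Plateau</startingLocation>
  <endingLocation>East China Sea</endingLocation>
</River>
"""

def qname_to_uri(tag):
    """ElementTree spells tags as '{namespace}local'; join the two parts."""
    namespace, local = tag[1:].split("}")
    return namespace + local

def extract_triples(rdf_xml):
    """Return (resource, property, value) tuples for one typed resource."""
    root = ET.fromstring(rdf_xml)
    subject = root.attrib["{%s}about" % RDF_NS]
    # The element name of the resource gives its type (class).
    triples = [(subject, RDF_NS + "type", qname_to_uri(root.tag))]
    for prop in root:
        triples.append((subject, qname_to_uri(prop.tag), prop.text))
    return triples

triples = extract_triples(YANGTZE_RDF)
for triple in triples:
    print(triple)
```

Each child element becomes one triple whose property URI is the namespace concatenated with the element name, which is the same concatenation the namespace convention relies on.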

RDF Model (graph) Legend: Ellipse indicates "Resource" Rectangle indicates "literal string value"

Terminology. As you read the RDF literature you may see the following terminology:
- Subject: this term refers to the item that is playing the role of the resource.
- Predicate: this term refers to the item that is playing the role of the property.
- Object: this term refers to the item that is playing the role of the value.
Subject/predicate/Object and Resource/property/Value are equivalent!

RDF Parser There is a nice RDF parser at the W3 Web site: http://www.w3.org/RDF/Validator/ This RDF parser will tell you if your XML is in the proper RDF format.

What is missing from RDF? A Schema Support Enables Reasoning Solution: Use RDF-S (RDF Schema)

RDF Schema (RDFS)
- RDF gives a formalism for metadata annotation, and a way to write it down in XML, but it does not give any special meaning to vocabulary such as subClassOf or type.
- RDF Schema allows you to define vocabulary terms and the relations between those terms.
- It gives extra meaning to particular RDF predicates and resources.
- This extra meaning, or semantics, specifies how a term should be interpreted.

RDF Schema Extension to RDF to allow definition of application- specific classes and properties Provides a framework to describe such Classes - similar to OOP Allows instances and subclasses of classes.

RDF Schema is about creating Taxonomies!
[Figure: class hierarchy rooted at NaturallyOccurringWaterSource, containing the classes BodyOfWater, Ocean, Lake, Sea, Stream, River, Tributary, Brook, and Rivulet. Properties: length: Literal; emptiesInto: BodyOfWater.]

Yangtze.rdf
[RDF/XML listing; surviving value: 6300 kilometers]
What inferences can be made on this RDF/XML, given the taxonomy on the last slide? Inferences are made by examining a taxonomy that contains River. See next slide.

[Figure: the taxonomy (NaturallyOccurringWaterSource, BodyOfWater, Ocean, Lake, Sea, Stream, River, Tributary, Brook, Rivulet; properties length: Literal and emptiesInto: BodyOfWater) and Yangtze.rdf (6300 kilometers) are fed to an Inference Engine.]
Inferences:
- Yangtze is a Stream
- Yangtze is a NaturallyOccurringWaterSource
- http://www.china.org/geography#EastChinaSea is a BodyOfWater

How does a taxonomy facilitate searching?
[Figure: the same taxonomy.]
The taxonomy shows that when searching for "streams", any RDF/XML that uses the class Brook, Rivulet, River, or Tributary is relevant. See next slide.

[Figure: the taxonomy and Yangtze.rdf (6300 kilometers) are fed to a Search Engine.]
Query: "Show me all documents that contain info about Streams"
Results:
- Yangtze is a Stream, so this document is relevant to the query.
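The taxonomy-driven search on this slide might be sketched as follows. Only the links River -> Stream -> NaturallyOccurringWaterSource and Rivulet -> Brook -> Stream are stated in the slides; the remaining links in `SUBCLASS_OF`, the `Pacific.rdf` document, and the function names are assumptions added for illustration.

```python
# Direct subClassOf links. Links other than River/Stream and Rivulet/Brook/Stream
# are assumed placements, not taken from the slides.
SUBCLASS_OF = {
    "River": "Stream",
    "Brook": "Stream",
    "Rivulet": "Brook",
    "Tributary": "Stream",
    "Stream": "NaturallyOccurringWaterSource",
    "Ocean": "BodyOfWater",
    "BodyOfWater": "NaturallyOccurringWaterSource",
}

# Document name -> class of the resource it describes.
# Pacific.rdf is a hypothetical second document.
DOCUMENTS = {"Yangtze.rdf": "River", "Pacific.rdf": "Ocean"}

def is_a(cls, target):
    """Walk up the subClassOf chain from cls looking for target."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def search(target):
    """Return the documents whose resource is an instance of target."""
    return sorted(name for name, cls in DOCUMENTS.items() if is_a(cls, target))

print(search("Stream"))
```

A plain keyword search for "Stream" would miss Yangtze.rdf, since the document only says River; the taxonomy supplies the missing link.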

So RDF Schemas... RDF Schema is all about defining taxonomies (class hierarchies). As we've seen, a taxonomy can be used to make inferences and to facilitate searching. That's all there is to RDF Schema! The rest is just syntax. The previous slide showed the taxonomy in a graphical form. Obviously, we need to express the taxonomy in a form that is machine-processable. RDF Schema provides an XML vocabulary to express taxonomies.

RDF Schema provides an XML vocabulary to express taxonomies
[Figure: the graphical taxonomy is "expressed as" the XML document NaturallyOccurringWaterSource.rdfs.]

Defining a class (e.g., River)
[NaturallyOccurringWaterSource.rdfs (snippet)]
This is read as: "I hereby define a River Class. River is a subClassOf Stream." "I hereby define a Stream Class. Stream is a subClassOf NaturallyOccurringWaterSource."...
Callouts:
- All classes and properties are defined within rdf:RDF.
- Assigns a namespace to the taxonomy!
- Defines the River class.
- Defines the Stream class.
- Since the Stream class is defined in the same document we can reference it using a fragment identifier.

rdfs:Class. This type is used to define a class. The rdf:ID provides a name for the class. The contents are used to indicate the members of the class. The contents are ANDed together.

rdfs:subClassOf
[Figure: the set River drawn inside the set Stream.]
Stream represents the set of Streams, i.e., the set of instances of type Stream. River represents the set of Rivers, i.e., the set of instances of type River.

rdfs:subClassOf. Use this property to indicate a subclass relationship between one class and another class. You may specify zero, one, or multiple rdfs:subClassOf properties.
- Zero: if you define a class without specifying rdfs:subClassOf then you are implicitly stating that the class is a subClassOf rdfs:Resource (the root of all classes).
- One: if you define a class by specifying one rdfs:subClassOf then you are indicating that the class is a subclass of that class.
- Multiple: if you define a class by specifying multiple rdfs:subClassOf properties then you are indicating that the class is a subclass of each of the other classes.
Example: consider the River class. Suppose that it has two rdfs:subClassOf properties, one that specifies Stream and a second that specifies SedimentContainer. Thus, the two rdfs:subClassOf properties indicate that a River is a Stream and a SedimentContainer. That is, each instance of River is both a Stream and a SedimentContainer.

Example of multiple rdfs:subClassOf properties
[Figure: River lies in the overlap of Stream and SedimentContainer.]
A River is both a Stream and a SedimentContainer. The conjunction (AND) of two subClassOf statements is a subset of the intersection of the classes.

rdfs:subClassOf is transitive
[Figure: the class hierarchy containing NaturallyOccurringWaterSource, BodyOfWater, Ocean, Lake, Sea, Stream, River, Tributary, Brook, and Rivulet.]
Consider the above class hierarchy. It says, for example, that:
- A Rivulet is a Brook.
- A Brook is a Stream.
Therefore, since subClassOf is transitive, a Rivulet is a Stream. (Note that a Rivulet is also a NaturallyOccurringWaterSource.)
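Transitivity can be illustrated by computing every class reachable through subClassOf links. This is a sketch under assumptions: the parent sets below encode only the links named on these slides plus the SedimentContainer example, and `superclasses` is a hypothetical helper name. Each class maps to a set of parents because a class may carry multiple rdfs:subClassOf properties.

```python
# Direct subClassOf links; each class may have several parents.
SUBCLASS_OF = {
    "Rivulet": {"Brook"},
    "Brook": {"Stream"},
    # River with two parents follows the SedimentContainer example.
    "River": {"Stream", "SedimentContainer"},
    "Stream": {"NaturallyOccurringWaterSource"},
}

def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return every class reachable from cls by following subClassOf links."""
    found = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found

print(sorted(superclasses("Rivulet")))
print(sorted(superclasses("River")))
```

The first call reports that a Rivulet is a Brook, a Stream, and a NaturallyOccurringWaterSource, even though only the Rivulet/Brook link was declared for Rivulet itself.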

Defining a property (e.g., emptiesInto)
[NaturallyOccurringWaterSource.rdfs (snippet)]
This is read as: "I hereby define an emptiesInto Property. The domain (class) in which emptiesInto is used is River. The range (of values) for emptiesInto are instances of BodyOfWater." That is, the emptiesInto Property relates (associates) a River to a BodyOfWater.
[Figure: River (domain) --emptiesInto--> BodyOfWater (range)]

rdf:Property. This type is used to define a property. The rdf:ID provides a name for the property. The contents are used to indicate the usage of the property. The contents are ANDed together.

rdfs:range. Use this property to indicate the type of values that a property will contain. You may specify zero, one, or multiple rdfs:range properties.
- Zero: if you define a property without specifying rdfs:range then you are providing no information about the type of value that the property will contain.
- One: if you define a property by specifying one rdfs:range then you are indicating that the property will contain a value whose type is that specified by rdfs:range.
- Multiple: if you define a property by specifying multiple rdfs:range properties then you are indicating that the property will contain a value which belongs to every class defined by the rdfs:range properties.
Example: consider the property emptiesInto. Suppose that it has two rdfs:range properties, one that specifies BodyOfWater and a second that specifies CoastalWater. Thus, the two rdfs:range properties indicate that emptiesInto will contain a value that is a BodyOfWater and a CoastalWater.

Example of multiple rdfs:range properties
[Figure: the range lies in the overlap of BodyOfWater and CoastalWater.]
The value of emptiesInto is a BodyOfWater and a CoastalWater.

rdfs:domain. Use this property to indicate the classes that a property will be used with. You may specify zero, one, or multiple rdfs:domain properties.
- Zero: if you define a property without specifying rdfs:domain then you are providing no information about the class that the property will be used with, i.e., the property can be used with any class.
- One: if you define a property by specifying one rdfs:domain then you are indicating that the property will be used with the class specified by rdfs:domain.
- Multiple: if you define a property by specifying multiple rdfs:domain properties then you are indicating that the property will be used with a resource which belongs to every class defined by the rdfs:domain properties.
Example: consider the property emptiesInto. Suppose that it has two rdfs:domain properties, one that specifies River and a second that specifies Vessel. Thus, the two rdfs:domain properties indicate that emptiesInto will be used with a resource that is a River and a Vessel.

Example of multiple rdfs:domain properties
[Figure: the domain lies in the overlap of River and Vessel.]
emptiesInto is to be used in instances that are of type River and Vessel.

Note that properties are defined separately from classes With most Object-Oriented languages when a class is defined the properties (attributes) are simultaneously defined. For example, "I hereby define a Rectangle class, and its attributes are length and width." With RDF Schema things are different. You define a class (and indicate its relationships to other classes). Separately, you define properties and then associate them with a class! For the above example you would define the Rectangle class (and indicate that it is a subclass of GeometricObject). Separately, you then define a length property, indicate its range of value, and then indicate that length may be used with the Rectangle class. (Thus, if you have an untyped Resource with a length property you can infer the Resource is a Rectangle.) Likewise for the width property.

Advantage of separately defining classes and properties As we have seen, the RDF Schema approach is to define a class, and then separately define properties and state that they are to be used with the class. The advantage of this approach is that anyone, anywhere, anytime can create a property and state that it is usable with the class!

The XML Representation of the taxonomy ... NaturallyOccurringWaterSource.rdfs (snippet)

Literal value A literal type is a simple, untyped string.

NaturallyOccurringWaterSource Ontology! NaturallyOccurringWaterSource.rdfs defines a set of classes and how the classes are related. It defines a set of properties and indicates the type of values they may have and what classes they may be associated with. That is, it defines an ontology for NaturallyOccurringWaterSources!

Inferring a resource's class from the properties' domain
[RDF/XML listing; surviving value: 6300 kilometers]
Notice that in this RDF/XML instance the class of the resource (Yangtze) is not identified. However, we can infer that Yangtze is a River because length and emptiesInto have a rdfs:domain of River, i.e., their domain asserts that these properties will be used in a River instance.

RDF Schemas: simple, yet powerful. Let's summarize what we have learned. Use RDF Schema to define:
- a class hierarchy (a taxonomy)
- properties: associate them with a class (use rdfs:domain) and indicate the range of values (use rdfs:range)
Once an RDF Schema is defined, it can be used to infer additional facts about data: an instance of a class is also an instance of all of that class's superclasses.

Problems with RDFS. RDFS is too weak to describe resources in sufficient detail:
- No localised range and domain constraints: can't say that the range of hasChild is person when applied to persons and elephant when applied to elephants.
- No existence/cardinality constraints: can't say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents.
- No inverse properties: can't say that hasPart is the inverse of isPartOf.
- Two classes, same concept: people use different words to represent the same thing. One person may create an ontology with a class called "Airplane". Another person may create an ontology with a class called "Plane". It would be useful to be able to indicate that the two classes are equivalent.
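The domain-based inference described here, together with the matching range-based inference about the East China Sea, can be sketched like this. The `DOMAIN` and `RANGE` tables restate what the slides say about length and emptiesInto; the `rdf:type` string label and the function name `infer_types` are assumptions made for the illustration.

```python
RIVER = "http://www.geodesy.org/river#"
YANGTZE = "http://www.china.org/geography/rivers#Yangtze"
EAST_CHINA_SEA = "http://www.china.org/geography#EastChinaSea"

# Schema facts taken from the slides: both properties have domain River,
# and emptiesInto has range BodyOfWater.
DOMAIN = {RIVER + "length": RIVER + "River", RIVER + "emptiesInto": RIVER + "River"}
RANGE = {RIVER + "emptiesInto": RIVER + "BodyOfWater"}

# Instance data with no explicit class for Yangtze.
TRIPLES = [
    (YANGTZE, RIVER + "length", "6300 kilometers"),
    (YANGTZE, RIVER + "emptiesInto", EAST_CHINA_SEA),
]

def infer_types(triples):
    """Derive (resource, 'rdf:type', class) facts from domain and range."""
    inferred = set()
    for subject, prop, value in triples:
        if prop in DOMAIN:
            inferred.add((subject, "rdf:type", DOMAIN[prop]))
        if prop in RANGE:
            inferred.add((value, "rdf:type", RANGE[prop]))
    return inferred

for fact in sorted(infer_types(TRIPLES)):
    print(fact)
```

Note that domain and range act as inference rules here rather than as validation constraints, which is the descriptive (not prescriptive) reading of RDF Schema.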

RDF Schemas: Building Block to More Expressive Ontology Languages RDF Schema OWL RDF Schema was designed to be extended. The ontology languages all use RDF Schema's basic notions of Class, Property, domain, and range. OWL = Web Ontology Language

RDF Schema vs XML Schema XML Schemas is all about syntax. RDF Schema is all about semantics. An XML Schema tool is intended to validate that an XML instance conforms to the syntax specified by the XML Schema. An RDF Schema tool is intended to provide additional facts to supplement the facts in RDF/XML instances. XML Schemas is prescriptive - an XML Schema prescribes what an element may contain, and the order the child elements may occur. RDF Schemas is descriptive - an RDF Schema simply describes classes and properties.

Purpose of OWL. The purpose of OWL is identical to that of RDF Schema: to provide an XML vocabulary to define classes, their properties, and the relationships among classes. RDF Schema enables you to express very rudimentary relationships and has limited inferencing capability. OWL enables you to express much richer relationships, and so supports a much greater degree of inference making than you get with RDF Schema.

OWL and RDF Schema enable machine-processable semantics
[Figure: XML/DTD/XML Schemas cover Syntax; RDF Schema and OWL cover Semantics.]

OWL = RDF Schema + more Note: all of the elements/attributes provided by RDF and RDF Schema can be used when creating an OWL document.

Web Ontology Language. OWL adds many new features to RDF:
- Functional properties
- Inverse functional properties (database keys)
- Local domain and range constraints
- General cardinality constraints
- Inverse properties
- Symmetric and transitive properties

Example 1: The Robber and the Speeder DNA samples from a robbery identified John Walker Lindh as the suspect. Here is the police report on the robbery:... Later in the day a state trooper gives a person a ticket for speeding. The driver's license showed the name Sulay. Here is the state trooper's report on the speeder:...

Any Relationship between the Robber and the Speeder? The Central Intelligence Agency (CIA) has a file on Sulay: [listing]. The local police, state troopers, and CIA share their information, thus enabling the following inference to be made:
[Figure: John Walker Lindh (Robbery) and Sulay (Speeder) connected by owl:sameIndividualAs.]
Inference: The Robber and the Speeder are one and the same!

Lesson Learned. OWL provides a property for indicating that two resources (e.g., two people) are the same. These slides use the name owl:sameIndividualAs from the OWL drafts; the W3C Recommendation names this property owl:sameAs.
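One way to picture how a reasoner uses such a sameness statement is to group names that are declared identical and pool what is known about each group. The sketch below uses a small union-find structure; the fact strings and the function name `merge_identities` are illustrative assumptions, not OWL vocabulary.

```python
def merge_identities(same_as_pairs, facts):
    """Group names linked by sameness statements and pool their facts."""
    parent = {}

    def find(name):
        # Follow parent links to the group's representative name.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for left, right in same_as_pairs:
        parent[find(left)] = find(right)

    pooled = {}
    for name, fact in facts:
        pooled.setdefault(find(name), set()).add(fact)
    return pooled

# Illustrative facts paraphrasing the two reports.
FACTS = [
    ("John Walker Lindh", "suspect in the robbery"),
    ("Sulay", "ticketed for speeding"),
]
SAME = [("John Walker Lindh", "Sulay")]

pooled = merge_identities(SAME, FACTS)
print(pooled)
```

After merging, both reports hang off a single individual, which is the inference the example draws.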

Example 2: Using a Web Bot to Purchase a Camera
[Figure: 1. My Web Assistant (a Web Bot) asks a Web Site: "Please send me your e-catalog". 2. The Web Site replies: "Here's my e-catalog" (listing values: 1.4, 300mm zoom, optional, $325 USD). 3. The Web Bot asks: Is "SLR" a Camera?]
* A Web Bot is a software program which crawls the Web looking for information.

Camera OWL Ontology
[Figure: Camera with the subclasses SLR, Large-Format, and Digital.]
My Web Assistant program consults the Camera OWL Ontology. The Ontology shows how SLR is classified: SLR is a type (subclass) of Camera. Thus, my Web Assistant Bot dynamically realizes that:
Inference: The Olympus-OM10 SLR is a Camera!

Lesson Learned OWL provides elements to construct taxonomies (called class hierarchies). The taxonomies can be used to dynamically discover relationships!

Example 3: The Birthplace of King Kameha is... Upon scanning the Web, three documents (1, 2, 3) were found which contain information about King Kameha. Question: What is the birthplace of King Kameha?

Answer: all three! The Person OWL Ontology indicates that a Person has only one birthplace location:
[Figure: Person --birthplace (1)--> Location]
Thus, the Person OWL Ontology enables this inference to be made:
Inference: Hawaii, Sandwich Islands, and Aloha State all represent the same location!
[Figure: King Kameha --birthplace--> Hawaii; King Kameha --birthplace--> Sandwich Islands; King Kameha --birthplace--> Aloha State. They all represent the same location!]

Lesson Learned. In the example we saw that the Person Ontology defined this relationship:
[Figure: Person --birthplace (1)--> Location]
This is read as: "A person has exactly one birthplace location." This example is a specific instance of a general capability in OWL to specify that a subject Resource has exactly one value:
[Figure: Resource (subject) --property (1)--> Resource (value)]
We saw in the example that such information can be used to make inferences. OWL Terminology: properties that relate a resource to exactly one other resource are said to have a cardinality=1.
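The cardinality=1 inference can be sketched as a simple rule: if a property may have only one value per subject, then any two values recorded for the same subject must denote the same thing. The triples below paraphrase the three documents; the short property label and the function name `infer_same_values` are assumptions for the illustration.

```python
def infer_same_values(triples, single_valued):
    """Return pairs of values that must co-refer under cardinality 1."""
    values = {}
    for subject, prop, value in triples:
        if prop in single_valued:
            values.setdefault((subject, prop), []).append(value)
    same = []
    for found in values.values():
        first = found[0]
        # Every further distinct value must name the same thing as the first.
        same.extend((first, other) for other in found[1:] if other != first)
    return same

# One triple per document found on the Web.
TRIPLES = [
    ("King Kameha", "birthplace", "Hawaii"),
    ("King Kameha", "birthplace", "Sandwich Islands"),
    ("King Kameha", "birthplace", "Aloha State"),
]

same_places = infer_same_values(TRIPLES, single_valued={"birthplace"})
print(same_places)
```

Without the cardinality statement the same three triples would simply describe three different birthplaces, so the ontology is what licenses the conclusion.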

Review Some of the OWL's capabilities are: An OWL instance document can be enhanced with an OWL property to indicate that it is the same as another instance. OWL provides the capability to construct taxonomies (class hierarchies). Such taxonomies can be used to dynamically understand how entities in an XML instance relate to other entities. OWL provides the capability to specify that a subject can have only one value. By leveraging OWL, additional facts about your instance data can be dynamically ascertained. That is, OWL facilitates a dynamic understanding of the semantics of your data!

Defining Properties in OWL Recall that with RDF Schema the rdf:Property was used for both: relating a Resource to another Resource (Example: the emptiesInto property relates a River to a BodyOfWater) and relating a Resource to an rdfs:Literal or a datatype (Example: the length property relates a River to an xsd:nonNegativeInteger). OWL treats these as two kinds of properties, each with its own class: owl:ObjectProperty is used to relate a Resource to another Resource; owl:DatatypeProperty is used to relate a Resource to an rdfs:Literal or an XML Schema built-in datatype.

ObjectProperty vs. DatatypeProperty An ObjectProperty relates one Resource to another Resource: Resource, ObjectProperty, Resource. A DatatypeProperty relates a Resource to a Literal or an XML Schema datatype value: Resource, DatatypeProperty, Value.

owl:ObjectProperty and owl:DatatypeProperty are subclasses of rdf:Property.

Defining Properties in OWL vs. RDF Schema RDFS OWL

The Three Faces of OWL

OWL Full, OWL DL, and OWL Lite Not everyone will need all of the capabilities that OWL provides. Thus, there are three versions of OWL: OWL Full, OWL DL, and OWL Lite. (DL = Description Logic)

Comparison
OWL Full: Everything that has been shown in this tutorial is available. Further, you can mix RDF Schema definitions with OWL definitions.
OWL DL: You cannot use owl:cardinality with TransitiveProperty. You cannot use a class as a member of another class, i.e., you cannot have metaclasses. FunctionalProperty and InverseFunctionalProperty cannot be used with datatypes (they can only be used with ObjectProperty).
OWL Lite: All the DL restrictions, plus: You cannot use owl:minCardinality or owl:maxCardinality. The only allowed values for owl:cardinality are 0 and 1. Cannot use owl:hasValue. Cannot use owl:disjointWith. Cannot use owl:oneOf. Cannot use owl:complementOf. Cannot use owl:unionOf.

Advantages/Disadvantages Full: The advantage of the Full version of OWL is that you get the full power of the OWL language. The disadvantage of the Full version of OWL is that it is difficult to build a Full tool. Also, the user of a Full-compliant tool may not get a quick and complete answer. DL/Lite: The advantage of the DL or Lite version of OWL is that tools can be built more quickly and easily, and users can expect responses from such tools to come quicker and be more complete. The disadvantage of the DL or Lite version of OWL is that you don't have access to the full power of the language.

Experience with OWL OWL plays a key role in an increasing number and range of applications, e.g. science, eCommerce, geography, engineering, and defence. For example, OWL tools have been used to identify and repair errors in a medical ontology. Experience of OWL in use has identified restrictions on expressivity and on scalability. These restrictions are problematic in some applications. Research has now shown how some restrictions can be overcome, and a W3C group has updated OWL accordingly. The result is called OWL 2, which became a W3C Recommendation in 2009.

OWL 2 Extends OWL 1 It inherits the OWL 1 language features. The new features of OWL 2 are based on: real applications, user experience, and tool developer experience.

OWL 2 in a Nutshell Extends OWL with a small but useful set of features: that are needed in applications; for which semantics and reasoning techniques are well understood; that tool builders are willing and able to support. Adds profiles: language subsets with useful computational properties. Is fully backwards compatible with OWL: every OWL ontology is a valid OWL 2 ontology, and every OWL 2 ontology not using new features is a valid OWL ontology. Already supported by popular OWL tools and infrastructure: Protégé, HermiT, Pellet, FaCT++, OWL API.

What's New in OWL 2? Four kinds of new features: increased expressive power; extended datatypes; metamodelling and annotations; syntactic sugar.

Feature 1: Increased expressive power Qualified cardinality restrictions, e.g.: persons having two friends who are republicans. Property chains, e.g.: the brother of your parent is your uncle. Local reflexivity restrictions, i.e. classes of objects that are related to themselves by a given property, e.g.: narcissists love themselves; auto-regulating processes regulate themselves. Reflexive, irreflexive, and asymmetric properties, e.g.: everything is part of itself (globally reflexive); nothing can be a proper part of itself (irreflexive); if x is a proper part of y, then y is not a proper part of x (asymmetric). Disjoint properties, e.g.: you can't be both the parent of and the child of the same person. Keys, e.g.: country + license plate constitute a unique identifier for vehicles.
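To make the property chain example concrete, here is a small Python sketch that applies a two-step chain to a set of facts. The property names hasParent, hasBrother, and hasUncle, the individuals, and the function apply_property_chain are all assumptions made for illustration; they are not OWL 2 vocabulary.

```python
def apply_property_chain(facts, first, second, result):
    """For every x -first-> y and y -second-> z, infer x -result-> z."""
    inferred = set()
    for (x, p1, y) in facts:
        if p1 != first:
            continue
        for (y2, p2, z) in facts:
            if p2 == second and y2 == y:
                inferred.add((x, result, z))
    return inferred


# Hypothetical facts: Bob is Alice's parent, Carl is Bob's brother.
facts = {
    ("Alice", "hasParent", "Bob"),
    ("Bob", "hasBrother", "Carl"),
}

# "The brother of your parent is your uncle."
uncles = apply_property_chain(facts, "hasParent", "hasBrother", "hasUncle")
print(uncles)
```

The nested loop is quadratic in the number of facts, which is fine for a sketch; a real reasoner would index facts by property.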


Feature 2: Extended Datatypes Extra datatypes: a much wider range of XSD datatypes is supported, e.g.: integer, string, boolean, real, decimal, float, dateTime. Datatype definitions: new user-defined datatypes, e.g. the format of Italian registration plates: xsd:string xsd:pattern "[A-Z]{2} [0-9]{3}[A-Z]{2}". Datatype restrictions: a range of datatype values, e.g. an adult has an age >= 18: DatatypeRestriction(xsd:integer minInclusive 18). Data range combinations: intersection of data ranges, DataIntersectionOf( xsd:nonNegativeInteger xsd:nonPositiveInteger ); union of data ranges, DataUnionOf( xsd:string xsd:integer ); complement of a data range, DataComplementOf( xsd:positiveInteger ).
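One way to build intuition for these data ranges is to model each one as a Python predicate, a function that says whether a value belongs to the range. The sketch below is an analogy only: the function names are assumptions for this example, Python int and str stand in for xsd:integer and xsd:string, and the plate pattern is copied as printed on the slide.

```python
import re


def is_integer(value):
    """Stand-in for xsd:integer (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_string(value):
    """Stand-in for xsd:string."""
    return isinstance(value, str)


def adult_age(value):
    """DatatypeRestriction(xsd:integer minInclusive 18)."""
    return is_integer(value) and value >= 18


def non_negative(value):
    return is_integer(value) and value >= 0


def non_positive(value):
    return is_integer(value) and value <= 0


def positive(value):
    return is_integer(value) and value > 0


def intersection_of(*ranges):
    return lambda value: all(r(value) for r in ranges)


def union_of(*ranges):
    return lambda value: any(r(value) for r in ranges)


def complement_of(data_range):
    return lambda value: not data_range(value)


# Non-negative AND non-positive integers: only 0 qualifies.
only_zero = intersection_of(non_negative, non_positive)
string_or_integer = union_of(is_string, is_integer)
not_positive_integer = complement_of(positive)

# User-defined datatype: a string restricted by a pattern.
PLATE = re.compile(r"[A-Z]{2} [0-9]{3}[A-Z]{2}")


def italian_plate(value):
    return is_string(value) and PLATE.fullmatch(value) is not None


print(adult_age(18), only_zero(0), italian_plate("AB 123CD"))
```

Note that the complement of a data range contains every value outside it, so not_positive_integer accepts strings as well as zero and negative integers.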

Feature 3: Metamodelling and Annotations Restricted form of metamodelling via punning, e.g.: SnowLeopard subClassOf BigCat (i.e., SnowLeopard is a class); SnowLeopard type EndangeredSpecies (i.e., SnowLeopard is an individual). Classes and individuals can have the same name: SnowLeopard is used both as a class and as an individual. Annotations of axioms as well as entities, e.g.: SnowLeopard type EndangeredSpecies (source: WWF). Even annotations of annotations are possible.

Feature 4: Syntactic sugar Syntax used to make things easier to read or to express. It makes the language "sweeter" for humans to use. DisjointUnion: e.g. Element is the DisjointUnion of Earth, Wind, Fire, and Water, i.e. Element is equivalent to the union of Earth, Wind, Fire, and Water, and Earth, Wind, Fire, and Water are pairwise disjoint. DisjointClasses: a set of classes, all of which are pairwise disjoint, e.g. nothing can be both a LeftLung and a RightLung. NegativeObjectPropertyAssertion: states that a property does not hold between two individuals, e.g. patient John does not live in Povo. NegativeDataPropertyAssertion: states that a property does not hold between an individual and a literal, e.g. John is not 5 years old.
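The meaning of DisjointUnion can be checked on finite examples by treating classes as Python sets of individuals. This is a sketch under that assumption only; the member names below are invented for illustration, and the helper names are hypothetical.

```python
from itertools import combinations


def pairwise_disjoint(*classes):
    """True if no two of the given classes share a member."""
    return all(a.isdisjoint(b) for a, b in combinations(classes, 2))


def is_disjoint_union(whole, *parts):
    """True if whole equals the union of parts and the parts do not overlap."""
    return pairwise_disjoint(*parts) and set().union(*parts) == whole


# Hypothetical members for each class.
earth = {"rock", "sand"}
wind = {"breeze"}
fire = {"flame"}
water = {"rain", "sea"}
element = earth | wind | fire | water

print(is_disjoint_union(element, earth, wind, fire, water))
```

Writing the two conditions separately (equivalent to the union, pairwise disjoint) is what DisjointUnion saves the ontology author from doing by hand.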

Profiles Profiles are sublanguages of OWL 2. There are three profiles: OWL 2 EL, OWL 2 QL, and OWL 2 RL.

OWL 2 EL The EL acronym reflects the profile's basis in the EL family of description logics. This logic is also called the small description logic (DL) EL. It allows conjunction and existential restrictions. It does not allow disjunction or universal restrictions. It can capture the expressive power used by many large-scale ontologies.

OWL 2 QL The QL acronym reflects its relation to the standard relational Query Language. It does not allow existential and universal restrictions to a class expression or a data range. These restrictions enable a tight integration with RDBMSs: reasoners can be implemented on top of standard relational databases. It can answer complex queries.

OWL 2 RL The RL acronym reflects its relation to Rule Languages. OWL 2 RL is designed to accommodate: OWL 2 applications that can trade the full expressivity of the language for efficiency; RDF(S) applications that need some added expressivity from OWL 2. Existential quantification to a class, and union and disjoint union of class expressions, are not allowed. These restrictions allow OWL 2 RL to be implemented using rule-based technologies such as rule-extended DBMSs.

Profiles Profile selection depends on: the expressiveness required by the application; the priority given to reasoning on classes or on data; the size of the datasets.

Using OWL to Define Classes

Constructing Classes using Set Operators OWL gives you the ability to construct classes using these set operators: intersectionOf unionOf complementOf

Class Constructors OWL classes can be constructed from other classes in a variety of ways: Intersection (Boolean AND) Union (Boolean OR) Complement (Boolean NOT) Restriction Class construction is the basis for description logic.
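As a rough analogy, the class constructors behave like set operations over individuals. The Python sketch below assumes a small, closed set of individuals (DOMAIN) so that a complement can be computed; OWL itself makes no such closed-world assumption, so this shows the intuition and not OWL semantics. All names and individuals are hypothetical.

```python
# Hypothetical individuals and classes, modelled as plain Python sets.
DOMAIN = {"rex", "tom", "tweety", "ann", "bob"}
Person = {"ann", "bob"}
Dog = {"rex"}
Cat = {"tom"}
Pet = {"rex", "tom", "tweety"}

# Hypothetical property, stored as (subject, value) pairs.
has_owner = {("rex", "ann"), ("tom", "bob")}

dog_or_cat = Dog | Cat        # unionOf (Boolean OR)
pet_dog = Pet & Dog           # intersectionOf (Boolean AND)
non_person = DOMAIN - Person  # complementOf (Boolean NOT), relative to DOMAIN


def some_values_from(prop, cls):
    """Restriction: individuals with at least one prop value in cls."""
    return {subject for (subject, value) in prop if value in cls}


owned_by_person = some_values_from(has_owner, Person)
print(dog_or_cat, pet_dog, non_person, owned_by_person)
```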

OWL Class Constructors

OWL Axioms Axioms are (mostly) reducible to inclusion ($\sqsubseteq$): $C \equiv D$ iff both $C \sqsubseteq D$ and $D \sqsubseteq C$.

OWL vs. Database Advantages of using OWL to define an Ontology: Extensible: much easier to add new properties. Contrast with a database - adding a new column may break a lot of applications (see example on next slide) Portable: much easier to move an OWL document than to move a database. Advantages of using a Database to define an Ontology: Mature: the database technology has been around a long time and is very mature.

The semantic Web architecture is composed of a series of standards organized into a structure that expresses their interrelationships. This architecture is represented using a diagram. It starts with the foundation of URIs and Unicode. On top of that is the syntactic interoperability layer in the form of XML, which in turn underlies RDF and RDF Schema (RDFS). Web ontology languages are built on top of RDF(S). The last three layers are logic, proof, and trust, which have not been significantly explored. Some of the layers rely on the digital signature component to ensure security.

Semantic Web Layer Cake Illustrates the different parts of the semantic Web architecture. First proposed by Tim Berners-Lee.

Evolution of the Web

Logic and Proof Current semantic Web research Good: systems can understand basic concepts (subclass, inverse etc.) Better: if we could state any logical principles we wanted to. Logical statements (rules) that allow the computer to make inferences and deductions.

Logic I am an employee of MemberCo. MemberCo is a member of W3C. MemberCo has GET access to http://www.w3.org/Member/. I (therefore) have access to http://www.w3.org/Member/.

Example (deduction) If someone sells more than 100 products, then they are a member of the Super Salesman club. John sold 102 things; therefore John is a member of the Super Salesman club. More complex rules and inference engines are being explored.

Proof Different people can write logic statements Machines can follow semantic links to prove facts Prove John is a Super Salesman - Sales: John sold 55 widgets + 47 sprockets - Widgets + sprockets: company products - 55 + 47 =102 - 102 > 100 - Super Salesman rule - Proved: John is a Super Salesman A Web of information processors (e.g. P2P)
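The proof steps for John can be mirrored by a short Python sketch that applies the rule and records each fact it used. The function name prove_super_salesman and the way sales are stored (a dictionary from product to quantity) are assumptions for this example; the numbers and the threshold of 100 come from the slides.

```python
def prove_super_salesman(sales, company_products, threshold=100):
    """Return (proved, steps): whether the rule fires, plus the facts used."""
    steps = []
    total = 0
    for item, quantity in sales.items():
        # Only items that are company products count towards the total.
        if item in company_products:
            steps.append(f"{item} is a company product: {quantity} sold")
            total += quantity
    steps.append(f"total products sold = {total}")
    proved = total > threshold
    steps.append(f"{total} > {threshold} is {proved}")
    return proved, steps


# John sold 55 widgets and 47 sprockets; both are company products.
proved, steps = prove_super_salesman(
    {"widgets": 55, "sprockets": 47},
    {"widgets", "sprockets"},
)
for step in steps:
    print(step)
print("John is a Super Salesman:", proved)
```

Returning the list of steps alongside the result is the point of the proof layer: another machine can check each step instead of trusting the conclusion.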

Proof MemberCo's document employList lists me as an employee. W3C's member list includes MemberCo. The ACLs for http://www.w3.org/Member/ assert that employees of members have GET access.
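The access example combines three independent sources into one conclusion. A minimal Python sketch is given below, assuming each source is already available as a simple data structure. The function name, the policy label "employees-of-members", and the employee name "me" are hypothetical and exist only for illustration.

```python
def has_get_access(person, url, employee_lists, member_list, acl):
    """True if the ACL grants GET to employees of members and
    person is listed as an employee of some member organisation."""
    if acl.get(url) != "employees-of-members":
        return False
    return any(
        person in staff
        for organisation, staff in employee_lists.items()
        if organisation in member_list
    )


# Source 1: MemberCo's employList document.
employee_lists = {"MemberCo": {"me"}}
# Source 2: W3C's member list.
member_list = {"MemberCo"}
# Source 3: the ACL for the protected page.
acl = {"http://www.w3.org/Member/": "employees-of-members"}

print(has_get_access("me", "http://www.w3.org/Member/", employee_lists, member_list, acl))
```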
