Post on 04-Jun-2015
http://creativecommons.org/licenses/by-sa/3.0
Handling markup overlaps using OWL
Angelo Di Iorio (diiorio@cs.unibo.it)
Silvio Peroni (speroni@cs.unibo.it)
Fabio Vitali (fabio@cs.unibo.it)
Summary
• Overlapping markup in everyday life
• EARMARK: an OWL-based meta-markup language
• Conclusions and future work
Overlapping markup... wait, what?
• A definition: overlapping markup “describes cases where some markup structures do not nest neatly into others”
  DeRose, S. (2004). Markup Overlap: A Review and a Horse. In Proceedings of Extreme Markup Languages 2004. Montreal, Canada.
    <body>
      <p>Some <em>very</p>
      <p>interesting</em> text</p>
    </body>
• Different techniques to embed overlap in XML hierarchies, for instance:
✦ milestones – expressed through empty elements that mark the boundaries of the content
    <body>
      <p>Some <em start="id1"/>very</p>
      <p>interesting<em end="id1"/> text</p>
    </body>
✦ fragmentation – expressed by two non-overlapping elements linked through id-idref pairs
    <body>
      <p>Some <em id="em1" next="em2">very</em></p>
      <p><em id="em2">interesting</em> text</p>
    </body>
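To make the fragmentation technique concrete, here is a minimal Python sketch that rebuilds the logical em element by following the id/next chain from the example above. The function name join_fragments is hypothetical, and the sketch assumes that the linking attributes are literally called id and next, as in the slide.

```python
import xml.etree.ElementTree as ET

# The fragmentation example from the slide, with straight quotes.
FRAGMENTED_XML = """<body>
  <p>Some <em id="em1" next="em2">very</em></p>
  <p><em id="em2">interesting</em> text</p>
</body>"""


def join_fragments(xml_source, first_id):
    """Follow the id -> next chain and return the text of each fragment in order."""
    root = ET.fromstring(xml_source)
    # Index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    pieces = []
    seen = set()  # guards against a cyclic chain of next references
    current = by_id.get(first_id)
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))
        pieces.append("".join(current.itertext()))
        # A fragment without a next attribute ends the chain.
        current = by_id.get(current.get("next"))
    return pieces
```

Calling join_fragments(FRAGMENTED_XML, "em1") walks from em1 to em2, so the two physical elements can be read as one emphasised span that crosses the paragraph boundary.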
Overlapping everywhere
• Where we can find it: word processor formats + change tracking (e.g., ODT)
    <office:text>
      <text:changed-region text:id="S1">
        <text:insertion>
          <office:change-info>
            <dc:creator>John Smith</dc:creator>
            <dc:date>2009-10-27T18:45:00</dc:date>
          </office:change-info>
        </text:insertion>
      </text:changed-region>
      <text:p>The beginning and <text:change-start text:change-id="S1"/></text:p>
      <text:p>also<text:change-end text:change-id="S1"/>the end.</text:p>
    </office:text>
[Figure: "What the document is" versus "What the document represents". Before the change dated 2009-10-27T18:45:00: office:text with a single text:p containing "The beginning and the end." After: office:text with two text:p elements, where "also" was inserted by John Smith.]
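The gap between what the document is and what it represents can be illustrated with a small Python sketch that recovers, for each author, the text lying between a text:change-start and its matching text:change-end. This is only a sketch under stated assumptions: the function name insertions_by_author is hypothetical, the namespace declarations are added because the slide omits them, and only insertions described with dc:creator are handled.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}
TEXT = "{" + NS["text"] + "}"  # ElementTree spells qualified names as {uri}local

# The ODT fragment from the slide, plus the namespace declarations it needs.
ODT_FRAGMENT = """<office:text
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <text:changed-region text:id="S1"><text:insertion><office:change-info>
    <dc:creator>John Smith</dc:creator><dc:date>2009-10-27T18:45:00</dc:date>
  </office:change-info></text:insertion></text:changed-region>
  <text:p>The beginning and <text:change-start text:change-id="S1"/></text:p>
  <text:p>also<text:change-end text:change-id="S1"/>the end.</text:p>
</office:text>"""


def insertions_by_author(xml_source):
    """Map each author to the text pieces enclosed by their insertion milestones."""
    root = ET.fromstring(xml_source)
    # Step 1: read the change metadata (change id -> author).
    authors = {}
    for region in root.findall("text:changed-region", NS):
        creator = region.findtext(
            "text:insertion/office:change-info/dc:creator", namespaces=NS)
        if creator is not None:
            authors[region.get(TEXT + "id")] = creator

    inserted = {}
    open_ids = []  # change ids whose start milestone has been seen

    def record(text):
        if text and text.strip():
            for change_id in open_ids:
                if change_id in authors:
                    inserted.setdefault(authors[change_id], []).append(text.strip())

    # Step 2: walk the tree in document order, toggling milestones.
    def walk(node):
        if node.tag == TEXT + "changed-region":
            return  # metadata only, not document text
        if node.tag == TEXT + "change-start":
            open_ids.append(node.get(TEXT + "change-id"))
        elif node.tag == TEXT + "change-end":
            open_ids.remove(node.get(TEXT + "change-id"))
        record(node.text)
        for child in node:
            walk(child)
        record(node.tail)

    walk(root)
    return inserted
```

Even in this tiny case the answer requires joining metadata with a document-order traversal that ignores the paragraph hierarchy. The sketch also drops the fact that the paragraph break itself belongs to the insertion.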
• EARMARK is a vocabulary that defines a meta-markup language by means of OWL ontologies – http://www.essepuntato.it/2008/12/earmark
• It is more expressive than XML
• Three disjoint base classes:
✦ Docuverse – it represents the textual content of a document
  Subclasses: StringDocuverse, URIDocuverse
✦ Range – it describes any text lying between two locations
  Subclasses: PointerRange, XPathRange, XPathPointerRange
✦ MarkupItem – a collection of individuals belonging to the classes MarkupItem and Range
  Subclasses: Element, Attribute, Comment
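As a rough illustration of how the three base classes fit together, the following Python sketch models one subclass of each as a plain data class. It is not the EARMARK ontology or its Java API: the attribute names, the text() helper and the zero-based, slice-like reading of range locations are all assumptions made for the example, and the docuverse joins the two paragraphs of the earlier XML example with a space.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StringDocuverse:
    """Textual content of a document, held as a plain string."""
    content: str


@dataclass(frozen=True)
class PointerRange:
    """Text lying between two locations of a docuverse (assumed slice-like)."""
    refers_to: StringDocuverse
    begins: int
    ends: int

    def text(self):
        return self.refers_to.content[self.begins:self.ends]


@dataclass
class Element:
    """Markup item: an ordered collection of ranges and other markup items."""
    general_identifier: str
    items: list = field(default_factory=list)

    def text(self):
        return "".join(item.text() for item in self.items)


# Hypothetical docuverse for the "Some very interesting text" example.
doc = StringDocuverse("Some very interesting text")
first_p = Element("p", [PointerRange(doc, 0, 9)])
second_p = Element("p", [PointerRange(doc, 10, 26)])
em = Element("em", [PointerRange(doc, 5, 21)])  # overlaps both paragraphs
```

Because ranges point into the docuverse instead of wrapping it, em can cover parts of both paragraphs without any milestone or fragmentation trick.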
                  XML                     EARMARK
Data structure    Tree                    DAG
Overlapping       Only by using tricks    Of course, it is a feature here
Semantics         What?                   Yes, it is OWL!
An example
    @prefix earmark: <http://www.essepuntato.it/2008/12/earmark#> .
    @prefix c: <http://swan.mindinformatics.org/ontologies/1.2/collections/> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix : <http://www.example.com/> .

    :aDoc a earmark:StringDocuverse ;
        earmark:hasContent "The beginning and the end."^^xsd:string .

    :r2 a earmark:PointerRange ;
        earmark:refersTo :aDoc ;
        earmark:begins "14"^^xsd:nonNegativeInteger ;
        earmark:ends "26"^^xsd:nonNegativeInteger .

    :aMarkupItem a earmark:Element ;
        earmark:hasGeneralIdentifier "p" ;
        earmark:hasNamespace "urn:oasis:names:tc:opendocument:xmlns:text:1.0" ;
        c:firstItem :item1 ;
        c:lastItem :item2 .
    :item1 c:itemContent :r1 ; c:nextItem :item2 .
    :item2 c:itemContent :r2 .

    :p2 a Insertion ;
        dc:creator "John Smith" ;
        dc:date "2009-10-27T18:45:00"^^xsd:dateTime .
[Figure: the docuverse "The beginning and the end." and the string "also", with the office:text and text:p markup items built over them and the assertion "inserted by John Smith".]
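The way a markup item reaches its text through the c:firstItem / c:nextItem list can be sketched in a few lines of Python. The dictionaries below mirror the statements in the example; the function name element_text is hypothetical, the boundaries of :r1 are an assumption (the slide never defines that range), and locations are read as zero-based offsets in the style of Python slices.

```python
DOCUVERSE = "The beginning and the end."

# Range name -> (begins, ends). :r2 comes from the slide;
# :r1 is a hypothetical range chosen only to complete the example.
RANGES = {
    ":r1": (0, 14),
    ":r2": (14, 26),
}

# The list of the markup item: each item has a content and a next item.
ITEMS = {
    ":item1": {"content": ":r1", "next": ":item2"},
    ":item2": {"content": ":r2", "next": None},
}


def element_text(first_item, items, ranges, content):
    """Walk the item list from first_item and return each range's text."""
    pieces = []
    current = first_item
    while current is not None:
        begins, ends = ranges[items[current]["content"]]
        pieces.append(content[begins:ends])
        current = items[current]["next"]
    return pieces
```

Under these assumptions :r2 selects "and the end.", and walking the list from :item1 yields the two pieces in order.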
EARMARK Data Structure
• It is an API and a Java library that makes it easy to create and modify EARMARK documents within Java applications
• Open Source project: http://earmark.sourceforge.net
    // Create an empty EARMARK document identified by a URI
    EARMARKDocument ed = new EARMARKDocument(new URI("http://www.example.com"));
    // Add the textual content as a string docuverse
    Docuverse aDoc = ed.createStringDocuverse("The beginning and the end.");
    [...]
    // Select the text between locations 14 and 26 of the docuverse
    Range aRange = ed.createPointerRange(aDoc, 14, 26);
    [...]
    // Create a "p" element in the ODF text namespace, with its content held as a list
    Element aMarkupItem = ed.createElement("p", "urn:oasis:names:tc:opendocument:xmlns:text:1.0", Collection.Type.List);
    ed.appendChild(anotherMarkupItem);
    [...]
Semantic Web technologies as added value
• Because every EARMARK document is expressed as a proper ABox of an ontology, we can use Semantic Web technologies:
✦ to manipulate documents
✦ to query them
✦ to infer new assertions
✦ to check integrity constraints on document structure and on content semantics
• In EARMARK, those technologies can help solve issues that are difficult, or impossible, to solve with XML tools
• An example: “get all the text fragments inserted by John Smith”
✦ XPath
    for $id in //@text:id[../text:insertion//(dc:creator[. = 'John Smith'] |
                          @office:chg-author[. = 'John Smith'])]
    return //text:p//text()[
        (preceding-sibling::text:change-start[1][@text:change-id = $id] and
         following-sibling::text:change-end[1][@text:change-id = $id])
        or ancestor::text:changed-region/@text:id = $id]
✦ SPARQL
    SELECT ?r WHERE {
        ?r a earmark:Range , Insertion ;
           dc:creator "John Smith" .
    }
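To show why the SPARQL version stays so short, here is a toy Python sketch that evaluates the same three triple patterns over an in-memory set of triples. It is not a SPARQL engine and performs no OWL inference; the triples, the resource names :r1 to :r3, the author "Jane Doe" and both function names are hypothetical, invented only for this illustration.

```python
RDF_TYPE = "rdf:type"

# Hypothetical ABox: three ranges, two of them insertions by different authors.
TRIPLES = {
    (":r1", RDF_TYPE, "earmark:Range"),
    (":r2", RDF_TYPE, "earmark:Range"),
    (":r2", RDF_TYPE, "Insertion"),
    (":r2", "dc:creator", "John Smith"),
    (":r3", RDF_TYPE, "earmark:Range"),
    (":r3", RDF_TYPE, "Insertion"),
    (":r3", "dc:creator", "Jane Doe"),
}


def subjects(triples, predicate, obj):
    """Subjects of all triples that match the pattern (?s, predicate, obj)."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def ranges_inserted_by(triples, author):
    """Resources that are a Range, an Insertion, and created by the author."""
    matches = (
        subjects(triples, RDF_TYPE, "earmark:Range")
        & subjects(triples, RDF_TYPE, "Insertion")
        & subjects(triples, "dc:creator", author)
    )
    return sorted(matches)
```

Each pattern in the WHERE clause becomes one set of candidate subjects, and the shared variable ?r becomes a set intersection. Nothing in the query depends on where the insertion sits in the document hierarchy, which is what makes the XPath version so much longer.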
Conclusions and future work
• We presented a new meta-markup language called EARMARK, defined by means of OWL ontologies, that makes it possible to express very complex markup documents
• We applied it to a real-case scenario (the ODT format with change tracking), showing how it lets us handle, manipulate and query complex documents better than XML does
• Future work on this topic includes:
✦ Rocco and Fretta, two ongoing projects that allow transformations from XML documents (with overlapping markup specified by using tricks) to EARMARK documents, and vice versa
✦ a formalism to specify explicitly the semantics of markup and of textual content
✦ a word processor that makes it simple to define EARMARK documents, with the possibility of adding any kind of semantic assertion to any entity of the document (both markup items and textual content)
Late time example: A more complex ODT document...
    <office:text>
      <text:changed-region text:id="S2">
        <text:deletion>
          <office:change-info>
            <dc:creator>Silvio Peroni</dc:creator>
            <dc:date>2009-10-27T18:45:00</dc:date>
          </office:change-info>
          <text:p>.</text:p>
        </text:deletion>
        <text:insertion>
          <office:change-info office:chg-author="Angelo Di Iorio"
              office:chg-date-time="2009-10-27T18:42:00"/>
        </text:insertion>
      </text:changed-region>
      <text:changed-region text:id="A2">
        <text:insertion>
          <office:change-info>
            <dc:creator>Angelo Di Iorio</dc:creator>
            <dc:date>2009-10-27T18:42:00</dc:date>
          </office:change-info>
        </text:insertion>
      </text:changed-region>
      [...]
      <text:p>This is one paragraph<text:change-start text:change-id="S1"/>; actually, it was!<text:change-end text:change-id="S1"/>
        <text:change text:change-id="S2"/>
        <text:change-start text:change-id="A2"/></text:p>
      <text:p><text:change-end text:change-id="A2"/>
        <text:change text:change-id="A3"/><text:change-start text:change-id="A4"/>S
        <text:change-end text:change-id="A4"/>plit in two.</text:p>
    </office:text>
... and its representation in EARMARK
[Figure: the EARMARK document laid out along a TIME axis, in four layers: docuverses, ranges, markup items, assertions.]
Text strings shown: "This is one paragraph that will be split in two.", "; actually, it was!", ".", "S"
Ranges: r1, r2, r3, r4, r5, r6
Markup items: text, p
Assertions:
  a text:insertion ; dc:creator "Silvio Peroni" ; dc:date "2009-10-27T18:45:00"
  a text:deletion ; dc:creator "Silvio Peroni" ; dc:date "2009-10-27T18:45:00"
  a text:insertion ; dc:creator "Angelo Di Iorio" ; dc:date "2009-10-27T18:42:00"
  a text:deletion ; dc:creator "Angelo Di Iorio" ; dc:date "2009-10-27T18:42:00"
Legend: begin location, end location, string in the range, docuverse content