10/14/2001
Coping with Semantics in XML Document Management
Thomas Kudrass
Leipzig University of Applied Sciences
Department of Computer Science and Mathematics
Overview
Introduction
– Motivation
– XML: A Semantic Perspective
– XML Document Types
XML Semantic Problems
– XML: A Database Perspective
– Common Mapping Problems
RM-ODP Viewpoints on XML Documents
– Content View vs. Logical Layout View
– Example
Realization of XML Document Management: Nesting of Viewpoints
Conclusions
Motivation
Aim: XML Document Management using Database Systems
Problem: Map XML Documents to Databases
– different approaches
– no mapping rules
– many open issues
Reason: Semantics of XML not well understood
– XML: only syntax, no predefined semantics
• Introduction
• XML Semantic Problems
• Viewpoints on XML Documents
• Realization
• Conclusions
XML - A Semantic Perspective
User-Defined Markup
– structure the character data of a document
– explain the document through the use of names
Naming
– RM-ODP: “A name is a term that refers to an entity in a given naming context.”
– XML namespaces are no solution
– possible improvement: shared ontologies
No Standard Behavior of Tags
– XSL processors: flexible presentation of XML document
– XML processor: check well-formedness and validity of the XML document
– open issue: document object semantics
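A minimal sketch of the point that an XML processor checks syntax only, using Python's standard xml.etree.ElementTree parser. The sample documents and the helper name is_well_formed are assumptions made for illustration. The parser accepts meaningful and meaningless tag names alike and rejects only the syntactically broken document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Same structure, once with descriptive names and once with opaque names.
descriptive = "<order><expiry>11/2001</expiry></order>"
opaque = "<x1><x2>11/2001</x2></x1>"
# The closing tag for <expiry> is missing, so this is not well-formed.
broken = "<order><expiry>11/2001</order>"

for sample in (descriptive, opaque, broken):
    print(is_well_formed(sample), sample)
```

The parser treats the first two documents identically; whatever "expiry" means is left to the reader or to the application.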
XML Document Types
Data-Centric Documents
– designed for machine consumption (XML for data transport)
– examples: sales orders, stock quotes, flight schedules
– fairly regular structure
– fine-grained data
Document-Centric Documents
– designed for human readers
– examples: books, journal articles, emails
– less regular structure
– coarse-grained data
Hybrid Documents
– composition of documents of different types
– example: medical documents = patient data + findings + prescriptions + procedures
Document type determines the requirements for the document management system
XML - A Database Perspective
Round-Trip Problem
– store an XML document in a database and retrieve the “same” document back again
– vital to applications required by law to keep exact copies of documents
– less important for data-centric documents
• focus on the document content
• ignore the order of sibling elements
– many XML-to-DB algorithms don't preserve the whole document
• CDATA sections
• character entities
• comments
• processing instructions
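The round-trip loss can be reproduced with a small sketch. It assumes Python's standard xml.etree.ElementTree as a stand-in for an XML-to-DB mapping layer; the sample document and the function name naive_round_trip are made up for illustration. With its default settings this parser keeps the content but drops comments and processing instructions, merges CDATA sections into plain text, and resolves character references.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing the constructs listed above.
original = (
    '<?xml version="1.0"?>'
    "<note>"
    "<!-- reviewed by legal -->"
    '<?render mode="draft"?>'
    "<text><![CDATA[a < b]]> &#169; 2001</text>"
    "</note>"
)


def naive_round_trip(xml_text: str) -> str:
    """Parse into a tree ("store") and serialize it again ("retrieve")."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


restored = naive_round_trip(original)
print(restored)
print("identical to original:", restored == original)
```

The text content survives, so the result is adequate for a data-centric use; it is not the "same" document in the sense required for exact copies.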
Common Mapping Problems (1)
Attributes vs. Element Text
– where to store data of a document?
– both alternatives possible, influenced by the implementation
Meaning of Attributes
– ambiguities when interpreting attributes
– example: a customer's order has an attribute expiry date = “11/2001” with different possible meanings:
• “The order will expire in Nov. 2001”
• “The information about the order can be thrown away in Nov. 2001”
• “The expiry date is information about the credit card used for purchase”
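A short sketch of the attribute vs. element text choice, assuming Python's xml.etree.ElementTree. The element and attribute names (order, id, expiry) and the helper flatten are hypothetical. Both encodings carry the same fields, and neither one says which of the three meanings of the expiry date is intended.

```python
import xml.etree.ElementTree as ET

# The same order data, once stored in attributes and once in element text.
as_attributes = '<order id="O1" expiry="11/2001"/>'
as_elements = "<order><id>O1</id><expiry>11/2001</expiry></order>"


def flatten(xml_text: str) -> dict:
    """Collect attributes and child element text into one field dictionary."""
    root = ET.fromstring(xml_text)
    fields = dict(root.attrib)
    for child in root:
        fields[child.tag] = (child.text or "").strip()
    return fields


print(flatten(as_attributes))
print(flatten(as_elements))
```

A mapping layer that normalizes both forms to the same record hides the implementation choice, but the semantic ambiguity remains and has to be documented elsewhere.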
Common Mapping Problems (2)
Null Values
– different semantics of null values
– database null values have to be reflected in XML documents
– XML Schema:
• null values in an element's text can be expressed
• no concept of null for attributes
– DTD: optional elements and attributes
Comments, Processing Instructions
– not considered content of the document by many algorithms
Markup
– visible in the logical document layout (e.g., character entities)
– substituted in the physical representation of the document
– Example:
• <foo/> stored in a database
• non-XML-aware databases don't recognize the markup <foo/>
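The different null semantics can be made concrete with a sketch that assumes Python's xml.etree.ElementTree and the XML Schema instance attribute xsi:nil. The customer document and the function classify are hypothetical. An absent element, an element marked nil, and an empty element are three distinct cases that a mapping to a database column has to tell apart.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical customer record: <phone> is explicitly nil, <fax> is empty,
# and <email> does not occur at all.
doc = f"""<customer xmlns:xsi="{XSI}">
  <name>Smith</name>
  <phone xsi:nil="true"/>
  <fax></fax>
</customer>"""

root = ET.fromstring(doc)


def classify(parent, tag):
    """Return (kind, value) where kind is 'absent', 'nil' or 'value'."""
    node = parent.find(tag)
    if node is None:
        return ("absent", None)      # optional element left out
    if node.get(f"{{{XSI}}}nil") in ("true", "1"):
        return ("nil", None)         # explicit null via xsi:nil
    return ("value", node.text or "")  # present, possibly the empty string


for tag in ("name", "phone", "fax", "email"):
    print(tag, classify(root, tag))
```

Whether 'absent' and 'nil' both map to a database NULL, or only one of them, is a design decision of the mapping and not something the XML document settles.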
Common Mapping Problems (3)
Links
– links originally designed for documents and document fragments, e.g., XPointers point to document subtrees using XPath
– not adequate to express semantic relationships among document elements
• e.g., ID: identifier value - primary key; IDREF - foreign key; behavioral semantics?
– another language more appropriate to specify the invariants
Sibling Orders
– particularly important for document-centric documents
– can be arbitrary in data-centric documents
Other Invariants (e.g., identity constraints)
– specified on the level of instances - not schema
– construct the set of all concerned objects (using XPath) before
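A sketch of an instance-level invariant check in the spirit of the ID / IDREF analogy, assuming Python's xml.etree.ElementTree and its limited XPath support. The document, the attribute names id and customer, and the function dangling_references are hypothetical. The set of concerned objects is collected with path expressions first, and the foreign-key-like condition is evaluated over that set.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: order O2 refers to a customer that does not exist.
doc = """<shop>
  <customer id="C1"/>
  <customer id="C2"/>
  <order id="O1" customer="C1"/>
  <order id="O2" customer="C9"/>
</shop>"""

root = ET.fromstring(doc)


def dangling_references(root):
    """Return the ids of orders whose customer reference has no target."""
    # Step 1: construct the set of all concerned objects via path expressions.
    known_customers = {c.get("id") for c in root.findall("./customer")}
    orders = root.findall("./order")
    # Step 2: check the invariant on the instance level.
    return [o.get("id") for o in orders if o.get("customer") not in known_customers]


print(dangling_references(root))
```

The check says nothing about behavior, for example what should happen to O2 when the reference breaks; that part of the semantics needs another language.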
RM-ODP Viewpoints on Documents
Physical Presentation View
– dependent on media, screen size / paper size
– document = composition of characters with attributes (font, size, style)
– XML character entities replaced
Logical Layout View
– composition of prose components (paragraphs, sections, lists, list items) and other objects (e.g., frames, code sections)
– mostly ordered composition in document-centric documents
– many possible physical presentation views
Content View
– composition of information objects (title, author, abstract, body, bibliography)
– can be organized in a hierarchical structure or can be flat
– mapped to several logical layouts
Content View vs. Logical Layout
Content View
– document-centric documents
• information viewpoint in DTD or XML Schema
• some constructs to specify structural constraints (e.g., cardinality constraints in XML Schema)
– data-centric documents
• structure not very relevant
• many invariants among content elements cannot be adequately expressed in DTD / XML Schema
• possible abuse of XLink / XPointers to specify relationships among content elements
Logical Layout
– document-centric documents
• may follow the structure of the content
– data-centric documents
• often arbitrary
Data-Centric Documents: Content View
Example
Integrity Constraints:
– The overall value of an order must exceed a certain minimum.
– A customer can submit at most 5 orders.
– If a customer is deleted, all of his orders have to be cancelled.
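[Figure: content model of the example. An Order consists of a Header (1,1) and Line Items (1,N); Customer and Product are further entities. Relationship labels in the diagram: C, D, Rel.]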
How to map to an XML document? Or: how to map to the logical layout view?
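The three integrity constraints of this slide can be sketched as plain Python checks on a content-view object model. The class names, the field names, and the threshold MIN_ORDER_VALUE are assumptions made for illustration; the slide only fixes the limit of 5 orders per customer. None of these rules depends on how the order is later nested in an XML document.

```python
from dataclasses import dataclass, field

MIN_ORDER_VALUE = 10.0        # assumed minimum; the slide gives no number
MAX_ORDERS_PER_CUSTOMER = 5   # taken from the slide


@dataclass
class LineItem:
    product: str
    price: float
    quantity: int


@dataclass
class Order:
    order_id: str
    items: list
    status: str = "open"

    def value(self) -> float:
        return sum(item.price * item.quantity for item in self.items)


@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)

    def submit(self, order: Order) -> None:
        # Constraint 1: the overall value must exceed the minimum.
        if order.value() <= MIN_ORDER_VALUE:
            raise ValueError("order value does not exceed the minimum")
        # Constraint 2: a customer can submit at most 5 orders.
        if len(self.orders) >= MAX_ORDERS_PER_CUSTOMER:
            raise ValueError("customer already has the maximum number of orders")
        self.orders.append(order)


def delete_customer(customers: dict, name: str) -> Customer:
    """Constraint 3: deleting a customer cancels all of his orders."""
    customer = customers.pop(name)
    for order in customer.orders:
        order.status = "cancelled"
    return customer
```

The constraints are stated once, on the content view; any XML layout chosen afterwards only has to be able to reproduce these objects.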
Data-Centric Documents: Logical Layout View
Alternative 1
<Customer> C1 ...
  <Order> O1 ...
    <Item> ... </Item>
    <Item> ... </Item>
    ...
  </Order>
  <Order> O2 ...
    <Item> ... </Item>
    <Item> ... </Item>
    ...
  </Order>
  ...
</Customer>
<Customer> C2 ...
  <Order> O3 ...
    <Item> ... </Item>
    <Item> ... </Item>
    ...
  </Order>
  <Order> O4 ...
    <Item> ... </Item>
    <Item> ... </Item>
    ...
  </Order>
  ...
Alternative 2
<Order> O1 ...
  <Customer> C1 ... </Customer>
  <Item> ... </Item>
  <Item> ... </Item>
  ...
</Order>
<Order> O2 ...
  <Customer> C1 ... </Customer>
  <Item> ... </Item>
  <Item> ... </Item>
  ...
</Order>
<Order> O3 ...
  <Customer> C2 ... </Customer>
  <Item> ... </Item>
  <Item> ... </Item>
  ...
</Order>
<Order> O4 ...
  <Customer> C2 ... </Customer>
  <Item> ... </Item>
  <Item> ... </Item>
  ...
</Order>
...
Alternative 3
<Item> ...
  <Order> O1 ...
    <Customer> C1 ... </Customer>
  </Order>
</Item>
<Item> ...
  <Order> O2 ...
    <Customer> C1 ... </Customer>
  </Order>
</Item>
<Item> ...
  <Order> O3 ...
    <Customer> C2 ... </Customer>
  </Order>
</Item>
...
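A sketch showing that different logical layouts can carry the same content view, assuming Python's xml.etree.ElementTree. The wrapper element Shop, the id attributes, and the item values are made up for illustration and follow the customer-first and order-first nestings of alternatives 1 and 2. Each layout is reduced to a set of (customer, order, item) facts, and the two sets are compared.

```python
import xml.etree.ElementTree as ET

# Layout in the style of alternative 1: orders nested inside customers.
customer_first = """<Shop>
  <Customer id="C1">
    <Order id="O1"><Item>pen</Item><Item>ink</Item></Order>
    <Order id="O2"><Item>paper</Item></Order>
  </Customer>
</Shop>"""

# Layout in the style of alternative 2: customer repeated inside each order.
order_first = """<Shop>
  <Order id="O1"><Customer id="C1"/><Item>pen</Item><Item>ink</Item></Order>
  <Order id="O2"><Customer id="C1"/><Item>paper</Item></Order>
</Shop>"""


def facts_customer_first(xml_text: str) -> set:
    root = ET.fromstring(xml_text)
    return {
        (customer.get("id"), order.get("id"), item.text)
        for customer in root.findall("Customer")
        for order in customer.findall("Order")
        for item in order.findall("Item")
    }


def facts_order_first(xml_text: str) -> set:
    root = ET.fromstring(xml_text)
    return {
        (order.find("Customer").get("id"), order.get("id"), item.text)
        for order in root.findall("Order")
        for item in order.findall("Item")
    }


same = facts_customer_first(customer_first) == facts_order_first(order_first)
print("same content view:", same)
```

The extraction code differs per layout while the resulting set does not, which is why the layout of a data-centric document can be treated as arbitrary.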
Operations
Operations are viewpoint-specific
XML APIs: DOM / XPath
– based on a tree model
– although powerful, not appropriate for set-oriented operations
Viewpoints vs. Operations
– content view: set-oriented operations
– logical layout view: navigating operations (on a tree)
Need another language to express operations in the content view of data-centric documents!
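The contrast between navigating and set-oriented operations can be sketched with Python's xml.etree.ElementTree for the tree side and the standard sqlite3 module as an assumed stand-in for a set-oriented language. The document, the attribute names, and both function names are hypothetical. Both functions compute the total order value per customer; the first walks the tree node by node, the second states the result as one declarative query.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """<Orders>
  <Order id="O1" customer="C1"><Item price="5.0"/><Item price="7.5"/></Order>
  <Order id="O2" customer="C1"><Item price="3.0"/></Order>
  <Order id="O3" customer="C2"><Item price="20.0"/></Order>
</Orders>"""

root = ET.fromstring(doc)


def totals_by_navigation(root) -> dict:
    """Logical layout view: walk the tree and accumulate by hand."""
    totals = {}
    for order in root.findall("Order"):
        customer = order.get("customer")
        for item in order.findall("Item"):
            totals[customer] = totals.get(customer, 0.0) + float(item.get("price"))
    return totals


def totals_by_query(root) -> dict:
    """Content view: load the items as a relation and aggregate over the set."""
    rows = [
        (order.get("customer"), order.get("id"), float(item.get("price")))
        for order in root.findall("Order")
        for item in order.findall("Item")
    ]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE item (customer TEXT, order_id TEXT, price REAL)")
    con.executemany("INSERT INTO item VALUES (?, ?, ?)", rows)
    result = con.execute(
        "SELECT customer, SUM(price) FROM item GROUP BY customer"
    ).fetchall()
    con.close()
    return dict(result)


print(totals_by_navigation(root))
print(totals_by_query(root))
```

The navigating version is tied to the nesting of the document; the query version would stay the same if the layout changed, as long as the same relation can be extracted.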
Realization: Nesting of Viewpoints
[Figure: an XML document seen through three nested views; each view is described along the five RM-ODP viewpoints Enterprise, Information, Computational, Engineering, Technology. Labels recovered from the diagram:]
Content View
– “Store”, “Retrieve”
– Semantic Model
– RDBMS, native XML-DB
Logical Layout View
– “Store”, “Retrieve”
– XML Schema, DTD
– File (Template), Large Object
Presentation View
– “Browse”, “Store”
– SVG, PDF
– Media: Screen, Paper
Conclusions
Analyze the requirements first before building an XML system
– data-centric vs. document-centric documents
– huge impact on the choice of technology (storage platform)
Think in viewpoints to understand the semantics
– mixed occurrence of content view and logical layout in XML documents
– expand viewpoints into the specification of a new system
Use generic relationships for constraint modelling
Beware of the difference between specification and realization