10/14/2001 Coping with Semantics in XML Document Management Thomas Kudrass Leipzig University of Applied Sciences Department of Computer Science and Mathematics




Overview

Introduction

– Motivation

– XML: A Semantic Perspective

– XML Document Types

XML Semantic Problems

– XML: A Database Perspective

– Common Mapping Problems

RM-ODP Viewpoints on XML Documents

– Content View vs. Logical Layout View

– Example

Realization of XML Document Management: Nesting of Viewpoints

Conclusions


Motivation

Aim: XML Document Management using Database Systems

Problem: mapping XML documents to databases

– many different approaches

– no standard mapping rules

– many open issues

Reason: the semantics of XML are not well understood

– XML defines only syntax, no predefined semantics


XML - A Semantic Perspective

User-Defined Markup

– structure the character data of a document

– explain the document through the use of names

Naming

– RM-ODP: “A name is a term that refers to an entity in a given naming context.”

– XML namespaces are no solution: they prevent name clashes but do not define what a name means

– possible improvement: shared ontologies

No Standard Behavior of Tags

– XSL processors: flexible presentation of XML documents

– XML processor: checks well-formedness and validity of the XML document

– open issue: document object semantics
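An XML processor checks syntax but attaches no behavior to tag names. The minimal sketch below illustrates this with Python's standard `xml.etree.ElementTree`, using hypothetical tag names. It covers well-formedness only. Validity checking against a DTD or XML Schema is not provided by this standard-library module.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Same content, different user-defined (hypothetical) tag names:
# both are accepted, and the parser attaches no meaning to either name.
doc_a = "<order><expiry>11/2001</expiry></order>"
doc_b = "<purchase><validUntil>11/2001</validUntil></purchase>"
# A mismatched end tag is a syntax error and is rejected.
broken = "<order><expiry>11/2001</order>"

print(is_well_formed(doc_a), is_well_formed(doc_b), is_well_formed(broken))
```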


XML Document Types

Data-Centric Documents

– designed for machine consumption (XML for data transport)

– examples: sales orders, stock quotes, flight schedules

– fairly regular structure

– fine-grained data

Document-Centric Documents

– designed for human readers

– examples: books, journal articles, emails

– less regular structure

– coarse-grained data

Hybrid Documents

– composition of documents of different types

– example: medical documents = patient data + findings + prescriptions + procedures

The document type determines the requirements on the document management system.
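One rough, practical way to tell the two document types apart is to look for mixed content, meaning text interleaved with child elements. Mixed content is typical of document-centric documents. The Python sketch below is an assumed heuristic, not a classification rule from these slides, and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(root: ET.Element) -> bool:
    """True if some element has non-whitespace text next to child elements."""
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue
        # Text before the first child, plus the text (tail) after each child.
        texts = [elem.text or ""] + [child.tail or "" for child in children]
        if any(t.strip() for t in texts):
            return True
    return False

# Hypothetical samples: a regular, fine-grained order ...
data_centric = ET.fromstring(
    "<order><id>O1</id><item><product>P7</product><qty>2</qty></item></order>")
# ... and a prose paragraph with inline markup.
document_centric = ET.fromstring(
    "<para>XML is <emph>only</emph> syntax; it has no <term>predefined</term> semantics.</para>")

print(has_mixed_content(data_centric))      # False
print(has_mixed_content(document_centric))  # True
```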


XML - A Database Perspective

Round-Trip Problem

– store an XML document in a database and retrieve the “same” document back again

– vital to applications that are required by law to keep exact copies of documents

– less important for data-centric documents

• focus on the document content

• ignore the order of sibling elements

– many XML-to-DB algorithms don't preserve the whole document:

• CDATA sections

• character entities

• comments

• processing instructions
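The round-trip problem can be observed even without a database. With its default tree builder, Python's `xml.etree.ElementTree` drops comments and processing instructions. It also turns CDATA sections and character references into plain text. A minimal sketch with a hypothetical input document:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a comment, a processing instruction,
# a CDATA section and a character reference (&#65; is "A").
original = '<doc><!-- reviewed --><?app mode="x"?><p><![CDATA[a < b]]> &#65;</p></doc>'

# The default parser discards comments and PIs; CDATA and
# character references are merged into ordinary text.
root = ET.fromstring(original)
restored = ET.tostring(root, encoding="unicode")

print(restored)              # <doc><p>a &lt; b A</p></doc>
print(restored == original)  # False
```

Any storage scheme built on such a parse tree inherits these losses unless it records the lost constructs separately.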


Common Mapping Problems (1)

Attributes vs. Element Text

– where should the data of a document be stored?

– both alternatives are possible; the choice is influenced by the implementation

Meaning of Attributes

– ambiguities when interpreting attributes

– example: a customer's order has an attribute expiry date = “11/2001”, which can have different meanings:

• “The order will expire in Nov. 2001.”

• “The information about the order can be thrown away in Nov. 2001.”

• “The expiry date is information about the credit card used for the purchase.”
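The sketch below shows how a mapping might have to cope with both encodings. The element and attribute names are hypothetical. The code can read the value either way, but nothing in the markup tells it which of the three meanings above is intended.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The same hypothetical order, once with the expiry date as an attribute
# and once as element text. Both are well-formed XML.
as_attribute = ET.fromstring('<order id="O1" expiry="11/2001"/>')
as_element = ET.fromstring('<order id="O1"><expiry>11/2001</expiry></order>')

def read_expiry(order: ET.Element) -> Optional[str]:
    """Return the expiry value from either encoding, or None if absent.

    The string is recovered, but not which meaning of "expiry" is intended.
    """
    if "expiry" in order.attrib:
        return order.get("expiry")
    return order.findtext("expiry")

print(read_expiry(as_attribute), read_expiry(as_element))  # 11/2001 11/2001
```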


Common Mapping Problems (2)

Null Values

– different semantics of null values

– database null values have to be reflected in XML documents

– XML Schema:

• null values in an element's text can be expressed (xsi:nil)

• no concept of null for attributes

– DTD: optional elements and attributes
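Here is a minimal sketch of mapping a database row that contains NULLs to XML. It assumes a hypothetical `customer` table with a nullable `name` column, mapped to an element, and a nullable `phone` column, mapped to an attribute. For the element, XML Schema's `xsi:nil` attribute can mark the null. For the attribute, the only option is to leave it out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)

def customer_to_xml(cust_id: str, name: Optional[str], phone: Optional[str]) -> str:
    """Map one hypothetical customer row to XML, keeping NULLs distinguishable."""
    elem = ET.Element("customer", id=cust_id)
    # Attribute column: XML has no null for attributes, so a NULL can only
    # be expressed by omitting the attribute (cf. #IMPLIED in a DTD).
    if phone is not None:
        elem.set("phone", phone)
    name_elem = ET.SubElement(elem, "name")
    if name is None:
        # Element column: XML Schema lets an instance mark the element as nil.
        name_elem.set(f"{{{XSI}}}nil", "true")
    else:
        name_elem.text = name
    return ET.tostring(elem, encoding="unicode")

print(customer_to_xml("C1", None, None))
print(customer_to_xml("C2", "Smith", "555-0100"))
```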

Comments, Processing Instructions

– not considered document content by many algorithms

Markup

– visible in the logical document layout (e.g., character entities)

– substituted in the physical representation of the document

– Example:

• &lt;foo/&gt; is stored in a database

• a non-XML-aware database does not recognize that this stands for the text <foo/>
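The markup example can be reproduced with Python's `xml.sax.saxutils.escape`. The text the author sees differs from the characters that are physically stored. A plain character comparison, which is what a non-XML-aware database performs, therefore misses the text. A sketch:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Logical layout: the reader sees the literal text <foo/> (not an element).
layout_text = "<foo/>"
# Physical representation: the text is stored with its markup escaped.
stored = "<p>" + escape(layout_text) + "</p>"
print(stored)  # <p>&lt;foo/&gt;</p>

# A non-XML-aware store compares raw characters and misses the text ...
print(layout_text in stored)  # False
# ... whereas an XML parser resolves the escapes and recovers it.
print(ET.fromstring(stored).text == layout_text)  # True
print(unescape("&lt;foo/&gt;") == layout_text)    # True
```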


Common Mapping Problems (3)

Links

– originally designed for documents and document fragments, e.g., XPointers point to document subtrees using XPath

– not adequate to express semantic relationships among document elements

• e.g., ID as identifier value (primary key), IDREF as reference (foreign key): what about their behavioral semantics?

– another language would be more appropriate to specify such invariants

Sibling Order

– particularly important for document-centric documents

– can be arbitrary in data-centric documents

Other Invariants (e.g., identity constraints)

– specified at the instance level, not the schema level

– the set of all affected objects must first be constructed (using XPath)
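The sketch below illustrates the last point. It first collects all affected objects with ElementTree's limited XPath support. Only then does it check an identity constraint (unique order ids) and a referential constraint (every order refers to an existing customer). The element and attribute names are hypothetical. A validating DTD parser checks ID uniqueness and IDREF resolution itself, but only document-wide and untyped: an IDREF may point to any element that carries an ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "customer" on <order> plays the role of a foreign key.
doc = ET.fromstring("""
<shop>
  <customer id="C1"/><customer id="C2"/>
  <order id="O1" customer="C1"/><order id="O2" customer="C3"/>
  <order id="O1" customer="C2"/>
</shop>""")

# Step 1: construct the sets of concerned objects (ElementTree's XPath subset).
customers = doc.findall(".//customer")
orders = doc.findall(".//order")

# Step 2: identity constraint, order ids must be unique (like a primary key).
order_ids = [o.get("id") for o in orders]
duplicate_ids = {i for i in order_ids if order_ids.count(i) > 1}

# Step 3: referential constraint, each order must reference an existing customer.
customer_ids = {c.get("id") for c in customers}
dangling = [o.get("id") for o in orders if o.get("customer") not in customer_ids]

print(duplicate_ids)  # {'O1'}
print(dangling)       # ['O2']
```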


RM-ODP Viewpoints on Documents

Physical Presentation View

– dependent on media, screen size / paper size

– document = composition of characters with attributes (font, size, style)

– XML character entities are replaced

Logical Layout View

– composition of prose components (paragraphs, sections, lists, list items) and other objects (e.g., frames, code sections)

– mostly ordered composition in document-centric documents

– can be rendered in many possible physical presentation views

Content View

– composition of information objects (title, author, abstract, body, bibliography)

– can be organized in a hierarchical structure or can be flat

– can be mapped to several logical layouts


Content View vs. Logical Layout

Content View

– document-centric documents

• information viewpoint in DTD or XML Schema

• some constructs to specify structural constraints (e.g., cardinality constraints in XML Schema)

– data-centric documents

• structure not very relevant

• many invariants among content elements cannot be adequately expressed in DTD / XML Schema

• possible abuse of XLink / XPointers to specify relationships among content elements

Logical Layout

– document-centric documents

• may follow the structure of the content

– data-centric documents

• often arbitrary


Data-Centric Documents: Content View

Example integrity constraints:

– The overall value of an order must exceed a certain minimum.

– A customer can submit at most 5 orders.

– If a customer is deleted, all of their orders have to be cancelled.
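The three constraints can be stated as set-oriented checks over all orders. This Python sketch is only an illustration under two assumptions. First, `MIN_ORDER_VALUE` is a hypothetical number, because the slide says only "a certain minimum". Second, cancelled orders do not count towards the limit of 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MIN_ORDER_VALUE = 50.0       # hypothetical value for "a certain minimum"
MAX_ORDERS_PER_CUSTOMER = 5

@dataclass
class Order:
    order_id: str
    customer_id: str
    item_values: List[float] = field(default_factory=list)
    cancelled: bool = False

def violations(orders: List[Order]) -> List[str]:
    """Check constraints 1 and 2 over the whole set of active orders."""
    problems: List[str] = []
    per_customer: Dict[str, int] = {}
    for o in orders:
        if o.cancelled:  # assumption: cancelled orders are ignored
            continue
        if sum(o.item_values) <= MIN_ORDER_VALUE:
            problems.append(f"{o.order_id}: value does not exceed minimum")
        per_customer[o.customer_id] = per_customer.get(o.customer_id, 0) + 1
    problems += [f"{c}: more than {MAX_ORDERS_PER_CUSTOMER} orders"
                 for c, n in per_customer.items() if n > MAX_ORDERS_PER_CUSTOMER]
    return problems

def delete_customer(customer_id: str, orders: List[Order]) -> None:
    """Constraint 3: deleting a customer cancels all of their orders."""
    for o in orders:
        if o.customer_id == customer_id:
            o.cancelled = True

orders = [Order("O1", "C1", [30.0, 40.0]), Order("O2", "C1", [10.0])]
orders += [Order(f"O{i}", "C2", [100.0]) for i in range(3, 9)]
before = violations(orders)
delete_customer("C2", orders)
after = violations(orders)
print(before)  # ['O2: value does not exceed minimum', 'C2: more than 5 orders']
print(after)   # ['O2: value does not exceed minimum']
```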

[Figure: content-view data model with the entities Order, Header, Line Item, Customer and Product; an Order has exactly one Header (1,1) and one or more Line Items (1,N).]

How to map this model to an XML document? Or: how to map it to the logical layout view?


Data-Centric Documents: Logical Layout View

Alternative 1

<Customer> C1 ...
  <Order> O1 ... <Item> ... </Item> <Item> ... </Item> ... </Order>
  <Order> O2 ... <Item> ... </Item> <Item> ... </Item> ... </Order>
  ...
</Customer>
<Customer> C2 ...
  <Order> O3 ... <Item> ... </Item> <Item> ... </Item> ... </Order>
  <Order> O4 ... <Item> ... </Item> <Item> ... </Item> ... </Order>
  ...
</Customer>
...

Alternative 2

<Order> O1 ... <Customer> C1 ... </Customer> <Item> ... </Item> <Item> ... </Item> ... </Order>
<Order> O2 ... <Customer> C1 ... </Customer> <Item> ... </Item> <Item> ... </Item> ... </Order>
<Order> O3 ... <Customer> C2 ... </Customer> <Item> ... </Item> <Item> ... </Item> ... </Order>
<Order> O4 ... <Customer> C2 ... </Customer> <Item> ... </Item> <Item> ... </Item> ... </Order>
...

Alternative 3

<Item> ... <Order> O1 ... <Customer> C1 ... </Customer> </Order> </Item>
<Item> ... <Order> O2 ... <Customer> C1 ... </Customer> </Order> </Item>
<Item> ... <Order> O3 ... <Customer> C2 ... </Customer> </Order> </Item>
...
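The alternatives carry the same content but nest it differently. The sketch below builds Alternatives 1 and 2 from the same hypothetical order data with ElementTree. The wrapping root elements `Customers` and `Orders` are added because a well-formed document needs a single root. Counting the `Customer` elements shows the redundancy of Alternative 2, where each customer is repeated in every one of their orders.

```python
import xml.etree.ElementTree as ET

# Hypothetical content-view data: (order id, customer id, item ids).
ORDERS = [("O1", "C1", ["i1", "i2"]), ("O2", "C1", ["i3"]),
          ("O3", "C2", ["i4"]), ("O4", "C2", ["i5", "i6"])]

def alternative_1() -> ET.Element:
    """Customer-centric nesting: Customer > Order > Item."""
    root = ET.Element("Customers")
    by_customer = {}
    for order_id, cust_id, items in ORDERS:
        if cust_id not in by_customer:
            by_customer[cust_id] = ET.SubElement(root, "Customer", id=cust_id)
        order = ET.SubElement(by_customer[cust_id], "Order", id=order_id)
        for item in items:
            ET.SubElement(order, "Item", id=item)
    return root

def alternative_2() -> ET.Element:
    """Order-centric nesting: Order > (Customer, Item*); customers repeat."""
    root = ET.Element("Orders")
    for order_id, cust_id, items in ORDERS:
        order = ET.SubElement(root, "Order", id=order_id)
        ET.SubElement(order, "Customer", id=cust_id)
        for item in items:
            ET.SubElement(order, "Item", id=item)
    return root

alt1, alt2 = alternative_1(), alternative_2()
print(len(alt1.findall(".//Customer")), len(alt2.findall(".//Customer")))  # 2 4
```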


Operations

Operations are viewpoint-specific

XML APIs: DOM / XPath

– based on a tree model

– although powerful, not appropriate for set-oriented operations

Viewpoints vs. Operations

– content view: set-oriented operations

– logical layout view: navigating operations (on a tree)

Need another language to express operations in the content view of data-centric documents!
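The contrast between the two kinds of operations can be sketched in Python. The slides do not name the "other language". Here, a list comprehension and a grouping loop merely stand in for a set-oriented language. The document and its `value` attributes are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document in the customer-centric layout (Alternative 1).
doc = ET.fromstring(
    '<Customers>'
    '<Customer id="C1"><Order id="O1"><Item value="30"/><Item value="40"/></Order>'
    '<Order id="O2"><Item value="10"/></Order></Customer>'
    '<Customer id="C2"><Order id="O3"><Item value="100"/></Order></Customer>'
    '</Customers>')

# Logical layout view: navigate along one path in the tree
# (ElementTree supports a subset of XPath, including [@attr='v'] and [n]).
second_item = doc.find("./Customer[@id='C1']/Order[@id='O1']/Item[2]")
print(second_item.get("value"))  # 40

# Content view: first flatten into (customer, order, value) rows,
# then apply a set-oriented operation (group by customer and sum).
rows = [(c.get("id"), o.get("id"), float(i.get("value")))
        for c in doc.iter("Customer")
        for o in c.iter("Order")
        for i in o.iter("Item")]
totals = defaultdict(float)
for customer, _order, value in rows:
    totals[customer] += value
print(dict(totals))  # {'C1': 80.0, 'C2': 100.0}
```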


Realization: Nesting of Viewpoints

[Figure: an XML document seen through three nested views, each annotated with the RM-ODP viewpoints (enterprise, information, computational, engineering, technology). Content View: operations “Store” / “Retrieve”, semantic model, storage in an RDBMS or a native XML-DB. Logical Layout View: operations “Store” / “Retrieve”, XML Schema / DTD, storage as a file (template) or large object. Presentation View: operations “Browse” / “Store”, SVG / PDF, media: screen, paper.]
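The sketch below shows one possible realization of "Store" and "Retrieve" in the content view. It shreds hypothetical order data into two relational tables, runs a set-oriented query, and rebuilds the XML from the tables. An in-memory SQLite database stands in for a full RDBMS. The table layout is an assumption. The sibling position is stored explicitly because relations are unordered.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order data in an order-centric layout.
doc = ET.fromstring(
    '<Orders><Order id="O1" customer="C1"><Item product="P1" qty="2"/>'
    '<Item product="P2" qty="1"/></Order>'
    '<Order id="O2" customer="C2"><Item product="P1" qty="5"/></Order></Orders>')

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE items (order_id TEXT REFERENCES orders(id),
                        pos INTEGER, product TEXT, qty INTEGER);
""")

# "Store": shred the elements into rows; keep each item's position explicitly.
for order in doc.iter("Order"):
    con.execute("INSERT INTO orders VALUES (?, ?)",
                (order.get("id"), order.get("customer")))
    for pos, item in enumerate(order.iter("Item"), start=1):
        con.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
                    (order.get("id"), pos, item.get("product"), int(item.get("qty"))))

# Set-oriented query on the stored content: total quantity per product.
totals = con.execute(
    "SELECT product, SUM(qty) FROM items GROUP BY product ORDER BY product").fetchall()
print(totals)  # [('P1', 7), ('P2', 1)]

# "Retrieve": rebuild the XML document from the tables, in stored order.
rebuilt = ET.Element("Orders")
for oid, cust in con.execute("SELECT id, customer FROM orders ORDER BY id").fetchall():
    order = ET.SubElement(rebuilt, "Order", id=oid, customer=cust)
    for product, qty in con.execute(
            "SELECT product, qty FROM items WHERE order_id = ? ORDER BY pos", (oid,)):
        ET.SubElement(order, "Item", product=product, qty=str(qty))

print(ET.tostring(rebuilt) == ET.tostring(doc))  # True
```

The rebuilt document matches the original here only because the input contains no comments, processing instructions or insignificant whitespace. With those, the round-trip problem reappears.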


Conclusions

Analyze the requirements before building an XML system

– data-centric vs. document-centric documents

– huge impact on the choice of technology (storage platform)

Think in viewpoints to understand the semantics

– mixed occurrence of content view and logical layout in XML documents

– expand viewpoints into the specification of a new system

Use generic relationships for constraint modelling

Beware of the difference between specification and realization
