XML: eXtensible Markup Language

Upload: abraham-morgan-jordan

Post on 27-Dec-2015




XML Overview

• XML is a meta-language, a simplified form of SGML (Standard Generalized Markup Language)

• XML was initiated largely by Jon Bosak of Sun Microsystems, Inc., through a W3C working group


What you should already know

• WWW, HTML and the basics of building Web pages

• Web scripting languages like JavaScript or VBScript


What is XML?

• XML stands for eXtensible Markup Language

• XML is a markup language much like HTML

• XML was designed to describe data

• XML tags are not predefined. You must define your own tags


XML Overview (cont.)

• An XML-compliant application generally needs three files to display XML content:

– The XML document

• Contains the data tagged with meaningful XML elements

– A stylesheet

• Dictates the formatting when the XML document is displayed. Examples: CSS (Cascading Style Sheets), XSL (eXtensible Stylesheet Language)

– A document type definition (DTD)

• Specifies the rules for how elements and attributes are logically related


XML Terminology

• Element, e.g.:

<Body>
This is text formatted according to the body element
</Body>

– A non-empty element always consists of two tags:

• An opening tag, e.g., <Body>

• A closing tag, e.g., </Body>

– An empty element can instead be written as a single self-closing tag, e.g., <Body/>

• An element can have attributes, e.g.:

<Price currency="Euro">25.43</Price>

– Attribute values must always be quoted (unlike HTML)
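Python's standard `xml.etree.ElementTree` module can be used to check these rules. The minimal sketch below uses only the standard library. It reads the Price element from the slide and shows how the tag, attribute and text are exposed. It also shows that a missing closing tag or an unquoted attribute value makes the text not well-formed. The helper `is_well_formed` is our own name, not a library function.

```python
import xml.etree.ElementTree as ET

# The element from the slide, with straight ASCII quotes around the attribute value
price = ET.fromstring('<Price currency="Euro">25.43</Price>')

print(price.tag)              # Price  (the element name)
print(price.get("currency"))  # Euro   (the attribute value)
print(price.text)             # 25.43  (the character data between the tags)


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML fragment."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<Body>text</Body>"))                   # True
print(is_well_formed("<Body/>"))                             # True: empty element
print(is_well_formed("<Body>text"))                          # False: no closing tag
print(is_well_formed("<Price currency=Euro>25.43</Price>"))  # False: unquoted value
```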


A Simple XML Document

• Example: Book description

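Callouts (translated from Chinese): XML declaration, where standalone="no" means that information outside the document must be obtained before the document can be parsed; Element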

<?xml version="1.0" standalone="no"?>
<!DOCTYPE OReilly:Books SYSTEM "sample.dtd">
<!-- Here begins the XML data -->
<OReilly:Books xmlns:OReilly="http://www.oreilly.com/">
  <OReilly:Product>XML Pocket Reference</OReilly:Product>
  <OReilly:Price currency="Euro">8.95</OReilly:Price>
</OReilly:Books>

Callouts (translated from Chinese): Attribute; Namespace

– Note the distinction between well-formed and valid (validated) documents
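The sketch below shows the namespace and the well-formed/valid distinction in code, using Python's standard `xml.etree.ElementTree`. It parses the book description with straight quotes and with the missing space before `xmlns` restored. It checks well-formedness only. It does not read `sample.dtd`, and the standard library has no validating parser, so checking validity would need a separate tool.

```python
import xml.etree.ElementTree as ET

# The book description from the slide
BOOKS = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE OReilly:Books SYSTEM "sample.dtd">
<!-- Here begins the XML data -->
<OReilly:Books xmlns:OReilly="http://www.oreilly.com/">
  <OReilly:Product>XML Pocket Reference</OReilly:Product>
  <OReilly:Price currency="Euro">8.95</OReilly:Price>
</OReilly:Books>"""

# Parsing succeeds because the document is well-formed; sample.dtd is never fetched
root = ET.fromstring(BOOKS)

# ElementTree reports namespaced names as {namespace-URI}local-name
print(root.tag)  # {http://www.oreilly.com/}Books

# A prefix map lets queries use the familiar OReilly: prefix
ns = {"OReilly": "http://www.oreilly.com/"}
product = root.find("OReilly:Product", ns).text
price = root.find("OReilly:Price", ns)
print(product, price.text, price.get("currency"))  # XML Pocket Reference 8.95 Euro
```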


A Simple Document Type Definition

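• Example DTD:

<!-- DTD for the sample document -->
<!ELEMENT OReilly:Books (OReilly:Product, OReilly:Price)>
<!ATTLIST OReilly:Books xmlns:OReilly CDATA #FIXED "http://www.oreilly.com/">
<!ELEMENT OReilly:Product (#PCDATA)>
<!ELEMENT OReilly:Price (#PCDATA)>
<!ATTLIST OReilly:Price currency CDATA #IMPLIED>

– The two ATTLIST lines declare the attributes that the sample document uses (the namespace declaration and currency); a validating parser rejects undeclared attributes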

– OReilly:Books must contain exactly one OReilly:Product followed by exactly one OReilly:Price

– #PCDATA: parsed character data. Any characters are allowed except a literal <, a literal &, and the sequence ]]> (write &lt; and &amp; instead)
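Here is a small sketch of the #PCDATA restriction using Python's standard library. `xml.sax.saxutils.escape` replaces `&`, `<` and `>` with entity references, so arbitrary text can be placed inside an element. The sample string and the `Product` element are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Fish & Chips <cheap>"

# escape() replaces &, < and > with the predefined entity references
escaped = escape(raw)
print(escaped)  # Fish &amp; Chips &lt;cheap&gt;

# The escaped form is legal character data and parses back to the original text
product = ET.fromstring(f"<Product>{escaped}</Product>")
assert product.text == raw

# Inserting the raw text unescaped produces a document that is not well-formed
try:
    ET.fromstring(f"<Product>{raw}</Product>")
except ET.ParseError as err:
    print("rejected:", err)
```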


The DTD Language (1)

• An XML document is composed of elements:

– Simple elements

<!ELEMENT title ANY>

• The element can contain any declared elements and character data

<!ELEMENT title (#PCDATA)>

• The element cannot contain child elements, only character data

– Nested elements

<!ELEMENT books (title)>

<!ELEMENT title (#PCDATA)>


The DTD Language (2)

– Nested and ordered elements:

<!ELEMENT books (title,authors)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT authors (#PCDATA)>

• The order of the elements must be title, then authors

– Nested either-or elements

<!ELEMENT books (title|authors)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT authors (#PCDATA)>

• There must be either a title or an authors element, but not both.


The DTD Language (3)

– Grouping and recurrence:

<!ELEMENT reviews (rating,synopsis?,comments+)*>

<!ELEMENT rating ((tutorial|reference)*,overall)>

<!ELEMENT synopsis (#PCDATA)>

<!ELEMENT comments (#PCDATA)>

<!ELEMENT tutorial (#PCDATA)>

<!ELEMENT reference (#PCDATA)>

<!ELEMENT overall (#PCDATA)>

• ? 0 or 1 time

• + 1 or more times

• * 0 or more times
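DTD content models behave much like regular expressions over the sequence of child element names. A comma means concatenation, `|` means alternation, and `?`, `+` and `*` are the usual repetition operators. The Python sketch below makes this concrete by translating a model into a `re` pattern. The function names are hypothetical, and mixed content (`#PCDATA`) is not handled. This is an illustration, not a full validator.

```python
from __future__ import annotations

import re

# Tokens of a DTD content model: element names, parentheses, and the , | ? * + operators
TOKEN = re.compile(r"[A-Za-z_][\w.:-]*|[(),|?*+]")
OPERATORS = {")", "|", "?", "*", "+"}


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD element-content model such as (a,b?,c+)* into a regex.

    Each child name becomes 'name ' (the name plus one space), so the child
    names of an element, joined with spaces, can be matched against it.
    """
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue                      # sequence: plain concatenation
        elif tok == "(":
            parts.append("(?:")           # non-capturing group
        elif tok in OPERATORS:
            parts.append(tok)             # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))


def matches(model: str, children: list[str]) -> bool:
    """True if the child element names, in order, conform to the content model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None


print(matches("(title,authors)", ["title", "authors"]))   # True
print(matches("(title,authors)", ["authors", "title"]))   # False: wrong order
print(matches("(title|authors)", ["title", "authors"]))   # False: not both
print(matches("(rating,synopsis?,comments+)*", []))       # True: * allows zero
```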


The DTD Language (4)

• Inside a DTD we can declare an entity, which lets us use an entity reference to substitute a series of characters, similar to a macro.

– Format:

<!ENTITY name "replacement_characters">

• Example for the © symbol:

<!ENTITY copyright "&#xA9;">

– Usage: an entity reference starts with ‘&’ and ends with a semicolon (‘;’):

<copyright>

&copyright; 2000 MyCompany, Inc.

</copyright>
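Python's standard `xml.etree.ElementTree` expands internal entities declared in a document's internal DTD subset, so the copyright example can be tried directly. This minimal sketch assumes the declaration is placed in the internal subset, because the slide does not say where the DTD lives.

```python
import xml.etree.ElementTree as ET

# The entity declaration lives in the document's internal DTD subset
DOC = """<?xml version="1.0"?>
<!DOCTYPE copyright [
  <!ENTITY copyright "&#xA9;">
]>
<copyright>&copyright; 2000 MyCompany, Inc.</copyright>"""

root = ET.fromstring(DOC)
print(root.text)  # © 2000 MyCompany, Inc.

# Without the declaration, the same reference is an undefined-entity error
try:
    ET.fromstring("<copyright>&copyright; 2000 MyCompany, Inc.</copyright>")
except ET.ParseError as err:
    print("rejected:", err)
```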


The DTD Language (5)

• Parameter entity references appear only within a DTD and cannot be used in the content of an XML document. They are prefixed with a %.

– Format and usage:

<!ENTITY % name "replacement_characters">

• Example (a reference inside a declaration, as below, is allowed in an external DTD but not in a document's internal DTD subset):

<!ENTITY % pcdata "(#PCDATA)">

<!ELEMENT authortitle %pcdata;>
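Parameter entities act like textual macros inside the DTD. The sketch below is a hypothetical, regex-based illustration of the substitution step only; it does not use a real parser. It handles double-quoted values and leaves undeclared references alone. It does not handle nested references. It also skips the XML rule that pads a parameter entity's replacement text with one space on each side when it is used inside the DTD.

```python
import re

# <!ENTITY % name "value"> with a double-quoted value (a simplification of the real grammar)
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([A-Za-z_][\w.-]*)\s+"([^"]*)"\s*>')
# A parameter entity reference: %name;
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")


def expand_parameter_entities(dtd: str) -> str:
    """Toy macro expansion: replace each %name; with the value declared for it."""
    values = dict(PE_DECL.findall(dtd))
    # Undeclared references are returned unchanged
    return PE_REF.sub(lambda m: values.get(m.group(1), m.group(0)), dtd)


DTD = """<!ENTITY % pcdata "(#PCDATA)">
<!ELEMENT authortitle %pcdata;>"""

print(expand_parameter_entities(DTD))
# <!ENTITY % pcdata "(#PCDATA)">
# <!ELEMENT authortitle (#PCDATA)>
```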


The DTD Language (6)

• External entities allow us to include data from another XML document (think of an #include <...> directive in C):

– Format and usage:

<!ENTITY quotes SYSTEM "http://www.stocks.com/quotes.xml">

• Example:

<document>

<heading>Current stock quotes</heading>

&quotes; <!-- data from quotes.xml -->

</document>

– Works well for the inclusion of dynamic data.
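Python's low-level `xml.parsers.expat` module lets an application decide how external entities are resolved, through `ExternalEntityRefHandler` and `ExternalEntityParserCreate`. In this sketch the remote quotes file is replaced by a hypothetical in-memory dictionary (`FAKE_FILES`, with made-up contents), so no network request is made. The DOCTYPE and the resolver function are our own assumptions.

```python
import xml.parsers.expat

# Hypothetical stand-in for the remote file, so the sketch makes no network request
FAKE_FILES = {
    "http://www.stocks.com/quotes.xml": '<quote symbol="ACME">12.34</quote>',
}

DOC = """<?xml version="1.0"?>
<!DOCTYPE document [
  <!ENTITY quotes SYSTEM "http://www.stocks.com/quotes.xml">
]>
<document>
  <heading>Current stock quotes</heading>
  &quotes;
</document>"""


def element_names(doc: str, files: dict) -> list:
    """Parse doc, resolving external entities from files; return element names in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        names.append(name)

    def resolve(context, base, system_id, public_id):
        # Parse the referenced entity with a child parser, in place of &quotes;
        child = parser.ExternalEntityParserCreate(context)
        child.StartElementHandler = start
        child.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = start
    parser.ExternalEntityRefHandler = resolve
    parser.Parse(doc, True)
    return names


print(element_names(DOC, FAKE_FILES))
```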


The DTD Language (7)

• Attribute declarations in the DTD: the attributes of each XML element must be declared in the DTD.

– Format and usage:

<!ATTLIST target_element attr_name attr_type default>

• Examples:

<!ATTLIST box length CDATA "0">

<!ATTLIST box width CDATA "0">

<!ATTLIST frame visible (true|false) "true">

<!ATTLIST person marital (single | married | divorced | widowed) #IMPLIED>
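Even a non-validating parser applies attribute defaults that are declared in the document's internal DTD subset. This minimal sketch uses Python's `xml.parsers.expat`, which by default reports defaulted attributes alongside the ones written in the start tag. The root element `shapes` is a hypothetical addition for the demonstration. expat does not check that an enumerated value comes from its list.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE shapes [
  <!ATTLIST box length CDATA "0">
  <!ATTLIST box width CDATA "0">
  <!ATTLIST frame visible (true|false) "true">
  <!ATTLIST person marital (single | married | divorced | widowed) #IMPLIED>
]>
<shapes><box length="5"/><box/><frame/><person/></shapes>"""


def attributes_by_element(doc: str) -> list:
    """Return (element name, attributes) pairs, including defaults taken from the DTD."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        seen.append((name, dict(attrs)))

    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return seen


for name, attrs in attributes_by_element(DOC):
    print(name, attrs)
# The second <box> gets length="0" and width="0"; <person> gets nothing (#IMPLIED)
```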


The DTD Language (8)

• Default modifiers in DTD attributes:

– #REQUIRED: the attribute value must be specified with the element

– #IMPLIED: the attribute value can remain unspecified

– #FIXED "value": the attribute value is fixed to the declared value and cannot be changed by the user
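expat also reports each attribute declaration through `AttlistDeclHandler(elname, attname, type, default, required)`. The four cases above can be told apart from the last two arguments. This is a sketch under that assumption. The `doc` root element and its `version` attribute are hypothetical.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ATTLIST SalesOrder SONumber CDATA #REQUIRED>
  <!ATTLIST person marital (single|married|divorced|widowed) #IMPLIED>
  <!ATTLIST doc version CDATA #FIXED "1.0">
  <!ATTLIST box length CDATA "0">
]>
<doc/>"""


def default_modifiers(doc: str) -> dict:
    """Map (element, attribute) to its default modifier, as declared in the DTD."""
    result = {}
    parser = xml.parsers.expat.ParserCreate()

    def attlist(elname, attname, type_, default, required):
        # #REQUIRED -> required, no default    #FIXED "v" -> required, default "v"
        # #IMPLIED  -> optional, no default    "v"        -> optional, default "v"
        if required:
            kind = "#FIXED" if default is not None else "#REQUIRED"
        else:
            kind = "default value" if default is not None else "#IMPLIED"
        result[(elname, attname)] = kind

    parser.AttlistDeclHandler = attlist
    parser.Parse(doc, True)
    return result


print(default_modifiers(DOC))
```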


The DTD Language (9)

• Datatypes in DTD attributes:

– CDATA: character data

– Enumerated, e.g. (true|false): a list of values of which exactly one can be chosen

– ENTITY: an unparsed entity declared in the DTD

– ENTITIES: multiple whitespace-separated entities declared in the DTD

– ID: a unique element identifier

– IDREF: the value of an ID attribute of some element

– IDREFS: multiple whitespace-separated IDREFs of elements

– NMTOKEN: an XML name token

– NMTOKENS: multiple whitespace-separated XML name tokens

– NOTATION: a notation declared in the DTD
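Only a validating parser checks ID and IDREF constraints. The sketch below performs the two checks by hand with Python's `xml.etree.ElementTree`: IDs must be unique, and every IDREF/IDREFS value must name an existing ID. ElementTree does not read attribute types from the DTD, so the attribute names (`id`, `ref`, `refs`) are hypothetical parameters, as is the sample document.

```python
import xml.etree.ElementTree as ET


def check_ids(root: ET.Element, id_attr: str = "id", ref_attrs=("ref", "refs")):
    """Return (duplicate IDs, dangling references) found in the tree under root."""
    seen, duplicates = set(), set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.add(value)   # ID values must be unique in the document
            seen.add(value)

    dangling = set()
    for elem in root.iter():
        for attr in ref_attrs:
            # IDREFS holds whitespace-separated IDs; a single IDREF splits into one item
            for target in (elem.get(attr) or "").split():
                if target not in seen:
                    dangling.add(target)  # must match some ID in the document
    return duplicates, dangling


# Hypothetical document: p2 is used twice, and p3 is referenced but never defined
DOC = """<people>
  <person id="p1"/>
  <person id="p2" ref="p1"/>
  <person id="p2" refs="p1 p3"/>
</people>"""

print(check_ids(ET.fromstring(DOC)))  # ({'p2'}, {'p3'})
```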


The DTD Language (10)

• Example: Sales Order Document

“An order document is composed of several sales orders. Each individual order has a number, and it contains the customer information, the date when the order was received, and the items ordered. Each customer has a number, a name, street, city, state, and ZIP code. Each item has an item number, parts information and a quantity. The parts information contains a number, a description of the product and its unit price.

The numbers should be treated as attributes.”


The DTD Language (11)

• Example: Sales Order Document DTD

<!-- DTD for example sales order document -->
<!ELEMENT Orders (SalesOrder+)>
<!ELEMENT SalesOrder (Customer,OrderDate,Item+)>
<!ELEMENT Customer (CustName,Street,City,State,ZIP)>
<!ELEMENT OrderDate (#PCDATA)>
<!ELEMENT Item (Part,Quantity)>
<!ELEMENT Part (Description,Price)>
<!ELEMENT CustName (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT ... (#PCDATA)>
<!ATTLIST SalesOrder SONumber CDATA #REQUIRED>
<!ATTLIST Customer CustNumber CDATA #REQUIRED>
<!ATTLIST Part PartNumber CDATA #REQUIRED>
<!ATTLIST Item ItemNumber CDATA #REQUIRED>


The DTD Language (12)

• Example: Sales Order XML Document

<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543">
      <CustName>ABC Industries</CustName>
      <Street>123 Main St.</Street>
      <City>Chicago</City>
      <State>IL</State>
      <ZIP>60609</ZIP>
    </Customer>
    <OrderDate>10222000</OrderDate>
    <Item ItemNumber="1">
      <Part PartNumber="234">
        <Description>Turkey wrench</Description>
        <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
    </Item>
  </SalesOrder>
</Orders>
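Once the sales order is in XML, ordinary tree navigation can answer questions about it. The sketch below uses Python's `xml.etree.ElementTree` to compute each order's total. `Decimal` is used so that 9.95 * 10 is exact. Reading `OrderDate` as MMDDYYYY is an assumption, because the slides do not define the date format.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime
from decimal import Decimal

# The sales order document from the slide
ORDERS = """<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543">
      <CustName>ABC Industries</CustName>
      <Street>123 Main St.</Street>
      <City>Chicago</City>
      <State>IL</State>
      <ZIP>60609</ZIP>
    </Customer>
    <OrderDate>10222000</OrderDate>
    <Item ItemNumber="1">
      <Part PartNumber="234">
        <Description>Turkey wrench</Description>
        <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
    </Item>
  </SalesOrder>
</Orders>"""


def order_total(order: ET.Element) -> Decimal:
    """Sum Price * Quantity over the order's items, using Decimal to avoid float rounding."""
    total = Decimal("0")
    for item in order.findall("Item"):
        price = Decimal(item.findtext("Part/Price"))
        quantity = int(item.findtext("Quantity"))
        total += price * quantity
    return total


def order_date(order: ET.Element) -> date:
    """Assumption: OrderDate is MMDDYYYY, so 10222000 means 22 October 2000."""
    return datetime.strptime(order.findtext("OrderDate"), "%m%d%Y").date()


root = ET.fromstring(ORDERS)
for order in root.findall("SalesOrder"):
    print(order.get("SONumber"), order_date(order), order_total(order))  # 12345 2000-10-22 99.50
```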


Displaying XML with Stylesheet

• Why display XML with a stylesheet?

- Because XML does not use predefined tags (we can use any tags we want), the meaning of the tags is not known in advance: a browser does not know how to display an XML document.

- XML markup does not (usually) include formatting information

- The information in an XML document may not be in the form in which we want to present it

So there must be something in addition to the XML document that describes how the document should be displayed


Displaying XML with XSL

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
</breakfast_menu>


<?xml version="1.0"?>
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
  <body style="font-family:Arial,helvetica,sans-serif;font-size:12pt;background-color:#EEEEEE">
    <xsl:for-each select="breakfast_menu/food">
      <div style="background-color:teal;color:white;padding:4px">
        <span style="font-weight:bold;color:white">
          <xsl:value-of select="name"/>
        </span>
        <xsl:value-of select="price"/>
      </div>
      <div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
        <xsl:value-of select="description"/>
        <span style="font-style:italic">
          (<xsl:value-of select="calories"/> calories per serving)
        </span>
      </div>
    </xsl:for-each>
  </body>
</html>

An XSL stylesheet: simple.xsl


XSL - More than a Style Sheet

• XSLT (a language for transforming XML documents)

• XPath (a language for defining parts of an XML document)

• XSL Formatting Objects (a vocabulary for formatting XML documents)
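XPath is easiest to understand on a concrete document. Python's `xml.etree.ElementTree` supports a limited subset of XPath 1.0: child and descendant steps, plus simple predicates. That is enough to mirror the `select` expressions in simple.xsl, but it is not a full XPath engine. Here is a sketch on a cut-down copy of the breakfast menu.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the breakfast menu from the XSL example
MENU = """<breakfast_menu>
  <food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>
  <food><name>Strawberry Belgian Waffles</name><price>$7.95</price><calories>900</calories></food>
</breakfast_menu>"""

root = ET.fromstring(MENU)  # the <breakfast_menu> element

# Child steps, like select="breakfast_menu/food" followed by select="name"
names = [name.text for name in root.findall("food/name")]

# Predicate on a child's text: the food whose <calories> child is exactly "900"
rich = root.find("food[calories='900']/name").text

# Positional predicate (1-based, as in XPath)
first_price = root.find("food[1]/price").text

# Descendant axis: every <price> anywhere below the root
prices = [p.text for p in root.iterfind(".//price")]

print(names, rich, first_price, prices)
```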


XML and Databases (1)

• “Is XML a database?”

• In a strict sense, no.

• In a more liberal sense, yes, but …

– XML has:

• Storage (the XML document)

• A schema (DTD)

• Query languages (XQL, XML-QL, …)

• Programming interfaces (SAX, DOM)

– XML lacks:

• Efficient storage, indexes, security, transactions, multi-user access, triggers, queries across multiple documents
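The two programming interfaces differ in style. SAX pushes parsing events to a handler as it reads, while DOM builds the whole tree in memory first. The sketch below uses Python's standard `xml.sax` and `xml.dom.minidom` to extract the same food names both ways. The class and function names are our own.

```python
import xml.sax
from xml.dom import minidom

# A cut-down version of the breakfast menu from the XSL example
MENU = ("<breakfast_menu>"
        "<food><name>Belgian Waffles</name></food>"
        "<food><name>Strawberry Belgian Waffles</name></food>"
        "</breakfast_menu>")


class NameCollector(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it reads; we keep only <name> text."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self.names.append("")

    def characters(self, content):
        # Character data may arrive in several chunks, so append to the current name
        if self._in_name:
            self.names[-1] += content

    def endElement(self, name):
        if name == "name":
            self._in_name = False


def names_with_sax(text: str) -> list:
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


def names_with_dom(text: str) -> list:
    """DOM: the whole document is built as an in-memory tree, then navigated."""
    doc = minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("name")]


print(names_with_sax(MENU) == names_with_dom(MENU))  # True
```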


XML and Databases (2)

• Data versus Documents

– There are two ways to use XML in a database environment:

• Use XML as a data transport, i.e., to get data in and out of the database

– Data is stored in a relational or object-oriented database

– Middleware converts between the database and XML

• Use a “native XML” database, i.e., store data in document form

– Use a content management system


XML and Databases (3)

• Data-centric documents

– Fairly regular structure

– Fine-grained data

– Order of sibling elements often not significant

• Document-centric documents

– Irregular structure

– Larger-grained data

– Order of sibling elements is significant


XML and Databases (4)

• Data-centric storage and retrieval systems

– Use a database

• Add middleware to convert to/from XML

– Use an XML server (specialized product for e-commerce)

– Use an XML-enabled web server with a database backend

• Document-centric storage and retrieval systems

– Content management system

– Persistent DOM implementation


XML and Databases (5)

• Mapping document structure to database structure

– Template-driven

• No predefined mapping

• Embedded commands process (retrieve) data

• Currently only available from RDBMS to XML

<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>


XML and Databases (6)

– Template-driven: example result

<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 2000, 13:43</Depart>
      <Arrive>Dec 13, 2000, 01:21</Arrive>
    </Row>
  </Flights>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
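The template-driven approach can be sketched in a few lines with Python's standard `sqlite3` and `xml.etree.ElementTree`. None of the following assumptions come from the slides: an in-memory SQLite database with a `Flights` table holding the sample row, a wrapper element always named `Flights`, and column names used directly as element names. The template's SQL is executed as-is, so this approach is only suitable for trusted templates.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def fill_template(template: str, conn: sqlite3.Connection) -> ET.Element:
    """Replace each <SelectStmt> with a <Flights> element holding one <Row> per result row."""
    root = ET.fromstring(template)
    for index, child in enumerate(list(root)):
        if child.tag != "SelectStmt":
            continue
        cursor = conn.execute(child.text)           # run the embedded command
        columns = [d[0] for d in cursor.description]
        flights = ET.Element("Flights")             # assumed wrapper name, as in the sample result
        for row in cursor:
            row_el = ET.SubElement(flights, "Row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        flights.tail = child.tail                   # keep the original indentation
        root.remove(child)
        root.insert(index, flights)
    return root


# Hypothetical data matching the sample result on the slide
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)")
conn.execute("INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 2000, 13:43', 'Dec 13, 2000, 01:21')")

result = fill_template(TEMPLATE, conn)
print(ET.tostring(result, encoding="unicode"))
```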


XML and Databases (7)

• Mapping document structure to database structure

– Model-driven

• A data model is imposed on the structure of the XML document

• This model is mapped to the structures in the database

• There are two common models:

– Model the XML document as a single table or a set of tables

– Model the XML document as a tree of data-specific objects (good for OODBMS mapping)


XML and Databases (8)

– Single table or set of tables:

<?xml version="1.0"?>
<database>
  <table>
    <row>
      <column1>...</column1>
      <column2>...</column2>
      ...
    </row>
  </table>
</database>

– Tree organization:

Orders
 +-- SalesOrder
      +-- Customer
      +-- Item
      |    +-- Part
      +-- Item
           +-- Part
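The single-table model can be produced mechanically from a query result. The sketch below uses Python's standard `sqlite3` and `xml.etree.ElementTree`. The function name and the sample `Customer` table are hypothetical. The table name is pasted into the SQL text, so it must come from a trusted source.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Model-driven sketch: <database><table><row><column>value</column>...</row></table></database>."""
    # The table name is pasted into the SQL text, so it must be trusted
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    database = ET.Element("database")
    table_el = ET.SubElement(database, "table")
    for row in cursor:
        row_el = ET.SubElement(table_el, "row")
        for column, value in zip(columns, row):
            # SQL NULL becomes an empty element
            ET.SubElement(row_el, column).text = "" if value is None else str(value)
    return database


# Hypothetical sample table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustNumber INTEGER, CustName TEXT)")
conn.execute("INSERT INTO Customer VALUES (543, 'ABC Industries')")

xml_text = ET.tostring(table_to_xml(conn, "Customer"), encoding="unicode")
print(xml_text)
```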


XML and Databases (9)

• Generating DTDs from a database schema and vice versa

– In many applications the DTD rarely changes, so it does not need to be generated automatically.

– Some simple conversions are possible

• Example: DTD from relational schema:

For each table, create an ELEMENT.

For each column in a table, create an attribute or a PCDATA-only child ELEMENT.

For each primary key/foreign key relationship in which a column of the table contributes the primary key, create a child ELEMENT (for the table that holds the foreign key).
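The three rules can be applied to a SQLite schema through its `PRAGMA table_info` and `PRAGMA foreign_key_list` statements. This sketch makes the following choices: columns become PCDATA child elements rather than attributes, each foreign key allows any number (`*`) of child rows, and each column element is declared only once because a DTD may not declare the same element twice. The sample tables are hypothetical.

```python
import sqlite3


def dtd_from_schema(conn: sqlite3.Connection) -> list:
    """Sketch of the slide's rules for a SQLite database: returns DTD declaration lines."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]

    # Rule 3: the table that supplies the primary key gets the referencing table as a child
    children = {table: [] for table in tables}
    for table in tables:
        # Row layout: (id, seq, referenced table, from column, to column, ...)
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            children[fk[2]].append(table)

    lines, declared = [], set()
    for table in tables:
        # Row layout: (cid, name, type, notnull, default, pk)
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        # Rules 1 and 2: one ELEMENT per table, one PCDATA child element per column
        content = columns + [child + "*" for child in children[table]]
        lines.append(f"<!ELEMENT {table} ({', '.join(content)})>")
        for column in columns:
            if column not in declared:  # declare each element only once
                declared.add(column)
                lines.append(f"<!ELEMENT {column} (#PCDATA)>")
    return lines


# Hypothetical schema: each Item row points at the SalesOrder it belongs to
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrder (SONumber INTEGER PRIMARY KEY, OrderDate TEXT);
CREATE TABLE Item (ItemNumber INTEGER PRIMARY KEY,
                   SONumber INTEGER REFERENCES SalesOrder(SONumber),
                   Quantity INTEGER);
""")

lines = dtd_from_schema(conn)
print("\n".join(lines))
```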


Marketing XML Research Trend for Papers


Contents

• What are the research areas and focuses in DBMS?

• SIGMOD

• VLDB

• ICDE


What are the research areas at DBMS Labs these days?

• Distributed, Parallel, Mobile Databases
• Indexing, Access Methods, Data Structures
• Mining Data, Text and Web
• Query/Transaction Processing and Optimization
• View Maintenance/Materialization
• Temporal, Spatial, Scientific, Statistical, Biological Databases
• Semi-structured Data, Metadata and XML
• WWW and Databases
• Data Warehousing and OLAP
• Database Applications and Experiences
• Middleware, Workflow and Security


Research Session within 2 years

• OLAP
• Stream Query Processing
• Data Security and Protection
• XML Indexing and Compression
• Join Algorithms
• Temporal Queries
• Meta-Data Management
• Statistics
• Data Integration and Sharing
• Transaction
• Semi-structured Data


Research Session within 2 years

• Query Processing
• Streaming XML
• Spatial and Nearest Neighbor Queries
• Sensor Database
• XML Query Processing
• Approximate Querying
• Monitoring Data Streams
• Data warehousing and archive
• Distributed Data & Streams


Research Session within 2 years

• Data Mining
• Compression
• Similarity and Matching
• XML Matching & Storage
• Aggregation, Prediction & Constraints
• Query Optimization
• Metadata & Sampling
• Security and Privacy
• Maintenance of Statistics
• Data Transformation and Integration


Year - 1998

• SIGMOD
  – Semi-structured Data
    • A Tool for Semi-Automatically Extracting Semi-structured Data from Text Documents
    • Extracting Schema from Semi-structured Data
    • Enhanced Hypertext Categorization Using Hyperlinks
• VLDB
  – Querying and Browsing (1997)
    • DataGuides: Enabling Query Formulation and Optimization in Semi-structured Databases
• ICDE
  – Semi-structured Data
    • Representing and Querying Changes in Semi-structured Data
    • Optimizing Regular Path Expressions Using Graph Schemas
    • WebOQL: Restructuring Documents, Databases, and Webs


Year - 1999

• SIGMOD
  – Semi-structured Data and Mediators
    • Storing Semi-structured Data with STORED
    • Computing Capabilities of Mediators
    • Query Rewriting for Semi-structured Data
• VLDB
  – Semi-structured Data & XML Queries
    • Capturing and Querying Multiple Aspects of Semi-structured Data
    • Relational Databases for Querying XML Documents: Limitations and Opportunities
    • Query Optimization for XML
• ICDE


Year - 2000

• SIGMOD
  – XML
    • On Wrapping Query Languages and Efficient XML Integration
    • XMILL: An Efficient Compressor for XML Data
    • XTRACT: A System for Extracting Document Type Descriptors from XML Documents
• VLDB
  – Publishing, Filtering, and Mappings
    • Efficient Filtering of XML Documents for Selective Dissemination of Information
    • Efficiently Publishing Relational Data as XML Documents
  – Demonstration
    • Agora: Living with XML and Relational
    • XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents
• ICDE
  – New Applications
    • Efficient Storage of XML Data
  – XML and Databases
    • Oracle8i - The XML Enabled Data Management System
    • XML and DB2


Year - 2001

• SIGMOD
  – XML
    • Updating XML
    • On Supporting Containment Queries in Relational Database Management Systems
    • Monitoring XML Data on the Web
  – Distributed and Heterogeneous Databases
    • Efficient Evaluation of XML Middle-ware Queries
• VLDB
  – XML Queries and Views
    • Answering XML Queries on Heterogeneous Data Sources
    • Query Engines for Web-Accessible XML Data
    • Querying XML Views of Relational Data
    • Views in a Large Scale XML Repository
  – New Index Structures
    • A Fast Index for Semi-structured Data
    • Indexing and Querying XML Data for Regular Path Expressions


Year 2001 (2)

  – XML Processing
    • Change-Centric Management of Versions in an XML Warehouse
    • Estimating the Selectivity of XML Path Expressions for Internet Scale Applications
    • On Processing XML in LDAP
• ICDE
  – XML
    • XML Data and Object Databases: A Perfect Couple?
    • Tamino - A DBMS designed for XML
    • The Nimble XML Data Integration System
    • An Automated Change Detection Algorithm for HTML Documents Based on Semantic Hierarchies
    • An XML Indexing Structure with Relative Region Coordinate
    • Querying XML Documents Made Easy: Nearest Concept Queries
    • A Graph-Based Approach For Extracting Terminological Properties of Elements of XML Documents


Year - 2002

• SIGMOD
  – XML
    • StatiX: making XML count
    • QURSED: querying and reporting semi-structured data
    • Storing and querying ordered XML using a relational database system
    • Approximate XML joins
    • Efficient algorithms for minimizing tree pattern queries
    • Holistic twig joins: optimal XML pattern matching
  – Path indexing
    • Accelerating XPath location steps
    • APEX: an adaptive path index for XML data
• VLDB
  – XML Query Processing
    • Structural Function Inlining Technique for Structurally Recursive XML Queries
    • Efficient Algorithms for Processing XPath Queries
    • Incorporating XSL Processing into Database Engines
    • Optimizing View Queries in ROLEX to Support Navigable Result Trees
  – XML Indexing
    • Updates for Structure Indexes
    • RE-Tree: An Efficient Index Structure for Regular Expressions
    • Efficient Structural Joins on Indexed XML Documents


Year 2002 (2)

• VLDB02
  – Maintenance of Statistics
    • XPathLearner: An On-line Self-Tuning Markov Histogram for XML Path Selectivity Estimation
    • Structure and Value Synopses for XML Data Graphs
  – Demonstrations
    • Active XML: Peer-to-Peer Data and Web Services Integration
    • LegoDB: Customizing Relational Storage for XML Documents
• ICDE
  – Semi-structured Data, Metadata and XML
    • Detecting Changes in XML Documents
    • From XML Schema to Relations: A Cost-Based Approach to XML Storage
    • Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching
    • Exploiting Local Similarity for Indexing Paths in Graph-Structured Data
    • Structural Joins: A Primitive for Efficient XML Query Pattern Matching
    • XGRIND: A Query-Friendly XML Compressor
    • Efficient Filtering of XML Documents with XPath Expressions
    • Mixing Querying and Navigation in MIX
    • A Graphical XML Query Language
    • NeT & CoT: Inferring XML Schemas from Relational World
    • Data Cleaning and XML: The DBLP Experience


Year - 2003

• SIGMOD
  – XML and Text
    • Querying Structured Text in an XML Database
    • XRANK: Ranked Keyword Search over XML Documents
  – XML Indexing and Compression
    • ViST: A Dynamic Index Method for Querying XML Data by Tree Structures
    • XPRESS: A Queriable Compression for XML Data
    • D(k)-Index: An Adaptive Structural Summary for Graph-Structured Data
  – Data Integration and Sharing
    • Capturing both Types and Constraints in Data Integration
    • Exchanging Intensional XML Data
  – Streaming XML
    • Stream Processing of XPath Queries with Predicates
    • XPath Queries on Streaming Data
  – XML Query Processing
    • Composing XSL Transformations with XML Publishing Views
    • Dynamic XML documents with distribution and replication
    • On Relational Support for XML Publishing: Beyond Sorting and Tagging
    • A Comprehensive XQuery to SQL Translation using Dynamic Interval Encoding


Year 2003 (2)

• VLDB
  – XML Query Processing
    • Path Queries on Compressed XML
    • On the minimization of Xpath queries
    • Covering Indexes for XML Queries: Bisimulation - Simulation = Negation
    • Projecting XML Documents
    • Mixed Mode XML Query Processing
    • From Tree Patterns to Generalized Tree Patterns: On Efficient Evaluation of XQuery
    • Efficient Processing of Expressive Node-Selecting Queries on XML Data in Secondary Storage: A Tree Automata-based Approach
    • Query Processing for High-Volume XML Message Brokering
    • Holistic Twig Joins on Indexed XML Documents
  – XML Matching & Storage
    • Phrase Matching in XML
    • RRXF: Redundancy reducing XML storage in relations
    • MARS: A System for Publishing XML from Mixed and Redundant Storage
• ICDE
  – Query / Transaction Processing and Optimization
    • Navigation- vs. Index-Based XML Multi-Query Processing
    • XR-Tree: Indexing XML Data for Efficient Structural Joins


Year 2003 (3)

  – Semi-structured Data, Metadata and XML
    • Keyword Proximity Search on XML Graphs
    • XPath Query Evaluation: Improving Time and Space Efficiency
    • PBiTree Coding and Efficient Processing of Containment Joins
    • Structural Join Order Selection for XML Query Optimization
    • Streaming XPath Processing with Forward and Backward Axes
    • PXML: A Probabilistic Semi-structured Data Model and Algebra
    • X-Diff: An Effective Change Detection Algorithm for XML Documents
    • A Framework for the Selective Dissemination of XML Documents based on Inferred User Profiles
    • Propagating XML Constraints to Relations
  – Poster Papers
    • Navigation- vs. Index-Based XML Multi-Query Processing
    • Index-Based Approximate XML Joins
    • XML Publishing: Look at Siblings too
  – Demonstration Paper
    • PIX: A System for Phrase Matching in XML Documents



Year – 2004 (1)

• SIGMOD
    • BLAS: An Efficient XPath Processing System
    • Lazy Query Evaluation for Active XML
    • Implementing a scalable XML publish/subscribe system using a relational database system
    • Incremental Maintenance of XML Structural Indexes
    • Flexible Structure and Full-Text Querying for XML
    • Data Stream Management for Historical XML Data
    • Constraint-Based XML Query Rewriting For Data Integration
    • Incremental Evaluation of Schema-Directed XML Publishing
    • Approximate XML Query Answers
    • Secure XML Querying with Security Views


Year 2004 (2)

• VLDB04

• XML Views and Schemas

– Answering XPath Queries over Networks by Sending Minimal Views, Keishi Tajima, Yoshiki Fukui (JAIST)

– A Framework for Using Materialized XPath Views in XML Query Processing, Andrey Balmin, Fatma Özcan, Kevin Beyer, Roberta Cochrane, Hamid Pirahesh (IBM Almaden)

– Schema-Free XQuery, Yunyao Li, Cong Yu, H. V. Jagadish (Univ. of Michigan)

• Controlling Access

– Client-Based Access Control Management for XML Documents, Luc Bouganim (INRIA Rocquencourt), François Dang Ngoc, Philippe Pucheral (PRiSM Laboratory)

– Secure XML Publishing without Information Leakage in the Presence of Data Inference, Xiaochun Yang (Northeastern Univ.), Chen Li (Univ. of California, Irvine)

– On Testing Satisfiability of Tree Pattern Queries, Laks V. S. Lakshmanan, Ganesh Ramesh, Hui (Wendy) Wang, Zheng (Jessica) Zhao (Univ. of British Columbia)

– Containment of Nested XML Queries, Xin Dong, Alon Halevy, Igor Tatarinov (Univ. of Washington)

– Efficient XML-to-SQL Query Translation: Where to Add the Intelligence?, Rajasekar Krishnamurthy (IBM Almaden), Raghav Kaushik (Microsoft Research), Jeffrey Naughton (Univ. of Wisconsin-Madison)

– Taming XPath Queries by Minimizing Wildcard Steps, Chee-Yong Chan (National Univ. of Singapore), Wenfei Fan (Univ. of Edinburgh and Bell Laboratories), Yiming Zeng (National Univ. of Singapore)

– The NEXT Framework for Logical XQuery Optimization, Alin Deutsch, Yannis Papakonstantinou, Yu Xu (Univ. of California, San Diego)


54

Year 2004 (3)

• VLDB04

– Indexing Temporal XML Documents, Alberto Mendelzon, Flavio Rizzolo (Univ. of Toronto), Alejandro Vaisman (Univ. of Buenos Aires)

– Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams, Christoph Koch, Stefanie Scherzinger (Technische Univ. Wien), Nicole Schweikardt (Humboldt Univ. Berlin), Bernhard Stegmaier (Technische Univ. München)

– Bloom Histogram: Path Selectivity Estimation for XML Data with Updates, Wei Wang (Univ. of NSW), Haifeng Jiang, Hongjun Lu (Hong Kong Univ. of Science and Technology), Jeffrey Xu Yu (The Chinese Univ. of Hong Kong)

– XQuery on SQL Hosts, Torsten Grust, Sherif Sakr, Jens Teubner (Univ. of Konstanz)

– ROX: Relational Over XML, Alan Halverson (Univ. of Wisconsin-Madison), Vanja Josifovski, Guy Lohman, Hamid Pirahesh (IBM Almaden), Mathias Mörschel

– From XML View Updates to Relational View Updates: Old Solutions to a New Problem, Vanessa Braganholo (Universidade Federal do Rio Grande do Sul), Susan Davidson (Univ. of

• XML Implementations, Automatic Physical Design and Indexing

– Query Rewrite for XML in Oracle XML DB, Muralidhar Krishnaprasad, Zhen Liu, Anand Manikutty, James W. Warner, Vikas Arora, Susan Kotsovolos (Oracle Corp.)

– Indexing XML Data Stored in a Relational Database, Shankar Pal, Istvan Cseri, Oliver Seeliger, Gideon Schaller, Leo Giakoumakis, Vasili Zolotov (Microsoft Corp.)

– Automated Statistics Collection in DB2 UDB, Ashraf Aboulnaga, Peter Haas (IBM Almaden), Mokhtar Kandil, Sam Lightstone (IBM Toronto Lab), Guy Lohman, Volker Markl (IBM Almaden), Ivan Popivanov (IBM Toronto Lab), Vijayshankar Raman (IBM Almaden)

– High Performance Index Build Algorithms for Intranet Search Engines, Marcus Fontoura, Eugene Shekita, Jason Zien, Sridhar Rajagopalan, Andreas Neumann (IBM Almaden)

– Automated Design of Multi-dimensional Clustering Tables in Relational Databases, Sam Lightstone (IBM Toronto Lab), Bishwaranjan Bhattacharjee (IBM T.J. Watson Res. Ctr.)



55

Year 2004 (4)

• ICDE

– A prime number labeling scheme for dynamic ordered XML trees

– A Web-Services Architecture for Efficient XML Data Exchange

– Efficient incremental validation of XML documents

– Integrating XML data in the TARGIT OLAP system

– Minimization and group by detection for nested XQueries

– Multiresolution indexing of XML for frequent queries

– PRIX: Indexing And Querying XML Using Prüfer Sequence

– Recursive XML schemas

– Routing XML queries

– Selectivity Estimation for XML Twigs

– Storing XML with XSD in SQL databases

– Updates and Incremental Validation of XML Documents

– XBench benchmark and performance testing of XML DBMSs

– XJoin index

– XML database to support open XML

– XML query processing


56

Year 2005 … Year 2006

• SIGMOD

• VLDB

• ICDE05

– Vectorizing and Querying Large XML Repositories, Peter Buneman, Byron Choi, Wenfei Fan, Robert Hutchison, Robert Mann, Stratis Viglas

– On the Sequencing of Tree Structures for XML Indexing, Haixun Wang, Xiaofeng Meng

– A Probabilistic XML Approach to Data Integration, Maurice van Keulen, Ander de Keijzer, Wouter Alink

– Efficient Creation and Incremental Maintenance of the HOPI Index for Complex XML Document Collections, Ralf Schenkel, Anja Theobald, Gerhard Weikum

– Bloom Filter-based XML Packets Filtering for Millions of Path Queries, Xueqing Gong, Ying Yan, Weining Qian, Aoying Zhou

– Cache-Conscious Automata for XML Filtering, Bingsheng He, Qiong Luo, Byron Choi

– MAX: The Big Picture of Dynamic XML Statistics, Maya Ramanath, Lingzhi Zhang, Juliana Freire, Jayant R. Haritsa

– Full-fledged Algebraic XPath Processing in Natix, Matthias Brantner, Sven Helmer, Carl-Christian Kanne, Guido Moerkotte

– XML Views as Integrity Constraints and their Use in Query Translation, Rajasekar Krishnamurthy, Raghav Kaushik, Jeffrey F. Naughton

– BOXes: Efficient Maintenance of Order-Based Labeling for Dynamic XML Data, Adam Silberstein, Hao He, Ke Yi, Jun Yang

– Adaptive Processing of Top-K Queries in XML, Amelie Marian, Sihem Amer-Yahia, Nick Koudas, Divesh Srivastava