2003-3-28dasfaa2003, kyoto, japan1 glass: a graphical query language for semi-structured data wei ni...

44
2003-3-28 DASFAA2003, Kyoto, Japan 1 GLASS GLASS : : A Graphical Query A Graphical Query Language for Semi- Language for Semi- Structured Data Structured Data Wei Ni Tok Wang Ling Department of Computer Science National University of Singapore, Singapore E-mail: {niwei, lingtw}@comp.nus.edu.sg

Upload: philomena-lynch

Post on 19-Jan-2016

224 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 1

GLASSGLASS:: A Graphical Query A Graphical Query Language for Semi-Structured DataLanguage for Semi-Structured Data

Wei Ni Tok Wang LingDepartment of Computer Science

National University of Singapore, SingaporeE-mail: {niwei, lingtw}@comp.nus.edu.sg

Page 2: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 2

RoadmapRoadmap

1. Introduction and motivation

2. ORA-SS the data model of our language

3. GLASS, our graphical query language

4. Related works and comparison

5. Conclusion and future work

Page 3: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 3

1. Introduction and 1. Introduction and motivationmotivation

XML: a standard for representation, manipulation and exchange of data.

Query engines: a crucial application to exploit the full power of XML.

XQuery: a standard for querying XML data which is too difficult for non-technical users to use.

Graphical query language: a way to help users query XML data.

Page 4: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 4

1. Introduction and motivation 1. Introduction and motivation (Cont.)(Cont.)

Criteria of a good query language Expressiveness Completeness User-friendliness

Graphical Query Language: user-friendly and easy understanding

Page 5: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 5

2. ORA-SS2. ORA-SS the data model of our language the data model of our language

ORA-SS (Object-Relationship-Attribute model for Semi-Structured data) : a rich semantic data model.

<!ELEMENT department (course+)> <!ATTLIST department name ID #REQUIRED><!ELEMENT course (title?, student+)> <!ATTLIST course code ID #REQUIRED><!ELEMENT title PCDATA><!ELEMENT student (name?, grade+)> <!ATTLIST student number #IMPLIED><!ELEMENT name PCDATA><!ELEMENT grade PCDATA>

Example 1:

The DTD of “Department.xml”

department

course

student

name

code title

number name grade

2, 1:n, 1:1

cs, 2, 1:n, 1:n

cs

The ORA-SS schema diagram of “Department.xml”

There is a binary relationship type, named as “cs”, between course and student where one course may has one or many students and one students can take one or many courses.

<grade> belongs to the binary relationship type “cs” between <course> and <student>

Page 6: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 6

3. GLASS3. GLASS our graphical query our graphical query languagelanguage

GLASS (Graphical Query Language for Semi-Structured Data)

Targets of GLASS support various queries including Aggregation

Functions and Negation; be clear and concise without ambiguity; provide “freedom” to non-technical users in querying.

Page 7: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 7

3. GLASS3. GLASS our graphical query languageour graphical query language 3.1 General concepts in GLASS3.1 General concepts in GLASS

1) Data iconsi. Rectangle: the object class in ORA-SS, non-terminal

element in XML (element with subelements or attributes).

ii. Circle: the attribute in ORA-SS, terminal element with PCDATA only and the attribute (or attribute list) in XML.

2) Connectionsi. Arrow: relationship typeii. Dashed arrow: IDREF in XMLiii. Line: link between output and original entities

Page 8: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 8

3. GLASS3. GLASS our graphical query languageour graphical query language 3.1 General concepts in GLASS 3.1 General concepts in GLASS (Cont.)(Cont.)

3) Box: the group entity in GLASS query graphs that consists of all rectangles or circles inside the box.

4) Derived entities: derived object classes and attributes are represented as dashed rectangles and dashed circles which are new data types defined by users.

5) Condition Logic Window (CLW): an optional part in a GLASS query to write logic expressions and statements (e.g. IF-THEN) for complex query conditions and/or constructions.

Page 9: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 9

3. GLASS3. GLASS our graphical query languageour graphical query language 3.1 General concepts in GLASS 3.1 General concepts in GLASS (Cont.)(Cont.)

6) Path Identifier a unique name given to data icons or boxes; indicated by prefix “$”.

7) Condition Identifier a unique name given to the connections; without prefix “$”.

8) Logic Expression: specifies the logic in query conditions

9) Statement: helps construct complex outputs

Page 10: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 10

3. GLASS3. GLASS our graphical query languageour graphical query language 3.1 General concepts in GLASS 3.1 General concepts in GLASS (Cont.)(Cont.)Structure of GLASS query graph

CLW

Logic expressions and statements

Query condition

Result construction

The black parts are optional.

Page 11: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 11

3. GLASS3. GLASS our graphical query languageour graphical query language 3.2 3.2 Output constructionOutput construction

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

The content of “Department.xml”

Example 1:

<!ELEMENT department (course+)> <!ATTLIST department name ID #REQUIRED><!ELEMENT course (title?, student+)> <!ATTLIST course code ID #REQUIRED><!ELEMENT title PCDATA><!ELEMENT student (name?, grade+)> <!ATTLIST student number #IMPLIED><!ELEMENT name PCDATA><!ELEMENT grade PCDATA>

The DTD of “Department.xml”

department

course

student

name

code title

number name grade

2, 1:n, 1:1

cs, 2, 1:n, 1:n

cs

The ORA-SS schema diagram of “Department.xml”

Page 12: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 12

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

3. GLASS3. GLASS our graphical query languageour graphical query language 3.2 3.2 Output construction (Cont.)Output construction (Cont.)

course<course code=“201”> <title>Software Engineering</title></course><course code=“303”> <title>Database Design</title></course> 

(a) Extract courses and all information one level under course elements by using the default output method

In the default output method, everything will be kept in the original style from source data.

QUERY QUERY RESULT

SOURCE DATA

department

course

student

name

code title

number name grade

2, 1:n, 1:1

cs, 2, 1:n, 1:n

cs

The default number of level is “1”. We use “/2” to represent 2 levels under element course.

The default entity type implies both attributes and simple subelements of course element

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

Page 13: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 13

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

<course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student></course><course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student></course>

(b) Extract courses and all information at all levels under course elements by using the default output method

QUERY QUERY RESULT

SOURCE DATA

course

*

“*” is a wildcard which means all nested level under course element.

3. GLASS3. GLASS our graphical query languageour graphical query language 3.2 3.2 Output construction (Cont.)Output construction (Cont.)

department

course

student

name

code title

number name grade

2, 1:n, 1:1

cs, 2, 1:n, 1:n

cs

Page 14: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 14

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

course<course code=“201”></course><course code=“303”></course> 

(c) Extract courses with all attributes of course element in the original XML document

QUERY QUERY RESULT

SOURCE DATA

@

3. GLASS3. GLASS our graphical query languageour graphical query language 3.2 3.2 Output construction (Cont.)Output construction (Cont.)

The circle with “@” inside implies all attributes of course element in the original XML document

department

course

student

name

code title

number name grade

2, 1:n, 1:1

cs, 2, 1:n, 1:n

cs

Page 15: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 15

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

course<course> <title>Software Engineering</title></course><course> <title>Database Design</title></course>  

(d) Extract courses and all subelements at one level under course elements (except the attributes of course elements)

QUERY QUERY RESULT

SOURCE DATA

3. GLASS3. GLASS our graphical query languageour graphical query language 3.2 3.2 Output construction (Cont.)Output construction (Cont.)

department

course

student

name

code title

number name grade

2, 1:n, 1:1

cs, 2, 1:n, 1:n

cs

e

The circle with “e” inside indicates all simple subelements of course element.

Page 16: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 16

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

course<course code=“201”> <title>Software Engineering</title> <student></student> <student></student></course><course code=“303”> <title>Database Design</title> <student></student></course> 

(e) Extract courses with their attributes and/or simple subelements as well as the contents of complex subelements at one level under course element.

QUERY QUERY RESULT

SOURCE DATA

Since only the entities at one level under course element are displayed, the student number at the second level under course doesn’t appear in the result.

3. GLASS3. GLASS our graphical query languageour graphical query language 3.2 3.2 Output construction (Cont.)Output construction (Cont.)

The blank rectangle stands for all complex subelements of course element.

department

course

student

name

code title

number name grade

2, 1:n, 1:1

cs, 2, 1:n, 1:n

cs

The blank circle implies both attributes and simple subelements of course element

Page 17: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 17

<?xml version = “1.0” standalone = “no” encoding = “UTF-8”><DOCTYPE BOOK SYSTEM “Department.dtd”><department name=“CS”> <course code=“201”> <title>Software Engineering</title> <student number=“1001”> <name>John Smith</name> <grade>A</grade> </student> <student number=“1002”> <name>Mel Green</name> <grade>C</grade> </student> </course> <course code=“303”> <title>Database Design</title> <student number=“1001”> <name>John Smith</name> <grade>B</grade> </student> </course></department>

<course title=“Software Engineering”> <code>201</code></course><course title=“Database Design”> <code>303</code></course>

(f) Extract courses with their titles as attributes and codes as subelements in the output.

a demonstration of the conversion between attribute and subelement

QUERY QUERY RESULT

SOURCE DATA

3. GLASS3. GLASS our graphical query languageour graphical query language 3.2 3.2 Output construction (Cont.)Output construction (Cont.)

code

course

title@

course

title

codee

The links here imply that the values of the new defined title attribute and code element come from the title and code in the original data.

Page 18: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 18

3. GLASS3. GLASS our graphical query languageour graphical query language 3.3 3.3 Basic query operatorsBasic query operators SelectionSelection

To select all courses whose codes begin with “2”, we can write an XQuery language as:

FOR $c IN $department/courseWHERE $c/@code/data() = ‘2%’RETURN $c

course course

code = ‘2%’

*

The output course in RHS is the course that satisfies the condition in LHS

Page 19: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 19

3. GLASS3. GLASS our graphical query languageour graphical query language 3.43.4 Basic query operatorsBasic query operators ProjectionProjection

To project all courses with their titles, XQuery will give the following expression:

FOR $c IN $department/courseRETURN <course>{ $c/title }</course>

course

title

Without the specification of subelements or attributes, the title will remain unchanged as a subelement of course in the result.

Page 20: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 20

3. GLASS3. GLASS our graphical query languageour graphical query language 3.5 3.5 Basic query operatorsBasic query operators Join Join

Suppose we have another XML data named as “Description.xml” containing the descriptions of all courses.

course

code title description

<!ELEMENT course (title?, description)> <!ATTLIST course code ID #REQUIRED><!ELEMENT title PCDATA><!ELEMENT description PCDATA>

The DTD of “Description.xml” The ORA-SS schema diagram of “Description.xml”

Page 21: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 21

3. GLASS3. GLASS our graphical query languageour graphical query language 3.5 3.5 Basic query operators Basic query operators Join Join (Con(Cont.)t.)

Query: To extract everything of the courses from “Department.xml”, which have descriptions in “Description.xml”, and put the corresponding descriptions under the courses in the results.

FROM Department.xml

courseFROM Description.xml

course

code description

course

*

description

The URLs of data sources

Page 22: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 22

3. GLASS3. GLASS our graphical query languageour graphical query language 3.6 3.6 Aggregation functionsAggregation functions groupgroup & & averageaverage

Query: For all courses, display courses with their information (in default way) and the average grade of each course.

Aggregation functions are available after “_group” functionality is done.

To group student under course

avg_grade is a derived entity and it will be constructed as a subelement of “course”

course

student

_group

cs

grade

course

avg_grade

AVG

eThe character “e” inside the dotted circle implies the avg_grade is a element type

Page 23: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 23

3. GLASS3. GLASS our graphical query languageour graphical query language 3.6 3.6 Aggregation functionsAggregation functions groupgroup & & averageaverage

Query: For all courses, display courses with their information (in default way) and the average grade of each course. (Presented by XQuery expressions as follows)

for $ccd in distinct-values(document("Department.xml")//course/@code)

let $c := document("Department.xml")//course,

$s := $c/student

where $c/@code = $ccd

return

<course code = "{$c/@code}" >

{

$c/title,

}

<avg_grade> {avg($s/grade)} </avg_grade>

</course>

Page 24: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 24

3. GLASS3. GLASS our graphical query languageour graphical query language 3.7 3.7 Query order sensitive dataQuery order sensitive data

The ORA-SS schema diagram of “Bib.xml”, order-sensitive data

Example 2:

isbn title

firstname lastname

2, +, +, <

content

author

book

The <author> order is important to the <book>

Suppose we have an order sensitive data “Bib.xml”.

Page 25: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 25

3. GLASS3. GLASS our graphical query languageour graphical query language 3.7 3.7 Query order sensitive data (ConQuery order sensitive data (Cont.)t.)

author

book

isbn title

[1]

Query: Display all books with their isbn’s, titles and their first authors .

“[1]” means the first element in order

Page 26: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 26

3. GLASS3. GLASS our graphical query languageour graphical query language 3.8 3.8 Advanced features of GLASSAdvanced features of GLASS

Suppose we have a more complex XML data about projects, members and publications.

project

member

publication

id name

name job_title

number title

2, +, +

3, *, +

Example 3:

The ORA-SS schema diagram of the data about “project, member and publication”

Ternary relation among <project>, <member> and <publication> where one member in one project can have 0 or many publications; and one publication can belong to one or many (project, member) pairs.

Page 27: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 27

3. GLASS3. GLASS our graphical query languageour graphical query language 3.8.1 3.8.1 Group entity Group entity the use of box the use of box

Query: Display the members with names who have taken part in less than 5 projects but written more than 6 publications in some project they attended, and their names begin with the character “S”.

_group

member

name = ‘S%’

project

_group

publication

CNT < 5

CNT > 6

member

name

To group projects under member

“CNT” means “count”. (“CNT” do count without eliminating duplicates.) We can also use other aggregation functions like SUM, AVG, etc.

To group publications under each pair of (member, project)

The box

Page 28: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 28

3. GLASS3. GLASS our graphical query languageour graphical query language 3.8.1 3.8.1 Group entity Group entity the use of box the use of box (Con(Cont.)t.)

A new query: Display the members with names who have totally taken part in less than 5 projects and totally written more than 6 publications, and their names begin with character “S”.

member

name

member

name = ‘S%’

project

_group

publication

CNT < 5

CNT_UNIQUE > 6

_group

_group

member

name = ‘S%’

project

_group

publication

CNT < 5

CNT > 6

member

name

The previous query: Display the members with names who have taken part in less than 5 projects but written more than 6 publications in some project they attended, and their names begin with the character “S”.

Select publication GROUP BY <member, project> pairs

Select publication only GROUP BY <member>

CNT_UNIQUE will eliminate duplicates in counting.

Page 29: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 29

3. GLASS3. GLASS our graphical query languageour graphical query language 3.8.2 3.8.2 Negation Negation the use of the use of Condition Logic WindowCondition Logic Window

member

name

member

publication

title = “Introduction to XML”

: A :

CLW

¬ A;

QUERY: display the name of those members who haven’t written the publication titled “Introduction to XML”.

Logic expression means “does not have A” or “does not exist A”

The identifier “A” means has a publication with title = “Introduction to XML”

The colons are not part of the identifiers but distinguish the identifiers from relationship type names.

Page 30: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 30

3. GLASS3. GLASS our graphical query languageour graphical query language 3.8.3 3.8.3 IF-THEN StatementIF-THEN Statement

CLW

A B;

{IF (A) THEN EXTRACT $pro;}

member

publication

title = “Introduction to XML”

: A :

publication

: B :

title = “Introduction to Internet”

member

name

project : $pro : Without the statements in CLW, the information of the projects that all satisfied members have taken part in will be displayed.With the statement in CLW,

the information of the projects will be extracted only when condition A is satisfied.

Query: Display the members with their names who have written a publication titled “Introduction to XML” or “Introduction to Internet”; and for those members who have written “Introduction to XML”, also display all information about the projects that they have taken part in.

Path identifier “$pro” represents the project element with its the attributes and/or simple subelements

Page 31: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 31

4. Related works and comparison4. Related works and comparison Form-based graphical XML query languages have

a limited expression power. Graphical XML Query Language [15] XMLApe Query Language [17]

XML-GL[4,5] is not strong in expressing query logic and may cause misunderstanding in some cases.

Ref[4] S. Ceri, S. Comai, E. Damiani, P, Fraternali, S. Paraboschi, and L. Tanca. XML-GL: a graphical language of querying and restructuring XML documents. In Proc. WWW8, Toronto, Canada, May 1999Ref[5] S. Ceri, S. Comai, E. Damiani, P. Fraternali, and L. Tanca. Complex Queries in XML-GL. SAC (2) 2000: 888-893Ref[15] Leo Mark, etc. XMLApe. College of Computing, Georgia Institue of Technology. http://www.cc.gatech.edu/projects/XMLApe/ Ref[17] Jan Paredaens, Peter Peelman, Letizia Tanca. G-Log: A Graph-Based Query Language. IEEE Transactions on Knowledge and Data Engineering, 7(3):436--453, June 1995.

Page 32: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 32

XML-GL is the world’s first visual language that explicitly addresses the full complexity of querying XML data.

Has a bipartite structure to express querying and restructuring.

Supports various XML queries Selection and projection Join from one or more input documents; Construction of new documents and new elements Arithmetic and aggregate functions Union, difference, heterogeneous union, and Cartesian

product.

4. Related works and comparison4. Related works and comparison Features of XML-GLFeatures of XML-GL

Page 33: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 33

4. Related works and comparison4. Related works and comparison An example of XML-GLAn example of XML-GL

The first interpretation:

For all <manufacturer> elements where rank <= 10, we output MN-NAME, YEAR and <model> subelement with MO-NAME and RANK;

and for the remaining, we only output MN-NAME and YEAR. (No <manufacturer> will appear twice in the result; and all <manufacturer>s are output together in original order.)

The second interpretation:

For all <manufacturer> elementswhere rank <= 10, we output MN-NAME, YEAR and <model> subelement with MO-NAME and RANK;

and for all <manufacturer> elementswe output MN-NAME and YEAR again but do not output <model> subelement.(Some <manufacturer>s will appear twice in the result.)

Page 34: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 34

4. Related works and comparison4. Related works and comparison How we express bothHow we express both (the 1(the 1stst

interpretation)interpretation)

For all <manufacturer> elements where rank <= 10, we output MN-NAME, YEAR and <model> subelement with MO-NAME and RANK; and for the remaining, we only output MN-NAME and YEAR..

(No <manufacturer> will appear twice in the output; and all <manufacturer>s are output together in original order)

MANUFACTURER

MODEL

RANK <= 10

MANUFACTURER

MODELMN-NAME

YEAR

MO-NAME

RANK

Without any links with the entities in the LHS, all <MANUFACTURE>s are directly extracted from the data.

With this link between two <MODEL>s, only the <MODEL>s with RANK <= 10 appear in the results.

Page 35: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 35

4. Related works and comparison4. Related works and comparison How we express bothHow we express both (the 2(the 2nd nd

interpretation)interpretation)

MANUFACTURER

MN-NAME

YEAR

For all <manufacturer> elements where rank <= 10, we output MN-NAME, YEAR and <model> subelement with MO-NAME and RANK; and for all <manufacturer> elements we output MN-NAME and YEAR again but do not output <model> subelement.

(The results of both queries are put separately in one file.)

MANUFACTURER

MODEL

RANK <= 10

MANUFACTURER

MODELMN-NAME

YEAR

MO-NAME

RANK

With this link, only <model>s with rank <=10 are kept in the outputs

In comparison, the MANUFACTURER without any links with the LHS means all MANUFACTURER elements from the source data.

The link between the two MANUFACTURERs implies only the MANUFACTURER that satisfies the condition in the LHS will be displayed.

Page 36: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 36

4. Related works and comparison4. Related works and comparison ComparisonComparison

XML-GL

Graphical XML Query Language

XMLApe Query Language

GLASS

Data Model XML DTD XML DTD XML Schema ORA-SS

Support Selection, Projection

and Join

Yes

Yes Yes Yes

Support Queries on Ordered Data

Yes No No Yes

Support “group by” operator

Yes No No Yes

Support Aggregation Function

Yes

No

No

Yes

Support Negation No No No Yes

Support Complex Condition

Yes No No Yes

Page 37: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 37

4. Related works and comparison4. Related works and comparison Comparison (Cont.)Comparison (Cont.)

XML-GL

Graphical XML Query Language

XMLApe Query Language

GLASS

Support XPath Yes No No Yes

Support Qualifiers(, )

No No No Yes

Support Conditional Output Construction (e.g. With IF-THEN

Clause)No No No Yes

Support User-defined View

Yes No No Yes

Support View Validation

No No No Yes

Page 38: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 38

5. Conclusion5. Conclusion

Query logic can be explicitly expressed in CLW. In comparison with XML-GL, GLASS can express

negation and logic among conditions easily and clearly. By separating logic expressions from graphical

data structures, we now can build a clear and concise graphical query language.

Page 39: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 39

5. Conclusion (Cont.)5. Conclusion (Cont.)

We tend to express the complexity of XML queries in XQuery standard including Aggregation functions and Negation.

We support view transformation and validation, especially swapping elements in different levels. [6]

Ref[6]: Yabing Chen, Tok Wang Ling, Mong Li Lee: Designing Valid XML Views. To appear in the proceedings of 21st International Conference on Conceptual Modeling (ER'2002), October 7-11, 2002, Tampere, Finland.

Page 40: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 40

Future workFuture work

Enhance the language; Interpret the graphical expression into XQu

ery standard; Expand the content of GLASS

data manipulation (e.g., INSERT, etc) data integration view definition view maintenance

Page 41: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 41

ReferencesReferences[1] S. Abiteboul, D. Quass, J. McHugh, J. Widom and J. Wiener. The Lorel Query Language for Semistructured Data. Department of Computer Science, Stanford University. International Journal on Digital Libraries, 1(1):68-88, Apr. 1997.

[2] M. Angelaccio, T. Catarci, and G. Santucci. QDB*: A graphical query language with recursion. IEEE Transactions on Software Engineering, 16(10):1150-1163, 1990.

[3] T. Catarci, S.K. Chang, M.F. Costabile, S. Levialdi, and G. Santucci. A graph-based framework for multiparadigmatic visual access to databases. IEEE Transactions on Knowledge and Data Engineering, 8(3):455-475, 1996.

[4] S. Ceri, S. Comai, E. Damiani, P, Fraternali, S. Paraboschi, and L. Tanca. XML-GL: a graphical language of querying and restructuring XML documents. In Proc. WWW8, Toronto, Canada, May 1999

[5] S. Ceri, S. Comai, E. Damiani, P. Fraternali, and L. Tanca. Complex Queries in XML-GL. SAC (2) 2000: 888-893

[6] Yabing Chen, Tok Wang Ling, Mong Li Lee: Designing Valid XML Views. To appear in the proceedings of 21st International Conference on Conceptual Modeling (ER'2002), October 7-11, 2002, Tampere, Finland.

[7] Zhuo Chen. Extracting Schema from XML Documents. SoC, NUS. Honours Year Project Report.

[8] Sara Comai, Ernesto Damiani, Letizia Tanca. The WG-Log System: Data Model and Semantics. INTERDATA technical report, T2-R06, July 1998.

[9] C. J. Date. An Introduction to Database Systems. 3rd Edition, Addison-Wesley Publishing Company, 1981.

[10] Gillian Dobbie, Wu Xiaoying, Tok Wang Ling, Mong Li Lee: ORA-SS: An Object-Relationship-Attribute Model for Semistructured Data. TR21/00, Technical Report, Department of Computer Science, National University of Singapore, December 2000.

Page 42: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 42

References (Cont.)References (Cont.)[11] P.W. Eklund, J. Leane, and C. Nowak. GrIT: An implementation of a graphical user interface for conceptual structures. Technical Report TR94-03, Computer Science Department, The University of Adelaide, February 1994.

[12] Extensible Stylesheet Language (XSL) Specification. W3C Working Draft. Apr 1999. http://www.w3.org/TR/1999/WD-xsl-19990421/

[13] Ankur Gupta, Zahid Khan. Graphical XML Query Language. Course paper. College of Computing, Georgia Institute of Technology, Sep 2000

[14] Joshua S. Hodas, Robert M. Keller, Ingo Muschenets, Jeffrey Polakow, Amy R. Ward and Will Ballard. Condor: A Simple, Expressive Graphical Database Query Language. Department of Computer Science, Harvey Mudd College. Computer Science Technical Report HMC-CS-97-04.

[15] Leo Mark, etc. XMLApe. College of Computing, Georgia Institue of Technology. http://www.cc.gatech.edu/projects/XMLApe/

[16] Yuanying Mo, Tok Wang Ling. Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database. Research Report. SoC, NUS.

[17] Jan Paredaens, Peter Peelman, Letizia Tanca. G-Log: A Graph-Based Query Language. IEEE Transactions on Knowledge and Data Engineering, 7(3):436--453, June 1995.

[18] Jayavel Shanmugasundaram, Kristin Tufte, Gang He, Chun Zhang, David DeWitt and Jeffrey Naughton. Relational Databases for Querying XML Documents: Limitations and Opportunities. VLDB 1999: 302-314 Department of Computer Sciences, University of Wisconsin-Madison.

[19] XML Path Language (XPath) 2.0. W3C. Apr 2002. http://www.w3.org/TR/xpath20/

Page 43: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 43

References (Cont.)References (Cont.)[20] XML Query Requirements. W3C. Feb 2001. http://www.w3.org/TR/xmlquery-req

[21] XML Syntax for XQuery 1.0 (XQueryX). W3C. Jun 2001.http://www.w3.org/TR/xqueryx

[22] XQuery 1.0 and XPath 2.0 Data Model. W3C. Apr 2002. http://www.w3.org/TR/query-datamodel/

[23] XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0. W3C. Apr 2002. http://www.w3.org/TR/xquery-operators/

[24] XQuery 1.0 Formal Semantics. W3C. Mar 2002.http://www.w3.org/TR/query-semantics/

Page 44: 2003-3-28DASFAA2003, Kyoto, Japan1 GLASS: A Graphical Query Language for Semi-Structured Data Wei Ni Tok Wang Ling Department of Computer Science National

2003-3-28 DASFAA2003, Kyoto, Japan 44

The EndThe End

Thank you very much