Post on 21-Dec-2015
Statistics
• XML:
  – Altavista: 800,000 pages returned.
  – Amazon.com: 242 books.
• In comparison:
  – God: 12,000 books, 7 million pages.
  – Bible: 32,000 books, 4.6 million pages.
• More comparisons:
  – Alon Levy + XML: 132 pages (770 without Alon).
  – XML-QL: 509 pages.
  – Levy + God: 12,000 (Alon Levy + God: 1, but not me).
  – Levy + Bible: 10,000 (Alon Levy + Bible: 3; 1 is me).
What is XML?
eXtensible Markup Language:
– Emerging format for data exchange on the web and between applications.
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>
Attributes and References
<db>
  <book ID="b1" pub="mkp">
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book ID="b2" pub="mkp">
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher ID="mkp">
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>
XML distinguishes attributes from sub-elements. IDs and IDREFs are used to reference objects.
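Following a reference amounts to looking up the element whose ID equals the IDREF value. A minimal Python sketch, assuming the standard library's `xml.etree.ElementTree`; that parser does not read DTDs, so the code assumes by hand that `ID` holds the identifier and `pub` holds the reference. The names `by_id` and `publisher_of` are illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """
<db>
  <book ID="b1" pub="mkp">
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book ID="b2" pub="mkp">
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher ID="mkp">
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>
"""

root = ET.fromstring(DOC)

# Index every element that carries an ID attribute (assumed to be the ID).
by_id = {elem.get("ID"): elem for elem in root.iter() if elem.get("ID") is not None}

# Follow each book's pub reference to the publisher element it names.
publisher_of = {
    book.findtext("title"): by_id[book.get("pub")].findtext("name")
    for book in root.findall("book")
}
print(publisher_of)
```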
Document Type Definitions (DTDs)
<!ELEMENT Book (title, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name, address, age?)>
<!ATTLIST Book id ID #REQUIRED>
<!ATTLIST Book pub IDREF #IMPLIED>
Sort of like a schema but not really. Won’t stay for very long, either. First in a long series of 3-letter acronyms.
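A content model such as `(title, author*)` is a regular expression over the names of an element's children. The Python sketch below makes that concrete under strong assumptions: it handles only element-content models (names, `,`, `|`, `*`, `+`, `?`, parentheses), ignores `#PCDATA` and attributes, and is not a DTD validator. The function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a content model like "(title, author*)" into a regex over
    a sequence of child tag names, each followed by one space."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that matches "name ".
    with_names = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.compile(with_names.replace(",", ""))

def conforms(elem: ET.Element, model: str) -> bool:
    """Check the order and count of elem's children against the model."""
    children = "".join(child.tag + " " for child in elem)
    return content_model_to_regex(model).fullmatch(children) is not None

good = ET.fromstring("<Book><title>T</title><author/><author/></Book>")
bad = ET.fromstring("<Book><author/><title>T</title></Book>")
print(conforms(good, "(title, author*)"), conforms(bad, "(title, author*)"))
```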
Origin of XML
• Comes from SGML (very nasty language).
• Principle: separate the data from the graphical presentation.
<UL>
  <li> <b> Complete Guide to DB2 </b> By <i> Chamberlin </i>.
  <li> <b> Transaction Processing </b> By <i> Bernstein and Newcomer </i>
  <li> <b> The guide to the good life through database research. </b> By <i> Alon Levy </i>
</UL>
XML, After the Roots
• A format for sharing data.
• Applications:
  – EDI (electronic data interchange):
    • Transactions between banks.
    • Producers and suppliers sharing product data (auctions).
    • Extranets: building relationships between companies.
    • Scientists sharing data about experiments.
  – Sharing data between different components of an application.
  – Format for storing all data in Office 2000.
• Basis for data sharing and integration.
Why Do People Like It So Much?
• It’s easy to learn.
• It’s human readable. No need for proprietary formats anymore.
• It’s very flexible:
  – Data is self-describing.
  – Can add attributes easily.
  – Data can be irregular.
• Note: without common DTDs, data sharing is not solved!
Why are we DB’ers interested?
• It’s data, stupid. That’s us.
• Proof by Altavista:
  – database + XML: 40,000 pages.
• Database issues:
  – How are we going to model XML? (graphs)
  – How are we going to query XML? (XML-QL)
  – How are we going to store XML? (in a relational database? object-oriented?)
  – How are we going to process XML efficiently? (uh… well..., um..., ah..., get some good grad students!)
3-Letter Acronyms
• XML, DTD, W3C
• DOM (Document Object Model)
• XML Schema
• XQL (very early query language)
• RDF (Resource Description Framework)
• Today, in New Jersey, a W3C committee is meeting to discuss a standard query language.
XML Data Model (Graph)
[Figure: the example document drawn as an edge-labeled graph rooted at db (node #0), with nodes #1 to #7. Edge labels: book, publisher, title, author, name, state, pub, pcdata. Leaf values: Complete..., Principles..., Chamberlin, Bernstein, Newcomer, Morgan..., CA. IDs: b1, b2, mkp.]
Issues:
• Distinguish between attributes and sub-elements?
• Should we conserve order?
Think of the labels as names of binary relations.
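To make "labels as names of binary relations" concrete, the Python sketch below turns a document into one list of (parent, child) pairs per label. It is only an illustration: the node numbers are assigned in document order and are not those of the figure, attributes are ignored, and `to_relations` is a made-up name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count

DOC = """
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
</db>
"""

def to_relations(root):
    """Return {label: [(parent_node, child_node_or_text), ...]}."""
    ids = count()                      # fresh node numbers, document order
    relations = defaultdict(list)

    def visit(elem, node):
        text = (elem.text or "").strip()
        if text:                       # character data hangs off a pcdata edge
            relations["pcdata"].append((node, text))
        for child in elem:
            child_node = next(ids)
            relations[child.tag].append((node, child_node))
            visit(child, child_node)

    visit(root, next(ids))             # the root element becomes node 0
    return relations

relations = to_relations(ET.fromstring(DOC))
for label, pairs in relations.items():
    print(label, pairs)
```

Each key of `relations` plays the role of a binary relation, for example `author(4, 6)`. Keeping the pairs in a list preserves document order; whether order should be kept at all is one of the issues listed above.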
Querying XML
• Requirements:
  – Query a graph, not a relation.
  – The result should be a graph (representing an XML document), not a relation.
  – No schema.
  – We may not know much about the data, so we need to navigate the XML.
Query Languages
• First, there was XQL (from Microsoft).
• People very quickly realized that it was very limited.
• Then, a bunch of database researchers looked at XML and invented XML-QL.
  – XML-QL comes from the nicer StruQL language.
  – Many people got excited. Formed a committee.
Extracting Data by Query
• Matching data using element patterns.
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t </>
        <author> $a </>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
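For readers without an XML-QL engine, the same pattern match can be approximated in Python with `xml.etree.ElementTree`. This is a sketch, not XML-QL semantics in full: the contents of `BIB` are invented stand-ins for bib.xml, and the code assumes each book has a single title, so there is one binding per author.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <book>
    <publisher><name>Addison-Wesley</name></publisher>
    <title>Title One</title>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <publisher><name>Other Press</name></publisher>
    <title>Title Two</title>
    <author>Author C</author>
  </book>
</bib>
"""

def authors_of(publisher: str, root: ET.Element) -> list[str]:
    """Collect $a for every book whose publisher/name equals `publisher`."""
    result = []
    for book in root.iter("book"):
        if book.findtext("publisher/name") != publisher:
            continue                      # pattern does not match this book
        if book.find("title") is None:
            continue                      # the pattern also requires a title
        result.extend(author.text for author in book.findall("author"))
    return result

authors = authors_of("Addison-Wesley", ET.fromstring(BIB))
print(authors)
```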
Constructing XML Data
WHERE <book>
        <publisher><name>Addison-Wesley</></>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a </>
            <title> $t </>
          </>
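A rough Python equivalent of the CONSTRUCT clause builds one `result` element per binding of `$t` and `$a`. The sample data and the `results` wrapper element are assumptions made so that the output is a single well-formed document.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <book>
    <publisher><name>Addison-Wesley</name></publisher>
    <title>Title One</title>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <publisher><name>Other Press</name></publisher>
    <title>Title Two</title>
    <author>Author C</author>
  </book>
</bib>
"""

def construct_results(root: ET.Element) -> ET.Element:
    out = ET.Element("results")           # wrapper element (assumption)
    for book in root.iter("book"):
        if book.findtext("publisher/name") != "Addison-Wesley":
            continue
        # One <result> per ($t, $a) binding.
        for title in book.findall("title"):
            for author in book.findall("author"):
                result = ET.SubElement(out, "result")
                ET.SubElement(result, "author").text = author.text
                ET.SubElement(result, "title").text = title.text
    return out

results = construct_results(ET.fromstring(BIB))
print(ET.tostring(results, encoding="unicode"))
```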
Grouping with Nested Queries
WHERE <book>
        <title> $t </>
        <publisher><name>Addison-Wesley</></>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <titre> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <auteur> $a </>
          </>
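The nested query groups authors under their book: the outer loop emits one `result` per title, and the inner loop plays the role of the inner WHERE over `$p`. A Python sketch with invented sample data and an assumed `results` wrapper:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <book>
    <title>Title One</title>
    <publisher><name>Addison-Wesley</name></publisher>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <title>Title Two</title>
    <publisher><name>Other Press</name></publisher>
    <author>Author C</author>
  </book>
</bib>
"""

def group_authors(root: ET.Element) -> ET.Element:
    out = ET.Element("results")           # wrapper element (assumption)
    for book in root.iter("book"):        # book content plays the role of $p
        if book.findtext("publisher/name") != "Addison-Wesley":
            continue
        for title in book.findall("title"):
            result = ET.SubElement(out, "result")
            ET.SubElement(result, "titre").text = title.text
            # Inner query: WHERE <author> $a </> IN $p
            for author in book.findall("author"):
                ET.SubElement(result, "auteur").text = author.text
    return out

grouped = group_authors(ET.fromstring(BIB))
print(ET.tostring(grouped, encoding="unicode"))
```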
Joining Elements by Value
WHERE <article>
        <author>
          <firstname> $f </> <lastname> $l </>
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml",
      <book year=$y>
        <author>
          <firstname> $f </> <lastname> $l </>
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT $e
Find all articles whose writers also published a book after 1995.
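The shared variables `$f` and `$l` make this a join on author name. One way to sketch it in Python is as a hash join: collect the authors of recent books, then keep the articles that share one. The sample data and the `id` attributes are hypothetical, and unlike XML-QL the sketch returns each article once rather than once per binding.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <article id="a1">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </article>
  <article id="a2">
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </article>
  <book year="1998">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </book>
  <book year="1990">
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </book>
</bib>
"""

def author_names(elem):
    """Set of (firstname, lastname) pairs for the authors of elem."""
    return {(a.findtext("firstname"), a.findtext("lastname"))
            for a in elem.findall("author")}

def articles_by_recent_book_authors(root, after=1995):
    recent = set()
    for book in root.iter("book"):
        year = book.get("year")
        if year is not None and int(year) > after:   # $y > 1995
            recent |= author_names(book)
    # Join: an article qualifies if it shares an author with a recent book.
    return [art for art in root.iter("article") if author_names(art) & recent]

matches = articles_by_recent_book_authors(ET.fromstring(BIB))
print([art.get("id") for art in matches])
```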
Tag Variables
WHERE <article>
        <author>
          <firstname> $f </> <lastname> $l </>
        </>
      </> ELEMENT_AS $e IN "www.a.b.c/bib.xml",
      <$t year=$y>
        <author>
          <firstname> $f </> <lastname> $l </>
        </>
      </> IN "www.a.b.c/bib.xml",
      $y > 1995
CONSTRUCT $e
Find all articles whose writers have done something after 1995.
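With a tag variable, the second pattern no longer names `book`: any element with a `year` attribute qualifies. In a Python sketch that means iterating over every element instead of only the books. The data, including the `thesis` element and the `id` attributes, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <article id="a1">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </article>
  <article id="a2">
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </article>
  <book year="1998">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </book>
  <thesis year="1999">
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </thesis>
</bib>
"""

def author_names(elem):
    return {(a.findtext("firstname"), a.findtext("lastname"))
            for a in elem.findall("author")}

def articles_by_recently_active_authors(root, after=1995):
    recent = set()
    for elem in root.iter():              # $t ranges over every tag
        year = elem.get("year")
        if year is not None and int(year) > after:
            recent |= author_names(elem)
    return [art for art in root.iter("article") if author_names(art) & recent]

active = articles_by_recently_active_authors(ET.fromstring(BIB))
print([art.get("id") for art in active])
```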
Regular Path Expressions
WHERE <part*>
        <name> $r </>
        <brand>Ford</>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result> $r </>
Find all parts whose brand is Ford, no matter what level they are in the hierarchy.
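`part*` stands for a chain of zero or more `part` edges, which can be sketched as a recursive walk that only descends through `part` children. The `catalog` document below is invented, and starting the walk at the document's root element is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts hierarchy; only the nesting matters.
PARTS = """
<catalog>
  <part>
    <name>chassis</name><brand>Ford</brand>
    <part>
      <name>axle</name><brand>Other</brand>
      <part><name>bearing</name><brand>Ford</brand></part>
    </part>
  </part>
  <part><name>radio</name><brand>Other</brand></part>
</catalog>
"""

def ford_parts(node):
    """Yield $r for every node reachable via part* whose brand is Ford."""
    if node.findtext("brand") == "Ford":
        for name in node.findall("name"):
            yield name.text
    for child in node.findall("part"):    # follow one more part edge
        yield from ford_parts(child)

names = list(ford_parts(ET.fromstring(PARTS)))
print(names)
```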
Regular Path Expressions
WHERE
<part+.(subpart|component.piece)>$r</>
IN "www.a.b.c/parts.xml"
CONSTRUCT
<result> $r </>
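A regular path expression can be evaluated by spelling out each node's tag path and matching it against an ordinary regex. In the Python sketch below, `part+.(subpart|component.piece)` becomes a regex over "/"-terminated tag names. The sample document is invented, and taking paths relative to the document's root element is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# part+.(subpart|component.piece), written over "/"-terminated tag names.
PATH = re.compile(r"(?:part/)+(?:subpart/|component/piece/)")

# Hypothetical stand-in for www.a.b.c/parts.xml.
PARTS = """
<parts>
  <part>
    <subpart>s1</subpart>
    <component><piece>p1</piece><other>x</other></component>
    <part>
      <subpart>s2</subpart>
    </part>
  </part>
  <subpart>not under a part</subpart>
</parts>
"""

def match_paths(node, prefix=""):
    """Yield every descendant whose tag path from `node` matches PATH."""
    for child in node:
        path = prefix + child.tag + "/"
        if PATH.fullmatch(path):
            yield child
        yield from match_paths(child, path)

matched = [elem.text for elem in match_paths(ET.fromstring(PARTS))]
print(matched)
```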
XML Data Integration
WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn </>
      </> IN "www.a.b.c/data.xml",
      <taxpayer>
        <ssn> $ssn </>
        <income></> ELEMENT_AS $I
      </> IN "www.irs.gov/taxpayers.xml"
CONSTRUCT <result> $n $I </>
A query can access more than one XML document.
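The integration query is a join of two documents on `$ssn`. A Python sketch, with two inline strings standing in for the two sources; all values are invented, and `results` is an assumed wrapper element.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for data.xml and taxpayers.xml.
PEOPLE = """
<people>
  <person><name>Ann Lee</name><ssn>000-00-0001</ssn></person>
  <person><name>Bob Ray</name><ssn>000-00-0002</ssn></person>
</people>
"""
TAXPAYERS = """
<taxpayers>
  <taxpayer><ssn>000-00-0001</ssn><income>50000</income></taxpayer>
  <taxpayer><ssn>000-00-0003</ssn><income>70000</income></taxpayer>
</taxpayers>
"""

def integrate(people_root, tax_root):
    # Build side of the join: ssn -> list of <income> elements ($I).
    income_by_ssn = {}
    for taxpayer in tax_root.iter("taxpayer"):
        incomes = income_by_ssn.setdefault(taxpayer.findtext("ssn"), [])
        incomes.extend(taxpayer.findall("income"))

    out = ET.Element("results")           # wrapper element (assumption)
    for person in people_root.iter("person"):
        for income in income_by_ssn.get(person.findtext("ssn"), []):
            for name in person.findall("name"):      # $n
                result = ET.SubElement(out, "result")
                result.append(copy.deepcopy(name))   # copy, do not move
                result.append(copy.deepcopy(income))
    return out

merged = integrate(ET.fromstring(PEOPLE), ET.fromstring(TAXPAYERS))
print(ET.tostring(merged, encoding="unicode"))
```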
Query Processing For XML
• Approach 1: store XML in a relational database. Translate an XML-QL query into a set of SQL queries.
  – Leverage 20 years of research & development.
• Approach 2: store XML in an object-oriented database system.
  – OO model is closest to XML, but systems do not perform well and are not well accepted.
• Approach 3: build an entire DBMS tailored to XML.
  – Still in the research phase.
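As an illustration of Approach 1, the Python sketch below shreds a document into two relational tables and answers the Addison-Wesley query with a single SQL statement. The slides do not say which relational mapping is meant; the edge-table layout, the table and column names, and the sample data are all assumptions, and `sqlite3` stands in for a full relational DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical stand-in for www.a.b.c/bib.xml.
BIB = """
<bib>
  <book>
    <publisher><name>Addison-Wesley</name></publisher>
    <title>Title One</title>
    <author>Author A</author>
    <author>Author B</author>
  </book>
  <book>
    <publisher><name>Other Press</name></publisher>
    <title>Title Two</title>
    <author>Author C</author>
  </book>
</bib>
"""

def shred(root, conn):
    """Store the tree as edge(parent, label, child) and node_text(node, content)."""
    conn.execute("CREATE TABLE edge (parent INTEGER, label TEXT, child INTEGER)")
    conn.execute("CREATE TABLE node_text (node INTEGER, content TEXT)")
    ids = count(1)                        # node 0 is reserved for the root

    def visit(elem, node):
        text = (elem.text or "").strip()
        if text:
            conn.execute("INSERT INTO node_text VALUES (?, ?)", (node, text))
        for child in elem:
            child_node = next(ids)
            conn.execute("INSERT INTO edge VALUES (?, ?, ?)",
                         (node, child.tag, child_node))
            visit(child, child_node)

    visit(root, 0)

# SQL counterpart of: WHERE <book><publisher><name>Addison-Wesley</></>
#                     <author> $a </></> CONSTRUCT $a
SQL = """
SELECT av.content
FROM edge AS bk
JOIN edge AS pub      ON pub.parent = bk.child  AND pub.label = 'publisher'
JOIN edge AS nm       ON nm.parent  = pub.child AND nm.label  = 'name'
JOIN node_text AS nv  ON nv.node    = nm.child  AND nv.content = 'Addison-Wesley'
JOIN edge AS au       ON au.parent  = bk.child  AND au.label  = 'author'
JOIN node_text AS av  ON av.node    = au.child
WHERE bk.label = 'book'
ORDER BY au.child
"""

conn = sqlite3.connect(":memory:")
shred(ET.fromstring(BIB), conn)
sql_authors = [row[0] for row in conn.execute(SQL)]
print(sql_authors)
```

Every step of the element pattern turns into one self-join on the edge table, which is why efficient processing is listed as an open question for this approach.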