Introduction to XQuery
Bob DuCharme, www.snee.com/bob, [email protected]
These slides: www.snee.com/xml

Post on 29-Mar-2015

TRANSCRIPT

  • Slide 1

Introduction to XQuery
Bob DuCharme
www.snee.com/bob
[email protected]
These slides: www.snee.com/xml

Slide 2
What is XQuery?

"A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources."
XQuery 1.0: An XML Query Language, W3C Working Draft

Slide 3
History

  • February 1998: XML (Rec)
  • November 1999: XSLT 1.0, XPath 1.0 (Recs)
  • (as of 8 June 2005): XPath 2.0, XSLT 2.0, XQuery 1.0 in Last Call Working Draft status

Steps for a W3C standard:

  • Working Draft
  • Last Call Working Draft
  • Candidate Recommendation
  • Proposed Recommendation
  • Recommendation

Slide 4
input1.xml sample document

    This is a sample file.
    This line really has an inline element.
    This line doesn't.
    Do you like inline elements?

Slide 5
Our first query

Querying from the command line:

    java net.sf.saxon.Query " {doc('input1.xml')//p[emph]} "

Result:

    This line really has an inline element.
    Do you like inline elements?

Slide 6
Query stored in a file

xq1.xqy:

    (: Here is an XQuery comment. :)
    doc('data1.xml')//p[emph]

Executing it:

    java net.sf.saxon.Query xq1.xqy

Slide 7
Simplifying the command line

Linux shell script xquery:

    java net.sf.saxon.Query $1 $2 $3 $4 $5 $6

Windows batch file xquery.bat:

    java net.sf.saxon.Query %1 %2 %3 %4 %5 %6

(assuming saxon8.jar is in classpath)

Executing either:

    xquery xq1.xqy

Slide 8
Data for more serious examples

  • RecipeML: DTD and documentation
    http://www.formatdata.com/recipeml
  • Squirrel's RecipeML Archive
    http://dsquirrel.tripod.com/recipeml/indexrecipes2.html
  • My sample: 294 files

Slide 9
RecipeML: typical structure

    Walnut Vinaigrette Dressings 1 1 cup Canned No Salt Chicken Bring chicken broth to a boil.
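
The query from slides 5 and 6 can also be tried without any external file. What follows is a minimal sketch, not part of the original slides. The markup of the sample document did not survive in this transcript, so the doc, p, and emph element names are assumptions inferred from the //p[emph] path, the placement of emph around the word "inline" is a guess, and the results wrapper element is hypothetical.

```xquery
(: Sketch only: element names and emph placement are assumed,
   not taken from the real input1.xml. :)
let $sample :=
  <doc>
    <p>This is a sample file.</p>
    <p>This line really has an <emph>inline</emph> element.</p>
    <p>This line doesn't.</p>
    <p>Do you like <emph>inline</emph> elements?</p>
  </doc>
(: Keep only the p elements that have an emph child. :)
return <results>{ $sample//p[emph] }</results>
```

Saved under a name such as sketch1.xqy (a hypothetical file name), it should run the same way as xq1.xqy on slide 6: java net.sf.saxon.Query sketch1.xqy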
Slide 10
Saxon and collection() function

Argument to function names document in this format:

Slide 11
Looking for some sugar

    collection('recipeml/docs.xml')/recipeml/recipe/head/title
      [//ingredients/ing/item[contains(.,'sugar')]]

Slide 12
A more SQL-like approach

    for $ingredient in collection('recipeml/docs.xml')//
        ingredients/ing/item[contains(.,'sugar')]
    return $ingredient/../../../head/title

Slide 13
Outputting well-formed XML

    {
    let $target := 'sugar'
    for $ingredient in collection('recipeml/docs.xml')//
        ingredients/ing/item[contains(., $target)]
    return $ingredient/../../../head/title
    }

Slide 14
FLWOR expressions

  • for
  • let
  • where
  • order by
  • return

"a FLWOR expression... supports iteration and binding of variables to intermediate results. This kind of expression is often useful for computing joins between two or more documents and for restructuring data."

Slide 15
Extracting subsets: XPath vs. FLWOR approach

Get the title element for each recipe whose yield is greater than 20:

    collection('recipeml/docs.xml')/recipeml/recipe/head/title[../yield > 20]

Go through all the documents in the collection, and for any with a yield of more than 20, get the title:

    for $doc in collection('recipeml/docs.xml')/recipeml
    where $doc/recipe/head/yield > 20
    return $doc/recipe/head/title

Slide 16
Doing more with the for clause variable

    (: Create an HTML page linking to recipes that serve
       more than 20 people. :)
    Food for a Crowd
    Food for a Crowd
    {
    for $doc in collection('recipeml/docs.xml')
    where $doc/recipeml/recipe/head/yield > 20
    return
      { $doc/recipeml/recipe/head/title/text() }
    }

Slide 17
Calling functions from a let clause

    (: Which recipe(s) serves the most people? :)
    let $maxYield :=
      max(collection('recipeml/docs.xml')/recipeml/recipe/head/yield)
    return collection('recipeml/docs.xml')/recipeml/
      recipe[head/yield = $maxYield]

Slide 18
distinct-values and order by

    (: A unique, sorted list of all unique ingredients in the recipe
       collection, with URLs to link to the recipes. :)
    {
    for $ingr in distinct-values(
        collection('recipeml/docs.xml')/recipeml/recipe/ingredients/ing/item)
    order by $ingr
    return
      {
      for $doc in collection('recipeml/docs.xml')
      where $doc/recipeml/recipe/ingredients/ing/item = $ingr

Slide 19
distinct-values and order by, continued

      return {$doc/recipeml/recipe/head/title/text()}
      }
    }

Slide 20
Excerpt from output

    "Best Ever" Pizza Sauce
    "Blondie" Brownies
    Walnut Pound Cake
    "Faux" Sourdough
    "Indian Chili"
    "Best" Apple Nut Pudding
    "Gold Room" Scones
    "Outrageous" Chocolate-Oatmeal Chipper (Cooki
    "First" Ginger Molasses Cookies
    "Foot in the Fire" Chocolate Cake
    "Frank's Place" Crawfish Etouff'ee
    "Hamburger" / Ground Meat Balti
    "Indian Chili"

Slide 21
RecipeML: varying markup richness

One way to do it:

    (12-oz) tomato paste

Another way:

    12 oz tomato paste

Slide 22
Normalizing data with declared functions

    (: A unique, sorted list of all unique ingredients in the recipe
       collection, with URLs to link to them. Ingredient names get
       normalized by functions declared in the query prolog. :)

    declare namespace sn = "http://www.snee.com/ns/misc/";

    declare function sn:normIngName($ingName) as xs:string {
      (: Normalize ingredient name. :)

      (: remove parenthesized expression that may begin string,
         e.g. in "(10 ozs) Rotel diced tomatoes" :)
      let $normedName := replace($ingName,"^\(.*?\)\s*","")

      (: convert to all lower-case :)
      let $normedName := lower-case($normedName)

      (: replace multiple spaces with a single one :)
      let $normedName := normalize-space($normedName)

      return $normedName
    };

Slide 23
Normalizing data with functions, part 2 of 3

    declare function sn:normIngList($ingList) as item()* {
      (: Normalize a list of ingredient names. :)
      for $ingName in $ingList
      return sn:normIngName($ingName)
    };

    {
    let $normIngNames :=
      sn:normIngList(collection('recipeml/docs.xml')//ing/item)

Slide 24
Normalizing data with functions, part 3 of 3

    for $ingr in distinct-values($normIngNames)
    order by $ingr
    return
      {
      for $doc in collection('recipeml/docs.xml'),
          $i in $doc/recipeml/recipe/ingredients/ing/item
      where sn:normIngName($i) = $ingr
      return {$doc/recipeml/recipe/head/title/text()}
      }
    }

Slide 25
Specs at http://www.w3.org/tr

  • XQuery 1.0: An XML Query Language
  • XQuery 1.0 and XPath 2.0 Formal Semantics
  • The XQuery 1.0 and XPath 2.0 Data Model
  • XSLT 2.0 and XQuery 1.0 Serialization
  • XQuery 1.0 and XPath 2.0 Functions and Operators
  • XML Query Use Cases

Slide 26
Other resources

  • eXist: http://www.exist-db.org
  • MarkLogic: http://www.marklogic.com
  • Mike Kay, Comparing XSLT and XQuery:
    http://idealliance.org/proceedings/xtech05/papers/02-03-01/
  • http://www.w3.org/TR:
    XQuery Update Requirements
    XQuery 1.0 and XPath 2.0 Full-Text
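
The ingredient-name normalization described on slides 22 through 24 can be exercised on its own, without the recipe collection. The sketch below is an illustration under stated assumptions rather than the author's query: it reuses the sn namespace and the normIngName function name from the slides, but it folds the three steps into one nested expression, adds an xs:string parameter type, and wraps each result in a hypothetical ingredient element. The first and third input strings come from the slides; the second is made up to show case and whitespace folding.

```xquery
(: Sketch only: the input strings and the ingredient wrapper element
   are illustrative assumptions, not part of the recipe data. :)
declare namespace sn = "http://www.snee.com/ns/misc/";

(: Same three steps as slide 22: strip a leading parenthesized
   expression, lower-case, then collapse runs of whitespace. :)
declare function sn:normIngName($ingName as xs:string) as xs:string {
  normalize-space(lower-case(replace($ingName, "^\(.*?\)\s*", "")))
};

let $names := ("(10 ozs) Rotel diced tomatoes",
               "Rotel   Diced Tomatoes",
               "(12-oz) tomato paste")
(: Normalize every name, drop duplicates, and sort what is left. :)
for $ingr in distinct-values(for $n in $names return sn:normIngName($n))
order by $ingr
return <ingredient>{ $ingr }</ingredient>
```

The first two strings should normalize to the same name, so distinct-values is expected to collapse them into a single entry. Typing the parameter as xs:string suits this string-only test; the untyped parameter on slide 22 is what lets the original function accept item elements directly.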