
Indexing Semistructured Data J. McHugh, J. Widom, S. Abiteboul, Q. Luo, and A. Rajaraman Stanford University January 1998 http://www-db.stanford.edu/lore/ EECS 684 02/21/2000 Presented by Weiming Zhou


Page 1

Indexing Semistructured Data

J. McHugh, J. Widom, S. Abiteboul,

Q. Luo, and A. Rajaraman

Stanford University

January 1998

http://www-db.stanford.edu/lore/

EECS 684 02/21/2000 Presented by Weiming Zhou

Page 2

Outline

• Introduction

- Data Model

- Query Language

• Indexes in Lore

• Query plans using indexes

• Conclusions

Page 3

Data Model - Object Exchange Model (OEM)

Page 4

The Lorel Query Language (Lorel)

Example 1

select DB.Movie.Title
where DB.Movie.Actor.Name = “Harrison Ford”

Example 2

select T
from DB.Movie M, M.Title T
where exists A in M.Actor : exists N in A.Name : N = “Harrison Ford”

Page 5

Indexes In Lore

• Value index

• Text index

• Link index

• Path index

• Edge index

Page 6

Value index

Similar to attribute indexes in a relational DBMS.

Example

Suppose we create a Value index for DB.Movie.Year.

A lookup for DB.Movie.Year = “1956” then returns object &12.
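
The lookup above can be sketched in Python as a map from (incoming label, value) to object ids. This is only an illustration under stated assumptions: the dictionary layout and the names `build_value_index` and `value_lookup` are hypothetical, and the value stored for &7 is a made-up placeholder. Only &12 = “1956” comes from the slide.

```python
from collections import defaultdict

# (object id, incoming label, atomic value). Only &12 = "1956" is taken from
# the slide; the value of &7 is a made-up placeholder.
ATOMIC_OBJECTS = [
    ("&7", "Year", "1977"),
    ("&12", "Year", "1956"),
]


def build_value_index(atomic_objects):
    """Map (incoming label, value) to the list of matching object ids."""
    index = defaultdict(list)
    for oid, label, value in atomic_objects:
        index[(label, value)].append(oid)
    return index


def value_lookup(index, label, value):
    """Equality lookup; returns an empty list when nothing matches."""
    return index.get((label, value), [])


value_index = build_value_index(ATOMIC_OBJECTS)
print(value_lookup(value_index, "Year", "1956"))  # ['&12']
```

A hash map like this supports only equality tests; an ordered structure would be needed for range comparisons such as Year < 1960.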

Page 7

Text Index

• An information-retrieval style keyword search.

• Restricted by incoming labels.

• Locates string values containing specific words.

• Useful for strings containing a significant amount of text.

Implementation: Inverted lists that map a given word w and label l to a list of atomic values with incoming edge l that contain word w.

Example: Look up all objects with an atomic string value containing the word “Ford” and an incoming edge Name.

Results: {<&17, 2>, <&21, 2>}.
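
A small Python sketch of such inverted lists follows. It assumes that the second component of each result pair is the 1-based position of the word within the string, which matches the slide's results but is an interpretation, and that matching is case-insensitive. The names `build_text_index` and `text_lookup` are hypothetical.

```python
from collections import defaultdict

# (object id, incoming label, string value), following the slide's example.
STRINGS = [
    ("&17", "Name", "Harrison Ford"),
    ("&21", "Name", "Harrison Ford"),
]


def build_text_index(strings):
    """Inverted lists: (word, label) -> [(object id, word position), ...]."""
    index = defaultdict(list)
    for oid, label, text in strings:
        # Assumption: positions are 1-based word offsets within the string.
        for position, word in enumerate(text.split(), start=1):
            index[(word.lower(), label)].append((oid, position))
    return index


def text_lookup(index, word, label):
    """Return the inverted list for a word restricted to one incoming label."""
    return index.get((word.lower(), label), [])


text_index = build_text_index(STRINGS)
print(text_lookup(text_index, "Ford", "Name"))  # [('&17', 2), ('&21', 2)]
```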

Page 8

Link Index

• Locates parents of a given object.

• Serves as back-pointers.

Implementation

• Extendible hashing

• One Link Index for the entire database graph

Example: The Link Index lookup for object &17 returns parent object &6, and the lookup for object &21 returns parent object &13.
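
As a rough Python illustration, a single child-to-parents map over the whole edge set plays the role of the Link Index. A built-in `dict` stands in for extendible hashing here, and the names `build_link_index` and `parents_of` are hypothetical. The Name label on the two edges is inferred from the Text Index example.

```python
from collections import defaultdict

# (parent, label, child) edges for the two objects mentioned on the slide.
EDGES = [
    ("&6", "Name", "&17"),
    ("&13", "Name", "&21"),
]


def build_link_index(edges):
    """One index for the whole graph: child -> [(label, parent), ...]."""
    index = defaultdict(list)
    for parent, label, child in edges:
        index[child].append((label, parent))
    return index


def parents_of(index, child):
    """Return the parent object ids of `child`, ignoring edge labels."""
    return [parent for _, parent in index.get(child, [])]


link_index = build_link_index(EDGES)
print(parents_of(link_index, "&17"))  # ['&6']
print(parents_of(link_index, "&21"))  # ['&13']
```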

Page 9

Path Index

Locates all objects reachable by a given labeled path.

Provided by the DataGuide.

Example

select DB.Movie.Title

Use the Path Index to directly locate all objects reachable via DB.Movie.Title.

Results: &5; &9; &14.
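
The idea can be sketched in Python as a precomputed map from label paths to the objects they reach. This is not a DataGuide implementation, only a simplified stand-in that assumes an acyclic graph. The root id &1 and the name `build_path_index` are assumptions; the movie and title ids follow the slides.

```python
from collections import defaultdict

# (parent, label, child) edges. The root id &1 is an assumption.
EDGES = [
    ("&1", "Movie", "&2"), ("&1", "Movie", "&3"), ("&1", "Movie", "&4"),
    ("&2", "Title", "&5"), ("&3", "Title", "&9"), ("&4", "Title", "&14"),
]


def build_path_index(edges, root, root_name):
    """Map each label path from the root to the objects it reaches.

    Assumes the graph is acyclic; a cyclic graph would need a visited set
    or a bound on path length.
    """
    children = defaultdict(list)
    for parent, label, child in edges:
        children[parent].append((label, child))

    index = defaultdict(list)
    stack = [(root_name, root)]
    while stack:
        path, oid = stack.pop()
        index[path].append(oid)
        for label, child in children[oid]:
            stack.append((f"{path}.{label}", child))
    return index


path_index = build_path_index(EDGES, root="&1", root_name="DB")
print(set(path_index["DB.Movie.Title"]) == {"&5", "&9", "&14"})  # True
```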

Page 10

Edge Index

Locates all parent-child pairs connected via a specified label.

Example

Look up label “Year” in Edge Index

Results: &2-&7, &3-&12
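
A minimal Python sketch of this lookup groups edges by label. The name `build_edge_index` is hypothetical, and the Title edge is included only for contrast (it is inferred from the query-plan slides, not stated on this one).

```python
from collections import defaultdict

# (parent, label, child) edges; the two Year edges come from the slide.
EDGES = [
    ("&2", "Year", "&7"),
    ("&3", "Year", "&12"),
    ("&2", "Title", "&5"),
]


def build_edge_index(edges):
    """Map each label to the (parent, child) pairs it connects."""
    index = defaultdict(list)
    for parent, label, child in edges:
        index[label].append((parent, child))
    return index


edge_index = build_edge_index(EDGES)
print(edge_index["Year"])  # [('&2', '&7'), ('&3', '&12')]
```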

Page 11

Query Plans Using Indexes

• Top-Down

• Bottom-Up

• Hybrid

Example

select T
from DB.Movie M, M.Title T
where exists A in M.Actor : exists N in A.Name : N = “Harrison Ford”

Page 12

Top-Down Query Plan

Exhaustive top-down traversals

DB.Movie.Actor.Name = “Harrison Ford” → &17, &21

Link Index: &17 → &2, &21 → &4

DB.Movie.Title → &5, &14
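
A top-down evaluation of the example query can be sketched in Python as nested forward traversals from the root, with no index use. The edge list below is assembled from the object ids that appear across the slides; the root id &1 and the function names are assumptions.

```python
from collections import defaultdict

# (parent, label, child) edges pieced together from the slides' examples.
EDGES = [
    ("&1", "Movie", "&2"), ("&1", "Movie", "&3"), ("&1", "Movie", "&4"),
    ("&2", "Title", "&5"), ("&2", "Actor", "&6"), ("&2", "Year", "&7"),
    ("&3", "Title", "&9"), ("&3", "Year", "&12"),
    ("&4", "Title", "&14"), ("&4", "Actor", "&13"),
    ("&6", "Name", "&17"), ("&13", "Name", "&21"),
]
VALUES = {"&17": "Harrison Ford", "&21": "Harrison Ford"}


def children_by_label(edges):
    """Forward adjacency: (parent, label) -> [child, ...]."""
    index = defaultdict(list)
    for parent, label, child in edges:
        index[(parent, label)].append(child)
    return index


def top_down(edges, values, root):
    """Visit every Movie, test its Actor.Name values, collect Titles."""
    kids = children_by_label(edges)
    titles = []
    for movie in kids[(root, "Movie")]:
        has_ford = any(
            values.get(name) == "Harrison Ford"
            for actor in kids[(movie, "Actor")]
            for name in kids[(actor, "Name")]
        )
        if has_ford:
            titles.extend(kids[(movie, "Title")])
    return titles


print(top_down(EDGES, VALUES, "&1"))  # ['&5', '&14']
```

Every movie is visited regardless of whether it can match, which is the cost the index-based plans try to avoid.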

Page 13

Bottom-Up Query Plan

Look up Value Index: DB.Movie.Actor.Name = “Harrison Ford” → &17, &21

Link Index (Name to Actor to Movie, via parents &6 and &13): &17 → &2, &21 → &4

DB.Movie.Title → &5, &14
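
The bottom-up plan can be sketched in Python as a value lookup followed by backward steps through a child-to-parent map. The index layouts, the root id &1, and the names `build_indexes`, `climb`, and `bottom_up` are assumptions for illustration; the object ids follow the slides.

```python
from collections import defaultdict

# (parent, label, child) edges pieced together from the slides' examples.
EDGES = [
    ("&1", "Movie", "&2"), ("&1", "Movie", "&3"), ("&1", "Movie", "&4"),
    ("&2", "Title", "&5"), ("&2", "Actor", "&6"),
    ("&3", "Title", "&9"),
    ("&4", "Title", "&14"), ("&4", "Actor", "&13"),
    ("&6", "Name", "&17"), ("&13", "Name", "&21"),
]
VALUES = {"&17": "Harrison Ford", "&21": "Harrison Ford"}


def build_indexes(edges, values):
    value_index = defaultdict(list)  # (incoming label, value) -> objects
    link_index = defaultdict(list)   # child -> [(label, parent), ...]
    for parent, label, child in edges:
        link_index[child].append((label, parent))
        if child in values:
            value_index[(label, values[child])].append(child)
    return value_index, link_index


def climb(link_index, objects, label):
    """Follow `label` edges backwards, keeping each parent once, in order."""
    parents = [p for obj in objects for (l, p) in link_index[obj] if l == label]
    return list(dict.fromkeys(parents))


def bottom_up(edges, values, root):
    value_index, link_index = build_indexes(edges, values)
    names = value_index[("Name", "Harrison Ford")]   # &17, &21
    actors = climb(link_index, names, "Name")        # &6, &13
    movies = climb(link_index, actors, "Actor")      # &2, &4
    # Keep only movies that hang off the root via a Movie edge.
    movies = [m for m in movies if ("Movie", root) in link_index[m]]
    # Final forward step to the Title children (a scan stands in for it here).
    return [c for m in movies for (p, l, c) in edges if p == m and l == "Title"]


print(bottom_up(EDGES, VALUES, "&1"))  # ['&5', '&14']
```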

Page 14

Hybrid Query Plan

select X
from A.B X
where exists Y in X.C : Y = 5

Bottom-up: Value Index A.B.C = “5”

Top-down: A.B

Intersect
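
A toy Python sketch of the hybrid idea follows: compute candidates for X from both directions and intersect them. All object ids and values below are invented for illustration, and the two set comprehensions merely stand in for the Value/Link Index lookups and the top-down traversal.

```python
# Hypothetical graph: (parent, label, child). None of these ids are from the slides.
EDGES = [
    ("a", "B", "x1"), ("a", "B", "x2"),
    ("x1", "C", "y1"), ("x2", "C", "y2"),
    ("z", "C", "y3"),  # z has a matching C child but is not reachable via A.B
]
VALUES = {"y1": 5, "y2": 7, "y3": 5}


def hybrid(edges, values, root):
    # Bottom-up half: objects with value 5 under a C edge, then one step up
    # to their parents (stands in for a Value Index plus Link Index lookup).
    from_below = {p for (p, label, c) in edges if label == "C" and values.get(c) == 5}
    # Top-down half: objects reachable from the root via a B edge.
    from_above = {c for (p, label, c) in edges if p == root and label == "B"}
    # Only objects found from both directions satisfy the query.
    return from_below & from_above


print(hybrid(EDGES, VALUES, "a"))  # {'x1'}
```

Here the bottom-up half yields x1 and z, the top-down half yields x1 and x2, and the intersection leaves x1.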

Page 15

Conclusions

• Presents Lore’s indexing structures: Value Index, Text Index, Link Index, Path Index, and Edge Index.

• Query plans using indexes

• Preliminary performance results: at least an order of magnitude improvement when indexes are used for query processing.