Web-Base Management Systems, Aaron Brown and David Oppenheimer, CS294-7, February 11, 1999

Slide 1

Web-Base Management Systems

Aaron Brown and David Oppenheimer

CS294-7

February 11, 1999

Slide 2

Introduction

• Online data is stored in both databases (relational) and web sites (hypertext)

• Need single framework to manage both types of data and present integrated views

• Solution: Web Base Management Systems (WBMSs)

– 2 challenges:

  1) querying and extracting structure from semi-structured web data, transforming it, and presenting custom views

  2) mapping structured database data to the web (adding navigational access paths, redundancy, ...)

– To address these challenges, we need a data model that maps between relational and hypertextual models

Slide 3

ARANEUS Data Models

[Figure: the Relational, ADM, and HTML data models, with the labels “Navigational access” and “Structure”]

Slide 4

ARANEUS Data Model

• ADM = Logical data model for web hypertexts

  – Based on page schemes and navigational access paths

  – Page scheme = logical structure shared by a set of pages

    » Like a “class”

  – Web page = instance of page scheme

    » Like an “object” with identifier (URL) + attributes
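
The class/object analogy above can be sketched in Python. This is only an illustration, not part of ARANEUS: the names PageScheme and WebPage, the example URL, and the attribute names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageScheme:
    """Logical structure shared by a set of pages (the "class")."""
    name: str
    attributes: tuple[str, ...]


@dataclass
class WebPage:
    """One page: an instance of a page scheme, identified by its URL."""
    scheme: PageScheme
    url: str
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject attributes that the page scheme does not declare.
        unknown = set(self.values) - set(self.scheme.attributes)
        if unknown:
            raise ValueError(f"not in scheme {self.scheme.name}: {sorted(unknown)}")


# Hypothetical scheme and page; the URL and values are made up for illustration.
author_scheme = PageScheme("AuthorPage", ("Name", "WorkList"))
page = WebPage(author_scheme, "http://example.org/authors/da-vinci",
               {"Name": "Leonardo Da Vinci", "WorkList": []})
```

Every page built from author_scheme carries the same attribute names, which is what lets later tools treat a set of pages like rows of one type.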

Slide 5

ADM Example Fragment

Slide 6

Adding Structure to HTML

[Figure: the Relational, ADM, and HTML data models, with the labels “Navigational access” and “Structure”]

Slide 7

EDITOR: Structuring HTML

• EDITOR starts with an existing ADM scheme

  – Generated by inspection of web site

• EDITOR maps web page text to attributes of an ADM page scheme

  – “Wrapping” a web page

  – Imposes structure on web pages

• EDITOR uses a procedural language to guide the wrapping process

  – Each page seen as object with extraction methods

    » One method for each attribute of page

    » Method accesses page’s HTML source, extracts value of corresponding attribute
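
A wrapper of this shape, one extraction method per attribute, might look like the sketch below. It uses Python regular expressions as a stand-in, not EDITOR's own procedural language. The class name, the sample HTML, and the assumed page layout (name in an h1, one li per work) are all hypothetical.

```python
import re


class AuthorPageWrapper:
    """Wraps one page's HTML source; one extraction method per attribute."""

    def __init__(self, html: str):
        self.html = html

    def name(self) -> str:
        # Assumption: the author's name is the text of the first <h1>.
        match = re.search(r"<h1>(.*?)</h1>", self.html, re.S)
        return match.group(1).strip() if match else ""

    def work_list(self) -> list[str]:
        # Assumption: each publication is one <li> element.
        return [item.strip() for item in re.findall(r"<li>(.*?)</li>", self.html, re.S)]


# Made-up page source; the paper titles are placeholders.
sample_html = """
<html><body>
<h1>Leonardo Da Vinci</h1>
<ul><li>Paper A. VLDB 1996</li><li>Paper B. SIGMOD 1997</li></ul>
</body></html>
"""

wrapper = AuthorPageWrapper(sample_html)
```

Calling wrapper.name() and wrapper.work_list() then yields the attribute values of the page, which is the structure the wrapping step imposes on otherwise untyped HTML.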

Slide 8

Querying ADM-Structured Hypertext

[Figure: the Relational, ADM, and HTML data models, with the labels “Navigational access” and “Structure”]

Slide 9

ULIXES: A Navigational Query Lang.

• Language for defining relational views over hypertext that follows an ADM scheme

– Based on navigational expressions (path expressions)

• DEFINE TABLE statement creates relational views based on page schemes

  – local materialized view (tuples) or

  – virtual view

    » user can then pose SQL queries across multiple views

    » optimizer chooses optimal navigation path through site to satisfy query

      • fetches hypertext pages and extracts attributes via EDITOR wrappers

      • cost metric is number of HTML page fetches
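
The page-fetch cost metric can be illustrated with a toy plan chooser. This is a sketch under assumptions, not the ULIXES optimizer: the plan names, the ConferenceListPage and ConferencePage steps, and the fetch estimates are invented for the example.

```python
def estimated_fetches(plan):
    """Cost of a plan = total number of HTML pages it is expected to fetch."""
    return sum(pages for _step, pages in plan)


def choose_plan(plans):
    """Return the name of the plan with the fewest estimated page fetches."""
    return min(plans, key=lambda name: estimated_fetches(plans[name]))


# Hypothetical alternatives for reaching one author's publication list.
# Each step is (page scheme visited, estimated number of pages fetched).
plans = {
    "via_search_form": [("AuthorSearchPage", 1), ("AuthorPage", 1)],
    "via_conference_index": [
        ("ConferenceListPage", 1),
        ("ConferencePage", 25),
        ("AuthorPage", 1),
    ],
}

best = choose_plan(plans)
```

With these made-up estimates the form-based path wins because it touches two pages instead of twenty-seven.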

Slide 10

ULIXES Example

DEFINE TABLE VLDBPapers (Authors, Title, Reference)

AS AuthorSearchPage.NameForm.Submit -> AuthorPage.WorkList

IN DBLPScheme

USING AuthorPage.WorkList.Authors, AuthorPage.WorkList.Title, AuthorPage.WorkList.Reference

WHERE AuthorSearchPage.NameForm.Name = ‘Leonardo Da Vinci’

      AuthorPage.WorkList.Reference LIKE ‘%VLDB%’
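
To make the semantics of this view concrete, here is a small Python emulation over in-memory data. It is a sketch, not ULIXES. It assumes both WHERE conditions must hold together. AUTHOR_PAGES, submit_name_form, sql_like, and the sample works are hypothetical.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Minimal SQL LIKE: only the % wildcard is supported."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, re.S) is not None


# Hypothetical stand-in for the site: submitting the name form yields an author page.
AUTHOR_PAGES = {
    "Leonardo Da Vinci": {
        "WorkList": [
            {"Authors": "L. Da Vinci", "Title": "Paper A", "Reference": "VLDB 1996"},
            {"Authors": "L. Da Vinci", "Title": "Paper B", "Reference": "SIGMOD 1997"},
        ]
    }
}


def submit_name_form(name):
    """Follow AuthorSearchPage.NameForm.Submit -> AuthorPage (one simulated fetch)."""
    return AUTHOR_PAGES.get(name, {"WorkList": []})


def vldb_papers(name="Leonardo Da Vinci", pattern="%VLDB%"):
    """Tuples (Authors, Title, Reference) of the view, filtered on Reference."""
    page = submit_name_form(name)
    return [
        (w["Authors"], w["Title"], w["Reference"])
        for w in page["WorkList"]
        if sql_like(w["Reference"], pattern)
    ]
```

The navigational expression becomes a function call that follows the form to the author page, and the USING clause becomes the tuple that is projected from each WorkList item.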

Slide 11

Generating ADM from existing DB

[Figure: the Relational, ADM, and HTML data models, with the labels “Navigational access” and “Structure”]

Slide 12

The ARANEUS Design Methodology

Database Conceptual Design (ER)

Hypertext Conceptual Design (NCM)

Hypertext Logical Design (ADM)

DB Mapping (PENELOPE) + Page Design (HTML)

Database Logical Design (relational)

Web Site Generation

Slide 13

Database Conceptual Model

• Starting point for database design

• Conceptual description of a domain

• Represents essential properties of data abstractly

• Entity-Relationship Model

  – Based on entities and relationships among entities

  – Rectangles = entity sets

    » Associated attributes are connected with lines

  – Diamonds = relationship sets

    » Lines connect entity sets via relationship sets

Slide 14

ER Example

Slide 15

Hypertext Conceptual Design

• ER not suitable for modeling hypertext

  – no directed paths (links)

  – hypertext access paths not modeled (web page hierarchies)

  – no way to group related entities into a single “macroentity”

• Navigational Conceptual Model (NCM) describes these conceptual properties of hypertext

  – macroentities (groups of related ER entities) model hypertext nodes

    » associated with simple (atomic) or complex (structured) attributes, either mono- or multi-valued

  – directed relationships model links (may be bidirectional)

  – union nodes model link targets that can be of different types

  – aggregations model hierarchical access paths

Slide 16

Mapping ER to NCM: Example

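[Figure: ER Model. Entity sets Seminar (Title, Speaker, Date), Professor (Name, Phone), and Room (Room#); relationship sets SPlace and Responsible; cardinality labels 1:1, 1:1, 1:N, 1:N]

[Figure: NCM Model. Nodes Department, General, Education, Research, Seminar, People, and Professor; Seminar attributes Room#, Title, Speaker, Date; Professor attributes Name, Phone; relationship Responsible (1:1)]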

Slide 17

Mapping NCM to ADM

1) macroentity -> one or more pages

   single-valued attribute -> ADM simple attribute

   multi-valued attribute -> ADM list

2) directed relationship -> link to another page scheme

  – anchor = a descriptive key of target macroentity

  – reference = URL of target page scheme

3) aggregation node -> ADM “unique” page scheme

  – unique page scheme = page scheme with only one instance

4) long lists -> forms

  – list items retrieved through program running on server
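
Rules 1 to 3 can be sketched as a small Python mapping. This is an illustration under assumptions, not the ARANEUS tool: the dict layout, the "Page" and "To" naming, and the function names are hypothetical. It maps each macroentity to exactly one page scheme and leaves out rule 4 (forms).

```python
def map_macroentity(name, attributes, links=()):
    """Map one NCM macroentity to a dict describing an ADM page scheme.

    attributes: {attribute name: True if multi-valued, False if single-valued}
    links: names of macroentities targeted by directed relationships
    """
    scheme = {"page_scheme": name + "Page", "unique": False, "attributes": {}}
    for attr, multi_valued in attributes.items():
        # Rule 1: single-valued -> simple attribute, multi-valued -> list.
        scheme["attributes"][attr] = "LIST" if multi_valued else "TEXT"
    for target in links:
        # Rule 2: directed relationship -> link to the target page scheme.
        scheme["attributes"]["To" + target] = "LINK TO " + target + "Page"
    return scheme


def map_aggregation(name, members):
    """Rule 3: an aggregation node becomes a unique page scheme linking to its members."""
    scheme = map_macroentity(name, {}, links=members)
    scheme["unique"] = True
    return scheme


# Names taken from the seminar example; the mapping itself is only a sketch.
seminar = map_macroentity(
    "Seminar", {"Title": False, "Speaker": False, "Date": False}, links=["Professor"]
)
department = map_aggregation("Department", ["Seminar"])
```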

Slide 18

Mapping NCM to ADM: Example

Slide 19

The ARANEUS Design Methodology

Database Conceptual Design (ER)

Hypertext Conceptual Design (NCM)

Hypertext Logical Design (ADM)

DB Mapping (PENELOPE) + Page Design (HTML)

Database Logical Design (relational)

Web Site Generation

Slide 20

Generating web site from ADM + DB

[Figure: the Relational, ADM, and HTML data models, with the labels “Navigational access” and “Structure”]

Slide 21

Hypertext Views of DB Data

• Given a database and an ADM scheme for it

  – database may be local

    » derived from design methodology

    » uses derived ADM scheme

  – or composed from one or more remote sites

    » derived from integrated relational view produced by one or more ULIXES queries

    » uses new ADM scheme concocted to match integrated view

• PENELOPE language used to integrate ADM and DB in a generated hypertext

  – PENELOPE description = ADM augmented with URLs and references to database fields

Slide 22

PENELOPE Description

• Query: reorganize (Da Vinci’s VLDB) papers based on year

DEFINE PAGE YearPage

AS URL URL(<Year>);

   Year: TEXT <Year>;

   WorkList: LIST OF (Authors: TEXT <Authors>;

                      Title: TEXT <Title>;

                      Reference: TEXT <Reference>;

                      ToRefPage: LINK TO ConferencePage UNION JournalPage <ToRefPage>);

FROM DaVinciPapers

DEFINE PAGE DaVinciYearsPage UNIQUE

AS URL ‘result.html’;

   YearList: LIST OF (Year: TEXT <Year>;

                      ToYearPage: LINK TO YearPage (URL(<Year>)));

FROM DaVinciPapers
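
What these two page definitions ask for can be approximated in Python: one page per year plus a single index page linking to them. This is a sketch, not PENELOPE. The year_url naming, the HTML layout, and the sample rows are assumptions, and the ToRefPage link is left out.

```python
from collections import defaultdict
from html import escape


def year_url(year):
    """Hypothetical stand-in for URL(<Year>): one file name per year."""
    return f"year_{year}.html"


def generate_site(rows):
    """Return {url: html} for one YearPage per year plus the unique index page."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["Year"]].append(row)

    pages = {}
    for year, works in sorted(by_year.items()):
        items = "".join(
            f"<li>{escape(w['Authors'])}: {escape(w['Title'])} ({escape(w['Reference'])})</li>"
            for w in works
        )
        pages[year_url(year)] = f"<h1>{year}</h1><ul>{items}</ul>"

    # The unique page: a list of years, each linking to its YearPage.
    links = "".join(f'<li><a href="{year_url(y)}">{y}</a></li>' for y in sorted(by_year))
    pages["result.html"] = f"<ul>{links}</ul>"
    return pages


# Hypothetical rows standing in for the DaVinciPapers table.
rows = [
    {"Year": 1996, "Authors": "L. Da Vinci", "Title": "Paper A", "Reference": "VLDB 1996"},
    {"Year": 1997, "Authors": "L. Da Vinci", "Title": "Paper B", "Reference": "VLDB 1997"},
    {"Year": 1997, "Authors": "L. Da Vinci", "Title": "Paper C", "Reference": "VLDB 1997"},
]
site = generate_site(rows)
```

Grouping by Year mirrors how the YearPage definition yields one page instance per distinct value of the URL expression.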

Slide 23

Derived Hypertext View

Slide 24

Resulting Web Pages

Slide 25

Retrospective

• Exceptions during wrapping

  – Logically homogeneous pages may be physically heterogeneous

    » Different ways of laying out the same information

    » Errors masked by browsers

• ULIXES syntax is difficult for beginners

  – Alternatives

    » Fill out forms corresponding to pre-determined ULIXES queries

    » Developed POLYPHEMUS query interface

      • User selects path for query by clicking on graphical representation of ADM page schemes

• Push vs. Pull

  – Either supported; hybrid model preferred

  – Dealing with updates

    » each DB update generates a mixed transaction that updates both the DB and any pushed (static) HTML pages

• Managing internal sites

  – PENELOPE-generated HTML includes description of page scheme and tags attributes

    » Like XML but uses HTML comments
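
The "mixed transaction" idea under Push vs. Pull can be sketched with Python's sqlite3 module and a dict standing in for the pushed static pages. This is an assumption-laden illustration, not how ARANEUS implements it: the papers table, the page naming, and add_paper are hypothetical, and a real system would also have to restore the old page if the commit itself failed.

```python
import sqlite3


def render_year_page(conn, year):
    """Rebuild the static page for one year from the current database state."""
    titles = [t for (t,) in conn.execute(
        "SELECT title FROM papers WHERE year = ? ORDER BY title", (year,))]
    return "<ul>" + "".join(f"<li>{t}</li>" for t in titles) + "</ul>"


def add_paper(conn, static_pages, year, title):
    """Insert a row and refresh the affected pushed page; undo the insert on failure."""
    try:
        conn.execute("INSERT INTO papers (year, title) VALUES (?, ?)", (year, title))
        # The same connection sees its own uncommitted insert.
        new_html = render_year_page(conn, year)
        static_pages[f"year_{year}.html"] = new_html
        conn.commit()
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (year INTEGER, title TEXT)")
static_pages = {}  # stands in for the pushed HTML files
add_paper(conn, static_pages, 1997, "Paper B")
add_paper(conn, static_pages, 1997, "Paper A")
```

Pages that are pulled (generated on request) need no such step, which is one reason a hybrid of the two models is attractive.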

Slide 26

Conclusion

• ARANEUS provides database-like functionality for mixed web/relational DB data

• More to be filled in later...

[Figure: the Relational, ADM, and HTML data models]