
Reference Book
Principles of Distributed Database Systems

Chapters
Chapter 12: Distributed DBMS Reliability
Chapter 14: Distributed Object Database Management Systems
Chapter 16: Current Issues

Preethi Vishwanath

Week 2: 12th September 2006 – 24th September 2006

Reliability concepts - definitions

A system is a mechanism that consists of a collection of components and interacts with its environment with a recognizable pattern of behavior.

Each component of a system is itself a system, commonly called a subsystem.

The way the components of a system are put together is called the design of the system.

An external state of a system can be defined as the response that the system gives to an external stimulus.

The behavior of the system in responding to all the possible stimuli from the environment needs to be laid out in an authoritative specification of its behavior.

Any deviation of a system from the behavior described in the specification is considered a failure.

Some internal states are such that further transitions from them would cause the system to fail; such internal states are called erroneous states.

Any error in the internal states of the components of a system or in the design of a system is called a fault in the system.

A permanent fault, also called a hard fault, is one that reflects an irreversible change in the behavior of the system.

Reliability
– Reliability refers to the probability that the system under consideration does not experience any failures in a given time interval.
– $R(t) = \Pr\{0 \text{ failures in time } [0,t] \mid \text{no failures at } t = 0\}$
  where $R(t)$ is the reliability of the system.

Availability
– Refers to the probability that the system is operational according to its specification at a given point in time t.
– $A = \mu / (\lambda + \mu)$
  where $\lambda$ is the failure rate and $\mu$ is the repair rate (the mean repair time is $1/\mu$).

Mean Time Between Failures (MTBF)
– Is the expected time between subsequent failures in a system with repair.
– Can be calculated either from empirical data or from the reliability function.
– Is related to the failure rate.
– $\mathrm{MTBF} = \int_0^\infty R(t)\,dt$

Mean Time To Repair (MTTR)
– Expected time to repair a failed system.
– Is related to the repair rate.

The steady-state availability of a system with exponential failure and repair rates can be specified as
$A = \mathrm{MTBF} / (\mathrm{MTBF} + \mathrm{MTTR})$
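As a rough illustration of how the two availability formulas relate, the sketch below computes availability once from MTBF and MTTR and once from the failure and repair rates. It assumes a constant failure rate (so that MTBF is the reciprocal of the failure rate and MTTR is the reciprocal of the repair rate). The function names and the 1000-hour and 10-hour figures are hypothetical, chosen only for the example.

```python
import math


def availability_from_times(mtbf: float, mttr: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def availability_from_rates(failure_rate: float, repair_rate: float) -> float:
    """Steady-state availability A = mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t), valid only under the constant-failure-rate assumption."""
    return math.exp(-failure_rate * t)


# Hypothetical figures: a failure every 1000 hours on average, 10 hours to repair.
mtbf_hours = 1000.0
mttr_hours = 10.0

a_times = availability_from_times(mtbf_hours, mttr_hours)
a_rates = availability_from_rates(1.0 / mtbf_hours, 1.0 / mttr_hours)
```

Both calls give the same value, because substituting MTBF = 1/λ and MTTR = 1/µ into the second formula yields the first.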

Reasons for Failure

[Figure: three pie charts of failure causes. Study conducted at the Stanford Linear Accelerator: environment, operations, hardware, software. Tandem data: environment, hardware, software, maintenance, operations. 5ESS switch data: unknown, hardware, software, operations.]

Fault tolerance
– Refers to a system design approach which recognizes that faults will occur.

Fault prevention/Fault intolerance
– Aims at ensuring that the implemented system will not contain any faults.
– Two aspects:
  Fault avoidance
  – Refers to the techniques used to make sure that faults are not introduced into the system.
  – Involves detailed design methodologies such as design walkthroughs and design inspections.
  Fault removal
  – Refers to the techniques that are employed to detect any faults that might have remained in the system despite the application of fault avoidance, and to remove these faults.

Fault detection
– Issues a warning when a failure occurs but does not provide any means of tolerating the failure.

Latent failure
– One that is detected some time after its occurrence.

Mean time to detect
– Average error latency time over a number of identical systems.

Fail-stop modules
– A fail-stop module constantly monitors itself and, when it detects a fault, shuts itself down automatically.

Fail-fast
– Implemented in software by defensive programming, where each software module checks its own state during state transitions.

Different ways of implementing process pairs
– Lock-step
– Automatic checkpointing
– State checkpointing
– Data checkpointing
– Persistent process pairs
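The fail-fast idea above, a module that checks its own state around each state transition, might look roughly like the following. This is only a sketch: the `Account` class, its non-negative-balance invariant, and the `FailFastError` exception are hypothetical stand-ins for a real module and its consistency rules.

```python
class FailFastError(RuntimeError):
    """Raised as soon as the module detects that its own state is invalid."""


class Account:
    """Hypothetical module whose invariant is a non-negative balance."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._check_state()

    def _check_state(self) -> None:
        # Defensive check: stop immediately instead of running on with a bad state.
        if self.balance < 0:
            raise FailFastError(f"invalid balance {self.balance}")

    def withdraw(self, amount: int) -> None:
        self._check_state()      # check before the state transition
        self.balance -= amount
        self._check_state()      # check again after the state transition


acct = Account(100)
acct.withdraw(30)

stopped = False
try:
    acct.withdraw(100)           # drives the state invalid, so the module stops
except FailFastError:
    stopped = True
```

The point of the design is that the fault is reported at the transition that caused it, which keeps the error latency short.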

Failures in Distributed DBMS

Site (System) Failures
– Always assumed to result in the loss of main memory contents.
– Total failure refers to the simultaneous failure of all sites in the distributed system.
– Partial failure indicates the failure of only some sites while the others remain operational.

Transaction Failures
– Incorrect input data.
– Detection of a present or potential deadlock.
– The usual approach to take in cases of transaction failure is to abort the transaction.

Media Failures
– Refers to the failures of the secondary storage devices that store the database.
– Duplexing of disk storage and maintaining archival copies of the database are common techniques that deal with this sort of catastrophic problem.

Communication Failures
– Unique to the distributed case.
– The most common ones are errors in the messages, improperly ordered messages, lost messages, and line failures.
– The failure of the communication network to deliver messages and their confirmations within the expected (timeout) period is termed a performance failure.

Interface between the local recovery manager and the buffer manager

[Figure: the components shown are the stable database, the local recovery manager, the database buffer manager, and the database buffers (volatile database).]

Recovery Information

In-place update recovery information
– It is necessary to store information about database state changes in order to recover.
– This information is recorded in the database log.
– REDO action
  The database log needs to include sufficient data to permit the redo by taking the old database state and recovering the new state.
– UNDO action
  The database log needs to include sufficient data to permit the undo by taking the new database state and recovering the old state.

Out-of-place update recovery information
– Typical techniques
  Shadowing
  Every time an update is made, the old stable storage page, called the shadow page, is left intact and a new page with the updated data item values is written into the stable database.
  Differential files
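The UNDO and REDO actions for in-place updates can be sketched with a log that keeps both the old and the new value of each updated item. The code below is a minimal in-memory illustration, not a real recovery manager: the `LogRecord` layout, the function names, and the sample transaction `T1` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    txn_id: str
    item: str
    before_image: int  # value prior to the update, needed for UNDO
    after_image: int   # value after the update, needed for REDO


def update(db: Dict[str, int], log: List[LogRecord],
           txn_id: str, item: str, value: int) -> None:
    """Append the log record first, then update the database in place."""
    log.append(LogRecord(txn_id, item, db[item], value))
    db[item] = value


def undo(db: Dict[str, int], log: List[LogRecord], txn_id: str) -> None:
    """Take the new state back to the old one: restore before-images, newest first."""
    for rec in reversed(log):
        if rec.txn_id == txn_id:
            db[rec.item] = rec.before_image


def redo(db: Dict[str, int], log: List[LogRecord], txn_id: str) -> None:
    """Take the old state forward to the new one: reapply after-images, oldest first."""
    for rec in log:
        if rec.txn_id == txn_id:
            db[rec.item] = rec.after_image


db = {"x": 10, "y": 20}
log: List[LogRecord] = []
update(db, log, "T1", "x", 11)
update(db, log, "T1", "x", 12)
after_updates = dict(db)

undo(db, log, "T1")
after_undo = dict(db)

redo(db, log, "T1")
after_redo = dict(db)
```

Undo walks the log backwards so that the oldest before-image is the one that finally stays in place; redo walks it forwards for the symmetric reason.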

Network Partitioning
– Simple partitioning
  The network is divided into only two components.
– Multiple partitioning
  The network is divided into more than two components.

Centralized Protocols
– Primary site
  It makes sense to permit the operation of the partition that contains the primary site, since it manages the locks.
– Primary copy
  More than one partition may be operational for different queries.

Voting-based Protocols
– Transactions are executed if a majority of the sites vote to execute them.
– Quorum-based voting can be used as a replica control method, as well as a commit method to ensure transaction atomicity in the presence of network partitioning.
– In the case of nonreplicated databases, this involves the integration of the voting principle with commit protocols.
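A small sketch may help show why voting keeps two partitions from both proceeding. It assumes the usual quorum rules for replica control, which the slides do not spell out: the read and write quorums together must exceed the total number of votes, and the write quorum must exceed half of it. The five sites with one vote each and the function names are hypothetical.

```python
from typing import Dict, Set


def is_valid_quorum_assignment(votes: Dict[str, int],
                               read_quorum: int, write_quorum: int) -> bool:
    """Check the two overlap conditions assumed in the lead-in."""
    total = sum(votes.values())
    # Every read quorum must intersect every write quorum.
    reads_see_writes = read_quorum + write_quorum > total
    # Two disjoint partitions can never both gather a write quorum.
    writes_exclusive = 2 * write_quorum > total
    return reads_see_writes and writes_exclusive


def can_operate(partition: Set[str], votes: Dict[str, int], quorum: int) -> bool:
    """A partition may proceed only if its sites hold at least `quorum` votes."""
    return sum(votes[site] for site in partition) >= quorum


# Hypothetical system: five sites, one vote each, majority quorums.
votes = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}
read_q, write_q = 3, 3

majority_side = {"A", "B", "C"}
minority_side = {"D", "E"}
```

With this assignment, a simple partition into {A, B, C} and {D, E} lets only the first component execute updates.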

2 Phase Commit Protocol

The two-phase commit protocol is a distributed algorithm which lets all sites in a distributed system agree to commit a transaction. The protocol results in either all nodes committing the transaction or all aborting, even in the case of site failures and message losses.

Basic Algorithm

Commit-request phase
1. The coordinator sends a query to commit message to all cohorts.
2. The cohorts execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log.
3. Each cohort replies with an agreement message if the transaction succeeded, or an abort message if the transaction failed.
4. The coordinator waits until it has a message from each cohort.

Commit phase
– Success
  If the coordinator received an agreement message from all cohorts during the commit-request phase:
  1. The coordinator writes a commit record into its log.
  2. The coordinator sends a commit message to all the cohorts.
  3. Each cohort completes the operation, and releases all the locks and resources held during the transaction.
  4. Each cohort sends an acknowledgement to the coordinator.
  5. The coordinator completes the transaction when acknowledgements have been received.
– Failure
  If any cohort sent an abort message during the commit-request phase:
  1. The coordinator sends a rollback message to all the cohorts.
  2. Each cohort undoes the transaction using the undo log, and releases the resources and locks held during the transaction.
  3. Each cohort sends an acknowledgement to the coordinator.
  4. The coordinator completes the transaction when acknowledgements have been received.
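The message flow of the two phases can be traced with a small single-process simulation. This is a sketch under strong simplifying assumptions: there is no network, no timeout handling, and no site failure, so it shows only the failure-free decision logic. The `Cohort` class, its `will_succeed` switch, and the state names are hypothetical.

```python
from typing import List


class Cohort:
    def __init__(self, name: str, will_succeed: bool) -> None:
        self.name = name
        self.will_succeed = will_succeed  # hypothetical stand-in for local execution
        self.state = "INITIAL"

    def query_to_commit(self) -> str:
        # Execute up to the commit point; undo/redo log entries would be written here.
        self.state = "READY" if self.will_succeed else "ABORTED"
        return "agreement" if self.will_succeed else "abort"

    def commit(self) -> str:
        self.state = "COMMITTED"  # complete the operation and release locks
        return "acknowledgement"

    def rollback(self) -> str:
        self.state = "ABORTED"    # undo using the undo log and release locks
        return "acknowledgement"


def two_phase_commit(cohorts: List[Cohort], coordinator_log: List[str]) -> str:
    # Commit-request phase: collect one reply from every cohort.
    replies = [cohort.query_to_commit() for cohort in cohorts]

    # Commit phase: commit only if every cohort agreed, otherwise roll back.
    if all(reply == "agreement" for reply in replies):
        coordinator_log.append("commit")
        acks = [cohort.commit() for cohort in cohorts]
        decision = "commit"
    else:
        acks = [cohort.rollback() for cohort in cohorts]
        decision = "abort"

    # The coordinator completes once every acknowledgement has arrived.
    if len(acks) != len(cohorts):
        raise RuntimeError("missing acknowledgements")
    return decision


log_ok: List[str] = []
all_agree = [Cohort("site1", True), Cohort("site2", True)]
outcome_ok = two_phase_commit(all_agree, log_ok)

log_bad: List[str] = []
one_fails = [Cohort("site1", True), Cohort("site2", False)]
outcome_bad = two_phase_commit(one_fails, log_bad)
```

A single abort reply is enough to roll back every cohort, including those that had voted to commit.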

3 Phase Commit

Three-phase commit is nonblocking when failures are restricted to site failures.

A commit protocol that is synchronous within one state transition is nonblocking if and only if its state transition diagram contains neither of the following:
– A state that is “adjacent” to both a commit and an abort state.
– A noncommittable state that is “adjacent” to a commit state.
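The nonblocking condition can be checked mechanically on a state transition diagram. The sketch below makes several assumptions: "adjacent" is read as "reachable in one transition", the diagrams are simplified participant diagrams with hypothetical state names, and a state is treated as committable only if it is listed in the `committable` set passed in.

```python
from typing import Dict, Set


def is_nonblocking(transitions: Dict[str, Set[str]], committable: Set[str],
                   commit: str = "COMMIT", abort: str = "ABORT") -> bool:
    """Return True if no state violates either of the two conditions."""
    for state, successors in transitions.items():
        # Condition 1: a state adjacent to both a commit and an abort state.
        if commit in successors and abort in successors:
            return False
        # Condition 2: a noncommittable state adjacent to a commit state.
        if state not in committable and commit in successors:
            return False
    return True


# Simplified participant diagram for two-phase commit (assumed state names).
two_pc = {
    "INITIAL": {"READY", "ABORT"},
    "READY": {"COMMIT", "ABORT"},
    "COMMIT": set(),
    "ABORT": set(),
}

# Simplified participant diagram for three-phase commit: a PRECOMMIT state
# is placed between READY and COMMIT.
three_pc = {
    "INITIAL": {"READY", "ABORT"},
    "READY": {"PRECOMMIT", "ABORT"},
    "PRECOMMIT": {"COMMIT"},
    "COMMIT": set(),
    "ABORT": set(),
}

two_pc_result = is_nonblocking(two_pc, committable={"COMMIT"})
three_pc_result = is_nonblocking(three_pc, committable={"PRECOMMIT", "COMMIT"})
```

In the two-phase diagram the READY state fails both conditions. The extra PRECOMMIT state in the three-phase diagram removes both violations.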

Replication and Replica Control Protocols

Having replicas of data items improves system availability.

Advantages
– With careful design, it is possible to ensure that single points of failure are eliminated.
– Overall system availability is maintained even when one or more sites fail.

Disadvantages
– Whenever updates are introduced, the complexity of keeping replicas consistent arises, and this is the topic of replication protocols.

Concepts

Object
– Represents a real entity in the system.
– Represented as a pair (object identity, state).
– Enables referential object sharing.

State
– Either an atomic value or a constructed value.

Value
– Here D is the union of the atomic domains, A is the set of attribute names, and I is the set of object identifiers.
– An element of D is a value, called an atomic value.
– [a1:v1,…,an:vn], in which ai is an element of A and vi is either a value or an element of I, is called a tuple value.
– {v1,…,vn}, in which vi is either a value or an element of I, is called a set value.

Class
– Grouping of common objects.
– Template for all common objects.

Inheritance
– Declaring a type to be a subtype of another.

Abstract Data Types
– Template for all objects of that type.
– Describes a type of data by providing a domain of data with the same structure, as well as operations applicable to the objects of that domain.
– Abstraction capability commonly referred to as encapsulation.

Composition (Aggregation)
– Restriction on composite objects results in complex objects.
– The composite object relationship between types can be represented by a composition graph.

Collection
– User-defined grouping of objects.
– Similar to a class in that it groups objects.

Subtyping
– Based on the specialization relationship among types.

Object Distribution Design

Path partitioning
– A concept describing the clustering of all the objects forming a composite object into a partition.
– Can be represented as a hierarchy of nodes forming a structural index.
– The index contains the references to all the component objects of a composite object, eliminating the need to traverse the class composition hierarchy.

Class Partitioning Algorithms
– The main issue is to improve the performance of user queries and applications by reducing irrelevant data access.
– Affinity-based approach
  Affinity among instance variables and methods and affinity among multiple methods can be used for horizontal and vertical class partitioning.
– Cost-driven approach

Allocation
– Local behavior-local object
  The behavior, the object to which it is applied, and the arguments are all co-located.
  No special mechanism is needed to handle this case.
– Local behavior-remote object
  The behavior and the object to which it is applied are located at different sites.
  Two ways to deal with this case:
  – Move the remote object to the site where the behavior is located.
  – Ship the behavior implementation to the site where the object is located.

Client-Server Architecture

[Figure: client-server architecture; the recoverable labels are two Object Database boxes.]

Cache Consistency

A problem in any data-shipping system that moves data to the clients.

Cache consistency algorithms
– Avoidance-based synchronous algorithms
  Clients retain read locks across transactions, but they relinquish write locks at the end of the transaction.
  Clients send lock requests to the server and they block until the server responds.
  If the client requests a write lock on a page that is cached at other clients, the server issues callback messages asking those clients to relinquish their read locks on the page.
– Avoidance-based asynchronous algorithms
  Do not have the message blocking overhead present in synchronous algorithms.
  Clients send lock escalation messages to the server and continue application processing.
– Avoidance-based deferred algorithms
  Clients batch their lock escalation requests and send them to the server at commit time.
  The server blocks the updating client if other clients are reading the updated objects.
– Detection-based synchronous algorithms
  Clients contact the server whenever they access a page in their cache to ensure that the page is not stale or being written to by other clients.
– Detection-based asynchronous algorithms
  Clients send lock escalation requests to the server, but optimistically assume that their requests will be successful.
  After a client transaction commits, the server propagates the updated pages to all the other clients that have also cached the affected pages.
– Detection-based deferred algorithms
  Can outperform callback locking algorithms even while encountering a higher abort rate if the client transaction state completely fits into the client cache, and all application processing is strictly performed at the clients.

Object Identifier Management

Object identifiers (OIDs) are system generated.
They are used to uniquely identify every object.
Transient object identity can be implemented more efficiently.

Two common solutions
– Physical identifier approach (POID)
  Equates the OID with the physical address of the corresponding object.
  Advantage: the object can be obtained directly from the OID.
  Drawback: all the parent objects and indexes must be updated whenever an object is moved to a different page.
– Logical identifier approach (LOID)
  Consists of allocating a system-wide unique OID.
  Since OIDs are invariant, there is no overhead due to object movement.
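The trade-off between the two approaches can be seen in a sketch of the logical approach, where a table maps each invariant OID to the current physical address. The `LogicalOidTable` class, the (page, slot) address layout, and the sample objects are hypothetical; a real system would keep this mapping in a persistent structure.

```python
import itertools
from typing import Dict, Tuple

PhysicalAddress = Tuple[int, int]  # (page number, slot): an assumed layout


class LogicalOidTable:
    """Maps system-wide unique logical OIDs to current physical addresses."""

    def __init__(self) -> None:
        self._next_oid = itertools.count(1)
        self._table: Dict[int, PhysicalAddress] = {}

    def allocate(self, address: PhysicalAddress) -> int:
        oid = next(self._next_oid)
        self._table[oid] = address
        return oid

    def resolve(self, oid: int) -> PhysicalAddress:
        # One extra lookup per access: the price paid for invariant OIDs.
        return self._table[oid]

    def move(self, oid: int, new_address: PhysicalAddress) -> None:
        # Only the table entry changes; references held by other objects stay valid.
        self._table[oid] = new_address


table = LogicalOidTable()
engine_oid = table.allocate((7, 3))
car_state = {"engine": engine_oid}   # a parent object referencing the engine

table.move(engine_oid, (42, 0))      # the engine object is moved to another page
resolved = table.resolve(car_state["engine"])
```

With physical identifiers the reference stored in `car_state` would itself be the address, so the move would force an update of every parent and index that holds it.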

Object Migration

Three alternatives can be considered for the migration of classes (types):
– The source code is moved and recompiled at the destination,
– The compiled version of a class is migrated just like any other object, or
– The source code of the class definition is moved, but not its compiled operations, for which a lazy migration strategy is used.

Objects can be in one of four states:
– Ready
  Ready objects are not currently invoked, or have not received a message, but are ready to be invoked or to receive a message.
– Active
  Active objects are currently involved in an activity in response to an invocation or a message.
– Waiting
  Waiting objects have invoked another object and are waiting for a response.
– Suspended
  Suspended objects are temporarily unavailable for invocation.

Migration involves two steps:
– Shipping the object from the source to the destination, and
– Creating a proxy at the source, replacing the original object.
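The two migration steps can be sketched as follows. The code is an in-memory illustration under simplifying assumptions: sites are plain Python objects in one process, "shipping" is a dictionary assignment, and the `Site`, `Proxy`, and `Counter` classes are hypothetical.

```python
from typing import Any, Dict


class Site:
    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[int, Any] = {}

    def invoke(self, oid: int, method: str, *args: Any) -> Any:
        target = self.objects[oid]
        if isinstance(target, Proxy):
            # The object has migrated: forward the invocation to its new site.
            return target.forward(oid, method, *args)
        return getattr(target, method)(*args)


class Proxy:
    """Placeholder left at the source site after migration."""

    def __init__(self, destination: Site) -> None:
        self.destination = destination

    def forward(self, oid: int, method: str, *args: Any) -> Any:
        return self.destination.invoke(oid, method, *args)


class Counter:
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> int:
        self.value += 1
        return self.value


def migrate(oid: int, source: Site, destination: Site) -> None:
    destination.objects[oid] = source.objects[oid]  # step 1: ship the object
    source.objects[oid] = Proxy(destination)        # step 2: leave a proxy behind


site_a, site_b = Site("A"), Site("B")
site_a.objects[1] = Counter()

first = site_a.invoke(1, "increment")
migrate(1, site_a, site_b)
second = site_a.invoke(1, "increment")  # still addressed to the source site
```

Callers that still address the source site keep working, at the cost of one forwarding hop per invocation.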

Object Clustering
– Difficult for two reasons:
  It is not orthogonal to object identity implementation. Logical OIDs incur more overhead, but enable vertical partitioning of classes.
  Clustering of complex objects along the composition relationship is more involved because of object sharing.
– Given a class graph, there are three basic storage models for object clustering:
  The decomposition storage model partitions each object class into binary relations.
  The normalized storage model stores each class as a separate relation.
  The direct storage model enables multi-class clustering of complex objects based on the composition relationship.

Distributed Garbage Collection
– As programs modify objects and remove references, a persistent object may become unreachable from the persistent roots of the system when there are no more references to it.
– Basic garbage collection algorithms can be categorized as reference counting or tracing-based.

Reference counting
– In reference counting, each object has an associated count of the references to it.
– Each time a program creates an additional reference that points to an object, the object’s count is incremented.
– When a reference to an object is destroyed, the corresponding count is decremented.

Tracing-based
– Mark and sweep algorithms
  Two-phase algorithms.
  The first phase, the mark phase, starts from the root and marks every reachable object.
  Once all live objects are marked, the memory is examined and unmarked objects are reclaimed.
– Copy-based algorithms
  Divide memory into two disjoint areas:
  From-space: programs manipulate objects in this space.
  To-space: left empty.
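The two phases of mark and sweep can be sketched on a toy heap. The representation is an assumption made for the example: the heap is a dictionary from object identifier to the list of identifiers it references, and everything lives in one address space, so none of the distributed aspects are modelled.

```python
from typing import Dict, List, Set


def mark(heap: Dict[str, List[str]], roots: List[str]) -> Set[str]:
    """Mark phase: collect every object reachable from the roots."""
    marked: Set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in marked:
            continue
        marked.add(oid)
        stack.extend(heap[oid])  # follow the outgoing references
    return marked


def sweep(heap: Dict[str, List[str]], marked: Set[str]) -> List[str]:
    """Sweep phase: examine the heap and reclaim every unmarked object."""
    garbage = sorted(oid for oid in heap if oid not in marked)
    for oid in garbage:
        del heap[oid]
    return garbage


# Objects "c" and "d" reference each other but are unreachable from the root.
heap = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "c": ["d"],
    "d": ["c"],
}

live = mark(heap, ["root"])
reclaimed = sweep(heap, live)
```

The cycle between "c" and "d" is reclaimed here, whereas plain reference counting would leave both counts at one and never free them.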

Object Query Processing – Important issuesObject Query Processing – Important issuesObject Query Processor Object Query Processor ArchitecturesArchitectures– Open OODB projectOpen OODB project

Separation between the user Separation between the user query language parsing query language parsing structures and the operator structures and the operator graph on which the optimizer graph on which the optimizer operatesoperates

– EPOQ projectEPOQ projectApproach to query optimization Approach to query optimization extensibility, where the search extensibility, where the search space is divided into regionsspace is divided into regions

– TIGUKAT projectTIGUKAT projectUses an object approach to Uses an object approach to query processing extensibilityquery processing extensibilityIs an extensible uniform Is an extensible uniform behavioral model characterized behavioral model characterized by a purely behavioral by a purely behavioral semantics and a uniform semantics and a uniform approach to objects.approach to objects.

Query Processing Issues
– Search space and transformation rules
– Search algorithm
– Cost function
  Can be defined recursively based on the algebraic processing tree.
– Parameterization
– Path expressions
– Rewriting and algebraic optimization
– Path indexes
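To make the recursive cost function concrete, here is a small sketch. It assumes each node of the algebraic processing tree carries a local cost and that the total cost of a node is its local cost plus the total cost of its inputs; the `PlanNode` class, the operator names, and the cost figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """One operator in an algebraic processing tree (hypothetical model)."""
    operator: str
    local_cost: float
    children: List["PlanNode"] = field(default_factory=list)


def total_cost(node: PlanNode) -> float:
    """Cost of a subtree: the node's own cost plus the cost of its inputs."""
    return node.local_cost + sum(total_cost(child) for child in node.children)


# Illustrative tree: select(join(scan Employee, scan Department))
scan_emp = PlanNode("scan Employee", 10.0)
scan_dept = PlanNode("scan Department", 4.0)
join = PlanNode("join", 6.0, [scan_emp, scan_dept])
plan = PlanNode("select", 1.0, [join])
print(total_cost(plan))  # 21.0
```

A real optimizer would derive each local cost from statistics such as cardinalities and page counts; only the recursive shape is the point here.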

Query Execution
– Path indexes
  Algorithms
  1. Create an index on each class traversed.
  2. Define indexes on objects across their type inheritance.
  3. Use access support relations; an access support relation is a data structure that stores selected path expressions.
– Set matching
  Algorithms
  1. Centralized algorithms
  2. Join execution algorithm
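As an illustration of the access support relation mentioned under path indexes, the sketch below materializes one path expression as tuples of object identifiers. It is a simplification under stated assumptions: objects are plain dictionaries keyed by a made-up oid, every attribute is single-valued, and only complete paths are stored.

```python
def build_access_support_relation(objects, start_oids, path):
    """Materialize a path expression as tuples of object identifiers.

    objects: dict mapping oid -> dict of attribute name -> referenced oid
    path:    list of attribute names, e.g. ["owner", "address"]
    """
    relation = []
    for oid in start_oids:
        row = [oid]
        current = oid
        for attr in path:
            current = objects[current].get(attr)
            if current is None:
                break  # incomplete path: not stored in this simplified variant
            row.append(current)
        else:
            relation.append(tuple(row))
    return relation


# Hypothetical object base: cars reference owners, owners reference addresses.
objects = {
    "car1": {"owner": "p1"},
    "car2": {"owner": "p2"},
    "car3": {},
    "p1": {"address": "a1"},
    "p2": {"address": "a1"},
    "a1": {},
}
asr = build_access_support_relation(objects, ["car1", "car2", "car3"], ["owner", "address"])
print(asr)  # [('car1', 'p1', 'a1'), ('car2', 'p2', 'a1')]

# A query on car.owner.address can now be answered without traversing objects.
print([row[0] for row in asr if row[-1] == "a1"])  # ['car1', 'car2']
```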

Data Delivery alternatives

Pull-only
– Transfer of data from servers to clients is initiated by a client pull.
– Arrival of new data items or updates to existing data items are carried out at the server without notification to clients unless clients explicitly poll the server.

Push-only
– Transfer of data from servers to clients is initiated by a server push in the absence of any specific request from clients.

Hybrid
– Combines the client-pull and server-push mechanisms.

Architecture of a Data Warehouse

[Figure: data warehouse architecture. Labels: Source database, Integrate, Target Database, Metadata repository, Queries, Query/Analysis, Reporting, Data Mining.]

Semi structured Data
– Free and commercial databases on product information, etc.; the interface to such sources is typically a collection of fill-out forms.
– Typically modeled as a labeled graph.
– Labeled graphs are self-describing and have no schema.
– The Object Exchange Model is used to illustrate such a labeled graph. Each object has:
  A label, which is the name of the object class.
  A type, which is either atomic (integer, string, etc.) or set.
  A value, which is either atomic or a set of objects.
  An optional object identifier.
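The four components of an Object Exchange Model object listed above map naturally onto a small record type. The sketch below is one possible encoding; the class name `OEMObject`, the sample labels, values, and the identifier `&b1` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class OEMObject:
    """One node of a labeled graph: label, type, value, optional identifier."""
    label: str
    type: str  # "integer", "string", ... or "set"
    value: Union[int, str, List["OEMObject"]]
    oid: Optional[str] = None


def labels(obj, prefix=""):
    """Yield the label path of every object; the data describes itself."""
    path = prefix + obj.label
    yield path
    if obj.type == "set":
        for child in obj.value:
            yield from labels(child, path + ".")


# Hypothetical product entry; no schema is declared anywhere.
book = OEMObject("book", "set", [
    OEMObject("title", "string", "Example Title"),
    OEMObject("year", "integer", 2006),
], oid="&b1")

print(list(labels(book)))  # ['book', 'book.title', 'book.year']
```

Because each object carries its own label and type, the structure can be discovered by walking the graph, which is what "self-describing" means here.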

[Figure labels: Web Server, Global Data Dictionary, Wrapper (three), Data Source (three).]

Problems with Pull-based approach

– Users need to know a priori where and when to look for data.
– Mismatch between the asymmetric nature of some applications and the symmetric communications infrastructure of networks such as the Internet.
– Types of asymmetry
  Network asymmetry: the network bandwidth from client to server differs from that from server to client.
  Distributed information systems: asymmetry due to an imbalance between the number of clients and the number of servers.
  Data: asymmetry in the amount of data being transferred between client and server.
  Data volatility.

Why Push based technologies?

Response to some of the problems inherent in pull-based systems.

Algorithm – Push based approach

1. Order the data items from hottest to coldest.
2. Partition the data items into ranges of items, such that the items in each range have similar application access profiles. The number of ranges is denoted by num_ranges.
3. Choose the relative broadcast frequency for each range as integers (rel_freq_i, where i is the range).
4. Divide each range into smaller elements, called chunks (C_ij is the j-th chunk of range i). Determine the number of chunks into which range i is divided as num_chunks_i = max_chunks / rel_freq_i, where max_chunks is the least common multiple of rel_freq_i over all i.
5. Create the broadcast schedule by interleaving the chunks of each range using the following procedure.

   for i from 0 to max_chunks - 1 by 1 do
     for j from 1 to num_ranges by 1 do
       Broadcast chunk C_{j,(i mod num_chunks_j)}
     end-for
   end-for
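Steps 4 and 5 of the procedure above can be sketched in a few lines. The sketch assumes steps 1 to 3 have already been carried out, so the ranges and their relative frequencies arrive as inputs; the function names and the single-letter data items are invented for the example, and chunks are numbered from 0 so that the `i mod num_chunks` index applies directly.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def split_into_chunks(items, n):
    """Split items into n contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for c in range(n):
        end = start + size + (1 if c < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def broadcast_schedule(ranges, rel_freq):
    """Interleave the chunks of each range into one broadcast cycle."""
    max_chunks = reduce(lcm, rel_freq)
    # Step 4: range i is divided into max_chunks / rel_freq_i chunks.
    chunked = [split_into_chunks(items, max_chunks // freq)
               for items, freq in zip(ranges, rel_freq)]
    # Step 5: in minor cycle i, broadcast chunk (i mod num_chunks_j) of each range j.
    schedule = []
    for i in range(max_chunks):
        for chunks in chunked:
            schedule.extend(chunks[i % len(chunks)])
    return schedule


# Hottest range first: A is broadcast 4 times per cycle, B and C twice, D to G once.
ranges = [["A"], ["B", "C"], ["D", "E", "F", "G"]]
rel_freq = [4, 2, 1]
print("".join(broadcast_schedule(ranges, rel_freq)))  # ABDACEABFACG
```

In the resulting cycle, item A reappears every three slots while D appears once in twelve, so a client waiting for a hot item waits far less on average than one waiting for a cold item.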

Difference between pull-based and push-based systems
– Cache replacement policies
– Prefetching mechanism

An idealized algorithm for page replacement is one that replaces the page with the smallest ratio between its probability of access and its frequency of broadcast.

The PIX algorithm calculates the “cost” of replacing a page and replaces the least costly one.

The operation of the algorithm is as follows:

1. When a page P_i is brought into the cache, it is inserted into a chain with
   Pr_i = 0, LT_i = CurrentTime
2. When P_i is accessed again, it is moved to the top of its own chain and the following calculations are made:
   Pr_i = HF / (CurrentTime - LT_i) + (1 - HF) * Pr_i
   LT_i = CurrentTime
3. If a page needs to be flushed out to open up space for a new page, a lix value is calculated for the pages at the bottom of each chain, and the page with the lowest lix value is flushed out. The lix value is calculated as follows:
   lix_i = Pr_i / rel_freq_i
   where rel_freq_i is the relative broadcast frequency of the range (disk) to which page P_i belongs.
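The three steps above can be traced with a small cache model. This is a sketch under several assumptions: one chain is kept per range (disk), HF is treated as a constant weight between 0 and 1 (0.25 is an arbitrary choice), the running estimate carries the previous Pr_i in its second term, and timestamps strictly increase so the denominator is never zero. The class and method names are invented for the example.

```python
class Page:
    def __init__(self, name, disk, now):
        self.name = name
        self.disk = disk
        self.pr = 0.0   # running access-probability estimate (step 1)
        self.lt = now   # time of last access (step 1)


class LixCache:
    def __init__(self, capacity, rel_freq, hf=0.25):
        self.capacity = capacity
        self.rel_freq = rel_freq                  # disk -> relative broadcast frequency
        self.hf = hf                              # assumed constant weight in (0, 1)
        self.chains = {d: [] for d in rel_freq}   # index 0 is the top of a chain
        self.pages = {}

    def access(self, name, disk, now):
        page = self.pages.get(name)
        if page is None:
            if len(self.pages) >= self.capacity:
                self.evict()
            page = Page(name, disk, now)
            self.pages[name] = page
        else:
            # Step 2: update the estimate, then move the page to the top.
            page.pr = self.hf / (now - page.lt) + (1 - self.hf) * page.pr
            page.lt = now
            self.chains[page.disk].remove(page)
        self.chains[page.disk].insert(0, page)

    def evict(self):
        # Step 3: compare only the bottom page of each non-empty chain.
        candidates = [chain[-1] for chain in self.chains.values() if chain]
        victim = min(candidates, key=lambda p: p.pr / self.rel_freq[p.disk])
        self.chains[victim.disk].remove(victim)
        del self.pages[victim.name]
        return victim.name


cache = LixCache(capacity=2, rel_freq={"fast": 4, "slow": 1})
cache.access("a", "fast", now=1)
cache.access("b", "slow", now=2)
cache.access("a", "fast", now=3)   # Pr_a becomes 0.125
cache.access("b", "slow", now=4)   # Pr_b becomes 0.125
cache.access("c", "slow", now=5)   # cache full: lix_a = 0.03125, lix_b = 0.125
print(sorted(cache.pages))         # ['b', 'c']
```

Pages `a` and `b` have the same access estimate, but `a` is rebroadcast four times as often, so it is the cheaper one to drop and fetch again later.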
