dbms and implementation all(full)

Upload: kabir-j-abubakar

Post on 03-Apr-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/28/2019 DBMS and Implementation All(Full)

    1/69

    1

    Database Management Systemsand Their Implementation

    By Xu LizhenSchool of Computer Science and Engineering, Southeast University

    Nanjing, China

    Course Goal and its Preliminary Courses

    The Preliminary Courses are:

    Data Structure Database Principles Database Design and Application

    The students should already have the basic concepts

    2Database Management Systems and Their Implementation, Xu Lizhen

    a ou a a ase sys em, suc as a a mo e , a aschema, SQL, DBMS, transaction, database design, etc.

    Now we will introduce the implementationtechniques of Database Management Systems.

    The goal is to build the foundation of furtherresearch in database field and to use databasesystem better through the study of this course.

    Main Contents

    Introduce the inner implementationtechnique of every kind of DBMS,including the architecture of DBMS, thesupport to data model and the

    3Database Management Systems and Their Implementation, Xu Lizhen

    implementation of DBMS core, userinterface, etc. The emphasis is the basicconcepts, the basic principles and theimplementation methods related toDBMS core.

  • 7/28/2019 DBMS and Implementation All(Full)

    2/69

    2

    Main Contents

    Because the relational data model is the mainstreamdata model, and distributed DBMS includes all aspectsof classical centralized DBMS, the main thread of thiscourse is relational distributed database managements stem. The im lementation of ever as ects of DBMS

    4Database Management Systems and Their Implementation, Xu Lizhen

    are introduced according to distributed DBMS. Somecontents of other kinds of DBMS are also introduced,including federated database systems, parallel databasesystems and object-oriented database systems, etc.Along with the continuous progress of databasetechnique, new contents will be added at any time.

    Course History

    Database System Principles (before 1984) ----relational model theory, some query optimizationalgorithms

    Distributed Database Systems (1985~1994) ----introduce implementation techniques in DDBMS

    5

    oroug y an sys em ca y

    Database Management Systems and Their

    Implementation (after 1995) ---- not limited toDDBMS, hope to introduce the implementationtechniques of DBMS more thoroughly. Along withthe progress of database technique, new contents canbe added without changing the course name.

    Database Management Systems and Their Implementation, Xu Lizhen

    References

    1) Stefano Ceri, Distributed Databases

    2) Wang Nengbin, Principles of Database Systems

    3) Raghu Ramakrishnan, Johannes Gehrke, DatabaseManagement Systems , 3rd Edition, McGraw-HillCompanies, 2002

    6Database Management Systems and Their Implementation, Xu Lizhen

    4) Hector Garcia-Molina, Jeffrey.D.Ullman, DatabaseSystems: the Complete Book

    5) S.Bing Yao et al, Query Optimization in DDBS

    6) Courseware:http://cse.seu.edu.cn/people/lzxu/resource

  • 7/28/2019 DBMS and Implementation All(Full)

    3/69

    3

    Table of Contents

    1. Introduction

    The history, classification, and main research contents ofdatabase systems; Distributed database system

    2. DBMS Architecture

    The com osition of DBMS and its rocess structure; The

    7Database Management Systems and Their Implementation, Xu Lizhen

    architecture of distributed database systems

    3. Access Management of Database

    Physical file organization, index, and access primitives

    4. Data Distribution

    The fragmentation and distribution of data, distributeddatabase design, federated database design, parallel databasedesign, data catalog and its distribution

    Table of Contents

    5. Query Optimization

    Basic problems; Query optimization techniques; Queryoptimization in distributed database systems; Queryoptimization in other kinds of DBMS

    6. Recovery Mechanism

    8Database Management Systems and Their Implementation, Xu Lizhen

    Basic problems; Updating strategies and recovery techniques;Recovery mechanism in distributed DBMS

    7.

    Concurrency ControlBasic problems; Concurrency control techniques; Concurrencycontrol in distributed DBMS; Concurrency control in otherkinds of DBMS

    1. Introduction

    Database Management Systemsand Their Implementation, Xu

    Lizhen

  • 7/28/2019 DBMS and Implementation All(Full)

    4/69

    4

    1.1 The History of Database Technologyand its Classification

    (1) According to the development of data model No management(before 1960): Scientific computing

    File system: Simple data management

    Demand of data management growing continuously,DBMS emer ed.

    10Database Management Systems and Their Implementation, Xu Lizhen

    .

    1964, the first DBMS (American): IDS,network

    1969, the first commercialDBMS of IBM, hierarchical

    1970, E.F.Codd(IBM) bring forwardrelational data model

    Other data model: Object Oriented, deductive, ER, ...

    (2) According to the development of DBMSarchitectures

    Centralized database systems

    Parallel database systems

    Distributed database systems (and Federateddatabase systems)

    Mobile database systems

    3 Accordin to the develo ment of architectures of

    11Database Management Systems and Their Implementation, Xu Lizhen

    application systems based on databases

    Centralized structure : HostTerminal

    Distributed structure Client/Server structure

    Three tier/multi-tier structure

    Mobile computing

    Grid computing (Data Grid), Cloud Computing

    (4) According to the expanding of application fields

    OLTP

    Engineering Database

    Deductive Database

    Multimedia Database

    Temporal Database

    12Database Management Systems and Their Implementation, Xu Lizhen

    Spatial Database

    Data Warehouse, OLAP, Data Mining

    XML Database

    Big Data, NoSQL, NewSQL

  • 7/28/2019 DBMS and Implementation All(Full)

    5/69

    5

    13Database Management Systems and Their Implementation, Xu Lizhen

    1.2 Distributed Database Systems

    What is DDB?

    A DDB is a collection of correlated data which arespread across a network and managed by a softwarecalled DDBMS.

    14Database Management Systems and Their Implementation, Xu Lizhen

    Two kinds:(1) Distributed physically, centralized logically (general DDB)

    (2) Distributed physically, distributed logically too (FDBS)

    We take the first as main topic in this course.

    Distribution

    Correlation

    DDBMS

    Features of DDBS :

    15Database Management Systems and Their Implementation, Xu Lizhen

  • 7/28/2019 DBMS and Implementation All(Full)

    6/69

    6

    The advantages of DDBS:

    Local autonomy

    Good availability (because support multi copies) Good flexibility

    Low system cost

    High efficiency (most access processed locally, lesscommunication comparing to centralized database

    16Database Management Systems and Their Implementation, Xu Lizhen

    system)

    Parallel process

    The disadvantages of DDBS: Hard to integrate existing databases

    Too complex (system itself and its using, maintenance,etc. such as DDB design)

    The main problems in DDBS:

    Compared to centralized DBMS, thedifferences of DDBS are as follows:

    Query Optimization (different optimizinggoal)

    Concurrenc control should consider whole

    17Database Management Systems and Their Implementation, Xu Lizhen

    network)

    Recovery mechanism (failure combination)

    Another problem specially for DDBS:

    Data distribution

    2. The Architecture of DBMS

    Database Management Systemsand Their Implementation, Xu

    Lizhen

  • 7/28/2019 DBMS and Implementation All(Full)

    7/69

    7

    Main Contains

    The components of DBMS core The process structure of DBMS

    The components of DDBMS core

    19

    Database Management Systems and Their Implementation, Xu Lizhen

    2.1 The Components of DBMS Core

    Database statement

    (such as SQL)

    App1

    Grant checking

    Parser

    App i

    interfacem

    Appj

    interface1

    Statement / Program

    Grammar tree

    Message or data

    Formatted message or data

    App n. . . .. . .. .

    u f i /API . . .

    20Database Management Systems and Their Implementation, Xu Lizhen

    DBMS coreOperating system

    Disk

    Semantic analysis and query treatment

    ( DDL QL DML DCL )

    Concurrency

    control

    Recovery

    mechanism

    Access

    management

    Access pr imiti ve

    System call

    I/Ocommand

    Message or data

    Message or data

    State info. or physical

    data block

    2.2 The Process Structure of DBMS

    Single process structure

    Multi processes structure

    Multi threads structure

    21Database Management Systems and Their Implementation, Xu Lizhen

    processes / threads

  • 7/28/2019 DBMS and Implementation All(Full)

    8/69

    8

    Single process structure

    The application program is compiled with DBMS core

    as a single .exe file, running as a single process.

    App licat ion cod es

    22Database Management Systems and Their Implementation, Xu Lizhen

    DBMS core (as a function)

    SQLstatements Result

    .exe file

    Multi processes structure

    One application process corresponding to one DBMScore process

    App lic atio nprocess 1

    DBMScore process 1pipe

    SQLstatements

    results

    23Database Management Systems and Their Implementation, Xu Lizhen

    App lic atio nprocess 2

    DBMScore process 2pipe

    SQLstatements

    results

    App lic atio nprocess n

    DBMScore process npipe

    SQLstatements

    results

    .

    .

    .

    .

    .

    .

    Multi threads structure

    Only one DBMS process, every application processcorresponding to a DBMS core thread.

    c at al og l oc k t ab le b uf fer DAEMON

    Appl icati oni e/socket

    SQLs tatements

    DBMScore thread 1

    24Database Management Systems and Their Implementation, Xu Lizhen

    DBMSprocess

    processresults

    pipe/socketAppl icati on

    process 2DBMScore thread 2

    SQLs tatements

    results

    pipe/socketAppl icati on

    process nDBMScore thread n

    SQLs tatements

    results

    . . . . . .

  • 7/28/2019 DBMS and Implementation All(Full)

    9/69

    9

    Communication protocols between processes / threads

    Application programs access databases through API

    or embedded SQL offered by DBMS, according tocommunication protocol to realize synchronizingcontrol:

    Ad-hoc i nterfac e orDBMS core

    Pipe0

    25Database Management Systems and Their Implementation, Xu Lizhen

    app ca on programPipe1

    State TupNum AttNum AttName AttType AttLen . . . . . . TmpFileName

    Def in it iono fonea t t r ibu te Def in it ion o fo thera t tr ibutes

    Pipe0: Send SQL statements, inner commands;Pipe1: return results. The result format:

    Communication protocols between processes / threads

    State: 0 -- error, 1 -- success for insert,delete,update,

    2 -- query success, need to treat result further.

    TupNum: tuple number in result.

    AttNum: attribute number in result table.

    AttName: attribute name.

    26Database Management Systems and Their Implementation, Xu Lizhen

    .

    AttType: attribute type.

    AttLen: byte number of this attribute.

    TmpFileName: name of the temporary file whichstore the result data, need the above metadata toexplain it.

    2.3 The Components of DDBMS Core

    DB DC

    DDDDBK

    LDB1DB: database management

    DC: communication control

    27Database Management Systems and Their Implementation, Xu Lizhen

    DB DC

    DDDDBK

    LDB2

    Site1

    Site2

    DD: catalog management

    DDBK: core, responsible for

    parsing, distributed

    transaction management,

    concurrency control, recovery

    and global query optimization.

  • 7/28/2019 DBMS and Implementation All(Full)

    10/69

    10

    An example of global query optimization

    Global query optimization mayget an execution plan based oncost estimation, such as:

    (1)send R2 to site1, R

    R1 R2

    Site1 Site2

    28Database Management Systems and Their Implementation, Xu Lizhen

    execute on site :

    Select *

    From R1, R

    Where R1.a = R.b;

    Select *From R1,R2Where R1.a = R2.b;

    2.4 The Process Structure of DDBMS

    Daemon Daemon Daemon

    DDBMS core thread

    Appl icati on pr ocess 2Appl icati on pr ocess 1

    SQL/result SQL/result

    SQL/result

    SQL/resultDDBMS core thread 1

    DDBMS core thread

    29Database Management Systems and Their Implementation, Xu Lizhen

    LDBMS process

    Local database 1

    LDBMS process

    Local database 2

    LDBMSprocess

    Local database 3

    Site 1 Site 2 Site 3

    SQL/result SQL/result SQL/result SQL/result

    DDBMS core thread 2

    3. Database Access Management

    Database Management Systemsand Their Implementation, Xu

    Lizhen

  • 7/28/2019 DBMS and Implementation All(Full)

    11/69

    11

    Main Contains

    The access to database is transferred to theoperations on files (of OS) eventually. The filestructure and access route offered on it willaffect the speed of data access directly. It isim ossible that one kind of file structure will

    31Database Management Systems and Their Implementation, Xu Lizhen

    be effective for all kinds of data access

    Access types

    File organization

    Index technique

    Access primitives

    Access Types

    Query all or most records of a file (>15%)

    Query some special record

    Query some records (

  • 7/28/2019 DBMS and Implementation All(Full)

    12/69

    12

    Index Technique

    B+ Tree () Clustering index ( ) Inverted file

    Dynamic hashing

    Grid structure file and artitioned hash function

    34Database Management Systems and Their Implementation, Xu Lizhen

    Bitmap index (used in data warehouse)

    Others

    Bitmap index index itself is data

    D at e S to r e S ta te C la ss S al es

    3/1/96 32 NY A 6

    3/1/96 36 MA A 9

    3/1/96 38 NY B 5

    3/1/96 41 CT A 11

    3/1/96 43 NY A 9

    3/1/96 46 RI B 3

    Bitmap Index for Sales

    8bit 4bit 2bit 1bit

    0 1 1 0

    1 0 0 1

    0 1 0 1

    1 0 1 1

    1 0 0 1

    0 0 1 1

    Bitmap Index for State

    AK AR CA CO CT MA NY RI

    0 0 0 0 0 0 1 0

    0 0 0 0 0 1 0 0

    0 0 0 0 0 0 1 0

    0 0 0 0 1 0 0 0

    0 0 0 0 0 0 1 0

    0 0 0 0 0 0 0 1

    35Database Management Systems and Their Implementation, Xu Lizhen

    3/1/96 4 7 CT B 7

    3/1/96 49 NY A 12

    1 1 0 0

    for Class

    A B C

    1 0 0

    1 0 0

    0 1 0

    1 0 0

    1 0 0

    0 1 0

    0 1 0

    1 0 0

    0 0 0 0 0 0 1 0

    Total sales = ? (4*8+4*4+4*2+6*1=62)How manyclassA store inNY ? (3)Salesof class A store in NY = ? (2*8+2*4+1*2+1*1=27)How manystores inCT ? (2)Join operation (query product list of class A store in NY)

    Access Primitives (examples)

    int dbopendb(char * dbname)

    Function: open a database.

    int dbclosedb(unsigned dbid)

    Function: close a database.

    int dbTableInfo(unsigned rid, TableInfo * tinfo)

    36Database Management Systems and Their Implementation, Xu Lizhen

    unc on: ge e n orma on o e a e re erence y r .

    int dbopen(char * tname,int mode, int flag)

    Function: open the table tname and assign a rid for it.

    int dbclose(unsigned rid)Function: close the table referenced by rid and release the rid.

    int dbrename(oldname, newname)

    Function: rename the table.

  • 7/28/2019 DBMS and Implementation All(Full)

    13/69

    13

    Access Primitives (examples)

    int dbcreateattr (unsigned rid , sstree * attrlist)

    Function: create some attributes in the table referenced by rid.

    int dbupdateattrbyidx(unsigned rid, int nth, sstree attrinfo)

    Function: update the definition of the nth attribute in the tablereferenced by rid.

    *

    37Database Management Systems and Their Implementation, Xu Lizhen

    , ,attrinfo)

    Function: update the definition of attribute attrname in the tablereferenced by rid.

    int dbinsert(unsigned rid, char * tuple, int length, int flag)

    Function: insert a tuple into the the table referenced by rid.

    Access Primitives (examples)

    int dbdelete(unsigned rid, long offset, int flag)

    Function: delete the tuple specified by offset in the tablereferenced by rid.

    int dbupdate(unsigned rid, long offset, char * newtuple, int flag)

    Function: update the tuple specified by offset in the tablereferenced b rid with newtu le.

    38Database Management Systems and Their Implementation, Xu Lizhen

    .

    int dbgetrecord(unsigned rid, int nth, char* buf)

    Function: fetch out the nth tuple from the table referenced by rid

    and put it into buffer buf. int dbopenidx(unsigned rid, indexattrstruct * attrarray, int flag)

    Function: open the index of the table referenced by rid andassign a iid for it.

    Access Primitives (examples)

    int dbcloseidx(unsigned iid)

    Function: close the index referenced by iid.

    int dbfetch(unsigned rid, char * buf, long offset)

    Function: fetch out the tuple specified by offset from the tablereferenced by rid and put it into buffer buf.

    * *

    39Database Management Systems and Their Implementation, Xu Lizhen

    , , ,

    Function: fetch out the TIDs of tuples whose value on indexedattribute has the flag relation withpvalue, and put them intooffsetbuf. iid is the reference of the index used.

    int dbpack(unsigned rid)

    Function: re-organize the relation, delete the tuples havingdeleted flag physically.

  • 7/28/2019 DBMS and Implementation All(Full)

    14/69

    14

    4. Data Distribution

    Database Management Systemsand Their Implementation, Xu

    Lizhen

    4.1 Strategies of Data Distribution

    (1) Centralized: distributed system, but the dataare still stored centralized. It is simplest, butthere is not any advantage of DDB.

    (2) Partitioned: data are distributed withoutre etition. no co ies

    41Database Management Systems and Their Implementation, Xu Lizhen

    (3) Replicated: a complete copy of DB at eachsite. Good for retrieval-intensive system.

    (4) Hybrid (mix of the above): an arbitraryfraction of DB at various sites. The mostflexible and complex distributing method.

    Comparison of four strategies

    1 2 3 4

    flexibility

    42Database Management Systems and Their Implementation, Xu Lizhen

    complexity

    Advantage of DDBS

    Problems with DDBS

  • 7/28/2019 DBMS and Implementation All(Full)

    15/69

    15

    4.2 Unit of Data Distribution

    (1) According to relation(or file), that meansnon partition

    (2) According to fragments

    43Database Management Systems and Their Implementation, Xu Lizhen

    Horizontal fragmentation: tuple partition

    Vertical fragmentation: attribute partition

    Mixed fragmentation: both

    The criteria of fragmentation:

    (1) Completeness: every tuple or attribute musthas its reflection in some fragments.

    (2) Reconstruction: should be able to

    44Database Management Systems and Their Implementation, Xu Lizhen

    reconstruct t e origina g o a re ation.

    (3) Disjointness: for horizontal fragmentation.

    (1) Horizontal Fragmentation

    Defined by selection operation with predicate, andreconstructed by union operation.

    Fragmentation Methods

    Rn fragments (use P1, , P2. . .Pn)SELECT *

    45Database Management Systems and Their Implementation, Xu Lizhen

    Fulfill: PiPj=false (ij)P1P2. . .Pn=true

    FROM R

    WHERE P ;

    Derived Fragmentation: relation is fragmented notaccording to itselfs attribute, but to another relationsfragmentation.

  • 7/28/2019 DBMS and Implementation All(Full)

    16/69

    16

    An example of Derived Fragmentation

    TEACHER(TNAME, DEPT)

    COURSE(CNAME,TNAME)

    Suppose TEACHER has been fragmented according toDEPT, we want to fragment COURSE even if there is noDEPT attribute in it. This will be the fragmentation

    46Database Management Systems and Their Implementation, Xu Lizhen

    derived from TEACHERs fragmentation.

    Semi join : R S = R (R S)

    TEACHER9 = SELECT * FROM TEACHER

    WHERE DEPT=9th;

    COURSE9=COURSE TEACHER9

    (2) Vertical Fragmentation

    Defined by project operation, and reconstructed byjoin operation. Note:

    Completeness: each attribute should appear in atleast one fragment.

    Reconstruction: should fulfill the condition of

    47Database Management Systems and Their Implementation, Xu Lizhen

    lossless join decomposition when fragmentizing.

    a. Include a key of original relation in every fragment.

    b. Include a TID of original relation produced bysystem in every fragment.

    (3) Mixed Fragmentation

    Apply fragmentation operations recursively.

    Can be showed with a fragmentation tree:

    R Global relation

    48Database Management Systems and Their Implementation, Xu Lizhen

    R13

    R2R1

    R12R11 Fragments

    V

    H

  • 7/28/2019 DBMS and Implementation All(Full)

    17/69

    17

    4.3 Different Transparency Level

    We can simplify a complex problem throughinformation hiding method

    Level 1: Fragmentation Transparency

    User only need to know global relations, he

    49Database Management Systems and Their Implementation, Xu Lizhen

    dont have to know if they are fragmentizedand how they are distributed. In thissituation, user can not feel the distribution ofdata, as if he is using a centralized database.

    Level 2: Location Transparency

    User need to know how the relations arefragmentized, but he dont have to worry thestore location of each fragment.

    50Database Management Systems and Their Implementation, Xu Lizhen

    User need to know how the relations arefragmentized and how they are distributed,but he dont have to worry every localdatabase managed by what kind of DBMS,using what DML, etc.

    Level 4No Transparency

    4.4 Problems Caused by Data Distribution

    1) Multi copies consistency2) Distribution consistency

    Mainly the change of tuples store locationresulted by updating operation. Solution:

    (1) Redistribution

    51Database Management Systems and Their Implementation, Xu Lizhen

    After Update: SelectMoveInsertDelete(2) PiggybackingCheck tuple immediately while updating, if

    there is any inconsistency it is sent back alongwith ACK information and then sent to the rightplace.

  • 7/28/2019 DBMS and Implementation All(Full)

    18/69

    18

    3) Translation of Global Queries to FragmentQueries and Selection of Physical Copies.

    4) Design of Database Fragments and Allocation

    52Database Management Systems and Their Implementation, Xu Lizhen

    o ragments.

    Above 1)~3) should be solved in DDBMS.

    While 4) is a problem of distributed databasedesign.

    4.5 Distributed Database Design

    1) Distributed database

    The design of fragments

    The design of fragment distribution solution

    To understand users requirements, we

    53Database Management Systems and Their Implementation, Xu Lizhen

    should ask the following questions to everyapplication while requirement analysis:

    The sites where this application may occur

    The frequency of this application

    The data object accessed by this application

    Design flow of centralized database

    Requirement Analysis

    Concept Design

    Informationrequirement

    Processrequirement

    Requirementindication

    DBMSfeature

    A

    54Database Management Systems and Their Implementation, Xu Lizhen

    Logic Design

    Physical Design

    Hardware, OS feature

    DBMS independentdata schema

    Outer schema,concept schema

    Inner schema

  • 7/28/2019 DBMS and Implementation All(Full)

    19/69

    19

    Design flow of distributed database

    A

    Fragmentation Design

    Information

    requirement

    Process

    requirement

    Concept schema DDBMSfeature

    B

    55Database Management Systems and Their Implementation, Xu Lizhen

    Distribution Design

    Physical design of each site

    Hardware, OS feature of each site

    Distribution units

    Local schema ofevery site

    Inner schema

    (1) Fragmentation design

    In DDB, it is not true that the fragments should bedivided as fine as possible. It should be fragmentizedaccording to the requirement of application. Forexample, there are following two applications:

    App1: SELECT GRADE FROM STUDENT

    = th

    56Database Management Systems and Their Implementation, Xu Lizhen

    App2: SELECT AVG(GRADE)

    FROM STUDENT

    WHERE SEX=Male;

    Problem:

    if STUDENT should be fragmentized horizontallyaccording to DEPT?

    General rules:

    Select some important typical applications whichoccur often.

    Analyze the local feature of the data accessed bythese applications.

    For horizontal fragmentation:

    57Database Management Systems and Their Implementation, Xu Lizhen

    Select suitable predicate to fragmentize the globalrelations to fit the local requirement of each site. Ifthere is any contradiction, consider the need of moreimportant application.

    Analyze the join operations in applications to decideif derived fragmentation is needed.

  • 7/28/2019 DBMS and Implementation All(Full)

    20/69

    20

    For vertical fragmentation:

    Analyze the affinity between attributes, andconsider:

    Save storage space and I/O cost

    58Database Management Systems and Their Implementation, Xu Lizhen

    .by some users.

    (2) Distribution design

    Through cost estimation, decide the suitablestore location (site) of each distribution unit.p252

    2) Parallel database

    What is parallel database system?

    Share Noting (SN) structure

    Vertical parallel and horizontal parallel

    A complex query can be decomposed into severaloperation steps, the parallel process of these steps is

    59Database Management Systems and Their Implementation, Xu Lizhen

    ca e ver ca para e .

    For the scan operation, if the relation to be scanned isfragmentized beforehand into several fragments, andstored on different disks of a SN structured parallelcomputer, then the scan can be processed on thesedisk in parallel. This kind of parallel is calledhorizontal parallel.

    Example:

    SELECT *

    FROM R,S

    WHERE R.a=S.a ANDR.b>20 AND

    R=b>20(R)

    S=c

  • 7/28/2019 DBMS and Implementation All(Full)

    21/69

    21

    Design flow of parallel database

    A

    Fragmentation Design

    Information

    requirement

    Process

    requirement

    Concept schema

    PDBMSfeature andthe feature ofparallel computer

    system used

    61Database Management Systems and Their Implementation, Xu Lizhen

    Distribution design of data on different

    disks of the parallel computer

    Physical design of each disk

    Distribution units

    Local schema of

    every disk

    Inner schema

    C

    Data fragmentation mode in PDB

    (1)Arbitrary

    Fragmentize relation R in arbitrary mode,then stored these fragments on the disk ofdifferent processor. For example, R may bedivided avera el , or hashed into several

    62Database Management Systems and Their Implementation, Xu Lizhen

    fragments, etc.

    (2)Based on expression

    Put the tuples fulfill some condition into afragment. Suitable for the situation in which themost query are based on fragmentationconditions. --- excluded respectively.

    The difference between PDB and DDB about datafragmentation and distribution

    PDB DDB

    The goal offragmentationanddistribution

    Promote parallel processdegree, use the parallelcomputers ability asadequately as possible

    Promote the local degree ofdata access, reduce the datatransferred on network

    63Database Management Systems and Their Implementation, Xu Lizhen

    in accordancewith

    feature of parallel computersystem used, combiningapplication requirements.

    ,combining the feature ofDDBMS used.

    Distributionmode

    On multi disks of a parallelcomputer

    On multi sites in thenetwork

  • 7/28/2019 DBMS and Implementation All(Full)

    22/69

    22

    Design flow of DDB with PDB as local database

    B

    If site i is a parallel

    computer?

    N

    64Database Management Systems and Their Implementation, Xu Lizhen

    C

    Local physical design

    of general DDB

    Inner schema

    Y

    3) Federated database

    In practical applications, there are strongrequirements for solving the integration of multiexisting, distributed and heterogeneous databases.

    The database system in which every member isautonomic and collaborate each other based on

    65Database Management Systems and Their Implementation, Xu Lizhen

    nego a on --- e era e a a ase sys em.

    No global schema in federated database system,

    every federated member keeps its own data schema. The members negotiate each other to decide

    respective input/output schema, then, the datasharing relations between each other are established.

    The schema structure in federated database System

    FS1 FS3FS2

    IS3IS2IS1

    66Database Management Systems and Their Implementation, Xu Lizhen

    ES3ES2ES1

    CS3CS2CS1

    DB1 DB2 DB3

    Site1 Site3Site2

  • 7/28/2019 DBMS and Implementation All(Full)

    23/69

    23

    FSi

    = CSi

    + ISi

    FSi is all of the data available for the users on sitei.

    ISi is gained through the negotiation with ESj of othersites (ji).

    -

    67Database Management Systems and Their Implementation, Xu Lizhen

    i i i the sub-queries on corresponding ESj.

    The results gained from ESj the result forms ofcorresponding ISi, and combined with the results getfrom the sub-queries on CSi, then synthesized to theeventual result form of FS i.

    4.6 The Distribution of Catalog

    Catalog --- Data about data (Metadata).

    Its main function is to transfer the users operatingdemands to the physical targets in system.

    4.6.1 Contents of Catalogs

    (1) The type of each data object (such as base table,

    68Database Management Systems and Their Implementation, Xu Lizhen

    v ew, . . . an s sc ema

    (2) Distribution information (such as fragmentlocation, . . .)

    (3) Access routing (such as index, . . .)

    (4) Grant information

    (5) Some statistic information used in queryoptimization

    (1)~(4) are not changed frequently, while (5) willchange on every update operation. For it:

    Update it periodically

    Update it after every update operation

    4.6.2 The features of catalog

    69Database Management Systems and Their Implementation, Xu Lizhen

    (1) mainly read operation on it

    (2) very important to the efficiency and the datadistribution transparency of the system

    (3) very important to site autonomy

    (4) the distribution of catalog is decided by thearchitecture of DDBMS, not by applicationrequirements

  • 7/28/2019 DBMS and Implementation All(Full)

    24/69

    24

    4.6.3 Distribution strategies of catalog

    (1)Centralized

    A complete catalog stored at one site.

    Extended centralized catalog: centralized atfirst; saved after being used; notify while

    70Database Management Systems and Their Implementation, Xu Lizhen

    (2) Fully replicated: catalogs are replicated ateach site. Simple in retrieval. Complex inupdate. Poor in autonomy.

    (3) Local catalog: catalogs for local data arestored at each site. That means catalogs arestored along with data.

    If want the catalog information about data on other site(look for through broadcast):

    Master catalog: store a complete catalog on some site.Make every catalog information has two copies.

    Cache: after getting and using the catalog informationabout data on other site through broadcast, save it forfuture use (cache it). Update the catalog cached

    71Database Management Systems and Their Implementation, Xu Lizhen

    roug e compar son o vers on num er.

    R

    Rs catalog(Ver No.) Cache of Rs catalog

    (Ver No.)

    (4) Different Combination of the above

    Use different strategy to different contents ofcatalog, and then get different combinationstrategies. Such as:

    a. Use full re licated strate to distribution

    72Database Management Systems and Their Implementation, Xu Lizhen

    information (2), use local catalog strategy toother parts.

    b. Use local catalog strategy to statisticinformation, use fully replicated strategy toother parts.

  • 7/28/2019 DBMS and Implementation All(Full)

    25/69

    25

    4.6.4 Catalog management in R*--- Site autonomy

    Characteristics:

    There is no global catalog

    Independent naming and data definition

    The catalog grows reposefully

    ---

    73Database Management Systems and Their Implementation, Xu Lizhen

    ---(SWN)

    ::[email protected]@BirthSite

    ObjectName: the name given by user for the dataobject

    User: users name. With this, different users canaccess different data object using the same name.

    UserSite: the ID of the site where the User is. Withthis, different users on different sites can use the sameuser name.

    BirthSite: the birth site of the data object. There is noglobal catalog in R* system. At the BirthSite the

    74Database Management Systems and Their Implementation, Xu Lizhen

    information about the data is always kept even thedata is migrated to other site.

    Print Name (PN): user used normally when theyaccess a data object.

    ::=[User[@UserSite].]ObjectName[@BirthSite]

    Name Resolution --- Mapping PN to SWN

    Establish a synonym table for eachuser using Define Synonym ...statement.

    ObjectName SWN

    Mapping PN in different forms according tofollowing rules:

    75Database Management Systems and Their Implementation, Xu Lizhen

    1) PrintName = SWN, need not transform

    2) Only have ObjectName: search ObjectName inthe synonym table of current user on current site.

    3) User.ObjectName: search the synonym table ofuser User on current site.

    4) [email protected]

  • 7/28/2019 DBMS and Implementation All(Full)

    26/69

    26

    5) ObjectName@BirthSite

    If no match for the ObjectName is found in (2), (3) orprint name is in the form of (4) or (5), namecompletion is used.

    Name Resolution --- Mapping PN to SWN

    76Database Management Systems and Their Implementation, Xu Lizhen

    A missing User is replaced by current User.

    A missing UserSite or BirthSite is replaced by currentsite ID.

    5. Query Optimization

    Database Management Systemsand Their Implementation, Xu

    Lizhen

    Introduction

    Rewrite the query statementssubmitted by user first, and thendecide the most effective operatingmethod and steps to get the result.

    78Database Management Systems and Their Implementation, Xu Lizhen

    The goal is to gain the result of usersquery with the lowest cost and inshortest time.

  • 7/28/2019 DBMS and Implementation All(Full)

    27/69

    27

    5.1 Summary of Query Optimization inDDBMS

    Global Query: a query over globalrelation.

    Fragment Query: a query over

    79

    ragments.

    Database Management Systems and Their Implementation, Xu Lizhen

    General flow

    Global Query

    Transfer it to fragment queries

    1) Transform the queries tree into the most effective form

    2) Query decomposition (into several sub queries which can be

    Query tree

    GlobalOptimiz

    Algebr a optimi zation

    80Database Management Systems and Their Implementation, Xu Lizhen

    executed locally)3) Decide the order and site of the operations

    Execute each operation according to the schedule in query plan

    Query result

    Query plan

    tion

    LocalOptimization

    Operation optimization

    Example

    S(SNUM, SNAME, CITY)

    SP(SNUM, PNUM, QUAN)

    P(PNUM, PNAME, WEIGHT, SIZE)

    Suppose the fragmentation is as following:

    81Database Management Systems and Their Implementation, Xu Lizhen

    S1 = CITY=Nanjing(S)S2 = CITY=Shanghai(S)

    SP1 = SP S1

    SP2 = SP S2

  • 7/28/2019 DBMS and Implementation All(Full)

    28/69

    28

    Global Query:

    SELECT SNAME

    FROM S, SP, P

    WHERE S.SNUM=SP.SNUM AND

    SP.PNUM=P.PNUM AND

    82Database Management Systems and Their Implementation, Xu Lizhen

    S.CITY=Nanjing AND

    P.PNAME=Bolt AND

    SP.QUAN>1000;

    Query tree

    (S.SNAME)

    3 P.PNAME=Bolt

    SP.PNUM=P.PNUM

    S.SNUM=SP.SNUM

    83Database Management Systems and Their Implementation, Xu Lizhen

    S1 S2

    1 CITY=Nanjing 2 QUAN>1000 P

    S SP

    SP1 SP2

    After equivalent transform (Algebra optimization) :

    (S.SNAME)

    3

    SP.PNUM=P.PNUM

    S.SNUM=SP.SNUM3

    84Database Management Systems and Their Implementation, Xu Lizhen

    S1 S2

    P

    SP1 SP2

    11 222211

    1= (S.SNUM,S.SNAME)

    2= (SP.SNUM,SP.PNUM)

    3= (P.PNUM)S1 S2 SP1 SP2

    P

  • 7/28/2019 DBMS and Implementation All(Full)

    29/69

    29

    The result of equivalent transform

    (S.SNAME)

    P

    SP.PNUM=P.PNUM

    S.SNUM=SP.SNUM

    85Database Management Systems and Their Implementation, Xu Lizhen

    S1 S2

    SP1 SP2

    The operation optimization of the sub tree (yellow) :

    First consider distributed JN: (1) (S1 S2) (SP1 SP2)

    (2) Distributed Join

    Then consider Site Selection, may produce many combination

    For every join operation, there are many computing method:

    R Site , R S

    86Database Management Systems and Their Implementation, Xu Lizhen

    SRSite jSite i

    S Site i, R S

    JN Attr.(S) Site i, R S Site j, (R S) S

    The goal of query optimization is to select a good solution fromso many possible execution strategies. So it is a complex task.

    5.2 The Equivalent Transform of a Query

    That is so called algebra optimization. It takes a series oftransform on original query expression, and transform itinto an equivalent, most effective form to be executed.For example: NAME,DEPTDEPT=15(EMP)DEPT =15NAME,DEPT (EMP)

    (1)Query tree

    87Database Management Systems and Their Implementation, Xu Lizhen

    For example: SNUMAREA=NORTH(SUPPLYDEPTNUM DEPT)

    (SNUM)

    AREA = North

    DEPT

    DEPTNUM=DEPTNUM

    Supply

    Leaves: global relationMiddle nodes: unary/binaryoperationsLeaves root: the executingorder of operations

  • 7/28/2019 DBMS and Implementation All(Full)

    30/69

    30

    (2) The equivalent transform rules of relational algebra

    1) Exchange rule of /: E1 E2 E2 E12) Combination rule of /: E1(E2E3) (E1E2)E33) Cluster rule of : A1An(B1Bm(E)) A1An(E),

    legal when A1An is the sub set of {B1Bm}

    88Database Management Systems and Their Implementation, Xu Lizhen

    5) Exchange rule of and : F(A1An(E)) A1An (F(E))if F includes attributes B1Bm which dont belong toA1An, then A1An (F(E)) A1AnF(A1AnB1Bm(E))

    6) If the attributes in F are all the attributes in E1, then

    F(E1E2) F(E1)E2

    if F in the form of F1F2, and there are only E1sattributes in F1, and there are only E2s attributes inF2, then F(E1E2) F1(E1)F2(E2)if F in the form of F1F2, and there are only E1sattributes in F1, while F2 includes the attributes both

    89Database Management Systems and Their Implementation, Xu Lizhen

    , F F2 F17) F(E1 E2) F(E1) F(E2)

    8) F(E1 - E2)F(E1) - F(E2)9) Suppose A1An is a set of attributes, in which

    B1Bm are E1s attributes, and C1Ck are E2sattributes, then

    A1An(E1E2) B1Bm(E1)C1Ck(E2)

    10) A1An(E1 E2) A1An(E1) A1An(E2)From the above we can see, the goal of algebraoptimization is to simplify the execution of the query,and the target is to make the scale of the operandswhich involved in binary operations be as small aspossible.

    90Database Management Systems and Their Implementation, Xu Lizhen

    (SNUM)

    (DEPTNUM)

    DEPTNUM=DEPTNUM

    (SNUM, DEPTNUM)

    Supply DEPT

    AREA = North

    (3) The generalprocedure of algebraoptimization pleaserefer to p118.

  • 7/28/2019 DBMS and Implementation All(Full)

    31/69

    31

    5.3 Transform Global Queries intoFragment Queries

    Methods:

    For horizontal fragmentation: R = R1 R2 Rn For vertical fragmentation: S = S1 S2 Sn

    Replace the global relation in query expression with the above.The expression we get is called canonical expression

    Transform the canonical ex ression with the e uivalent

    91Database Management Systems and Their Implementation, Xu Lizhen

    transform rules introduced above. Principles:

    1) Push down the unary operations as low as possible

    2) Look for and combine the common sub-expression

    Definition: the sub-expression which occurs more than once inthe same query expression. If find this kind of sub-expressionand compute it only once, it will promote query efficiency.

    General method:

    (1) combine the same leaves in the query tree

    (2) combine the middle nodes corresponding to thesame operation with the same operands.

    example: emp.name(emp (mgrnum=373 dept)

    92Database Management Systems and Their Implementation, Xu Lizhen

    sal>35000 mgrnum=373

    emp.name

    deptnum = deptnum

    dept

    -

    mgrnum=373

    empdept

    mgrnum=373sal>35000

    emp

    combinecombine

    move up

    combine

    combine

    emp.name

    deptnum

    -

    sal>35000

    Properties: R R R R R R RR R F(R) F(R)

    93Database Management Systems and Their Implementation, Xu Lizhen

    dept

    mgrnum=373

    emp

    Common sub-expression:emp (mgrnum=373 dept)

    F R F(R) not F R F1(R) F2(R) F1F2(R) F1(R) F2(R) F1F2(R) F1(R)F2(R) F1not F2(R)

  • 7/28/2019 DBMS and Implementation All(Full)

    32/69

    32

    emp.name

    deptnum

    sal

  • 7/28/2019 DBMS and Implementation All(Full)

    33/69

    33

    Decomposition method :

    Traverse the query tree in post-order, until j become 2,

    then get the first sub-tree. The rest may be deduced byanalogy, so we can get all of the sub-trees.

    R1 R2 R3

    97Database Management Systems and Their Implementation, Xu Lizhen

    R41

    R11 R2

    1

    R31

    21 434321

    R5

    55

    R6

    66

    R1

    R2 R3

    Site1 Site3Site2 Assembling tree

    5.5 The Optimization of BinaryOperations

    The executions of local sub-queries are responsible bylocal DBMS. The query optimization of DDBMS isresponsible for the global optimization, that is theexecution of assembling tree.

    Because the executions of unary operations are

    98Database Management Systems and Their Implementation, Xu Lizhen

    respons e y oca a er a ge ra op m za onand query decomposition, the global optimization ofDDBMS only need to consider the binary operations,mainly the join operation.

    How to find a good access strategy to compute thequery improved by algebra optimization is introducedin this section.

    I. Main problems in global optimization

    1) Materialization: select suitable copies offragments which involved in the query

    2) Strategies for join operation

    Non distributed oin:

    99Database Management Systems and Their Implementation, Xu Lizhen

    _(R2 U R3) R1

    Distributed join:(R1 R2) (R1R3)

    irect join

    Use of semi_join

    3) Select execution of each operation (mainly todirect join)

  • 7/28/2019 DBMS and Implementation All(Full)

    34/69

    34

    Nested loop: one relation acts as outer loop relation

    (O), the other acts as inner loop relation (I). Forevery tuple in O, scan I one time to check joincondition.

    Because the relation is accessed from disk in the

    100Database Management Systems and Their Implementation, Xu Lizhen

    un o oc , we can use oc u er o mproveefficiency. For R S, if let R as O, S as I, bR isphysical block number of R, bS is physical blocknumber of S, there are nB block buffers in system(nB>=2), and nB-1 buffers used for O, one bufferused for I, then the total disk access times neededto compute R S is:

    bRbR/(nB-1)

    bS

    Merge scan: order the relation R and S on disk inahead, then we can compare their tuples in order,and both relation only need to scan one time. If Rand S have not ordered in ahead, must consider theordering cost to see if it is worth to use this method(p122)

    101Database Management Systems and Their Implementation, Xu Lizhen

    Using in ex or as to oo or mapping tup es: innested loop method, if there is suitable access routeon I (say B+ tree index), it can be used to substitutesequence scan. It is best when there is cluster index orhash on join attributes.

    Hash join: because the join attributes of R and S havethe same domain, R and S can be hashed into thesame hash file using the same hash function, then R S can be computed based on the hash file.

    The above are all classical algorithms in centralizedDBMS. The idea is similar when computing join inDDBMS using direct join strategy.

    II. Three general optimization methods:1) By cost comparison (also called exhaustive search)2 B heuristic rule: enerall used in small s stems.

    102Database Management Systems and Their Implementation, Xu Lizhen

    (p124)3) Combination of 1, 2: eliminate the solutions

    unsuitable obviously by using 2 to reduce solutionspace, then use 1 to compare cost of the restsolutions carefully.

    III. Cost estimationTotal query cost=Processing cost+Transmission cost

  • 7/28/2019 DBMS and Implementation All(Full)

    35/69

    35

    According to different environment:

    For wide area network: the transfer rate is about100bps~50Kbps, far slow than processing speed incomputer, so Processing cost can be omitted.

    For local area network: the transfer rate will reach

    103Database Management Systems and Their Implementation, Xu Lizhen

    1000Mbps, both items should be considered.

    1) Transmission cost

    TC(x)=C0 + C1x

    x: the amount of data transferred; C0: cost ofinitialization; C1: cost of transferring one data unit onnetwork. C0 , C1 rely on the features of the networkused.

    2)Processing cost

    Processing cost = costcpu + costI/Ocostcpu can be omitted generally.

    cost of one I/O = D0 + D1D0: the average time looking for track (ms);

    D : time of one data unit I/O s, can be omitted

    104Database Management Systems and Their Implementation, Xu Lizhen

    costI/O = no. of I/OD0

    Notice: calculate query cost accurately is unnecessaryand unpractical. The goal is to find a good solutionthrough the comparison between different solutions,so only need to estimate the execution cost ofdifferent solutions under the same executionenvironment.

    5.6 Implement Join Operation WithSemi_join

    I. The role of semi_join

    Semi_join is used to reduce transmission cost. So it issuitable for WAN only.

    R S = R(R S)

    if R and S are stored on site 1 and 2 respectively, the

    105Database Management Systems and Their Implementation, Xu Lizhen

    steps to realize R S with is as following:

    1) TransferA(S) site1, A is join attribute

    2) Execute R A(S)=R S on site1 (compress R)

    3) Transfer R S site2

    4) Execute (R S) S = R S on site2

  • 7/28/2019 DBMS and Implementation All(Full)

    36/69

    36

    Cost comparison

    Cost of direct join = C0 + C1min(r, s) ------

    r, s --- |R|, |S| (size of the relations) Cost of join via semi_join =

    min(2C0 + C1s + C1r, 2C0 + C1r + C1s) =

    2C0+C1min(s + r, r + s) ------

    106Database Management Systems and Their Implementation, Xu Lizhen

    s, r --- |A(S)|, |A(R)|

    s, r --- |S R|, |R S|

    Only when

  • 7/28/2019 DBMS and Implementation All(Full)

    37/69

    37

    Query graph:

    Link the two relations with a line if there is

    between them, then we can get thequery graph. The query whose querygraph like the left graph is called treequery (TQ)

    R1

    R2 R3

    R5 R6R4

    = = = =

    109Database Management Systems and Their Implementation, Xu Lizhen

    1. 2. 2. 3. 3. 1.

    R1

    R2 R3

    The query whose query graph like the leftgraph is called cyclic query (CQ)

    Example 3: q = (R1.A=R2.B)(R2.B=R3.C)(R3.C=R1.A)

    This is a TQ, not a CQ, because R3.C=R1.A can be obtainedfrom transfer relation, it is not a independent condition.

    Can be proved:

    1) Full reducer exists for TQ.

    2) No full reducer exists for CQ in many cases.

    R1 A B

    0 1

    R2 C D

    1 2

    R3 E F

    2 3

    110Database Management Systems and Their Implementation, Xu Lizhen

    q = (R1.B=R2.C)(R2.D=R3.E)(R3.F=R1.A)Even if the result of this query is empty, the size of anyone of R1, R2 and R3 can not be decreased through . Sothere is not full reducer for this query.

    Another example about full reducer of CQ

    R A B

    1 a

    2 b

    3 c

    S B C

    a x

    b y

    c z

    T C A

    x 2

    y 3

    z 4

    111Database Management Systems and Their Implementation, Xu Lizhen

    q = . = . . = . . = . , s ere u re ucer

    R=R T=A B

    2 b

    3 c

    S=S R=B C

    b y

    c z

    T=T S=C A

    y 3

    z 4

    R=R T=A B

    3 cS=S R= T=T S=

    B C

    c z

    C A

    z 4

  • 7/28/2019 DBMS and Implementation All(Full)

    38/69

    38

    R=R T=, So the full reducer of relation R inquery q is :

    R=R T S=S R T=T S R=R T

    S=S R T=T S R=R T= Conclusion:

    For CQ: there is no full reducer in general. Even ifthere is, its length increases linearly along with the

    112Database Management Systems and Their Implementation, Xu Lizhen

    number of tuples of some relation in the query. (about3(m-1))

    For TQ: the length of full reducer < n-1 (n is thenumber of nodes in the query graph).

    So the full reducer can not be the optimization targetwhen execute join operation under distributedenvironment through .

    SDD-1 Algorithm: heuristic rule (p189~190)

    5.7 Direct Join

    --- Implementation method of join operation in R*

    I. Two basic implementation method of join operation

    113Database Management Systems and Their Implementation, Xu Lizhen

    algorithm in centralized DBMS

    cost = (bR

    +bR/(n

    B- 1)* b

    S) * D

    0

    Merge Scan: the extension of correspondingalgorithm in centralized DBMS

    cost = (bR + bS) * D0 + Costsort(R) + Costsort(S)

    II. Transmission of relations in these two methods

    1) shipped whole : The whole relation is shippedwithout selections.

    For I, a temporary relation is established at thedestination site for further use.

    For O, the relation doesnt need storing.

    114Database Management Systems and Their Implementation, Xu Lizhen

    2) Fetch as need : The whole relation is not shipped.The tuples needed by the remote site are sent at itsrequest. The request message usually contains the valueof join attribute.

    Usually there is an index on the join attribute at therequested site.

  • 7/28/2019 DBMS and Implementation All(Full)

    39/69

    39

    III. Six implementation strategies of join in R*

    Nested Loop O: shipped whole

    I : fetch as neededMerge Scan O: shipped whole

    I : shipped whole

    fetch as needed

    115Database Management Systems and Their Implementation, Xu Lizhen

    Shipping whole O and I to a 3rd site (NL or MS)

    It is obvious that O should be shipped whole.

    In NL, if I is shipped whole, index cant be shippedalong with it, moreover temporary relation isrequired. Both processing cost and storage cost arehigh.

    Six strategies dont include:

    Multiple join --- transformed into multibinary joins.

    Copy selection --- because R* doesntsupport multi copies.

    116Database Management Systems and Their Implementation, Xu Lizhen

    5.8Distributed Grouping & Aggregate FunctionEvaluation

    SELECT PNUM, SUM(QUAN)

    FROM SP

    GROUP BY PNUM;

    That is : GBPNUM, SUM(QUAN)SP

    There are the following conclusions about grouping &aggregate function evaluation in distributed computingenvironment :

    1) Suppose Gi is a group gotten through grouping to R1 R2according to some attribute set, iff G iRj OR GiRj = for all i, j------(SNC), then :

    GBG, AF(R1 R2) = (GBG, AFR1) (GBG, AFR2)

    For example:

    117Database Management Systems and Their Implementation, Xu Lizhen

    , ;

    If SP is derived fragmented according to the suppliers city:conform to SNC, so the grouping & aggregate can be evaluateddistributed.

    If SP is derived fragmented according to the parts type: dontconform to SNC, because the same supplier may provide morethan one kinds of part at same time. In this situation thegrouping & aggregate can not be evaluated distributed.

  • 7/28/2019 DBMS and Implementation All(Full)

    40/69

    40

    2) If SNC does not hold, it is still possible to computesome aggregate functions of global relationdistributed

    Suppose global relation: S

    fragments: S1, S2, , Snthen:

    SUM S = SUM SUM S SUM S SUM S

    118Database Management Systems and Their Implementation, Xu Lizhen

    COUNT(S) = SUM(COUNT(S1), COUNT(Sn))

    AVG(S) = SUM(S)/COUNT(S)

    MIN(S) = MIN(MIN(S1), MIN(S2), , MIN(Sn))

    MAX(S) = MAX(MAX(S1), MAX(S2), , MAX(Sn))

    5.9 Update Strategies

    The consistency between multi copies must beconsidered while executing update, because anydata may have multi copies in DDB.

    1) Updating all strategy

    119Database Management Systems and Their Implementation, Xu Lizhen

    unavailable.

    p --- probability of availability of a copy.n --- No. of copies

    The probability of success of the update=pn

    0lim

    n

    n

    p

    2) Updating all available sites immediately andkeeping the update data at spooling site forunavailable sites, which are applied to that siteas soon as it is up again.

    3) Primary copy updating strategy

    120Database Management Systems and Their Implementation, Xu Lizhen

    ss gn a copy as pr mary copy. e rema n ngcopies called secondary copies.Update : update P.C, then P.C broadcast theupdate to S.Cs at sometimes.P.C maybe inconsistent with S.C temporarily.There is no problem if the next operation is stillupdate. While if the next operation is a read tosome S.C, then:

  • 7/28/2019 DBMS and Implementation All(Full)

    41/69

    41

    Compare the version No. of S.C with that of

    P.C, if version No. are equal, read S.C directly;else:

    (1) redirect the read to P.C

    (2) wait the update of S.C

    121Database Management Systems and Their Implementation, Xu Lizhen

    4) Snapshot

    Snapshot is a kind of copy image notfollowed the changes in DB.

    Master copy at one site, many snapshots aredistributed at other sites.

    Update: master copy only.

    Read: master copy

    snapshotsis indicated by users

    The snapshot can be refreshed:

    (1) periodically

    122Database Management Systems and Their Implementation, Xu Lizhen

    (2) forced refreshing by REFRESH command

    Snapshot is suitable for the application

    systems in which there is less update, such ascensus system, etc.

    6. Recovery

    Database Management Systemsand Their Implementation, Xu

    Lizhen

  • 7/28/2019 DBMS and Implementation All(Full)

    42/69

    42

    6.1 Introduction

    The main roles of recovery mechanism in DBMS are:

    (1) Reducing the likelihood of failures (prevention)

    (2) Recover from failures (solving)

    Restore DB to a consistent state after some failures.

    124Database Management Systems and Their Implementation, Xu Lizhen

    Redundancy is necessary.

    Should inspect all possible failures.

    General method:

    1) Periodical dumping

    Variation : Backup + Incremental dumping

    I.D --- updated parts of DB

    tdumping dumping failure

    Update lost

    125Database Management Systems and Their Implementation, Xu Lizhen

    tBackup I.D failureI.D

    Update lost

    This method is easy to be implemented and theoverhead is low, but the update maybe lost after failureoccurring. So it is often used in file system or smallDBMS.

    2) Backup + Log

    Log : record of all changes on DB since the lastbackup copy was made.

    Change:Old value (before image --- B.I)New value (after image --- A.I)

    Recordedinto Log

    126Database Management Systems and Their Implementation, Xu Lizhen

    or up ate op. : . .

    insert op. : ---- A.I

    delete op. : B.I ----

    tBackup failure

    Log

  • 7/28/2019 DBMS and Implementation All(Full)

    43/69

    43

    While recovering:

    Some transactions maybe half done, shouldundo them with B.I recorded in Log.

    Some transactions have finished but the

    127Database Management Systems and Their Implementation, Xu Lizhen

    results have not been written into DB in time,should redo them with A.I recorded in Log.(finish writing into DB)

    It is possible to recover DB to the most recentconsistent state with Log.

    3) Multiple Copies

    Advantages:

    1 increase reliabilit

    There are multi copies for every data object. Recoveredwith other copies while failure occurs. The systemcannot be collapsed because of some copys failure.

    128Database Management Systems and Their Implementation, Xu Lizhen

    (2) recovery is very easy

    Problems:

    (1) difficult to acquire independent failure modes incentralized database systems.

    (2) waste in storage space

    So this method is not suitable for Centralized DBMS.

    6.2 Transaction

    A transaction T is a finite sequence of actions onDB exhibiting the following effects:

    Atomic action: Nothing or All.

    Consistency preservation: consistency state of

    129Database Management Systems and Their Implementation, Xu Lizhen

    ano er cons s ency s a e o .

    Isolation: concurrent transactions should runas if they are independent each other.

    Durability:The effects of a successfullycompleted transaction are permanentlyreflected in DB and recoverable even failureoccurs later.

  • 7/28/2019 DBMS and Implementation All(Full)

    44/69

    44

    Example: transfer money s from account A to account B

    Begin transaction

    read A

    A:=A-s

    if A

  • 7/28/2019 DBMS and Implementation All(Full)

    45/69

    45

    6.4 Commit Rule and Log Ahead Rule

    6.4.1 Commit Rule

    A.I must be written to nonvolatile storage beforecommit of the transaction.

    6.4.2 Log Ahead Rule

    If A.I is written to DB before commit then B.I must

    133Database Management Systems and Their Implementation, Xu Lizhen

    first written to log.

    6.4.3 Recovery strategies

    (1)The features of undo and redo (are idempotent) :

    undo(undo(undo undo(x) )) = undo(x)

    redo(redo(redo redo(x) )) = redo(x)

    (2) Three kinds of update strategy

    a) A.IDB before commit

    TIDactive list

    B.ILog (Log Ahead Rule)

    A.IDB

    134Database Management Systems and Their Implementation, Xu Lizhen

    TIDcommit listdelete TID from active list

    commit

    The recovery after failure in this situation

    Check two lists for every TID while restartingafter failure:

    Commitlist

    Activelist

    135Database Management Systems and Their Implementation, Xu Lizhen

    undo, delete TID fromactive list

    delete TID from active list

    nothing to do

  • 7/28/2019 DBMS and Implementation All(Full)

    46/69

    46

    b) A.IDB after commit

    TIDactive list

    A.ILog (Commit Rule)

    136Database Management Systems and Their Implementation, Xu Lizhen

    TIDcommit list

    A.IDB

    delete TID from active list

    commit

    The recovery after failure in this situation

    Check two lists for every TID while restartingafter failure:

    Commitlist

    Activelist

    137Database Management Systems and Their Implementation, Xu Lizhen

    delete TID from active list

    redo, delete TID fromactive list

    nothing to do

    c) A.IDB concurrently with commit

    TIDactive list

    A.I, B.ILog (Two Rules)

    A.IDB (partially done)

    138Database Management Systems and Their Implementation, Xu Lizhen

    TIDcommit list

    A.IDB (completed)

    delete TID from active listcommit

  • 7/28/2019 DBMS and Implementation All(Full)

    47/69

    47

    The recovery after failure in this situation

    Check two lists for every TID while restartingafter failure:

    Commitlist

    Activelist

    139Database Management Systems and Their Implementation, Xu Lizhen

    undo, delete TID fromactive list

    redo, delete TID fromactive list

    nothing to do

    Conclusion :

    redo undo

    a)

    140Database Management Systems and Their Implementation, Xu Lizhen

    b)

    c)

    d)?

    6.5 Update Out of Place

    Keep two copies for every page of a relation

    Keep a page table (PT) for every relation

    When updating some page, produce a newpage out of place, change the correspondingpointer in page table while the transaction

    141Database Management Systems and Their Implementation, Xu Lizhen

    committing, let it point to new page.

    new

    PT

    old

    Suppose relation Rhas N pages, then thelength of its PT is N

  • 7/28/2019 DBMS and Implementation All(Full)

    48/69

    48

    P141 : lories approach

    commit

    Master Record (in memory)

    kth

    Master Record (on disk)

    142Database Management Systems and Their Implementation, Xu Lizhen

    New

    PTji

    PT1

    Old

    PTj PTn

    6.6 Recovery Procedures

    Failure types :

    1) Transaction failure: because of some reason beyondexpectation, the transaction has to be aborted.

    2) System failure: the operating system collapse, but theDB on disk is not damaged. Such as power cutsuddenl .

    143Database Management Systems and Their Implementation, Xu Lizhen

    .

    3) Media failure: disk failure, the DB on the disk isdamaged.

    Solutions :1) Transaction failure: because it must occur before

    committing :

    Undo if necessary

    Delete TID from active list

    2) System failure:

    Restore the system

    Undo or redo if necessary

    3) Media failure:

    144Database Management Systems and Their Implementation, Xu Lizhen

    Load the latest dump

    Redo according the log

  • 7/28/2019 DBMS and Implementation All(Full)

    49/69

    49

    6.7 System Start Up

    1) Emergency restart

    Start after system or media failure. Recoveryis needed before start.

    2) Warm start

    145Database Management Systems and Their Implementation, Xu Lizhen

    Start after system shutdown. Recovery is notrequired.

    3) Cold start

    Start the system from scratch. Start after acatastrophic failure or start a new DB.

    6.8 Two Phase Commit

    The transactions in DDBMS are distributed transactions, the keyof distributed transaction management is how to assure all sub-transactions either commit together or abort together.

    Realize the sub-transactions harmony with each other relies oncommunication, while the communication is not reliable.

    Two general paradox : No fixed length protocol exists.

    146Database Management Systems and Their Implementation, Xu Lizhen

    Solution : number the messages.

    A BMarch i

    OK i

    March i+1

    OK i+1

    If A has not received OK i+11) B had not received March i+12) OK i+1 has been lost

    A send March i+1 again; for B :1) If has received, send OK i+1 again

    2) or, , send OK i+1

    When there are multi generals, select one of them ascoordinator

    coordinator

    147Database Management Systems and Their Implementation, Xu Lizhen

    participantparticipant

  • 7/28/2019 DBMS and Implementation All(Full)

    50/69

    50

    Two Phase Commit

    coordinato

    r

    participate

    Broadcast prepare

    OK / not OK

    Commit/Abort

    Ack

    If the sub-transaction successes andcan commit, it answers OK, or not OK

    If all of the sub-transactions are OK,Commit command is sent out, orAbort command is sent out

    I

    II

    148Database Management Systems and Their Implementation, Xu Lizhen

    OK --- if success

    Commit --- all OK Abort --- any one not OK

    not OK --- if fail

    Every participant is self-determining before answering OK, it canabort by itself. Once answers OK, it can only wait for the commandcome from the coordinator.

    If the coordinator has failure after the participates answer OK, theparticipates have to wait, and is in blocked state. This is thedisadvantage of 2PC.

    Three Phase Commit

    Broadcast prepare

    Ready / Abort

    Prepare to commit

    OK

    If the sub-transaction successes and cancommit, it answers Ready, or Abort

    Broadcast to all participants and thisMSG is recorded on the disk

    Commit/Abort

    I

    II

    oordinator

    participate

    149Database Management Systems and Their Implementation, Xu Lizhen

    If the coordinator has not any failure, phase II is wasted If the coordinator has failure after the participates answer OK, the

    participates communicate each other and check the MSG recordedon disk in phase II, and a new coordinator is elected. If the newcoordinator finds the Prepare to commit MSG on any participate, itsends out Commit command, or sends out Abort command. So theblocked problem can be solved in 3PC.

    AckIII

    7. Concurrency Control

    Database Management Systemsand Their Implementation, Xu

    Lizhen

  • 7/28/2019 DBMS and Implementation All(Full)

    51/69

    51

    7.1 Introduction

    In multi users DBMS, permit multitransaction access the database concurrently.

    7.1.1 Why concurrency?

    151Database Management Systems and Their Implementation, Xu Lizhen

    1) Improving system utilization & response time.

    2) Different transaction may access to differentparts of database.

    7.1.2 Problems arise from concurrent executions

    T1

    Read(x)

    x:=x+1

    Write(x)

    T2

    Read(x)

    x:=2*x

    T1

    Write(t)

    T2

    Read(t[x])

    Read(t[y])

    T1

    Read(x)

    Read(x)

    T2

    Write(x)

    152Database Management Systems and Their Implementation, Xu Lizhen

    r e xt

    Lost update

    ro ac

    Dirty read Unrepeatable read

    So there maybe three kinds of conflict when transactions executeconcurrently. They are write write, write read, and read writeconflicts. Write write conflict must be avoided anytime. Write readand read write conflicts should be avoided generally, but they areendurable in some applications.

    7.1.3 Serialization --- the criterion forconcurrency consistency

    Definition: suppose {T1,T2,Tn} is a set of transactionsexecuting concurrently. If a schedule of {T1,T2,Tn}produces the same effect on database as some serialexecution of this set of transactions, then the schedule isserializable.

    153Database Management Systems and Their Implementation, Xu Lizhen

    serial execution different result? (yes, n!)

    TA

    Read R2

    TB

    Read R1

    Write R2

    TC

    Write R1

    The result of this scheduleis the same as serialexecution TA TB TC, soit is serializable. Theequivalent serial executionis TATBTC .

  • 7/28/2019 DBMS and Implementation All(Full)

    52/69

    52

    7.1.4 View equivalent and conflict equivalent

    Schedules --- sequences that indicate thechronological order in which instructions ofconcurrent transactions are executed

    a schedule for a set of transactions must

    154Database Management Systems and Their Implementation, Xu Lizhen

    consist o a instructions o t osetransactions

    must preserve the order in which theinstructions appear in each individualtransaction.

    Example Schedules

    Let T1 transfer $50 fromA to B, and T2 transfer 10% of the balance fromA to B. The following is a serial schedule, in which T1 is followed by T2.

    T1 T2

    read(A)A :=A 50write(A)

    155Database Management Systems and Their Implementation, Xu Lizhen

    rea (B)B := B + 50write(B)

    read(A)temp :=A*0.1;A :=A tempwrite(A)read(B)B := B + tempwrite(B)t

    Example Schedules

    Let T1 and T2 be the transactions defined previously. Thefollowing schedule is not a serial schedule, but it is equivalentto the above.

    T1 T2

    read(A)A :=A 50write(A)

    156Database Management Systems and Their Implementation, Xu Lizhen

    read (A)temp := A*0.1

    A = A tempwrite(A)

    read(B)B := B + 50write(B)

    read(B)B := B + tempwrite(B)t

  • 7/28/2019 DBMS and Implementation All(Full)

    53/69

    53

    View equivalentand Conflict equivalent

    Let S and S be two schedules with the same set of

    transactions. S and S are view equivalent if theyproduce the same effect on database based on thesame initial execution condition.

    Conflict operations : R-WW-W. The sequence of

    157Database Management Systems and Their Implementation, Xu Lizhen

    con c opera on w a ec e e ec o execu on.

    Non-conflicting operations: R-R Even if thereare write operation, the data items operated aredifferent. Such as Ri(x) and Wj(y).

    If a schedule S can be transformed into a schedule Sby a series of swaps of non-conflicting operations, wesay that S and S are conflict equivalent.

    Property: if schedule S and S are conflict equivalent, theymust be view equivalent. It is not right contrarily.

    Serialization can be divided into view serialization andconflict serialization.

    Example 1: for the schedule s of transaction set {T 1,T2,T3}

    s = R2 x)W3 x)R1 )W2 ) R1 )R2 x)W2 )W3 x) = s

    158Database Management Systems and Their Implementation, Xu Lizhen

    s is conflict serialization because s is a serial execution.

    Example 2: s = R1(x)W2(x)W1(x)W3(x)

    There is no conflict equivalent schedule of s, but we can find aschedule s

    s = R1(x)W1(x)W2(x)W3(x)

    It is view equivalent with s, and s is a serial execution, so s isview serialization.

    The test algorithm of view equivalent is a NPproblem, while conflict serialization coversthe most instances of serializable schedule, sothe serialization we say in later parts will

    oint to conflict serialization if without

    159Database Management Systems and Their Implementation, Xu Lizhen

    special indication.

  • 7/28/2019 DBMS and Implementation All(Full)

    54/69

    54

    7.1.5 Preceding graph

    Directed graph G =

    V --- set of vertexes, including all transactionsparticipating in schedule.

    E --- set of edges, decided through the analysis ofconflict operations. If any of the following conditionsis fulfilled add an ed e TT into E:

    160Database Management Systems and Their Implementation, Xu Lizhen

    i j

    Ri(x) precedes Wj(x)

    Wi(x) precedes Rj(x)

    Wi(x) precedes Wj(x)

    Finally, check if there is cycle in the preceding graph.If there is cycle in it, the schedule is not serializable,or it is serializable.

    Find equivalent serial execution while serialization

    1) Because there is no cycle, there must be somevertexes whose in-degree is 0. Remove these vertexesand relative edges from the preceding graph, andstore these vertexes into a queue.

    2) Process the left graph in the same way as above, butthe vertexes removed should be stored behind the

    161Database Management Systems and Their Implementation, Xu Lizhen

    existing vertexes in the queue.

    3) Repeat 1 and 2 until all vertexes moved into the

    queue.Example: for schedule s on {T1,T2,T3,T4}, suppose :

    s = W3(y)R1(x)R2(y)W3(x)W2(x)W3(z)R4(z)W4(x)

    Is it serializable? Find out the equivalent serialexecution if it is.

    s = W3

    (y)R1

    (x)R2

    (y)W3

    (x)W2

    (x)W3

    (z)R4

    (z)W4

    (x)

    T1

    T2

    T3

    T4

    T2

    T3

    T4

    Queue : T1

    T2

    T4

    Queue : T1,T3

    162Database Management Systems and Their Implementation, Xu Lizhen

    The commission of concurrency control is to enforce theconcurrent transactions executed in a serializableschedule.

    T4

    Queue : T1,T3,T2Equivalent serial execution :T1T3T2T4

    Empty

  • 7/28/2019 DBMS and Implementation All(Full)

    55/69

    55

    7.2 Locking Protocol

    Locking method is the most basic concurrency control

    method. There maybe many kinds of locking protocols.7.2.1 X locks

    Only one type of lock, for both read and write.

    163Database Management Systems and Their Implementation, Xu Lizhen

    --- --- Y --- compatible N --- incompatible

    NL X

    NL Y Y

    X Y N

    X_lock RUpdate RX_unlock REOT

    TAX_lock Rwait

    X_lock RRead R

    TB

    *Two Phase Locking

    Definiton1: In a transaction, if all locks precede allunlocks, then the transaction is called two phasetransaction. This restriction is called two phaselocking protocol.

    Definition2: In a transaction, if it first acquires a lockon the ob ect before o eratin on it, it is called well-

    164Database Management Systems and Their Implementation, Xu Lizhen

    formed. T1Lock A

    Lock BLock CUnlock AUnlock BUnlock C

    T2Lock A

    Lock BUnlock AUnlock BLock CUnlock C

    2PL not 2PL

    Growing

    phase

    Shrinkingphase

    Theorem: If S is anyschedule of well-formed and twophase transactions,then S is serializable.

    (proving is on p151)

    Conclusions :

    1) Well-formed + 2PL : serializable

    2) Well-formed + 2PL + unlock update at EOT:serializable and recoverable. (without dominophenomena)

    3) Well-formed + 2PL + holding all locks to EOT: strict

    165Database Management Systems and Their Implementation, Xu Lizhen

    two phase locking transaction.

    7.2.2 (S,X) locks

    S lock --- if read access isintended.

    X lock --- if update accessis intended.

    NL S X

    NL Y Y Y

    S Y Y N

    X Y N N

  • 7/28/2019 DBMS and Implementation All(Full)

    56/69

    56

    7.2.3 (S,U,X) locks

    U lock --- update lock. For

    an update access thetransaction first acquiresa U-lock and thenpromote it to X-lock.

    Purpose: shorten the time

    NL S U X

    NL Y Y Y Y

    S Y Y Y N

    U Y Y N N

    166Database Management Systems and Their Implementation, Xu Lizhen

    of exclusion, so as toboost concurrency degree,and reduce deadlock.

    X Y N N N

    X (S,X) (S,U,X)

    Concurrency degree

    Overhead

    7.3 Deadlock & Live Lock

    Dead lock: wait in cycle, no transaction can obtain all of resourcesneeded to complete.

    Live lock: although other transactions release their resource inlimited time, some transaction can not get the resources needed fora very long time.

    TA

    TB R

    167Database Management Systems and Their Implementation, Xu Lizhen

    X_ ock R1

    X_lock R2

    wait

    _ ocX_lock R1wait

    T1: S-lockT2: S-lock

    T: x-lock

    Live lock is simpler, only need to adjust schedule strategy, suchas FIFO

    Deadlock: (1) Prevention(dont let it occur); (2) Solving(permit itoccurs, but can solve it)

    7.3.1 Deadlock Detection

    1) Timeout: If a transaction waits for some specifiedtime then deadlock is assumed and the transactionshould be aborted.

    2) Detect deadlock by wait-for graph G=

    V : set of transactions {T i|Ti is a transaction in DBS

    168Database Management Systems and Their Implementation, Xu Lizhen

    (i=1,2,n)}

    E : {|Ti waits for Tj (i j)} If there is cycle in the graph, the deadlock occurs.

    When to detect?

    1) whenever one transaction waits.

    2) periodically

  • 7/28/2019 DBMS and Implementation All(Full)

    57/69

    57

    What to do when detected?

    1) Pick a victim (youngest, minimum abort cost, )

    2) Abort the victim and release its locks and resources

    3) Grant a waiter

    4 Restart the victim automaticall or manuall

    169Database Management Systems and Their Implementation, Xu Lizhen

    7.3.2 Deadlock avoidance1) Requesting all locks at initial time of transaction.

    2) Requesting locks in a specified order of resource.

    3) Abort once conflicted.

    4) Transaction Retry

    Every transaction is uniquely time stamped. If TArequires a lock on a data object that is already locked byTB, one of the following methods is used:

    a) Wait-die: TA waits if it is older than TB, otherwise itdies, i.e. it is aborted and automatically retriedwith original timestamp.

    b) Wound-wait: TA waits if it is younger than TB,

    170Database Management Systems and Their Implementation, Xu Lizhen

    otherwise it wound TB, i.e. TB is aborted andautomatically retried with original timestamp.

    In above, both have only one direction wait, eitherolder younger or younger older. It is impossibleto occur wait in cycle, so the dead lock is avoided.

    7.4 Lock Granularities

    7.4.1 Locking in multi granularities

    To reduce the overhead of locking, the lock unit should be thebigger, the better; To boost the concurrency degree oftransactions, the lock unit should be the smaller, the better.

    In large scale DBMS, the lock unit is divided into several levels:

    DBFileRecordField

    171Database Management Systems and Their Implementation, Xu Lizhen

    In this situation, if a transaction acquires a lock on a data objectof some level then it acquires implicitly the same lock on eachdescendant of that data object.

    So, there are two kinds of locks in multi granularity lock method:

    Explicit lock

    Implicit lock

  • 7/28/2019 DBMS and Implementation All(Full)

    58/69

    58

    7.4.2 Intention lock

    How to check conflicts on implicit locks

    Intention lock: provide three kinds of intension lockswhich are IS, IX and SIX. For example, if a transactionadds a S lock on some lower level data object, all thehigher level data object which contains it should beadded an IS lock as a warning information. If another

    172Database Management Systems and Their Implementation, Xu Lizhen

    transaction want to apply an X lock on a higher leveldata object later, it can find the implicit conflictthrough IS lock.

    IS --- Intention share lock IX --- Intention exclusive lock SIX --- SIX

    DB

    File

    Record

    Field

    S

    IS

    IS

    Compatibility matrix while lock in multi granularities :

    NL IS IX S SIX X

    NL Y Y Y Y Y Y

    IS Y Y Y Y Y N

    IX Y Y Y N N N

    X

    SIX

    IXS

    strong

    Exclusion

    173Database Management Systems and Their Implementation, Xu Lizhen

    S Y Y N Y N N

    SIX Y Y N N N N

    X Y N N N N N

    The lock with strong exclusion degree can substitutethe lock with weak exclusion degree while locking, butit is not right contrarily.

    IS

    NL weak

    egree

    Locking Rules:

    Locks are requested from root to leaves and releasedfrom leaves to root.

    Locking order DB

    Example:

    174Database Management Systems and Their Implementation, Xu Lizhen

    record file DB

    (1) S IS IS

    (2) X IX IX

    (3) All read and some write SIX IX, IS

    Request X lock to records need updating Substitute with stronger exclusive lock

    File

    Record

    Field

  • 7/28/2019 DBMS and Implementation All(Full)

    59/69

    59

    7.5 Locking on Index (B+ Tree)

    Transactions access concurrently not only the data in DB,

    but also the indexes on these data, and apply operationson indexes, such as search, insert, delete, etc. So indexesalso need concurrency control while multi granularitylocking is supported.

    175Database Management Systems and Their Implementation, Xu Lizhen

    ow can we e c en y o c a par cu ar ea no e Btw, dont confuse this with multiple granularity locking!

    One solution: Ignore the tree structure, just lock pageswhile traversing the tree, following 2PL.

    This has terrible performance! Root node (and many higher level nodes) become bottlenecks

    because every tree access begins at the root.

    Some Useful Observations

    Every access to B+ tree need a traverse from root tosome leaf. Only leaves have detail information aboutdata, such as TID, while higher levels of the tree onlydirect searches for leaf pages.

    One node occupies one page generally, so the lockunit of B+ tree is page. Dont need multi granularity

    176Database Management Systems and Their Implementation, Xu Lizhen

    locks, only S, X lock on page level are enough.

    B+ tree is key resource accessed frequently, liable to

    be the bottleneck of system. Performance is veryimportant in index concurrency control.

    If occur conflict while traverse the tree, discard alllocks applied, and search from root again after somedelay. Avoid deadlock resulted by wait.

    Some Useful Observations Originally, locks on index are only used to keep the

    consistency of index itself. The correctness ofconcurrent transactions is responsible by 2PL. Fromthis sense, the locks on index dont need keeping toEOT, they can release immediately after finish the

    177Database Management Systems and Their Implementation, Xu Lizhen

    .But in 7.6 we will know, even the strict 2PL has leakwhile multi granularity locks are permitted. The lockon leaf of B+ tree should be kept to EOT in order tomake up the leak of strict 2PL in this situation, whilelocks on other nodes of the tree can be released afterfinishing of search.

  • 7/28/2019 DBMS and Implementation All(Full)

    60/69

    60

    ExampleROOT

    A

    B

    20

    35

    Traversing

    the tree

    178Database Management Systems and Their Implementation, Xu Lizhen

    C

    D E

    F

    G H I20*

    38 44

    22* 23* 24* 35* 36* 38* 41* 44*

    23

    Tree Locking Algorithm

    1. While traversing, apply S lock on root first, then apply S lock onthe child node selected. Once get S lock on the child, the S lockon parent can be released, because traversing cant go back.Search like this until arrive leaf node. After traversing, only Slock on wanted leaf is left. Keep this S lock till EOT.

    2. While inserting new index item, traverse first, find the leaf node

    179Database Management Systems and Their Implementation, Xu Lizhen

    w ere e new em s ou e nser e n. pp y oc on sleaf node.

    If it is not full, insert directly.

    If it is full, split node according to the rule of B+ tree. Whilesplitting, besides the original leaf, the new leaf and their parentshould add X lock. If parent is also full, the splitting will continue.

    In every splitting, must apply X lock on each node to be changed.These X locks can be released when the changes are finished.

    After the all inserting process is completed, the X lock on leaf nodeis changed to S lock and kept to EOT.

    Tree Locking Algorithm3. While deleting an index item from the tree, the

    procedure is similar as inserting. Deleting may causethe combination of nodes in B+ tree. The nodechanged must be X locked first and X lock releasedafter finishing change. The X lock on leaf node is also

    180Database Management Systems and Their Implementation, Xu Lizhen

    .

  • 7/28/2019 DBMS and Implementation All(Full)

    61/69

    61

    7.6 Phantom and Its Prevention

    The assumption that the DB is a fixed

    collection of objects is not true when multigranularity locking is permitted. Then evenStrict 2PL will not assure serializability:

    T1 locks all a es containin sailor records with

    181Database Management Systems and Their Implementation, Xu Lizhen

    rating = 1, and finds oldest sailor (say, age = 71).

    Next, T2 inserts a new sailor; rating = 1, age = 96.

    T2 also deletes oldest sailor with rating = 2 (and,say, age = 80), and commits.

    T1 now locks all pages containing sailor recordswith rating = 2, and finds oldest (say, age = 63).

    No consistent DB state where T1 is correct!

    The Problem

    T1 implicitly assumes that it has locked the set of allsailor records with rating = 1.

    Assumption only holds if no sailor records are added whileT1 is executing!

    Need some mechanism to enforce this assumption. (Indexlocking and predicate locking)

    182Database Management Systems and Their Implementation, Xu Lizhen

    Example shows that conflict serializability guaranteesserializability only if the set of objects is fixed!

    If the system dont support multi granularity locking,or even if support multi granularity locking, thequery need to scan the whole table and add S lock onthe table, then there is not this problem. For example :

    select s#, average(grade) from SC group by s#;

    Index Locking

    If there is a dense index on the rating field, T1should lock the index node containing the dataentries with rating = 1 and keep it until EOT.

    If there are no records with rating = 1, T1 must lock theindex node where such a data entry would be, if it existed!

    r=1

    Data

    Index

    183Database Management Systems and Their Implementation, Xu Lizhen

    W en T2 wants to insert a new sai or rating = 1,age = 96 ), he cant get the X lock on the index nodecontaining the data entries with rating = 1, so hecant insert the new index item to realize the insertof a new sailor.

    If there is no suitable index, T1 must lock thewhole table, no new records can be added beforeT1 commit of course.

  • 7/28/2019 DBMS and Implementation All(Full)

    62/69

    62

    Predicate Locking

    Grant lock on all records that satisfy some

    logical predicate, e.g. age > 2*salary. Index locking is a special case of predicate

    locking for which an index supports efficient

    184Database Management Systems and Their Implementation, Xu Lizhen

    . What is the predicate in the sailor example?

    In general, predicate locking has a lot oflocking overhead. It is almost impossible torealize it.

    7.7 Isolation Level of Transaction

    Support for isolation level of transaction is addedfrom SQL-92. Each transaction has an access mode, adiagnostics size, and an isolation level.

    SET TRANSACTION statement

    Possible result

    185Database Management Systems and Their Implementation, Xu Lizhen

    so ationLevel

    Lock demandDirtyRead

    UnrepeatableRead

    PhantomProblem

    Read

    Uncommitted Maybe Maybe Maybe

    No lock when read; X lock when write,

    keep until EOTReadCommitted

    No Maybe MaybeS lock when read, release after read; Xlock when write, keep until EOT

    RepeatableR