
Materialized View Creation and Transformation of Schemas in Highly Available Database Systems

Thesis for the degree philosophiae doctor

Trondheim, October 2007

Norwegian University of Science and Technology
Faculty of Information Technology, Mathematics and Electrical Engineering
Department of Computer and Information Science

Jørgen Løland


NTNU - Norwegian University of Science and Technology

Thesis for the degree philosophiae doctor

Faculty of Information Technology, Mathematics and Electrical Engineering
Department of Computer and Information Science

© Jørgen Løland

ISBN 978-82-471-4381-0 (printed version)
ISBN 978-82-471-4395-7 (electronic version)
ISSN 1503-8181

Doctoral theses at NTNU, 2007:199

Printed by NTNU-trykk


To Ingvild and Ottar.


Preface

This thesis is submitted to the Norwegian University of Science and Technology in partial fulfillment of the degree PhD. The work has been carried out at the Database System Group, Department of Computer and Information Science (IDI). The study was funded by the Faculty of Information Technology, Mathematics and Electrical Engineering through the “forskerskolen” program.

Acknowledgements

First, I would like to thank my advisor Professor Svein-Olaf Hvasshovd for his guidance and ideas, and for providing valuable comments to drafts of the thesis and papers. I would also like to thank my co-advisors Dr. Ing. Øystein Torbjørnsen and Professor Svein Erik Bratsberg for constructive feedback and interesting discussions regarding the research.

During the years I have been working on this thesis, I have received help from many people. In particular, I would like to thank Heine Kolltveit and Jeanine Lilleng for many interesting discussions. In addition, Professor Kjetil Nørvag has been a seemingly infinite source of information when it comes to academic publishing. I would also like to thank the members of the Database System Group in general for providing a good environment for PhD students.

I sincerely thank Rune Havnung Bakken, Jon Olav Hauglid and Associate Professor Roger Midtstraum for proofreading and commenting on drafts of the thesis. Your feedback has been invaluable.

I would also like to thank my parents and sister for their inspiration and encouragement. Finally, I express my deepest thanks to my wife Ingvild for her constant love and support.


Abstract

Relational database systems are used in thousands of applications every day, including online web shops, electronic medical records and mobile telephone tracking. Many of these applications have high availability requirements, allowing the database system to be offline for only a few minutes each year.

In existing DBMSs, user transactions get blocked during creation of materialized views (MVs) and non-trivial schema transformations. Blocking user transactions is not an option in database systems requiring high availability. A non-blocking method to perform these operations is therefore needed.

Our research has focused on how the MV creation and schema transformation operations can be performed in database systems with high availability requirements. We have examined existing solutions to MV creation and schema transformations, and identified requirements. Most important among these requirements were that the method should not have blocking effects, and should degrade performance of concurrent transactions to the smallest possible extent.

The main contribution of this thesis is a method for creation of derived tables (DTs) using relational operators. Furthermore, we show how these DTs can be used to create MVs and to perform schema transformations. The method is non-blocking, and may be executed as a low priority background process to minimize performance degradation.

The MV creation and schema transformation methods have been implemented in a prototype DBMS. By performing thorough empirical validation experiments on this prototype, we show that the method works correctly. Furthermore, through extensive performance experiments, we show that the method incurs little response time and throughput degradation under moderate workloads. Thus, the method provides a way to create MVs and to transform the database schema that can be used in highly available database systems.


Contents

I Background and Context

1 Introduction
   1.1 Motivation
      1.1.1 The Derived Table Creation Problem
   1.2 Research Questions
   1.3 Research Methodology
   1.4 Organization of this thesis

2 Derived Table Creation Basics
   2.1 Database Systems - An Introduction
   2.2 Concurrency Control
   2.3 Recovery
   2.4 Record Identification Policy

3 A Survey of Technologies Related to Non-Blocking Derived Table Creation
   3.1 Ronstrom's Schema Transformations
      3.1.1 Simple Schema Changes
      3.1.2 Complex Schema Changes
      3.1.3 Cost Analysis of Ronstrom's Method
   3.2 Fuzzy Table Copying
   3.3 Materialized View Maintenance
      3.3.1 Snapshots
      3.3.2 Materialized Views
   3.4 Schema Transformations and DT creation in Existing DBMSs
   3.5 Summary

II Derived Table Creation

4 The Derived Table Creation Framework
   4.1 Overview of the Framework
   4.2 Step 1: Preparation
   4.3 Step 2: Initial Population
   4.4 Step 3: Log Propagation
   4.5 Step 4: Synchronization
   4.6 Considerations for Schema Transformations
      4.6.1 A lock forwarding improvement for schema transformations
   4.7 Summary

5 Common DT Creation Problems
   5.1 Missing Record and State Identification
   5.2 Missing Record Pre-States
   5.3 Lock Forwarding During Transformations
   5.4 Inconsistent Source Records
      5.4.1 Repairing Inconsistencies
   5.5 Summary

6 DT Creation using Relational Operators
   6.1 Difference and Intersection
      6.1.1 Preparation
      6.1.2 Initial Population
      6.1.3 Log Propagation
      6.1.4 Synchronization
   6.2 Horizontal Merge with Duplicate Inclusion
      6.2.1 Preparation
      6.2.2 Initial Population
      6.2.3 Log Propagation
      6.2.4 Synchronization
   6.3 Horizontal Merge with Duplicate Removal
      6.3.1 Preparation Step
      6.3.2 Initial Population Step
      6.3.3 Log Propagation Step
      6.3.4 Synchronization Step
   6.4 Horizontal Split Transformation
      6.4.1 Preparation
      6.4.2 Initial Population
      6.4.3 Log propagation
      6.4.4 Synchronization
   6.5 Vertical Merge
      6.5.1 Preparation
      6.5.2 Initial Population
      6.5.3 Log Propagation
      6.5.4 Synchronization
   6.6 Vertical Split over a Candidate Key
      6.6.1 Preparation
      6.6.2 Initial Population
      6.6.3 Log Propagation
      6.6.4 Synchronization
   6.7 Vertical Split over a Functional Dependency
      6.7.1 Preparation
      6.7.2 Initial Population
      6.7.3 Log Propagation
      6.7.4 Synchronization
      6.7.5 How to Handle Inconsistent Data - An Extension to Vertical Split
   6.8 Summary

III Implementation and Evaluation

7 Implementation Alternatives
   7.1 Alternative 1 - Simulation
   7.2 Alternative 2 - Open Source DBMS
   7.3 Alternative 3 - Prototype
   7.4 Implementation Alternative Discussion

8 Design of the Non-blocking DBMS
   8.1 The Non-blocking DBMS Server
      8.1.1 Database Communication Module
      8.1.2 SQL Parser Module
      8.1.3 Relational Manager Module
      8.1.4 Scheduler Module
      8.1.5 Recovery Manager Module
      8.1.6 Data Manager Module
      8.1.7 Effects of the Simplifications
   8.2 Client and Administrator Programs
   8.3 Summary

9 Prototype Testing
   9.1 Test Environment
   9.2 Empirical Validation of the Non-Blocking DT Creation Methods
   9.3 Performance Testing
      9.3.1 Log Propagation - Difference and Intersection
      9.3.2 Log Propagation - Vertical Merge
      9.3.3 Low Performance Degradation or Short Execution Time?
      9.3.4 Other Steps of DT Creation
      9.3.5 Performance Experiment Summary
   9.4 Discussion

10 Discussion
   10.1 Contributions
      10.1.1 A General DT Creation Framework
      10.1.2 DT Creation for Many Relational Operators
      10.1.3 Support for both Schema Transformations and Materialized Views
      10.1.4 Solutions to Common DT Creation Problems
      10.1.5 Implemented and Empirically Validated
      10.1.6 Low Degree of Performance Degradation
      10.1.7 Based on Existing DBMS Functionality
      10.1.8 Other Considerations - Total Amount of Data
   10.2 Answering the Research Question
      10.2.1 Summary

11 Conclusion and Future Work
   11.1 Research Contributions
   11.2 Future Work
   11.3 Publications

IV Appendix

A Non-blocking Database: SQL Syntax

B Performance Graphs

Glossary

Bibliography


List of Figures

2.1 Database System
2.2 Compensation Log Records provide valid State Identifiers

3.1 Ronstrom's Horizontal Merge Method
3.2 Ronstrom's Horizontal Split Method
3.3 Examples of Vertical Merge Schema Change
3.4 Chain of Triggers in Ronstrom's Vertical Merge Method
3.5 Ronstrom's Vertical Split Method
3.6 Ronstrom's Vertical Split Transformation
3.7 Ronstrom's Vertical Split Method and Inconsistent Data
3.8 Example MV Consistency Problem

4.1 The four steps of DT creation

5.1 Solving the Record and State Identification Problems
5.2 Solving the Missing Record Pre-State Problem
5.3 Example Simple Lock Forwarding (SLF)
5.4 Lock Compatibility Matrix
5.5 Example Many-to-One Lock Forwarding (M1LF)
5.6 Example Many-to-Many Lock Forwarding (MMLF)
5.7 Inconsistent Source Records

6.1 Difference and Intersection DT Creation
6.2 Horizontal Merge DT Creation
6.3 Horizontal Merge - Duplicate Inclusion
6.4 Horizontal Merge - Duplicate Inclusion with type attribute
6.5 Horizontal Merge - Duplicate Removal
6.6 Horizontal Split DT Creation
6.7 Example vertical merge DT creation
6.8 Synchronization of a Vertical Merge Schema Transformation
6.9 Vertical split over a Candidate Key
6.10 Vertical split over a non candidate key

7.1 Possible Modular Design of Prototype

8.1 Modular Design Overview of the Non-blocking DBMS
8.2 UML Class Diagram of the Non-blocking Database System
8.3 Sequence Diagram - Relational Manager Processing a Query
8.4 Organization of the log
8.5 Organization of data records in a table
8.6 Screen shot of the Client program in action

9.1 Response time and throughput for difference and intersection
9.2 Response time distribution for difference and intersection
9.3 Response time - difference log propagation
9.4 Throughput - difference log propagation
9.5 Response time and throughput - vertical merge DT creation
9.6 Comparison of vertical merge and difference/intersection
9.7 Time vs Degradation
9.8 Response Time Summary

11.1 Example of Schema Transformation performed in two steps
11.2 Example interface for dynamic priorities for DT creation

B.1 Response time and throughput - horizontal merge DT creation
B.2 Response time - horizontal split DT creation
B.3 Throughput - horizontal split DT creation
B.4 Response time - vertical merge DT creation
B.5 Response time - vertical merge DT creation, varying table size
B.6 Throughput - vertical merge DT creation
B.7 Response time - vertical split DT creation
B.8 Throughput - vertical split DT creation


List of Tables

3.1 The three dimensions of Ronstrom's schema transformations
3.2 Legend for Tables 3.3 and 3.4
3.3 Cost Incurred by Ronstrom's Vertical Merge Schema Transformation Method
3.4 Added Cost by Ronstrom's Vertical Split Schema Transformation Method

5.1 DT Creation Problems and Solutions

6.1 DT Creation Operators
6.2 Problems and solutions for DT Creation methods

7.1 Evaluation - Open Source DBMSs
7.2 Evaluation of implementation alternatives

9.1 Hardware and Software Environment for experiments
9.2 Transaction Mix 1
9.3 Transaction Mix 2
9.4 Transaction Mix 3
9.5 Table Sizes used in the experiments
9.6 Response Time Distribution Summary
9.7 Response Time Initial Population and Log Propagation
9.8 Effects of varying priorities


Part I

Background and Context


Chapter 1

Introduction

The topic of this thesis is schema transformations and materialized view creation in relational database systems with high availability requirements. The main focus is on how creation of derived tables can be used to perform both operations while incurring minimal performance degradation for concurrent transactions.

In this chapter, the motivation for the topic is presented, and the research questions and methodology are discussed.

1.1 Motivation

Relational database systems have had tremendous success since Ted Codd introduced the relation concept in the famous paper “A relational model of data for large shared data banks” in 1970 (Codd, 1970). Today, this type of database system is so dominant that “database system” is close to synonymous with “relational database system”.

Relational database systems1, or simply database systems, are used in virtually all kinds of applications, spanning from simple personal database systems to huge and complex business database systems. Personal database systems, including CD or book archives and contact information for friends and family, typically contain few tuples (order of hundreds). The consequences of unavailability2 for such database systems are, in general, not critical; it would still be possible to play music even if the CD archive was

1 Throughout this thesis, the term database is used to denote a collection of related data. A Database Management System (DBMS) is a program used to manage these data, while database system denotes a collection of data managed by a DBMS (Elmasri and Navathe, 2004).

2 In this thesis, database systems are considered available when they can be fully accessed by their intended users.


unavailable. Because people normally consider sporadic downtime of such systems acceptable, these systems have low requirements for availability.

At the other end of the scale, database systems used in business applications may be very large, often in the order of millions or even billions of tuples. Business database systems are involved in everything from stock exchanges, web shops, banking and airline ticket booking to patient histories at hospitals3, Enterprise Resource Planning (ERP) and Home Location Registries (HLR) used to keep track of mobile telephones in a network.

While database system availability is not critical to all business applications, it certainly is to many. Database systems are, e.g., required for the exchange of stocks at NASDAQ and for customers to shop at Amazon.com. Even more critical: the HLR database system is required for any mobile phone to work in a network. These systems should not be unavailable for long periods of time.

Database Operations

Users interact with database systems by performing transactions, which may consist of one or more database operations (Elmasri and Navathe, 2004). Examples of basic database operations include inserting a new patient into a hospital database and querying a patient's medical record for possibly dangerous allergies before a surgery.

In modern database systems, e.g. Microsoft SQL Server 2005 (Microsoft TechNet, 2006) and IBM DB2 version 9 (IBM Information Center, 2006), the basic operations are designed to achieve high degrees of concurrency (Garcia-Molina et al., 2002). While the operations performed by one user may block other users from accessing the very same data items at the same time, other data items are still accessible.

A database operation is said to be blocking if it keeps other transactions from executing their update (and possibly read) operations, effectively making the involved data unavailable. Short term blocking of small parts of the database may not be problematic. However, blocking a huge amount of data for a long period of time seriously reduces availability. This is obviously unwanted in highly available database systems.

In the following section, two database operations, database schema transformations and creation of materialized views, are described. Neither of these can be performed without blocking the involved tables for a long period of time in existing DBMSs (Løland, 2003).

3 Patient history databases may, e.g., describe previous treatment, allergies, x-ray images, etc.


1.1.1 The Derived Table Creation Problem

“Due to planned technical maintenance and upgrades, the online bank will be unavailable from Saturday 8 p.m. to Sunday 10 a.m.

Our contact center will not be able to help with account information as internal systems are affected as well.

We are sorry for the inconvenience this may cause for our customers.”

Norwegian Online Bank, October 2006

Schema Transformations

Database schemas4 are typically designed to model the relevant parts and aspects of the world at design time. The schema may be excellent for the intended usage at the time it is designed, but many applications change over time. New kinds of products appear, departments that had one head of department suddenly have a board, or new laws that affect the company are introduced by the government. These are examples of changes that may require a transformation of the database schema.

In addition to changing needs as a source for transformations, designers may also have been unsuccessful in designing a good schema in the first place. After being used for some time, it may turn out that a schema does not work as well as it was intended to. Often, the reason for this is that the design is a compromise between many factors, some of which include readability of the E/R diagram, removal of anomalies and optimization of runtime efficiency. It may very well turn out that the schema is too inefficient or that the designers just forgot or misinterpreted something.

In a study of seven applications, Marche (Marche, 1993) reports significant changes to relational database schemas over time. Six of the studied schemas had more than 50% of their attributes changed. The evolution continued after the development period had ended. A similar study of a health management system came to the same conclusion (Sjøberg, 1993).

As should be clear, a database schema may sometimes have to be changed after the database has been populated with data. In this thesis, we refer to such changes as “schema transformations”. An important shortcoming of all but the least complex schema transformations is that they must be performed in a blocking way in today's DBMSs (Lorentz and Gregoire, 2003b; Microsoft TechNet, 2006; IBM Information Center, 2006). This will be elaborated on in Section 3.4.

4 The description, or model, of a database (Elmasri and Navathe, 2000).


Materialized Views

A database view is a table derived from other tables, and is defined by a database query called the view query (Elmasri and Navathe, 2004). Views may be either virtual or materialized. Virtual views do not physically store any records, but can still be queried like normal tables. This is done by using the view queries to rewrite the user queries (Elmasri and Navathe, 2004).

Depending on the complexity of the view query, querying a virtual view may be much more costly than querying a normal table. To remedy this, most modern DBMSs support Materialized Views (MVs)5 (Løland, 2003). Unlike a virtual view, the result of the view query is stored physically in an MV (Elmasri and Navathe, 2004).

MVs have many uses in addition to speeding up queries (Alur et al., 2002). They can be used to store historical information, e.g. sales reports for each quarter of a year. They are also frequently used in Data Warehouses. Because of the great performance advantages of MVs and their widespread use, much research has been conducted on how to keep MVs consistent with the source tables (Løland and Hvasshovd, 2006c). However, in current DBMSs, the MVs still have to be created in a way that blocks all updates to the source tables while the MV is created (Lorentz and Gregoire, 2003b; IBM Information Center, 2006; Microsoft TechNet, 2006).
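To make the distinction concrete, the following minimal Python sketch (illustrative only; the table, records and names are invented and not taken from the thesis) contrasts the two approaches: a virtual view re-runs its view query on every access, while a materialized view physically stores the query result and therefore goes stale when a source table changes.

```python
# Source table: a list of record dicts (hypothetical example data).
sales = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 200},
]

def view_query():
    """The view query: sales records from the north region."""
    return [r for r in sales if r["region"] == "north"]

# Virtual view: nothing is stored; every access re-runs the view query.
def query_virtual_view():
    return view_query()

# Materialized view: the view query result is stored physically once...
mv_north_sales = view_query()

# ...so later reads are cheap, but the stored copy goes stale when the
# source table changes and must be maintained (the topic of Section 3.3).
sales.append({"id": 4, "region": "north", "amount": 50})
assert len(query_virtual_view()) == 3    # sees the new record
assert len(mv_north_sales) == 2          # stale until refreshed
```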

Using Derived Tables for Schema Transformations and Materialized View Creation

The blocking MV creation and schema transformation methods described in the previous sections may take minutes or more for tables with large amounts of data. If either of these operations is required, the database administrator is forced to choose between unavailability while performing the operation, or to not perform it at all. Both choices may, however, be unacceptable. This is especially the case when the database system has high availability requirements.

A derived table (DT) is, as the name suggests, a database table containing records derived from other tables6 (Elmasri and Navathe, 2004). A table “Sales Report” that stores a one-year history of all sales by all employees is an example of a DT. Hence, a materialized view is obviously one type of DT. A less intuitive application of DTs is to redirect operations from source

5 Materialized Views are called Indexed Views by Microsoft (Microsoft TechNet, 2006) and Materialized Query Tables by IBM (IBM Information Center, 2007).

6 Throughout this thesis, the tables that records are derived from will be called source tables.


tables to derived tables and thereby perform a schema transformation. A method to create DTs is therefore likely to be usable for both operations.

Both schema transformations and Materialized Views are defined by a query (Microsoft TechNet, 2006; Lorentz and Gregoire, 2003a; Løland, 2003), and we therefore focus on DT creation using relational operators7, also called relational algebra operations. Relational operators can be categorized in two groups: non-aggregate and aggregate operators (Elmasri and Navathe, 2004). The non-aggregate operators are cartesian product, various joins, projection, union, selection, difference, intersection and division. Aggregate operators are mathematical functions that apply to collections of records. Both non-aggregate and aggregate operators can be used to define schema transformations and MVs. However, aggregate operators are typically not used without non-aggregate operators (Alur et al., 2002), and we therefore consider non-aggregate operators the best starting point for DT creation.

The main topic of this thesis is to develop a method that solves the unavailability problem of creating derived tables, using common relational operators. Due to time constraints, we will focus on six operators: full outer equijoin (one-to-many and many-to-many relationships), projection, union, selection, difference and intersection8. Full outer equijoin is chosen because it can later be reduced to any inner/left/right join simply by removing records from the result, and because equality is the most commonly used comparison type in joins (Elmasri and Navathe, 2004). Furthermore, in terms of derived table creation, cartesian product is actually a simpler full outer join in which no attribute comparison is performed. The final non-aggregate operator, division, can be expressed in terms of the other operators, and is therefore considered less important.
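As an illustration of why full outer equijoin is a convenient starting point, the sketch below (hypothetical data and helper; not the thesis prototype) computes a full outer equijoin of two record lists and then derives the inner and left outer joins from it simply by discarding null-padded rows.

```python
def full_outer_equijoin(left, right, key):
    """Full outer equijoin on `key`, returned as (left_rec, right_rec)
    pairs; an unmatched side is represented by None."""
    pairs = []
    matched = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        if hits:
            for i in hits:
                pairs.append((l, right[i]))
                matched.add(i)
        else:
            pairs.append((l, None))          # left record without a match
    pairs.extend((None, right[i]) for i in range(len(right)) if i not in matched)
    return pairs

emp  = [{"dept": 1, "name": "Ann"}, {"dept": 3, "name": "Bob"}]
dept = [{"dept": 1, "title": "R&D"}, {"dept": 2, "title": "HR"}]

full = full_outer_equijoin(emp, dept, "dept")
# Inner join: drop all null-padded rows from the full outer join result.
inner = [(l, r) for l, r in full if l is not None and r is not None]
# Left outer join: keep every row that has a left record.
left_join = [(l, r) for l, r in full if l is not None]
```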

The suggested method must solve any problem associated with utilizing the DTs as materialized views and in schema transformations. To gain insight into the field, the work must include a thorough study of existing solutions to the described and closely related problems. Existing DBMS functionality should be used to the greatest possible extent to ease the integration of the method into existing DBMSs. Since the goal is to develop a method that incurs little performance degradation to concurrent transactions, the performance implications of the method need to be tested.

7 Relational operators are the building blocks used in queries (Elmasri and Navathe, 2004).

8 Due to naming conventions in the literature (Ronstrom, 1998), we use the names vertical merge and split, horizontal merge and split, difference and intersection, respectively, when these relational operators are used in DT creation.


1.2 Research Questions

Based on the discussion in the previous section, the main research question of the thesis is:

How can we create derived tables and use these for schema transformation and materialized view creation purposes while incurring minimal performance degradation to transactions operating concurrently on the involved source tables?

We realize that this is a research question with many aspects. To be able to answer it, the research question is therefore refined into four key challenges:

Q1: Current situation
What is the current status of related research designed to address the main research question or part of it?

Q2: System Requirements
What DBMS functionality is required for non-blocking DT creation to work?

Q3: Approach and solutions
How can derived tables be created with minimal performance degradation, and be used for schema transformation and MV creation purposes?

• How can we create derived tables using the chosen six relational operators?

• What is required for the DTs to be used a) as materialized views? b) for schema transformations?

• To what extent can the solution be based on standard DBMS functionality and thereby be easily integrable into existing DBMSs?

Q4: Performance
Is the performance of the solution satisfactory?

• How much does the proposed solution degrade performance for user transactions operating concurrently?

• With the inevitable performance degradation in mind: under which circumstances is the proposed solution better than a) other solutions? b) performing the schema transformation or MV creation in the traditional, blocking way?


1.3 Research Methodology

Denning et al. divide computer science research into three paradigms: theory, abstraction and design (Denning et al., 1989).

Theory is rooted in mathematics and aims at developing validated theories. The paradigm consists of four steps:

1. characterize objects of study

2. hypothesize possible relationships among them, i.e., form a theorem

3. determine whether the relationships are true, i.e., proof

4. interpret results

Abstraction is rooted in the experimental scientific method. In this method, a phenomenon is investigated by collecting and analyzing experimental results. It consists of four steps:

1. form a hypothesis

2. construct a model and make a prediction

3. design an experiment and collect data

4. analyze results

Design is rooted in engineering, and aims at constructing a system that solves a problem.

1. state requirements

2. state specifications

3. design and implement the system

4. test the system

The research presented in this thesis fits naturally into the Design paradigm. The research aims at solving the problem that creation of derived tables is a blocking operation.

For our suggested solution to be useful, the method must fit into common DBMS design. Hence, the first step in solving the research question is to understand commonly used DBMS technology that is somehow related to the research question. This will enable us to state requirements.

The next step is to state specifications for a method that can be used to create derived tables in a non-blocking way. The method should be designed


to fit into existing DBMSs to the greatest possible extent, and to degrade performance as little as possible.

To verify validity, and to test the actual performance degradation, a DBMS and the suggested method are then designed and implemented. The implementation is then subjected to thorough performance testing.

1.4 Organization of this thesis

The thesis is divided into three parts with different focus. The focus in Part I is on the background for the research. This includes the research question, required functionality and a survey of related work. Part II presents our solution to the derived table creation problem, and shows how the DTs can be used to transform the schema and to create materialized views. In Part III, we discuss the results of experiments on a prototype DBMS. This part also includes a discussion of the research contributions, and suggests further work.

Part I - Background and Context contains an introduction to derived table creation. The focus in this part is on research from the literature and existing systems that is relevant to our solution of the research question and suggestions for further work.

Chapter 1 contains this introduction. The chapter states the motivation for the research, and the research methodology that is used in the work.

Chapter 2 introduces the DBMS fundamentals required to perform non-blocking derived table creation.

Chapter 3 is a survey of existing solutions to the non-blocking DT creation problem and related problems.

Part II - DT Creation Framework presents our solution for non-blocking creation of derived tables, and how these derived tables can be used for schema transformations and materialized view creation.

Chapter 4 introduces our framework for non-blocking derived table creation.

Chapter 5 identifies problems that are encountered when derived tables are created as described in Chapter 4. The chapter also shows how these problems can be solved.


Chapter 6 describes in detail how the DT creation framework presented in Chapter 4 can be used for non-blocking creation of derived tables using the six relational operators that have been chosen. The chapter also describes what needs to be done to use these DTs for schema transformations or as materialized views.

Part III - Implementation and Testing presents the design of our prototype DBMS. The prototype is capable of performing non-blocking DT creation as described in Part II. Results from performance testing of this prototype are also presented.

Chapter 7 evaluates three alternatives for implementation of the DT creation method.

Chapter 8 describes the design of a prototype DBMS capable of performing the DT creation method developed in Part II.

Chapter 9 discusses the experiment types and the results of performing the required experiments on the prototype.

Chapter 10 contains a discussion of the results of the research.

Chapter 11 presents an overall conclusion and the contributions of the thesis.


Chapter 2

Derived Table Creation Basics

This chapter describes basic Database Management System concepts that are used by or are otherwise relevant to our non-blocking DT creation method. A thorough description of DBMSs is out of the scope of this thesis. For further details, the reader is referred to one of the many text books on the subject, e.g. “Database Systems: The Complete Book” (Garcia-Molina et al., 2002) or “Fundamentals of Database Systems” (Elmasri and Navathe, 2004).

2.1 Database Systems - An Introduction

A database (DB) is a collection of data items1, each having a value (Bernstein et al., 1987). All access to the database goes through the Database Management System (DBMS). As illustrated in Figure 2.1, a database managed by a DBMS is called a database system (Elmasri and Navathe, 2004).

Database access is performed by executing special transaction programs, which have a defined start and end, set by start and either commit or abort operation requests. A commit ensures that all operations of the transaction are executed and safely stored, while an abort call removes all effects of the transaction (Elmasri and Navathe, 2004).

In its most common use, transactions have four properties, known as the “ACID” properties (Gray, 1981; Haerder and Reuter, 1983):

Atomicity - The transaction must execute successfully, or must appear not to have executed at all. This is also referred to as the “all or nothing” property of transactions. Thus, the DBMS must be able to undo all operations performed by a transaction that is aborted.

1 The data items are called tuples or rows in the relational data model. Internally in database systems, they are called records (Garcia-Molina et al., 2002). To avoid confusion, the term “record” will be used throughout this thesis.


[Figure 2.1: Conceptual Model of a Database System. The user accesses the database only through the DBMS; the database and the DBMS together form the database system.]

Consistency - A transaction must always transform the database from one consistent state2 to another.

Isolation - It should appear to each transaction that other transactions either appeared before or after it, but not both (Gray and Reuter, 1993).

Durability - The results of a transaction are permanent once the transaction has committed.

Broadly speaking, the ACID properties are enforced mainly by concurrency control, which is used to achieve isolation, and recovery, which is used for atomicity and durability. Consistency means that transactions must preserve constraints. The following two sections give a brief introduction to common concurrency control and recovery mechanisms.

2 A consistent state is a state where database constraints are not broken (Garcia-Molina et al., 2002).


2.2 Concurrency Control

In a database system where only one transaction is active at any time, concurrency control is not needed. In this scenario, the operations from each transaction are executed in serial (i.e. sequential) order, and two transactions can never interfere with each other. The isolation property is therefore implicitly guaranteed. However, this scenario is seldom used since the database system is normally only able to use small parts of the available resources at any time (Bernstein et al., 1987).

When concurrent transactions are allowed, the operations of the various transactions must be executed as if the execution was serial (Garcia-Molina et al., 2002). A sequence, or history, of operations that gives the same result as serial execution is called serializable. It is the responsibility of the “Scheduler”, or “Concurrency Controller”, to enforce serializable histories, which in turn is a guarantee for isolation (Bernstein et al., 1987).

Schedulers

Schedulers can be either optimistic or pessimistic. With the optimistic strategy, transactions perform operations right away without first checking for conflicts. When the transaction requests a commit, however, its history is checked. If the transaction has been involved in any non-serializable operations, the transaction is forced to abort. Timestamp ordering, serialization graph testing and locking can all be used for optimistic scheduling (Bernstein et al., 1987).

The most common form of scheduling is pessimistic, however. With this strategy, transactions are not allowed to perform operations that will form non-serializable histories in the first place. Thus, the scheduler has to check every operation to see if it conflicts with any operation executed by another currently active transaction. When a conflict is found, the scheduler may decide to either delay or reject the operation (Bernstein et al., 1987).

The pessimistic Two Phase Locking (2PL) strategy has become the de facto scheduling standard in commercial DBMSs. It is, e.g., used in Oracle Database 10g (Cyran and Lane, 2003). In 2PL, a lock must be set on a data item before a transaction is allowed to operate on it. Two lock types, shared and exclusive, are typically used. The idea is that multiple transactions should be allowed to concurrently read the same record, while only one transaction should at any time be allowed to write to a record. Thus, read operations are allowed if the transaction has a shared or exclusive lock on the record, while write operations are allowed only if the transaction has an exclusive lock on it.


As the name indicates, 2PL works in two phases: locks are acquired during the first phase, and released during the second phase. This implies that a transaction is not allowed to set new locks once it has started releasing locks. Unless the transaction pre-declares all operations it will execute, the scheduler does not know if a transaction is done operating on a particular object, or if it will need more locks in the future. Locks are therefore typically not released until the transaction terminates. This is known as Strict 2PL (Garcia-Molina et al., 2002). The derived table creation method described in this document assumes that 2PL is used, although it may be tailored to suit other scheduling strategies as well.
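The shared/exclusive rule and the release-at-termination behaviour of Strict 2PL can be sketched as follows (a minimal illustration, not the scheduler of the prototype; deadlock handling, lock upgrades and wait queues are omitted):

```python
from collections import defaultdict

class LockManager:
    """Strict 2PL sketch: locks accumulate per transaction and are
    released all at once when the transaction terminates."""
    def __init__(self):
        self.shared = defaultdict(set)   # record id -> holders of S locks
        self.exclusive = {}              # record id -> holder of X lock

    def acquire_shared(self, txn, rec):
        holder = self.exclusive.get(rec)
        if holder is not None and holder != txn:
            return False                 # conflict: delay or reject
        self.shared[rec].add(txn)
        return True

    def acquire_exclusive(self, txn, rec):
        if self.shared[rec] - {txn}:
            return False                 # other readers hold S locks
        holder = self.exclusive.get(rec)
        if holder is not None and holder != txn:
            return False                 # another writer holds the X lock
        self.exclusive[rec] = txn
        return True

    def release_all(self, txn):
        # Second phase: release everything at commit/abort (strictness).
        for holders in self.shared.values():
            holders.discard(txn)
        self.exclusive = {r: t for r, t in self.exclusive.items() if t != txn}
```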

Two-phase commit is a commonly used protocol for commit handling in distributed DBMSs, used to ensure that the transaction either commits on all nodes or aborts on all nodes. The protocol works in two phases: in the prepare phase, the transaction coordinator asks all transaction participants if they are ready to commit. If they all agree to commit, the coordinator completes the transaction by sending a commit message to all participants (Gray, 1978). This is called the commit phase.
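A bare-bones sketch of the coordinator's side of the protocol (hypothetical participant interface with prepare/commit/abort methods; a real implementation must also handle participant failures and timeouts):

```python
def two_phase_commit(coordinator_log, participants):
    """2PC round; each participant exposes prepare(), commit(), abort()."""
    # Phase 1 (prepare): every participant votes yes (True) or no (False).
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    coordinator_log.append(decision)     # the decision must be durable first
    # Phase 2 (commit): broadcast the outcome to all participants.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
    return decision
```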

2.3 Recovery

In a database system, failure may occur on three levels. Transaction failure happens when a transaction either chooses to, or is forced to, abort. System failure happens when the contents of volatile storage is lost or corrupted. A power failure is a typical reason for such failures. Media failure happens when the contents of non-volatile storage is either lost or corrupted, e.g. because of a disk crash. In what follows, “memory” and “disk” will be used instead of volatile and non-volatile storage, respectively.

Physical Logging

Recovery managers are constructed to correct the three types of failure. The idea behind almost all recovery managers is that information for how to recover the database to the correct state must be stored safely at any time. This information is typically stored in a log, which can either be physical, logical or physiological (Haerder and Reuter, 1983; Bernstein et al., 1987). Physical logging, or value logging, writes the before and after value of a changed object to the log (Gray, 1978). The physical unit that is logged is typically a disk block or a record. Assuming that records are smaller than disk blocks, the former produces drastically higher log volumes than the latter (Haerder and Reuter, 1983). Since the log records contain before and after values of the


[Figure 2.2: Two records in the same disk block are updated by different transactions, T1 and T2 (history: LSN 11: T1 sets R1=10; LSN 12: T2 sets R2=15; LSN 13: T2 commits; LSN 14: T1 aborts). After T1 aborts, there is no valid state identifier for the block.]

changed object, logged operations are idempotent, which means that redoing the operation multiple times yields the same result as redoing it once.
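A small sketch of value logging (invented record and log layout) showing both the idempotent redo and the before-image undo described above:

```python
# A physical (value) log record stores before- and after-images.
log = [
    {"lsn": 11, "rec": "R1", "before": 3, "after": 10},   # update by T1
    {"lsn": 12, "rec": "R2", "before": 6, "after": 15},   # update by T2
]

db = {"R1": 3, "R2": 6}

def redo(entry):
    db[entry["rec"]] = entry["after"]    # simply install the after-image

def undo(entry):
    db[entry["rec"]] = entry["before"]   # install the before-image

for e in log:
    redo(e)
    redo(e)   # idempotent: redoing twice gives the same state as once
assert db == {"R1": 10, "R2": 15}

undo(log[0])  # undoing T1's update during its abort
assert db == {"R1": 3, "R2": 15}
```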

Logical Logging

Logical logging, or operation logging, logs the operation that is performed instead of the before and after value (Haerder and Reuter, 1983). This strategy produces much smaller log volumes than the physical methods (Bernstein et al., 1987). A Log Sequence Number (LSN) is assigned to each log record, and data items are tagged with the LSN of the latest operation that has changed them. This is done to ensure that changes are applied only once to each record since logically logged operations are not idempotent in general. LSNs may be assigned at block (block state identifier, BSI) or record (record state identifier, RSI) level (Bernstein et al., 1987). The former requires slightly less disk space whereas the latter is better suited in replicated database systems based on log redo since this allows for different physical organization at the different nodes (Bratsberg et al., 1997a).
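Since a logical operation such as “increase by 5” is not idempotent, redo must be filtered by the state identifier. A minimal sketch with a record-level state identifier (RSI), using invented structures:

```python
# Each record carries the LSN of the last operation applied to it (its RSI).
db = {"X": {"value": 10, "lsn": 7}}

# A logical log record describes the operation, not the resulting value.
log = [{"lsn": 8, "rec": "X", "op": lambda v: v + 5}]   # "increase by 5"

def redo_logical(entry):
    rec = db[entry["rec"]]
    # Apply only if the record does not already reflect this log record;
    # without this test, replaying the log would add 5 twice.
    if rec["lsn"] < entry["lsn"]:
        rec["value"] = entry["op"](rec["value"])
        rec["lsn"] = entry["lsn"]

for e in log:
    redo_logical(e)
    redo_logical(e)          # second replay is filtered out by the LSN test
assert db["X"] == {"value": 15, "lsn": 8}
```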

Two common methods to increase the degree of concurrency are fine-granularity locking and semantically rich locks. Fine-granularity locks are normal locks that are set on small data items, i.e. records (Weikum, 1986; Mohan et al., 1992). Semantically rich locks allow multiple transactions to lock the same data item provided that the operations are commutative (Korth, 1983). Operations that commute may be performed in any order, e.g., “increase” and “decrease”. When these methods are combined with logical logging, Compensating Log Records (Crus, 1984) are required (Mohan et al., 1992). The reason for this is that there is no correct LSN that can be used as BSI after certain undo operations, as illustrated in Figure 2.2; after the abort of transaction 1, the LSN of the block cannot be set to what it was before the update because that would not reflect the change performed by


transaction 2. Neither can the LSN of the abort log record be used, since this invalidates the one-to-one correspondence between updates and log records (Gray and Reuter, 1993). Thus, Compensating Log Records (CLRs) (Crus, 1984) are written to the log when an undo operation is performed due to any of the failure scenarios presented in Section 2.3. The CLR describes the undo action that takes place (Gray, 1978). It also keeps the LSN of the log record that was undone. E.g., if the insert of record X is undone, a CLR describing the deletion of X is written to the log. LSNs are assigned to CLRs, thus the state identifier of a record or disk block will increase even when undo operations are performed.
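The effect of CLRs on state identifiers can be sketched as follows (hypothetical log layout): undoing an update appends a new log record with a higher LSN instead of moving the state identifier backwards, which is exactly what resolves the Figure 2.2 problem.

```python
log = []
next_lsn = 0

def append(entry):
    """Append a log record and return its newly assigned, ascending LSN."""
    global next_lsn
    next_lsn += 1
    log.append({"lsn": next_lsn, **entry})
    return next_lsn

# Normal update by T1, then its undo when T1 aborts:
update_lsn = append({"txn": "T1", "op": "set R1=10", "undo": "set R1=3"})
# The CLR records the compensation and points at the record it undoes.
clr_lsn = append({"txn": "T1", "op": "set R1=3", "compensates": update_lsn})

# The block's state identifier moves forward (to clr_lsn), never backward,
# so changes made by other transactions in the same block stay reflected.
assert clr_lsn > update_lsn
```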

Logical logging is considered better than physical logging because of the reduced log volume and because the state identifiers reduce recovery work. However, it has one major flaw: the logged operations are not action consistent3 since they are not atomic. One insert may, e.g., require that large parts of a B-tree are restructured. This can be solved by using a two-level recovery scheme where the low-level system provides action consistent operations to the high-level logical logging scheme (Gray and Reuter, 1993). Shadowing is an example of a low-level scheme. With this method, blocks are copied before a change is applied. The method is complex and requires locks to be set on blocks since this is the granularity of the copies (Gray and Reuter, 1993).

Physiological Logging

Physiological logging (Mohan et al., 1992), also called physical-to-a-page, logical-within-a-page, is a compromise between physical and logical logging. It uses logical logging to describe operations on the physical objects, i.e. blocks. In the shadowing strategy, non-atomic operations are executed by mini-transactions. The mini-transactions consist of atomic operations, each of which is physiologically logged. Thus, the log records are small, while the problems of logical logging are avoided (Gray and Reuter, 1993).

(No-)Steal and (No-)Force Cache Managers

The log may provide information to undo or redo an operation. Which kind of information is required for recovery to work depends heavily on the strategy of another DBMS component, the cache manager, which is responsible for copying data items between memory and disk. Two parameters are of particular relevance to the recovery manager. The first determines whether

3 This means that one logical operation may involve multiple physical operations. Hence, a database system may crash when only parts of a logically logged operation have been performed (Gray and Reuter, 1993).


or not updates from uncommitted transactions may be written to disk. If uncommitted updates are allowed on disk, the cache manager uses a steal strategy; otherwise a no-steal strategy is used (Gray, 1978). Since memory is almost always a limited resource, stealing gives the cache manager valuable freedom in choosing which data items should be moved out. The problem is that if a failure occurs, data items on disk may have been changed by transactions that have not committed. The atomicity property requires that the effects of these transactions should be removed. Hence, if stealing is allowed, undo information of uncommitted writes must be forced to the log before the updated records are written to disk.

The second parameter determines if data items updated by a transaction must be forced to disk or not (i.e. no-force) before the transaction is committed (Gray, 1978). If force is used, the disk must be accessed during critical parts of transaction execution. This may lead to inefficient cache management. If no-force is used, redo information of all committed changes must be written to the log, and the log must then be forced to disk. This is known as Force Log at Commit (Mohan et al., 1992).

It is common to use a steal/no-force cache manager since this provides the maximum freedom and highest performance. This is also the case for a well-known recovery strategy, ARIES (Mohan et al., 1992), which is briefly described in the next section.

The ARIES Recovery Strategy

ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) (Mohan et al., 1992) is a recovery method for the steal/no-force cache manager strategy described above. ARIES uses 2PL for concurrency control. It is in common use in many commercial DBMSs, e.g. SQL Server 2005 (Microsoft TechNet, 2006) and IBM DB2 version 9 (IBM Information Center, 2006). The principles are also used in the derived table creation method, which is the topic of this thesis.

ARIES uses the Write-Ahead Logging (WAL) protocol (Gray, 1978), which requires that a log record describing a change to a data item is written to disk before the change itself is written. One sequential log, containing both undo and redo log records, is used. A unique, ascending Log Sequence Number (LSN) is assigned to each record in this log. The LSN is also used to tag blocks, so that a disk block is known to reflect a logged change if and only if the LSN of the disk block is equal to or greater than that of the log record. Log records are initially added to the volatile part of the log file, and are forced to disk either when a commit request is processed or when the cache manager writes changed data items to disk.
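The interplay of WAL, steal and no-force amounts to two flush rules in the cache manager, sketched below (the page, log and transaction objects are assumed to expose the named fields and methods; this is an illustration of the rules, not ARIES itself):

```python
class WalCacheManager:
    """Sketch of the two WAL-related flush rules."""
    def __init__(self, log):
        self.log = log                   # exposes flushed_lsn and flush(lsn)

    def write_page_to_disk(self, page):
        # Steal is allowed: a page with uncommitted updates may be written,
        # but WAL requires the log records describing the page's changes to
        # be durable before the page itself is.
        if self.log.flushed_lsn < page.page_lsn:
            self.log.flush(page.page_lsn)
        page.write_to_disk()

    def commit(self, txn):
        # No-force: dirty pages may stay in memory, but Force-Log-at-Commit
        # makes the transaction's redo information durable before commit.
        self.log.flush(txn.last_lsn)
        txn.release_locks()
```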


The ARIES protocol can be used with both logical and physiological logging. It supports fine-granularity locks and semantically rich locks (Mohan et al., 1992).

2.4 Record Identification Policy

The DBMS needs a way to uniquely identify records so that transactional operations and recovery work are applied to the correct record. Each record is therefore assigned a Record Identifier (RID). The mapping from RID to the physical record is called the access path. There are four identification techniques (Gray and Reuter, 1993): Relative Byte Address, Tuple Identifier, Database Key, and Primary Key.

Physical Identification Policies

Relative Byte Addresses (RBA) consist of a block address and an offset, i.e. the byte number within that block. RBA is fast since it points directly to the correct physical address. Physical location is not very stable, however. E.g., an update may increase a record's size, which may change the offset or block. An address that is as unstable as this is not well suited as a RID (Gray and Reuter, 1993).

Tuple Identifiers (TID) consist of a block address and a logical ID within the block. Each block has an index used to map the ID to the correct offset. Hence, a record may be relocated within a block without changing the RID. A pointer to the new address is used if a record is relocated to another block. When a pointer is followed, the access path to the record becomes more costly, however. Hence, relocated records should eventually receive a new TID reflecting the actual location. This reorganization must be executed online, i.e. in parallel to normal processing, and represents an overhead. This seems to be the most common record identification technique; it is used by, e.g., IBM DB2 v9 (IBM Information Center, 2006), SQL Server 2005 (Microsoft TechNet, 2006) and Oracle 10g (Cyran and Lane, 2003).

Logical Identification Policies

Database Keys are unique, ascending integers assigned to records by the DBMS. A translation table maps database keys to the physical location of the records. The database key works as an index into the array-like translation table, and therefore requires only one block access. This mapping ensures that a record can be relocated to any block without having to change its RID. The extra lookup incurs an access path overhead, however.


Since all records in a relational database system are required to have a unique primary key4, the primary keys may serve as RIDs as well. Addressing is indirect, but in contrast to the previous method, primary keys cannot be used as an index into a translation table since primary keys do not have monotonically increasing values. Thus, a B-tree is used to map the primary key to the physical location of the record. The access path is approximately as costly as for database keys, but has a number of advantages. These include that access to records is often done through primary keys, so the mapping from primary key to record must be done either way. The uniqueness of primary keys must also be guaranteed by the DBMS, which is efficient to do when the primary key is used as RID (Gray and Reuter, 1993). This technique is used, e.g., in Oracle 10g if the table is index-organized (Cyran and Lane, 2003).
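As an illustration, an index-organized table stores its records directly in a B-tree on the primary key, so the primary key doubles as the RID. A minimal sketch in Oracle-style DDL, with table and column names of our own choosing:

-- Sketch: an index-organized table in Oracle-style DDL. The records
-- are stored in a B-tree on the primary key, so the primary key also
-- serves as the record identifier (RID).
CREATE TABLE Person (
  FirstName VARCHAR2(40),
  Surname   VARCHAR2(40),
  Address   VARCHAR2(60),
  ZipCode   CHAR(4),
  CONSTRAINT pk_person PRIMARY KEY (FirstName, Surname)
) ORGANIZATION INDEX;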

When creating derived tables, the physical location of records residing in DTs is not the same as in the source tables. The blocks are obviously different. The location within the blocks may also be different since a relational operator is applied. Hence, the DT creation method described in this thesis assumes that a logical record identification scheme is used, i.e. either Database Keys or Primary Keys. Using a physical identification policy, i.e. RBA or TID, is also possible, but requires an additional address mapping table.

4Either supplied by the user or generated by the system (Gray and Reuter, 1993).


Chapter 3

A Survey of Technologies Related to Non-Blocking Derived Table Creation

This chapter describes the state of the art in non-blocking creation of derived tables (DTs). The aim of the survey is to evaluate the functionality and cost of existing methods used for this purpose. Some of the ideas presented here will later be used in our non-blocking DT creation method. This will be explicitly commented on.

Three related areas of research are discussed. First, a schema transformation method that can be used for some of the relational operators is described. To the author’s knowledge, this is the only research on non-blocking transformations in relational database systems in the literature. Next, we describe fuzzy copying, which is a method for non-blocking creation of DTs, but without the ability to apply relational operators. Third, maintenance techniques for Materialized Views (MVs) are discussed. The motivation for this is that an MV is a type of DT, and some of the research in MV maintenance is therefore applicable to our suggested DT creation method. Finally, methods for schema transformations and DT creation available in existing DBMSs are described.

3.1 Ronstrom’s Schema Transformations

Ronstrom (Ronstrom, 2000) presents a non-blocking method that uses both a reorganizer and triggers within users’ transactions to perform schema transformations, called schema changes by Ronstrom (Ronstrom, 1998). It is argued that there are three dimensions to schema transformations: soft vs. hard schema changes, simple vs. complex schema changes, and simple vs. complex conversion functions (Ronstrom, 1998). A summary can be found in Table 3.1.

The soft vs. hard schema change dimension determines whether or not new transactions are allowed to use the new schema before all transactions on the old schema have terminated. Thus, with soft schema changes, transactions that were started before the new schema was created continue processing on the old schema while new transactions start using the transformed one (Ronstrom, 1998). With hard schema changes, new transactions are not allowed to start processing on the affected parts of the transformed schema until all transactions on the old schema have terminated.

Soft schema changes are desirable since, with this strategy, new transactions are not blocked while the old transactions finish processing. In some cases, however, soft schema changes cannot be used. This happens whenever the new schema does not contain enough information to trigger updates back to the old schema, i.e. when a mapping function from the transformed attributes to the old attributes does not exist (Ronstrom, 1998).

The second dimension of schema changes divides transformations into simple and complex schema changes. Simple schema changes are short lived, and typically involve changes to the schema description only (Ronstrom, 1998). Complex schema changes involve many records and take considerable time. With this method, complex schema changes should not be executed as one single blocking transaction due to their long execution time (Ronstrom, 2000). Instead, complex schema changes are organized using SAGAs1 (Garcia-Molina and Salem, 1987).

The third dimension of schema changes is that of simple vs. complex conversion functions. In simple conversions, all the information needed to apply an operation in the transformed schema is found in the operated-on record in the old schema. Complex conversions, on the other hand, need information from other records before the operation can be applied. This information may be found in other tables (Ronstrom, 2000). Complex conversions can only be performed by complex schema changes (Ronstrom, 2000).

The following sections describe how transformations are performed in Ronstrom’s method. The description divides transformations into simple and complex changes, i.e. along the second dimension. Even though not all of the complex schema changes actually create DTs, all changes that can be performed by the method are presented for readability. A thorough cost analysis of the schema transformation operators is presented in Section 3.1.3.

1SAGA is a method to organize long running transactions (Garcia-Molina and Salem, 1987).


Schema Change
  Soft      New transactions are allowed to start accessing the new tables while the old transactions are accessing the old tables.
  Hard      Transactions that try to access the new tables are blocked until all transactions accessing the old tables have completed.
  Simple    Short lived; typically only changes the schema description; executed as one transaction.
  Complex   Long lived; involves many records; executed using triggers and SAGA transactions.

Conversion Function
  Simple    A record can be added to the new schema by reading only the operated-on record in the old schema.
  Complex   Adding a record to the new schema may require information from multiple records in the old schema. Always executed as a complex schema change.

Table 3.1: The three dimensions of Ronstrom’s schema transformations.

3.1.1 Simple Schema Changes

Simple schema changes only change the schema description of the database. The changes are organized in a way similar to the two-phase commit protocol, described in Section 2.2. First, the system coordinator sends the new schema description to all nodes. If the transformation is hard, each node will wait for ongoing transactions to finish processing. The nodes then lock the involved parts of the schema, update the schema description and log the change before acknowledging the change request. When all nodes have acknowledged the change, the coordinator sends commit to all nodes, including a new schema version number. New transactions will from now on use the transformed schema.

Examples of simple schema changes include adding and dropping a table, adding an attribute with a default value, and dropping an attribute or index. None of these operations involve creation of derived tables.
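In SQL terms, simple schema changes correspond to plain DDL statements like the following (a sketch with hypothetical table and column names):

-- Simple schema changes: each statement only changes the schema
-- description, not the stored records.
CREATE TABLE Department (DeptNo INT PRIMARY KEY, Name VARCHAR(40));
ALTER TABLE Employee ADD COLUMN Bonus INT DEFAULT 0;  -- add attribute with default
ALTER TABLE Employee DROP COLUMN Bonus;               -- drop an attribute
DROP TABLE Department;                                -- drop a table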

3.1.2 Complex Schema Changes

Schema changes involving many records are considered complex in Ronstrom’s method. This includes adding attributes, with values derived from other attributes, to a table. It also includes adding a secondary index. Additionally, both horizontal and vertical merge and split of tables can be performed (Ronstrom, 1998). In terms of relational operators, horizontal merge corresponds to the union operator, while vertical merge corresponds to the left outer join operator. The split methods are inverses of the merge methods.

All complex schema changes go through three phases. The schema is first changed by adding the necessary tables, attributes, triggers, indices and constraints (Ronstrom, 2000). Second, the involved tables are operated on by reading and performing the necessary operations one record at a time. The required operations depend on the transformation being performed. Involved tables are left unlocked for the entire transformation, whereas the records are locked temporarily. To ensure that the transformation does not lock records for long periods of time, only one record is operated on per transaction. All these transactions, each operating on one record, are organized by using SAGAs. While transactions read records in the source tables and perform the operations necessary for the transformation, triggers ensure that insert, delete and update operations in the old schema are performed in the new schema as well (Ronstrom, 1998).

The third phase is started once the SAGA organized transactions have completed. If the schema change is soft, new user transactions start using the new schema immediately, while active transactions are allowed to finish execution on the old schema. Since both schemas are in use, triggers have to forward operations both from the old to the new schema and vice versa. If the schema change is hard, transactions are not allowed to use the new schema until all transactions on the old schema have completed. When all transactions that were using the old schema have terminated, obsolete triggers, tables, attributes, indices and constraints are removed (Ronstrom, 2000).

In what follows, all complex schema changes that can be performed by Ronstrom’s method are described in detail. Ordered by increasing complexity, these are horizontal merge and split, and vertical merge and split transformations.

Horizontal Merge Schema Change

In Ronstrom’s schema transformation framework, horizontal merge corresponds to the UNION relational operator without duplicate removal. The transformation is performed by creating a new table into which records from both source tables are inserted. Hence, this is a derived table.

As illustrated in Figure 3.1, records from both source tables may have identical primary keys. This is a problem because two records with the same primary key are not allowed to coexist in the new table. It may be solved by using a non-derived primary key, e.g. an additional attribute with an auto-incremented number, in the new table. Alternatively, the primary key of the new table may be a combination of the primary key from the old schema and an attribute identifying which table the record belonged to (Ronstrom, 2000).

Figure 3.1: Horizontal Merge. Two tables, “Vinyl” and “CD”, are merged into one new table, “Record”. The primary key of both tables is <artist, album>. The new table includes an attribute “FromTable” so that identical primary key values from the two source tables may coexist.

The method starts by creating the new table. Foreign keys are then added both to the old tables and to the new table. Since duplicates are not removed, there is a one-to-one relationship between records in the old and the new schema. Thus, triggers in both schemas only have to operate on one record in the other schema.

Update and delete operations in one of the source tables trigger the same operation on the record referred to in the new table. For soft transformations, updates and deletes in the new table have similar triggers. Insert operations in a source table simply trigger an equal insert into the new table. Inserts into the new table should trigger inserts into one of the old tables as well. If the new table contains an attribute that identifies which old table it should belong to, this is straightforward. If this is not the case, e.g. because a non-derived key is used in the new table, the transformation cannot be performed softly.

In the second step, records from the old tables are read and inserted into the new table one record at a time. When all records have been copied, new transactions are given access to the new schema. The old tables are deleted once all old transactions have finished processing.
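As an illustration, the forward trigger for inserts into one of the source tables of Figure 3.1 could look roughly as follows (MySQL-style trigger syntax; this is our sketch, not code from Ronstrom):

-- Sketch of a forward trigger for the horizontal merge in Figure 3.1.
-- An insert into the source table “Vinyl” is mirrored in the merged
-- table “Record”; the FromTable attribute lets identical
-- <artist, album> keys from “Vinyl” and “CD” coexist.
CREATE TRIGGER vinyl_forward_insert
AFTER INSERT ON Vinyl
FOR EACH ROW
  INSERT INTO Record (Artist, Album, FromTable)
  VALUES (NEW.Artist, NEW.Album, 'Vinyl');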


Figure 3.2: Horizontal Split. One table, “Employee”, is split into two tables based on salary.

Horizontal Split Schema Change

Horizontal split is the inverse of horizontal merge; it splits one table into two or more tables by copying records to the new tables depending on a condition. An example transformation is that of splitting “Employee” into “High Salary Employee” and “Low Salary Employee” based on conditions like “salary >= $40,000” and “salary < $40,000”. This transformation is illustrated in Figure 3.2. Only horizontal split transformations where every source table record matches the condition of exactly one new table are described by Ronstrom (Ronstrom, 2000). Because of this, records in the old table refer to exactly one record in the new schema, thus simplifying the transformation.

The new tables are first added to the schema; thus, this method creates derived tables. Foreign keys, initially set to null, are then added to both the old and the new tables. Once a record has been copied to the new schema, the foreign keys in both schemas are updated to point to the record in the other schema.

The transformation can easily be made soft by adding triggers to both the old and the new tables. Because of the one-to-one relationship between records in the old and new schema, these triggers are straightforward: deletes and updates perform the same operation on the record referred to by the foreign key. Insert operations into the old table trigger an insert into the new table that has a matching condition. Ronstrom does not discuss how to handle an insert into a new table if the old table already contains a record with the same primary key. In all other cases, inserts into the new table simply result in an insert into the old table as well.

When the triggers have been added, the transformation is executed as described in the general method, by copying the data in the old table one record at a time.
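For the split in Figure 3.2, the forward trigger must route each record according to the split condition. A sketch in MySQL-style syntax (table and column names are ours):

-- Sketch of a forward trigger for the horizontal split in Figure 3.2.
-- Each inserted Employee record is routed to the new table whose
-- condition it matches.
DELIMITER //
CREATE TRIGGER employee_forward_insert
AFTER INSERT ON Employee
FOR EACH ROW
BEGIN
  IF NEW.Salary >= 40000 THEN
    INSERT INTO HighSalary (FName, SName, Salary)
    VALUES (NEW.FName, NEW.SName, NEW.Salary);
  ELSE
    INSERT INTO LowSalary (FName, SName, Salary)
    VALUES (NEW.FName, NEW.SName, NEW.Salary);
  END IF;
END//
DELIMITER ;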

Vertical Merge Schema Changes

The vertical merge schema change uses the left outer join relational operator. Since records without a join match in the left table of the join are not included, this transformation is not lossless. The method requires that the tables have the same primary key, or that one table has a foreign key to the other table (Ronstrom, 2000). This is illustrated in Figures 3.3(a) and 3.3(b), respectively. These requirements imply that the method can perform neither a join of many-to-many relations nor a full outer join.

Since the join is performed by adding attributes to one of the existing tables, a DT is not created. Hence, the method cannot be used for purposes other than schema transformations.

The transformation starts by adding the attributes belonging to the joined record to the left table of the join. This table is called the left table in the old schema and the merged table in the new schema. The left table is represented by “Person” in both Figures 3.3(a) and 3.3(b). A foreign key to the right table of the join, called the originating table, is also added if it does not already exist. The originating table is represented by “Salary” and “PAddress” in Figures 3.3(a) and 3.3(b), respectively. In addition, an attribute indicating whether the record has already been transformed is added. During the transformation, transactions that operate on the old schema do not see the attributes added to the left table.

Triggers are then added to the originating table. Update and delete operations trigger update operations on all records referring to the affected record in the merged table. The trigger on deletes also removes the foreign key of referring records. Insert operations trigger updates of all records in the merged table matching the join attributes. All these triggers also set the has-been-transformed attribute to true (Ronstrom, 2000).

Since old transactions are free to operate on the left table, a number of triggers must be added there as well. The trigger on insert operations reads the matching record in the originating table so that the values of the added attributes can be set accordingly. Update operations that change the foreign key trigger a read of the new join match and update the added attributes to keep them consistent. In addition, all modifying operations2 must update the foreign key reference in the originating table.

2Inserts, updates and deletes are modifying operations.


(a) Vertical merge with same primary key. “Firstname, Surname” is the primary key in all tables. The merged table is created by adding the salary attribute to “Person”.

(b) Vertical merge over a functional dependency. Person.ZipCode is a foreign key to PAddress.ZipCode. The merged table is created by adding city to “Person”.

Figure 3.3: Examples of Vertical Merge Schema Change.


Once all necessary attributes and triggers are in place, the records in the left table are processed one at a time. A transaction reads a record and uses the foreign key to find the record referred to in the originating table. The attribute values of that record are then written to the added attributes in the merged table, and the has-been-transformed attribute is set (Ronstrom, 2000).
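As a concrete sketch, one such per-record step for the merge in Figure 3.3(b) could be expressed in SQL along these lines (our illustration; the has-been-transformed flag is here called Transformed, and :fname/:sname identify the record being processed):

-- Sketch of one SAGA step in the vertical merge of Figure 3.3(b):
-- a single Person record is joined with its PAddress record, the
-- added City attribute is filled in, and the record is marked as
-- transformed. One such transaction is run per record.
UPDATE Person
SET City = (SELECT City
            FROM PAddress
            WHERE PAddress.ZipCode = Person.ZipCode),
    Transformed = TRUE
WHERE Firstname = :fname
  AND Surname = :sname;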

When all records in the merged table have been processed, it contains a left outer join of the two tables. A hard schema change is simply achieved by letting old transactions complete on the old schema. Triggers, attributes and foreign keys no longer in use are then dropped, before new transactions are allowed to use the new schema (Ronstrom, 2000).

Performing a soft vertical merge transformation implies adding triggers to the merged table as well. These triggers make sure that write operations executed by transactions operating on the new schema are also visible to transactions using the old schema. Thus, updates on the added attributes of records in the merged table must trigger updates to the referred record in the originating table.

Insert operations into the merged table trigger a scan of the originating table to see if it already contains a matching record. If so, only the foreign key of the inserted record is updated. Otherwise, a record is also inserted into the originating table.

A problem arises if a record is inserted into the merged table and the trigger scanning the originating table finds that an inconsistent record already exists. The same case is encountered if the added attributes of a merged table record are updated while multiple records refer to the same originating record. Since two records with the same primary key cannot exist in the originating table at the same time, a simple insert of a new record is not possible. Furthermore, it would not be correct to just update the originating record, since the other records referring to it would disagree on its attribute values. This problem is not addressed in the method, but there are at least two possible solutions: the first possibility is to abort the transaction trying to insert or update the record in the merged table. This would be a serious restriction on operations in the merged table. The second is to update the record in the originating table, which in turn triggers updates on all records in the merged table that refer to it. This is illustrated in Example 3.1.1:

Example 3.1.1 (Triggered Updates During Soft Vertical Merge)
Consider the vertical merge illustrated in Figure 3.4. During a soft schema change, an attribute added to the merged table, “City”, is updated. This triggers an update to the originating table, “Postal Address”, again triggering updates on records in the merged table that refer to it.

Figure 3.4: Example 3.1.1 - An update to a record in the merged table triggers updates in both the originating and the merged table.

This second scenario would probably be preferred in most cases. Note, however, that the behavior of transactions in the new schema would then not be identical to that of transactions in the old schema.

Vertical Split Schema Change

The vertical split transformation uses the projection relational operator. The transformation is the inverse of vertical merge. Like vertical merge, vertical split uses triggers and transactions that operate on one record at a time to perform the transformation.

The transformation starts by creating a table containing a subset of attributes from the original table. If, e.g., a table “Person” is vertically split into “Person” and “PAddress”, “PAddress” is the new table. This is illustrated in Figure 3.5. The table that is split is called the original table in the old schema and the split table in the transformed schema (Ronstrom, 2000). Note that the “new” table is the only derived table created in this transformation.

When the new table has been created, foreign keys are added to both the original and the new tables. Records in the original table use these to refer to the records in the new table and vice versa. As illustrated in Figure 3.6, all foreign keys are initially NULL (Ronstrom, 2000).

A number of triggers are needed on the original table to ensure that operations are executed on the new table as well (Ronstrom, 2000).

Inserts into the original table trigger a scan of the new table. If a record with matching attribute values is found in the new table, the foreign key of the inserted record is set to point to it. In addition, the foreign key of the record in the new table is updated to also point to the newly inserted record. If no matching record is found in the new table, a record is inserted before updating the foreign keys.

Figure 3.5: Vertical Split over a functional dependency. A table “Person” is split into “Person” and “PAddress”. Only the “new” table, PAddress, is a derived table.

A delete operation triggers a delete of the record referred to in the new table if this is the only original record contributing to it. Hence, the existence of other referring records has to be checked before the record in the new table is deleted.

A trigger is also needed for updates that affect records in the new table. If the updated record in the original table is the only record referring to it, the new record is simply updated. Otherwise, a record with the updated values is inserted into the new table before the foreign keys are updated.

Assuming that the schema transformation is soft, triggers must also be added to the new table (Ronstrom, 1998; Ronstrom, 2000). Delete operations trigger the deletion of split attribute values in all referring records. Update operations trigger updates to all referring records in the split table. Finally, insert operations update all records in the original table with matching join attribute values. Thus, the insert trigger has to scan all original records to find these join matches.
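The insert trigger on the original table described above could be sketched as follows (MySQL-style syntax; names are ours, and the foreign keys of Figure 3.6 are simplified to the join attribute ZipCode):

-- Sketch of the insert trigger on the original table during the
-- vertical split of Figure 3.5. If the new table has no record with
-- the inserted zip code, one is created; otherwise the existing
-- record is reused.
DELIMITER //
CREATE TRIGGER person_forward_insert
AFTER INSERT ON Person
FOR EACH ROW
BEGIN
  IF NOT EXISTS (SELECT 1 FROM PAddress WHERE ZipCode = NEW.ZipCode) THEN
    INSERT INTO PAddress (ZipCode, City) VALUES (NEW.ZipCode, NEW.City);
  END IF;
END//
DELIMITER ;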

Inconsistencies. Assuming that inconsistencies never occur, the described vertical split method works well. Unfortunately, this assumption does not hold. Consider Figure 3.7, which illustrates a typical inconsistency: two records with zip code “7020” have the city value “Tr.heim”, while a third record has a mistyped value, “Tr.hemi”. Inconsistencies like this are called anomalies (Garcia-Molina et al., 2002).


(a) The first step of the vertical split schema transformation of a table “Person” into “Person” and “PAddress”. The new table has been created, and foreign keys have been added to both the original and the new table. The City attribute is gray because it is only part of the original schema, not the transformed one.

(b) The transformation process has started to copy records from the original table to the new table. Only the three topmost records have been read so far.

Figure 3.6: Illustration of the vertical split transformation.


Anomalies are likely to appear in vertical split transformations since the old schema does not guarantee consistency. E.g., nothing prevents the city attribute values “Tr.hemi” and “Tr.heim” from appearing at the same time, as illustrated in Figure 3.7. In many cases, vertical split changes over a functional dependency may be performed precisely to avoid such anomalies by decomposing a table into Boyce-Codd Normal Form (BCNF)3.

This problem is not addressed in the method description, but there are at least a few possible solutions: one possibility is to update the record in the new table with values from the latest read. This would in turn result in an update of all records in the original table that are referred to by this record.

3A schema in BCNF is guaranteed to not have anomalies (Garcia-Molina et al., 2002).


Figure 3.7: Vertical Split with an inconsistency. The figure illustrates the same scenario as Figure 3.6, but with an inconsistency between records with zip code 7020.

In Figure 3.7, both Hanna Valiante and Markus Oaks would in this case get an incorrect but consistent city name.

Another solution is to count the number of records that agree on each value, in this case two vs. one, and let the majority decide. With this strategy, the record in the original table may have to be updated. In the figure, the city value of Sofie Clark would be updated to Tr.heim. This strategy is likely to produce correct results in most scenarios, but there are no guarantees. It may, in some cases, result in incorrect attribute values for records that were correct in the first place.

A third possibility is to add a non-derived primary key, e.g. an auto-incremented number, to the new table. The new schema could, e.g., look like this:

Person (f.name, s.name, address, postalID)
PostalAddress (postalID, zip, city)

This would, however, not remove anomalies, since the implicit functional dependency between zip code and city is not resolved. Thus, in cases where the vertical split is used to decompose a table into 3NF or BCNF, this solution would not meet the intention of the operation.

The described problem is similar to the one described for soft vertical merge transformations in the previous section, where triggers on inserts into the merged table find a record in the originating table that has the same key but differs on other attributes. Such inconsistencies are, however, much more likely to occur in vertical split transformations, since they may be introduced not only during the time interval when transactions operate on both schemas, but also at any point in time before the transformation started. Furthermore, the problem cannot be solved by aborting the transaction that introduced the anomaly, since it may already have committed.


Legend:
  r_x / w_x       read/write a record in table x
  r_xfk / w_xfk   read/write the foreign key of a record in table x
  |x|             cardinality of table x
  RF              Reference Factor - the average number of references a record in one table has to records in another table
  vm              vertical merge
  vs              vertical split
  left            left table (see vertical merge)
  ogt             originating table (see vertical merge)
  m               merged table (see vertical merge)
  ogl             original table (see vertical split)
  split           split table (see vertical split)
  new             new table (see vertical split)

Table 3.2: Legend for Tables 3.3 and 3.4.


3.1.3 Cost Analysis of Ronstrom’s Method

Since the triggers that keep the old and the new schemas consistent are executed within the transaction that triggered them, the cost added to normal operations is highly relevant: it increases the response time of each operation in addition to the overall workload of the database system. Since neither cost estimates nor test results have been published for the method, an analysis is provided in this section.

The reference factor (RF) is defined as the average number of references a record in one table has to records in another table. Thus,

RF_vm = |left| / |originating|    (3.1)

RF_vs = |split| / |new|    (3.2)

for vertical merge and split, respectively. RF_vm and RF_vs will be referred to as RF unless otherwise noted.

Tables 3.3 and 3.4 summarize the added trigger cost for normal operations on the involved tables during vertical merge and split schema transformations, respectively. The added trigger costs for the horizontal methods are always the same as the cost of the original operation, and are therefore not shown in a separate table.


Vertical Merge
Operation on                      Added cost

Old schema, left table:
  Insert     r_ogt + w_m + w_ogtfk, assuming an index on the join attribute
  Update     r_ogt + w_m + 2 × w_ogtfk if the join attribute is updated; 0 otherwise
  Delete     0

Old schema, originating table:
  Insert(1)  RF × w_m if there is an index on the join attribute in the left table; |left| × r_m + RF × w_m otherwise
  Update     RF × w_m
  Delete     RF × (r_mfk + w_m)

New schema, merged table:
  Insert     r_ogt + w_ogt + (RF − 1) × w_m if a non-equal record exists; r_ogt + w_ogtfk if an equal record exists; w_ogt if the record does not exist
  Update     r_ogtfk + w_ogtfk + w_ogt if non-derived primary key and more than one reference; r_ogtfk + w_ogt if non-derived primary key and one reference; r_ogtfk + w_ogt + (RF − 1) × w_m if the join attribute is the primary key; 0 if no attributes in the originating table are updated
  Delete     0

Table 3.3: Added Cost Incurred by Vertical Merge Schema Transformation Methods. (1) Note that RF is often 0 for inserts into the originating table.

A few examples are provided to clarify the incurred costs:

Example 3.1.2 (Cost of Consistent Insert During Vertical Split)
A database contains a number of tables. One of these is “PersonInfo”, containing information on all Norwegian citizens, including zip code and city. The table is then vertically split into “Person” and “PostalAddress”. There are 4.6 million people and 4600 zip codes registered in the database. Thus,

RF = 4.6 million / 4600 = 1000

During the transformation, a user inserts a person into the original table (i.e. in the old schema).


Vertical Split
Operation on                      Added cost

Old schema, original table:
  Insert     r_new + w_new + (RF − 1) × w_ogl if inconsistent; r_new + w_newfk if consistent; w_new if non-existing
  Update     r_newfk + w_newfk + w_new if non-derived primary key and more than one reference; r_newfk + w_new if non-derived primary key and one reference; r_newfk + w_new + (RF − 1) × w_ogl if the join attribute is the primary key; 0 if no attributes in the new table are updated
  Delete     r_newfk + w_new if only one referring record; r_newfk + w_newfk otherwise

New schema, split table:
  Insert     r_new + w_oglfk if no join match in the new table; r_new + w_ogl + w_newfk if there is a join match in the new table
  Update     r_new + w_ogl + 2 × w_newfk if the join attribute is updated; 0 otherwise
  Delete     r_newfk + w_newfk

New schema, new table:
  Insert(1)  RF × w_ogl if there is an index on the join attribute in the original table; |original| × r_ogl + RF × w_ogl otherwise
  Update     RF × w_ogl
  Delete     RF × (r_oglfk + w_ogl)

Table 3.4: Added Cost Incurred by Vertical Split Schema Transformation Methods. (1) Note that RF is often 0 for inserts into the new table.


This new person has a zip code and city that are consistent with all other persons in the “PersonInfo” table. The cost is:

C_ex1 = C_normal + C_added
      = w_ogl + r_new + w_newfk

For readability, assume that a read and a write operation have the same cost in terms of IO and CPU, and that a write of a foreign key has the same cost as other write operations. With these assumptions, the simple insert has to perform three times as many operations as it would without the transformation. Furthermore, it has to read lock a record in the new table.

Example 3.1.3 (Cost of Update During Vertical Split)
During the transformation described in Example 3.1.2, another user updates the city of a person in the original table. Since this transformation performs a split over a functional dependency, the primary key of the PostalAddress table is the zip code. Thus, when the update triggers an update of the record in the new table, that update triggers an update of all original records with the same zip code:

C_ex2 = C_normal + C_added
      = w_ogl + r_newfk + w_new + (RF − 1) × w_ogl
      = r_newfk + w_new + 1000 × w_ogl

Hence, the update results in 1001 more operations and 1000 more locks than would be the case without the transformation.

Example 3.1.4 (Cost of Update in New Schema)
The transformation described in Example 3.1.2 is made soft, and a third user therefore gets access to the new schema. The user updates a postal address:

C_ex3 = C_normal + C_added
      = w_new + RF × w_ogl
      = w_new + 1000 × w_ogl

The update results in 1000 more locks and operations than it would without the transformation.

As can be seen from Tables 3.3 and 3.4 and the examples, the cost of operations during a schema transformation varies enormously. In almost all cases, however, the cost is at least two to three times higher in terms of operations and locks than it would be without the transformation.


3.2 Fuzzy Table Copying

Fuzzy copying is a technique used to make a copy of a table in parallel with other operations, including updates. There are two variants: the first method is block oriented, and works like a fuzzy checkpoint (Hagmann, 1986) that is made sharp, i.e. consistent (Gray and Reuter, 1993). The second method is record oriented, and is better suited when the copy is installed in an environment that is not completely identical to the source environment. An example is copying a table from one node to another in a distributed system. The block size may differ between these nodes, and the physical addresses of the records in the copy would therefore differ from those in the source.

Both fuzzy copy methods work in two steps: in the first step, the source table is read without using any table or record locks, which results in an inconsistent copy. The method gets its “fuzzy” name because this initial copy is not consistent with any state the source table has had at any point in time. In the second step, the copy is made consistent by applying log records that have been generated by concurrent operations during the first step.

In the block oriented fuzzy copy (BoFC) method (Gray and Reuter, 1993; Bernstein et al., 1987), the source table is read one block at a time. Locks are ignored, but block latches4 (Gray, 1978) are used to ensure that the blocks are action consistent. The log records that have been generated while reading the blocks are then redone on the fuzzy copy. Since the method copies blocks, the addresses of records in the copy are the same as in the source. Hence, both logical and physiological logging (see Section 2.3) can be used. Also, all four record identification schemes, described in Section 2.4, will work.

The record oriented fuzzy copy (RoFC) method (Bratsberg et al., 1997a) copies records instead of blocks in the first step. As for BoFC, only latches are used during this process. The copied records may then be inserted anywhere; there are no restrictions on the physical address of the records at the new location. Once the inconsistent copy has been installed, the log records are applied to it. Since the physical addresses of records may be different at the new location, only logical logging can be used. Furthermore, state identifiers must be assigned to records instead of blocks, and the records must be identified by a non-physical record ID, as described in Section 2.4. Due to its location independent nature, this strategy is suitable for declustering in distributed database systems.

4Latches, also called semaphores, are locks held for a very short time (Elmasri and Navathe, 2004). They are typically used to ensure that only one operation is applied to a disk block at a time (Gray and Reuter, 1993).


3.3 Materialized View Maintenance

Materialized views (MVs) are used in database systems, e.g., to speed up queries by precomputing and storing query results, and in data warehouses. The work on MVs started with Snapshots (Adiba and Lindsay, 1980). These were able to answer queries on historical data and speed up query processing. As the benefits of Snapshots were appreciated, the concept was extended to answer queries on current data as well, lowering query cost. This extension of Snapshots is called Materialized Views (MVs).

During the last two decades, MVs have evolved to become a very beneficial addition to DBMSs. Benefits include less work when processing queries and less network communication in distributed queries. The purpose of this section is to address the problems with MVs and to present the solutions proposed in the literature.

As is evident from the following sections, methods to keep MVs up to date have been researched extensively. The initial creation of MVs has, however, been neglected.

3.3.1 Snapshots

Database Snapshots mark the beginning of what is now known as Materialized Views (MVs). They are defined by a query and are populated by storing the query result in the Snapshot table. Once created, transactions may query them for historical data (Adiba and Lindsay, 1980).

Snapshots can later be refreshed to reflect a newer state. This can be done by deleting the content of the Snapshot and then reevaluating the query (Adiba and Lindsay, 1980). An alternative is to take advantage of recovery techniques such as differential files and the recovery log to compute only delta values relative to the old Snapshot (Kahler and Risnes, 1987). This generally requires less work than the first method.

An algorithm using the second strategy is presented by Lindsay et al. (Lindsay et al., 1986). The algorithm associates a timestamp value with every record in the source relation of the Snapshot. The timestamp is a monotonically increasing value with which it is easy to decide whether an update occurred before or after another update. When a Snapshot is updated, the update transaction uses the timestamp to find updates that took place after the previous Snapshot refresh. Only records with a higher timestamp value need to be updated in the Snapshot.
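A hedged SQL sketch of such a timestamp-based refresh, assuming each source record carries a hypothetical LastModified column and :last_refresh holds the timestamp of the previous refresh:

-- Sketch of a timestamp-based Snapshot refresh. Only source records
-- modified after the previous refresh are reapplied to the Snapshot.
-- (Records deleted from the source would additionally require
-- tombstones or the recovery log.)
DELETE FROM PersonSnapshot
WHERE Id IN (SELECT Id FROM Person WHERE LastModified > :last_refresh);

INSERT INTO PersonSnapshot
SELECT * FROM Person WHERE LastModified > :last_refresh;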


Figure 3.8: Illustration of Example 3.3.1: A Materialized View stores all city names in the PostalAddress table.

3.3.2 Materialized Views

In contrast to Snapshots, MVs are typically not allowed to be out of date when they are queried. They are divided into two main groups: immediate and deferred update. These two groups differ only in when they are refreshed; the former method forwards updates within the user transaction that updated the source records, while the latter leaves the MV update work to a separate view update transaction.

An important part of MV maintenance is to keep the MVs consistent with the source tables. To do this, updates to the source tables have to be forwarded correctly to the view. The following example illustrates one of the problems of keeping consistency:

Example 3.3.1 (An MV Consistency Problem)
Consider an MV defined as the set (i.e. no duplicates) of all cities in the PostalAddress table. The current state of the source table and the MV is illustrated in Figure 3.8. Suppose a transaction deletes the record <0340, Oslo> from the source table. A correct execution in the MV is to delete the record <Oslo>. On the other hand, if the transaction deletes the record <7020, Trondheim>, a correct execution is to not delete <Trondheim> from the MV.

Multiple solutions have been proposed to address the consistency problem. Blakeley et al. (Blakeley et al., 1986) showed that counters could be used to keep track of the multiplicity of records in MVs. The counter is increased by insertion of duplicates, and decreased by deletion of duplicates. If the counter reaches zero, the record is removed from the MV. The method could originally only handle select-project-join (SPJ) views, but has later been improved to handle aggregates, negations and unions (Gupta et al., 1992; Gupta et al., 1993).
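A minimal sketch of the counting technique for the MV of Example 3.3.1, with a count column and MySQL-style upsert syntax of our own choosing:

-- Sketch of counter-based maintenance for the city MV. Each MV record
-- stores how many source records derive it; the record is removed
-- only when its counter reaches zero.
-- Forwarding an insert of a source record with city :city:
INSERT INTO CityMV (City, Cnt) VALUES (:city, 1)
ON DUPLICATE KEY UPDATE Cnt = Cnt + 1;

-- Forwarding a delete of a source record with city :city:
UPDATE CityMV SET Cnt = Cnt - 1 WHERE City = :city;
DELETE FROM CityMV WHERE City = :city AND Cnt = 0;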


Gupta et al. (Gupta et al., 1993) present another algorithm called “Delete and Rederive” (DRed). An overestimate of the records that may be deleted from the MV is first computed. The records with alternative derivations are then removed from the delete set. Finally, new records that need to be inserted are computed. The authors recommend the DRed method when dealing with recursive MVs, and the counting method when dealing with non-recursive MVs.

In contrast to the methods described so far, Qian and Wiederhold (Qian and Wiederhold, 1991) and Griffin et al. (Griffin and Libkin, 1995; Griffin et al., 1997) use algebra as the basis for computing updates. They argue that it is easier to prove the correctness of algebra than of algorithms, and to derive rules for other languages (Griffin et al., 1997). Qian et al. present algebra propagation rules that take update operations on SPJ views as input and produce an insert set ∆R and a delete set ∇R (Qian and Wiederhold, 1991). The method has also been extended to handle bag semantics (Griffin and Libkin, 1995).

All methods described so far are immediate. Immediate maintenance has a serious drawback, however: an extra workload is incurred on all user transaction operations that have to be propagated. In deferred MVs, the view is maintained by a view update transaction, not the user transaction. Accordingly, this does not incur extra update work on each user transaction, and deferred methods should therefore be used whenever possible (Colby et al., 1996; Kawaguchi et al., 1997). The update transaction is typically invoked periodically or by a query to the view.

The algorithms described for immediate update cannot be used in a deferred strategy without modification. The reason for this is that the immediate methods use data from the source tables to update the MVs. When deferred methods are used, the state of the source tables may have changed before the maintenance starts. This is called the “state bug” (Colby et al., 1996). Colby et al. (Colby et al., 1996) extend the algebra by Qian et al. (Qian and Wiederhold, 1991) to overcome this problem: both a log L and two view differential tables (∆MV and ∇MV for inserts and deletes, respectively) are used. The MV is in a state consistent with a previous state s_p of the source tables. ∆MV and ∇MV contain the operations that need to be applied to the MV to make it consistent with a state closer to the present. The log L is used to maintain ∆MV and ∇MV. I.e., the MV has a state s_p which is before or equal in time to an intermediate state s_i that can be reached by applying the updates in ∆MV and ∇MV. The current state of the source tables, s_c, can in turn be reached from s_i by applying the updates in L. By using the log, the authors compute the state s_i that the differential tables are in. This is the pre-state needed to use the immediate propagation methods without encountering the state bug.


The algorithm uses two propagation transactions: one for updating the differential tables using the log, and one for updating the MVs using the differential tables. This imposes very little overhead on ordinary transactions, as the only extra work they have to do is to write a log record without any further computation.
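The second propagation transaction could be sketched in SQL roughly as follows (the table names mv, delta_mv and nabla_mv and the key column pk are ours; the exact ordering of deletes and inserts depends on the propagation algebra):

-- Sketch of the second propagation transaction in a deferred strategy:
-- pending deletes (nabla_mv) and pending inserts (delta_mv) are folded
-- into the MV, after which the differential tables are emptied.
START TRANSACTION;
DELETE FROM mv WHERE pk IN (SELECT pk FROM nabla_mv);
INSERT INTO mv SELECT * FROM delta_mv;
DELETE FROM nabla_mv;
DELETE FROM delta_mv;
COMMIT;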

Self-Maintainability

Operations that can be forwarded to derived tables without requiring more data than the table itself and the operation are called Autonomously Computable Updates (ACUs) (Blakeley et al., 1989). Self-maintainable (Self-M) Materialized Views are MVs where all operations are ACUs (Gupta et al., 1996).

When operations are applied to an MV that is not Self-M, the source tables must be queried for the missing information. Self-M is therefore a highly desirable property in systems with fast response time requirements (Gupta et al., 1996) and when the MV is stored on a different node than the source tables. For Self-M MVs, only the log has to be shipped to the MV node.

Only a very limited set of views are Self-M, and Quass et al. (Quass et al., 1996) therefore extend the concept to also include views where auxiliary information makes the view Self-M. The auxiliary information, typically a table, is stored together with the MV, and is updated accordingly.

Our derived table creation method benefits from self-maintainability in the same way as MV maintenance. Hence, for DT creation with relational operators where the DTs themselves are not Self-M, an auxiliary table with the missing information will be added.

3.4 Schema Transformations and DT Creation in Existing DBMSs

Existing database systems, including IBM DB2 v9 (IBM Information Center, 2006), Microsoft SQL Server 2005 (Microsoft TechNet, 2006), MySQL 5.1 (MySQL AB, 2006) and Oracle 10g (Lorentz and Gregoire, 2003b), offer only simple schema transformation functionality (Løland, 2003). This includes adding or removing one or more attributes of a table, renaming attributes, and the like. Removal of an attribute can be performed by changing the table description only, thus leaving the physical records unchanged for an unspecified period of time.


Complex schema transformations and MV creations are performed as blocking operations. The source tables are locked with a shared lock while the content is read and the result of the query is inserted into the DTs (Løland, 2003). Throughout this thesis, we will call this the insert into select method due to the common SQL syntax of these operations. For example, the SQL syntax for DT creation in MySQL is (MySQL AB, 2006):

insert into <table-name> select <select-statement>
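A concrete instance, using the tables of Figure 3.2 (our example):

-- Blocking DT creation with "insert into select": the source table
-- is share-locked while the query result is inserted into the DT.
INSERT INTO HighSalary
SELECT FName, SName, Salary FROM Employee WHERE Salary >= 40000;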

3.5 Summary

In this chapter, research related to non-blocking creation of derived tables was presented.

A method that can be used for vertical and horizontal merge and split schema transformations has been suggested by Ronstrom (Ronstrom, 2000). The solution involves creation of derived tables in the horizontal merge and split cases. In addition, one of the resulting tables in vertical split is a DT. It is therefore likely that these relational operators can be used for other DT creation purposes than schema transformations as well. One example is the creation of materialized views, although this possibility is not discussed by Ronstrom.

Although test results have not been published for the method, the cost analysis in Section 3.1 indicates that it is likely to degrade throughput and response time significantly for the duration of the transformation. The reason for this is that write operations in one schema (old or new) trigger a varying number of write operations in the other schema.

The DT creation method we suggest in Part II of this thesis extends the ideas from the record oriented fuzzy copy (RoFC) technique described in Section 3.2. Similar to RoFC, we make an inconsistent copy of the involved tables and use the log to make the copied data consistent. An important difference is, however, that we apply relational operators to the inconsistent copies. Because of this, the log records cannot be applied to the copied data in any straightforward way.

The DT creation method developed in this thesis is also related to materialized view maintenance. In particular, we will use auxiliary tables to achieve Self-Maintainable derived tables. The data in these auxiliary tables will be needed whenever the DTs themselves do not contain all data required to apply the log.

The “insert into select” method for DT creation will not be used in our solution. We will, however, compare our DT creation method to the “insert into select” method, and discuss under which circumstances our method is better than the existing solution and vice versa.


Part II

Derived Table Creation


Chapter 4

The Derived Table Creation Framework

In Chapter 1, we presented the overall research question of this thesis:

How can we create derived tables and use these for schema transformation and materialized view creation purposes while incurring minimal performance degradation to transactions operating concurrently on the involved source tables?

With the research question in mind, this part of the thesis describes our suggested method for creating derived tables (DTs) without blocking other transactions. Once a DT has been created, it can be used as a materialized view (MV) or to transform the schema. The method aims at degrading the performance of concurrent transactions as little as possible. To what extent the method meets the performance aspects of the research question is discussed in Part III: Implementation and Testing.

In this chapter, we suggest a framework that can be used in the general case to create DTs in a non-blocking way. As such, this chapter presents an abstract solution to the first part of the research problem stated above.

In Chapter 5, we identify common problems encountered when the framework is used for DT creation using relational operators. General solutions to these problems are also suggested. Chapter 6 contains detailed descriptions of how the framework is used to create DTs using the six relational operators.

4.1 Overview of the Framework

The non-blocking DT creation framework presented in this chapter operates in four steps. As illustrated in Figure 4.1, these are: preparation, initial population, log propagation and synchronization.


Figure 4.1: The four steps of DT creation.

During the preparation step, necessary tables, indices etc. are added to the database schema. These should not be visible to users until DT creation reaches the final step, synchronization. Once the required structures are in place, the initial population step writes a fuzzy mark to the log and then starts to read the involved source tables. The fuzzy mark is later used to find the point where reading started. The relational operator used to create the DT is then applied, and the result is inserted into the newly created DTs. Note that no table or record locks are used, and the derived records are therefore not necessarily consistent with the source table records1.

Log propagation is then started. It works in iterations, and each iteration starts by writing a place-keeper mark, called a fuzzy mark, to the log. The write operations that have been executed on source table records between the last mark and the new one are then propagated, or forwarded, to the DTs by applying recovery techniques. These techniques must, however, be modified since relational operators have been applied to create the derived records. If a considerable amount of updates has been performed on the source records during an iteration, a new iteration is started. This is repeated until there are few operations that distinguish the source tables from the DTs.

The fourth step, synchronization, latches the source tables while the remaining logged operations are applied to the DTs. Since log propagation was repeated until there were only a few operations left to apply to the DTs, these latches are held for a very short period of time. When all log records have been forwarded, the DTs are in the same state as the source tables, and are ready to be used.

The four steps of non-blocking DT creation are described in more detail in the rest of this chapter. Materialized view creation can use this framework without modification. Note, however, that even though DTs can also be used to perform schema transformations, the framework must be slightly modified to do so. The reason for this is that different transactions are allowed to concurrently operate in the two schema versions. These modifications to the framework are discussed in Section 4.6.
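To make the control flow concrete, the following is a minimal sketch of the four-step framework in Python. All helper names (prepare_schema, insert_initial_image, propagate, remaining_log_records, synchronize) and the db/log API are hypothetical illustrations, not part of the framework specification.

    # Hedged sketch of the non-blocking DT creation framework (hypothetical API).
    SYNC_THRESHOLD = 100  # assumed cut-off for "few remaining operations"

    def create_derived_tables(db, source_tables, operator):
        # Step 1: add DTs, auxiliary tables and indices, hidden from users.
        dts = prepare_schema(db, source_tables, operator)
        # Step 2: fuzzy mark, then an unlocked (inconsistent) read of the sources.
        mark = db.log.write_fuzzy_mark(active_tx=db.active_transaction_ids())
        insert_initial_image(db, source_tables, operator, dts)
        # Step 3: iterate until few operations separate sources and DTs.
        while True:
            new_mark = db.log.write_fuzzy_mark()
            propagate(db.log.records_between(mark, new_mark), dts, operator)
            mark = new_mark
            if remaining_log_records(db.log, mark) < SYNC_THRESHOLD:
                break
        # Step 4: latch sources, apply the short log tail, hand the DTs over.
        synchronize(db, source_tables, dts, mark)
        return dts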

4.2 Step 1: Preparation

DT creation starts by adding the derived tables to the database schema. This is done by create table SQL statements. In addition to the wanted subset of attributes from the source tables, the DTs typically have to include a record state identifier2, and a Record ID (RID) from each source record contributing to the derived records. In this thesis, we assume that the RID is based on logical addressing, but physical identification techniques can also be used if a RID mapping table is maintained. The record and state identification concepts were described in Section 2.3.

2 I.e., a Log Sequence Number (LSN) on the record (Hvasshovd, 1999).

Depending on the relational operator used for DT creation, attributes other than RID and LSN may also be required. An example is vertical merge, i.e. full outer join, in which the join attributes are required to identify which source records should be merged in the DT. If any of these required attributes are not wanted in the DT, they must be removed after the DT creation has completed. This can be done by a simple schema transformation, which is available in modern DBMSs.

Constraints, both new and from the source tables, may be added to the new tables. This should, however, be done with great care since constraint violations may force the DT creation to abort, as illustrated in Example 4.2.1:

Example 4.2.1 (Bad Constraint)
Consider a one-to-many full outer join of the tables Employee and PostalAddress, as illustrated in Figure 4.1. A unique constraint has been defined for the ZipCode attribute in PostalAddress. If this unique constraint is added to the derived table “ModifiedEmp”, the transformation will have to abort if more than one person has the same zip code.

Any indices that are needed on the new tables to speed up the DT creation process should also be added during this step. In particular, all attributes that are used by DT creation to identify records should be indexed. Examples include RIDs copied from the source tables, and join attributes in the case of vertical merge. These indices decrease the time used to create the DTs significantly. The source record ID3 is, e.g., often used to identify derived records affected by a logged operation. Without an index on this attribute, log propagation of these operations has to scan all records in all DTs to find the correct record(s). With the index, the record(s) can be identified in one single read operation. The required indices differ for each relational operator, and are therefore described in more detail in Chapter 6. Note that the indices created during the preparation step will be up to date at any time, including immediately after the DT has been created.
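As an illustration, the preparation step for the vertical merge in Figure 4.1 could issue DDL along the following lines. The concrete statements and the cursor-based execute() API are assumptions for the sketch; only the presence of the source RID/LSN attributes and the indices is prescribed by the framework.

    # Hedged sketch: DDL for the ModifiedEmp DT of Figure 4.1 (illustrative names).
    PREPARATION_DDL = [
        """CREATE TABLE ModifiedEmp (
               Firstname VARCHAR(40), Surname VARCHAR(40), Position VARCHAR(40),
               Address   VARCHAR(60), ZipCode  CHAR(4),
               City      VARCHAR(40), State    VARCHAR(40),
               rid_emp   BIGINT, lsn_emp  BIGINT,  -- RID/LSN from Employee
               rid_addr  BIGINT, lsn_addr BIGINT   -- RID/LSN from PAddress
           )""",
        # Indices on the identification attributes keep log propagation cheap.
        "CREATE INDEX me_rid_emp  ON ModifiedEmp (rid_emp)",
        "CREATE INDEX me_rid_addr ON ModifiedEmp (rid_addr)",
        "CREATE INDEX me_zipcode  ON ModifiedEmp (ZipCode)",  # join attribute
    ]

    def prepare(cursor):
        for statement in PREPARATION_DDL:
            cursor.execute(statement)  # the new tables stay invisible until step 4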

The DT creation for some of the relational operators requires information that is not stored in the DTs. Consider the following example:

Example 4.2.2 (Auxiliary Tables)
Two DTs, “GoldCardHolder” and “PlatinumCardHolder”, are created by performing a horizontal split4 of the table “FrequentFlyer Customer”. Marcus, who has a Silver Frequent Flyer Card, does not qualify for any of these DTs.

While the DTs are being created, however, Marcus buys a flight ticket to Hawaii. With this purchase, he is qualified for a gold card. His old customer information is now required by the DT creation process so that Marcus can be added to the “GoldCardHolder” DT. This information cannot be found in either of the DTs.

3 The RID of the source record a DT record is derived from.
4 Horizontal split is the inverse of union.


In cases like the one in Example 4.2.2, auxiliary tables must also be added to the schema. The auxiliary tables store the information required by the DT creation method, and are similar to those used to make MVs self-maintainable (Quass et al., 1996). The detailed DT creation descriptions in Chapter 6 specify the auxiliary tables where they are needed.

4.3 Step 2: Initial Population

The newly created DTs have to be populated with records from the source tables. This is done by a modified fuzzy copy technique (see Section 3.2), and the first step of populating the DTs is therefore to write a fuzzy mark in the log. This log record must include the transaction identifier of all transactions that are currently active on the source tables. This is a subset of the active transaction table (Løland and Hvasshovd, 2006c). The transaction table will be used by the next step, log propagation, to identify the oldest log record that needs to be applied to the DTs.

The source tables are then read without setting locks. This results in an inconsistent read (Hvasshovd et al., 1991). The relational operator used for DT creation is then applied, and the results, called the initial images, are inserted into the DTs.
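A minimal sketch of this step follows; the fuzzy-scan and operator helpers are hypothetical, and a real implementation would batch the inserts.

    # Hedged sketch of initial population (hypothetical helpers).
    def initial_population(db, source_tables, operator, dts):
        # The fuzzy mark records the currently active transactions (Section 4.3).
        mark = db.log.write_fuzzy_mark(active_tx=db.active_transaction_ids())
        for source_rows in fuzzy_scan(source_tables):   # no locks: inconsistent read
            for derived_record in operator.apply(source_rows):
                dts.insert(derived_record)              # the initial image
        return mark                                     # log propagation starts here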

4.4 Step 3: Log Propagation

Log propagation is the process of redoing operations originally executed on source table records to records in the DTs. All operations are reflected sequentially in the log, and by redoing these, the derived records will eventually reflect the same state as the source records.

The log propagation step, which works in iterations, starts when the initial images have been inserted into the DTs. Each iteration starts by writing a new fuzzy mark to the log. This log record marks the end of the current log propagation iteration and the beginning of the next one. Log records of operations that may not be reflected in the DTs are then inspected and applied if necessary. In the first iteration, the oldest log record that may contain such an operation is the oldest log record of any transaction that was active when the first fuzzy mark was written. The reason for this is that the transactions that were active on the source tables may have been able to log a planned operation but not perform it yet at the time the initial read was started. This is a consequence of Write-Ahead Logging, as described in Section 2.3, which requires that write operations are logged before the record is updated. In later iterations, only log records after the previous fuzzy mark need to be propagated.

When the log propagator reads a new log record, affected records in the DTs are identified and changed if the LSNs indicate that the records represent an older state than that of the log record. The effects of applying the log records depend on the relational operator used for the DT creation in question, and are therefore described in more detail in Chapter 6.

The synchronization step should not be started if a significant portion of the log remains to be propagated. The reason for this is that synchronization involves latching the source tables while the last portion of the log is propagated. These latches effectively pause all transactions on the source tables. Each log propagation iteration therefore ends with an analysis of the remaining work. The analysis can, e.g., be based on the time used to complete the current iteration, a count of the remaining log records to be propagated, or an estimated remaining propagation time. Based on the analysis, either another log propagation iteration or the synchronization step is started.
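The iteration logic can be sketched as follows; the log API and the record-count analysis are assumptions (a time-based estimate would serve equally well).

    # Hedged sketch of the iterative log propagation step (hypothetical API).
    def log_propagation(log, dts, first_mark, threshold=100):
        low_water = log.oldest_lsn_of_tx_active_at(first_mark)  # first iteration
        while True:
            high_water = log.write_fuzzy_mark()   # closes this iteration's window
            for log_record in log.records_between(low_water, high_water):
                apply_to_dts(log_record, dts)     # modified redo, see Chapter 6
            low_water = high_water
            # Analysis of remaining work; here simply a count of unprocessed records.
            if log.count_records_after(high_water) <= threshold:
                return high_water                 # close enough: synchronize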

A consequence of the described log propagation strategy is that this step will never finish iterating if more log records are produced than the propagator can process during the same time interval. We suggest four possible solutions for this case, none of which are optimal: One possibility is to abort the DT creation transaction. If so, the DT creation work performed is lost, but normal transaction processing can continue undisturbed. Alternatively, the DT creation process may get a higher priority. The effect of this is that more log is propagated at the cost of lower performance for other transactions. A third possibility is to reduce the number of concurrent transactions by creating a transaction queue. Like the previous alternative, this increases response time and decreases throughput for other transactions. As a final alternative, we may stop log propagation and go directly to the synchronization step. Synchronization will in this case have to latch the source tables for a longer period of time. Depending on the remaining number of log records to propagate, this strategy can still be much quicker than the insert into select strategy used in modern DBMSs.

4.5 Step 4: Synchronization

When synchronization is initiated, the state of the DTs should be very close to the state of the source tables. This is because the source tables have to be latched during one final log propagation iteration that makes the DTs consistent with the source tables.

We suggest two ways to synchronize the DTs to the source tables and thereby complete the DT creation process. These are blocking synchronization and non-blocking synchronization. The blocking method makes the DTs transaction consistent with the source tables, while the non-blocking method only enforces action consistency. Note that the choice of strategy affects the synchronization step only; the first three steps of DT creation are unaffected.

Blocking synchronization

Blocking synchronization blocks all new transactions that try to access any of the involved tables. Transactions that already have locks on the source tables are either allowed to complete or forced to abort. Table locks are then acquired on the source tables before a final log propagation iteration is performed. This log propagation makes the DTs transaction consistent with the source tables. Blocking synchronization is the least complex synchronization strategy, but it does not satisfy the non-blocking requirement for DT creation.

Non-blocking synchronization

The non-blocking strategy latches the source tables for the duration of one final log propagation iteration. Latching effectively pauses ongoing transactions that perform update work on the source tables, but the pause should be very brief since the state of the DTs is very close to that of the source tables. Note that read operations are not paused. Once the log propagation completes, the DTs are in the same state as the source tables.

The newly created DTs are now almost ready to be used as MVs. The only remaining task is to add the preferred MV maintenance strategy to them, e.g. one of those described in Section 3.3. From now on, the MV maintenance strategy is responsible for keeping the DTs consistent with the source tables. The latches are then released, allowing transactions to resume their update operations on the source tables.
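The following sketch summarizes non-blocking synchronization for MV creation; the latch and log interfaces are hypothetical.

    # Hedged sketch of non-blocking synchronization (hypothetical API).
    def nonblocking_synchronize(source_tables, log, dts, last_mark, maintenance):
        latches = [table.latch_updates() for table in source_tables]  # reads go on
        try:
            final_mark = log.write_fuzzy_mark()
            for log_record in log.records_between(last_mark, final_mark):
                apply_to_dts(log_record, dts)       # DTs now equal the source state
            dts.attach_maintenance(maintenance)     # e.g. a Section 3.3 strategy
        finally:
            for latch in latches:
                latch.release()                     # paused updaters resume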

The above distinction between the blocking and non-blocking strategies may seem artificial. However, the difference in the time interval in which updates are blocked from the source tables may be considerable. In the former method, new transactions are blocked from the source tables until all transactions that have already accessed them have completed. The updates performed by these transactions also have to be redone by the log propagation, which adds further to the blocking time. The time required for a transaction to complete cannot be easily controlled.

In the latter strategy, transactions are only blocked during log propagation. As previously discussed, we can easily control the time needed by this log propagation by not starting the synchronization step until the states of the DTs and the source tables are very close.

4.6 Considerations for Schema Transformations

Materialized View creation is a straightforward application of DTs, and can therefore use the non-blocking DT creation framework as described in the previous sections. On the other hand, using the framework to perform schema transformations is more complex. The reason is that in contrast to the MV creation case, transactions will be active in both the source tables and the DTs at the same time during non-blocking synchronization. Consider the following example:

Example 4.6.1 (A Non-blocking Schema Transformation)
A non-blocking schema transformation is being performed, in which the tables “Employee” and “PostalAddress” are vertically merged into “ModifiedEmp”. This is illustrated in Figure 4.1. The first three steps of the DT creation process have already been executed; only non-blocking synchronization remains.

The synchronization step starts by latching “Employee” and “PostalAddress”, before propagating the remaining log records to “ModifiedEmp”. At this point, the records in the source tables and the DT reflect the same state, and the schema transformation is nearly complete. New transactions are now given access to the “ModifiedEmp” table instead of the source tables. However, the transactions that were paused by the latches may have more work to do. Thus, until these old transactions have completed, new transactions will be updating records in “ModifiedEmp” while the old transactions are updating records in “Employee” and “PostalAddress”.

Example 4.6.1 illustrates that we have to make sure that transactions operating on the same data objects in two different schema versions do not conflict. For example, a transaction in the new schema should not be allowed to change the position of Eric to “Software Engineer” if a currently active transaction in the old schema has already modified the same Eric record. Recall from Section 2.2 that the main responsibility of the scheduler is to provide isolation, and that this property is guaranteed if the histories are serializable. Thus, when synchronizing schema transformations, we must ensure serializability between operations performed on records stored in different tables.


Note that if the blocking synchronization strategy is used, serialization is not a problem. In this case, transactions that are active in the old schema complete their work before transactions are given access to the new schema. Thus, this synchronization strategy can be adopted without modification. We therefore focus only on non-blocking synchronization in the rest of this chapter.

The non-blocking synchronization strategies are divided into two strategies for schema transformation purposes. These are non-blocking abort and non-blocking commit. Here, “abort” and “commit” refer to whether the transactions active in the old schema are forced to abort or are allowed to continue work after the source table latches have been removed. The reason for making this distinction is that in the former case, transactions in the old schema are not allowed to acquire new locks. This scenario is significantly easier to handle than the latter case, in which new locks may be acquired in both schema versions.

Both non-blocking strategies ensure serializable histories across the two schema versions. This is done by using a modified Strict Two Phase Locking (2PL)5 (Bernstein et al., 1987) strategy, in which locks are set on all versions of the data record.

Non-blocking Abort Synchronization

The non-blocking abort strategy latches the source tables for the duration of a log propagation iteration. As previously discussed, this pauses write operations on the source table records. Once the log propagation completes, the DTs are in the same state as the source tables. The locks acquired by transactions in the source tables are then forwarded to the respective records in the DTs. At this point, new transactions are given access to the unlocked parts of the DTs. The source table latches are now released, and all transactions in the old schema are forced to abort. Aborting all old transactions means that they may not acquire new locks6.

Log propagation continues to iterate until all operations performed on the source tables have been forwarded to the DTs. In addition to redoing write operations, the propagator also ensures that source table locks forwarded to the DTs are released. Forwarded locks are released as soon as log propagation has processed the abort log record of the transaction owning the lock. The source tables can be removed from the schema once all old transactions have aborted.

5 See Section 2.2 for a description of 2PL.
6 An alternative is to abort only the old transactions that try to acquire new locks. The non-blocking abort strategy works for both cases since new locks are not acquired in the old schema in either case.

Non-blocking Commit Synchronization

Non-blocking commit synchronization is similar to the previous strategy in many respects: the source tables are latched, a log propagation iteration is used to synchronize the DT states to the source table states, and source table locks are forwarded to the DTs. In contrast to the previous strategy, however, transactions on the source tables are allowed to continue forward processing after the latches have been removed. Ronstrom calls this a soft transformation (Ronstrom, 2000).

A consequence of allowing source table transactions to acquire new locks is that new conflicts may occur across different schema versions. This strategy therefore requires that when a lock is acquired, all other versions of that record are immediately locked as well. A thorough discussion of the implications can be found in Section 5.3, but simply put, a transaction that wants to access a record rdt in a DT has to set a lock both on rdt and on all records that rdt is derived from. Likewise, a transaction accessing a record in a source table has to lock both that record and all DT records derived from it. To avoid unnecessary deadlocks, locks are always acquired in the source tables first, and then in the DTs.
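A sketch of this locking discipline follows. The transaction and mapping interfaces are hypothetical; the point is the fixed acquisition order, source records before DT records.

    # Hedged sketch: locking all versions of a record under non-blocking commit.
    def lock_all_versions(tx, record, mapping, mode):
        source_records = sorted(mapping.sources_of(record))    # may be several
        dt_records = sorted(mapping.derived_from(source_records))
        for r in source_records:
            tx.lock(r, mode)   # always the source tables first ...
        for r in dt_records:
            tx.lock(r, mode)   # ... then the DTs, avoiding needless deadlocks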

Log propagation has to be modified to forward DT operations not only from source to derived records, but from derived to source records as well. This is done so that source table transactions can see the updates performed by transactions in the new schema. Log propagation is also responsible for removing source table locks from the DTs and vice versa. This is done in the same manner as described for the non-blocking abort strategy.

It is clear that the non-blocking abort strategy produces serializable histories: the scheduler uses Strict 2PL to produce serializable histories within each table. Before any transaction is given access to the records in the derived tables, all records that are locked in the source tables are also locked in the derived tables. These forwarded source locks are not released in the new schema until the log propagator has applied all operations executed by the transaction owning the lock. Hence, transactions in the new schema can only access committed values.

Although less intuitive, non-blocking commit synchronization also produces serializable histories: locks are acquired immediately in both schema versions. If a transaction is not able to lock all versions of a record, the transaction has to wait for the lock. Furthermore, forwarded locks are not released until after all operations by that transaction have been propagated. Hence, transactions in either schema can only access committed values with this strategy as well.

Because of the added complexity and the increased chance of locking conflicts associated with non-blocking commit, the non-blocking abort strategy may be a better choice for some schema transformation operators. This is especially true for operators where one DT record may be composed of multiple source records, since one single DT lock requires multiple source table locks. Thus, a decision must be made on whether or not aborting the transactions active in the source tables is worse than risking lock contention. This problem will be discussed in greater detail for each relational operator in Chapter 6.

4.6.1 A lock forwarding improvement for schema transformations

We have argued that for schema transformations, transactions cannot be given access to the new schema before all locks acquired in the old schema have been forwarded. Forwarding all source table locks to the DTs may require a considerable amount of work. This may be unacceptable since it has to be done during the critical time interval of synchronization when transactions are not given access to any of the involved tables.

Two modifications are required to remove the lock forwarding phase from this critical time interval: first, the initial population step must be modified to store the source table locks in the first fuzzy mark. Second, DT lock acquisition and release must be included as part of the log propagation. By doing so, locks will be continuously forwarded, and therefore in place when the synchronization step is started.

4.7 Summary

In this chapter we have described a framework that will be used to create DTs for the six relational operators we focus on. The framework can be used both to perform schema transformations and to create materialized views.

An important purpose of the framework has been to degrade the performance of concurrent transactions as little as possible. The framework is therefore based on copying the source tables in a non-blocking way. Since the source tables are not locked, the copied records, which are inserted into the DTs, may not be consistent with the records in the source tables. Logged operations originally applied to source records are then propagated to the DTs. When all logged source operations have been propagated, the DTs are consistent with the source tables and are ready to be used.


Chapter 5

Common DT Creation Problems

In this chapter, we identify five problems that are encountered by DT creation for multiple relational operators. We call these the missing record and state identification, missing record pre-state, lock forwarding during transformations and inconsistent source record problems. In what follows, we discuss these problems and suggest solutions.

5.1 Missing Record and State Identification

For logging to be useful as a recovery mechanism, there must be a way to identify which record the logged operation applies to. Records therefore have a unique identifier, assumed in this thesis to be a Record Identifier (RID), that is stored in each log record. Record IDs are described in more detail in Section 2.1.

Since RIDs are unique, however, a record in a DT cannot have the same RID as the source record(s) it is composed of. Furthermore, even if the source RIDs could have been reused, the DT creations where one DT record may be composed of two source records would still be problematic. We call the problem of mapping record identification from source records to derived records the record identification problem. It is solved by letting the records in the DTs have their own RIDs, but at the same time store the RID of each source record contributing to it. In vertical merge (full outer join) DT creation, e.g., the RIDs from both source tables would be stored in the DT.

Log Sequence Numbers (LSNs) on records are used as state identifiers to ensure idempotence during recovery and when making fuzzy copies. During recovery or fuzzy copying, each log record is compared to the record with a matching RID, and is applied only if the logged state represents a newer state than that of the record.

[Figure: the source tables Employee and Position and a derived Employee table whose records carry RID_L, RID_R, LSN_L and LSN_R attributes copied from the contributing source records.]

Figure 5.1: The Record and State Identification Problems are solved by including the record IDs and LSNs from both contributing source records in each derived record.

The LSNs from the source records may be used in the same way for DT creation. Derived records may, however, be composed of more than one source record. In these cases, one LSN is not enough to identify the state of the derived record. We call this the state identification problem. The problem is solved by including the LSN of all contributing source records. Both the record and state identification problems are illustrated in Figure 5.1.
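The state check that makes propagation idempotent can be illustrated with a small, runnable Python fragment (the flat dict record layout is an assumption made for the example):

    # Apply a logged operation only if it is newer than the derived record's state.
    def apply_if_newer(dt_record, log_record):
        if log_record["lsn"] <= dt_record["lsn"]:
            return False                          # already reflected: skip
        dt_record.update(log_record["after_image"])
        dt_record["lsn"] = log_record["lsn"]      # advance the state identifier
        return True

    emp = {"rid_src": "r02", "lsn": 11, "name": "Erik"}
    upd = {"rid": "r02", "lsn": 12, "after_image": {"name": "Erik O."}}
    assert apply_if_newer(emp, upd)      # first application changes the record
    assert not apply_if_newer(emp, upd)  # replaying the same log record is a no-op

For records derived from several sources, one such LSN is kept, and checked, per contributing source record.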

5.2 Missing Record Pre-States

The suggested DT creation method is based on applying operations in the log to derived records during log propagation. For the horizontal split, difference and intersection operators, however, some source records may not belong to any of the DTs. The missing record pre-state problem is encountered if any of the records not included in a DT are needed by the log propagator.

The problem can be solved by letting the log propagator acquire the missing information from the source tables. Since the source tables are in a different state than the DTs, however, this solution complicates the log propagation rules. Furthermore, it means that the method is no longer self-maintainable (Blakeley et al., 1989; Quass et al., 1996), as described in Chapter 2. The other solution is to add auxiliary tables to store information on missing records that are necessary to make the DT self-maintainable. Auxiliary tables were originally suggested for this purpose in MV maintenance by Quass et al. (Quass et al., 1996).

(a) A DT, “Do Not Sell These”, storing the difference between Vinyl and CD records, is created. The log propagator does not have the information needed to decide whether the new Eva Cassidy vinyl should belong to the DT or not.

(b) By adding the derived state of the CD record table, the log propagator is able to determine that the new vinyl should be inserted into the DT.

Figure 5.2: The missing record pre-state problem of Example 5.2.2 is solved for inserts into the first source table of a difference DT. Note that this only solves one of the missing record pre-state problems for difference DT creation.

Consider the following examples:


Example 5.2.1 (Missing Record Pre-State in Horizontal Split)
Consider a DT creation using the horizontal split operator where one table, “Employee”, is split into “LondonEmployee” and “ParisEmployee”. John, who is the company’s only salesman in New York, would not be represented in either of these derived tables. If John later moves to Paris (i.e., an update operation is encountered in the log), the previous state of John’s derived record is needed by the log propagator before it can be inserted into the ParisEmployee table.

Example 5.2.2 (Missing Record Pre-State in Difference)
A DT “Do Not Sell These” is created as the difference between the tables “Vinyl Records” and “CD Records”. During log propagation, a new Eva Cassidy vinyl record is inserted. As can be seen in Figure 5.2(a), the log propagator does not know whether to insert this record into the derived table or not. The log propagator could scan the “CD Records” for an equal record, but that table represents a different state. Furthermore, the operation would not be self-maintainable.

By adding the compare-to table containing the derived state of the CD records, the log propagator may scan that table instead. This reveals that the Eva Cassidy record should belong to the DT, as illustrated in Figure 5.2(b).

5.3 Lock Forwarding During Transformations

When either the non-blocking abort or commit strategy is used for synchronization of a schema transformation, old transactions are allowed to operate on the source table records while new transactions operate on the DT records at the same time. As described in Section 4.5, this means that locks have to be forwarded from the source records to the DT records. For non-blocking commit, locks also have to be forwarded from the DTs to the source tables since the source table transactions are allowed to acquire new locks. Derived locks, i.e. locks that have been forwarded, are released once the log propagator has processed a log record describing the completion of the transaction that owns the lock. Lock forwarding ensures that concurrency control is enforced between the source and DT versions of the same record.

In this section, four lock forwarding cases are discussed. In the first case, simple lock forwarding (SLF), each DT record is derived from one and only one source record. Furthermore, a source record contributes to one and only one derived record. The second case, many-to-one lock forwarding (M1LF), applies when each source record contributes to one derived record only, but a record in the DT may be derived from multiple source records. Third, one-to-many lock forwarding (1MLF) is discussed. As the name suggests, it applies when one source record may contribute to many derived records, but each record in the DTs is derived from one source record only. The fourth case, many-to-many lock forwarding (MMLF), inherits the problems of both M1LF and 1MLF.

[Figure: source and DT record pairs connected one-to-one.]

Figure 5.3: Simple Lock Forwarding (SLF) during Non-Blocking Commit. Source record locks require only one DT record lock to be acquired and vice versa.

Simple Lock Forwarding (SLF)

Because of the one-to-one relationship between source and DT records in the simple lock forwarding case, lock forwarding is straightforward. When a source record is locked, the DT record with the same source RID is locked and vice versa. The horizontal merge with duplicate inclusion, horizontal split into disjoint DTs, difference and intersection operators work like this.

Many-to-One Lock Forwarding (M1LF)

Horizontal merge with duplicate removal and vertical merge of one-to-one relationships are the only transformations presented that need many-to-one lock forwarding. As always, locks on source records ensure that conflicting source operations are not given access at the same time. Concurrency control requires these locks to be forwarded to the DTs before new transactions can be given access to the DTs. If the normal shared (S) and exclusive (X) locks are used in the DTs, however, these non-conflicting source operations could nevertheless be conflicting. This happens if more than one source record contributing to the same derived record is locked. Since the scheduler guarantees that operations on the source records are serializable (Bernstein et al., 1987), there is no need for these locks to conflict. New locks are therefore suggested.

            Src.S   Src.X   DT.S   DT.X
    Src.S     Y       Y      Y      N
    Src.X     Y       Y      N      N
    DT.S      Y       N      Y      N
    DT.X      N       N      N      N

Figure 5.4: Lock compatibility matrix for many-to-one lock forwarding. Locks transferred from the source tables do not conflict with each other, but conflict as normal with locks set on the target tables. The compatibility matrix can easily be extended to multigranularity locking.

Figure 5.4 shows the lock compatibility matrix used to avoid conflicts between non-conflicting operations forwarded from the source tables. Conflicting operations on the target table are still blocked. These new locks solve the concurrency issues with M1LF. Locks can now be forwarded from the source records to the DTs without causing conflicts for both non-blocking strategies. If non-blocking commit is used, locks set on DT records must also be set on all source records contributing to the record in question.
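The matrix is small enough to encode directly; the following runnable Python fragment mirrors Figure 5.4 (the lock-mode names are taken from the figure):

    # Compatibility of a requested lock against one held lock (Y/N from Figure 5.4).
    COMPATIBLE = {
        ("Src.S", "Src.S"): True,  ("Src.S", "Src.X"): True,
        ("Src.S", "DT.S"):  True,  ("Src.S", "DT.X"):  False,
        ("Src.X", "Src.S"): True,  ("Src.X", "Src.X"): True,
        ("Src.X", "DT.S"):  False, ("Src.X", "DT.X"):  False,
        ("DT.S",  "Src.S"): True,  ("DT.S",  "Src.X"): False,
        ("DT.S",  "DT.S"):  True,  ("DT.S",  "DT.X"):  False,
        ("DT.X",  "Src.S"): False, ("DT.X",  "Src.X"): False,
        ("DT.X",  "DT.S"):  False, ("DT.X",  "DT.X"):  False,
    }

    def can_grant(requested, held_locks):
        return all(COMPATIBLE[(requested, held)] for held in held_locks)

    assert can_grant("Src.X", ["Src.S"])     # forwarded source locks never conflict
    assert not can_grant("DT.S", ["Src.X"])  # but conflict as normal with DT locks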

One-to-Many Lock Forwarding (1MLF)

The third case is that of one-to-many lock forwarding. It applies to vertical split of one-to-one relationships and to horizontal split since the resulting DTs may be overlapping, i.e. non-disjoint. Since one DT record may be derived from one source record only, lock compatibility remains unchanged for the non-blocking abort strategy. Thus, the only difference between non-blocking abort in SLF and in 1MLF is that one source record lock may result in many DT record locks.

Non-blocking commit, however, is more complicated. The reason for this is that if a DT record is updated, the update must be applied not only to the source record it is derived from, but also to all DT records derived from it. Locks must also be set on all these records immediately. Otherwise, the target and source versions of the record would not be equal. By doing this, the behavior of transactions operating on the DTs during synchronization differs from the behavior after synchronization is complete. If this is considered a problem, non-blocking abort must be used instead. This problem will be elaborated on in the detailed operator description sections.

[Figure: the tables Vinyl Records and CD Records merged with duplicate removal into Unique Records; a lock on one Unique Records record locks all source records it is derived from.]

Figure 5.5: Many-to-One Lock Forwarding (M1LF) during Non-Blocking Commit. During synchronization of horizontal merge with duplicate removal, a DT record lock results in a lock of all records it is derived from. Note that record and state identification information is not included.

Many-to-Many Lock Forwarding (MMLF)

Vertical merge and split of one-to-many relationships belong to the fourth category, many-to-many lock forwarding. These operators inherit the problems from both the 1MLF and M1LF cases. This means that the modified lock compatibility scheme must be used for both non-blocking strategies. In addition, operations performed on derived records may have to be forwarded both to multiple source records and to all DT records derived from these if the non-blocking commit strategy is used.

5.4 Inconsistent Source Records

As pointed out when describing the schema transformation method by Ronstrom in Section 3.1, inconsistencies between records in the source tables may be inevitable and may cause problems for the DT creations where multiple source records contribute to the same derived record. In the detailed DT creation descriptions in Chapter 6, this problem is relevant to vertical split, and to vertical merge for schema transformations.

[Figure: the tables Employee and PostalCode merged into EmployeePost; each derived record carries RID_L, RID_R, LSN_L and LSN_R attributes.]

Figure 5.6: Many-to-Many Lock Forwarding (MMLF) during Non-Blocking Commit. A DT record lock on Markus results in two additional source and one additional DT record lock.

[Figure: an Employee table where the records with postal code 7020 list the city as Tr.heim, Tr.heim and Tr.hemi.]

Figure 5.7: Example of Inconsistent Source Records. Three employees have postal code 7020, but the city names are not equal.

Figure 5.7 illustrates a typical inconsistency where records with the same postal code have different city names. The illustrated inconsistency is an anomaly since it breaks a functional dependency (Garcia-Molina et al., 2002). The functional dependency is only intended, however; the DBMS does not enforce it.

How to handle inconsistencies has been studied for different applications where data from multiple sources are integrated into one. These include merging of knowledge bases in the field of Artificial Intelligence, and merging database records from distributed database systems. The problem for both is that even though integrity constraints guarantee consistency for each independent information source, the combination of records from multiple sources may still be inconsistent.

The solutions in the literature focus either on repairing the integrated records (i.e., making the records consistent) or on answering queries consistently based on the inconsistent records. Only the former solution is relevant in this thesis.

5.4.1 Repairing Inconsistencies

Before a repair can be performed, the records from all sources are integrated into one table. Conflicts may be removed already at this stage, depending on the selected integration operator. The integration may, e.g., be performed by the merge (Greco et al., 2001a), merge by majority (Lin and Mendelzon, 1999) or prioritized merge integration operators (Greco et al., 2001b). As the names suggest, merge simply includes all record versions from all sources, while merge by majority tries to resolve conflicts by using the value agreed upon by the majority of sources. Prioritized merge orders sources so that when conflicts are encountered, the version from the source with higher priority is chosen. Other integration operators also exist, see e.g. (Lin, 1996; Greco et al., 2003; Caroprese and Zumpano, 2006). If an inconsistency cannot be resolved during integration, the different versions are all stored in the result table. To enable multiple records with the same primary key to exist at the same time, an attribute that refers to the originating source table is added to the primary key (Agarwal et al., 1995).

If there are inconsistencies between integrated records, the repair operator is applied to the result table. It identifies the alternative sets of insertions and deletions that will make the table consistent (Greco et al., 2003), and then applies one of these insert/delete sets. Preference rules, or Active Integrity Constraints (AICs), may be added so that in the case of conflict, one solution is preferred to the others (Flesca et al., 2004). An example AIC presented by Flesca et al. (Flesca et al., 2004) states that if two conflicting records for an employee are found, the one with the lowest salary should be kept. Even if AICs are used, however, there may be many alternative insert/delete sets. To the author’s knowledge, the choice between these alternatives has not been discussed in the literature.

Problem            Description                              Solution
Missing Record ID  Unable to identify which derived         Add the RID of all contributing
                   record a logged operation applies to.    source records to each DT record.
Missing State ID   Unable to identify whether a logged      Add the LSN of all contributing
                   operation has already been applied       source records to each DT record.
                   to a derived record.
Missing Record     Information about a record that is       Store the information in an
Pre-State          not stored in any of the DTs is          auxiliary table.
                   required.
Lock Forwarding    Scheduling problem for records           Forward locks between schema
during Transfor-   existing in two schema versions.         versions and modify the lock
mations                                                     compatibilities.
Inconsistent       Anomalies are found in the source        Resolve by major majority, or tag
Source Records     tables.                                  the record as inconsistent and ask
                                                            the DBA.

Table 5.1: Summary of common DT creation problems and their solutions.

In vertical split DT creation, merge by majority is used as the integration operator, i.e. during initial population. This integration may be constrained to only resolve conflicts if there is a major majority, e.g. >75%. The derived records where the conflicts are not resolved are then tagged with an inconsistent mark. The repair algorithm will identify these records and present the different alternatives so that the DBA may decide which alternative is correct.
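A runnable illustration of this qualified-majority integration follows; the 75% threshold is the example value from the text, and the per-attribute interface is an assumption for the sketch:

    from collections import Counter

    # Resolve one attribute across contributing source records.
    # Returns (value, True) on a "major majority", else (None, False).
    def merge_by_majority(values, threshold=0.75):
        counts = Counter(v for v in values if v is not None)
        if not counts:
            return None, True                 # nothing to resolve
        value, votes = counts.most_common(1)[0]
        if votes / sum(counts.values()) > threshold:
            return value, True                # conflict resolved by major majority
        return None, False                    # tag as inconsistent; ask the DBA

    # The three city names sharing postal code 7020 in Figure 5.7:
    print(merge_by_majority(["Tr.heim", "Tr.heim", "Tr.hemi"]))  # (None, False)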

5.5 Summary

In this chapter, we have identified five problems encountered when our framework is used for DT creation. Identifying these common problems and showing how they can be solved in general makes it easier to explain DT creation for each of the six relational operators. They also make it easier to develop DT creation methods for other relational operators if these operators share the problems. A summary of the problems and their solutions is shown in Table 5.1.


Chapter 6

DT Creation using Relational Operators

This chapter describes how the DT creation process is performed for each relational operator. All methods follow the general framework presented in Chapter 4. The methods are described in order of increasing complexity. As shown in Table 6.1, this is the same order as the lock forwarding categories for schema transformations described in the previous chapter.

The detailed DT creation descriptions start with the difference and intersection operators and horizontal merge with duplicate inclusion. Schema transformations using these operators belong to the Simple Lock Forwarding (SLF) category, and are the least complex operators.

Horizontal merge with duplicate removal is more complex than the duplicate inclusion case since records in the DT may be derived from multiple source records. Hence, it belongs to the Many-to-One Lock Forwarding (M1LF) category.

The next operator, horizontal split, is the inverse of union. Since the split is allowed to form overlapping result sets, one source record may be derived into multiple DTs. Horizontal split schema transformation belongs to the One-to-Many Lock Forwarding (1MLF) category.

The two final operators are vertical merge and split. With these operators, one source record may contribute to multiple derived records. Furthermore, one derived record may be derived from multiple source records1. The schema transformations using these operators require Many-to-Many Lock Forwarding (MMLF). Vertical split DT creation is also complicated by possible inconsistencies between source records, and is therefore the most complex operator.

1 An exception is vertical split over a candidate key. With this operator, records in the DTs are derived from exactly one source record, and it therefore belongs to the 1MLF category.

DT Creation                          Operator          Lock forwarding category   Section
Difference, intersection             Difference,       SLF                        6.1
                                     intersection
Horizontal Merge, Dup Inclusion      Union             SLF                        6.2
Horizontal Merge, Dup Removal        Union             M1LF                       6.3
Horizontal Split                     Selection         1MLF                       6.4
Vertical Merge                       Full Outer Join   MMLF                       6.5
Vertical Split, candidate key        Projection        1MLF                       6.6
Vertical Split, non-candidate key    Projection        MMLF                       6.7

Table 6.1: DT Creation Operators.

6.1 Difference and Intersection

Difference and intersection (diff/int) DT creations are so closely related that the same method is applied to both operations. The method takes two source tables, Sin and Scomp (compared-to), as input. Sin contains the records that belong to either the difference or intersection set, based on the existence of equal records in Scomp. The output is a DT containing the difference (DTdiff) or intersection (DTint) set of the source tables. An example DT creation is shown in Figure 6.1. In the figure, DTaux is created to solve the missing record pre-state problem described in Chapter 5. Note that in many cases, Scomp is not removed even if the DT is used for a schema transformation.

6.1.1 Preparation

During preparation, the derived table is added to the database schema. It may contain any subset of attributes from Sin that are wanted in the new DT. It is assumed that if a candidate key is not among these attributes, a generated primary key, e.g. an auto-incremented integer, is added to the DT.


[Figure: the source tables Vinyl Records and CD Records and the derived tables Difference, Intersection and Auxiliary; each derived record stores the source RID and LSN.]

Figure 6.1: Difference and intersection DT creation. Grey attributes are used internally by the DT creation process, and are not visible to normal transactions.

The generated primary key will not be considered when checking which of DTint or DTdiff a record from Sin belongs to.

Duplicates may form as a result of only including a subset of source attributes in the DTs. The implications of this are not discussed since the method used for horizontal merge with duplicate removal, described later in Section 6.3, can be used for diff/int as well.

In addition to the source attributes and the key, the DT must include the source RID and LSN to solve the record and state identification problems. These are shown as grey attributes in Figure 6.1.

The diff/int DT creations suffer from the missing record pre-state problem if only one derived table, storing either the difference or intersection set, is created. The problem is twofold. First, a record t derived from Sin may at one point in time belong to the difference set and later to the intersection set or vice versa. This may be caused by an update of the record itself, or by an insert, delete or update of a record in Scomp. The old state of t is needed in both cases. Thus, the first missing record pre-state problem is that the state of records from Sin that do not belong to the difference or intersection DT being created are also needed. The problem is solved by adding an auxiliary table storing the Sin records that are not in the result set. Thus, both DTdiff and DTint are needed during both DT creations.

Second, the state of records derived from Scomp is frequently needed to determine if a record derived from Sin should belong to DTdiff or DTint. This happens every time a log record describes an update or insert of a record in Sin, as well as when records in Scomp are updated or deleted. In the case that an Scomp record r is updated, e.g., records in DTint that are equal to the old state of r may have to be moved to DTdiff, and records in DTdiff that are equal to the new state of r should be moved to DTint. Thus, the second missing record pre-state problem is that the derived state of records from Scomp is needed as well, and is solved by storing these records in an auxiliary table called DTaux.

Because of the missing record pre-state problems described above, three tables are created during the preparation step. These are DTdiff, DTint and DTaux. Both auxiliary tables must have the same attributes as the DT.

Indices are created on the source RID attributes of all derived tables. If candidate keys from the source tables are included in the DTs, indices should also be created on one of these in all derived tables. With these indices, only one record has to be read when log propagation searches for equal records in any of the DTs2. If a candidate key is not included, an index should be added to one or more attributes that differ the most between records. In the worst case scenario, i.e. without an index on any derived attribute, initial population and log propagation must read all records in the derived tables when testing for equality. Unless the source tables contain few records, such DT creations are in danger of never completing.

2 Although reading one record may involve reading many disk blocks if the index is partially stored on disk.

6.1.2 Initial Population

Once the derived and auxiliary tables have been created, the fuzzy mark is written to the log. Both source tables are then read fuzzily. Records from Scomp are inserted directly into DTaux, whereas records from Sin are first compared to the records in DTaux. If an equal record is found in DTaux, the Sin record is inserted into DTint. Otherwise, it is inserted into DTdiff. When this step is complete, the DTs are said to contain the initial image.
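A minimal sketch of this population logic, with hypothetical scan and table helpers (equality is assumed to ignore the generated key and the RID/LSN attributes):

    # Hedged sketch of initial population for difference/intersection DTs.
    def populate_diff_int(s_in, s_comp, dt_diff, dt_int, dt_aux):
        for r in fuzzy_scan(s_comp):           # unlocked, possibly inconsistent read
            dt_aux.insert(derive(r))           # derived state of the S_comp records
        for r in fuzzy_scan(s_in):
            d = derive(r)
            target = dt_int if dt_aux.contains_equal(d) else dt_diff
            target.insert(d)                   # the initial image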

6.1.3 Log Propagation

Log propagation is organized in iterations. Each iteration starts by writing a fuzzy mark in the log L, and then retrieves all log records relevant to the source tables. The oldest log record that must be retrieved depends on whether or not this is the first log propagation iteration, as discussed in Section 4.4.

If the DT will be used to transform the schema, the synchronization strategy (step 4) must be decided on now. If either of the non-blocking synchronization strategies will be used, locks should be maintained continuously during log propagation so that the locks are in place when synchronization is started.

The log records are applied to the DTs in sequential order. Thus, the log L consists of a partially ordered (Bernstein et al., 1987), finite number of log records, ℓ1, ..., ℓm, that are applied to the DTs in the same order as the logged operations were applied to the source tables. Note that a partial order only guarantees the ordering of conflicting operations.

In the diff/int DT creations, source records may contribute to only one derived record. Furthermore, since it is assumed that duplicates are not removed, each derived record is derived from only one source record. Log propagation has much in common with ARIES redo recovery processing (Mohan et al., 1992) due to this one-to-one relationship between source and derived records. A difference is, however, that records may move between DTint and DTdiff. Since the source candidate keys may not be included in the DTs, multiple records derived from Sin may be equal to each DTaux record. This is reflected in the log propagation rules described next.

Propagation of Sin log records

Consider a log record ℓ ∈ L, describing an insert, update or delete of a record in Sin. Independent of the operation described in ℓ, the first step of propagation is to perform a lookup of the record ID in ℓ in the source RID index of DTint and DTdiff. The record found is called t.

If the logged operation is an insert of a record r into Sin, and a record t was found in the source RID lookup, the logged operation is already reflected and is therefore ignored. If a record t was not found in any of the DTs, DTaux is scanned for a record with equal attribute values. r is then inserted into either DTint or DTdiff, depending on whether or not an equal record was found in DTaux. As mentioned in the preparation section, the cost of scanning for equal records in DTaux varies greatly with the availability of good indices. If an index on a unique attribute exists, at most one record has to be read to determine which table r belongs to. If no indices are present at all, all records in DTaux may have to be read.

Let ℓupd describe the update of a record r ∈ Sin. If the record ID of r was not found in the source RID lookup in DTint and DTdiff, ℓupd is ignored. t is guaranteed to reflect all relevant operations that happened before ℓupd (and possibly some that happen later) (Løland and Hvasshovd, 2006c). Thus, not finding t in any of the DTs can only mean that a delete of r will be described by a later log record ℓ2 ∈ L, ℓupd ≺ ℓ2, where a ≺ b means that a happens before b.

If the record t was found, and ℓLSN > tLSN, the update described in ℓupd is applied. This update may require t to move between the DTs: if the updated version of t is equal to a record in DTaux, it should be in DTint. Otherwise, it should be in DTdiff. Moving t to the other DT is done by deleting the old version of t from one table and inserting the updated version into the other.

Log propagation of delete operations is straightforward. If the record t was found in the source RID lookup, t is deleted.
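The three Sin rules can be summarized in the following Python sketch. It assumes, for illustration only, that the tables are dicts keyed on source RID, that each entry is an (LSN, attribute values) pair, and that update log records carry the complete after-image of the record.

    def find(rid, *tables):
        # Return (table, record) for the first table containing rid.
        for table in tables:
            if rid in table:
                return table, table[rid]
        return None, None

    def has_equal(values, table):
        # Scan for a record with equal attribute values (cheap only
        # if a suitable index exists, as discussed above).
        return any(v == values for _, v in table.values())

    def propagate_sin(log_rec, dt_int, dt_diff, dt_aux):
        op, rid, lsn, values = log_rec
        table, t = find(rid, dt_int, dt_diff)
        if op == "insert":
            if t is None:  # otherwise already reflected
                target = dt_int if has_equal(values, dt_aux) else dt_diff
                target[rid] = (lsn, values)
        elif op == "update":
            if t is None or lsn <= t[0]:
                return  # deleted later, or state already reflected
            del table[rid]  # may move between DT_int and DT_diff
            target = dt_int if has_equal(values, dt_aux) else dt_diff
            target[rid] = (lsn, values)
        elif op == "delete":
            if t is not None:
                del table[rid]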

Propagation of Scomp log records

In contrast to derived Sin records, derived Scomp records may only belong to one table: DTaux. The reason for maintaining DTaux is only to decide which of DTint or DTdiff an Sin record should belong to.

Consider a log record ℓins ∈ L, describing the insertion of a record r into Scomp. The log record is ignored if the RID of r is found in a lookup in the source RID index of DTaux. This means that ℓins is already reflected. Otherwise, r is inserted, and DTdiff is scanned to check if equal records e1, . . . , em are represented there. If found, e1, . . . , em are moved to DTint.

Let ℓupd ∈ L describe an update of a record r in Scomp. If the source RID of r is not found in DTaux, ℓupd is ignored. Otherwise, if the record t with the described source RID is found, and if ℓLSN > tLSN, t is updated. This update may require records to be moved between DTint and DTdiff. DTint and DTaux are first scanned for records equal to the old version of t. If the records e1, . . . , em in DTint are found, and no equal records were found in DTaux, e1, . . . , em are moved to DTdiff. DTdiff is then scanned for records equal to the updated version of t. All matching records are moved to DTint.

Propagation of a delete log record ℓdel ∈ L starts by identifying the derived version r of the record to delete in DTaux. This is done by a lookup in the source RID index. r is then deleted. If DTaux does not contain other records that are equal to r, DTint is scanned. All records in DTint that are equal to r are then moved to DTdiff.
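A corresponding sketch of the Scomp rules is given below, under the same assumptions as the previous sketch. The move_equal helper is illustrative; it moves all records with the given attribute values from one DT to the other.

    def move_equal(values, src, dst):
        # Move every record in src with the given attribute values to dst.
        for rid in [k for k, (_, v) in src.items() if v == values]:
            dst[rid] = src.pop(rid)

    def propagate_scomp(log_rec, dt_int, dt_diff, dt_aux):
        op, rid, lsn, values = log_rec
        t = dt_aux.get(rid)
        if op == "insert":
            if t is None:  # otherwise already reflected
                dt_aux[rid] = (lsn, values)
                move_equal(values, dt_diff, dt_int)  # new matches for DT_int
        elif op == "update":
            if t is None or lsn <= t[0]:
                return
            old_values = t[1]
            dt_aux[rid] = (lsn, values)
            if not any(v == old_values
                       for k, (_, v) in dt_aux.items() if k != rid):
                move_equal(old_values, dt_int, dt_diff)  # old value left DT_aux
            move_equal(values, dt_diff, dt_int)  # records equal to new value
        elif op == "delete":
            if t is None:
                return
            old_values = t[1]
            del dt_aux[rid]
            if not any(v == old_values for _, v in dt_aux.values()):
                move_equal(old_values, dt_int, dt_diff)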

6.1.4 Synchronization

As argued in Section 4.5, synchronization should not be started until the states of the DTs are very close to the states of the source tables. Hence, in what follows, we assume that this is the case.

If the blocking complete strategy is used, new transactions are first blocked from accessing Sin and Scomp. Transactions already active on the source tables are then either allowed to commit or are forced to abort. When all these transactions have terminated, a final log propagation iteration is executed.


This makes the derived tables transaction consistent with the source tables. The DTs are now ready to be used in a schema transformation or as MVs.

The non-blocking strategies differ between schema transformation and MV creation purposes. They are therefore described separately.

Synchronization for Schema Transformations

When performing non-blocking synchronization for schema transformations, transactions are allowed to perform updates in the source and derived tables at the same time. Concurrency control is needed to ensure that the different versions of the same record are not updated inconsistently. As described in Section 5.3, this is done by setting locks in both tables.

Each record in Sin or Scomp is derived into only one DT record, and each DT record is composed of only one source record. The diff/int schema transformations therefore belong to the simple lock forwarding (SLF) category described in Section 5.3.

Synchronization starts by latching Sin and Scomp for the duration of a log propagation iteration. Read operations are, however, not affected by this latch. Since the states of the source and derived tables do not differ much, this pause should be very brief. This log propagation makes the DTs action consistent with the source tables. Remember that since locks have been continuously forwarded by log propagation, all locks on source records are also set on the derived records.

If the non-blocking abort strategy is used, transactions active on Sin and Scomp are then forced to abort while new transactions are allowed to access DTdiff and/or DTint. The aborting transactions can not acquire new locks. Log propagation continues to forward the undo operations performed by the aborting source table transactions. Locks forwarded from the source tables are released once the log propagator encounters the abort log record of the transaction holding it. When all source table transactions have terminated, Sin and Scomp may be removed from the schema.

With non-blocking commit, source table transactions are allowed to access new records. In addition to the lock and operation forwarding from source to derived table performed in non-blocking abort, locks and operations must also be forwarded from the derived tables to the source tables. The reason for this is, of course, that transactions operating on the source tables may access the records that have been modified in the derived tables. With SLF, an operation and lock on one DT record t results in an operation and lock on only the one record that t is derived from. Locks are always acquired immediately on both versions of the record whereas locks are not released until the log propagator encounters the transaction's commit (or abort) log record.

Vinyl Records
RecID  LSN  Artist        Record
r1     101  Smith, Jimmy  Root Down
r2     102  Jones, Norah  Come Away With Me
r3     103  Davis, Miles  Kind of Blue

CD Records
RecID  LSN  Artist        Record
r10    151  Krall, Diana  The Look of Love
r11    152  Davis, Miles  Miles Ahead
r12    153  Evans, Bill   Waltz for Debby

Records (= Vinyl Records ∪ CD Records)
RecID  RIDSrc  LSN  Artist        Record
r101   r1      101  Smith, Jimmy  Root Down
r102   r2      102  Jones, Norah  Come Away With Me
r103   r3      103  Davis, Miles  Kind of Blue
r104   r10     151  Krall, Diana  The Look of Love
r105   r11     152  Davis, Miles  Miles Ahead
r106   r12     153  Evans, Bill   Waltz for Debby

Figure 6.2: Horizontal Merge DT creation.

Synchronization for MV Creations

Since transactions do not update records in the DTs when used to create MVs, operations will not be forwarded from DTdiff and DTint to Sin. Sin and Scomp are first latched during one final log propagation iteration. Read operations are still allowed, however. This log propagation makes the DTs action consistent with the source tables. An MV maintenance method is then added to DTdiff and/or DTint, before the source table latches are removed. The MV maintenance strategy is now responsible for keeping the MV consistent with the source tables, and DT creation is now complete.

6.2 Horizontal Merge with Duplicate Inclusion

The horizontal merge DT creation uses the union relational operator. It takes records from m source tables, S1, . . . , Sm, and inserts these into a derived table DThm. DThm may contain any subset of attributes from the source tables. An example horizontal merge between two source tables, “Vinyl records” and “CD records”, is illustrated in Figure 6.2.

Vinyl Records
RecID  LSN  Artist        Record
r1     101  Smith, Jimmy  Root Down
r2     102  Jones, Norah  Come Away With Me
r3     103  Davis, Miles  Kind of Blue

CD Records
RecID  LSN  Artist        Record
r10    151  Krall, Diana  The Look of Love
r11    152  Davis, Miles  Miles Ahead
r12    153  Evans, Bill   Waltz for Debby
r13    154  Davis, Miles  Kind of Blue

Records (= Vinyl Records ∪ CD Records)
RecID  RIDSrc  LSN  Artist        Record
r101   r1      101  Smith, Jimmy  Root Down
r102   r2      102  Jones, Norah  Come Away With Me
r103   r3      103  Davis, Miles  Kind of Blue
r104   r10     151  Krall, Diana  The Look of Love
r105   r11     152  Davis, Miles  Miles Ahead
r106   r12     153  Evans, Bill   Waltz for Debby
r107   r13     154  Davis, Miles  Kind of Blue

Figure 6.3: Horizontal Merge with Duplicate Inclusion. Notice the duplicate Miles Davis albums (record IDs r103 and r107 in the derived table).

The DT may be defined to keep or remove duplicates. If duplicates are kept, all records in the source tables are represented in the DT. In this case, DThm is self-maintainable. When duplicates are removed, however, multiple source records may contribute to the same derived record. DT creation using this operator requires additional information stored in an auxiliary table to be self-maintainable (Quass et al., 1996). Horizontal merge with duplicate inclusion is described in this section, whereas duplicate removal is discussed in Section 6.3.

Figure 6.3 shows a slightly modified version of Figure 6.2 where the two source tables contain one version of the Miles Davis album “Kind of Blue” each. As expected when duplicates are not removed, the resulting DT contains both. Notice that the different source record IDs (RIDSrc) enable us to identify which record is derived from which source record.


6.2.1 Preparation

Since all records from S1, . . . , Sm are represented exactly once in DThm, horizontal merge with duplicate inclusion only suffers from the missing record and state identification problems. By including the source RID and LSN in DThm, the derived table is made self-maintainable.

During preparation, the derived table DThm is first added to the schema. The table may include any subset of attributes from the source tables. Since it is common for DBMSs to require a primary key in each table, an auto-generated primary key may have to be added to DThm. This generated primary key is not shown in the figures of this section.

DT creation only uses the source RID attribute for record identification. An index is therefore only required on this attribute.

6.2.2 Initial Population

Initial population starts by writing a fuzzy mark, containing the identifiers of all transactions active in S1, . . . , Sm, in the log. The source tables are then read without the use of locks, and each record is then inserted into DThm. The resulting initial image in DThm is not consistent with the source tables at any point in time.

6.2.3 Log Propagation

All source records are represented in DThm when log propagation starts. Furthermore, each source record contributes to only one derived record, and each record in DThm is derived from only one source record. Log propagation can therefore be performed like normal ARIES crash recovery redo work (Mohan et al., 1992), which was discussed in Section 2.3.

As for difference and intersection DT creation, a fuzzy mark is first written to the log. The relevant log records, i.e. operations on records in any of S1, . . . , Sm since the last fuzzy mark, are then retrieved and applied to DThm in sequential order.

Propagation of a log record ℓins ∈ L, describing the insert of a source record r into one of S1, . . . , Sm, starts by checking if the record is already represented in DThm. This is done by a lookup on r's RID in the source RID index of DThm. If a record is found, the log record is ignored; the derived table already reflects this state (Løland and Hvasshovd, 2006c). If the RID is not found, the derived version of r is inserted into DThm.

Let the log records ℓupd ∈ L and ℓdel ∈ L describe the update and deletion, respectively, of a record r in one of the source tables. Propagation of both operations starts with a lookup in the source RID index of DThm. If no record is found, the log record is ignored. If a record t is found, log propagation of ℓdel simply deletes t. Log propagation of ℓupd checks the LSN and updates t if ℓLSN > tLSN.

Vinyl Records
RecID  LSN  Artist        Record
r1     101  Smith, Jimmy  Root Down
r2     102  Jones, Norah  Come Away With Me
r3     103  Davis, Miles  Kind of Blue
r110   160  Peterson, O   The Trio

CD Records
RecID  LSN  Artist        Record
r10    151  Krall, Diana  The Look of Love
r11    152  Davis, Miles  Miles Ahead
r12    153  Evans, Bill   Waltz for Debby
r13    154  Davis, Miles  Kind of Blue

Records (= Vinyl Records ∪ CD Records)
RecID  RIDSrc  LSN  Artist        Record             Type
r101   r1      101  Smith, Jimmy  Root Down          Vin
r102   r2      102  Jones, Norah  Come Away With Me  Vin
r103   r3      103  Davis, Miles  Kind of Blue       Vin
r104   r10     151  Krall, Diana  The Look of Love   CD
r105   r11     152  Davis, Miles  Miles Ahead        CD
r106   r12     153  Evans, Bill   Waltz for Debby    CD
r107   r13     154  Davis, Miles  Kind of Blue       CD
r111   r110    160  Peterson, O   The Trio           Vin

Figure 6.4: The Horizontal Merge shown in Figure 6.3, but with an added “Type” attribute in the “Records” table. With this information, the log propagator is able to insert the new “Oscar Peterson” record into the correct source table.
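Because of the one-to-one relationship, the insert, update and delete rules above reduce to ordinary redo logic. A minimal Python sketch, again assuming dict-based tables with (LSN, attribute values) entries and full after-images in the log records:

    def propagate_hm(log_rec, dt_hm):
        op, rid, lsn, values = log_rec
        t = dt_hm.get(rid)
        if op == "insert":
            if t is None:  # not yet reflected
                dt_hm[rid] = (lsn, values)
        elif op == "update":
            if t is not None and lsn > t[0]:
                dt_hm[rid] = (lsn, values)
        elif op == "delete":
            if t is not None:
                del dt_hm[rid]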

If the derived table will be used for a schema transformation, locks are maintained in DThm as part of log propagation.

6.2.4 Synchronization

The synchronization step is performed in the same way as synchronization of diff/int DT creation and is therefore not repeated. There is, however, one potential problem with non-blocking commit during schema transformations. Consider Example 6.2.1:


Example 6.2.1 (Lack of Information During Non-blocking Commit)
A horizontal merge between two tables containing CD records and Vinyl records was illustrated in Figure 6.3 on page 78. Notice that the derived table does not include any information that can be used to determine which of the source tables a derived record would belong to.

During non-blocking commit synchronization, a transaction inserts a new vinyl record, “Oscar Peterson, The Trio”, into the derived table “Records”. Since the fact that the new record is a vinyl record can not be expressed in the attributes of the DT, the log propagator has no way of knowing which source table it belongs to.

Example 6.2.1 illustrates an important problem: non-blocking commit synchronization can only be used for horizontal merge if the log propagator can determine which source table an inserted DThm record belongs to. When this is not the case, non-blocking abort must be used instead. Figure 6.4 illustrates how adding a “Type” attribute can be used to solve the problem of Example 6.2.1.

6.3 Horizontal Merge with Duplicate Removal

Horizontal merge DT creation is more complex when duplicates are removed, since multiple source records may then contribute to the same derived record. Although the method still only suffers from the missing record and state ID problems, these can not be solved simply by adding source RID and LSN as attributes in DThm. The proposed solution is to create an auxiliary table A in addition to DThm. The source RIDs and LSNs are then stored in A.

Figure 6.5(a) illustrates the same horizontal merge shown in Figure 6.3, but this time with duplicate removal. Thus, the duplicate Miles Davis album “Kind of Blue” is stored only once in the DT. As shown, the auxiliary table A contains three attributes: the RID of the record in the derived table, the RID in the source table, and the current LSN of the record.

Another example of horizontal merge with duplicate removal is illustrated in Figure 6.5(b). The source tables are equal to those in Figure 6.5(a), but this time DThm only contains the artist attribute. The result is that all three “Miles Davis” albums in the source tables are merged into one derived record. Regardless of the number of source records contributing to a record in DThm, A is able to store the record and state identification information required to perform the creation. Together, DThm and A are self-maintainable.


Vinyl Records
RecID  LSN  Artist        Record
r1     101  Smith, Jimmy  Root Down
r2     102  Jones, Norah  Come Away With Me
r3     103  Davis, Miles  Kind of Blue

CD Records
RecID  LSN  Artist        Record
r10    151  Krall, Diana  The Look of Love
r11    152  Davis, Miles  Miles Ahead
r12    153  Evans, Bill   Waltz for Debby
r13    154  Davis, Miles  Kind of Blue

Unique Records
RecID  Artist        Record
r101   Smith, Jimmy  Root Down
r102   Jones, Norah  Come Away With Me
r103   Davis, Miles  Kind of Blue
r104   Krall, Diana  The Look of Love
r105   Davis, Miles  Miles Ahead
r106   Evans, Bill   Waltz for Debby

ID
RIDSrc  LSN  RIDder
r1      101  r101
r2      102  r102
r3      103  r103
r13     154  r103
r10     151  r104
r11     152  r105
r12     153  r106

(a) Horizontal Merge DT creation with duplicate removal. DTid stores record and state identification information on derived records.

Unique Artists (source tables Vinyl Records and CD Records as in (a))
RecID  Artist
r101   Smith, Jimmy
r102   Jones, Norah
r103   Davis, Miles
r104   Krall, Diana
r105   Evans, Bill

ID
RIDSrc  LSN  RIDder
r1      101  r101
r2      102  r102
r3      103  r103
r11     152  r103
r13     154  r103
r10     151  r104
r12     153  r105

(b) Horizontal Merge DT creation with duplicate removal. Two records from “CD” and one from “Vinyl” contribute to the same derived Miles Davis record.

Figure 6.5: Horizontal Merge DT Creation with Duplicate Removal.


6.3.1 Preparation Step

As discussed, horizontal merge with duplicate removal suffers from the missing record and state identification problems. To solve these, two tables are required: the derived table, DThm, and an auxiliary table A. A will include an attribute for the record ID in the derived and source tables, in addition to the LSN. DThm may consist of any subset of attributes from S1, . . . , Sm. Derived records are identified by performing a lookup in A, and DThm does therefore not have to include the source RID.

An index is created on the RID in DThm and on both source RID and derived RID in A. Records are considered duplicates in DThm if they have equal attribute values. This check for equality must be performed very frequently by the DT creation process. Another index should therefore be added to the attribute in DThm that differs the most between the source records. If it is not clear to the DBA which attribute is most divergent between the records, statistics should be acquired from the DBMS.

6.3.2 Initial Population Step

As for DT creation using other relational operators, the initial population starts by writing a fuzzy mark to the log. This log record contains the identifiers of all transactions active on any of the source tables, S1, . . . , Sm. The source tables are then read fuzzily, and the resulting set of records, denoted SR, are inserted into DThm.

Insertion of a source record r ∈ SR into DThm starts by performing a lookup in DThm. This is done to identify if a record teq with equal attribute values is already represented. An index on a divergent attribute, as described in the previous section, would speed up this search tremendously. If no equal record is found, a record tnew, containing the wanted subset of attribute values from r, is inserted into DThm. A record a is then inserted into A. a consists of the RID of r, the RID of tnew and the LSN of r.

If an equal record teq was found in DThm, the insertion into DThm is not performed. Instead, a is inserted into A, consisting of the RID of r, the RID of teq and the LSN of r.
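A sketch of this initial population follows. It assumes, for illustration, that the fuzzily read source records arrive as (RID, LSN, attribute values) tuples, that DThm and A are dicts, and that derived RIDs come from a hypothetical generator:

    import itertools

    def populate(source_records, dt_hm, aux):
        # dt_hm: derived RID -> attribute values
        # aux:   source RID -> (LSN, derived RID)
        next_rid = itertools.count(101)  # hypothetical RID generator
        for rid, lsn, values in source_records:
            t_eq = next((d for d, v in dt_hm.items() if v == values), None)
            if t_eq is None:  # no duplicate: insert a new derived record
                t_eq = "r{}".format(next(next_rid))
                dt_hm[t_eq] = values
            aux[rid] = (lsn, t_eq)  # identification info is always recorded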

6.3.3 Log Propagation Step

Since the source RIDs are only stored in the auxiliary table A, all log propagation rules must perform lookups in this table to identify derived records.

When the log propagator has written a fuzzy mark to the log, all log records relevant to S1, . . . , Sm are retrieved. The log records are then applied in sequential order. If DThm will be used to perform a non-blocking schema transformation, locks should be maintained as part of log propagation, as discussed in Section 5.3.

Let the log record ℓins ∈ L describe the insertion of a record r into one of the source tables S1, . . . , Sm. Propagation starts by performing a lookup on the RID of r in A. If the RID is found, the logged operation is already reflected in DThm, and ℓins is therefore ignored. Even if the record is not represented in DThm, it may still be a duplicate of an existing record. Thus, DThm must be scanned to check if an existing record teq with equal attribute values exists.

Assuming that an equal record is not found in the scan of the derived table, a record tnew, derived from r, is inserted into DThm. A record anew, containing the RID of r and tnew, is then inserted into A. The LSN of this record is set to that of ℓins.

If a duplicate record teq is found in DThm, however, only the insert of aeq into A is performed. This record stores the RID of teq instead of tnew, but is otherwise equal to anew.

Consider a log record ℓdel ∈ L, describing the deletion of a record r from any of S1, . . . , Sm. A lookup is first performed on the source RID index of A. Assuming that a record adel with the RID of r is found, another lookup is performed on A's derived RID index. If adel is the only record in A with this derived RID, r is the only source record contributing to the derived record t ∈ DThm, where the RID of t is equal to the derived RID of adel. In this case, both t and adel are deleted. Otherwise, if adel is not the only record in A with this derived RID, only adel is removed.

New duplicates may form, and old duplicates may be split, as a result of update operations. As for insert and delete operations, log propagation of a log record ℓupd ∈ L, describing an update of a source record r, starts with a lookup in the source RID index of A. If a record a ∈ A with the source RID of r is not found, or if the LSN of a indicates that a newer state is already reflected, ℓupd is ignored. If ℓLSN > aLSN, however, the update should be applied. In this case, a lookup in the derived RID index of A is performed to identify any duplicates of the pre-update version of the record t ∈ DThm derived from r. If a is the only record in A with this derived RID, t does not represent duplicates.

Assume for now that t is only derived from r. DThm is now scanned to find if there is a record with attribute values equal to t after ℓupd has been applied to it. If there is not, t is updated as dictated by ℓupd. If the updated record is a duplicate of a record tdup, however, t is deleted, and the derived RID of a is updated to refer to the RID of tdup.

In the case that t is derived from more source records than r alone, ℓupd can not be applied directly to t. DThm is first scanned to find if t has duplicates after ℓupd has been applied. If the updated record is a duplicate of tdup ∈ DThm, the derived RID of a is updated to refer to tdup. If the updated version of t is not equal to any existing record in DThm, a new record tnew is inserted into DThm. tnew represents t after ℓupd has been applied to it. The derived RID of a is then set to the RID of tnew.

In all four update cases described above, the LSN of a is updated to the LSN value of ℓupd.
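The four update cases can be summarized in the following sketch, under the same dict-based assumptions as the population sketch and with full after-images in the log records:

    def propagate_update(rid, lsn, new_values, dt_hm, aux):
        entry = aux.get(rid)
        if entry is None or lsn <= entry[0]:
            return  # not represented, or a newer state is reflected
        t_rid = entry[1]
        sole = [s for s, (_, d) in aux.items() if d == t_rid] == [rid]
        t_dup = next((d for d, v in dt_hm.items()
                      if v == new_values and d != t_rid), None)
        if sole:  # t is derived from r alone
            if t_dup is None:
                dt_hm[t_rid] = new_values  # update in place
            else:
                del dt_hm[t_rid]  # merged into the existing duplicate
                t_rid = t_dup
        else:  # t has other contributors and must be left unchanged
            if t_dup is None:
                t_rid = "r{}".format(max(int(d[1:]) for d in dt_hm) + 1)
                dt_hm[t_rid] = new_values  # hypothetical new derived RID
            else:
                t_rid = t_dup
        aux[rid] = (lsn, t_rid)  # all four cases update a's LSN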

6.3.4 Synchronization Step

The blocking complete synchronization, non-blocking abort synchronization for schema transformations and non-blocking synchronization for MV creation strategies work as described for difference and intersection. These strategies are not described further. The non-blocking commit strategy for schema transformations is different, however, and is therefore described next. The reason for this difference is that multiple source records may contribute to the same derived record. Hence, horizontal merge with duplicate removal belongs to the Many-to-One Lock Forwarding (M1LF) category.

Non-blocking commit synchronization of schema transformations starts by latching S1, . . . , Sm while a log propagation iteration is performed. The latches do not affect read operations in the source tables. When the iteration is complete, DThm is action consistent with S1, . . . , Sm. Because locks have been maintained as part of log propagation, locks that are set on source records are also set on their counterparts in DThm. These locks must use the modified lock compatibility matrix in Figure 5.4 on page 65.

With the modified lock compatibility, locks forwarded from source records do not conflict with each other. New transactions are now allowed to operate on records in DThm, and the transactions that are active in S1, . . . , Sm may continue processing on the source tables. Since the old transactions are allowed to access new records, locks and operations must also be forwarded from the DT to the source tables. Hence, for the rest of the synchronization step, a transaction in DThm must acquire locks both on the DThm record t it tries to access and on all source records in S1, . . . , Sm that t is derived from. The log propagator continues to process the log to ensure that the operations executed in the source tables are also executed in the DT and vice versa. Forwarded locks are not released until the log propagator processes the commit or abort log record of the transaction owning a lock. When all source transactions have terminated, S1, . . . , Sm and A may be removed from the schema.


Music
RecID  LSN  Artist        Record         Type
r1     101  Smith, Jimmy  Root Down      Vinyl
r2     102  Evans, Bill   Intuition      Vinyl
r3     103  Davis, Miles  Kind of Blue   CD
r4     104  Krall, Diana  Live In Paris  DVD
r5     105  Peterson, O   The Trio       CD

Vinyl Records
RIDSrc  LSN  Artist        Record
r1      101  Smith, Jimmy  Root Down
r2      102  Evans, Bill   Intuition

CD Records
RIDSrc  LSN  Artist        Record
r3      103  Davis, Miles  Kind of Blue
r5      105  Peterson, O   The Trio

Music DVDs
RIDSrc  LSN  Artist        Record
r4      104  Krall, Diana  Live In Paris

Figure 6.6: Horizontal Split DT creation. Grey attributes are used internally by the DT creation process, and are not visible to normal transactions.

6.4 Horizontal Split Transformation

Horizontal split DT creation uses the selection relational operator. The transformation takes records from one source table, S, and distributes them into two or more derived tables DT1, . . . , DTm by using selection criterions. An example horizontal split of a table containing music on different media is illustrated in Figure 6.6. Other examples include splitting an employee table into “New York employee” and “Paris employee” based on location, or into “high salary employee” and “low salary employee” based on a salary condition like “salary > $40.000”. The selection criterions may result in non-disjoint, i.e. overlapping, sets, and may not include all records from the source table.

6.4.1 Preparation

Horizontal split suffers from the missing record pre-state problem since the selection criterions may not include all records. As an example, consider the employee table that was split into New York and Paris offices. An employee in London would not match any of these, and is therefore not part of any of the resulting DTs. If, during DT creation, the employee is moved to the Paris office, the old state of the record is required before it can be updated and inserted into the table containing Paris employees. The reason for this is that update log records only contain the new values of the attributes that are changed.

The missing record pre-state problem is solved by adding an auxiliary table A, containing all records that do not qualify for any of the DTs. The selection criterion for this table is the negated selection criterion of all the DTs. Thus, all records are guaranteed to belong either to one or more of the derived tables DT1, . . . , DTm, or to A.

Horizontal split DT creation also suffers from the missing record and state identification problems. These problems are solved by including the source RID and LSN in all derived tables.

The preparation step consists of creating one table for each selection criterion result set and one for the auxiliary information. In this section, DT1, . . . , DTm, A will be called the derived tables.

All tables must include the source RID and LSN, in addition to any subset of attributes from S. As for difference and intersection, it is assumed that a candidate key from S is among the included subset of attributes. Alternatively, the derived tables may include a generated key, e.g. an auto-incremented number, that is assigned to all records. Thus, duplicate removal is not considered here. If required, duplicate removal as described for horizontal merge in Section 6.3 may be used.

The log propagation rules always use the source record ID to identify records. Indices on other attributes are therefore not required.

6.4.2 Initial Population

Initial population starts by writing a fuzzy mark, containing the transaction identifiers of all transactions active on S, in the log. S is then read without setting locks, and each source record is then inserted into one or more derived tables, depending on the selection criterions it satisfies. If the record does not match any selection criterion, it is inserted into A.
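As an illustration, the routing could look as follows in Python, with each selection criterion expressed as a predicate. The criterions shown correspond to the media types of Figure 6.6 and are examples only:

    criteria = {
        "Vinyl Records": lambda rec: rec["Type"] == "Vinyl",
        "CD Records":    lambda rec: rec["Type"] == "CD",
        "Music DVDs":    lambda rec: rec["Type"] == "DVD",
    }

    def route(record, tables, aux):
        # Insert the record into every DT whose criterion it satisfies;
        # records matching no criterion go to the auxiliary table A.
        matched = False
        for name, predicate in criteria.items():
            if predicate(record):
                tables[name].append(record)
                matched = True
        if not matched:
            aux.append(record)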

6.4.3 Log propagation

After initial population, all source records are represented in at least one derived table. Also, the derived records have a source RID and LSN defining which record and state they represent. With this information, the derived tables are self-maintainable.

Each log propagation iteration starts by writing a fuzzy mark to the log L. All log records between the last fuzzy mark and the new one that are relevant to S are then retrieved. The log records are then processed sequentially using the propagation rules described below. If the DTs will be used to perform a non-blocking schema transformation, locks are maintained on derived records as part of the log propagation.

Consider a log record ℓins ∈ L, describing the insert of a record r into S. A lookup is first performed on the source RID indices of DT1, . . . , DTm and A. If r's record ID, rRID, is found in any of these tables, the operation is already reflected and is ignored. Otherwise, r is evaluated with respect to the selection criterions and inserted into all DTs it matches.

Propagation of an update log record ℓupd ∈ L, updating a record r in S, starts by identifying all records t1, . . . , tn ∈ DT1, . . . , DTm, A derived from it. This is done by performing a lookup of rRID in the source RID index of the derived tables.

Since operations performed on a source record are always applied to all derived versions of it, all records in DT1, . . . , DTm, A derived from r have the same LSNs. More formally,

∀tx ∀ty (tx, ty ∈ {DT1, . . . , DTm, A} ∧ txRIDSrc = tyRIDSrc ⇒ txLSN = tyLSN)

where txRIDSrc and tyRIDSrc are the source RIDs for the derived records tx and ty, respectively. This can be used to determine whether or not ℓupd has already been applied by inspecting the LSN of only one of the derived records.

If none of the attributes that are updated by ℓupd are used in the selection criterions, t1, . . . , tn are simply updated. If a selection criterion attribute is updated, however, two sets of derived tables are identified. The first is the set Ppre of derived tables where the pre-update versions of the records derived from r were stored. These are the same tables that t1, . . . , tn were found in. The second is the set Ppost of DTs where the updated versions of the records derived from r should be stored. The update is processed in two steps:

First, for all derived records t ∈ {t1, . . . , tn} identified by the initial source RID lookup, t is deleted if t is stored in table T and T ∉ Ppost. Otherwise, if t is stored in T and T ∈ Ppost, t is updated with the new attribute values of ℓupd. When all records found in the initial lookup have been processed, the updated version of the derived record is inserted into all tables I where I ∉ Ppre and I ∈ Ppost.

When a delete log record ℓdel ∈ L is encountered by the log propagator, a lookup is performed on the source RID index of all derived tables, using the RID of the record described in ℓdel. The records that are found are simply deleted.
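The two-step update rule can be summarized in the following sketch. It assumes, for illustration, that tables maps each derived table (including A) to a dict from source RID to (LSN, attribute values), that criteria maps each DT to its selection predicate, and that log records carry full after-images:

    def propagate_split_update(rid, lsn, new_values, tables, criteria):
        p_pre = {name for name, recs in tables.items() if rid in recs}
        if not p_pre:
            return  # a later delete log record explains the absence
        if lsn <= next(tables[n][rid][0] for n in p_pre):
            return  # already reflected; all copies share the same LSN
        p_post = {name for name, pred in criteria.items() if pred(new_values)}
        if not p_post:
            p_post = {"A"}  # A holds the negation of all DT criterions
        for name in p_pre:  # step 1: update in place or delete
            if name in p_post:
                tables[name][rid] = (lsn, new_values)
            else:
                del tables[name][rid]
        for name in p_post - p_pre:  # step 2: insert where newly matching
            tables[name][rid] = (lsn, new_values)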


6.4.4 Synchronization

The blocking complete and non-blocking MV synchronization strategies work in the same way as described for difference and intersection. Hence, we only focus on non-blocking synchronization for schema transformations here.

Synchronization for Schema Transformations

Non-blocking abort starts by latching S during a log propagation iteration that makes DT1, . . . , DTm, A action consistent with S. The latch does not affect read operations. With horizontal split, a record in DT1, . . . , DTm, A may only be derived from one source record, while one source record may contribute to multiple DT records. Hence, this schema transformation belongs to the One-to-Many Lock Forwarding (1MLF) category.

In 1MLF, one source lock may have to be forwarded to multiple DT records since a derived record may belong to multiple DTs. As always, the next steps of non-blocking abort are to release the latch and force transactions in the source table to abort. Locks forwarded from S are released in DT1, . . . , DTm once the abort log record of the transaction holding the lock is encountered by the log propagator. When all source transactions have terminated, S and A may be removed from the schema.

Since the transactions on S may access new records, non-blocking commit synchronization requires the log propagator to forward operations performed on a record r in DT1, . . . , DTm to the record s ∈ S it is derived from. However, the operation must also be propagated to all records s contributes to, as described in Section 5.3. If not, the other records t1, . . . , tu derived from s would not be consistent with r. As discussed in Section 5.3, this transaction behavior differs from the behavior after synchronization has completed. If this is not acceptable, non-blocking abort should be used instead.

6.5 Vertical Merge

The vertical merge DT creation method creates a derived table DTvm by applying the full outer join (FOJ) operator on two source tables, Sl and Sr. Sl is the left, and Sr the right, table of the join. In contrast to the inner join and left and right outer join operators, FOJ is lossless in the sense that records with no join match are included in the result. In addition to being lossless, there are multiple reasons for focusing on full outer join. First, the full outer join result can later be reduced to any of the inner/left/right joins by simply deleting all records that do not have the necessary join matches, whereas going the opposite direction is not possible. Second, full outer join is the only one of these operators that does not suffer from the missing record pre-state problem since all source records are represented at least once in the DT (Løland and Hvasshovd, 2006b). An example vertical merge DT creation is shown in Figure 6.7. The figure will be used as an example throughout this section.

Employee
RecID  LSN  F.Name  S.Name    Address    Zip
r01    10   Hanna   Valiante  Moholt 3   7030
r02    11   Erik    Olsen     Torvbk 6   5121
r03    12   Markus  Oaks      Mollenb.7  7020
r04    13   Sofie   Clark     Berg 1     7020

PostalAddress
RecID  LSN  Zip   City
r11    14   5121  Bergen
r12    15   7020  Tr.heim
r13    16   9010  Tromsø

EmployeePost
RID_L  RID_R  LSN_L  LSN_R  F.Name  S.Name    Address    PCode  City
r01    NULL   10     NULL   Hanna   Valiante  Moholt 3   7030   NULL
r02    r11    11     14     Erik    Olsen     Torvbk 6   5121   Bergen
r03    r12    12     15     Markus  Oaks      Mollenb.7  7020   Tr.heim
r04    r12    13     15     Sofie   Clark     Berg 1     7020   Tr.heim
NULL   r13    NULL   16     NULL    NULL      NULL       9010   Tromsø

Figure 6.7: Example vertical merge DT creation.

Vertical merge DT creation suffers from the missing record and state identification problems. As argued in Section 5.1, these problems can be solved by including the record IDs and LSNs from both source tables in DTvm. This method is used in this section. An alternative method has, however, been presented by the authors (Løland and Hvasshovd, 2006c). It uses candidate keys to identify records and totally ignores LSNs in DTvm. Because the latter method does not solve the record and state identification problems, it has some flaws compared to the one used here. First, the log propagation rules are much less intuitive, and second, it cannot handle semantically rich locks (Løland and Hvasshovd, 2006b). Thus, in contrast to the method presented here, it cannot handle delta updates (Korth, 1983). On the other hand, it requires slightly less storage space since the two source RID attributes and an additional LSN are not added to DTvm.

In the following sections, the four steps of DT creation are explained in detail for the vertical merge operator.


6.5.1 Preparation

During preparation, the derived table DTvm is added to the database schema. This table may include a subset of attributes from the two source tables, but some attributes are mandatory. To solve the record and state identification problems, the source RIDs and LSNs from both Sl and Sr are needed.

Since records that should be affected by a logged operation are identified by the RID provided by the log record, indices are added to each of the source RID attributes. In addition, an index is added to the join attribute(s) in DTvm since new join matches may have to be identified as a result of inserts and updates of source table records. Together, these indices provide direct access to all affected records for any operation that may be encountered.

6.5.2 Initial Population

As for the other relational operators, initial population starts by writing a fuzzy mark to the log, containing the identifiers of transactions that have accessed the two source tables Sl or Sr. The source tables are then read without using locks. Once read, the full outer join operator is applied, and the joined records are inserted into DTvm. At this point, the state of DTvm is called the initial image.

6.5.3 Log Propagation

No assumption is made on whether the join is over a one-to-one, one-to-many or a many-to-many relationship. An implication of this is that records from both source tables may be represented in multiple records in DTvm. In what follows, it is assumed that the vertical merge is defined over an equijoin. The method can, however, be modified to use other comparison operators or, in the case of cartesian product, no comparison operator.

Insert rules

Consider a log record ℓins ∈ L, describing the insertion of a record r into source table S, S ∈ {Sl, Sr}. The first step of propagating ℓins is to perform a lookup of the RID of r, denoted rRID, in either the left or right source RID index of DTvm. The index to use depends on which source table r was originally inserted into. If one or more records in DTvm have rRID as the source RID, the logged operation is already reflected, and ℓins is ignored.

If no records with the source RID value of rRID are found in DTvm, all DTvm records with a join attribute matching that of r, denoted rJM, are identified. The set of records with a matching join attribute value is called JM (Join Match). If no join matches were found, i.e. JM = ∅, r is joined with the null-record and inserted into DTvm. The source RID and LSN are set to those of ℓins.

If one or more join matches were found, all records t ∈ JM are composed of two source records. One of these is from the same table as r, and is denoted t1. The other part, t2, is from the other source table. If two or more records in JM consist of the same t2 part, only one of these records is used. Thus, for each record t ∈ JM with a t2 part that has not already been processed, t is updated with the attribute values of r iff t1 is the null-record. If t1 is not the null-record, the attribute values of t2 are read and joined with r. This new record is then inserted into DTvm with source RIDs and LSNs from both ℓins and t2.
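The insert rule is sketched below for a record arriving in the left source table; the right-hand case is symmetric. In this illustration, rows of DTvm are represented as dicts, None plays the role of the null-record, and some null-record corner cases are glossed over:

    def propagate_foj_insert_left(rid, lsn, join_val, values, dt_vm):
        if any(row["rid_l"] == rid for row in dt_vm):
            return  # already reflected
        jm = [row for row in dt_vm if row["join"] == join_val]
        if not jm:  # no join match: join r with the null-record
            dt_vm.append({"rid_l": rid, "lsn_l": lsn, "join": join_val,
                          "left": values, "rid_r": None, "lsn_r": None,
                          "right": None})
            return
        seen = set()
        for row in jm:  # process each distinct t2 part only once
            if row["rid_r"] in seen:
                continue
            seen.add(row["rid_r"])
            if row["rid_l"] is None:  # t1 is the null-record: fill it in
                row.update(rid_l=rid, lsn_l=lsn, left=values)
            else:  # join r with t2 and insert a new row
                dt_vm.append({"rid_l": rid, "lsn_l": lsn, "join": join_val,
                              "left": values, "rid_r": row["rid_r"],
                              "lsn_r": row["lsn_r"], "right": row["right"]})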

Update rules

Propagation of an update log record ℓupd ∈ L, updating the record r from source table S, S ∈ {Sl, Sr}, starts by identifying all records in DTvm partially composed of r. This is done by a lookup of rRID in either the left or right source RID index of the derived table, depending on which table r belongs to. The set of records that are found is called P. If P = ∅, or if the LSN of all records p ∈ P is greater than the LSN of ℓupd, the log record is ignored. As argued in Section 6.4.3, the LSNs of all p ∈ P are equal. The LSN is therefore checked only once.

The logged update is applied to DTvm if P ≠ ∅ and the LSN of p is lower than that of ℓupd. If the join attribute values of r are not updated, all records p ∈ P are simply updated with the new attribute values and LSN of ℓupd. This is similar to crash recovery redo work as described in Section 2.3, but applied to multiple instances of the same source record.

If the join attribute of r is updated, however, log propagation becomes more complex. An additional set, N (new join matches), is first defined. N contains all records in DTvm that match the updated join attribute value of r. It is found by a lookup on the join attribute index of DTvm.

Since we assume that the vertical merge DT creation is defined over an equijoin, P and N are disjoint, i.e. non-overlapping, sets. Records in P are first processed. Each DTvm record p ∈ P is composed of two joined source records, r and one record p2 from the other source table. If p2 is represented in at least one other record in DTvm, p is deleted. This is checked by a lookup of p2's source RID in the index of DTvm. If p2 is not represented in any other record in DTvm, however, p2 is joined with the null-record. This is done because all source records must be represented when using the full outer join operator.


If N = ∅, r is padded with null-values and inserted. If N ≠ ∅, each record n ∈ N is analyzed. Again, n is composed of two joined records: one record n1 from the same table as r, and one record n2 from the other table. If n is composed of n2 and the null-record, n is updated with the attribute values of r. If n is the join of n2 and another record n1, a new record is inserted into DTvm containing the join of r and n2. In both cases, source RID and LSN are set to reflect the log record.

Delete rules

The propagation of a delete log record ℓdel ∈ L is fairly intuitive. First, the set D of all records in DTvm consisting of r is identified. For each record d ∈ D, d is deleted if it consists of r joined with the null-record, or with a record d2 that is represented in at least one other record in DTvm. If d is the only record in DTvm that contains d2, d2 is joined with the null-record.

6.5.4 Synchronization

The synchronization step is started when the state of DTvm is very close to the states of Sl and Sr. The blocking complete and non-blocking MV synchronization strategies work as described for difference and intersection. These are not explained further.

Non-blocking Synchronization for Schema Transformations

As discussed in Section 5.3, vertical merge DT creation belongs to the many-to-many lock forwarding (MMLF) category. This means that the modified lock compatibility matrix, shown in Figure 5.4 on page 65, must be used so that locks forwarded from the source tables to DTvm do not conflict with each other. The source locks do, however, conflict with locks acquired on DTvm records. Note that if there are many more Sl records than Sr records, a few locks in Sr may result in a considerable number of locks in DTvm.

If the non-blocking abort strategy is used, transactions active on Sl and Sr are forced to abort, while new transactions are allowed to start processing on the unlocked records in DTvm. Log propagation continues to apply the undo operations performed by the aborting source table transactions. Source locks in DTvm are removed once the abort log record of the owner transaction has been processed. Sl and Sr may be removed when all source transactions have terminated.

In the case of non-blocking commit, transactions active on the source tables are allowed to access new records. A consequence of this is that locks and operations from transactions in DTvm must be forwarded to Sl and Sr. This makes synchronization more complicated because the update of a record t in DTvm may have to be propagated not only to the source records tl and tr that t is derived from, but to all DTvm records derived from tl and tr. Consider the following example:

Figure 6.8: The updated salary of Hanna (see Example 6.5.1) requires a salary update of the “QA” position, resulting in an increased salary for all “QA” personnel.

Example 6.5.1 (Updates during non-blocking commit)
Figure 6.8 illustrates the vertical merge schema transformation of source tables Employee and Position. There are two employees with position “QA”. During non-blocking commit synchronization, a transaction in DTvm updates the salary attribute of Hanna from “$33,000” to “$34,000”. This update requires that the “QA” record in the Position source table is locked and updated accordingly. Furthermore, the Erik record in DTvm, which is also derived from this source record, has to be locked and updated with a new salary to maintain consistency.

It should be clear from Example 6.5.1 that the choice of non-blocking abort vs. commit is not transparent for transactions operating on DTvm. With non-blocking commit, the behavior of transactions operating during synchronization differs from that of transactions after synchronization completes.


Before choosing to use non-blocking commit synchronization, the consequences must be considered carefully. The increased number of locks required, the forwarding of numerous update operations between the tables, and the non-transparent behavior of transactions operating on DTvm during and after synchronization may outweigh the fact that transactions on the source tables are not aborted.

Employee
RID  LSN  EmpID  Name    Address    Salary
r1   101  01     Hanna   Moholt 3   $40'
r2   102  02     Erik    Torvbk 6   $32'
r3   103  03     Markus  Mollenb.7  $42'
r4   104  04     Sofie   Berg 1     $35'

ModifiedEmp
RID  LSN  EmpID  Name    Address
r1   101  01     Hanna   Moholt 3
r2   102  02     Erik    Torvbk 6
r3   103  03     Markus  Mollenb.7
r4   104  04     Sofie   Berg 1

Salary
RID  LSN  EmpID  Salary
r1   101  01     $40'
r2   102  02     $32'
r3   103  03     $42'
r4   104  04     $35'

Figure 6.9: Vertical split over a Candidate Key.

6.6 Vertical Split over a Candidate Key

Vertical split is the inverse of the full outer join DT creation method described in the previous section, and uses the projection relational operator. It takes one source table S as input, and creates two derived tables, DTl (left result) and DTr (right result), each containing a subset of the source table attributes. Some attributes, called the split attributes, must be included in both DTs. These attributes can later be used to join DTl and DTr. In what follows, we assume that the only overlap between the attribute sets in DTl and DTr is the split attributes.

If the split attributes form a candidate key in S, each source record will be derived into exactly one record in each of the DTs, and each record in the DTs will be derived from exactly one source record. The DT creation described in this section therefore belongs to the One-to-Many Lock Forwarding category. An example split is illustrated in Figure 6.9.

If S is split over a functional dependency that is not a candidate key in S, multiple source records may have equal split attribute values and may therefore contribute to the same derived record in DTr. This type of split is typically executed to perform a normalization of the database schema. Vertical split DT creation over a functional dependency is described in Section 6.7.

6.6.1 Preparation

In vertical split, two derived tables, DTl and DTr, are first added to the schema. They typically include two different subsets of attributes from S, but must both include the candidate key used as the split attribute.

Both DTl and DTr suffer from the missing record and state identification problems. Since the records in the DTs are derived from exactly one source record, these problems are solved by adding source RID and LSN directly to the derived tables.

Since log propagation will identify all derived records based on the source RID, indices are only required on this attribute in DTl and DTr.

6.6.2 Initial Population

Initial population starts by writing the fuzzy mark, containing the identifiers of all transactions active in S, to the log. S is then read fuzzily, and for each record found in S, one record is inserted into DTl and DTr.

6.6.3 Log Propagation

Log propagation is run in iterations. Each iteration writes a new fuzzy mark to the log L, and then retrieves log records relevant to S since the last fuzzy mark (or since the oldest operation of active transactions in the first iteration). These log records are then applied to DTl and DTr in sequential order. If the DTs will be used to perform a non-blocking schema transformation, locks are maintained as part of log propagation.

When a log record ℓins ∈ L, describing the insertion of a record r into S, is encountered by log propagation, a lookup of the RID of r is first performed in the source RID index of one of the DTs. If the RID is found, r is already reflected in the DTs. If not, the wanted subset of attributes from r is inserted into DTl and DTr. Note that it is not necessary to perform a RID lookup in both derived tables since r is either reflected in both or none of them.

A delete log record, describing the deletion of a record r from S, is propagated by performing a lookup in the source RID index of one of the DTs. The log record is ignored if the source RID is not found. Otherwise, the identified record is deleted, and the same process is then applied to the other DT.

Consider a log record ℓupd ∈ L, describing an update of a record r in S. Again, log propagation starts with a lookup in the source RID index of one of the DTs. ℓupd may affect attributes in only DTl or DTr. If so, the lookup is performed in this DT. Assuming that a derived record t with correct source RID is found, and that t has a lower LSN than ℓupd, the described update is applied. If ℓupd affects attributes in the other DT as well, the procedure is repeated for that table.

Most DBMSs do not allow primary key updates. Thus, the described rules work under the assumption that the DT primary key attributes are not updated. This assumption holds if the same attributes are used as primary keys in S and the DTs. The vertical split example in Figure 6.9 illustrates this. If another candidate key from S is used as primary key in DTl and DTr, however, the log propagator may encounter updates of these attributes. If this is the case, the described update rules must be modified to delete the pre-update derived records and then insert the updated ones, unless the DBMS allows primary key updates.
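Under these assumptions, the propagation rules fit in one sketch. For illustration, each DT is a dict from source RID to an (LSN, attribute values) pair, update log records contain only the changed attributes, and primary keys are never updated:

    def propagate_vsplit(log_rec, dt_l, dt_r, attrs_l, attrs_r):
        op, rid, lsn, values = log_rec
        for dt, attrs in ((dt_l, attrs_l), (dt_r, attrs_r)):
            if op == "insert":
                if rid not in dt:  # reflected in both DTs or in none
                    dt[rid] = (lsn, {a: values[a] for a in attrs})
            elif op == "delete":
                dt.pop(rid, None)
            elif op == "update":
                t = dt.get(rid)
                changed = {a: values[a] for a in attrs if a in values}
                if t is not None and changed and lsn > t[0]:
                    dt[rid] = (lsn, dict(t[1], **changed))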

6.6.4 Synchronization

Vertical split over a candidate key belongs to the 1MLF category. The synchronization strategy works exactly as described for horizontal split in Section 6.4, and is therefore not repeated here.

6.7 Vertical Split over a Functional Dependency

This section describes vertical split DT creation when the split attributes are not a candidate key in S. This may, e.g., be done to perform a normalization (Elmasri and Navathe, 2004) of the database schema. An example split over a non candidate key is illustrated in Figure 6.10. As can be seen, the source table “Employee” has two functional dependencies, and is split over “zip”, which is not a candidate key:

firstname, surname → zip
zip → city

The legend is the same as was used in the previous chapter. Thus, the DT creation method splits a table S into two derived tables DTl and DTr, each containing a subset of attributes from the source table. Both tables must include the split attributes.

Employee
RID  LSN  F.Name  S.Name    Zip   City
r1   101  Hanna   Valiante  7030  NULL
r2   102  Erik    Olsen     5121  Bergen
r3   103  Markus  Oaks      7020  Tr.heim
r4   104  Sofie   Clark     7020  Tr.heim
r5   105  NULL    NULL      9010  Tromsø

ModifiedEmp
RIDSrc  LSN  F.Name  S.Name    Zip
r1      101  Hanna   Valiante  7030
r2      102  Erik    Olsen     5121
r3      103  Markus  Oaks      7020
r4      104  Sofie   Clark     7020

PostalAddress
Zip   City     #
5121  Bergen   1
7020  Tr.heim  2
9010  Tromsø   1

Figure 6.10: Vertical split over a non candidate key.

A consequence of splitting S over a non candidate key is that multiple source records may have the same split attribute value, e.g. multiple employees with the same zip. These source records should be derived into only one record in DTr. Furthermore, a record in DTr should only be deleted if there are no more records in S with that split attribute value. To be able to decide if this is the case, a counter, similar to that of Gupta et al. (1993), is associated with each DTr record. When a DTr record is first inserted, it has a counter of 1. After that, the counter is increased every time an equal record is inserted, and decreased every time one is deleted.
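The counter maintenance can be sketched as follows, assuming for illustration that DTr is a dict from split attribute value to a [dependent attribute values, counter] pair:

    def dtr_insert(split_val, dep_values, dt_r):
        if split_val in dt_r:
            dt_r[split_val][1] += 1  # one more source record with this value
        else:
            dt_r[split_val] = [dep_values, 1]

    def dtr_delete(split_val, dt_r):
        entry = dt_r.get(split_val)
        if entry is None:
            return
        entry[1] -= 1
        if entry[1] == 0:  # last contributing source record removed
            del dt_r[split_val]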

Before the method is described in detail, we show that S may contain inconsistencies that complicate DT creation. Consider the following example:

Example 6.7.1
Consider the table “Employee” below. This table is used as a source table to perform the DT creation illustrated in Figure 6.10.


Firstname  Surname   Zip   City
Hanna      Valiante  9010  Tromsø
Erik       Olsen     5121  Bergen
Markus     Oaks      7020  Trondheim
...        ...       ...   ...
Sofie      Clark     7020  Trnodheim

There are intentionally two functional dependencies in this table:

firstname, surname → zip
zip → city

Notice, however, that there is an inconsistency between employees Markus and Sofie since the zips are the same, whereas the city names differ. Nothing prevents such inconsistencies from occurring in this table, and the DT creation framework can not decide whether "Trondheim" or "Trnodheim" is correct. One of the main reasons for normalization is to avoid such inconsistencies from occurring in the first place.

If inconsistencies like the one in Example 6.7.1 exist in S, we are not able to perform a split transformation without taking measures.

For readability, vertical split over a non candidate key is first explained under the unrealistic assumption that inconsistencies never appear between records in S³. This provides an easy-to-understand basis for the scenario where inconsistencies may occur. An extension that can handle inconsistencies is then explained in Section 6.7.5.

6.7.1 Preparation

As for vertical split over a candidate key, the DTs suffer from the record and state identification problems. For DTl, this problem can be solved by adding the source RID and LSN as attributes to the table. This can not easily be done with DTr, however. The reason for this is that each record in DTr may be derived from multiple source records. A possible solution to the missing record and state identification problems of DTr would be to create an auxiliary table A, containing the record IDs from DTr, the source record IDs and the LSNs. This solution was used in the horizontal merge with duplicate removal DT creation described in Section 6.3. As will be clear from the following sections, however, all the required record and state identification information can be found in DTl. Hence, the auxiliary table is not needed.

³ Note that the simplified method can not handle semantically rich locks. Semantically rich locks (Korth, 1983) were described in Chapter 2.



During preparation, DTl is first added to the database schema. In addition to the wanted subset of attributes from S and the split attributes, source RID and LSN are required. The LSNs will be used to achieve idempotence in both derived tables. The DT creation process will use the source RID attribute of DTl for all lookups. Hence, an index should be added to this attribute.

DTr is then added with a subset of attributes from S. Only the split attributes are allowed in both DTs. Instead of the normal source RID attribute, a counter is added to DTr. Since the split is over a functional dependency, the split attributes form a candidate key in DTr, and these should therefore be defined as either primary key or unique. The DT creation process will use the split attributes for lookup, and an index should therefore be added to these.

If the DTs are used to perform a schema transformation, an alternative strategy is to only create the DTr table. Since all attributes needed in DTl are already present in S, S can be renamed to DTl during synchronization after removing unwanted attributes from it. The transformation would require less space, and updates that would not affect attributes in DTr could be ignored by the log propagator. Unfortunately, the log propagator needs information on both the LSN and the split attribute value of each record in DTl. An auxiliary table A would therefore be needed to keep track of this information during propagation. Although A may potentially be much smaller than DTl, this section describes how the method works when DTl is created as a separate table. Only minor adjustments are needed for the alternative auxiliary method to work.

6.7.2 Initial Population

Initial population starts by writing a fuzzy mark in the log. The fuzzy mark contains the identifiers of all transactions active on S at this point in time. After performing a fuzzy read of S, the records are inserted into DTl and DTr. Insertion into DTl is straightforward; the wanted subset of attributes is inserted together with the source RID and LSN of the record in S. A lookup is then performed on the split attribute index of DTr.

If a record with the same split attribute value already exists, the counter of that record is increased⁴. If the split attribute value is not found in DTr, a new record is inserted. It consists of the wanted subset of attributes, and a counter value of one.

⁴ Recall that for now, it is assumed that all records with equal split attribute values are consistent.

6.7.3 Log Propagation

Log propagation is started once the initial images have been inserted into DTl and DTr. Each iteration starts by writing a fuzzy mark to the log L, and then retrieves all log records relevant to S. The log records retrieved are then applied in sequential order to the DTs. If the DTs will be used in a non-blocking schema transformation, locks are maintained as part of log propagation.

In general, log propagation for records in DTl is more intuitive than for records in DTr. The reason for this is that each record in DTl is derived from exactly one source record. This is not the case for the records in DTr, which may be derived from multiple source records.

Log propagation of records in DTr must be treated with care. Since an arbitrary number of source records may contribute to the same derived record, the source RID can not be used for identification simply by adding it as an attribute. Instead, the split attribute value of the corresponding DTl record is used for identification. Since there is a one-to-one relationship between source records and records in DTl, the value to search for is found by reading the record tl ∈ DTl with correct source RID. Furthermore, DTr does not provide correct state identifiers since multiple source records may contribute to each record. Thus, the LSN of tl will be used to determine if a log record is already reflected in DTr. By reading tl, both the record and state identification problems are solved.

The records in DTr may have incorrect state identifiers during DT creation. The reason for this is that there is only one LSN for each derived record tr ∈ DTr. If the source record that last updated tr is later deleted, tr will have a state ID that belongs to a source record no longer contributing to it. Nevertheless, the LSN of tr reflects the last update or insert propagated to it. Since all source records contributing to tr are assumed to be consistent, and since the LSN of tr is not used to achieve idempotence, this is not a problem.

Consider a log record ℓins ∈ L, describing the insert of a record r into S. The RID of r is first used to perform a lookup in the source RID index of DTl. If a record with this source RID is found, ℓins is already reflected in the DTs and is therefore ignored. Otherwise, a record tl with the wanted subset of attribute values from r, including the source RID and LSN, is inserted into DTl.

A lookup is then performed on the split attribute index of DTr. Assuming that a record told ∈ DTr with the same split attribute values is found, the counter of told is increased. If ℓins has a higher LSN than the record, the attribute values of told are updated as well. The LSN is then set to the higher LSN value of r and told.

If the split attribute value of r is not found in DTr, a new record tr is inserted. It contains the wanted subset of attributes and the LSN from r, in addition to a counter value of one.

Log propagation of a delete log record ℓdel ∈ L starts with the same source RID lookup in DTl. If the RID of the deleted source record is not found, the log record ℓdel is already reflected in the DTs. If a record tl with the correct source RID is found, however, tl is deleted. A lookup is then performed on the split attribute index of DTr, using the split value of tl. If the record tr found in this lookup has a counter of one, the record is deleted. Otherwise, the counter is decreased by one.

Let ℓupd ∈ L be a log record describing the update of a record r in S. Propagation of ℓupd starts by performing a lookup in the source RID index of DTl. If no record with this source RID exists, ℓupd is ignored. Otherwise, if a record tl ∈ DTl is found, and if ℓupd represents a newer state, i.e. has a higher LSN than tl, the update is applied.

ℓupd is now applied to the attributes in tl if any of these are updated. Even if no attributes in DTl are updated, the LSN of tl is set to that of ℓupd. Log propagation then continues in DTr if ℓupd describes updates of attributes there.

Assume for now that the split attribute values of tl are not updated. A lookup is then performed in the split attribute index of DTr, using the split values read from tl. The record found in DTr is called tr. If ℓupd represents a newer state than tr, i.e. the LSN is higher, ℓupd is applied, and the LSN is set to reflect the new state.

If the split attribute value is updated, log propagation in DTr works by delete and insert. The record told ∈ DTr is first read by a lookup in the split attribute index of DTr, using the pre-update split attribute value. The counter of told is decreased, and a new record tnew with updated attribute values is inserted, as described for insert log records.
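To summarize the insert and delete rules, the following hypothetical sketch models the source RID index of DTl and the split attribute index of DTr as in-memory maps; all class, field and method names are illustrative only and not the framework's actual implementation:

class VerticalSplitPropagator {
    static class DtlRecord { String splitValue; long lsn;
        DtlRecord(String s, long l) { splitValue = s; lsn = l; } }
    static class DtrRecord { long lsn; int counter = 1;
        DtrRecord(long l) { lsn = l; } }

    final java.util.Map<String, DtlRecord> dtl = new java.util.HashMap<>(); // source RID -> DTl record
    final java.util.Map<String, DtrRecord> dtr = new java.util.HashMap<>(); // split value -> DTr record

    void propagateInsert(String srcRid, String splitValue, long lsn) {
        if (dtl.containsKey(srcRid)) return;               // log record already reflected
        dtl.put(srcRid, new DtlRecord(splitValue, lsn));
        DtrRecord told = dtr.get(splitValue);
        if (told == null) {
            dtr.put(splitValue, new DtrRecord(lsn));       // counter value of one
        } else {
            told.counter++;                                // one more contributing record
            told.lsn = Math.max(told.lsn, lsn);            // keep the newest state identifier
        }
    }

    void propagateDelete(String srcRid) {
        DtlRecord tl = dtl.remove(srcRid);
        if (tl == null) return;                            // log record already reflected
        DtrRecord tr = dtr.get(tl.splitValue);
        if (tr != null && --tr.counter == 0)
            dtr.remove(tl.splitValue);                     // last contributing record is gone
    }
}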

6.7.4 Synchronization

The blocking complete, non-blocking synchronization for MV creation, and non-blocking abort for schema transformation strategies work as described for vertical merge. Hence, only non-blocking commit for schema transformations is discussed here.

Since each source record is split into two derived records, and each record in DTr may be derived from multiple source records, vertical split transformation over a non candidate key requires many-to-many lock forwarding (MMLF).

As previously discussed, non-blocking commit allows transactions in S to access new records. This means that operations and locks set on a record t in DTl or DTr must be forwarded to all the records in S that t was derived from. To allow fast lookup of these records, a split attribute index should be added to S. Furthermore, if the operation on t changes the split attribute values, the operation must also be forwarded to the record in the other DT that was derived from the same source record.

As argued for vertical merge schema transformation in Section 6.5.4, non-blocking abort may be a better choice than non-blocking commit since it is much less prone to locking conflicts. The commit algorithm is also much more complex.

6.7.5 How to Handle Inconsistent Data - An Extension to Vertical Split

In this section, we extend the vertical split DT creation method just described to handle inconsistent source records. The extension is inspired by solutions to similar problems in merging of knowledge bases (Lin, 1996; Greco et al., 2003; Caroprese and Zumpano, 2006) and merging of records in distributed database systems (Flesca et al., 2004), described in Section 5.4.

A flag attribute is added to records in DTr. The flag may either signal Consistent (C) or Unknown (U). A C flag is used when a derived record is known to be derived from consistent source records, and the U flag is used when it is known to be derived from inconsistent source records or has an unknown consistency state.

During initial population, all records in DTr that were consistent in the fuzzy read get a C flag. All other records get a U flag. The log propagation rules must also be modified to maintain these flags.

If the log propagator inserts a record tnew into DTr, and a record told already has that split attribute value, the flag of told is changed from C to U iff told ≠ tnew. The flag change from C to U is also performed for updates if the derived record in DTr has a counter greater than one. A U flag can only be changed to C if a logged update applies to all attributes that are not part of the split attributes, and the record has a counter of one.


A "Consistency Checker" (CC) is run regularly as a background thread. A record with a U flag, tu ∈ DTr, is chosen. The CC then writes a "Begin Consistency Check on tu" mark to the log. All records in S contributing to tu are then read without using locks⁵. If these are consistent in S, another mark stating that tu is consistent is written to the log together with the correct image of tu.

The CC marks are later encountered by the log propagator. Assuming that no logged operations apply to tu between the begin and end CC log marks, tu is guaranteed to be consistent and is changed accordingly. Any modification that applies to tu between these marks invalidates the result, however. Note that all records in DTr should have a C flag before synchronization is started, since a DBA will have to manually fix the problem if inconsistent records still exist. This may take considerable time.

If the source records contributing to tu are not consistent, the "Consistency Remover" (CR) is started. It starts by collecting statistics on the source records contributing to tu. This corresponds to identifying repairs that may remove inconsistencies (Greco et al., 2003). Based on these statistics, the CR may either remove the inconsistencies based on predefined rules, or may suggest solutions to the DBA.

The CR makes inconsistency removal decisions based on rules inspired by integration operators (Lin, 1996; Greco et al., 2003; Caroprese and Zumpano, 2006) and Active Integrity Constraints (AIC) (Flesca et al., 2004). A rule may, e.g., state that the attribute values agreed upon by a majority of source records should be used if more than 75% agree. Many rules may be defined, but if none of these evaluate to true, the DBA must decide what to do. Example 6.7.2 illustrates how the CR works during removal of inconsistencies.

Example 6.7.2
Consider the inconsistency in Example 6.7.1 one more time. Assume that the table is split over the zip functional dependency, and that CR is trying to solve the inconsistency between records with postal zip "7020". Based on a read of the employee table, CR has found the following statistics:

Total number of records with zip "7020": 306
Records agreeing on "Trondheim": 77% (235)
Records agreeing on "trondheim": 22% (68)
Records agreeing on "Trnodheim": 1% (3)

⁵ Note that since the source table is read, this DT creation is not self-maintainable.


Only one CR rule has been defined, stating that in the case of a 75% majority or more, the majority value is used. Thus, CR can now update the 71 records with cities not equal to "Trondheim".
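A minimal sketch of such a majority rule, assuming the collected statistics map each candidate value to the number of source records agreeing on it; the threshold constant and all names are illustrative:

class MajorityRule {
    static final double THRESHOLD = 0.75; // the 75% rule from the example

    // Returns the value to repair to, or null if no value has a large enough
    // majority and the DBA must decide.
    static String decide(java.util.Map<String, Integer> statistics) {
        int total = statistics.values().stream().mapToInt(Integer::intValue).sum();
        for (java.util.Map.Entry<String, Integer> e : statistics.entrySet())
            if (e.getValue() >= THRESHOLD * total)
                return e.getKey();
        return null;
    }
}

With the statistics above, decide would return "Trondheim", since 235 of the 306 records (77%) agree on it.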

When the attribute values have been decided upon, either automatically or by the DBA, the CR is ready to remove the inconsistency. All records in S that do not agree on the decided values are now updated, one record at a time. The CR must acquire write locks on the involved records to do this, but only one record is locked and updated at a time. When all source records with incorrect attribute values have been updated, CC is again executed for tu. If no transactions have introduced new inconsistencies during this process, CC will now inform the log propagator to set a C flag.

6.8 Summary

In this chapter, we have described in detail how the DT creation framework can be used to create derived tables using the six relational operators. The solutions to the DT creation problems described in Chapter 5 have been used extensively in the design of the DT creation methods. Table 6.2 shows a summary of which problems were encountered by which DT creation operator, and how the problems were solved.

The DT creation operators have been presented in order of increasing complexity. This order is closely related to the lock forwarding categorization and therefore the "cardinality" of the operation⁶. It is clear that many records may have to be locked during non-blocking commit synchronization of schema transformations, especially in the MMLF methods. If too many records require locking, it may be better to use the non-blocking abort strategy. However, the number of locks depends heavily on parameters like which types of modifications are frequently performed⁷, the number of records in each source table etc. Thus, there is no simple answer for when one method should be used instead of the other. Note that lock contention is not a problem for MV creation.


⁶ How many derived records a source record may contribute to, and how many source records a derived record may be composed of.

⁷ In vertical merge, e.g., modifications to records in Sl cause far fewer locks in the DT than modifications to records in Sr.


Although DT creation for all the relational operators uses the same framework described in Chapter 4, it is clear that the work required to apply log records to the DTs varies from operator to operator. In DT creation using the difference operator, source record modifications require the log propagator to look up and modify records in two or even three DTs. For other operators, e.g. horizontal merge with duplicate inclusion, log propagation of each logged operation only requires lookup and modification of one record. These differences are expected to cause variations in the incurred performance degradation of transactions running concurrently with DT creation.

In the following chapters, we focus on implementation and testing of the DT creation methods. Our goal is to validate the methods and to indicate to what extent they degrade the performance of other transactions.


Operator | Missing Record and State ID | Missing Pre-State | Lock Forwarding Category | Inconsistent Records
Difference, intersection | Add source RID and LSN to all DTs | Add two auxiliary tables | SLF | -
Horizontal Merge, Dup Inclusion | Add source RID and LSN to DT | - | SLF | -
Horizontal Merge, Dup Removal | Store source RID and LSN in auxiliary table | - | M1LF | -
Horizontal Split | Add source RID and LSN to DTs | Add auxiliary table for records not qualifying for DTs | 1MLF | -
Vertical Merge | Add source RID and LSN to DT | - | MMLF | -
Vertical Split, candidate key | Add source RID and LSN to both DTs | - | 1MLF | -
Vertical Split, non-candidate key | Add source RID and LSN to left DT | - | MMLF | Run Consistency Check and Repair in parallel with log propagation

Table 6.2: Problems and solutions for the DT Creation methods.


Part III

Implementation and Evaluation


Chapter 7

Implementation Alternatives

In the previous part of this thesis, we described how the derived table creation framework can be used to perform schema transformations and create MVs using six relational operators. We now want to determine the quality of the described methods. More precisely, we want to determine a) whether the methods work, and b) what the implied costs are in terms of performance degradation to concurrent transactions.

As discussed by Zelkowitz and Wallace (Zelkowitz and Wallace, 1998), experimentation is of great value in validation and evaluation of new techniques in software. In this thesis, two types of experiments are of particular importance: empirical validation and performance experiments.

In the empirical validation experiments, the methods are tested in a controlled environment. An implementation of the methods is viewed as a "black box", and the output of the black box is compared to what we consider to be the correct output. This type of experiment can not be used as a proof of correct execution¹, but rather as a clear indication of correctness. To confirm the results, empirical validation experiments can be "triangulated", i.e., performed in two or more different implementations (Walker et al., 2003). Due to time constraints, triangulation is not performed in this thesis.

In the performance experiments, we consider relative performance, as opposed to absolute performance, to be the interesting metric². This experiment type is highly relevant since an important design goal has been to impact the performance of concurrent transactions as little as possible.


¹ Although it can be used to prove incorrect execution (Tichy, 1998).
² In this thesis, relative performance denotes the difference in performance while not running DT creation, compared to performance during DT creation. Absolute performance denotes the actual response time or throughput numbers that are acquired from processing capacity benchmarks.


Ideally, non-blocking DT creation should be implemented in a full-scale DBMS. This would have provided excellent conditions for both types of experiments. However, implementing a DBMS comparable to IBM DB2 or Oracle would be an impossible task due to the very high complexity of such systems. This leaves us with three alternative approaches:

Simulator: Model a DBMS and the non-blocking DT creation functionality, and implement the model in a simulator.

Open Source DBMS: Add the described functionality to an existing open source DBMS, e.g. MySQL, PostgreSQL or Apache Derby.

Prototype: Implement a prototype DBMS from scratch. This prototype has to be significantly simplified compared to modern DBMSs in many aspects, especially those considered not to affect DT creation.

The alternative we decide to use should be usable for both types of experiments. Due to time constraints, we also consider the implementation cost an important factor. Hence, in the following sections, the alternatives are evaluated on three criteria: usability for empirical validation, usability for performance testing, and the cost (time and risk of failure) of development. An evaluation summary is presented in Section 7.4.

7.1 Alternative 1 - Simulation

Assuming that a DBMS and the DT creation strategies can be modelled precisely, simulations can be used to get good estimates of the incurred performance degradation in any simulated hardware environment (Highleyman, 1989). The model would require accurate waiting times for processing and I/O, and correct distributions for task arrivals for all queues.

Implementing a model and performing simulations in a simulation program like Desmo-J (Desmo-J, 2006) requires little to moderate implementation work. While it can be used for performance experiments, it can not be used for empirical validation of the non-blocking DT creation methods.

7.2 Alternative 2 - Open Source DBMS

If DT creation functionality were added to an open source DBMS, the modified system could be used both for empirical validation and performance testing. In contrast to simulations, in which any hardware environment can be simulated, experiments using this alternative will only be executed in one hardware environment.


Many open source Database Management Systems exist. From these, five well-known systems have been selected as potential DBMSs in which the non-blocking DT creation methods may be implemented. These are: Berkeley DB (and Berkeley DB Java Edition), Apache Derby, MySQL with InnoDB, PostgreSQL and Solid Database Engine.

As discussed in Part II, the suggested non-blocking DT creation methods have rigid requirements for the internal DBMS design. Most importantly, the methods require Compensating Log Records (CLR), logical redo logging and state identifiers on record granularity. Hence, in what follows, the five DBMS candidates are evaluated with emphasis on these three requirements.

Berkeley DB and Berkeley DB Java Edition

Berkeley DB and Berkeley DB Java Edition are C and Java implementations of the same design (Oracle Corporation, 2006b); unless otherwise noted, the name Berkeley DB will be used for both in this thesis. It is not a relational DBMS, but rather a storage engine with transaction and recovery support. Our DT creation methods operate on relations, and a mapping from relations to the physical structure would therefore be needed. This can be solved by using the product as a storage engine in MySQL. However, Berkeley DB uses redo logging that is physical to page, and page state identifiers (Oracle Corporation, 2006a). It is therefore not considered a suitable candidate for the DT creation methods.

Apache Derby

Apache Derby (Apache Derby, 2007a) is a relational DBMS implemented in Java. It uses ARIES-like recovery (Mohan et al., 1992) with Write Ahead Logging and Compensating Log Records. However, redo log records are physical (Apache Derby, 2007b), and state identifiers are associated with blocks, not records (Apache Derby, 2007c). This renders Apache Derby unsuited as an implementation candidate.

MySQL with InnoDB

MySQL (MySQL AB, 2007) is designed with a highly modular architecture, and is best described as a high level DBMS with a storage engine interface (MySQL AB, 2006). The high level functions include SQL parsing, query optimization etc., while the storage engines are responsible for concurrency control and recovery (MySQL AB, 2006). Many storage engines exist, e.g., Berkeley DB, InnoDB and SolidDB. MySQL with InnoDB is described below, whereas the Berkeley DB and SolidDB alternatives are treated as individual products.



MySQL with InnoDB is the recommended combination when transaction support is required (MySQL AB, 2006; Kruckenberg and Pipes, 2006). The InnoDB storage engine uses physiological logging (Zaitsev, 2006) and page level LSNs (Kruckenberg and Pipes, 2006). It is therefore not considered a good candidate for non-blocking DT creation.

Solid Database Engine

The Solid Database Engine can be used as a storage engine either in one of Solid Information Technology's embedded DBMSs (e.g. BoostEngine and Embedded Engine), or in MySQL (Solid Info. Tech., 2007). The Solid Database Engine uses normal Write Ahead Logging with physical redo logging (Solid Info. Tech., 2006b). Furthermore, source code inspection reveals that state identifiers are associated with blocks. Hence, we do not consider Solid Database Engine to be a good implementation candidate.

PostgreSQL

PostgreSQL, formerly known as Postgres and Postgres95, was originally created for research purposes during the mid-eighties (PostgreSQL Global Development Group, 2007). Until version 7.1, PostgreSQL used a force buffer strategy, and hence did not write redo operations to the log at all (PostgreSQL Global Development Group, 2001). In version 7.3³, undo operations, and therefore CLRs, were not logged (PostgreSQL Global Development Group, 2002). It is also clear from source code inspection that the redo log is physical to page, and that state identifiers are associated with pages rather than records⁴. The lack of CLRs, redo log records that are physical to page, and page state identifiers render PostgreSQL unsuited for the DT creation methods.

Open Source DBMS Discussion

It is clear that none of the five open source DBMSs evaluated in this section are good implementation candidates for the non-blocking DT creation methods. Since neither the log formats nor the state identifiers can be used, both the recovery and cache managers would have to be modified significantly. We consider the implementation cost of making significant changes to unfamiliar code to be very high.

³ This was the newest version when the implementation alternatives were evaluated.
⁴ See access/xlog.h for details.


DBMS          | Redo Log Format | Granularity of State Identifiers
Berkeley DB   | Physical        | Block
Derby         | Physical        | Block
MySQL/InnoDB  | Physical        | Block
Solid DB      | Physical        | Block
PostgreSQL    | Physical        | Block

Table 7.1: Evaluation of Open Source DBMS alternatives.


7.3 Alternative 3 - Prototype

A prototype DBMS that includes the non-blocking DT creation methods can be used for empirical validation and performance testing in the same way as an open source DBMS. The two strategies share the problem of fixed hardware, meaning that experiments will be performed in one hardware environment only. As described in the introduction, this is not considered a problem since we are interested in relative performance only.

It is not feasible to implement a new, fully functional DBMS from scratch due to the complexity of such systems. A prototype should therefore only include the parts that are most relevant to DT creation.

The prototype is required to function in a manner similar to traditional DBMSs, and should therefore use a standard DBMS design to the largest possible extent. Figure 7.1 shows a widely accepted DBMS design close to what is used in, e.g., MySQL Enterprise Server 5 (MySQL AB, 2006) and Microsoft SQL Server 2005 (Microsoft TechNet, 2006). The figure also bears close resemblance to the model described by Bernstein et al. (Bernstein et al., 1987). To get an idea of the implementation cost of using a prototype, we consider possible simplifications on a module by module basis in what follows.

Modules Operating on the Logical Data Model

In full-scale DBMSs, the Connection Manager is responsible for tasks like authentication, thread pooling and providing various connection interfaces. In a prototype, only one connection interface and a thread pooling strategy are required. Authentication, e.g., does not affect DT creation.

The SQL Parser of the prototype has to recognize the SQL statements used in the experiments. Hence, by first performing an analysis of the experiments, the prototype SQL Parser can be significantly simplified to understand only a very limited subset of the SQL language.


[Figure 7.1: Possible Modular Design of Prototype. The Connection Manager, SQL Parser and Relational Manager operate on the logical data model; the Scheduler, Recovery Manager and Cache Manager operate on the physical data model, on top of the stored data.]


A Relational Manager is typically responsible for mapping between the logical data model seen by users, and the physical data model used internally in the DBMS. The module also performs query optimization, which is used to choose the most efficient access order when multiple tables are involved in one SQL statement. Query optimization is a highly sophisticated operation which involves statistical analysis of access paths. This can be totally ignored in the prototype since the DT creation methods do not rely on it. However, this simplification requires careful construction of all SQL statements that will be used in the experiments. In practice, this can be done by e.g. always stating tables in the most efficient order. With query optimization removed from the module, the relational manager is reduced to performing the mapping between the logical and physical data models.

Modules Operating on the Physical Data Model

Schedulers are responsible for enforcing serializable execution of operations.


As discussed in Section 2.2, two-phase locking (2PL) is the most commonly used scheduling strategy in modern DBMSs. 2PL is also fairly simple to implement, and should therefore be used in a prototype.

The primary responsibility of Recovery Managers is to correct transaction, system and media failures. In most DBMSs, including all the open source DBMSs evaluated in the previous section, this is done by maintaining a log. In the non-blocking DT creation methods, this log is also used extensively to forward updates to the derived tables.

A prototype Recovery Manager implementation is required to maintain a log that can be used to fully recover the database. The ARIES recovery strategy (Mohan et al., 1992) is a good candidate since it is widely accepted and used in many DBMSs, e.g. in Apache Derby (Apache Derby, 2007b). To be usable by the DT creation methods, the module is also required to use Compensating Log Records and logical redo logging.

The final module, the Cache Manager, is responsible for physical data access. In most DBMSs, this includes reading disk blocks into main memory and writing modifications back to disk. A good strategy for choosing which blocks to replace when the memory is full, e.g. the Least Recently Used (LRU) algorithm, is also necessary. As argued in Section 2.3, it is common for Cache Managers to use a steal/no-force buffer strategy. With this strategy, the Cache Manager must cooperate closely with the Recovery Manager so that all operations are recoverable.

In a prototype, the Cache Manager is required to cooperate with the Recovery Manager to achieve recoverable operations. Furthermore, the DT creation methods require state identifiers to be associated with records, not blocks. As was clear from the evaluation of the open source alternative in the previous section, record state identifiers are not normally used in today's DBMSs.

7.4 Implementation Alternative Discussion

In this chapter, we have evaluated simulations, implementation in open source DBMSs and implementation in a prototype with respect to three important criteria. These criteria were: usability for empirical validation, usability for performance testing and implementation cost.

As is clear from Table 7.2, simulations can not be used to empirically validate the DT creation methods. For this experiment type, the output of a system is compared to what is considered correct output. The quality of the experiment result is therefore determined by the quality of the test sets. Hence, the open source DBMS and prototype alternatives are considered equally well suited for this purpose.


Criterion                       | Simulation | Open Source DBMS | Prototype
Usability; Empirical Validation | -          | High             | High
Usability; Performance Testing  | Medium     | High             | Medium
Implementation Cost             | Low        | High             | Medium
Risk; Unsuitable Design         | Low        | Medium           | Low

Table 7.2: Evaluation of implementation alternatives.

All three alternatives can be used for performance experiments. Using an existing open source DBMS would provide the most reliable performance results. In contrast to the alternatives, these DBMSs are all fully functional systems that include most aspects of common DBMS technology.

Both the simulation and prototype alternatives rely on simplified models. However, we consider the latter alternative to provide the most accurate performance results. The reason for this is that it is easier to verify the correctness of the prototype design (Zelkowitz and Wallace, 1998), and because we do not have to make assumptions about processing and waiting times with this alternative.

When it comes to implementation cost, simulation is clearly the least costly alternative. Furthermore, if an open source DBMS with a design suitable for non-blocking DT creation had been found in Section 7.2, the open source alternative would be considered less costly than implementing a prototype. However, the evaluation in Section 7.2 showed that none of the open source DBMSs had a design that was suitable for DT creation. If any of these systems were to be used, both the Cache and Recovery Managers of that DBMS would require significant modifications to support logical logging and record state identifiers. In addition, the Scheduler module would have to be changed to handle modified lock compatibilities and forwarding of locks between source and derived tables. Hence, only the high level modules of the chosen open source DBMS would be usable without drastic changes. Making extensive changes to unfamiliar code is considered by the author to be both more costly and to have a higher risk of failure than implementing a prototype.



In contrast to simulations, the prototype alternative is good for both types of experiments. Furthermore, it has a lower implementation cost and risk than the open source alternative. Based on this evaluation, we consider a prototype to be the better alternative due to time considerations.


Chapter 8

Design of the Non-blocking DBMS

This chapter describes the design of a prototype Database Management System, called the "Non-blocking Database Management System" (NbDBMS), which will be used for the empirical validation and performance experiments described in Chapter 9. As illustrated in Figure 8.1, the prototype has a modular design inspired by what is used in many modern DBMSs, e.g. MySQL Enterprise Server 5 (MySQL AB, 2006) and Microsoft SQL Server 2005 (Microsoft TechNet, 2006). In addition to providing normal DBMS functionality, NbDBMS is capable of performing the six non-blocking DT creations described in Chapter 6. Figure 8.2 shows a UML class diagram of the most important parts of the prototype. Note that each module in NbDBMS can be replaced by another implementation as long as the module interface remains unchanged.

NbDBMS is simplified significantly compared to modern DBMSs. E.g., only a limited subset of the SQL language is supported, and only a few relational operators are available for user transactions. Furthermore, NbDBMS stores records in main memory only, making buffer management unnecessary.

In the following sections, each module is described with emphasis on its similarity to or difference from standard DBMS solutions. The effects of the simplifications are then discussed in Section 8.1.7.


[Figure 8.1: Modular Design Overview of the Non-blocking DBMS. Clients and an administrator program connect through Java RMI to the Communication Manager, which forwards requests to the SQL Parser and the Relational Manager. Below these, the Scheduler coordinates the Data Manager (holding the tables) and the Recovery Manager (holding the log).]


[Figure 8.2: UML Class Diagram of the Non-blocking Database System.]


8.1 The Non-blocking DBMS Server

8.1.1 Database Communication Module

The Communication Module (CM) acts as the entry point for all access to the database system. When a client or administrator wants to connect to the database system, the CM sets up a network socket for that program. Java RMI is used for communication, but the sockets have been modified to not buffer replies, thus emphasizing response time. This is necessary because many client programs will be run from the same physical node to simulate high workloads during the experiments; replies to different clients at the same physical node should not be buffered and sent as one network package, which would be the default Java RMI behavior. Once a connection has been established, the CM simply forwards requests from the client or administrator to the appropriate DBMS module. Depending on the request, this is either the SQL Parser Module or the Relational Manager Module.

When performance tests are executed, the module is also responsible for writing a performance report file. During response time tests, e.g., clients periodically report their observed response times, which are written to the report file for later analysis.

8.1.2 SQL Parser Module

All user operations and some administrator operations are requested through SQL statements. These statements must be interpreted by the SQL Parser Module (SPM) before they can be processed further.

The experiments require a way to perform all basic operations, i.e. insert, delete, update and query. Thus, the SPM is designed to interpret a small subset of the SQL language, including one single syntax for each of these operators. Likewise, for queries, only one single way of writing projections¹, selection criteria, joins, unions etc. is supported. Consult Appendix A for further details about accepted SQL statements.

The SPM works by first checking that the SQL statements have correct syntax. Statements with incorrect syntax are rejected, while accepted statements are tokenized. Tokenization is the process of splitting strings into meaningful blocks, called tokens (Lesk and Schmidt, 1990), which are used as parameters in method calls to the Relational Manager. Consider the following example tokenization:

¹ Selection of a subset of attributes (Elmasri and Navathe, 2004).


Example 8.1.1 (Tokenization)
Select statement:

select firstname, surname from person where zip=7020;

Tokens:

statement_type: {select}

table: {person}

attributes: {firstname,surname}

select_criterion_eq: {{zip,7020}}

order_by: {}

These tokens can then be used in a call to the Relational Manager procedure:

executeQuery(Long transID, String table, Array attributes,

Array select_criterion, Array order_by)

Regular Expressions (regex) (Friedl, 2006) are used for both syntax checking and tokenization of SQL statements. Regexes are powerful, but become complex if many different statement syntaxes are allowed. However, since only a limited subset of the SQL language needs to be recognized in NbDBMS, this is not a significant problem in the current implementation. If more complex SQL statements are to be allowed in a future implementation, a lexical analyzer like Lex (Lesk and Schmidt, 1990) should be used instead.
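As a hypothetical illustration of this approach, the following sketch checks and tokenizes the single supported select syntax with one regular expression; the pattern and class are assumptions for illustration, not the prototype's actual regex:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SelectTokenizer {
    static final Pattern SELECT = Pattern.compile(
        "select\\s+(?<attributes>\\w+(?:\\s*,\\s*\\w+)*)\\s+from\\s+(?<table>\\w+)" +
        "(?:\\s+where\\s+(?<criterion>\\w+=\\w+))?;");

    public static void main(String[] args) {
        Matcher m = SELECT.matcher("select firstname, surname from person where zip=7020;");
        if (!m.matches()) {
            System.out.println("rejected: incorrect syntax"); // syntax check failed
            return;
        }
        String[] attributes = m.group("attributes").split("\\s*,\\s*"); // {firstname, surname}
        String table = m.group("table");                                // person
        String criterion = m.group("criterion");                        // zip=7020, or null if absent
        System.out.println(table + " " + String.join(",", attributes) + " " + criterion);
    }
}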

8.1.3 Relational Manager Module

The Relational Manager Module (RMM) maps the logical data model seen by users to the physical data model used internally by NbDBMS. Hence, this is the lowest level module in which table or attribute names are meaningful.

The module consists of three classes: RelationalManager, TableMapper and RelationalAlgorithmHelper. RelationalManager is the main class of the module. It serves as the module's interface to higher level modules, and organizes the logical to physical data mapping. The TableMapper class is used by RelationalManager whenever information is needed about a database schema object, e.g. a table. If the executeQuery method call in Example 8.1.1 is processed, e.g., the RelationalManager has to ask the TableMapper for the internal IDs of the attributes "firstname" and "surname". Other responsibilities of TableMapper include table creation and removal, and providing descriptions of tables. Table descriptions include attribute names, data types and constraints, and are used when derived tables are created.


All information on tables and their attributes is stored in two reserved tables that are created at startup. Other than having reserved names, these behave as other tables. The table manipulation and information gathering performed by the TableMapper is therefore done by updating and querying the records in these. For fast lookup, the TableMapper also maintains a cache of vital schema information. To be able to guarantee that the cached information is valid, only one TableMapper object may be created. This TableMapper object is aware of all changes to the schema since schema manipulations are directed through it.

If the RelationalManager is processing a query that involves set operations, the RelationalAlgorithmHelper class is consulted. It contains static, i.e. stateless, implementations of some set operations, including various joins, union and sort.

The join algorithms are implemented using hash join (Shapiro, 1986). Since the database only resides in main memory, this strategy is better than both GRACE join and sort-merge join (Shapiro, 1986).

Union with duplicate removal is implemented with a Hashtable, and assumes that there are no duplicates in either of the two subquery results. All records from the subquery with the fewest records are first copied to the result set, and a Hashtable is created on one of the attributes. The records from the other subquery are then compared to the hashed records and added to the result set if a record with identical attribute values is not found. Ideally, an attribute with a unique constraint should be used in the hash. If this is the case, each record from the second subquery must be compared to at most one record. Note that records from the second table are not added to the Hashtable.
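A sketch of the described union, under the assumptions stated above (duplicate-free inputs, the hashed attribute having a unique constraint, and records represented as attribute arrays); all names are illustrative:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

class UnionHelper {
    // Union with duplicate removal, hashing the smaller input on attribute hashAttr.
    static List<String[]> union(List<String[]> smaller, List<String[]> larger, int hashAttr) {
        List<String[]> result = new ArrayList<>(smaller);     // copy the smaller subquery result
        Hashtable<String, String[]> hash = new Hashtable<>();
        for (String[] rec : smaller)
            hash.put(rec[hashAttr], rec);
        for (String[] rec : larger) {                         // never added to the hashtable
            String[] candidate = hash.get(rec[hashAttr]);     // at most one record to compare to
            if (candidate == null || !Arrays.equals(candidate, rec))
                result.add(rec);                              // no identical record was found
        }
        return result;
    }
}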

The sort operation is implemented with Merge-Sort because this method is both fast, O(n log n) (Knuth, 1998), and easy to implement.

Consider the sequence diagram in Figure 8.3. This diagram illustrates how the module responds to the following query with a join:

select *

from (person join post on person.zip=post.zip)

where person.name=John;

As illustrated, the RMM first requests the attribute ID and type of the "name" attribute in "person" from the TableMapper. If the TableMapper has not already cached this information, a read is requested from the reserved "columns" table. The TableMapper then caches this information for future requests. The RMM now knows the attribute ID and type (String) of the "name" attribute, and it uses this to read all person records with the name "John". The same process is repeated for the "post" table, but without a selection criterion. Before the join is performed, the RMM needs to know which attribute ID should be used to join the records. Again, the TableMapper is consulted. The information is now found in the cache of the TableMapper. Finally, the results of the subqueries and the join attribute IDs are sent to the RelationalAlgorithmHelper class, which executes the join.


[Figure 8.3: Relational Manager Module processing the query select * from (person join post on person.zip=post.zip) where person.name=John;]



As already described, the RelationalManager is the lowest level module in which the logical data model has any meaning. It is also the highest level module with knowledge of the physical data model. For this reason, the algorithms for most of the non-blocking DT creation methods are also implemented here. Consult Chapter 6 for details on these algorithms.

8.1.4 Scheduler Module

The Scheduler is responsible for ordering transactional operations on records so that the resulting order is serializable. Note that since schema information is stored as records in two reserved tables, this implies that schema modifications are also performed in serializable order.

The Scheduler uses a strict Two Phase Locking (2PL) strategy. Thus, locks are acquired when a transaction requests an operation, but are not released until the transaction has terminated. As argued in Section 7.3, this strategy was chosen for multiple reasons: strict 2PL is commonly used in commercial systems, e.g. SQL Server 2005 (Microsoft TechNet, 2006) and DB2 v9 (IBM Information Center, 2006), and it is easy to understand and implement. As opposed to basic 2PL, strict 2PL also avoids cascading aborts (Garcia-Molina et al., 2002).

The module supports both shared and exclusive locks on either record or table granularity. If a transaction issues an operation that results in a locking conflict, the transaction is aborted immediately. This ensures correct, deadlock-free execution (Bernstein et al., 1987), but comes with a performance penalty: in many cases, the conflict could have been resolved simply by waiting for the lock to be released. If so, the transaction is aborted unnecessarily. On the other hand, deadlock detection can be ignored, thus simplifying the module.

Normal transactional requests are processed in three steps in the module. First, the TransactionManager class checks that the transaction is in the active state. If the TransactionManager confirms that the transaction is active, the lock type², the transaction ID and the object ID³ are sent to the LockManager.

² Shared and exclusive locks are supported.
³ The object ID is either a table name and a record ID, or only a table name.


[Figure 8.4: Organization of the log. Log records are linked both in sequential LSN order (T:1,LSN:1 → T:1,LSN:2 → T:2,LSN:3 → T:2,LSN:4 → T:3,LSN:5 → T:1,LSN:6) and per transaction, with each transaction chain terminated by null.]

If another transaction has a conflicting lock on the object, the LockManager returns an error code. This results in the transaction being aborted. Otherwise, if conflicting locks are not found, the LockManager confirms the lock. The Scheduler then sends the operation request to the Recovery Manager Module.

While all normal transactional operations require locks, the Scheduler also provides lockless operations to the DT creation methods. Furthermore, methods for lock forwarding from one table to another are implemented in the module.
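A minimal sketch of the locking step with immediate-restart conflict resolution, assuming one lock entry per object ID; class and method names are illustrative and do not mirror the prototype's actual LockManager interface:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LockManagerSketch {
    enum Mode { SHARED, EXCLUSIVE }
    static class Lock { Mode mode; final Set<Long> holders = new HashSet<>(); }

    private final Map<String, Lock> locks = new HashMap<>();

    // Returns false on a locking conflict; the caller then aborts the transaction
    // immediately instead of waiting, so no deadlock detection is needed.
    synchronized boolean acquire(long txId, String objectId, Mode mode) {
        Lock lock = locks.computeIfAbsent(objectId, k -> new Lock());
        boolean soleHolder = lock.holders.size() == 1 && lock.holders.contains(txId);
        if (lock.holders.isEmpty() || soleHolder
                || (lock.mode == Mode.SHARED && mode == Mode.SHARED)) {
            if (lock.holders.isEmpty() || mode == Mode.EXCLUSIVE) lock.mode = mode;
            lock.holders.add(txId);
            return true;
        }
        return false; // conflicting lock: immediate restart
    }

    // Strict 2PL: all locks are released only when the transaction terminates.
    synchronized void releaseAll(long txId) {
        locks.values().forEach(l -> l.holders.remove(txId));
        locks.values().removeIf(l -> l.holders.isEmpty());
    }
}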

8.1.5 Recovery Manager Module

The next layer of abstraction, the Recovery Manager Module, is responsible for making the database durable. It is designed for a data manager using the steal and no-force buffer strategies. To ensure durability for this type of data manager, the ARIES protocol is adopted. ARIES is used in many modern DBMSs, including the open source DBMS Apache Derby (Apache Derby, 2007b). The module maintains a logical log of all operations that modify data, and the Write Ahead Logging (WAL) and Force Log at Commit (FLaC) techniques are used to ensure recoverability. Furthermore, a Compensating Log Record (CLR) is written to the log if an operation is undone.

Logical logging is sometimes called operation logging because each log record stores the operation that was performed rather than the data values themselves. The "partial action" and "action consistency" problems (Gray and Reuter, 1993) are not encountered in NbDBMS since records are stored in main memory only. This simplification is discussed in the next section. If records were stored on disk, as in most DBMSs, the logical logging strategy adopted here would have to be replaced by, e.g., a two level logging strategy with one logical log and one physiological log. This technique is used in the ClustRa DBMS (Hvasshovd et al., 1995).

As illustrated in Figure 8.4, log records are organized in linked lists. Like in ARIES (Mohan et al., 1992), two sets of links are maintained: the first is a link between two succeeding log records, thus maintaining the sequential order of the log. The second link is between two succeeding log records in the same transaction. The latter is only used to fetch log records when a transaction is aborted. These links are maintained as object references in main memory, but are changed to LSN references when written to disk.
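The following sketch shows what such a doubly linked log record may look like; the field names are hypothetical:

class LogRecord {
    final long lsn;                  // Log Sequence Number, also used as state identifier
    final long transactionId;
    final String operation;          // the logical operation, e.g. an insert into a table
    LogRecord nextInLog;             // maintains the sequential order of the log
    LogRecord nextInTransaction;     // next record of the same transaction; followed
                                     // only when the transaction is aborted

    LogRecord(long lsn, long transactionId, String operation) {
        this.lsn = lsn;
        this.transactionId = transactionId;
        this.operation = operation;
    }
}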


[Figure 8.5: Organization of data records in a table. The table has two indexes; the primary key index (created automatically) and one user specified index on attribute 1.]


In addition to maintaining a sequential log of all executed operations, the Recovery Manager Module is responsible for performing recovery after a crash and for undoing an aborted transaction.

8.1.6 Data Manager Module

The Data Manager⁴ Module (DMM) is responsible for storage and retrieval of records, and for performing the actual updates of data records. The records are stored in Java hashtables (Sun Microsystems, 2007), which reside in main memory only. In NbDBMS, these hashtables are called indices.

When a table is created, an index is created on the primary key attribute. Indices can later be added to any other attribute. A table with a primary key index and an additional attribute index is illustrated in Figure 8.5. When a read operation is requested, the Data Manager chooses the index that is best suited to fetch the record. The choice is based on the best match between the selection criterion and available indices. When an update is requested, the record is fetched using the primary key index before being changed. If the operation modifies an indexed attribute, the record must be rehashed in that index.

⁴ The module is called Data Manager instead of Cache Manager because records are never written to disk.


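As an illustration, this hypothetical sketch keeps a primary key index and one attribute index as hashtables, and rehashes a record in the attribute index when the indexed attribute is updated; the classes are assumptions, not the actual DMM implementation:

import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

class TableSketch {
    static class Row { final String key; String att1;
        Row(String key, String att1) { this.key = key; this.att1 = att1; } }

    private final Hashtable<String, Row> keyIndex = new Hashtable<>();        // primary key index
    private final Hashtable<String, List<Row>> att1Index = new Hashtable<>(); // index on attribute 1

    void insert(Row r) {
        keyIndex.put(r.key, r);
        att1Index.computeIfAbsent(r.att1, k -> new ArrayList<>()).add(r);
    }

    // Updating an indexed attribute: fetch via the primary key index, then rehash.
    void updateAtt1(String key, String newAtt1) {
        Row r = keyIndex.get(key);
        if (r == null) return;
        att1Index.get(r.att1).remove(r);                                      // remove from the old hash bucket
        r.att1 = newAtt1;
        att1Index.computeIfAbsent(newAtt1, k -> new ArrayList<>()).add(r);    // rehash in the index
    }
}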

Although numerous DBMSs designed to achieve low response times are main-memory based (Garcia-Molina and Salem, 1992; Cha et al., 1995; Cha et al., 1997; Bratsberg et al., 1997b; Cha and Song, 2004; Solid Info. Tech., 2006a), DBMSs are more often disk based⁵. Compared to disk based DBMSs, keeping records in main memory only is probably the greatest simplification in the Non-blocking DBMS. By doing so, cache management, i.e. choosing which disk blocks should reside in main memory at any time, can be totally ignored. Furthermore, write operations that would change multiple disk blocks, e.g. by splitting a node in a B-tree, are now atomic. This enables us to use plain logical logging, as argued in the previous section.

8.1.7 Effects of the Simplifications

The previous sections have described the simplifications made in the prototype modules. In what follows, the implications of these are discussed.

As discussed in Section 8.1.2, NbDBMS only recognizes a very limited SQL language, which is defined in Appendix A. This would obviously be a huge drawback if the Non-blocking DBMS were to be used by real applications. However, in the current version, NbDBMS is only intended for empirical validation and performance testing. A predefined transactional workload will be used in these experiments, hence the system only needs to recognize a subset of the SQL language. This simplification is therefore considered to not affect the experiments.

The Scheduler described in Section 8.1.4 is designed to abort transactions that try to acquire conflicting locks. This is called Immediate Restart conflict resolution (Agrawal et al., 1987). Agrawal et al. compared this to the more common strategy of not aborting until a deadlock is detected. The comparison showed that the latter strategy, called blocking conflict resolution, enables higher throughput under most workload scenarios. Hence, we expect this to reduce the maximum throughput in NbDBMS.

In most circumstances, the non-blocking DT creation methods described in this thesis do not acquire additional locks. This means that the exact same number of locking conflicts should occur for transactions executed during normal processing as for those executed during DT creation. Since immediate restart affects transactions in both cases to the same extent, it is considered not to affect the relative performance between the normal and DT creation cases. Furthermore, the empirical validation experiments remain unaffected.

5 All the open source DBMSs evaluated in Chapter 7, IBM DB2 (IBM Information Center, 2006), Microsoft SQL Server 2005 (Microsoft TechNet, 2006) etc. are disk based DBMSs.



Since immediate restart affects transactions in both cases to the same extent, it is considered not to affect the relative performance between the normal and DT creation cases. Furthermore, the empirical validation experiments remain unaffected.

As thoroughly discussed in Chapters 4 - 6, there is one exception in which DT creation does require additional locks. This is during non-blocking synchronization of schema transformations. Here, locks are forwarded between the old and new table versions. In all DT creation operators where one source record contributes to multiple derived records or vice versa, additional locking conflicts are expected. In these cases, immediate restart is expected to cause a higher number of aborts than the blocking strategy would have. NbDBMS is therefore expected to perform worse in this particular case.

The Recovery Manager maintains a pure logical log. As argued by Gray and Reuter, this alone does not provide enough information for crash recovery since many disk operations are not atomic (Gray and Reuter, 1993). However, the logical log includes all the information needed to create DTs and to perform crash recovery in NbDBMS. Thus, the only consequence of this design is a reduced log volume compared to disk based DBMSs. This reduction in log volume is equal for transaction processing in the normal and DT creation cases. It is therefore considered to affect the performance of NbDBMS to a negligible extent.

Storing data in main memory only is likely the simplification with the greatest impact on the performance of NbDBMS. As discussed in Section 8.1.6, this greatly reduces the complexity of the Data Manager and enables the use of a pure logical log. The chosen strategy is not common, but is used in some DBMSs, including Solid BoostEngine (Solid Info. Tech., 2006a), ClustRa (Hvasshovd et al., 1995) and P*Time (Cha and Song, 2004).

As discussed by Highleyman (Highleyman, 1989), the performance of a DBMS is bound by a bottleneck resource. Example bottlenecks include the CPU, disk and network. The “main memory only” simplification implies that the performance results of NbDBMS should be compared to DBMSs that are bound by other resources than cache management. We expect that the empirical validation experiments are unaffected by this design. When it comes to performance testing, the normal processing and DT creation cases are both affected by the design. We therefore consider the relative performance to be affected only to a small extent.



8.2 Client and Administrator Programs

Both the Database Client and Administrator programs have console user interfaces, and connect to the Non-blocking DBMS through Java Remote Method Invocation (RMI) sockets. As described in Section 8.1.1, the sockets have been modified to not queue replies, thus reducing response time.

When a client program has connected to NbDBMS, it may perform operations on the database through a limited SQL language. The operations are issued either through the executeRead method, used by queries, or the executeUpdate method, used by inserts, deletes and updates. All operations are requested on behalf of a transaction. Transactions are started by calling the startTransaction method, and terminated by calling either the commit or abort method.
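As an illustration, a complete client interaction could look like the sketch below. The method names are those given above, while the exact signatures, the NbDbmsConnection type and the SQL strings are assumptions made for this example:

/** Assumed client-side interface; method names from the text, signatures illustrative. */
interface NbDbmsConnection {
    int startTransaction();
    String executeRead(int tx, String sql);
    void executeUpdate(int tx, String sql);
    void commit(int tx);
    void abort(int tx);
}

class ClientExample {
    /** Runs one transaction consisting of a query and an update. */
    static void runSampleTransaction(NbDbmsConnection db) {
        int tx = db.startTransaction();
        try {
            // Queries go through executeRead ...
            String rows = db.executeRead(tx, "select * from employee where ssn = 42");
            System.out.println(rows);
            // ... while inserts, deletes and updates go through executeUpdate.
            db.executeUpdate(tx, "update employee set zip = 7030 where ssn = 42");
            db.commit(tx);
        } catch (RuntimeException e) {
            db.abort(tx); // terminate the transaction on failure
            throw e;
        }
    }
}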

There are two types of clients: one interactive client that accepts SQL operations from a user, and one automated client that generates semi-randomized transactions to simulate workload. Figure 8.6 shows a screen shot of the interactive client. The automated client type is used in the experiments, and is discussed further in Chapter 9.

The admin program has access to other operations in NbDBMS, but is otherwise similar to the client programs. There are two types of admins. One of these is interactive, and is used for manual verification of the DT creation method. The other type is automated and is used in conjunction with the automated clients in the experiments.

8.3 Summary

In this chapter, we have described the design of a prototype DBMS, and discussed the effects of the simplifications we have made to it. The resulting prototype, called the Non-blocking DBMS, is capable of performing basic database operations, like queries and updates, in addition to our suggested DT creation method. Altogether, the prototype consists of approximately 13,000 lines of code.

The prototype has been subject to both empirical validation and performance experiments. In the next chapter, we describe the experiments, and discuss the results and implications of these.



Figure 8.6: Screen shot of the Client program in action.


Chapter 9

Prototype Testing

In this chapter, we focus on two types of experiments that can be used to determine the quality of the DT creation methods. The first type, empirical validation, is performed to determine whether the DT creation methods work. While it is clear that empirical validation cannot be used to prove absolute correctness of a method (Tichy, 1998), it can provide a clear indication. This experiment type is therefore commonly used in the development of new techniques in software (Tichy, 1998; Pfleeger, 1999; Zelkowitz and Wallace, 1998).

The second type of experiment is performance experiments. This experiment type is highly relevant since an important design goal of the DT creation framework has been to incur a low degree of performance degradation.

The Non-blocking DBMS has been subject to extensive empirical validation and performance experiments. In what follows, we first describe the environment in which the experiments have been performed. We then discuss the results and implications of the experiments.

9.1 Test Environment

All tests described in this chapter are performed on seven PCs, called nodes, each with two AMD 1400 MHz CPUs and 1 GB of main memory. All nodes run the Debian GNU/Linux operating system with a 2.6.8 smp1 kernel, and are connected by a 100 Mb Ethernet LAN. The Non-blocking DBMS described in Chapter 8 is installed on one of the nodes, whereas the other nodes are used for administrator (1 node) and client (5 nodes) programs. The reason for using five nodes with client programs is to generate as realistic a workload as possible with the available resources. In what follows, these nodes are called “server node”, “admin node” and “client nodes”, respectively.

1 A symmetric multiprocessing (smp) kernel is required to utilize both CPUs.



Type                       Value
Nodes                      7 (1 server, 1 admin, 5 clients)
CPU                        2 x AMD 1400 MHz per node
Memory                     1 GB per node
Operating System           Debian GNU/Linux, 2.6.8-686-smp kernel
Network                    100 Mb Ethernet
Java Virtual Machine       Java HotSpot Server VM, build 1.5.0_08-b03
Java Compiler              javac 1.5.0_08
Java VM Options, Server    -server -Xms800m -Xmx800m -Xincgc
Java VM Options, Admin     -server
Java VM Options, Client    -server

Table 9.1: Hardware and software environment for the experiments.

The prototype DBMS and all client and administrator programs have been implemented in Java 2 Standard Edition 5.0 (Sun Microsystems, 2006a).

The Server Node

The NbDBMS server has been run with the following options in all experiments:

java -server -Xms800m -Xmx800m -Xincgc

The -server option selects the Java HotSpot Server Virtual Machine (VM), which is optimized for overall performance. The alternative, the Client VM, has faster startup times and a smaller footprint (i.e. requires less memory), and is better suited for applications with graphical user interfaces (GUI) etc. (Sun Microsystems, 2006b). Both VMs have been tried for the NbDBMS server, and the Server VM has outperformed the Client VM with ∼15-20% higher maximum throughput.

The -Xms800m and -Xmx800m options are used to set the starting and maximum heap sizes to 800 MB. The heap is the memory available to the Java VM, and 800 MB has proven to be slightly below the limit where the Java VM sporadically fails to start due to memory conflicts with other processes. By setting the starting heap size equal to the maximum heap size, the overhead of growing the heap is avoided (Shirazi, 2003).

-Xincgc is used to select the incremental garbage collector. This algorithm frequently collects small clusters of garbage, as opposed to the default method of collecting larger amounts of garbage less frequently (Shirazi, 2003).



Note that the impact of using this garbage collection algorithm increases with the heap size. The reason for this is that the default garbage collection algorithm must collect more garbage in each iteration (Shirazi, 2003). In NbDBMS, this option results in significantly lower response time variance.

The Client and Administrator Nodes

Like the Non-blocking DBMS server, the administrator and client programs have also been run with the Server VM. The heap and garbage collection options used on the NbDBMS server made no observable difference for these programs, and are therefore not used in the experiments.

Each client node runs one organizer thread that spawns transaction threads from a thread pool. When spawned, a transaction thread executes one transaction, consisting of six basic operations, before returning to the thread pool. The organizer uses the Poisson distribution for transaction thread spawning, meaning that the number of requests per second varies, but has a defined mean. As argued by Highleyman, the Poisson distribution should be used when we want to simulate requests from an “infinite” number of clients (Highleyman, 1989). By infinite, we mean many more clients than are currently being processed by the database system.
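The following sketch shows one way such an organizer could be implemented; drawing exponentially distributed inter-arrival times yields Poisson-distributed arrival counts. The class name, the use of an ExecutorService as the thread pool, and the meanTps parameter are our assumptions for illustration, not the actual client code:

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative organizer sketch; names and parameters are assumed. */
class Organizer {
    private final ExecutorService transactionThreads = Executors.newCachedThreadPool();
    private final Random random = new Random();
    private final double meanTps; // the defined mean: transactions per second

    Organizer(double meanTps) { this.meanTps = meanTps; }

    /** Spawns 'count' transaction threads with Poisson-distributed arrivals. */
    void run(Runnable transaction, int count) throws InterruptedException {
        for (int i = 0; i < count; i++) {
            transactionThreads.execute(transaction); // one thread, one transaction
            // Exponentially distributed inter-arrival time with mean 1/meanTps
            // seconds gives a Poisson-distributed number of requests per second.
            double waitSeconds = -Math.log(1.0 - random.nextDouble()) / meanTps;
            Thread.sleep((long) (waitSeconds * 1000.0));
        }
        transactionThreads.shutdown();
    }
}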

The transactions requested by client threads are randomized within boundaries defined for each DT creation operator. In all experiments, six transaction types are specified. These are called Transaction Mixes, and a spawned transaction thread executes one of the transactions specified in the appropriate mix. The transaction mixes are designed to reflect a varied workload so that all log propagation rules are involved in the DT creation process. Hence, the transaction mixes include inserts, updates, deletes and queries on all involved tables. The transaction mixes are shown in Tables 9.2 to 9.4. The reason for having three transaction mixes is that the DT creation methods have different requirements; some have one source table while others have two, and so on.

The transactions are similar to those used in TPC-B benchmarks (Serlin, 1993), although the DT creation table setups are not equal to what is specified in TPC-B. More thorough benchmarks, like TPC-C, TPC-D (Ballinger, 1993) and AS3AP (Turbyfill et al., 1993) exist, but these to a much greater extent test DBMS functionality that is of no interest to DT creation. A good example is the query in TPC-D that joins seven tables with greatly varying sizes, meant to test the query optimizer, or AS3AP, which tests mixes of batch and interactive queries (Gray, 1993).

In addition to the source tables, all experiments are performed with one additional table in the schema.



Transaction Mix 1

Trans  Non-source            Source 1              Source 2            Scenario 1 (%)  Scenario 2 (%)
1      -                     6 updates             -                   20              5
2      6 updates             -                     -                   20              20
3      4 reads               1 read                1 read              40              60
4      -                     -                     6 updates           5               2.5
5      3 inserts, 3 deletes  -                     -                   10              10
6      -                     2 inserts, 2 deletes  1 insert, 1 delete  5               2.5

Table 9.2: Transaction Mix 1, used in Difference and Intersection, Vertical Merge and Horizontal Merge DT creation. Transactions 1, 4 and 6 require log propagation processing. This corresponds to 30% of the operations in Scenario 1, and 10% in Scenario 2, which is more read intensive.

Transaction Mix 2

Trans  Non-source            Source, “Left” part   Source, “Right” part  Scenario 1 (%)  Scenario 2 (%)
1      -                     6 updates             -                     20              5
2      6 updates             -                     -                     20              20
3      4 reads               2 reads               -                     40              60
4      -                     -                     6 updates             5               2.5
5      3 inserts, 3 deletes  -                     -                     10              10
6      -                     3 inserts, 3 deletes  -                     5               2.5

Table 9.3: Transaction Mix 2, used in Vertical Split DT creation. There is only one source table, but the attributes of this table are derived into either the “left” or “right” derived table. Scenario 1 requires log propagation of 30% of the operations, whereas Scenario 2 requires 10%.



Transaction Mix 3

Trans  Non-source            Source, Other Attribute  Source, Selection Attribute  Scenario 1 (%)  Scenario 2 (%)
1      -                     6 updates                -                            20              5
2      6 updates             -                        -                            20              20
3      4 reads               2 reads                  -                            40              60
4      -                     -                        6 updates                    5               2.5
5      3 inserts, 3 deletes  -                        -                            10              10
6      -                     3 inserts, 3 deletes     -                            5               2.5

Table 9.4: Transaction Mix 3, used in Horizontal Split DT creation. There is only one source table, but a derived record may have to move between the derived tables if the attribute used in the selection criterion is updated.

Operation                  # and size of records in each table
Difference, Intersection   Nonsource: 20,000 records, 100 bytes
                           Source 1: 20,000 records, 80 bytes
                           Source 2: 5,000 records, 80 bytes
Horizontal Merge           Nonsource: 20,000 records, ∼100 bytes
                           Source 1: 20,000 records, ∼80 bytes
                           Source 2: 20,000 records, ∼80 bytes
Horizontal Split           Nonsource: 20,000 records, ∼100 bytes
                           Source 1: 40,000 records, ∼80 bytes
                           Source 2: N/A
Vertical Merge             Nonsource: 20,000 records, ∼100 bytes
                           Source 1: 20,000 records, ∼100 bytes
                           Source 2: 1,300 to 20,000 records, ∼50 bytes
Vertical Split             Nonsource: 20,000 records, ∼100 bytes
                           Source 1: 20,000 records, ∼150 bytes
                           Source 2: N/A

Table 9.5: Table sizes used in the performance test experiments. Note that the empirical validation experiments are performed with 5 times more records in all source tables.



This table is called “nonsource”, and is not involved in the DT creation. The idea of having this table is to be able to generate varying workloads without necessarily changing the log propagation work that needs to be done. We also consider it realistic to have a database schema with more tables than those involved in the DT creation process.

Depending on the operator being used for DT creation, either one or two source tables are defined in the original database schema. Before an experiment is started, all tables are filled with records. The number of records in each table is shown in Table 9.5.

Since multiple nodes with multiple threads request operations concurrently, it will often be the case that a request arrives at the server while another request is being processed. The number of concurrent server threads may influence the performance. However, we rely on Java RMI to decide on the optimal number of concurrent threads on the server node. Also, as previously described, the server sockets used to connect the clients and the server are modified to avoid buffering of replies to different transaction threads executed on the same node.

9.2 Empirical Validation of the Non-Blocking DT Creation Methods

Empirical validation experiments have been performed for all DT creation methods in the Non-blocking DBMS. The experiments were executed using the following steps:

1. Populate the source tables with initial data as defined in Table 9.5. Note that five times more records are used in these tests than are specified in the table.

2. Start a workload of semi-random insert, update and delete operations. The workload is described by the transaction mixes defined in Tables 9.2 to 9.4, but the read transaction types are ignored. Execute 200,000 transactions before stopping the workload.

3. Once the random workload has started, start the DT creation process. Let log propagation run until all transactions have completed executing.

4. When all transactions have completed, compare the content of the source tables to that of the derived tables. No attribute values should differ between these schema versions (a sketch of this check is given below).
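Step 4 amounts to recomputing the operator over the final source table contents and comparing the result record by record with the DT. As a minimal sketch, assuming a record type with value-based equality, the check for an intersection DT could look as follows (the class and method names are ours, for illustration only):

import java.util.HashSet;
import java.util.Set;

/** Sketch of the final consistency check (step 4) for an intersection DT.
 *  Assumes the record type implements value-based equals()/hashCode(). */
class ValidationCheck {
    static <T> boolean intersectionDtIsConsistent(Set<T> source1, Set<T> source2,
                                                  Set<T> derived) {
        // Recompute the intersection operator over the final source state ...
        Set<T> expected = new HashSet<T>(source1);
        expected.retainAll(source2);
        // ... and require that the DT contains exactly these records.
        return expected.equals(derived);
    }
}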



For continuity, the tests have been performed with similar source and derived tables as those used in the examples in Chapter 6. Thus, the figures used there may be used for reference.

Difference and Intersection

In the difference and intersection (diff/int) experiment, records from the “Vinyl records” source table were stored in the difference or intersection DTs based on the existence of equal records in the “CD records” source table. All tables had three attributes: artist firstname, artist surname and record title. To achieve a 20% overlap of records between the source tables, some operations wrote completely random values while others wrote predefined default values. After the transactions had completed, the records in the DTs were ordered by artist and record name. The difference and intersection operators were then applied on the source tables, and the sorted results were stored in arrays. The records in these arrays were in turn compared to the derived tables. The contents of “CD records” was also compared to that of the auxiliary table. All records, including LSNs, were equal in the source and derived tables.

Horizontal Merge

To reduce the implementation work, the horizontal merge experiments have only been performed for the duplicate removal case. Duplicate removal was chosen over duplicate inclusion because the former is more complex, as argued in Section 6.3.

The source tables in the horizontal merge experiment were equal to those used in diff/int. Approximately 20% of the records in “CD Records” were duplicates of “Vinyl Records”. There were no duplicates within one table. When all transactions had completed, the records in both source tables were sorted on name and record title and inserted into an array. The unique records in this array were then compared to the DT, and the record IDs were compared to the auxiliary table. No inconsistencies were found.

Horizontal Split

In the horizontal split experiment, a “Record” source table was split into “Vinyl records” and “CD records”. The attributes in the source table were firstname, surname, record title and type. Type was used to determine which DT a record should be derived to. Possible values of this attribute were “vinyl” (49%), “cd” (49%) or “none” (2%). The latter value was used to indicate that the record did not belong to either DT, and therefore had to be stored in the auxiliary table.



The comparison of the source and derived table contents was performed by copying the source records into one of three arrays, depending on the value of the type attribute. The records in each of these arrays were then compared to the DTs. The comparison showed that all records were equal in the source and derived tables.

Vertical Merge

The vertical merge experiment was conducted by joining the “employee” and “postaladdress” source tables. The resulting DT was called “modified employee”. When all transactions had completed, the comparison of records in the source and derived tables was performed by using the full outer join operator on the source table records. The join result and the records in the DT were then sorted on the primary key attribute, social security number (SSN), and stored in arrays. Since the SSN was unique, comparison was straightforward. No inconsistencies were found.

Vertical Split

Vertical split was performed over a functional dependency, in which the source table “employee” was split into “modified employee” and “postaladdress”. As argued in Section 6.7, this type of vertical split is more complex than its candidate key counterpart since source records may be inconsistent in the former case. The records in the source table were designed to split into four times as many employees as postal addresses. 99% of all write operations to the attributes derived to the “postal address” DT were default values. The remaining 1% were set to non-default values. This resulted in approximately 4% inconsistent records in the final state of the “postal address” DT. The consistency check program was executed in parallel with log propagation. In addition, the consistency check was performed on all records that were flagged as Unknown2 when the transactions had completed. One final log propagation was then executed to achieve correct flag states. The comparison of records was first performed between the source table and the “modified employee” table. The records from both tables were sorted on the primary key, SSN, before the relevant subset of attributes was compared. No inconsistencies were found in this comparison.

The “postal address” table was then checked. This was done by inserting the records in the source table into an array. Only the subset of attributes stored in the “postal address” DT were stored, and the array was sorted on zip code.

2 Recall from Section 6.7.5 that derived records are flagged as either (C)onsistent or (U)nknown.



Equal zip codes were discarded after checking that the records were equal; if different attribute values were found, the record in the array was flagged as inconsistent. The content of the array was then compared to the content of the “postal address” DT. Some inconsistencies were found, but only on records marked with an Unknown flag in the DT. A cross check revealed that all of these were marked as inconsistent in the array. Also, no derived records with an Unknown flag were marked as consistent in the array.

Empirical Validation Summary

With the exception of records flagged as Unknown in the vertical split experiment, no inconsistencies have been found between source records and derived records. This indicates that the DT creation methods work correctly. We base this on what we consider to be extensive testing, in which all basic write operations (insert, delete and update), and both normal and abort transaction execution3, have been involved. 200,000 transactions with 6 operations each have been executed in each experiment. Thus, a total of 1,200,000 modifying operations have been made to 300,000 records.

No matter how extensive the experiments are, however, empirical validation can never be used as a proof of correctness (Tichy, 1998). Thus, the experiments should ideally be repeated in another implementation to confirm the results (Walker et al., 2003). Due to time considerations, the experiments have only been performed on one implementation in this thesis.

9.3 Performance Testing

The following sections discuss the performance test results from the non-blocking DT creation experiments. The same operators as in the empirical validation experiments are considered. Hence, for horizontal merge, only the duplicate removal case is discussed. Similarly, for vertical split, only the functional dependency case is discussed.

There are two common measurements for database system performance. These are response time, i.e. the time spent from when a client requests an operation until it receives the response, and throughput, i.e. the number of transactions processed per unit of time (Highleyman, 1989). Results for both measurements are discussed.

The test results presented in this chapter will not be a benchmark comparison between the Non-blocking DBMS and a fully functional DBMS.

3 1-3% of the transactions have been aborted due to locking conflicts.



The reason is that the prototype lacks functionality vital to achieving good benchmark results, e.g. the aforementioned query optimizer. Furthermore, as is clear from Section 9.1, the hardware used in the experiments is far from capable of running high performance DBMS benchmarks. What the tests will be used for, however, is to show the relative performance of user transactions when executed alone compared to when executed concurrently with the various DT creation methods. In the following sections, “user transactions” will denote transactions sent from a client application. These are not involved in the DT creation.

The performance of all steps of DT creation is tested, but most emphasis will be put on performance during log propagation. The reason for this is that the other three steps have much shorter execution times and therefore impact performance to a lesser extent.

Thread Priorities

The experiments discussed in the performance test sections have been designed to degrade the performance of concurrent transactions as little as possible. We achieve this by reducing the priority of the DT creation thread to the point where the log propagator is only capable of applying as many log records as are produced. Hence, with this priority, the number of log records to redo remains unchanged. Small increases in the priority of this thread should therefore result in a long execution time with minimal performance degradation. Similarly, a large priority increase should result in shorter execution time at the cost of more performance degradation.

The priority of threads in Java 2 Standard Edition 5.0 can be set so that a high priority thread is scheduled before a low priority thread. However, despite setting the priority to an absolute minimum, DT creation tends to complete very fast, with inevitably high performance penalties to concurrent transactions. The reason for this is that the Java VM uses Linux system threads to implement Java threads (Austin, 2000), and that these have time slices of 100 ms in the Linux 2.6 kernel (Aas, 2005). Hence, every time the DT creation thread is scheduled, it is allowed to run uninterrupted for 100 ms.

By only modifying thread priorities, we are not able to achieve acceptable transaction performance. The problem is that each requested operation is processed in approximately 1 ms. This means that if there are 50 threads used for transaction processing and one thread for DT creation, and all threads are scheduled once, the DT creation thread gets twice the CPU time of all the other threads together. Thus, to reduce the priority further, Thread.yield() and Thread.sleep() calls are used on the DT creation thread. This forces the thread to stop processing, thus reducing the time slice.



By using this technique, we are able to fine-tune the priority, and thus find the lowest possible performance degradation for each DT creation method.
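In outline, the resulting DT creation thread behaves as in the sketch below. The batch size and sleep interval correspond to the tuning knobs described above; the log propagator hooks (hasMoreLogRecords, applyNextLogRecord) are assumed names used for illustration, not the actual prototype code:

/** Sketch of a low-priority DT creation (log propagation) thread that gives
 *  up its 100 ms Linux time slice after each small batch of work. */
class LogPropagationThread extends Thread {
    private final int batchSize; // log records applied per slice (tuning knob)
    private final long sleepMs;  // pause between batches (tuning knob)

    LogPropagationThread(int batchSize, long sleepMs) {
        this.batchSize = batchSize;
        this.sleepMs = sleepMs;
        setPriority(Thread.MIN_PRIORITY); // not sufficient by itself, see text
    }

    public void run() {
        try {
            while (hasMoreLogRecords()) {
                for (int i = 0; i < batchSize && hasMoreLogRecords(); i++) {
                    applyNextLogRecord();
                }
                Thread.yield();        // let waiting transaction threads run
                Thread.sleep(sleepMs); // shorten the effective time slice further
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Assumed hooks into the log propagator:
    private boolean hasMoreLogRecords() { return false; }
    private void applyNextLogRecord() { }
}

Raising batchSize or lowering sleepMs plays the role of a priority increase: shorter DT creation time at the cost of higher performance degradation.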

Determining the Maximum Capacity of NbDBMS

Most performance results in this chapter are presented on a 50% to 100% workload scale. This implies that the point for 100% workload, i.e., the maximum transaction capacity of NbDBMS, has to be determined. The maximum capacity differs slightly between the DT creation operators since the transaction mixes are not exactly equal, but the method described here is used to find all of them.

As advised by Highleyman (Highleyman, 1989), the first step in determining the maximum capacity of a database system is to define the maximum response time that is considered acceptable. Second, the fraction of transactions that is required to complete within the maximum response time must be defined. The capacity can then be determined by executing test runs and comparing the results with the requirements.

We define 10 ms as the maximum acceptable response time of an operation. Considering the fact that all records are in main memory, requests are only sent over a LAN, and the requested operations do not include complex queries, 10 ms should suffice. A transaction that observes a higher response time than 10 ms for any of its six operations is considered to have failed. It is also decided that 95% of all transactions must complete within an acceptable response time. This is often used as a requirement in telecom systems, e.g. as in ClustRa (Hvasshovd et al., 1995).

Considering only transaction failure due to unacceptable response time, 5% transaction failure corresponds to too high response times in 0.85% of all operations, since all transactions consist of 6 operations:

0.95 = (1 − x)^6

x = 1 − (0.95)^(1/6) ≈ 0.0085     (9.1)

Figure 9.1(a) shows the mean operation response times with a workload ranging from 100 to almost 500 transactions per second. It is clear from the graph that the mean response time is much lower than 10 ms in all cases. Figure 9.1(b) shows the upper quartiles for a 99% confidence interval using the results from the same test runs as in the left graph. This graph shows that an increasing number of transactions are not answered in time, especially as the throughput increases above 400. The rapid response time increase in both graphs is expected, since the delay over a bottleneck resource is given by (Highleyman, 1989):



[Figure: three plots; axes are transactions requested vs. response time (ms) or throughput.]
(a) Response time increases exponentially as the number of transactions per second (tps) increases. The base response time before the rapid increase is at 0.78 ms.
(b) Response time 99% upper quartile, i.e., 0.5% of all response times are equal to or higher than the plot.
(c) Theoretical and actual throughput when transactions are considered failed if the response time is higher than 10 ms.

Figure 9.1: Response time and throughput for difference and intersection, using Transaction Mix 1 scenario 1 (see Table 9.2).



Delay = T / (1 − L)     (9.2)

Here, T is the average service time over the bottleneck resource, and L is the load. Hence, it is clear that the response time increases towards infinity as the workload approaches 100%. Because of this rapid response time increase, a higher maximum acceptable response time would not increase the maximum throughput much.
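As a concrete illustration, assume the ∼0.78 ms base response time measured in Figure 9.1(a) approximates T (a simplification, since the base response time also includes network time):

Delay = 0.78 ms / (1 − 0.5) ≈ 1.6 ms at L = 0.5
Delay = 0.78 ms / (1 − 0.9) = 7.8 ms at L = 0.9

The latter is already close to the 10 ms limit, which illustrates why the number of failed transactions grows so quickly as the workload approaches the capacity of the bottleneck resource.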

Figure 9.1(c) illustrates the throughput for transactions that are processed within the 10 ms per operation response time requirement. The number of transactions processed within the time limit increases almost linearly before the response time becomes so high that more and more transactions fail. The highest throughput was observed at 440 transactions per second. If the workload exceeds 440, the actual throughput starts decreasing. This is, however, above the maximum capacity of NbDBMS with the current transaction set.

A Note on Locking Conflicts in the Performance Experiments

In the empirical validation experiments described in Section 9.2, the source tables were populated with 100,000 or 200,000 records, as shown in Table 9.5. The experiments were performed with approximately 60-70% of the workload used in the performance tests, which resulted in 1-3% of the transactions being aborted due to locking conflicts.

To get statistically significant results in the performance experiments, a huge number of test runs is required. A total of 2,000 test runs have been performed to get the data for the graphs in this chapter. In addition, many test runs have been performed to find the maximum capacity of NbDBMS, ideal thread priorities for the DT creation thread, and so forth. However, running this many tests with the same number of records as in the empirical validation tests would take too much time. Hence, the number of records in the source tables has been reduced to 20,000 and 40,000, as shown in Table 9.5. This reduced the execution time of each iteration by more than half.

While this reduction in records makes it possible to run the required number of experiments, it causes another problem. With much fewer records, a very high number of transactions are forced to abort due to locking conflicts even at moderate workloads. With this setup, the maximum capacity achievable without severely thrashing the throughput does not nearly utilize the CPU capacity of the server node4. Thus, this setup can be used for little more than testing the locking subsystem of NbDBMS.

4 The throughput thrashed completely at approximately 65% of the maximum capacity used in the experiments described in this section.



Since the DT creation methods do not acquire additional locks in all but a few cases, tests with such low workloads are not considered very useful.

To be able to perform a high number of test runs and at the same time test the impact of DT creation under high workloads, the transactions used in the performance experiments have been designed to operate on different records. We call this clustered operation. This means that no locking conflicts will occur. When it comes to conflicts, the effect of this design is the same as having many times more records. There should be no difference (except in the execution time) to the performance results as long as all records fit in main memory in all cases. To verify this, five diff/int test runs with 200,000 small-sized records in each table and no clustering of the operated-on records have been compared to the 20,000 record case with clustered operations. Apart from garbage collection being more noticeable in the former case, the setups performed with similar response times; the 200,000 setup was less than 10% slower than the 20,000 setup. It is worth noticing that the total size of all records in both setups5 was much smaller than the maximum Java VM heap size (800 MB).

9.3.1 Log Propagation - Difference and Intersection

We start the performance evaluation with a thorough discussion of the difference and intersection (diff/int) experiments. As will be clear, the tendencies found here also apply to the experiments for the other DT creation methods. We will not repeat the arguments and discussion for these experiments. Refer to Appendix B for plots of these experiments.

The diff/int tests use Transaction Mix 1, shown in Table 9.2, as workload. Experiments are conducted for two scenarios: in the first scenario, the source tables are frequently written to, i.e. 30% of all operations are either updates, inserts or deletes in the source tables. The second scenario is much more read intensive, and the number of write operations on the source tables is reduced to 10%. Because write operations to the source tables are the only operations that must be propagated to the DTs, scenario 1 should produce three times more log records to propagate. Thus, scenario 1 is expected to incur much higher performance degradation to normal transactions.

Response Time Distribution

Consider Figures 9.2(a) and 9.2(b), showing the distribution of operation response times under 50% workload. The left figure shows the response times without DT creation, and the right shows response times during log propagation.

5 ∼50 MB in the 200,000 case.



[Figure: four histograms; axes are response time (ms) vs. density.]
(a) Distribution of response time for 50% workload before DT creation is started.
(b) Distribution of response time for 50% workload during log propagation for difference DT creation.
(c) Distribution of response time for 80% workload before DT creation is started.
(d) Distribution of response time for 80% workload during log propagation for difference DT creation.

Figure 9.2: Response time distribution for 50% and 80% workload for difference and intersection transactions using scenario 1 from Table 9.2.



                 Median                          Mean
Workload   Unloaded   Loaded     %        Unloaded   Loaded     %
50 %       0.709 ms   0.714 ms   0.7%     0.772 ms   0.778 ms   0.8%
80 %       0.750 ms   0.770 ms   2.7%     0.868 ms   1.102 ms   27%

Table 9.6: Summary of the response time distribution in the histograms of Figure 9.2.

The histograms show that the distributions are very similar, but that the latter has slightly more outliers to the right. Thus, most operations are processed equally quickly in the unloaded case as in the log propagation case, whereas a few operations observe much higher response times in the latter case. Note that for readability, the horizontal axis of the histograms stops at 4 ms, but the tendency of more outliers in the loaded cases is equally clear beyond this limit.

The fact that most response times are equal in the unloaded and loaded cases is further confirmed by comparing their median values: the unloaded histogram in Figure 9.2(a) has a median of 0.709 ms, while the median for the loaded case, shown in Figure 9.2(b), is only 0.7% higher at 0.714 ms. This effect is also seen in the 80% workload histograms in Figures 9.2(c) and 9.2(d), in which the medians are 0.750 ms versus 0.770 ms (2.7% higher).

It is interesting to compare these median values to the respective average response times. In the 50% workload case, the mean for the unloaded histogram is 0.772 ms whereas the mean of the log propagation histogram is 0.778 ms. In the 80% workload case, the means have increased to 0.868 ms and 1.102 ms. This corresponds to 0.8% and 27% higher means in the respective workloads. Hence, the means are highly affected by the response time outliers whereas the medians are affected to a much lesser extent. This is because there are relatively few outliers, but these have very high response times, thus affecting the means.

The increased number of high response time observations in the log propagation cases compared to the unloaded ones is caused by the DT creation thread. Since threads are given access to the CPU in time slices, and since the DT creation thread has a low priority, most transactional requests arrive at the server when log propagation is inactive. These requests observe the same response times as in the unloaded case. However, when the DT creation thread is active, all transaction requests must be scheduled on only one CPU. These requests form much longer resource queues, and thus observe higher response times.

Comparing the histograms with 50% workload to those with 80% workload reveals that the response times increase with the workload.



Furthermore, the effect of performing the DT creation increases significantly, as indicated by the much flatter histogram in the lower right figure. For the unloaded case, the higher response time is caused by a higher request rate, which in turn increases average queue lengths at the server (Highleyman, 1989). A higher workload also produces more log records per second. Hence, in the loaded case, the priority of the DT creation thread must be increased to be able to propagate these additional log records within the same time interval. Increasing the priority means providing the thread with more CPU time, which in turn increases the probability that the DT creation thread is active when a transactional request arrives at the server.

Most of the long response times observed in the unloaded cases are caused by garbage collection. To determine the impact of garbage collection, three algorithms have been tested. The default algorithm resulted in a very high standard deviation of the response times compared to the incremental algorithm used in all experiments in this chapter. Both of these algorithms were described in Section 9.1. Experiments with no garbage collection have also been conducted. This option resulted in fewer, but not complete removal of, high response time observations. The latter garbage collection alternative is not used in the experiments because the memory quickly becomes full. We assume that the few remaining high response time observations are caused by processes running on the server node that are not under our control.

Response Time

An important aspect of the performance evaluation is how the response time and throughput are affected by varying workloads. Consider Figure 9.3, which shows response time means and 90% quartiles for the unloaded and log propagation cases. The first figure, 9.3(a), shows the response time in the unloaded case of Transaction Mix 1, Scenario 1. The plot shows that the lower quartiles are stable at around 0.54 to 0.55 ms whereas the upper quartiles increase rapidly with the workload. This is the same effect that was seen in the histograms in Figure 9.2; as the workload increases, the mean queue lengths at the NbDBMS server increase. This is not surprising, since Equation 9.2 determines that the response time should increase towards infinity as the load over the bottleneck resource approaches 100%.

Figure 9.3(b) shows the same experiment as in the left plot, but with the response times from both the unloaded and log propagation cases. It is evident from the plot that the response time penalty of performing log propagation increases rapidly as the workload exceeds 75-80%. E.g., the mean response times during log propagation are 20%, 84% and 200% higher in the 80%, 90% and 100% workload cases, respectively.



[Figure: four plots; axes are % workload vs. response time (ms).]
(a) Scenario 1 - Response time mean and 90% quartiles for the unloaded case, i.e. before DT creation is started.
(b) Scenario 1 - Response time mean and 90% quartiles for the unloaded and log propagation cases.
(c) Scenario 2 - Response time mean and 90% quartiles for the unloaded and log propagation cases.
(d) Scenarios 1 and 2 - Response time mean for the unloaded and log propagation cases.

Figure 9.3: Response times for varying workloads before and during log propagation of difference and intersection DT creation using Transaction Mix 1.



The lower plot, shown in Figure 9.3(c), shows the response time means when scenario 2 of Transaction Mix 1 is used. It is clear that this scenario impacts the response time to a much lesser extent than scenario 1. The reason for this is simply that the priority of the DT creation thread can be kept lower, since fewer log records need to be propagated.

The impact on response time of performing log propagation applies to the upper quartiles in particular. This was also clear from the histograms in Figure 9.2, and indicates that an increasing number of operations have to wait in a queue for long periods of time. Recall from Section 9.3 that the capacity of the Non-blocking DBMS is defined as the workload, measured in transactions per second, at which less than 5% of all transactions observe higher operation response times than 10 ms. Further, the response times increase very rapidly as the workload increases up to and beyond this capacity. It is obvious that log propagation adds to the workload of the Non-blocking DBMS. Since the transaction arrival rate is not reduced when log propagation starts, it is not surprising that the response time averages during log propagation increase quickly. As opposed to the upper quartile, the lower quartile is relatively stable at approximately 0.55 to 0.56 ms for all workloads.

Throughput

In addition to response time, throughput represents an important performance metric for database systems. Recall from Section 9.3 that all transactions with higher operation response times than 10 ms are considered failed. Consider Figure 9.4(a), showing the throughput of the unloaded and log propagation cases for scenario 1 of Transaction Mix 1.

It is clear from the plot that very few transactions fail at low workloads. As the workload in the log propagation case increases beyond 70%, however, more and more operations are not processed in time. This is consistent with what the 99% upper quartile plot in Figure 9.1(b) and the response time plots in Figure 9.3 indicated: the upper quartile of the response time increases rapidly with the workload. Hence, at low workloads, even most of the “long” response times are lower than the acceptable 10 ms. At approximately 70%, the longest response times start to go beyond this. At even higher workloads, the ever-increasing number of too long response times effectively thrashes the throughput. Furthermore, the number of failed transactions increases very rapidly when 70% workload is reached. Again, considering the rapidly increasing response times in Figure 9.3, this rapid thrashing comes as no surprise.



[Figure: two plots; axes are % workload vs. throughput, with theoretical, unloaded and log propagation curves.]
(a) Scenario 1 - Throughput for difference and intersection using Transaction Mix 1.
(b) Scenario 2 - Throughput for difference and intersection using Transaction Mix 1.

Figure 9.4: Throughput for varying workloads during log propagation of Difference and Intersection DT creation.

Considering the unloaded case of the same plot, it is clear that the throughput is only slightly reduced at approximately 80% and higher workloads. The reason why the reduction is kept relatively low is the way we have defined 100% workload. Recall from Section 9.3 that 100% workload is defined as the point where 95% of all transactions observe acceptable response times. Hence, by definition, all throughput plots should show a 5% reduction relative to the unloaded throughput at 100% workload.

Figure 9.4(b) shows the throughput of scenario 2 of the same transaction mix. As can be seen in Table 9.2, this scenario has more read operations than scenario 1. As previously discussed, this means that the DT creation thread needs less CPU time to propagate the log records generated per second. As shown in Figure 9.3(c), this results in less response time degradation for concurrent transactions. It should be clear from the above discussion why a lower response time degradation also results in a lower throughput degradation.



[Figure: two plots; axes are % workload vs. response time (ms).]
(a) Response time mean and 90% quartiles for vertical merge DT creation. Transaction Mix 1, Scenario 1.
(b) Response times for three variations of table sizes in source table 2 (the 1x, 5x and 15x cases). Transaction Mix 1, Scenario 1.

Figure 9.5: Response times and throughput for varying workloads during vertical merge DT creation.

9.3.2 Log Propagation - Vertical Merge

The vertical merge experiments are performed using scenarios 1 and 2 of Transaction Mix 1, shown in Table 9.2. This is the same mix as was used in the diff/int experiments.

Consider Figure 9.5(a), showing the mean response time for scenario 1. The graph shows the same rapid response time tendency as the diff/int experiments did. The response time distributions also have shapes similar to those shown in the histograms in Figure 9.2. Hence, most requests are answered quickly, and the number of requests with very long response times increases with the workload. Although histograms are not shown here, this distribution is indicated by the mean response times in Figure 9.5(a) being much closer to the lower 90% quartile than to the upper.

The left plot in Figure 9.5 shows the response time results for scenario 1 with 20,000 records in both source tables. We consider it likely that vertical merge will in many cases be performed on tables with an uneven number of records, however. For example, if the “employee” and “postal address” source tables are merged6, it is likely that at least some employees share zip codes.

6 This has been used as an example of vertical merge throughout the thesis; refer to Section 6.5 for illustrations.



[Figure: two plots comparing diff/int and vertical merge in the unloaded and log propagation cases; axes are % workload vs. response time (ms) and throughput.]
(a) Comparison of response times.
(b) Comparison of throughput.

Figure 9.6: Comparison of response times and throughput for scenario 1 of difference and intersection and vertical merge.

Hence, Figure 9.5(b) illustrates how variations in the number of source table records affect response time degradation. The red line, called 1x, is the plot for 20,000 records in both source tables. This is the same plot as shown in the left figure. The blue 5x line shows test results with 4,000 records in source table 2, while the green line is for 1,300 records.

It is worth noticing how a reduction in the number of source records incurs higher degradation. The reason for this is that the records in source table 1 of this experiment always have a join match in source table 2. Hence, as the number of records in source table 2 decreases, the number of join matches for each of these increases. This means that a modification to a record in source table 2 must be propagated to an average of 1, 5 and 15 records in the three cases, respectively. This increases the work that must be done by log propagation, which in turn results in an increased priority for the DT creation thread. As previously discussed, a higher priority on the DT creation thread incurs higher performance degradation for concurrent transactions.

As is clear from the above discussion, the response time of vertical merge has a similar behavior to that of diff/int. This does not mean that the performance results are equal, however. In Figure 9.6, the results from the diff/int experiments are shown together with those from vertical merge with 20,000 records in both source tables.




The plots clearly show that the former experiment degrades performance to a much greater extent than vertical merge. To understand why, we have to investigate the amount of work performed by the two log propagators. In the vertical merge case, each source record modification is applied to one derived record on average. Even though more than one record may be affected by modifications in source table 2, affected records are always found by exactly one lookup in one DT. This is not the case for the diff/int method, in which all source record modifications involve lookups of records in two or even three derived tables. Furthermore, source record modifications may require a derived record to move from the intersection DT to the difference DT or vice versa. Since the diff/int log propagator has to perform more work for each log record, and since the same number of log records are generated in the two experiments, the priority of the diff/int DT creation thread must be higher than that of vertical merge. Hence the higher diff/int degradation to both response time and throughput.

9.3.3 Low Performance Degradation or Short Execution Time?

All performance experiments described for log propagation have been performed with the lowest achievable performance degradation in mind. As described in Section 9.3, we define this as the degradation incurred when the log propagator is only capable of applying as many log records as are produced. However, by using this DT creation thread priority, the states of the DTs do not get closer to those of the source tables. Thus, log propagation will never finish.

If the priority of the DT creation thread is increased from this point, it gets more CPU time. Hence, it is capable of reducing the number of log records that separate the states of the DTs and the source tables. At the same time, however, the performance degradation is increased.

Figure 9.7 illustrates the effect of changing the priority in diff/int DT creation, running 30 iterations with Transaction Mix 1 scenario 1 at 50% workload with 50,000 records in each source table. Starting at the leftmost side of the plot, where DT creation is run at maximum priority, it is clear that a slight decrease in priority results in much less degradation to response time at the cost of little additional execution time. As the priority gets lower, however, less and less reduction in degradation is observed. It is up to the database administrator (DBA) to decide if short execution time or low performance degradation is more important. Hence, we will not discuss this further.



[Figure: plot of performance degradation (%) vs. completion time (s), with response time and throughput curves.]

Figure 9.7: Total time for log propagation with varying DT creation thread priorities. Diff/int DT creation method, running Transaction Mix 1 scenario 1 with 50% workload.

9.3.4 Other Steps of DT Creation

Thus far, only performance degradation during the log propagation step has been discussed. In this section, the implied impact of the other steps is discussed. Unless otherwise noted, difference and intersection DT creation experiments, running Transaction Mix 1 scenario 1, are discussed. However, the discussion applies to all the DT creation methods.

Preparation and Initial Population

During the preparation step, derived tables and indices are added to the schema. This only involves modifications to the database schema; no records or locks are involved. The performance implication of this step is negligible since it completes very fast. An inspection of 100 performance report files from all DT creation methods7 showed that the longest execution time of this step was 36 ms whereas the shortest was 17 ms.

The performance impact of initial population is a completely different matter, even though the priority of the DT creation thread can easily be lowered to the point where no performance degradation is observed at all. The problem with such low priorities is that the step takes a long time to complete. Since the log propagation step executed next has to apply the log records

7These were randomly picked from the 2,000 report files from the previous section.


Workload    Initial Population    Log Propagation    % Difference
50%         0.758 ms              0.750 ms            1.1%
70%         0.764 ms              0.778 ms           -1.8%
90%         0.865 ms              0.872 ms           -0.8%

Table 9.7: Average response times during the initial population and log propagation steps of vertical merge when both steps use the same priority.

Priority    Response time degradation    Aborted transactions    DT Creation Time
1/5 Max     0.5%                         0.8%                    ∞
2/5 Max     0.8%                         1.0%                    22.4 s
3/5 Max     3.8%                         2.8%                    14.9 s
4/5 Max     15.0%                        7.1%                    10.4 s
Max         29.1%                        15.4%                   7.3 s
Blocking    -                            100%                    3.6 s

Table 9.8: Effects on performance for different priorities of the DT creation thread during the initial population and log propagation steps.

generated during initial population, the execution time of this step highly affects the total DT creation execution time.

Initial population has no "minimal" priority similar to what we used for log propagation, i.e., the priority at which the same number of log records are propagated and produced within a time interval. Hence, we consider two alternatives to the very low priority described above. The first is to use the same priority as the log propagation step. To get an indication of the performance implications of this alternative, 30 test iterations have been performed with workloads of 50, 70 and 90% during diff/int. Not surprisingly, the tests show that the two steps degrade performance to the same extent when the priorities are equal. The results are shown in Table 9.7.

The second alternative is to use a very high priority on the initial population step. Intuitively, this results in higher performance degradation for a shorter time interval. The extreme version of this is the insert into select method used in many existing DBMSs (Løland, 2003): it involves read-locking the source tables for the duration of the entire initial population step. Log propagation and synchronization are not needed in this case.
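For contrast, the blocking alternative can be summarized by the following minimal sketch (hypothetical names, reusing the illustrative Rec and DT types from the earlier propagation sketch): the source table is read-locked for the whole population step, which is exactly why no log propagation or synchronization is needed afterwards.

// Hedged sketch of the blocking "insert into select" approach: writers
// are locked out of the source table for the entire population step.
class InsertIntoSelectSketch {
    interface SourceTable {
        Iterable<Rec> scan();
        void readLock();
        void readUnlock();
    }

    static Rec transform(Rec r) { return r; }  // identity stand-in for the operator

    static void insertIntoSelect(DT derived, SourceTable source) {
        source.readLock();                     // blocks all writes to the source
        try {
            for (Rec r : source.scan()) {      // read every source record
                derived.insert(transform(r));  // apply operator, insert into DT
            }
        } finally {
            source.readUnlock();               // DT is consistent at this point
        }
    }
}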

Table 9.8 shows the results from the same 30 test iterations described in Section 9.3.3 and illustrated in Figure 9.7. The priority of DT creation


has been varied, but initial population and log propagation have had the same priority in all cases. The table clearly illustrates how the performance degradation decreases as the DT creation time increases. The "blocking" line represents the "insert into select" method. As argued in Section 9.3.3, the choice of which priority is "best" is left to the DBA.

The performance degradation during the synchronization step is very small when the DTs are not used for schema transformations. 30 test runs have been performed on diff/int at 50 and 80% workload. Synchronization was started automatically when 10 or fewer log records remained to be redone. The experiments showed that the source table latches were held for a duration of 1-2.5 ms while these log records were applied to the DTs. The test report files indicate a slightly higher average of failed transactions, due to unacceptably high response times, in the second immediately following synchronization; 2.4% and 1.7% for the 50% and 80% workload cases, respectively. However, these results vary considerably between the report files due to the short time interval and hence few available response time samples. The expected number of failed transactions for this step relies heavily on the remaining number of log records when the source table latches are set and on the response time considered acceptable. Intuitively, if the acceptable response time is much greater than the latch time, few transactions will fail.
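The synchronization logic itself can be summarized by the sketch below (illustrative names; the threshold is the one used in the experiments). The point is that the latches are taken only once the backlog is small, so they are held just long enough to drain the last few log records.

import java.util.List;

// Hedged sketch of the synchronization step: propagate at low priority
// until at most THRESHOLD log records remain, latch the source tables,
// drain the remaining records, and hand the consistent DTs over.
class SynchronizationSketch {
    interface Propagator { int backlog(); void applyNextLogRecord(); }
    interface Table { void latch(); void unlatch(); }

    static final int THRESHOLD = 10;           // value used in the experiments

    static void synchronize(Propagator p, List<Table> sourceTables) {
        while (p.backlog() > THRESHOLD) {
            p.applyNextLogRecord();            // ordinary low-priority propagation
        }
        sourceTables.forEach(Table::latch);    // writers blocked (1-2.5 ms measured)
        try {
            while (p.backlog() > 0) {
                p.applyNextLogRecord();        // drain the few remaining records
            }
            // The DTs now reflect the source tables; publish them as MVs
            // or perform the schema switch here.
        } finally {
            sourceTables.forEach(Table::unlatch);
        }
    }
}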

9.3.5 Performance Experiment Summary

In the previous sections, we have discussed the results from extensive testing to find the performance degradation incurred by the DT creation methods. Log propagation has been discussed in most detail since this step typically runs for much longer time intervals than the other steps. In the described experiments, ∼75% of the DT creation time has been used by log propagation, ∼25% by initial population and ≪1% by preparation and synchronization combined.

The experiments have shown that the incurred performance degradation relies heavily on the workload. At low to medium (∼70%) workloads, DT creations running scenario 1 of the transaction mixes can be performed with almost no degradation for concurrent transactions. If the workload is increased beyond this point, the performance quickly becomes unbearable because response times increase very quickly, eventually resulting in throughput thrashing.

In addition to workload, DT creation thread priority also affects performance degradation to a huge extent. While a low priority DT creation thread results in little performance degradation, it also incurs long execution time. Any increase to the priority will decrease the time to complete but also


[Plot omitted: "Response Time Average", showing response time (ms) against % workload, with one curve per method: Difference and Intersection, Vertical Split, Vertical Merge, Horizontal Merge, Horizontal Split.]

Figure 9.8: Summary of average response time for all DT creation methods.

increase the performance degradation.

The type of work performed by the transactions running on the server also affects the degradation. If the transactions perform few write operations on records in the source tables, the degradation is smaller than if many write operations are performed. Alternatively, the degradation may be held constant while the execution time is varied.

Finally, the different DT creation methods incur different degradations. The reason for this is that they must perform different amounts of work to propagate each log record. For example, log propagation of an update operation incurs an update of only one record in a DT when horizontal split is used. In difference and intersection, however, the same update would require a lookup for equal records in one DT, and an update of one or even two records in the other DTs. As shown in Figure 9.8, difference and intersection incurs the most degradation while horizontal split incurs the least. Vertical merge, vertical split and horizontal merge are between these.

9.4 Discussion

In this chapter, we have discussed empirical validation and performance experiments performed on the Non-blocking DBMS prototype. The empirical validation experiments gave predictable and correct output, which strongly indicates that the DT creation methods work correctly.

The performance experiments showed that close to 100% of the total DT


creation time was used by the log propagation and initial population steps. Under moderate workloads, DT creation can be performed with almost no performance degradation for concurrent transactions. However, this requires a very low priority on the DT creation process, which in turn increases the total execution time significantly. A consequence of this is that the DBA has to make a decision on whether to perform the DT creation quickly with much degradation, slowly with little degradation, or something in between.

When to use the DT Creation Method

It is clear that the execution time decreases and the performance degradation increases as the priority of the DT creation process is increased. At extremely high priorities, the DT creation method behaves almost like the insert into select method used in current DBMSs (Løland, 2003). When fast completion time is more important than low performance degradation, the existing insert into select method or Ronstrom's method should be used. Since performance experiments have not been published on Ronstrom's method, it is uncertain which of these would be preferred under which circumstances. In cases where it is advisable to trade longer execution time for lower performance degradation, our DT creation method should be used instead. Using our method provides flexibility since the priority may be increased or decreased as the DBA sees fit. Note, however, that the insert into select method allows combinations of relational operators, including aggregates. These combinations are not yet supported in our DT creation method.

We expect our method to outperform Ronstrom's method when it comes to performance degradation. The reason for this is that Ronstrom's method forwards source record modifications by using triggers executed within the original transaction (Ronstrom, 1998). A similar use of triggers is explicitly discouraged for MV maintenance (Colby et al., 1996; Kawaguchi et al., 1997).

If disk space is a major issue, however, Ronstrom's method may still be preferred for vertical merge and split schema transformations. The reason for this is that Ronstrom performs vertical merge transformations by adding attributes to an existing table, and vertical split by adding only one of the new tables. In contrast, our method makes full copies of the source tables in both cases.


Chapter 10

Discussion

This chapter contains a discussion of the work presented in this thesis. We start by discussing our research contributions with respect to how they meet the requirements stated in Chapter 1, and how they compare to related work. We then briefly summarize the research question, and discuss to what extent it has been answered.

10.1 Contributions

The work presented in this thesis is based on the argument that database operations which are blocking or incur high degrees of performance degradation are not suited for database systems with high availability requirements. With this in mind, we decided to focus on a solution for two operations: database schema transformation and materialized view creation. The current solutions for both these operations degrade performance of concurrent transactions significantly.

In this section, the contributions of our DT creation methods are discussed. To summarize, the main contributions of the thesis are:

• A framework based on existing DBMS technology that can be used to create DTs without blocking effects and with little performance degradation to concurrent transactions.

• Methods to create DTs using six relational operators.

• Strategies for how to use the DTs for Materialized View creation and schema transformation purposes.

• Solutions to common DT creation problems, which significantly simplify the design of other DT creation methods.


• Empirical validation of all presented DT creation methods.

• Thorough performance experiments on all DT creation methods.

In particular, we consider how the solution meets the requirements stated in Chapter 1, and how it compares to related work.

10.1.1 A General DT Creation Framework

The General DT Creation Framework presented in Chapter 4 is an abstract framework. It is based on the idea of running DT creation as a non-blocking, low priority background process to incur minimal performance degradation for concurrent transactions.

Although we have focused on centralized DBMSs in this thesis, we are confident that the framework can be used to create DTs in distributed database systems as well. In particular, the framework should easily integrate into distributed DBMSs where recoverability is achieved by sending logical log records to other nodes. In the ClustRa DBMS, e.g., an ARIES-like recovery strategy is enforced by shipping logical log records between nodes (Hvasshovd et al., 1995). Furthermore, ClustRa uses logical record identifiers and logical record state identifiers (Hvasshovd et al., 1995; Bratsberg et al., 1997b). Hence, this solution meets all the technological requirements that the open source DBMSs evaluated in Chapter 7 did not. This application of the framework is purely theoretical, however.

As described in Chapter 6, the framework can be used when creating DTs using six relational operators1. However, the framework is expressive enough to be useful for DT creation using other relational operators as well. Jonasson uses the framework for DT creation involving aggregates (Jonasson, 2006). An auxiliary table is used to compute the aggregate values. The solution has not been implemented, however, and the performance implications are therefore uncertain.

10.1.2 DT Creation for Many Relational Operators

As discussed in Chapter 1, Materialized Views and schema transformations are defined by a query, and are therefore created using relational operators. Relational operators can be categorized in two groups: non-aggregate and aggregate operators. The non-aggregate operators are join, projection, union,

1In this thesis, the full outer join, projection, union (both duplicate inclusion and removal), selection, difference and intersection relational operators have been used for DT creation.


selection, difference and intersection. Aggregate operators are mathematical functions that apply to collections of records (Elmasri and Navathe, 2004).

In Chapter 1, we decided to use non-aggregate operators as a basis for DT creation, allowing us to use optimized algorithms already available in current DBMSs (Houston and Newton, 2002; IBM Information Center, 2006; Lorentz and Gregoire, 2002; Microsoft TechNet, 2006). We chose to focus on non-aggregate operators because these are useful for both schema transformations and materialized views. An alternative would be to focus on aggregate operators. These are frequently used in materialized views, but are not often used in schema transformations. Furthermore, materialized views defined over aggregate functions often include non-aggregate operators as well (Alur et al., 2002).

As described in Chapter 6, our DT creation method can be used to create DTs using the full outer join, projection, union, selection, difference and intersection relational operators. This means that the method can be used to create a broad range of DTs. Only the first four operators can be used in Ronstrom's method (Ronstrom, 2000).

10.1.3 Support for both Schema Transformations and Materialized Views

In Chapter 1, we realized that both schema transformations and materialized view (MV) creation were blocking operations that could be seen as applications of derived tables. Hence, we decided to focus both on how DTs should be created to be usable in highly available database systems, and on how these DTs could be used for the two operations.

In this thesis, we have shown how the DT creation method can be used for both operations. The solution for MV creation proved to be a straightforward application of DTs; they can be used for this purpose without modification.

The solution for schema transformations is more complex, especially for operators where a record in one schema version may contribute to or be derived from multiple records in the other schema version2. As argued in Chapter 6, this may cause high degrees of locking conflicts between concurrent transactions in the different schema versions. If this becomes a big problem, we know of no other alternative than to use the non-blocking abort synchronization strategy, thus resolving the problem by aborting transactions in the old schema version.

In contrast to our method and the “insert into select” method, the method

2This applies to all DT creation methods except horizontal merge with duplicate inclusion, horizontal split and difference and intersection.


suggested by Ronstrom can only be used for schema transformations (Ronstrom, 2000).

10.1.4 Solutions to Common DT Creation Problems

In Chapter 5, we presented five problems that are frequently encountered in the DT creation methods. To summarize, these problems were: Missing Record and State Identification, Missing Record Pre-States, Lock Forwarding During Transformations and Inconsistent Source Records. We also described how these problems could be solved in general.

The solutions to these five problems are contributions by themselves, as they significantly ease the design of new DT creation methods. For example, a method for DT creation using aggregate operators described by Jonasson uses the suggested solutions for the missing record and state identification problems and the missing pre-state problem (Jonasson, 2006).

10.1.5 Implemented and Empirically Validated

The DT creation methods for the six relational operators have been implemented in a prototype DBMS. In Chapter 9, we discussed thorough empirical validation experiments that were executed on all the methods in the prototype. All the experiment results showed correct execution, thus strongly indicating that the methods are correct. No matter how strong the indications are, empirical validation experiments can never be used as a proof of correctness with absolute certainty (Tichy, 1998). Even so, empirical validation is considered vital in the software engineering community (Tichy, 1998; Pfleeger, 1999; Zelkowitz and Wallace, 1998). If confirmation of the results is required, empirical validation should be executed on another implementation of the same methods (Walker et al., 2003). Due to time considerations, this has not been done.

10.1.6 Low Degree of Performance Degradation

The rationale for the research question was to develop DT creation methods that can be used in highly available database systems. Hence, a crucial goal was to incur as little performance degradation to concurrent transactions as possible.

The DT creation methods have been implemented in a prototype DBMS, and the performance implications were thoroughly discussed in Chapter 9. The experiments showed that the degree of performance degradation depends heavily on four parameters: the workload, the transaction mix, the priority of


the DT creation thread compared to other threads, and the relational operator used to create the DTs.

Workload

The incurred degradation is highly affected by the workload on the database server. The performance experiments with 2,000 test runs on the six DT creation methods showed that the response time increases rapidly as the workload increases. This comes as no surprise, since the delay over a bottleneck resource is given by (Highleyman, 1989):

    Delay = T / (1 − L)                                    (10.1)

Here, T is the average service time over the bottleneck resource, and L is the load. Hence, as the workload approaches 100%, the response time increases towards infinity. Also, since 10 ms was defined as the highest acceptable response time for transactional requests, we observe throughput thrashing when the response time gets too high. Based on these observations, we strongly suggest that DT creation is performed when the workload is moderate or lower. In the experiments, the rapid increase in response time started at approximately 75% workload. Different systems may observe this rapid increase in response time at other workloads, depending on which resource is the bottleneck.
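As a concrete illustration of Equation 10.1, assume a service time of T = 0.5 ms (a figure chosen purely for illustration): at L = 0.5 the delay is 0.5/(1 − 0.5) = 1 ms, at L = 0.9 it is 0.5/0.1 = 5 ms, and at L = 0.95 it reaches 0.5/0.05 = 10 ms, the acceptance threshold used in our experiments. Increasing the load from 50% to 95% thus increases the delay tenfold, which matches the rapid response time growth observed in the plots.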

Priority on the DT creation thread

The priority of the DT creation thread affects both the performance degradation and the execution time to a great extent. As was clear from Section 9.3.3, a high priority results in quick execution with high degradation. On the other hand, a low priority results in low degradation over a longer time interval. We consider it the responsibility of the DBA to determine which priority setting to use.

Transaction Mix

The transaction mix3 of the workload on the system plays a significant role in the amount of performance degradation. The reason for this is that only write operations to records in the source tables must be propagated to the DTs. Hence, a transaction mix that is read intensive4 produces fewer relevant

3The type of work performed by the transactions running on the server.
4Or write intensive on records in non-source tables.


log records5 than a transaction mix that is write intensive on source table records.

For equal workloads, the log propagator has less work to do per time unit if few relevant log records are produced than if many are produced. Hence, DT creation during the former transaction mix can either be processed quicker or with lower performance degradation than DT creation during the latter transaction mix.
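In propagator terms, relevance is a simple filter over the transaction log, sketched below with illustrative names:

import java.util.Set;

// Hedged sketch: only write operations on source table records need to
// be propagated; all other log records can be skipped.
class RelevanceFilterSketch {
    interface LogRecord { boolean isWrite(); String tableName(); }

    static boolean isRelevant(LogRecord r, Set<String> sourceTables) {
        return r.isWrite() && sourceTables.contains(r.tableName());
    }
}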

Relational Operator

The final identified parameter that affects performance to a great extent is the relational operator used to create the DTs. The reason for the variations is that different amounts of work must be performed by the different operators when a logged operation is applied to the DTs. In Section 9.3, we showed that DT creation using difference and intersection (diff/int) incurs the most degradation, while horizontal split incurs the least. Hence, under equal workloads, horizontal split can either be performed quicker or with less performance degradation than diff/int.

A Comparison with Related Work

Basing the framework on a non-blocking, low priority background process is significantly different from the two alternative strategies for materialized view creation and schema transformations. In the schema transformation method presented by Ronstrom, a background process is used for copying records from the old to the new schema versions (Ronstrom, 2000). Triggers executed within normal transactions are then used to keep the copied records up to date with the original records. As discussed in Section 3.1.3, these triggers impose much degradation. A similar trigger strategy has previously been suggested for maintenance of MVs6 (Gupta et al., 1993; Griffin et al., 1997), but is explicitly discouraged due to the high performance cost (Colby et al., 1996; Kawaguchi et al., 1997).

Although performance experiments have not been published on Ronstrom's schema transformation method, we expect our DT creation method to incur less degradation. The reason for this is that our framework forwards modifications to the DTs using a low priority background process, as opposed to Ronstrom's method of using triggers executed within each user transaction (Ronstrom, 2000). On the other hand, Ronstrom's method is likely to complete in shorter time than our method.

5Log records that must be propagated to the DTs.
6In MV maintenance, this is called Immediate Update.


An even more drastic solution is currently used for DT creation in existing DBMSs (Løland, 2003). In the insert into select method, the source tables are locked and read before the content is inserted into the DTs. This method is simple, but all write operations on records in the source tables are blocked during the process.

The insert into select method and Ronstrom's method are better than our DT creation method only in cases where fast completion time is much more important than low performance degradation. With our DT creation method, however, longer execution time can be traded for lower performance degradation. This can be done to a small or large extent to fit different scenarios. Hence, in all cases where performance degradation has a high priority, our method outperforms both.

10.1.7 Based on Existing DBMS Functionality

Already in the initial requirement specification, it was clear that the solution should be based on functionality found in existing DBMSs whenever possible. By using existing functionality, the method should be easy to integrate into existing systems. Hence, literature on DBMS internals and related work has been studied carefully. The most relevant parts of this study were presented in Chapters 2 and 3.

Our DT creation method uses standard DBMS functionality to a great extent. For example, the widely accepted ARIES (Mohan et al., 1992) protocol is used for recovery, Log Sequence Numbers (Elmasri and Navathe, 2004) are used to achieve idempotence, and algorithms for relational operators (Garcia-Molina et al., 2002) available in all modern relational DBMSs are used for initial population of the DTs.
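As an illustration of how state identifiers yield idempotence (a standard LSN technique; the sketch uses hypothetical names rather than the prototype's code), a logged operation is only redone against a derived record whose state identifier is older than the log record's LSN:

// Hedged sketch of the LSN test that makes redo idempotent: replaying
// the same log record twice has no further effect, because the record's
// state identifier already equals the log record's LSN after the first
// application.
class IdempotentRedoSketch {
    interface LogRecord { long lsn(); }
    interface DerivedRec {
        long stateId();
        void apply(LogRecord r);   // performs the logged operation and
                                   // advances stateId() to r.lsn()
    }

    static void redo(LogRecord logRec, DerivedRec target) {
        if (target.stateId() < logRec.lsn()) {
            target.apply(logRec);  // not yet applied: redo it
        }
        // else: already reflected in the record state, so skip
    }
}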

On the other hand, our solution also requires functionality that is thoroughly discussed in the literature but is not common in existing DBMSs. Most importantly, this includes logical redo logging, record state identifiers (Hvasshovd, 1999) and logical record identification (Gray and Reuter, 1993). These principles were described in Chapter 2, and are required because the records are physically reorganized when relational operators are applied.
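To indicate what this nonstandard functionality amounts to, the sketch below shows one possible shape of a logical redo log record (the field names are illustrative, not the prototype's): the operation is expressed against a logical record identifier rather than a physical address, which is what allows it to be replayed against the physically reorganized DTs.

// Hedged sketch of a logical redo log record. Because the record is
// addressed by a logical, location-independent identifier instead of a
// page/slot address, the same log record can be applied to the derived
// tables even though the derived records live elsewhere physically.
enum LoggedOp { INSERT, UPDATE, DELETE }

record LogicalLogRecord(
        long lsn,              // log sequence number / state identifier
        String table,          // source table the operation applied to
        long logicalRecordId,  // logical record identifier
        LoggedOp op,           // type of logged operation
        byte[] beforeImage,    // pre-state, where applicable
        byte[] afterImage) {}  // post-state, where applicable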

As was evident from Chapter 7, the use of nonstandard functionality makes the integration into existing DBMSs more complex than if this was not the case. In Section 2.4, it was argued that the logical record identifiers can be replaced by physical identifiers if a mapping between the source and derived addresses is maintained. We have not found any solution to remove the logical redo log and record state identification requirements.

Hence, with few exceptions, the method is based on functionality common in current DBMSs. A description of existing functionality can be found in


Chapter 2.

10.1.8 Other Considerations - Total Amount of Data

The DT creation framework copies records from source tables to derived tables. Thus, storage space is required for two copies of all records in the source tables during DT creation.

When the DTs are used as materialized views, the additional data will persist after DT creation has completed, and is therefore not considered a waste of storage space. When used for schema transformations, on the other hand, the source tables are removed once the transformation is complete. Hence, the additional storage space required may be considered wasted.

Since the source tables may contain huge amounts of data, the added storage usage in schema transformations may be problematic. This is also a problem in the two alternative solutions for schema transformations: the insert into select method (Løland, 2003; MySQL AB, 2006) and Ronstrom's schema transformations (Ronstrom, 2000). In the former method, the source tables are locked while the records are read, transformed by applying the relational operator and inserted into the new tables. Thus, this method requires the same amount of storage space as our method.

As thoroughly described in Section 3.1, Ronstrom's method requires less storage space during vertical merge and split transformations. The reason is that these transformations work "in-place", i.e., attributes are added to or removed from already existing records. Horizontal merge and split, on the other hand, require the same amount of storage space as our method.

We have no solution to this problem other than increasing the storage capacity of the database server if required. This was also suggested by Ronstrom to solve the same problem (Ronstrom, 1998).

10.2 Answering the Research Question

In this section, we discuss how and to what extent our research has answered the research question:

How can we create derived tables and use these for schema transformation and materialized view creation purposes while incurring minimal performance degradation to transactions operating concurrently on the involved source tables?

In Chapter 1, we decided to refine the research question into four key challenges. In the following sections, the results of the research are discussed


with respect to these challenges and the research question.

Q1: Current Situation

What is the current status of related research designed to address the main research question or part of it?

The current status of related research was presented in a survey in Chapter 3. From this review, we have identified the main limitations of existing solutions. The limitations are mainly associated with unacceptable performance degradation.

Q2: System Requirements

What DBMS functionality is required for non-blocking DT creation to work?

Our DT Creation method is inspired by Fuzzy Copying (Hagmann, 1986; Gray and Reuter, 1993; Bratsberg et al., 1997a), and is based on making an inconsistent copy of the involved tables. The copies are then made consistent by applying logged operations. The requirements of this strategy are described in Chapters 2 and 4. Most of these are related to the reorganized structure of records after applying relational operators.

Q3: Approach and Solutions

How can derived tables be created with minimal performance degradation, and be used for schema transformation and MV creation purposes?

• How can we create derived tables using the chosen six relational operators?

• What is required for the DTs to be used a) as materialized views? b) for schema transformations?

• To what extent can the solution be based on standard DBMS functionality and thereby be easily integrable into existing DBMSs?

Our solution to creating derived tables was presented in Chapters 4 and 6. The method enables DT creation using the six relational operators, and the DTs can be used for both MVs and schema transformations. Thus, we have answered the two former parts of the question.


The method is based on standard, existing functionality whenever we have found it possible to do so. However, we also require some functionality that is not commonly used in current DBMSs. The implications of this were discussed in detail in Section 10.1.7.

Q4: Performance

Is the performance of the solution satisfactory?

• How much does the proposed solution degrade performance for user transactions operating concurrently?

• With the inevitable performance degradation in mind; under which circumstances is the proposed solution better than a) other solutions? b) performing the schema transformation or MV creation in the traditional, blocking way?

The performance implications of executing the DT creation method were thoroughly discussed in Chapter 9. We found that the method incurs little performance degradation when the workload is not too high. However, there are circumstances where DT creation incurs high performance degradation. Hence, the database administrator has to consider three parameters before starting the operation: the workload on the database server, the DT creation priority and the operator used for DT creation. This was discussed in detail in Sections 9.3 and 10.1.6.

10.2.1 Summary

We have answered the main research question by developing the Non-blocking DT Creation method. To summarize, the work included: deciding on a research approach based on the design paradigm (Denning et al., 1989), a thorough study of related work and usable functionality in existing DBMSs, development of a general DT creation framework, identification of and solutions to common DT creation problems, specialized methods for the six relational operators, and a prototype design and implementation used in experiments.

The main research question and all the refined research questions have been answered.


Chapter 11

Conclusion and Future Work

This chapter summarizes the main contributions and suggests several directions for future research. Finally, publications resulting from the research are briefly described.

11.1 Research Contributions

The major research contributions of this thesis are:

An easily extendable framework for derived table creation. A framework that can be used in the general case to create derived tables (DTs) is presented. It is designed to degrade performance of concurrent transactions as little as possible. In this thesis, the framework is used by six relational operators in a centralized database system setting. It is, however, extendable in multiple ways. Examples include adding aggregate operators or performing DT creation in a distributed database system setting.

Methods for creating derived tables using six relational operators. By using the general DT creation framework, we present non-blocking DT creation solutions for six relational operators: vertical merge and split (full outer join and its inverse), horizontal merge and split (union and its inverse), difference and intersection. Together, these methods represent a powerful basis for DT creation.

Means to use the derived tables for schema transformation and materialized view creation purposes. Schema transformations and materialized view (MV) creation are two database operations that must be performed in a blocking way in current DBMSs. By using the DT creation frame-


work to perform these operations, we take advantage of the non-blocking and low performance degradation capabilities.

Design and implementation of a prototype capable of non-blocking derived table creation. The DT creation methods for each of the six relational operators have been implemented in a DBMS prototype. Extensive experiments on this prototype have been used to empirically validate the methods.

Thorough performance experiments for DT creation using all six relational operators. Thorough performance experiments have been performed on the six DT creation methods in the prototype. The experiments show that the performance degradation for concurrent transactions can be made very low. They also indicate under which circumstances DT creation should be avoided. This is primarily during high workload.

11.2 Future Work

The following topics are identified as possible directions for further research.

DT Creation in Distributed Database Systems

The primary focus in this thesis has been DT creation in centralized database systems. However, we believe that the general framework can be used for DT creation in distributed database systems as well. Especially, distributed systems based on log shipping, i.e., systems where the transaction log is sent to another node instead of written to disk, seem to be a good starting place for this research.

DT Creation using Aggregate Operators

Some work has already been conducted on DT creations that involve aggregate operators (Jonasson, 2006), but the research is far from complete. It would be very interesting to implement these methods in the Non-blocking Database. Experiments should then be executed to empirically validate the methods and to indicate how much performance degradation they incur.

Implementation in a Modern DBMS

Because the DT creation methods require functionality that was not found in any of the DBMSs investigated in Chapter 7, we chose to perform experiments


on a prototype DBMS. Ideally, DT creation should not require any non-standard functionality. For example, we believe that block state identifiers can be allowed in the source tables as long as the derived tables have record state identifiers.

Implementing the DT creation functionality in an existing DBMS could provide additional insight. For example, the simplifications made to the prototype would be removed, and the methods could be subject to full-scale benchmarks to get better indications on performance implications. Furthermore, a second implementation could be subject to more empirical validation experiments, and thereby confirm the correctness of the methods (Walker et al., 2003).

Hence, both further research into whether more standard DBMS functionality could be used and an implementation in an existing DBMS would be considered interesting by the author.

Synchronizing Schema Transformations with Application Modifications

When schema transformations have been discussed in this thesis, we have been concerned with incurring low performance degradation and performing the switch between schemas as fast as possible so that latches are only held for a millisecond or two.

The transactions executed in the database system are, however, typically requested by applications. When the schema is modified, the applications should also be modified to reflect the new schema. It would be interesting to investigate how changes to an application can be synchronized with schema transformations.

As an initial approach, we would start by creating views reflecting the old schema when a schema transformation is committed. By doing so, the application can be changed after the schema transformation has completed. Of course, this strategy requires that all old data is intact in the new schema.

Combining Multiple Operators

In this thesis, we have designed methods to create derived tables using any of six relational operators. What we have not considered, however, is that materialized views and schema transformations may in some cases require multiple operators to get the wanted result. A materialized view in a data warehouse may, e.g., be constructed by a join between four tables and an aggregate operator. For schema transformation purposes, the effect of multiple operators can often be achieved by performing the required transformations in serial;


[Figure omitted: shows the schemas involved in the two-step transformation: the original Student(StudID, Name, CourseID, CourseName, Grade); an intermediate step with Student(StudID, Name) and StudentCourse(StudID, CourseID, CourseName, Grade); and the final schema with Student(StudID, Name), StudentCourse(StudID, CourseID, Grade) and Course(CourseID, CourseName).]

Figure 11.1: Example of Schema Transformation performed in two steps.

Figure 11.2: Example interface for dynamic priorities for DT creation.

an example is illustrated in Figure 11.1. This serial execution cannot be used for DT creation in general. Hence, it would be interesting to research if and how the DT creation method can be used when multiple operators are involved.

Dynamic Priorities for the DT Creation Process

The performance experiments showed that the priority setting of the DT creation process can be used to make the operation complete fast but with high performance degradation, or complete in longer time with less degradation. In the current implementation, the priority is set once and for all when DT creation is started. However, we see no reason why this priority should not be dynamic. Figure 11.2 shows an example of how a graphical user interface for dynamic priority could look.

11.3 Publications

Some of the research presented in this thesis has already been presented at several conferences. The papers, presented in chronological order, are:


1. Jørgen Løland and Svein-Olaf Hvasshovd (2006) Online, non-blocking relational schema changes. In Advances in Database Technology – EDBT 2006, volume 3896 of Lecture Notes in Computer Science, pages 405–422. Springer-Verlag.

This paper describes the first strategy used to perform schema transformations with the vertical merge and split operators. Records in the transformed schema are identified using their primary keys, whereas the method in this thesis identifies records by non-physical Record IDs.

Compared to using non-physical Record IDs for identification, the primary key solution of this paper requires more complex transformation mechanisms. On the other hand, the method is not restricted to DBMSs that identify records in one particular way.

2. Jørgen Løland and Svein-Olaf Hvasshovd (2006) Non-blocking materialized view creation and transformation of schemas. In Advances in Databases and Information Systems - Proceedings of ADBIS 2006, volume 4152 of Lecture Notes in Computer Science, pages 96–107. Springer-Verlag.

In this paper, the general framework for derived table creation used in this thesis is presented. DT creation methods for all six relational operators in this thesis are described, and the idea of using DTs for either schema transformations or materialized views is introduced.

As opposed to the schema transformations described in "Online, non-blocking relational schema changes", the methods in this paper use Record IDs for identification.

3. Jørgen Løland and Svein-Olaf Hvasshovd (2006) Non-blocking Creation of Derived Tables. In Proceedings of Norsk Informatikkonferanse 2006. Tapir Forlag.

The generalized DT creation problems formalized in Chapter 5 were first described in this paper. By using these generalized problems, the DT creation methods can be described in a more structured way.


Part IV

Appendix


Appendix A

Non-blocking Database: SQL Syntax

The SQL recognized by the Non-blocking Database prototype is by no means the complete SQL language. A small subset of the SQL standard has been selected for implementation, with the goal of providing enough flexibility in testing while being feasible to implement.

In the language definitions below, <...> means that it is a variable name, [...] means optional, and {...} is used for ordering. The following statements are recognized:

create table <tablename>(<colname> <type> [<constraint>], ...);

drop table <tablename>;

delete from <tablename> where <pk_col>=<value>;

insert into <tablename>(<col1>, <col2>,...,<colX>)
    values(<value1>, <value2>,...,<valueX>);

update <tablename> set <col1>=<value1>, <col2>=<value2>...
    where <pk_col>=<value_pk>;

select {<col1>, <col2>...|*}
    from <tablename>
    [where <col>=<value>]
    [order by <col>];

select {<col1>, <col2>...|*}
    from (<table1> join <table2>
        on <ja_col1>=<ja_col2>)
    [where <colX>=<value>]
    [order by <colY>];

select {<col1>, <col2>...|*}
    from <table1>
union
select {<col1>, <col2>...|*}
    from <table2>

select {<col1>, <col2>...|*}
    from <table1>
{difference|intersection}
select {<col1>, <col2>...|*}
    from <table2>

<colX> = the name of attribute X
<valueX> = the value of attribute X
<constraint> = primary key
<ja_colX> = the join attribute column name of table X
<type> = Integer|String|Boolean|autoincrement
<pk_col> = column name of primary key attribute
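To make the grammar concrete, the following hedged example shows statements that conform to it, issued through a hypothetical execute-style client interface (the prototype's actual client API is not shown, and only Integer columns are used to avoid assumptions about literal syntax):

// Hypothetical client-side usage of the SQL subset defined above.
class SqlSubsetExample {
    interface Client { void execute(String sql); }

    static void demo(Client c) {
        c.execute("create table Account(AccId Integer primary key, Balance Integer);");
        c.execute("insert into Account(AccId, Balance) values(1, 500);");
        c.execute("update Account set Balance=600 where AccId=1;");
        c.execute("select AccId, Balance from Account where AccId=1 order by AccId;");
        c.execute("drop table Account;");
    }
}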


Appendix B

Performance Graphs


[Plots omitted: (a) "Response Time Average", response time (ms) against % workload; (b) "Throughput, Unloaded and During Log Propagation", throughput against % workload. Series: Unloaded Scenario 1/2 and Log Prop Scenario 1/2.]

Figure B.1: Response times and throughput for varying workloads during horizontal merge DT creation, Transaction Mix 1, Scenario 1 and 2.


[Plots omitted: (a) "Response Time Average and 90% Quartile", response time (ms) against % workload, scenario 1; the red lines indicate mean response time and confidence intervals without DT creation. (b) "Response Time Average", response time (ms) against % workload, scenario 1 and 2.]

Figure B.2: Response times for varying workloads during horizontal split DT creation.


[Plot omitted: "Throughput, Unloaded and During Log Propagation", throughput against % workload; series: Unloaded Scenario 1/2 and Log Prop Scenario 1/2.]

Figure B.3: Throughput for varying workloads during horizontal split DT creation.


[Plots omitted: (a) "Response Time Average and 90% Quartile", response time mean and 90% quartiles (ms) against % workload, scenario 1. (b) "Response Time Average", response time (ms) against % workload, Transaction Mix 1, scenario 1 and 2.]

Figure B.4: Response times for varying workloads during vertical merge DT creation.


[Plot omitted: "Response Time Average", response time (ms) against % workload; series: Unloaded and Log Prop for Scenario 1 with 1x, 5x and 15x record counts.]

Figure B.5: Response times for varying workloads during vertical merge DT creation for three variations of record numbers in the two source tables. The 1x plots are the ones shown in Figure B.4(a). Transaction Mix 1, Scenario 1 and 2.


[Plot omitted: "Throughput, Unloaded and During Log Propagation", throughput against % workload; series: Unloaded Scenario 1/2 and Log Prop Scenario 1/2.]

Figure B.6: Throughput for varying workloads during vertical merge DT creation. Transaction Mix 1, Scenario 1 and 2.


[Plots omitted: (a) "Response Time Average and 90% Quartile", response time mean and 90% quartiles (ms) against % workload, Transaction Mix 1, Scenario 1. (b) "Response Time Average", response time (ms) against % workload, Transaction Mix 1, Scenario 1 and 2.]

Figure B.7: Response times for varying workloads during vertical split DT creation.


[Plot omitted: "Throughput, Unloaded and During Log Propagation", throughput against % workload; series: Unloaded Scenario 1/2 and Log Prop Scenario 1/2.]

Figure B.8: Throughput for varying workloads during vertical split DT creation. Transaction Mix 1, Scenario 1 and 2.


Glossary

1MLF Abbreviation for One-to-Many Lock Forwarding. Technique used during non-blocking commit and abort synchronization of schema transformations.

2PC Common abbreviation for two-phase commit. Commonly used by schedulers in distributed database systems to ensure that transactions either commit or abort on all nodes.

2PL Common abbreviation for two-phase locking. Transactions are not allowed to acquire new locks once they have released a lock.

Availability A database system is available when it can be fully accessed by all users that are supposed to have access.

Consistency Checker A background thread used to find inconsistencies between records during vertical split DT creation.

Database A collection of related data.

Database Management System The program used to manage a database.

Database Schema The description, or model, of a database.

Database Snapshot The first type of materialized view. In contrast to MVs, snapshots cannot be continuously refreshed.

Database System A database managed by a DBMS.

DBMS Common abbreviation for Database Management System.

Derived Table A table containing data gathered from one or more other tables.

DT Abbreviation for Derived Table.

Fine-granularity locking Locks that are set on small data items, i.e. records.


Fuzzy Copy A technique used to make a copy of a table without blocking concurrent operations, including updates, to the same table. Can be based on copying records or blocks of records.

Fuzzy Mark A special log record used as a place-keeper by DT creation.

High availability Defines systems that are not allowed to be unavailable for more than a few minutes each year on average.

Horizontal Merge Derived Table creation operator, corresponding to the union relational operator.

Horizontal Split Derived Table creation operator, corresponding to the selection relational operator.

Idempotence Idempotent operations can be redone any number of times and still yield the same result.

Initial Population Step Second step of the DT creation framework. The derived tables are populated with records read from the source tables without using locks.

Latch A lock held for a very short time. For example used to ensure that only one thread writes to a disk block at a time. Also called semaphore.

Log Propagation Step Third step of the DT creation framework. Log records describing operations on source table records are applied to the records in the derived tables.

Log Sequence Number See State Identifier.

Logical Log A transaction log containing the operations performed on the data objects.

LSN Common abbreviation for Log Sequence Number; See State Identifier.

M1LF Abbreviation for Many-to-One Lock Forwarding. Technique used during non-blocking commit and abort synchronization of schema transformations.

Materialized View A view where the result of the view query is physically stored.

MMLF Abbreviation for Many-to-Many Lock Forwarding. Technique used during non-blocking commit and abort synchronization of schema transformations.


MV Common abbreviation for Materialized View.

NbDBMS Abbreviation for Non-blocking DBMS. The name of the prototype DBMS used for testing in this thesis.

Performance Degradation The degree of reduced performance, measured in throughput or response time.

Physical Log A transaction log containing before and after values of the changed objects.

Physiological Log A compromise between physical and logical logging. Uses logical logging to describe operations on the physical objects (blocks).

Preparation Step First step of the DT creation framework. Necessary tables, indices etc. are added to the database schema.

Record Identification Policy The strategy a DBMS uses to uniquely identify records. There are four alternative strategies: Relative Byte Address, Tuple Identifier, Database Key and Primary Key. The DT creation framework presented in this thesis requires that either of the two latter strategies is used.

Record Identifier A unique identifier assigned to all records in a database.

RID Common abbreviation for Record Identifier.

Schema Transformation A change to the database schema that happens after the schema has been put into use.

Self-maintainable Highly desirable property for materialized views; used on MVs that can be maintained without querying the source tables. Throughout this thesis: also used on DT creations that can be performed without querying the source tables.

Semantically rich locks Lock types that allow multiple transactions to lock the same data item. Requires that the operations are commutative, i.e. can be performed in any order. An example is "add $1000 to account X", which commutes with "subtract $10 from account X".

SLF Abbreviation for Simple Log Forwarding. Technique used during non-blocking commit and abort synchronization of schema transformations.


Source Table The tables used to derive records from in the derived table creation framework.

State Identifier A value assigned to records or blocks (containing records) which identifies the latest operation that was applied to it. Used to achieve idempotence when logical logging is used.

Synchronization Step Fourth step of the DT creation framework. The derived tables are made consistent with the source table records, and are either turned into materialized views or used to perform a schema transformation.

Transaction Log A file (normally) in which database operations are written. Used by the DBMS to recover a database after a failure.

Vertical Merge Derived Table creation operator, corresponding to the left outer join relational operator in Ronstrom's schema transformation method, and the full outer join operator in the DT creation method presented in this thesis.

Vertical Split Derived Table creation operator, corresponding to the projection relational operator.


Bibliography

Aas, J. (2005). Understanding the Linux 2.6.8.1 CPU scheduler. http://josh.trancesoftware.com/linux/linux cpu scheduler.pdf.

Adiba, M. E. and Lindsay, B. G. (1980). Database snapshots. In Proceedings of the Sixth International Conference on Very Large Data Bases, 1980, Canada, pages 86–91. IEEE Computer Society.

Agarwal, S., Keller, A. M., Wiederhold, G., and Saraswat, K. (1995). Flexible relation: An approach for integrating data from multiple, possibly inconsistent databases. In ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering, pages 495–504, Washington, DC, USA. IEEE Computer Society.

Agrawal, R., Carey, M. J., and Livny, M. (1987). Concurrency control performance modeling: alternatives and implications. ACM Trans. Database Syst., 12(4):609–654.

Alur, N., Haas, P., Momiroska, D., Read, P., Summers, N., Totanes, V., and Zuzarte, C. (2002). DB2 UDB's High Function Business Intelligence in e-business. IBM Corp., 1st edition.

Apache Derby (2007a). Apache Derby homepage. http://db.apache.org/derby/.

Apache Derby (2007b). Derby engine papers: Derby logging and recovery. http://db.apache.org/derby/papers/recovery.html.

Apache Derby (2007c). Derby engine papers: Derby write ahead log format. http://db.apache.org/derby/papers/logformats.html.

Austin, C. (2000). Sun Developer Network: Java technology on the Linux platform. http://java.sun.com/developer/technicalArticles/Programming/linux/.

Ballinger, C. (1993). TPC-D: benchmarking for decision support. In Gray, J., editor, The Benchmark Handbook for Database and Transaction Systems. Morgan Kaufmann, 2nd edition.

Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1987). Concurrency Control and Recovery in Database Systems. Addison-Wesley Publishing Company, 1st edition.

Blakeley, J. A., Coburn, N., and Larson, P.-A. (1989). Updating derived relations: detecting irrelevant and autonomously computable updates. ACM Transactions on Database Systems, 14(3):369–400.

Blakeley, J. A., Larson, P.-A., and Tompa, F. W. (1986). Efficiently updating materialized views. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, pages 61–71.

Bratsberg, S. E., Hvasshovd, S.-O., and Torbjørnsen, Ø. (1997a). Location and replication independent recovery in a highly available database. In 15th British Conference on Databases. Springer-Verlag LNCS.

Bratsberg, S. E., Hvasshovd, S.-O., and Torbjørnsen, Ø. (1997b). Parallel solutions in ClustRa. IEEE Data Engineering Bulletin, 20(2):13–20.

Caroprese, L. and Zumpano, E. (2006). A framework for merging, repairing and querying inconsistent databases. In Advances in Databases and Information Systems – Proceedings of ADBIS 2006, volume 4152 of Lecture Notes in Computer Science, pages 383–398. Springer-Verlag.

Cha, S. K., Park, B. D., Lee, S. J., Song, S. H., Park, J. H., Lee, J. S., Park, S. Y., Hur, D. Y., and Kim, G. B. (1995). Object-oriented design of main-memory DBMS for real-time applications. In Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications, page 109, Washington, DC, USA. IEEE Computer Society.

Cha, S. K., Park, J. H., and Park, B. D. (1997). Xmas: an extensible main-memory storage system. In Proceedings of the Sixth International Conference on Information and Knowledge Management, pages 356–362, New York, NY, USA. ACM Press.

Cha, S. K. and Song, C. (2004). P*TIME: Highly scalable OLTP DBMS for managing update-intensive stream workload. In (e)Proceedings of the 30th International Conference on VLDB, pages 1033–1044.

Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387.

Colby, L. S., Griffin, T., Libkin, L., Mumick, I. S., and Trickey, H. (1996). Algorithms for deferred view maintenance. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 469–480. ACM Press.

Crus, R. A. (1984). Data Recovery in IBM Database 2. IBM Systems Journal, 23(2):178.

Cyran, M. and Lane, P. (2003). Oracle Database Online Documentation 10g Release 1 (10.1) – “Concepts”, Part No. B10743-01.

Denning, P. J., Comer, D. E., Gries, D., Mulder, M. C., Tucker, A., Turner, A. J., and Young, P. R. (1989). Computing as a discipline. Communications of the ACM, 32(1):9–23.

Desmo-J (2006). DESMO-J: A framework for discrete-event modelling and simulation. http://desmoj.de/.

Elmasri, R. and Navathe, S. B. (2000). Fundamentals of Database Systems. Addison-Wesley Publishing Company, 3rd edition.

Elmasri, R. and Navathe, S. B. (2004). Fundamentals of Database Systems. Addison-Wesley, 4th edition.

Flesca, S., Greco, S., and Zumpano, E. (2004). Active integrity constraints. In Proceedings of the 6th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming, pages 98–107, New York, NY, USA. ACM Press.

Friedl, J. E. (2006). Mastering Regular Expressions. O'Reilly & Associates, 3rd edition.

Garcia-Molina, H. and Salem, K. (1987). Sagas. In Proceedings of the 1987 ACM SIGMOD International Conference on Management of Data, pages 249–259. ACM Press.

Garcia-Molina, H. and Salem, K. (1992). Main memory database systems: An overview. IEEE Transactions on Knowledge and Data Engineering, 4(6):509–516.

Garcia-Molina, H., Ullman, J. D., and Widom, J. (2002). Database Systems: The Complete Book. Prentice Hall PTR, Upper Saddle River, NJ, USA.

Gray, J. (1978). Notes on data base operating systems. In Operating Systems, An Advanced Course, pages 393–481, London, UK. Springer-Verlag.

Gray, J. (1981). The transaction concept: Virtues and limitations. In Very Large Data Bases, 7th International Conference, September 9–11, 1981, Cannes, France, Proceedings, pages 144–154. IEEE Computer Society.

Gray, J., editor (1993). The Benchmark Handbook for Database and Transaction Systems. Morgan Kaufmann, 2nd edition.

Gray, J. and Reuter, A. (1993). Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers, Inc.

Greco, G., Greco, S., and Zumpano, E. (2001a). A logic programming approach to the integration, repairing and querying of inconsistent databases. In Proceedings of the 17th International Conference on Logic Programming, pages 348–364, London, UK. Springer-Verlag.

Greco, G., Greco, S., and Zumpano, E. (2003). A logical framework for querying and repairing inconsistent databases. IEEE Transactions on Knowledge and Data Engineering, 15(6):1389–1408.

Greco, S., Pontieri, L., and Zumpano, E. (2001b). Integrating and managing conflicting data. In PSI '02: Revised Papers from the 4th International Andrei Ershov Memorial Conference on Perspectives of System Informatics, volume 4152 of Lecture Notes in Computer Science, pages 349–362, London, UK. Springer-Verlag.

Griffin, T. and Libkin, L. (1995). Incremental maintenance of views with duplicates. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pages 328–339. ACM Press.

Griffin, T., Libkin, L., and Trickey, H. (1997). An improved algorithm for the incremental recomputation of active relational expressions. IEEE Transactions on Knowledge and Data Engineering, 9(3):508–511.

Gupta, A., Jagadish, H. V., and Mumick, I. S. (1996). Data integration using self-maintainable views. In Proceedings of the 5th International Conference on Extending Database Technology, pages 140–144. Springer-Verlag.

Gupta, A., Katiyar, D., and Mumick, I. S. (1992). Counting solutions to the view maintenance problem. In Workshop on Deductive Databases, JICSLP, pages 185–194.

Gupta, A., Mumick, I. S., and Subrahmanian, V. S. (1993). Maintaining views incrementally. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 157–166. ACM Press.

Haerder, T. and Reuter, A. (1983). Principles of transaction-oriented database recovery. ACM Computing Surveys, 15(4):287–317.

Hagmann, R. B. (1986). A crash recovery scheme for a memory-resident database system. IEEE Transactions on Computers, 35(9):839–843.

Highleyman, W. H. (1989). Performance Analysis of Transaction Processing Systems. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Houston, Leland, S. and Newton (2002). IBM Informix Guide to SQL: Syntax, version 9.3. IBM.

Hvasshovd, S.-O. (1999). Recovery in Parallel Database Systems. Verlag Vieweg, 2nd edition.

Hvasshovd, S.-O., Sæter, T., Torbjørnsen, Ø., Moe, P., and Risnes, O. (1991). A continuously available and highly scalable transaction server: Design experience from the HypRa project. In Proceedings of the 4th International Workshop on High Performance Transaction Systems.

Hvasshovd, S.-O., Torbjørnsen, Ø., Bratsberg, S. E., and Holager, P. (1995). The ClustRa telecom database: High availability, high throughput, and real-time response. In Proceedings of the 21st VLDB Conference.

IBM Information Center (2006). DB2 Version 9 Information Center. http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp (checked December 5, 2006).

IBM Information Center (2007). IBM DB2 Universal Database glossary, version 8.2 (checked February 6, 2007).

Jonasson, Ø. A. (2006). Non-blocking creation and maintenance of materialized views. Master's thesis, Norwegian University of Science and Technology.

Kahler, B. and Risnes, O. (1987). Extended logging for database snapshot refresh. In Proceedings of the 13th International Conference on Very Large Data Bases.

Kawaguchi, A., Lieuwen, D. F., Mumick, I. S., Quass, D., and Ross, K. A. (1997). Concurrency control theory for deferred materialized views. In Proceedings of the 6th International Conference on Database Theory, ICDT 1997, volume 1186 of Lecture Notes in Computer Science, pages 306–320. Springer-Verlag.

Knuth, D. (1998). The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, 2nd edition.

Korth, H. F. (1983). Locking primitives in a database system. Journal of the ACM, 30(1):55–79.

Kruckenberg, M. and Pipes, J. (2006). Pro MySQL. Apress.

Lesk, M. E. and Schmidt, E. (1990). Lex – a lexical analyzer generator. In UNIX Vol. II: Research System, pages 375–387, Philadelphia, PA, USA. W. B. Saunders Company.

Lin, J. (1996). Integration of weighted knowledge bases. Artificial Intelligence, 83(2):363–378.

Lin, J. and Mendelzon, A. O. (1999). Knowledge base merging by majority. In Pareschi, R. and Fronhoefer, B., editors, Dynamic Worlds: From the Frame Problem to Knowledge Management. Kluwer Academic Publishers.

Lindsay, B., Haas, L., Mohan, C., Pirahesh, H., and Wilms, P. (1986). A snapshot differential refresh algorithm. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, pages 53–60, New York, NY, USA. ACM Press.

Lorentz, D. and Gregoire, J. (2002). Oracle9i SQL Reference Release 2 (9.2), Part No. A96540-01. Oracle.

Lorentz, D. and Gregoire, J. (2003a). Oracle Database SQL Reference 10g Release 1 (10.1), Part No. B10759-01. Oracle.

Lorentz, D. and Gregoire, J. (2003b). Oracle Database SQL Reference 10g Release 1 (10.1). Oracle.

Løland, J. (2003). Schema transformations in commercial databases. Report, Norwegian University of Science and Technology.

Løland, J. and Hvasshovd, S.-O. (2006a). Non-blocking creation of derived tables. In Norsk Informatikkonferanse 2006. Tapir Akademisk Forlag.

Løland, J. and Hvasshovd, S.-O. (2006b). Non-blocking materialized view creation and transformation of schemas. In Advances in Databases and Information Systems – Proceedings of ADBIS 2006, volume 4152 of Lecture Notes in Computer Science, pages 96–107. Springer-Verlag.

Løland, J. and Hvasshovd, S.-O. (2006c). Online, non-blocking relational schema changes. In Advances in Database Technology – EDBT 2006, volume 3896 of Lecture Notes in Computer Science, pages 405–422. Springer-Verlag.

Marche, S. (1993). Measuring the stability of data. European Journal of Information Systems, 2(1):37–47.

Microsoft TechNet (2006). Microsoft TechNet: SQL Server 2005. http://msdn2.microsoft.com/en-us/library/ms130214.aspx (checked March 22, 2007).

Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., and Schwarz, P. (1992). ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems, 17(1):94–162.

MySQL AB (2006). MySQL 5.1 Reference Manual. http://dev.mysql.com/doc/.

MySQL AB (2007). MySQL homepage. http://dev.mysql.com/.

Oracle Corporation (2006a). Berkeley DB Reference Guide, version 4.5.20. http://www.oracle.com/technology/documentation/berkeley-db/db/index.html.

Oracle Corporation (2006b). White Paper: A comparison of Oracle Berkeley DB and relational database management systems. http://www.oracle.com/database/berkeley-db/.

Pfleeger, S. L. (1999). Albert Einstein and Empirical Software Engineering. Computer, 32(10):32–38.

PostgreSQL Global Development Group (2001). PostgreSQL online manuals – PostgreSQL 7.1 documentation. http://www.postgresql.org/docs/7.1/static/postgres.html.

PostgreSQL Global Development Group (2002). PostgreSQL online manuals – PostgreSQL 7.3 documentation. http://www.postgresql.org/docs/7.3/interactive/index.html.

PostgreSQL Global Development Group (2007). PostgreSQL history.

Qian, X. and Wiederhold, G. (1991). Incremental recomputation of active relational expressions. IEEE Transactions on Knowledge and Data Engineering, 3(3):337–341.

Quass, D., Gupta, A., Mumick, I. S., and Widom, J. (1996). Making views self-maintainable for data warehousing. In Proceedings of the Fourth International Conference on Parallel and Distributed Information Systems, 1996, USA, pages 158–169. IEEE Computer Society.

Ronstrom, M. (1998). Design and Modelling of a Parallel Data Server for Telecom Applications. PhD thesis, Linköping University.

Ronstrom, M. (2000). On-line schema update for a telecom database. In Proceedings of the 16th International Conference on Data Engineering, pages 329–338. IEEE Computer Society.

Serlin, O. (1993). The history of DebitCredit and the TPC. In Gray, J., editor, The Benchmark Handbook for Database and Transaction Systems. Morgan Kaufmann, 2nd edition.

Shapiro, L. D. (1986). Join processing in database systems with large main memories. ACM Transactions on Database Systems, 11(3):239–264.

Shirazi, J. (2003). Java Performance Tuning. O’Reilly & Associates.

Sjøberg, D. (1993). Quantifying schema evolution. Information and Software Technology, 35(1):35–44.

Solid Info. Tech. (2006a). Solid BoostEngine data sheet. http://www.solidtech.com/pdfs/SolidBoostEngine_DS.pdf.

Solid Info. Tech. (2006b). Solid database engine administration guide.

Solid Info. Tech. (2007). Solid database homepage. http://www.solidtech.com/.

Sun Microsystems (2006a). Sun Developer Network: Java 2 Standard Edition 5.0. http://java.sun.com/j2se/1.5.0/.

Sun Microsystems (2006b). Sun Developer Network: Java SE HotSpot at a glance. http://java.sun.com/javase/technologies/hotspot/.

Sun Microsystems (2007). Java 2 Platform Standard Edition 5.0 API Specification. http://java.sun.com/j2se/1.5.0/docs/api/.

Tichy, W. F. (1998). Should Computer Scientists Experiment More? IEEE Computer, 31(5):32–40.

Turbyfill, C., Orji, C. U., and Bitton, D. (1993). AS3AP – an ANSI SQL standard scaleable and portable benchmark for relational database systems. In Gray, J., editor, The Benchmark Handbook for Database and Transaction Systems. Morgan Kaufmann, 2nd edition.

Walker, R. J., Briand, L. C., Notkin, D., Seaman, C. B., and Tichy, W. F. (2003). Panel: empirical validation: what, why, when, and how. In Proceedings of the 25th International Conference on Software Engineering (ICSE '03), pages 721–722. IEEE Computer Society Press.

Weikum, G. (1986). A theoretical foundation of multi-level concurrency control. In PODS '86: Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 31–43, New York, NY, USA. ACM Press.

Zaitsev, P. (2006). Presentation: InnoDB architecture and performance optimization. Open Source Database Conference 2006, http://www.opendbcon.net/.

Zelkowitz, M. V. and Wallace, D. R. (1998). Experimental Models for Validating Technology. Computer, 31(5):23–31.