
SECURE QUERY PROCESSING IN DISTRIBUTED DATABASE MANAGEMENT SYSTEMS - DESIGN AND PERFORMANCE STUDIES

Bhavani Thuraisingham and Ammiel Kamon

The MITRE Corporation, Burlington Road, Bedford, MA 01730

ABSTRACT

Distributed systems are vital for the efficient processing required in military applications. For these applications it is especially important that the distributed database management system (DDBMS) operate in a secure manner. For example, the DDBMS should allow users, who are cleared to different levels, access to a database consisting of data at a variety of sensitivity levels without compromising security. This paper focuses on secure query processing in a DDBMS. An implementation of secure query processing algorithms in a DDBMS, as well as an analysis of the performance of the algorithms, is described.

1. INTRODUCTION

The rapid growth of the networking and information processing industries has led to the development of distributed database management system prototypes and commercial distributed database management systems (see, for example, [CERI84]). In such a system, the database is stored in several computers which are interconnected by some communication medium. The aim of a distributed database management system (DDBMS) is to process and communicate data in an efficient and cost-effective manner. It has been recognized that such distributed systems are vital for the efficient processing required in military applications. For these applications it is especially important that the distributed database systems operate in a secure manner. For example, the DDBMS should allow users, who are cleared to different levels, access to the database consisting of data at a variety of sensitivity levels without compromising security.

A considerable amount of work has been carried out in providing multilevel user/data handling capability in centralized database management systems, known as trusted database management systems (TDBMS) (see, for example, [GRAU82, AFSB83, GRAU85, BURN86, DENN87, STAC90]). In contrast, distributed database management systems have received very little attention. Note that in some DDBMSs limited forms of discretionary security controls (that is, where users access data based on authorizations) do exist [CERI84].

This paper is concerned with query processing in a trusted distributed database management system (TDDBMS). It has two focuses. A major consideration in the design of most TDBMSs is secrecy: information has to be protected from unauthorized individuals. Another important consideration, which has generally been overlooked, is performance. For many applications in the command and control environment, it is important for the database system to operate in a secure manner as well as to meet real-time, or at least near-real-time, performance requirements. In order to carry out performance enhancements to a TDDBMS, a baseline study is needed first. The focus of this paper is to describe an initial implementation of query processing in a TDDBMS, which provides the first step towards such a study. It describes a simple design of a query processing strategy and discusses the implementation of the design. The objective of this implementation is to validate the security policy and to analyze the performance of the query processing algorithms.

The organization of this paper is as follows: Issues in secure query processing in a TDDBMS are described in section 2. In section 3 we discuss an initial implementation of query processing strategies and give performance results. The paper is summarized in section 4.

2. ISSUES IN SECURE QUERY PROCESSING

2.1 Architecture

We assume that the TDDBMS consists of several nodes that are interconnected by a multilevel secure network (network security issues are discussed in [WALK85]). Furthermore, each node is capable of handling multilevel data. That is, each node has a TDBMS. The database at each node is represented by a multilevel relational data model. The essential points of such a model are described in section 2.2. The global view of the distributed database uses the same multilevel relational data model used by the local systems. That is, we do not address heterogeneity. Therefore, the distributed data model that we consider has two levels: the global representation level and the local representation level, both of which use a multilevel relational model. The distributed data model is illustrated in figure 1.

Figure 1. Multilevel Data Model



Each node also has a distributed processing component which manages the global schemas and is also responsible for distributed query execution. This component is implemented as a set of processes separate from the local TDBMS. The local TCB (Trusted Computing Base), which enforces the security policy at a node, ensures that the objects that these processes read and write are accessed in accordance with the security policy it enforces. That is, we do not assume the existence of a multilevel secure distributed operating system. The security architecture assumed here is illustrated in figure 2. We assume that a user has the same clearance to all of the databases in the distributed database system. This assumption makes the design of the query processor less complex.

Figure 2. Security Architecture
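The paper gives no code for these TCB checks; the following minimal C sketch (C being the implementation language mentioned later in section 3.2) illustrates the kind of dominance test a TCB applies before allowing a read, assuming a simple linear ordering of security levels. All names here are illustrative, not taken from the actual system.

```c
/* Minimal sketch of a TCB-style dominance check, assuming a linear
   ordering of security levels. Illustrative only. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { UNCLASSIFIED = 0, CONFIDENTIAL, SECRET, TOP_SECRET } Level;

/* In a linear lattice, level a dominates level b iff a >= b. */
static bool dominates(Level a, Level b) {
    return a >= b;
}

/* A subject may read an object only if its clearance dominates
   the object's security level (read-down). */
static bool may_read(Level subject, Level object) {
    return dominates(subject, object);
}

int main(void) {
    printf("Secret user reading an Unclassified tuple: %s\n",
           may_read(SECRET, UNCLASSIFIED) ? "allowed" : "denied");
    printf("Unclassified user reading a Secret tuple: %s\n",
           may_read(UNCLASSIFIED, SECRET) ? "allowed" : "denied");
    return 0;
}
```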

Figure 3 shows the distributed architecture. The local TDBMS manages the local multilevel database. The distributed execution monitor, which we also call the distributed query processor (DQP), is responsible for managing the data distribution issues. It has an interface to the users as well as to the network. Its functions include the following: (1) global query transformation, where the query against the global relation is transformed into queries against the fragments; (2) global query optimization, where the most profitable execution strategy is generated; and (3) monitoring query execution, where the DQP has to supervise the actual execution of the query.

2.2 Data Distribution

A major design issue in distributed query processing is data distribution. There are two types of data distribution schemes that one must consider in a TDDBMS. One is the distribution of multilevel data into one or more single-level relations and the other is the distribution of the data across sites. The distribution of multilevel data has been addressed in centralized TDBMS designs before (see, for example, [STAC90]).

The second type of data distribution is the replication of the data. We assume that the relations are partially replicated. Furthermore, we also assume that the relations are horizontally fragmented. Vertical fragmentation is not attempted in most designs of distributed DBMSs; incorporating it in the presence of multilevel security makes the design even more complex.

We assume that the security level of a relation is the security level of the user who creates it. A relation R classified at level L can have tuples at levels which dominate L. In other words, there could be a version of relation R at a level L* which dominates L. Therefore, corresponding to an Unclassified relation, there could be versions of that relation at Secret and Top Secret levels. In addition, each version at a security level could be fragmented at the same level. Furthermore, a fragment could also be replicated.

When a user enters a tuple, the tuple is stored in a fragment of the relation at the security level of the user. If the user's security level is strictly dominated by the security level of the relation (that is, the user's level lies below it), then the tuple is not entered. The tuples could be polyinstantiated. However, within a security level, the primary key constraint cannot be violated. Each TDBMS is capable of creating a view of a relation at the security level of the querying subject. The distributed processing components merge the various views created by the individual TDBMSs in order to obtain a single global view at the security level of the querying subject.
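As a rough C sketch of this entry rule (hypothetical names; "dominated" is read as "strictly below" in a linear lattice):

```c
/* Hypothetical sketch of the tuple-entry rule: a tuple is stored in a
   fragment at the entering user's level, and is rejected when the
   user's level lies strictly below the relation's level. */
#include <stdio.h>

typedef enum { UNCLASSIFIED = 0, SECRET, TOP_SECRET } Level;

/* Returns the fragment level the tuple is stored at, or -1 if the
   insert is rejected. */
static int level_for_insert(Level user, Level relation) {
    if (user < relation)
        return -1;        /* user strictly below the relation: not entered */
    return (int)user;     /* stored at the security level of the user */
}

int main(void) {
    printf("Secret user, Unclassified relation -> fragment level %d\n",
           level_for_insert(SECRET, UNCLASSIFIED));
    printf("Unclassified user, Secret relation -> %d (rejected)\n",
           level_for_insert(UNCLASSIFIED, SECRET));
    return 0;
}
```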

The recombination scheme that we propose is a simple one that reduces the amount of data transferred between sites. It performs the union of all of the fragments involved, within and across security levels. Note that the recombination operation at a security level L will involve all fragments of the relation dominated by L.

When polyinstantiation is present, a user can request that the lower level polyinstantiated tuples be removed from his view. If so, only the tuple associated with the maximum security level that is dominated by the security level at which the recombination operation is being performed is considered. If there are two different tuples at the same security level with the same primary key, then the tuple with the latest time-stamp is considered. However, if the database is consistent, then it is not possible for two different tuples with the same primary key to exist at the same security level. We assume that the database is consistent.
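A minimal C sketch of this resolution rule (assuming the candidate tuples have already been restricted to levels dominated by the recombination level; the data layout is invented for illustration):

```c
/* Illustrative polyinstantiation resolution: between two tuples with
   the same primary key, keep the one at the higher (dominated) level;
   break a same-level tie by the latest time-stamp. */
#include <stdio.h>

typedef struct {
    int  key;         /* primary key */
    int  level;       /* 0 = Unclassified, 1 = Secret, ... */
    long timestamp;   /* entry time */
    const char *val;
} Tuple;

static const Tuple *resolve(const Tuple *a, const Tuple *b) {
    if (a->level != b->level)
        return a->level > b->level ? a : b;       /* higher level wins */
    return a->timestamp >= b->timestamp ? a : b;  /* latest time-stamp wins */
}

int main(void) {
    Tuple u = { 1, 0, 100, "unclassified value" };
    Tuple s = { 1, 1,  90, "secret value" };
    const Tuple *kept = resolve(&u, &s);
    printf("key %d resolved at level %d: %s\n",
           kept->key, kept->level, kept->val);
    return 0;
}
```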

Figure 3. Distributed Architecture

If polyinstantiation occurs across sites (that is, there are Secret and Unclassified versions of a relation EMP, but

89

Authorized licensed use limited to: Univ of Texas at Dallas. Downloaded on April 16,2010 at 17:53:02 UTC from IEEE Xplore. Restrictions apply.

Page 3: SECURE QUERY PROCESSING IN DISTRIBUTED DATABASE MANAGEMENT

3

the versions are at different sites and there are tuples in the versions with the same primary key) then the distributed query processing algorithm operating at the higher level should either consider both tuples or discard the one at the lower security level. The strategy chosen will depend on what the user requested.

There are two ways to distribute relations. Each of these methods is described below. In the first approach, called the top-down approach, the relation is first fragmented depending on some selection criterion. This criterion could depend either on (1) the values of some attributes in the relation to be fragmented (primary fragmentation), or (2) the values of attributes in another relation (derived fragmentation).

In the second approach, called the bottom-up approach, the distribution is as follows. The relation is not fragmented according to some selection criterion. Instead, the tuples are entered into the version of the relation at the appropriate security level. That is, the security level of the tuple is that of the subject who enters it. As more and more tuples are entered into a version, the version of the relation may get fragmented at the same security level. Four of the strategies to fragment a version of a relation are round-robin, hashed, range partitioned with user-specified placement value, and range partitioned with uniform distribution [DEWI85]; two of these are sketched below.
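The paper does not detail these strategies; the following C sketch illustrates two of them, round-robin and hashed placement, under assumed names and a fixed fragment count (the range-partitioned variants would map key ranges to fragments instead):

```c
/* Sketch of two fragment-placement strategies cited from [DEWI85]:
   round-robin and hashed placement of an incoming tuple among the
   fragments of a version. All names here are illustrative. */
#include <stdio.h>

#define NUM_FRAGMENTS 4

/* Round-robin: the i-th inserted tuple goes to fragment i mod n. */
static int place_round_robin(long insert_count) {
    return (int)(insert_count % NUM_FRAGMENTS);
}

/* Hashed: the tuple's key determines its fragment. */
static int place_hashed(int key) {
    return (int)((unsigned)key % NUM_FRAGMENTS);
}

int main(void) {
    for (long i = 0; i < 5; i++)
        printf("tuple %ld -> round-robin fragment %d, hashed fragment %d\n",
               i, place_round_robin(i), place_hashed((int)(i * 37)));
    return 0;
}
```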

An objective in the design of distributed database schemas is to be able to decompose the global schema into local schemas without any loss of information or erroneous representations. Similarly, it should be possible to recombine the individual local schemas into the global schema without any loss of information. In a multilevel environment, if all of the schemas are assigned the same security level, then this is handled no differently from the schema of a non-trusted DDBMS. However, if the schemas are themselves multilevel, then it should be possible to decompose or recombine schemas without loss of information on a per-level basis. That is, it should be possible to decompose the global schema at security level L into the local schemas at security level L without loss of information. Similarly, it should be possible to recombine the local schemas at security level L into the global schema at security level L without loss of information.

The global representation model is a multilevel distributed relational model. The local representation model is also a multilevel relational model. Both models are the same, except that the global model represents the entire database and the local model represents the local database only. In a non-multilevel distributed environment, all of the tuples in a fragment of a relation are displayed in the global view of the relation. That is, if fragments R1 and R2 of a relation R are stored at sites 1 and 2 respectively, and if a user requests the global view of R, then all of the tuples in R1 and R2 will be displayed. However, this is not usually the case in a multilevel environment. This point is illustrated in the following example.

Consider the multilevel distributed database stored at sites 1 and 2 which is illustrated in figure 4. Figure 5 illustrates the Unclassified and Secret views at site 1.

Figure 6 illustrates the Unclassified and Secret views at site 2. Figure 7 illustrates the global Unclassified and Secret views.

Figure 4. Multilevel Distributed Database (site 1 holds fragments EMP-U and EMP-S; site 2 holds fragments EMP-U and EMP-S)

Figure 5. Views at Node 1 (the Unclassified view, and the Secret views with the lower level polyinstantiated tuples removed and not removed; the tuple listings are not recoverable from this copy)

We note the following:

- From the global Unclassified view one can obtain the local Unclassified views;
- From the two local Unclassified views, one can obtain the global Unclassified view;
- When lower level polyinstantiated tuples are not removed, it is possible to obtain the local Secret views from the global Secret view;
- When lower level polyinstantiated tuples are not removed, it is possible to obtain the global Secret view from the local Secret views;
- When lower level polyinstantiated tuples are removed, it is not possible to obtain the local Secret views from the global Secret view;
- When lower level polyinstantiated tuples are removed, it is not possible to obtain the global Secret view from the local Secret views.


Figure 6. Views at Node 2 (the Unclassified view, and the Secret views with the lower level polyinstantiated tuples removed and not removed)

Figure 7. Global Views (the Unclassified view, and the Secret views with the lower level polyinstantiated tuples removed and not removed; the tuple listings are not recoverable from this copy)

3. IMPLEMENTATION

This section describes a simple design of a query processing strategy and the implementation of this design. Our objectives in this implementation are to (1) validate the security policy for query processing in a TDDBMS, and (2) analyze the performance of the query processing algorithms.

The organization of this section is as follows: In section 3.1 a simple design of a TDDBMS is described. In section 3.2 the implementation environment is described. In section 3.3 the secure query processing algorithms are discussed. In section 3.4 the performance graphs are exhibited and analyzed.

3.1 Design of a TDDBMS

The distributed architecture that we considered for the implementation is the one shown in figure 3. In this architecture, various nodes are connected via a trusted network. Each node consists of a distributed query processor and a local TDBMS. The multilevel database associated with each node is managed by the TDBMS.

We assume that the user can pose a query from any node in the distributed system. The distributed query processor at the node where the query is posed will determine the site at which the query should be executed. This is done by examining the global schema, which has information on (i) the fragments of the various relations and (ii) the allocations of the fragments. The query is then routed to the execution site. The distributed query processor at that site performs query optimization and subsequently generates various execution strategies. The strategy with the least cost is selected for execution. Cost is determined by the number of tuples that have to be transmitted in a strategy. The query execution is monitored by the distributed execution monitor component of the distributed processor. The assembled response is sent to the site where the query was posed.
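As a rough C sketch of this cost rule (the strategy table and tuple counts are invented for illustration; the real DQP derives them from the global schema):

```c
/* Sketch of least-cost strategy selection, where the cost of an
   execution strategy is the number of tuples it must transmit.
   The candidate strategies here are hypothetical. */
#include <stdio.h>

typedef struct {
    const char *name;
    long tuples_transmitted;   /* the cost metric used in the paper */
} Strategy;

static const Strategy *cheapest(const Strategy *s, int n) {
    const Strategy *best = &s[0];
    for (int i = 1; i < n; i++)
        if (s[i].tuples_transmitted < best->tuples_transmitted)
            best = &s[i];
    return best;
}

int main(void) {
    Strategy options[] = {
        { "ship R1 fragments to node 2", 2000 },
        { "ship R2 fragments to node 1",  800 },
    };
    printf("selected strategy: %s\n", cheapest(options, 2)->name);
    return 0;
}
```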

We assume that the relations are horizontally partitioned within and across security levels. It is also possible for tuples to be polyinstantiated at different sites. However, it is not possible for two different tuples with the same primary key to exist at the same security level.

3.2 Implementation Environment

The implementation architecture is shown in figure 8. We implemented a two-node architecture. At each node, the local TDBMS is augmented with a distributed query processor (DQP). The DQP has interfaces to the user process as well as to the communication channel. The DQP was implemented in C on a SUN-3 running Berkeley UNIX (UNIX is a trademark of AT&T Bell Laboratories). We used the UNIX IPC (Inter-Process Communication) facilities to connect the DQP to the user as well as to the communication channel. The DQP has access to the global schema. This schema is replicated at each node. Note that the Local Query Processor (LQP) component is the trusted front-end of the local TDBMS. The LQP has access to the local schema, which describes the relations at the local node.
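The paper does not say which IPC mechanism was used; as one hypothetical illustration of such a hookup, a socketpair connecting a "user" process to the DQP:

```c
/* Hypothetical hookup of a "user" process to the DQP over UNIX IPC,
   using a socketpair. The message format is invented for illustration. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        perror("socketpair");
        return 1;
    }
    pid_t pid = fork();
    if (pid == 0) {                        /* child: the user process */
        close(fds[0]);
        const char *query = "SELECT * FROM EMP";
        if (write(fds[1], query, strlen(query) + 1) < 0)
            perror("write");
        close(fds[1]);
        return 0;
    }
    close(fds[1]);                         /* parent: the DQP */
    char buf[128];
    if (read(fds[0], buf, sizeof buf) > 0)
        printf("DQP received query: %s\n", buf);
    close(fds[0]);
    wait(NULL);
    return 0;
}
```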

Since a commercial TDBMS was not available to us at the time of implementation, we also implemented the local TDBMS. We used the trusted front-end/untrusted back-end distributed architecture proposed by the Air Force Summer Study [AFSB83] for the TDBMS. In this architecture, a trusted front-end is interconnected to untrusted back-end machines. Each back-end machine operates at a single level and manages the database at that level. That is, the Secret back-end machine manages the database at the Secret level and the Top Secret back-end machine manages the database at the Top Secret level. The SYBASE Dataserver [SYBASE] (trademark of Sybase Inc.), a commercial relational DBMS, was used for the back-end machines. The local query processor (LQP) illustrated in figure 8 performs the functions of the trusted front-end. A more detailed description of the implementation of the TDBMS is given in [THUR90].


Figure 8. Implementation Architecture

Since the nodes were not connected via a network, the performance of the strategies was not affected by the network transmission time. However, the impact of transferring tuples over the communication channel, as well as the security impact on the performance of query processing within a node, was exhibited in our experiment.

3.3 Query Processing Algorithms

We give algorithms for the select-all and the join queries. We assume that the relations are horizontally partitioned within and across security levels. Furthermore, polyinstantiation can occur within and across sites. We also assume that within a security level the primary key constraint cannot be violated.

I. Select-All Query

1. From the global schema, determine the various fragments associated with the relation specified in the query. The security level of each fragment must be dominated by the security level of the user who requested the query.

2. Send requests to the sites where the fragments are located.

3. Each site which receives a request specified in step 2 does the following:
(i) Request the local TDBMS to retrieve all the tuples from the corresponding fragments. The local TDBMS handles polyinstantiation within a site.
(ii) Send the tuples to the query execution site. (The query execution site is the site where the request was posed.)

4. The query execution site does the following:
(i) Merge all the tuples received;
(ii) Eliminate the lower level polyinstantiated tuples if requested by the user;
(iii) Give the response to the user.

II. Join Query

1. From the information in the global schema, determine the query execution site. Selecting the query execution site depends on the number of tuples that have to be transmitted during the query execution and the number of join operations that have to be performed.

2. Send the query to the execution site.

3. The query execution site does the following:
(i) Determine the execution strategy;
(ii) Execute the strategy (this may involve issuing requests to other sites to retrieve tuples and/or to perform certain joins);
(iii) Eliminate lower level polyinstantiated tuples if requested by the user;
(iv) Assemble the response and send it to the site which received the request from the user.

4. The site which received the request from the user displays the response to the user.
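A compressed C sketch of step 4 of the select-all algorithm (merging received tuples and optionally eliminating lower level polyinstantiated tuples); the data layout and values are assumptions made for illustration, not the system's actual structures:

```c
/* Illustrative merge step at the query execution site. When
   remove_lower is set, only the highest-level tuple for each primary
   key is kept; otherwise all received tuples are kept. */
#include <stdio.h>

typedef struct { int key; int level; const char *val; } Tuple;

/* Merge n received tuples into out; returns the number kept. */
static int merge(const Tuple *in, int n, Tuple *out, int remove_lower) {
    int m = 0;
    for (int i = 0; i < n; i++) {
        int j;
        for (j = 0; j < m; j++)
            if (remove_lower && out[j].key == in[i].key)
                break;
        if (j < m) {                       /* polyinstantiated: resolve */
            if (in[i].level > out[j].level)
                out[j] = in[i];            /* higher level replaces lower */
        } else {
            out[m++] = in[i];
        }
    }
    return m;
}

int main(void) {
    /* Tuples as received from two sites (0 = Unclassified, 1 = Secret). */
    Tuple recv[] = { {1, 0, "John"}, {2, 0, "Paul"}, {1, 1, "James"} };
    Tuple view[3];
    int m = merge(recv, 3, view, 1);
    for (int i = 0; i < m; i++)
        printf("key %d level %d name %s\n",
               view[i].key, view[i].level, view[i].val);
    return 0;
}
```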

3.4 Performance Analysis

In this section we first present the various performance graphs that we obtained from our experiments and then analyze the results. For each graph we also discuss the operating conditions under which the data was gathered.

3.4.1 Performance Graphs

We have obtained a total of 28 graphs. We assume that there are only two security levels: Unclassified and Secret (note that the algorithm can actually handle more levels). We also assume that the query is posed by a Secret user.

Although the query processing algorithm for join queries computes the cost of alternate strategies and selects the strategy with the least cost for execution, we do not test this mechanism, as the tuples are not transferred across an actual network. The algorithm that is tested forces a particular strategy to occur.

We assume that the processing is always performed at node 1. This assumption is not a restriction, as we consider the following two scenarios for each type of data distribution: the user poses a query at node 1; the user poses a query at node 2.

For each test case, we obtain the execution time taken at each node and the total execution time for the query (which is the sum of the execution times at nodes 1 and 2). The various test cases considered are described below.

Case 1 Select-All Query

1.1 Polyinstantiation is present, but the lower level polyinstantiated tuples are not removed.
1.1.1 Query posed at node 1:
1.1.1.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 1);
1.1.1.2 keep # of tuples in node 2 fixed at 2000; vary # of tuples in node 1 (Graph 5).
1.1.2 Query posed at node 2:
1.1.2.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 3);
1.1.2.2 keep # of tuples in node 2 fixed at 2000; vary # of tuples in node 1 (Graph 7).

1.2 Polyinstantiation is present and the lower level polyinstantiated tuples are removed.
1.2.1 Query posed at node 1:
1.2.1.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 2);
1.2.1.2 keep # of tuples in node 2 fixed at 2000; vary # of tuples in node 1 (Graph 6).
1.2.2 Query posed at node 2:
1.2.2.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 4);
1.2.2.2 keep # of tuples in node 2 fixed at 2000; vary # of tuples in node 1 (Graph 8).

Case 2 Join Query

2.1 Polyinstantiation is present, but the lower level polyinstantiated tuples are not removed.
2.1.1 Query posed at node 1:
2.1.1.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 9);
2.1.1.2 keep # of tuples in node 1 fixed at 0; vary # of tuples in node 2 (Graph 13);
2.1.1.3 keep # of tuples in node 2 fixed at 800; vary # of tuples in node 1 (Graph 17);
2.1.1.4 keep # of tuples in node 1 fixed at 800; vary # of tuples in node 2 (Graph 21).
2.1.2 Query posed at node 2:
2.1.2.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 11);
2.1.2.2 keep # of tuples in node 1 fixed at 0; vary # of tuples in node 2 (Graph 15);
2.1.2.3 keep # of tuples in node 2 fixed at 800; vary # of tuples in node 1 (Graph 19);
2.1.2.4 keep # of tuples in node 1 fixed at 800; vary # of tuples in node 2 (Graph 23).

2.2 Polyinstantiation is present and the lower level polyinstantiated tuples are removed.
2.2.1 Query posed at node 1:
2.2.1.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 10);
2.2.1.2 keep # of tuples in node 1 fixed at 0; vary # of tuples in node 2 (Graph 14);
2.2.1.3 keep # of tuples in node 2 fixed at 800; vary # of tuples in node 1 (Graph 18);
2.2.1.4 keep # of tuples in node 1 fixed at 800; vary # of tuples in node 2 (Graph 22).
2.2.2 Query posed at node 2:
2.2.2.1 keep # of tuples in node 2 fixed at 0; vary # of tuples in node 1 (Graph 12);
2.2.2.2 keep # of tuples in node 1 fixed at 0; vary # of tuples in node 2 (Graph 16);
2.2.2.3 keep # of tuples in node 2 fixed at 800; vary # of tuples in node 1 (Graph 20);
2.2.2.4 keep # of tuples in node 1 fixed at 800; vary # of tuples in node 2 (Graph 24).

2.3 Polyinstantiation is not present.
2.3.1 Query posed at node 1:
2.3.1.1 keep # of tuples in node 2 fixed at 800; vary # of tuples in node 1 (Graph 25);
2.3.1.2 keep # of tuples in node 1 fixed at 800; vary # of tuples in node 2 (Graph 27).
2.3.2 Query posed at node 2:
2.3.2.1 keep # of tuples in node 2 fixed at 800; vary # of tuples in node 1 (Graph 26);
2.3.2.2 keep # of tuples in node 1 fixed at 800; vary # of tuples in node 2 (Graph 28).

[The operational conditions, data tables, and plots for Graphs 1 through 28 appeared here; they are too badly garbled in this copy to reproduce reliably. Each "Operational Conditions" entry recorded the query (select-all from R, or join R1 and R2), the node from which the query was posed, the relation descriptions (EMP(SS#, Ename, D#), with SS# the key; DEPT(D#, Dname), with D# the key), the criteria (which node's tuple count was held fixed and which was varied), whether polyinstantiation was present (20% of the tuples polyinstantiated), and whether the lower level polyinstantiated tuples were eliminated. Each "Data" table and graph reported the CPU time in seconds at node 1, at node 2, and in total, against the number of tuples in the varied node.]

3.4.2 Analysis

Our implementation of the query processing algorithm chooses the node at which the processing should be done by computing the cost of various strategies and selecting the one with the least cost. The cost is determined by the number of tuples that have to be transmitted across the network.

In the testing we always forced node 1 to process the query. This did not matter, as the test cases that we considered were such that all possible scenarios were included. By examining the execution times obtained (specifically, comparing graphs 17 to 23, 18 to 24, and 25 to 28) we found that by choosing the strategy that causes less network traffic we would also choose the strategy that requires less CPU time. Therefore our assumption that network traffic is the main cost variable is correct.

Another effect that became apparent through the testing was that for the select-all query, eliminating the lower level polyinstantiated tuples had a significant impact on performance (compare graphs 1 to 2, 3 to 4, 5 to 6, 7 to 8). For a join query the impact was not as great (examine graphs 9 to 28). This is because when the lower level polyinstantiated tuples are removed before processing the join, fewer tuples are included in the join operation. Therefore, although some time is taken to check for polyinstantiation, some time is saved by processing fewer tuples in the join operation. A detailed analysis of the 28 graphs is given below.


Graph 1: As expected, the total CPU time increases with the number of tuples. Since node 2 does not have any tuples, the processing time in node 2 is due to the initial set-up of the system.

Graph 2: This graph shows that eliminating lower level polyinstantiated tuples has a significant impact on performance.

Graph 3: For the select-all query, the query is posed at node 2, but the tuples are processed at node 1. The execution time that is shown for node 2 reflects the cost of transferring the data from one node to another.

Graph 4: Again the processing is done at node 1. This is evidenced by the fact that the execution times for node 2 are almost identical in graphs 3 and 4. But the execution time for node 1 is much higher in graph 4, reflecting the cost of checking for polyinstantiation.

Graph 5: The total CPU time increases almost linearly with the number of tuples. Since node 2 has 2000 tuples, the execution times for graph 5 are higher than those for graph 1.

Graph 6: When horizontal partitioning is present, and one requests to remove the lower level polyinstantiated tuples, checking for polyinstantiation occurs in two phases. First the lower level polyinstantiated tuples occurring within a node are removed. Then the necessary data are sent to the node which received the query, and that node removes the lower level polyinstantiated tuples which occur across nodes. The CPU time for node 2 is constant for each case, as it only checks for local polyinstantiation over the same number of tuples.

Graph 7: Since tuples have to be moved from node 1 to node 2 and the number of tuples in node 1 is the variable, the execution times for graph 7 are higher than those for graph 5 when the number of tuples in node 1 increases to 2000 or more.

Graph 8: Unlike in graph 6, node 2 performs the check for polyinstantiation at the global level while node 1 checks for polyinstantiation only at the local level on varying data size.

Graph 9: As expected, the total CPU time increases with the number of tuples. Since node 2 does not have any tuples, the processing time in node 2 is due to the initial set-up of the system.

Graph 10: This graph shows that eliminating lower level polyinstantiated tuples has some impact on performance for the join query also (compare to graph 9). However, this is not so significant as in the case for the select-all query. This is because, for the join query, we tested with a smaller number of tuples, and also, when lower level polyinstantiated tuples are removed from the fragments, the number of tuples involved in the join operations also decreases.

Graph 11: For the join query, when the query is posed at node 2, most of the query processing is still performed at node 1. Therefore, the tuples are moved to node 1. The execution times for graph 11 are slightly higher than those for graph 9 because of the time taken to send the results of the query to node 2.

Graph 12: The relationships between graphs 9 and 11 are similar to the relationships between graphs 10 and 12.

Graph 13: Although node 1 does not have any tuples, according to our assumption node 1 performs the join. Therefore, the tuples are moved from node 2 to node 1. The execution times for graph 13 are higher than those for graph 9 because of the number of tuples that have to be moved from node 2 to node 1.

Graph 14: Note that the CPU times for node 1 are identical to those for graph 13. Only node 2 is affected by the removal of lower level polyinstantiated tuples. This is because no data is stored in node 1 and removing lower level polyinstantiated tuples at the global level is not necessary. Node 1 gets the necessary data and performs the same operations as it did in graph 13.

Graph 15: The execution times for graph 15 are slightly higher than those for graph 13 because of the time taken to move the tuples to node 2 to be displayed.

Graph 16: The relationship between graphs 13 and 15 is similar to the relationship between graphs 14 and 16.

Graph 17: The number of tuples in node 2 is 800. Therefore the execution times for this graph are higher than those for graph 9, as more tuples have to be moved from node 2.

Graph 18: This graph shows that eliminating lower level polyinstantiated tuples causes the performance to drop. Note that as the number of tuples increases, the CPU time for node 2 remains constant. This is because node 2 performs only the local removal of the lower level polyinstantiated tuples. Node 1 removes lower level polyinstantiated tuples locally as well as globally.

Graph 19: The execution times for this graph are higher than those for graph 17 as the tuples have to be moved to node 2 to be displayed.

Graph 20: The relationship between graphs 17 and 19 is similar to the relationship between graphs 18 and 20.

Graph 21: Although the number of tuples in node 1 remains constant, since the processing is performed at node 1, the tuples in node 2 are moved to node 1 for processing. The execution times for this graph are higher than those for graph 17 when the number of tuples in node 2 increases to 1600 or more. This is because more tuples have to be moved.


Graph 22: Note that as the number of tuples in node 2 increases, removing the lower level polyinstantiated tuples does not affect the CPU time for node 1 as much (compare to graph 21). This is because, as more tuples are removed, fewer tuples are compared in the join operation and as a result fewer matchings are found.

Graph 23: The execution times for this graph are higher than those for graph 21, as the tuples have to be moved to node 2 to be displayed. Also, when compared to graph 19, we note that the execution times are higher for graph 23 when the number of tuples in node 2 increases to 1600 or more.

Graph 24: The execution times for this graph are higher than those for graph 22, as the tuples have to be moved to node 2 to be displayed. Consider graphs 19, 23, 20, and 24. One would expect the execution times for graph 24 to be higher than those for graph 20 when the number of tuples in node 2 increases beyond some value. However, this does not happen. This is probably due to the way our algorithm handles polyinstantiation.

Graph 25: Compare graph 25 to graph 17. The execution times for graph 17 are slightly higher than those for graph 25, as more tuples are involved in the join operation when polyinstantiation is present (even if the lower level polyinstantiated tuples do not have to be removed).

Graph 26: The relationship between graphs 17 and 25 is similar to the relationship between graphs 19 and 26.

Graph 27: The relationship between graphs 17 and 25 is similar to the relationship between graphs 21 and 27.

Graph 28: The relationship between graphs 17 and 25 is similar to the relationship between graphs 23 and 28.

4. CONCLUSION

In this paper we first described the issues involved in secure distributed query processing. Next we described a simple design of a TDDBMS and discussed the implementation of the design for query processing. The TDDBMS that we implemented consisted of two nodes interconnected via a communication channel. Each node had a local TDBMS and a distributed query processor. The local TDBMS was implemented by augmenting a commercial relational database system with a front-end component. Our objectives in this implementation were (i) to validate the security policy, and (ii) to analyze the performance of secure query processing algorithms. From our simple analysis we were able to show that polyinstantiation, as well as the transfer of tuples between nodes, has a significant impact on performance.

The following are the major limitations of the TDDBMS that we have implemented: (1) the nodes are not connected over a communication network; (2) the local TDBMS is implemented on top of an untrusted commercial DBMS; (3) the system consists of only two nodes. In order to develop a more realistic system, the distributed system should consist of more than two nodes interconnected via a communication network. Furthermore, a secure commercial TDBMS should be used as the local TDBMS. Our future work will include such an implementation.

ACKNOWLEDGEMENT

The work reported in this paper was sponsored by the Department of the Navy (SPAWAR). It was carried out at the MITRE Corporation between February 1, 1989 and September 30, 1989.

REFERENCES

[AFSB83] Air Force Studies Board, Committee on Multilevel Data Management Security, "Multilevel Data Management Security," National Academy Press, 1983.

[BURN86] Burns, R., "Towards Practical MLS Database Management Systems Using the Integrity Lock Technology," Proceedings of the 9th National Computer Security Conference, MD, September 1986.

[CERI84] Ceri, S., and Pelagatti, G., "Distributed Databases: Principles and Systems," McGraw-Hill, NY, 1984.

[DENN87] Denning, D. E., et al., "A Multilevel Relational Data Model," Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, April 1987.

[DEWI85] DeWitt, D., and Gerber, R., "Multiprocessor Hash-based Join Algorithms," Proceedings of the 1985 VLDB Conference, Stockholm, Sweden, August 1985.

[GRAU82] Graubart, R., and Woodward, J., "A Preliminary Naval Surveillance DBMS Security Model," Proceedings of the 1982 IEEE Symposium on Security and Privacy, Oakland, CA, April 1982.

[GRAU85] Graubart, R., and Duffy, K., "Design Overview for Retrofitting Integrity-lock Architecture onto a Commercial DBMS," Proceedings of the 1985 Symposium on Security and Privacy, Oakland, CA, April 1985.

[STAC90] Stachour, P., and Thuraisingham, M. B., "Design of LDV - A Multilevel Secure Database Management System," IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 2, 1990.

[SYBASE] "The Dataserver," Sybase Inc., 1989.

[THUR90] Thuraisingham, M.B., and Kamon, A., "Secure Query Processing in a Trusted Database Management System - Design and Performance Studies," Technical Paper MTP 292, The MITRE Corporation, June 1990 (also submitted for publication).

[WALK85] Walker, S., "Network Security Overview," Proceedings of the 1985 IEEE Symposium on Security and Privacy, Oakland, CA, April 1985.
