picodbms: scaling down database techniques for the samartcard

PicoDBMS: Scaling down Database Techniques for the

Samartcard

Abstract• Smartcards are the morst secure computing device today

• As smartcadrs get more powerful, and become multi application, the need for database management arises.

• However smartcards have severe hardware limitations

The mojor problem is

SCALING DOWN DATABASE SCALING DOWN DATABASE TECHNIQUES SO THEY PERFORM TECHNIQUES SO THEY PERFORM

WELL UNDER THESE LIMITATIONSWELL UNDER THESE LIMITATIONS

Outline

• Introduction• Demand for database in smartcard

applications• Analysis of smartcard hardware

limitations&problem definition• PicoDBMS storage model and query

execution model• Conclusion

Introduction

• Smartcards are the most secure portable device today

• The first smartcard was developed by BULL for the French banking system

• Smartcards can be used in many applications involving money, proprietary data, personal data, TV or GSM subsciber, identification, healthcare,etc...

• Standards for multiapplicacion support

– JavaCard

– Microsoft’s smartcard for Windows

Why do we need DBMS on smartcards?• Multiapplication, versatility

• Improvement of prosessor power

• Volume of data

• Complexity of queries

• Meet Transaction ACID properties

More generally;

DATABASE MANAGEMENT HELPS TO SEPERATE DATA MANAGEMENT CODE FROM APPLICATION CODE, THEREBY MAKING APPLICATION CODE SMALLER

ACID properties

• Atomicity: In a transaction involving two or more discrete pieces of information, either all of the pieces are committed or none are.

• Consistency: A transaction either creates a new and valid state of data, or, if any failure occurs, returns all data to its state before the transaction was started

• Isolation: A transaction in process and not yet committed must remain isolated from any other transaction

• Durability: Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state

Hardware limitations due to

• Small size• Flexible plastic card• Low cost• Security

Sample Smartcard Block Diagram

Hardware Limitations• Memory bottleneck (96Kb of ROM, 4Kb of RAM,

128Kb of stable storage like EEPROM)• EEPROM has a very fast read time but very slow

write time (10msword)• Lack of autonomy, no asenchronious and

disconnected prosessing

With these extreme constraints of the smartcard, the major problem is scaling down database techniques

Some solutions to scaling down

• Sysbase Adaptive Server Anywere• Oracle 8i Lite• DB2 Anywhere• ISOL’s SQL Java machine DBMS• SCQL, The standart for smartcard database

language

PDA

Smartcard Applications

• Database Management Requirements

• Case Study:The Healthcard application

Types of smartcard applications• Money and identification (Credit card)

Tiny volume of data , automicity requirement

• Downloadable databases (list of hotels)

High volume of data, availibility requirement

• User environment (address book)

Medium volume of data, access right, automicity,durability requirement

• Personal folders (health care, car maintenance)

High volume of data, access rights

The Health Card application

• Personal Folder Application

• High volume of data is required ( Doctor info, blood type, prescriptions)

• Different users can query, modify, create data (Doctors, insurance agents)

Requirements due to observations

• Amount of data is significant• Queries can be rather complex (Ex: doctor asks for

the last antibiotic prescribed to the patient)• Sophisticated access right management

(aggregation, views)• Automicity must be preserved• Durability is mandatory

Storing database in a smart card rather than centralized database contributes data availability, security and simplicity

Problem formulation

• Smartcard constraints

• Impact on the PicoDBMS Architecture

• Problem Statement

Smartcard Constraints

• 32 bir RISC processor about 30 MIPS (monolithic chip)• Very limited storage capacity• Very slow write time in EEPROM• Extremely reduced size of RAM• Lack of autonomy• High security level• Tiny die size (about 25 mm²)• Low cost (less than 5 dollars)

2

As the chip size is limited MORE RAM means LESS EEPROM

How Smartcard properties impact on DBMS Architecture

• Highly Secure – access right management

• Highly portable- high availability

• Limited storage resources-memory consumption during execution

• Stable storage is not main main memory

• Non autonomous

Design Components for a PicoDBMS

• Storage manager

• Query manager

• Transaction manager

• Access right manager

Desin criterias for Smartcard DBMS

• Compactness rule (data,index)

• RAM Rule

• Write Rule

• Read Rule

• Access Rule

• Security Rule

PicoDBMS Storage Model

• Existing Storage Models

• Storage Cost Evaluation

Existing Storage Models

• Flat storage

• Domain Storage

• Ring Storage

Flat Storage

• Simplest way to organize data

• Tuples are stored sequentially

• Has 2 drawbacks– Space consuming

– Inefficient

• Indexed FS

Relation R

A xxxx

A yyyy

Domain Storage

Relation RRelation S

Value n

Value 2

Value 1

• Data Compactness

• The amount of data to be written is much smaller

• The length of the attribute and length of the pointer

• Cardinality of the relation

• All tuples of all relations become fixed size

• Tuple to value pointers

Ring Storage

• Index compactness + data compactness

• Storing Value to Tuple pointers instead of Tuple to Value Pointers

• One pointer per domain value whatever the cardinality of the relation is

Value n

Value 2

Value 1

Index on S.a

Join Indices

• Typically based on key attributes

• Ring index on a foreign key attribute

• Unidirectional storage model (Domain Storage)

• Bidirectional storage Model (Ring Index)

• The cost of bi-directional join is restricted to a single pointer per R tuple

Storage Cost EvaluationPicoDBMS storage model combines FS, DS,

and RS.

-Need for indexing attributes - the choice is between RS and Indexed FS.

- No need for indexing attributes- the choice is between FS and DS.

Comparison of the storage cost for a single attribute

• CardRel: cardinality of the relation holding the attribute

• A: Average length of attribute (bytes)

• P: Pointer size (3 bytes..)

• S:selectivity factor of the attribute =<1 (redundancy of the attribute) S=CardDom/CardRel

• CardDom=Cardinality of the attribute domain extensions

•Cost(FS) =Cardrel*a

•Cost(DS) =cardRel*p+ S*cardRel*a

•Cost(Indexed-FS) =Cost(FS)+ S*Cardrel*a+ Cardrel*p

•Cost(RS) =Cost(DS)+ S*cardRel*P

Domain Storage CostRelation R Relation

S

Value 1

Value 2

Value n

•Cost(DS) =cardRel*p+ S*cardRel*a

Ring Storage CostRelation S

S.aIndex on S.a•Rind index on a regular attribute

•Cost(RS) =Cost(DS)+ S*cardRel*P

FS=DS Curve (without ring index)

Cost(FS)=Cost(DS)

S=(a-p)/a

Where p=3

RS=DS Curve with Index

Cost(rS)=Cost(DS)

S=a/p

Where p=3

0 2 4 6 8 10 13 16 19 22 25 28 310

0.2

0.4

0.6

0.8

1

Length in bytes

S

RS more compact

FS

Query Processing

Traditional Query Processing uses large main memory for storing

• Intermediate Results

• Temporary data structures

They use state of the art algorithms to resort metarialization on disk to avoid memory overflow

Query Processing

• Basic Query Execution without RAM

• Complex Query Execution withot RAM

Why PicoDBMS cannot use those state of the art algorithms

• Lifetime of stable memory

• Unable to estimate RAM size

• Those algorithms are complex

•TO SOLVE THIS PROBLEM, WE PROPOSE QUERY PROCESSING TECHNIQUES THAT DO

NOT USE ANY WORKING RAM NOR INCUR ANY WRITES IN THE STABLE MEMORY

Basic Query Exeqution

• Query Optimizer generates Optimal Query Execution Plan (QEP)

• Query Engine

a library of relational operators

• Execution model

uses

implements

query parser and translator

query output

evaluation engine

relational algebra expression

execution plan

optimizer

data statistics about data

Basic Query Execution

Query Optimizer can consider different shapes of query execution plan

• Left-deep tree

• Right-deep tree

• Bushy trees

• Extreme right-deep tree (PicoDBMS offered)

Example:Healtcare relations

•Doctor(DocId,name,specialty,...)•Prescription(VisitId,DrugId,qty,...)•Visit(VisitId,DocId,date,diagnostics,...)•Drug(DrugId,name,type)

Q1:WHO PERSCRIBEN ANTIBIOTICS IN 1999

Left-deep tree

• Operators are executed sequentially

• Each intermediate result is metarialized

Ϭ

⋈

⋈

doc

⋈

Ϭvisitpresc

drug

Q1:WHO PERSCRIBED ANTIBIOTICS IN 1999

Right-deep tree•Execute operators in a pipelined fashion (avoid intermediate metarialization

•All left- operands must be metarialized

⋈

⋈doc

visit

⋈

drug

presc

Ϭ

Ϭ

Q1:WHO PERSCRIBEDANTIBIOTICS IN 1999

Bushy trees

⋈

⋈

⋈

Ϭ

visit doc

Ϭ

drug presc.

Offer opurtinities to deal with the size of intermediate results and memory consumption

Q1:WHO PERSCRIBEDANTIBIOTICS IN 1999

Iterator model

• Pipeline execution can be easily achieved using Iterator Model.

• Each operator is an iterator that supports 3 procedure calls– Open-prepare an operator for producing an item– Next- to produce an item– Close-perform final clean up

• Starts from the root.

Extreme right deep trees

Ϭ

⋈

doc

visit

⋈

presc

drug

Ϭ

⋈

•Uses pipelining on every operator(including select)

•As left operands are always base relations, they are already metarialized in stable memory.

NO RAM CONSUMPTION

Q1:WHO PERSCRIBED ANTIBIOTICS IN 1999

The cost of select,project and join with extreme right deep trees• These operators(SPJ) can be executed either sequentially

or with a ring index

• Select-ring index on select is allowed in the first base operation.(drug.type)

• Project-project operator are pushed up to the tree since no metarialization is required

• Join-The cost of the join depends on the way indices are traversed.– Unidirectional index

– Bidirectional index

Unidirectional index join cost

•m•n

•Doctor Doc.Id •Doc.Id Visit

•Start with Doctor--m*n•Start with Visit--m

Bidirectional index join cost

•m•n

•Doc.Id Visit

•Start with Doctor--m+n•Start with Visit--m*m/2n

•Doctor Doc.Id

Complex Query Execution without RAMComlex Query Operators:

• Aggregate(count,max,min)

• Sort

• Duplicate removal

The intermediate results need to be metarialized.Due to the No RAM rule such metarialization cannot occur in PicoDBMS.

PicoDBMS solution

• Aggregate and duplicate removal can be done in pipeline if the incoming tuples are yet grouped by gistinct values

• Pipeline operators are order-preserving since they consume tupls in the arrival order.

THUS, ENFORCING AN ADEQUATE CONSUMPTION AT THE LEAF OF THE EXECUTİON TREE ALLOWS

PIPELIED AGGREGATION AND DUPLICATE REMOVAL

Q2:Number of Antibiotics prescribed per doctor in1999

•Doctor contains distinct doctors•Aggregation

presc

count

Ϭ

⋈

Ϭ

⋈

⋈

drug

visit doc

Q3:Number of prescription per type of drug

•An additional join is required between relation Drug and domain Drug.type

count

⋈

⋈presc

drug drug.type

Q4:Number of prescriptions per doctor and type of drug

•Cartesian make n times more response time where n is the number of distinct type of drugs•“group by”

count

⋈

⋈drug

presc ⋈

visit

drug.type

x

doc

Q5:Distinct doctors and type of drug

•Q5 retreives the distinct couples of doctor and type of precsribed drugs

distinct

⋈

⋈drug

presc ⋈

visit

drug.type

x

doc

Performance Evaluation

• Reasonable response time is the main concern

• Evaluating response time in samrtcards is complex depending on platform parameters (Processor speed, caching strategy, RAM and EEPROM speed)

• Which acceleration techniques are really mandatory

– ring indices

– query optimization

Test Platform

• Intel 486 (with no cache, CISC)

• 3 types of data volumes is considered (Small, medium,large)

• for the platform, the system parameters (clock frequency, cache) are varied and experimentation test is performed) (best/worst configuration)

• Ring indices are tested, how it makes change in the response time

• 2 query executions are tested

– Q1:Who prescribes antibiotics in 1999

– Q4:Number of prescriptions per doctor and type of drugs

Performance results of Q1:Who prescribes antibiotics in 1999

0

0.2

0.4

0.6

0.8

1

Small DB Medium DB Large DB

no ring-worst

no ring-best

ring-worst

ring-best

Performance results of Q4: Number of prescriptions per doctor and type of drugs

0

5

10

15

20

25

Small DB MediumDB

Large DB

no ring-worstno ring-bestring-worstring-best

Test Results

• Join indices are mandatory in all classes

• Query optimiztion is useful only with large databases and complex queries

CONCLUSION

As smartcards becomebecome more and more versatile, multiapplications and powerful, the need for database techniques arises. Performance evaluations show that PicoDBMS’s query and execution models meet the smartcard requirments.

picodbms: scaling down database techniques for the samartcard

Documents

personal data

proprietary data

observationsamount of

data doctors

valid state of data

seperate data management

transaction durability

multi application