picodbms: scaling down database techniques for the samartcard
TRANSCRIPT
PicoDBMS: Scaling down Database Techniques for the
Samartcard
Abstract• Smartcards are the morst secure computing device today
• As smartcadrs get more powerful, and become multi application, the need for database management arises.
• However smartcards have severe hardware limitations
The mojor problem is
SCALING DOWN DATABASE SCALING DOWN DATABASE TECHNIQUES SO THEY PERFORM TECHNIQUES SO THEY PERFORM
WELL UNDER THESE LIMITATIONSWELL UNDER THESE LIMITATIONS
Outline
• Introduction• Demand for database in smartcard
applications• Analysis of smartcard hardware
limitations&problem definition• PicoDBMS storage model and query
execution model• Conclusion
Introduction
• Smartcards are the most secure portable device today
• The first smartcard was developed by BULL for the French banking system
• Smartcards can be used in many applications involving money, proprietary data, personal data, TV or GSM subsciber, identification, healthcare,etc...
• Standards for multiapplicacion support
– JavaCard
– Microsoft’s smartcard for Windows
Why do we need DBMS on smartcards?• Multiapplication, versatility
• Improvement of prosessor power
• Volume of data
• Complexity of queries
• Meet Transaction ACID properties
More generally;
DATABASE MANAGEMENT HELPS TO SEPERATE DATA MANAGEMENT CODE FROM APPLICATION CODE, THEREBY MAKING APPLICATION CODE SMALLER
ACID properties
• Atomicity: In a transaction involving two or more discrete pieces of information, either all of the pieces are committed or none are.
• Consistency: A transaction either creates a new and valid state of data, or, if any failure occurs, returns all data to its state before the transaction was started
• Isolation: A transaction in process and not yet committed must remain isolated from any other transaction
• Durability: Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state
Hardware limitations due to
• Small size• Flexible plastic card• Low cost• Security
Sample Smartcard Block Diagram
Hardware Limitations• Memory bottleneck (96Kb of ROM, 4Kb of RAM,
128Kb of stable storage like EEPROM)• EEPROM has a very fast read time but very slow
write time (10msword)• Lack of autonomy, no asenchronious and
disconnected prosessing
With these extreme constraints of the smartcard, the major problem is scaling down database techniques
Some solutions to scaling down
• Sysbase Adaptive Server Anywere• Oracle 8i Lite• DB2 Anywhere• ISOL’s SQL Java machine DBMS• SCQL, The standart for smartcard database
language
PDA
Smartcard Applications
• Database Management Requirements
• Case Study:The Healthcard application
Types of smartcard applications• Money and identification (Credit card)
Tiny volume of data , automicity requirement
• Downloadable databases (list of hotels)
High volume of data, availibility requirement
• User environment (address book)
Medium volume of data, access right, automicity,durability requirement
• Personal folders (health care, car maintenance)
High volume of data, access rights
The Health Card application
• Personal Folder Application
• High volume of data is required ( Doctor info, blood type, prescriptions)
• Different users can query, modify, create data (Doctors, insurance agents)
Requirements due to observations
• Amount of data is significant• Queries can be rather complex (Ex: doctor asks for
the last antibiotic prescribed to the patient)• Sophisticated access right management
(aggregation, views)• Automicity must be preserved• Durability is mandatory
Storing database in a smart card rather than centralized database contributes data availability, security and simplicity
Problem formulation
• Smartcard constraints
• Impact on the PicoDBMS Architecture
• Problem Statement
Smartcard Constraints
• 32 bir RISC processor about 30 MIPS (monolithic chip)• Very limited storage capacity• Very slow write time in EEPROM• Extremely reduced size of RAM• Lack of autonomy• High security level• Tiny die size (about 25 mm²)• Low cost (less than 5 dollars)
2
As the chip size is limited MORE RAM means LESS EEPROM
How Smartcard properties impact on DBMS Architecture
• Highly Secure – access right management
• Highly portable- high availability
• Limited storage resources-memory consumption during execution
• Stable storage is not main main memory
• Non autonomous
Design Components for a PicoDBMS
• Storage manager
• Query manager
• Transaction manager
• Access right manager
Desin criterias for Smartcard DBMS
• Compactness rule (data,index)
• RAM Rule
• Write Rule
• Read Rule
• Access Rule
• Security Rule
PicoDBMS Storage Model
• Existing Storage Models
• Storage Cost Evaluation
Existing Storage Models
• Flat storage
• Domain Storage
• Ring Storage
Flat Storage
• Simplest way to organize data
• Tuples are stored sequentially
• Has 2 drawbacks– Space consuming
– Inefficient
• Indexed FS
Relation R
A xxxx
A yyyy
Domain Storage
Relation RRelation S
Value n
Value 2
Value 1
• Data Compactness
• The amount of data to be written is much smaller
• The length of the attribute and length of the pointer
• Cardinality of the relation
• All tuples of all relations become fixed size
• Tuple to value pointers
Ring Storage
• Index compactness + data compactness
• Storing Value to Tuple pointers instead of Tuple to Value Pointers
• One pointer per domain value whatever the cardinality of the relation is
Value n
Value 2
Value 1
Index on S.a
Join Indices
• Typically based on key attributes
• Ring index on a foreign key attribute
• Unidirectional storage model (Domain Storage)
• Bidirectional storage Model (Ring Index)
• The cost of bi-directional join is restricted to a single pointer per R tuple
Storage Cost EvaluationPicoDBMS storage model combines FS, DS,
and RS.
-Need for indexing attributes - the choice is between RS and Indexed FS.
- No need for indexing attributes- the choice is between FS and DS.
Comparison of the storage cost for a single attribute
• CardRel: cardinality of the relation holding the attribute
• A: Average length of attribute (bytes)
• P: Pointer size (3 bytes..)
• S:selectivity factor of the attribute =<1 (redundancy of the attribute) S=CardDom/CardRel
• CardDom=Cardinality of the attribute domain extensions
•Cost(FS) =Cardrel*a
•Cost(DS) =cardRel*p+ S*cardRel*a
•Cost(Indexed-FS) =Cost(FS)+ S*Cardrel*a+ Cardrel*p
•Cost(RS) =Cost(DS)+ S*cardRel*P
Domain Storage CostRelation R Relation
S
Value 1
Value 2
Value n
•Cost(DS) =cardRel*p+ S*cardRel*a
Ring Storage CostRelation S
S.aIndex on S.a•Rind index on a regular attribute
•Cost(RS) =Cost(DS)+ S*cardRel*P
FS=DS Curve (without ring index)
Cost(FS)=Cost(DS)
S=(a-p)/a
Where p=3
RS=DS Curve with Index
Cost(rS)=Cost(DS)
S=a/p
Where p=3
0 2 4 6 8 10 13 16 19 22 25 28 310
0.2
0.4
0.6
0.8
1
Length in bytes
S
RS more compact
FS
Query Processing
Traditional Query Processing uses large main memory for storing
• Intermediate Results
• Temporary data structures
They use state of the art algorithms to resort metarialization on disk to avoid memory overflow
Query Processing
• Basic Query Execution without RAM
• Complex Query Execution withot RAM
Why PicoDBMS cannot use those state of the art algorithms
• Lifetime of stable memory
• Unable to estimate RAM size
• Those algorithms are complex
•TO SOLVE THIS PROBLEM, WE PROPOSE QUERY PROCESSING TECHNIQUES THAT DO
NOT USE ANY WORKING RAM NOR INCUR ANY WRITES IN THE STABLE MEMORY
Basic Query Exeqution
• Query Optimizer generates Optimal Query Execution Plan (QEP)
• Query Engine
a library of relational operators
• Execution model
uses
implements
query parser and translator
query output
evaluation engine
relational algebra expression
execution plan
optimizer
data statistics about data
Basic Query Execution
Query Optimizer can consider different shapes of query execution plan
• Left-deep tree
• Right-deep tree
• Bushy trees
• Extreme right-deep tree (PicoDBMS offered)
Example:Healtcare relations
•Doctor(DocId,name,specialty,...)•Prescription(VisitId,DrugId,qty,...)•Visit(VisitId,DocId,date,diagnostics,...)•Drug(DrugId,name,type)
Q1:WHO PERSCRIBEN ANTIBIOTICS IN 1999
Left-deep tree
• Operators are executed sequentially
• Each intermediate result is metarialized
Ϭ
⋈
⋈
doc
⋈
Ϭvisitpresc
drug
Q1:WHO PERSCRIBED ANTIBIOTICS IN 1999
Right-deep tree•Execute operators in a pipelined fashion (avoid intermediate metarialization
•All left- operands must be metarialized
⋈
⋈doc
visit
⋈
drug
presc
Ϭ
Ϭ
Q1:WHO PERSCRIBEDANTIBIOTICS IN 1999
Bushy trees
⋈
⋈
⋈
Ϭ
visit doc
Ϭ
drug presc.
Offer opurtinities to deal with the size of intermediate results and memory consumption
Q1:WHO PERSCRIBEDANTIBIOTICS IN 1999
Iterator model
• Pipeline execution can be easily achieved using Iterator Model.
• Each operator is an iterator that supports 3 procedure calls– Open-prepare an operator for producing an item– Next- to produce an item– Close-perform final clean up
• Starts from the root.
Extreme right deep trees
Ϭ
⋈
doc
visit
⋈
presc
drug
Ϭ
⋈
•Uses pipelining on every operator(including select)
•As left operands are always base relations, they are already metarialized in stable memory.
NO RAM CONSUMPTION
Q1:WHO PERSCRIBED ANTIBIOTICS IN 1999
The cost of select,project and join with extreme right deep trees• These operators(SPJ) can be executed either sequentially
or with a ring index
• Select-ring index on select is allowed in the first base operation.(drug.type)
• Project-project operator are pushed up to the tree since no metarialization is required
• Join-The cost of the join depends on the way indices are traversed.– Unidirectional index
– Bidirectional index
Unidirectional index join cost
•m•n
•Doctor Doc.Id •Doc.Id Visit
•Start with Doctor--m*n•Start with Visit--m
Bidirectional index join cost
•m•n
•Doc.Id Visit
•Start with Doctor--m+n•Start with Visit--m*m/2n
•Doctor Doc.Id
Complex Query Execution without RAMComlex Query Operators:
• Aggregate(count,max,min)
• Sort
• Duplicate removal
The intermediate results need to be metarialized.Due to the No RAM rule such metarialization cannot occur in PicoDBMS.
PicoDBMS solution
• Aggregate and duplicate removal can be done in pipeline if the incoming tuples are yet grouped by gistinct values
• Pipeline operators are order-preserving since they consume tupls in the arrival order.
THUS, ENFORCING AN ADEQUATE CONSUMPTION AT THE LEAF OF THE EXECUTİON TREE ALLOWS
PIPELIED AGGREGATION AND DUPLICATE REMOVAL
Q2:Number of Antibiotics prescribed per doctor in1999
•Doctor contains distinct doctors•Aggregation
presc
count
Ϭ
⋈
Ϭ
⋈
⋈
drug
visit doc
Q3:Number of prescription per type of drug
•An additional join is required between relation Drug and domain Drug.type
count
⋈
⋈presc
drug drug.type
Q4:Number of prescriptions per doctor and type of drug
•Cartesian make n times more response time where n is the number of distinct type of drugs•“group by”
count
⋈
⋈drug
presc ⋈
visit
drug.type
x
doc
Q5:Distinct doctors and type of drug
•Q5 retreives the distinct couples of doctor and type of precsribed drugs
distinct
⋈
⋈drug
presc ⋈
visit
drug.type
x
doc
Performance Evaluation
• Reasonable response time is the main concern
• Evaluating response time in samrtcards is complex depending on platform parameters (Processor speed, caching strategy, RAM and EEPROM speed)
• Which acceleration techniques are really mandatory
– ring indices
– query optimization
Test Platform
• Intel 486 (with no cache, CISC)
• 3 types of data volumes is considered (Small, medium,large)
• for the platform, the system parameters (clock frequency, cache) are varied and experimentation test is performed) (best/worst configuration)
• Ring indices are tested, how it makes change in the response time
• 2 query executions are tested
– Q1:Who prescribes antibiotics in 1999
– Q4:Number of prescriptions per doctor and type of drugs
Performance results of Q1:Who prescribes antibiotics in 1999
0
0.2
0.4
0.6
0.8
1
Small DB Medium DB Large DB
no ring-worst
no ring-best
ring-worst
ring-best
Performance results of Q4: Number of prescriptions per doctor and type of drugs
0
5
10
15
20
25
Small DB MediumDB
Large DB
no ring-worstno ring-bestring-worstring-best
Test Results
• Join indices are mandatory in all classes
• Query optimiztion is useful only with large databases and complex queries
CONCLUSION
As smartcards becomebecome more and more versatile, multiapplications and powerful, the need for database techniques arises. Performance evaluations show that PicoDBMS’s query and execution models meet the smartcard requirments.