8/2/2019 Db Compare
1/31
Comparison of
Oracle, MySQL and
PostgreSQL DBMS
in the context of ALICE needs
Wiktor Peryt, Warsaw University of Technology, Faculty of Physics
-
We took the following approach: first, we determined which features of a DBMS are important from the point of view of such a large experiment.
We chose the following features:
Elementary features
  basic data types
  SQL language features
  declarative integrity constraints
  programming abstractions
  automatic generation of identifiers
  national characters support
-
Transactions and multi-user access
  transactions
  locks
  multi-user access
Programming in database
  stored procedures
  triggers
Elements of database administration
  access control
  backup copies
  data migration
-
Portability and scalability
  portability of DBMS
  scalability
Performance and VLDB (Very Large Databases)
  query optimization
  structures supporting query optimization
  support for analytical processing
  allocation of disk space
  data size limits
  VLDB implementations
Distributed databases
  access to multiple databases
  heterogeneous systems support
-
Special data types
  large objects in database
  post-relational extensions
  support for special data types
Application development and interfaces
  embedded SQL
  standard interfaces, additional interfaces
  interoperability with Web technology
  XML, CASE
-
Reliability
  failure recovery
Commercial issues
  technical support available
  market position
Having completed step one, we carried out the subsequent work in 3 subgroups; each of them dealt with only one DBMS.
The members of each subgroup had their own practical experience with the DBMS investigated by their subgroup.
This procedure allowed us to verify the information contained in manuals and other available documentation (for instance on the Internet).
As a result, 3 extended documents devoted to Oracle, MySQL and PostgreSQL were created.
-
Konrad Bohuszewicz undergraduate student
Maciej Czyzowicz undergraduate student
Michal Janik Ph.D. student
Dawid Jarosz undergraduate student
Piotr Mazan undergraduate student
Marcin Mierzejewski undergraduate student
Mikolaj Olszewski undergraduate student
Wiktor S. Peryt
Sylwester Radomski undergraduate student
Piotr Szarwas Ph.D. student
Tomasz Traczyk
Dominik Tukendorf undergraduate student
Jacek Wojcieszuk undergraduate student
Faculty of Electronics and Information Technology
Faculty of Mathematics and Information Sciences
Faculty of Physics
-
About Comparison
discussion by all people involved in this task
compilation was made by Dr. Tomasz Traczyk
the compilation circulated within the whole group a few times to make sure we avoided omissions or mistakes
this version of the document is accepted by all co-authors
we consider it a quite comprehensive and objective comparison
it also contains weights, which we call "importance", given separately for the Central database and for Lab-participants. The Central database should be a kind of data warehouse at CERN, containing all the data, including data transferred periodically from Lab-participants
the term "Lab-participants" denotes smaller databases in labs involved in ALICE experiment preparation
a few explanations of terminology used in the database domain are also included to make this document easy to comprehend for non-specialists
-
Summary
                                      Importance                 Assessment
Category / Problem                    Central  Lab-participants  MySQL  Oracle8  PostgreSQL
Elementary features
  Basic data types                    C        C                 B      C        A
  SQL                                 B        B                 C      B        B
  Declarative constraints             B        B                 C      A        A
  Programming abstractions            A        C                 D      A        C
  Generation of ids                   C        C                 C      A        A
  National chars                      B        C                 B      A        B
Transactions
  Transactions                        A        C                 D      A        A
  Locks                               A        C                 D      A        A
  Multiuser access                    A        D                 C      A        C
Programming in DB
  Stored procedures and triggers      B        C                 D      A        A
Administration
  Access control                      B        D                 A      A        B
  Backup                              A        C                 C      A        C
  Data migration                      C        C                 A      B        A
Portability and scalability
  Portability                         B        C                 B      A        B
  Scalability                         A        C                 B      A        C
Performance and VLDB
  Query optimization                  A        C                 B      A        B
  Structures supporting optimization  B        D                 D      A        B
  Support for OLAP                    B        D                 D      A        D
  Allocation of the disk space        A        C                 C      A        C
  Size limits                         A        B                 B      A        C
  VLDB implementations                A        C                 D      A        B
Distributed databases
  Access to multiple databases        C        D                 C      A        C
  Heterogeneous systems support       B        D                 D      B        D
Special data types
  Large objects                       B        B                 B      A        C
  Post-relational extensions          C        C                 D      A        B
  Support for special data types      C        C                 D      A        C
Application development and interfaces
  Embedded SQL                        C        C                 D      A        B
  Standard interfaces                 B        C                 B      A        B
  Additional interfaces               A        A                 A      A        A
  Web technology                      A        A                 B      A        B
  XML                                 B        C                 D      A        D
  CASE                                B        C                 D      A        D
Reliability
  Recovery                            A        B                 C      A        C
Commercial issues
  Prices                              C        A                 A      D        A
  Technical support                   A        B                 C      B        D
  Position on the market              A        C                 D      A        D
-
Our preliminary conclusions:
for the Central Data Repository for ALICE at CERN:
  only Oracle can be seriously taken into account
for Lab-participants (mainly for production phase databases):
  Oracle is also the best, but using MySQL or PostgreSQL is possible
  the choice between them is not obvious at the moment
Some extended tests concerning MySQL and PostgreSQL performance, stability etc. with real data for STAR SSD are still in progress in Warsaw.
They will be published in 1-2 weeks on the website:
http://ITS_DB_ALICE.if.pw.edu.pl
The document "Comparison of Oracle, MySQL and PostgreSQL DBMS" is available at the same address.
-
Questions for ALICE
How to start with databases for ALICE and how to manage the project?
General concept of system architecture
Databases in production phase
Software technologies recommended
DBMS platform choice
How to proceed?
-
Database types for ALICE
The following main categories of information should go into databases:
  production and assembly phase measurements and descriptive data: ProdPhase database
  calibration data: Calibration database
  configuration data: Configuration database
  detector condition data: Condition database
  run log data: RunLog database
  geometry data (?): Geometry database or part of CalibrationDB (?)
  some others? ... to be defined later, during "phase one" work
-
Database contents (1)
ProdPhase database
  all information coming from test-beds, from manufacturers, assembly processes, object flow between manufacturers and labs, etc.
RunLog database
  stores the summary information describing the contents of an experimental run and points to the locations where detailed information associated with the run is stored
An example of a Web-based interface developed for STAR by Sylwester Radomski (undergraduate student from the Faculty of Physics, WUT) can be seen at http://www.star.bnl.gov -> Computing, then the first item in the table "New"
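The RunLog idea described above (one summary row per run plus a pointer to where the detailed data lives) can be pictured with a minimal sketch. It assumes SQLite as a stand-in DBMS; the table name `run_log`, its columns and the example path are hypothetical and are not taken from the STAR or ALICE schemas.

```python
import sqlite3


def create_run_log(conn):
    """Create a hypothetical run-log table: one summary row per run."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS run_log (
            run_number    INTEGER PRIMARY KEY,
            start_time    TEXT NOT NULL,     -- ISO 8601 timestamp
            n_events      INTEGER NOT NULL,
            summary       TEXT,
            data_location TEXT NOT NULL      -- where the detailed run data is stored
        )
        """
    )


def register_run(conn, run_number, start_time, n_events, summary, data_location):
    """Insert one run summary using bound parameters."""
    conn.execute(
        "INSERT INTO run_log VALUES (?, ?, ?, ?, ?)",
        (run_number, start_time, n_events, summary, data_location),
    )


def locate_run(conn, run_number):
    """Return the storage location of a run's detailed data, or None if unknown."""
    row = conn.execute(
        "SELECT data_location FROM run_log WHERE run_number = ?", (run_number,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_run_log(conn)
    # All values below are invented for illustration.
    register_run(conn, 1, "2001-05-01T12:00:00", 5000, "test run", "/archive/run0001")
    print(locate_run(conn, 1))
```

The point of the design is that the run log stays small: it answers "what was in this run and where is it", while the bulk data remains wherever it was written.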
-
Why a database in the production phase?
The environment in which the archive facility operates is composed of many sources of information
We have to deal with data:
  produced by various test-bench systems
  entered manually by operators
  submitted by collaborating institutes and companies
Usually there are a number of distinct data formats
Files are stored in many locations
Consequently, without a database it is hard not only to locate the right piece of information but also to ensure the safety and good quality of the data
-
Goals for production phase database
secure archiving of all the test results in a repository
easy availability of information on the location of objects (in the geographic sense: manufacturers, labs), which makes arranging the assembly easier
the possibility of automatic assignment of quality attributes according to well-defined criteria
statistical analysis of the quality should be possible easily and at any time
preparing data for future on-line use by slow-control, DCS and DAQ
easy access to all data during the production and assembly phase
in the future: easy access to all data during the experiment run
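As an illustration of the "automatic assignment of quality attributes according to well-defined criteria" goal, here is a minimal sketch. The parameter names, the limits and the two-valued "good"/"bad" attribute are all assumptions made for the example; the slides do not define the actual criteria.

```python
def assign_quality(measurements, criteria):
    """Return 'good' if every measured value lies within its allowed range, else 'bad'.

    measurements: dict mapping parameter name -> measured value
    criteria:     dict mapping parameter name -> (low, high) inclusive limits
    A parameter that has a criterion but no measurement counts as a failure.
    """
    for name, (low, high) in criteria.items():
        value = measurements.get(name)
        if value is None or not (low <= value <= high):
            return "bad"
    return "good"


# Hypothetical criteria for a sensor test; names and limits are invented.
CRITERIA = {"leakage_current_uA": (0.0, 2.0), "dead_strips": (0, 15)}

if __name__ == "__main__":
    print(assign_quality({"leakage_current_uA": 0.7, "dead_strips": 3}, CRITERIA))
```

Keeping the criteria in a data structure rather than in code means they could themselves be stored in the database and changed without touching the software.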
-
DB production phase
Basic requirements:
data should be stored in a central repository to make management and maintenance easy and reliable
access to the data should be assured for everybody who participates in tests during the production phase, i.e. software allowing the use of Web browsers is necessary
registration of objects should be possible manually (by an operator with suitable privileges) as well as automatically (for example from a LabVIEW application or other software)
the software should allow even inexperienced users to create (SQL) queries to the database
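The last requirement, letting inexperienced users create queries, is often met by building the SQL from form fields. The following is a minimal sketch of that idea only; the table name `objects` and the column whitelist are hypothetical, and the `?` placeholder style is the one used by Python's DB-API drivers such as `sqlite3`.

```python
# Whitelist of columns a form may filter on (hypothetical schema).
ALLOWED_COLUMNS = {"object_id", "lab", "test_type", "quality"}


def build_query(table, filters):
    """Build a parameterized SELECT from {column: value} form input.

    Column names are checked against a whitelist and values are returned as
    bound parameters, so user input is never pasted into the SQL text.
    """
    if table != "objects":
        raise ValueError("unknown table")
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    print(build_query("objects", {"lab": "Warsaw", "quality": "good"}))
```

The returned pair can be passed directly to a cursor's `execute(sql, params)`; the user only ever picks columns and types values.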
-
From the point of view of domain experts ... (1)
there is an ever-increasing demand for centralized storage of data, for consistent and easy-to-use search and retrieval facilities
experts want to be able to retrieve and analyze the information in a user-friendly way, regardless of its origin
they do not want to be forced to perform several queries just because the data in question were taken by different data acquisition systems
-
From the point of view of domain experts ... (2)
they wish to do statistics on data sets spanning months (and more) without having to browse tens of subdirectories on backup storage devices
usually they prefer to use industry-standard, versatile software tools to process and analyze data
they certainly would not mind being able to automate their routine, everyday tasks
Their task is to look at the information, not to look for it
-
Requirements addressed to software developers
we should address those issues by providing a modular framework for archiving and for platform-independent retrieval of data in a heterogeneous distributed computing environment
our database system must be open enough to follow the inevitable evolution of information gathering systems related to the development of the particular detectors
we should be able to cope with fast-evolving new Internet technologies in order to take full advantage of the facilities they provide
-
DB for STAR - software technologies used
Use of:
  PHP4 software running on the server side
  C/C++ for API
  JAVA + SWING & JDBC for applications requiring more interactivity (JDBC = JAVA DataBase Connectivity)
seems to be the right choice of tools for client-side software development
-
On-the-fly plot creation ...
-
From SQL query to plot ...
Generation of plots and histograms from the database and putting them on the Web.
Attempt made by S. Radomski:
Data chain:
The HTTP server (Apache - Tomcat) calls a servlet (dbPlot) with a parameter: the SQL query.
The servlet in the HTTP server connects to a ROOT-based server through a socket and sends the query.
The ROOT server is a ROOT script which handles connections and scripting of the dbPlot class.
dbPlot::Init() reuses the existing connection to the database or creates a new one if the old one does not exist.
dbPlot::TakeData(): the server sends the query to the DB and takes data using the TSQLServer class.
dbPlot::TakeData() takes data from TSQLResult and puts them into a TNtuple. This function can recognise and parse the 'private' format of data stored in a BLOB.
dbPlot::PlotData() calls TTree->Draw() with proper parameters.
dbPlot::Style() sets colors and labels.
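The core of this chain (run a query, unpack the result including BLOB cells, fill a histogram) can be sketched without ROOT. This is not the dbPlot code: it is a Python illustration using SQLite, and it assumes that a BLOB holds packed little-endian 64-bit floats, since the real 'private' BLOB format is not described in the slides. The function names `take_data` and `histogram` are hypothetical.

```python
import sqlite3
import struct


def take_data(conn, query):
    """Run the query and flatten all result cells into a list of floats.

    Assumption: BLOB cells hold packed little-endian float64 values.
    """
    values = []
    for row in conn.execute(query):
        for cell in row:
            if isinstance(cell, bytes):
                count = len(cell) // 8
                values.extend(struct.unpack(f"<{count}d", cell[: count * 8]))
            elif cell is not None:
                values.append(float(cell))
    return values


def histogram(values, n_bins, low, high):
    """Count values into n_bins equal-width bins over [low, high)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        if low <= v < high:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((v - low) / width), n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE m (v REAL, b BLOB)")
    conn.execute("INSERT INTO m VALUES (?, ?)", (1.0, struct.pack("<2d", 2.0, 3.0)))
    print(histogram(take_data(conn, "SELECT v, b FROM m"), 4, 0.0, 4.0))
```

In the chain described above, the counterpart of `take_data` is dbPlot::TakeData() and the counterpart of `histogram` is the TTree->Draw() call; the drawing and styling steps are left out here.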
-
Performance and problems ...
One histogram takes about 1-2 s.
The slowest element in the chain is convert. Convert makes use of GhostScript. Creating PostScript and then converting it to PNG is overcomplicated, and a rather simple TGraph with ~758 lines takes ~10 s.
ROOT cannot generate GIF in -b mode.
Problems with memory deallocation in ROOT: after about 100 plots ROOT crashes.
Modification of Draw() in TTree:
For a 1-D histogram Draw() always makes 100 bins. If the data have their own grid (measurement precision) the plots look terrible, especially when histogramming integers.
A small modification in TTreePlayer makes it possible to recognize whether the data lie on a grid and sets number of
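One way to picture grid-aware binning is the sketch below. It is not the TTreePlayer modification itself, which the slides do not show; it assumes integer-valued data, is written in Python rather than ROOT C++, and uses hypothetical helper names `grid_step` and `choose_bins`.

```python
from functools import reduce
from math import gcd


def grid_step(values):
    """Return the common step of integer-valued data, or None if undetermined."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None
    diffs = [b - a for a, b in zip(distinct, distinct[1:])]
    return reduce(gcd, diffs)


def choose_bins(values, default_bins=100):
    """Pick (n_bins, low, high) so that each grid point gets its own bin.

    Falls back to default_bins over [min, max] when no grid is found or when
    the grid would need more bins than the default.
    """
    low, high = min(values), max(values)
    step = grid_step(values)
    if step is None:
        return default_bins, low, high
    n_points = (high - low) // step + 1
    if n_points > default_bins:
        return default_bins, low, high
    # Centre each bin on a grid point by widening the range by half a step.
    return n_points, low - step / 2, high + step / 2


if __name__ == "__main__":
    print(choose_bins([0, 2, 4, 4, 10]))
```

With a fixed 100 bins over the same range, most bins would stay empty and the filled ones would fall at irregular positions, which is the visual problem described above for integer data.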
-
Typical architecture for local site/lab, i.e. Lab-participant
measurements are performed on a dedicated computer
data are transferred over the Ethernet local network to the database
users can access the measurements by means of JAVA applets or PHP applications
a graphical user interface makes the construction of complex queries easy even for a user with no database experience
another capability of this applet is the visualisation of selected data
using JAVA, JDBC and PHP allows access to the database over the Internet or local network with the user's favourite browser
[Figure: local site architecture. Components shown: DUT, LabVIEW application, DB server (daemon), repository, JAVA applet, PHP applications, ROOT or AliROOT]
-
Production phase database for ALICE
[Figure: production phase database for ALICE. Lab sites shown: Lab 1 (somewhere in Europe), Lab 2, Lab 3, Lab 4, ..., Lab n. Lab-side components: DUT, LabVIEW application, MySQL server (daemon), MySQL repository, JAVA applet, PHP applications, ROOT or AliROOT. CERN-side components: ORACLE server (daemon), DBMS, DATA repository (data services), Data Archive Server Library (application services), AliROOT]
-
Central data warehouse at CERN: three-tier architecture
one can easily distinguish the three logical tiers, in line with present tendencies: client layer, application services layer and data services layer
each layer contains several components (not all shown in the picture)
the top level is a layer containing client applications, responsible for data transfer into the database and visualisation
the middle layer is composed of application services; this layer knows the logical structure and physical locations of data
the bottom layer contains the data and the Database Management System
[Figure: three-tier architecture. Client layer (modules and applications): filters, generic data loader, custom data loader, interactive software (WWW browsers, JAVA applets, PHP, HTML, command line utilities etc.). Application services: Data Archive Server Library. Data services: ORACLE server, DBMS, DATA repository. Also shown: AliROOT]
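The division of responsibilities between the three tiers can be sketched as three small classes. This is an illustration of the layering only: SQLite stands in for the ORACLE server, and the class names, the `measurements` table and the method names are all hypothetical.

```python
import sqlite3


class DataServices:
    """Bottom tier: the DBMS and the data (SQLite stands in for the ORACLE server)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE measurements (object_id TEXT, value REAL)")


class ApplicationServices:
    """Middle tier: the only layer that knows the logical structure of the data."""

    def __init__(self, data):
        self._data = data

    def load(self, object_id, value):
        self._data.conn.execute(
            "INSERT INTO measurements VALUES (?, ?)", (object_id, value)
        )

    def values_for(self, object_id):
        rows = self._data.conn.execute(
            "SELECT value FROM measurements WHERE object_id = ? ORDER BY rowid",
            (object_id,),
        )
        return [r[0] for r in rows]


class Client:
    """Top tier: transfers data in and displays it; never issues SQL itself."""

    def __init__(self, services):
        self._services = services

    def submit(self, object_id, value):
        self._services.load(object_id, value)

    def show(self, object_id):
        return f"{object_id}: {self._services.values_for(object_id)}"


if __name__ == "__main__":
    client = Client(ApplicationServices(DataServices()))
    client.submit("ssd-1", 1.5)
    client.submit("ssd-1", 2.5)
    print(client.show("ssd-1"))
```

Because the client depends only on the middle tier's methods, the table layout or even the DBMS underneath could change without any change to client code.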
-
How to start with databases for ALICE?
The project should be managed in a few phases
The project is large, so I strongly suggest applying a methodology proven in a "commercial environment"
-
How to manage the project? (1)
Phase 1: strategy (or planning) for the WHOLE project:
  determination of the scope of the project
  partitioning into subsystems (the natural way: by subdetectors, but not only)
  formulation of general models which could be applied
  creation of a list of actors/participants
  approximate time schedule for particular tasks
  initial choice of software technologies
-
How to manage the project? (2)
Successive phases should be performed in a "spiral cycle"
It simply means that particular subsystems are elaborated successively.
Each subsystem must go through the following phases:
  analysis/conceptual design
  software design
  development (programming)
  implementation
improvements/corrections in earlier completed subsystems must be continued during the work on successive subsystems
simultaneous work on several subsystems is a good practice
-
How to manage the project? (3)
work on a pilot project in parallel to the main one; it should use the same software technology and contain the most urgent things
efficiency tests; creation of "simulated data" with volumes similar to the expected ones
creation of "conceptual models" during the analysis phase is necessary before the design of subsystems; an appropriate formalism and CASE-class tools are needed for that. For Linux, UML (Unified Modelling Language) is an appropriate option
elaboration, for the whole project, of such standards as: system of keys, terminology, security, access rights etc.
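The efficiency-test item (simulated data with volumes similar to the expected ones) could start from something as small as the sketch below. It assumes SQLite as a stand-in DBMS, a hypothetical two-column table `sim`, and an arbitrary Gaussian value distribution; real tests would use the target DBMS, schema and data volumes.

```python
import random
import sqlite3
import time


def fill_simulated(conn, n_rows, seed=0):
    """Create a table and fill it with n_rows of reproducible random data."""
    rng = random.Random(seed)
    conn.execute("CREATE TABLE sim (object_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO sim VALUES (?, ?)",
        ((rng.randrange(1000), rng.gauss(0.0, 1.0)) for _ in range(n_rows)),
    )
    conn.commit()


def time_query(conn, sql, params=()):
    """Return (elapsed seconds, number of rows fetched) for one query."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return time.perf_counter() - start, len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    fill_simulated(conn, 100_000)
    elapsed, n = time_query(conn, "SELECT * FROM sim WHERE object_id = ?", (42,))
    print(f"{n} rows in {elapsed:.4f} s")
```

Running the same timing before and after adding an index on `object_id`, or at several values of `n_rows`, gives a first impression of how query time scales with volume.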
-
First steps ...
Start by formally organizing a central database group for ALICE
After that: begin phase 1 of the project, i.e. strategy for the WHOLE project/experiment
Partial, highest-priority tasks for this group:
  determination of the scope of the project
  formulation of general models which could be applied
  creation of a list of actors/participants (including 1-2 representatives from each subdetector!)
  initial choice of software technologies which could be used
  partitioning into subsystems
  analysis/conceptual design