![Page 1: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/1.jpg)
Using NoSQL technologies for handling of the CMS Conditions
Roland Sipos for the CMS Collaboration
Forum on Concurrent Programming Models and Frameworks, 17 June 2015
![Page 2: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/2.jpg)
Overview
● Intro
  ○ CMS Conditions Database
  ○ NoSQL
● Candidates
  ○ Test framework
  ○ Deployment
● Results
● Outlook
![Page 3: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/3.jpg)
Intro
![Page 4: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/4.jpg)
Conditions Database
Alignment and calibration constants that record a given “state” of the CMS detector.
Essential for the analysis and reconstruction of the recorded data.
Also critical for the dataflow: they need to be properly re-synchronized during data processing.
![Page 5: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/5.jpg)
CondDB - Details
Conditions are free from:
● Full table scans
  ○ Only “by key” (or range of keys) access
● Joins
● Complex, nested queries
● Transactions
  ○ Data is written once, and never deleted or altered
● Absolute consistency
  ○ The only consistency criterion: newly appended data should be available for reads ASAP! (in less than a few seconds)
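The access pattern above (write once, read only by key or by a range of keys, no scans, joins or updates) is simple enough to capture in a few lines. The following is a minimal sketch, not the CMS implementation: the class name is hypothetical, and the assumption that keys are increasing integers (for example run numbers) is made purely for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class AppendOnlyConditionStore:
    """Hypothetical write-once store: payloads are appended under increasing
    integer keys and read back only by key or by a key range."""

    def __init__(self) -> None:
        self._keys: List[int] = []  # stays sorted because keys only grow
        self._payloads: Dict[int, bytes] = {}

    def append(self, key: int, payload: bytes) -> None:
        # Write once: a key is never overwritten, and keys must increase.
        if key in self._payloads:
            raise ValueError(f"key {key} already written")
        if self._keys and key < self._keys[-1]:
            raise ValueError("keys must be appended in increasing order")
        self._keys.append(key)
        self._payloads[key] = payload

    def get(self, key: int) -> Optional[bytes]:
        # Exact "by key" lookup.
        return self._payloads.get(key)

    def get_range(self, first: int, last: int) -> List[Tuple[int, bytes]]:
        # Range-of-keys lookup [first, last] via binary search on sorted keys.
        lo = bisect.bisect_left(self._keys, first)
        hi = bisect.bisect_right(self._keys, last)
        return [(k, self._payloads[k]) for k in self._keys[lo:hi]]
```

Because nothing is ever updated or deleted, the sorted key list never needs rebalancing, which is one reason such data maps naturally onto key-value style stores.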
![Page 6: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/6.jpg)
CondDB - Motivations
Find alternative data storage technologies for the CMS Conditions data, for:
● Storing BLOBs
● and their metadata
● in a read-heavy environment
Further requirements:
● Durability
● High availability
● (Optional: scalability)
Do we really need relational access for such a use case?
![Page 7: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/7.jpg)
Relational vs. Non-relational
Relational:
● Based on the relational model: relational algebra
  ○ schema
● SQL: not necessarily, but the most widespread
● Transactions
  ○ ACID
● Well tested, "proven"
Non-relational:
● Not based on the relational model
  ○ schema-free
● Query languages may differ:
  ○ Datalog, XPath, etc.
● Unique operations (e.g. CRUD)
  ○ BASE
● Many, quite new solutions
  ○ beta-phase versions, etc.
![Page 8: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/8.jpg)
NoSQL - ACID vs. BASE
ACID
● Atomicity
  ○ "all or nothing" transactions
● Consistency
  ○ data is always valid
● Isolation
  ○ transactions are independent
● Durability
  ○ permanent state
BASE
● Basic Availability
  ○ it’s OK to give approximate answers
● Soft state
  ○ easier (schema) evolution
● Eventual consistency
  ○ stored data reaches a consistent state over time
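Eventual consistency can be illustrated with two replicas: a write lands on one replica, a read from the other is stale until the replicas exchange their data. The sketch below is a toy model under stated assumptions: "last writer wins" by an explicit version number and a single `anti_entropy` gossip round are simplifications chosen for illustration, not how any particular candidate database reconciles replicas.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """One copy of the data: key -> (version, value)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Tuple[int, str]] = {}

    def write(self, key: str, version: int, value: str) -> None:
        # Assumed rule: keep only the newest version (last writer wins).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas: List[Replica]) -> None:
    """One synchronisation round: copy every entry to every replica."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, version, value)
```

Between the write and the synchronisation round, the second replica returns no value: that window is exactly what the CondDB requirement of "available for reads in less than a few seconds" bounds.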
![Page 9: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/9.jpg)
NoSQL - CAP Theorem
"You can choose only two of the following three attributes." - Eric Brewer, 2000
● Consistency, a.k.a. "All clients see the same data at the same time!"
● Availability, a.k.a. "Every request gets a response about success or failure!"
● Partition tolerance
[Figure: CAP triangle annotated with “RDBMSs”, “Eventual consistency” and “Enforced consistency (PAXOS)”]
More than 10 years have passed since then, and a lot of misleading or false information has grown up around the theorem.
![Page 10: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/10.jpg)
Partitioning - General
● Distributed memory cache
● Clustering
  ○ scaling the persistency layer
● Separating operations (reads and writes)
  ○ dedicated master for writes, group of slaves for reads
● Sharding - horizontal partitioning
  ○ the storage volume is distributed among many nodes
Scale up: "put more RAM or a better processor into the server"
Scale out: distribute data among computing elements
Scale out = partitioning
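Separating reads from writes, as listed above, comes down to a routing decision in the client or a proxy. A minimal sketch follows; the class name and host names are hypothetical, and round-robin over the read slaves is one possible policy assumed here, not one prescribed by the slide.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Send writes to a single master; spread reads over slaves round-robin."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        if not slaves:
            raise ValueError("need at least one read slave")
        self.master = master
        self._slave_cycle = itertools.cycle(slaves)

    def route(self, operation: str) -> str:
        if operation == "write":
            return self.master  # all writes go to the one node
        if operation == "read":
            return next(self._slave_cycle)  # reads rotate over the slaves
        raise ValueError(f"unknown operation: {operation}")
```

For a read-heavy workload such as the Conditions, adding slaves increases read capacity without touching the write path.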
![Page 11: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/11.jpg)
Partitioning - Vertical
Splitting up the data stored in one entity into multiple entities. (It’s a bit more than normalization.)
“Different columns on different resources.”
E.g.: frequently accessed columns in a separate table, cached in memory; rarely used columns stored on disk.
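The hot/cold column split from the example can be sketched as follows. The column names (`id`, `since`, `hash`, `comment`) are hypothetical, and the two dictionaries merely stand in for "cached in memory" and "stored on disk".

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def split_vertically(
    rows: Iterable[Row], key: str, hot_columns: Iterable[str]
) -> Tuple[Dict[object, Row], Dict[object, Row]]:
    """Split each row into a 'hot' and a 'cold' part, both keyed by the row key."""
    hot_set = set(hot_columns)
    hot: Dict[object, Row] = {}
    cold: Dict[object, Row] = {}
    for row in rows:
        k = row[key]
        hot[k] = {c: v for c, v in row.items() if c in hot_set}
        cold[k] = {c: v for c, v in row.items() if c != key and c not in hot_set}
    return hot, cold


def rejoin(k: object, hot: Dict[object, Row], cold: Dict[object, Row], key: str) -> Row:
    """Rebuild the full row; this extra lookup is the cost of the split."""
    return {key: k, **hot[k], **cold[k]}
```

Queries that touch only the hot columns never read the cold part, which is the point of the split.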
![Page 12: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/12.jpg)
Partitioning - Horizontal
One entity, but its data may be distributed across many storage elements.
“Different rows on different resources.”
E.g.: year-based distribution across different storage elements, based on an “insertion time” constraint.
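The year-based example can be written as a placement function. In this sketch the year-to-storage-element mapping and the storage element names are hypothetical configuration; a record whose year has no configured storage element is rejected rather than guessed.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Record = Tuple[datetime, str]  # (insertion time, payload)


def shard_by_year(records: List[Record], year_to_shard: Dict[int, str]) -> Dict[str, List[Record]]:
    """Place each record on a storage element chosen by its insertion year."""
    placement: Dict[str, List[Record]] = defaultdict(list)
    for inserted_at, payload in records:
        year = inserted_at.year
        if year not in year_to_shard:
            raise KeyError(f"no storage element configured for {year}")
        placement[year_to_shard[year]].append((inserted_at, payload))
    return dict(placement)
```

All rows keep the same columns; only where they live differs, which is what distinguishes this from the vertical split.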
![Page 13: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/13.jpg)
NoSQL - General
NoSQL in keywords:
● Only a buzzword
  ○ Meaning: “One size does not fit all!”
● CAP Theorem
● ACID vs. BASE
● Different models
  ○ Document store, Key-value, Column-oriented, BigTable
NoSQL means: “we have options”!
Not against relational DBs, but a complement to them!
![Page 14: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/14.jpg)
NoSQL - Options
[Figure: landscape of database options, split into Relational and Non-relational, and into Operational and Analytic. Categories shown: NoSQL (Document, Key-value, Column oriented, Graph), NewSQL (Brand new, RDBMS add-on), DaaS, and "Flat, Hierarchical, Network, etc...".
Products shown: Oracle, IBM DB2, JustOneDB, MS SQL Server; Hadoop, Cloudera, Hadapt; Oracle TimesTen, IBM Infosphere, SAP (Hana, Sybase IQ), HP Vertica; Spark; Lotus Notes; CouchDB, MongoDB; MySQL, PostgreSQL, JustOneDB; Progress, Objectivity, Versant; McObject, MarkLogic; SQL Azure, RavenDB, Amazon RDS; Xeround, FathomDB, NuoDB; Riak, Redis, Voldemort, BerkeleyDB; Cassandra, Accumulo; BigTable, HyperTable, HBase; Neo4j; Couchbase, SimpleDB, App Engine; Clustrix, VoltDB, SnakeSQL; ScaleDB, MySQL Cluster, GenieDB, Tokutek, Drizzle]
source: Tim Gasper - Big Data Right Now: Five trendy open source technologies
![Page 15: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/15.jpg)
Challenges 1.
How to choose? Rule of thumb: benchmarking!
But an exact method for NoSQL benchmarking cannot exist. (It's not like TPC-X for RDBMSs.) Even if we want to compare them, we must deal with different...
  ○ problems and possibilities,
  ○ APIs,
  ○ partitioning techniques, etc.
![Page 16: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/16.jpg)
Challenges 2.
The main issue is that the design and the preferred use cases of the NoSQL databases are REALLY different. There is no "x is better than y" argument for EVERY use case.
Benchmarks are based on several use cases:
● different computing elements,
● compared by write/read/update/scan operations (mixed in a ratio of 90%/10%/0%)
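A benchmark mix like the one above is typically produced by drawing each operation at random according to fixed weights. In this sketch the weights (90% reads, 10% writes, 0% updates and scans) are an assumption for a read-heavy workload, since the slide's ratio does not state which percentage belongs to which operation; `generate_workload` is a hypothetical helper.

```python
import random
from typing import Dict, List


def generate_workload(weights: Dict[str, float], n_requests: int, seed: int = 42) -> List[str]:
    """Draw a sequence of operation names according to percentage weights."""
    if abs(sum(weights.values()) - 100.0) > 1e-9:
        raise ValueError("weights must sum to 100 (%)")
    rng = random.Random(seed)  # seeded, so a benchmark run can be repeated exactly
    ops = list(weights)
    return rng.choices(ops, weights=[weights[o] for o in ops], k=n_requests)


# Assumed read-heavy mix; operations with weight 0 are never drawn.
mix = {"read": 90, "write": 10, "update": 0, "scan": 0}
workload = generate_workload(mix, 10_000)
```

Fixing the seed matters when comparing candidates: every database then receives the same request sequence.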
![Page 17: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/17.jpg)
Prototypes - The candidates
![Page 18: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/18.jpg)
Selection
In multiple phases...
Find:
● Showstopper problems (no-go)
● Barely usable (some issues)
● Promising candidates
Preliminary testing.
![Page 19: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/19.jpg)
Candidates
No-go
● HBase (w/ HDFS)
  ○ BLOB size problem
● CouchDB
  ○ Drivers
● Hypertable
  ○ In development
● etc.: application-layer needs, CAP characteristics, durability problems
Promising
● MongoDB
● Cassandra
So-so
● Riak
  ○ Query routing!
● (Couchbase)
![Page 20: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/20.jpg)
Deployment
Automated virtual environments on OpenStack.
  ○ Personal tenant - biased by user interactions
  ○ Thanks to the collaboration with CERN IT, the evaluation was made on dedicated resources
  ○ SSD-cached vs. disk comparisons were also made
Details:
  ○ No overcommit
  ○ Instances are “equally” distributed on the hypervisors (for 5 nodes: 2-2-1 on 3 hypervisors)
  ○ 1 Gbit NICs (shared between co-hosted VMs)
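The "equal" distribution (2-2-1 for 5 nodes on 3 hypervisors) is what a simple round-robin placement produces. The slide only states the outcome, so the round-robin rule and the instance and hypervisor names below are assumptions, not a description of the OpenStack scheduler.

```python
from typing import Dict, List


def spread_instances(n_instances: int, hypervisors: List[str]) -> Dict[str, List[str]]:
    """Place instances round-robin so hypervisor loads differ by at most one."""
    placement: Dict[str, List[str]] = {h: [] for h in hypervisors}
    for i in range(n_instances):
        host = hypervisors[i % len(hypervisors)]
        placement[host].append(f"node-{i + 1}")
    return placement


layout = spread_instances(5, ["hv1", "hv2", "hv3"])
# {'hv1': ['node-1', 'node-4'], 'hv2': ['node-2', 'node-5'], 'hv3': ['node-3']}
```

Co-hosted instances share one 1 Gbit NIC, so with a 2-2-1 layout two of the hypervisors carry twice the network traffic of the third.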
![Page 21: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/21.jpg)
Evaluation
Empirical evaluation: check whether a given prototype meets the usability and performance criteria of the desired solution.
If more than one passes the criteria, choose the best, based on essential features and performance characteristics.
![Page 22: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/22.jpg)
CustomSamplers 1.
An extension for JMeter with CMS-specific needs, in order to measure the performance of different databases. For each candidate the extension has:
● Deployers
  ○ To build up the data model
● QueryHandlers
  ○ Simulate the CMS workflow
● ConfigElements
  ○ Configure persistency objects
● Samplers
  ○ Report to the testplan listeners
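The role of a Sampler can be sketched independently of JMeter: time one call into a query handler and hand the result to every registered listener. This Python sketch only mirrors that role; JMeter's real Java sampler API is not reproduced, and `Sampler`, `SampleResult` and the listener callbacks here are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SampleResult:
    label: str
    elapsed_ms: float
    success: bool


class Sampler:
    """Times one query-handler call and reports the result to all listeners."""

    def __init__(self, label: str, query: Callable[[], object]) -> None:
        self.label = label
        self.query = query  # stands in for a QueryHandler simulating the CMS workflow
        self.listeners: List[Callable[[SampleResult], None]] = []

    def sample(self) -> SampleResult:
        start = time.perf_counter()
        try:
            self.query()
            ok = True
        except Exception:
            ok = False  # a failed query is reported, not raised
        elapsed = (time.perf_counter() - start) * 1000.0
        result = SampleResult(self.label, elapsed, ok)
        for listener in self.listeners:
            listener(result)
        return result
```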
![Page 23: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/23.jpg)
CustomSamplers 2.
Testplans are XML configurations that set up the behaviour of the testing engine by controlling:
● the number of threads
● the ramp-up time of thread creation
● the configuration of the connection layers
● the assignment of requests to threads
1 TPS: 1200 threads started in 1200 seconds, resulting in 20-minute constant-stage tests.
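The 1 TPS figure follows from the ramp-up arithmetic: a ramp-up period spreads thread starts evenly, so 1200 threads over 1200 s start one per second for 20 minutes. The sketch assumes each thread issues a single request when it starts, so the start rate equals the request rate; the helper names are hypothetical.

```python
from typing import List


def ramp_up_offsets(n_threads: int, ramp_up_s: float) -> List[float]:
    """Start time (seconds) of each thread when starts are spread evenly."""
    interval = ramp_up_s / n_threads
    return [i * interval for i in range(n_threads)]


def thread_start_rate(n_threads: int, ramp_up_s: float) -> float:
    """Threads started per second; equals TPS if each thread sends one request."""
    return n_threads / ramp_up_s


offsets = ramp_up_offsets(1200, 1200)  # 0.0, 1.0, ..., 1199.0
duration_min = 1200 / 60               # 20 minutes
```

Under the same assumption, the higher load levels in the results (up to 9 TPS) would correspond to starting proportionally more threads in the same period.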
![Page 24: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/24.jpg)
Results
Increasing request numbers: 1-9 TPS (for both remote and single testplans)
● Exploring limits for saturating factors like:
  ○ Network bandwidth
  ○ Access of persistency objects
  ○ Storage elements (ephemeral disk/SSD, Ceph)
● Scaling out (different cluster setups):
  ○ Node numbers (5 x m1.large, 4 x m1.medium)
  ○ Routing techniques (round robin, token-aware)
  ○ Distributed testing (4 JMeter engines)
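The two routing techniques differ in whether the request key is used. Round robin ignores it; token-aware routing hashes the key to a token and sends the request directly to the node that owns that token, saving a hop inside the cluster. This is a simplified sketch: one evenly spaced token per node and an MD5-derived 32-bit token are assumptions for illustration, and it does not reproduce the partitioner or token map a real Cassandra driver uses.

```python
import bisect
import hashlib
import itertools
from typing import List


class RoundRobinRouter:
    """Ignores the key and cycles through the nodes."""

    def __init__(self, nodes: List[str]) -> None:
        self._cycle = itertools.cycle(nodes)

    def route(self, key: str) -> str:
        return next(self._cycle)


class TokenAwareRouter:
    """Hashes the key to a token and picks the node owning that token."""

    def __init__(self, nodes: List[str]) -> None:
        # Simplification: one token per node, evenly spaced on a 2**32 ring.
        step = 2**32 // len(nodes)
        self._tokens = [i * step for i in range(len(nodes))]
        self._nodes = list(nodes)

    @staticmethod
    def token(key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")  # 32-bit token

    def route(self, key: str) -> str:
        # Owner: node with the largest ring token <= the key's token.
        idx = bisect.bisect_right(self._tokens, self.token(key)) - 1
        return self._nodes[idx]
```

The same key always reaches the same node with the token-aware router, while round robin may send it to a node that has to forward the request.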
![Page 25: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/25.jpg)
Single node
Normal test - DISK
![Page 26: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/26.jpg)
Single node
Remote test - DISK
![Page 27: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/27.jpg)
Single node
Normal test - SSD
![Page 28: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/28.jpg)
Single node
Remote test - SSD
![Page 29: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/29.jpg)
Medium cluster
Remote test - DISK
![Page 30: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/30.jpg)
Medium cluster
Remote test - SSD
![Page 31: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/31.jpg)
Large cluster
Remote test - DISK
![Page 32: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/32.jpg)
Large cluster
Remote test - SSD
![Page 33: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/33.jpg)
Ceph volume
![Page 34: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/34.jpg)
Outro - Present and future
![Page 35: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/35.jpg)
Application layer
The current implementation of the session layer is highly modular and extensible with alternative storage backends. Steps:
● Handling persistency objects
  ○ Extending the software framework with NoSQL support
● Implementing the Session interfaces
  ○ Implementing the “equivalent” CondDB queries
● Testing
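The pluggable-backend idea can be sketched as an abstract session interface with interchangeable implementations, so application code never depends on the storage technology. Everything here is hypothetical: the method names, the tag/`since`/payload-hash layout and the dictionary-backed class (a stand-in for a real NoSQL driver) are illustrations, not the CMSSW session API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class ConditionsSession(ABC):
    """Hypothetical backend-neutral interface used by application code."""

    @abstractmethod
    def store_payload(self, payload_hash: str, blob: bytes) -> None: ...

    @abstractmethod
    def fetch_payload(self, payload_hash: str) -> Optional[bytes]: ...

    @abstractmethod
    def append_mapping(self, tag: str, since: int, payload_hash: str) -> None: ...

    @abstractmethod
    def mappings(self, tag: str) -> List[Tuple[int, str]]: ...


class InMemoryKeyValueSession(ConditionsSession):
    """Stand-in for a NoSQL backend: plain dictionaries instead of a driver."""

    def __init__(self) -> None:
        self._payloads: Dict[str, bytes] = {}
        self._tags: Dict[str, List[Tuple[int, str]]] = {}

    def store_payload(self, payload_hash: str, blob: bytes) -> None:
        self._payloads.setdefault(payload_hash, blob)  # write once, never altered

    def fetch_payload(self, payload_hash: str) -> Optional[bytes]:
        return self._payloads.get(payload_hash)

    def append_mapping(self, tag: str, since: int, payload_hash: str) -> None:
        self._tags.setdefault(tag, []).append((since, payload_hash))

    def mappings(self, tag: str) -> List[Tuple[int, str]]:
        return sorted(self._tags.get(tag, []))


def load_latest(session: ConditionsSession, tag: str) -> Optional[bytes]:
    """Application code written only against the interface."""
    entries = session.mappings(tag)
    return session.fetch_payload(entries[-1][1]) if entries else None
```

Swapping in another backend then means writing one more `ConditionsSession` subclass; `load_latest` stays unchanged, which is what makes a side-by-side comparison in release validation practical.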
![Page 36: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/36.jpg)
Integration
● Release validation
● Find differences between the current solution and the prototypes
  ○ Using real data
  ○ Real use cases - using CMSSW
This will be the final performance comparison between the different deployments.
![Page 37: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/37.jpg)
Outlook
● Understand and eliminate issues during release validation
● Fine-tune critical performance factors
● Formal evaluation and comparison of the different solutions
Long-term project! Not a “by tomorrow” change, but for LS2.
![Page 38: Using NoSQL technologies for handling of the CMS Conditions](https://reader035.vdocument.in/reader035/viewer/2022072617/62df23814c451c07f74258e1/html5/thumbnails/38.jpg)
The end
Thank you for your attention!
Any questions are welcome!
From: http://geek-and-poke.com/