![Page 1: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/1.jpg)
Evaluation of NoSQL databases for DIRAC
monitoring and beyond
Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe
On behalf of the LHCb collaboration
![Page 2: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/2.jpg)
Motivation
Evaluation of NoSQL databases for DIRAC monitoring and beyond, CHEP2015
- Develop a system for real-time monitoring and data analysis:
  - Focus on monitoring the jobs (not accounting)
- Requirements:
  - Optimized for time series analysis
  - Efficient data storage, data analysis and retrieval
  - Easy to maintain
  - Scales horizontally
  - Easy to create complex reports (dashboards)
- Why?
  - The current system is based on MySQL, which:
    - is not designed for real-time monitoring (more for accounting)
    - does not scale to hundreds of millions of rows (>500 million)
    - requires ~400 seconds to generate a one-month plot
    - is not suited to real-time analysis
    - is not schema-less:
      - the data format changes often
![Page 3: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/3.jpg)
Motivation
![Page 4: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/4.jpg)
Technologies used
- Database:
  - InfluxDB is a distributed time series database with no dependencies
  - OpenTSDB is a distributed time series database based on HBase
  - ElasticSearch is a distributed search and analytics engine
- Data visualization:
  - Grafana
    - Metric dashboard and graph editor for InfluxDB, Graphite and OpenTSDB
![Page 5: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/5.jpg)
Motivation
- Grafana dashboard:
![Page 6: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/6.jpg)
Technologies used
- Database:
  - InfluxDB is a distributed time series database with no dependencies
  - OpenTSDB is a distributed time series database based on HBase
  - ElasticSearch is a distributed search and analytics engine
- Data visualization:
  - Grafana
    - Metric dashboard and graph editor for InfluxDB, Graphite and OpenTSDB
  - Kibana
    - Flexible analytics and visualization framework
    - Developed for creating complex dashboards
![Page 7: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/7.jpg)
Technologies used
- Kibana dashboard:
![Page 8: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/8.jpg)
Technologies used
- Database:
  - InfluxDB is a distributed time series database with no dependencies
  - OpenTSDB is a distributed time series database based on HBase
  - ElasticSearch is a distributed search and analytics engine
- Data visualization:
  - Grafana
    - Metric dashboard and graph editor for InfluxDB, Graphite and OpenTSDB
  - Kibana
    - Flexible analytics and visualization framework
    - Developed for creating complex dashboards
- Communication:
  - RabbitMQ
    - Robust messaging system
![Page 9: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/9.jpg)
Overview of the System
![Page 10: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/10.jpg)
Hardware and data format
- RabbitMQ:
  - One physical machine
- 12 VMs provided by CERN OpenStack:
  - Each VM has 4 cores, 8 GB memory and 80 GB disk
  - We used 3 clusters with 4 nodes each
- Data format:
  - The records are sent to RabbitMQ in JSON format.
  - Each record must contain a minimum of four elements:
    - metric, time, key/value pairs, value
    - For example: {"Status": "Done", "time": 1404086442, "JobSplitType": "MCSimulation", "MinorStatus": "unset", "Site": "ARC.Oxford.uk", "value": 10, "metric": "WMSHistory", "User": "phicharp", "JobGroup": "00037468", "UserGroup": "lhcb_mc"}
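
The record layout above can be checked before a message is published. What follows is a minimal sketch, not the DIRAC implementation: the helper names `validate_record` and `encode_record` are hypothetical, and it assumes that every field other than `metric`, `time` and `value` counts as one of the key/value pairs.

```python
import json

# The three fixed fields named on the slide; everything else is treated
# as a key/value pair (an assumption made for this sketch).
REQUIRED_FIELDS = ("metric", "time", "value")


def validate_record(record):
    """Return the key/value pairs of a record, or raise ValueError if it is incomplete."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError("missing required fields: %s" % ", ".join(missing))
    tags = {k: v for k, v in record.items() if k not in REQUIRED_FIELDS}
    if not tags:
        raise ValueError("record needs at least one key/value pair")
    return tags


def encode_record(record):
    """Validate a record and serialize it to the JSON text sent to the message queue."""
    validate_record(record)
    return json.dumps(record, sort_keys=True)


# The example record from the slide.
record = {
    "Status": "Done", "time": 1404086442, "JobSplitType": "MCSimulation",
    "MinorStatus": "unset", "Site": "ARC.Oxford.uk", "value": 10,
    "metric": "WMSHistory", "User": "phicharp", "JobGroup": "00037468",
    "UserGroup": "lhcb_mc",
}
payload = encode_record(record)
```

The resulting `payload` string is what a producer would hand to its RabbitMQ client; the publishing call itself is left out because it depends on the client library and broker setup.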
![Page 11: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/11.jpg)
Performance comparison
- We recorded ~600 million records over ~1.5 months
- We defined 5 different queries:
  - Running jobs grouped by Site
  - Running jobs grouped by JobGroup
  - Running jobs grouped by JobSplitType
  - Failed jobs grouped by JobSplitType
  - Waiting jobs grouped by JobSplitType
- Query intervals: 1, 2, 7 and 30 days
  - Random interval:
    - Start and end times are generated randomly between 2015-02-05 15:00:00 and 2015-03-12 15:00:00
- A high workload is generated by 10, 50 and 100 clients (Python threads) to measure the response time and the throughput
  - REST APIs are used to retrieve the data from the DB
  - Each client uses a random query and a random period
  - All clients run continuously in parallel for 7200 seconds
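
The workload described above can be sketched with plain Python threads. This is an illustration under stated assumptions, not the benchmark code used for the talk: `run_query` is a hypothetical callable standing in for the REST request to the database, and the way a period is drawn (a fixed-length interval placed at random, or two random endpoints) is one reading of the slide.

```python
import random
import threading
import time
from datetime import datetime, timedelta

# (status, group-by field) pairs for the five queries listed on the slide.
QUERIES = [
    ("Running", "Site"), ("Running", "JobGroup"), ("Running", "JobSplitType"),
    ("Failed", "JobSplitType"), ("Waiting", "JobSplitType"),
]
# Interval lengths in days; None means "both endpoints random" (assumption).
INTERVAL_DAYS = [1, 2, 7, 30, None]
WINDOW_START = datetime(2015, 2, 5, 15, 0, 0)
WINDOW_END = datetime(2015, 3, 12, 15, 0, 0)


def random_period(rng):
    """Draw a (start, end) pair that lies inside the recorded time window."""
    window_s = (WINDOW_END - WINDOW_START).total_seconds()
    days = rng.choice(INTERVAL_DAYS)
    if days is None:
        a, b = sorted(rng.uniform(0, window_s) for _ in range(2))
        return WINDOW_START + timedelta(seconds=a), WINDOW_START + timedelta(seconds=b)
    length = timedelta(days=days)
    slack_s = window_s - length.total_seconds()
    start = WINDOW_START + timedelta(seconds=rng.uniform(0, slack_s))
    return start, start + length


def run_load(n_clients, duration_s, run_query, seed=0):
    """Run n_clients threads for duration_s seconds and report throughput and mean response time."""
    latencies = [[] for _ in range(n_clients)]  # one list per thread, so no lock is needed
    began = time.monotonic()
    deadline = began + duration_s

    def client(idx):
        rng = random.Random(seed + idx)
        while time.monotonic() < deadline:
            query = rng.choice(QUERIES)
            start, end = random_period(rng)
            t0 = time.perf_counter()
            run_query(query, start, end)
            latencies[idx].append(time.perf_counter() - t0)

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - began
    flat = [x for per_thread in latencies for x in per_thread]
    return {
        "requests": len(flat),
        "throughput_per_s": len(flat) / elapsed,
        "mean_response_s": sum(flat) / len(flat) if flat else 0.0,
    }
```

For the setup on the slide this would be called as `run_load(10, 7200, run_query)`, then again with 50 and 100 clients, where `run_query` issues the actual REST call.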
InfluxDB has not scaled after 2 days
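
As an illustration of what one of the five queries listed above could look like against ElasticSearch, here is a hedged sketch that only builds a request body. The function name `build_query` is hypothetical; the field names come from the example record, and the mapping (`Status` and the group-by fields as keywords, `time` as a date in epoch seconds) is an assumption. It uses the current query DSL (`fixed_interval` requires a recent ElasticSearch version), which is not necessarily the syntax available when the tests were run.

```python
import json
from datetime import datetime, timezone


def build_query(status, group_by, start, end, bucket="1h"):
    """Build a search body: jobs with a given status, grouped by a field, summed per time bucket."""
    return {
        "size": 0,  # only the aggregations are needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"metric": "WMSHistory"}},
                    {"term": {"Status": status}},
                    {"range": {"time": {
                        "gte": int(start.timestamp()),
                        "lte": int(end.timestamp()),
                        "format": "epoch_second",
                    }}},
                ]
            }
        },
        "aggs": {
            "groups": {
                "terms": {"field": group_by},
                "aggs": {
                    "over_time": {
                        "date_histogram": {"field": "time", "fixed_interval": bucket},
                        "aggs": {"jobs": {"sum": {"field": "value"}}},
                    }
                },
            }
        },
    }


# "Running jobs grouped by Site" over a one-day interval.
body = build_query(
    "Running", "Site",
    datetime(2015, 2, 5, 15, 0, 0, tzinfo=timezone.utc),
    datetime(2015, 2, 6, 15, 0, 0, tzinfo=timezone.utc),
)
request_json = json.dumps(body)
```

The JSON in `request_json` would be sent to the index's `_search` endpoint over the REST API; the index name and host are deployment-specific and are left out here.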
![Page 12: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/12.jpg)
Results: 10 clients
![Page 13: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/13.jpg)
Results: 50 clients
![Page 14: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/14.jpg)
Results: 100 clients
![Page 15: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/15.jpg)
Response time of all experiments
![Page 16: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/16.jpg)
Throughput of all experiments
![Page 17: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/17.jpg)
Conclusions
- ElasticSearch was faster than OpenTSDB and InfluxDB
  - It is easy to maintain
  - Marvel is a very good tool for monitoring the cluster
    - license required…
  - It can be easily integrated into the DIRAC portal
- OpenTSDB was slower than ElasticSearch, but it may scale better by adding more nodes to the cluster
  - It is not easy to maintain (many parameters have to be set correctly)
  - Very good monitoring of the cluster
- InfluxDB is a new time series database that is easy to use, but it does not scale
- Kibana can fulfil our needs
  - But we'll look at integration in the DIRAC portal
- Based on our experience, we decided to use ElasticSearch for real-time monitoring of jobs and for all real-time DIRAC monitoring systems
![Page 18: Evaluation of NoSQL databases for DIRAC monitoring and beyond Adrian Casajus Ramo, Federico Stagni, Luca Tomassetti, Zoltan Mathe On behalf of the LHCb](https://reader035.vdocument.in/reader035/viewer/2022062421/56649d345503460f94a0b0de/html5/thumbnails/18.jpg)
Thanks!
Questions, comments?