Databases at CERN,
experience and future
challenges
Eric Grancher, IT department, CERN
OakTable network member
2
0. Thanks to Christian Trieb and DOAG
1. Thanks to my colleagues
(especially Ignacio, Luna, Manuel and Zbigniew)
2. Blame on me for errors
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
3
CERN
• CERN - European Laboratory for Particle Physics
• Founded in 1954 by 12 Countries for fundamental
physics research in the post-war Europe
• Today 22 member states and world-wide collaborations
• ~1100 MCHF yearly budget
• 2’300 CERN personnel
• 10’000 users from 110 countries
4
Fundamental Research
• What is 95% of the Universe made of?
• Why do particles have mass?
• Why is there no antimatter
left in the Universe?
• What was the Universe like,
just after the "Big Bang"?
5
18 months of production experience
with 12c
6
The Large Hadron Collider (LHC)
Largest machine in the world
27 km, 6000+ superconducting magnets
Emptiest place in the solar system
High vacuum inside the magnets
Hottest spot in the galaxy
Lead-ion collisions create temperatures 100'000x hotter than the heart of the Sun
Fastest racetrack on Earth
Protons circulate 11245 times/s (99.9999991% the speed of light)
“Data deluge” video
7
LHC Computing Grid-WLCG
~600 million collisions per second, ~1 PB raw data per second before filtering, ~50 PB of new data annually
>500,000 cores in ≥42 countries, ≥170 computer centres around the world
http://cern.ch/about/computing
http://cern.ch/lhcathome/
8
CERN Tier0 metrics
9
WLCG - Germany
10
https://wlcg-rebus.cern.ch/
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
11
Oracle at CERN started with…
• Since 1982 – version 2.3
• Initially used for accelerator controls
• Currently supports hundreds
of applications in different domains
• LHC experiments metadata
• Accelerator control and logging
• Engineering applications
• Administrative support: HR, ERT, Finance, WMS
12
Credit: N. Segura Chinchilla, CERN
13
Credit: C. Roderick
Accelerator applications
CERN Database Services
• Over 100 Oracle databases, most of them RAC
• Running Oracle 11.2.0.4 and 12.1.0.2
• > 1100 TB of data files for production DBs in total, NAS as storage
• Oracle In-Memory in production since July 2015
• Examples of critical production DBs:
• Quench Protection System: 150'000 changes/s
• QPS redo > 135 TB / month
• But also DB as a Service (single instances)
• 310 MySQL, 70 PostgreSQL, 9 Oracle, 5 InfluxDB
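The Quench Protection System figures above imply a modest average redo size per change. A back-of-envelope check (a sketch: it assumes the 150'000 changes/s rate is sustained over a 30-day month, which is an assumption, not stated on the slide):

```python
changes_per_s = 150_000          # change rate (from the slide)
redo_tb_month = 135              # redo volume per month, TB (from the slide)

seconds_per_month = 30 * 86_400  # assuming a 30-day month
changes_per_month = changes_per_s * seconds_per_month

# Average redo generated per change, in bytes (1 TB = 1e12 bytes here)
bytes_per_change = redo_tb_month * 1e12 / changes_per_month
print(f"~{bytes_per_change:.0f} bytes of redo per change")
```

So each change generates on the order of 350 bytes of redo, a plausible figure for small row changes plus redo record overhead.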
14
15
CERN DBoD
• Database on
Demand
• MySQL
• PostgreSQL
• InfluxDB
16
Size and activity/redo
ACCLOG, credit: N. Tsvetkov
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
17
CERN and Oracle user groups
18
• Quite involved for many years including Alan Silverman, Sue Foffano, Ronny Billen, Geneviève Ferran, Stéphan Petit, Tomasz Ladzinski, Anna Suwalska
• At SOUG, EOUG, etc.
• Extremely useful to learn, share, and establish contacts, as well as to work with Oracle.
Enhancements (EOUG)
19
• FY2000
enhancements
Credit: Ronny Billen
                        1997-1998   1998-1999
Voting rounds                   6           5
User Groups contacted          30          30
User Groups logged on          21          20
User Groups voting             12          12
Users voting                   82          67
Votes                       13000       13000
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
20
Managed Cloud Service
21
• The main pluses for us are
• eBusiness Suite expertise:
• OMCS knows what needs to be patched and when, and prepares a proper intervention plan
• They perform the updates outside working hours and have a very good track record: we never experienced problems after these updates
• Monitoring
• Very good proactive monitoring that allows us to be aware of issues even before our customers experience them
• A few false positives, but no false negatives
• Minuses
• Issues with non-standard operations (e.g. updates of IFILE for CERN-specific TNS entries). Even if proper documentation exists, we had many cases where the person performing the change still did not follow the appropriate non-standard procedures
• Scheduling issues: lately missed a few intervention windows
• Lessons learned
• The OMCS service delivery manager can play a very important role when escalating particular issues. Make good use of his/her services
• Give appropriate leeway when scheduling interventions (more than the required minimum)
Credit: Giovanni Chierico, CERN
Role of tape at CERN
TSM backup:
• IBM
• 2 x TS3500
• 55 x TS1140
• 200 x JC, 12000 x JB
• 8 PB; ~2300 M files
• 18 x TSM 7.1.4 servers
CASTOR archive:
• IBM
• 3 x TS3500
• 46 x TS1150
• 6000 x JD media (10 TB)
6000 x JC media (7 TB)
• Oracle
• 4 x SL8500
• 40 x T10000D
• 10000 x T2 media (8 TB)
• 10 PB disk cache
• ~150 PB of data on tape
~25 PB of free space
• Over 7 PB of new data per month
• Peaks of up to 7 GB/s to tape
• Lifetime of data: infinite
Main file or object storages:
• AFS 450 TB, 3 G files
• EOS 40 PB, 450 M files
• Ceph 360 TB, 80 M objects
22
Credit:
Vlado Vahyl
253 PB on tape
556 M files
Credit:
Ian Bird
Experiment online systems
23
• Experiments rely on
a SCADA system
for their control
• Up to 150,000
changes / second
stored in Oracle
databases
Experiment offline systems
24
• Geometry DB
• Conditions DB
Credit:
Vakho Tsulaia
And… Fidelio Micros for CERN hotel
25
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
26
WLCG and LHC / experiments
27
Credit: F. Bordry
• Simple model based on today’s computing models, but with expected HL-LHC operating parameters (pile-up, trigger rates, etc.)
• At least x10 above what is realistic to expect from technology with reasonably constant cost
8 October 2016 Ian Bird
[Charts: Data estimates for 1st year of HL-LHC (PB), raw vs derived, and CPU needs for 1st year of HL-LHC (kHS06), per experiment: ALICE, ATLAS, CMS, LHCb]
Data:
• Raw: 2016: 50 PB; 2027: 600 PB
• Derived (1 copy): 2016: 80 PB; 2027: 900 PB
CPU:
• x60 from 2016
Technology at ~20%/year will bring x6-10 in 10-11 years
Estimates of resource needs for HL-LHC
Credit: Ian Bird
28
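The technology-growth estimate above is easy to verify by compounding (a sketch: the ~20%/year figure and the x60 CPU need are from the slide; everything else follows arithmetically):

```python
growth_per_year = 1.20   # ~20%/year price/performance improvement (from the slide)

for years in (10, 11):
    factor = growth_per_year ** years
    print(f"{years} years at 20%/year -> x{factor:.1f}")

# The stated HL-LHC CPU need is x60 over 2016 -- roughly an order of
# magnitude above what flat-budget technology evolution provides.
shortfall = 60 / growth_per_year ** 10
print(f"shortfall vs x60 need: ~x{shortfall:.0f}")
```

This matches the slide's "at least x10 above what is realistic to expect from technology with reasonably constant cost".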
Technical topics (WLCG evolution)
• Computing models
• Different scenarios
• Use of in-house, commercial, dedicated architectures, HPC, opportunistic, etc. resources
• Technology “choices” – may not be a choice but market-driven
• Data management and data access layer
• End-to-end performance considerations; models of data delivery, event streaming, etc.
• Networking
• Resource provisioning layer
• Workload management layer
• Analysis facilities – how will analysis be done – traditional vs ”query” vs ML, …
• The points above lead to ideas about facilities and how they may look
• The stated (and agreed) intention in the CWP discussion is to make these components as common and non-experiment specific as possible
• Clarify what really needs to be specific
• The CWP will provide the details of progress and R&D roadmaps in many key areas ---> http://hepsoftwarefoundation.org/
29
Credit: Ian Bird
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
30
CERN openlab collaboration
31
PARTNERS CONTRIBUTORS ASSOCIATES RESEARCH
CERN openlab
32
• Many successes with Oracle:
• RAC on Linux x86 (9.2 PoC and 10.1 production with ASM),
• Additional required functionality (IEEE numbers, OCCI, instant client, etc.),
• PVSS and RAC scalability,
• Monitoring with Grid Control,
• Streams world wide distribution,
• Active DG, GoldenGate,
• Analytics for accelerator, experiment and IT,
• In-memory database
• Etc.
• Current work on Bare Metal Cloud services
• Regular feedback with joint selection of topics, some of the projects are common with more than one partner
CERN openlab mission
• Evaluate state-of-the-art technologies in a
challenging environment and improve them.
• Test in a research environment today
technologies that will be used in many business
sectors tomorrow.
• Train the next generation of
engineers/researchers.
• Promote education and cultural exchanges.
• Communicate results and reach new audiences.
• Collaborate and exchange ideas to create
knowledge and innovation.
openlab/Oracle Cloud Infrastructure
33
Credit: Ana Lameiro Fernández
Katarzyna Maria Dziedziniewicz-Wojcik
Eva Dafonte Pérez
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
34
Databases and streaming
35
• Increasingly IoT and monitoring workloads
• Streaming for ingest
• Streaming for analysis
Credit: Lionel Cons
Credit: IT-CM-MM
Streaming for the CERT
• Since last year in test mode:
• 60K IN messages per second
• Aggregated bandwidth of around 200 MB/s; clients: 20 MB/s in, 80 MB/s out
• 10 brokers, around 100 partitions per broker -> 4 vCPU / 8 GB
• 72-hour retention policy; ~12 TB persistence total as of now
• Type of content: connections crossing the firewall, DNS requests, commands executed in shared Linux clusters...
36
Credit: Jose Carlos Luna Duran
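The retention figure above is roughly consistent with the ingest rate. A capacity sketch (the replication factor is an assumption, not stated on the slide):

```python
ingest_mb_s = 20        # client ingest bandwidth, MB/s (from the slide)
retention_h = 72        # retention policy, hours (from the slide)
replication = 2         # assumed Kafka replication factor

# Data retained on disk: ingest rate x retention window x replicas
stored_tb = ingest_mb_s * retention_h * 3600 * replication / 1e6
print(f"~{stored_tb:.1f} TB on disk")
```

About 10 TB, in line with the ~12 TB of persistence quoted on the slide once indexes and headroom are included.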
Hadoop
37
• Complements Oracle database features
• Interesting ecosystem with developments for
storage engines for efficiency
• Ex: ATLAS Event Index
• Over 120 billion records, 150 TB of data
• Current ingestion rate 5 kHz, 60 TB/year
Credit: Dario Barberis
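From the Event Index figures above, the average record size follows directly (a rough estimate, ignoring compression and indexing overhead):

```python
records = 120e9      # over 120 billion records (from the slide)
data_tb = 150        # total data, TB (from the slide)

avg_bytes = data_tb * 1e12 / records
print(f"~{avg_bytes:.0f} bytes per record")
```

Around 1.25 KB per record, i.e. small metadata records rather than event data itself.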
CERN IT Monitoring infrastructure
[Architecture diagram:
Data sources (FTS, Rucio, XRootD, Jobs, Lemon, syslog, app log, DB, HTTP feed; logs and Lemon metrics)
-> Transport (AMQ + Flume: AMQ / DB / HTTP / Log GW / Metric GW)
-> Kafka cluster (buffering), with Kafka sink and Flume sinks
-> Processing (data enrichment, data aggregation, batch processing)
-> Storage & Search (HDFS, Elastic Search, others e.g. InfluxDB)
-> Data access (CLI, API)]
• Data now 200 GB/day, 200M events/day
• At scale 500 GB/day
• Proved effective on several occasions
Credits: Alberto Aimar, IT-CM-MM
Critical for CC operations and WLCG
38
SWAN notebooks
39
MySQL, variety, optimiser
40
Credit: Ignacio Coterillo Coz
Evolution on the DB technologies
41
• NVRAM, FPGAs, low latency communication architecture, etc…
• In addition to well-known SQL deployments:
• Spark scale-out
• In-memory
• Embedded (not new)
• Time series based SQL
• Integration with machine learning?
SQL is key: example Apache Spark
42
# Spark SQL example
df.createOrReplaceTempView("temp_table")
df2 = spark.sql(
    "SELECT t.boolField, avg(t.doubleField) avg_double, max(t.intField) max_int "
    "FROM temp_table t "
    "WHERE t.acqStamp BETWEEN 1495760359000000000 AND 1495760370000000000 "
    "GROUP BY t.boolField").collect()

# Spark DataFrame API (equivalent query)
df2 = df.filter(df.acqStamp.between(1495760359000000000, 1495760370000000000)) \
    .groupBy("boolField").agg(func.avg("doubleField").alias("avg_double"),
                              func.max("intField").alias("max_int"))
df2.show()
SQL is key: example time series DB
43
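The deck shows the time-series example as an image, which is not reproduced here. As a generic illustration of the "time series based SQL" idea from the previous slide, a minimal self-contained sketch using SQLite (table and column names are made up):

```python
import sqlite3

# In-memory table of metric samples: one row every 10 seconds
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, host TEXT, cpu REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(t, "db01", 0.5 + 0.001 * t) for t in range(0, 600, 10)])

# Time-bucketed aggregation: average CPU per one-minute window,
# the core query pattern time-series databases optimise for
rows = conn.execute(
    "SELECT ts / 60 AS minute, avg(cpu) AS avg_cpu "
    "FROM metrics GROUP BY ts / 60 ORDER BY minute").fetchall()

for minute, avg_cpu in rows:
    print(minute, round(avg_cpu, 3))
```

Dedicated time-series engines (InfluxDB, or the SQL-on-time-series systems mentioned above) add retention, downsampling, and compression around this same declarative pattern.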
Evolution of services
44
• Use of Cloud resources for many solutions
• Initially for specific applications (SaaS only
availability)
• Increasingly for flexibility, automation (PaaS,
IaaS)
Evolution of our communities
45
• User groups
• Meetups
• Oracle user group SIG
• Emphasis on development
Evolution of developer needs
46
• Some see the database as a commodity (serialisation, horizontal scaling, "serverless")
• Some new architectures are simple to code with but complex to analyse
• Multi-languages, extensibility
• Variety of databases
• SQL is everywhere but…
Opportunities or threats?
47
• Serverless
• Multiple database systems
• Big Data / ML
• Cloud
• Security
Bruce Schneier: "Security is never something we actually want. Security is something we need in order to avoid what we don't want. It's also more abstract, concerned with hypothetical future possibilities. Of course it's lower on the priorities list than fundraising and press coverage. They're more tangible, and they're more immediate."
Credit: https://www.schneier.com/
Evolution of people and skills
48
• Oracle database administrator
• … to Oracle database engineer
• … to database engineer
• … to database and Hadoop engineer
• … to database / Hadoop / streaming engineer
• … to cloud platform engineer
• (… have bigger developer role, machine learning?)
Outline
• Introduction to CERN
• Oracle DB usage at CERN (history)
• CERN and Oracle user group(s)
• Oracle at CERN
• World Wide LHC Computing Grid challenges
• CERN openlab for ICT innovations
• Database evolutions and challenges
• Conclusions
49
Summary
50
• From accelerator control to…
• accelerator logging,
• administration,
• engineering systems,
• access control,
• laboratory infrastructure (cabling, network configuration, etc.),
• mass storage system,
• experiment online systems,
• experiment offline systems,
• CERN hostel
• Focus on developers / development
• Impact of cloud solutions
• New architectures (software and hardware)
• Very challenging and interesting years ahead of us (change / adapt / learn and exchange); we have to be ready to change more often
• People are at the heart of change
Thank you to DOAG and the event organisers.
I wish you a great conference.
I welcome your ideas, suggestions, feedback at [email protected]
See also: https://db-blog.web.cern.ch/
Q & A
52