Big Science and Big Data
Dirk Duellmann, CERN
Apache Big Data Europe, 28 Sep 2015, Budapest, Hungary

TRANSCRIPT

Page 1: Big Science and Big Data

Dirk Duellmann, CERN

Apache Big Data Europe, 28 Sep 2015, Budapest, Hungary

Page 2:

Page 3:

Page 4:

Page 5: The ATLAS experiment

7,000 tons, 150 million sensors, generating data 40 million times per second, i.e. a petabyte/s.

Page 6: Data Collection and Archiving at CERN

[email protected]

LHCb: 200-400 MB/sec

Data flow to permanent storage: 4-6 GB/sec

Alice: 4 GB/sec

ATLAS: 1-2 GB/sec

CMS: 1-2 GB/sec

Data Collection and Archiving at CERN

Page 7: The Worldwide LHC Computing Grid

An international collaboration to distribute and analyse LHC data: it integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists.

• Tier-0 (CERN): data recording, reconstruction and distribution
• Tier-1: permanent storage, re-processing, analysis
• Tier-2: simulation, end-user analysis

• > 2 million jobs/day
• ~350,000 cores
• 500 PB of storage
• nearly 170 sites in 40 countries
• 10-100 Gb links

Page 8: LHC – Big Data

A few PB of raw data becomes ~100 PB:
• duplicate raw data
• simulated data
• derived data products
• versions as software improves
• replicas to allow access by more physicists

Page 9: How do we store/retrieve LHC data? A short history…

[email protected]

• 1st  Try  -­‐  All  data  in  an  commercial  Object  Database  (1995)  – good  match  for  complex  data  model  and  OO  language  integraLon  – but  the  market  predicted  by  many  analysts  did  not  materialise!  

• 2nd  Try  -­‐  All  data  in  a  relaLonal  DB  -­‐  object  relaLonal  mapping  (1999)  – PB-­‐scale  of  deployment  was  far  for  from  being  proven      –Users  code  in  C++    —  and  rejected  data  model  definiLon  in  SQL  

• Hybrid  between  RDBMS  and  structured  files  (from  2001  -­‐  today)  – RelaLonal  DBs  for  transacLonal  management  of  metadata  (only  TB-­‐scale)    

• File/dataset  meta  data,  condiLons,  calibraLon,  provenance,  work  flow  • via  DB  abstracLon  (plugins:  Oracle,  MySQL,  SQLite,  FronLer/SQUID)    

• Open  source  persistency  framework  (ROOT)  –Uses  C++  “introspecLon”  to  store/retrieve  networks  of  C++  objects  – Column-­‐store  for  efficient  sparse  reading

9

How  do  we  store/retrieve  LHC  data?  A  short  history…
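Not from the talk, but to make the ROOT approach concrete: a minimal sketch of column-wise storage and sparse reading with a TTree, assuming a ROOT installation; the file name, branch names and event loop are invented for illustration.

```cpp
// Minimal ROOT sketch: write events column-wise with TTree, then read back
// only one column. Compile against ROOT, e.g.:
//   g++ sketch.cxx $(root-config --cflags --libs)
#include "TFile.h"
#include "TTree.h"

int main() {
    // --- Write: each Fill() appends one event; each branch is a column.
    TFile out("events.root", "RECREATE");
    TTree tree("events", "demo tree");
    int    nTracks = 0;
    double energy  = 0.0;
    tree.Branch("nTracks", &nTracks, "nTracks/I");
    tree.Branch("energy",  &energy,  "energy/D");
    for (int i = 0; i < 1000; ++i) {
        nTracks = i % 50;
        energy  = 0.5 * i;
        tree.Fill();
    }
    tree.Write();
    out.Close();

    // --- Sparse read: activate only the "energy" column; the nTracks
    // column is never fetched from disk.
    TFile in("events.root");
    TTree* t = static_cast<TTree*>(in.Get("events"));
    t->SetBranchStatus("*", false);
    t->SetBranchStatus("energy", true);
    t->SetBranchAddress("energy", &energy);
    double sum = 0.0;
    for (Long64_t i = 0; i < t->GetEntries(); ++i) {
        t->GetEntry(i);
        sum += energy;
    }
    return 0;
}
```

For plain types a leaf-list branch like this is enough; storing whole networks of user-defined C++ objects relies on ROOT's dictionary ("introspection") machinery, as the slide notes.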

Page 10: Processing a TTree

[Diagram: event loop over a TTree via a TSelector; each event reads only the needed branches/leaves]
• Begin(): create histograms, define the output list
• Process(): called for each event 1…n; reads the needed parts only, applies the preselection, and if it passes (analysisOk) runs the analysis and fills the output list
• Terminate(): finalize the analysis (fitting, …)
A minimal TSelector skeleton follows below.
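The Begin/Process/Terminate flow above is ROOT's TSelector interface. A minimal skeleton, with the branch name, histogram and preselection cut invented for illustration:

```cpp
// Minimal TSelector skeleton following the slide's flow. The "energy"
// branch, the histogram and the cut value are illustrative only.
#include "TSelector.h"
#include "TTree.h"
#include "TH1D.h"

class MySelector : public TSelector {
    TTree* fChain  = nullptr;  // tree being processed
    TH1D*  fHist   = nullptr;  // example output histogram
    double fEnergy = 0.0;      // buffer for the one branch we read

public:
    void Begin(TTree*) override {
        // Create histograms and register them on the output list.
        fHist = new TH1D("h_energy", "Event energy", 100, 0., 500.);
        fOutput->Add(fHist);
    }
    void Init(TTree* tree) override {
        // Sparse reading: activate only the branches the analysis needs.
        fChain = tree;
        fChain->SetBranchStatus("*", false);
        fChain->SetBranchStatus("energy", true);
        fChain->SetBranchAddress("energy", &fEnergy);
    }
    Bool_t Process(Long64_t entry) override {
        fChain->GetEntry(entry);           // reads the active branches only
        if (fEnergy < 10.0) return kTRUE;  // preselection failed: next event
        fHist->Fill(fEnergy);              // analysisOk: fill the output
        return kTRUE;
    }
    void Terminate() override {
        // Finalize the analysis (fitting, ...), e.g. fHist->Fit("gaus");
    }
};
```

Such a selector is typically run with tree->Process(&selector), or distributed over many workers with PROOF.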

Page 11: CERN Disk Storage Overview

              AFS      CASTOR        EOS      Ceph         NFS      CERNBox
Raw capacity  3 PB     20 PB         140 PB   4 PB         200 TB   1.1 PB
Data stored   390 TB   86 PB (tape)  27 PB    170 TB       36 TB    35 TB
Files stored  2.7 B    300 M         284 M    77 M (obj)   120 M    14 M

• AFS is CERN's Linux home directory service
• CASTOR & EOS are mainly used for the physics use case (data analysis and DAQ)
• Ceph is our storage backend for images and volumes in OpenStack
• NFS is mainly used by engineering applications
• CERNBox is our file synchronisation service based on OwnCloud+EOS

Page 12: Tape at CERN

[Chart: archive read and write volumes: 15 PB, 23 PB, 27 PB]

• Data volume: 100 PB physics archive, 7 PB backup (TSM)
• Tape libraries: 3+2 x IBM TS3500, 4 x Oracle SL8500
• Tape drives: 100 physics archive, 50 backup
• Capacity: 70k slots, 30k tapes

A look into the future:
• LHC upgrades will further increase luminosity
• computing resource needs will be higher
• the data generated will increase drastically
• next accelerators: Future Circular Collider (80-100 km)

Page 13: Archive: Large scale media migration

[Timeline: repacking the LHC Run 1 archive onto new media, with the LHC Run 2 start as the deadline]
• Part 1: Oracle T10000D
• Part 2: IBM TS1150

Page 14:

Page 15: Smart vs Simple Archive: HSM Issues

• CASTOR had been designed as a Hierarchical Storage Management (HSM) system
  – disk-only and multi-pool support were added later, painfully
  – required rates for namespace access and file-open exceeded earlier estimates
• Around the LHC start, conceptual issues with the HSM model also became visible
  – "a file" is not a meaningful granule for managing data exchange; experiments use datasets
  – dataset parts needed to be "pinned" on disk by users to avoid cache thrashing
  – users had to "trick" the HSM into doing the right thing :-(

Page 16: EOS Project: Goals & Choices

• Server, media and file system failures need to be transparently absorbed
  – key functionality: file-level replication and rebalancing
  – data stays available after a failure, with no human intervention
• Fine-grained redundancy within one hardware setup
  – choose & change the redundancy level for specific data: either file replica count or erasure encoding (see the sketch below)
• Support bulk deployment operations
  – e.g. replace hundreds of servers at end of warranty
• In-memory namespace (sparse hash per directory)
  – file stat calls are 1-2 orders of magnitude faster
  – write-ahead logging for durability
• Later in addition: transparent multi-site clustering, e.g. between Geneva and Budapest
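To put numbers on the replica-count vs erasure-encoding choice above: a generic back-of-the-envelope sketch, not EOS code; the 2x replica count and the (10+2) layout are illustrative assumptions.

```cpp
// Generic sketch of the redundancy trade-off: N-fold replication vs (k+m)
// erasure encoding. The parameters are illustrative, not EOS defaults.
#include <cstdio>

// Raw bytes consumed to store `data` bytes with N full replicas.
double replicationRaw(double data, int nReplicas) {
    return data * nReplicas;
}

// Raw bytes consumed with erasure encoding: k data stripes + m parity stripes.
double erasureRaw(double data, int k, int m) {
    return data * (k + m) / static_cast<double>(k);
}

int main() {
    const double petabyte = 1.0;  // work in PB for readability
    // 2 replicas: survives 1 failure, 100% space overhead.
    std::printf("2x replication: %.2f PB raw per PB stored\n",
                replicationRaw(petabyte, 2));
    // (10+2) erasure encoding: survives 2 failures, only 20% overhead.
    std::printf("(10+2) erasure: %.2f PB raw per PB stored\n",
                erasureRaw(petabyte, 10, 2));
    return 0;
}
```

The trade-off: replication is simple and fast to repair, while erasure encoding buys more failure tolerance per raw byte at the cost of extra computation.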

Page 17: Connectivity (100 Gbps)

[Network diagram: 100 Gbps connectivity via Dante/Géant and T-Systems]

Page 18: EOS Raw Capacity Evolution

[Chart: EOS raw capacity evolution]

Page 19: Why do we develop our own open source storage software?

• A large science community is trained to be effective with a set of products
  – the efficiency of this community is our main asset, not just the raw utilisation of CPUs and disks
  – integration and specific support do matter
  – community sharing via tools and formats even more
• Long-term projects
  – a change of "vendor/technology" is not only likely but expected
  – we carry old but valuable data through time (bit preservation)
  – "loss of data ownership" after the first active project period

Page 20: Does Kryder's law still hold?

[Chart: HDD areal density CAGR; source: "HDD Opportunities & Challenges, Now to 2020", Dave Anderson, Seagate]

Page 21: Object Disk

• Each disk talks an object storage protocol over TCP
  – replication/failover with other disks in a networked disk cluster
  – open access library for app development
• Why now?
  – shingled media come with constrained (object) semantics, e.g. no updates (see the interface sketch below)
• Early stage, with several open questions
  – port price for the disk network vs the price gain from reduced server/power cost?
  – standardisation of protocol/semantics to allow app development at low risk of vendor binding?
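To make "constrained (object) semantics" concrete: a hedged sketch of what such a disk API might look like, with whole-object put/get/delete and deliberately no in-place update. This is a generic illustration, not any vendor's actual protocol.

```cpp
// Generic object-disk interface sketch: shingled media favour append-only,
// whole-object operations, so there is deliberately no update/overwrite call.
#include <map>
#include <optional>
#include <string>
#include <vector>

class ObjectDisk {
    std::map<std::string, std::vector<char>> store;  // key -> object bytes
public:
    // Store a whole object under a key; fails if the key already exists,
    // because in-place updates are not part of the semantics.
    bool put(const std::string& key, std::vector<char> value) {
        return store.emplace(key, std::move(value)).second;
    }
    // Retrieve a whole object, if present.
    std::optional<std::vector<char>> get(const std::string& key) const {
        auto it = store.find(key);
        if (it == store.end()) return std::nullopt;
        return it->second;
    }
    // Deletion frees the key for a future put (a rewrite = delete + put).
    void remove(const std::string& key) { store.erase(key); }
};
```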

Page 22: Can we optimise our systems further?

• Infrastructure analytics
  – apply statistical analysis to the complete system: storage, CPU, network, user applications
  – measure/predict the quantitative impact of changes on the real job population
• Easy!
  – looks like a physics analysis, with infrastructure metrics instead of physics data
  – … really?

Page 23: Non-trivial…

• Technically
  – needs consolidated service-side and application-side metrics
  – usually: log data meant for human consumption, without data design
• Conceptually
  – some established metrics turn out to be less useful for analysing today's hardware than expected
  – CPU efficiency = t_cpu / t_wall? Storage efficiency = GB/s? (see the sketch after this list)
  – correlation does not imply a causal relation
• Sociologically
  – better to observe the "rule of local discovery"
  – the people who quantitatively understand the infrastructure are busy running services. Always …
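A small sketch of why the CPU-efficiency metric above is ambiguous on today's hardware; the job numbers are invented for illustration.

```cpp
// Illustrates why cpu_efficiency = t_cpu / t_wall is hard to interpret on
// modern hardware. All values are invented for illustration only.
#include <cstdio>

double cpuEfficiency(double tCpuSeconds, double tWallSeconds) {
    return tCpuSeconds / tWallSeconds;
}

int main() {
    // A job running on 8 cores accumulates 8 s of CPU time per wall second:
    // the "efficiency" is 8.0, which no longer reads as a fraction.
    std::printf("8-thread job:  %.2f\n", cpuEfficiency(800.0, 100.0));

    // A single-threaded job stalled on storage 70% of the time scores 0.3,
    // yet faster CPUs would not help it at all: the bottleneck is I/O.
    std::printf("I/O-bound job: %.2f\n", cpuEfficiency(30.0, 100.0));
    return 0;
}
```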

Page 24: Data Collection and Analysis Repository

[Diagram: logs from EOS, LSF and "ai" pass through periodic extract & cleaning into HDFS; monitoring JSON files are exported; users extract small, binary subsets; the Hadoop cluster runs the MapReduce (MR) nodes. Example record schema, set "EOS": readbytes (number), filename (string), opentime (time).]

Ramping up: ~100 nodes, ~100 TB raw logs

In production: Flume, HDFS, MapReduce, Pig, Spark, Sqoop, {Impala}

Current work items:
• Service: availability (e.g. isolation and rolling upgrades)
• Analytics: workbooks; support for popular analysis tools: R/Python/ROOT

A sketch of the extract & cleaning step follows below.
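As an illustration of the "periodic extract & cleaning" stage, a minimal filter over the three EOS fields shown in the diagram; the input layout and validation rules are assumptions, not the actual CERN log format.

```cpp
// Minimal sketch of an extract & cleaning filter: read raw log lines with
// the three EOS fields (filename, readbytes, opentime), drop malformed
// records, and emit a cleaned tab-separated stream for loading into HDFS.
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string line;
    long kept = 0, dropped = 0;
    while (std::getline(std::cin, line)) {
        std::istringstream fields(line);
        std::string filename;
        long long readbytes = -1, opentime = -1;  // opentime as epoch seconds
        // A record must have all three fields and plausible numeric values.
        if (fields >> filename >> readbytes >> opentime &&
            readbytes >= 0 && opentime > 0) {
            std::cout << filename << '\t' << readbytes << '\t'
                      << opentime << '\n';
            ++kept;
        } else {
            ++dropped;  // cleaning: malformed lines never reach the store
        }
    }
    std::cerr << "kept " << kept << " records, dropped " << dropped << '\n';
    return 0;
}
```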

Page 25: Summary

• CERN has a long tradition of deploying large-scale storage systems used by a worldwide distributed science community

• During the first LHC run period we passed the 100 PB mark at CERN and, more importantly, contributed to the rapid confirmation of the Higgs boson and many other LHC results

• For LHC Run 2 we have significantly upgraded & optimised the infrastructure, in close collaboration between service providers and users

• We are adding more quantitative infrastructure analytics to prepare for the High-Luminosity LHC

• CERN is already very active as a user and provider in the open source world, and the overlap with other Big Data communities is increasing.

Page 26:

Thank you!