
CERN-IT-DB

Exabyte-Scale Data Management Using an Object-Relational Database: The LHC Project at CERN

Jamie Shiers, CERN, Switzerland
http://cern.ch/db/


Overview

Brief introduction to CERN & LHC

Why we have massive data volumes

The role of Object-Relational DBs

A Possible Solution…

CERN - The European Organisation for Nuclear Research

The European Laboratory for Particle Physics

Fundamental research in particle physics
Designs, builds & operates large accelerators
Financed by 20 European countries (member states) + others (US, Canada, Russia, India, …)
~€650M budget - operation + new accelerators
2000 staff + 6000 users (researchers) from all over the world

LHC (starts ~2005) experiment: 2000 physicists, 150 universities, apparatus costing ~€300M, computing ~€250M to set up, ~€60M/year to run
10+ year lifetime

[Aerial photographs of the CERN site: the 27 km LHC ring, Geneva, the airport and the Computer Centre]

The LHC machine

Two counter-circulating proton beams

Collision energy 7 + 7 TeV

27 km of magnets with a field of 8.4 Tesla

Superfluid helium cooled to 1.9 K

The world’s largest superconducting structure

The LHC Detectors

CMS, ATLAS, LHCb


Online system: a multi-level trigger filters out background, reducing the data volume from 40 TB/s to 100 MB/s

40 MHz (40 TB/sec) → level 1 - special hardware
75 kHz (75 GB/sec) → level 2 - embedded processors
5 kHz (5 GB/sec) → level 3 - PCs
100 Hz (100 MB/sec) → data recording & offline analysis

1000 TB/s according to recent estimates


Higgs Search

H → ZZ

• Start with protons (quarks + gluons)
• Accelerate & collide
• Observe in massive detectors


LHC Data Challenges

4 large experiments, 10-15 year lifetime
Data rates: ~500 MB/s – 1.5 GB/s
Data volumes: ~5 PB / experiment / year
Several hundred PB total!
Data reduced from "raw data" to "analysis data" in a small number of well-defined steps
Analysed by thousands of users world-wide
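A quick back-of-the-envelope check shows how these numbers hang together; this is only a sketch, and the assumed 10^7 seconds of data taking per year is not stated on the slides:

```python
# Rough consistency check of the LHC data-volume figures quoted above.
PB, MB = 10**15, 10**6            # bytes

experiments   = 4
vol_per_expt  = 5 * PB            # ~5 PB / experiment / year (from the slide)
lifetime      = 10                # years (slide quotes a 10-15 year lifetime)
live_seconds  = 1.0e7             # assumed data-taking time per year (not from the slide)

total = experiments * vol_per_expt * lifetime
print(f"total volume      ~ {total / PB:.0f} PB")    # ~200 PB (300 PB over 15 years): "several hundred PB"

rate = vol_per_expt / live_seconds
print(f"rate / experiment ~ {rate / MB:.0f} MB/s")   # ~500 MB/s, the lower end of the quoted range
```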

Planned capacity evolution at CERN

[Charts: estimated disk capacity (TeraBytes, 0-7000), mass storage (PetaBytes, 0-140) and CPU capacity (kSI95, 0-6000) at CERN for 1998-2010, each split into LHC and other experiments; the CPU chart also shows Moore's law]


Data Handling and Computation for Physics Analysis (les.robertson@cern.ch, CERN)

[Diagram: detector → event filter (selection & reconstruction) → raw data (RAW, ~1 PB/yr) → event reprocessing → event summary data (ESD, ~100 TB/yr) → batch physics analysis → analysis objects extracted by physics topic (AOD, ~10 TB/yr) and TAG data (~1 TB/yr) → interactive physics analysis, with event simulation feeding back into the chain. Access is mostly sequential for the larger formats and increasingly random for the smaller ones; data volumes shrink and user numbers grow moving from Tier0 to Tier1.]


LHC Data Models

LHC data models are complex!
Typically hundreds (500-1000) of structure types (classes)
Many relations between them
Different access patterns

LHC experiments rely on OO technology
OO applications deal with networks of objects
Pointers (or references) are used to describe relations

[Diagram: an example object network, with an Event referencing Tracker and Calor. objects and a TrackList of Track objects, each Track referencing a HitList of Hit objects]
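A minimal sketch of such an object network, following the names in the diagram above; these are illustrative classes, not the experiments' actual data model:

```python
# Illustrative sketch of a pointer/reference-based event data model (names from the diagram above).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Hit:
    position: Tuple[float, float, float]                  # a single detector measurement

@dataclass
class Track:
    hits: List[Hit] = field(default_factory=list)         # "HitList": relation held as references

@dataclass
class SubDetector:                                        # stands in for Tracker, Calor., ...
    name: str
    tracks: List[Track] = field(default_factory=list)     # "TrackList"

@dataclass
class Event:
    number: int
    detectors: List[SubDetector] = field(default_factory=list)

def n_hits(event: Event) -> int:
    """Navigate the object network by following references, e.g. to count all hits."""
    return sum(len(track.hits) for det in event.detectors for track in det.tracks)
```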


CMS: 1800 physicists, 150 institutes, 32 countries

World-wide collaboration → distributed computing & storage capacity


The LHC Computing Centre (les.robertson@cern.ch)

[Diagram of the multi-tier computing model: CERN at the centre, Tier 1 regional centres (Germany, USA, UK, France, Italy, …, and CERN itself), Tier 2 centres at labs and universities (Lab a, Uni a, Lab b, Uni b, …), Tier 3 physics-department resources and desktops, serving physics groups and regional groups]

CERN-IT-DB

Why use DBs?

OK, you have lots of data, but what have databases, let alone Object-Relational DBs, got to do with it?


Why Not: file = object + GREP?

It works if you have thousands of objects (and you know them all)
But hard to search millions/billions/trillions with GREP
Hard to put all attributes in the file name; minimal metadata
Hard to do chunking right
Hard to pivot on space/time/version/attributes

Email: [email protected], site: http://research.microsoft.com/~Gray


The Reality: it's build vs buy

If you use a file system, you will eventually build a database system: metadata, query, parallel ops, security, reorganization, recovery, distribution, replication, …

OK: so I'll put lots of objects in a file
(a Do-It-Yourself Database)

Good news:
Your implementation will be 10x faster (at least!)
Easier to understand and use

Bad news:
It will cost 10x more to build and maintain
Someday you will get bored maintaining/evolving it
It will lack some killer features:
• Parallel search
• Self-describing via metadata
• SQL, XML, …
• Replication
• Online update – reorganization
• Chunking is problematic (what granularity, how to aggregate)

Top 10 reasons to put Everything in a DB

1. Someone else writes the million lines of code
2. Captures data and metadata
3. Standard interfaces give tools and quick learning
4. Allows schema evolution without breaking old apps
5. Index and pivot on multiple attributes: space-time-attribute-version…
6. Parallel terabyte searches in seconds or minutes
7. Moves processing & search close to the disk arm (moves fewer bytes: "questons" return "datons")
8. Chunking is easier (can aggregate chunks at server)
9. Automatic geo-replication
10. Online update and reorganization
11. Security
12. If you pick the right vendor, ten years from now there will be software that can read the data

CERN-IT-DB

How to build multi-PB DBs

Total LHC data volume: ~300 PB
VLDBs today: ~3 TB

Just 5 orders of magnitude to solve…
(one per year)


Divide & Conquer Split data from different experiments Split different data types

Different schema, users, access patterns,… Focus on mainstream technologies &

low-risk solutions VLDB target: 100TB databases

1. How do we build 100TB databases?2. How do we use 100TB databases to

solve 100PB problem?


Why 100TB DBs?

Possible today

Vendors must provide support

Expected to be mainstream within a few years

Decision Support (2000)

Company                  DB Size* (TB)   DBMS Partner   Server Partner   Storage Partner
SBC                      10.50           NCR            NCR              LSI
First Union Nat. Bank     4.50           Informix       IBM              EMC
Dialog                    4.25           Proprietary    Amdahl           EMC
Telecom Italia (DWPT)     3.71           IBM            IBM              Hitachi
FedEx Services            3.70           NCR            NCR              EMC
Office Depot              3.08           NCR            NCR              EMC
AT & T                    2.83           NCR            NCR              LSI
SK C&C                    2.54           Oracle         HP               EMC
NetZero                   2.47           Oracle         Sun              EMC
Telecom Italia (DA)       2.32           Informix       Siemens          TerraSystems

*Database size = sum of user data + summaries and aggregates + indexes


Size of the Largest RDBMS in Commercial Use for DSS (source: Database Scalability Program 2000)

[Chart: 3 TB in 1996, 50 TB in 2000, 100 TB projected by respondents for 2005]


BT Visit – July 2001

Oracle VLDB site: enormous proof-of-concept test in 1999
80 TB disk, 40 TB mirrored, 37 TB usable
Performed using Oracle 8i, EMC storage
"Single instance" - i.e. not a cluster
Same techniques as being used at CERN

Demonstrated > 2 years ago!
No concerns about building 100 TB today!


Physics DB Deployment

Currently run 1-3 TB / server (dual-processor Intel/Linux)
Scaling to ~10 TB / server in a few years sounds plausible
10-node cluster: 100 TB, ~100 disks in 2005!

Can we achieve close to linear scalability?
Fortunately, our data is write-once, read-many
Should be a good match for shared-disk clusters
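A rough sizing of that building block; the per-disk capacity assumed for 2005 is an illustration, not a figure from the slides:

```python
# Rough sizing of the 100 TB building block described above.
TB = 1

target        = 100 * TB
per_server    = 10 * TB      # plausible per-server database volume "in a few years"
disk_capacity = 1 * TB       # assumed size of a single disk around 2005

print(f"servers: {target / per_server:.0f}")        # -> 10-node cluster
print(f"disks:   ~{target / disk_capacity:.0f}")    # -> ~100 disks
```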


100TB DBs & LHC Data

Analysis data: 100 TB is OK for ~10 years - one DB cluster
Intermediate data: 100 TB is ~1 year's data - ~40 DB clusters
RAW data: 100 TB = 1 month's data - 400 DB clusters to handle all RAW data
• 10 / year, 10 years, 4 experiments
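These cluster counts can be cross-checked against the per-experiment volumes from the data-flow figure earlier (RAW ~1 PB/yr, ESD ~100 TB/yr, AOD+TAG ~11 TB/yr); a minimal sketch:

```python
# Re-derive the cluster counts above from the per-experiment yearly volumes.
PB, TB = 1000, 1                      # work in units of TB
cluster = 100 * TB                    # one 100 TB database cluster
experiments, years = 4, 10

raw          = experiments * years * (1 * PB) / cluster     # -> 400 clusters
intermediate = experiments * years * (100 * TB) / cluster   # -> 40 clusters
analysis     = experiments * years * (11 * TB) / cluster    # -> ~4, i.e. roughly 1 per experiment

print(f"RAW: {raw:.0f}, intermediate: {intermediate:.0f}, analysis: {analysis:.1f}")
```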


RAW Data

Processed sequentially ~once / year
Need only the current + a historic window online
Solution: partitioning + offline tablespaces
100 TB = 10 days' data - ample for (re-)processing

Partition the tables
"Old" data: transportable TBS, copy to tape, drop from catalog
Reload, eventually to a different server, on request
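A sketch of how that lifecycle might be scripted; run_sql(), run_os() and the MSS copy command are hypothetical placeholders, and the tablespace/file names are invented, but the steps follow the transportable-tablespace approach described on the slide:

```python
# Illustrative sketch of the "partition + offline tablespace" lifecycle described above.
# run_sql() and run_os() are hypothetical placeholders (they only print here); the
# statements follow Oracle's transportable-tablespace mechanism, but all names are made up.

def run_sql(statement: str) -> None:
    """Placeholder: execute a statement against the database (e.g. via sqlplus)."""
    print("SQL >", statement)

def run_os(command: str) -> None:
    """Placeholder: run an OS-level command (exp/imp, tape copy, ...)."""
    print("OS  >", command)

def copy_to_tape(path: str) -> None:
    """Placeholder: stage a file into the mass-storage system."""
    run_os(f"mss-copy {path}")          # hypothetical MSS staging command

def archive_partition(tablespace: str, datafile: str) -> None:
    # 1. Freeze the "old" partition's tablespace so its datafiles stop changing.
    run_sql(f"ALTER TABLESPACE {tablespace} READ ONLY")
    # 2. Export its metadata (transportable-tablespace export).
    run_os(f"exp TRANSPORT_TABLESPACE=y TABLESPACES={tablespace} FILE={tablespace}.dmp")
    # 3. Copy the datafile(s) and the metadata dump to tape.
    copy_to_tape(datafile)
    copy_to_tape(f"{tablespace}.dmp")
    # 4. Drop the tablespace from the catalog to free online disk space.
    run_sql(f"DROP TABLESPACE {tablespace} INCLUDING CONTENTS")
    # Reload on request: copy the files back from tape and plug the tablespace in
    # (imp TRANSPORT_TABLESPACE=y ...), possibly on a different server.

archive_partition("RAW_2005_W01", "/oradata/raw_2005_w01.dbf")
```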


Intermediate Data

~100-500 TB / experiment / year
Yotta-byte DBs (1,000,000,000,000 TB) predicted by 2020!

Can DBMS capabilities grow fast enough to permit just 1 server / experiment, growing by ~500 TB / year?
An open question …


DB Deployment

[Diagram: a DAQ cluster holds only the current data (no history) and exports tablespaces to the RAW cluster, which stages data to/from the MSS; an ESD cluster (1 per year? just 1?) serves reconstruction; a single AOD/TAG cluster serves analysis, with data flowing to and from the regional centres (RCs)]


Come & Visit Us!


Come Join Us!


Summary

Existing DB technologies can be used to build 100TB databases

Familiar data warehousing techniques can be used to handle much larger volumes of historic data

A paper solution to the problems of LHC data management exists: now we just have to implement it