TRANSCRIPT
On the Verge of One Petabyte – the Story Behind the BaBar Database System
Jacek Becla, Stanford Linear Accelerator Center, for the BaBar Computing Group
CHEP’03
Outline
Talk will cover:
– Our experience with running a large-scale DB system: achievements, issues, new development and what drives it
– Main focus on the period since the last CHEP
Providing Persistency for BaBar
– Growing complexity and demands
– Changing requirements
– Hitting unforeseen limits in many places
– Non-trivial maintenance
Most problems are persistence-technology independent; the system is becoming more and more distributed.
Very lively environment – production not as stable as one would imagine.
Some Numbers
– 750+ TB of data
– 0.5+ million DB files
– Several billion events
– 60+ million collections
– 1000+ simultaneous analysis jobs accessing the DB (common)
Data Availability is Essential
Prompt Calibration
– Rapid feedback, keeping up with the detector
Event Reco (ER)
– Data available for analysis within a week
Reprocessing
– All data reprocessed before conferences
Analysis
– Outages < 4%, driven mostly by power outages and hardware failures
What Changed Since Sep'01/last CHEP?
Event Reconstruction
– 4 output physics streams → 20 output streams
– 5 + 115 pointer collections
– Rolling calibrations now separated
– Runs now processed in parallel
– Raw and rec not persisted anymore
– Planning to run skim production separately
continued…
What Changed Since Sep'01/last CHEP?
Simulation Production
– 1.5 → 3 MC events per real event
– ~8 → ~24 production sites
Analysis
– Bridge federations now fully functional
– Significant system growth: 29 data servers, 34 lock/journal servers, 66 TB of disk space, 101 slave federations
Some Challenges
Setting up ER/REP in Padova
– All Linux based
Recovery from Linux-client crashes leaving connections open on the server side
Three data corruptions:
1. Understood and fixed – race condition: file descriptors closed/reopened incorrectly
2. Never understood, went away after a power outage (Dec'02); not sure who is at fault: Objectivity? Linux kernel?
3. Problems with B-Tree index updates in the Temporal Database
Imposed by our software:
– Lock collisions
– Large number of skim collections
– Overflowing containers
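The crash-recovery problem above — Linux clients dying and leaving connections open on the server — is commonly handled with an idle-connection reaper. A minimal sketch of that pattern (all names hypothetical; this is not BaBar's actual AMS code):

```python
import time

# Hypothetical registry of client connections, keyed by connection id.
# Each entry records the last time the client was heard from.
connections = {}

STALE_AFTER = 300  # seconds of silence before a connection is reaped

def touch(conn_id, now=None):
    """Record activity on a connection (called on every request)."""
    connections[conn_id] = now if now is not None else time.time()

def reap_stale(now=None):
    """Drop connections whose clients have gone silent; return their ids."""
    now = now if now is not None else time.time()
    stale = [cid for cid, last in connections.items()
             if now - last > STALE_AFTER]
    for cid in stale:
        del connections[cid]  # a real server would also close the socket
    return stale
```

A periodic sweep like this lets the server reclaim resources even when the client never sends a proper disconnect.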
Some New Features
– Bridge federations – all 3 phases deployed
– Data compression
– New Conditions DB (CDB)
– Automatic load balancing
Conditions DB
Main features:
– New conceptual model for metadata: 2-d space of validity and insertion time, revisions, persistent configurations, types of conditions, hierarchical namespace for conditions
– Flexible user data clustering
– Support for distributed updates and use
– State ID
– Scalability problems solved
– Significant (100-1000x) speedup for critical use cases
Status:
– In production since Fall'02
– Data converted to the new format
– Working on distributed management tools
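The 2-d metadata model — conditions indexed by both a validity interval and an insertion time — can be illustrated with a toy lookup: to reproduce an old processing pass, you ask for "the condition valid at event time t, as known at insertion time T". A sketch under assumed, simplified semantics (not the CDB's actual API):

```python
# Toy 2-d conditions lookup: each entry carries a validity interval
# plus an insertion timestamp. A query fixes both an event time and
# an "as-of" insertion time, so old processing passes stay reproducible
# even after better calibrations are inserted later.
entries = []  # list of (valid_from, valid_to, inserted_at, payload)

def insert(valid_from, valid_to, inserted_at, payload):
    entries.append((valid_from, valid_to, inserted_at, payload))

def lookup(event_time, as_of):
    """Most recently inserted payload valid at event_time,
    considering only entries inserted on or before as_of."""
    candidates = [e for e in entries
                  if e[0] <= event_time < e[1] and e[2] <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e[2])[3]
```

Querying with an old `as_of` returns the calibration that was current at that time, while a current `as_of` returns the latest revision.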
AMS Load Balancing
Dynamically stages in/replicates files
– Based on configurable parameters and host load
Increases fault tolerance
– Data servers can be taken offline transparently
Scalable
– Hierarchical
Currently being tested
[Diagram: clients routed via a distinguished AMS performing dynamic selection over the data-server AMSes]
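The server-selection idea — a distinguished AMS directing each request to the least-loaded data server, staging or replicating the file where needed — can be sketched as follows (a hypothetical simplification, not the actual AMS protocol):

```python
# Toy "distinguished AMS" redirect: among servers that already hold
# the requested file, pick the least-loaded one; if no server holds
# it, stage the file onto the globally least-loaded server.
def select_server(filename, servers):
    """servers: dict name -> {'load': float, 'files': set of filenames}."""
    holders = [s for s, info in servers.items()
               if filename in info['files']]
    if holders:
        return min(holders, key=lambda s: servers[s]['load'])
    # Stage the file in on the least-loaded server overall.
    target = min(servers, key=lambda s: servers[s]['load'])
    servers[target]['files'].add(filename)
    return target
```

Because staging mutates the file map, hot files naturally end up replicated across several servers, which is also what makes taking a server offline transparent to clients.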
Size
Raw/rec not persisted
– Event: ~200 kB → ~20 kB
Continues to grow fast:
– Higher luminosity
– 115 skims
– Reprocessing all data every year
– More MC events (1.5:1 → 3:1)
Reducing size:
– Event store redesign (see talk by Yemi tomorrow)
– Data compression (achieving ~2:1 compression)
[Chart: total database size in TB, Oct-99 through Jan-03, growing from near zero to roughly 750 TB]
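A ~2:1 ratio on structured event records is plausible for a general-purpose codec; the effect is easy to demonstrate with zlib on record-like data (purely illustrative — the source does not say which compression algorithm BaBar used):

```python
import zlib

# Illustrative only: compress a block of repetitive, record-like data
# and report original/compressed size. Real detector data compresses
# less predictably than this synthetic payload.
def compression_ratio(data: bytes) -> float:
    compressed = zlib.compress(data, level=6)
    return len(data) / len(compressed)

record = b"run=12345 event=%08d px=0.123 py=4.567 pz=8.901\n"
payload = b"".join(record % i for i in range(1000))
ratio = compression_ratio(payload)
```

For genuinely compressible structured data the ratio comfortably exceeds 2:1; the achievable ratio always depends on the entropy of the records being stored.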
Media Attention
World's largest database
– 500 TB – see SLAC press release (Apr'02)
Many ideas/problems/solutions common to any large-scale database system
Newspaper and local TV coverage
– Non-HEP attention
Size matters in data world
Mountains Of Data: 500 Terabytes And Counting
A firm grip, or gagging on gigabytes?
Stanford claims world's largest database
500,000 gigabytes and growing: SLAC houses world's largest database
University database breaks world record
Stanford Linear Accelerator Database Reaches 500,000 Gigabytes
Stanford researchers may have world’s largest database
New Computing Model
Discussed Fall'02
Main decisions:
– Two-stage approach:
1. Develop a ROOT-based "new micro" – alternative to nTuples
2. Develop a full ROOT-based event store
– Deprecate ROOT-based conditions; use existing Objy-based conditions
Main reasons to change:
– To follow the general HEP trend
– To allow interactive analysis in ROOT
Summary
DB system keeps up with excellent B-Factory performance
– No major problems/showstoppers
– Coping with growing size, complexity and demands
Event store technology based on Objectivity
– A good, working model, proven in production
– Not well proven in analysis: most users extract data to nTuples
– Likely to be deprecated soon
May'99 – Mar'03: undoubtedly a successful chapter for the BaBar DB
Acknowledgements
Development Team
– Andy Hanushevsky
– Andy Salnikov (online databases)
– Daniel Wang (started Sep'02)
– David Quarrie (gone Oct'01)
– Igor Gaponenko
– Simon Patton (gone March'02)
– Yemi Adesanya
Operations Team
– Adil Hasan
– Artem Trunov
– Wilko Kroeger
– Tofigh Azemoon
Some Related BaBar Talks
Operation Aspects of Dealing with the Large BaBar Data Set– Category 8, Tuesday 3:30pm
The Redesigned BaBar Event Store – Believe the Hype– Category 8, Tuesday 4:50pm
BdbServer++: A User Instigated Data Location and Retrieval Tool– Category 2
Distributing BaBar Data Using SRB– Category 2
Distributed Offline Data Reconstruction in BaBar– Category 3, Tuesday, 6:10pm