TRANSCRIPT
On the Verge of One Petabyte – the Story Behind the BaBar Database System
Jacek Becla, Stanford Linear Accelerator Center, for the BaBar Computing Group
CHEP’03
Outline
Talk will cover:
– Our experience with running a large-scale DB system: achievements, issues, new development and what drives it
– Main focus on the period since the last CHEP
Providing Persistency for BaBar
– Growing complexity and demands
– Changing requirements
– Hitting unforeseen limits in many places
– Non-trivial maintenance
Most problems are persistence-technology independent; the system is becoming more and more distributed.
Very lively environment – production not as stable as one would imagine.
Some Numbers
– 750+ TB of data
– 0.5+ million DB files
– Several billion events
– 60+ million collections
– 1000+ simultaneous analysis jobs accessing the DB (common)
Data Availability is Essential
Prompt Calibration
– Rapid feedback, keeping up with the detector
Event Reco (ER)
– Data available for analysis within a week
Reprocessing
– All data reprocessed before conferences
Analysis
– Outages < 4%, driven mostly by power outages and hardware failures
What Changed Since Sep'01/last CHEP?
Event Reconstruction
– 4 output physics streams → 20 output streams
– 5 + 115 pointer collections
– Rolling calibrations now separated
– Runs now processed in parallel
– Raw and rec not persisted anymore
– Planning to run skim production separately
continued…
What Changed Since Sep'01/last CHEP?
Simulation Production
– 1.5 → 3 MC events per real event
– ~8 → ~24 production sites
Analysis
– Bridge federations now fully functional
– Significant system growth: 29 data servers, 34 lock/journal servers, 66 TB of disk space, 101 slave federations
Some Challenges
Setting up ER/REP in Padova
– All Linux based
Recovery from Linux-client crashes leaving connections open on the server side
Three data corruptions:
1. Understood and fixed – race condition: file descriptors closed/reopened incorrectly
2. Never understood, went away after a power outage (Dec'02); not sure who is at fault: Objectivity? Linux kernel?
3. Problems with B-Tree index updates in the Temporal Database
Imposed by our software:
– Lock collisions
– Large number of skim collections
– Overflowing containers
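The crash-recovery problem above — Linux clients dying and leaving connections open on the server — is commonly handled with an idle-connection reaper. A minimal sketch of that pattern (all names hypothetical; this is not BaBar's actual AMS code):

```python
import time

# Hypothetical registry of client connections, keyed by connection id.
# Each entry records the last time the client was heard from.
connections = {}

STALE_AFTER = 300  # seconds of silence before a connection is reaped

def touch(conn_id, now=None):
    """Record activity on a connection (called on every request)."""
    connections[conn_id] = now if now is not None else time.time()

def reap_stale(now=None):
    """Drop connections whose clients have gone silent; return their ids."""
    now = now if now is not None else time.time()
    stale = [cid for cid, last in connections.items()
             if now - last > STALE_AFTER]
    for cid in stale:
        del connections[cid]  # a real server would also close the socket
    return stale
```

A periodic sweep like this lets the server reclaim resources even when the client never sends a proper disconnect.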
Some New Features
– Bridge federations – all 3 phases deployed
– Data compression
– New Conditions DB (CDB)
– Automatic load balancing
Conditions DB
Main features:
– New conceptual model for metadata: 2-d space of validity and insertion time, revisions, persistent configurations, types of conditions, hierarchical namespace for conditions
– Flexible user data clustering
– Support for distributed updates and use
– State ID
– Scalability problems solved
– Significant (100-1000x) speedup for critical use cases
Status:
– In production since Fall'02
– Data converted to the new format
– Working on distributed management tools
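The 2-d metadata model — conditions indexed by both a validity interval and an insertion time — can be illustrated with a toy lookup: to reproduce an old processing pass, you ask for "the condition valid at event time t, as known at insertion time T". A sketch under assumed, simplified semantics (not the CDB's actual API):

```python
# Toy 2-d conditions lookup: each entry carries a validity interval
# plus an insertion timestamp. A query fixes both an event time and
# an "as-of" insertion time, so old processing passes stay reproducible
# even after better calibrations are inserted later.
entries = []  # list of (valid_from, valid_to, inserted_at, payload)

def insert(valid_from, valid_to, inserted_at, payload):
    entries.append((valid_from, valid_to, inserted_at, payload))

def lookup(event_time, as_of):
    """Most recently inserted payload valid at event_time,
    considering only entries inserted on or before as_of."""
    candidates = [e for e in entries
                  if e[0] <= event_time < e[1] and e[2] <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e[2])[3]
```

Querying with an old `as_of` returns the calibration that was current at that time, while a current `as_of` returns the latest revision.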
AMS Load Balancing
Dynamically stages in/replicates files
– Based on configurable parameters and host load
Increases fault tolerance
– Data servers can be taken offline transparently
Scalable
– Hierarchical
Currently being tested
[Diagram: clients routed via a distinguished AMS performing dynamic selection over the data-server AMSes]
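The server-selection idea — a distinguished AMS directing each request to the least-loaded data server, staging or replicating the file where needed — can be sketched as follows (a hypothetical simplification, not the actual AMS protocol):

```python
# Toy "distinguished AMS" redirect: among servers that already hold
# the requested file, pick the least-loaded one; if no server holds
# it, stage the file onto the globally least-loaded server.
def select_server(filename, servers):
    """servers: dict name -> {'load': float, 'files': set of filenames}."""
    holders = [s for s, info in servers.items()
               if filename in info['files']]
    if holders:
        return min(holders, key=lambda s: servers[s]['load'])
    # Stage the file in on the least-loaded server overall.
    target = min(servers, key=lambda s: servers[s]['load'])
    servers[target]['files'].add(filename)
    return target
```

Because staging mutates the file map, hot files naturally end up replicated across several servers, which is also what makes taking a server offline transparent to clients.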
Size
Raw/rec not persisted
– Event: ~200 kB → ~20 kB
Continues to grow fast:
– Higher luminosity
– 115 skims
– Reprocessing all data every year
– More MC events (1.5:1 → 3:1)
Reducing size:
– Event store redesign (see talk by Yemi tomorrow)
– Data compression (achieving ~2:1 compression)
[Chart: total database size in TB, Oct-99 through Jan-03, growing from near zero to roughly 750 TB]
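A ~2:1 ratio on structured event records is plausible for a general-purpose codec; the effect is easy to demonstrate with zlib on record-like data (purely illustrative — the source does not say which compression algorithm BaBar used):

```python
import zlib

# Illustrative only: compress a block of repetitive, record-like data
# and report original/compressed size. Real detector data compresses
# less predictably than this synthetic payload.
def compression_ratio(data: bytes) -> float:
    compressed = zlib.compress(data, level=6)
    return len(data) / len(compressed)

record = b"run=12345 event=%08d px=0.123 py=4.567 pz=8.901\n"
payload = b"".join(record % i for i in range(1000))
ratio = compression_ratio(payload)
```

For genuinely compressible structured data the ratio comfortably exceeds 2:1; the achievable ratio always depends on the entropy of the records being stored.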
Media Attention
World's largest database
– 500 TB – see SLAC press release (Apr'02)
Many ideas/problems/solutions common to any large-scale database system
Newspaper and local TV coverage
– Non-HEP attention
Size matters in data world
Mountains Of Data: 500 Terabytes And Counting
A firm grip, or gagging on gigabytes?
Stanford claims world's largest database
500,000 gigabytes and growing: SLAC houses world's largest database
University database breaks world record
Stanford Linear Accelerator Database Reaches 500,000 Gigabytes
Stanford researchers may have world’s largest database
New Computing Model
Discussed Fall'02
Main decisions:
– Two-stage approach:
1. Develop a ROOT-based "new micro" – alternative to nTuples
2. Develop a full ROOT-based event store
– Deprecate ROOT-based conditions; use existing Objy-based conditions
Main reasons to change:
– To follow the general HEP trend
– To allow interactive analysis in ROOT
Summary
DB system keeps up with excellent B-Factory performance
– No major problems/showstoppers
– Coping with growing size, complexity and demands
Event store technology based on Objectivity
– A good, working model, proven in production
– Not well proven in analysis: most users extract data to nTuples
– Likely to be deprecated soon
May'99 – Mar'03: undoubtedly a successful chapter for the BaBar DB
Acknowledgements
Development Team
– Andy Hanushevsky
– Andy Salnikov (online databases)
– Daniel Wang (started Sep'02)
– David Quarrie (gone Oct'01)
– Igor Gaponenko
– Simon Patton (gone March'02)
– Yemi Adesanya
Operations Team
– Adil Hasan
– Artem Trunov
– Wilko Kroeger
– Tofigh Azemoon
Some Related BaBar Talks
Operation Aspects of Dealing with the Large BaBar Data Set– Category 8, Tuesday 3:30pm
The Redesigned BaBar Event Store – Believe the Hype– Category 8, Tuesday 4:50pm
BdbServer++: A User Instigated Data Location and Retrieval Tool– Category 2
Distributing BaBar Data Using SRB– Category 2
Distributed Offline Data Reconstruction in BaBar– Category 3, Tuesday, 6:10pm