Handling High Energy Physics Data using Cloud Computing
TRANSCRIPT
04/13/2023
High Energy Physics Data Management using Cloud Computing
Analysis of the Famous BaBar Experiment Data Handling
Paper by: Abhishek Dey, CSE 2nd Year | Diya Ghosh, CSE 2nd Year | Mr. Somenath Roy Chowdhury
Contents
Motivation
HEP Legacy Project
CANFAR Astronomical Research Facility
System Architecture
Operational Experience
Summary
What exactly is BaBar?
Its design was motivated by the investigation of CP violation.
It was set up to understand the disparity between the matter and antimatter content of the universe by measuring CP violation.
BaBar focuses on the study of CP violation in the B meson system.
The name comes from the nomenclature for the B meson (symbol B) and its antiparticle (symbol B̄, pronounced "B bar").
BaBar: A Data Point of View
9.5 million lines of C++ and Fortran
Compiled size is 30 GB
Significant manpower is required to maintain the software.
Each installation must be validated before generated results are accepted.
CANFAR is a partnership between:
– University of Victoria
– University of British Columbia
– National Research Council, Canadian Astronomy Data Centre
– Herzberg Institute for Astrophysics
It provides the infrastructure for VMs.
Need for Cloud Computing:
Jobs are embarrassingly parallel, much like HEP.
Each of these surveys requires a different processing environment, which needs:
A specific version of a Linux distribution.
A specific compiler version.
Specific libraries.
Applications have little documentation.
These environments are evolving rapidly.
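Embarrassingly parallel jobs like these split cleanly across workers, whether those workers are local processes or cloud VMs. A minimal sketch with Python's standard process pool follows; `analyze_event` is a hypothetical stand-in for a per-event analysis job, not code from the paper.

```python
from multiprocessing import Pool

def analyze_event(event_id):
    # Hypothetical per-event analysis. Each job is independent of the
    # others, so the work divides trivially across any number of workers.
    return event_id * event_id  # placeholder computation

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map() fans the independent jobs out across the pool,
        # the same way a batch system fans jobs out across VMs.
        results = pool.map(analyze_event, range(8))
    print(results)
```

The same fan-out pattern is what makes these workloads a natural fit for IaaS clouds: adding workers needs no change to the job itself.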
Data is precious, too precious.
We need Infrastructure, which the cloud offers as a Service (IaaS).
A word about Cloud Computing:
IaaS: What next?
With IaaS, we can easily create many instances of a VM image
How do we manage the VMs once booted?
How do we get jobs to the VMs?
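The fan-out step itself is just a loop over an image id. The sketch below uses a hypothetical in-memory `IaaSClient` as a stand-in for a real cloud API (EC2- or OpenStack-style); the class, method names, and image name are all illustrative, not part of the paper's stack.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class IaaSClient:
    """Hypothetical in-memory stand-in for a real IaaS API."""
    _ids: count = field(default_factory=count)
    instances: list = field(default_factory=list)

    def boot(self, image: str) -> str:
        # A real client would call the cloud's launch endpoint here.
        vm_id = f"vm-{next(self._ids)}"
        self.instances.append((vm_id, image))
        return vm_id

def boot_pool(client: IaaSClient, image: str, n: int) -> list:
    # Create n identical instances of the same VM image.
    return [client.boot(image) for _ in range(n)]

client = IaaSClient()
vms = boot_pool(client, "babar-analysis.img", 5)
print(vms)
```

Booting is the easy part; the two questions above (managing the booted VMs and routing jobs to them) are what Cloud Scheduler and Condor address next.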
Our Solution: Cloud Scheduler + Condor
Users create a VM with their experiment software installed.
A basic VM is created by one group, and users add their analysis or processing software on top to create their custom VM.
Users then create batch jobs as they would on a regular cluster, but they specify which VM image should run their jobs.
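A job in this setup is an ordinary Condor submit description plus custom attributes naming the VM image it needs. The fragment below is illustrative: the `+VMType`/`+VMLoc` attribute names follow the Cloud Scheduler convention, but the executable, image name, and URL are invented for the example.

```
# Illustrative Condor submit description (values are hypothetical).
Universe   = vanilla
Executable = run_analysis.sh
Arguments  = dataset-001
Output     = job.out
Error      = job.err
Log        = job.log

# Extra attributes read by Cloud Scheduler to pick/boot the right VM:
+VMType = "babar-analysis"
+VMLoc  = "http://repo.example.org/images/babar-analysis.img"

Queue
```

Cloud Scheduler watches the Condor queue, boots instances of the requested image on the cloud, and the booted VMs join the Condor pool to run the waiting jobs.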
Steps for a successful architecture setup:
CANFAR: MAssive Compact Halo Objects (MACHO)
A detailed re-analysis of data from the MACHO experiment's dark matter search.
Jobs perform a wget to retrieve the input data (40 MB) and have a 4-6 hour run time. Low I/O is great for clouds.
Astronomers are happy with the environment.
Data Handling in BaBar:
[Diagram: analysis jobs consume event data (real and simulated) together with the BaBar configuration and conditions databases.]
The data is approximately 2 PB.
The file system is hosted on a cluster of six nodes: a Management/Metadata server (MGS/MDS) and five Object Storage Servers (OSS).
A single gigabit interface/VLAN is used to communicate both internally and externally.
Xrootd: The Need for Distributed Data
Xrootd is a file server providing byte-level access and is used by many high energy physics experiments.
It provides access to the distributed data.
A read-ahead value of 1 MB and a read-ahead cache size of 10 MB were set on each Xrootd client.
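As a sketch of how those two client-side knobs might be set: the legacy XrdClient honoured environment variables for read-ahead tuning. The variable names below follow that legacy convention and should be verified against the Xrootd version actually in use; the values match the 1 MB / 10 MB settings from the slide.

```shell
# Assumed legacy-XrdClient variable names; verify for your Xrootd version.
export XRD_READAHEADSIZE=$((1 * 1024 * 1024))     # 1 MB read-ahead
export XRD_READCACHESIZE=$((10 * 1024 * 1024))    # 10 MB client-side cache
echo "$XRD_READAHEADSIZE $XRD_READCACHESIZE"
```

Read-ahead of this kind pays off for the sequential, whole-file reads typical of analysis jobs, at the cost of wasted transfer on random access.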
How does a DFS work?
Blocks are replicated across several datanodes (usually 3).
A single namenode stores metadata (file names, block locations, etc.).
The system is optimized for large files and sequential reads.
Clients read from the closest available replica (note: locality of reference).
If the replication for a block drops below target, it is automatically re-replicated.
[Diagram: a namenode plus five datanodes, each holding a subset of blocks 1-4, with every block replicated on three nodes.]
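The re-replication behaviour above can be sketched with a toy namenode. This is an illustrative model only, not HDFS code; the class and method names are invented for the example.

```python
import random

REPLICATION_TARGET = 3

class Namenode:
    """Toy namenode: tracks which datanodes hold each block."""

    def __init__(self, datanodes):
        self.datanodes = set(datanodes)
        self.block_map = {}  # block id -> set of datanodes holding it

    def place(self, block, nodes):
        self.block_map[block] = set(nodes) & self.datanodes

    def fail_node(self, node):
        # A datanode dies: drop it from every block's replica set.
        self.datanodes.discard(node)
        for replicas in self.block_map.values():
            replicas.discard(node)

    def re_replicate(self):
        # Any block below target gets new copies on other live nodes.
        for block, replicas in self.block_map.items():
            target = min(REPLICATION_TARGET, len(self.datanodes))
            while len(replicas) < target:
                candidates = sorted(self.datanodes - replicas)
                replicas.add(random.choice(candidates))

nn = Namenode(["dn1", "dn2", "dn3", "dn4"])
nn.place("blk_1", ["dn1", "dn2", "dn3"])
nn.fail_node("dn2")   # replication of blk_1 drops to 2
nn.re_replicate()     # ...and is restored to 3 on a surviving node
print(nn.block_map["blk_1"])
```

Capping the target at the number of live datanodes keeps the loop from spinning when fewer nodes than the replication target remain.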
Results and Analysis:
Fault-tolerant model:
Acknowledgements
A special word of appreciation and thanks to Mr. Somenath Roy Chowdhury.
My heartiest thanks to the entire team who worked hard to build the cloud.
Questions, please?