dial: distributed interactive analysis of large datasets

28
ATLAS DIAL: Distributed Interactive Analysis of Large Datasets David Adams – BNL September 16, 2005 DOSAR meeting

Upload: boris

Post on 18-Jan-2016

45 views

Category:

Documents


3 download

DESCRIPTION

DIAL: Distributed Interactive Analysis of Large Datasets. DOSAR meeting. David Adams – BNL September 16, 2005. DIAL project Implementation Components Dataset Transformation Job Scheduler Catalogs User interfaces. Contents. Status Interactivity Current results Conclusions - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: DIAL: Distributed Interactive Analysis of Large Datasets

ATLAS

DIAL: Distributed Interactive Analysisof Large Datasets

David Adams – BNL

September 16, 2005

DOSAR meeting

Page 2: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 2ATLAS

Contents

DIAL project

Implementation

Components• Dataset

• Transformation

• Job

• Scheduler

• Catalogs

• User interfaces

Status

Interactivity

Current results

Conclusions

More information

Contibutors

Page 3: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 3ATLAS

DIAL projectDIAL = Distributed Interactive Analysis of Large datasets

Goal is to demonstrate the feasibility of doing interactive analysis of large data samples

• Analysis means production of analysis objects (e.g. histograms or ntuples) from HEP event data

• Interactive means that user request is processed in a few minutes

• Large is whatever is available and useful for physics studies– How large can we go?

Approach is to distribute processing over many nodes using the inherent parallelism of HEP data

• Assume each event can be independently processed

• Each node runs an independent job to processes a subset of the events

• Results from each job are merged into the overall result

Page 4: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 4ATLAS

ImplementationDIAL software provides a generic end-to-end framework for distributed analysis

• Generic means few assumptions about the data to be processed or application that carries out the application

– Experiment and user must provide extensions to the system

• Able to support a wide range of processing systems– Local processing using batch systems such as LSF, Condor and PBS

– Grid processing using different flavors of CE or WMS

• Provides friendly user interfaces– Integration with root

– Python binding (from GANGA)

– GUI for job submission and monitoring (from GANGA)

Mostly written in C++• Releases packaged 3-4 times/year

Page 5: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 5ATLAS

ComponentsAJDL

• Abstract job definition language

• Dataset and transformation (application + task)

• Job

• C++ interface and XML representation for each

Scheduler• Handles job processing

Catalogs• Hold job definition objects and their metadata

Web services

User interfaces

Monitoring and accounting

Page 6: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 6ATLAS

DatasetGeneric data description includes

• Description of the content (type of data)– E.g. reconstructed objects, electrons, jets with cone size 0.7,…

– Number of events and their ID’s

• Data location– Typically a list of logical file names

• List of constituent datasets– Model is inherently hierarchical

DIAL provides the following classes• Dataset – defines the common interface

• GenericDataset – provides data and XML representation

• TextDataset – embedded collection of text files

• SingleFileDataset – Data location is a single file

• EventMergeDataset – Collection of single file event datasets

Page 7: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 7ATLAS

Dataset (cont)Current approach for experiment-specific event datasets

• Add a GenericDataset subclass that is able to read experiment files and fill the appropriate data

• Use EventMergeDataset to describe large collections

• DIAL can carry out processing using the GenericDataset interface and implementation

Allowed for experiment to provide its own implementation of the Dataset interface

• If GenericDataset is not sufficient

• Processing components then require this implementation

Page 8: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 8ATLAS

TransformationA transformation acts on a dataset to produce another dataset

• Fundamental user interaction with the system

Transformation has two components• Application describes the action to take

• Task provides data to configure the application

Task is a collection of named text files

Application holds two scripts• run – Carries out transformation

– Input is dataset.xml and output is result.xml

– Has access to the transformation

• build_task – Creates transformation data from the files in the task– E.g. compiles to build library

– Build is platform specific

Page 9: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 9ATLAS

JobThe Job class hold the following data:

• Job definition: application, task, dataset and job preferences

• Status of the corresponding job– Initialized, running, done, failed or killed

• Other history information– Compute node, native ID, start and stop times, return code, …

Job provides interface to• Start or submit job

• Update the status and history information

• Kill job

• Fetch the result (output dataset)

Page 10: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 10ATLAS

Job (cont)Subclasses

• Provide connection to underlying processing system

• Implement these methods

• Currently support fork, LSF and Condor

ScriptedJob• Subclass that implements these methods by calling a script

• Scripts may be then written to support different processing systems – Alternative to providing subclass

• Added in release 1.20

Job copies• Jobs may be copied locally or remotely

• Copy includes only the base class:– All common data available but job cannot be started, updated or killed– Use scheduler for these interactions

Page 11: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 11ATLAS

SchedulerScheduler carries out processing

The following interface is defined• add_task builds the task

– Input: application, task

– Output: job ID

• submit creates and starts a job– Input: application, task, dataset and job preferences

– Output: job ID

• job(JobId) returns a reference to the job– Or to a copy if the scheduler is remote

• kill(JobId) to kill a job

Page 12: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 12ATLAS

Scheduler (cont)Subclasses implement this interface

• LocalScheduler– Use a single job type and queue to handle processing, e.g. LSF

• MasterScheduler– Splits input dataset

– Uses an instance of LocalScheduler to process subjobs

– Merges results from subjobs

Analysis service• Web service wrapper for Scheduler

– This service may run any scheduler with any job type

• WsClientScheduler subclass acts as a client to the service– Same interface for interacting with local fork, local batch or remote

analysis service

• Typical mode of operation is to use a remote analysis service

Page 13: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 13ATLAS

Scheduler (cont)

D atas et

B uild

A p p lic atio n

T as k

X fo rm atio n

Spli t

D atas et

D atas et

D atas et

P ro c e s s D atas et

D atas etM e rgeP ro c e s s D atas et

P ro c e s s D atas et

Typical structure of a job running on a MasterScheduler

Page 14: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 14ATLAS

Scheduler (cont)Scheduler hierarchy

• Plan to add a subclass to forward requests to a selected scheduler based on job characteristics

• This will make it possible to create a tree (or web) of analysis services

– Should allow us to scale to arbitrarily large data samples (limited only by available computing resources)

Page 15: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 15ATLAS

Scheduler (cont)

Tie r 1 o r 2

Tie r 2

gLite W M S

C lie nt

A na ly s is s e rv ic e

C E

A T LA S P ro d u c tio n s y s te m

F a rm

Tie r 1

Example of a possible scheduler tree for ATLAS

Page 16: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 16ATLAS

CatalogsDIAL provides support for different types of catalogs

• Repositories to hold instances of application, task, dataset and job

• Selection catalogs associate names and metadata with instances of these objects

• And others

MySql implementations• Populated with ATLAS transformations and Rome AOD datasets

Page 17: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 17ATLAS

User interfacesRoot

• Almost all DIAL classes are available at the root command line

• User can access catalogs, create job definitions, and submit and monitor jobs

Python interface available in PyDIAL• Most DIAL classes are available as wrapped python classes

• Work done by GANGA using LCG tools

GUI• There is a GUI that supports job specification, submission and

monitoring

• Provided by GANGA using the PyDIAL

Page 18: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 18ATLAS

Page 19: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 19ATLAS

Page 20: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 20ATLAS

StatusReleases

• Most of the previous is available in the current release (1.20)

• Release 1.30 expected next month

• Release with service hierarchy in January

Use in ATLAS• Ambition is to use DIAL to define a common interface for

distributed analysis with connections to different systems– See figure

– ATLAS Distributed analysis is under review

• Also want to continue with the original goal of providing a system with interactive response for appropriate jobs

• Integration with the new USATLAS PANDA project

Page 21: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 21ATLAS

Possible analysis service model for ATLAS

R O O T P Y T H O N

C atalo gs D IA L A S A T P R O D A S A R D A A S

LS F , C O N D O R LC G /gLite W M SA T P R O D D B

G U I andc o m m and l inec l ie nts

H igh le ve l s e rvic e sfo r c atalo ging andjo b s ubm is s io n andm o nito r ing

W o rklo adm anage m e nts ys te m s

AJ D L

s h S Q L g L ite

AJ D L

AJ D L

Page 22: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 22ATLAS

Status (cont)Use in other experiments

• Some interest but no serious use that I know of

• Current releases depend on ATLAS software but I am willing to build a version without that dependency

– Not difficult

• A generic system like this might be of interest to smaller experiments that would like to have the capabilities but do not have the resources to build a dedicated system

Page 23: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 23ATLAS

InteractivityWhat does it mean for a system to be interactive?

• Literal meaning suggests user is able to directly communicate with jobs and subjobs

• Most users will be satisfied if their jobs complete quickly and do not need to interact with subjobs or care if some other agent is doing so

– Important exception is the capability to a job and all its subjobs

Interactive (responsive) impose requirements on the processing system (batch, WMS, …)

• High job rates (> 1 Hz)

• Low submit-to-result job latencies (< 1 minute)

• High data input rate from SE to farm (> 10 MB/job)

Page 24: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 24ATLAS

Current resultsATLAS generated data samples for the Rome physics meeting in June

• Data includes AOD (analysis oriented data) that is a summary of the reconstructed event data

– AOD size is 100 kB/event

• All Rome AOD datasets are available in DIAL• An interactive analysis service is deployed at BNL using a special

LSF queue for job submission• The following plot shows processing times for datasets of various

sizes– LSF SUSY (green triangles) is a “real AOD analysis” producing

histograms– LSF big produces ntuples and extra time is for ntuple merging (not

parallelized)– Largest jobs take 3 hours to run serially and are processed in about 10

minutes with the DIAL interactive service

Page 25: DIAL: Distributed Interactive Analysis of Large Datasets

ATLAS

ADA AOD processing time using DIAL 1.20 29jun05

0

600

1200

1800

0 100 200 300

1000 events

Tim

e (

sec)

single job

parallel proc only

(single job)/10

BNL LSF big

BNL LSF small

BNL LSF SUSY

Page 26: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 26ATLAS

ConclusionsDIAL analysis service running at at BNL

• Serves clients from anywhere

• Processing done locally

• Rome AOD datasets available

• Robust and responsive since June

Next steps• Similar service running at UTA (using PBS)

• Service to select between these

• BNL service to connect to OSG CE (in place of local LSF)

• Integration with new ATLAS data management system

• Integration with PANDA

Page 27: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 27ATLAS

More informationDIAL home page:

http://www.usatlas.bnl.gov/~dladams/dial

ADA (ATLAS Distributed Analysis) home page:http://www.usatlas.bnl.gov/ADA

Current DIAL release:http://www.usatlas.bnl.gov/~dladams/dial/releases/1.20

Page 28: DIAL: Distributed Interactive Analysis of Large Datasets

D. Adams DOSAR meeting DIAL September 16, 2005 28ATLAS

ContributorsGANGA

• Karl Harrison, Alvin Tan

DIAL• David Adams, Wensheng Deng, Tadashi Maeno, Vinay

Sambamurthy, Nagesh Chetan, Chitra Kannan

ARDA• Dietrich Liko

ATPROD• Frederic Brochu, Alessandro De Salvo

AMI• Solveig Albrand, Jerome Fulachier

ADA• (plus those above), Farida Fassi, Christian Haeberli, Hong Ma,

Grigori Rybkine, Hyunwoo Kim