Open Science Grid: Linking Universities and Laboratories in National CyberInfrastructure
Paul Avery, University of Florida ([email protected])
SURA Infrastructure Workshop, Austin, TX
December 7, 2005
Bottom-up Collaboration: “Trillium”
Trillium = PPDG + GriPhyN + iVDGL
  PPDG: $12M (DOE) (1999 – 2006)
  GriPhyN: $12M (NSF) (2000 – 2005)
  iVDGL: $14M (NSF) (2001 – 2006)
~150 people with large overlaps between projects: universities, labs, foreign partners
Strong driver for funding agency collaborations
  Inter-agency: NSF – DOE
  Intra-agency: Directorate – Directorate, Division – Division
Coordinated internally to meet broad goals
  CS research, developing/supporting the Virtual Data Toolkit (VDT)
  Grid deployment, using VDT-based middleware
  Unified entity when collaborating internationally
Common Middleware: Virtual Data Toolkit
[Diagram: VDT build pipeline. Sources (CVS) are patched and assembled into GPT source bundles, then built, tested, and packaged on the NMI Build & Test facility (a Condor pool covering 22+ operating systems); VDT releases, drawing on many contributors, are distributed as Pacman caches, RPMs, and binaries.]
A unique laboratory for testing, supporting, deploying, packaging, upgrading, & troubleshooting complex sets of software!
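The pipeline above amounts to a fan-out build-and-test matrix: the same source bundles must build, test, and package cleanly on every supported platform before a release is cut. A minimal, purely illustrative Python sketch of that matrix logic follows; the platform list, component names, and helper functions are hypothetical placeholders, not the actual NMI or VDT build tooling.

```python
# Illustrative build/test/package matrix (hypothetical stand-ins,
# not the actual NMI/VDT build system).
from dataclasses import dataclass

PLATFORMS = ["rhel3-x86", "rhel3-ia64", "debian3.1-x86", "solaris9-sparc"]  # 22+ in reality
COMPONENTS = ["globus", "condor", "myproxy"]  # examples only


@dataclass
class Result:
    platform: str
    component: str
    built: bool
    tests_passed: bool


def build(component: str, platform: str) -> bool:
    """Stand-in for running the real build on a pool node."""
    return True


def run_tests(component: str, platform: str) -> bool:
    """Stand-in for the per-platform test suite."""
    return True


def release_matrix() -> list:
    """Build and test every component on every platform."""
    results = []
    for platform in PLATFORMS:
        for component in COMPONENTS:
            ok = build(component, platform)
            passed = run_tests(component, platform) if ok else False
            results.append(Result(platform, component, ok, passed))
    return results


if __name__ == "__main__":
    failures = [r for r in release_matrix() if not r.tests_passed]
    # Only cut and package a release when every cell of the matrix passes.
    print("ready to package" if not failures else f"{len(failures)} failing combinations")
```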
VDT Growth Over 3 Years (1.3.8 now)
[Chart: number of VDT components from Jan 2002 to Apr 2005 (y-axis 0 – 35), spanning the VDT 1.1.x, 1.2.x, and 1.3.x release series. Annotated milestones: VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.7 (switch to Globus 2.2); VDT 1.1.8 (first real use by LCG); VDT 1.1.11 (Grid3).]
www.griphyn.org/vdt/
Components of VDT 1.3.5
  Globus 3.2.1, Condor 6.7.6, RLS 3.0, ClassAds 0.9.7, Replica 2.2.4, DOE/EDG CA certs, ftsh 2.0.5, EDG mkgridmap, EDG CRL Update, GLUE Schema 1.0, VDS 1.3.5b, Java, Netlogger 3.2.4, Gatekeeper-Authz, MyProxy 1.11, KX509
  System Profiler, GSI OpenSSH 3.4, MonALISA 1.2.32, PyGlobus 1.0.6, MySQL, UberFTP 1.11, DRM 1.2.6a, VOMS 1.4.0, VOMS Admin 0.7.5, Tomcat, PRIMA 0.2, Certificate Scripts, Apache, jClarens 0.5.3, New GridFTP Server, GUMS 1.0.1
VDT Collaborative Relationships
[Diagram: the Virtual Data Toolkit sits between Computer Science research and science, engineering, and education communities. Techniques & software and requirements/prototyping & experiments flow in; deployment, feedback, and tech transfer flow out. Partner science, networking, and outreach projects: Globus, Condor, NMI, TeraGrid, OSG, EGEE, WLCG, Asia, South America, QuarkNet, CHEPREO, Digital Divide. Other linkages: work force, CS researchers, industry, U.S. Grids, international, outreach.]
Major Science Driver: Large Hadron Collider (LHC) @ CERN
  Search for origin of mass, new fundamental forces, supersymmetry, other new particles (2007 – ?)
  27 km tunnel in Switzerland & France
  Experiments: ATLAS, CMS, LHCb, ALICE, TOTEM
LHC: Petascale Global Science
  Complexity: millions of individual detector channels
  Scale: PetaOps (CPU), 100s of Petabytes (data)
  Distribution: global distribution of people & resources
  BaBar/D0 example (2004): 700+ physicists, 100+ institutes, 35+ countries
  CMS example (2007): 5000+ physicists, 250+ institutes, 60+ countries
CMS Experiment: LHC Global Data Grid (2007+)
[Diagram: tiered CMS data grid. The online system feeds the CERN computer center (Tier 0), which distributes data to national Tier 1 centers (USA, Korea, Russia, UK, ...), regional Tier 2 centers (e.g., UCSD, Caltech, U Florida), and Tier 3/Tier 4 sites (e.g., Maryland, Iowa, FIU) with physics caches and PCs; quoted link rates range from 150 – 1500 MB/s out of the online system to 2.5 – 10, 10 – 40, and >10 Gb/s between tiers.]
  5000 physicists, 60 countries
  10s of Petabytes/yr by 2008; 1000 Petabytes in < 10 yrs?
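A back-of-the-envelope check makes these figures concrete (illustrative arithmetic only; the link speeds and the "10s of Petabytes/yr" figure are the ones quoted above):

```python
# Back-of-the-envelope transfer-time arithmetic for the link speeds quoted above.

PETABYTE_BITS = 1e15 * 8  # 1 PB in bits (decimal petabytes)


def days_to_transfer(petabytes: float, gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `petabytes` over a link of `gbps` at a given efficiency."""
    seconds = petabytes * PETABYTE_BITS / (gbps * 1e9 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    # Moving 1 PB over a fully utilized 10 Gb/s link:
    print(f"1 PB at 10 Gb/s  : {days_to_transfer(1, 10):.1f} days")   # ~9.3 days
    # The same petabyte over a 2.5 Gb/s link:
    print(f"1 PB at 2.5 Gb/s : {days_to_transfer(1, 2.5):.1f} days")  # ~37 days
    # "10s of Petabytes/yr": sustained rate needed to move 30 PB in a year:
    rate_gbps = 30 * PETABYTE_BITS / (365 * 86400) / 1e9
    print(f"30 PB/yr needs a sustained {rate_gbps:.1f} Gb/s")         # ~7.6 Gb/s
```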
Grid3 and Open Science Grid
Grid3: A National Grid Infrastructure
  October 2003 – July 2005
  32 sites, 3,500 CPUs: universities + 4 national labs
  Sites in US, Korea, Brazil, Taiwan
  Applications in HEP, LIGO, SDSS, genomics, fMRI, CS
  www.ivdgl.org/grid3
Grid3 Lessons Learned
  How to operate a Grid as a facility
    Tools, services, error recovery, procedures, docs, organization
    Delegation of responsibilities (project, VO, service, site, …)
    Crucial role of the Grid Operations Center (GOC)
  How to support people-to-people relations
    Face-to-face meetings, phone cons, 1-1 interactions, mail lists, etc.
  How to test and validate Grid tools and applications
    Vital role of testbeds
  How to scale algorithms, software, process
    Some successes, but "interesting" failure modes still occur
  How to apply distributed cyberinfrastructure
    Successful production runs for several applications
http://www.opensciencegrid.org
Open Science Grid: July 20, 2005
  Production Grid: 50+ sites, 15,000 CPUs "present" (available, but not all at one time)
  Sites in US, Korea, Brazil (São Paulo), Taiwan
  Integration Grid: 10 – 12 sites
OSG Operations Snapshot
(30 days ending November 7, 2005)
OSG Participating Disciplines
  Computer Science: Condor, Globus, SRM, SRB; test and validate innovations: new services & technologies
  Physics: LIGO, nuclear physics, Tevatron, LHC; global Grid: computing & data access
  Astrophysics: Sloan Digital Sky Survey; CoAdd: multiply-scanned objects; spectral fitting analysis
  Bioinformatics: Argonne GADU project; BLAST, BLOCKS, gene sequences, etc.
  Dartmouth Psychological & Brain Sciences: functional MRI
  University campus: resources, portals, apps; CCR (U Buffalo), GLOW (U Wisconsin), TACC (Texas Advanced Computing Center), MGRID (U Michigan), UFGRID (U Florida), Crimson Grid (Harvard), FermiGrid (FermiLab Grid)
OSG Grid Partners
  TeraGrid
    "DAC2005": run LHC apps on TeraGrid resources
    TG Science Portals for other applications
    Discussions on joint activities: security, accounting, operations, portals
  EGEE
    Joint Operations Workshops, defining mechanisms to exchange support tickets
    Joint security working group
    US middleware federation contributions to core middleware (gLite)
  Worldwide LHC Computing Grid
    OSG contributes to LHC global data handling and analysis systems
  Other partners
    SURA, GRASE, LONI, TACC
    Representatives of VOs provide portals and interfaces to their user groups
Example of Partnership: WLCG and EGEE
OSG Technical Groups & Activities
Technical Groups address and coordinate technical areas
  Propose and carry out activities related to their given areas
  Liaise & collaborate with other peer projects (U.S. & international)
  Participate in relevant standards organizations
  Chairs participate in Blueprint, Integration and Deployment activities
Activities are well-defined, scoped tasks contributing to OSG
  Each Activity has deliverables and a plan, is self-organized and operated, and is overseen & sponsored by one or more Technical Groups
TGs and Activities are where the real work gets done
OSG Technical Groups (deprecated!)
  Governance: charter, organization, by-laws, agreements, formal processes
  Policy: VO & site policy, authorization, priorities, privilege & access rights
  Security: common security principles, security infrastructure
  Monitoring and Information Services: resource monitoring, information services, auditing, troubleshooting
  Storage: storage services at remote sites, interfaces, interoperability
  Support Centers: infrastructure and services for user support, helpdesk, trouble tickets
  Education / Outreach: training, interface with various E/O projects
  Networks (new): including interfacing with various networking projects
OSG Activities
  Blueprint: defining principles and best practices for OSG
  Deployment: deployment of resources & services
  Provisioning: connected to deployment
  Incident Response: plans and procedures for responding to security incidents
  Integration: testing, validating & integrating new services and technologies
  Data Resource Management (DRM): deployment of specific Storage Resource Management technology
  Documentation: organizing the documentation infrastructure
  Accounting: accounting and auditing use of OSG resources
  Interoperability: primarily interoperability between Grids
  Operations: operating Grid-wide services
OSG Integration Testbed: Testing & Validating Middleware
[Map: integration testbed sites in the US plus Brazil, Taiwan, and Korea]
Networks
Evolving Science Requirements for Networks (DOE High Performance Network Workshop)

Science Area | Today End2End Throughput | 5 Years End2End Throughput | 5-10 Years End2End Throughput | Remarks
High Energy Physics | 0.5 Gb/s | 100 Gb/s | 1000 Gb/s | High bulk throughput
Climate (Data & Computation) | 0.5 Gb/s | 160-200 Gb/s | N x 1000 Gb/s | High bulk throughput
SNS NanoScience | Not yet started | 1 Gb/s | 1000 Gb/s + QoS for control channel | Remote control and time-critical throughput
Fusion Energy | 0.066 Gb/s (500 MB/s burst) | 0.2 Gb/s (500 MB / 20 sec. burst) | N x 1000 Gb/s | Time-critical throughput
Astrophysics | 0.013 Gb/s (1 TB/week) | N*N multicast | 1000 Gb/s | Computational steering and collaborations
Genomics Data & Computation | 0.091 Gb/s (1 TB/day) | 100s of users | 1000 Gb/s + QoS for control channel | High throughput and steering
See http://www.doecollaboratory.org/meetings/hpnpw/
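The "Today" column can be sanity-checked against the data volumes quoted in parentheses; a short illustrative conversion (not part of the workshop table) is below:

```python
# Converting sustained data volumes to average throughput, matching the
# "Today" column above (decimal TB; arithmetic is illustrative).

def gbps(terabytes: float, seconds: float) -> float:
    """Average throughput in Gb/s for `terabytes` moved over `seconds`."""
    return terabytes * 1e12 * 8 / seconds / 1e9

WEEK = 7 * 86400
DAY = 86400

print(f"Astrophysics, 1 TB/week : {gbps(1, WEEK):.3f} Gb/s")  # ~0.013 Gb/s
print(f"Genomics,     1 TB/day  : {gbps(1, DAY):.3f} Gb/s")   # ~0.093 Gb/s (table quotes 0.091)
```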
UltraLight: Integrating Advanced Networking in Applications
  10 Gb/s+ network
  Caltech, UF, FIU, UM, MIT; SLAC, FNAL
  International partners
  Level(3), Cisco, NLR
  http://www.ultralight.org
Education, Training, Communications
iVDGL, GriPhyN Education/Outreach
  Basics: $200K/yr, led by UT Brownsville
  Workshops, portals, tutorials
  Partnerships with QuarkNet, CHEPREO, LIGO E/O, …
Grid Training Activities
  June 2004: First US Grid Tutorial (South Padre Island, TX)
    36 students, diverse origins and types
  July 2005: Second Grid Tutorial (South Padre Island, TX)
    42 students, simpler physical setup (laptops)
  Reaching a wider audience
    Lectures, exercises, video, on web
    Students, postdocs, scientists
    Coordination of training activities
    "Grid Cookbook" (Trauner & Yafchak)
    More tutorials, 3-4/year
    CHEPREO tutorial in 2006?
QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp
CHEPREO: Center for High Energy Physics Research and Educational Outreach
  Florida International University
  Physics Learning Center
  CMS research
  iVDGL Grid activities
  AMPATH network (S. America)
  Funded September 2003: $4M initially (3 years) from MPS, CISE, EHR, INT
Grids and the Digital Divide
  Background
    World Summit on the Information Society
    HEP Standing Committee on Inter-regional Connectivity (SCIC)
  Themes
    Global collaborations, Grids and addressing the Digital Divide
    Focus on poorly connected regions
  Workshops: Brazil (2004), Korea (2005)
Science Grid Communications
  Broad set of activities (Katie Yurkewicz)
  News releases, PR, etc.
  Science Grid This Week
  OSG Newsletter
  Not restricted to OSG
  www.interactions.org/sgtw
Grid Timeline
[Timeline chart, 2000 – 2007: project funding (GriPhyN $12M, PPDG $9.5M, iVDGL $14M, UltraLight $2M, CHEPREO $4M, DISUN $10M), first US-LHC Grid testbeds, VDT 1.0, Grid3 operations followed by OSG operations, Grid Summer Schools, Grid Communications, Digital Divide Workshops, LIGO Grid, and the start of the LHC in 2007.]
Future of OSG CyberInfrastructure
OSG is a unique national infrastructure for science
  Large CPU, storage and network capability crucial for science
  Supporting advanced middleware
  Long-term support of the Virtual Data Toolkit (new disciplines & international collaborations)
OSG currently supported by a "patchwork" of projects
  Collaborating projects, separately funded
Developing workplan for long-term support
  Maturing, hardening facility
  Extending facility to lower barriers to participation
  Oct. 27 presentation to DOE and NSF
OSG Consortium Meeting: Jan 23-25
University of Florida (Gainesville)
About 100 – 120 people expected
Funding agency invitees
Schedule
  Monday morning: applications plenary (rapporteurs)
  Monday afternoon: partner Grid projects plenary
  Tuesday morning: parallel sessions
  Tuesday afternoon: plenary
  Wednesday morning: parallel sessions
  Wednesday afternoon: plenary
  Thursday: OSG Council meeting
Disaster Planning / Emergency Response
Grids and Disaster Planning / Emergency Response
Inspired by recent events
  Dec. 2004 tsunami in Indonesia
  Aug. 2005 Hurricane Katrina and subsequent flooding
  (Quite different time scales!)
Connection of DP/ER to Grids
  Resources to simulate detailed physical & human consequences of disasters
  Priority pooling of resources for a societal good
  In principle, a resilient distributed resource
Ensemble approach well suited to Grid/cluster computing (see the sketch below)
  E.g., given a storm's parameters & errors, bracket likely outcomes
  Huge number of jobs required; embarrassingly parallel
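Purely as an illustration (the parameter names, values, and job-submission step are hypothetical, not taken from any OSG application), the sketch below shows the ensemble idea: perturb a storm's input parameters within their quoted errors and fan the resulting cases out as independent Grid jobs.

```python
# Illustrative ensemble generation for an embarrassingly parallel storm study.
# Parameter names, values, and the submit step are hypothetical placeholders.
import itertools
import random


def perturb(value: float, error: float, n: int) -> list:
    """Draw n samples of a parameter within its quoted error (Gaussian)."""
    return [random.gauss(value, error) for _ in range(n)]


def build_ensemble(n_per_param: int = 10) -> list:
    """Cross the perturbed parameters into independent simulation cases."""
    tracks = perturb(value=270.0, error=15.0, n=n_per_param)   # storm heading (deg)
    speeds = perturb(value=25.0, error=5.0, n=n_per_param)     # forward speed (km/h)
    surges = perturb(value=4.0, error=1.0, n=n_per_param)      # surge height (m)
    return [
        {"heading_deg": h, "speed_kmh": s, "surge_m": g}
        for h, s, g in itertools.product(tracks, speeds, surges)
    ]


if __name__ == "__main__":
    cases = build_ensemble()
    print(f"{len(cases)} independent jobs")  # 10 * 10 * 10 = 1000 cases
    # Each case would be submitted as its own Grid job; the resulting
    # ensemble of outputs brackets the likely outcomes of the storm.
```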
DP/ER Scenarios
  Simulating physical scenarios
    Hurricanes, storm surges, floods, forest fires
    Pollutant dispersal: chemical, oil, biological and nuclear spills
    Disease epidemics
    Earthquakes, tsunamis
    Nuclear attacks
    Loss of network nexus points (deliberate or side effect)
    Astronomical impacts
  Simulating human responses to these situations
    Roadways, evacuations, availability of resources
    Detailed models (geography, transportation, cities, institutions)
    Coupling human response models to specific physical scenarios
  Other possibilities
    "Evacuation" of important data to safe storage
DP/ER and Grids: Some Implications
DP/ER scenarios are not equally amenable to a Grid approach
  E.g., tsunami vs. hurricane-induced flooding
  Specialized Grids can be envisioned for very short response times
  But all can be simulated "offline" by researchers
  Other "longer term" scenarios
ER is an extreme example of priority computing
  Priority use of IT resources is common (conferences, etc.)
  Is ER priority computing different in principle?
Other implications
  Requires long-term engagement with DP/ER research communities (atmospheric, ocean, coastal ocean, social/behavioral, economic)
  Specific communities with specific applications to execute
  Digital Divide: resources to solve problems of interest to the 3rd World
  Forcing function for Grid standards?
  Legal liabilities?
Grid Project References
  Open Science Grid: www.opensciencegrid.org
  Grid3: www.ivdgl.org/grid3
  Virtual Data Toolkit: www.griphyn.org/vdt
  GriPhyN: www.griphyn.org
  iVDGL: www.ivdgl.org
  PPDG: www.ppdg.net
  CHEPREO: www.chepreo.org
  UltraLight: www.ultralight.org
  Globus: www.globus.org
  Condor: www.cs.wisc.edu/condor
  WLCG: www.cern.ch/lcg
  EGEE: www.eu-egee.org
Extra Slides
Grid3 Use by VOs Over 13 Months
CMS: “Compact” Muon Solenoid
Inconsequential humans (shown for scale)
LHC: Beyond Moore's Law
[Chart: estimated CPU capacity at CERN, 1998 – 2010, in K SI95 units (one 2 GHz Intel CPU ≈ 0.1 K SI95), y-axis 0 – 6,000 K SI95. LHC CPU requirements rise far above a Moore's Law (2000) extrapolation.]
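A quick unit check, using only the conversion quoted above, shows that the top of the chart corresponds to tens of thousands of such CPUs (illustrative sketch):

```python
# Converting the chart's K SI95 scale into equivalent 2 GHz Intel CPUs,
# using only the conversion factor quoted on the slide.
KSI95_PER_CPU = 0.1  # one 2 GHz Intel CPU ~= 0.1 K SI95


def cpus_needed(ksi95: float) -> int:
    return round(ksi95 / KSI95_PER_CPU)


print(cpus_needed(1000))  # 10,000 CPUs for 1,000 K SI95
print(cpus_needed(6000))  # 60,000 CPUs at the top of the chart's axis
```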
Grids and Globally Distributed Teams
  Non-hierarchical: chaotic analyses + productions
  Superimpose significant random data flows
Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
[Plot: galaxy cluster size distribution from Sloan data; number of clusters (1 to 100,000, log scale) vs. number of galaxies per cluster (1 to 100, log scale).]
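A small illustrative computation of the plotted quantity, the number of clusters at each cluster size (synthetic input; not the actual SDSS catalog processing or the GriPhyN virtual-data workflow):

```python
# Illustrative cluster-size distribution like the one plotted above
# (synthetic stand-in data, not the SDSS pipeline).
from collections import Counter
import random

# Synthetic stand-in for per-cluster galaxy counts derived from a survey catalog.
cluster_sizes = [max(1, int(random.paretovariate(1.8))) for _ in range(100_000)]

# Number of clusters at each size: the quantity on the plot's two log axes.
distribution = Counter(cluster_sizes)
for size in sorted(distribution)[:10]:
    print(f"{distribution[size]:>7d} clusters with {size} galaxies")
```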
The LIGO Scientific Collaboration (LSC) and the LIGO Grid
  LIGO Grid: 6 US sites + 3 EU sites (Cardiff and Birmingham in the UK, AEI/Golm in Germany)
  * LHO, LLO: LIGO observatory sites
  * LSC: LIGO Scientific Collaboration