meeting the htc demand with diagrid and xsede · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700...

33
Meeting the HTC Demand with Diagrid and XSEDE Kimberley Dillman Research Programmer / Purdue XSEDE Campus Champion Purdue University email: [email protected]

Upload: others

Post on 22-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Meeting the HTC Demand with Diagrid and XSEDE

Kimberley Dillman

Research Programmer / Purdue XSEDE Campus Champion

Purdue University

email: [email protected]

Page 2: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

What “HTC” resources do we have at Purdue?

Page 3: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

HTC “Traditional” Methods

• Community Cluster “standby” queues

• Diagrid

• Open Science Grid (OSG)

Page 4: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Community Cluster “standby” Queues

• Must “purchase” at least one node to gain access to any “community cluster” with the exception of Radon (recycled cluster)

• Allows access to any/all the CPUs in the cluster for up to 4 hours when idle (not in use by “owner”)

• Jobs in this queue have lower priority than jobs in “owner” queues

• Can run serial jobs and “share” nodes with other jobs or “claim” the entire node

Page 5: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

DiaGrid

• A large, high-throughput, distributed computing system

• Operated by Rosen Center (RCAC)

• Uses Condor to manage jobs and resources

• Good for running serial computations on a large number of processors

• Utilizing idle cycles

• Including all Purdue clusters, lab computers, department computers, desktop, totaling 50,000+ cores

• Purdue leading a partnership of 10 campuses and institutions

Page 6: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Diagrid Partners

Page 7: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

1,500

4,000

6,100 7,700

22,000

30,000

42,000

50,600

-

10,000

20,000

30,000

40,000

50,000

60,000

2004 2005 2006 2007 2008 2009 2010 2011

Growth of DiaGrid 2004-2011 (number of cores)

25

70

115 115

163

145

276

8

27

50 60

85 79

89

5 11

16 13 18 20

15 4

11 19 18 16 16 13

0

50

100

150

200

250

300

2005 2006 2007 2008 2009 2010 2011

DiaGrid User Count

Unique users Unique PIs

Unique PI depts Fields of Science

Slide Courtesy of Carol Song - Purdue

Page 8: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

0

5

10

15

20

25

20042005

20062007

20082009

20102011

Mil

lion

s

Jobs

Hours

Slide Courtesy of Carol Song - Purdue

Page 9: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Open Science Grid (OSG)

• Some Current Users at Purdue

– CMS Tier 2

– NEES (via NEESHub)

• Methods of access

– “traditional” VO submit host

– “submit” capability enabled within a HUBZero hub (i.e. NEES)

Page 10: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Job Priority Order

• Node “owner” (highest priority)

• Cluster “standby” queue (second priority)

• Diagrid/OSG (lowest priority)

Page 11: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Why isn’t the “traditional” method good enough?

• Not all jobs are “serial” or are “short enough” in duration to fit well into a “cycle scavenging” mode of operation

• Not all scientists are or want to be “computer experts” – They don’t want to have to know or understand the

different methods of job submission and syntax (i.e. no command line)

– They just want to get their science done!

• Users usually don’t care where or how the job runs…just that it does so successfully…

Page 12: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Enter….Diagrid Hub

• What is it?

– Access to the Diagrid Pool of resources plus a dedicated queue on the Hansen cluster

– GUI “front end” that “hides” the details of job submission from the user

• What applications are available?

– Blast

– R

– CryoEM image processing

• Who uses it?

Page 13: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Science-as-a-Service

Slide Courtesy of Carol Song - Purdue

Page 14: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Bioinformatics – BLASTer for large scale sequence alignment

J. Andrew DeWoody, Nick Marra, Forestry & Natural Resources

• Using Blaster to annotate assembly of gene sequences (50,534 contigs) from E51K Illumina in study of gene evolution

• 8 days in the lab less than 3 hours on DiaGrid

• Blaster has completed 1 million search jobs (equivalent to searches of tens of millions of sequences against public and custom databases)

• Currently 44 research users of Blaster • April, Sept campus wide presentations • Nov. presentation to Coll. Pharmacy

Slide Courtesy of Carol Song - Purdue

Page 15: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

R for everyone – Data analysis, parameter sweeps, parallel applications

• Beta release of SubmitR in October 2012 • A single interface allowing users to access Condor and cluster resources • 1.1 M processor hours has been used by R applications from DiaGrid • Community forum on November 28 (21 researchers attended)

Slide Courtesy of Carol Song - Purdue

Page 16: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

DiaGrid is also for sharing scientific tools – CryoEM 3D Reconstruction

• Powerful tool created by research group in Biology • Adapted to DiaGrid hub to share with the larger research

community • Still in development, prototype tested with small class in 2012

Wen Jiang, Biology

Slide Courtesy of Carol Song - Purdue

Page 17: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Diagrid Hub – Latest Stats..

• SubmitR (via diagrid-a queue on Hansen cluster)

– Total Jobs: 1013

– Total Wall Processor Hours: 2,574,001

• Blaster

– Total Jobs: 258,961

– Total Wall Processor Hours: 1,092,839

• CryoEM

– Used in “beta mode” in small class settings

– Being integrated with Pegasus (workflow)

Page 18: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Diagrid Hub – What’s next..

• Enhancements to existing tools

• Add new tools

• Pegasus in now available in HUBZero…use it to manage large job workflows…

Page 19: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Where do you go when you need more?

Page 20: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

XSEDE (and what is available)

• Large supercomputers

• Medium clusters

• Viz resources

• Diagrid

• OSG

Page 21: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

XSEDE Resources for HTC

• Diagrid (aka Purdue Condor Pool)

– No longer available via XSEDE allocations after July 2013

• Open Science Grid

– XSEDE has a submit host for login and submit access to OSG

• Some TACC resources

– Stampede and Lonestar have special “submit scripts” that can aid users with “bundling” and submitting serial jobs so that they “look like” large parallel jobs to the system

Page 22: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Purdue Condor Pool Usage

19048203 SUs Total

Page 23: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

OSG Usage

4800264 SUs Total

Page 24: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Diagrid/XSEDE/OSG Connectivity Diagram

Purdue

University of

Louisville

University of

Nebraska Lincoln

Wisconsin

Purdue Calumet

Indiana University

University of Notre

Dame

Indiana State

University

IPFW

Purdue University

North Central

XSEDE

OSG

XSEDE Submitter

Purdue Submitter

OSG/XSEDE Submitter

Page 25: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

The Campus Champions Program!

XSEDE has another “resource” to help Campuses help their users….

Page 26: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Campus Champion Program

• Kay Hunt, Purdue, coordinates the program

• Launched in 2007

• More than 100 member campuses today

26

Page 27: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Current Campus Champion Institutions (unclassified) – 73

Current Campus Champion Institutions (EPSCoR states) 45

Current Campus Champion Institutions (Minority Serving Institutions)--11

Current Campus Champion Institutions (both EPSCoR and MSI) – 8

Total Number of Campus Champion Institutions Overall -- 137

Campus Champion Institutions

March 8, 2013

Page 28: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

28

XSEDE Support of Champions

• Support provided from across XSEDE

• Champions provided monthly training and updates

• XSEDE staff as liaisons for Champions

• Champions provided with start-up account

• Champions are members of User Services team

• Forum for sharing and interactions

• Access to information on usage by their users

• Waive registrations for annual XSEDE Conference

Page 29: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

29

Campus Champion Role

• Raise awareness locally

• Provide training

• Get users started with access quickly

• Represent needs of local community

• Provide feedback to improve services

• Attend annual XSEDE conference

• Share campus training and education materials

• Build community among Champions

Page 30: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

30

Strategies of Success

• CI Days events on campus

• Act as regional source of information

• Champions co-present at conferences to promote and grow the program

• Champion focused sessions at SC’xx and XSEDE’xx

• Significant increase in number of new XSEDE users

• Large number of under-represented institutions have joined

Page 31: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

XSEDE Campus Bridging Vision

• Provide software and training tools for interoperation with XSEDE infrastructure

• Make better use of the nation’s aggregate CI resources

• Campus Bridging is a set of tools, techniques, and consulting

• Tools for doing this:

– Installers

– Documentation & training

– Ability to contribute community resources for greater good

31

Page 32: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

The “program” from a Champion’s perspective…

• A Champion’s role is to assist a user in determining the “best” resources to further their “science” including resources other than XSEDE (including local resources if available)

• Champions “network” with each other to ask and answer questions about topics that benefit their local institution and user base even if it is not directly related to XSEDE resources and services

• Champions serve as an important “resource” to XSEDE by providing “insight” into the needs and problems encountered by themselves and their users

Page 33: Meeting the HTC Demand with Diagrid and XSEDE · 3/12/2013  · 271,500 4,000 6,100 10,000 7,700 22,000 30,000 42,000 50,600 -20,000 30,000 40,000 50,000 60,000 2004 2005 2006 2007

Questions?