TeraGrid Resources: Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Post on 23-Jan-2016

TeraGrid Resources: Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Diane Baxter, Ph.D.

San Diego Supercomputer Center

University of California, San Diego

The National TeraGrid

http://www.teragrid.org/

[Map of TeraGrid partner sites. Legend: Resource Provider (RP); Software Integration Partner; Grid Infrastructure Group (UChicago). Sites labeled: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC/ISI, UNC/RENCI, UW, LSU, U Tenn.]

A complex collaboration of over a dozen organizations working together to provide cyberinfrastructure that goes beyond what can be provided by individual institutions, to improve research productivity and enable breakthroughs not otherwise possible.

TeraGrid . . . .

•Deep - provides leadership class resources at 11 partner sites

•Wide - is an integrated, persistent computational resource for broad user communities

•Open - is an open scientific discovery infrastructure

•Is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research.

To be more specific, TeraGrid . . .

• Uses high-performance network connections (10-30 Gb/sec)

• Integrates high-performance computers; resources for data analysis, visualization, and storage; data collection tools, high-end experimental facilities; and supporting expertise around the country;

• Provides more than a petaflop of computing capability;

• Offers more than 30 petabytes of online and archival data storage, as well as systems to manage data acquisition and access; and

• Provides researchers access to over 100 discipline-specific databases.

What’s in it (TeraGrid) for me?

• An instrument that delivers high-end IT resources - computation, storage, visualization, and data/service

– A computational facility - over a petaflop in parallel computing capability

– A data storage and management facility - over 30 petabytes of storage (disk and tape), over 100 scientific data collections

– A high-bandwidth national data network

• Services: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources

• Access - without financial cost

– Research accounts allocated via peer review

– Startup and Education accounts automatic

TeraGrid Compute Power

[Figure: Computational Resources (size approximate - not to scale). Sites shown: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Tennessee, LONI/LSU. Slide courtesy of Tommy Minyard, TACC.]

TeraGrid Data Storage and Management

• Persistent storage on disk and tape

• Allocatable tape-based, geographically distributed storage systems for backups of critical data:

» IU (Indiana University)

» NCAR (National Center for Atmospheric Research)

» NCSA (National Center for Supercomputing Applications)

» SDSC (San Diego Supercomputer Center)

• Command-line usage with GridFTP, or through the File Manager tool in the TeraGrid User Portal

• GPFS-WAN (General Parallel File System Wide Area Network): ~1 petabyte

• IU Data Capacitor (1 PB of spinning disk for short-term data storage)

• Long term disk storage allocations

TeraGrid Architecture

[Diagram labels: access points (POPS, Science Gateways, User Portal, Command Line); TeraGrid Infrastructure (Network, Authorization, Accounting, …); resource providers RP 1, RP 2, RP 3; services shown at the RPs (Compute Service, Viz Service, Data Service; Network, Accounting, …).]

(Are your eyes glazing over?) Translation please!

Enter: Science Gateways

• A Science Gateway

– Enables scientific communities of users with a common scientific goal and vocabulary

– Has a common interface

– Leverages community investment

• Three common forms:

– Web-based portals

– Application programs running on users' machines but accessing services in TeraGrid

– Coordinated access points enabling users to move seamlessly between TeraGrid and other grids

Today, there are approximately 29 gateways using the TeraGrid

How do Gateways help?

• Make science more productive

– Researchers use same tools

– Complex workflows

– Common data formats

– Data sharing

• Bring TeraGrid capabilities to the broad science community

– Lots of disk space

– Lots of compute resources

– Powerful analysis capabilities

– A community-friendly interface to information and research tools

But it’s not just ease of use. What can scientists do that they couldn’t do previously?

• LEAD - access to radar data

• NVO - access to sky surveys

• OOI - access to sensor data

• PolarGrid - access to polar ice sheet data

• SIDGrid - analysis tools for social scientists

• GridChem - developing multiscale coupling

How would this have been done before gateways?

Gateways can enhance and support investments in other projects

• Increase access

– To instruments

• Increase capabilities

– To data analysis tools

• Improve workforce development

– For underserved populations, through broad access to learning resources

• Increase outreach

• Increase public awareness

– Public sees value in investments in large facilities

• Slice bread

Gateways Greatly Expand Access

• Almost anyone can investigate scientific questions using high-end resources

– Not restricted to those in research groups with allocations

– Gateways allow anyone with a web browser to explore

• Fosters new ideas, cross-disciplinary approaches

• Encourages students to experiment

• But Gateways are used in production too

– Significant number of papers resulting from gateways including GridChem, nanoHUB

– Scientists can focus on challenging science problems rather than challenging infrastructure problems

How do we develop a new gateway? Advanced support for Gateway Development

• Same peer review process used to request resources

– 30,000 CPUs

– Plus 6 months of help from a TG Gateway Team member

– Reviews are based on appropriate use of resources; science is not reviewed if already funded

• Petascale

• Multisite workflows

• Gateways

• Domain expertise

Support is Very Targeted

• Start with well-defined objectives

– Focus on efficient or novel use of national CI resources

• Minimum 0.25 FTE for months to a year

– Enough investment to really understand and help solve complex problems

• Must have commitment from PIs

– Want to make sure work is incorporated into production codes and gateways

• Good candidates for targeted support include:

– Large, high impact projects

– Ability to influence new communities

– Suggestions from NSF directorates on important projects

• Lessons learned move into training and documentation

When is a gateway most appropriate?

• Researchers using defined sets of tools in different ways

– Same executables, different input

• GridChem, CHARMM

– Creating multi-scale or complex workflows

– Shared datasets

• Common data formats

– National Virtual Observatory

– Earth System Grid

– Some groups have invested significant efforts already, e.g.:

•caBIG, extensive discussions to develop common terminology and formats

•BIRN, extensive data sharing agreements

• Difficult to access data/advanced workflows

– Sensor/radar input

•LEAD, GEON

Things you can do with the TeraGrid: Simulate cell membrane processes

• Work by Emad Tajkhorshid and James Gumbart, of University of Illinois Urbana-Champaign.

– Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport. Biophysical Journal 93:496-504 (2007).

– Results of the simulation may be seen at www.life.uiuc.edu/emad/TonB-BtuB/btub-2.5Ans.mpg

• Modeled mechanisms for transport of molecules through cell membrane.

• Used 400,000 CPU hours [45 processor-years] on systems at National Center for Supercomputing Applications, IU, Pittsburgh Supercomputing Center

Image courtesy of Emad Tajkhorshid, UIUC
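The bracketed conversion on this slide (400,000 CPU hours to roughly 45 processor-years) is plain arithmetic: one processor-year is one processor kept busy for every hour of a year. A minimal Python sketch, assuming a 365-day year; the function name is illustrative and not part of any TeraGrid tool:

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, processor busy around the clock


def cpu_hours_to_processor_years(cpu_hours: float) -> float:
    """Convert CPU hours to processor-years (one CPU running for one full year)."""
    return cpu_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    years = cpu_hours_to_processor_years(400_000)
    print(f"400,000 CPU hours is about {years:.1f} processor-years")
```

The result is a little over 45, consistent with the rounded figure quoted on the slide.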

Predict storms

• Hurricanes and tornadoes cause massive loss of life and damage to property

• TeraGrid supported spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed

– Major Goal: assess how well ensemble forecasting predicts thunderstorms, including the supercells that spawn tornadoes.

– Delivers “better than real time” prediction

– Used 675,000 CPU hours for the season

– Used 312 TB on HPSS storage at PSC

Slide courtesy of Dennis Gannon, IU, and LEAD Collaboration

Watch Polar Ice Caps Melt (PolarGrid)

• Cyberinfrastructure Center for Polar Science (CICPS)

– Experts in polar science, remote sensing and cyberinfrastructure

– Indiana, ECSU, CReSIS

• Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland

– Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes

http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v

Source: Geoffrey Fox

CY2007 Usage by Discipline

3.95B SUs delivered in CY2007

• Molecular Biosciences: 31%

• Chemistry: 17%

• Physics: 17%

• Astronomical Sciences: 12%

• Materials Research: 6%

• Chemical, Thermal Systems: 5%

• All 19 Others: 4%

• Earth Sciences: 3%

• Atmospheric Sciences: 3%

• Advanced Scientific Computing: 2%
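To put the percentages on this slide in absolute terms, each share can be multiplied by the 3.95B SU total. The sketch below is only a worked calculation from the slide's rounded numbers; the variable and function names are illustrative, and the per-discipline totals it prints are approximations, not reported figures.

```python
TOTAL_SUS = 3.95e9  # service units delivered in CY2007, as stated on the slide

# Percent shares as listed on the slide (rounded values)
SHARES = {
    "Molecular Biosciences": 31,
    "Chemistry": 17,
    "Physics": 17,
    "Astronomical Sciences": 12,
    "Materials Research": 6,
    "Chemical, Thermal Systems": 5,
    "All 19 Others": 4,
    "Earth Sciences": 3,
    "Atmospheric Sciences": 3,
    "Advanced Scientific Computing": 2,
}


def sus_by_discipline(total: float, shares: dict) -> dict:
    """Return the approximate SUs per discipline implied by each percent share."""
    return {name: total * pct / 100 for name, pct in shares.items()}


if __name__ == "__main__":
    for name, sus in sus_by_discipline(TOTAL_SUS, SHARES).items():
        print(f"{name:32s}{sus / 1e9:5.2f}B SUs")
```

The listed shares sum to 100%, so the per-discipline estimates add back up to the 3.95B total.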

Do you want to see more Gateway examples?

•Yes • No

Recent Gateways using TeraGrid Significantly

• SCEC

• SIDGrid

• CIG

SCEC using gateway to produce hazard map

•PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway

•Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years.

•High resolution map, significant CPU use

LEAD (portal.leadproject.org/)

• Simple enough an undergraduate can use it! http://wxchallenge.com/

• National Center for Supercomputing Applications (NCSA) and IU teamed up to support WxChallenge weather forecast competition. 64 teams, 1000 students, ~16,000 CPU hours on Big Red

• XBaya is available from http://www.collab-ogce.org/

NanoHub Harnesses TeraGrid for Education

Nanotechnology education

•Used in dozens of courses at many universities

• Teaching materials

• Collaboration space

• Research seminars

• Modeling tools

• Access to cutting-edge research software

Social Informatics Data Grid

• Heavy use of “multimodal” data.

– Subject might be viewing a video, while a researcher collects heart rate and eye movement data.

• Events must be synchronized for analysis; large datasets result

•Extensive analysis capabilities are not something that each researcher should have to create for themselves.

http://www.ci.uchicago.edu/research/files/sidgrid.mov

• Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others.

• SIDGrid enables a number of capabilities.

– Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact.

– Geographically distant researchers can collaborate on the analysis of the same data set.

– Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts.

– All researchers now have access to the highest quality computational resources

• SIDGrid uses TeraGrid resources for computationally intensive tasks such as media transcoding, algorithms for pitch analysis of audio tracks, and fMRI image analysis

• SIDGrid is unique among social science data archive projects

– Focused on streaming data which change over time

– Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously

• Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK

SIDGrid sidgrid.ci.uchicago.edu

TeraGrid Pathways Activities

• 2 Gateway components

– Adapt gateways for educational use by underrepresented communities

• GEON - SDSC, Navajo Tech

– Teach participants from underrepresented communities how to build gateways

• PolarGrid - IU, ECSU

Navajo Technical College and gateways

• Incorporating the use of gateways in their curricula

• GEON, GISolve areas of initial interest

Menu TG Resources and Services

•Computing – over a petaflop of computing power and growing

• Data

– Data storage facilities & management tools

– Scientific data collections

•Over 30 Science Gateways

•Remote visualization servers and software

• Technical Support

– Central point of contact for support of all systems

– Advanced Support for TeraGrid Applications (ASTA)

• Education and training events and resources

– K-12 Education

– Pathways

– Campus Champions

Human Connection: Your Campus Champion

• The Campus Champions program supports campus representatives as the local source of knowledge about high-performance computing opportunities and resources. Knowledge plus assistance will empower campus researchers, educators, and students to advance scientific discovery.

• Your campus will benefit from direct access to the TeraGrid and input to its staff, resource allocations awarded for campus use, and assistance in using those resources.

• TeraGrid will support the Campus Champion. See

– http://www.teragrid.org/eot/campuschamps.html

– To join the Campus Champions program, contact the TeraGrid Campus Champions Program Coordinator, at tgcc-help@teragrid.org.

Online Resources

•Online resources at www.teragrid.org

•TeraGrid User Portal for managing allocations and job flow

• Documentation

– Knowledge Base for quick answers to FAQs

– HPC University to increase general HPC knowledge

• Calendar of events including upcoming workshops and training

– Annual conference - TG09

• Arlington, VA

• June 22-26, 2009

TeraGrid: greater than the sum of its parts

•Leadership in cyberinfrastructure development, deployment and support

•Expertise in building national computing and data resources

• Leveraging extensive resources, expertise, R&D, and EOT

– Leveraging activities at other participant sites

– Learning from each other improves expertise of all TG staff

– Shared training, education, and outreach resources benefit all

• Simplified access to high-end resources

– Single unified allocations process

– Single point of contact for problem reporting

– Coordinated software environments

– Uniform access to heterogeneous resources to solve a single scientific problem

Would you like to learn more about getting a TeraGrid allocation?

Yes Not today

How does the Allocations process work?

• Startup allocations: for code development, experimentation with TeraGrid platforms, and application testing. Startup requests may total up to 200,000 service units (SUs) of computation, and up to 5 TB of disk and 25 TB of tape storage.

• Education allocations: for use in classroom instruction or training activities, with the same SU and storage limits as Startup allocations.

• Research allocations: require a detailed justification of resource usage. Requests are reviewed four times a year by the Resource Allocations Committee.

– National peer-review process

•allocates computational and data resources

•makes recommendations on allocation of advanced direct support services

•Currently awarding >1B Normalized Units of resources

– Principal investigator (PI) must be a researcher, educator, or postdoctoral researcher at a US academic or non-profit research institution.
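Before starting a Startup or Education request, it can help to check the estimate against the limits quoted above (200,000 SUs, 5 TB disk, 25 TB tape). The following is a minimal sketch of that check. The `Request` class and function name are hypothetical, and this is not how POPS itself validates submissions; it only encodes the three limits stated on the slide.

```python
from dataclasses import dataclass
from typing import List

# Startup/Education limits as stated on the slide
MAX_SUS = 200_000
MAX_DISK_TB = 5
MAX_TAPE_TB = 25


@dataclass
class Request:
    """Hypothetical container for a resource estimate (not a POPS data structure)."""
    service_units: int
    disk_tb: float
    tape_tb: float


def startup_limit_violations(req: Request) -> List[str]:
    """Return a message for each stated Startup/Education limit the request exceeds."""
    problems = []
    if req.service_units > MAX_SUS:
        problems.append(f"{req.service_units:,} SUs exceeds the {MAX_SUS:,} SU limit")
    if req.disk_tb > MAX_DISK_TB:
        problems.append(f"{req.disk_tb} TB disk exceeds the {MAX_DISK_TB} TB limit")
    if req.tape_tb > MAX_TAPE_TB:
        problems.append(f"{req.tape_tb} TB tape exceeds the {MAX_TAPE_TB} TB limit")
    return problems


if __name__ == "__main__":
    estimate = Request(service_units=250_000, disk_tb=2, tape_tb=10)
    issues = startup_limit_violations(estimate)
    if issues:
        print("Consider a Research allocation instead:")
        for issue in issues:
            print(" -", issue)
    else:
        print("Estimate fits within the Startup/Education limits.")
```

An estimate that exceeds any of these limits points toward the peer-reviewed Research allocation path described above.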

Go to the POPS page - https://pops-submit.teragrid.org

Create a POPS Login

Indicate that you are “New” to the TeraGrid

Indicate this is a “Start-up” Request

Select Startup or Educational

Fill out PI information

Skip Co-PIs info

Fill out info on your project

Fill out info on your funding

Estimate your computing need (reasonably)

When ready, upload your CV and Submit!

Acknowledgements

• This work is made possible by the dedicated efforts of the TeraGrid staff. In particular, slides came from Scott Lathrop, Craig Stewart, John Towns, Dane Skow, Daphne Siefert-Herron, Vickie Lynch, David Hart (Indiana Dave), David Hart (California Dave), Fran Berman, Nancy Wilkins-Diehr, Laura McGinnis and probably others.

• The Grid Infrastructure Group management of the TeraGrid is funded by NSF grant 0503697.

• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by NSF grant 331480. Marcus Christie and Suresh Marru of the Extreme! Computing Lab contributed the LEAD graphics.

• The ChemBioGrid Portal is developed under the leadership of IU Professor Dr. Geoffrey C. Fox and Dr. Marlon Pierce and funded via the Pervasive Technology Labs (supported by the Lilly Endowment, Inc.) and the National Institutes of Health grant P20 HG003894-01.

• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.

Thank you!

•Questions?
