
Page 1

UT Grid Project

Jay Boisseau, Texas Advanced Computing Center

SURA Grid Application Planning & Implementations Workshop

December 7, 2005

Page 2

Outline

• Overview
  – Vision
  – Strategy
  – Approach

• Current Project Status, Near-Term Goals
  – UT Grid production compute resources:
    • Roundup
    • Rodeo
  – Interfaces to production resources:
    • Grid User Portal
    • Grid User Node
  – Tools to support resources:
    • GridPort
    • GridShell
    • Metascheduling Prediction Services

• Future Work and Plans

Page 3

UT Grid Vision: A Powerful, Flexible, and Simple Virtual Environment for Research & Education

The UT Grid project vision is to create a cyberinfrastructure for research and education in which people can develop and test ideas, collaborate, teach, and learn through applications that seamlessly harness the diverse campus compute, visualization, storage, data, and instrument resources as needed, from their personal systems (PCs) and interfaces (web browsers, GUIs, etc.).

Page 4

UT Grid: Develop and Provide a Unique, Comprehensive Cyberinfrastructure…

The strategy of the UT Grid project is to integrate…
  – common security/authentication
  – scheduling and provisioning
  – aggregation and coordination

…diverse campus resources…
  – computational (PCs, servers, clusters)
  – storage (local HDs, NASes, SANs, archives)
  – visualization (PCs, workstations, displays, projection rooms)
  – data collections (sci/eng, social sciences, communications, etc.)
  – instruments & sensors (CT scanners, telescopes, etc.)

…from ‘personal scale’ to terascale…
  – personal laptops and desktops
  – department servers and labs
  – institutional (and national) high-end facilities

Page 5

…That Provides Maximum Opportunity & Capability for Impact in Research, Education

…into a campus cyberinfrastructure…
  – evaluate existing grid computing technologies
  – develop new grid technologies
  – deploy and support appropriate technologies for production use
  – continue evaluation and R&D on new technologies
  – share expertise, experiences, software & techniques

…that provides simple access to all resources…
  – through web portals
  – from personal desktop/laptop PCs, via custom CLIs and GUIs

…to the entire community for maximum impact on…
  – computational research in application domains
  – educational programs
  – grid computing R&D

Page 6

Add Services Incrementally, Driven By User Requirements

Page 7

Texas Two-Step: Hub & Spoke Approach

• Deploying a P2P campus grid requires overcoming two trust issues:
  – grid software: reliability, security, and performance
  – each other: trusting others not to abuse one’s resources

• An advanced computing center presents an opportunity to build a centrally managed grid as a step toward a P2P grid:
  – it already has trust relationships with users
  – so, while facing both trust issues, install grid software centrally first:
    • create centrally managed services
    • create spokes from the central hub
  – then, once the grid software is trusted:
    • show usage and capability data to demonstrate the opportunity
    • show policies and procedures to ensure fairness
    • negotiate spokes among willing participants

Page 8

UT Grid: Logical View

• Integrate a set of resources (clusters, storage systems, etc.) within TACC first

[Diagram: a single hub labeled “TACC Compute, Vis, Storage, Data” (actually spread across two campuses)]

Page 9

UT Grid: Logical View

• Next add other UT resources using the same tools and procedures

[Diagram: the TACC hub (“TACC Compute, Vis, Storage, Data”) with spokes to ACES Cluster, ACES PCs, and ACES Data]

Page 10

UT Grid: Logical View

• Next add other UT resources using the same tools and procedures

[Diagram: the TACC hub with spokes to ACES Cluster, ACES PCs, ACES Data, two GEO Clusters, and GEO Data]

Page 11

UT Grid: Logical View

• Next add other UT resources using the same tools and procedures

[Diagram: the TACC hub with spokes to ACES Cluster, ACES PCs, ACES Data, two GEO Clusters, GEO Data, PGE Cluster, PGE Data, PGE Instrument, BIO Data, and BIO Instrument]

Page 12

UT Grid: Logical View

• Finally, negotiate connections between spokes for willing participants to develop a P2P grid.

[Diagram: the same hub-and-spoke layout, now with peer-to-peer connections added between the spokes]

Page 13

Enhancing Grid Computing R&D and Deployment Expertise for UT and for IBM

• Benefits for IBM
  – Increased knowledge of diverse grid user and application requirements in universities
  – Access to new software technologies developed for UT Grid
  – Early awareness of new distributed & grid computing R&D opportunities
  – Exposure to & expertise in a variety of grid technologies, open source & commercial, which can be shared internally
  – Experience gained from maintaining a large distributed production grid
  – Collaboration with UT in conducting new distributed & grid computing R&D activities, including publications and proposals
  – Exposure among TACC’s collaborators and peers for expertise in grid deployment services and capabilities

Page 14

Enhancing Grid Computing R&D and Deployment Expertise for UT and for IBM

• Benefits for UT Austin
  – greater access to all resources by the entire community
  – more effective utilization of existing and future resources
  – unique capabilities presented by access, aggregation, and coordination for research and education
  – enhanced collaborative capabilities among researchers, and among teachers & students

• Additional Benefits for TACC
  – increased expertise in grid deployment issues
  – early awareness of new distributed & grid computing R&D opportunities
  – platform for conducting new distributed & grid computing R&D activities

Page 15

Enhancing Grid Computing R&D and Deployment Expertise for UT and for IBM

• Benefits for TACC Partners
  – UT Grid-supported technologies being integrated into TeraGrid: GridPort/user portal, GridShell/user node, etc.
  – Expertise being developed in scheduling will be used in TeraGrid
  – UT Grid developments will be used:
    • in TIGRE and SURA Grid
    • by TACC partners in the UT System, HiPCAT, the U.S., and Latin America
    • by TACC industrial partners

• Benefits for the Community
  – UT Grid is producing IBM developerWorks articles
  – UT Grid R&D will produce professional papers (and proposals) in Year 2

Page 16

TACC Grid Technology & Deployment Activities Provide Synergy Through Tech Transfer

• UT Grid
  – creating new tools for integrating compute, vis, storage, and data across campus, from ‘personal scale’ to terascale
  – will exchange tools and experiences with TeraGrid & TIGRE to advance both and be interoperable with each

• TeraGrid
  – will utilize & promote UT Grid user portal & user node technologies, and scheduling & workflow results
  – will provide grid visualization and data collection services to UT Grid, benefiting TACC and IBM

• TIGRE
  – will utilize and promote UT Grid results and expertise to other state institutions, including industry
  – will provide additional experience with UT Grid technologies from users across the state, helping to refine the technologies

Page 17

UT Grid Compute Resources

• PCs and workstations
  – Roughly 1/2 are Windows on Intel/AMD and 1/3 are Macs
  – Most of the rest are Linux on Intel/AMD

• Networks of PCs and workstations
  – Roundup: United Devices-managed network of PCs
    • Non-dedicated, heterogeneous compute resources across campus
    • Some managed by TACC, ITS, or other departments; some individually managed
    • Windows, Linux & Mac desktop PCs
  – Rodeo: Condor-managed network of PCs
    • Dedicated & non-dedicated, heterogeneous compute resources
    • Some managed by TACC, ITS, or other departments; some individually managed
    • Linux, Windows & Mac PCs, plus some workstations

• Clusters
  – Lonestar: 1024-processor Linux cluster at TACC
  – Wrangler: 656-processor Linux cluster at TACC
  – Longhorn: 128 processors in 4-way IBM p655 nodes at TACC
  – Other smaller clusters at TACC
  – Various department/lab clusters from 4 to 128+ processors will be included
  – Resources have different resource managers (LSF, PBS, SGE); the sketch below illustrates this heterogeneity

• High-end servers
  – Longhorn: IBM system with 32 POWER4 processors, 128 GB memory
  – Maverick: Sun system with 64 dual-core UltraSPARC IV processors, 512 GB memory
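
The last cluster bullet is the crux of the integration problem: each resource manager has its own submit command and conventions. A minimal Python sketch of the per-scheduler dispatch a grid layer must hide; the cluster-to-manager assignments are illustrative assumptions, since the slides do not state which manager runs where:

    import subprocess

    # Map each resource manager to its native submit command. Flag handling
    # is deliberately minimal; real submissions add queues, limits, etc.
    SUBMIT_COMMANDS = {
        "lsf": ["bsub"],   # LSF reads the job script from stdin
        "pbs": ["qsub"],   # PBS takes the script path as an argument
        "sge": ["qsub"],   # SGE also uses qsub, with different options
    }

    # Illustrative assignments only; not the actual UT Grid configuration.
    CLUSTERS = {
        "lonestar": "lsf",
        "wrangler": "sge",
        "dept-cluster": "pbs",
    }

    def submit(cluster: str, script_path: str) -> None:
        """Dispatch a job script to the cluster's native resource manager."""
        manager = CLUSTERS[cluster]
        cmd = SUBMIT_COMMANDS[manager]
        if manager == "lsf":
            with open(script_path) as f:   # equivalent to: bsub < script.sh
                subprocess.run(cmd, stdin=f, check=True)
        else:
            subprocess.run(cmd + [script_path], check=True)

Hiding this table behind one interface is exactly what the GridShell and metascheduling work described later aims to do.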

Page 18

Interfaces and Tools for these Resources

• For a broad, diverse campus community, access must be easy and from local resources
  – Users access the Grid User Portal with a standard web browser
    • The Grid User Portal submits to Rodeo via SOAP
      – A UT Grid Condor Web Services layer was developed to facilitate this
      – The Condor portlet is part of the GridPort 4 release
    • The Grid User Portal submits to Roundup via hosted applications
  – Users access the Grid User Node with SSH
    • The Grid User Node submits to Rodeo via GridShell (see the sketch after this list)
      – GridShell provides a command-line interface through a shell façade
      – It abstracts the user from the underlying grid technology and complexity
      – It submits to a specific resource, or determines the most appropriate resource using catalog services
    • The Grid User Node submits to Roundup
      – Batch job submission is supported via GridShell
      – A CLI is provided for submitting hosted application jobs
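
For orientation, here is roughly what these interfaces hide when a serial job goes to the Rodeo Condor pool: a vanilla-universe submit description handed to condor_submit. The submit-file syntax is standard Condor; the executable and file names are placeholders, not UT Grid's actual generated job:

    import subprocess
    import tempfile

    # A generic vanilla-universe Condor submit description. The job
    # details below are placeholders for illustration only.
    SUBMIT_DESCRIPTION = """\
    universe   = vanilla
    executable = my_analysis
    arguments  = input.dat
    output     = job.out
    error      = job.err
    log        = job.log
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue
    """

    def submit_to_condor() -> None:
        """Write a submit file and hand it to condor_submit."""
        with tempfile.NamedTemporaryFile("w", suffix=".sub",
                                         delete=False) as f:
            f.write(SUBMIT_DESCRIPTION)
            path = f.name
        subprocess.run(["condor_submit", path], check=True)

    if __name__ == "__main__":
        submit_to_condor()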

Page 19

Accessing UT Grid Compute Resources: Hosted User Nodes & Portals

[Diagram: a user reaches compute resources through the Grid User Portal or a Grid User Node (Windows, Mac, Linux); PC grid resources are driven via Condor and United Devices, while HPC, storage, and visualization resources are accessed via GRAM/GridFTP. A sketch of a GRAM-style launch follows below.]
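
The GRAM/GridFTP arrows correspond to Globus Toolkit services of that era. As a rough illustration of a pre-WS GRAM job launch using the toolkit's globus-job-run client; the gatekeeper hostname is a placeholder, since UT Grid's actual contact strings are not given in these slides:

    import subprocess

    GATEKEEPER = "gatekeeper.example.utexas.edu"  # placeholder hostname

    def run_remote_hostname() -> str:
        """Run /bin/hostname on the remote resource via GRAM."""
        result = subprocess.run(
            ["globus-job-run", GATEKEEPER, "/bin/hostname"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

A valid GSI proxy (created with grid-proxy-init) is assumed to exist before the call.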

Page 20

Current Status & Near-Term Goals

Page 21

Roundup

[Diagram: a UT Grid user reaches Roundup through the Grid User Portal or a Grid User Node (Windows, Mac, Linux); the United Devices Grid MP server (“Frio”) coordinates desktop PCs contributed by TACC, ITS, Computer Science (CS), the College of Engineering (ENG), the College of Communication (COC), and the College of Fine Arts]

Page 22

Roundup: Current Status

• Roundup is a production UT Grid resource
  – Production system with over 1000 PCs distributed across campus
  – Automated account request and creation
  – Production-level consulting
  – Comprehensive user guide
  – Training classes offered at TACC

• Client downloads available for Windows, Mac, and Linux from the UT Grid web site

• Hosted applications installed
  – HMMER, BLAST, POV-Ray, Coorset, etc.

Page 23

Roundup: Next Steps

• Near-term goals (next few months):
  – Support additional production users
  – GSI integration
    • United Devices Grid MP supports multiple authentication schemes
    • Need to add a support extension for GSI
  – Evaluate the MP Insight data-warehousing and report-generation package
  – Test and evaluate the screen saver feature and start development of a UT-specific screen saver
  – Investigate possible solutions for sharing jobs across grids
    • Multi-grid agents or job forwarding

Page 24

Rodeo

[Diagram: a UT Grid user reaches Rodeo through the Grid User Portal or a Grid User Node (Windows, Mac, Linux); Rodeo comprises the TACC, ICES, and Computer Science (CS) Condor pools, each with its own collector/negotiator]

Page 25

Rodeo: Current Status

• Rodeo is a production UT Grid resource
  – Production system with over 500 PCs, made up of dedicated clusters and PCs distributed across campus
  – Automated account request and creation
  – Production-level consulting
  – Comprehensive user guide
  – Training classes offered at TACC

• Client downloads available for Windows, Mac, and Linux from the UT Grid web site

Page 26

Rodeo: Current Status

• Currently the largest production users are:
  – UTCS (Department of Computer Sciences)
  – Graeme Henkelman (Chemistry)
  – Wolfgang Bangerth (Geosciences)

Page 27

Rodeo: Next Steps

• Near-term goals (next few months):
  – Continue supporting production users
  – Expand the number of CPUs available to users
  – Explore ‘hosted’ application possibilities

Page 28

UT Grid Interfaces

• UT Grid will provide two types of interfaces:
  – a web-based Grid User Portal (GUP), accessible via any web browser
  – customized desktop environments for Linux, Windows, and Macintosh PCs that act as Grid User Nodes (GUNs)

• Users can access all UT Grid resources using either the GUP or GUNs managed by UT Grid.

• They will also be able to download the necessary software to build and host their own customized grid user portals, or to convert their personal desktop systems into grid user nodes.

Page 29

Motivation for a Grid User Portal

• Lower the barrier to entry for novice users

• Provide a centralized grid account management interface
  – Easy access to multiple resources through a single interface

• Offer a simple GUI for complex grid computing capabilities
  – Provide simple alternatives to the CLI for advanced users

• Present a “Virtual Organization” view of the grid as a whole

• Increase the productivity of UT researchers: do more science!

Page 30

Grid User Portal: Current Status

• Added Roundup and Rodeo as production resources on the TACC User Portal

• Developed JSR-168-compliant portlets that can:
  – View information on resources within UT Grid, including status, load, jobs, queues, etc.
  – View network bandwidth and latency between systems, and aggregate capabilities for all systems
  – Submit user jobs (see the sketch after this list)
  – Manage files across systems, and move/copy multiple files between resources with transfer-time estimates

• These portlets contribute to the GridPort 4 release

• TACC is leading the portal effort in TeraGrid
  – This will impact the TACC User Portal and therefore UT Grid
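
The slides say the portal submits to Rodeo through a SOAP-based Condor Web Services layer, but do not document its interface. The following Python sketch with the suds SOAP library therefore uses a hypothetical WSDL URL and operation name purely to show the shape of such a call:

    # Hypothetical sketch of a portal-side SOAP call to a Condor
    # web-services layer like the one the slides describe. The WSDL URL
    # and the submitJob operation are invented for illustration; the real
    # UT Grid service interface is not given in this presentation.
    from suds.client import Client  # requires the 'suds' SOAP library

    WSDL_URL = "https://utgrid.example.edu/condor-ws?wsdl"  # placeholder

    def submit_via_soap(executable: str, arguments: str) -> str:
        """Submit a job through the (hypothetical) Condor web service."""
        client = Client(WSDL_URL)
        # Operation name and parameters are assumptions, not the real API.
        job_id = client.service.submitJob(executable, arguments)
        return str(job_id)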

Page 31

Grid User Portal: Next Steps

• Near-term plans (next few months):
  – Complete the new TACC User Portal (TUP), based on GridPort 4 and including UT Grid resources
    • UT Grid capabilities fully integrated into the TUP
    • Ability to customize the environment to expose only UT Grid resources
  – Migrate portlets to WebSphere to ensure compatibility(?)
  – Grid account management portlets

Page 32

Grid User Node

• Current capabilities of the Linux GUN:
  – Information queries about grid resources
  – Job submission
    • Parallel computing jobs (dedicated cluster resources)
    • Serial computing jobs (Roundup, Rodeo)
  – Monitoring job status
  – Reviewing job results
  – Resource brokering based on ClassAd catalogs
  – GSI-enabled GridFTP file transfer (see the sketch below)
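
As a rough illustration of that last capability, a GridFTP transfer with the Globus Toolkit's globus-url-copy client; the hostname and paths are placeholders, and a valid GSI proxy (grid-proxy-init) is assumed:

    import subprocess

    # Placeholders; not real UT Grid endpoints.
    SRC = "gsiftp://storage.example.utexas.edu/data/input.dat"
    DST = "file:///tmp/input.dat"

    def fetch_with_gridftp() -> None:
        """Copy a remote file to local disk over GSI-authenticated FTP."""
        subprocess.run(["globus-url-copy", SRC, DST], check=True)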

Page 33

Grid User Node: Current Status

• Production Linux GUN; Windows and Mac GUNs in development
  – Need to decide whether to do GUI versions

• Submission to Roundup and Rodeo

• “On-demand” glide-in of UD resources into the Condor pool

• Integrated “real-life” applications:
  – NAMD
  – SNOOP3D
  – HMMER
  – POV-Ray

Page 34

Grid User Node: Next Steps

• Near-term goals:
  – Investigate distribution of the GUN software stack using VDT
  – Prepare and present a training class before the end of the year

Page 35

GridPort: Current Status

• GridPort 4 developed and released this month

• Available to UT Grid and national users as a grid portal toolkit to download and use to create user and application portals

• Based on JSR-168-compliant portlets

• Leveraged technology and knowledge from UT Grid to create the Condor and comprehensive file-transfer portlets

Page 36

GridPort: Next Steps

• Near-term goals:
  – GridPort 4 will be part of the TeraGrid User Portal, to be in production in 1Q06
  – Preparing a demonstration and lab for a grid workshop in Venezuela in April 2006
  – Continue evolution of GridPort to include:
    • Advanced job submission functionality
    • Advanced user customization, and more
  – Investigating a demo portal based on WebSphere

Page 37

GridShell: Current Status

• GridShell developed and deployed on UT Grid (and TeraGrid)

• Available to UT Grid users in the GUN software stack

• Able to submit jobs first to a spoke (a departmental cluster) and then to the hub (TACC) if not enough resources are available at the spoke (see the sketch below)

• Collaborating with researchers at PSC and Caltech, we have extended GridShell to provide a single job submission interface (Condor) to the heterogeneous clusters on the TeraGrid
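
The spoke-then-hub fallback in the third bullet can be sketched as follows. This is a schematic of the policy only, not GridShell's actual implementation; the capacity and submit callables are placeholders:

    from typing import Callable

    def submit_with_fallback(
        job: dict,
        spoke_free_cpus: Callable[[], int],
        submit_to_spoke: Callable[[dict], str],
        submit_to_hub: Callable[[dict], str],
    ) -> str:
        """Return the job ID from whichever site accepted the job."""
        if spoke_free_cpus() >= job.get("cpus", 1):
            return submit_to_spoke(job)   # enough room on the spoke
        return submit_to_hub(job)         # otherwise escalate to the hub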

Page 38

GridShell: Next Steps

• Near-term goals:
  – Create a public download site for GridShell 1.0 (the current version is available only to NSF TeraGrid and UT Grid users)
  – Continue evolution of GridShell to include:
    • Support for submitting jobs to clusters behind firewalls
  – Need to hire an additional developer and develop partnerships with external developer teams (e.g., GridPort)

Page 39

MPS: Current Status

• Goal is to reduce job turnaround times by optimizing resource selection for data movement, queue wait times, and performance (see the sketch below)

• Components
  – Prediction services
    • Execution times, queue wait times, file transfer times
  – Resource brokering
    • Immediately select resources based on job requirements, including predictions
  – Metascheduling
    • Schedule complex jobs such as workflows
    • Workload management
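
A minimal sketch of the selection step, assuming the three prediction services exist as callables: predicted turnaround is modeled as transfer + queue wait + execution time, which is the optimization target the slide states. The resource names and callables are placeholders; the real MPS components are web services:

    from typing import Callable, Iterable

    def pick_resource(
        job: dict,
        resources: Iterable[str],
        predict_transfer: Callable[[dict, str], float],
        predict_queue_wait: Callable[[dict, str], float],
        predict_execution: Callable[[dict, str], float],
    ) -> str:
        """Choose the resource with the smallest predicted turnaround."""
        def turnaround(res: str) -> float:
            return (predict_transfer(job, res)
                    + predict_queue_wait(job, res)
                    + predict_execution(job, res))
        return min(resources, key=turnaround)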

Page 40

MPS: Next Steps

• Near-term goals:
  – Create prediction web services
    • Based on existing R&D
    • Predictions based on:
      – historical information
      – learning algorithms
      – scheduling simulations
  – Integrate with Condor-G (see the sketch below)
    • Provide additional information about clusters:
      – the clusters themselves (e.g., number of CPUs)
      – the jobs submitted to the clusters
    • Add call-outs so the matchmaker can request predictions
    • The user requests minimizing predicted response time as part of ranking
  – Demonstration with Graham Carey (ICES / UT Austin)
    • Selecting which cluster at TACC to use
    • Matchmaking capability using MPS to rank systems based on the user's request
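
For context, Condor-G jobs target Globus gatekeepers through the grid universe, and the matchmaker's rank expression is the natural hook where a prediction could enter. Below is a generic Condor-G submit description as a Python string; the PredictedResponseTime attribute is hypothetical, standing in for the call-out the slide proposes, and the gatekeeper contact is a placeholder:

    # Generic Condor-G submit description showing where a prediction could
    # enter matchmaking. "PredictedResponseTime" is a hypothetical machine
    # attribute, not a standard Condor one; the contact string is fake.
    SUBMIT_DESCRIPTION = """\
    universe      = grid
    grid_resource = gt2 gatekeeper.example.utexas.edu/jobmanager-lsf
    executable    = my_analysis
    output        = job.out
    log           = job.log
    # Higher rank wins, so negate the predicted response time.
    rank          = -PredictedResponseTime
    queue
    """

    if __name__ == "__main__":
        print(SUBMIT_DESCRIPTION)

Because Condor's matchmaker prefers higher rank values, negating the predicted response time makes the fastest-turnaround machine win, which is exactly the user preference the slide describes.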

Page 41

Future Plans and Work

• Complete MPS work and integrate campus clusters with TACC clusters
  – First, just ‘upload’ larger jobs
  – Later, share jobs among spokes

• Integrate Maverick as a remote visualization resource in UT Grid
  – Overlapping software stack with PCs
  – Remote vis software downloads (incl. file transfer)
  – Vis portal

• Integrate campus data collections into UT Grid
  – Hosted collections in DBs
  – WebSphere Information Integrator?

• Prepare NSF proposal?