Developing a Cluster Strategy for NPACI All Hands Meeting Panel Feb 11, 2000 David E. Culler Computer Science Division University of California, Berkeley http://www.cs.berkeley.edu/~culler


Developing a Cluster Strategy for NPACI

All Hands Meeting Panel

Feb 11, 2000

David E. Culler

Computer Science Division

University of California, Berkeley

http://www.cs.berkeley.edu/~culler


2/11/2000 NPACI Clusters 2

UCB Millennium Cluster of Clusters

• x86+Myrinet platforms w/ GbE inter-networking

• Distributed ownership, allocation, and management

[Figure: clusters inter-networked by Gigabit Ethernet (GbE). Cluster labels: PIII-X 64x4; PII 8x2 (five clusters); PIII 32x2; DLIB; PII; PIII; ½ TB. Group labels: Ninja, Math, Bio, CE, Physics, Astro. Network labels: NTON, Internet-2, SuperNet. Other labels: Mobile Svcs, Kiosks, NOW.]


Vineyard Cluster Architecture

• Distributed resource utilization and management in a “Vineyard” of Clusters.

[Figure: layered cluster software stack. Labels: Applications / Services (ISPACE/Kiosks); MPI; VEXEC; PBS; REXEC; I/O; Mgmt / Monitoring; TOOLS; NT / Linux (2.2.x), Stride Scheduler; VIA / GM, GbE, Multicast; Rootstock Distribution.]
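The slide names a Stride Scheduler at the OS layer but does not show how it works. As a rough illustration only, here is a minimal sketch of the textbook stride-scheduling idea: each client holds tickets, its stride is inversely proportional to its tickets, and the client with the lowest pass value runs next. The function name, the `STRIDE1` constant, and the client names are assumptions for this sketch, not the Millennium implementation.

```python
import heapq

STRIDE1 = 1 << 20  # assumed large constant used to derive integer strides


def stride_schedule(tickets, quanta):
    """Return the order in which clients run over `quanta` time slices.

    `tickets` maps a (hypothetical) client name to a positive ticket count.
    """
    # Heap entries are (pass, name, stride); the lowest pass runs next.
    heap = []
    for name, count in sorted(tickets.items()):
        stride = STRIDE1 // count
        heap.append((stride, name, stride))
    heapq.heapify(heap)

    order = []
    for _ in range(quanta):
        pass_value, name, stride = heapq.heappop(heap)
        order.append(name)
        # Advance the client's pass by its stride and requeue it.
        heapq.heappush(heap, (pass_value + stride, name, stride))
    return order
```

With tickets in a 3:1 ratio, the client holding three tickets should receive roughly three of every four time slices.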


Clusters “own” HPC


Fundamental Advantages of Clusters

• Cost

• Performance

• Performance / Cost

• Track leading edge of market technology

• Incremental scalability

• Availability

• Tremendous I/O performance

• Wide-Area Network performance

– competitive internal network performance too

• Allow specialization of networked services


Fundamental Challenges

• Management

• Complete system on every node

– need scalable administration

• Incremental scalability & availability =>

– heterogeneity

– some parts inoperable at any time

• The Cluster projects are making great progress in this area

– e.g., Millennium Rootstock

• Cluster tools are what you want for managing the desktops across your department


CS&E HPC hampered by “self-centered” usage model

• Have my own application for my studies

• Want the entire machine to myself

• Want it now

• Think “services”

• Think “software”

• The value is in your application.

• Make it a service and make it available to the scientific community.

• Put it on a cluster to deliver results 24x7x52


Example: TCAD Simulation Service

• star formation simulation

• earthquake simulations

• phylogeny, BLAST, ...

• http://cuervo.eecs.berkeley.edu/Volcano/


Extreme Example

• UCB Millennium / NOW has delivered 70 CPU years!

• Simple special case, but ...

• Engineered for portability, adaptability, availability


What should NPACI do?

To be relevant:

• become a “Center of Expertise” for clusters

• draw expertise toward the center for ease of dissemination

• facilitate and encourage building clusters among the partners

• invest in an interesting cluster “close to home”

– cheap! Graft Millennium

• invest in people to understand the implications

To lead:

• Pioneer widespread computational science and engineering services

• InfiniBand


from e-commerce to


Technical Backup Slides


Rootstock Mechanics

[Figure: a Cluster System Distribution Center connected over an IP network to a cluster. Labels: cluster stock (build, os, drvrs, mill SW, os mods); leased builds; CS; CA; N; Cluster.]

1. Cluster Stock

- Rootstock build pages

- Full Current Linux

- all fixes and pckgs

- SSL, SSH

- Cluster Drivers

- Cluster System Layers

- rexec, mpe, pbs

- Optional SW ($)

- Cluster Kernel Mods

2. Make the CS “graft”

- specify IP address

- pckg removes

- dhcp, dns, nis, ...

- sanity check and build

- resolv.conf, /etc/hosts, ...

- constructs cluster build (lease)

- download CS build floppy

3. CS power-on build

- xfer and localize DT

- add local admin scripts

- node build floppy

4. Node power-on build

- local stock from CS

5. Cluster Update button (future)

- 2nd dialtone, CF engine, rolling update


REXEC / VEXEC

• Components

– rexecd, rexec & vexecd

[Figure: four rexecd daemons (Node A, Node B, Node C, Node D), vexecd (Policy A), vexecd (Policy B), and the rexec client, all on the Cluster IP Multicast Channel. Example labels: "% rexec -n 2 -r 3 indexer"; "minimum $"; “Nodes AB”; "run indexer on Nodes AB at 3 credits/min".]
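The example on this slide can be read as: the user asks for 2 nodes at a rate of 3 credits/min, a vexecd applying a "minimum $" policy suggests Nodes A and B, and rexec runs indexer there. A minimal sketch of that selection step follows, assuming each node advertises a current price; the `Node` type, the price values, and both function names are hypothetical and are not the REXEC API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    price: float  # hypothetical current going rate, in credits/min


def select_min_cost(nodes, n):
    """Hypothetical 'minimum $' policy: pick the n cheapest nodes."""
    if n > len(nodes):
        raise ValueError("not enough nodes to satisfy the request")
    # Sort by price, breaking ties by name so the choice is deterministic.
    return sorted(nodes, key=lambda node: (node.price, node.name))[:n]


def plan_run(nodes, n, rate, command):
    """Describe where `command` would run and at what bid rate."""
    chosen = select_min_cost(nodes, n)
    return {
        "command": command,
        "nodes": [node.name for node in chosen],
        "rate": rate,  # credits/min the user is willing to pay
    }


# Illustrative prices only; the slide gives no per-node prices.
cluster = [Node("A", 1.0), Node("B", 2.0), Node("C", 5.0), Node("D", 4.0)]
plan = plan_run(cluster, n=2, rate=3, command="indexer")
```

A different vexecd policy (Policy B in the figure) would simply swap in another selection function.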


Computational Economy

• Market-based approach to resource allocation

– Optimizes for user value

[Figure: labels: Apps (Value); Economic F.E.; API; Access Modules; Resource Managers; Time Share; Batch Queue; Resources.]
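The slide does not specify the allocation rule. One simple way to make a market-based allocator concrete is to give each application a share of a resource in proportion to the rate it bids, so that applications valuing the resource more receive more of it. The sketch below assumes that rule; the function name and the application names are hypothetical.

```python
def proportional_shares(bids):
    """Split one resource among apps in proportion to their bid rates.

    `bids` maps a (hypothetical) app name to its bid in credits/min.
    Returns a mapping from app name to its fraction of the resource.
    """
    if any(rate < 0 for rate in bids.values()):
        raise ValueError("bids must be non-negative")
    total = sum(bids.values())
    if total <= 0:
        raise ValueError("at least one positive bid is required")
    return {app: rate / total for app, rate in bids.items()}


# Illustrative bids: indexer bids 3 credits/min, a simulation bids 1.
shares = proportional_shares({"indexer": 3, "simulation": 1})
```

Under this rule a user can raise or lower a bid to trade cost against turnaround time, which is one reading of "optimizes for user value".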