iCER User Meeting 3/26/10


DESCRIPTION

iCER User Meeting, 3/26/10. Agenda: What's new in iCER (Wolfgang); What's new in HPCC (Bill); results of the recent cluster bid; discussion of buy-in (costs, scheduling); other. What's New in iCER: new iCER website at http://icer.msu.edu, part of VPRGS, with News and Showcased Projects.

TRANSCRIPT

Page 1: iCER User Meeting

iCER User Meeting

3/26/10

Page 2: iCER User Meeting

Agenda

• What's new in iCER (Wolfgang)
• What's new in HPCC (Bill)
• Results of the recent cluster bid
• Discussion of buy-in (costs, scheduling)
• Other

Page 3: iCER User Meeting

What’s New in iCER

Page 4: iCER User Meeting

New iCER Website

• Part of VPRGS
  – News
  – Showcased Projects
  – Supported Funding
  – Recent Publications

http://icer.msu.edu

Page 5: iCER User Meeting

User Dashboard

• Common portal to user resources
  – FAQ
  – Documentation
  – Forums
  – Research Opportunities
  – Known Issues

http://wiki.hpcc.msu.edu

Page 6: iCER User Meeting

Current Research Opportunities

• NSF Postdoc Fellowships for Transformative Computational Science using CyberInfrastructure
• Website
  – Proposals
  – Classes
  – Seminars
  – Papers
  – Jobs

http://wiki.hpcc.msu.edu

Page 7: iCER User Meeting

Postdoc Matching

• 50/50 match from iCER for a postdoc on large grant proposals (multi-investigator, interdisciplinary)
• Currently only three matches picked up
  – Titus Brown
  – Scott Pratt
  – Eric Goodman
• Several other matches promised, but the grants are not decided yet
• More opportunities!

Page 8: iCER User Meeting

Personnel

• New Hire!
• Eric McDonald
  – System Programmer
  – Partnership with NSCL (Alex Brown et al.)

Page 9: iCER User Meeting

IGERT Grant Proposal

• Interdisciplinary graduate education in high-performance computing & science
• Big Data
• Leads:
  – Dirk Colbry
  – Bill Punch

Page 10: iCER User Meeting

BEACON

• NSF STC
  – Funded, starting in June
  – $5M/year for 5 years
• New joint space with iCER & HPCC
  – First floor BPS
  – Former BPS library space

Page 11: iCER User Meeting
Page 12: iCER User Meeting
Page 13: iCER User Meeting
Page 14: iCER User Meeting
Page 15: iCER User Meeting
Page 16: iCER User Meeting

What’s New in HPCC

Page 17: iCER User Meeting

Graphics Cluster

32-node cluster
• 2 × quad-core 2.4 GHz
• 18 GB RAM
• Two NVIDIA M1060
• No InfiniBand (Ethernet only)

Page 18: iCER User Meeting

Result of a Buy-in

• 21 of the nodes were purchased with funds from users
• Can be used by any HPCC user

Page 19: iCER User Meeting

Each NVIDIA Tesla M1060:
• Number of streaming processor cores: 240
• Frequency of processor cores: 1.3 GHz
• Single-precision peak floating-point performance: 933 gigaflops
• Double-precision peak floating-point performance: 78 gigaflops
• Dedicated memory: 4 GB GDDR3
• Memory speed: 800 MHz
• Memory interface: 512-bit
• Memory bandwidth: 102 GB/sec
• System interface: PCIe
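
For context (this derivation is not on the slide): the single-precision figure is consistent with the core count and clock, 240 cores × 1.296 GHz × 3 floating-point operations per core per cycle (a multiply-add plus an extra multiply) ≈ 933 gigaflops, while the double-precision figure reflects the much smaller number of double-precision units on this generation of GPU, roughly 30 × 1.296 GHz × 2 ≈ 78 gigaflops.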

Page 20: iCER User Meeting

Example Script

#!/bin/bash -login
# one core on a gfx10 (GPU) node, one hour of walltime
#PBS -l nodes=1:ppn=1:gfx10,walltime=01:00:00
# use the GPU reservation and request one GPU
#PBS -l advres=gpgpu.6364,gres=gpu:1

cd ${PBS_O_WORKDIR}
module load cuda
myprogram myarguments
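
As a usage sketch (the file name is hypothetical, not from the slides): if the script above were saved as gpu_job.sh, it would be submitted with the standard PBS commands

qsub gpu_job.sh
qstat -u $USER

where qsub queues the job and qstat shows its status.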

Page 21: iCER User Meeting

CELL Processor

• 2 PlayStation 3s
• Running Linux
• For experimenting with CELL
• dev-cell08 and test-cell08 (see the web for more details)

Page 22: iCER User Meeting

Green Restrictions

• The machine Green is still up and running, especially after some problematic memory was removed
• Mostly replaced by AMD fat nodes
• On April 1st, it will be reserved for jobs requesting 32 cores (or more) and/or 250 GB of memory (or more); a sketch of such a request follows this list
• The hope is to help people running larger jobs
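
A minimal sketch of a qualifying request, assuming standard PBS/TORQUE resource syntax; the node layout, memory figure, and walltime are illustrative, and the exact node specification for Green may differ:

#!/bin/bash -login
# request 32 cores and 250 GB of memory, the new minimums for Green
#PBS -l nodes=1:ppn=32,mem=250gb,walltime=04:00:00

cd ${PBS_O_WORKDIR}
myprogram myarguments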

Page 23: iCER User Meeting

HPCC Stats

• Ganglia (off the main web page, under Status) is back and working. It gives you a snapshot of the current state of the system.
• We are nearly done with a database of all jobs run that can be queried for all kinds of information. It should be up in the next couple of weeks.

Page 24: iCER User Meeting

Cluster Bid Results

Page 25: iCER User Meeting

How it was done

• HPCC submitted a Request for Quotes for a new cluster system.
• Targeted:
  – Performance vs. power as the main concern
  – InfiniBand
  – 3 GB of memory per core
  – Approximately $500K of cluster

Page 26: iCER User Meeting

Results

• Received 13 bids from 8 vendors
• Found 3 options that were suitable for the power, space, cooling, and performance we were looking for
• Looking for some guidance from you on a number of issues

Page 27: iCER User Meeting

Choice 1: InfiniBand Config

Two ways to configure InfiniBand:
• A series of smaller switches configured in a hierarchy (leaf switches)
• One big switch (director)
• Leaf switches are cheaper, but harder to expand (requires reconfiguration), with more wires and more points of failure
• A director is more expandable and convenient, but more expensive

Page 28: iCER User Meeting

Choice 2: Buy-in Cost

• Buy-in cost could reflect just the cost of the compute nodes themselves, with HPCC providing the infrastructure (switches, wires, racks, etc.)
• Buy-in cost could reflect the total hardware cost
• Obviously, subsidizing costs means cheaper buy-in costs but fewer general nodes

Page 29: iCER User Meeting

Remember

• HPCC is still subsidizing costs, even if the hardware is not subsidized
• We still must buy air-conditioning equipment, OS licenses, MOAB (scheduling) licenses, and software licenses (not to mention salaries and power)
• Combined, “other” hardware will run to about $75K
• The scheduler is about $100K for 3 years

Page 30: iCER User Meeting

Some Issues

• 1 node = 8 cores; 1 chassis = 4 nodes
• Buy-in will be at the chassis level (32 cores)

Page 31: iCER User Meeting

For 1024 cores

Vendor/config    Total   Per node/subsidized   Per node/full
Dell/leaf        $418K   $2,278 ($9,112)       $3,260 ($13,040)
HP/leaf          $460K   $2,482 ($9,928)       $3,594 ($14,376)
Dell/director    $523K   $2,278 ($9,112)       $4,086 ($16,344)
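
The figures in parentheses appear to be per-chassis prices: using 1 chassis = 4 nodes (32 cores) from the previous slide, 4 × $2,278 = $9,112 and 4 × $3,260 = $13,040 for Dell/leaf, and the other rows work out the same way.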

Page 32: iCER User Meeting

Scheduling

• We are working on some better scheduling methods. We think they have promise and would be very useful to the user base.
• For the moment, it will be the Purdue model: we guarantee buy-in users access to their nodes within 8 hours of a request. There is still a one-week maximum run time (though that can be changed).