
State of HCC 2012

Dr. David R. Swanson, Director, Holland Computing Center

Nature Communications, July 17, 2012

Nebraska Supercomputing Symposium 2012

HCC CPU Hour Usage 2012

Nebraska Supercomputing Symposium 2012

Zeng (Quant Chem)    4.5M
Starace (AMO Phys)   2.7M
Rowe (Climate)       2.0M
NanoScience          6.4M
Comp Bio             3.0M
Comp Sci             1.7M
Physics              0.7M
Mech E               0.4M

High Performance Computing

• http://t2.unl.edu/status/hcc-status
• Xiao Zeng, Chemistry, UNL (prior slide)
• DFT and Car-Parrinello MD
• HPC – tightly coupled codes
• Requires expensive low-latency local network (InfiniBand)
• Requires high-performance storage (Panasas, Lustre)
• Requires highly reliable hardware
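"Tightly coupled" means every process exchanges data with the others at each step, so interconnect latency sets the pace. A minimal sketch of that pattern using mpi4py (an illustrative assumption; the production DFT/MD codes named above are compiled packages, not Python):

# Tightly coupled sketch: each iteration needs a global reduction, so every
# rank waits on the network every step. Run with e.g. `mpirun -n 64 python step.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
local_energy = float(comm.Get_rank())        # stand-in for a per-rank partial result

for step in range(100):                      # each MD/SCF step needs a global sum
    total = comm.allreduce(local_energy, op=MPI.SUM)
    local_energy = total / comm.Get_size()   # feed the global value back in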

Nebraska Supercomputing Symposium 2012

Eureka! A Higgs! (or at least something currently indistinguishable)

• "I think we have it. We have discovered a particle that is consistent with a Higgs boson." – CERN Director-General Rolf Heuer

Nebraska Supercomputing Symposium 2012

US CMS Tier2 Computing

Nebraska Supercomputing Symposium 2012

Compact Muon Solenoid (CMS) – Large Hadron Collider (5.5 mi)

Nebraska Supercomputing Symposium 2012

CMS Grid Computing Model

Nebraska Supercomputing Symposium 2012

Eureka! A Higgs! (or at least something currently indistinguishable)

• Ca. 50 PB of CMS data in entirety
• Over 1 PB currently at HCC’s “Tier2”, 3500 cores
• Collaboration at many scales
  – HCC and Physics Department
  – Over 2700 scientists worldwide
  – International Grid Computing Infrastructure
  – Data grid as well
  – UNL closely linked to KU, KSU physicists via a jointly hosted “Tier3”

Nebraska Supercomputing Symposium 2012

Data Intensive HTC

• Huge database
• Requires expensive high-bandwidth wide area network (DWDM fiber)
• Requires high-capacity storage (HDFS, dCache)
• HTC – loosely coupled codes
• Requires hardware
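In contrast to HPC, these workloads are loosely coupled: each task reads its own chunk of data and never talks to the others, so capacity and wide-area bandwidth matter more than latency. A schematic example of the pattern (file names are hypothetical):

# Loosely coupled HTC sketch: independent per-file tasks with no communication
# between them, so they can run on any free core anywhere on the grid.
from concurrent.futures import ProcessPoolExecutor

def analyze(event_file):
    # placeholder per-file analysis, e.g. counting records in one data chunk
    with open(event_file) as f:
        return sum(1 for _ in f)

if __name__ == "__main__":
    files = [f"events_{i:04d}.dat" for i in range(1000)]   # hypothetical inputs
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(analyze, files))
    print("processed", len(results), "independent chunks")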

Nebraska Supercomputing Symposium 2012

Outline

• HCC Overview
• New User report
• HCC-Go
• Moving Forward (after break)
  – Next purchase
  – It’s the Data, stupid…
  – Other Issues

Nebraska Supercomputing Symposium 2012

Outline

• New User report
• HCC-Go
• Moving Forward (next section)
  – Next purchase (motivation)
  – New Communities
  – PIVOT
  – It’s the Data, stupid…

Nebraska Supercomputing Symposium 2012

HOLLAND COMPUTING CENTER OVERVIEW

Nebraska Supercomputing Symposium 2012

HCC @ NU

• Holland Computing Center has a University-wide mission to
  – Facilitate and perform computational and data-intensive research
  – Engage and train NU researchers, students, and other state communities
  – This includes you!
  – HCC would be delighted to collaborate

Nebraska Supercomputing Symposium 2012

Computational Science – 3rd Pillar

Experiment – Theory – Computation/Data

Nebraska Supercomputing Symposium 2012

Lincoln Resources

• 10 staff
• Red
• Sandhills
• 5,000 compute cores
• 3 PetaBytes storage in HDFS

Nebraska Supercomputing Symposium 2012

Sandhills “Condominium Cluster”

• 44 nodes × 32-core, 128 GB, IB
• Lustre (175 TB)
• Priority Access
  – $HW + $50/month
  – 4 groups currently
• SLURM
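Jobs on Sandhills go through SLURM, so the basic workflow is an sbatch script that requests resources from the cluster. A minimal sketch follows; the partition name, walltime, and executable are illustrative assumptions, not HCC's actual settings:

# Sketch: build a minimal SLURM batch script and hand it to sbatch.
import subprocess, textwrap

script = textwrap.dedent("""\
    #!/bin/bash
    # Sandhills nodes are 32-core with 128 GB RAM; partition name is assumed.
    #SBATCH --job-name=demo
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=32
    #SBATCH --mem=120G
    #SBATCH --time=04:00:00
    #SBATCH --partition=batch
    srun ./my_app
""")

with open("job.sh", "w") as f:
    f.write(script)

subprocess.run(["sbatch", "job.sh"], check=True)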

Nebraska Supercomputing Symposium 2012

Omaha Resources

• 3 Staff
• Firefly
• Tusker
• 10,000 compute cores
• 500 TB storage
• New offices soon: 158J PKI

Nebraska Supercomputing Symposium 2012

Tusker

• 106 × 64 = 6,784 cores
• 256 GB/node
• 2 nodes w/ 512 GB
• 360 TB Lustre
  – 100 TB more en route
• QDR IB
• 43 TFlop

Nebraska Supercomputing Symposium 2012

Tusker

• ¼ footprint of Firefly
• ¼ the power
• 2X the TFLOPS
• 2X the storage
• Fully utilized
• Maui/Torque

Nebraska Supercomputing Symposium 2012

In between …

• HCC (UNL) to Internet2: 10 Gbps
• HCC (Schorr) to HCC (PKI): 20 Gbps
• Allows us to do some interesting things
  – “overflow” jobs to/from Red
  – DYNES project
  – Xrootd mechanism

Nebraska Supercomputing Symposium 2012

HCC Staff

• HPC Applications Specialists
  – Dr. Adam Caprez
  – Dr. Ashu Guru
  – Dr. Jun Wang
  – Dr. Nicholas Palermo
• System Administrators
  – Dr. Carl Lundstedt
  – Garhan Attebury
  – Tom Harvill
  – John Thiltges
  – Josh Samuelson
  – Dr. Brad Hurst

Nebraska Supercomputing Symposium 2012

HCC Staff

• Other Staff
  – Dr. Brian Bockelman
  – Joyce Young
• GRAs
  – Derek Weitzel
  – Chen He
  – Kartik Vedalaveni
  – Zhe Zhang
• Undergraduates
  – Carson Crawford
  – Kirk Miller
  – Avi Knecht
  – Phil Brown
  – Slav Ketsman
  – Nicholas Nachtigal
  – Charles Cihacek

Nebraska Supercomputing Symposium 2012

HCC Campus Grid

• Holland Computing Center resources are combined into an HTC campus grid
  – 10,000 cores, 500 TB in Omaha
  – 5,000 cores, 3 PB in Lincoln
  – All tied together via a single submission protocol using the OSG software stack
  – Straightforward to expand to OSG sites across the country, as well as to EC2 (cloud)
  – HPC jobs get priority; HTC ensures high utilization (submission sketch below)

Nebraska Supercomputing Symposium 2012
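From the user's side, the "single submission protocol" on the OSG software stack amounts to an HTCondor-style submit description. A hedged sketch of what that looks like; the executable and file names are hypothetical, and this is not HCC's exact configuration:

# Sketch: write an HTCondor submit description and hand it to condor_submit.
# Once queued, jobs can match local, campus, or opportunistic OSG resources.
import subprocess, textwrap

submit = textwrap.dedent("""\
    # hypothetical user script; submits 100 independent jobs
    universe       = vanilla
    executable     = analyze.sh
    arguments      = $(Process)
    output         = out.$(Process)
    error          = err.$(Process)
    log            = jobs.log
    request_memory = 2 GB
    queue 100
""")

with open("jobs.submit", "w") as f:
    f.write(submit)

subprocess.run(["condor_submit", "jobs.submit"], check=True)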

HCC Model for a Campus Grid

Me, my friends and everyone else: Local → Campus → Grid

Nebraska Supercomputing Symposium 2012

HCC & Open Science Grid

• National, distributed computing partnership for data-intensive research
  – Opportunistic computing
  – Over 100,000 cores
  – Supports the LHC experiments, other science
  – Funded for 5 more years
  – Over 100 sites in the Americas
  – Ongoing support for 2.5 (+3) FTE at HCC

Nebraska Supercomputing Symposium 2012

It Works!

Nebraska Supercomputing Symposium 2012

HCC Network Monitoring

Nebraska Supercomputing Symposium 2012

OSG Resources

Nebraska Supercomputing Symposium 2012

Working philosophy

• Use what we buy
  – These pieces of infrastructure are linked, but improve asynchronously
  – Depreciation is immediate
  – Leasing is still more expensive (for now)
  – Buying at fixed intervals mitigates risk, increases ROI
  – Space, Power and Cooling have a longer life span
• Share what we aren’t using
  – Share opportunistically – retain local ownership
  – Consume opportunistically – there is more to gain!
  – Collaborators, not just consumers
  – Greater good vs. squandered opportunity

Nebraska Supercomputing Symposium 2012

Working philosophy

• A Data deluge is upon us
• Support is essential
  – If you only build it, they still may not come
  – Build incrementally and buy time for user training
  – Support can grow more gradually than hardware
• Links to national and regional infrastructure are critical
  – Open Source Community
  – GPN access to Internet2
  – Access to OSG, XSEDE resources
  – Collaborations with fellow OSG experts
  – LHC

Nebraska Supercomputing Symposium 2012

HCC New Users

FY    UNL-City   UNL-East   UNO        UNMC      Outside NU system
2011  424 (74)   33 (10)    75 (19)    30 (17)   112 (26)
2012  519 (95)   50 (17)    105 (30)   35 (5)    130 (18)

Nebraska Supercomputing Symposium 2012

New User Communities

• Theatre, Fine Arts/Digital Media, Architecture
• Psychology, Finance

• UNMC

• Puerto Rico

• PIVOT collaborators

Nebraska Supercomputing Symposium 2012

HCC NEW USER REPORT: HEATH ROEHR

Nebraska Supercomputing Symposium 2012

HCC-GO: DR. ASHU GURU

Nebraska Supercomputing Symposium 2012

MOVING FORWARD

Nebraska Supercomputing Symposium 2012

NEW PURCHASE

Nebraska Supercomputing Symposium 2012

$2M for …

• More computing
  – Need ca. 100 TF to hit Top500 for Jun 2013
  – Likely use all of funds to hit that amount
• More storage
  – Near-line archive (9 PB)
  – HDFS
• Specialty hardware
  – GPGPU/Viz
  – MIC hardware

Nebraska Supercomputing Symposium 2012

More computing

• How much RAM/core?
• Currently almost always oversubscribed
• Large-scale jobs almost impossible (> 2,000 cores)
• Safest investment – will use right away
• Firefly due to be retired soon – EOL

Nebraska Supercomputing Symposium 2012

More computing

Nebraska Supercomputing Symposium 2012

More Computing

Nebraska Supercomputing Symposium 2012

More storage

• Most rapidly growing demand
• Growing contention, can’t just queue up
• Largest unmet need (?)

Nebraska Supercomputing Symposium 2012

Storage for $2M

• $2M HDFS cluster
  – 250 nodes
  – 4,000 cores (Intel)
  – 9.0 PB (RAW)
  – 128 GB / node
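A quick back-of-the-envelope on that configuration; the 3x replication factor is HDFS's default and an assumption about how it would be deployed, not a stated HCC decision:

# Rough arithmetic on the proposed $2M HDFS cluster (numbers from the slide).
raw_pb, nodes, replication = 9.0, 250, 3   # 3x replication is the HDFS default, assumed here
print(raw_pb * 1000 / nodes, "TB raw per node")              # 36.0 TB/node
print(raw_pb / replication, "PB usable after replication")   # 3.0 PB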

Nebraska Supercomputing Symposium 2012

Other options

• GPGPUs greenest option for computing
• Highest upside for raw power (Top500)
• MIC even compatible with x86 codes
• SMP uniquely meets some needs, easiest to use/program
• Blue Gene, tape silo, …

Nebraska Supercomputing Symposium 2012

HCC personnel timeline

Year        1999   2002   2005   2009   2012
Personnel   2      3      5      9      13

Roughly a 7X increase since 1999.

Nebraska Supercomputing Symposium 2012

HCC networking timeline

Year              1999    2002    2005    2009   2012
WAN B/W (Gb/sec)  0.155   0.155   0.622   10     30

Roughly a 200X increase since 1999.

Nebraska Supercomputing Symposium 2012

HCC cpu timeline

Year        1999   2002   2005   2009   2012
CPU cores   16     256    656    6956   14492

Roughly a 900X increase since 1999.

Nebraska Supercomputing Symposium 2012

HCC storage timeline

Year                      1999    2002   2005   2009   2012
Storage capacity (TB, RAW)  0.108   1.2    31.2   1200   3250

Roughly a 30,000X increase since 1999.

Nebraska Supercomputing Symposium 2012

Composite Timeline

• Data increase / CPU cores = 33
• Data increase / WAN bandwidth = 150
• It takes a month to move 3 PB at 10 Gb/sec (quick check below)
• Power < 100X increase, largely constant last 3 years
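The 3 PB figure is easy to verify, assuming a fully utilized 10 Gb/s link and ignoring protocol overhead:

# Time to move 3 PB over a 10 Gb/s link, ignoring protocol overhead.
petabytes, link_gbps = 3, 10
bits = petabytes * 1e15 * 8            # decimal petabytes -> bits
seconds = bits / (link_gbps * 1e9)
print(seconds / 86400, "days")         # ~27.8 days, i.e. about a month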

Nebraska Supercomputing Symposium 2012

Storage at HCC

• Affordable, Reliable, High Performance, High Capacity
  – Pick 2
  – So multiple options
• /home
• /work
• /shared
• Currently, no /archive

Nebraska Supercomputing Symposium 2012

/home

• Reliable
• Low performance
  – No writes from worker nodes
• ZFS
• Rsync’ed pair, one in Omaha, one in Lincoln (sketched below)
• Backed up incrementally, requires severe quotas
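A sketch of the kind of site-to-site mirroring an rsync'ed pair implies; the hostname and paths are hypothetical, and the real setup would run on a schedule alongside the incremental backups:

# Sketch: mirror the local /home copy to the partner site with rsync.
# Hostname and paths are hypothetical, not HCC's actual configuration.
import subprocess

subprocess.run([
    "rsync", "-aH", "--delete",        # archive mode, preserve hard links, mirror deletions
    "/home/",                          # local copy (e.g. Lincoln)
    "mirror.example.edu:/home/",       # remote copy (e.g. Omaha)
], check=True)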

Nebraska Supercomputing Symposium 2012

/work

• High performance
• High(er) capacity
• Not permanent storage
• Lenient quotas
• More robust, more reliable “scratch space”
• Subject to purge as needed
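"Subject to purge" usually means files untouched for some window are removed automatically; a schematic purge pass is sketched below (the 30-day window and path are illustrative only, not HCC's stated policy):

# Schematic /work purge: remove files not modified in the last N days.
import os, time

PURGE_ROOT, MAX_AGE_DAYS = "/work", 30     # illustrative values
cutoff = time.time() - MAX_AGE_DAYS * 86400

for dirpath, _, filenames in os.walk(PURGE_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            if os.stat(path).st_mtime < cutoff:
                os.remove(path)
        except OSError:
            pass    # file vanished or not ours to touch; skip it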

Nebraska Supercomputing Symposium 2012

/share

• Purchased by given group
• Exported to both Lincoln and Omaha machines
• Usually for capacity, striped for some reliability

Nebraska Supercomputing Symposium 2012

Storage Strategy

• Maintain /home for precious files
  – Could be global
• Maintain /work for runtime needs
  – Remain local to cluster
• Create /share for near-line archive
  – 3-5 year time frame (or less)
  – Use for accumulating intermediate data, then purge
  – Global access

Nebraska Supercomputing Symposium 2012

Storage strategy

• Permanent archival has 3 options
  1) Library
  2) Amazon Glacier – currently $120/TB/year
  3) Tape system
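For scale, at that rate the 9 PB near-line archive discussed earlier would cost on the order of a million dollars a year to keep in Glacier; rough arithmetic using decimal TB:

# Order-of-magnitude cost of a 9 PB archive in Glacier at $120/TB/year.
petabytes, rate_per_tb_year = 9, 120
print(petabytes * 1000 * rate_per_tb_year, "USD per year")   # ~$1.08M/year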

Nebraska Supercomputing Symposium 2012

HCC Data Visualizations

• Fish!
• HadoopViz
• OSG Google Earth
• Web-based monitoring
  – http://t2.unl.edu/status/hcc-status/
  – http://hcc.unl.edu/gratia/index.php

Nebraska Supercomputing Symposium 2012

Other discussion topics

• Maui vs. SLURM
• Queue length policy
• Education approaches
  – This (!)
  – Tutorials (next!)
  – Afternoon workshops
  – Semester courses
  – Individual presentations/meetings
  – Online materials

Nebraska Supercomputing Symposium 2012

©2007 The Board of Regents of the University of Nebraska

NU Administration (UNL, NRI)
NSF, DOE, EPSCoR, OSG
Holland Foundation
CMS: Ken Bloom, Aaron Dominguez
HCC: Drs. Brian Bockelman, Adam Caprez, Ashu Guru, Brad Hurst, Carl Lundstedt, Nick Palermo, Jun Wang;
Garhan Attebury, Tom Harvill, Josh Samuelson, John Thiltges;
Chen He, Derek Weitzel
