
Page 1: Research Computing at Harvard

Research Computing at Harvard

John Huth


Topics

• Support of computing in science (as opposed to desktop computing) is becoming more and more of an issue at research universities.

• Crimson Grid
• Initiative in Innovative Computing
• The EGG project
• Another kind of LHC computing challenge:
  – The inverse mapping problem.


The Crimson Grid Initiative

Started in April 2004

A project to engineer a technology fabric in support of interdisciplinary & collaborative computing

Joy Sircar, Division of Engineering and Applied Science


The Crimson Grid:

• A Scalable collaborative computing environment for research at the interface of science and engineering

• A Gateway to community/national/global computing infrastructures for interdisciplinary research

• A Test bed for faculty & IT-industry affiliates within the framework of a production environment for integrating HPC solutions for higher education & research

• A Campus Resource for skills & knowledge sharing for advanced systems administration & management of switched architectures


The Campus Grid Vision: Grid of Grids from Local to Global

[Diagram: grid of grids at campus, community, and national levels; grids shown: CrimsonGrid-GLOW, ATLAS, OSG]


[Diagram: Crimson Grid architecture. Labels: OSG/ATLAS users; CrimsonGrid users; campus grid “agreed” users (CG-GLOW); CrimsonGrid gateway; GT-GK gatekeepers; Condor pools: DEAS, NNIN, CRC-I, and workstations (WKSTNs)]


Power of Campus Grids

GLOW: ~1000 processors

CG (Crimson Grid): ~750 processors

In just two campuses!


Grid use in the first 12 months

[Chart: number of jobs, 2005-02 through 2006-06*; y-axis 0 to 15,000]

First Use Research Areas in the Crimson Grid

• Nanoscience
• Mesoscopic Physics
• Quantum Chemistry and Quantum Chaos
• Condensed Matter Physics
• Chemistry at Harvard Molecular Mechanics (CHARMM)
• Harvard Biorobotics Lab
• Atmospheric Chemistry
• Earth and Planetary Sciences (Ocean Modeling)
• Solid and Structural Mechanics
• Earth Sciences and Geophysics: earthquake engineering
• Complex Biosystems Modeling
• Quantitative Social Science


Initiative in Innovative Computing

Alyssa Goodman (Director)

Tim Clark (Executive Director)


Filling the “Gap” between Science and Computer Science

Scientific disciplines

• Increasingly, core problems in science require computational solutions.
• They typically hire or “home grow” computationalists, but often lack the expertise or funding to go beyond the immediate pressing need.

Computer Science departments

• Focused on finding elegant solutions to basic computer science challenges.
• Often see specific, “applied” problems as outside their interests.


Continuum

“Pure” discipline science (e.g., Galileo) ... “Pure” computer science (e.g., Turing)

“Computational science” in between: missing at most universities


Filling the “computational science” gap: IIC

• Problem-driven approach… focusing effort on solving problems that will have the greatest impact and educational value

• Collaborative projects… combining disciplinary knowledge with computer science expertise

• Interdisciplinary effort… to ensure that best practices are shared across fields and that new tools and methodologies will be broadly applicable

• Links with industry… to draw on and learn from experience in applied computation

• Institutional funding… to ensure effort is directed towards key needs and not driven solely by the narrow priorities of funding agencies


Where are the optimal “IIC” problems?

[Diagram: axes “Computer Science Payoff” (low to high) and “Domain Science Payoff” (low to high); regions labeled “Never Mind”, “Science Departments”, and “CS Departments”, separated by a boundary]

What is the right shape for that boundary?


IIC Research Branches (projects draw upon more than one)

• Visualization: physically meaningful combination of diverse data types.
• Distributed Computing: e-Science aspects of large collaborations; sharing of data, computational resources, and tools in real time.
• Databases/Provenance: management and rapid retrieval of data; “research reproducibility” (where did the data come from? How?).
• Analysis & Simulations: development of efficient algorithms; cross-disciplinary comparative tools (e.g., statistical).
• Instrumentation: improved data acquisition; novel hardware approaches (e.g., GPUs, sensors).

Plus… educational programs that bring IIC science to Harvard students and to the public at large.


Data-Intensive Projects

• ATLAS/LHC computing: Tier 2

• Mileura Wide Field Array (MWA): low-frequency radio study of the highly redshifted era of reionization

• Pan-STARRS – optical telescope (Panoramic Survey Telescope And Rapid Response System)


EGG Project

• S. Youssef, J. Huth, D. Parkes, M. Seltzer, J. Shank

• Extension of the Pacman concept to resource allocation and cache management


In the beginning…

BU: software environment computing, i.e., creating and manipulating software environments

Harvard: economic mechanism design; bidding systems, provenance & file systems, resource prediction

[Word cloud of grid-computing software and projects: GLOBUS, Condor, PBS, LSF, EGEE, Chimera, RLS, VOMS, Resource Brokers, MonaLisa, Ganglia, OSG, Dial, Panda, dCache, SRM, Pacman, Gums, Web services, Virtual Machines, GridCat, Alien, LCG, Dirac, DISUN, ACDC, VDT, VDS, DRM, Clarens, Glue, EDG, Classads, Netlogger, Capone, Eowyn, gLite, ADA, iVDGL, PPDG]

But what do these have to do with each other? …And how do they fit into the (over-)complicated world of grid computing?

But then, something very unusual happened…


An “eggshell” E:

    setenv(Foo,Bar)
    download(foo.tar.gz)
    shell(make install)

[Diagram: “caches” (various URLs with eggshell source code) → “Pacman” get(E) → an installation]

[Pacman is used by ATLAS, OSG, VDT, LCG, Globus, TeraGrid,… >350,000 downloads, ~500-1000 new installations per day in 50 countries around the world, supported on 14 operating systems.]
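The eggshell idea can be illustrated with a tiny interpreter: an eggshell is an ordered list of operations such as `setenv`, `download`, and `shell`, and `get(E)` replays them to produce an installation. This is a minimal dry-run sketch under stated assumptions: the one-operation-per-line syntax, the `Step` type, and the `parse_eggshell`/`plan_get` functions are hypothetical, not Pacman's actual implementation or file format, and downloads and shell commands are recorded rather than executed.

```python
import re
from dataclasses import dataclass

# Hypothetical syntax: one operation per line, written as name(argument, ...).
STEP_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


@dataclass
class Step:
    op: str
    args: list


def parse_eggshell(source: str) -> list:
    """Parse eggshell source into an ordered list of Steps (hypothetical format)."""
    steps = []
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = STEP_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized eggshell line: {line!r}")
        op, raw_args = match.groups()
        if op == "shell":
            args = [raw_args]  # shell commands may contain commas; keep them verbatim
        else:
            args = [a.strip() for a in raw_args.split(",")]
        steps.append(Step(op, args))
    return steps


def plan_get(steps: list) -> tuple:
    """Dry-run get(E): build the environment and list actions without side effects."""
    env, actions = {}, []
    for step in steps:
        if step.op == "setenv":
            name, value = step.args
            env[name] = value
        elif step.op == "download":
            actions.append(("download", step.args[0]))
        elif step.op == "shell":
            actions.append(("shell", step.args[0]))
        else:
            raise ValueError(f"unknown operation: {step.op}")
    return env, actions


EGGSHELL = """
setenv(Foo,Bar)
download(foo.tar.gz)
shell(make install)
"""

env, actions = plan_get(parse_eggshell(EGGSHELL))
print(env)      # {'Foo': 'Bar'}
print(actions)  # [('download', 'foo.tar.gz'), ('shell', 'make install')]
```

A computation such as `shell(myjob < infile > outfile)` parses into the same structure, which is the sense in which any computation can be treated as an installation.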


    setenv(Foo,Bar)
    download(foo.tar.gz)
    shell(myjob < infile > outfile)

We can let all computations be “installations.”

put(E)

But which path should E follow?


    setenv(Foo,Bar)
    download(foo.tar.gz)

F( , , ) => ~opportunity cost, where the arguments include the cache history and the cache contents

[Diagram: put(job needing ATLAS 10.5.0) with candidate destinations labeled “Fast WAN” and “ATLAS v.10.5.0 already installed”]

Resolving the put ambiguity == resource allocation
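One way to read this slide: put(E) is ambiguous because several caches could run E, and choosing among them is a resource-allocation decision driven by a cost F. The sketch below assumes a hypothetical form for F: a charge per missing package (cache contents), a penalty for recent load (a summary of cache history), and a data-transfer term (the “Fast WAN” label). The weights, the `Site`/`Job` types, the package label `ATLAS-10.5.0`, and `resolve_put` are illustrative only, not part of the EGG design.

```python
from dataclasses import dataclass, field

# Hypothetical weights; the slide only says F(...) approximates an opportunity cost.
INSTALL_COST_PER_PACKAGE = 10.0
BUSY_PENALTY = 20.0


@dataclass
class Site:
    name: str
    installed: set = field(default_factory=set)  # cache contents
    recent_load: float = 0.0  # summary of cache history: 0.0 (idle) to 1.0 (saturated)
    wan_gbps: float = 1.0  # wide-area bandwidth into the site


@dataclass
class Job:
    needs: set  # software the eggshell requires (hypothetical labels)
    input_gb: float  # data that must be moved to the chosen site


def opportunity_cost(site: Site, job: Job) -> float:
    """Hypothetical stand-in for F(cache history, cache contents, job)."""
    install = INSTALL_COST_PER_PACKAGE * len(job.needs - site.installed)
    transfer = job.input_gb * 8.0 / site.wan_gbps  # transfer seconds, used as cost units
    busy = BUSY_PENALTY * site.recent_load
    return install + transfer + busy


def resolve_put(sites: list, job: Job) -> Site:
    """Resolve put(E) by sending the job to the site with the lowest cost."""
    return min(sites, key=lambda s: opportunity_cost(s, job))


sites = [
    Site("fast-wan-atlas", installed={"ATLAS-10.5.0"}, recent_load=0.5, wan_gbps=10.0),
    Site("idle-empty", recent_load=0.1, wan_gbps=1.0),
]
job = Job(needs={"ATLAS-10.5.0"}, input_gb=2.0)
print(resolve_put(sites, job).name)  # fast-wan-atlas
```

Here the busier site still wins because the required software is already installed and the input moves quickly; a different choice of F would trade these factors differently.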


A cache can be a marketplace

[Diagram: eggshells and computers around a cache; example annotations: “time >= 14 Nov.”, “bidding closes in 7 days”]

On save()…

    bidding process -> (C,E)
    C.put(E)
    ...repeat...

• Eggshells go where they get the best prices.
• Computers go where there are the most buyers.
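The marketplace can be sketched as a sealed-bid reverse auction for a single eggshell: computers submit asks, asks that arrive after bidding closes or that offer to run at a time violating the eggshell's constraint are dropped, and the cheapest remaining computer C wins, after which C.put(E) would follow. The pricing rule (the winner is paid the second-lowest ask, in the style of a Vickrey auction), the `Ask` type, `run_auction`, and the dates are assumptions for illustration; the slides do not specify the mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class Ask:
    computer: str
    eggshell: str
    price: float  # what the computer charges to run the eggshell
    submitted: datetime  # when the ask reached the cache
    run_at: datetime  # when the computer offers to run the eggshell


def run_auction(eggshell: str, asks: list, opens: datetime, not_before: datetime,
                window: timedelta = timedelta(days=7)) -> Optional[Tuple[str, float]]:
    """Hypothetical sealed-bid reverse auction held in a cache for one eggshell."""
    closes = opens + window  # e.g. "bidding closes in 7 days"
    valid = sorted(
        (a for a in asks
         if a.eggshell == eggshell
         and a.submitted < closes
         and a.run_at >= not_before),  # e.g. "time >= 14 Nov."
        key=lambda a: a.price,
    )
    if not valid:
        return None
    winner = valid[0]
    # Second-price rule: pay the next-lowest ask, or the winner's own ask if alone.
    payment = valid[1].price if len(valid) > 1 else winner.price
    return winner.computer, payment


opens = datetime(2005, 11, 1)  # illustrative dates only
not_before = datetime(2005, 11, 14)
asks = [
    Ask("C1", "E", 5.0, datetime(2005, 11, 2), datetime(2005, 11, 14)),
    Ask("C2", "E", 3.0, datetime(2005, 11, 3), datetime(2005, 11, 15)),
    Ask("C3", "E", 1.0, datetime(2005, 11, 9), datetime(2005, 11, 20)),  # arrives after close
    Ask("C4", "E", 2.0, datetime(2005, 11, 4), datetime(2005, 11, 10)),  # offers to run before 14 Nov.
]
print(run_auction("E", asks, opens, not_before))  # ('C2', 5.0)
```

Running the auction again for each new eggshell corresponds to the slide's “...repeat...” loop. A second-price rule is one standard way to encourage computers to ask their true cost, but any other pricing rule could be substituted.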


The LHC Inverse Mapping Problem

• A CPU-intensive problem

• N. Arkani-Hamed, G. Kane
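The slide gives no detail on the method, so the following is only a toy illustration of why an inverse mapping is CPU intensive: the forward map from model parameters to observable signatures must be evaluated at every point of a parameter scan, and distinct parameter points can produce the same signature. The two-parameter `toy_signature` function and `inverse_map` are hypothetical; in a real study each forward evaluation would involve event generation and detector simulation.

```python
import itertools
import math


def toy_signature(m1: float, m2: float) -> tuple:
    """Hypothetical forward map from two model parameters to two observables.

    It is symmetric in (m1, m2), so swapping the parameters gives the same
    signature: a deliberately simple example of a degenerate inverse map.
    """
    return (m1 + m2, m1 * m2)


def inverse_map(observed: tuple, grid: list, tol: float) -> list:
    """Return every parameter point whose signature matches the observation."""
    matches = []
    for params in grid:  # one forward evaluation per grid point
        if math.dist(toy_signature(*params), observed) <= tol:
            matches.append(params)
    return matches


masses = [100.0 + 50.0 * i for i in range(10)]  # 100, 150, ..., 550
grid = list(itertools.product(masses, repeat=2))  # 10**2 = 100 forward evaluations
observed = toy_signature(200.0, 400.0)
print(inverse_map(observed, grid, tol=1e-6))  # [(200.0, 400.0), (400.0, 200.0)]
```

With d parameters and n values each, a brute-force scan costs n**d forward evaluations, which is one reason problems of this kind are a natural fit for grid resources.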
