new archs for new bio stanford 10-06 · new algorithm parallel algorithm for range-limited n-body...

New Architectures for a New Biology David E. Shaw D. E. Shaw Research, LLC and Center for Computational Biology and Bioinformatics Columbia University

Post on 24-Mar-2020

Page 1: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

New Architectures for a New Biology

David E. Shaw

D. E. Shaw Research, LLC and Center for Computational Biology and Bioinformatics

Columbia University

*** Background (A Bit of Basic Biochemistry)

DNA Codes for Proteins

The 20 Amino Acids

Polypeptide Chain

Source: www.yourgenome.org

Levels of Protein Structure

Source: Robert Melamede, U. Colorado

What We Know and What We Don’t

Decoded the genome

Don’t know most protein structures
– Especially membrane proteins

No detailed picture of what most proteins do

Don’t know how everything fits together into a working system

We Now Have The Parts List ...

But We Don’t Know What the Parts Look Like ...

Or How They Fit Together ...

Or How The Whole Machine Works

How Can We Get There?

Two major approaches:

Experiments
– Wet lab
– Hard, since everything is so small

Simulation
– Simulate:
  • How proteins fold (structure, dynamics)
  • How proteins interact with
    - Other proteins
    - Nucleic acids
    - Drug molecules
– Gold standard: Molecular dynamics (MD)

*** Molecular Dynamics

Molecular Dynamics

Divide time into discrete time steps

~1 fs time step

Molecular Dynamics

Calculate forces

Molecular mechanics force field

Molecular Dynamics

Move atoms

... a little bit

Molecular Dynamics

Iterate

... and iterate

Integrate Newton’s laws of motion
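The loop these slides build up (calculate forces, move atoms a little bit, iterate) can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind the simulations in this talk: the velocity Verlet integrator, the function names `velocity_verlet` and `harmonic_force`, and the toy harmonic force are all choices made here for brevity.

```python
def velocity_verlet(pos, vel, masses, force_fn, dt, n_steps):
    """Integrate Newton's laws of motion in discrete time steps.

    pos, vel : lists of [x, y, z] per atom
    masses   : list of per-atom masses
    force_fn : hypothetical callable mapping positions to per-atom forces
    dt       : time step (about 1 fs in the slides; units are up to the caller)
    """
    pos = [list(p) for p in pos]      # work on copies
    vel = [list(v) for v in vel]
    forces = force_fn(pos)            # calculate forces
    for _ in range(n_steps):          # iterate ... and iterate
        for i, m in enumerate(masses):
            for d in range(3):
                vel[i][d] += 0.5 * dt * forces[i][d] / m   # half-step velocity
                pos[i][d] += dt * vel[i][d]                # move atoms a little bit
        forces = force_fn(pos)        # recalculate forces at new positions
        for i, m in enumerate(masses):
            for d in range(3):
                vel[i][d] += 0.5 * dt * forces[i][d] / m   # finish velocity update
    return pos, vel


def harmonic_force(pos, k=1.0):
    """Toy stand-in for a force field (an assumption): spring to the origin."""
    return [[-k * x for x in p] for p in pos]
```

A real MD code would replace `harmonic_force` with a molecular mechanics force field; the structure of the loop stays the same.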

Example of an MD Simulation

Main Problem With MD

Too slow!

Example I just showed:

2 ns simulated time

3.4 CPU-days to simulate

*** Goals and Strategy

Thought Experiment

What if MD were
– Perfectly accurate?
– Infinitely fast?

Would be easy to perform arbitrary computational experiments
– Determine structures by watching them form
– Figure out what happens by watching it happen
– Transform measurement into data mining

Two Distinct Problems

Problem 1: Simulate many short trajectories

Problem 2: Simulate one long trajectory

Simulating Many Short Trajectories

Can answer surprising number of interesting questions

Can be done using
– Many slow computers
– Distributed processing approach
– Little inter-processor communication

E.g., Pande’s Folding@home project

Simulating One Long Trajectory

Harder problem

Essential to elucidate many biologically interesting processes

Requires a single machine with
– Extremely high performance
– Truly massive parallelism
– Lots of inter-processor communication

Our Goal

Single, millisecond-scale MD simulations
– Protein with 64K atoms
– Explicit water molecules

Why?

– That’s the time scale at which many biologically interesting things start to happen

Image: Istvan Kolossvary & Annabel Todd, D. E. Shaw Research

Protein Folding

Interactions Between Proteins

Image: Vijayakumar, et al., J. Mol. Biol. 278, 1015 (1998)

Image: Nagar, et al., Cancer Res. 62, 4236 (2002)

Binding of Drugs to their Molecular Targets

Image: H. Grubmüller, in Attig, et al. (eds.), Computational Soft Matter (2004)

Mechanisms of Intracellular Machines

What Will It Take to Simulate a Millisecond?

We need an enormous increase in speed
– Current (single processor): ~ 100 ms / fs
– Goal will require < 10 μs / fs

Required speedup:
– > 10,000x faster than current single-processor speed
– ~ 1,000x faster than current parallel implementations
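The figures on this slide can be checked with a little arithmetic. The sketch below assumes a 1 fs time step, as on the earlier Molecular Dynamics slide; the helper name `wall_clock_days` is made up for this example.

```python
FS_PER_MS = 10**12        # femtoseconds in one millisecond
SECONDS_PER_DAY = 86_400


def wall_clock_days(simulated_fs, seconds_per_step, fs_per_step=1.0):
    """Wall-clock days needed to cover simulated_fs of simulated time."""
    n_steps = simulated_fs / fs_per_step
    return n_steps * seconds_per_step / SECONDS_PER_DAY


current_days = wall_clock_days(FS_PER_MS, 100e-3)   # ~100 ms per fs (single processor)
goal_days = wall_clock_days(FS_PER_MS, 10e-6)       # 10 us per fs (upper bound of goal)
speedup = current_days / goal_days                  # ratio of the two rates

# Cross-check against the earlier example: 2 ns simulated in 3.4 CPU-days.
example_seconds_per_step = 3.4 * SECONDS_PER_DAY / 2e6   # 2 ns = 2e6 steps of 1 fs
```

At ~100 ms per step, a millisecond of simulated time would take on the order of a million days (roughly 3,000 years); at 10 μs per step it takes about 116 days, a factor of 10,000 less. The earlier 2 ns example works out to about 0.15 s per step, in line with the ~100 ms / fs figure quoted here.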

Target Simulation Speed

3.4 days today (one processor)

~ 13 seconds on our machine (one segment)

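Molecular Mechanics Force Field

$$
E = \sum_{\text{bonds}} k_b (r - r_0)^2
  + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{torsions}} A \, [1 + \cos(n\tau - \phi)]
  + \sum_i \sum_{j>i} \frac{q_i q_j}{r_{ij}}
  + \sum_i \sum_{j>i} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
$$

Bonded: Stretch (first term), Bend (second term), Torsion (third term)

Non-Bonded: Electrostatic (fourth term), Van der Waals (fifth term)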

Stretch Term

$$E = \sum_{\text{bonds}} k_b (r - r_0)^2$$

Plot: Potential Energy vs. Distance Between Centers of Atoms (axis marked at 0 and r0)

Bend Term

$$E = \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2$$

Plot: Potential Energy vs. Bond Angle θ (axis marked at θ = θ0)

Torsion Term

$$E = \sum_{\text{torsions}} A \, [1 + \cos(n\tau - \phi)]$$

Plot: Potential Energy vs. Torsion Angle (radians), axis marked at 0, π/3, 2π/3, π, 4π/3, 5π/3, 2π

Molecule shown in two panels: Oblique View, Axial View

Electrostatic Term

$$E = \sum_i \sum_{j>i} \frac{q_i q_j}{r_{ij}}$$

Plot: Potential Energy vs. Distance Between Centers of Atoms, shown for a pair of like charges (+ +) and a pair of opposite charges (+ −)

Van der Waals Terms

$$E = \sum_i \sum_{j>i} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)$$

Plot: Potential Energy vs. Distance Between Centers of Atoms, with three curves: Repulsive (1 / r^12), Attractive (1 / r^6), and Combined
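The shape of the combined curve follows directly from the two terms. As a short worked example for a single pair (subscripts dropped, and assuming $A > 0$ and $B > 0$):

$$
E(r) = \frac{A}{r^{12}} - \frac{B}{r^{6}}, \qquad
\frac{dE}{dr} = -\frac{12A}{r^{13}} + \frac{6B}{r^{7}} = 0
\;\Longrightarrow\; r_{\min} = \left( \frac{2A}{B} \right)^{1/6}
$$

$$
E(r_{\min}) = \frac{A}{(2A/B)^{2}} - \frac{B}{2A/B} = \frac{B^{2}}{4A} - \frac{B^{2}}{2A} = -\frac{B^{2}}{4A},
\qquad
E(r) = 0 \;\text{ at }\; r = \left( \frac{A}{B} \right)^{1/6}
$$

So the repulsive term dominates at short range, the attractive term at long range, and the combined curve has a single minimum of depth $B^2 / (4A)$ in between.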

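Molecular Mechanics Force Field

$$
E = \sum_{\text{bonds}} k_b (r - r_0)^2
  + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{torsions}} A \, [1 + \cos(n\tau - \phi)]
  + \sum_i \sum_{j>i} \frac{q_i q_j}{r_{ij}}
  + \sum_i \sum_{j>i} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)
$$

Bonded: Stretch (first term), Bend (second term), Torsion (third term)

Non-Bonded: Electrostatic (fourth term), Van der Waals (fifth term)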
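To make the five terms concrete, here is a minimal sketch that evaluates each of them directly from the formulas on the slides. The function names and the tuple-based data layout are assumptions made for this example. Units follow the slides, which write the electrostatic term without an explicit Coulomb constant, and the non-bonded sum here simply runs over all pairs i < j.

```python
import math


def stretch_energy(bonds):
    """bonds: iterable of (k_b, r, r0) tuples, one per bond."""
    return sum(k_b * (r - r0) ** 2 for k_b, r, r0 in bonds)


def bend_energy(angles):
    """angles: iterable of (k_theta, theta, theta0) tuples, angles in radians."""
    return sum(k_t * (theta - theta0) ** 2 for k_t, theta, theta0 in angles)


def torsion_energy(torsions):
    """torsions: iterable of (A, n, tau, phi) tuples, angles in radians."""
    return sum(a * (1.0 + math.cos(n * tau - phi)) for a, n, tau, phi in torsions)


def nonbonded_energy(positions, charges, lj_a, lj_b):
    """Return (electrostatic, van der Waals) energy summed over pairs i < j.

    lj_a[i][j] and lj_b[i][j] play the roles of A_ij and B_ij.
    """
    elec = 0.0
    vdw = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            elec += charges[i] * charges[j] / r
            vdw += lj_a[i][j] / r ** 12 - lj_b[i][j] / r ** 6
    return elec, vdw
```

The double loop in `nonbonded_energy` is the part that grows with the square of the atom count, which is why the non-bonded terms get the most attention in the rest of the talk.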

What Takes So Long?

Inner loop of force field evaluation looks at all pairs of atoms (within distance R)

On the order of 64K atoms in typical system

Repeat ~10^12 times

Current approaches too slow by several orders of magnitude

What can be done?
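As a rough illustration of the inner loop described on this slide, the sketch below enumerates pairs within a cutoff R by brute force and estimates how many such pairs a uniform system would contain. Both function names are hypothetical, and the density and cutoff used in the note that follows are illustrative assumptions, not figures from the talk.

```python
import math


def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, separated by less than cutoff.

    Brute-force O(N^2) scan over all pairs, kept simple on purpose.
    """
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def expected_pairs(n_atoms, density, cutoff):
    """Estimate the pair count for uniformly distributed atoms.

    Each atom sees about density * (4/3) * pi * R^3 neighbors; dividing by
    two avoids counting every pair twice. Boundary effects are ignored.
    """
    neighbors_per_atom = density * (4.0 / 3.0) * math.pi * cutoff ** 3
    return 0.5 * n_atoms * neighbors_per_atom
```

For example, with 64K (65,536) atoms, an assumed density of 0.1 atoms per cubic ångström, and an assumed cutoff of 10 Å, `expected_pairs` gives on the order of 10^7 pair interactions per time step, each step being one of the ~10^12 repetitions.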

Our Strategy

New architectures
– Designing a specialized machine
– Enormously parallel architecture
– Based on special-purpose ASICs
– Dramatically faster for MD, but less flexible
– Projected completion: 2008

New algorithms
– Applicable to
  • Conventional clusters
  • Our own machine
– Scale to very large # of processing elements

Interdisciplinary Lab

Computational Chemists and Biologists

Computer Scientists and Applied Mathematicians

Computer Architects and Engineers

*** New Architectures

Alternative Machine Architectures

Conventional cluster of commodity processors

General-purpose scientific supercomputer

Special-purpose molecular dynamics machine

Conventional Cluster of Commodity Processors

Strengths:

– Flexibility

– Mass market economies of scale

Limitations:

– Doesn’t exploit special features of the problem

– Communication bottlenecks
  • Between processor and memory
  • Among processors

– Insufficient arithmetic power

Typical Commodity Microprocessor

General-Purpose Scientific Supercomputer

E.g., IBM Blue Gene

More demanding goal than ours
– General-purpose scientific supercomputing
– Fast for wide range of applications

Strengths:
– Flexibility
– Ease of programmability

Limitations for MD simulations:
– Expensive
– Still not fast enough for our purposes

Our Special-Purpose MD Machine

Strengths:
– Several orders of magnitude faster for MD
– Excellent cost/performance characteristics

Limitations:
– Not designed for other scientific applications
  • They’d be difficult to program
  • Still wouldn’t be especially fast
– Limited flexibility

Source of Speedup on Our Machine

Judicious use of arithmetic specialization
– Flexibility, programmability only where needed
– Elsewhere, hardware tailored for speed
  • Tables and parameters, but not programmable

Carefully choreographed communication
– Data flows to just where it’s needed
– Almost never need to access off-chip memory

Two Subsystems on Each ASIC

Specialized Subsystem
– Pairwise point interactions
– Enormously parallel

Flexible Subsystem
– Programmable, general-purpose
– Efficient geometric operations

Where We Use Specialized Hardware

Specialized hardware (with tables, parameters) where:
– Inner loop
– Simple, regular algorithmic structure
– Unlikely to change

Examples:
– Electrostatic forces
– Van der Waals interactions (at least attractive term)

Example: Particle Interaction Pipeline (one of 32)

Array of 32 Particle Interaction Pipelines

Advantages of Particle Interaction Pipelines

Save area that would have been allocated to
– Cache
– Control logic
– Wires

Achieve extremely high arithmetic density

Save time that would have been spent on
– Cache misses
– Load/store instructions
– Misc. data shuffling

Where We Use Flexible Hardware

– Use programmable hardware where:
  • Algorithm less regular
  • Smaller % of total time
    - E.g., local interactions (fewer of them)
  • More likely to change

– Examples:
  • Bonded interactions
  • Bond length constraints
  • Experimentation with
    - New, short-range force field terms
    - Alternative integration techniques

Forms of Parallelism in Flexible Subsystem

The Flexible Subsystem exploits three forms of parallelism:

– Multi-core parallelism

– Instruction-level parallelism

– SIMD parallelism

Overview of the Flexible Subsystem

GC = Geometry Core (each a VLIW processor)

Geometry Core (one of 8; 64 pipelined lanes/chip)

Block diagram labels: PC, Instruction Memory, Decode, From Tensilica Core, Data Memory; two groups of + and X units over lanes X, Y, Z, W; eight units labeled f

System-Level Organization

Multiple segments (probably 8 in first machine)

512 nodes (each with one ASIC) per segment
– Organized in an 8 x 8 x 8 toroidal mesh

Topology reflects physical space being simulated:
– Three-dimensional nearest neighbor connections
– Periodic boundary conditions
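The wraparound in a toroidal mesh is what lets the network mirror periodic boundary conditions. A minimal sketch of nearest-neighbor addressing on an 8 x 8 x 8 torus follows; the function name and the (x, y, z) coordinate convention are assumptions for illustration, not details of the actual machine.

```python
def torus_neighbors(node, dims=(8, 8, 8)):
    """Return the six nearest neighbors of node (x, y, z) in a 3D torus.

    Coordinates wrap around at the edges, so a node on one face of the
    mesh is directly connected to the node on the opposite face.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]   # periodic wrap
            neighbors.append(tuple(coord))
    return neighbors
```

For instance, node (0, 0, 7) has (7, 0, 7) and (0, 0, 0) among its neighbors, just as an atom leaving one side of a periodic simulation box re-enters on the other.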

3D Torus Network

But Communication is Still a Bottleneck

Scalability limited by inter-chip communication

To execute a single millisecond-scale simulation,

– Need a huge number of processing elements

– Must dramatically reduce amount of data transferred between these processing elements

Can’t do this without fundamentally new algorithms

*** The NT Algorithm

Range-Limited Pairwise Particle Interactions

Efficient methods known for distant interactions

Figure label: R (interaction radius)

Pairwise, non-bonded interactions dominate

Range-limited n-body problem

New Algorithm

Parallel algorithm for range-limited n-body problem

Called the NT (for “Neutral Territory”) Method *

Asymptotically less inter-processor communication than traditional spatial decomposition methods

Constant factors also very attractive
– Significant improvements on typical cluster
– Major win on large machines

* Shaw, J. Comp. Chem. 26, Oct. 2005

Desirable Properties

Ideally, a parallel algorithm for the range-limited n-body problem would:

Exploit the range limitation to reduce computational load

Scale such that data transfer approaches zero as the number of processors p → ∞

Asymptotic Comparison With Traditional Spatial Decomposition Methods

NT Method has both of these properties:

                      Exploitable range limitation   Scaling with number of processors
Traditional methods   O(R^3) neighbors               Not scalable
NT Method             O(R^{3/2}) neighbors           O(p^{-1/2}) scaling

Partitioning of Space Into Boxes

Atom A Home box of atom A
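Assigning an atom to its home box amounts to integer division of its coordinates by the box edge. The sketch below assumes cubical boxes of equal size on a periodic 8 x 8 x 8 grid (borrowing the segment layout from the earlier slide); the function name and parameters are hypothetical.

```python
import math


def home_box(position, box_edge, grid=(8, 8, 8)):
    """Return the (ix, iy, iz) index of the box containing position.

    Positions outside the primary cell wrap around periodically.
    """
    return tuple(math.floor(c / box_edge) % n for c, n in zip(position, grid))
```

With a box edge of 10 length units, an atom at (-0.1, 25.0, 79.9) falls in box (7, 2, 7): the slightly negative x coordinate wraps to the last box along that axis.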

Two-Dimensional Analog of the NT Method

Traditional Method (2D Analog)

NT Method (2D Analog)

Green = interaction box; blue = import region

Page 95: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

How can it be better to meet on neutral territory?

Number of pairwise interactions (~ product of areas)

Number of atoms imported (~ sum of areas):

Traditional Method (2D) NT Method (2D)

Page 96: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Actual 3D Algorithm

Considerably more complex
– Odd number of dimensions introduces complications

Can be made to work
– Math gets more complicated
– Performance advantage just as large

Start by describing 3D version of traditional spatial decomposition methods

Page 97: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Traditional 3D Spatial Decomposition Methods

Page 98: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Traditional Spatial Decomposition Method: Interaction Box and Import Region

Green = Interaction box Blue = Import region

Page 99: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Site of Interaction, Traditional Method

Interact
– One atom from (cubical) interaction box
– One atom from either interaction box or import region

All interactions occur within home box of one of the two atoms

How much inter-processor communication?

Page 100: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Import Subregion: Face (–x)

Page 101: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Import Subregion: Edge (–x, +z)

Page 102: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Import Subregion: Corner (+x, –y, +z)

Page 103: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Import Volume, Traditional Method

Import region of traditional spatial decomposition method:
– 3 face subregions
– 6 edge subregions
– 4 corner subregions

$V_i = 3Rb^2 + 3\pi R^2 b/2 + 2\pi R^3/3$

where b = side length of (cubical) box

In the limit as p → ∞, import volume approaches $2\pi R^3/3$
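The formula above can be checked numerically. The following is a minimal sketch, not code from the talk: the function name `traditional_import_volume` is illustrative, and reading the three terms as slabs, quarter-cylinders, and eighth-spheres is an assumption chosen because it reproduces the stated coefficients.

```python
import math

def traditional_import_volume(R: float, b: float) -> float:
    """Import volume for a cubical home box of side b and interaction radius R.

    Assumed geometry (matches the coefficients on the slide):
    faces are R x b x b slabs, edges are quarter-cylinders of radius R and
    length b, corners are eighth-spheres of radius R.
    """
    faces = 3 * (R * b**2)                  # 3 face subregions
    edges = 6 * (math.pi * R**2 * b / 4)    # 6 edge subregions
    corners = 4 * (math.pi * R**3 / 6)      # 4 corner subregions
    return faces + edges + corners

if __name__ == "__main__":
    R = 12.0  # interaction radius in angstroms, as assumed elsewhere in the talk
    for b in (40.0, 10.0, 1.0, 0.01):
        print(f"b = {b:6.2f}  import volume = {traditional_import_volume(R, b):10.1f}")
    # As p grows, b shrinks and only the corner term survives.
    print("limit 2*pi*R^3/3 =", 2 * math.pi * R**3 / 3)
```

As b shrinks the face and edge terms vanish, which is why the import volume of the traditional method levels off at a constant rather than falling toward zero.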

Page 104: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

*** The Three-Dimensional NT Algorithm

Page 105: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

NT Method: Interaction Box and Import Region

Green = Interaction box Blue = Import region

Page 106: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

The Tower (outer tower in blue)

Page 107: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

The Plate (outer plate in blue)

Page 108: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Site of Interaction, NT Method

Interact
– One atom from tower
– One atom from plate

Both atoms may have to be imported

They meet “on neutral territory”

Plate Atom

Tower Atom
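To make "neutral territory" concrete, here is a minimal sketch of how the box that computes an interaction could be picked. It assumes the tower is the column of boxes sharing the interaction box's x and y indices and the plate is the layer sharing its z index. The names `home_box` and `nt_interaction_box` are hypothetical, and the sketch ignores periodic wraparound and the rule for deciding which atom of a pair plays the tower role.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[int, int, int]

def home_box(pos: Sequence[float], box_dims: Sequence[float]) -> Box:
    """Integer (x, y, z) index of the box that contains an atom position."""
    ix, iy, iz = (math.floor(c / d) for c, d in zip(pos, box_dims))
    return (ix, iy, iz)

def nt_interaction_box(tower_home: Box, plate_home: Box) -> Box:
    """Box where the pair interacts: x, y from the tower atom, z from the plate atom.

    Assumption: the tower atom lies in the same column as the interaction box
    and the plate atom lies in the same layer.
    """
    return (tower_home[0], tower_home[1], plate_home[2])

if __name__ == "__main__":
    dims = (10.0, 10.0, 10.0)                   # hypothetical box dimensions
    tower = home_box((25.0, 5.0, 31.0), dims)   # home box (2, 0, 3)
    plate = home_box((41.0, 12.0, 58.0), dims)  # home box (4, 1, 5)
    meet = nt_interaction_box(tower, plate)
    print("tower home:", tower, "plate home:", plate, "interaction box:", meet)
    print("neutral territory:", meet != tower and meet != plate)
```

In this example the interaction box is the home box of neither atom, so both atoms have to be imported, which is the case the slide describes.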

Page 109: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Aspect Ratio Optimization in NT

Dimensions of box ⇒ dimensions of tower, plate

Volume of box determined by
– Size of molecular system
– Number of processors

Aspect ratio of box is free parameter
– x and y dimensions equal; ratio to z can vary
– Optimize to minimize communication

Optimal aspect ratio depends on number of processors
– More processors ⇒ shorter, fatter box (balance)
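The "shorter, fatter" trend can be illustrated numerically. This sketch relies on the NT import-volume expression that appears later in the talk, Vi = 2R·bxy² + 2R·bxy·bz + πR²·bz/2, with bz = Vb/bxy². Setting the derivative with respect to bxy to zero gives 4·bxy⁴ − 2·Vb·bxy − πR·Vb = 0, which the code solves by bisection. The function name and the bisection approach are illustrative choices, not the talk's method.

```python
import math

def optimal_nt_box(R: float, Vb: float):
    """Return (bxy, bz) minimizing the NT import volume at fixed box volume Vb."""
    def slope(bxy: float) -> float:
        # Proportional to dVi/dbxy after substituting bz = Vb / bxy**2.
        return 4 * bxy**4 - 2 * Vb * bxy - math.pi * R * Vb

    lo, hi = 0.0, 1.0
    while slope(hi) < 0:          # grow the bracket until it contains the root
        hi *= 2
    for _ in range(200):          # plain bisection on the single positive root
        mid = (lo + hi) / 2
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    bxy = (lo + hi) / 2
    return bxy, Vb / bxy**2

if __name__ == "__main__":
    R = 12.0                       # angstroms
    system_volume = 50_000 / 0.1   # atoms / density, cubic angstroms
    for p in (8, 64, 512, 4096, 32768):
        bxy, bz = optimal_nt_box(R, system_volume / p)
        print(f"p = {p:6d}  bxy = {bxy:7.2f}  bz = {bz:7.2f}  bxy/bz = {bxy / bz:6.2f}")
```

The ratio bxy/bz rises as p rises, so the box becomes wider in x and y relative to its height.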

Page 110: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of the NT Method: 64 Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 111: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of the NT Method: 512 Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 112: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of the NT Method: 4K Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 113: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of the NT Method: 32K Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 114: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

NT’s Import Volume With Cubical Box

Import volume:
– 4 face subregions
– 2 edge subregions
– No corner subregions

$V_i = 2Rb_{xy}^2 + 2Rb_{xy}b_z + \pi R^2 b_z/2$

where
$b_{xy}$ = x & y dimensions of box
$b_z$ = z dimension of box

Optimize ratio $b_{xy}/b_z$ to minimize import volume
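A short sketch of the NT import-volume expression follows. The function name and the attribution of each term to the tower or the plate are assumptions, chosen to be consistent with the subregion counts on the slide (4 faces, 2 edges, no corners). The example compares a cubical box against the traditional formula given earlier in the talk.

```python
import math

def nt_import_volume(R: float, bxy: float, bz: float) -> float:
    """NT import volume for a box of size bxy x bxy x bz and interaction radius R."""
    tower_faces = 2 * R * bxy**2            # assumed: slabs above and below the home box
    plate_faces = 2 * R * bxy * bz          # assumed: two side slabs of the outer plate
    plate_edges = math.pi * R**2 * bz / 2   # assumed: two quarter-cylinder edge pieces
    return tower_faces + plate_faces + plate_edges

if __name__ == "__main__":
    R, b = 12.0, 10.0
    nt_cube = nt_import_volume(R, b, b)
    # Traditional method, from the earlier slide: 3Rb^2 + 3*pi*R^2*b/2 + 2*pi*R^3/3
    traditional = 3 * R * b**2 + 1.5 * math.pi * R**2 * b + 2 * math.pi * R**3 / 3
    print(f"NT, cubical box: {nt_cube:.1f} cubic angstroms")
    print(f"Traditional:     {traditional:.1f} cubic angstroms")
```

For this particular box size the NT import volume is smaller even before the aspect ratio is tuned, because the edge term is smaller and the corner term is absent.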

Page 115: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

NT: Optimal Aspect Ratio and Import Volume

Results:

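Optimal $b_{xy} = \left[ c^{1/2} + \left( V_b c^{-1/2} - c \right)^{1/2} \right] / 2$

where
$c = d/6 - 2\pi R V_b / d$
$d = \left\{ 27V_b^2 - 3 \left[ 3V_b^3 \left( (4\pi R)^3 + 27V_b \right) \right]^{1/2} \right\}^{1/3}$
$V_b$ = box volume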

To find minimal import volume:

– Use optimal bxy to calculate optimal bz

– Substitute into equation for Vi
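The closed form can be sanity-checked in a few lines. This is a sketch under two assumptions: the cube root defining d is the real cube root (the quantity inside the braces is negative), and the optimum satisfies the stationarity condition 4·bxy⁴ − 2·Vb·bxy − πR·Vb = 0 obtained by differentiating Vi with bz = Vb/bxy². The function name `optimal_bxy` is illustrative.

```python
import math

def optimal_bxy(R: float, Vb: float) -> float:
    """Closed-form optimal bxy for interaction radius R and box volume Vb."""
    inner = 3 * Vb**3 * ((4 * math.pi * R) ** 3 + 27 * Vb)
    d_cubed = 27 * Vb**2 - 3 * math.sqrt(inner)
    # Real cube root; d_cubed is negative, so fractional powers need care.
    d = math.copysign(abs(d_cubed) ** (1.0 / 3.0), d_cubed)
    c = d / 6 - 2 * math.pi * R * Vb / d
    return (math.sqrt(c) + math.sqrt(Vb / math.sqrt(c) - c)) / 2

if __name__ == "__main__":
    R = 12.0
    Vb = 50_000 / 0.1 / 512        # box volume for 512 processors
    bxy = optimal_bxy(R, Vb)
    bz = Vb / bxy**2               # optimal bz follows from the fixed volume
    Vi = 2 * R * bxy**2 + 2 * R * bxy * bz + math.pi * R**2 * bz / 2
    residual = 4 * bxy**4 - 2 * Vb * bxy - math.pi * R * Vb
    print(f"bxy = {bxy:.3f}, bz = {bz:.3f}, Vi = {Vi:.1f}")
    print(f"stationarity residual = {residual:.2e}")
```

A residual near zero indicates that the closed form and the derivative condition agree for the chosen inputs.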

Page 116: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

NT’s Import Volume With Optimized Box

Limit as p → ∞ (note decrease in exponent):

$V_i = 2\pi^{1/2} R^{3/2} V_b^{1/2}$

$V_b \sim N/p$, where N is # atoms in molecular system

So $V_i = O\left( R^{3/2} (N/p)^{1/2} \right)$
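The limit can be recovered from the import-volume formula in a few steps. The following is a sketch of the intermediate reasoning, assuming $V_b = b_{xy}^2 b_z$ and treating $b_{xy}$ as the only free variable; the talk states only the result.

```latex
\begin{align*}
V_i(b_{xy}) &= 2Rb_{xy}^2 + \frac{2RV_b}{b_{xy}} + \frac{\pi R^2 V_b}{2b_{xy}^2}
  && \text{substituting } b_z = V_b / b_{xy}^2 \\
\frac{dV_i}{db_{xy}} = 0 \;&\Longleftrightarrow\; 4b_{xy}^4 - 2V_b b_{xy} - \pi R V_b = 0 \\
b_{xy}^4 &\approx \frac{\pi R V_b}{4}
  && \text{as } V_b \to 0 \text{ (i.e. } p \to \infty\text{), since } V_b b_{xy} \ll R V_b \\
V_i &\approx 2R \cdot \frac{(\pi R V_b)^{1/2}}{2} + \frac{\pi R^2 V_b}{(\pi R V_b)^{1/2}}
  && \text{the middle term } 2RV_b/b_{xy} \sim V_b^{3/4} \text{ is lower order} \\
    &= 2\pi^{1/2} R^{3/2} V_b^{1/2}
\end{align*}
```

The two surviving terms are equal in the limit, which is the "balance" between tower and plate contributions that the aspect-ratio optimization achieves.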

Page 117: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Comparison of Traditional and NT Methods

Page 118: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Traditional Method Imports Corner Subregions

Page 119: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

NT Method Doesn’t Import Any Corner Subregions

Page 120: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

NT vs. Traditional Method

Traditional spatial decomposition method
• Transfer time ~ volume of sphere of radius R (for large p)

NT method
• Transfer time ~ square root of that sphere’s volume

Advantage of NT over traditional method grows as number of processors increases

Page 121: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of Traditional vs. NT Method: 64 Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 122: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of Traditional vs. NT Method: 512 Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 123: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of Traditional vs. NT Method: 4K Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 124: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Scaling of Traditional vs. NT Method: 32K Processors

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Page 125: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Inter-Processor Transfer Time, Traditional vs. NT

[Plot: transfer time (y-axis, ticks 1000 to 7000) versus number of processors (x-axis: 8, 64, 512, 4K, 32K), with one curve for the Traditional Method and one for the NT Method]

Assumes 50,000 atoms, interaction radius = 12 Å, density = 0.1 atom/Å³

Time unit is time required to import data associated with one atom
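The comparison can be approximated from the two import-volume formulas given earlier in the talk. This is a rough sketch under stated assumptions, not a reproduction of the plotted values: transfer time is taken as density times import volume (atoms imported), the traditional method uses a cubical box, the NT box is optimized with a simple ternary search, and periodic wraparound at small processor counts is ignored. All function names are illustrative.

```python
import math

N_ATOMS, R, DENSITY = 50_000, 12.0, 0.1   # assumptions stated on the slide

def traditional_import_volume(b: float) -> float:
    return 3 * R * b**2 + 1.5 * math.pi * R**2 * b + 2 * math.pi * R**3 / 3

def nt_import_volume(bxy: float, Vb: float) -> float:
    bz = Vb / bxy**2                       # box volume is fixed
    return 2 * R * bxy**2 + 2 * R * bxy * bz + math.pi * R**2 * bz / 2

def nt_min_import_volume(Vb: float) -> float:
    # Ternary search; the import volume is convex in bxy for bxy > 0.
    lo, hi = 1e-6, 10 * (Vb ** (1.0 / 3.0) + R)
    for _ in range(300):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if nt_import_volume(m1, Vb) < nt_import_volume(m2, Vb):
            hi = m2
        else:
            lo = m1
    return nt_import_volume((lo + hi) / 2, Vb)

def transfer_times(p: int):
    """Return (traditional, NT) transfer times in units of one-atom imports."""
    Vb = N_ATOMS / DENSITY / p
    trad = DENSITY * traditional_import_volume(Vb ** (1.0 / 3.0))
    nt = DENSITY * nt_min_import_volume(Vb)
    return trad, nt

if __name__ == "__main__":
    print(f"{'p':>6} {'traditional':>12} {'NT':>10} {'ratio':>7}")
    for p in (8, 64, 512, 4096, 32768):
        trad, nt = transfer_times(p)
        print(f"{p:6d} {trad:12.0f} {nt:10.0f} {trad / nt:7.2f}")
```

Under these assumptions the traditional curve flattens toward the corner-term constant while the NT curve keeps falling, so the ratio between them widens with processor count.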

Page 126: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

An Open Question That Keeps Me Awake at Night

Page 127: new archs for new bio stanford 10-06 · New Algorithm Parallel algorithm for range-limited n-body problem Called the NT (for “Neutral Territory”) Method* Asymptotically less inter-processor

Are Force Fields Accurate Enough?

Nobody knows how accurate the force fields that everyone uses actually are

– Can’t simulate for long enough to know

– If problems surface, we may at least be able to
  • Figure out why
  • Take steps to fix them

But we already know that fast, single MD simulations will prove sufficient to answer at least some major scientific questions