NSF Visit, Gordon Bell, www.research.microsoft.com/~gbell, Microsoft Research, 4 October 2002


NSF Visit

Gordon Bell
www.research.microsoft.com/~gbell

Microsoft Research, 4 October 2002

Topics

• How much have things changed since CISE was formed in 1986, and how much remains the same?

• 10 year base case @CRA’s Grand Challenges? http://www.google.com/search?sourceid=navclient&q=cra+grand+challenges

• GB MyLifeBits: storing one’s entire life for recall, home media, etc.

• Clusters, Grids, and Centers… challenge is apps
• Supercomputing directions

Messages…

• The Grand Challenge for CISE is to work on applications in science, engineering, and bio/medicine/health care (e.g. NIH).

• Databases versus grepping. Revolution needed.
• Performance from software >= Moore’s Law

• Big challenge moving forward will come from trying to manage and exploit all the storage.

• Supercomputing: Cray. Gresham's Law.
• Build on industry standards and efforts. Grid and “web services” must co-operate.
• Whatever happened to the first Grand Challenges?
• Minimize grant overhead… site visits.

IBM Sets Up Biotech Research Center

U.S.-based IBM recently set up a biotechnology research and development center in Taiwan -- IBM Life Sciences Center of Excellence -- the company's first in the Asia Pacific region… the center will provide computation solutions and services from an integrated bio-information database linked to resources around the world. Local research institutes working in cooperation with the center include Academia Sinica, the Institute for Information Industry and National Yang Ming University.

From HPCWire 30 September 2002

Retrospective: CISE formed in 1986

• CISE spent about $100 million on research in 1987

• Q: What areas of software research do you think will be the most vital in the next decade?

• A: Methods to design and build large programs and data bases in a distributed environment are central.

• Q: What software research areas are funded?

• A: We fund what the community considers to be important … object-oriented languages, data bases, & human interfaces; semantics; formal methods of design and construction; connectionism; and data and knowledge bases, including concurrency. We aren’t funding applications.

Software Productivity c1986

• I believe the big gains in software will come about by eliminating the old style of programming, by moving to a new paradigm, rather than magic tools or techniques to make the programming process better. Visicalc and Lotus 1-2-3 are good examples of a dramatic improvement in programming productivity. In essence, programming is eliminated and the work put in the hands of the users.

• These breakthroughs are unlikely to come from the software research community, because they aren’t involved in real applications. Most likely they will come from people trained in another discipline who understand enough about software to be able to carry out the basic work that ultimately is turned over to the software engineers to maintain and evolve.

Software productivity c1986

• Q: The recent Software Engineering Conference featured a division of opinion on mechanized programming. … developing a programming system to write programs can automate much of the mundane tasks…

• A: Mechanized programming is recreated and renamed every few years. In the beginning, it meant a compiler. The last time it was called automatic programming. A few years ago it was program generators and the programmer’s workbench. The better it gets, the more programming you do!

Parallelism c1986

• To show my commitment to parallel processing, for the next 10 years I will offer two $1000 annual awards for the best, operational scientific or engineering program with the most speedup ...

• Q: What …do you expect from parallelism in the next decade?

• A: Our goal is obtaining a factor of 100 … within the decade and a factor of 10 within five years. 10 will be easy because it is inherently in most applications right now. The hardware will clearly be there if the software can support it or the users can use it.

• Many researchers think this goal is aiming too low. They think it should be a factor of 1 million within 15 years. However, I am skeptical that anything more than our goal will be achieved.
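The 10x-easy/100x-hard targets above can be related to Amdahl's law. A minimal sketch, not from the talk; the 90% and 99.1% parallel fractions are illustrative assumptions chosen to hit the stated goals:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: overall speedup when parallel_fraction of the
    work scales across n_processors and the remainder stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# A code that is 90% parallel tops out near 10x, regardless of
# processor count -- consistent with "10 will be easy":
print(round(amdahl_speedup(0.90, 1000), 1))   # ~9.9
# Reaching 100x requires >99% of the work to be parallel:
print(round(amdahl_speedup(0.991, 1000), 1))  # ~100.1
```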

No challenge, next decade of systems. Industry’s evolutionary path…

¿Que sera sera

Computing Research Association Grand Challenges
Gordon Bell

Microsoft Research, 26 June 2002

[Chart: “Goodness” (y-axis) versus Time, 2000 to 2012 (x-axis), sketching three trajectories: Base Case, Grand Challengeland, and Death and Doldrums.]

We can count on:
• Moore’s Law provides ≈50-100x performance at constant $
• 20% $ decrease/year => ½ per 5 years
• Terabyte personal stores => personal db managers
• Astronomical sized, by current standards, databases!
• Paper quality screens on watch, tablets… walls
• DSL wired, 3-4G/802.11j nets (>10 Mbps) access
• Network Services: Finally computers can use|access the web. “It’s the Internet, Stupid.”
– Enabler of intra-, extra-, inter-net commerce
– Finally EDI/Exchanges/Markets
• Ubiquity rivaling the telephone.
– Challenge: An instrument to supplant the phone?
– Challenge: Affordability for everyone on planet <$1500/year
• Personal authentication to access anything of value
• Murphy’s Law continues with larger and more complex systems, requiring better fundamental understanding. An opportunity and need for “Autonomic Computing”
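The ≈50-100x-per-decade figure is just compound doubling. A quick sanity check; the 18- and 24-month doubling periods are assumed, not stated on the slide:

```python
def decade_gain(doubling_months, years=10):
    """Performance multiple after `years` of doubling every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

print(round(decade_gain(18)))  # ~102x: doubling every 18 months
print(round(decade_gain(24)))  # 32x: doubling every 2 years
```

So the quoted 50-100x range corresponds to a doubling time of roughly 18-20 months.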

In a decade, the evolution:

We are likely to “have”:
• 120M computers/yr. World population >1B.
– increasing with decreasing price. 2x / -50%
– X% are discarded. Result is 1 Billion.
• Smaller personals w/phones… video @PDA $
• Almost adequate speech communication for commands, limited dictation, note taking, segmenting/indexing video
• Vision capable of tracking each individual in a relatively large crowd. With identity, everybody’s location is known, everywhere, anytime.

In a decade, the evolution:

Inevitable wireless nets… body, home, …x-area nets will create new opportunities

• Need to construct these environments of platforms, networking protocols, and programming environments for each kind
• Each net has to research its own sensor/effector structure as f(application), e.g. body, outdoor, building, …
• Taxonomy includes these alternative dimensions:
– Network function
– master|slave vs. distributed… currently peripheral nets
– permanent|dynamic
– indoor|outdoor
– size and spatial diameter
– bandwidth and performance
– sensor/effector types
– security and noise immunity

New environments can support a wide range of new apps

• Continued evolution of personal monitoring and assistance for health and personal care of all ages

• Personal platforms that provide “total recall” that will assist (25% of population) in solving problems

• Platforms for changing education will be available. Limiters: Authoring tools & standards; content

• Transforming the scientific infrastructure is needed!
– petabyte databases, petaflops performance
– shared data notebooks across instruments and labs
– new ways of performing experiments
– new ways of programming/visualizing and storing data

• Serendipity: Something really new, like we get every decade but didn’t predict, will occur.

R & D Challenges

• Engineering, evolutionary construction, and non-trivial maintenance of billions-of-node, fractal nets ranging from space, continent, campus, local, … to in-body nets
• Increasing information flows & vast sea of data
– Large disks everywhere! Personal to large servers across all apps
– Akin to the vast tape libraries that are never read (bit rot)
• A modern healthcare system that each of us would be happy, or unafraid, of being admitted into. Cf. islands (incompatible systems) of automation and instruments floating on a sea of paper moved around by people who maintain a bloated and inefficient “services” industry/economy.

MyLifeBits, The Challenge of a 0.001-1 Petabyte lifetime PC

Cyberizing everything… I’ve written, said, presented (incl. video), photos of physical objects & a few things I’ve read, heard, seen and might “want to see” on TV

“The PC is going to be the place where you store the information … really the center of control” – Billg, 1/7/2001

MyLifeBits is an “on-going” project following CyberAll to “cyberize” all of personal bits!
► Memory recall of books, CDs, communication, papers, photos, video
► Photos of physical object collections
► Elimination of all physical stores & objects
► Content source for home media: ambiance, entertainment, communication, interaction; Freestyle for CDs, photos, TV content, videos
Goal: to understand the 1 TByte PC: need, utility, cost, feasibility, challenge & tools.

Storing all we’ve read, heard, & seen

Human data-types             /hr     /day (/4yr)   /lifetime
read text, few pictures      200 K   2-10 M/G      60-300 G
speech text @120 wpm         43 K    0.5 M/G       15 G
speech @1 KBps               3.6 M   40 M/G        1.2 T
stills w/voice @100 KB       200 K   2 M/G         60 G
video-like 50 Kb/s POTS      22 M    0.25 G/T      25 T
video 200 Kb/s VHS-lite      90 M    1 G/T         100 T
video 4.3 Mb/s HDTV/DVD      1.8 G   20 G/T        1 P
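The lifetime column follows from simple rate arithmetic. A sketch reproducing the speech row; the ~11 waking hours/day and 80-year span are assumptions inferred from the table's totals, not stated in it:

```python
def lifetime_bytes(bytes_per_hour, hours_per_day=11, years=80):
    """Total bytes captured over a lifetime at a steady hourly rate."""
    return bytes_per_hour * hours_per_day * 365 * years

speech = lifetime_bytes(1000 * 3600)  # speech @1 KBps -> 3.6 MB/hr
print(f"{speech / 1e12:.1f} TB")      # ~1.2 TB, matching the table row
```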

[Screenshots: scenes from Media Center, © 2002]

A “killer app” for a Terabyte, Lifetime PC?

► MyLifeBits demonstrates need for lifetime memory!
► MODI (Microsoft Office Document Imaging)! The most significant Office™ addition since HTML.
► Technology to support the vision:
1. Guarantee that data will live forever!
2. A single index that includes mail, conversations, web accesses, and books!
3. E-book…e-magazines reach critical mass!
4. Telephony and audio capture are needed
5. Photo & video “index serving”
6. More meta-information … Office, photos
7. Lots of GUIs to improve ease-of-use

Copyright Gordon Bell, Clusters & Grids

The Clusters – GRID Era

CCGSC 2002, Lyon, France, September 2002

Same observations as 2000: GRID was/is an exciting concept …
– They can/must work within a community, organization, or project. Apps need to drive.
– “Necessity is the mother of invention.” Taxonomy… interesting vs necessity
– Cycle scavenging and object evaluation (e.g. seti@home, QCD)
– File distribution/sharing for IP theft, e.g. Napster
– Databases &/or programs for a community (astronomy, bioinformatics, CERN, NCAR)
– Workbenches: web workflow chem, bio…
– Exchanges… many sites operating together
– Single, large objectified pipeline… e.g. NASA
– Grid as a cluster platform! Transparent & arbitrary access including load balancing


Grid, n. An arbitrary, distributed cluster platform

A geographical and multi-organizational collection of diverse computers dynamically configured as cluster platforms responding to arbitrary, ill-defined jobs “thrown” at it.

Costs are not necessarily favorable e.g. disks are less expensive than cost to transfer data.

Latency and bandwidth are non-deterministic, thereby changing cluster characteristics

Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources.

Large datasets & I/O bound programs need to be with their data or be database accesses…

But are there resources there to share?

Bright spots… near term, user focus, a lesson for Grid suppliers

Tony Hey, head of UK scientific computing: apps-based funding versus tools-based funding.

Web services based Grid & data orientation. David Abramson - Nimrod.

– Parameter scans… other low-hanging fruit
– Encapsulate apps! “Excel” language/control mgmt.
– “Legacy apps are programs that users just want, and there’s no time or resources to modify code … independent of age, author, or language, e.g. Java.”

Andrew Grimshaw - Avaki
– Making Legion vision real. A reality check.

Lip: 4 pairs of “web services” based apps
Gray et al: Skyservice and Terraservice
Goal: providing a web service must be as easy as publishing a web page… and will occur!!!


SkyServer: delivering a web service to the astronomy community. Prototype for other sciences? Gray, Szalay, et al

First paper on the SkyServer:
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf

http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc

Later, more detailed paper for the database community:
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf

http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc


What can be learned from SkyServer?

• It’s about data, not about harvesting flops
• 1-2 hr. query programs versus 1 wk programs based on grep
• 10 minute runs versus 3 day compute & searches
• Database viewpoint. 100x speed-ups
– Avoid costly re-computation and searches
– Use indices and PARALLEL I/O. Read/Write >> 1.
– Parallelism is automatic, transparent, and just depends on the number of computers/disks.
• Limited experience and talent to use dbases.
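The "database viewpoint" above rests on indices avoiding full rescans of the data. A toy illustration with SQLite; the table and column names are hypothetical (the real SkyServer ran on SQL Server):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obj(id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)")
con.executemany(
    "INSERT INTO obj(ra, dec, mag) VALUES (?, ?, ?)",
    [(i * 0.036, i * 0.018 - 90, 15 + (i % 100) / 10.0) for i in range(10_000)],
)
con.execute("CREATE INDEX idx_mag ON obj(mag)")

# With the index, this predicate is a B-tree range search rather
# than a grep-style scan over every row:
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM obj WHERE mag < 15.5"
).fetchone()[-1]
print(plan)  # the plan names idx_mag instead of scanning the table
```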

Some science is hitting a wall: FTP and GREP are not adequate (Jim Gray)

• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• You can FTP 1 MB in 1 sec., 1 GB in a minute, 1 TB in 2 days and 1K$, 1 PB in 3 years and 1M$
• 1 PB is ~10,000 (>> 1,000) disks
• At some point you need indices to limit search, and parallel data search and analysis
• Goal using dbases, make it easy to:
– Publish: record structured data
– Find data anywhere in the network; get the subset you need!
– Explore datasets interactively
• Database becomes the file system!!!
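Gray's grep times are roughly consistent with a single sequential scan at about 10 MB/s; that rate is an assumption inferred here, the slide gives only the totals:

```python
def scan_days(n_bytes, mb_per_s=10.0):
    """Days to sequentially scan n_bytes at a fixed MB/s rate."""
    return n_bytes / (mb_per_s * 1e6) / 86_400

print(f"1 GB: {scan_days(1e9) * 24 * 60:.0f} min")    # ~2 min
print(f"1 TB: {scan_days(1e12):.1f} days")            # ~1.2 days
print(f"1 PB: {scan_days(1e15) / 365:.1f} years")     # ~3.2 years
```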

Network concerns

• Very high cost
– $(1 + 1)/GByte to send on the net; Fedex and 160 GByte shipments are cheaper
– Disks cost $1/GByte to purchase!!!
– DSL at home is $0.15 - $0.30
– Disks cost less than $2/GByte to purchase
• Low availability of fast links (last mile problem)
– Labs & universities have DS3 links at most, and they are very expensive
– Traffic: instant messaging, music stealing
• Performance at desktop is poor
– 1-10 Mbps; very poor communication links
• Manage: trade in fast links for cheap links!!

Gray’s $2.4 K, 1 TByte Sneakernet aka Disk Brick

Courtesy of Jim Gray, Microsoft Bay Area Research


Cost, time, and speed to move a Terabyte

Cost of a “Sneaker-Net” TB:
We now ship NTFS/SQL disks. Not a good format for Linux.
Ship NFS/CIFS/ODBC servers (not disks). Plug “disk” into LAN.
DHCP, then file or DB serve… Web Service in the long term.

Cost to move a Terabyte

Context       Speed (Mbps)   Rent ($/month)   Raw $/Mbps   Raw $/TB sent   Time/TB
home phone    0.04           40               1,000        3,086           6 years
home DSL      0.6            70               117          360             5 months
T1            1.5            1,200            800          2,469           2 months
T3            43             28,000           651          2,010           2 days
OC3           155            49,000           316          976             14 hours
100 Mbps      100                                                          1 day
Gbps          1,000                                                        2.2 hours
OC192         9,600          1,920,000        200          617             14 minutes
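The Time/TB column is reproducible from link speed alone (1 TB = 8×10^12 bits, ignoring protocol overhead):

```python
def days_per_tb(mbps):
    """Days to push one terabyte through a link of `mbps` megabits/s."""
    return 8e12 / (mbps * 1e6) / 86_400

print(f"T3 (43 Mbps):     {days_per_tb(43):.1f} days")        # ~2.2 days
print(f"OC3 (155 Mbps):   {days_per_tb(155) * 24:.0f} hours") # ~14 hours
print(f"OC192 (9.6 Gbps): {days_per_tb(9600) * 24 * 60:.0f} min")  # ~14 min
```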

Cost, time of Sneaker-net vs alternatives

Media     Robot $   Media $      read+write   ship time   Total time/TB   Mbps   Cost (10 TB)   $/TB shipped
CD        1500      2x800 240    60 hrs       24 hrs      6 days          28     $2 K           $208
DVD       200       2x8K 400     60 hrs       24 hrs      6 days          28     $20 K          $2,000
Tape      25        2x15K 1000   92 hrs       24 hrs      5 days          18     $31 K          $3,100
DiskBric  7         1K 1,400     19 hrs       24 hrs      2 days          52     $2.6 K         $260

Courtesy of Jim Gray, Microsoft Bay Area Research


Grids: Real and “personal”
Two carrots, one downside. A bet.

Bell will match any Gordon Bell Prize (parallelism, performance, or performance/cost) winner’s prize that is based on “Grid Platform Technology”.

I will bet any individual or set of individuals of the Grid Research community up to $5,000 that a Grid application will not win the above by SC2005.

Copyright Gordon Bell, LANL 5/17/2002

Technical computing: Observations on an ever changing, occasionally repetitious, environment

A brief, simplified history of HPC
1. Sequential & data parallelism using shared memory, Cray’s Fortran computers, 60-02 (US: 90)
2. 1978: VAXen threaten general purpose centers…
3. NSF response: form many centers, 1988 - present
4. SCI: search for parallelism to exploit micros, 85-95
5. Scalability: “bet the farm” on clusters. Users “adapt” to clusters aka multi-computers with LCD program model, MPI. >95
6. Beowulf clusters adopt standardized hardware and Linus’s software (Linux) to create a standard! >1995
7. “Do-it-yourself” Beowulfs impede new structures and threaten g.p. centers >2000
8. 1997-2002: Let’s tell NEC they aren’t “in step”.
9. High speed networking enables peer2peer computing and the Grid. Will this really work?

What Is the System Architecture? (GB c1990)

[Taxonomy diagram, reconstructed as an outline:]
MIMD
• Multiprocessors: single address space, shared memory computation
– Central memory multiprocessors (not scalable)
  – Bus multis: DEC, Encore, NCR, … Sequent, SGI, Sun
  – Simple, ring multi … bus multi replacement
– Distributed memory multiprocessors (scalable)
  – Dynamic binding of addresses to processors: KSR
  – Static binding, ring multi: IEEE SCI proposal
  – Static binding, caching: Alliant, DASH
  – Static run-time binding: research machines
  – Cross-point or multi-stage: Cray, Fujitsu, Hitachi, IBM, NEC, Tera
• Multicomputers: multiple address space, message passing computation
– Distributed multicomputers (scalable)
  – Switch connected: IBM
  – Mesh connected: Intel
  – Butterfly/fat tree/cubes: CM5, NCUBE
– Fast LANs for high availability and high capacity clusters: DEC, Tandem
– LANs for distributed processing: workstations, PCs; GRID
SIMD


Processor Architectures?

VECTORS OR VECTORS?

CS view:
• MISC >> CISC >> language directed >> RISC >> super-scalar >> extra-long instruction word
• Caches: mostly alleviate need for memory B/W

SC designer’s view:
• RISC >> VCISC (vectors) >> massively parallel (SIMD) (multiple pipelines)
• Memory B/W = perf.


Results from DARPA’s SCI c1983

• Many research and construction efforts … virtually all new hardware efforts failed except Intel and Cray.
• DARPA directed purchases… screwed up the market, including the many VC funded efforts.
• No software funding!
• Users responded to the massive power potential with LCD software. Clusters, clusters, clusters using MPI. Beowulf!
• It’s not scalar vs vector, it’s memory bandwidth!
– 6-10 scalar processors = 1 vector unit
– 16-64 scalars = a 2-6 processor SMP

Dead Supercomputer Society: ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics


What a difference 25 years AND spending >10x makes!

LLNL 150 Mflops machine room c1978

ESRDC: 40 Tflops. 640 nodes (8 × 8 GFlops vector processors/node)

Japanese Earth Simulator

• Spectacular results for $400M.
– Year to year gain of 10x. The greatest gain since the first (1987) Gordon Bell Prize.
– Performance is 10x the nearest entrant
– Performance/cost is 3x the nearest entrant
– RAP (real application performance) >60% of peak. Other machines are typically 10% of peak.
– Programming was done in HPF (Fortran) that the US research community abandoned.

• NCAR was right in wanting to purchase an NEC super


Computer types

[Chart: computer types arranged along a connectivity axis (WAN/LAN, SAN, DSM, SM) and split micros vs. vectors. “Old world”: NEC mP, NEC super, Cray X…T (all mPv), SGI DSM clusters & SGI DSM, T3E, SP2 (mP), VPP uni, mainframes, multis, WSs, PCs. “Clusters, GRID & P2P”: networked supers, Legion, Condor, Beowulf, NT clusters, NOW.]


The Challenge leading to Beowulf

• NASA HPCC Program begun in 1992
• Comprised Computational Aero-Science and Earth and Space Science (ESS)
• Driven by need for post-processing data manipulation and visualization of large data sets
• Conventional techniques imposed long user response time and shared resource contention
• Cost low enough for dedicated single-user platform
• Requirement: 1 Gflops peak, 10 Gbyte, < $50K
• Commercial systems: $1000/Mflops or $1M/Gflops


The Virtuous Economic Cycle drives the PC industry… & Beowulf

[Diagram: a cycle of Innovation, Volume, Competition, Standards, and Utility/value (with DOJ outside the loop): greater availability @ lower cost creates apps, tools, and training; attracts users; attracts suppliers.]

Lessons from Beowulf

• An experiment in parallel computing systems
• Established vision: low cost, high end computing
• Demonstrated effectiveness of PC clusters for some (not all) classes of applications
• Provided networking software
• Provided cluster management tools
• Conveyed findings to broad community via tutorials and the book
• Provided design standard to rally community!
• Standards beget: books, trained people, software … a virtuous cycle that allowed apps to form
• Industry begins to form beyond a research project

Courtesy, Thomas Sterling, Caltech.


Clusters: Next Steps

• Scalability… they can exist at all levels: personal, group, … centers
• Clusters challenge centers… given that smaller users get small clusters

Computing in small spaces @ LANL (RLX cluster in building with NO A/C)
• 240 processors @ 2/3 GFlops
• Filling the 4 racks gives a Teraflops


Internet II concerns, given $0.5B cost

• Very high cost
– $(1 + 1)/GByte to send on the net; Fedex and 160 GByte shipments are cheaper
– DSL at home is $0.15 - $0.30
– Disks cost $1/GByte to purchase!
• Low availability of fast links (last mile problem)
– Labs & universities have DS3 links at most, and they are very expensive
– Traffic: instant messaging, music stealing
• Performance at desktop is poor
– 1-10 Mbps; very poor communication links

Scalable computing: the effects

• They come in all sizes; incremental growth: 10 or 100 to 10,000 (100X for most users); debug vs run; problem growth
• Allows compatibility heretofore impossible
– 1978: VAX chose Cray Fortran
– 1987: The NSF centers went to UNIX
• Users chose sensible environment
– Acquisition and operational costs & environments
– Cost to use as measured by user’s time
• The role of g.p. centers, e.g. NSF, state-x, is unclear. Necessity for support?
– Scientific data for a given community…
– Community programs and data
– Manage GRID discipline
• Are clusters ≈ Gresham’s Law? Drive out alternatives.


The end