Page 1: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Scaleable Computing

Jim Gray
Microsoft Corporation
Gray@Microsoft.com

Page 2

Thesis: Scaleable Servers

Commodity hardware allows new applications
New applications need huge servers
Clients and servers are built of the same “stuff”
  Commodity software and commodity hardware
Servers should be able to
  Scale up (grow node by adding CPUs, disks, networks)
  Scale out (grow by adding nodes)
  Scale down (can start small)
Key software technologies
  Objects, Transactions, Clusters, Parallelism

Page 3

1987: 256 tps Benchmark

14 M$ computer (Tandem)
A dozen people
False floor, 2 rooms of machines

Figure labels (hardware): simulate 25,600 clients; a 32 node processor array; a 40 GB disk array (80 drives)
Figure labels (staff): OS expert, network expert, DB expert, performance expert, hardware experts, admin expert, auditor, manager

Page 4

1988: DB2 + CICS Mainframe, 65 tps

IBM 4391
Simulated network of 800 clients
2 M$ computer
Staff of 6 to do benchmark

Figure labels: 2 x 3725 network controllers; 16 GB disk farm (4 x 8 x .5 GB); refrigerator-sized CPU

Page 5

1997: 10 years later
1 Person and 1 box = 1250 tps

1 Breadbox ~ 5x 1987 machine room
23 GB is hand-held
One person does all the work
Cost/tps is 1,000x less
  25 micro dollars per transaction

Figure labels (hardware): 4 x 200 MHz CPU; 1/2 GB DRAM; 12 x 4 GB disk; 3 x 7 x 4 GB disk arrays
Figure labels (roles): hardware expert, OS expert, net expert, DB expert, app expert

Page 6

What Happened?

Moore’s law:
  Things get 4x better every 3 years
  (applies to computers, storage, and networks)
New Economics: Commodity

  class           price/mips ($/mips)   software (k$/year)
  mainframe       10,000                100
  minicomputer    100                   10
  microcomputer   10                    1

GUI: Human - computer tradeoff
  optimize for people, not computers

Figure: price versus time for mainframe, mini, and micro.
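
A quick check of the growth rule on this slide, as a minimal Python sketch. The function name and the sample horizons are illustrative assumptions, not part of the talk. Compounding 4x every 3 years gives roughly 100x per decade, so the 1,000x drop in cost/tps reported for 1987 to 1997 presumably reflects more than this rule alone; the commodity pricing on this slide is the other factor the talk names.

```python
def improvement_factor(years, factor=4.0, period_years=3.0):
    """Growth after `years` if things get `factor`x better every `period_years`."""
    return factor ** (years / period_years)


if __name__ == "__main__":
    # Horizons chosen for illustration only.
    for years in (3, 6, 10):
        print(f"{years:2d} years: {improvement_factor(years):6.1f}x")
```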

Page 7

What Happens Next

Last 10 years: 1000x improvement
Next 10 years: ????
Today: text and image servers are free
  25 $/hit => advertising pays for them
Future: video, audio, … servers are free
“You ain’t seen nothing yet!”

Figure: performance versus time, 1985 to 2005.

Page 8

Kinds Of Information Processing

                 Point-to-point        Broadcast
  Immediate      Conversation, Money   Lecture, Concert   (Network)
  Time-shifted   Mail                  Book, Newspaper    (Database)

It’s ALL going electronic
Immediate is being stored for analysis (so ALL database)
Analysis and automatic processing are being added

Page 9

Why Put Everything In Cyberspace?

Low rent - min $/byte
Shrinks time - now or later
Shrinks space - here or there
Automate processing - knowbots

Figure: point-to-point OR broadcast on one axis, immediate OR time-delayed on the other; Network and Database; locate, process, analyze, summarize.

Page 10

Magnetic Storage Cheaper Than Paper

File cabinet:
  cabinet (four drawer)      250$
  paper (24,000 sheets)      250$
  space (2x3 @ 10$/ft²)      180$
  total                      700$
  3¢/sheet
Disk:
  disk (4 GB)                800$
  ASCII: 2 mil pages         0.04¢/sheet (80x cheaper)
  Image: 200,000 pages       0.4¢/sheet (8x cheaper)
Store everything on disk
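
The per-sheet figures on this slide can be re-derived with a few lines of Python. This is a sketch that uses only the totals quoted on the slide; the function and variable names are illustrative assumptions. The slide's 3¢, 80x, and 8x are rounded, so the computed ratios come out slightly lower.

```python
def cents_per_sheet(total_dollars, sheets):
    """Storage cost in cents per sheet (or per page)."""
    return 100.0 * total_dollars / sheets


paper = cents_per_sheet(700, 24_000)          # cabinet + paper + floor space
disk_ascii = cents_per_sheet(800, 2_000_000)  # 4 GB disk holding ASCII pages
disk_image = cents_per_sheet(800, 200_000)    # 4 GB disk holding page images

if __name__ == "__main__":
    print(f"paper:      {paper:.2f} cents/sheet")
    print(f"disk ASCII: {disk_ascii:.2f} cents/sheet ({paper / disk_ascii:.0f}x cheaper)")
    print(f"disk image: {disk_image:.2f} cents/sheet ({paper / disk_image:.1f}x cheaper)")
```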

Page 11

Databases: Information at Your Fingertips™
Information Network™
Knowledge Navigator™

All information will be in an online database (somewhere)
You might record everything you
  Read: 10 MB/day, 400 GB/lifetime (eight tapes today)
  Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today)
  See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)
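
The three lifetime totals on this slide are each the daily rate times the same number of days. The Python sketch below makes that explicit. The 40,000-day lifetime (about 110 years) is an assumption inferred from the slide's own ratios, decimal units are assumed, and the names are illustrative.

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
LIFETIME_DAYS = 40_000  # assumption: inferred from 400 GB / (10 MB/day)


def lifetime_bytes(bytes_per_day, days=LIFETIME_DAYS):
    """Total bytes recorded over a lifetime at a fixed daily rate."""
    return bytes_per_day * days


def hours_per_day(bytes_per_day, bytes_per_second):
    """Hours of recording per day implied by a daily volume and a data rate."""
    return bytes_per_day / bytes_per_second / 3600


if __name__ == "__main__":
    print("read:", lifetime_bytes(10 * MB) / GB, "GB")
    print("hear:", lifetime_bytes(400 * MB) / TB, "TB")
    print("see: ", lifetime_bytes(40 * GB) / PB, "PB")
    print("seeing hours/day at 1 MB/s:", round(hours_per_day(40 * GB, 1 * MB), 1))
```

Under these assumptions, 40 GB/day at 1 MB/s corresponds to about 11 hours of recording per day.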

Page 12

Database Store ALL Data Types

The old world:
  Millions of objects
  100-byte objects
  Example table People (Name, Address); names Mike, Won, David; addresses NY, Berk, Austin
The new world:
  Billions of objects
  Big objects (1 MB)
  Objects have behavior (methods)
  Example table People (Name, Address, Papers, Picture, Voice); names Mike, Won, David; addresses NY, Berk, Austin

Paperless office
Library of Congress online
All information online
  Entertainment
  Publishing
  Business
WWW and Internet

Page 13

Billions Of Clients

Every device will be “intelligent”
Doors, rooms, cars…
Computing will be ubiquitous

Page 14

Billions Of Clients Need Millions Of Servers

All clients networked to servers
  May be nomadic or on-demand
Fast clients want faster servers
Servers provide
  Shared Data
  Control
  Coordination
  Communication

Figure: clients (mobile clients, fixed clients) connected to servers (server, super server).

Page 15

Thesis: Many little beat few big

Smoking, hairy golf ball
How to connect the many little parts?
How to program the many little parts?
Fault tolerance?

Figure labels:
  Price: $1 million, $100 K, $10 K
  Class: Mainframe, Mini, Micro, Nano, Pico Processor
  Disk form factors: 14", 9", 5.25", 3.5", 2.5", 1.8"
  1 M SPECmarks, 1 TFLOP
  10^6 clocks to bulk ram
  Event-horizon on chip
  VM reincarnated
  Multiprogram cache, On-Chip SMP
  Latencies: 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive
  Capacities: 1 MB, 100 MB, 10 GB, 1 TB, 100 TB
  1 mm^3

Page 16

Future Super Server: 4T Machine

Array of 1,000 4B machines
  1 Bips processors
  1 BB DRAM
  10 BB disks
  1 Bbps comm lines
  1 TB tape robot
A few megabucks
Challenge:
  Manageability
  Programmability
  Security
  Availability
  Scaleability
  Affordability
As easy as a single system
Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work

Figure: Cyber Brick, a 4B machine (CPU, 5 GB RAM, 50 GB Disc).

Page 17

Performance = Storage Accesses, not Instructions Executed

In the “old days” we counted instructions and IO’s
Now we count memory references
Processors wait most of the time

Figure: Where the time goes: clock ticks used by AlphaSort components (Sort, Disc Wait, OS, Memory Wait, D-Cache Miss, I-Cache Miss, B-Cache Data Miss).
70 MIPS; “real” apps have worse I-cache misses, so they run at 60 MIPS if well tuned, 20 MIPS if not

Page 18

Storage Latency: How Far Away is the Data?

  Storage level          Clock ticks   Analogy       Travel time
  Registers              1             My Head       1 min
  On Chip Cache          2             This Room
  On Board Cache         10            This Campus   10 min
  Memory                 100           Sacramento    1.5 hr
  Disk                   10^6          Pluto         2 Years
  Tape / Optical Robot   10^9          Andromeda     2,000 Years
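
The travel times on this slide follow from treating one clock tick as one minute of human time. The Python sketch below shows the conversion. The one-tick-per-minute scale is an assumption read off the slide's own numbers (registers at 1 tick map to 1 min), and the function name is illustrative. The results land close to the slide's rounded figures.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365


def human_scale(ticks):
    """Express a latency in clock ticks as travel time, assuming 1 tick = 1 minute."""
    minutes = ticks
    if minutes < MINUTES_PER_HOUR:
        return f"{minutes} min"
    if minutes < MINUTES_PER_YEAR:
        return f"{minutes / MINUTES_PER_HOUR:.1f} hr"
    return f"{minutes / MINUTES_PER_YEAR:,.0f} years"


if __name__ == "__main__":
    levels = [("registers", 1), ("on-board cache", 10), ("memory", 100),
              ("disk", 10**6), ("tape / optical robot", 10**9)]
    for name, ticks in levels:
        print(f"{name:22s} {human_scale(ticks)}")
```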

Page 19

The Hardware Is In Place… And then a miracle occurs
?

SNAP: scaleable network and platforms
Commodity-distributed OS built on:
  Commodity platforms
  Commodity network interconnect
Enables parallel applications

Page 20

Thesis: Scaleable Servers

Commodity hardware allows new applications
New applications need huge servers
Clients and servers are built of the same “stuff”
  Commodity software and commodity hardware
Servers should be able to
  Scale up (grow node by adding CPUs, disks, networks)
  Scale out (grow by adding nodes)
  Scale down (can start small)
Key software technologies
  Objects, Transactions, Clusters, Parallelism

Page 21

Scaleable Servers: BOTH SMP And Cluster

Grow up with SMP; 4xP6 is now standard
Grow out with cluster
Cluster has inexpensive parts

Figure: personal system, departmental server, SMP super server, cluster of PCs.

Page 22

SMPs Have Advantages

Single system image: easier to manage, easier to program threads in shared memory, disk, Net
4x SMP is commodity
Software capable of 16x
Problems:
  >4 not commodity
  Scale-down problem (starter systems expensive)
  There is a BIGGEST one

Figure: personal system, departmental server, SMP super server.

Page 23

Building the Largest Node

There is a biggest node (size grows over time)
Today, with NT, it is probably 1 TB
We are building it (with help from DEC and SPIN2)
  1 TB GeoSpatial SQL Server database
  (1.4 TB of disks = 320 drives).
  30K BTU, 8 KVA, 1.5 metric tons.
Will put it on the Web as a demo app.
  10 meter image of the ENTIRE PLANET.
  2 meter image of interesting parts (2% of land)
  One pixel per meter = 500 TB uncompressed.
  Better resolution in US (courtesy of USGS).

www.SQL.1TB.com

Figure: 1-TB home page; 1-TB SQL Server DB of satellite and aerial photos, plus support files.
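
The "one pixel per meter = 500 TB" figure can be sanity-checked in Python. This is a sketch under two assumptions that the slide does not state: the image covers the whole surface of the Earth (about 510 million km²) and each pixel takes one byte uncompressed. The names are illustrative.

```python
EARTH_SURFACE_M2 = 5.1e14  # about 510 million km^2, land and sea
TB = 1e12


def image_terabytes(area_m2, meters_per_pixel, bytes_per_pixel=1):
    """Uncompressed image size in TB for a given ground resolution."""
    pixels = area_m2 / meters_per_pixel ** 2
    return pixels * bytes_per_pixel / TB


if __name__ == "__main__":
    print(f" 1 m/pixel: {image_terabytes(EARTH_SURFACE_M2, 1):6.1f} TB")
    print(f"10 m/pixel: {image_terabytes(EARTH_SURFACE_M2, 10):6.1f} TB")
```

Under these assumptions a 1 meter image is about 510 TB, in line with the slide, and a 10 meter image is about 5 TB before any compression.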

Page 24

What’s a TeraByte?

1 Terabyte:
  1,000,000,000 business letters    150 miles of book shelf
  100,000,000 book pages            15 miles of book shelf
  50,000,000 FAX images             7 miles of book shelf
  10,000,000 TV pictures (mpeg)     10 days of video
  4,000 LandSat images              16 earth images (100m)
  100,000,000 web pages             10 copies of the web HTML
Library of Congress (in ASCII) is 25 TB
1980: $200 million of disc          10,000 discs
      $5 million of tape silo       10,000 tapes
1997: 200 k$ of magnetic disc       48 discs
      30 k$ nearline tape           20 tapes
Terror Byte !

Page 25

TB DB User Interface

Page 26

TPC-C Web-Based Benchmarks

Client is a Web browser (7,500 of them!)
Submits
  Order
  Invoice
  Query
to server via Web page interface
Web server translates to DB
SQL does DB work
Net:
  easy to implement
  performance is GREAT!

Figure: browsers connect over HTTP to IIS (= Web), which connects over ODBC to SQL.

Page 27

TPC-C Shows How Far SMPs have come

Performance is amazing:
  2,000 users is the min!
  30,000 users on a 4x12 alpha cluster (Oracle)
Peak Performance: 30,390 tpmC @ $305/tpmC (Oracle/DEC)
Best Price/Perf: 6,712 tpmC @ $65/tpmC (MS SQL/DEC/Intel)
Graphs show UNIX high price & diseconomy of scaleup

Figure: tpmC & Price Performance (only "best" data shown for each vendor); x axis tpmC (0 to 20,000), y axis $/tpmC (0 to 400); series DB2, Informix, MS SQL Server, Oracle, Sybase.

Page 28

TPC C SMP Performance

Figure: tpmC vs CPUs, two panels; x axis CPUs (0 to 20), y axis tpmC (0 to 20,000); series SUN Scaleability and SQL Server.

SMPs do offer speedup, but 4x P6 is better than some 18x MIPSco

Page 29

The TPC-C Revolution Shows How Far NT and SQL Server have Come

Economy of scale on Windows NT
Recent Microsoft SQL Server benchmarks are Web-based

Figure: tpmC and $/tpmC, MS SQL Server: Economy of Scale & Low Price; x axis Performance tpmC (0 to 8,000), y axis Price $/tpmC ($0 to $250); series DB2, Informix, Microsoft, Oracle, Sybase; arrows mark the "Better" direction.

Page 30

What Happens To Prices?

No expensive UNIX front end (20$/tpmC)
No expensive TP monitor software (10$/tpmC)
=> 65$/tpmC

Figure: TPC Price/tpmC, a bar chart (0 to 100 $/tpmC) broken down into processor, disk, software, and net for eight systems: Informix on SNI, Oracle on DEC Unix, Oracle on Compaq/NT, Sybase on Compaq/NT, Microsoft on Compaq with Visigenics, Microsoft on HP with Visigenics, Microsoft on Intergraph with IIS, Microsoft on Compaq with IIS.

Page 31

Grow UP and OUT

Cluster:
  a collection of nodes
  as easy to program and manage as a single node

Figure: personal system, departmental server, SMP super server; 1 Terabyte DB; 1 billion transactions per day.

Page 32

Clusters Have Advantages

Clients and servers made from the same stuff
Inexpensive:
  Built with commodity components
Fault tolerance:
  Spare modules mask failures
Modular growth
  Grow by adding small modules
Unlimited growth: no biggest one

Page 33

Windows NT clusters

Key goals:
  Easy: to install, manage, program
  Reliable: better than a single node
  Scaleable: added parts add power
Microsoft & 60 vendors defining NT clusters
  Almost all big hardware and software vendors involved
No special hardware needed - but it may help
Enables
  Commodity fault-tolerance
  Commodity parallelism (data mining, virtual reality…)
  Also great for workgroups!
Initial: two-node failover
  Beta testing since December 96
  SAP, Microsoft, Oracle giving demos.
  File, print, Internet, mail, DB, other services
  Easy to manage
  Each node can be 4x (or more) SMP
Next (NT5) “Wolfpack” is modest size cluster
  About 16 nodes (so 64 to 128 CPUs)
  No hard limit, algorithms designed to go further

Page 34

SQL Server™ Failover Using “Wolfpack” Windows NT Clusters

Each server “owns” half the database
When one fails…
  The other server takes over the shared disks
  Recovers the database and serves it

Figure: clients connect to servers A and B; each server has private disks, and both attach to shared SCSI disk strings.
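
The two-node failover described on this slide can be sketched as an ownership table that the survivor rewrites when its partner fails. This Python toy model is an illustration only, not the Wolfpack or SQL Server mechanism: the class, method, and partition names are hypothetical, and disk takeover and database recovery are reduced to a comment.

```python
class TwoNodeCluster:
    """Toy model: servers A and B each own half of the database."""

    def __init__(self):
        self.alive = {"A": True, "B": True}
        self.owner = {"half-1": "A", "half-2": "B"}

    def fail(self, server):
        """Mark `server` as failed and move its half to the surviving server."""
        self.alive[server] = False
        survivor = "B" if server == "A" else "A"
        if not self.alive[survivor]:
            raise RuntimeError("no surviving server")
        for part in list(self.owner):
            if self.owner[part] == server:
                # In a real system: take over the shared disks, then recover.
                self.owner[part] = survivor

    def route(self, part):
        """Return the server a client request for `part` should go to."""
        return self.owner[part]


if __name__ == "__main__":
    cluster = TwoNodeCluster()
    print(cluster.route("half-1"), cluster.route("half-2"))
    cluster.fail("A")
    print(cluster.route("half-1"), cluster.route("half-2"))
```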

Page 35

Billion Transactions per Day Project

Building a 20-node Windows NT Cluster (with help from Intel)
  > 800 disks
All commodity parts
Using SQL Server & DTC distributed transactions
Each node has 1/20th of the DB
Each node does 1/20th of the work
15% of the transactions are “distributed”

Page 36

How Much Is 1 Billion Transactions Per Day?

1 Btpd = 11,574 tps (transactions per second)
       ~ 700,000 tpm (transactions/minute)
AT&T
  185 million calls (peak day worldwide)
Visa ~20 M tpd
  400 M customers
  250,000 ATMs worldwide
  7 billion transactions / year (card+cheque) in 1994

Figure: millions of transactions per day (Mtpd) on a log scale from 0.1 to 1,000, with bars for 1 Btpd, Visa, AT&T, BofA, and NYSE.
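
The rate conversions on this slide are simple divisions, shown here as a small Python sketch. The function names are illustrative assumptions; the inputs are the slide's own figures, plus the 20-node cluster size from the Billion Transactions per Day Project slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def per_second(tx_per_day):
    return tx_per_day / SECONDS_PER_DAY


def per_minute(tx_per_day):
    return tx_per_day / MINUTES_PER_DAY


BILLION_PER_DAY = 1_000_000_000
VISA_PER_DAY = 7_000_000_000 / 365  # 7 billion transactions per year

if __name__ == "__main__":
    print(f"1 Btpd = {per_second(BILLION_PER_DAY):,.0f} tps")
    print(f"1 Btpd = {per_minute(BILLION_PER_DAY):,.0f} tpm")
    print(f"per node (20 nodes) = {per_second(BILLION_PER_DAY) / 20:,.0f} tps")
    print(f"Visa = {VISA_PER_DAY / 1e6:.1f} M tpd")
```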

Page 37

Parallelism: The OTHER aspect of clusters

Clusters of machines allow two kinds of parallelism
  Many little jobs: online transaction processing
    TPC-A, B, C…
  A few big jobs: data search and analysis
    TPC-D, DSS, OLAP
Both give automatic parallelism

Page 38

Jim Gray & Gordon Bell: VLDB 95 Parallel Database Systems Survey

Kinds of Parallel Execution

Pipeline
  Any Sequential Program -> Any Sequential Program
Partition: outputs split N ways, inputs merge M ways
  Any Sequential Program (one copy per partition)

Page 39

Data Rivers: Split + Merge Streams

N producers -> River (N x M data streams) -> M consumers

Producers add records to the river,
Consumers consume records from the river
Purely sequential programming.
River does flow control and buffering
  does partition and merge of data records
River = Split/Merge in Gamma = Exchange operator in Volcano.
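
A river can be sketched in Python with threads and bounded queues. This is a minimal illustration under stated assumptions, not the Gamma or Volcano implementation: the `River` class, `run_river`, and the hash partitioning rule are all hypothetical. Producers and consumers stay purely sequential; the bounded queues provide the flow control and buffering, and `put` does the split.

```python
import queue
import threading

_DONE = object()  # end-of-stream marker


class River:
    """Splits records from N producers across M consumer streams."""

    def __init__(self, n_producers, m_consumers, capacity=8):
        self._queues = [queue.Queue(maxsize=capacity) for _ in range(m_consumers)]
        self._open_producers = n_producers
        self._lock = threading.Lock()

    def put(self, record):
        # Split: hash-partition the record; blocks when the stream is full.
        self._queues[hash(record) % len(self._queues)].put(record)

    def close_producer(self):
        with self._lock:
            self._open_producers -= 1
            last = self._open_producers == 0
        if last:  # all producers finished: end every consumer stream
            for q in self._queues:
                q.put(_DONE)

    def records(self, consumer_id):
        # Merge: one consumer sees records from every producer.
        while True:
            record = self._queues[consumer_id].get()
            if record is _DONE:
                return
            yield record


def run_river(n_producers=3, m_consumers=2, per_producer=100):
    river = River(n_producers, m_consumers)
    counts = [0] * m_consumers

    def producer(pid):
        for i in range(per_producer):
            river.put(pid * per_producer + i)
        river.close_producer()

    def consumer(cid):
        for _ in river.records(cid):
            counts[cid] += 1

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(n_producers)]
    threads += [threading.Thread(target=consumer, args=(c,)) for c in range(m_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    print(run_river())
```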

Page 40

Partitioned Execution

Spreads computation and IO among processors
Partitioned data gives NATURAL parallelism

Figure: a table partitioned by key range (A...E, F...J, K...N, O...S, T...Z); one Count runs on each partition, and a final Count combines the partial counts.
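
The partitioned count in the figure can be sketched in Python: count each key-range partition independently, then combine the partial counts. The function names and sample data are illustrative assumptions. A thread pool is used only to show the structure; in CPython it will not speed up CPU-bound work, and a real system would place each partition on its own processor and disk.

```python
from concurrent.futures import ThreadPoolExecutor

# Key ranges taken from the slide's figure.
RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]


def partition(names):
    """Assign each name to the key-range partition of its first letter."""
    parts = [[] for _ in RANGES]
    for name in names:
        first = name[0].upper()
        for i, (lo, hi) in enumerate(RANGES):
            if lo <= first <= hi:
                parts[i].append(name)
                break
    return parts


def parallel_count(names):
    """Count every partition independently, then sum the partial counts."""
    parts = partition(names)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial = list(pool.map(len, parts))
    return partial, sum(partial)


if __name__ == "__main__":
    print(parallel_count(["Alice", "Bob", "Frank", "Zoe", "Mike", "Tom"]))
```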

Page 41

N x M way Parallelism

N inputs, M outputs, no bottlenecks.
Partitioned Data
Partitioned and Pipelined Data Flows

Figure: five key-range partitions (A...E, F...J, K...N, O...S, T...Z), each with its own Sort and Join operators; three Merge operators combine the outputs.

Page 42

The Parallel Law Of Computing

Grosch's Law: 2x $ is 4x performance
  1 MIPS for 1 $
  1,000 MIPS for 32 $ (.03 $/MIPS)
Parallel Law: 2x $ is 2x performance
  1 MIPS for 1 $
  1,000 MIPS for 1,000 $
  Needs: linear speedup and linear scale-up
  Not always possible
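
The two price curves on this slide can be compared in a few lines of Python. This sketch assumes the slide's normalization of 1 MIPS for 1 $; the function names are illustrative. Under Grosch's law performance grows as the square of cost, so cost grows as the square root of performance; under the parallel law cost grows linearly.

```python
import math


def grosch_cost(mips):
    """Cost in $ if 2x $ buys 4x performance (1 MIPS = 1 $)."""
    return math.sqrt(mips)


def parallel_cost(mips):
    """Cost in $ if 2x $ buys 2x performance (1 MIPS = 1 $)."""
    return float(mips)


if __name__ == "__main__":
    for mips in (1, 1_000):
        g, p = grosch_cost(mips), parallel_cost(mips)
        print(f"{mips:5d} MIPS: Grosch {g:7.1f} $ ({g / mips:.2f} $/MIPS), "
              f"parallel {p:7.1f} $ ({p / mips:.2f} $/MIPS)")
```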

Page 43

Thesis: Scaleable Servers

Commodity hardware allows new applications
New applications need huge servers
Clients and servers are built of the same “stuff”
  Commodity software and commodity hardware
Servers should be able to
  Scale up (grow node by adding CPUs, disks, networks)
  Scale out (grow by adding nodes)
  Scale down (can start small)
Key software technologies
  Objects, Transactions, Clusters, Parallelism

Page 44

The BIG Picture: Components and transactions

Software modules are objects
Object Request Broker (a.k.a., Transaction Processing Monitor) connects objects (clients to servers)
Standard interfaces allow software plug-ins
Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated

Figure: Object Request Broker.

Page 45

ActiveX and COM

COM is Microsoft model, engine inside OLE
ALL Microsoft software is based on COM (ActiveX)
CORBA + OpenDoc is equivalent
Heated debate over which is best
Both share same key goals:
  Encapsulation: hide implementation
  Polymorphism: generic operations, key to GUI and reuse
  Versioning: allow upgrades
  Transparency: local/remote
  Security: invocation can be remote
  Shrink-wrap: minimal inheritance
  Automation: easy
COM now managed by the Open Group

Page 46

Linking And Embedding

Objects are data modules; transactions are execution modules
Link: pointer to object somewhere else
  Think URL in Internet
Embed: bytes are here
Objects may be active; can callback to subscribers

Page 47

Objects Meet Databases
The basis for universal data servers, access, & integration

Object-oriented (COM oriented) programming interface to data
Breaks DBMS into components
Anything can be a data source
Optimization/navigation “on top of” other data sources
A way to componentize a DBMS
Makes an RDBMS an O-R DBMS (assumes optimizer understands objects)

Figure: a DBMS engine over many data sources: Database, Spreadsheet, Photos, Mail, Map, Document.

Page 48

The Pattern: Three Tier Computing

Clients do presentation, gather input
Clients do some workflow (Xscript)
Clients send high-level requests to ORB (Object Request Broker)
ORB dispatches workflows and business objects -- proxies for client, orchestrate flows & queues
Server-side workflow scripts call on distributed business objects to execute task

Figure: the tiers: Presentation, workflow, Business Objects, Database.

Page 49: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

The Three Tiers

[Figure: three-tier architecture diagram with these labels]
Web Client: HTML; VBScript, JavaScript; VB or Java Script Engine; VB or Java Virt Machine; VB, Java plug-ins
Internet: HTTP + DCOM
Middleware: ORB, TP Monitor, Web Server, ...; object server pool
DCOM (OLE DB, ODBC, ...)
Object & Data server
LU6.2
IBM Legacy Gateways

Page 50: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Why Did Everyone Go To Three-Tier?

Manageability
  Business rules must be with data
  Middleware operations tools
Performance (scaleability)
  Server resources are precious
  ORB dispatches requests to server pools
Technology & Physics
  Put UI processing near user
  Put shared data processing near shared data

[Figure: stacked tiers: Presentation, workflow, Business Objects, Database]

Page 51: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

What Middleware Does
ORB, TP Monitor, Workflow Mgr, Web Server

Registers transaction programs
  workflow and business objects (DLLs)
Pre-allocates server pools
Provides server execution environment
Dynamically checks authority
  (request-level security)
Does parameter binding
Dispatches requests to servers
  parameter binding
  load balancing
Provides Queues
Operator interface
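The duties listed for middleware can be sketched as a toy dispatcher. This is only an illustration under stated assumptions, not how any particular ORB or TP monitor is built: the class name `Middleware`, its methods, the access-control table, and the round-robin policy standing in for load balancing are all hypothetical.

```python
from collections import deque
from itertools import cycle


class Middleware:
    """Toy request broker: registry, queue, authority check, dispatch."""

    def __init__(self, pool_size=2):
        self.programs = {}      # registered transaction programs
        self.acl = {}           # program name -> users allowed to call it
        self.queue = deque()    # queued requests
        # Pre-allocate a pool of (hypothetical) server ids; round-robin
        # stands in for real load balancing.
        self.pool = cycle(range(pool_size))

    def register(self, name, func, allowed_users):
        self.programs[name] = func
        self.acl[name] = set(allowed_users)

    def submit(self, user, name, **params):
        self.queue.append((user, name, params))

    def run(self):
        """Drain the queue; return one (server_id, result) pair per request."""
        results = []
        while self.queue:
            user, name, params = self.queue.popleft()
            # Request-level security: checked on every call.
            if user not in self.acl.get(name, ()):
                results.append((None, "denied"))
                continue
            server = next(self.pool)            # pick a server from the pool
            func = self.programs[name]
            results.append((server, func(**params)))   # parameter binding
        return results


mw = Middleware(pool_size=2)
mw.register("debit", lambda account, amount: f"debit {amount} from {account}", ["alice"])
mw.submit("alice", "debit", account="A1", amount=10)
mw.submit("bob", "debit", account="A1", amount=99)
mw.submit("alice", "debit", account="A2", amount=5)
outcome = mw.run()
```

Here the request from `bob` is refused before it reaches a server, and the two accepted requests land on different members of the pool.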

Page 52: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Server Side Objects
Easy Server-Side Execution

Give simple execution environment
Object gets
  start
  invoke
  shutdown
Everything else is automatic
Drag & Drop Business Objects

[Figure: A Server: service logic surrounded by Network, Receiver, Queue, Connections, Context, Security, Thread Pool, Synchronization, Shared Data, Configuration, Management]
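A minimal sketch of the start / invoke / shutdown contract, assuming a hypothetical `BusinessObject` class and a `host` function that stands in for the server environment. Neither name comes from the slides; a real server would also supply the threads, queues, and security that the slide calls automatic.

```python
class BusinessObject:
    """Hypothetical business object with only three entry points."""

    def __init__(self):
        self.log = []

    def start(self):
        self.log.append("start")

    def invoke(self, request):
        self.log.append(f"invoke:{request}")
        return request.upper()      # placeholder for real service logic

    def shutdown(self):
        self.log.append("shutdown")


def host(obj, requests):
    """Stand-in for the server: drives the object's life cycle."""
    obj.start()
    try:
        return [obj.invoke(r) for r in requests]
    finally:
        obj.shutdown()              # always runs, even if invoke raises


echo = BusinessObject()
replies = host(echo, ["ping", "pong"])
```

The object author writes only the three methods; the order in which they are called is the host's responsibility.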

Page 53: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

A new programming paradigm

Develop objects on the desktop
  Better yet: download them from the Net
Script work flows as method invocations
All on desktop
Then, move work flows and objects to server(s)
Gives
  desktop development
  three-tier deployment
  Software Cyberbricks

Page 54: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Transactions Coordinate Components (ACID)

Transaction properties
  Atomic: all or nothing
  Consistent: old and new values
  Isolated: automatic locking or versioning
  Durable: once committed, effects survive
Transactions are built into modern OSs
  MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC

Page 55: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Transactions & Objects

Application requests transaction identifier (XID)
XID flows with method invocations
Object Managers join (enlist) in transaction
Distributed Transaction Manager coordinates commit/abort
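The flow above (request an XID, pass it along, enlist, then commit or abort) can be sketched as follows. The slide does not name a commit protocol; this sketch assumes a simple two-phase commit, and the classes `ObjectManager` and `TransactionManager` and all their methods are hypothetical.

```python
import uuid


class ObjectManager:
    """Hypothetical resource manager that votes on a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = {}     # xid -> "active" | "committed" | "aborted"

    def do_work(self, xid, tm):
        # The XID arrives with the method invocation; enlist on first use.
        if xid not in self.state:
            tm.enlist(xid, self)
            self.state[xid] = "active"

    def prepare(self, xid):
        return self.can_commit

    def commit(self, xid):
        self.state[xid] = "committed"

    def abort(self, xid):
        self.state[xid] = "aborted"


class TransactionManager:
    """Hands out XIDs and coordinates the outcome."""

    def __init__(self):
        self.enlisted = {}

    def begin(self):
        xid = uuid.uuid4().hex
        self.enlisted[xid] = []
        return xid

    def enlist(self, xid, manager):
        self.enlisted[xid].append(manager)

    def end(self, xid):
        managers = self.enlisted.pop(xid)
        votes = [m.prepare(xid) for m in managers]   # phase 1: collect votes
        ok = all(votes)
        for m in managers:                           # phase 2: tell everyone
            if ok:
                m.commit(xid)
            else:
                m.abort(xid)
        return "committed" if ok else "aborted"


tm = TransactionManager()
db, mail = ObjectManager("db"), ObjectManager("mail")
xid_ok = tm.begin()
db.do_work(xid_ok, tm)
mail.do_work(xid_ok, tm)
result_ok = tm.end(xid_ok)

flaky = ObjectManager("flaky", can_commit=False)
xid_bad = tm.begin()
db.do_work(xid_bad, tm)
flaky.do_work(xid_bad, tm)
result_bad = tm.end(xid_bad)
```

A single "no" vote aborts the work at every enlisted manager, which is the all-or-nothing property the transaction is meant to provide.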

Page 56: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Distributed Transactions Enable Huge Throughput

Each node capable of 7 KtpmC (7,000 active users!)
Can add nodes to cluster (to support 100,000 users)
Transactions coordinate nodes
ORB / TP monitor spreads work among nodes
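As a back-of-the-envelope check of the scale-out claim, the sketch below assumes throughput grows linearly with node count, which is an idealization; the function name `nodes_needed` is made up for the example.

```python
import math


def nodes_needed(total_users, users_per_node=7_000):
    """Nodes required if each node carries users_per_node active users."""
    return math.ceil(total_users / users_per_node)


cluster_size = nodes_needed(100_000)   # 15 nodes under the linear assumption
```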

Page 57: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Distributed Transactions Enable Huge DBs

Distributed database technology spreads data among nodes
Transaction processing technology manages nodes

Page 58: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Thesis: Scaleable Servers
Scaleable Servers Built from Cyberbricks
  Allow new applications
Servers should be able to
  Scale up, out, down
Key software technologies
  Clusters (ties the hardware together)
  Parallelism (uses the independent CPUs, stores, wires)
  Objects (software CyberBricks)
  Transactions (mask errors)

Page 59: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Computer Industry Laws (Rules of thumb)

Metcalfe’s law
Moore’s first law
Bell’s computer classes (7 price tiers)
Bell’s platform evolution
Bell’s platform economics
Bill’s law
Software economics
Grove’s law
Moore’s second law
Is info-demand infinite?
The death of Grosch’s law

Page 60: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Metcalfe’s Law
Network Utility = Users^2

How many connections can it make?
  1 user: no utility
  100,000 users: a few contacts
  1 million users: many on Net
  1 billion users: everyone on Net
That is why the Internet is so “hot”
  Exponential benefit
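A small sketch of the rule of thumb. `utility` is the slide's formula; `connections` is an added assumption that counts distinct user pairs, which makes the "1 user: no utility" case come out as exactly zero. Both function names are hypothetical.

```python
def utility(users):
    """Rule of thumb from the slide: utility = users squared."""
    return users ** 2


def connections(users):
    """Distinct pairs of users who could connect (an assumed refinement)."""
    return users * (users - 1) // 2


# Ten times the users gives a hundred times the utility.
ratio = utility(1_000_000) // utility(100_000)
```

Note that squaring gives quadratic growth in the number of users: each factor of 10 in users is a factor of 100 in utility.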

Page 61: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Moore’s First Law

XXX doubles every 18 months: 60% increase per year
  Microprocessor speeds
  Chip density
  Magnetic disk density
  Communications bandwidth
    WAN bandwidth approaching LANs
Exponential growth:
  The past does not matter
  10x here, 10x there, soon you’re talking REAL change
PC costs decline faster than any other platform
  Volume and learning curves
  PCs will be the building bricks of all future systems

[Figure: 1 chip memory size (2 MB to 32 MB), 1970 to 2000; bits per chip: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M; byte-scale labels: 8KB, 128KB, 1MB, 8MB, 128MB, 1GB]
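The link between "doubles every 18 months" and "60% increase per year" is a one-line calculation. The function names below are made up for the example.

```python
def annual_growth(doubling_months=18):
    """Yearly growth rate implied by a given doubling period."""
    return 2 ** (12 / doubling_months) - 1


def growth_factor(years, doubling_months=18):
    """Total multiplication after the given number of years."""
    return 2 ** (12 * years / doubling_months)


rate = annual_growth()            # about 0.59, which the slide rounds to 60%
fifteen_years = growth_factor(15) # ten doublings, about 1000x
```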

Page 62: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Bumps In The Moore’s Law Road

DRAM:
  1988: United States anti-dumping rules
  1993-1995: ?price flat
Magnetic disk:
  1965-1989: 10x/decade
  1989-1996: 4x/3year! 100X/decade

[Figure: $/MB of DRAM, 1970 to 2000, log scale from 1 to 1,000,000]
[Figure: $/MB of DISK, 1970 to 2000, log scale from .01 to 10,000]

Page 63: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Gordon Bell’s 1975 VAX Planning Model... He Didn’t Believe It!

5x: Memory is 20% of cost
3x: DEC markup
.04x: $ per byte
He didn’t believe:
  the projection
  $500 machine
He couldn’t comprehend the implications

[Figure: system price (0.01 K$ to 100,000 K$, log scale) versus year (1960 to 2000), one line per memory size: 16 KB, 64 KB, 256 KB, 1 MB, 8 MB]

System Price = 5 x 3 x .04 x memory size / 1.26^(t-1972) K$
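The planning formula can be evaluated directly. The slide does not state the unit of memory size; this sketch assumes kilobytes, since that makes the result come out in K$ for the 16 KB to 8 MB sizes plotted. The function name is hypothetical.

```python
def system_price_k_dollars(memory_kb, year):
    """Bell's model: 5 x 3 x .04 x memory size / 1.26^(t - 1972), in K$.

    memory_kb is assumed to be in kilobytes (not stated on the slide).
    """
    return 5 * 3 * 0.04 * memory_kb / 1.26 ** (year - 1972)


price_1972 = system_price_k_dollars(16, 1972)   # about 9.6 K$
price_1975 = system_price_k_dollars(16, 1975)   # roughly half of that
```

Because 1.26 cubed is almost exactly 2, the price of a fixed configuration halves about every three years.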

Page 64: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Gordon Bell’s Processing, Memories, And Comm 100 Years

[Figure: log-scale plot (1.E+00 to 1.E+18) versus year (1947 to 2047) of Processing, Pri. Mem, Sec. Mem., POTS (bps), Backbone]

Page 65: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Gordon Bell’s Seven Price Tiers

10$: wrist watch computers
100$: pocket/palm computers
1,000$: portable computers
10,000$: personal computers (desktop)
100,000$: departmental computers (closet)
1,000,000$: site computers (glass house)
10,000,000$: regional computers (glass castle)

Super server: costs more than $100,000
“Mainframe”: costs more than $1 million
Must be an array of processors, disks, tapes, comm ports

Page 66: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Bell’s Evolution Of Computer Classes

Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance

[Figure: log price versus time for Mainframes (central), Minis (dep’t.), WSs, PCs (personals), ??]

1.26 = 2x/3 yrs -- 10x/decade; 1/1.26 = .8
1.6 = 4x/3 yrs -- 100x/decade; 1/1.6 = .62
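The two annual factors can be checked by compounding them. This is a quick verification sketch; the function name `compound` is made up.

```python
def compound(annual_factor):
    """Return (3-year factor, 10-year factor, 1/annual factor)."""
    return annual_factor ** 3, annual_factor ** 10, 1 / annual_factor


slow = compound(1.26)   # about (2.0, 10.1, 0.79)
fast = compound(1.6)    # about (4.1, 110, 0.625)
```

Both rows hold to within rounding: 1.26 per year is about 2x in three years and 10x per decade, and 1.6 per year is about 4x in three years and a little over 100x per decade.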

Page 67: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Gordon Bell’s Platform Economics

[Figure: Price (K$), Volume (K), and Application price by computer type (Mainframe, WS, Browser), log scale from 0.01 to 100,000]

Traditional computers: custom or semi-custom, high-tech and high-touch
New computers: high-tech and no-touch

Page 68: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Software Economics

An engineer costs about $150,000/year
R&D gets [5%…15%] of budget
Need [$3 million…$1 million] revenue per engineer

[Figure: pie charts of how each company's revenue is spent; P&S = Product and Service]

Company    Revenue      R&D  SG&A  P&S  Tax  Profit
Microsoft  $9 billion   16%  34%   13%  13%  24%
Intel      $16 billion   8%  11%   47%  12%  22%
IBM        $72 billion   8%  22%   59%   5%   6%
Oracle     $3 billion    9%  43%   26%   7%  15%
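The revenue-per-engineer range follows from dividing the engineer's cost by the R&D share of the budget. A minimal sketch, with a made-up function name and the simplifying assumption that the whole R&D budget goes to engineers:

```python
def revenue_per_engineer(engineer_cost=150_000, rd_share=0.05):
    """Revenue needed so that rd_share of it pays for one engineer."""
    return engineer_cost / rd_share


at_5_percent = revenue_per_engineer(rd_share=0.05)    # about $3 million
at_15_percent = revenue_per_engineer(rd_share=0.15)   # about $1 million
```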

Page 69: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Software Economics: Bill’s Law

Bill Joy’s law (Sun): don’t write software for less than 100,000 platforms
  @$10 million engineering expense, $1,000 price
Bill Gates’ law: don’t write software for less than 1,000,000 platforms
  @$10 million engineering expense, $100 price
Examples:
  UNIX versus Windows NT: $3,500 versus $500
  Oracle versus SQL-Server: $100,000 versus $6,000
  No spreadsheet or presentation pack on UNIX/VMS/...
Commoditization of base software and hardware

Price = Fixed_Cost / Units + Marginal_Cost
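The pricing formula (price equals fixed cost divided by units, plus marginal cost) can be applied to the two volumes on the slide. This sketch assumes the $10 million engineering expense quoted for Bill Joy's law applies at both volumes and sets marginal cost to zero; the function name is hypothetical.

```python
def unit_price(fixed_cost, units, marginal_cost=0.0):
    """Price = Fixed_Cost / Units + Marginal_Cost."""
    return fixed_cost / units + marginal_cost


joy_share = unit_price(10_000_000, 100_000)       # 100.0 dollars per copy
gates_share = unit_price(10_000_000, 1_000_000)   # 10.0 dollars per copy
```

Under these assumptions the engineering share per copy is a tenth of the quoted price in both cases, so a tenfold increase in volume allows a tenfold lower price at the same ratio.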

Page 70: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Grove’s Law
The New Computer Industry

Horizontal integration is new structure
Each layer picks best from lower layer
Desktop (C/S) market
  1991: 50%
  1995: 75%

Function         Example
Operation        AT&T
Integration      EDS
Applications     SAP
Middleware       Oracle
Baseware         Microsoft
Systems          Compaq
Silicon & Oxide  Intel & Seagate

Page 71: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Moore’s Second Law

The cost of fab lines doubles every generation (three years)
Money limit hard to imagine:
  $10-billion line
  $20-billion line
  $40-billion line
Physical limit
  Quantum effects at 0.25 micron now
  0.05 micron seems hard
  12 years, three generations
  Lithography: need X-ray below 0.13 micron

[Figure: $million / Fab Line (log scale, $1 to $10,000) versus Year (1960 to 2000)]
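The doubling rule for fab-line cost can be projected forward with a short function. The starting figure and the function name are illustrative assumptions; only the three-year doubling period comes from the slide.

```python
def fab_cost(base_cost, years, generation_years=3):
    """Cost after `years` if it doubles once per generation."""
    return base_cost * 2 ** (years / generation_years)


# Starting from a $10-billion line (in billions of dollars).
projection = [fab_cost(10, y) for y in (0, 3, 6)]   # [10.0, 20.0, 40.0]
```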

Page 72: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Constant Dollars Versus Constant Work

Constant work:
  One SuperServer can do all the world’s computations
Constant dollars:
  The world spends 10% on information processing
  Computers are moving from 5% penetration to 50%
  $300 billion to $3 trillion
  We have the patent on the byte and algorithm

Page 73: Scaleable Computing Jim Gray Microsoft Corporation Gray@Microsoft.com ™

Crossing The Chasm

[Figure: two-by-two grid of market versus technology]
Old market, old technology: Boring (competitive, slow growth)
Old market, new technology: Hard (customers find product)
New market, old technology: Hard (product finds customers)
New market, new technology: Very hard (no product, no customers)