Scaleable Servers: Jim Gray, Gray@Microsoft.com


Upload: emory-parrish

Post on 17-Jan-2016

TRANSCRIPT

Page 1: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Scaleable Servers
Jim Gray
Gray@Microsoft.com
http://www.research.Microsoft.com/~Gray

Page 2: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Thesis: Scaleable Servers
Commodity hardware allows new applications
New applications need huge servers
Clients and servers are built of the same “stuff”
  Commodity software and commodity hardware
Servers should be able to
  Scale up (grow node by adding CPUs, disks, networks)
  Scale out (grow by adding nodes)
  Scale down (can start small)
Key software technologies
  Objects, Transactions, Clusters, Parallelism

Page 3: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

1987: 256 tps Benchmark
14 M$ computer (Tandem)
A dozen people
False floor, 2 rooms of machines

Simulate 25,600 clients
A 32 node processor array
A 40 GB disk array (80 drives)

Staff: OS expert, Network expert, DB expert, Performance expert, Hardware experts, Admin expert, Auditor, Manager

Page 4: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

1988: DB2 + CICS Mainframe, 65 tps
IBM 4391
Simulated network of 800 clients
2 M$ computer
Staff of 6 to do benchmark

2 x 3725 network controllers
16 GB disk farm (4 x 8 x 0.5 GB)
Refrigerator-sized CPU
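To make the two benchmark slides comparable, the quoted prices can be divided by the quoted throughput. This is a minimal sketch using only the figures on the 1987 and 1988 slides; the dictionary layout and function name are illustrative, not part of the talk.

```python
# Price and throughput figures as quoted on the 1987 and 1988 slides.
systems = {
    "1987 Tandem": {"price_usd": 14_000_000, "tps": 256},
    "1988 IBM mainframe": {"price_usd": 2_000_000, "tps": 65},
}


def dollars_per_tps(price_usd: float, tps: float) -> float:
    """Hardware price divided by sustained transactions per second."""
    return price_usd / tps


cost_per_tps = {name: dollars_per_tps(**s) for name, s in systems.items()}
for name, cost in cost_per_tps.items():
    print(f"{name}: about {cost:,.0f} $/tps")
```

Both systems land in the tens of thousands of dollars per tps, which is the baseline the 1997 slide compares against.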

Page 5: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

1997: 10 years later
1 Person and 1 box = 1250 tps
1 Breadbox ~ 5x 1987 machine room
23 GB is hand-held
One person does all the work
Cost/tps is 1,000x less
25 micro dollars per transaction

4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk
3 x 7 x 4 GB disk arrays
Hardware expert, OS expert, Net expert, DB expert, App expert (all one person)

Page 6: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

What Happened?
Moore’s law: things get 4x better every 3 years
  (applies to computers, storage, and networks)
New Economics: Commodity

class           price/mips ($/mips)   software (k$/year)
mainframe       10,000                100
minicomputer    100                   10
microcomputer   10                    1

GUI: Human - computer tradeoff
  optimize for people, not computers

[Figure: price vs. time curves for mainframe, mini, micro]
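The "4x every 3 years" rule can be compounded over the ten years between the 1987 and 1997 benchmarks. The sketch below assumes smooth exponential growth between the three-year steps, which is a simplification of mine rather than a claim from the slide.

```python
def improvement_factor(years: float, factor: float = 4.0,
                       period_years: float = 3.0) -> float:
    """Compound 'factor' every 'period_years' for the given number of years."""
    return factor ** (years / period_years)


decade = improvement_factor(10)
print(f"10 years at 4x per 3 years: about {decade:.0f}x")
```

The result is roughly two orders of magnitude, so the hardware trend alone would not quite reach the 1,000x cost/tps drop quoted for 1997; the commodity pricing in the table above plausibly supplies the rest.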

Page 7: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Billions Of Clients Need Millions Of Servers
All clients networked to servers
  May be nomadic or on-demand
Fast clients want faster servers
Servers provide
  Shared Data
  Control
  Coordination
  Communication

[Figure: mobile clients and fixed clients connected to a server and a super server]

Page 8: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Thesis: Many little beat few big
Smoking, hairy golf ball
How to connect the many little parts?
How to program the many little parts?
Fault tolerance?

[Figure: price classes $1 million (mainframe), $100 K (mini), $10 K (micro), nano; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"; a 1 mm^3 pico processor with 1 M SPECmarks, 1 TFLOP, 10^6 clocks to bulk RAM, event horizon on chip, VM reincarnated, multiprogram cache, on-chip SMP; storage hierarchy of 10 pico-second RAM, 10 nano-second RAM, 10 microsecond RAM, 10 millisecond disc, 10 second tape archive; capacities 1 MB, 100 MB, 10 GB, 1 TB, 100 TB]

Page 9: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Future Super Server: 4T Machine
Array of 1,000 4B machines
  1 bps processors
  1 BB DRAM
  10 BB disks
  1 Bbps comm lines
  1 TB tape robot
A few megabucks
Challenge:
  Manageability
  Programmability
  Security
  Availability
  Scaleability
  Affordability
As easy as a single system

Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work

[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]

Page 10: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

The Hardware Is In Place… And then a miracle occurs
?
SNAP: scaleable network and platforms
Commodity-distributed OS built on:
  Commodity platforms
  Commodity network interconnect
Enables parallel applications

Page 11: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Thesis: Scaleable Servers
Commodity hardware allows new applications
New applications need huge servers
Clients and servers are built of the same “stuff”
  Commodity software and commodity hardware
Servers should be able to
  Scale up (grow node by adding CPUs, disks, networks)
  Scale out (grow by adding nodes)
  Scale down (can start small)
Key software technologies
  Objects, Transactions, Clusters, Parallelism

Page 12: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Scaleable Servers: BOTH SMP And Cluster
Grow up with SMP; 4xP6 is now standard
Grow out with cluster
Cluster has inexpensive parts

[Figure: personal system, departmental server, SMP super server, cluster of PCs]

Page 13: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

SMPs Have Advantages
Single system image: easier to manage, easier to program threads in shared memory, disk, Net
4x SMP is commodity
Software capable of 16x
Problems:
  >4 not commodity
  Scale-down problem (starter systems expensive)
  There is a BIGGEST one

[Figure: personal system, departmental server, SMP super server]

Page 14: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

TPC-C Web-Based Benchmarks
Client is a Web browser (9,200 of them!)
Submits
  Order
  Invoice
  Query
to server via Web page interface
Web server translates to DB
SQL does DB work
Net:
  easy to implement
  performance is GREAT!

[Figure: browsers connect over HTTP to IIS (= Web), which connects over ODBC to SQL]

Page 15: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

TPC-C Shows How Far SMPs Have Come
Performance is amazing:
  2,000 users is the min!
  30,000 users on a 4x12 alpha cluster (Oracle)
Peak Performance: 30,390 tpmC @ $305/tpmC (Oracle/DEC)
Best Price/Perf: 7,693 tpmC @ $43/tpmC (MS SQL/Dell)
Graphs show UNIX high price & diseconomy of scaleup

[Figure: tpmC & Price Performance (only "best" data shown for each vendor); x-axis tpmC (0 to 20,000), y-axis $/tpmC (0 to 400); series: DB2, Informix, MS SQL Server, Oracle, Sybase]
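Since $/tpmC is the priced system cost divided by throughput, multiplying the two quoted numbers back together gives the approximate price of each configuration. This is a small sketch using only the peak and best price/performance results quoted on the slide; the tuple layout is illustrative.

```python
# (label, tpmC, $/tpmC) as quoted on the slide.
results = [
    ("Oracle/DEC", 30_390, 305),
    ("MS SQL/Dell", 7_693, 43),
]


def system_price(tpmc: int, usd_per_tpmc: int) -> int:
    """Approximate priced-system cost implied by a TPC-C result."""
    return tpmc * usd_per_tpmc


prices = {name: system_price(tpmc, rate) for name, tpmc, rate in results}
for name, price in prices.items():
    print(f"{name}: about ${price:,}")
```

The peak result costs roughly 28 times as much as the best price/performance result for about 4 times the throughput, which is the diseconomy of scale-up the slide points at.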

Page 16: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

TPC-C SMP Performance

[Figure: two plots of tpmC vs CPUs, titled "SUN Scaleability"; x-axis CPUs (0 to 20), y-axis tpmC (0 to 20,000); the second plot adds a SQL Server series]

• SMPs do offer speedup, but 4x P6 is better than some 18x MIPSco

Page 17: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

What Happens To Prices?
No expensive UNIX front end (20$/tpmC)
No expensive TP monitor software (10$/tpmC)
=> 65$/tpmC

[Figure: TPC Price/tpmC, stacked bars (0 to 100 $/tpmC) split into processor, disk, software, net, for: Informix on SNI; Oracle on DEC Unix; Oracle on Compaq/NT; Sybase on Compaq/NT; Microsoft on Compaq with Visigenics; Microsoft on HP with Visigenics; Microsoft on Intergraph with IIS; Microsoft on Compaq with IIS]

Page 18: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Building the Largest NT Node
Build a 1 TB SQL Server database
  Show off NT and SQL Server Scaleability
  Stress test the product
Demo it on the Internet
  WWW accessible by anyone
So data must be
  1 TB
  Unencumbered
  Interesting to everyone everywhere
  AND not offensive to anyone anywhere

Page 19: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

What’s a TeraByte?
1 Terabyte:
  1,000,000,000 business letters    150 miles of book shelf
  100,000,000 book pages            15 miles of book shelf
  50,000,000 FAX images             7 miles of book shelf
  10,000,000 TV pictures (mpeg)     10 days of video
  4,000 LandSat images              16 earth images (100m)
  100,000,000 web pages             10 copies of the web HTML
Library of Congress (in ASCII) is 25 TB
1980: $200 million of disc          10,000 discs
      $5 million of tape silo       10,000 tapes
1997: 200 k$ of magnetic disc       48 discs
      30 k$ nearline tape           20 tapes
Terror Byte!
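The per-item sizes implied by the terabyte equivalences can be recovered by division. The sketch below assumes a decimal terabyte (10^12 bytes), which the slide does not state; the item counts are taken from the slide.

```python
TERABYTE = 10**12  # assumption: decimal terabyte

# Item counts per terabyte, as listed on the slide.
items_per_tb = {
    "business letter": 1_000_000_000,
    "book page": 100_000_000,
    "FAX image": 50_000_000,
    "TV picture (mpeg)": 10_000_000,
    "LandSat image": 4_000,
    "web page": 100_000_000,
}

# Implied size of one item, in bytes.
bytes_each = {item: TERABYTE // count for item, count in items_per_tb.items()}
for item, size in bytes_each.items():
    print(f"{item}: about {size:,} bytes")
```

Under that assumption a business letter is about 1 KB, a book page 10 KB, and a LandSat image 250 MB.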

Page 20: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

The Plan
DEC Alpha + 324 StorageWorks Drives (1.4 TB)
  30K BTU, 8 KW, 1.5 metric tons
SQL 7.0
USGS data (1 meter)
Russian Space data (2 meter)

[Figure: DEC 4100 with 4 x 400 MHz Alpha processors and 4 GB DRAM; Microsoft BackOffice; SPIN-2]

Page 21: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Image Data Sources
SPIN-2: 500 GB, worldwide, LoB app, new data coming
DOQ: 300 GB, Src: USGS & UCSB; UCSB missing some DOQs

Page 22: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

DOQ coverage of the US
1 Meter images of many places
Problems:
  most of data not yet published
  interesting places missing (LA, Portland, SD, Anchorage,…)
Loaded published 130 GB.
CRDA for unpublished 3 TB

Page 23: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

SPIN-2 Coverage
The rest of the world
The US Government can’t help, but....
The Russian Space Agency is eager to cooperate.
2 Meter Geo Rectified imagery of anywhere
More data coming, Earth has ~500 TeraMeters^2
  => ~30 Tera Bytes of Land at 2x2 Meter
  => we need 3% of the land (Urban World = the red stuff)
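The sizing argument on this slide can be checked with back-of-envelope arithmetic. The surface area, pixel size, 30 TB, and 3% figures come from the slide; the land fraction and the one-byte-per-pixel storage cost are assumptions added here to make the estimate computable.

```python
EARTH_SURFACE_M2 = 500e12  # slide figure: ~500 tera square meters
LAND_FRACTION = 0.29       # assumption: rough share of the surface that is land
PIXEL_SIDE_M = 2           # slide figure: 2 x 2 meter resolution
BYTES_PER_PIXEL = 1        # assumption: one byte per pixel, uncompressed

# Estimated storage for all land at 2 x 2 meter resolution.
land_pixels = EARTH_SURFACE_M2 * LAND_FRACTION / (PIXEL_SIDE_M ** 2)
land_terabytes = land_pixels * BYTES_PER_PIXEL / 1e12

# Slide figures: 3% of ~30 TB is the urban portion worth loading.
urban_terabytes = 0.03 * 30

print(f"all land: about {land_terabytes:.0f} TB")
print(f"urban 3%: about {urban_terabytes:.1f} TB")
```

With these assumptions the land estimate comes out in the same range as the slide's ~30 TB, and the urban 3% is close to the 1 TB database the project set out to build.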

Page 24: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Demo Interface

Page 25: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Grow UP and OUT
1 Terabyte DB
1 billion transactions per day

[Figure: personal system, departmental server, SMP super server]

Cluster:
• a collection of nodes
• as easy to program and manage as a single node

Page 26: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Clusters Have Advantages
Clients and servers made from the same stuff
Inexpensive:
  Built with commodity components
Fault tolerance:
  Spare modules mask failures
Modular growth
  Grow by adding small modules
Unlimited growth: no biggest one

Page 27: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Billion Transactions per Day Project
Built a 45-node Windows NT Cluster (with help from Intel & Compaq)
  > 900 disks
All off-the-shelf parts
Using SQL Server & DTC distributed transactions
DebitCredit Transaction
Each node has 1/20th of the DB
Each node does 1/20th of the work
15% of the transactions are “distributed”
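The partitioning described above can be sketched as a small simulation: accounts are range-partitioned over 20 database nodes, and a transaction counts as distributed when the account lives on a different node from the one where the transaction starts. The account-ID layout, the `ACCOUNTS_PER_NODE` value, and the way remote transactions are generated are all assumptions for illustration, not details from the talk.

```python
import random

NODES = 20                     # slide: each node holds 1/20th of the DB
ACCOUNTS_PER_NODE = 1_000_000  # hypothetical partition size


def node_for_account(account_id: int) -> int:
    """Range partitioning: consecutive account IDs live on the same node."""
    return account_id // ACCOUNTS_PER_NODE


def make_transaction(rng: random.Random, remote_fraction: float = 0.15):
    """Return (home_node, account_id); some accounts live on another node."""
    home = rng.randrange(NODES)
    if rng.random() < remote_fraction:
        other = rng.randrange(NODES - 1)
        acct_node = other if other < home else other + 1  # never equals home
    else:
        acct_node = home
    account_id = acct_node * ACCOUNTS_PER_NODE + rng.randrange(ACCOUNTS_PER_NODE)
    return home, account_id


def is_distributed(home: int, account_id: int) -> bool:
    """A transaction spanning two nodes needs a distributed commit."""
    return node_for_account(account_id) != home


rng = random.Random(0)
txns = [make_transaction(rng) for _ in range(100_000)]
distributed_share = sum(is_distributed(h, a) for h, a in txns) / len(txns)
print(f"distributed share: {distributed_share:.3f}")
```

In the real system the distributed transactions would be committed through DTC; here the simulation only counts them.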

Page 28: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

How Much Is 1 Billion Transactions Per Day?
1 Btpd = 11,574 tps (transactions per second)
       ~ 700,000 tpm (transactions/minute)
AT&T
  185 million calls (peak day worldwide)
Visa ~20 M tpd
  400 M customers
  250,000 ATMs worldwide
  7 billion transactions / year (card+cheque) in 1994

[Figure: bar chart on a log scale from 0.1 to 1,000 millions of transactions per day (Mtpd), comparing 1 Btpd, Visa, AT&T, BofA, NYSE]
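The rate conversions on this slide are straightforward to reproduce. The sketch below uses only the slide's figures and assumes the load is spread evenly over a 24-hour day and a 365-day year.

```python
SECONDS_PER_DAY = 24 * 60 * 60

tpd = 1_000_000_000          # one billion transactions per day
tps = tpd / SECONDS_PER_DAY  # transactions per second
tpm = tps * 60               # transactions per minute

# Visa: 7 billion transactions per year, averaged over 365 days.
visa_tpd = 7_000_000_000 / 365

print(f"1 Btpd = {tps:,.0f} tps = {tpm:,.0f} tpm")
print(f"Visa: about {visa_tpd / 1e6:.1f} M tpd")
```

The per-minute figure is what makes the project comparable to tpmC-style benchmark numbers.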

Page 29: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Billion Transactions Per Day Hardware
45 nodes (Compaq Proliant)
Clustered with 100 Mbps Switched Ethernet
140 cpu, 13 GB, 3 TB.

Type                                 nodes                    CPUs    DRAM      ctlrs   disks                            RAID space
Workflow MTS                         20 Compaq Proliant 2500  20 x 2  20 x 128  20 x 1  20 x 1                           20 x 2 GB
SQL Server                           20 Compaq Proliant 5000  20 x 4  20 x 512  20 x 4  20 x (36 x 4.2 GB, 7 x 9.1 GB)   20 x 130 GB
Distributed Transaction Coordinator  5 Compaq Proliant 5000   5 x 4   5 x 256   5 x 1   5 x 3                            5 x 8 GB
TOTAL                                45                       140     13 GB     105     895                              3 TB
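The TOTAL row of the hardware table can be cross-checked from the per-node figures. This is a sketch with the table transcribed into a list of dictionaries; the field names are mine, and DRAM is left out because the table does not state its unit.

```python
# Per-node figures from the hardware table (DRAM omitted: unit not stated).
rows = [
    {"type": "Workflow MTS", "nodes": 20, "cpus": 2, "ctlrs": 1,
     "disks": 1, "raid_gb": 2},
    {"type": "SQL Server", "nodes": 20, "cpus": 4, "ctlrs": 4,
     "disks": 36 + 7, "raid_gb": 130},
    {"type": "Distributed Transaction Coordinator", "nodes": 5, "cpus": 4,
     "ctlrs": 1, "disks": 3, "raid_gb": 8},
]


def total(field: str) -> int:
    """Sum a per-node quantity over all nodes of every type."""
    return sum(row["nodes"] * row[field] for row in rows)


totals = {field: total(field) for field in ("cpus", "ctlrs", "disks", "raid_gb")}
totals["nodes"] = sum(row["nodes"] for row in rows)
print(totals)
```

The CPU, controller, and disk sums match the TOTAL row exactly, and the RAID space sums to about 2.7 TB, which the slide rounds to 3 TB.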

Page 30: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

1.2 B tpd
1 B tpd ran for 24 hrs.
Sized for 30 days
Linear growth
5 micro-dollars per transaction
Out-of-the-box software
Off-the-shelf hardware
AMAZING!

Page 31: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Other Stunts
100 M Web Hits/day on one server (=1,300 hits/sec, Web Mark HTML server)
Email server (Exchange)
  50 GB database (up from 16 GB, limit now 16 TB)
  50 k POP3 users (1.5 M msg/day)
64-bit addressing SQL Server
SAP Failover
Theme: conventional stuff is easy

Page 32: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Thesis: Scaleable Servers
Commodity hardware allows new applications
New applications need huge servers
Clients and servers are built of the same “stuff”
  Commodity software and commodity hardware
Servers should be able to
  Scale up (grow node by adding CPUs, disks, networks)
  Scale out (grow by adding nodes)
  Scale down (can start small)
Key software technologies
  Objects, Transactions, Clusters, Parallelism

Page 33: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Parallelism: The OTHER aspect of clusters
Clusters of machines allow two kinds of parallelism
  Many little jobs: online transaction processing
    TPC-A, B, C…
  A few big jobs: data search and analysis
    TPC-D, DSS, OLAP
Both give automatic parallelism

Page 34: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Kinds of Parallel Execution
Pipeline
Partition: outputs split N ways, inputs merge M ways

[Figure: boxes labeled "Any Sequential Program", chained for the pipeline case and replicated for the partition case]
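The two kinds of parallel execution can be sketched with ordinary sequential code. In the minimal example below, Python generators stand in for the "any sequential program" boxes; the function names and the modulo-based split are illustrative choices, and the stages run in one thread here, so the sketch shows the dataflow structure rather than actual speedup.

```python
from typing import Iterable, Iterator, List


def parse(lines: Iterable[str]) -> Iterator[int]:
    """Stage 1: a sequential program that turns text into numbers."""
    for line in lines:
        yield int(line)


def square(numbers: Iterable[int]) -> Iterator[int]:
    """Stage 2: another sequential program consuming stage 1's output."""
    for n in numbers:
        yield n * n


def split(records: Iterable[int], n: int) -> List[List[int]]:
    """Partition: deal records out N ways by key."""
    parts: List[List[int]] = [[] for _ in range(n)]
    for r in records:
        parts[r % n].append(r)
    return parts


def merge(streams: Iterable[Iterable[int]]) -> Iterator[int]:
    """Merge the partitioned streams back into one stream."""
    for stream in streams:
        yield from stream


lines = ["1", "2", "3", "4", "5", "6"]

# Pipeline: each record flows through parse, then square.
pipelined = list(square(parse(lines)))

# Partition: the same square program runs once per partition.
parts = split(parse(lines), 3)
partitioned = sorted(merge(square(p) for p in parts))

print(pipelined, partitioned)
```

Neither `parse` nor `square` knows whether it is running in a pipeline or on one partition, which is the point of the "any sequential program" label.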

Page 35: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Data Rivers: Split + Merge Streams
N producers -> River -> M consumers (N x M data streams)
Producers add records to the river,
consumers consume records from the river.
Purely sequential programming.
River does flow control and buffering,
  does partition and merge of data records.
River = Split/Merge in Gamma = Exchange operator in Volcano.
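A river can be approximated with threads and bounded queues. The sketch below is a simplification under stated assumptions: it uses one queue per consumer (so the N x M streams are multiplexed into M queues), the `run_river` name and its parameters are hypothetical, and it is not how Gamma or Volcano implement their operators.

```python
import queue
import threading
from typing import Callable, Iterable, List

DONE = object()  # sentinel telling a consumer that the river has run dry


def run_river(sources: List[Iterable], consumer_fn: Callable, m: int,
              partition: Callable = hash) -> List[list]:
    """Run N producers and M consumers connected by a river of queues."""
    # Bounded queues give flow control: a fast producer blocks when full.
    streams = [queue.Queue(maxsize=16) for _ in range(m)]
    results: List[list] = [[] for _ in range(m)]

    def produce(source: Iterable) -> None:
        for record in source:
            streams[partition(record) % m].put(record)  # split by key

    def consume(i: int) -> None:
        while True:
            record = streams[i].get()  # merge: records from all producers
            if record is DONE:
                break
            results[i].append(consumer_fn(record))

    consumers = [threading.Thread(target=consume, args=(i,)) for i in range(m)]
    producers = [threading.Thread(target=produce, args=(s,)) for s in sources]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for s in streams:
        s.put(DONE)
    for t in consumers:
        t.join()
    return results


sources = [range(0, 10), range(10, 20), range(20, 30)]
out = run_river(sources, lambda r: r * 2, m=2, partition=lambda r: r)
print(sorted(out[0]), sorted(out[1]))
```

The producer and consumer bodies are plain sequential loops; all the splitting, merging, and buffering lives in the river, as the slide describes. Arrival order within a consumer depends on thread scheduling, so the output is sorted before printing.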

Page 36: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Partitioned Execution
Spreads computation and IO among processors
Partitioned data gives NATURAL parallelism

[Figure: a table partitioned into A...E, F...J, K...N, O...S, T...Z; one Count per partition feeds a final Count]
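The partitioned count in the figure can be sketched as one count per partition followed by a final sum. The letter ranges follow the figure; the sample names and the use of a thread pool are assumptions, and in CPython the threads illustrate the structure rather than deliver real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Key ranges from the figure: A...E, F...J, K...N, O...S, T...Z.
PARTITIONS = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]


def partition_of(name: str) -> int:
    """Find the partition whose letter range covers the name's first letter."""
    first = name[0].upper()
    for i, (low, high) in enumerate(PARTITIONS):
        if low <= first <= high:
            return i
    raise ValueError(f"no partition for {name!r}")


def partitioned_count(names: List[str]) -> Tuple[List[int], int]:
    """Count each partition independently, then combine the partial counts."""
    parts: List[List[str]] = [[] for _ in PARTITIONS]
    for name in names:
        parts[partition_of(name)].append(name)
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partial = list(pool.map(len, parts))
    return partial, sum(partial)


names = ["Adams", "Baker", "Gray", "Knuth", "Lamport",
         "Stonebraker", "Turing", "Zuse"]
partial_counts, total_count = partitioned_count(names)
print(partial_counts, total_count)
```

Because each partial count touches only its own partition, the same program scales by adding partitions and workers.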

Page 37: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

N x M way Parallelism
N inputs, M outputs, no bottlenecks.
Partitioned Data
Partitioned and Pipelined Data Flows

[Figure: partitions A...E, F...J, K...N, O...S, T...Z each feed a Sort and a Join; the results flow into three Merge operators]
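The sort-then-merge part of this dataflow can be sketched as follows. Each partition is sorted independently and the sorted runs are merged into one ordered stream; the join step from the figure is left out, and the sample data and function name are illustrative.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List


def parallel_sort(partitions: List[List[int]]) -> List[int]:
    """Sort every partition on its own worker, then merge the sorted runs."""
    with ThreadPoolExecutor() as pool:
        sorted_runs = list(pool.map(sorted, partitions))
    # heapq.merge streams the runs together without re-sorting them.
    return list(heapq.merge(*sorted_runs))


partitions = [[5, 1, 9], [8, 2], [7, 3, 6, 4]]
result = parallel_sort(partitions)
print(result)
```

With M merge operators instead of one, the output would itself be partitioned, which is what removes the final bottleneck in the figure.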

Page 38: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Clusters (Plumbing)
Single system image
  naming
  protection/security
  management/load balance
Fault Tolerance
  Wolfpack
Hot Pluggable hardware & Software

Page 39: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Windows NT clusters
Key goals:
  Easy: to install, manage, program
  Reliable: better than a single node
  Scaleable: added parts add power
Microsoft & 60 vendors defining NT clusters
  Almost all big hardware and software vendors involved
No special hardware needed - but it may help
Enables
  Commodity fault-tolerance
  Commodity parallelism (data mining, virtual reality…)
  Also great for workgroups!
Initial: two-node failover
  Beta testing since December 96
  SAP, Microsoft, Oracle giving demos.
  File, print, Internet, mail, DB, other services
  Easy to manage
  Each node can be 4x (or more) SMP
Next (NT5) “Wolfpack” is modest size cluster
  About 16 nodes (so 64 to 128 CPUs)
  No hard limit, algorithms designed to go further
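Two-node failover can be illustrated with a toy model in which services owned by a failed node move to the survivor. This is a conceptual sketch only: the class, node names, and service names are hypothetical, and it does not reflect the actual Wolfpack interfaces or its failure detection.

```python
from typing import Dict, List


class TwoNodeCluster:
    """Toy model: every service has one owner; a failure moves ownership."""

    def __init__(self, primary: str, standby: str, services: List[str]) -> None:
        self.alive: Dict[str, bool] = {primary: True, standby: True}
        self.owner: Dict[str, str] = {svc: primary for svc in services}

    def fail(self, node: str) -> None:
        """Mark a node as down and fail its services over to the survivor."""
        self.alive[node] = False
        survivors = [n for n, up in self.alive.items() if up]
        if not survivors:
            raise RuntimeError("no surviving node to take over services")
        for svc, owner in self.owner.items():
            if owner == node:
                self.owner[svc] = survivors[0]


cluster = TwoNodeCluster("node-a", "node-b", ["file", "print", "db"])
cluster.fail("node-a")
print(cluster.owner)
```

A real cluster also has to detect the failure and make the service's state reachable from the survivor, which this model leaves out.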

Page 40: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

So, What’s New?
When slices cost 50k$, you buy 10 or 20.
When slices cost 5k$ you buy 100 or 200.
Manageability, programmability, usability become key issues (total cost of ownership).
PCs are MUCH easier to use and program

[Figure: MPP Vicious Cycle: every new MPP & new OS needs a new app; no customers! CP/Commodity Virtuous Cycle: standard platform, apps, customers; standards allow progress and investment protection]

Page 41: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Thesis: Scaleable Servers
Commodity hardware allows new applications
New applications need huge servers
Clients and servers are built of the same “stuff”
  Commodity software and commodity hardware
Servers should be able to
  Scale up (grow node by adding CPUs, disks, networks)
  Scale out (grow by adding nodes)
  Scale down (can start small)
Key software technologies
  Objects, Transactions, Clusters, Parallelism

Page 42: Scaleable Servers Jim Gray MicrosoftGray@Microsoft.comGray

Objects Meet Databases
The basis for universal data servers, access, & integration
object-oriented (COM oriented) programming interface to data
Breaks DBMS into components
Anything can be a data source
Optimization/navigation “on top of” other data sources
A way to componentize a DBMS
Makes an RDBMS an O-R DBMS (assumes optimizer understands objects)

[Figure: DBMS engine on top of data sources: Database, Spreadsheet, Photos, Mail, Map, Document]
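The idea that anything can be a data source, with query processing layered on top, can be sketched with a small interface. Everything below is a hypothetical illustration in Python: the `DataSource` protocol, `ListSource`, and `select` are invented names and do not correspond to the COM interfaces the slide refers to.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, str]


class DataSource(Protocol):
    """Anything that can hand out rows counts as a data source."""

    def rows(self) -> Iterator[Row]: ...


class ListSource:
    """A stand-in for mail folders, spreadsheets, documents, and so on."""

    def __init__(self, data: List[Row]) -> None:
        self._data = data

    def rows(self) -> Iterator[Row]:
        yield from self._data


def select(sources: Iterable[DataSource],
           predicate: Callable[[Row], bool]) -> List[Row]:
    """Query logic that sits on top of the sources and knows none of them."""
    return [row for src in sources for row in src.rows() if predicate(row)]


mail = ListSource([{"kind": "mail", "subject": "TPC-C results"}])
sheet = ListSource([
    {"kind": "spreadsheet", "subject": "budget"},
    {"kind": "spreadsheet", "subject": "TPC-C prices"},
])
hits = select([mail, sheet], lambda row: "TPC-C" in row["subject"])
print(hits)
```

The query function depends only on the shared interface, so new kinds of sources can be added without changing it.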