a tentative proposal for istore-2

A Tentative Proposal for ISTORE-2. July 18, 2000. Winfried W. Wilcke, [email protected], (408) 927-2139, Almaden Research Center. Richard C. Booth, [email protected], (408) 927-1879, Almaden Research Center. David A. Patterson, [email protected], (510) 642-6587, University of California, Berkeley.


Page 1: A Tentative Proposal for ISTORE-2

A Tentative Proposal for ISTORE-2

Winfried W. Wilcke
[email protected]
(408) 927-2139
Almaden Research Center

July 18, 2000

Richard C. Booth
[email protected]
(408) 927-1879
Almaden Research Center

David A. Patterson
[email protected]
(510) 642-6587
University of California, Berkeley

Page 2: A Tentative Proposal for ISTORE-2

Underlying Beliefs...

• Commodity components are quickly winning the server wars
  – Gigabit Ethernet will win everything
  – x86 Processors
  – Linux OS will prosper
• Large servers (100-10k nodes) will be quite common - and most are storage centric
• What matters most:
  – Ease of management, density of nodes and seamless geographical interconnect

Page 3: A Tentative Proposal for ISTORE-2

Generations of IStore

• IStore = IStore-1: Present UCB Project
• IStore-2: Joint Research Prototype
  – ~2000 nodes
  – Split between UCB, IBM and others
  – Hardware similar to IStore-1
  – Focus on real applications and management software
  – Operational YE 2001

• Follow-on Work

Page 4: A Tentative Proposal for ISTORE-2

Talk Outline

• Project Goals
• Applications
• Research Topics
• Hardware Architecture
• Development Schedule
• Working Relationships
• Next Steps

Page 5: A Tentative Proposal for ISTORE-2

Applications & Research Topics

Page 6: A Tentative Proposal for ISTORE-2

Candidate Applications

• Research Focus
  – NOAA Severe Weather Warning (R. Arps, ARC)
  – Fast Image Recognition (J. Malik, UCB)
• Commercial Focus
  – Scalable E-business server (IGS) - a must!
  – Deep Searching of Entire Web; Webfountain (N. Pass)
  – (tbd) Large Scale Network Attached Server (J. Palmer)
  – (tbd) Speech Recognition Farms for Phone-based Special Web-services

Page 7: A Tentative Proposal for ISTORE-2

NOAA Severe Weather.... Ron Arps

• Doppler Radar enables detection of violent tornadoes and plane crashes due to windshear

• Doubled warning time for residents in Oklahoma during '99 class 5 outbreaks
  – Goal: 15 minutes avg. warning time in 2004
• Eventually 120 radar sites will be established
• Matches well with I-Store characteristics
  – Needs scalable local storage/processing plus seamless transfer of data on geographical scale, manageable from one site

Page 8: A Tentative Proposal for ISTORE-2
Page 9: A Tentative Proposal for ISTORE-2

Webfountain
Norm Pass

• Index entire Web every few weeks
  – Google, Northernlight index 25%
• 4 TB index => 200 TB in two years
• 'Miner' technology demonstrated
  – Resumes, Prices, Geospatial,...
  – Prototype running on a 30 node Linux farm
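
The index growth quoted on this slide (4 TB to 200 TB in two years) can be turned into a yearly rate. The following is a minimal sketch of that arithmetic, assuming smooth exponential growth between the two endpoints; the slide gives only the endpoints, so the growth model and all function names here are illustrative assumptions.

```python
import math

# Endpoints taken from the slide; the smooth-exponential model is an assumption.
INDEX_NOW_TB = 4.0
INDEX_LATER_TB = 200.0
YEARS = 2.0


def annual_growth_factor(start_tb: float, end_tb: float, years: float) -> float:
    """Constant yearly multiplier that takes start_tb to end_tb in `years`."""
    return (end_tb / start_tb) ** (1.0 / years)


def doubling_time_months(start_tb: float, end_tb: float, years: float) -> float:
    """Months needed for the index to double under the same growth model."""
    return 12.0 * years * math.log(2.0) / math.log(end_tb / start_tb)


factor = annual_growth_factor(INDEX_NOW_TB, INDEX_LATER_TB, YEARS)
doubling = doubling_time_months(INDEX_NOW_TB, INDEX_LATER_TB, YEARS)
print(f"growth per year: x{factor:.2f}")
print(f"doubling time:   {doubling:.1f} months")
```

Under this assumption the index grows roughly sevenfold per year, which is the kind of pressure that motivates a storage design that scales by adding bricks.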

Page 10: A Tentative Proposal for ISTORE-2

Software Model

• Users will see a standard Linux farm (shared nothing) programming model
  – No porting effort for existing Linux farm applications (except dealing with different versions of Linux, of course)
• The system management functions are only visible to system administrators
  – Exceptions are performance monitoring functions useful for tuning apps

Page 11: A Tentative Proposal for ISTORE-2

Differences to a Linux Farm

• Much higher spatial density of Nodes or ‘Bricks’
• Single network protocol (Ethernet) for ALL off-node communications
• Design with geographical distribution in mind
• Diagnostic Processors
• Lego-like, standardized building blocks
  – Regular and relaxed homogeneous
• Monitoring Hardware
• Measuring of relevant environmental parameters
• (New) System Management Language
• AME, SON and RAIN objectives

Page 12: A Tentative Proposal for ISTORE-2
Page 13: A Tentative Proposal for ISTORE-2

AME, RAIN and SON

• Three areas of system research to be explored with I-Store

• These three areas are largely independent of each other

Page 14: A Tentative Proposal for ISTORE-2

AME

• Availability
  – No single points of failure
  – Introspection, failover and fast failure
  – Fast repair by swapping identical blocks
• Maintainability
  – Homogeneous structure
  – System management language
• Extensibility/Scalability
  – Shared nothing architecture

Page 15: A Tentative Proposal for ISTORE-2

RAIN

• Redundant Array of Inexpensive Network (Switches)
• Issues to be explored
  – Optimal topology
  – Density/cost of ports, optics vs. copper
  – Routing algorithms within a machine
  – Need for TCP hardware acceleration
  – Performance of Ethernet protocol
  – Frame sizes
  – Simplified switches

Page 16: A Tentative Proposal for ISTORE-2

SON

• Storage Oriented Nodes
• Basic Premise of one node=one disk=one processor
  – It works in farms, but is it a good general choice?
  – Is the loss of flexibility (in the ratio of disks per processor) a good tradeoff for easier management?

Page 17: A Tentative Proposal for ISTORE-2

Additional Software Research Topics...

• Define AME, RAIN, SON benchmarks
• Server Management Language
• Parallel Searching of geographically distributed database
• Dynamic Resource Allocation (i.e. Firewalls)
• SCSI over TCP/IP (SAN within I-Store)
• Storage for mobile users (à la OceanStore)

Page 18: A Tentative Proposal for ISTORE-2

System Management Language

• Define a high-level, interpretive(?) system management language
  – May use facilities of system OS
• Highly regular I-Store is the first target
• Sample Verbs
  – allocate, protect, share, map, backup, restore, copy, correlate, display, discover, ping, initialize, report, arm, define(node)....

Page 19: A Tentative Proposal for ISTORE-2

System Management Language

• Should easily describe tasks such as:
  – Backup all data located in the Philippines to Colorado (a volcano is about to blow)
  – Set alarm if any disk is more than 80% full
  – Define protected subregions in the system
  – Display CPU utilization by time and state
  – Discover present routing topology
  – Show 3D correlation plot of disk vibration vs. brick temperature vs. actual failure events
  – .....
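
The slides do not define the syntax of the management language, so the following is only a sketch, in plain Python, of what two of the sample tasks amount to: the "more than 80% full" alarm and the selection of all bricks at one site (the first step of a site-to-site backup). The `Brick` record, its field names, and the brick names are hypothetical and are not taken from the proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Brick:
    """Hypothetical inventory record for one storage brick."""
    name: str
    site: str
    disk_capacity_gb: float
    disk_used_gb: float

    @property
    def disk_fill(self) -> float:
        """Fraction of the brick's disk that is in use (0.0 to 1.0)."""
        return self.disk_used_gb / self.disk_capacity_gb


def bricks_over_threshold(bricks: List[Brick], threshold: float = 0.80) -> List[str]:
    """'Set alarm if any disk is more than 80% full': names of offending bricks."""
    return [b.name for b in bricks if b.disk_fill > threshold]


def bricks_at_site(bricks: List[Brick], site: str) -> List[str]:
    """Select every brick at one site, e.g. as the source set of a backup."""
    return [b.name for b in bricks if b.site == site]


# Illustrative inventory; the 32 GB capacity follows the disk size on the hardware slides.
bricks = [
    Brick("brick-001", "Almaden", 32.0, 12.0),
    Brick("brick-002", "Berkeley", 32.0, 27.5),
    Brick("brick-003", "Berkeley", 32.0, 25.0),
]

alarms = bricks_over_threshold(bricks)
print("alarm:", alarms)
print("Berkeley bricks:", bricks_at_site(bricks, "Berkeley"))
```

Both tasks reduce to a predicate applied uniformly over identical nodes, which is one reason a highly regular system is a natural first target for such a language.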

Page 20: A Tentative Proposal for ISTORE-2

Hardware Architecture, Development Schedule & Working Relationships

Page 21: A Tentative Proposal for ISTORE-2

IStore Hardware Architecture Goals

• Seamless Scalability
  – O(10,000) AME Storage Nodes
  – Optimized Storage Brick for Packaging Density
• Geographically Dispersed Nodes
  – Gb Ethernet Connections to WAN Routers
• Storage Brick
  – Full PME Brick: Processor, Memory, Cache
  – Gb Ethernet as the Sole Interconnection Fabric
  – Imbedded Disk with 10s GBytes

Page 22: A Tentative Proposal for ISTORE-2

IStore Hardware Architecture Goals (cont.)

• State-of-the-art Intel Processor Memory Element (PME)
  – 650 MHz Pentium III with 100 MHz System Bus
  – 256 KB L2 cache
  – O(512MB) main memory
• State-of-the-art Interconnect Fabric
  – 1 Gb Ethernet Runtime Network
  – 10/100 Mb Ethernet Diagnostic Network
• State-of-the-art Disks
  – 2.5" ~32 GB drive

Page 23: A Tentative Proposal for ISTORE-2

IStore Hardware Architecture Goals (cont.)

• Berkeley AME Hardware Management Support
  – Diagnostic processor
  – Environmental sensors
• TCP/IP Hardware Accelerator
  – Class 4: Hardware State Machine
• SCSI over TCP ("iSCSI") Support
• Compatible with Standard Ethernet Switches/Routers

Page 24: A Tentative Proposal for ISTORE-2

IStore-1: Current Berkeley Design

• 80 nodes
• AME
• 266 MHz Pentium II
• Four 100 Mb Ethernet Ports/brick
• Integrated UPS

Page 25: A Tentative Proposal for ISTORE-2

IStore-2: Deltas from IStore-1

• Geographically Dispersed Nodes
  – O(1000) nodes at Almaden
  – O(1000) nodes at Berkeley
• Upgraded Storage Brick
  – Pentium III 650 MHz Processor
  – Two Gb Ethernet Copper Ports/brick
  – One 2.5" ATA disk
• User Supplied UPS Support
• Standard Ethernet Switches
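
As a rough sanity check on what the two-site prototype adds up to, the per-brick figures from the hardware slides (about 32 GB of disk, O(512 MB) of memory, two Gb Ethernet ports) can be multiplied out over O(1000) nodes at each of two sites. This is a back-of-the-envelope sketch only: it treats the order-of-magnitude node counts as exactly 1000, uses decimal units, and the `BrickSpec` type is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BrickSpec:
    """Per-brick figures quoted on the hardware slides (approximate)."""
    disk_gb: int = 32        # 2.5" ~32 GB drive
    memory_mb: int = 512     # O(512MB) main memory
    gbe_ports: int = 2       # two Gb Ethernet copper ports per brick


def aggregate(spec: BrickSpec, nodes: int) -> Dict[str, float]:
    """Raw totals across `nodes` identical bricks, in decimal units."""
    return {
        "disk_tb": spec.disk_gb * nodes / 1000,
        "memory_gb": spec.memory_mb * nodes / 1000,
        "port_gbps": spec.gbe_ports * nodes,
    }


NODES_PER_SITE = 1000  # assumption: O(1000) taken as exactly 1000
SITES = 2              # Almaden and Berkeley

totals = aggregate(BrickSpec(), NODES_PER_SITE * SITES)
for key, value in totals.items():
    print(f"{key}: {value}")
```

These are raw totals before any redundancy or replication, so usable capacity would be lower.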

Page 26: A Tentative Proposal for ISTORE-2

Follow on Work

• Ethernet Sourced in Memory Controller (North Bridge)
• TCP/IP Hardware Accelerator
  – Class 4: Hardware State Machine
• SCSI over TCP Support
• Integrated UPS

Page 27: A Tentative Proposal for ISTORE-2

Why an IStore-2 Prototype Is Interesting

• Storage Bricks
  – New ratios for MIPS/bandwidth/storage
  – New level of density
• AME Hardware Support
  – Seamless scaling
  – Self maintaining nodes
• It Exists

Page 28: A Tentative Proposal for ISTORE-2

IStore-2: Core Design Team

• IBM (full time)
  – System Architect: Winfried Wilcke
  – Lead Designer: Richard Booth
  – 1 Experienced Hardware Designer: tbd
  – 3 Designers: tbd
• Berkeley
  – 6 Graduate Students

Page 29: A Tentative Proposal for ISTORE-2

IStore-2: Development Schedule

• Working Model
  – 7/00: Agreement in Principle
  – 8/00: Working Team Membership
• Design
  – 9/00: Architecture Specification version 1.0
  – 11/00: Design Workbook version 1.0
• Implementation
  – 2Q/01: First 3 Nodes Power-up
  – 3Q/01: O(64) nodes available to users
  – 4Q/01: O(2000) nodes available to users

Page 30: A Tentative Proposal for ISTORE-2

IStore-2 Footprint (per 1000 nodes)

• 16 Storage (19") Racks
  – 64 Storage bricks/rack
    • 8 type 1 storage bricks/drawer
    • 8 storage drawers/rack
  – Ethernet switches in rack
• 8 Global Ethernet Switch (19") Racks
• Requires 600 sq. ft. lab
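
The packaging numbers on this slide are mutually consistent, which the short check below makes explicit. It only multiplies out the figures given above; the floor area per rack is a derived, illustrative quantity (it ignores aisles and cooling) and does not appear in the proposal.

```python
# Figures from the footprint slide (per nominal 1000 nodes).
BRICKS_PER_DRAWER = 8
DRAWERS_PER_RACK = 8
STORAGE_RACKS = 16
SWITCH_RACKS = 8
LAB_SQ_FT = 600

bricks_per_rack = BRICKS_PER_DRAWER * DRAWERS_PER_RACK  # matches "64 bricks/rack"
bricks_total = bricks_per_rack * STORAGE_RACKS          # physical slots per lab
total_racks = STORAGE_RACKS + SWITCH_RACKS

# Derived for illustration only: average lab floor area per rack.
sq_ft_per_rack = LAB_SQ_FT / total_racks

print(f"bricks per rack: {bricks_per_rack}")
print(f"bricks per lab:  {bricks_total}")
print(f"racks per lab:   {total_racks}")
print(f"sq. ft. per rack: {sq_ft_per_rack}")
```

Sixteen racks of 64 bricks give 1024 slots, so "1000 nodes" is a round figure for a power-of-two build-out.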

Page 31: A Tentative Proposal for ISTORE-2

IStore-2 Platform: Required Resources

• Staffing
  – 6 ARC/SSD IBMers
  – 6 UCB Graduate Students
• Lab Space
  – 600 sq. ft. lab at Almaden
  – 600 sq. ft. lab at Berkeley
• Hardware Costs
  – $3M (mostly 2001 dollars)
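
For a sense of scale, the hardware budget can be divided over the planned machine. This sketch assumes the $3M covers the full ~2000-node system and that each brick carries a ~32 GB disk as stated on the hardware slides; the slides do not break the budget down, so the per-node and per-terabyte figures are rough derived estimates that also fold in switches and racks.

```python
HARDWARE_BUDGET_USD = 3_000_000   # "$3M (mostly 2001 dollars)"
PLANNED_NODES = 2000              # assumption: ~2000 nodes across both sites
DISK_GB_PER_NODE = 32             # assumption: ~32 GB drive per brick

# All-in hardware cost per node (includes a share of switches, racks, etc.).
cost_per_node = HARDWARE_BUDGET_USD / PLANNED_NODES

# Raw disk capacity in decimal terabytes, and all-in cost per raw terabyte.
raw_disk_tb = PLANNED_NODES * DISK_GB_PER_NODE / 1000
cost_per_raw_tb = HARDWARE_BUDGET_USD / raw_disk_tb

print(f"cost per node:   ${cost_per_node:,.0f}")
print(f"raw disk:        {raw_disk_tb:.0f} TB")
print(f"cost per raw TB: ${cost_per_raw_tb:,.0f}")
```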

Page 32: A Tentative Proposal for ISTORE-2

IStore-2: Working Model

• Jointly Authored Architecture Specification
  – 1 or 2 Almaden authors
  – 1 or 2 Berkeley authors
• Design Workbook
  – Each Core Team Member owns a section
• Weekly Half Day Working Face-to-face Meetings
  – Alternate between Almaden and Berkeley
• Shared Electronic Documentation
• Machine Available -for free- to Users From Either Institution
• IP is Handled Like Previous IBM/UCB Projects ??
• Fabrication (some design ?) Vendored Out

Page 33: A Tentative Proposal for ISTORE-2

Next Steps

• Continue to Seek Feedback on Proposal
• Funding Discussion
  – IBM
  – Berkeley
• Form IBM Team
• Begin Regular Working Meetings
• Begin Architectural Design