modern & scalable network designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfmodern...

53
Modern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta[email protected] @tsunanet

Upload: others

Post on 20-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Modern & ScalableNetwork Designs for

Benoît “tsuna” SigoureMember of the Yak Shaving [email protected] @tsunanet

Page 3: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

1970

A Brief History of Network Software

Custommonolithicembedded

OS 19801990

2000

ModifiedBSD, QNXor Linux

kernel base2010

Page 4: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

1970

A Brief History of Network Software

Custommonolithicembedded

OS

“The switchis just a

Linux box”

19801990

2000

ModifiedBSD, QNXor Linux

kernel base2010

Page 5: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

1970

A Brief History of Network Software

Custommonolithicembedded

OS

“The switchis just a

Linux box”

19801990

2000

ModifiedBSD, QNXor Linux

kernel base2010

EOS = Linux

Page 6: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

1970

A Brief History of Network Software

Custommonolithicembedded

OS

“The switchis just a

Linux box”

19801990

2000

ModifiedBSD, QNXor Linux

kernel base2010

EOS = Linux

Custom ASIC

Page 7: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Inside a Modern Switch

Arista 7050Q-16

Page 8: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Inside a Modern Switch

2 PSUs4 Fans

Arista 7050Q-16

Page 9: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Inside a Modern Switch

2 PSUs4 Fans

DDR3

x86 CPUUSB FlashSATA SSD

Arista 7050Q-16

Page 10: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Inside a Modern Switch

2 PSUs4 Fans

DDR3

x86 CPU

Networkchip(s)

USB FlashSATA SSD

Arista 7050Q-16

Page 11: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Inside a Modern Switch

2 PSUs4 Fans

DDR3

x86 CPU

Networkchip(s)

USB FlashSATA SSD

Arista 7050Q-16

Page 12: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command API

It’s All About The Software

Zero TouchReplacement VmTracer CloudVision Email Wireshark Fast

Failover DANZ Fabric Monitoring

Page 13: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command API

It’s All About The Software

Zero TouchReplacement VmTracer CloudVision Email Wireshark Fast

Failover DANZ Fabric Monitoring

OpenTSDB

Page 14: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command API

#!/bin/bash C++

It’s All About The Software

Zero TouchReplacement VmTracer CloudVision Email Wireshark Fast

Failover DANZ Fabric Monitoring

OpenTSDB

Page 15: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command API

#!/bin/bash C++

It’s All About The Software

Zero TouchReplacement VmTracer CloudVision Email Wireshark Fast

Failover DANZ Fabric Monitoring

OpenTSDB$ sudo yum install <pkg>$ sudo yum install <pkg>

Page 16: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command APIOr a CLI over JSON-RPC

Page 18: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command APIOr a CLI over JSON-RPC

Page 19: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command APIOr a CLI over JSON-RPC

Page 20: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command APIOr a CLI over JSON-RPC

import pexpect, reconnector = pexpect.spawn("ssh [email protected]")connector.expect(".ssword:*")connector.sendline(password)index = connector.expect([">","#"])if index == 0: connector.sendline("enable") index = connector.expect(["assword", "#"]) if index == 0: connector.sendline(enable) connector.expect("#")connector.sendline("show version")connector.expect("#")output = connector.beforeswitches = re.findall("^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)", output, re.MULTILINE)for master, num, model in switches: ...

Page 21: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command APIOr a CLI over JSON-RPC

import pexpect, reconnector = pexpect.spawn("ssh [email protected]")connector.expect(".ssword:*")connector.sendline(password)index = connector.expect([">","#"])if index == 0: connector.sendline("enable") index = connector.expect(["assword", "#"]) if index == 0: connector.sendline(enable) connector.expect("#")connector.sendline("show version")connector.expect("#")output = connector.beforeswitches = re.findall("^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)", output, re.MULTILINE)for master, num, model in switches: ...

Traditional approach:“screen scraping”

Page 22: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Command APIOr a CLI over JSON-RPC

import pexpect, reconnector = pexpect.spawn("ssh [email protected]")connector.expect(".ssword:*")connector.sendline(password)index = connector.expect([">","#"])if index == 0: connector.sendline("enable") index = connector.expect(["assword", "#"]) if index == 0: connector.sendline(enable) connector.expect("#")connector.sendline("show version")connector.expect("#")output = connector.beforeswitches = re.findall("^(\*?)\s+(\d)\s+\d+\s+([A-Z\d-]+)", output, re.MULTILINE)for master, num, model in switches: ...

Traditional approach:“screen scraping”

from jsonrpclib import Serverswitch = Server("http://admin:" + passwrd + "@1.2.3.4/command-api")

response = switch.runCmds(1, ["show privilege"])if response[0]["privilegeLevel"] != 15: switch.runCmds(1, [{"cmd": "enable", "input": enable}])

response = switch.runCmds(1, ["show version"])for num, switch in enumerate(response[0]["stack"]): # use switch["model"] and switch["master"]

Compare to:A Standard HTTP API

Page 23: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Integration

Neutron

Arista Driver

Datacenter Leaf Layer

Red VLAN

Blue VLAN

TOR1 TOR2 TOR3

VM1 VM2VM1 VM2 VM3 VM4VM3

Host 3Host 1 Host 4 Host 6Host 2 Host 5

EAPI

Page 25: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Server 1

Server 2

Server 3

Server N

Rack-1

Server 1

Server 2

Server 3

Server N

Rack-2

Server 1

Server 2

Server 3

Server N

Rack-3

5 to 7 racks, 7 to 9 switchesScale: 240 to 336 nodesOversubscription ratio 1:3

2x40Gbpsuplinks

Server 1

Server 2

Server 3

Server N

Rack-4

Server 1

Server 2

Server 3

Server N

Rack-5

Page 26: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Server 1

Server 2

Server 3

Server N

Rack-1

Server 1

Server 2

Server 3

Server N

Rack-2

56 racks, 8 core switchesScale: 3136 nodesOversubscription ratio 1:3

8x10Gbpsuplinks

Server 1

Server 2

Server 3

Server N

Rack-55

Server 1

Server 2

Server 3

Server N

Rack-56

...

...

...

Page 27: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Server 1

Server 2

Server 3

Server N

Rack-1

Server 1

Server 2

Server 3

Server N

Rack-2

1152 racks, 16 core modular switchesScale: 55296 nodesOversubscription ratio still 1:3

16x10Gbpsuplinks

Server 1

Server 2

Server 3

Server N

Rack-1151

Server 1

Server 2

Server 3

Server N

Rack-1152

...

...

...

Page 28: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Build Layer 3 Networks

3 layers of pink Cake by Catherine Scott

Page 29: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Layer 2 vs Layer 3

• Layer 2: “data link layer” – Ethernet “bridge”• Layer 3: “network layer” – IP “router”

A B C D E F G H

Uplinksblockedby STP

Layer 2

E, F

, G, H

← A, B, C, D

E, F, G, H →

← A

, B, C

, D

A B C D E F G H

ECMP:0.0.0.0/0 via core1 via core2

Layer 3x

Page 30: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Layer 2 vs Layer 3

• Layer 2: “data link layer” – Ethernet “bridge”• Layer 3: “network layer” – IP “router”

A B C D E F G H

E, F

, G, H

← A, B, C, D

E, F, G, H →

← A

, B, C

, D

A B C D E F G H

ECMP:0.0.0.0/0 via core1 via core2

Layer 3Layer 2with vendor magic x

Page 31: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Layer 2 vs Layer 3

• Layer 2: “data link layer” – Ethernet “bridge”• Layer 3: “network layer” – IP “router”

A B C D E F G H

E, F

, G, H

← A, B, C, D

E, F, G, H →

← A

, B, C

, D

A B C D E F G H

ECMP:0.0.0.0/0 via core1 via core2

Layer 3Layer 2with vendor magic x

Page 34: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

1G vs 10G

• 1Gbps is today what 100Mbps was yesterday

• New generation servers come with 10G LAN on board

• At 1Gbps, the network is one of the first bottlenecks

• A single HBase client can very easily push over 4Gbps to HBase 0

175

350

525

700

TeraGen in seconds

1Gbps 10Gbps

3.5x

Page 36: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Jumbo Frames

0

85

170

255

340

Context switches

MTU=1500 MTU=9212

11%

0

125

250

375

500

Soft IRQ CPU Time

15%

(millions) (millions) (thousand jiffies)

0

500

1000

1500

2000

Packets per TeraGen

MTU=1500 MTU=9212

3x less

Page 37: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

«Hardware» buffers by Roger Ferrer Ibáñez

Deep Buffers

Page 38: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

«Hardware» buffers by Roger Ferrer Ibáñez

Deep Buffers

0

38

75

113

150

Packet buffer per 10G port (MB)

Ari

sta

7500

Ari

sta

7500

E

Indu

stry

Ave

rage

1.5

48

128

Page 39: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

0

250000

500000

750000

1000000

Packets dropped per TeraGen

1MB 4:11MB 5.33:148MB 5.33:1

Zero!

4x10G ⇒ 4:13x10G ⇒ 5.33:1

...

16 hostsx 10G

16 hostsx 10G

...

Deep Buffers

1k T

CP

slo

w s

tart

/sec

Page 40: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

RAILMaking the NetworkWork Better With

&

Page 41: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Host Failures in Hadoop Clusters

Server 1

Server 2

Server 3

Server NRack-1 Rack-2 Rack-3

Server 1

Server 2

Server 3

Server N

Server 1

Server 2

Server 3

Server N

Server 1

Server 2

Server 3

Server NRack-N

• Hardware failures• Kernel panics• Operator errors• NIC driver bugs

Page 42: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

• RegionServer: wait for ZooKeeper lease timeoutTypical: ~30s

• DataNode: wait for heartbeats to timeout enough for NameNode to declare deadTypical: ~10min (or ~30s with dfs.namenode.check.stale.datanode see HDFS-3703)

Host Failures for HBase & Hadoop

Page 43: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

• RegionServer: wait for ZooKeeper lease timeoutTypical: ~30s

• DataNode: wait for heartbeats to timeout enough for NameNode to declare deadTypical: ~10min (or ~30s with dfs.namenode.check.stale.datanode see HDFS-3703)

Host Failures for HBase & Hadoop

Page 44: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

• RegionServer: wait for ZooKeeper lease timeoutTypical: ~30s

• DataNode: wait for heartbeats to timeout enough for NameNode to declare deadTypical: ~10min (or ~30s with dfs.namenode.check.stale.datanode see HDFS-3703)

Host Failures for HBase & Hadoop

Page 45: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

• Redirect traffic to the switch• Have the switch send TCP resets• Immediately kills all active and new flows

10.4.5.1

10.4.5.3

10.4.6.1

10.4.6.2

10.4.6.3

10.4.5.254 10.4.6.254

10.4.5.2

How Does This Work?

Page 46: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

10.4.5.2

• Redirect traffic to the switch• Have the switch send TCP resets• Immediately kills all active and new flows

10.4.5.1

10.4.5.2

10.4.5.3

10.4.6.1

10.4.6.2

10.4.6.3

10.4.5.254 10.4.6.25410.4.5.210.4.5.2

EOS = Linux

How Does This Work?

Page 47: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

MapReduceTracer

For

Page 48: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

MapReduce Tracer

• Keep track of nodes in the Hadoop cluster• Poll statistics about Hadoop activity:‣What jobs are running, when, where‣How “big” each job is, network-wise‣Historical data of MapReduce activity‣MapReduce vs HDFS traffic accounting

• CLI integration for real-time visibility• Multi-cluster support

Page 49: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Server 1

Server 2

Server 3

Server 16

Rack-1

Server 17

Server 18

Server 19

Server 32

Rack-2

40Guplinks

• 2 racks, 16 hosts each• L3 network (/27 per rack,

ECMP)• Both TOR use the following

additional config: monitor hadoop cluster batch jobtracker host 172.19.33.1 jobtracker user tsuna tasktracker http-port 10060 no shutdown

• No change required to Hadoop• No plugin, config change, or

additional daemon to run

Page 50: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Server 1

Server 2

Server 3

Server 16

Rack-1

Server 17

Server 18

Server 19

Server 32

Rack-2

NameNodeJobTracker

DN / TT

DN / TT

DN / TT

DN / TT

DN / TT

DN / TT

DN / TT

MapReduceTracer

MapReduceTracer

40Guplinks

• 2 racks, 16 hosts each• L3 network (/27 per rack,

ECMP)• Both TOR use the following

additional config: monitor hadoop cluster batch jobtracker host 172.19.33.1 jobtracker user tsuna tasktracker http-port 10060 no shutdown

• No change required to Hadoop• No plugin, config change, or

additional daemon to run

Page 51: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Hadoop Versions Supported• Apache Hadoop:

0.20.203 to 1.2.1• Cloudera’s distribution:

CDH3 and CDH4• HortonWorks’ distribution:

HDP 1.1, 1.2, 1.3• Microsoft HDInsight 0.10

• Incompatible:Apache 0.20.x and earlierApache 0.21.x, 0.22.x, 0.23.xApache 2.0.x

Page 52: Modern & Scalable Network Designs forfiles.meetup.com/12607092/nyc-hadoop-meetup-sm.pdfModern & Scalable Network Designs for Benoît “tsuna” Sigoure Member of the Yak Shaving Sta!

Thank You