Page 1: Scale, Reliability and Programmability for Global Internet ...conferences.sigcomm.org/sigcomm/2017/files/program/ts-10-2-espr… · Manish Verma, Puneet Sood, Mukarram Tariq, Matt

Confidential + Proprietary

Taking the Edge off with Espresso
Scale, Reliability and Programmability for Global Internet Peering

KK Yap, Murtaza Motiwala, Jeremy Rahe, Steve Padgett, Matthew Holliman, Gary Baldus, Marcus Hines, Taeeun Kim, Ashok Narayanan, Ankur Jain, Victor Lin, Colin Rice, Brian Rogan, Arjun Singh, Bert Tanaka, Manish Verma, Puneet Sood, Mukarram Tariq, Matt Tierney, Dzevad Trumic, Vytautas Valancius, Calvin Ying, Mahesh Kallahalla, Bikash Koley, Amin Vahdat and many others.

Problem Statement

Egress Terabits/sec of traffic to our Internet peers
● High-def video, cloud traffic, etc.

1. Optimize traffic per-customer and per-application
● e.g., optimal video quality, or differentiated service for cloud

Alternate path with better user experience?

● Problem: Constrained by BGP shortest path and lack of application awareness

2. Deliver new features quickly

Novel L2 VPN?

Request to vendor → Commit to feature → Implement → Vendor testing → Integration testing @ Google → Deploy

● Problem: router-vendor feature cycles and qualification take many years

Espresso: Google’s SDN Peering Edge

Our previous experience with SDN

● B4 [SIGCOMM 2013] and Jupiter [SIGCOMM 2015]
● Enable flexible traffic engineering
● Increase feature velocity

SDN is only suited for walled gardens?

Peering edge requires interoperability with heterogeneous peers.

Agenda

● Problem Statement

● Espresso in Context

● Design Principles

● Architecture Overview

● Results

● Conclusion

Espresso in Context

[Figure, built up over three slides. Labels: Jupiter data center [SIGCOMM 2015], B4 [SIGCOMM 2013], Google metro/points of presence (PoP), Espresso, Internet, User.]

Global Edge Footprint, > 100 PoPs

[Map legend: points of presence (>100), network fiber]

Espresso’s Design Principles

1. Hierarchical control plane
○ Global optimization, while the local control plane provides fast reaction.

2. Fail static
○ Local control plane continues to function when the global controller fails.

3. Software programmability
○ Externalize features into software to exploit commodity servers for scale.

4. Testability

5. Manageability
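
The first two principles can be illustrated with a minimal sketch. Everything in it is hypothetical and not Espresso's actual interface: the `LocalController` class, its method names, and the prefix and peer strings are assumptions made for illustration. It only shows a metro-local controller that keeps serving the last program it received when the global controller is unreachable, while still reacting on its own to a peer withdrawing a route.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class LocalController:
    """Hypothetical metro-local controller (all names are illustrative)."""
    # prefix -> ordered list of egress peers chosen by the global controller
    program: Dict[str, List[str]] = field(default_factory=dict)
    # (prefix, peer) pairs withdrawn by peers, learned locally
    withdrawn: Set[Tuple[str, str]] = field(default_factory=set)

    def apply_global_program(self, new_program: Optional[Dict[str, List[str]]]) -> None:
        # Fail static: None models an unreachable global controller, in which
        # case the last known-good program is kept instead of being cleared.
        if new_program is not None:
            self.program = {p: list(peers) for p, peers in new_program.items()}

    def on_peer_withdraw(self, prefix: str, peer: str) -> None:
        # Fast local reaction: no round trip to the global controller.
        self.withdrawn.add((prefix, peer))

    def egress_for(self, prefix: str) -> Optional[str]:
        # First programmed peer that has not withdrawn the prefix.
        for peer in self.program.get(prefix, []):
            if (prefix, peer) not in self.withdrawn:
                return peer
        return None


local = LocalController()
local.apply_global_program({"203.0.113.0/24": ["peer-a", "peer-b"]})
local.apply_global_program(None)  # simulated global controller outage
before = local.egress_for("203.0.113.0/24")
local.on_peer_withdraw("203.0.113.0/24", "peer-a")
after = local.egress_for("203.0.113.0/24")
print(before, after)
```

In this sketch the outage leaves the programmed choice untouched, and the local withdrawal moves traffic to the next programmed peer without waiting for a new global program.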

Architecture: Externalizing BGP

[Figure: Traditional peering router versus Espresso. Traditional peering router labels: eBGP peering, external peer, Internet-size routing/forwarding table, large ACL. Espresso labels: peering fabric, label-switched fabric, BGP speaker, host servers in metro, external peer.]

Architecture: Reliability and Scale of BGP

[Figure: Traditional peering router versus Espresso peering fabric. Traditional peering router labels: eBGP peering, external peer, Internet-size RIB/FIB, large TCAM. Espresso labels: peering fabric, label-switched fabric, multiple BGP speakers, host servers in metro, external peer.]

Architecture: Externalize Packet Processing

[Figure labels: peering fabric, label-switched fabric, BGP speaker, host with packet processor, ingress ACL, sink DoS, external peer, eBGP peering.]

Labeled packets specify egress

Host-based packet processor allows flexible packet processing, including ACL and handling of DoS.
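
A minimal sketch of what host-based packet processing could look like, under stated assumptions: the deny list, the prefix-to-label table, the label values, and the `process` function are all hypothetical, and the sketch collapses ACL filtering and egress selection into one function for brevity. It drops packets from denied sources and otherwise picks an egress label by longest-prefix match on the destination.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical tables; in a real system a controller would program these.
DENY_SOURCES = [ipaddress.ip_network("198.51.100.0/24")]  # ACL held in host software
EGRESS_LABELS: List[Tuple[ipaddress.IPv4Network, int]] = [
    (ipaddress.ip_network("203.0.113.0/24"), 1001),
    (ipaddress.ip_network("203.0.113.128/25"), 1002),
]


def process(src: str, dst: str) -> Optional[int]:
    """Return the egress label for a packet, or None if the packet is dropped."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)

    # ACL check: drop traffic from denied sources.
    if any(src_ip in net for net in DENY_SOURCES):
        return None

    # Longest-prefix match on the destination selects the label.
    matches = [(net.prefixlen, label) for net, label in EGRESS_LABELS if dst_ip in net]
    if not matches:
        return None  # no route known for this destination
    return max(matches, key=lambda m: m[0])[1]


print(process("192.0.2.1", "203.0.113.200"))
print(process("198.51.100.7", "203.0.113.5"))
```

Keeping these tables in host memory rather than in router hardware is the point the slide makes: the table sizes and the filtering logic are then limited by server resources and software, not by a fixed-size TCAM.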

Architecture: Hierarchical Control

[Figure labels: global controller; Espresso metro containing location control, peering fabric control, BGP speaker, peering fabric, label-switched fabric, and hosts with packet processor; external peer; eBGP peering.]

Architecture: Fail Static

[Figure labels: global controller; Espresso metro containing location control, peering fabric control, BGP speaker, peering fabric, label-switched fabric, and hosts with packet processor; external peer; eBGP peering.]

Architecture: Application Aware Routing

[Figure labels: application signals; global controller; Espresso metro containing location control, peering fabric control, BGP speaker, peering fabric, label-switched fabric, and hosts with packet processor; external peer; eBGP peering.]

Using User’s Best Path, not BGP’s

● Serves 13% more traffic than BGP best path, in an application-aware manner.

● Helps capacity-constrained ISPs by overflowing demand to alternate paths within local metro and also via remote metros.
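
The overflow behaviour can be sketched as a simple greedy placement. This is an illustration only, not the optimization Espresso actually runs: the `assign_with_overflow` function, the path names, the capacities, and the assumption that paths arrive already ordered by preference (for example, from application signals) are all hypothetical.

```python
from typing import Dict, List, Tuple


def assign_with_overflow(demand: float, paths: List[Tuple[str, float]]) -> Dict[str, float]:
    """Place demand on paths in preference order, overflowing to the
    next path whenever the current one runs out of capacity."""
    placed: Dict[str, float] = {}
    remaining = demand
    for name, capacity in paths:
        if remaining <= 0:
            break
        share = min(remaining, capacity)
        if share > 0:
            placed[name] = share
            remaining -= share
    if remaining > 0:
        # Demand that no listed path could absorb.
        placed["unserved"] = remaining
    return placed


# Hypothetical paths toward one ISP: (name, spare capacity in Gbps),
# most preferred first; the last entry egresses via a remote metro.
paths = [
    ("isp-direct-local", 10.0),
    ("transit-local", 5.0),
    ("isp-direct-remote-metro", 20.0),
]
plan = assign_with_overflow(18.0, paths)
print(plan)
```

With 18 units of demand, the preferred local path fills first, the second local path absorbs the next 5, and the remainder overflows to the remote metro.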

Improvements in End User Experience

Client ISP   Change in mean time between rebuffers (MTBR)   Change in mean goodput
A            10 → 20 min                                    2.25 → 4.5 Mbps
B            4.6 → 12.5 min                                 2.75 → 4.9 Mbps
C            14 → 19 min                                    3.2 → 4.2 Mbps

Provide significant improvements to end-user experience.
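
To put the table in relative terms, the short calculation below divides each "after" value by its "before" value. The numbers are copied from the table; the `rows` layout and the `ratio` helper are only illustrative.

```python
# Before/after values per client ISP, taken from the table above.
rows = {
    "A": {"mtbr_min": (10.0, 20.0), "goodput_mbps": (2.25, 4.5)},
    "B": {"mtbr_min": (4.6, 12.5), "goodput_mbps": (2.75, 4.9)},
    "C": {"mtbr_min": (14.0, 19.0), "goodput_mbps": (3.2, 4.2)},
}


def ratio(before: float, after: float) -> float:
    """Improvement factor: values above 1 mean the metric increased."""
    return after / before


for isp, metrics in rows.items():
    mtbr = ratio(*metrics["mtbr_min"])
    goodput = ratio(*metrics["goodput_mbps"])
    print(f"ISP {isp}: MTBR x{mtbr:.2f}, goodput x{goodput:.2f}")
```

For ISP A, both the time between rebuffers and the mean goodput double.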

Release Velocity

Component                   Average velocity (days)
Local Controller            11.2
BGP speaker                 12.6
Peering Fabric Controller   15.8

> 50× more frequently than with traditional peering routers.

Novel L2 VPN delivered 6× faster via incremental rollout.

Conclusion

SDN is only suited for walled gardens?

Espresso demonstrates that

● traditional peering architecture can evolve to exploit SDN
● SDN’s value is in flexibility and feature velocity

Conclusion

Cloud 1.0: Router-Centric Protocols
● Local view
● Connectivity-based optimization
● Slow evolution
● Costly

Espresso: SDN Peering
● Global view
● Application signals-based optimization
● Rapid deploy-and-iterate
● 75% cheaper