ESnet Activities & Plans Michael O’Connor [email protected] Network Engineering Energy Sciences Network Lawrence Berkeley National Lab LHCOPN-LHCONE Meeting Taipei Taiwan March 13, 2016


Page 1: ESnet Activities & Plans

ESnet Activities & Plans

Michael O’Connor [email protected]

Network Engineering

Energy Sciences Network

Lawrence Berkeley National Lab

LHCOPN-LHCONE Meeting

Taipei Taiwan

March 13, 2016

Page 2: ESnet Activities & Plans

ESnet vision:

Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources, or data.

Page 3: ESnet Activities & Plans

DOE’s Energy Sciences Network (ESnet):

Page 4: ESnet Activities & Plans

80% of ESnet traffic originates or terminates outside the Department of Energy complex.

NRENs share fate. No research network can succeed in isolation.

Page 5: ESnet Activities & Plans

The most important thing to know about NRENs: they are not ISPs.

Page 6: ESnet Activities & Plans

They are instruments for discovery designed to overcome the constraints of geography.

They offer unique capabilities, optimized for data acquisition, data placement, data sharing, and data mobility.

Page 7: ESnet Activities & Plans

What does success look like?

[Figure: Total Bytes Transferred per Month in ESnet, Jan 1990 through Jan 2016, reaching 39.35 PB in Jan 2016]

Page 8: ESnet Activities & Plans

Planning for growth

[Figure: Total Bytes Transferred per Month in ESnet on a log scale, Jan 1990 through Sep 2016: 39 PB in Jan 2016, with 125 PB predicted for Jan 2017]

Page 9: ESnet Activities & Plans

Design Pattern #1: Protect your Elephant Flows


Page 10: ESnet Activities & Plans


[Figure: sources of Large Science Data - the Large Hadron Collider (LHC), the International Thermonuclear Experimental Reactor (ITER), and genomics - shown as distinct from general Internet traffic]

Page 11: ESnet Activities & Plans

Elephant flows require almost lossless networks.

[Figure: TCP throughput vs. path length (local/LAN, metro area, regional, continental, international) for measured TCP Reno, measured HTCP, theoretical TCP Reno, and a loss-free path]

See Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, and Jason Zurawski. The Science DMZ: A Network Design Pattern for Data-Intensive Science. In Proceedings of the IEEE/ACM Annual SuperComputing Conference (SC13), Denver, CO, 2013.
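The loss sensitivity behind this chart is captured by the Mathis et al. model for loss-based TCP, rate ≈ (MSS/RTT)·sqrt(3/2)/sqrt(p): at a fixed loss rate p, achievable throughput falls in proportion to RTT, which is why a loss rate that is harmless on a LAN cripples a continental elephant flow. A small sketch (the parameter values are illustrative, not measurements from the figure):

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis model: upper bound on loss-based TCP (Reno) throughput."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)

MSS = 1460     # bytes, typical for a 1500-byte MTU path
LOSS = 1e-5    # one loss per 100,000 packets

lan_bps = mathis_throughput_bps(MSS, 0.001, LOSS)  # 1 ms RTT: ~4.5 Gb/s
wan_bps = mathis_throughput_bps(MSS, 0.100, LOSS)  # 100 ms RTT: ~45 Mb/s
# Same loss rate, 100x the RTT -> 1/100th the throughput.
```

The ratio is what the slide's title means by "almost lossless": the longer the path, the closer to zero the tolerable loss rate must be.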

Page 12: ESnet Activities & Plans

Design Pattern #2: Unclog your data taps


Page 13: ESnet Activities & Plans

Problem and solution, illustrated

Big-data assets are often not optimized for high-bandwidth access because of convoluted campus network and security design.

The Science DMZ is a deliberate, well-designed architecture that simplifies the campus edge and effectively on-ramps data-intensive science to a capable WAN.

Inder Monga ([email protected])

Page 14: ESnet Activities & Plans

Data Set Mobility Timeframes - Theoretical

This table is available at:

http://fasterdata.es.net/fasterdata-home/requirements-and-expectations/

© 2015, The Regents of the University of California, through Lawrence Berkeley National Laboratory; licensed under CC BY-NC-ND 4.0. Inder Monga ([email protected])
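The timeframes in that table come from simple arithmetic: dataset size in bits divided by the usable path rate. A minimal sketch (the 1 PB over 10 Gb/s example is illustrative, not a row from the table):

```python
def transfer_time_seconds(data_bytes, link_bps, efficiency=1.0):
    """Ideal time to move a dataset over a path; efficiency < 1
    models protocol overhead and an imperfectly filled pipe."""
    return data_bytes * 8 / (link_bps * efficiency)

# 1 PB over a fully utilized 10 Gb/s path:
t = transfer_time_seconds(1e15, 10e9)   # 800,000 s
days = t / 86400                        # a little over 9 days
```

Running the same arithmetic at realistic efficiencies is what turns "we have a 10G link" into an honest expectation for users.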

Page 15: ESnet Activities & Plans

The Science DMZ Design Pattern

Dedicated Systems for Data Transfer
• Data Transfer Node: high performance, configured specifically for data transfer, with the proper tools

Network Architecture
• Science DMZ: a dedicated network location for high-speed data resources, with appropriate security; easy to deploy - no need to redesign the whole network

Performance Testing & Measurement
• perfSONAR: enables fault isolation and verifies correct operation; widely deployed in ESnet and other networks, as well as sites and facilities

ESnet Science Engagement ([email protected]) © 2015, Energy Sciences Network

Page 16: ESnet Activities & Plans

Science DMZ Design Pattern (Abstract)

[Diagram: abstract Science DMZ - a border router connects the site to the WAN over a clean, high-bandwidth path; a Science DMZ switch/router hosts a high-performance Data Transfer Node with high-speed storage behind per-service security policy control points; the site/campus LAN sits behind the enterprise border router/firewall and retains access to Science DMZ resources; perfSONAR instances sit at the WAN edge, in the Science DMZ, and on the site LAN; links are 10GE]

Page 17: ESnet Activities & Plans

Local And Wide Area Data Flows

[Diagram: the same Science DMZ topology, highlighting that wide area data flows traverse the high-latency WAN path while local flows take the low-latency LAN path, with perfSONAR measurement points on both]

Page 18: ESnet Activities & Plans

Dedicated Systems – Data Transfer Node

• Set up specifically for high-performance data movement

– System internals (BIOS, firmware, interrupts, etc.)

– Network stack

– Storage (global filesystem, Fibrechannel, local RAID, etc.)

– High performance tools

– No extraneous software

• Limitation of scope and function is powerful

– No conflicts with configuration for other tasks

– Small application set makes cybersecurity easier
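Network-stack tuning on a DTN centers on the bandwidth-delay product: the socket buffers must hold a full round trip's worth of data or TCP cannot keep the path full. A sketch of the arithmetic (the 10 Gb/s and 100 ms values are illustrative):

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: the amount of data in flight needed
    to keep a path full, hence the minimum useful TCP buffer size."""
    return bandwidth_bps * rtt_s / 8

# A 10 Gb/s path at 100 ms RTT needs on the order of 125 MB of
# socket buffer -- far above common operating system defaults.
buf = bdp_bytes(10e9, 0.100)
```

This is why the DTN's network stack is tuned by hand rather than left at distribution defaults.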


Page 19: ESnet Activities & Plans

Example of perfSONAR monitoring

Improving things when you don’t know what you are doing is a random walk. Sharing measurements with, and educating, the local community is important.

Page 20: ESnet Activities & Plans

A global consensus is emerging around the Science DMZ architecture.

More than 120 universities in the US have deployed this ESnet architecture.

NSF has invested more than $60M to accelerate adoption.

Australian and Canadian universities are following suit.

http://fasterdata.es.net/science-dmz/

Page 21: ESnet Activities & Plans

ESnet 6 Planning Process

In 2018, the 100G ESnet 5 architecture will exceed 6 years in production.

ESnet has begun the planning process for ESnet 6

Architectural Choices

• SDN

– Hybrid SDN/Traditional Routing

– Full Control and Data Plane separation

• Packet Optical Convergence

– Packet services in Optical Transport Networks

– OTN Transceivers in classical routers

• Business as Usual

– Disruptive price/performance advances?


ESnet 6 investigation should align with Next Gen service development

Page 22: ESnet Activities & Plans

ESnet6 Architecture Investigation

[Diagram: the ESnet6 architecture investigation proceeds from network requirements to an RFP]

Page 23: ESnet Activities & Plans

2015 ESnet AWS Pilot

• 100G R&E exchange; 100G to PNWG (Seattle)
• AWS Direct Connect ESnet pilot at 2x10G
• Increased ESnet connectivity to AWS US West regions

Page 24: ESnet Activities & Plans

The AWS Cloud: Changing the Rules of the Network Game

The Amazon Web Services cloud bends a number of longstanding Internet architecture rules.

• AWS ASN 16509 is not contiguous, as Autonomous Systems are intended to be. This “cloud” is a set of regional data centers.

• BGP with AWS establishes connectivity with only the geographically close region(s), NOT the entire AS. Establishing a BGP session with AWS ASN 16509 in one region does not establish connectivity to other regions or continents.

• Since AWS does not advertise remote-region routes, traffic to those remote regions will NOT use your established BGP session with AWS, taking the general public Internet instead.

• Similarly, the reverse path will ignore AS-path preferences and favor the commodity public Internet over the direct peering with AWS.

This is a challenge for globally collaborative computing, but it is not necessarily a bad thing.
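The fallback behavior above can be pictured as a longest-prefix-match lookup against a routing table that carries only the local region's AWS prefix plus a default route. The table below is hypothetical (the us-west-1 prefix is one of those listed in the routing exercise later in this deck):

```python
import ipaddress

# Hypothetical RIB for a site that peers with AWS in us-west-1 only:
# the session advertises just that region's prefix, so traffic to any
# other AWS region falls through to the default (commodity transit).
rib = {
    ipaddress.ip_network("50.18.0.0/16"): "AWS peering (us-west-1)",
    ipaddress.ip_network("0.0.0.0/0"): "commodity Internet transit",
}

def next_hop(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    candidates = [net for net in rib if addr in net]
    best = max(candidates, key=lambda net: net.prefixlen)  # longest match wins
    return rib[best]

# next_hop("50.18.1.1") takes the peering; next_hop("52.28.0.5"),
# an eu-central-1 address, takes commodity transit instead.
```

Nothing in BGP is being "broken" here: the remote regions simply are not advertised, so the default route is the only match.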

Page 25: ESnet Activities & Plans

Amazon is not in the “Networking” business

• Amazon Web Services (AWS) offers an extensive portfolio of computing resources, but they are not in the “networking” business.

• AWS recharges customers for egress traffic out of the cloud. Some researchers have described these fees as “holding their data hostage”.

• AWS must pay for upstream transit to the Internet, and so they pass this cost on to their customers.

• Their inter-region long haul circuits are used for internal control and management rather than customer transport.

• AWS offers a service that will migrate data between regions, but this is strictly controlled and scheduled by AWS, not customers.

• Since AWS peers for free with R&E networks, they offer customers of these networks an “Egress Traffic Fee Waiver”.

Research & Education Networks have an opportunity to scale “Cloud” beyond the local region for their customers.

Page 26: ESnet Activities & Plans

Routing Exercise: How do European NRENs connect to AWS?

Prefixes from several AWS regions

• us-east-1 50.19.0.0/16

• us-west-1 50.18.0.0/16

• us-west-2 50.112.0.0/16

• eu-central-1 52.28.0.0/16

• eu-west-1 52.18.0.0/15

European NREN Looking Glasses

• GEANT - https://tools.geant.net/portal/links/lg/

• DFN Germany – https://www.noc.dfn.de/lg/

• GARR Italy - https://gins.garr.it/LG/

• RENATER France - https://portail.noc.renater.fr/lookingglass/v2/lg_index.pl


Try looking up these BGP routes using a couple of these European NREN looking-glass services:

• Check the AS path to the US regions

• Check the AS path to the European regions

What is interesting about these paths?
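When reading the looking-glass output, it helps to know which region a given address falls in. A quick lookup over the prefixes listed above (an illustrative, dated subset; AWS publishes the authoritative list):

```python
import ipaddress

# The region prefixes listed on this slide (small subset, circa 2016).
REGIONS = {
    "us-east-1": "50.19.0.0/16",
    "us-west-1": "50.18.0.0/16",
    "us-west-2": "50.112.0.0/16",
    "eu-central-1": "52.28.0.0/16",
    "eu-west-1": "52.18.0.0/15",
}

def region_of(ip: str):
    """Return the AWS region whose prefix contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for region, prefix in REGIONS.items():
        if addr in ipaddress.ip_network(prefix):
            return region
    return None
```

Comparing the AS path for a US-region address against a European-region one, as the exercise suggests, shows how region-local AWS peering really is.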

Page 27: ESnet Activities & Plans

The “On-Ramp” to R&E Networking: Virtual Private Cloud

The public cloud is being used privately; the virtual private cloud, however, enhances data sharing over R&E networks.

• Currently, cloud service usage within the ESnet customer base follows a simple “single consumer model”: each site purchases services that it consumes directly.

• The site’s connection is a standard routed path through its upstream provider, ESnet: no virtual circuits, no additional routing protocol setup. Simple.

• As global collaborations shift computing resources to the “cloud”, this simple single-consumer model will give way to one that better supports data sharing: the “Collaborative Cloud”.

• VPC enables a customer to control the routing of their data by providing a means of mapping customer IP allocations into the cloud.

• The principal assumption: when geographically dispersed collaborations begin to move large datasets directly out of the cloud, VPC services will be essential to control routing between the collaborating institutions.

Global collaborations need to control transport of large data sets

Page 28: ESnet Activities & Plans

Proposal: Collaborative Cloud Services (CCS)

SDN site router - A virtual router that is provisioned to BGP peer with the cloud provider using the customer ASN.

• Multiple virtual routers per hardware device.

• IPsec will connect the SDN Site router to the cloud in order to support VPC.

• Collaborator transit paths take the best path to the cloud and back.

The potential:

• Become an important application for SDN and a new ESnet service.

• Transform the ESnet site connection architecture by alleviating upgrade pressure on local loop circuits.

Page 29: ESnet Activities & Plans

Proposed: Collaborative Cloud Services (CCS)

A standard VPC implementation controls routing as an “on-ramp” for R&E network transport.

Page 30: ESnet Activities & Plans

Virtual “Site Router” (VRTR) Service at the edge of the cloud

The virtual “site router” improves path efficiency and takes pressure off of the site local loop.

Page 31: ESnet Activities & Plans

CCS Protocol Description

The CCS service would reside on ESnet hardware but serve multiple customer site-router instances in software.

Page 32: ESnet Activities & Plans

In conclusion:

• Global science networks are not ISPs – rather, extensions of science discovery instruments.

• Design patterns, architectures, workflows and challenges from science are now crossing over to other domains.

• Packet Optical convergence, SDN and Cloud all present challenges and opportunities in network design.

• Cloud computing services offer a compelling business case for research institutions along with challenges and opportunities for NRENs.


Page 33: ESnet Activities & Plans

Thank You

Michael O’Connor [email protected]

Network Engineering

Energy Sciences Network

Lawrence Berkeley National Lab

LHCOPN-LHCONE Meeting

Taipei Taiwan

March 13, 2016