circuit emulation for bulk transfers in distributed storage and clouds


Upload: marat-zhanikeev

Post on 19-Jan-2015


DESCRIPTION

Assuming that the majority of in-cloud networking is Ethernet-based, at least at departure and entry points, it is widely recognized that TCP/UDP communications fail to achieve the necessary throughput during bulk transfers. While modern switches support maximum achievable throughput via the cut-through mode of operation, the practical benefit of this mode diminishes when multiple communicating parties contend for the network. This research addresses the problem by implementing circuits-over-packets emulation. Circuits are simply optimal schedules for communication sessions in which each session gets exclusive access to the network. Transfers of Big Data chunks, storage pieces, VM images, etc. all fall under the category of bulk transfers.

TRANSCRIPT

Page 1: Circuit Emulation for Bulk Transfers in Distributed Storage and Clouds

Setting the Mood

• "It's time to get rid of TCP/UDP protocols in DCs"

• DCs/Clouds are closed worlds, brand new technologies are OK

• with bulk transfers (BigData, ...), the business value of a TCP/UDP alternative is high

• circuits are an alternative to packets

M.Zhanikeev -- [email protected] -- Circuit Emulation for Bulk Transfers in Dist. Storage and Clouds -- http://bit.do/marat140903 2/32...


Ethernet is the Best

Ethernet is the cheapest and most widely available technology with e2e (end-to-end) support

• Fibre Channel (FC), SATA, etc. require expensive hardware, have low compatibility, and lack e2e support

• FCoE runs over Ethernet but has the same problems: expensive hardware, no e2e support

• network virtualization is best fit for Ethernet

• disclaimer: one of the proposed models will work with optical networks as well


Ethernet is the Worst

Ethernet is the worst technology in terms of throughput

• CSMA/CD is the biggest throughput limitation

◦ not in modern switches, but CSMA-style contention (CSMA/CA) is still a major problem in wireless

• contention problem cannot be easily resolved

• same applies to OBS/OPS optical technologies


Ethernet Contention


Ethernet and Contention

• whatever you do, Ethernet L2 domains cannot avoid contention

[Figure: two Switch diagrams, labeled "Qualitatively Identical"]


Parallel vs Sequential (2 flows)

[Figure: x-axis: transfer time in contention (s); y-axis: transfer time by exclusive circuits (s); both axes span 20 to 40 s]
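The plot compares two flows sent in parallel (in contention) with the same flows sent one after another over exclusive circuits. A minimal sketch of the arithmetic, assuming a single link of capacity `capacity` that is shared equally among active flows; the `efficiency` factor is a hypothetical contention penalty, not a value from the slides, and both function names are illustrative.

```python
def sequential_completion(volumes, capacity):
    """Exclusive circuits: each flow gets the whole link, one after another."""
    times, now = [], 0.0
    for v in volumes:
        now += v / capacity
        times.append(now)
    return times


def parallel_completion(volumes, capacity, efficiency=1.0):
    """All flows start together and share the link equally.

    efficiency < 1.0 is a hypothetical contention penalty applied whenever
    more than one flow is active; 1.0 means ideal, loss-free sharing.
    """
    order = sorted(range(len(volumes)), key=lambda i: volumes[i])
    times = [0.0] * len(volumes)
    now, sent, active = 0.0, 0.0, len(volumes)
    for i in order:
        # every active flow has delivered `sent` so far; flow i drains next
        rate = capacity * (efficiency if active > 1 else 1.0)
        now += (volumes[i] - sent) * active / rate
        sent = volumes[i]
        times[i] = now
        active -= 1
    return times


if __name__ == "__main__":
    flows = [10.0, 10.0]                      # two equal bulks, e.g. in Gbit
    print(sequential_completion(flows, 1.0))  # [10.0, 20.0]
    print(parallel_completion(flows, 1.0))    # [20.0, 20.0]
```

Even with ideal sharing, the sequential schedule lowers the mean completion time (15 vs 20) while the last flow finishes at the same moment; any contention penalty (`efficiency < 1.0`) also pushes the parallel finish time past the sequential one.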


Ethernet Switches : Basic Facts

• cut-through versus store-and-forward

◦ cut-through is 10..15x better

• Cisco has an advanced cut-through mode: a trade-off between reading more header bytes and making the forwarding decision sooner

• store-and-forward is subject to QoS classes

◦ L3 DSCP versus L2 CoS; AF, EF, BE, SBE models
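A back-of-the-envelope sketch of where cut-through saves time: a store-and-forward switch must receive the whole frame before forwarding, while a cut-through switch can start once the bytes needed for the forwarding decision have arrived. Assumptions: 1500-byte frames, 10 Gbit/s links, a 14-byte Ethernet header as the minimum decision point; lookup, queuing, and propagation delays are ignored, so this covers only the serialization component and is not a reproduction of the 10..15x figure above. `per_switch_latency_ns` is a hypothetical helper.

```python
def per_switch_latency_ns(frame_bytes, link_gbps, cut_through, header_bytes=14):
    """Serialization delay one switch waits before it can start forwarding.

    store-and-forward: the whole frame must arrive (and be checked) first.
    cut-through: only the first `header_bytes` are needed for the decision;
    reading more bytes (e.g. for L3/L4 fields) trades latency for a richer
    forwarding decision. Lookup, queuing, and propagation are ignored.
    """
    needed = min(header_bytes, frame_bytes) if cut_through else frame_bytes
    return needed * 8 / link_gbps  # bits / (Gbit/s) = nanoseconds


if __name__ == "__main__":
    for label, ct, hdr in [("store-and-forward", False, 14),
                           ("cut-through, 14 B", True, 14),
                           ("cut-through, 64 B", True, 64)]:
        print(label, per_switch_latency_ns(1500, 10, ct, hdr), "ns")
```

For these inputs store-and-forward waits 1200 ns per switch versus about 11 ns (14 bytes) or 51 ns (64 bytes) for cut-through, which also illustrates the "+bytes versus routing decision" trade-off.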


Switches : Modeling

[Figure: switch model with stages C: Cut Through; Check, etc.; Q: Queue; D: Drop; and QoS classes]


Proposal


Proposal : Circuits

Circuits are emulations that give individual parties exclusive access to the L2 domain

• circuits-over-packets emulation

• cut-through mode for each circuit is guaranteed

• highest possible throughput

• NOTE: will work with the cheapest switches

• NOTE2: applies to optical networks as well (L2=lightpaths)
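The defining property of a circuit schedule, exclusive access to the L2 domain, can be stated as a simple check. A sketch, assuming each session is a (start, end) interval on one shared domain; `is_exclusive` is a hypothetical helper, not part of the proposal.

```python
def is_exclusive(sessions):
    """True if no two circuit sessions overlap in time on the shared L2 domain.

    sessions: iterable of (start, end) pairs; back-to-back intervals are allowed.
    """
    ordered = sorted(sessions)
    return all(prev[1] <= cur[0] for prev, cur in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(is_exclusive([(0, 8), (8, 12), (15, 17)]))  # True: back-to-back circuits
    print(is_exclusive([(0, 8), (5, 9)]))             # False: contention
```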


Implementation : 2 cases

• left: book-then-send, right: separate control layer

[Figure, left: Storage Node A and Storage Node B connected through a SWITCH, with a NOC; Step 1: Book session, Step 2: Transfer bulk. Right: Storage Node A and Storage Node B connected through two SWITCHes, one on the Booking segment and one on the Bulk segment]


Impl.: Centralized Case

[Figure: Storage Node A and Storage Node B connected through a SWITCH, with a NOC; Step 1: Book session, Step 2: Transfer bulk]

• same network for booking and circuits

• inefficient but still valid/practical

• legacy-compatible, partial implementation, etc.
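A minimal sketch of the book-then-send step, assuming a single shared L2 domain and a NOC that hands out non-overlapping time slots in the order requests arrive. The `Noc` class, its `book` method, and the `link_gbps` parameter are hypothetical illustrations, not the author's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Noc:
    """Hypothetical central booking service for one shared L2 domain."""
    link_gbps: float
    busy_until: float = 0.0                     # end of last booked circuit (s)
    log: list = field(default_factory=list)     # (src, dst, start, end)

    def book(self, src, dst, volume_gbit, earliest=0.0):
        """Step 1: book. Returns the (start, end) of an exclusive slot."""
        start = max(self.busy_until, earliest)
        end = start + volume_gbit / self.link_gbps
        self.busy_until = end
        self.log.append((src, dst, start, end))
        return start, end


if __name__ == "__main__":
    noc = Noc(link_gbps=10.0)
    print(noc.book("A", "B", 80.0))                 # (0.0, 8.0)
    print(noc.book("C", "B", 40.0))                 # (8.0, 12.0)
    print(noc.book("A", "D", 20.0, earliest=15.0))  # (15.0, 17.0)
```

Step 2 (transfer bulk) then starts at the returned `start` with the whole domain to itself. Because the booking requests cross the same network as the bulk data, this matches the "inefficient but still valid" centralized case.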


Impl.: Distributed Case

[Figure: Storage Node A and Storage Node B connected through two SWITCHes, one on the Booking segment and one on the Bulk segment]

• book on one network, send on another

• legacy-incompatible

• contention sensing possible → fully distributed models

• can also use sensing and contention control


Optimization


Optimization : Basics

• same for distributed and centralized models

◦ the model does not matter; optimization shows the overall utility of a heuristic

• practical optimization = formulation + heuristic

• given: demand matrix

• expected result: a routing table mapping demand to topology
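A minimal sketch of the input and output named on this slide: a demand matrix (volume per source/destination pair) goes in, and a routing table mapping each demand to a path in the topology comes out. The "heuristic" here is plain fewest-hops breadth-first search, chosen only to illustrate the shape of the problem; the topology and function names are hypothetical.

```python
from collections import deque


def bfs_path(adj, s, d):
    """Fewest-hop path from s to d over an adjacency dict, or None."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == d:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def routing_table(adj, demand):
    """Map each (s, d) pair with non-zero volume to a path in the topology."""
    return {(s, d): bfs_path(adj, s, d) for (s, d), v in demand.items() if v > 0}


if __name__ == "__main__":
    adj = {"s": ["a", "b"], "a": ["s", "d"], "b": ["s", "c"],
           "c": ["b", "d"], "d": ["a", "c"]}
    demand = {("s", "d"): 40, ("b", "d"): 10, ("a", "c"): 0}
    print(routing_table(adj, demand))
```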


Optim. : OSPF → tuple notation

• OSPF is traditional in such optimizations, but too rigid for many practical cases

◦ too complex for lightpaths in optical networks

◦ no good heuristics for complex topologies

• OSPF notation is not very convenient:
1. capacity constraints
2. flow preservation
3. contention/congestion metrics

• alternative: tuples; for example, ⟨s, d, v, t⟩ defines a demand of traffic volume v at time t from source s to destination d

◦ this notation is much more flexible for the formulations that follow


Optim. : Basic Tuple Notation

• nodes: source s, destination d, and others a, b, c

• individual demand tuple T_i = ⟨s, d, v, t⟩

• lightpath λ for optical networks

• time t can be a start time, the start and end of a period, etc.

• we do not care about utility so far, just the notation, but utility is obvious in most cases

• → means "results in" or "leads to"
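The tuple notation maps naturally onto a record type. A sketch, assuming volumes in gigabits and times in seconds (the slides give no units); the class name `Demand` and the `from_matrix` helper are hypothetical.

```python
from typing import NamedTuple


class Demand(NamedTuple):
    """T_i = ⟨s, d, v, t⟩: traffic volume v from source s to destination d at time t."""
    s: str
    d: str
    v: float   # volume (unit assumed, e.g. Gbit)
    t: float   # start time (assumed seconds); could also encode a period


def from_matrix(matrix, t=0.0):
    """Flatten a demand matrix {s: {d: v}} into demand tuples, skipping zeros."""
    return [Demand(s, d, v, t)
            for s, row in matrix.items()
            for d, v in row.items() if v > 0]


if __name__ == "__main__":
    for demand in from_matrix({"s": {"d": 40, "c": 0}, "a": {"d": 10}}):
        print(demand)
```

Because each T_i is just a tuple, later formulations can extend it (for example with a start window) without changing how the demand matrix is produced.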


tOSPF : Traditional OSPF

T_i = ⟨s, d, v, t⟩ → ⟨s, a, b, ..., d⟩

Externals: Using the demand matrix, creates a set of per-link weights that define a unique route for each demand item.

Internals: Per-link capacity constraint and in/out flow conservation constraint; unstable for large topologies and demand matrices.

• s: source

• d: destination

• a, b, c, ...: intermediate nodes on e2e paths/routes
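A sketch of the tOSPF externals and one of its internals, under simplifying assumptions: the per-link weights are given rather than optimized (the hard part, not shown), Dijkstra turns them into a route ⟨s, a, b, ..., d⟩ per demand, and the per-link capacity constraint is checked by summing routed volumes. Routing each demand along a single path conserves flow at intermediate nodes by construction. All names and the example topology are hypothetical.

```python
import heapq
from collections import defaultdict


def route(weights, s, d):
    """Least-weight path [s, a, b, ..., d] over directed links {(u, v): w}.

    Cost ties are broken by comparing paths, so each demand gets one
    deterministic route (the "unique route" the weights should define).
    """
    adj = defaultdict(list)
    for (u, v), w in weights.items():
        adj[u].append((v, w))
    heap, done = [(0, [s])], set()
    while heap:
        cost, path = heapq.heappop(heap)
        u = path[-1]
        if u == d:
            return path
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u]:
            if v not in done:
                heapq.heappush(heap, (cost + w, path + [v]))
    raise ValueError(f"no route from {s} to {d}")


def link_loads(weights, demands):
    """Total volume per link when each ⟨s, d, v, t⟩ follows its route."""
    loads = defaultdict(float)
    for s, d, v, _t in demands:
        path = route(weights, s, d)
        for u, x in zip(path, path[1:]):
            loads[(u, x)] += v
    return dict(loads)


def within_capacity(loads, capacity):
    """Per-link capacity constraint: load <= capacity on every used link."""
    return all(load <= capacity[link] for link, load in loads.items())


if __name__ == "__main__":
    demands = [("s", "d", 40, 0), ("a", "d", 10, 0)]
    cap = {link: 40 for link in [("s", "a"), ("a", "d"), ("s", "b"), ("b", "d")]}
    w1 = {("s", "a"): 1, ("a", "d"): 1, ("s", "b"): 1, ("b", "d"): 3}
    w2 = {**w1, ("a", "d"): 5}   # raise one weight to steer s->d away from a->d
    for w in (w1, w2):
        loads = link_loads(w, demands)
        print(loads, within_capacity(loads, cap))
```

With `w1` both demands share link a→d and overload it; raising that one weight in `w2` moves the s→d demand onto s→b→d and satisfies the constraint. Searching for such weights is what makes the formulation unstable on large topologies.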


oOSPF : Optical OSPF w/out Switching

T_i = ⟨s, d, v, t⟩ → ⟨s, λ⟩

Externals: Using the demand matrix, maps each demand item onto an isolated lightpath.

Internals: Simple but inefficient, because the number of e2e lightpaths is small.

• s: source

• d: destination

• λ: a wavelength for a fixed e2e lightpath from s to destination d
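In oOSPF each demand only needs a free wavelength from s that is pre-provisioned as an e2e lightpath to d. A sketch using first-fit assignment (a common wavelength-assignment heuristic, swapped in here for illustration), assuming the lightpath table is given and all demands are concurrent, so one wavelength serves at most one demand. Demands that find no free λ are returned as blocked, which shows why the small number of e2e lightpaths limits this model. Names are hypothetical.

```python
def assign_lightpaths(demands, lightpaths):
    """Map each demand ⟨s, d, v, t⟩ to ⟨s, λ⟩ on a fixed e2e lightpath.

    lightpaths: {(s, d): [λ, ...]}, the pre-provisioned wavelengths from s
    that end at d (assumed given). All demands are treated as concurrent,
    so a wavelength serves at most one demand. First-fit picks the first free λ.
    """
    in_use = set()                     # (s, λ) pairs already taken
    assigned, blocked = [], []
    for s, d, v, t in demands:
        free = next((lam for lam in lightpaths.get((s, d), [])
                     if (s, lam) not in in_use), None)
        if free is None:
            blocked.append((s, d, v, t))   # no isolated lightpath left
        else:
            in_use.add((s, free))
            assigned.append((s, free))
    return assigned, blocked


if __name__ == "__main__":
    lightpaths = {("s", "d"): ["λ1"], ("s", "c"): ["λ2"]}
    demands = [("s", "d", 40, 0), ("s", "d", 10, 0), ("s", "c", 5, 0)]
    print(assign_lightpaths(demands, lightpaths))
```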


oOSPFs : Optical OSPF with Switching

T_i = ⟨s, d, v, t⟩ → ⟨s, λ_s, λ_a, λ_b, ...⟩

Externals: Using the demand matrix, maps each demand item onto a route of wavelengths.

Internals: Efficient, but suffers from the same problems as traditional OSPF.

• s: source

• d: destination

• λ_x: an exit wavelength at a given node x
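With switching, a demand becomes a sequence of per-node exit wavelengths ⟨s, λ_s, λ_a, λ_b, ...⟩. A sketch, assuming the route ⟨s, a, b, ..., d⟩ is already known (for example from a shortest-path step) and that every node can convert wavelengths, so each hop independently takes its first free wavelength. The function name, `num_wavelengths`, and the `in_use` layout are hypothetical.

```python
def assign_wavelength_route(path, in_use, num_wavelengths):
    """Pick a free exit wavelength at each node along path [s, a, ..., d].

    Assumes every node can convert wavelengths, so each hop is independent.
    in_use: set of (node, next_node, λ) already taken; updated only on success.
    Returns [λ_s, λ_a, ...] (one per hop) or None if some hop is full.
    """
    hops = list(zip(path, path[1:]))
    chosen = []
    for u, v in hops:
        lam = next((w for w in range(num_wavelengths)
                    if (u, v, w) not in in_use), None)
        if lam is None:
            return None                     # blocked; nothing is reserved
        chosen.append(lam)
    in_use.update((u, v, lam) for (u, v), lam in zip(hops, chosen))
    return chosen


if __name__ == "__main__":
    taken = {("s", "a", 0)}                  # λ0 on link s->a already used
    print(assign_wavelength_route(["s", "a", "d"], taken, 2))  # [1, 0]
    print(assign_wavelength_route(["s", "a", "d"], taken, 2))  # None: s->a full
```

Per-hop choice is what makes this model efficient, but the route itself still has to come from an OSPF-like step, which is where the shared problems with traditional OSPF come from.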


Proposal : Sensing Formulation

T_i = ⟨s, d, v, t_1, t_2⟩ → ⟨s, λ, t⟩

Externals: Using a matrix of loosely scheduled demand, creates a schedule of sequential sessions with exclusive access to paths.

Internals: Same approach for Ethernet (one wavelength) and optical networks.

• s: source

• d: destination

• t_1 and t_2: the user-preferred range for the start of a session; a value t is picked between them
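A greedy sketch of the sensing formulation's externals: each loosely scheduled demand ⟨s, d, v, t_1, t_2⟩ gets the earliest start t in [t_1, t_2] at which the path is free, then holds it exclusively for v / rate. Assumptions: a single shared path (one wavelength, the Ethernet case, so λ is always 0), demands processed in order of t_1, and demands that cannot start by t_2 rejected. This is one possible heuristic, not the author's.

```python
def schedule_sessions(demands, rate):
    """Greedy sketch of ⟨s, d, v, t1, t2⟩ → ⟨s, λ, t⟩ on one shared path.

    Demands are taken in order of t1; each starts at the earliest t in
    [t1, t2] when the path is free and then holds it for v / rate.
    λ is always 0 here (a single wavelength, i.e. the Ethernet case).
    Demands that cannot start by t2 are rejected.
    """
    busy_until = 0.0
    sessions, rejected = [], []
    for s, d, v, t1, t2 in sorted(demands, key=lambda x: x[3]):
        t = max(t1, busy_until)
        if t > t2:
            rejected.append((s, d, v, t1, t2))
            continue
        busy_until = t + v / rate
        sessions.append((s, 0, t))
    return sessions, rejected


if __name__ == "__main__":
    demands = [("a", "d", 50, 0, 10), ("b", "d", 30, 2, 6), ("c", "d", 20, 1, 20)]
    print(schedule_sessions(demands, rate=10))
```

Here "a" runs from 0 to 5, "c" is pushed to 5 (inside its window), and "b" is rejected because the path is busy until 7, after its latest preferred start of 6.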


Heuristics


Centralized Case

[Figure: Storage Node A and Storage Node B connected through a SWITCH, with a NOC; Step 1: Book session, Step 2: Transfer bulk]

• all optimization formulations except sensing

• very close to traditional OSPF

◦ same problems as in OSPF

• the biggest problem is knowing the demand matrix in advance


Distributed Case

[Figure: Storage Node A and Storage Node B connected through two SWITCHes, one on the Booking segment and one on the Bulk segment]

• can be used for all formulations

• perfectly suited for the Sensing formulation


The Sensing Model

• contention methods from wireless and OBS will work

◦ in practice, sensing can be SNMP-like feedback on a gate's status

◦ no synchronization among users is necessary

• same model for Ethernet (+virtual nets) and optical networks

• main advantage: the offload; no need to implement complicated OSPF heuristics
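A toy simulation of the sensing idea, assuming a gate whose busy/idle status senders can read (the slide suggests SNMP-like feedback) and senders that, with no synchronization among them, re-sense after a random wait whenever the gate is busy. The `simulate` function, integer time ticks, and the `max_wait` backoff bound are hypothetical stand-ins; no real SNMP calls are made, and senders within one tick sense in index order, so simultaneous claims (collisions) are not modeled.

```python
import random


def simulate(durations, max_wait=3, seed=1):
    """Each sender senses the gate; if idle, it claims the gate for its whole
    transfer (an exclusive circuit), otherwise it re-senses after a random
    wait. Returns {sender: (start, end)} in integer ticks."""
    rng = random.Random(seed)
    gate_free_at = 0                                   # the status a sender can sense
    next_check = {i: 0 for i in range(len(durations))}
    result, now = {}, 0
    while next_check:
        for i in sorted(next_check):
            if next_check[i] > now:
                continue
            if gate_free_at <= now:                    # sensed idle: take the gate
                result[i] = (now, now + durations[i])
                gate_free_at = now + durations[i]
                del next_check[i]
            else:                                      # sensed busy: random backoff
                next_check[i] = now + rng.randint(1, max_wait)
        now += 1
    return result


if __name__ == "__main__":
    for sender, (start, end) in sorted(simulate([4, 2, 3]).items()):
        print(sender, start, end)
```

No sender ever learns about the others; exclusive access emerges from sensing the gate alone, which is the offload the slide refers to.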


Realistic Gate/Sensing Model

• an approximate view of the JGN topology

• two-way = one-way + ring

• Gates are created at the optical/Ethernet border

• NOTE: already working for Ethernet


Wrapup

• circuit emulation is necessary for effective bulk transfers

◦ up to 40% faster in our lab tests

• intra-DC, DC-DC, federations, etc.: all can benefit from circuits

• circuits formulated as OSPF are bad; a Gate/Sensing model is better

• validity: the worst case is the existing technology, but the upper performance bound is very high


That’s all, thank you ...


Wait-n-Send Model

[Figure: goodput vs. bulk size per transmission; response curve(s) for 2 potential distributions in practice]


Utility of Waiting (curve)

• I called it the Wait-n-See Curve

• the source waits for some time for exclusive access, sensing and accumulating bulk

• on timeout, the current bulk is released at best effort (fallback)
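The wait-then-fallback rule can be sketched as a small decision function. Assumptions: time advances in ticks, the source accumulates bulk at a fixed `arrival_per_tick` while waiting, the gate's status per tick is supplied as a list, and `timeout` is a hypothetical tuning parameter. The utility curve itself is not modeled; sweeping `timeout` against measured goodput would be one way to trace it.

```python
def wait_n_see(gate_idle, arrival_per_tick, timeout):
    """Wait up to `timeout` ticks for exclusive access, accumulating bulk.

    gate_idle[k] is True if sensing at tick k finds the gate free.
    Returns (mode, tick, bulk): "circuit" if the gate opened in time,
    otherwise "best_effort" with everything accumulated by the timeout.
    """
    bulk = 0
    for k in range(timeout):
        bulk += arrival_per_tick              # accumulate while waiting
        if k < len(gate_idle) and gate_idle[k]:
            return "circuit", k, bulk         # exclusive access: send as a circuit
    return "best_effort", timeout, bulk       # fallback on timeout


if __name__ == "__main__":
    print(wait_n_see([False, False, True], arrival_per_tick=5, timeout=10))
    print(wait_n_see([False] * 10, arrival_per_tick=5, timeout=4))
```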
