design and performance analysis of a dram-based statistics counter array architecture

17
Authors: Haiquan (Chuck) Zhao, Hao Wang, Bill Lin, Jun (Jim) Xu Conf. : The 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '09) Presenter : JHAO-YAN JIAN Date : 2010/12/15 Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture 1

Upload: heidi-dixon

Post on 01-Jan-2016

29 views

Category:

Documents


2 download

DESCRIPTION

Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture. Authors: Haiquan (Chuck) Zhao, Hao Wang, Bill Lin, Jun (Jim) Xu Conf. : The 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '09) Presenter : JHAO-YAN JIAN - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Authors: Haiquan (Chuck) Zhao, Hao Wang, Bill Lin, Jun (Jim) XuConf. : The 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '09)Presenter : JHAO-YAN JIANDate : 2010/12/15

Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

1

Page 2: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Outline

2

IntroductionSchemeAnalysisEvaluation

Page 3: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Network MeasurementTraffic statistics are widely used in

router managementtraffic engineering network anomaly detection � etc.

Fine-grained network measurementPossibly tens of millions of flows (and

counters)Wirespeed statistics counting

8 ns update time at 40 Gb/s(OC-768)

3

Page 4: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Naive ImplementationsSRAM is fast, but too expensive

e.g., 10 million counters × 64-bits = 80 MB, prohibitively expensive (infeasible for on-chip)

DRAM is cheap, but naïve implementation too slowe.g., 50 ns DRAM random access times

typically quoted, 2 × 50 ns = 100 ns > 8 ns required for wirespeed updates (read, increment, then write)

4

Page 5: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Hybrid SRAM/DRAM architectures(1/2)Based on premise that DRAM is too slow, hybrid

SRAM/DRAM architectures have been proposede.g., Shah’02[25], Ramabhadran’03[21], Roeder’04[23],

Zhao’06[31]

All based on following idea:Store full counters in DRAM (64-bits)Keep say a 5-bit SRAM counter, one per flow Wirespeed increments on 5-bit SRAM counters “Flush" SRAM counters to DRAM before they

“overflow"Once “flushed", SRAM counter won’t overflow again

for at least say another 2^5 = 32 (or 2^b in general) cycles

5

Page 6: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Hybrid SRAM/DRAM architectures(2/2)

10 to 57 MB needed far exceed available on-chip SRAM

On-chip SRAM needed for other network processing

SRAM amount depends on “how often" SRAM counters have to be flushed - if arbitrary increments are allowed (e.g. byte counting), more SRAM needed

Integer specific, no decrements6

Page 7: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Basic Architecture: Randomized Scheme Proposed by Lin & Xu in Hotmetrics08. Counters randomly distributed across B memory banks B

> 1= 1/µ, where µ is the SRAM-to-DRAM access latency ratio.

7

Page 8: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Adversarial Access PatternsEven though random address permutation is

applied, the memory loads to the DRAM banks may not be balanced.Counter index permutation in our scheme makes it

difficult for an adversary to purposely trigger a large number of consecutive counter updates to the same memory bank with updates to distinct counters since the pseudorandom permutation function (or the key it uses) is not known to the outside world.

An adversary can only try to trigger consecutive counter updates to the same counter, which would result in consecutive accesses to the same memory bank.

8

Page 9: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Extended Architecture to Handle Adversaries

Fixed pipelined delay module absorbs repeated updates to the same memory location.

Implemented as a fully associative cache with FIFO replacement policy

111 2

9

Page 10: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Worst Case Request Pattern

10

q + r requests for distinct counters a1, ..., aq+r

q requests repeat T times each r requests repeat T − 1 times each

Page 11: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

A Few Definitions(1/2)

11

Eg. (1, 1, 1) ≤ M (2, 1, 0) ≤M (3, 0, 0).

Page 12: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

A Few Definitions(2/2)

12

Page 13: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

A Useful Theorem

13

The following theorem relates majorization, exchangeable random variables and convex order together.

Page 14: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Chernoff Bound(1/2)

14

Want to bound the probability that a request queue will overflow in n cycles

Xs,t is the number of updates to the bank during cycles [s, t],

τ = t − s, K is length of request queue. For total overflow probability bound multiply by B.

Page 15: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Chernoff Bound(2/2)

15

mi : 1 ≤ i ≤ n be the count of the number of appearances of the ith address

m ∗ : worst case counter update sequencesXi, 1 ≤ i ≤ n be the indicator random variable

for whether the ith address is mapped to the DRAM bank

Page 16: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Overflow Probability Overflow probability for 16 million counters, µ = 1/16, B

= 32.

16

Page 17: Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture

Memory Usage Comparison

17