
Page 1:

What is the WHT anyway, and why are there so many ways to compute it?

Jeremy Johnson

1, 2, 6, 24, 112, 568, 3032, 16768,…

Page 2:

Walsh-Hadamard Transform

• y = WHT_N x, N = 2^n

• WHT_N = WHT_2 ⊗ WHT_2 ⊗ ··· ⊗ WHT_2 (n factors)

WHT_2 =
[ 1  1 ]
[ 1 -1 ]

WHT_4 = WHT_2 ⊗ WHT_2 =
[ 1  1  1  1 ]
[ 1 -1  1 -1 ]
[ 1  1 -1 -1 ]
[ 1 -1 -1  1 ]
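As a concrete reference point, here is a minimal sketch (not code from the talk) that computes the transform straight from this definition, using the fact that entry (i, j) of WHT_2 ⊗ ··· ⊗ WHT_2 is (-1)^popcount(i AND j); it relies on the gcc/clang builtin __builtin_parity.

/* Naive O(N^2) reference WHT, straight from the definition
 * WHT_N = WHT_2 (x) ... (x) WHT_2: entry (i, j) is (-1)^popcount(i & j). */
#include <stdio.h>

#define N 8                       /* N = 2^n, here n = 3 */

static int sign(unsigned i, unsigned j)
{
    return __builtin_parity(i & j) ? -1 : 1;   /* gcc/clang builtin */
}

static void wht_naive(double y[N], const double x[N])
{
    for (unsigned i = 0; i < N; i++) {
        double s = 0.0;
        for (unsigned j = 0; j < N; j++)
            s += sign(i, j) * x[j];
        y[i] = s;
    }
}

int main(void)
{
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8}, y[N];
    wht_naive(y, x);
    for (int i = 0; i < N; i++) printf("%g ", y[i]);   /* 36 -4 -8 0 -16 0 0 0 */
    printf("\n");
    return 0;
}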

Page 3:

WHT Algorithms

• Factor WHT_N into a product of sparse structured matrices

• Compute: y = (M_1 M_2 ··· M_t) x as

y_t = M_t x
…
y_2 = M_2 y_3
y = M_1 y_2

Page 4:

Factoring the WHT Matrix

• (A ⊗ B)(C ⊗ D) = AC ⊗ BD, hence A ⊗ B = (A ⊗ I)(I ⊗ B)

• I_mn = I_m ⊗ I_n

Example: WHT_4 = WHT_2 ⊗ WHT_2 = (WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2)

I_2 ⊗ WHT_2 =
[ 1  1  0  0 ]
[ 1 -1  0  0 ]
[ 0  0  1  1 ]
[ 0  0  1 -1 ]

WHT_2 ⊗ I_2 =
[ 1  0  1  0 ]
[ 0  1  0  1 ]
[ 1  0 -1  0 ]
[ 0  1  0 -1 ]

(WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2) =
[ 1  1  1  1 ]
[ 1 -1  1 -1 ]
[ 1  1 -1 -1 ]
[ 1 -1 -1  1 ]
= WHT_4
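A quick way to see the factorization in action is to apply the two sparse factors to a vector, rightmost factor first (as on the previous page), and compare against the dense WHT_4. This is a small illustrative check, not code from the talk.

/* Illustrative check: WHT_4 x computed as two butterfly stages,
 * (I_2 (x) WHT_2) first, then (WHT_2 (x) I_2), matches the dense product. */
#include <stdio.h>

static const int WHT4[4][4] = {
    { 1,  1,  1,  1 },
    { 1, -1,  1, -1 },
    { 1,  1, -1, -1 },
    { 1, -1, -1,  1 },
};

int main(void)
{
    double x[4] = {1, 2, 3, 4}, y[4];

    /* dense reference: y = WHT_4 x */
    for (int i = 0; i < 4; i++) {
        y[i] = 0;
        for (int j = 0; j < 4; j++)
            y[i] += WHT4[i][j] * x[j];
    }

    /* stage 1: I_2 (x) WHT_2 -- butterflies on (x0,x1) and (x2,x3) */
    for (int b = 0; b < 4; b += 2) {
        double a = x[b], c = x[b + 1];
        x[b] = a + c;  x[b + 1] = a - c;
    }
    /* stage 2: WHT_2 (x) I_2 -- butterflies on (x0,x2) and (x1,x3) */
    for (int b = 0; b < 2; b++) {
        double a = x[b], c = x[b + 2];
        x[b] = a + c;  x[b + 2] = a - c;
    }

    for (int i = 0; i < 4; i++)       /* both columns: 10 -2 -4 0 */
        printf("%g  %g\n", x[i], y[i]);
    return 0;
}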

Page 5:

Recursive and Iterative Factorization

WHT_8 = WHT_2 ⊗ WHT_4

= (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_4)                              (recursive)

= (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2))

= (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_2 ⊗ I_2 ⊗ WHT_2)

= (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2)           (iterative)

Page 6:

WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2)

[Slide shows the three sparse 8×8 factors and the resulting dense 8×8 matrix WHT_8, with entries ±1.]

Page 7:

WHT Algorithms

• Recursive: WHT_N = (WHT_2 ⊗ I_{N/2})(I_2 ⊗ WHT_{N/2})

• Iterative: WHT_{2^n} = ∏_{i=1}^{n} (I_{2^{i-1}} ⊗ WHT_2 ⊗ I_{2^{n-i}})

• General: WHT_{2^n} = ∏_{i=1}^{t} (I_{2^{n_1+···+n_{i-1}}} ⊗ WHT_{2^{n_i}} ⊗ I_{2^{n_{i+1}+···+n_t}}), where n = n_1 + ··· + n_t

Page 8:

WHT Implementation

• Definition/formula
  – N = N_1 N_2 ··· N_t, N_i = 2^{n_i}
  – x = WHT_N · x (computed in place)
  – x^M_{b,s} = (x(b), x(b+s), …, x(b+(M-1)s))   (subvector of length M, base b, stride s)

  WHT_{2^n} = ∏_{i=1}^{t} (I_{2^{n_1+···+n_{i-1}}} ⊗ WHT_{2^{n_i}} ⊗ I_{2^{n_{i+1}+···+n_t}})

• Implementation (nested loop)

  R = N; S = 1;
  for i = t, …, 1
    R = R / N_i
    for j = 0, …, R-1
      for k = 0, …, S-1
        x^{N_i}_{j·N_i·S + k, S} = WHT_{N_i} · x^{N_i}_{j·N_i·S + k, S}
    S = S · N_i
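Spelled out in C, the triple loop looks roughly as follows. This is a sketch specialized to radix-2 leaves (all N_i = 2), not the WHT package's generated code.

/* Sketch of the triple-loop driver above, with every N_i = 2.
 * The array x is transformed in place.  Illustrative only. */
#include <stdio.h>

/* apply WHT_2 to the 2-point subvector (x[b], x[b+s]) */
static void wht_2(double *x, long b, long s)
{
    double t0 = x[b], t1 = x[b + s];
    x[b]     = t0 + t1;
    x[b + s] = t0 - t1;
}

/* x <- WHT_N x with N = 2^t, using t radix-2 stages */
void wht(double *x, long N)
{
    long R = N, S = 1;
    while (R > 1) {                 /* for i = t, ..., 1 with N_i = 2 */
        R = R / 2;
        for (long j = 0; j < R; j++)
            for (long k = 0; k < S; k++)
                wht_2(x, j * 2 * S + k, S);   /* base j*N_i*S + k, stride S */
        S = S * 2;
    }
}

int main(void)
{
    double x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    wht(x, 8);
    for (int i = 0; i < 8; i++) printf("%g ", x[i]);   /* 36 -4 -8 0 -16 0 0 0 */
    printf("\n");
    return 0;
}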

Page 9:

Partition Trees

[Tree diagrams of example partition trees:]

Right recursive (n = 4): 4 = 1 + 3, 3 = 1 + 2, 2 = 1 + 1
Iterative (n = 4): 4 = 1 + 1 + 1 + 1
General example (n = 9): 9 = 3 + 4 + 2, with 4 = 1 + 2 + 1 and 2 = 1 + 1
Left recursive (n = 4): 4 = 3 + 1, 3 = 2 + 1, 2 = 1 + 1
Balanced (n = 4): 4 = 2 + 2, each 2 = 1 + 1

Page 10:

Ordered Partitions

• There is a 1-1 mapping from ordered partitions of n onto (n-1)-bit binary numbers.

There are 2^(n-1) ordered partitions of n.

Example (n = 9): the ordered partition 1 | 1 1 | 1 1 1 1 | 1 1, i.e. 1 + 2 + 4 + 2 = 9, corresponds to placing a bar between consecutive 1's wherever the bit is 1:

162 = 1 0 1 0 0 0 1 0
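A small sketch of this decoding; the function name and interface are made up for illustration and are not part of the WHT package.

/* Decode an (n-1)-bit number into the corresponding ordered partition
 * of n: a 1 bit places a bar between consecutive 1's. */
#include <stdio.h>

/* write the parts into parts[], return the number of parts t */
static int decode_partition(unsigned bits, int n, int parts[])
{
    int t = 0, part = 1;
    for (int gap = 0; gap < n - 1; gap++) {
        if ((bits >> (n - 2 - gap)) & 1) {   /* bar after this position */
            parts[t++] = part;
            part = 1;
        } else {
            part++;
        }
    }
    parts[t++] = part;
    return t;
}

int main(void)
{
    int parts[16];
    int t = decode_partition(162, 9, parts);   /* 162 = 10100010 */
    for (int i = 0; i < t; i++) printf("%d ", parts[i]);
    printf("\n");   /* expected: 1 2 4 2 */
    return 0;
}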

Page 11:

Enumerating Partition Trees

[Diagram: the six partition trees for n = 3 (matching T_3 = 6), each labelled with the 2-bit encodings (00, 01, 10, 11) of the ordered partitions chosen at its nodes; trees whose parts of size 2 are split further carry the encoding of that sub-partition as well.]

Page 12:

Counting Partition Trees

T_1 = 1,
T_n = 1 + Σ_{n_1+···+n_t = n, t > 1} T_{n_1} ··· T_{n_t}

1, 2, 6, 24, 112, 568, 3032, 16768, …

Let T(z) = Σ_{n ≥ 1} T_n z^n. Then

T(z) = z/(1 − z) + T(z)^2/(1 − T(z)),

so

T(z) = (1 − sqrt(1 − 8z + 8z^2)) / (4(1 − z)),

and T_n = Θ(α^n / n^{3/2}) with α = 4 + 2√2 ≈ 6.8.
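The recurrence is easy to check numerically. The sketch below introduces a helper count U_m (the number of nonempty sequences of trees of total size m, an auxiliary quantity not defined on the slide) so that the sum over ordered partitions becomes a simple convolution.

/* Count WHT partition trees T_n from the recurrence above. */
#include <stdio.h>

#define NMAX 8

int main(void)
{
    long long T[NMAX + 1], U[NMAX + 1];
    U[0] = 1;                          /* empty sequence of trees */
    for (int n = 1; n <= NMAX; n++) {
        T[n] = 1;                      /* the leaf small[n] (for n = 1, the only tree) */
        for (int k = 1; k < n; k++)    /* splits: first part k, rest any tree sequence */
            T[n] += T[k] * U[n - k];
        U[n] = 0;                      /* sequences of trees totalling n */
        for (int k = 1; k <= n; k++)
            U[n] += T[k] * U[n - k];
        printf("T[%d] = %lld\n", n, T[n]);
    }
    return 0;   /* expected: 1 2 6 24 112 568 3032 16768 */
}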

Page 13:

WHT Package
Püschel & Johnson (ICASSP ’00)

• Allows easy implementation of any of the possible WHT algorithms

• Partition tree representation: W(n) = small[n] | split[W(n1), …, W(nt)]

• Tools

– Measure runtime of any algorithm

– Measure hardware events (coupled with PCL)

– Search for good implementation

• Dynamic programming

• Evolutionary algorithm

Page 14:

Histogram (n = 16, 10,000 samples)

• Wide range in performance despite equal number of arithmetic operations (n·2^n flops)

• Pentium III consumes more run time (more pipeline stages)

• UltraSPARC II spans a larger range

Page 15:

Operation Count

Theorem. Let W_N be a WHT algorithm of size N. Then the number of floating point operations (flops) used by W_N is N·lg(N).

Proof. By induction.

Base case: W_2 uses 2 = 2·lg(2) flops. If

W_N = ∏_{i=1}^{t} (I_{2^{n_1+···+n_{i-1}}} ⊗ W_{2^{n_i}} ⊗ I_{2^{n_{i+1}+···+n_t}}),

then the i-th factor applies W_{2^{n_i}} to 2^{n−n_i} subvectors, so

flops(W_N) = Σ_{i=1}^{t} 2^{n−n_i} · flops(W_{2^{n_i}})
           = Σ_{i=1}^{t} 2^{n−n_i} · n_i·2^{n_i}     (by induction)
           = Σ_{i=1}^{t} n_i·2^n = n·2^n = N·lg(N).
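As a sanity check (not part of the slides), one can instrument the radix-2 nested-loop implementation from page 8 and count the additions and subtractions; the count comes out to n·2^n.

/* Count add/sub operations in the radix-2 WHT loop and compare with N*lg(N). */
#include <stdio.h>

static long flops;                 /* global add/sub counter */

static void wht_2(double *x, long b, long s)
{
    double t0 = x[b], t1 = x[b + s];
    x[b] = t0 + t1;  x[b + s] = t0 - t1;
    flops += 2;
}

int main(void)
{
    enum { n = 10, N = 1 << n };
    static double x[N];
    for (long i = 0; i < N; i++) x[i] = (double)i;

    long R = N, S = 1;
    while (R > 1) {
        R /= 2;
        for (long j = 0; j < R; j++)
            for (long k = 0; k < S; k++)
                wht_2(x, j * 2 * S + k, S);
        S *= 2;
    }
    printf("flops = %ld, N*lg(N) = %d\n", flops, n * N);   /* both 10240 */
    return 0;
}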

Page 16:

Instruction Count Model

IC(n) = α·A(n) + Σ_{l=1}^{3} α_l·A_l(n) + Σ_{i=1}^{3} β_i·L_i(n)

A(n) = number of calls to the WHT procedure; α = number of instructions outside the loops
A_l(n) = number of calls to the base case of size l; α_l = number of instructions in the base case of size l

L_i(n) = number of iterations of the outer (i = 1), middle (i = 2), and inner (i = 3) loop; β_i = number of instructions in the outer (i = 1), middle (i = 2), and inner (i = 3) loop body

Page 17:

Small[1]

.file "s_1.c"

.version "01.01"

gcc2_compiled.:

.text

.align 4

.globl apply_small1

.type apply_small1,@function

apply_small1:

movl 8(%esp),%edx //load stride S to EDX

movl 12(%esp),%eax //load x array's base address to EAX

fldl (%eax) // st(0)=R7=x[0]

fldl (%eax,%edx,8) //st(0)=R6=x[S]

fld %st(1) //st(0)=R5=x[0]

fadd %st(1),%st // R5=x[0]+x[S]

fxch %st(2) //st(0)=R5=x[0], st(2)=R7=x[0]+x[S]

fsubp %st,%st(1) //st(0)=R6=x[0]-x[S]

fxch %st(1) //st(0)=R6=x[0]+x[S], st(1)=R7=x[0]-x[S]

fstpl (%eax) //store x[0]=x[0]+x[S]

fstpl (%eax,%edx,8) //store x[S]=x[0]-x[S]

ret
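The C routine behind this listing is not shown on the slide; a plausible reconstruction is sketched below, with the signature guessed from the stack offsets (4(%esp) unused, 8(%esp) the stride, 12(%esp) the data pointer). The actual s_1.c in the package may differ.

/* Sketch of a size-2 base case like the one compiled above: one
 * WHT_2 butterfly on x[0] and x[S].  The first parameter is a
 * placeholder for whatever the package passes in the unused slot. */
typedef struct wht Wht;            /* opaque; assumed type name */

void apply_small1(Wht *unused, long S, double *x)
{
    double t0 = x[0], t1 = x[S];
    x[0] = t0 + t1;                /* x[0] <- x[0] + x[S] */
    x[S] = t0 - t1;                /* x[S] <- x[0] - x[S] */
}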

Page 18:

Recurrences

A(n) = 0,  n a leaf

A(n) = 1 + Σ_{j=1}^{t} 2^{n−n_j} A(n_j),  n = n_1 + ··· + n_t

A_l(n) = 2^{n−l} · (number of leaves of size l)

Page 19:

Recurrences

L_i(n) = 0,  n a leaf

For n = n_1 + ··· + n_t:

L_1(n) = t + Σ_{j=1}^{t} 2^{n−n_j} L_1(n_j)

L_2(n) = Σ_{j=1}^{t} 2^{n_1+···+n_{j-1}} + Σ_{j=1}^{t} 2^{n−n_j} L_2(n_j)

L_3(n) = Σ_{j=1}^{t} 2^{n−n_j} + Σ_{j=1}^{t} 2^{n−n_j} L_3(n_j)

Page 20:

Histogram using Instruction Model (P3)

α_1 = 12, α_2 = 34, and α_3 = 106; α = 27; β_1 = 18, β_2 = 18, and β_3 = 20

Page 21:

Algorithm Comparison

[Four plots, ratios vs. WHT size 2^n:]

• Recursive/Iterative Runtime — ratio r1/i1, n = 1, …, 22
• Rec & Bal/It Instruction Count — ratios rr1/i1, lr1/i1, bal1/i1, n = 1, …, 19
• Rec & It/Best Runtime — ratios r1/b, r3/b, i1/b, i3/b, b/b, n = 1, …, 22
• Small/It Runtime — ratios I_1/rt, r_1/rt, n = 1, …, 8

Page 22:

Dynamic Programming

min_{n_1+···+n_t = n} Cost(split[T_{n_1}, …, T_{n_t}]),

where T_n is the optimal tree of size n.

This depends on the assumption that Cost only depends on the size of a tree and not where it is located (true for IC, but false for runtime).

For IC, the optimal tree is iterative with appropriate leaves. For runtime, DP is a good heuristic (used with binary trees).
For IC, the optimal tree is iterative with appropriate leaves.For runtime, DP is a good heuristic (used with binary trees).