fast 1d convolution algorithms - drone research

76
P. Giannakeris, P. Bassia, N. Kilis, R.T. Chadoulis Prof. Ioannis Pitas Aristotle University of Thessaloniki [email protected] www.aiia.csd.auth.gr Version 2.6 Fast 1D Convolution Algorithms

Upload: others

Post on 29-Apr-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fast 1D Convolution Algorithms - Drone Research

P. Giannakeris, P. Bassia, N. Kilis, R.T. ChadoulisProf. Ioannis Pitas

Aristotle University of Thessaloniki

[email protected]

www.aiia.csd.auth.grVersion 2.6

Fast 1D Convolution

Algorithms

Page 2: Fast 1D Convolution Algorithms - Drone Research

Outline

β€’ Signals and systems

β€’ 1D convolutions

Linear & Cyclic 1D convolutions

Discrete Fourier Transform, Fast Fourier Transform

Winograd algorithm

Nested convolutions

Block convolutions.

β€’ Machine learning

Convolutional neural networks.

Page 3: Fast 1D Convolution Algorithms - Drone Research

Motivationβ€’ Machine learning

Fast implementation of 1D/2D/3D convolutions in

Convolutional Neural Networks (CNNs).

β€’ Fast implementation of 1D digital filters

1D signal filtering (e.g., audio/music, ECG, EEG)

1D Signal feature calculation

β€’ Fast implementation of 1D correlation

1D template matching

Time-of-flight (distance) calculation (e.g., sonar)

Page 4: Fast 1D Convolution Algorithms - Drone Research

Motivationβ€’ Fast implementation of 2D/3D convolutions

Image/video filtering

Image/video feature calculation

β€’ Gabor filters

β€’ Spatiotemporal feature calculation

β€’ Fast implementation of 2D correlation

Template matching

Correlation tracking

Page 5: Fast 1D Convolution Algorithms - Drone Research

1D Linear Systems

β€’ Linearity:

𝑇 π‘Žπ‘₯1 + 𝑏π‘₯2 = π‘Žπ‘‡ π‘₯1 + 𝑏𝑇 π‘₯2 .

β€’ Shift-Invariance:

𝑦(𝑛) = 𝑇[π‘₯(𝑛)] β‡’

𝑦 𝑛 βˆ’π‘š = 𝑇 π‘₯ 𝑛 βˆ’π‘š .

LSI system convolution : 𝑦 π‘˜ = β„Ž π‘˜ βˆ— π‘₯ π‘˜ .

𝑇[π‘₯(𝑛)]π‘₯(𝑛) 𝑦(𝑛)

Page 6: Fast 1D Convolution Algorithms - Drone Research

1D Linear Systems

Page 7: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution

β€’ The one-dimensional (linear) convolution of:

β€’ an input signal π‘₯ of length 𝐿 and

β€’ a convolution kernel β„Ž (filter mask, finite impulse response) of length 𝑀:

𝑦 π‘˜ = β„Ž π‘˜ βˆ— π‘₯ π‘˜ =

𝑖=0

π‘€βˆ’1

β„Ž 𝑖 π‘₯ π‘˜ βˆ’ 𝑖 .

β€’ For a convolution kernel centered around 0 and 𝑀 = 2𝑣 + 1,

convolution takes the form:

𝑦 π‘˜ = β„Ž π‘˜ βˆ— π‘₯ π‘˜ =

𝑖=βˆ’π‘£

𝑣

β„Ž 𝑖 π‘₯ π‘˜ βˆ’ 𝑖 .

Page 8: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution

β€’ The one-dimensional (linear) convolution is a linear operator.

β€’ Signals π‘₯, β„Ž can be interchanged:

𝑦 π‘˜ = β„Ž π‘˜ βˆ— π‘₯ π‘˜ = π‘₯ π‘˜ βˆ— β„Ž π‘˜ .

β€’ It can model any linear shift-invariant system.

β€’ It has a geometrical interpretation:

β€’ Signal is π‘₯ 𝑖 flipped around 𝑖 = 0 producing π‘₯ βˆ’π‘– .

β€’ Flipped signal π‘₯ βˆ’π‘– is shifted by π‘˜ samples, producing π‘₯ π‘˜ βˆ’ 𝑖 .

β€’ Products β„Ž 𝑖 π‘₯ π‘˜ βˆ’ 𝑖 , 𝑖 = 0,… ,𝑀 βˆ’ 1 are calculated and summed.

β€’ This is repeated for other temporal shifts π‘˜.

Page 9: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution – Example

Page 10: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution – Example

Page 11: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution – Example

Page 12: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution – Example

Page 13: Fast 1D Convolution Algorithms - Drone Research

Linear 1D correlation

β€’ Correlation of template signal β„Ž and input signal π‘₯ π‘˜ (inner

product):

π‘Ÿβ„Žπ‘₯ π‘˜ =

𝑖=0

π‘€βˆ’1

β„Ž 𝑖 π‘₯ π‘˜ + 𝑖 = 𝒉𝑇𝒙 π‘˜ .

β€’ 𝒉 = β„Ž 0 ,… , β„Ž 𝑀 βˆ’ 1 𝑇: template vector.

β€’ 𝒙 π‘˜ = [π‘₯ π‘˜ , ..., π‘₯ π‘˜ + Μ βˆ’ 1 ]Ξ€ ∢ local signal vector.

Page 14: Fast 1D Convolution Algorithms - Drone Research

Linear 1D correlation

Differences from convolution:

β€’ π‘₯ π‘˜ is not flipped around 𝑖 = 0 .

β€’ It is often confused with convolution: they are identical only

if β„Ž is centered at and is symmetric about 𝑖 = 0.

β€’ It is used for:

β€’ 1D template matching;

β€’ Time-delay estmation.

Page 15: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution: matrix

notation

β€’ Vectorial convolution input/output, kernel representation:

𝐱 = π‘₯ 0 ,… , π‘₯(𝐿 βˆ’ 1) 𝑇: the first input vector.

𝐑 = β„Ž 0 ,… , β„Ž(𝑀 βˆ’ 1) 𝑇: the second input vector.

𝐲 = 𝑦 0 ,… , 𝑦 𝑁 βˆ’ 1 𝑇: the output vector, with 𝑁 = 𝐿 +𝑀 βˆ’ 1.

β€’ 1D linear convolution between two discrete signals 𝐱, 𝐑 can be

expressed as the matrix-vector product:𝐲 = 𝐇𝐱,

where 𝐇 is a 𝑁 Γ— 𝐿 matrix.

Page 16: Fast 1D Convolution Algorithms - Drone Research

Linear 1D convolution: matrix

notation

β€’ 𝐇 : a 𝑁 Γ— 𝐿 band matrix of the form:

𝑯 =

β„Ž 0 0 β‹― 0β„Ž(1) β„Ž(0) β‹― β‹―

β‹― β‹― β‹― 0β„Ž(π‘€βˆ’1) β„Ž π‘€βˆ’1 β‹― β„Ž(0)

0 0 β‹― β„Ž(1)β‹― β‹― β‹― β‹―

0 0 β‹― β„Ž(π‘€βˆ’1)

.

β€’ Alternative matrix notation: 𝐲 = 𝐗𝐑, where 𝐗 is an N ×𝑀 matrix.β€’ Fast calculation of the product 𝐲 = 𝐇𝐱 using BLAS/cuBLAS.

Page 17: Fast 1D Convolution Algorithms - Drone Research

Optimal algorithms: minimal

multiplicative complexityβ€’ As multiplications are more expensive than additions, optimal algorithms

having minimal multiplicative complexity have been designed.

β€’ Example: Complex number multiplication normally involves four

multiplications and two additions:

𝑅 + 𝐼𝑖 = π‘Ž + 𝑏𝑖 𝑐 + 𝑑𝑖 = π‘Žπ‘ βˆ’ 𝑏𝑑 + 𝑏𝑐 + π‘Žπ‘‘ 𝑖.β€’ Gauss trick: a way to reduce the number of multiplications to three

(optimal algorithm): π‘Žπ‘, 𝑏𝑑, π‘Ž + 𝑏 𝑐 + 𝑑 ,as: 𝐼 = (π‘Ž + 𝑏)(𝑐 + 𝑑) βˆ’ π‘Žπ‘ βˆ’ 𝑏𝑑.

β€’ Other formulation:

π‘˜1 = 𝑐 π‘Ž + 𝑏 , π‘˜2= π‘Ž 𝑑 βˆ’ 𝑐 , π‘˜3 = 𝑏 𝑐 + 𝑑 ,𝑅 = π‘˜1 βˆ’ π‘˜3,𝐼 = π‘˜1 + π‘˜2.

Page 18: Fast 1D Convolution Algorithms - Drone Research

Optimal Linear Convolution

Algorithmsβ€’ Winograd theory for optimal linear convolution algorithms:

β€’ Minimal number of multiplications.

β€’ The computation of the first π‘š outputs 𝑦 0 ,… , 𝑦(π‘š βˆ’ 1) of an 𝑛-tap filter

can be written as:

𝑦 0

𝑦 1β‹―

𝑦(π‘š βˆ’ 1)

=

π‘₯ 0 π‘₯ 1 β‹― π‘₯(π‘›βˆ’1)

π‘₯ 1 π‘₯ 2 β‹― π‘₯(𝑛)β‹― β‹― β‹―

π‘₯(π‘šβˆ’1) π‘₯ π‘š β‹― π‘₯ π‘š+π‘›βˆ’2

β„Ž 0β„Ž 1β‹―

β„Ž π‘›βˆ’1

.

β€’ Notation consistent with literature.

Page 19: Fast 1D Convolution Algorithms - Drone Research

Optimal Linear Convolution

Algorithms

β€’ Linear convolution can be written as a product 𝐲 = 𝐗𝐑, where 𝐗 is an π‘š ×𝑛 matrix:

𝑋𝑖𝑗 = π‘₯(𝑖 + 𝑗) , 0 ≀ 𝑖 ≀ π‘š βˆ’ 1, 0 ≀ 𝑗 ≀ 𝑛 βˆ’ 1.

β€’ Example: for input π‘₯ π‘˜ with size 𝑛 = 4, kernel β„Ž π‘˜ with size π‘š = 3 and

output 𝑦 π‘˜ with size π‘Ÿ = 2, the matrix vector product:

𝑦 0

𝑦 1=

π‘₯ 0 π‘₯ 1 π‘₯ 2π‘₯ 1 π‘₯ 2 π‘₯ 3

β„Ž 0β„Ž 1β„Ž 2

,

requires 2 Γ— 3 = 6 multiplications and 4 additions.

Page 20: Fast 1D Convolution Algorithms - Drone Research

Optimal Linear Convolution

Algorithmsβ€’ Example: optimal Winograd linear convolution algorithm.

β€’ 8 additions and 4 multiplications.

β€’ 3 additions on β„Ž can be pre-calculated:

𝑦 0

𝑦 1=

π‘₯ 0 π‘₯ 1 π‘₯ 2π‘₯ 1 π‘₯ 2 π‘₯ 3

β„Ž 0β„Ž 1β„Ž 2

=π‘š1 +π‘š2 +π‘š3

π‘š2 βˆ’π‘š3 βˆ’π‘š4,

where:

π‘š1 = π‘₯ 0 βˆ’ π‘₯ 2 β„Ž 0 , π‘š2= π‘₯ 1 + π‘₯ 2β„Ž 0 +β„Ž 1 +β„Ž 2

2,

π‘š4 = π‘₯ 1 βˆ’ π‘₯ 3 β„Ž 2 , π‘š3 = π‘₯ 2 βˆ’ π‘₯ 1β„Ž 0 βˆ’β„Ž 1 +β„Ž 2

2.

Page 21: Fast 1D Convolution Algorithms - Drone Research

Optimal Linear Convolution

Algorithms

Intermediate addition result is used 2 times.

Page 22: Fast 1D Convolution Algorithms - Drone Research

Optimal Linear Convolution

Algorithmsβ€’ Winograd linear convolution algorithm requires π‘š + π‘Ÿ βˆ’ 1multiplications,

π‘š and π‘Ÿ: lengths of 𝑦 and β„Ž, respectively.

β€’ General form of optimal Winograd linear convolution algorithms:

𝐲 = 𝐀T 𝐇𝐑 βŠ— 𝐁𝐓𝐱 ,

βŠ— indicates element-wise π‘š + π‘Ÿ βˆ’ 1multiplications.

𝐱, 𝐑, 𝐲: input signal, filter coefficient and output signal vectors.

Page 23: Fast 1D Convolution Algorithms - Drone Research

Optimal Linear Convolution

Algorithms-ExampleInput signal, filter coefficient, output signal vectors for π‘š = 3 and π‘Ÿ = 2 :

𝐱 = [π‘₯ 0 π‘₯ 1 π‘₯ 2 π‘₯ 3 ] 𝑇, 𝐑 = [β„Ž 0 β„Ž 1 β„Ž 2 ] 𝑇 , 𝐲 = [𝑦 0 𝑦 1 ] 𝑇.

The involved matrices are:

𝐁T =

1 00 1

βˆ’1 01 0

0 βˆ’10 1

1 00 βˆ’1

, 𝐇 =

1 0 0Ξ€1 2 Ξ€1 2 Ξ€1 2Ξ€1 2 Ξ€βˆ’1 2 Ξ€1 20 0 1

,

𝐀T =1 1 1 00 1 βˆ’1 βˆ’1

.

Page 24: Fast 1D Convolution Algorithms - Drone Research

Cyclic 1D convolution

β€’ One-dimensional cyclic convolution of length 𝑁 :

𝑦 π‘˜ = π‘₯ π‘˜ βŠ› β„Ž π‘˜ =

𝑖=0

π‘βˆ’1

β„Ž 𝑖 π‘₯ π‘˜ βˆ’ 𝑖 𝑁 ,

(π‘˜)𝑁= π‘˜ mod 𝑁.

β€’ It is of no use in modeling linear systems.

β€’ Important use: Embedding linear convolution in a fast cyclic convolution

𝑦 𝑛 = π‘₯ 𝑛 βŠ› β„Ž 𝑛 of length 𝑁 β‰₯ 𝐿 +𝑀 βˆ’ 1 and then performing a cyclic

convolution of length 𝑁.

Page 25: Fast 1D Convolution Algorithms - Drone Research

Linear convolution embeddingβ€’ Zero-padding signal vector 𝐱 to length 𝑁: 𝐱p = π‘₯ 0 ,… , π‘₯ 𝛭 βˆ’ 1 , 0, … , 0 𝑇.

β€’ Zero-padding convolution kernel vector 𝐑 to length 𝑁: 𝐑p = β„Ž 0 ,… , β„Ž 𝐿 βˆ’ 1 , 0,… , 0 𝑇.

β€’ Linear convolution embedding in a cyclic convolution of length 𝑁 = 𝐿 +𝑀 βˆ’ 1:

Page 26: Fast 1D Convolution Algorithms - Drone Research

Cyclic 1D convolution - Example

Cyclic convolution of

π‘₯(𝑛) = [1 2 0] and

β„Ž 𝑛 = 3 5 4 .

Clock-wise

Anticlock-wise

Folded sequence

0 𝑠𝑝𝑖𝑛𝑠 1 𝑠𝑝𝑖𝑛 2 𝑠𝑝𝑖𝑛𝑠

1

20

3

5 4

3

5 4 3

5

4

4

3 50 0 02 22

1 1 1

𝑦 0 = 1 Γ— 3 + 2 Γ— 4 + 0 Γ— 5 𝑦 1 = 1 Γ— 5 + 2 Γ— 3 + 0 Γ— 4 𝑦 2 = 1 Γ— 4 + 2 Γ— 5 + 0 Γ— 3

Page 27: Fast 1D Convolution Algorithms - Drone Research

Cyclic 1D convolution: matrix

notationβ€’ Cyclic convolution definition as matrix-vector product:

𝐲 = 𝐇𝐱,

where:

𝐱 = π‘₯ 0 ,… , π‘₯ 𝑁 βˆ’ 1 𝑇: the input vector.

𝐲 = 𝑦 0 ,… , 𝑦 𝑁 βˆ’ 1 𝑇: the output vector.

𝐇 : a N Γ— N Toeplitz matrix of the form:

𝑯 =

β„Ž(0) β„Ž(1) β„Ž(2)β„Ž(𝑁 βˆ’ 1) β„Ž(0) β„Ž(1)β„Ž(𝑁 βˆ’ 2) β„Ž(𝑁 βˆ’ 1) β„Ž(0)

β‹―β‹―β‹―

β„Ž(𝑁 βˆ’ 1)β„Ž(𝑁 βˆ’ 2)β„Ž(𝑁 βˆ’ 3)

β‹― β‹― β‹― β‹― β‹―β„Ž(1) β„Ž(2) β„Ž(3) β‹― β„Ž(0)

.

Page 28: Fast 1D Convolution Algorithms - Drone Research

Cyclic Convolution via DFT

β€’ Cyclic convolution calculation using 1D Discrete Fourier Transform (DFT):

𝐲 = 𝐼𝐷𝐹𝑇 𝐷𝐹𝑇 𝐱 βŠ—π·πΉπ‘‡ 𝐑 .

β€’ Fast calculation of DFT, IDFT through an FFT algorithm.

𝐷𝐹𝑇

𝐷𝐹𝑇

𝐼𝐷𝐹𝑇

π‘₯(𝑛)

β„Ž(𝑛)

𝑋(π‘˜)

𝐻(π‘˜)

Y(π‘˜) 𝑦(𝑛)

Page 29: Fast 1D Convolution Algorithms - Drone Research

1D FFT

β€’ There are various FFT algorithms to speed up the calculation of DFT.

β€’ The best known is the radix-2 decimation-in-time (DIT) Fast FourierTransform (FFT) (Cooley-Tuckey).

1. DFT of a sequence π‘₯ 𝑛 of length 𝑁 (𝑛 = 0,… ,𝑁 βˆ’ 1):

𝑋 π‘˜ = σ𝑛=0π‘βˆ’1 π‘₯ 𝑛 π‘’βˆ’

2πœ‹π‘–

π‘π‘›π‘˜ , π‘˜ = 0,… ,𝑁 βˆ’ 1.

𝑁-th complex roots of unity: π‘Šπ‘π‘› = π‘’βˆ’

2πœ‹π‘–

𝑁𝑛, 𝑛 = 0,… ,𝑁 βˆ’ 1.

Page 30: Fast 1D Convolution Algorithms - Drone Research

1D FFT2. The Radix-2 DIT algorithm rearranges the DFT:

𝑋(π‘˜) =

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š π‘’βˆ’2πœ‹π‘–π‘2

π‘šπ‘˜

𝐷𝐹𝑇 π‘œπ‘“ π‘’π‘£π‘’π‘›βˆ’π‘–π‘›π‘‘π‘’π‘₯𝑒𝑑 π‘π‘Žπ‘Ÿπ‘‘ π‘œπ‘“ π‘₯(𝑛)

+π‘’βˆ’2πœ‹π‘–π‘π‘˜

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š + 1 π‘’βˆ’2πœ‹π‘–π‘2

π‘šπ‘˜

𝐷𝐹𝑇 π‘œπ‘“ π‘œπ‘‘π‘‘βˆ’π‘–π‘›π‘‘π‘’π‘₯𝑒𝑑 π‘π‘Žπ‘Ÿπ‘‘ π‘œπ‘“ π‘₯(𝑛)

= 𝐸(π‘˜) + π‘’βˆ’2πœ‹π‘–π‘π‘˜π‘‚(π‘˜).

3. The complex exponential is periodic which means that:

𝑋(π‘˜ +𝑁

2) =

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š π‘’βˆ’2πœ‹π‘–Ξ€π‘ 2π‘š(π‘˜+

𝑁2) +π‘’βˆ’

2πœ‹π‘–π‘ (π‘˜+

𝑁2)

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š + 1 π‘’βˆ’2πœ‹π‘–Ξ€π‘ 2π‘š(π‘˜+

𝑁2)

=

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š π‘’βˆ’2πœ‹π‘–Ξ€π‘ 2π‘šπ‘˜π‘’βˆ’2πœ‹π‘šπ‘– +π‘’βˆ’

2πœ‹π‘–π‘ π‘˜ π‘’βˆ’πœ‹π‘–

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š + 1 π‘’βˆ’2πœ‹π‘–Ξ€π‘ 2π‘šπ‘˜π‘’βˆ’2πœ‹π‘šπ‘–

=

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š π‘’βˆ’2πœ‹π‘–Ξ€π‘ 2π‘šπ‘˜

βˆ’π‘’βˆ’2πœ‹π‘–π‘ π‘˜

π‘š=0

𝑁2βˆ’1

π‘₯ 2π‘š + 1 π‘’βˆ’2πœ‹π‘–Ξ€π‘ 2π‘šπ‘˜

= 𝐸(π‘˜) βˆ’ π‘’βˆ’2πœ‹π‘–π‘ π‘˜π‘‚(π‘˜).

Page 31: Fast 1D Convolution Algorithms - Drone Research

1D FFT

4. We can rewrite 𝑋 as :

𝑋 π‘˜ = 𝐸 π‘˜ + π‘’βˆ’2πœ‹π‘–π‘ π‘˜π‘‚ π‘˜ .

𝑋 π‘˜ +𝑁

2= 𝐸 π‘˜ βˆ’ π‘’βˆ’

2πœ‹π‘–π‘ π‘˜π‘‚ π‘˜ .

5. By inserting a recursion, we can compute in that way 𝑋(π‘˜) and 𝑋(π‘˜ +𝑁

2) as well.

The algorithm gains its speed by re-using the results of intermediate computations to

compute multiple DFT outputs.

Page 32: Fast 1D Convolution Algorithms - Drone Research

1D FFTβ€’ radix-2 FFT breaks a length-N DFT into many size-2 DFTs called

"butterfly" operations.

β€’ There are log2N FFT

stages.

Page 33: Fast 1D Convolution Algorithms - Drone Research

Radix-2 1D FFT properties

β€’ DFT length 𝑁 should be a power of 2.

β€’ The nice FFT structure is based on the properties of the 𝑁-th

complex roots of unity π‘Šπ‘π‘› = π‘’βˆ’

2πœ‹π‘–

𝑁𝑛, 𝑛 = 0,… , 𝑁 βˆ’ 1.

β€’ Computational complexity of the 1D FFT is 𝑂 π‘π‘™π‘œπ‘”2𝑁 .

β€’ Computational complexity of the 1D FFT is 𝑂(𝑁2).

β€’ Other FFT algorithms: Radix-4 FFT (𝑁 is power of 4),

Prime Factor Algorithm (PFA) 𝑁 = 𝑃𝑄, 𝑃, 𝑄 co-prime numbers.

Page 34: Fast 1D Convolution Algorithms - Drone Research

Cyclic 1D convolution

computation

β€’ Its computation by definition requires:

β€’ Two nested for loops

β€’ Computational complexity (number of multiplications) is 𝑂(𝑁2).

β€’ Computational complexity through two 1D FFTs and one inverse 1D

FFT is 𝑂 π‘π‘™π‘œπ‘”2𝑁 .

β€’ Fast linear convolution calculation through cyclic convolution

embedding.

Page 35: Fast 1D Convolution Algorithms - Drone Research

𝒁 transform

𝑋 𝑧 =

𝑛=0

π‘βˆ’1

π‘₯ 𝑛 π‘§βˆ’π‘› .

The 𝑍 transform of a discrete signal π‘₯(𝑛) having domain [0, … , 𝑁] is

given by:

The domain of 𝑍 transform is the complex plane, since 𝑧 is a complex

number.

Convolution property of the 𝑍 transform (polynomial product 𝑋(𝑧)𝐻(𝑧)):

𝑦(𝑛) = π‘₯(𝑛) βˆ— β„Ž(𝑛) ⇔ π‘Œ(𝑧) = 𝑋(𝑧)𝐻(𝑧).

Page 36: Fast 1D Convolution Algorithms - Drone Research

Cyclic convolution and 𝒁 transform

𝑦 π‘˜ = π‘₯ π‘˜ βŠ› β„Ž π‘˜ =

𝑖=0

π‘βˆ’1

β„Ž 𝑖 π‘₯ π‘˜ βˆ’ 𝑖 𝑁 ,

where : (π‘˜)𝑁 = π‘˜ mod 𝑁.

𝑦 π‘˜ = π‘₯ π‘˜ βŠ› β„Ž π‘˜ < => π‘Œ 𝑧 = 𝑋 𝑧 𝐻 𝑧 mod 𝑧𝑁 βˆ’ 1.

Polynomial product form of the 1D cyclic convolution:

Page 37: Fast 1D Convolution Algorithms - Drone Research

Convolutions as polynomial

products-examplesβ€’ 1D linear convolution 𝑦 π‘˜ = β„Ž π‘˜ βˆ— π‘₯ π‘˜ of length 𝐿 = 𝑀 = 3:

β€’ 1D cyclic convolution 𝑦 π‘˜ = π‘₯ π‘˜ βŠ› β„Ž π‘˜of length 𝑁 = 3:

π‘Œ 𝑧 = 𝑋 𝑧 𝐻 𝑧 mod 𝑧3 βˆ’ 1= π‘₯ 0 + π‘₯ 1 𝑧 + π‘₯ 2 𝑧2 β„Ž 0 + β„Ž 1 𝑧 + β„Ž 2 𝑧2 mod 𝑧3 βˆ’ 1.

Factorization: 𝑧3 βˆ’ 1 = 𝑃1 𝑧 𝑃2 𝑧 = (𝑧 βˆ’ 1)(𝑧2 + 𝑧 + 1)(𝑣 = 2 factors).

π‘Œ 𝑧 = 𝑋 𝑧 𝐻 𝑧 = π‘₯ 0 + π‘₯ 1 𝑧 + π‘₯ 2 𝑧2 β„Ž 0 + β„Ž 1 𝑧 + β„Ž 2 𝑧2 .

Page 38: Fast 1D Convolution Algorithms - Drone Research

Chinese Remainder Theorem (CRT)If we have a polynomial Y(z) and:

𝑃 𝑧 = 𝑧𝑁 βˆ’ 1, where the Greatest Common Divider 𝐺𝐢𝐷 𝑃𝑖 𝑧 , 𝑃𝑗 𝑧 = 1 for 𝑖 β‰  𝑗.

𝑃 𝑧 =ΰ·‘

𝑖=1

Ξ½

𝑃𝑖 𝑧 .

π‘Œπ‘– 𝑧 = π‘Œ 𝑧 mod 𝑃𝑖(𝑧).

0 < 𝑑𝑒𝑔 π‘Œ 𝑧 ≀ σ𝑖=1𝜈 𝑑𝑒𝑔[𝑃𝑖(𝑧)] , where deg is the degree of a polynomial.

𝑅𝑖 𝑧 = 𝑑𝑖𝑗 mod 𝑃𝑖(𝑧), 1 ≀ 𝑖 ≀ 𝜈.

Then the polynomial π‘Œ 𝑧 is given by: π‘Œ 𝑧 = σ𝑖=1Ξ½ 𝑅𝑖 𝑧 π‘Œπ‘– 𝑧 mod 𝑧𝑁 βˆ’ 1

Page 39: Fast 1D Convolution Algorithms - Drone Research

From CRT to Winograd

convolution algorithm

π‘Œ(𝑧) = 𝐻(𝑧)𝑋(𝑧) mod 𝑧𝑁 βˆ’ 1.

A 'divide-and-conquer' approach can be used for creating fast convolution

algorithms, by factorizing polynomial 𝑧𝑁 βˆ’ 1 to its prime factors 𝑃𝑖 𝑧 .

Then, we calculate first the items:

𝑋𝑖 𝑧 = 𝑋 𝑧 mod 𝑃𝑖 𝑧 , 𝑖 = 1,… , πœˆπ›¨π‘– 𝑧 = 𝛨 𝑧 mod 𝑃𝑖 𝑧 , 𝑖 = 1,… , 𝜈

and their products:

π‘Œπ‘– 𝑧 = 𝑋𝑖 𝑧 𝐻𝑖 𝑧 mod 𝑃𝑖 𝑧 , 𝑖 = 1,… , 𝜈.

CRT helps us to create fast convolution algorithms by embedding linear

convolutions in cyclic ones:

Page 40: Fast 1D Convolution Algorithms - Drone Research

Winograd algorithm

Fast 1D cyclic convolution with

minimal complexityβ€’ Winograd convolution algorithms or fast

filtering algorithms:

𝐲 = 𝐂 𝐀𝐱⨂𝐁𝐑 .

β€’ They require only 2𝑁 βˆ’ 𝑣 multiplications in

their middle vector product, thus having

minimal complexity.

β€’ 𝜈: number of cyclotomic polynomial

factors of polynomial 𝑧𝑁 βˆ’ 1 over the

rational numbers 𝑄.

Page 41: Fast 1D Convolution Algorithms - Drone Research

Example: Winograd Cyclic

convolution Algorithm for N=3

( )( )3

3 3 2

1 3

/

1 ( ) 1 ( ) ( ) 1 1 1N

N

d

c N

z c z z c z c z z z z z=

βˆ’ = βˆ’ = βˆ’ = βˆ’ + +

( ) ( )3

3( ) ( ) ( ) mod 1 ( ) ( ) ( ) mod 1N

NY z H z X z z Y z H z X z z=

= βˆ’ = βˆ’

1 232

0 1 2

0 0

( ) ( ) ( )N N

n n

n n

n m

X z x z X z x z X z x x z x zβˆ’ =

= =

= = = + +

1 232

0 1 2

0 0

( ) ( ) ( )N N

n n

n n

n m

H z h z H z h z H z h h z h zβˆ’ =

= =

= = = + +

( )( )2( ) ( ) ( ) mod 1 1Y z H z X z z z z= βˆ’ + +

I. Definitions:

Page 42: Fast 1D Convolution Algorithms - Drone Research

Example: Winograd Cyclic

convolution Algorithm for N=3

( ) ( ) ( )1

2

1 0 1 2 1 0 1 2mod mod 1i

i iX X z P X x x z x z z X x x x=

= = + + βˆ’ = + +

( ) ( ) ( ) ( )2

2 2

2 0 1 2 2 0 2 1 2mod 1i

X x x z x z z z X x x x x z=

= + + + + = βˆ’ + βˆ’

( ) ( )1 0 1 2 0 2 0 2 1 2 1 2Similarly for H : and i H h h h b H h h h h z b b z= + + = = βˆ’ + βˆ’ = +

0 1 2 3 1 0 2 2 1 2Let , and :a x x x a x x a x x= + + = βˆ’ = βˆ’ 1 0 2 1 2 and X a X a a z= = +

II. Computing the reduced Polynomials 𝑋𝑖 , 𝐻𝑖:

Page 43: Fast 1D Convolution Algorithms - Drone Research

Example: Winograd Cyclic

convolution Algorithm for N=3

( ) ( ) ( ) ( )1

1 0 0 1 0 0mod ( ) mod 1i

i i i iY X z H z P z Y a b z Y a b=

= = βˆ’ =

( ) ( )

( ) ( ) ( )( ) ( )

2

2 2 2 2

2

2 1 2 1 2

2 1 1 2 2 1 2 2 1 2 2

mod ( )

mod 1

i

Y X z H z P z

Y a a z b b z z z

Y a b a b z a b a b a b

=

=

= + + + +

= βˆ’ + + βˆ’

( ) ( )1 2 2 1 2 2 1 2 1 2 1 1 2One can see, that: , thus, Y becomes: a b a b a b a a b b a b+ βˆ’ = βˆ’ βˆ’ βˆ’ +

( ) ( ) ( )2 1 1 2 2 1 2 1 2 1 1Y a b a b z a a b b a b= βˆ’ + βˆ’ βˆ’ βˆ’ +

( ) ( )0 0 0 1 1 1 2 2 2 1 2 1 2 1 1Let c , c and ca b a b a b a a b b a b= = βˆ’ = βˆ’ βˆ’ βˆ’ +

1 0 2 1 2 and Y c Y c c z= = +

III. Computing the products π‘Œπ‘–:

6 multiplications

4 multiplications

Page 44: Fast 1D Convolution Algorithms - Drone Research

Example: Winograd Cyclic

convolution Algorithm for N=3

( ) ( ) 233

1 1 1

3 3 1

3 3

ppp c z c z z z

R R Rp

=βˆ’ βˆ’ βˆ’ βˆ’ βˆ’= = =

( ) ( ) ( ) ( )0 1 2

2 32

0 1 2 0 1 2 0 1 2

1

1mod 1 2 2

3

NN

i i

iy y y

Y RY z Y c c c c c c z c c c z=

=

= βˆ’ = = + βˆ’ + βˆ’ + + βˆ’ βˆ’

IV. Using the Chinese Remainder Theorem (CRT) to Compute π‘Œ:

( ) ( ) 233

2 2 2

1

3 3

ppc z c z z z

R R Rp

= + += = =

Page 45: Fast 1D Convolution Algorithms - Drone Research

Example: Winograd Cyclic

convolution Algorithm for N=3V. Finding Matrices 𝐴 and 𝐡:

0 0 1 20

0

1 0 21

1

2 1 22

2

3 0 11 2

1 1 1

1 0 1

0 1 1

1 1 0

x x x xax

x x xaA A x A

x x xax

x x xa a

+ + βˆ’ βˆ’ = = = βˆ’ βˆ’

βˆ’βˆ’ βˆ’

Similarly, we get exactly the same result for matrix 𝐡!

Page 46: Fast 1D Convolution Algorithms - Drone Research

Example: Winograd Cyclic

convolution Algorithm for N=3

( )

( ) ( )

( ) ( )

( ) ( )

1 0 1 2 0 0 1 1 2 2 1 2 1 2

2 0 1 2 0 0 1 1 2 2 1 2 1 23 1

3 0 1 2 0 0 1 1 2 2 1 2 1 2

2 2

2 2

2

y c c c a b a b a b a a b b

Y y c c c a b a b a b a a b b

y c c c a b a b a b a a b b

+ βˆ’ + βˆ’ + βˆ’ βˆ’

= = βˆ’ + = + + βˆ’ βˆ’ βˆ’ βˆ’ βˆ’ βˆ’ + + βˆ’ βˆ’

( ) ( ) ( )

( )( )

0 0

0 0

1 1

1 14 1

2 2

2 2

1 2 1 2

1 1 1 1 1 1

1 0 1 1 0 1

0 1 1 0 1 1

1 1 0 1 1 0

a bx h

a bM A x B h x h

a bx h

a a b b

βˆ’ βˆ’ = = = βˆ’ βˆ’ βˆ’ βˆ’βˆ’ βˆ’

VI. Finding matrix 𝐢:

Page 47: Fast 1D Convolution Algorithms - Drone Research

Example: Winograd Cyclic

convolution Algorithm for N=3

( ) ( ) ( )

( ) ( )

( ) ( )

( ) ( )( )( )

3 1 3 4 4 1

0 0

0 0 1 1 2 2 1 2 1 2 00 01 02 03

1 1

0 0 1 1 2 2 1 2 1 2 10 11 12 13

2 2

0 0 1 1 2 2 1 2 1 2 20 21 22 23

1 2 1 2

2

2

2

1 1 2 1

1 1

Y C M

a ba b a b a b a a b b C C C C

a ba b a b a b a a b b C C C C

a ba b a b a b a a b b C C C C

a a b b

C

=

+ βˆ’ + βˆ’ βˆ’

+ + βˆ’ βˆ’ βˆ’ = βˆ’ + + βˆ’ βˆ’ βˆ’ βˆ’

βˆ’

= 1 2

1 2 1 1

βˆ’ βˆ’

VI. Finding matrix 𝐢:

Theoretically minimal number of multiplications: πŸ’!

Page 48: Fast 1D Convolution Algorithms - Drone Research

Winograd algorithm version

with minimal computational

complexity β€’ Can be equivalently expressed as:

𝐲 = 𝐑𝐁T 𝐀𝐱⨂𝐂T𝐑𝐑 .

β€’ Matrices 𝐀, 𝐁 typically have elements 0,+1,βˆ’1.

β€’ Multiplications 𝐂T𝐑𝐑, 𝐑𝐁T𝐲′ are done only by

additions/subtractions.

β€’ 𝐑 is an 𝑁 Γ— 𝑁 permutation matrix.

β€’ 𝐑h can be precomputed.

Page 49: Fast 1D Convolution Algorithms - Drone Research

Block diagram: Winograd Cyclic

convolution Algorithm for N=3

Page 50: Fast 1D Convolution Algorithms - Drone Research

Block diagram: Winograd Cyclic

convolution Algorithm for N=3

Page 51: Fast 1D Convolution Algorithms - Drone Research

Winograd algorithm

Fast 1D cyclic convolution with

minimal complexity

β€’ Winograd algorithm works on small blocks of the input signal.

β€’ The input block and filter are transformed.

β€’ The outputs of the transform are multiplied together in an element-

wise fashion.

β€’ The result is transformed back to obtain the outputs of the

convolution.

β€’ GEneral Matrix Multiplication (GEMM) BLAS or CUBLAS routines can

be used.

Page 52: Fast 1D Convolution Algorithms - Drone Research

Nested convolutions

β€’ Winograd algorithms exist for relatively short convolution lengths,

e.g.: 𝑁 = 3, 5, 7.

β€’ Use of efficient short-length convolution algorithms iteratively to

build long convolutions.

β€’ Does not achieve minimal multiplication complexity.

β€’ Good balance between multiplications and additions.

Decomposition of 1D convolution into a 2D convolution:

β€’ 1D convolution of length: 𝑁 = 𝑁1𝑁2β€’ with 𝑁1, 𝑁2 co-prime integers, 𝑁1, 𝑁2 = 1

β€’ results into a 2D 𝑁1 Γ— 𝑁2 convolution.

Page 53: Fast 1D Convolution Algorithms - Drone Research

Nested convolutions

β€’ Remember circular convolution definition:

𝑦(𝑙) =

𝑛=0

π‘βˆ’1

β„Ž 𝑛 π‘₯( 𝑙 βˆ’ 𝑛 𝑁), 𝑙 = 0, . . , 𝑁 βˆ’ 1.

β€’ If 𝑁 = 𝑁1𝑁2, 𝑙 and 𝑛 can be analyzed as:

𝑙 ≑ 𝑁1𝑙2 + 𝑁2𝑙1 mod 𝑁1𝑁2 𝑙1, 𝑛1 = 0,… ,𝑁1 βˆ’ 1.𝑛 ≑ 𝑁1𝑛2 + 𝑁2𝑛1 mod 𝑁1𝑁2 𝑙2, 𝑛2 = 0,… ,𝑁2 βˆ’ 1.

β€’ 1D circular convolution becomes now a 2D 𝑁1 Γ— 𝑁2 convolution:

𝑦(𝑁1𝑙2 + 𝑁2𝑙1) =

𝑛1=0

𝑁1βˆ’1

𝑛2=0

𝑁2βˆ’1

β„Ž 𝑁1𝑛2 +𝑁2𝑛1 π‘₯(𝑁1(𝑙2βˆ’π‘›2) + 𝑁2(𝑙1 βˆ’ 𝑛1)).

Page 54: Fast 1D Convolution Algorithms - Drone Research

Nested convolutions

β€’ The number of multiplications required are 𝑀1𝑀2, where, 𝑀1 and

𝑀2 is the number of multiplications for convolutions 𝑁1 and 𝑁2accordingly.

β€’ The same applies for the reverse decomposition: 𝑁2 Γ— 𝑁1.

β€’ However, the number of additions varies over the nesting order :𝑁1 Γ— 𝑁2: 𝐴1𝑁2 +𝑀1𝐴2𝑁2 Γ— 𝑁1: 𝐴2𝑁1 +𝑀2𝐴1,

with 𝐴1 and 𝐴2 being the number of additions required for 𝑁1 and 𝑁2accordingly.

β€’ Additions must be calculated beforehand, so as to choose the

nesting order.

Page 55: Fast 1D Convolution Algorithms - Drone Research

Nested convolutionsFor example, lets assume we have the circular convolution

of length 𝑁 = 12 and 𝑁1 = 3,𝑁2 = 4 :

then,𝑙1, 𝑛1 = 0,1,2𝑙2, 𝑛2 = [0, 1, 2, 3]

and

𝑋0 𝑧 = π‘₯0 + π‘₯3𝑧 + π‘₯6𝑧2 + π‘₯9𝑧

3

𝑋1 𝑧 = π‘₯4 + π‘₯7𝑧 + π‘₯10𝑧2 + π‘₯1𝑧

3

𝑋2 𝑧 = π‘₯8 + π‘₯11𝑧 + π‘₯2𝑧2 + π‘₯5𝑧

3

𝐻0 𝑧 = β„Ž0 + β„Ž3𝑧 + β„Ž6𝑧2 + β„Ž9𝑧

3

𝐻1 𝑧 = β„Ž4 + β„Ž7𝑧 + β„Ž10𝑧2 + β„Ž1𝑧

3

𝐻2 𝑧 = β„Ž8 + β„Ž11𝑧 + β„Ž2𝑧2 + β„Ž5𝑧

3

Page 56: Fast 1D Convolution Algorithms - Drone Research

Nested convolutions

𝐗 and 𝐇 can be defined now as 3Γ—4 matrices

with the polynomial terms as elements:

𝑿 =

π‘₯0 π‘₯3 π‘₯6 π‘₯9π‘₯4 π‘₯7 π‘₯10 π‘₯1π‘₯8 π‘₯11 π‘₯2 π‘₯5

𝑯 =

β„Ž0 β„Ž3 β„Ž6 β„Ž9β„Ž4 β„Ž7 β„Ž10 β„Ž1β„Ž8 β„Ž11 β„Ž2 β„Ž5

Page 57: Fast 1D Convolution Algorithms - Drone Research

Nested convolutions

β€’ We use Winograd 3-point circular convolution C, A, and B matrices at

the outer nesting layer:

𝐲 = 𝐂 𝐀𝐗⨂𝐁𝐇

β€’ The matrix-vector products 𝐀𝒙 and 𝐁𝒉, are replaced by matrix

multiplication 𝐀𝐗 and 𝐁𝐇 respectively, each one resulting to a 4Γ—4

matrix.

β€’ We use a fast 4-point Winograd routine to calculate the 4 circular

convolutions 𝐀𝐗 ⨂ 𝐁𝐇 (one per row). The 4-point routine is β€œnested”

inside the 3-point routine.

β€’ The correct order of the elements in the result can be found based on

the nesting formulation:

𝑦𝑁1𝑙2+𝑁2𝑙1 .

Page 58: Fast 1D Convolution Algorithms - Drone Research

Block Based convolutions

β€’ Input signal x 𝑛 is split in overlapping/non-overlapping blocks.

β€’ Blocks are convolved independently.

β€’ Great parallelism is achieved.

β€’ Two block-based convolution methods:

β€’ Overlap-add method

β€’ Overlap-save method.

Page 59: Fast 1D Convolution Algorithms - Drone Research

Overlap-add method

It uses the following signal processing principles:

β€’ The linear convolution of two discrete-time signals of length 𝐿 and 𝑀results in a discrete-time convolved signal of length 𝑁 = 𝐿 +𝑀 βˆ’ 1.

β€’ Additivity:

[π‘₯1 𝑛 + π‘₯2 𝑛 ] βˆ— β„Ž 𝑛 = π‘₯1 𝑛 βˆ— β„Ž 𝑛 + π‘₯2 𝑛 βˆ— β„Ž 𝑛 .

Page 60: Fast 1D Convolution Algorithms - Drone Research

Overlap-add method

1. Break the input signal π‘₯(𝑛) into non-overlapping blocks π‘₯π‘š 𝑛 of length 𝐿.

2. Zero pad β„Ž 𝑛 to be of length 𝑁 = 𝐿 +𝑀 βˆ’ 1.

Calculate the 𝐷𝐹𝑇 𝐻(π‘˜),π‘˜ = 0,1,… ,𝑁 βˆ’ 1.

3. For each block π‘š:

β€’ Zero pad π‘₯π‘š 𝑛 to be of length 𝑁 = 𝐿 +𝑀 βˆ’ 1. Calculate the DFT π‘‹π‘š π‘˜ .

β€’ Use the IDFT of 𝐻 π‘˜ π‘‹π‘š π‘˜ for block output π‘¦π‘š 𝑛 ,𝑛 = 0,1,… ,𝑁 βˆ’ 1.

4. Form overall output 𝑦 𝑛 by overlapping the last 𝑀 βˆ’ 1 samples of π‘¦π‘š 𝑛 with the

first 𝑀 βˆ’ 1 samples π‘¦π‘š+1 𝑛 and adding the result.

Page 61: Fast 1D Convolution Algorithms - Drone Research

Overlap-add method𝐿 𝐿𝐿

π‘₯2 𝑛

π‘₯3 𝑛

π‘₯1 𝑛

𝑦2 𝑛

𝑦1 𝑛

𝑦3 𝑛

(𝑀 βˆ’ 1) zeros

(𝑀 βˆ’ 1) zeros

(𝑀 βˆ’ 1) zeros

(𝑀 βˆ’ 1) points

(𝑀 βˆ’ 1) points

(𝑀 βˆ’ 1) points

(𝑀 βˆ’ 1) points

(𝑀 βˆ’ 1) points

Page 62: Fast 1D Convolution Algorithms - Drone Research

Overlap-add method

Page 63: Fast 1D Convolution Algorithms - Drone Research

Overlap-add method

Page 64: Fast 1D Convolution Algorithms - Drone Research

Overlap-save method

It uses the following signal processing principles:

β€’ The linear convolution of two discrete-time signals of length 𝑁 and 𝑀results in a discrete-time convolved signal of length 𝑁 = 𝐿 +𝑀 βˆ’ 1.

β€’ Time-Domain Aliasing:

π‘₯𝑐 𝑛 =

𝑙=βˆ’βˆž

∞

π‘₯𝐿 𝑛 βˆ’ 𝑙𝑁 , 𝑛 = 0,1, … , 𝑁 βˆ’ 1.

Page 65: Fast 1D Convolution Algorithms - Drone Research

Overlap-save method

Overlap-Save method: 1. Pad 𝑀 βˆ’ 1 zeros at the beginning of the input sequence π‘₯(𝑛).2. Break the padded input signal into overlapping blocks π‘₯π‘š 𝑛 of length 𝑁 = 𝐿 +𝑀 βˆ’ 1, where the overlap length is 𝑀 βˆ’ 1.

3. Zero-pad β„Ž(𝑛) to be of length 𝑁 = 𝐿 +𝑀 βˆ’ 1.

Calculate the 𝐷𝐹𝑇 𝐻(π‘˜),π‘˜ = 0,1,… ,𝑁 βˆ’ 1.

4. For each block π’Ž:β€’ Calculate the 𝐷𝐹𝑇 π‘‹π‘š π‘˜ ,π‘˜ = 0,1,… ,𝑁 βˆ’ 1.

β€’ Calculate the IDFT of: π‘Œπ‘š π‘˜ = π‘‹π‘š π‘˜ 𝐻(π‘˜), π‘˜ = 0,1,… ,𝑁 βˆ’ 1 to get the block output

π‘¦π‘š 𝑛 ,𝑛 = 0,1,… ,𝑁 βˆ’ 1.

5. Discard the first 𝑀 βˆ’ 1 points of each output block π‘¦π‘š 𝑛 . Concatenate the remaining (i.e., last) 𝐿 samples of each block π‘¦π‘š 𝑛 to form the output 𝑦 𝑛 .

Page 66: Fast 1D Convolution Algorithms - Drone Research

Overlap-save method

𝐿 𝐿 𝐿

(𝑀 βˆ’ 1) zeros(𝑀 βˆ’ 1) point overlap

(𝑀 βˆ’ 1) point overlap

𝐼𝑛𝑝𝑒𝑑 π‘ π‘–π‘”π‘›π‘Žπ‘™ π‘π‘™π‘œπ‘π‘˜π‘ 

𝑂𝑒𝑑𝑝𝑒𝑑 π‘ π‘–π‘”π‘›π‘Žπ‘™ π‘π‘™π‘œπ‘π‘˜π‘ 

Discard (𝑀 βˆ’ 1) points

Page 67: Fast 1D Convolution Algorithms - Drone Research

Overlap-save method

Page 68: Fast 1D Convolution Algorithms - Drone Research

Overlap-save method

Page 69: Fast 1D Convolution Algorithms - Drone Research

Applications

β€’ Convolutional neural networks

β€’ Signal processing

Signal filtering

Signal restoration

Signal deconvolution

β€’ Signal analysis

Time delay estimation

Distance calculation (e.g., sonar)

1D template matching

Page 70: Fast 1D Convolution Algorithms - Drone Research

Convolutional Neural Networks

β€’ Two step architecture:

β€’ First layers with sparse NN connections: convolutions.

β€’ Fully connected final layers.

β€’ Need for fast convolution calculations.

Convergence of machine learning and signal processing

Page 71: Fast 1D Convolution Algorithms - Drone Research

Deep Learning Frameworks

Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs

Page 72: Fast 1D Convolution Algorithms - Drone Research

Convolutions in DL Frameworks

β€’ All 5 frameworks work with cuDNN as backend.

β€’ cuDNN unfortunately not open source.

β€’ cuDNN supports FFT and Winograd convolutions.

Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs

Page 73: Fast 1D Convolution Algorithms - Drone Research

Basic Exercises

1. Implement the definition of the 1D linear convolution algorithm

in C/C++ or Python.

2. Implement the definition of the 1D DFT algorithm in C/C++ or

Python.

3. Implement the 1D cyclic convolution algorithm for length 𝑁 =2𝑛 in C/C++ using FFT.

4. Derive analytically the 1D Winograd cyclic convolution

algorithm for length 𝑁 = 3.

5. Implement the 1D Winograd cyclic convolution algorithm for

length 𝑁 = 3 in C/C++ or Python.

Page 74: Fast 1D Convolution Algorithms - Drone Research

Advanced Exercises

1. Implement the 1D linear convolution for filter length M and

signal length 𝐿 in C/C++ using your own myFFT routine.

2. Implement a nested convolution algorithm for length 𝑁 = 𝑁1𝑁2in C/C++.

3. Study the properties of cyclotomic polynomials. Derive

analytically the 1D Winograd cyclic convolution algorithm for

length 𝑁 = 4,𝑁 = 6 or 𝑁 = 9.

4. Implement the 1D linear convolution algorithm for filter length

M and a large signal length 𝐿 using overlap-add in a) C/C++ or

b) CUDA.

Page 75: Fast 1D Convolution Algorithms - Drone Research

References

β€’ Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553, p. 436, 2015.

β€’ V. Oppenheim, Discrete-time signal processing, Pearson Education India, 1999.

β€’ I. Pitas, Digital image processing algorithms and applications. John Wiley & Sons, 2000.

β€’ S. Winograd, Arithmetic complexity of computations, vol. 33. Siam, 1980.

β€’ J. H. McClellan and C. M. Rader, Number theory in digital signal processing," 1979.

β€’ H. J. Nussbaumer, Fast Fourier transform and convolution algorithms.

β€’ Springer Science & Business Media, 1981.

β€’ I. Pitas and M. Strintzis, Multidimensional cyclic convolution algorithms with minimal multiplicative

complexity," IEEE transactions on acoustics, speech, and signal processing, vol. 35, no. 3, pp. 384-

390, 1987.

β€’ A. Lavin and S. Gray, Fast algorithms for convolutional neural networks, Proceedings CVPR, pp.

4013-4021, 2016.

Page 76: Fast 1D Convolution Algorithms - Drone Research

Q & A

Thank you very much for your attention!

Contact: Prof. I. Pitas

[email protected]