how to multiply · matrix multiplication. given two n-by-n matrices a and b, compute c = ab....
TRANSCRIPT
1
How to Multiply
Slides by Kevin Wayne.Copyright © 2005 Pearson-Addison Wesley.All rights reserved.
integers, matrices, and polynomials
2
Complex Multiplication
Complex multiplication. (a + bi) (c + di) = x + yi.
Grade-school. x = ac - bd, y = bc + ad.
Q. Is it possible to do with fewer multiplications?
4 multiplications, 2 additions
3
Complex Multiplication
Complex multiplication. (a + bi) (c + di) = x + yi.
Grade-school. x = ac - bd, y = bc + ad.
Q. Is it possible to do with fewer multiplications?A. Yes. [Gauss] x = ac - bd, y = (a + b) (c + d) - ac - bd.
Remark. Improvement if no hardware multiply.
4 multiplications, 2 additions
3 multiplications, 5 additions
5.5 Integer Multiplication
5
Addition. Given two n-bit integers a and b, compute a + b.Grade-school. Θ(n) bit operations.
Remark. Grade-school addition algorithm is optimal.
Integer Addition
1
011 1
110 1+
010 1
111
010 1
011 1
100 0
10111
6
Integer Multiplication
Multiplication. Given two n-bit integers a and b, compute a × b.Grade-school. Θ(n2) bit operations.
Q. Is grade-school multiplication algorithm optimal?
1
1
1
0
0
0
1
1
1
0
1
0
1
1
1
0
1
0
1
1
1
1
0
1
00000000
01010101
01010101
01010101
01010101
01010101
00000000
100000000001011
0
1
1
1
1
1
0
0
×
7
To multiply two n-bit integers a and b: Multiply four ½n-bit integers, recursively. Add and shift to obtain result.
Ex.
Divide-and-Conquer Multiplication: Warmup
!
T (n) = 4T n /2( )recursive calls
1 2 4 3 4 + "(n)
add, shift
1 2 3 # T (n) ="(n
2)
!
a = 2n / 2
" a1
+ a0
b = 2n / 2
"b1
+ b0
ab = 2n / 2
" a1+ a
0( ) 2n / 2
"b1
+ b0( ) = 2
n" a
1b
1 + 2
n / 2" a
1b
0+ a
0b
1( ) + a0b
0
a = 10001101 b = 11100001
a1 a0 b1 b0
8
Recursion Tree
!
T (n) =0 if n = 0
4T (n /2) + n otherwise
" # $
n
4(n/2)
16(n/4)
4k (n / 2k)
4 lg n (1)
T(n)
T(n/2)
T(n/4) T(n/4)
T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2)
T(n / 2k)
T(n/4)
T(n/2)
T(n/4) T(n/4)
T(n/2)
T(n/4) T(n/4)T(n/4)
...
...
!
T (n) = n 2k
k=0
lgn
" = n2
1+ lgn #1
2#1
$
% &
'
( ) = 2n
2 #n
T(n/2)
...
...
......
9
To multiply two n-bit integers a and b: Add two ½n bit integers. Multiply three ½n-bit integers, recursively. Add, subtract, and shift to obtain result.
Karatsuba Multiplication
!
a = 2n / 2
" a1 + a0
b = 2n / 2
"b1 + b0
ab = 2n" a1b1 + 2
n / 2" a1b0 + a0b1( ) + a0b0
= 2n" a1b1 + 2
n / 2" (a1 + a0 ) (b1 +b0 ) # a1b1 # a0b0( ) + a0b0
1 2 1 33
10
To multiply two n-bit integers a and b: Add two ½n bit integers. Multiply three ½n-bit integers, recursively. Add, subtract, and shift to obtain result.
Theorem. [Karatsuba-Ofman 1962] Can multiply two n-bit integersin O(n1.585) bit operations.
Karatsuba Multiplication
!
T (n) " T n /2# $( ) + T n /2% &( ) + T 1+ n /2% &( )recursive calls
1 2 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 + '(n)
add, subtract, shift
1 2 4 3 4 ( T (n) = O(n
lg 3) = O(n
1.585)
!
a = 2n / 2
" a1 + a0
b = 2n / 2
"b1 + b0
ab = 2n" a1b1 + 2
n / 2" a1b0 + a0b1( ) + a0b0
= 2n" a1b1 + 2
n / 2" (a1 + a0 ) (b1 +b0 ) # a1b1 # a0b0( ) + a0b0
1 2 1 33
11
Karatsuba: Recursion Tree
!
T (n) =0 if n = 0
3T (n /2) + n otherwise
" # $
n
3(n/2)
9(n/4)
T(n)
T(n/2)
T(n/4) T(n/4)T(n/4)
T(n/2)
T(n/4) T(n/4)T(n/4)
T(n/2)
T(n/4) T(n/4)T(n/4)
!
T (n) = n 32( )
k
k=0
lgn
" = n32( )
1+ lgn#1
32#1
$
%
& &
'
(
) ) = 3n
lg 3 #2n
3 lg n (1)T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2)
T(n / 2k)
...
......
......
3k (n / 2k)
12
Integer division. Given two n-bit (or less) integers s and t,compute quotient q = s / t and remainder r = s mod t.
Fact. Complexity of integer division is same as integer multiplication.To compute quotient q:
Approximate x = 1 / t using Newton's method: After log n iterations, either q = s x or q = s x.
!
xi+1
= 2xi" t x
i
2
Fast Integer Division Too (!)
using fastmultiplication
Matrix Multiplication
14
Dot product. Given two length n vectors a and b, compute c = a ⋅ b.Grade-school. Θ(n) arithmetic operations.
Remark. Grade-school dot product algorithm is optimal.
Dot Product
!
a " b = aibi
i=1
n
#
!
a = .70 .20 .10[ ]
b = .30 .40 .30[ ]a " b = (.70 # .30) + (.20 # .40) + (.10 # .30) = .32
15
Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB.Grade-school. Θ(n3) arithmetic operations.
Q. Is grade-school matrix multiplication algorithm optimal?
Matrix Multiplication
!
cij = aikbkj
k=1
n
"
!
c11
c12
L c1n
c21
c22
L c2n
M M O M
cn1
cn2
L cnn
"
#
$ $ $ $
%
&
' ' ' '
=
a11
a12
L a1n
a21
a22
L a2n
M M O M
an1
an2
L ann
"
#
$ $ $ $
%
&
' ' ' '
(
b11
b12
L b1n
b21
b22
L b2n
M M O M
bn1
bn2
L bnn
"
#
$ $ $ $
%
&
' ' ' '
!
.59 .32 .41
.31 .36 .25
.45 .31 .42
"
#
$ $ $
%
&
' ' '
=
.70 .20 .10
.30 .60 .10
.50 .10 .40
"
#
$ $ $
%
&
' ' '
(
.80 .30 .50
.10 .40 .10
.10 .30 .40
"
#
$ $ $
%
&
' ' '
16
Block Matrix Multiplication
!
C11
= A11"B
11 + A
12"B
21 =
0 1
4 5
#
$ %
&
' ( "
16 17
20 21
#
$ %
&
' ( +
2 3
6 7
#
$ %
&
' ( "
24 25
28 29
#
$ %
&
' ( =
152 158
504 526
#
$ %
&
' (
!
152 158 164 170
504 526 548 570
856 894 932 970
1208 1262 1316 1370
"
#
$ $ $ $
%
&
' ' ' '
=
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
"
#
$ $ $ $
%
&
' ' ' '
(
16 17 18 19
20 21 22 23
24 25 26 27
28 29 30 31
"
#
$ $ $ $
%
&
' ' ' '
C11A11 A12 B11
B11
17
Matrix Multiplication: Warmup
To multiply two n-by-n matrices A and B: Divide: partition A and B into ½n-by-½n blocks. Conquer: multiply 8 pairs of ½n-by-½n matrices, recursively. Combine: add appropriate products using 4 matrix additions.
!
C11
= A11" B
11( ) + A12" B
21( )C
12= A
11" B
12( ) + A12" B
22( )C
21= A
21" B
11( ) + A22" B
21( )C
22= A
21" B
12( ) + A22" B
22( )
!
C11
C12
C21
C22
"
# $
%
& ' =
A11
A12
A21
A22
"
# $
%
& ' (
B11
B12
B21
B22
"
# $
%
& '
!
T (n) = 8T n /2( )recursive calls
1 2 4 3 4 + "(n
2)
add, form submatrices
1 2 4 4 3 4 4 # T (n) ="(n
3)
18
Fast Matrix Multiplication
Key idea. multiply 2-by-2 blocks with only 7 multiplications.
7 multiplications. 18 = 8 + 10 additions and subtractions.
!
P1 = A11 " (B12 # B22 )
P2 = (A11 + A12 ) " B22
P3 = (A21 + A22 ) " B11
P4 = A22 " (B21 # B11)
P5 = (A11 + A22 ) " (B11 + B22 )
P6 = (A12 # A22 ) " (B21 + B22 )
P7 = (A11 # A21) " (B11 + B12 )
!
C11
= P5
+ P4" P
2+ P
6
C12
= P1+ P
2
C21
= P3
+ P4
C22
= P5
+ P1" P
3" P
7
!
C11
C12
C21
C22
"
# $
%
& ' =
A11
A12
A21
A22
"
# $
%
& ' (
B11
B12
B21
B22
"
# $
%
& '
19
Fast Matrix Multiplication
To multiply two n-by-n matrices A and B: [Strassen 1969] Divide: partition A and B into ½n-by-½n blocks. Compute: 14 ½n-by-½n matrices via 10 matrix additions. Conquer: multiply 7 pairs of ½n-by-½n matrices, recursively. Combine: 7 products into 4 terms using 8 matrix additions.
Analysis. Assume n is a power of 2. T(n) = # arithmetic operations.
!
T (n) = 7T n /2( )recursive calls
1 2 4 3 4 + "(n
2)
add, subtract
1 2 4 3 4 # T (n) ="(n
log2 7) =O(n
2.81)
20
Fast Matrix Multiplication: Practice
Implementation issues. Sparsity. Caching effects. Numerical stability. Odd matrix dimensions. Crossover to classical algorithm around n = 128.
Common misperception. “Strassen is only a theoretical curiosity.” Apple reports 8x speedup on G4 Velocity Engine when n ≈ 2,500. Range of instances where it's useful is a subject of controversy.
Remark. Can "Strassenize" Ax = b, determinant, eigenvalues, SVD, ….
21
Begun, the decimal wars have. [Pan, Bini et al, Schönhage, …]
Fast Matrix Multiplication: Theory
Q. Multiply two 2-by-2 matrices with 7 scalar multiplications?
!
" (nlog3 21) = O(n
2.77)
!
O(n2.7801
)
!
"(nlog2 6) = O(n
2.59)
!
"(nlog2 7 ) =O(n
2.807)A. Yes! [Strassen 1969]
Q. Multiply two 2-by-2 matrices with 6 scalar multiplications?A. Impossible. [Hopcroft and Kerr 1971]
Q. Two 3-by-3 matrices with 21 scalar multiplications?A. Also impossible.
Two 48-by-48 matrices with 47,217 scalar multiplications.
December, 1979.
!
O(n2.521813
)
!
O(n2.521801
) January, 1980.
A year later.
!
O(n2.7799
)
Two 20-by-20 matrices with 4,460 scalar multiplications.
!
O(n2.805
)
22
Fast Matrix Multiplication: Theory
Best known. O(n2.376) [Coppersmith-Winograd, 1987]
Conjecture. O(n2+ε) for any ε > 0.
Caveat. Theoretical improvements to Strassen are progressivelyless practical.
5.6 Convolution and FFT
24
Fourier Analysis
Fourier theorem. [Fourier, Dirichlet, Riemann] Any periodic functioncan be expressed as the sum of a series of sinusoids. sufficiently smooth
t
N = 1N = 5N = 10N = 100
!
y(t) = 2
"
sin kt
kk=1
N
#
25
Euler's Identity
Sinusoids. Sum of sine an cosines.
Sinusoids. Sum of complex exponentials.
eix = cos x + i sin x
Euler's identity
26
Time Domain vs. Frequency Domain
Signal. [touch tone button 1]
Time domain.
Frequency domain.
!
y(t) = 12sin(2" # 697 t) + 1
2sin(2" # 1209 t)
Reference: Cleve Moler, Numerical Computing with MATLAB
frequency (Hz)
amplitude
0.5
time (seconds)
soundpressure
27
Time Domain vs. Frequency Domain
Signal. [recording, 8192 samples per second]
Magnitude of discrete Fourier transform.
Reference: Cleve Moler, Numerical Computing with MATLAB
28
Fast Fourier Transform
FFT. Fast way to convert between time-domain and frequency-domain.
Alternate viewpoint. Fast way to multiply and evaluate polynomials.
If you speed up any nontrivial algorithm by a factor of amillion or so the world will beat a path towards findinguseful applications for it. -Numerical Recipes
we take this approach
29
Fast Fourier Transform: Applications
Applications. Optics, acoustics, quantum physics, telecommunications, radar,
control systems, signal processing, speech recognition, datacompression, image processing, seismology, mass spectrometry…
Digital media. [DVD, JPEG, MP3, H.264] Medical diagnostics. [MRI, CT, PET scans, ultrasound] Numerical solutions to Poisson's equation. Shor's quantum factoring algorithm. …
The FFT is one of the truly great computationaldevelopments of [the 20th] century. It has changed theface of science and engineering so much that it is not anexaggeration to say that life as we know it would be verydifferent without the FFT. -Charles van Loan
30
Fast Fourier Transform: Brief History
Gauss (1805, 1866). Analyzed periodic motion of asteroid Ceres.
Runge-König (1924). Laid theoretical groundwork.
Danielson-Lanczos (1942). Efficient algorithm, x-ray crystallography.
Cooley-Tukey (1965). Monitoring nuclear tests in Soviet Union andtracking submarines. Rediscovered and popularized FFT.
Importance not fully realized until advent of digital computers.
31
Polynomials: Coefficient Representation
Polynomial. [coefficient representation]
Add. O(n) arithmetic operations.
Evaluate. O(n) using Horner's method.
Multiply (convolve). O(n2) using brute force.
!
A(x) = a0 + a1x + a2x2
+L+ an"1x
n"1
!
B(x) = b0 +b1x +b2x2
+L+ bn"1x
n"1
!
A(x)+ B(x) = (a0 +b0 )+ (a1 +b1)x +L+ (an"1 +b
n"1)xn"1
!
A(x) = a0 + (x (a1 + x (a2 +L+ x (an"2 + x (an"1))L))
!
A(x)" B(x) = ci xi
i =0
2n#2
$ , where ci = a j bi# j
j =0
i
$
32
A Modest PhD Dissertation Title
"New Proof of the Theorem That Every Algebraic RationalIntegral Function In One Variable can be Resolved intoReal Factors of the First or the Second Degree."
- PhD dissertation, 1799 the University of Helmstedt
33
Polynomials: Point-Value Representation
Fundamental theorem of algebra. [Gauss, PhD thesis] A degree npolynomial with complex coefficients has exactly n complex roots.
Corollary. A degree n-1 polynomial A(x) is uniquely specified by itsevaluation at n distinct values of x.
x
y
xj
yj = A(xj )
34
Polynomials: Point-Value Representation
Polynomial. [point-value representation]
Add. O(n) arithmetic operations.
Multiply (convolve). O(n), but need 2n-1 points.
Evaluate. O(n2) using Lagrange's formula.
!
A(x) : (x0, y0 ), K, (xn-1, yn"1)
B(x) : (x0, z0 ), K, (xn-1, zn"1)
!
A(x)+B(x) : (x0, y0 + z0 ), K, (xn-1, yn"1 + zn"1)
!
A(x) = yk
(x " x j )j#k
$
(xk " x j )j#k
$k=0
n"1
%
!
A(x) " B(x) : (x0, y0 " z0 ), K, (x2n-1, y2n#1" z2n#1)
35
Converting Between Two Polynomial Representations
Tradeoff. Fast evaluation or fast multiplication. We want both!
Goal. Efficient conversion between two representations ⇒ all ops fast.
coefficient
representation
O(n2)
multiply
O(n)
evaluate
point-value O(n) O(n2)
!
a0, a1, ..., an-1
!
(x0, y0 ), K, (xn"1, yn"1)
coefficient representation point-value representation
36
Converting Between Two Representations: Brute Force
Coefficient ⇒ point-value. Given a polynomial a0 + a1 x + ... + an-1 xn-1,evaluate it at n distinct points x0 , ..., xn-1.
Running time. O(n2) for matrix-vector multiply (or n Horner's).
!
y0
y1
y2
M
yn"1
#
$
% % % % % %
&
'
( ( ( ( ( (
=
1 x0
x0
2L x
0
n"1
1 x1
x1
2L x
1
n"1
1 x2
x2
2L x
2
n"1
M M M O M
1 xn"1 xn"12
L xn"1n"1
#
$
% % % % % %
&
'
( ( ( ( ( (
a0
a1
a2
M
an"1
#
$
% % % % % %
&
'
( ( ( ( ( (
37
Converting Between Two Representations: Brute Force
Point-value ⇒ coefficient. Given n distinct points x0, ... , xn-1 and valuesy0, ... , yn-1, find unique polynomial a0 + a1x + ... + an-1 xn-1, that has givenvalues at given points.
Running time. O(n3) for Gaussian elimination.
!
y0
y1
y2
M
yn"1
#
$
% % % % % %
&
'
( ( ( ( ( (
=
1 x0
x0
2L x
0
n"1
1 x1
x1
2L x
1
n"1
1 x2
x2
2L x
2
n"1
M M M O M
1 xn"1 xn"12
L xn"1n"1
#
$
% % % % % %
&
'
( ( ( ( ( (
a0
a1
a2
M
an"1
#
$
% % % % % %
&
'
( ( ( ( ( (
Vandermonde matrix is invertible iff xi distinct
or O(n2.376) via fast matrix multiplication
38
Divide-and-Conquer
Decimation in frequency. Break up polynomial into low and high powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Alow(x) = a0 + a1x + a2x2 + a3x3. Ahigh (x) = a4 + a5x + a6x2 + a7x3. A(x) = Alow(x) + x4 Ahigh(x).
Decimation in time. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2).
39
Coefficient to Point-Value Representation: Intuition
Coefficient ⇒ point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1,evaluate it at n distinct points x0 , ..., xn-1.
Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).
Intuition. Choose two points to be ±1. A( 1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). Can evaluate polynomial of degree ≤ n
at 2 points by evaluating two polynomialsof degree ≤ ½n at 1 point.
we get to choose which ones!
40
Coefficient to Point-Value Representation: Intuition
Coefficient ⇒ point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1,evaluate it at n distinct points x0 , ..., xn-1.
Divide. Break polynomial up into even and odd powers. A(x) = a0 + a1x + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7. Aeven(x) = a0 + a2x + a4x2 + a6x3. Aodd (x) = a1 + a3x + a5x2 + a7x3. A(x) = Aeven(x2) + x Aodd(x2). A(-x) = Aeven(x2) - x Aodd(x2).
Intuition. Choose four complex points to be ±1, ±i. A(1) = Aeven(1) + 1 Aodd(1). A(-1) = Aeven(1) - 1 Aodd(1). A( i ) = Aeven(-1) + i Aodd(-1). A( -i ) = Aeven(-1) - i Aodd(-1).
Can evaluate polynomial of degree ≤ nat 4 points by evaluating two polynomialsof degree ≤ ½n at 2 points.
we get to choose which ones!
41
Discrete Fourier Transform
Coefficient ⇒ point-value. Given a polynomial a0 + a1x + ... + an-1 xn-1,evaluate it at n distinct points x0 , ..., xn-1.
Key idea. Choose xk = ωk where ω is principal nth root of unity.
DFT
!
y0
y1
y2
y3
M
yn"1
#
$
% % % % % % %
&
'
( ( ( ( ( ( (
=
1 1 1 1 L 1
1 )1 )2 )3L )n"1
1 )2 )4 )6L )2(n"1)
1 )3 )6 )9L )3(n"1)
M M M M O M
1 )n"1 )2(n"1) )3(n"1)L )(n"1)(n"1)
#
$
% % % % % % %
&
'
( ( ( ( ( ( (
a0
a1
a2
a3
M
an"1
#
$
% % % % % % %
&
'
( ( ( ( ( ( (
Fourier matrix Fn
42
Roots of Unity
Def. An nth root of unity is a complex number x such that xn = 1.
Fact. The nth roots of unity are: ω0, ω1, …, ωn-1 where ω = e 2π i / n.Pf. (ωk)n = (e 2π i k / n) n = (e π i ) 2k = (-1) 2k = 1.
Fact. The ½nth roots of unity are: ν0, ν1, …, νn/2-1 where ν = ω2 = e 4π i / n.
ω0 = ν0 = 1
ω1
ω2 = ν1 = i
ω3
ω4 = ν2 = -1
ω5
ω6 = ν3 = -i
ω7
n = 8
43
Fast Fourier Transform
Goal. Evaluate a degree n-1 polynomial A(x) = a0 + ... + an-1 xn-1 at itsnth roots of unity: ω0, ω1, …, ωn-1.
Divide. Break up polynomial into even and odd powers. Aeven(x) = a0 + a2x + a4x2 + … + an-2 x n/2 - 1. Aodd (x) = a1 + a3x + a5x2 + … + an-1 x n/2 - 1. A(x) = Aeven(x2) + x Aodd(x2).
Conquer. Evaluate Aeven(x) and Aodd(x) at the ½nth
roots of unity: ν0, ν1, …, νn/2-1.
Combine. A(ω k) = Aeven(ν k) + ω k Aodd (ν k), 0 ≤ k < n/2 A(ω k+ ½n) = Aeven(ν k) – ω k Aodd (ν k), 0 ≤ k < n/2
ωk+ ½n = -ωk
νk = (ωk )2
νk = (ωk + ½n )2
44
fft(n, a0,a1,…,an-1) { if (n == 1) return a0
(e0,e1,…,en/2-1) ← FFT(n/2, a0,a2,a4,…,an-2) (d0,d1,…,dn/2-1) ← FFT(n/2, a1,a3,a5,…,an-1)
for k = 0 to n/2 - 1 { ωk ← e2πik/n
yk+n/2 ← ek + ωk dk yk+n/2 ← ek - ωk dk }
return (y0,y1,…,yn-1)}
FFT Algorithm
45
FFT Summary
Theorem. FFT algorithm evaluates a degree n-1 polynomial at each ofthe nth roots of unity in O(n log n) steps.
Running time.
!
a0, a1, ..., an-1
!
("0, y0 ), ..., ("
n#1, yn#1)
O(n log n)
coefficientrepresentation
point-valuerepresentation
!
T (n) = 2T (n /2) + "(n) # T (n) = "(n logn)
???
assumes n is a power of 2
46
Recursion Tree
a0, a1, a2, a3, a4, a5, a6, a7
a1, a3, a5, a7a0, a2, a4, a6
a3, a7a1, a5a0, a4 a2, a6
a0 a4 a2 a6 a1 a5 a3 a7
"bit-reversed" order
000 100 010 110 001 101 011 111
perfect shuffle
47
Inverse Discrete Fourier Transform
Point-value ⇒ coefficient. Given n distinct points x0, ... , xn-1 and valuesy0, ... , yn-1, find unique polynomial a0 + a1x + ... + an-1 xn-1, that has givenvalues at given points.
Inverse DFT
!
a0
a1
a2
a3
M
an"1
#
$
% % % % % % %
&
'
( ( ( ( ( ( (
=
1 1 1 1 L 1
1 )1 )2 )3L )n"1
1 )2 )4 )6L )2(n"1)
1 )3 )6 )9L )3(n"1)
M M M M O M
1 )n"1 )2(n"1) )3(n"1)L )(n"1)(n"1)
#
$
% % % % % % %
&
'
( ( ( ( ( ( (
"1
y0
y1
y2
y3
M
yn"1
#
$
% % % % % % %
&
'
( ( ( ( ( ( (
Fourier matrix inverse (Fn) -1
48
Claim. Inverse of Fourier matrix Fn is given by following formula.
Consequence. To compute inverse FFT, apply same algorithm but use ω-1 = e -2π i / n as principal nth root of unity (and divide by n).
!
Gn
=1
n
1 1 1 1 L 1
1 "#1 "#2 "#3L "#(n#1)
1 "#2 "#4 "#6L "#2(n#1)
1 "#3 "#6 "#9L "#3(n#1)
M M M M O M
1 "#(n#1) "#2(n#1) "#3(n#1)L "#(n#1)(n#1)
$
%
& & & & & & &
'
(
) ) ) ) ) ) )
Inverse DFT
!
1
nFn
is unitary
49
Inverse FFT: Proof of Correctness
Claim. Fn and Gn are inverses.Pf.
Summation lemma. Let ω be a principal nth root of unity. Then
Pf. If k is a multiple of n then ωk = 1 ⇒ series sums to n. Each nth root of unity ωk is a root of xn - 1 = (x - 1) (1 + x + x2 + ... + xn-1). if ωk ≠ 1 we have: 1 + ωk + ωk(2) + … + ωk(n-1) = 0 ⇒ series sums to 0. ▪
!
" k j
j=0
n#1
$ =n if k % 0 mod n
0 otherwise
& ' (
!
Fn Gn( ) k " k = 1
n#k j #$ j " k
j=0
n$1
% = 1
n#(k$ " k ) j
j=0
n$1
% = 1 if k = " k
0 otherwise
& ' (
summation lemma
50
Inverse FFT: Algorithm
ifft(n, a0,a1,…,an-1) { if (n == 1) return a0
(e0,e1,…,en/2-1) ← FFT(n/2, a0,a2,a4,…,an-2) (d0,d1,…,dn/2-1) ← FFT(n/2, a1,a3,a5,…,an-1)
for k = 0 to n/2 - 1 { ωk ← e-2πik/n
yk+n/2 ← (ek + ωk dk) / n yk+n/2 ← (ek - ωk dk) / n }
return (y0,y1,…,yn-1)}
51
Inverse FFT Summary
Theorem. Inverse FFT algorithm interpolates a degree n-1 polynomialgiven values at each of the nth roots of unity in O(n log n) steps.
assumes n is a power of 2
!
a0, a1,K, an-1
!
("0, y0 ), K, ("
n#1, yn#1)
O(n log n)
coefficientrepresentation
O(n log n) point-valuerepresentation
52
Polynomial Multiplication
Theorem. Can multiply two degree n-1 polynomials in O(n log n) steps.
!
a0, a1,K, an-1
b0, b1,K, bn-1
!
c0, c1,K, c2n-2
!
A("0), ..., A("
2n#1)
B(" 0), ..., B(" 2n#1
)
!
C("0), ..., C("
2n#1)
O(n)
point-value multiplication
O(n log n)2 FFTs inverse FFT O(n log n)
coefficientrepresentation coefficient
representation
pad with 0s to make n a power of 2
53
FFT in Practice ?
54
FFT in Practice
Fastest Fourier transform in the West. [Frigo and Johnson] Optimized C library. Features: DFT, DCT, real, complex, any size, any dimension. Won 1999 Wilkinson Prize for Numerical Software. Portable, competitive with vendor-tuned code.
Implementation details. Instead of executing predetermined algorithm, it evaluates your
hardware and uses a special-purpose compiler to generate anoptimized algorithm catered to "shape" of the problem.
Core algorithm is nonrecursive version of Cooley-Tukey. O(n log n), even for prime sizes.
Reference: http://www.fftw.org
55
Integer Multiplication, Redux
Integer multiplication. Given two n bit integers a = an-1 … a1a0 andb = bn-1 … b1b0, compute their product a ⋅ b.
Convolution algorithm. Form two polynomials. Note: a = A(2), b = B(2). Compute C(x) = A(x) ⋅ B(x). Evaluate C(2) = a ⋅ b. Running time: O(n log n) complex arithmetic operations.
Theory. [Schönhage-Strassen 1971] O(n log n log log n) bit operations.Theory. [Fürer 2007] O(n log n 2O(log *n)) bit operations.
!
A(x) = a0 + a1x + a2x2
+L+ an"1x
n"1
!
B(x) = b0 +b1x +b2x2
+L+ bn"1x
n"1
56
Integer Multiplication, Redux
Integer multiplication. Given two n bit integers a = an-1 … a1a0 andb = bn-1 … b1b0, compute their product a ⋅ b.
Practice. [GNU Multiple Precision Arithmetic Library]It uses brute force, Karatsuba, and FFT, depending on the size of n.
"the fastest bignum library on the planet"
57
Integer Arithmetic
Fundamental open question. What is complexity of arithmetic?
addition
Operation
O(n)
Upper Bound
Ω(n)
Lower Bound
multiplication O(n log n 2O(log*n)) Ω(n)
division O(n log n 2O(log*n)) Ω(n)
58
Factoring
Factoring. Given an n-bit integer, find its prime factorization.
2773 = 47 × 59
267-1 = 147573952589676412927 = 193707721 × 761838257287
RSA-704($30,000 prize if you can factor)
74037563479561712828046796097429573142593188889231289084936232638972765034028266276891996419625117843995894330502127585370118968098286733173273108930900552505116877063299072396380786710086096962537934650563796359
a disproof of Mersenne's conjecture that 267 - 1 is prime
59
Factoring and RSA
Primality. Given an n-bit integer, is it prime?Factoring. Given an n-bit integer, find its prime factorization.
Significance. Efficient primality testing ⇒ can implement RSA.Significance. Efficient factoring ⇒ can break RSA.
Theorem. [AKS 2002] Poly-time algorithm for primality testing.
60
Shor's Algorithm
Shor's algorithm. Can factor an n-bit integer in O(n3) time on aquantum computer.
Ramification. At least one of the following is wrong: RSA is secure. Textbook quantum mechanics. Extending Church-Turing thesis.
algorithm uses quantum QFT !
61
Shor's Factoring Algorithm
Period finding.
Theorem. [Euler] Let p and q be prime, and let N = p q. Then, thefollowing sequence repeats with a period divisible by (p-1) (q-1):
Consequence. If we can learn something about the period of thesequence, we can learn something about the divisors of (p-1) (q-1).
by using random values of x, we get the divisors of (p-1) (q-1),and from this, can get the divisors of N = p q
1 2 4 8 16 32 64 128 …2 i
1 2 4 8 1 2 4 8 …2 i mod 15
1 2 4 8 16 11 1 2 …2 i mod 21period = 4
period = 6
x mod N, x2 mod N, x3 mod N, x4 mod N, …
Extra Slides
63
Fourier Matrix Decomposition
!
y = Fn a = In /2
Dn /2
In /2"Dn /2
#
$ %
&
' (
Fn /2aeven
Fn /2aodd
#
$ %
&
' (
!
I4
=
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
"
#
$ $ $ $
%
&
' ' ' '
!
D4
=
"00 0 0
0 "10 0
0 0 "20
0 0 0 "3
#
$
% % % %
&
'
( ( ( (
!
Fn
=
1 1 1 1 L 1
1 "1 "2 "3L "n#1
1 "2 "4 "6L "2(n#1)
1 "3 "6 "9L "3(n#1)
M M M M O M
1 "n#1 "2(n#1) "3(n#1)L "(n#1)(n#1)
$
%
& & & & & & &
'
(
) ) ) ) ) ) )
!
a =
a0
a1
a2
a3
"
#
$ $ $ $
%
&
' ' ' '