some mathematical tools for machine learning (part 1)€¦ · some mathematical tools for machine...
Post on 29-May-2020
11 Views
Preview:
TRANSCRIPT
Some Mathematical Tools
for Machine Learning
Chris Burges
Microsoft Research
February 13th, 2009
Contents – Part 1
1.Lagrange Multipliers: An Overview
2.Basic Concepts in Functional Analysis
3.Some Notes on Matrix Analysis
4.Convex Optimization: A Brief Tour
A Brief Diversion
Leslie Lamport: “How to Write a Proof”: DEC tech report, Feb 14, 1993
Abstract: A method of writing proofs is proposed that makes it much harder
to prove things that are not true. The method, based on hierarchical
structuring, is simple and practical.
“Anecdotal evidence suggests that as many as a third of all papers
published in mathematical journals contain mistakes – not just minor errors,
but incorrect theorems and proofs.”
Lagrange Multipliers:
An Overview, and Some Examples
Lagrange the Mathematician
• Born 1736 in Turin, one of two of 11 to survive infancy
• “Responsible for much fine mathematics published under the names
of other mathematicians”
• Believed that a mathematician has not thoroughly understood his
own work till he has made it so clear that he can go out and explain
it to the first person he meets on the street
• Worked on mechanics, calculus, the calculus of variations
astronomy, probability, group theory, and number theory
• At least partly responsible for the choice of base 10 for the metric
system, rather than 12
• Supported by Euler and d’Alembert, financed by Frederick and Louis
XIV, close to Lavoisier, Marie Antoinette
An indirect approach can be easier( ) ( ) 1,
0,
1,
Example: Minimize subject to
If could rotate to coordinate system and rescale so that
constraints take the form substitute with a parameterization
that encodes the c
nf x c x x Ax x
A
y y
1
1 1 2 1
2 1 2 1
:
sin sin sin
sin sin cos
onstraints that lives on
solve, and then apply the inverse mapping. But this can get
very complicated!
It is often not even possible to parameterize co
n
n
n
x
y
y
S
nstraints (for
example, polynomial constraints in several variables)
One equality constraint
2( ) ( ,) 0 .Minimize subject to f x xc x
0f 1f
2f
3f
( ) 0c x
f
c
One equality constraint, cont.
,
( ) 0
f c
f c
L f c
Hence at the optimum, we must have or:
0f 1f
2f
3f
( ) 0c x
f
c
Multiple equality constraints
( ) 0, 1, , .
( ) ( ).
,
constraints:
Define gradients:
Let be the subspace spanned by the and let be its
orthogonal complement.
Suppose that at some point, all constraints hold, and
i
i i
i
n c x i n
g x c x
S g S
0
0, ( ) : ( ) 0
Then can increase (or decrease) by moving along
Hence or: i i i i
i i
f
f f
f f c x L f c
Puzzle: why not multiple Lagrangians?
One inequality constraint
*
*
( ) ( ) 0.
( ) 0.
( ) ', ( ) 0.)
Find that minimizes subject to
What's new? At the solution, it's possible that
(Simple but not guaranteed: solve 'minimize check that
x f x c x
c x
f x c x
0f
1f
2f
3f
( ) 0c x
f
c
( ) 0c x
( ) 0c x
: ( ) 0, 0f c f c
Multiple inequality constraints
( ) 0, 0i i ii
f c
Suppose that at the solution
Then removing makes no difference, and we must drop
from the sum in
Equivalently we can set
Hence, always impose
A simple example
21 11 2 1 2
2 2
1 1 2 1 2 1 2 2
2 2 2
1 2 1 2 1 1 2 2
1 1 2 1 1
:
: , ,
( , ) 1 0, ( , ) 1
( , ) (1 ) (1 )
0 ( )
Extremize the distance between two points on
Embed in extremize
subject to
n
n n
i i
i
f x x x x
c x x x c x x x
L x x f c x x x x
L x x x
n
S
R
2 2 1 2 2
2 1 1 1 2 2
0
0 ( ) 0
(1 ) (1 )
2 0.
,
antipodal or equal: or i
L x x x
x x x x
Another simple example
c
Given a parallelogram whose sidelengths you can choose but
whose perimeter is fixed - what shape has the largest area?
h
b
2( )
( , ) (2( ) )
0
bh b h c
L b h bh b h c
L b h
Maximize subject to
Again, not explicitly needed: hence “method of
undetermined multipliers”
Simple exercises
2
2
1.
1 0 0
Puzzle: what coefficients maximize a convex sum of fixed numbers?
Puzzle: minimize subject to
Puzzle: maximize subject to and (hint: use )
i i
i i
i i i i i
i i
x x
x x x x
Resource Allocation
Distortion
Rate
inNumber of widgets
( )
Cost
i ic n
Fiber has bit errors per second, and sends bits total, per second. We wish to maximize the bit rate at a fixed distortion rate:
Puzzle: Does this work for economics, too?
A variational problem
An isoperimetric problem: find the curve of fixed length and
fixed endpoints {a,b} that encloses maximum area above [a,b].
x=1x=0
y
1 12 1/ 2
0 0
1 12 1/ 2
0 0
1 12 1/ 2
0 0
12 1/ 2
0
12 3/ 2
0
2 3/ 2
(1 ' )
(1 ' )
(1 ' ) ' '
1 '(1 ' )
1 ''(1 ' )
1 ''(1 ' ) 1 0
Area , length y dx y dx
L y dx y dx
L y dx y y y dx
dy y y dx
dx
y y y dx
y y
…straight line, or arc of circle.
Which univariate distribution has max entropy?
( ) log ( )f x f x dx
Minimize
1
2
2
( ) 0 ,
( ) 1,
( ) ,
( )
f x x
f x dx
x f x dx c
x f x dx c
subject to:
( )( )
( )
functional derivative
g xx y
g y
Need :
2
1 2( ) log ( ) (1 ( ) ) ( ) ( )L f x f x dx f x dx x f x dx x f x dx
2
1 20, log ( ) log( ) 0( )
Lx f y e y y
f y
Impose integrate w.r.t
→ f must be Gaussian!
Which univariate distribution has max
entropy?
Puzzle: I thought the uniform distribution has max entropy.
What’s going on?
Puzzle: What distribution do you get if you fix the mean,
but not the variance?
Puzzle: What distribution do you get if you fix only the
function’s support?
Max Entropy for Discrete Distribn + Linear
Constraints
: 1, 0
Have discrete distribution
Suppose you also have known linear constraints:
but you are maximally uncertain about everything else. So want max
entropy distribution subject to
i i ii
ij j ji
P P P
a P C
log 1
0 exp( 1 ) (1/ )exp( )
these constraints.
i i j ij j j i i ii j i i i
k k j kj j kjj j
L P P a P C P P
L P a Z a
→ logistic regression!
Are Lagrange Multipliers Really That Common?
Yes.
• Most flavors of Support Vector Machines
• Principal Component Analysis
• Canonical Correlation Analysis
• Locally Linear Embedding
• Laplacian Eigenmaps
• …
For example:
Basic Concepts in
Functional Analysis:
A Brief Tour
of Hilbert spaces, Norms,
and All That
What is a Field?
{ , , }: , { , } F F a set operations
{ , } is an Abelian group with identity denoted by 0F
{ 0, }F is an Abelian group with identity denoted by 1
( ) { , , }x y z x y x z x y z F
A field generalizes the notion of arithmetic on reals.
Field : Examples
F
F
F
R
Q
C
With + meaning addition and meaning multiplication,
the reals:
the rationals:
the complex numbers:
2
{0
{
,1
: 0
}
} { , }?
2
+
Puzzle 1: For , why doesn't using "
with + meaning "X
OR" for + work?
Pu
What's the s
zzle 2: Is
OR" a
mallest fiel
nd meaning "AND"
a field und
?
er
d
x x
Z
Z
How Many Fields Are There?
p
2
Actually infinitely many - e.g. , prime.
However, can define an 'ordering;' for fields (like <, =, > for ).
, can be ordered; and cannot; in fact all finite fields
cannot be ordered.
For orde
pZ
R
R Q C Z
.
red fields, 'supremum' and 'infimum' can be defined.
An ordered field is 'complete' iff every nonempty subset of
that has an upper bound in also has a supremum in
is not complete, but is. In
F
F F
Q R fact: every complete, ordered
field is isomorphic to !R
However, can define an ordering for some fields.
What is a Vector Space?
,
( , )
( , )
;
( ) (
A vector space is a nonempty set a field and operations 'addition'
( from ) and 'multiplication by a scalar'
( from into ) such that:
(a)
(b)
E F
x y x y E E E
x x F E E
x y y x
x y z z y
);
, , ;
) ( ) ;
) ;
( ) ;
1 .
(c) For every there exists such that
(d) (
(e) (
(f)
(g)
z
x y E z E x y z
x x
x x x
x y x y
x x
nA vector space generalizes the notion of vectors in nR
Vector Spaces: Field Matters!
{ , } { , } );
{ , } { , } );
{ , } { , } 2 );
(dimension
(dimension
(dimension
N
N
N
E F N
E F N
E F N
{ , }, 1
{ , }
:
For the vector space vectors and are linearly
independent.
For
'Linear
the vec
dependence' depends on field
tor space they are not.
i
F
Vector Spaces: More Examples
( )( ) ( ) ( );
( )( ) ( ).
f g x f x g x
f x f x
(2) (complex by matrices), over the field ;mnM m n
(1) Functions, whose range is a vector space, also
themselves form a vector space:
1
1 1
1,
( ),
(3) : for the space of all infinite sequences of
complex numbers such that
p
p
n
n
n n n
n n
l p
z
x y x y x x
What is an Inner Product?
, :
, , ,
, 0
, 0 0
, , ,
, ,
, ,
Let be a vector space over or . A function
is an if for all
(a)
(b)
(c)
(d) for all scalars
inner produ
)
ct
(e
V V V F
x y z V
x x
x x x
x y z x z y z
cx y c x y c F
x y y x
The inner product generalizes the notion of dot product
Inner Product: Examples
2 1
{ , } ,
,
, ( )
,
(1) Vector space with , defined by
(2) Vector space over with , defined by
(3) Vector space of matrices over with
( )
Ni ii
i ii
T
pm
x y x y
l x y x y
X Y Trace X Y
X Y M
Inner Product: Trace
2
2
( ) 0
( ) 0
(( ) , ) ( )
, 0
, 0 0
, , , :
,
(
,
)
(a)
(b)
(c)
(d) f
Ti
i
Ti
i
T T T
X X
X X x
x y
Tr X X X
Tr X X X
Tr Xz x z Yy z
cx y c
Z Tr X Z Tr Y Z
x y
( ) ( )
( ) ( ), ,
or all scalars
(e)
T T
T T
Tr X Y Tr X Y
Tr X Y Tr Yx X
c F
y y x
2
1
1/ 2
| , | , , ,
| , |cos : 0
( , , )
Cauchy Schartz inequality holds for any inner product , :
for all
"Angle" between two vectors can be defined for any inner product:
x y x x y y x y V
x y
x x y y
2
1 2 1 0
3 4 1 2E.g. the angle between and is 78.4 degrees !
Inner Product is General
What is a Norm?
:
, ,
0
0 0
| |
Let be a vector space over or . A function is
a norm if for all
(a)
(b)
(c) for all scalars
(d)
If condition (b) is droppe
Wh
d, it's a 'seminorm'.
at is the
V V
x y V
x
x x
cx c x c F
x y x y
simplest seminorm? : 0Ans x
Seminorm splits the space
0
1 0 1
1.
{ : 0}
,
0.
If is a seminorm on , then is a subspace
(called the null space).
If is a subspace of such that then is a norm
on
Define The corresponding cosets form a vec
V V v V v
V V V V
V
x y x y
5 , ' , (3,2,1,0,0),
(0,0,0,1,0) (0,0,0,0,1).
tor
space, and on that sp
Example seminorm on : null space
spanned by
ace, is a no
and
rm.
x Dx D diag
2 2 21 2 1 2
1/
( , , , )
{ } . .n=1
for is the Euclidean
norm.
Let The function defined by is a norm in
This works for finite sums too: define the norm on as:
nn n
pp
n p n p
np
x x x x x x x x
z z l z z l
l
x
1/
1 1 2| | | | | |
max({ })
{ }.
i=1
The norm is the 'Manhattan' norm
The norm is the 'max' norm
A normed vector space is a pair vector space, norm
np
p
i
n
i
x
l x x x x
l x x
Norm Generalizes Length
What is convexity?
1 2 1
1 2( ) ( ).
A function is if, between and , it lies below the
chord joi
convex
strictly convexning and It is if it lies strictly below.
x x x
f x f x
1 2 1 2((1 ) ) (1 ) ( ) ( )
0 1
f x x f x f x
.
A set is if, for all and , all points on the line
joining and
convex
lie in
S a S b S
a b S
1x
2x
When is a sphere a square?
2
1
: the 1-sphere
: the
Green
1-sphere
: the
Blu
Red
1-spher
e
e
l
l
l
Interesting... each ball looks convex
When is a sphere a cross?
0
0
0
0 0,
0
| | 1.0
In this context, never. The " norm" (count the number of
nonzero elements) is not a norm (for over or ).
if
if
unless
n
l
v
v
v
00? lim ?
p
p iix x
All norms are convex functions
1 2 1 2
1 2
1 2
(1 ) 1
| 1 | | |
1
x x x x
x x
x x
1 2
1 2 1 2
2
( ) ( ) 0
: ( ) 0 . ( 0 ( 0,
((1 ) ) (1 ) ( ) ( ) 0
1,
If is convex, then defines a convex set:
let Then if ) and )
the unit ball, for any norm is a convex region.
f x f x
R x f x f x f x
f x x f x f x
x
Open, Closed, Compact
, :
: , 0 . . ( , )
:
: 0 ( ,0)
Given a norm and a set in a vector space
n
n
i
n S V
Open x S s t B x S
Closed complement of open
Bounded r suchthat S B r
Compact : Every sequence x in S contains convergent subsequence
with limit in S
O
1 2, , , , , ,
Compact sets are closed and bounded.
For finite dimensional normed vector spaces, every closed bounded
set is compa
NN i i
R : Every cover has a finite sub - cover :
S S S S S N finite suchthat S S
,
ct.
If for a normed vector space the unit ball is compact, then has
finite dimension.
V V
Making your own norm
In fact, in real finite dimensional spaces, any symmetric,
compact, convex region centered on the origin defines
a norm (as the unit ball for that norm):
0Rescuing : the Hamming norml
2
2
0
.
:
( )
( ) ( )
1
.
0
Consider n-tuples taking values in These form a vector
space over the field for example,
Now define the norm to be the number of nonzero elements
of Is
x y x y
x x
x x
l x N
x
Z
Z
0
0
0 0
0 0 0
0
0 0
| |
this a norm?
(a)
(b)
(c)
(d)
x
x x
cx c x
x y x y
Hamming norm, cont.
2
2
1)
0
Puzzle: ( - how can this be correct?
Puzzle: What is the subtraction operation, in ?
Puzzle: What is the Hamming distance ?
N
x y
Z
Z
When does a norm come from an inner product?
2 2
,
1,
4
Every inner product defines a norm:
Does every norm define an inner product? If so, for real vector spaces,
No! A necessary and sufficient condition for a norm to correspond t
x x x
x y x y x y
2 2 2 2
o an
inner product is the paralellogram identity:
1
2
(Jordan and von Neumann, 1935)
x y x y x y
Lp norms, inner products
1/
2 /2
2 // 2
2 // 2
1 2 2 1 1 2 2 1 1 2 2
| | , | |
| | ? ,
, | | : , ,
, | ( ) | , ,
2.
1
where is integrable
E.g. try then but
unless
p
pp p
L
pp
pp
pp
f f f
f f f f
f g fg f g f g
f f g f f g f g f g
p
Lp norms, inner products cont.
2 2
1
)
1,
4
Maybe we could find some other inner product that works for
all ?
No: if a norm is derivable from an inner product (over , then
. Choose two functions:
p
x y x y x y
1
1
1
1
( ) 0 : 0
( ) 1 : 0 1
( ) 2 : 1 2
( ) 0 : 2
f x x
f x x
f x x
f x x
2
2
2
0 : 1
1 : 1 2
0 : 2
f x
f x
f x
1 2
1
Lp norms, inner products cont.
We’d like:
1 2 3 4 5 6 7 8 9 101.5
1.6
1.7
1.8
1.9
2
2.1
2.2
2.3
2.4
2.5
p
norm on Rn has no inner product
2
2 2 2 2
2 2 2 22 2
: [1,0], [0,2]
1?
2
1 1(2 2 ) 4 1 4 5
2 2
[1,0,0, ,0], [0,2,0, ,0]
Example in
Use parallelogram test:
Extend to : n
x y
x y x y x y
x y x y x y
x y
R
R
What is a Cauchy Sequence?
{ }
, .
Cauchy sequenc
A sequence of vectors in a normed vector space is called
a if for every >0 there exists a number
such that for all
e
n
m n
x
M
x x m n M
1x2x 3x
4x 5x 6x
Key idea: the Cauchy sequence
allows us to define notions of
convergence without ever
leaving the space
Cauchy sequences, cont.
Every convergent sequence is a Cauchy sequence.
Not every Cauchy sequence is a convergent sequence.
Note Cauchy sequence requires choice of norm!
[0,1]
2 3
(0,1) [0,1].
max | ( ) | .
( ) 12! 3! !
1,2, { }
(0,1)
: the space of polynomials on Choose the
norm Define
for . Then is a Cauchy sequence but it does
not converge in becaus
n
n
n
l
P P x
x x xP x x
n
n P
P
P e its limit is not a polynomial.
The notion of completeness
1 .
A normed vector space is called if every Cauchy
sequence in converges to an element of .
complete
Banach spaA complete normed space is called a .
with norm is complete, for all
The se
ce
q
np
E
E E
l p
2 1
2
1
[ , ]
[ , ]
uence space is complete, for all .
with norm is complete.
with norm or norm is not complete.
"All square integrable functions with norm" is complete.
A complete space "con
pl p
C a b L
C a b L L
L
tains all its limit points."
It is always possible to 'complete' a non-complete space.
Hilbert and Banach spaces
A Hilbert space is a complete inner product space.
Normed Inner Product
Banach Hilbert
[ , ] : ,C a b x y xy
[0,1]Polynomials on with max norm
Hilbert and Banach spaces, cont.
Normed
Banach
Hilbert
2[ , ] with
norm
C a b L
Polynomials on [0,1], max norm
[ , ] with normC a b L2pL
2pl
Finite dim. inner product spaces
2, all square
integrable fns
L
2l
How many kinds of Hilbert spaces are there?
1 2 1,2
1
: ,
) ( ) ( ) ,linear mappin
A mapping where are vector spaces, is called
a if ( and
all scalar .
g
s ,
T E E E
T x y T x T y x y E
1 2
1 2
1
( ), ( ) ,
, .
A Hilbert space is said to be to a Hilbert space
if there exists a 1-1 linear mapping from to such that
for every Such a
isomorphic
Hilbert spac is called a e i
H H
T H H
T x T y x y
x y H T
1 2.of
somo
onto
rp ism
h
H H
How many kinds of Hilbert spaces are there?
2 2[ , ]E.g. , are separable Hilbert spaces.l L a b
2.
, .
Let be separable:
If is infinite dimensional, then it is isomorphic to
If has dimension then it is is
Answe
omorp
r: 2..
hic to
.
N
H
H l
H N
A Hilbert space H is called separable if and only if it admits a
countable orthonormal basis. (So, all finite dimensional
Hilbert spaces are separable).
The Riesz Representation Theorem
1sup | ( ) |
. . ( ) )
The of a linear functional is defined:
A linear functional on a normed vector space is bounded
( if and only if its operator
norm
operato
is finite
r
.
norm
x
f
f f x
K s t f x K x x V
{ , }
: .
A linear on a normed vector space is
a linear mappi
functi
ng
onal V F
V F
The Riesz Representation Theorem
0 0
0
.
( ) ,
, .
Let be a bounded linear functional on a Hilbert space
Then there exists exactly one such that
for all and in fact
f H
x H f x x x
x H f x
2
2
[ , ], .
( ) ( )
? ( ) ( ) ( ) ( ) ( ) ( ) ( )
? ( ) ( ) ( ) ( )1
( )
Example: Define a linear functional by
b
a
b b b
a a a
b b b
a a a
b
a
H L a b a b
f x x t dt
Linear f x y x t y t dt x t dt y t dt f x f y
Bounded f x x t dt x t dt x t dt
x t dt
1 12 2
1
b
a
dt b a x
Riesz Representation Theorem, cont.
0 0
0
1/ 22
0
2
1 1
1: ,1 ( ).1 ( ) ( )
?
1
sup | ( ) | sup | ( ) | sup | ( ) | : ( ) 1
1/ ,
Can we find ? Try
Check:
The sup is found by choosing
b b
a a
b
a
b b b
x x
a a a
x x x x t dt x t dt f x
f x
x b a
f f x x t dt
x b a
x t dt x t dt
a
, 0 otherwise.
f
x
b a
t b
A Brief Look Ahead
The Riesz representation theorem can be used to show that any Hilbert
space for which the evaluation functional is continuous, is a Reproducing
Kernel Hilbert Space: there exists K such that
In particular:
Representer Theorem (Kimeldorf and Wahba, 1971; Schölkopf and
Smola, 2002): Let be a strictly monotonic increasing function
and let be an arbitrary loss function. Then each minimizer of
the “regularized risk”
admits a representation of the form
What is a metric space?
( , )
( , ) ,
( , ) ( , ) ;
For any set , let be a function (with range in ) defined on
the set of all ordered pairs of members of satisfying:
(i) is a finite real number for every pair of
(i
E x y
E E x y E
x y x y E E
( , ) 0 ;
( , ) ( , ) ( , ), { , , } .
:
i)
(iii)
Such a function is called a on ; a set with
metric is called a . Different choices of m
metric
metric sp etric on
give different me
ace
t
x y x y
y z x y x z x y z E
E E E E
E
ric spaces.
( , ) 0?Puzzle: What about x y
( , ) ( , )?Puzzle: How about x y y x
( , ) ( , ) ( , )?Puzzle: Where's the triangle inequality: x y x z z y
Metric versus norm
( , )Every normed vector space is a metric space: define
But metric spaces are much more general:
x y x y
Normed spaces
Metric spaces
2.2 1.0
3.4
4.5 3.1
6.0
1.1 0.5
5.9
1.1 6.4
2.3
9.6
19.5d
( ( )) ( )Metrics extend "continuity": f B x B f x
Some metrics on function spaces
12
[ , ]
1
2
2
:[ , ]
, , ( , ) sup ( ) ( )
[ , ] : ( , ) ( ) ( ) ,
( , ) ( ) ( ( , ) ?) ,
Let be the set of all bounded functions . For two
points is a metric.
Let be
Sup
x a b
b
ap
a
b
A f a b
f g A f g f x g x
A C a b f g f x g x dx
f g f x g x dx f g
R.
( ) ( ), [ , ]
[ , ] :
( , ) sup ( ) ( ) , '( ) '( ) , , ( ) ( )
pose instead thenr
r rr x a b
A C a b
f g f x g x f x g x f x g x
Topological Spaces
, :
,
,
A is a non-empty set, a fixed collection
of subsets of satisfying
(1) ,
(2) Intersection of any two sets in is in ,
(3) Unio
topological space
n of any collection of sets in is
T A A
A
A
S S a
S
S i S
S
1 1
,
.
,
in .
is called a topology for and the members of are called the
of
Topological spaces are more general than metric spaces!
Topological spaces extend "continuity": Given an
open
sets
A
T
T A 1
S.
S S
S 2 2
11 2 2 1
,
: , ( )
d and a
map is "continuous" if
T A
A A U A U A
2S
..
Continuity, convergence, connectedness… without distance!
Putting Spaces in their Places…
Hilbert spaces
Banach spaces
Normed vector spaces
Metric spaces
Topological spaces
, : the "indiscrete" topology on A A S
~ The Middle ~
Thanks!
top related