Lecture Note 2 – Calculus and Probability Shuaiqiang Wang Department of CS & IS University of Jyväskylä http://users.jyu.fi/~swang/ [email protected]


Page 1:

Lecture Note 2 – Calculus and Probability

Shuaiqiang Wang, Department of CS & IS, University of Jyväskylä

http://users.jyu.fi/~swang/ · [email protected]

Page 2:

Part 1: Calculus

Page 3:

Definition

• Given a function $f(x)$, the derivative is

$$f'(x) = \frac{d}{dx} f(x) = \lim_{t \to 0} \frac{f(x+t) - f(x)}{t}$$

• Chain rule, where $t$ is a function of $x$:

$$\frac{d}{dx} f(t) = \frac{df}{dt} \cdot \frac{dt}{dx}$$

• The derivative of a constant is zero, e.g.:

$$\frac{d}{dx}\, 2 = 0$$
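As a quick numeric sanity check (not part of the original slides; the function $f(x) = x^2$ and the point $x = 3$ are arbitrary choices), the limit definition can be approximated with a small difference quotient:

```python
# Numerical check of the limit definition of the derivative.
# The function f and the point x below are arbitrary sample choices.

def numerical_derivative(f, x, t=1e-6):
    """Approximate f'(x) with the difference quotient (f(x+t) - f(x)) / t."""
    return (f(x + t) - f(x)) / t

# For f(x) = x**2, the derivative at x = 3 should be close to 6.
approx = numerical_derivative(lambda x: x**2, 3.0)
print(approx)  # close to 6

# And the derivative of a constant is (numerically) 0.
print(numerical_derivative(lambda x: 2.0, 5.0))
```

Note that shrinking $t$ much further eventually hurts accuracy due to floating-point cancellation.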

Page 4:

Polynomial Function

• Example:

$$\frac{d}{dx} x^a = a\,x^{a-1}$$

Page 5:

Proof: Polynomial Function

Page 6:

Logarithm Function

• Rule, where the base is $e$:

$$\frac{d}{dx} \ln x = \frac{1}{x}$$

• Example: let $t = x^2 + 2$; then

$$\frac{d}{dx} \ln(x^2+2) = \frac{d}{dt} \ln t \times \frac{dt}{dx} = \frac{1}{t} \times 2x = \frac{2x}{x^2+2}$$
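The chain-rule example can be verified numerically; this sketch compares a difference quotient against the analytic result $2x/(x^2+2)$ (the evaluation point $x = 1.5$ is an arbitrary choice, not from the slides):

```python
import math

# Verify d/dx ln(x**2 + 2) = 2x / (x**2 + 2) with a difference quotient.

def derivative(f, x, t=1e-6):
    return (f(x + t) - f(x)) / t

x = 1.5  # arbitrary sample point
numeric = derivative(lambda x: math.log(x**2 + 2), x)
analytic = 2 * x / (x**2 + 2)
print(numeric, analytic)  # the two values agree to several decimals
```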

Page 7:

Proof: Logarithm Function

• $\frac{d}{dx} \ln x = \lim_{h \to 0} \frac{\ln(x+h) - \ln x}{h} = \lim_{h \to 0} \frac{1}{h} \ln\left(1 + \frac{h}{x}\right)$
• Let $t = \frac{h}{x}$. Then when $h \to 0$, $t \to 0$, and
• $\lim_{t \to 0} \frac{1}{tx} \ln(1+t) = \frac{1}{x} \lim_{t \to 0} \ln(1+t)^{1/t} = \frac{1}{x} \ln e = \frac{1}{x}$

Page 8:

Exponential Function

Rule:

$$\frac{d}{dx} e^x = e^x$$

Example: let $t = x^2 + x$; then

$$\frac{d}{dx} e^{x^2+x} = \frac{d}{dt} e^t \times \frac{dt}{dx} = e^t \times (2x+1) = (2x+1)\,e^{x^2+x}$$

Page 9:

Proof: Exponential Function

• Let’s calculate $\lim_{h \to 0} \frac{e^h - 1}{h}$. Let $t = e^h - 1$. Then $h = \ln(1+t)$, and when $h \to 0$, $t \to 0$
• $\lim_{h \to 0} \frac{e^h - 1}{h} = \lim_{t \to 0} \frac{t}{\ln(1+t)} = \lim_{t \to 0} \frac{1}{\ln(1+t)^{1/t}} = \frac{1}{\ln e} = 1$
• Thus $\frac{d}{dx} e^x = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = e^x \lim_{h \to 0} \frac{e^h - 1}{h} = e^x$

Page 10:

Exponential Function

• Rule:

$$\frac{d}{dx} a^x = a^x \ln a$$

• Proof. Let $a^x = e^{x \ln a}$. Then
• $\frac{d}{dx} a^x = \frac{d}{dx} e^{x \ln a} = e^{x \ln a} \ln a$
• Thus $\frac{d}{dx} a^x = a^x \ln a$

Page 11:

Taylor Series

$$f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(a)}{i!}(x-a)^i$$

When $a = 0$:

$$f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(0)}{i!}\,x^i$$

Example:

$$e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$$

Page 12:

Partial Derivative and Gradient

$$\boldsymbol{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$

For example: $f(\boldsymbol{x}) = a x_1 x_2 + b x_2^2$

The partial derivative of a function $f$ with respect to a certain variable is the derivative of $f$ while regarding the other variables as constants.

$$\nabla f(\boldsymbol{x}) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$
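For the example $f(\boldsymbol{x}) = a x_1 x_2 + b x_2^2$, the partial derivatives are $\partial f/\partial x_1 = a x_2$ and $\partial f/\partial x_2 = a x_1 + 2 b x_2$. A minimal sketch (the values of $a$, $b$, and the evaluation point are hypothetical):

```python
# Gradient of the slide's example f(x) = a*x1*x2 + b*x2**2.
# a, b, and the evaluation point are arbitrary sample values.

a, b = 2.0, 3.0

def f(x1, x2):
    return a * x1 * x2 + b * x2**2

def grad_f(x1, x2):
    # [df/dx1, df/dx2], treating the other variable as a constant each time
    return [a * x2, a * x1 + 2 * b * x2]

print(grad_f(1.0, 2.0))  # [4.0, 14.0]
```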

Page 13:

Taylor Approximation

Taylor approximation:

$$f(x) \approx \sum_{i=0}^{k} \frac{f^{(i)}(a)}{i!}(x-a)^i$$

Taylor series:

$$f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(a)}{i!}(x-a)^i$$
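The truncated sum is easy to evaluate in code. This sketch specializes it to $f(x) = e^x$ around $a = 0$, where every derivative at $a$ equals $e^a$ (the order $k = 10$ is an arbitrary choice):

```python
import math

# k-th order Taylor approximation f(x) ≈ sum_{i=0}^{k} f^(i)(a)/i! * (x-a)^i,
# specialized to f = exp, whose every derivative at a is e^a.

def taylor_exp(x, a=0.0, k=10):
    fa = math.exp(a)
    return sum(fa / math.factorial(i) * (x - a)**i for i in range(k + 1))

print(taylor_exp(1.0), math.e)  # the k = 10 approximation is close to e
```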

Page 14:

First-Order Taylor Approximation

In 1 dimension:

$$f(x) \approx f(a) + f'(a)(x-a)$$

In $n$ dimensions, when $\boldsymbol{x}$ is close to $\boldsymbol{a}$:

$$f(\boldsymbol{x}) \approx f(\boldsymbol{a}) + \nabla f(\boldsymbol{a})^\top (\boldsymbol{x} - \boldsymbol{a})$$

Page 15:

Gradient Descent Optimization

According to the first-order Taylor approximation of $f(\boldsymbol{x})$:

$$f(\boldsymbol{x}_n + h\boldsymbol{u}) = f(\boldsymbol{x}_n) + h\,\nabla f(\boldsymbol{x}_n)^\top \boldsymbol{u} + O(h^2) \qquad (1)$$

where $h$ is the learning rate and $\boldsymbol{u}$ is a unit vector representing the direction. Let $\boldsymbol{x}_{n+1} = \boldsymbol{x}_n + h\boldsymbol{u}$, which is the value of $\boldsymbol{x}$ in the next iteration. Our optimization objective function is:

$$\arg\min_{\boldsymbol{u}} f(\boldsymbol{x}_n + h\boldsymbol{u}) = \arg\min_{\boldsymbol{u}} h\,\nabla f(\boldsymbol{x}_n)^\top \boldsymbol{u}$$

The optimal solution is:

$$\boldsymbol{u} = -\frac{\nabla f(\boldsymbol{x}_n)}{\lVert \nabla f(\boldsymbol{x}_n) \rVert}$$

Page 16:

Gradient Descent Algorithm

Input: learning rate $h$, tolerance $\epsilon$, maximum number of iterations $N_{\max}$

For $n = 1, 2, \ldots, N_{\max}$:
    $\boldsymbol{g}_n = \nabla f(\boldsymbol{x}_n)$
    if $\lVert \boldsymbol{g}_n \rVert < \epsilon$, return $\boldsymbol{x}_n$
    $\boldsymbol{x}_{n+1} = \boldsymbol{x}_n - h\,\boldsymbol{g}_n$
End
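The loop above can be sketched in a few lines; the test function $f(x) = (x-3)^2$, the learning rate, the tolerance, and the iteration cap below are illustrative choices, not from the slides:

```python
# A minimal 1-D gradient descent loop, applied to f(x) = (x - 3)**2,
# whose gradient is 2*(x - 3) and whose minimizer is x = 3.

def gradient_descent(grad, x0, h=0.1, eps=1e-6, n_max=1000):
    x = x0
    for _ in range(n_max):
        g = grad(x)
        if abs(g) < eps:        # stopping criterion ||g_n|| < eps
            break
        x = x - h * g           # update x_{n+1} = x_n - h * g_n
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # close to the minimizer x = 3
```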

Page 17:

Part 2: Probability

Page 18:

Independent Events

• Let $A$ and $B$ be two independent events.

𝑃 ( 𝐴 ,𝐵 )=𝑃 ( 𝐴 ) 𝑃 (𝐵)

• Example 1: Coin tossing
– Each toss is independent of the previous ones

• Example 2: Taking exams
– Each exam is independent of the previous ones
– Fail 3 times: $p^3$, where $p$ is the probability of failing a single exam
– Pass at least 1 time: $1 - p^3$
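The exam example can be made concrete with a hypothetical per-attempt failure probability $p = 0.3$ (the slides do not fix a value):

```python
# Independence: if each exam attempt fails independently with probability p,
# P(fail 3 times) = p**3 and P(pass at least once) = 1 - p**3.
# p = 0.3 is a hypothetical value for illustration.

p = 0.3
fail_all_three = p**3
pass_at_least_once = 1 - p**3
print(fail_all_three, pass_at_least_once)  # ≈ 0.027 and ≈ 0.973
```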

Page 19:

Conditional Probability

• A person goes to sauna 6 times during the last 10 days, at most once per day.

• It snowed 8 days during the last 10 days.
• It snowed on 4 of the 6 sauna days.
• P(sauna | snow) = ?
• P(snow | sauna) = ?

𝑃 ( 𝐴|𝐵 )= 𝑃 (𝐴 ,𝐵)𝑃 (𝐵)

Example
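Working the sauna/snow numbers through the definition $P(A \mid B) = P(A, B)/P(B)$:

```python
# Over the 10 days: P(sauna) = 6/10, P(snow) = 8/10, and it snowed on
# 4 of the sauna days, so P(sauna, snow) = 4/10.

p_sauna = 6 / 10
p_snow = 8 / 10
p_sauna_and_snow = 4 / 10

p_sauna_given_snow = p_sauna_and_snow / p_snow    # = 0.5
p_snow_given_sauna = p_sauna_and_snow / p_sauna   # ≈ 0.667
print(p_sauna_given_snow, p_snow_given_sauna)
```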

Page 20:

Bayes’ Theorem

$$P(\theta \mid y) = \frac{P(y, \theta)}{P(y)}$$

Since

$$P(y, \theta) = P(y \mid \theta)\,P(\theta) = P(\theta \mid y)\,P(y)$$

Then

$$P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$

Page 21:

Bayes’ Theorem

$$P(\theta \mid y) = \frac{P(y, \theta)}{P(y)} = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$

With the same data $P(y)$ and the same prior $P(\theta)$, maximizing the posterior $P(\theta \mid y)$ is equivalent to maximizing the likelihood $P(y \mid \theta)$.
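A small numeric illustration of the theorem (the two parameter values, the uniform prior, and the likelihoods are hypothetical):

```python
# Bayes' theorem: P(theta|y) = P(y|theta) * P(theta) / P(y),
# with two candidate parameter values and a uniform prior (sample numbers).

p_theta = {"theta1": 0.5, "theta2": 0.5}      # prior P(theta)
p_y_given = {"theta1": 0.8, "theta2": 0.2}    # likelihood P(y|theta)

# marginal P(y) = sum over theta of P(y|theta) * P(theta)
p_y = sum(p_y_given[t] * p_theta[t] for t in p_theta)

posterior = {t: p_y_given[t] * p_theta[t] / p_y for t in p_theta}
print(posterior)  # {'theta1': 0.8, 'theta2': 0.2}
```

With a uniform prior, the posterior is simply the normalized likelihood, which is why the two sets of numbers coincide here.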

Page 22:

Maximum Likelihood Estimation

• Input: A set of observations $y = \{y_1, \ldots, y_n\}$ with parameters $\theta$
• Output: The estimation of $\theta$
• Assume that all of the observations are independent
• Thus their probability can be calculated as

$$\mathcal{L}(y \mid \theta) = \prod_{i=1}^{n} P(y_i \mid \theta)$$

Page 23:

Maximum Likelihood Estimation

• We try to find the $\theta$ with the largest probability $P(\theta \mid y)$ given the observations $y$
• With the same $P(y)$ and $P(\theta)$, we can actually maximize the likelihood $\mathcal{L}(y \mid \theta) = \prod_{i=1}^{n} P(y_i \mid \theta)$.

Page 24:

Optimization

• Since $\log$ is an increasing function, it is equivalent to maximizing the log-likelihood $\log \mathcal{L}(y \mid \theta) = \sum_{i=1}^{n} \log P(y_i \mid \theta)$

Then we can optimize it with gradient descent, applied to the negative log-likelihood.
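Putting the pieces together, this sketch estimates a Bernoulli (coin-toss) parameter by gradient ascent on the log-likelihood, which is the same as gradient descent on the negative log-likelihood. The data, step size, and iteration count are hypothetical; for this model the closed-form MLE is the sample mean, which the iteration should approach:

```python
# MLE sketch for a Bernoulli parameter theta from coin-toss observations.
# The data below is hypothetical: 6 heads out of 8 tosses.

y = [1, 0, 1, 1, 0, 1, 1, 1]

def grad_log_likelihood(theta):
    # d/dtheta of sum_i [y_i*log(theta) + (1 - y_i)*log(1 - theta)]
    heads = sum(y)
    tails = len(y) - heads
    return heads / theta - tails / (1 - theta)

theta = 0.5
for _ in range(2000):
    theta += 0.01 * grad_log_likelihood(theta)  # ascent on the log-likelihood

print(theta, sum(y) / len(y))  # both close to 0.75
```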

Page 25:

Any Questions?