Lecture Note 2 – Calculus and Probability Shuaiqiang Wang Department of CS & IS University of Jyväskylä http://users.jyu.fi/~swang/ [email protected]


Page 1:

Lecture Note 2 – Calculus and Probability

Shuaiqiang Wang, Department of CS & IS, University of Jyväskylä

http://users.jyu.fi/~swang/ · [email protected]

Page 2:

Part 1: Calculus

Page 3:

Definition

• Given a function $f(x)$, the derivative is

$$f'(x) = \frac{d}{dx} f(x) = \lim_{t \to 0} \frac{f(x+t) - f(x)}{t}$$

• Chain rule, where $t$ is a function of $x$:

$$\frac{d}{dx} f(t) = \frac{df}{dt} \cdot \frac{dt}{dx}$$

• The derivative of a constant is zero, e.g.:

$$\frac{d}{dx}\, 2 = 0$$
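As a quick numeric sanity check (not part of the original slides; the function $f(x) = x^2$ and the point $x = 3$ are arbitrary choices), the limit definition can be approximated with a small difference quotient:

```python
# Numerical check of the limit definition of the derivative.
# The function f and the point x below are arbitrary sample choices.

def numerical_derivative(f, x, t=1e-6):
    """Approximate f'(x) with the difference quotient (f(x+t) - f(x)) / t."""
    return (f(x + t) - f(x)) / t

# For f(x) = x**2, the derivative at x = 3 should be close to 6.
approx = numerical_derivative(lambda x: x**2, 3.0)
print(approx)  # close to 6

# And the derivative of a constant is (numerically) 0.
print(numerical_derivative(lambda x: 2.0, 5.0))
```

Note that shrinking $t$ much further eventually hurts accuracy due to floating-point cancellation.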

Page 4:

Polynomial Function

• Example:

$$\frac{d}{dx} x^a = a\,x^{a-1}$$

Page 5:

Proof: Polynomial Function

Page 6:

Logarithm Function

• Rule, where the base is $e$:

$$\frac{d}{dx} \ln x = \frac{1}{x}$$

• Example: let $t = x^2 + 2$; then

$$\frac{d}{dx} \ln(x^2+2) = \frac{d}{dt} \ln t \times \frac{dt}{dx} = \frac{1}{t} \times 2x = \frac{2x}{x^2+2}$$
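The chain-rule example can be verified numerically; this sketch compares a difference quotient against the analytic result $2x/(x^2+2)$ (the evaluation point $x = 1.5$ is an arbitrary choice, not from the slides):

```python
import math

# Verify d/dx ln(x**2 + 2) = 2x / (x**2 + 2) with a difference quotient.

def derivative(f, x, t=1e-6):
    return (f(x + t) - f(x)) / t

x = 1.5  # arbitrary sample point
numeric = derivative(lambda x: math.log(x**2 + 2), x)
analytic = 2 * x / (x**2 + 2)
print(numeric, analytic)  # the two values agree to several decimals
```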

Page 7:

Proof: Logarithm Function

• $\frac{d}{dx} \ln x = \lim_{h \to 0} \frac{\ln(x+h) - \ln x}{h} = \lim_{h \to 0} \frac{1}{h} \ln\left(1 + \frac{h}{x}\right)$
• Let $t = \frac{h}{x}$. Then when $h \to 0$, $t \to 0$, and
• $\lim_{t \to 0} \frac{1}{tx} \ln(1+t) = \frac{1}{x} \lim_{t \to 0} \ln(1+t)^{1/t} = \frac{1}{x} \ln e = \frac{1}{x}$

Page 8:

Exponential Function

Rule:

$$\frac{d}{dx} e^x = e^x$$

Example: let $t = x^2 + x$; then

$$\frac{d}{dx} e^{x^2+x} = \frac{d}{dt} e^t \times \frac{dt}{dx} = e^t \times (2x+1) = (2x+1)\,e^{x^2+x}$$

Page 9:

Proof: Exponential Function

• Let’s calculate $\lim_{h \to 0} \frac{e^h - 1}{h}$. Let $t = e^h - 1$. Then $h = \ln(1+t)$, and when $h \to 0$, $t \to 0$
• $\lim_{h \to 0} \frac{e^h - 1}{h} = \lim_{t \to 0} \frac{t}{\ln(1+t)} = \lim_{t \to 0} \frac{1}{\ln(1+t)^{1/t}} = \frac{1}{\ln e} = 1$
• Thus $\frac{d}{dx} e^x = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = e^x \lim_{h \to 0} \frac{e^h - 1}{h} = e^x$

Page 10:

Exponential Function

• Rule:

$$\frac{d}{dx} a^x = a^x \ln a$$

• Proof. Let $a^x = e^{x \ln a}$. Then
• $\frac{d}{dx} a^x = \frac{d}{dx} e^{x \ln a} = e^{x \ln a} \ln a$
• Thus $\frac{d}{dx} a^x = a^x \ln a$

Page 11:

Taylor Series

$$f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(a)}{i!}(x-a)^i$$

When $a = 0$:

$$f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(0)}{i!}\,x^i$$

Example:

$$e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$$

Page 12:

Partial Derivative and Gradient

$$\boldsymbol{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$

For example: $f(\boldsymbol{x}) = a x_1 x_2 + b x_2^2$

The partial derivative of a function $f$ with respect to a certain variable is the derivative of $f$ while regarding the other variables as constants.

$$\nabla f(\boldsymbol{x}) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$
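For the example $f(\boldsymbol{x}) = a x_1 x_2 + b x_2^2$, the partial derivatives are $\partial f/\partial x_1 = a x_2$ and $\partial f/\partial x_2 = a x_1 + 2 b x_2$. A minimal sketch (the values of $a$, $b$, and the evaluation point are hypothetical):

```python
# Gradient of the slide's example f(x) = a*x1*x2 + b*x2**2.
# a, b, and the evaluation point are arbitrary sample values.

a, b = 2.0, 3.0

def f(x1, x2):
    return a * x1 * x2 + b * x2**2

def grad_f(x1, x2):
    # [df/dx1, df/dx2], treating the other variable as a constant each time
    return [a * x2, a * x1 + 2 * b * x2]

print(grad_f(1.0, 2.0))  # [4.0, 14.0]
```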

Page 13:

Taylor Approximation

Taylor approximation:

$$f(x) \approx \sum_{i=0}^{k} \frac{f^{(i)}(a)}{i!}(x-a)^i$$

Taylor series:

$$f(x) = \sum_{i=0}^{\infty} \frac{f^{(i)}(a)}{i!}(x-a)^i$$
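The truncated sum is easy to evaluate in code. This sketch specializes it to $f(x) = e^x$ around $a = 0$, where every derivative at $a$ equals $e^a$ (the order $k = 10$ is an arbitrary choice):

```python
import math

# k-th order Taylor approximation f(x) ≈ sum_{i=0}^{k} f^(i)(a)/i! * (x-a)^i,
# specialized to f = exp, whose every derivative at a is e^a.

def taylor_exp(x, a=0.0, k=10):
    fa = math.exp(a)
    return sum(fa / math.factorial(i) * (x - a)**i for i in range(k + 1))

print(taylor_exp(1.0), math.e)  # the k = 10 approximation is close to e
```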

Page 14:

First-Order Taylor Approximation

In 1 dimension:

$$f(x) \approx f(a) + f'(a)(x-a)$$

In $n$ dimensions, when $\boldsymbol{x}$ is close to $\boldsymbol{a}$:

$$f(\boldsymbol{x}) \approx f(\boldsymbol{a}) + \nabla f(\boldsymbol{a})^\top (\boldsymbol{x} - \boldsymbol{a})$$

Page 15:

Gradient Descent Optimization

According to the first-order Taylor approximation of $f(\boldsymbol{x})$:

$$f(\boldsymbol{x}_n + h\boldsymbol{u}) = f(\boldsymbol{x}_n) + h\,\nabla f(\boldsymbol{x}_n)^\top \boldsymbol{u} + O(h^2) \qquad (1)$$

where $h$ is the learning rate and $\boldsymbol{u}$ is a unit vector representing the direction. Let $\boldsymbol{x}_{n+1} = \boldsymbol{x}_n + h\boldsymbol{u}$, which is the value of $\boldsymbol{x}$ in the next iteration. Our optimization objective function is:

$$\arg\min_{\boldsymbol{u}} f(\boldsymbol{x}_n + h\boldsymbol{u}) = \arg\min_{\boldsymbol{u}} h\,\nabla f(\boldsymbol{x}_n)^\top \boldsymbol{u}$$

The optimal solution is:

$$\boldsymbol{u} = -\frac{\nabla f(\boldsymbol{x}_n)}{\lVert \nabla f(\boldsymbol{x}_n) \rVert}$$

Page 16:

Gradient Descent Algorithm

Input: learning rate $h$, tolerance $\epsilon$, maximum number of iterations $N_{\max}$

For $n = 1, 2, \ldots, N_{\max}$:
    $\boldsymbol{g}_n = \nabla f(\boldsymbol{x}_n)$
    if $\lVert \boldsymbol{g}_n \rVert < \epsilon$, return $\boldsymbol{x}_n$
    $\boldsymbol{x}_{n+1} = \boldsymbol{x}_n - h\,\boldsymbol{g}_n$
End
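The loop above can be sketched in a few lines; the test function $f(x) = (x-3)^2$, the learning rate, the tolerance, and the iteration cap below are illustrative choices, not from the slides:

```python
# A minimal 1-D gradient descent loop, applied to f(x) = (x - 3)**2,
# whose gradient is 2*(x - 3) and whose minimizer is x = 3.

def gradient_descent(grad, x0, h=0.1, eps=1e-6, n_max=1000):
    x = x0
    for _ in range(n_max):
        g = grad(x)
        if abs(g) < eps:        # stopping criterion ||g_n|| < eps
            break
        x = x - h * g           # update x_{n+1} = x_n - h * g_n
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # close to the minimizer x = 3
```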

Page 17:

Part 2: Probability

Page 18:

Independent Events

• Let $A$ and $B$ be two independent events.

𝑃 ( 𝐴 ,𝐵 )=𝑃 ( 𝐴 ) 𝑃 (𝐵)

• Example 1: Coin tossing
– Each toss is independent of the previous ones

• Example 2: Taking exams
– Each exam is independent of the previous ones
– Fail 3 times: $p^3$, where $p$ is the probability of failing a single exam
– Pass at least 1 time: $1 - p^3$
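The exam example can be made concrete with a hypothetical per-attempt failure probability $p = 0.3$ (the slides do not fix a value):

```python
# Independence: if each exam attempt fails independently with probability p,
# P(fail 3 times) = p**3 and P(pass at least once) = 1 - p**3.
# p = 0.3 is a hypothetical value for illustration.

p = 0.3
fail_all_three = p**3
pass_at_least_once = 1 - p**3
print(fail_all_three, pass_at_least_once)  # ≈ 0.027 and ≈ 0.973
```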

Page 19:

Conditional Probability

• A person goes to sauna 6 times during the last 10 days, at most once per day.

• It snowed 8 days during the last 10 days.
• It snowed on 4 of the 6 sauna days.
• P(sauna | snow) = ?
• P(snow | sauna) = ?

𝑃 ( 𝐴|𝐵 )= 𝑃 (𝐴 ,𝐵)𝑃 (𝐵)

Example
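Working the sauna/snow numbers through the definition $P(A \mid B) = P(A, B)/P(B)$:

```python
# Over the 10 days: P(sauna) = 6/10, P(snow) = 8/10, and it snowed on
# 4 of the sauna days, so P(sauna, snow) = 4/10.

p_sauna = 6 / 10
p_snow = 8 / 10
p_sauna_and_snow = 4 / 10

p_sauna_given_snow = p_sauna_and_snow / p_snow    # = 0.5
p_snow_given_sauna = p_sauna_and_snow / p_sauna   # ≈ 0.667
print(p_sauna_given_snow, p_snow_given_sauna)
```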

Page 20:

Bayes’ Theorem

$$P(\theta \mid y) = \frac{P(y, \theta)}{P(y)}$$

Since

$$P(y, \theta) = P(y \mid \theta)\,P(\theta) = P(\theta \mid y)\,P(y)$$

Then

$$P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$

Page 21:

Bayes’ Theorem

$$P(\theta \mid y) = \frac{P(y, \theta)}{P(y)} = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}$$

With the same data $P(y)$ and the same prior $P(\theta)$, maximizing the posterior $P(\theta \mid y)$ is equivalent to maximizing the likelihood $P(y \mid \theta)$.
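A small numeric illustration of the theorem (the two parameter values, the uniform prior, and the likelihoods are hypothetical):

```python
# Bayes' theorem: P(theta|y) = P(y|theta) * P(theta) / P(y),
# with two candidate parameter values and a uniform prior (sample numbers).

p_theta = {"theta1": 0.5, "theta2": 0.5}      # prior P(theta)
p_y_given = {"theta1": 0.8, "theta2": 0.2}    # likelihood P(y|theta)

# marginal P(y) = sum over theta of P(y|theta) * P(theta)
p_y = sum(p_y_given[t] * p_theta[t] for t in p_theta)

posterior = {t: p_y_given[t] * p_theta[t] / p_y for t in p_theta}
print(posterior)  # {'theta1': 0.8, 'theta2': 0.2}
```

With a uniform prior, the posterior is simply the normalized likelihood, which is why the two sets of numbers coincide here.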

Page 22:

Maximum Likelihood Estimation

• Input: A set of observations $y = \{y_1, \ldots, y_n\}$ with parameters $\theta$
• Output: The estimation of $\theta$
• Assume that all of the observations are independent
• Thus their probability can be calculated as

$$\mathcal{L}(y \mid \theta) = \prod_{i=1}^{n} P(y_i \mid \theta)$$

Page 23:

Maximum Likelihood Estimation

• We try to find the $\theta$ with the largest probability $P(\theta \mid y)$ given the observations $y$
• With the same $P(y)$ and $P(\theta)$, we can actually maximize the likelihood $\mathcal{L}(y \mid \theta) = \prod_{i=1}^{n} P(y_i \mid \theta)$.

Page 24:

Optimization

• Since $\log$ is an increasing function, it is equivalent to maximizing the log-likelihood $\log \mathcal{L}(y \mid \theta) = \sum_{i=1}^{n} \log P(y_i \mid \theta)$

Then we can optimize it with gradient descent, applied to the negative log-likelihood.
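Putting the pieces together, this sketch estimates a Bernoulli (coin-toss) parameter by gradient ascent on the log-likelihood, which is the same as gradient descent on the negative log-likelihood. The data, step size, and iteration count are hypothetical; for this model the closed-form MLE is the sample mean, which the iteration should approach:

```python
# MLE sketch for a Bernoulli parameter theta from coin-toss observations.
# The data below is hypothetical: 6 heads out of 8 tosses.

y = [1, 0, 1, 1, 0, 1, 1, 1]

def grad_log_likelihood(theta):
    # d/dtheta of sum_i [y_i*log(theta) + (1 - y_i)*log(1 - theta)]
    heads = sum(y)
    tails = len(y) - heads
    return heads / theta - tails / (1 - theta)

theta = 0.5
for _ in range(2000):
    theta += 0.01 * grad_log_likelihood(theta)  # ascent on the log-likelihood

print(theta, sum(y) / len(y))  # both close to 0.75
```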

Page 25:

Any Questions?