CS 7455
MOBILE APP DEVELOPMENT
Dr. Mingon Kang
Computer Science, Kennesaw State University
Terminology
Features
An individual measurable property of a phenomenon being observed
The number of features or distinct traits that can be used to describe each item in a quantitative manner
May have implicit/explicit patterns to describe a phenomenon
Samples
Items to process (classify or cluster)
Can be a document, a picture, a sound, a video, or a patient
Reference: http://www.slideshare.net/rahuldausa/introduction-to-machine-learning-38791937
Data In Machine Learning
x_i: input vector, independent variable
y: response variable, dependent variable
y ∈ {−1, 1}: binary classification
y ∈ ℝ: regression
Goal: predict a label y after observing some new x
Types of Variable
Categorical variable: discrete or qualitative variables
Nominal:
Have two or more categories, but do not have an intrinsic order
Dichotomous:
A nominal variable which has only two categories or levels
Ordinal:
Have two or more categories, which can be ordered or ranked
Continuous variable
Mathematical Notation
Matrix: uppercase bold Roman letter, M
Vector: lowercase bold Roman letter, x
Scalar: lowercase letter
Transpose of a matrix or vector: superscript T or ′
E.g.
Row vector: (x_1, x_2, …, x_p)
Corresponding column vector: x = (x_1, x_2, …, x_p)^T
Matrix: X = {x_1, x_2, …, x_n}
Transpose of a Matrix
Operator which flips a matrix over its diagonal
Switch the row and column indices of the matrix
Denoted as A^T, A′, A^tr, or A^t
[A^T]_ij = [A]_ji; if A is an m × n matrix, A^T is an n × m matrix
(A^T)^T = A
(A + B)^T = A^T + B^T
(AB)^T = B^T A^T
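A quick NumPy sketch (not from the slides) that checks these transpose properties numerically; the matrix sizes are arbitrary:
```python
import numpy as np

A = np.random.rand(3, 4)   # A is 3x4, so A.T is 4x3
B = np.random.rand(4, 2)
C = np.random.rand(3, 4)   # same shape as A, for the sum rule

print(A.T.shape)                            # (4, 3)
print(np.allclose((A.T).T, A))              # (A^T)^T = A
print(np.allclose((A + C).T, A.T + C.T))    # (A + B)^T = A^T + B^T
print(np.allclose((A @ B).T, B.T @ A.T))    # (AB)^T = B^T A^T
```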
Inverse of a Matrix
The inverse of a square matrix A, sometimes called a reciprocal matrix, is a matrix A^{-1} such that
AA^{-1} = I,
where I is the identity matrix.
The inverse plays the same role for matrices as the reciprocal 1/a does for numbers, but is written A^{-1}.
Reference: https://www.mathsisfun.com/algebra/matrix-inverse.html
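A minimal NumPy illustration (the matrix values are made up) of the defining property AA^{-1} = I:
```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])               # a small invertible square matrix

A_inv = np.linalg.inv(A)                 # A^{-1}
print(np.allclose(A @ A_inv, np.eye(2))) # AA^{-1} = I (up to rounding)
```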
What is Machine Learning?
Data → Training → Model f(x)
What is Machine Learning?
New data → f(x) → Make a decision
Supervised learning
Data: D = {d_1, d_2, …, d_n}, a set of n samples,
where d_i = <x_i, y_i>
x_i is an input vector and y_i is the desired output
Objective: learn the mapping f: X → Y
s.t. y_i ≈ f(x_i) for all i = 1, …, n
Regression: y is continuous
Classification: y is discrete
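As a concrete (entirely made-up) illustration of this notation, a tiny dataset D with n = 4 samples and p = 2 features per input vector:
```python
import numpy as np

X = np.array([[1.2, 0.7],    # x_1
              [0.3, 2.1],    # x_2
              [1.8, 1.5],    # x_3
              [0.5, 0.2]])   # x_4

y_class = np.array([1, -1, 1, -1])         # discrete y  -> classification
y_reg   = np.array([3.1, 0.4, 2.7, 0.9])   # continuous y -> regression
```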
Linear Regression
Review of Linear Regression in Statistics
Reference: https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/linear_regression.pdf
Linear Regression
How to represent the data as a vector/matrix
We assume a model:
y = b_0 + bX + ε,
where b_0 and b are the intercept and slope, known as
coefficients or parameters. ε is the error term (typically
assumed to satisfy ε ~ N(0, σ²))
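A small sketch of what this model assumption looks like when generating data; the intercept, slope, and noise level below are made-up values:
```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
b0, b1, sigma = 2.0, 0.5, 1.0         # assumed intercept, slope, noise std

x = rng.uniform(0, 10, size=n)        # predictor values
eps = rng.normal(0, sigma, size=n)    # epsilon ~ N(0, sigma^2)
y = b0 + b1 * x + eps                 # y = b0 + b*x + error
```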
Linear Regression
How to represent the data as a vector/matrix
Include bias constant (intercept) in the input vector
X ∈ ℝ^{n×(p+1)}, y ∈ ℝ^n, and b ∈ ℝ^{p+1}
X = {1, x_1, x_2, …, x_p}
y = {y_1, y_2, …, y_n}^T
b = {b_0, b_1, b_2, …, b_p}^T
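One way to build such an X in NumPy (toy feature values), prepending the column of ones that absorbs the intercept b_0:
```python
import numpy as np

X_raw = np.array([[2.0, 3.0],
                  [1.0, 5.0],
                  [4.0, 2.0]])                            # n = 3 samples, p = 2 features
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])      # n x (p+1), first column = 1
print(X)
```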
Linear Regression
Find the optimal coefficient vector b that best reproduces the observed responses
[y_1, …, y_n]^T ≈ X [b_0, b_1, …, b_p]^T, where row i of X is (1, x_i1, …, x_ip)
or
[y_1, …, y_n]^T = X [b_0, b_1, …, b_p]^T + [e_1, …, e_n]^T
Ordinary Least Squares (OLS)
y = Xb + e
Estimate the unknown parameters (b) in a linear regression model
by minimizing the sum of the squared differences between the observed responses and those predicted by the linear function
Residual Sum of Squares (RSS) = Σ_{i=1}^{n} (y_i − x_i b)²
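A direct translation of this definition into NumPy (assuming X already contains the bias column and b includes b_0):
```python
import numpy as np

def rss(X, y, b):
    """Residual sum of squares: sum_i (y_i - x_i b)^2."""
    residuals = y - X @ b
    return float(residuals @ residuals)
```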
Ordinary Least Squares (OLS)
Optimization
Need to minimize the error
min_b J(b) = Σ_{i=1}^{n} (y_i − x_i b)²
To obtain the optimal set of parameters (b), the derivative of the error w.r.t. each parameter must be zero.
Optimization
J = e′e = (y − Xb)′(y − Xb)
= (y′ − b′X′)(y − Xb)
= y′y − y′Xb − b′X′y + b′X′Xb
= y′y − 2b′X′y + b′X′Xb
∂(e′e)/∂b = −2X′y + 2X′Xb = 0
X′Xb = X′y
b̂ = (X′X)^{−1} X′y
Matrix Cookbook: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
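A sketch of the closed-form estimate b̂ = (X′X)^{−1}X′y; solving the normal equations with np.linalg.solve instead of forming the inverse explicitly is a standard numerical choice, not something the slides prescribe:
```python
import numpy as np

def ols_fit(X, y):
    """Solve X'X b = X'y for b, i.e. b_hat = (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example use with the rss() helper sketched earlier:
# b_hat = ols_fit(X, y); print(rss(X, y, b_hat))
```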
Linear regression for classification
For binary classification
Encode class labels as y ∈ {0, 1} or {−1, 1}
Apply OLS
Check which class the prediction is closer to
If class 1 is encoded as +1 and class 2 as −1:
class 1 if f(x) ≥ 0
class 2 if f(x) < 0
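A minimal sketch of this decision rule, assuming b was already estimated with OLS (e.g., the ols_fit helper above) on labels encoded as +1 / −1:
```python
import numpy as np

def predict_class(X, b):
    scores = X @ b                         # f(x) = x b for each sample
    return np.where(scores >= 0, 1, -1)    # class 1 if f(x) >= 0, else class 2
```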
Logistic regression
We will cover this later
Linear Model in Computer Vision
Features are pixel values in an image
Data matrix X:
An image is a two-dimensional matrix
Need to reshape the 2D matrix into a 1D array:
a Height × Width matrix becomes a (Height × Width)-element array for a single image
If we have N samples, X will be an N × (Height × Width) matrix
Assume that every pixel is independent
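A small NumPy sketch of this reshaping step, using made-up 28×28 images (MNIST-sized) as an example:
```python
import numpy as np

N, H, W = 100, 28, 28
images = np.random.rand(N, H, W)   # N grayscale images, each Height x Width

X = images.reshape(N, H * W)       # each image flattened to a length H*W row
print(X.shape)                     # (100, 784)
```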
Linear Model in Computer Vision
Applications
Any classification or regression problem:
Gender/age classification
Hand-written digit classification
MNIST
However, linear models are not the best for classification problems
Logistic regression is a linear model for classification.