Regression Usman Roshan CS 675 Machine Learning

Upload: doris-morton

Post on 18-Dec-2015

Regression

Usman Roshan
CS 675

Machine Learning

Regression

• Same problem as classification except that the target variable yi is continuous.

• Popular solutions
  – Linear regression (perceptron)
  – Support vector regression
  – Logistic regression (for regression)

Linear regression

• Suppose target values are generated by a function yi = f(xi) + ei.

• We will estimate f(xi) by g(xi,θ).

• Suppose each ei is generated by a Gaussian distribution with 0 mean and σ2 variance (the same variance for all ei).

• This implies that the probability of yi given the input xi and parameters θ (denoted p(yi|xi,θ)) is normally distributed with mean g(xi,θ) and variance σ2.

Linear regression

• Apply maximum likelihood to estimate g(x,θ).

• Assume the pairs (xi,yi) are i.i.d.

• Then the probability of the data given the model (the likelihood) is P(X|θ) = p(x1,y1)p(x2,y2)…p(xn,yn).

• Each p(xi,yi) = p(yi|xi)p(xi).

• p(yi|xi) is normally distributed with mean g(xi,θ) and variance σ2.

• Maximizing the log likelihood (as for classification) gives least squares (linear regression).
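The equivalence above can be sketched numerically: under the Gaussian-noise model, maximizing the log likelihood is the same as minimizing the sum of squared errors, so an ordinary least-squares fit recovers the generating line. The synthetic data, true parameters (2.0, 1.0), and noise level below are illustrative assumptions, not from the slides.

```python
import numpy as np

# Synthetic data: y = 2x + 1 + Gaussian noise (same sigma for every e_i).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, size=n)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=n)

# Maximizing the Gaussian log likelihood
#   sum_i log p(y_i|x_i,theta)
#     = -(n/2) log(2 pi sigma^2) - sum_i (y_i - g(x_i,theta))^2 / (2 sigma^2)
# over theta is the same as minimizing sum_i (y_i - g(x_i,theta))^2,
# i.e. least squares.
X = np.column_stack([x, np.ones(n)])           # design matrix with intercept
theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
w, w0 = theta
print(w, w0)  # close to the generating values (2.0, 1.0)
```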

Logistic regression

• Similar to the linear regression derivation.

• Minimize the sum of squares between the predicted and actual values.

• However:
  – the prediction is given by the sigmoid function, and
  – yi is constrained to the range [0,1].
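A minimal sketch of this formulation (sum of squares through a sigmoid, fit by gradient descent). The generating parameters (1.5, −0.5), learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy targets in [0,1] generated by a sigmoid of a linear function; the
# generating parameters (w=1.5, w0=-0.5) are illustrative assumptions.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=300)
y = sigmoid(1.5 * x - 0.5)

# Minimize the sum of squares between sigmoid(w x + w0) and y by
# gradient descent.
w, w0, lr = 0.0, 0.0, 1.0
for _ in range(10000):
    p = sigmoid(w * x + w0)
    err = p - y
    grad_common = err * p * (1 - p)   # chain rule through the sigmoid
    w -= lr * np.mean(grad_common * x)
    w0 -= lr * np.mean(grad_common)
print(w, w0)  # should approach the generating values
```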

Support vector regression

• Makes no assumptions about the probability distribution of the data and output (like the support vector machine).

• Change the loss function in the support vector machine problem to the ε-insensitive loss to obtain support vector regression.

Support vector regression

• Solved by applying Lagrange multipliers, as in the SVM.

• The solution w is given by a linear combination of support vectors (as in the SVM).

• The solution w can also be used for ranking features.

• From regularized risk minimization the loss would be

  (1/n) Σi=1..n max(0, |yi − wTxi| − ε)
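The ε-insensitive loss can be computed directly: absolute errors smaller than ε cost nothing, larger errors are penalized linearly. The example targets and ε = 0.1 are assumptions.

```python
import numpy as np

def eps_insensitive_loss(y, y_pred, eps=0.1):
    """Mean epsilon-insensitive loss: errors inside the eps-tube cost nothing."""
    return np.mean(np.maximum(0.0, np.abs(y - y_pred) - eps))

y      = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 3.0])
# |errors| = 0.05, 0.5, 0.0 -> penalties 0, 0.4, 0 -> mean 0.4/3
print(eps_insensitive_loss(y, y_pred))
```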

Application

• Prediction of continuous phenotypes in mice from genotype (Predicting unobserved phen…).

• Data are vectors xi where each feature takes the values 0, 1, or 2, denoting the number of alleles of a particular single nucleotide polymorphism (SNP).

• The data has about 1500 samples and 12,000 SNPs.

• The output yi is a phenotype value, for example coat color (represented by integers) or chemical levels in blood.

Mouse phenotype prediction from genotype

• Rank SNPs by the Wald test:
  – First perform linear regression y = wx + w0.
  – Calculate a p-value on w using a t-test:
    • t-test: (w − wnull)/stderr(w)
    • wnull = 0, so the statistic is w/stderr(w)
    • stderr(w) = sqrt( (Σi(yi − wxi − w0)2 / (n − 2)) / Σi(xi − mean(x))2 )
  – Rank SNPs by p-values,
  – OR by the residual sum of squares Σi(yi − wxi − w0)2.

• Rank SNPs by the Pearson correlation coefficient.
• Rank SNPs by support vector regression (the w vector in SVR).
• Rank SNPs by ridge regression (the w vector).
• Run SVR and ridge regression on the top k ranked SNPs under cross-validation.
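The Wald-test ranking above can be sketched as follows, using synthetic genotype data; the number of SNPs and which two are "causal" (indices 3 and 7) are arbitrary assumptions for the demonstration.

```python
import numpy as np

# Wald-test SNP ranking: regress the phenotype y on each SNP separately,
# compute t = w / stderr(w), and rank by |t| (smaller p-value <=> larger |t|).
rng = np.random.default_rng(2)
n, m = 500, 20
X = rng.integers(0, 3, size=(n, m)).astype(float)        # genotypes coded 0/1/2
y = 1.5 * X[:, 3] - 1.0 * X[:, 7] + rng.normal(0, 1, n)  # SNPs 3, 7 causal

t_stats = np.empty(m)
for j in range(m):
    x = X[:, j]
    xc = x - x.mean()
    w = np.dot(xc, y - y.mean()) / np.dot(xc, xc)  # slope of y = w x + w0
    w0 = y.mean() - w * x.mean()
    resid = y - (w * x + w0)
    s2 = np.dot(resid, resid) / (n - 2)            # residual variance
    stderr = np.sqrt(s2 / np.dot(xc, xc))          # standard error of the slope
    t_stats[j] = w / stderr                        # Wald statistic (wnull = 0)

ranking = np.argsort(-np.abs(t_stats))             # most significant first
print(ranking[:2])                                 # the two causal SNPs
```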

MCH phenotype in mice

CD8 phenotype in mice

Rice phenotype prediction from genotype

• Same experimental study as previously: Improving the Accuracy of Whole Genome Prediction for Complex Traits Using the Results of Genome Wide Association Studies.

• The data has 413 samples and 37,000 SNPs (features).

• The best linear unbiased prediction (BLUP) method is improved by prior SNP knowledge (given by genome-wide association studies).
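As a sketch of the penalized linear model behind BLUP-style genomic prediction: marker-effect BLUP ("RR-BLUP") solves the same problem as ridge regression, with the penalty set by the variance components. The dimensions and penalty λ below are toy assumptions, not the rice data.

```python
import numpy as np

# Ridge regression in closed form: w = (X^T X + lam I)^{-1} X^T y.
rng = np.random.default_rng(3)
n, m = 100, 500                  # more SNPs than samples, as in genomic data
X = rng.integers(0, 3, size=(n, m)).astype(float)  # genotypes coded 0/1/2
true_w = np.zeros(m)
true_w[:5] = rng.normal(0, 1, 5)                   # a few causal SNPs (assumed)
y = X @ true_w + rng.normal(0, 0.5, n)

lam = 10.0                       # ridge penalty (assumed, not tuned)
w = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)
print(np.corrcoef(X @ w, y)[0, 1])  # in-sample fit of the penalized model
```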

Days to flower

Flag leaf length

Panicle length