FA18 CS188 Lecture 21: Perceptron and Logistic Regression · 2021. 1. 8.

Linear Classifiers


  • Linear Classifiers

  • Feature Vectors

    Hello,

    Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just

    # free      : 2
    YOUR_NAME   : 0
    MISSPELLED  : 2
    FROM_FRIEND : 0
    ...

    → SPAM or +

    PIXEL-7,12 : 1
    PIXEL-7,13 : 0
    ...
    NUM_LOOPS  : 1
    ...

    → “2”

  • Some (Simplified) Biology

    § Very loose inspiration: human neurons

  • Linear Classifiers

    § Inputs are feature values
    § Each feature has a weight
    § Sum is the activation

    § If the activation is:
      § Positive, output +1
      § Negative, output -1

    [Figure: features f1, f2, f3 are multiplied by weights w1, w2, w3 and summed (Σ); output is determined by whether the sum is > 0]

  • Weights

    § Binary case: compare features to a weight vector
    § Learning: figure out the weight vector from examples

    Example feature vector:   Weight vector w:          Example feature vector:
    # free      : 2           # free      : 4           # free      : 0
    YOUR_NAME   : 0           YOUR_NAME   : -1          YOUR_NAME   : 1
    MISSPELLED  : 2           MISSPELLED  : 1           MISSPELLED  : 1
    FROM_FRIEND : 0           FROM_FRIEND : -3          FROM_FRIEND : 1
    ...                       ...                       ...

    Dot product positive means the positive class

  • Decision Rules

  • Binary Decision Rule

    § In the space of feature vectors
      § Examples are points
      § Any weight vector is a hyperplane
      § One side corresponds to Y = +1
      § Other corresponds to Y = -1

    w:
    BIAS  : -3
    free  : 4
    money : 2
    ...

    +1 = SPAM
    -1 = HAM

    [Figure: the decision boundary in the (free, money) feature plane, axes running 0–2; one side is +1 = SPAM, the other is -1 = HAM]
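The decision rule above can be sketched in a few lines. This is a minimal illustration, not course code: the helper names (`extract_features`, `classify`) and the word-count feature extractor with an always-on BIAS feature are assumptions about the setup.

```python
# Binary decision rule with the slide's weight vector:
# classify by the sign of the dot product w · f(x).

def extract_features(text):
    """Map a message to a sparse feature vector of word counts, plus BIAS = 1."""
    f = {"BIAS": 1}
    for word in text.lower().split():
        f[word] = f.get(word, 0) + 1
    return f

def classify(w, f):
    """Return +1 (SPAM) if w · f(x) > 0, else -1 (HAM)."""
    activation = sum(w.get(k, 0) * v for k, v in f.items())
    return +1 if activation > 0 else -1

w = {"BIAS": -3, "free": 4, "money": 2}
print(classify(w, extract_features("free money")))  # -3 + 4 + 2 = 3 > 0, so +1 (SPAM)
```

For “free money” the activation is -3 + 4 + 2 = 3 > 0, so the rule outputs +1 = SPAM; a message containing none of the weighted words gets only the BIAS of -3 and is classified -1 = HAM.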

  • Weight Updates

  • Learning: Binary Perceptron

    § Start with weights = 0
    § For each training instance:
      § Classify with current weights
      § If correct (i.e., y = y*), no change!
      § If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1.

        w = w + y* · f(x)
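The update loop above can be sketched as follows. The function name, the dict-based sparse vectors, the fixed number of passes, and the toy dataset are all illustrative assumptions, not part of the lecture.

```python
# A minimal sketch of the binary perceptron update:
# on a mistake, w = w + y* · f(x).

def dot(w, f):
    return sum(w.get(k, 0) * v for k, v in f.items())

def train_binary_perceptron(data, passes=10):
    """data: list of (feature_dict, y_star) pairs with y_star in {+1, -1}."""
    w = {}
    for _ in range(passes):
        for f, y_star in data:
            y = +1 if dot(w, f) >= 0 else -1   # classify with current weights
            if y != y_star:                    # if wrong: w = w + y* · f(x)
                for k, v in f.items():
                    w[k] = w.get(k, 0) + y_star * v
    return w

# Toy separable data: spam mentions "free", ham does not.
data = [({"BIAS": 1, "free": 2}, +1), ({"BIAS": 1, "hello": 1}, -1)]
w = train_binary_perceptron(data)
print(all((+1 if dot(w, f) >= 0 else -1) == y for f, y in data))  # True on this separable set
```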

  • Examples: Perceptron

    § Separable Case

  • Multiclass Decision Rule

    § If we have multiple classes:
      § A weight vector for each class: w_y
      § Score (activation) of a class y: w_y · f(x)
      § Prediction: the highest score wins

        y = argmax_y w_y · f(x)

    Binary = multiclass where the negative class has weight zero

  • Learning: Multiclass Perceptron

    § Start with all weights = 0
    § Pick up training examples one by one
    § Predict with current weights

        y = argmax_y w_y · f(x)

    § If correct, no change!
    § If wrong: lower score of wrong answer, raise score of right answer

        w_y  = w_y  - f(x)
        w_y* = w_y* + f(x)
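The multiclass update above can be sketched the same way: one weight vector per class, and on a mistake the predicted class loses f(x) while the true class gains it. The class names, toy data, and pass count are illustrative assumptions.

```python
# Sketch of the multiclass perceptron: highest score w_y · f(x) wins;
# on a mistake, lower the wrong answer's weights and raise the right answer's.

def dot(w, f):
    return sum(w.get(k, 0) * v for k, v in f.items())

def train_multiclass_perceptron(data, classes, passes=10):
    weights = {y: {} for y in classes}
    for _ in range(passes):
        for f, y_star in data:
            y = max(classes, key=lambda c: dot(weights[c], f))  # argmax_y w_y · f(x)
            if y != y_star:
                for k, v in f.items():
                    weights[y][k] = weights[y].get(k, 0) - v            # w_y  -= f(x)
                    weights[y_star][k] = weights[y_star].get(k, 0) + v  # w_y* += f(x)
    return weights

classes = ["sports", "politics", "tech"]
data = [({"win": 1, "game": 1}, "sports"), ({"win": 1, "vote": 1}, "politics")]
weights = train_multiclass_perceptron(data, classes)
```

After a few passes on this separable toy set, “win game” scores highest under w_sports and “win vote” under w_politics.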

  • Example: Multiclass Perceptron

    Training sentences: “win the vote”, “win the election”, “win the game”

    Initial weight vectors (one per class):

    w_1: BIAS : 1   win : 0   game : 0   vote : 0   the : 0   ...
    w_2: BIAS : 0   win : 0   game : 0   vote : 0   the : 0   ...
    w_3: BIAS : 0   win : 0   game : 0   vote : 0   the : 0   ...

  • Properties of Perceptrons

    § Separability: true if some parameters get the training set perfectly correct

    § Convergence: if the training data is separable, the perceptron will eventually converge (binary case)

    § Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability

    [Figures: a separable dataset vs. a non-separable dataset]

  • Problems with the Perceptron

    § Noise: if the data isn’t separable, weights might thrash
      § Averaging weight vectors over time can help (averaged perceptron)

    § Mediocre generalization: finds a “barely” separating solution

    § Overtraining: test/held-out accuracy usually rises, then falls
      § Overtraining is a kind of overfitting

  • Improving the Perceptron

  • Non-Separable Case: Deterministic Decision

    Even the best linear boundary makes at least one mistake

  • Non-Separable Case: Probabilistic Decision

    [Figure: points near the boundary get probabilities like 0.5 | 0.5 and 0.3 | 0.7; points far from it approach 0.9 | 0.1 and 0.1 | 0.9]

  • How to get probabilistic decisions?

    § Perceptron scoring: z = w · f(x)
    § If z = w · f(x) is very positive → want probability going to 1
    § If z = w · f(x) is very negative → want probability going to 0

    § Sigmoid function:

        φ(z) = 1 / (1 + e^{-z})
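The sigmoid squashing can be checked in a couple of lines; this is just a quick sketch of the function above.

```python
# Sigmoid: phi(z) = 1 / (1 + e^(-z)).
# Large positive activations map near 1, large negative ones near 0.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5: exactly on the decision boundary
print(sigmoid(10))   # close to 1: very positive activation
print(sigmoid(-10))  # close to 0: very negative activation
```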

  • Best w?

    § Maximum likelihood estimation:

        max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

    with:

        P(y^(i) = +1 | x^(i); w) = 1 / (1 + e^{-w · f(x^(i))})

        P(y^(i) = -1 | x^(i); w) = 1 - 1 / (1 + e^{-w · f(x^(i))})

    = Logistic Regression
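The objective ll(w) can be evaluated directly, which makes the definition concrete. This sketch only scores candidate weight vectors on an invented toy dataset; actually maximizing ll(w) is the optimization topic deferred to the next lecture.

```python
# Binary logistic regression log-likelihood:
# ll(w) = sum_i log P(y_i | x_i; w), with y in {+1, -1}.
import math

def dot(w, f):
    return sum(w.get(k, 0) * v for k, v in f.items())

def log_likelihood(w, data):
    ll = 0.0
    for f, y in data:
        p_plus = 1.0 / (1.0 + math.exp(-dot(w, f)))       # P(y = +1 | x; w)
        ll += math.log(p_plus if y == +1 else 1.0 - p_plus)
    return ll

data = [({"free": 2}, +1), ({"hello": 1}, -1)]
# A weight vector aligned with the labels scores higher than w = 0:
print(log_likelihood({"free": 1, "hello": -1}, data))
print(log_likelihood({}, data))
```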

  • Separable Case: Deterministic Decision – Many Options

  • Separable Case: Probabilistic Decision – Clear Preference

    [Figure: two separating boundaries with per-point probabilities such as 0.5 | 0.5, 0.3 | 0.7, and 0.7 | 0.3; the probabilistic decision shows a clear preference between them]

  • Multiclass Logistic Regression

    § Recall Perceptron:
      § A weight vector for each class: w_y
      § Score (activation) of a class y: w_y · f(x)
      § Prediction: the highest score wins

    § How to make the scores into probabilities?

        z_1, z_2, z_3  →  e^{z_1} / (e^{z_1} + e^{z_2} + e^{z_3}),  e^{z_2} / (e^{z_1} + e^{z_2} + e^{z_3}),  e^{z_3} / (e^{z_1} + e^{z_2} + e^{z_3})

        original activations → softmax activations
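The softmax mapping above is easy to sketch: exponentiate each activation and normalize so the outputs sum to 1. Subtracting the maximum first is a standard numerical-stability trick added here, not something from the slide.

```python
# Softmax: z_i -> e^{z_i} / sum_j e^{z_j}.
import math

def softmax(zs):
    m = max(zs)                              # shift for numerical stability
    exps = [math.exp(z - m) for z in zs]     # shifting doesn't change the ratios
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.0])
print(probs)       # largest activation gets the largest probability
print(sum(probs))  # sums to 1 (up to floating point)
```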

  • Best w?

    § Maximum likelihood estimation:

        max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

    with:

        P(y^(i) | x^(i); w) = e^{w_{y^(i)} · f(x^(i))} / Σ_y e^{w_y · f(x^(i))}

    = Multi-Class Logistic Regression
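As in the binary case, the multiclass objective can be evaluated directly: P(y | x; w) is the softmax of the class scores w_y · f(x). The class names and toy data below are illustrative assumptions; maximizing ll(w) is again left to the optimization lecture.

```python
# Multiclass logistic regression log-likelihood:
# ll(w) = sum_i [ w_{y_i} · f(x_i) - log sum_y e^{w_y · f(x_i)} ].
import math

def dot(w, f):
    return sum(w.get(k, 0) * v for k, v in f.items())

def log_likelihood(weights, data):
    ll = 0.0
    for f, y_star in data:
        scores = {y: dot(w_y, f) for y, w_y in weights.items()}
        log_z = math.log(sum(math.exp(s) for s in scores.values()))
        ll += scores[y_star] - log_z     # log of the softmax of the true class
    return ll

weights = {"sports": {"game": 1.0}, "politics": {"vote": 1.0}}
data = [({"win": 1, "game": 1}, "sports"), ({"win": 1, "vote": 1}, "politics")]
print(log_likelihood(weights, data))
```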

  • Next Lecture

    § Optimization

    § i.e., how do we solve:

        max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)