
Experimental Design for Machine Learning on Multimedia Data

Lecture 5

Dr. Gerald Friedland, fractor@eecs.berkeley.edu

Website: http://www.icsi.berkeley.edu/~fractor/fall2019/


Discuss Homework

Project proposal deadline: October 11th, 2019.

Email project proposals to me and Rishi.


Project Questions 1-4 (Data & Problem Inspection)

1) What is the variable the machine learner should predict? What is the required accuracy for success? What impact will adversarial examples have?

2) How much data do we have to train the prediction of the variable? Are the classes balanced? How many modalities could be exploited in the data? Is there temporal information? How much noise are we expecting? Do you expect bias?

3) How well is the data annotated (anecdotally)? What is the annotator agreement (measured)?

4) Given questions 1-3: Are we reducing information (pattern matching) or do we need to infer information (statistical machine learner)? As a consequence, what seems to be the best choice for the type of machine learner per modality?
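Question 3 asks for annotator agreement as a measured quantity; a common choice is a chance-corrected statistic such as Cohen's kappa. A minimal stdlib-only sketch for two annotators (function name and interface are mine, for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from the
    annotators' marginal label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Values near 1 indicate strong agreement, values near 0 agreement no better than chance; note the sketch does not guard against the degenerate p_e = 1 case.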


Project Questions 5-8 (Training for Generalization)

5) Estimate the memory equivalent capacity needed for the machine learner of your choice. What is the expected generalization? What does the progression look like: Is there enough data?

6) Train your machine learner for accuracy at memory equivalent capacity. Can you reach near 100% memorization? If not, why (diagnose)?

7) Train your machine learner for generalization: Plot the accuracy/capacity curve. What is the expected accuracy and generalization ratio at the point you decided to stop? Do you need to try a different machine learner (if so, redo from 5)? Should you extract features (if so, redo from 5)?

8) How well did your generalization prediction hold on the independent test data? Explain the results. How confident are you in them?


Project Questions 9-10 (Finishing Touch)

9) How do you combine the models of the modalities? Explain your choice. How confident are you in the combination results (i.e., does it make sense to combine)?

10) What are the final combined results of the system? Are the experiments documented and repeatable (if not, please make sure they are, even for bad results)? Are the experiments reproducible (speculate)?
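One common answer to Question 9 is late fusion: train one model per modality and combine their class-probability outputs. A minimal sketch (function name and weighted-average scheme are my illustrative assumptions, not a prescribed method):

```python
import numpy as np

def late_fusion(prob_per_modality, weights=None):
    """Late fusion: combine per-modality class-probability matrices
    (each of shape [n_samples, n_classes]) by a weighted average,
    then pick the argmax class per sample. Uniform weights by default."""
    stacked = np.stack(prob_per_modality)           # [n_modalities, n, c]
    if weights is None:
        weights = np.full(len(prob_per_modality), 1 / len(prob_per_modality))
    fused = np.tensordot(weights, stacked, axes=1)  # [n, c]
    return fused.argmax(axis=1)
```

For example, averaging an audio model's probabilities [[0.9, 0.1], [0.2, 0.8]] with a video model's [[0.6, 0.4], [0.4, 0.6]] yields [[0.75, 0.25], [0.3, 0.7]], hence classes [0, 1].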


Generic Project Workflow for Accuracy

[Flowchart: Development Data feeds Machine Learning (Training), producing Statistical Models; Apply Models to Test Data (Testing) to produce Results; an Error Metric compares the Results against the Ground Truth to yield Accuracy Scores (Evaluation).]
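The workflow boxes above map directly onto code: fit on development data, apply the model to held-out test data, and score against ground truth with an error metric. A minimal sketch (scikit-learn, the Iris dataset, and logistic regression are illustrative assumptions, not part of the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Development Data / Test Data split
X, y = load_iris(return_X_y=True)
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)  # Training -> Statistical Models
predictions = model.predict(X_test)                          # Apply Models -> Results
score = accuracy_score(y_test, predictions)                  # Error Metric vs. Ground Truth
print(f"accuracy = {score:.3f}")                             # Accuracy Scores
```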


Generic Project Workflow for Generalization

[Flowchart: Supervised Machine Learning Engineering Process — Maximizing the Chance for Generalization / Minimizing Adversarial Examples. Main path: check whether annotator agreement on the labelled data is high enough (if too low, "clean" the data or annotate again; if undetermined, annotate redundantly); check whether the classes are balanced (if not, subsample to balance them); if not committed to a specific ML approach, estimate the best approach using capacity estimators (the approach with the lowest capacity estimate wins); estimate the generalization progression and train the machine learner. If the generalization progression is poor, acquire more labelled data (a hard decision: insufficient labelled data vs. distrusting the estimators or mismatched representation function(s)). Otherwise, split into train and test data, run the capacity estimator on the training data, and train with the training data at memory capacity. If you reach high accuracy with small capacity: congrats — iteratively reduce the machine learner's capacity, retrain, and test on the test data, using the model from the previous iteration once accuracy drops. If high accuracy comes only with large capacity, or accuracy stays low even with large capacity (aka "overfitting"): start debugging the model; if you have already tried many different ML approaches, acquire more labelled data.]

Gerald Friedland, v0.4, Jan 2nd, 2019, fractor@eecs.berkeley.edu


Conclusions so far

▪ The lower limit of generalization is memorization. That is, the upper limit for the size of a machine learner is its memory capacity.

▪ The memory capacity is measurable in bits.

▪ Using a machine learner that is over capacity is a waste of resources and increases the risk of failure!

▪ Alchemy was converted into chemistry by measuring: It's time to convert guessing and checking in Machine Learning into science! Let's call it data science?

▪ To do:
▪ Non-binary classifiers, regression
▪ Convolutional networks, other machine learners
▪ Re-thinking training
▪ Explainable adversarial examples


Predicting Capacity Requirements

Given data and labels: How much actual capacity do I need to memorize the function?

Theoretical answer: The minimum description length of the table representing the function f (that is, its Shannon entropy).

Practical answer:
1) Worst case: Let's build a neural network where only the biases are trained.
2) Expected case: How much parameter reduction can (exponential) training buy us?
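The theoretical answer above can be made concrete for the label column of the table: n samples times the Shannon entropy of the label distribution gives a lower bound in bits on what must be memorized. A small sketch (function name is mine):

```python
from collections import Counter
from math import log2

def label_bits(labels):
    """Lower bound (in bits) on the capacity needed to memorize a
    labeling: n samples times the Shannon entropy of the label
    distribution, H = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    counts = Counter(labels)
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    return n * h
```

For balanced binary labels, H = 1 bit per sample, so 8 samples need at least 8 bits; a constant labeling needs 0 bits.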


Predicting Maximum Memory Capacity

[Diagram: a "dumb" network that memorizes the data — inputs x1, x2, ... feed a hidden layer of threshold units with fixed ±1 weights and trained biases b1, b2, ..., b_{m×n}, whose outputs are summed into a single ±1 output. Runtime: O(n log n).]


Predicting Memory Capacity

Dumb network:
• Highly inefficient.
• Potentially not 100% accurate (hash collisions).
• We can assume that training the weights (and biases) gets to 100% accuracy while reducing parameters.

Expected reduction: Exponential! n thresholds should be representable with log2(n) weights and biases (search tree!).
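One simple reading of the threshold idea above, sketched in code: project each sample to a scalar, sort, and count label changes along the sorted order — each change is one threshold the dumb network must place — then apply the slide's log2(n) search-tree reduction. The projection (feature sum) and function names are my illustrative assumptions, not the lecture's exact estimator:

```python
from math import ceil, log2

def thresholds_needed(X_rows, labels):
    """Count label transitions along a 1-D projection (here: the sum of
    each row's features); each transition needs one threshold unit."""
    projected = sorted(zip((sum(r) for r in X_rows), labels))
    return sum(1 for (_, a), (_, b) in zip(projected, projected[1:]) if a != b)

def expected_params(n_thresholds):
    """The slide's expected exponential reduction: n thresholds should be
    representable with about log2(n) weights and biases (search tree)."""
    return max(1, ceil(log2(n_thresholds + 1)))
```

For a linearly ordered labeling like [0, 0, 1, 1] a single threshold suffices, while an alternating labeling like [0, 1, 0, 1] needs three.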


Empirical Results

All results repeatable at: https://github.com/fractor/nntailoring


From Memorization to Generalization

Good news:
• Real-world data is not random.
• The information capacity of a perceptron is usually > 1 bit per parameter (Cover, MacKay).

This means we should be able to use fewer parameters than predicted by memory capacity calculations.

Memorization is worst-case generalization.


Suggested Engineering Process for Generalization

• Start at the approximate expected capacity.
• Train to > 98% accuracy. If impossible, increase parameters.
• Retrain iteratively with decreased capacity while testing against a validation set. You should see a decrease in training accuracy with an increase in validation set accuracy.
• Stop at the minimum capacity with the best held-out set accuracy.

Best-case scenario: As parameters are reduced, the neural network fails to memorize only the insignificant (noise) bits.
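The retrain-with-decreasing-capacity loop can be sketched as follows, using hidden-unit count as a stand-in for capacity. scikit-learn, the synthetic dataset, and the size schedule are my illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for labelled development data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

curve = []  # (hidden_units, training accuracy, validation accuracy)
for hidden in (64, 32, 16, 8, 4, 2):  # decreasing capacity schedule
    clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    curve.append((hidden, clf.score(X_tr, y_tr), clf.score(X_val, y_val)))

# Stop at the smallest capacity that still gives the best validation accuracy.
best = min(curve, key=lambda t: (-t[2], t[0]))
print(f"best: hidden={best[0]}, train={best[1]:.3f}, val={best[2]:.3f}")
```

Plotting `curve` gives the accuracy/capacity curve from Question 7; in the best case, training accuracy drops as capacity shrinks while validation accuracy holds or improves.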


Generalization Process: Expected Curve


Overcapacity Machine Learning: Issues

▪ Waste of money, energy, and time. Bad for the environment.

▪ Unseen (redundant) bits are a necessary condition for adversarial examples. See: B. Li, G. Friedland, J. Wang, R. Jia, C. Spanos, D. Song: "One Bit Matters: Explaining Adversarial Examples as the Abuse of Redundancy", submitted to ICLR 2019.

▪ Fewer parameters give a higher chance of explainability (Occam's Razor). See: G. Friedland, A. Metere: "Machine Learning for Science", UQ SciML Workshop, Los Angeles, June 2018.


Reminder: Occam's Razor

Among competing hypotheses, the one with the fewest assumptions should be selected.

For each accepted explanation of a phenomenon, there may be an extremely large, perhaps even incomprehensible, number of possible and more complex alternatives, because one can always burden failing explanations with ad hoc hypotheses to prevent them from being falsified; therefore, simpler theories are preferable to more complex ones because they are more testable. (Wikipedia, Sep. 2017)


Demo time!

▪ Intro to tools on GitHub
▪ T(n,k) calculation
▪ Capacity estimation given a table
▪ Capacity progression
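The T(n,k) in the demo list presumably refers to Cover's function-counting theorem — the number of dichotomies of n points in general position in k dimensions that a linear threshold unit can realize. A sketch under that interpretation (which is mine, not stated on the slide):

```python
from math import comb

def T(n: int, k: int) -> int:
    """Cover's function-counting theorem: the number of linearly
    separable dichotomies of n points in general position in k
    dimensions, T(n,k) = 2 * sum_{i=0}^{k-1} C(n-1, i)."""
    return 2 * sum(comb(n - 1, i) for i in range(k))
```

For k >= n every one of the 2^n labelings is separable (T(4,4) = 16); a perceptron with a bias in d inputs corresponds to k = d + 1, so 4 points in the plane give T(4,3) = 14 of the 16 possible dichotomies — the two missing ones are the XOR labelings.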
