
Experimental Design for Machine Learning on Multimedia Data

Lecture 5

Dr. Gerald Friedland, fractor@eecs.berkeley.edu

Website: http://www.icsi.berkeley.edu/~fractor/fall2019/


Discuss Homework

Project proposal deadline: October 11th, 2019.

Email project proposals to me and Rishi.


Project Questions 1-4 (Data & Problem Inspection)

1) What is the variable the machine learner should predict? What is the required accuracy for success? What impact will adversarial examples have?

2) How much data do we have to train the prediction of the variable? Are the classes balanced? How many modalities could be exploited in the data? Is there temporal information? How much noise are we expecting? Do you expect bias?

3) How well is the data annotated (anecdotally)? What is the annotator agreement (measured)?

4) Given questions 1-3: Are we reducing information (pattern matching) or do we need to infer information (statistical machine learner)? As a consequence, what seems to be the best choice for the type of machine learner per modality?
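Question 3 asks for annotator agreement as a measured quantity; a common choice is a chance-corrected statistic such as Cohen's kappa. A minimal stdlib-only sketch for two annotators (function name and interface are mine, for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from the
    annotators' marginal label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Values near 1 indicate strong agreement, values near 0 agreement no better than chance; note the sketch does not guard against the degenerate p_e = 1 case.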


Project Questions 5-8 (Training for Generalization)

5) Estimate the memory equivalent capacity needed for the machine learner of your choice. What is the expected generalization? What does the progression look like: Is there enough data?

6) Train your machine learner for accuracy at memory equivalent capacity. Can you reach near 100% memorization? If not, why (diagnose)?

7) Train your machine learner for generalization: Plot the accuracy/capacity curve. What is the expected accuracy and generalization ratio at the point you decided to stop? Do you need to try a different machine learner (if so, redo from 5)? Should you extract features (if so, redo from 5)?

8) How well did your generalization prediction hold on the independent test data? Explain the results. How confident are you in them?


Project Questions 9-10 (Finishing Touch)

9) How do you combine the models of the modalities? Explain your choice. How confident are you in the combination results (i.e., does it make sense to combine)?

10) What are the final combined results of the system? Are the experiments documented and repeatable (if not, please make sure they are, even for bad results)? Are the experiments reproducible (speculate)?
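One common answer to Question 9 is late fusion: train one model per modality and combine their class-probability outputs. A minimal sketch (function name and weighted-average scheme are my illustrative assumptions, not a prescribed method):

```python
import numpy as np

def late_fusion(prob_per_modality, weights=None):
    """Late fusion: combine per-modality class-probability matrices
    (each of shape [n_samples, n_classes]) by a weighted average,
    then pick the argmax class per sample. Uniform weights by default."""
    stacked = np.stack(prob_per_modality)           # [n_modalities, n, c]
    if weights is None:
        weights = np.full(len(prob_per_modality), 1 / len(prob_per_modality))
    fused = np.tensordot(weights, stacked, axes=1)  # [n, c]
    return fused.argmax(axis=1)
```

For example, averaging an audio model's probabilities [[0.9, 0.1], [0.2, 0.8]] with a video model's [[0.6, 0.4], [0.4, 0.6]] yields [[0.75, 0.25], [0.3, 0.7]], hence classes [0, 1].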


Generic Project Workflow for Accuracy

[Flowchart: Development Data feeds Machine Learning (Training), producing Statistical Models; Apply Models to Test Data (Testing) to produce Results; an Error Metric compares the Results against the Ground Truth to yield Accuracy Scores (Evaluation).]
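The workflow boxes above map directly onto code: fit on development data, apply the model to held-out test data, and score against ground truth with an error metric. A minimal sketch (scikit-learn, the Iris dataset, and logistic regression are illustrative assumptions, not part of the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Development Data / Test Data split
X, y = load_iris(return_X_y=True)
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)  # Training -> Statistical Models
predictions = model.predict(X_test)                          # Apply Models -> Results
score = accuracy_score(y_test, predictions)                  # Error Metric vs. Ground Truth
print(f"accuracy = {score:.3f}")                             # Accuracy Scores
```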


Generic Project Workflow for Generalization

[Flowchart: Supervised Machine Learning Engineering Process — Maximizing the Chance for Generalization / Minimizing Adversarial Examples. Main path: check whether annotator agreement on the labelled data is high enough (if too low, "clean" the data or annotate again; if undetermined, annotate redundantly); check whether the classes are balanced (if not, subsample to balance them); if not committed to a specific ML approach, estimate the best approach using capacity estimators (the approach with the lowest capacity estimate wins); estimate the generalization progression and train the machine learner. If the generalization progression is poor, acquire more labelled data (a hard decision: insufficient labelled data vs. distrusting the estimators or mismatched representation function(s)). Otherwise, split into train and test data, run the capacity estimator on the training data, and train with the training data at memory capacity. If you reach high accuracy with small capacity: congrats — iteratively reduce the machine learner's capacity, retrain, and test on the test data, using the model from the previous iteration once accuracy drops. If high accuracy comes only with large capacity, or accuracy stays low even with large capacity (aka "overfitting"): start debugging the model; if you have already tried many different ML approaches, acquire more labelled data.]

Gerald Friedland, v0.4, Jan 2nd, 2019, fractor@eecs.berkeley.edu


Conclusions so far

▪ The lower limit of generalization is memorization. That is, the upper limit for the size of a machine learner is its memory capacity.

▪ The memory capacity is measurable in bits.

▪ Using a machine learner that is over capacity is a waste of resources and increases the risk of failure!

▪ Alchemy was converted into chemistry by measuring: It's time to convert guessing and checking in Machine Learning into science! Let's call it data science?

▪ To do:
▪ Non-binary classifiers, regression
▪ Convolutional networks, other machine learners
▪ Re-thinking training
▪ Explainable adversarial examples


Predicting Capacity Requirements

Given data and labels: How much actual capacity do I need to memorize the function?

Theoretical answer: The minimum description length of the table representing the function f (that is, its Shannon entropy).

Practical answer:
1) Worst case: Let's build a neural network where only the biases are trained.
2) Expected case: How much parameter reduction can (exponential) training buy us?
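The theoretical answer above can be made concrete for the label column of the table: n samples times the Shannon entropy of the label distribution gives a lower bound in bits on what must be memorized. A small sketch (function name is mine):

```python
from collections import Counter
from math import log2

def label_bits(labels):
    """Lower bound (in bits) on the capacity needed to memorize a
    labeling: n samples times the Shannon entropy of the label
    distribution, H = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    counts = Counter(labels)
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    return n * h
```

For balanced binary labels, H = 1 bit per sample, so 8 samples need at least 8 bits; a constant labeling needs 0 bits.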


Predicting Maximum Memory Capacity

[Diagram: a "dumb" network that memorizes the data — inputs x1, x2, ... feed a hidden layer of threshold units with fixed ±1 weights and trained biases b1, b2, ..., b_{m×n}, whose outputs are summed into a single ±1 output. Runtime: O(n log n).]


Predicting Memory Capacity

Dumb network:
• Highly inefficient.
• Potentially not 100% accurate (hash collisions).
• We can assume that training the weights (and biases) gets to 100% accuracy while reducing parameters.

Expected reduction: Exponential! n thresholds should be representable with log2(n) weights and biases (search tree!).
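One simple reading of the threshold idea above, sketched in code: project each sample to a scalar, sort, and count label changes along the sorted order — each change is one threshold the dumb network must place — then apply the slide's log2(n) search-tree reduction. The projection (feature sum) and function names are my illustrative assumptions, not the lecture's exact estimator:

```python
from math import ceil, log2

def thresholds_needed(X_rows, labels):
    """Count label transitions along a 1-D projection (here: the sum of
    each row's features); each transition needs one threshold unit."""
    projected = sorted(zip((sum(r) for r in X_rows), labels))
    return sum(1 for (_, a), (_, b) in zip(projected, projected[1:]) if a != b)

def expected_params(n_thresholds):
    """The slide's expected exponential reduction: n thresholds should be
    representable with about log2(n) weights and biases (search tree)."""
    return max(1, ceil(log2(n_thresholds + 1)))
```

For a linearly ordered labeling like [0, 0, 1, 1] a single threshold suffices, while an alternating labeling like [0, 1, 0, 1] needs three.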


Empirical Results

All results repeatable at: https://github.com/fractor/nntailoring


From Memorization to Generalization

Good news:
• Real-world data is not random.
• The information capacity of a perceptron is usually > 1 bit per parameter (Cover, MacKay).

This means we should be able to use fewer parameters than predicted by memory capacity calculations.

Memorization is worst-case generalization.


Suggested Engineering Process for Generalization

• Start at the approximate expected capacity.
• Train to > 98% accuracy. If impossible, increase parameters.
• Retrain iteratively with decreased capacity while testing against a validation set. You should see a decrease in training accuracy with an increase in validation set accuracy.
• Stop at the minimum capacity with the best held-out set accuracy.

Best-case scenario: As parameters are reduced, the neural network fails to memorize only the insignificant (noise) bits.
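The retrain-with-decreasing-capacity loop can be sketched as follows, using hidden-unit count as a stand-in for capacity. scikit-learn, the synthetic dataset, and the size schedule are my illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for labelled development data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

curve = []  # (hidden_units, training accuracy, validation accuracy)
for hidden in (64, 32, 16, 8, 4, 2):  # decreasing capacity schedule
    clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    curve.append((hidden, clf.score(X_tr, y_tr), clf.score(X_val, y_val)))

# Stop at the smallest capacity that still gives the best validation accuracy.
best = min(curve, key=lambda t: (-t[2], t[0]))
print(f"best: hidden={best[0]}, train={best[1]:.3f}, val={best[2]:.3f}")
```

Plotting `curve` gives the accuracy/capacity curve from Question 7; in the best case, training accuracy drops as capacity shrinks while validation accuracy holds or improves.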


Generalization Process: Expected Curve


Overcapacity Machine Learning: Issues

▪ Waste of money, energy, and time. Bad for the environment.

▪ Unseen (redundant) bits are a necessary condition for adversarial examples. See: B. Li, G. Friedland, J. Wang, R. Jia, C. Spanos, D. Song: "One Bit Matters: Explaining Adversarial Examples as the Abuse of Redundancy", submitted to ICLR 2019.

▪ Fewer parameters give a higher chance of explainability (Occam's Razor). See: G. Friedland, A. Metere: "Machine Learning for Science", UQ SciML Workshop, Los Angeles, June 2018.


Reminder: Occam's Razor

Among competing hypotheses, the one with the fewest assumptions should be selected.

For each accepted explanation of a phenomenon, there may be an extremely large, perhaps even incomprehensible, number of possible and more complex alternatives, because one can always burden failing explanations with ad hoc hypotheses to prevent them from being falsified; therefore, simpler theories are preferable to more complex ones because they are more testable. (Wikipedia, Sep. 2017)


Demo time!

▪ Intro to tools on GitHub
▪ T(n,k) calculation
▪ Capacity estimation given a table
▪ Capacity progression
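The T(n,k) in the demo list presumably refers to Cover's function-counting theorem — the number of dichotomies of n points in general position in k dimensions that a linear threshold unit can realize. A sketch under that interpretation (which is mine, not stated on the slide):

```python
from math import comb

def T(n: int, k: int) -> int:
    """Cover's function-counting theorem: the number of linearly
    separable dichotomies of n points in general position in k
    dimensions, T(n,k) = 2 * sum_{i=0}^{k-1} C(n-1, i)."""
    return 2 * sum(comb(n - 1, i) for i in range(k))
```

For k >= n every one of the 2^n labelings is separable (T(4,4) = 16); a perceptron with a bias in d inputs corresponds to k = d + 1, so 4 points in the plane give T(4,3) = 14 of the 16 possible dichotomies — the two missing ones are the XOR labelings.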
