of - unige.it · neta rch. regula rization revisited consider a highly non-natural p roblem fo r nn...

38

Upload: others

Post on 22-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Introducetion

to

Ensem

blesofExperts

and

Hybrid

M

ethods

NathanIntrator

Tel-AvivUniversityandBrownUniversity

www.physics.brown.edu/users/faculty/intrator

Collaboration:

YairShimshoni,YuvalRaviv,InnaStainvas,ShimonCohen

Page 2: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Outline

Basicde�nitions

Averagingformulation

Combiningmodelsofdi�erentnature

Assessingthegoodnessofanexpert

Predicting

large

ensemble

performance

from

a

smallensemble

Regularizationrevisited

netarch

Page 3: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

GeneralSetup

Problem:

Smalltrainingset,largenumberof

dependentvariables

BestSolution:

Detailedmodelingofthedata

withveryfew

freeparameterstoestimate

Second

best:

Useamore exiblemodel,esti-

matemanyparameters

netarch

Page 4: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

GeneralSetup

Tradeo�between:

numberoffreeparameters

datacomplexity

reliabilityoftheestimation

netarch

Page 5: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

W

hatareEnsemblesandHybridarchitectures

netarch

Page 6: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

W

hatareEnsemblesandHybridarchitectures

De�nition:

Combiningdi�erentmodelswhereeachiscapableof

modelingtheobservationsseparately

netarch

Page 7: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

ReasonsforEnsemblesandHybridMethods

netarch

Page 8: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

ReasonsforEnsemblesandHybridMethods

Uncertaintyaboutthedesiredmodel

netarch

Page 9: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

ReasonsforEnsemblesandHybridMethods

Uncertaintyaboutthedesiredmodel

{Uncertaintyaboutmodelparameters

{Uncertaintyaboutmodelcapacity

{Uncertaintyaboutmodelcomplexity

{Uncertaintyaboutmodeltypeandarchitecturen

etarch

Page 10: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Uncertaintyaboutmodelparameters

Whentheoptimizationsolutionisunique,uncer-

taintyresultsfrom

thechoiceofthetrainingset

When

thesolution

isnon-unique,additionalun-

certaintyresultsfrom

theinitialchoiceofmodel

parameters

netarch

Page 11: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Uncertaintyaboutmodelparameters

(continued)

Usuallyaddressedby:

Imposingaprior�(W

)onthedistributionofpa-

rameters

Integrationoverthedistribution:

Z�(W

)W

(x)dx:

Thismaybeproblematicduetomultiplelocalminima.

netarch

Page 12: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Uncertaintyaboutmodelparameters

(continued)

A

betterapproach:Integrate(average)overthepre-

dictionMW

ofallthesemodels

Z�(W

)MW

dW:

Leadstoensembleofexpertsasanapproximationto

amodelposterior

netarch

Page 13: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Combiningmodelsofdi�erentnature

SequentialmethodsofaHybrid avor

AdditiveandGeneralizedadditivemodels(GAM)

Projectionpursuitregression

Matchingpursuit:Choosefrom

a(nonorthogonal)

collectionofbasisfunctions

netarch

Page 14: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Combiningmodelsofdi�erentnature

Reasonsforcombination

EÆciency

di�erence

between

models,

training

methodology

Sequentialmodeling

Divideandconquer

Modelinterpretation

netarch

Page 15: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Actualcombination

Similartothecombinationofmodelswithdi�erent

parametervalues:

Construct(orempiricallyestimate)aposteriorto

themodels�(Mi (W

)),whereirepresentsthedif-

ferentmodels

Integrateovertheprior

Z�(Mi (W

))Mi (W

)

netarch

Page 16: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Prosandconsofcombiningmodelsusinga

posteriordistribution

Pros

Appearsto

modelthedatabetter,�tthemore

appropriatemodels

Removesnaturallyveryunrelatedmodels

Smallerensemblesizeworks�ne

netarch

Page 17: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Prosandconsofcombiningmodelsusinga

posteriordistribution

Cons

Regularizationissimpler

Sensitivitytowrongmodelsisreduced

Trainingforoptimalensembleperformanceissim-

pler

netarch

Page 18: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Maincaveatfor"smart"averaging:Construct

usefulModelAssessment

A

"good"modelassessmentcouldbeusefulfor

modelaveraging.

Whentwomodelshavesimilarpredictionsshould

wegivethem

sameimportance?

netarch

Page 19: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Maincaveatfor"smart"averaging:Construct

usefulModelAssessment

Simplyput,ifa40hiddenunitarchitectureper-

formsaswellasa5hiddenunitarchitecture,which

oneshouldweprefer?

netarch

Page 20: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Maincaveatfor"smart"averaging:Construct

usefulModelAssessment

Simplyput,ifa40hiddenunitarchitectureper-

formsaswellasa5hiddenunitarchitecture,which

oneshouldweprefer?

Informationtheorymaysurpriseushere..

netarch

Page 21: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

ModelAssessment(Hinton&

vanCamp,1993)

Basicidea

Theperformanceofanexpertisafunctionofits

error(residual)andafunctionofitscomplexity.

Thecomplexityofamodelisafunction

ofthe

numberofparametersandtherequiredaccuracy

fortheparameters

netarch

Page 22: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

ModelAssessment(Hinton&

vanCamp,1993)

Tousethesamescale,wemeasurethecode-length

oftheresidualandofthemodelparameters

Thecode-lengthofamodelisobtainedusingthe

posteriorprobabilityoftheparameters

Modelassessmentisthusinverselyproportionalto

thesum

ofthecode-lengths

netarch

Page 23: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Prosandconsofcombiningmodelsusinga

posteriordistribution

Cons

Regularizationissimplerandsensitivityreduced

Varianceoftheensemblecanbereduced

Trainingforoptimalensembleperformance

Predictlargeensembleperformancefrom

asmall

set

netarch

Page 24: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Variance/BiasDecompositionforEnsembles

�f(x)=

1Q

QXi=1

fi (x):

E[(�f�E[�f])2]=

E[(1QX

fi�E[1QX

fi ])2]

=

E[(1QX

fi )2]�(E[1QX

fi ])2:

(1)

The�rstRHSterm

canwerewrittenas

E[(1QX

fi )2]=

1Q2 XE[f2i]+

2Q2 Xi<

jE[fi fj ];

netarch

Page 25: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Variance/BiasDecompositionforEnsembles

andthesecondterm

gives,

(E[1QX

fi ])2=

1Q2 X(E[f2i])2+

2Q2 Xi<

jE[fi ]E[fj ]:

Pluggingtheseequalitiesinto(1)gives

E[(�f�E[�f])2]=

1Q2 XfE[f2i]�(E[fi ])2g+

2Q2 Xi<

j fE[fi fj ]�E[fi ]E[fj ]g:

Set

=

Var(fi )+(Q�1)maxi;j (E[fi fj ]�E[fi ]E[fj ]):

Itfollows[ab�a2+b2

2

)

E[fi fj ]�E[fi ]E[fj ]�maxiVar(fi )]that

Var(�f)�

1Q �max

i

Var(fi ):

(2)

netarch

Page 26: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Error as a function of ensemble size and training time

(Horn et al., 98)

0 20 40 60 80 100 120 140

t

0.070

0.075

0.080

0.085

0.090

0.095

0.100

ARV Q=1

Q=2

0 20 40 60 80 100 120 140

t

0.075

0.080

0.085

0.090

0.095

0.100

ARV

Q=1

Q=2

0 0.2 0.4 0.6 0.8 1

1/Q

0.074

0.076

0.078

0.080

0.082

0.084

ARV

Di�erent ensembles of two predictors as a function of training time. The

variance goes down as 1/Q.

netarch

Page 27: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Regularizationrevisited

Considerahighlynon-naturalproblem

forNN

Low

dimensional(highlynonlinear)

StudytheabilitytocontrolmodelpropertiesCa-

pacity,Variance,Bias/Smoothness

EasyvisualizationofGeneralizationProperties

netarch

Page 28: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

The Two-Spiral Problem

c(-6, 6)

c(-6

, 6)

-6 -4 -2 0 2 4 6-6

-4-2

02

46

o o

x

o

x

o

x

o

x

o

x

o

x

o

x

ox

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xo xo

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xoxo

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xo xo

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

ox ox ox

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xo

xo xo xo xo

xo

xo

x

o

x

o

x

o

xox

� 194 X,Y values. Half produce a 1 output, and half produce 0

� Lang and Witbrock (1988) proposed a 2-5-5-5-1 net (138 weights)

� Fahlman Lebiere (1990) Cascade Correlation architecture

� Baum and Lang (1991) Net of 2�50�1 could be consistent with training

set, but could not be found from random initial weights

� De�uant (1995) suggested the "Perceptron Membrane": piecewise linear

discriminating surfaces using 29 perceptrons. Non smooth solution

netarch

Page 29: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

The noisy spiralsc(-6, 6)

c(-6

, 6)

-6 -4 -2 0 2 4 6

-6-4

-20

24

6

o oo

x

o

x

o

x

o

x

o

x

o

xox

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xo x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xo

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xo xo

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

ox ox

o

xo

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

x

o

xo

xo

xo xo xo

xo

x

ox

o

x

o

xox

o

x

Additional Gaussian noise (SD=0.3)

netarch

Page 30: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Di�erentnoiselevels

Resultsoftrainingwithdi�erentnoiselevels.�=

0;:::;0:8

netarch

Page 31: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Di�erentnoiselevelsandoptimalweightdecayn

etarch

Page 32: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Summary:40NetEnsemble

Top

left:

Noconstrains.Top

right:

OptimalWD,nonoise.Bottom

left:

Optimalnoise,noWD.Bottom

right:Optimalnoise&

WD.

netarch

Page 33: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

LocalGAM

Localgeneralizedadditivemodel(HastieTibshirani,1986)

Usesapolynomial�tofdegree1(optimal)

Thespanparameterdeterminesthedegreeoflocalityoftheestimation

Idealmodelfortheproblem

{

Localwithcontrolonlocality

{

Noridgeconstraints

{

Providesauniquemodel(lessvariability)

{

Smoothnesscontrolledbylocalityanddegreeofthepolynomial

netarch

Page 34: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Noisy GAM

No bootstrap sd=0 sd=0.025

sd=0.05 sd=0.1 sd=0.3

Average of 20 GAMs with varying degrees of noisenetarch

Page 35: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Takehomefrom

theSpirals

NN

areeasytoregularize

{Weightdecay(smoothing)

{Ensembleaverage

Bootstrapwith

noiseisusefulforothermodels'

regularizationandisnotequivalenttosmoothingn

etarch

Page 36: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Challenge:

show

similarperformanceusing

Stacking,Bagging,

Boosting,Arcing,Randomization,etc.

netarch

Page 37: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

ProblemsinInterpretabilityofNN

�Themodelisnotidenti�ablesincethereisnouniquesolution

toa�xedANN

architectureandlearningrule.

�Estimationwithgradientdescentincreasesvariabilityofthe

modelduetolocalminima

�ThereisnoclearOptimalnetworkarchitecture(numberof

hiddenlayers,numberofhiddenunits,recurrent,secondor-

der,etc.)

�Nonlinearmodel:alle�ectsshouldonlybecalculatedlocally

(perinputobservation)

�How

todevisesummarystatisticsforrankingbetweenvari-

ables?

netarch

Page 38: of - unige.it · neta rch. Regula rization revisited Consider a highly non-natural p roblem fo r NN Lo w dimensional (highly nonlinea r) Study the abilit y to control mo del p rop

Summary

Whilemostactivityisgearedtowardssamearchi-

tectureensembles,Di�erentarchitectureensemble

ispromising

Modelassessmentwaspresentedforsameordif-

ferentarchitectureensembles

Variancecontrolispossiblewithsimpleaveraging

Largeensembleperformancecanbepredictedfrom

smallset

netarch