Basics of Bayesian Formalism
Ashish Mahabal (PQ, CSS, JPL collabs)
Ay/Bi 199, Caltech, 12 May 2011
Advantages of Bayesian Networks
• Handling of incomplete data
  – Real-world cases
• Learning causal connections
  – What variable caused what
• Incorporating domain knowledge
  – Experts can weigh in at different points
• Memorizing (aka overfitting) avoided
  – No holdout necessary
• Tue May 17: Moghaddam, Bayesian Methods: Nonparametric Bayes and Gaussian Processes
Ambitious Outline
• Basic astronomy classification trivia
• Time and place for Bayesian techniques
• Basic concepts related to belief
• Logic and probability theory
• The inversion formula
• Application to astronomy
Astronomical Classification and the time domain
• Moving objects (asteroids, TNOs, KBOs)
• SNe (cosmological standard candles, endpoints of stellar evolution)
• GRB orphan afterglows (constraining beaming models)
• Variable stars (stellar astrophysics, galactic structure)
• AGN (QSOs, fuelling mechanisms, lifetimes)
• Blazars, Cosmic Rays, …
Rapid follow-up is key to understanding
Towards Automated Event Classification
[Diagram: event parameters (m1(t), m2(t), …; α, δ, µ, …; image shape, …), colors, light curves, etc., together with expert- and ML-generated priors and contextual information, feed an Event Classification Engine. The engine outputs classification probabilities (evolving, iterated): P(SN Ia) = …, P(SN II) = …, P(AGN) = …, P(CV) = …, P(dM) = …, ….]
A necessity for large synoptic surveys
Basic astronomy classification trivia: Colors
• Magnitude as basic observation (flux)
• Color as flux ratio
• Color-color diagram as a diagnostic
• Ambiguity
Basic astronomy classification trivia
• Attach probabilities through priors to various classes and determine what class a newly looked at object belongs to
• Bayesian techniques allow us to do this in a rational manner even when some of the data is uncertain or missing
Bayesian techniques
• Bayesian methods provide a formalism for reasoning about partial beliefs under conditions of uncertainty
Belief is going to be a crucial word
A: The world will end in 2012
p(A|K): belief about A given a body of knowledge K. Often written simply as p(A).
• p(NOT A): belief that A will not happen
• When K changes, p(A) and p(NOT A) change accordingly
• In general:
• 0 ≤ p(A) ≤ 1
• p(sure proposition) = 1
• p(A or B) = p(A) + p(B) when A and B are mutually exclusive
• p(A) + p(NOT A) = 1
• p(A) = p(A, B) + p(A, NOT B)
  – p(A, B) == p(A and B)
In general:
  – For Bi = B1, B2, B3, …, Bn mutually exclusive and exhaustive: p(A) = Σ p(A, Bi) = p(A, B1) + p(A, B2) + … + p(A, Bn)
We will revisit these later. Catchword: Belief
An example
• A: outcome of 2 dice is equal
• B1: 2nd die is 1
• B2..B6: 2nd die is 2..6
• Bi = B1..B6 form a partition (are mutually exclusive and exhaustive)
• p(A) = Σ p(A, Bi) = p(A, B1) + … + p(A, B6) = 1/36 + … + 1/36 = 1/6
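The dice example can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original slides; the helper name `p_joint` is made up for illustration. It sums p(A, Bi) over the partition B1..B6.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(outcomes))

def p_joint(i):
    """p(A, B_i): both dice show the same value AND the second die shows i."""
    return sum(p_each for d1, d2 in outcomes if d1 == d2 and d2 == i)

# Marginalise over the partition B_1..B_6.
p_A = sum(p_joint(i) for i in range(1, 7))
print(p_A)  # 1/6
```

Exact fractions are used so that the result matches the hand calculation without rounding.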
About belief
• Rules have exceptions
• Ignoring exceptions leads to uncertainty
• If we consider all exceptions, we may not be able to proceed
• The middle way is to summarize exceptions
That is where the Bayes formalism leads us
An example
Suppose I have a bird. What can be said about its ability to fly?
Birds fly. So the bird under question should be able to fly
• What if it's a penguin?
• Dead?
• With wings cut?
• Made of paper?
Summary: Most birds can fly. Need to make it quantitative
Logical approach?
• Assign numerical values to uncertainty and combine them like truth values
• But sources of uncertainty are not independent, and it is not easy to evaluate the effect of additional evidence, e.g. in the last example, if we are told that the said bird has existed for 1 year, how do we take that into consideration?
What is p(B|A)?
• Not "Given A, the probability that B is true". That reading holds only if B does not (also) depend on anything else. If we knew other things, the probability might be different.
If 10 of 10000 species can't fly, then:
p(can't fly | bird) = 1/1000 (blanket statement)
That says nothing about what happens when we know something else (here, e.g., that the bird has existed for 1 year).
Verification of irrelevancy is crucial
• Rule: What goes up must come down
  – A: foo comes down
  – B: foo goes up
  – P(A|B) = 1
Is that really true?
What if upward velocity > escape velocity?
Where did escape velocity come from?
Escape velocity was always there
Intensional v. extensional
• These rule-based (extensional) systems are computationally convenient, but semantically inconvenient.
• The opposite is true of the Bayes formalism: it is declarative and model-based (intensional)
[Diagram: sets A and B, with regions labeled m, 1-m and n]
P(B|A) = m => In all worlds that satisfy A, those also satisfying B are a fraction m
Partitioned Conditionals and Marginals
A bit more about logical systems
• A: The grass is wet
• B: It rained
A gives credibility to B. In a rule-based system, that weight increases irrevocably.
If, later, C: The sprinkler was on, and D: The neighbor's grass is dry, it is difficult to connect A and C
[Diagram nodes: B: It rained; C: Sprinklers on; E: Neighbor's sprinklers off; A: Wet grass; D: Neighbor's grass dry]
If anything, C becomes less credible once we know A to be true
Chaining?
• Logic: If A then B and if B then C => if A then C
• In plausible reasoning, it can lead to problems:
  – If the ground is wet, then it rained
  – If the sprinklers are on, the grass is wet
Does that mean: if sprinklers are on, it rained?
If anything, it takes away support for such a suggestion.
A system of rules produces coherent updates if and only if the rules form a directed tree, i.e. no two rules may stem from the same premise. Wet grass here points to two possible explanations.
[Diagram: "It rained" and "Sprinkler on" both connected to "Grass is wet"]
Why networks?
To make p(B|A) meaningful, we have to show: other items in the knowledge base are irrelevant to B
Better still, make what cannot be ignored quickly identifiable and accessible
Neighboring nodes in a graph allow that. What is not local does not matter!
Networks are also used in AI etc., but BNs have clear semantics. Most features can be derived from the knowledge base.
These play a central role in uncertainty formalisms: BNs, causal nets, influence diagrams, constraint networks
More important terms
• Likelihood: in blackjack the likelihood of getting 10 is higher (because it can be 10, J, Q or K)
• Conditioning: P(A|C) = p(A, C)/p(C)
  (A: belief; |: conditioning bar; C: knowledge/context)
P(fly(a) | bird(a)) = HIGH
P(fly(a) | bird(a), sick(a)) = LOW (non-zero!)
(retraction possible)
p(fly | bird) = α * p(fly | bird, sick) + (1-α) * p(fly | bird, NOT sick), with α = p(sick | bird), 0 < α < 1
• Relevance: potential change in belief due to change in knowledge
• Causation:
[Diagram nodes: rain; sprinkler; wet pavement; banana peel; fall due to slipping]
Once falling is observed, it is irrelevant why the pavement was wet
Case study: college plans (Heckerman, 1995, MSR-TR-95-06)
• Sex (SEX: M, F)
• Socioeconomic Status (SES: L, M, U, H)
• IQ (IQ: L, M, U, H)
• Parental encouragement (PE: L, H)
• College plans (CP: Y, N)
Hidden variable
Using the terms just explained, we can use probability to describe qualitative phenomena.
One can see whether refinements or extensions are possible. If the model is based on theory, we can understand exactly what adjustments need to be made.
The inversion formula
p(H|e) = p(e|H) * p(H) / p(e)
Posterior = likelihood * prior / normalizing constant
p(e) = p(e|H) * p(H) + p(e|NOT H) * p(NOT H)
The formula follows from:
p(A|B) = p(A, B)/p(B) and p(B|A) = p(A, B)/p(A)
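The two conditioning identities combine into the inversion formula in one step. Written out as a short derivation, with H and e in place of A and B (notation only, nothing beyond the slide is assumed):

```latex
\begin{align}
p(H, e) &= p(H \mid e)\, p(e) = p(e \mid H)\, p(H) \\
p(H \mid e) &= \frac{p(e \mid H)\, p(H)}{p(e)}, \qquad
p(e) = p(e \mid H)\, p(H) + p(e \mid \neg H)\, p(\neg H)
\end{align}
```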
Assessing p(H|e)
In a gambling room someone calls 12. Is it from a pair of dice, or from a roulette wheel?
• For dice: p(e|H) = p(12|dice) = 1/36
• For roulette: p(e|H) = p(12|roulette) = 1/38
Thus, if there are more than 38/36 times as many roulette wheels as pairs of dice in the room, roulette is the more likely source.
Assessing p(roulette|12) directly would have been much more difficult.
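To make the break-even ratio concrete, here is a small Python sketch of the inversion formula for this room. It is illustrative only: the function name `posterior_roulette` is hypothetical, and the prior assumes that every device in the room is equally likely to be the one whose result was called.

```python
from fractions import Fraction

def posterior_roulette(n_roulette, n_dice):
    """p(roulette | 12), assuming every device in the room is equally
    likely to be the source of the call (an assumption, not from the slides)."""
    total = n_roulette + n_dice
    prior_r = Fraction(n_roulette, total)
    prior_d = Fraction(n_dice, total)
    like_r = Fraction(1, 38)   # p(12 | roulette), 38-slot wheel
    like_d = Fraction(1, 36)   # p(12 | pair of dice)
    evidence = like_r * prior_r + like_d * prior_d   # p(12)
    return like_r * prior_r / evidence

print(posterior_roulette(1, 1))    # 18/37, dice slightly favoured
print(posterior_roulette(38, 36))  # 1/2, the break-even ratio
print(posterior_roulette(2, 1))    # 36/55, roulette favoured
```

The likelihoods are easy to state from the mechanics of each device; the posterior only becomes available once a prior over devices is supplied.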
3 prisoner problem
1 prisoner. 3 doors. 2 lead to death, 1 to escape. S/He is asked to choose one. Once he has indicated his choice, one of the other doors is indicated to be leading to death. He is given a chance to switch to the third door. Should he?
What if there are 1000 doors (999 leading to death)? Should he switch?
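The door puzzle can be worked out exactly. The sketch below is a hedged illustration, not the speaker's solution. It assumes that the revealed doors are chosen knowingly (they never hide the escape route or the prisoner's own pick) and that a switching prisoner picks uniformly among the remaining closed doors. The slide does not say how many doors are revealed in the 1000-door case, so both readings are shown.

```python
from fractions import Fraction

def p_escape(n_doors, n_revealed, switch):
    """Exact escape probability. Assumptions: exactly one door leads to
    escape, the initial pick is uniform, and the n_revealed doors shown to
    lead to death are chosen knowingly from the unpicked doors."""
    p_first_pick_right = Fraction(1, n_doors)
    if not switch:
        return p_first_pick_right
    # Switching wins only if the first pick was wrong, and then only if
    # the prisoner picks the escape door among the remaining closed doors.
    remaining = n_doors - 1 - n_revealed
    return (1 - p_first_pick_right) * Fraction(1, remaining)

print(p_escape(3, 1, switch=False))      # 1/3
print(p_escape(3, 1, switch=True))       # 2/3
print(p_escape(1000, 1, switch=True))    # 999/998000
print(p_escape(1000, 998, switch=True))  # 999/1000
```

Under these assumptions switching never hurts, and the advantage grows with the number of doors revealed.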
The Bayesian twist
Of three prisoners A, B, C only one is going to be hanged and the other two pardoned.
A says to the guard: Give this letter to one of the pardoned ones.
An hour later, A asks the guard: tell me, who did you give it to?
The guard answers "B". A reasons: So either C will be hanged, or I will be. The probability that I will be is 50%. Is he right?
p(G_A | I_B) = p(I_B | G_A) * p(G_A) / p(I_B) = 1 * (1/3) / (2/3) = 1/2
So, applying Bayesian logic incorrectly leads to false results. The error here is misinterpreting the context.
Rather than I_B = "B will be released", the evidence is I'_B = "Guard said B will be released":
p(G_A | I'_B) = p(I'_B | G_A) * p(G_A) / p(I'_B) = (1/2) * (1/3) / (1/2) = 1/3
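The corrected result can be reproduced by writing down the joint distribution over who is hanged and whom the guard names. This is a minimal sketch under stated assumptions: each prisoner is equally likely to be hanged, the guard never names A or the condemned prisoner, and the guard chooses at random between B and C when A is the condemned one. The function name is illustrative.

```python
from fractions import Fraction

# Joint distribution over (who is hanged, whom the guard names).
third, half = Fraction(1, 3), Fraction(1, 2)
joint = {
    ("A", "B"): third * half,  # A hanged, guard picks B at random
    ("A", "C"): third * half,  # A hanged, guard picks C at random
    ("B", "C"): third,         # B hanged, guard must name C
    ("C", "B"): third,         # C hanged, guard must name B
}

def p_hanged_given_named(hanged, named):
    """p(hanged | guard names `named`) by direct conditioning."""
    p_named = sum(p for (h, n), p in joint.items() if n == named)
    return joint.get((hanged, named), Fraction(0)) / p_named

print(p_hanged_given_named("A", "B"))  # 1/3
print(p_hanged_given_named("C", "B"))  # 2/3
```

Conditioning on what the guard said, rather than on the bare fact that B is pardoned, leaves A's chance at 1/3 and moves the rest onto C.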
Case of 1000?
• 1000 prisoners of which 1 is to be put to death
• A finds a list of 998 to be released without his name on it
• What should our belief be that he will be the one being put to death?
Case of 1000?
• List of 998 to be released
• Query associated with the printout: 998 right handers
• A is left-handed
• ?? Difficult to treat such ignorance in general.
• Multivalued hypothesis
• Uncertain evidence
• Virtual (intangible) evidence
• Predicting future events
• Multiple causes, explaining away
• Patterns of plausible reasoning
• …
Naïve Bayes
• x: feature vector of event parameters
• y: object class that gives rise to x (1 ≤ y ≤ k)
• Certain features of x known:
  – Position
  – Flux at observed wavelength
• Others will be unknown
  – Color
  – Change in mag/flux over time baselines
Naïve Bayes (contd.)
• Assumption: conditioned on y, x is decomposable into B distinct independent classes (labeled x_b)
• This helps with the curse of dimensionality
• Also allows us to deal with missing values
• Alternate parallel supplemental supervised classification
  – Artificial Neural Networks (ANN)
  – Support Vector Machines (SVM)
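One way to see why the independence assumption helps with missing values: the posterior is a product of per-block likelihoods, so a block that was not observed is simply left out of the product. The Python sketch below illustrates this. The class names, feature names and all probabilities are hypothetical, chosen only to make the arithmetic visible.

```python
import math

def naive_bayes_posterior(priors, likelihoods, observed):
    """Posterior p(y | observed blocks) under the naive Bayes assumption.

    priors:      {class: p(y)}
    likelihoods: {class: {block_name: {value: p(x_b = value | y)}}}
    observed:    {block_name: value}; missing blocks are simply absent.
    """
    log_post = {}
    for y, prior in priors.items():
        log_p = math.log(prior)
        for block, value in observed.items():
            log_p += math.log(likelihoods[y][block][value])
        log_post[y] = log_p
    # Normalise in a numerically stable way.
    peak = max(log_post.values())
    unnorm = {y: math.exp(lp - peak) for y, lp in log_post.items()}
    total = sum(unnorm.values())
    return {y: u / total for y, u in unnorm.items()}

# Hypothetical numbers, for illustration only.
priors = {"SN": 0.5, "CV": 0.5}
likelihoods = {
    "SN": {"color": {"red": 0.8, "blue": 0.2}, "near_galaxy": {True: 0.9, False: 0.1}},
    "CV": {"color": {"red": 0.3, "blue": 0.7}, "near_galaxy": {True: 0.2, False: 0.8}},
}
# Color is missing; only proximity to a galaxy is known.
print(naive_bayes_posterior(priors, likelihoods, {"near_galaxy": True}))
```

When the color is later measured, the same function is called with both blocks present; no retraining or imputation step is involved.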
Follow-up (for missing values)
• Such that it will help discriminate better
• Serve probabilities so that consumers can choose their types of transients
• Widest possible models
Choosing follow-up configs
[Figure: r-i color, hi-z quasar, blue star]
Transient classification mantra
• Obtain a couple of epochs in one or more filters
• Assign probabilities for different classes
• Choose observations (filters, wavelengths) for best discrimination
• Feed the new observations back in
• Revise probabilities, choose observations, …
• Based on confirmed class (how?) revise priors
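The loop above can be sketched in a few lines of Python. The slides do not say how "best discrimination" is measured; the sketch below assumes one common choice, picking the observation with the lowest expected posterior entropy. The classes, filter names and likelihood tables are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a {class: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def update(belief, likelihood, outcome):
    """One Bayes update: posterior over classes after seeing `outcome`."""
    unnorm = {c: belief[c] * likelihood[c][outcome] for c in belief}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

def expected_entropy(belief, likelihood):
    """Average posterior entropy over the possible outcomes of one observation."""
    outcomes = next(iter(likelihood.values())).keys()
    result = 0.0
    for o in outcomes:
        p_o = sum(belief[c] * likelihood[c][o] for c in belief)
        if p_o > 0:
            result += p_o * entropy(update(belief, likelihood, o))
    return result

def choose_followup(belief, candidates):
    """Pick the candidate observation with the lowest expected entropy."""
    return min(candidates, key=lambda name: expected_entropy(belief, candidates[name]))

# Hypothetical classes, filters and likelihoods p(outcome | class).
belief = {"SN": 0.5, "CV": 0.5}
candidates = {
    "g-r": {"SN": {"red": 0.9, "blue": 0.1}, "CV": {"red": 0.2, "blue": 0.8}},
    "i-z": {"SN": {"red": 0.5, "blue": 0.5}, "CV": {"red": 0.5, "blue": 0.5}},
}
best = choose_followup(belief, candidates)
belief = update(belief, candidates[best], "red")   # feed the new observation back in
print(best, belief)
```

In this toy setup the "i-z" color carries no information about the class, so the "g-r" observation is chosen; the updated belief then serves as the prior for the next round.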
Summary
• Modeling is all-important (to predict/explore/explain)
• Local dependencies, irrelevancies to be evaluated
• Priors, likelihoods to be obtained
• Directed Acyclic Graph to be constructed
• Data define network
• No "training" necessary
Bayesian Network Toolbox: http://bnt.googlecode.com
A Bayesian Network
[Diagram: Cloudy at the top; Rain and Sprinkler in the middle; Wet at the bottom]
Creating a DAG
Multivalued nodes
Making the bnet
Naming parameters
Conditional probability distribution
Entering evidence
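The steps named on these slides (build the DAG, attach conditional probability tables, enter evidence, query) can be illustrated without any toolbox by enumerating the joint distribution directly. The Python sketch below is not the toolbox code from the talk. It assumes the usual structure for this example (Cloudy influences Sprinkler and Rain, which both influence Wet) and commonly quoted table values, neither of which is listed in the transcript.

```python
from itertools import product

# CPT values are assumed for illustration; the slides do not list them.
P_C = {True: 0.5, False: 0.5}
P_S_given_C = {True: 0.1, False: 0.5}          # p(Sprinkler=on | Cloudy)
P_R_given_C = {True: 0.8, False: 0.2}          # p(Rain | Cloudy)
P_W_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.0}

def joint(c, s, r, w):
    """p(C, S, R, W) as the product of the local conditional tables."""
    p = P_C[c]
    p *= P_S_given_C[c] if s else 1 - P_S_given_C[c]
    p *= P_R_given_C[c] if r else 1 - P_R_given_C[c]
    p_w = P_W_given_SR[(s, r)]
    return p * (p_w if w else 1 - p_w)

def query(target, evidence):
    """p(target=True | evidence) by summing the joint over all worlds."""
    names = ("C", "S", "R", "W")
    num = den = 0.0
    for values in product([True, False], repeat=4):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[target]:
            num += p
    return num / den

print(round(query("S", {"W": True}), 4))             # 0.4298
print(round(query("S", {"W": True, "R": True}), 4))  # 0.1945
```

With these numbers, learning that it rained lowers the belief that the sprinkler was on, which is the "explaining away" pattern listed earlier among the patterns of plausible reasoning.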
The Catalina Real-Time Transient Survey
[Figure: Catalina Survey Fast Transient (a flare star), 02 Nov 2007 UT: 4 individual exposures, separated by 10 min, and the baseline coadd]
CRTS is a search for transients being done at Caltech, piggybacking on the data from the search for near-Earth, potentially hazardous asteroids (the latter is led by S. Larson, E. Beshore, et al. at UAz LPL). The survey uses the 24-inch Schmidt on Mt. Bigelow and a single, unfiltered 4k x 4k CCD (and also telescopes at Mt. Lemmon and Siding Spring). Coverage is well over 1000 deg²/night.
Basic astronomy classification trivia: context-based information
• Galactic latitude – Galacticness
• Proximity to a galaxy – SN
[Figure labels: CV, SN, Blazars, Rest; -4 -> 4 (10 bins each); spectra; light curves]
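N = 5;                                  % number of nodes
dag = zeros(N, N);                      % adjacency matrix of the DAG
C = 1; g = 2; c1 = 3; c2 = 4; c3 = 5;   % node indices: class, latitude, three colors
dag(C, [g, c1, c2, c3]) = 1;            % class node is the parent of every feature
discrete_nodes = 1:N;
node_sizes = [4 10 10 10 10];           % 4 classes; 10 bins per feature
bnet = mk_bnet(dag, node_sizes, 'names', {'class', 'galactic_latitude', 'g-r', 'r-i', 'i-z'}, 'discrete', discrete_nodes);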
Advantages of Bayesian Networks
• Handling of incomplete data
  – Real-world cases
• Learning causal connections
  – What variable caused what
• Incorporating domain knowledge
  – Experts can weigh in at different points
• Memorizing (aka overfitting) avoided
  – No holdout necessary