open universes and nuclear weapons
1
Open universes and nuclear weapons
Stuart Russell, Computer Science Division, UC Berkeley
2
Outline
- Why we need expressive probabilistic languages
- BLOG combines probability and first-order logic
- Application to global seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty (CTBT)
3
The world has things in it!!
Expressive language => concise models => fast learning, sometimes fast reasoning. E.g., rules of chess:
- 1 page in first-order logic: On(color, piece, x, y, t)
- ~100,000 pages in propositional logic: WhiteKingOnC4Move12
- ~10^38 pages as an atomic-state model: each state an opaque board string like "R.B.KB.RPPP..PPP..N..N.....PP....q.pp..Q..n..n..ppp..pppr.b.kb.r"
[Note: chess is a tiny problem compared to the real world]
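The size gap above can be made concrete with a back-of-envelope count. The sketch below assumes 2 colors, 6 piece types, an 8x8 board, and a (hypothetical) 100-move horizon; the point is only that one quantified relation replaces tens of thousands of propositional symbols.

```python
# Back-of-envelope count: encoding chess positions propositionally needs one
# symbol per (color, piece type, square, move) combination, while first-order
# logic needs a single relation On(color, piece, x, y, t).
colors, piece_types, squares, moves = 2, 6, 64, 100
propositions = colors * piece_types * squares * moves  # 76,800 symbols
first_order_relations = 1
print(propositions)
```

The atomic-state count is worse still: roughly one "page" per reachable board string.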
4
Brief history of expressiveness

              atomic     propositional   first-order/relational
logic                    5th C B.C.      19th C
probability   17th C     20th C          21st C (be patient!)
11
First-order probabilistic languages
Gaifman [1964]: possible worlds with objects and relations; probabilities attached to (infinitely many) sentences
Halpern [1990]: probabilities within sentences; constraints on distributions over first-order possible worlds
Poole [1993], Sato [1997], Koller & Pfeffer [1998], various others: KB defines the distribution exactly (cf. Bayes nets); assumes unique names and domain closure, like Prolog and databases (Herbrand semantics)
12
Herbrand vs full first-order
GivenFather(Bill,William) and Father(Bill,Junior)How many children does Bill have?
13
Herbrand vs full first-order
GivenFather(Bill,William) and Father(Bill,Junior)How many children does Bill have?
Herbrand semantics:2
14
Herbrand vs full first-order
GivenFather(Bill,William) and Father(Bill,Junior)How many children does Bill have?
Herbrand semantics:2First-order logical semantics:
Between 1 and ∞
15
Possible worlds
- Propositional
- First-order + unique names, domain closure
- First-order open-universe
[figures: each row shows candidate worlds over objects A, B, C, D; in the open-universe row, worlds also differ in how many objects exist and which objects the symbols A, B, C, D denote]
18
Open-universe models
Essential for learning about what exists, e.g., vision, NLP, information integration, tracking, life
[Note the GOFAI Gap: logic-based systems going back to Shakey assumed that perceived objects would be named correctly]
Key question: how to define distributions over an infinite, heterogeneous set of worlds?
Bayes nets build propositional worlds
19
BurglaryBurglary
AlarmAlarm
EarthquakeEarthquake
Bayes nets build propositional worlds
20
BurglaryBurglary
AlarmAlarm
EarthquakeEarthquake
Burglary
Bayes nets build propositional worlds
21
BurglaryBurglary
AlarmAlarm
EarthquakeEarthquake
Burglarynot Earthquake
Bayes nets build propositional worlds
22
BurglaryBurglary
AlarmAlarm
EarthquakeEarthquake
Burglarynot EarthquakeAlarm
23
Open-universe models in BLOG
Construct worlds using two kinds of steps, proceeding in topological order:
- Dependency statements: set the value of a function or relation on a tuple of (quantified) arguments, conditioned on parent values
- Number statements: add some objects to the world, conditioned on what objects and relations exist so far
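The two kinds of steps can be sketched as a toy forward sampler. This is illustrative only (not the BLOG engine); the particular priors (uniform count, Bernoulli(0.9) honesty) are stand-ins.

```python
import random

def sample_world(rng):
    """Toy open-universe world construction, in topological order:
    a number statement adds objects, then dependency statements
    set each object's attributes conditioned on parent values."""
    world = {"objects": [], "honest": {}}
    # Number statement: #Object ~ (some prior over how many things exist)
    n = rng.randint(1, 5)
    for i in range(n):
        obj = f"obj{i}"
        world["objects"].append(obj)
        # Dependency statement: Honest(obj) ~ Boolean[0.9]()
        world["honest"][obj] = rng.random() < 0.9
    return world

w = sample_world(random.Random(0))
```

Because the number of objects is itself sampled, different worlds contain different numbers of things, which is exactly what closed-universe languages rule out.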
25
Semantics
Every well-formed* BLOG model specifies a unique proper probability distribution over open-universe possible worlds; equivalent to an infinite contingent Bayes net
* No infinite receding ancestor chains, no conditioned cycles, all expressions finitely evaluable
26
Example: Citation Matching
[Lashkari et al 94] Collaborative Interface Agents, Yezdi Lashkari, Max Metral, and Pattie Maes, Proceedings of the Twelfth National Conference on Articial Intelligence, MIT Press, Cambridge, MA, 1994.
Metral M. Lashkari, Y. and P. Maes. Collaborative interface agents. In Conference of the American Association for Artificial Intelligence, Seattle, WA, August 1994.
Are these descriptions of the same object?
Core task in CiteSeer, Google Scholar, and over 300 companies in the record-linkage industry
27
(Simplified) BLOG model

#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
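The citation model above can be read as a forward sampler. The sketch below mirrors its structure in Python; every prior here (the name list, title variants, per-researcher paper counts, the crude character-dropping noise) is an illustrative stand-in, not the real NoisyCitationGrammar.

```python
import random

def sample_citations(rng, n_citations=3):
    # #Researcher ~ NumResearchersPrior()
    researchers = [f"r{i}" for i in range(rng.randint(1, 4))]
    # Name(r) ~ NamePrior()
    name = {r: rng.choice(["Lashkari", "Metral", "Maes"]) for r in researchers}
    # #Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r))
    papers = [(r, rng.choice(["Collaborative Interface Agents",
                              "Collaborative interface agents"]))
              for r in researchers for _ in range(rng.randint(1, 3))]
    citations = []
    for _ in range(n_citations):
        author, title = rng.choice(papers)  # PubCited(c) ~ Uniform({Paper p})
        # Text(c) ~ noisy rendering of the cited paper's author and title
        text = f"{name[author]}. {title}."
        if rng.random() < 0.3:              # crude typo noise for illustration
            text = text.replace("a", "", 1)
        citations.append(text)
    return citations

cits = sample_citations(random.Random(0))
```

Inference runs this process in reverse: given observed citation strings, infer how many papers exist and which citations co-refer.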
33
Citation Matching Results
Four data sets of ~300-500 citations, referring to ~150-300 papers
[bar chart: error (fraction of clusters not recovered correctly), roughly 0 to 0.25, on the four data sets (Reinforce, Face, Reason, Constraint), comparing Phrase Matching [Lawrence et al. 1999], Generative Model + MCMC [Pasula et al. 2002], and Conditional Random Field [Wellner et al. 2004]]
34
Example: Sibyl attacks
- Typically between 100 and 10,000 real people
- About 90% are honest and have one login ID
- Dishonest people own between 10 and 1000 logins
- Transactions may occur between logins:
  - If two logins are owned by the same person (sibyls), a transaction is highly likely
  - Otherwise, a transaction is less likely (depending on the honesty of each login's owner)
- A login may recommend another after a transaction:
  - Sibyls with the same owner usually recommend each other
  - Otherwise, the probability of recommendation depends on the honesty of the two owners
35
#Person ~ LogNormal[6.9, 2.3]();
Honest(x) ~ Boolean[0.9]();
#Login(Owner = x) ~ if Honest(x) then 1 else LogNormal[4.6, 2.3]();
Transaction(x,y) ~ if Owner(x) = Owner(y) then SibylPrior()
    else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
Recommends(x,y) ~ if Transaction(x,y) then
    if Owner(x) = Owner(y) then Boolean[0.99]()
    else RecPrior(Honest(Owner(x)), Honest(Owner(y)));

Evidence: lots of transactions and recommendations, maybe some Honest(.) assertions
Query: Honest(x)
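The number statements in the sibyl model can be sketched as a forward sampler. The LogNormal parameters below match the slide; the caps on counts are added only to keep this sketch bounded, and the whole thing is illustrative rather than the actual model.

```python
import random

def sample_sibyl_world(rng):
    # #Person ~ LogNormal[6.9, 2.3]()  (median exp(6.9) ~ 1000 people;
    # capped here purely to keep the sketch small)
    n_people = min(500, max(1, int(rng.lognormvariate(6.9, 2.3))))
    honest = {}  # person id -> Honest(x) ~ Boolean[0.9]()
    owner = {}   # login id -> person id
    for p in range(n_people):
        honest[p] = rng.random() < 0.9
        # #Login(Owner = x) ~ if Honest(x) then 1 else LogNormal[4.6, 2.3]()
        n_logins = 1 if honest[p] else min(1000, max(1, int(rng.lognormvariate(4.6, 2.3))))
        for _ in range(n_logins):
            owner[len(owner)] = p
    return honest, owner

honest, owner = sample_sibyl_world(random.Random(0))
```

Given observed transactions and recommendations among logins, inference must recover how many people exist, which logins share an owner, and which owners are honest.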
40
Example: classical data association
[animation: radar blips appearing over successive time steps; track hypotheses associate blips with unknown numbers of aircraft]
46
#Aircraft(EntryTime = t) ~ NumAircraftPrior();
Exits(a, t) if InFlight(a, t) then ~ Bernoulli(0.1);
InFlight(a, t)
    if t < EntryTime(a) then = false
    elseif t = EntryTime(a) then = true
    else = (InFlight(a, t-1) & !Exits(a, t-1));
State(a, t)
    if t = EntryTime(a) then ~ InitState()
    elseif InFlight(a, t) then ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t)
    if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsPrior();
ApparentPos(r)
    if (Source(r) = null) then ~ FalseAlarmDistrib()
    else ~ ObsCPD(State(Source(r), Time(r)));
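The radar model above is a forward simulation: aircraft enter, move, and sometimes exit; in-flight aircraft generate noisy blips; stations also generate false alarms. The sketch below mirrors that structure with made-up rates and a one-dimensional state, purely for illustration of the model's shape.

```python
import random

def simulate(rng, horizon=10):
    """Toy forward simulation of the radar/blip model.
    Returns blips as (time, apparent_pos, source); source=None for false alarms."""
    blips = []
    n_aircraft = rng.randint(1, 3)           # #Aircraft ~ NumAircraftPrior()
    for a in range(n_aircraft):
        t = rng.randint(0, horizon - 1)      # entry time
        state = rng.uniform(0.0, 100.0)      # State(a, t) ~ InitState()
        while t < horizon:
            if rng.random() < 0.9:           # detection probability (stand-in)
                # ApparentPos(r) ~ ObsCPD(...): true state plus Gaussian noise
                blips.append((t, state + rng.gauss(0.0, 1.0), a))
            if rng.random() < 0.1:           # Exits(a, t) ~ Bernoulli(0.1)
                break
            state += rng.gauss(1.0, 0.5)     # StateTransition
            t += 1
    for t in range(horizon):                 # #Blip(Time = t) ~ false alarms
        if rng.random() < 0.2:
            blips.append((t, rng.uniform(0.0, 100.0), None))
    return blips

blips = simulate(random.Random(0))
```

Data association is the inverse problem: given only the (time, apparent_pos) pairs, infer how many aircraft there are and which blips each one produced.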
47
Inference
Theorem: BLOG inference algorithms (rejection sampling, importance sampling, MCMC) converge to correct posteriors for any well-formed* model, for any first-order query
- Current generic MCMC engine is quite slow
- Applying compiler technology
- Developing user-friendly methods for specifying piecemeal MCMC proposals
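As a concrete (closed-universe) illustration of MCMC over possible worlds, here is a minimal Metropolis-Hastings sampler on the earlier burglary/earthquake example. The CPT numbers are invented for illustration; real BLOG inference works the same way but over far richer, open-universe state spaces.

```python
import random

# World: (burglary, earthquake) booleans; evidence: alarm = True.
P_B, P_E = 0.01, 0.02

def p_alarm(b, e):
    # Made-up alarm CPT for illustration
    return 0.95 if b else (0.3 if e else 0.001)

def unnorm_posterior(w):
    # prior(world) * likelihood(alarm=True | world), unnormalized
    b, e = w
    return (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E) * p_alarm(b, e)

def mh(n_steps, rng):
    w = (False, False)
    count_b = 0
    for _ in range(n_steps):
        # Proposal: flip one coordinate chosen uniformly at random
        i = rng.randrange(2)
        w2 = (not w[0], w[1]) if i == 0 else (w[0], not w[1])
        if rng.random() < min(1.0, unnorm_posterior(w2) / unnorm_posterior(w)):
            w = w2
        count_b += w[0]
    return count_b / n_steps  # estimate of P(Burglary | alarm)

est = mh(50_000, random.Random(0))
```

With these numbers the exact posterior P(Burglary | alarm) is about 0.58, and the chain's estimate converges to it.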
48
CTBT
Bans testing of nuclear weapons on Earth
- Allows for outside inspection of 1000 km²
- 182/195 states have signed; 153/195 have ratified
- Needs 9 more ratifications, including the US and China
- US Senate refused to ratify in 1998: "too hard to monitor"
49
2053 nuclear explosions
51
254 monitoring stations
53
Vertically Integrated Seismic Analysis
The problem is hard:
- ~10,000 "detections" per day, 90% false
- CTBT system (SEL3) finds 69% of significant events, plus about twice as many spurious (nonexistent) events
- 16 human analysts find more events, correct existing ones, throw out spurious events, and generate the LEB ("ground truth")
- Unreliable below magnitude 4 (1 kT)
Solve it by global probabilistic inference:
- NET-VISA finds around 88% of significant events
64
Generative model for IDC arrival data
- Events occur in time and space with magnitude:
  - Natural spatial distribution: a mixture of Fisher-Binghams
  - Man-made spatial distribution: uniform
  - Time distribution: Poisson with given spatial intensity
  - Magnitude distribution: Gutenberg-Richter (exponential)
  - Aftershock distribution (not yet implemented)
- Travel time according to the IASPEI91 model, plus a Laplacian error distribution for each of 14 phases
- Detection depends on magnitude, distance, station*
- Detected azimuth and slowness, plus Laplacian error
- False detections with a station-dependent distribution
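The event side of this model is easy to sketch: event times follow a Poisson process, magnitudes follow Gutenberg-Richter (exponential above a minimum magnitude), and observed arrival times add a Laplacian residual to the predicted travel time. All rates and scales below are hypothetical stand-ins for the calibrated IDC values.

```python
import math
import random

EVENT_RATE = 0.01       # events per second (hypothetical)
TIME_DURATION = 3600.0  # a one-hour window
MIN_MAG = 2.0

def sample_events(rng):
    """Poisson process in time; Gutenberg-Richter magnitudes."""
    events, t = [], 0.0
    while True:
        t += rng.expovariate(EVENT_RATE)   # exponential inter-arrival gaps
        if t > TIME_DURATION:
            break
        # Magnitude ~ Exponential(log 10) + MIN_MAG (Gutenberg-Richter)
        events.append((t, rng.expovariate(math.log(10)) + MIN_MAG))
    return events

def observed_arrival(rng, predicted_travel_time, scale=1.0):
    # Laplace(0, scale) residual, sampled as the difference of two
    # independent exponentials with rate 1/scale
    residual = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return predicted_travel_time + residual

events = sample_events(random.Random(0))
```

In the real model the predicted travel time comes from the IASPEI91 tables per phase; here it is just a number passed in.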
65
# SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];
IsEarthQuake(e) ~ Bernoulli(.999);
EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()
    Else UniformEarthDistribution();
Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;
Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s));
#Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];
#Arrivals(event=e, site) = If IsDetected(e,s) then 1 else 0;
Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION)
    else IASPEI(EventLocation(event(a)), SiteLocation(site(a)), Phase(a)) + TimeRes(a);
TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));
Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)
    else GeoAzimuth(EventLocation(event(a)), SiteLocation(site(a))) + AzRes(a);
AzRes(a) ~ Laplace(0, AZSCALE(site(a)));
Slow(a) ~ If (event(a) = null) then Uniform(0,20)
    else IASPEI-SLOW(EventLocation(event(a)), SiteLocation(site(a))) + SlowRes(site(a));
66
Forward model structure
[diagram, shown repeatedly with different parts highlighted: seismic events -> propagation -> picks at stations 1 and 2, with "detected at station?" nodes and per-station noise sources]
- Seismic event: type, time, location, depth, magnitude, phase
- Propagation: travel time, amplitude decay
- Station picks (if detected): arrival time*, amplitude*, azimuth*, slowness*, phase* (* = measured with error)
- Station noise: false detections
74
Travel-time residual (station 6)
76
Detection probability as a function of distance (station 6, mb 3.5)
P phase S phase
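The detection curves above correspond to the IsDetected statement in the BLOG model: a logistic function of magnitude and distance with per-station, per-phase coefficients. The coefficients below are hypothetical, not station 6's fitted values; they only illustrate the functional form.

```python
import math

def p_detect(magnitude, distance_deg, coeffs=(-1.0, 2.0, -0.05)):
    """Logistic detection probability: rises with magnitude,
    falls with distance. Coefficients are illustrative stand-ins."""
    b0, b_mag, b_dist = coeffs
    z = b0 + b_mag * magnitude + b_dist * distance_deg
    return 1.0 / (1.0 + math.exp(-z))
```

For example, with these made-up coefficients an mb 3.5 event is much more likely to be detected at 10 degrees than at 90 degrees.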
78
Overall Pick Error
79
Overall Azimuth Error
80
Phase confusion matrix
81
Fraction of LEB events missed
82
Fraction of LEB events missed
83
Event distribution: LEB vs SEL3
84
Event distribution: LEB vs NET-VISA
85
Why does NET-VISA work?
- Multiple empirically calibrated seismological models; improving model structure and quality improves the results
- Sound Bayesian combination of evidence: measured arrival times, phase labels, azimuths, etc. are NOT taken literally; absence of detections provides negative evidence
- More detections per event than SEL3 or LEB
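"Absence of detections provides negative evidence" is just Bayes' rule: if a station that usually detects a real event stays silent, the posterior probability of the event drops. A toy calculation with made-up numbers (the detection and false-alarm rates are assumptions, not calibrated values):

```python
# P(event | no detection) via Bayes' rule, with illustrative numbers
p_event = 0.01
p_det_given_event = 0.8     # station usually detects a real event (assumption)
p_det_given_no_event = 0.1  # false-alarm rate (assumption)

p_nodet = ((1 - p_det_given_event) * p_event
           + (1 - p_det_given_no_event) * (1 - p_event))
p_event_given_nodet = (1 - p_det_given_event) * p_event / p_nodet
# The silent station pulls the event probability well below its prior.
```

Combining this effect across many stations is what lets NET-VISA suppress spurious events instead of taking every detection at face value.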
86
Example of using extra detections
87
NEIC event (3.0) missed by LEB
88
NEIC event (3.7) missed by LEB
89
NEIC event (2.6) missed by LEB
90
Why does NET-VISA not work (perfectly)?
- Needs hydroacoustic data for mid-ocean events
- Weaknesses in the model:
  - Travel-time residuals for all phases along a single path are assumed to be uncorrelated
  - Each phase arrival is assumed to generate at most one detection; in fact, multiple detections occur
  - Arrival detectors use high SNR thresholds and look only at the local signal to make hard decisions
91
Detection-based and signal-based monitoring
[diagram: events -> detections -> waveform signals; SEL3 and NET-VISA operate on detections, SIG-VISA on the waveform signals themselves]
92
Summary
- Expressive probability models are very useful
- BLOG provides a generative language for defining first-order, open-universe models
- Inference via MCMC over possible worlds (other methods welcome!)
- The CTBT application is typical of multi-sensor monitoring applications that need vertical integration and involve data association
- Scaling up inference is the next step