I1.2 A Quality-of-Information Theory for Sensor Data Collection and Fusion
Abdelzaher (UIUC)
Research Milestones
Due  Description
Q1   Estimation-theoretic QoI analysis. Formulation of analytic models for quantifying accuracy of prediction/estimation results.
Q2   Extended analysis of semantic links in information networks. Formulation of information network abstractions that are amenable to analysis as new sensors in a data fusion framework.
Q3   Data pool quality metrics and impact of data fusion. Formulation of metrics for data selection when all data cannot be used/sent.
Q4   Validation of QoI theory. Documentation and publications.
This Talk: Towards a QoI Theory for Data Fusion from Sensors + Information network links
Inputs: sensors, reports, and human sources
Signal data fusion
Methods:
• Bayesian analysis
• Maximum likelihood estimation
• etc.
Information Network Analysis
Methods:
• Ranking
• Clustering
• etc.
Trust, Social Networks
Methods:
• Fact-finding
• Influence analysis
• etc.
Machine Learning
Methods:
• Transfer knowledge
• CCM
• etc.
Kinds of fusion covered:
• Fusion of hard sources
• Fusion of soft sources
• Fusion of text and images
• Fusion from human sources
Sensor Fusion Example: Target Classification
[Figure labels: Target; Infrared motion sensor; Vibration sensors; Acoustic sensors]
Different sensors (of known reliability, false alarm rates, etc.) are used to classify targets.
Well-developed theory exists to combine possibly conflicting sensor measurements to accurately estimate target attributes:
• Bayesian analysis
• Maximum likelihood
• Kalman filters
• etc.
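As a minimal sketch of the Bayesian analysis mentioned on this slide, the snippet below fuses binary detections from sensors whose detection and false-alarm rates are known. The function name, the specific rates, the 0.5 prior, and the assumption that sensors are conditionally independent are all illustrative choices, not values from the talk.

```python
from math import prod

def fuse_binary_sensors(prior, readings):
    """Posterior P(target present | readings), assuming independent binary sensors.

    readings: list of (fired, p_detect, p_false_alarm) tuples, where
    p_detect = P(fired | target present) and
    p_false_alarm = P(fired | target absent).
    """
    # Likelihood of the observed pattern if the target is present.
    like_present = prod(pd if fired else 1.0 - pd for fired, pd, pfa in readings)
    # Likelihood of the same pattern if the target is absent.
    like_absent = prod(pfa if fired else 1.0 - pfa for fired, pd, pfa in readings)
    numerator = prior * like_present
    return numerator / (numerator + (1.0 - prior) * like_absent)

# Hypothetical readings: two sensors fire, one stays silent.
readings = [
    (True, 0.9, 0.1),   # infrared motion sensor fired
    (True, 0.8, 0.2),   # vibration sensor fired
    (False, 0.7, 0.3),  # acoustic sensor stayed silent
]
posterior = fuse_binary_sensors(0.5, readings)
print(f"P(target | readings) = {posterior:.3f}")
```

Even though one sensor disagrees, the two more reliable detections dominate the posterior, which is the sense in which conflicting measurements can still be combined into a single estimate.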
Information Network Mining Example: Fact-finding
Example 1: Consider a graph of who published where (but no prior knowledge of these individuals and conferences).
Rank conferences and authors by importance in their field.
[Figure labels: authors Han, Abdelzaher, Roth; venues Sensys, KDD, WWW, Fusion]
Example 2: Consider a graph of who said what (sources and assertions, but no prior knowledge of their credibility).
Rank sources and assertions by credibility.
[Figure labels: sources John, Sally, Mike; assertions Claim1, Claim2, Claim3, Claim4]
The Challenge
How to combine information from sensors and information network links to offer a rigorous quantification of QoI (e.g., correctness probability) with minimal prior knowledge?
[Figure labels: Target; Infrared motion sensor; Vibration sensors; Acoustic sensors + sources John, Sally, Mike; Claim1, Claim2, Claim3, Claim4; P(armed convoy) = ?]
Applications
Understand Civil Unrest
• Remote situation assessment
• Use Twitter feeds, news, cameras, …
Expedite Disaster Recovery
• Damage assessment and first response
• Use sensor feeds, eyewitness reports, …
Reduce Traffic Congestion
• Mapping traffic congestion in the city
• Use crowd-sourcing (of cell-phone GPS measurements), speed sensor readings, eyewitness reports, …
Approach: Back to the Basics
• Interpret the simplest fact-finder as a classical (Bayesian) sensor fusion problem
• Identify the duality between information link analysis and Bayesian sensor fusion (links = sensor readings)
• Use that duality to quantify probability of correctness of fusion (i.e., information link analysis) results
• Incrementally extend analysis to more complex information network models and mining algorithms
An Interdisciplinary Team
• Abdelzaher (QoI, sensor fusion)
• Roth (fact-finders, machine learning)
• Aggarwal, Han (data mining, veracity analysis)
[Figure labels: Fusion Task I1.1; QoI Mining Task I3.1; QoI Task I1.2]
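The Bayesian Interpretation
The Simplest Fact-finder:
[Figure labels: sources John, Sally, Mike; claims Claim1, Claim2, Claim3, Claim4]
$$\mathrm{Rank}(\mathrm{Claim}_j) \propto \sum_{k \in \mathrm{Sources}_j} \mathrm{Rank}(\mathrm{Source}_k)$$
$$\mathrm{Rank}(\mathrm{Source}_i) \propto \sum_{k \in \mathrm{Claims}_i} \mathrm{Rank}(\mathrm{Claim}_k)$$
(each sum is scaled by a normalization factor)
The Simplest Bayesian Classifier (Naïve Bayesian):
$$P(\mathrm{Target}_j \mid \mathrm{Sensors}) = \frac{P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{Z}$$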
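The simplest fact-finder on this slide alternates between scoring claims by the ranks of their sources and scoring sources by the ranks of their claims. A minimal sketch of that iteration follows. The link structure is invented for illustration, and dividing by the maximum rank after each step is an assumed normalization, since the slide's own normalization factor is not recoverable from the transcript.

```python
def simple_fact_finder(claims_of, iterations=20):
    """Iteratively rank sources and claims from a source -> claims mapping."""
    # Invert the mapping so each claim knows which sources assert it.
    sources_of = {}
    for source, claims in claims_of.items():
        for claim in claims:
            sources_of.setdefault(claim, []).append(source)

    source_rank = {s: 1.0 for s in claims_of}  # start with equal trust
    claim_rank = {}
    for _ in range(iterations):
        # A claim's rank is the sum of the ranks of the sources asserting it.
        claim_rank = {c: sum(source_rank[s] for s in srcs)
                      for c, srcs in sources_of.items()}
        top = max(claim_rank.values())
        claim_rank = {c: r / top for c, r in claim_rank.items()}  # assumed normalization

        # A source's rank is the sum of the ranks of the claims it makes.
        source_rank = {s: sum(claim_rank[c] for c in cs)
                       for s, cs in claims_of.items()}
        top = max(source_rank.values())
        source_rank = {s: r / top for s, r in source_rank.items()}  # assumed normalization
    return source_rank, claim_rank

# Hypothetical "who said what" graph.
links = {
    "John": ["Claim1", "Claim2"],
    "Sally": ["Claim1", "Claim2", "Claim3"],
    "Mike": ["Claim4"],
}
source_rank, claim_rank = simple_fact_finder(links)
for name, rank in sorted(source_rank.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rank:.3f}")
```

In this toy graph the claims corroborated by two sources rise to the top, while an isolated source and its lone claim sink, which is the ranking behavior the slide describes. The ranks themselves are not probabilities, which motivates the Bayesian reinterpretation.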
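The Equivalence Condition
Recall the naïve Bayesian classifier:
$$P(\mathrm{Target}_j \mid \mathrm{Sensors}) = \frac{P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{Z}$$
We know that for sufficiently small $x_k$:
$$\prod_k (1 + x_k) \approx 1 + \sum_k x_k$$
Consider individually unreliable sensors:
$$\frac{P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{P(\mathrm{Sensor}_k)} = 1 + x_{jk}, \qquad x_{jk} \ll 1$$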
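The small-$x_k$ approximation can be checked numerically. The sketch below compares the exact product with the linearized sum for one set of small terms and one set of large terms. The specific values are arbitrary and chosen only to show when the approximation holds and when it breaks down.

```python
from math import prod

def product_vs_linear(xs):
    """Return (exact product, linear approximation, relative error)."""
    exact = prod(1.0 + x for x in xs)
    approx = 1.0 + sum(xs)
    return exact, approx, abs(exact - approx) / exact

# Individually unreliable sensors: each x_k is close to zero.
small = product_vs_linear([0.01, -0.02, 0.015, 0.005])
# Informative sensors: the x_k are no longer small.
large = product_vs_linear([0.5, 0.4, 0.6, 0.3])

print("small x_k: exact=%.5f approx=%.5f rel.err=%.5f" % small)
print("large x_k: exact=%.5f approx=%.5f rel.err=%.5f" % large)
```

The error comes from the dropped cross terms $x_k x_l$, so the equivalence is tight only when every individual sensor (or source) carries little information on its own.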
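A Bayesian Fact-finder
Duality:
Claims   ↔  Measured States
Sources  ↔  Sensors
By duality, if:
$$\mathrm{Rank}(\mathrm{Claim}_j) = \sum_{k \in \mathrm{Sources}_j} \mathrm{Rank}(\mathrm{Source}_k)$$
and:
$$\mathrm{Rank}(\mathrm{Source}_i) = \sum_{k \in \mathrm{Claims}_i} \mathrm{Rank}(\mathrm{Claim}_k)$$
Then, Bayes' Theorem eventually leads to:
$$P(\mathrm{Claim}_j \mid \mathrm{network}) = P(\mathrm{Claim}_j)\,(\mathrm{Rank}(\mathrm{Claim}_j) + 1)$$
$$P(\mathrm{Source}_i \mid \mathrm{network}) = P(\mathrm{Source}_i)\,(\mathrm{Rank}(\mathrm{Source}_i) + 1)$$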
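One way to see where the "+1" comes from is to substitute the unreliable-sensor condition into the naïve Bayesian classifier. The sketch below assumes the normalizer factors as $Z = \prod_k P(\mathrm{Sensor}_k)$, which is an assumption made here for illustration and is not stated on the slides, and writes $x_{jk}$ for the small deviation of each likelihood ratio from 1.

```latex
\begin{align*}
P(\mathrm{Target}_j \mid \mathrm{Sensors})
  &= P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j}
     \frac{P(\mathrm{Sensor}_k \mid \mathrm{Target}_j)}{P(\mathrm{Sensor}_k)} \\
  &= P(\mathrm{Target}_j) \prod_{k \in \mathrm{Sensors}_j} (1 + x_{jk}) \\
  &\approx P(\mathrm{Target}_j) \Bigl( 1 + \sum_{k \in \mathrm{Sensors}_j} x_{jk} \Bigr)
\end{align*}
```

Replacing targets with claims and sensors with sources, the last line has the form $P(\mathrm{Claim}_j)\,(\mathrm{Rank}(\mathrm{Claim}_j) + 1)$ once the sum of small per-source terms is read as $\mathrm{Rank}(\mathrm{Claim}_j)$. That reading is presumably the sense of the duality on this slide; the full argument is in the cited Fusion 2011 submission rather than in this transcript.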
Fusion of Sensors and Information Networks
Putting fusion of sensors and information network link analysis on a common analytic foundation:
• Can quantify probability of correctness of results
• Can leverage existing theory to derive accuracy bounds
[Figure labels: Sensor1, Sensor2, Sensor3; Information Network (Source1, Source2, Source3; Claim1, Claim2, Claim3, Claim4); Measurements; Fusion Result]
Simulation-based Evaluation
• Generate thousands of “assertions” (some true, some false; unknown to the fact-finder)
• Generate tens of sources (each source has a different probability of being correct; unknown to the fact-finder)
• Sources make true/false assertions consistently with their probability of correctness
• A link is created between each source and each assertion it makes
• Analyze the resulting network to determine:
  • The set of true and false assertions
  • The probability that a source is correct
No prior knowledge of individual sources and assertions is assumed
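The generation steps above can be sketched as follows. This is a hypothetical generator, not the authors' simulator: the 50/50 split of true and false assertions, the uniform reliability range, the number of assertions per source, and all function names are assumptions chosen for illustration.

```python
import random

def simulate_network(num_sources=20, num_assertions=1000,
                     claims_per_source=100, seed=0):
    """Generate ground truth, source reliabilities, and source-assertion links."""
    rng = random.Random(seed)
    # Ground truth, hidden from the fact-finder (assumed 50% true).
    truth = {a: rng.random() < 0.5 for a in range(num_assertions)}
    true_ids = [a for a, is_true in truth.items() if is_true]
    false_ids = [a for a, is_true in truth.items() if not is_true]
    # Each source has its own probability of being correct (assumed range).
    reliability = {s: rng.uniform(0.5, 0.95) for s in range(num_sources)}

    links = set()
    for source, p_correct in reliability.items():
        for _ in range(claims_per_source):
            # With probability p_correct the source asserts something true.
            pool = true_ids if rng.random() < p_correct else false_ids
            links.add((source, rng.choice(pool)))
    return truth, reliability, links

def empirical_accuracy(truth, links):
    """Fraction of each source's assertions that are actually true."""
    made, correct = {}, {}
    for source, assertion in links:
        made[source] = made.get(source, 0) + 1
        correct[source] = correct.get(source, 0) + truth[assertion]
    return {s: correct[s] / made[s] for s in made}

truth, reliability, links = simulate_network()
accuracy = empirical_accuracy(truth, links)
print(f"{len(links)} links; source 0 reliability {reliability[0]:.2f}, "
      f"observed accuracy {accuracy[0]:.2f}")
```

A fact-finder under evaluation would receive only `links`. The held-back `truth` and `reliability` values are then used to score its assertion labels and its per-source correctness estimates.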
Evaluation Results
Comparison to 4 fact-finders from literature:
• Significantly improved prediction accuracy of source correctness probability (from 20% error to 4% error)
• (Almost) no false positives for larger networks (> 30 sources)
• Below 1% false negatives for larger networks (> 30 sources)
Coming up: The Apollo FactFinder
Apollo Architecture
Apollo: Towards Factfinding in Participatory Sensing, H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, and S. Adali, demo session at IPSN 2011, the 10th International Conference on Information Processing in Sensor Networks, April 2011, Chicago, IL, USA.
Abdelzaher, Adali, Han, Huang, Roth, Szymanski
Apollo: Improves fusion QoI from noisy human and sensor data.
• Demo at IPSN 2011 (in April)
• Collects data from cell-phones
• Interfaced to Twitter
• Can use sensors and human text
• Analysis on several data sets: what really happened?
Apollo Datasets
• Track data from cell-phones in a controlled experiment
• 2 million tweets from the Egypt unrest
• Tweets on the Japan earthquake, tsunami, and nuclear emergency
Immediate Extensions
Non-independent sources
• Sources that have a common bias, sources where one influences another, etc.
• Collaboration opportunities with SCNARC and Trust
Non-independent claims
• Claims that cannot be simultaneously true
• Claims that increase or decrease each other’s probability
Mixture of reliable and unreliable sources
• More reliable sources can help calibrate correctness of less reliable sources
Road Ahead
Develop a unifying QoI-assurance theory for fact-finding/fusion from hard and soft sources
Sources
• Use different media: signals, text, images, …
• Feature different authors: physical sensors, humans
Capabilities
• Computes accurate best estimates of probabilities of correctness
• Computes accurate confidence bounds in results
• Enhances QoI/cost trade-offs in data fusion systems
• Integrates sensor and information network link analysis into a unified analytic framework for QoI assessment
• Accounts for data dependencies, constraints, context, and prior knowledge
• Accounts for the effect of social factors such as trust, influence, and homophily on opinion formation, propagation, and perception (in human sensing)
Impact: Enhanced warfighter ability to assess information
Collaborations
QoI/cost analysis (unified theory for estimation/prediction and information network link analysis)
[Figure labels: QoI Task I1.2; Fusion Task I1.1; QoI Mining Task I3.1; Capacity Task I1.2; Community Modeling S2.2; Sister QoI Task I1.2; Decisions under Stress S3.1]
• (w/Jiawei Han) Consider new link analysis algorithms
• (w/Dan Roth) Account for prior knowledge and constraints
• (w/Boleslaw Szymanski and Sibel Adali) Model humans in the loop
• (w/Ramesh Govindan) Improve communication resource efficiency
• (w/Aylin Yener) Increase OICC
Collaborations
Collaborative (multi-institution):
• Q2 (UIUC+IBM): Tarek Abdelzaher, Dong Wang, Hossein Ahmadi, Jeff Pasternack, Dan Roth, Omid Fetemieh, Hieu Le, and Charu Aggarwal, “On Bayesian Interpretation of Fact-finding in Information Networks,” submitted to Fusion 2011
Collaborative (inter-center):
• Q2 (I+SC): H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth, B. Szymanski, S. Adali, “Apollo: Towards Factfinding in Participatory Sensing,” IPSN Demo, April 2011
• Q2 (I+SC): Mani Srivastava, Tarek Abdelzaher, Boleslaw Szymanski, “Human-centric Sensing,” Philosophical Transactions of the Royal Society, special issue on Wireless Sensor Networks, expected in 2011 (invited)
Invited Session on QoI at Fusion 2011 (co-chaired with Ramesh Govindan, CNARC)
Military Relevance
Enhanced warfighter decision-making ability based on better quality assessment of fusion outputs
A unified QoI assurance theory for fusion systems that utilize both sensors and information networks:
• Offers a quantitative understanding of the benefits of exploiting information network links in data fusion
• Enhances result accuracy and provides confidence bounds in result correctness