Data Integration
Aggregate Query Answering under Uncertain Schema Mappings
Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, VS Subrahmanian
Presented By Stephen Lynn
Overview
Aggregate Queries
Probabilistic Schema Mapping
Goals/Objectives
Aggregate Processing (3 proposals)
By-Table Algorithm
By-Tuple Algorithm
Evaluation
Analysis
Aggregate Queries
COUNT, MIN, MAX, SUM, AVG
ID Price Quantity
1 2.30 2
2 3.20 4
3 7.34 1
4 8.29 20
5 3.32 3
Simple PTIME algorithms to compute
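As a minimal sketch of why these aggregates are cheap on an ordinary (certain) table, the five rows above can be aggregated in a single linear pass each. Applying the aggregates to the Price column is an assumption made for illustration; the slide does not say which column is queried.

```python
# Rows of the example table: (ID, Price, Quantity)
rows = [
    (1, 2.30, 2),
    (2, 3.20, 4),
    (3, 7.34, 1),
    (4, 8.29, 20),
    (5, 3.32, 3),
]

# Assumption for illustration: aggregate over the Price column.
prices = [price for _, price, _ in rows]

count = len(rows)              # COUNT
min_price = min(prices)        # MIN
max_price = max(prices)        # MAX
sum_price = sum(prices)        # SUM
avg_price = sum_price / count  # AVG

print(count, min_price, max_price, round(sum_price, 2), round(avg_price, 2))
```

Each aggregate touches every row once, so the cost is linear in the number of tuples.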
Probabilistic Schema Mappings
By-Table vs By-Tuple
By-Tuple – consider all possible mappings for each tuple
By-Table – single mapping for entire table
P(date→postedDate) = 0.7
P(date→reducedDate) = 0.3
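The difference between the two semantics can be sketched by enumerating the possible worlds each one induces. This is an illustration only: the function names are hypothetical, and the by-tuple version assumes each tuple picks its mapping independently, so a world's probability is the product of the per-tuple mapping probabilities.

```python
from itertools import product

# Mapping probabilities from the slide: source attribute `date`
# maps to one of two target attributes.
mappings = {"postedDate": 0.7, "reducedDate": 0.3}

def by_table_worlds(n_tuples):
    """One mapping is chosen for the whole table: m possible worlds."""
    return [((target,) * n_tuples, prob) for target, prob in mappings.items()]

def by_tuple_worlds(n_tuples):
    """Each tuple gets its own mapping: m ** n possible worlds.

    Assumption: tuples choose mappings independently.
    """
    worlds = []
    for choice in product(mappings, repeat=n_tuples):
        prob = 1.0
        for target in choice:
            prob *= mappings[target]
        worlds.append((choice, prob))
    return worlds

print(len(by_table_worlds(3)), len(by_tuple_worlds(3)))
```

With m mappings and n tuples, by-table yields m worlds while by-tuple yields m ** n, which is why naive enumeration is not a practical way to answer by-tuple queries.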
Goals/Objectives
Impact Analysis of Probabilistic Schemas on Aggregate Queries
Aggregate Query Algorithms
Time Complexity Analysis
Evaluation
Aggregation Methods
Range
Distribution
Expected Value
Method Relationships
Distribution – most time consuming, most information
Range – computed directly from distribution
Expected Value – computed directly from distribution
More efficient ways to compute
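The relationship between the three answer types can be sketched as follows: once a distribution over aggregate values is known, the range and the expected value fall out of it directly. The distribution used here is hypothetical, and the function names are illustrative only.

```python
def range_from_distribution(dist):
    """Smallest and largest aggregate values with non-zero probability."""
    support = [value for value, prob in dist.items() if prob > 0]
    return min(support), max(support)

def expected_from_distribution(dist):
    """Probability-weighted average of the aggregate values."""
    return sum(value * prob for value, prob in dist.items())

# Hypothetical distribution: the aggregate is 3 with probability 0.7
# and 5 with probability 0.3.
dist = {3: 0.7, 5: 0.3}

print(range_from_distribution(dist))
print(round(expected_from_distribution(dist), 2))
```

Going through the distribution is the most expensive route, which is why computing the range or expected value by a more direct method is attractive when only those answers are needed.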
By-Table Algorithm
All PTIME computable
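A minimal sketch of the by-table idea, under stated assumptions: the query is evaluated once per candidate mapping with an ordinary aggregate, and each answer carries that mapping's probability. The rows, the cutoff, and the COUNT condition are hypothetical; only the two mapping probabilities come from the slides.

```python
from collections import defaultdict

# Mapping probabilities from the slides.
mappings = {"postedDate": 0.7, "reducedDate": 0.3}

# Hypothetical source rows (dates encoded as day numbers).
rows = [
    {"postedDate": 3, "reducedDate": 10},
    {"postedDate": 5, "reducedDate": 6},
    {"postedDate": 9, "reducedDate": 12},
]

def count_before(rows, attribute, cutoff):
    """Ordinary COUNT with the query's `date` rewritten to `attribute`."""
    return sum(1 for row in rows if row[attribute] < cutoff)

def by_table_distribution(rows, mappings, cutoff):
    """Run the aggregate once per mapping; merge equal answers."""
    dist = defaultdict(float)
    for attribute, prob in mappings.items():
        dist[count_before(rows, attribute, cutoff)] += prob
    return dict(dist)

dist = by_table_distribution(rows, mappings, cutoff=8)
print(dist)
```

With m mappings this costs m runs of a polynomial-time aggregate, which is consistent with the slide's claim that the by-table answers are all PTIME computable.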
By-Tuple Algorithm (COUNT)
O(n * m)
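One way to reach an O(n * m) cost for by-tuple COUNT without enumerating all mapping sequences is to look at each of the n tuples under each of the m mappings exactly once. The sketch below computes the range and the expected value this way; it is an illustration under assumptions (hypothetical rows, cutoff, and function name), not necessarily the algorithm the slide refers to.

```python
# Mapping probabilities from the slides.
mappings = {"postedDate": 0.7, "reducedDate": 0.3}

# Hypothetical source rows (dates encoded as day numbers).
rows = [
    {"postedDate": 3, "reducedDate": 10},
    {"postedDate": 5, "reducedDate": 6},
    {"postedDate": 9, "reducedDate": 12},
]

def by_tuple_count_summary(rows, mappings, cutoff):
    """Range and expected value of COUNT(date < cutoff), in O(n * m)."""
    low = high = 0
    expected = 0.0
    for row in rows:                                   # n tuples
        hits = [a for a in mappings if row[a] < cutoff]  # m mappings
        if len(hits) == len(mappings):
            low += 1   # counted no matter which mapping this tuple gets
        if hits:
            high += 1  # counted under at least one mapping
        # Probability that this tuple is counted; summing these gives
        # the expected count by linearity of expectation.
        expected += sum(mappings[a] for a in hits)
    return (low, high), expected

count_range, expected_count = by_tuple_count_summary(rows, mappings, cutoff=8)
print(count_range, round(expected_count, 2))
```

The lower bound counts tuples that qualify under every mapping, and the upper bound counts tuples that qualify under at least one, so neither needs the m ** n mapping sequences to be materialized.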
Example By-Tuple (COUNT)
Time Complexity
Evaluation
Empirical Evaluation
Real-world dataset (eBay)
Synthetic dataset
Evaluate Time Complexity
Vary tuple numbers
Vary attribute mappings
Evaluation Results
Analysis
Strengths
Effect of probabilistic schemas on aggregates
Nice PTIME algorithms
Weaknesses
Evaluation was obvious
By-Table results biased by database optimizations
Future Work
Improve algorithms
Extend to sub-queries
Heuristics