MindReader: Querying databases through multiple examples
DESCRIPTION
MindReader: Querying databases through multiple examples. Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan), Ravishankar Subramanya (Pittsburgh Supercomputing Center), Christos Faloutsos (Carnegie Mellon University).
![Page 1: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/1.jpg)
MindReader: Querying databases through multiple examples
Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan)
Ravishankar Subramanya (Pittsburgh Supercomputing Center)
Christos Faloutsos (Carnegie Mellon University)
![Page 2: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/2.jpg)
Outline
• Background & Introduction
  • Query by Example
  • Our Approach
  • Relevance Feedback
  • What’s New in MindReader?
• Proposed Method
  • Problem Formulation
  • Theorems
• Experimental Results
• Discussion & Conclusion
![Page 9: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/9.jpg)
Query-by-Example: an example
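Searching for “mildly overweight” patients
[Figure: scatter plot of patients with axes Height and Weight; markers distinguish “very good” and “good” examples, and q marks the implied query]
• The doctor selects examples by browsing the patient database
• The examples have “oblique” correlation
• We can “guess” the implied query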
![Page 11: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/11.jpg)
Query-by-Example: the question
Assume that
• the user gives multiple examples
• the user optionally assigns scores to the examples
• the samples have spatial correlation
How can we “guess” the implied query?
![Page 13: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/13.jpg)
Our Approach
Automatically derive a distance measure from the given examples
Two important notions:
1. diagonal query: isosurfaces of queries have ellipsoid shapes
2. multiple-level scores: the user can specify “goodness scores” on samples
![Page 14: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/14.jpg)
Isosurfaces of Distance Functions
[Figure: isosurfaces around a query point q for three distance functions: Euclidean, weighted Euclidean, and generalized ellipsoid distance]
![Page 15: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/15.jpg)
Distance Function Formulas
Euclidean
$D(x, q) = (x - q)^2$
Weighted Euclidean
$D(x, q) = \sum_i m_i (x_i - q_i)^2$
Generalized ellipsoid distance
$D(x, q) = (x - q)^T M (x - q)$
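The three formulas on this slide can be compared side by side with a small sketch. The function names and the sample matrices below are illustrative assumptions, not part of the original deck. The point is that the generalized ellipsoid distance reduces to the weighted Euclidean distance when M is diagonal, and to the Euclidean distance when M is the identity.

```python
def euclidean(x, q):
    """Squared Euclidean distance: sum_i (x_i - q_i)^2."""
    return sum((xi - qi) ** 2 for xi, qi in zip(x, q))

def weighted_euclidean(x, q, m):
    """Weighted Euclidean distance: sum_i m_i (x_i - q_i)^2."""
    return sum(mi * (xi - qi) ** 2 for mi, xi, qi in zip(m, x, q))

def ellipsoid(x, q, M):
    """Generalized ellipsoid distance: (x - q)^T M (x - q)."""
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(M[j][k] * d[j] * d[k] for j in range(n) for k in range(n))

# Hypothetical query point, data point, and matrices for illustration.
q = [0.0, 0.0]
x = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
diagonal = [[2.0, 0.0], [0.0, 0.5]]
oblique = [[2.0, -1.0], [-1.0, 2.0]]  # off-diagonal terms tilt the ellipse

print(euclidean(x, q), ellipsoid(x, q, identity))
print(weighted_euclidean(x, q, [2.0, 0.5]), ellipsoid(x, q, diagonal))
print(ellipsoid(x, q, oblique))
```

Only the oblique matrix uses the cross term between the two features, which is what lets the isosurface follow a correlation such as the height and weight example.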
![Page 17: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/17.jpg)
Relevance Feedback
• Popular method in IR
• Query is modified based on relevance judgments from the user
• Two major approaches
1. query-point movement
2. re-weighting
![Page 21: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/21.jpg)
Relevance Feedback: Query-point Movement
Query point is moved towards “good” examples (Rocchio’s formula in IR)
[Figure: Q0 is the query point; markers show the retrieved data and the relevance judgments; Q1 is the new query point]
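As a rough illustration of query-point movement, the sketch below uses the commonly cited form of Rocchio's formula, in which the new query is a weighted combination of the old query, the centroid of the relevant items, and the centroid of the non-relevant items. The slide does not spell the formula out, so the function name, the parameter names `alpha`, `beta`, `gamma`, and their default values are assumptions made for this example.

```python
def rocchio(q0, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move q0 towards the relevant centroid and away from the non-relevant one."""
    n = len(q0)

    def centroid(points):
        # An empty set contributes nothing to the update.
        if not points:
            return [0.0] * n
        return [sum(p[j] for p in points) / len(points) for j in range(n)]

    r = centroid(relevant)
    s = centroid(non_relevant)
    return [alpha * q0[j] + beta * r[j] - gamma * s[j] for j in range(n)]

# Hypothetical data: two items judged relevant, none judged non-relevant.
q1 = rocchio([1.0, 1.0], [[2.0, 2.0], [4.0, 2.0]], [], alpha=0.5, beta=0.5, gamma=0.0)
print(q1)
```

With these illustrative weights the new query lands halfway between Q0 and the centroid of the "good" examples.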
![Page 30: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/30.jpg)
Relevance Feedback: Re-weighting
Standard Deviation Method in MARS (UIUC) image retrieval system
Assumption: if the deviation is high, the feature is not important
For each feature, weight $w_j = 1/\sigma_j$ is assigned
MARS didn’t provide any justification for this formula
[Figure: samples plotted against features f1 and f2, one labeled the “bad” feature and the other the “good” feature, with the implied query marked]
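A minimal sketch of the re-weighting idea follows: each feature gets a weight equal to the reciprocal of its standard deviation over the user's examples. The function name and the sample data are hypothetical, and the population standard deviation is an assumption, since the slide does not say which estimator is used.

```python
import math

def std_dev_weights(samples):
    """Return one weight per feature, w_j = 1 / sigma_j."""
    count = len(samples)
    n = len(samples[0])
    weights = []
    for j in range(n):
        column = [s[j] for s in samples]
        mean = sum(column) / count
        variance = sum((c - mean) ** 2 for c in column) / count
        weights.append(1.0 / math.sqrt(variance))  # assumes variance > 0
    return weights

# Hypothetical examples: f1 is widely spread ("bad"), f2 is tightly grouped ("good").
samples = [[0.0, 1.0], [4.0, 1.5], [8.0, 0.5], [12.0, 1.0]]
weights = std_dev_weights(samples)
print(weights)
```

The tightly grouped feature receives the larger weight, so differences along it count for more in the weighted Euclidean distance.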
![Page 32: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/32.jpg)
What’s New in MindReader?
MindReader
• does not use ad-hoc heuristics (cf. Rocchio’s expression, re-weighting in MARS)
• can handle multiple levels of scores
• can derive generalized ellipsoid distance
![Page 33: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/33.jpg)
What’s New in MindReader?
MindReader can derive generalized ellipsoid distances
[Figure: ellipsoid isosurface around the query point q]
![Page 37: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/37.jpg)
Isosurfaces of Distance Functions
• Euclidean: Rocchio
• weighted Euclidean: MARS
• generalized ellipsoid distance: MindReader
[Figure: isosurface around the query point q for each of the three distance functions]
![Page 39: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/39.jpg)
Method: distance function
Generalized ellipsoid distance function
$D(x, q) = (x - q)^T M (x - q)$, or
$D(x, q) = \sum_j \sum_k m_{jk} (x_j - q_j)(x_k - q_k)$
q: query point vector
x: data point vector
$M = [m_{jk}]$: symmetric distance matrix
![Page 40: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/40.jpg)
Method: definitions
N: no. of samples
n: no. of dimensions (features)
$x_i$: n-d sample data vectors, $x_i = [x_{i1}, \ldots, x_{in}]^T$
X: N×n sample data matrix, $X = [x_1, \ldots, x_N]^T$
v: N-d score vector, $v = [v_1, \ldots, v_N]$
![Page 41: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/41.jpg)
Method: problem formulation
Given
• N sample n-d vectors
• multiple-level scores (optional)
Estimate
• optimal distance matrix M
• optimal new query point q
![Page 42: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/42.jpg)
Method: optimality
How do we measure “optimality”?
• minimization of “penalty”
What is the “penalty”?
• weighted sum of distances between the query point and the sample vectors
Therefore, minimize $\sum_i v_i (x_i - q)^T M (x_i - q)$ under the constraint $\det(M) = 1$
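The penalty can be written as a short function, which also shows why a constraint on M is needed: without it, shrinking M towards zero would drive the penalty to zero for any data. The sketch below assumes the scores v_i act as the weights of the sum; the function name and the example values are illustrative only.

```python
def penalty(X, v, q, M):
    """Score-weighted sum of ellipsoid distances from q to each sample."""
    n = len(q)
    total = 0.0
    for x_i, v_i in zip(X, v):
        d = [x_i[j] - q[j] for j in range(n)]
        total += v_i * sum(M[j][k] * d[j] * d[k] for j in range(n) for k in range(n))
    return total

# Hypothetical samples lying along the diagonal, with equal scores.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
v = [1.0, 1.0, 1.0]
q = [2.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]          # det = 1
oblique = [[1.25, -0.75], [-0.75, 1.25]]     # det = 1, elongated along the diagonal

print(penalty(X, v, q, identity))
print(penalty(X, v, q, oblique))
```

Both matrices satisfy det(M) = 1, but the oblique one matches the direction of the samples and yields the smaller penalty.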
![Page 44: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/44.jpg)
Theorems: theorem 1
Solved with Lagrange multipliers
Theorem 1: optimal query point
$q = \bar{x} = [\bar{x}_1, \ldots, \bar{x}_n]^T = X^T v / \sum_i v_i$
The optimal query point is the weighted average of the sample data vectors
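Theorem 1 translates directly into a score-weighted average. The sketch below is a plain-Python reading of the formula with a hypothetical function name and made-up samples.

```python
def optimal_query_point(X, v):
    """Return X^T v / sum(v): the score-weighted mean of the sample vectors."""
    total = sum(v)
    n = len(X[0])
    return [sum(v_i * x_i[j] for x_i, v_i in zip(X, v)) / total for j in range(n)]

# Hypothetical samples; the third one has twice the score of the others.
X = [[0.0, 0.0], [2.0, 0.0], [2.0, 4.0]]
v = [1.0, 1.0, 2.0]
q_opt = optimal_query_point(X, v)
print(q_opt)
```

The higher-scored sample pulls the query point towards itself, which is the sense in which the scores express "goodness".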
![Page 45: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/45.jpg)
Theorems: theorem 2 & 3
Theorem 2: optimal distance matrix
$M = (\det(C))^{1/n} C^{-1}$
$C = [c_{jk}]$ is the weighted covariance matrix
$c_{jk} = \sum_i v_i (x_{ik} - \bar{x}_k)(x_{ij} - \bar{x}_j)$
Theorem 3: if we restrict M to a diagonal matrix, our method is equal to the standard deviation method
MindReader includes MARS!
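The following sketch works through Theorems 2 and 3 for the two-dimensional case, where the matrix inverse can be written out by hand. All function names and sample values are assumptions for illustration. The diagonal variant is derived here by applying the same Lagrange argument to a diagonal M, which gives entries inversely proportional to the weighted variance of each feature; treat it as one reading of Theorem 3 rather than the authors' exact statement.

```python
def weighted_covariance(X, v, center):
    """c_jk = sum_i v_i (x_ij - center_j)(x_ik - center_k)."""
    n = len(center)
    return [[sum(v_i * (x[j] - center[j]) * (x[k] - center[k]) for x, v_i in zip(X, v))
             for k in range(n)] for j in range(n)]

def optimal_matrix_2d(C):
    """M = det(C)^(1/2) * C^-1 for a 2x2 matrix C (n = 2)."""
    (a, b), (c, d) = C
    det = a * d - b * c  # assumes det > 0
    scale = det ** 0.5
    return [[scale * d / det, -scale * b / det],
            [-scale * c / det, scale * a / det]]

def optimal_diagonal(C):
    """Diagonal case: m_jj proportional to 1 / c_jj, scaled so the product is 1."""
    n = len(C)
    product = 1.0
    for j in range(n):
        product *= C[j][j]
    scale = product ** (1.0 / n)
    return [scale / C[j][j] for j in range(n)]

# Hypothetical samples with equal scores; their weighted mean is the origin.
X = [[-2.0, -1.0], [2.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
v = [1.0, 1.0, 1.0, 1.0]
C = weighted_covariance(X, v, [0.0, 0.0])
M = optimal_matrix_2d(C)
m_diag = optimal_diagonal(C)
print(C, M, m_diag)
```

The full matrix has non-zero off-diagonal entries because the samples are correlated, while the diagonal version can only rescale each axis.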
![Page 47: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/47.jpg)
Experiments
1. Estimation of optimal distance function
  • Can MindReader estimate the target distance matrix Mhidden appropriately?
  • Based on synthetic data
  • Comparison with standard deviation method
2. Query-point movement
3. Application to real data sets
  • GIS data
![Page 48: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/48.jpg)
Experiment 1: target data
Two-dimensional normal distribution
![Page 49: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/49.jpg)
Experiment 1: idea
Assume that the user has “hidden” distance Mhidden in his mind
Simulate iterative query refinement
Q: How fast can we discover “hidden” distance?
Query point is fixed to (0, 0)
![Page 50: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/50.jpg)
Experiment 1: iteration steps
1. Make initial samples: compute k-NNs with Euclidean distance
2. For each object x, calculate its score that reflects the hidden distance Mhidden
3. MindReader estimates the matrix M
4. Retrieve k-NNs with the derived matrix M
5. If the result is improved, go to step 2
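The five steps can be sketched as a small simulation in two dimensions. Everything below is an illustrative assumption rather than the authors' setup: the data set size, k, the hidden matrix, the use of the fixed query point as the center of the covariance, the cumulative hidden distance as the "improved" test, and in particular the positive stand-in score in step 2, which is simpler than the score translation used in the experiment.

```python
import math
import random

def dist(x, q, M):
    """Generalized ellipsoid distance in 2-D for a symmetric matrix M."""
    d0, d1 = x[0] - q[0], x[1] - q[1]
    return M[0][0] * d0 * d0 + 2.0 * M[0][1] * d0 * d1 + M[1][1] * d1 * d1

def knn(data, q, M, k):
    """Return the k points closest to q under the distance defined by M."""
    return sorted(data, key=lambda x: dist(x, q, M))[:k]

def cd_k(points, q, M_hidden):
    """Cumulative hidden distance of a retrieved set (smaller is better)."""
    return sum(dist(x, q, M_hidden) for x in points)

def estimate_matrix(samples, scores, q):
    """M = sqrt(det C) * C^-1 for the 2x2 score-weighted scatter around q."""
    a = sum(v * (x[0] - q[0]) ** 2 for x, v in zip(samples, scores))
    b = sum(v * (x[0] - q[0]) * (x[1] - q[1]) for x, v in zip(samples, scores))
    d = sum(v * (x[1] - q[1]) ** 2 for x, v in zip(samples, scores))
    root = math.sqrt(a * d - b * b)  # assumes samples are not collinear with q
    return [[d / root, -b / root], [-b / root, a / root]]

def refine(data, q, M_hidden, k=20, max_iter=20):
    M = [[1.0, 0.0], [0.0, 1.0]]                   # step 1: Euclidean start
    result = knn(data, q, M, k)
    history = [cd_k(result, q, M_hidden)]
    for _ in range(max_iter):
        # step 2: simulated user scores (stand-in formula, always positive)
        scores = [math.exp(-dist(x, q, M_hidden) / 2.0) for x in result]
        candidate_M = estimate_matrix(result, scores, q)   # step 3
        candidate = knn(data, q, candidate_M, k)           # step 4
        value = cd_k(candidate, q, M_hidden)
        if value >= history[-1]:                           # step 5: no improvement
            break
        M, result = candidate_M, candidate
        history.append(value)
    return M, history

random.seed(0)
data = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(1000)]
q = (0.0, 0.0)
M_hidden = [[1.25, -0.75], [-0.75, 1.25]]
M, history = refine(data, q, M_hidden)
print(history)
```

The printed history lists the cumulative hidden distance after each accepted iteration; the loop stops as soon as a new matrix fails to improve it.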
![Page 51: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/51.jpg)
Experiment 1: scores
Calculation of scores in terms of “hidden” distance function
1. Calculate distance from the query point q based on hidden distance matrix Mhidden
$d = D(x, q)$
2. Translate distance value d to score
$s = \exp(-d^2/2)$, $v = \log s / (1 - s)$
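One way to read the score translation is sketched below. It assumes the second formula means the log-odds of s, that is log(s / (1 - s)); the flattened slide text does not make the grouping explicit, so this reading is an assumption. The clamping constant `eps` is also an addition for this sketch, there only to keep the logarithm and the division finite at d = 0 and for very large d.

```python
import math

def score_from_distance(d, eps=1e-12):
    """Map a hidden distance d to a score via s = exp(-d^2 / 2), v = log(s / (1 - s))."""
    s = math.exp(-d * d / 2.0)
    s = min(max(s, eps), 1.0 - eps)  # assumption: clamp to avoid log(0) and 1/0
    return math.log(s / (1.0 - s))

for d in (0.5, 1.0, 2.0):
    print(d, score_from_distance(d))
```

Under this reading, closer objects receive higher scores, and the score changes sign where s passes 0.5.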
![Page 52: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/52.jpg)
Experiment 1: evaluation measures
Used to check whether the query result is improved or not
CD-k measure
• CD stands for “cumulative distance”
• for the k-NNs retrieved by matrix M, compute the actual distance by matrix Mhidden, then take the summation
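The CD-k measure can be sketched in a few lines: retrieve the k nearest neighbors with the estimated matrix, then sum their distances under the hidden matrix. The function names and the tiny data set are hypothetical.

```python
def ellipsoid(x, q, M):
    """Generalized ellipsoid distance (x - q)^T M (x - q)."""
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(M[j][k] * d[j] * d[k] for j in range(n) for k in range(n))

def cd_k(data, q, M, M_hidden, k):
    """Sum of hidden distances over the k-NNs retrieved with matrix M."""
    retrieved = sorted(data, key=lambda x: ellipsoid(x, q, M))[:k]
    return sum(ellipsoid(x, q, M_hidden) for x in retrieved)

# Hypothetical data and hidden matrix for illustration.
data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0], [2.0, 2.0], [0.0, 3.0]]
q = [0.0, 0.0]
M_hidden = [[1.25, -0.75], [-0.75, 1.25]]
euclid = [[1.0, 0.0], [0.0, 1.0]]

print(cd_k(data, q, euclid, M_hidden, 2))    # retrieval with Euclidean distance
print(cd_k(data, q, M_hidden, M_hidden, 2))  # best value possible for this data
```

Retrieving with the hidden matrix itself gives the lower bound that any estimated matrix is compared against.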
![Page 53: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/53.jpg)
Experiment 1: final k-NNs
Ellipse: isosurface for Mhidden
Red points: final k-NNs obtained by standard deviation method
Green points: final k-NNs obtained by MindReader
![Page 54: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/54.jpg)
Experiment 1: speed of convergence
x-axis: no. of iterations
y-axis: CD-k measure value
Red: standard deviation method
Green: MindReader
Blue: best CD-k value possible for the data set
![Page 55: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/55.jpg)
Experiment 1: changes of isosurfaces
After 0th and 2nd iterations
![Page 56: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/56.jpg)
Experiment 1: changes of isosurfaces
After 4th and 8th iterations
![Page 57: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/57.jpg)
Experiment 2: query-point movement
Starts from query point (0.5, 0.5)
MindReader converges to Mhidden in five iterations
![Page 58: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/58.jpg)
Experiment 3: real data set
End-points of road segments from Montgomery County, MD
Data is normalized to [-1, 1] × [-1, 1]
The query specifies five points along route I-270
Can we estimate a good distance function?
![Page 59: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/59.jpg)
Experiment 3: isosurfaces
After 0th and 2nd iterations: fast convergence!
![Page 60: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/60.jpg)
Discussion: efficiency
Don’t worry about speed!
• ellipsoid query support using spatial access methods:
  • Seidl & Kriegel [VLDB97]
  • Ankerst, Braunmüller, Kriegel, Seidl [VLDB98]
• for the derived distance, we can efficiently use a spatial index
![Page 61: MindReader: Querying databases through multiple examples](https://reader035.vdocument.in/reader035/viewer/2022062217/568135e4550346895d9d5778/html5/thumbnails/61.jpg)
Conclusion
MindReader automatically guesses diagonal queries from the given examples
• multiple levels of scores
• includes “Rocchio” and “MARS” (standard deviation method)
• problem formulation & solution
• evaluation based on the experiments