
Multi Feature Indexing Network MUFIN

Similarity Search Platform for many Applications

Pavel Zezula

Faculty of Informatics

Masaryk University, Brno

23.1.2012 1 MUFIN: Multi Feature Indexing Network


Outline of the talk

• Why similarity

• Principles of metric similarity searching

• The MUFIN approach

• Demo applications

• Future directions


Real-Life Motivation: The social psychology view

• Any event in the history of an organism is, in a sense, unique.

• Recognition, learning, and judgment presuppose an ability to categorize stimuli and classify situations by similarity.

• Similarity (proximity, resemblance, communality, representativeness, psychological distance, etc.) is fundamental to theories of perception, learning, judgment, etc.


Contemporary Networked Media: The digital data view

• Almost everything that we see, read, hear, write, measure, or observe can be digital.

• Users autonomously contribute to production of global media and the growth is exponential.

• Sites like Flickr, YouTube, and Facebook host user-contributed content for a variety of events.

• The elements of networked media are related by numerous multi-facet links of similarity.


Examples with Similarity

• Does the computer disk of a suspected criminal contain illegal multimedia material?

• What are the stocks with similar price histories?

• Which companies advertise their logos in the live TV broadcast of a football match?

• Is the situation on the web getting close to any of the network attacks which resulted in significant damage in the past?


Challenge

• Networked media is getting close to the human “fact-bases” – the gap between physical and digital has blurred

• Similarity data management is needed to connect, search, filter, merge, relate, rank, cluster, classify, identify, or categorize objects across various collections.

WHY? Because it is similarity that reveals the world.


Limitations: Data Types

We have

• Attributes

– Numbers, strings, etc.

• Text (text-based)

– Documents, annotations

We need

• Multimedia

– Image, video, audio

• Security

– Biometrics

• Medicine

– EKG, EEG, EMG, EMR, CT, etc.

• Scientific data

– Biology, chemistry, physics, life sciences, economics

• Others

– Motion, emotion, events, etc.


Limitations: Models of Similarity

We have

• Simple geometric models, typically vector spaces

We need

• More complex models

• Non metric models

• Asymmetric similarity

• Subjective similarity

• Context aware similarity

• Complex similarity

• Etc.


Limitations: Queries

We have

• Simple queries

– Nearest neighbor

– Range

We need

• More query types

– Reverse NN, distinct NN, similarity join

• Other similarity-based operations

– Filtering, classification, event detection, clustering, etc.

• Similarity algebra

– May become the basis of a “Similarity Data Management System”


Limitations: Implementation Strategies

We have

• Centralized or parallel processing

We need

• Scalable and distributed architectures

• MapReduce-like approaches

• P2P architectures

• Cloud computing

• Self-organized architectures

• Etc.


Search Strategy Evolution

Scalability

• data volume – exponential

• number of users (queries)

• variety of data types

• multi-lingual, multi-feature, multi-modal queries

Determinism

• exact match ► similarity

• precise ► approximate

• same answer ► good answer; recommendation

• fixed query ► personalized; context aware

• fixed infrastructure ► dynamic mapping; mobile devices

[Figure: chart with a grade axis (low to high), architecture labels (centralized, parallel, distributed, peer-to-peer, self-organized), and a scale from well established to cutting-edge research]


Similarity Data Management System

[Figure: Similarity Data Management System; labels: similarity, effectiveness, efficiency, stimuli, algebra]


Metric Search Grows in Popularity

• Hanan Samet: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006

• P. Zezula, G. Amato, V. Dohnal, and M. Batko: Similarity Search: The Metric Space Approach. Springer, 2006


The MUFIN Approach

MUFIN: MUlti-Feature Indexing Network

[Figure: the MUFIN search infrastructure and its three properties]

• Scalability: P2P structure

• Extensibility: metric space

• Independence: infrastructure as a service


Extensibility: Metric Abstraction of Similarity

• Metric space: M = (D,d)

– D – domain

– d(x,y) – distance function

• For all x,y,z ∈ D:

– d(x,y) ≥ 0 (non-negativity)

– d(x,y) = 0 ⇔ x = y (identity)

– d(x,y) = d(y,x) (symmetry)

– d(x,y) ≤ d(x,z) + d(z,y) (triangle inequality)
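
The four postulates can be tested empirically on a sample of objects. The following is a minimal sketch, not part of MUFIN or MESSIF: the function name `check_metric_postulates`, the tolerance `tol`, and the sample points are all assumptions made for illustration. A passing check on a sample does not prove that a function is a metric, but a failure shows that it is not.

```python
from itertools import product


def check_metric_postulates(objects, d, tol=1e-9):
    """Return the postulate violations found on a finite sample of objects."""
    violations = []
    for x, y in product(objects, repeat=2):
        if d(x, y) < -tol:
            violations.append(("non-negativity", x, y))
        # d(x, y) must be zero exactly when x and y are the same object.
        if (abs(d(x, y)) <= tol) != (x == y):
            violations.append(("identity", x, y))
        if abs(d(x, y) - d(y, x)) > tol:
            violations.append(("symmetry", x, y))
    for x, y, z in product(objects, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            violations.append(("triangle inequality", x, y, z))
    return violations


def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def squared_euclidean(x, y):
    # Not a metric: it breaks the triangle inequality.
    return sum((a - b) ** 2 for a, b in zip(x, y))


sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
print(check_metric_postulates(sample, euclidean))
print(check_metric_postulates(sample, squared_euclidean)[:1])
```

The squared Euclidean distance is included only as a counterexample: for the collinear points (0,0), (1,0), (2,0) it gives 4 > 1 + 1.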


Examples of Distance Functions

• Lp Minkowski distance (for vectors)

– L1 – city-block distance: $L_1(x,y) = \sum_{i=1}^{n} |x_i - y_i|$

– L2 – Euclidean distance: $L_2(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

– L∞ – infinity (maximum) distance: $L_\infty(x,y) = \max_{i=1}^{n} |x_i - y_i|$

• Edit distance (for strings)

– minimal number of insertions, deletions and substitutions

– d(‘application’, ‘applet’) = 6

• Jaccard’s coefficient (for sets A,B): $d(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|}$
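
The distance functions on this slide can be written down directly. The sketch below is illustrative only; the function names are assumptions, and the edit distance uses the standard dynamic-programming formulation with unit costs, which is one common way to compute it.

```python
def l1(x, y):
    """City-block distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2(x, y):
    """Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def l_inf(x, y):
    """Maximum (infinity) distance."""
    return max(abs(a - b) for a, b in zip(x, y))


def edit_distance(s, t):
    """Minimal number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


print(l1((0, 0), (3, 4)), l2((0, 0), (3, 4)), l_inf((0, 0), (3, 4)))
print(edit_distance("application", "applet"))
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```

The second print reproduces the slide's example: turning 'application' into 'applet' takes one substitution and five deletions.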


Examples of Distance Functions

• Mahalanobis distance

– for vectors with correlated dimensions

• Hausdorff distance

– for sets with elements related by another distance

• Earth mover's distance

– primarily for histograms (sets of weighted features)

• and many others


Similarity Search Problem

• For X ⊆ D in metric space M, pre-process X so that the similarity queries are executed efficiently.

No total ordering exists!


Similarity Queries

• Range query

• Nearest neighbor query

• Similarity join

• Combined queries

• Complex queries


Similarity Range Query

• range query

– R(q,r) = { x ∈ X | d(q,x) ≤ r }

… all museums up to 2km from my hotel …


Nearest Neighbor Query

• the nearest neighbor query

– NN(q) = x

– x ∈ X, ∀y ∈ X, d(q,x) ≤ d(q,y)

• k-nearest neighbor query

– k-NN(q,k) = A

– A ⊆ X, |A| = k

– ∀x ∈ A, y ∈ X – A, d(q,x) ≤ d(q,y)

… five closest museums to my hotel …
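
As a baseline, the range and k-nearest neighbor queries can be answered by a sequential scan that evaluates the distance to every object. This is a sketch under assumptions: the names `range_query` and `knn_query` and the coordinates are made up for illustration, and an index structure exists precisely to avoid most of these distance computations.

```python
import heapq
import math


def range_query(X, d, q, r):
    """R(q, r): all objects of X within distance r of q."""
    return [x for x in X if d(q, x) <= r]


def knn_query(X, d, q, k):
    """k-NN(q, k): the k objects of X closest to q (ties broken by input order)."""
    return heapq.nsmallest(k, X, key=lambda x: d(q, x))


hotel = (0.0, 0.0)
museums = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (5.0, 5.0), (0.5, 0.5)]

print(range_query(museums, math.dist, hotel, 2.0))
print(knn_query(museums, math.dist, hotel, 2))
```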


Similarity Join Queries

• similarity join of two data sets X ⊆ D, Y ⊆ D with threshold μ ≥ 0:

$J(X,Y,\mu) = \{ (x,y) \in X \times Y : d(x,y) \le \mu \}$

• similarity self join: X = Y

… pairs of hotels and museums which are a five-minute walk apart …
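
A similarity join pairs up objects from two sets whose distance does not exceed a threshold. The sketch below uses a plain nested loop; the threshold name `mu`, the function names, and the one-dimensional sample data are assumptions for illustration only.

```python
def similarity_join(X, Y, d, mu):
    """All pairs (x, y) from X × Y with d(x, y) <= mu."""
    return [(x, y) for x in X for y in Y if d(x, y) <= mu]


def similarity_self_join(X, d, mu):
    """Self join that reports each unordered pair once and skips (x, x)."""
    return [(X[i], X[j])
            for i in range(len(X))
            for j in range(i + 1, len(X))
            if d(X[i], X[j]) <= mu]


def walking_minutes(a, b):
    # Hypothetical 1-D positions along a street, one unit per minute.
    return abs(a - b)


hotels = [0, 10]
museums = [1, 4, 12]
print(similarity_join(hotels, museums, walking_minutes, 2))
print(similarity_self_join([0, 1, 5, 6], walking_minutes, 1))
```

Reporting each unordered pair once in the self join is a design choice of this sketch; taken literally, the definition with X = Y would also return the symmetric and trivial pairs.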


Combined Queries

• Range + Nearest neighbors

$kNN(q,r) = \{ R \subseteq X, |R| \le k \wedge \forall x \in R, y \in X - R : d(q,x) \le d(q,y) \wedge d(q,x) \le r \}$

• Nearest neighbor + similarity joins

– by analogy


Complex Queries

• Find the best matches of circular shape objects with red color

• The best match for circular shape or red color need not be the best match combined

• A0 algorithm

• Threshold algorithm


Partitioning Principles

• Given a set X ⊆ D in M=(D,d), basic partitioning principles have been defined:

– Ball partitioning

– Generalized hyper-plane partitioning

– Excluded middle partitioning

– Clustering


Ball Partitioning

• Inner set: { x ∈ X | d(p,x) ≤ dm }

• Outer set: { x ∈ X | d(p,x) > dm }
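
Ball partitioning splits a set around a pivot p by a distance threshold dm. In the sketch below, choosing dm as the median distance to the pivot is an assumption (it yields two halves of similar size); the slide does not say how dm is chosen, and the function name is hypothetical.

```python
import statistics


def ball_partition(X, d, p, dm=None):
    """Split X into (inner, outer) around pivot p with radius dm."""
    if dm is None:
        # Assumption: use the median distance to balance the two sets.
        dm = statistics.median(d(p, x) for x in X)
    inner = [x for x in X if d(p, x) <= dm]
    outer = [x for x in X if d(p, x) > dm]
    return inner, outer, dm


def abs_diff(a, b):
    return abs(a - b)


inner, outer, dm = ball_partition([1, 2, 3, 4, 5, 6], abs_diff, p=0)
print(inner, outer, dm)
```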


Generalized Hyper-plane

• { x ∈ X | d(p1,x) ≤ d(p2,x) }

• { x ∈ X | d(p1,x) > d(p2,x) }
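
Generalized hyper-plane partitioning assigns each object to the closer of two pivots, with ties going to p1 as in the first set above. A minimal sketch with a hypothetical function name and one-dimensional sample data:

```python
def hyperplane_partition(X, d, p1, p2):
    """Split X by which of the two pivots is closer (ties go to p1)."""
    closer_to_p1 = [x for x in X if d(p1, x) <= d(p2, x)]
    closer_to_p2 = [x for x in X if d(p1, x) > d(p2, x)]
    return closer_to_p1, closer_to_p2


def abs_diff(a, b):
    return abs(a - b)


left, right = hyperplane_partition([0, 1, 2, 3, 4, 5, 6], abs_diff, p1=1, p2=5)
print(left, right)
```

Unlike ball partitioning, no radius has to be stored; the partition is determined by the pivot pair alone.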


Excluded Middle Partitioning

• Inner set: { x ∈ X | d(p,x) ≤ dm − ρ }

• Outer set: { x ∈ X | d(p,x) > dm + ρ }

• Excluded set: otherwise

[Figure: ball around pivot p with radius dm and an excluded ring of width 2ρ]
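
Excluded middle partitioning leaves a ring around the ball boundary out of both the inner and the outer set. The sketch below assumes the half-width of that ring is called `rho`; the function name and the sample data are illustrative only.

```python
def excluded_middle_partition(X, d, p, dm, rho):
    """Split X into (inner, outer, excluded) around pivot p."""
    inner, outer, excluded = [], [], []
    for x in X:
        dist = d(p, x)
        if dist <= dm - rho:
            inner.append(x)
        elif dist > dm + rho:
            outer.append(x)
        else:
            excluded.append(x)
    return inner, outer, excluded


def abs_diff(a, b):
    return abs(a - b)


inner, outer, excluded = excluded_middle_partition(
    list(range(11)), abs_diff, p=0, dm=5, rho=1)
print(inner, excluded, outer)
```

By the triangle inequality, any inner object and any outer object are more than 2·rho apart, so a range query with radius r ≤ rho can never need both the inner and the outer set; it needs at most one of them plus the excluded set.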


Clustering

• Cluster data into sets

– bounded by a ball region

– { x ∈ X | d(p_i,x) ≤ r_i^c }


Scalability: Peer-to-Peer Indexing

• Local search: M-tree, D-Index, M-Index

• Native metric techniques: GHT*, VPT*

• Transformation techniques: M-CAN, M-Chord


The M-tree [Ciaccia, Patella, Zezula, VLDB 1997]

1) Paged organization

2) Dynamic

3) Suitable for arbitrary metric spaces

4) I/O and CPU optimization - computing d can be time-consuming


The M-tree Idea

• Depending on the metric, the “shape” of index regions changes

[Figure: the same index regions A–F drawn under different metrics: L2 (Euclidean), L1 (city-block), L∞ (max-metric), weighted Euclidean, quadratic form]


M-tree: Example

[Figure: M-tree built over objects o1–o11]

Root node (routing object, covering radius, distance to parent):

o1 4.5 -.- | o2 6.9 -.-

Internal nodes (routing object, covering radius, distance to parent):

o1 1.4 0.0 | o10 1.2 3.3 | o7 1.3 3.8

o2 2.9 0.0 | o4 1.6 5.3

Leaf entries (object, distance to parent):

o1 0.0, o6 1.4 | o10 0.0, o3 1.2 | o7 0.0, o5 1.3, o11 1.0

o2 0.0, o8 2.9 | o4 0.0, o9 1.6


M-tree family

• Bulk loading

• Slim-tree

• Multi-way insertion

• PM-tree

• M2-tree

• etc.


D-Index [Dohnal, Gennaro, Zezula, MTA 2002]

• 4 separable buckets at the first level

• 2 separable buckets at the second level

• exclusion bucket of the whole structure


D-index: Insertion


D-index: Range Search

[Figure: range queries with center q and radius r evaluated against the D-index]


Implementation Postulates of Distributed Indexes

• dynamism – nodes can be added and removed

• no hot-spots – no centralized nodes, no flooding by messages (transactions)

• update independence – network update at one site does not require an immediate change propagation to all the other sites


Distributed Similarity Search Structures

• Native metric structures:

– GHT* (Generalized Hyperplane Tree)

– VPT* (Vantage Point Tree)

• Transformation approaches:

– M-CAN (Metric Content Addressable Network)

– M-Chord (Metric Chord)


GHT* Address Search Tree

• Based on the Generalized Hyperplane Tree [Uhl91]

– two pivots for binary partitioning

[Figure: space partitioned by the pivot pairs (p1,p2), (p3,p4), (p5,p6); AST with root (p1,p2) and child nodes (p5,p6) and (p3,p4)]


GHT* Address Search Tree

• Inner node

– two pivots (reference objects)

• Leaf node

– BID pointer to a bucket if data stored on the current peer

– NNID pointer to a peer if data stored on a different peer

[Figure: AST with root (p1,p2) and inner nodes (p5,p6), (p3,p4); leaves BID1, BID2, BID3, and NNID2 pointing to Peer 2]


GHT* Address Search Tree

[Figure: address search trees held by Peer 1, Peer 2, and Peer 3]


GHT* Range Query

• Range query R(q,r)

– traverse peer’s own AST

– search buckets for all BIDs found

– forward query to all NNIDs found

[Figure: range query R(q,r) over the space partitioned by (p1,p2), (p3,p4), (p5,p6); the AST of the querying peer with leaves BID1, BID2, BID3, NNID2 and the AST of Peer 2]
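
The traversal step of a range query over an address search tree can be sketched as follows. The class names, the leaf encoding, and the one-dimensional pivots are assumptions made for illustration; the two branch conditions are the usual generalized-hyperplane pruning rules derived from the triangle inequality, not code taken from the GHT* implementation.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Leaf:
    kind: str   # "BID" (local bucket) or "NNID" (remote peer)
    ident: int


@dataclass
class Inner:
    p1: Any
    p2: Any
    left: Union["Inner", Leaf]    # objects with d(p1, x) <= d(p2, x)
    right: Union["Inner", Leaf]   # objects with d(p1, x) >  d(p2, x)


def traverse_ast(node, q, r, d):
    """Return (BIDs to search locally, NNIDs to forward the query to)."""
    if isinstance(node, Leaf):
        if node.kind == "BID":
            return {node.ident}, set()
        return set(), {node.ident}
    bids, nnids = set(), set()
    d1, d2 = d(q, node.p1), d(q, node.p2)
    if d1 - r <= d2 + r:          # query ball may reach the p1 side
        b, n = traverse_ast(node.left, q, r, d)
        bids |= b
        nnids |= n
    if d1 + r > d2 - r:           # query ball may reach the p2 side
        b, n = traverse_ast(node.right, q, r, d)
        bids |= b
        nnids |= n
    return bids, nnids


def abs_diff(a, b):
    return abs(a - b)


# Hypothetical 1-D tree shaped like the slide's example.
ast = Inner(2, 8,
            Inner(1, 4, Leaf("BID", 1), Leaf("BID", 2)),
            Inner(6, 9, Leaf("BID", 3), Leaf("NNID", 2)))

print(traverse_ast(ast, q=7, r=1, d=abs_diff))
print(traverse_ast(ast, q=0, r=0.5, d=abs_diff))
```

In the first call the query interval [6, 8] straddles the boundary between the last two leaves, so one local bucket is searched and the query is forwarded to one remote peer.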


AST: Logarithmic replication

• Full AST on every peer is space consuming

– replication of pivots grows in a linear way

• Store only a part of the AST:

– all paths to local buckets

• Deleted sub-trees:

– replaced by NNID of the leftmost peer

[Figure: full AST with pivots p1–p14 and leaves NNID2, NNID3, BID1, NNID4–NNID8, next to the reduced AST on pivots (p1,p2), (p3,p4), (p7,p8) with leaves BID1, NNID3, NNID5]


AST: Logarithmic Replication (cont.)

• Resulting tree

– replication of pivots grows in a logarithmic way

[Figure: resulting AST on pivots (p1,p2), (p3,p4), (p7,p8) with leaves NNID2, NNID3, BID1, NNID5]


VPT* Structure

• Similar to the GHT*, but ball partitioning is used for the AST

• Based on the Vantage Point Tree [Yia93]

– inner nodes have one pivot and a radius

– different traversing conditions

[Figure: ball regions (p1,r1), (p2,r2), (p3,r3) and the AST with root p1 (r1) and child nodes p2 (r2), p3 (r3)]


M-Chord: The Metric Chord

• Transform metric space to one-dimensional domain

– Use M-Index - a generalized version of the iDistance

• Divide the domain into intervals

– assign each interval to a peer

• Use the Chord P2P protocol for navigation

• Alternatively, the Skip Graphs distributed protocol can be used


M-Chord: Indexing the Distance

• iDistance – indexing technique for vector domains

– cluster analysis: centers = reference points p_i

– assign iDistance keys to objects x ∈ C_i: $iDist(x) = d(p_i, x) + i \cdot c$

– range query R(q,r): identify intervals of interest

• Generalization to metric spaces

– select pivots $\{p_0, \ldots, p_n\}$, then partition: Voronoi-style
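
The iDistance mapping and the interval computation for a range query can be sketched as below. Everything here is an illustrative assumption rather than the M-Chord code: the function names, the per-cluster `max_radius` bookkeeping, and the one-dimensional data. The constant c must exceed every distance that occurs so that the key ranges of different clusters do not overlap.

```python
def idistance_key(x, pivots, d, c):
    """Assign x to its closest pivot p_i and return (i, d(p_i, x) + i * c)."""
    i, dist = min(((i, d(p, x)) for i, p in enumerate(pivots)),
                  key=lambda pair: pair[1])
    return i, dist + i * c


def range_query_intervals(q, r, pivots, d, c, max_radius):
    """Key intervals that can contain answers to R(q, r).

    max_radius[i] is the largest d(p_i, x) stored in cluster i (assumed known).
    """
    intervals = []
    for i, p in enumerate(pivots):
        dq = d(p, q)
        lo = max(dq - r, 0.0)
        hi = min(dq + r, max_radius[i])
        if lo <= hi:                      # otherwise cluster i is skipped
            intervals.append((i * c + lo, i * c + hi))
    return intervals


def abs_diff(a, b):
    return abs(a - b)


pivots = [0.0, 10.0]
print(idistance_key(3, pivots, abs_diff, c=100))
print(idistance_key(8, pivots, abs_diff, c=100))
print(range_query_intervals(4, 2, pivots, abs_diff, c=100, max_radius=[5.0, 5.0]))
```

Each interval follows from the triangle inequality: an object x of cluster i with d(q,x) ≤ r must satisfy d(p_i,q) − r ≤ d(p_i,x) ≤ d(p_i,q) + r.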


M-Chord: Chord Protocol

• Peer-to-Peer navigation protocol

• Peers are responsible for intervals of keys

• O(log n) hops to localize a node storing a key

M-Chord

• set the iDistance domain

• make it uniform: function h

$mchord(x) = h(d(p_i, x) + i \cdot c)$

• Use Chord on this domain


M-Chord: Range Query

• Node Nq initiates the search

• Determine intervals

– generalized iDistance

• Forward requests to peers on intervals

• Search in the nodes

– using local organization

• Merge the received partial answers


M-CAN: The Metric CAN

• Based on the Content-Addressable Network (CAN)

– a DHT navigating in an N-dimensional vector space

• The Idea:

1. Map the metric space to a vector space

– given N pivots: p1, p2, … , pN, transform every o into vector F(o)

2. Use CAN to

– distribute the vector space zones among the nodes

– navigate in the network


CAN: Principles & Navigation

• CAN – the principles

– the space is divided in zones

– each node “owns” a zone

– nodes know their neighbors

• CAN – the navigation

– greedy routing

– in every step, move to the neighbor closer to the target location

[Figure: 2-dimensional vector space divided into zones 1–6, with a target location (x,y)]


M-CAN: Contractiveness & Filtering

• Use the L∞ as a distance measure

– the mapping F is contractive: $L_\infty(F(x), F(y)) \le d(x,y)$

• More pivots ⇒ better filtering

– but CAN routing is better for fewer dimensions

• Additional filtering

– some pivots are only used for filtering data (inside the explored nodes)

– they are not used for mapping into CAN vector space
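
The contractive mapping can be illustrated with a few lines of code. The sketch assumes the usual pivot mapping F(o) = (d(o,p1), …, d(o,pN)), which the transcript does not spell out; the function names and the sample points are hypothetical. Because L∞ over the mapped vectors never exceeds the original distance, filtering in the vector space may keep false positives but cannot drop a true answer.

```python
import math


def pivot_map(o, pivots, d):
    """Assumed mapping F(o): the vector of distances from o to each pivot."""
    return tuple(d(o, p) for p in pivots)


def l_inf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def filter_candidates(X, q, r, pivots, d):
    """Objects whose mapped vectors lie within L∞ distance r of F(q)."""
    fq = pivot_map(q, pivots, d)
    return [x for x in X if l_inf(pivot_map(x, pivots, d), fq) <= r]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0), (5.0, 0.0)]
pivots = [(0.0, 0.0), (10.0, 0.0)]
q, r = (1.0, 0.0), 1.5

candidates = filter_candidates(points, q, r, pivots, math.dist)
answers = [x for x in points if math.dist(q, x) <= r]
print(candidates)
print(answers)
```

Candidates that survive the filter still have to be verified with the original distance d, which is where the additional filtering pivots mentioned above help.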


Infrastructure Independence: MESSIF

Metric Similarity Search Implementation Framework

• Metric space (D,d)

– Vectors: Lp and quadratic form

– Strings: (weighted) edit and protein sequence

• Operations

– Insert, delete, range query, k-NN query, incremental k-NN

• Storage

– Volatile memory

– Persistent memory

• Centralized index structures

• Distributed index structures

• Communication (Net)

• Performance statistics


Applications: a Word Cloud


Concepts of the Image search

Image base


Images and their Descriptors

[Figure: image level (R, G, B channels) and descriptor level]


CoPhIR: Content-based Photo Image Retrieval

• Largest publicly available collection of high-quality image metadata: 106 million images

• Each image contains:

– Five MPEG-7 VDs: Scalable Color, Color Structure, Color Layout, Edge Histogram, Homogeneous Texture

– Other textual information: title, tags, comments, etc.

• Photos have been crawled from the Flickr photo-sharing site.

• 100M images + metadata + MPEG-7 VDs

http://cophir.isti.cnr.it/


MUFIN SEARCH ENGINE

• Scalability: M-Chord + M-Index

• Extensibility: COPHIR (edge histogram, color structure, scalable color, homogeneous texture, color layout)

• Infrastructure: 6 x IBM server x3400 – 2 servers used

Image Search Demo: http://mufin.fi.muni.cz/imgsearch/


MUFIN demos

• http://mufin.fi.muni.cz/imgsearch/similar

• http://www.pixmac.com/

• http://mufin.fi.muni.cz/twenga/random

• http://mufin.fi.muni.cz/fingerprints/random

• http://mufin.fi.muni.cz/subseq/random

• http://mufin.fi.muni.cz/plugins/annotation


MUFIN Future Research Directions

• MUFIN - a universal similarity search technology

• Research directions in:

– Core technology

– Applications

– A style of computing

[Figure: MUFIN Search Engine infrastructure; Scalability: P2P structures; Extensibility: metric space; Performance tuning]


MUFIN Future Research Directions

October 28, 2011

[Figure: MUFIN Search Engine infrastructure and a new style of computing: cloud computing, similarity search as a service]


Major Applications

– Images:

• Sub-image retrieval

• Ranking

• Annotation

• Categorization

• Benchmarking

– Biometrics:

• Face recognition

• Fingerprint recognition

• Gait recognition

– Signals:

• Audio recognition

• Time series similarity

– Videos:

• Event detection


A New Style of Computing

• From the project-oriented approach towards a similarity cloud for multimedia findability through similarity searching

Advantages:

– Cloud makes similarity search accessible to common users

– Computational resources are shared – users don’t need to maintain any hardware infrastructure

– Users don’t need to care for the OS, security, software platform, etc.
