The Multimedia Semantic Web

Bill Grosky

Multimedia Information Systems Laboratory, University of Michigan-Dearborn

Dearborn, Michigan

Contents

- Introduction
  - CBR – Where are we?
  - Multimedia annotation
  - Context-rich environments
  - Semantic web
- Our work
  - Anglograms
  - Finding latent semantics
  - Using text for improved image search
  - Using images for improved text search
  - Web page structure
  - A cross-modal theory of linked document semantics

CBR – Where are We?

- Development of feature-based techniques for content-based retrieval is a mature area, at least for images
- CBR researchers should now concentrate on extracting semantics from multimedia documents so that retrievals using concept-based queries can be tailored to individual users
  - The semantic gap
  - (Semi-)automated multimedia annotation

Multimedia Annotation

- Multimedia annotations should be semantically rich
  - Multiple semantics
- A social theory based on how multimedia information is used
- This can be discovered by placing multimedia information in a natural, context-rich environment

Context-Rich Environments

- Structural context – Author’s contribution
  - Document’s author places semantically similar pieces of information close to each other
  - User can cluster together semantically similar pieces of information
- Dynamic context – User’s contribution
  - Short browsing sub-paths are semantically coherent

Context-Rich Environments

The WEB is a perfect example of a context-rich environment

- Develop multimedia annotations through cross-modal techniques
  - Audio
  - Images
  - Text
  - Video

Semantic Web

- This program overlaps another very important current research topic, the semantic web
  - Web page annotations are the backbone of this research effort
- We have something very important to offer to this area
  - Multimedia documents
  - Deriving multiple semantics for a single document
- Combining our efforts will enrich both communities

Semantic Web

“The Semantic Web is a new initiative to transform the web into a structure that supports more intelligent querying and browsing, both by machines and by humans. This transformation is to be supported through the generation and use of metadata constructed via web annotation tools using user-defined ontologies that can be related to one another.”

Somewhere on the web

Semantic Web

[Figure: semantic web architecture, based on www.semanticweb.org. Components shown: End User, Community Portal, Web-Page Annotation Tool, Ontology Construction Tool, Ontology Articulation Toolkit, Inference Engine, Metadata Repository, Annotated Web Pages, Ontologies, Agents.]

http://images.google.com/imgres?imgurl=www.flash.net/~akstudio/open.door.gif.gif&imgrefurl=http://www.flash.net/~akstudio/&h=368&w=198&prev=/images%3Fq%3D%2522open%2Bdoor%2522%26svnum%3D10%26hl%3Den%26imgsafe%3Doff

Semantic Web

- “Plan a vacation within the next month,” Bill instructed his semantic web agent through his handheld browser.
- An agent retrieved Bill’s vacation profile from his travel agent, retrieved Bill’s availability from his calendar, checked availability of airlines, hotels and restaurants, and made all the necessary arrangements.

Semantic Web

- Multimedia semantic web
  - “Plan a vacation close to where [image] is being exhibited.”

http://www.vangoghmuseum.nl/collection/catalog/vglpainting.asp?ARTID=68&LANGID=0&SEL=1&PERIOD=-1&SORT=2

Anglograms

- Image object
  - Entire image
  - Some meaningful portion of an image
    - semcon
- Point-based features
  - corner points
  - color histograms

Anglograms

Point feature map for shape

Anglograms

Point feature map for color

Anglograms

Voronoi diagram of n = 18 sites

Anglograms

Dual graph of a Voronoi diagram

Delaunay triangulation of n = 18 sites

Anglograms

- Delaunay triangulation of a set of n points
  - O(n log n) algorithm
- Invariance of the Delaunay triangles of a set of points to
  - translation
  - rotation
  - scaling

Anglograms

- Spatial layout of point set
  - Anglogram
- Computed by discretizing and counting the angles of the Delaunay triangles
  - Which angles are counted?
  - O(max(n, #bins)) algorithm
  - What is the bin size?

A set of 26 points

Delaunay triangulations of the point set and its two transformed variants
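The counting step can be sketched in a few lines. This is only an illustration under stated assumptions: the triangulation is taken as given (a list of index triples, which could for instance come from `scipy.spatial.Delaunay(points).simplices`), and the function names, the 5° default bin size, and the `keep` parameter are choices made for this sketch rather than anything fixed by the slides.

```python
import math

def interior_angles(p, q, r):
    """Return the three interior angles of triangle pqr, in degrees."""
    def angle_at(a, b, c):
        # Angle at vertex a between the rays a->b and a->c.
        ux, uy = b[0] - a[0], b[1] - a[1]
        vx, vy = c[0] - a[0], c[1] - a[1]
        return math.degrees(math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy))
    return [angle_at(p, q, r), angle_at(q, r, p), angle_at(r, p, q)]

def anglogram(points, triangles, bin_size=5.0, keep="two_largest"):
    """Histogram of triangle angles; `triangles` holds index triples into `points`."""
    n_bins = math.ceil(180.0 / bin_size)
    hist = [0] * n_bins
    for i, j, k in triangles:
        angles = sorted(interior_angles(points[i], points[j], points[k]))
        # Assumed options: count the two largest or the two smallest angles.
        chosen = angles[1:] if keep == "two_largest" else angles[:2]
        for a in chosen:
            hist[min(int(a // bin_size), n_bins - 1)] += 1
    return hist

# One hand-made triangle, then the same triangle scaled and translated.
points = [(0, 0), (5, 0), (1, 3)]
triangles = [(0, 1, 2)]
moved = [(2 * x + 7, 2 * y - 3) for x, y in points]
print(anglogram(points, triangles) == anglogram(moved, triangles))
```

Because only angles are counted, the histogram is unchanged by the translation and scaling applied in the usage lines, which is the invariance property the slides rely on.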

Anglograms

- Computation of color anglogram of an image
  - Divide the image evenly into M × N non-overlapping blocks
  - Each individual block is abstracted as a unique feature point labeled with its spatial location and dominant colors

Anglograms

- Computation of color anglogram of an image
  - Point feature map
    - Normalized feature points, after adjusting any two neighboring feature points to a fixed distance
  - Construct Delaunay triangulation for each set of feature points labeled with identical color

Anglograms

- Computation of color anglogram of an image
  - Compute anglogram based on each Delaunay triangulation
  - Color anglogram for image
    - Concatenating all the anglograms together
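The first steps of this pipeline (blocks, feature points, grouping by color) might look roughly as follows. The sketch assumes the image is given as a 2D list of hue values in [0, 1), uses the block average as the color label, and places each feature point at its block's grid position so that neighboring points are one unit apart; the function name and these conventions are illustrative assumptions.

```python
def color_feature_points(hue, rows, cols, levels=10):
    """Group block centers by quantized average hue.

    hue: 2D list of hue values in [0, 1); rows x cols is the block grid.
    Returns {quantized_level: [(x, y), ...]} with unit spacing between neighbors.
    """
    height, width = len(hue), len(hue[0])
    bh, bw = height // rows, width // cols
    groups = {}
    for r in range(rows):
        for c in range(cols):
            block = [hue[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            mean = sum(block) / len(block)
            level = min(int(mean * levels), levels - 1)
            # Block center in grid units: neighbors are a fixed distance apart.
            groups.setdefault(level, []).append((c + 0.5, r + 0.5))
    return groups

# Tiny 4 x 4 "image": left half one hue, right half another.
image = [[0.05, 0.05, 0.95, 0.95] for _ in range(4)]
print(color_feature_points(image, rows=2, cols=2))
```

Each list of points would then be triangulated separately, and the per-color anglograms concatenated into one feature vector.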

Anglograms

Pyramid image

Anglograms

Anglograms

Hue component

Anglograms

Saturation component

Anglograms

Point feature map

Anglograms

Feature points of hue 2

Anglograms

Delaunay triangulation of hue 2

Anglograms

Delaunay triangulation of saturation 5

Anglograms

[Figure: Anglogram of saturation 5. Histogram of the number of angles (vertical axis, 0 to 60) against bin number (horizontal axis, 1 to 19).]

Finding Latent Semantics

- We want to transform low-level features to a higher level of meaning
- Used for dimension reduction in QBIC
  - Searching in high-dimensional spaces
- More importantly, it creates clusters of co-occurring features
  - So-called concepts

Finding Latent Semantics

- Latent Semantic Analysis (LSA) was introduced to overcome a fundamental problem in textual information retrieval
  - Users want to retrieve on the basis of conceptual content
  - Individual words provide unreliable evidence about conceptual meanings
    - Synonymy: many ways to refer to the same object
    - Polysemy: most words have more than one distinct meaning


- Searching for documents concerning automobiles
  - Users tend to use the keyword automobile
- A statistical analysis determines that the keywords automobile and car tend to co-occur
- LSA will then also retrieve documents in which the keyword car appears but the keyword automobile does not


- Term-document association
  - It is assumed that there exists some underlying latent semantic structure in the data that is partially obscured by the randomness of term choice
  - By semantic structure we mean the correlation structure in which individual terms appear in documents
  - Semantic implies only the fact that terms in a document may be taken as referents to the document itself or to its topic
  - Statistical techniques are used to estimate this latent semantic structure, and to get rid of obscuring noise

Finding Latent Semantics

- Singular-value decomposition (SVD)
  - Take a large matrix of term-document association
  - Construct a semantic space wherein terms and documents that are closely associated are placed near to each other
  - SVD allows the arrangement of the space to reflect the major associative patterns and ignore smaller, less important influences
  - As a result, terms that did not actually appear in a document may still end up close to the document, if that is consistent with the major patterns of association
  - Position in the space serves as the semantic indexing
  - Retrieval proceeds by using the terms in a query to identify a point in the semantic space, and documents in its neighborhood are returned as relevant results


- Term-document matrix
  - d documents
  - t terms
  - Represented by a t × d term-document matrix A
    - Each document is represented by a column (document vector)
    - Each term is represented by a row (term vector)

Finding Latent Semantics

The terms (t = 6)

- t1: bak(e,ing)
- t2: recipes
- t3: bread
- t4: cake
- t5: pastr(y,ies)
- t6: pie

The document titles (d = 5)

- d1: How to Bake Bread Without Recipes
- d2: The Classic Art of Viennese Pastry
- d3: Numerical Recipes: The Art of Scientific Computing
- d4: Breads, Pastries, Pies and Cakes: Quantity Baking Recipes
- d5: Pastry: A Book of Best French Recipes


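Raw term-document matrix (rows t1 to t6, columns d1 to d5):

\[
\hat{A} =
\begin{pmatrix}
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]

With each column scaled to unit Euclidean length:

\[
A =
\begin{pmatrix}
0.5774 & 0 & 0 & 0.4082 & 0 \\
0.5774 & 0 & 1.0000 & 0.4082 & 0.7071 \\
0.5774 & 0 & 0 & 0.4082 & 0 \\
0 & 0 & 0 & 0.4082 & 0 \\
0 & 1.0000 & 0 & 0.4082 & 0.7071 \\
0 & 0 & 0 & 0.4082 & 0
\end{pmatrix}
\]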
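As a rough check, the term-document matrix for this example can be rebuilt from the titles. The sketch below assumes that a term counts as present if it occurs anywhere in the title (matching plurals such as "Breads" and "Pies"), and that each document column is scaled to unit length; the regular expressions and function names are illustrative choices.

```python
import math
import re

# Term patterns; plural handling is an assumption made for this sketch.
TERMS = [r"bak(e|ing)", r"recipes", r"breads?", r"cakes?", r"pastr(y|ies)", r"pies?"]
TITLES = [
    "How to Bake Bread Without Recipes",
    "The Classic Art of Viennese Pastry",
    "Numerical Recipes: The Art of Scientific Computing",
    "Breads, Pastries, Pies and Cakes: Quantity Baking Recipes",
    "Pastry: A Book of Best French Recipes",
]

def term_document_matrix(terms, titles):
    """Binary t x d matrix: entry is 1 if the term occurs in the title."""
    rows = []
    for pattern in terms:
        regex = re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)
        rows.append([1.0 if regex.search(title) else 0.0 for title in titles])
    return rows

def normalize_columns(matrix):
    """Scale every column (document vector) to unit Euclidean length."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    return [[row[j] / norms[j] if norms[j] else 0.0 for j in range(n_cols)]
            for row in matrix]

raw = term_document_matrix(TERMS, TITLES)
A = normalize_columns(raw)
for row in A:
    print(" ".join(f"{x:.4f}" for x in row))
```

The nonzero entries come out as 1/√3, 1/√6 and 1/√2, matching the values 0.5774, 0.4082 and 0.7071 on the slide.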


- SVD is a dimension reduction technique
  - Reduced-rank approximation to both column space and row space
  - Find a rank-k approximation to matrix A with minimal change to that matrix for a given value of k
- This decomposition exists for any matrix A

Finding Latent Semantics

- SVD of a term-document matrix A: $A = U \Sigma V^T$
  - A is t × d
  - U is a t × r matrix with orthonormal columns, where r is rank(A)
    - The columns of U are a basis for the column space of A
    - The columns of U are eigenvectors of the matrix $AA^T$
  - $\Sigma$ is an r × r diagonal matrix having the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r$ of A in order along its diagonal
    - $\Sigma^2$ is the matrix of nonzero eigenvalues of $AA^T$ or $A^TA$
  - $V^T$ is an r × d matrix with orthonormal rows
    - The rows of $V^T$ are a basis for the row space of A
    - The columns of V are eigenvectors of the matrix $A^TA$

Dimensions: A (t × d) = U (t × r) · $\Sigma$ (r × r) · $V^T$ (r × d)


- A special rank-k approximation, $A_k$: $A_k = U_k \Sigma_k V_k^T$
  - $U_k$: first k columns of U
  - $\Sigma_k$: first k diagonal values of $\Sigma$
  - $V_k^T$: first k rows of $V^T$


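For the example matrix

\[
A =
\begin{pmatrix}
0.5774 & 0 & 0 & 0.4082 & 0 \\
0.5774 & 0 & 1.0000 & 0.4082 & 0.7071 \\
0.5774 & 0 & 0 & 0.4082 & 0 \\
0 & 0 & 0 & 0.4082 & 0 \\
0 & 1.0000 & 0 & 0.4082 & 0.7071 \\
0 & 0 & 0 & 0.4082 & 0
\end{pmatrix}
\]

the decomposition $A = U \Sigma V^T$ has the factors

\[
U =
\begin{pmatrix}
0.2670 & -0.2567 & 0.5308 & -0.2847 & -0.7071 & 0 \\
0.7479 & -0.3981 & -0.5249 & 0.0816 & 0 & 0 \\
0.2670 & -0.2567 & 0.5308 & -0.2847 & 0.7071 & 0 \\
0.1182 & -0.0127 & 0.2774 & 0.6394 & 0 & -0.7071 \\
0.5198 & 0.8423 & 0.0838 & -0.1158 & 0 & 0 \\
0.1182 & -0.0127 & 0.2774 & 0.6394 & 0 & 0.7071
\end{pmatrix}
\]

\[
\Sigma =
\begin{pmatrix}
1.6950 & 0 & 0 & 0 & 0 \\
0 & 1.1158 & 0 & 0 & 0 \\
0 & 0 & 0.8403 & 0 & 0 \\
0 & 0 & 0 & 0.4195 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]

\[
V =
\begin{pmatrix}
0.4366 & -0.4717 & 0.3688 & -0.6715 & 0 \\
0.3067 & 0.7549 & 0.0998 & -0.2760 & -0.5000 \\
0.4412 & -0.3568 & -0.6247 & 0.1945 & -0.5000 \\
0.4909 & -0.0346 & 0.5711 & 0.6571 & 0 \\
0.5288 & 0.2815 & -0.3712 & -0.0577 & 0.7071
\end{pmatrix}
\]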


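Reduce the rank to 3

\[
A =
\begin{pmatrix}
0.5774 & 0 & 0 & 0.4082 & 0 \\
0.5774 & 0 & 1.0000 & 0.4082 & 0.7071 \\
0.5774 & 0 & 0 & 0.4082 & 0 \\
0 & 0 & 0 & 0.4082 & 0 \\
0 & 1.0000 & 0 & 0.4082 & 0.7071 \\
0 & 0 & 0 & 0.4082 & 0
\end{pmatrix}
\]

\[
A_3 =
\begin{pmatrix}
0.4971 & -0.0330 & 0.0232 & 0.4867 & -0.0069 \\
0.6003 & 0.0094 & 0.9933 & 0.3858 & 0.7091 \\
0.4971 & -0.0330 & 0.0232 & 0.4867 & -0.0069 \\
0.1801 & 0.0740 & -0.0522 & 0.2320 & 0.0155 \\
-0.0326 & 0.9866 & 0.0094 & 0.4402 & 0.7043 \\
0.1801 & 0.0740 & -0.0522 & 0.2320 & 0.0155
\end{pmatrix}
\]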

Finding Latent Semantics

Documents without SVD

Term      Doc 1   Doc 2   Doc 3   Doc 4   Query
Mark        15       0       0       0       1
Twain       15       0      20       0       1
Samuel       0      10       5       0       0
Clemens      0      20      10       0       0
Purple       0       0       0      20       0
Lion         0       0       0      15       0

Score       30       0      20       0

Finding Latent Semantics

Documents with SVD

Term      Doc 1   Doc 2   Doc 3   Doc 4   Query
Mark       3.7     3.5     5.5       0       1
Twain     11.0    10.3    16.1       0       1
Samuel     4.1     3.9     6.1       0       0
Clemens    8.3     7.8    12.2       0       0
Purple       0       0       0      20       0
Lion         0       0       0      15       0

Score     14.7    13.8    21.6       0
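The score rows in the two tables are consistent with taking the inner product of the query vector with each document column. A minimal sketch of that reading, using the table values as given (the variable and function names are illustrative, and the smoothed matrix is copied from the slide rather than recomputed by an SVD):

```python
TERMS = ["Mark", "Twain", "Samuel", "Clemens", "Purple", "Lion"]

# Term-document values copied from the two tables (rows follow TERMS).
WITHOUT_SVD = [
    [15, 0, 0, 0],
    [15, 0, 20, 0],
    [0, 10, 5, 0],
    [0, 20, 10, 0],
    [0, 0, 0, 20],
    [0, 0, 0, 15],
]
WITH_SVD = [
    [3.7, 3.5, 5.5, 0],
    [11.0, 10.3, 16.1, 0],
    [4.1, 3.9, 6.1, 0],
    [8.3, 7.8, 12.2, 0],
    [0, 0, 0, 20],
    [0, 0, 0, 15],
]

def scores(matrix, query):
    """Inner product of the query with each document column."""
    n_docs = len(matrix[0])
    return [sum(query[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(n_docs)]

query = [1, 1, 0, 0, 0, 0]  # "Mark Twain"
print(scores(WITHOUT_SVD, query))
print([round(s, 1) for s in scores(WITH_SVD, query)])
```

Document 2 contains neither query word, so it scores 0 on the raw counts, yet it scores 13.8 on the smoothed values because Samuel and Clemens co-occur with Mark and Twain in document 3.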

Using Text for Improved Image Search

10 sets of 5 similar images


- Color anglogram
  - Each image is divided into 64 non-overlapping blocks
  - Extract average hue and average saturation values of each block
  - Hue and saturation each quantized into 10 values
  - Generate Delaunay triangles for each hue value and each saturation value
  - Count two largest angles and quantize them into 36 bins, each of 5°
  - Feature vector has 720 elements


- Annotations
  - Extra 15 elements
  - Category positions: sky, sun, land, water, boat, grass, horse, rhino, bird, human, pyramid, column, tower, sphinx, snow
  - Each image annotated with appropriate keywords and the area coverage of each of these keywords
    - e.g., sky (0.55), sun (0.15), water (0.30)
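One plausible way to attach the annotations is to append the 15 coverage values to the 720-element anglogram, giving a 735-element vector. The sketch below assumes that layout (anglogram first, categories in the listed order); the function names are illustrative.

```python
CATEGORIES = ["sky", "sun", "land", "water", "boat", "grass", "horse", "rhino",
              "bird", "human", "pyramid", "column", "tower", "sphinx", "snow"]

def annotation_vector(coverage):
    """Map {keyword: area coverage} onto the 15 fixed category positions."""
    unknown = set(coverage) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [float(coverage.get(name, 0.0)) for name in CATEGORIES]

def annotated_feature_vector(anglogram, coverage):
    """Assumed layout: 720 anglogram bins followed by 15 annotation values."""
    return list(anglogram) + annotation_vector(coverage)

# Placeholder anglogram of zeros, with the example annotation from the slide.
vector = annotated_feature_vector([0] * 720,
                                  {"sky": 0.55, "sun": 0.15, "water": 0.30})
print(len(vector), vector[720:])
```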


Raw color global histogram data

Raw color global histogram data using LSA

Annotated color global histogram data using LSA

0.3% improvement

0.5% improvement


Raw color anglogram data

Raw color anglogram data using LSA

Annotated color anglogram data using LSA

0.5% improvement

1% improvement

Using Images for Improved Text Search

- Using documents collected from news Web sites
  - News headlines are often used as URL anchors and document titles
  - Topic can be represented easily and clearly by a group of keywords in the headline
  - News web sites often have extensive coverage of the same topic during a certain period of time
  - News documents often include multimedia components which are closely related to the topic


- Discover the semantic correlation between keywords and image in the same document
- A collection of 20 documents from cnn.com
  - 4 semantic categories of 5 documents each
  - 43 keywords
  - Select 1 image from each document

Color anglogram


1. Bush, in first address as president
2. Education, tax cuts top Bush's Washington agenda
3. Campaign promises could prove troublesome for Bush
4. Bush's to-do list: Set tone for next four years
5. George W. Bush: The 43rd President
6. Rescue mission for crippled Russian sub enters second day
7. Russian official says chances not good for rescue of trapped crew aboard sunken nuclear sub
8. Kursk salvage raises questions
9. Russia to start recovering Kursk bodies
10. Russian navy begins attempt to evacuate sailors from sunken sub
11. Clinton acquitted; president apologizes again
12. Clinton apologizes to nation
13. Clinton's evolving apology for the Lewinsky affair
14. Clinton will not address impeachment in State of the Union
15. Clinton says 'presidents are people, too'
16. MIR prepares for risky plunge
17. Mir positioned for fiery descent
18. A Mir risk
19. Mir demise causes international high anxiety
20. New Zealand issues Mir warning


- Integrated feature vector: $F = [f_1, f_2, \dots, f_{143}]^T$
- Textual feature vector: $K = [k_1, k_2, \dots, k_{43}]^T$
- Image feature vector: $I = [i_1, i_2, \dots, i_{100}]^T$
- Feature-document matrix: $A = [F_1, F_2, \dots, F_{20}]$
  - $A = U \Sigma V^T$
    - U is 143 × 143, $\Sigma$ is 143 × 20, and V is 20 × 20
  - k = 12
  - $A_k = U_k \Sigma_k V_k^T$
    - $U_k$ is 143 × 12, $\Sigma_k$ is 12 × 12, and $V_k$ is 20 × 12


- Each image is normalized to 192 × 128, and then divided into 64 non-overlapping blocks
- Extract average hue and saturation values of each block
- Hue and saturation each quantized into 10 values
- Generate Delaunay triangles for each hue value and each saturation value
- Count two largest angles and quantize them into 36 bins, each of 5°
- Image feature vector has 720 elements
- Feature-document matrix A is 763 × 20
- SVD
  - k = 12


Keywords only

Keywords using LSA

Image (anglogram) annotated keywords using LSA

1% improvement

21% improvement

Image (global color histogram) annotated keywords using LSA

3% improvement

Web Page Structure

- Genre detection
- We do the following:
  - Display web page in the program
  - Get tag hierarchy with area coordinates
  - Normalize the web page to size 512 × 512
  - Divide the page into 16 × 16 blocks
  - Calculate area covered by each tag in each block, considering the level of the tag in the tag hierarchy
  - For each feature tag, get the center coordinates of the blocks where it covers the maximum area compared with other tags on the same level
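The last two steps can be sketched as follows. The sketch assumes that "16 × 16 blocks" means a 16 × 16 grid over the 512 × 512 page (so each block is 32 × 32 pixels), that every tag is described by a hypothetical tuple `(tag, level, x0, y0, x1, y1)` of integer pixel coordinates, and that ties go to whichever tag is seen first; none of these details come from the slides.

```python
from collections import defaultdict

PAGE = 512            # normalized page size in pixels
GRID = 16             # assumed: a 16 x 16 grid of blocks
BLOCK = PAGE // GRID  # 32-pixel blocks

def overlap(lo1, hi1, lo2, hi2):
    """Length of the overlap of two 1-D intervals."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))

def tag_feature_points(boxes):
    """Return {tag: [block centers where the tag covers the most area on its level]}."""
    area = defaultdict(int)  # (level, row, col, tag) -> covered area
    for tag, level, x0, y0, x1, y1 in boxes:
        for row in range(GRID):
            for col in range(GRID):
                w = overlap(x0, x1, col * BLOCK, (col + 1) * BLOCK)
                h = overlap(y0, y1, row * BLOCK, (row + 1) * BLOCK)
                if w and h:
                    area[(level, row, col, tag)] += w * h
    best = {}  # (level, row, col) -> (largest area, tag)
    for (level, row, col, tag), a in area.items():
        if a > best.get((level, row, col), (0, None))[0]:
            best[(level, row, col)] = (a, tag)
    points = defaultdict(list)
    for (level, row, col), (_, tag) in sorted(best.items()):
        points[tag].append(((col + 0.5) * BLOCK, (row + 0.5) * BLOCK))
    return dict(points)

# Two overlapping level-1 tags in the top row of the page.
boxes = [("table", 1, 0, 0, 64, 32), ("img", 1, 48, 0, 128, 32)]
print(tag_feature_points(boxes))
```

The resulting point sets per tag play the same role as the per-color feature points of an image, so the anglogram machinery can be reused on page layout.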

Web Page Structure

Web Page Structure

- Histogram
  - 36 bins with two largest angles
  - Tags independent of level
- Try approach where tag on lower level overrides upper-level tag

Web Page Structure

- Set of tags defined
  - Initially, a large set of feature tags (52) is defined to ensure a powerful set of independent features for the discrimination of web pages
  - A second set of tags (3) is defined based on histograms created for the initial set of tags so that these tags will better differentiate web pages

Web Page Structure

- Experiment #1
  - Categories defined are
    - Detroit News
    - Times of India
    - Tribune India
    - Esakal
    - Amazon.com
    - Buy.com

Web Page Structure

Cluster category based on closest page

Tag set    Matches   Failures
52 tags       26        10
3 tags        27         9

Web Page Structure

- Experiment #2
  - Categories defined are
    - Newspaper environment
      - Detroit News
      - Times of India
      - Tribune India
      - Esakal
    - E-commerce environment
      - Amazon.com
      - Buy.com

Web Page Structure

Tag set    Matches   Failures
52 tags       33         3
3 tags        33         3

A Cross-Modal Theory of Linked Document Semantics

- Environment
  - Suppose one has a linked set of multimedia documents
    - Web
    - Content-based hypermedia
  - This provides a rich context for individual chunks of information
    - The structure of individual multimedia documents
    - The link structure


- Goal
  - Derive document semantics based on user browsing behavior
  - The same document has multiple semantics
    - Different people see different meanings in the same document
  - Over short browsing paths, an individual user’s wants and needs are uniform
  - The pages visited over these short paths exhibit semantics in congruence with these wants and needs


- Questions
  - How can the semantics of a web page be derived given a set of user browsing paths that end at that page?
  - How can we characterize the semantics of a user browsing path?
  - How can web page semantics help us in navigating the web more efficiently?
  - How can our approach actually be implemented in the real web world?


- Our approach
  - We use actual browsing paths to find the latent semantics of web pages
    - Textual features
    - Image features
    - Structural features
  - We hope to find general concepts comprising various textual and image features which frequently co-occur


- We believe that a user’s browsing path exhibits semantic coherence
  - While the user’s entire path exhibits multiple semantics, especially pages far from each other on the path, neighboring pages, especially the portions close to the links taken, are semantically close to each other


- We would like to characterize the contiguous sub-paths of a user’s browsing path that exhibit similar semantics and detect the semantic break points along the path where the semantics appreciably change
  - Collect these sub-paths into a multiset
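The slides do not say how a break point is detected. One simple possibility, shown here purely as an assumption, is to represent each visited page by a feature vector and to start a new sub-path whenever the cosine similarity between consecutive pages falls below a threshold. The threshold of 0.5, the function names, and the page identifiers are all hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def split_at_breaks(path, threshold=0.5):
    """Split a browsing path (page vectors in visit order) into coherent sub-paths.

    Returns lists of page indices; a new sub-path starts wherever the
    similarity between consecutive pages drops below the assumed threshold.
    """
    if not path:
        return []
    subpaths = [[0]]
    for i in range(1, len(path)):
        if cosine(path[i - 1], path[i]) < threshold:
            subpaths.append([i])
        else:
            subpaths[-1].append(i)
    return subpaths

# Four pages with toy 3-dimensional keyword vectors and hypothetical page ids.
pages = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
page_ids = ["a.html", "b.html", "c.html", "d.html"]
subpaths = split_at_breaks(pages)
multiset = Counter(tuple(page_ids[i] for i in sp) for sp in subpaths)
print(subpaths, multiset)
```

A `Counter` keyed by sub-path serves as the multiset, so a sub-path that recurs across users or sessions simply accumulates a higher count.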


- We categorize the semantics of each web page based on a history of the semantically-coherent browsing paths of all users which end at that page
- A browsing path will be represented by a high-dimensional vector
  - The various positions of the vector correspond to the presence of
    - textual keywords
    - image features (visual keywords)
    - structural features (structural keywords)


From the complete set of web pages under consideration, we extract a set of textual, visual, and structural keywords

- For each multiset, M, of sub-paths that we are to analyze, we form three matrices
  - term-path matrix
  - image-path matrix
  - structure-path matrix


- The (i,j)th element of each of these matrices is determined by the strength of the presence of the ith keyword along the jth browsing path
- This strength is determined by
  - How many times this term occurs on the pages along the path
  - How much time the user spends examining these pages
  - How close each occurrence of the ith keyword is to both the outgoing and incoming anchor positions
  - How many times this browsing path occurs in M


These matrices may be concatenated together in various ways to produce an overall keyword-path matrix

Perform latent-semantic analysis to get concepts

A page is then represented by a set of concept classes

Conclusions

- Researchers in CBR should now be concentrating on extracting semantics from multimedia documents
- The web is a perfect testbed for studying (semi-)automated techniques for multimedia annotation due to its contextual richness

CBR + Semantic Web = The Multimedia Semantic Web

Get Involved!!!
