Multipedia: Enriching DBpedia with Multimedia Information

Multipedia: Enriching DBpedia with Images. Andrés García-Silva†, Asunción Gómez-Pérez†, Max Jakob*, Pablo Mendez* and Chris Bizer*. †{hgarcia, ocorcho, asun}@fi.upm.es, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain. *[email protected], Web-based Systems Group, Freie Universität Berlin, Germany

Upload: oscar-corcho

Post on 19-Jan-2015

DESCRIPTION

Presentation given by Andrés García at K-CAP 2011 on the selection of images for DBpedia terms

TRANSCRIPT

Page 1: Multipedia: Enriching DBpedia with Multimedia information

Multipedia: Enriching DBpedia with Images

Andrés García-Silva†, Asunción Gómez-Pérez†, Max Jakob*, Pablo Mendez* and Chris Bizer*

† {hgarcia, ocorcho, asun}@fi.upm.es
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain

* [email protected]
Web-based Systems Group, Freie Universität Berlin, Germany

Page 2: Multipedia: Enriching DBpedia with Multimedia information

Garcia-Silva et al.

Introduction

• Enriching ontologies with multimedia
- Images and videos complement the information about concepts/entities in existing knowledge bases.
- Multimodal ontologies can help in QA systems, user interfaces, and search and recommendation processes.

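[Figure: ontology fragment with the concepts Bone and Pathology, the relations isA and occurs, and images linked through depicts.]

Example query: «Show me X-ray images with fractures of the femur»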

Radhouani, S., Lim, J.H., Chevallet, J.-P., Falquet, G.: Combining textual and visual ontologies to solve medical multimodal queries. In: IEEE International Conference on Multimedia and Expo, pp. 1853-1856 (2006).

hgarcia
Replace the image with another one from the internet
Page 3: Multipedia: Enriching DBpedia with Multimedia information

Introduction

• Goal: populate a general-purpose ontology with images from the Web.
- Find relevant images for ontology instances with ambiguous names.

• DBpedia knowledge base
- Collects facts from Wikipedia, covering 3.5 million entities.
- Classified into a consistent cross-domain ontology: 272 classes and 1.6 million instances.
- Has evolved into a hub in the linked data cloud.

• Images in DBpedia
- Wikipedia images are represented in DBpedia (foaf:depiction).
- About 70% of the Wikipedia articles don't have images.
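As an illustration of how the foaf:depiction link mentioned above could be looked up, the following sketch builds a SPARQL query string for one resource. The helper name `depiction_query` is hypothetical and not part of the presentation; the sketch assumes the usual DBpedia resource namespace and the FOAF namespace, and it only constructs the query text without contacting any endpoint.

```python
# Namespaces assumed for this sketch: the FOAF vocabulary and DBpedia resources.
FOAF_DEPICTION = "http://xmlns.com/foaf/0.1/depiction"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def depiction_query(resource_name: str) -> str:
    """Build a SPARQL query listing the images attached to a DBpedia resource.

    `depiction_query` is a hypothetical helper written for illustration only.
    """
    subject = f"<{DBPEDIA_RESOURCE}{resource_name}>"
    predicate = f"<{FOAF_DEPICTION}>"
    return f"SELECT ?image WHERE {{ {subject} {predicate} ?image }}"


if __name__ == "__main__":
    print(depiction_query("Hornet"))
```

A resource for which such a query returns no rows would be one of the cases Multipedia targets.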

hgarcia
1) Validate the 70% figure. 2) Validate the number of classes in the DBpedia Ontology. 3) Validate the wording "has evolved into a" (the "into").
Page 4: Multipedia: Enriching DBpedia with Multimedia information

Introduction

• Challenges
- Ambiguity of instance labels

[Figure: querying the web for images related to the resource dbpedia:hornet]

Page 5: Multipedia: Enriching DBpedia with Multimedia information

Related Work

Approach | Technique | Contextual Information | Ontology
Taneva et al., 2010 | Search engine, training data, and visual similarity | Wikipedia infobox properties | YAGO instances
Deng et al., 2009 (ImageNet) | Search engine, visual similarity, Amazon Mechanical Turk to assess quality | WordNet synonyms and words from parent synset | WordNet noun synsets
Popescu et al., 2007 (RetrievOnto) | Search engine, content-based image retrieval | WordNet synonyms | WordNet synsets under Placental
Russell et al., 2008 (LabelMe) | Collaborative manual annotation of a set of images | - | WordNet synsets
Flickr Wrapper | Search engine, exact term match | Geographic info (latitude, longitude) | DBpedia resources

Page 6: Multipedia: Enriching DBpedia with Multimedia information

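Enriching DBpedia with Multimedia

[Pipeline diagram, shown for the input dbpr:Hornet]

• Get Context: uses the Wikipedia-based Context Index to obtain related terms.
• Retrieve Images: sends one query per context term and dbpr name to image search engines; returns rankings of images (one per query) and a list of images annotated with tags.
• Aggregate: combines the per-query rankings into a ranking of images.
• Generate tag-based ranking: turns the list of tagged images into a ranking of images.
• Aggregate: combines both rankings into the final ranking of images.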

Page 7: Multipedia: Enriching DBpedia with Multimedia information

Enriching DBpedia with Multimedia: Get Context

• Input: dbpr:Hornet and its Wikipedia article
• Resource used: Wikipedia-based Context Index
• Output (context terms): family, wasps, insect

Page 8: Multipedia: Enriching DBpedia with Multimedia information

Enriching DBpedia with Multimedia: Retrieve Images

Input: dbpr:Hornet with context terms family, wasps, insect

Queries sent to the image search engines:
Q0 = Hornet
Q1 = Hornet and family
Q2 = Hornet and wasps
Q3 = Hornet and insect

Image rankings (one per query):
R0 = img0,1; img0,2; ...; img0,k
R1 = img1,1; img1,2; ...; img1,l
R2 = img2,1; img2,2; ...; img2,m
R3 = img3,1; img3,2; ...; img3,n
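The query construction on this slide can be sketched in a few lines. The function name `build_queries` and the literal " and " joiner are assumptions made for illustration; the slides do not say whether "and" is sent as a plain word or as a search-engine operator.

```python
from typing import List


def build_queries(resource_name: str, context_terms: List[str]) -> List[str]:
    """Return Q0 (the bare name) followed by one query per context term.

    Hypothetical helper: the " and " joiner mirrors the slide text and is an
    assumption about how the queries were actually phrased.
    """
    queries = [resource_name]  # Q0: the resource name alone
    for term in context_terms:
        queries.append(f"{resource_name} and {term}")  # Q1..Qn
    return queries


if __name__ == "__main__":
    for i, query in enumerate(build_queries("Hornet", ["family", "wasps", "insect"])):
        print(f"Q{i} = {query}")
```

Each query would then be submitted to an image search engine, yielding one ranking Ri per query Qi.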

Page 9: Multipedia: Enriching DBpedia with Multimedia information

Enriching DBpedia with Multimedia: Aggregate

Input rankings:
R0 = img0,1; img0,2; ...; img0,k
R1 = img1,1; img1,2; ...; img1,l
R2 = img2,1; img2,2; ...; img2,m
R3 = img3,1; img3,2; ...; img3,n

Aggregate → Rcontext-based = img1; img2; ...; imgp

Borda count
• Positional method, very easy to compute.
• Each query result Ri is a voter and the images imgj are candidates.
• For each candidate imgj in Ri, Si(imgj) = number of candidates ranked below imgj in Ri.
• The overall score of an image sums its scores over all query results:

$$S(img_j) = \sum_{i=0}^{|C|} S_i(img_j)$$

• Output: the images imgj ordered by their S(imgj) value.
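A minimal sketch of the Borda count described on this slide follows. Two details are assumptions, since the slides do not specify them: an image missing from a ranking receives 0 points from that voter, and ties are broken by order of first appearance. The name `borda_aggregate` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def borda_aggregate(rankings: List[List[str]]) -> List[str]:
    """Merge several rankings (best first) into one using Borda count.

    Assumptions of this sketch: images absent from a ranking score 0 there,
    and ties keep the order in which images were first seen.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:  # each ranking R_i acts as a voter
        size = len(ranking)
        for position, image in enumerate(ranking):
            # S_i(img) = number of candidates ranked below img in R_i
            scores[image] += size - 1 - position
    # sorted() is stable, so equal scores keep first-seen order
    return sorted(scores, key=lambda image: -scores[image])


if __name__ == "__main__":
    example = [["a", "b", "c"], ["b", "a"], ["c", "b", "d"]]
    print(borda_aggregate(example))
```

In the example, image b collects 1 + 1 + 1 = 3 points, a and c collect 2 each, and d collects 0, so b is ranked first.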

Page 10: Multipedia: Enriching DBpedia with Multimedia information

Enriching DBpedia with Multimedia: Generate tag-based ranking

List of images: L = R0 ∪ R1 ∪ R2 ∪ R3

Generate tag-based ranking → Rtag-based = img1; img2; ...; imgq

1) Measure the relatedness between a DBpedia resource and an image as the overlap of terms between the context of the former and the tags of the latter.
2) Represent the DBpedia resource and the images in a Vector Space Model, with TF as the weighting scheme and the cosine function to measure similarity.
3) Generate a ranking of images according to the similarity value.

Aggregate: Rtag-based = img1; img2; ...; imgq and Rcontext-based = img1; img2; ...; imgp → Rfinal = img1; img2; ...; imgl
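The three steps of the tag-based ranking can be illustrated with raw term frequencies and cosine similarity. This is a sketch under stated assumptions: terms are compared after lowercasing only (no stemming or stop-word removal), and the names `cosine` and `tag_based_ranking` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_u * norm_v)


def tag_based_ranking(context_terms: List[str],
                      image_tags: Dict[str, List[str]]) -> List[str]:
    """Rank images by similarity between their tags and the resource context.

    Assumption of this sketch: TF vectors are built from lowercased terms.
    """
    context_vector = Counter(term.lower() for term in context_terms)
    scored = [
        (cosine(context_vector, Counter(tag.lower() for tag in tags)), image)
        for image, tags in image_tags.items()
    ]
    scored.sort(key=lambda pair: -pair[0])  # most similar first, stable on ties
    return [image for _, image in scored]


if __name__ == "__main__":
    tags = {
        "img_a": ["insect", "wasps", "nest"],
        "img_b": ["car", "hornet", "racing"],
        "img_c": ["insect", "family", "wasps"],
    }
    print(tag_based_ranking(["family", "wasps", "insect"], tags))
```

Here img_c shares all three context terms, img_a shares two, and img_b shares none, so the resulting order is img_c, img_a, img_b.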

Page 11: Multipedia: Enriching DBpedia with Multimedia information

Experiments

• How many context words produce the best results?

Apple context: «juice, fruit, apples, capital, michigan, orange»

Page 12: Multipedia: Enriching DBpedia with Multimedia information

Experiments

• Ambiguity

• Search engines work well for:
- unambiguous names
- ambiguous names referring to a dominant sense, e.g., dbpedia:Stonehenge

• However, they fail for ambiguous names:
- lacking a dominant sense, e.g., dbpedia:Apple
- when they do not refer to the dominant sense, e.g., dbpedia:Blackberry

Page 13: Multipedia: Enriching DBpedia with Multimedia information

Experiments

• Dominance:

• Dataset:
- 10 classes, with 15 DBpedia resources (dbpr) randomly selected per class.
- Each dbpr must 1) be popular and 2) have a dominance under 0.7.
- With this limit we found dbpr for Mammals, Birds and Insects.
- Increasing the dominance limit to 0.9, we found dbpr for the rest of the classes.

Page 14: Multipedia: Enriching DBpedia with Multimedia information

Experiments

• 15 people evaluated the results of three approaches.
• Each image was rated by 3 evaluators.
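The slides do not say how the three ratings per image were combined or which rating scale was used. As one possible reading, the sketch below assumes binary relevant/not-relevant votes merged by majority vote, and computes precision as the share of images judged relevant. The function names are hypothetical.

```python
from typing import List


def majority_relevant(votes: List[bool]) -> bool:
    """True when more than half of the evaluators marked the image relevant."""
    return sum(votes) * 2 > len(votes)


def precision(ratings: List[List[bool]]) -> float:
    """Share of images judged relevant by majority vote.

    Assumption of this sketch: binary votes and majority aggregation; the
    presentation does not state the actual procedure.
    """
    if not ratings:
        return 0.0
    relevant = sum(1 for votes in ratings if majority_relevant(votes))
    return relevant / len(ratings)


if __name__ == "__main__":
    example = [
        [True, True, False],    # relevant (2 of 3)
        [False, False, True],   # not relevant
        [True, True, True],     # relevant
        [False, False, False],  # not relevant
    ]
    print(precision(example))
```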

Page 15: Multipedia: Enriching DBpedia with Multimedia information

Experiments

Page 16: Multipedia: Enriching DBpedia with Multimedia information

Conclusions

• Multipedia is an approach to automatically populate an ontology with images related to existing instances.

• We focused on the particularly challenging problem of ambiguity in instance names.

• Human-driven evaluation of the approach involved 15 users and a total of 2250 image ratings, covering DBpedia resources from several classes.

• A variation of Multipedia improves average precision by 9.4% over a baseline of keyword queries to commercial image search engines.

• We have validated that, in contrast to the baseline, our approach achieves the highest precision with ambiguous names lacking a dominant sense.