Center for E-Business Technology, Seoul National University

Seoul, Korea

Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor

Metaweb Technologies, Inc.

San Francisco

International Conference on Management of Data (2008)

2008. 11. 12.

Summarized & presented by Babar Tareen, IDS Lab., Seoul National University

Copyright 2008 by CEBT

Motivation – Wikipedia

Free multilingual encyclopedia

Supports 264 languages

854 Volumes of English articles

Motivation – English Wikipedia Growth

Introduction

A public repository of the world's knowledge

Inspired by The Semantic Web and Wikipedia

Supports highly diverse and heterogeneous data

Tries to merge the scalability of structured databases with the diversity of collaborative wikis into a practical, scalable database of structured general human knowledge

The information contained in Freebase is open to anyone

However, the Freebase backend database is not open

Data Sources

User Contribution

Metaweb Bots

Incorporates facts from many large, publicly available information sources

Data Model

Freebase is a graph database

Set of nodes and a set of links that establish relationships between the nodes

Key Concepts

Domains

– Bases: collections of topics created by users

– Commons: similar to bases but more general

– Film, Religion, Computers

Types

– Analogous to classes

– Film Actor, Film Festival, Film Distribution, Film Rating, Film Format

Properties

– Specific information elements within a type

– Film Performances, Film Dubbing Performances, IMDb Entry

Topics

– Analogous to objects

– Instances of a type

– Topics can be linked to other domains or other topics
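The concepts above can be sketched as a small in-memory graph of nodes and labeled links. This is only an illustration, not how Freebase stores data: the `Graph` class, the link labels (`in_domain`, `property_of`, `has_type`) and the node names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A set of nodes plus a set of labeled links between them."""
    nodes: set = field(default_factory=set)
    links: set = field(default_factory=set)  # (source, label, target) triples

    def link(self, source, label, target):
        # Adding a link also registers both endpoints as nodes.
        self.nodes.update((source, target))
        self.links.add((source, label, target))

    def targets(self, source, label):
        # All nodes reachable from `source` over links with this label.
        return sorted(t for s, lab, t in self.links if s == source and lab == label)


g = Graph()
# A type belongs to a domain; a property belongs to a type (names are illustrative).
g.link("film_actor", "in_domain", "film")
g.link("film_performances", "property_of", "film_actor")
# A topic is an instance of a type, and may carry more than one type.
g.link("example_person", "has_type", "film_actor")
g.link("example_person", "has_type", "author")

print(g.targets("example_person", "has_type"))
```

Because types are just links, giving a topic a second type is one more `link` call rather than a schema change.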

Data Model (2)

Key Components

A scalable Tuple Store

An HTTP/JSON-Based API

MQL for read / write operations

A Lightweight, Collaborative Typing System

Loose collection of structuring mechanisms and conventions

A Large, Diverse Data Set

100 million assertions

4000 types

A Philosophy of “Complete Normalization”

Only one GUID for a real world object
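The tuple store and the "one GUID per real-world object" rule can be illustrated with a toy store. This is a sketch under assumptions, not Metaweb's implementation: the `TupleStore` class, its method names, the three-field tuple layout and the use of `uuid.uuid4` are all hypothetical, and resolving objects by name is a simplification, since real names are ambiguous.

```python
import uuid


class TupleStore:
    """Toy store of (subject_guid, property, value) tuples."""

    def __init__(self):
        self.tuples = []
        self.guid_by_name = {}  # one GUID per real-world object

    def resolve(self, name):
        # Reuse the existing GUID so the same object is never stored twice.
        if name not in self.guid_by_name:
            self.guid_by_name[name] = uuid.uuid4().hex
        return self.guid_by_name[name]

    def assert_fact(self, subject, prop, value):
        self.tuples.append((self.resolve(subject), prop, value))

    def facts_about(self, subject):
        guid = self.resolve(subject)
        return [(p, v) for g, p, v in self.tuples if g == guid]


store = TupleStore()
store.assert_fact("Spider-Man", "type", "/fictional_universe/fictional_character")
store.assert_fact("Spider-Man", "character_created_by", "Stan Lee")

print(len(store.guid_by_name))
print(store.facts_about("Spider-Man"))
```

Both assertions attach to the same GUID, so everything known about the object is found through a single identifier.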

Data Entry

Schema Creation

Data Evaluation

Metaweb Query Language

Who created the comic character Spider-Man?

QUERY
[
  {
    "character_created_by": null,
    "name": "Spider-Man",
    "type": "/fictional_universe/fictional_character"
  }
]

RESPONSE
{
  "code": "/api/status/ok",
  "q1": {
    "code": "/api/status/error",
    "messages": [
      {
        "code": "/api/status/error/mql/result",
        "info": {
          "count": 2,
          "result": ["Steve Ditko", "Stan Lee"]
        },
        "message": "Unique query may have at most one result. Got 2",
        "path": "character_created_by",
        "query": [
          {
            "character_created_by": null,
            "error_inside": "character_created_by",
            "name": "Spider-Man",
            "type": "/fictional_universe/fictional_character"
          }
        ]
      }
    ]
  },
  "status": "200 OK",
  "transaction_id": "cache;cache01.p01.sjc1:8101;2008-11-11T05:54:45Z;0021"
}
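The response above is an error: `"character_created_by": null` asks for a single value, but Spider-Man has two creators. In MQL, writing `[]` in place of `null` asks for a list of values. The sketch below builds both forms of the query as Python data and pulls the two names out of the error payload. It makes no network request; the helper name `values_from_uniqueness_error` is hypothetical, and the response is a trimmed copy of the one on the slide.

```python
import json

# Query from the slide: null asks for exactly one value.
single = [{
    "character_created_by": None,
    "name": "Spider-Man",
    "type": "/fictional_universe/fictional_character",
}]

# Same query with [] in place of null, asking for a list of values.
multi = [dict(single[0], character_created_by=[])]

# Trimmed copy of the error response shown on the slide.
response = json.loads("""
{"code": "/api/status/ok",
 "q1": {"code": "/api/status/error",
        "messages": [{"code": "/api/status/error/mql/result",
                      "info": {"count": 2, "result": ["Steve Ditko", "Stan Lee"]},
                      "message": "Unique query may have at most one result. Got 2",
                      "path": "character_created_by"}]}}
""")


def values_from_uniqueness_error(reply):
    """Return the values carried in a 'unique query' error, or [] if none."""
    found = []
    for msg in reply.get("messages", []):
        if msg.get("code") == "/api/status/error/mql/result":
            found.extend(msg["info"]["result"])
    return found


print(json.dumps(multi))
print(values_from_uniqueness_error(response["q1"]))
```

Note that the outer `"code"` reports success for the request as a whole, while the per-query `"q1"` entry carries the error, so a client has to check both levels.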

MQL Queries

Characters created by Stan Lee

Foreign donations to 2008 US Political Candidates

Nikon Cameras in order of Resolution

Tropical Storms in the 90's

Mountains of the Himalayas

African American authors and their books

Web Browsers that run on the Mac

US cities named Canton
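Questions like these are written in MQL as query-by-example: known fields are filled in and wanted fields are left empty. The toy matcher below imitates that idea over in-memory records. It is not the Freebase query engine; the `match` function is hypothetical, and the records are made up apart from the Spider-Man facts taken from the earlier slide.

```python
def match(query, records):
    """Toy query-by-example: filled fields filter, None/[] fields are returned."""
    results = []
    for rec in records:
        ok = True
        for key, want in query.items():
            have = rec.get(key)
            if want is None or want == []:
                continue  # a slot to fill in, not a constraint
            # A constraint matches a scalar field or any member of a list field.
            if have != want and not (isinstance(have, list) and want in have):
                ok = False
                break
        if ok:
            results.append({key: rec.get(key) for key in query})
    return results


# Toy data; only the Spider-Man entry reflects the slide's example.
records = [
    {"name": "Spider-Man", "type": "/fictional_universe/fictional_character",
     "character_created_by": ["Steve Ditko", "Stan Lee"]},
    {"name": "Placeholder Hero", "type": "/fictional_universe/fictional_character",
     "character_created_by": ["Someone Else"]},
]

# "Characters created by Stan Lee"
query = {"type": "/fictional_universe/fictional_character",
         "character_created_by": "Stan Lee", "name": None}
print([r["name"] for r in match(query, records)])
```

The same template shape covers the other questions on the slide by changing the type and the filled-in fields.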

Applications

Parallax: Freebase Browser
http://mqlx.com/~david/parallax/index.html

Powerset: Semantic Search Engine
http://www.powerset.com/

ArchiPortal
http://dev.mqlx.com/~zak/arch/

Dipity Timelines
http://www.dipity.com/

Discussion

Simple architecture

Topics can be associated to multiple types

Analogous to having a database of knowledge

But now we have two knowledge bases to maintain:

Wikipedia

Freebase

References

Freebase
http://www.freebase.com

The Semantic Edge (Web 2.0 Summit 2007)
http://www.web2summit.com/cs/web2007/view/e_sess/15043

MQL Query Editor
http://www.freebase.com/tools/queryeditor/

Freebase Blog
http://blog.freebase.com/

Freebase Sample Queries
http://www.freebase.com/view/freebase/freebase_query
