WebdamExchange and WebdamLog: some models for web data management
Alban Galland, INRIA Saclay & ENS Cachan
Grenoble, 10/12/2010

Organization

• Introduction
• Representing all Web information as logical sentences
• Representing all Web data management as logical rules
• Some clues about implementation
• Conclusion

Introduction

Context of the work presented here

• ERC Grant Webdam on Web Data Management of Serge Abiteboul with two INRIA teams, Leo-Iasi (ex Gemo, INRIA Saclay) and Dahu (LSV, ENS Cachan)

• Joint work with many people: Émilien Antoine, Serge Abiteboul, Meghyn Bienvenu, David Gross-Amblard, Amélie Marian, Bruno Marnette, Neoklis Polyzotis, Philippe Rigaux, Marie-Christine Rousset…

Context: Web data management

• Scale: lots of users, servers, large volume of data…
• Distribution heterogeneity: Cloud (social networks), P2P (DHT, gossiping)…
• Security heterogeneity: login, https, crypto, hidden URL…
• Terminology heterogeneity: annotation, semantic Web, ontologies…
• Incomplete information: inconsistencies, belief, trust…
• The heterogeneity keeps increasing with new systems and new applications arriving

• Consequence 1: data integration and management are difficult
• Consequence 2: users cannot keep control over their own data

Thesis: Web data = distributed knowledge

• Work plan
  1. Represent all Web information as logical sentences
  2. Represent all Web data management as logical rules
  3. Develop a system to validate these ideas

• Motivation for the approach
  • Facilitate the design/implementation of complex systems
  • Facilitate the control/surveillance of complex systems
  • Use reasoning to optimize query evaluation
  • Use reasoning for semantics/ontologies
  • Use reasoning to manage access control and protect data
  • Use reasoning to analyze properties of systems

Motivating example

• Alice: get me the pictures of my friends where I am with Bob?
• What is going on:
  • Find the friends of Alice (Alice’s iPhone may remember them)
  • For each answer, say Sue, find where Sue keeps her pictures (she may keep her pictures on Picasa)
  • Find the means to access Sue’s pictures (Alice may ask a common friend for the private URL)
  • Find the photos with Bob and Alice (e.g. by querying the metadata)

Motivating example

• Alice: get me the pictures of my friends where I am with Bob?
• Issues: heterogeneity of friends
  • Heterogeneity of hosting: some keep their pictures on trusted servers such as Picasa, some put them on an untrusted DHT, some have them on their smartphones…
  • Heterogeneity of access control: some are public, some use login/password, some use a private URL, some use cryptography…
  • Heterogeneity of data description: they may use different models of metadata (taxonomies, ontologies…)

Representing all Web information as logical sentences

The information belongs to someone

• Each piece of information belongs to a principal
• A principal has an identity (URI) which can be authenticated
• Two kinds of principal: peer and virtual principal
  • A peer: alice-laptop, alice-iPhone, picasa, facebook, dht-peer-124, …
    • Storage and processing capabilities
    • A peer typically has a URL and can be sent query/update requests
  • A virtual principal: alice, alice-friends, roc14
    • A virtual principal relies on peers for storage and processing

The kind of information we are talking about

• Data: pictures, movies, music, emails, ebooks, reports
• Localization: bookmarks, knowledge such as “Alice has an account on Facebook” or “Sue puts her pictures on Picasa”
• Access: login/password, access rights on servers
• Annotations/ontologies: semantic tags in Picasa, RDFS, OWL
• Services: search engines, yellow pages, dictionaries…
• Incomplete information: beliefs, probabilistic information…
• And more…

Logical statements to represent information

• Data:
  • Document: picture34@alice-iPhone(picture34.jpg,09/12/2009,…)
  • Collection: pictures@alice(picture34@alice-iPhone)
• Localization: where@alice(picture37, picasa/alice)
• Access right: isOwner@picasa/alice(alice)
• Access secret: ownSecret@picasa/alice(“alice”, “HG-FT23”)
• Ontologies: [email protected](“alice”, human-being)
• Services: [email protected]($Person, $City, $Y)
• Belief: picture34@alice-iPhone(picture34.jpg,09/12/2009,…,75%)
• Etc.
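
The statements on this slide all share the shape relation@principal(arguments). As a minimal sketch of that shape, the Python below stores such a statement in a small record and parses it from text. The `Fact` class, the `parse_fact` helper and the textual syntax they accept are assumptions made for illustration; they are not part of WebdamExchange, and the naive comma split does not handle nested or quoted arguments.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fact:
    """One statement of the form relation@principal(arg1, ..., argN)."""
    relation: str
    principal: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.relation}@{self.principal}({', '.join(self.args)})"


# Hypothetical textual syntax: name@principal(comma-separated arguments).
_FACT_RE = re.compile(r"^\s*([^@\s]+)@([^(\s]+)\((.*)\)\s*$")


def parse_fact(text: str) -> Fact:
    """Parse 'rel@principal(a, b)' into a Fact; arguments stay plain strings."""
    match = _FACT_RE.match(text)
    if match is None:
        raise ValueError(f"not a fact: {text!r}")
    relation, principal, raw_args = match.groups()
    # Naive split: good enough for flat arguments without embedded commas.
    args = tuple(a.strip() for a in raw_args.split(",")) if raw_args.strip() else ()
    return Fact(relation, principal, args)


# Two of the statements from the slide.
where = parse_fact("where@alice(picture37, picasa/alice)")
owner = parse_fact("isOwner@picasa/alice(alice)")
```

Keeping the principal as a separate field makes it easy to group facts by the principal they belong to, which is the property the following slides build on.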

WebdamExchange focus: authenticated knowledge

• Base statement:
  • someone states picture37@alice (….)
  • It is annotated with a proof that “someone” can write data of alice
  • In the cryptographic setting, it is a signature of the whole statement using the write secret key of alice
• Keeping track of provenance:
  • alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009
  • alice-laptop is the performer (the peer that updated the data of alice)
  • bob is the requester (the peer or the user who requested the update)
• The content is possibly encrypted:
  • alice-laptop states picture37@alice (….) protected for reader@alice requester bob at 12:30, 10/08/2009
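
To make the idea of a statement annotated with a proof concrete, here is a rough Python sketch. It is not the WebdamExchange scheme: the slide speaks of a signature with the write secret key of alice, which calls for an asymmetric signature, whereas this sketch substitutes an HMAC-SHA256 tag over a shared key because that is what the Python standard library offers. The `BaseStatement` record, the `state` and `verify` functions and the demo key are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseStatement:
    performer: str   # peer that performed the update, e.g. "alice-laptop"
    requester: str   # peer or user that requested it, e.g. "bob"
    fact: str        # the stated fact, e.g. "picture37@alice(...)"
    timestamp: str   # local time of the performer
    proof: str       # hex tag computed over the whole statement


def _payload(performer: str, requester: str, fact: str, timestamp: str) -> bytes:
    # Length-prefix every field so that different field splits cannot collide.
    parts = [performer, requester, fact, timestamp]
    return b"".join(len(p.encode()).to_bytes(4, "big") + p.encode() for p in parts)


def state(write_key: bytes, performer: str, requester: str,
          fact: str, timestamp: str) -> BaseStatement:
    """Build a statement and attach a tag covering all of its fields."""
    tag = hmac.new(write_key, _payload(performer, requester, fact, timestamp),
                   hashlib.sha256).hexdigest()
    return BaseStatement(performer, requester, fact, timestamp, tag)


def verify(write_key: bytes, s: BaseStatement) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(write_key,
                        _payload(s.performer, s.requester, s.fact, s.timestamp),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, s.proof)


alice_write_key = b"demo-write-key-of-alice"  # made-up key, illustration only
stmt = state(alice_write_key, "alice-laptop", "bob",
             "picture37@alice(...)", "12:30, 10/08/2009")
# Changing any field (here the requester) invalidates the proof.
tampered = BaseStatement(stmt.performer, "mallory", stmt.fact,
                         stmt.timestamp, stmt.proof)
```

The point carried over from the slide is only that the proof covers the whole statement, including performer, requester and time, so none of them can be altered without detection.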

WebdamExchange focus: authenticated knowledge

• Communication: external knowledge is knowledge about other principals:
  • alice-laptop says (alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009
  • alice-laptop is the performer of the communication
  • sue-iphone is the receiver of the communication
  • External knowledge is authenticated by the performer and is stored by the receiver.
• External knowledge keeps a trusted trace of provenance, and communications are piled up:
  • sue-iphone says (alice-laptop says (alice-laptop states picture37@alice (….) requester bob at 12:30, 10/08/2009) to sue-iphone at 13:15, 15/10/2009) to bob-iphone at 13:10, 15/10/2009
• The time is the time of the performer; there is no global clock
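
The nesting of "says" around "states" can be pictured as a recursive data structure. The sketch below is one possible Python encoding, with hypothetical class and function names (`States`, `Says`, `communication_trace`, `hops_are_chained`); it ignores authentication entirely and only shows how the provenance trace can be read back from the piled-up communications.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class States:
    performer: str
    fact: str
    requester: str
    time: str  # local clock of the performer


@dataclass(frozen=True)
class Says:
    performer: str
    content: Union["Says", States]
    receiver: str
    time: str  # local clock of the performer of this communication


def origin(k: Union[Says, States]) -> States:
    """Strip every 'says' layer and return the innermost base statement."""
    while isinstance(k, Says):
        k = k.content
    return k


def communication_trace(k: Union[Says, States]) -> List[Tuple[str, str]]:
    """Return the (sender, receiver) hops, oldest first."""
    hops = []
    while isinstance(k, Says):
        hops.append((k.performer, k.receiver))
        k = k.content
    hops.reverse()
    return hops


def hops_are_chained(k: Union[Says, States]) -> bool:
    """Check that each hop is sent by the receiver of the previous hop."""
    trace = communication_trace(k)
    return all(prev[1] == nxt[0] for prev, nxt in zip(trace, trace[1:]))


# The example from the slide, with the same local timestamps.
base = States("alice-laptop", "picture37@alice(...)", "bob", "12:30, 10/08/2009")
first = Says("alice-laptop", base, "sue-iphone", "13:15, 15/10/2009")
second = Says("sue-iphone", first, "bob-iphone", "13:10, 15/10/2009")
```

Timestamps are kept as opaque strings on purpose: since each one comes from the local clock of its performer, comparing them across peers would not be meaningful.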

The model covers a wide range of data

• The model does not prescribe any particular architecture for distribution
  • Gossiping, DHT, centralized server
  • Combination of these
  • Based on an abstract notion of localization
• The model does not prescribe how access control is enforced, e.g.:
  • Documents in Web servers with access protected by login/password
  • Documents protected by cryptographic keys in public sites
  • Based on an abstract notion of secret and hint

Summary of WebdamExchange

• All the information forms a trusted knowledge base
• Each peer manages some portion of the knowledge base
• Now, we have to use this distributed knowledge base … for the management of the distributed knowledge base!

Representing all Web data management as logical rules

From WebdamExchange to WebdamLog

• The logical part of the WebdamExchange statements can easily be translated into datalog facts.

• Most of the reasoning of the system can be done using the logical form and datalog-like rules

• It motivates WebdamLog, a rule-based language for web data management

Why datalog?

• Datalog: very popular in the 90’s, prehistory in Web time
  + Nicer/more compact syntax; easy to extend
  - Recursion not really essential
• Datalog extensions
  • Negation and aggregate functions: tons of work on that
  • Updates, time, trees, distribution: less work on these
• We use a datalog-like language influenced by
  • Active XML for distribution and intensional data
  • Hellerstein’s Dedalus for time and performance

WebdamLog

• Facts are of the form: m@p(a1,...,an) (sorted)
• Rules are of the form:
  • R@P(U) :- (not) R1@P1(U1), …, (not) Rn@Pn(Un)
  • R, Ri are message terms
  • P, Pi are peer terms
  • U, Ui are tuples of terms
  • Safety condition
• Intuition: if the body holds for some valuation v, the message vR@vP(vU) is sent to the peer vP
• Issue: what happens if the body of the rule mentions different peers?
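
The intuition on this slide (find the valuations v that satisfy the body, then emit vR@vP(vU)) can be sketched in a few lines of Python. This is a simplified illustration, not the WebdamLog evaluator: it handles positive rules only (no negation), assumes all facts are available in one place, and reads the safety condition simply as "every head variable occurs in the body". Atoms are encoded as hypothetical (relation, peer, arguments) tuples, with variables written as strings starting with `$`; the `hello` and `where` relations in the demo are made up.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Atom = Tuple[str, str, Tuple[str, ...]]  # (relation, peer, arguments)
Valuation = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("$")


def terms(atom: Atom) -> Tuple[str, ...]:
    """Flatten an atom: relation and peer are terms too, so they may be variables."""
    return (atom[0], atom[1]) + tuple(atom[2])


def unify(pattern: Atom, fact: Atom, v: Valuation) -> Optional[Valuation]:
    """Extend valuation v so that pattern matches fact, or return None."""
    p_terms, f_terms = terms(pattern), terms(fact)
    if len(p_terms) != len(f_terms):
        return None
    out = dict(v)
    for p, f in zip(p_terms, f_terms):
        if is_var(p):
            if out.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return out


def substitute(atom: Atom, v: Valuation) -> Atom:
    return (v.get(atom[0], atom[0]), v.get(atom[1], atom[1]),
            tuple(v.get(t, t) for t in atom[2]))


def fire(head: Atom, body: List[Atom], facts: Iterable[Atom]) -> List[Atom]:
    """Return the messages vR@vP(vU) for every valuation v satisfying the body."""
    bound = {t for atom in body for t in terms(atom) if is_var(t)}
    if any(is_var(t) and t not in bound for t in terms(head)):
        raise ValueError("unsafe rule: head variable missing from body")
    facts = list(facts)
    valuations: List[Valuation] = [{}]
    for atom in body:
        extended = []
        for v in valuations:
            for fact in facts:
                v2 = unify(atom, fact, v)
                if v2 is not None:
                    extended.append(v2)
        valuations = extended
    return sorted({substitute(head, v) for v in valuations})


facts = [
    ("friends", "alice-iphone", ("Sue",)),
    ("friends", "alice-iphone", ("Tom",)),
    ("where", "alice-iphone", ("Sue", "picasa")),
]
# hello@$P($X) :- friends@alice-iphone($X), where@alice-iphone($X, $P)
head = ("hello", "$P", ("$X",))
body = [("friends", "alice-iphone", ("$X",)),
        ("where", "alice-iphone", ("$X", "$P"))]
messages = fire(head, body, facts)
```

Because the peer position of the head is itself a term, the valuation decides where the message goes: here the only derived message is addressed to `picasa`.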

WebdamLog

System:
• A finite set of peers
• Each peer p in this set has a local program P(p) and some delegated program D(p), each consisting of a finite set of rules
• Each peer p in this set has a database I(p), consisting of a finite set of facts of the form m@p(u)

Semantics:
• In a state (P,D,I), choose some peer p at random
• Evaluate (P(p) ∪ D(p))(I(p))
• This defines the new database I’(p)
• This adds facts to the other peers and updates their rules, defining (D’(q),I’(q)) for each q
• The changes to each q are installed synchronously; we will see how to avoid this if desired
• Choose another peer and keep going (in a fair way)

[Figure: four peers, Peer1 to Peer4]

Features of WebdamLog illustrated

• Alice: get me the pictures of my friends where I am with Bob?
  • result@alice-iphone($Photo,$X) :-
      friends@alice-iphone($X),
      findPhotos@alice-iphone($X,$R,$P),
      $R@$P($Photo,$Meta),
      contains@$P($Meta, “Alice”),
      contains@$P($Meta, “Bob”)
• Peers and messages as data: they are reified
  • friends@alice-iphone is extensional, in I(alice-iphone)
  • findPhotos@alice-iphone is intensional, in P(alice-iphone) ∪ D(alice-iphone)
  • $R@$P is bound to a relation of (possibly) another peer
  • contains@$P is a service of that peer

Features of WebdamLog illustrated

• Delegation of rules
• Alice: get me the pictures of my friends where I am with Bob?
  • result@alice-iphone($Photo,$X) :-
      friends@alice-iphone($X),
      findPhotos@alice-iphone($X,$R,$P),
      $R@$P($Photo,$Meta),
      contains@$P($Meta, “Alice”),
      contains@$P($Meta, “Bob”)
• friends@alice-iphone(Sue);
• findPhotos@alice-iphone(Sue,photos,picasa/sue) :-
• Then alice-iphone installs the following rule at picasa/sue:
  • result@alice-iphone($Photo,Sue) :-
      photos@picasa/sue($Photo,$Meta),
      contains@picasa/sue($Meta, “Alice”),
      contains@picasa/sue($Meta, “Bob”)
• picasa/sue will send the photos as extensional facts to alice-iphone. When Alice terminates her query, all the delegations are cancelled.
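
The delegation step on this slide can be sketched as follows: evaluate the leading body atoms that live at the local peer, then, for each valuation found, instantiate what remains of the rule and hand it to the peer named by its first atom. The Python below is an assumption-laden illustration, not the WebdamLog implementation: the `delegate` function and the tuple encoding of atoms are hypothetical, `findPhotos` is treated as a stored fact (as the slide does for Sue), negation is ignored, and cancelling delegations is not modelled.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, str, Tuple[str, ...]]  # (relation, peer, arguments)
Rule = Tuple[Atom, List[Atom]]           # (head, body)
Valuation = Dict[str, str]


def unify(pattern: Atom, fact: Atom, v: Valuation) -> Optional[Valuation]:
    """Extend v so that pattern matches fact; terms starting with '$' are variables."""
    p_terms = (pattern[0], pattern[1]) + tuple(pattern[2])
    f_terms = (fact[0], fact[1]) + tuple(fact[2])
    if len(p_terms) != len(f_terms):
        return None
    out = dict(v)
    for p, f in zip(p_terms, f_terms):
        if p.startswith("$"):
            if out.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return out


def substitute(atom: Atom, v: Valuation) -> Atom:
    return (v.get(atom[0], atom[0]), v.get(atom[1], atom[1]),
            tuple(v.get(t, t) for t in atom[2]))


def delegate(rule: Rule, local_peer: str,
             local_facts: List[Atom]) -> List[Tuple[str, Rule]]:
    """Return (target_peer, residual_rule) pairs to install at other peers."""
    head, body = rule
    valuations: List[Valuation] = [{}]
    split = 0
    for atom in body:
        if atom[1] != local_peer:
            break  # first atom that is not explicitly local: stop evaluating here
        extended = []
        for v in valuations:
            for fact in local_facts:
                v2 = unify(atom, fact, v)
                if v2 is not None:
                    extended.append(v2)
        valuations = extended
        split += 1
    rest = body[split:]
    if not rest:
        return []  # fully local rule: nothing to delegate
    delegations = []
    for v in valuations:
        residual_body = [substitute(a, v) for a in rest]
        target = residual_body[0][1]
        if target.startswith("$"):
            continue  # target peer still unknown; this sketch skips such cases
        delegations.append((target, (substitute(head, v), residual_body)))
    return delegations


rule: Rule = (
    ("result", "alice-iphone", ("$Photo", "$X")),
    [("friends", "alice-iphone", ("$X",)),
     ("findPhotos", "alice-iphone", ("$X", "$R", "$P")),
     ("$R", "$P", ("$Photo", "$Meta")),
     ("contains", "$P", ("$Meta", "Alice")),
     ("contains", "$P", ("$Meta", "Bob"))],
)
local_facts = [
    ("friends", "alice-iphone", ("Sue",)),
    ("findPhotos", "alice-iphone", ("Sue", "photos", "picasa/sue")),
]
installed = delegate(rule, "alice-iphone", local_facts)
```

With the two facts about Sue, `installed` holds a single delegation addressed to `picasa/sue`, whose residual rule has the same shape as the one shown on the slide.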

Managing rules at other peers

• This is complex
  • Regarding implementation, one manages instantiations of rules, i.e., rules and valuations
  • The content of valuations may be constantly changing
  • There could be some negations in the rules
• This is a security risk
  • Someone else is installing data (facts) or code (rules) in a peer
  • Need to control that carefully

Does it mean something?

• Some not-so-trivial theorems about the positive case or the stratified negation case, ensuring
  • Church-Rosser properties (convergence)
  • Natural simulation by centralized systems
• Some even-less-trivial theorems comparing the expressivity of different variations of WebdamLog: without exchanging rules, without exchanging intensional data, with timestamps…

More refined asynchronicity

• To model messages from peer p to peer q, we may use a “peer” netpq that captures the network
  • Replace a call m@q(u) at p by m@netpq(u)
  • netpq should just relay messages: $M@q($U) :- $M@netpq($U)
  • Problem: all messages from p to q in the net arrive at the same time
• Better with time
  • m@netpq(u,t) where t is the time of the send at p
  • $M@q($U) :- $M@netpq($U,$T), min($T, $M@netpq($U,$T)), using the min aggregate function
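
One way to read the rule with the min aggregate is that, at each step, netpq hands over only the buffered messages carrying the smallest send time. The Python sketch below illustrates that reading with a priority queue. The `NetPQ` class and its methods are hypothetical names, send times are assumed to be integers taken from the clock of p, and nothing here is claimed to match the actual WebdamLog semantics beyond this ordering idea.

```python
import heapq
from typing import List, Sequence, Tuple

Message = Tuple[str, Tuple[str, ...]]  # (relation m, arguments u)


class NetPQ:
    """Hypothetical relay 'peer' netpq buffering messages sent from p to q."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Message]] = []
        self._seq = 0  # tie-breaker: keeps insertion order for equal send times

    def send(self, relation: str, args: Sequence[str], t: int) -> None:
        """Record m@netpq(u, t), where t is the send time at p."""
        heapq.heappush(self._heap, (t, self._seq, (relation, tuple(args))))
        self._seq += 1

    def deliver_step(self) -> List[Message]:
        """Deliver to q every buffered message carrying the minimal send time."""
        if not self._heap:
            return []
        t_min = self._heap[0][0]
        delivered = []
        while self._heap and self._heap[0][0] == t_min:
            delivered.append(heapq.heappop(self._heap)[2])
        return delivered


net = NetPQ()
net.send("photo", ("p2",), t=5)
net.send("photo", ("p1",), t=3)
net.send("tag", ("p1", "Bob"), t=3)
first_batch = net.deliver_step()   # messages sent at time 3
second_batch = net.deliver_step()  # message sent at time 5
third_batch = net.deliver_step()   # nothing left
```

Messages sent at different times now reach q in separate steps, which is what the untimed relay rule could not express.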

Summary of WebdamLog

• Peers run their datalog programs asynchronously
• They exchange facts and delegations of rules

Some clues about implementation

Implementation

• We are implementing two kinds of peers
  • WEP (Webdam Exchange Peer): all functionalities
  • IWEP (iPad Webdam Exchange Peer): limited functionalities; relies on proxies
• We are implementing a social network on top of the system

Conclusion

Some cool results and still a lot of work

• The WebdamExchange and WebdamLog models capture some nice problems of web data management: distribution, access control…
• Their good semantics allow us to prove theorems!
• We are implementing the corresponding system!

• Many issues are still open
  • Concurrency, optimization, implementation
  • Defining and verifying protocols (access control is not violated, one gets all the information one has access to)
  • Looking for a killer application
