Rule Mining and Applications in Social Data

Luis Galárraga, Télécom ParisTech. Presented at: International Workshop on Social Media and Culture 2014, Daejeon, Korea, April 4th, 2014.

Upload: luis-galarraga

Post on 10-May-2015


DESCRIPTION

An overview of potential applications of rule mining on social graphs, presented at the International Workshop on Social Media and Culture 2014.

TRANSCRIPT

Page 1: Rule Mining and Applications in Social Data

Rule Mining and Applications in Social Data

Luis Galárraga
Télécom ParisTech

Presented at: International Workshop on Social Media and Culture 2014

Daejeon, Korea
April 4th, 2014

Page 2: Rule Mining and Applications in Social Data

Natural Language vs Knowledge Bases (KBs)

[Figure: natural language text next to a knowledge-base graph about Shakira. Graph labels: is a Singer, performs "Hips don't lie", born on Feb 2, 1977.]

Page 3: Rule Mining and Applications in Social Data

Natural Language vs Knowledge Bases (KBs)

● Natural Language: suitable for humans but difficult for computers
● Knowledge Bases: understandable for computer programs

Page 4: Rule Mining and Applications in Social Data

Some popular KBs


Page 5: Rule Mining and Applications in Social Data

KBs in action

Page 6: Rule Mining and Applications in Social Data

Some popular KBs


Page 7: Rule Mining and Applications in Social Data

Social graphs are KBs

● Sources may be different but they both share:
  ● Natural graph-like structure

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges.]

Page 8: Rule Mining and Applications in Social Data

Social graphs are KBs

● Sources may be different but they both share:
  ● Natural graph-like structure
  ● Incompleteness

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges.]

Page 9: Rule Mining and Applications in Social Data

Social graphs are KBs

● Sources may be different but they both share:
  ● Natural graph-like structure
  ● Incompleteness

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges. A speech bubble reads: Maybe nobody asked me? :(]

Page 10: Rule Mining and Applications in Social Data

Social graphs are KBs

● Sources may be different but they both share:
  ● Natural graph-like structure
  ● Incompleteness
  ● Opportunities for data description and prediction.

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges.]

Page 11: Rule Mining and Applications in Social Data

Social graphs are KBs

● Sources may be different but they both share:
  ● Natural graph-like structure
  ● Incompleteness
  ● Opportunities for data description and prediction.
    – 90% of computer scientists like political party X
    – If you like Shakira you are likely to buy her latest song

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges.]

Page 12: Rule Mining and Applications in Social Data

Rule Mining and KBs

● Data Mining is about finding interesting and non-obvious correlations in data.

● Correlations are rules that hold often.
  ● You probably live in the same city as your spouse.
  ● If you like an artist, you like her songs.

● They can be formulated as logical rules:

isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city)

likes(x, artist) ^ performs(artist, song) => likes(x, song)
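As a minimal sketch of how such a logical rule can be applied to a KB, the snippet below stores facts as (relation, subject, object) triples and derives the facts that the second rule predicts. The triple encoding, the "?" prefix for variables, the helper names `bindings` and `predict`, and the toy facts are all assumptions made for illustration; none of them come from the slides.

```python
# Toy KB as (relation, subject, object) triples; the facts are illustrative only.
KB = {
    ("likes", "Luis", "Shakira"),
    ("likes", "Shamira", "Shakira"),
    ("likes", "Shamira", "Hips don't lie"),
    ("performs", "Shakira", "Hips don't lie"),
}


def is_var(term):
    """Variables are written with a leading '?' (an assumed convention)."""
    return term.startswith("?")


def bindings(atoms, kb, env=None):
    """Yield every variable assignment that satisfies all atoms in kb."""
    env = env or {}
    if not atoms:
        yield env
        return
    (rel, s, o), rest = atoms[0], atoms[1:]
    for (r, fs, fo) in kb:
        if r != rel:
            continue
        new_env = dict(env)
        ok = True
        for term, value in ((s, fs), (o, fo)):
            if is_var(term):
                # Bind the variable, or check it against its earlier binding.
                if new_env.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from bindings(rest, kb, new_env)


def predict(body, head, kb):
    """Return head facts implied by the rule that are not yet in kb."""
    rel, s, o = head
    derived = {(rel, env.get(s, s), env.get(o, o)) for env in bindings(body, kb)}
    return derived - kb


# likes(x, artist) ^ performs(artist, song) => likes(x, song)
body = [("likes", "?x", "?artist"), ("performs", "?artist", "?song")]
head = ("likes", "?x", "?song")
new_facts = predict(body, head, KB)
print(new_facts)
```

On this toy KB the only new fact is that Luis likes "Hips don't lie"; the same fact for Shamira is already known, so it is not reported as a prediction.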

Page 13: Rule Mining and Applications in Social Data

Applications for social data

● Recommendations

likes(x, artist) ^ performs(artist, song) => likes(x, song)

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges.]

Page 14: Rule Mining and Applications in Social Data

Applications for social data

● Recommendations

likes(x, artist) ^ performs(artist, song) => likes(x, song)

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges.]

Page 15: Rule Mining and Applications in Social Data

Applications for social data

● Recommendations

likes(x, artist) ^ performs(artist, song) => likes(x, song)

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and "Hips don't lie" with friendOf, likes, and performs edges.]

Page 16: Rule Mining and Applications in Social Data

Applications for social data

● Market basket analysis.
  ● People who buy laptops also buy laptop cases.

● Link and event prediction
  ● Two people who attended the same high school the same year might know each other.
  ● If you registered for this workshop, then you are coming to Daejeon (and need to book a flight and hotel).

● Dealing with incompleteness
  ● If you like German newspapers, fluency in German is perhaps missing in your profile.

Page 17: Rule Mining and Applications in Social Data

Challenges


Page 18: Rule Mining and Applications in Social Data

Challenges of Rule Mining from KBs

● Scalability
  ● State-of-the-art approaches for rule mining cannot handle the size of current KBs.
    – YAGO: 10M entities, 120M facts
    – DBpedia 3.8: 24.9M entities, 1.98B facts
    – Facebook Graph: 1.2B users

● Rule Mining requires exhaustive search of the data.

Page 19: Rule Mining and Applications in Social Data

Challenges of Rule Mining from KBs

● Scalability
  ● State-of-the-art approaches cannot handle the size of current KBs.
    – YAGO: 10M entities, 120M facts
    – DBpedia 3.8: 24.9M entities, 1.98B facts
    – Facebook Graph: 1.2B users

● Rule Mining requires exhaustive search of the data.

● Solution:
  ● Language bias.
  ● A set of pruning heuristics.
  ● Optimized storage implementation.

Page 20: Rule Mining and Applications in Social Data

AMIE: Association Rule Mining Under Incomplete Evidence

● AMIE is a system that learns Horn rules such as:

livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)

● Starting with all possible head relations r(x,y) and a minimum support threshold:
  – The system explores the search space by means of carefully designed mining operators.
  – Search space is restricted to closed Horn rules.
  – Monotonicity of support helps pruning non-promising paths.
  – It relies on an optimized in-memory database.
  – Confidence gain is used to prune the output.

L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In WWW, 2013.
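The role of the support threshold can be illustrated with a small sketch. It assumes that the support of a rule is the number of distinct head instantiations for which body and head both hold in the KB, and it uses invented facts and hypothetical helper names (`bindings`, `support`); it is not AMIE's implementation. The point is that adding a body atom can only keep or lower the support, so a rule below the threshold need not be refined further.

```python
# Toy KB as (relation, subject, object) triples; all facts are invented.
KB = {
    ("isMarriedTo", "Ana", "Ben"),
    ("isMarriedTo", "Cho", "Dev"),
    ("livesIn", "Ana", "Paris"),
    ("livesIn", "Ben", "Paris"),
    ("livesIn", "Cho", "Seoul"),
    ("livesIn", "Dev", "Busan"),
    ("livesIn", "Eve", "Quito"),
}


def bindings(atoms, kb, env=None):
    """Yield every assignment of '?'-prefixed variables satisfying all atoms."""
    env = env or {}
    if not atoms:
        yield env
        return
    (rel, s, o), rest = atoms[0], atoms[1:]
    for (r, fs, fo) in kb:
        if r != rel:
            continue
        new_env = dict(env)
        ok = True
        for term, value in ((s, fs), (o, fo)):
            if term.startswith("?"):
                if new_env.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from bindings(rest, kb, new_env)


def support(body, head, kb):
    """Count distinct head instantiations for which body and head both hold."""
    _, s, o = head
    return len({(env.get(s, s), env.get(o, o))
                for env in bindings([head] + body, kb)})


head = ("livesIn", "?y", "?city")
parent = [("isMarriedTo", "?x", "?y")]
child = parent + [("livesIn", "?x", "?city")]

s_head = support([], head, KB)
s_parent = support(parent, head, KB)
s_child = support(child, head, KB)
print(s_head, s_parent, s_child)

MIN_SUPPORT = 2  # assumed threshold for this toy example
keep_refining_child = s_child >= MIN_SUPPORT
```

Here the support shrinks from 5 to 2 to 1 as atoms are added, and with the assumed threshold of 2 the last rule would not be refined any further.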

Page 21: Rule Mining and Applications in Social Data

[Figure: the head atom livesIn(y, city), drawn as a livesIn edge between y and city.]

livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)

Page 22: Rule Mining and Applications in Social Data

[Figure: the head atom livesIn(y, city), drawn as a livesIn edge between y and city.]

Page 23: Rule Mining and Applications in Social Data

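[Figure: rule refinement over the variables x, y, and city.
Step 0: livesIn(y, city).
Step 1, Add dangling atom (OD): add an edge ?r to a fresh variable x; candidates for ?r include isMarriedTo, livesIn, ….]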

Page 24: Rule Mining and Applications in Social Data

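[Figure: rule refinement over the variables x, y, and city.
Step 0: livesIn(y, city).
Step 1, Add dangling atom (OD): add an edge ?r to a fresh variable x; candidates for ?r include isMarriedTo, livesIn, ….; the marriedTo edge is chosen.]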

Page 25: Rule Mining and Applications in Social Data

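[Figure: rule refinement over the variables x, y, and city.
Step 0: livesIn(y, city).
Step 1, Add dangling atom (OD): add an edge ?r to a fresh variable x; candidates for ?r include isMarriedTo, livesIn, ….; the marriedTo edge is chosen.
Step 2, Add closing atom (OC): add an edge ?r between variables already in the graph; candidates shown for ?r: livesIn, diedIn.]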

Page 26: Rule Mining and Applications in Social Data

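[Figure: rule refinement over the variables x, y, and city.
Step 0: livesIn(y, city).
Step 1, Add dangling atom (OD): add an edge ?r to a fresh variable x; candidates for ?r include isMarriedTo, livesIn, ….; the marriedTo edge is chosen.
Step 2, Add closing atom (OC): add an edge ?r between variables already in the graph; candidates shown for ?r: livesIn, diedIn; livesIn is chosen.]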

Page 27: Rule Mining and Applications in Social Data

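[Figure: rule refinement over the variables x, y, and city.
Step 0: livesIn(y, city).
Step 1, Add dangling atom (OD): add an edge ?r to a fresh variable x; candidates for ?r include isMarriedTo, livesIn, ….; the marriedTo edge is chosen.
Step 2, Add closing atom (OC): add an edge ?r between variables already in the graph; candidates shown for ?r: livesIn, diedIn; livesIn is chosen.]

livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)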
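The two refinement steps named on these slides, adding a dangling atom and adding a closing atom, can be sketched as functions that extend a rule by one atom. This is a simplified reading for illustration only: the rule encoding (a list of atoms with the head first), the relation vocabulary, and the function names are assumptions, every term is treated as a variable, and no support counting or pruning is performed.

```python
from itertools import permutations

# Assumed relation vocabulary; a real KB would supply its own.
RELATIONS = ["livesIn", "isMarriedTo", "diedIn"]


def variables(rule):
    """Variables of a rule in order of first appearance."""
    seen = []
    for _, s, o in rule:
        for v in (s, o):
            if v not in seen:
                seen.append(v)
    return seen


def add_dangling_atom(rule, fresh):
    """OD: join one existing variable with a fresh variable."""
    for rel in RELATIONS:
        for v in variables(rule):
            yield rule + [(rel, fresh, v)]
            yield rule + [(rel, v, fresh)]


def add_closing_atom(rule):
    """OC: join two variables that already occur in the rule."""
    for rel in RELATIONS:
        for s, o in permutations(variables(rule), 2):
            atom = (rel, s, o)
            if atom not in rule:
                yield rule + [atom]


def is_closed(rule):
    """A rule is closed when every variable occurs at least twice."""
    flat = [v for _, s, o in rule for v in (s, o)]
    return all(flat.count(v) >= 2 for v in set(flat))


# The head atom comes first; body atoms are appended by the operators.
start = [("livesIn", "y", "city")]
step1 = list(add_dangling_atom(start, "x"))
with_spouse = start + [("isMarriedTo", "x", "y")]
step2 = list(add_closing_atom(with_spouse))
target = with_spouse + [("livesIn", "x", "city")]

print(with_spouse in step1, target in step2)
print(is_closed(with_spouse), is_closed(target))
```

The rule livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city) is reached in two steps, and only the final rule is closed, which is why the intermediate rule would be refined further rather than output.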

Page 28: Rule Mining and Applications in Social Data

AMIE: Association Rule Mining Under Incomplete Evidence

[Figure: AMIE architecture. Inputs: an RDF KB and a minimum support threshold. Components: a concurrent mining implementation and a tailored in-memory DB.]

Page 29: Rule Mining and Applications in Social Data

AMIE: Association Rule Mining Under Incomplete Evidence

Dataset            Facts  Runtime   Rules
YAGO2              1M     3.62min   138
YAGO2 (const)      1M     17.76min  18K
DBpedia (2 atoms)  6.7M   2.89min   6.9K

AMIE finds rules in medium-size ontologies in a few minutes.

Page 30: Rule Mining and Applications in Social Data

Challenges of Rule Mining on KBs

● Incompleteness
  ● Graph data often contains gaps.

● Open World Assumption (OWA)
  ● Absence of evidence is not evidence of absence

● This makes it hard to estimate the confidence of a rule.

Page 31: Rule Mining and Applications in Social Data

Challenges of Rule Mining on KBs

● Incompleteness
  ● Graph data often contains gaps.

● Open World Assumption (OWA)
  ● Absence of evidence is not evidence of absence

● This makes it hard to estimate the confidence of a rule.

likes(x, Shakira) => isCitizenOf(x, Ecuador)

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and Ecuador with friendOf, likes, and citizenOf edges.]

Page 32: Rule Mining and Applications in Social Data

Challenges of Rule Mining on KBs

● Incompleteness
  ● Graph data often contains gaps.

● Open World Assumption (OWA)
  ● Absence of evidence is not evidence of absence

● This makes it hard to estimate the confidence of a rule.

likes(x, Shakira) => isCitizenOf(x, Ecuador)

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and Ecuador with friendOf, likes, and citizenOf edges.]

Standard confidence uses the Closed World Assumption (CWA) and counts Shamira as a counterexample. Score = 0.5

Page 33: Rule Mining and Applications in Social Data

Challenges of Rule Mining on KBs

likes(x, Shakira) => isCitizenOf(x, Ecuador)

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and Ecuador with friendOf, likes, and citizenOf edges.]

AMIE uses the Partial Completeness Assumption (PCA) to estimate the confidence of rules under OWA.

A KB knows all or none of the nationalities of a person.

Page 34: Rule Mining and Applications in Social Data

Challenges of Rule Mining on KBs

likes(x, Shakira) => isCitizenOf(x, Ecuador)

[Figure: social graph over Luis Galárraga, Shamira, Shakira, and Ecuador with friendOf, likes, and citizenOf edges.]

AMIE uses the Partial Completeness Assumption (PCA) to estimate the confidence of rules under OWA.

A KB knows all or none of the nationalities of a person.

PCA confidence considers as counterexamples only those people whose nationality is known to be different from Ecuador. Score = 1.0
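The two scores from this example can be reproduced with a short sketch. It assumes the toy graph contains exactly the facts below (both people like Shakira, and only Luis has a known nationality) and handles only rules of the shape used here, with one body atom and constant objects; the helper names are hypothetical.

```python
# Facts assumed from the running example, as (relation, subject, object).
KB = {
    ("likes", "Luis", "Shakira"),
    ("likes", "Shamira", "Shakira"),
    ("friendOf", "Luis", "Shamira"),
    ("isCitizenOf", "Luis", "Ecuador"),
}


def subjects(kb, rel, obj):
    """All x such that rel(x, obj) is in the KB."""
    return {s for (r, s, o) in kb if r == rel and o == obj}


def has_any(kb, rel, subj):
    """True if the KB knows at least one rel(subj, _) fact."""
    return any(r == rel and s == subj for (r, s, _) in kb)


# Rule: likes(x, Shakira) => isCitizenOf(x, Ecuador)
body = subjects(KB, "likes", "Shakira")
positives = body & subjects(KB, "isCitizenOf", "Ecuador")

# Standard confidence (CWA): every body match without the head fact
# counts as a counterexample.
std_conf = len(positives) / len(body)

# PCA confidence: only body matches with some known nationality count,
# so Shamira (no nationality in the KB) is not a counterexample.
pca_body = {x for x in body if has_any(KB, "isCitizenOf", x)}
pca_conf = len(positives) / len(pca_body)

print(std_conf, pca_conf)
```

With these facts the standard confidence is 1/2 = 0.5 and the PCA confidence is 1/1 = 1.0, matching the scores on the slides.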

Page 35: Rule Mining and Applications in Social Data

AMIE: Association Rule Mining under Incomplete Evidence

● PCA confidence has better predictive behavior than the standard confidence.


Page 36: Rule Mining and Applications in Social Data

AMIE: Association Rule Mining under Incomplete Evidence

● Some rules mined by AMIE on YAGO:

isMarriedTo(x, y) ^ livesIn(x, z) => livesIn(y, z)
isCitizenOf(x, y) => livesIn(x, y)
hasAdvisor(x, y) ^ graduatedFrom(x, z) => worksAt(y, z)
hasWonPrize(x, Gottfried Wilhelm Leibniz Prize) => livesIn(x, Germany)

Page 37: Rule Mining and Applications in Social Data

Research outlook

● Mine other types of logical rules for more applications.
  ● Numerical correlations for data description and prediction.
    – If you like Justin Bieber, then you are probably younger than 18.
  ● Rules involving temporal information for event prediction.
    – If a person bought a laptop today, then she will buy a hard disk in approximately one month.
    – If a person traveled for Christmas to the same place in the last two years, then she will probably do it this year.