Rule Mining and Applications in Social Data
An overview of potential applications of rule mining on social graphs, presented at the International Workshop on Social Media and Culture 2014.
Luis Galárraga, Télécom ParisTech
Presented at: International Workshop on Social Media and Culture 2014
Daejeon, Korea, April 4th, 2014
Natural Language vs Knowledge Bases (KBs)
● Natural language: suitable for humans but difficult for computers.
● Knowledge bases: understandable for computer programs.
[Figure: a KB fragment in which Shakira is a Singer, was born on Feb 2, 1977, and performs "Hips don't lie".]
Some popular KBs
KBs in action
Social graphs are KBs
● Sources may be different, but social graphs and KBs both share:
  ● A natural graph-like structure.
  ● Incompleteness.
  ● Opportunities for data description and prediction:
    – 90% of computer scientists like political party X.
    – If you like Shakira, you are likely to buy her latest song.
[Figure: a small social graph in which Luis Galárraga is a friendOf Shamira, likes edges connect them to Shakira and her song "Hips don't lie", and Shakira performs the song. The incompleteness point is illustrated with the caption "Maybe nobody asked me? :(".]
Rule Mining and KBs
● Data mining is about finding interesting, non-obvious correlations in data.
● Correlations are rules that hold often:
  – You probably live in the same city as your spouse.
  – If you like an artist, you like her songs.
● They can be formulated as logical rules:
isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city)
likes(x, artist) ^ performs(artist, song) => likes(x, song)
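The following minimal Python sketch makes the rule notation concrete. It is not AMIE's implementation. The sketch makes these assumptions:
● A KB is a set of (relation, subject, object) facts.
● Variables carry a "?" prefix.
● The toy facts and every function name are hypothetical.

It enumerates the body matches of isMarriedTo(x, y) ^ livesIn(x, city) => livesIn(y, city) by backtracking, then counts how often the instantiated head is also in the KB.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, str, str]  # (relation, subject, object); "?"-prefixed terms are variables
Binding = Dict[str, str]     # variable -> constant


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_atom(atom: Atom, fact: Atom, binding: Binding) -> Optional[Binding]:
    """Unify one atom with one fact; return the extended binding or None on a clash."""
    if atom[0] != fact[0]:
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            # setdefault binds a new variable, or returns the value it is already bound to
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def body_bindings(body: List[Atom], kb: Set[Atom],
                  binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Backtracking join: yield every binding under which all body atoms are facts of kb."""
    binding = {} if binding is None else binding
    if not body:
        yield binding
        return
    for fact in kb:
        extended = match_atom(body[0], fact, binding)
        if extended is not None:
            yield from body_bindings(body[1:], kb, extended)


def instantiate(atom: Atom, binding: Binding) -> Atom:
    rel, s, o = atom
    return (rel, binding.get(s, s), binding.get(o, o))


def count_rule_instances(body: List[Atom], head: Atom, kb: Set[Atom]) -> Tuple[int, int]:
    """Return (#body matches whose instantiated head is in kb, #body matches)."""
    hits = total = 0
    for b in body_bindings(body, kb):
        total += 1
        if instantiate(head, b) in kb:
            hits += 1
    return hits, total


# Hypothetical toy KB
kb = {
    ("isMarriedTo", "Alice", "Bob"),
    ("livesIn", "Alice", "Paris"),
    ("livesIn", "Bob", "Paris"),
    ("isMarriedTo", "Carol", "Dave"),
    ("livesIn", "Carol", "Lyon"),
    ("livesIn", "Dave", "Nice"),
}
body = [("isMarriedTo", "?x", "?y"), ("livesIn", "?x", "?city")]
head = ("livesIn", "?y", "?city")
print(count_rule_instances(body, head, kb))  # (1, 2)
```

The rule holds for Alice and Bob but not for Carol and Dave, whose recorded cities differ. The naive join scans every fact for every atom. That is fine for a toy KB but not for KBs with millions of facts.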
Applications for social data
● Recommendations
likes(x, artist) ^ performs(artist, song) => likes(x, song)
[Figure: Luis Galárraga and his friend Shamira like Shakira, who performs "Hips don't lie"; applying the rule predicts new likes edges to the song.]
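A recommendation is simply a rule prediction that is not yet in the KB. The Python sketch below applies likes(x, artist) ^ performs(artist, song) => likes(x, song) with a hand-written join and keeps only the new (person, song) pairs. The toy KB is loosely modeled on the slide's figure, and which likes edges it contains is an assumption. `recommend_songs` is a hypothetical name.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)


def recommend_songs(kb: Set[Fact]) -> Set[Tuple[str, str]]:
    """Apply likes(x, artist) ^ performs(artist, song) => likes(x, song)
    and return only the (person, song) pairs that are not already in kb."""
    likes = {(s, o) for (r, s, o) in kb if r == "likes"}
    performs = {(s, o) for (r, s, o) in kb if r == "performs"}
    predictions = set()
    for person, artist in likes:
        for performer, song in performs:
            # join on the shared variable 'artist' and skip facts the KB already has
            if performer == artist and (person, song) not in likes:
                predictions.add((person, song))
    return predictions


kb = {
    ("friendOf", "Luis Galárraga", "Shamira"),
    ("likes", "Luis Galárraga", "Shakira"),
    ("likes", "Shamira", "Shakira"),
    ("likes", "Shamira", "Hips don't lie"),
    ("performs", "Shakira", "Hips don't lie"),
}
print(recommend_songs(kb))  # {('Luis Galárraga', "Hips don't lie")}
```

Shamira already likes the song, so only Luis Galárraga receives the recommendation.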
Applications for social data
● Market basket analysis
  – People who buy laptops also buy laptop cases.
● Link and event prediction
  – Two people who attended the same high school in the same year might know each other.
  – If you registered for this workshop, then you are coming to Daejeon (and need to book a flight and a hotel).
● Dealing with incompleteness
  – If you like German newspapers, fluency in German is perhaps missing from your profile.
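For market basket analysis, "laptop => laptop case" is usually scored with the classic itemset support and confidence measures. These are not AMIE's KB-specific measures. The Python sketch below uses made-up transactions.

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of the transactions containing lhs that also contain rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


# Hypothetical shopping baskets
transactions = [
    {"laptop", "laptop case", "mouse"},
    {"laptop", "laptop case"},
    {"laptop", "hard disk"},
    {"mouse", "keyboard"},
]
print(support({"laptop", "laptop case"}, transactions))                  # 0.5
print(round(confidence({"laptop"}, {"laptop case"}, transactions), 3))  # 0.667
```

Two of the three laptop buyers also bought a case, hence a confidence of 2/3.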
Challenges
Challenges of Rule Mining from KBs
● Scalability
  ● State-of-the-art approaches for rule mining cannot handle the size of current KBs:
    – YAGO: 10M entities, 120M facts.
    – DBpedia 3.8: 24.9M entities, 1.98B facts.
    – Facebook Graph: 1.2B users.
● Rule mining requires an exhaustive search of the data.
● Solution:
  ● A language bias.
  ● A set of pruning heuristics.
  ● An optimized storage implementation.
AMIE: Association Rule Mining Under Incomplete Evidence
● AMIE is a system that learns Horn rules such as:
livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city)
● It starts with all possible head relations r(x, y) and a minimum support threshold:
  – The system explores the search space by means of carefully designed mining operators.
  – The search space is restricted to closed Horn rules.
  – The monotonicity of support helps prune non-promising paths.
  – It relies on an optimized in-memory database.
  – Confidence gain is used to prune the output.
L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek. AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In WWW, 2013.
[Figure: step-by-step construction of livesIn(x, city) ^ isMarriedTo(x, y) => livesIn(y, city) with AMIE's mining operators. Starting from the head livesIn(y, city), "Add dangling atom" (OD) adds an atom ?r that links y to a fresh variable x (candidates for ?r include isMarriedTo and livesIn), giving isMarriedTo(x, y). "Add closing atom" (OC) then adds an atom ?r between the existing variables x and city (candidates include livesIn and diedIn), giving the closed rule.]
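The Python sketch below shows the two mining operators from the walkthrough and the closedness check. It follows the slides' description but is not AMIE's code. The following are assumptions for illustration:
● A rule is represented as a (head, body) pair of atoms.
● Fresh variables are named "?x", "?x1", ...
● All function names are hypothetical.

The sketch does no support computation or pruning. It only generates candidate refinements and reproduces the two steps from the figure.

```python
from typing import Iterable, Iterator, Set, Tuple

Atom = Tuple[str, str, str]           # (relation, subject, object); "?"-prefixed terms are variables
Rule = Tuple[Atom, Tuple[Atom, ...]]  # (head, body); body order does not matter


def is_var(term: str) -> bool:
    return term.startswith("?")


def variables(rule: Rule) -> Set[str]:
    head, body = rule
    return {t for atom in (head,) + body for t in atom[1:] if is_var(t)}


def fresh_var(rule: Rule) -> str:
    """Return the first of ?x, ?x1, ?x2, ... not yet used in the rule."""
    used = variables(rule)
    i = 0
    while True:
        name = "?x" if i == 0 else f"?x{i}"
        if name not in used:
            return name
        i += 1


def add_dangling_atom(rule: Rule, relations: Iterable[str]) -> Iterator[Rule]:
    """OD: add an atom joining an existing variable with a fresh one, in both directions."""
    head, body = rule
    new = fresh_var(rule)
    for rel in relations:
        for var in sorted(variables(rule)):
            for atom in ((rel, var, new), (rel, new, var)):
                yield (head, body + (atom,))


def add_closing_atom(rule: Rule, relations: Iterable[str]) -> Iterator[Rule]:
    """OC: add an atom between two variables that already occur in the rule."""
    head, body = rule
    vs = sorted(variables(rule))
    for rel in relations:
        for a in vs:
            for b in vs:
                atom = (rel, a, b)
                if a != b and atom != head and atom not in body:
                    yield (head, body + (atom,))


def is_closed(rule: Rule) -> bool:
    """A rule is closed if every variable occurs at least twice."""
    head, body = rule
    terms = [t for atom in (head,) + body for t in atom[1:] if is_var(t)]
    return all(terms.count(v) >= 2 for v in set(terms))


start: Rule = (("livesIn", "?y", "?city"), ())
step1: Rule = (start[0], (("isMarriedTo", "?x", "?y"),))                 # after OD
step2: Rule = (start[0], step1[1] + (("livesIn", "?x", "?city"),))      # after OC

print(step1 in list(add_dangling_atom(start, ["isMarriedTo", "livesIn"])))  # True
print(step2 in list(add_closing_atom(step1, ["livesIn", "diedIn"])))        # True
print(is_closed(start), is_closed(step1), is_closed(step2))                 # False False True
```

Only step2 is closed, which is why AMIE outputs the rule after the closing atom and not after the dangling one.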
AMIE: Association Rule Mining Under Incomplete Evidence
[Figure: AMIE architecture. Inputs: an RDF KB, a minimum support threshold, and k. Components: a concurrent mining implementation and a tailored in-memory DB.]
AMIE: Association Rule Mining Under Incomplete Evidence
Dataset            Facts   Runtime     Rules
YAGO2              1M      3.62 min    138
YAGO2 (const)      1M      17.76 min   18K
DBpedia (2 atoms)  6.7M    2.89 min    6.9K
AMIE finds rules in medium-size ontologies in a few minutes.
Challenges of Rule Mining on KBs
● Incompleteness
  ● Graph data often contains gaps.
● Open World Assumption (OWA)
  ● Absence of evidence is not evidence of absence.
● It is therefore hard to estimate the confidence of a rule.
likes(x, Shakira) => isCitizenOf(x, Ecuador)
[Figure: Luis Galárraga and his friend Shamira both like Shakira; only Luis Galárraga has a citizenOf edge, to Ecuador.]
Standard confidence uses the Closed World Assumption (CWA) and counts Shamira as a counterexample. Score = 0.5.
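The 0.5 score can be reproduced in a few lines of Python. The sketch handles only single-atom rules of the form r1(x, c1) => r2(x, c2) and counts distinct x as the slide does. The toy KB is reconstructed from the figure, and its exact edges are an assumption. `standard_confidence` is a hypothetical name.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)
Pattern = Tuple[str, str]    # (relation, constant object), e.g. likes(x, Shakira)


def standard_confidence(kb: Set[Fact], body: Pattern, head: Pattern) -> float:
    """Confidence of body(x) => head(x) under the closed world assumption:
    every x that satisfies the body but lacks the head fact is a counterexample."""
    body_xs = {s for (r, s, o) in kb if (r, o) == body}
    if not body_xs:
        return 0.0  # the body never holds, so there is no evidence either way
    support = {x for x in body_xs if (head[0], x, head[1]) in kb}
    return len(support) / len(body_xs)


# Toy KB reconstructed from the slide's figure (edge details are an assumption)
kb = {
    ("friendOf", "Luis Galárraga", "Shamira"),
    ("likes", "Luis Galárraga", "Shakira"),
    ("likes", "Shamira", "Shakira"),
    ("isCitizenOf", "Luis Galárraga", "Ecuador"),
}
print(standard_confidence(kb, ("likes", "Shakira"), ("isCitizenOf", "Ecuador")))  # 0.5
```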
AMIE uses the Partial Completeness Assumption (PCA) to estimate the confidence of rules under the OWA: a KB knows either all or none of the nationalities of a person.
PCA confidence counts as counterexamples only those people whose nationality is known to be different from Ecuador. Score = 1.0.
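The PCA changes only the denominator. The Python sketch below reflects that. Under the PCA, x counts as a potential counterexample only if the KB knows at least one head-relation fact for x, for example some nationality. People with no known nationality are ignored. This sketch has the same scope as a single-atom rule over a toy KB reconstructed from the figure. The function name is hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)
Pattern = Tuple[str, str]    # (relation, constant object), e.g. likes(x, Shakira)


def pca_confidence(kb: Set[Fact], body: Pattern, head: Pattern) -> float:
    """Confidence of body(x) => head(x) under the Partial Completeness Assumption."""
    body_xs = {s for (r, s, o) in kb if (r, o) == body}
    support = {x for x in body_xs if (head[0], x, head[1]) in kb}
    # Only people with at least one known head-relation fact can be counterexamples
    known = {x for x in body_xs if any(r == head[0] and s == x for (r, s, _) in kb)}
    if not known:
        return 0.0  # no evidence either way
    return len(support) / len(known)


# Toy KB reconstructed from the slide's figure (edge details are an assumption)
kb = {
    ("friendOf", "Luis Galárraga", "Shamira"),
    ("likes", "Luis Galárraga", "Shakira"),
    ("likes", "Shamira", "Shakira"),
    ("isCitizenOf", "Luis Galárraga", "Ecuador"),
}
print(pca_confidence(kb, ("likes", "Shakira"), ("isCitizenOf", "Ecuador")))  # 1.0
```

Shamira drops out of the denominator because the KB records no nationality for her. A hypothetical fan with a known non-Ecuadorian nationality would, by contrast, still count against the rule.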
AMIE: Association Rule Mining Under Incomplete Evidence
● PCA confidence has better predictive behavior than standard confidence.
● Some rules mined by AMIE on YAGO:
isMarriedTo(x, y) ^ livesIn(x, z) => livesIn(y, z)
isCitizenOf(x, y) => livesIn(x, y)
hasAdvisor(x, y) ^ graduatedFrom(x, z) => worksAt(y, z)
hasWonPrize(x, Gottfried Wilhelm Leibniz Prize) => livesIn(x, Germany)
Research outlook
● Mine other types of logical rules for more applications:
  ● Numerical correlations for data description and prediction.
    – If you like Justin Bieber, then you are probably younger than 18.
  ● Rules involving temporal information for event prediction.
    – If a person bought a laptop today, then she will buy a hard disk in approximately one month.
    – If a person traveled to the same place for Christmas in each of the last two years, then she will probably do so again this year.