
Upload: phungmien

Post on 31-Aug-2018


CptS 570 – Machine Learning
School of EECS

Washington State University

CptS 570 - Machine Learning 1

Relational data
Logic-based representation
Graph-based representation
Propositionalization
Inductive Logic Programming (ILP)
Graph-based relational learning
Applications

So far, training data have been propositional
  ◦ Each instance represents one entity and its features

Learned hypotheses have also been propositional
  ◦ If Income > 250,000 Then Rich

Person
ID   First Name   Last Name   Age   Income
P1   John         Doe         30    120,000
P2   Jane         Doe         29    140,000
P3   Robert       Smith       45    280,000
…    …            …           …     …

Entities may be related to each other

Learned hypotheses should allow relations
  ◦ If Income(Person1,Income1) and Income(Person2,Income2) and Married(Person1,Person2) and (Income1+Income2)>250,000 Then RichCouple(Person1,Person2)

Married
Person1   Person2
P1        P2
P3        P7
…         …

Logic-based representation

Data
  ◦ Person(ID,FirstName,LastName,Income)
      Person(P1,John,Doe,120,000)
  ◦ Married(Person1,Person2)
      Married(P1,P2), Married(P2,P1)

Hypotheses
  ◦ If Person(ID1,FirstName1,LastName1,Income1) and Person(ID2,FirstName2,LastName2,Income2) and Married(ID1,ID2) and (Income1+Income2)>250,000 Then RichCouple(ID1,ID2)
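As a rough illustration, the RichCouple hypothesis can be evaluated directly over small in-memory copies of the two relations. This is only a sketch: the dictionary layout and the name `rich_couples` are assumptions made for the example, and the rows are the ones listed in the Person and Married tables of these slides (the P3/P7 marriage is left out because P7's income is not given).

```python
# Hypothetical in-memory copies of the Person and Married relations.
person = {
    "P1": ("John", "Doe", 120_000),
    "P2": ("Jane", "Doe", 140_000),
    "P3": ("Robert", "Smith", 280_000),
}
married = {("P1", "P2"), ("P2", "P1")}

def rich_couples(person, married, threshold=250_000):
    """Return every (ID1, ID2) binding that satisfies the RichCouple rule."""
    result = set()
    for id1, id2 in married:
        if id1 in person and id2 in person:
            income1, income2 = person[id1][2], person[id2][2]
            if income1 + income2 > threshold:
                result.add((id1, id2))
    return result

couples = rich_couples(person, married)
```

The rule needs values from two Person rows at once, which is exactly what a propositional (single-row) hypothesis cannot express.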

Graph-based representation

Data

[Figure: graph in which each Person vertex has First, Last, Age, and Income edges to its attribute values (John Doe, 30, 120000; Jane Doe, 29, 140000; Robert Smith, 45, 280000) and an ID (P1, P2, …), with Married links between persons.]

Graph-based representation

Hypotheses

[Figure: hypothesis graph in which two Person vertices joined by Married have Income values X and Y; X and Y are the operands of "+", whose result Z is an operand of ">" together with 250000, with result true.]

Logical rule
  ◦ Instance consists of relations
      E.g., Person(), Married(), …
  ◦ Check if rule is matched by new instance
  ◦ Unification (matching the rule against the instance's facts is NP-Complete)

Graphical rule
  ◦ Instance consists of a graph
  ◦ Check if rule matches a subgraph of instance
  ◦ Subgraph isomorphism (NP-Complete)

Many polynomial-time specializations exist (e.g., Horn clauses, trees)

Create new single table combining all relations

Apply propositional learner

Number of fields in new table can grow exponentially

First Name1   Last Name1   Age1   Income1   First Name2   Last Name2   Age2   Income2   Married
John          Doe          30     120,000   Jane          Doe          29     140,000   Yes
Jane          Doe          29     140,000   Robert        Smith        45     280,000   No
Robert        Smith        45     280,000   Jane          Doe          29     140,000   No
…             …            …      …         …             …            …      …         …
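The combined table can be sketched as a join of every ordered pair of persons with a Married flag. The code below is a minimal illustration under assumptions: the function name `propositionalize` and the tuple layout are invented for the example, and only the three persons listed in the Person table are used.

```python
from itertools import permutations

# Hypothetical copies of the relations shown in the slides.
person = {
    "P1": ("John", "Doe", 30, 120_000),
    "P2": ("Jane", "Doe", 29, 140_000),
    "P3": ("Robert", "Smith", 45, 280_000),
}
married = {("P1", "P2"), ("P2", "P1")}

def propositionalize(person, married):
    """Build one flat row per ordered pair of distinct persons."""
    rows = []
    for id1, id2 in permutations(sorted(person), 2):
        flag = "Yes" if (id1, id2) in married else "No"
        rows.append(person[id1] + person[id2] + (flag,))
    return rows

rows = propositionalize(person, married)
```

Even with one binary relation, the row count is already quadratic in the number of persons; each further relation or longer chain of relations multiplies the columns again.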

Terminology
  ◦ Relations are predicates (e.g., person, married)
  ◦ Predicate p(a1,a2,…,an) has arguments a1, a2, …, an
  ◦ Arguments can be constants (e.g., sally) or variables (e.g., X, Income1)
  ◦ A predicate is ground if it contains no variables

Terminology
  ◦ A literal is a predicate or its negation
  ◦ A clause is a disjunction of literals
  ◦ A Horn clause has at most one positive literal
  ◦ A definite clause has exactly one positive literal
      (a ⋀ b → c) ≡ (~a ⋁ ~b ⋁ c)

Adopt Prolog syntax
  ◦ Predicates and constants are lowercase
  ◦ Variables are uppercase
  ◦ E.g., married(X,Y), person(p1,john,doe,30,120000)
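The equivalence between the rule form and the clause form can be checked by brute force over all truth assignments, and the Horn/definite distinction reduces to counting positive literals. The sketch below assumes a string encoding of literals with "~" marking negation; the function names are illustrative only.

```python
from itertools import product

def implies(a, b, c):
    """Truth value of (a AND b) -> c."""
    return (not (a and b)) or c

def as_clause(a, b, c):
    """Truth value of (~a OR ~b OR c)."""
    return (not a) or (not b) or c

# Compare the two forms on all 8 truth assignments.
equivalent = all(
    implies(a, b, c) == as_clause(a, b, c)
    for a, b, c in product([False, True], repeat=3)
)

def clause_kind(literals):
    """Classify a clause given as strings, with '~' marking a negative literal."""
    positives = sum(1 for lit in literals if not lit.startswith("~"))
    return {"horn": positives <= 1, "definite": positives == 1}

kind = clause_kind(["~a", "~b", "c"])
```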

Given
  ◦ Training examples (xt,rt) ∈ X
      xt is a set of facts (ground predicates)
      rt is a ground predicate
  ◦ Background knowledge B
      B is a set of predicates and rules (definite clauses)

Find hypothesis h such that
  ◦ (∀(xt,rt) ∈ X) B ⋀ h ⋀ xt ⊢ rt
  ◦ where ⊢ means entails (can deduce)

Example
  ◦ Learn concept of child(X,Y): Y is the child of X
  ◦ rt: child(bob,sharon)
  ◦ xt: male(bob), female(sharon), father(sharon,bob)
  ◦ B: parent(U,V) ← father(U,V)
  ◦ h1: child(X,Y) ← father(Y,X)
  ◦ h2: child(X,Y) ← parent(Y,X)
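Whether B ⋀ h ⋀ xt ⊢ rt holds for this example can be checked with a small forward-chaining loop over definite clauses. The sketch below is an assumption-based illustration, not the procedure used by any ILP system: facts and literals are encoded as tuples, uppercase-initial arguments are treated as variables, and all function names are hypothetical.

```python
def is_var(term):
    return term[0].isupper()

def match(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    binding = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding

def forward_chain(facts, rules):
    """Apply definite clauses (head, body) until no new ground fact appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            bindings = [{}]
            for literal in body:
                bindings = [b2 for b in bindings for fact in facts
                            if (b2 := match(literal, fact, b)) is not None]
            for b in bindings:
                new_fact = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

xt = {("male", "bob"), ("female", "sharon"), ("father", "sharon", "bob")}
B = [(("parent", "U", "V"), [("father", "U", "V")])]
h1 = (("child", "X", "Y"), [("father", "Y", "X")])
h2 = (("child", "X", "Y"), [("parent", "Y", "X")])
rt = ("child", "bob", "sharon")

h1_entails = rt in forward_chain(xt, B + [h1])
h2_entails = rt in forward_chain(xt, B + [h2])
h2_without_B = rt in forward_chain(xt, [h2])
```

Here h2 only works together with the background rule, since xt contains no parent facts of its own.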

First Order Inductive Learner (FOIL)

Learns Horn clauses

Set covering algorithm
  ◦ Seeks rules covering subsets of positive examples

Each new rule generalizes the learned hypothesis

Each conjunct added to a rule specializes the rule


Learning rule p(X1,X2,…,Xk) ← L1 ⋀ … ⋀ Ln

Candidate specializations add new literal of form:
  ◦ Q(V1,…,Vr) where at least one of Vi must already exist as a variable in the rule
  ◦ Equal(Xj,Xk) where Xj and Xk are variables present in the rule
  ◦ The negation of either of the above forms of literals
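The three forms of candidate literal can be enumerated mechanically. The following sketch assumes a tuple encoding of literals and a caller-supplied list of fresh variable names; `candidate_literals` is a hypothetical helper, not FOIL's actual generator, which applies further pruning.

```python
from itertools import product

def candidate_literals(rule_vars, predicates, new_vars):
    """Enumerate candidate literals for specializing a rule.

    rule_vars:  variables already in the rule, e.g. ["A", "B"]
    predicates: mapping from predicate name to arity
    new_vars:   fresh variable names that a literal may introduce
    """
    pool = list(rule_vars) + list(new_vars)
    positives = []
    for name, arity in predicates.items():
        for args in product(pool, repeat=arity):
            # At least one argument must already occur in the rule.
            if any(a in rule_vars for a in args):
                positives.append((name,) + args)
    for i, xj in enumerate(rule_vars):
        for xk in rule_vars[i + 1:]:
            positives.append(("equal", xj, xk))
    negatives = [("not",) + lit for lit in positives]
    return positives + negatives

candidates = candidate_literals(["A", "B"], {"linkedto": 2}, ["C"])
```

For a rule over A and B with one binary predicate and one new variable, this already yields 18 candidates, which is why a gain measure is needed to choose among them.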

L is the candidate literal to add to rule R
p0 = number of positive bindings of R
n0 = number of negative bindings of R
p1 = number of positive bindings of R+L
n1 = number of negative bindings of R+L
t = number of positive bindings of R that are still covered by R+L

Note: -log2(p0/(p0 + n0)) = number of bits to indicate the class of a positive binding of R

$$\mathrm{FoilGain}(L,R) = t \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)$$
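The gain formula is straightforward to compute. The sketch below is illustrative only: the handling of p1 = 0 is an assumption, and the example counts read the canreach run in these slides as 36 positive bindings out of 81 before adding linkedto(A,B), and 12 positive and no negative bindings afterwards. The value is not expected to match the bit counts FOIL 6.4 prints, which may be computed differently.

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """FoilGain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0)))."""
    if p1 == 0:
        # Assumption: a literal that keeps no positive bindings gains nothing.
        return 0.0
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Hypothetical reading of the canreach example: 36 of 81 bindings are positive.
gain = foil_gain(p0=36, n0=45, p1=12, n1=0, t=12)
```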

Target concept
  ◦ canReach(X,Y) true if directed path from X to Y

Examples
  ◦ Pairs of nodes for which path exists (e.g., <1,5>)
  ◦ Graph described by literals
      E.g., linkedTo(0,1), linkedTo(0,8)

Hypothesis space
  ◦ Horn clauses using predicates linkedTo and canReach

X: 0, 1, 2, 3, 4, 5, 6, 7, 8.

canreach(X,X)
0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8
1,2 1,3 1,4 1,5 1,6 1,7 1,8
2,3 2,4 2,5 2,6 2,7 2,8
3,4 3,5 3,6 3,7 3,8
4,5 4,6 4,7 4,8
5,6 5,7 5,8
6,7 6,8
7,8
.

*linkedto(X,X)
0,1 0,2
1,2
2,3
3,4 3,8
4,5 4,8
5,6
6,7 6,8
7,8
.

FOIL 6.4 [January 1996]
--------

Relation canreach

Relation *linkedto

----------
canreach:

State (36/81, 91.4 bits available)

Save clause ending with linkedto(A,B) (cover 12, accuracy 100%)

Save linkedto(C,B) (36,72 value 6.0)

Best literal linkedto(A,B) (4.6 bits)

Clause 0: canreach(A,B) :- linkedto(A,B).


State (24/69, 81.4 bits available)

Save clause ending with not(linkedto(C,A)) (cover 6, accuracy 85%)

Save linkedto(C,B) (24,60 value 4.8)
Save not(linkedto(B,A)) (24,57 value 6.5)

Best literal not(linkedto(C,A)) (4.6 bits)

State (6/7, 33.5 bits available)

Save clause ending with A<>B (cover 6, accuracy 100%)

Best literal A<>B (2.0 bits)

Clause 1: canreach(A,B) :- not(linkedto(C,A)), A<>B.

State (18/63, 71.5 bits available)

Save not(linkedto(B,A)) (18,51 value 5.4)

Best literal linkedto(C,B) (4.6 bits)

…


State (27/73 [18/54], 66.9 bits available)

Save clause ending with canreach(A,C) (cover 18, accuracy 100%)

Best literal canreach(A,C) (4.2 bits)

Clause 2: canreach(A,B) :- linkedto(C,B), canreach(A,C).

Delete clause
canreach(A,B) :- not(linkedto(C,A)), A<>B.

canreach(A,B) :- linkedto(A,B).
canreach(A,B) :- linkedto(C,B), canreach(A,C).

Time 0.0 secs
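The two clauses FOIL keeps can be checked against the training data by computing their least fixpoint over the linkedto tuples. This is a sketch for verification only; the function name `canreach_closure` is an assumption, and the edge list is the *linkedto relation from the input file above.

```python
# The *linkedto tuples from the FOIL input file.
linkedto = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 8),
            (4, 5), (4, 8), (5, 6), (6, 7), (6, 8), (7, 8)}

def canreach_closure(linkedto):
    """Least fixpoint of the two learned clauses."""
    # canreach(A,B) :- linkedto(A,B).
    reach = set(linkedto)
    changed = True
    while changed:
        changed = False
        # canreach(A,B) :- linkedto(C,B), canreach(A,C).
        for c, b in linkedto:
            for a, c2 in list(reach):
                if c2 == c and (a, b) not in reach:
                    reach.add((a, b))
                    changed = True
    return reach

reach = canreach_closure(linkedto)
```

The result contains 36 pairs, the same number of positive tuples reported in the first State line of the FOIL run.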

Resolution rule

Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2

Form the resolvent C by including all literals from C1 and C2, except for L and ¬L
  ◦ C = (C1 − {L}) ∪ (C2 − {¬L})
  ◦ where ∪ denotes set union, and “−” denotes set difference
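With clauses represented as sets of literals, the propositional resolution rule is a one-line set expression. The sketch below assumes literals are strings with "~" for negation; the clause contents are made-up placeholders.

```python
def negate(literal):
    """Return the complement of a literal written as 'p' or '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolvents(c1, c2):
    """All resolvents C = (C1 - {L}) | (C2 - {~L}) for literals L in C1."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in sorted(c1) if negate(lit) in c2]

# Hypothetical clauses: (a OR ~b) and (b OR ~c).
c1 = {"a", "~b"}
c2 = {"b", "~c"}
result = resolvents(c1, c2)
```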


Inverting resolution (propositional)

Given initial clauses C1 and C, find a literal L that occurs in clause C1, but not in clause C

Form the second clause C2 by including the following literals
  ◦ C2 = (C − (C1 − {L})) ∪ {¬L}
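The inverse operator can be sketched the same way, and checked by resolving C1 with the constructed C2 to recover C. The encoding (frozensets of string literals, "~" for negation) and the helper names are assumptions for illustration; the operator is nondeterministic in general, so the function returns one candidate per admissible literal.

```python
def negate(literal):
    """Return the complement of a literal written as 'p' or '~p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def inverse_resolve(c, c1):
    """Candidates C2 = (C - (C1 - {L})) | {~L} for each L in C1 but not in C."""
    return [(c - (c1 - {lit})) | {negate(lit)} for lit in sorted(c1 - c)]

def resolve_on(c1, c2, lit):
    """Resolvent of C1 and C2 on literal lit of C1."""
    return (c1 - {lit}) | (c2 - {negate(lit)})

# Hypothetical clauses: C1 = (a OR ~b), resolvent C = (a OR ~c).
c1 = frozenset({"a", "~b"})
c = frozenset({"a", "~c"})
candidates = inverse_resolve(c, c1)
recovered = resolve_on(c1, candidates[0], "~b")
```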

First-order resolution
  ◦ Find a literal L1 from clause C1, literal L2 from clause C2, and substitution θ such that L1θ = ¬L2θ
  ◦ Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and L2θ
  ◦ C = (C1 − {L1})θ ∪ (C2 − {L2})θ

Inverting first-order resolution
  ◦ C2 = (C − (C1 − {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}


Reduce combinatorial explosion by generating the most specific acceptable h

User specifies H by stating predicates, functions, and forms of arguments allowed for each

Progol uses set covering algorithm
  ◦ For each <xt,rt>
  ◦ Find most specific hypothesis ht s.t. B ∧ ht ∧ xt ⊢ rt
      actually, considers only k-step entailment

Conduct general-to-specific search bounded by specific hypothesis ht, choosing hypothesis with minimum description length


% Learning aunt_of from parent_of and sister_of.

% Settings

:- set(posonly)?

% Mode declarations

:- modeh(1,aunt_of(+person,+person))?
:- modeb(*,parent_of(-person,+person))?
:- modeb(*,parent_of(+person,-person))?
:- modeb(*,sister_of(+person,-person))?

% Types

person(jane).
person(henry).
person(sally).
person(jim).
person(sam).
person(sarah).
person(judy).

…


% Background knowledge

parent_of(Parent,Child) :- father_of(Parent,Child).
parent_of(Parent,Child) :- mother_of(Parent,Child).

father_of(sam,henry).

mother_of(sarah,jim).

sister_of(jane,sam).
sister_of(sally,sarah).
sister_of(judy,sarah).

% Examples

aunt_of(jane,henry).
aunt_of(sally,jim).
aunt_of(judy,jim).


CProgol Version 4.4

[Noise has been set to 100%]
[Example inflation has been set to 400%]
[The posonly flag has been turned ON]
[:- set(posonly)? - Time taken 0.00s]
[:- modeh(1,aunt_of(+person,+person))? - Time taken 0.00s]
[:- modeb(100,parent_of(-person,+person))? - Time taken 0.00s]
[:- modeb(100,parent_of(+person,-person))? - Time taken 0.00s]
[:- modeb(100,sister_of(+person,-person))? - Time taken 0.00s]
[Testing for contradictions]
[No contradictions found]
[Generalising aunt_of(jane,henry).]
[Most specific clause is]

aunt_of(A,B) :- parent_of(C,B), sister_of(A,C).

…


[Learning aunt_of/2 from positive examples]
[C:-0,12,11,0 aunt_of(A,B).]
[C:6,12,4,0 aunt_of(A,B) :- parent_of(C,B).]
[C:6,12,3,0 aunt_of(A,B) :- parent_of(C,B), sister_of(A,C).]
[C:6,12,3,0 aunt_of(A,B) :- parent_of(C,B), sister_of(A,D).]
[C:4,12,6,0 aunt_of(A,B) :- sister_of(A,C).]
[5 explored search nodes]
f=6,p=12,n=3,h=0
[Result of search is]

aunt_of(A,B) :- parent_of(C,B), sister_of(A,C).

[3 redundant clauses retracted]

aunt_of(A,B) :- parent_of(C,B), sister_of(A,C).

[Total number of clauses = 1]

[Time taken 0.00s]
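The clause Progol returns can be sanity-checked against the background knowledge by evaluating its body as a join. This is a sketch in Python rather than Prolog, with the relations copied from the input file above as sets of pairs; it only checks coverage and says nothing about how Progol searches.

```python
# Background knowledge from the Progol input file, as sets of pairs.
father_of = {("sam", "henry")}
mother_of = {("sarah", "jim")}
sister_of = {("jane", "sam"), ("sally", "sarah"), ("judy", "sarah")}

# parent_of(Parent,Child) :- father_of(Parent,Child).
# parent_of(Parent,Child) :- mother_of(Parent,Child).
parent_of = father_of | mother_of

# aunt_of(A,B) :- parent_of(C,B), sister_of(A,C).
derived = {(a, b)
           for (c, b) in parent_of
           for (a, c2) in sister_of
           if c2 == c}

examples = {("jane", "henry"), ("sally", "jim"), ("judy", "jim")}
covers_all = examples <= derived
```

On this data the learned clause derives exactly the three positive examples and nothing else.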

TILDE [Blockeel & De Raedt, 1998]
  ◦ Relational extension to C4.5

RIBL [Emde and Wettschereck, 1996]
  ◦ Distance measure compares top-level objects, then objects related to them, and so on

Unsupervised learning
  ◦ Frequent subgraphs
  ◦ Compressing subgraphs

Supervised learning
  ◦ Graph features
  ◦ Graph kernels
  ◦ Relational decision trees
  ◦ Relational instance-based learning

Graph G = (V,E), where V is a set of vertices and E is a set of edges (u,v) such that u,v ∈ V

Edges may be directed or undirected

Vertices and/or edges may have labels

Graph G1=(V1,E1) is isomorphic to G2=(V2,E2) if there exists a bijective mapping f:V1→V2 such that (u,v) ∈ E1 iff (f(u),f(v)) ∈ E2
  ◦ The exact complexity of checking if two graphs are isomorphic is unknown (the problem is in NP, but not known to be in P or NP-Complete); in general the check is expensive
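The definition translates directly into a brute-force test that tries every bijection f. The sketch below handles unlabeled directed graphs given as a vertex collection and a set of edge pairs; it is exponential in the number of vertices and is meant only to make the definition concrete. The function name and example graphs are assumptions.

```python
from itertools import permutations

def is_isomorphic(v1, e1, v2, e2):
    """True if some bijection f maps the edge set E1 exactly onto E2."""
    v1, v2 = sorted(v1), sorted(v2)
    e1, e2 = set(e1), set(e2)
    if len(v1) != len(v2) or len(e1) != len(e2):
        return False
    for image in permutations(v2):
        f = dict(zip(v1, image))
        if {(f[u], f[v]) for u, v in e1} == e2:
            return True
    return False

# Hypothetical graphs: two directed paths and one out-star.
path1 = ([1, 2, 3], {(1, 2), (2, 3)})
path2 = (["a", "b", "c"], {("c", "a"), ("a", "b")})
star = (["a", "b", "c"], {("a", "b"), ("a", "c")})

same = is_isomorphic(*path1, *path2)
different = is_isomorphic(*path1, *star)
```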

Graph G1=(V1,E1) is a subgraph of G2=(V2,E2) if V1⊆V2 and E1⊆E2

G1 is an induced subgraph of G2 if V1⊆V2 and E1={(u,v) ∈ E2 | u,v ∈ V1}

Subgraph isomorphism: G1 is isomorphic to a subgraph of G2
  ◦ Subgraph isomorphism is NP-Complete
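Subgraph isomorphism (in the non-induced sense defined above) can be sketched by trying every injective mapping of the pattern's vertices into the larger graph. This brute-force version is exponential and ignores labels; the function name and the example graphs are assumptions.

```python
from itertools import permutations

def is_subgraph_isomorphic(v1, e1, v2, e2):
    """True if G1 is isomorphic to some (not necessarily induced) subgraph of G2."""
    v1 = sorted(v1)
    e2 = set(e2)
    # Each permutation of len(v1) vertices of G2 is one injective mapping.
    for image in permutations(sorted(v2), len(v1)):
        f = dict(zip(v1, image))
        if all((f[u], f[v]) in e2 for u, v in e1):
            return True
    return False

# Hypothetical data: a directed 4-cycle and two 3-vertex patterns.
cycle = ([0, 1, 2, 3], {(0, 1), (1, 2), (2, 3), (3, 0)})
path = (["a", "b", "c"], {("a", "b"), ("b", "c")})
fork = (["a", "b", "c"], {("a", "b"), ("a", "c")})

path_found = is_subgraph_isomorphic(*path, *cycle)
fork_found = is_subgraph_isomorphic(*fork, *cycle)
```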


Adjacency matrix

Applying the same permutation to the rows and columns yields an identical (isomorphic) graph

Canonical form
  ◦ Generate code by concatenating rows of adjacency matrix
  ◦ Choose code that is lexicographically smallest or largest
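A canonical code of this kind can be computed naively by trying every vertex ordering and keeping the smallest row-concatenated string. The sketch below assumes unlabeled graphs given as 0/1 adjacency matrices and picks the lexicographically smallest code; real systems avoid enumerating all n! orderings.

```python
from itertools import permutations

def canonical_code(adj):
    """Smallest row-concatenated adjacency code over all vertex orderings."""
    n = len(adj)
    return min(
        "".join(str(adj[p[i]][p[j]]) for i in range(n) for j in range(n))
        for p in permutations(range(n))
    )

# Hypothetical example: the same undirected 3-vertex path, numbered two ways.
path_a = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
path_b = [[0, 1, 1],
          [1, 0, 0],
          [1, 0, 0]]
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

code_a = canonical_code(path_a)
code_b = canonical_code(path_b)
```

Two graphs with equal canonical codes are isomorphic, so the expensive matching is replaced by a string comparison once the codes are known.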

Minimum DFS code
  ◦ Concatenate edges of DFS trees
  ◦ Take smallest lexicographic order min(G)
  ◦ Canonical label
  ◦ Two graphs G1 and G2 isomorphic iff min(G1)=min(G2)

Given a set of graphs G and minimum support threshold t

Find all subgraphs gs such that

$$\frac{|\{g \in G \mid g_s \subseteq g\}|}{|G|} \geq t$$

where gs ⊆ g means gs is isomorphic to a subgraph of g
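The support of a candidate subgraph is the fraction of graphs in the set that contain it. The sketch below pairs that ratio with a brute-force containment test for unlabeled directed graphs; the names `contains` and `support` and the three example graphs are assumptions, and the frequent-subgraph systems use far more efficient candidate generation and matching.

```python
from itertools import permutations

def contains(pattern, graph):
    """True if pattern is isomorphic to a subgraph of graph (brute force)."""
    pv, pe = pattern
    gv, ge = graph
    pv, ge = sorted(pv), set(ge)
    for image in permutations(sorted(gv), len(pv)):
        m = dict(zip(pv, image))
        if all((m[u], m[v]) in ge for u, v in pe):
            return True
    return False

def support(pattern, graphs):
    """Fraction of graphs g with pattern contained in g."""
    return sum(1 for g in graphs if contains(pattern, g)) / len(graphs)

# Hypothetical graph set: a 2-edge path, a single edge, and a fork.
graphs = [
    ([0, 1, 2], {(0, 1), (1, 2)}),
    ([0, 1], {(0, 1)}),
    ([0, 1, 2], {(0, 1), (0, 2)}),
]
edge = (["a", "b"], {("a", "b")})
path = (["a", "b", "c"], {("a", "b"), ("b", "c")})

edge_support = support(edge, graphs)
path_support = support(path, graphs)
```

With a threshold of t = 0.5, the single edge would be reported as frequent and the two-edge path would not.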

Systems
  ◦ Frequent Sub-Graph discovery (FSG)
      Kuramochi and Karypis, 2001
  ◦ Graph-based Substructure pattern mining (gSpan)
      Yan and Han, 2002 (uses DFS codes)
  ◦ Apriori-based Graph Mining (AGM)
      Inokuchi, Washio and Motoda, 2003
  ◦ GrAph/Sequence/Tree extractiON (GASTON)
      Nijssen and Kok, 2004 (www.liacs.nl/~snijssen/gaston)

Focus on pruning and fast, code-based graph matching

Minimum Description Length (MDL) principle
  ◦ Best theory minimizes description length of theory and the data given theory

Best subgraph S minimizes description length of subgraph definition DL(S) and compressed graph DL(G|S)

$$\min_S \left( DL(S) + DL(G|S) \right)$$
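To get a feel for the compression objective, description length can be replaced by a much cruder stand-in: the number of vertices plus the number of edges. The sketch below makes that substitution explicitly, so it is not SUBDUE's MDL encoding and its numbers will not match SUBDUE's reported values. It also assumes non-overlapping instances, each collapsed to a single vertex. The counts come from the sample.g run in these slides (20 vertices, 19 edges; a substructure with 4 vertices, 3 edges, and 4 instances).

```python
def size(num_vertices, num_edges):
    """Crude stand-in for description length: vertices plus edges."""
    return num_vertices + num_edges

def compressed_counts(g_v, g_e, s_v, s_e, instances):
    """Vertex and edge counts of G|S when each instance becomes one vertex."""
    return g_v - instances * s_v + instances, g_e - instances * s_e

def compression_value(g_v, g_e, s_v, s_e, instances):
    """size(G) / (size(S) + size(G|S)); larger means better compression."""
    c_v, c_e = compressed_counts(g_v, g_e, s_v, s_e, instances)
    return size(g_v, g_e) / (size(s_v, s_e) + size(c_v, c_e))

# Counts taken from the sample.g example: G has 20 vertices and 19 edges.
value = compression_value(g_v=20, g_e=19, s_v=4, s_e=3, instances=4)
```

Minimizing DL(S) + DL(G|S) is the same as maximizing this kind of ratio, since the numerator is fixed for a given input graph.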

[Figure: graph containing five instances of substructure S1.]

Systems
  ◦ Graph-Based Induction (GBI)
      Yoshida, Motoda and Indurkhya, 1994
  ◦ SUBstructure Discovery Using Examples (SUBDUE)
      Cook and Holder, 1994

Focus on efficient subgraph generation and compression-based heuristic search

sample.g:

v 1 object
v 2 object
v 3 object
v 4 object
v 5 object
v 6 object
v 7 object
v 8 object
v 9 object
v 10 object
v 11 triangle
v 12 triangle
v 13 triangle
v 14 triangle
v 15 square
v 16 square
v 17 square
v 18 square
v 19 circle
v 20 rectangle

e 1 11 shape
e 2 12 shape
e 3 13 shape
e 4 14 shape
e 5 15 shape
e 6 16 shape
e 7 17 shape
e 8 18 shape
e 9 19 shape
e 10 20 shape
e 1 5 on
e 2 6 on
e 3 7 on
e 4 8 on
e 5 10 on
e 9 10 on
e 10 2 on
e 10 3 on
e 10 4 on

[Figure: the sample graph drawn as shapes R1, C1, T1 to T4, and S1 to S4.]

SUBDUE 5.2.1

Parameters:
  Input file..................... sample.g
  Predefined substructure file... none
  Output file.................... none
  Beam width..................... 4
  Compress....................... false
  Evaluation method.............. MDL
  'e' edges directed............. true
  Incremental.................... false
  Iterations..................... 1
  Limit.......................... 9
  Minimum size of substructures.. 1
  Maximum size of substructures.. 20
  Number of best substructures... 3
  Output level................... 2
  Allow overlapping instances.... false
  Prune.......................... false
  Threshold...................... 0.000000
  Value-based queue.............. false
  Recursion...................... false

Read 1 total positive graphs

1 positive graphs: 20 vertices, 19 edges, 252 bits
7 unique labels

3 initial substructures

Best 3 substructures:

(1) Substructure: value = 1.86819, pos instances = 4, neg instances = 0
  Graph(4v,3e):
    v 1 object
    v 2 object
    v 3 triangle
    v 4 square
    d 1 3 shape
    d 2 4 shape
    d 1 2 on

(2) Substructure: value = 1.37785, pos instances = 4, neg instances = 0
  Graph(3v,2e):
    v 1 object
    v 2 object
    v 3 square
    d 2 3 shape
    d 1 2 on

(3) Substructure: value = 1.37219, pos instances = 4, neg instances = 0
  Graph(3v,2e):
    v 1 object
    v 2 object
    v 3 triangle
    d 1 3 shape
    d 1 2 on

SUBDUE done (elapsed CPU time = 0.00 seconds).

Given positive graphs G+ and negative graphs G-

Find subgraph S minimizing DL(G+|S) / DL(G-|S)

If |G+|≫1 and |G-|≫1, find subgraph S minimizing

$$\mathrm{error} = \frac{|\{G \in G^+ \mid S \not\subseteq G\}| + |\{G \in G^- \mid S \subseteq G\}|}{|G^+| + |G^-|}$$

[Figure: positive graphs and negative graphs are input to SUBDUE, which outputs pattern(s).]
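The error measure counts positive graphs that lack S and negative graphs that contain S. The sketch below assumes the containment tests have already been run (for instance by a subgraph-isomorphism routine) and are supplied as lists of booleans; the function name and the example values are hypothetical.

```python
def classification_error(pos_contains, neg_contains):
    """Error of pattern S as a classifier.

    pos_contains[i] is True if S occurs in the i-th positive graph,
    neg_contains[j] is True if S occurs in the j-th negative graph.
    """
    false_negatives = sum(1 for c in pos_contains if not c)
    false_positives = sum(1 for c in neg_contains if c)
    return (false_negatives + false_positives) / (len(pos_contains) + len(neg_contains))

# Hypothetical outcome: S is missing from one of three positive graphs
# and present in one of four negative graphs.
error = classification_error([True, True, False], [False, True, False, False])
```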

DT-GBI (Geamsakul et al., 2003)
  ◦ Decision Tree Graph-based Induction

Graph instance-based learning
  ◦ Graph edit distance
      Minimum cost changes to transform one graph into another (add/delete/change node, edge, direction, label)

Use feature vector based on presence of frequent or compressing subgraphs

Graph kernels
  ◦ Compare substructures of graphs
      E.g., walks, paths, cycles, trees, subgraphs
  ◦ K(G1,G2) = number of identical random walks in both graphs
  ◦ K(G1,G2) = number of subgraphs shared by both graphs
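One simple way to count identical walks is to step through both graphs simultaneously, keeping only vertex pairs whose labels agree. The sketch below counts matching walks of one fixed length in vertex-labeled directed graphs; this is a simplification chosen for illustration, whereas random-walk kernels in the literature typically sum over all lengths with decaying weights. The function name and example graphs are assumptions.

```python
def count_common_walks(labels1, adj1, labels2, adj2, length):
    """Count pairs of walks with `length` edges whose label sequences match."""
    n1, n2 = len(adj1), len(adj2)
    # walks[i][j]: number of matching walk pairs ending at vertices (i, j).
    walks = [[1 if labels1[i] == labels2[j] else 0 for j in range(n2)]
             for i in range(n1)]
    for _ in range(length):
        nxt = [[0] * n2 for _ in range(n1)]
        for i in range(n1):
            for j in range(n2):
                if walks[i][j] == 0:
                    continue
                for k in range(n1):
                    if not adj1[i][k]:
                        continue
                    for m in range(n2):
                        if adj2[j][m] and labels1[k] == labels2[m]:
                            nxt[k][m] += walks[i][j]
        walks = nxt
    return sum(map(sum, walks))

# Hypothetical graphs: a -> b, and a -> b twice (two b-labeled targets).
labels1, adj1 = ["a", "b"], [[0, 1], [0, 0]]
labels2, adj2 = ["a", "b", "b"], [[0, 1, 1], [0, 0, 0], [0, 0, 0]]

k1 = count_common_walks(labels1, adj1, labels2, adj2, 1)
```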

Natural language understanding

Drug design

Protein structure prediction

[Muggleton et al., 1992]

Social networks

Computer networks

WWW, Internet

Biological networks

Drug design

Exploit relational information in data

Inductive Logic Programming (ILP)
  ◦ FOIL, Progol

Graph-based relational learning
  ◦ SUBDUE

Numerous applications

Computationally expensive