A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain

7th December 2010

Elma Akand*, Mike Bain, Mark Temple

*CSE, UNSW/School of Biomedical and Health Sciences, UWS

The Sixth Australasian Ontology Workshop, Adelaide University of South Australia



Outline

Machine learning and data mining in bioinformatics

Domain Ontologies in biomedical applications

Formal Concept Analysis

MCW algorithm (Mining Closed itemsets for Web apps)

BioLattice – a web based browser

Experimental Application: systems biology

Part-1: Concept ranking by gene interaction

Part-2: Relational learning of multiple-stress rules


Machine learning & Data mining in Bioinformatics

Bioinformatics

“Bioinformatics is the study of information content and information flow in biological systems and processes” (Michael Liebman, 1995)

Machine Learning & Data mining

- Can offer automatic knowledge acquisition

- Process to discover knowledge by analyzing data from different perspectives; can contribute greatly to building a knowledge base

Our work: focus on knowledge-based machine learning

- Previous work: learning from ontologies

- Current work: ontology construction by learning

- Potential application areas: ontologies are central to eCommerce, eHealth

- Current application area: systems biology (predict gene function, data integration)


Ontology

In philosophy - concerned with nature and relations of being

In knowledge representation - study of categorization of things:

- Informal ontology: natural language

- Formal ontology: first-order logic or a variant

- Upper ontology: general

- Domain ontology: specific

Ontology – “specification of a conceptualization” (Gruber, 1993)

Conceptualization – “formalization of knowledge in declarative form” (Genesereth and Nilsson, 1987)


Gene Ontology

Missing concepts and relations

One gene can be annotated with several GO terms, one of which is a specialization of another

Example: gene x; concepts a, b; relations (i) x - a, (ii) x - b and (iii) b - a

[Figure: annotation graph for the example]


Formal Concept Analysis (FCA)

Mathematical order theory (Rudolf Wille in the early 80s)

-Derives conceptual structures out of data

-Method for data analysis, knowledge representation and information management

Components

- Formal context, concept, concept lattice

          four-legged  hair-covered  intelligent  marine  thumbed
cats           x            x
dogs           x            x
dolphins                                  x         x
gibbons                     x             x                  x
humans                                    x                  x
whales                                    x         x


Formal concepts in a concept lattice

({cats, gibbons, dogs, dolphins, humans, whales}, {-})

({gibbons, dolphins, humans, whales}, {intelligent})

({dolphins, whales}, {intelligent, marine})

({cats, gibbons, dogs}, {hair-covered})

({cats, dogs}, {hair-covered, four-legged})

({gibbons, humans}, {intelligent, thumbed})

({gibbons}, {intelligent, hair-covered, thumbed})

({-}, {intelligent, hair-covered, thumbed, marine, four-legged})

[Figure: concept lattice diagram with nodes labelled Top, 1 to 6 and Bottom]

Formal context: an n by m Boolean matrix, with the m attributes A as columns and the n objects O as rows

Formal concept: a pair <X, Y> under the Galois connection, where X is a subset of A and Y is a subset of O

Concept lattice loosely interpretable in ontology terms:

- concept definitions and sub-concept relations, cf. T-box

- concept membership by objects, cf. A-box
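As a rough illustration of how the concepts listed on this slide arise, the sketch below rebuilds them by brute force. The context dictionary is read off the concepts above; the function names are hypothetical, and enumerating every attribute subset is only sensible for a context this small. It is not the MCW algorithm.

```python
from itertools import combinations

# Formal context read off the slide: object -> set of attributes.
CONTEXT = {
    "cats": {"four-legged", "hair-covered"},
    "dogs": {"four-legged", "hair-covered"},
    "dolphins": {"intelligent", "marine"},
    "gibbons": {"hair-covered", "intelligent", "thumbed"},
    "humans": {"intelligent", "thumbed"},
    "whales": {"intelligent", "marine"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every object (all attributes for the empty set)."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def objects_having(attributes):
    """Objects that carry every attribute in the given set."""
    return frozenset(o for o, attrs in CONTEXT.items() if attributes <= attrs)


def formal_concepts():
    """Brute force: close every attribute subset and collect the distinct pairs."""
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            extent = objects_having(frozenset(subset))
            intent = common_attributes(extent)
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts()
# Print (objects, attributes) pairs, largest extent first.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

The pairs are printed in the same (objects, attributes) order used in the list above, so the output can be compared against the slide line by line.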


FCA in data mining

FCA can be seen as a clustering technique in machine learning

-Most of the work is in a propositional framework

In data mining closed itemset mining is an efficient alternative to FCA

A frequent itemset X is closed if there exists no proper superset Y ⊃ X with support(Y) = support(X)

E.g., if X = {a,b,c,d}, Y = {a,b,c,d,e} and support(Y) = support(X), then X is not closed

Parameters to avoid building entire lattice

-Extent size must be greater than minsup

Existing closed itemset mining algorithms

-Data structures to speed up closed itemset mining

-But may not build lattice, or include extents
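The closedness definition and the minsup parameter can be tried out on a small example. The sketch below is a brute-force check, not one of the existing mining algorithms: the transactions are a toy database matching the vertical table that appears later in this deck, and `support`, `is_closed` and `closed_frequent_itemsets` are illustrative names.

```python
from itertools import combinations

# Toy transactions: transaction id -> items.
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}
ITEMS = sorted(set().union(*TRANSACTIONS.values()))


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in TRANSACTIONS.values() if itemset <= items)


def is_closed(itemset):
    """Closed: no one-item extension keeps the same support.

    Single-item extensions are enough, because support never grows
    when items are added.
    """
    base = support(itemset)
    return all(support(itemset | {item}) < base
               for item in ITEMS if item not in itemset)


def closed_frequent_itemsets(minsup):
    """All closed itemsets whose support is greater than minsup."""
    found = []
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            itemset = frozenset(combo)
            if support(itemset) > minsup and is_closed(itemset):
                found.append(itemset)
    return found


closed = closed_frequent_itemsets(minsup=2)
print(["".join(sorted(s)) for s in closed])
# prints ['C', 'CD', 'CT', 'CW', 'ACW', 'CDW', 'ACTW']
```

With minsup set to 2, seven of the 31 non-empty itemsets survive, which is the kind of reduction the minsup parameter is meant to give.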


MCW algorithm (Mining Closed itemsets for Web apps)

Vertical data format

IT-tree (itemset-tidset tree) search space

- each node holds an itemset-tidset pair X × t(X), and all children have prefix X

Pruning

- 4 set difference closure operators

Subsumption check

- A look-up table to record all attributes and their occurrences in closed concepts

Lattice

- adding concepts following a general to specific order

Vertical data format (item: tidset)

D: 2 4 5 6
A: 1 3 4 5
C: 1 2 3 4 5 6
T: 1 3 5 6
W: 1 2 3 4 5

Look-up table

attribute  Concept_id
D          C1,C2
T          C3,C4
A          C4,C5
W          C2,C4,C5,C6
C          C1,C2,C3,C4,C5,C6,C7

Is {TA}{135} closed? i(135) = {TAWC}, so {TA} is not closed
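The closedness question for {TA} can be reproduced directly from the vertical format. This is a minimal sketch, assuming the tidsets shown on the slide; `t` and `i` mirror the slide's notation, and the code does not reproduce MCW's look-up table.

```python
# Vertical format from the slide: item -> tidset.
VERTICAL = {
    "D": {2, 4, 5, 6},
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}


def t(itemset):
    """Tidset of a non-empty itemset: intersection of its items' tidsets."""
    return set.intersection(*(VERTICAL[item] for item in itemset))


def i(tidset):
    """Items that occur in every transaction of the tidset."""
    return {item for item, tids in VERTICAL.items() if tidset <= tids}


def is_closed(itemset):
    """An itemset is closed when i(t(X)) gives back exactly X."""
    return i(t(itemset)) == set(itemset)


print(sorted(t({"T", "A"})))      # prints [1, 3, 5]
print(sorted(i({1, 3, 5})))       # prints ['A', 'C', 'T', 'W']
print(is_closed({"T", "A"}))      # prints False
```

Because i(t({T, A})) returns four items instead of two, {TA} is not closed, while its closure {TAWC} is.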


Closure operators

{TA}{135} = {TW}{135} -> {TAW}{135}

{D}{2456} ⊂ {C}{123456} -> {DC}{2456}

{D}{2456} and {W}{12345} -> {DW}{245}

Vertical data format (item: tidset)

D: 2 4 5 6
A: 1 3 4 5
C: 1 2 3 4 5 6
T: 1 3 5 6
W: 1 2 3 4 5

Based on CHARM (Zaki, 2005)
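The three examples on this slide all merge two itemset-tidset pairs and differ only in how the tidsets relate. The sketch below classifies that relation and builds the merged pair; `combine` is a hypothetical helper, and how each case prunes or extends the IT-tree is deliberately left out.

```python
def combine(left, right):
    """Merge two (itemset, tidset) pairs and name the tidset relation."""
    (x_items, x_tids), (y_items, y_tids) = left, right
    # The merged node always has the union of items and the common tids.
    merged = (x_items | y_items, x_tids & y_tids)
    if x_tids == y_tids:
        case = "equal tidsets"
    elif x_tids < y_tids:
        case = "left tidset is a proper subset"
    elif x_tids > y_tids:
        case = "left tidset is a proper superset"
    else:
        case = "tidsets differ"
    return case, merged


# Pairs taken from the slide examples.
TA = (frozenset("TA"), frozenset({1, 3, 5}))
TW = (frozenset("TW"), frozenset({1, 3, 5}))
D = (frozenset("D"), frozenset({2, 4, 5, 6}))
C = (frozenset("C"), frozenset({1, 2, 3, 4, 5, 6}))
W = (frozenset("W"), frozenset({1, 2, 3, 4, 5}))

for left, right in [(TA, TW), (D, C), (D, W)]:
    case, (items, tids) = combine(left, right)
    print(case, "".join(sorted(items)), sorted(tids))
```

The fourth relation (proper superset) does not appear among the slide examples but is included so that every pair of tidsets falls into exactly one case.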


Concept lattice as a visual analytics approach

Visual analytics

- Combination of information visualization with machine learning and data analysis (Keim et al., 2008)

Visualization of concept lattice

- Provides overview of the structure of the domain

- Means for further data analysis, e.g., classification, clustering, implication discovery, rule learning

Previous work

- Lattice navigation since Godin et al. (1993)

- Browsable concept lattice, e.g., Kim & Compton (2004)

Our current work

- Augmenting the concept lattice by integrating multiple sources of knowledge (Gene Ontology, protein interactions) for further analysis & machine learning


Case study: Yeast systems biology


Browsable concept lattice

[Figure: browsable concept lattice, with a "more general" label]


Biological validation (1): synthetic lethality

Synthetic lethal interaction: the cell is viable when either gene A or gene B is individually deleted, but cannot grow when both are deleted.

Our results show that 72 (119) concepts in the lattice are more likely than random chance, at p < 0.01 (p < 0.05), to contain synthetic lethal pairs.
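The slides do not say which statistical test produced these p-values. One common choice for this kind of question is a hypergeometric tail test on gene pairs, sketched below under that assumption; the function name and all numbers in the usage example are hypothetical and are not the study's data.

```python
from math import comb


def enrichment_p_value(pairs_in_concept, lethal_in_concept, total_pairs, total_lethal):
    """Probability of seeing at least `lethal_in_concept` synthetic lethal
    pairs when `pairs_in_concept` pairs are drawn at random, without
    replacement, from `total_pairs` pairs of which `total_lethal` are lethal.
    """
    upper = min(pairs_in_concept, total_lethal)
    tail = sum(
        comb(total_lethal, k)
        * comb(total_pairs - total_lethal, pairs_in_concept - k)
        for k in range(lethal_in_concept, upper + 1)
    )
    return tail / comb(total_pairs, pairs_in_concept)


# Hypothetical numbers: a concept with 10 genes has 45 gene pairs.
p = enrichment_p_value(pairs_in_concept=45, lethal_in_concept=5,
                       total_pairs=10_000, total_lethal=100)
print(f"p = {p:.3g}")
```

A concept would then count towards the 72 (or 119) reported above if its p-value fell below 0.01 (or 0.05); whether a multiple-testing correction was applied is not stated in the slides.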


Biological validation (2): ILP learning of concept definitions

Data sources feeding Inductive Logic Programming:

- Protein-protein interaction data

- Microarray gene-expression data

- Transcription factor binding data (ChIP-chip)

- Ontology data

- Biochemical pathway data

First-order rule:

concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E)


Example rule:

RSM19 required for H2O2 response; RSM19, RSM22 and MRPS17 in “mitochondrial ribosomal small subunit” stable complex; and RSM22, MRPS17 bound by transcription factors under amino acid starvation.

[Figure: diagram of the rule, labelled "Transcription factors"]
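To make the reading of the first-order rule from the previous slide concrete, the sketch below evaluates its body over a handful of facts. The argument order (complex, gene, gene) for ppi and (factor, gene) for tfbinds is an assumption, the complex identifier and the transcription factor names `tf_1` and `tf_2` are placeholders because the slide does not name them, and this is plain Python rather than an ILP system.

```python
from itertools import product

# Facts paraphrased from the slide; identifiers are placeholders.
PPI = {
    ("mito_small_subunit", "RSM19", "RSM22"),
    ("mito_small_subunit", "RSM19", "MRPS17"),
    ("mito_small_subunit", "RSM22", "MRPS17"),
}
TFBINDS = {("tf_1", "RSM22"), ("tf_2", "MRPS17")}


def covered_genes(ppi, tfbinds):
    """Genes A that satisfy
    ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E).
    D and F only need to exist, so a set of bound genes is enough.
    """
    bound = {gene for _tf, gene in tfbinds}
    covered = set()
    for (b1, a1, c), (b2, a2, e) in product(ppi, repeat=2):
        same_complex_and_gene = b1 == b2 and a1 == a2
        if same_complex_and_gene and (b1, c, e) in ppi and c in bound and e in bound:
            covered.add(a1)
    return covered


print(covered_genes(PPI, TFBINDS))  # prints {'RSM19'}
```

With these facts only RSM19 is covered, matching the description above: it shares a complex with RSM22 and MRPS17, which interact with each other and are both bound by a transcription factor.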


Conclusions

Many real-world domains are data-intensive

Machine learning and data mining applications required to generate predictive and useful outputs

We focus on knowledge-based learning for comprehensibility – use ontologies

Formal concept analysis as a framework for ontology structure

Use data mining techniques for efficient concept lattice generation

Visual analytics approach: browsable lattice, added background knowledge

Initial validation on a case study from yeast systems biology


Future work

Investigate pseudo-intents to simplify concept lattice

Investigate variants of concept lattice structures, e.g., concept lattice of inverse context

Add concept definitions to background knowledge in ILP