
Page 1

Shahriyar Hossain, Munirul Islam, Jesmin, Hasan M Jamil
Integration Informatics Laboratory, Computer Science, Wayne State University

Department of Genetic Engineering and Biotechnology, University of Dhaka, Bangladesh

BIBM 2008

PhyQL: A Phylogenetic Visual Query Engine

Integration Informatics Research Group

Page 2

What is a Phylogenetic Tree?

Page 3

Page 4

Queries: Least Common Ancestor

<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
      . . .
    </inode>
  </inode>
</root>

for $root in doc("tree.xml")//root
return <span>
         <h1>{ $root/node/text() }</h1>
       </span>
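A minimal Python sketch of the least-common-ancestor query over a tree in the nested <root>/<inode>/<node> form shown above. The slide elides part of its tree (". . ."), so the small tree below, including the extra "mammals" leaf, is a hypothetical example. The helper names path_to, lca and leaves are illustrative and not part of PhyQL.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree in the slide's format; "mammals" is an assumed extra leaf
# standing in for the part of the original tree elided with ". . .".
TREE_XML = """
<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
      <node>mammals</node>
    </inode>
  </inode>
</root>
"""

def path_to(elem, label, path=None):
    """Internal elements (root, inode, ...) from the root down to the parent of
    the leaf <node> whose text equals label; None if the label is absent."""
    path = (path or []) + [elem]
    for child in elem:
        if child.tag == "node" and (child.text or "").strip() == label:
            return path
        if child.tag == "inode":
            found = path_to(child, label, path)
            if found is not None:
                return found
    return None

def lca(root, a, b):
    """Deepest internal element shared by the root-to-leaf paths of a and b."""
    pa, pb = path_to(root, a), path_to(root, b)
    if pa is None or pb is None:
        raise KeyError("label not found")
    common = None
    for x, y in zip(pa, pb):
        if x is not y:
            break
        common = x
    return common

def leaves(elem):
    """Leaf labels under elem, in document order."""
    return [n.text.strip() for n in elem.iter("node")]
```

For example, lca(root, "frogs", "lungfish") with root = ET.fromstring(TREE_XML) returns the outer <inode>, whose leaves are lungfish, salamanders, frogs and mammals.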

Page 5

Phylogenetic Query Language:

Select: select the subset of trees that match given criteria

Join: join two trees based on a pair of nodes

Subset: subset queries retrieve part of a given tree
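A minimal Python sketch of the three operators over a hypothetical in-memory form in which a tree is a dict from a parent label to its child labels. The function names are illustrative, and the join is only one assumed reading of "based on a pair of nodes": node b_node of the second tree is identified with node a_node of the first, so b_node's subtree is grafted there. The slides describe PhyQL evaluating its queries as logic rules in XSB, not as Python.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]  # hypothetical form: parent label -> child labels

def nodes(tree: Tree) -> Set[str]:
    """Every label that appears as a parent or a child."""
    return set(tree) | {c for cs in tree.values() for c in cs}

def select(trees: Dict[str, Tree], required: Set[str]) -> List[str]:
    """Select: ids of the trees that contain every required label."""
    return [tid for tid, t in trees.items() if required <= nodes(t)]

def subtree(tree: Tree, root: str) -> Tree:
    """Subset: the part of the tree rooted at `root`."""
    out: Tree = {}
    stack = [root]
    while stack:
        n = stack.pop()
        if n in tree:              # leaves have no entry of their own
            out[n] = list(tree[n])
            stack.extend(tree[n])
    return out

def join(a: Tree, a_node: str, b: Tree, b_node: str) -> Tree:
    """Join (assumed semantics): graft b's subtree at b_node onto a at a_node."""
    merged = {p: list(cs) for p, cs in a.items()}
    for p, cs in subtree(b, b_node).items():
        key = a_node if p == b_node else p   # b_node is merged into a_node
        merged.setdefault(key, [])
        for c in cs:
            if c not in merged[key]:
                merged[key].append(c)
    return merged
```

Keeping each operator a pure function over plain dicts makes them easy to compose, for example selecting trees first and then projecting a subtree from each match.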


Page 6

Using Path Operators

SubTree Projection

Tree Join


Page 7

PhyQL:

[Figure: PhyQL architecture. Components shown: User; Visual Query Interface with SELECT, JOIN and SUBTREE; Translator; XSB; DB; XML/NEXUS input from the user; Wrappers; Interoperable Databases.]

Page 8

Why XSB?

Eliminates the left-recursion problem, e.g. for path(X,Z) :- path(X,Y), edge(Y,Z).

Stores intermediate results (by tabling)

Model-based (the order in which rules are written doesn't matter):

    :- table path/2.
    path(X,Y) :- edge(X,Y).
    path(X,Z) :- path(X,Y), edge(Y,Z).

Its in-memory database queries are an order of magnitude faster than systems such as tuProlog.
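XSB avoids the infinite loop a plain Prolog interpreter hits on the left-recursive rule by tabling (SLG resolution). As a rough illustration of why the tabled program terminates, here is a Python sketch of a different technique, semi-naive bottom-up evaluation, which computes the same set of path facts for this program: the two path rules are applied until no new facts appear. The function name and edge data are hypothetical.

```python
def transitive_closure(edges):
    """Bottom-up evaluation of
         path(X,Y) :- edge(X,Y).
         path(X,Z) :- path(X,Y), edge(Y,Z).
    Each round joins only the newly derived path facts (delta) with edge,
    and stops when a round adds nothing, so cycles cannot cause a loop."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)
    path = set(edges)        # first rule: every edge is a path
    delta = set(edges)
    while delta:
        new = {(x, z) for (x, y) in delta for z in succ.get(y, ())} - path
        path |= new
        delta = new
    return path
```

On a three-node cycle 1 -> 2 -> 3 -> 1 the loop stops after three rounds with all nine pairs, which is the termination behaviour tabling gives the Prolog rules.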


:- odbc_import(conn, 'tbl_treeinfo'('rootId', 'author'), tree).
:- odbc_import(conn, 'tbl_nodeinfo'('nodeId', 'nodename'), node).
:- odbc_import(conn, 'tbl_edge'('parentId', 'childId'), edge).
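The imported relations are ordinary tables, so the same reachability question can also be posed to the database directly. A Python sketch using the standard sqlite3 module and a recursive common table expression over a table named like the slide's tbl_edge(parentId, childId). The edge data is hypothetical, and this SQL route is an alternative to, not a description of, PhyQL's XSB evaluation.

```python
import sqlite3

def build(conn, edges):
    """Create a tbl_edge(parentId, childId) table holding the given edges."""
    conn.execute("CREATE TABLE tbl_edge (parentId INTEGER, childId INTEGER)")
    conn.executemany("INSERT INTO tbl_edge VALUES (?, ?)", edges)

def descendants(conn, node):
    """Every node reachable from `node`, an SQL analogue of path/2.
    UNION (not UNION ALL) drops repeated rows, so cycles terminate."""
    rows = conn.execute("""
        WITH RECURSIVE path(x, y) AS (
            SELECT parentId, childId FROM tbl_edge WHERE parentId = ?
            UNION
            SELECT path.x, tbl_edge.childId
            FROM path JOIN tbl_edge ON tbl_edge.parentId = path.y
        )
        SELECT y FROM path ORDER BY y""", (node,))
    return [r[0] for r in rows]
```

Usage: build(conn, [(1, 2), (1, 3), (2, 4), (4, 5)]) on an in-memory connection, then descendants(conn, 2) gives [4, 5].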

Page 9

<tree author="stern">
  <node type="*">
    <node type="?">
      <node> Stanhopea_gibbosa </node>
      <node> Stanhopea_vasquezii </node>
    </node>
    <node> Stanhopea_shuttleworthii </node>
  </node>
</tree>

node(Y1, 'Stanhopea_shuttleworthii'),
node(Y2, 'Stanhopea_gibbosa'),
node(Y3, 'Stanhopea_vasquezii'),
edge(Y4, Y2), edge(Y4, Y3),
lca(Y0, Y4, Y1),
edge(Y0, Y1)
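A Python sketch that evaluates the translated goal conjunct by conjunct against node/edge data shaped like the tbl_nodeinfo and tbl_edge tables. The node ids and the tree are hypothetical, the reading of lca(Y0, Y4, Y1) as "Y0 is the least common ancestor of Y4 and Y1" is an assumption, and the type="*" and type="?" annotations of the visual query are not modelled.

```python
# Hypothetical data; internal nodes 1 and 2 have no names.
NAMES = {3: "Stanhopea_gibbosa", 4: "Stanhopea_vasquezii",
         5: "Stanhopea_shuttleworthii"}
EDGES = {(1, 2), (1, 5), (2, 3), (2, 4)}           # (parentId, childId)
PARENT = {child: parent for parent, child in EDGES}

def ancestors(n):
    """n followed by its ancestors, walking parent links up to the root."""
    chain = [n]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lca(a, b):
    """Assumed meaning of lca(Y0, A, B): Y0 is the least common ancestor."""
    seen = set(ancestors(a))
    return next(x for x in ancestors(b) if x in seen)

def node_id(name):
    """node(Y, Name): look up the id bound to a species name."""
    return next(i for i, n in NAMES.items() if n == name)

def match():
    """Return every (Y0, Y4) binding that satisfies the whole goal."""
    y1 = node_id("Stanhopea_shuttleworthii")
    y2 = node_id("Stanhopea_gibbosa")
    y3 = node_id("Stanhopea_vasquezii")
    results = []
    for y4 in sorted({p for p, _ in EDGES}):
        if (y4, y2) in EDGES and (y4, y3) in EDGES:   # edge(Y4,Y2), edge(Y4,Y3)
            y0 = lca(y4, y1)                            # lca(Y0,Y4,Y1)
            if (y0, y1) in EDGES:                       # edge(Y0,Y1)
                results.append((y0, y4))
    return results
```

On this data match() binds Y4 to the parent of the two sister species and Y0 to the root, giving [(1, 2)].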


Page 10

Page 11

Page 12

Page 13

Page 14

Summary

PhyQL offers a simple web-based visual query interface

Logic-based tree query operations

Modifications to query tools only require changes in logic rules

The proposed architecture can also be applied to protein-protein interaction networks, metabolic pathways, etc.

Future Work:

Database interoperability: allow retrieving and integrating phylogenetic data during query submission

ReQuery: query on the result set

Tree similarity estimation

Page 15

Thank You!

me: http://homopan.wayne.edu/PhD Students/Munirul Islam/index.htm

Page 16

Uses of Phylogenetic Trees:

1. Date events of divergence of species
2. What is the most recent common ancestor of all living species?
3. Identify the geographic origins of new disease outbreaks

Page 17

Crimson

Uses nested subtrees to avoid long strings

Zheng, Y., S. Fisher, S. Cohen, S. Guo, J. Kim, and S. B. Davidson. 2006. Crimson: A Data Management System to Support Evaluating Phylogenetic Tree Reconstruction Algorithms. 32nd International Conference on Very Large Data Bases, ACM, pp. 1231-1234.

Page 18

Dewey system:

[Figure: tree with leaves A, B, C, D and E, each node labeled with its Dewey path. The root is 0 with children 0.1 and 0.2; A = 0.1.1 and B = 0.1.2; 0.2 has children 0.2.1 and E = 0.2.2; 0.2.1 has children C = 0.2.1.1 and D = 0.2.1.2.]

Page 19

Label   Path
Root    0
NULL    0.1
A       0.1.1
B       0.1.2
NULL    0.2
NULL    0.2.1
C       0.2.1.1
D       0.2.1.2
E       0.2.2

Find the clade for Z = (C, D)

Find the common path prefix, reading from the left

SELECT * FROM nodes WHERE (path LIKE '0.2.1%');
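A Python sketch of the Dewey clade lookup using the standard sqlite3 module and the table above. The clade of a set of labels is rooted at the longest common prefix of their paths, compared component by component. The query matches the prefix itself or paths that extend it with a ".", because a bare LIKE '0.2.1%' would also match a sibling numbered 0.2.10 once a node has ten or more children. The function names are illustrative.

```python
import sqlite3

# Rows from the slide's Label/Path table (None stands for NULL).
ROWS = [("Root", "0"), (None, "0.1"), ("A", "0.1.1"), ("B", "0.1.2"),
        (None, "0.2"), (None, "0.2.1"), ("C", "0.2.1.1"), ("D", "0.2.1.2"),
        ("E", "0.2.2")]

def load():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (label TEXT, path TEXT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", ROWS)
    return conn

def common_prefix(paths):
    """Longest common Dewey prefix, compared component by component."""
    prefix = []
    for comps in zip(*(p.split(".") for p in paths)):
        if len(set(comps)) != 1:
            break
        prefix.append(comps[0])
    return ".".join(prefix)

def clade(conn, labels):
    """All nodes in the smallest clade containing the given labels."""
    marks = ",".join("?" * len(labels))
    paths = [r[0] for r in conn.execute(
        f"SELECT path FROM nodes WHERE label IN ({marks})", labels)]
    anc = common_prefix(paths)
    return conn.execute(
        "SELECT label, path FROM nodes WHERE path = ? OR path LIKE ? ORDER BY path",
        (anc, anc + ".%")).fetchall()
```

For C and D the common prefix is 0.2.1, so clade(load(), ["C", "D"]) returns that internal node together with C and D.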


Page 20

[Figure: the tree with leaves A, B, C, D and E, each node annotated with a left and a right ID between 1 and 18.]

A depth-first traversal assigns each node a left ID when it is entered and a right ID when it is left
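A Python sketch of the left/right numbering as one recursive depth-first traversal with a shared counter. The tree matches the slides, but the internal-node names (root, n1, n2, n3) are hypothetical because the slides leave those nodes unlabelled; nested_set and is_descendant are illustrative names.

```python
from itertools import count

# (name, children) pairs; internal-node names are hypothetical.
TREE = ("root", [("n1", [("A", []), ("B", [])]),
                 ("n2", [("n3", [("C", []), ("D", [])]), ("E", [])])])

def nested_set(tree):
    """Return {name: (left, right)} from one depth-first traversal."""
    counter = count(1)
    ids = {}

    def visit(node):
        name, children = node
        left = next(counter)               # left ID on entering the node
        for child in children:
            visit(child)
        ids[name] = (left, next(counter))  # right ID on leaving it

    visit(tree)
    return ids

def is_descendant(ids, a, b):
    """True if a lies in the subtree rooted at b (interval containment)."""
    return ids[b][0] <= ids[a][0] and ids[a][1] <= ids[b][1]
```

The resulting intervals reproduce the slide's values, e.g. A = (3, 4), the clade holding C, D and E = (8, 17) and the root = (1, 18).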


Page 21

Label   Left   Right
NULL    1      18
NULL    2      7
A       3      4
B       5      6
NULL    8      17
NULL    9      14
C       10     11
D       12     13
E       15     16

SELECT *
FROM nodes
INNER JOIN nodes AS include
  ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
WHERE include.node_id = 5;

Minimum Spanning Clade of Node 5
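A Python sketch that loads the Label/Left/Right table into an in-memory SQLite database and runs the same self-join. The slide does not say which row has node_id 5; here ids are assigned 1 to 9 in table order (an assumption), which makes node 5 the internal node with interval 8 to 17.

```python
import sqlite3

# Rows from the slide's table; None stands for an unlabelled internal node.
ROWS = [(None, 1, 18), (None, 2, 7), ("A", 3, 4), ("B", 5, 6), (None, 8, 17),
        (None, 9, 14), ("C", 10, 11), ("D", 12, 13), ("E", 15, 16)]

def load():
    conn = sqlite3.connect(":memory:")
    # node_id is auto-assigned 1..9 in insertion order (assumed numbering).
    conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, label TEXT,"
                 " left_id INTEGER, right_id INTEGER)")
    conn.executemany(
        "INSERT INTO nodes (label, left_id, right_id) VALUES (?, ?, ?)", ROWS)
    return conn

def clade(conn, node_id):
    """Every node whose left_id falls inside the given node's interval."""
    return conn.execute("""
        SELECT nodes.label, nodes.left_id, nodes.right_id
        FROM nodes
        INNER JOIN nodes AS include
          ON nodes.left_id BETWEEN include.left_id AND include.right_id
        WHERE include.node_id = ?
        ORDER BY nodes.left_id""", (node_id,)).fetchall()
```

With that numbering, clade(load(), 5) returns the node (8, 17) followed by (9, 14), C, D and E; the interval test needs no recursion, which is the point of the left/right encoding.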
