Development of a Chicken Unigene Database


Post on 03-Jan-2016


Page 1: Development of a Chicken Unigene Database

Development of a Chicken Unigene Database

Project No. 9

Mentors: Dr. Wellington Martins - Dr. Joan Burnside

Animal Science Dept., University of Delaware

Jianshan Tang Ruoming Jin

Department of CIS

University of Delaware

Lilian Lacoste

DBI - French National School of Aeronautics and Space

Page 2: Development of a Chicken Unigene Database

Results

Phrap Clustering Result:

17,090 ESTs -> Phrap -> 9,205 clusters (2,815 contigs + 6,390 singlets)

Page 3: Development of a Chicken Unigene Database

Second clustering method: using BLAST output

- Contig 1 -> BLAST output 1
- Contig 2 -> BLAST output 2
- Filtering and parsing of each BLAST output
- Comparing the parsed outputs with a similarity function
- Result: similarity matrix
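The slide does not say which similarity function was used. The sketch below assumes one plausible choice: after filtering each contig's BLAST hits by an e-value cutoff, two contigs are compared by the Jaccard overlap of their hit accessions. The names `filter_hits`, `jaccard`, `similarity_matrix` and the cutoff value are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical parsed BLAST output: contig id -> list of (subject accession, e-value).
Hits = Dict[str, List[Tuple[str, float]]]


def filter_hits(hits: Hits, max_evalue: float = 1e-10) -> Dict[str, Set[str]]:
    """Keep only subject accessions whose e-value passes the (assumed) cutoff."""
    return {
        contig: {acc for acc, evalue in rows if evalue <= max_evalue}
        for contig, rows in hits.items()
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity function: shared hits divided by all distinct hits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(hits: Hits, max_evalue: float = 1e-10):
    """Return the contig order and a symmetric matrix of pairwise similarities."""
    filtered = filter_hits(hits, max_evalue)
    contigs = sorted(filtered)
    matrix = [[jaccard(filtered[p], filtered[q]) for q in contigs] for p in contigs]
    return contigs, matrix


if __name__ == "__main__":
    demo = {
        "contig_1": [("AF178529.1", 1e-160), ("BC001965.1", 2e-31), ("AC009623.6", 1.7)],
        "contig_2": [("AF178529.1", 0.0), ("BC001965.1", 5e-44), ("AC084633.1", 0.11)],
    }
    order, matrix = similarity_matrix(demo)
    print(order, matrix)
```

Applying a threshold to such a matrix would then give the edges of a graph to cluster.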

Page 4: Development of a Chicken Unigene Database

What's gbc?

Graph Based Clustering
- Clustering: a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.
- Graph: the relations among the data can be expressed as a graph. If two nodes are related, an edge connects them.

Uses in bioinformatics
- Protein sequence clustering
- EST clustering
- Many other applications

Objectives of "gbc"
- Support different input formats
- Efficiently support clustering of very large sparse graphs
- Be flexible for the user

Page 5: Development of a Chicken Unigene Database

How to use gbc

Output
- The cluster number and all the nodes that belong to the cluster

Clique clustering
- A clique is a completely connected subgraph
- Each maximal clique in the graph becomes a cluster
- Clusters may overlap
- Generally produces small but very tight clusters

Single-link clustering
- A maximal connected subgraph becomes a cluster
- Produces larger but weaker clusters
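The two strategies described above can be sketched on a small adjacency-set graph. This is an illustration, not gbc's code: single-link clusters are found as connected components with a depth-first search, and maximal cliques with the basic Bron-Kerbosch recursion (no pivoting). The function names and the `Graph` type alias are assumptions.

```python
from typing import Dict, List, Set

# Undirected graph as adjacency sets; every node must appear as a key.
Graph = Dict[str, Set[str]]


def single_link_clusters(graph: Graph) -> List[Set[str]]:
    """Each maximal connected subgraph (connected component) is one cluster."""
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


def clique_clusters(graph: Graph) -> List[Set[str]]:
    """Each maximal clique is one cluster (Bron-Kerbosch without pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(set(r))  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v is done as a candidate
            x = x | {v}  # and is excluded from later cliques at this level

    expand(set(), set(graph), set())
    return cliques


if __name__ == "__main__":
    g = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
        "e": set(),
    }
    print(single_link_clusters(g))  # two clusters: a-b-c-d and e
    print(clique_clusters(g))       # three clusters; node c appears in two of them
```

In the demo graph, node `c` belongs to two cliques, which shows why clique clusters may overlap while single-link clusters never do.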

Page 6: Development of a Chicken Unigene Database

A little about the implementation

Two clustering algorithms
- Single-link
- Clique

Graph classes
- Efficiently support dense and sparse graphs
- Provide the same interface, so the clustering code needs no modification
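One way to read the graph-class design is an abstract interface with a dense and a sparse implementation behind it. The sketch below is an assumption about that design, not gbc's actual classes: `Graph`, `DenseGraph`, `SparseGraph` and `count_components` are hypothetical names, and nodes are assumed to be numbered from 0.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set


class Graph(ABC):
    """Hypothetical interface shared by both storage strategies."""

    @abstractmethod
    def add_edge(self, u: int, v: int) -> None:
        """Record an undirected edge between nodes u and v."""

    @abstractmethod
    def neighbors(self, u: int) -> Iterable[int]:
        """Return the nodes adjacent to u."""

    @abstractmethod
    def num_nodes(self) -> int:
        """Return the number of nodes, numbered 0..n-1."""


class DenseGraph(Graph):
    """Adjacency matrix: O(n^2) memory, suited to graphs with many edges."""

    def __init__(self, n: int) -> None:
        self._adj: List[List[bool]] = [[False] * n for _ in range(n)]

    def add_edge(self, u: int, v: int) -> None:
        self._adj[u][v] = self._adj[v][u] = True

    def neighbors(self, u: int) -> Iterable[int]:
        return [v for v, linked in enumerate(self._adj[u]) if linked]

    def num_nodes(self) -> int:
        return len(self._adj)


class SparseGraph(Graph):
    """Adjacency sets: memory grows with the number of edges."""

    def __init__(self, n: int) -> None:
        self._n = n
        self._adj: Dict[int, Set[int]] = {}

    def add_edge(self, u: int, v: int) -> None:
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def neighbors(self, u: int) -> Iterable[int]:
        return self._adj.get(u, set())

    def num_nodes(self) -> int:
        return self._n


def count_components(graph: Graph) -> int:
    """Single-link cluster count, written only against the Graph interface."""
    seen: Set[int] = set()
    count = 0
    for start in range(graph.num_nodes()):
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.neighbors(node))
    return count
```

Because `count_components` only calls interface methods, it runs unchanged on either class, which is the property the slide describes.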

Page 7: Development of a Chicken Unigene Database

Analysis program

- Reset BLAST output
- Change matrix threshold
- Reset semantics
- Run analysis
- New contig set
- Number of contigs
- Comparison algorithm
- Clustering algorithm
- Results output
- Analysis tools
- Process log output

Page 8: Development of a Chicken Unigene Database

Analysis tools: contig information

Display the BLAST output:
- sequence references
- sequence annotations
- percentage of matching base pairs

Display the list of contigs sorted according to their best matching percentage in the BLAST output
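The sorted contig list could be produced along these lines. This is a minimal sketch: the `ContigSummary` record and its field names are hypothetical, and descending order (best match first) is an assumption, since the slide does not state the direction.

```python
from typing import List, NamedTuple


class ContigSummary(NamedTuple):
    """Hypothetical per-contig record taken from the parsed BLAST output."""
    number: int
    size: int
    best_fraction: float  # best matching fraction, between 0 and 1


def sort_by_best_match(contigs: List[ContigSummary]) -> List[ContigSummary]:
    """Order contigs from the highest to the lowest best matching fraction."""
    return sorted(contigs, key=lambda c: c.best_fraction, reverse=True)


if __name__ == "__main__":
    contigs = [
        ContigSummary(number=79, size=743, best_fraction=0.43587786259541983),
        ContigSummary(number=133, size=740, best_fraction=0.9413109756097561),
    ]
    for c in sort_by_best_match(contigs):
        print(c.number, c.size, f"{c.best_fraction:.1%}")
```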

Page 9: Development of a Chicken Unigene Database

Analysis tool: EST selector

Display:
- frequency vs. length (in ESTs) of contigs
- list of ESTs in a contig

Allows selecting the best representative EST according to length and tissue type
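The slide does not give the selection rule, so the sketch below assumes one: prefer ESTs from a chosen tissue, then take the longest. It also computes the frequency-versus-length table, reading "length" as the number of ESTs in a contig. The `Est` record, the function names and the tissue labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Est:
    """Hypothetical EST record."""
    est_id: str
    length: int  # sequence length in base pairs
    tissue: str  # tissue the library was made from


def size_histogram(contigs: Dict[str, List[Est]]) -> Counter:
    """Map contig size (number of ESTs) to how many contigs have that size."""
    return Counter(len(ests) for ests in contigs.values())


def best_representative(ests: List[Est], preferred_tissue: Optional[str] = None) -> Est:
    """Assumed rule: ESTs from the preferred tissue win, ties broken by length."""
    return max(ests, key=lambda e: (e.tissue == preferred_tissue, e.length))


if __name__ == "__main__":
    contigs = {
        "contig_1": [Est("e1", 520, "liver"), Est("e2", 610, "fat"), Est("e3", 480, "liver")],
        "contig_2": [Est("e4", 700, "fat")],
    }
    print(size_histogram(contigs))
    print(best_representative(contigs["contig_1"]))
    print(best_representative(contigs["contig_1"], preferred_tissue="liver"))
```

With no preferred tissue the rule reduces to picking the longest EST in the contig.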

Page 10: Development of a Chicken Unigene Database

First results

On a set of 400 contigs representing 1000 ESTs

Contig number: 79
Contig size: 743
Best matching fraction: 0.43587786259541983
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 571 e-160
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 143 2e-31
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 143 2e-31
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 143 2e-31
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 143 2e-31
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 143 2e-31
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC009623.6|AC009623 Homo sapiens chromosome 8, clone RP11-219... 40 1.7

Contig number: 133
Contig size: 740
Best matching fraction: 0.9413109756097561
gb|AF178529.1|AF178529 Gallus gallus Rad54b (RAD54B) mRNA, compl... 1235 0.0
gb|BC001965.1|BC001965 Homo sapiens, RAD54, S. cerevisiae, homol... 184 5e-44
ref|XM_005161.3| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 184 5e-44
gb|AF112481.1|AF112481 Homo sapiens RAD54B protein (RAD54B) mRNA... 184 5e-44
ref|NM_012415.1| Homo sapiens RAD54, S. cerevisiae, homolog of, ... 184 5e-44
emb|AL133578.1|HSM801429 Homo sapiens mRNA; cDNA DKFZp434J1672 (... 184 5e-44
dbj|AP003534.1|AP003534 Homo sapiens genomic DNA, chromosome 8q2... 76 3e-11
gb|AC084633.1|CBRG45G04 Caenorhabditis briggsae cosmid G45G04, c... 44 0.11
dbj|AB018110.1|AB018110 Arabidopsis thaliana genomic DNA, chromo... 44 0.11
