First release of HOGENOM, a database of homologous genes from complete genomes
Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive
Université Claude Bernard - Lyon 1
Simon Penel, Laurent Duret, Pascal Calvat, Jean-François Dufayard, Guy Perrière, Manolo Gouy.
POSTER JO 60
Homologous Gene Databases
Research fields:
• Comparative proteome/genome analysis
• Phylogenetic studies
• Assignment of orthology/paralogy relationships
• Development of generic and specialised databases
– HOVERGEN: families of homologous vertebrate genes
– HOBACGEN: families of homologous bacterial genes
– NureBase, RTKdb, Hoppsigen, Mitalib, Polymorphix, etc.
Contents:
• Nucleic and protein sequences
• Sequence annotations
• Taxonomic data
• Protein multiple alignments
• Phylogenetic trees
The HoGenom database: Homologous Gene Families from Fully Sequenced Organisms
European project TEMBLOR
The HoGenom database: Building the Database
European Bioinformatics Institute
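Data selection
• Protein sequences: SwissProt, TrEMBL and TrEMBL-new, plus proteome sets (Human, Mouse, Rat, etc.)
• Entries shared by several species ("1 sequence, many species") are split into "1 sequence, 1 species" records
• Filtering of low-complexity regions (SEG)
• Local pairwise alignments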
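A minimal Python sketch of the "1 sequence, many species → 1 sequence, 1 species" step. It assumes a hypothetical input record (`ProteinEntry`) listing every species an entry is annotated with, keeps only species from a given set of completely sequenced organisms, and uses a hypothetical ID scheme that suffixes the accession with the species name when an entry is split. SEG filtering is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinEntry:
    """Hypothetical input record: one entry, possibly annotated with several species."""
    accession: str
    sequence: str
    species: tuple


@dataclass(frozen=True)
class SpeciesSequence:
    """Output record: one sequence assigned to exactly one species."""
    seq_id: str
    sequence: str
    species: str


def split_by_species(entries, complete_genomes):
    """Expand multi-species entries into one record per species,
    keeping only species whose genome is completely sequenced."""
    records = []
    for entry in entries:
        kept = [sp for sp in entry.species if sp in complete_genomes]
        for sp in kept:
            # Keep the bare accession when a single species remains;
            # otherwise use a hypothetical "<accession>_<Genus_species>" ID.
            if len(kept) == 1:
                seq_id = entry.accession
            else:
                seq_id = f"{entry.accession}_{sp.replace(' ', '_')}"
            records.append(SpeciesSequence(seq_id, entry.sequence, sp))
    return records


if __name__ == "__main__":
    entries = [
        ProteinEntry("P1", "MKV", ("Homo sapiens", "Mus musculus")),
        ProteinEntry("P2", "MAA", ("Rattus norvegicus",)),
    ]
    for rec in split_by_species(entries, {"Homo sapiens", "Mus musculus", "Rattus norvegicus"}):
        print(rec)
```

Duplicating the sequence per species keeps every downstream step (similarity search, clustering, trees) working on records that carry exactly one taxon.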
Similarity search
• BLASTP, BLOSUM62 matrix, E-value ≤ 10⁻⁴
• Calculations parallelised at IN2P3
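The E-value cut-off can be applied as a post-filter on BLASTP results. This sketch assumes BLAST+ tabular output with the default 12 columns, as produced for instance by `blastp -query proteins.fa -db proteins -matrix BLOSUM62 -evalue 1e-4 -outfmt 6` (file and database names are hypothetical); the poster does not state which BLAST version or output format was used. Self-hits are discarded.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")
NUMERIC = {"pident": float, "length": int, "mismatch": int, "gapopen": int,
           "qstart": int, "qend": int, "sstart": int, "send": int,
           "evalue": float, "bitscore": float}
E_VALUE_MAX = 1e-4  # threshold given on the poster


def significant_hits(tabular_text, e_value_max=E_VALUE_MAX):
    """Parse -outfmt 6 lines and keep non-self hits with E-value <= e_value_max."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = {name: NUMERIC.get(name, str)(value) for name, value in zip(FIELDS, row)}
        if hit["qseqid"] == hit["sseqid"]:
            continue  # every protein trivially matches itself
        if hit["evalue"] <= e_value_max:
            hits.append(hit)
    return hits
```

Note that `pident` is percent identity, not similarity; a similarity criterion such as the one used for clustering would need a different column (BLAST+ can report `ppos`, the percentage of positive-scoring matches, in a custom `-outfmt 6` field list).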
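Clustering into families
• Linkage criteria for a pair of sequences: HSP ≥ 80 % of sequence length and similarity ≥ 50 % (links A–B and A–C place A, B and C in the same family)
1: Clustering of complete sequences into families
2: Adding partial sequences to the families defined in step 1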
Result: sequences A, B and C are clustered into one protein family
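The two-step clustering can be sketched as single-linkage grouping with a union-find structure: two complete sequences are linked when one HSP meets both thresholds, and families are the connected components, which is how links A–B and A–C put A, B and C together. Several details are assumptions not stated on the poster: coverage is required on both complete sequences, `similarity` is a 0 to 1 fraction, and each partial sequence joins the family of its most similar qualifying complete sequence without ever merging families. `Hit` is a hypothetical record.

```python
from dataclasses import dataclass

MIN_COVERAGE = 0.80    # HSP must span >= 80 % of the sequence length
MIN_SIMILARITY = 0.50  # >= 50 % similarity within the HSP


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one HSP between sequences a and b."""
    a: str
    b: str
    hsp_length: int
    similarity: float  # fraction in [0, 1]


class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def _covers(hit, seq_id, lengths):
    return hit.hsp_length >= MIN_COVERAGE * lengths[seq_id]


def cluster_families(complete, partial, lengths, hits):
    complete_set, partial_set = set(complete), set(partial)

    # Step 1: single-linkage clustering of complete sequences.
    uf = UnionFind(complete)
    for h in hits:
        if (h.a in complete_set and h.b in complete_set
                and h.similarity >= MIN_SIMILARITY
                and _covers(h, h.a, lengths) and _covers(h, h.b, lengths)):
            uf.union(h.a, h.b)
    families = {}
    for seq in complete:
        families.setdefault(uf.find(seq), []).append(seq)

    # Step 2: attach each partial sequence to one existing family.
    best = {}  # partial id -> (similarity, complete id)
    for h in hits:
        for p, c in ((h.a, h.b), (h.b, h.a)):
            if (p in partial_set and c in complete_set
                    and h.similarity >= MIN_SIMILARITY and _covers(h, p, lengths)
                    and (p not in best or h.similarity > best[p][0])):
                best[p] = (h.similarity, c)
    for p, (_, c) in best.items():
        families[uf.find(c)].append(p)
    # Partial sequences without a qualifying hit remain unassigned.
    return list(families.values())
```

Clustering complete sequences first keeps short fragments from bridging unrelated families through single linkage.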
Alignments and trees
• Protein family (sequences A to G): multiple alignment with CLUSTAL W, default parameters
• Phylogenetic tree: BIONJ (neighbor joining on observed divergence); partial sequences yield a distance matrix with missing values
• Rooting: mid-point
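Observed divergence is the proportion of differing sites between two aligned sequences (the p-distance). The sketch below computes it over sites where neither sequence has a gap, returning `None` (a missing value) when two partial sequences do not overlap, then builds an unrooted tree by plain Saitou–Nei neighbor joining. BIONJ, named on the poster, refines the matrix-reduction step of neighbor joining with variance estimates, and its treatment of missing distances is not reproduced here: this sketch rejects an incomplete matrix. Internal node names (`node0`, `node1`, …) are hypothetical.

```python
def observed_divergence(s1, s2, gap="-"):
    """p-distance over sites where neither aligned sequence has a gap.
    Returns None when no site is comparable (non-overlapping partial sequences)."""
    compared = differing = 0
    for x, y in zip(s1, s2):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else None


def neighbor_joining(names, dist):
    """Plain neighbor joining (Saitou & Nei).
    dist maps frozenset({a, b}) -> distance for every pair of names.
    Returns the unrooted tree as a list of (node, child, branch_length) edges."""
    if any(v is None for v in dist.values()):
        raise ValueError("plain NJ needs a complete distance matrix")
    nodes, d, edges = list(names), dict(dist), []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: sum of distances from i to all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r[i] - r[j].
        _, i, j = min(((n - 2) * d[frozenset((i, j))] - r[i] - r[j], i, j)
                      for idx, i in enumerate(nodes) for j in nodes[idx + 1:])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        u = f"node{counter}"
        counter += 1
        edges += [(u, i, li), (u, j, dij - li)]
        # Distance from the new node u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

With five taxa a to e and distances d(a,b)=5, d(a,c)=9, d(a,d)=9, d(a,e)=8, d(b,c)=10, d(b,d)=10, d(b,e)=9, d(c,d)=8, d(c,e)=7, d(d,e)=3, the first join is a with b, with branch lengths 2 and 3.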
The HoGenom database: Contents
• 117 organisms (counts by group: 10, 16 and 91), including these 10 eukaryotes: Arabidopsis thaliana (plant), Caenorhabditis elegans (nematode), Drosophila melanogaster (fly), Encephalitozoon cuniculi (microsporidium), Guillardia theta (alga), Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Saccharomyces cerevisiae (yeast), Schizosaccharomyces pombe (fungus)
• 423 577 proteins, 527 925 CDS
• 41 907 families
• Chart percentages: 31 %, 9 %, 60 %
Querying the databases
• WWW Query: query on sequences and families according to multiple criteria
• Cross Taxa: query on families according to complex taxonomic criteria
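Taxonomic criteria of the Cross Taxa kind can be pictured as set tests on the species represented in each family. A minimal sketch, assuming a hypothetical mapping from family ID to the set of species with at least one member; the parameter names `present_in_all`, `present_in_any` and `absent_from` and the family IDs are illustrative, not the HOGENOM interface.

```python
def cross_taxa_query(family_species, present_in_all=(), present_in_any=(), absent_from=()):
    """Return sorted family IDs satisfying simple taxonomic criteria.
    present_in_all: every listed species must be represented in the family.
    present_in_any: at least one listed species must be represented (ignored if empty).
    absent_from:    none of the listed species may be represented."""
    required = set(present_in_all)
    any_of = set(present_in_any)
    excluded = set(absent_from)
    matches = []
    for family, species in family_species.items():
        if not required <= species:
            continue
        if any_of and not (any_of & species):
            continue
        if excluded & species:
            continue
        matches.append(family)
    return sorted(matches)


if __name__ == "__main__":
    families = {
        "FAM1": {"Homo sapiens", "Mus musculus", "Rattus norvegicus"},
        "FAM2": {"Homo sapiens", "Saccharomyces cerevisiae"},
        "FAM3": {"Mus musculus", "Rattus norvegicus"},
    }
    # Rodent families with no yeast member.
    print(cross_taxa_query(families,
                           present_in_all=("Mus musculus", "Rattus norvegicus"),
                           absent_from=("Saccharomyces cerevisiae",)))
```

More complex criteria (for example, combining taxonomic groups rather than single species) could be built by expanding each group into its species set before calling the same tests.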
To be continued…