Orthology Analysis
Erik Sonnhammer
Center for Genomics and Bioinformatics
Karolinska Institutet, Stockholm
Outline
• Basic concepts
• BLAST-based approaches to orthology
• Tree-based approaches to orthology
• Domain-level orthology
Homologs
= genes with a common origin
• May be genes in the same or in different organisms
• Does not say that function is identical
• Can only be true or false, and not a percentage!
• Homologs have the same 3D-structure layout
Homologs
[Figure: homologs are subdivided into orthologs and paralogs]
[Figure: gene tree over time. A duplication (D) of gene X in an ancient animal gives gene X and gene Y in an ancient mammal. Speciation (S) then separates gene X in human and rat, and gene Y in rat and human; a later duplication in the human lineage produces gene Y1 and gene Y2 in human. Orthologs are separated by speciation, paralogs by duplication. Y1 and Y2 are in-paralogs; the X and Y genes are out-paralogs.]
In/Out-paralog definition
In-paralogs ~ co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to a cluster in the other species.
Out-paralogs = not co-orthologs: paralogs that were duplicated before the speciation. Not necessarily in the same species.
Sonnhammer & Koonin, Trends Genet. 18:619-620 (2002)
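To make the definitions concrete, here is a minimal Python sketch that classifies gene pairs in a gene tree whose internal nodes are already labelled as speciation (S) or duplication (D). It assumes the tree contains a single speciation event, so a duplication lying below a speciation node is taken to have happened after it (in-paralogs), and any other duplication gives out-paralogs. The `Node` class, the `classify` function and the gene names are hypothetical; the tree is modelled on the gene X / gene Y example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Gene-tree node: leaves carry a gene name, internal nodes an event label."""
    name: str = ""
    event: str = ""  # "S" = speciation, "D" = duplication (internal nodes only)
    children: list[Node] = field(default_factory=list)


def path_to(root: Node, leaf: str) -> list[Node]:
    """Nodes on the path from the root down to the named leaf ([] if absent)."""
    if not root.children:
        return [root] if root.name == leaf else []
    for child in root.children:
        sub = path_to(child, leaf)
        if sub:
            return [root] + sub
    return []


def classify(root: Node, a: str, b: str) -> str:
    """Classify two genes by the event at their last common ancestor (LCA)."""
    pa, pb = path_to(root, a), path_to(root, b)
    if not pa or not pb:
        raise KeyError("gene not found in tree")
    depth = 0  # length of the shared prefix of both root-to-leaf paths
    while depth < min(len(pa), len(pb)) and pa[depth] is pb[depth]:
        depth += 1
    lca, above = pa[depth - 1], pa[: depth - 1]
    if lca.event == "S":
        return "ortholog"
    # Assumes one speciation event (two species): a duplication lying below a
    # speciation node happened after it, so the pair are in-paralogs.
    if any(node.event == "S" for node in above):
        return "in-paralog"
    return "out-paralog"


# Hypothetical gene tree modelled on the gene X / gene Y example.
tree = Node(event="D", children=[
    Node(event="S", children=[Node("X_human"), Node("X_rat")]),
    Node(event="S", children=[
        Node("Y_rat"),
        Node(event="D", children=[Node("Y1_human"), Node("Y2_human")]),
    ]),
])

for pair in [("X_human", "X_rat"), ("Y_rat", "Y1_human"),
             ("Y1_human", "Y2_human"), ("X_human", "Y_rat")]:
    print(pair, classify(tree, *pair))
```

Y_rat comes out as orthologous to both Y1_human and Y2_human, i.e. the human in-paralogs are co-orthologs of the rat gene, while X_human and Y_rat are out-paralogs even though they sit in different species.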
Orthologs for functional genomics
• Co-orthologs / inparalogs are more likely than outparalogs to have identical biochemical functions and biological roles.
• Co-orthologs can be used to discover human gene function via model organism experiments
• Co-orthologs are key to exploiting functional genomics/proteomics data in model organisms
Orthology and function conservation
• Orthology does not say anything about evolutionary distance.
• Close orthologs, e.g. human-mouse, are very likely to have the same biological role in the organism.
• Distant orthologs, e.g. human-worm, are less likely to have the same phenotypic role, but may have the same role in the corresponding pathway.
Ortholog Databases

| Sequence database | Orthology detection method | Ortholog database |
|---|---|---|
| SwissProt/TrEMBL proteomes | Inparanoid (BLAST) | Inparanoid |
| Proteomes | COGs (BLAST) | COGs / KOGs |
| TIGR gene index | COGs (BLAST) | TOGA/EGO |
| Proteomes | OrthoMCL (BLAST) | OrthoMCL |
| Pfam | Orthostrapper (tree) | HOPS |
| Pfam | RIO (tree) | |
How to find orthologs?
1. Calculate a phylogenetic tree and look for orthologs in the tree (Orthostrapper, RIO).
2. Two-way best matches between two species can be used to find orthologs without trees.
[However, in-paralogs are harder to find this way]
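A minimal sketch of the two-way best match idea, assuming the all-against-all similarity scores (for example BLAST bit scores) have already been computed and stored as `(query, subject) -> score` dictionaries; that input format, the function names and the gene names are hypothetical. A pair is reported only if each sequence is the other's best hit.

```python
from __future__ import annotations


def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def two_way_best_matches(a_vs_b: dict[tuple[str, str], float],
                         b_vs_a: dict[tuple[str, str], float]) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in species B and a is b's best hit in A."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical bit scores: human h1, h2 against mouse m1, m2, and back.
human_vs_mouse = {("h1", "m1"): 100.0, ("h1", "m2"): 40.0,
                  ("h2", "m1"): 90.0, ("h2", "m2"): 30.0}
mouse_vs_human = {("m1", "h1"): 100.0, ("m1", "h2"): 90.0,
                  ("m2", "h1"): 40.0, ("m2", "h2"): 30.0}
print(two_way_best_matches(human_vs_mouse, mouse_vs_human))  # {('h1', 'm1')}
```

In this toy example h2 is not reported even though it scores almost as well as h1 against m1, which illustrates why plain two-way best matches miss in-paralogs.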
Two-way best match approach to finding orthologs
COGs
[Figure: COG2813, with orthologs and out-paralogs marked]
Inpara-n-oid: Inparalog ‘n ortholog identification
Blue = species 1
Red = species 2
Inparanoid
Blue = species 1
Red = species 2
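Inparanoid starts from two-way best matches and then adds in-paralogs around them. A simplified sketch of that step, assuming the rule that a sequence joins the cluster if it is more similar to its own species' main ortholog than the two main orthologs are to each other. Confidence values and overlap resolution are left out, and the function name and input format are hypothetical.

```python
from __future__ import annotations


def inparanoid_cluster(main_a: str, main_b: str, score_ab: float,
                       hits_a: dict[str, float], hits_b: dict[str, float]
                       ) -> tuple[list[str], list[str]]:
    """Grow an ortholog cluster around a two-way best match (main_a, main_b).

    hits_a maps other species-A sequences to their score against main_a;
    hits_b does the same for species B and main_b. A sequence is taken as an
    in-paralog if it scores higher against its own main ortholog than the
    two main orthologs score against each other (simplified rule).
    """
    def inparalogs(main: str, hits: dict[str, float]) -> list[str]:
        keep = [seq for seq, score in hits.items() if seq != main and score > score_ab]
        return sorted(keep, key=lambda seq: -hits[seq])  # closest first

    return [main_a] + inparalogs(main_a, hits_a), [main_b] + inparalogs(main_b, hits_b)


# Hypothetical scores: h1-m1 is the two-way best match, scoring 100.
cluster = inparanoid_cluster("h1", "m1", 100.0,
                             hits_a={"h2": 150.0, "h3": 60.0},
                             hits_b={"m2": 80.0})
print(cluster)  # (['h1', 'h2'], ['m1'])
```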
Resolve overlapping clusters
• No overlap: no problems
• Partial overlap: separate
• Complete overlap: merge
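Inparalog score
Score for inparalog P = (scoreAP - scoreAB) / (scoreAA - scoreAB), where A is the main ortholog in the same species as P and B is the main ortholog in the other species
[Figure: 0-100% scale placing P between A (100%) and B (0%)]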
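The in-paralog score is a linear rescaling of raw similarity scores, so it is easy to compute once the three scores are known. A small sketch with made-up bit scores; the function name is hypothetical.

```python
def inparalog_score(score_ap: float, score_ab: float, score_aa: float) -> float:
    """In-paralog score of P relative to main ortholog A and its partner B.

    1.0 when P is as similar to A as A is to itself; 0.0 when P is only as
    similar to A as B is.
    """
    if score_aa <= score_ab:
        raise ValueError("self-score of A must exceed the A-B score")
    return (score_ap - score_ab) / (score_aa - score_ab)


# Hypothetical bit scores: A vs itself 500, A vs B 200, A vs P 350.
print(f"{inparalog_score(350, 200, 500):.0%}")  # 50%
```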
Confidence values for main orthologs from sampling
TVHIVDDEEPVR---KSLAFM---LTMNGFA
T+ ++DD   +R   K L  M   +T+ G A
TILLIDDHPMLRTGVKQLISMAPDITVVGEA
Sampling with replacement; insertions kept intact
GAFDEP---LVTHVR..........
GA +     ++T +R
GAEEHMAPDILTLLR..........
“Bootstrap alignment” -> “bootstrap score”
Confidence = (number of bootstrap alignments in which the pair are best-best matches) / (number of bootstraps)
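A rough sketch of the sampling idea: the alignment is cut into resampling units (single aligned columns, with each gap run kept as one unit so insertions stay intact), units are drawn with replacement to score a bootstrap alignment, and the confidence is the fraction of rounds in which the pair stays best-best. The scoring scheme (+1/-1 and an affine gap penalty) is a hypothetical toy, not a real substitution matrix, and all function names are illustrative.

```python
from __future__ import annotations

import random

GAP = "-"


def units(alignment: list[tuple[str, str]]) -> list[list[tuple[str, str]]]:
    """Split an alignment into resampling units: one per aligned column,
    and one per run of gapped columns so that each insertion stays intact."""
    out: list[list[tuple[str, str]]] = []
    for column in alignment:
        if GAP in column and out and GAP in out[-1][-1]:
            out[-1].append(column)  # extend the current gap run
        else:
            out.append([column])
    return out


def unit_score(unit: list[tuple[str, str]]) -> float:
    """Toy scoring (hypothetical): +1 match, -1 mismatch, affine gap penalty."""
    if GAP in unit[0]:
        return -2.0 - 0.5 * (len(unit) - 1)
    a, b = unit[0]
    return 1.0 if a == b else -1.0


def bootstrap_score(alignment: list[tuple[str, str]], rng: random.Random) -> float:
    """Score of one bootstrap alignment: units drawn with replacement."""
    pool = units(alignment)
    return sum(unit_score(rng.choice(pool)) for _ in range(len(pool)))


def confidence(hits_of_a: dict[str, list[tuple[str, str]]],
               hits_of_b: dict[str, list[tuple[str, str]]],
               a: str, b: str, n_boot: int = 100, seed: int = 0) -> float:
    """Fraction of bootstrap rounds in which a and b remain best-best matches."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        best_of_a = max(hits_of_a, key=lambda hit: bootstrap_score(hits_of_a[hit], rng))
        best_of_b = max(hits_of_b, key=lambda hit: bootstrap_score(hits_of_b[hit], rng))
        if best_of_a == b and best_of_b == a:
            wins += 1
    return wins / n_boot


# The first alignment on the slide: 31 columns, two 3-column insertions.
slide_alignment = list(zip("TVHIVDDEEPVR---KSLAFM---LTMNGFA",
                           "TILLIDDHPMLRTGVKQLISMAPDITVVGEA"))
print(len(slide_alignment), len(units(slide_alignment)))  # 31 27

# Hypothetical hits: h1-m1 align perfectly, the competitors align poorly.
perfect = [(c, c) for c in "MKFYSLPNFPEN"]
poor = [("A", "C")] * 12
conf = confidence({"m1": perfect, "m2": poor}, {"h1": perfect, "h2": poor}, "h1", "m1")
print(conf)  # 1.0
```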
http://inparanoid.cgb.ki.se
inparanoid.cgb.ki.se
Remm et al, J. Mol. Biol. 314:1041-1052 (2001)
Homo sapiens vs. C. elegans
Ortholog group sizes, human vs X
Version 2.5: 08-apr-03
151360 sequences from Swissprot-TREMBL
• 44996 sequences from Homo sapiens
• 26674 sequences from Mus musculus
• 20316 sequences from Drosophila melanogaster
• 20997 sequences from Caenorhabditis elegans
• 36751 sequences from Arabidopsis thaliana
• 6910 sequences from Saccharomyces cerevisiae
• 8709 sequences from Escherichia coli

| Species | Number of orthologs (orthologous groups) in H. sapiens | Number of sequences (in-paralogs) from H. sapiens in orthologous groups | Number of sequences (in-paralogs) from this species in orthologous groups |
|---|---|---|---|
| M. musculus | 12458 | 19532 | 17055 |
| D. melanogaster | 5549 | 15259 | 9854 |
| C. elegans | 4541 | 14222 | 6537 |
| A. thaliana | 3258 | 10863 | 12178 |
| S. cerevisiae | 2175 | 7265 | 2751 |
| E. coli | 599 | 2144 | 1037 |
Nr of inparalogs per ortholog group

| Species | Avg. inparalogs in model organism ortholog groups | Avg. inparalogs in human ortholog groups |
|---|---|---|
| Mouse | 1.36 | 1.56 |
| Fly | 1.77 | 2.75 |
| Worm | 1.44 | 3.13 |
| Mustard weed | 3.73 | 3.33 |
| Yeast | 1.26 | 3.34 |
| E. coli | 1.73 | 3.57 |
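The averages in this table appear to be the sequence counts from the Version 2.5 table divided by the number of ortholog groups. The short check below, which assumes that interpretation, reproduces every value with the last digit differing by at most 0.01.

```python
from __future__ import annotations

# Counts from the Version 2.5 table:
# species -> (ortholog groups, human sequences in groups, this species' sequences in groups)
COUNTS = {
    "Mouse": (12458, 19532, 17055),
    "Fly": (5549, 15259, 9854),
    "Worm": (4541, 14222, 6537),
    "Mustard weed": (3258, 10863, 12178),
    "Yeast": (2175, 7265, 2751),
    "E. coli": (599, 2144, 1037),
}


def averages(counts: dict[str, tuple[int, int, int]]) -> dict[str, tuple[float, float]]:
    """Average in-paralogs per ortholog group: (model organism, human)."""
    return {species: (model / groups, human / groups)
            for species, (groups, human, model) in counts.items()}


for species, (model_avg, human_avg) in averages(COUNTS).items():
    print(f"{species:<13} {model_avg:.2f} {human_avg:.2f}")
```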
Drawbacks of Blast-based orthology assignment
• No guarantee that the same segment is used in different sequences
• No evolutionary distance model
• Does not take multiple domains into account
Domain orthology
• Inparanoid Human-Fly ortholog pairs with domains in Pfam-A 13.0: 20335
• Different domain architectures: 5411
– Many of these are minor differences, e.g. 22 vs 21 Spectrin repeats
– Sometimes the difference is big:
[Figure: ef-hand, UCH architecture vs TBC, UCH architecture]
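One way to separate the minor architecture differences from the big ones is to collapse tandem repeats before comparing. A minimal sketch; the three categories, the function names and the domain lists are hypothetical illustrations, not how the numbers above were computed.

```python
from __future__ import annotations

from itertools import groupby


def collapse_repeats(architecture: list[str]) -> list[str]:
    """Collapse tandem copies of a domain, e.g. 22 x Spectrin -> [Spectrin]."""
    return [domain for domain, _ in groupby(architecture)]


def compare_architectures(a: list[str], b: list[str]) -> str:
    """Hypothetical three-way classification of two domain architectures."""
    if a == b:
        return "identical"
    if collapse_repeats(a) == collapse_repeats(b):
        return "minor: repeat copy number differs"
    return "major: different domains"


print(compare_architectures(["Spectrin"] * 22, ["Spectrin"] * 21))
print(compare_architectures(["ef-hand", "UCH"], ["TBC", "UCH"]))
```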
Tree-based approaches
Distance-based tree building
• Bootstrapping:
– randomly pick columns to build a bootstrap alignment, calculate tree
– repeat 1000 times; frequency of a node = bootstrap support
A1 MKFYSLPNFPEN
A2 MKYYKLPDLPDE
A3 MRFYTACENPRS
Distance matrix:

| | A2 | A3 |
|---|---|---|
| A1 | 4 | 8 |
| A2 | | 10 |

[Figure: tree for A1, A2 and A3 with branch lengths 1 (A1), 3 (A2), 5 (A3) and 2 (internal branch)]
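A minimal sketch of column bootstrapping on the three example sequences, using a plain mismatch count as the distance and, for three sequences, taking the first pair a UPGMA-style clustering would join as the tree. Plain mismatch counts give 6, 8 and 9 for these sequences, so the matrix on the slide (4, 8, 10) presumably uses another distance measure or is schematic. Function names are hypothetical.

```python
from __future__ import annotations

import random
from itertools import combinations

SEQS = {
    "A1": "MKFYSLPNFPEN",
    "A2": "MKYYKLPDLPDE",
    "A3": "MRFYTACENPRS",
}


def mismatch_distances(seqs: dict[str, str], columns: list[int]) -> dict[tuple[str, str], int]:
    """Mismatch count for every sequence pair, over the given alignment columns."""
    return {(a, b): sum(seqs[a][i] != seqs[b][i] for i in columns)
            for a, b in combinations(sorted(seqs), 2)}


def closest_pair(dist: dict[tuple[str, str], int]) -> tuple[str, str]:
    """The pair a UPGMA-style clustering would join first (first one wins on ties)."""
    return min(dist, key=dist.__getitem__)


def bootstrap_support(seqs: dict[str, str], pair: tuple[str, str],
                      n: int = 1000, seed: int = 1) -> float:
    """Fraction of column-resampled alignments in which `pair` is joined first."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    hits = 0
    for _ in range(n):
        columns = [rng.randrange(length) for _ in range(length)]  # with replacement
        if closest_pair(mismatch_distances(seqs, columns)) == pair:
            hits += 1
    return hits / n


full = mismatch_distances(SEQS, list(range(12)))
print(full)  # {('A1', 'A2'): 6, ('A1', 'A3'): 8, ('A2', 'A3'): 9}
print(bootstrap_support(SEQS, ("A1", "A2")))
```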
Orthology by tree reconciliation
[Figure: a gene tree reconciled with a species tree; the reconciliation infers 2 duplications and 2 losses]
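The core of tree reconciliation is the LCA mapping: each gene-tree node is mapped to the last common ancestor, in the species tree, of the species below it, and a node that maps to the same species node as one of its children is inferred to be a duplication. A minimal sketch that counts duplications only (loss counting is omitted); the species tree, node names and tuple encoding are hypothetical.

```python
from __future__ import annotations

# Species tree ((human, mouse), fly) as child -> parent links;
# "HM" is the human-mouse ancestor (hypothetical node names).
SPECIES_PARENT: dict[str, str | None] = {
    "human": "HM", "mouse": "HM", "HM": "root", "fly": "root", "root": None,
}


def lineage(node: str | None) -> list[str]:
    """The species node followed by all of its ancestors."""
    out = []
    while node is not None:
        out.append(node)
        node = SPECIES_PARENT[node]
    return out


def species_lca(a: str, b: str) -> str:
    """Last common ancestor of two species-tree nodes."""
    above_b = set(lineage(b))
    return next(node for node in lineage(a) if node in above_b)


def reconcile(gene_tree) -> tuple[str, int]:
    """LCA-map a binary gene tree onto the species tree.

    Leaves are species names; internal nodes are (left, right) tuples.
    Returns the species node the root maps to and the number of inferred
    duplications (gene-tree nodes mapping to the same species node as a child).
    """
    if isinstance(gene_tree, str):
        return gene_tree, 0
    left, right = gene_tree
    m_left, d_left = reconcile(left)
    m_right, d_right = reconcile(right)
    m = species_lca(m_left, m_right)
    is_dup = m in (m_left, m_right)
    return m, d_left + d_right + int(is_dup)


gene_tree = ((("human", "mouse"), ("human", "mouse")), "fly")
print(reconcile(gene_tree))  # ('root', 1)
```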
Drawbacks of tree reconciliation for orthology assignment
• Assumption that the species tree is fully known
• Does not give confidence values
• Gene trees become unreliable when involving a lot of sequences (more data -> less certainty)
• Computationally expensive
Partial tree reconciliation
• Find pairwise orthologs by automatically parsing the tree.
Pairwise orthology confidence by ‘orthostrapping’
The original tree with bootstrap support values
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168; bootstrap support values 99, 45, 85, 100, 82 and 99]
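Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168]
Fly vs worm:

| Worm | AAF52138.1 | AAF49194.1 |
|---|---|---|
| C14F5.4 | 0 | 1 |
| T04F8.1 | 1 | 0 |
| C47D12.3 | 0 | 0 |
| Y6E2A.9 | 0 | 0 |
| F37H8.4 | 0 | 0 |
| AH6.2 | 0 | 0 |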
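Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168]
Fly vs worm:

| Worm | AAF52138.1 | AAF49194.1 |
|---|---|---|
| C14F5.4 | 0 | 2 |
| T04F8.1 | 2 | 0 |
| C47D12.3 | 1 | 0 |
| Y6E2A.9 | 0 | 0 |
| F37H8.4 | 0 | 0 |
| AH6.2 | 0 | 0 |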
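Pairwise orthology confidence by ‘orthostrapping’
[Figure: tree with leaves C14F5.4, AAF49194.1, AH6.2, F37H8.4, Y6E2A.9, C47D12.3, T04F8.1, AAF52138.1 and PIR-S67168; bootstrap support values 99, 45, 85, 100, 82 and 99]
Fly vs worm:

| Worm | AAF52138.1 | AAF49194.1 |
|---|---|---|
| C14F5.4 | 0 | 99 |
| T04F8.1 | 98 | 0 |
| C47D12.3 | 81 | 0 |
| Y6E2A.9 | 77 | 0 |
| F37H8.4 | 77 | 0 |
| AH6.2 | 77 | 0 |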
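A sketch of the orthostrapping idea: parse each bootstrap tree for ortholog pairs and count in how many trees each pair is supported. As a stand-in for Orthostrapper's own tree-parsing rules, which may differ, it uses the simple species-overlap rule (a node is a speciation if its two subtrees share no species). The gene names, the naming convention in `species_of` and the two trees are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import product


def leaves(tree) -> list[str]:
    """Leaf labels of a binary tree given as nested (left, right) tuples."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(gene: str) -> str:
    """Hypothetical naming convention: 'w...' = worm, 'f...' = fly."""
    return {"w": "worm", "f": "fly"}[gene[0]]


def ortholog_pairs(tree) -> set[tuple[str, str]]:
    """Pairs whose last common ancestor is judged a speciation node.

    Species-overlap rule (an assumption here): a node is a speciation if
    its two subtrees share no species, otherwise a duplication.
    """
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    left_leaves, right_leaves = leaves(left), leaves(right)
    if not {species_of(g) for g in left_leaves} & {species_of(g) for g in right_leaves}:
        pairs |= {tuple(sorted(p)) for p in product(left_leaves, right_leaves)}
    return pairs


def orthostrap(bootstrap_trees) -> Counter:
    """How many bootstrap trees support each ortholog pair."""
    counts: Counter = Counter()
    for tree in bootstrap_trees:
        counts.update(ortholog_pairs(tree))
    return counts


trees = [
    (("w1", "f1"), (("w2", "w3"), "f2")),
    (("w1", "f1"), ("w2", ("w3", "f2"))),
]
for pair, n in sorted(orthostrap(trees).items()):
    print(pair, f"{n}/{len(trees)}")
```

With many bootstrap trees, the counts play the role of the orthostrap values in the tables above: a pair that is resolved as orthologous in most trees gets high support, while a pair whose placement varies gets lower support.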
orthostrapper.cgb.ki.se
Orthology is not transitive!
Multiple species at different distances may give erroneous groups that include out-paralogs.
Orthology is not transitive!
-> Orthology is strictly defined for only 2 species/clades
Combining species of different distances is very dangerous
But OK to combine multiple equidistant ones
[Figure: gene tree with leaves Y, H1, D1, H2 and D2]
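A small illustration of non-transitivity, using the slide's labels on a hypothetical topology ((H1, D1), (H2, D2)), Y and the species-overlap rule (orthologs if the two subtrees under their last common ancestor share no species), and assuming the first letter of each label encodes the species. Y is orthologous to every other gene, yet several of those genes are not orthologous to each other, so merging pairs transitively would pull out-paralogs into one group.

```python
from __future__ import annotations

from itertools import combinations


def leaves(tree) -> set[str]:
    """Leaf labels of a binary tree given as nested (left, right) tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def lca(tree, a: str, b: str):
    """Smallest subtree that contains both a and b."""
    for child in tree:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca(child, a, b)
    return tree


def species(gene: str) -> str:
    return gene[0]  # hypothetical: the first letter of the label encodes the species


def orthologous(tree, a: str, b: str) -> bool:
    """Species-overlap rule: orthologs if the two subtrees under the LCA share no species."""
    left, right = lca(tree, a, b)
    return not ({species(g) for g in leaves(left)} & {species(g) for g in leaves(right)})


tree = ((("H1", "D1"), ("H2", "D2")), "Y")  # hypothetical topology
genes = ["Y", "H1", "D1", "H2", "D2"]
print(all(orthologous(tree, "Y", g) for g in genes[1:]))  # True: Y is orthologous to all four
not_orthologs = [(a, b) for a, b in combinations(genes, 2) if not orthologous(tree, a, b)]
print(not_orthologs)  # [('H1', 'H2'), ('H1', 'D2'), ('D1', 'H2'), ('D1', 'D2')]
```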
Domain-level orthology
HOPS - Hierarchy of Orthologs and Paralogs
1. All species in Pfam are bundled in groups according to this scheme:
eukaryota
– metazoa: nematoda, arthropoda, chordata
– viridiplantae
– fungi
2. Apply Orthostrapper to groups at the same level in Pfam families
3. Display results in NIFAS
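Comparing only groups at the same level amounts to taking all pairs of sibling groups in the scheme. A minimal sketch, assuming the hierarchy is encoded as parent to child groups; the dictionary and function name are hypothetical. It yields six comparisons, which appears consistent with the "6 pairwise comparisons" in the HOPS results.

```python
from __future__ import annotations

from itertools import combinations

# Grouping scheme encoded as parent -> child groups (encoding assumed).
HIERARCHY = {
    "eukaryota": ["metazoa", "viridiplantae", "fungi"],
    "metazoa": ["nematoda", "arthropoda", "chordata"],
}


def same_level_comparisons(hierarchy: dict[str, list[str]]) -> list[tuple[str, str]]:
    """All pairs of sibling groups, i.e. groups at the same level under one parent."""
    return [pair for children in hierarchy.values() for pair in combinations(children, 2)]


pairs = same_level_comparisons(HIERARCHY)
print(len(pairs))  # 6
for a, b in pairs:
    print(f"{a} vs {b}")
```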
Pfam
Pfam in brief:
[Figure: Pfam workflow. A manually curated SEED alignment of representative members is used to build a profile-HMM (HMMER 2.0), which searches the database to produce the automatically made FULL alignment; each family also has a description file.]
• Release 13.0 (April 2004):
– 7426 Pfam-A domain families
– Based on 1160000 sequences (Swissprot & Trembl)
– 21980 unique Pfam-A domain architectures
– 73% of all proteins have >=1 Pfam-A domain
HOPS results
Pfam 10, 6190 families:
• 2450 families (40%) have HOPS orthologs
• 1319 families (21%) have HOPS orthologs in all 6 pairwise comparisons
• 286356 pairwise orthology assignments (> 75% orthostrap)
Storm and Sonnhammer, Genome Research 13:2353-2362 (2003)
Ways to access HOPS
• NIFAS graphical browser
• By sequence ID at Pfam.cgb.ki.se/HOPS
• Flatfiles (Orthostrap tables of 2 clades)
Pfam.cgb.ki.se/HOPS
Evolution of Domain Architectures
NIFAS:
ATP sulfurylase/APS kinase
Orthologous shuffled domains?
ATP sulfurylase domain, metazoa vs fungi
APS kinase domain
HOPS orthologs of PPS1_HUMAN (ATP sulfurylase/APS kinase)
Summary of ATP sulfurylases/APS kinases: shuffled non-orthologous domains
[Figure: domain architectures in fungi and metazoa]
Conclusions
• Orthologs can be detected by
– Blast: fast
– tree: slow but less error-prone
• Species at different evolutionary distances should not be combined in orthology analysis
• Inparanoid and Orthostrapper were designed to find inparalogs but not outparalogs
• HOPS/NIFAS can be used to find domain orthologs and analyze domain architecture evolution
Future perspectives
• Multiparanoid – multiple species merging of pairwise Inparalogs.
• Functional divergence among inparalogs
Acknowledgments
– Christian Storm
– Maido Remm
– Andrey Alexeyenko
– Volker Hollich
– Mats Jonsson
http://sonnhammer.cgb.ki.se