![Page 1: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/1.jpg)
Virus Host co-evolution in sight of their proteomes and
codon preferences
Bioinformatics project 2007
Yaar Reuveni
Instructor - Michal Linial
![Page 2: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/2.jpg)
Outline:
My project is composed of two phases:
1. Phase I: The virus host web tool – VirOsNet. You are welcome to visit at: www.virosnet.cs.huji.ac.il
2. Phase II: Virus Host co-evolution research using codon usage analysis.
![Page 3: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/3.jpg)
Viruses: basically a capsid envelope that contains genetic information.
Viruses cannot replicate by themselves, and depend on the host for reproduction.
A virus's main purpose in life: to enter a host and use its facilities to reproduce.
![Page 4: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/4.jpg)
Viruses fight back:
![Page 5: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/5.jpg)
Phase I: VirOsNet
VirOsNet provides a database and tools for exploring virus evolution and virus-host co-evolution
![Page 6: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/6.jpg)
Background and Motivation:
Ample examples suggest that viruses often steal information from their hosts.
Viruses must optimize their amount of genetic material and physical size.
Viruses evolve very fast:
o Hard to trace.
o Might change by switching hosts.
o Shuffle their genetic material.
![Page 7: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/7.jpg)
Phase (I) main objective:
Compare all viral proteins to all known proteins and detect resemblance.
Meaning: in what way do viral proteins "resemble" any of the other known proteins in our world?
![Page 8: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/8.jpg)
Objectives and possible outcomes (i)
Clever search: Provide crossbreeding factors when searching
Offer comparisons of viruses relative to the proteome of their known hosts
Stolen elements: where were they stolen from? Was it from the host?
Mimicking phenomenon: detect host-protein mimicry
When did it happen: Evolutionary tracking
![Page 9: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/9.jpg)
Objectives and possible outcomes (ii)
Recent events: indicated by exceptional similarity search results.
Insights on viruses and their proteomes.
Long term: pharmaceutical applications. Proposal of drug targets.
![Page 10: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/10.jpg)
Methods:
Data is from the ProtoNet DB (currently ~1.8 million proteins).
All proteins are from UniProt.
New tables were added to the DB, specialized for host-virus relations.
Precomputed BLAST (BLOSUM62) and dynamic BLAST options.
The entry is a viral protein; BLAST search results are sorted by descending E-values.
Several display schemes.
Each result is associated with domain information (InterPro).
Download options for next-phase analysis.
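As a small illustration of ordering BLAST results by E-value, the sketch below sorts a list of hits in either direction. The hit names and E-values are hypothetical, and the `(name, e_value)` tuple layout is an assumption rather than VirOsNet's actual data format. Note that lower E-values indicate more significant hits.

```python
def sort_hits(hits, descending=False):
    """Return hits ordered by E-value.

    hits: iterable of (name, e_value) tuples (hypothetical layout).
    descending=False puts the most significant (lowest E-value) hits first.
    """
    return sorted(hits, key=lambda hit: hit[1], reverse=descending)


# Hypothetical hits, for illustration only.
example_hits = [("hit_A", 1e-5), ("hit_B", 3e-40), ("hit_C", 0.2)]

most_significant_first = sort_hits(example_hits)
least_significant_first = sort_hits(example_hits, descending=True)
```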
![Page 11: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/11.jpg)
Tool overview:
The tool works in a 4-step scheme:
1. Step 1: search for a virus to query, using one of the search methods
2. Step 2: choose a specific virus
3. Step 3: choose one of its proteins, and the BLAST properties
4. Step 4: choose one of the BLAST results to get its pairwise alignment
![Page 12: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/12.jpg)
7,763 viruses and 199,563 proteins
![Page 13: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/13.jpg)
Some Statistics
Entry point to viruses according to their genetic material complexity
![Page 14: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/14.jpg)
Example: check all dsRNA viruses affecting eukaryotes
![Page 19: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/19.jpg)
Case study: Abelson murine leukemia virus:
A VERY close homolog of a human and a mouse protein tyrosine kinase that:
(i) Regulates the cytoskeleton during cell differentiation, cell division and cell adhesion
(ii) Regulates DNA repair, potentially in severe damage.
The viral protein causes cancer (active site mutation)
Let's look at it ...
![Page 20: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/20.jpg)
Active site
![Page 21: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/21.jpg)
Summary Phase I:
Pros:
Platform for studying viruses relative to hosts
A discovery tool
Rich BLAST options for a wider evolutionary view
Crossbreeding with host data (i.e. InterPro domains).
Dynamic view of BLAST results as a group (ProtoMesh)
Cons:
Usability for the average biologist still needs to improve
VirOsNet can get very slow on overload or in some of the filtering options.
![Page 22: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/22.jpg)
Phase II: Codon usage
Virus-host classification using codon usage analysis with SVM
Figure adapted from L. Merkel, N. Budisa, Veränderung des genetischen Codes, BIOspektrum 2006, 12, 41.
![Page 23: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/23.jpg)
RNA codons:

| 1st base | 2nd base: U | 2nd base: C | 2nd base: A | 2nd base: G |
|---|---|---|---|---|
| U | UUU (Phe/F) Phenylalanine | UCU (Ser/S) Serine | UAU (Tyr/Y) Tyrosine | UGU (Cys/C) Cysteine |
| U | UUC (Phe/F) Phenylalanine | UCC (Ser/S) Serine | UAC (Tyr/Y) Tyrosine | UGC (Cys/C) Cysteine |
| U | UUA (Leu/L) Leucine | UCA (Ser/S) Serine | UAA Ochre (Stop) | UGA Opal (Stop) |
| U | UUG (Leu/L) Leucine | UCG (Ser/S) Serine | UAG Amber (Stop) | UGG (Trp/W) Tryptophan |
| C | CUU (Leu/L) Leucine | CCU (Pro/P) Proline | CAU (His/H) Histidine | CGU (Arg/R) Arginine |
| C | CUC (Leu/L) Leucine | CCC (Pro/P) Proline | CAC (His/H) Histidine | CGC (Arg/R) Arginine |
| C | CUA (Leu/L) Leucine | CCA (Pro/P) Proline | CAA (Gln/Q) Glutamine | CGA (Arg/R) Arginine |
| C | CUG (Leu/L) Leucine | CCG (Pro/P) Proline | CAG (Gln/Q) Glutamine | CGG (Arg/R) Arginine |
| A | AUU (Ile/I) Isoleucine | ACU (Thr/T) Threonine | AAU (Asn/N) Asparagine | AGU (Ser/S) Serine |
| A | AUC (Ile/I) Isoleucine | ACC (Thr/T) Threonine | AAC (Asn/N) Asparagine | AGC (Ser/S) Serine |
| A | AUA (Ile/I) Isoleucine | ACA (Thr/T) Threonine | AAA (Lys/K) Lysine | AGA (Arg/R) Arginine |
| A | AUG (Met/M) Methionine, Start | ACG (Thr/T) Threonine | AAG (Lys/K) Lysine | AGG (Arg/R) Arginine |
| G | GUU (Val/V) Valine | GCU (Ala/A) Alanine | GAU (Asp/D) Aspartic acid | GGU (Gly/G) Glycine |
| G | GUC (Val/V) Valine | GCC (Ala/A) Alanine | GAC (Asp/D) Aspartic acid | GGC (Gly/G) Glycine |
| G | GUA (Val/V) Valine | GCA (Ala/A) Alanine | GAA (Glu/E) Glutamic acid | GGA (Gly/G) Glycine |
| G | GUG (Val/V) Valine | GCG (Ala/A) Alanine | GAG (Glu/E) Glutamic acid | GGG (Gly/G) Glycine |
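The codon table can be built compactly in code. The following is a minimal sketch of the standard genetic code as a lookup from RNA codon to one-letter amino acid, with a small translation helper. The names `CODON_TABLE` and `translate` are chosen here for illustration and are not part of the project.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered by first, second and third base over U, C, A, G.
# '*' marks a stop codon (UAA, UAG, UGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 RNA codons to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an in-frame RNA sequence, stopping at the first stop codon."""
    protein = []
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("AUGUUUGGCUAA")` reads AUG, UUU, GGC and then stops at UAA.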
![Page 24: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/24.jpg)
Main question:
Given a viral protein, determine which organisms might be potential hosts of the virus.
The basis for the hypothesis: an optimization of viruses toward their hosts.
![Page 25: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/25.jpg)
Objectives:
Create a classification tool that receives a viral protein and predicts its potential hosts.
Classify all the proteins into different classes, using a maximum-margin hyperplane.
Provide different levels of classification.
Create a "host rank" for a given viral protein for each of its potential hosts.
Results: may suggest a "virus cross-species potential index"
![Page 26: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/26.jpg)
Methods:
Collect and arrange all the codon usage data (or other relevant data for this classification).
Analyze the data: normalization and processing.
Unsupervised learning and clustering for better understanding of the data.
Given the codon usage for all species, use the SVM algorithm to create a predictor for new specimens.
Provide various levels of classes for the codon data.
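The slides do not specify an SVM implementation, so the following is only a sketch of the idea. It trains a linear maximum-margin classifier with Pegasos-style sub-gradient descent on the hinge loss (a swapped-in training method, chosen because it fits in a few lines), then ranks hosts by decision value in a one-vs-rest setup. All function names and the two-dimensional toy data are hypothetical; the real inputs would be 64-position codon usage vectors.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200):
    """Pegasos-style training of a linear SVM; labels must be +1 or -1.

    A constant 1.0 is appended to every sample so the last weight acts as a
    bias term (it is regularised like the other weights, for simplicity).
    """
    w = [0.0] * (len(samples[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            # Shrink step from the regulariser: (1 - eta * lam) == 1 - 1/t.
            w = [(1.0 - 1.0 / t) * wi for wi in w]
            if margin < 1.0:               # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def decision_value(w, x):
    """Signed score of x relative to the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def predict(w, x):
    return 1 if decision_value(w, x) >= 0 else -1


def rank_hosts(models, x):
    """models: dict of host name -> weight vector (one-vs-rest). Best host first."""
    scores = {host: decision_value(w, x) for host, w in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two well-separated groups in two dimensions.
toy_samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
toy_labels = [1, 1, 1, -1, -1, -1]
toy_models = {
    "host_A": train_linear_svm(toy_samples, toy_labels),
    "host_B": train_linear_svm(toy_samples, [-y for y in toy_labels]),
}
```

Sorting hosts by decision value is one simple way to obtain a "host rank"; whether raw decision values are comparable across classes is a design question the sketch does not settle.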
![Page 27: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/27.jpg)
About the data:
Codon usage is calculated for each species.
Each species is represented by a 64-position vector.
The question of normalization:
o standard: normalize to 1.
o functional: per amino acid, or by entropy.
o percentage: per column
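Codons per amino acid: R 6, L 6, S 6, T 4, P 4, A 4, G 4, V 4, K 2, N 2, Q 2, H 2, E 2, D 2, Y 2, C 2, F 2, I 3, M 1, W 1, STOP 3
[Figure: codon usage matrix, one row per species, columns 1 ... 64]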
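The 64-position vector and the normalization options can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the codon order (U, C, A, G) is one possible convention and not necessarily the one used in the project, and "per column" is read here as dividing each codon column by its sum over all species. The entropy variant is not shown.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))


def codon_counts(cds):
    """Count the 64 codons of an in-frame coding sequence (DNA or RNA).

    Codons containing characters other than U, C, A, G are ignored.
    """
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    counts = Counter(rna[i:i + 3] for i in range(0, usable, 3))
    return [counts.get(codon, 0) for codon in CODONS]


def normalize_standard(counts):
    """Standard normalization: the whole vector sums to 1."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def normalize_per_amino_acid(counts):
    """Functional normalization: synonymous codons of each amino acid sum to 1."""
    totals = Counter()
    for codon, n in zip(CODONS, counts):
        totals[CODON_TO_AA[codon]] += n
    return [n / totals[CODON_TO_AA[codon]] if n else 0.0
            for codon, n in zip(CODONS, counts)]


def normalize_per_column(matrix):
    """Percentage normalization: each codon column sums to 1 across species."""
    column_sums = [sum(col) for col in zip(*matrix)]
    return [[v / s if s else 0.0 for v, s in zip(row, column_sums)]
            for row in matrix]
```

With the per-amino-acid variant, a species that uses UUU and UUC equally often gets 0.5 in both positions regardless of how much phenylalanine its proteins contain, which separates codon preference from amino acid composition.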
![Page 28: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/28.jpg)
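Bacteria
[Figure: codon usage for Bacteria, amino acids ordered R L S T P A G V K N Q H E D Y C F I M W STOP, with 6 6 6 4 4 4 4 4 2 2 2 2 2 2 2 2 2 3 1 1 3 codons each]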
![Page 29: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/29.jpg)
Primates
![Page 30: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/30.jpg)
Data from Nakamura:
Codon usage tabulated from the international DNA sequence databases
Nakamura, Y., Gojobori, T. and Ikemura, T. (2000) Nucl. Acids Res. 28, 292.
Downloading the codon usage table
The data covers all species (including viruses).
![Page 31: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/31.jpg)
Usage distribution:
[Figure panels: Bacteria, Invertebrates, Primates, Viral, Plants, Rodents]
![Page 32: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/32.jpg)
Usage distribution:
Positions 1-13
![Page 33: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/33.jpg)
Our data:
Diverse codon usage was expected between different taxonomic groups.
There are 703 distinct known hosts in our DB and 2152 distinct viruses with known hosts.
I created an interface for extracting the CDS data from the coding data we have in ProtoNet.
I used the same convention for the vector.
![Page 34: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/34.jpg)
In ProtoNet (version 5.1): 16,567 viruses and 409,726 proteins
![Page 35: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/35.jpg)
Dividing our data into groups:
Group name
Fungi
Bacteria
Viridiplantae (green plants)
Rodents
Primates
Fish
Aves (birds)
Tetrapoda
Arthropoda
Taxid4751233090998999443
32443
8782325236656
distinct Hosts
4463393831313418788
Number viruses not distinct
916914142511015474262761263
Distinct viruses
9161329162868163741549175
Distinct viruses with CDS
9151304150816163631462169
![Page 36: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/36.jpg)
Who infects what?
[Figure: number of viruses infecting each host group (Primates, Rodents, Aves, Tetrapoda, Fish, Bacteria, Fungi, Plants, Arthropoda, Others)]
![Page 37: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/37.jpg)
These are all different virus groups:
![Page 38: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/38.jpg)
Comparison: Positions 1-12
Looks Promising!
![Page 39: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/39.jpg)
Clustering: preliminary results
Using the COMPACT tool set (COMPACT: A Comparative Package for Clustering Assessment)
Varshavsky et al., 2005, ISPA: 159-167.
Visualization of results
Scoring
![Page 40: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/40.jpg)
Hierarchical - Percentage Normalization
![Page 41: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/41.jpg)
Hierarchical - Standard Normalization
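Hierarchical clustering of normalized usage vectors can be illustrated with a naive agglomerative procedure. This is a minimal sketch, assuming Euclidean distance and average linkage; it is not the COMPACT implementation, and the function name and toy vectors are hypothetical.

```python
from math import dist


def average_linkage_clusters(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of indices into `vectors`.
    Naive implementation, suitable only for small inputs.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy vectors standing in for normalized codon usage vectors.
toy_vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
toy_clusters = average_linkage_clusters(toy_vectors, 2)
```

Running the same procedure on differently normalized vectors is one way to compare how the choice of normalization affects the grouping.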
![Page 42: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/42.jpg)
Summary phase II:
All data is organized, accessible and will update along with the ProtoNet DB.
Comprehensive analysis created a good understanding of the data.
Future plans:
Decide on a good division into classes.
Use the SVM algorithm to create a classifier that, given a virus's codon preferences, guesses potential hosts.
Create an interface that offers this service.
![Page 43: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/43.jpg)
Acknowledgements:
Thank you to all the people that helped:
Michal Linial
Iris Bahir
Menachem Fromer
Alexander Savenok
Michael Dvorkin
Roy Varshavsky
![Page 44: Virus Host co-evolution in sight of their proteomes and codon preferences Bioinformatics project 2007 Yaar Reuveni Instructor - Michal Linial](https://reader036.vdocument.in/reader036/viewer/2022062308/56649ee65503460f94bf6c73/html5/thumbnails/44.jpg)
Thank You!