![Page 1: Identification of Protein Domains. Orthologs and Paralogs Describing evolutionary relationships among genes (proteins): Two major ways of creating homologous](https://reader033.vdocument.in/reader033/viewer/2022051516/56649e9f5503460f94ba2424/html5/thumbnails/1.jpg)
Identification of Protein Domains
Orthologs and Paralogs
Describing evolutionary relationships among genes (proteins):
The two major ways of creating homologous genes are gene duplication and speciation.
Homology alone is not sufficiently well-defined; therefore, additional terms are used:
Orthologs are two genes from two different species that derive from a single gene in the last common ancestor of the species.
[Figure: gene tree with relationships labeled "ortho" and "para"]
Paralogs are genes that derive from a single gene that was duplicated within a genome.
Co-orthologs are paralogs produced by duplications of orthologs subsequent to a given speciation event.
[Figure: gene tree with relationships labeled "co-ortho"]
Inparalogs are paralogs in a given lineage that all evolved by gene duplications that happened after the speciation event.
[Figure: gene tree with relationships labeled "in-para" and "out-para"]
Outparalogs are paralogs in the given lineage that evolved by gene duplications that happened before the speciation event.
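Once a gene tree has been reconciled with the species tree, so that every internal node is labeled as a speciation or a duplication, these definitions reduce to a simple rule: two genes are orthologs if their last common ancestor is a speciation node and paralogs if it is a duplication node; relative to a chosen speciation, paralogs are inparalogs if their duplication lies below it and outparalogs if it lies above it. The Python sketch below assumes such a labeled tree is already available (the reconciliation step itself is not shown). The `Node` class, the function names and the toy tree are hypothetical illustrations, not part of any published tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Node of a reconciled gene tree (hypothetical minimal structure).
    Internal nodes carry event "S" (speciation) or "D" (duplication);
    leaves are genes and have event=None."""
    name: str
    event: Optional[str] = None
    parent: Optional[Node] = field(default=None, repr=False)
    children: List[Node] = field(default_factory=list, repr=False)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Node) -> List[Node]:
    """The node itself, then its parent, grandparent, ... up to the root."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def is_ancestor(anc: Node, node: Node) -> bool:
    return any(n is anc for n in path_to_root(node))


def last_common_ancestor(a: Node, b: Node) -> Node:
    on_path_of_a = {id(n) for n in path_to_root(a)}
    for n in path_to_root(b):
        if id(n) in on_path_of_a:
            return n
    raise ValueError("genes are not in the same tree")


def relationship(a: Node, b: Node) -> str:
    """Orthologs if the genes diverged at a speciation, paralogs if at a duplication."""
    return "orthologs" if last_common_ancestor(a, b).event == "S" else "paralogs"


def paralog_type(a: Node, b: Node, speciation: Node) -> str:
    """Classify two paralogs relative to a given speciation node."""
    dup = last_common_ancestor(a, b)
    if dup.event != "D":
        raise ValueError("genes are not paralogs")
    if is_ancestor(speciation, dup):  # duplication happened after the speciation
        return "inparalogs"
    if is_ancestor(dup, speciation):  # duplication happened before the speciation
        return "outparalogs"
    raise ValueError("duplication is not on a lineage through this speciation")


# Hypothetical toy tree: an ancestral gene duplicated into copies A and B,
# then the human/mouse speciation, then a human-specific duplication of copy A.
root = Node("dup_AB", "D")
spec_a = root.add(Node("spec_A", "S"))
spec_b = root.add(Node("spec_B", "S"))
dup_human = spec_a.add(Node("dup_hA", "D"))
human_a1 = dup_human.add(Node("human_A1"))
human_a2 = dup_human.add(Node("human_A2"))
mouse_a = spec_a.add(Node("mouse_A"))
human_b = spec_b.add(Node("human_B"))
mouse_b = spec_b.add(Node("mouse_B"))

print(relationship(human_a1, mouse_a))           # orthologs
print(relationship(human_a1, human_b))           # paralogs
print(paralog_type(human_a1, human_a2, spec_a))  # inparalogs
print(paralog_type(human_a1, human_b, spec_a))   # outparalogs
```

In this toy tree, human_A1 and human_A2 are both orthologs of mouse_A, i.e. they are co-orthologs produced by a duplication after the speciation.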
Orthologs and Paralogs
• Orthologs – evolutionary and functional counterparts in different species.
• Inparalogs – important for detecting lineage-specific adaptations.
Proteins:
• Databases of protein sequences are growing rapidly due to genome sequencing projects.
• Many new proteins belong to protein families with known functions (significant sequence similarity).
• Only a small fraction of known proteins have experimentally determined functions.
• Databases providing computational sequence analysis allow us to classify new proteins into known families and thus predict their function.
Protein Domains
• A domain is an independent structural unit which can be found alone or in conjunction with other domains or repeats.
• Module = mobile domain (a domain that occurs in many different protein contexts).
• Different domains have distinct functions.
• Many eukaryotic proteins have multiple domains.
Protein Domains
[Figure: PX domain with ligand; SH3 domain with ligand]
Identifying Protein Domains:
Problems:
– Defining the members of each family.
– Building multiple alignments of the members.
– Finding the boundaries of the domain.
Identifying Protein Domains
• Little structural data is available, so domains are identified by sequence analysis.
• Sequence characterization of families helps to determine their 3D structure and molecular functions.
• Even when the structure of the domain is not known it may be possible to define its boundaries from sequence alone.
Identifying Protein Domains:
Motif matches are often useful to indicate functional sites; however:
• They do not give a clear picture of the domain boundaries.
• They lack sensitivity.
Identifying Protein Domains:
Automatic methods:
• Fast, effective, deal with a lot of information.
• Might fragment domain families.
• Might cause fusion of domain families.
Manual methods:
• The knowledge of protein experts is put to use.
• Slow; require a lot of manpower.
SMART (Simple Modular Architecture Research Tool)
Web-based resource used for:
– rapid annotation of protein domains.
– analysis of domain architectures.
Domain Architecture
• Protein: PA-3427CG; Species: Drosophila melanogaster
• Protein: ENSMUSP00000023109; Species: Mus musculus
• Protein: ENSANGP00000009529; Species: Anopheles gambiae
SMART (Simple Modular Architecture Research Tool)
• There are over 600 domain families.
• Provides information about:
– function.
– subcellular localization.
– phyletic distribution.
– tertiary structure.
• Based on HMMs (Hidden Markov Models).
SMART (Simple Modular Architecture Research Tool)
• Each HMM is built from a seed alignment.
• Threshold values are used to decide whether a sequence region is homologous to the domain.
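SMART's models are profile HMMs (with insert and delete states) built from curated seed alignments. As a much simpler stand-in that still shows the two ideas on this slide, a model derived from a seed alignment and a score threshold that decides homology, the sketch below builds a position-specific log-odds matrix from an ungapped seed alignment and accepts a sequence if its best-scoring window reaches a threshold. It is not an HMM; the seed sequences, pseudocount, uniform background and threshold value are all assumptions for illustration, not SMART parameters.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies

Profile = List[Dict[str, float]]


def build_profile(seed: List[str], pseudocount: float = 1.0) -> Profile:
    """Position-specific log-odds scores (bits) from an ungapped seed alignment."""
    width = len(seed[0])
    if any(len(s) != width for s in seed):
        raise ValueError("seed alignment must be ungapped and of equal length")
    profile: Profile = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}  # pseudocounts avoid log(0)
        for seq in seed:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({aa: math.log2(counts[aa] / total / BACKGROUND) for aa in AMINO_ACIDS})
    return profile


def best_hit(profile: Profile, sequence: str) -> Tuple[float, int]:
    """Score every window of the profile's width; return (best score, 0-based start)."""
    width = len(profile)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


def is_domain_member(profile: Profile, sequence: str, threshold: float) -> bool:
    """Call the sequence a member if its best window reaches the threshold."""
    return best_hit(profile, sequence)[0] >= threshold


# Hypothetical seed alignment and threshold, chosen only for illustration.
seed = ["ATHD", "ATHD", "ATHE"]
profile = build_profile(seed)
print(best_hit(profile, "MKLATHDQ"))               # best window starts at index 3
print(is_domain_member(profile, "MKLATHDQ", 5.0))  # True
print(is_domain_member(profile, "MKLGGSPQ", 5.0))  # False
```

Raising the threshold trades sensitivity for fewer false positives; a real profile HMM additionally scores insertions and deletions, which this sketch cannot do.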
SMART (Simple Modular Architecture Research Tool)
• Proteins are aligned by:
– Minimizing insertions/deletions in conserved alignment blocks.
– Optimizing amino acid property conservation.
– Closing unnecessary gaps.
• Gapped alignments are preferred over ungapped ones:
– better prediction of domain boundaries.
– greater information content.
• Entire structural domains are aligned.
PROSITE - database of protein families and domains
• Database of biologically significant sites and patterns. Contains 1,609 profiles.
• Pattern – a conserved sequence of a few amino acids.
• Identifies to which known family of proteins (if any) a new sequence belongs.
• Used to determine the function of uncharacterized proteins translated from genomic or cDNA sequences.
PROSITE - database of protein families and domains
• A protein too distant from all others for its resemblance to be detected by overall sequence alignment can still be classified according to a pattern.
• Patterns arise because the requirements of binding sites impose very tight constraints on the evolution of limited portions of the protein.
PROSITE – how is a pattern developed?
• As short as possible.
• Detects all/most sequences it describes.
• As few false results as possible.
Aim: high sensitivity and high specificity.
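Sensitivity and specificity become concrete once a candidate pattern is scanned against a database and the results are counted: true positives (family members matched), false negatives (members missed), false positives (non-members matched) and true negatives. The sketch below uses the standard definitions with hypothetical counts; precision is included because, in a database much larger than the family, specificity stays close to 1 even when false positives are numerous.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true family members that the pattern detects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-members that the pattern correctly rejects."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Fraction of pattern matches that are true family members."""
    return tp / (tp + fp)


# Hypothetical scan of a 10,000-sequence database: 95 of 100 family members
# match (5 are missed), and 20 unrelated sequences match as well.
tp, fn, fp = 95, 5, 20
tn = 10_000 - tp - fn - fp

print(sensitivity(tp, fn))            # 0.95
print(round(specificity(tn, fp), 4))  # 0.998
print(round(precision(tp, fp), 3))    # 0.826
```

Here 20 false positives barely move specificity, but roughly one match in six is wrong, which is why keeping false results to a minimum matters in practice.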
PROSITE – how is a pattern developed?
First, study reviews on a protein family.
Then build an alignment table, with particular attention to residues and regions important to the biological function of that family:
– Enzyme catalytic sites.
– Prosthetic group attachment sites (heme).
– Amino acids involved in binding a metal ion.
– Cysteines involved in disulfide bonds.
– Regions involved in binding a molecule (ADP/ATP, GDP/GTP, calcium, DNA, etc.) or another protein.
PROSITE – steps in the development of a pattern:
• Find a core pattern: 4-5 biologically significant residues.
• Test the pattern on a large database.
• If lucky, there is correlation in this region, which indicates a good pattern.
• Mostly, there is no correlation:
– Gradually increase the size of the pattern.
– Search over other patterns.
PROSITE – An example
ALRDFATHDDF SMTAEATHDSI ECDQAATHEAS
The three sequences share the pattern A-T-H-[DE]. This pattern is small and would probably pick up too many false positive results.
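The three example sequences share the core A-T-H followed by D or E, which PROSITE pattern syntax writes as A-T-H-[DE] (elements separated by "-", "x" for any residue, [..] for allowed and {..} for forbidden residues, (n) or (n,m) for repeats, "<" and ">" for N- and C-terminal anchors). A minimal sketch that translates this common subset of the syntax into a Python regular expression and scans the example sequences is shown below; it is an illustration, not ScanProsite, and it reports only the first, non-overlapping match per sequence. The longer pattern at the end is used only to exercise repeats.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a common subset of PROSITE pattern syntax into a Python regex.

    Supported: one-letter residues, x (any residue), [..] (allowed residues),
    {..} (forbidden residues), (n) and (n,m) repeats, '<' / '>' terminal
    anchors and a trailing '.'. Not supported: '>' inside brackets.
    """
    element_re = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        at_n_term = element.startswith("<")
        at_c_term = element.endswith(">")
        match = element_re.fullmatch(element.lstrip("<").rstrip(">"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."                         # any residue
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"  # any residue except these
        else:
            regex = residue                     # single residue or [..] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_n_term else "") + regex + ("$" if at_c_term else ""))
    return "".join(parts)


sequences = ["ALRDFATHDDF", "SMTAEATHDSI", "ECDQAATHEAS"]  # example from the slide
core = re.compile(prosite_to_regex("A-T-H-[DE]."))
for seq in sequences:
    hit = core.search(seq)
    print(seq, hit.start(), hit.group())
# ALRDFATHDDF 5 ATHD
# SMTAEATHDSI 5 ATHD
# ECDQAATHEAS 5 ATHE

print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"))
# C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

A four-residue regex like ATH[DE] matches by chance roughly once every 20^3 x 10 residues, which is why such a short pattern picks up many false positives in a large database.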
Profiles – characterize a protein family or domain over its entire length.
Patterns - small regions, high sequence similarity.
Research: Finding new domain families – Automatic methods
• The team started with 107 nuclear domains.
• Using SMART, they retrieved all proteins with at least one of these domains and characterized their complete domain structure.
• Regions not annotated by known SMART domain models were extracted together with their domain context.
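Extracting the regions that no known domain model covers is an interval-complement problem. The sketch below assumes the domain hits for one protein are already available as (name, start, end) tuples with 1-based inclusive coordinates; it merges overlapping hits and returns every uncovered stretch of at least `min_length` residues together with its neighboring domains as context. The 50-residue cutoff, the `Region` type and the domain names are hypothetical, not values or tools from the study.

```python
from typing import List, NamedTuple, Optional, Tuple

Hit = Tuple[str, int, int]  # (domain name, start, end), 1-based inclusive


class Region(NamedTuple):
    start: int            # 1-based, inclusive
    end: int
    left: Optional[str]   # nearest annotated domain before the region (context)
    right: Optional[str]  # nearest annotated domain after the region (context)


def unannotated_regions(seq_length: int, hits: List[Hit], min_length: int = 50) -> List[Region]:
    """Return uncovered stretches of at least min_length residues with their domain context."""
    regions: List[Region] = []
    next_free = 1                   # first residue not covered by any hit seen so far
    previous: Optional[str] = None  # domain that currently ends the covered stretch
    for name, start, end in sorted(hits, key=lambda h: (h[1], h[2])):
        if start - next_free >= min_length:
            regions.append(Region(next_free, start - 1, previous, name))
        if end + 1 > next_free:     # overlapping or nested hits are merged
            next_free = end + 1
            previous = name
    if seq_length - next_free + 1 >= min_length:
        regions.append(Region(next_free, seq_length, previous, None))
    return regions


# Hypothetical 600-residue protein with two overlapping hits and one separate hit.
hits = [("DOM_A", 120, 200), ("DOM_B", 180, 260), ("DOM_C", 400, 470)]
for region in unannotated_regions(600, hits):
    print(region)
# Region(start=1, end=119, left=None, right='DOM_A')
# Region(start=261, end=399, left='DOM_B', right='DOM_C')
# Region(start=471, end=600, left='DOM_C', right=None)
```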
Finding new domain families: Automatic methods
• Grouping proteins by region similarity.
• Finding homologs by running PSI-BLAST with the longest region of every group (threshold E-value < 0.001).
• Finding domain organization via SMART.
• Homologous regions – candidates for a novel domain family.
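The PSI-BLAST step can be sketched as plain data handling under stated assumptions: regions are already grouped (the grouping method is not specified here), the longest member of each group is used as the query, and PSI-BLAST was run separately with BLAST+ tabular output (`-outfmt 6`, default 12 columns), which is then filtered at the slide's threshold of E-value < 0.001. The group and region identifiers and the report lines are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def longest_per_group(groups: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """For each group (region id -> sequence), return the id of the longest region."""
    return {name: max(members, key=lambda rid: len(members[rid]))
            for name, members in groups.items()}


def significant_hits(tabular: str, max_evalue: float = 1e-3) -> Dict[str, List[str]]:
    """Per query, keep subjects whose E-value is below the threshold, without duplicates.

    Lines that are not complete 12-column rows (comments, blank lines,
    iteration messages) are skipped."""
    hits: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if len(row) < len(OUTFMT6_COLUMNS) or row[0].startswith("#"):
            continue
        record = dict(zip(OUTFMT6_COLUMNS, row))
        if float(record["evalue"]) < max_evalue:
            subjects = hits.setdefault(record["qseqid"], [])
            if record["sseqid"] not in subjects:  # a hit may be reported in several iterations
                subjects.append(record["sseqid"])
    return hits


# Hypothetical toy data: two groups of regions and a PSI-BLAST-like tabular report.
groups = {"g1": {"r1": "ACDEFG", "r2": "ACDEFGHIKL"}, "g2": {"r3": "MNPQ"}}
print(longest_per_group(groups))  # {'g1': 'r2', 'g2': 'r3'}

report = (
    "r2\tprot1\t50.0\t120\t60\t0\t1\t120\t10\t129\t2e-30\t110\n"
    "r2\tprot2\t30.0\t90\t63\t0\t15\t104\t40\t129\t0.004\t35.0\n"
    "r2\tprot1\t50.0\t120\t60\t0\t1\t120\t10\t129\t1e-32\t115\n"
)
print(significant_hits(report))  # {'r2': ['prot1']}
```

The hit with E-value 0.004 is discarded at the 0.001 threshold, and the repeated prot1 row (as a later iteration might report it) is kept only once.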
Finding new domain families:
107 nuclear domains
→ finding proteins (SMART)
→ regions not known by SMART
→ group regions
→ PSI-BLAST: finding homologs
→ domain architecture (SMART)
→ manual inspection, more searches
Finding new domain families: Manual confirmation
• Found in different domain contexts – a novel module family.
• Proteins with both nuclear AND extracellular domains were excluded.
• Multiple alignments and known locations of domains – definition of the domains' borders.
• Automatic searches to find more members (E-value < 0.1), followed by manual checks.
• Marginal similarity to a domain family – possibly a divergent family.
Prediction of Function: Chromatin-Binding Domains
• Protein SPT6, which contains a CSZ domain, regulates transcription through a histone-binding capability.
• It also contains two other types of domains, which are unlikely to bind histones.
• Therefore, it was predicted that the CSZ domain provides this histone-binding function.
Research:
• A PSI-BLAST search with the C-terminal region (E-value < 10⁻⁵) found UBX-containing proteins and metazoan homologs of PNGases.
• PNGases – proteins involved in the UPR.
• UPR – unfolded protein response.
• PUG – the homologous regions.
• PUG domains are found in proteins with domains central to ubiquitin-mediated proteolysis (UBA and UBX).
• Arabidopsis protein – UBA at the N-terminus.
Conclusion:
PUG-containing proteins might link the UPR to ubiquitin-mediated protein degradation.
[Figure: domain architectures of PUG-containing proteins (PUG + UBA, PUG, PUG, UBX, PUG + UBCc). Labels: "PNGases, believed to have a role in the UPR"; "Domains central to ubiquitin-mediated proteolysis".]
[Figure: structures labeled "Apoptosis: UBX domain from human FAF1" and "DNA-binding protein: C-terminal UBA domain of the human homologue of Rad23A (hHR23A)"]
• Orthologs of PNGases in metazoans are present singly (not as multiple paralogs), so they are likely to have similar cellular localization.
• The ortholog in Saccharomyces cerevisiae is known to be localized mainly in the nucleus, so the metazoan PNGases are likely to be localized in the nucleus too.
• The HMM built from the PUG domain shows marginal similarity to IRE1p-like kinases, which are also known to initiate the UPR.
• The team suggests the presence of divergent PUG domains in the C termini of these proteins.
• Analysis revealed a conserved region in metazoan PNGases, which was named PAW and added to SMART.
• The team found 28 novel nuclear domain families.
• Most of them have representatives in diverse molecular contexts in different species.
• Some are specific to a single species.
• Others are divergent members of previously recognized families.
The End