The FREAKS of PROTEIN SEQUENCE
Session 3.1: Repeats
Session 3.2: Biased regions
Miguel Andrade
Johannes-Gutenberg University of Mainz
Andrade@uni-mainz.de
Definition
14% of proteins contain repeats (Marcotte et al, 1999)
1: Single amino acid repeats.
2: Longer imperfect tandem repeats. Assemble in structure.
Definition CBRs
Perfect repeat: QQQQQQQQQQQ
Imperfect: QQQQPQQQQQQ
Amino acid type: DDDDDEEEDEDEED
Compositionally biased regions (CBRs)
High frequency of one or two amino acids in a region.
Particular case of low complexity region
Conservation => Function
Length, amino acid type not necessarily conserved
Frequency: 1 in 3 proteins contains a compositionally biased region (Wootton, 1994), ~11% conserved (Sim and Creamer, 2004)
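The definition above (a high frequency of one or two amino acids in a region) can be turned into a simple sliding-window scan. The following is only a minimal sketch: the function name `biased_windows`, the window of 10 residues, and the 80% cutoff are assumptions chosen for illustration, not values from the cited papers.

```python
from collections import Counter

def biased_windows(seq, window=10, top_n=2, min_fraction=0.8):
    """Return (start, end, residues, fraction) for every window in which
    the top_n most common residues make up at least min_fraction.
    window, top_n and min_fraction are illustrative assumptions."""
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        common = Counter(chunk).most_common(top_n)
        fraction = sum(count for _, count in common) / window
        if fraction >= min_fraction:
            residues = "".join(sorted(aa for aa, _ in common))
            hits.append((start, start + window, residues, fraction))
    return hits

# The three example sequences shown on the slide
examples = [("perfect", "QQQQQQQQQQQ"),
            ("imperfect", "QQQQPQQQQQQ"),
            ("amino acid type", "DDDDDEEEDEDEED")]
for name, seq in examples:
    print(name, biased_windows(seq)[:1])
```

With these settings, all three example sequences are reported as biased, while a sequence containing each of the 20 amino acids once yields no hits.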
Function CBRs
Conservation => Function
Length, amino acid type not necessarily conserved
Functions:
Passive: linkers
Active: binding, mediate protein interaction, structural integrity
(Sim and Creamer, 2004)
Structure of CBRs
Often variable or flexible: do not easily crystallize
1CJF: profilin bound to polyP
2IF8: Inositol Phosphate Multikinase Ipk2
RVSETTTSGSL
2CX5: mitochondrial cytochrome c B subunit N-terminal
FFFFIFVFNF
Types of CBRs
More than 6 aa in length: 1.4% of all, 87% of them in eukaryotes (Faux et al 2005)
Distribution is not random:
Eukaryota:
Most common: poly-Q, poly-N, poly-A, poly-S, poly-G
Prokaryota:
Most common: poly-S, poly-G, poly-A, poly-P
Relatively rare: poly-Q, poly-N
Very rare or absent in both eukaryota and prokaryota:
poly-I, poly-M, poly-W, poly-C, poly-Y
Toxicity of long stretches of hydrophobic residues.
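Counting poly-X stretches like those listed above only needs a search for runs of a single residue. A minimal sketch follows, assuming a cutoff of 7 residues to match "more than 6 aa in length". The function name and the test sequence are hypothetical, and this is not the procedure used by Faux et al.

```python
import re

def homopolymer_runs(seq, min_len=7):
    """Return (residue, start, length) for each run of one residue
    that is at least min_len long (7 = "more than 6 aa")."""
    # (.) captures one residue; \1{n,} requires at least n more copies of it
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end() - m.start())
            for m in pattern.finditer(seq)]

# Hypothetical sequence with a poly-Q run (8) and a poly-S run (7);
# the AAA stretch is too short to be reported.
print(homopolymer_runs("MKQQQQQQQQAAASSSSSSSL"))
```

Tallying the first element of each tuple over a proteome would give the kind of per-residue counts that the slide summarizes.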
Filtering out CBRs
Normally filtered out as low complexity regions: they give spurious BLAST hits
QQQQQQQQQQ
||||||||||
QQQQQQQQQQ   10/10 id (after shuffling: still 10/10 id)
IDENTITIES
||||||||||
IDENTITIES   10/10 id
IDENTITIES
   | |
SIINDIETTE   Shuffle: 2/10 id
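The shuffling argument can be checked numerically: a perfect match between two low complexity sequences survives shuffling, whereas a match between ordinary sequences does not. This is a small sketch, with function names, trial count and random seed chosen only for illustration.

```python
import random

def identity(a, b):
    """Count positions where two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))

def mean_shuffled_identity(seq, trials=1000, seed=0):
    """Average identity between seq and random shuffles of itself."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(trials):
        shuffled = rng.sample(seq, len(seq))  # a random permutation
        total += identity(seq, shuffled)
    return total / trials

print(identity("IDENTITIES", "SIINDIETTE"))   # the shuffle shown on the slide
print(mean_shuffled_identity("QQQQQQQQQQ"))   # poly-Q keeps all 10 identities
print(mean_shuffled_identity("IDENTITIES"))   # much lower on average
```

The poly-Q sequence scores 10 identities for every shuffle, which is why such regions produce alignments that look significant but carry no positional information.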
Filtering out CBRs
Option for pre-BLAST treatment
SEG algorithm:
1) Identify sequence regions with low information content over a sequence window
2) Merge neighbouring regions
Eliminates hits against common acidic-, basic- or proline-rich regions
(Wootton and Federhen, 1993)
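The two steps above can be sketched as follows. This is a simplified illustration, not the SEG implementation of Wootton and Federhen: it uses plain Shannon entropy as the information measure, and the window of 12 residues and cutoff of 2.2 bits are assumed values chosen for the example.

```python
import math
from collections import Counter

def shannon_entropy(chunk):
    """Shannon entropy (bits) of the residue composition of chunk."""
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in Counter(chunk).values())

def low_complexity_regions(seq, window=12, max_entropy=2.2):
    """Step 1: flag windows with low entropy.
    Step 2: merge flagged windows that overlap or touch."""
    regions = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend previous region
            else:
                regions.append((start, end))
    return regions

def mask(seq, regions, symbol="X"):
    """Replace every residue inside the given regions with symbol."""
    chars = list(seq)
    for start, end in regions:
        chars[start:end] = symbol * (end - start)
    return "".join(chars)

# Hypothetical sequence: a poly-Q stretch flanked by diverse residues
seq = "ACDEFGHIKLMN" + "Q" * 12 + "PRSTVWYACDEF"
print(mask(seq, low_complexity_regions(seq)))
```

Masked residues are written as X so that a search program can ignore them. Note that windows only partly covering the poly-Q stretch can also fall below the cutoff, so the masked region may extend slightly into the flanks.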
Exercise 1. Filtering CBRs for BLAST using SEG
• Obtain this protein sequence from NCBI. This is a hypothetical protein from Nematocida sp., a microsporidian (spore-forming fungus) that infects the worm Caenorhabditis elegans.
• Can you see funny things in this sequence?
• Go to NCBI's BLAST web page and select the "protein blast" option
• Search for homologs of the protein
• Keep the output
• Do the same search in another NCBI BLAST window, selecting the option to filter low complexity regions using SEG
• Compare the outputs: Can you identify different hits? Do matches to the same sequence have relevant differences in the E-value? Comment on the relevance of the differences.
A particular analysis…
AIR9 (1708 aa): Ser rich + basic region, LRR, A9 repeats, conserved region
[Figure: AIR9 deletion constructs Δ1, Δ2, Δ3, Δ6, Δ9, Δ10, Δ11, Δ12, Δ14, Δ15, Δ16; microtubule localization of Δx-GFP]
Buschmann, et al (2006). Current Biology.
Buschmann, et al (2007). Plant Signaling & Behavior
…triggers a tool: BiasViz
http://sourceforge.net/projects/biasviz/
Huska, et al. (2007). Bioinformatics
[Figure: ADAM15, ADAM19, ADAM9, ADAM11 and ADAM20; panels a, b, c; axis ticks from 0.0 to 0.4. Annotations on ADAM15:]
Binds SH3 of endophilin I and SH3PX1 PMID:10531379
Binds SH3 of Fish PMID:12615925
Binds SH3 of Grb2 PMID:11127814
Binds SH3 of ArgBP1/ABI2 PMID:12463424
Exercise 2. Viewing CBRs in an alignment with BiasViz2
• Go to the BiasViz2 web page
• Launch BiasViz2
• Load this alignment on the step 1 section
• Hit the "Go to graphical view" button
• Try to find combinations of parameters that reveal CBRs
• Try hydrophobic residues and window size 10. If I tell you this is a transmembrane protein, what is this result telling you?
• Can you see other biased regions?
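To make the "hydrophobic residues, window size 10" setting concrete, the sketch below computes, for each aligned sequence, the fraction of residues from a chosen set inside a sliding window. It illustrates the idea only and is not BiasViz2 code: the function name, the hydrophobic residue set and the toy alignment are all assumptions.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed residue set; other definitions exist

def window_fraction(aligned_seq, residues=HYDROPHOBIC, window=10, gap="-"):
    """Fraction of non-gap residues belonging to `residues` in each
    window of the aligned sequence (one value per window start)."""
    profile = []
    for start in range(len(aligned_seq) - window + 1):
        chunk = [aa for aa in aligned_seq[start:start + window] if aa != gap]
        hits = sum(aa in residues for aa in chunk)
        profile.append(hits / len(chunk) if chunk else 0.0)
    return profile

# Hypothetical two-sequence alignment with a hydrophobic stretch
alignment = {
    "seq1": "MKTSE--LLVLAVLLAIVAGSRKDE",
    "seq2": "MKSSEGGLIVLAILLAVIAG-RKDE",
}
for name, seq in alignment.items():
    profile = window_fraction(seq)
    best = max(range(len(profile)), key=profile.__getitem__)
    print(name, "peak window starts at column", best,
          "fraction", round(profile[best], 2))
```

A window in which nearly every residue is hydrophobic, at the same columns in all sequences, is the kind of signal one would expect from a membrane-spanning segment.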
Function of polyQ
Martin Schaefer
polyQ in Huntingtin
[Figure: huntingtin across species: Human, Dog, Mouse, Opossum, Chicken, Frog, Zebrafish, Trout, Fugu, Stickleback, Lancelet, Capitella, Limpet, Nematostella, Trichoplax, Ciona intestinalis, Ciona savignyi, D. melanogaster, D. mojavensis, D. sechellia, D. erecta, D. yakuba, D. grimshawi, D. pseudoobscura, D. persimilis, D. ananassae, D. willistoni, D. virilis]
Schaefer et al (2012) Nucleic Acids Res.
[Figure: number of partners (log-scale axes, ticks from 1 to 1000) for human proteins grouped as polyQ, TFs, long and non polyQ, and as no polyQ, polyQ 4-14 and polyQ>14; and for yeast proteins grouped as polyQ, non polyQ, TFs and long]
[Figure: polyQ protein drawn from N-terminal to C-terminal, with regions labeled polyQ (disordered), polyP and coiled coil; shown unbound and bound to protein X]
ATXN1Q82NT is toxic
ATXN1Q82NT aggregates
Interactors that change ATXN1Q82NT toxicity
Spyros Petrakis
Petrakis et al (2012) PLoS Genetics
[Figure: model built up over successive slides. Labels: Normal polyQ protein; Toxic polyQ protein; CC; CC partner; non-CC partner; polyQ disordered; polyQ alpha-helix; polyQ beta-aggregates; polyQ increased beta-aggregates]
BiasViz2
Exercise 3. All together! View repeats, CBRs, and secondary structure in the N-terminal of huntingtin with BiasViz2
• Go to the BiasViz2 web page
• Load this alignment of N-terminal huntingtins in the step 1 section
• Load this file with secondary structure predicted for the human fragment in the step 2 section
• Load this file with ARD2 predictions for all sequences of the alignment in the step 2 section "raw values for each amino acid"
• Hit the "Go to graphical view" button
• Find the CBRs we have discussed for huntingtin
• Compare the relative position of the predicted repeats and the predicted secondary structure