The FREAKS of PROTEIN SEQUENCE

Session 3.1: Repeats

Session 3.2: Biased regions

Miguel Andrade, Johannes-Gutenberg University of Mainz

[email protected]

Definition

14% of proteins contain repeats (Marcotte et al., 1999)

1: Single amino acid repeats.

2: Longer imperfect tandem repeats, which assemble into a structure.

Definition CBRs

Perfect repeat: QQQQQQQQQQQ

Imperfect: QQQQPQQQQQQ

Amino acid type: DDDDDEEEDEDEED
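The three cases above can be told apart with a simple run search. The following is only a minimal sketch: the function name `find_runs`, the flanking residues added to the example sequences, and the minimum length of 6 are assumptions made for illustration, not part of the course material.

```python
import re

def find_runs(seq, residues, min_len=6):
    """Return (start, end, text) for every run of at least `min_len`
    consecutive residues taken from the set `residues`.
    min_len=6 is an arbitrary illustrative choice."""
    pattern = re.compile("[" + re.escape(residues) + "]{" + str(min_len) + ",}")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]

# Perfect repeat: one uninterrupted run of Q.
print(find_runs("MAQQQQQQQQQQQLK", "Q"))

# Imperfect repeat: the single P splits the Q stretch, so only the
# second half is long enough to be reported as a perfect run.
print(find_runs("MAQQQQPQQQQQQLK", "Q"))

# Amino acid type repeat: D and E together form one acidic run.
print(find_runs("MADDDDDEEEDEDEEDLK", "DE"))
```

A perfect-run search misses imperfect repeats by design, which is one reason to look at composition over a window instead.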

Compositionally biased regions (CBRs)

High frequency of one or two amino acids in a region.

Particular case of low complexity region

Conservation => Function

Length, amino acid type not necessarily conserved

Frequency: 1 in 3 proteins contains a compositionally biased region (Wootton, 1994), ~11% conserved (Sim and Creamer, 2004)
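The working definition of a CBR (a high frequency of one or two amino acids in a region) can be sketched as a sliding-window count. This is a hedged illustration only: `window_bias`, `biased_regions`, the window of 10, the threshold of 0.7 and the flanking residues in the example are all assumptions, not values taken from the cited papers.

```python
def window_bias(seq, residues, window=10):
    """Fraction of each sliding window occupied by any residue in `residues`."""
    targets = set(residues)
    hits = [1 if aa in targets else 0 for aa in seq]
    return [sum(hits[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

def biased_regions(seq, residues, window=10, threshold=0.7):
    """Merge overlapping windows with fraction >= threshold into
    half-open (start, end) regions. The threshold is an assumption."""
    regions = []
    for i, frac in enumerate(window_bias(seq, residues, window)):
        if frac >= threshold:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)   # extend previous region
            else:
                regions.append((start, end))
    return regions

# Hypothetical sequence: the acidic example from the slide with made-up flanks.
seq = "MKTAYIAKQR" + "DDDDDEEEDEDEED" + "GSLVPRGSHM"
for start, end in biased_regions(seq, "DE"):
    print(start, end, seq[start:end])
```

Because windows are merged, the reported region extends a few residues beyond the biased stretch itself; a smaller window or a higher threshold tightens the boundaries.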

Function CBRs



Functions:

Passive: linkers

Active: binding, mediating protein interactions, structural integrity

(Sim and Creamer, 2004)

Structure of CBRs

Often variable or flexible: they do not crystallize easily

1CJF: profilin bound to polyP

2IF8: Inositol Phosphate Multikinase Ipk2


RVSETTTSGSL

2CX5: mitochondrial cytochrome c B subunit N-terminal


FFFFIFVFNF

Types of CBRs

Longer than 6 aa: 1.4% of all, 87% of them in eukaryotes (Faux et al., 2005)


Distribution is not random:

Eukaryota:
Most common: poly-Q, poly-N, poly-A, poly-S, poly-G

Prokaryota:
Most common: poly-S, poly-G, poly-A, poly-P
Relatively rare: poly-Q, poly-N

Very rare or absent in both eukaryota and prokaryota:
poly-I, poly-M, poly-W, poly-C, poly-Y

Toxicity of long stretches of hydrophobic residues.

Filtering out CBRs

Normally filtered out as low complexity regions: they give spurious BLAST hits

QQQQQQQQQQ
||||||||||
QQQQQQQQQQ   10/10 id

IDENTITIES
||||||||||
IDENTITIES   10/10 id

After shuffling the second sequence of each pair:

QQQQQQQQQQ
||||||||||
QQQQQQQQQQ   Shuffle: 10/10 id

IDENTITIES
   | |
SIINDIETTE   Shuffle: 2/10 id
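The shuffling argument can be checked numerically. The sketch below is illustrative only; the helper names `identities` and `shuffled`, the random seed and the number of trials are assumptions. It counts identical positions between a sequence and shuffled copies of itself.

```python
import random

def identities(a, b):
    """Number of aligned positions with the same residue (ungapped)."""
    return sum(x == y for x, y in zip(a, b))

def shuffled(seq, rng):
    """Return a random permutation of `seq` (same composition)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

# The shuffled pair shown on the slide.
print(identities("IDENTITIES", "SIINDIETTE"))

rng = random.Random(0)  # fixed seed, chosen arbitrarily
for seq in ("QQQQQQQQQQ", "IDENTITIES"):
    scores = [identities(seq, shuffled(seq, rng)) for _ in range(1000)]
    print(seq, "mean identities after shuffling:", sum(scores) / len(scores))
```

For a sequence of length L with residue counts n_a, the expected number of identities against a shuffled copy is the sum of n_a squared divided by L. That gives 10 for the poly-Q example and 20/10 = 2 for IDENTITIES, so a perfect score on a homopolymer carries no information about homology.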

Option for pre-BLAST treatment

SEG algorithm:
1) Identify sequence regions with low information content over a sequence window
2) Merge neighbouring regions

Eliminates hits against common acidic-, basic- or proline-rich regions

(Wootton and Federhen, 1993)
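The two SEG steps can be approximated with Shannon entropy over a sliding window. This is a simplified sketch and not the SEG program itself: real SEG uses separate trigger and extension complexity cutoffs and further refines the segments it finds. The function names are hypothetical, and the window of 12 and cutoff of 2.2 bits are chosen here only to echo SEG's commonly used trigger settings.

```python
import math
from collections import Counter

def shannon_entropy(fragment):
    """Shannon entropy (bits) of the residue composition of `fragment`."""
    n = len(fragment)
    return -sum(c / n * math.log2(c / n) for c in Counter(fragment).values())

def low_complexity_regions(seq, window=12, cutoff=2.2):
    """Step 1: flag windows with entropy <= cutoff.
    Step 2: merge overlapping or adjacent flagged windows.
    Returns half-open (start, end) regions."""
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= cutoff:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions

def mask(seq, regions, symbol="X"):
    """Replace every residue inside `regions` with `symbol`."""
    chars = list(seq)
    for start, end in regions:
        chars[start:end] = symbol * (end - start)
    return "".join(chars)

# Hypothetical query: a poly-Q tract with made-up flanking residues.
query = "MKTAYIAKQR" + "Q" * 15 + "LEERLGLIEV"
regions = low_complexity_regions(query)
print(regions)
print(mask(query, regions))
```

Masked residues no longer contribute to alignment scores, which is how filtering removes hits that rest only on a shared biased region.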

Exercise 1. Filtering CBRs for BLAST using SEG

• Obtain this protein sequence from NCBI: http://www.ncbi.nlm.nih.gov/protein/378756390?report=fasta
  This is a hypothetical protein from Nematocida sp., a microsporidian (spore-forming fungus) that infects the worm Caenorhabditis elegans.

• Can you see funny things in this sequence?

• Go to NCBI's BLAST web page (http://blast.ncbi.nlm.nih.gov/) and go to the "protein blast" option

• Search for homologs of the protein

• Keep the output

• Do the same search in another NCBI BLAST window, selecting the option to filter low complexity regions using SEG

• Compare the outputs: Can you identify different hits? Do matches to the same sequence have relevant differences in the E-value? Comment on the relevance of the differences.

A particular analysis…

[Figure: domain map of AIR9 (1708 aa) with a Ser-rich + basic region, LRR, A9 repeats and a conserved region, and deletion constructs Δ1, Δ2, Δ3, Δ6, Δ9, Δ10, Δ11, Δ12, Δ14, Δ15, Δ16. Microtubule localization of Δx-GFP.]

Buschmann, et al. (2006). Current Biology.

Buschmann, et al. (2007). Plant Signaling & Behavior.

…triggers a tool: BiasViz

http://sourceforge.net/projects/biasviz/

Huska, et al. (2007). Bioinformatics

[Figure: ADAM15, ADAM19, ADAM9, ADAM11 and ADAM20; panels a, b, c; scale from 0.0 to 0.4. Annotated regions of ADAM15:]

Binds SH3 of endophilin I and SH3PX1 PMID:10531379

Binds SH3 of Fish PMID:12615925

Binds SH3 of Grb2 PMID:11127814

Binds SH3 of ArgBP1/ABI2 PMID:12463424

Exercise 2. Viewing CBRs in an alignment with BiasViz2

• Go to the BiasViz2 web page: http://cbdm.mdc-berlin.de/tools/biasviz/

• Launch BiasViz2

• Load this alignment in the step 1 section: http://oz-masterclass.wikispaces.com/file/view/littleMSA_fasta.txt/459348072/littleMSA_fasta.txt

• Hit the "Go to graphical view" button

• Try to find combinations of parameters that reveal CBRs

• Try hydrophobic residues and window size 10. If I tell you this is a transmembrane protein, what is this result telling you?

• Can you see other biased regions?

Function of polyQ

Martin Schaefer

polyQ in Huntingtin

[Figure: huntingtin sequences from Human, Dog, Mouse, Opossum, Chicken, Frog, Zebrafish, Trout, Fugu, Stickleback, Lancelet, Capitella, Limpet, Nematostella, Trichoplax, Ciona intestinalis, Ciona savignyi, D. melanogaster, D. mojavensis, D. sechellia, D. erecta, D. yakuba, D. grimshawi, D. pseudoobscura, D. persimilis, D. ananassae, D. willistoni, D. virilis.]

Schaefer et al. (2012) Nucleic Acids Res.

[Figure: number of partners (axis from 1 to 1000) for human and yeast proteins. Group labels: polyQ, TFs, long, non polyQ. A further panel compares no polyQ, polyQ 4-14 and polyQ >14, with a second axis from 1 to 500.]

[Figure: schematic of a polyQ protein from N-terminal to C-terminal with coiled coil, polyQ and polyP regions. Unbound: polyQ disordered. Bound: in complex with protein X.]

ATXN1Q82NT is toxic

ATXN1Q82NT aggregates

Petrakis et al (2012) PLoS Genetics

Spyros Petrakis

interactors that change ATXN1Q82NT toxicity

[Figure: model built up over several slides. Normal polyQ protein: polyQ disordered, next to a coiled coil (CC); labels: CC partner, non-CC partner, polyQ alpha-helix. Toxic polyQ protein: same labels, plus polyQ beta-aggregates and polyQ increased beta-aggregates.]

Exercise 3. All together! View repeats, CBRs, and secondary structure in the N-terminal of huntingtin with BiasViz2

• Go to the BiasViz2 web page: http://cbdm.mdc-berlin.de/tools/biasviz/

• Load this alignment of N-terminal huntingtins in the step 1 section: http://oz-masterclass.wikispaces.com/file/view/hdNterm_fasta.txt/459349546/hdNterm_fasta.txt

• Load this file with the secondary structure predicted for the human fragment in the step 2 section: http://oz-masterclass.wikispaces.com/file/view/hdNterm_jpred.txt/459349568/hdNterm_jpred.txt

• Load this file with ARD2 predictions for all sequences of the alignment in the step 2 section "raw values for each amino acid": http://oz-masterclass.wikispaces.com/file/view/hdNterm_ard2.txt/459349640/hdNterm_ard2.txt

• Hit the "Go to graphical view" button

• Find the CBRs we have discussed for huntingtin

• Compare the relative position of the predicted repeats and the predicted secondary structure
