Dissection of USP catalytic domains reveals five common insertion points†

Yu Ye,‡a Hartmut Scheel,‡b Kay Hofmannb and David Komander*a

Received 16th April 2009, Accepted 5th June 2009

First published as an Advance Article on the web

DOI: 10.1039/b907669g

Ubiquitin specific proteases (USPs) are the largest family of deubiquitinating enzymes, with ~56 members in humans. USPs regulate a wide variety of cellular processes by their ability to remove (poly)ubiquitin from target proteins. Their enzymatic activity is encoded in a common catalytic core of ~350 amino acids; however, many USPs show significantly larger catalytic domains. Here we have analysed human and yeast USP domains, combining bioinformatics with structural information. We reveal that all USP domains can be divided into six conserved boxes, and we map the conserved boxes onto the USP domain core structure. The boxes are interspersed by insertions, some of which are as big as the catalytic core. The two most common insertion points place inserts near the distal ubiquitin-binding site, and in many cases ubiquitin-binding domains or ubiquitin-like folds are found in these insertions, potentially directly affecting catalytic function. Other inserted sequences are unstructured, and removal of these might aid future structural and functional analysis. Yeast USP domains have a different pattern of inserted sequences, suggesting that the insertions are hotspots for evolutionary diversity to expand USP functionality.

Introduction

Protein ubiquitination is a reversible posttranslational modification, which affects a large number of cellular processes including protein degradation, trafficking, cell signalling and the DNA damage response. During ubiquitination, the 76 amino acid protein ubiquitin is covalently linked to a Lys residue of a target protein [1]. The mechanism of ubiquitination involves the concerted action of a three-step enzymatic cascade. In an initial ATP-dependent process, ubiquitin is activated by an E1 ubiquitin activating enzyme, which transfers ubiquitin to the active site Cys of an E2 ubiquitin conjugating enzyme. The charged E2, together with an E3 ligase, transfers the ubiquitin to the ε-amino group of a substrate Lys residue, forming an isopeptide bond [2].

Importantly, ubiquitin itself contains seven Lys residues, all of which can become ubiquitinated, leading to polyubiquitin chains of different linkages [3,4]. It is becoming increasingly clear that differently linked ubiquitin chains serve different functions in cells [5]. Lys48-linked ubiquitin polymers target substrate proteins for degradation by the proteasome [6]. In contrast, non-degradative Lys63-linked ubiquitin chains can activate protein kinase cascades, DNA repair processes and endosomal trafficking [7]. The roles for the remaining chain types are currently less clear, although recently some progress has been made [3,8,9] (reviewed in ref. 5).

Ubiquitination is reversible, and dedicated deubiquitinases exist which hydrolyse isopeptide bonds [10]. This proteolytic reaction is facilitated by ~84 deubiquitinases in humans that belong to five families. Four families, the ubiquitin specific proteases (USP, 56 members, not including isoforms), ubiquitin C-terminal hydrolases (UCH, 4 members), ovarian tumour domain containing proteases (OTU, 14 members), and Machado Joseph disease (MJD)/Josephin domain deubiquitinases (4 members), are Cys-dependent proteases, containing a catalytic triad of Cys, His and Asp/Asn [11]. The mechanism of the hydrolysis reaction is similar to that of the plant protease papain [12]. The fifth family of deubiquitinases is JAB1/MPN/Mov34 metalloenzyme (JAMM, also known as MPN+ and hereafter indicated as JAMM/MPN+) domain containing zinc-dependent metalloproteases (12 members) [11,13], the mechanism of which has recently been elucidated by structural work [14].

USP enzymes were first identified and cloned in the yeast Saccharomyces cerevisiae [15]. It was noted that this enzyme class contained an N-terminal Cys-box and a C-terminal His-box, which contain the catalytic Cys and His residues, respectively. This annotation was crucial for the subsequent identification of further yeast and human USP enzymes [11,16]. Early work in Drosophila revealed Epsin (Drosophila liquid facets (lqf)) as a substrate for USP9 (Drosophila fat facets (faf)) [17], an interaction that regulates Notch signalling [18]. This suggested that USP enzymes might have specific substrates and can regulate distinct signalling pathways. Further research on this enzyme family in the past decade has linked many USPs to specific biological pathways, revealing that USPs regulate cell signalling at various levels. Examples include USP7/HAUSP, which regulates p53 and MDM2 levels in cells [19], and USP28, which stabilises the transcription factor c-myc by removing degradative Lys48 ubiquitin chains [20]. CYLD is a Lys63-specific USP deubiquitinase in the NF-κB pathway, and inhibits activation of the IKK kinase complex [21]. USP8/UBPY affects endosomal trafficking [22], while USP14 is associated with the proteasome and required for ubiquitin recycling by removing Lys48-linked chains from proteasome substrates [23]. However, many members of the USP family have not been studied in detail.

a MRC Laboratory of Molecular Biology, Protein and Nucleic Acid Chemistry Division, Hills Road, Cambridge, UK CB2 0QH. E-mail: [email protected]; Fax: +44 (0)1223412178; Tel: +44 (0)1223402300

b Miltenyi Biotec, Friedrich-Ebert-Strasse 68, 51429 Bergisch-Gladbach, Germany

† Electronic supplementary information (ESI) available: [DETAILS]. See DOI: 10.1039/b907669g

‡ These authors contributed equally to the work.

This journal is © The Royal Society of Chemistry 2009. Mol. BioSyst., 2009, 1–12

PAPER www.rsc.org/molecularbiosystems | Molecular BioSystems

USP family deubiquitinases range in size between 330 and 3500 amino acids, with an average size of 800–1000 residues for the full-length enzymes. As commonly found in signalling proteins, many USP deubiquitinases have a modular architecture, and not only contain a catalytic domain but also additional protein–protein interaction and localisation domains [11].

Here, we further analyse the catalytic domains of USP family deubiquitinases on a sequence and structural level. Guided by multiple sequence alignments and crystal structures, we divide the USP catalytic core into six conserved boxes, and we assign such sequence elements to the structure of the USP catalytic domain. Our analysis reveals that USPs are prone to harbour inserted sequences at a number of points between boxes. Interestingly, the insertions can fold into independent domains that can be involved in the regulation of deubiquitinase activity.

Results and discussion

Identification of conserved core components of USP catalytic domains

The catalytic core of all human USP deubiquitinases was extracted from sequence databases based on the boundaries annotated in ProSite [24]. This analysis shows that USP catalytic domains comprise between 295 and 840 residues (Fig. 1), and 27 out of 56 USP deubiquitinases contain a catalytic core of 300 to 400 amino acids (Aa). The remaining 29 USPs contain catalytic domains of more than 400 and up to 850 Aa. This difference in size for a catalytic domain is surprising, and suggests that either different subclasses of USP domains exist which are variations of a catalytic fold or that insertions are incorporated into a similar catalytic fold.

Multiple sequence alignment of the USP catalytic cores identifies six regions of notable conservation, numbered box 1 to box 6, that are present in all USP domains (Fig. 2). In this alignment, box 1 contains the catalytic Cys residue, box 5 contains the catalytic His, and box 6 contains the catalytic Asp/Asn residue. All boxes show several additional conserved features and residues. Box 3 and box 4 in most (45/56) human USP domains each contain a conserved Cys-X-X-Cys motif, which is most commonly associated with a zinc-binding site (see below).
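As an illustration of how a Cys-X-X-Cys motif could be located in a protein sequence, the following minimal Python sketch scans a string of one-letter residue codes. The function name `find_cxxc` and the toy sequence are hypothetical and are not taken from the paper or from any real USP; the sketch is not the alignment procedure used by the authors.

```python
import re

# Hypothetical toy sequence, NOT a real USP fragment; used only to show the search.
toy_sequence = "MKTCPACGLLSNEEDCRKCHAAV"

def find_cxxc(sequence):
    """Return 1-based start positions of every Cys-X-X-Cys match, overlaps included."""
    # The zero-width lookahead lets overlapping motifs (e.g. CAACAAC) all be reported.
    return [m.start() + 1 for m in re.finditer(r"(?=C..C)", sequence)]

positions = find_cxxc(toy_sequence)
print(positions)
```

A simple pattern search of this kind only flags candidate positions; whether a given pair of motifs actually forms a zinc-binding site has to be judged from the alignment and the structure.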

The annotated boxes represent a minimal definition of a USP domain, and reveal that between individual boxes, large insertions can occur. Between box 3 and box 4, inserted sequences in 12 USPs comprise between 90 and 314 Aa. Between box 4 and box 5, a further 12 USPs contain insertions of between 64 and 283 Aa (Fig. 2). Less frequent insertions also exist between boxes 2 and 3 (USP20, USP33 and USP1, between 172 and 195 Aa), boxes 1 and 2 (USP9X, USP9Y, USP44, USP49, between 44 and 66 Aa) and boxes 5 and 6 (USP1, USP40, 142 and 131 Aa, respectively) (Fig. 2).

Three USPs show two insertions and, not surprisingly, these USPs contain the largest annotated USP domains. USP1 (USP domain of 704 Aa) contains inserts between boxes 2 and 3 (195 Aa) and between boxes 5 and 6 (142 Aa). USP6 (USP domain of 847 Aa) and its close homologue USP32 (834 Aa) harbour two insertions, between boxes 3 and 4 (314 Aa) and between boxes 4 and 5 (191 and 187 Aa, respectively) (Fig. 2, Table S1, ESI†).
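The lengths quoted above allow a quick consistency check: subtracting the two large insertions from each annotated domain should leave a remainder close to the common catalytic core of roughly 350 Aa. The short Python sketch below performs this arithmetic using only the numbers given in the text; the dictionary layout and the function name `core_length` are illustrative choices.

```python
# Domain and insertion lengths (in amino acids) as quoted in the text above.
usp_domains = {
    "USP1": {"domain": 704, "insertions": [195, 142]},
    "USP6": {"domain": 847, "insertions": [314, 191]},
    "USP32": {"domain": 834, "insertions": [314, 187]},
}

def core_length(entry):
    """Domain length minus the summed large insertions."""
    return entry["domain"] - sum(entry["insertions"])

core = {name: core_length(entry) for name, entry in usp_domains.items()}
for name, length in core.items():
    print(f"{name}: {length} Aa remain after removing the large insertions")
```

The remainders (367, 342 and 333 Aa) still include the short loops between boxes, so they agree with the ~350 Aa core only approximately.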

Mapping of boxes onto USP structure reveals common insertion points

Structural studies have defined the USP domain fold, and five crystal structures of mammalian USP domains have been published [25–29]. USP domains share a common conserved fold, and so far only the structure of CYLD, the most divergent USP at the sequence level, showed some unique features [25], and is further discussed below. The structure of USP7/HAUSP was the first to be determined, and will serve as the model for our analysis [26].

The USP7/HAUSP USP domain resembles an open hand containing Thumb, Palm and Fingers subdomains (Fig. 3A) [26]. The catalytic triad resides between the Thumb (Cys) and Palm subdomains (His/Asp). This architecture with Thumb, Palm and Fingers is conserved in all USP domains, with the exception of CYLD, which contains a significantly truncated Fingers subdomain [25] (Fig. S1, ESI†). Most USP domains analysed to date cleave the isopeptide linkage between two ubiquitin molecules, and hence contain (at least) two ubiquitin-binding sites: one for the distal ubiquitin, and a second, proximal site for the proximal ubiquitin, whose Lys residue is linked to the C-terminus of the distal ubiquitin (Fig. 3B). Complex structures of USP7/HAUSP, USP2 and USP14 covalently bound to ubiquitin have shown that the Fingers subdomain wraps around the first distal ubiquitin molecule, and its C-terminus extends towards the active site where hydrolysis occurs [26,27,29] (Fig. 3B). This distal ubiquitin-binding surface of USPs (corresponding to the S1 site of the enzyme according to standard enzyme nomenclature) is the primary ubiquitin interaction site, as it covers up to 40% of the ubiquitin surface (~1800 Å²). In contrast, interaction with the proximal ubiquitin (i.e. the molecule providing the Lys residue for the isopeptide, at the S1′ site of the enzyme) is less extensive. Although no USP has been crystallised with a diubiquitin substrate across the active site, USPs generally have a flat surface at the proximal end of the active site, not providing an extensive ubiquitin-binding interface (Fig. 3B).

We mapped the boxes derived from the multiple sequence alignment on the structure of USP7/HAUSP, and annotated the USP7/HAUSP core domain according to Fig. 2 (Fig. 3C and 3E). The same analysis was also performed for USP2, USP8, USP14 and CYLD (Fig. S1, ESI†). This reveals that box 1 and box 2 form the helical Thumb subdomain, box 3 and box 4 form the extended β-sheet structure of the Fingers subdomain, and box 5 and box 6 form the Palm subdomain comprising most of the β-sheet core of the protein (Fig. 3C). Some interesting details are emerging from the structural mapping. Box 3 forms two of the three large β-strands of the Fingers subdomain, and also contributes a helix (α6) to the Thumb subdomain and strand β3 to the core β-sheet of the Palm subdomain (Fig. 3C and E). Box 4 contributes the remaining Fingers sheets, as well as strand β8 to the Palm β-sheet (Fig. 3C and E).

Annotation of the boxes onto the USP domain structure allows a structural understanding of where the common insertion points are located (Fig. 3D). Importantly, the boundaries of the boxes in no case disrupt a secondary structure element, and hence all insertions occur in loop regions (Fig. 3D and 3E).

However, not all secondary structure elements of the crystallised USP domains are described in boxes, and additional elements with secondary structure are observed, mostly at points where larger insertions can also occur. For example, USP2 and USP8 contain a prominent α-helix between boxes 2 and 3, folding along the Fingers subdomain of these enzymes (grey in Fig. S1, ESI†). The structure of USP14 revealed a small non-conserved insertion of 40 Aa between boxes 4 and 5, which forms an extended loop, mainly disordered in the crystal structure, remote from the ubiquitin-binding site. Box 6 of CYLD is disrupted by a small insertion containing two short helices (grey in Fig. S1, ESI†). These small insertions might be due to common structural variations, which are not conserved between USPs.

Structural and functional consequences for insertions within the USP core domain

The six conserved boxes define five spatial positions where insertions are frequently introduced. Different parts of the USP core structure might be affected by insertions at different points, potentially impacting on USP function. While some insertions might directly alter ubiquitin binding at the distal or the proximal ubiquitin-binding site of the USP domain itself, other insertion points are remote from the catalytic centre and unlikely to affect ubiquitin binding directly.

The most common insertion points in human USPs are between boxes 3 and 4, and boxes 4 and 5, with 12 occurrences each (Fig. 2). An insertion between boxes 3 and 4 resides within the Fingers subdomain, which could potentially directly interfere with ubiquitin binding at the distal ubiquitin-binding site, and therefore might affect catalytic efficiency. The reported presence of ubiquitin-like (Ubl) domains in eight USPs at this point is intriguing [30] (see below). Four USPs, USP16, USP35, USP38 and USP45, do not contain ubiquitin-like or other domains within their insertions, and are evolutionarily distant from the Ubl containing family members (see below).

Similarly, an insertion between box 4 and box 5 might extend the ubiquitin-binding platform provided by the Fingers subdomain. This is indeed the case for USP5/IsoT and USP13, which provide additional distal binding sites, in the form of ubiquitin associated (UBA) domains [31] (see below). USP37 contains three ubiquitin interacting motifs (UIMs) inserted at an equivalent position. In contrast, CYLD contains a B-box insertion at this point that does not bind to ubiquitin [25] (see below). The large box 4/5 insertions in USP6, USP25, USP26, USP28, USP29 and USP32 have no identifiable domains.

Fig. 1 Human USP catalytic domains. The catalytic domains of human USPs were extracted from ProSite (http://expasy.org/cgi-bin/prosite/). Coloured USPs correspond to those with structural information available, and boxes are annotated and coloured in blue (box 1), light blue (box 2), light green (box 3), dark green (box 4), orange (box 5) and dark red (box 6) (also see Fig. 3). Light pink colours show position of insertions. The USP domains are ordered by the size of their catalytic domain in this representation.

Fig. 2 Alignment and box-classification of human USP domains. Multiple sequence alignment of all human USP domains identifies six conserved boxes. Residues of the catalytic triad are shown in red. The residues forming the zinc-binding motif in the Fingers subdomain are displayed in blue. Numbers between boxes indicate insertions, and large insertions are highlighted. USP names in bold indicate active enzymes with a complete catalytic triad, and italics indicate USPs which have lost their zinc-binding motif.

Insertions between box 1 and 2 might not affect distal but proximal ubiquitin binding, as this insertion point is located near the proximal ubiquitin-binding site of USPs. The binding to the proximal ubiquitin has been implicated in linkage specificity for the CYLD USP domain [25] and for the structurally unrelated JAMM/MPN+ metalloprotease AMSH-LP (associated molecule with the SH3 domain of STAM (AMSH)-like protease) [14]. Analogous to these examples, other deubiquitinases might be able to recognise the sequence context of the ubiquitin–ubiquitin linkage by specifically interacting with the linkage-neighbouring residues in the proximal ubiquitin. Only four deubiquitinases, USP9X, USP9Y, USP44 and USP49, have relatively small insertions at this point (45–70 Aa). However, those insertions might shape the proximal ubiquitin-binding site to allow for a different chain specificity. Interestingly, USP9X was recently implicated in regulating ubiquitination of AMP kinase related protein kinases by an atypical Lys29/Lys33-linked ubiquitin chain [9].

Insertions between box 2 and 3 or box 5 and 6 are potentially projecting away from the catalytic centre as they are located on a surface remote from the ubiquitin-binding sites of USP domains. USP1 and USP40 contain an insertion between box 5 and box 6, and the USP1 insertion at this site contains a Gly-Gly motif, which was shown to undergo autocleavage by the enzyme [32]. It is currently unclear if the autocleavage event happens in cis, which would mean that the insertion could reach the active site. USP1, as well as USP20 and USP33, also have an insertion between box 2 and box 3. USP20 and USP33 are closely related, and both were identified to interact with the large ECV (Elongins, Von-Hippel-Lindau, Cul2 SCF ligase) E3 ligase complex [33,34].

Some insertions contain independently folded domains

As noted above, the large size-differences in USP catalytic domains can be explained by the presence of inserted sequences at common insertion points. We next analysed the content of the insertions to identify domains. Interestingly, a number of independent domains have been described to overlap with the USP domains.

CYLD structure reveals B-box insertion. The recent crystallographic analysis of the CYLD USP domain has revealed that it deviates significantly from the canonical USP domain fold [25] (Fig. S1D, ESI†). CYLD contains a truncated Fingers subdomain, and box 3 and box 4 are significantly shorter and not conserved with other USP domains (Fig. 2). An insertion in the long linker between box 4 and box 5 folds into a B-box domain that coordinates two zinc ions [25] (Fig. 4A). B-box domains are present in the TRIM family of ubiquitin E3 ligases, but their function is not clear [35]. In our previous analysis of CYLD, we were able to express the isolated B-box domain, and furthermore also the USP domain core with the entire B-box removed [25]. The truncated USP domain was not affected in activity or in specificity for Lys63-linked polyubiquitin chains. However, the B-box insertion affected cellular localisation of CYLD. Removal of the B-box allowed CYLD to enter the nucleus, while the wild-type protein was excluded from the nucleus [25].

UBA domains in USP5/IsoT and USP13. USP5/IsoT and USP13 (also known as IsoT2) contain two UBA domains within an insert of ~200 residues between box 4 and box 5. UBA domains are a common ubiquitin-binding fold comprising ~50 residues that form three helices [36,37] (Fig. 4B). USP5/IsoT and USP13 are specialised deubiquitinases required for the processing of unattached ubiquitin chains. The roles of the UBA domains for IsoT function have been investigated. It was found that the first inserted UBA domain constitutes the S2 and the second inserted UBA domain the S3 ubiquitin-binding site, whereas the S1 site is the common distal ubiquitin-binding site residing in the Fingers subdomain (see above) [31]. An additional zinc finger ubiquitin specific protease (ZnF UBP) domain constitutes the proximal S1′ ubiquitin-binding site [38]. The ZnF UBP domain of USP5/IsoT recognises free ubiquitin chains by virtue of their free Gly-Gly motif, and presents such free chains to the catalytic domain for efficient hydrolysis into monoubiquitin [38]. The case of USP5/IsoT is a remarkable example of inserted domains directly regulating catalytic activity and processivity of USP enzymes.

Fig. 3 Structures of USP7/HAUSP, USP14 and CYLD with annotated boxes. (A) Cartoon representation of the structure of the USP7/HAUSP catalytic domain (pdb-id 1nbf [26]) that serves as a generic model for this USP domain analysis. The Thumb (blue), Palm (dark green) and Fingers (green) subdomains are shown. The position of the zinc-binding motif in the tip of the Fingers subdomain, present in many USPs (but not in USP7/HAUSP), is indicated, as is the active site. The catalytic as well as zinc-binding residues are shown in a stick model with yellow carbon atoms, blue nitrogen atoms, red oxygen atoms and green sulfur atoms. (B) Ubiquitin binding to USP domains, shown as in the USP7/HAUSP ubiquitin complex [26], with ubiquitin under a yellow semi-transparent surface. The approximate position of the proximal ubiquitin is indicated. (C) Representation of USP7/HAUSP with boxes coloured in blue (box 1), light blue (box 2), light green (box 3), dark green (box 4), orange (box 5) and dark red (box 6). Regions in grey are not part of any box, and represent the points where insertions can occur. (D) Structure of USP7/HAUSP with insertion points highlighted in orange and labelled. (E) Topology diagram for USP7/HAUSP, coloured according to the box definition, generated with TopDraw [69]. Insertion points are indicated. Helix α4b is not conserved in other USP domains, and some of the smaller β-strands are not present in other USP structures.

Ubiquitin interacting motifs in USP37. The ubiquitin interacting motif (UIM) consists of a short helical stretch with a sequence motif EEExLxxLALxLS that binds directly to ubiquitin [39] (Fig. 4C). USP37 contains three UIMs between box 4 and box 5, the function of which is currently unclear. It is possible that, similar to IsoT, these UIMs provide distal ubiquitin-binding sites allowing USP37 to efficiently bind to and cleave longer ubiquitin chains.
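The motif as printed above can be turned into a simple pattern search. The Python sketch below reads each "x" as "any residue", which is an assumption about the notation, and matches the remaining positions literally. The toy sequence is hypothetical and was built to contain one exact match; it is not a fragment of USP37. Real UIMs are likely to be more degenerate than this literal reading, so a strict search of this kind would probably miss genuine motifs.

```python
import re

# Literal reading of the consensus printed above, with "x" taken as any residue.
# This strictness is an assumption; real UIMs are likely more degenerate.
UIM_PATTERN = re.compile(r"EEE.L..LAL.LS")

# Hypothetical toy sequence built to contain one exact match; not a USP37 fragment.
toy_sequence = "GSEEEQLQRLALELSGK"

def find_uim_candidates(sequence):
    """Return (1-based start, matched text) for each non-overlapping pattern match."""
    return [(m.start() + 1, m.group()) for m in UIM_PATTERN.finditer(sequence)]

hits = find_uim_candidates(toy_sequence)
print(hits)
```

In practice, profile-based searches are usually preferred over a fixed pattern for short, variable motifs such as this one.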

A MYND domain in USP19. The Uniprot entry of USP19 annotates a MYND (myeloid-Nervy-DEAF-1) domain (43 residues, Fig. 4D) inserted within the catalytic core between box 3 and box 4. MYND domains are zinc-binding folds, which can serve as protein interaction domains [40], but also have roles in protein ubiquitination [41].

Identification of ubiquitin-like domains inserted into the USP core. A recent report [30] has identified a large number of ubiquitin-like domains within USP family deubiquitinases, several of which occurred within the catalytic domains. Ubiquitin-like folds are common in many proteins; however, not all are linked to protein ubiquitination processes (Fig. 4E). Interestingly, USP4, USP6, USP11, USP15, USP19, USP31, USP32 and USP43 contain Ubl domains of ~87–89 Aa within a larger insertion between box 3 and box 4. This intriguingly places a Ubl fold in close proximity to the ubiquitin-binding Fingers subdomain, suggesting a potential regulatory function of the Ubl domain [30]. It is tempting to speculate about an autoinhibitory mechanism where the Ubl blocks the active site by binding in cis to the distal binding site. Alternatively, the Ubl might serve to recruit ubiquitinated substrate proteins, many of which harbour ubiquitin–Ubl-binding domains. Ubl domains are also present outside the USP domains, and in some cases (e.g. USP4, USP11, USP15 and USP32) are directly preceding the USP core, which would place a Ubl fold right next to the proximal ubiquitin-binding site [30]. Structural studies of this subclass of USPs will likely reveal interesting insights.

Fig. 4 Inserted domains found in USP domains. (A) Structure of CYLD (pdb-id 2vhf [25]), indicating the B-box insertion (red) between box 4 (green) and box 5 (orange). (B) UBA domain of cbl-b (red) bound to ubiquitin (yellow, under a semi-transparent surface) (pdb-id 2oob [70]). USP5/IsoT and USP13 contain UBA insertions between boxes 4 and 5. (C) UIM motif of Vps27 (blue) bound to ubiquitin (yellow, under a semi-transparent surface) (pdb-id 1q0w [71]). USP37 contains three UIMs between boxes 4 and 5. (D) MYND domain of AML1-ETO (purple) in complex with a peptide from SMRT (cyan) (pdb-id 2odd [40]). Zinc ions are shown as yellow spheres. USP19 contains a MYND domain between boxes 3 and 4. (E) Ubiquitin-like domains from different proteins, mParkin (top, pdb-id 2zeq [72]), mUSP14 (middle, pdb-id 1wgg, unpublished) and hNP952 (bottom, pdb-id 2faz, unpublished).

USP1 was reported to also contain a Ubl fold in its insertion between box 5 and box 6 [32]. It was found that USP1 undergoes inhibitory auto-proteolysis at a Gly-Gly motif resembling the C-terminus of ubiquitin, which is located in this insertion (see above). However, a complete ubiquitin-like fold has not been predicted in the recent Ubl annotation [30] or in our own sequence analysis (data not shown).

Conservation and structure prediction of insertion sequences

The current domain annotation only covers a fraction of USPs

with large insertions. In order to understand the role of the

inserts in other USPs we analysed these sequences for con-

servation amongst species, and performed structure and dis-

order prediction analyses (see Material and methods).

In general, the insertions are less conserved between species than the rest of the catalytic domain, and hence can be easily identified in most cases. Inserted folded domains are usually well conserved, whereas the linker regions between them show less conservation. USP37 contains an insertion of 283 Aa between boxes 4 and 5 harbouring three UIM motifs. The entire insertion is well conserved between species, suggesting that the UIM helices might be part of a structural scaffold rather than independent flexible entities. Likewise, the insertions between boxes 2 and 3 in USP20 and USP33 are well conserved, and predicted to contain secondary structure. In contrast, USP5/IsoT and USP13 have non-conserved linkers between the UBA domains inserted between boxes 4 and 5, which might allow for flexibility.

The large box 3/4 insertions containing a Ubl domain (>250 Aa, see above) show conservation at the start (where the Ubl is present) and at the end of the insert, but all also contain a non-conserved stretch of ~100 Aa in between. Secondary structure prediction and threading indicate folded regions for the conserved parts, linked by a non-conserved, unfolded linker. This is in contrast to the large box 3/4 insertions in USP16 and USP35, which are entirely variable between species and show a high proportion of charged residues, suggesting conformational flexibility.
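The kind of conservation comparison described above can be illustrated with a minimal sketch. It assumes a simple per-column identity score and a toy four-species alignment that is entirely hypothetical; the function names, threshold and minimum stretch length are illustrative choices, not the procedure used in this study.

```python
def column_identity(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gap characters ('-') never count as the shared residue, so gapped
    columns score low.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        top = max((residues.count(aa) for aa in set(residues)), default=0)
        scores.append(top / len(column))
    return scores


def low_conservation_stretches(scores, threshold=0.6, min_len=3):
    """Return (start, end) column ranges (end exclusive) scoring below threshold."""
    stretches, start = [], None
    for i, score in enumerate(scores + [1.0]):  # sentinel closes a trailing run
        if score < threshold and start is None:
            start = i
        elif score >= threshold and start is not None:
            if i - start >= min_len:
                stretches.append((start, i))
            start = None
    return stretches


# Hypothetical toy alignment of one region from four species (not real USP data).
TOY = [
    "MKLVCAGSTPEQLLDF",
    "MKLVCSDN-PAQLLDF",
    "MKLVCTEGRPDQLLDF",
    "MKLVC-KSAPSQLLDF",
]
scores = column_identity(TOY)
print(low_conservation_stretches(scores))
```

On real data a variable stretch flanked by conserved columns would be a candidate linker, but the score alone cannot distinguish an unfolded linker from a rapidly evolving folded region.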

Different insertion patterns in yeast USP domains

Much of the ubiquitin system is conserved in S. cerevisiae, although yeast contains a slightly down-scaled version compared to mammals. S. cerevisiae contains 18 USP deubiquitinases, 14 of which contain a complete catalytic triad and are likely to be active enzymes, and the same boxes can be readily annotated due to the structural similarity of the USP fold (Fig. 5A). 11 of 18 yeast Ubp enzymes contain zinc-binding motifs in the Fingers subdomain, further supporting marked similarities within the USP domain architecture. The crystal structure of yUbp6, the yeast homologue of USP14, has been deposited in the protein data bank (pdb-id 1vjv, unpublished), and both structures (in the absence of bound ubiquitin) superimpose well, with an RMSD of 1.2 Å over 296 aligned residues (Fig. 5B).

The number of USP enzymes has increased roughly threefold from S. cerevisiae to human (18 to 56 members), however the number of insertion-carrying enzymes has increased even further (human: 30/56, yeast: 7/18, a 4.3-fold increase). This reveals that insertions are more common in higher eukaryotes, and some striking relationships with respect to the kind and presence (or absence) of inserted sequences can be noted.
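The fold changes quoted above follow directly from the four counts given in the text. The short sketch below only reproduces that arithmetic; the variable and function names are illustrative.

```python
# Counts quoted in the text (human: 30 of 56, yeast: 7 of 18).
HUMAN_TOTAL, HUMAN_WITH_INSERT = 56, 30
YEAST_TOTAL, YEAST_WITH_INSERT = 18, 7


def fold_change(new: float, old: float) -> float:
    """Return new/old, i.e. how many times larger `new` is than `old`."""
    return new / old


family_fold = fold_change(HUMAN_TOTAL, YEAST_TOTAL)              # 56 / 18
insert_fold = fold_change(HUMAN_WITH_INSERT, YEAST_WITH_INSERT)  # 30 / 7
human_fraction = HUMAN_WITH_INSERT / HUMAN_TOTAL
yeast_fraction = YEAST_WITH_INSERT / YEAST_TOTAL

print(f"family size: {family_fold:.1f}-fold")
print(f"insertion-carrying enzymes: {insert_fold:.1f}-fold")
print(f"fraction with insertions: human {human_fraction:.0%}, yeast {yeast_fraction:.0%}")
```

The 4.3-fold figure is therefore the ratio of the absolute numbers of insertion-carrying enzymes (30 versus 7); expressed as a fraction of each family, the difference is smaller (roughly 54% versus 39%).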

The box analysis performed on yeast USP domains reveals eight large insertions in seven yeast enzymes, ranging from 80 to 400 residues (Fig. 5A). yUbp9 and yUbp13 display insertions between boxes 1 and 2 (215 and 207 Aa, respectively), yUbp2 and yUbp3 contain 88 and 80 Aa insertions between boxes 2 and 3, yUbp12 is the only yeast USP domain with an insertion (400 Aa) between boxes 3 and 4, and three yeast USP enzymes (yUbp1, 164 Aa; yUbp2, 99 Aa; yUbp14, 152 Aa) have insertions between boxes 4 and 5 (Fig. 5A).
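As a cross-check of the counts, the yeast insertions listed above can be tabulated in a few lines. The values are transcribed from the paragraph; the data layout and names are only an illustrative choice.

```python
from collections import Counter

# (enzyme, insertion point between boxes, length in Aa), transcribed from the text.
YEAST_INSERTIONS = [
    ("Ubp9", "1/2", 215),
    ("Ubp13", "1/2", 207),
    ("Ubp2", "2/3", 88),
    ("Ubp3", "2/3", 80),
    ("Ubp12", "3/4", 400),
    ("Ubp1", "4/5", 164),
    ("Ubp2", "4/5", 99),
    ("Ubp14", "4/5", 152),
]

# Number of insertions at each insertion point.
per_point = Counter(point for _, point, _ in YEAST_INSERTIONS)
# Distinct enzymes, sorted by their numeric suffix (Ubp2 appears twice above).
enzymes = sorted({name for name, _, _ in YEAST_INSERTIONS}, key=lambda n: int(n[3:]))
lengths = [length for _, _, length in YEAST_INSERTIONS]

print(dict(per_point))
print(len(YEAST_INSERTIONS), "insertions in", len(enzymes), "enzymes:", enzymes)
print("length range:", min(lengths), "to", max(lengths), "Aa")
```

Because yUbp2 carries insertions at two points, the eight listed insertions correspond to seven distinct enzymes, in line with the 7/18 figure given earlier.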

It is interesting to compare the relative positions of insertions between the human and yeast USPs. The roughly 200 Aa insertions between boxes 1 and 2 in yeast Ubp9 and Ubp13 have no equivalent in human USPs, which at most contain 45–70 Aa insertions at this point (see above) (Fig. 2 and 5A). In contrast, only Ubp12 shows an insertion between boxes 3 and 4, while this is a common insertion point in human USPs. Interestingly, yeast Ubp12 shows a Ubl within this insertion, suggesting evolutionary relationships of this subgroup of USPs (see below). Yeast Ubp14 contains a UBA domain-containing insertion between boxes 4 and 5 (ref. 42). Ubp1, which has a similarly sized insertion between boxes 4 and 5, does not contain any discernible domains within its insertion.

Phylogenetic analysis of yeast and human USP domains

In order to understand the evolutionary relationships between

USP domains from yeast and human, we performed a phylo-

genetic analysis on the isolated USP domain sequences

(Fig. 6). This analysis confirmed that the domain of the

aforementioned hUSP14 is most closely related to yUbp6,

and that the closely related hUSP5/IsoT and hUSP13 are co-

orthologues of yUbp14. Additional relationships based on the

USP domain can be identified, with other pairs comprising

hUSP7/HAUSP and yeast Ubp15; hUSP52 and yPAN2;

hUSP39 and ySAD1. Some yeast sequences can be related to

several human USPs, which might indicate that gene duplica-

tion events have occurred. The yeast Ubp8 is an orthologue to

the human proteins USP22, USP27 and USP51, and yUbp10

is related to hUSP26, hUSP29 and hUSP37. Several yeast USP

domains, e.g. yUbp1 and yUbp16, as well as yDoa4, yUbp5,

yUbp7 and yUbp11 have no clear orthologues in humans

(Fig. 6).

Yeast Ubp12 is the only yeast USP with an insertion between boxes 3 and 4, and interestingly, this insertion carries a Ubl fold. This places yUbp12 alongside the group of eight human enzymes with Ubl-containing insertions (see above). The phylogenetic analysis reveals that human enzymes USP4, USP11, USP15 and USP19 are the closest relatives to yUbp12 (Fig. 6). The remaining human USPs with inserted sequences between boxes 3 and 4 (USP16 and USP45, and USP35 and USP38) are found in isolated pairs further away evolutionarily, revealing that these unrelated insertions have occurred at later stages. Similarly, human USP domains related to yUbp10 (i.e. USP37, USP26, USP29) have acquired large insertions between boxes 4 and 5 (Fig. 6).


The phylogenetic tree reveals the close evolutionary relationship of human USP12 and USP46 to yeast Ubp9 and Ubp13. Here, the human USP domains are relatively small and do not harbour any major insertions, while both their yeast homologues have large inserted sequences (215 and 207 Aa, respectively, Fig. 5A) between boxes 1 and 2. Intriguingly, USP12 and USP46 (as well as USP1) are activated by the WD40 repeat protein USP1-associated factor 1 (UAF1).43,44 With such large insertions in the catalytic core of the yeast proteins, it is tempting to speculate about an activating function of the insertions. However, secondary structure prediction on the yeast sequences does not reveal any occurrences of WD40 repeats (data not shown). Human USP10 and yeast Ubp3 are another pair where the human catalytic domain is undisrupted whereas the yeast orthologue carries an insertion.

Several USP orthologues between yeast and human are

known to be functionally equivalent. Yeast Ubp6 and human

USP14 are components of the proteasome lid where they serve

important chain remodelling functions on proteasome sub-

strates,23,27,45 while yeast Ubp14 and human USP5 cleave free

ubiquitin chains, replenishing the cellular monoubiquitin

pool.31,38,42,46 Yeast Ubp8 and human USP22 both affect

histone deubiquitination and are components of the SAGA

complex.47–49 Whether further functional relationships also

exist between other orthologues is not clear at the moment,

and it is intriguing that some closely related enzymes (e.g.

USP27, USP51 that are closely related to USP22) (Fig. 6) have

not been studied in detail.

Zn serves as molecular glue in USPs with box 3/4 insertions

As mentioned above, boxes 3 and 4 each contain a Cys-X-X-Cys motif, which has been shown to be a functional zinc-binding motif (ref. 28 and 50). In total, 45 out of 56 human enzymes and 11 out of 18 yeast USP domains contain a functional zinc-binding site. The motifs are located at the tip of the Fingers subdomain (Fig. 3A), and reside on perpendicular β-hairpin loops (Fig. 3A and 7). The four Cys residues coordinate one zinc ion, which is resolved in the structures of USP8 and USP2 (ref. 28 and 29) (Fig. 7, Fig. S1 (ESI†)). The remaining structures (USP7/HAUSP and USP14) lack the zinc-coordinating residues, yet the zinc-ribbon fold remains, suggesting that in the course of evolution, zinc-binding residues and hence coordination capacity have been lost51 (Fig. 2 and 3 and Fig. S1 (ESI†)).

USP15 contains a functional zinc-binding motif, and also a large Ubl-containing insertion between boxes 3 and 4,


Fig. 5 Different distribution of insertions in yeast USP domains. Multiple sequence alignment of the 18 S. cerevisiae USP domains, annotated as

in Fig. 2. Residues of the catalytic triad are shown in red. The residues forming the zinc-binding motif in the Fingers subdomain are displayed in

blue. Numbers between boxes indicate insertions, and large insertions are highlighted. USP names in bold indicate active enzymes with a complete

catalytic triad, and italics indicate USPs which have lost their zinc-binding motif.


separating the Cys-X-X-Cys motifs by ~330 residues. Biochemical and in vivo studies on USP15 have investigated the role of zinc for USP15 function.50 A zinc-chelating agent, o-phenanthroline, abrogated USP15 function in vivo, as did mutation of a Cys residue. In vitro experiments showed that zinc-devoid USP15 mutants could no longer cleave ubiquitin chains, while the enzyme was still able to cleave a ubiquitin-GFP fusion protein, suggesting that only ubiquitin polymer hydrolysis was affected. Hence zinc-binding and structural integrity of the zinc-binding site are important for USP15 function,50 and potentially also for other zinc-binding USP enzymes.
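A Cys-X-X-Cys motif of the kind discussed here can be located at the sequence level with a simple pattern search. The sketch below is only an illustration: the function name is hypothetical, and the short fragments are copied from the box 3 region of the alignment reproduced in the Supplementary Information, with alignment gap characters stripped by the function.

```python
import re

# Overlapping search for Cys-X-X-Cys, where X is any residue (lookahead keeps
# overlapping hits such as C-x-x-C-x-x-C).
CXXC = re.compile(r"(?=(C[A-Z]{2}C))")


def find_cxxc(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based position, motif) for every Cys-X-X-Cys in `sequence`."""
    cleaned = sequence.replace(".", "").replace("-", "").upper()
    return [(m.start(), m.group(1)) for m in CXXC.finditer(cleaned)]


# Box 3 fragments taken from the supplementary alignment (illustrative excerpt only).
BOX3_FRAGMENTS = {
    "USP2": "SSLTCTDCGYCS",
    "USP8": "STVQCLTCHKKS",
    "USP7": "SYIQCKEVDYRS",
    "USP14": "TTMKCTESEEEE",
}

for name, fragment in BOX3_FRAGMENTS.items():
    print(name, find_cxxc(fragment))
```

For these fragments the search finds a motif in USP2 and USP8 but not in USP7 or USP14, which matches the structural observations described in the text. A pattern hit is only a sequence-level filter, however; it does not by itself establish that zinc is coordinated.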

It is interesting to note that all USP domains with insertions between boxes 3 and 4 have retained their ability to bind zinc (Fig. 2). Potentially, zinc-binding facilitates folding of the USP core by helping sequence motifs that are a few hundred residues apart to interact, thereby allowing incorporation of inserted domains, which in the case of USP19 even include another zinc-binding MYND domain (see above). Insertions at other points are remote from the Fingers subdomain and are unlikely to benefit from a stabilising mechanism by zinc-binding to the USP catalytic core. In fact, four USPs with large insertions between boxes 4 and 5 (USP5/IsoT, USP13, USP25, USP28) have lost the ability to coordinate zinc within their Fingers subdomain.


Fig. 6 Phylogenetic relationships of human and yeast USP enzymes. Phylogenetic tree of all human and yeast USP domains corresponding to

sequences in Fig. 2 and 5. Yeast Ubp enzymes are coloured green. Names in italics do not contain a functional zinc-binding site in the Fingers

subdomain, and bold font indicates active enzymes with a complete catalytic triad. Brackets in blue indicate USPs with insertions between

annotated boxes. Branch points labelled with black dots are more reliable (>66% bootstrap support) than unlabelled bifurcations. Branch length

corresponds roughly to the evolutionary distance.


Conclusions

Many USP deubiquitinases have important cellular functions, however their large size makes this class of enzymes difficult to study biochemically. Our analysis of the USP catalytic domain described here gives further insights into the sophisticated catalytic domain architecture, and we reveal that USP domains consist of a common conserved catalytic core which is interspersed at five points with insertions, some of which are as big as the catalytic domain itself. In previous studies using BLOCKS analysis, eight small regions of homology (between 15 and 25 Aa) within USP sequences from yeast and Drosophila had been annotated,52,53 and these are consistent with the structure-based dissection of USP domains into six boxes performed in this study. The recent structural definition of the underlying USP domain fold allowed our more rigorous annotation of structurally equivalent sequence stretches, enabling a box annotation that covers almost 100% of the USP fold (e.g. in USP50), yet allowing for structural variation in divergent USP family members (e.g. in CYLD).

The phylogenetic analysis of USP domains from yeast and

humans revealed some surprising insights. It is striking that

the yeast USP domains are overall similarly complicated and

also contain large folded insertions. The USP core fold with its

distinct structure, however, is ancient and exists almost un-

changed in fungi, higher eukaryotes as well as plants. It seems

that fine-tuning of USP deubiquitinases was evolutionarily

performed by acquiring insertions (e.g. through alternative

splicing events). However, this also implies that the ancient

USP insertions that had already evolved prior to the yeast

ubiquitin system, are likely to regulate fundamental USP

function (such as activity, specificity or processivity) rather

than having roles in more sophisticated regulation networks

present only in higher eukaryotes (such as subcellular localisa-

tion).

Many human USPs have not been studied in detail, and therefore identification of close counterparts in a more tractable system such as yeast might be a useful resource, even if the homology is merely based on the catalytic domain itself. In addition, the annotation of USP domains might allow rational design of insertion-less constructs that retain catalytic properties (as shown in our earlier work on CYLD, ref. 25), which might aid further structural and functional studies on these enzymes. A number of USP enzymes are drug targets, however, to date no USP-directed small-molecule inhibitors are available.54,55 A few inhibitors exist for other deubiquitinase families, however,56 and recently a small-molecule inhibitor was described for a USP-related SARS deubiquitinase, Plpro.57,58 Drug discovery on USPs is complicated by the fact that in order to produce the enzymes, insect cell expression is often required, yet still yields only low amounts of protein in some cases.59 Removal of insertions might make USP domains more amenable to (bacterial) expression, allowing higher protein yields and enabling high-throughput screening.

The occurrence of inserted sequences and domains within

USP catalytic domains is interesting. Importantly, domain

annotation tools, including ProSite (http://expasy.org/cgi-

bin/prosite/), InterPro (http://www.ebi.ac.uk/interpro/) or

Pfam (http://pfam.sanger.ac.uk/), do not annotate the inser-

tions within USP domains, and hence these regions might not

have been investigated for inserted domains.

It will be interesting to investigate the function(s) of the insertions, as so far this has only been done for USP5/IsoT, where the UBA domains have a direct effect on catalytic activity,31 and for the B-box domain in CYLD, which impacts on localisation of the enzyme.25 The annotations performed here will enable further studies on this interesting class of enzymes.

Material and methods

Sequence analysis

Sequence database searches were carried out with current

releases of Uniprot and GenPept.60,61 Initial multiple align-

ments were calculated by MUSCLE62 and visualised with

Boxshade 3.2 (http://www.ch.embnet.org/software/BOX_

form.html). Generalised profile construction63 and searches

for analysing the USP box topology were run locally using the

pftools package, version 2.1. Generalised profiles were con-

structed using the BLOSUM45 substitution matrix64 and

default penalties of 2.1 for gap opening and 0.2 for gap

extension. Profile matches were analysed for statistical significance by means of the score distribution of a randomised database.65 A significance threshold of P = 0.01, as calculated from database shuffling analysis, was used to include sequences in subsequent rounds of iterative profile refinement.
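The idea of judging a match against the score distribution of a shuffled database can be sketched generically as a permutation test. The example below is an assumption-laden illustration and not the statistics implemented in pftools: the scoring function is a hypothetical stand-in for a profile score, the query is the start of the USP1 box 1 sequence from the supplementary alignment with gap characters removed, and the number of shuffles is arbitrary.

```python
import random


def empirical_p_value(observed: float, null_scores: list[float]) -> float:
    """Fraction of null scores >= observed, with a +1 correction so P is never 0."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def shuffled(sequence: str, rng: random.Random) -> str:
    """Return a residue-shuffled copy (same composition, randomised order)."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def toy_score(sequence: str, motif: str = "GNTC") -> int:
    """Hypothetical stand-in for a profile score: best identity count in any window."""
    windows = (sequence[i:i + len(motif)] for i in range(len(sequence) - len(motif) + 1))
    return max((sum(a == b for a, b in zip(w, motif)) for w in windows), default=0)


rng = random.Random(0)               # fixed seed so the run is reproducible
query = "VGLNNLGNTCYLNSILQVLYF"      # illustrative fragment, gaps removed
observed = toy_score(query)
null = [toy_score(shuffled(query, rng)) for _ in range(999)]
p = empirical_p_value(observed, null)
print(f"observed score {observed}, empirical P = {p:.3f}, passes P <= 0.01: {p <= 0.01}")
```

Shuffling preserves residue composition while destroying order, so the null scores estimate how often a score at least as high arises by chance. With a real profile and database, the same comparison against P = 0.01 would decide whether a sequence enters the next refinement round.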

Phylogenetic analysis of human and yeast USP domains

Phylogenetic analysis was performed on the MUSCLE-generated alignment (Fig. 2 and 5A). Alignment regions that contain gaps or are otherwise unreliable were removed. The remaining multiple alignment was used for dendrogram construction by ClustalW, using the Neighbor-Joining algorithm.66 Reliability of the branch points was estimated by bootstrap analysis (100 replicates).
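Two of the steps described above, removal of gapped alignment columns and bootstrap resampling, can be sketched in a few lines. This is a simplified illustration rather than the ClustalW procedure: tree building by Neighbor-Joining is not implemented, the p-distance is just one possible input for it, and the three short sequences are excerpts from the start of box 1 in the supplementary alignment used as toy input.

```python
import random

GAP_CHARS = "-."


def strip_gapped_columns(alignment: dict[str, str]) -> dict[str, str]:
    """Drop every alignment column in which any sequence has a gap character."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(c in GAP_CHARS for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample columns with replacement to the original alignment length."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy excerpt (first 21 alignment columns of box 1 for three human USPs).
TOY_ALIGNMENT = {
    "USP2": "AGLRNL..GNTCFMNSILQCL",
    "USP8": "TGLRNL..GNTCYMNSILQCL",
    "USP7": "VGLKNQ..GATCYMNSLLQTL",
}

stripped = strip_gapped_columns(TOY_ALIGNMENT)
rng = random.Random(0)
replicates = [bootstrap_replicate(stripped, rng) for _ in range(100)]
print(stripped)
print(round(p_distance(stripped["USP2"], stripped["USP8"]), 3))
```

In a full analysis a tree would be built from each of the 100 replicates, and the bootstrap support of a branch point is the fraction of replicate trees in which the same grouping recurs.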

Structural mapping of the boxes and insertions

Structural analysis was based on the crystal structures of
human USP7 (pdb-id 1nbf, ref. 26), USP8 (pdb-id 2gfo, ref. 28), USP2


Fig. 7 Zinc-binding as molecular glue. Structure of the USP8 catalytic domain (pdb-id 2gfo, ref. 28) showing the zinc-binding site that is formed by box 3 and box 4. The insertion point is indicated.


(pdb-id 2hd5, ref. 29), USP14 (pdb-id 2ayo, ref. 27), and CYLD (pdb-id 2vhf, ref. 25). The catalytic core region was extracted from each structure according to ProSite (http://www.expasy.ch/prosite/) annotation. Superposition of these USP structures in Coot (ref. 67) revealed structurally conserved features. All structure figures were created using PyMol (http://www.pymol.org).

Analysis of inserted sequences

The inserted sequences according to the box annotation were

extracted from the alignments. Domain definitions as anno-

tated in ProSite were used where available. Further analysis

for un-annotated structural features and predicted folded

sequences/domains/motifs was performed by threading with

PHYRE.68 For evolutionary conservation analysis of USP

domain insertions, alignments of individual full-length USPs

from different species annotated in the Ensembl database

(http://www.ensembl.org/) were examined.

References

1 C. M. Pickart, Annu. Rev. Biochem., 2001, 70, 503–533.
2 L. A. Passmore and D. Barford, Biochem. J., 2004, 379, 513–525.
3 P. Xu, D. M. Duong, N. T. Seyfried, D. Cheng, Y. Xie, J. Robert, J. Rush, M. Hochstrasser, D. Finley and J. Peng, Cell (Cambridge, Mass.), 2009, 137, 133–145.
4 C. M. Pickart and D. Fushman, Curr. Opin. Chem. Biol., 2004, 8, 610–616.
5 F. Ikeda and I. Dikic, EMBO Rep., 2008, 9, 536–542.
6 A. Hershko and A. Ciechanover, Annu. Rev. Biochem., 1998, 67, 425–479.
7 Z. J. Chen and L. J. Sun, Mol. Cell, 2009, 33, 275–286.
8 L. Jin, A. Williamson, S. Banerjee, I. Philipp and M. Rape, Cell (Cambridge, Mass.), 2008, 133, 653–665.
9 A. K. Al-Hakim, A. Zagorska, L. Chapman, M. Deak, M. Peggie and D. R. Alessi, Biochem. J., 2008, 411, 249–260.
10 F. E. Reyes-Turcu and K. D. Wilkinson, Chem. Rev., 2009.
11 S. M. Nijman, M. P. Luna-Vargas, A. Velds, T. R. Brummelkamp, A. M. Dirac, T. K. Sixma and R. Bernards, Cell (Cambridge, Mass.), 2005, 123, 773–786.
12 S. C. Johnston, S. M. Riddle, R. E. Cohen and C. P. Hill, EMBO J., 1999, 18, 3877–3887.
13 G. A. Cope, G. S. Suh, L. Aravind, S. E. Schwarz, S. L. Zipursky, E. V. Koonin and R. J. Deshaies, Science, 2002, 298, 608–611.
14 Y. Sato, A. Yoshikawa, A. Yamagata, H. Mimura, M. Yamashita, K. Ookata, O. Nureki, K. Iwai, M. Komada and S. Fukai, Nature, 2008, 455, 358–362.
15 J. W. Tobias and A. Varshavsky, J. Biol. Chem., 1991, 266, 12021–12028.
16 R. T. Baker, J. W. Tobias and A. Varshavsky, J. Biol. Chem., 1992, 267, 23364–23375.
17 X. Chen, B. Zhang and J. A. Fischer, Genes Dev., 2002, 16, 289–294.
18 E. Overstreet, E. Fitch and J. A. Fischer, Development (Cambridge, UK), 2004, 131, 5355–5366.
19 C. L. Brooks and W. Gu, Mol. Cell, 2006, 21, 307–315.
20 N. Popov, M. Wanzel, M. Madiredjo, D. Zhang, R. Beijersbergen, R. Bernards, R. Moll, S. J. Elledge and M. Eilers, Nat. Cell Biol., 2007, 9, 765–774.
21 S. C. Sun, Nat. Rev. Immunol., 2008, 8, 501–511.
22 M. J. Clague and S. Urbe, Trends Cell Biol., 2006, 16, 551–559.
23 A. Guterman and M. H. Glickman, Curr. Protein Pept. Sci., 2004, 5, 201–211.
24 N. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, E. De Castro, P. S. Langendijk-Genevaux, M. Pagni and C. J. Sigrist, Nucleic Acids Res., 2006, 34, D227–230.
25 D. Komander, C. J. Lord, H. Scheel, S. Swift, K. Hofmann, A. Ashworth and D. Barford, Mol. Cell, 2008, 29, 451–464.
26 M. Hu, P. Li, M. Li, W. Li, T. Yao, J. W. Wu, W. Gu, R. E. Cohen and Y. Shi, Cell (Cambridge, Mass.), 2002, 111, 1041–1054.
27 M. Hu, P. Li, L. Song, P. D. Jeffrey, T. A. Chenova, K. D. Wilkinson, R. E. Cohen and Y. Shi, EMBO J., 2005, 24, 3747–3756.
28 G. V. Avvakumov, J. R. Walker, S. Xue, P. J. Finerty, Jr., F. Mackenzie, E. M. Newman and S. Dhe-Paganon, J. Biol. Chem., 2006, 281, 38061–38070.
29 M. Renatus, S. G. Parrado, A. D’Arcy, U. Eidhoff, B. Gerhartz, U. Hassiepen, B. Pierrat, R. Riedl, D. Vinzenz, S. Worpenberg and M. Kroemer, Structure (London), 2006, 14, 1293–1302.
30 X. Zhu, R. Menard and T. Sulea, Proteins, 2007, 69, 1–7.
31 F. E. Reyes-Turcu, J. R. Shanks, D. Komander and K. D. Wilkinson, J. Biol. Chem., 2008, 283, 19581–19592.
32 T. T. Huang, S. M. Nijman, K. D. Mirchandani, P. J. Galardy, M. A. Cohn, W. Haas, S. P. Gygi, H. L. Ploegh, R. Bernards and A. D. D’Andrea, Nat. Cell Biol., 2006, 8, 339–347.
33 Z. Li, D. Wang, E. M. Messing and G. Wu, EMBO Rep., 2005, 6, 373–378.
34 Z. Li, D. Wang, X. Na, S. R. Schoen, E. M. Messing and G. Wu, Biochem. Biophys. Res. Commun., 2002, 294, 700–709.
35 G. Meroni and G. Diez-Roux, Bioessays, 2005, 27, 1147–1157.
36 K. Hofmann and P. Bucher, Trends Biochem. Sci., 1996, 21, 172–173.
37 B. L. Bertolaet, D. J. Clarke, M. Wolff, M. H. Watson, M. Henze, G. Divita and S. I. Reed, Nat. Struct. Biol., 2001, 8, 417–422.
38 F. E. Reyes-Turcu, J. R. Horton, J. E. Mullally, A. Heroux, X. Cheng and K. D. Wilkinson, Cell (Cambridge, Mass.), 2006, 124, 1197–1208.
39 K. Hofmann and L. Falquet, Trends Biochem. Sci., 2001, 26, 347–350.
40 Y. Liu, W. Chen, J. Gaudet, M. D. Cheney, L. Roudaia, T. Cierpicki, R. C. Klet, K. Hartman, T. M. Laue, N. A. Speck and J. H. Bushweller, Cancer Cell, 2007, 11, 483–497.
41 D. Ju, X. Wang, H. Xu and Y. Xie, Mol. Cell. Biol., 2008, 28, 1404–1412.
42 A. Amerik, S. Swaminathan, B. A. Krantz, K. D. Wilkinson and M. Hochstrasser, EMBO J., 1997, 16, 4826–4838.
43 M. A. Cohn, Y. Kee, W. Haas, S. P. Gygi and A. D. D’Andrea, J. Biol. Chem., 2009, 284, 5343–5351.
44 M. A. Cohn, P. Kowal, K. Yang, W. Haas, T. T. Huang, S. P. Gygi and A. D. D’Andrea, Mol. Cell, 2007, 28, 786–797.
45 B. Crosas, J. Hanna, D. S. Kirkpatrick, D. P. Zhang, Y. Tone, N. A. Hathaway, C. Buecker, D. S. Leggett, M. Schmidt, R. W. King, S. P. Gygi and D. Finley, Cell (Cambridge, Mass.), 2006, 127, 1401–1413.
46 S. Dayal, A. Sparks, J. Jacob, N. Allende-Vega, D. P. Lane and M. K. Saville, J. Biol. Chem., 2009, 284, 5030–5041.
47 K. W. Henry, A. Wyce, W. S. Lo, L. J. Duggan, N. C. Emre, C. F. Kao, L. Pillus, A. Shilatifard, M. A. Osley and S. L. Berger, Genes Dev., 2003, 17, 2648–2663.
48 X. Y. Zhang, M. Varthi, S. M. Sykes, C. Phillips, C. Warzecha, W. Zhu, A. Wyce, A. W. Thorne, S. L. Berger and S. B. McMahon, Mol. Cell, 2008, 29, 102–111.
49 Y. Zhao, G. Lang, S. Ito, J. Bonnet, E. Metzger, S. Sawatsubashi, E. Suzuki, X. Le Guezennec, H. G. Stunnenberg, A. Krasnov, S. G. Georgieva, R. Schule, K. Takeyama, S. Kato, L. Tora and D. Devys, Mol. Cell, 2008, 29, 92–101.
50 B. K. Hetfeld, A. Helfrich, B. Kapelari, H. Scheel, K. Hofmann, A. Guterman, M. Glickman, R. Schade, P. M. Kloetzel and W. Dubiel, Curr. Biol., 2005, 15, 1217–1221.
51 S. S. Krishna and N. V. Grishin, Cell Cycle, 2004, 3, 1046–1049.
52 K. D. Wilkinson, FASEB J., 1997, 11, 1245–1256.
53 X. Chen and J. A. Fischer, Genetics, 2000, 156, 1829–1836.
54 S. J. Goldenberg, J. L. McDermott, T. R. Butt, M. R. Mattern and B. Nicholson, Biochem. Soc. Trans., 2008, 36, 828–832.
55 A. Shanmugham and H. Ovaa, Curr. Opin. Drug Discovery Dev., 2008, 11, 688–696.
56 K. R. Love, A. Catic, C. Schlieker and H. L. Ploegh, Nat. Chem. Biol., 2007, 3, 697–705.
57 K. Ratia, S. Pegan, J. Takayama, K. Sleeman, M. Coughlin, S. Baliji, R. Chaudhuri, W. Fu, B. S. Prabhakar, M. E. Johnson, S. C. Baker, A. K. Ghosh and A. D. Mesecar, Proc. Natl. Acad. Sci. U. S. A., 2008, 105, 16119–16124.
58 C. Y. Chou, C. H. Chien, Y. S. Han, M. T. Prebanda, H. P. Hsieh, B. Turk, G. G. Chang and X. Chen, Biochem. Pharmacol., 2008, 75, 1601–1609.


59 J. M. Schlaeppi, M. Henke, M. Mahnke, S. Hartmann, R. Schmitz, Y. Pouliquen, B. Kerins, E. Weber, F. Kolbinger and H. P. Kocher, Protein Expression Purif., 2006, 50, 185–195.
60 T. U. Consortium, Nucleic Acids Res., 2009, 37, D169–174.
61 D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell and D. L. Wheeler, Nucleic Acids Res., 2008, 36, D25–30.
62 R. C. Edgar, Nucleic Acids Res., 2004, 32, 1792–1797.
63 P. Bucher, K. Karplus, N. Moeri and K. Hofmann, Comput. Chem., 1996, 20, 3–23.
64 S. Henikoff and J. G. Henikoff, Proc. Natl. Acad. Sci. U. S. A., 1992, 89, 10915–10919.
65 K. Hofmann, Brief. Bioinformatics, 2000, 1, 167–178.
66 M. A. Larkin, G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J. Gibson and D. G. Higgins, Bioinformatics, 2007, 23, 2947–2948.
67 P. Emsley and K. Cowtan, Acta Crystallogr., Sect. D: Biol. Crystallogr., 2004, 60, 2126–2132.
68 L. A. Kelley and M. J. Sternberg, Nat. Protocols, 2009, 4, 363–371.
69 C. S. Bond, Bioinformatics, 2003, 19, 311–312.
70 P. Peschard, G. Kozlov, T. Lin, I. A. Mirza, A. M. Berghuis, S. Lipkowitz, M. Park and K. Gehring, Mol. Cell, 2007, 27, 474–485.
71 K. A. Swanson, R. S. Kang, S. D. Stamenova, L. Hicke and I. Radhakrishnan, EMBO J., 2003, 22, 4597–4606.
72 K. Tomoo, Y. Mukai, Y. In, H. Miyagawa, K. Kitamura, A. Yamano, H. Shindo and T. Ishida, Biochim. Biophys. Acta, 2008, 1784, 1059–1067.


Dissection of USP catalytic domains reveals five common insertion points

Supplementary Information

Yu Ye1$, Hartmut Scheel2$, Kay Hofmann2 and David Komander1*

1 MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK

2 Miltenyi Biotec, Friedrich-Ebert-Strasse 68, 51429 Bergisch-Gladbach,

Germany.

$ These authors contributed equally to the work.

* Correspondence should be addressed to:

David Komander, MRC Laboratory of Molecular Biology, Protein and Nucleic

Acid Chemistry Division, Hills Road, Cambridge, CB2 0QH, UK.

Email: [email protected], Tel: +441223402300, Fax: +441223412178

Running title: USP domain insertions


Supplementary Figure 1: Box annotation for USP domain structures.

A) Structure of USP2 (pdb-id 2hd5, 1), with boxes coloured in blue (box 1),

light blue (box 2), light green (box 3), dark green (box 4), orange (box 5) and

dark red (box 6). Regions in grey are not part of any box, and represent the

points where insertions can occur. The catalytic as well as Zn-binding

residues are shown in a stick model with yellow carbon atoms, blue nitrogen

atoms, red oxygen atoms and green sulphur atoms. B) Structure of USP8

(pdb-id 2gfo, 2) coloured as in A. USP8 contains a small helical insertion

between boxes 4/5. C) Structure of USP14 (pdb-id 2ayo, 3), coloured as in A.

USP14 contains a small mainly disordered insert between boxes 4/5. D)

Structure of CYLD (pdb-id 2vhf, 4), coloured as in A. CYLD displays a

truncated Fingers subdomain, and contains a B-box insertion (yellow)

between box 4 and box 5 (see main text).

Supplementary Figure 1: Box annotation for USP domain structures.

Alignment as in Fig.2 of main text, in larger format.

Supplementary Table I: catalytic domain and insertion boundaries for all human USPs

Catalytic domain boundaries, according to ProSite annotation, and position of

insertions identified in this study. All numbers are based on the human

sequences.

References

1. M. Renatus, S. G. Parrado, A. D'Arcy, U. Eidhoff, B. Gerhartz, U. Hassiepen, B. Pierrat, R. Riedl, D. Vinzenz, S. Worpenberg and M. Kroemer, Structure, 2006, 14, 1293–1302.

2. G. V. Avvakumov, J. R. Walker, S. Xue, P. J. Finerty, Jr., F. Mackenzie, E. M. Newman and S. Dhe-Paganon, J Biol Chem, 2006, 281, 38061–38070.

3. M. Hu, P. Li, L. Song, P. D. Jeffrey, T. A. Chenova, K. D. Wilkinson, R. E. Cohen and Y. Shi, Embo J, 2005, 24, 3747–3756.

4. D. Komander, C. J. Lord, H. Scheel, S. Swift, K. Hofmann, A. Ashworth and D. Barford, Mol Cell, 2008, 29, 451–464.

Supplementary Figure 1 (image not reproduced). Panels: A USP2, B USP8, C USP14, D CYLD. Labels in the panels: Box1 to Box6, active site, Zn binding and B-box insertion.

         <-a---------- box1 --------------------> <------------------------------ box2 ----------------------------------------> <--------------------------- box3 ------------------->
USP1     VGLNNL..GNTCYLNSILQVLYFCPGFKSGVKHLFNIIS 22 LASYELICSLQSLIISVEQL 12 DELATQPRRLLNTL...RELNPMY....EGYLQHDAQEVLQCILGNIQETCQLL 195 INKGEEQIGFELVEKLFQGQLV...LRTRCLECESLT.ERREDFQDISVPVQED 17
USP2     AGLRNL..GNTCFMNSILQCLSNTRELRDYCLQRLYMRD 5 NAHTALVEEFAKLIQTIWTSS...PNDVVSPSEFKTQI...QRYAPRF....VGYNQQDAQEFLRFLLDGLHNEVNRV 22 MWRKYLEREDSRIGDLFVGQLK...SSLTCTDCGYCS.TVFDPFWDLSLPIAKR 2
USP3     TGLRNL..GNTCFMNAILQSLSNIEQFCCYFKELPAVEL 17 DNNVSLVEEFRKTLCALWQG....SQTAFSPESLFYVV...WKIMPNF....RGYQQQDAHEFMRYLLDHLHLELQGG 18 SNKCCINGASTVVTAIFGGILQ...NEVNCLICGTES.RKFDPFLDLSLDIPSQ 12
USP4     CGLGNL..GNTCFMNSALQCLSNTAPLTDYFLKDEYEAE 7 GMKGEIAEAYAELIKQMWSG....RDAHVAPRMFKTQV...GRFAPQF....SGYQQQDSQELLAFLLDGLHEDLNRV 22 AWENHRLRNDSVIVDTFHGLFK...STLVCPECAKVS.VTFDPFCYLTLPLPLK 289
USP5     TGIRNL..GNSCYLNSVVQVLFSIPDFQRKYVDKLEKIF 5 DPTQDFSTQVAKLGHGLLSG 20 VQDGIAPRMFKALI...GKGHPEF....STNRQQDAQEFFLHLINMVERNCRSS ..........ENPNEVFRFLVE...EKIKCLATEKVK.YTQRVDYIMQLPVPMD 30
USP6     TGLSNL..GNTCFMNSSIQCVSNTQPLTQYFISGRHLYE 7 GMKGHMAKCYGDLVQELWSG....TQKSVAPLKLRRTI...AKYAPKF....DGFQQQDSQELLAFLLDGLHEDLNRV 22 AWDNHLRRNRSIIVDLFHGQLR...SQVKCKTCGHIS.VRFDPFNFLSLPLPMD 314
USP7     VGLKNQ..GATCYMNSLLQTLFFTNQLRKAVYMMPTEGD DSSKSVPLALQRVFYELQHS.....DKPVGTKKLTKSFG..WETLDSF.......MQHDVQELCRVLLDNVENKMKGT .......CVEGTIPKLFRGKMV...SYIQCKEVDYRS.DRREDYYDIQLSIK..
USP8     TGLRNL..GNTCYMNSILQCLCNAPHLADYFNRNCYQDD 7 GHKGEVAEEFGIIMKALWTG....QYRYISPKDFKITI...GKINDQF....AGYSQQDSQELLLFLMDGLHEDLNKA 22 AWQKHKQLNESIIVALFQGQFK...STVQCLTCHKKS.RTFEAFMYLSLPLAST
USP9X    VGLKNA..GATCYMNSVIQQLYMIPSIRNGILAIEGTGS 44 EYNIGVLRHLQVIFGHLAAS....RLQYYVPRGFWKQF...RLWGEPV....NLREQHDALEFFNSLVDSLDEALKAL .......GHPAMLSKVLGGSFA...DQKICQGCPHRY.ECEESFTTLNVDIR..
USP9Y    VGLKNA..GATCYMNSVIQQLYMIPSIRNSILAIEGTGS 44 EYNIGVLRHLQVIFGHLAAS....QLQYYVPRGFWKQF...RLWGEPV....NLREQHDALEFFNSLVDSLDEALKAL .......GHPAILSKVLGGSFA...DQKICQGCPHRF.ECEESFTTLNVDIR..
USP10    RGLINK..GNWCYINATLQALVACPPMYHLMKFIPLYSK 4 CTSTPMIDSFVRLMNEFTNM 19 PGAAFEPTYIYRLL...TVNKSSLS...EKGRQEDAEEYLGFILNGLHEEMLNL 47 SVTRQADFVQTPITGIFGGHIR...SVVYQQSSKESA..TLQPFFTLQLDIQSD
USP11    CGLTNL..GNTCFMNSALQCLSNVPQLTEYFLNNCYLEE 7 GMKGEIAEAYADLVKQAWSG....HHRSIVPHVFKNKV...GHFASQF....LGYQQHDSQELLSFLLDGLHEDLNRV 22 AWQNHKRRNDSVIVDTFHGLFK...STLVCPDCGNVS.VTFDPFCYLSVPLPIS 285
USP12    FGLVNF..GNTCYCNSVLQALYFCRPFREKVLAYKSQPR .KKESLLTCLADLFHSIATQKK..KVGVIPPKKFITRL...RKENELF....DNYMQQDAHEFLNYLLNTIADILQEE 16 NENNNSTPDPTWVHEIFQGTLT...NETRCLTCETIS.SKDEDFLDLSVDVE..
USP13    TGLKNL..GNSCYLSSVMQAIFSIPEFQRAYVGNLPRIF 26 YSKPPVKSELIEQVMKEEHKP...QQNGISPRMFKAFV...SKSHPEF....SSNRQQDAQEFFLHLVNLVERNRIGS ..........ENPSDVFRFLVE...ERIQCCQTRKVR.YTERVDYLMQLPVAME 30
USP14    CGLTNL..GNTCYMNATVQCIRSVPELKDALKRYAGALR 5 ASAQYITAALRDLFDSMDKT.....SSSIPPIILLQFL...HMAFPQF.6..GQYLQQDANECWIQMMRVLQQKLEAI 12 ASAATPSKKKSLIDQFFGVEFE...TTMKCTESEEEE.VTKGKENQLQLSCFIN 1
USP15    CGLSNL..GNTCFMNSAIQCLSNTPPLTEYFLNDKYQEE 7 GMRGEIAKSYAELIKQMWSG....KFSYVTPRAFKTQV...GRFAPQF....SGYQQQDCQELLAFLLDGLHEDLNRI 22 AWENHLKRNDSIIVDIFHGLFK...STLVCPECAKIS.VTFDPFCYLTLPLPMK 312
USP16    KGLSNL..GNTCFFNAVMQNLSQTPVLRELLKEVKMSGT 19 EPPGPLTLAMSQFLNEMQET....KKGVVTPKELFSQV...CKKAVRF....KGYQQQDSQELLRYLLDGMRAEEHQR 24 KDYEKKKSMPSFVDRIFGGELT...SMIMCDQCRTVS.LVHESFLDLSLPVLDD 236
USP17    AGLQNM..GNTCYVNASLQCLTYTPPLANYMLSREHSQT 2 RHKGCMLCTMQAHITRALHN....PGHVIQPS.........QALAAGF....HRGKQEDAHEFLMFTVDAMKKACLPG 1 KQVDHHSKDTTLIHQIFGGYWR...SQIKCLHCHGIS.DTFDPYLDIALDIQ..
USP17L1  AGLQNM..GNTCYENASLQCLTYTLPLANYMLSREHSQT 2 RPKCCMLCTMQAHITWALHS....PGHVIQPS.........QALAAGF....HRGKQEDVHEFLMFTVDAMKKACLPG 1 KQVDHHCKDTTLIHQIFGGCWR...SQIKCLHCHGIS.DTFDPYLDIALDIQ..
USP17L2  AGLQNM..GNTCYENASLQCLTYTPPLANYMLSREHSQT 2 RPKCCMLCTMQAHITWALHS....PGHVIQPS.........QALAAGF....HRGKQEDAHEFLMFTVDAMKKACLPG 1 KQVDHHSKDTTLIHQIFGGCWR...SQIKCLHCHGIS.DTFDPYLDIALDIQ..
USP18    VGLHNI..GQTCCLNSLIQVFVMNVDFTRILKRITVPRG 2 EQRRSVPFQMLLLLEKMQDS....RQKAVRPLELAYCL...QKCNVPL......FVQHDAAQLYLKLWNLIKDQITDV .......HLVERLQALYTIRVK...DSLICVDCAMES.SRNSSMLTLPLSLFDV 3
USP19    TGLVNL..GNTCFMNSVIQSLSNTRELRDFFHDRSFEAE 7 GTGGRLAIGFAVLLRALWKG....THHAFQPSKLKAIV...ASKASQF....TGYAQHDAQEFMAFLLDGLHEDLNRI 22 AWQRHKMRNDSFIVDLFQGQYK...SKLVCPVCAKVS.ITFDPFLYLPVPLPQK 376
USP20    TGMKNL..GNSCYMNAALQALSNCPPLTQFFLECGGLVR 1 DKKPALCKSYQKLVSEVWHKK...RPSYVVPTSLSHGI...KLVNPMF....RGYAQQDTQEFLRCLMDQLHEELKEP 182 SRRRKEQRYRSVISDIFDGSIL...SLVQCLTCDRVS.TTVETFQDLSLPIPGK 52
USP21    VGLRNL..GNTCFLNAVLQCLSSTRPLRDFCLRRDFRQE 4 GRAQELTEAFADVIGALWHPD...SCEAVNPTRFRAVF...QKYVPSF....SGYSQQDAQEFLKLLMERLHLEINRR 37 MWKRYLEREDSKIVDLFVGQLK...SCLKCQACGYRS.TTFEVFCDLSLPIPKK 4
USP22    RGLINL..GNTCFMNCIVQALTHTPLLRDFFLSDRHRCE 2 SPSSCLVCEMSSLFQEFYSG....HRSPHIPYKLLHLV...WTHARHL....AGYEQQDAHEFLIAALDVLHRHCKGD 3 KKANNPNHCNCIIDQIFTGGLQ...SDVTCQVCHGVS.TTIDPFWDISLDLPGS 23
USP24    VGLRNG..GATCYMNAVFQQLYMQPGLPESLLSVDDDTD NPDDSVFYQVQSLFGHLMES....KLQYYVPENFWKIF...KMWNKEL....YVREQQDAYEFFTSLIDQMDEYLKKM .......GRDQIFKNTFQGIYS...DQKICKDCPHRY.EREEAFMALNLGVT..
USP25    VGLKNV..GNTCWFSAVIQSLFNLLEFRRLVLNYKPPSN 10 HRNLPFMRELRYLFALLVGT....KRKYVDPSRAVEIL...KDAFKSN.....DSQQQDVSEFTHKLLDWLEDAFQMK 1 EEETDEEKPKNPMVELFYGRFL...AVGVLEGKKFEN...TEMFGQYPLQVN..
USP26    HGLPNL..GNTCYMNAVLQSLLSIPSFADDLLNQSFPWG 1 IPLNALTMCLARLLFFKDTYNI..EIKEMLLLNLKKAI...SAAAEIF....HGNAQNDAHEFLAHCLDQLKDNMEKL 20 ADDPDTSGFSCPVITNFELELL...HSIACKACGQVI.LKTELNNYLSINLPQR 3
USP27X   RGLINL..GNTCFMNCIVQALTHTPILRDFFLSDRHRCE 2 SPELCLVCEMSSLFRELYSG....NPSPHVPYKLLHLV...WIHARHL....AGYRQQDAHEFLIAALDVLHRHCKGD 3 KAANNPNHCNCIIDQIFTGGLQ...SDVTCQACHGVS.TTIDPCWDISLDLPGS 22
USP28    VGLKNV..GNTCWFSAVIQSLFQLPEFRRLVLSYSLPQN 10 KRNIMFMQELQYLFALMMGS....NRKFVDPSAALDLL...KGAFRSS.....EEQQQDVSEFTHKLLDWLEDAFQLA 1 NVNSPRNKSENPMVQLFYGTFL.2.EVREGKPFCNNE.....TFGQYPLQVN..
USP29    QGFPNL..GNTCYMNAVLQSLFAIPSFADDLLTQGVPWE 1 IPFEALIMTLTQLLALKDFCST..KIKRELLGNVKKVI...SAVAEIF....SGNMQNDAHEFLGQCLDQLKEDMEKL 20 VGSAATKVFVCPVVANFEFELQ...LSLICKACGHAV.LKVEPNNYLSINLHQE 3
USP30    PGLVNL..GNTCFMNSLLQGLSACPAFIRWLEEFTSQYS 6 PSHQYLSLTLLHLLKALSCQEVT.DDEVLDASCLLDVL...RMYRWQI....SSFEEQDAHELFHVITSSLEDERDRQ 27 RGSPHPTSNHWKSQHPFHGRLT...SNMVCKHCEHQSPVRFDTFDSLSLSIPAA 3
USP31    AGLRNH..GNTCFMNATLQCLSNTELFAEYLALGQYRAG 16 QGQGEVTEQLAHLVRALWTL....EYTPQHSRDFKTIV...SKNALQY....RGNSQHDAQEFLLWLLDRVHEDLNHS 18 PEGPSFPVCSTFVQELFQAQYR...SSLTCPHCQKQS.NTFDPFLCISLPIPLP 278
USP32    TGLSNL..GNTCFMNSSIQCVSNTQPLTQYFISGRHLYE 7 GMKGHMAKCYGDLVQELWSG....TQKNVAPLKLRWTI...AKYAPRF....NGFQQQDSQELLAFLLDGLHEDLNRV 22 AWDNHLRRNRSIVVDLFHGQLR...SQVKCKTCGHIS.VRFDPFNFLSLPLPMD 314
USP33    TGLKNI..GNTCYMNAALQALSNCPPLTQFFLDCGGLAR 1 DKKPAICKSYLKLMTELWHKS...RPGSVVPTTLFQGI...KTVNPTF....RGYSQQDAQEFLRCLMDLLHEELKEQ 172 KRKKQHKKYRSVISDIFDGTII...SSVQCLTCDRVS.VTLETFQDLSLPIPGK 52
USP34    VGLTNL..GATCYLASTIQQLYMIPEARQAVFTAKYSED MKHKTTLLELQKMFTYLMES....ECKAYNPRPFCKTY...TMDKQPL....NTGEQKDMTEFFTDLITKIEEM.... .....SPELKNTVKSLFGGVIT...NNVVSLDCEHVS.QTAEEFYTVRCQVA..
USP35    IGLINL..GNTCYVNSILQALFMASEFRHCVLRLTE... NNSQPLMTKLQWLFGFLEHS....QRPAISPENFLS.....ASWTPWF....SPGTQQDCSEYLKYLLDRLHEEEKTG 14 PPEEPPAPSSTSVEKMFGGKIV...TRICCLCCLNVS.SREEAFTDLSLAFPPP 153
USP36    AGLHNL..GNTCFLNATIQCLTYTPPLANYLLSKEHARS 2 QGSFCMLCVMQNHIVQAFAN....SGNAIKPVSFIRDL...KKIARHF....RFGNQEDAHEFLRYTIDAMQKACLNG 1 AKLDRQTQATTLVHQIFGGYLR...SRVKCSVCKSVS.DTYDPYLDVALEIR..
USP37    QGFSNL..GNTCYMNAILQSLFSLQSFANDLLKQGIPWK 1 IPLNALIRRFAHLLVKKDICNS..ETKKDLLKKVKNAI...SATAERF....SGYMQNDAHEFLSQCLDQLKEDMEKL 15 PDISATRAYTCPVITNLEFEVQ...HSIICKACGEII.PKREQFNDLSIDLPRR 4
USP38    TGLINL..GNTCYMNSVIQALFMATDFRRQVLSLNL... NGCNSLMKKLQHLFAFLAHT....QREAYAPRIFFE.....ASRPPWF....TPRSQQDCSEYLRFLLDRLHEEEKIL 30 ETPRTSDGEKTLIEKMFGGKLR...THIRCLNCRSTS.QKVEAFTDLSLAFCPS 90
USP39    VGLNNI..KANDYANAVLQALSNVPPLRNYFLEEDNYKN 6 DIMFLLVQRFGELMRKLWNPRN..FKAHVSPHEMLQAVV..LCSKKTF....QITKQGDGVDFLSWFLNALHSALGGT 2 KKKTIVTDVFQGSMRIFTKKLP.9.EQLLHNDEYQET.MVESTFMYLTLDLPTA 3
USP40    SGIRNQ..GGTCYLNSLLQTLHFTPEFREALFSLGPEEL 10 AKVRIIPLQLQRLFAQLLLL....DQEAASTADLTDSFG..WTSNEEM.......RQHDVQELNRILFSALETSLVGT .......SGHDLIYRLYHGTIV...NQIVCKECKNVS.ERQEDFLDLTVAVK..
USP41    VGLHNI..GQTCCLNSLIQVFVMNVDFARILKRITVPRG 2 EQRRSVPFQMLLLLEKMQDS....RQKAVWPLELAYCL...QKYNVPL......FVQHDAAQLYLKLWNLIKDQIADV .......HLVERLQALYMIRMK...DSLICLDCAMES.SRNSSMLTLRLSFFDV 3
USP42    AGLQNL..GNTCFANAALQCLTYTPPLANYMLSHEHSKT 2 AEGFCMMCTMQAHITQALSN....PGDVIKPMFVINEM...RRIARHF....RFGNQEDAHEFLQYTVDAMQKACLNG 1 NKLDRHTQATTLVCQIFGGYLR...SRVKCLNCKGVS.DTFDPYLDITLEIK..
USP43    QGLKNH..GNTCFMNAVVQCLSNTDLLAEFLALGRYRAA PGRAEVTEQLAALVRALWTRE...YTPQLSAEFFQNAV...SKYGSQF....QGNSQHDALEFLLWLLDRVHEDLEGS 21 SPSAQLPLGQSFVQSHFQAQYR...SSLTCPHCLKQS.NTFDPFLCVSLPIPLR 262
USP44    TGLRNL..GNTCYMNSVLQVLSHLLIFRQCFLKLDLNQW 66 SQYISLCHELHTLFQVMWSG....KWALVSPFAMLHSV...WRLIPAF....RGYAQQDAQEFLCELLDKIQRELETT 10 SQRKLIKQVLNVVNNIFHGQLL...SQVTCLACDNKS.NTIEPFWDLSLEFPER 10
USP45    RGITNL..GNTCFFNAVMQNLAQTYTLTDLMNEIKESST 19 SRPGPLTSALFLFLHSMKET....EKGPLSPKVLFNQL...CQKAPRF....KDFQQQDSQELLHYLLDAVRTEETKR 25 KAYGKEGVKMNFIDRIFIGELT...STVMCEECANIS.TVKDPFIDISLPIIEE 228
USP46    FGLVNF..GNTCYCNSVLQALYFCRPFRENVLAYKAQQK KKENLLTCLADLFHSIATQKK..KVGVIPPKKFISRL...RKENDLF....DNYMQQDAHEFLNYLLNTIADILQEE 16 EPAENNKPELTWVHEIFQGTLT...NETRCLNCETVS.SKDEDFLDLSVDVE..
USP47    VGLVNQ..AMTCYLNSLLQTLFMTPEFRNALYKWEFEES 2 DPVTSIPYQLQRLFVLLQTS....KKRAIETTDVTRSF...GWDSSE......AWQQHDVQELCRVMFDALEQKWK.. .....QTEQADLINELYQGKLK...DYVRCLECGYEG.WRIDTYLDIPLVIRPY 3
USP48    VGLTNL..GATCYVNTFLQVWFLNLELRQALYLCPSTCS 13 YEPQTICEHLQYLFALLQNS....NRRYIDPSGFVKAL........GL....DTGQQQDAQEFSKLFMSLLEDTLSKQ ....KNPDVRNIVQQQFCGEYA...YVTVCNQCGRES.KLLSKFYELELNIQ..
USP49    TGLRNL..GNTCYMNSILQVLSHLQKFRECFLNLDPSKT 62 SKHISLCRELHTLFRVMWSG....KWALVSPFAMLHSV...WSLIPAF....RGYDQQDAQEFLCELLHKVQQELESE 10 SQRKLTKQVLKVVNTIFHGQLL...SQVTCISCNYKS.NTIEPFWDLSLEFPER 13
USP50    TGLWNL..GNTCCVNAISQCLCSILPLVEYFLTGKYITA 7 SDCSEVATAFAYLMTDMWLG....DSDCVSPEIFWSAL...GNLYPAF....TKKMQQDAQEFLICVLNELHEALKKY 15 CCRKWITTETSIITQLFEEQLN...YSIVCLKCEKCT.YKNEVFTVFSLPIPSK
USP51    RGLINL..GNTCFMNCIVQALTHIPLLKDFFLSDKHKCI 2 SPSLCLVCEMSSLFHAMYSG....SRTPHIPYKLLHLI...WIHAEHL....AGYRQQDAHEFLIAILDVLHRHSKDD 3 QEANNPNCCNCIIDQIFTGGLQ...SDVTCQACHSVS.TTIDPCWDISLDLPGS 23
USP52    AGLEPH..IPNAYCNCMIQVLYFLEPVRCLIQNHLCQKE ...FCLACELGFLFHMLDLS....RGDPCQGNNFLRAF...RTIPEAS 14 KGNLARLIQRWNRFILTQLHQDMQEL 9 GGSSFCSSGDSVIGQLFSCEME...NCSLCRCGSETV.RASSTLLFTLSYPDGS 4
USP53    KGLLNEPGQNSCFLNSAVQVLWQLDIFRRSLRVLTGHVC QGDACIFCALKTIFAQFQHS....REKALPSDNIRHALAESFKDEQRF....QLGLMDDAAECFENMLERIHFHIVPS 4 RDADMCTSKSCITHQKFAMTLY...EQCVCRSCGASS..DPLPFTEFVRYISTT 1
USP54    KGLSNEPGQNSCFLNSALQVLWHLDIFRRSFRQLTTHKC MGDSCIFCALKGIFNQFQCS....SEKVLPSDTLRSALAKTFQDEQRF....QLGIMDDAAECFENLLMRIHFHIADE TKEDICTAQHCISHQKFAMTLF...EQCVCTSCGATS..DPLPFIQMVHYISTT 4
USPL1    VQWKNA..YALCWLDCILSALVHSEELKNTVTGLCSKEE SIFWRLLTKYNQANTLLYTS....QLSGVKDGDCKKLT...SEIFAEI....ETCLNEVRDEIFISLQPQLRCTLGDM FAFPLLLKLETHIEKLFLYSFS...WDFECSQCGHQY.QNR.............
CYLD     KGIQGH..YNSCYLDSTLFCLFAFSSVLDTVLLRPKEKN 3 YYSETQELLRTEIVNPLRIYG....YVCATKIMKLRKIL...EKVEAASG...FTSEEKDPEEFLNILFHHILRVEPLL 1 IRSAGQKVQDCYFYQIFMEKNE.7.QQLLEWSFINSNLKFAEAPSCLIIQMPRF 13

Supplementary Figure 2 (part 1)

         <-------------------------------------- box4 -------------------------------------------------------> <---------- box5--------------> <------------- box6 --------------------->
USP1     EMKTLRWAISQ...FASVERIV...GEDKYFCENCH....HYTEAERSLLFDKMPEVITIHLKCFAAS 7 GGGLSKINTPLLTPL.KLSLEEWSTK.... PTNDSYGLFAVVMHSGI.TISSGHYTASVK. 142 EYEGKWLLFDDSEVKVTEEKDF..7..TSPTSTPYLLFYKKL
USP2     PEVTLMDCMRL...FTKEDVLD...GDEKPTCCRCR....GRKRCIKKFSIQRFPKILVLHLKRFSES...RIRTSKLTTFVNFPLRDLDLREFASEN... TNHAVYNLYAVSNHSG..TTMGGHYTAYCRS PGTGEWHTFNDSSVTPMSSSQV.......RTSDAYLLFYELA
USP3     PVCSLRDCLRS...FTDLEELD...ETELYMCHKCK....KKQKSTKKFWIQKLPKVLCLHLKRFHWT...AYLRNKVDTYVEFPLRGLDMKCYLLEPENS 1 PESCLYDLAAVVVHHGS.GVGSGHYTAYA.. THEGRWFHFNDSTVTLTDEETV.......VKAKAYILFYVEH
USP4     TTVALRDCIEL...FTTMETLG...EHDPWYCPNCK....KHQQATKKFDLWSLPKILVVHLKRFSYN...RYWRDKLDTVVEFPIRGLNMSEFVCNLS.. ARPYVYDLIAVSNHYG..AMGVGHYTAYAKN KLNGKWYYFDDSNVSLASEDQI.......VTKAAYVLFYQRR
USP5     AQVPFSSCLEA...YGAPEQVD......DFWSTALQ....AKSVAVKTTRFASFPDYLVIQIKKFTFGL..DWVPKKLDVSIEMPE.ELDISQLRGTGLQP 186 DGPGKYQLFAFISHMGT.STMCGHYVCHIK. .KEGRWVIYNDQKVCASEKPP.........KDLGYIYFYQRV
USP6     EPINLDSCLRA...FTSEEELG...ESEMYYCSKCK....THCLATKKLDLWRLPPFLIIHLKRFQFV...NDQWIKSQKIVRFLR.ESFDPSAFLVPRDP 191 HIKPIYNLYAISCHSG..ILSGGHYITYAK. NPNCKWYCYNDSSCEELHPDEI.......DTDSAYILFYEQQ
USP7     GKKNIFESFVD...YVAVEQLD...GDNKYDAGEHG.....LQEAEKGVKFLTLPPVLHLQLMRFMYDPQ.TDQNIKINDRFEFPE.QLPLDEFLQKTDP. KDPANYILHAVLVHSG..DNHGGHYVVYLNP KGDGKWCKFDDDVVSRCTKEEA.13..VRHCTNAYMLVYIRE
USP8     SKCTLQDCLRL...FSKEEKLT...DNNRFYCSHCR....ARRDSLKKIEIWKLPPVLLVHLKRFSYD...GRWKQKLQTSVDFPLENLDLSQYVIGPK.. NNLKKYNLFSVSNHYG..GLDGGHYTAYCKN AARQRWFKFDDHEVSDISVSSV.......KSSAAYILFYTSL
USP9X    NHQNLLDSLEQ...YVKGDLLE...GANAYHCEKCN....KKVDTVKRLLIKKLPPVLAIQLKRFDYDWE.RECAIKFNDYFEFPR.ELDMEPYTVAGVAK 23 AGSTKYRLVGVLVHSG..QASGGHYYSYIIQ 5 GERNRWYKFDDGDVTECKMDDD.28..QKRWWNAYILFYERM
USP9Y    NHQNLLDSLEQ...YIKGDLLE...GANAYHCEKCD....KKVDTVKRLLIKKLPRVLAIQLKRFDYDWE.RECAIKFNDYFEFPR.ELDMGPYTVAGVAN 23 AGGTKYRLVGVLVHSG..QASGGHYYSYIIQ 5 DQTDHWYKFDDGDVTECKMDDD.28..QKRWWNAYIPFYEQM
USP10    KIRTVQDALES...LVARESVQ.......GYTTKTK....QEVEISRRVTLEKLPPVLVLHLKRFVYEK..TGGCQKLIKNIEYPV.DLEISKELLSPGVK 4 KCHRTYRLFAVVYHHGN.SATGGHYTTDVFQ IGLNGWLRIDDQTVKVINQYQV..2..PTAERTAYLLYYRRV
USP11    APVRLQECIEL...FTTVETLE...KENPWYCPSCK....QHQLATKKLDLWMLPEILIIHLKRFSYT...KFSREKLDTLVEFPIRDLDFSEFVIQPQNE 2 PELYKYDLIAVSNHYG..GMRDGHYTTFACN KDSGQWHYFDDNSVSPVNENQI.......ESKAAYVLFYQRQ
USP12    QNTSITHCLRG...FSNTETLC...SEYKYYCEECR....SKQEAHKRMKVKKLPMILALHLKRFKYMDQ.LHRYTKLSYRVVFPL.ELRLFNTSGDAT.. NPDRMYDLVAVVVHCGS.GPNRGHYIAIVK. .SHDFWLLFDDDIVEKIDAQAI.10..SKNSESGYILFYQSR
USP13    AKIPFSACLQA...FSEPENVD......DFWSSALQ....AKSAGVKTSRFASFPEYLVVQIKKFTFGL..DWVPKKFDVSIDMPD.LLDINHLRARGLQP 179 DGSGTYELFAFISHMGT.STMSGHYICHIK. .KEGRWVIYNDHKVCASERPP.........KDLGYMYFYRRI
USP14    EVKYLFTGLKL...RLQEEITK........QSPTLQ....RNALYIKSSKISRLPAYLTIQMVRFFYKEK.ESVNAKVLKDVKFPL.MLDMYELCTPELQE 47 NNCGYYDLQAVLTHQGR.SSSSGHYVSWVK. RKQDEWIKFDDDKVSIVTPEDI..5..GGDWHIAYVLLYGPR
USP15    PFVKLKDCIEL...FTTKEKLG...AEDPWYCPNCK....EHQQATKKLDLWSLPPVLVVHLKRFSYS...RYMRDKLDTLVDFPINDLDMSEFLINPN.. AGPCRYNLIAVSNHYG..GMGGGHYTAFAKN KDDGKWYYFDDSSVSTASEDQI.......VSKAAYVLFYQRQ
USP16    DECSIQHCLYQ...FTRNEKLR...DANKLLCEVCT.17.VYTNAKKQMLISLAPPVLTLHLKRFQQA...GFNLRKVNKHIKFPE.ILDLAPFCTLKCKN 4 NTRVLYSLYGVVEHSG..TMRSGHYTAYAK 23 ESKGQWFHISDTHVQAVPTTKV.......LNSQAYLLFYERI
USP17    AAQSVQQALEQ...LVKPEELN...GENAYHCGVCL....QRAPASKTLTLHTSAKVLILVLKRFSD.....VTGNKIAKNVQYPE.CLDMQPYMSQPN.. TGPLVYVLYAVLVHAGW.SCHNGHYFSYVK. AQEGQWYKMDDAEVTASSITSV.......LSQQAYVLFYIQK
USP17L1  AAQSVKQALEQ...LVKPEELN...GENAYHCGLCL....QRAPASNTLTLHTSAKVLILVLKRFSD.....VAGNKLAKNVQYPE.CLDMQPYMSQQN.. TGPLVYVLYAVLVHAGW.SCHDGHYFSYVK. AQEVQWYKMDDAEVTVCSIISV.......LSQQAYVLFYIQK
USP17L2  AAQSVKQALEQ...LVKPEELN...GENAYHCGLCL....QRAPASKTLTLHTSAKVLILVLKRFSD.....VTGNKLAKNVQYPE.CLDMQPYMSQQN.. TGPLVYVLYAVLVHAGW.SCHDGHYFSYVK. AQEGQWYKMDDAKVTACSITSV.......LSQQAYVLFYIQK
USP18    PLKTLEDALHC...FFQPRELS...SKSKCFCENCG....KKTRGKQVLKLTHLPQTLTIHLMRFSIR...NSQTRKICHSLYFPQ.SLDFSQILPMKRES 5 QSGGQYELFAVIAHVG..MADSGHYCVYIRN AVDGKWFCFNDSNICLVSWEDI..8..YHWQETAYLLVYMKM
USP19    GHFTLDQCLNL...FTRPEVLA...PEEAWYCPQCK....QHREASKQLLLWRLPNVLIVQLKRFSFRS..FIWRDKINDLVEFPVRNLDLSKFCIGQKE. EQLPSYDLYAVINHYG..GMIGGHYTACAR 8 RSDVGWRLFDDSTVTTVDESQV.......VTRYAYVLFYRRR
USP20    PVVTLEDCLAA...FFAADELK...GDNMYSCERCK....KLRNGVKYCKVLRLPEILCIHLKRFRHE...VMYSFKINSHVSFPLEGLDLRPFLAKECT. SQITTYDLLSVICHHG..TAGSGHYIAYCQN VINGQWYEFDDQYVTEVHETVV.......QNAEGYVLFYRKS
USP21    GKVSLRDCFNL...FTKEEELE...SENAPVCDRCR....QKTRSTKKLTVQRFPRILVLHLNRFSAS...RGSIKKSSVGVDFPL.QRLSLGDFASDK.. AGSPVYQLYALCNHSG..SVHYGHYTALCR. .CQTGWHVYNDSRVSPVSENQV.......ASSEGYVLFYQLM
USP22    GTTTLTDCLRR...FTRPEHLG...SSAKIKCSGCH....SYQESTKQLTMKKLPIVACFHLKRFEHS...AKLRRKITTYVSFPL.ELDMTPFMASSKES 13 NNDNKYSLFAVVNHQG..TLESGHYTSFIR. QHKDQWFKCDDAIITKASIKDV.......LDSEGYLLFYHKQ
USP24    SCQSLEISLDQ...FVRGEVLE...GSNAYYCEKCK....EKRITVKRTCIKSLPSVLVIHLMRFGFDWE.SGRSIKYDEQIRFPW.MLNMEPYTVSGMAR 26 ALTENYELVGVIVHSG..QAHAGHYYSFIKD 3 CGKGKWYKFNDTVIEEFDLNDE.26..RRRYWNAYMLFYQRV
USP25    GFKDLHECLEA...AMIEGEIE.........SLHSE....NSGKSGQEHWFTELPPVLTFELSRFEFNQA.LGRPEKIHNKLEFPQ.VLYLDRYMHRNREI 174 MIQVPYRLHAVLVHEG..QANAGHYWAYIFD HRESRWMKYNDIAVTKSSWEEL..6..GYRNASAYCLMYIND
USP26    HPSSIQSTFDL...FFGAEELE.......YKCAKCE.....HKTSVGVHSFSRLPRILIVHLKRYSLNE..FCALKKNDQEVIISK.YLKVSSHCNEGTRP 261 KGDHTYRLISVVSHLGK.TLKSGHYICDAYD FEKQIWFTYDDMRVLGIQEAQM....QEDRRCTGYIFFYMHN
USP27X   GITTLTDCLRR...FTRPEHLG...SSAKIKCGSCQ....SYQESTKQLTMNKLPVVACFHFKRFEHS...AKQRRKITTYISFPL.ELDMTPFMASSKES 13 NNENKYSLFAVVNHQG..TLESGHYTSFIR. HHKDQWFKCDDAVITKASIKDV.......LDSEGYLLFYHKQ
USP28    GYRNLDECLEG...AMVEGDVE........LLPSDH.....SVKYGQERWFTKLPPVLTFELSRFEFNQS.LGQPEKIHNKLEFPQ.IIYMDRYMYRSKEL 174 LRQVPYRLHAVLVHEG..QANAGHYWAYIYN QPRQSWLKYNDISVTESSWEEV..6..GLRNVSAYCLMYIND
USP29    LPLSIQNSLDL...FFKEEELE.......YNCQMCK.....QKSCVARHTFSRLSRVLIIHLKRYSFNN..AWLLVKNNEQVYIPK.SLSLSSYCNESTKP 270 DPLQAYRLISVVSHIGS.SPNSGHYISDVYD FQKQAWFTYNDLCVSEISETKM....QEARLHSGYIFFYMHN
USP30    HPLTLDHCLHH...FISSESVR......DVVCDNCT.15.QRTTFVKQLKLGKLPQCLCIHLQRLSWSS..HGTPLKRHEHVQFNE.FLMMDIYKYHLLGH 69 SSTYLFRLMAVVVHHG..DMHSGHFVTYRR 9 STSNQWLWVSDDTVRKASLQEV.......LSSSAYLLFYERV
USP31    QTCTLSQCFQL...YTKEERLA...PDDAWRCPHCK....QLQQGSITLSLWTLPDVLIIHLKRFRQE...GDRRMKLQNMVKFPLTGLDMTPHVVKRSQS 20 PEDYIYDLYAVCNHHG..TMQGGHYTAYCKN SVDGLWYCFDDSDVQQLSEDEV.......CTQTAYILFYQRR
USP32    EPINLDSCLRA...FTSEEELG...ENEMYYCSKCK....THCLATKKLDLWRLPPILIIHLKRFQFV...NGRWIKSQKIVKFPR.ESFDPSAFLVPRDP 187 RIKPIYNLYAISCHSG..ILGGGHYVTYAK. NPNCKWYCYNDSSCKELHPDEI.......DTDSAYILFYEQQ
USP33    PVVTLQDCLAA...FFARDELK...GDNMYSCEKCK....KLRNGVKFCKVQNFPEILCIHLKRFRHE...LMFSTKISTHVSFPLEGLDLQPFLAKDSP. AQIVTYDLLSVICHHG..TASSGHYIAYCRN NLNNLWYEFDDQSVTEVSESTV.......QNAEAYVLFYRKS
USP34    DMKNIYESLDE...VTIKDTLE...GDNMYTCSHCG....KKVRAEKRACFKKLPRILSFNTMRYTFNMV.TMMKEKVNTHFSFPL.RLDMTPYTEDFLMG 17 SESYEYDLIGVTVHTG..TADGGHYYSFIR 7 YKNNKWYLFNDAEVKPFDSAQL.25..FEKTHSAYMLFYKRM
USP35    GSRSVLDLVNY...FLSPEKLT...AENRYYCESCA....SLQDAEKVVELSQGPCYLILTLLRFSFDLR.TMRRRKILDDVSIPL.LLRLPLAG...... GRGQAYDLCSVVVHSGV.SSESGHYYCYAR 16 EPENQWYLFNDTRVSFSSFESV..5..FFPKDTAYVLFYRQR
USP36    QAANIVRALEL...FVKADVLS...GENAYMCAKCK....KKVPASKRFTIHRTSNVLTLSLKRFAN.....FSGGKITKDVGYPE.FLNIRPYMSQNN.. GDPVMYGLYAVLVHSGY.SCHAGHYYCYVK. ASNGQWYQMNDSLVHSSNVKVV.......LNQQAYVLFYLRI
USP37    PPRSIQDSLDL...FFRAEELE.......YSCEKCG.....GKCALVRHKFNRLPRVLILHLKRYSFNVA.LSLNNKIGQQVIIPR.YLTLSSHCTENTKP 283 NLPHSYRLISVVSHIGS.TSSSGHYISDVYD IKKQAWFTYNDLEVSKIQEAAV....QSDRDRSGYIFFYMHK
USP38    TTPSVTDLLNY...FLAPEILT...GDNQYYCENCA....SLQNAEKTMQITEEPEYLILTLLRFSYDQK.YHVRRKILDNVSLPL.VLELPVKRITSFSS 32 TKLVPYLLSSVVVHSGI.SSESGHYYSYAR 44 EMSKEWFLFNDSRVTFTSFQSV..5..RFPKDTAYVLLYKKQ
USP39    KDEKEQLIIPQ...VPLFNILA...KFNGITEKEYKT...YKENFLKRFQLTKLPPYLIFCIKRFTKN...NFFVEKNPTIVNFPITNVDLREYLSEEVQA 1 HKNTTYDLIANIVHDG..KPSEGSYRIHVLH HGTGKWYELQDLQVTDILPQMI.......TLSEAYIQIWKRR
USP40    NVSGLEDALWNM..YVEEEVFD...CDNLYHCGTCD....RLVKAAKSAKLRKLPPFLTVSLLRFNFDFV.KCERYKETSCYTFPL.RINLKPFCEQSELD DLEYIYDLFSVIIHKG..GCYGGHYHVYIK 131 ISCPHWFDINDSKVQPIREKDI..3..FQGKESAYMLFYRKS
USP41    PLKTLEDALHC...FFQPRELS...SKSKCFCENCG....KKTRGKQVLKLTHLPQTLTIHLMRFSIR...NSQTRKICHSLYFPQ.SLDFSQILPMKRES 5 QSGGQYELFAVIAHVG..MADSGHYCVYIRN AVDGKWFCFNDSNICLVSWEDI....QCTYGNPNYHW.....
USP42    AAQSVNKALEQ...FVKPEQLD...GENSYKCSKCK....KMVPASKRFTIHRSSNVLTLSLKRFAN.....FTGGKIAKDVKYPE.YLDIRPYMSQPN.. GEPIVYVLYAVLVHTGF.NCHAGHYFCYIK. ASNGLWYQMNDSIVSTSDIRSV.......LSQQAYVLFYIRS
USP43    HSCTLDECFQF...YTKEEQLA...QDDAWKCPHCQ....VLQQGMVKLSLWTLPDILIIHLKRFCQV...GERRNKLSTLVKFPLSGLNMAPHVAQRSTS 21 PLDFLYDLYAVCNHHG..NLQGGHYTAYCRN SLDGQWYSYDDSTVEPLREDEV.......NTRGAYILFYQKR
USP44    QPCLVTEMLAK...FTETEALE....GKIYVCDQCN.11.VLTEAQKQLMICHLPQVLRLHLKRFRWSG..RNNREKIGVHVGFEE.ILNMEPYCCRETLK 3 PECFIYDLSAVVMHHGK.GFGSGHYTAYCYN SEGGFWVHCNDSKLSMCTMDEV.......CKAQAYILFYTQR
USP45    KECSIQSCLYQ...FTSMELLM...GNNKLLCENCT.19.VYTNARKQLLISAVPAVLILHLKRFHQA...GLSLRKVNRHVDFPL.MLDLAPFCSATCKN 3 GDKVLYGLYGIVEHSG..SMREGHYTAYVK 26 ESAGQWVHVSDTYLQVVPESRA.......LSAQAYLLFYERV
USP46    QNTSITHCLRD...FSNTETLC...SEQKYYCETCC....SKQEAQKRMRVKKLPMILALHLKRFKYMEQ.LHRYTKLSYRVVFPL.ELRLFNTSSDAV.. NLDRMYDLVAVVVHCGS.GPNRGHYITIVK. .SHGFWLLFDDDIVEKIDAQAI.10..SKNSESGYILFYQSR
USP47    AFASVEEALHA...FIQPEILD...GPNQYFCERCK....KKCDARKGLRFLHFPYLLTLQLKRFDFDYT.TMHRIKLNDRMTFPE.ELDMSTFIDVEDEK 54 KNSLIYELFSVMAHSG..SAAGGHYYACIKS FSDEQWYSFDDQHVSRITQEDI.17..FASSTNAYMLIYRLK
USP48    GHKQLTDCISE...FLKEEKLE...GDNRYFCENCQ....SKQNATRKIRLLSLPCTLNLQLMRFVFDRQ.TGHKKKLNTYIGFSE.ILDMEPYVEHK... GGSYVYELSAVLIHRGV.SAYSGHYIAHVKD PQSGEWYKFNDEDIEKMEGKKL.23..THCSRNAYMLVYRLQ
USP49    TECLLTEMLAK...FTETEALE....GRIYACDQCN.11.VLSEARKQLMIYRLPQVLRLHLKRFRWSG..RNHREKIGVHVVFDQ.VLTMEPYCCRDMLS 3 KETFAYDLSAVVMHHGK.GFGSGHYTAYCYN TEGGFWVHCNDSKLNVCSVEEV.......CKTQAYILFYTQR
USP50    YECSLRDCLQC...FFQQDALT...WNNEIHCSFCE....TKQETAVRASISKAPKIIIFHLKRFDIQ...GTTKRKLRTDIHYPLTNLDLTPYICSIF.. RKYPKYNLCAVVNHFG..DLDGGHYTAFCKN SVTQA*YSFDNTRVSEIPDTSV........QNAMDLLFYSCQ
USP51    GIPSLTDCLQW...FTRPEHLG...SSAKIKCNSCQ....SYQESTKQLTMKKLPIVACFHLKRFEHV...GKQRRKINTFISFPL.ELDMTPFLASTKES 12 PNENKYSLFAVINHHG..TLESGHYTSFIR. QQKDQWFSCDDAIITKATIEDL.......LYSEGYLLFYHKQ
USP52    KTGKNYDFAQV...LKRSICLD...QNTQAWCDTCEK....YQPTIQTRNIRHLPDILVINCEVNSSK...EADFWRMQAEVAFKM.AVKKHGGEISKNKE 64 HGVYVYDLMATVVHIL..DSRTGGSLVAHIK 11 VTHQQWYLFNDFLIEPIDKHEA..2.FDMNWKVPAILYYVKR
USP53    LCNEVERMLERHE.RFKPEMFA.5.ANTTDDYRKCPSN..CGQKIKIRRVLMNCPEIVTIGLVWDSEH...SDLTEAVVRNLATHL.YLPGLFYRVTDEN. AKNSELNLVGMICYT......SQHYCAFAFH TKSSKWVFFDDANVKEIGTRWK..6..IRCHFQPLLLFYANP
USP54    QAICMLERREKPSPSMFGELLQ.1.ASTMGDLRNCPSN..CGERIRIRRVLMNAPQIITIGLVWDSDH...SDLAEDVIHSLGTCL.KLGDLFFRVTDDR. AKQSELYLVGMICYY......GKHYSTFFFQ TKIRKWMYFDDAHVKEIGPKWK..6..IKGHYQPLLLLYADP
USPL1    HMKSLVTFTNV...IPEWHPLN...AAHFGPCNNCN.....SKSQIRKMVLEKVSPIFMLHFVEGLP....................QNDLQHYAFH.... FEGCLYQITSVIQYRA.....NNHFITWIL. DADGSWLECDDLKGPCSERHKK....FEVPASEIHIVIWERK
CYLD     LELNITDLLED...TPRQCRIC.1.GLAMYECRECYDD.PDISAGKIKQFCKTCNTQVHLHPKRLNHK..........YNPVSLPK.DLPDWDWRHGC... 66 IPCQNMELFAVLCIE......TSHYVAFVKY 1 KDDSAWLFFDSMADRDGGQNGF.33.ARRLLCDAYMCMYQSP

Supplementary Figure 2 (part 2)

Supplementary Table I. Catalytic domain boundaries, according to ProSite annotation, and position of insertions identified in this study. All numbers are based on the human sequences. USP names in bold letters contain insertions, and USP names not underlined are catalytically inactive.

                                                Catalytic core         Insert 1               Insert 2
No. Enzyme       Total length  Insertion point  Start  End    Length   Start  End    Length   Start  End    Length
1   USP1         785           2 - 3 and 5 - 6  81     785    705      219    413    195      600    741    142
2   USP2         605                            267    600    334
3   USP3         520                            159    512    354
4   USP4         963           3 - 4            302    914    613      485    773    289
5   USP5/IsoT    858           3 - 4            326    857    532      610    795    186
6   USP6         1406          3 - 4 and 4 - 5  532    1370   839      715    1028   314      1116   1306   191
7   USP7/HAUSP   1102                           214    522    309
8   USP8         1118                           777    1110   334
9   USP9X        2547          1 - 2            1550   1950   401      1586   1630   45
10  USP9Y        2555          1 - 2            1559   1959   401      1596   1639   44
11  USP10        798                            415    796    382
12  USP11        920           3 - 4            266    888    623      449    733    285
13  USP12        370                            39     370    332
14  USP13        863           3 - 4            336    862    527      622    800    179
15  USP14        494                            105    484    380
16  USP15        981           3 - 4            289    934    646      472    783    312
17  USP16        823           3 - 4            196    823    628      393    628    236
18  USP17L2      530                            80     376    297
19  USP18        372                            55     371    317
20  USP19        1318          3 - 4            497    1215   719      680    1055   376
21  USP20        914           2 - 3            145    686    542      251    432    182
22  USP21        565                            212    559    348
23  USP22        525                            176    521    346
24  USP24        2620                           1689   2043   355
25  USP25        1087          4 - 5            169    658    490      412    585    174
26  USP26        913           4 - 5            295    887    593      558    818    261
27  USP27        559                            78     422    345
28  USP28        1077          4 - 5            162    651    490      405    578    174
29  USP29        922           4 - 5            285    886    602      548    817    270
30  USP30        517           4 - 5            68     503    436      363    431    69
31  USP31        1352          3 - 4            128    766    639      316    593    278
32  USP32        1604          3 - 4 and 4 - 5  734    1568   835      917    1230   314      1318   1504   187
33  USP33        942           2 - 3            185    716    532      291    462    172
34  USP34        3546                           1894   2240   347
35  USP35        1017          3 - 4            440    926    487      603    755    153
36  USP36        1121                           122    424    303
37  USP37        979           4 - 5            341    952    612      601    883    283
38  USP38        1042          3 - 4            445    950    506      624    713    90
39  USP39/SNUT2  565                            225    556    332
40  USP40        1235          5 - 6            41     483    443      312    442    131
41  USP41        358                            55     358    304
42  USP42        1325                           111    413    303
43  USP43        1124          3 - 4            101    712    612      277    538    262
44  USP44        712           1 - 2            273    679    407      310    375    66
45  USP45        814           3 - 4            190    814    625      388    615    228
46  USP46        366                            35     366    332
47  USP47        1375                           188    565    378
48  USP48        1035                           89     422    334
49  USP49        688           1 - 2            253    658    406      290    351    62
50  USP50        339                            44     339    296
51  USP51        711                            363    707    345
52  USP52/PAN    1202          4 - 5            517    925    409      781    844    64
53  USP53        1073                           39     352    314
54  USP54        1684                           33     353    321
55  CYLD         956           4 - 5            592    951    360      784    857    74
56  USPL1        1092                           227    501    275
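
The Start, End and Length columns of Supplementary Table I appear to follow the inclusive convention Length = End - Start + 1 (for example, USP4: 914 - 302 + 1 = 613). Subtracting the insert lengths from the catalytic-domain length then gives an estimate of the size of the common core. The following is a minimal Python sketch of that arithmetic for three rows copied from the table. The names `Span`, `core_without_inserts` and `ROWS` are illustrative assumptions and are not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Inclusive residue range, numbered as in Supplementary Table I."""
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive numbering: residues start..end contain end - start + 1 positions.
        return self.end - self.start + 1


def core_without_inserts(core: Span, inserts: list[Span]) -> int:
    """Length of the catalytic domain after removing the listed insertions.

    Hypothetical helper: it assumes every insert lies inside the catalytic
    domain and that inserts do not overlap each other.
    """
    for ins in inserts:
        if not (core.start <= ins.start and ins.end <= core.end):
            raise ValueError(f"insert {ins} lies outside catalytic domain {core}")
    return core.length - sum(ins.length for ins in inserts)


# Three rows copied from Supplementary Table I (human sequences).
ROWS = {
    "USP4": (Span(302, 914), [Span(485, 773)]),
    "USP5": (Span(326, 857), [Span(610, 795)]),
    "USP6": (Span(532, 1370), [Span(715, 1028), Span(1116, 1306)]),
}

if __name__ == "__main__":
    for name, (core, inserts) in ROWS.items():
        print(name, core.length, core_without_inserts(core, inserts))
```

For these three rows the remaining lengths are 324, 346 and 334 residues, in line with the catalytic core of roughly 350 amino acids described in the main text.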