biochemistry - macromolecular...
TRANSCRIPT
Crystal Structure of YkfC
Crystal structure of the -D-glutamyl-L-diamino acid endopeptidase
YkfC from Bacillus cereus in complex with L-Ala--D-Glu: insights
into substrate recognition by NlpC/P60 cysteine peptidases
Qingping Xu1,2, Polat Abdubek1,3, Tamara Astakhova1,4, Herbert L. Axelrod1,2, Constantina
Bakolitsa1,5, Xiaohui Cai1,6, Dennis Carlton1,7, Connie Chen1,3, Hsiu-Ju Chiu1,2, Michelle Chiu1,3,
Thomas Clayton1,7, Debanu Das1,2, Marc C. Deller1,7, Lian Duan1,4, Kyle Ellrott1,6, Carol L. Farr1,7,
Julie Feuerhelm1,3, Joanna C. Grant1,3, Anna Grzechnik1,7, Gye Won Han1,7, Lukasz
Jaroszewski1,4,5, Kevin K. Jin1,2, Heath E. Klock1,3, Mark W. Knuth1,3, Piotr Kozbial1,5, S. Sri
Krishna1,4,5, Abhinav Kumar1,2, David Marciano1,7, Mitchell D. Miller1,2, Andrew T. Morse1,4,
Edward Nigoghossian1,3, Amanda Nopakun1,7, Linda Okach1,3, Christina Puckett1,3, Ron Reyes1,2,
Henry J. Tien1,7, Christine B. Trame1,2, Henry van den Bedem1,2, Dana Weekes1,5, Tiffany
Wooten1,3, Andrew Yeh1,2, Keith O. Hodgson1,8, John Wooley1,4, Marc-André Elsliger1,7, Ashley
M. Deacon1,2, Adam Godzik1,4,5, Scott A. Lesley1,3,7, and Ian A. Wilson1,7,*
1Joint Center for Structural Genomics, 2Stanford Synchrotron Radiation Lightsource, SLAC
National Accelerator Laboratory, Menlo Park, California 94025, 3Protein Sciences Department,
Genomics Institute of the Novartis Research Foundation, San Diego, California 92121, 4Center
for Research in Biological Systems, University of California, San Diego, La Jolla, California
92093, 5Program on Bioinformatics and Systems Biology, Burnham Institute for Medical
Research, La Jolla, California 92037, 6University of California, San Diego, La Jolla, California
92093, 7Department of Molecular Biology, The Scripps Research Institute, La Jolla, California
92037, 8Photon Science, SLAC National Accelerator Laboratory, Menlo Park, California 94025
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
15
16
17
18
19
20
21
22
2
3
Crystal Structure of YkfC
* To whom correspondence should be addressed. The Scripps Research Institute, BCC206,
10550 North Torrey Pines Road, La Jolla, CA 92037. E-mail: [email protected]
Structure of endopeptidase YkfC
1
1
2
3
4
5
2
3
Page 3
Abstract
Dipeptidyl-peptidase VI of Bacillus sphaericus and YkfC of Bacillus subtilis were
characterized previously as highly specific -D-glutamyl-L-diamino acid endopeptidases. We
determined the crystal structure of the YkfC orthologue from Bacillus cereus in complex with L-
Alanine--D-glutamate at 1.8 Å resolution. YkfC contains two N-terminal bacterial SH3
domains (SH3b) and a C-terminal catalytic domain, which is a member of a very large family of
cell-wall related cysteine peptidases, known as NlpC/P60 domains. A bound reaction product (L-
Ala--D-Glu) allows the identification of conserved sequence and structural signatures, which
recognize L-Ala and -D-Glu. The structure provides a clear framework for understanding
previous experimental results, including the substrate specificity observed in the dipeptidyl-
peptidase VI and YkfC. The first SH3b domain plays an important role in defining substrate
specificity such that only murein peptides with a free N-terminal alanine are allowed. A
conserved acidic residue present on the NlpC/P60 domain interacts with the amine group of the
alanine. This structural feature of the catalytic domain allows us to define a subfamily of
NlpC/P60 enzymes with the same substrate requirement, including a previously characterized
cyanobacterial L-Alanine--D-glutamate endopeptidase that is essentially a substructure of
YkfC.
Introduction
Cell-wall turnover, which refers to the loss of peptidoglycan components, were reported in
many bacteria, such as Escherichia coli and Bacillus subtilis (1). The products of the turnover
are generally reutilized through a process known as peptidoglycan (PG) recycling (2). The
molecular processes involved in cell-wall turnover and recycling are not currently well
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 4
understood in comparison to cell-wall synthesis, particularly in bacteria other than E. coli (2-6).
In E. coli, the cell-wall is degraded by lytic transglycosylases to release anhydromuropeptides
(GlcNAc-anhMurNAc-L-Ala--D-Glu-DAP-D-Ala, DAP=meso-diaminopimelic acid). The
anhydromuropeptides are imported into the cytoplasm primarily by AmpG permease, and
subsequently processed by N-acetyl-anhydromuramyl-L-Alanine amidase (cleaves the bond
between GlcNAc-anhMurNAc and L-Ala) and LD-carboxypeptidase LdcA (cleaves the bond
between DAP and D-Ala). There are two possible fates for the generated murein tripeptide L-
Ala--D-Glu-DAP. Under normal growth conditions, these tripeptides are recycled by the Mpl
ligase and returned to the peptidoglycan biosynthetic pathway. An additional pathway (Figure 1)
is involved in the murein tripeptide metabolism in E. coli, probably during nutrient-limiting
conditions (6). The MpaA endopeptidase, a metallocarboxypeptidase, specifically cleaves the -
D-Glu-DAP bond to produce L-Ala--D-Glu and DAP. The L-Ala--D-Glu is then converted to
L-Ala-L-Glu and, subsequently, to L-Ala and L-Glu by the YcjG epimerase and the PepD
peptidase respectively.
Some of the enzymes in cell-wall cycling of E. coli, such as AmpG, AmpD, Mpl and MpaA,
have no homologs in B. subtilis (2), suggesting the mechanism for cell-wall recycling may differ
in details between the two bacteria. It was proposed that the PG is cleaved by a muramidase and
an amidase to produce GlcNAc-MurNAc and stem peptides (2). The free peptides are imported
into the cytoplasm by an unidentified permease and subsequently processed by YkfABC
enzymes. YkfA, an LD-carboxypeptidase, removes the terminal D-Ala. The generated tripeptide
is further metabolized by a pathway which is functionally equivalent to that of E. coli with YkfC
as the -D-Glu-DAP endopeptidase and YkfB as the L-Ala-D-Glu epimerase (Figure 1) (7).
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
2
3
Page 5
Interestingly, while YkfB is homologous to YcjG of E. coli, YkfC is not homologous to MpaA
despite an equivalent function.
YkfC contains a C-terminal NlpC/P60 cysteine peptidase domain. NlpC/P60 is a large family
of cell-wall related cysteine peptidases that are broadly distributed in bacteria, virus, archaea and
eukaryotes (8). Currently characterized NlpC/P60 enzymes are almost all -D-Glu-DAP(Lys)
endopeptidases. While their biochemical function seems to be conserved, the physiological roles
of NlpC/P60 proteins are diverse, including cell separation, expansion, differentiation, cell-wall
turnover, cell lysis, protein secretion and virus infection (9). Secreted NlpC/P60 proteins also
have other roles in pathogenesis. The autolysin P60 of Listeria monocytogenes is involved in
host cell invasion (10). Enterotoxin FM of Bacillus cereus is involved in food poisoning (11).
SagA of Enterococcus faecium is a secreted antigen which binds to extracellular matrix proteins
(12).
NlpC/P60 proteins could be lethal to the bacteria due to their ability to compromise cell-wall
integrity or cell-wall biosynthesis. Therefore, their activities are tightly controlled through
multiple mechanisms (9): the expression of these proteins is regulated at the transcription level;
they are localized dependent on their physiological roles; their atomic level structures are
modified to precisely define substrate specificity (13). NlpC/P60 proteins are often fused to
auxiliary domains, many of which are known cell-wall binding modules (e.g. LysM and choline
binding domain). Thus, it is generally assumed that these auxiliary domains function as targeting
domains, which localize their proteins to the cell-wall. The functional synergy between the
NlpC/P60 domains and their auxiliary domains are currently not fully understood. We previously
determined the crystal structure of a -D-Glu-DAP endopeptidase from cyanobacteria
(AvPCP/NpPCP, Anabaena variabilis/Nostoc punctiforme PG cysteine peptidase) (13), which
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 6
consists of an N-terminal bacterial SH3 domain (SH3b) and a C-terminal NlpC/P60 domain. We
proposed that the SH3b domain of this enzyme is important in defining the substrate specificity
of the peptidase domain. However, the mechanism of substrate recognition by the NlpC/P60 and
SH3b was not firmly established. Here, we report the crystal structure of YkfC from B. cereus
(BcYkfC) in complex with L-Ala--D-Glu. BcYkfC is a close homolog to the YkfC protein
from B. subtilis that was previously biochemically characterized (7). Thus, we have a first
detailed view of substrate recognition by an NlpC/P60 protein.
Experimental Details
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
2
3
Page 7
Sequence analysis. Homologs of BcYkfC is obtained using PSI-BLAST (14) (3 iterations)
against the non-redundant (nr) protein sequence database at National Center for Biotechnology
Information (NCBI), using the sequence of catalytic domain of BcYkfC as the probe (residues
210-333). Alignment length (>=70) and E-value (<=0.02) are used as a criterion to extract a
subset of hits. These proteins (2599 sequences) were aligned using HMMALIGN (15) against the
ls profile model of NlpC/P60 domains from the PFAM database (v23, PF00877) (16). The YkfC
subfamily candidates were extracted from the above aligned subset based on the presence of an
aspartate at position Asp256 of BcYkfC. The full-length sequences were then aligned and
clustered using PIPEALIGN (17). Only sequences with a conserved tyrosine at position Tyr118
of BcYkfC were classified into YkfC subfamily. The phylogenetic tree was constructed using
the MAFFT (18).
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
2
3
Page 8
Protein expression and purification. Clones were generated using the Polymerase Incomplete
Primer Extension (PIPE) cloning method (19). The gene encoding BcYkfC (GenBank:
NP_979181; Swiss-Prot: Q736M3) was amplified by polymerase chain reaction (PCR) from
Bacillus cereus NRS248 ATCC 10987 genomic DNA using PfuTurbo DNA polymerase
(Stratagene) and I-PIPE (Insert) primers (forward primer: 5’-
ctgtacttccagggcGAAGAGAAGAAAGATAGTAAGGCGT-3’, reverse primer: 5’-
aattaagtcgcgttaAGGTAAGTAACGACGCGCACCAGCG-3’, target sequence in upper case) that
included sequences for the predicted 5' and 3' ends. The expression vector, pSpeedET, which
encodes an amino-terminal tobacco etch virus (TEV) protease-cleavable expression and
purification tag (MGSDKIHHHHHHENLYFQ/G), was PCR amplified with V-PIPE (Vector)
primers (forward primer: 5’-taacgcgacttaattaactcgtttaaacggtctccagc-3’, reverse primer: 5’-
gccctggaagtacaggttttcgtgatgatgatgatgatg-3’). V-PIPE and I-PIPE PCR products were mixed to
anneal the amplified DNA fragments together. Escherichia coli GeneHogs (Invitrogen)
competent cells were transformed with the I-PIPE / V-PIPE mixture and dispensed on selective
LB-agar plates. The cloning junctions were confirmed by DNA sequencing. Using the PIPE
method, the gene segment encoding residues A1-S23 were deleted. Expression was performed in
a selenomethionine-containing medium with suppression of normal methionine synthesis. At the
end of fermentation, lysozyme was added to the culture to a final concentration of 250 μg/ml,
and the cells were harvested and frozen. After one freeze/thaw cycle the cells were sonicated in
lysis buffer [50mM HEPES pH 8.0, 50mM NaCl, 10mM imidazole, 1mM Tris(2-
carboxyethyl)phosphine-HCl (TCEP)] and the lysate was clarified by centrifugation at 32,500g
for 30min. The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-
equilibrated with lysis buffer, the resin washed with wash buffer [50mM HEPES pH 8.0, 300mM
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 9
NaCl, 40mM imidazole, 10% (v/v) glycerol, 1mM TCEP], and the protein eluted with elution
buffer [20mM HEPES pH 8.0, 300mM imidazole, 10% (v/v) glycerol, 1mM TCEP]. The eluate
was buffer exchanged with TEV buffer [20mM HEPES pH 8.0, 200mM NaCl, 40mM imidazole,
1mM TCEP] using a PD-10 column (GE Healthcare), and incubated with 1mg of TEV protease
per 15 mg of eluted protein. The protease-treated eluate was run over nickel-chelating resin (GE
Healthcare) pre-equilibrated with HEPES crystallization buffer [20mM HEPES pH 8.0, 200mM
NaCl, 40mM imidazole, 1mM TCEP] and the resin was washed with the same buffer. The flow-
through and wash fractions were combined and concentrated for crystallization trials to 18.8
mg/ml by centrifugal ultrafiltration (Millipore).
Oligomeric state. The oligomeric state of BcYkfC was determined using a 0.8 x 30cm2 Shodex
Protein KW-803 column (Thomson Instruments) pre-calibrated with gel filtration standards (Bio-
Rad). BcYkfC is likely a monomer in solution, supported by crystal packing analysis and
analytical size exclusion chromatography.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
2
3
Page 10
Crystallization. BcYkfC was crystallized by mixing 200nl protein with 200nl crystallization
solution and with a 50μl reservoir volume using the nanodroplet vapor diffusion method (20)
with standard JCSG crystallization protocols (21). The crystallization reagent that produced the
BcYfkC crystal used for structure solution contained 0.2M sodium chloride, 50% PEG-200, and
0.1M phosphate citrate pH 4.2. A needle-shaped crystal of approximate size 100 μm x 15 μm x
15 μm was harvested after 29 days at 277K. No additional cryoprotectant was added to the
crystal. Initial screening for diffraction was carried out using the Stanford Automated Mounting
system (SAM) (22) at the Stanford Synchrotron Radiation Lightsource (SSRL, Menlo Park, CA).
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
2
3
Page 11
Data collection, structure solution, and refinement. Multi-wavelength anomalous diffraction
(MAD) data were collected at SSRL beamline 11-1. Data were collected at wavelengths
corresponding to the peak, high energy remote, and inflection of a selenium MAD experiment at
100 K using Mar CCD 325 detector (Rayonix). Data processing and structure solution from
initial diffraction images were carried out using an automatic structure solution script
(autoXDSp) developed at the JCSG. This script shepherds the structural determination process
described below using a few preset rules mimicking decision-making of a crystallographer. The
calculations were parallelized on a PC cluster such that an initial map/model can be usually
obtained within an hour after the completion of data collection. The MAD data were integrated
and reduced using XDS and then scaled with the program XSCALE (23). Selenium sites were
located with SHELXD (24). Phase refinement and automatic model building were performed
using autoSHARP (25) and wARP (26). The above automatic process produced an initial model
that is 92% complete. Further model completion and refinement were performed manually with
COOT (27) and REFMAC (28) of the CCP4 suite (29). Data and refinement statistics are
summarized in Table 1. Analysis of the stereochemical quality of the model was accomplished
using MolProbity (30). All molecular graphics were prepared with PyMOL (DeLano Scientific)
unless specifically stated otherwise.
Molecular modeling. Molecular docking was performed using the same protocol as described
previously (13) using Glide 5.0 (Schrödinger LLC, Portland, OR). The positions of the bound
ligand were used as restraints such that the L-Ala--D-Glu portion of the docked substrate
adopts a similar conformation as seen in the crystal structure. The catalytic cysteine is in reduced
form for the docking studies. Only main conformers were used if multiple conformations were
observed for the active site residues.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 12
Accession code. Atomic coordinates and experimental structure factors for BcYkfC at 1.8 Å have
been deposited in the PDB under accession code 3h41.
Results and Discussion
Genomic context. The full-length BcYkfC (strain B. cereus ATCC 10987) contains 333 residues
(Molecular weight 37.3 kD). The N-terminal 23 residues of BcYkfC were predicted to be a
signal peptide (not a membrane anchoring helix) by the Phobius program (31). B. subtilis YkfC,
despite being highly homologous to BcYkfC (40% sequence identity), does not contain a signal
peptide (length 296 aa), suggesting that the two proteins function at different cellular locations.
As in B. subtilis, the ykfC gene (BCE_2878) is adjacent to the ykfB L-Ala-D-Glu epimerase
gene (BCE_2879) in the B. cereus genome. The conserved genome association of ykfB and ykfC
is also observed in other bacteria (e.g. Listeria innocua, Acidobacterium sp., Solibacter usitatus,
Bacteroides thetaiotaomicron, Gramella forsetii). However, the genome context of the ykfBC is
different in B. subtilis compared with B. cereus (Figure 2). The YkfA-D genes of B. subtilis are
located next to the dppB-E dipeptide ABC transporter operon, whereas the YkfBC genes of B.
cereus are located downstream of divIC, which encodes a putative cell division protein. The
downstream of ykfC is an oppA gene that is homologous to dppE (34% sequence identity). Both
genes are predicted to encode periplasmic dipeptide binding proteins.
Interestingly, YkfB epimerases of both B. subtilis and B. cereus (seq id 50%) do not have
signal peptides. The differences in signal peptides in YkfC B. subtilis and B. cereus and their
genomic contexts suggest that the strategy for metabolizing murein peptides likely differs in the
two bacteria. In B. cereus, the murein peptides are likely broken down outside the cell with
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
2
3
Page 13
resulting dipeptide L-Ala--D-Glu imported into cytoplasm for further processing by YkfB,
while both reactions likely occur in the cytoplasm in B. subtilis.
Structure determination and quality of the model. The crystal structure of BcYkfC was
determined using the high-throughput structural genomics pipeline implemented at the Joint
Center for Structural Genomics (JCSG) (21). The selenomethionine derivative of BcYkfC was
expressed in E. coli with an N-terminal TEV cleavable His-tag and purified by metal affinity
chromatography. The predicted N-terminal signal peptide (residues 1-23) was not included in the
cloned construct. The structure was solved in space group C2 at 1.79 Å resolution with one
molecule per asymmetric unit using the MAD method (Rfactor=16.3/Rfree=19.7). The main chain
density is excellent throughout the whole molecule. The mean residual error of the coordinates
was estimated at 0.11 Å by a diffraction-component precision index method (DPI) (32). The
model of BcYkfC displays good geometry with an all-atom clash score of 5.07, and the
Ramachandran plot produced by MolProbity (30) shows that all residues but one are in allowed
regions, with 97.7 % in favored regions. The side chains of the model are also well refined, only
1 residue flagged by MolProbity as a rotamer outlier. The Ramachandran outlier (Pro305) and
rotamer outlier (His303) are supported by well defined electron density. Since these residues are
either close to or part of the active site, these structural anomalies are likely of functional
relevance. Additionally, two cis-peptides (Asn57-Pro58 and Asn154-Pro155) are also supported
by clear electron density. The final model of BcYkfC contains residues 29 to 333, one dipeptide
L-Ala--D-Glu, one phosphate, 6 polyethylene glycol (PEG) fragments from the crystallization
solution and 265 waters. The residual of the N-terminal residual purification tag (Gly0), residues
24-28, as well as side chains for residues Glu201 and Arg270, are disordered and were not
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
2
3
Page 14
included in the final model. Data collection, refinement and model statistics are summarized in
Table 1.
Overall structure. The structure of BcYkfC consists of three domains, including two bacterial
SH3 domains (SH3b) (SH3b1, residues 29-129; SH3b2, residues 130-207), and a C-terminal
NlpC/P60 cysteine peptidase domain (residues 208-333) (Figure 3A). The structures of the two
SH3b domains are similar to each other with an rmsd of 2.2 Å for 57 aligned C atoms (sequence
identity 13%). SH3b1 contains a helical insertion (1-3) compared to SH3b2 (Figure 3B). A
structural similarity search using Dali (33) indicated that there are no previously known
structures with the same three-domain architecture. However, significantly similar substructures
were identified. A summary of structural comparisons is present in Table 2. The SH3b domains
are similar to several known SH3bs (33), including the N-terminal domain of cyanobacteria -D-
Glu-DAP endopeptidases (13), the GW domains of Internalin B (34), the cell wall-targeting
domain of glycylglycine endopeptidase ALE-1 (35), the PhnA-like protein (36), the endolysin
PlyPSA (37), as well as many eukaryotic SH3 domains. A -hairpin within the RT loop region
(between A and B) appears to be a common and unique structural feature among SH3bs
compared with their eukaryotic counterparts (13). Both SH3bs of BcYkfC also have this
conserved structural motif (A1-A2). The C-terminal NlpC/P60 catalytic domain of BcYkfC is
remotely related to the papain family of cysteine peptidases (8). It is most similar to
cyanobacteria -D-Glu-DAP endopeptidases (13), E. coli lipoprotein Spr (38), and two
uncharacterized proteins (PDB codes: 2p1g and 2im9).
The three domains of BcYkfC are arranged in a triangular manner such that each domain
interacts with two other domains. The domain interface between the two SH3b domains is
mostly hydrophobic. The interface (~570 Å2 per domain) centers at the A2-B loop of SH3b1
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 15
and the A2-B region of SH3b2. The SH3b1/NlpC/P60 domain interface is mediated through
the 2-3-A1 region and C-D loop of SH3b1 and the 1-2-3 region and the 6-7 loop
of NlpC/P60 and buries a surface area of 903 Å2 per domain. The active site is located at the
domain interface of SH3b1 and NlpC/P60. SH3b2 domain is located away from the active site,
thus does not contribute to the active site. A multiple sequence alignment of full length homologs
of YkfC (37 sequences, average seq id 54%) indicated that most of the highly conserved residues
are either buried inside the protein or are clustered around the active site (Figure 3C).
Active site. The catalytic triad on the peptidase domain consists of Cys238, His291 and His303.
Cys238 appears tri-oxidized based on the electron density. The conformation of the cysteine side
chain is not significantly affected by the oxidation as its side chain adopts a similar conformation
as seen in other NlpC/P60 structures (13).
A dipeptide L-Ala--D-Glu was identified based on well-defined electron density in the
active site of BcYkfC (Figure 4A). The dipeptide was not added during protein purification or
crystallization, thus it is most likely to have been obtained from the expression host E. coli. Since
this dipeptide is one of the reaction products of BcYkfC (Figure 2), it unequivocally identifies
the S1-2 binding site cavity, which is formed by residues from both the SH3b1 domain (Glu83,
Thr84 and Tyr118) and the NlpC/P60 domain (Tyr226, Trp228, Ala229, Asp237, Arg255,
Asp256, Ser257, His290, and His291). Two charged residues, Asp237 and Arg255, which are
highly conserved in NlpC/P60 domains, are involved in a hydrogen bond network connecting
many residues in the active site (Figure 4B).
Two of the residues in the active site, Ser257 and His291, displayed discrete side chain
disorder. The first conformer of His291 (occupancy modeled as 0.7), which allows hydrogen
bonding with His303, is identical to the corresponding histidine residue of the catalytic dyad in
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 16
papain and other NlpC/P60 enzymes. The second conformation of His291 points towards the
solvent. The side chain disorder in the active site is likely an artifact of the oxidized state of the
catalytic Cys238, since the oxidized Cys238 makes multiple hydrogen bond interactions with
nearby side chains, including Tyr226, Ser257 and His291 (Figure 4B).
Recognition of L-Ala--D-Glu. The active site residues form a pocket that is highly
complementary in shape and molecular properties with the ligand (Figure 4C-D). The average B-
factor for the bound ligand is 26 Å2 (overall B factor of the protein 17.8 Å2). The interface
between the dipeptide and the protein buries a total surface area of 490 Å2. The dipeptide is
stabilized by multiple hydrogen bonds. The free amine group of the L-Ala makes hydrogen
bonds with Glu83O, Tyr118OH, and Asp256O1. The -NH group of the -D-Glu forms a weak
H-bond with Asp237O2. The -carboxyl of D-Glu is stabilized by hydrogen bonding
interactions with Ser239 and Ser257. On the solvent-exposed side of the substrate, waters are
also involved in the hydrogen bond network with the substrate and the enzyme (not shown). The
side chain of the L-Ala points towards a hydrophobic region defined by the side chains of
Trp228 and Ala229. Additionally, the amphipathic carbons of D-Glu are involved in
hydrophobic contacts with Trp228 side chain. As a result, Trp228 contributes to both the S2 and
S1 binding sites.
YkfC is specific for murein peptides with free N-terminal L-Ala. The active site of B. subtilis
YkfC is highly conserved compared to that of BcYkfC (Figure 5). Dipeptidyl-peptidase VI (DPP
VI) of B. sphaericus is an -D-Glu-DAP(Lys) dipeptidase that has been found in the cell
cytoplasm during sporulation (39). It has strict specificity for murein peptides with a free N-
terminal L-Ala. The crystal structure of BcYkfC provides a structural basis for this specificity.
DPP VI is 21% sequence identical to BcYkfC with conserved residues distributed across the full-
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 17
length of the protein (Figure 5). These residues are important for correct folding or function. As a
result, DPP VI is a homolog of BcYkfC except that the SH3b1 domain of DPP VI has no helical
insertion (1-3) between A1 and A2 as well as shorter C-D loop (Figure 5). Furthermore,
the S2 and S1 binding sites for L-Ala--D-Glu are highly conserved between DPP VI and
BcYkfC. DPP VI possesses the conserved tyrosine (Tyr118 of YkfC) that is the only residue
from SH3b1 whose side chain interacts with L-Ala. The residues from the catalytic domain that
contribute to S2 and S1 sites are identical between BcYkfC and DPP VI (except an A/S mutation
at position 257 of BcYkfC). As a result, we conclude that both B. subtilis and B. cereus YkfCs
have the same substrate specificity for murein peptides with a free N-terminal L-Ala. Enzymes
with this specificity can not cleave PG biosynthesis precursors such as UDP-MurNAc-
pentapeptide, thus are not likely to interfere with the PG synthesis pathway.
S’ sites of YkfC and docking studies. B. subtilis YkfC was shown previously to process the
tripeptide (L-Ala--D-Glu-L-Lys), the tetrapeptide (L-Ala--D-Glu-L-Lys-D-Ala) and the
pentapeptide (L-Ala--D-Glu-L-Lys-D-Ala-D-Ala) (7). DPP VI also displayed no specificity
towards additional residues attached to DAP (or Lys) (39). The residues in BcYkfC that
potentially form S’ sites include His290, His291, Arg306, Glu308, Arg309, Tyr321 and Glu324.
These residues are generally less conserved among close BcYkfC homologs (Figure 3),
suggesting that the specificity of YkfC is determined primarily at the S sites as described above.
The catalytic mechanism of NlpC/P60 is currently unknown, but it is likely to be similar to
papain based on a similar arrangement of their catalytic residues (8, 13, 38). The third polar
residue of the catalytic triad in NlpC/P60 is more often a histidine instead of an asparagine as
found in papain (16). Furthermore, a tyrosine (Tyr226 of BcYkfC) is equivalent to a glutamine
(Gln119) in papain that is thought to be important for substrate binding during catalysis. We
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 18
docked tripeptide L-Ala--D-Glu-L-DAP to the active site of BcYkfC. Figure 6A shows a
possible mode of tripeptide interaction with the enzyme. The C end of -D-Glu is displaced due
to oxidation of Cys238 in the crystal structure. In the modeled structure, Tyr226OH interacts
with the carbonyl group of the -D-Glu (Figure 6B). This conformation places the C atom close
to Cys238SH (distance ~3.6 Å), likely similar to a productive conformation. The electrostatic
surface of the binding site is complementary to that of the substrate, indicating that electrostatic
interaction is likely important for substrate recognition. Basic residues (His290, His291 and
Arg306) could interact with carboxyl groups. It has been previously reported that the catalytic
efficiency of B. subtilis YkfC for L-Ala--D-Glu- Lys is significant less than that of tetra- or
penta- peptides (7). This could be explained by unfavorable environment (near Arg306/Glu308,
Lys/Gly for B. subtilis YkfC) for accommodating the positive charge on Lys.
Recognition of -D-Glu by NlpC/P60 homologs. NlpC/P60 cysteine peptidases are a ubiquitous
peptidase in bacteria, there are ~5,000 NlpC/P60 domains in the current Pfam database when
metagenomics data are included (16). However, only a few of these domains have been
biochemically characterized. In light of the clearer picture we have of substrate binding in
BcYkfC, we examined the sequence conservation of NlpC/P60 family proteins around the active
site in order to gain further insights into substrate specificity across the entire family.
The non-redundant (nr) database of protein sequences was searched using the PSI-BLAST
program (14) with the NlpC/P60 domain of BcYkfC as a search probe. We extracted a total of
2599 protein sequences using a criteria of E-value >0.02 and alignment length >70 residues from
3985 total hits, representing a significant subset of NlpC/P60 proteins. Among these proteins,
~1% do not possess the conserved dyad Cys/His, an additional 9% do not contain the conserved
tyrosine that is believed to be essential. These proteins are likely to have similar fold, but either
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 19
have lost cysteine peptidase activity, or belong to remote homologs that have different substrate
specificity (e.g. CHAP family) (8, 40, 41). The sequence conservation pattern of active site
residues of the remaining 2277 proteins (average seq id 19%) is shown in Figure 6C. The S sites
(S2 and S1) are significantly more conserved than the S’ sites in these proteins. The most
conserved site is S1 that is tailored to recognize -D-Glu specifically. The three most highly
conserved residues (Asp237, Ser239 and Tyr226), aside from the catalytic dyad, are involved
with hydrogen bonding with -D-Glu (Figure 6B). At position Ser257, residues with small side
chains (A/S/T) are observed, creating a cavity for the carboxyl group of -D-Glu. Trp228
contributes to both the S1 and S2 sites and is conserved as a hydrophobic residue, which may
facilitate the interaction with C of Ala. Therefore, the conservation of active site residues
indicates that a significant percentage of the NlpC/P60 family is likely to be specific for -D-
Glu. More than 1300 proteins, which contain the strictly conserved Trp228, Asp237, Ser239,
Tyr226 residues (and catalytic triads), most likely bind X-L-Ala--D-Glu moieties (X=H or
other moieties).
Structural comparison to the cyanobacterial NlpC/P60 L-Ala--D-Glu endopeptidases. We
previously determined crystal structures of two closely-related NlpC/P60 L-Ala--D-Glu
endopeptidases from cyanobacteria A. variabilis (AvPCP) and N. punctiforme (NpPCP). These
enzymes contain an N-terminal SH3b domain and a C-terminal NlpC/P60 catalytic domain (13).
Surprisingly, the structures of the cyanobacterial enzymes are essentially a substructure of YkfC:
the two domains of the cyanobacteria enzymes are structurally equivalent to the first and third
domain of BcYkfC (Figure 5, 7A-B). The full-length AvPCP can be superimposed on to
BcYkfC, with an rmsd of 2.3 Å and 29% sequence identity for 193 aligned C atoms (Table 2).
Thus, the individual domains of SH3b and NlpC/P60 as well as their arrangements are highly
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 20
similarly in AvPCP and BcYkfC. Furthermore, the residues of the S2 and S1 pockets are nearly
identical (except for S257A, Figure 7C). The striking similarity between the cyanobacteria
enzymes and YkfC was previously not detected by sequence analysis due to the presence of two
insertions: the entire SH3b2 domain and the 1-3 region of SH3b1.
The C-D loop and the A1-A2 loop of SH3b1 are longer compared to SH3b of AvPCP.
The C-D loop of YkfC is located upstream of the S2 site. The A1-A2 loop of YkfC
contains three helices (1-3). It is located at the S’ side of the binding groove on the catalytic
domain. This insertion packs against the surface of NlpC/P60 domain and contributes to
stabilization of the interface between SH3b1 and NlpC/P60 domains. Furthermore, it is also
located at the S’ binding site and define its shape. Indeed, the shape of S’ sites of BcYkfC appear
more restricted than that of AvPCP at the S’ sites (Figure 7D). Without these insertions to
stabilize the domain interface between SH3b and NlpC/P60, AvPCP uses a different strategy to
achieve similar purpose: a longer C-terminus loop extends to interact with its SH3b domain.
We previously proposed that the cyanobacteria enzymes have the same substrate requirement
as DPP VI based on modeling studies (13). The crystal structure of YkfC reported here further
supports that DPP VI, YkfC and the cyanobacteria enzymes belong to a subset of NlpC/P60 -D-
Glu-DAP endopeptidases whose substrates possess an N-terminal free L-Ala. The SH3b domain
helps define the S2 binding site (Tyr118 in BcYkfC, Tyr64 in AvPCP). Additionally, it sterically
hinders docking of large moiety beyond S2 site. Thus, it directly contributes to specificity. This
role for SH3b constitutes a new function for SH3b domains.
Identification and distribution of YkfC homologs. As shown above, BcYkfC and cyanobacterial
-D-Glu-DAP endopeptidases have the same specificity. Similar NlpC/P60 enzymes can be
detected by sequence homology in both the SH3b and the NlpC/P60 domains. However, due to
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 21
the presence of long insertions/deletions and highly divergent sequences in the SH3b domains, it
is often difficult to detect similar enzymes using sequence searches, and the NlpC/P60 region
tends to dominate the hits. We examined an alternate method by examining specific residues on
the NlpC/P60 domain only. An acidic residue (Asp256) is essential for YkfC specificity by
neutralizing the positive charge on the free amine of the L-Ala. Among 2599 sequences obtained
above, 282 proteins were identified, which contained an aspartate equivalent to Asp256 and a
conserved catalytic dyad. 239 out of 282 proteins (85%) contained the conserved tyrosine on
their SH3b domains (Tyr118 of BcYkfC). Thus, the presence of aspartate residue at the position
Asp256 and a conserved tyrosine on SH3b is highly correlated (Figure 6D). Interestingly,
NlpC/P60 domains with the conserved aspartate are also found in a few single-domain or
multiple-domain proteins that do not contain a detectable SH3b domain. The biochemical
properties of these proteins are currently unknown; they could represent novel variants of a
YkfC-type enzyme.
Nonetheless, the 239 proteins above with the conserved aspartate (Asp256) and tyrosine
(Tyr118) most likely defines a YkfC-like subfamily of NlpC/P60 proteins. These NlpC/P60
enzymes are predominately distributed in four phyla of bacteria: bacteroidetes, cyanobacteria,
firmicute and proteobacteria-proteobacteria) (Figure 8). They include all currently known
enzymes with the same requirement of a free N-terminal L-Ala, i.e., DPP VI, YkfC, and
AvPCP/NpPCP. The cyanobacterial enzymes contain one SH3b domain, while homologs in
other bacteria contain two SH3b domains. As expected, the SH3b domains are highly divergent.
The D strand, where the conserved tyrosine is located, is more conserved (Figure 6D). This
strand is also responsible for the interface with the catalytic domain. BcYkfC is a member of a
cluster of highly conserved homologs that are present in closed related species such as Bacillus
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 22
anthracis, Bacillus thuringiensis, Bacillus weihenstephanensis (seq id >= 95%). All the
members of this group contain signal peptides. Signal peptides are also present in some
members of the bacteroidetes group, while no signal peptides are identified in proteobacteria and
cyanobacteria homologs.
It is likely that the YkfC subfamily of highly specialized enzymes is evolved from a common
ancestor, likely a P60-like general-purpose enzyme with two or more SH3b domains connected
to a C-terminal NlpC/P60 domain. The SH3b domains may have lost the function as targeting
domains. The nonessential SH3b domain is degraded, with the complete loss of the second SH3b
domain in the case of the cyanobacterial enzymes.
Conclusion
The structure of BcYkfC in complex with L-Ala--D-Glu is the first structural representative
of an NlpC/P60 enzyme with a bound ligand. It allows us to identify the determinants of
substrate specificity, which further leads to the classification of a subfamily of highly specialized
NlpC/P60 enzymes. It also provides structural and functional insights to the NlpC/P60 family of
enzymes in general.
Acknowledgements
The project is sponsored by National Institutes of General Medical Sciences Protein Structure
Initiative (U54 GM074898). Portions of this research were carried out at the Stanford
Synchrotron Radiation Lightsource (SSRL). The SSRL is a national user facility operated by
Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy
Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 23
Energy, Office of Biological and Environmental Research, and by the National Institutes of
Health (National Center for Research Resources, Biomedical Technology Program, and the
National Institute of General Medical Sciences). Genomic DNA from Bacillus cereus NRS248
(ATCC #10987D) was obtained from the American Type Culture Collection (ATCC). The
content is solely the responsibility of the authors and does not necessarily represent the official
views of the National Institute of General Medical Sciences or the National Institutes of Health.
References
1. Doyle, R. J., Chaloupka, J., and Vinter, V. (1988) Turnover of cell walls in
microorganisms, Microbiol. Rev. 52, 554-567.
2. Park, J. T., and Uehara, T. (2008) How bacteria consume their own exoskeletons
(turnover and recycling of cell wall peptidoglycan), Microbiol. Mol. Biol. Rev. 72, 211-
227.
3. Uehara, T., and Park, J. T. (2007) An anhydro-N-acetylmuramyl-L-alanine amidase with
broad specificity tethered to the outer membrane of Escherichia coli, J. Bacteriol. 189,
5634-5641.
4. Uehara, T., Suefuji, K., Valbuena, N., Meehan, B., Donegan, M., and Park, J. T. (2005)
Recycling of the anhydro-N-acetylmuramic acid derived from cell wall murein involves a
two-step conversion to N-acetylglucosamine-phosphate, J. Bacteriol. 187, 3643-3649.
5. Uehara, T., and Park, J. T. (2004) The N-acetyl-D-glucosamine kinase of Escherichia
coli and its role in murein recycling, J. Bacteriol. 186, 7273-7279.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
2
3
Page 24
6. Uehara, T., and Park, J. T. (2003) Identification of MpaA, an amidase in Escherichia coli
that hydrolyzes the -D-glutamyl-meso-diaminopimelate bond in murein peptides, J.
Bacteriol. 185, 679-682.
7. Schmidt, D. M., Hubbard, B. K., and Gerlt, J. A. (2001) Evolution of enzymatic activities
in the enolase superfamily: functional assignment of unknown proteins in Bacillus
subtilis and Escherichia coli as L-Ala-D/L-Glu epimerases, Biochemistry 40, 15707-
15715.
8. Anantharaman, V., and Aravind, L. (2003) Evolutionary history, structural features and
biochemical diversity of the NlpC/P60 superfamily of enzymes, Genome Biol. 4, R11.
9. Smith, T. J., Blackman, S. A., and Foster, S. J. (2000) Autolysins of Bacillus subtilis:
multiple enzymes with multiple functions, Microbiology 146(2), 249-262.
10. Kuhn, M., and Goebel, W. (1989) Identification of an extracellular protein of Listeria
monocytogenes possibly involved in intracellular uptake by mammalian cells, Infect.
Immun. 57, 55-61.
11. Asano, S. I., Nukumizu, Y., Bando, H., Iizuka, T., and Yamamoto, T. (1997) Cloning of
novel enterotoxin genes from Bacillus cereus and Bacillus thuringiensis, Appl. Environ.
Microbiol. 63, 1054-1057.
12. Teng, F., Kawalec, M., Weinstock, G. M., Hryniewicz, W., and Murray, B. E. (2003) An
Enterococcus faecium secreted antigen, SagA, exhibits broad-spectrum binding to
extracellular matrix proteins and appears essential for E. faecium growth, Infect. Immun.
71, 5033-5041.
13. Xu, Q., Sudek, S., McMullan, D., Miller, M. D., Geierstanger, B., Jones, D. H., Krishna,
S. S., Spraggon, G., Bursalay, B., Abdubek, P., Acosta, C., Ambing, E., Astakhova, T.,
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 25
Axelrod, H. L., Carlton, D., Caruthers, J., Chiu, H. J., Clayton, T., Deller, M. C., Duan,
L., Elias, Y., Elsliger, M. A., Feuerhelm, J., Grzechnik, S. K., Hale, J., Won Han, G.,
Haugen, J., Jaroszewski, L., Jin, K. K., Klock, H. E., Knuth, M. W., Kozbial, P., Kumar,
A., Marciano, D., Morse, A. T., Nigoghossian, E., Okach, L., Oommachen, S., Paulsen,
J., Reyes, R., Rife, C. L., Trout, C. V., van den Bedem, H., Weekes, D., White, A., Wolf,
G., Zubieta, C., Hodgson, K. O., Wooley, J., Deacon, A. M., Godzik, A., Lesley, S. A.,
and Wilson, I. A. (2009) Structural basis of murein peptide specificity of a -D-glutamyl-
L-diamino acid endopeptidase, Structure 17, 303-313.
14. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and
Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs, Nucleic Acids Res. 25, 3389-3402.
15. Eddy, S. R. (1998) Profile hidden Markov models, Bioinformatics 14, 755-763.
16. Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna,
A., Marshall, M., Moxon, S., Sonnhammer, E. L., Studholme, D. J., Yeats, C., and Eddy,
S. R. (2004) The Pfam protein families database, Nucleic Acids Res. 32, D138-141.
17. Plewniak, F., Bianchetti, L., Brelivet, Y., Carles, A., Chalmel, F., Lecompte, O., Mochel,
T., Moulinier, L., Muller, A., Muller, J., Prigent, V., Ripp, R., Thierry, J. C., Thompson,
J. D., Wicker, N., and Poch, O. (2003) PipeAlign: A new toolkit for protein family
analysis, Nucleic Acids Res. 31, 3829-3832.
18. Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005) MAFFT version 5: improvement in
accuracy of multiple sequence alignment, Nucleic Acids Res. 33, 511-518.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
2
3
Page 26
19. Klock, H. E., Koesema, E. J., Knuth, M. W., and Lesley, S. A. (2008) Combining the
polymerase incomplete primer extension method for cloning and mutagenesis with
microscreening to accelerate structural genomics efforts, Proteins 71, 982-994.
20. Santarsiero, B. D., Yegian, D. T., Lee, C. C., Spraggon, G., Gu, J., Scheibe, D., Uber, D.
C., Cornell, E. W., Nordmeyer, R. A., Kolbe, W. F., Jin, J., Jones, A. L., Jaklevic, J. M.,
Schultz, P. G., and Stevens, R. C. (2002) An approach to rapid protein crystallization
using nanodroplets., J. Appl. Cryst. 35, 278-281.
21. Lesley, S. A., Kuhn, P., Godzik, A., Deacon, A. M., Mathews, I., Kreusch, A., Spraggon,
G., Klock, H. E., McMullan, D., Shin, T., Vincent, J., Robb, A., Brinen, L. S., Miller, M.
D., McPhillips, T. M., Miller, M. A., Scheibe, D., Canaves, J. M., Guda, C., Jaroszewski,
L., Selby, T. L., Elsliger, M. A., Wooley, J., Taylor, S. S., Hodgson, K. O., Wilson, I. A.,
Schultz, P. G., and Stevens, R. C. (2002) Structural genomics of the Thermotoga
maritima proteome implemented in a high-throughput structure determination pipeline,
Proc. Natl. Acad. Sci. USA 99, 11664-11669.
22. Cohen, A. E., Ellis, P. J., Miller, M. D., Deacon, A. M., and Phizackerley, R. P. (2002)
An automated system to mount cryo-cooled protein crystals on a synchrotron beamline,
using compact samples cassettes and a small-scale robot, J. Appl. Cryst. 35, 720-726.
23. Kabsch, W. (1993) Automatic processing of rotation diffraction data from crystals of
initially unknown symmetry and cell constants, J. Appl. Cryst. 26, 795-800.
24. Schneider, T. R., and Sheldrick, G. M. (2002) Substructure solution with SHELXD, Acta
Cryst. D 58, 1772-1779.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
2
3
Page 27
25. Bricogne, G., Vonrhein, C., Flensburg, C., Schiltz, M., and Paciorek, W. (2003)
Generation, representation and flow of phase information in structure determination:
recent developments in and around SHARP 2.0, Acta Cryst. D 59, 2023-2030.
26. Cohen, S. X., Morris, R. J., Fernandez, F. J., Ben Jelloul, M., Kakaris, M., Parthasarathy,
V., Lamzin, V. S., Kleywegt, G. J., and Perrakis, A. (2004) Towards complete validated
models in the next generation of ARP/wARP, Acta Cryst. D 60, 2222-2229.
27. Emsley, P., and Cowtan, K. (2004) Coot: model-building tools for molecular graphics,
Acta Cryst. D 60, 2126-2132.
28. Murshudov, G. N., Vagin, A. A., and Dodson, E. J. (1997) Refinement of
macromolecular structures by the maximum-likelihood method, Acta Cryst. D 53, 240-
255.
29. CCP4. (1994) The CCP4 suite: programs for protein crystallography, Acta Cryst. D 50,
760-763.
30. Davis, I. W., Murray, L. W., Richardson, J. S., and Richardson, D. C. (2004)
MOLPROBITY: structure validation and all-atom contact analysis for nucleic acids and
their complexes, Nucleic Acids Res. 32, W615-619.
31. Kall, L., Krogh, A., and Sonnhammer, E. L. (2007) Advantages of combined
transmembrane topology and signal peptide prediction--the Phobius web server, Nucleic
Acids Res. 35, W429-432.
32. Cruickshank, D. W. (1999) Remarks about protein structure precision, Acta Cryst. D 55,
583-601.
33. Holm, L., and Sander, C. (1995) Dali: a network tool for protein structure comparison,
Trends Biochem. Sci. 20, 478-480.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 28
34. Marino, M., Banerjee, M., Jonquieres, R., Cossart, P., and Ghosh, P. (2002) GW domains
of the Listeria monocytogenes invasion protein InlB are SH3-like and mediate binding to
host ligands, EMBO J. 21, 5623-5634.
35. Lu, J. Z., Fujiwara, T., Komatsuzawa, H., Sugai, M., and Sakon, J. (2006) Cell wall-
targeting domain of glycylglycine endopeptidase distinguishes among peptidoglycan
cross-bridges, J. Biol. Chem. 281, 549-558.
36. Srisailam, S., Lukin, J. A., Lemak, A., Yee, A., and Arrowsmith, C. H. (2006) Sequence
specific resonance assignment of a hypothetical protein PA0128 from Pseudomonas
aeruginosa, J Biomol NMR 36 Suppl 1, 27.
37. Korndorfer, I. P., Danzer, J., Schmelcher, M., Zimmer, M., Skerra, A., and Loessner, M.
J. (2006) The crystal structure of the bacteriophage PSA endolysin reveals a unique fold
responsible for specific recognition of Listeria cell walls, J. Mol. Biol. 364, 678-689.
38. Aramini, J. M., Rossi, P., Huang, Y. J., Zhao, L., Jiang, M., Maglaqui, M., Xiao, R.,
Locke, J., Nair, R., Rost, B., Acton, T. B., Inouye, M., and Montelione, G. T. (2008)
Solution NMR structure of the NlpC/P60 domain of lipoprotein Spr from Escherichia
coli: structural evidence for a novel cysteine peptidase catalytic triad, Biochemistry 47,
9715-9717.
39. Vacheron, M. J., Guinand, M., Francon, A., and Michel, G. (1979) [Characterisation of a
new endopeptidase from sporulating Bacillus sphaericus which is specific for the
gamma-D-glutamyl-L-lysine and -D-glutamyl-(L)meso-diaminopimelate linkages of
peptidoglycan substrates (author's transl)], Eur. J. Biochem. 100, 189-196.
40. Bateman, A., and Rawlings, N. D. (2003) The CHAP domain: a large family of amidases
including GSP amidase and peptidoglycan hydrolases, Trends Biochem. Sci. 28, 234-237.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 29
41. Rigden, D. J., Jedrzejas, M. J., and Galperin, M. Y. (2003) Amidase domains from
bacterial and phage autolysins define a family of -D,L-glutamate-specific
amidohydrolases, Trends Biochem. Sci. 28, 230-234.
42. Nasertorabi, F., Tars, K., Becherer, K., Kodandapani, R., Liljas, L., Vuori, K., and Ely,
K. R. (2006) Molecular basis for regulation of Src by the docking protein p130Cas, J.
Mol. Recognit. 19, 30-38.
43. Pai, C. H., Chiang, B. Y., Ko, T. P., Chou, C. C., Chong, C. M., Yen, F. J., Chen, S.,
Coward, J. K., Wang, A. H., and Lin, C. H. (2006) Dual binding sites for translocation
catalysis by Escherichia coli glutathionylspermidine synthetase, EMBO J. 25, 5970-5982.
Legends
Figure 1. Proposed metabolic pathways for murein peptides in model bacteria E. coli and B.
subtilis. Catalytic enzymes are highlighted in red.
Figure 2. Genomic context of ykfB and ykfC genes in B. subtilis and B. cereus.
Figure 3. Crystal structure of YkfC from B. cereus in complex with L-Ala--D-Glu. (A)
Ribbons representation of BcYkfC. The three domains (SH3b1, SH3b2 and NlpC/P60) are
colored as blue, green and red respectively. The bound L-Ala--D-Glu is shown as sticks. (B)
Ribbons representations of individual domains with secondary structures labeled. (C) Molecular
surface of YkfC colored by sequence conservation. The most conserved residues are shown in
red, the non-conserved residues in white.
Figure 4. Active site and recognition of L-Ala--D-Glu by BcYkfC. (A) 2Fo-Fc density of L-
Ala--D-Glu identified in the active site, contoured at 1.5. (B) The S binding sites of BcYkfC.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 30
Hydrogen bonds and distances are shown as dashed lines. (C) L-Ala--D-Glu (stick) is located
in the active site defined by SH3b1 domain (blue) and the NlpC/P60 domain (red). (D)
Interaction between L-Ala--D-Glu and the active site of YkfC. This figure was generated using
the program MOE 2008.10 (Chemical Computing Group Inc.).
Figure 5. Sequence alignment of YkfC from B. subtilis and B. cereus, Dipeptidyl-peptidase VI
(DPP VI) from Bacillus sphaericus, and a -D-Glutamyl-L-Diamino acid endopeptidase from
Anabaena variabilis. The sequence numbering and secondary structures of YkfC from B. cereus
are shown at the top. The active site residues are marked at the bottom.
Figure 6. Models of substrate recognition by BcYkfC. (A) L-Ala--D-Glu-DAP was docked
into the active site of BcYkfC. The protein surface is colored according to a gradient in
electrostatic potential (red: negative, blue: positive) (MOE 2008.10, Chemical Computing Group
Inc.). (B) Recognition of -D-Glu by four residues of YkfC (Tyr226, Trp228, Asp237, Ser239
and Ser257). (C) Sequence conservation of the active sites in NlpC/P60 domains, based on 2277
NlpC/P60 domains with intact catalytic dyad and conserved Tyr226. (D) Sequence conservation
of the active sites in the YkfC subfamily of NlpC/P60 enzymes based on 282 sequences selected
based on the presence of a conserved aspartate residue at position 256 of BcYkfC. The
conservation of this residue is highly correlated with a conserved tyrosine in the D region of the
SH3b domain (Tyr118 of BcYkfC).
Figure 7. Structural comparisons between BcYkfC and the cyanobacterial NlpC/P60
endopeptidase AvPCP. (A) BcYkfC and the cyanobacteria endopeptidase AvPCP (PDB 2hbw)
shown in the same orientations. The equivalent residues are shown in red. (B) A stereoview of
the C traces of two proteins shown in A (coloring the same as in A). (C) The S binding sites of
the two proteins are nearly identical. The equivalent residues for the cyanobacteria
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
2
3
Page 31
endopeptidase are labeled in parenthesis. (D) Comparison of the active site cavities and their
environments. The catalytic cysteine is shown as white. A stick model of a docked murein
tripeptide is shown in the active site of BcYkfC.
Figure 8. UPGMA circular phylogram of the YkfC subfamily of NlpC/P60 enzymes. The
bacteria from bacteroidetes, cyanobacteria, firmicute and proteobacteria (-proteobacteria) are
shown in blue, cyan, green and red respectively.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
2
3
Page 32
Tables
Table 1. Data collection and refinement statistics (PDB code 3h41)
Space group C2Unit Cell a=95.2 Å, b=59.4 Å , c=61.3 Å, =103.3°Data collection 1 MADSe - peak 2 MADSe - remo 3 MADSe - inflWavelength (Å) 0.9786 0.9184 0.9799Resolution range (Å) 41.5-1.79 41.5-1.84 41.5-1.86Number of observations 116,822 107,729 104,295Number of unique reflections 31,085 28,629 27,652Completeness (%) 98.0 (95.1)a 98.6 (98.1) 98.5 (97.9)Mean I/σ(I) 11.3 (2.1)a 12.5 (2.8) 13.1 (2.8)Rmerge on I (%) 11.1 (71)a 10.0 (54) 10.0 (56)Highest resolution shell (Å) 1.88-1.79 1.94-1.84 1.96-1.86MAD PhasingResolution 41.5-1.79Number of Se sites 5Mean Figure of merit 0.36Model and refinement statisticsResolution range (Å) 50-1.79 Data set used in refinement 1 MADSeNo. of reflections (total) 31,083 Cutoff criteria |F|>0No. of reflections (test) 1,564 Rcryst 16.3Completeness (% total) 98.0 Rfree 19.7Stereochemical parametersRestraints (RMS observed) MolProbity statistics Bond lengths (Å) 0.015 All-atom clash score 5.07 Bond angles (°) 1.50 Ramachandran favored (%) 97.7Average isotropic B-value (Å2) 19.9 Ramachandran outliers 1ESU based on Rfree (Å) 0.11 Rotamer outliers 1Protein residues / atoms 305/2,463aHighest resolution shell in parentheses. The high resolution cutoff was chosen such that the mean I/(I) in the highest resolution shell is around 2.ESU = Estimated overall coordinate error (32).Rmerge=hkli|Ii(hkl)-<I(hkl)>|/hkliIi(hkl)Rcryst = hkl||Fobs|-|Fcalc||/hkl|Fobs| where Fcalc and Fobs are the calculated and observed structure factor amplitudes, respectively.Rfree = as for Rcryst, but for 5.0% of the total reflections chosen at random and omitted from refinement.
Structure of endopeptidase YkfC
1
1
2
3456789
10
2
3
Page 33
Table 2. Structural comparisons of BcYkfC and bacteria proteins with at least one common
domaina
Aligned Substructure
PDB code
Ref. RMSD (Å)
Aligned length (aa)
No. residues (aa)
Z score
Seq. id
(%)
Comments
SH3b+NlpC/P602fg0 (13) 2.1 193 221 21.3 26 Cyanobacterial NpPCP2hbw (13) 2.3 193 220 21.2 29 Cyanobacterial AvPCP
SH3b1m9s (34) 1.9 60 523 6.0 12 TDb/Internalin B1r77 (35) 2.3 57 103 4.7 12 TD/ PG hydrolase ALE-11xov (37) 3.0 51 317 3.5 10 TD/Endolysin PlyPSA2akk (36) 1.8 44 74 2.7 7 PhnA-like protein1x27 (42) 1.9 54 164 5.7 13 Eukaryotic SH3
NlpC/P602jyx (38) 1.9 119 129 17.8 29 E. coli lipoprotein Spr2p1g 2.5 110 230 9.6 18 Uncharacterized2ioa (43) 3.2 106 589 7.3 9 CHAP domain / GspSb
aThe alignment is performed by the DALI (33) structural comparison server using full-length
BcYkfC and individual domains (SH3b1 and NlpC/P60) as search probes. No. residues is the
number of residues present in the model used for comparison. For proteins with multiple SH3-
like domains (PDB codes 1m9s and 1xov) or multiple chains, only the best match is shown.
bGspS: Glutathionylspermidine synthetase/amidase; TD: targeting domain.
Structure of endopeptidase YkfC
1
1
2
3
4
5
6
7
8
2
3
Page 34
TOC figure:
Structure of endopeptidase YkfC
1
1
2
2
3
Page 35
Figure 1.
Structure of endopeptidase YkfC
1
1
2
2
3
Page 36
Figure 2.
Structure of endopeptidase YkfC
1
1
2
2
3
Page 37
Figure 3.
Structure of endopeptidase YkfC
1
1
2
2
3
Page 38
Figure 4.
Structure of endopeptidase YkfC
1
1
2
3
2
3
Page 39
Figure 5.
Structure of endopeptidase YkfC
1
1
2
2
3
Page 40
Figure 6.
Structure of endopeptidase YkfC
1
1
2
2
3
Page 41
Figure 7.
Structure of endopeptidase YkfC
1
1
2
2
3
Page 42
Figure 8.
Structure of endopeptidase YkfC
1
1
2
3
2
3