CHAPTER 1
INTRODUCTION
Tuberculosis (TB) is the leading cause of death from an infectious disease and
is estimated to infect roughly one-third of the world population (Sander et al., 2004).
Mycobacterium tuberculosis is the etiologic agent of tuberculosis in humans.
Sequencing of its genome showed that it comprises 4,411,529 base pairs.
Mycobacterium tuberculosis differs radically from other bacteria in that a significant
portion of its coding capacity is devoted to the production of enzymes involved in
lipogenesis and lipolysis (Cole et al., 1998). Analysis of the sequence allowed precise
functions to be attributed to 40% of the predicted proteins; some information is
available for a further 44%, and the remaining 16% might account for specific mycobacterial
functions (Cole et al., 1998). A more recent re-annotation suggests that the genome
contains 3995 protein-coding genes (ORFs) and 50 RNA genes (45 tRNA, 3 rRNA
and 2 other stable RNA genes) (Cole et al., 1998; Camus et al., 2002).
Coordinated efforts have been launched by several structural genomics
consortia (http://www.rcsb.org/pdb/strucgen.html), such as the Tuberculosis Structural
Genomics Consortium (http://www.doe-mbi.ucla.edu/TB/), to determine and analyze the
structures of a large number of Mycobacterium tuberculosis proteins. Elucidating these
structures is also expected to contribute greatly to the identification of novel drug
targets and to the development of new drugs by rational design.
1.1 Nucleotidyltransferase family of proteins
ATP- and NAD+-dependent DNA ligases, ATP-dependent RNA ligases and GTP-dependent
mRNA capping enzymes make up this superfamily of proteins (Figure 1.1).
These proteins generally catalyze nucleotidyl transfer to polynucleotide 5' ends via
covalent enzyme-(lysyl-N)-NMP intermediates (Shuman and Lima, 2004); the reaction
mechanisms of the different classes of the family are shown in Figure 1.2. The superfamily
is defined by five peptide motifs that line the nucleotide-binding pocket and contribute
amino acid side chains essential for catalysis (Figure 1.3).
The initial clues that polynucleotide ligation and RNA capping might be related
at the level of enzyme structure and mechanism came from the mapping of the lysine
nucleotidylation sites to a peptide motif, Kx(D/N)G, referred to as motif I, that is
conserved among ATP-dependent DNA and RNA ligases, NAD+-dependent DNA
ligases and GTP-dependent RNA capping enzymes (Thogersen et al., 1985;
Tomkinson et al., 1991; Cong and Shuman, 1993). Sequence analysis defined four
additional motifs (III, IIIa, IV and V) that are conserved in order and spacing among
polynucleotide ligases and capping enzymes (Shuman and Schwer, 1995). Mutational
analyses of exemplary enzymes (Figure 1.1) have shown that conserved amino acids
within the five motifs are essential for the function of capping enzymes and
polynucleotide ligases (Sawaya and Shuman, 2003; Zhu and Shuman, 2005). The
prediction that the five motifs comprise the active site of ligases and capping enzymes
was affirmed initially by the crystal structures of bacteriophage T7 ATP-dependent
DNA ligase (Subramanya et al., 1996), the NAD+-dependent bacterial DNA ligases
from Bacillus stearothermophilus (Bst) (BstLigA) and Thermus filiformis (Tfi)
(TfiLigA) (Singleton et al., 1999; Lee et al., 2000), and the Chlorella virus (ChV)
mRNA capping enzyme (Hakansson et al., 1997). These studies revealed a shared core
tertiary structure for DNA ligases and capping enzymes, composed minimally of a
nucleotidyltransferase domain fused to a C-terminal OB-fold domain. Subsequently,
structures of the Candida albicans (Cal) mRNA capping enzyme-GMP intermediate
(Fabrega et al., 2003), the Chlorella virus DNA ligase-AMP intermediate (Odell et al.,
2000; Odell et al., 2003) and Enterococcus faecalis DNA ligase bound to NAD+
(Gajiwala and Pinko, 2004) have captured family members in new conformations that
are mechanistically instructive.
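The motif I signature Kx(D/N)G lends itself to a simple pattern search. The Python sketch below is only an illustration. The regular expression, the helper name `find_motif_i` and the example peptides (transcribed from Figure 1.3) are assumptions of this sketch, not a published tool. Real assignment of motif I relies on alignment with characterized family members, because so short a pattern also matches by chance in unrelated proteins.

```python
import re

# Motif I: K-x-(D/N)-G. The lysine is the residue that becomes covalently
# linked to AMP (ligases) or GMP (capping enzymes).
# The lookahead lets overlapping candidate sites all be reported.
MOTIF_I = re.compile(r"(?=(K[A-Z][DN]G))")


def find_motif_i(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based lysine position, matched tetrapeptide) for each hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF_I.finditer(sequence.upper())]


# Motif I peptides transcribed from Figure 1.3 (illustrative input only).
peptides = {
    "T7 DNA ligase": "KYDGVR",
    "C. albicans capping enzyme": "KTDGLR",
}

for name, peptide in peptides.items():
    print(f"{name}: {find_motif_i(peptide)}")
```

On a full-length sequence, expect several chance hits besides the true active-site lysine. The pattern only shortlists candidates.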
1.2 DNA Ligases
DNA ligases join breaks in the phosphodiester backbone of DNA and are required
for many essential cellular processes, including DNA replication, repair and recombination
(Cao, 2002). All DNA ligases follow the same reaction mechanism, but they use either ATP
or NAD+ as the cofactor. In the first step, the ligase attacks the α-phosphorus of ATP or
NAD+, releasing pyrophosphate or nicotinamide mononucleotide (NMN), respectively, and
forming a covalent ligase-(lysyl-N)-AMP intermediate. In the second step, the AMP is
transferred to the 5' end of the 5'-phosphate-terminated DNA strand to form a DNA-adenylate
intermediate, A(5')pp(5')DNA. In the third step, the ligase catalyzes attack by the 3'-OH
of the nick on the DNA-adenylate, joining the two polynucleotides and releasing AMP
(Doherty and Suh, 2000) (Figure 1.4).
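The three chemical steps can be summarized as a small bookkeeping sketch. It is a hypothetical illustration, not a kinetic model. The function names and the string notation (E for ligase, pDNA for the 5'-phosphate end, AppDNA for the DNA-adenylate) are assumptions chosen to mirror Figure 1.2. The only cofactor-dependent difference it encodes is the leaving group released in step 1.

```python
# Hypothetical sketch of the three-step DNA ligase reaction (not a kinetic model).
# Step 1 is the only step whose by-product depends on the cofactor.
LEAVING_GROUP = {
    "ATP": "PPi",   # ATP-dependent ligases release pyrophosphate
    "NAD+": "NMN",  # NAD+-dependent ligases release nicotinamide mononucleotide
}


def ligation_steps(cofactor: str) -> list[str]:
    """Return the three reaction steps in a Figure 1.2-like notation."""
    if cofactor not in LEAVING_GROUP:
        raise ValueError(f"DNA ligases use ATP or NAD+, not {cofactor!r}")
    return [
        # Step 1: the active-site lysine attacks the alpha-phosphorus of the
        # cofactor, giving the covalent ligase-(lysyl-N)-AMP intermediate.
        f"E + {cofactor} -> E-AMP + {LEAVING_GROUP[cofactor]}",
        # Step 2: AMP is transferred to the 5'-phosphate at the nick,
        # forming the DNA-adenylate A(5')pp(5')DNA.
        "E-AMP + pDNA -> E + AppDNA",
        # Step 3: the 3'-OH at the nick attacks the DNA-adenylate, sealing
        # the backbone and releasing AMP.
        "DNA-OH + AppDNA -> DNA-p-DNA + AMP",
    ]


def net_byproducts(cofactor: str) -> tuple[str, str]:
    """By-products of one complete ligation: the step-1 leaving group and AMP."""
    if cofactor not in LEAVING_GROUP:
        raise ValueError(f"DNA ligases use ATP or NAD+, not {cofactor!r}")
    return LEAVING_GROUP[cofactor], "AMP"


for cofactor in ("ATP", "NAD+"):
    print(cofactor)
    for step in ligation_steps(cofactor):
        print("  ", step)
```

Steps 2 and 3 are written identically for both cofactors. This matches the statement above that all DNA ligases follow the same mechanism once the ligase-AMP intermediate has formed.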
[Figure 1.1 panels: (A) RNA ligase; (B) capping enzyme; (C) NAD+ ligase; (D) ATP ligase]
Fig 1.1: Structures of covalent nucleotidyltransferase family members.
(A) T4 RNA ligase 2 (Rnl2) bound to AMP, PDB: 1S68. (B) C. albicans mRNA guanylyltransferase-GMP intermediate (capping enzyme), PDB: 1P16. (C) E. faecalis NAD+-dependent DNA ligase (EfaLigA) bound to NAD+, PDB: 1TAE. (D) Chlorella virus ATP-dependent DNA ligase-AMP intermediate, PDB: 1FVI.
The structures were aligned with respect to their nucleotidyltransferase domains, whose secondary structure elements are depicted with helices colored red and β strands colored yellow. Intervening loops and the respective flanking domains, when present, are shown in blue worm representation. For clarity, segments of the OB-fold domain and an extension of the C-terminal helix of the C. albicans capping enzyme were truncated. All structural representations were created with the program PyMOL (DeLano, 2002; http://www.pymol.org).
(A) DNA ligase
(1) E + pppA → EpA + PPi
(2) EpA + pDNA → AppDNA + E
(3) DNA-OH + AppDNA → DNApDNA + Ap
(B) RNA ligase
(1) E + pppA → EpA + PPi
(2) EpA + pRNA → AppRNA + E
(3) RNA-OH + AppRNA → RNApRNA + Ap
(C) RNA capping enzyme
(1) E + pppG → EpG + PPi
(2) EpG + ppRNA → GpppRNA + E
Phosphoramidate intermediate: Lys-NH-P(=O)(O⁻)-O-(nucleoside)
Fig 1.2: Reaction mechanisms
Conserved pathway of nucleotidyl transfer to polynucleotide ends catalyzed by (A) DNA ligase, (B) RNA ligase and (C) RNA capping enzyme. The first step in each pathway entails attack of the enzyme (E) on the α-phosphorus (p) of the nucleotide substrate to form a covalent enzyme-NMP intermediate (EpN). The chemical structure of the phosphoramidate intermediate is depicted at the bottom. Figure adapted from Shuman, S., and Lima, C. D. (2004) Curr. Opin. Struct. Biol. 14, 757-64.
[Figure 1.3 alignment: motifs I, III, IIIa, IV and V of EfaLigA, TfiLigA, T4 Rnl2, T7 DNA ligase, ChV DNA ligase, ChV capping enzyme and Cal capping enzyme; arrowheads mark contacts with the α-phosphate, ribose, purine, phosphate and divalent metal]
Fig 1.3: Structural motifs of the covalent nucleotidyltransferase family
The amino acid sequences of motifs I, III, IIIa, IV and V are aligned for the NAD+-dependent DNA ligases of E. faecalis (Efa) and Thermus filiformis (Tfi), the ATP-dependent T4 RNA ligase 2 (Rnl2), the ATP-dependent DNA ligases of bacteriophage T7 and Chlorella virus (ChV), and the GTP-dependent capping enzymes of Chlorella virus and C. albicans (Cal). Conserved active-site constituents are highlighted in red. Specific contacts between conserved amino acid side chains and the nucleotide substrate or divalent metal seen in exemplary crystal structures are indicated by arrowheads. Figure adapted from Shuman, S., and Lima, C. D. (2004) Curr. Opin. Struct. Biol. 14, 757-64.
All bacteria (eubacteria) (Wilkinson et al., 2001) and entomopoxviruses (Sriskanda
et al., 2001) contain NAD+-dependent DNA ligases. In addition to their NAD+-dependent
enzymes, some bacteria contain genes for putative ATP-dependent DNA ligases. DNA ligases
from each domain of life, and those encoded by the genomes of various viruses, have been
studied (Wilkinson et al., 2001). Eukaryotic cells contain a number of
different ligases that use ATP, each enzyme being used in specific end-joining
reactions. ATP-dependent ligases range in size from 30 to >100 kDa, whereas NAD+ ligases
are highly homologous monomeric proteins of 70-80 kDa (Doherty and Suh,
2000). As discussed above, both types of ligases share a catalytic core consisting of
the conserved motifs of the nucleotidyltransferase superfamily. These motifs are essential
for binding the adenylation cofactor, coordinating the metal ion and carrying out the
ligation chemistry, as validated by X-ray crystal structure determination and mutational
analyses (Cao, 2002). Why bacteria require these different isozymes is unknown, but it
may be related to their use in different aspects of DNA metabolism. The putative ATP-dependent
DNA ligases found in bacteria are most closely related to proteins from archaea and
viruses. Phylogenetic analysis suggests that all NAD+-dependent DNA ligases are closely
related, whereas the ATP-dependent enzymes were acquired by the genomes of eukarya,
bacteria and viruses on a number of separate occasions (Wilkinson et al., 2001).
1.2.1 ATP-dependent DNA ligases
As discussed above, ATP-dependent ligases have been found in bacteriophages
(Doherty and Wigley, 1999), eubacteria (Cheng and Shuman, 1997), archaea (Sriskanda et
al., 2000; Lai et al., 2001), viruses (Ho et al., 1997; Sekiguchi and Shuman, 1997) and
eukarya (Tomkinson and Levin, 1997; Timson et al., 2000; West et al., 2000). The domain
organization of ATP ligases from different sources is shown in Figure 1.5. Each group is
briefly discussed below.
Viral ATP ligases are generally smaller than ligases from more complex organisms
such as yeast, plants and humans (Figure 1.5). The 298-amino-acid ATP ligase of
Paramecium bursaria Chlorella virus-1 (PBCV-1) (Ho et al., 1997; Odell et al., 2000), which
consists of only the ligase core domain (Figure 1.5), has been used as a biochemical
model to study the catalytic mechanism of ATP ligases.
Step 1: Enzyme adenylation
E + ATP (or NAD+) → E-AMP + PPi (or NMN)
Step 2: Substrate adenylation
E-AMP + nicked DNA → E·AMP-nicked DNA (AppDNA)
Step 3: Nick closure
E·AMP-nicked DNA → E + sealed DNA + AMP
Fig 1.4: Detailed reaction mechanism of DNA ligases
Three steps of a ligation reaction. In the enzyme adenylation step, an AMP group is transferred from the cofactor NAD+ or ATP to a lysine residue in the adenylation motif I, KXDG, through a phosphoramidate linkage. In the substrate adenylation step, this AMP group is transferred to the 5' phosphate at the nick through a pyrophosphate linkage to form a DNA-adenylate intermediate (AppDNA). In the nick-closure step, a phosphodiester bond is formed to seal the nick and release AMP.
T4 ATP ligase is a 487-amino-acid protein capable of both nick sealing and blunt-end
ligation (Rossi et al., 1997). The 359-amino-acid T7 ligase is more than
100 amino acids shorter than T4 DNA ligase, and its crystal structure (PDB: 1A0I) provided
the first atomic view of a DNA ligase (Subramanya et al., 1996) (Figure 1.6). T7 DNA
ligase is organized into a larger N-terminal domain encompassing motifs I, III, IIIa, IV and
a portion of V, and a smaller C-terminal oligonucleotide/oligosaccharide-binding (OB) fold
domain, which encompasses motifs V and VI (Figures 1.5 and 1.7b). Sequencing of the
vaccinia virus genome revealed an ATP-dependent ligase. Vaccinia ligase (Vac) strictly
requires ATP for adenylation; no other nucleotide can substitute. The ligation reaction
catalyzed by vaccinia ligase requires Mg2+ or Mn2+.
Archaea
The information-processing machinery of archaea generally resembles that of eukaryotes
more closely than that of bacteria. This is reflected in the presence of ATP-dependent ligases in
archaeal genomes (Kletzin, 1992). The 561-amino-acid DNA ligase from the thermophilic archaeon
Methanobacterium thermoautotrophicum has been characterized (Sriskanda et al., 2000). This
ligase uses ATP as the adenylation cofactor and, to a lesser extent, dATP. Another characterized
archaeal ATP ligase, from Thermococcus kodakaraensis KOD1, is unusual in its adenylation
cofactor selectivity: besides ATP, it can also use NAD+ as a cofactor to a limited extent
(Nakatani et al., 2000).
Bacteria
The outpouring of bacterial genome data led to the identification and characterization
of several ATP ligases from different bacterial species (Table 1.1) (Wilkinson et al., 2001;
Weller and Doherty, 2001). The open reading frames (ORFs) encoding putative bacterial
ATP ligases often contain primase and nuclease domains as well (Figure 1.5) (Weller and
Doherty, 2001). Furthermore, the ligase genes are organized in an operon with a gene
encoding a putative prokaryotic Ku end-binding protein (Doherty et al., 2001). This raises
interesting questions about the roles of these ligases and their accessory domains in DNA
repair. The bacterial ATP ligases are encoded in addition to the NAD+-dependent ligase of the
same bacterial genomes (Wilkinson et al., 2001). A 268-amino-acid ORF containing the conserved
nucleotidyltransferase signature from the first sequenced bacterium, Haemophilus
influenzae, has been confirmed as an ATP ligase (Cheng and Shuman, 1997). Disruption of
this ATP ligase in Haemophilus influenzae results in loss of viability (Preston et al., 1996).
The Bacillus subtilis genome encodes two putative ATP ligases, YkoU and YoqV, besides one
NAD+ ligase, YerG (Petit and Ehrlich, 2000). The Aquifex aeolicus genome encodes an ATP
ligase with limited nick-sealing activity (Tong et al., 2000). Clearly, the physiological
role of bacterial ATP ligases has yet to be defined. Mycobacterium tuberculosis is unusual
in that it contains three different ATP-dependent ligases, LigB, LigC and LigD (Gong
et al., 2004). It has recently been reported (Gong et al., 2005) that the polyfunctional LigD of
Mycobacterium tuberculosis is involved in non-homologous end joining (NHEJ) together
with Ku (Della et al., 2004), while LigC provides a backup mechanism for
LigD-independent, error-prone repair of blunt-end double-strand breaks (DSBs).
Yeast
Two ATP-dependent ligases have been discovered in the budding yeast
Saccharomyces cerevisiae (Sce) (Figure 1.5). The 755-amino-acid Cdc9 protein (Barker et
al., 1985; Kulotti et al., 1971) is responsible for joining Okazaki fragments during lagging-strand
synthesis and is essential for base excision repair and nucleotide excision repair
(Johnston, 1983; Wu et al., 1999). Both the nuclear and mitochondrial forms of DNA ligase are
encoded by cdc9 (Willer et al., 1999; Donahue et al., 2001). The mitochondrial version
contains a 23-amino-acid signal sequence that targets the ligase to mitochondria. The
complete budding yeast genome, however, revealed a second ATP-dependent ligase, ligase
IV (Ramos et al., 1997; Teo and Jackson, 1997; Wilson et al., 1997). This 944-amino-acid
protein is highly homologous to human DNA ligase IV throughout its coding region
but shows weak ligation activity. Genetic analysis demonstrates that yeast ligase IV is not
required for DNA replication, homologous recombination or repair of UV-induced
damage; however, it is an essential component of the DSB repair pathway called non-homologous
end joining (NHEJ) (Ramos et al., 1997; Schar et al., 1997; Teo and Jackson, 1997). Yeast
ligase IV contains two BRCT (BRCA1 C-terminus) domains in its C-terminal region.
Through its BRCT region, ligase IV interacts with another protein, Lif1 (a homolog of
human X-ray repair cross-complementing protein 4, XRCC4), which enhances its ligation
activity by stimulating enzyme adenylation (step 1) (Teo and Jackson, 2000).
Mammalian
Mammals have at least three ligases: ligases I, III and IV (Tomkinson and Levin,
1997; Tomkinson and Mackay, 1998; Lasko et al., 1990; Lindahl and Barnes, 1992;
Timson et al., 2000). Ligase II, whose biochemical properties are distinct from those of
ligase I, is now considered a proteolytic fragment of ligase III. Ligase III has two
isoforms generated by alternative splicing (Chen et al., 1995; Wei et al., 1995). The
922-amino-acid ligase IIIα contains a BRCT domain at the C-terminus, which is largely deleted
in the 862-amino-acid ligase IIIβ (Figure 1.5). The BRCT domain mediates interactions
with several partner proteins to form a complex involved in base
excision repair (Rice, 1999). The ligase III gene also encodes the mammalian mitochondrial
ligase (Lakshmipathy and Campbell, 1999; Pinz and Bogenhagen, 1998). A 44 kDa
protein has been identified in human cells as ligase V (Johnson and Fairman, 1997).
This ligase has double-strand joining activity similar to that of ligase I but is weak in nick
sealing. The domain architectures of human ATP ligases are sketched in Figure 1.5.
The 919-amino-acid human ligase I (Hu I) contains a catalytic core and an N-terminal
domain that is required for cellular localization and protein-protein interactions
(Tomkinson and Levin, 1997; Lindahl and Barnes, 1992) (Figure 1.5). Recently, the crystal
structure of Hu I (residues 233 to 919) in complex with a nicked, 5'-adenylated DNA
intermediate has been reported (Pascal et al., 2004). The structure reveals a unique feature
of mammalian ligases: a DNA-binding domain that allows ligase I to encircle its DNA
substrate, stabilizes the DNA in a distorted structure and positions the catalytic core on
the nick. Ligase I efficiently joins nicks in double-stranded DNA, is involved in the maturation
of Okazaki fragments during replication and in base excision repair, and joins blunt-ended
DNA optimally in the presence of 17.5% PEG 6000 (Arrand et al., 1986). In addition to nick
sealing, ligase III can also join oligo(dT)/poly(rA) and oligo(rA)/poly(dT)
(Robins and Lindahl, 1996; Arrand et al., 1986; Yang and Chan, 1992; Tomkinson et al.,
1991). Human ligase IV (Hu IV), in addition to efficiently joining nicks on a DNA
template, also joins nicks on an RNA template (Robins and Lindahl, 1996); these
enzymatic properties are similar to those of human ligase III. The 844-amino-acid ligase IV
contains two BRCT domains at the C-terminus (Figure 1.5). The linker region between
the two BRCT domains is essential for binding of the partner protein XRCC4
(Grawunder et al., 1998). Both XRCC4 and Ku stimulate the ligation activity of
ligase IV (Grawunder et al., 1997; Ramsden and Gellert, 1998), while Ku also stimulates the
end-joining activities of Hu I and Hu III (Ramsden and Gellert, 1998).
1.2.2 NAD+-dependent DNA ligases
Among the first DNA ligases to be purified and analyzed biochemically was the
Escherichia coli (Eco) NAD+-dependent ligase (EcoLigA) (Olivera and Lehman,
1967a; Olivera and Lehman, 1967b), which has served as a paradigm for studies of
NAD+-dependent DNA ligases (LigA) (Lehman, 1974). EcoLigA is an essential enzyme that
consists of 671 amino acids (molecular weight 74 kDa) and is encoded by ligA.
NAD+-dependent ligases have been found in bacteria and, more recently, in the entomopoxviruses
Amsacta moorei entomopoxvirus (AmEPV) (Sriskanda et al., 2001) and Melanoplus
sanguinipes entomopoxvirus (MsEPV) (Lu et al., 2004). They have not been detected in
humans or in any other eukaryote. The typical domain
organization of NAD+-dependent ligases from eubacteria and entomopoxviruses is
shown in Figure 1.5. NAD+-dependent DNA ligases have been identified in every
bacterial species sequenced so far. Many of these enzymes have been cloned,
sequenced and biochemically characterized, including those from several thermophilic
species and from cold-adapted seawater bacteria such as Pseudoalteromonas haloplanktis
(Tables 1.1 and 1.2) (Ishino et al., 1986; Barany and Gelfand, 1991; Lauer et al., 1991; Shark and
Conway, 1992; Jonsson et al., 1994; Thorbjarnardottir et al., 1995; Brannigan et al.,
1999). These enzymes are of fairly uniform size and show a considerable degree of amino
acid sequence homology. For the LigA enzymes listed in Table 1.2, a basic BLAST alignment
with EcoLigA typically detects 35-50% amino acid sequence identity (average 42%).
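Percent identity figures such as these are ratios of identical aligned positions to alignment length. The sketch below computes that ratio for two sequences that have already been aligned. It assumes gaps are written as '-' and counts gap columns in the denominator. That is one common convention: BLAST's "Identities" line reports identical positions over the aligned length. The sketch does not perform the alignment itself, and the short peptides in the example are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identical residues in the same column count as matches; columns with a
    gap in either sequence count towards the alignment length but never as
    matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned fragments (not real LigA sequences): K, D, G and R match.
print(f"{percent_identity('KYDG-VR', 'KTDGLLR'):.1f}%")
```

Whether gap columns enter the denominator changes the result noticeably for gapped alignments. Identity values from different tools are comparable only when they use the same convention.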
Studies with temperature-sensitive or deletion mutants have shown that LigA is essential
for survival in several bacterial species: LigA in Escherichia coli (Dermody
et al., 1979) and Salmonella typhimurium (Park et al., 1989), YerG in Bacillus subtilis
(Petit and Ehrlich, 2000), Lig in Staphylococcus aureus (Kaczmarek et al., 2001) and
LigA in Mycobacterium tuberculosis (Gong et al., 2004; Sassetti et al., 2003). Because
LigA is essential for replication, its inactivation is lethal to most bacteria. LigA
therefore presents an attractive target for broad-spectrum antimicrobial therapy
based on blocking its reaction with NAD+ (Brotz-Oesterhelt et al., 2003; Gong et al., 2004;
Georlette et al., 2003; Kaczmarek et al., 2001).
Limited proteolysis divides the enzyme into two functional domains: an N-terminal
domain responsible for NAD+ binding and the self-adenylation reaction,
and a C-terminal DNA-binding domain (Timson and Wigley, 1999).
The high thermostability of the enzymes from thermophilic bacteria aids
structural and mechanistic studies of DNA ligation and has led to the use of DNA
ligases from such organisms as model systems for studying these enzymes.
Thermostable bacterial DNA ligases have also proved to be valuable tools for detecting
mutations that lead to various cancers (Khanna et al., 1999).
[Figure 1.5 schematic. (A) ATP ligases ChV, T7, Vac, ykoU, Sce, Sce IV, Hu I, Hu IIIα, Hu IIIβ and Hu IV, each drawn as a ligase core with, where present, primase, zinc finger (Zn) and BRCT domains, and the interacting partners PCNA or Pol β, XRCC1 and XRCC4. (B) Eubacterial NAD+ ligase (Ia, adenylation, OB, Zn, HhH and BRCT) and MsEPV ligase (Ia, adenylation, OB, Zn' and HhH).]
Fig 1.5: Domain architecture of DNA ligases
(A) Domain structure of ATP ligases. Ligase core: the ligase core domain containing the conserved nucleotidyltransferase motifs shown in Figure 1.3. BRCT: BRCA1 C-terminus. Zn: zinc finger motif. Hatched box in human ligase I: PCNA or DNA polymerase β (pol β) interaction domain.
(B) Domain structure of NAD+ ligases. Adenylation: the adenylation domain containing motifs I, III, IIIa, IV and V of the nucleotidyltransferase core. OB: OB fold. Zn: zinc finger motif. Zn': a region in MsEPV ligase that is homologous to the zinc finger motif of eubacterial NAD+ ligases but lacks the four conserved Cys residues for Zn coordination. HhH: the four helix-hairpin-helix motifs. BRCT: BRCA1 C-terminus. Figure adapted from Cao, W. (2002) Curr. Org. Chem. 6, 1-13.
Table 1.1: Eubacterial genomes containing both NAD+- and ATP-dependent ligases

Species of eubacteria | Gene | Cofactor | Amino acids, M. Wt.^c
Aquifex aeolicus VF5 | ligA | NAD+ | 720 aa, 82.3 kDa
 | Lig | ATP | 585 aa, 67.1 kDa
Bacillus subtilis | yerG^a | NAD+ | 668 aa, 74.9 kDa
 | yoqV^a | ATP | 270 aa, 31.0 kDa
 | ykoU | ATP | 611 aa, 70.2 kDa
Campylobacter jejuni | ligA | NAD+ | 647 aa, 73.9 kDa
 | CJJ669c | ATP | 282 aa, 32.5 kDa
Haemophilus influenzae Rd | HI1100 (lig) | NAD+ | 679 aa, 75.2 kDa
 | URF1^b | ATP | 268 aa, 30.9 kDa
Mycobacterium tuberculosis | ligA^d (Rv3014c) | NAD+ | 691 aa, 75.3 kDa
 | ligB (Rv3062) | ATP | 507 aa, 53.7 kDa
 | ligC (Rv3731) | ATP | 358 aa, 40.2 kDa
 | ligD (Rv0938) | ATP | 759 aa, 83.6 kDa
Neisseria meningitidis | ligA | NAD+ | 841 aa, 92.4 kDa
 | NMA0388 | ATP | 274 aa, 30.7 kDa
Neisseria meningitidis | NMB0666 | NAD+ | 841 aa, 92.4 kDa
 | NMB2048 | ATP | 274 aa, 30.7 kDa
Pseudomonas aeruginosa PAO1 | lig | NAD+ | 794 aa, 86.8 kDa
 | PA2138 | ATP | 840 aa, 94.0 kDa
Vibrio cholerae El Tor N16961 | VC0971 | NAD+ | 669 aa, 73.3 kDa
 | VC1542 | ATP | 282 aa, 31.8 kDa

a - yerG is an essential gene, but yoqV is not (Petit and Ehrlich, 2000).
b - Confirmed as an ATP-dependent DNA ligase by Cheng and Shuman (1997).
c - Molecular weight in kilodaltons (kDa).
d - ligA is an essential gene, but ligB, ligC and ligD are dispensable (Gong et al., 2004).
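The lengths and molecular weights in Table 1.1 can be cross-checked with a common rule of thumb: an average amino-acid residue contributes roughly 110 Da. The sketch below applies that approximation to the M. tuberculosis rows of the table. The 110 Da constant and the helper name `estimated_kda` are assumptions of this sketch. Tabulated values computed from the actual amino acid composition will differ by a few percent.

```python
# Rule-of-thumb average mass of one amino-acid residue, in daltons (assumption).
AVG_RESIDUE_DA = 110.0


def estimated_kda(n_residues: int) -> float:
    """Approximate molecular weight (kDa) from chain length alone."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


# (gene, length in aa, tabulated M. Wt. in kDa) for M. tuberculosis, Table 1.1.
MTB_LIGASES = [
    ("ligA", 691, 75.3),
    ("ligB", 507, 53.7),
    ("ligC", 358, 40.2),
    ("ligD", 759, 83.6),
]

for gene, length, tabulated in MTB_LIGASES:
    est = estimated_kda(length)
    deviation = 100.0 * (est - tabulated) / tabulated
    print(f"{gene}: {length} aa -> ~{est:.1f} kDa "
          f"(table: {tabulated} kDa, {deviation:+.1f}%)")
```

For these four enzymes the estimate falls within about 4% of the tabulated values. The approximation is useful for sanity-checking sizes but does not replace a composition-based calculation.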
The crystal structure of the N-terminal fragment of BstLigA
(Singleton et al., 1999) (PDB: 1B04) (Figure 1.6) first identified the
nucleotidyltransferase, or adenylation, domain of NAD+ ligases, which is composed of two
subdomains, Ia and Ib. Whereas Ib is structurally homologous to the N-terminal domains of
other members of the nucleotidyltransferase family, Ia is unique to this class of
enzymes. Later, the crystal structure of intact TfiLigA (PDB: 1V9P) (Lee et al.,
2000) further revealed that NAD+ ligases are structurally organized into a catalytic
core at the N-terminus (comprising the adenylation domain and the OB
fold) and a Zn finger, four helix-hairpin-helix (HhH) motifs and a BRCT domain at the
C-terminus (Figure 1.6). These structures also confirmed that the conserved
nucleotidyltransferase motifs (Figure 1.3) found in ATP ligases and RNA
capping enzymes, and predicted in NAD+ ligases (Sriskanda et al., 1999; Doherty
and Suh, 2000; Arrand et al., 1986), are also present in the catalytic core of NAD+
ligases. Figure 1.7a shows all of these motifs in TfiLigA; their sequences
are given in Figure 1.3. Although these ligases use NAD+ as the cofactor in the
ligation reaction, none of these crystal structures showed NAD+ in the active
site; only the TfiLigA structure showed non-covalently bound AMP in the active
site. Nevertheless, the crystal structures of the BstLigA apoenzyme (PDB: 1B04) (Singleton et
al., 1999) and the TfiLigA covalent ligase-adenylate intermediate (PDB: 1V9P) (Lee et
al., 2000) confirmed that the AMP-binding pocket lies within a
nucleotidyltransferase domain shared with ATP-dependent ligases. The only difference is
the presence of subdomain Ia in the N-terminal region of NAD+ ligases, which is absent
from ATP ligases. Subsequent biochemical and mutational analysis of EcoLigA (Sriskanda and
Shuman, 2002) showed that several highly conserved residues in subdomain Ia are
important for step 1 of the ligation reaction (Figure 1.4), suggesting that subdomain Ia
takes part in the reaction of the enzyme with NAD+. This was subsequently confirmed by
crystallographic snapshots of the adenylation domain of the NAD+-dependent ligase of
Enterococcus faecalis (EfaLigA) in complex with NMN and NAD+ (Gajiwala and Pinko, 2004).
These findings and the three-dimensional structure of TfiLigA, which shows the arrangement
of the different domains in this class of ligases, give a clear picture of the mode of NAD+
binding in domain I and of the spatial arrangement of the different domains of eubacterial
NAD+-dependent ligases.
Table 1.2: Experimentally studied NAD+ ligases from eubacteria

Species of eubacteria | Comment | Reference
Aquifex aeolicus | VF5 ligA, 720 aa, 82 kDa, 39% identity to E. coli LigA^a | Tong et al. (2000)
Bacillus subtilis | YerG, 668 aa, 75 kDa, essential enzyme, 48% identity to E. coli LigA^a | Petit and Ehrlich (2000)
Bacillus stearothermophilus | Dnlj, 670 aa, 74 kDa, 47% identity to E. coli LigA^a; crystal structure of N-terminal domain obtained | Singleton et al. (1999); Timson and Wigley (1999)
Escherichia coli K-12 | LigA, 671 aa, 74 kDa, essential enzyme | Lehman (1974)
Pseudoalteromonas haloplanktis | 672 aa, 74 kDa, 59% identity to E. coli LigA^a | Georlette et al. (2000)
Rhodothermus marinus | 712 aa, 80 kDa, 44% identity to E. coli LigA^a | Thorbjarnardottir et al. (1995)
Thermus filiformis | Lig, 667 aa, 76 kDa, 45% identity to E. coli LigA^a; crystal structure of full-length enzyme obtained | Lee et al. (2000)
Thermus scotoductus (Ts) | 674 aa, 77 kDa, 45% identity to E. coli LigA^a | Thorbjarnardottir et al. (1995)
Thermus species AK16D | 674 aa, 77 kDa, 44% identity to E. coli LigA^a | Tong et al. (1999)
Thermus thermophilus (Tth) | 676 aa, 77 kDa, 45% identity to E. coli LigA^a | Barany and Gelfand (1991)
Thermus thermophilus HB8 | 676 aa, 77 kDa, 45% identity to E. coli LigA^a | Takahashi et al. (1984)
Zymomonas mobilis | 732 aa, 82 kDa, 45% identity to E. coli LigA^a | Shark and Conway (1992)

a - Homology detected using basic BLAST (Altschul et al., 1990) alignment.
[Figure 1.6 panels: TfiLigA (PDB: 1V9P); BstLigA (PDB: 1B04); T7 ATP ligase (PDB: 1A0I); ChV ATP ligase (PDB: 1FVI)]
Fig 1.6: Motifs and domains of NAD+ and ATP ligases
A ribbon representation of the structures of the NAD+-dependent ligases encoded by T. filiformis (TfiLigA) and B. stearothermophilus (BstLigA) (N-terminal domain only) and the ATP-dependent DNA ligases of bacteriophage T7 and Chlorella virus PBCV-1 (ChV ATP ligase). The domains are colour coded: domain 1, subdomain Ia, blue; subdomain Ib, cyan; domain 2, oligonucleotide/oligosaccharide-binding (OB) fold domain, green; domain 3, subdomain 3a (zinc finger), yellow; subdomain 3b (helix-hairpin-helix), red. In all structures except BstLigA, an AMP molecule is shown bound to the adenylation domain. All structural representations were created with the program PyMOL (DeLano, 2002; http://www.pymol.org).
1.3 Modular structure of DNA ligases
As discussed above, the crystal structures of the ATP-dependent
DNA ligases from bacteriophage T7 and ChV and of Hu I (Subramanya et al., 1996;
Odell et al., 2003; Pascal et al., 2004), and of the N-terminal fragments of BstLigA and
EfaLigA and intact TfiLigA (Singleton et al., 1999; Gajiwala and Pinko, 2004; Lee et al., 2000),
all NAD+-dependent ligases, have revealed that DNA ligases have a highly
modular architecture consisting of a unique arrangement of two or more discrete domains
(Figure 1.6). Both types of ligases are basically composed of two domains: an N-terminal
domain containing the active-site lysine where adenylation occurs and a C-terminal
domain containing the DNA-binding site.
As described above, five classes of modules (nucleotide-binding domain,
OB-fold domain, zinc finger, HhH motif and BRCT domain) have currently been detected at both
the sequence and the structural level (Figure 1.6). The structure and role of each of these
modules in the ligase mechanism are discussed below.
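The modular organization can be written as ordered lists of modules, which makes the shared core and the extra modules of NAD+ ligases explicit. The domain lists below are a simplified reading of Figures 1.5 and 1.6. The four HhH motifs are collapsed into one "HhH" entry, and "adenylation" stands for the nucleotidyltransferase domain (subdomain Ib in NAD+ ligases). The dictionary keys and helper functions are hypothetical names used only for this sketch.

```python
# Simplified module order (N- to C-terminus), read from Figures 1.5 and 1.6.
# "adenylation" = nucleotidyltransferase domain (subdomain Ib in NAD+ ligases);
# the four HhH motifs are collapsed into a single "HhH" entry.
ARCHITECTURES: dict[str, tuple[str, ...]] = {
    "eubacterial NAD+ ligase": ("Ia", "adenylation", "OB", "Zn", "HhH", "BRCT"),
    "T7 ligase (ATP)": ("adenylation", "OB"),
    "ChV ligase (ATP)": ("adenylation", "OB"),
}


def shared_modules(architectures: dict[str, tuple[str, ...]]) -> set[str]:
    """Modules present in every ligase listed."""
    module_sets = [set(mods) for mods in architectures.values()]
    return set.intersection(*module_sets)


def extra_modules(name: str, reference: str) -> list[str]:
    """Modules of `name`, in N- to C-terminal order, absent from `reference`."""
    ref = set(ARCHITECTURES[reference])
    return [m for m in ARCHITECTURES[name] if m not in ref]


print(sorted(shared_modules(ARCHITECTURES)))
print(extra_modules("eubacterial NAD+ ligase", "T7 ligase (ATP)"))
```

The shared set corresponds to the two-domain catalytic core described above. The extra modules list what NAD+ ligases add: subdomain Ia at the N-terminus and the Zn, HhH and BRCT modules at the C-terminus.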
1.3.1 Adenylation domain
In case of A TP ligase, crystal structures of bacteriophage T7 and chlorella virus
ligases (Subramanya et al., 1996; Odell et al., 2000; Odell et al., 2003) revealed that
although sequentially less similar these classes of enzymes show considerable structural
homology, consisting of two domains (Figure 1.7b), a larger N-terminal domain called
domain 1 and a small C-terminal domain (wheat), domain 2. In eukaryotic ATP dependent
enzymes these two domains are flanked by other sequences that are either at N-terminal or
C-terminal to these domains as shown in figure 1.5 (Tomkinson et al., 1991 ; Tomkinson
and Levin, 1997). Indeed, it seems likely that these "additional" sequences determine the
different functions of various ligases present in eukaryotes. Domain 1 contains A TP
binding site. Doherty and Wigley (1999) have demonstrated that this domain itself has
intrinsic adenylation activity and therefore has been named 'adenylation domain' . Many of
the residues that line the ATP-binding pocket of the adenylation domain of T7, BstLigA
and TfiLigA belong to five (motifs I-V) of the six sequence elements (Figure 1.3)
conserved among covalent nucleotidyltransferase. Examinations of the positions of these
motifs within the DNA ligases indicates that they cluster around the A TP-binding site and
they also form the sides of the groove between domains 1 and 2 (Figure 1.7). In NAD+
ligases this doma- in is composed of two subdomains viz. Ia and lb. Before the crystal
17
Cha l er I
Fig 1.7a: Conserved nucleotidyltransferase motifs in NAD+ ligases
Conserved motifs I to V, characteristic of the nucleotidyltransferase family of proteins (Figure 1.3), lie in the adenylation domain (subdomain Ib) of NAD+ ligases; here they are shown in subdomain Ib of the Thermus filiformis NAD+ ligase, TfiLigA (PDB: 1V9P). Bound AMP is shown in blue. The structurally conserved motifs are colour coded: motif I, black; motif III, blue; motif IIIa, green; motif IV, orange; motif V, red. Figure was made with the program PyMOL.
Fig 1.7b: Conserved nucleotidyltransferase motifs in ATP ligases
Conserved motifs I to V, characteristic of the nucleotidyltransferase family of proteins (Figure 1.3), are depicted in the adenylation domain (domain 1) of bacteriophage T7 ATP ligase. Motif V (red) spans domains 1 and 2. Bound AMP is shown in blue. The structurally conserved motifs are colour coded: domain 1, cyan; domain 2, wheat; motif I, black; motif III, blue; motif IIIa, green; motif IV, orange; motif V, red. Figure was made with the program PyMOL.
structure of the EfaLigA adenylation domain was determined by Gajiwala and Pinko (2004), the role of subdomain Ia was not clear, although biochemical and mutational analysis of this subdomain in EcoLigA (Sriskanda and Shuman, 2002) had indicated a role for five highly conserved residues in the reaction with NAD+. Crystal structures of the EfaLigA adenylation domain in complex with NMN and NAD+ (PDB: 1TA8 and 1TAE) revealed the flexibility of subdomain Ia, which moves over subdomain Ib to generate the complete active site for NAD+ (Figure 1.8). NMN is bound at the interface thus created between Ia and Ib, where the nicotinamide base is sandwiched in a π stack between two highly conserved aromatic residues (Y29 and Y42 in E. faecalis) (Figure 1.9). In E. faecalis, an invariant aspartate (D39) makes a hydrogen bond to the amide nitrogen of nicotinamide, another invariant aspartate (D43) coordinates the ribose 2'-O, and a tyrosine (Y30) donates a hydrogen bond to the NMN phosphate (Figure 1.9). This highly conserved spatial arrangement of the five residues exposes the α-phosphate of the AMP moiety to the conserved lysine in subdomain Ib (Figure 1.9).
Subdomain Ib is essentially composed of a cage of β strands (Figure 1.6) and interstrand loops that include the five defining motifs (Figure 1.7a) of the enzyme superfamily. The motif I (KXDG) lysine nucleophile is located in a loop between the two antiparallel β sheets that form the active site and nucleotide-binding pocket (Figure 1.7a). Whereas the α-phosphate of the nucleotide is exposed on the enzyme surface, the purine base is buried in a hydrophobic sandwich between the conserved aromatic residue of motif IIIa (Y227 in E. faecalis and Y226 in T. filiformis) and a conserved residue in motif IV (V289 in E. faecalis and T. filiformis) (Figure 1.9). The exocyclic 6-amino group of adenine is coordinated by the side chain of a conserved glutamate (E175 in E. faecalis and E174 in T. filiformis) in the bacterial DNA ligase-AMP structures (Figure 1.9) (Lee et al., 2000). Mutations of these subdomain Ib residues in E. coli LigA (Luo and Barany, 1996; Sriskanda et al., 1999; Zhu and Shuman, 2005) have been shown to be detrimental to its activity, confirming their importance in the interaction with NAD+ bound within the adenylation domain. Along with subdomain Ia, residues in the motifs of subdomain Ib, namely the glutamate in motif III, the tyrosine in motif IIIa (a histidine in E. coli and M. tuberculosis), the aspartate in motif IV and the lysine in motif V, are essential for interaction with AMP and for subsequent catalytic efficiency (Zhu and Shuman, 2005). Interpreting the E. coli mutational data with reference to the EfaLigA structure allowed three classes of essential or important side chains to be distinguished (Zhu and Shuman, 2005): those that (i) contact NAD+ directly
[Figure panels: EfaLigA (PDB: 1TA8), "open conformation"; EfaLigA (PDB: 1TAE), "closed conformation"; subdomains Ia and Ib labelled]
Fig 1.8: Open and closed conformations of subdomain Ia of NAD+ ligases
Open and closed conformations of subdomain Ia (blue) in the NMN-bound (NMN in green; PDB: 1TA8) and NAD+-bound (NAD+ in red; PDB: 1TAE) N-terminal domain of the E. faecalis NAD+-dependent DNA ligase. Subdomain Ib is shown in cyan. Both molecules are crystallographic snapshots of the adenylation domain (domain 1) of EfaLigA, in which subdomain Ia moves over Ib (shown by arrow) to generate the complete active site for NAD+ (Gajiwala and Pinko, 2004).
Fig 1.9: NAD+ recognition in the Enterococcus faecalis NAD+ ligase adenylation domain
Interactions of bound NAD+ with conserved residues of subdomains Ia and Ib in the closed conformation, from the crystal structure of the adenylation domain of Enterococcus faecalis NAD+ ligase, EfaLigA (PDB: 1TAE). Figure adapted from Gajiwala, K., and Pinko, C. (2004) Structure 12, 1449-1459.
(Lys115, Glu173, Lys290 and Lys314); (ii) form the interface between the NMN-binding domain (subdomain Ia) and the nucleotidyltransferase domain, or form part of a nick-binding site on the surface of the nucleotidyltransferase domain (Arg200 and Arg208); or (iii) stabilize the active-site fold of the nucleotidyltransferase domain (Arg277). Analysis of mutational effects on the isolated ligase-adenylylation and phosphodiester-formation reactions revealed different functions for the essential side chains at different steps of the DNA ligase pathway, consistent with the proposal that the active site is serially remodeled as the reaction proceeds. The crystal structures of TfiLigA and of the EfaLigA adenylation domain suggest that the adenylation domain switches between 'open' and 'closed' conformations of its two subdomains, Ia and Ib (Figure 1.8), between the successive adenylation steps of enzyme and substrate.
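Motif I is recognised by the short sequence pattern KXDG, in which the lysine is the nucleophile that becomes adenylated. As an illustration only, the pattern can be located in a protein sequence with a regular expression. The function name `find_motif_I` and the sequence fragment below are hypothetical (the fragment is not a real ligase sequence), and a pattern match only flags candidates: assigning motif I in a real protein also requires alignment with the other conserved motifs.

```python
import re

# Motif I of covalent nucleotidyltransferases: K-X-D-G, where X is any residue.
# The pattern sits inside a lookahead so that overlapping matches are not missed.
MOTIF_I = re.compile(r"(?=(K.DG))")


def find_motif_I(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the lysine, matched tetrapeptide) pairs.

    Hypothetical helper for illustration; it performs a plain pattern scan
    and does not check the structural context of the match.
    """
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF_I.finditer(seq)]


# Made-up fragment (not a real ligase sequence) containing one KXDG match.
fragment = "MSTVRLKADGAYVE"
print(find_motif_I(fragment))  # prints [(7, 'KADG')]
```

A regular expression is used here because motif I is a fixed-length pattern with a single wildcard position, which maps directly onto the `.` metacharacter.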
1.3.2 Oligonucleotide/oligosaccharide-binding (OB) fold domain
In all DNA ligases, adenylation domain 1 is connected to a conserved domain 2 (Figure 1.6). The T7 ATP ligase and TfiLigA structures (Subramanya et al., 1996; Lee et al., 2000) revealed that this domain has an OB fold, a derivative of the Greek key motif that is also found in many proteins that bind single-stranded and double-stranded (ds) DNA and RNA (Suck, 1997). This fold occurs in a diverse range of protein families, including the bacterial ribosomal proteins S1 (Bycroft et al., 1997) and S17 (Jaishree et al., 1996), the subunits of replication protein A (Bochkarev et al., 1999), the telomere end-binding protein (Horvath et al., 1998), the bacterial cold shock proteins CspA and CspB (Schindler et al., 1998) and translation initiation factor 5A (Peat et al., 1998). A structural comparison is shown in Figure 1.10. A number of co-crystal structures of these domains bound to DNA and RNA have established that the OB fold mediates polynucleotide recognition (Suck, 1997). Biochemical studies have shown that the OB domain of T7 ATP ligase binds dsDNA and also dramatically enhances the adenylation activity of domain 1; a direct physical interaction between the two domains has been demonstrated by gel filtration (Doherty and Wigley, 1999). Compared with TfiLigA, the equivalent domain in T7 DNA ligase is a much shortened version of the OB fold. Its orientation in the non-covalent ATP complex is also rotated, relative to TfiLigA, around the loop just before the first strand of the OB-fold domain, so that its expected DNA-binding surface does not directly face the active site (Subramanya et al., 1996). This is understandable, because the adenylation site should not be blocked by DNA binding
until the conserved active-site lysine has been adenylated. This orientation may change upon self-adenylation so that the putative DNA-binding groove is completed. The recently discovered NAD+ ligases from entomopoxviruses also have a complete OB-fold domain (Sriskanda et al., 2001). Because T7 DNA ligase, with its more compact OB-fold domain, is fully functional, it has been suggested that domain 1 together with the OB-fold domain is the minimal unit for bacterial DNA ligases, and that this minimal ligase should have both nick-sensing and ligation activities. This is further supported by the recent finding that the 298-residue ChV ATP ligase, the smallest eukaryotic DNA ligase known, has intrinsic specificity for binding to nicked duplex DNA (Odell and Shuman, 1999). Site-directed mutagenesis studies on human DNA ligase III have also implicated this motif in the interaction with nicked DNA (Mackey et al., 1999).
1.3.3 Zinc finger motif
Four cysteine residues are conserved in the C-terminal region of NAD+-dependent ligases, and they have been implicated in zinc binding and interaction with DNA. Among ATP ligases, only human DNA ligase III has so far been reported to have a zinc finger motif, located at its N-terminus (Figure 1.5). Atomic emission spectroscopy confirmed that TfiLigA binds zinc ions (Lee et al., 2000). In TfiLigA, a Zn ion is tetrahedrally coordinated by the four conserved cysteine residues (Cys406, Cys409, Cys422 and Cys427). This single Zn finger forms a subdomain (3a) of the larger domain 3 of TfiLigA (Figure 1.6). The overall fold of this zinc finger is similar to that of other Cys4-type zinc fingers in the DNA-binding domains (DBDs) of the steroid/nuclear hormone receptor families, such as the estrogen receptor (Figure 1.10) (Klug and Schwabe, 1995; Mackay and Crossley, 1998). The DBDs of these protein families can bind to cognate or non-cognate DNA targets as monomers (Gewirth and Sigler, 1995). Conceivable roles for the TfiLigA zinc finger motif (subdomain 3a) include direct interaction with the nicked DNA as well as structural support for the helix-hairpin-helix motifs and the BRCT domain, which form subdomain 3b and domain 4, respectively. This suggestion is largely consistent with the results of mutating the zinc-coordinating cysteines, which abolished DNA-binding activity in TfiLigA (Jeon et al., 2004) as well as in Thermus thermophilus DNA ligase (Luo and Barany, 1996). The possibility that the Zn finger in NAD+ ligases recognizes the nick in duplex DNA deserves further study, since the human DNA ligase III Zn finger has been shown to form a specific complex with a nick in duplex DNA (Mackey et al., 1999).
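The four zinc ligands of TfiLigA listed above (Cys406, Cys409, Cys422 and Cys427) fix the spacing of this Cys4 finger, which is conventionally written as a C-x(n)-C pattern. The short sketch below derives that pattern from the residue numbers alone; the helper name `cys_spacing_pattern` is hypothetical, and the only input data are the four residue numbers given in the text.

```python
def cys_spacing_pattern(positions: list[int]) -> str:
    """Build a C-x(n)-C-... pattern from cysteine residue numbers.

    n is the number of residues between consecutive cysteines, i.e. the
    difference in residue number minus one. Hypothetical helper for illustration.
    """
    if len(positions) < 2:
        raise ValueError("need at least two cysteine positions")
    if len(set(positions)) != len(positions):
        raise ValueError("cysteine positions must be unique")

    ordered = sorted(positions)
    parts = ["C"]
    for prev, curr in zip(ordered, ordered[1:]):
        gap = curr - prev - 1  # residues strictly between the two cysteines
        parts.append(f"x({gap})")
        parts.append("C")
    return "-".join(parts)


# Zinc-coordinating cysteines of TfiLigA as given in the text.
print(cys_spacing_pattern([406, 409, 422, 427]))  # prints C-x(2)-C-x(12)-C-x(4)-C
```

The positions are sorted first, so the pattern does not depend on the order in which the ligands are listed.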
[Figure 1.10 panels: (A) OB folds of Tfi ligase, RPA14, aspartyl tRNA synthetase and IF1; (B) zinc fingers of Tfi ligase, ER DBD, GR DBD and GATA-1; (C) HhH helix pairs O-P, R-S, U-V and X-Y of Tfi ligase, with HhH motifs of Pol β, RuvA, endonuclease III and AlkA]
Fig 1.10: Comparisons of different domains present in NAD+ ligases
(A) OB-fold domains of Tfi ligase, human replication protein A subunit (RPA14), yeast aspartyl tRNA synthetase and E. coli translation initiation factor 1 (IF1) are shown in similar orientations.
(B) Cys4-type zinc fingers of Tfi ligase, the human estrogen receptor DNA-binding domain (ER DBD), the rat glucocorticoid receptor DNA-binding domain (GR DBD) and the chicken erythroid transcription factor GATA-1 are shown in similar orientations.
(C) HhH motifs of Tfi ligase (top row) are shown in similar orientations alongside HhH motifs from the other sources indicated in the figure.
1.3.4 Helix-hairpin-helix motif domain
Doherty et al. (1999) predicted the presence of four copies of the conserved helix-hairpin-helix (HhH) motif in the C-terminal region of NAD+ ligases. All eubacterial NAD+ ligases examined so far contain these motifs, but the number of helices varies from species to species. TfiLigA provides a unique example in which the four clustered HhH motifs form a single compact structure (subdomain 3b), made up of helix pairs O-P, R-S, U-V and X-Y with their intervening hairpins (residues 430-460, 474-498, 502-528 and 537-560, respectively) (Figure 1.10). Interestingly, all the hairpins are aligned in a row at the bottom of this subdomain, a surface that is also rich in positively charged residues. Similar HhH motifs are present in a number of DNA repair enzymes (Doherty et al., 1996; Aravind et al., 1999), including E. coli endonuclease III (Thayer et al., 1995), E. coli AlkA (Labahn et al., 1996) and human DNA polymerase β (Pol β) (Mullen and Wilson, 1997). A structural comparison is shown in Figure 1.10. HhH motifs have been implicated in non-sequence-specific DNA binding (Thayer et al., 1995; Doherty et al., 1996). In TfiLigA, this subdomain is suggested to provide one of the two DNA-binding sites.
1.3.5 BRCA1 C-terminal (BRCT) domain
The final structural motif currently found in both ATP- and NAD+-dependent DNA ligases is a member of the BRCA1 C-terminal (BRCT) domain superfamily (Bork et al., 1997; Callebaut and Mornon, 1997). BRCT domains are present in NAD+-dependent ligases and in eukaryotic ligases III and IV. The structure of TfiLigA (Lee et al., 2000) is the first in which a BRCT domain has been seen as part of a DNA ligase; the domain consists of a four-stranded parallel β-sheet flanked by three α-helices. This fold is broadly similar to that of the C-terminal BRCT domain of the human repair protein XRCC1 (Zhang et al., 1998), a multidomain protein involved in the repair of single-strand breaks in DNA. The most significant characteristic of the TfiLigA BRCT domain is its high mobility as a whole.
The BRCT domain present in LigA is a distinct version of its kind that is shared by the large subunit of eukaryotic replication factor C and by PARP (poly(ADP-ribose) polymerase) (Bork et al., 1997). Evolutionarily, it must be the ancestor of eukaryotic BRCT domains. Mammalian XRCC1 forms repair complexes with DNA ligase III, PARP and Pol β. The two BRCT domains of XRCC1 interact with PARP and DNA ligase III, while the N-terminal domain of XRCC1 interacts with Pol β (Marintchev et al., 1999). The XRCC1 C-terminal BRCT domain forms a specific heterodimer in vitro with the
BRCT domain of mammalian DNA ligase IIIα (Nash et al., 1997). Together, these data suggest a plausible scenario for NAD+ ligase function: after other DNA repair proteins recognize and repair damaged DNA, the ligase is recruited to the nick site for ligation through protein-protein interactions mediated by its BRCT domain. However, involvement of the BRCT domain in other, as yet uncharacterized, functions cannot be ruled out.