
CHAPTER 1

INTRODUCTION

Tuberculosis (TB) is the leading cause of death from an infectious disease and

affects roughly one-third of the entire world population (Sander et al., 2004).

Mycobacterium tuberculosis is the etiologic agent of tuberculosis in humans. The

sequencing of its genome revealed that it is comprised of 4,411,529 base pairs.

Mycobacterium tuberculosis differs radically from other bacteria in that a significant

portion of its coding capacity is devoted to the production of enzymes involved in

lipogenesis and lipolysis (Cole et al., 1998). Analysis of the sequence has resulted in

attributing precise functions to 40 % of the predicted proteins; some information is

available for 44 % and the remaining 16 % might account for specific mycobacterial

functions (Cole et al., 1998). A more recent re-annotation suggests that the genome

contains 3995 genes coding for proteins (ORFs) and 50 RNA genes (45 tRNA, 3 rRNA

and 2 other stable RNAs) (Cole et al., 1998; Camus et al., 2002).
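The annotation figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers given in the text; the variable names are illustrative, and the average spacing it derives is a rough figure, not a value reported in the cited papers.

```python
# Figures quoted in the text (Cole et al., 1998; Camus et al., 2002).
GENOME_BP = 4_411_529
PROTEIN_GENES = 3995
RNA_GENES = {"tRNA": 45, "rRNA": 3, "other_stable_RNA": 2}
ANNOTATION_PERCENT = {"precise_function": 40, "some_information": 44, "unknown": 16}

# The RNA classes should add up to the 50 RNA genes quoted in the text.
total_rna = sum(RNA_GENES.values())

# The three annotation categories should cover all predicted proteins.
total_percent = sum(ANNOTATION_PERCENT.values())

# Rough average genome length per protein-coding gene (illustrative only).
bp_per_protein_gene = GENOME_BP / PROTEIN_GENES

print(total_rna, total_percent, round(bp_per_protein_gene))
```

The roughly 1.1 kb of genome per protein-coding gene reflects the compact gene arrangement typical of bacterial genomes.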

Coordinated efforts have been launched by several Structural Genomics

Consortia (http://www.rcsb.org/pdb/strucgen.html) such as the Tuberculosis Structural

Genomics Consortium (http://www.doe-mbi.ucla.edu/TB/) for structural determination

and analysis of a large number of Mycobacterium tuberculosis proteins. Structure

elucidation of these proteins is also expected to contribute greatly to the identification

of novel drug targets and the development of new drugs aided by rational methods.

1.1 Nucleotidyltransferase family of proteins

ATP- and NAD+-dependent DNA ligases, ATP-dependent RNA ligases and

GTP-dependent mRNA capping enzymes comprise this superfamily of proteins (Figure 1.1).

These proteins generally catalyze nucleotidyl transfer to polynucleotide 5' ends via

covalent enzyme-(lysyl-N)-NMP intermediates (Figure 1.2) (Shuman and Lima, 2004).

Reaction mechanisms of different classes of nucleotidyltransferase family of proteins are

shown in figure 1.2. This superfamily is defined by five peptide motifs that line the

nucleotide-binding pocket and contribute amino acid sidechains essential for catalysis

(Figure 1.3).

The initial clues that polynucleotide ligation and RNA capping might be related

at the level of enzyme structure and mechanism came from the mapping of the lysine

nucleotidylation sites to a peptide motif, Kx(D/N)G, referred to as motif I, that is

conserved among ATP-dependent DNA and RNA ligases, NAD+-dependent DNA

ligases and GTP-dependent RNA capping enzymes (Thogersen et al., 1985;

Tomkinson et al., 1991; Cong and Shuman, 1993). Sequence analysis defined four

additional motifs (III, IIIa, IV and V) that are conserved in order and spacing among

polynucleotide ligases and capping enzymes (Shuman and Schwer, 1995). Mutational

analyses of exemplary enzymes (Figure 1.1) have shown that conserved amino acids

within the five motifs are essential for the function of capping enzymes and

polynucleotide ligases (Sawaya and Shuman, 2003; Zhu and Shuman, 2005). The

prediction that the five motifs comprise the active site of ligases and capping enzymes

was affirmed initially by the crystal structures of bacteriophage T7 ATP-dependent

DNA ligase (Subramanya et al., 1996) and NAD+-dependent bacterial DNA ligases

from Bacillus stearothermophilus (Bst) (BstLigA) and Thermus filiformis (Tfi)

(TfiLigA) (Singleton et al., 1999; Lee et al., 2000), as well as the Chlorella virus (ChV)

mRNA capping enzyme (Hakansson et al., 1997). These studies revealed a shared core

tertiary structure for DNA ligases and capping enzymes, composed minimally of a

nucleotidyltransferase domain fused to a C-terminal OB-fold domain. Subsequently,

structures of the Candida albicans (Cal) mRNA capping enzyme-GMP intermediate

(Fabrega et al., 2003), the Chlorella virus DNA ligase-AMP intermediate (Odell et al.,

2000; Odell et al., 2003) and Enterococcus faecalis DNA ligase bound to NAD+

(Gajiwala and Pinko, 2004) have captured family members in new conformations that

are mechanistically instructive.
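The motif I signature Kx(D/N)G described above translates directly into a pattern search. The following is a minimal sketch, assuming a protein sequence supplied as a plain one-letter string; the function name and the test peptide are hypothetical and are not sequences of any enzyme discussed here.

```python
import re

# Motif I as given in the text: K, any residue, D or N, then G.
# A lookahead is used so that overlapping candidates are also reported.
MOTIF_I = re.compile(r"(?=(K.[DN]G))")


def find_motif_i(sequence):
    """Return (1-based position, matched residues) for each Kx(D/N)G hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF_I.finditer(seq)]


# Hypothetical peptide used only to exercise the pattern.
example = "MAAKIDGLRSTKANGQ"
for position, residues in find_motif_i(example):
    print(position, residues)
```

A four-residue pattern will also match by chance in unrelated proteins, so a hit is only a candidate; the conservation of motifs III, IIIa, IV and V in the expected order and spacing is what identifies a family member.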

1.2 DNA Ligases

DNA ligases join breaks in the phosphodiester backbone of DNA molecules and

are used in many essential reactions viz. DNA replication, repair, and recombination

within the cell (Cao, 2002). All DNA ligases follow the same reaction mechanism, but

they may use either ATP or NAD+ as a cofactor. In the first step, attack on the
α-phosphorus of ATP or NAD+ by the ligase results in release of pyrophosphate or
nicotinamide mononucleotide (NMN), respectively, and formation of a covalent
ligase-(lysyl-N)-AMP intermediate. In the second step, the AMP is transferred to the 5' end of
the 5'-phosphate-terminated DNA strand to form a DNA-adenylate intermediate,
A(5')pp(5')DNA. In the third step, the ligase catalyzes attack by the 3'-OH of the nick on

the DNA-adenylate to join the two polynucleotides and release AMP (Doherty and Suh,

2000) (Figure 1.4).
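The three-step mechanism above can be written as a small state model, which makes the ordering constraint explicit: each step requires the product of the previous one. This is only an illustrative sketch; the class and function names are invented for the example and the model carries no kinetic or structural information.

```python
from dataclasses import dataclass, replace

# Leaving group released in step 1, depending on the cofactor (from the text).
LEAVING_GROUP = {"ATP": "PPi", "NAD+": "NMN"}


@dataclass(frozen=True)
class Reaction:
    enzyme: str = "E"          # "E" or "E-AMP"
    dna: str = "nicked DNA"    # "nicked DNA", "AppDNA" or "sealed DNA"
    released: tuple = ()       # small molecules released so far


def adenylate_enzyme(r, cofactor):
    """Step 1: the ligase attacks the cofactor and becomes E-AMP."""
    if r.enzyme != "E":
        raise ValueError("enzyme is already adenylated")
    return replace(r, enzyme="E-AMP", released=r.released + (LEAVING_GROUP[cofactor],))


def adenylate_dna(r):
    """Step 2: AMP is transferred to the 5' phosphate at the nick (AppDNA)."""
    if r.enzyme != "E-AMP" or r.dna != "nicked DNA":
        raise ValueError("step 2 needs E-AMP and a nicked DNA")
    return replace(r, enzyme="E", dna="AppDNA")


def close_nick(r):
    """Step 3: the 3'-OH attacks the DNA-adenylate, sealing the nick."""
    if r.dna != "AppDNA":
        raise ValueError("step 3 needs the DNA-adenylate intermediate")
    return replace(r, dna="sealed DNA", released=r.released + ("AMP",))


def ligate(cofactor):
    return close_nick(adenylate_dna(adenylate_enzyme(Reaction(), cofactor)))


print(ligate("ATP"))
print(ligate("NAD+"))
```

Only step 1 depends on the cofactor, which is the point made in the text: ATP- and NAD+-dependent ligases differ in how the enzyme-adenylate is formed, not in the subsequent chemistry.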

Fig 1.1: Structures of covalent nucleotidyltransferase family members.

Panels: (A) RNA ligase; (B) capping enzyme; (C) NAD+ ligase; (D) ATP ligase.

(A) T4 RNA ligase 2 (Rnl2) bound to AMP, PDB: 1S68. (B) C. albicans mRNA guanylyltransferase-GMP intermediate (capping enzyme), PDB: 1P16. (C) E. faecalis NAD+-dependent DNA ligase (EfaLigA) bound to NAD+, PDB: 1TAE. (D) Chlorella virus ATP-dependent DNA ligase-AMP intermediate, PDB: 1FVI.

The structures were aligned with respect to their nucleotidyltransferase domains, the secondary structure elements of which are depicted with helices colored red and β strands colored yellow. Intervening loops and the respective flanking domains, when present, are shown in blue worm representation. For clarity, segments of the OB-fold domain and an extension of the C-terminal helix of the C. albicans capping enzyme were truncated. All structural representations were created with the program PYMOL (Delano, 2002; http://www.pymol.org).

(A) DNA ligase
(1) E + pppA ⇌ EpA + PPi
(2) EpA + pDNA ⇌ AppDNA + E
(3) DNA-OH + AppDNA ⇌ DNApDNA + Ap

(B) RNA ligase
(1) E + pppA ⇌ EpA + PPi
(2) EpA + pRNA ⇌ AppRNA + E
(3) RNA-OH + AppRNA ⇌ RNApRNA + Ap

(C) RNA capping enzyme
(1) E + pppG ⇌ EpG + PPi
(2) EpG + ppRNA ⇌ GpppRNA + E

Phosphoramidate intermediate: Lys-NH2(+)-P(=O)(O-)-O-(Nucleoside)

Fig 1.2: Reaction mechanisms

Conserved pathway of nucleotidyl transfer to polynucleotide ends catalyzed by (A) DNA ligase, (B) RNA ligase and (C) RNA capping enzyme. The first step in each pathway entails the attack of the enzyme (E) on the α-phosphorus (p) of the nucleotide substrate to form a covalent enzyme-NMP intermediate (EpN). The chemical structure of the phosphoramidate intermediate is depicted at the bottom. Figure was adapted from Shuman, S., and Lima, C. D. (2004) Curr. Opin. Struct. Biol. 14, 757-64.

[Figure: alignment of motifs I, III, IIIa, IV and V for Efa, Tfi, Rnl2, T7, ChV (DNA ligase), ChV (capping enzyme) and Cal, with arrowheads marking contacts to the phosphates, ribose, purine base and metal.]

Fig 1.3: Structural motifs of the covalent nucleotidyltransferase family

The amino acid sequences of motifs I, III, IIIa, IV and V are aligned for the NAD+-dependent DNA ligases of E. faecalis (Efa) and Thermus filiformis (Tfi), the ATP-dependent T4 RNA ligase 2 (Rnl2), the ATP-dependent DNA ligases of bacteriophage T7 and Chlorella virus (ChV), and the GTP-dependent capping enzymes of Chlorella virus and C. albicans (Cal). Conserved active site constituents are highlighted in red. Specific contacts between conserved amino acid side chains and the nucleotide substrate or divalent metal seen in exemplary crystal structures are indicated by arrowheads. Figure was adapted from Shuman, S., and Lima, C. D. (2004) Curr. Opin. Struct. Biol. 14, 757-64.

All bacteria (eubacteria) (Wilkinson et al., 2001) and entomopoxviruses (Sriskanda
et al., 2001) contain NAD+-dependent DNA ligases. In addition to their NAD+-dependent
enzymes, some bacteria contain genes for putative ATP-dependent DNA ligases. Studies
have been performed on DNA ligases from each domain of life and on those encoded by the
genomes of various viruses (Wilkinson et al., 2001). Eukaryotic cells contain a number of
different ligases that use ATP, with each enzyme being used in specific end-joining
reactions. ATP-dependent ligases range in size from 30 to >100 kDa, whereas NAD+ ligases
are highly homologous monomeric proteins of 70-80 kDa (Doherty and Suh,
2000). As discussed above, both types of ligases share a catalytic core, which consists of
the conserved motifs found in the nucleotidyltransferase superfamily. These motifs are essential
for adenylation cofactor binding, metal ion coordination, and ligation chemistry, as
validated by X-ray crystal structure determination and mutational analyses (Cao, 2002).
The requirement for these different isozymes in bacteria is unknown, but may be related to
their utilization in different aspects of DNA metabolism. The putative ATP-dependent
DNA ligases found in bacteria are most closely related to proteins from archaea and
viruses. Phylogenetic analysis suggests that all NAD+-dependent DNA ligases are closely
related, but that the ATP-dependent enzymes have been acquired by the genomes of eukarya,
bacteria and viruses on a number of separate occasions (Wilkinson et al., 2001).

1.2.1 ATP-dependent DNA ligases

As discussed above, ATP-dependent ligases have been found in bacteriophages
(Doherty and Wigley, 1999), eubacteria (Cheng and Shuman, 1997), archaea (Sriskanda et
al., 2000; Lai et al., 2001), viruses (Ho et al., 1997; Sekiguchi and Shuman, 1997) and
eukarya (Tomkinson and Levin, 1997; Timson et al., 2000; West et al., 2000). The domain
organization of ATP ligases from different sources is shown in Figure 1.5. Each group is
discussed briefly below.

Viral ATP ligases are in general smaller than ligases from more complex organisms
such as yeast, plants, and humans (Figure 1.5). The 298 amino acid ATP ligase from
Paramecium bursaria Chlorella virus-1 (PBCV-1) (Ho et al., 1997; Odell et al., 2000),
which is composed of only the ligase core domain (Figure 1.5), has been used
as a biochemical model to study the catalytic mechanism of ATP ligases.

Step 1: Enzyme adenylation
E + ATP (or NAD+) → E-AMP + PPi (or NMN)

Step 2: Substrate adenylation
E-AMP + nicked DNA ⇌ E•AMP-nicked DNA (AppDNA)

Step 3: Nick closure
E•AMP-nicked DNA → E + sealed DNA + AMP

Fig 1.4: Detailed reaction mechanism of DNA ligases

Three steps of a ligation reaction. In the enzyme adenylation step, an AMP group is transferred from the cofactor NAD+ or ATP to a lysine residue in the adenylation motif I, KXDG, through a phosphoramide linkage. In the substrate adenylation step, this AMP group is transferred to the 5' phosphate at the nick through a pyrophosphate linkage to form a DNA-adenylate intermediate (AppDNA). In the nick-closure step, a phosphodiester bond is formed to seal the nick and release AMP.

T4 ATP ligase is a 487 amino acid protein capable of both nick sealing and blunt-end
ligation (Rossi et al., 1997). The 359 amino acid T7 ligase is more than
100 amino acids shorter than T4 DNA ligase, and its crystal structure (PDB: 1A0I) offered
the first atomic view of a DNA ligase (Subramanya et al., 1996) (Figure 1.6). T7 DNA
ligase is organized into a larger N-terminal domain encompassing motifs I, III, IIIa, IV, and
a portion of V, and a smaller C-terminal domain known as the oligomer-binding fold (OB)
domain, which encompasses motifs V and VI (Figures 1.5 and 1.7b). Sequencing of the
vaccinia virus genome revealed an ATP-dependent ligase. Vaccinia ligase (Vac) strictly
requires ATP for adenylation; no other cofactor can substitute. The ligation reaction
catalyzed by the vaccinia ligase requires Mg2+ or Mn2+.

Archaea

The information processing machinery in archaea in general resembles that of eukaryotes more
closely than that of bacteria. This is evident in the presence of ATP-dependent ligases in archaeal
genomes (Kletzin, 1992). The 561 amino acid DNA ligase from the thermophilic archaeon
Methanobacterium thermoautotrophicum has been characterized (Sriskanda et al., 2000). This
ligase uses ATP as the adenylation cofactor and, to a lesser extent, dATP. Another characterized
archaeal ATP ligase, from Thermococcus kodakaraensis KOD1, is unique in its adenylation cofactor
selectivity (Nakatani et al., 2000). This ligase not only uses ATP as the adenylation cofactor, but
also uses NAD+ to a limited extent (Nakatani et al., 2000).

Bacteria

The outpouring of bacterial genome data led to the identification and characterization
of several ATP ligases from different bacterial species (Table 1.1) (Wilkinson et al., 2001;
Weller and Doherty, 2001). The open reading frames (ORFs) encoding putative bacterial
ATP ligases often contain primase and nuclease domains as well (Figure 1.5) (Weller and
Doherty, 2001). Furthermore, the ligase genes are organized in an operon with a gene
encoding a putative prokaryotic end-binding Ku protein core (Doherty et al., 2001). This raises
interesting speculations about the role of these ligases and other domains in DNA repair. The
bacterial ATP ligases are additional to the NAD+-dependent ligase encoded by the bacterial
genomes (Wilkinson et al., 2001). A 268 amino acid ORF containing the conserved
nucleotidyltransferase signature from the first sequenced bacterium, Haemophilus
influenzae, has been confirmed as an ATP ligase (Cheng and Shuman, 1997). Disruption of
this ATP ligase in Haemophilus influenzae results in loss of viability (Preston et al., 1996).
The Bacillus subtilis genome encodes two putative ATP ligases, YkoU and YoqV, besides one
NAD+ ligase, YerG (Petit and Ehrlich, 2000). The Aquifex aeolicus genome encodes an ATP
ligase with limited nick sealing activity (Tong et al., 2000). Clearly, the physiological
role of bacterial ATP ligases has yet to be defined. Mycobacterium tuberculosis is unique
in that it contains three different ATP-dependent ligases, viz. LigB, LigC and LigD (Gong
et al., 2004). Recently, it has been reported (Gong et al., 2005) that the polyfunctional LigD of
Mycobacterium tuberculosis is involved in non-homologous end joining (NHEJ) along
with another factor, Ku (Della et al., 2004), while LigC provides a backup mechanism for
LigD-independent error-prone repair of blunt-end double strand breaks (DSBs).

Yeast

Two ATP-dependent ligases have been discovered in the budding yeast
Saccharomyces cerevisiae (Sce) (Figure 1.5). The 755 amino acid Cdc9 protein (Barker et
al., 1985; Kulotti et al., 1971) is responsible for joining Okazaki fragments during
lagging-strand synthesis and is essential for base excision repair and nucleotide excision repair
(Johnston, 1983; Wu et al., 1999). Both the nuclear and the mitochondrial forms of this DNA
ligase are encoded by cdc9 (Willer et al., 1999; Donahue et al., 2001). The mitochondrial version
contains a 23 amino acid signal sequence that targets the ligase to mitochondria. The
complete budding yeast genome, however, revealed a second ATP-dependent ligase, ligase
IV (Ramos et al., 1997; Teo and Jackson, 1997; Wilson et al., 1997). This 944 amino acid
protein is highly homologous to human DNA ligase IV throughout the coding region
but shows weak ligation activity. Genetic analysis demonstrates that yeast ligase IV is not
required for DNA replication, homologous recombination or repair of UV-induced
damage; however, it is an essential component of the DSB repair pathway called
non-homologous end joining (NHEJ) (Ramos et al., 1997; Schar et al., 1997; Teo and Jackson,
1997). Yeast ligase IV contains two BRCT (BRCA1-like C-terminus) domains in the C-terminal
region. Through its BRCT domain, ligase IV interacts with another protein, Lif1 (a homolog of
human X-ray cross-complementing group IV, XRCC4), which enhances its ligation activity by
stimulating enzyme adenylation (step 1) (Teo and Jackson, 2000).

Mammalian

Mammals have at least three ligases: ligase I, III, and IV (Tomkinson and Levin,
1997; Tomkinson and Mackay, 1998; Lasko et al., 1990; Lindahl and Barnes, 1992;
Timson et al., 2000). Ligase II has biochemical properties distinct from those of ligase I
and is now considered to be a proteolytic fragment of ligase III. Ligase III has two
isoforms generated by alternative splicing (Chen et al., 1995; Wei et al., 1995). The 922
amino acid ligase IIIα contains a BRCT domain at the C-terminus, which is largely deleted
in the 862 amino acid ligase IIIβ (Figure 1.5). The BRCT domain plays an important role in
interacting with several partner proteins to form a complex that is involved in base
excision repair (Rice, 1999). The ligase III gene also encodes the mammalian mitochondrial
ligases (Lakshmipathy and Campbell, 1999; Pinz and Bogenhagen, 1998). A 44 kDa
protein has been identified in human cells as ligase V (Johnson and Fairman, 1997).
This ligase has a double-strand joining activity similar to that of ligase I but is weak in nick
sealing. The domain architectures of the human ATP ligases are sketched in Figure 1.5.

The 919 amino acid human ligase I (Hu I) contains a catalytic core and an N-terminal
domain that is required for cellular localization and protein-protein interactions
(Tomkinson and Levin, 1997; Lindahl and Barnes, 1992) (Figure 1.5). Recently, the crystal
structure of Hu I (residues 233 to 919) in complex with a nicked, 5'-adenylated DNA
intermediate has been reported (Pascal et al., 2004). The structure reveals a unique feature
of mammalian ligases: a DNA-binding domain that allows ligase I to encircle its DNA
substrate, stabilizes the DNA in a distorted structure, and positions the catalytic core on
the nick. Ligase I efficiently joins nicks in double-stranded DNA, is involved in the maturation
of Okazaki fragments during replication and in base excision repair, and joins blunt-ended
DNA optimally in the presence of 17.5% PEG 6000 (Arrand et al., 1986). In addition to nick
sealing, ligase III is also able to catalyze joining of oligo(dT)/poly(rA) and oligo(rA)/poly(dT)
(Robins and Lindahl, 1996; Arrand et al., 1986; Yang and Chan, 1992; Tomkinson et al.,
1991). Human ligase IV (Hu IV), in addition to efficiently joining nicks on a DNA
template, also joins nicks on an RNA template (Robins and Lindahl, 1996). These
enzymatic properties are similar to those of human ligase III. The 844 amino acid ligase IV
contains two BRCT domains at the C-terminus (Figure 1.5). The linker regions between
the two BRCT domains are essential for binding of the partner protein XRCC4
(Grawunder et al., 1998). Both XRCC4 and Ku protein stimulate the ligation activity of
ligase IV (Grawunder et al., 1997; Ramsden and Gellert, 1998), while Ku also stimulates
the end-joining activities of Hu I and III (Ramsden and Gellert, 1998).

1.2.2 NAD+ -dependent DNA ligases

Among the first DNA ligases to be purified and analyzed biochemically was the
Escherichia coli (Eco) NAD+-dependent ligase (EcoLigA) (Olivera and Lehman,
1967a; Olivera and Lehman, 1967b), which has served as a paradigm for studies of
NAD+-dependent DNA ligases (LigA) (Lehman, 1974). EcoLigA is an essential enzyme that
consists of 671 amino acids (molecular weight 74 kDa) and is encoded by ligA.
NAD+-dependent ligases have been found in bacteria and, more recently, in the entomopoxviruses
Amsacta moorei entomopoxvirus (AmEPV) (Sriskanda et al., 2001) and Melanoplus
sanguinipes entomopoxvirus (MsEPV) (Lu et al., 2004). Apart from these entomopoxviruses,
they have not been detected in humans or any other eukaryotes. The typical domain
organization of NAD+-dependent ligases from eubacteria and entomopoxviruses is
shown in Figure 1.5. NAD+-dependent DNA ligases have been identified in every
bacterial species that has been sequenced so far. Many of these enzymes have been cloned,
sequenced and biochemically characterized, including those from several thermophilic
species and from cold-adapted seawater bacteria such as Pseudoalteromonas haloplanktis
(Tables 1.1 and 1.2) (Ishino et al., 1986; Barany and Gelfand, 1991; Lauer et al., 1991; Shark and
Conway, 1992; Jonsson et al., 1994; Thorbjarnardottir et al., 1995; Brannigan et al.,
1999). These enzymes are of fairly uniform size and show a considerable degree of amino
acid sequence homology. For the LigA enzymes listed in Table 1.2, a basic BLAST alignment with
EcoLigA detects amino acid sequence identities of typically 35-50% (average 42 %).
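To make the notion of percent identity concrete, the sketch below computes it for two sequences that have already been aligned. This is a simplified illustration, not the BLAST procedure: it assumes the alignment is given, counts identical residues over columns where neither sequence has a gap, and uses short hypothetical peptides. Values reported by an alignment program depend on its own alignment and on how gap columns are counted, so they may differ from this definition.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned peptides, used only to exercise the function.
print(round(percent_identity("KIDGLR", "KVDGLS"), 1))
print(round(percent_identity("KID-GL", "KVDAGL"), 1))
```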

In studies involving temperature-sensitive or deletion mutants, LigA was shown
to be essential for survival in several bacterial species: LigA in Escherichia coli (Dermody
et al., 1979) and Salmonella typhimurium (Park et al., 1989), YerG in Bacillus subtilis
(Petit and Ehrlich, 2000), Lig in Staphylococcus aureus (Kaczmarek et al., 2001) and
LigA in Mycobacterium tuberculosis (Gong et al., 2004; Sassetti et al., 2003). Because
LigA is essential for replication, its inactivation leads to the non-viability
of most bacteria. LigA therefore presents an attractive target for broad-spectrum
antimicrobial therapy predicated on blocking the reaction of LigA with
NAD+ (Brotz-Oesterhelt et al., 2003; Gong et al., 2004; Georlette et al., 2003;
Kaczmarek et al., 2001).

Limited proteolysis divides the enzyme into two functional domains: an

N-terminal domain responsible for NAD+ binding and for the self-adenylation reaction

and a C-terminal, DNA binding domain (Timson and Wigley, 1999).

The high thermostability of these enzymes from thermophilic bacteria aids

structural and mechanistic studies of DNA ligation and has led to the use of DNA

ligases from such organisms as model systems for studies of these enzymes.

Thermostable bacterial DNA ligases have also proved to be a valuable tool for the detection

of mutations that lead to various cancers (Khanna et al., 1999).

[Figure: (A) domain maps of the ATP ligases ChV, T7, Vac, ykoU, Sce, Sce IV, Hu I, Hu IIIα, Hu IIIβ and Hu IV, drawn with ligase core, primase, Zn and BRCT domains and the interaction partners PCNA or Pol β, XRCC1 and XRCC4; (B) domain maps of the eubacterial NAD+ ligase (Ia, adenylation, OB, Zn, HhH, BRCT) and the MsEPV ligase (Ia, adenylation, OB, Zn*, HhH).]

Fig 1.5: Domain architecture of DNA ligases

(A) Domain structure of ATP ligases. Ligase core: the ligase core domain containing the conserved nucleotidyltransferase motifs as seen in Figure 1.3. BRCT: BRCA1-like C-terminus. Zn: zinc finger motif. Hatched box in human ligase I: PCNA or DNA polymerase β (pol β) interaction domain.

(B) Domain structure of NAD+ ligases. Adenylation: the adenylation domain containing motifs I, III, IIIa, IV, and V of the nucleotidyltransferase core. OB: OB-fold. Zn: zinc finger motif. Zn*: a region in MsEPV ligase that is homologous to the Zn finger motif in eubacterial NAD+ ligase but lacks the four conserved Cys residues for Zn coordination. HhH: the four helix-hairpin-helix motifs. BRCT: BRCA1 C-terminus. Figure was adapted from Cao, W. (2002) Curr. Org. Chem. 6, 1-13.

Table 1.1: Eubacterial genomes containing both NAD+ and ATP ligases

Species of eubacteria              Gene               Cofactor   Amino acids, M. Wt.^c
Aquifex aeolicus VF5               ligA               NAD+       720 aa, 82.3 kDa
                                   Lig                ATP        585 aa, 67.1 kDa
Bacillus subtilis                  yerG^a             NAD+       668 aa, 74.9 kDa
                                   yoqV^a             ATP        270 aa, 31.0 kDa
                                   ykoU               ATP        611 aa, 70.2 kDa
Campylobacter jejuni               ligA               NAD+       647 aa, 73.9 kDa
                                   Cj1669c            ATP        282 aa, 32.5 kDa
Haemophilus influenzae Rd          HI1100 (lig)       NAD+       679 aa, 75.2 kDa
                                   URF1^b             ATP        268 aa, 30.9 kDa
Mycobacterium tuberculosis         ligA^d (Rv3014c)   NAD+       691 aa, 75.3 kDa
                                   ligB (Rv3062)      ATP        507 aa, 53.7 kDa
                                   ligC (Rv3731)      ATP        358 aa, 40.2 kDa
                                   ligD (Rv0938)      ATP        759 aa, 83.6 kDa
Neisseria meningitidis             ligA               NAD+       841 aa, 92.4 kDa
                                   NMA0388            ATP        274 aa, 30.7 kDa
Neisseria meningitidis             NMB0666            NAD+       841 aa, 92.4 kDa
                                   NMB2048            ATP        274 aa, 30.7 kDa
Pseudomonas aeruginosa PAO1        lig                NAD+       794 aa, 86.8 kDa
                                   PA2138             ATP        840 aa, 94.0 kDa
Vibrio cholerae El Tor N16961      VC0971             NAD+       669 aa, 73.3 kDa
                                   VC1542             ATP        282 aa, 31.8 kDa

a - yerG is an essential gene, but yoqV is not (Petit and Ehrlich, 2000).
b - Confirmed as an ATP-dependent DNA ligase by Cheng and Shuman (1997).
c - Molecular weight in kilodaltons (kDa).
d - ligA is an essential gene, but ligB, ligC and ligD are dispensable (Gong et al., 2004).
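As a quick consistency check on Table 1.1, the listed molecular weights can be divided by the corresponding chain lengths. The sketch below does this for the four Mycobacterium tuberculosis ligases, using only the values in the table; the function name is illustrative. The comparison value of roughly 110 Da per residue is a commonly used rule of thumb for proteins, not a figure taken from this chapter.

```python
# (gene, cofactor, amino acids, molecular weight in kDa), as listed in Table 1.1.
MTB_LIGASES = [
    ("ligA", "NAD+", 691, 75.3),
    ("ligB", "ATP", 507, 53.7),
    ("ligC", "ATP", 358, 40.2),
    ("ligD", "ATP", 759, 83.6),
]


def daltons_per_residue(n_residues, mass_kda):
    """Average mass per amino acid residue, in daltons."""
    return mass_kda * 1000.0 / n_residues


for gene, cofactor, n_residues, mass_kda in MTB_LIGASES:
    value = daltons_per_residue(n_residues, mass_kda)
    print(f"{gene} ({cofactor}): {value:.1f} Da per residue")
```

All four entries fall close to the rule-of-thumb value, which suggests the amino acid counts and molecular weights in the table are mutually consistent.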

Initial elucidation of the crystal structure of the N-terminal fragment of BstLigA
(Singleton et al., 1999) (PDB: 1B04) (Figure 1.6) led to the identification of the
nucleotidyltransferase, or adenylation, domain, which is composed of two
subdomains, Ia and Ib. While Ib has structural homology with the N-terminal domains of
other nucleotidyltransferase family proteins, Ia is unique to this class of
enzymes. Later, the crystal structure of intact TfiLigA (PDB: 1V9P) (Lee et al.,
2000) further revealed that NAD+ ligases are organized into a catalytic
core at the N-terminus (which includes the adenylation domain and the OB
fold) and a Zn finger, four helix-hairpin-helix (HhH) motifs and a BRCT domain at
the C-terminus (Figure 1.6). These structures also confirmed that the conserved
nucleotidyltransferase motifs (Figure 1.3), which are part of ATP ligases and RNA
capping enzymes and had been predicted in NAD+ ligases (Sriskanda et al., 1999; Doherty
and Suh, 2000; Arrand et al., 1986), are indeed present in the catalytic core of these NAD+
ligases. Figure 1.7a shows the location of all these motifs in TfiLigA; their sequences
are shown in Figure 1.3. Although these ligases utilize NAD+ as the cofactor in the
ligation reaction, none of these crystal structures showed NAD+ in the enzyme active
site. Only the TfiLigA structure showed non-covalently bound AMP in the active
site. However, the crystal structures of the BstLigA apoenzyme (PDB: 1B04) (Singleton et
al., 1999) and the TfiLigA covalent ligase-adenylate intermediate (PDB: 1V9P) (Lee et
al., 2000) confirmed that the AMP-binding pocket is located within a
nucleotidyltransferase domain shared with ATP-dependent ligases. The only difference lies in
the presence of subdomain Ia in the N-terminal region of NAD+ ligases.
Subsequent biochemical and mutational analysis of EcoLigA (Sriskanda and
Shuman, 2002) showed that some of the highly conserved
residues in subdomain Ia play an important role in step 1 of the ligation reaction
(Figure 1.4), indicating that subdomain Ia might participate in the reaction of the
enzyme with NAD+. This was subsequently confirmed by crystallographic snapshots of the
adenylation domain of the NAD+-dependent ligase of Enterococcus faecalis (EfaLigA)
in complexes with NMN and NAD+ (Gajiwala and Pinko, 2004). The findings of
Gajiwala and Pinko (2004), together with the three-dimensional structure of TfiLigA, which
shows the arrangement of the different domains in this class of ligases, give a
clear picture of the mode of NAD+ binding in domain I and of the spatial arrangement of
the domains of eubacterial NAD+-dependent ligases.

Table 1.2: Experimentally studied NAD+ ligases from eubacteria

Species of eubacteria             Comment                                                              Reference
Aquifex aeolicus                  VF5 ligA, 720 aa, 82 kDa, 39% identity to E. coli LigA^a             Tong et al. (2000)
Bacillus subtilis                 YerG, 668 aa, 75 kDa, essential enzyme,                              Petit and Ehrlich (2000)
                                  48% identity to E. coli LigA^a
Bacillus stearothermophilus       Dnlj, 670 aa, 74 kDa, 47% identity to E. coli LigA^a;                Singleton et al. (1999);
                                  crystal structure obtained of N-terminal domain                      Timson and Wigley (1999)
Escherichia coli K-12             LigA, 671 aa, 74 kDa, essential enzyme                               Lehman (1974)
Pseudoalteromonas haloplanktis    672 aa, 74 kDa, 59% identity to E. coli LigA^a                       Georlette et al. (2000)
Rhodothermus marinus              712 aa, 80 kDa, 44% identity to E. coli LigA^a                       Thorbjarnardottir et al. (1995)
Thermus filiformis                Lig, 667 aa, 76 kDa, 45% identity to E. coli LigA^a;                 Lee et al. (2000)
                                  crystal structure obtained of full length
Thermus scotoductus (Ts)          674 aa, 77 kDa, 45% identity to E. coli LigA^a                       Thorbjarnardottir et al. (1995)
Thermus species AK16D             -                                                                    Tong et al. (1999)
Thermus thermophilus (Tth)        676 aa, 77 kDa, 45% identity to E. coli LigA^a                       Barany and Gelfand (1991)
Thermus thermophilus HB8          676 aa, 77 kDa, 45% identity to E. coli LigA^a                       Takahashi et al. (1984)
Zymomonas mobilis                 -                                                                    Shark and Conway (1992)

a - Homology detected using basic BLAST (Altschul et al., 1990) alignment.

Panels: TfiLigA (PDB: 1V9P); BstLigA (PDB: 1B04); T7 ATP ligase (PDB: 1A0I); ChV ATP ligase (PDB: 1FVI).

Fig 1.6: Motifs and domains of NAD+ and ATP ligases

A ribbon diagram representation of the structures of the NAD+-dependent ligases encoded by T. filiformis (TfiLigA) and B. stearothermophilus (BstLigA) (N-terminal domain only) and the ATP-dependent DNA ligases of bacteriophage T7 and Chlorella virus PBCV-1 (ChV ATP ligase). The domains are colour coded: domain 1, subdomain Ia, blue; subdomain Ib, cyan; domain 2, oligomer-binding fold domain (OB fold), green; domain 3, subdomain 3a (zinc finger), yellow; subdomain 3b (helix-hairpin-helix), red. In all the structures except BstLigA, an AMP molecule is shown bound to the adenylation domain. All structural representations were created with the program PYMOL (Delano, 2002; http://www.pymol.org).

1.3 Modular structure of DNA ligases

As discussed above, the elucidation of the crystal structures of the ATP-dependent
DNA ligases from bacteriophage T7, ChV and Hu I (Subramanya et al., 1996;
Odell et al., 2003; Pascal et al., 2004) and of the N-terminal fragments of BstLigA and EfaLigA
and intact TfiLigA (Singleton et al., 1999; Gajiwala and Pinko, 2004; Lee et al., 2000),
all NAD+-dependent ligases, has revealed that DNA ligases have a highly
modular architecture consisting of a unique arrangement of two or more discrete domains
(Figure 1.6). Both types of ligases are basically composed of two domains, with the
N-terminal domain containing the active site lysine where adenylation occurs and the
C-terminal domain containing the site of DNA binding.

As described above, five classes of motifs [nucleotide-binding domain,
OB fold domain, zinc finger, HhH motif and BRCT domain] have currently been detected at both
the sequence and the structural level (Figure 1.6). The structure and role of these motifs in
the ligase mechanism are discussed below.
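The modular organization described above lends itself to a simple data-structure view in which each ligase is an ordered list of domains. The sketch below is illustrative only: the domain lists follow the descriptions given in this chapter (Figure 1.5 and the accompanying text), the label "Zn-like" is a name chosen here for the MsEPV region that resembles the zinc finger but lacks the conserved Cys residues, and the function name is hypothetical.

```python
# Ordered N- to C-terminal domain lists, following the text and Figure 1.5.
DOMAINS = {
    "T7 ATP ligase": ["adenylation", "OB"],
    "Eubacterial NAD+ ligase": ["Ia", "adenylation", "OB", "Zn", "HhH", "BRCT"],
    "MsEPV NAD+ ligase": ["Ia", "adenylation", "OB", "Zn-like", "HhH"],
}


def shared_core(architectures):
    """Return the domains present in every ligase, in the order of the first entry."""
    if not architectures:
        return []
    names = list(architectures)
    common = set(architectures[names[0]])
    for name in names[1:]:
        common &= set(architectures[name])
    return [domain for domain in architectures[names[0]] if domain in common]


print(shared_core(DOMAINS))
```

The intersection recovers the adenylation domain followed by the OB fold, i.e. the catalytic core that the text describes as common to both ATP- and NAD+-dependent ligases; the remaining domains are the class-specific additions.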

1.3.1 Adenylation domain

In the case of ATP ligases, crystal structures of the bacteriophage T7 and Chlorella virus
ligases (Subramanya et al., 1996; Odell et al., 2000; Odell et al., 2003) revealed that,
although they share limited sequence similarity, these enzymes show considerable structural
homology. Each consists of two domains (Figure 1.7b): a larger N-terminal domain,
domain 1, and a smaller C-terminal domain, domain 2 (wheat in Figure 1.7b). In eukaryotic
ATP-dependent enzymes these two domains are flanked by other sequences, located either
N-terminal or C-terminal to them, as shown in Figure 1.5 (Tomkinson et al., 1991; Tomkinson
and Levin, 1997). Indeed, it seems likely that these "additional" sequences determine the
different functions of the various ligases present in eukaryotes. Domain 1 contains the
ATP-binding site. Doherty and Wigley (1999) demonstrated that this domain by itself has
intrinsic adenylation activity, and it has therefore been named the 'adenylation domain'. Many of
the residues that line the ATP-binding pocket of the adenylation domains of T7 ligase, BstLigA
and TfiLigA belong to five (motifs I-V) of the six sequence elements (Figure 1.3)
conserved among covalent nucleotidyltransferases. Examination of the positions of these
motifs within the DNA ligases indicates that they cluster around the ATP-binding site and
also form the sides of the groove between domains 1 and 2 (Figure 1.7). In NAD+
ligases this domain is composed of two subdomains, viz. Ia and Ib. Before the crystal


Fig 1.7a: Conserved nucleotidyltransferase motifs in NAD+ ligases

Conserved motifs I to V, characteristic of the nucleotidyltransferase family of proteins (Figure 1.3), lie in the adenylation domain (subdomain Ib) of NAD+ ligases; they are shown here in subdomain Ib of the Thermus filiformis NAD+ ligase, TfiLigA (PDB: 1V9P). Bound AMP is shown in blue. All the structurally conserved motifs are colour coded: motif I, black; motif III, blue; motif IIIa, green; motif IV, orange; motif V, red. The figure was made with the program PyMOL.


Fig 1.7b: Conserved nucleotidyltransferase motifs in ATP ligases

Conserved motifs I to V, characteristic of the nucleotidyltransferase family of proteins (Figure 1.3), are depicted in the adenylation domain (domain 1) of bacteriophage T7 ATP ligase. Motif V (red) spans domains 1 and 2. Bound AMP is shown in blue. All the structurally conserved motifs are colour coded. Domain 1, cyan; domain 2, wheat; motif I, black; motif III, blue; motif IIIa, green; motif IV, orange; motif V, red. The figure was made with the program PyMOL.


structure of the adenylation domain of EfaLigA was reported by Gajiwala and Pinko (2004), the role of

subdomain Ia was not very clear, although biochemical and mutational analysis of this

subdomain in EcoLigA (Sriskanda and Shuman, 2002) had indicated a role for five highly

conserved residues in the reaction with NAD+. Crystal structures of the adenylation domain of

EfaLigA in complex with NMN and with NAD+ (PDB: 1TA8 and 1TAE) revealed the

flexibility of subdomain Ia, which moves over Ib to generate the complete

active site for NAD+ (Figure 1.8). Subdomain Ia is thus responsible for NMN binding, creating an

interface between Ia and Ib that allows the nicotinamide base to be sandwiched in a π stack

between two highly conserved aromatic residues (Y29 and Y42 in E. faecalis) (Figure

1.9). In E. faecalis, an invariant aspartate (D39) makes a hydrogen bond to the amide

nitrogen of nicotinamide. Another invariant aspartate (D43) coordinates the ribose 2'-O,

and a tyrosine (Y30) donates a hydrogen bond to the NMN phosphate (Figure 1.9). This

highly conserved spatial arrangement of the five residues exposes the α-phosphate of the

AMP moiety to the conserved lysine in subdomain Ib (Figure 1.9).

Subdomain Ib is basically composed of a cage of β strands (Figure 1.6) and interstrand

loops that include the five defining motifs (Figure 1.7a) of the enzyme superfamily. The

motif I (KXDG) lysine nucleophile is located in a loop between the two antiparallel β

sheets that form the active site and nucleotide-binding pocket (Figure 1.7a). Whereas the

α-phosphate of the nucleotide is exposed on the enzyme surface, the purine base of the

nucleotide is buried in a hydrophobic sandwich between the conserved aromatic residue

(Y227 in E. faecalis and Y226 in T. filiformis) of motif IIIa and a conserved residue in

motif IV (V289 in E. faecalis and T. filiformis) (Figure 1.9). The exocyclic 6-amino group

of adenine is coordinated by a conserved glutamate (E175 in E. faecalis and E174 in T.

filiformis) side chain in the bacterial DNA ligase-AMP structures (Figure 1.9) (Lee et al.,

2000). Mutations of these residues in subdomain Ib of the adenylation domain of E.

coli LigA (Luo and Barany, 1996; Sriskanda et al., 1999; Zhu and Shuman, 2005) have

been shown to be detrimental to its activity, confirming their importance in the

interaction with bound NAD+ within the adenylation domain. Along with those of Ia, residues

within the motifs of subdomain Ib, viz. glutamate in motif III, tyrosine in motif IIIa

(histidine in E. coli and M. tuberculosis), aspartate in motif IV and lysine in motif V, are essential for

interaction with AMP and subsequent catalytic efficiency (Zhu and Shuman, 2005).

Reference to the structure of EfaLigA led to the discrimination of three classes of essential or

important side chains in E. coli LigA (Zhu and Shuman, 2005) that: (i) contact NAD+ directly

Fig 1.8: Open and closed conformations of subdomain Ia of NAD+ ligases

Panels: EfaLigA (PDB: 1TA8), "open conformation"; EfaLigA (PDB: 1TAE), "closed conformation". Subdomains Ia and Ib are labelled.

Open and closed conformations of subdomain Ia (blue) in the NMN (green)-bound N-terminal domain (PDB: 1TA8) and the NAD+ (red)-bound N-terminal domain (PDB: 1TAE) of the E. faecalis NAD+-dependent DNA ligase. Subdomain Ib is shown in cyan. Both molecules represent crystallographic snapshots of the adenylation domain (domain 1) of EfaLigA. In EfaLigA, subdomain Ia moves over Ib (shown by the arrow) to generate the complete active site for NAD+ (Gajiwala and Pinko, 2004).

Fig 1.9: NAD+ recognition in the Enterococcus faecalis NAD+ ligase adenylation domain

Interactions of bound NAD+ with conserved residues of subdomains Ia and Ib in the closed conformation, in the crystal structure of the adenylation domain of the Enterococcus faecalis NAD+ ligase, EfaLigA (PDB: 1TAE). Figure adapted from Gajiwala, K., and Pinko, C. (2004) Structure 12, 1449-1459.

(Lys115, Glu173, Lys290, and Lys314); (ii) comprise the interface between the

NMN-binding domain (subdomain Ia) and the nucleotidyltransferase domain or comprise part of

a nick-binding site on the surface of the nucleotidyltransferase domain (Arg200 and

Arg208); or (iii) stabilize the active site fold of the nucleotidyltransferase domain

(Arg277). Analysis of mutational effects on the isolated ligase-adenylylation and

phosphodiester formation reactions revealed different functions for essential side chains at

different steps of the DNA ligase pathway, consistent with the proposal that the active site

is serially remodeled as the reaction proceeds. The crystal structures of TfiLigA and of the

adenylation domain of EfaLigA suggest that the adenylation domain adopts 'open' and

'closed' conformations (Figure 1.8), through the relative movement of its two subdomains Ia and Ib, between

the successive adenylation steps of enzyme and substrate.
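The motif I consensus (KXDG) mentioned in this section can be located in a protein sequence with a simple pattern search. The following is a minimal sketch in Python; the function name `find_motif_i` and the toy sequence are hypothetical and serve only as an illustration (the toy sequence is not a real ligase sequence). A KXDG match on its own does not identify a nucleotidyltransferase, since the tetrapeptide can occur by chance; it only flags candidate positions for the active-site lysine.

```python
import re

# Motif I consensus from the text: K-X-D-G, where X is any residue.
# A lookahead is used so that overlapping candidates are also reported.
MOTIF_I = re.compile(r"(?=(K.DG))")


def find_motif_i(sequence):
    """Return (1-based position of the lysine, matched tetrapeptide) per hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF_I.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any real ligase.
toy_sequence = "MTAEKLDGVRAL"
print(find_motif_i(toy_sequence))  # [(5, 'KLDG')]
```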

1.3.2 Oligomer-binding (OB) fold domain

In all DNA ligases the adenylation domain (domain 1) is connected to a conserved domain 2

(Figure 1.6). The T7 ATP ligase and TfiLigA structures (Subramanya et al., 1996; Lee et

al., 2000) revealed that this domain has an OB-fold, a derivative of the Greek key motif that is also

found in the structures of many proteins that bind to single-stranded and double-stranded

(ds) DNA and RNA (Suck, 1997). This fold is found in a diverse range of protein families,

including the bacterial ribosomal proteins S1 (Bycroft et al., 1997) and S17 (Jaishree et

al., 1996), the subunits of replication protein A (Bochkarev et al., 1999), the telomere

end-binding protein (Horvath et al., 1998), bacterial cold shock proteins CspA and CspB

(Schindler et al., 1998) and translation initiation factor (IF) 5A (Peat et al., 1998). A

structural comparison is shown in Figure 1.10. A number of co-crystal structures of these

domains bound to DNA and RNA have established that the OB fold mediates

polynucleotide recognition (Suck, 1997). Biochemical studies have shown that the OB

domain of T7 ATP ligase binds dsDNA and also dramatically enhances the adenylation

activity of domain 1. A direct physical interaction between these domains has been

demonstrated by gel filtration (Doherty and Wigley, 1999). Compared with that of TfiLigA, the

equivalent domain in T7 DNA ligase is a much shortened version of the OB-fold. Also,

compared with TfiLigA, its orientation in the non-covalent ATP complex is rotated around

the loop just before the first strand of the OB-fold domain, so that its expected

DNA-binding surface does not directly face the active site (Subramanya et al., 1996). This is

understandable, because the adenylation site should not be blocked by bound DNA


until the conserved lysine at the active site is adenylated. This orientation may change

upon self-adenylation so that the putative DNA-binding groove will be completed.

The recently discovered NAD+ ligases from entomopoxviruses also have a complete

OB-fold domain (Sriskanda et al., 2001). Since T7 DNA ligase with a more compact OB-fold

domain is fully functional, it is suggested that domain 1 together with the OB-fold domain

is the minimal unit of the bacterial DNA ligases and that this minimal ligase should have

both nick-sensing and ligation activities. This is further supported by the recent finding

that the 298-residue ChV ATP ligase, the smallest eukaryotic DNA ligase known, has

intrinsic specificity for binding to nicked duplex DNA (Odell and Shuman, 1999).

Site-directed mutagenesis studies on human DNA ligase III have also implicated this motif in

the interaction with nicked DNA (Mackey et al., 1999).

1.3.3 Zinc finger motif

Four cysteine residues are conserved in the C-terminal region of NAD+-dependent

ligases and they have been implicated in zinc binding and interaction with DNA. Among

ATP ligases, only human DNA ligase III has so far been reported to have a zinc finger motif, at its

N-terminus (Figure 1.5). Atomic emission spectroscopy confirmed that TfiLigA binds zinc

ions (Lee et al., 2000). In TfiLigA, a Zn ion is tetrahedrally liganded by the four

conserved cysteine residues (Cys406, Cys409, Cys422 and Cys427). This single Zn finger

forms a subdomain (3a) of the larger domain 3 of TfiLigA (Figure 1.6). The overall fold of

this zinc finger is similar to that of other Cys4-type zinc fingers in the DNA-binding domains

(DBDs) of different steroid/nuclear hormone receptor families, such as the estrogen receptor

(Figure 1.10) (Klug and Schwabe, 1995; Mackay and Crossley, 1998). Such DBDs of

these protein families can also bind to cognate or non-cognate DNA targets as a monomer

(Gewirth and Sigler, 1995). Conceivable roles for the TfiLigA zinc finger motif

(subdomain 3a) may include a direct interaction with the nicked DNA as well as a

structural support for helix-hairpin-helix motifs and BRCT domain which form subdomain

3b and domain 4. This suggestion is largely consistent with the results of mutagenesis of

the zinc-coordinating cysteines, which abolished DNA-binding activity in TfiLigA

(Jeon et al., 2004) as well as in Thermus thermophilus DNA ligase (Luo and Barany, 1996).

The possibility that the Zn finger in NAD+ ligases may be involved in recognizing the nick

in duplex DNA deserves study as it has been demonstrated that the human DNA ligase III

Zn finger forms a specific complex with a nick in duplex DNA (Mackey et al., 1999).
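The spacing of the four zinc-coordinating cysteines follows directly from the residue numbers quoted above for TfiLigA (Cys406, Cys409, Cys422 and Cys427). The short Python sketch below works out the number of intervening residues and renders them in the common C-x(n)-C shorthand; the function names are hypothetical, and only the four residue numbers come from the text.

```python
# Zinc-coordinating cysteines of TfiLigA, as listed in the text.
ZN_CYSTEINES = [406, 409, 422, 427]


def spacer_lengths(positions):
    """Number of residues lying between each pair of consecutive positions."""
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def spacing_pattern(positions, residue="C"):
    """Render the spacing in C-x(n)-C shorthand."""
    parts = [residue]
    for n in spacer_lengths(positions):
        parts.append(f"x({n})")
        parts.append(residue)
    return "-".join(parts)


print(spacer_lengths(ZN_CYSTEINES))   # [2, 12, 4]
print(spacing_pattern(ZN_CYSTEINES))  # C-x(2)-C-x(12)-C-x(4)-C
```

The resulting pattern describes TfiLigA only; whether other NAD+ ligases share exactly the same spacing would have to be checked against their own sequences.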

Fig 1.10: Comparisons of different domains present in NAD+ ligases

(A) OB-fold domains of Tfi ligase, human replication protein A subunit (RPA14), yeast aspartyl tRNA synthetase and E. coli translation initiation factor 1 (IF1) are shown in similar orientations.

(B) Cys4-type zinc fingers of Tfi ligase, human estrogen receptor DNA-binding domain (ER DBD), rat glucocorticoid receptor DNA-binding domain (GR DBD) and chicken erythroid transcription factor (GATA-1) are shown in similar orientations.

(C) Similarly, HhH motifs of Tfi ligase (top row; helix pairs O-P, R-S, U-V and X-Y) are shown in similar orientations with HhH motifs from other sources indicated in the figure (panel labels: Pol β, RuvA, endonuclease III and AlkA).

1.3.4 Helix-hairpin-helix motif domain

Doherty et al. (1999) predicted the presence of four copies of the conserved

helix-hairpin-helix (HhH) motif in the C-terminal region of NAD+ ligases. So far, all

eubacterial NAD+ ligases have been found to contain these motifs, but the number of helices varies from

species to species. TfiLigA provides a unique example in which the four clustered HhH

motifs form a single compact structure (subdomain 3b). These are the helix pairs O-P, R-S,

U-V and X-Y with the intervening hairpins (residues 430-460, 474-498, 502-528 and

537-560, respectively) (Figure 1.10). Interestingly, all the hairpins are located in a linear chain

at the bottom of this subdomain. This surface is also rich in positively charged residues.

Similar HhH motifs are present in a number of DNA repair enzymes (Doherty et al., 1996;

Arvind et al., 1999), including E. coli endonuclease III (Thayer et al., 1995), E. coli AlkA

(Labahn et al., 1996) and human polymerase β (Pol β) (Mullen and Wilson, 1997). A

structural comparison is shown in Figure 1.10. HhH motifs have been implicated in

non-sequence-specific DNA binding (Thayer et al., 1995; Doherty et al., 1996). In TfiLigA this

subdomain is suggested to provide one of the two DNA-binding sites.
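The residue ranges quoted above for the four HhH motifs of TfiLigA can be turned into segment lengths and inter-motif linker lengths with a few lines of arithmetic. The Python sketch below is illustrative only: the dictionary and function names are hypothetical, and the only data used are the four residue ranges given in the text.

```python
# Residue ranges of the four HhH motifs of TfiLigA, as quoted in the text.
HHH_SEGMENTS = {
    "O-P": (430, 460),
    "R-S": (474, 498),
    "U-V": (502, 528),
    "X-Y": (537, 560),
}


def segment_lengths(segments):
    """Length in residues of each segment (both end residues included)."""
    return {name: end - start + 1 for name, (start, end) in segments.items()}


def linker_lengths(segments):
    """Number of residues between consecutive segments, in sequence order."""
    ordered = sorted(segments.values())
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


print(segment_lengths(HHH_SEGMENTS))  # {'O-P': 31, 'R-S': 25, 'U-V': 27, 'X-Y': 24}
print(linker_lengths(HHH_SEGMENTS))   # [13, 3, 8]
```

Together the four motifs and their linkers span residues 430 to 560, that is 131 residues, consistent with their forming a single compact subdomain.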

1.3.5 BRCA1-like C-terminal domain

The final structural motif currently found in both ATP- and NAD+-dependent DNA

ligases is a member of the BRCA1-like C-terminal (BRCT) domain superfamily (Bork et al., 1997;

Callebaut and Mornon, 1997). BRCT domains are present in NAD+-dependent ligases and

eukaryotic ligases III and IV. The structure of TfiLigA (Lee et al., 2000) is the first case in

which a BRCT domain has been seen as part of a four-stranded parallel β-sheet flanked by

three α-helices. This fold is grossly similar to that of the C-terminal BRCT domain of the

human repair protein XRCC1 (Zhang et al., 1998). XRCC1 is a multidomain protein

involved in the repair of single-strand breaks in DNA. The most significant characteristic

of the TfiLigA BRCT domain is its high mobility as a whole.

The BRCT domain present in LigA is a distinct version of its kind and is shared by

the large subunits of eukaryotic replication factor C and PARP (poly(ADP-ribose)

polymerase) (Bork et al., 1997). Evolutionarily, it must be the ancestor of eukaryotic

BRCT domains. Mammalian XRCC1 forms repair complexes with DNA ligase III, PARP

and Pol β. The two BRCT domains of XRCC1 interact with PARP and DNA ligase III,

while the N-terminal domain of XRCC1 interacts with Pol β (Marintchev et al., 1999).

The XRCC1 C-terminal BRCT domain forms a specific heterodimer in vitro with the


BRCT domain of mammalian DNA ligase IIIα (Nash et al., 1997). All these available data

suggest a plausible scenario for NAD+ ligase function: after other DNA repair

proteins/enzymes recognize and repair damaged DNA, the ligase is recruited to the nick site for

ligation through protein-protein interactions with its BRCT domain. However, the involvement of the BRCT

domain in other, as yet uncharacterized, functions cannot be ruled out.
