final gsk john a l short report


Upload: john-alexander-logan-short

Post on 07-Aug-2015


Page 1: Final GSK John A L Short Report

Page 1 of 23

Modified fermentation conditions to enhance recombinant protein and plasmid production in E. coli

John A L Short

Abstract

Soluble recombinant proteins and high plasmid yields are in high demand at GSK. Selected fusion partners, host strains and chaperones (Takara) were evaluated as approaches to enhance the production of recombinant proteins in E. coli. His-MBP increased soluble protein expression more than the other fusion partners tested. Rosetta-gami 2 proved to be the optimum host strain for the proteins studied. Takara chaperones had varying degrees of impact; for the majority of the proteins they were detrimental to soluble protein expression. The chaperone teams dnaK-dnaJ-grpE-groES-groEL and dnaK-dnaJ-grpE increased expression the most when induced at the same time as the target protein. Of the several media evaluated, TURBO supported the highest specific yield of plasmid DNA, whereas MEG had the highest volumetric yield of plasmid DNA.

Glossary

E. coli: Escherichia coli
O: Origami 2
IPTG: Isopropyl-beta-D-thiogalactopyranoside
MCS: Microbial Culture Sciences
GST: Glutathione S-transferase
His: Hexahistidine tag (His6)
SUMO: Small Ubiquitin-related Modifier
GFP: Green fluorescent protein
MBP: Maltose Binding Protein
AE: Arctic Express
RG: Rosetta-gami 2
OD: Optical density
SDS-PAGE: Sodium dodecyl sulfate polyacrylamide gel electrophoresis

Introduction

Recombinant protein expression is one of the integral technologies employed by the pharmaceutical industry as part of the drug discovery process. These proteins are frequently required at different stages, from target validation to late-stage manufacturing, where the recombinant protein itself may be the therapeutic molecule. GlaxoSmithKline (GSK), Microbial Culture Science (MCS) uses E. coli as its major expression system due to its ability to grow rapidly and at high cell density. Proteins can be produced on a large scale cheaply and in a short amount of time compared to other expression systems (1). Recombinant proteins are expressed in E. coli either as aggregated proteins in inclusion bodies or as soluble proteins. In the soluble form, the recombinant protein is typically in the correct conformation and often biologically active, thus simplifying downstream processing. Misfolded insoluble proteins can be refolded, but this is a complex, time-consuming and expensive process that does not always yield an active protein (2). This is a significant bottleneck in drug discovery. Many potential protein drug targets such as kinases, phosphatases, membrane-associated proteins and many other enzymes are extremely difficult to produce as soluble proteins in E. coli (3).

In addition to protein production, E. coli can be used to generate plasmids, for example for the transfection of eukaryotic cells to express soluble proteins that would otherwise be difficult to express in E. coli. E. coli is not suitable for the production of many large, complex proteins containing disulfide bonds, or proteins that require post-translational modifications (4). It is important to develop new protocols to obtain high-quality plasmids at high yield, as there is little published information on large-scale plasmid production (5).

There are two main strategies for overcoming the limitations of E. coli in producing soluble proteins. The first involves refolding from inclusion bodies and the second is modifying the expression strategy (6) (see Figure 1). Traditional methods of modifying the expression strategy have been altering the culture medium, the incubation temperatures and the induction regime (4). In recent years there has been an explosion in knowledge regarding the genetic structure of E. coli, which has led to new tools being developed. This report will focus on several of these approaches.

Fusion partners and Tags

The majority of research in recent years has been around improving the solubility of proteins through the use of different tags. These tags are either proteins (fusion partners) or peptides that are fused to the protein of interest, which may lead to enhanced folding and solubility of the protein (Figure 2).

Figure 2. Protein folding pathways with fusion partners. Fusion leads either to increased solubility of the target protein or to its continued insolubility (3)

Many of these tags are used in protein purification as affinity tags. However, certain tags, such as His, have been reported to be detrimental to protein solubility (7). In combination with certain other tags (Table 1), the His tag has been reported to aid protein folding and significantly improve purification. For example, one disadvantage of MBP is that it does not always bind sufficiently to the amylose resin matrix during protein purification. It is usually not possible to obtain protein of sufficient quality in a single affinity step, so the His tag is used as a supplementary tag (8). All of the tags used in this work are situated at the N-terminus of the target protein. This confers increased efficiency of translation initiation by the ribosomes, thus increasing protein expression compared to using C-terminal fusion partners (9).

Host strains

Over the past 20 years, many specialised host strains have been developed to overcome a variety of metabolic problems related to high-level protein expression (10). Protein degradation was partly countered by the development of the BL21 (DE3) strain by Novagen, which is deficient in both the lon and ompT proteases. This remains the gold standard today, with its derivative BL21* (DE3) containing the rne131 mutation. This mutation encodes a truncated RNase E enzyme that lacks the ability to degrade mRNA, resulting in increased mRNA stability and thus increased protein expression (10). The Origami 2 (Origami from here on in) strain has mutations in the thioredoxin reductase (trxB) and glutathione reductase (gor) genes, which enhance disulfide bond formation in the cytoplasm (10). The Rosetta-gami 2 (Rosetta-gami from here on in) host strain is based on the Origami strain, enhancing disulfide bond formation in the cytoplasm when recombinant proteins are expressed. Rosetta-gami contains the pRARE2 plasmid from the Rosetta 2 strain to alleviate codon bias. By supplying rare tRNAs for codons rarely used in E. coli, the plasmid provides for “universal” translation, which would otherwise be

Figure 1. Downstream applications employed to obtain soluble proteins from recombinant E. coli (2)


Table 1

Fusion Partner (FP)/Tag | Advantages | Disadvantages
GST FP | Efficient translation initiation; inexpensive affinity resin; mild elution conditions | High metabolic burden; homodimeric protein; target protein often inactive when FP removed or precipitates
(His6) MBP FP | Efficient translation initiation; inexpensive affinity resin; enhances solubility; mild elution conditions | High metabolic burden; target protein often inactive when FP removed or precipitates
(His6) SUMO FP | Efficient translation initiation; might enhance solubility | Not by itself an affinity tag
FLAG Tag | Low metabolic burden; high specificity | Expensive affinity resin; harsh elution conditions
His6 Tag | Low metabolic burden; inexpensive affinity resin; mild elution conditions; tag works under both native and denaturing conditions | Specificity of IMAC is not as high as other affinity methods; does not enhance solubility (?)
GFP | Online real-time monitoring of expression | Metabolic burden on cell

The advantages and disadvantages of using certain fusion partners and tags (8)

limited for those that are frequently used in eukaryotic protein expression (10). Arctic Express is derived from BL21 (DE3), engineered to express cold-adapted chaperone Cpn60 and co-chaperone Cpn10 from the psychrophilic bacterium, Oleispira antarctica. These chaperones confer an enhanced ability to the host strain to grow at lower temperatures and properly process recombinant proteins, thus increasing the amount of available soluble protein (11).

Chaperones

Another approach for the prevention of inclusion body formation is the co-overexpression of molecular chaperones. Chaperones are proteins whose function is to assist other proteins in achieving their correct conformation, either by assisting non-covalent folding or by preventing protein aggregation following expression. They are also heat shock proteins, used at times of cellular stress such as high temperatures to maintain protein conformations, but in this instance they have been manipulated to aid expression (6). To enable efficient expression, the molecular chaperones are designed as “chaperone teams” that work in cooperation to aid protein folding (Table 2).

Table 2

Plasmid | Chaperone Team | Promoter | Inducer
pG-KJE8 | dnaK-dnaJ-grpE | araB | L-Arabinose
pG-KJE8 | groES-groEL | Pzt1 | Tetracycline
pGro7 | groES-groEL | araB | L-Arabinose
pKJE7 | dnaK-dnaJ-grpE | araB | L-Arabinose
pG-Tf2 | groES-groEL-tig | Pzt1 | Tetracycline
pTf16 | tig | araB | L-Arabinose

Takara Plasmid Chaperone set (12). The chaperones are in specific chaperone teams (see text).
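The plasmid-to-inducer relationships in Table 2 can be captured as a small lookup, which makes it easy to see which inducers a given chaperone plasmid needs. The following is a minimal sketch only: the names `CHAPERONE_PLASMIDS` and `inducers_for` are illustrative and not part of the Takara protocol, and the entries are transcribed from Table 2.

```python
# Chaperone plasmids from Table 2, as (chaperone team, promoter, inducer) tuples.
# The structure and names here are illustrative assumptions, not Takara's own.
CHAPERONE_PLASMIDS = {
    "pG-KJE8": [
        ("dnaK-dnaJ-grpE", "araB", "L-arabinose"),
        ("groES-groEL", "Pzt1", "tetracycline"),
    ],
    "pGro7": [("groES-groEL", "araB", "L-arabinose")],
    "pKJE7": [("dnaK-dnaJ-grpE", "araB", "L-arabinose")],
    "pG-Tf2": [("groES-groEL-tig", "Pzt1", "tetracycline")],
    "pTf16": [("tig", "araB", "L-arabinose")],
}


def inducers_for(plasmid: str) -> list[str]:
    """Return the sorted, de-duplicated inducers needed for one plasmid."""
    return sorted({inducer for _, _, inducer in CHAPERONE_PLASMIDS[plasmid]})


if __name__ == "__main__":
    for name in CHAPERONE_PLASMIDS:
        print(name, "->", ", ".join(inducers_for(name)))
```

Under this reading of Table 2, pG-KJE8 is the only plasmid that needs both inducers, because it carries two teams under different promoters.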

The chaperone teams are based on the DnaK chaperone system and the GroE chaperone system, which affect protein aggregation and refolding respectively. The DnaK chaperone system includes the dnaJ and grpE chaperones on the basis that these three proteins are known to act in concert; the same applies to the groES and groEL chaperones, which constitute the GroE chaperone system. These systems are described in Figure 3. DnaK and dnaJ act by binding transiently to exposed regions in unfolded or partially folded peptide chains and releasing them in an unfolded form, requiring grpE for the protein to fold. GroES and groEL act by encapsulating the protein, providing an isolated environment in which it can fold as well as actively assisting protein conformation (13) (Figure 3).


Plasmid Yield Optimization

High plasmid yields are obtained by using E. coli for plasmid generation, which is necessary, for example, for the large-scale transfection of eukaryotic cells. Eukaryotic cells are used as expression systems for the production of many proteins, including membrane-bound proteins and nuclear receptor proteins destined for high throughput screening (HTS) against potential drug compounds. A typical batch required for HTS is 1 x 10^9 successfully transfected cells. The typical transfection efficiency is 80%, thus high plasmid yields are required to obtain sufficient numbers of transfected cells (14). Plasmid yield is typically improved by optimizing the growth conditions of E. coli under selective pressure for the plasmid (15). Some groups use MEG as their medium of choice, whereas at MCS Harlow several different media are evaluated. It has been reported in the literature that changing the medium composition can dramatically alter the plasmid yield obtained (5). Approaches for optimising plasmid yield were compared by growing the same host strain in different media at different growth temperatures.
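The batch-size figures above imply a simple calculation: if 1 x 10^9 successfully transfected cells are needed and 80% of cells are transfected, more than 1 x 10^9 cells must enter the transfection. A minimal sketch, assuming efficiency is expressed as a fraction of the starting cells; the function name `cells_to_transfect` is illustrative:

```python
import math


def cells_to_transfect(target_transfected: float, efficiency: float) -> int:
    """Cells that must enter transfection to reach the target number
    of successfully transfected cells at the given efficiency (0-1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return math.ceil(target_transfected / efficiency)


if __name__ == "__main__":
    batch = cells_to_transfect(1e9, 0.80)
    print(f"Cells required: {batch:.2e}")
```

At 80% efficiency this comes to 1.25 x 10^9 cells per batch, which is what drives the plasmid requirement.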

Materials and Methods

Targets to be Evaluated

To evaluate soluble expression under different conditions, “in-house” constructs were used. Eleven proteins (A-K) were analysed using selected approaches depending on the solubility issues for each protein. Plasmid X for plasmid yield optimisation was obtained “in-house”.

Chaperones

The chaperone plasmids encoding the chaperone teams were obtained from the

Figure 3. Diagram of the DnaK and GroE chaperone systems in E. coli (12). As newly synthesized proteins leave the ribosome they associate with the trigger factor chaperone. Trigger factor sequesters the exposed hydrophobic patches, preventing incorrect inter- and intramolecular interactions. DnaK is targeted to high-affinity sites by the co-chaperone DnaJ, which activates tight substrate binding by triggering hydrolysis of DnaK-bound ATP. Substrate ejection is controlled by GrpE-catalyzed ADP/ATP exchange. Once released, a newly synthesized protein may reach a native conformation, be proteolysed, or be transferred to the downstream GroE system, which handles about 10% of newly synthesized host proteins. GroEL is an oligomer organized as two stacked homoheptameric rings, one of which is always bound by the co-chaperone GroES. GroEL substrates are bound by the free ring and allowed to fold within the central chamber in a process controlled by reversible GroES capping and conformational changes orchestrated by ATP binding and hydrolysis. This leads either to a protein in its correct conformational state or, if this fails, to proteolysis (13).


company Takara developed by the HSP Research Institute (12) (see Table 2).

Host Strains

The plasmids encoding the genes for the target proteins were transformed (16) into the relevant host strain (Table 4). Host strains BL21* (DE3), BL21 (DE3), Rosetta-gami, Origami and Top10 were purchased from Invitrogen. Proteins F-K were co-transformed with the chaperone plasmids (Table 3) into BL21* (DE3) using the Takara transformation protocol (12). Co-transformation was not advised by Takara, but this was performed in an effort to reduce timelines. 1.5 μl of beta-mercaptoethanol was added to prime the host strains for transformation (11). Plasmid X was transformed (16) and generated in Top10 for initial small-scale evaluation and for 2.5 litre large-scale optimisation using MEG culture medium.

Basic fermentation protocol for evaluation with Tags, host strains and chaperones

The target protein was expressed using batch fermentation in 500 ml Erlenmeyer flasks with 100 ml medium using the conditions in Table 4.

Media used: “In-house” media: LB and MTB. Commercial media: Power, Superior, Turbo and Hyper, all called Athena media and supplied by Stratech Scientific. Circle Grow was supplied by MP Biochemicals.

Production flasks: All production flasks were inoculated with 2% (v/v) of seed, the relevant antibiotic, and either 1% (v/v) glucose (MTB/LB), 0.5% (v/v) glycerol (Power/Turbo), or 5% (v/v) glucose nutrient mix (Hyper). No carbon source was added to Superior and Circle Grow media, as it is already included in the dry components. Flasks were incubated at the relevant temperature (Table 4) and shaken at 240 rpm (5 cm throw) until induction.

Induction: In all cases, expression was initiated by adding IPTG at the appropriate concentration (Table 4). Post-induction temperatures were either 37ºC (3 hours), 25ºC (20 hours) or 18ºC (20 hours). When the target protein was expressed with the chaperone teams, induction of the chaperone teams at inoculation was compared with induction at the same time as the target protein. Two concentrations of inducer for the appropriate chaperone teams (Table 2) were examined: L-arabinose was tested at 4 mg/ml and 0.5 mg/ml, and tetracycline at 10 ng/ml and 1 ng/ml.

Analytical samples: 1 ml samples were removed for analysis at various time points during the fermentation and at harvest. Samples were centrifuged at 13,200 rpm for 1.5 minutes. The supernatant was discarded and the pellets frozen. Samples were prepared by adding either OD (600nm)/10 of SDS to the pellet (total) or, to evaluate soluble protein, OD (600nm)/10 of BugBuster Mastermix (Novagen). In the latter instance, samples were centrifuged at 13,200 rpm for 20 minutes at 4ºC. The lysate was analysed with an equal amount of SDS-Reduced (soluble), and 200 μl SDS-Reduced was added to the pellet (insoluble).

Shake flask seeds: All shake flask seeds (LB) were grown at 37ºC overnight at 240 rpm. The medium also contained 1% (v/v) glucose and the antibiotic selection marker.

SDS-PAGE: Novex 4-20/10-20% Tris/Glycine gels were run in 10% Laemmli electrode buffer and subsequently stained using Coomassie blue.

Western Blot: Gels were prepared as above and semi-dry blotted for 30 minutes. The blot was incubated overnight in 1 g of Marvel in 20 ml PBS & 1% (v/v) Tween. A dilution of the relevant primary antibody was added and incubated for 2 hours, followed by washing with PBS & 10% (v/v) Tween. The relevant secondary antibody was added (1:1000 dilution) and incubated for a further 2 hours, followed by washing. 1 ml ECF substrate (fluorescence substrate) was added and the blot viewed under UV light.

The protein band of interest (as determined by molecular weight) was excised from the SDS-PAGE gel and digested with trypsin.
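The production-flask additions described above are percentages by volume, so the volumes for a 100 ml flask follow directly. The sketch below assumes each percentage is taken relative to the medium volume (the report does not say whether it is relative to medium or final volume); the names `CARBON_PERCENT_VV` and `flask_additions` are illustrative.

```python
# Carbon-source additions in % (v/v), transcribed from the production-flask text.
# Superior and Circle Grow already contain their carbon source, so nothing is added.
CARBON_PERCENT_VV = {
    "LB": ("glucose", 1.0),
    "MTB": ("glucose", 1.0),
    "Power": ("glycerol", 0.5),
    "Turbo": ("glycerol", 0.5),
    "Hyper": ("glucose nutrient mix", 5.0),
    "Superior": (None, 0.0),
    "Circle Grow": (None, 0.0),
}
SEED_PERCENT_VV = 2.0  # seed inoculum, % (v/v)


def volume_ml(medium_ml: float, percent_vv: float) -> float:
    """Volume to add for a given % (v/v), assumed relative to medium volume."""
    return medium_ml * percent_vv / 100.0


def flask_additions(medium: str, medium_ml: float = 100.0) -> dict:
    """Seed and carbon-source volumes (ml) for one production flask."""
    source, percent = CARBON_PERCENT_VV[medium]
    return {
        "seed_ml": volume_ml(medium_ml, SEED_PERCENT_VV),
        "carbon_source": source,
        "carbon_ml": volume_ml(medium_ml, percent),
    }


if __name__ == "__main__":
    for medium in CARBON_PERCENT_VV:
        print(medium, flask_additions(medium))
```

For a 100 ml Turbo flask this gives 2 ml of seed and 0.5 ml of glycerol; antibiotic and IPTG volumes depend on stock concentrations that the report does not give, so they are left out.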


Table 4

Protein | Fusion Partner/Tag | Host Strain | Media | Carbon Source | Induction (OD600nm) | IPTG Concentration (mM) | Incubation Temperature (ºC)

Fusion Partners/Tags Evaluation
A | His, GST, MBP-His, FLAG, Untagged | BL21* (DE3) | LB, MTB, POWER, TURBO | Glucose/Glycerol | 2 | 1.0/0.5/0.1 | 37/25
B | His, GST, MBP-His, FLAG, SUMO-His, GFP | BL21* (DE3) | LB, MTB, POWER, TURBO | Glucose/Glycerol | 2 | 1.0/0.5/0.1 | 37/25/18

Host Strain Evaluation
B | His | BL21* (DE3), BL21 (DE3), RG, O, AE | LB | Glucose/Glycerol | 0.7/1.7 | 0.5 | 37
C | GST | BL21* (DE3), BL21 (DE3), RG, O, AE | LB | Glucose/Glycerol | 0.7/1.7 | 0.5 | 37
D | GST | BL21* (DE3), BL21 (DE3), RG, O, AE | LB | Glucose/Glycerol | 0.7/1.7 | 0.5 | 37
E | His | BL21* (DE3), BL21 (DE3), RG, O, AE | LB | Glucose/Glycerol | 0.7/1.7 | 0.5 | 37

Chaperone Evaluation
B | His | BL21* (DE3) | LB | Glucose | 0.3 | 0.5 | 25
F | His | BL21* (DE3) | LB | Glycerol | 2.7 | 0.5 | 18
G | His | BL21* (DE3) | LB | Glycerol | 1 | 0.5 | 18
H | Untagged | BL21* (DE3) | LB | Glycerol | 1.4 | 0.5 | 18
I | Untagged | BL21* (DE3) | LB | Glycerol | 1.4 | 0.5 | 18
J | His | BL21* (DE3) | LB | Glycerol (2%) | 0.4 | 1 | 37
K | His | BL21* (DE3) | LB | Glycerol | 0.4 | 0.5 | 25

Experimental conditions for evaluating approaches for increasing soluble protein expression.

The peptide fragments were analysed by MALDI-TOF MS, and the sequence of the protein was confirmed by comparison to an “in-house” protein sequence database. This process is called peptide mass fingerprinting (PMF) (17).

Protein C and D small scale GST-Purification

Protein concentrations were determined by the Bradford assay (18) and normalised for the protein concentration of the cells so that purity could be compared. The samples were then purified using one-step GST purification (19) with a resin matrix and run on an SDS-PAGE gel at a 1:4 SDS to protein ratio.

Protein C Scale Up and Purification

Protein C was scaled up with the optimized host strain found from small-scale optimisation. It was expressed in 20 500 ml Erlenmeyer flasks, each with 100 ml medium, using the optimized condition found previously, and the cells were harvested. 10 g of this material was purified using a large-scale GST one-step purification column (20). Ten 1 ml elutions were collected, processed and run on an SDS-PAGE gel at a 1:4 SDS to protein ratio.

Protein E His Purification

Protein E was purified using the Ni-NTA slurry Qiagen protocol (20).

Protein H Scale Up

Protein H was scaled up, along with a no-chaperone control, to 3 litres using 30 500 ml Erlenmeyer flasks, each with 100 ml medium, using the optimized condition found, and the cells were harvested. Samples were removed and run on an SDS-PAGE gel.


Plasmid X Yield Optimisation

Plasmid X transformed into E. coli Top 10 was grown at 37ºC (8 hours), 30ºC (20 hours) and 25ºC (20 hours) in LB, MTB, Superior, Hyper, Power, Turbo and Circle Grow media with the appropriate carbon source (glucose/glycerol) in 500 ml Erlenmeyer flasks with 100 ml medium, shaken at 240 rpm (5 cm throw). All samples were taken in triplicate in 2 ml Eppendorf tubes, purified using the Promega Wizard Plus MiniPrep protocol (20) and analysed using a NanoDrop spectrophotometer supplied by LabTech (21). Cells were harvested using a Sorvall RC 12 BP centrifuge and plasmids were purified from 10 g of cell paste using the Qiagen GigaPrep protocol (19). A miniprep sample (2 ml scale) was also generated and analysed using a NanoDrop spectrophotometer.

Results

Fusion Partners and Tags

Protein A

The optimised expression conditions were identified using SDS-PAGE and are shown in Table 5.

Table 5

Protein A Fusion Partner/tag | Medium | Carbon Source (0.5% v/v) | Induction IPTG conc. (mM) | Induction Temp. (ºC)
Untagged | LB | Glycerol | 1.0 | 37
GST-HIS | LB | Glycerol | 0.5 | 25
HIS | POWER | Glycerol | 1.0 | 25
MBP-HIS | MTB | Glycerol | 1.0 | 25
FLAG | TURBO | Glycerol | 0.5 | 37

Table of optimised conditions found for constructs expressing Protein A

The samples from the above optimised conditions were run on the same gel (Figure 4). Lane 4 shows the most significant overexpressed band at the expected molecular weight for Protein A compared to the other tags. This indicates that Protein A has greater soluble expression when fused with His-MBP than with the other tags examined. His and FLAG tags appeared to reduce soluble protein expression compared to the untagged version of Protein A. Glycerol was the best carbon source for all of the constructs evaluated.

Figure 4. SDS-PAGE gel of soluble protein expressed for the optimised conditions of each Protein A construct. Lane order: M 1 2 3 4 5
M: SeeBlue Plus II Protein Marker
Lane 1: Protein A untagged: 24kDa
Lane 2: His-GST-Protein A: 60kDa
Lane 3: His-Protein A: 32kDa
Lane 4: HIS-MBP-Protein A: 74kDa
Lane 5: FLAG-Protein A: 34kDa

Protein B (His)

In terms of total cell protein, bands of Protein B were observed for all of the constructs and their identity was confirmed by PMF. However, no soluble protein was observed on the SDS-PAGE gel. Western blots confirmed that no soluble protein had been expressed at detectable levels.

Host Strains

Protein B (His)

No soluble Protein B was expressed at detectable levels using any of the host strains.

Protein C (GST)

Following expression of Protein C using the different host strains, the samples were processed and analysed on an SDS-PAGE gel. Results showed that soluble protein had been expressed (confirmed by PMF). The lysate samples were normalised for protein concentration using the Bradford assay and purified using GST one-step purification. The eluted protein samples were run on SDS-PAGE (Figure 5). Conditions employing glucose as the carbon source showed poorer expression than conditions using glycerol.

Figures 6 and 7 lane order: 1 2 3 4 5 6 7 8 9 10 M P Pl L B

Figure 5. SDS-PAGE gel of eluted samples post GST one-step purification (1:4 SDS:protein ratio)
M = SeeBlue Plus II Marker
P = pre-induction IPTG
1 = BL21* (DE3); glycerol; induced at OD 0.7
2 = Rosetta-gami; glycerol; induced at OD 0.7
3 = Arctic Express; glycerol; induced at OD 0.7
4 = Origami; glycerol; induced at OD 0.7
5 = BL21* (DE3); glycerol; induced at OD 1.7
6 = Rosetta-gami; glycerol; induced at OD 1.7
7 = Arctic Express; glycerol; induced at OD 1.7
8 = Origami; glycerol; induced at OD 1.7

Eluted sample number 6 (Figure 5) contained the fewest contaminating bands (possibly caused by incomplete translation or by clipping of the protein through enzymatic degradation). Thus the optimised conditions for Protein C were growth in Rosetta-gami in LB with 0.5% (v/v) glycerol at 37ºC, reduced to 18ºC, with induction using 0.5 mM IPTG at OD 1.7. This condition was scaled up to 2 litres. 10 g of the cells generated were purified using a one-step GST column and compared with 10 g of cells from the original condition (BL21 (DE3)). The samples were run on SDS-PAGE (Figures 6 and 7). Focusing on the circled areas, there are fewer bands surrounding Protein C with Rosetta-gami (Figure 7) than with BL21 (DE3) (Figure 6). These bands had interfered in previous attempts at purification, thus Rosetta-gami offers a “cleaner” product, which can potentially aid purification.

Figure 6. SDS-PAGE gel (1:4 SDS:protein ratio) of the original condition (BL21 (DE3)) for Protein C post GST one-step column purification. M = SeeBlue Plus II Marker; P = pre-induction IPTG; Pl = Pre-lysis; L = Lysate; B = Blank

Figure 7. SDS-PAGE gel (1:4 SDS:protein ratio) of the host strain optimised condition (Rosetta-gami) post GST one-step column purification. M = SeeBlue Plus II Marker; P = pre-induction IPTG; Pl = Pre-lysis; L = Lysate; B = Blank

Protein D (GST)

Similar results were obtained for Protein D (Lane 8, Figure 8): Rosetta-gami was the best host strain to use, having fewer contaminating bands. The optimal condition was LB, 37ºC reduced to 18ºC, with induction at OD 0.5 (1.0 mM IPTG added).

Figure 5 lane order: M P 1 2 3 4 5 6 7 8


Table 6

Table displaying the effects of the chaperone teams on the proteins examined.
(-): No soluble expression
(): Improvement compared to control without foldase
(=): Equivalent to control with foldase
(X): No improvement
(): Better than control with foldase

Figure 8. SDS-PAGE gel of eluted samples (1:4 SDS:protein ratio) of Protein D
M = SeeBlue Plus II Marker
P = Pre IPTG Control Sample
NAF = Non-Absorbed Fraction
1 = BL21* (DE3); glycerol; induced at OD 1.5
2 = BL21 (DE3); glycerol; induced at OD 1.5
3 = Rosetta-gami; glycerol; induced at OD 1.5
4 = Origami; glycerol; induced at OD 1.5
5 = Arctic Express; glycerol; induced at OD 1.5
6 = BL21* (DE3); glycerol; induced at OD 0.5
7 = BL21 (DE3); glycerol; induced at OD 0.5
8 = Rosetta-gami; glycerol; induced at OD 0.5
9 = Origami; glycerol; induced at OD 0.5

Protein E (His)

The best condition for increasing soluble expression and producing a “cleaner” sample used the host strain Rosetta-gami in MTB medium with glycerol as the carbon source, induced at OD 1.7 and incubated at 18ºC for 20 hours. Unfortunately, mass spectrometry showed that Protein D was deficient in size by around 2.7kDa. The construct was remade.

Takara Chaperones

For the majority of the proteins analysed, the chaperone teams had a detrimental effect on protein expression, reducing expression compared to that of the control (protein expressed without one of the chaperone teams) (Table 6). The chaperones did not enable the expression of Protein B in soluble form.

Protein F

The best condition for Protein F expression was the chaperone team dnaK-dnaJ-grpE-groES-groEL induced with the target protein. The optimal levels of inducers for the team were L-arabinose and tetracycline at 4 mg/ml and 10 ng/ml respectively.

Protein G

Protein G had equivalent soluble expression when expressed with the optimal chaperone team dnaK-dnaJ-grpE-groES-groEL compared to the foldase control (Figure 10).

Target Protein/Chaperone team | Protein F (His) | Protein G (His) | Protein H (untagged) | Protein I (untagged) | Protein J (His) | Protein H (His) | Protein B (His)
dnaK-dnaJ-grpE-groES-groEL | = = = X X X -
groES-groEL | X X X X -
dnaK-dnaJ-grpE | = X X X -
groES-groEL-tig | X X X X -
tig | X X X X -

Figure 8 lane order: 1 2 3 4 5 6 7 8 9 M P NAF

Figure 10. SDS-PAGE gel of Protein G (His tagged). Lane order: M P 1 C1 C2 1 (panels: Total Protein, Soluble Protein); marker band labels: 98kDa, 62kDa, 49kDa
M = SeeBlue Plus II Marker
P = pre-induction IPTG sample
C1 = Control 1: target protein expressed with foldases only
C2 = Control 2: target protein expressed without chaperones and foldases
1 = Target protein expressed with chaperone team dnaK-dnaJ-grpE-groES-groEL (induced with target protein)


The optimised levels of inducers for the team were L-arabinose and tetracycline at 4 mg/ml and 10 ng/ml respectively, induced at the same time as the target protein.

Protein H

After completing small-scale optimisation studies, the optimal chaperone teams found were dnaK-dnaJ-grpE-groES-groEL and dnaK-dnaJ-grpE. The optimal levels of inducers for the teams were L-arabinose and tetracycline at 4 mg/ml and 10 ng/ml respectively, induced at the same time as the target protein. These conditions were scaled up to 3 litres (shake flasks) and the cells harvested. Chaperone team dnaK-dnaJ-grpE was the optimal team, offering the greatest increase in soluble expression and a “cleaner” protein compared to both of the controls (Figure 11).

Figure 11. SDS-PAGE gel of Protein H (untagged)
M = SeeBlue Plus II Marker
P = Pre IPTG Sample
C1 = Control 1: target protein expressed with foldases only
C2 = Control 2: target protein expressed without chaperones and foldases (3 litre scale up)
A = Target protein expressed with chaperone team dnaK-dnaJ-grpE-groES-groEL (3 litre scale up) (induced with target protein)
B = Target protein expressed with chaperone team dnaK-dnaJ-grpE (induced with target protein) (3 litre scale up)

40 g of cell paste was harvested from both of the chaperone teams and 80 g was harvested from Protein H expressed by itself without chaperones or foldases in BL21* (DE3).

Plasmid Optimisation

Initial Small Scale Optimisation

Improving the yield and quality of plasmid DNA (pDNA) will reduce the amount of cell paste and the number of GigaPreps required. Stocks of pDNA are required for transient transfections and frozen cell supply. Twenty-four conditions were evaluated for Plasmid X in Top 10 (Table 7). Condition number 8 (red text) represented the previously used condition. However, this experiment showed that condition 13 (blue text) was clearly favoured, supporting an approximately 2-fold increase in plasmid yield from 193.2 ng/μl to 391.0 ng/μl. Thus the best condition was TURBO medium incubated at 37ºC for 8 hours. Glycerol was the optimum carbon source: plasmid yields with LB and glucose were 42% to 83% lower than with LB and glycerol, depending on the temperature (Table 7).

Optimisation and comparison of TURBO medium with MEG medium

It was established that another group in GSK used MEG medium for plasmid production (not previously evaluated by Harlow MCS). Thus, a comparison of MEG and TURBO media was carried out in shake flasks. Pellets (2 ml) were used for plasmid generation (miniprep scale) and the eluates were analysed using a NanoDrop spectrophotometer. TURBO medium surpassed MEG at most temperatures, with TURBO at 37ºC for 12 hours offering the best plasmid yield at 531.9 ng/μl compared to 504.7 ng/μl for MEG (Figure 12). However, MEG did offer the best OD600 value, 7.5 compared to 5.2 for TURBO at 37ºC. Dividing plasmid yield by the OD (600nm) value gives the plasmid yield per OD unit, thereby normalising plasmid yield for cell growth (Figure 13).
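The normalisation described above is a single division, and the reported figures can be checked directly. A minimal sketch, using the plasmid concentrations and OD600 values quoted in the text for 37ºC at 12 hours; the function name `yield_per_od` is illustrative:

```python
def yield_per_od(plasmid_ng_per_ul: float, od600: float) -> float:
    """Plasmid concentration normalised for cell growth (ng/ul per OD unit)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return plasmid_ng_per_ul / od600


# Values quoted in the text for 37 C, 12 hours (shake flasks).
turbo = yield_per_od(531.9, 5.2)
meg = yield_per_od(504.7, 7.5)

if __name__ == "__main__":
    print(f"TURBO: {turbo:.1f} ng/ul per OD unit")
    print(f"MEG:   {meg:.1f} ng/ul per OD unit")
```

The gap between the two media widens after normalisation because MEG reaches the higher cell density while yielding slightly less plasmid per unit volume.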

Figure 11 lane order: M P A B C1 C2 A B (panels: Total Protein, Soluble Protein); marker band labels: 98kDa, 62kDa, 49kDa


Table 7

Sample Number | Media | Temperature (ºC) | Time (Hours) | Repeat 1 | Repeat 2 | Repeat 3 | Average Plasmid conc. (ng/μl) | 260/280nm ratio
1 | LB (glucose) | 37 | 8 | 44.7 | 44.3 | 35.0 | 41.3 | 1.86
2 | LB (glucose) | 30 | 20 | 42.3 | 44.4 | 49.7 | 45.5 | 1.87
3 | LB (glucose) | 25 | 20 | 34.6 | 100.2 | 67.3 | 67.4 | 1.85
4 | MTB | 37 | 8 | 286.7 | 253.8 | 177.8 | 239.4 | 1.85
5 | MTB | 30 | 20 | 205.6 | 134.9 | 161.0 | 167.2 | 1.86
6 | MTB | 25 | 20 | 347.7 | 166.1 | 118.6 | 210.8 | 1.87
7 | Superior | 37 | 8 | 123.4 | 342.4 | 155.2 | 207.0 | 1.85
8 | Superior | 30 | 20 | 197.8 | 161.8 | 220.0 | 193.2 | 1.85
9 | Superior | 25 | 20 | 127.9 | 109.9 | 110.8 | 116.2 | 1.84
10 | Power | 37 | 8 | 268.8 | 309.4 | 329.8 | 302.7 | 1.86
11 | Power | 30 | 20 | 239.9 | 235.0 | 187.2 | 220.7 | 1.86
12 | Power | 25 | 20 | 61.2 | 36.9 | 57.4 | 51.8 | 1.87
13 | Turbo | 37 | 8 | 380.6 | 394.9 | 397.4 | 391.0 | 1.85
14 | Turbo | 30 | 20 | 253.1 | 103.1 | 254.3 | 203.5 | 1.85
15 | Turbo | 25 | 20 | 140.9 | 109.3 | 107.8 | 119.3 | 1.85
16 | Hyper | 37 | 8 | 216.0 | 324.6 | 316.1 | 285.6 | 1.84
17 | Hyper | 30 | 20 | 309.9 | 311.5 | 182.5 | 268.0 | 1.86
18 | Hyper | 25 | 20 | 261.8 | 157.1 | 154.0 | 191.0 | 1.85
19 | LB (glycerol) | 37 | 8 | 219.8 | 294.7 | 234.6 | 249.7 | 1.84
20 | LB (glycerol) | 30 | 20 | 114.4 | 65.2 | 150.0 | 109.9 | 1.86
21 | LB (glycerol) | 25 | 20 | 116.0 | 113.0 | 120.6 | 116.5 | 1.86
22 | Circle Grow | 37 | 8 | 157.5 | 141.3 | 162.8 | 153.9 | 1.87
23 | Circle Grow | 30 | 20 | 19.7 | 48.5 | 30.5 | 32.9 | 1.86
24 | Circle Grow | 25 | 20 | 84.4 | 72.7 | 55.7 | 70.9 | 1.87

Table of miniprep results from initial small scale shake flask evaluation of different media for enhancing plasmid yields.
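The averages in Table 7 and the fold increase between the previously used condition (sample 8) and the best condition (sample 13) can be reproduced from the triplicate readings. A minimal sketch using only the two rows needed; the names `REPEATS`, `average_yield` and `fold_change` are illustrative:

```python
from statistics import mean

# Triplicate miniprep readings (ng/ul) for two rows of Table 7.
REPEATS = {
    8: ("Superior, 30 C, 20 h", [197.8, 161.8, 220.0]),
    13: ("Turbo, 37 C, 8 h", [380.6, 394.9, 397.4]),
}


def average_yield(sample: int) -> float:
    """Mean plasmid concentration (ng/ul) across the three repeats."""
    return mean(REPEATS[sample][1])


def fold_change(new_sample: int, old_sample: int) -> float:
    """Ratio of the mean yield of new_sample to that of old_sample."""
    return average_yield(new_sample) / average_yield(old_sample)


if __name__ == "__main__":
    for number, (label, _) in REPEATS.items():
        print(f"Sample {number} ({label}): {average_yield(number):.1f} ng/ul")
    print(f"Fold increase, 13 vs 8: {fold_change(13, 8):.2f}")
```

The same two functions can be applied to any other pair of rows by adding their repeats to the dictionary.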

Figure 12 chart. Title: Comparing plasmid yields between MEG and TURBO media at different time intervals for Plasmid X. x-axis: incubation time (hours) and temperature: 4, 5, 6, 7, 8 and 12 hours at 37ºC; 12 hours at 33.5ºC; 12 hours at 30ºC. y-axis: plasmid conc. (ng/μl), 0.0 to 600.0. Series: MEG, Turbo.

Figure 12. Graph showing plasmid yields for MEG and TURBO media at particular time intervals and temperatures (shake flask optimisation).

[Chart: "Comparing Plasmid yields per OD Unit between MEG and TURBO media at different time intervals for Plasmid X". X-axis: incubation time (hours) and temperature (4, 5, 6, 7, 8 and 12 hours at 37°C; 12 hours at 33.5°C; 12 hours at 30°C). Y-axis: plasmid conc. (ng/μl), 0 to 120. Series: MEG, TURBO.]

Figure 13. Graph showing plasmid yields per OD unit for MEG and TURBO media at specific time intervals (shake flask optimisation).


Again, TURBO offered the best plasmid yield per OD unit at 37°C for 12 hours, with 102.3ng/μl per OD unit compared to 67.3ng/μl per OD unit for MEG. TURBO was therefore the best medium for this particular plasmid. The methodology used to determine this can serve as a generic protocol, on the assumption that all plasmids, regardless of vector type and insert(s), would give yields similar to this Plasmid X example. All of the above experiments were performed in shake flasks. However, MEG medium was originally developed for use in fermenters. Consequently, an additional experiment was conducted comparing MEG and TURBO in 2.5 litre Infors fermenters.

Scale up of Optimised conditions using MEG and TURBO Media

Scale-up in fermenters (2.5 litres working volume) using Top 10 was evaluated. Time course samples were removed at intervals (Figure 14).

[Chart: "Comparing Plasmid yields between MEG and TURBO media at different time intervals for Plasmid X 2.5 Litre Fermentation at 37°C". X-axis: incubation time (hours): 5 (MiniPrep), 6 (MiniPrep), 7 (MiniPrep), 8 (MiniPrep), 8 (GigaPrep, 10g). Y-axis: plasmid conc. (ng/μl), 0 to 700. Series: MEG, TURBO.]

Figure 14. Graph showing plasmid yields for MEG and TURBO media at specific time intervals at 37°C (2.5 Litre Scale Up).

After 8 hours, cells were harvested and plasmid was purified from 10g of cell paste using a GigaPrep approach. A miniprep sample (2ml scale) was also generated. Based on plasmid yields, 8 hours was the optimum incubation time for both media. TURBO surpassed MEG: 5.8mg compared to 5mg of plasmid per 10g of cell paste respectively (GigaPrep). This was confirmed by similar results obtained from the miniprep samples. In terms of yield per OD unit (Figure 15), TURBO again surpassed MEG: 116ng/μl per OD unit compared to 58ng/μl per OD unit respectively. However, MEG provided a higher OD than TURBO (8.7 and 5.0 respectively at 8 hours).

[Chart: "Comparing Plasmid yields per OD unit between MEG and TURBO media at different time intervals for Plasmid X 2.5 Litre Fermentation at 37°C". X-axis: incubation time (hours): 5 (MiniPrep), 6 (MiniPrep), 7 (MiniPrep), 8 (MiniPrep), 8 (GigaPrep, 10g). Y-axis: plasmid conc. (ng/μl), 0 to 140. Series: MEG, TURBO.]

Figure 15. Graph showing plasmid yields per OD unit for MEG and TURBO media at specific time intervals at 37°C (2.5 Litre scale up).

A second 2.5 litre scale up of TURBO was performed to compare the OD obtained with different concentrations of glycerol: 0.5% (the original optimised concentration), 1%, 1.5% and 2% (v/v). Samples of 2ml (in triplicate) were taken during the fermentation at 5, 6, 7 and 8 hours and progressed to miniprep (data not shown). No significant differences in plasmid concentration were observed between the glycerol concentrations. The rationale behind increasing the amount of glycerol was to increase biomass generation in TURBO medium (MEG employs 2% glycerol as standard). However, growth was not significantly different at any glycerol concentration in TURBO medium, suggesting that this carbon source was not the limiting factor. Furthermore, as expected, more cell paste was obtained from MEG than from TURBO (80g compared to 35g respectively). The quality of pDNA was measured using the NanoDrop spectrophotometer (Table 8).


Table 8

Sample  Temp (°C)  Incubation Time (Hours)  260/280nm ratio (MEG)  260/280nm ratio (TURBO)
1       37         5 (MiniPrep)             1.89                   1.84
2       37         6 (MiniPrep)             1.91                   1.85
3       37         7 (MiniPrep)             1.88                   1.84
4       37         8 (MiniPrep)             1.89                   1.86
5       37         8 (GigaPrep, 10g)        1.85                   1.83

Table of 260/280nm ratios for each sample. The ratio of absorbance at 260nm to absorbance at 280nm is used to assess the purity of DNA (and RNA). A ratio of ~1.8 is accepted as pure for DNA.

TURBO consistently gave a 260/280nm ratio closer to 1.8 than MEG medium, indicating a higher purity of pDNA.
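The purity comparison in Table 8 can be summarised numerically. The sketch below is illustrative only: it measures how far each medium's 260/280nm ratios lie from the nominal value of 1.8, using the mean absolute deviation as a simple summary statistic. That statistic is a choice made for this example and was not used in the original analysis.

```python
# 260/280nm ratios from Table 8 (samples 1 to 5, in order)
ratios = {
    "MEG": [1.89, 1.91, 1.88, 1.89, 1.85],
    "TURBO": [1.84, 1.85, 1.84, 1.86, 1.83],
}

PURE_DNA_RATIO = 1.8  # value accepted as pure for DNA, as stated in Table 8


def mean_abs_deviation(values, target=PURE_DNA_RATIO):
    """Average distance of the measured ratios from the target ratio."""
    return sum(abs(v - target) for v in values) / len(values)


deviation = {medium: round(mean_abs_deviation(v), 3) for medium, v in ratios.items()}

# True if TURBO is closer to the target than MEG for every paired sample
turbo_closer_every_sample = all(
    abs(t - PURE_DNA_RATIO) < abs(m - PURE_DNA_RATIO)
    for m, t in zip(ratios["MEG"], ratios["TURBO"])
)

print(deviation)
print("TURBO closer to 1.8 in every sample:", turbo_closer_every_sample)
```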

Discussion

Fusion Partners/Tags

His-MBP was the optimal fusion partner for Protein A (Figure 4) among the tags studied. Protein A was also soluble with the other tags, but these had varying effects on its solubility. MBP has been extensively reported to increase the solubility of many proteins. The precise mechanism has not been established, but several theories have been suggested (2).

Firstly, MBP could act passively like GST, which also increased solubility compared to the untagged construct. GST is a highly soluble protein when expressed in E. coli (due to hydrophobic regions etc). GST’s inherent solubility may make its fusion protein soluble as well (9), as was shown with Protein A (Figure 4). Due to major differences in structure between GST and MBP, it is unlikely that MBP works in this way (22).

Secondly, MBP could act actively, like a chaperone, in assisting the folding of Protein A. There are several clusters of hydrophobic residues on the surface of MBP which interact with other proteins in the maltose transport pathway (23). One feature that distinguishes MBP from the other soluble fusion partners examined is its deep hydrophobic cleft, which serves as the maltose-binding site. The E. coli chaperone groEL utilises a hydrophobic cleft to interact with its targets (24). Any or all of these hydrophobic zones on the surface of MBP could serve as binding sites for incompletely folded passenger proteins.

Thirdly, MBP could act as a passive chaperone like dnaK. According to this model, the nascent fusion protein initially adopts a “folding intermediate” form in which the fusion partner domain (MBP) is properly folded but the passenger protein is not. If the passenger protein subsequently attains its native conformation, this gives rise to a soluble fusion protein in its correct conformation. Alternatively, incompletely folded passenger proteins can self-associate to form insoluble fusion protein aggregates (Figure 16).

Figure 16. A model that may explain how MBP improves the solubility and promotes the proper folding of its fusion partners. See text for details. The spheres represent native, properly folded MBP; the attached strings and helix represent the incompletely folded and native states of a passenger protein, respectively (22).

The fate of the folding intermediate depends on its concentration inside the cell. A high concentration tends to favour intermolecular association, whereas the unimolecular folding reaction is more prevalent at lower concentrations. In the case of MBP fusion proteins, it has been proposed that this intermediate can rearrange into a form in which a physical interaction exists between MBP and the incompletely folded passenger protein (Sequestered Intermediate). This effectively occludes its self-association. Although weak and reversible, this non-specific association is promoted by the close proximity of the interacting partners. Consequently, the concentration of the aggregation-prone folding intermediate in the cell at any given time is low, so the formation of insoluble aggregates is avoided (22). This is supported in the literature, where it has been found that MBP interacts preferentially with unfolded proteins, promoting their folding in vitro through non-covalent interactions and thereby increasing the soluble yield obtained (25).

Furthermore, GST and MBP are relatively large fusion partners. Once they are cleaved, precipitation of the target protein can occur (in house, data not shown). This should be considered in future tag optimisation experiments. Proteins are chemically and structurally diverse; consequently, this generic strategy will not always work, as demonstrated with Protein B. Protein B is a eukaryotic protein which contains rare codons not generally used in E. coli. This problem might be solved by using a different host strain such as Rosetta to counteract the codon bias, or through codon optimisation (2). In addition, whether MBP works depends on whether aggregation occurs before or after a protein adopts its native conformation, since aggregation may be affected to varying degrees by fusion to MBP and other soluble partners. All of the MBP models depend on the fusion protein being able to reach its native state in the cytoplasm. For example, a protein that contains disulphide bonds, such as Protein B, may not be able to fold into a stable conformation in the reducing environment of the E. coli cytoplasm, but may be able to do so in the more oxidising environment of the periplasm. MBP fusions could be exported to the periplasm (after N-terminus signal deletion) to provide an optimum folding environment for the formation of disulphide bonds (25). When GFP-Protein B was expressed, the flasks were green in colour, indicating soluble protein expression, but the amount of soluble protein was too small to be detected by PMF.

His and FLAG tags are used mainly as affinity tags for purification. As expected, they did not increase solubility compared to the untagged construct (Figure 4), although some soluble protein was expressed, as this was not what they were designed for. There are many reports of the His tag affecting the solubility of proteins, mostly to its detriment (26). The data here neither confirm nor refute this, as the His tag had no real effect compared to the untagged control. A variety of other tags remain to be evaluated, such as NusA, which has been shown in the literature (2) to increase the solubility of its fusion proteins. Based on this preliminary evaluation, MBP should be considered during construct design.

Host Strains

Rosetta-gami was the optimal host strain for all of the proteins examined, apart from Protein B. Rosetta-gami host strains combine the advantages of the Rosetta and Origami strains, which alleviate codon bias and enhance disulphide bond formation in the cytoplasm respectively, when recombinant proteins are expressed in E. coli (10). Rosetta-gami 2 is derived from the Origami 2 strain, which contains deletion mutations in the thioredoxin reductase (trxB) and glutathione reductase (gor) genes. These genes are part of the Thioredoxin and Glutaredoxin pathways (27), which promote a reducing environment in the cytoplasm in which cysteines do not engage in disulphide bonds. This is explained in Figure 17.

Figure 17. The Thioredoxin and Glutaredoxin pathways that determine the thiol-disulphide balance in the E. coli cytoplasm (28).

Any disulphide bonds that do form are rapidly reduced through the action of the disulphide reducing enzymes thioredoxin and glutaredoxin. These perform the fast and reversible thiol-disulphide exchange between their active site cysteines and cysteines in the substrate protein (27). Thioredoxins are kept reduced by thioredoxin reductase, whereas glutaredoxins are kept reduced by glutathione reductase.


These pathways are important in cells to prevent damage from free radicals due to oxidative cell stress, and to provide an optimal environment for cytoplasmic reactions and enzymes (28). However, proteins which do not require disulphide bonds may have reduced solubility in these Origami strains due to intermolecular disulphide bonds (10). A disruption of critical components of either the Thioredoxin or the Glutathione pathway renders the cytoplasm more oxidising, so that some disulphide bonded proteins can be expressed in their active forms (27). The deletion mutations in trxB and gor prevent the expression of thioredoxin reductase and glutathione reductase, meaning that thioredoxin and glutathione remain oxidised and cannot reduce disulphide bonds formed when the protein folds. The two thioredoxins and glutathione enzymes in E. coli (Figure 17) accumulate in their oxidised forms, enabling them to act as disulphide bond formation catalysts in a reversal of their normal function (28). Rosetta-gami further enhances the expression of Proteins C, D and E because it carries the pRARE2 plasmid, which encodes seven rare tRNAs for codons rarely used in E. coli (Figure 18).

Figure 18. Plasmid schematic of pRARE2 (29). This multicopy plasmid increases rare tRNA levels by increasing the dosage of their respective tRNA genes. The encoded tRNAs for the codons under-represented in E. coli are: argW: Arg AGG, AGA; argU: Arg CGA; argX: Arg CGG; glyT: Gly GGA; ileX: Ile AUA; leuW: Leu CUA; proL: Pro CCC.
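The codons listed for pRARE2 can be used to screen a coding sequence for potential codon bias before a host strain is chosen. The Python sketch below is a minimal illustration, not a method used in this study. The function name and the example sequence are hypothetical, and the rare codon set is taken directly from the Figure 18 legend.

```python
# Codons supplied by pRARE2 according to the Figure 18 legend (RNA alphabet)
RARE_CODONS = {"AGG", "AGA", "CGA", "CGG", "GGA", "AUA", "CUA", "CCC"}


def rare_codon_report(coding_sequence: str):
    """Return (total codons, list of (codon position, codon)) for rare codons.

    Accepts DNA or RNA; the sequence must be in frame and a multiple of three.
    """
    rna = coding_sequence.upper().replace("T", "U")
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    hits = [(pos, codon) for pos, codon in enumerate(codons, start=1)
            if codon in RARE_CODONS]
    return len(codons), hits


# Hypothetical 7-codon open reading frame, for illustration only
example_orf = "ATGAGGCTACCCAAAGGATAA"
total_codons, rare_hits = rare_codon_report(example_orf)

print(f"{len(rare_hits)} of {total_codons} codons are rare: {rare_hits}")
```

A high proportion of hits, or runs of consecutive rare codons, would be one reason to try a Rosetta-type host or to consider codon optimisation.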

Most amino acids are encoded by more than one codon, and each organism carries its own bias in the usage of the 61 available amino acid codons. In each cell, the tRNA population closely reflects the codon bias of the mRNA population (29). When the mRNA of recombinant protein target genes is over-expressed in E. coli, differences in codon usage can impede translation due to the demand for one or more tRNAs that may be rare or lacking in the population. Insufficient tRNA pools can lead to translational stalling, premature translation termination, translation frame shifting and amino acid mis-incorporation (2). These effects can be seen in Figures 5 and 6, where there are a number of bands of truncated protein. By supplying the rare tRNAs for the codons required (Figure 18), Rosetta-gami provides for the translation of proteins which would otherwise be limited by the codon usage of E. coli (10). However, for Protein B, no soluble expression was obtained regardless of the host strain used. Whilst Rosetta-gami can compensate for rare codons, codon optimisation may be required instead, in which the gene is recoded with codons preferred by the expression host while the encoded amino acid sequence is left unchanged. This is time consuming, expensive and not always successful, so other host strains may offer a way to address these deficiencies (Table 9).

Table 9. Other host strains that can be used to solve problems with expressing soluble protein (10).

There are many other pathways in the cytoplasm which affect the reduction of disulphide bonds and could cause incorrect folding; these may need to be rectified by further DNA modification. Moreover, the cytoplasm itself, no matter how it is engineered, may still not be the optimum environment for disulphide bond formation, or structural features of the protein itself may prevent correct folding (4). Disulphide bond formation can also take place in the periplasm through specific enzymes (27), so the protein could be engineered to be exported and folded there after translation.

In addition, Rosetta-gami generated a lower OD and a slower growth rate compared to BL21 (DE3) and the other host strains (data not shown). This is because electron transfer through disulphide bond exchange reactions in the cytoplasm recycles essential enzymes, e.g. ribonucleotide reductase, which provides deoxyribonucleotides for DNA replication in all living cells. As a result, DNA replication cannot occur efficiently, and cell division slows dramatically because deoxyribonucleotides are not readily available (27). However, because contaminants of similar molecular weight to the target protein are difficult to remove during purification of Proteins C and D, the trade-off between purity and low cell yield is a necessary one. This host strain work has helped identify Rosetta-gami as a useful addition to the fermentation “toolbox”.

Chaperones

The chaperone teams exhibited varying effects on the proteins examined. This was due to the nature of the chaperone teams and to the structural properties of each protein, which relate to how it folds. Chaperone teams dnaK-dnaJ-grpE and dnaK-dnaJ-grpE-groES-groEL were the optimal teams for Proteins F, G and H (Table 6). Proteins F, G and H are variants of the same protein: Protein F (His-tagged) is the E. coli species, whereas Proteins G (His-tagged) and H (untagged) are the S. aureus species. Both optimal teams contain the components of the DnaK system (dnaK-dnaJ-grpE), which suggests that protein aggregation was the main problem with Proteins F, G and H. This is supported by the fact that there was less soluble protein expression with the GroE system (groES-groEL) than with dnaK-dnaJ-grpE.

Protein aggregation can occur as the protein is being translated on the ribosome. A protein may thus begin to fold through a series of intermediates before all the information needed for its folding is available. As protein folding is driven predominantly by hydrophobic interactions, two or more nascent proteins may interact with each other's hydrophobic regions while still undergoing synthesis, meaning that they may become entangled and aggregate. Aggregation is concentration dependent. As protein concentration increases, so does the probability that a protein encounters another protein and interacts with it via hydrophobic forces before it folds. The DnaK system acts by binding to the exposed regions on unfolded or partially folded protein chains, thus protecting them (13). This is illustrated in Figure 19.

Figure 19. Unfolded protein (U) will pass through a number of intermediates to the folded and active state (F). Both the unfolded protein and the early intermediates (I) have exposed regions such as hydrophobic patches that make them susceptible to aggregation (Agg.), a process that increases in likelihood as the concentration of protein grows. Later intermediates and the folded protein do not undergo aggregation. Molecular chaperones may act by (1) blocking the aggregation reaction by binding transiently to the hydrophobic regions (DnaK system) or by providing a cavity to fold in (GroE system), with the substrate being released as a later intermediate or fully folded protein; or (2) reversing the aggregation reaction and releasing proteins back onto the “correct” folding pathway. Encapsulated folding may also accelerate the rate at which proteins move from the early to the later intermediates, by providing a more hydrophilic environment (GroE system) (30).
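The concentration dependence described above can be illustrated with a deliberately simple kinetic sketch. It assumes, purely for illustration, that the intermediate either folds by a first-order (unimolecular) step with rate constant `k_fold` or is lost to aggregation by a second-order step with rate constant `k_agg`. Under that assumption the final folded fraction is ln(1 + x)/x, where x = k_agg × initial concentration / k_fold. Neither the model nor the parameter values come from this study.

```python
import math


def folded_fraction(initial_conc: float, k_fold: float, k_agg: float) -> float:
    """Final fraction of intermediate that folds in a toy competition model.

    Assumed rate law (illustrative only): dI/dt = -k_fold*I - k_agg*I**2,
    where the first term leads to folded protein and the second to aggregates.
    Integrating dF/dI = -k_fold / (k_fold + k_agg*I) gives ln(1 + x) / x.
    """
    if initial_conc <= 0 or k_fold <= 0 or k_agg < 0:
        raise ValueError("concentration and k_fold must be > 0, k_agg >= 0")
    x = k_agg * initial_conc / k_fold
    if x == 0:
        return 1.0  # no aggregation pathway: everything folds
    return math.log1p(x) / x


# Arbitrary units; only the trend with concentration is meaningful here
for conc in (0.1, 1.0, 10.0, 100.0):
    print(f"initial concentration {conc:>6}: folded fraction "
          f"{folded_fraction(conc, k_fold=1.0, k_agg=1.0):.2f}")
```

In this picture, a chaperone that transiently binds the intermediate lowers its free concentration, which moves the system towards the low-concentration end of the curve where folding dominates.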

It is the above process which causes the majority of protein insolubility, and it is countered by the DnaK system (see Introduction). The dnaK-dnaJ-grpE-groES-groEL team contains the components of both chaperone systems and was optimal for the soluble expression of Proteins F and G (Figure 10), whereas for Protein H dnaK-dnaJ-grpE was the best chaperone team (Figure 11). The chaperone systems, like enzymes, are substrate specific, i.e. they can only work on substrates whose structure can interact with and bind to the structure of the chaperones. Proteins F and G have a His tag, which would alter the native conformation of the protein (26), whereas Protein H is untagged. This may give Proteins F and G the binding sites necessary to interact with the chaperones from both systems (30).

The GroE chaperone system acts by providing a protected environment in which the folding of individual protein molecules can proceed. This favours folding over aggregation by effectively placing the substrate protein at infinite dilution, where no interaction with other folding proteins can occur. The inside of the cavity is hydrophilic, which may accelerate the burying of hydrophobic residues in proteins within the cavity. There is also good evidence that GroEL can act as an “unfoldase”, binding to protein folding intermediates that have become trapped in a non-productive and inactive conformation, unfolding them to an extent, and releasing them to try again (30).

The key result was increased soluble expression of Protein H using dnaK-dnaJ-grpE compared to using foldases from a previous collaboration. This contrasts with Protein G, which had equivalent expression, and Protein F, which had less expression than the foldase control. The same reasoning applies as before: the different structures of the variants determine how effectively they bind to the chaperones (13). The foldases are a group of accessory proteins that catalyse specific isomerisation steps that might otherwise limit the rate of folding of certain proteins (31). Protein F is the E. coli species, which has a different amino acid sequence, leading to a different surface shape and different hydrophobic regions for interaction.
This could lead to an unstable dnaK-ADP-substrate-dnaJ and/or groES-groEL complex that is unable to prevent aggregation or aid folding respectively. This also applies to Protein G, which, due to its His tag, may not have the correct conformation or exposed hydrophobic regions to interact with the chaperone systems as well as Protein H does.

Inducing the chaperones at the same time as each of the target proteins (except Protein B) gave increased soluble protein expression compared to inducing the chaperones at inoculation. Co-induction reduces the metabolic burden (including codon usage) on the cell at inoculation, leaving greater resources for bacterial cell division. As a result, there are more “protein factories” (2) available to express the target protein, leading to increased yields.

The tig and groES-groEL-tig chaperone teams had limited impact on the solubility of Proteins F, G and H (Table 6). Where they had an impact, this was due to the protein having the correct structure to interact with them. Where they did not work as well as the other teams (or not at all), this suggests either that the protein was in the incorrect conformation for interaction, or that the protein had already folded successfully and the chaperones were not necessary.

For Proteins I, J and K, although there was some soluble expression, all of the chaperone teams were detrimental to soluble protein expression compared to not using them at all (where there was good soluble expression). Expressing the chaperones introduces a metabolic burden on the cell: part of the cell's machinery is taken up producing chaperones instead of the target protein, and the limited resources available are also consumed in their production. This leaves less machinery and fewer resources for the production of the target protein, thus reducing expression. Whatever the benefit of the chaperone teams to solubility, it is outweighed by the reduced protein expression in these instances.

Protein B remained insoluble when expressed with all of the chaperone teams. This could be due firstly to the cellular location of the chaperone systems. In E. coli they are located specifically in the cytoplasm. As mentioned previously, the cytoplasm may not be the optimum environment for the folding of Protein B, as it is not conducive to disulphide bond formation.
Even if some folding takes place due to the chaperone teams, the disulphide bonds may not be able to form or be maintained, leading to insolubility. Moreover, Protein B may not be able to interact with the chaperone teams because of its specific structure, exposed hydrophobic regions, size or shape. In addition, different proteins follow different pathways of folding (Figure 19). Thus the chaperone systems used may not be the correct systems for the folding pathway of Protein B (13). For future work, other chaperones within the cytoplasm could be examined (Figure 20). These act in other folding pathways, protecting the protein from the cytoplasmic environment and actively assisting folding (13).

In addition, proteins could be exported to the periplasm via the secretory pathway. In the periplasm there are a number of chaperone systems available, such as the Dsb system, which could be more specific to the needs of the protein, such as disulphide bond formation (13).

Figure 20. Nascent polypeptides requiring molecular chaperones encounter Trigger Factor (TF) or DnaK-DnaJ, which interact with solvent-exposed stretches of hydrophobic regions, shielding them from the solvent and from each other. After undocking from the DnaK system, the folding intermediate may reach a native conformation, cycle back to DnaK-DnaJ or be transferred to GroEL for folding at infinite dilution upon GroES capping. In times of stress (red arrows), thermolabile proteins unfold and aggregate. IbpB binds partially folded proteins on its surface until folding chaperones become available. The holding chaperones Hsp33 and Hsp31 become important under oxidative and severe thermal stress, respectively. ClpB promotes the shearing and disaggregation of thermally unfolded host proteins and cooperates with the DnaK system to reactivate them once stress has abated. Recombinant proteins that miss an early interaction with TF or the DnaK system, that undergo multiple cycles of abortive interactions with folding chaperones, or that titrate them out accumulate in inclusion bodies (green arrows) (13).

Without knowing how the protein folds in the first place (as with Protein B), or whether a protein needs further post-translational modifications before it can fold, using the chaperones will be a hit or miss affair (30). Other expression systems could be evaluated with the chaperones. This study has demonstrated that the chaperone systems evaluated can affect the solubility of proteins, supporting claims in the literature. In the future these chaperones will be further evaluated by GSK.

Plasmid Optimisation

It has been demonstrated that plasmid yields can be optimised by changing the growth conditions of the host strain used. TURBO was the optimal medium for Plasmid X based on the initial shake flask experiments (Table 7). Whilst the exact composition of TURBO remains unpublished, Athena describes the medium as having “a rich nutrient base, providing amino acids, vitamins, inorganic and trace minerals at levels higher than that of LB broth” (32). This is consistent with TURBO surpassing LB in plasmid yield at the initial optimisation, and suggests that nutrients were rate limiting when using LB. Furthermore, Hyper, Power and MTB media are primarily designed for recombinant protein expression rather than plasmid preparation and, as shown, they did not perform as well. TURBO also surpassed Superior and Circle Grow media, which were designed to be optimal for plasmid generation. TURBO was designed as a dual use medium, for recombinant protein expression and for increasing plasmid yields. There was no real difference in plasmid quality (defined by a 260/280nm ratio of ~1.8, see Table 7) between the media examined in the initial shake flask experiments.

Comparing the plasmid yields between MEG and TURBO media, TURBO proved to be optimal per 2ml cell pellet, per OD unit and per 10g of cells. This shows that TURBO was the best medium for maximising specific yield per cell. Maximising specific yield leads to improved plasmid purity in downstream processing (15). It can be surmised that TURBO medium best supports high nucleotide pools in cells and supplies energy for replication, whilst minimising other cellular processes that could use these resources, leading to greater yields per cell (15).


In terms of plasmid quality, MEG gave a consistently higher 260/280nm ratio than TURBO, indicating more impurities. The increase in impurities is minor, as the ratios were still close to 1.8 (Table 8). MEG was the optimal medium for maximising volumetric yield, as greater biomass was generated: 40mg of plasmid was obtained from 80g of cell paste with MEG, compared to 20.3mg from 35g of cell paste with TURBO. Maximising volumetric yield is important because it allows smaller and more economical fermentations (15), with a significant reduction in cost (Table 10).

Table 10. Qiagen fermentation strategies for increasing plasmid yield (33).
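The distinction between specific and volumetric yield can be made concrete with the GigaPrep figures reported for the 2.5 litre fermentations. The Python sketch below simply scales the yield per 10g of cell paste by the total cell paste harvested; the function and variable names are assumptions made for the example.

```python
def total_plasmid_mg(yield_mg_per_10g: float, cell_paste_g: float) -> float:
    """Scale a GigaPrep yield (mg plasmid per 10 g cell paste) to the whole harvest."""
    return yield_mg_per_10g * cell_paste_g / 10


# (mg plasmid per 10 g cell paste, total cell paste in g) from the 2.5 litre runs
fermentations = {
    "MEG": (5.0, 80.0),
    "TURBO": (5.8, 35.0),
}

totals = {
    medium: round(total_plasmid_mg(specific, paste), 1)
    for medium, (specific, paste) in fermentations.items()
}

for medium, (specific, paste) in fermentations.items():
    print(f"{medium}: {specific} mg per 10 g x {paste} g paste = {totals[medium]} mg plasmid")
```

TURBO gives more plasmid per gram of cells, but MEG gives roughly twice as much plasmid per fermentation because it generates more than twice the biomass.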

TURBO and MEG media use glycerol as a carbon source. The carbon source is critical for cell growth, providing energy and biomass, and is usually the limiting nutrient in batch fermentation (5). Glycerol has been reported in the literature to be a better carbon source than glucose (5), and this was confirmed in this investigation: LB with glycerol gave 1.7- to 6-fold higher plasmid yields than LB with glucose under matched conditions (Table 7). Using glycerol reduces the repression of intermediate metabolites and the accumulation of organic acids that can inhibit plasmid generation when glucose is used (5). Very little difference was observed in plasmid yields when using different concentrations of glycerol with TURBO. This may be because high glycerol concentrations promote high growth rates, which increase acetate production. This leads to reduced plasmid stability, replication and maintenance, meaning reduced levels of supercoiled plasmid (yield and quality). Lower growth rates reduce growth rate dependent plasmid instability by providing time for plasmid replication to synchronise with cell division.

Further strategies that can be investigated are outlined in Table 10. Fed-batch fermentation may be a better protocol than the batch fermentation process used. It has been reported that the controlled addition of the limiting nutrients allows greater control of growth rates and greater biomass yields, as the substrate is supplied at a rate at which it is nearly all consumed. As a result, conversion of substrate to biomass is very efficient, whilst the residual substrate concentration is nil and never reaches an inhibitory concentration. Metabolic overflow from excess substrate is also prevented, reducing the formation of inhibitory acetate (15). Other work could develop enhanced plasmids optimised for plasmid generation. There are a number of strategies to do this, as shown in Table 11.

Table 11. Strategies for increasing plasmid generation by altering attributes of the plasmid (33).

Furthermore, as Plasmid X was kanamycin resistant, this may have had a beneficial effect on plasmid yields compared to using ampicillin resistant plasmids, due to increased plasmid stability (15). This could be further investigated for plasmid yield optimisation.


In conclusion, this plasmid yield optimisation strategy could be used to increase yields and/or obtain purer plasmids where there are problems with plasmid generation. This could significantly reduce the requirement for large scale fermentations, impacting on cost and timelines.

Acknowledgements

Thank you to all of the staff at Biological Reagents and Assay Development at GSK who helped me in this project, in particular Paul Homes, Richard Hall, Anthony Shillings, Praveen Singh, Clare Hobbs, Pamela Acheson, Angela Bridges, Jo Jones and Andrew Fosberry.

References

1. Marino, M. H. Expression systems

for heterologous protein production. BioPharm., 1989: 2, pp. 18–33.

2. Vallejo, L.F., and Rinas, U.

Strategies for the recovery of active proteins through refolding of bacterial inclusion body proteins. Microbial Cell Factories [online]. September 2004, 3(11), [Accessed 9th July 2007]. Available from: <http://www.microbialcellfactories.com/content/3/1/11>

3. Esposito, D., and Chatterjee, D.K. Enhancement of soluble protein expression through the use of fusion tags. Current Opinion in Biotechnology, 2006: 17(4), pp. 353-358.

4. Makrides, S.C. Strategies for Achieving High-Level Expression of Genes in Escherichia coli. Microbiol Rev., 1996: 60(3), pp. 512-38.

5. Zhi-nan, X. et al. Effects of medium composition on the production of plasmid DNA vector potentially for human gene therapy. Journal of Zhejiang University Science, 2005: 6(5), pp. 396–400.

6. Sørensen, H.P., and Mortensen, K.K. Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli. Microbial Cell Factories [online]. January 2005, 4(1), [Accessed 10th July 2007]. Available from: <http://bmc.ub.uni-potsdam.de/1475-2859-4-1/>

7. Esmeralda, A.W. et al. His tag effect on solubility of human proteins produced in Escherichia coli: a comparison between four expression vectors. Journal of Structural and Functional Genomics, 2004: 5, pp. 217–229.

8. Sachdev, D., and Chirgwin, J.M. Order of fusions between bacterial and mammalian proteins can determine solubility in E. coli. Biochem Biophys Res Commun., 1998: 244(3), pp. 933-937.

9. Waugh, D.S. Making the most of affinity tags. Trends in Biotechnology, 2005: 23(6), pp. 316-320.

10. Novagen. What a difference a strain makes. Competent Cells [online]. [Accessed 5th June 2007]. Available from: <http://www.merckbiosciences.co.uk/docs/docs/LIT/220006_MGBP.pdf>

11. Stratagene. Arctic Express Competent Cells. Protein function & analysis [online]. [Accessed 1st April 2007]. Available from: <http://www.stratagene.com/lit_items/ArcticExpress_TB119_lores.pdf>

12. Takara Bio Inc. Chaperone Plasmid Set [online]. [Accessed 25th January 2007]. Available from: <http://www.takara-bio.com/bioview_e/pdfs/3340-V0208.pdf>

13. Baneyx, F., and Mujacic, M. Recombinant folding and misfolding in Escherichia coli. Nature Biotechnology, 2004: 22, pp. 1399-1408.

14. Faltynek, C.R. et al. Utility of Large-Scale Transiently Transfected Cells for Cell-Based High-Throughput to Identify Transient Receptor Potential Channel A1 (TRPA1) Antagonists. Journal of Biomolecular Screening, 2007: 12(1) pp. 61-69.

15. Carnes, A.E. Fermentation Design for the Manufacture of Therapeutic Plasmid DNA. Bioprocess Technical, 2005, pp. 1-7.

16. Stratagene. Transformation protocol. BL21(DE3) Competent Cells, BL21(DE3)pLysS Competent Cells and BL21 Competent Cells: Instruction Manual [online]. [Accessed 2nd November 2006]. Available from: <http://www.stratagene.com/manuals/200133.pdf>

17. Homes, P. et al. Optimised expression of recombinant proteins in E. coli using a variety of media. Microbial Culture Sciences, Gene Expression and Protein Biochemistry, 2004.

18. Bradford, M. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem., 1976: 72, pp. 248-254.

19. Qiagen. Technical Manual [online]. [Accessed 3rd November 2006]. Available from: <http://www1.qiagen.com/literature/>

20. Promega. Wizard® Plus Minipreps DNA Purification System. Technical Bulletin [online]. [Accessed 25th September 2006]. Available from: <http://www.promega.com/tbs/tb117/tb117.pdf>

21. NanoDrop Technologies. NanoDrop ND-1000 Overview [online]. [Accessed 12th November 2006]. Available from: <http://www.nanodrop.com/nd-1000-overview.html>

22. Kapust, R.B., and Waugh, D.S. Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci., 1999: 8(8), pp. 1668-1674.

23. Martineau, P. et al. Progress in the identification of interaction sites on the periplasmic maltose-binding protein from E. coli. Biochimie, 1990: 72, pp. 397–402.

24. Buckle, A.M. et al. A structural model for GroEL-polypeptide recognition. Proc. Natl. Acad. Sci. USA, 1997: 94, pp. 3571–3575.

25. Fox, J.D. et al. Maltodextrin-binding proteins from diverse bacteria and archaea are potent solubility enhancers. FEBS Letters, 2003: 537, pp. 53-57.

26. Arnau, J. et al. Current strategies for the use of affinity tags and tag removal for the purification of recombinant proteins. Protein Expression and Purification, 2006: 48, pp. 1-13.

27. Ritz, D., and Beckwith, J. Roles of Thiol-Redox Pathways in Bacteria. Ann. Rev. Microbiol., 2001: 55, pp. 21-48.

28. Aslund, F. et al. Efficient production of disulfide bonded proteins in the cytoplasm in “oxidizing” mutants of E. coli. Biotechnology and Bioengineering, 2004: 85(2), pp. 122-129.

29. Novagen. Vectors for expression of amino-terminal His Tag fusion proteins containing minimal extraneous sequences [online]. [Accessed 1st February 2007]. Available from: <http://www.emdbiosciences.com/docs/NDIS/inno18-007.pdf>

30. Lund, P.A. Microbial Molecular Chaperones. Advances in Microbial Physiology, 2001: 44, pp. 93-139.

31. Hockney, R.C. Recent developments in heterologous protein production in Escherichia coli. TIBTECH, 1994: 12, pp. 456-463.

32. AthenaES. Protein Expression Media [online]. [Accessed 23rd March 2007]. Available from: <http://www.athenaes.com/ReadytoUse.htm>

33. Wayne, T.A. Innovations in plasmid DNA manufacturing – molecular variations of a generic process, regulatory relevance, and economic impact [online]. [Accessed 24th April 2007]. Available from: <http://www.palliance.com/otherfiles/flyer3.pdf>