Page 1: ANNOTATION AND ANALYSIS OF NEWLY DISCOVERED ...beng.soe.ucsc.edu/sites/default/files/project-reports/B...BLAST searches indicate that phage Dori's genome sequence is distinct from

ANNOTATION AND ANALYSIS OF NEWLY DISCOVERED MYCOBACTERIOPHAGE GENOMES

Carla De Los Santos, David Homan, Jose Morales, Erica Shepard, Yu-Chen Hwang, Janine Ilagan, John-Paul Donohue, Patricia Chan, Todd Lowe, Grant Hartzog

Abstract

Viruses that infect bacteria (bacteriophage) are the most abundant and genetically diverse

DNA-containing entities on the planet. Analysis of phage genomes may reveal novel

DNA sequences, novel protein domains and provide insights into the biology of the

host. We are analyzing two novel mycobacteriophage, Firecracker and Dori, which were

isolated on the UCSC campus using Mycobacterium smegmatis as the viral host. After

multiple rounds of plaque purification, we performed electron microscopy and observed

that Dori has a typical siphoviral morphology and that Firecracker has an unusual

cylindrical morphology. The Dori and Firecracker genomes were sequenced using a

combination of next-generation technologies. Following assembly of the sequence data

for Dori, we obtained a single large contig of 64,613 base pairs. The Firecracker genome

is 71,341 base pairs, has defined ends with 4-base-pair 3’ overhangs and has a large

number of short sequence repeats. Using the gene prediction programs Glimmer,

GeneMark, tRNAscan-SE and Aragorn, we identified 93 protein-encoding genes in the

Dori genome and 126 genes in the Firecracker genome. Although many

mycobacteriophage genomes include tRNA genes, neither the Dori nor Firecracker

genomes appear to carry structural RNA genes. BLAST searches indicate that phage

Dori's genome sequence is distinct from that of previously sequenced mycobacteriophage

genomes. Firecracker is very similar to a previously identified phage, Corndog, and

together they define a new class of mycobacteriophage. We have also determined that

Dori is a temperate phage: we have isolated Dori lysogens, identified repressor and

integrase genes in the Dori genome and identified and verified attP and attB sites used by

Dori.

Introduction

The Mycobacteriophage Revolution

Bacteriophage played a central role in the early development of molecular

biology. For the past 30 years, however, they have been largely ignored in favor of other

model systems. Recently, bacteriophage research has once again become an area of

productive investigation. Phage are recognized as the most abundant and genetically

diverse biological entities on the planet; with the rise of antibiotic-resistant strains

of bacteria, their potential as antibacterial agents is receiving serious attention; they are

also now recognized as providing a particularly productive platform for science

education. The National Genomics Research Initiative (NGRI), sponsored by the Howard

Hughes Medical Institute, runs a phage-based research program for

undergraduates at a diverse set of universities around the country. Its goals are to

increase the quality of science education and to increase the recruitment and retention of

students in the sciences. The first project sponsored by the NGRI is a phage-hunting

course developed by Graham Hatfull at the University of Pittsburgh. In this class, students

isolate novel bacteriophages using Mycobacterium smegmatis as a host. They purify and

characterize their phage, and then one or more is sequenced by the class. During the

course of this project, many mycobacteriophage genomes have been characterized and

added to existing databases; this success has driven the production of a

mycobacteriophage-specific database that contains all the mycobacteriophage genomes

identified to date, including many as-yet-unpublished genomic sequences. The current

mycobacteriophage genomes have been grouped into clusters using a program called

SplitsTree, which compares gene phamilies found in phages as well as possible alternative

phylogenetic relationships between them.1 The set of clusters has grown from the

original clusters A-F to A-O, and 9 new phages have been discovered that do not belong

to any of those clusters; each of these has been placed in its own new cluster and is

considered a singleton.

As a result of the information now available on mycobacteriophage and

technological advances, new ways of using mycobacteriophage as research

tools have also emerged, such as the BRED technology introduced by the Hatfull group.

Bacteriophage Recombineering of Electroporated DNA (BRED) is a technique that uses

the recombineering system in bacteriophages that express RecE and RecT homologs.

This technology has proven useful in the construction of unmarked deletions

of both essential and non-essential genes, in-frame internal deletions, point and nonsense

mutations, gene tags, and specific insertions of genes from other organisms.2

Mycobacteriophage Background

Bacteriophages are viruses that infect bacteria. They are the most abundant

DNA-containing entities on the planet and are a major source of genetic diversity.

Bacteriophages can infect their host and be propagated in two different ways: through a

lysogenic or a lytic life cycle.

During the lytic cycle, the bacteriophage attaches to a host cell and injects its DNA.

The virus then uses the host replication machinery to replicate its genome and produce

new phage particles, eventually lysing and killing its host.

During the lysogenic life cycle, the bacteriophage enters the host and integrates itself into

the host genome through recombination. For recombination to occur, the phage must

have an attachment site (attP), located near the integrase gene whose product catalyzes

the event, that recognizes the attachment site of the bacterial host (attB). Once

recombination has occurred and the phage genome has been integrated into the host

genome, the phage remains dormant until the infected host is perturbed, triggering the

virus to reenter the lytic cycle; this process is known as induction. During induction, the

phage removes itself from the host genome through a process called excision and enters

the lytic cycle.

Mycobacteriophage Dori & Mycobacteriophage Firecracker

For almost two years, we have focused on characterizing the novel

mycobacteriophage genomes of Dori and Firecracker. The phage samples were collected

from the UCSC campus by students taking the NGRI-sponsored course, Bio21L,

Environmental Phage Genomics, which was taught by Professors Grant Hartzog and

Manny Ares.

Dori was originally isolated and named by Erica Shepard and Firecracker was

isolated and named by Jose Morales and David Homan. We have examined both

genomes through a combination of microbiology, bioinformatics, computational biology,

and next-generation sequencing technologies. In this thesis, I describe the sequencing,

assembly, annotation and bioinformatic analyses of these phage.

Results

Electron Microscopy of Phage Dori

High-titer plate lysates of phages Dori and Firecracker were spotted onto a

carbon-coated copper grid, stained with uranyl acetate and visualized by electron

microscopy. This revealed that Dori has a Siphoviridae morphology with a long flexible

tail (fig. 1).

Obtaining DNA of Bacteriophage Dori

In order to obtain the bacteriophage DNA necessary for sequencing, we extracted

DNA from a previously prepared high-titer plate lysate of phage Dori. This yielded DNA

at 63.4 ng/µl, which was subsequently sequenced at UCSC using both 454 and SOLiD

technologies.

Sequencing Analysis of Bacteriophage Dori

The 454 and SOLiD sequencing data were assembled separately using the

Newbler and Minimus sequence assemblers, respectively. These assembled sequences

were also reassembled using Minimus. This analysis yielded a single contig, which we

viewed using the program Hawkeye. We observed a few nucleotide uncertainties in the

454 sequence data and a pile-up of reads around the 21 kb region of the genome (fig. 2).

The 454 sequence was then compared to the SOLiD sequence, which was viewed with a

program called Tablet. Both sequences agreed, and the region of the sequence containing

the pile-up was identical in both sequence outputs. The region containing the 21 kb

pile-up was sequenced using Sanger chemistry at the Berkeley Sequencing Center; the results

agreed with both the 454 and SOLiD sequences. Further analysis of this region was

performed using Gepard to look for possible repeats in the genome; however, no

repeats were found. Neither Sanger sequencing nor Gepard showed any indication of

repeats causing the 21 kb pile-up observed in the Dori genome. One potential explanation

for this unusual over-representation of sequence reads in our data is that they resulted

from preferential PCR amplification of the genomic DNA during preparation of the

samples for 454 sequencing. Once sequence ambiguities were resolved, a final FASTA

file of the complete Dori genome was generated.
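
An over-represented region of this kind can also be located programmatically from a per-base depth profile. The following is a minimal sketch, assuming the coverage is available as a plain list of read depths. The function name `find_pileups`, the 5-fold threshold and the 50-base minimum length are illustrative assumptions, not part of the Hawkeye or Tablet workflow, and the depth profile below is synthetic rather than the Dori data.

```python
from statistics import median


def find_pileups(coverage, fold=5.0, min_len=50):
    """Return half-open (start, end) intervals whose depth is >= fold * median depth.

    `fold` and `min_len` are hypothetical tuning values chosen for illustration.
    """
    threshold = fold * median(coverage)
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold:
            if start is None:
                start = pos  # a candidate region opens here
        elif start is not None:
            if pos - start >= min_len:
                regions.append((start, pos))
            start = None
    # Close a region that runs to the end of the profile.
    if start is not None and len(coverage) - start >= min_len:
        regions.append((start, len(coverage)))
    return regions


# Synthetic profile: ~20x everywhere, ~200x in a 500-base window near 21 kb.
toy_coverage = [20] * 64613
for i in range(21000, 21500):
    toy_coverage[i] = 200

pileups = find_pileups(toy_coverage)
print(pileups)
```

Using the median rather than the mean as the baseline keeps the threshold from being inflated by the pile-up itself.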

Annotating the Dori Genome

We used the DNA Master software package and its associated gene prediction

programs, Glimmer and GeneMark, to annotate protein-coding genes in phage Dori.

Starting with the gene predictions generated by Glimmer and GeneMark, we examined the

protein-coding capacity of the Dori genome. We identified open reading frames on

both the positive and negative strands of the genome and annotated a total of 93

protein-coding genes in phage Dori. BLAST was used to determine whether Dori was similar to

any other bacteriophage currently sequenced and whether it belonged to any of the

existing clusters of mycobacteriophages.
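
To make the idea of scanning both strands for open reading frames concrete, here is a minimal sketch of a naive six-frame ORF scan. It is not the method used by Glimmer or GeneMark, which score candidate genes with statistical models; it only reports the longest start-to-stop stretch for each stop codon. The start codon set (ATG, GTG, TTG), the length cutoff and all function names are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STARTS = {"ATG", "GTG", "TTG"}  # assumed bacterial start codons
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_one_strand(seq, min_codons):
    """Yield half-open (start, end) ORF coordinates on one strand.

    For each stop codon the most upstream in-frame start is used.
    `min_codons` counts the stop codon as well.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif codon in STOPS:
                if start is not None and (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    found = [(s, e, "+") for s, e in orfs_one_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    # Map reverse-strand coordinates back onto the forward strand.
    found += [(n - e, n - s, "-") for s, e in orfs_one_strand(rc, min_codons)]
    return sorted(found)


# Toy sequence: a 5-codon ORF (ATG, three AAA codons, TAA) with 2-base flanks.
toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
print(find_orfs(reverse_complement(toy), min_codons=3))
```

Real annotation then refines these raw candidates, for example by reconsidering the start codon and comparing each product against known proteins.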

The BLAST results did not show any similarity to the entire genomes of any other

bacteriophage in the database and thus Dori could not be clustered with any of the

existing groups of phages; Dori is considered a singleton in the phage database. Although

no extensive similarity to other phage genomes was found, Dori proteins shared homology with

hypothetical proteins of other phages, and some of the structural genes were identified using

BLASTP. Many genes in the Dori genome also shared high homology with protein-coding

genes in bacteria. The predicted open reading frames were then further annotated by

considering different possible translation start sites (typically upstream of the predicted

start). Criteria used to select these alternative starts included: maximizing gene length, a

bias against starts that create more than a 4-base-pair overlap with the adjacent gene, a

strong Shine-Dalgarno sequence, and starts that gave better BLASTP hits or alignments to

related genes. The map of the Dori genome is shown in Figure 3.
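
Two of these criteria, gene length and the overlap limit, are simple enough to express in code. The sketch below is a hypothetical helper for a forward-strand gene only: it keeps the candidate starts that overlap the preceding gene by at most 4 bases and returns the most upstream of them. The Shine-Dalgarno and BLASTP criteria are deliberately left out, and the function name, coordinate convention and example positions are all assumptions.

```python
def choose_start(candidate_starts, upstream_gene_end, max_overlap=4):
    """Pick a start for a forward-strand gene.

    Coordinates are 0-based; `upstream_gene_end` is the exclusive end of the
    preceding gene. A candidate overlaps that gene by
    `upstream_gene_end - start` bases (negative values mean a gap).
    Returns the most upstream allowed start (longest gene), or None.
    """
    allowed = [
        start for start in candidate_starts
        if upstream_gene_end - start <= max_overlap
    ]
    return min(allowed) if allowed else None


# Hypothetical example: the preceding gene ends at 1000.
# Start 980 overlaps by 20 bases (rejected); 996 overlaps by 4 (allowed).
print(choose_start([980, 996, 1020], upstream_gene_end=1000))
```

In practice the remaining criteria would be weighed by eye against this first pass rather than reduced to a single rule.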

Identifying the Ends of the Dori Genome

The assembled genome sequence for Dori did not show clearly defined ends,

raising the possibilities that either we did not have a complete genome sequence for Dori

or that it is a circularly permuted virus. We generated a restriction map for the Dori

genome and used it to select enzymes that cut near the ends of the genome sequence in

our assembly. We predicted that if the Dori genome were circular, then gel

electrophoresis of the resulting digest would show only one band instead of multiple

fragments. We used lambda phage DNA cut with Bst EII both as a marker and to identify

possible cohesive ends in the Dori genome. Lambda phage DNA has complementary cohesive

ends that form stable hybrids at room temperature. However, these will melt when

heated, separating the joined ends of the Bst EII-digested lambda DNA into two fragments. We

therefore loaded digests of both lambda phage DNA digested with Bst EII and Dori DNA

digested with various enzymes, with or without a prior heating step.
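
The difference between a linear and a circular restriction map can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the function name and the cut positions are hypothetical, and only the genome length of 64,613 base pairs comes from the assembly described here.

```python
def digest_fragments(genome_length, cut_positions, circular=False):
    """Return fragment sizes (largest first) for a complete digest.

    `cut_positions` are positions strictly between 0 and genome_length.
    On a linear map, n cuts give n + 1 fragments, including two end fragments.
    On a circular map, n cuts give n fragments, because the two end pieces
    are joined into one.
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [genome_length]
    internal = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        joined_ends = (genome_length - cuts[-1]) + cuts[0]
        return sorted(internal + [joined_ends], reverse=True)
    ends = [cuts[0], genome_length - cuts[-1]]
    return sorted(internal + ends, reverse=True)


GENOME_LENGTH = 64613             # Dori contig length from the assembly
example_cuts = [5000, 32000, 60000]  # hypothetical cut sites, illustration only

linear = digest_fragments(GENOME_LENGTH, example_cuts)
circular = digest_fragments(GENOME_LENGTH, example_cuts, circular=True)
print(linear)
print(circular)
# An enzyme that cuts a circular map once yields a single full-length band.
print(digest_fragments(GENOME_LENGTH, [12000], circular=True))
```

Comparing the two lists against the observed bands shows which end fragments are expected to be missing or fused under each model.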

When we digested Dori with Eco RI, we observed an extra, unexpected band at

around 8 kb, which would imply a 25 kb band in place of the predicted 32.6 kb band in

our Eco RI digest (fig. 4). Restriction digests using the enzymes Sac I, Kpn I, Bcl I, Xma I,

and Sma I were performed, yielding similar results with extra and missing bands. The

occurrence of extra and missing bands could indicate that Dori uses headful packaging as

its terminating technique, and since the digests do not always agree with those predicted

for a circular genome, these results also imply that Dori has a circularly permuted

genome.4

We used Sanger sequencing on the possible ends of the Dori genome, and the

results showed that there were indeed no defined ends in our genome, consistent with a

circularly permuted genome. To further verify these predictions, Southern

blot analysis is currently in progress.

The att Site in Bacteriophage Dori

During annotation of the Dori genome, a BLASTP search showed that one of the predicted

protein-coding genes was homologous to a bacteriophage integrase. We then performed a

BLASTN search using the DNA sequence of the integrase gene along with the sequence

around the gene and obtained a match to M. smegmatis at a tyrosine tRNA site. We

generated a figure aligning the query sequence with the match, highlighting the perfect

matches and underlining the tRNA portion of M. smegmatis (fig. 5). Using this figure and

the lengths of the sequences, we predicted the lengths of the fragments that would result

from a recombination event. The following sizes were predicted for amplification of the

recombinant fragments and of the original fragments in both bacteria and bacteriophage

before recombination:

Primer set    Length of fragment (base pairs)
B1→B2         569
P1→P2         314
P1→B2         491
B1→P2         392
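
As a consistency check on these predictions, the two recombinant products should together span the same sequence as the two parental products if the exchange is reciprocal and no bases are gained or lost. That conservation assumption is ours, not a statement from the analysis above. The sketch below uses only the four sizes from the table.

```python
# Predicted amplicon sizes in base pairs, taken from the table above.
predicted = {
    ("B1", "B2"): 569,  # bacterial site before recombination
    ("P1", "P2"): 314,  # phage site before recombination
    ("P1", "B2"): 491,  # junction product after recombination
    ("B1", "P2"): 392,  # junction product after recombination
}

before = predicted[("B1", "B2")] + predicted[("P1", "P2")]
after = predicted[("P1", "B2")] + predicted[("B1", "P2")]
print(before, after)  # 883 on both sides

# The difference between the B1 and P1 arms can be read off two ways,
# which must agree if the four sizes are mutually consistent.
left_arm_difference = predicted[("B1", "B2")] - predicted[("P1", "B2")]
left_arm_check = predicted[("B1", "P2")] - predicted[("P1", "P2")]
print(left_arm_difference, left_arm_check)  # 78 both times
```

The individual arm lengths cannot be recovered from these four numbers alone, since the four equations are not independent.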

To amplify the predicted recombination fragments as well as

the bacterial and phage fragments, we obtained genomic DNA from Dori and M. smegmatis.

Cultures of M. smegmatis were grown and a DNA prep kit was used to obtain

genomic DNA. We also prepared DNA from a Dori lysogen of M. smegmatis. The

samples were then amplified with the appropriate primer combinations: bacterial or phage primer pairs

for the pure samples, and mixed phage and bacterial primer pairs for the lysogen samples.

Gel analysis showed bands at approximately the predicted sizes, indicating that this is

the att site for bacteriophage Dori.

Annotating the Firecracker Genome

The Firecracker genome was sequenced using 454 technology, with primer-directed Sanger

sequencing used to finish the sequence. The genome was assembled and then annotated as

described above for Dori. 126 protein-coding genes were predicted. No tRNA genes were

observed. The genomic sequence of Firecracker was very similar to the genome of

phage Corndog. A few genes were identified based on homology to

proteins found in BLASTP searches. These included the phage tail protein and a terminase

large subunit protein. Other genes with predicted functions are noted in figure 6.

Genome Browser of Phages Dori and Firecracker

Genome Browsers were produced for both Dori and Firecracker by Patricia Chan

in the Lowe Lab and by John-Paul Donohue in the Ares Lab.

Mycobacteriophage Dori:

http://microbes.ucsc.edu/cgibin/hgTracks?org=Mycobacterium+phage+Dori&position=chr:10001-35000

Mycobacteriophage Firecracker:

http://microbes.ucsc.edu/cgibin/hgTracks?org=Mycobacterium+phage+Firecracker&position=chr:10001-35000

Simple Sequence Repeats in the Firecracker Genome

We used a dot-plot program, Gepard, and a repeated sequence motif finder,

MEME, to determine whether either Dori or Firecracker contains repetitive DNA elements. We

observed a highly repeated sequence in the Firecracker genome using Gepard, and using

MEME we found that this repeat is a 17 base pair palindrome. The palindromic sequence

contains a 3-nucleotide non-palindromic center region, which may imply a stem-loop

structure in the RNA of Firecracker (fig. 7). The motif also overlaps with a repeat

of the sequence “TGGGGGTGTTCGGTTTCCGAACAG”, which occurs 22 times in the

genome. This sequence is located specifically between the end of one gene and the beginning of

the next, and it is found mostly on the positive strand (16 of 22 occurrences).
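
The inverted-repeat structure can be checked directly on the 24-nucleotide repeat quoted above. The sketch below scans for windows made of two 7-nucleotide arms that are reverse complements of each other, separated by a 3-nucleotide spacer (17 nucleotides in total). Whether the window it finds is exactly the motif reported by MEME is an assumption; the function names are ours, and the strand-counting helper is shown on a toy sequence because the Firecracker genome is not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=7, spacer=3):
    """Return (start, left_arm, spacer_seq, right_arm) for each window in
    which the right arm is the reverse complement of the left arm."""
    hits = []
    width = 2 * arm + spacer
    for i in range(len(seq) - width + 1):
        left = seq[i:i + arm]
        middle = seq[i + arm:i + arm + spacer]
        right = seq[i + arm + spacer:i + width]
        if right == reverse_complement(left):
            hits.append((i, left, middle, right))
    return hits


def count_by_strand(genome, motif):
    """Count non-overlapping occurrences of `motif` on each strand."""
    return genome.count(motif), genome.count(reverse_complement(motif))


REPEAT = "TGGGGGTGTTCGGTTTCCGAACAG"  # sequence quoted in the text
hits = find_inverted_repeats(REPEAT)
# One window matches: arms TGTTCGG / CCGAACA around the spacer TTT, at index 6.
print(hits)

# Toy "genome" with one copy on each strand, for illustration only.
toy_genome = REPEAT + "AAAA" + reverse_complement(REPEAT)
print(count_by_strand(toy_genome, REPEAT))
```

Run on the real genome sequence, `count_by_strand` would give the kind of strand tally reported above.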

Other repeats were identified in the 70 kb region of the Firecracker genome. A

square-like pattern of repeats was observed in the Gepard output, and some of the

predicted repeats were present in the region between genes 125 and 126. Because no

sequence homology was found in the region between those two genes, no protein-coding gene was

predicted there, leaving a large gap between them.

Materials and Methods

Phage Titer Assay for phage Dori

We performed a phage titer assay for phage Dori to determine the amount and

concentration of phage needed to produce a web pattern. A

web pattern is necessary when collecting the filtrate of pure phage so that we get the

maximum amount of phage, and therefore the maximum amount of DNA, for sequencing

and other experimentation.

Phage Dilutions

Three different dilutions were made from the undiluted filtrate: 10⁻², 10⁻³, and 10⁻⁴. These

dilutions were prepared in microcentrifuge tubes by adding 10 µl of the undiluted phage

filtrate to 100 µl of phage buffer (PB) to produce a 10⁻² dilution; 10 µl was then taken

from this dilution and added to 100 µl of PB to create a 10⁻⁴ dilution, and then again to

create a 10⁻⁶ dilution.

Culture Tube Preparation

Three culture tubes were used, one for each dilution, each containing 0.5 ml of Mycobacterium

smegmatis. 10 µl of each dilution was added to a separate culture tube and left at

room temperature undisturbed for 10 minutes.

Plating

For each of the 3 culture tubes, 4.5 ml of heated top agar was mixed in and the mixture was

pipetted onto an agar plate, making sure that the top agar-phage mixture was distributed

evenly so as to produce a smooth top layer covering the entire surface of the

plate. The plates were left at room temperature to solidify for 30-60 minutes and then

placed upside down in a 37 °C incubator for a 24-hour incubation.

Plaque Titer Assay

The individual plaques present on the plates containing the dilutions of phage

Dori were measured using a ruler and the following equation was used:

Plaques needed for web pattern = (Area of plate) / (Area of plaque)

Since only the 10⁻² dilution showed any plaque formation, this was the sample used to

predict the amount necessary for a web pattern. The area of the plate was calculated to be

6082.12 mm² and the area of the plaque was calculated to be 0.79 mm².
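
Plugging the two areas into the equation gives roughly 7,700 plaques for a web pattern. The sketch below reproduces that arithmetic. The diameters of 88 mm for the plate and 1 mm for a plaque are our inference, since they reproduce the reported areas when rounded to two decimals; they are not values stated in the text.

```python
import math


def circle_area(diameter_mm):
    """Area of a circle in mm^2 from its diameter in mm."""
    return math.pi * (diameter_mm / 2) ** 2


def plaques_for_web(plate_area_mm2, plaque_area_mm2):
    """Plaques needed for a web pattern = plate area / plaque area."""
    return plate_area_mm2 / plaque_area_mm2


# Areas as reported in the text.
PLATE_AREA_MM2 = 6082.12
PLAQUE_AREA_MM2 = 0.79

needed = plaques_for_web(PLATE_AREA_MM2, PLAQUE_AREA_MM2)
print(round(needed))

# Assumed diameters (not given in the text) that match the reported areas.
print(round(circle_area(88), 2), round(circle_area(1), 2))
```

Because the plaque area is rounded to two decimals, the estimate is only good to about two significant figures.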

Isolation of Genomic DNA

To purify DNA from phage Dori, a high-titer phage lysate was treated with

RNase and DNase I. The nuclease-treated phage were precipitated with polyethylene

glycol and phage DNA was purified using a Promega Wizard DNA prep kit according to

the manufacturer’s directions. The concentration of the purified DNA was measured

using a NanoDrop device, which gave 63.4 ng/µl of Dori DNA.

Ligation and PCR reactions were performed using commercially obtained enzymes

following the manufacturer’s directions.

Verification of Bacteriophage Terminating Technique

To verify the suggested terminating technique used by Dori, we performed a

series of restriction digests. For each restriction digest we picked the restriction enzymes

based on the predicted size of the fragments using restriction-mapping software.

We aimed to obtain end fragments that were 2-7 kb long so that the resulting fragments

could be easily identified.

EcoR V & Bgl II Restriction Digest

A restriction digest was performed using the EcoR V and Bgl II restriction

enzymes and their appropriate buffers to identify possible cohesive ends in the Dori genome.

Bst EII/λ was used as a control because it contains cohesive (cos) ends, which can be

distinguished by comparing heated and unheated samples, and as a size marker. The

samples were run on a 6% agarose gel containing EtBr. The samples were observed at

a half run and a full run. All samples were run both heated and unheated.

EcoR I, Sac I, & Sph I Restriction Digest

A restriction digest was performed in order to explain the heavier bands present

on the previous restriction digest with Bgl II and EcoR V. Restriction enzymes EcoR I,

Sac I, Sph I, and Bst EII/λ were used, this time without heating any of the

samples. The samples were run on a 6% agarose gel containing EtBr; EtBr staining

made the lower half of the gel difficult to see. From the top half of the gel, we noticed that

EcoR I yielded an unpredicted band around the 8 kb region.

Kpn I & Bgl II Restriction Digest

Restriction digests using Kpn I, Bgl II, and Bst EII/λ were run with their

appropriate buffers on a 6% agarose gel containing EtBr. Kpn I yielded a doublet and a singlet band

around the 8 kb region that were not predicted by the restriction map, while the 2.7 kb end

fragment was not present in the gel. Bgl II also yielded an extra band around the 13 kb

region that was unaccounted for.

Isolation of Genomic DNA from M. smegmatis and M. smegmatis Infected with Dori

Genomic DNA from M. smegmatis alone and from infected M. smegmatis was isolated using the Qiagen

DNeasy Tissue Kit, following the protocol provided by the manufacturer for gram-positive

bacteria.

Amplification of Dori att Site

A BLAST search using the predicted tyrosine integrase gene in the Dori genome was

used to search for the integration site in its host, M. smegmatis. Microsoft Word was used

to align the resulting match and identify the areas of similarity. The resulting information

was used to calculate the possible sizes of the recombination fragments of M. smegmatis

infected by Dori. Four primers were used in PCR analysis, two for the bacterial genome

and two for the mycobacteriophage genome. These primers were then arranged into four

possible combinations in order to amplify the att site for M. smegmatis alone, a pure

bacteriophage sample, and a sample that contained infected M. smegmatis. Six 20 µl

reactions were performed, including a control with no template and a sample with infected

M. smegmatis and bacterial primers only, using the specific settings required for the Mango Mix

containing enzyme. The results were run on a 1% TA + EtBr gel using a 100 bp marker

and Bst EII/λ.

Discussion

Since the discovery and characterization of bacteriophage lambda, our

understanding of bacteriophage mechanisms and life cycles has enabled researchers to

engineer tools based on phages for use in biological research and has provided new

insights into bacterial evolution, genetics and physiology. Upon annotating the genome of

phage Dori, we discovered that the Dori genome contains many genes with bacterial

homologs. We hypothesize that these were acquired through horizontal gene transfer. To

explain the importance of these genes in phage Dori, future

experimentation involving mutagenesis or gene knockout would be necessary. The RecE-like

gene found in Dori could imply that there is a RecT-like gene in the genome

that is not currently annotated in the BLAST database. With this in mind, we may be able

to use Dori for BRED if either a RecT-like protein is present or other factors are

sufficient for a RecE-like-only recombineering system.

The Firecracker genome annotation did not reveal many bacterial gene homologs;

however, the genome shared almost complete similarity with that of phage Corndog. Both

genomes now make up cluster O of the mycobacteriophages. The Firecracker genome also

contains a large number of sequence repeats, which we analyzed using Gepard and

MEME. One repeat occurs throughout the Firecracker genome, and its orientation

corresponds to the orientation of the underlying genes. This observation is reminiscent of

“stoperators”, repressor-binding sites that occur in a gene-specific orientation

throughout the genomes of Bxb1, L5 and other cluster A phages. Curiously, we have not

yet identified a protein with a DNA-binding domain in the Firecracker genome. Further

analysis will be directed at identifying DNA (or RNA) binding proteins that may

recognize this sequence. A second set of repeats in the Firecracker genome is restricted

to the 3’ end of the genome. The potential function or origin of these repeats is obscure

and will need further analysis. Finally, our initial analysis indicates that the Firecracker

repeats are also found in Corndog. The relationship between the repeat structures of these

phages deserves a more careful analysis.

Although there has been an increase in research focusing on bacteriophage

genomics, the rapid evolutionary rate and the abundance of bacteriophage continue to

make this area of research a frontier.

Figures

Fig. 1: Electron micrographs of phages Dori and Firecracker. (Left) Dori shows an icosahedral head and long, flexible tail. (Right) Firecracker shows a cylindrical head and long tail (scale bar = 100 nanometers).

Fig. 2: Sequence coverage of assembled 454 sequencing reads of phage Dori. 454 sequencing results were viewed using the Hawkeye viewer from the AMOS suite. Although the average read coverage is only ~20x, the region around 21K shows ~200-fold coverage.

Fig. 3: Annotated genome of Bacteriophage Dori. The genes labeled in green are transcribed from left to right; genes in red are transcribed in the opposite direction. Genes with homologs or domains of known function are labeled. The presence of integrase and phage antirepressor genes suggests that Dori may be able to form lysogens.

Fig. 4: Restriction Digest of Bacteriophage Dori. In order to find the ends of the Dori genome, determine whether they were cohesive, and verify the phage’s packaging technique, we performed a restriction digest using the following enzymes, heated and not heated (lanes in order from right to left): Bst EII/λ heated, Bst EII/λ, Eco RV heated, Eco RV, Bgl II heated, and Bgl II.

Fig. 5: Predicted att site in Dori genome. Highlighted region represents the similarities between Dori and M. smegmatis at the predicted att site. The att site has been confirmed by PCR analysis.

Fig. 6: Annotated genome of Bacteriophage Firecracker. The genes labeled in green are transcribed from left to right; genes in red are transcribed in the opposite direction. Genes with homologs or domains of known function are labeled. The genome has high similarity to that of phage Corndog.

Fig. 7: Predicted palindrome motif for Firecracker. Motif found by MEME sequence analysis of the Firecracker genomic sequence. The motif occurs more than 30 times in the Firecracker genome.

References

1. Hatfull, G. F., et al. 2006. Exploring the mycobacteriophage metaproteome: phage genomics as an educational platform. PLoS Genet. 2: e92.

2. Marinelli LJ, Hatfull GF, et al. 2008. BRED: a simple and powerful tool for constructing mutant and recombinant bacteriophage genomes. PLoS ONE. 3:e3957.

3. Summer, E. J. 2009. Preparation of a phage DNA fragment library for whole genome shotgun sequencing. Methods Mol. Biol. 502:27-46.

4. Casjens SR, Gilcrease EB. 2009. Determining DNA packaging strategy by analysis of the termini of the chromosomes in tailed-bacteriophage virions. Methods Mol. Biol. 502:91-111.

5. Käser M, et al. 2009. Optimized method for preparation of DNA from pathogenic and environmental mycobacteria. Appl Environ Microbiol. 75:414-418.

6. Hatfull GF, et al. 2010. Comparative genomic analysis of 60 Mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. J Mol Biol. 397:119-43.

7. Pope WH, et al. 2011. Expanding the diversity of mycobacteriophages: insights into genome architecture and evolution. PLoS One. 6:e16329.