KØBENHAVNS UNIVERSITET
Supervisor: Xu Peng
Submitted: 05/11/14
PhD Thesis
Yang Guo
Functional characterization and Gene regulation
of the archaeal virus SIRV2
Preface
The thesis entitled "Functional characterization and gene regulation of the archaeal virus
SIRV2" was submitted to the Faculty of Science, University of Copenhagen, to obtain the
PhD degree. I have been financed by a scholarship from the China Scholarship Council and a
stipend from the European Union.
Almost all the experimental work presented in this thesis was performed at the Danish Archaea
Centre (DAC), Department of Biology, University of Copenhagen, Copenhagen,
Denmark, under the supervision of Associate Professor Dr. Xu Peng. Protein circular
dichroism (CD) spectroscopy was performed at the SBIN lab, Department of Biology,
University of Copenhagen, and the high-throughput sequencing was carried out at the
Department of Systems Biology, Technical University of Denmark, Copenhagen, Denmark.
The thesis starts with a brief summary of archaea and their viruses. Some typical viruses
with unexpected morphotypes and genome structures are described, and the SIRV2 infection
cycle is presented in detail. The second part summarizes the results, which mainly concern
the ssDNA binding, annealing and nuclease activities encoded by a conserved gene
cluster of SIRV2, and a regulation map of two transcription regulators of Sulfolobus
solfataricus P2 upon SIRV2 infection. The thesis ends with conclusions and
perspectives for future work. Two manuscripts are enclosed at the end.
Author: Yang Guo
Date: 05/11/14
Place: Copenhagen, Denmark
Acknowledgments
First and foremost, I would like to thank my academic supervisor Xu Peng for her endless
support and always positive attitude, and for her patient guidance and fruitful discussions. I
would also like to thank Xu for bringing me along on inspiring and exciting research trips
from the very beginning of my biology studies.
Further, I would like to thank Qunxin She and Roger A. Garrett for their valuable suggestions
and scientific discussions.
Moreover, I want to thank the laboratory technicians at the Danish Archaea Centre, Hein Phan
and Mariana Awayez, for their great technical assistance in the laboratory.
Many thanks to all the members of the Danish Archaea Centre: Ling Deng, Fei He,
Laura Alvarez, Daniel Jensen, Soley Gudbergsdottir, Wenyuan Han, Wenfang Peng,
Guannan Liu, Carlos Sobrino, Kristine Uldahl and Marzieh Mousaei. You are inspiring and
talented people working on various projects, and you brought me many fun times in the laboratory.
Last but not least, I would like to thank my beloved family and friends outside the scientific
environment for their unreserved love and support throughout my Ph.D.
Table of contents
Preface
Acknowledgments
Summary (English)
Resumé (Danish)
List of the Publications
Abbreviations
Thesis Objective
Introduction
1. Archaea
1.1 Classification of Archaea
1.2 Sulfolobus
2. Archaeal Viruses
2.1 Crenarchaeal virion morphotypes and their genomes
2.2 Sulfolobus islandicus rod-shaped virus 2
2.3 SIRV2 life cycle
2.3.1 Attachment and Entry
2.3.2 SIRV2 Gene Transcription and Regulation
2.3.3 SIRV2 Replication
2.3.4 SIRV2 Release Mechanism
Summary of Results
Future Perspectives
Reference
Manuscript I
Manuscript II
Summary (English)
Viruses infecting hyperthermophilic archaea have gained wide attention in recent years
owing to their remarkable diversity in morphology and genome structure. Although
substantial work has been done to decipher the functions of the unique proteins encoded by
archaeal viruses and to characterize the relationship between the viruses and their host cells,
knowledge of the biology of archaeal viruses is still limited. The crenarchaeal virus
Sulfolobus islandicus rod-shaped virus 2 (SIRV2) has emerged as a promising model for
genetic and biochemical studies as well as for the characterization of the different stages of
the viral infection cycle. However, as with other archaeal viruses, the majority of the SIRV2
genome sequence shows little similarity to sequences in public databases, which hinders
functional research on the virus and raises challenges in protein comparison and prediction.
This thesis comprises two parts of results. First, the functional characterization of a highly
conserved operon of SIRV2 is described, revealing the unique structures and biochemical
activities of the encoded proteins as well as the biological processes in which they may
participate. In the second part, genome-wide regulation maps of two Sulfolobus solfataricus P2
transcription regulators upon SIRV2 infection are constructed for the first time.
A SIRV2 gene operon (gp17, gp18 and gp19) was found to be the only highly
conserved gene cluster shared by rudiviruses and filamentous viruses, suggesting an important
function in both viral families. The experimental results showed that ORF131b (gp17) is a
novel ssDNA-binding protein without a canonical ssDNA-binding domain. A few positively
charged residues forming a U-shaped binding channel on the gp17 dimer are crucial for its
ssDNA-binding activity. The intrinsically disordered C-terminus of gp17 was demonstrated
to be involved in the interaction with gp18, which was previously predicted to be a helicase
but showed ssDNA annealing activity in this study. gp19 was shown to possess 5′ to 3′
ssDNA nuclease activity in addition to the previously demonstrated endonuclease activity,
and a weak interaction between gp18 and gp19 was also detected. The functional
characterization of the entire operon, together with the strand-displacement replication mode
previously proposed for SIRV2, strongly points to a role of the operon in genome maturation
and/or DNA recombination during viral DNA replication and repair.
Two transcription regulators from Sulfolobus solfataricus P2, sso2474 and sso10340, were
differentially expressed upon SIRV2 infection. A method similar to, but simpler than,
chromatin immunoprecipitation combined with subsequent high-throughput sequencing
(ChIP-seq) was applied in this study to gain insight into the gene composition of the two
protein regulons in vivo. After mapping of the sequence data to the genomes of Sulfolobus
solfataricus P2 and SIRV2, protein sso2474 was found to have a high binding affinity for
the virus genome through an unknown mechanism, whereas sso10340 or its interacting protein
preferred to bind and regulate host genes at several binding sites. A total of 27 enriched
DNA fragments extracted from the sso10340 complex were selected as candidate binding
targets in the host genome for further analysis by EMSA (electrophoretic mobility
shift assay) and footprinting assay. A palindromic sequence motif was defined on the basis of
the enriched sequences, and most of the target genes were involved in energy metabolism,
transport and amino acid metabolism. The genome-wide binding profiles presented here
reflect two different kinds of regulon and contribute to the knowledge of
transcription regulation upon virus infection in Sulfolobus.
Resumé (translated from Danish)
Viruses infecting hyperthermophilic archaea have received considerable attention in recent
years because of their remarkable diversity in morphology and genome structure. Although
substantial work has been done to identify the functions of the unique proteins encoded by
archaeal viruses and to describe the relationship between viruses and host cells, knowledge
of the biology of archaeal viruses is still limited. The crenarchaeal virus Sulfolobus
islandicus rod-shaped virus 2 (SIRV2) is a promising model for genetic and biochemical
studies and for the characterization of the different stages of the viral infection cycle.
However, as for other archaeal viruses, most of the SIRV2 genome sequence shows little
similarity to sequences in the public databases, which has hindered functional virus research
and poses major challenges for the comparison and functional prediction of proteins.
This thesis consists of two parts. The first describes the functional characterization of a
highly conserved operon from SIRV2, revealing the unique structures and biochemical
activities of the operon's proteins and the biological processes in which they may
participate. In the second part, the targets of two Sulfolobus solfataricus P2 transcription
regulators are identified across the whole host genome for the first time.
A SIRV2 gene operon (gp17, gp18 and gp19) was found to be the only highly conserved gene
cluster in rudiviruses and filamentous viruses, suggesting an important function in both
viral families. The experimental results showed that ORF131b (gp17) is a novel
ssDNA-binding protein without a canonical ssDNA-binding domain. A few positively charged
amino acids form a U-shaped substrate channel on the gp17 dimer, which is crucial for the
ssDNA-binding activity of gp17. The intrinsically disordered C-terminal part of gp17 was
shown to be involved in the interaction with gp18, which earlier predictions classified as a
helicase but which showed ssDNA annealing activity in this study. gp19 was shown to have 5'
to 3' ssDNA nuclease activity in addition to the previously demonstrated endonuclease
activity. Furthermore, a weak interaction between gp18 and gp19 was detected. The functional
characterization of the whole operon, together with the strand-displacement replication mode
previously proposed for SIRV2, points strongly to a role of the operon in genome maturation
and/or DNA recombination of viral DNA during replication and repair.
Two transcription regulators from Sulfolobus solfataricus P2, sso2474 and sso10340, were
differentially expressed upon SIRV2 infection. A method similar to, but simpler than,
chromatin immunoprecipitation combined with subsequent high-throughput sequencing
(ChIP-seq) was applied in this study to gain insight into the gene composition of the
regulons of the two proteins in vivo. After mapping of the sequence data to the genomes of
Sulfolobus solfataricus P2 and SIRV2, protein sso2474 was shown to have a high affinity for
the virus genome through an unknown mechanism, whereas sso10340 or its interaction partner
preferred to bind and regulate host genes at several sites in the genome. A total of 27
enriched DNA fragments recovered from the sso10340 complex were selected as candidate
binding targets in the host genome for further analysis by EMSA (electrophoretic mobility
shift assay) and footprinting assay. A palindromic motif was defined on the basis of the
enriched sequences. Most of the genes associated with this motif were involved in energy
metabolism, transport and amino acid metabolism. The genome-wide DNA-binding profiles of
the two proteins reflect two different types of regulon and help to expand the knowledge of
transcription regulation in relation to virus infection in Sulfolobus.
List of the Publications
Article I
Guo, Y., Kragelund, B., White, M., and Peng, X. Single-strand DNA binding,
annealing and nuclease activities encoded by a conserved archaeal viral gene cluster.
Submitted to Nucleic Acids Research.
Article II
Guo, Y., and Peng, X. Genome-wide binding profile of two transcription regulators
from Sulfolobus solfataricus.
In preparation.
Article III
Deng, L., He, F., Bhoobalan-Chitty, Y., Martinez-Alvarez, L., Guo, Y., and Peng, X.
(2014) Unveiling cell surface and type IV secretion proteins responsible for archaeal
rudivirus entry. J Virol 88: 10264-10268.
Abbreviations
ABV, Acidianus bottle-shaped virus
ACV, Aeropyrum coil-shaped virus
AFV1, Acidianus filamentous virus 1
APBV1, Aeropyrum pernix bacilliform virus 1
ARV1, Acidianus rod-shaped virus 1
ASV1, Acidianus spindle-shaped virus 1
ATP, Adenosine Triphosphate
ATV, Acidianus two-tailed virus
Ala, Alanine
Amp, Ampicillin
bp, Base pair
BSA, Bovine serum albumin
Cam, Chloramphenicol
CRISPR, Clustered Regularly Interspaced Short Palindromic Repeats
dsDNA, Double-stranded DNA
DTT, Dithiothreitol
EB, Elution Buffer
E. coli, Escherichia coli
EDTA, Ethylenediaminetetraacetic acid
EMSA, Electrophoretic mobility shift assay
GST, Glutathione S-transferase
Hjr, Holliday junction resolvases
ICTV, International Committee on Taxonomy of Viruses
IPTG, Isopropyl-beta-D-thiogalactopyranoside
ITR, Inverted terminal repeats
Kan, Kanamycin
kDa, Kilodalton
LB medium, Lysogeny Broth medium
Ni-NTA, Ni-nitrilotriacetic acid
MALDI-TOF, Matrix-assisted laser desorption/ionization-time of flight
MCP, Major capsid protein
M.O.I., Multiplicity of infection
mM, Millimolar
OD, Optical density
ORF, Open Reading Frame
PAV, Pyrococcus abyssi virus
PAGE, Polyacrylamide gel electrophoresis
PBS, Phosphate Buffered Saline
PCNA, Proliferating cell nuclear antigen
PCR, Polymerase Chain Reaction
PDB, Protein Data Bank
PMSF, Phenylmethylsulfonyl fluoride
PSV1, Pyrobaculum spherical virus 1
Pfu, Pyrococcus furiosus
RCR, Rolling-circle replication
rRNA, Ribosomal RNA
SIFV, Sulfolobus islandicus filamentous virus
SIRV1/2, Sulfolobus islandicus rod-shaped virus 1/2
SNDV, Sulfolobus neozealandicus droplet-shaped virus
SMRV1, Sulfolobales Mexican rudivirus 1
SRV, Stygiolobus rod-shaped virus
SSU, Small-subunit
SSV, Sulfolobus spindle-shaped virus
SSVK1, Sulfolobus spindle-shaped virus K1, Kamchatka
SSVrh, Sulfolobus spindle-shaped virus RH, Yellowstone
STIV1/2, Sulfolobus turreted icosahedral virus 1/2
STSV1, Sulfolobus tengchongensis spindle-shaped virus 1
TEMED, N, N, N', N'-tetramethylethylenediamine
TTSV1, Thermoproteus tenax spherical virus 1
TTV1, Thermoproteus tenax virus 1
VAP, Virus-associated pyramid
WT, Wild-type
Thesis Objective
The main objective of this PhD project was the functional characterization of a
conserved archaeal viral gene cluster of SIRV2, comprising ORF131b (gp17), ORF436 (gp18)
and ORF207 (gp19), in order to investigate the possible roles of these genes in the virus life
cycle. In addition, the genome-wide regulation exerted by two Sulfolobus solfataricus
transcription regulators upon SIRV2 infection was studied to gain a better understanding of
the regulatory network between virus and host cells.
Introduction
1 Archaea
Evolution is a process through which the composition of genes in a population changes over
generations, and it seems to progress in a quantized way, from one level or domain of
organization ultimately to a more complex one. In the early to middle 20th century,
microbiologists tried to classify microorganisms based on the structures of their cell walls,
their shapes, and the substances they consume. It was not until five decades ago that
Zuckerkandl and Pauling argued that it is at the level of molecules (particularly molecular
sequences) that one really becomes privy to the workings of the evolutionary process. The
comparative analysis of molecular sequences then became a powerful approach for determining
evolutionary relationships (Zuckerkandl and Pauling, 1965).
Ribosomal RNA was chosen as a candidate molecule for detecting relatedness among
distant species because it is broadly distributed, its sequence changes slowly and it is a
component of self-replicating systems (Zablen et al., 1975). In 1977, Woese and Fox digested
the 16S (18S) ribosomal RNA of various organisms with T1 RNase and subjected the products to
two-dimensional electrophoretic separation, producing oligonucleotide fingerprints with
which to identify the relationships among living systems. They found that many of the
prokaryotes once classified as bacteria form a group of their own, which was later classified
as a third domain, Archaea, a name derived from the ancient Greek word for ancient or
primitive (Woese and Fox, 1977).
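The oligonucleotide cataloguing idea can be illustrated with a minimal sketch. RNase T1 cleaves single-stranded RNA on the 3′ side of guanosine residues, so every fragment except possibly the last ends in G. The two sequences, the minimum oligomer length and the similarity score (a simple Jaccard index) are illustrative assumptions and are not the data or the association coefficient used by Woese and Fox (1977).

```python
def t1_digest(rna):
    """Split an RNA sequence after every G, mimicking RNase T1 cleavage."""
    fragments, current = [], ""
    for base in rna.upper():
        current += base
        if base == "G":
            fragments.append(current)
            current = ""
    if current:  # trailing fragment that does not end in G
        fragments.append(current)
    return fragments


def catalog(rna, min_len=4):
    """Set of T1 oligonucleotides with at least min_len residues (assumed cutoff)."""
    return {frag for frag in t1_digest(rna) if len(frag) >= min_len}


def shared_fraction(rna_a, rna_b, min_len=4):
    """Toy similarity: shared oligomers divided by all distinct oligomers."""
    a, b = catalog(rna_a, min_len), catalog(rna_b, min_len)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sequences, for illustration only.
seq_a = "AUCCGUUAGCCAUGAACUG"
seq_b = "AUCCGAUAGCCAUGUACUG"

print(t1_digest(seq_a))
print(shared_fraction(seq_a, seq_b))
```

Short oligomers are excluded because they recur by chance in any RNA; only the longer fragments carry useful phylogenetic signal, which is why a length cutoff is applied before the catalogs are compared.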
The discovery of the new microbial kingdom eventually led to the classification of all
known life into three major domains: Eucarya (all eukaryotes), Archaea and Bacteria,
which was a significant breakthrough in the history of biology (Forterre et al., 2002).
Already in the early 1980s it was realized that Thermoplasma and Halobacterium
had a close evolutionary affinity with the methanogens, all of which were representatives of
the known archaea (Woese et al., 1990). For a long time, archaea were seen as extremophiles
that exist only in extreme habitats such as hot springs and salt lakes, with high salt
concentration (Oren, 2002), low pH (Johnson et al., 2008) or high temperature (Stetter,
2006). At the end of the last century, as new habitats were studied and more organisms were
discovered, archaea were found in a wide variety of non-extreme
environments, including marine waters (DeLong, 1992), freshwater sediments (Schleper et
al., 1997) and all kinds of soil environments (Bintrim et al., 1997; Oline et al., 2006).
They are globally distributed in nature and are now recognized as common microbes in the
environment. Since archaea can survive in such harsh conditions, they provide a source of
enzymes that are resistant to heat and/or acidity, which is valuable for industry (Breithaupt,
2001). The most familiar application of an archaeal enzyme is the thermostable Pfu DNA
polymerase from Pyrococcus furiosus, which allows accurate polymerase chain reaction
(PCR) amplification and is widely used in biological research. There are many hidden
treasures in archaea still waiting to be deciphered by old and new archaea lovers.
1.1 Classification of Archaea
Based on the pioneering work of Carl Woese, the small subunit ribosomal RNA (SSU rRNA)
is widely used in molecular phylogenetic studies to investigate the relationships between
organisms, rather than classification systems that try to group archaea based on
shared structural features and common ancestors (Gevers et al., 2006). At an early stage,
the archaea were further divided into two distinct groups: the methanogens and their
relatives were named Euryarchaeota, and the sulfur-dependent thermoacidophiles
were categorized as Crenarchaeota (Woese et al., 1990). Most of the cultivable and
well-studied archaeal species fall within these two main phyla. In 2002, the peculiar species
Nanoarchaeum equitans was discovered. It has a spherical cell shape and harbors the smallest
archaeal genome, and it has been given its own phylum, Nanoarchaeota (Hohn et al., 2002).
Another small group of thermophilic archaeal species, which exhibit an apparent affinity to
the Crenarchaeota but also share features with the Euryarchaeota, was identified as the
Korarchaeota (Elkins et al., 2008; Anderson et al., 2008). A fifth group, the
Thaumarchaeota, has also been established in recent years (Guy and Ettema, 2011).
Euryarchaeota, one of the major phyla of the Archaea, encompasses the most diverse
phenotypes. The cultivated Euryarchaeota are subdivided into eight groups (Thermococci,
Methanopyri, Methanococci, Methanobacteria, Thermoplasmata, Archaeoglobi,
Halobacteria and Methanomicrobia). Methanogenesis was the main invention that
occurred in the euryarchaeal phylum, which also includes halophiles, some thermoacidophiles
and some hyperthermophiles (Gribaldo and Brochier-Armanet, 2006). In contrast, most of the
cultivable Crenarchaeota strains are thermophilic or hyperthermophilic species and
show very limited phenotypic diversity (Forterre et al., 2002). However, since a
marine archaeal group was discovered and identified as Crenarchaeota from
environmental rRNA sequences, it has been thought that they may be extremely abundant in the
marine environment and could be a significant component of deep-sea metabolism
(Fuhrman et al., 1992).
The orders Thermoproteales, Caldisphaerales, Desulfurococcales and Sulfolobales
represent the four lineages of the crenarchaeotal branch of the Archaea. Thermoproteales are
rod-shaped extreme thermophiles or hyperthermophiles. They are the only organisms known
to lack the canonical SSB proteins; instead they possess the protein ThermoDBP, which binds
specifically to ssDNA (Paytubi et al., 2012). The Sulfolobus species are relatively easy to
cultivate owing to their aerobic lifestyle and relatively short doubling times and, as the only
genetically tractable representatives of the Crenarchaeota, they have developed into model
organisms for studying DNA repair, replication, transcription, chromosome integration,
RNA processing, cell division, virus-host interactions and many other cellular
aspects (Bernander, 2007).
Figure 1. Small subunit ribosomal RNA-based phylogenetic tree. The thick lineages
represent hyperthermophiles (modified from Stetter, 2006).
1.2 Sulfolobus
Since the first description by T. Brock in 1972 of Sulfolobus acidocaldarius, isolated
from a hot spring in Yellowstone National Park, this new group of sulfur-oxidizing
organisms has been of interest both evolutionarily and geochemically (Brock et al., 1972).
Sulfolobus species have been isolated from a wide variety of acidic thermal areas (in the USA,
Italy, Iceland, Russia and elsewhere), with optimal growth occurring at pH 2-3
and at temperatures of 75-80 °C, making them acidophiles and thermophiles, respectively. So
far, most of the isolated strains are able to grow heterotrophically as well as
autotrophically. Since Sulfolobus species have a wide geographic distribution, they are
normally named after the location where they were first isolated: Sulfolobus islandicus
strains were isolated in Iceland ('Island' is German for 'Iceland') (Zillig et al., 1994),
Sulfolobus tengchongensis in Teng Chong, China (Xiang et al., 2003), and Sulfolobus
solfataricus from volcanic hot springs at Pisciarelli Solfatara (Zillig et al., 1980). Among
these species, S. solfataricus is one of the best characterized and most commonly used in
laboratories.
Figure 2. Electron micrographs of Sulfolobus solfataricus strain, DSM 1617, thin section
(from Zillig et al., 1980).
The Sulfolobus strains DSM 1616 and DSM 1617 (Fig. 2) were first named Sulfolobus
solfataricus by Zillig because they have a GC content similar to that of S. acidocaldarius
but RNA polymerases of significantly different molecular weight (Zillig et al., 1980).
They were later renamed S. solfataricus P1 and P2 and have become the
main model strains, especially since the genome sequence of S.
solfataricus strain P2 was published by She and the transcriptome map was drawn by Wurtzel,
providing rich and detailed information for further work on DNA replication mechanisms,
the cell cycle, transcription and large numbers of genes of unknown function (She et al.,
2001; Wurtzel et al., 2010). A wealth of data, standardized methods, a maturing genetic
system, ease of cultivation in the laboratory and its suitability as a host for studying
virus-host interactions have all contributed to the establishment of Sulfolobus solfataricus
as a model organism (Albers et al., 2009; Worthington et al., 2003; Deng et al., 2009).
Only a low fraction of cells of the widely used strain S. solfataricus P2 (DSM 1617) is
susceptible to infection by Sulfolobus islandicus rod-shaped virus 2 (SIRV2) or Sulfolobus
turreted icosahedral virus (STIV) (Okutan et al., 2013; Ortmann et al., 2008). In this work,
the highly susceptible mutant strain S. solfataricus P2 5E.6 was selected as the host strain
for the SIRV2 infection studies. The strain carries a deletion of CRISPR (clustered regularly
interspaced short palindromic repeats) clusters A-D, but upon SIRV2 infection it shows a
phenotype similar to that of the natural host strain S. islandicus LAL14/1 with respect to
chromosome degradation and the virus life cycle (Okutan et al., 2013; Bize et al., 2009).
2 Archaeal Viruses
Along with archaeal communities, viruses thrive in extreme conditions and play an
important role in ecosystem dynamics. Like members of the other domains of life, archaea
are infected by viruses. The first thermophilic viruses isolated from the archaeal
domain resembled head-tail bacteriophages in morphotype (Martin et al., 1984), and
the subsequently discovered euryarchaeal viruses were also similar to bacterial viruses. In
contrast, as more habitats were studied, viruses were found that infect members of the
kingdom Crenarchaeota (Sulfolobus, Acidianus, Pyrobaculum and Thermoproteus) and
exhibit highly diverse morphotypes and genomic properties. The extraordinary shapes of
some of these viruses had never been observed before (Prangishvili, 2003). Most crenarchaeal
viruses have been isolated from hot terrestrial habitats and, like their hosts, they show
adaptation to their extreme environments.
Owing to their abundance and unique biology, archaeal viruses pose several challenges. The
low level of sequence similarity to public databases, novel biochemical mechanisms and
difficulties in cultivating viruses and hosts all need to be addressed (Prangishvili and
Garrett, 2005). After relatively intensive study of archaeal viruses over the last decade,
about 100 viral species have been sequenced, and their genomic properties and relationships
with the host cell have been described. Among these viruses, only two single-stranded DNA
virus species have been discovered (Pietila et al., 2010; Mochizuki et al., 2012); all the
others possess double-stranded DNA genomes. According to the International Committee on
Taxonomy of Viruses (ICTV), bacterial viruses comprise nine morphotypes belonging
to ten families, whereas archaeal viruses exhibit 16 different morphotypes and are classified
into 15 families (Ackermann and Prangishvili, 2012) (Fig. 3). Although archaeal viruses are
few in number compared with the viruses infecting bacteria, their diverse and unique
morphotypes have provided new insights into the viral world.
Figure 3. Virion morphotypes of prokaryotic viruses. Names of viral genera or families based on the International
Committee on Taxonomy of Viruses (ICTV) are indicated below the schematic virus particles. If an archaeal
virus has not been assigned to any genus or family, individual virus names are given. The virions are not
drawn to scale (from Pietila et al., 2014).
2.1 Crenarchaeal virion morphotypes and their genomes
Thermophilic viruses infecting the crenarchaea have been classified into ten families based
on their morphology; eight families have been approved by the ICTV and the remaining two are
awaiting approval (Pietila et al., 2014). The ten crenarchaeal virus families are: the
one-tailed spindle-shaped Fuselloviridae (SSV1-7, SSVK1, SSVrh, ASV1); the two-tailed
spindle-shaped Bicaudaviridae (ATV); the bottle-shaped Ampullaviridae (ABV); the
droplet-shaped Guttaviridae (SNDV); the linear filamentous Lipothrixviridae (AFV1-9, SIFV,
TTV1); the linear rod-shaped Rudiviridae (SIRV1-2, SRV, ARV1); the spherical Globulaviridae
(PSV1, TTSV1); the bacilliform Clavaviridae (APBV1); the tailless icosahedral 'Turriviridae'
(STIV, STIV2); and the coil-shaped Spiraviridae (ACV). The last two families are awaiting
approval. Some other viruses, such as Sulfolobus tengchongensis spindle-shaped virus 1
(STSV1) and Pyrococcus abyssi virus 1 (PAV1), are awaiting assignment to a viral family. Some
well-studied and intriguing viruses are described in more detail below.
Sulfolobus spindle-shaped virus 1 (SSV1). The Sulfolobus spindle-shaped viruses (SSVs)
of the family Fuselloviridae were the first family of archaeal viruses to be discovered. Most
of the SSVs (except for SSV6 and ASV1) are spindle-shaped, 100 × 60 nm in size, and carry tail
structures at one pole (Fig. 4).
Figure 4. Electron micrographs of virus particles. (A) Cell apparently extruding virus.
(B) Free virus and virus particles attached to cellular material. Two large particles are
arrowed. (C) Purified free virus particles exhibiting tail structures. Three bullet-shaped
particles are seen on the right. (D) Thin sections of cells sampled 6 h after UV irradiation,
showing three cell-cell contacts. The bars represent 0.2 µm (from Martin et al., 1984).
The virus SSV1 was isolated from UV-induced growing cultures of Sulfolobus shibatae
(strain B12). It contains a 15.5-kb positively supercoiled circular double-stranded DNA
genome with a GC content of 39.7%, resembling that of the host DNA (Palm et al., 1991). During
infection, SSV1 is stably carried by its lysogenic host S. shibatae and is found
intracellularly either in a covalently closed circular (plasmid) form or site-specifically
integrated within an arginine tRNA gene in the host chromosome (Yeats et al., 1982). The
transcription pattern of the SSV1 genome is relatively simple: some genes are significantly
upregulated by UV irradiation, and the genes can be clearly divided into early, late and
UV-inducible categories (Reiter et al., 1987; Frols et al., 2007).
Acidianus two-tailed virus (ATV). This archaeal virus was discovered in an acidic hot
spring (85–93 °C; pH 1.5) at Pozzuoli, Italy. ATV, the sole member of the viral family
Bicaudaviridae, has a lemon-shaped central structure; after it exits the host
cell, it develops elongated tails protruding from both pointed ends, specifically at
temperatures above 75 °C, close to the temperature of the natural habitat of the host (Fig. 5).
The circular dsDNA genome contains 62,730 bp and encodes 72 predicted proteins, 11 of
which are structural proteins with molecular masses in the range of 12 to 90 kDa. This
unique host-independent, extracellular activity might be associated with P800, an
88.7-kDa ATV virion protein that is rich in coiled-coil motifs and can generate
structures that resemble intermediate filaments (Haring et al., 2005c; Prangishvili et al.,
2006c). ATV was the first known virus from hot, acidic habitats that causes lysis of its host
cell, whereas most archaeal viruses maintain a stable relationship with their host.
Figure 5. Electron micrographs of Acidianus convivator and different forms of the Acidianus
two-tailed virus. a, Virions in an enriched sample taken from acidic hot springs in Pozzuoli,
Italy (pH 1.5, 85–93 °C). b, Extrusion of lemon-shaped virions from an ATV-infected
A. convivator cell. c, Virions in a growing culture of ATV-infected A. convivator, 2 days
after infection. d, Cultured virions after purification and incubation at 75 °C for 0, 2, 5, 6
and 7 days (panels from right to left, respectively) (from Haring et al., 2005c).
Acidianus bottle-shaped virus (ABV). The enveloped virion of ABV has a complex form
resembling a bottle (230 nm long, 4–75 nm wide; Fig. 6C). Its morphology is so unique
that it has been assigned to a new family, Ampullaviridae. The narrow end of 'the bottle' is
likely to be involved in cellular adsorption and in channeling viral DNA into the host cell
(Fig. 6A), whereas the broad end exhibits 20 thin filaments, which are inserted into a disk and
interconnected at the base. The function of these filaments is intriguing but remains
unclear (Fig. 6B) (Haring et al., 2005a).
ABV was isolated from the same hot spring in Pozzuoli, Italy, from which ATV was isolated. It
infects strains of the hyperthermophilic archaeal genus Acidianus and contains linear
double-stranded DNA. The viral genome is 23,814 bp long, with a G+C content of
35% and a 590-bp inverted terminal repeat. It encodes 57 predicted ORFs. In addition, a
putative RNA molecule was predicted to have notable secondary structural similarity to
a bacteriophage RNA molecule that has been implicated in DNA packaging. Moreover,
in contrast to other crenarchaeal viruses, ABV encodes a family B DNA polymerase (Peng et
al., 2007).
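Genome descriptors of this kind (length, G+C content and inverted terminal repeat length) can be computed directly from a linear genome sequence. The following is a minimal sketch under simplifying assumptions: the toy sequence is hypothetical and is not taken from ABV, and only a perfect inverted terminal repeat is detected, whereas real terminal repeats often contain mismatches and would require an alignment-based approach.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Percentage of G and C residues in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def itr_length(seq):
    """Length of the longest perfect inverted terminal repeat.

    The first n bases of the reverse complement are the reverse complement
    of the last n bases, so matching prefixes of the two strings correspond
    to an inverted repeat at the two genome termini.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = 0
    limit = len(seq) // 2  # the two repeat copies must not overlap
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


# Hypothetical toy genome: a 7-bp terminus, a unique core, and the
# reverse complement of the terminus at the other end.
toy_genome = "GATTACA" + "CCGGTTTT" + "TGTAATC"

print(len(toy_genome), round(gc_content(toy_genome), 1), itr_length(toy_genome))
```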
Figure 6. Electron micrographs of particles of ABV after negative staining with 3% uranyl acetate. (A) ABV
particles adsorbed with their pointed end toward a membrane vesicle of the host "A. convivator." (B) ABV
particles attached to each other with their thin filaments at the broad end. Bars, 100 nm. (C) A scheme of the
structure of an ABV virion (from Haring et al., 2005a).
Acidianus filamentous virus 1 (AFV1). AFV1 is a lipothrixvirus that infects members of the
crenarchaeal genus Acidianus in a stable carrier state; it was observed in an enrichment
culture from a hot spring at 80 °C in the Crater Hills region of Yellowstone National Park
(Rachel et al., 2002). The AFV1 virion is composed of a protein core covered with a lipid
envelope and contains at least five different proteins with molecular masses in the range of
23-130 kDa. The 20.8-kb linear genome contains 40 ORFs, and AFV1 particles measure
900 × 24 nm.
AFV1 exhibits claw-like terminal structures, connected to the virion body by appendages at
both ends (Fig. 7A). These unusual termini apparently have a special function in adsorption:
they were observed to attach to the host pili, and the contact seems rather strong (Fig. 7B)
(Bettstetter et al., 2003). Crystal structures of the two major coat proteins, AFV1-132 and
AFV1-140, have been solved. Both carry a novel four-helix-bundle fold, and AFV1-140 also
carries an extra C-terminal domain that possibly interacts with the virion envelope (Goulet
et al., 2009b). Recently, a new replication model was proposed on the basis of an analysis of
replicative intermediates on two-dimensional (2D) agarose gels. It suggests that genome
replication starts with D-loop formation, proceeds via strand displacement, and terminates
by recombination. This process to some degree resembles T4 DNA replication, but further
studies are needed to support the proposed model (Pina et al., 2014).
Fig. 7. (A) Electron micrographs of particles of AFV1 with tail structures in their native conformation. (B)
Electron micrographs of particles of AFV1 adsorbed to pili of host A. hospitalis CH10/1, stained with 3%
uranyl acetate. Black arrows indicate pili; white arrows show knots which are putative viral terminal
structures separated from the virus body. Bars, 100 nm (from Bettstetter et al., 2003).
2.2 Sulfolobus islandicus rod-shaped virus 2 (SIRV2)
Rudiviridae and Lipothrixviridae are families of linear viruses that are ubiquitous in
high-temperature (>75 °C) and low-pH (pH <3) terrestrial geothermal environments.
Comparative genomic analysis suggests a common evolutionary ancestry of the rudiviruses
and lipothrixviruses, based on the conservation of orthologous core genes and the similarity
of the major virion coat proteins (Prangishvili et al., 2006a). Together with Sulfolobus
islandicus rod-shaped virus 1 (SIRV1), Stygiolobus rod-shaped virus (SRV) (Vestergaard et
al., 2008b), Acidianus rod-shaped virus 1 (ARV1) (Vestergaard et al., 2005) and
Sulfolobales Mexican rudivirus 1 (SMRV1) (Servin-Garciduenas et al., 2013), SIRV2 is
classified as a rudivirus on the basis of its rod-shaped morphology, gene architecture and
sequence.
SIRV2 is one of the most extensively studied archaeal viruses and has developed into the
archaeal model virus thanks to structural, genomic and transcriptional studies. SIRV2 was
first isolated from the colony-cloned S. islandicus strain HVE 10/2, which originated from
solfataric fields in Hveragerdi, Iceland. This non-enveloped virus is a stiff rod 23 nm in
width and 900 nm in length. In the electron micrographs (Figure 8), a central cavity whose
ends are plugged by stoppers of approximately 50 nm is clearly visible, and both ends are
decorated with three short tail fibers. The sensitivity of the SIRV2 genome to BAL31 but not
to λ exonuclease indicated the existence of covalently closed hairpin ends (Prangishvili et
al., 1999).
The linear double-stranded DNA genome of SIRV2 is 35,502 bp long, carries inverted terminal
repeats (ITRs) of 1628 bp at each end, and has a low G+C content of 25%. The virion body is a
superhelix formed by genomic DNA and multiple copies of the highly glycosylated 20-kDa
capsid protein. The SIRV2 genome encodes 54 ORFs, 44 of which have homologs in SIRV1.
Approximately half of the encoded proteins have been characterized by sequence, structural
and biochemical analysis, which is the highest proportion of genes with recognized functions
among crenarchaeal viruses. Four SIRV2 proteins were identified as virion structural
proteins. The major capsid protein (MCP), P134 (gp26), shares a common fold with the MCPs
of the lipothrixvirus AFV (Acidianus filamentous virus) (Goulet et al., 2009a). The largest
viral protein, P1070 (gp38), has a molecular weight of 105 kDa and possesses a coiled-coil
domain; it is a component of the three tail fibers (Steinmetz et al., 2008). In addition, two
structural proteins, ORF488 (gp33) and ORF564 (gp39), are found in SIRV2 virions, although
in low amounts (Vestergaard et al., 2008b). The crystal structure of SIRV1 P119, together
with the experimental detection of nicking and joining activities, suggests that this protein
could be involved in the initiation of SIRV1 genome replication (Oke et al., 2011). P121 has
high sequence similarity to archaeal Holliday junction resolvases (Hjrs), and its Hjr activity
was demonstrated experimentally (Birkenbihl et al., 2001). Further hypothetical proteins are
predicted, by sequence analysis and experimental evidence, to be involved in transcription,
replication and nucleic acid metabolism. More detailed information about the whole life
cycle of SIRV2 is discussed below.
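The genome features quoted in this section (G+C content, inverted terminal repeats) are simple
sequence properties. As a minimal sketch, assuming a plain DNA string as input, they could be
computed as follows. The toy sequence and the function names are illustrative assumptions and
are not SIRV2 data; the ITR search finds perfect repeats only, whereas real ITRs may contain
mismatches.

```python
# Illustrative sketch only: the toy sequence below is invented, not SIRV2 sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def itr_length(seq):
    """Length of the longest perfect inverted terminal repeat.

    In a linear genome with ITRs, the left end equals the reverse
    complement of the right end, so the sequence and its reverse
    complement share a common prefix whose length is the ITR length.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    length = 0
    # Cap at half the genome so that the two repeats cannot overlap.
    while length < len(seq) // 2 and seq[length] == rc[length]:
        length += 1
    return length


# Hypothetical toy genome: 8-nt arm + unique middle + inverted copy of the arm.
ARM = "ACGTTGCA"
TOY_GENOME = ARM + "GGGGGAAAAT" + reverse_complement(ARM)

print(f"G+C content: {gc_content(TOY_GENOME):.2f}")
print(f"ITR length: {itr_length(TOY_GENOME)} nt")
```

Applied to a real genome sequence, the same two functions would be expected to return values
comparable to those reported in the literature, provided that the terminal repeats are perfect.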
Figure 8. Structure of SIRV2 particles. (A) TEM of negatively stained SIRV2 particles. Inset shows a
high-resolution image of the end structure of the capsid. The scale bar is 500 nm. (B) Schematic depiction
of SIRV2 particles (from Steinmetz et al., 2008).
2.3 SIRV2 life cycle
2.3.1 Attachment and Entry
Archaeal viruses display unusual and diverse morphotypes, genome sequences and protein
structures (Krupovic et al., 2012). Recent research revealed that the interactions between
archaeal viruses and their hosts also seem to be unique (Bize et al., 2009). However,
compared with the wealth of data available on bacterial and eukaryotic systems, studies on
archaeal viruses consist mainly of biochemical and genetic characterization of their virions,
while the attachment and entry processes remain elusive.
Virus infection is initiated by entry into the host cell, and the first step of the entry process
is the recognition of receptors on the host cell surface through specific interactions. Viruses
must then transport their genetic information to the cell compartment where their genome is
replicated (Poranen et al., 2002). The vast majority of known viruses have a tail structure at
one or both ends of the nucleocapsid, which facilitates the attachment of virions to the host
membrane. In the Lipothrixviridae family, each end of the virion is tapered and carries
specific terminal structures. These structures can be claws (AFV1), T-bars (AFV9), mop-like
structures (SIFV), three (AFV3) or six (SFV) short filaments, or tips resembling bottle brushes
(AFV2), all of which are implicated in cellular adsorption (Bettstetter et al., 2003; Bize et
al., 2008; Haring et al., 2005b).
Both termini of the SIRV2 virion carry three tail fibers composed of the minor structural
protein P1070. Transmission electron microscopy and whole-cell electron cryotomography
(cryo-ET) showed that these termini bind to the tips of the pilus-like filaments that are
abundant on the surface of host cells. Figure 9 shows the interaction between SIRV2 terminal
fibers and purified host cellular filaments. Virus adsorption was very fast and irreversible,
and infected cells were no longer able to adsorb more virus efficiently (Quemin et al., 2013).
Many bacterial viruses, such as the Ff inoviruses, use filamentous cellular appendages as
primary receptors. Retraction of the host pilus then brings the virion close to the host cell
surface, where it can bind to the secondary receptor (Rakonjac et al., 2011). Although no
retracting pili have been identified in archaea, secondary receptors that adsorb the virus
particles are expected on the host cell surface. Indeed, a Sulfolobus mutant strain lacking
the gene clusters sso3138-sso3141 and sso2386-sso2387 was resistant to SIRV2: in contrast to
the wild-type strain, the mutant showed no growth retardation when it was diluted and
infected with SIRV2 at the same multiplicity of infection (MOI). The proteins encoded by the
first cluster, sso3138 to sso3141, were predicted to possess transmembrane helices and to be
located extracellularly, and probably act as a receptor for SIRV2. The proteins encoded by
the other gene cluster may be involved in the secretion of the receptor components. Genetic
complementation experiments confirmed the involvement of the mutation in virus resistance
and further support a role for these proteins in SIRV2 entry (Deng et al., 2014).
Figure 9. Transmission electron micrographs of SIRV2 interaction with purified cellular filaments. The
filaments were removed from S. islandicus LAL14/1 cells (from Quemin et al., 2013).
However, how the virus traverses the 12.5-µm-long filament to reach the cell body, how it
removes its coat protein, and how the two identified gene clusters relate to the structure of
the pili are still poorly understood and need further study.
2.3.2 SIRV2 Gene Transcription and Regulation
After entering the cell and removing its coat protein, SIRV2 is likely to recruit the host RNA
polymerase complex to transcribe its genes, as no viral gene has been shown to encode an
RNA polymerase. In general, viral transcription is temporally regulated, and genes can be
classified as early, middle or late, encoding in chronological order the proteins involved in
regulation, translation and replication and the structural proteins for assembly.
In bacterial and eukaryal virus-host systems, the modification of cellular transcription as a
result of virus infection is well studied, for example in bacteriophage T7. In addition to
being transcribed by the host E. coli RNA polymerase, T7 encodes its own RNA polymerase, a
single-subunit enzyme of 99 kDa. The DNA genome of T7 is transcribed entirely from left to
right, first in the early region by E. coli RNA polymerase, and then from a portion of the
early region through the entire late region by newly made T7 RNA polymerase (Dunn and
Studier, 1983; Steitz, 2004). In the archaeal domain, by contrast, the mechanisms and
controls of viral gene expression, as well as host gene regulation upon virus infection, have
not yet been elucidated. To date, several archaeal viruses have been developed as suitable
models to study the molecular details of the archaeal viral life cycle and host responses, e.g.
the temperate Sulfolobus spindle-shaped viruses (SSV) (Frols et al., 2007) and the lytic
Sulfolobus turreted icosahedral virus (STIV) (Ortmann et al., 2008; Maaty et al., 2012).
Reminiscent of the life cycles of bacteriophages and eukaryotic viruses, SSV1 exhibits tight
temporal regulation of its own transcription after UV treatment, starting with a small
UV-specific gene and continuing with three distinct sets of genes representing
immediate-early, early and late transcripts. However, very few host genes were regulated
upon virus infection (Frols et al., 2007). The microarray study of transcription of the lytic
virus STIV told the opposite story. STIV transcription did not show typical temporal
regulation: transcription signals of nine early viral genes were detected at 8 h, and all the
remaining genes were transcribed subsequently. Surprisingly, a total of 177 host genes were
found to be differentially expressed during the infection, of which two thirds were
up-regulated and one third were down-regulated (Ortmann et al., 2008).
SIRV2 infects Sulfolobus spp. but does not integrate into the host chromosome. It was
initially thought to be a lysogenic virus existing in a stable carrier state in the host,
consistent with the uniform transcription pattern revealed in an earlier study (Kessler et al.,
2004). During the characterization of its special release mechanism, however, SIRV2 was
demonstrated to be a lytic virus (Bize et al., 2009). An independent microarray study
performed in infected S. solfataricus 5E6 cells and a transcriptomic analysis of infected S.
islandicus LAL14/1 cells showed that SIRV2 transcription starts from the terminal genes
located at both ends of the linear genome (Quax et al., 2013; Okutan et al., 2013). Although
SIRV2 transcription is not as tightly regulated chronologically as that of SSV1, gene
expression showed a temporal pattern. Some early genes, such as the two identical genes
ORF83a and ORF83b and ORF119C, which encodes the viral replication initiation protein (Oke
et al., 2011), were transcribed within 15-30 min and highly expressed at 1 h, whereas genes
for structural proteins such as the major capsid protein and the virus-associated pyramid
protein were most abundantly expressed at the late stage of the infection cycle. Despite the
lack of strong temporal regulation of viral transcription, the host response to SIRV2
infection was significant. More than one third of the host genes were differentially
regulated, with similar numbers of down- and up-regulated genes. Most of the host genes that
are strongly activated upon infection are assumed to function in defense against viruses, as
well as in cellular collapse, energy metabolism and membrane transport. This may suggest
that the virus controls its replication phases less through its own differential gene
expression than by co-opting host genes.
2.3.3 SIRV2 Replication
The SIRV2 genome is a linear duplex with covalently closed hairpin termini and long ITRs at
both ends (Blum et al., 2001). This terminal structure is typically involved in replication
initiation, in parallel with that of the Poxviridae and other large cytoplasmic eukaryotic
viruses. Both the DNA sequence and the structure of the termini are important for template
recognition; the template is nicked by the Rep initiation protein, exposing a 3'-OH group that
serves as a primer for DNA synthesis (Du and Traktman, 1996).
Sequence and structure analyses showed that ORF119 of SIRV2 is a member of the replication
initiator (Rep) family and shares a conserved key active-site motif with rolling-circle
replication (RCR) Rep proteins (Vega-Rocha et al., 2007). It forms a dimer and
sequence-specifically nicks one strand of the SIRV2 terminal hairpin, but only when the
substrate is in single-stranded form. Its joining activity, which ligates the fragments by a
strand-transfer (flip-flop) mechanism, was also confirmed (Oke et al., 2011). On the basis of
these features, along with the detection of head-to-head concatemers among the replicative
intermediates (Peng et al., 2001), a mechanism related, but not restricted, to rolling-circle
replication (RCR) was proposed. The Rep protein recognizes and nicks one ori site of the
genome; one subunit of the Rep dimer then becomes covalently attached to the newly generated
ori site, while the other subunit ligates the two old fragments, forming a contiguous DNA
circle. Displacement replication is then used to replicate the rest of the genome, generating
a double-stranded DNA circle with a nicked hairpin terminus to which the Rep protein is
attached. The next steps of replication are similar to the RCR of the poxviruses. At the
junctions between genome monomers, opposing inverted terminal repeats can be extruded
to form hairpin four-way junctions. A Holliday junction resolving enzyme (Hjr) was therefore
proposed to resolve the concatemers, producing monomer copies with linear hairpin ends
(Culyba et al., 2006; Oke et al., 2011).
Holliday junction resolving enzymes are found ubiquitously in all domains of life, for
example RuvC in Bacteria, human GEN1 in Eukarya (Declais and Lilley, 2008), and two different
Holliday junction resolving enzymes (Hjr and Hje) from the crenarchaeon Sulfolobus
solfataricus (Kvaratskhelia and White, 2000). As expected, SIRV2 encodes a 14 kDa
Holliday junction resolving enzyme (SIRV2 Hjr), which is conserved among rudiviruses.
Unlike the bacteriophage resolving enzymes, which cleave a variety of branched DNA
structures formed during replication, the SIRV2 Hjr has a very narrow substrate range: it
cleaves only four-way DNA junctions, and its cleavage pattern is also unique in that it nicks
only the exchanging strand pairs (Gardner et al., 2011a). This protein is presumed to be
important for processing replicative DNA intermediates late in the infection cycle, before
packaging into newly synthesized particles commences.
Unlike some bacterial viruses that encode DNA replication proteins, most archaeal viruses
lack their own DNA polymerase genes, indicating that their replication probably relies on the
host replication machinery. This is supported by recently published work showing that two
subunits of the heterotrimeric S. solfataricus sliding clamp, proliferating cell nuclear
antigen (PCNA; SsoPCNA1 to 3), interact with several SIRV2 proteins. PCNA is a key protein
that functions as a cofactor of DNA polymerases and recruits various crucial DNA metabolism
proteins in DNA replication and repair (Moldovan et al., 2007). Most of the interacting viral
proteins have not been assigned a function, except for SIRV2 Hjr; this agrees with previous
research showing that SsoPCNA can stimulate Hjr enzyme activity in S. solfataricus
(Dorazi et al., 2006). Intriguingly, the products of the early transcribed genes ORF83a/b
were also shown to interact with PCNA, suggesting that they play important roles during the
replication cycle of SIRV2 (Gardner et al., 2014).
Although the functions of some replicative viral proteins have been confirmed and a
preliminary model has been proposed, much further work is needed to uncover and explain
the mechanism of viral replication.
2.3.4 SIRV2 Release Mechanism
The final step of the viral replication cycle is the release of virus particles. In the
bacterial domain, most lytic viruses cross the cell envelope and spread to the environment
with the assistance of holins, small phage-encoded integral membrane proteins (Krupovic
and Bamford, 2008). How archaeal viruses overcome the challenging task of rupturing the cell
membrane and escaping from the host cell has attracted a lot of attention in recent years,
especially after the discovery of a unique release mechanism (Bize et al., 2009).
Among archaeal viruses, SIRV2 and STIV are the best studied with respect to host cell
interactions. Both are lytic viruses and share the same extraordinary virion egress
mechanism. SIRV2 induces degradation of the host chromosome and assembles virus particles
in the cytoplasm (Fig. 10A). In the late stages of the infection cycle, numerous prominent
virus-associated pyramids (VAPs) are formed on the host cell surface (Fig. 10B and C). These
structures open outward at the end of the infection cycle, allowing the mature viruses to
escape through the resulting apertures (Bize et al., 2009; Brumfield et al., 2009).
Apparently, this release mechanism is not universal among hyperthermophilic viruses.
Although they share the release mechanism, the two viruses differ dramatically in their
morphological properties. It is therefore possible that the morphogenetic and egress systems
evolved independently.
Figure 10. (A) Schematic representation of the major stages of the SIRV2 infection cycle in the Sulfolobus host
cell. Times after infection are indicated in hours. The gradual opening out of VAPs (at time points
between 10 and 14 h) is illustrated in more detail with fragments from the TEM of thin sections. (B) Cells 10 h
after infection. (C) Thin sections in a plane perpendicular to the cell envelope. Arrows indicate VAPs (Bize et
al., 2009). (D and E) Negative contrast electron micrographs of isolated VAPs. (D) The side view of intact
VAPs. (E) Top view of a VAP in the open conformation (Scale bars: 100 nm.). (F) Thin sections through S.
acidocaldarius expressing SIRV2-P98. Arrows indicate VAPs. (Scale bars: 200 nm.) (from Quax et al., 2010)
To identify the virus-specific structural protein components of SIRV2-infected cells, three
fractions of virus-infected cells were collected, compared and analyzed: the total cell
lysate, the membrane fraction and the cytosol fraction. The 10-kDa protein P98 of SIRV2 was
found to be the only protein appearing specifically in the membrane fraction of infected
cells. It is exposed on the cell surface, where it ruptures the S-layer, and no other viral
protein is involved in the assembly of the pyramids. When SIRV2-ORF98 was overexpressed in
E. coli and S. acidocaldarius, VAPs formed with the same size and shape as those formed in
S. islandicus infected with SIRV2 (Fig. 10F) (Quax et al., 2010; Quax et al., 2011). Sequence
alignment revealed that no archaeal virus other than STIV and the Rudiviridae (SIRV1/2, SRV)
carries a homolog of SIRV2-ORF98. It is also intriguing that the VAP is a separate structural
unit that can be isolated and purified, and that the single protein SIRV2-ORF98 is capable of
self-assembling into an ordered sevenfold pyramid with isosceles-triangle-shaped faces
(Fig. 10D and E), which appears to be a baseless, hollow structure (Quax et al., 2011).
Although similar VAPs were observed upon heterologous expression in E. coli and S.
acidocaldarius, they were present only on the surface of the inner membrane and all of them
were in the closed state. At least one specific factor that induces VAP opening must
therefore be absent from S. acidocaldarius and E. coli but present in the native host cells.
Summary of Results
Sulfolobus islandicus rod-shaped virus 2 (SIRV2), a member of the family Rudiviridae, is
a promising candidate to become a general model for detailed studies of archaeal virus
biology owing to its relatively easy laboratory cultivation and sufficient yields. To date,
several important stages of its life cycle have been characterized, such as viral entry, the
transcription pattern, genome replication and its unique egress mechanism, which together
provide a much better understanding of the largely unknown archaeal viral world.
Even so, as for the vast majority of other archaeal viruses, whose genes show little sequence
similarity to entries in public databases, the functions of many SIRV2 proteins remain to be
identified. A function has been experimentally confirmed for only one fifth of the 54 ORFs
encoded by the SIRV2 genome, and knowledge of its basic molecular processes, such as DNA
repair, recombination and genome maturation, as well as of its interaction with the host, is
still limited.
Although it shows limited sequence similarity to entries in public databases, a
CRISPR-associated Cas4-like protein, ORF207 (gp19), previously identified as a ssDNA
endonuclease, has attracted considerable interest (Gardner et al., 2011b). Its gene was found
to be transcribed from a single promoter together with two upstream genes (encoding gp17 and
gp18), generating a polycistronic transcript (Kessler et al., 2004). This gene organization
suggests related functions. Moreover, bioinformatic analysis revealed that this operon
constitutes the most conserved gene cluster in archaeal linear viruses, including
rudiviruses and filamentous viruses. This raised questions about the functions of the entire
operon and the stages of virus infection in which it may be involved.
The resolved crystal structure of the gp17 homolog encoded by SIRV1 indicated a DNA-binding
activity, although no obvious structural similarity was found in the Protein Data Bank. DNA
substrates of different structures were tested for binding. The results demonstrated that
ssDNA, as well as dsDNA with a single or double flap, was shifted by gp17, whereas
blunt-ended dsDNA did not form a protein-DNA complex, indicating that gp17 is a ssDNA-binding
protein. However, none of the documented classical ssDNA-binding domains was found in the
structure of gp17; this protein therefore constitutes a novel, non-canonical ssDNA-binding
protein. Sequence alignment of gp17 homologs revealed 3 highly conserved and 5 relatively
conserved residues. Mutagenesis of a few conserved basic residues distributed in two adjacent
loops within each monomer suggested a U-shaped binding path for ssDNA.
Because gp18 could not be cloned into Sulfolobus cells owing to its toxicity, and because
recombinant expression in E. coli resulted in the formation of inclusion bodies, a
denaturation and refolding strategy was employed to purify His-tagged gp18 from E. coli. Both
circular dichroism spectroscopy and gel-filtration chromatography showed that the refolded
gp18 was sufficiently stable for further study. A BlastP search with the gp18 sequence
suggested weak similarity to the ATPase domain of bacterial Lon proteases, and tertiary
structure prediction suggested a function as a hexameric helicase. However, neither protease
nor helicase activity of gp18 could be detected under any of the conditions tested. Instead,
gp18 was found to increase the yield of dsDNA from two complementary oligonucleotides. The
failure to detect helicase activity could be due to the lack of proper experimental
conditions or to masking of the helicase activity by the stronger annealing activity.
Alternatively, gp18 may carry no helicase activity but only annealing activity, and thus
share features with the annealing helicases (HARP, AH2) (Yusufzai and Kadonaga, 2008;
Yusufzai and Kadonaga, 2010).
To better understand the function of the entire operon, the protein product of the third
gene, gp19, was further characterized in this study. It was found to have a 5’-3’ ssDNA
exonuclease activity in addition to the previously demonstrated ssDNA endonuclease activity.
In the crystal structure of gp17, 38 amino acid residues are missing at the C-terminus; this
region was predicted to be disordered by two different programs, IUPred and PONDR. The
disordered C-termini of bacterial SSB proteins are normally involved in protein-protein
interactions. Since all three proteins of the operon act on the same type of substrate,
ssDNA, the interactions among them were examined by GST affinity chromatography. The
results demonstrated that gp17 interacts with gp18 and that the disordered C-terminal domain
of gp17 is essential for this interaction. No interaction was detected between gp17 and gp19,
but a weak interaction was observed between gp18 and gp19. To test whether gp17, like
bacterial SSBs, can recruit ssDNA-processing proteins, its gene was inserted into a plasmid
and transformed into host Sulfolobus solfataricus P2 cells for an in vivo pull-down assay.
Two host proteins, sso2277 and reverse gyrase (sso0422), were found to interact with gp17.
Protein structure prediction for sso2277 revealed high similarity to RecF and RecN proteins,
which are involved in DNA replication / recombination.
The operonic or clustered organization of the three genes in rudiviruses and filamentous
viruses, and the observed interactions between their protein products, strongly suggest that
they cooperate closely in the same process(es) involving ssDNA. This process could be SIRV2
genome maturation, replication or recombination, and new evidence is needed to support this
hypothesis.
In addition to the available information on the unusual morphological and genomic properties
of the virus, the SIRV2 transcription pattern and the regulation of host genes during virus
infection have been studied, either by microarray analysis or by deep transcriptome
sequencing (Okutan et al., 2013; Quax et al., 2013). Despite the lack of strong temporal
regulation of viral transcription, the host response to SIRV2 infection was significant. More
than one third of the host genes were differentially regulated, with similar numbers of
downregulated and upregulated genes. Among these regulated genes, two transcription
regulators from Sulfolobus solfataricus P2, sso2474 and sso10340, responded differently. We
were therefore curious to find out whether any host or viral genes are regulated by these two
regulators, and whether they act as local or global transcription factors. In this work, we
investigated the binding targets of the two proteins in an in vivo context using a method
similar to chromatin immunoprecipitation combined with DNA sequencing.
To detect regulation of both host and viral genes, the proteins were overexpressed for 15 h
and the cells were then infected with SIRV2 at an MOI of about 10 for 2.5 h. The His-tagged
proteins were purified from the virus-infected cells. The protein sso2474 was found to bind
hundreds of fold more DNA than the control, and sso10340 exhibited a range of oligomeric
states, a feature resembling that of Lrp/AsnC family proteins. The bound DNA was separated
from the DNA-protein complexes of each of the two purified proteins and sent for
high-throughput sequencing. Alignment of the sequence data to the virus and host genomes
revealed that most of the DNA bound by sso2474 is viral DNA, whereas sso10340 is mainly
associated with host regulation.
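The comparison described in this paragraph amounts to tallying, for each pulled-down sample,
how many sequenced reads are assigned to the viral genome and how many to the host genome.
The following is a minimal sketch of that bookkeeping, assuming that each read already
carries a best alignment score against each reference (None when it does not align). The
record format, the scores and the function name are hypothetical and do not reproduce the
pipeline actually used in this work.

```python
from collections import Counter


def assign_reads(hits):
    """Count reads by the reference with the better alignment score.

    `hits` is an iterable of (read_id, virus_score, host_score) tuples;
    a score of None means the read did not align to that reference.
    Reads with equal scores are counted separately as 'ambiguous'.
    """
    counts = Counter()
    for _read_id, virus_score, host_score in hits:
        if virus_score is None and host_score is None:
            counts["unmapped"] += 1
        elif host_score is None or (virus_score is not None and virus_score > host_score):
            counts["virus"] += 1
        elif virus_score is None or host_score > virus_score:
            counts["host"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical toy input: (read id, score vs. virus genome, score vs. host genome).
TOY_HITS = [
    ("r1", 60, None),
    ("r2", 55, 20),
    ("r3", None, 48),
    ("r4", 30, 30),
    ("r5", None, None),
    ("r6", 42, None),
]

counts = assign_reads(TOY_HITS)
assigned = counts["virus"] + counts["host"]
viral_fraction = counts["virus"] / assigned
print(dict(counts))
print(f"viral fraction of uniquely assigned reads: {viral_fraction:.2f}")
```

Keeping ambiguous and unmapped reads in separate bins avoids inflating either genome's share;
how such reads were actually handled is not stated in the text.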
However, the specific binding targets and the binding mechanism of sso2474 have not yet been
identified, and experiments showed that this protein, when purified from E. coli, preferred
to bind dsDNA rather than ssDNA. A total of 27 target regions bound by sso10340 or its
interacting proteins in S. solfataricus P2 were identified. Half of them are located upstream
or partially upstream of the corresponding genes, while the other half fall within gene
coding regions. An 11-bp palindromic binding motif was defined by analysis of the enriched
oligonucleotide sequences and was present in 96% of the binding targets. The functions of
the associated genes were categorized, and most were involved in energy metabolism,
transport and amino acid metabolism.
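To illustrate what a palindromic DNA motif and its occurrence among target regions mean in
practice, the sketch below checks whether a motif equals its own reverse complement and
computes the fraction of regions that contain it. Because no base is its own complement, an
odd-length motif can be a perfect reverse-complement palindrome only if its central position
is left unconstrained (written N here); that form is an assumption. The motif and the regions
are invented for illustration and are not the motif or the targets identified in this work.

```python
import re

# N pairs with N so that an unconstrained position stays unconstrained.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N allowed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(motif):
    """True if the motif reads the same on both DNA strands."""
    motif = motif.upper()
    return motif == reverse_complement(motif)


def fraction_with_motif(regions, motif):
    """Fraction of regions containing at least one match to the motif."""
    pattern = re.compile(motif.upper().replace("N", "[ACGT]"))
    hits = sum(1 for region in regions if pattern.search(region.upper()))
    return hits / len(regions)


# Hypothetical 11-bp motif: two 5-bp inverted half-sites around a free centre.
MOTIF = "TTGCANTGCAA"

# Hypothetical target regions (invented sequences).
REGIONS = [
    "GGATTGCAGTGCAACC",  # contains the motif with a central G
    "ACTTGCATTGCAAGT",   # contains the motif with a central T
    "GGGGCCCCAAAATTTT",  # no match
]

print(is_palindromic(MOTIF))
print(f"{fraction_with_motif(REGIONS, MOTIF):.2f}")
```

As a purely arithmetic illustration, a motif found in 26 of 27 regions would correspond to
about 96%; the exact count is not stated in the text.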
Future Perspectives
This PhD work is the first study to provide a functional characterization of an entire gene
operon conserved in archaeal rudiviruses and filamentous viruses, as well as the general
regulation profile of two host regulators. Work remains to be done in the future to extend
our knowledge of archaeal viral biology and virus-host interactions.
The toxicity of gp18 to Sulfolobus cells and its insolubility in E. coli hindered the
characterization of the whole operon. Although inserting both the gp17 and gp18 genes into
Sulfolobus solfataricus cells decreased the strong toxicity of gp18, almost no expressed gp18
could be purified (data not shown). One approach we would like to try is to co-express two or
all three proteins in E. coli in a suitable system, to test whether gp17 or gp19 can assist
the folding of gp18, since both were demonstrated to interact with gp18. If so, the next step
would be to crystallize the complex with a synthetic oligonucleotide to investigate in detail
the interaction between the complex and ssDNA.
The ssDNA binding, annealing and nuclease activities were all characterized in vitro in this
study, and their possible functions in the viral infection cycle are discussed; however, more
in vivo evidence is needed to complete the scenario we constructed. Owing to the limited
genetic tools available for the virus and its relatively large size, silencing these genes in
the virus has not been possible so far. We have tried overexpressing a C-terminally truncated
gp17 in the host cells to compete with the wild-type protein from the virus, in order to
detect its influence on virus replication or genome maturation. However, the results were
not conclusive, for various reasons. New ideas and methods for probing the function of this
operon in vivo would be valuable.
The most surprising feature of sso2474 is its especially high affinity for viral DNA, which
it binds in a sequence-independent manner. The DNA-binding mechanism of sso2474 is not
clear. If the gene can be knocked out, phenotype changes and an absence of growth
retardation upon virus infection are expected in the sso2474 mutant. Finally, the global
regulation by sso10340 needs to be further validated. Whether any other DNA-binding proteins
or regulators interact with sso10340, and whether this regulator activates or represses the
transcription of the corresponding genes, remain to be confirmed.
References
Ackermann,H.W., and Prangishvili,D. (2012) Prokaryote viruses studied by electron
microscopy. Arch Virol 157: 1843-1849.
Albers,S.V., Birkeland,N.K., Driessen,A.J., Gertig,S., Haferkamp,P., Klenk,H.P. et al. (2009)
SulfoSYS (Sulfolobus Systems Biology): towards a silicon cell model for the central
carbohydrate metabolism of the archaeon Sulfolobus solfataricus under temperature
variation. Biochem Soc Trans 37: 58-64.
Anderson,I., Rodriguez,J., Susanti,D., Porat,I., Reich,C., Ulrich,L.E. et al. (2008) Genome
sequence of Thermofilum pendens reveals an exceptional loss of biosynthetic pathways
without genome reduction. J Bacteriol 190: 2957-2965.
Bernander,R. (2007) The cell cycle of Sulfolobus. Mol Microbiol 66: 557-562.
Bettstetter,M., Peng,X., Garrett,R.A., and Prangishvili,D. (2003) AFV1, a novel virus
infecting hyperthermophilic archaea of the genus Acidianus. Virology 315: 68-79.
Bintrim,S.B., Donohue,T.J., Handelsman,J., Roberts,G.P., and Goodman,R.M. (1997)
Molecular phylogeny of Archaea from soil. Proc Natl Acad Sci U S A 94: 277-282.
Birkenbihl,R.P., Neef,K., Prangishvili,D., and Kemper,B. (2001) Holliday junction resolving
enzymes of archaeal viruses SIRV1 and SIRV2. J Mol Biol 309: 1067-1076.
Bize,A., Karlsson,E.A., Ekefjard,K., Quax,T.E., Pina,M., Prevost,M.C. et al. (2009) A unique
virus release mechanism in the Archaea. Proc Natl Acad Sci U S A 106: 11306-11311.
Bize,A., Peng,X., Prokofeva,M., Maclellan,K., Lucas,S., Forterre,P. et al. (2008) Viruses in
acidic geothermal environments of the Kamchatka Peninsula. Res Microbiol 159: 358-366.
Blum,H., Zillig,W., Mallok,S., Domdey,H., and Prangishvili,D. (2001) The genome of the
archaeal virus SIRV1 has features in common with genomes of eukaryal viruses. Virology
281: 6-9.
Bochkarev,A., and Bochkareva,E. (2004) From RPA to BRCA2: lessons from single-stranded
DNA binding by the OB-fold. Curr Opin Struct Biol 14: 36-42.
Breithaupt,H. (2001) The hunt for living gold. The search for organisms in extreme
environments yields useful enzymes for industry. EMBO Rep 2: 968-971.
Brock,T.D., Brock,K.M., Belly,R.T., and Weiss,R.L. (1972) Sulfolobus: a new genus of
sulfur-oxidizing bacteria living at low pH and high temperature. Arch Mikrobiol 84: 54-68.
Brugger,K., Redder,P., and Skovgaard,M. (2003) MUTAGEN: multi-user tool for annotating
genomes. Bioinformatics 19: 2480-2481.
Brumfield,S.K., Ortmann,A.C., Ruigrok,V., Suci,P., Douglas,T., and Young,M.J. (2009) Particle
assembly and ultrastructural features associated with replication of the lytic archaeal virus
Sulfolobus turreted icosahedral virus. J Virol 83: 5964-5970.
Chong,J.P., Hayashi,M.K., Simon,M.N., Xu,R.M., and Stillman,B. (2000) A double-hexamer
archaeal minichromosome maintenance protein is an ATP-dependent DNA helicase. Proc
Natl Acad Sci U S A 97: 1530-1535.
Culyba,M.J., Harrison,J.E., Hwang,Y., and Bushman,F.D. (2006) DNA cleavage by the A22R
resolvase of vaccinia virus. Virology 352: 466-476.
Declais,A.C., and Lilley,D.M. (2008) New insight into the recognition of branched DNA
structure by junction-resolving enzymes. Curr Opin Struct Biol 18: 86-95.
DeLong,E.F. (1992) Archaea in coastal marine environments. Proc Natl Acad Sci U S A 89:
5685-5689.
Deng,L., He,F., Bhoobalan-Chitty,Y., Martinez-Alvarez,L., Guo,Y., and Peng,X. (2014)
Unveiling cell surface and type IV secretion proteins responsible for archaeal rudivirus
entry. J Virol 88: 10264-10268.
Deng,L., Zhu,H., Chen,Z., Liang,Y.X., and She,Q. (2009) Unmarked gene deletion and
host-vector system for the hyperthermophilic crenarchaeon Sulfolobus islandicus.
Extremophiles 13: 735-746.
Dickey,T.H., Altschuler,S.E., and Wuttke,D.S. (2013) Single-stranded DNA-binding proteins:
multiple domains for multiple functions. Structure 21: 1074-1084.
Dorazi,R., Parker,J.L., and White,M.F. (2006) PCNA activates the Holliday junction
endonuclease Hjc. J Mol Biol 364: 243-247.
Dosztanyi,Z., Csizmok,V., Tompa,P., and Simon,I. (2005) IUPred: web server for the
prediction of intrinsically unstructured regions of proteins based on estimated energy
content. Bioinformatics 21: 3433-3434.
Du,S., and Traktman,P. (1996) Vaccinia virus DNA replication: two hundred base pairs of
telomeric sequence confer optimal replication efficiency on minichromosome templates.
Proc Natl Acad Sci U S A 93: 9693-9698.
Dunn,J.J., and Studier,F.W. (1983) Complete nucleotide sequence of bacteriophage T7 DNA
and the locations of T7 genetic elements. J Mol Biol 166: 477-535.
Elkins,J.G., Podar,M., Graham,D.E., Makarova,K.S., Wolf,Y., Randau,L. et al. (2008) A
korarchaeal genome reveals insights into the evolution of the Archaea. Proc Natl Acad Sci
U S A 105: 8102-8107.
Forterre,P., Brochier,C., and Philippe,H. (2002) Evolution of the Archaea. Theor Popul Biol
61: 409-422.
Frols,S., Gordon,P.M., Panlilio,M.A., Schleper,C., and Sensen,C.W. (2007) Elucidating the
transcription cycle of the UV-inducible hyperthermophilic archaeal virus SSV1 by DNA
microarrays. Virology 365: 48-59.
Fuhrman,J.A., McCallum,K., and Davis,A.A. (1992) Novel major archaebacterial group from
marine plankton. Nature 356: 148-149.
Gardner,A.F., Bell,S.D., White,M.F., Prangishvili,D., and Krupovic,M. (2014) Protein-protein
interactions leading to recruitment of the host DNA sliding clamp by the hyperthermophilic
Sulfolobus islandicus rod-shaped virus 2. J Virol 88: 7105-7108.
Gardner,A.F., Guan,C., and Jack,W.E. (2011a) Biochemical characterization of a
structure-specific resolving enzyme from Sulfolobus islandicus rod-shaped virus 2. PLoS
One 6: e23668.
Gardner,A.F., Prangishvili,D., and Jack,W.E. (2011b) Characterization of Sulfolobus
islandicus rod-shaped virus 2 gp19, a single-strand specific endonuclease. Extremophiles 15:
619-624.
Gevers,D., Dawyndt,P., Vandamme,P., Willems,A., Vancanneyt,M., Swings,J., and De Vos,P.
(2006) Stepping stones towards a new prokaryotic taxonomy. Philos Trans R Soc Lond B
Biol Sci 361: 1911-1916.
Goulet,A., Blangy,S., Redder,P., Prangishvili,D., Felisberto-Rodrigues,C., Forterre,P. et al.
(2009a) Acidianus filamentous virus 1 coat proteins display a helical fold spanning the
filamentous archaeal viruses lineage. Proc Natl Acad Sci U S A 106: 21155-21160.
Goulet,A., Spinelli,S., Blangy,S., van Tilbeurgh,H., Leulliot,N., Basta,T. et al. (2009b) The thermo-
and acido-stable ORF-99 from the archaeal virus AFV1. Protein Sci 18: 1316-1320.
Gribaldo,S., and Brochier-Armanet,C. (2006) The origin and evolution of Archaea: a state of
the art. Philos Trans R Soc Lond B Biol Sci 361: 1007-1022.
Gudbergsdottir,S., Deng,L., Chen,Z., Jensen,J.V., Jensen,L.R., She,Q., and Garrett,R.A. (2011)
Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when
challenged with vector-borne viral and plasmid genes and protospacers. Mol Microbiol 79:
35-49.
Guilliere,F., Peixeiro,N., Kessler,A., Raynal,B., Desnoues,N., Keller,J. et al. (2009) Structure,
function, and targets of the transcriptional regulator SvtR from the hyperthermophilic
archaeal virus SIRV1. J Biol Chem 284: 22222-22237.
Guy,L., and Ettema,T.J. (2011) The archaeal 'TACK' superphylum and the origin of
eukaryotes. Trends Microbiol 19: 580-587.
Haring,M., Rachel,R., Peng,X., Garrett,R.A., and Prangishvili,D. (2005a) Viral diversity in hot
springs of Pozzuoli, Italy, and characterization of a unique archaeal virus, Acidianus
bottle-shaped virus, from a new family, the Ampullaviridae. J Virol 79: 9904-9911.
Haring,M., Vestergaard,G., Brugger,K., Rachel,R., Garrett,R.A., and Prangishvili,D. (2005b)
Structure and genome organization of AFV2, a novel archaeal lipothrixvirus with unusual
terminal and core structures. J Bacteriol 187: 3855-3858.
Haring,M., Vestergaard,G., Rachel,R., Chen,L., Garrett,R.A., and Prangishvili,D. (2005c)
Virology: independent virus development outside a host. Nature 436: 1101-1102.
Hohn,M.J., Hedlund,B.P., and Huber,H. (2002) Detection of 16S rDNA sequences
representing the novel phylum "Nanoarchaeota": indication for a wide distribution in high
temperature biotopes. Syst Appl Microbiol 25: 551-554.
Howard,J.A., Delmas,S., Ivancic-Bace,I., and Bolt,E.L. (2011) Helicase dissociation and
annealing of RNA-DNA hybrids by Escherichia coli Cas3 protein. Biochem J 439: 85-95.
Johnson,D.B., Joulian,C., d'Hugues,P., and Hallberg,K.B. (2008) Sulfobacillus benefaciens sp.
nov., an acidophilic facultative anaerobic Firmicute isolated from mineral bioleaching
operations. Extremophiles 12: 789-798.
Kawano,S., Iyaguchi,D., Okada,C., Sasaki,Y., and Toyota,E. (2013) Expression, purification,
and refolding of active recombinant human E-selectin lectin and EGF domains in
Escherichia coli. Protein J 32: 386-391.
Kelley,L.A., and Sternberg,M.J. (2009) Protein structure prediction on the Web: a case
study using the Phyre server. Nat Protoc 4: 363-371.
Kessler,A., Brinkman,A.B., van der Oost,J., and Prangishvili,D. (2004) Transcription of the
rod-shaped viruses SIRV1 and SIRV2 of the hyperthermophilic archaeon Sulfolobus. J
Bacteriol 186: 7745-7753.
Kowalczykowski,S.C. (2000) Initiation of genetic recombination and
recombination-dependent replication. Trends Biochem Sci 25: 156-165.
Krupovic,M., and Bamford,D.H. (2008) Holin of bacteriophage lambda: structural insights
into a membrane lesion. Mol Microbiol 69: 781-783.
Krupovic,M., White,M.F., Forterre,P., and Prangishvili,D. (2012) Postcards from the edge:
structural genomics of archaeal viruses. Adv Virus Res 82: 33-62.
Kvaratskhelia,M., and White,M.F. (2000) Two Holliday junction resolving enzymes in
Sulfolobus solfataricus. J Mol Biol 297: 923-932.
Lemak,S., Beloglazova,N., Nocek,B., Skarina,T., Flick,R., Brown,G. et al. (2013) Toroidal
structure and DNA cleavage by the CRISPR-associated [4Fe-4S] cluster containing Cas4
nuclease SSO0001 from Sulfolobus solfataricus. J Am Chem Soc 135: 17476-17487.
Maaty,W.S., Steffens,J.D., Heinemann,J., Ortmann,A.C., Reeves,B.D., Biswas,S.K. et al.
(2012) Global analysis of viral infection in an archaeal model system. Front Microbiol 3:
411.
Martin,A., Yeats,S., Janekovic,D., Reiter,W.D., Aicher,W., and Zillig,W. (1984) SAV 1, a
temperate u.v.-inducible DNA virus-like particle from the archaebacterium Sulfolobus
acidocaldarius isolate B12. EMBO J 3: 2165-2168.
Mochizuki,T., Krupovic,M., Pehau-Arnaudet,G., Sako,Y., Forterre,P., and Prangishvili,D.
(2012) Archaeal virus with exceptional virion architecture and the largest single-stranded
DNA genome. Proc Natl Acad Sci U S A 109: 13386-13391.
Moldovan,G.L., Pfander,B., and Jentsch,S. (2007) PCNA, the maestro of the replication fork.
Cell 129: 665-679.
Mosig,G. (1998) Recombination and recombination-dependent DNA replication in
bacteriophage T4. Annu Rev Genet 32: 379-413.
Munoz,V., and Serrano,L. (1994) Elucidating the folding problem of helical peptides using
empirical parameters. Nat Struct Biol 1: 399-409.
Oke,M., Carter,L.G., Johnson,K.A., Liu,H., McMahon,S.A., Yan,X. et al. (2010) The Scottish
Structural Proteomics Facility: targets, methods and outputs. J Struct Funct Genomics 11:
167-180.
Oke,M., Kerou,M., Liu,H., Peng,X., Garrett,R.A., Prangishvili,D. et al. (2011) A dimeric Rep
protein initiates replication of a linear archaeal virus genome: implications for the Rep
mechanism and viral replication. J Virol 85: 925-931.
Okutan,E., Deng,L., Mirlashari,S., Uldahl,K., Halim,M., Liu,C. et al. (2013) Novel insights into
gene regulation of the rudivirus SIRV2 infecting Sulfolobus cells. RNA Biol 10: 875-885.
Oline,D.K., Schmidt,S.K., and Grant,M.C. (2006) Biogeography and landscape-scale diversity
of the dominant Crenarchaeota of soil. Microb Ecol 52: 480-490.
Oren,A. (2002) Molecular ecology of extremely halophilic Archaea and Bacteria. FEMS
Microbiol Ecol 39: 1-7.
Ortmann,A.C., Brumfield,S.K., Walther,J., McInnerney,K., Brouns,S.J., van de Werken,H.J. et
al. (2008) Transcriptome analysis of infection of the archaeon Sulfolobus solfataricus with
Sulfolobus turreted icosahedral virus. J Virol 82: 4874-4883.
Palm,P., Schleper,C., Grampp,B., Yeats,S., McWilliam,P., Reiter,W.D., and Zillig,W. (1991)
Complete nucleotide sequence of the virus SSV1 of the archaebacterium Sulfolobus
shibatae. Virology 185: 242-250.
Paytubi,S., McMahon,S.A., Graham,S., Liu,H., Botting,C.H., Makarova,K.S. et al. (2012)
Displacement of the canonical single-stranded DNA-binding protein in the
Thermoproteales. Proc Natl Acad Sci U S A 109: E398-E405.
Peng,X., Basta,T., Haring,M., Garrett,R.A., and Prangishvili,D. (2007) Genome of the
Acidianus bottle-shaped virus and insights into the replication and packaging mechanisms.
Virology 364: 237-243.
Peng,X., Blum,H., She,Q., Mallok,S., Brugger,K., Garrett,R.A. et al. (2001) Sequences and
replication of genomes of the archaeal rudiviruses SIRV1 and SIRV2: relationships to the
archaeal lipothrixvirus SIFV and some eukaryal viruses. Virology 291: 226-234.
Peng,X., Kessler,A., Phan,H., Garrett,R.A., and Prangishvili,D. (2004) Multiple variants of the
archaeal DNA rudivirus SIRV1 in a single host and a novel mechanism of genomic variation.
Mol Microbiol 54: 366-375.
Pietila,M.K., Demina,T.A., Atanasova,N.S., Oksanen,H.M., and Bamford,D.H. (2014)
Archaeal viruses and bacteriophages: comparisons and contrasts. Trends Microbiol 22:
334-344.
Pietila,M.K., Laurinavicius,S., Sund,J., Roine,E., and Bamford,D.H. (2010) The
single-stranded DNA genome of novel archaeal virus Halorubrum pleomorphic virus 1 is
enclosed in the envelope decorated with glycoprotein spikes. J Virol 84: 788-798.
Pietila,M.K., Roine,E., Paulin,L., Kalkkinen,N., and Bamford,D.H. (2009) An ssDNA virus
infecting archaea: a new lineage of viruses with a membrane envelope. Mol Microbiol 72:
307-319.
Pina,M., Basta,T., Quax,T.E., Joubert,A., Baconnais,S., Cortez,D. et al. (2014) Unique
genome replication mechanism of the archaeal virus AFV1. Mol Microbiol 92: 1313-1325.
Poranen,M.M., Daugelavicius,R., and Bamford,D.H. (2002) Common principles in viral entry.
Annu Rev Microbiol 56: 521-538.
Prangishvili,D. (2003) Evolutionary insights from studies on viruses of hyperthermophilic
archaea. Res Microbiol 154: 289-294.
Prangishvili,D., Arnold,H.P., Gotz,D., Ziese,U., Holz,I., Kristjansson,J.K., and Zillig,W. (1999)
A novel virus family, the Rudiviridae: Structure, virus-host interactions and genome
variability of the Sulfolobus viruses SIRV1 and SIRV2. Genetics 152: 1387-1396.
Prangishvili,D., Forterre,P., and Garrett,R.A. (2006a) Viruses of the Archaea: a unifying view.
Nat Rev Microbiol 4: 837-848.
Prangishvili,D., and Garrett,R.A. (2005) Viruses of hyperthermophilic Crenarchaea. Trends
Microbiol 13: 535-542.
Prangishvili,D., Garrett,R.A., and Koonin,E.V. (2006b) Evolutionary genomics of archaeal
viruses: unique viral genomes in the third domain of life. Virus Res 117: 52-67.
Prangishvili,D., Koonin,E.V., and Krupovic,M. (2013) Genomics and biology of Rudiviruses, a
model for the study of virus-host interactions in Archaea. Biochem Soc Trans 41: 443-450.
Prangishvili,D., Vestergaard,G., Haring,M., Aramayo,R., Basta,T., Rachel,R., and Garrett,R.A.
(2006c) Structural and genomic properties of the hyperthermophilic archaeal virus ATV
with an extracellular stage of the reproductive cycle. J Mol Biol 359: 1203-1216.
Quax,T.E., Krupovic,M., Lucas,S., Forterre,P., and Prangishvili,D. (2010) The Sulfolobus
rod-shaped virus 2 encodes a prominent structural component of the unique virion release
system in Archaea. Virology 404: 1-4.
Quax,T.E., Lucas,S., Reimann,J., Pehau-Arnaudet,G., Prevost,M.C., Forterre,P. et al. (2011)
Simple and elegant design of a virion egress structure in Archaea. Proc Natl Acad Sci U S A
108: 3354-3359.
Quax,T.E., Voet,M., Sismeiro,O., Dillies,M.A., Jagla,B., Coppee,J.Y. et al. (2013) Massive
activation of archaeal defense genes during viral infection. J Virol 87: 8419-8428.
Quemin,E.R., Lucas,S., Daum,B., Quax,T.E., Kuhlbrandt,W., Forterre,P. et al. (2013) First
insights into the entry process of hyperthermophilic archaeal viruses. J Virol 87:
13379-13385.
Rachel,R., Bettstetter,M., Hedlund,B.P., Haring,M., Kessler,A., Stetter,K.O., and
Prangishvili,D. (2002) Remarkable morphological diversity of viruses and virus-like particles
in hot terrestrial environments. Arch Virol 147: 2419-2429.
Rakonjac,J., Bennett,N.J., Spagnuolo,J., Gagic,D., and Russel,M. (2011) Filamentous
bacteriophage: biology, phage display and nanotechnology applications. Curr Issues Mol
Biol 13: 51-76.
Reiter,W.D., Palm,P., Yeats,S., and Zillig,W. (1987) Gene expression in archaebacteria:
physical mapping of constitutive and UV-inducible transcripts from the Sulfolobus virus-like
particle SSV1. Mol Gen Genet 209: 270-275.
Schleper,C., Holben,W., and Klenk,H.P. (1997) Recovery of crenarchaeotal ribosomal DNA
sequences from freshwater-lake sediments. Appl Environ Microbiol 63: 321-323.
Servin-Garciduenas,L.E., Peng,X., Garrett,R.A., and Martinez-Romero,E. (2013) Genome
sequence of a novel archaeal rudivirus recovered from a Mexican hot spring. Genome
Announc 1.
She,Q., Singh,R.K., Confalonieri,F., Zivanovic,Y., Allard,G., Awayez,M.J. et al. (2001) The
complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad Sci U S A
98: 7835-7840.
Shereda,R.D., Kozlov,A.G., Lohman,T.M., Cox,M.M., and Keck,J.L. (2008) SSB as an
organizer/mobilizer of genome maintenance complexes. Crit Rev Biochem Mol Biol 43:
289-318.
Steinmetz,N.F., Bize,A., Findlay,K.C., Lomonossoff,G.P., Manchester,M., Evans,D.J. et al.
(2008) Site-specific and spatially controlled addressability of a new viral nanobuilding block:
Sulfolobus islandicus rod-shaped virus 2. Adv Funct Mater 18: 3478-3486.
Steitz,T.A. (2004) The structural basis of the transition from initiation to elongation phases
of transcription, as well as translocation and strand separation, by T7 RNA polymerase.
Curr Opin Struct Biol 14: 4-9.
Stetter,K.O. (2006) Hyperthermophiles in the history of life. Philos Trans R Soc Lond B Biol
Sci 361: 1837-1842.
Suck,D. (1997) Common fold, common function, common origin? Nat Struct Biol 4:
161-165.
Theobald,D.L., Mitton-Fry,R.M., and Wuttke,D.S. (2003) Nucleic acid recognition by OB-fold
proteins. Annu Rev Biophys Biomol Struct 32: 115-133.
Vega-Rocha,S., Gronenborn,B., Gronenborn,A.M., and Campos-Olivas,R. (2007) Solution
structure of the endonuclease domain from the master replication initiator protein of the
nanovirus faba bean necrotic yellows virus and comparison with the corresponding
geminivirus and circovirus structures. Biochemistry 46: 6201-6212.
Vestergaard,G., Aramayo,R., Basta,T., Haring,M., Peng,X., Brugger,K. et al. (2008a)
Structure of the Acidianus filamentous virus 3 and comparative genomics of related
archaeal lipothrixviruses. J Virol 82: 371-381.
Vestergaard,G., Haring,M., Peng,X., Rachel,R., Garrett,R.A., and Prangishvili,D. (2005) A
novel rudivirus, ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology 336:
83-92.
Vestergaard,G., Shah,S.A., Bize,A., Reitberger,W., Reuter,M., Phan,H. et al. (2008b)
Stygiolobus rod-shaped virus and the interplay of crenarchaeal rudiviruses with the CRISPR
antiviral system. J Bacteriol 190: 6837-6845.
Woese,C.R., and Fox,G.E. (1977) Phylogenetic structure of the prokaryotic domain: the
primary kingdoms. Proc Natl Acad Sci U S A 74: 5088-5090.
Woese,C.R., Kandler,O., and Wheelis,M.L. (1990) Towards a natural system of organisms:
proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A 87:
4576-4579.
Worthington,P., Hoang,V., Perez-Pomares,F., and Blum,P. (2003) Targeted disruption of
the alpha-amylase gene in the hyperthermophilic archaeon Sulfolobus solfataricus. J
Bacteriol 185: 482-488.
Wu,Y. (2012) Unwinding and rewinding: double faces of helicase? J Nucleic Acids 2012:
140601.
Wurtzel,O., Sapra,R., Chen,F., Zhu,Y., Simmons,B.A., and Sorek,R. (2010) A single-base
resolution map of an archaeal transcriptome. Genome Res 20: 133-141.
Xiang,X., Dong,X., and Huang,L. (2003) Sulfolobus tengchongensis sp. nov., a novel
thermoacidophilic archaeon isolated from a hot spring in Tengchong, China. Extremophiles
7: 493-498.
Xue,B., Dunbrack,R.L., Williams,R.W., Dunker,A.K., and Uversky,V.N. (2010) PONDR-FIT: a
meta-predictor of intrinsically disordered amino acids. Biochim Biophys Acta 1804:
996-1010.
Yeats,S., McWilliam,P., and Zillig,W. (1982) A plasmid in the archaebacterium Sulfolobus
acidocaldarius. EMBO J 1: 1035-1038.
Yusufzai,T., and Kadonaga,J.T. (2008) HARP is an ATP-driven annealing helicase. Science
322: 748-750.
Yusufzai,T., and Kadonaga,J.T. (2010) Annealing helicase 2 (AH2), a DNA-rewinding motor
with an HNH motif. Proc Natl Acad Sci U S A 107: 20970-20973.
Zablen,L.B., Kissil,M.S., Woese,C.R., and Buetow,D.E. (1975) Phylogenetic origin of the
chloroplast and prokaryotic nature of its ribosomal RNA. Proc Natl Acad Sci U S A 72:
2418-2422.
Zhang,J., Kasciukovic,T., and White,M.F. (2012) The CRISPR associated protein Cas4 Is a 5'
to 3' DNA exonuclease with an iron-sulfur cluster. PLoS One 7: e47232.
Zillig,W., Stetter,K.O., Wunderl,S., Schulz,W., Priess,H., and Scholz,I. (1980) The
Sulfolobus-"Caldariella" group: taxonomy on the basis of the structure of
DNA-dependent RNA polymerases. Arch Microbiol 125: 259-269.
Zillig,W., Kletzin,A., Schleper,C., Holz,I., Janekovic,D., Hain,J. et al. (1994) Screening for
Sulfolobales, their plasmids and their viruses in Icelandic solfataras. Syst Appl
Microbiol 16: 609-628.
Zuckerkandl,E., and Pauling,L. (1965) Molecules as documents of evolutionary history. J
Theor Biol 8: 357-366.
Manuscript I
Single-stranded DNA binding, annealing and nuclease activities encoded by a
conserved archaeal viral gene cluster
Yang Guo, Birthe B. Kragelund, Malcolm F. White and Xu Peng
Submitted to Nucleic Acids Research
Single-stranded DNA binding, annealing and nuclease activities
encoded by a conserved archaeal viral gene cluster
Yang Guo1, Birthe B. Kragelund1, Malcolm F. White2 and Xu Peng1*
1 Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 CPH N. Denmark
2 Biomedical Sciences Research Complex, University of St. Andrews, North Haugh, St. Andrews,
Fife, UK
* To whom correspondence should be addressed. Tel: +45 35322018; Fax: +45 35322128; Email:
ABSTRACT
Single-stranded DNA (ssDNA) occurs in various cellular and viral processes of DNA
metabolism, including DNA replication, homologous recombination and repair pathways.
Here, we describe a novel type of ssDNA binding protein, a novel ssDNA annealing protein
and a ssDNA nuclease encoded by an operon comprised of ORF131b (gp17), ORF436
(gp18) and ORF207 (gp19), respectively, of Sulfolobus islandicus rod-shaped virus 2
(SIRV2). Rather than comprising one of the canonical ssDNA binding domains, SIRV2 gp17
forms a dimer with each monomer containing two α-helices and three β-strands.
Mutagenesis of a few conserved basic residues distributed in two adjacent loops within each
monomer suggested a U-shaped binding path for ssDNA. Although predicted previously as
a helicase, the recombinant gp18 showed a ssDNA annealing activity often associated with
helicases and recombinases. Moreover, gp19 was shown to possess a 5´ to 3´ ssDNA
exonuclease activity, in addition to the previously demonstrated ssDNA endonuclease
activity. Further, in vitro pull-down assays demonstrated interactions between gp17 and gp18
and between gp18 and gp19 with the former being mediated by the intrinsically disordered
C-terminus of gp17. The strand-displacement replication mode proposed previously for
rudiviruses and the close interaction between the ssDNA binding, annealing and nuclease
activities strongly point to a role of the gene operon in genome maturation and/or DNA
recombination which may function in viral DNA replication/repair.
INTRODUCTION
Viruses that infect hyperthermophilic archaea, members of the third domain of life, are unusual
in their morphology, genome structure and proteins. In the last decade, a major effort has
been made to study the archaeal viruses, which have attracted intense interest as
model systems to understand the biochemistry and molecular biology required for life at
high temperatures. Based on their morphological and genomic characteristics, 15 viral
families have been classified and about 100 viral isolates described, all with either linear or
circular double-stranded (ds) DNA, except two species possessing a single-stranded (ss)
DNA genome (1-3).
Sulfolobus islandicus rod-shaped virus 2 (SIRV2) (4), together with SIRV1 (5),
Stygiolobus rod-shaped virus (SRV) (6), Acidianus rod-shaped virus 1 (ARV1) (7) and
Sulfolobales Mexican rudivirus 1 (SMRV1) (8), belongs to the family Rudiviridae. The
rudiviruses have linear dsDNA genomes (24.6 to 35.8 kbp) with inverted terminal repeats,
and the two strands are covalently linked at the genomic termini. Recently SIRV2 has been
the focus of genomic, structural, genetic and transcriptional studies, which have provided
important insights into its entry, gene regulation and unique release mechanisms (5;9-13).
Even so, as with the vast majority of other archaeal viruses, whose genes show little sequence
similarity to entries in public databases (14), the functions of many SIRV2 proteins remain to be
identified.
Among the 54 ORFs encoded in the genome of SIRV2, only one fifth had been
experimentally assigned a function (15). The virus is coated with one major capsid protein
gp26 and three minor structural proteins gp33, gp38 and gp39 (16). Together with the genes
encoding the viral structural proteins, gp49, encoding the component of the pyramidal
egress structure (11), is repressed by the transcription regulator gp15 (SvtR) during the
early virus infection cycle (17;18). gp16, belonging to the replication initiator (Rep) family
and nicking one strand of the viral genomic termini, was proposed to be involved in the
initiation of DNA replication (19). The Holliday junction resolving enzyme (Hjr) gp35 was
suggested to resolve the concatemers of the replicative intermediates, producing
monomeric copies with linear hairpin ends (20). Taken together, the functions of many
SIRV2 genes remain unknown and the knowledge of its biology and basic molecular
processes such as DNA replication, recombination and maturation is still limited.
In this work we studied a SIRV2 operon containing three genes, gp17, gp18 and
gp19, that are highly conserved in rudiviruses and filamentous viruses. gp19 was previously
shown to be an endonuclease that specifically cuts ssDNA (21). We demonstrate here that
gp17 is a ssDNA binding protein and interacts with gp18, while the latter stimulates
annealing of complementary oligonucleotides. In addition to the previously identified ssDNA
endonuclease activity, we detected a 5' to 3' ssDNA exonuclease activity of gp19, which
also interacts with gp18. Based on these data, the possible functions of the operon are
discussed.
MATERIALS AND METHODS
Cloning, expression and purification of C-terminally His-tagged recombinant proteins
The coding sequences of gp17, gp18 and gp19 were amplified by PCR from the SIRV2 genome
using the primers listed in Table S1, digested with NdeI and XhoI and subsequently inserted
into a similarly digested pET-30(a) (Novagen) expression vector. To introduce single or
multiple amino acid (aa) mutations into the recombinant gp17, fusion PCR using 4 primers
(Table S1) was performed for each mutant.
E. coli BL21 CodonPlus cells were transformed with the individual plasmid constructs, and
a single transformant colony was inoculated into LB medium containing 30 µg/ml kanamycin
and 25 µg/ml chloramphenicol. At an optical density (OD600) of 0.4, IPTG (0.5 mM) was
added to the culture and the cells were cultured for a further 12 hours at 25°C. The harvested cell
pellet was resuspended in lysis buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl, 1 mM EDTA,
1% Triton X-100 and 1 mM PMSF) and lysed by sonication. The lysate was cleared by
centrifugation at 10,000 × g for 20 minutes and the supernatant was then incubated with
Ni-NTA-agarose beads (Qiagen, Germany) for 1 h at room temperature. The beads were
washed three times with washing buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl, 40 mM
imidazole) and the protein was eluted with elution buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl,
250 mM imidazole). The purity of the proteins was evaluated by SDS-PAGE and the gel
(12.5%) was stained with PAGE blue (Sigma Aldrich, UK). In the case of gp18, a denaturing
and refolding method was applied (see below).
Cloning, expression and purification of N-terminally GST-tagged proteins
The wild-type gp17, gp17-I, its truncated mutants gp17-II (1-121 aa) and gp17-III (1-111 aa)
and gp19 were amplified by PCR using the primers listed in Table S1, and the PCR products
were digested with BamHI and XhoI and ligated into a BamHI- and XhoI-digested pGEX-6p-2
(GE Healthcare Life Science, Sweden) expression vector. The constructs were introduced
individually into BL21 CodonPlus cells. Transformed cells were grown in LB medium
containing 100 µg/ml ampicillin and 25 µg/ml chloramphenicol and induced with 0.5 mM
IPTG at an OD600 of 0.4. After 12 hours of incubation at 25°C, the cells were pelleted,
resuspended in lysis buffer (PBS buffer pH 8.0, 1 mM EDTA, 1% Triton X-100 and 1 mM
PMSF) and lysed by sonication. The supernatant was incubated with Glutathione
Sepharose 4 Fast Flow beads (GE Healthcare Life Science, Sweden) for 1 h at room
temperature. The beads were washed three times with PBS buffer, and the proteins remained
bound to the beads for subsequent pull-down assays with His-tagged proteins.
Refolding of His-tagged gp18 from inclusion bodies
The cell lysate was centrifuged at 10,000 × g for 20 min, and the inclusion bodies in the pellet
were solubilized in lysis buffer containing 8 M urea at room temperature for 1 h. The
denatured protein was purified on Ni-NTA-agarose beads following the same
procedure as described above for the other His-tagged proteins, except that 8 M urea was
included in both the washing and elution buffers. The 2 ml of eluted protein obtained from 1 L of
E. coli culture was dialysed first against 200 ml of 0.5 M L-arginine buffer for 2 h and then against
2 L of PBS buffer for 3 h; the latter dialysis was repeated three times.
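The dilution achieved by this dialysis scheme can be estimated with a short calculation. The sketch below is illustrative only: it assumes that each step reaches full equilibrium, that the sample volume stays at 2 ml, and that "repeated three times" means three PBS steps in total. The function name `residual_concentration` is hypothetical.

```python
def residual_concentration(c0_molar, sample_ml, bath_volumes_ml):
    """Solute concentration left in the sample after successive dialysis steps.

    Assumes full equilibration in every step and a constant sample volume,
    so each step dilutes by sample / (sample + bath).
    """
    c = c0_molar
    for bath_ml in bath_volumes_ml:
        c *= sample_ml / (sample_ml + bath_ml)
    return c


# 2 ml eluate in 8 M urea: one 200 ml L-arginine step, then 2 L PBS steps.
# Assumption: three PBS steps in total.
steps_ml = [200.0, 2000.0, 2000.0, 2000.0]
urea_after_arginine = residual_concentration(8.0, 2.0, steps_ml[:1])
urea_final = residual_concentration(8.0, 2.0, steps_ml)

print(f"after arginine step: {urea_after_arginine:.4f} M")
print(f"after all steps:     {urea_final:.2e} M")
```

Under these assumptions most of the urea is removed in the first step, and the PBS steps mainly exchange the arginine buffer.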
Preparation of substrates for DNA mobility shift and nuclease activity assays
To prepare DNA substrates for DNA mobility shift and exonuclease assays, oligonucleotide
1 (Table S2) was annealed to a series of fully or partially complementary ssDNA
oligonucleotides (Table S2) to generate either a 23-nucleotide (23-nt) 5´-ssDNA tailed
duplex (substrate B), a 23-nt 3´-ssDNA tailed duplex (substrate C), or a blunt-ended duplex
(substrate A) (Table 1). Oligonucleotide 4 (Table S2) was annealed to its partially
complementary ssDNA oligonucleotide 5 (Table S2) to generate a Y-shaped dsDNA
(substrate D). The annealing mixture was heated at 95°C for 2 min and then slowly cooled
to room temperature (25°C) over a period of 1 h. M13mp18 DNA (New England Biolabs,
USA) was chosen as the circular single-stranded DNA substrate for the endonuclease
assay.
Gel mobility shift analysis
ssDNA (oligo 4) or dsDNA (substrates A, B and D in Table 1) at 50 nM was incubated for
20 min at 50°C with increasing concentrations of gp17 or its mutant variants (0-2000 nM) in
20 µl DNA-binding buffer (10 mM Tris-HCl, pH 8.0, 100 mM KCl, 2 mM DTT, 10% [vol/vol]
glycerol). The samples were loaded onto a 12% acrylamide gel and electrophoresed in 0.5 ×
TBE buffer for 1 h 50 min. Following electrophoresis, the gels were stained with SYBR®
Gold (Life Technologies) and scanned with a Typhoon FLA 7000 (GE Healthcare Life Science).
The bands were quantified using ImageQuant TL (GE Healthcare Life Science).
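One common way to turn quantified band intensities from such a titration into an apparent dissociation constant is to fit a single-site binding isotherm. The sketch below is not the analysis used in this study. It assumes a simple 1:1 model with protein in excess over DNA, which is only approximate at the lowest protein concentrations given the 50 nM DNA used here. The titration values are synthetic, and the names `fraction_bound` and `fit_kd` are hypothetical.

```python
def fraction_bound(protein_nM, kd_nM):
    """Single-site isotherm; assumes free protein ~ total protein."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, observed, kd_grid):
    """Return the grid value of Kd with the smallest sum of squared residuals."""
    def sse(kd):
        return sum((fraction_bound(p, kd) - f) ** 2
                   for p, f in zip(protein_nM, observed))
    return min(kd_grid, key=sse)


# Synthetic, noise-free titration (NOT data from this study).
# In practice: observed = shifted / (shifted + free) for each lane.
protein = [0, 50, 100, 250, 500, 1000, 2000]
observed = [fraction_bound(p, 300.0) for p in protein]

kd_grid = [float(k) for k in range(10, 2001, 10)]
kd_est = fit_kd(protein, observed, kd_grid)
print(f"apparent Kd ~ {kd_est:.0f} nM")
```

A grid search keeps the example dependency-free; a nonlinear least-squares routine would be the usual choice for real data.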
Gel-filtration chromatography
Gel-filtration chromatography was carried out using an ÄKTA–FPLC system. Briefly, purified
proteins in PBS buffer were applied individually to a Superdex 200 HR 10/300 GL column
(GE Healthcare Bio- Sciences, America) equilibrated with the same buffer. The column was
46
operated at a flow rate of 0.5 ml/min, and 0.5 ml fractions were collected. The proteins were
detected by measuring the absorbance at 280 nm, 254 nm and 215 nm. The column was
calibrated with proteins of known molecular weight: bovine thyroglobulin (669 kDa),
horse spleen apoferritin (443 kDa), sweet potato β-amylase (200 kDa) and yeast alcohol
dehydrogenase (150 kDa).
Circular dichroism (CD) spectroscopy
A far-UV CD spectrum was recorded on a Jasco 810 spectropolarimeter over a wavelength
range from 260 to 190 nm, with a scan rate of 20 nm/min, 15 accumulations and a 2 s response
time, at room temperature. Samples were recorded in a quartz cuvette with a 1 mm path
length. A corresponding spectrum of the buffer was recorded and subtracted, and the
resulting spectrum was smoothed (Jasco software). The spectrum was recorded with 3.85 µM
protein in PBS, pH 8.0, and the ellipticity is given as mean residue ellipticity, [θ]MRW, in
deg·cm²·dmol⁻¹. A temperature denaturation profile was recorded at 220 nm by heating the
sample from 25°C to 95°C at a rate of 1°C/min, and apparent melting temperatures (Tmapp) were
derived by fitting the data to the following equation:
[equation not recoverable from the extracted text]
Where ΔH(Tm) is the enthalpy change at Tm, and ΔS(Tm) the entropy change at Tm.
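To illustrate how ΔH(Tm) and ΔS(Tm) enter a fit of this kind, the following minimal Python sketch assumes a simple two-state unfolding model with no heat-capacity change and flat baselines. This model is an assumption made for illustration and is not necessarily the equation used in this work. All function names are hypothetical, and the enthalpy is entered as a positive value for unfolding, which is a sign convention chosen for the sketch.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def entropy_at_tm(delta_h_kj, tm_kelvin):
    """Entropy change at Tm implied by dG(Tm) = 0, in kJ/(mol*K)."""
    return delta_h_kj / tm_kelvin


def fraction_unfolded(temp_kelvin, delta_h_kj, tm_kelvin):
    """Fraction of unfolded protein in the assumed two-state model."""
    delta_s = entropy_at_tm(delta_h_kj, tm_kelvin)
    # Assumed form: dG(T) = dH(Tm) - T * dS(Tm), i.e. no heat-capacity term.
    delta_g = delta_h_kj - temp_kelvin * delta_s
    k_unfold = math.exp(-delta_g / (R_KJ * temp_kelvin))
    return k_unfold / (1.0 + k_unfold)


def cd_signal(temp_kelvin, delta_h_kj, tm_kelvin, y_folded, y_unfolded):
    """Population-weighted CD signal with flat (temperature-independent) baselines."""
    f_u = fraction_unfolded(temp_kelvin, delta_h_kj, tm_kelvin)
    return (1.0 - f_u) * y_folded + f_u * y_unfolded


if __name__ == "__main__":
    tm = 91.0 + 273.15  # melting temperature in kelvin
    print(round(entropy_at_tm(469.0, tm), 2))  # about 1.29 kJ/(mol*K)
    print(fraction_unfolded(tm, 469.0, tm))    # half unfolded at Tm
```

Under this assumed model only two of the three quantities Tm, ΔH(Tm) and ΔS(Tm) are independent, because ΔG(Tm) = 0 fixes ΔS(Tm) = ΔH(Tm)/Tm.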
ssDNA annealing activity
For the ssDNA annealing assay, the [32P] end-labelled 57-nt oligo 4 (1 nM) (Table S2) was
incubated in annealing buffer (30 mM Tris-HCl, pH 7.5, 5 mM MgCl2, 75 mM NaCl, 50 mM
KCl and 1 mM DTT) with increasing amounts of gp18 at 25°C for 5 min. The reaction was
initiated by adding the unlabelled complementary oligonucleotide 5 (1.2 nM) (Table S2) and
incubated at 50°C for 15 min. The reaction was then stopped by the addition of 20 nM cold
oligo 4, 0.5% [wt/vol] SDS and 1 mg/ml proteinase K. Deproteinization was carried out at
25°C for 10 min, and the samples were loaded on a 10% native polyacrylamide gel and run
at 100 V for 1 h 20 min in 0.5 × TBE buffer. Following electrophoresis, gels were dried and
exposed to X-ray film for documentation. DNA was quantified using ImageQuant TL (GE
Healthcare Life Sciences).
Nuclease activity assays
The nuclease activity assays (20 µl) were performed by mixing 0.08 µM DNA duplex
(substrate A, B or C) or 0.05 µM M13mp18 DNA in reaction buffer containing 20 mM
Tris-HCl, pH 7.5, 50 mM NaCl, 5 mM MgCl2, 1 mM DTT and 10% glycerol. Reactions were
initiated by the addition of 0.5 µM SIRV2 gp19, and the mixtures were incubated at 50°C for
the indicated length of time. Time course analyses were carried out by scaling up the
reaction volume to 150 µl and withdrawing 20 µl aliquots at the indicated times. Reactions
were terminated by the addition of 6 µl stop solution (0.1% [wt/vol] bromophenol blue, 0.1%
[wt/vol] xylene cyanol, 8% [vol/vol] glycerol, 1% [wt/vol] SDS, 50 mM EDTA and 2 mg/ml
proteinase K). As a negative control, the substrates were incubated in the reaction mix in the
absence of gp19. Samples for the exonuclease assay were analyzed by 12%
polyacrylamide gel electrophoresis in 0.5 × TBE buffer and stained with SYBR® Gold (Life
Technologies). Samples for the endonuclease assay were resolved in a 0.7% agarose gel.
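The volumes given above can be checked with a short calculation. The following Python sketch is illustrative only: it computes the final concentrations in a stopped sample (a 20 µl aliquot plus 6 µl stop solution) and the number of whole aliquots available from a 150 µl time-course reaction. The helper name is hypothetical. The reading that EDTA stops the reaction by chelating Mg2+ is an assumption based on standard practice and is not stated in the protocol.

```python
def final_concentration(stock, stock_volume_ul, total_volume_ul):
    """Concentration after dilution, in the same unit as the stock value."""
    return stock * stock_volume_ul / total_volume_ul


ALIQUOT_UL = 20.0   # reaction aliquot withdrawn per time point
STOP_UL = 6.0       # stop solution added to each aliquot
TOTAL_UL = ALIQUOT_UL + STOP_UL

# Components contributed by the stop solution (50 mM EDTA, 1% SDS).
edta_mm = final_concentration(50.0, STOP_UL, TOTAL_UL)
sds_percent = final_concentration(1.0, STOP_UL, TOTAL_UL)

# Component contributed by the reaction buffer (5 mM MgCl2).
mg_mm = final_concentration(5.0, ALIQUOT_UL, TOTAL_UL)

# Whole 20 µl aliquots that fit in a 150 µl scaled-up reaction.
max_aliquots = int(150.0 // ALIQUOT_UL)

if __name__ == "__main__":
    print(f"EDTA {edta_mm:.2f} mM, Mg2+ {mg_mm:.2f} mM, SDS {sds_percent:.2f}%")
    print(f"up to {max_aliquots} time points per reaction")
```

In the stopped sample EDTA (about 11.5 mM) is in roughly threefold molar excess over Mg2+ (about 3.8 mM).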
Western blot analysis
Proteins were separated in a 12.5% SDS polyacrylamide gel and transferred with transfer
buffer (39 mM glycine, 50 mM Tris pH 8.7, 0.04% SDS, 20% methanol) onto a nitrocellulose
membrane (Whatman, Germany) at 70 mA for 1 h 15 min. The membrane was blocked for 1
h with PBST buffer (PBS buffer containing 0.05% Tween® 20) containing 8% milk powder
and then incubated with anti-His antibody (Qiagen, Germany; 1:2000 dilution in PBS buffer
containing 3% BSA). The membrane was further incubated for 1 h with a
peroxidase-coupled secondary antibody (anti-mouse IgG, 1:10 000; Sigma-Aldrich, USA),
followed by three washes with PBS buffer. An alkaline phosphatase substrate (Sigma-Aldrich, USA)
was used for detection according to the instructions provided.
In vitro detection of protein-protein interactions
GST-tagged gp17-I, gp17-II (aa 1-121), gp17-III (aa 1-111) or gp19 bound to
GSH beads (50 µg) was incubated at 25°C for 1 h with His-tagged prey proteins (90 µg) in
PBS buffer containing 0.1 µg/ml BSA and 0.1% Triton X-100. The mixture was centrifuged for 3 min at
3000 × g and the supernatant was stored for later analysis. The beads were washed several
times with PBS buffer to remove unbound components and then heated in SDS loading
buffer at 98°C for 10 min before SDS-PAGE in a 12.5% gel. The gel was stained with PAGE
blue (Sigma Aldrich, UK) and the presence of His-tagged interaction partners was detected
by Western blot using anti-His antibody, as described above.
Identification of Sulfolobus proteins interacting with gp17
The gp17 fragment amplified by PCR (Table S1) was inserted between the NdeI and NotI
restriction sites of the Sulfolobus/E. coli shuttle vector pEXA2 (22), allowing the expression
of His-tagged gp17 under the control of an arabinose promoter. The constructed plasmid and
the empty plasmid pEXA2 were then electroporated individually into
uracil-deficient competent cells (23). Single colonies of the transformants were inoculated into test
tubes containing 5 ml SCV (basal medium supplemented with 0.2% sucrose, 0.2%
casamino acids and 1% vitamin solution) (23), and incubated in an Innova 3100 oil-bath
shaker. Large-scale culturing was performed in ACV medium (in which 0.2% D-arabinose was
substituted for sucrose) in long-necked Erlenmeyer flasks.
Purification of the His-tagged gp17 from the transformed Sulfolobus cells was
performed as described above, and the eluted proteins were evaluated by 12.5%
SDS-PAGE and stained with PAGE blue (Sigma Aldrich, UK). Protein bands present
exclusively in the gp17-containing transformant were sliced from the gel and subjected to
MALDI-TOF analysis (Alphalyse A/S, Odense, Denmark).
RESULTS
Bioinformatic analysis revealed high conservation of SIRV2 gp17, gp18 and gp19 in
archaeal rudiviruses and filamentous viruses
gp17, gp18 and gp19 occur as a gene cluster in the genome of SIRV2 and were shown
previously to be transcribed from a single promoter generating a polycistronic transcript (10).
The operon organization suggests that the three genes are functionally related. However, except
for gp19, which was shown experimentally to possess ssDNA endonuclease activity (21),
very little is known about the functions of gp17 and gp18. The crystal structure of the gp17
homolog encoded by SIRV1 has been solved (24), but it has provided no functional insight.
Although a weak similarity between a limited part of gp18 and bacterial ATPase domains of
Lon proteases was described (15), a tertiary structure prediction using the threading
program Phyre2 (25) suggested a different function. About two thirds of the gp18 sequence
(residues 140 - 430) matched with high confidence (98.8%) to MCM homolog 2 (c3f8tA)
from Methanopyrus kandleri, suggesting that gp18 might be a hexameric helicase.
The three genes are conserved in all rudiviruses and in one of the filamentous
viruses, AFV1 (Fig. 1). The amino acid (aa) sequence similarities range between 36 and 96%
for gp17 homologs, 51 and 100% for gp18 homologs and 51 and 98% for gp19 homologs.
Although no significant sequence similarity was detected between SIRV2 gp19 and the
other filamentous viral genomes, a putative nuclease is clearly encoded by the latter (except
AFV2) and, like SIRV2 gp19, it belongs to the Cas4 superfamily (6;26). Interestingly, a
highly conserved gene encoding a putative helicase is found upstream of the putative
nuclease gene in all filamentous viral genomes. Upstream of the putative helicase gene is
another highly conserved gene encoding a 79 aa hypothetical protein showing no sequence
similarity to SIRV2 gp17 (Fig. 1). Thus, homologs or analogs of SIRV2 gp18 and gp19 are present
in almost all rudiviral and filamentous viral genomes, whereas it is not clear whether this small
gene is functionally related to SIRV2 gp17. Genome comparison of all the rudiviruses and
filamentous viruses using the Mutagen program (27) revealed that SIRV2 gp17, gp18 and
gp19 constitute the only conserved gene cluster in the archaeal linear viruses (Fig. S1 and
Table S3). Thus, the three genes appear important for both viral families.
gp17 is a single-stranded DNA binding protein
Structural features of SIRV2 gp17. To gain insights into the functions of the gene cluster, we
first examined the structure of gp17, which forms a dimer in the crystals (Protein Data Bank
[PDB] identifier 2X5T) (24) (Fig. S2A). Although it does not show obvious structural similarity to
any known domain in the PDB, the electrostatics and the shape of
the molecule suggest a DNA binding activity. It is dominated by basic (blue) residues on the
concave side (Fig. S2C) and by acidic (red) residues on the convex side (Fig. S2D). The
concave side is well suited to act as a DNA-straddling pocket. Two arginine side chains point down in
the middle of the arch. The dimer binds a sulphate group, which may reflect its affinity
towards phosphates.
The total length of gp17 is 131 residues, and structural information is missing for 38
residues at the C-terminus. In line with this lack of structural information, analysis of the
sequence with protein disorder predictors revealed a potential intrinsically disordered
C-terminus of about 35 residues (28-30) (Fig. S3). Analysis of gp17 homologs encoded by
other rudiviruses and AFV1 revealed an intrinsically disordered C-terminus of similar size (data
not shown), suggesting that disorder in this domain is important for the function of the
protein.
ssDNA binding activity of gp17. gp17 was amplified from the SIRV2 genome and
cloned into the E. coli vector pET30a. The C-terminally His-tagged protein was purified from
E. coli to homogeneity (Fig. S4) and tested for binding to different DNA substrates.
As shown in Fig. 2A, gp17 bound to ssDNA and to dsDNA containing
single or double flaps, whereas no binding to the blunt-ended dsDNA was detected over the
same range of protein concentrations. The same result was obtained when ssDNA and
blunt-ended dsDNA were mixed at equimolar concentrations in the reaction, where
almost all ssDNA, but little or no dsDNA, was shifted in the presence of 130 nM gp17
(Fig. 2B). At higher concentrations of gp17, when no free ssDNA was available, the protein
bound to the dsDNA, albeit with a much lower affinity. At a gp17 concentration of
3.9 µM, a significant amount of the dsDNA still remained unshifted, indicating that the affinity
of gp17 towards ssDNA is at least 30 times higher than that towards dsDNA.
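The 30-fold estimate follows from the ratio of the two protein concentrations quoted above. A minimal Python sketch of the arithmetic is given here. The function name is hypothetical, and the comparison is only a lower bound because the dsDNA was not fully shifted at the highest concentration tested.

```python
def fold_difference(high_conc_nm, low_conc_nm):
    """Ratio between two protein concentrations given in the same unit (nM)."""
    return high_conc_nm / low_conc_nm


SSDNA_SHIFT_NM = 130.0          # gp17 concentration shifting almost all ssDNA
DSDNA_TESTED_NM = 3.9 * 1000.0  # 3.9 µM, at which dsDNA is still partly unshifted

# Lower bound on the ssDNA/dsDNA affinity ratio.
ratio = fold_difference(DSDNA_TESTED_NM, SSDNA_SHIFT_NM)

if __name__ == "__main__":
    print(f"affinity ratio of at least {ratio:.0f}-fold")
```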
A few positively charged residues forming a U-shaped binding channel on the gp17
dimer are crucial for its ssDNA binding activity
To identify essential elements of the ssDNA-binding domain of gp17, we first aligned the
sequences of gp17 homologs and identified 3 fully conserved positively charged residues,
R60, K61 and K82 (Fig. S5). R60 and K61 are located in a loop at the central cleft of the
concave side, while K82 is found on the surface of the convex side (Fig. 3A). The three
residues were mutated individually to alanine and the binding affinity of the mutant
proteins for ssDNA was compared to that of the wild-type (WT) gp17. Whereas the K82A variant
exhibited a binding affinity similar to that of WT gp17, 2-fold and 5-fold reductions in
binding affinity were observed for the R60A and the K61A variants, respectively.
Interestingly, simultaneous mutation of R60 and K61 to alanine almost completely abolished
ssDNA binding activity (Fig. 3B).
Four other positively charged residues, R24, K27, K29 and R33, are relatively
conserved in the rudiviruses (Fig. S5), and the residue in AFV1 ORF135 corresponding to
SIRV2 R33 (K32) is also positively charged. Therefore, the four residues were mutated to
alanine either individually or simultaneously. While the double mutant R24K27 and the triple
mutant R24K27K29 showed little or only a mild reduction in binding affinity (Fig. S6),
the R33 mutant showed the most pronounced effect among the single mutants tested,
with an 8-fold drop in binding activity (Fig. 3B and 3C).
The above experiments demonstrated that R33, R60 and K61 are important for
ssDNA binding, whereas R24, K27, K29 and K82 are less important or dispensable. Examining the
location and orientation of the residues shows that the side chains of the former
residues point to the central cleft and those of the latter residues point to the outer surface
(Fig. 3A). It appears that the residues important for DNA binding (R33, R60, K61) form a
positively charged, U-shaped structure in the gp17 dimer, which straddles the ssDNA and
causes it to bend (Fig. 3D).
Within the U-shaped path another relatively conserved residue, H54, has a positive
charge (Fig. S5) and was thus mutated to alanine to test its possible contribution to ssDNA
binding. In line with its charge, conservation and location in the protein, H54 also appears
important for ssDNA binding, as the H54A mutant showed a 4-fold drop in binding
activity compared with the WT protein (Fig. 3B, 3C and 3D).
Purification, refolding and stability of gp18
To characterize gp18 biochemically, sufficient soluble protein was needed. As
gp18 could not be cloned into Sulfolobus (see below) owing to its toxicity, and as recombinant
expression in E. coli resulted in the formation of inclusion bodies, a denaturation and
refolding strategy was employed to purify the His-tagged gp18 from E. coli. Following cell
lysis, the inclusion bodies were pelleted and dissolved in 8 M urea (Fig. 4A lane 3), and
gp18-His was purified using Ni-NTA-agarose beads. The denatured gp18-His was then
refolded in L-arginine buffer (31).
The protein appeared to be properly refolded, as it remained soluble after 20
min of incubation at 70°C (Fig. 4A lane 4). To assess the fold integrity of gp18 after
recombinant production and refolding, the final preparation was subjected to structural
analysis by CD spectroscopy. The far-UV CD spectrum recorded at room temperature
revealed distinct negative molar ellipticity with minima at 218 and 208 nm, strongly
indicating that the protein is folded and contains both α-helices and β-strands (Fig. 4B).
Additionally, temperature denaturation monitored at 220 nm showed that the protein was
stable, with two cooperative transitions, one with an apparent melting temperature (Tmapp) of
~60°C and a major, highly cooperative transition with a Tmapp of 91°C, ΔH(Tm) = -469 ± 15
kJ/mol and ΔS(Tm) = -1.29 ± 0.04 kJ/(mol·K) (Fig. 4C). Because of the extreme stability of the
protein, the post-transition baseline was not fully resolved under the experimental conditions,
and hence the thermodynamic parameters are approximations. Nevertheless, we conclude that
recombinantly produced gp18 was highly stable and cooperatively folded, and contained
both α-helices and β-strands.
The oligomerization status of the refolded gp18 was analysed by gel filtration
chromatography, which yielded a broad peak with two “shoulders”
within a total elution volume of 23.6 ml at a flow rate of 0.5 ml/min (Fig. 4D). The elution volume
of the main peak (labelled 1 in Fig. 4D) was between 9.85 and 10.60 ml and those of the two
“shoulders” were 8.05 ml and 12.75 ml, respectively. Assuming that gp18 has a shape and
partial specific volume similar to those of the standard proteins, the molecular mass of the main
peak is estimated to be between 685 kDa and 458 kDa, and the two “shoulders” correspond to
the void volume (Vo) and 147 kDa, respectively, as calculated from a standard linear regression equation,
Kav = -0.2976(logMW) + 1.8388 (Fig. S7). Since the molecular weight of monomeric gp18 is
50.36 kDa, the protein was refolded as a series of oligomers ranging from trimers through nonamers to
dodecamers. The molecular mass at the apex of the main peak (560.6 kDa) falls between nonamers and
dodecamers, which might be the functional forms of gp18.
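The conversion between Kav, molecular mass and subunit number can be sketched as follows. This is a minimal Python illustration that uses the regression coefficients and the monomer mass quoted above. It assumes that the logarithm is base 10 and that MW is expressed in Da, neither of which is stated explicitly, and the function names are hypothetical.

```python
import math

SLOPE = -0.2976       # regression slope quoted in the text
INTERCEPT = 1.8388    # regression intercept quoted in the text
MONOMER_KDA = 50.36   # mass of monomeric gp18 in kDa


def kav_from_mw(mw_da):
    """Kav predicted by the calibration line for a mass in Da (assumed unit)."""
    return SLOPE * math.log10(mw_da) + INTERCEPT


def mw_from_kav(kav):
    """Invert the calibration line to obtain a mass in Da from Kav."""
    return 10 ** ((kav - INTERCEPT) / SLOPE)


def subunits(mw_kda, monomer_kda=MONOMER_KDA):
    """Apparent number of subunits for an estimated mass in kDa."""
    return mw_kda / monomer_kda


if __name__ == "__main__":
    for mass_kda in (147.0, 458.0, 560.6, 685.0):
        kav = kav_from_mw(mass_kda * 1000.0)
        print(f"{mass_kda:6.1f} kDa  Kav={kav:.3f}  subunits={subunits(mass_kda):.1f}")
```

With these assumptions the 147 kDa shoulder corresponds to about 2.9 subunits (a trimer) and the 560.6 kDa apex to about 11.1 subunits.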
gp18 stimulates the annealing of complementary oligonucleotides
Since structural prediction suggested that gp18 could be a hexameric helicase such as
MCM, an essential protein for the initiation and elongation phases of DNA replication (32),
we performed helicase assays. However, no helicase activity was detected in spite of
multiple trials with different nucleic acid substrates and varied experimental conditions with
different metal ions, NTPs, and temperatures (data not shown). Surprisingly, during the
helicase assays, we found that instead of unwinding dsDNA, gp18 seemed to be able to
increase the dsDNA yield from two complementary oligonucleotides.
To determine whether gp18 catalyzes ssDNA annealing, a [32P]-labelled 57-nt
oligonucleotide (oligo 4 in Table S2) was first mixed with gp18. The annealing reactions
were initiated by the addition of the complementary strand (oligo 5) and stopped by a
20-fold excess of unlabelled oligo 4. After deproteinization, the reaction products were
resolved on a native PAGE gel. gp18-mediated annealing was dependent upon protein
concentration, with the reaction being most efficient at 200 nM gp18 under the
experimental conditions (Fig. 5A, lane 5), which is also consistent with the oligomeric structure of the
protein. Moreover, the efficiency of annealing was unchanged when ATP was omitted
from the reaction (Fig. 5A, lane 6), suggesting that ATP hydrolysis is not required for the
catalyzed process.
As shown in the left panel of Fig. 5B, spontaneous annealing between the two
oligonucleotides occurred slowly in a time-dependent manner, with only 40% of the ssDNA
annealed after 4 min of incubation (Fig. 5C). The annealing process was drastically
accelerated by gp18 (right panel of Fig. 5B): the oligonucleotides were almost completely
annealed into the slow-migrating dsDNA after 4 min of incubation in the presence of
gp18 (Fig. 5B and 5C).
gp19 demonstrates both ssDNA endonuclease and 5´-3´ exonuclease activities
Although previously shown to be an ssDNA endonuclease (21), gp19 shares sequence
similarity with different CRISPR-associated Cas4 proteins (26), which possess
metal-dependent endonuclease and 5´→3´ exonuclease activities against ssDNA (33). We
therefore examined the possible exonuclease activity of gp19 using DNA substrates of
different structures. As shown in Fig. 6A, the migration of the blunt-ended duplex DNA did not
change upon the addition of gp19, indicating that it is not a substrate of gp19. While the
3´-flap duplex DNA remained unchanged, like the blunt-ended DNA, the 5´-flap duplex DNA
was cleaved, with the final product having the same size as the blunt-ended DNA. It
appeared that gp19 initiated cleavage from the 5´ single-stranded end and stopped at the
junction between single-stranded and double-stranded DNA (Fig. 6B). This indicates that gp19 has
5´→3´ ssDNA exonuclease activity.
To confirm the ssDNA endonuclease activity of gp19, the circular ssDNA M13mp18
was tested in the same reaction buffer. As shown in Fig. 6C, incubation with gp19 led
to slow degradation of the circular ssDNA. These results confirm that SIRV2 gp19
possesses both 5´→3´ exonuclease activity and endonuclease activity against ssDNA.
Interactions between gp17 and gp18 and between gp18 and gp19
Given that gp17, gp18 and gp19 all work on the same type of substrate, ssDNA, it appeared
possible that the three proteins interact with one another. Since the removal of the last 10
residues showed little effect on the DNA binding activity of gp17 (Fig. S6), the intrinsically
disordered C-terminus is possibly involved in other functions such as protein-protein
interactions as demonstrated for the disordered C-terminus of bacterial SSB proteins (34).
Therefore, we first tested its possible interactions with gp18 and gp19.
gp17 and its C-terminally truncated variants were expressed as GST fusion proteins
and purified separately (Fig. 7A). Individual GST fusion proteins immobilized on GSH beads
were incubated with His-tagged gp18. After centrifugation, the beads were
washed and boiled in SDS buffer before loading on an SDS gel for Western blotting. Western
blotting using an anti-His antibody revealed the presence of gp18 on the GSH beads with
immobilized wild-type gp17 protein, indicative of an interaction between gp17 and gp18 (Fig.
7B). However, no interaction was detected between gp18 and the two gp17 variants with 10
and 20 C-terminal residues removed, respectively (Fig. 7C). These results demonstrate
that gp17 interacts with gp18 and that the C-terminal disordered domain of gp17 is essential for
the interaction.
The same method was applied to test possible interactions between GST-tagged
gp17 and His-tagged gp19, and between GST-tagged gp19 and His-tagged gp17, none of
which showed positive results (data not shown). While no interaction was detected between
gp17 and gp19, the GST-tagged gp19 retained a small amount of His-tagged gp18 on the
GSH beads (Fig. 7B), demonstrating a weak interaction between gp18 and gp19.
gp17 binds to two Sulfolobus host proteins
ssDNA binding proteins are essential for protecting ssDNA and recruiting specific
ssDNA-processing proteins. In bacteria, SSBs were found to interact with more than a
dozen different proteins involved in DNA replication, recombination and repair (34). To
identify possible interactions with other proteins, gp17 was cloned into the E. coli/Sulfolobus
shuttle vector pEXA2 under the control of an arabinose promoter (22) and expressed in
Sulfolobus. During Ni-NTA-agarose chromatography, the His-tagged gp17 co-purified with
two large proteins of about 60 and 150 kDa, respectively (Fig. 7D). The absence of the two
bands from proteins purified from control cells transformed with empty pEXA2 indicated
that they were pulled down specifically by gp17. Western blotting using an anti-His
antibody revealed a single band with the expected size of gp17-His; the two large bands
were thus not oligomers of gp17-His (Fig. 7E).
The two bands were sliced from the gel and identified by MALDI-TOF analysis. Band
1 contained a hypothetical protein encoded by SSO2277, with a theoretical mass of 57 kDa and
carrying an ATPase domain. Band 2 was identified as reverse gyrase from S. solfataricus
P2 (SSO0422), with a mass of 142 kDa (Table S4). The same procedure was repeated with
SIRV2-infected transformants and gave the same results (data not shown). No
viral proteins such as gp18 were co-purified with gp17, which could be due to the low
expression of gp18 demonstrated previously by microarray analysis (12).
We attempted to clone gp18 and gp19 individually into Sulfolobus using pEXA2 as the
cloning vector. Whereas gp18 proved highly toxic and could not be transformed into
Sulfolobus, overexpression of gp19 caused growth retardation of the transformant, and no
host proteins were identified to interact with gp19 (data not shown).
DISCUSSION
Single-stranded DNA binding proteins are ubiquitous across all three domains of life and are
found in many viruses, playing essential roles in genome maintenance, DNA replication,
recombination, repair and transcription. They coat and protect ssDNA intermediates and remove
their secondary structures. In addition, specific ssDNA-processing proteins
are recruited and coordinated by ssDNA binding proteins during DNA metabolism
(35-37). In spite of high sequence, structural and functional divergence, almost all classical
ssDNA binding proteins contain one of the following four structural topologies:
oligonucleotide/oligosaccharide/oligopeptide-binding (OB) folds, K homology (KH) domains,
RNA recognition motifs (RRMs), and whirly domains (38). Recently a group of
hyperthermophilic archaeal organisms were found to lack a classical ssDNA binding protein
and instead to harbour a distinct ssDNA binding protein termed ThermoDBP (39). The
ssDNA binding protein encoded by SIRV2 gp17 differs in structure from the classical ssDNA
binding proteins as well as from the ThermoDBPs, and thus constitutes a novel
non-canonical ssDNA binding protein.
Single-strand annealing activity has been detected in different proteins, including
some helicases and recombinases encoded by cellular organisms and by some viruses (reviewed
in (40)). In many of the helicases with annealing activity, a separate protein domain
distinct from the helicase domain is responsible for the annealing activity (40). Remarkably,
a helicase domain-containing protein, HARP, was recently discovered to possess annealing,
but no unwinding, activity (41). HARP binds to the ssDNA binding protein RPA and anneals
RPA-coated complementary ssDNA. Mutations in HARP are associated with Schimke
immuno-osseous dysplasia (SIOD), and the defects in the annealing activity of two
HARP mutants correlate with the severity of the disease (41). Together with AH2, another
protein with similar features (42), HARP was termed an annealing helicase. In this study,
annealing activity was clearly demonstrated for the SIRV2 gp18 protein. The failure to
detect the helicase activity predicted by structural modelling of the gp18
sequence could be due to unsuitable experimental conditions or to masking of
helicase activity by the stronger annealing activity. A third possibility is that gp18 has no
helicase activity, as demonstrated for the annealing helicases. While only
structural modelling revealed a connection between SIRV2 gp18 and an MCM helicase, a high
sequence similarity to Cas3 and other helicases was clearly detected by BlastP searches of
the gp18 analogues encoded in the genomes of most filamentous viruses (Fig. 1).
Interestingly, E. coli Cas3 was found to possess both helicase and annealing activities
(43).
To better understand the function of the entire operon, the protein product of
the third gene, gp19, was further characterized in this study, which revealed a 5´-3´ ssDNA
exonuclease activity in addition to the previously demonstrated ssDNA endonuclease
activity (Fig. 6 and Gardner et al., 2011b). The operonic or clustered organization of the three
genes in rudiviruses and filamentous viruses (Fig. 1) and the observed interactions between their
protein products (Fig. 7) strongly suggest that they cooperate closely in the same process(es)
involving ssDNA. Studies of SIRV2 genome replication using different approaches
demonstrated that SIRV2 forms ssDNA intermediates larger than a single genome, and that
large concatemers are abundant during the replication process (Martinez-Alvarez et al., in
preparation). This requires, first of all, an abundant ssDNA binding protein to protect the ssDNA
intermediates, and the highly expressed gp17 (12) may fulfil this requirement. To mature into
dsDNA monomers, the long ssDNA concatemers must anneal between the two
complementary strands, which could be facilitated by gp18. Subsequent nicking by an ssDNA
endonuclease and ligation by an unknown ligase would produce a mature dsDNA genome.
Through protein-protein interactions, gp17 could recruit gp18 to facilitate ssDNA annealing,
whereas gp19 could be recruited by gp18 to perform the final cleavage. In support of this
scenario, gp17 was found to be still present in high amounts at the late stage of the SIRV2 life
cycle, together with the tail-fiber protein of SIRV2 virions (11). Thus, it is likely that the
operon is involved in genome maturation from SIRV2 replicative intermediates.
Another common and interesting feature shared by the rudiviruses and filamentous
viruses is the presence of multiple 12 bp insertions/deletions (indels) in their genomes,
revealed by sequence alignment between closely related viral genomes and between
homologous genes (6;44). In the latter case, where the nucleotide sequences had diverged too
much to be aligned, amino acid sequence alignment between homologs allowed the
detection of single or multiple 4-residue indels. Given the proposed function of Cas4 in
CRISPR spacer acquisition and the fact that gp19 belongs to the Cas4 nuclease
superfamily (26;33), it is possible that gp19 is involved in the generation of the 12 bp indels,
and the annealing activity of gp18 fits well with both insertion and deletion scenarios.
Following strand-displacement replication, as proposed for both AFV1 (45) and SIRV2 ((19);
unpublished data from Martinez-Alvarez et al.), ssDNA bubbles may arise frequently and
spontaneously during genome maturation. Repair of such structures involving an ssDNA
binding protein (gp17), an annealing protein (gp18) and an ssDNA nuclease (gp19) could in
principle produce either insertions or deletions.
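The correspondence between 12 bp indels at the nucleotide level and 4-residue indels at the protein level follows from the triplet genetic code. The short Python sketch below illustrates the arithmetic. The function names are hypothetical and the sketch says nothing about the mechanism that generates the indels.

```python
CODON_LENGTH = 3  # nucleotides per codon


def preserves_frame(indel_bp):
    """True if an indel of this length leaves the reading frame intact."""
    return indel_bp % CODON_LENGTH == 0


def residues_changed(indel_bp):
    """Number of whole codons gained or lost, or None for a frameshifting indel."""
    if not preserves_frame(indel_bp):
        return None
    return indel_bp // CODON_LENGTH


if __name__ == "__main__":
    for length in (11, 12, 24):
        print(length, preserves_frame(length), residues_changed(length))
```

A 12 bp indel therefore adds or removes exactly four residues without shifting the reading frame, which is why such indels remain detectable in amino acid alignments of homologs.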
A third possible function of the operon is recombination involved in general
repair or replication initiation, which has been shown to be important for many viruses (e.g. T4
(46)). After dsDNA unwinding by a helicase, which remains to be identified in this case,
ssDNA nuclease, binding and annealing activities are all needed in the classical
recombination processes (47), and the identified functions of the three proteins fit well with
this scenario. In support of this, Phyre2 structural modelling of SSO2277,
a Sulfolobus protein interacting with gp17 (Fig. 7D) and annotated as hypothetical, revealed
a good match (99.9% confidence over half of the protein) with proteins of families such as RecF,
RecN and Rad50 (data not shown). The latter proteins are involved in recombination (48).
In conclusion, this is the first study to provide a functional characterization of an
entire operon conserved in archaeal rudiviruses and filamentous viruses. Owing to low or
no sequence homology with characterized proteins, the majority of archaeal viral genes
remain hypothetical, which has hindered progress in archaeal virology. The
results from this study will therefore contribute to a better understanding of the novel viruses
infecting Archaea, the third domain of life. More importantly, the sequence and/or structural
divergence of the three proteins from previously characterized ssDNA binding, annealing
and nuclease proteins not only adds novelty to, but also provides important information for,
evolutionary studies of these proteins, which are nearly ubiquitous from bacteria and archaea to
eukaryotes, including humans.
FUNDING
This work was supported by the European Union Framework 7 programme (grant 265933). Y.G.
received a stipend from the China Scholarship Council.
REFERENCES
1. Pietila,M.K., Roine,E., Paulin,L., Kalkkinen,N. and Bamford,D.H. (2009) An ssDNA virus
infecting archaea: a new lineage of viruses with a membrane envelope. Mol.
Microbiol., 72, 307-319.
2. Mochizuki,T., Krupovic,M., Pehau-Arnaudet,G., Sako,Y., Forterre,P. and Prangishvili,D.
(2012) Archaeal virus with exceptional virion architecture and the largest
single-stranded DNA genome. Proc. Natl. Acad. Sci. U.S.A., 109, 13386-13391.
3. Pietila,M.K., Demina,T.A., Atanasova,N.S., Oksanen,H.M. and Bamford,D.H. (2014)
Archaeal viruses and bacteriophages: comparisons and contrasts. Trends Microbiol.,
22, 334-344.
4. Prangishvili,D., Arnold,H.P., Gotz,D., Ziese,U., Holz,I., Kristjansson,J.K. and Zillig,W.
(1999) A novel virus family, the Rudiviridae: Structure, virus-host interactions and
genome variability of the Sulfolobus viruses SIRV1 and SIRV2. Genetics, 152,
1387-1396.
5. Blum,H., Zillig,W., Mallok,S., Domdey,H. and Prangishvili,D. (2001) The genome of the
archaeal virus SIRV1 has features in common with genomes of eukaryal viruses.
Virology, 281, 6-9.
6. Vestergaard,G., Aramayo,R., Basta,T., Haring,M., Peng,X., Brugger,K., Chen,L.,
Rachel,R., Boisset,N., Garrett,R.A. et al. (2008) Structure of the Acidianus filamentous
virus 3 and comparative genomics of related archaeal lipothrixviruses. J. Virol., 82,
371-381.
7. Vestergaard,G., Haring,M., Peng,X., Rachel,R., Garrett,R.A. and Prangishvili,D. (2005)
A novel rudivirus, ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology,
336, 83-92.
8. Servin-Garciduenas,L.E., Peng,X., Garrett,R.A. and Martinez-Romero,E. (2013)
Genome sequence of a novel archaeal rudivirus recovered from a Mexican hot spring.
Genome Announc., 1.
9. Peng,X., Blum,H., She,Q., Mallok,S., Brugger,K., Garrett,R.A., Zillig,W. and
Prangishvili,D. (2001) Sequences and replication of genomes of the archaeal
rudiviruses SIRV1 and SIRV2: relationships to the archaeal lipothrixvirus SIFV and
some eukaryal viruses. Virology, 291, 226-234.
10. Kessler,A., Brinkman,A.B., van der Oost,J. and Prangishvili,D. (2004) Transcription of
the rod-shaped viruses SIRV1 and SIRV2 of the hyperthermophilic archaeon
sulfolobus. J. Bacteriol., 186, 7745-7753.
11. Quax,T.E., Krupovic,M., Lucas,S., Forterre,P. and Prangishvili,D. (2010) The Sulfolobus
rod-shaped virus 2 encodes a prominent structural component of the unique virion
release system in Archaea. Virology, 404, 1-4.
12. Okutan,E., Deng,L., Mirlashari,S., Uldahl,K., Halim,M., Liu,C., Garrett,R.A., She,Q. and
Peng,X. (2013) Novel insights into gene regulation of the rudivirus SIRV2 infecting
Sulfolobus cells. RNA. Biol., 10, 875-885.
13. Deng,L., He,F., Bhoobalan-Chitty,Y., Martinez-Alvarez,L., Guo,Y. and Peng,X. (2014)
Unveiling cell surface and type IV secretion proteins responsible for archaeal
rudivirus entry. J. Virol., 88, 10264-10268.
14. Prangishvili,D., Garrett,R.A. and Koonin,E.V. (2006) Evolutionary genomics of archaeal
viruses: unique viral genomes in the third domain of life. Virus Res., 117, 52-67.
15. Prangishvili,D., Koonin,E.V. and Krupovic,M. (2013) Genomics and biology of
Rudiviruses, a model for the study of virus-host interactions in Archaea. Biochem. Soc.
Trans., 41, 443-450.
16. Vestergaard,G., Shah,S.A., Bize,A., Reitberger,W., Reuter,M., Phan,H., Briegel,A.,
Rachel,R., Garrett,R.A. and Prangishvili,D. (2008) Stygiolobus rod-shaped virus and
the interplay of crenarchaeal rudiviruses with the CRISPR antiviral system. J.
Bacteriol., 190, 6837-6845.
17. Guilliere,F., Peixeiro,N., Kessler,A., Raynal,B., Desnoues,N., Keller,J., Delepierre,M.,
Prangishvili,D., Sezonov,G. and Guijarro,J.I. (2009) Structure, function, and targets of
the transcriptional regulator SvtR from the hyperthermophilic archaeal virus SIRV1. J.
Biol. Chem., 284, 22222-22237.
18. Quax,T.E., Voet,M., Sismeiro,O., Dillies,M.A., Jagla,B., Coppee,J.Y., Sezonov,G.,
Forterre,P., van der Oost,J., Lavigne,R. et al. (2013) Massive activation of archaeal
defense genes during viral infection. J. Virol., 87, 8419-8428.
19. Oke,M., Kerou,M., Liu,H., Peng,X., Garrett,R.A., Prangishvili,D., Naismith,J.H. and
White,M.F. (2011) A dimeric Rep protein initiates replication of a linear archaeal virus
genome: implications for the Rep mechanism and viral replication. J. Virol., 85,
925-931.
20. Gardner,A.F., Guan,C. and Jack,W.E. (2011) Biochemical characterization of a
structure-specific resolving enzyme from Sulfolobus islandicus rod-shaped virus 2.
PLoS. One., 6, e23668.
21. Gardner,A.F., Prangishvili,D. and Jack,W.E. (2011) Characterization of Sulfolobus
islandicus rod-shaped virus 2 gp19, a single-strand specific endonuclease.
Extremophiles., 15, 619-624.
22. Gudbergsdottir,S., Deng,L., Chen,Z., Jensen,J.V., Jensen,L.R., She,Q. and Garrett,R.A.
(2011) Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems
when challenged with vector-borne viral and plasmid genes and protospacers. Mol.
Microbiol., 79, 35-49.
23. Deng,L., Zhu,H., Chen,Z., Liang,Y.X. and She,Q. (2009) Unmarked gene deletion and
host-vector system for the hyperthermophilic crenarchaeon Sulfolobus islandicus.
Extremophiles., 13, 735-746.
24. Oke,M., Carter,L.G., Johnson,K.A., Liu,H., McMahon,S.A., Yan,X., Kerou,M.,
Weikart,N.D., Kadi,N., Sheikh,M.A. et al. (2010) The Scottish Structural Proteomics
Facility: targets, methods and outputs. J. Struct. Funct. Genomics, 11, 167-180.
25. Kelley,L.A. and Sternberg,M.J. (2009) Protein structure prediction on the Web: a case
study using the Phyre server. Nat. Protoc., 4, 363-371.
26. Zhang,J., Kasciukovic,T. and White,M.F. (2012) The CRISPR associated protein Cas4 Is
a 5' to 3' DNA exonuclease with an iron-sulfur cluster. PLoS. One., 7, e47232.
27. Brugger,K., Redder,P. and Skovgaard,M. (2003) MUTAGEN: multi-user tool for
annotating genomes. Bioinformatics., 19, 2480-2481.
28. Xue,B., Dunbrack,R.L., Williams,R.W., Dunker,A.K. and Uversky,V.N. (2010)
PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys.
Acta, 1804, 996-1010.
29. Dosztanyi,Z., Csizmok,V., Tompa,P. and Simon,I. (2005) IUPred: web server for the
prediction of intrinsically unstructured regions of proteins based on estimated energy
content. Bioinformatics., 21, 3433-3434.
30. Munoz,V. and Serrano,L. (1994) Elucidating the folding problem of helical peptides
using empirical parameters. Nat. Struct. Biol., 1, 399-409.
31. Kawano,S., Iyaguchi,D., Okada,C., Sasaki,Y. and Toyota,E. (2013) Expression,
purification, and refolding of active recombinant human E-selectin lectin and EGF
domains in Escherichia coli. Protein J., 32, 386-391.
32. Chong,J.P., Hayashi,M.K., Simon,M.N., Xu,R.M. and Stillman,B. (2000) A
double-hexamer archaeal minichromosome maintenance protein is an
ATP-dependent DNA helicase. Proc. Natl. Acad. Sci. U. S. A, 97, 1530-1535.
33. Lemak,S., Beloglazova,N., Nocek,B., Skarina,T., Flick,R., Brown,G., Popovic,A.,
Joachimiak,A., Savchenko,A. and Yakunin,A.F. (2013) Toroidal structure and DNA
cleavage by the CRISPR-associated [4Fe-4S] cluster containing Cas4 nuclease SSO0001
from Sulfolobus solfataricus. J. Am. Chem. Soc., 135, 17476-17487.
34. Shereda,R.D., Kozlov,A.G., Lohman,T.M., Cox,M.M. and Keck,J.L. (2008) SSB as an
organizer/mobilizer of genome maintenance complexes. Crit Rev. Biochem. Mol. Biol.,
43, 289-318.
35. Bochkarev,A. and Bochkareva,E. (2004) From RPA to BRCA2: lessons from
single-stranded DNA binding by the OB-fold. Curr. Opin. Struct. Biol., 14, 36-42.
36. Suck,D. (1997) Common fold, common function, common origin? Nat. Struct. Biol., 4,
161-165.
37. Theobald,D.L., Mitton-Fry,R.M. and Wuttke,D.S. (2003) Nucleic acid recognition by
OB-fold proteins. Annu. Rev. Biophys. Biomol. Struct., 32, 115-133.
38. Dickey,T.H., Altschuler,S.E. and Wuttke,D.S. (2013) Single-stranded DNA-binding
proteins: multiple domains for multiple functions. Structure., 21, 1074-1084.
39. Paytubi,S., McMahon,S.A., Graham,S., Liu,H., Botting,C.H., Makarova,K.S., Koonin,E.V.,
Naismith,J.H. and White,M.F. (2012) Displacement of the canonical single-stranded
DNA-binding protein in the Thermoproteales. Proc. Natl. Acad. Sci. U. S. A, 109,
E398-E405.
40. Wu,Y. (2012) Unwinding and rewinding: double faces of helicase? J. Nucleic Acids,
2012, 140601.
41. Yusufzai,T. and Kadonaga,J.T. (2008) HARP is an ATP-driven annealing helicase.
Science, 322, 748-750.
42. Yusufzai,T. and Kadonaga,J.T. (2010) Annealing helicase 2 (AH2), a DNA-rewinding
motor with an HNH motif. Proc. Natl. Acad. Sci. U. S. A, 107, 20970-20973.
43. Howard,J.A., Delmas,S., Ivancic-Bace,I. and Bolt,E.L. (2011) Helicase dissociation and
annealing of RNA-DNA hybrids by Escherichia coli Cas3 protein. Biochem. J., 439,
85-95.
44. Peng,X., Kessler,A., Phan,H., Garrett,R.A. and Prangishvili,D. (2004) Multiple variants
of the archaeal DNA rudivirus SIRV1 in a single host and a novel mechanism of
genomic variation. Mol. Microbiol., 54, 366-375.
45. Pina,M., Basta,T., Quax,T.E., Joubert,A., Baconnais,S., Cortez,D., Lambert,S., Le,C.E.,
Bell,S.D., Forterre,P. et al. (2014) Unique genome replication mechanism of the
archaeal virus AFV1. Mol. Microbiol., 92, 1313-1325.
46. Mosig,G. (1998) Recombination and recombination-dependent DNA replication in
bacteriophage T4. Annu. Rev. Genet., 32, 379-413.
47. Kowalczykowski,S.C. (2000) Initiation of genetic recombination and
recombination-dependent replication. Trends Biochem. Sci., 25, 156-165.
48. Kowalczykowski,S.C., Dixon,D.A., Eggleston,A.K., Lauder,S.D. and Rehrauer,W.M.
(1994) Biochemistry of homologous recombination in Escherichia coli. Microbiol. Rev.,
58, 401-465.
TABLE AND FIGURE LEGENDS
Table 1. Structures of the DNA substrates used in this study.
Figure 1. Organization of SIRV2 gp17, gp18, gp19 and their homologs in the genomes of
archaeal linear viruses. The pattern codes are as follows:
SIRV2 gp17 homolog; SIRV2 gp18 homolog or analog; SIRV2 gp19
homolog; conserved gene upstream of SIRV2 gp18 homolog in most
filamentous viruses.
Figure 2. gp17 binds to ssDNA. (A) gp17 binds to DNA substrates with either a 23-nt
5´-ssDNA flap or Y-shaped double-flaps (23 nt), but not to blunt-ended dsDNA. F, free DNA;
C, DNA-protein complex. (B) gp17 shows a higher preference for ssDNA than for dsDNA.
The concentration (nM) of gp17 is indicated on the top of the gel.
Figure 3. Mutagenesis of gp17 revealed a U-shaped binding path for ssDNA. (A) The
structure of a monomer of the gp17 homolog, with the conserved positively charged residues
shown as sticks. (B) Gel retardation assays using gp17 WT and mutant proteins and
ssDNA. Protein concentrations are labeled on the top of each gel. DNA forms are indicated
by a short line (free) or a line covered with a circle (DNA-protein complex). (C) Quantification
of the ssDNA binding activity of different gp17 mutants based on the results shown in B. (D)
Binding path of ssDNA on gp17. The residues contributing to ssDNA binding are shown as
sticks.
Figure 4. Characterization of the refolded recombinant gp18. (A) Purification and refolding
of gp18 from inclusion bodies. Lane 1, supernatant of E. coli cells expressing gp18; lane 2,
pellet of the lysate; lane 3, pellet protein dissolved in 8 M urea; lane 4, supernatant of gp18
after purification, refolding, heating at 70°C for 20 min and centrifugation. (B) Far-UV CD
spectrum of refolded gp18. (C) Temperature denaturation of gp18 followed at 220 nm. (D)
Gel-filtration chromatographic analysis of the purified and refolded gp18 protein. The main
peak and the two shoulders are labeled.
Figure 5. gp18 stimulates annealing of complementary oligonucleotides. (A)
Concentration-dependent enhancement of oligonucleotide annealing by gp18. The 32P-labeled
57-mer oligo-4 (1 nM) and the complementary oligo-5 (1.2 nM) were incubated in
the absence (lane 1) or presence (lanes 2 to 6) of gp18. gp18 concentrations are indicated
on the top of the gel. The presence (lanes 2 to 5) or absence (lane 6) of ATP is also
indicated. (B) Time course of gp18-enhanced single-strand annealing. Left panel, annealing
without gp18; right panel, annealing in the presence of 200 nM gp18. (C) Quantification of
annealed DNA in the absence or presence of gp18. The percentages were calculated based
on the intensities of bands in B.
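The quantification in panel C can be illustrated with a minimal Python sketch. It assumes that the percentage of annealed DNA is the duplex band intensity divided by the summed intensity of the duplex and single-strand bands in the same lane; the legend does not state the exact formula, and the function name and the intensity values below are hypothetical.

```python
def percent_annealed(duplex_intensity, ssdna_intensity):
    """Return the annealed share of the labeled oligo in one lane, in percent.

    Assumption (not stated in the legend): both values are
    background-corrected band intensities from the same gel lane.
    """
    total = duplex_intensity + ssdna_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * duplex_intensity / total


# Hypothetical band intensities (arbitrary units), keyed by time in minutes:
# (duplex band, single-strand band).
lanes = {0.5: (1200.0, 8800.0), 1: (3000.0, 7000.0), 2: (5500.0, 4500.0)}
for minutes, (duplex, single) in sorted(lanes.items()):
    print(f"{minutes} min: {percent_annealed(duplex, single):.1f}% annealed")
```

Normalizing within each lane makes the value independent of how much labeled DNA was loaded, which is why the sketch uses the lane total rather than a fixed input signal.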
Figure 6. Nuclease activities of SIRV2 gp19. (A) Selective cleavage of DNA substrate with a
5´ ssDNA flap by gp19. (B) Gradual cleavage of ssDNA. (C) Endonuclease activity of gp19.
The circular ssDNA of M13mp18 was incubated at 50 °C with or without the addition of 0.5
μM gp19, and the incubation time is given on top of the gel.
Figure 7. Interactions between gp17, gp18 and gp19 and between gp17 and Sulfolobus host
proteins. (A) Schematic presentation of GST-tagged gp17 mutants. (B) Pull-down assays by
GST affinity chromatography. The purified and refolded His-tagged gp18 (labeled as P for
prey) was incubated with GST-tagged gp17 or GST-tagged gp19 on GSH column for 1 h.
After washing with PBS buffer, the GSH beads were boiled in the SDS loading buffer and
loaded for SDS-PAGE, and the interacting protein was detected by anti-His antibody. GST
protein was used as negative control. Positive controls for Western blotting were carried out
using the input His-tagged gp18. (C) Interaction between GST-tagged gp17 mutants and
His-tagged gp18. Pull-down assays were performed as described in B. (D) Identification of
Sulfolobus proteins interacting with His-tagged gp17 overexpressed in Sulfolobus solfataricus
P2. Three fractions of eluted proteins from the negative control cells containing the empty
pEXA2 vector (lanes 1 to 3) and from the gp17 transformant (lanes 4 to 6) were tested. The
identified proteins are indicated on the right. (E) Western blot of the negative
control (lane 1) and the gp17 protein elution (lane 2).
Table S1. Details of the primers used in this study.
Table S2. Sequences of the oligonucleotides used as substrates in this study.
Table S3. The location, gene length and functions of the conserved gene cluster in all linear
viruses.
Table S4. Mass spectrometric peptide mapping and sequencing analysis of the two pulled
down proteins.
Figure S1. Genome comparison of all the rudiviruses and filamentous viruses using the
MUTAGEN program. Conserved gene clusters are marked with red squares. Homologs are
color-coded, whereas white rectangles represent ORFs without homologs.
Figure S2. Structure of SIRV1 ORF1312-96 (PDB identifier [ID] 2X5T)(24). (A) Dimer
structure of SIRV1 ORF1312-96 coloured in deep teal and violet purple for the two monomers.
(B) Secondary structure elements of the monomer are labeled in different colors. (C) and (D),
A surface representation shown on the concave side and convex side of the dimer, indicating
the electrostatic potential of the putative binding interface.
Figure S3. (A) Predicted disorder probability of the gp17 residues. Two different programs
(IUPred and PONDR) were used for the prediction, and both revealed disorder at the
C-terminus. (B) Percentage of helicity of gp17.
Figure S4. Purification of SIRV2 gp17. gp17 was expressed in E. coli and purified to
homogeneity. Lanes 1 to 4, four elution fractions from Ni-NTA agarose beads.
Figure S5. Alignment of SIRV2 gp17 and its homologs. Identical residues are labeled as *,
and conserved positive charged residues are shaded red. Red arrows indicate the position of
the β sheets, blue bars indicate the position of α helices. Residues mutated to alanine in this
study are marked with black dots and numbered accordingly.
Figure S6. Gel retardation assays showing the binding of gp17 WT and some of the mutant
proteins to ssDNA.
Figure S7. Standard linear regression curve. The Superdex 200 HR 10/30 column was
calibrated with proteins of known molecular mass: thyroglobulin, bovine (669 kDa);
apoferritin, horse spleen (443 kDa); β-amylase, sweet potato (200 kDa) and alcohol
dehydrogenase, yeast (150 kDa).
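A calibration of this kind can be sketched in a few lines of Python. The sketch assumes the conventional gel-filtration treatment, in which the partition coefficient Kav = (Ve - V0) / (Vt - V0) is fitted as a straight line against log10 of the molecular mass; the thesis does not spell out the calculation. The masses are those of the four standards named in the legend, whereas the Kav values and all function names are hypothetical placeholders.

```python
import math

# Molecular masses of the standards named in the legend (kDa).
STANDARDS_KDA = [669.0, 443.0, 200.0, 150.0]
# Hypothetical Kav values for those standards; replace with measured ones.
STANDARD_KAV = [0.10, 0.17, 0.30, 0.35]


def kav(elution_volume, void_volume, total_volume):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (elution_volume - void_volume) / (total_volume - void_volume)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass_kda(kav_value, slope, intercept):
    """Invert the calibration line to obtain an apparent mass in kDa."""
    return 10 ** ((kav_value - intercept) / slope)


slope, intercept = fit_line([math.log10(m) for m in STANDARDS_KDA], STANDARD_KAV)
print(f"Kav = {slope:.3f} * log10(MW) + {intercept:.3f}")
```

An apparent mass for a sample peak would then follow from `estimate_mass_kda(kav(Ve, V0, Vt), slope, intercept)`, with the three volumes taken from the chromatogram.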
Name          Structure               Oligonucleotides*
Substrate A   Blunt-end duplex        1+2
Substrate B   5´-ssDNA flap duplex    1+3
Substrate C   3´-ssDNA flap duplex    1+4
Substrate D   Y-shaped duplex         4+5
Table 1. Structures of the substrates used in this study
*: The sequences of the oligonucleotides are provided in Table S2
[Figure 1: image not reproduced. Gene-organization maps are drawn for SRV, ARV, SIRV1,2,
AFV1, SIFV, AFV3-8 and AFV9.]
[Figure 2: image not reproduced. Gel panels A and B; bands are labeled F (free DNA) and C,
C1, C2 (DNA-protein complexes); lane labels give gp17 (nM) as 0, 33, 66, 130, 260 and as
0, 66, 130, 260, 1300, 2600, 3250, 3900.]
[Figure 3: image not reproduced. (A) Structure with residues K61, R60, R24, H54, R33, K82,
K27 and R29 labeled. (B) Gel retardation assays; lane labels (nM): gp17 wt, K82A and R60A
at 0, 16, 33, 66, 133, 266, 400, 533, 666, 1200; K61A and H54A at 0, 33, 66, 133, 266, 400,
533, 800, 1200, 2000; R33A and R60A K61A at 0, 133, 266, 400, 533, 733, 933, 1200, 1466,
2000. (C) Plot of bound ratio (%) against protein (nM, 0 to 1200) for gp17 wt, K82A, R60A,
H54A, K61A, R33A and R60A K61A. (D) Binding path with residues R33, H54, K61 and R60
labeled.]
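[Figure 4: image not reproduced. Panels A to D; panel A is a gel with lanes M and 1 to 4,
the gp18 band and size markers at 55, 35, 15 and 10 kDa; in panel D the void volume (Vo) and
the peaks 1 and 2 are labeled.]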
[Figure 5: image not reproduced. (A) Lanes 1 to 6: gp18 (nM) --, 50, 100, 150, 200, 200;
ATP +, +, +, +, +, -; time 15 min in every lane. (B) Time points 0, 0.5, 1, 2 and 4 min;
left gel labeled gp18 (nM) 0, 0, 0, 0, 0 and right gel labeled gp18 (nM) 0, 200, 200, 200,
200. (C) Plot of annealing (%) against time (min, 0 to 4) for the control and gp18 series.]
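[Figure 6: image not reproduced. Panels A to C are gels whose lanes are labeled by gp19
(0.5 μM; - or +) and incubation time (min). Panel A: 10, 3, 10 for each of three substrates.
Panel B: 10, 2, 5, 8, 10. Panel C: marker lane M (5 kbp and 1 kbp indicated) followed by 60,
15, 30, 45, 60. The 5´ and 3´ labels belong to the substrate diagrams.]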
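[Figure 7: image not reproduced. (A) Schematic of GSTgp17-I (131 aa), GSTgp17-II (121 aa)
and GSTgp17-III (111 aa) on a scale marked at 50 and 100 aa. (B, C) Blots with markers at
55 and 35 kDa. (D) Gel with lanes M and 1 to 6 (pEXA2-control, pEXA2-gp17); markers at 170,
130, 70, 55, 25 and 15 kDa; identified bands: gp17, Sso2277 and reverse gyrase. (E) Blot
with lanes 1 and 2 and the gp17 band.]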
Expression host: E. coli; tag: C-His
Protein name                  Oligo Nr.   Sequence 5´-3´
gp17 wt                       1           Fw: GGCGAAAACCATATGGCCTCATTAAAACAAATAATAG
                              2           Rv: GCTTCTCGAGAAACTCCTCCTCAACTGTTTTTT
gp17(1-121aa) (a)             3           Rv: CGTACTCGAGTTATTTTTCTCTCGTTTTCTCTTCTT
gp17 K82A (b)                 4           1Rv: GCATACGCCTCTAGAAATTCAG
                              5           1Fw: GAATTTCTAGAGGCGTATGCAG
gp17 K32A (b)                 6           1Rv: CAACTATTCTCGCTATACCTTTTATC
                              7           1Fw: GGTATAGCGAGAATAGTTGTACAG
gp17 R60A (b)                 8           1Rv: CCAATTTGTTTCGCGAAATTATTC
                              9           1Fw: CGCGAAACAAATTGGAATAAC
gp17 K61A (b)                 10          1Rv: CCAATTTGCGCTCTGAAATTATT
                              11          1Fw: CAGAGCGCAAATTGGAATAAC
gp17 H54A (b)                 12          1Rv: GAAATTATTCTGACTCGCTATCGTC
                              13          1Fw: CATGACGATAGCGAGTCAGAA
gp17 R33A (b)                 14          1Rv: GTACAACTATCGCTTTTATACCTTTTA
                              15          1Fw: GCGATAGTTGTACAGTTAAATGC
gp17 R60A K61A (b)            16          1Rv: TTGCGCCGCGAAATTATTCTG
                              17          1Fw: TTCGCGGCGCAAATTGGAA
gp17 R24A K27A (b)            18          1Rv: CGCTAAAATCGCAGACGCTATTTTATTGTTCTCTTT
                              19          1Fw: GCGATTTTAGCGATAAAAGGTATAAAAAGAATAGTTGTAC
gp17 R24A K27A K29A (b)       20          1Rv: TCTTTTTATACCCGCTATCGCTAAAATCGCAGACGC
                              21          1Fw: GCGATAGCGGGTATAAAAAGAATAGTTGTACAG
gp18                          22          Fw: CATTTGTTCCATATGAGTGAAAACACACAACTATTTG
                              23          Rv: CGTACTCGAGCCATCCTCCTAAATTGCTAAATC
gp19                          24          Fw: CTACCATTCATATGGTAAATATGAATTATGAAGATC
                              25          Rv: GCGCTCGAGAAAAAGTGATATAATGCATTTTTG

Expression host: E. coli; tag: N-GST
Protein name                  Oligo Nr.   Sequence 5´-3´
GSTgp17-I                     26          Fw: ATCGGGATCCGCCTCATTAAAACAAATAATAG
                              27          Rv: CGTACTCGAGTTAAAACTCCTCCTCAACTGTTTTTT
GSTgp17-II (c)                28          Rv: CGTACTCGAGTTATTTTTCTCTCGTTTTCTCTTCTT
GSTgp17-III (c)               29          Rv: CGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC
GSTgp19                       30          Fw: CGCGGATCCGTAAATATGAATTATGAAGATCATATAAAAGAAAG
                              31          Rv: GCCGCTCGAGTTAAAAAAGTGATATAATGCATTTTTGTTTG

Expression host: Sulfolobus solfataricus; tag: C-His
Protein name                  Oligo Nr.   Sequence 5´-3´
gp17                          34          Fw: CATTTGTTCCATATGGCCTCATTAAAACAAATAATAG
                              35          Rv: CTCAACTAGCGGCCGCAAACTCCTCCTCAACTGTTT
gp18                          36          Fw: CATTTGTTCCATATGAGTGAAAACACACAACTATTTG
                              37          Rv: TATTAATAGCGGCCGCCCATCCTCCTAAATTGCTAAAT
gp19                          38          Fw: CATTTGTTCCATATGGTAAATATGAATTATGAAGATC
                              39          Rv: CTCTTCTAGCGGCCGCTTAAAAAAGTGATATAATGCATTT

(a) The forward primer used for this construction is oligo-1.
(b) For this construction, the front part fragment was amplified using oligo-1 and the 1Rv primer, and the rest part
fragment was amplified using 1Fw and oligo-2. Then the two fragments were used as template to amplify the whole
construction by oligo-1 and oligo-2.
(c) The forward primer used for this construction is oligo-26.
Table S1. Details of the primers used in this study
Name       Sequence (5´ to 3´)
oligo-1    CGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC
oligo-2    GAAGAGCTAGACATGGAACAATAACTCGAGTACG
oligo-3    GTTATTGCATGAAAGCCCGGCTGGAAGAGCTAGACATGGAACAATAACTCGAGTACG
oligo-4    GTCAGTCCAAAAGTACATTATTGCGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC
oligo-5    GAAGAGCTAGACATGGAACAATAACTCGAGTACGGTTATTGCATGAAAGCCCGGCTG
Table S2. Sequences of the oligonucleotides used as substrates in this study
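The substrate structures listed in Table 1 can be cross-checked against these sequences with a short Python sketch. It confirms that oligo-1 and oligo-2 are exact reverse complements (blunt-end duplex, substrate A) and that oligo-1 pairs with 34 nt of oligo-3, leaving a 23-nt single-stranded flap (substrate B). The sequences are copied from Table S2; the function names are illustrative and not part of the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

OLIGO_1 = "CGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC"
OLIGO_2 = "GAAGAGCTAGACATGGAACAATAACTCGAGTACG"
OLIGO_3 = "GTTATTGCATGAAAGCCCGGCTGGAAGAGCTAGACATGGAACAATAACTCGAGTACG"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_paired_stretch(top, bottom):
    """Length of the longest contiguous region of `top` that can base-pair
    with `bottom` (longest common substring of `top` and the reverse
    complement of `bottom`)."""
    target = reverse_complement(bottom)
    best = 0
    previous = [0] * (len(target) + 1)
    for base in top:
        current = [0] * (len(target) + 1)
        for j, other in enumerate(target, start=1):
            if base == other:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


paired = longest_paired_stretch(OLIGO_1, OLIGO_3)
print("Substrate A is blunt-ended:", reverse_complement(OLIGO_1) == OLIGO_2)
print("Substrate B duplex length:", paired, "nt; flap:", len(OLIGO_3) - paired, "nt")
```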
Strain  Gene1       Length (aa)  Function              Gene2       Length (aa)  Function                                             Gene3       Length (aa)  Function
AFV3    gp10        79           Hypothetical protein  gp09        593          Putative helicase                                    gp08        203          Putative nuclease
AFV6    gp10        79           Hypothetical protein  gp09        593          Putative helicase                                    gp08        203          Putative nuclease
AFV7    gp05        79           Hypothetical protein  gp04        593          Putative helicase                                    gp03        203          Putative nuclease
AFV8    gp07        79           Hypothetical protein  gp06        593          Putative helicase                                    gp05        203          Putative nuclease
AFV9    gp11        79           Hypothetical protein  gp10        602          Putative Holliday junction branch migration helicase gp08        203          Putative nuclease
SIFV    SIFV-08     79           Hypothetical protein  SIFV-07     601          Putative helicase                                    SIFV-06     232          Hypothetical protein
AFV2    -           -            -                     gp15        425          Hypothetical protein                                 -           -            -
AFV1    gp14        135          Hypothetical protein  gp15        426          Hypothetical protein                                 gp17        223          CRISPR-associated Cas4-like protein
ARV1    gp12        134          Hypothetical protein  gp16        443          Hypothetical protein                                 gp17        207          Hypothetical protein
SRV     SRV-ORF138  138          Hypothetical protein  SRV-ORF440  440          Hypothetical protein                                 SRV-ORF199  199          Hypothetical protein
SIRV2   gp17        131          Hypothetical protein  gp18        436          Hypothetical protein                                 gp19        207          Single strand nuclease
Table S3. The location, gene length and functions of the conserved gene cluster in all linear viruses
Table S4. Mass spectrometric peptide mapping and sequencing analysis
of the two pulled down proteins
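[Figure S1: image not reproduced.]
[Figure S2: image not reproduced; panels A to D.]
[Figure S3: image not reproduced.]
[Figure S4: image not reproduced. Gel with lanes M and 1 to 4; markers at 25, 15 and 10 kDa;
the SIRV2 gp17 band is labeled.]
[Figure S5: image not reproduced.]
[Figure S6: image not reproduced. Gel retardation assays with gp17 wt, gp17(R24A K27A),
gp17(R24A K27A K29A) and gp17(1-121aa), each at 0, 33, 66, 130 and 260 nM.]
[Figure S7: image not reproduced. Plot of Kav against log MW with calibration points labeled
660, 440, 200 and 150 kDa.]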
Manuscript II
Genome-wide binding profile of two transcription regulators of
Sulfolobus solfataricus
Yang Guo, Xu Peng
In preparation
Genome-wide binding profile of two transcription regulators of
Sulfolobus solfataricus
Yang Guo and Xu Peng *
Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 CPH N. Denmark
* To whom correspondence should be addressed. Tel: +45 35322018; Fax: +45 35322128; Email:
Abstract
Two transcription regulators, sso2474 and sso10340, from Sulfolobus solfataricus P2 were
differentially regulated upon SIRV2 infection. A method similar to chromatin
immunoprecipitation combined with high-throughput sequencing (ChIP-seq)
was applied in this study to examine the gene composition of the two regulons in vivo.
Mapping of the sequencing data to the Sulfolobus solfataricus P2 and SIRV2 genomes
demonstrated that sso2474 binds with high affinity to the viral genome, whereas sso10340
mainly binds to the host DNA. A total of 27 enriched host DNA fragments extracted from the
sso10340-DNA complex appeared as potential binding targets, most of which are genes
involved in energy metabolism, transport, translation and amino acid metabolism. The
genome-wide binding profiles presented here reveal two different kinds of regulon
and expand our knowledge of transcriptional regulation upon
virus infection.
Introduction
Viruses infecting members of the Archaea, the third domain of life, display the most
diverse and previously unsuspected virion morphotypes (Pina et al., 2011). In the last few
years, a substantial effort has been made to explore the functions of their hypothetical
proteins and the viral infection cycles (Kessler et al., 2004;Oke et al., 2011;Gardner et
al., 2011;Bize et al., 2009). To date, several virus-host systems have become promising
models for studying virus-host interactions.
One of the best-studied viruses of hyperthermophilic archaea is SIRV2 (Sulfolobus
islandicus rod-shaped virus 2). It was isolated from an acidic hot spring in Iceland,
belongs to the family Rudiviridae and shares a common ancestry with the family
Lipothrixviridae. Transmission electron microscopy showed that SIRV2 virions specifically
recognize the pilus-like filaments on the host cell surface for adsorption (Quemin et al.,
2013). In addition, two gene clusters, sso3138 to sso3141 and sso2386 and sso2387,
identified in SIRV2-resistant Sulfolobus mutants were confirmed to be responsible for
virus entry (Deng et al., 2014), providing first insights into the entry process. Unlike most
archaeal viruses, which infect host cells in a "carrier state", the linear, non-enveloped,
double-stranded DNA (dsDNA) virus SIRV2, together with TTV1 and STIV, is lytic (Bize et al.,
2009;Ortmann et al., 2008;Zillig et al., 1996). The virions are released from the host cell
through a unique mechanism that involves the formation of pyramid-like protrusions
transecting the cell envelope and S-layer. At the end of the infection cycle, each pyramid,
which has seven isosceles triangular faces, opens up, allowing mature virions to escape from
the cell (Bize et al., 2009;Quax et al., 2011). To gain better insight into the biology of a
virus, its life cycle and its effect on the host, microarray analysis of the transcriptional
responses of the host and the virus during infection can be very efficient and
has been successfully applied to three archaeal viruses: the fusellovirus SSV1, the
icosahedral virus STIV and the rudivirus SIRV2 (Frols et al., 2007;Ortmann et al.,
2008;Okutan et al., 2013).
This work focuses on the regulation of host genes upon SIRV2
infection. A previous study revealed that a total of 148 host genes responded
differentially, and among these genes, two transcription regulators, sso2474 and sso10340,
were up- and down-regulated, respectively. This raises the interesting question of how
these two proteins regulate the corresponding genes under the stress of virus infection.
Are they global regulators, or do they only regulate their own promoters? A method similar
to ChIP-seq was applied in this study to detect the DNA binding sites in vivo. Combined with
gene expression analysis, it provides a first insight into the transcription regulation
network between virus and host cells.
Materials and methods
Sulfolobus cultivation and plasmid construction
sso2474 and sso10340 fragments were amplified from the Sulfolobus solfataricus P2 genome
by PCR, digested with NdeI and NotI and inserted into the similarly digested
Sulfolobus/E. coli shuttle vector pEXA3 (He et al., 2014), allowing the expression of
His-tagged sso2474 or sso10340 under the control of the arabinose promoter. The constructed
plasmids, as well as the empty plasmid pEXA3, were then electroporated individually into
uracil-deficient competent cells (Deng et al., 2009). Single colonies of the transformants
were inoculated into test tubes containing 5 ml SCV (basal medium supplemented with 0.2%
sucrose, 0.2% casamino acids and 1% vitamin solution) (Deng et al., 2009) and incubated in
an Innova 3100 oil-bath shaker. Large-scale culturing was performed in ACV medium (in which
0.2% D-arabinose was substituted for sucrose) in long-necked Erlenmeyer flasks. When the
culture reached an OD600 of 0.8, it was infected with SIRV2 at an m.o.i. of about 10. The
cells were collected 2.5 h post infection.
E. coli cells cultivation and plasmid construction
The coding sequence of sso2474 was amplified by PCR from the Sulfolobus solfataricus P2
genome, digested with NdeI and XhoI and subsequently inserted into a similarly digested
pET-30(a) (Novagen) expression vector. E. coli BL21 CodonPlus cells were transformed
with the individual plasmid construct, and a single transformant clone was inoculated into
LB medium containing 30 μg/ml kanamycin and 25 μg/ml chloramphenicol. At an optical
density (OD600) of 0.4, IPTG (0.5 mM) was added to the culture and the cells were cultured
for a further 12 hours at 25 °C.
Protein purification
His-tagged sso2474 and sso10340, from either Sulfolobus or E. coli, were purified
as follows. The harvested cell pellets were lysed by sonication in lysis buffer (50 mM
Tris-HCl pH 8.0, 300 mM NaCl, 1 mM EDTA, 1% Triton X-100 and 1 mM PMSF),
and different sonication times (4, 6, 8 and 10 min) were tested to minimize the size of the
DNA fragments bound by the proteins. The lysate was then cleared by centrifugation at
10,000 × g for 20 min. The supernatant was incubated with Ni-NTA agarose beads (Qiagen,
Germany) for 1 h at room temperature. The beads were washed three times with washing buffer
(50 mM Tris-HCl pH 8.0, 300 mM NaCl, 40 mM imidazole) and the protein-DNA complexes were
eluted with elution buffer (50 mM Tris-HCl pH 8.0, 300 mM NaCl, 250 mM imidazole). The purity
of the protein was evaluated by 12.5% SDS-PAGE and staining with PAGE blue (Sigma
Aldrich, UK), and the amount of DNA in the samples was assessed on a 0.7% agarose gel
stained with GelRed (Biotium).
DNA extraction and high-throughput sequencing
The eluted protein-DNA complexes were diluted with one volume of water and
treated with RNase A at room temperature for 1 h. Deproteination was carried out by
incubation with 2 mg/ml proteinase K at 50 °C for 2 h and at 65 °C for 8 h. The DNA was then
extracted with a phenol/chloroform/isoamyl alcohol (25:24:1) mixture and
concentrated by ethanol precipitation. Sequencing libraries with an average
fragment size of 350 bp were prepared according to the protocol of the Ion Plus Fragment
Library Kit and sequenced on the Ion PGM™ Sequencer (Life Technologies).
Reads mapping and Peak detection
The quality-filtered reads were aligned to the Sulfolobus solfataricus P2 and SIRV2
genomes using Bowtie, an ultrafast, memory-efficient aligner for mapping sets of short DNA
reads to large genomes (Satoh and Tabunoki, 2013), allowing up to two errors per read
(insertion, deletion and/or mismatch). The enriched peaks were then visualized using
Artemis (Carver et al., 2012).
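How enriched regions stand out from mapped reads can be illustrated with a minimal Python sketch. It is not the procedure used in this study, where peaks were inspected visually in Artemis; it simply counts read start positions in fixed windows and flags windows whose count exceeds a chosen multiple of the genome-wide mean. The window size, the fold threshold and the function names are all assumptions made for illustration.

```python
def window_counts(read_starts, genome_length, window=500):
    """Count aligned read start positions (0-based) in fixed-size windows."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:
        if 0 <= start < genome_length:
            counts[start // window] += 1
    return counts


def enriched_windows(counts, fold=5.0):
    """Return indices of windows with at least `fold` times the mean count."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return []
    return [i for i, count in enumerate(counts) if count >= fold * mean]


# Hypothetical example: a 5 kb region with one read per window
# plus 40 extra reads piled up around position 1234.
starts = [i * 500 for i in range(10)] + [1234] * 40
counts = window_counts(starts, genome_length=5000)
print(enriched_windows(counts))
```

A real analysis would normally compare the pull-down sample against a control sample, such as the empty-vector strain, rather than against the sample mean.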
Real-time Quantitative PCR
qPCR reactions were performed in 10 μl mixtures containing 5 μl iQ SYBR Green
Supermix (Bio-Rad, Cat. No. 170-8880), 1 mM primers and around 1 ng total DNA.
Separate reactions were prepared for detection of the reference gene and of the
virus-specific and Sulfolobus solfataricus host-specific amplicons. The mixtures were
prepared in duplicate in 96-well microtiter PCR plates (Bio-Rad Laboratories), sealed with an
adhesive cover (Bio-Rad Laboratories) and run on the CFX96 Real-Time Detection System
(Bio-Rad Laboratories) with the following uniform cycling parameters: initialization (95 °C
for 3 min) was followed by denaturation of the strands (95 °C for 10 sec), annealing of the
primers to the template (55 °C for 10 sec) and elongation of the primers by the DNA
polymerase (72 °C for 15 sec). The cycle from denaturation to elongation was repeated 40
times. Thereafter, the final elongation step was performed at 95 °C for 10 sec. Finally, a
melting temperature gradient from 65 to 95 °C, in 0.5 °C increments of 5 sec each, was used
to confirm the specificity of the primer sets. In addition, the possibility of unspecific
amplification products and contamination was checked by using a non-template control (NTC),
and a positive control with a known amount of template was included. qPCR data were analyzed
with the Bio-Rad CFX Manager software, which allows for the immediate determination of the
cycle threshold (Ct), melting curves and quantification of samples.
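The way Ct values translate into relative amounts can be shown with a small Python sketch. It assumes the common 2^(-ΔCt) model, in which the template doubles in every cycle (100% amplification efficiency); the text does not state which quantification model was applied in the CFX Manager software, and the Ct values and function name below are hypothetical.

```python
def relative_quantity(target_cts, reference_cts):
    """Relative amount of target versus reference amplicon, 2 ** (-delta Ct).

    Each argument is a list of replicate Ct values (duplicates in this study).
    Assumes perfect doubling per cycle, which is an idealization.
    """
    mean_target = sum(target_cts) / len(target_cts)
    mean_reference = sum(reference_cts) / len(reference_cts)
    return 2.0 ** (-(mean_target - mean_reference))


# Hypothetical duplicate Ct values: the viral amplicon crosses the threshold
# two cycles earlier than the host reference amplicon.
virus_cts = [20.1, 19.9]
host_cts = [22.0, 22.0]
print(f"virus/host signal ratio: {relative_quantity(virus_cts, host_cts):.2f}")
```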
Motif analysis
For de novo motif discovery within the significantly enriched DNA fragments, their
genomic sequences were submitted to MEME (Bailey and Elkan, 1994). Parameters were
set to search for zero or one palindromic motif of 16 bp width per sequence.
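What "palindromic" means for a DNA site can be made concrete with a short Python sketch: a site is palindromic when it equals its own reverse complement. The scan below reports every 16-bp window of a sequence that satisfies this, optionally tolerating a few mismatches. It is only an illustration of the constraint and does not reproduce MEME's statistical motif model; the function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_mismatches(site):
    """Number of positions at which a site differs from its reverse complement."""
    return sum(a != b for a, b in zip(site, reverse_complement(site)))


def palindromic_windows(seq, width=16, max_mismatches=0):
    """Return start offsets of windows that are (near-)palindromic."""
    return [
        start
        for start in range(len(seq) - width + 1)
        if palindrome_mismatches(seq[start:start + width]) <= max_mismatches
    ]


# Hypothetical sequence with a perfect 16-bp palindrome at offset 2.
example = "TT" + "ACGTACGTACGTACGT" + "GG"
print(palindromic_windows(example))
```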
DNA band shift assays
ssDNA (oligo-1, CGTACTCGAGTTATTGTTCCATGTCTAGCTCTTC) and
blunt-ended dsDNA (formed by annealing oligo-1 with its complementary
oligonucleotide), each at 50 nM, were incubated for 20 min at 50 °C with increasing
concentrations of SSO2474 (0-2.0 μM) in 20 μl DNA-binding buffer (10 mM Tris-Cl, pH 8.0,
100 mM KCl, 2 mM DTT, 10% glycerol). The samples were loaded onto a 12% (v/v) acrylamide gel
and electrophoresed in 0.5 × TBE buffer for 1 h 50 min. Following electrophoresis, the gels
were stained with SYBR® Gold (Life Technologies) and scanned with a Typhoon FLA 7000
(GE Healthcare Life Sciences).
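A titration of this kind is often summarized by the protein concentration at which half of the DNA is shifted. The Python sketch below estimates that value by linear interpolation between the two neighbouring titration points. It is an illustrative simplification rather than an analysis reported in this study; the bound fractions are hypothetical, and a proper binding constant would require fitting a binding model.

```python
def half_saturation(concentrations, bound_fractions):
    """Interpolate the concentration at which the bound fraction reaches 0.5.

    `concentrations` must be in ascending order and `bound_fractions`
    (values between 0 and 1) are the matching measurements.
    Returns None if the titration never crosses 0.5.
    """
    points = list(zip(concentrations, bound_fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            return c0 + (0.5 - f0) * (c1 - c0) / (f1 - f0)
    return None


# Hypothetical titration: protein in μM and the fraction of DNA shifted.
protein_um = [0.0, 0.25, 0.5, 1.0, 2.0]
fraction_bound = [0.0, 0.10, 0.30, 0.70, 0.95]
print(half_saturation(protein_um, fraction_bound))
```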
Results
Function prediction and high conservation of the two host transcription regulators
The transcription machinery of Archaea has drawn considerable interest because of its
bacterial-like regulators and eukaryote-like basal factors (Bell et al., 2001). Although
archaeal genomes have a unique structure, half of the transcription factors (TFs)
identified in them share at least one homolog with bacterial genomes (Perez-Rueda and
Janga, 2010). Therefore, structural or sequence similarities to well-studied proteins can
provide efficient and direct information for the study of unknown archaeal proteins.
A BlastP search of sso2474 revealed a putative HTH (helix-turn-helix) motif located between
amino acid residues 20 and 70, and a conserved domain matching MarR-2 (multiple
antibiotic resistance) family proteins was detected at the same location. Sequence
similarities in the database suggest that sso2474 belongs to the TrmB-like
(transcriptional regulator of the maltose system) family: it shares 36% identity and 65%
similarity with the transcription regulator TrmB of Sulfolobus archaeon AZ1, and 36%
identity and 61% similarity with the transcription regulator TrmB of Acidianus hospitalis.
Most TrmB proteins bind DNA using an HTH motif as DBD (DNA-binding domain). The DBD is
located in the N-terminal region, and a mutational analysis revealed that it is essential
for binding (Maruyama et al., 2011). However, tertiary structure prediction of sso2474
using the threading program Phyre2 (Kelley and Sternberg, 2009) matched a MarR-like family
transcription regulator with high confidence (99.6%) and 93% alignment coverage. The
crystal structures of many MarR family proteins from bacterial and archaeal species have
been solved, and they reveal a common architecture with a characteristic winged helix
domain for DNA binding. Although sequence identities between these homologs are less than
20%, they all possess the same core fold (Nichols et al., 2009). The protein sso2474 is
conserved in Sulfolobus, Acidianus and Metallosphaera species of the Sulfolobales as well
as in euryarchaeotal Halobacteria species (Fig. S1).
The down-regulated gene, sso10340, encodes a 10 kDa protein with high identity to a
truncated variant of the Lrp/AsnC family. Sequence alignment between sso10340 and the
E. coli Lrp (leucine-responsive regulatory protein), one of the best characterized Lrp
family proteins, revealed that the sso10340 amino acid sequence matches well with the
C-terminal amino acid effector domain of the Lrp protein (Fig. S3). Structure prediction
of sso10340 by Phyre2 also matched the STS042 protein from Sulfolobus tokodaii strain 7
with high confidence (99.9%). STS042 was identified as a stand-alone RAM (regulation of
amino acid metabolism) module protein, which is homologous to the C-terminal domain of
Lrp/AsnC-family proteins (Miyazono et al., 2008). Database searches indicated that
Lrp/AsnC-family proteins are distributed among many bacteria and most archaea.
Sso10340 has homologues in crenarchaeotal Sulfolobales species as well as in bacterial
species (Fig. S2).
DNA extraction from protein-DNA complex and high-throughput sequencing
A method similar to ChIP-seq was used to gain further insight into the regulons of the two
proteins. sso2474 and sso10340 were each cloned into the E. coli-Sulfolobus shuttle vector
pEXA2 under the control of the arabinose promoter, with a C-terminal His-tag
(Gudbergsdottir et al., 2011). To detect their binding sites in both the host and the virus
genome, expression of the target protein was induced for 15 h and the cells were then
infected with SIRV2 at an m.o.i. of 10. The cells were collected 2.5 h post infection.
The protein-DNA complexes were purified by Ni-NTA agarose chromatography. Proteins
were detected by SDS-PAGE, and the DNA bound by these proteins was run on an agarose
gel. As shown in Fig. 1A, sso2474 was purified to homogeneity, with a single band detected
in the SDS gel. The DNA co-purified with sso2474 gave a yield hundreds of times higher
than the control DNA, which was purified with Ni-NTA agarose beads from cells
transformed with an empty plasmid. This suggests that sso2474 has a very high affinity for
DNA. SDS-PAGE and western blot analysis showed that the Lrp-like protein, sso10340,
exhibits a range of oligomeric states including dimers, octamers and decamers even after
SDS treatment (Fig. 1B), resembling the Lrp/AsnC family proteins, which form a range of
multimeric species in solution (Brinkman et al., 2003; Leonard et al., 2001). Significantly
more DNA was also recovered from the sso10340-DNA complex than from the control.
The purified DNA-protein complex was first treated with RNase A to remove
contaminating RNA, and deproteination was carried out by incubation with proteinase K.
The target DNA fragments of each sample, with an average size of 300-500 bp, were finally
recovered by phenol/chloroform extraction and ethanol precipitation. The prepared
samples were then sequenced using Ion Torrent next-generation sequencing. Of the 363-393
thousand sequenced reads per sample, 347-362 thousand were uniquely mapped to either the
Sulfolobus solfataricus host genome or the SIRV2 genome (Table 1). Interestingly, 91.7%
of the sequenced reads from sso2474 aligned with the virus genome, whereas 92.04% of the
reads from sso10340 belonged to the host genome, suggesting that sso2474 has a high
affinity for the virus genome and that sso10340 specifically regulates host genes.
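The percentages quoted above can be re-derived from the read counts in Table 1. The short Python sketch below is an illustration added for clarity and is not part of the original analysis pipeline. It assumes that each alignment rate is the number of reads mapped to a genome divided by the number of sequenced reads, which is the denominator that reproduces the rates listed in Table 1. The names TABLE1 and alignment_rates are hypothetical.

```python
# Read counts copied from Table 1: sequenced reads, reads mapped to the
# S. solfataricus P2 host genome, and reads mapped to the SIRV2 genome.
TABLE1 = {
    "Input control": {"sequenced": 393537, "host": 359559, "virus": 2626},
    "sso2474": {"sequenced": 363814, "host": 13677, "virus": 333670},
    "sso10340": {"sequenced": 388533, "host": 357598, "virus": 1547},
}


def alignment_rates(counts):
    """Return (host %, virus %) relative to sequenced reads (assumed denominator)."""
    total = counts["sequenced"]
    return 100.0 * counts["host"] / total, 100.0 * counts["virus"] / total


for sample, counts in TABLE1.items():
    host_pct, virus_pct = alignment_rates(counts)
    mapped = counts["host"] + counts["virus"]
    print(f"{sample}: mapped {mapped}, host {host_pct:.2f}%, virus {virus_pct:.2f}%")
```

The sum of host and virus reads in each row equals the "Mapped reads" column of Table 1, so the two genome-specific columns account for all mapped reads.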
It is surprising that almost all the DNA co-purified with sso2474 aligned to the virus
genome. To test whether this was due to a large amount of viral genome present in the cell,
we determined the copy number ratio between the virus and host genomes by real-time PCR
(qPCR). The infected cells (the same batch used for sequencing) were collected and washed
three times to remove virus from the cell surface, and total DNA was extracted from the
infected Sulfolobus cells. One primer set targeting Sulfolobus solfataricus TFB-II was
designed to determine the host genome copy number, and primers amplifying the SIRV2
coat protein gene were designed to determine the virus copy number in the same DNA sample.
The data in Table 2 show that on average 0.6 viral genome copies were present per host
chromosome at 2.5 h post infection, excluding the possibility that the high coverage of
viral sequence reads was due to a high copy number of the virus in the infected cells.
Thus, we conclude that sso2474 preferentially binds the viral DNA.
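The value of 0.6 follows directly from the qPCR starting quantities (SQ) in Table 2. The Python sketch below is only an illustration of that arithmetic. It assumes that the TFB-II and coat protein amplicons are each present once per genome, and the error estimate uses first-order propagation for independent errors, which is an assumption of this sketch and not something reported in the thesis. Function names are hypothetical.

```python
import math

# Starting quantities (SQ) from Table 2, measured 2.5 h post infection.
HOST_SQ_MEAN, HOST_SQ_SD = 5.287e7, 1.80e6
VIRUS_SQ_MEAN, VIRUS_SQ_SD = 3.158e7, 8.22e6


def copies_per_host_chromosome(virus_sq, host_sq):
    """Viral genome copies per host chromosome (single-copy amplicons assumed)."""
    return virus_sq / host_sq


def ratio_sd(virus_sq, virus_sd, host_sq, host_sd):
    """First-order error propagation for a ratio, assuming independent errors."""
    ratio = virus_sq / host_sq
    return ratio * math.sqrt((virus_sd / virus_sq) ** 2 + (host_sd / host_sq) ** 2)


ratio = copies_per_host_chromosome(VIRUS_SQ_MEAN, HOST_SQ_MEAN)
sd = ratio_sd(VIRUS_SQ_MEAN, VIRUS_SQ_SD, HOST_SQ_MEAN, HOST_SQ_SD)
print(f"{ratio:.2f} +/- {sd:.2f} viral genome copies per host chromosome")
```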
Detection of the enriched DNA fragment and genome-wide binding profile of the two
proteins
To identify the DNA-enriched regions, we used Bowtie, an ultrafast, memory-efficient
short-read aligner, to align the sets of sequenced short DNA reads to the Sulfolobus
solfataricus P2 and SIRV2 genomes (Satoh and Tabunoki, 2013). The genomic locations of
the peaks were identified and visualized with Artemis, an integrated platform for the
analysis of high-throughput sequence-based experimental data (Carver et al., 2012).
Protein Sso2474 binds to virus DNA with low specificity
Although only 3.76% of the reads from the sso2474 sample aligned to the host genome, one
specific binding peak was still visible in the map (peak 10 in Fig. 2A). It is located
upstream of and inside sso2474, indicating that the gene is regulated by its own product.
In contrast, an average of 5400 reads aligned to the SIRV2 genome, more than 500-fold
higher than for the host genome. The reads formed a mountain-shaped profile with wide
peaks covering the whole virus genome (Fig. 2B). Even so, some potential specific binding
sites were marked with numbers, the binding regions were amplified by PCR, and gel
mobility shift assays were carried out for validation.
Protein sso2474 was first expressed in and purified from Sulfolobus solfataricus P2
(Fig. 1A). However, DNA specifically bound by the protein could not be removed by either
DNase I or PEI (polyethyleneimine) and produced an extremely high background. We therefore
expressed sso2474 in E. coli from the pET-30a vector with a C-terminal His-tag and
purified it by Ni2+-affinity chromatography. The protein was expressed in soluble form in
high amounts, and SDS-PAGE analysis of the purified protein revealed a single major band
with a molecular weight of approximately 15 kDa (Fig. 3A).
First, the 11 enriched fragments and a negative control fragment were amplified by PCR
from the SIRV2 genome, and electrophoretic mobility shift assays (EMSA) of the
recombinant sso2474 from E. coli with the target fragments were carried out. The results
showed that the protein bound all the DNA fragments with no specificity (data not shown).
Since no specific binding region of sso2474 could be detected, another possibility is that
the protein prefers to bind ssDNA rather than dsDNA; some single-stranded DNA-binding
proteins bind DNA in a non-sequence-specific way (Dickey et al., 2013). To test whether
sso2474 is a single-strand binding protein, an EMSA experiment with an equimolar mixture
of ssDNA and dsDNA substrates was performed.
The concentration of the protein was increased from 0.1 µM to 2.0 µM. The samples were
loaded onto a 12% acrylamide gel and run in 0.5 × TBE buffer. The assay showed that
sso2474 binds dsDNA in preference to ssDNA. As shown in Fig. 3B, lane 3, the band
representing dsDNA began to shift while the amount of free ssDNA remained the same. Once
the protein had bound all the dsDNA in the sample and excess protein remained, it began to
bind the ssDNA. When the protein concentration was increased to 1.6 µM (Fig. 3B, lane 7),
almost all the substrates formed complexes and no free DNA was left. The result clearly
demonstrates that sso2474 has a higher affinity for dsDNA than for ssDNA.
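To put the titration in perspective, the protein-to-DNA molar ratio in each lane can be worked out from the concentrations given in the Figure 3 legend (25 nM dsDNA mixed with 25 nM ssDNA) and the lane labels of Fig. 3B. The Python sketch below is a hedged illustration of that arithmetic only; the lane assignment is taken from the figure labels, and the function name molar_excess is hypothetical.

```python
# Protein concentrations (µM) for lanes 1-8, as labelled in Fig. 3B.
PROTEIN_UM = [0.0, 0.1, 0.2, 0.4, 0.8, 1.2, 1.6, 2.0]
# Substrate concentrations (nM) from the Figure 3 legend.
DSDNA_NM = 25.0
SSDNA_NM = 25.0


def molar_excess(protein_um, dna_nm):
    """Protein-to-DNA molar ratio; 1 µM equals 1000 nM."""
    return protein_um * 1000.0 / dna_nm


for lane, conc in enumerate(PROTEIN_UM, start=1):
    over_total = molar_excess(conc, DSDNA_NM + SSDNA_NM)
    over_ds = molar_excess(conc, DSDNA_NM)
    print(f"lane {lane}: {conc:.1f} µM protein, "
          f"{over_total:.0f}:1 over total DNA, {over_ds:.0f}:1 over dsDNA")
```

Under these assumptions the dsDNA band starts to shift at a 4-fold molar excess of protein over total DNA (lane 3), and free DNA disappears at a 32-fold excess (lane 7).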
Protein sso10340 bound the host genome at several regions
Compared with sso2474, the genome-wide binding profile of sso10340 on the host genome
was well defined, showing about a dozen binding sites (Fig. 4A). However, the reads
corresponding to viral sequences aligned randomly across the virus genome, similar to the
control sample (Fig. 4B). Only regions exhibiting more than 2-fold enrichment in ChIP
DNA versus input DNA were considered to be significantly bound by sso10340. A total of
27 genomic regions, scattered across the genome, were identified. The functions of the
genes that these binding peaks overlap or lie closest to are summarized in Table 3; most
of them participate in amino acid metabolism, energy metabolism, biosynthesis and
transport.
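Table 3 reports enrichment as log2(ChIP DNA/input DNA), so the 2-fold cutoff corresponds to a log2 value of 1. The Python sketch below illustrates the conversion on a few values copied from Table 3. It is an illustrative check under that assumption, not the peak-calling procedure used in the study, and the function names are hypothetical.

```python
import math

FOLD_CUTOFF = 2.0
LOG2_CUTOFF = math.log2(FOLD_CUTOFF)  # a 2-fold cutoff on the log2 scale

# A few log2(ChIP DNA / input DNA) values copied from Table 3.
LOG2_ENRICHMENT = {"doxC": 1.82, "sso1214": 3.41, "pacS": 3.39, "mrp": 1.78}


def fold_enrichment(log2_ratio):
    """Convert a log2 ratio back to a linear fold enrichment."""
    return 2.0 ** log2_ratio


def is_bound(log2_ratio, cutoff=LOG2_CUTOFF):
    """Apply the 'more than 2-fold' criterion on the log2 scale."""
    return log2_ratio > cutoff


for gene, value in LOG2_ENRICHMENT.items():
    print(f"{gene}: log2 = {value:.2f}, fold = {fold_enrichment(value):.1f}, "
          f"bound = {is_bound(value)}")
```

Even the lowest value in Table 3 (1.78 for mrp) corresponds to more than 3-fold enrichment, so every listed region clears the stated cutoff.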
Additionally, the fragments were grouped into four categories according to their location
with respect to open reading frames (Fig. 4C). We observed that 41% of the 27 regions
(categories "upstream" and "intragenic but upstream") localize to the upstream regulatory
region of the corresponding gene, and 46% fall within the coding region. The remaining
peaks (13%) were found in intergenic regions located downstream of both neighboring genes.
Because nearly half of the binding regions fell into the upstream area of the
corresponding gene, the binding profiles of 14 genomic regions near promoter areas were
examined in detail (Fig. S4). The bound genomic fragments were amplified by PCR with an
average size of 150 bp. The protein sso10340 was purified from Sulfolobus, and an EMSA
screen of these regions was performed to test whether the target regions also interact
with the purified protein in vitro. However, no binding by sso10340 was observed in vitro.
Motif analysis for sso10340 binding site
The sso10340 binding motif was defined from oligonucleotide sequences enriched within the
bound regions. The sequences of these DNA fragments were submitted to the motif-based
sequence analysis tool MEME-ChIP (Machanick and Bailey, 2011; http://meme.nbcr.net) to
detect conserved DNA motifs. The top-ranked motif is shown in Fig. 5B.
This 11 bp imperfect palindromic sequence was present in 96% of all binding regions
and is similar to the known motif of PPARG (peroxisome proliferator-activated
receptor gamma) (MA0066.1) (Fig. 5A). PPARgamma binds as a heterodimer with members of
the retinoid X receptor (RXR) family to PPREs (PPAR response elements), which consist of
a direct repeat of two half-sites of 5´-AGGTCA-3´ separated by one nucleotide
(Fig. 5A).
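The PPRE architecture described above, two AGGTCA half-sites in direct orientation separated by one nucleotide, can be written down as a simple sequence pattern. The Python sketch below searches for exact matches only and is meant as an illustration of the motif layout. It is not the MEME-ChIP or JASPAR comparison used in this study, which scores degenerate motifs with position weight matrices. All function names and the example sequence are hypothetical.

```python
import re

HALF_SITE = "AGGTCA"
# Two half-sites in direct orientation separated by one arbitrary nucleotide.
# The lookahead allows overlapping matches to be reported as well.
DR1 = re.compile(rf"(?=({HALF_SITE}[ACGT]{HALF_SITE}))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_dr1(seq):
    """0-based start positions of exact direct repeats on the given strand."""
    return [match.start() for match in DR1.finditer(seq.upper())]


def has_dr1_either_strand(seq):
    """True if an exact direct repeat occurs on the given or the opposite strand."""
    return bool(find_dr1(seq)) or bool(find_dr1(reverse_complement(seq)))


example = "TTAGGTCAAAGGTCATT"  # made-up sequence containing one direct repeat
print(find_dr1(example), has_dr1_either_strand(reverse_complement(example)))
```

Note that a direct repeat is not identical to its own reverse complement, which is why both strands are checked; a palindromic motif, such as the one reported for sso10340, would by contrast read the same on both strands.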
Discussion
Archaeal transcription combines a eukaryote-like basal transcription machinery with
bacterial-like regulators, which distinguishes archaea from the other two domains
(Koonin and Galperin, 1997; Grabowski and Kelman, 2003). Binding of TBP (TATA-box
binding protein) and TFB (transcription factor B) to the TATA box and the BRE (TFB
response element) in the promoter region is critical for transcription initiation of
archaeal genes (Bell et al., 1999). How bacterial-type transcriptional regulators control
the eukaryote-like transcription machinery in archaea, especially during virus infection,
still needs to be elucidated.
The protein sso2474 shows amino acid sequence similarity to TrmB family proteins,
which are found in all three domains of life and include all three possible kinds of TF:
repressors, activators, or both. In archaea, most TrmB family proteins occur in the
Euryarchaeota, and only some exist in the Crenarchaeota (Maruyama et al., 2011). Whether
in the best-studied TrmB proteins of the euryarchaeon P. furiosus (Thermococcales) or in
the TrmB family protein MalR of the crenarchaeon S. acidocaldarius, most documented TrmBs
appear to control diverse sugar transporters or genes of sugar metabolism, such as maltose
and glucose processing, as well as genes involved in other metabolic pathways (Kanai et
al., 2007; Lee et al., 2008; Reichlen et al., 2012; Wagner et al., 2014). In this study,
however, the binding map of sso2474 gave no indication that the protein binds genes
related to sugar metabolism (other than sso2474 itself) in order to activate or repress
them.
On the other hand, both the conserved domain and the structure prediction indicate that
sso2474 belongs to the MarR (multiple antibiotic resistance) family of transcription
regulators. MarR family proteins constitute a diverse group of transcription regulators
that modulate the expression of genes encoding proteins involved in metabolic pathways,
stress responses, virulence, and the degradation or export of harmful chemicals such as
antibiotics, organic solvents (White et al., 1997), oxidative stress agents (Ariza et al.,
1994) and household disinfectants (McMurry et al., 1998). The mar locus thus appears to be
involved in the mechanism that strains use to resist the lethal effects of a wide range of
toxic agents. E. coli MarR was the first MarR family regulator to be described, and its
homologs are widely distributed in both bacteria and archaea. MarR, a component of the
marRAB locus in E. coli, is a repressor of its own operon, whereas MarA is a transcription
activator that activates the operon and regulates the expression of proteins important for
multiple antibiotic resistance (Alekshun and Levy, 1997). In many strains, constitutive
expression of MarA contributes to the maintenance of resistance to antibiotics and other
environmental hazards, and deletion or inactivation of marR results in increased
expression of MarA (Alekshun and Levy, 1999; Barbosa and Levy, 2000). To date, no
study has shown that TrmB family or MarR family proteins bind specifically to a virus
genome upon virus infection. If sso2474 acts like a MarR family protein, the strain
might activate the expression of the corresponding proteins to resist virus infection, and
sso2474 might bind to the virus genome to hinder transcription. Based on this hypothesis,
growth retardation upon SIRV2 infection was compared between the sso2474-overexpressing
strain and the wild-type strain, and no difference was observed in the growth curves (data
not shown). This experiment indicates that the protein is probably not involved in
inhibiting virus growth. Alternatively, there may be a difference between viral and host
DNA, e.g. a modification, that allows sso2474 to bind specifically to viral DNA. The
binding mechanism of this protein and its possible interaction partners remain to be
identified. It will be intriguing to examine the phenotype of an sso2474 mutant strain
upon SIRV2 infection in comparison with the wild-type strain.
The down-regulated transcription regulator sso10340 shows similarity to the C-terminal
domain of Lrp/AsnC family proteins (leucine-responsive regulatory protein). Most of the
experimentally characterized archaeal transcriptional regulators belong to this family,
whose members regulate genes both globally and specifically. Family members are
found in both bacteria and archaea but not in eukarya (Brinkman et al., 2003). Lrp
family proteins typically have a monomer molecular weight of 15 kDa, with an N-terminal
wHTH domain and a C-terminal RAM (regulation of amino acid metabolism) domain. The
RAM domain has an αβ-sandwich fold and is possibly involved in effector recognition and
oligomerization of the protein subunits (Thaw et al., 2006). Proteins that possess only
the RAM domain are frequently observed in the genomes of many organisms. They
are defined as a novel ligand-binding domain or stand-alone RAM-domain (SARD) proteins
involved in the regulation of amino acid metabolism (Ettema et al., 2002). Although many
of them have been crystallized and their structures determined, the functions of these
proteins remain unclear (Miyazono et al., 2008; Nakano et al., 2006). The
failure to detect any DNA binding by sso10340 is possibly due to unsuitable binding
conditions or to a lack of DNA-binding activity. It is possible that sso10340 recognizes
an effector and interacts with a DNA-binding protein or a transcription regulator to
achieve its regulatory role in vivo. Indeed, sequences close to or within the coding
regions of a DNA-binding protein (sso2626) and a transcription regulator (sso2827) were
detected in this study (Table 3). Whether sso10340 interacts with these two proteins
remains to be confirmed.
References
Alekshun,M.N., and Levy,S.B. (1997) Regulation of chromosomally mediated multiple antibiotic resistance: the mar regulon. Antimicrob Agents Chemother 41: 2067-2075.
Alekshun,M.N., and Levy,S.B. (1999) Alteration of the repressor activity of MarR, the negative regulator of the Escherichia coli marRAB locus, by multiple chemicals in vitro. J Bacteriol 181: 4669-4672.
Ariza,R.R., Cohen,S.P., Bachhawat,N., Levy,S.B., and Demple,B. (1994) Repressor mutations in the marRAB operon that activate oxidative stress genes and multiple antibiotic resistance in Escherichia coli. J Bacteriol 176: 143-148.
Bailey,T.L., and Elkan,C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28-36.
Barbosa,T.M., and Levy,S.B. (2000) Differential expression of over 60 chromosomal genes in Escherichia coli by constitutive expression of MarA. J Bacteriol 182: 3467-3474.
Bell,S.D., Kosa,P.L., Sigler,P.B., and Jackson,S.P. (1999) Orientation of the transcription preinitiation complex in archaea. Proc Natl Acad Sci U S A 96: 13662-13667.
Bell,S.D., Magill,C.P., and Jackson,S.P. (2001) Basal and regulated transcription in Archaea. Biochem Soc Trans 29: 392-395.
Bize,A., Karlsson,E.A., Ekefjard,K., Quax,T.E., Pina,M., Prevost,M.C. et al. (2009) A unique virus release mechanism in the Archaea. Proc Natl Acad Sci U S A 106: 11306-11311.
Brinkman,A.B., Ettema,T.J., de Vos,W.M., and van der Oost,J. (2003) The Lrp family of transcriptional regulators. Mol Microbiol 48: 287-294.
Carver,T., Harris,S.R., Berriman,M., Parkhill,J., and McQuillan,J.A. (2012) Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics 28: 464-469.
Deng,L., He,F., Bhoobalan-Chitty,Y., Martinez-Alvarez,L., Guo,Y., and Peng,X. (2014) Unveiling cell surface and type IV secretion proteins responsible for archaeal rudivirus entry. J Virol 88: 10264-10268.
Dickey,T.H., Altschuler,S.E., and Wuttke,D.S. (2013) Single-stranded DNA-binding proteins: multiple domains for multiple functions. Structure 21: 1074-1084.
Ettema,T.J., Brinkman,A.B., Tani,T.H., Rafferty,J.B., and Van Der Oost,J. (2002) A novel ligand-binding domain involved in regulation of amino acid metabolism in prokaryotes. J Biol Chem 277: 37464-37468.
Frols,S., Gordon,P.M., Panlilio,M.A., Schleper,C., and Sensen,C.W. (2007) Elucidating the transcription cycle of the UV-inducible hyperthermophilic archaeal virus SSV1 by DNA microarrays. Virology 365: 48-59.
Gardner,A.F., Guan,C., and Jack,W.E. (2011) Biochemical characterization of a structure-specific resolving enzyme from Sulfolobus islandicus rod-shaped virus 2. PLoS One 6: e23668.
Grabowski,B., and Kelman,Z. (2003) Archeal DNA replication: eukaryal proteins in a bacterial context. Annu Rev Microbiol 57: 487-516.
Gudbergsdottir,S., Deng,L., Chen,Z., Jensen,J.V., Jensen,L.R., She,Q., and Garrett,R.A. (2011) Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector-borne viral and plasmid genes and protospacers. Mol Microbiol 79: 35-49.
He,F., Chen,L., and Peng,X. (2014) First Experimental Evidence for the Presence of a CRISPR Toxin in Sulfolobus. J Mol Biol.
Kanai,T., Akerboom,J., Takedomi,S., van de Werken,H.J., Blombach,F., van der Oost,J. et al. (2007) A global transcriptional regulator in Thermococcus kodakaraensis controls the expression levels of both glycolytic and gluconeogenic enzyme-encoding genes. J Biol Chem 282: 33659-33670.
Kelley,L.A., and Sternberg,M.J. (2009) Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc 4: 363-371.
Kessler,A., Brinkman,A.B., van der Oost,J., and Prangishvili,D. (2004) Transcription of the rod-shaped viruses SIRV1 and SIRV2 of the hyperthermophilic archaeon sulfolobus. J Bacteriol 186: 7745-7753.
Koonin,E.V., and Galperin,M.Y. (1997) Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Curr Opin Genet Dev 7: 757-763.
Lee,S.J., Surma,M., Hausner,W., Thomm,M., and Boos,W. (2008) The role of TrmB and TrmB-like transcriptional regulators for sugar transport and metabolism in the hyperthermophilic archaeon Pyrococcus furiosus. Arch Microbiol 190: 247-256.
Leonard,P.M., Smits,S.H., Sedelnikova,S.E., Brinkman,A.B., de Vos,W.M., van der Oost,J. et al. (2001) Crystal structure of the Lrp-like transcriptional regulator from the archaeon Pyrococcus furiosus. EMBO J 20: 990-997.
Maruyama,H., Shin,M., Oda,T., Matsumi,R., Ohniwa,R.L., Itoh,T. et al. (2011) Histone and TK0471/TrmBL2 form a novel heterogeneous genome architecture in the hyperthermophilic archaeon Thermococcus kodakarensis. Mol Biol Cell 22: 386-398.
McMurry,L.M., Oethinger,M., and Levy,S.B. (1998) Overexpression of marA, soxS, or acrAB produces resistance to triclosan in laboratory and clinical strains of Escherichia coli. FEMS Microbiol Lett 166: 305-309.
Miyazono,K., Tsujimura,M., Kawarabayasi,Y., and Tanokura,M. (2008) Crystal structure of STS042, a stand-alone RAM module protein, from hyperthermophilic archaeon Sulfolobus tokodaii strain 7. Proteins 71: 1557-1562.
Nakano,N., Okazaki,N., Satoh,S., Takio,K., Kuramitsu,S., Shinkai,A., and Yokoyama,S. (2006) Structure of the stand-alone RAM-domain protein from Thermus thermophilus HB8. Acta Crystallogr Sect F Struct Biol Cryst Commun 62: 855-860.
Nichols,C.E., Sainsbury,S., Ren,J., Walter,T.S., Verma,A., Stammers,D.K. et al. (2009) The structure of NMB1585, a MarR-family regulator from Neisseria meningitidis. Acta Crystallogr Sect F Struct Biol Cryst Commun 65: 204-209.
Oke,M., Kerou,M., Liu,H., Peng,X., Garrett,R.A., Prangishvili,D. et al. (2011) A dimeric Rep protein initiates replication of a linear archaeal virus genome: implications for the Rep mechanism and viral replication. J Virol 85: 925-931.
Okutan,E., Deng,L., Mirlashari,S., Uldahl,K., Halim,M., Liu,C. et al. (2013) Novel insights into gene regulation of the rudivirus SIRV2 infecting Sulfolobus cells. RNA Biol 10: 875-885.
Ortmann,A.C., Brumfield,S.K., Walther,J., McInnerney,K., Brouns,S.J., van de Werken,H.J. et al. (2008) Transcriptome analysis of infection of the archaeon Sulfolobus solfataricus with Sulfolobus turreted icosahedral virus. J Virol 82: 4874-4883.
Perez-Rueda,E., and Janga,S.C. (2010) Identification and genomic analysis of transcription factors in archaeal genomes exemplifies their functional architecture and evolutionary origin. Mol Biol Evol 27: 1449-1459.
Pina,M., Bize,A., Forterre,P., and Prangishvili,D. (2011) The archeoviruses. FEMS Microbiol Rev 35: 1035-1054.
Quax,T.E., Lucas,S., Reimann,J., Pehau-Arnaudet,G., Prevost,M.C., Forterre,P. et al. (2011) Simple and elegant design of a virion egress structure in Archaea. Proc Natl Acad Sci U S A 108: 3354-3359.
Quemin,E.R., Lucas,S., Daum,B., Quax,T.E., Kuhlbrandt,W., Forterre,P. et al. (2013) First insights into the entry process of hyperthermophilic archaeal viruses. J Virol 87: 13379-13385.
Reichlen,M.J., Vepachedu,V.R., Murakami,K.S., and Ferry,J.G. (2012) MreA functions in the global regulation of methanogenic pathways in Methanosarcina acetivorans. MBio 3: e00189-12.
Satoh,J., and Tabunoki,H. (2013) Molecular network of chromatin immunoprecipitation followed by deep sequencing-based vitamin D receptor target genes. Mult Scler 19: 1035-1045.
Thaw,P., Sedelnikova,S.E., Muranova,T., Wiese,S., Ayora,S., Alonso,J.C. et al. (2006) Structural insight into gene transcriptional regulation and effector binding by the Lrp/AsnC family. Nucleic Acids Res 34: 1439-1449.
Wagner,M., Wagner,A., Ma,X., Kort,J.C., Ghosh,A., Rauch,B. et al. (2014) Investigation of the malE promoter and MalR, a positive regulator of the maltose regulon, for an improved expression system in Sulfolobus acidocaldarius. Appl Environ Microbiol 80: 1072-1081.
White,D.G., Goldman,J.D., Demple,B., and Levy,S.B. (1997) Role of the acrAB locus in organic solvent tolerance mediated by expression of marA, soxS, or robA in Escherichia coli. J Bacteriol 179: 6122-6126.
Zillig,W., Prangishvilli,D., Schleper,C., Elferink,M., Holz,I., Albers,S. et al. (1996) Viruses, plasmids and other genetic elements of thermophilic and hyperthermophilic Archaea. FEMS Microbiol Rev 18: 225-236.
Table and Figure Legends
Table 1. Sequencing and mapping data with Sulfolobus solfataricus P2 and SIRV2 genomes
Table 2. The average virus copy number in each host cell at 2.5 h post infection with SIRV2
Table 3. The products and COG functional categories of the target genes bound by sso10340
Figure 1. Purification of proteins sso2474 and sso10340 as well as the negative control from
Sulfolobus solfataricus P2. (A) Purified protein elution samples of sso2474 and the negative
control were subjected to 12.5% SDS-PAGE to assess the purity of the protein (a), and to 0.7%
agarose gel electrophoresis to assess the amount of DNA bound by the protein (b). (B) Purified
protein elution samples of sso10340 and the negative control were subjected to SDS-PAGE to assess
the purity of the protein (a), and to agarose gel electrophoresis to assess the amount of DNA (b).
Western blot analysis of the purified sso10340 (c).
Figure 2. Genome-wide distribution of sso2474 binding regions on the Sulfolobus solfataricus P2
genome (A) and the SIRV2 virion genome (B). These experiments were performed using DNA
extracted from purified sso2474 and the negative control. Data were analyzed and visualized as
described in the Materials and Methods section. The genome coordinates (in bp) are given on the
x-axis, and the y-axis represents the sequenced reads aligned to the genome. The sharp peak marked
with an arrow in (A) is located in the region of sso2474.
Figure 3. (A) The purified sso2474 fractions (L1-L6) from E. coli were analyzed on an SDS-PAGE
gel. The single major band around 15 kDa represents the protein sso2474. (B) EMSA
with a dsDNA and ssDNA mixture as substrate. The concentration of the protein was increased
from 0.1 µM to 2.0 µM. The length of the oligos was 35 bp. The dsDNA and ssDNA were mixed in
an equal molar ratio (25 nM: 25 nM).
Figure 4. Genome-wide distribution of sso10340 binding regions on the Sulfolobus solfataricus P2
genome (A) and the SIRV2 virion genome (B), and classification of sso10340 binding regions
with respect to genomic organization (C). (A) and (B) These experiments were performed using
DNA extracted from purified sso10340 (red bars) and from the control (empty plasmid, green bars;
input ChIP). The genome coordinates (in bp) are given on the x-axis, and the y-axis represents the
sequenced reads aligned to the genome. Target genes detected in vivo are indicated. (C) ChIP-enriched
regions are indicated by black horizontal bars, whereas ORFs are depicted by horizontal arrows.
Binding regions ranging from -500 bp to +100 bp relative to the translation start site were
considered to be upstream of a transcription unit or intragenic but upstream. ChIP-enriched regions
located exclusively in gene coding regions or partially downstream of the gene were classified as
intragenic. Binding regions located downstream of both neighboring genes were classified as
intergenic.
Figure 5. Weblogo of the motif detected with MEME-ChIP. The motif logo discovered from the
submitted binding-region sequences (B) is shown aligned with the most similar JASPAR motif
logo (A); the similarity is significant (E≤0.05).
Table 1. Sequencing and mapping with Sulfolobus solfataricus P2 and SIRV2
Sample | Sequenced reads | Mapped reads | Mapped reads with P2 genome | Alignment rate with P2 genome | Mapped reads with SIRV2 genome | Alignment rate with SIRV2 genome
Input control | 393537 | 362185 | 359559 | 91.37% | 2626 | 0.67%
sso2474 | 363814 | 347347 | 13677 | 3.76% | 333670 | 91.70%
sso10340 | 388533 | 359145 | 357598 | 92.04% | 1547 | 0.4%
Table 2. The average virus copy number in each host cell at 2.5 h post infection with SIRV2
Time (h) | Host SQ Mean | Host Std.Dev | Virus SQ Mean | Virus Std.Dev | Virus copy number/host chromosome
2.5 | 5.287E+07 | 1.80E+06 | 3.158E+07 | 8.22E+06 | 0.6
Table 3. The products and COG functional categories of the target genes bound by sso10340
Target name | Genomic coordinate peak start | Genomic coordinate peak stop | Motif binding location | Log2 ChIP DNA/input DNA | Gene function | COG functional category
doxC | 40520 | 41150 | Inside the gene | 1.82 | Terminal oxidase | Energy production and conversion
sso1214 | 1053350 | 1054300 | Inside the gene | 3.41 | Carbonic anhydrases | Energy production and conversion
sso1580 | 1427200 | 1428050 | 80 bp inside the gene from start codon | 2.21 | Anaerobic dehydrogenases; molybdopterin oxidoreductase | Energy production and conversion
sso2826 | 2587400 | 2588210 | Inside the gene | 2.34 | Molybdopterin-binding protein | Energy production and conversion
pacS | 2411320 | 2412430 | 200 bp upstream | 3.39 | Cation transporting ATPase | Inorganic ion transport and metabolism
sso3189 | 2935150 | 2935710 | 75 bp inside the gene from start codon | 1.93 | Amino acid permease | Amino acid transport and metabolism
adh-11 | 2469950 | 2470580 | 70 bp upstream | 1.96 | Alcohol dehydrogenase | Amino acid transport and metabolism
sso2153 | 1977900 | 1978650 | Inside the gene | 2.66 | Archaeal putative transposase ISC1217; pfam04693 | Transposase
sso12210 | 2965830 | 2966580 | 175 bp upstream | 2.07 | Transposase | Transposase
sso10340 | 2189085 | 2189390 | Inside the gene | 2.31 | Conserved, truncated variant of Lrp/AsnC-family; RNA polymerase and transcription factors | Transcription
sso2626 | 2391100 | 2392050 | Start codon | 2.07 | DNA-binding protein | Transcription
sso2827 | 2588200 | 2589300 | Inside the gene | 3.06 | Transcription regulator | Transcription
rps26E | 502000 | 502750 | Inside the gene | 2.49 | 30S ribosomal protein S26e | Translation
rpl44e | 907520 | 908380 | 24 bp inside the gene from start codon | 1.88 | 50S ribosomal protein L44e | Translation
mrp | 395515 | 396010 | Inside the gene | 1.78 | ATPases involved in chromosome partitioning | Cell division and chromosome partitioning
sso2730 | 2485375 | 2485920 | Inside the gene | 1.87 | Possibly ATPase of the AAA superfamily | Cell division and chromosome partitioning
metS | 491920 | 492385 | Inside the gene | 2.48 | Methionine-tRNA ligase | RNA processing and modification
sso1044 | 903240 | 904070 | Inside the gene | 2.96 | RNA methyltransferase, DNA methyltransferase, tRNA | RNA processing and modification
sso1288 | 1114815 | 1115458 | Inside the gene | 2.16 | Short last area hits to a zoo of molecules associated with cell wall/cell membrane | Cell envelope biogenesis, outer membrane
sso3150 | 2902540 | 2903225 | Inside the gene | 3.19 | Putative membrane protein | Cell envelope biogenesis, outer membrane
sso3178 | 2923630 | 2924485 | 280 bp upstream | 2.10 | Cell wall | Cell envelope biogenesis, outer membrane
sso2749 | 2502810 | 2503615 | Inside the gene | 2.37 | Ferritin; oxidoreductase, oxidative-stress response | Defense mechanisms
sso2037 | 1850715 | 1851580 | Inside the gene | 3.06 | Multicopy, similar to sso1886; proteases | Defense mechanisms
sso1758 | 1592730 | 1593280 | 202 bp upstream | 2.20 | Multicopy, similar to carboxy-end of sso1752, 1469 and 1606 | Unclassified
sso0927 | 787505 | 788400 | Inside the gene | 2.09 | Fe-S cluster assembly protein SufB | Unclassified
sso0438 | 380250 | 381045 | 43 bp upstream | 2.45 | - | Function unknown
sso1984 | 1796200 | 1796975 | 22 bp inside the gene from the start codon | 2.98 | - | Function unknown
Figure 1 [gel image not reproduced; panels A and B. Lane labels: M, PEXA3, SSO10340-1, SSO10340-2. Size markers: 10 kDa, 15 kDa, 27 kDa, 55 kDa, 70 kDa. Band annotations: sso10340 dimer, octamer]
Figure 2 [image not reproduced; panels A and B]
Figure 3 [gel images not reproduced; panels A and B. Labels: dsDNA, ssDNA; lanes 1-8 and M, 1-6; Sso2474 concentrations (µM): 0, 0.1, 0.2, 0.4, 0.8, 1.2, 1.6, 2.0. Size markers: 10 kDa, 15 kDa. Band annotation: sso2474]
Figure 4 [image not reproduced; panel A]
[Panels B and C not reproduced. Pie chart titled "Distribution of the genomic locations of 10340 binding sites"; legend: Intragenic, Intragenic but upstream, Upstream, Intergenic; slice labels: 46%, 26%, 15%, 13%]
Figure 5
SyntTax Report http://archaea.u-psud.fr/SyntTax
Genomic contexts [synteny diagrams not reproduced; genomes are listed in report order with their scores]
Sulfolobus solfataricus 98 2 uid167998, Score: 79.92
Sulfolobus islandicus HVE10 4 uid162067, Score: 70.47
Sulfolobus islandicus L S 2 15 uid58871, Score: 70.08
Sulfolobus islandicus L D 8 5 uid43679 C1, Score: 70.08
Sulfolobus solfataricus P2 uid57721, Score: 70.08
Sulfolobus islandicus M 14 25 uid58849, Score: 70.08
Sulfolobus islandicus Y N 15 51 uid58825 C1, Score: 70.08
Sulfolobus islandicus M 16 4 uid58841, Score: 70.08
Sulfolobus islandicus M 16 27 uid58851, Score: 70.08
Sulfolobus islandicus Y G 57 14 uid58923, Score: 70.08
Sulfolobus islandicus REY15A uid162071, Score: 69.69
Sulfolobus islandicus LAL14 1 uid197216, Score: 69.69
Acidianus hospitalis W1 uid66875, Score: 27.01
Sulfolobus tokodaii 7 uid57807, Score: 25.79
Metallosphaera cuprina Ar 4 uid66329, Score: 25.16
Metallosphaera sedula DSM 5348 uid58717, Score: 23.03
Sulfolobus acidocaldarius SUSAZ uid232254, Score: 20.31
Sulfolobus acidocaldarius N8 uid189027, Score: 20.16
Sulfolobus acidocaldarius Ron12 I uid189028, Score: 20.16
Sulfolobus acidocaldarius DSM 639 uid58379, Score: 20.16
Fervidicoccus fontis Kam940 uid162201, Score: 17.13
Pyrolobus fumarii 1A uid73415, Score: 15.31
Methanocaldococcus fervens AG86 uid59347 C1, Score: 11.54
Pyrobaculum oguniense TE7 uid84411 C1, Score: 11.06
Pyrobaculum arsenaticum DSM 13514 uid58409, Score: 11.06
Methanobacterium SWAN 1 uid67359, Score: 10.75
Halopiger xanaduensis SH 6 uid68105 C1, Score: 10.75
Methanosarcina mazei Go1 uid57893, Score: 10.47
Methanosarcina mazei Tuc01 uid190185, Score: 10.47
Methanosarcina barkeri Fusaro uid57715 C1, Score: 10.47
Natrinema pellirubrum DSM 15624 uid74437 C1, Score: 10.47
Pyrococcus horikoshii OT3 uid57753, Score: 10.31
Methanosarcina acetivorans C2A uid57879, Score: 10.31
Pyrobaculum calidifontis JCM 11548 uid58787, Score: 10.00
Thermofilum pendens Hrk 5 uid58563 C1, Score: 10.00
Query protein sequence: mqgkseismp dgrvadvfnv vkflyglsdr dieilkllik sqssltmeei sselnitksv vnksilnlek knivikekve sskkgrrayt yrvdvnyltr klvtdldqli kdlkvkiadv igiqiektas v
Genomes without synteny:
Acidilobus_saccharovorans_345_15_uid51395, Caldisphaera_lagunensis_DSM_15908_uid183486, Aeropyrum_camini_SY1___JCM_12091_uid222311, Aeropyrum_pernix_K1_uid57757,
Desulfurococcus_fermentans_DSM_16532_uid75119, Desulfurococcus_kamchatkensis_1221n_uid59133, Desulfurococcus_mucosus_DSM_2162_uid62227, Ignicoccus_hospitalis_KIN4_I_uid58365,
Ignisphaera_aggregans_DSM_17230_uid51875, Staphylothermus_hellenicus_DSM_12710_uid45893, Staphylothermus_marinus_F1_uid58719, Thermogladius_1633_uid167488,
Thermosphaera_aggregans_DSM_11486_uid48993, Hyperthermus_butylicus_DSM_5456_uid57755, Thermofilum_1910b_uid215374, Caldivirga_maquilingensis_IC_167_uid58711,
Pyrobaculum_1860_uid82379, Pyrobaculum_aerophilum_IM2_uid57727, Pyrobaculum_islandicum_DSM_4184_uid58635, Pyrobaculum_neutrophilum_V24Sta_uid58421,
Thermoproteus_tenax_Kra_1_uid74443, Thermoproteus_uzoniensis_768_20_uid65089, Vulcanisaeta_distributa_DSM_14429_uid52827, Vulcanisaeta_moutnovskia_768_28_uid63631,
Archaeoglobus_fulgidus_DSM_4304_uid57717, Archaeoglobus_profundus_DSM_5631_uid43493_C1, Archaeoglobus_sulfaticallidus_PM70_1_uid201033, Archaeoglobus_veneficus_SNP6_uid65269,
Ferroglobus_placidus_DSM_10642_uid40863, Halalkalicoccus_jeotgali_B3_uid50305_C1, Haloarcula_hispanica_ATCC_33960_uid72475_C1, Haloarcula_hispanica_N601_uid230920_C1,
Haloarcula_marismortui_ATCC_43049_uid57719_C1, Halobacterium_NRC_1_uid57769_C1, Halobacterium_salinarum_R1_uid61571_C1, Haloferax_mediterranei_ATCC_33500_uid167315_C1,
Haloferax_volcanii_DS2_uid46845_C1, Halogeometricum_borinquense_DSM_11551_uid54919_C1, Halomicrobium_mukohataei_DSM_12286_uid59107_C1, Haloquadratum_walsbyi_C23_uid162019_C1,
Haloquadratum_walsbyi_DSM_16790_uid58673_C1, Halorhabdus_tiamatea_SARL4B_uid214082_C1, Halorhabdus_utahensis_DSM_12940_uid59189, Halorubrum_lacusprofundi_ATCC_49239_uid58807_C1,
Haloterrigena_turkmenica_DSM_5511_uid43501_C1, Halovivax_ruber_XH_70_uid184819, Natrialba_magadii_ATCC_43099_uid46245_C1, Natrinema_J7_uid171337_C1,
Natronobacterium_gregoryi_SP2_uid74439, Natronococcus_occultus_SP4_uid184863_C1, Natronomonas_moolapensis_8_8_11_uid190182, Natronomonas_pharaonis_DSM_2160_uid58435_C1,
Salinarchaeum_laminariae_Harcht_Bsk1_uid207001, Methanobacterium_AL_21_uid63623, Methanobacterium_MB1_uid231690, Methanobrevibacter_AbM4_uid206516,
Methanobrevibacter_ruminantium_M1_uid45857, Methanobrevibacter_smithii_ATCC_35061_uid58827, Methanosphaera_stadtmanae_DSM_3091_uid58407, Methanothermobacter_marburgensis_Marburg_uid51637_C1,
Methanothermobacter_thermautotrophicus_Delta_H_uid57877, Methanothermus_fervidus_DSM_2088_uid60167, Methanocaldococcus_FS406_22_uid42499_C1, Methanocaldococcus_infernus_ME_uid48803,
Methanocaldococcus_jannaschii_DSM_2661_uid57713_C1, Methanocaldococcus_vulcanius_M7_uid41131_C1, Methanotorris_igneus_Kol_5_uid67321, Methanococcus_aeolicus_Nankai_3_uid58823,
Methanococcus_maripaludis_C5_uid58741_C1, Methanococcus_maripaludis_C6_uid58947, Methanococcus_maripaludis_C7_uid58847, Methanococcus_maripaludis_S2_uid58035,
Methanococcus_maripaludis_X1_uid70729, Methanococcus_vannielii_SB_uid58767, Methanococcus_voltae_A3_uid49529, Methanothermococcus_okinawensis_IH1_uid51535_C1,
Methanocella_arvoryzae_MRE50_uid61623, Methanocella_conradii_HZ254_uid157911, Methanocella_paludicola_SANAE_uid42887, Methanocorpusculum_labreanum_Z_uid58785,
Methanoculleus_bourgensis_MS2_uid171377, Methanoculleus_marisnigri_JR1_uid58561, Methanoplanus_petrolearius_DSM_11571_uid52695, Methanoregula_boonei_6A8_uid58815,
Methanoregula_formicicum_SMSP_uid184406, Methanosphaerula_palustris_E1_9c_uid59193, Methanospirillum_hungatei_JF_1_uid58181, Methanosaeta_concilii_GP6_uid66207_C1,
Methanosaeta_harundinacea_6Ac_uid81199_C1, Methanosaeta_thermophila_PT_uid58469, Methanococcoides_burtonii_DSM_6242_uid58023, Methanohalobium_evestigatum_Z_7303_uid49857_C1,
Methanohalophilus_mahii_DSM_5219_uid47313, Methanolobus_psychrophilus_R15_uid177925, Methanomethylovorans_hollandica_DSM_15978_uid184864_C1, Methanosalsum_zhilinae_DSM_4017_uid68249,
Methanopyrus_kandleri_AV19_uid57883, Pyrococcus_abyssi_GE5_uid62903_C1, Pyrococcus_furiosus_COM1_uid169620, Pyrococcus_furiosus_DSM_3638_uid57873,
Pyrococcus_NA2_uid66551, Pyrococcus_ST04_uid167261, Pyrococcus_yayanosii_CH1_uid68281, Thermococcus_4557_uid70841,
Thermococcus_AM4_uid54735, Thermococcus_barophilus_MP_uid54733, Thermococcus_CL1_uid168259, Thermococcus_gammatolerans_EJ3_uid59389,
Thermococcus_kodakarensis_KOD1_uid58225, Thermococcus_litoralis_DSM_5473_uid82997, Thermococcus_onnurineus_NA1_uid59043, Thermococcus_sibiricus_MM_739_uid59399,
Ferroplasma_acidarmanus_fer1_uid54095, Picrophilus_torridus_DSM_9790_uid58041, Thermoplasma_acidophilum_DSM_1728_uid61573, Thermoplasma_volcanium_GSS1_uid57751
SyntTax Report http://archaea.u-psud.fr/SyntTax
Genomic contexts [synteny diagrams not reproduced; genomes are listed in report order with their scores]
Sulfolobus solfataricus 98 2 uid167998, Score: 81.88
Sulfolobus islandicus Y N 15 51 uid58825 C1, Score: 81.88
Sulfolobus solfataricus P2 uid57721, Score: 81.88
Sulfolobus islandicus L S 2 15 uid58871, Score: 81.88
Sulfolobus islandicus L D 8 5 uid43679 C1, Score: 81.88
Sulfolobus islandicus LAL14 1 uid197216, Score: 81.21
Sulfolobus islandicus REY15A uid162071, Score: 81.21
Sulfolobus islandicus HVE10 4 uid162067, Score: 81.21
Sulfolobus islandicus M 14 25 uid58849, Score: 81.21
Sulfolobus islandicus M 16 27 uid58851, Score: 81.21
Sulfolobus islandicus M 16 4 uid58841, Score: 81.21
Sulfolobus islandicus Y G 57 14 uid58923, Score: 80.54
Sulfolobus acidocaldarius N8 uid189027, Score: 44.97
Sulfolobus acidocaldarius DSM 639 uid58379, Score: 44.97
Sulfolobus acidocaldarius Ron12 I uid189028, Score: 44.97
Sulfolobus acidocaldarius SUSAZ uid232254, Score: 44.97
Sulfolobus tokodaii 7 uid57807, Score: 44.97
Metallosphaera cuprina Ar 4 uid66329, Score: 43.69
Acidianus hospitalis W1 uid66875, Score: 39.53
Metallosphaera sedula DSM 5348 uid58717, Score: 39.53
Thermofilum 1910b uid215374, Score: 27.38
Thermococcus CL1 uid168259, Score: 26.38
Pyrococcus furiosus DSM 3638 uid57873, Score: 25.84
Thermococcus 4557 uid70841, Score: 25.84
Pyrococcus furiosus COM1 uid169620, Score: 25.84
Pyrococcus abyssi GE5 uid62903 C1, Score: 24.56
Pyrococcus NA2 uid66551, Score: 24.56
Pyrococcus yayanosii CH1 uid68281, Score: 24.30
Thermofilum pendens Hrk 5 uid58563 C1, Score: 24.30
Thermococcus onnurineus NA1 uid59043, Score: 24.30
Thermococcus kodakarensis KOD1 uid58225, Score: 24.03
Candidatus Nitrososphaera gargensis Ga9 2 uid176707, Score: 24.03
Pyrococcus horikoshii OT3 uid57753, Score: 23.76
Thermococcus gammatolerans EJ3 uid59389, Score: 23.76
Pyrococcus ST04 uid167261, Score: 23.49
Thermococcus sibiricus MM 739 uid59399, Score: 23.49
Methanoregula boonei 6A8 uid58815, Score: 23.49
Thermococcus AM4 uid54735, Score: 23.49
Candidatus Caldiarchaeum subterraneum uid227223, Score: 23.29
Thermoplasma volcanium GSS1 uid57751, Score: 23.29
Thermococcus litoralis DSM 5473 uid82997, Score: 23.29
Thermoplasma acidophilum DSM 1728 uid61573, Score: 23.29
Aeropyrum pernix K1 uid57757, Score: 23.29
Aeropyrum camini SY1 JCM 12091 uid222311, Score: 23.02
Pyrolobus fumarii 1A uid73415, Score: 22.75
Thermococcus barophilus MP uid54733, Score: 22.75
Hyperthermus butylicus DSM 5456 uid57755, Score: 21.95
Methanosphaerula palustris E1 9c uid59193, Score: 20.94
Aciduliprofundum boonei T469 uid43333, Score: 20.94
Ignisphaera aggregans DSM 17230 uid51875, Score: 20.94
Candidatus Nitrosopumilus AR2 uid176130, Score: 19.87
Nitrosopumilus maritimus SCM1 uid58903, Score: 19.87
Aciduliprofundum MAR08339 uid184407, Score: 19.87
Candidatus Nitrosopumilus koreensis AR1 uid176129, Score: 19.66
Ignicoccus hospitalis KIN4 I uid58365, Score: 19.13
Haloarcula marismortui ATCC 43049 uid57719 C1, Score: 17.85
Thermosphaera aggregans DSM 11486 uid48993, Score: 17.85
Thermoproteus uzoniensis 768 20 uid65089, Score: 17.32
Methanomassiliicoccus Mx1 Issoire uid207287, Score: 17.05
Desulfurococcus 1221n uid59133, Score: 16.78
Methanobacterium AL 21 uid63623, Score: 16.78
Archaeoglobus profundus DSM 5631 uid43493 C1, Score: 16.31
Desulfurococcus fermentans DSM 16532 uid75119, Score: 16.31
Vulcanisaeta moutnovskia 768 28 uid63631, Score: 16.31
Pyrobaculum oguniense TE7 uid84411 C1, Score: 16.04
Pyrobaculum arsenaticum DSM 13514 uid58409, Score: 16.04
Methanosaeta concilii GP6 uid66207 C1, Score: 16.04
Halobacterium salinarum R1 uid61571 C1, Score: 15.77
Halobacterium NRC 1 uid57769 C1, Score: 15.77
Halorubrum lacusprofundi ATCC 49239 uid58807 C1, Score: 15.77
Haloquadratum walsbyi DSM 16790 uid58673 C1, Score: 15.77
Haloquadratum walsbyi C23 uid162019 C1, Score: 15.77
Halovivax ruber XH 70 uid184819, Score: 15.50
Methanoculleus marisnigri JR1 uid58561, Score: 15.50
Methanopyrus kandleri AV19 uid57883, Score: 15.50
Methanomethylovorans DSM 15978 uid184864 C1, Score: 15.50
Natronomonas pharaonis DSM 2160 uid58435 C1, Score: 15.50
Halorhabdus utahensis DSM 12940 uid59189, Score: 15.23
Methanocaldococcus infernus ME uid48803, Score: 15.23
Methanocaldococcus FS406 22 uid42499 C1, Score: 15.23
Methanosarcina barkeri Fusaro uid57715 C1, Score: 15.23
Thermoplasmatales archaeon BRNA1 uid195930, Score: 15.23
Natronomonas moolapensis 8 8 11 uid190182, Score: 15.23
Pyrobaculum 1860 uid82379, Score: 15.23
Methanococcus aeolicus Nankai 3 uid58823, Score: 15.23
Methanococcus maripaludis C5 uid58741 C1, Score: 15.23
Methanococcus maripaludis C6 uid58947, Score: 15.23
Methanococcus maripaludis C7 uid58847, Score: 15.23
Methanococcus maripaludis X1 uid70729, Score: 15.23
Haloferax volcanii DS2 uid46845 C1, Score: 15.23
Methanothermococcus IH1 uid51535 C1, Score: 15.23
Methanococcus maripaludis S2 uid58035, Score: 15.23
Natrialba magadii ATCC 43099 uid46245 C1, Score: 14.97
Methanocaldococcus DSM 2661 uid57713 C1, Score: 14.97
Methanoplanus petrolearius DSM 11571 uid52695, Score: 14.97
Ferroglobus placidus DSM 10642 uid40863, Score: 14.97
Methanocaldococcus fervens AG86 uid59347 C1, Score: 14.97
Methanocaldococcus vulcanius M7 uid41131 C1, Score: 14.97
Cenarchaeum symbiosum A uid61411, Score: 14.70
Acidilobus saccharovorans 345 15 uid51395, Score: 14.70
Caldivirga maquilingensis IC 167 uid58711, Score: 14.70
Archaeoglobus veneficus SNP6 uid65269, Score: 14.70
Pyrobaculum neutrophilum V24Sta uid58421, Score: 14.70
Methanohalobium evestigatum Z 7303 uid49857 C1, Score: 14.70
Methanosarcina mazei Go1 uid57893, Score: 14.70
Halorhabdus tiamatea SARL4B uid214082 C1, Score: 14.50
Pyrobaculum aerophilum IM2 uid57727, Score: 14.50
Halalkalicoccus jeotgali B3 uid50305 C1, Score: 14.50
Salinarchaeum laminariae Harcht Bsk1 uid207001, Score: 14.50
Staphylothermus marinus F1 uid58719, Score: 14.50
Methanothermobacter Delta H uid57877, Score: 14.50
Methanococcus voltae A3 uid49529, Score: 14.50
Methanothermobacter Marburg uid51637 C1, Score: 14.50
Methanosaeta thermophila PT uid58469, Score: 14.50
Methanosaeta harundinacea 6Ac uid81199 C1, Score: 14.50
Nanoarchaeum equitans Kin4 M uid58009, Score: 14.23
Methanohalophilus mahii DSM 5219 uid47313, Score: 14.23
Candidatus Korarchaeum cryptofilum OPF8 uid58601, Score: 14.23
Vulcanisaeta distributa DSM 14429 uid52827, Score: 14.23
Caldisphaera lagunensis DSM 15908 uid183486, Score: 13.96
Desulfurococcus mucosus DSM 2162 uid62227, Score: 13.96
Methanococcus vannielii SB uid58767, Score: 13.96
Methanobrevibacter smithii ATCC 35061 uid58827, Score: 13.96
Ferroplasma acidarmanus fer1 uid54095, Score: 13.96
Thermogladius 1633 uid167488, Score: 13.69
SyntTax Report http://archaea.u-psud.fr/SyntTax page 8/8
Query protein sequence: maevvrayil vsttvgkeme vadmakkvsg viradpvyge ydvvveveak ssddlkkviy eirrnpniir tvtlivm
Genomes without synteny:
Staphylothermus_hellenicus_DSM_12710_uid45893
Fervidicoccus_fontis_Kam940_uid162201
Pyrobaculum_calidifontis_JCM_11548_uid58787
Pyrobaculum_islandicum_DSM_4184_uid58635
Thermoproteus_tenax_Kra_1_uid74443
Archaeoglobus_fulgidus_DSM_4304_uid57717
Archaeoglobus_sulfaticallidus_PM70_1_uid201033
Haloarcula_hispanica_ATCC_33960_uid72475_C1
Haloarcula_hispanica_N601_uid230920_C1
Haloferax_mediterranei_ATCC_33500_uid167315_C1
Halogeometricum_borinquense_DSM_11551_uid54919_C1
Halomicrobium_mukohataei_DSM_12286_uid59107_C1
Halopiger_xanaduensis_SH_6_uid68105_C1
Haloterrigena_turkmenica_DSM_5511_uid43501_C1
Natrinema_J7_uid171337_C1
Natrinema_pellirubrum_DSM_15624_uid74437_C1
Natronobacterium_gregoryi_SP2_uid74439
Natronococcus_occultus_SP4_uid184863_C1
Methanobacterium_MB1_uid231690
Methanobacterium_SWAN_1_uid67359
Methanobrevibacter_AbM4_uid206516
Methanobrevibacter_ruminantium_M1_uid45857
Methanosphaera_stadtmanae_DSM_3091_uid58407
Methanothermus_fervidus_DSM_2088_uid60167
Methanotorris_igneus_Kol_5_uid67321
Methanocella_arvoryzae_MRE50_uid61623
Methanocella_conradii_HZ254_uid157911
Methanocella_paludicola_SANAE_uid42887
Methanocorpusculum_labreanum_Z_uid58785
Methanoculleus_bourgensis_MS2_uid171377
Methanoregula_formicicum_SMSP_uid184406
Methanospirillum_hungatei_JF_1_uid58181
Methanococcoides_burtonii_DSM_6242_uid58023
Methanolobus_psychrophilus_R15_uid177925
Methanosalsum_zhilinae_DSM_4017_uid68249
Methanosarcina_acetivorans_C2A_uid57879
Methanosarcina_mazei_Tuc01_uid190185
Picrophilus_torridus_DSM_9790_uid58041
                     10         20         30         40         50
             ....|....| ....|....| ....|....| ....|....| ....|....|
Lrp        1 MVDSKKRPGK DLDRIDRNIL NELQKDGRIS NVELSKRVGL SPTPCLERVR
sso10340   1 ---------- ---------- ---------- ---------- ----------
Clustal C

                     60         70         80         90        100
             ....|....| ....|....| ....|....| ....|....| ....|....|
Lrp       51 RLERQGFIQG YTALLNPHYL DASLLVFVEI TLNRGAPDVF EQFNTAVQKL
sso10340   1 ---------- ---------M AEVVRAYILV STTVGKEMEV ADM---AKKV
Clustal C : : .:: : : . * . :: .:*:

                    110        120        130        140        150
             ....|....| ....|....| ....|....| ....|....| ....|....|
Lrp      101 EEIQECHLVS GDFDYLLKTR VPDMSAYRKL LGETLLRLPG VNDTRTYVVM
sso10340  29 SGVIRADPVY GEYDVVVEVE AKSSDDLKKV IYE-IRRNPN IIRTVTLIVM
Clustal C 12 . : ... * *::* :::.. . . . :*: : * : * *. : * * :**

                    160
             ....|....| ....
Lrp      151 EEVKQSNRLV IKTR
sso10340  77 ---------- ----
Clustal C
Figure S3
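The pairwise alignment in Figure S3 can be summarised as a percent identity over the columns where both sequences have a residue. The following is a minimal sketch, not the method used in the thesis: the function name is hypothetical, gap-containing columns are simply skipped (an assumption), and only the third alignment block (Lrp positions 101 to 150) is used as example input.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Third block of the Figure S3 alignment, with the spacing removed.
LRP_BLOCK = "EEIQECHLVSGDFDYLLKTRVPDMSAYRKLLGETLLRLPGVNDTRTYVVM"
SSO10340_BLOCK = "SGVIRADPVYGEYDVVVEVEAKSSDDLKKVIYE-IRRNPNIIRTVTLIVM"

print(f"{percent_identity(LRP_BLOCK, SSO10340_BLOCK):.1f}% identity in this block")
```

Whether gap columns are skipped or counted as mismatches changes the value, so the convention should be stated whenever such a figure is quoted.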
Figure S4
1 Sso 0438 peak area (380600-380750) nr 220
AAATAGTTACACTTATCTCCTAAAGGCATTGGTTTCTGCTCATCAGTAAAGGAGGGCTGTCTTTCAAAGCCTCT
ATTTCC
GTTCCTCAAACTACTTATACCCCCAAACTTTTAGACAATCATATTAAACTTACATTTTGACCCATTTAAACATT
ATGTATCTTATACCTTCACTTTTACGATCCACTAGAGCCAATACTAATCTCTTTTTAGTGCTATGTGAGACTCT
ACCGAAACTCAATAGTTCATGAG
Primer EMSAsso0438 S 5'-CAAAGCCTCTATTTCCGTTC-3' Tm 53.8; EMSAsso0438 A 5'-CGGTAGAGTCTCACATAGCA-3' Tm 55.7; product 168 bp
2 Sso 0570 peak area (502350-502600) nr 192 (co-transcribed with rps26E)
GTTGTGATCAATGTGGTGCTAGAGTACCAGAGGATAAGGCAGTATGTGTAACAAAAATGTATAGCCCCGTGGAT
GCTTCT
CTAGCATCTGAATTAGAAAAGAAGGGTGCAATAATTGCTAGATATCCTGTAACTAAGTGTTACTGTGTGAATTG
TGCGGT
ATTTTTGGGTATTATTAAGATAAGAGCAGAAAATGAGAGAAAGCAAAAAGCTCGTTTAAGATAGGCTTTTAAAC
CTTTAG
TCAGAATATGTGATGAAATGAGACTTTATGAATTATCTTTTGCACAAATTGAAGATTTTTTCTATAAACTAGCA
GAAGTTAAAGATATTATAAAAGATCATGGTCTATTAG
Primer EMSAsso0570 S 5'-GTTGTGATCAATGTGGTGCTAG-3' Tm 57.4; EMSAsso0570 A 5'-CGCACAATTCACACAGTAACA-3' Tm 57.5; product 158 bp
3 Rp144e peak area (907700-908000) nr 195
TATATAATGATCACAGGGTTAATGAGGGAGATGTTTTGGTTTTACCTATGAGAGAAGCCTTGCCATTAATAATA
GCAAGT
TATTTAACTCCCTATAAGATAGATATTGAAGAACAATTATGAAAGTCCCTAAGGTCATCAGCACATATTGTCCA
AAGTGTAAGACTCATACAGATCACTCTGTATCACTATACAAGAGCGGTAAGAGAAGAAATCTCGCTGAAGGACA
GAGAAGATATGAGAGAAAGAATATTGGATATGGAAGTAAAAGAAAACCAGAACAGAAGAGATTTGCAAAAGTT
Primer EMSArp144e S 5'-GCAAGTTATTTAACTCCCTATAAG-3' Tm 53.2; EMSArp144e A 5'-CTCTCATATCTTCTCTGTCCTTC-3' Tm 55.2; product 168 bp
4 Sso 1758 peak area (1592900-1593150) nr 213
AGACGACCTCAAATGACCTGCCTTTCTCTATGATACCAACAGTATCAACGATCGTACCGTGTTCAACCCTCTTA
GGATCTACACCATGAAGTTCTAAAGCTTTCCTATAAAGTTTTAACATTGTCCTTTCAAACCCTTTACCGGCCCT
AGAAGTAAAACT
ACCTAACTCAACTGAAAACTTCTTTTGACTCCTAACTAGTCTAAGAATGATCCTGGAATGTCTTTCAATTCCTT
TTTGGGAGAGAGACTATTGCTTCTCCTTGCTTTTTGACTTCTTGCTGTAATGACGTAATAGCCTCACTGTGCCT
CTTAACCTCTTCTTGCAAGGATCTTATTGTCTCTTGTAAAAGCTGAATGCTTTTAGTATTTTCTTCCAATCTCT
TTATCACTATTTCATCC
Primer EMSAsso1758 S 5'-GAATGATCCTGGAATGTCTTTC-3' Tm 54.5; EMSAsso1758 A 5'-GGAAGAAAATACTAAAAGCATTCAG-3' Tm 55.2; product 171 bp
5 Sso 1879 peak area (1692550-1692750)
GGTTATCCACCTTATGGATACCTTGTAACGATAATATTAAATATTGAGGAGCTGTCAGACGCATTGCGAAGTTT
AGCTGAGTACTTACGCTCCATAATGAATTAACAGAATAAGGTTATAGAAGCGGAATGATAAATGGCTAACTTTA
TCACCTCAATAC
ATTAAATCTATATTGTGACATTTGACGACTTAAATAAGTTAATCAGAGAGAAACTTAGCGTAGAAACGTACCCT
TATCAAAAGTACATCAG
Primer EMSAsso1879 S 5'-CAGACGCATTGCGAAGTTTA-3' Tm 56.5; EMSAsso1879 A 5'-CGCTAAGTTTCTCTCTGATTAAC-3' Tm 54.9; product 165 bp
6 Rpom-2 peak area (1731850-1732040) nr 105.8 (co-transcribed with sso1913)
TGAGCTACATTCTCCTTTAATATTACTTTCTCTACATCATGATCAGAATACCCACACTTACTACAAACCATTTT
GTTACCTTTTACCTTCAAGAAAGAACCACATTTAGGGCAAAAACGCATATCTAGAAATGAGTCTTTTAATCATA
AAAGCTTGCGGC
TAACATACTAACTCCGGGCCAAGGTGTGTCA
Primer EMSArpom-2 S 5'-GATCAGAATACCCACACTTAC-3' Tm 53.3; EMSArpom-2 A 5'-GAGTTAGTATGTTAGCCGCA-3' Tm 54.3; product 134 bp
7 Sso1984 peak area (1796450-1796650) nr 363
TGTACCAAATGGTCAAGCAATAAATTATAATGGACATACAGATCCTGTGGTGATTTAATACTAAGTAATGGAAC
TATGATACAAAATGTGGTATGGGATGGACAATATGCAGGTACAATAATTCAAAATCATTACCAAATAGTTCAAT
TGAATGATGAATGGGTAGGAAGAACCGACCCAGTGAATAATCAACAATATGTA
Primer EMSAsso1984 S 5'-GGTCAAGCAATAAATTATAATGGAC-3' Tm 55.0; EMSAsso1984 A 5'-CCTACCCATTCATCATTCAATTG-3' Tm 55.9; product 157 bp
GCTTAAAGGGTTTGTTTGTTTTGGTGGTGTTATGGGTGCGTTGAAGTACGTGGCCATTGGCGTTGTGGTTTTCG
CCACCACTGTGTTTTACTACTACCGACACCGGGTGCCTGTGCAGTACGTGGGTTCACCCAGCGGTTATGAGGCA
TTTGTACCAAATGGTCAAGCAATAAATTATAATGGACATACAGATCCTGTGGTGATTTAATACTAAGTAATGGA
ACTATGATACAAAATGTGGTATGGGATGGACAATATGCAGGTACAATAATTCAAAATCATTACCAAATAGTTCA
ATTGAATGATGAATGGGTAGGAAGAACCGACCCAGTGAATAATCAACAATATGTAACTTTTCAAGATTTCTACG
TGATCAAGGGTCAAGTACCAATTGAGAATGTAACAATAAATGGACAGACGTATTACGTGATAGATGCGGATAAA
ATAAACCCAGCGGACATCGCGGGATTTTTCACATACTGGAGATGGGTAAACAACTTC
Primer FPsso1984 S 5'-GCTTAAAGGGTTTGTTTGTTTTG-3' Tm 55.4; FPsso1984 A 5'-GAAGTTGTTTACCCATCTCCAG-3' Tm 56.5; product 501 bp
8 Sso 2102 peak area (1923900-1924100) nr 143
GAATGCGAAATATGTAAAGCAATAGTTTCAGTATTATGTGGACTATTAGCTGAAGGAGTAGCTAAAAGTGTGGC
ATGTGACGAAGCTTGTGGAACAGTTTGCTTAATATTTGTTGAGGATCCCATTATTTATGATATTTGTGTGGTAA
TATGTATACCTTCTTGTGATGAACTACTTCAACTAATTATCTCAATAGGAGTAGCGACTGCATGTGGACTAGGT
GGTGAGTATCTATGTCAA
AAGGCTGGTCTGTGTTGCTAATAGATTTTTTTTAGATATGTGGTTATAAATAGCTAAGGAGGGAGGGATAGAA
ATGGAAAGAAAGGGGACAGAAATAGAAAGAAATAAGATATCTTTTTTAAAGTGGCTTGAGCTAACATTATTATT
TGTGATTTTGCCTT
Primer EMSAsso2102 S 5'-GTGGACTATTAGCTGAAGGAG-3' Tm 54.6; EMSAsso2102 A 5'-CAGTCGCTACTCCTATTGAGATA-3' Tm 56.5; product 171 bp
9 Sso 2626 peak area (2391350-2391550) nr 167.4
ATGAGATAACTAAACAATTGAGAGATGAAGCTGACAAATTACAACAACCCCTTAAGAAGTATATTGGGCTTGTT
CACAAT
GTAGGTGGCACAGGTCACTTTGCATATGTTATGATTCTAAGAAGGTGACCTTAGATGCAAATAGATGCAATACC
GTTATCAATAAAATATAAGATCAAATACCCGGACGAATTTATTGAAGCAGTTAAGAGGGGAGAAATTGTAGCTA
CCAAATGTAAAAATTGTGGTTCC
Primer EMSAsso2626 S 5'-CAACAACCCCTTAAGAAGTATATT-3' Tm 54.5; EMSAsso2626 A 5'-CTCCCCTCTTAACTGCTTCAAT-3' Tm 57.5; product 175 bp
10 Pacs peak area (2411750-2411950) nr 1080
CTTCTCAGTGGCCATGTGGCATTCCCTTAGGTCCATTCCTTAAATACTCTTCCGGATTTCTCTGAAACTCCCTT
AGACAA
TGAGATGAGCAGAAATAGTAGATTTTTCCCTTATACATTGTCTTATATTGACTTTTCTCATCTACTTCCATTCC
ACAAACCGGATCGATTATCATAATACTAATTATTGCTTTTATTATAAAGAACCTTTTCTTCTGTAAATTTGTAT
CTATATATTGTTGATTACAGTCAAAATGACTCTAAAACTAATGTAAGTGCAAGCCATTGTTGCGTGCAAATTTT
TCCTTTAATCCAGACAAGCAAGAATTGCAACAAGCATAATACGTTTTCC
Primer EMSApacs S 5'-GTCTTATATTGACTTTTCTCATCTAC-3' Tm 54.0; EMSApacs A 5'-GCTTGTCTGGATTAAAGGAAAA-3' Tm 54.4; product 201 bp
CCGGATTTCTCTGAAACTCCCTTAGACAATGAGATGAGCAGAAATAGTAGATTTTTCCCTTATACATTGTCTTATATTGACTTTTCTCATCTACTTCCATTCCACAAACCGGATCGATTATCATAATACTAATTATTGCTTTTATTATAAAGAACCTTTTCTTCTGTAAATTTGTATCTATATATTGTTGATTACAGTCAAAATGACTCTAAAACTAATGTAAGTGCAAGCCATTGTTGCGTGCAAATTTTTCCTTTAATCCAGACAAGCAAGAATTGCAACAAGCATAATACGTTTTCCTTCCTACCTTATAGGTCAAGGGATTATCATATATTTCCTTACCGCAGTAATCACATTTAACTACAAAACTGCTCTTGCCTATGAAAACTAAATCCTTTTCTTTTACATTTTTTAACTTATTTTCTAAATCTTCTAAAGTGTTAGCTCTTATCATGTTAATAAATCTACCATCCAACAGTTTGTAACACTCGTCACTTTGA
Primer FPpacs S 5'-CTCTGAAACTCCCTTAGACAATG-3' Tm 56.4; FPpacs A 5'-CAAACTGTTGGATGGTAGATTTA-3' Tm 54.3; product 474 bp
11 Adh-11 peak area (2470250-2470400) nr 120
ATAAAAAATACCTTCCACAATCTTAGCTCACCTTTATTTGCCTATAGTTAGTTATTATTACTGTTCCGATGAAG
TATATA
AGACCTATCCCAATTAAGATTTCCATTCCTAAGAGATATGCTTTGGTGACTTCCACAGTAGCACCAAATACTAT
TGGTAC
TATTATACCCCATAGAGTTTCCCAAAAGCCAATATGACCACCAAATTGACCAGCTAATTCAGTAGAAACTAGAA
AGGATGGTGCTGCCCATTGTATCCCCGACCATCTTAGGAAGAAGAAAGTTACCATTAGTATTTCAAC
Primer EMSAadh-11 S 5'-GAGATATGCTTTGGTGACTTCC-3' Tm 56.0; EMSAadh-11 A 5'-CCTAAGATGGTCGGGGATAC-3' Tm 55.8; product 160 bp
12 Sso 3178 peak area (2923900-2924050) nr 206
TTTCTTTAAAGCTCTATCACATGTAGTCCAGTTCTCATATAGTTCCTTATCCTCTTTTCTTAGCAGTCATATTG
ACTTATGAATTATAATCAACGACGTTTTTGGGAGTAAAGGTTAAGTTTGCTCACATTTCTACTTTAAGAGAGAG
TAAATTAAATTAACTTTCCTTTAAATTTTCTAATGCTTTGAGGTTCTATGCAAGAGCTGGTTTACAATGGTATA
TAGTTAAAAGAGAGGGCTGAAAATGAAACTTGGAGTGGAAAAATTGTTGCCAGCTTATTAAAAACCAAGTGGAG
CATTATCGTAGCTGAGAAAACATAGTCCAGTCCTTAATTCTAGCTTTAGTGCGACGTATTTTGTGAAAAAATCG
CTTGATTTTTGCATGTTTCAAAACATTTTTA
Primer EMSAsso3178 S 5'-GCTCTATCACATGTAGTCCAG-3' Tm 55.1; EMSAsso3178 A 5'-GCATAGAACCTCAAAGCATTAG-3' Tm 55.2; product 188 bp
13 Sso3189 peak area (2935250-2935550)
AAGTAAATTATGAACGATTATCTAATCATACTTTTGCAATGGATAAGTAAATTTAAATAGGGGAATTTAGAAGA
TAGCGT
GTGAGTAACAAAAATAGAATATTCGTAAGGGAGACTTCTGGTTTAATAAAGAACGTATCATTATGGGATGCAGT
TGCACTCAATATAGGCAATATGTCAGCTGGAGTAGCATTATTTGAATCAATATCACCATATGTACAACAAGGAG
GAGTATTGTGGCTGGCTTCATTAATAGGCTTCATCTTCGCTATACCACAACTGTTAATTTATGTATTTTTAAC
Primer EMSAsso3189 S 5'-GGGGAATTTAGAAGATAGCGT-3' Tm 54.5; EMSAsso3189 A 5'-CAGCCACAATACTCCTCCTT-3' Tm 56.2; product 183 bp
14 Sso 12210 peak area (2966200-2966400)
TTTGCACGCTAACTTCAACTGGGCTTTTAACGCCCTAAGGGTTTGTTCGTCAGTGTATGCACGGAAGCGAAACC
CTAAGGTGGGTATTGATGTGTATTTTGTTATTTTCCTATTTTTAACTTTTCTACAAAGGGATTCATCTCAAAGA
GGCGAAGTTTTCCGCCCCCTTTGAACCCCCGTCTGTTATAAACATAATACGCAATCATAGGTCAGATTGACTAC
AGATGATAGCTTATATGGCTGAAATGTAGTATAAAAATACCTAAGACGTACTGGTGTTTACCCGTGGCGTAACT
TCTGC
Primer EMSAsso12210 S 5'-CTTTTCTACAAAGGGATTCATCTC-3' Tm 55.6; EMSAsso12210 A 5'-GGTAAACACCAGTACGTCTT-3' Tm 54.5; product 204 bp
Area 505950-506100 151bp
CTCCTAAAAATAGCGAACCGCCACTCTTTCTTGCAAATTCTATAATACCTTCTGCTTGTTCCCTTCTAACATAA
ATTTTTCCCTCTATATCCTTTCTACTCTTCATCTCAATTAAAATAATAACGCCATTCTTTAAAGCGATAATATC
CGGTATAGGGTC
Primer EMSA no binding S 5'-CTCCTAAAAATAGCGAACCG-3' Tm 53.5; EMSA no binding A 5'-GACCCTATACCGGATATTATCG-3' Tm 54.5; product 160 bp
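The primer pairs listed above can be cross-checked against their peak regions with a small in silico PCR: locate the sense primer on the given strand, locate the reverse complement of the antisense primer downstream of it, and measure the span between them. The sketch below does this for region 1 (Sso 0438). The function names are illustrative, and only exact matches are considered, which is an assumption that ignores mismatches and secondary binding sites.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def amplicon_length(template: str, sense: str, antisense: str):
    """Length of the product bounded by two exactly matching primers, or None."""
    template = template.upper()
    start = template.find(sense.upper())
    if start < 0:
        return None
    # The antisense primer appears on the given strand as its reverse complement.
    site = reverse_complement(antisense)
    end = template.find(site, start)
    if end < 0:
        return None
    return end + len(site) - start


# Peak region 1 (Sso 0438) as listed above, joined into one string.
REGION_SSO0438 = (
    "AAATAGTTACACTTATCTCCTAAAGGCATTGGTTTCTGCTCATCAGTAAAGGAGGGCTGTCTTTCAAAGCCTCT"
    "ATTTCC"
    "GTTCCTCAAACTACTTATACCCCCAAACTTTTAGACAATCATATTAAACTTACATTTTGACCCATTTAAACATT"
    "ATGTATCTTATACCTTCACTTTTACGATCCACTAGAGCCAATACTAATCTCTTTTTAGTGCTATGTGAGACTCT"
    "ACCGAAACTCAATAGTTCATGAG"
)

print(amplicon_length(REGION_SSO0438, "CAAAGCCTCTATTTCCGTTC", "CGGTAGAGTCTCACATAGCA"))
```

For region 1 the computed span is 168 bases, which agrees with the listed product size. The same call can be repeated for the other regions once their wrapped sequence lines are joined.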
Unveiling Cell Surface and Type IV Secretion Proteins Responsible for Archaeal Rudivirus Entry
Ling Deng, Fei He, Yuvaraj Bhoobalan-Chitty, Laura Martinez-Alvarez, Yang Guo and Xu Peng
J. Virol. 2014, 88(17):10264. DOI: 10.1128/JVI.01495-14. Published Ahead of Print 25 June 2014.
Updated information and services can be found at: http://jvi.asm.org/content/88/17/10264
These include:
SUPPLEMENTAL MATERIAL: Supplemental material
REFERENCES: This article cites 32 articles, 11 of which can be accessed free at: http://jvi.asm.org/content/88/17/10264#ref-list-1
Downloaded from http://jvi.asm.org/ on November 2, 2014 by Copenhagen University Library
Unveiling Cell Surface and Type IV Secretion Proteins Responsible for Archaeal Rudivirus Entry
Ling Deng, Fei He, Yuvaraj Bhoobalan-Chitty, Laura Martinez-Alvarez, Yang Guo, Xu Peng
Archaea Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
Sulfolobus mutants resistant to archaeal lytic virus Sulfolobus islandicus rod-shaped virus 2 (SIRV2) were isolated, and mutations were identified in two gene clusters, cluster sso3138 to sso3141 and cluster sso2386 and sso2387, encoding cell surface and type IV secretion proteins, respectively. The involvement of the mutations in the resistance was confirmed by genetic complementation. Blocking of virus entry into the mutants was demonstrated by the lack of early gene transcription, strongly supporting the idea of a role of the proteins in SIRV2 entry.
To date, relatively few archaeal viruses have been characterized, and most of those that have been characterized infect acidothermophilic members of the order Sulfolobales. Despite their limited number of around 50 species, they exhibit considerably greater morphological diversity than the more extensively characterized bacteriophages, about 95% of which show head-tail morphologies. Archaeal viruses, in contrast, exhibit fusiform shapes, often with one or two tails, bottle shapes, bearded-globular forms, and a wide variety of rod-like and filamentous morphotypes which often carry small terminal appendages (1–3). This morphological diversity suggests that the archaeal viruses may employ a variety of mechanisms to enter their hosts, but current insights into entry mechanisms are limited to an OppA transporter protein, Sso1273, possibly providing a receptor site for the Acidianus two-tailed virus (ATV) in Sulfolobus solfataricus P2 (4). Very recently, microscopic studies suggested that Sulfolobus islandicus rod-shaped virus 2 (SIRV2) enters the host cell by attaching and moving through a pilus-like filament; however, the nature of the structure and the identity of the involved proteins remain elusive (5).
Sulfolobus solfataricus P2 is an acidothermophilic crenarchaeon that can host a wide range of archaeal viruses, many of which are propagated stably (1, 3, 6). Moreover, few of the viruses appear to induce cell lysis, possibly reflecting a need to minimize contact with the harsh hot acidic environment. However, recent studies have identified a few viruses that can enter a lytic phase, including the Sulfolobus turreted icosahedral virus (STIV), the two-tailed fusiform virus (ATV), and, more recently, the rudivirus SIRV2 (7–9).
SIRV2 is classified in the family Rudiviridae together with other well-characterized viruses, including SIRV1 (10, 11), ARV1 (12), and SRV1 (13), all of which are rod shaped and lack an envelope, and their genomes consist of linear double-stranded DNA with covalently closed ends (10, 14, 15). In a recent microarray analysis of S. solfataricus infected with SIRV2, we demonstrated that the viral genes were activated at different times and that mainly stress-response host genes and those implicated in vesicle formation were downregulated (16). The results also illustrated that SIRV2 infection at a multiplicity of infection (MOI) of 30 resulted in growth inhibition of S. solfataricus 5E6 (16). In the present experiment, the culture was infected at a lower MOI (~1), which also led to a growth retardation, but the infected culture could enter the exponential-growth phase at 80 h postinfection (p.i.) (Fig. 1A).
The surviving cells (named 5E6R) appeared to be resistant to SIRV2 because, in contrast to the sensitive 5E6 strain, no growth retardation was observed when 5E6R was diluted and infected with SIRV2 at the same MOI (Fig. 1).
Received 24 May 2014 Accepted 16 June 2014
Published ahead of print 25 June 2014
Editor: A. Simon
Address correspondence to Xu Peng, [email protected].
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JVI.01495-14.
Copyright © 2014, American Society for Microbiology. All Rights Reserved.
doi:10.1128/JVI.01495-14
FIG 1 (A) Growth retardation of S. solfataricus 5E6 upon SIRV2 infection. (B) Resistance of S. solfataricus 5E6R to SIRV2. OD600, optical density at 600 nm.
10264 jvi.asm.org Journal of Virology p. 10264–10268 September 2014 Volume 88 Number 17
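The difference between the two infection regimes mentioned above (an MOI of 30 in the earlier study, a lower MOI here) can be illustrated with the standard Poisson model of infection, under which the fraction of cells that receive no virion at a given MOI is exp(-MOI). This model is an assumption introduced here for illustration and is not part of the analysis in the article; the function name is hypothetical.

```python
import math


def fraction_uninfected(moi: float) -> float:
    """Poisson probability that a cell receives zero virions at the given MOI."""
    if moi < 0:
        raise ValueError("MOI must be non-negative")
    return math.exp(-moi)


for moi in (1, 30):
    share = fraction_uninfected(moi)
    print(f"MOI {moi}: {share:.3g} of cells expected to escape the first round of infection")
```

Under this assumption a substantial minority of cells escapes the first round of infection at an MOI near 1, whereas essentially none do at an MOI of 30.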
In order to manipulate the SIRV2-sensitive S. solfataricus 5E6 strain genetically (17), 10 pyrEF mutants, labeled Sens1 to Sens10, were isolated from Gelrite plates containing 5-fluoroorotic acid (5'-FOA). Their mutation sites in the pyrEF gene region were identified by a combination of PCR amplification, restriction digest analysis, and sequencing (17). All the mutations were shown to result from transposon insertions, either IS elements or miniature inverted terminal repeat elements (MITEs), and the insertions occurred in the coding sequences or within the single promoter (Fig. 2A). These results are consistent with the previous reports demonstrating high transposition activity in S. solfataricus and its contribution to chromosomal plasticity (18–20). Following the procedure described above, SIRV2-resistant cultures were generated for each of the pyrEF mutants. Single colonies were then produced from the cultures by streaking onto Gelrite plates to yield the purified resistant strains Res1 to Res10. The stability of the transposon insertions in the pyrEF genes was tested for each of the 10 pyrEF mutants (Sens1 to Sens10) and their corresponding SIRV2-resistant colonies (Res1 to Res10) by growing them in rich media containing uracil (17) for 3 days without transfer, prior to total DNA extraction and PCR amplification of the pyrEF regions. Each transposon insertion appeared to be stable, because no wild-type PCR bands were observed, except a weak wild-type band produced in Sens2, consistent with the undetectable reversion rates for Sulfolobus transposons recorded earlier (19, 20). Since Res2 did not generate the wild-type band, the extra PCR product in Sens2 was probably due to a minor contamination of the colony by wild-type cells (Fig. 2B).
Sens1, Sens3, Sens7, and Sens8 were selected for a transformation test because they carried different transposons located at different insertion sites (Fig. 2A). Shuttle vector pEXA was used for transformation (21), and water was used in the negative control. While Sens7 and Sens8 appeared unstable after electroporation, Sens1 and Sens3 yielded transformants without colony formation in the negative control. Thus, we focused on Sens1 and its resistant mutants for further studies of SIRV2 susceptibility.
The SIRV2-resistant cells were enriched directly from the SIRV2-sensitive culture; therefore, the only selective pressure appeared to occur either upon SIRV2 infection or during virus-induced cell lysis. Moreover, since the active clustered regularly interspaced short palindromic repeat (CRISPR) loci A, B, C, and D were all lost from the 5E6 host strain (16), the residual CRISPR loci E and F, which lack the spacer acquisition cas genes, were unlikely to be responsible for the resistance (21, 22). Therefore, we inferred that resistance arose as a result of mutated host genes that are important for the SIRV2 life cycle. To identify such mutations, the genomes of strains Sens1 and Res1 were resequenced by the use of a HiSeq 2000 sequencer, yielding about 200-fold coverage. The sequencing reads of both strains were aligned with the genome sequence of S. solfataricus P2 (23) using the R2R program (24) to identify mismatches as well as insertions and deletions. Mutations relative to the P2 genome in the resequenced genomes of Sens1 and Res1 were then compared manually. Only one mutation was detected and constituted a single insertion of ISC1078 into Res1 but not Sens1. The insertion was localized in sso3139, a gene encoding a conserved hypothetical protein lying within an operon (Fig. 3A).
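The reported coverage of about 200-fold relates read count, read length, and genome size through coverage = reads × read length / genome size. The sketch below shows the arithmetic only. The read length of 100 bases and the rounded genome size of 3.0 Mb are assumptions chosen for illustration, since the article states neither the read length nor the number of reads, and the function names are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth for a given number of reads."""
    return n_reads * read_length / genome_size


def reads_for_coverage(coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(coverage * genome_size / read_length)


GENOME_SIZE = 3_000_000  # assumed round figure for the S. solfataricus P2 genome
READ_LENGTH = 100        # assumed read length in bases

print(reads_for_coverage(200, READ_LENGTH, GENOME_SIZE))
```

With these assumed values, about six million reads per strain would correspond to 200-fold coverage; a different read length scales the figure inversely.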
Next we tested whether other resistant strains also carried mutations in sso3139 or in other genes of the same operon by employing a primer pair (5'-GCTACGCTTCTAACAAACCTAATCTG and 5'-CGAAACTTGCGAAACAACTACCT) designed to amplify the whole operon region. After PCR amplification, restriction digestion, and sequencing, another 5 strains were shown to contain mutations at different locations within sso3139 or the adjacent sso3140 (Fig. 3A). Interestingly, all the 6 mutations were produced by ISC1078 insertion (Fig. 3A) and appeared to be stably maintained (Fig. 3C).
To identify possible mutations in the other 4 resistant strains, genome resequencing followed by PCR analyses of relevant genes was performed (using primers 5'-GAGTCTGGGGAAAATCGGTAAAGTT and 5'-TGGCATTGTAACCCTAATTGCTTCT). These revealed IS element insertions in sso2387 of Res2 and Res10 (Fig. 3B). Sso2387 in Sens2 and Sens10 contains 577 amino acids (aa) but only 283 aa in the sequenced S. solfataricus P2 genome (23). An analysis of the sequences around sso2387 in S. solfataricus P2 revealed that it is a partial gene that resulted from an ISC1225 insertion (Fig. 3B), which could explain the resistance of the wild-type P2 strain to SIRV2 (13 and this work). Interestingly, an inversion in sso2386 was detected in Res7 whereas no mutations were identified in Res9 that could be linked to SIRV2 resistance (Fig. 3B).
The frequently observed mutations in cluster sso3139 and sso3140 and cluster sso2386 and sso2387 in the resistant strains strongly suggest that the two gene clusters are important for the SIRV2 life cycle. To confirm the implication of the mutations in the gained resistance, genetic complementation was performed for the mutated genes. As described above, Sens1 appeared stable during genetic manipulation, and we thus selected Res1 for complementation of the sso3139 mutation. For complementation of mutations in the other gene cluster, Res1B, carrying an ISC1234 insertion in sso2387 (Fig. 3B), was isolated from SIRV2-infected Sens1. Res1 cells were transformed with vector pEXA2 containing sso3139, and Res1B cells were transformed with vector pEXA2 containing sso2386 and sso2387. After SIRV2 was added into the cultures, growth retardation occurred in the complemented cells, while the noncomplemented culture, transformed with the empty vector, showed a growth rate similar to that of the uninfected culture (Fig. 4A and B). Further, Southern hybridization (17) using a probe derived from the SIRV2 inverted terminal repeats (ITR) detected signals only from the complemented cells (Fig. 4C and D) and the multiple hybridized bands were consistent with ongoing replication (Fig. 4E) (10, 25). The absence of SIRV2 signal in the resistant strains indicates a defect in the virus life cycle.
FIG 2 Analysis of the pyrEF mutants derived from S. solfataricus 5E6. (A) Types and insertion sites of transposons inserted in the pyrEF gene region of different pyrEF mutants. (B) PCR amplification of the pyrEF region from Sens1 to Sens10 and from Res1 to Res10. wt, wild type.
SIRV2 Entry in Sulfolobus
To gain insights into the functions of the two gene clusters, the protein sequences of the genes were first analyzed by the use of the program TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/) for the possible presence of transmembrane helices. Sso3138, Sso3139, and Sso3140 were predicted to be primarily located extracellularly (see Fig. S1 in the supplemental material), correlating with a previous prediction of the presence of class III signal peptides at their N termini (26). Among these, Sso3140 was confirmed to be a membrane-associated protein in a proteomics study (27). In contrast to the other 3 proteins, Sso3141 was predicted to contain two transmembrane helices, one at the N terminus and the other at the C terminus, while the sequence between them was presumed to be located intracellularly. Therefore, it appears that the proteins encoded in the operon form a membrane-associated cell surface structure and may function as a receptor for SIRV2. Moreover, it was demonstrated recently that Sso2386 carries multiple transmembrane helices and that Sso2387 constitutes an ATPase associated with a type IV secretion system, and they were designated AapF and AapE, respectively (28). Further, homologs of both are essential for the formation of the adhesive type IV pilus of S. acidocaldarius (28). The association with the cell membrane of proteins encoded by both gene clusters strongly indicates their involvement in the entry process of SIRV2.
The failure of viral entry into Res1 and Res1B cells was further confirmed by reverse transcription-PCR analysis of one of the early genes, ORF131a (17). RNA extracted from cells taken at 15 min p.i. was DNase I treated and reverse transcribed (SuperScript II reverse transcriptase; Invitrogen). PCR performed on the cDNAs detected ORF131a only from Sens1 cells, while the positive-control sso0446 (tfb-1) gene was detected in all the 3 strains (Fig. 4F). This strongly supports the conclusion that the proteins encoded by the two gene clusters are involved in SIRV2 entry. A likely scenario is that gene cluster sso3138 to sso3141 encodes a surface receptor for SIRV2 and that gene cluster sso2386 and sso2387 is involved in the secretion of the receptor components.
Except in Escherichia coli, very few virus receptors are known in the domains of Bacteria and Archaea (29). The primary receptors for E. coli filamentous phages are pili which retract toward the cell surface, bringing the phages to the secondary receptor located in the periplasm (30). Linear archaeal viruses, including rudiviruses, have been observed to attach to pili (5, 31, 32). Future work is needed to determine the association of the two identified gene clusters with the structure of pili. To our knowledge, this is the first work providing genetic and biochemical evidence for a possible receptor system in archaeal virus entry.
FIG 3 Different mutations in the SIRV2-resistant strains and their stability. (A) Transposon insertions in sso3139 and sso3140. (B) Mutations in sso2386 and sso2387. (C) PCR amplification of the mutation region from different resistant strains.
Deng et al.
ACKNOWLEDGMENTS
We thank Roger A. Garrett for critically reading the manuscript. This work is supported by a European Union FP7 grant (265933).
REFERENCES
1. Prangishvili D, Forterre P, Garrett RA. 2006. Viruses of the Archaea: a unifying view. Nat. Rev. Microbiol. 4:837–848. http://dx.doi.org/10.1038/nrmicro1527.
2. Pina M, Bize A, Forterre P, Prangishvili D. 2011. The archeoviruses. FEMS Microbiol. Rev. 35:1035–1054. http://dx.doi.org/10.1111/j.1574-6976.2011.00280.x.
3. Peng X, Garrett RA, She Q. 2012. Archaeal viruses–novel, diverse and enigmatic. Sci. China Life Sci. 55:422–433. http://dx.doi.org/10.1007/s11427-012-4325-8.
4. Erdmann S, Scheele U, Garrett RA. 2011. AAA ATPase p529 of Acidianus two-tailed virus ATV and host receptor recognition. Virology 421:61–66. http://dx.doi.org/10.1016/j.virol.2011.08.029.
5. Quemin ER, Lucas S, Daum B, Quax TE, Kuhlbrandt W, Forterre P, Albers SV, Prangishvili D, Krupovic M. 2013. First insights into the entry process of hyperthermophilic archaeal viruses. J. Virol. 87:13379–13385. http://dx.doi.org/10.1128/JVI.02742-13.
6. Zillig W, Arnold HP, Holz I, Prangishvili D, Schweier A, Stedman K, She Q, Phan H, Garrett R, Kristjansson JK. 1998. Genetic elements in the extremely thermophilic archaeon Sulfolobus. Extremophiles 2:131–140. http://dx.doi.org/10.1007/s007920050052.
7. Fu CY, Johnson JE. 2012. Structure and cell biology of archaeal virus STIV. Curr. Opin. Virol. 2:122–127. http://dx.doi.org/10.1016/j.coviro.2012.01.007.
8. Häring M, Vestergaard G, Rachel R, Chen L, Garrett RA, Prangishvili D. 2005. Virology: independent virus development outside a host. Nature 436:1101–1102. http://dx.doi.org/10.1038/4361101a.
9. Bize A, Karlsson EA, Ekefjard K, Quax TE, Pina M, Prevost MC, Forterre P, Tenaillon O, Bernander R, Prangishvili D. 2009. A unique virus release mechanism in the Archaea. Proc. Natl. Acad. Sci. U. S. A. 106:11306–11311. http://dx.doi.org/10.1073/pnas.0901238106.
FIG 4 Cluster sso3138 to sso3141 and cluster sso2386 and sso2387 are involved in SIRV2 entry. (A) Growth retardation of sso3139-complemented Res1 upon SIRV2 infection. Res1 (pEXA), Res1 transformed with expression vector pEXA; Res1 (pEXA3139), Res1 transformed with expression vector pEXA containing sso3139. (B) Growth retardation of sso2386-and-sso2387-complemented Res1B upon SIRV2 infection. Res1B (pEXA), Res1B transformed with expression vector pEXA; Res1B (pEXA2386–2387), Res1B transformed with expression vector pEXA containing sso2386 and sso2387. (C and D) Visualization of SIRV2 DNA replication in Res1 (C) and Res1B (D) transformants infected with SIRV2. Plasmid constructs contained in the transformants are labeled on top of each lane, and the sampling time p.i. is indicated as hours. L and R designate the left and right terminal fragments, respectively, after a double digestion with BamHI and HindIII (see panel E). (E) Schematic presentation of SIRV2 genomic map and the formation of terminal duplex replicative intermediates (2L and 2R), as described previously (12). The locations of the probe (filled rectangle) in the termini are indicated. ITR, inverted terminal repeat. (F) RT-PCR amplification of sso0446 (tfb-1) (left panel) and SIRV2 ORF131a transcript fragments (right panel). “+” and “−” indicate the presence and absence of reverse transcriptase (RT), respectively.
10. Peng X, Blum H, She Q, Mallok S, Brugger K, Garrett RA, Zillig W,Prangishvili D. 2001. Sequences and replication of genomes of the ar-chaeal rudiviruses SIRV1 and SIRV2: relationships to the archaeal lipo-thrixvirus SIFV and some eukaryal viruses. Virology 291:226 –234. http://dx.doi.org/10.1006/viro.2001.1190.
11. Prangishvili D, Arnold HP, Gotz D, Ziese U, Holz I, Kristjansson JK,Zillig W. 1999. A novel virus family, the Rudiviridae: structure, virus-hostinteractions and genome variability of the sulfolobus viruses SIRV1 andSIRV2. Genetics 152:1387–1396.
12. Vestergaard G, Haring M, Peng X, Rachel R, Garrett RA, PrangishviliD. 2005. A novel rudivirus, ARV1, of the hyperthermophilic archaealgenus Acidianus. Virology 336:83–92. http://dx.doi.org/10.1016/j.virol.2005.02.025.
13. Vestergaard G, Shah SA, Bize A, Reitberger W, Reuter M, Phan H,Briegel A, Rachel R, Garrett RA, Prangishvili D. 2008. Stygiolobusrod-shaped virus and the interplay of crenarchaeal rudiviruses with theCRISPR antiviral system. J. Bacteriol. 190:6837– 6845. http://dx.doi.org/10.1128/JB.00795-08.
14. Blum H, Zillig W, Mallok S, Domdey H, Prangishvili D. 2001. Thegenome of the archaeal virus SIRV1 has features in common with genomesof eukaryal viruses. Virology 281:6 –9. http://dx.doi.org/10.1006/viro.2000.0776.
15. Prangishvili D, Koonin EV, Krupovic M. 2013. Genomics and biology ofRudiviruses, a model for the study of virus-host interactions in Archaea.Biochem. Soc. Trans. 41:443–450. http://dx.doi.org/10.1042/BST20120313.
16. Okutan E, Deng L, Mirlashari S, Uldahl K, Halim M, Liu C, Garrett RA,She Q, Peng X. 2013. Novel insights into gene regulation of the rudivirusSIRV2 infecting Sulfolobus cells. RNA Biol. 10:875– 885. http://dx.doi.org/10.4161/rna.24537.
17. Deng L, Zhu H, Chen Z, Liang YX, She Q. 2009. Unmarked genedeletion and host-vector system for the hyperthermophilic crenarchaeonSulfolobus islandicus. Extremophiles 13:735–746. http://dx.doi.org/10.1007/s00792-009-0254-2.
18. Martusewitsch E, Sensen CW, Schleper C. 2000. High spontaneousmutation rate in the hyperthermophilic archaeon Sulfolobus solfataricusis mediated by transposable elements. J. Bacteriol. 182:2574 –2581. http://dx.doi.org/10.1128/JB.182.9.2574-2581.2000.
19. Redder P, Garrett RA. 2006. Mutations and rearrangements in the ge-nome of Sulfolobus solfataricus P2. J. Bacteriol. 188:4198 – 4206. http://dx.doi.org/10.1128/JB.00061-06.
20. Blount ZD, Grogan DW. 2005. New insertion sequences of Sulfolobus: functional properties and implications for genome evolution in hyperthermophilic archaea. Mol. Microbiol. 55:312–325. http://dx.doi.org/10.1111/j.1365-2958.2004.04391.x.
21. Gudbergsdottir S, Deng L, Chen Z, Jensen JV, Jensen LR, She Q, Garrett RA. 2011. Dynamic properties of the Sulfolobus CRISPR/Cas and CRISPR/Cmr systems when challenged with vector-borne viral and plasmid genes and protospacers. Mol. Microbiol. 79:35–49. http://dx.doi.org/10.1111/j.1365-2958.2010.07452.x.
22. Erdmann S, Garrett RA. 2012. Selective and hyperactive uptake of foreign DNA by adaptive immune systems of an archaeon via two distinct mechanisms. Mol. Microbiol. 85:1044–1056. http://dx.doi.org/10.1111/j.1365-2958.2012.08171.x.
23. She Q, Singh RK, Confalonieri F, Zivanovic Y, Allard G, Awayez MJ, Chan-Weiher CC, Clausen IG, Curtis BA, De Moors A, Erauso G, Fletcher C, Gordon PM, Heikamp-de Jong I, Jeffries AC, Kozera CJ, Medina N, Peng X, Thi-Ngoc HP, Redder P, Schenk ME, Theriault C, Tolstrup N, Charlebois RL, Doolittle WF, Duguet M, Gaasterland T, Garrett RA, Ragan MA, Sensen CW, Van der Oost J. 2001. The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc. Natl. Acad. Sci. U. S. A. 98:7835–7840. http://dx.doi.org/10.1073/pnas.141222098.
24. Skovgaard O, Bak M, Lobner-Olesen A, Tommerup N. 2011. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing. Genome Res. 21:1388–1393. http://dx.doi.org/10.1101/gr.117416.110.
25. Oke M, Kerou M, Liu H, Peng X, Garrett RA, Prangishvili D, Naismith JH, White MF. 2011. A dimeric Rep protein initiates replication of a linear archaeal virus genome: implications for the Rep mechanism and viral replication. J. Virol. 85:925–931. http://dx.doi.org/10.1128/JVI.01467-10.
26. Szabó Z, Stahl AO, Albers SV, Kissinger JC, Driessen AJ, Pohlschröder M. 2007. Identification of diverse archaeal proteins with class III signal peptides cleaved by distinct archaeal prepilin peptidases. J. Bacteriol. 189:772–778. http://dx.doi.org/10.1128/JB.01547-06.
27. Pham TK, Sierocinski P, van der Oost J, Wright PC. 2010. Quantitative proteomic analysis of Sulfolobus solfataricus membrane proteins. J. Proteome Res. 9:1165–1172. http://dx.doi.org/10.1021/pr9007688.
28. Henche AL, Ghosh A, Yu X, Jeske T, Egelman E, Albers SV. 2012. Structure and function of the adhesive type IV pilus of Sulfolobus acidocaldarius. Environ. Microbiol. 14:3188–3202. http://dx.doi.org/10.1111/j.1462-2920.2012.02898.x.
29. Labrie SJ, Samson JE, Moineau S. 2010. Bacteriophage resistance mechanisms. Nat. Rev. Microbiol. 8:317–327. http://dx.doi.org/10.1038/nrmicro2315.
30. Rakonjac J, Bennett NJ, Spagnuolo J, Gagic D, Russel M. 2011. Filamentous bacteriophage: biology, phage display and nanotechnology applications. Curr. Issues Mol. Biol. 13:51–76. http://www.horizonpress.com/cimb/v/v13/51.pdf.
31. Zillig W, Kletzin A, Schleper C, Holz I, Janekovic D, Hain J, Lanzendörfer M, Kristjansson JK. 1993. Screening for Sulfolobales, their plasmids and their viruses in Icelandic Solfataras. Syst. Appl. Microbiol. 16:609–628. http://dx.doi.org/10.1016/S0723-2020(11)80333-4.
32. Bettstetter M, Peng X, Garrett RA, Prangishvili D. 2003. AFV1, a novel virus infecting hyperthermophilic archaea of the genus Acidianus. Virology 315:68–79. http://dx.doi.org/10.1016/S0042-6822(03)00481-1.