a model for the hepatitis b virus core protein: prediction of antigenic



The EMBO Journal vol.7 no.3 pp.819 - 824, 1988

A model for the hepatitis B virus core protein: prediction of antigenic sites and relationship to RNA virus capsid proteins

Patrick Argos¹ and Stephen D. Fuller²

¹Biocomputing Programme and ²Biological Structures Programme, EMBL, Postfach 10.2209, 6900 Heidelberg, FRG

Communicated by K. Simons

The sequences of the core proteins from several serotypes of human hepatitis B virus and related mammalian and avian hepadnaviruses are aligned with the vp3 capsid protein of mengo virus, a picornavirus. The homology indicates an eight-stranded antiparallel β-barrel fold for the hepatitis protein, as observed in the tertiary structure of the picornavirus protein. The locations of known antigenic sites and other modifications are consistent with this structure for the core protein. The predicted folding suggests additional exposed antigenic sites and supports an evolutionary relationship between this family of enveloped DNA viruses and enveloped and non-enveloped RNA viruses.

Key words: antigenic sites/evolution/hepatitis hepadnavirus/picornavirus/sequence homology

Introduction

Hepatitis B virus (HBV) belongs to a family of enveloped viruses, the hepadnaviruses, which employ unique replication strategies at several stages of their life cycle (Ganem and Varmus, 1987; Summers, 1981; Marion and Robinson, 1983; Tiollais et al., 1981, 1985). Unusual features include the replication of the viral DNA through reverse transcription of an RNA intermediate, the presence of at least two reading frames in more than half of the genome and the use of multiple ATG codons within a single frame to produce multiple related proteins (Ganem and Varmus, 1987; Summers, 1981; Marion and Robinson, 1983). Historically the proteins of the virus have been identified by their antigenicity. The S open reading frame gives rise to three translation products which share their carboxy-terminal sequence and a major antigenic determinant and are hence collectively named HBsAg. All three are integral membrane proteins incorporated into the outer envelope of the virion. The C open reading frame gives rise to two major translation products: the 21 kd capsid protein which assembles to form the viral core and the pre-C protein which is secreted from the cell in a processed 16 kd form, HBeAg. These two forms share most of their sequence but are antigenically distinct. Antibodies against the virus core antigen, HBcAg, show little crossreactivity against the circulating form of the 16 kd protein, HBeAg. Circulating HBeAg lacks the carboxy-terminal 35 amino acids of HBcAg (Takahashi et al., 1983) although the amino terminus is less well characterized. Harsh treatments of HBcAg can cause it to lose its reactivity with anti-HBcAg and acquire reactivity with anti-HBeAg (MacKay et al., 1981; Ohori et al., 1980, 1984; Takahashi et al., 1979).

©IRL Press Limited, Oxford, England

Despite its intricate replication strategy, the structure of the hepatitis virion appears similar to that of a simple enveloped RNA virus (Ganem and Varmus, 1987). The HBcAg encapsidates the partially double stranded DNA to generate a 27 nm icosahedral capsid which has been reported to display T = 3 symmetry (Onodera et al., 1982). The assembled core then interacts with viral membrane glycoproteins (the surface antigens, HBsAg) leading to envelopment and maturation of the complete virion.

Our understanding of HBV has been aided by comparative studies of closely related hepatitis viruses (Summers, 1981; Marion and Robinson, 1983) in Eastern woodchuck (WHV), Beechey ground squirrel (GSHV) and Pekin duck (DHV). These viruses share their overall organization, genome structure and many features of their replication strategy. There is sufficient sequence homology among the corresponding proteins to allow clear alignment and identification of conserved and variable regions. We have made use of a sensitive sequence alignment method (Argos, 1987) to identify an extensive sequence homology between the hepadnavirus capsid proteins and those of several picornavirus capsid proteins including the vp3 protein of mengo virus. This alignment is used to infer the folding of the hepadnavirus capsid protein from the known 3-dimensional structure of mengo virus vp3 (Luo et al., 1987). We then examine this predicted folding in terms of the available evidence for exposed sites on the viral capsid and use it to predict other regions which should be exposed in the structure. The homology also suggests a possible evolutionary relationship between positive stranded RNA viruses and enveloped DNA viruses utilizing an RNA intermediate for replication.

Results

Alignment of sequences

The homology search between the hepadnavirus core proteins and other viral capsid proteins was performed with a sensitive amino acid sequence comparison procedure (Argos, 1987) as described in Materials and methods.

Particularly good alignments were found with the sequence of the vp3 capsid protein of mengo virus (Luo et al., 1987) (Figure 1), poliovirus (Hogle et al., 1985), and human rhinovirus 14 (HRV) (Rossman et al., 1985). The quality of the mengo-HBcAg alignment is seen from the highest search peaks which are colinear with the main diagonal (Figure 1) and correspond to 4.6-4.9 standard deviations (σ) above the search matrix mean (Argos, 1987).

Figure 2 shows the alignment of the hepadnavirus core proteins with the vp3's of the picornaviruses for which high resolution structures are available (Luo et al., 1987; Hogle et al., 1985; Rossman et al., 1985). The boxes indicate conserved regions between the mengo and WHV sequences. Two or more of the hepadnavirus proteins display residue conservation with two or more of the picornavirus proteins in 48% of the aligned positions, which are marked by dots (:) below the sequences. The mengo-WHV homology is particularly striking; 23% of the 160 aligned residues are identical while 42% represent strongly conservative changes. The regions marked with bars in Figure 2 define the positions of the β-strands in the eight-stranded antiparallel β-barrel of the mengo virus vp3 (Luo et al., 1987).

[Figure 1: homology search matrix plot; horizontal axis, mengo virus vp3 residue number (0-234).]

Fig. 1. Homology search matrix between woodchuck hepatitis virus (WHV) core protein sequence and the mengo vp3 capsid protein sequence. The search window lengths ranged from 7 to 27 in steps of two; the search peaks are plotted over the entire window length with the largest value dominating when overlap occurs. The peaks are given as the number of standard deviations (σ) above the matrix mean for a given probe length. Symbols are used to indicate fractional σ ranges for the search scores. The highest peaks are colinear with the main diagonal and strongly suggest homology.

The significance of the alignment was tested in two ways. The first involved the statistics for the alignment shown in Figure 1. The mean correlation coefficient over five amino acid characteristics for the aligned residues was 5.5σ above a control mean while 4.0σ has been shown to be sufficient for statistical significance (Argos, 1987). The second control screened for a more significant alignment between WHV and all available sequences in the Protein Identification Resource (PIR databank; Barker et al., 1987). An automated procedure (see Materials and methods) was used to compare the sequence of WHV core protein to a database of 1010 independent and non-homologous sequences, with parameters appropriate for a full length mengo-WHV match. This control supported the hypothesis of a structurally meaningful alignment of WHV core protein with mengo vp3 since this alignment was in the top 2% of all pair comparisons. Visual inspection of the search matrices for all of these top alignments using the sensitive comparison procedure (Argos, 1987) revealed no convincing distribution of high peaks similar to the mengo-WHV match (Figure 1) where the three highest peaks lie on an approximately colinear trace. We conclude that the alignment is meaningful not only against the background of capsid viral sequences but also against that of all other known sequences.
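The identity and conservative-change percentages quoted for the mengo-WHV alignment can be tallied mechanically once two aligned sequences are in hand. The following is a minimal sketch, assuming the conservation groups listed in the legend to Figure 2 are the ones used for the tally (the text does not say so explicitly) and assuming gaps are written as `-`. The function name `tally_alignment` is hypothetical, and the short sequences in the usage line are invented for illustration, not taken from the paper.

```python
# Conservation groups as given in the legend to Figure 2.
GROUPS = ["KR", "ST", "QEDN", "AIVLMC", "PG", "FWYH"]
GROUP_OF = {res: idx for idx, group in enumerate(GROUPS) for res in group}


def tally_alignment(seq_a, seq_b, gap="-"):
    """Count aligned, identical and conservatively changed positions.

    Both sequences must already be aligned to the same length.
    Positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = conservative = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            # Different residues from the same conservation group.
            conservative += 1
    return aligned, identical, conservative


# Illustrative toy input only (not a real alignment).
aligned, identical, conservative = tally_alignment("KSA-F", "RSAGW")
print(aligned, identical, conservative)  # 4 2 2
```

Dividing the second and third counts by the first gives percentages comparable in form to those quoted above; whether the published 42% counts identities as well is not stated, so this sketch keeps the two counts separate.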


A model for the folding of the hepadnavirus core protein

The homologies suggest that the hepadnavirus core proteins will adopt the same β-strand architecture as the picornaviral proteins. The resultant predicted folding for the human HBV core protein based on the established 3-dimensional structure for mengo virus (Luo et al., 1987) is shown in Figure 3. The predicted structure is less elaborate than that of mengo virus vp3 and lacks the αA helix between βC and βD. The shorter loop regions on the left side of Figure 3 support the alignment as they would contribute to the tight interaction sites responsible for pentamer formation in a T = 3 virion (Luo et al., 1987; Hogle et al., 1985; Rossman et al., 1985; Fuller and Argos, 1987). A very striking difference from the picornavirus structure is in the non-homologous carboxy-terminal region of the core protein which is positively charged and presumably fulfills the same function as the N-terminal arm of the picornaviruses in interacting with the viral nucleic acid.

A stringent test of the model is consistency with the available information about antigenic sites in the capsid protein. During infection, antibodies are developed against both HBcAg and HBeAg. Since the HBeAg sequence is found in HBcAg, some crossreactivity is observed. Two complementary specificities (Williams and Le Bouvier, 1976) have been identified in antisera directed against HBeAg: HBeAg/1 and HBeAg/2. An antibody recognizing the HBeAg/1 epitope reacts with the assembled core protein while one directed against the HBeAg/2 epitope only reacts with the core protein after it has been subjected to harsh treatments such as heating at 56°C for 4 h or treatment with 3 M NH4SCN resulting in core disruption (MacKay et al., 1981; Ohori et al., 1980, 1984; Takahashi et al., 1979; Budkowski et al., 1979). Salfeld (1985) used analyses of the reactivity of fusion proteins and synthetic peptides to show that the HBeAg/1 epitope is contained within the sequence 76-89 (marked by e1 in Figures 2 and 3). The HBeAg/2 epitope involves sequences in the region 130-140 (e2 in Figures 2 and 3) although it appears that other portions of the molecule contribute to this epitope (Salfeld, 1985). The folding model of Figure 3 provides an explanation for the behavior of these epitopes. HBeAg/1 is found in the exposed region at the capsid surface between βE1 and βF and so would be accessible in intact cores. In contrast, HBeAg/2 is predicted to involve portions of β-strands and a loop region at the 5-fold interaction site which would be buried in the assembled core.

[Figure 2: aligned vp3 sequences of FMD, polio, HRV and mengo virus and core protein sequences of WHV, GSHV, HBV and DHV.]

Fig. 2. Alignment of picornaviral (FMD, polio, HRV, mengo) and hepadnavirus sequences. The boxes indicate conserved residues between mengo virus (Luo et al., 1987) and WHV (Galibert et al., 1982) using the conservation groups (KR), (ST), (QEDN), (AIVLMC), (PG) and (FWYH). Positions which are conserved among two or more of the picornavirus sequences and two or more of the hepadnavirus sequences are indicated by (:). The positions of the β-strands seen in the 3-dimensional structure of mengo virus vp3 are indicated by brackets above the sequence and named as in Luo et al. (1987). Features in the hepadnavirus sequences are also shown. The position of the inserted residues in DHV is indicated by D (GYLIQHEEAEEIPLGDLFKHQEERIVSFQPDYPITARIH, DHV residues 131-169) (Mandart et al., 1984). With this exception, the complete sequences of all of the proteins are displayed. The locations of the HBeAg/1 and HBeAg/2 epitopes (Salfeld, 1985) are marked by e1 and e2 below the sequences. The HBV sequence shown is that of serotype ayw (Galibert et al., 1979). Two positions of non-conservative changes in the sequenced human serotypes [HBV serotypes ayw (Galibert et al., 1979); adr4 (Fujiyama et al., 1983); adr, adw (Ono et al., 1983); adyw (Pasek et al., 1979)]; adw (Valenzuela et al., 1980) and the ayw variant (Bichko et al., 1985); DHV (Mandart et al., 1984); WHV (Galibert et al., 1982) and GSHV (Seeger et al., 1984) are indicated by lower case letters and marked with arrows. Residue 74 (v) varies among all of the human serotypes while residues 152 and 153 (rs) are the site of a two-residue insertion in several serotypes. Both of these changes lie outside the β-strand regions. The sequences of GSHV are taken from Seeger et al. (1984). The vp3 sequences for poliovirus, human rhinovirus 14 (HRV) and foot and mouth disease virus (FMD) are taken from Rossman et al. (1985).

Fig. 3. The predicted fold of the HBV core protein. The positions of the β-strands and the epitopes are marked as described in the legend to Figure 2. The positions of the serine phosphorylation sites inferred from Feitelson et al. (1981, 1982a,b) are indicated by P. The figure is similar to that for mengo virus vp3 in Figure 6 of Luo et al. (1987) except that certain regions have been altered to reflect the predicted deletions. The nucleic acid containing interior of the core lies below the figure; the external surface of the core is the upper portion of the structure. The 5-fold axis in a T = 3 icosahedron would be at the left of the figure where some shortening of the loops relative to the mengo virus (Luo et al., 1987) is seen. The chain trace is presented here as an indication of the predicted hepatitis core fold. The details of the loop regions are completely undetermined by this work.

Recent work has explored the response of T-cells to peptides derived from the HBcAg sequence to understand the basis for the differential response to HBcAg and HBeAg (Milich et al., 1987a,b). Two peptide sequences (120-140 and 85-100) were shown to possess HBcAg-specific T-cell determinants but the B-cell epitopes present on the peptides were not exposed on native HBcAg. Antibodies to HBcAg do not recognize these peptides. Correspondingly, antibodies raised by immunization with each peptide react with it, but not with intact HBcAg. These results are consistent with the predicted folding shown in Figure 3. The sequence 85-100 contains β-strand F, the beginning of β-strand G and the small intervening loop. The 120-140 peptide would correspond to β-strand I and the short intervening loop and would contain the e2 epitope described by Salfeld (1985). Both regions would be inaccessible in the folded structure of Figure 3. We also note that the removal of the carboxy-terminal 35 amino acids from HBcAg (Takahashi et al., 1983) would disrupt β-strand I and could contribute to the rearrangement of the structure which appears to have occurred in HBeAg.

Figure 3 shows the predicted folding for the mammalian hepadnavirus core proteins. The DHV core protein sequence contains an insertion of 39 amino acids which should be accommodated by the structure. The folding shown in Figure 3 would place the 39 residue insertion, marked by D, in the externally exposed loop between βE1 and βF. This is near the position of the HBeAg/1 epitope. This region is the site of large insertions in several of the picornaviruses (Luo et al., 1987; Hogle et al., 1985; Rossman et al., 1985) indicating that an insertion here would not change the overall folding of the protein. The effect of the insertion would be to maintain the same fold of the shell portion of the capsid protein while elaborating the surface of the core particle. This predicted protrusion from the capsid surface could explain the characteristic 'spikey' appearance of the DHV cores (Mason et al., 1980) which distinguishes them from their smoother, mammalian counterparts (Onodera et al., 1982).

The locations of the major phosphorylation sites in the core protein are also consistent with the Figure 3 model. The intact core is capable of phosphorylating the assembled capsid protein at sites which are believed to lie within the capsid (Albin and Robinson, 1980). Detailed analysis has shown that major phosphorylation sites are on serine residues and that the major tryptic phosphopeptides in HBV and GSHV have identical mobilities (Feitelson et al., 1981, 1982a,b). With knowledge of the two sequences (Galibert et al., 1979; Seeger et al., 1984), the phosphorylated peptides can be identified as HBV 168-172, 155-157 and 160-164 (or equivalently GSHV 202-206, 189-191 and 194-198). All of these sequences would lie just beyond the end of βI and hence be internal to the structure shown in Figure 3.
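The equivalence between the HBV and GSHV peptide numbering quoted above amounts to a constant shift in residue number. The short check below is a sketch, assuming the three peptides are listed in corresponding order in the two lists; the helper name `numbering_offsets` is hypothetical.

```python
# Phosphopeptide residue ranges as quoted in the text (inclusive).
HBV_PEPTIDES = [(168, 172), (155, 157), (160, 164)]
GSHV_PEPTIDES = [(202, 206), (189, 191), (194, 198)]


def numbering_offsets(ranges_a, ranges_b):
    """Return the set of start-position offsets between paired ranges.

    Raises ValueError if a pair of ranges differs in length, since a
    simple numbering shift cannot relate peptides of unequal size.
    """
    offsets = set()
    for (a_start, a_end), (b_start, b_end) in zip(ranges_a, ranges_b):
        if a_end - a_start != b_end - b_start:
            raise ValueError("paired peptides differ in length")
        offsets.add(b_start - a_start)
    return offsets


print(numbering_offsets(HBV_PEPTIDES, GSHV_PEPTIDES))  # {34}
```

A single offset of 34 for all three pairs is what one would expect if the two core proteins align without gaps across this carboxy-terminal region, with GSHV numbering running 34 residues ahead of HBV.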

Discussion

This paper presents evidence for a structural homology between the vp3 capsid protein of mengo virus and the core protein of the enveloped hepadnaviruses. The homology was shown to be significant against the background of alignments of the hepadnavirus protein with viral capsid sequences and all unique protein sequences in the PIR database (Barker et al., 1987). The alignment predicts a folding for the hepadnavirus core protein which is consistent with the available localization data on accessible and non-accessible sites, the location of insertions in the duck hepatitis sequence and the location of phosphorylation sites in the protein.

The model which we have derived is sufficiently detailed to allow prediction of sites which should be exposed at the surface of the intact core. The loop region between βE1 and βF, which contains the HBeAg/1 site, and that between βB1 and βB2 would be accessible at the surface of the core, and hence could serve as good antigenic sites on the intact core. The regions between βB2 and βC and between βC and βD can also be identified as exposed.

It is clear that nucleocapsid-specific immune responses are important in hepatitis B infection. This is true even though anti-HBcAg does not neutralize the intact HB virion. Immunization with HBcAg has been reported to protect against HBV infection (Gerety et al., 1979; Tabor and Gerety, 1984; Murray et al., 1984). The antigenic response to HBcAg is complex and differs from that to HBsAg, which is currently used as a vaccine. The humoral response to HBsAg is primarily T-cell mediated and shows great variation with a significant fraction of the human population responding poorly. Further, HBsAg appears to be a relatively poor inducer of cell-mediated response (Gerety et al., 1978, 1979). In contrast, the nucleocapsid is both a T-cell dependent and T-cell independent antigen (Milich and McLaughlan, 1986) to which virtually 100% of patients produce strong responses (Hoofnagle et al., 1978). The T-cell independent response to HBcAg has been ascribed to its assembled form since it shares with other T-cell independent antigens a high particle mol. wt and the presence of a repeating array of identical antigenic determinants (Milich and McLaughlan, 1986).


The response to the monomeric or degraded form of capsid is T-cell dependent. Gerety et al. (1978) have suggested that the protection afforded by vaccination with HBcAg may arise from its ability to induce a strong cellular response; Mondelli et al. (1982) have also implicated its role in HBV clearance. This response would be particularly important in halting an HBV infection once it had begun. Recent studies show that some of the T-cell response can be achieved by inoculation with peptides derived from the nucleocapsid protein sequence (Milich and McLaughlan, 1986; Milich et al., 1987a,b). The intact nucleocapsid remains, however, the most efficient antigen, more so than the peptides and some 100-fold more efficient than the particulate HBsAg (Milich et al., 1985, 1987a).

We hope that our folding model will provide a useful structural framework for understanding the complex and multivaried immune responses to HBcAg. It should also provide a basis for utilizing the demonstrated potential of HBcAg as an inducer of immune response. It has been proposed that HBsAg be used as a particulate carrier for multivalent vaccines (Delpeyroux et al., 1986). HBcAg has been demonstrated to be more efficient in terms of antibody production and T-cell activation and hence may provide a better antigen carrier (Milich et al., 1987b). The homology with viral proteins of known structure should allow a more rational design of modifications for the HBcAg structure to incorporate other sequences. In the case of HBsAg, much less structural information is presently available.

We have previously proposed homology between the nucleocapsid proteins of the enveloped alphaviruses (e.g. Sindbis) and those of the non-enveloped picornaviruses (Fuller and Argos, 1987). The suggestion of an evolutionary relationship between these two families of positive strand RNA viruses is further supported by their similar genome organizations and homology between their polymerases. This paper extends this relationship to the core proteins of hepadnaviruses such that a structural similarity is suggested between enveloped DNA and RNA viruses and non-enveloped simple RNA viruses. Is the observed similarity a result of convergent or divergent evolution? Despite the unique organization of the hepadnavirus genome, these viruses share some features with RNA viruses (Ganem and Varmus, 1987). Hepadnavirus replication utilizes an RNA intermediate and their polymerase is homologous with retroviral reverse transcriptases (Toh et al., 1983; Miller and Robinson, 1985). This homology in combination with other similarities including a U5-like sequence in HBV, a partial homology between the hepadnavirus core antigen and the p30 gag protein of type C retroviruses and the codon usage in the X open reading frame have led to the proposal that these two families share a common evolutionary origin (Miller and Robinson, 1985). The retrovirus reverse transcriptases are in turn partially homologous to the positive strand RNA virus polymerases (Kamer and Argos, 1984). Further, the observation that the reverse transcription occurs in association with the capsid suggests that the core does package RNA during this stage of the hepadnavirus life cycle. The divergent evolution perspective interprets these similarities as evidence that enveloped DNA and RNA viruses and non-enveloped RNA viruses arose from a common ancestor which possessed a basic icosahedral capsid. This precursor eventually developed the ability to be enveloped by budding from host cells. In the case of the DNA viruses, the incorporation of a reverse transcriptase allowed development from the enveloped RNA viruses. The convergent evolution perspective views the hepadnaviruses as facing the same problem of encapsidating nucleic acid in an icosahedral shell as simple enveloped and non-enveloped RNA viruses with an eight-stranded β-barrel being the best means to accomplish the task. Viewed from either perspective, the homology described here suggests that this structural motif may be employed by a much broader range of RNA and DNA viruses than previously suspected.

Materials and methods

The homology search between the hepadnavirus core proteins and other viral capsid proteins was performed with a sensitive amino acid sequence comparison procedure (Argos, 1987) which combines two measures of sequence homology: the 20 by 20 Dayhoff mutation matrix (Dayhoff et al., 1972) and the mean correlation coefficient over five residue characteristics including β-strand and reverse turn conformational preferences (Palau et al., 1982), residue bulk (Jones, 1975), amino acid refractivity index (Jones, 1975), and the surrounding hydrophobicity (Manavalan and Ponnuswamy, 1978). The combined search matrix for a given sequence window (probe length) was constructed by averaging and scaling these two measures; the matrix score is stored as the number of standard deviations (σ) above the matrix mean. The final search matrix was produced by calculating individual standard deviation matrices for each window length, displaying the scores at each matrix position over the entire probe length and saving the maximum score value when overlap occurred at a given search position from the several probe lengths. Such a matrix is displayed in Figure 1.
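The windowed search described above can be illustrated with a minimal Python sketch. It is not the procedure of Argos (1987): the pair score used here, `toy_similarity`, is a hypothetical identity score standing in for the averaged Dayhoff and residue-characteristic measures, and the function names and window lengths are assumptions made only for illustration. The sketch shows the three steps named in the text: scoring every ungapped window pair, expressing the scores as standard deviations above the matrix mean, and keeping the maximum value at each position when windows of several lengths overlap.

```python
from statistics import mean, pstdev


def toy_similarity(a, b):
    """Hypothetical pair score (NOT the Dayhoff matrix): 1.0 for identity, else 0.0."""
    return 1.0 if a == b else 0.0


def window_scores(seq_a, seq_b, window, pair_score=toy_similarity):
    """Score every ungapped window pair; entry [i][j] compares a[i:i+w] with b[j:j+w]."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [sum(pair_score(seq_a[i + k], seq_b[j + k]) for k in range(window))
         for j in range(cols)]
        for i in range(rows)
    ]


def to_sigma(matrix):
    """Express each score as the number of standard deviations above the matrix mean."""
    flat = [v for row in matrix for v in row]
    mu, sd = mean(flat), pstdev(flat)
    if sd == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - mu) / sd for v in row] for row in matrix]


def search_matrix(seq_a, seq_b, windows, pair_score=toy_similarity):
    """Keep, at each residue pair, the highest sigma score of any window covering it.

    Residue pairs that no window covers (e.g. far corners) remain at -inf.
    """
    best = [[float("-inf")] * len(seq_b) for _ in seq_a]
    for w in windows:
        if w > min(len(seq_a), len(seq_b)):
            continue  # window longer than a sequence: nothing to score
        sigma = to_sigma(window_scores(seq_a, seq_b, w, pair_score))
        for i, row in enumerate(sigma):
            for j, s in enumerate(row):
                # Spread the window score along the diagonal it spans.
                for k in range(w):
                    if s > best[i + k][j + k]:
                        best[i + k][j + k] = s
    return best
```

Calling `search_matrix("ACDEF", "GGCDEFGG", [3])` places the highest values along the diagonal where the shared segment lies. A more faithful version would replace `toy_similarity` with a function averaging a mutation-matrix score and a correlation over residue characteristics, which this sketch leaves open.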

Each of nine hepadnavirus sequences [HBV serotype ayw (Galibert et al., 1979); adr4 (Fujiyama et al., 1983); adr, adw (Ono et al., 1983); adyw (Pasek et al., 1979); adw (Valenzuela et al., 1980) and the ayw variant (Bichko et al., 1985); DHV (Mandart et al., 1984); WHV (Galibert et al., 1982) and GSHV (Seeger et al., 1984)] was compared with those of all identified viral capsid proteins in the National Biomedical Research Foundation protein sequence data bank (NBRF) (Barker et al., 1987). The sequences for the mengo virus (Luo et al., 1987), rhinovirus (Rossmann et al., 1985) and poliovirus (Hogle et al., 1985) capsid proteins used were those for which the 3-dimensional structure has been determined when they differed from those in the database.

The program MaxHomology by C.Sander (manuscript in preparation) was used to search for sequence alignments between WHV and a non-viral protein sequence. This automated procedure is based on the algorithm of Smith and Waterman (1981) and uses the amino acid similarity table of McLachlan (1971). The gap weight and sea level parameters which govern the number of insertions and deletions in the aligned sequences and the length of an optimal alignment were those appropriate for the mengo-WHV alignment shown in Figure 1. The protein sequences used were a homologized set of the PIR database containing one representative from each of over 1000 protein sequence families (P.McCaldon and P.Argos, manuscript submitted) so that the statistics represent comparisons with unique sequences.
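For readers unfamiliar with the Smith and Waterman (1981) algorithm on which the control search is based, the following Python sketch shows a basic local alignment with a linear gap weight. It is not the MaxHomology program: the match and mismatch scores are placeholder assumptions replacing the McLachlan (1971) similarity table, the gap weight value is arbitrary, and the sea level parameter mentioned above is not modelled.

```python
def smith_waterman(seq_a, seq_b, match=2.0, mismatch=-1.0, gap=2.0):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    match/mismatch are placeholder scores (an assumption, not the McLachlan
    table); gap is a linear penalty charged per inserted or deleted residue.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)

    # Fill the dynamic programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + s,   # align the two residues
                          H[i - 1][j] - gap,     # gap in seq_b
                          H[i][j - 1] - gap)     # gap in seq_a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these placeholder scores, `smith_waterman("GGCDEFGG", "ACDEF")` recovers the shared segment `CDEF`. Raising the gap weight discourages insertions and deletions, which is the role the text assigns to that parameter.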

Acknowledgements

We are pleased to acknowledge our colleague, Chris Sander from the Biocomputing Programme of EMBL, for his important contribution to this paper through detailed discussions, useful suggestions and for performing the control search of the database. The authors also acknowledge Don Ganem of the Departments of Microbiology and Medicine, University of California Medical Center in San Francisco for many useful discussions and encouragement during this work and Michael Nassal and Heinz Schaller of the Zentrum für Molekulare Biologie in Heidelberg for discussions and for directing us to the J.Salfeld thesis. We thank Patrick Charnay (EMBL) and Michael Nassal for critical readings of the manuscript and Christine Barber for help with the word processing of the text.

References

Albin,C. and Robinson,W.S. (1980) J. Virol., 34, 297-302.

Argos,P. (1987) J. Mol. Biol., 193, 385-396.

Barker,W.C., Hunt,L.T., George,D.G., Yeh,L.S., Chen,H.R., Blomquist,M.C., Seibel-Ross,E.L., Hong,M.K., Bair,J.K., Chen,S.L. and Ledley,R.S. (1987) Protein Identification Resource. Release no. 8, National Biomedical Research Foundation, Georgetown University, Washington, DC.

Bichko,V., Pushko,P., Dreilina,D., Pumpen,P. and Gren,E. (1985) FEBS Lett., 185, 208-212.

Budkowski,A., Kalinowskas,B. and Nowoslawski,A. (1979) J. Immunol., 123, 1415-1416.

Dayhoff,M.O., Eck,R.V. and Park,C.M. (1972) In Dayhoff,M.O. (ed.), Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Georgetown University, Washington, DC, Vol. 5, 89-99.

Delpeyroux,F., Chenciner,N., Lim,A., Malpiece,Y., Blondel,B., Crainic,R., Van der Werf,S. and Streeck,R.E. (1986) Science, 233, 472-475.

Eddleston,A.L.W. and Williams,R. (1974) Lancet, ii, 1543-1545.

Feitelson,M.A., Marion,P.L. and Robinson,W.S. (1981) J. Virol., 39, 447-454.

Feitelson,M.A., Marion,P.L. and Robinson,W.S. (1982a) J. Virol., 43, 687-696.

Feitelson,M.A., Marion,P.L. and Robinson,W.S. (1982b) J. Virol., 43, 741-748.

Fujiyama,A., Miyanohara,A., Nozaki,C., Yoneyama,T., Ohtomo,N. and Matsubara,K. (1983) Nucleic Acids Res., 11, 4601-4610.

Fuller,S.D. and Argos,P. (1987) EMBO J., 6, 1099-1105.

Galibert,F., Mandart,E., Fitoussi,F., Tiollais,P. and Charnay,P. (1979) Nature, 281, 646-650.

Galibert,F., Chen,T.N. and Mandart,E. (1982) J. Virol., 41, 51-65.

Ganem,D. and Varmus,H.E. (1987) Annu. Rev. Biochem., 56, 651-693.

Gerety,R.J., Tabor,E., Hoofnagle,J.H., Mitchell,F. and Barker,L.F. (1978) In Vyas,G.N., Cohen,S.N. and Schmid,R. (eds), Viral Hepatitis. Franklin Institute Press, Philadelphia, PA, pp. 121-138.

Gerety,R.J., Tabor,E., Purcell,R.H. and Tzerzar,F.J. (1979) J. Infect. Dis., 140, 642-648.

Hogle,J.M., Chow,M. and Filman,D.J. (1985) Science, 229, 1358-1365.

Hoofnagle,J.H., Seeff,L.B., Bales,Z.B., Gerety,R.J. and Tabor,E. (1978) In Vyas,G.N., Cohen,S.N. and Schmid,R. (eds), Viral Hepatitis. Franklin Institute Press, Philadelphia, PA, pp. 219-242.

Jones,D.D. (1975) J. Theoret. Biol., 50, 167-183.

Kamer,G. and Argos,P. (1984) Nucleic Acids Res., 12, 7269-7272.

Luo,M., Vriend,G., Kamer,G., Minor,I., Arnold,E., Rossmann,M.G., Boege,U., Scraba,D.G., Duke,G.M. and Palmenberg,A.C. (1987) Science, 235, 182-191.

MacKay,P., Lees,J. and Murray,K. (1981) J. Med. Virol., 8, 237-243.

Manavalan,P. and Ponnuswamy,P.K. (1978) Nature, 275, 673-674.

Mandart,E., Kay,A. and Galibert,F. (1984) Virology, 49, 782-792.

Marion,P. and Robinson,W.S. (1983) Current Topics Microbiol. Immunol., 105, 99-121.

Mason,W.S., Seal,G. and Summers,J. (1980) J. Virol., 36, 829-836.

McLachlan,A.D. (1971) J. Mol. Biol., 61, 409-424.

Milich,D.R., Louie,R.E. and Chisari,F.V. (1985) J. Immunol., 134, 4194-4200.

Milich,D.R. and McLaughlin,A. (1986) Science, 234, 1398-1401.

Milich,D.R., McLaughlin,A., Moriarty,A. and Thornton,G.B. (1987a) J. Immunol., 139, 1223-1231.

Milich,D.R., McLaughlin,A., Thornton,G.B. and Hughes,J.L. (1987b) Nature, 329, 547-549.

Miller,R. and Robinson,W.S. (1985) Proc. Natl. Acad. Sci. USA, 83, 2531-2535.

Mondelli,M., Vergani,G.M., Alberti,A., Vergani,D., Portman,B., Eddleston,A.W.F. and Williams,R. (1982) J. Immunol., 129, 2773-2777.

Murray,K., Bruce,S.A., Hinnen,A., Wingfield,P., van Erd,P.M.C.A., de Reus,A. and Schelleken,H. (1984) EMBO J., 3, 645-650.

Ohori,H., Yamaki,M., Onodera,S., Yamada,E. and Ishida,N. (1980) Intervirology, 13, 74-82.

Ohori,H., Shimizu,N., Yamada,E., Onodera,S. and Ishida,N. (1984) J. Gen. Virol., 65, 405-414.

Ono,Y., Onda,H., Sasada,R., Igarishi,K., Sugino,Y. and Nishioka,K. (1983) Nucleic Acids Res., 11, 1747-1757.

Onodera,S., Ohori,H., Yamaki,M. and Ishida,N. (1982) J. Med. Virol., 10, 147-155.

Palau,J., Argos,P. and Puigdomenech,P. (1982) Int. J. Pept. Protein Res., 19, 394-401.

Pasek,M., Goto,T., Gilbert,W., Zink,B., Schaller,H., MacKay,P., Leadbetter,G. and Murray,K. (1979) Nature, 282, 575-579.

Rossmann,M.G., Arnold,E., Erickson,J.W., Frankenberger,E.A., Griffith,J.P., Hecht,H.-J., Johnson,J.E., Kamer,G., Luo,M., Mosser,A.G., Rueckert,R.R., Sherry,B. and Vriend,G. (1985) Nature, 317, 145-153.

Salfeld,J. (1985) Dissertation, Ruprecht-Karls-Universität, Heidelberg. Kapitel 2. Struktur der antigenen Determinanten des HBc- und HBeAg. pp. 25-55.

Seeger,C., Ganem,D. and Varmus,H.E. (1984) Virology, 51, 367-375.

Smith,T.F. and Waterman,M.S. (1981) J. Mol. Biol., 147, 195-197.

Summers,J. (1981) Hepatology, 2, 179-183.

Tabor,E. and Gerety,R.J. (1984) Lancet, i, 172-173.

Takahashi,K., Akahane,J., Gotando,T., Miashiro,T., Imai,M., Miyakawa,Y. and Mayumi,M. (1979) J. Immunol., 122, 275-279.

Takahashi,K., Machida,A., Funatsu,G., Nomura,M., Usuda,S., Aoyogi,S., Tachiana,K. and Mizamoto,H. (1983) J. Immunol., 130, 2903-2907.

Tiollais,P., Charnay,P. and Vyas,G. (1981) Science, 213, 406-411.

Tiollais,P., Pourcel,C. and Dejean,A. (1985) Nature, 317, 489-495.

Toh,H., Hayashida,H. and Miyata,T. (1983) Nature, 305, 827-829.

Valenzuela,P., Quiroga,M., Zaldivar,J., Gray,P. and Rutter,W.J. (1980) In Field,B.N., Jaenisch,R. and Fox,C.F. (eds), Animal Virus Genetics. Academic Press, New York, pp. 57-70.

Williams,A. and Le Bouvier,G.L. (1976) Bibliotheca Haematologica, 42, 71-75.

Received on December 22, 1987
