lampreys, the jawless vertebrates, contain only two parahox … · lampreys, the jawless...

6
Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters Huixian Zhang a,b,1 , Vydianathan Ravi a,1 , Boon-Hui Tay a , Sumanty Tohari a , Nisha E. Pillai a , Aravind Prasad a , Qiang Lin b , Sydney Brenner a,2 , and Byrappa Venkatesh a,c,2 a Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Biopolis, Singapore 138673, Singapore; b Chinese Academy of Sciences (CAS) Key Laboratory of Tropical Marine Bioresources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China; and c Department of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore Contributed by Sydney Brenner, July 6, 2017 (sent for review March 20, 2017; reviewed by José Luis Gómez-Skarmeta and Nipam H. Patel) ParaHox genes (Gsx, Pdx, and Cdx) are an ancient family of develop- mental genes closely related to the Hox genes. They play critical roles in the patterning of brain and gut. The basal chordate, amphioxus, con- tains a single ParaHox cluster comprising one member of each family, whereas nonteleost jawed vertebrates contain four ParaHox genomic loci with six or seven ParaHox genes. Teleosts, which have experienced an additional whole-genome duplication, contain six ParaHox genomic loci with six ParaHox genes. Jawless vertebrates, represented by lam- preys and hagfish, are the most ancient group of vertebrates and are crucial for understanding the origin and evolution of vertebrate gene families. We have previously shown that lampreys contain six Hox gene loci. Here we report that lampreys contain only two ParaHox gene clusters (designated as α- and β-clusters) bearing five ParaHox genes (Gsxα, Pdxα, Cdxα, Gsxβ, and Cdxβ). The order and orientation of the three genes in the α-cluster are identical to that of the single cluster in amphioxus. However, the orientation of Gsxβ in the β-cluster is inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homo- logs in jawed vertebrates, which are expressed mainly in the brain. The lamprey Pdxα is expressed in the pancreas similar to jawed vertebrate Pdx genes, indicating that the pancreatic expression of Pdx was ac- quired before the divergence of jawless and jawed vertebrate lineages. It is likely that the lamprey Pdxα plays a crucial role in pancreas spec- ification and insulin production similar to the Pdx of jawed vertebrates. jawless vertebrate | jawed vertebrates | ParaHox gene cluster | pancreas T he ParaHox gene cluster is an evolutionary sister of the Hox gene cluster and comprises three physically linked homeobox gene families: Gsx (known as GSH in human), Pdx (aka Xlox or Ipf1), and Cdx. The clustered organization of these genes was first discovered in the cephalochordate amphioxus, although their origin predates the origin of sponges (1). They encode homeodomain- containing transcription factors that are involved in the patterning of the nervous system and the gut. Gsx is known to be involved in patterning of the brain (2), whereas Pdx is involved in the devel- opment of the invertebrate gut and vertebrate pancreas (3, 4). Cdx genes, in contrast, are involved in the development of the posterior gut, anus, and posterior neural tube (5). The cephalochordate amphioxus possesses a single ParaHox cluster containing three genes, Gsx, Xlox, and Cdx, arranged in that order, with Cdx present in the opposite orientation to the other two genes (6). In contrast to amphioxus, vertebrates possess additional members of ParaHox genes distributed on four or more chromo- some locations. For example, the human genome contains six ParaHox genes comprising an intact cluster (GSX1, PDX1, and CDX2) located on chromosome 13, a GSX2 (aka GSH2) located on chromosome 4, a CDX1 gene on chromosome 5, and a third CDX gene, termed CDX4, on chromosome X (Fig. 1) (7). The duplicate copies of genes flanking the paralogous ParaHox genes on the four human chromosomes provide support to the hypothesis that the four ParaHox gene loci are the result of two rounds of whole- genome duplication (WGD) at the base of the vertebrates (7). Cartilaginous fishes (Chondrichthyes) such as the lesser spotted dogfish, Scyliorhinus canicula; little skate, Leucoraja erinacea; and elephant shark, Callorhinchus milii; as well as the lobe-finned fish, coelacanth (Latimeria chalumnae), and spotted gar (Lepisosteus oculatus), a basal ray-finned fish (Actinopterygian), possess an ad- ditional Pdx gene (called Pdx2) linked to Gsx2 (Fig. 1) (810). In- terestingly, teleosts that have experienced an additional round of genome duplication (3R) possess only six ParaHox genes distrib- uted in six genomic loci (11, 12) (Fig. 1). As a consequence of the secondary gene loss, teleosts do not contain an intact cluster bearing the three ParaHox genes and lack the Pdx2 gene (Fig. 1). On the basis of the number and genomic organization of ParaHox genes in jawed vertebrates (gnathostomes), it can be inferred that the com- mon ancestor of gnathostomes contained at least seven ParaHox genes located on four chromosomes. Jawless vertebrates are the most ancient lineage of verte- brates. The extant jawless vertebrates (cyclostomes) are repre- sented by two lineages, the lampreys and hagfishes, which together form a monophyletic group (13). They possess primitive morphological features such as a single median nostrillocated dorsally compared with a pair of nostrils located ventrally in gnathostomes. In addition, they lack paired appendages, miner- alized tissues, hinged jaws, and a spleen (14). Although cyclo- stomes possess complex brains divided into forebrain, midbrain, and hindbrain, similar to in gnathostomes, some subregions of the cyclostome brains are not as elaborate as those seen in gnathostomes (15, 16). The intermediate phylogenetic position of cyclostomes between invertebrates and gnathostomes, and Significance Lampreys and hagfishes are the only living members of jawless vertebrates, the most ancient lineage of vertebrates, and are therefore a crucial group for understanding the evolution of vertebrates. ParaHox genes (Gsx, Pdx, and Cdx) are an impor- tant family of developmental genes that play critical roles in the patterning of brain, pancreas, and posterior gut of jawed vertebrates. Here we show that lampreys contain two ParaHox gene clusters compared with four ParaHox loci in most jawed vertebrates. The lamprey Gsxβ gene is expressed specifically in the eye, an unusual expression domain for Gsx genes. The pancreatic expression of the lamprey Pdx gene suggests the crucial role of Pdx in pancreas specification and insulin pro- duction evolved in the common ancestor of vertebrates. Author contributions: S.B. and B.V. designed research; H.Z., V.R., B.-H.T., S.T., N.E.P., and A.P. performed research; V.R., Q.L., S.B., and B.V. analyzed data; and B.V. wrote the paper. Reviewers: J.L.G.-S., Consejo Superior de Investigaciones Científicas and Universidad Pablo de Olavide; and N.H.P., University of California, Berkeley. The authors declare no conflict of interest. Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. KY563327KY563336). 1 H.Z. and V.R. contributed equally to this work. 2 To whom correspondence may be addressed. Email: [email protected] or [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1704457114/-/DCSupplemental. 91469151 | PNAS | August 22, 2017 | vol. 114 | no. 34 www.pnas.org/cgi/doi/10.1073/pnas.1704457114 Downloaded by guest on June 15, 2020

Upload: others

Post on 08-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lampreys, the jawless vertebrates, contain only two ParaHox … · Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters Huixian Zhanga,b,1, Vydianathan Ravia,1,

Lampreys, the jawless vertebrates, contain only twoParaHox gene clustersHuixian Zhanga,b,1, Vydianathan Ravia,1, Boon-Hui Taya, Sumanty Toharia, Nisha E. Pillaia, Aravind Prasada, Qiang Linb,Sydney Brennera,2, and Byrappa Venkatesha,c,2

aInstitute of Molecular and Cell Biology, Agency for Science, Technology and Research, Biopolis, Singapore 138673, Singapore; bChinese Academy of Sciences(CAS) Key Laboratory of Tropical Marine Bioresources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou510301, China; and cDepartment of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore

Contributed by Sydney Brenner, July 6, 2017 (sent for review March 20, 2017; reviewed by José Luis Gómez-Skarmeta and Nipam H. Patel)

ParaHox genes (Gsx, Pdx, and Cdx) are an ancient family of develop-mental genes closely related to the Hox genes. They play critical roles inthe patterning of brain and gut. The basal chordate, amphioxus, con-tains a single ParaHox cluster comprising one member of each family,whereas nonteleost jawed vertebrates contain four ParaHox genomicloci with six or seven ParaHox genes. Teleosts, which have experiencedan additional whole-genome duplication, contain six ParaHox genomicloci with six ParaHox genes. Jawless vertebrates, represented by lam-preys and hagfish, are the most ancient group of vertebrates and arecrucial for understanding the origin and evolution of vertebrate genefamilies. We have previously shown that lampreys contain six Hoxgene loci. Here we report that lampreys contain only two ParaHoxgene clusters (designated as α- and β-clusters) bearing five ParaHoxgenes (Gsxα, Pdxα, Cdxα, Gsxβ, and Cdxβ). The order and orientationof the three genes in the α-cluster are identical to that of the singlecluster in amphioxus. However, the orientation of Gsxβ in the β-clusteris inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homo-logs in jawed vertebrates, which are expressed mainly in the brain. Thelamprey Pdxα is expressed in the pancreas similar to jawed vertebratePdx genes, indicating that the pancreatic expression of Pdx was ac-quired before the divergence of jawless and jawed vertebrate lineages.It is likely that the lamprey Pdxα plays a crucial role in pancreas spec-ification and insulin production similar to the Pdx of jawed vertebrates.

jawless vertebrate | jawed vertebrates | ParaHox gene cluster | pancreas

The ParaHox gene cluster is an evolutionary sister of the Hoxgene cluster and comprises three physically linked homeobox

gene families: Gsx (known as GSH in human), Pdx (aka Xlox orIpf1), and Cdx. The clustered organization of these genes was firstdiscovered in the cephalochordate amphioxus, although their originpredates the origin of sponges (1). They encode homeodomain-containing transcription factors that are involved in the patterningof the nervous system and the gut. Gsx is known to be involved inpatterning of the brain (2), whereas Pdx is involved in the devel-opment of the invertebrate gut and vertebrate pancreas (3, 4). Cdxgenes, in contrast, are involved in the development of the posteriorgut, anus, and posterior neural tube (5).The cephalochordate amphioxus possesses a single ParaHox

cluster containing three genes, Gsx, Xlox, and Cdx, arranged in thatorder, with Cdx present in the opposite orientation to the other twogenes (6). In contrast to amphioxus, vertebrates possess additionalmembers of ParaHox genes distributed on four or more chromo-some locations. For example, the human genome contains sixParaHox genes comprising an intact cluster (GSX1, PDX1, andCDX2) located on chromosome 13, aGSX2 (akaGSH2) located onchromosome 4, a CDX1 gene on chromosome 5, and a third CDXgene, termed CDX4, on chromosome X (Fig. 1) (7). The duplicatecopies of genes flanking the paralogous ParaHox genes on the fourhuman chromosomes provide support to the hypothesis that thefour ParaHox gene loci are the result of two rounds of whole-genome duplication (WGD) at the base of the vertebrates (7).Cartilaginous fishes (Chondrichthyes) such as the lesser spotteddogfish, Scyliorhinus canicula; little skate, Leucoraja erinacea; and

elephant shark, Callorhinchus milii; as well as the lobe-finned fish,coelacanth (Latimeria chalumnae), and spotted gar (Lepisosteusoculatus), a basal ray-finned fish (Actinopterygian), possess an ad-ditional Pdx gene (called Pdx2) linked to Gsx2 (Fig. 1) (8–10). In-terestingly, teleosts that have experienced an additional round ofgenome duplication (3R) possess only six ParaHox genes distrib-uted in six genomic loci (11, 12) (Fig. 1). As a consequence of thesecondary gene loss, teleosts do not contain an intact cluster bearingthe three ParaHox genes and lack the Pdx2 gene (Fig. 1). On thebasis of the number and genomic organization of ParaHox genes injawed vertebrates (gnathostomes), it can be inferred that the com-mon ancestor of gnathostomes contained at least seven ParaHoxgenes located on four chromosomes.Jawless vertebrates are the most ancient lineage of verte-

brates. The extant jawless vertebrates (cyclostomes) are repre-sented by two lineages, the lampreys and hagfishes, whichtogether form a monophyletic group (13). They possess primitivemorphological features such as a single median “nostril” locateddorsally compared with a pair of nostrils located ventrally ingnathostomes. In addition, they lack paired appendages, miner-alized tissues, hinged jaws, and a spleen (14). Although cyclo-stomes possess complex brains divided into forebrain, midbrain,and hindbrain, similar to in gnathostomes, some subregions ofthe cyclostome brains are not as elaborate as those seen ingnathostomes (15, 16). The intermediate phylogenetic positionof cyclostomes between invertebrates and gnathostomes, and

Significance

Lampreys and hagfishes are the only living members of jawlessvertebrates, the most ancient lineage of vertebrates, and aretherefore a crucial group for understanding the evolution ofvertebrates. ParaHox genes (Gsx, Pdx, and Cdx) are an impor-tant family of developmental genes that play critical roles inthe patterning of brain, pancreas, and posterior gut of jawedvertebrates. Here we show that lampreys contain two ParaHoxgene clusters compared with four ParaHox loci in most jawedvertebrates. The lamprey Gsxβ gene is expressed specifically inthe eye, an unusual expression domain for Gsx genes. Thepancreatic expression of the lamprey Pdx gene suggests thecrucial role of Pdx in pancreas specification and insulin pro-duction evolved in the common ancestor of vertebrates.

Author contributions: S.B. and B.V. designed research; H.Z., V.R., B.-H.T., S.T., N.E.P., and A.P.performed research; V.R., Q.L., S.B., and B.V. analyzed data; and B.V. wrote the paper.

Reviewers: J.L.G.-S., Consejo Superior de Investigaciones Científicas and Universidad Pablode Olavide; and N.H.P., University of California, Berkeley.

The authors declare no conflict of interest.

Data deposition: The sequences reported in this paper have been deposited in the GenBankdatabase (accession nos. KY563327–KY563336).1H.Z. and V.R. contributed equally to this work.2To whom correspondence may be addressed. Email: [email protected] [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1704457114/-/DCSupplemental.

9146–9151 | PNAS | August 22, 2017 | vol. 114 | no. 34 www.pnas.org/cgi/doi/10.1073/pnas.1704457114

Dow

nloa

ded

by g

uest

on

June

15,

202

0

Page 2: Lampreys, the jawless vertebrates, contain only two ParaHox … · Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters Huixian Zhanga,b,1, Vydianathan Ravia,1,

their distinctive morphological features, make them a crucialgroup for understanding the origin of additional copies ofParaHox genes and their functional diversification in verte-brates. A single ParaHox cluster has been identified in twospecies of hagfish: the inshore hagfish (Eptatretus burgeri) andthe Atlantic hagfish (Myxine glutinosa) (17). Interestingly, thiscluster contains a Gsx and a Cdx gene with a pseudogenized Pdxgene in the intergenic region. The organization and comple-ment of ParaHox genes in lampreys remains unknown. In thisstudy, we have carried out an exhaustive search for ParaHoxgenes in the genome assemblies of two species of lamprey (theJapanese lamprey, Lethenteron japonicum, and the sea lamprey,Petromyzon marinus) and RNA-seq data from lamprey brain,eye, intestine, and pancreas. Our analyses provide evidence forthe presence of five ParaHox genes organized into two clustersin lampreys.

ResultsParaHox Genes in the Japanese Lamprey Genome. We searched thegermline genome assembly of the Japanese lamprey (jlampreygenome.imcb.a-star.edu.sg/) for ParaHox genes, using human, coelacanth,and elephant shark ParaHox proteins as TBLASTN queries.On the basis of a combination of sequence analysis and RACE

and/or reverse-transcription PCR (RT-PCR; SI Appendix, Ma-terials and Methods), we identified five ParaHox genes orga-nized in two clusters in the Japanese lamprey genome. Becausewe could not determine the orthology of these genes to Para-Hox genes in gnathostomes (see following sections), we designatedthe Gsx, Pdx, and Cdx genes on scaffold_1054+1722 as Gsxα, Pdxα,and Cdxα (α-cluster), and the Gsx and Cdx genes on scaffold_14 asGsxβ and Cdxβ (β-cluster) (Fig. 2). In addition to genome assemblysearches, we carried out degenerate PCR, using genomic DNA as atemplate to determine whether there were any additional ParaHoxgenes in Japanese lamprey (Materials and Methods). The de-generate PCRs were able to amplify fragments ofGsxα, Pdxα, Cdxα,and Cdxβ genes, but not Gsxβ, because of the divergence of itssequences corresponding to the forward primers and the presenceof a lamprey-specific intron (2.9 kb) in the region targeted by PCRprimers (Materials and Methods). Nevertheless, no new ParaHoxgenes were identified in the PCR products.

ParaHox Genes in the Sea Lamprey Genome. Searches for ParaHoxgenes in the genome assembly of the sea lamprey (www.ensembl.org/Petromyzon_marinus/Info/Index) identified Gsxβ and Cdxβgenes on scaffold_GL477479 (Fig. 2), and the first exon of Pdxαon scaffold_GL481998. The intergenic region of the sea lampreyGsxβ and Cdxβ genes spanned ∼22 kb and contained three gaps(100 bp, 377 bp, and 100 bp). We filled these gaps by genomicPCR (GenBank accession numbers: KX400882–KX400884) andnoted that the intergenic region does not contain a Pdx gene. It istherefore likely that the intergenic region of Japanese lampreyGsxβ and Cdxβ also does not contain a Pdx gene.

ParaHox Genes in Lamprey RNA-seq Data. To determine whetherthere are any more ParaHox genes that are not represented inthe genome assembly or in degenerate PCR products, we gen-erated RNA-seq data for four tissues (brain, eye, and intestinefrom Japanese lamprey pancreas from brook lamprey, Lampetraplaneri) that were identified as the main tissues expressingParaHox genes in lampreys by RT-PCR analysis (Expression Patternsof Lamprey ParaHox Genes). Searches of the assembled tran-scripts revealed transcripts for Gsxα in the brain, Gsxβ in the eye,Cdxα and Cdxβ in the intestine, and Pdxα in the pancreas. Wealso searched RNA-seq transcripts of sea lamprey embryos fromstages 23–25 and 26–28 (18) and identified fragments of tran-scripts for Cdxβ, Gsxα, and Pdxα (SI Appendix, Sequence File).We generated full-length cDNA sequence for sea lamprey Gsxα,Pdxα, and Cdxβ by 5′/3′-RACE, using total RNA from embryostages 26–28 as a template (Materials and Methods). Overall, ouranalysis of genome and transcriptome sequences indicate thatlampreys contain only five ParaHox genes (Gsxα, Gsxβ, Pdxα,Cdxα, and Cdxβ) that are organized in two clusters (Fig. 2).

Phylogenetic Analysis and Synteny. To determine the orthology oflamprey and gnathostome ParaHox genes, we carried out phy-logenetic analysis (Bayesian Inference) of ParaHox genes fromlamprey, hagfish, and representative gnathostomes, with am-phioxus as the outgroup. However, phylogenetic analysis was notinformative, as the lamprey and hagfish (cyclostomes) genesgenerally formed an exclusive cluster outside the gnathostomeclades (SI Appendix, Fig. S1). This pattern of independent clus-tering of cyclostome sequences outside gnathostomes sequencessuggests the duplication of cyclostome sequences occurred in-dependent from the duplication event in gnathostomes. How-ever, this clustering pattern is likely to be an artifact because ofthe high GC-content of the lamprey genome, which affects thecodon use pattern and amino acid composition of lampreyprotein-coding sequences. As a result, sequences of lampreyparalogues are more similar to each other than to their orthologsin gnathostomes. A similar pattern of exclusive clustering hasbeen previously observed for several lamprey genes such as Hox,

Fig. 1. ParaHox gene loci in jawed vertebrates (gnathostomes). Organiza-tion of ParaHox gene loci in representative gnathostomes. Genes aredepicted as block arrows, with the direction of arrows denoting the transcrip-tional orientation. The star represents the whole-genome duplication in theteleost lineage. GSX gene in human is known as GSH. Chromosomal/scaffoldnumbers of the ParaHox genes are shown at the right of each locus.

Zhang et al. PNAS | August 22, 2017 | vol. 114 | no. 34 | 9147

EVOLU

TION

Dow

nloa

ded

by g

uest

on

June

15,

202

0

Page 3: Lampreys, the jawless vertebrates, contain only two ParaHox … · Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters Huixian Zhanga,b,1, Vydianathan Ravia,1,

KCNA, and the p53 family (19–21). Thus, phylogenetic analysiswas not useful in assigning orthology between lamprey and gna-thostome ParaHox genes. Nevertheless, phylogenetic analysis wasable to infer the orthology between Japanese lamprey and sea lam-prey sequences and between lamprey and hagfish sequences withhigh statistical support (posterior probability values, 0.96–1.0; SIAppendix, Fig. S1). Thus, we infer that the hagfish Gsx and Cdx genes(SI Appendix, Fig. S2) are orthologs of Japanese lamprey Gsxα andCdxα, respectively.Japanese lamprey Gsxβ and Cdxβ genes are located on a large

scaffold (5.3 Mb) and are flanked by several genes. We comparedthe synteny of the flanking genes with those flanking various gna-thostome ParaHox genes to see whether synteny provides any clueto the orthology of these lamprey ParaHox genes. However, thecombination of paralogous genes flanking the lamprey Gsxβ andCdxβ genes is not found in any gnathostome ParaHox locus. Forexample, homologs of lamprey Pan3, Mtif3, and Gtf3a found in thislocus (scaffold_14) are present in the gnathostome Gsx1-Pdx1-Cdx2locus (SI Appendix, Fig. S2), whereas homologs of lamprey Kit andRasl11b, also found on this locus (scaffold_14), are present in thegnathostome (e.g., elephant shark) Gsx2-Pdx2 locus (SI Appendix,Fig. S2). This mixed pattern of syntenic genes in the lamprey sug-gests paralogous genes flanking the ParaHox genes were lost in-dependently in the lamprey after it diverged from the gnathostomelineage. As such, synteny is also not useful in informing theorthology of lamprey and gnathostome ParaHox genes.

Genomic Organization of Lamprey ParaHox Genes. The Japaneselamprey Gsxα and Pdxα are on the same strand of DNA, whereasCdxα is on the opposite strand, similar to the organization of Gsx,Pdx, and Cdx genes in amphioxus and gnathostomes. However, theJapanese lampreyGsxβ gene is on the same strand as Cdxβ, which isan unusual arrangement. The inverted orientation ofGsxβ indicatesthis gene underwent a local inversion in the lamprey lineage. Theprotein encoded by the lamprey Gsxβ contains several amino acidsubstitutions in the homeodomain (14 of 60) that is nearly totallyconserved in the Gsx proteins of all other chordates, including Gsxαof lamprey (Fig. 3). These substitutions might have altered theDNA binding specificity of the lamprey Gsxβ protein. The lampreyGsxβ, Cdxβ, and Pdxα genes are also unusual, in that they eachcontain an extra intron compared with their homologs in gnathos-tomes (Fig. 3 and SI Appendix, Fig. S3).

Expression Patterns of Lamprey ParaHox Genes. We examined theexpression pattern of the Japanese lamprey ParaHox genes in

the brain, eye, and intestine by RT-PCR. Both Cdxα and Cdxβgenes were found to express in the intestine (Fig. 4), similar toCdx1, Cdx2, and Cdx4 genes of mammals (22) and Cdx1a andCdx1b of zebrafish (23, 24). The Japanese lamprey Gsxα showedexpression in the brain (Fig. 4), similar to Gsx1 and Gsx2 genes ofmouse (25, 26) and zebrafish (27, 28). However, the Japaneselamprey Gsxβ did not show expression in the brain, but instead wasfound to express in the eye, in addition to a low level of expressionin the intestine (Fig. 4). None of the vertebrate ParaHox genescharacterized so far have been shown to express in the eye. Thus,the eye expression of lamprey Gsxβ seems to be an expressiondomain specific to the lamprey lineage. In gnathostomes, Pdx genesexpress specifically in the pancreatic tissue (4). Because we did nothave access to the pancreatic tissue of the Japanese lamprey, weobtained cDNA from the pancreatic tissue of postmetamorphicjuvenile sea lamprey and checked for the expression of Pdxα. Thesea lamprey Pdxα was indeed found to express in this tissue (Fig. 4).

Conserved Noncoding Elements. Noncoding elements conserved indistant vertebrates have been shown to often function as en-hancers (20, 29, 30). To determine whether the cyclostomeParaHox gene loci contain any evolutionarily conserved regula-tory elements, we generated MLAGAN alignments of ParaHoxloci from lamprey, hagfish, and selected gnathostomes (elephantshark, human, coelacanth, spotted gar, and zebrafish) and pre-dicted conserved noncoding elements (CNEs) that are at least70% identical across >100-bp windows (Materials and Methods).Because the orthologous relationship between the cyclostomeand gnathostome ParaHox genes was not clear, we aligned theJapanese lamprey and hagfish Gsxα locus with the gnathostomeGsx1, as well as Gsx2 locus and predicted CNEs. Both lampreyand hagfish Gsxα loci were found to share a CNE with thegnathostome Gsx1 locus (Fig. 5A), but none with the gnathos-tome Gsx2 locus (SI Appendix, Fig. S4). The sharing of a CNEbetween the Japanese lamprey and hagfish Gsxα locus providesfurther support to our inference based on phylogenetic analysisthat these loci are indeed orthologous. The CNE identified in thelamprey and hagfish loci is actually part of a previously reportedGsx1 enhancer (#hs532) that is highly conserved in mouse andhuman and has been shown to drive expression of a reporter geneto the forebrain, midbrain, and hindbrain in embryonic day (E)11.5transgenic mouse embryos (VISTA Enhancer Browser: https://enhancer.lbl.gov/). To check whether lamprey and hagfish share anyunique CNEs, we aligned the Gsxα-Pdxα-Cdxα locus of Japaneselamprey with the Gsxα-Cdxα locus of hagfish and predicted CNEs

Fig. 2. ParaHox gene loci in the Japanese (A) and sea lamprey (B). The black rectangles denote the exons. The transcriptional orientation of the genes is indicated bythe arrows. Gray circles represent the ends of scaffolds/contigs. The gap between Japanese lamprey scaffolds 1054 and 1722 was filled (dotted line) by genomic PCR.

9148 | www.pnas.org/cgi/doi/10.1073/pnas.1704457114 Zhang et al.

Dow

nloa

ded

by g

uest

on

June

15,

202

0

Page 4: Lampreys, the jawless vertebrates, contain only two ParaHox … · Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters Huixian Zhanga,b,1, Vydianathan Ravia,1,

(SI Appendix, Fig. S5). Although we found an element unique to thetwo species (second CNE in SI Appendix, Fig. S5A), it turned out tobe a repetitive sequence (SI Appendix, Fig. S5B).To examine whether the second ParaHox locus (Gsxβ-Cdxβ

locus) in Japanese lamprey and sea lamprey share any CNEs withthe gnathostome ParaHox gene loci, we aligned the lamprey lociwith each of the four ParaHox gene loci of the gnathostomes andpredicted CNEs. However, no CNE was found to be sharedbetween these lamprey loci and the gnathostome Gsx1 and Gsx2loci (SI Appendix, Fig. S6 A and B).

Functional Assay of CNE. To investigate whether the single CNEconserved in the Japanese lamprey Gsxα-Pdxα-Cdxα locus is anenhancer, we generated transgenic zebrafish bearing sequencesencompassing the lamprey CNE (964 bp) linked to a GFP reporter.As a control, we assayed the homologous CNE region (670 bp)from the zebrafish gsx1 locus. In situ hybridization studies haveshown that zebrafish Gsx1 is expressed in the diencephalon, mes-encephalon (midbrain), and rhombomeres (hindbrain) (27). Thezebrafish gsx1 CNE was able to drive reproducible expression in thehindbrain starting from the midbrain–hindbrain boundary and inparts of the midbrain of F1 transgenic zebrafish embryos (Fig. 5B),thereby recapitulating part of the expression pattern of the zebrafishgsx1 gene (27). In addition, high levels of GFP expression were alsoseen in branchial arches of transgenic zebrafish embryos (Fig. 5B).This is likely to be ectopic expression driven by the CNE construct.In contrast to the zebrafish CNE construct, the lamprey CNE failedto drive any recognizable GFP expression in F0 as well asF1 transgenic zebrafish embryos. Thus, it appears that the zebrafishtranscription factors are unable to recognize the lamprey CNE,which is shorter and more divergent compared with its orthologousCNE from zebrafish and other gnathostomes. Testing the lampreyCNE in a transgenic lamprey system should clarify whether theyindeed function as enhancers in lamprey.

DiscussionOur analysis of genome assemblies of two species of lamprey(Japanese lamprey and sea lamprey) and transcriptomes fromlamprey brain, eye, intestine, and pancreas identified a total of fiveParaHox genes (Gsxα, Pdxα, Cdxα, Gsxβ, and Cdxβ). These geneswere found to be organized in two clusters (α- and β-clusters) in theJapanese lamprey genome. This is in contrast to the four ParaHoxgenomic loci containing six or seven genes in nonteleost gnathos-tomes and six ParaHox genomic loci, each with one ParaHox genein teleosts (Fig. 1). The α-cluster of lamprey is a complete clustercomprising a Gsx, a Pdx, and a Cdx gene similar to the A-cluster ofnonteleost gnathostomes and the single ParaHox cluster in am-phioxus (Fig. 1). Furthermore, this lamprey ParaHox cluster and its

hagfish ortholog share a CNE with the A-cluster of gnathostomes.Thus, the α-clusters of lamprey and hagfish are likely to be ortho-logs of the gnathostome A-cluster.The β-cluster of lamprey contains a Gsx (Gsxβ) and a Cdx

(Cdxβ) gene, a combination not found in any jawed vertebrateParaHox gene loci. Thus, lamprey seems to have retained adifferent complement of ParaHox genes after the WGDs thatgave rise to multiple ParaHox gene loci in vertebrates. In-terestingly, unlikeGsx genes of gnathostomes that express mainlyin the brain (2, 31), the lamprey Gsxβ gene is expressed mainly inthe eye (Fig. 4). In mammals, Gsx genes are expressed in theprogenitor cells of the embryonic brain and are involved in thespecification of the lateral ganglionic eminence progenitors,which in turn give rise to striatal projection neurons and olfac-tory bulb interneurons (32). It is likely that the lamprey Gsxα,which expresses in the brain, performs a similar function. How-ever, the function of the lamprey Gsxβ which is expressed in theeye, is unclear and remains to be investigated. In addition to eye,Gsxβ is also expressed in the intestine at very low levels (Fig. 4), a

Fig. 3. Comparison of intron positions in the Gsx/Gsh genes of gnathostomes and jawless vertebrates. Multiple alignments of Gsx/Gsh proteins from representativegnathostomes (human, coelacanth, spotted gar, and elephant shark), jawless vertebrates (Japanese lamprey, sea lamprey, and inshore hagfish), and the cepha-lochordate amphioxus showing a selected region of the alignment for the Gsx/Gsh family. Intron positions are shown as arrowheads. The numbers next to the ar-rowheads indicate intron phases. Introns identified in the lampreys are shown as red arrows. The amino acid positions of the human sequences are shown. The boxedregion denotes the homeodomain. The unique amino acid substitutions in the Hox domain of lamprey Gsxβ are shown in red font.

Fig. 4. Expression pattern of lamprey ParaHox genes. RT-PCR analysis of lampreyParaHox genes. Liver cDNA was used as a negative control. β-actin was amplifiedas an internal control to assess the quality of the cDNA. Because we did not haveaccess to pancreatic tissue from Japanese lamprey, we used cDNA from thepancreatic tissue of sea lamprey for RT-PCR of Pdxα (gel image on the Right).

Zhang et al. PNAS | August 22, 2017 | vol. 114 | no. 34 | 9149

EVOLU

TION

Dow

nloa

ded

by g

uest

on

June

15,

202

0

Page 5: Lampreys, the jawless vertebrates, contain only two ParaHox … · Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters Huixian Zhanga,b,1, Vydianathan Ravia,1,

domain in which Cdx genes are known to express at high levels.In the ParaHox clusters of gnathostomes and amphioxus, Gsxgenes are always located at the 5′ end of the cluster and areexpressed in the brain, whereas Cdx genes are found at the 3′ endof the cluster and are typically expressed in the hindgut. Theunusual expression of the lamprey Gsxβ in the intestine could bea result of the inversion of this gene in the lamprey lineage,which might have brought its promoter in close proximity to theintestine enhancer of Cdxβ, resulting in the somewhat leaky ex-pression of Gsxβ in the intestine.Most gnathostomes contain four paralogous loci for several

genes, including the Hox genes and ParaHox genes, whereasamphioxus, a basal chordate, contains a single locus for thesegenes. The quadruple loci in gnathostomes have been attributedto the two rounds of WGD during the early evolution of verte-brates (33). We have previously shown that the Japanese lam-prey contains at least six Hox gene loci, in contrast to four Hoxclusters in most gnathostomes (20), suggesting the Hox clustersmay have experienced an additional round of duplication in thelamprey lineage (20). However, the Japanese lamprey containsonly two ParaHox gene loci. This suggests the duplication eventthat gave rise to the six Hox loci did not include the ParaHoxcluster, implying that the additional Hox loci are not the result ofWGD, but are instead a result of segmental duplication or du-plications, a scenario recently proposed based on the meioticmap of the sea lamprey genome (34). According to this study, thejawless vertebrate and gnathostome lineages experienced onlyone round of WGD before their separation, which was followedby several independent segmental duplications in the two

lineages (34). However, an alternative possibility is that thelamprey ParaHox clusters duplicated along with the Hox clustersas part of the WGD events, but subsequently experienced sec-ondary losses, resulting in the retention of only five ParaHoxgenes in two loci. This possibility is supported by the presence ofsome Japanese lamprey paralogues whose gnathostome homo-logs are linked to the additional ParaHox gene loci. For example,gnathostome Pdgfra and Kit genes are linked to the ParaHox“B-cluster,” whereas their paralogues, Pdgfrb and Csf1r, are linkedto the ParaHox “C-cluster” (SI Appendix, Fig. S2). The Japaneselamprey genome contains homologs of both Kit and Csf1r(SI Appendix, Fig. S7A). Although the lamprey Kit is linkedto the β-cluster (SI Appendix, Fig. S2), Csf1r is located onscaffold_154 that does not contain any ParaHox gene. The Japa-nese lamprey genome also contains two Pdgfr genes (SI Appendix,Fig. S7B), located on scaffold_1691 and scaffold_22, that do notharbor any ParaHox genes. The presence of Kit and Csf1r, and twoPdgfr paralogues in the lamprey genome, suggests the ancestor ofJapanese lamprey contained more than two ParaHox clusters, andthe ParaHox genes in the additional clusters were subsequentlylost in the lamprey lineage. Thus, it seems likely that the ParaHoxloci in the lamprey lineage underwent at least two rounds of du-plication, along with the Hox clusters, as a result of WGD events,and subsequently only two ParaHox loci were retained in the lam-prey lineage because of extensive secondary loss of ParaHox genes.The homolog of the Pdx gene in invertebrates, known as Xlox,

is expressed in the gut, whereas Pdx in gnathostomes is expressedin the pancreas, an organ specific to vertebrates. In mammals,Pdx1 plays a crucial role in the development of the pancreas,

Fig. 5. Potential cis-regulatory elements in the ParaHox gene loci. (A) VISTA plot of the MLAGAN alignment of the Gsx1-Pdx1-Cdx2 locus from representativegnathostomes (elephant shark, human, coelacanth, spotted gar, zebrafish) with the Gsxα-Pdxα-Cdxα locus from Japanese lamprey and Gsxα-Cdxα locus fromthe inshore hagfish. The elephant shark sequence was used as the base (x-axis). The y-axis represents percentage sequence identity. Blue peaks representconserved exons, whereas pink peaks represent CNEs. The Japanese lamprey CNE is 99 bp long and 75.8% identical, whereas the hagfish CNE is 123 bp longand 76.4% identical. (B) 72-hpf F1 transgenic zebrafish embryos showing GFP expression driven by a CNE identified in the zebrafish gsx1 locus. hb, hindbrain;mb, midbrain. This expression was observed in embryos derived from four independent founders.

9150 | www.pnas.org/cgi/doi/10.1073/pnas.1704457114 Zhang et al.

Dow

nloa

ded

by g

uest

on

June

15,

202

0

Page 6: Lampreys, the jawless vertebrates, contain only two ParaHox … · Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters Huixian Zhanga,b,1, Vydianathan Ravia,1,

differentiation of beta cells, and maintenance of their function(35), and is therefore vital for insulin production. Inactivation ofPdx1 in beta cells of mouse is known to result in maturity-onsetdiabetes (36) whereas mutations in human PDX1 (also known asIPF1) result in pancreatic agenesis (37). Characterization of aParaHox cluster in hagfish had shown that the Pdx gene in thislocus has become a pseudogene (17). Our study shows that theortholog of this Pdx gene in lampreys is intact and expresses inthe pancreas. Thus, the Pdx gene in lamprey is likely to befunctional. In contrast to hagfish, whose islet organ (endocrinepancreas) comprises scattered follicles, lampreys contain discreteislet organs (38). The presence of a functional Pdx gene inlampreys supports the notion that the development of its discreteislet organ, as well as insulin production, is mediated by Pdxsimilar to that in gnathostomes. In humans, PDX1 functionsin pancreas as part of a network comprising insulin, glucagon,SLC2A2, FOXA2, HNF1A, NEUROD1, PRKACB, and PRKACG(STRING database, https://string-db.org/). The brook lamprey pan-creas RNA-seq data contains transcripts (SI Appendix) for genesencoding homologs of all these proteins except PRKACG. Thus,although lampreys do not possess a distinct pancreatic gland similarto jawed vertebrates, most of the genes associated with the Pdx-network of jawed vertebrate pancreas are present and expressed inthe islet organ of lampreys. A previous study had shown that carti-laginous fishes possess a three-hormone (insulin, glucagon, and so-matostatin) pancreas and that the four-hormone (the three pluspancreatic polypeptide) pancreas found in coelacanth and tetrapodswas a later innovation (39). The brook lamprey pancreas RNA-seq

data includes transcripts (SI Appendix) for the three hormonespresent in the pancreas of cartilaginous fishes, indicating that thethree hormone pancreas had evolved in the common ancestor ofall vertebrates.

Materials and MethodsThe Japanese lamprey and sea lamprey genome assemblies were searched forParaHox genes by TBLASTN, using selected vertebrate ParaHox proteins asqueries. RNA-seq reads for the Japanese lamprey brain, eye, and intestine andbrook lamprey pancreas were generated on the Illumina platform and as-sembled using Trinity (40). To look for additional ParaHox genes in the ge-nome of the lamprey, degenerate PCR was carried out using genomic DNAas a template. Details of the primers used and PCR conditions are given inthe SI Appendix. Phylogenetic analysis of ParaHox protein sequences wereperformed using Bayesian Inference as implemented in MrBayes (version3.2.6; mrbayes.sourceforge.net/). CNEs in the ParaHox loci were predicted byaligning the sequences with MLAGAN and visualized using VISTA (genome.lbl.gov/vista/index.shtml). The Japanese lamprey and zebrafish CNEs wereassayed for enhancer function in transgenic zebrafish using GFP as a re-porter. Further details of the materials and methods are given in theSI Appendix.

Detailed SI Appendix, Materials and Methods, SI Appendix, Sequence File,SI Appendix, Tables S1 and S2, and SI Appendix, Figs. S1–S10 are provided inthe SI Appendix.

ACKNOWLEDGMENTS. We thank Michael Schorpp and Thomas Boehm forproviding pancreatic tissue of brook lamprey, and Jonathan M. Wilson forsea lamprey pancreas cDNA. This work was supported by the BiomedicalResearch Council of the Agency for Science, Technology and Research ofSingapore.

1. Fortunato SAV, et al. (2014) Calcisponges have a ParaHox gene and dynamic ex-pression of dispersed NK homeobox genes. Nature 514:620–623.

2. Valerius MT, et al. (1995) Gsh-1: A novel murine homeobox gene expressed in thecentral nervous system. Dev Dyn 203:337–351.

3. Johnson JD, et al. (2003) Increased islet apoptosis in Pdx1+/- mice. J Clin Invest 111:1147–1160.

4. Jonsson J, Carlsson L, Edlund T, Edlund H (1994) Insulin-promoter-factor 1 is requiredfor pancreas development in mice. Nature 371:606–609.

5. Gamer LW, Wright CVE (1993) Murine Cdx-4 bears striking similarities to the Dro-sophila caudal gene in its homeodomain sequence and early expression pattern.Mech Dev 43:71–81.

6. Osborne PW, Ferrier DE (2010) Chordate Hox and ParaHox gene clusters differ dra-matically in their repetitive element content. Mol Biol Evol 27:217–220.

7. Pollard SL, Holland PWH (2000) Evidence for 14 homeobox gene clusters in humangenome ancestry. Curr Biol 10:1059–1062.

8. Mulley JF, Holland PW (2010) Parallel retention of Pdx2 genes in cartilaginous fish andcoelacanths. Mol Biol Evol 27:2386–2391.

9. Mulley JF, Holland PW (2014) Genomic organisation of the seven ParaHox genes ofcoelacanths. J Exp Zoolog B Mol Dev Evol 322:352–358.

10. Braasch I, et al. (2016) The spotted gar genome illuminates vertebrate evolution andfacilitates human-teleost comparisons. Nat Genet 48:427–437.

11. Mulley JF, Chiu CH, Holland PWH (2006) Breakup of a homeobox cluster after genomeduplication in teleosts. Proc Natl Acad Sci USA 103:10369–10372.

12. Siegel N, Hoegg S, Salzburger W, Braasch I, Meyer A (2007) Comparative genomics ofParaHox clusters of teleost fishes: Gene cluster breakup and the retention of genesets following whole genome duplications. BMC Genomics 8:312.

13. Kuraku S, Hoshiyama D, Katoh K, Suga H, Miyata T (1999) Monophyly of lampreys andhagfishes supported by nuclear DNA-coded genes. J Mol Evol 49:729–735.

14. Hardisty MW (1979) Biology of the Cyclostomes (Springer, New York), pp 428.15. Murakami Y, Uchida K, Rijli FM, Kuratani S (2005) Evolution of the brain de-

velopmental plan: Insights from agnathans. Dev Biol 280:249–259.16. Wicht H (1996) The brains of lampreys and hagfishes: Characteristics, characters, and

comparisons. Brain Behav Evol 48:248–261.17. Furlong RF, et al. (2007) A degenerate ParaHox gene cluster in a degenerate verte-

brate. Mol Biol Evol 24:2681–2686.18. Ravi V, et al. (2016) Cyclostomes lack clustered protocadherins. Mol Biol Evol 33:

311–315.19. Coffill CR, et al. (2016) The p53-Mdm2 interaction and the E3 ligase activity of Mdm2/

Mdm4 are conserved from lampreys to humans. Genes Dev 30:281–292.20. Mehta TK, et al. (2013) Evidence for at least six Hox clusters in the Japanese lamprey

(Lethenteron japonicum). Proc Natl Acad Sci USA 110:16044–16049.21. Qiu H, Hildebrand F, Kuraku S, Meyer A (2011) Unresolved orthology and peculiar

coding sequence properties of lamprey genes: The KCNA gene family as test case.BMC Genomics 12:325.

22. Beck F, Stringer EJ (2010) The role of Cdx genes in the gut and in axial development.Biochem Soc Trans 38:353–357.

23. Davidson AJ, Zon LI (2006) The caudal-related homeobox genes cdx1a and cdx4 actredundantly to regulate hox gene expression and the formation of putative hema-topoietic stem cells during zebrafish embryogenesis. Dev Biol 292:506–518.

24. Flores MV, et al. (2008) Intestinal differentiation in zebrafish requires Cdx1b, afunctional equivalent of mammalian Cdx2. Gastroenterology 135:1665–1675.

25. Diez-Roux G, et al. (2011) A high-resolution anatomical atlas of the transcriptome inthe mouse embryo. PLoS Biol 9:e1000582.

26. Gray PA, et al. (2004) Mouse brain organization revealed through direct genome-scale TF expression analysis. Science 306:2255–2257.

27. Cheesman SE, Eisen JS (2004) gsh1 demarcates hypothalamus and intermediate spinalcord in zebrafish. Gene Expr Patterns 5:107–112.

28. Satou C, et al. (2013) Transgenic tools to characterize neuronal properties of discretepopulations of zebrafish neurons. Development 140:3927–3931.

29. Bhatia S, et al. (2014) A survey of ancient conserved non-coding elements in thePAX6 locus reveals a landscape of interdigitated cis-regulatory archipelagos. Dev Biol387:214–228.

30. Parker HJ, Piccinelli P, Sauka-Spengler T, Bronner M, Elgar G (2011) Ancient Pbx-Hoxsignatures define hundreds of vertebrate developmental enhancers. BMC Genomics12:637.

31. Hsieh-Li HM, et al. (1995) Gsh-2, a murine homeobox gene expressed in the de-veloping brain. Mech Dev 50:177–186.

32. Pei Z, et al. (2011) Homeobox genes Gsx1 and Gsx2 differentially regulate telence-phalic progenitor maturation. Proc Natl Acad Sci USA 108:1675–1680.

33. Putnam NH, et al. (2008) The amphioxus genome and the evolution of the chordatekaryotype. Nature 453:1064–1071.

34. Smith JJ, Keinath MC (2015) The sea lamprey meiotic map improves resolution ofancient vertebrate genome duplications. Genome Res 25:1081–1090.

35. Kaneto H, et al. (2008) PDX-1 and MafA play a crucial role in pancreatic beta-celldifferentiation and maintenance of mature beta-cell function. Endocr J 55:235–252.

36. Ahlgren U, Jonsson J, Jonsson L, Simu K, Edlund H (1998) beta-cell-specific inactivationof the mouse Ipf1/Pdx1 gene results in loss of the beta-cell phenotype and maturityonset diabetes. Genes Dev 12:1763–1768.

37. Stoffers DA, Zinkin NT, Stanojevic V, Clarke WL, Habener JF (1997) Pancreatic agenesisattributable to a single nucleotide deletion in the human IPF1 gene coding sequence.Nat Genet 15:106–110.

38. Youson JH, Al-Mahrouki AA (1999) Ontogenetic and phylogenetic development ofthe endocrine pancreas (islet organ) in fish. Gen Comp Endocrinol 116:303–335.

39. Mulley JF, Hargreaves AD, Hegarty MJ, Heller RS, Swain MT (2014) Transcriptomicanalysis of the lesser spotted catshark (Scyliorhinus canicula) pancreas, liver and brainreveals molecular level conservation of vertebrate pancreas function. BMC Genomics15:1074.

40. Grabherr MG, et al. (2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome. Nat Biotechnol 29:644–652.

Zhang et al. PNAS | August 22, 2017 | vol. 114 | no. 34 | 9151

EVOLU

TION

Dow

nloa

ded

by g

uest

on

June

15,

202

0