

RESEARCH ARTICLE TECHNIQUES AND RESOURCES

MetaImprint: an information repository of mammalian imprinted genes

Yanjun Wei1, Jianzhong Su1, Hongbo Liu2, Jie Lv2, Fang Wang1, Haidan Yan1, Yanhua Wen1, Hui Liu2, Qiong Wu2,* and Yan Zhang1,*

ABSTRACT

Genomic imprinting is a complex genetic and epigenetic phenomenon that plays important roles in mammalian development and diseases. Mammalian imprinted genes have been identified widely by experimental strategies or predicted using computational methods. Systematic information for these genes would be necessary for the identification of novel imprinted genes and the analysis of their regulatory mechanisms and functions. Here, a well-designed information repository, MetaImprint (http://bioinfo.hrbmu.edu.cn/MetaImprint), is presented, which focuses on the collection of information concerning mammalian imprinted genes. The current version of MetaImprint incorporates 539 imprinted genes, including 255 experimentally confirmed genes, and their detailed research courses from eight mammalian species. MetaImprint also hosts genome-wide genetic and epigenetic information of imprinted genes, including imprinting control regions, single nucleotide polymorphisms, non-coding RNAs, DNA methylation and histone modifications. Information related to human diseases and functional annotation was also integrated into MetaImprint. To facilitate data extraction, MetaImprint supports multiple search options, such as by gene ID and disease name. Moreover, a configurable Imprinted Gene Browser was developed to visualize the information on imprinted genes in a genomic context. In addition, an Epigenetic Changes Analysis Tool is provided for online analysis of DNA methylation and histone modification differences of imprinted genes among multiple tissues and cell types. MetaImprint provides a comprehensive information repository of imprinted genes, allowing researchers to investigate systematically the genetic and epigenetic regulatory mechanisms of imprinted genes and their functions in development and diseases.

KEY WORDS: Imprinted genes, Database, Genetic and epigenetic information, Imprinting disorder

INTRODUCTION

Genomic imprinting, a complex epigenetic phenomenon whereby gene expression occurs in a parent-of-origin specific manner in various cell or tissue types, has an essential role in normal growth and development (Reik and Walter, 2001). Fetal development is promoted generally by paternally expressed genes (Wu et al., 2006), whereas maternally expressed genes have a tendency to restrict fetal growth (Abu-Amero et al., 2008). Imprinting is found predominantly in placental mammals, and may participate in the balance of parental resource allocation to the offspring (Ishida and Moore, 2013). It has been reported that imprinting is related to the evolution of the placenta in mammals (Fang et al., 2012). Defects in imprinting have been associated with congenital diseases, infertility, molar pregnancy and defective assisted reproductive cells (Tomizawa and Sasaki, 2012). Loss of imprinting (LOI) induced by methylation disruption at the imprinted loci may lead to serious imprinting disorders (e.g. Beckwith–Wiedemann syndrome) (Bliek et al., 2001) and cancers (e.g. Wilms’ tumor) (Rancourt et al., 2013; Wu et al., 2008).

Extensive studies have attempted to identify the imprinted genes in various cell and tissue types of mammals. Several experimental strategies have been used for this purpose, based on DNA methylation status analysis between two alleles, gene-targeting experiments and whole-transcriptome RNA-sequencing analysis (Babak et al., 2008; Barlow et al., 1991; Luedi et al., 2007; Wang et al., 2008). In addition, computational methods have been used to predict new imprinted genes, based on DNA sequence features, CpG islands, repetitive elements, miRNA clusters and epigenetic modifications in the human and mouse genomes (Brideau et al., 2010; Luedi et al., 2005). The mechanisms that establish and maintain imprinting have been investigated from different perspectives, such as DNA sequence features (Khatib et al., 2007), non-coding RNAs (ncRNAs) (Royo and Cavaille, 2008) and epigenetic modifications (Barlow, 2011; Koerner and Barlow, 2010).

Over the past few years, several databases have been established to store information related to imprinted genes, including MouseBook (Blake et al., 2010), WAMIDEX (Schulz et al., 2008), the Otago Catalogue of Imprinted Genes (Morison and Reeve, 1998) and Geneimprint (Bischoff et al., 2009). Both MouseBook and WAMIDEX mainly focus on mouse imprinted genes. Although the Otago Catalogue of Imprinted Genes and Geneimprint cover diverse species, their records for each imprinted gene are sparse, and even the genomic locus information is limited or lacking. In addition, as reference genes in public nucleotide databases are updated, a portion of the imprinted gene records in these databases have become outdated and cannot be positioned in the recent versions of the genomes. Moreover, the emergence of high-throughput sequencing technologies in recent years has made it possible to perform genome-wide annotations of imprinted genes, such as cell/tissue specificity, genetic and epigenetic information, as well as related diseases. Therefore, a specifically designed database, MetaImprint, was constructed by manually mining the literature related to imprinted genes and incorporating meta-information from the high-throughput genetic and epigenetic data. The current version of MetaImprint covers imprinted genes from eight species (human, mouse, sheep, rat, dog, rabbit, cow and pig).

Compared with the pre-existing databases, MetaImprint has four advantages. (1) The allele or cell/tissue specificity and detailed research processes of imprinted genes are shown in the detailed pages as a result of literature mining. (2) MetaImprint provides the integrative annotation of genetic and epigenetic information for each imprinted gene at any given chromosome location, such as imprint control regions (ICRs), single nucleotide polymorphisms (SNPs), repetitive elements, ncRNAs, DNA methylation and histone modifications. In particular, a visual engine, Imprinted Gene Browser, was developed to show a landscape view of the imprinted genes in the context of existing genetic and epigenetic annotations. (3) The detailed relationships between imprinted genes and human diseases are also incorporated. (4) MetaImprint provides two analysis tools to identify epigenetic differences and for functional enrichment analysis of imprinted genes among cell/tissue types. In brief, MetaImprint is a comprehensive and cross-species database for the systematic study of imprinted genes in terms of genetics and epigenetics, as well as diseases and evolution.

Received 27 October 2013; Accepted 12 April 2014

1College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China. 2School of Life Science and Technology, State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology, Harbin 150001, China.

*Authors for correspondence ([email protected]; [email protected])

© 2014. Published by The Company of Biologists Ltd | Development (2014) 141, 2516-2523 doi:10.1242/dev.105320

RESULTS

Database content and construction

The current release of MetaImprint was designed to provide imprinted gene-related information regarding the imprinting states, epigenetic modifications, and related diseases. By mining the literature manually, we collected 845 imprinted gene-related studies from 1991 to date. From these studies, 539 mammalian imprinted genes from eight mammalian species were obtained, including 317 human genes, 146 mouse genes, three sheep genes, seven rat genes, one dog gene, one rabbit gene, 35 cow genes and 29 pig genes (Table 1). Among these imprinted genes, 255 have been validated experimentally and the remainder were predicted by computational methods from the human and mouse genomes.

Based on the collected information, we investigated the research trends of imprinted genes over time. This revealed that the number of studies related to imprinting increased from 1991, especially in the years since 2007 (Fig. 1A). In most of the studies, allele-specific expression of genes was used to discover imprinted genes by experimental and computational methods. For example, Igf2 was identified as the first imprinted gene in mouse in 1991, and H19 in human in 1992. The time spectrum for the discovery of imprinted genes revealed that there was a spurt in the identification of novel imprinted genes around 2008 (Fig. 1B). However, the number of newly discovered imprinted genes decreased year by year from 2010, whereas the number of studies related to imprinted genes increased. Although the identification of novel imprinted genes is useful, researchers have been paying more attention to the regulatory mechanisms and functions of imprinted genes. Thus, regulatory and functional information on imprinted genes is needed for current analyses.

DNA methylation and histone modification information may be useful to uncover epigenetic regulatory effects on genomic imprinting. Recently, increasing numbers of studies related to epigenetic modifications (e.g. DNA methylation) have revealed that epigenetic modifications are regulators of allele-specific expression of imprinted genes (Court et al., 2014; Fang et al., 2012; Shoemaker et al., 2010). In addition, some studies focused on the imprinting mechanisms in the development and related functions of imprinted genes (Chang et al., 2013). Studies on SNPs, which are ideal markers for identifying imprinted genes, revealed the roles of abnormal imprinting in complex traits, including complex diseases (Ishida and Moore, 2013; Lawson et al., 2013). It has been reported that imprinted genes contain tandem repeat arrays in their CpG islands more frequently (Hutter et al., 2006). Thus, we obtained and normalized the genomic and epigenomic data from public data resources including the University of California Santa Cruz (UCSC) Genome Bioinformatics Site (Karolchik et al., 2014), the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (Barrett et al., 2013) and the NCBI Sequence Read Archive (SRA) (Wheeler et al., 2007).

By integrating genetic and epigenetic information with imprinted genes and other information, we constructed the MetaImprint database based on three major software components: a MySQL relational database, an Apache Tomcat web server and Java-based backstage processing programs. To guarantee the stability and high performance of the database, a flexible query engine was developed using a Java web application framework (Apache Struts2) and a persistence framework (iBATIS), which automates the mapping between MySQL databases and objects in Java. We also developed two pragmatic online tools to analyze imprinted genes, one for functional enrichment of imprinted genes and the other to analyze the differences in epigenetic modification among multiple tissues or cells. Asynchronous JavaScript and XML (AJAX) and JavaServer Pages (JSP) were used to build the Imprinted Gene Browser for visualization of information related to imprinted genes, such as imprinting states, imprinting control regions, allele-specific methylation, and imprinting-related diseases.

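Table 1. MetaImprint data content and statistics (1 January 2014)

| Data content | Human | Mouse | Dog | Cow | Sheep | Pig | Rabbit | Rat |
|---|---|---|---|---|---|---|---|---|
| Imprinted gene: Total | 317 | 146 | 1 | 35 | 3 | 29 | 1 | 7 |
| Imprinted gene: Experimentally validated (I) | 65 | 132 | 1 | 27 | 3 | 20 | 1 | 6 |
| Imprinted gene: Predicted (P) | 200 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Imprinted gene: Loss of imprint (LOI) | 21 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| Imprinted gene: Not imprinted (NI) | 31 | 7 | 0 | 8 | 0 | 9 | 0 | 0 |
| Genome information: RefGenome | hg19 | mm9 | canFam3 | bosTau7 | oviAri1 | susScr3 | oryCun2 | rn4 |
| Genome information: RefGene | 23,850 | 23,427 | 1327 | 13,584 | 710 | 3950 | 1301 | 16,849 |
| DNA methylome | 19 | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
| Histone modification | 21 | 18 | 0 | 0 | 0 | 0 | 0 | 0 |
| Allele-specific methylation | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Disease | 344 | 83 | 0 | 18 | 0 | 0 | 0 | 0 |
| Other integrated data: Repeat | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| Other integrated data: SNP | 11 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |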
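As a quick consistency check, the per-species counts in Table 1 can be summed and compared with the totals quoted in the text (539 imprinted genes, 255 experimentally validated). The sketch below simply copies the two relevant rows of the table; the variable names are illustrative and not part of MetaImprint.

```python
# Per-species counts copied from Table 1, in the table's column order.
SPECIES = ["human", "mouse", "dog", "cow", "sheep", "pig", "rabbit", "rat"]
TOTAL = [317, 146, 1, 35, 3, 29, 1, 7]       # row "Imprinted gene: Total"
VALIDATED = [65, 132, 1, 27, 3, 20, 1, 6]    # row "Experimentally validated (I)"

# Sum across species to recover the database-wide figures quoted in the text.
total_genes = sum(TOTAL)
validated_genes = sum(VALIDATED)

print(total_genes, validated_genes)  # prints "539 255"
```

Both sums agree with the figures reported in the abstract and in the Results.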


Database usage and access

Overview of the usage of the MetaImprint database

MetaImprint is available at http://bioinfo.hrbmu.edu.cn/MetaImprint. The database supports four basic operations: search, browse, analysis and download. As shown in Fig. 1C, users can obtain the imprinted genes and related annotations by setting user-specified options. The query results are listed as a table with links to the Imprinted Gene Browser for visualization, and links to download for further analysis. Two analysis tools were developed to identify epigenetic differences of imprinted genes among different tissue/cell types and for functional enrichment analysis of imprinted gene sets. All of the datasets hosted in the MetaImprint database can be downloaded to local servers for further analysis. Novel imprinted genes and their related information can be directly uploaded into the database using the ‘submit’ option to keep MetaImprint up to date.

Querying imprinted gene records

MetaImprint provides a flexible search engine based on a MySQL backend. The starting point of MetaImprint is the search page, which provides three entry points: the Imprinted gene search, the Disease search and the Imprinted Gene Browser. Each search entry provides several query parameters, such as species, gene name/ID, genomic location, chromosome band, expressed parental allele and disease type. The more query parameters that are used to query imprinted genes, the more precise the query results are. Quick search results are shown in Fig. 2A. The link on each gene takes users to the detailed information for the corresponding imprinted gene.
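The way additional query parameters narrow the result set can be sketched with a small parameterized SQL query. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the MySQL backend, and the table name `imprinted_gene`, its columns and the toy rows (`GENE_A`, `GENE_B`, `GENE_C`) are hypothetical, not the actual MetaImprint schema or data.

```python
import sqlite3

# Hypothetical column names; the real MetaImprint schema is not described in the text.
ALLOWED_FILTERS = {"species", "symbol", "chrom", "expressed_allele"}


def search_imprinted_genes(conn, **filters):
    """Return rows matching every supplied filter (filters are combined with AND)."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    sql = "SELECT symbol, species, chrom, expressed_allele FROM imprinted_gene"
    if filters:
        # Column names come from the whitelist above; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += " ORDER BY symbol"
    return conn.execute(sql, list(filters.values())).fetchall()


# Toy in-memory database with placeholder records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE imprinted_gene "
    "(symbol TEXT, species TEXT, chrom TEXT, expressed_allele TEXT)"
)
conn.executemany(
    "INSERT INTO imprinted_gene VALUES (?, ?, ?, ?)",
    [
        ("GENE_A", "human", "chr11", "paternal"),
        ("GENE_B", "human", "chr11", "maternal"),
        ("GENE_C", "mouse", "chr7", "paternal"),
    ],
)

broad = search_imprinted_genes(conn, species="human")
narrow = search_imprinted_genes(conn, species="human", expressed_allele="maternal")
```

With one filter the query returns both human records; adding the expressed-allele filter leaves only `GENE_B`, which mirrors the statement that more parameters give more precise results.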

As shown in Fig. 2B, the imprinting state of a gene is assigned to one of four classes (see Materials and Methods). Moreover, the related Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, gene ontology (GO) terms and disease information for the imprinted genes are also provided in MetaImprint. The research courses of imprinted genes include gene names, genome locations, Ensembl ID, imprinting states, allele-specific expression, tissue-specific types, measurement methods and relevant publications shown in chronological order. As shown in Fig. 2B, the summary information for the human imprinted gene IGF2 consists of six kinds of information. (1) Its genome location is chr11: 2150346-2160204 (hg19). (2) Its Ensembl ID is ENSG00000167244. (3) Its imprinting state is imprinted (I) according to 46 related publications, including 39 records supporting maternal-origin-specific expression, only seven supporting paternal-origin-specific expression and 22 supporting biallelic expression. (4) The KEGG and GO terms related to IGF2 are listed. (5) The studies of IGF2 as an imprinted gene are from 1994 in human. (6) IGF2 is related to some diseases, including medulloblastomas, hepatocarcinomas, neural tube defects and Silver Russell syndrome (SRS). All the query results can be viewed by the Imprinted Gene Browser and downloaded to a local computer for further analysis.

Fig. 1. Database content and construction. (A) The number of published studies of imprinted genes since 1991. (B) The number of identified imprinted genes since 1991. (C) Architectural overview of the MetaImprint database.

Imprinted gene browser

A user-friendly and configurable genome browser, Imprinted Gene Browser, was developed to view the query results for imprinted genes intuitively. Imprinted Gene Browser is linked to the MySQL backend to display the genetic and epigenetic features of imprinted genes, including the upstream and downstream genetic information, repetitive elements, SNPs, ncRNAs, ICRs, CpG islands, DNA methylation and histone modifications. As shown in Fig. 2C, the Imprinted Gene Browser provides options to zoom through the predefined genomic regions or gene ID/symbols. Users can view a comprehensive landscape of each imprinted gene along with the epigenetic and genetic features in the Imprinted Gene Browser. Users can also review the imprinted genes that are located in specific chromosome regions, such as a given chromosome band.
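Reviewing the genes located in a chromosome region comes down to an interval-overlap test. The sketch below is a minimal illustration, not the browser's actual code: the `Region` type, the `overlaps` helper and the two query windows are hypothetical, and coordinates are assumed to be closed intervals. The IGF2 coordinates are the hg19 values quoted earlier in this section.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval; start and end are assumed to be inclusive."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True when both regions lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# hg19 location of human IGF2 as quoted in the text.
IGF2 = Region("chr11", 2150346, 2160204)

# Hypothetical browser windows chosen only for illustration.
window_hit = Region("chr11", 2_000_000, 2_200_000)
window_miss = Region("chr7", 2_000_000, 2_200_000)

print(overlaps(IGF2, window_hit), overlaps(IGF2, window_miss))  # prints "True False"
```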

Fig. 2. A comprehensive view of MetaImprint search results. (A) The results table after searching five imprinted genes. (B) Details of a selected gene from the results table showing imprinting states, gene IDs, research records and related diseases. (C) Visualization of genetic and epigenetic information in the Imprinted Gene Browser. (D) Two analysis tools in MetaImprint. Left: the Epigenetic Changes Analysis tool for identification of the imprinted genes modified by different epigenetic modifications. Right: the Functional Enrichment Analysis tool for functional analysis of a set of imprinted genes.

Analysis tools

MetaImprint also provides two analysis tools: the Epigenetic Changes Analysis tool and the Functional Enrichment Analysis tool (Fig. 2D). As is well known, DNA methylation mediates imprinted gene expression by passing an epigenetic state across generations and differentially marking specific regulatory regions on maternal and paternal alleles. The dynamics of epigenetic modification may be indispensable for genomic imprinting, which is a feature of mammalian development (Liu et al., 2013). Therefore, DNA methylation and histone modification information may be useful to uncover the mechanisms by which epigenetic modifications mediate the allele-specific expression of imprinted genes. The Epigenetic Changes Analysis tool uses analysis of variance (ANOVA) to identify differences in the average histone modification signals or the average DNA methylation levels of imprinted genes among different cells or tissues. The results list the information of RefSeq genes, repetitive elements, SNPs and ncRNAs. The epigenetic features and the P-value threshold of the ANOVA can be adjusted by the users when necessary. Moreover, the Functional Enrichment Analysis tool, supported by the Database for Annotation, Visualization and Integrated Discovery (DAVID) application programming interface (API) (Dennis et al., 2003), is provided for users to identify the enriched biological functions for any given list of imprinted genes. Users can also submit a list of genes (NCBI gene IDs or gene symbols) to identify enriched biological functions based on KEGG and GO.
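The ANOVA comparison behind the Epigenetic Changes Analysis tool can be illustrated with a one-way F statistic computed across tissues for a single gene. This is a minimal sketch and not the MetaImprint implementation: the function name `one_way_anova_f`, the three tissue groups and the methylation values are all hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric sample groups."""
    k = len(groups)                      # number of groups (e.g. tissues)
    n = sum(len(g) for g in groups)      # total number of observations
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups and n > k")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical average methylation levels of one gene in three tissues.
tissue_a = [0.10, 0.20, 0.30]
tissue_b = [0.20, 0.30, 0.40]
tissue_c = [0.50, 0.60, 0.70]

f_stat = one_way_anova_f([tissue_a, tissue_b, tissue_c])
```

A P-value would then be read from the F distribution with (k − 1, n − k) degrees of freedom and compared with the user-chosen threshold. If SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the P-value directly.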

The applications of MetaImprint in the systematic analysis of imprinted genes

Based on the integrated information of imprinted genes in MetaImprint, we studied the functions of imprinted genes. As an example, 317 human imprinted genes in MetaImprint were analyzed for functional enrichment using the Functional Enrichment Analysis tool. This revealed that these imprinted genes are significantly enriched in biological processes, including embryonic morphogenesis, embryonic organ development, pattern specification processes, etc. (Fig. 3A), indicating the important roles of these imprinted genes in development. In addition, numerous genetic diseases are associated with imprinting defects, including Beckwith–Wiedemann syndrome (BWS) and multiple cancers (Ishida and Moore, 2013; Surani et al., 1990). It has been reported that imprinting is related to the evolution of the placenta in mammals and defects in genomic imprinting have been associated with several diseases. From the disease information of imprinted genes in MetaImprint, we found that 99 of the 317 human imprinted genes analyzed are associated with disorders (Fig. 3B). For example, imprinted genes IGF2 and H19 are related to 38 diseases. We also found that 84 diseases are induced by at least one imprinted gene (Fig. 3C). It should be noted that 24 imprinted genes are associated with Wilms’ tumor. Furthermore, we built an imprinting disorder network and disorder-related gene network to investigate the potential relationships between imprinted genes and various diseases (Fig. 3D). This analysis showed that imprinted genes participate in many kinds of cancer, including Wilms’ tumor and bladder cancer, which is consistent with previous findings of loss of imprinting in a large variety of human tumors (Jelinic and Shaw, 2007). Many imprinted genes tend to participate jointly in one disease, indicating their co-regulation in human disorders.
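The gene-disease relationships underlying such networks can be represented as a bipartite mapping and inverted to find diseases shared by several imprinted genes. The sketch below is an assumption-laden toy: it includes only a handful of associations mentioned elsewhere in this article (for IGF2 and CDKN1C), not the full MetaImprint disease table, and the helper name `invert` is illustrative.

```python
from collections import defaultdict

# Toy subset of gene-to-disease associations mentioned in this article.
gene_to_diseases = {
    "IGF2": {
        "medulloblastoma",
        "hepatocarcinoma",
        "neural tube defect",
        "Silver-Russell syndrome",
    },
    "CDKN1C": {"hepatocarcinoma"},
}


def invert(mapping):
    """Turn a gene -> diseases mapping into a disease -> genes mapping."""
    inverted = defaultdict(set)
    for gene, diseases in mapping.items():
        for disease in diseases:
            inverted[disease].add(gene)
    return dict(inverted)


disease_to_genes = invert(gene_to_diseases)

# Diseases linked to more than one imprinted gene in this toy subset.
shared = {d: sorted(g) for d, g in disease_to_genes.items() if len(g) > 1}
print(shared)  # prints "{'hepatocarcinoma': ['CDKN1C', 'IGF2']}"
```

Counting the keys of each mapping gives the two summary views described above: genes with at least one associated disorder, and diseases with at least one associated imprinted gene.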

Fig. 3. Systematic analysis of imprinted genes based on the comprehensive information of imprinted genes in MetaImprint. (A) Functional analysis of 317 human imprinted genes. (B) Genes involved in human disorders. (C) Diseases induced by human imprinted genes. (D) Examples of the imprinting disorder network and the disorder gene network.

Furthermore, we used the Epigenetic Changes Analysis tool to analyze the methylation difference of human imprinted genes between liver and hepatic carcinoma. Among 317 human imprinted genes, 253 genes with available data were analyzed, and 97 were identified as differentially methylated genes (DMGenes) (Fig. 4A). Based on the disease information in MetaImprint, 11 of these DMGenes are related to hepatic carcinoma (Fig. 4B). For example, a previous study reported the loss of imprinting in IGF2, which encodes a member of the insulin family of polypeptide growth factors involved in development and growth, influencing liver tumorigenesis (Haddad and Held, 1997). It has been reported that CDKN1C is abnormally expressed in human hepatocarcinomas, but without mutations (Bonilla et al., 1998; Schwienbacher et al., 2000), indicating the potential roles of methylation difference in regulating the differential expression of this gene between the liver and hepatic carcinoma (Fig. 4C). The other DMGenes may also be associated with the development of human hepatic carcinoma and further analysis of these genes is necessary.

DISCUSSION

Genomic imprinting is a common epigenetic phenomenon in mammals. Allele-specific expression and DNA methylation are two well-known characteristics of genomic imprinting involved in development and complex diseases. To promote the application and understanding of imprinted genes, we developed a comprehensive cross-species database, MetaImprint, which is focused on the systematic study of imprinted genes in the meta-view, including genetics, epigenetics and functional genomics. The latest release of MetaImprint incorporates 539 imprinted genes from eight species, and related genome-wide genetic and epigenetic information, including ICRs, SNPs, repetitive elements, non-coding RNAs, DNA methylation and histone modifications, as well as human imprinting disorders.

The erasing, building and maintaining of gene imprinting are important in the normal development of embryos. Tissue and cell specificity of imprinted genes have been investigated extensively (Yamasaki-Ishizaki et al., 2007). Appropriate expression of imprinted genes is important for normal development. The state transformation of imprinted genes is associated with development, tumors, metabolic diseases and congenital diseases. For example, Xin et al. found that dynamic expression of imprinted genes is associated with maternally controlled nutrient allocation during maize endosperm development (Xin et al., 2013). However, abnormal imprinting may cause various human disorders including Wilms’ tumor and other cancers (Chao and D’Amore, 2008). The disease information also confirmed the potential roles of imprinted genes in human diseases. Based on the information for imprinted genes in this database, we provided an example of its application for studying regulatory mechanisms and potential roles in development and diseases. MetaImprint may be a useful analytical tool to facilitate the systematic mining of related features of imprinted genes. The in-depth study of genomic imprinting facilitated by MetaImprint will be valuable for understanding disease mechanisms, such as carcinogenesis.

Recently, high-throughput sequencing data, including single-celltranscriptomes and MeDIP-based methylomes, have been generatedfrom a genome-wide perspective to study gene imprinting of differenttissue/cell/disease types. The re-evaluation of imprinted genes inmammals using the cumulative research over the last few decadesshould reveal hidden clues to the genomic homology of chromosomeimprinting regions. Going forward, many novel imprinted genes froma variety of species and the information related to the imprinted genesshould accumulate much faster, which highlights the importance of acomprehensive database such as MetaImprint. The allele-specificexpression of noncoding RNA, for instance AK003491 (Zeng et al.,2013), has also been included inMetaImprint. The database is a usefulinformation repository for hosting further imprinted genes. As newstudies of imprinting are reported, MetaImprint will be updatedregularly. In future, more information on imprinted genes in differentmammals will be integrated into the database, which should help thesystematic analysis of the dynamics of gene imprinting during

Fig. 4. Analysis of DNA methylation difference of imprinted genes between liver and hepatic carcinoma. (A) Differentially methylated imprinted genes (DMGene) in human liver and hepatic carcinoma. (B) Imprinted genes related to hepatic carcinoma in MetaImprint. (C) Visualization of the DNA methylation and genetic information in the differentially methylated regions of imprinted gene CDKN1C.

RESEARCH ARTICLE Development (2014) 141, 2516-2523 doi:10.1242/dev.105320

mammalian development and the potential roles of abnormal imprinting in diseases.

MATERIALS AND METHODS

The collection of imprinted genes

To mine the imprinted genes from the literature, we searched the PubMed database using the keywords ‘imprint OR imprinted OR imprinting’ and obtained 18,696 publications. We then scanned each publication and removed 11,163 that were not gene-related studies. From the remaining 7533 publications, we manually extracted imprinted genes according to the description of the imprinting state in each study. The imprinting state of each imprinted gene was divided into four classes: (1) imprinted states (I): the gene was described as imprinted; (2) not imprinted states (NI): the gene was not detected as imprinted by the authors; (3) loss of imprint (LOI): the imprinting states were lost or abnormal in multiple disorders, such as Prader–Willi syndrome (PWS) and Angelman syndrome (AS); and (4) imprinted states by prediction (P): the imprinting states of the genes were predicted by computational methods.
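As an illustration only, the four imprinting-state classes and the literature-screening arithmetic described above could be represented as in the following minimal Python sketch. The class, function and record names are hypothetical and are not part of MetaImprint; the example records use placeholder gene names rather than curated data.

```python
from collections import Counter
from enum import Enum


class ImprintingState(Enum):
    """The four imprinting-state classes described in the text."""
    IMPRINTED = "I"          # described as imprinted
    NOT_IMPRINTED = "NI"     # not detected as imprinted by the authors
    LOSS_OF_IMPRINT = "LOI"  # imprinting lost or abnormal in a disorder
    PREDICTED = "P"          # imprinting predicted computationally


def remaining_after_screening(retrieved: int, removed: int) -> int:
    """Number of publications left after discarding non-gene-related studies."""
    if not 0 <= removed <= retrieved:
        raise ValueError("removed must lie between 0 and retrieved")
    return retrieved - removed


def tally_states(records):
    """Count curated records per state; records are (gene, state code) pairs."""
    return Counter(ImprintingState(code) for _gene, code in records)


# Hypothetical records with placeholder gene names (not MetaImprint data).
example_records = [
    ("GENE_A", "I"),
    ("GENE_A", "LOI"),
    ("GENE_B", "P"),
    ("GENE_C", "I"),
]
state_counts = tally_states(example_records)

# Publication counts taken from the text: 18,696 retrieved, 11,163 removed.
kept = remaining_after_screening(18696, 11163)
```

A single gene may carry several state records, because different studies can report different imprinting states for it; the sketch therefore tallies records rather than genes.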

The genetic and epigenetic annotation information

MetaImprint incorporated the genomic and epigenomic information related to the imprinted genes from the reference literature and public databases. The genetic annotation features (imprinted gene names, genomic location, repetitive elements and chromosome bands data) were downloaded from the UCSC Table Browser (Karolchik et al., 2004) and NCBI’s Entrez gene (Maglott et al., 2011). The CpG islands of mammalian genomes were identified by CpG_MI (Su et al., 2010) and downloaded from the UCSC Table Browser. The SNP data were downloaded from The International HapMap Consortium (2003) and the UCSC Table Browser. The ncRNA data were obtained from NONCODE (Liu et al., 2005). More than 30 ICRs for human and mouse were collected manually from the literature. KEGG IDs (Kanehisa, 2002) and GO terms (Ashburner et al., 2000) were obtained from their web servers or the UCSC Table Browser and preprocessed using a custom Perl program. The high-throughput epigenetic data for DNA methylation and histone modifications in multiple cell types of human and mouse genomes were primarily collected from NCBI Epigenomics (Fingerman et al., 2011), DiseaseMeth (Lv et al., 2012), Human Histone Modification Database (HHMD) (Zhang et al., 2010), DevMouse (Liu et al., 2014) and GEO (Barrett and Edgar, 2006). All these genetic and epigenetic data are available to download from the ftp server of MetaImprint.

Information related to imprinting disorders

Literature mining was performed to meticulously collect the associations between imprinted genes and approximately 70 human diseases (supplementary material Table S1). These records could be valuable references to analyze the potential function of disorganized imprinting in diverse diseases.

Construction of imprinted genes and human disease network

Using the information regarding the relationships between human imprinted genes and diseases in MetaImprint, we used Cytoscape (Shannon et al., 2003) to build an imprinted genes and human disease network. In the human disease network, the nodes are diseases and two diseases are linked if they share at least one gene. In the imprinted genes network, the nodes are imprinted genes, and two genes are linked if they participate in the same disease.
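The two linking rules above amount to projecting a bipartite gene-disease association list onto its disease side and its gene side. The following Python sketch illustrates that projection under stated assumptions: the networks in this study were built with Cytoscape, so the function name `project_networks` and the placeholder gene and disease labels are hypothetical and serve only to make the rules concrete.

```python
from collections import defaultdict
from itertools import combinations


def project_networks(gene_disease_pairs):
    """Derive disease-disease and gene-gene edges from (gene, disease) pairs.

    Two diseases are linked if they share at least one gene; two genes are
    linked if they participate in the same disease. Edges are undirected and
    are returned as sets of frozensets.
    """
    genes_by_disease = defaultdict(set)
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease_pairs:
        genes_by_disease[disease].add(gene)
        diseases_by_gene[gene].add(disease)

    # Every pair of diseases associated with the same gene shares that gene.
    disease_edges = {
        frozenset(pair)
        for diseases in diseases_by_gene.values()
        for pair in combinations(sorted(diseases), 2)
    }
    # Every pair of genes associated with the same disease co-occurs in it.
    gene_edges = {
        frozenset(pair)
        for genes in genes_by_disease.values()
        for pair in combinations(sorted(genes), 2)
    }
    return disease_edges, gene_edges


# Placeholder associations (not MetaImprint records).
pairs = [
    ("GENE_A", "disease_1"),
    ("GENE_B", "disease_1"),
    ("GENE_A", "disease_2"),
    ("GENE_C", "disease_3"),
]
disease_edges, gene_edges = project_networks(pairs)
```

In this toy input, disease_1 and disease_2 are linked because both involve GENE_A, and GENE_A and GENE_B are linked because both participate in disease_1; GENE_C and disease_3 remain isolated nodes. The resulting edge lists could then be written out as a two-column table for import into a network viewer.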

Acknowledgements

We acknowledge Shanshan Jia and Chunlong Zhang from our laboratory, and Jiang Zhu from Harbin Medical University Daqing Campus for their comments.

Competing interests

The authors declare no competing financial interests.

Author contributions

Y.Z. and Q.W. developed the concepts. Y. Wei built the database. J.S. and Hongbo Liu performed the data analysis. J.L., F.W., H.Y., Y. Wen and Hui Liu collected the information on imprinted genes. Y. Wei, J.S. and Hongbo Liu prepared and edited the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China [61203262, 31371334, 31171383 and 31371478]; and the Science Foundation of Heilongjiang Province [QC2011C061].

Supplementary material

Supplementary material available online at http://dev.biologists.org/lookup/suppl/doi:10.1242/dev.105320/-/DC1

References

Abu-Amero, S., Monk, D., Frost, J., Preece, M., Stanier, P. and Moore, G. E. (2008). The genetic aetiology of Silver-Russell syndrome. J. Med. Genet. 45, 193-199.

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T. et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29.

Babak, T., DeVeale, B., Armour, C., Raymond, C., Cleary, M. A., van der Kooy, D., Johnson, J. M. and Lim, L. P. (2008). Global survey of genomic imprinting by transcriptome sequencing. Curr. Biol. 18, 1735-1741.

Barlow, D. P. (2011). Genomic imprinting: a mammalian epigenetic discovery model. Annu. Rev. Genet. 45, 379-403.

Barlow, D. P., Stoger, R., Herrmann, B. G., Saito, K. and Schweifer, N. (1991). The mouse insulin-like growth factor type-2 receptor is imprinted and closely linked to the Tme locus. Nature 349, 84-87.

Barrett, T. and Edgar, R. (2006). Mining microarray data at NCBI’s Gene Expression Omnibus (GEO)*. Methods Mol. Biol. 338, 175-190.

Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M., Holko, M. et al. (2013). NCBI GEO: archive for functional genomics data sets – update. Nucleic Acids Res. 41, D991-D995.

Bischoff, S. R., Tsai, S., Hardison, N., Motsinger-Reif, A. A., Freking, B. A., Nonneman, D., Rohrer, G. and Piedrahita, J. A. (2009). Characterization of conserved and nonconserved imprinted genes in swine. Biol. Reprod. 81, 906-920.

Blake, A., Pickford, K., Greenaway, S., Thomas, S., Pickard, A., Williamson, C. M., Adams, N. C., Walling, A., Beck, T., Fray, M. et al. (2010). MouseBook: an integrated portal of mouse resources. Nucleic Acids Res. 38, Database issue, D593-D599.

Bliek, J., Maas, S. M., Ruijter, J. M., Hennekam, R. C. M., Alders, M., Westerveld, A. and Mannens, M. M. A. M. (2001). Increased tumour risk for BWS patients correlates with aberrant H19 and not KCNQ1OT1 methylation: occurrence of KCNQ1OT1 hypomethylation in familial cases of BWS. Hum. Mol. Genet. 10, 467-476.

Bonilla, F., Orlow, I. and Cordon-Cardo, C. (1998). Mutational study of p16CDKN2/MTS1/INK4A and p57KIP2 genes in hepatocellular carcinoma. Int. J. Oncol. 12, 583-588.

Brideau, C. M., Eilertson, K. E., Hagarman, J. A., Bustamante, C. D. and Soloway, P. D. (2010). Successful computational prediction of novel imprinted genes from epigenomic features. Mol. Cell. Biol. 30, 3357-3370.

Chang, G., Gao, S., Hou, X., Xu, Z., Liu, Y., Kang, L., Tao, Y., Liu, W., Huang, B., Kou, X. et al. (2013). High-throughput sequencing reveals the disruption of methylation of imprinted gene in induced pluripotent stem cells. Cell Res. 24, 293-306.

Chao, W. and D’Amore, P. A. (2008). IGF2: epigenetic regulation and role in development and disease. Cytokine Growth Factor Rev. 19, 111-120.

Court, F., Tayama, C., Romanelli, V., Martin-Trujillo, A., Iglesias-Platas, I., Okamura, K., Sugahara, N., Simon, C., Moore, H., Harness, J. V. et al. (2014). Genome-wide parent-of-origin DNA methylation analysis reveals the intricacies of the human imprintome and suggests a germline methylation independent establishment of imprinting. Genome Res. 24, 554-569.

Dennis, G., Jr, Sherman, B. T., Hosack, D. A., Yang, J., Gao, W., Lane, H. C. and Lempicki, R. A. (2003). DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 4, P3.

Fang, F., Hodges, E., Molaro, A., Dean, M., Hannon, G. J. and Smith, A. D. (2012). Genomic landscape of human allele-specific DNA methylation. Proc. Natl. Acad. Sci. U.S.A. 109, 7332-7337.

Fingerman, I. M., McDaniel, L., Zhang, X., Ratzat, W., Hassan, T., Jiang, Z., Cohen, R. F. and Schuler, G. D. (2011). NCBI Epigenomics: a new public resource for exploring epigenomic data sets. Nucleic Acids Res. 39 Suppl. 1, D908-D912.

Haddad, R. and Held, W. A. (1997). Genomic imprinting and Igf2 influence liver tumorigenesis and loss of heterozygosity in SV40 T antigen transgenic mice. Cancer Res. 57, 4615-4623.

Hutter, B., Helms, V. and Paulsen, M. (2006). Tandem repeats in the CpG islands of imprinted genes. Genomics 88, 323-332.

Ishida, M. and Moore, G. E. (2013). The role of imprinted genes in humans. Mol. Aspects Med. 34, 826-840.

Jelinic, P. and Shaw, P. (2007). Loss of imprinting and cancer. J. Pathol. 211, 261-268.

Kanehisa, M. (2002). The KEGG database. Novartis Found. Symp. 247, 91-101; discussion 101-103, 119-128, 244-152.

Karolchik, D., Hinrichs, A. S., Furey, T. S., Roskin, K. M., Sugnet, C. W., Haussler, D. and Kent, W. J. (2004). The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493-D496.

Karolchik, D., Barber, G. P., Casper, J., Clawson, H., Cline, M. S., Diekhans, M., Dreszer, T. R., Fujita, P. A., Guruvadoo, L., Haeussler, M. et al. (2014). The UCSC Genome Browser database: 2014 update. Nucleic Acids Res. 42, D764-D770.

Khatib, H., Zaitoun, I. and Kim, E.-S. (2007). Comparative analysis of sequence characteristics of imprinted genes in human, mouse, and cattle. Mamm. Genome 18, 538-547.

Koerner, M. V. and Barlow, D. P. (2010). Genomic imprinting-an epigenetic gene-regulatory model. Curr. Opin. Genet. Dev. 20, 164-170.

Lawson, H. A., Cheverud, J. M. and Wolf, J. B. (2013). Genomic imprinting and parent-of-origin effects on complex traits. Nat. Rev. Genet. 14, 609-617.

Liu, C., Bai, B., Skogerbø, G., Cai, L., Deng, W., Zhang, Y., Bu, D., Zhao, Y. and Chen, R. (2005). NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res. 33, Database issue, D112-D115.

Liu, H., Chen, Y., Lv, J., Zhu, R., Su, J., Liu, X., Zhang, Y. and Wu, Q. (2013). Quantitative epigenetic co-variation in CpG islands and co-regulation of developmental genes. Sci. Rep. 3, 2576.

Liu, H., Zhu, R., Lv, J., He, H., Yang, L., Huang, Z., Su, J., Zhang, Y., Yu, S. and Wu, Q. (2014). DevMouse, the mouse developmental methylome database and analysis tools. Database 2014, bat084.

Luedi, P. P., Hartemink, A. J. and Jirtle, R. L. (2005). Genome-wide prediction of imprinted murine genes. Genome Res. 15, 875-884.

Luedi, P. P., Dietrich, F. S., Weidman, J. R., Bosko, J. M., Jirtle, R. L. and Hartemink, A. J. (2007). Computational and experimental identification of novel human imprinted genes. Genome Res. 17, 1723-1730.

Lv, J., Liu, H., Su, J., Wu, X., Liu, H., Li, B., Xiao, X., Wang, F., Wu, Q. and Zhang, Y. (2012). DiseaseMeth: a human disease methylation database. Nucleic Acids Res. 40, D1030-D1035.

Maglott, D., Ostell, J., Pruitt, K. D. and Tatusova, T. (2011). Entrez gene: gene-centered information at NCBI. Nucleic Acids Res. 39, Database issue, D52-D57.

Morison, I. M. and Reeve, A. E. (1998). A catalogue of imprinted genes and parent-of-origin effects in humans and animals. Hum. Mol. Genet. 7, 1599-1609.

Rancourt, R. C., Harris, H. R., Barault, L. and Michels, K. B. (2013). The prevalence of loss of imprinting of H19 and IGF2 at birth. FASEB J. 27, 3335-3343.

Reik, W. and Walter, J. (2001). Genomic imprinting: parental influence on the genome. Nat. Rev. Genet. 2, 21-32.

Royo, H. and Cavaille, J. (2008). Non-coding RNAs in imprinted gene clusters. Biol. Cell. 100, 149-166.

Schulz, R., Woodfine, K., Menheniott, T. R., Bourc’his, D., Bestor, T. and Oakey, R. J. (2008). WAMIDEX: a web atlas of murine genomic imprinting and differential expression. Epigenetics 3, 89-96.

Schwienbacher, C., Gramantieri, L., Scelfo, R., Veronese, A., Calin, G. A., Bolondi, L., Croce, C. M., Barbanti-Brodano, G. and Negrini, M. (2000). Gain of imprinting at chromosome 11p15: a pathogenetic mechanism identified in human hepatocarcinomas. Proc. Natl. Acad. Sci. U.S.A. 97, 5445-5449.

Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B. and Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498-2504.

Shoemaker, R., Deng, J., Wang, W. and Zhang, K. (2010). Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome. Genome Res. 20, 883-889.

Su, J., Zhang, Y., Lv, J., Liu, H., Tang, X., Wang, F., Qi, Y., Feng, Y. and Li, X. (2010). CpG_MI: a novel approach for identifying functional CpG islands in mammalian genomes. Nucleic Acids Res. 38, e6.

Surani, M. A., Kothary, R., Allen, N. D., Singh, P. B., Fundele, R., Ferguson-Smith, A. C. and Barton, S. C. (1990). Genome imprinting and development in the mouse. Development, Suppl., 89-98.

The International HapMap Consortium (2003). The International HapMap Project. Nature 426, 789-796.

Tomizawa, S.-i. and Sasaki, H. (2012). Genomic imprinting and its relevance to congenital disease, infertility, molar pregnancy and induced pluripotent stem cell. J. Hum. Genet. 57, 84-91.

Wang, X., Sun, Q., McGrath, S. D., Mardis, E. R., Soloway, P. D. and Clark, A. G. (2008). Transcriptome-wide identification of novel imprinted genes in neonatal mouse brain. PLoS ONE 3, e3839.

Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Chetvernin, V., Church, D. M., DiCuccio, M., Edgar, R., Federhen, S. et al. (2007). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 35, Database issue, D5-D12.

Wu, Q., Kumagai, T., Kawahara, M., Ogawa, H., Hiura, H., Obata, Y., Takano, R. and Kono, T. (2006). Regulated expression of two sets of paternally imprinted genes is necessary for mouse parthenogenetic development to term. Reproduction 131, 481-488.

Wu, Q., Kawahara, M. and Kono, T. (2008). Synergistic role of Igf2 and Dlk1 in fetal liver development and hematopoiesis in bi-maternal mice. J. Reprod. Dev. 54, 177-182.

Xin, M., Yang, R., Li, G., Chen, H., Laurie, J., Ma, C., Wang, D., Yao, Y., Larkins, B. A., Sun, Q. et al. (2013). Dynamic expression of imprinted genes associates with maternally controlled nutrient allocation during maize endosperm development. Plant Cell 25, 3212-3227.

Yamasaki-Ishizaki, Y., Kayashima, T., Mapendano, C. K., Soejima, H., Ohta, T., Masuzaki, H., Kinoshita, A., Urano, T., Yoshiura, K.-i., Matsumoto, N. et al. (2007). Role of DNA methylation and histone H3 lysine 27 methylation in tissue-specific imprinting of mouse Grb10. Mol. Cell. Biol. 27, 732-742.

Zeng, T. B., He, H. J., Zhang, F. W., Han, Z. B., Huang, Z. J., Liu, Q. and Wu, Q. (2013). Expression analysis of AK003491, an imprinted noncoding RNA, during mouse development. Genes Genet. Syst. 88, 127-133.

Zhang, Y., Lv, J., Liu, H., Zhu, J., Su, J., Wu, Q., Qi, Y., Wang, F. and Li, X. (2010). HHMD: the human histone modification database. Nucleic Acids Res. 38, Database issue, D149-D154.
