Research Article
Identification of Simple Sequence Repeat Biomarkers through Cross-Species Comparison in a Tag Cloud Representation

Jhen-Li Huang,1 Hao-Teng Chang,2,3 Ronshan Cheng,4 Hui-Huang Hsu,5 and Tun-Wen Pai1

1 Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung 20224, Taiwan
2 Graduate Institute of Basic Medical Science, China Medical University, Taichung City 40402, Taiwan
3 Department of Computer Science and Information Engineering, Asia University, Taichung City 41354, Taiwan
4 Department of Aquaculture, National Taiwan Ocean University, Keelung 20224, Taiwan
5 Department of Computer Science and Information Engineering, Tamkang University, New Taipei City 25137, Taiwan

Correspondence should be addressed to Tun-Wen Pai; twp@mail.ntou.edu.tw

Received 22 November 2013; Revised 27 February 2014; Accepted 27 February 2014; Published 31 March 2014

Academic Editor: Jose C. Nacher

Hindawi Publishing Corporation, BioMed Research International, Volume 2014, Article ID 678971, 11 pages, http://dx.doi.org/10.1155/2014/678971

Copyright © 2014 Jhen-Li Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Simple sequence repeats (SSRs) are not only applied as genetic markers in evolutionary studies but also play an important role in gene regulatory activities. Efficient identification of conserved and exclusive SSRs through cross-species comparison is helpful for understanding the evolutionary mechanisms and associations between specific gene groups and SSR motifs. In this paper, we developed an online cross-species comparative system and integrated it with a tag cloud visualization technique for identifying potential SSR biomarkers within fourteen frequently used model species. Ultraconserved or exclusive SSRs among cross-species orthologous genes could be effectively retrieved and displayed through a friendly interface design. Four different types of testing cases were applied to demonstrate and verify the retrieved SSR biomarker candidates. Through statistical analysis and enhanced tag cloud representation on defined functionally related genes and cross-species clusters, the proposed system can correctly represent the patterns, loci, colors, and sizes of identified SSRs in accordance with gene functions, pattern qualities, and conserved characteristics among species.

1. Introduction

Simple sequence repeats (SSRs) are nonrandomly distributed nucleotide sequences in the genomes of different organisms, built from repeated basic patterns ranging in length from mononucleotide to hexanucleotide [1]. SSRs have been shown to be important motifs involved in various biological events, including evolutionary processes, gene expression, genetic disease, chromatin organization, and DNA metabolic processes [2–4]. For example, dysplasia disease is a genetic disorder of abnormal cellular development due to imperfect polyalanine expansions (GCC repeats) in RUNX2 (CBFA1) [5]. Another example is Huntington's disease (HD), which is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the Huntingtin (HTT) gene; an excessive repeat number causes the symptoms of this genetic neurological disease to appear at an earlier stage [6]. Beyond the effects of mutations and expansions of SSR repeats on diseases, the function of SSR motifs is also illustrated by the insulin-like growth factor 1 (IGF1) gene, which was confirmed as one of the growth control genes. The IGF1 gene contains "AC" repeats located within its upstream regions and is a major determinant of small body size in dogs [7–9]. Previous reports show that SSR regulation depends on the pattern of the repeat unit, the repeat length, and the genetic location within the target genes [2]. These features are fundamental parameters for identifying functional SSRs in various biological applications. However, because SSRs are abundant in genome sequences, it remains challenging to select significant SSR biomarkers or gene regulation related SSRs automatically from limited information. Therefore, identifying highly conserved SSRs through cross-species comparison may provide an alternative approach to recognize significant biomarkers or discover putative gene regulatory SSR motifs from enormous numbers of gene candidates under the assumption of natural long-term evolutionary processes. On the other hand, discovering exclusive SSR motifs among different species clusters could yield species-specific genetic markers or reveal unique genetic functions that developed after species differentiation events. The comparison of SSR motifs across different species clusters may provide important clues and evidence for further understanding evolutionary development.

To efficiently identify SSR biomarkers from a large number of genes in different species, considering a few genes of interest at a time provides an intuitive and effective approach. In this study, groups of genes of interest were selected through gene ontology (GO) terms. The GO is a set of structured vocabularies defined by the Gene Ontology Consortium [10], which aims to provide a universal standard of functional annotation for gene products. All GO terms are connected to each other in directed acyclic graphs with hierarchical relationships. Each term belongs to one of three independent ontologies, biological process (BP), molecular function (MF), and cellular component (CC), which represent different aspects of a gene in the temporal, functional, and spatial domains, respectively. In this study, biological query keywords associated with corresponding GO terms provide a set of functionally associated genes for SSR biomarker analysis. Recently, several gene sequence studies associated with GO analysis have been reported, such as the Gene Ontology SSR Hierarchy (GOSH) system, which adopts GO terms to reveal prominent orthologous SSR patterns [11]; FatiGO, which is a web tool for finding significant associations of Gene Ontology terms with groups of genes [2]; and the Goblet system, which performs automatic GO term annotation on anonymous sequences [12].
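As an illustration of how a GO term can define a query gene set, the following minimal Python sketch collects every gene annotated to a term or to any of its descendants in the directed acyclic graph. The term IDs, gene names, and function names are hypothetical placeholders, and the traversal is only an assumed way of expanding a term; it is not taken from the described system.

```python
from collections import deque

# Toy GO-like DAG: child term -> parent terms (hypothetical IDs).
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],  # a DAG node may have several parents
}

# Toy annotations: gene -> directly annotated terms (hypothetical genes).
ANNOTATIONS = {
    "gene1": {"GO:D"},
    "gene2": {"GO:B"},
    "gene3": {"GO:E"},
}


def descendants(term, parents):
    """Return the term itself plus every term reachable below it."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(child)
    seen = {term}
    queue = deque([term])
    while queue:  # breadth-first walk down the hierarchy
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_for_term(term, parents, annotations):
    """Genes annotated to the term or to any of its descendants."""
    terms = descendants(term, parents)
    return sorted(g for g, ts in annotations.items() if ts & terms)
```

Calling `genes_for_term("GO:A", PARENTS, ANNOTATIONS)` on the toy data gathers the genes annotated anywhere below the root term, which mirrors the idea of turning a functional keyword into a gene set before SSR analysis.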

To enhance the ranking and readability of identified SSR motifs, a tag cloud technique was adopted to display the comparative results of cross-species SSRs. Tag cloud representation is a widespread visualization technology that provides users with an informative image of a designated set of data. Tags, in the form of phrases or short sentences, represent the key information of each entry in the dataset. Multiple tags for various data entries can be displayed in one image simultaneously, and they are either manually assigned by users or automatically generated by computer algorithms. Each tag can be shown with different visual attributes such as size or color. Traditionally, different tag sizes indicate how representative each tag is within the dataset [12]. Currently, tag clouds are widely used on several types of websites, including photo albums, bookmarks, and blogs. They are also used in some tag-based biomedical datasets to help users rapidly grasp the representative information in a complex dataset. For example, the iHOPerator system employed the tag cloud technique with related functions for gene analysis [13], INTERFEROME applied tag cloud visualization to gene ontology databases for interferon regulated genes [14], and REVIGO used a tag cloud approach to summarize and visualize long lists of gene ontology terms [15]. All these examples show that tag cloud visualization techniques can be applied to highlight key information in complex biological datasets.

In this study, we collected complete genome sequences of 14 model species as the fundamental dataset. All SSR motifs in each gene were extracted and saved in the designed database in advance. Users can prepare a set of genes, or assign keywords to define query genes, and then choose model species of interest for cross-species cluster comparison. The model species of interest can be clustered manually or automatically categorized into two groups, the mammal and marine species clusters. SSR retrieval and distribution analysis for a single species is also available in the developed system. Once all parameters have been set, the system performs the comparison online and displays all grouped SSR motifs in a tag cloud visualization. All significantly conserved or exclusive SSR motifs located within the specified gene sets from the two species clusters are efficiently identified and displayed. In addition, all retrieved SSR biomarker candidates are shown in a tag cloud representation together with occurrence frequency, conserved ratio, gene annotation, sequence contents, and corresponding translated proteins through a fast, responsive, and user-friendly web page design.

2. Materials and Methods

2.1. System Configuration. In this study, fourteen genomes were initially selected and obtained from the Ensembl database [16], and all collected gene sequences, with their corresponding gene coordinates and annotations, were downloaded for cross-species comparison in the subsequent modules. Each gene sequence, including its upstream and downstream regions, was scanned, and all perfect/imperfect SSR patterns under different parameter settings were extracted from the collected genes and saved in a newly created SSR database. According to the gene coordinate information, the developed system determined the corresponding genetic region for each SSR motif, and all related annotations were saved in the same entry. The analytical module then applied cross-species comparison techniques between two assigned species clusters, and all statistically conserved and/or exclusive SSR patterns were displayed with a tag cloud representation technique. All details are introduced in the following sections.

2.2. Genome Sequences. To obtain genome sequences of various organisms, the developed system employed Ensembl release 65 as the major data resource. The Ensembl database provides complete genome information on multiple eukaryotic model organisms, including whole genome sequences, gene annotations, and molecular functions. To emphasize the identification of consensus and unique SSR features among different species, two species clusters, fishery and mammal species, were initially selected for comparison. Since only 6 fishery species could be collected from Ensembl release 65, we selected 6 popular mammal species to balance the two clusters. In addition, two well-known organisms in experimental studies were included in our database. The selected model species are zebrafish (Danio rerio), stickleback (Gasterosteus aculeatus), medaka (Oryzias latipes), fugu (Takifugu rubripes), tetraodon (Tetraodon nigroviridis), and cod (Gadus morhua) as fishery species; human (Homo sapiens), gorilla (Gorilla gorilla), macaque (Macaca mulatta), mouse (Mus musculus), cow (Bos taurus), and dog (Canis familiaris) as mammal species; and roundworm (Caenorhabditis elegans) and fruit fly (Drosophila melanogaster) as two additional popular experimental species. The downloaded data include sequence contents, coordinates of exons/introns and UTRs for each gene, and upstream and downstream regions with a length of 2000 nucleotides.

2.3. SSR Motif Database Construction. To accelerate the search for all perfect and imperfect orthologous SSRs in a set of specified genes among the fourteen species, we ran an autocorrelation-based SSR discovery algorithm and constructed the SSR motif database in advance [17]. The autocorrelation algorithm extracts all candidate perfect/imperfect SSR motifs under different threshold parameters in an efficient and effective manner. In this study, we only considered SSR motifs longer than 20 nucleotides with a fundamental repeat unit of 1 to 6 nucleotides. The SSR motif-searching algorithm also applies a proportional quality factor to define SSR patterns with different degrees of noise. Three tolerance settings were applied to account for noisy patterns within the multiscale SSR tag clouds presented later. The tolerance parameters were initially set to 0, 0.1, and 0.2, representing 0, 10, and 20 percent of noisy content within an SSR motif. The percentage of noise is defined as the ratio of nonrepeated nucleotides to the total length of an identified SSR, where the noise types include insertion, deletion, and substitution mutations. In other words, a zero percent noise rate represents a perfect repeat segment without any tolerance. The formula for the tolerance percentage is shown in the following equation:

$$\text{Tolerant}\,(\%) = \frac{\text{nonrepeated nucleotides}}{\text{identified SSR length}} \times 100 \qquad (1)$$
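Equation (1) can be checked numerically with a short Python sketch. The function names are hypothetical, and the conversion between the quality factor (1.0, 0.9, 0.8) and the allowed noise percentage (0, 10, 20) is an assumption based on how the two settings are paired in the text; the authors' autocorrelation algorithm itself is not reproduced here.

```python
def tolerant_percent(nonrepeated_nucleotides: int, ssr_length: int) -> float:
    """Noise percentage of an identified SSR, following equation (1)."""
    if ssr_length <= 0:
        raise ValueError("ssr_length must be positive")
    if not 0 <= nonrepeated_nucleotides <= ssr_length:
        raise ValueError("nonrepeated_nucleotides must lie in [0, ssr_length]")
    # Multiply first so that common cases divide exactly.
    return 100.0 * nonrepeated_nucleotides / ssr_length


def passes_quality(nonrepeated_nucleotides: int, ssr_length: int,
                   quality: float) -> bool:
    """Assumed rule: quality q allows at most (1 - q) * 100 percent noise."""
    allowed = (1.0 - quality) * 100.0
    # A small epsilon absorbs floating-point error in (1 - q).
    return tolerant_percent(nonrepeated_nucleotides, ssr_length) <= allowed + 1e-9
```

For instance, an SSR of 20 nucleotides containing 2 nonrepeated nucleotides has a noise percentage of 10, so under this assumed rule it would be accepted at quality 0.9 or 0.8 but rejected at quality 1.0.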

An SSR motif can be located in six different genetic regions of a specified gene: coding, intron, 5′ UTR (untranslated region), 3′ UTR, upstream, and downstream regions. In this system, the upstream and downstream regions are defined as an extended range of 2000 nucleotides from the start and end positions of transcription. In addition, owing to the shifting mechanism of repeating segments and the complementary base-paired nature of DNA double-stranded helical structures, several possible combinations of SSR patterns can be considered as an identical SSR motif within genetic loci. For example, any rotation of a basic repeat pattern is considered the same SSR element: a "TA" repeat is equivalent to an "AT" repeat through a one-nucleotide shift. An identical SSR motif can also appear in a different form through complementary base pairing and inverse reading of the DNA sequence. For example, the repeat pattern "AGC" appears as "GCT" on the complementary strand of DNA. Therefore, when all possible SSR patterns in DNA sequences are enumerated under these two constraints, there are exactly 501 fundamental basic SSR patterns from 1 to 6 nucleotides in length [18]. However, one special condition should be carefully considered when an SSR motif occurs in coding regions. Since translation converts an mRNA sequence into a string of amino acids through the codon table, the equivalence due to shifting mechanisms and complementary strands should be limited. Here, we provide the true translated protein sequences from the locations of identified SSR motifs, and the in-frame information is clearly annotated when orthologous repeat motifs are found in coding regions. Finally, to distinguish different SSR patterns across extensive genomic resources, the system defines an identifier for an SSR motif by its basic pattern together with its corresponding genetic location within the specified gene. For example, "AGCoding" in Ensembl gene ID "ENSG00000069329" represents a specific repeated pattern "AG" appearing within the coding region of "ENSG00000069329". Through prerunning processes under various parameter settings, which identified all possible SSR motifs according to both the detailed coordinates and the annotated information from the Ensembl database, we constructed a comprehensive SSR motif database for all genes from any specified species. The identified SSR motifs from each gene are treated as "tag" items for the subsequent cross-species comparison, and all retrieved SSR tags from the input gene set are further compared based on occurrence rates and used to construct a multiscale tag cloud representation.
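The two equivalence rules (rotation and reverse complement) can be made concrete with a small Python sketch that maps each repeat unit to a canonical representative and counts the distinct classes for unit lengths 1 to 6. The function names and the choice of the lexicographically smallest form as the representative are assumptions for illustration; units that are themselves repeats of a shorter unit (for example "ATAT") are excluded so that each pattern is counted once.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Read the complementary strand in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(unit: str) -> list:
    """All cyclic shifts of a repeat unit, e.g. TA -> [TA, AT]."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def is_primitive(unit: str) -> bool:
    """False if the unit is a repetition of a shorter unit (e.g. ATAT)."""
    n = len(unit)
    return not any(n % d == 0 and unit == unit[:d] * (n // d)
                   for d in range(1, n))


def canonical_unit(unit: str) -> str:
    """Smallest string among all rotations on both strands."""
    return min(rotations(unit) + rotations(reverse_complement(unit)))


def count_fundamental_patterns(max_len: int = 6) -> int:
    """Count equivalence classes of repeat units of length 1..max_len."""
    classes = set()
    for k in range(1, max_len + 1):
        for letters in product("ACGT", repeat=k):
            unit = "".join(letters)
            if is_primitive(unit):
                classes.add(canonical_unit(unit))
    return len(classes)
```

Under these rules `canonical_unit("TA")` and `canonical_unit("AT")` agree, as do `canonical_unit("AGC")` and `canonical_unit("GCT")`, and `count_fundamental_patterns()` reproduces the figure of 501 fundamental patterns quoted above. As the text notes, this strand-independent equivalence would need to be restricted for motifs in coding regions.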

2.4. Grouped Species and Cross-Cluster Comparison. Because of the tremendous number of SSRs nonrandomly distributed in genome sequences, it is not straightforward to observe SSR biomarkers or to identify gene regulation related SSR motifs from an individual genome. Hence, we assume that conserved or exclusive SSR motifs provide important clues for identifying functional SSR motifs or representative biomarkers among various species. To emphasize long-distance relationships from an evolutionary point of view, we selected two groups of model vertebrate species for orthologous SSR motif comparison. The first group represents mammalian species, including Bos taurus, Canis familiaris, Homo sapiens, Gorilla gorilla, Macaca mulatta, and Mus musculus; the second group represents fishery species, including Danio rerio, Gadus morhua, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis. In addition to these twelve clustered species, we also included two widely used model organisms, Drosophila melanogaster and Caenorhabditis elegans. Nevertheless, in the developed system, users can either apply the two previously defined species groups or manually assign species to two clusters without any limitation. By integrating cross-species comparison techniques with overrepresentation analysis of the assigned gene sets, SSR patterns with conserved and exclusive characteristics in the selected genes between different species clusters can be recognized. An identified conserved SSR motif is initially defined as an orthologous SSR motif if its conserved ratio meets the minimum threshold in an assigned species cluster. For example, a conserved ratio of 80% means that the identified conserved SSR pattern can be found in at least 80% of the species in the assigned species cluster; for a cluster of 6 species, at least 5 (6 × 80% = 4.8, rounded up) different species must possess the orthologous gene(s) and hold the specific SSR pattern within the same genetic region among all orthologous gene(s). For many-to-many orthologous genes, an SSR motif is regarded as conserved as long as it can be detected in any one of its orthologous genes. The threshold level of the conserved ratio can be assigned by users through interactive webpage settings.
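The conserved ratio arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and passing the ratio as an integer percentage is a choice made here to keep the rounding exact; the rounding-up rule follows the worked example in the text (6 × 80% = 4.8, hence at least 5 species).

```python
def min_species_required(cluster_size: int, conserved_ratio_percent: int) -> int:
    """Smallest number of species meeting the conserved ratio (rounded up)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-cluster_size * conserved_ratio_percent // 100)


def is_conserved(species_with_motif: set, cluster_species: set,
                 conserved_ratio_percent: int) -> bool:
    """True if enough species of the cluster carry the motif."""
    hits = len(species_with_motif & cluster_species)
    needed = min_species_required(len(cluster_species), conserved_ratio_percent)
    return hits >= needed
```

With a cluster of 6 species, a ratio of 80 requires 5 species and a ratio of 60 requires 4 species, which matches the thresholds quoted in the methods and in the VPS35 case study.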

Through cross-species comparison between two clustered groups, retrieving conserved or exclusive SSR motifs can help biologists choose significant biomarkers from a previously defined gene set before performing biological experiments. Moreover, exclusive or common SSR motifs between two different species clusters might be regarded as important genetic markers supported by evidence of biological evolution and functional conservation.

2.5. SSR Tag Cloud Visualization. The tag cloud visualization technique provides a keyword representation of text data by showing each tag in various font sizes and colors. To emphasize the importance of conserved and exclusive SSR motifs extracted from a set of specified homologous genes between two different species clusters, we adopted the tag cloud representation to display the identified SSR motifs according to weighting coefficients calculated from the query gene sets. In an SSR tag cloud, the tag size of each SSR motif indicates not only the conservation status of the motif among orthologous genes but also its representativeness among different species clusters. A linear accumulation formula and normalization procedures were used to determine the SSR weighting coefficients for tag size selection. The formula simply counts the number of occurrences of each SSR motif found in each individual gene in the different species clusters. According to this definition of occurrence rate, if an identified SSR motif is well conserved in the two species clusters or highly represented in the specified gene set, the SSR tag is assigned a larger weighting coefficient. Accordingly, the SSR tag is displayed with a larger font size in the tag cloud.
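A minimal sketch of occurrence-based tag sizing is given below. The text does not specify the normalization, so dividing by the largest count and mapping the result linearly onto a font size range are assumptions, as are the function names, the record layout, and the point sizes.

```python
from collections import Counter


def tag_weights(occurrences):
    """Linear accumulation: count each tag once per (species, gene) record.

    occurrences: iterable of (species, gene_id, tag) tuples (assumed layout).
    Returns weights normalized to (0, 1] by the largest count (assumed rule).
    """
    counts = Counter(tag for _species, _gene, tag in occurrences)
    if not counts:
        return {}
    top = max(counts.values())
    return {tag: n / top for tag, n in counts.items()}


def font_size(weight: float, min_pt: int = 10, max_pt: int = 36) -> int:
    """Map a normalized weight linearly onto a font size in points."""
    return round(min_pt + weight * (max_pt - min_pt))
```

A tag seen in four records alongside a tag seen in two records would receive weights of 1.0 and 0.5, so the more frequent tag is drawn at the largest size and the other roughly halfway up the range.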

In order to visually distinguish identified SSR motifs belonging to different species clusters, we applied different colors to SSR tags to indicate the conserved and/or exclusive features of SSR biomarkers between the two species clusters. In this study, red tags represent consensus SSR motifs found in the first species cluster only that satisfy the conservation threshold in the first species cluster; pink tags represent consensus SSR motifs found in the first species cluster only that do not satisfy the conservation threshold; dark green tags represent consensus SSR motifs found in the second species cluster only that satisfy the conservation criterion in the second species cluster; light green tags denote consensus SSR motifs found in the second species cluster only that do not satisfy the conservation threshold; blue tags represent identified SSR patterns that are well conserved in both species clusters and satisfy the species conservation percentage; and yellow tags represent identified SSR patterns that are found in both species clusters for the query gene set but do not satisfy the species conservation criterion. The color-coded information in a resulting tag cloud is shown in Figure 1, and the corresponding attributes are described in Table 1. The abbreviation CR stands for the "conserved ratio" percentage of the corresponding species clusters for each simulation.

Figure 1: Color-coded chart for tag cloud representation of identified SSR motifs between two species clusters (Cluster I and Cluster II) and the criterion of conserved ratio (CR: conserved ratio percentage).

Table 1: Relationship between colors, species clusters, and conserved ratios of detected SSR motifs.

Color          Species cluster    Conserved motif ratio
Red            I                  ≥ CR
Pink           I                  < CR
Blue           I and II           ≥ CR
Yellow         I and II           < CR
Dark green     II                 ≥ CR
Light green    II                 < CR
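The color rules of Table 1 can be written as a small decision function. This is a sketch with hypothetical names; in particular, it assumes that a motif present in both clusters is colored blue only when the threshold is met in both clusters and yellow otherwise, which is one reading of the description above.

```python
def tag_color(n_cluster1: int, n_cluster2: int,
              min_cluster1: int, min_cluster2: int):
    """Color of an SSR tag in the 6-color mode, following Table 1.

    n_cluster1 / n_cluster2: number of species in each cluster holding the motif.
    min_cluster1 / min_cluster2: species counts required by the conserved ratio.
    Returns None when the motif occurs in neither cluster.
    """
    in_first = n_cluster1 > 0
    in_second = n_cluster2 > 0
    if in_first and in_second:
        # Assumption: both clusters must reach the threshold for blue.
        both_ok = n_cluster1 >= min_cluster1 and n_cluster2 >= min_cluster2
        return "blue" if both_ok else "yellow"
    if in_first:
        return "red" if n_cluster1 >= min_cluster1 else "pink"
    if in_second:
        return "dark green" if n_cluster2 >= min_cluster2 else "light green"
    return None
```

For two 6-species clusters and a conserved ratio of 60% (4 species required in each), a motif found in 4 species of Cluster II and none of Cluster I would be dark green, while one found in 2 species of Cluster I and 3 of Cluster II would be yellow.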

In the developed system, users can also identify imperfect SSR biomarkers by setting different tolerance levels, and the number of retrieved imperfect SSR motifs varies with these settings. Higher noise rates admit more tolerant repeat patterns and yield a larger number of possible SSR motifs. Accordingly, the corresponding tag clouds can be depicted as multiscale representations under various noise threshold settings. In other words, tag clouds at different scales are composed of SSR motifs of different tolerance qualities. For instance, the highest quality SSR tag cloud contains only conserved SSR motifs with perfect repeating patterns among the different genes and grouped species. In contrast, lower quality SSR tag clouds contain more tolerant SSR motifs, which may reflect evolutionary status due to gene specification and/or duplication events in either distant or close species. Multiscale tag clouds give biologists an easier way to compare and select suitable SSR candidate motifs as biomarkers through a progressive approach over different tolerance levels, which can be applied in various situations for the further design of biological experiments.


Figure 2: Interface of the SSR tag cloud web system (http://ssrtc.cs.ntou.edu.tw).

3. Results

3.1. SSR Tag Cloud Web System. In this study, we developed an online web system (http://ssrtc.cs.ntou.edu.tw) for identifying conserved and exclusive SSR biomarkers through cross-species cluster comparison. The main interface of the developed web system is shown in Figure 2. To discover significant SSR biomarker candidates from an automatically generated SSR tag cloud, a user only needs to provide gene name(s) or keyword(s) of gene function and can simply apply the default parameters for system prediction. In other words, a set of query genes is defined in the first step by providing relevant Ensembl Gene IDs, GO terms, or keywords. The threshold settings of the SSR feature parameters can also be assigned manually instead of using the defaults; these parameters are genetic region, length of basic pattern, minimum length of SSR motif, SSR quality, species cluster, and SSR motif conserved ratio. The genetic region and length of basic pattern distinguish fundamental features of SSR motifs under cross-species cluster comparison. The minimum SSR length defines the minimal length for identification of SSR motifs. The SSR quality factor represents a tolerance threshold for allowing imperfect SSRs as candidate biomarkers. The developed system initially provides three settings for efficient identification: 1.0 for perfect SSRs, and 0.9 and 0.8 for imperfect SSRs with 10% and 20% tolerance, respectively. The species cluster assignment function supports cross-species comparison by classifying the species of interest into two clusters. The motif conserved ratio parameter is the percentage of qualified species within a cluster that possess the conserved SSR motif within a target gene. Two operation modes were designed for the motif conserved ratio. If a user chooses the condition of larger than or equal to the motif conserved ratio, the system displays the resulting SSR tag cloud in 6 colors; otherwise, the SSR tag cloud appears in 3 colors only. The color modes of an SSR tag cloud are defined in the previous section. Once all parameters and operation modes are defined, the system performs SSR biomarker evaluation automatically and generates a final SSR tag cloud for visualization. The font color of each SSR tag is mainly decided by the motif conserved ratio parameter, and the font size depends only on the occurrence frequency of an SSR element. When users move the mouse over any SSR item within the resulting tag cloud, the total number of appearances and the conserved ratio of the selected SSR motif in the target genes of the assigned species cluster are displayed. Clicking on an SSR tag opens a floating dialog box with detailed information, including the Ensembl gene ID, the transcript ID of the specified gene possessing the target SSR motif, the species name, the genome coordinates, and the DNA sequence contents. Additionally, if an SSR appears within coding regions, its corresponding protein sequences can be retrieved from the Ensembl database and shown in an additional window.

Figure 3: An SSR tag cloud example for ENSG00000069329 (VPS35) between two 6-species clusters.

3.2. SSR Biomarkers for Orthologous Genes. To demonstrate system performance, we selected all orthologous genes from the twelve vertebrate model species (excluding fruit fly and roundworm). All selected genes individually possess sequence identities higher than 80% compared with the human genome. Under this criterion, a total of 162 orthologous genes were selected for the first test case. When the twelve vertebrate species were classified into two species clusters, the mammal and fishery species clusters, for comparison, the conserved and exclusive SSR motifs for each gene were successfully identified, and the significant SSR biomarker candidates for each individual gene are included in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/678971. Here, we illustrate only two genes, ENSG00000069329 and ENSG00000108883, as examples, and all conserved SSR motifs were carefully verified within all orthologous genes from the twelve model species.

321 Case Study of ENSG00000069329 (VPS35) TheEnsembl gene ID of ENSG00000069329 is a vacuolar proteinsorting gene (VPS35) which possesses an average sequenceidentity of 80 by taking pairwise alignment betweenhuman and the other eleven model species The resultingSSR tag cloud for VPS35 was shown in Figure 3 by settingSSR quality of 80 minimum SSR length of 20 nucleotidesand motif conserved ratio of 60 (ie required at least

6 BioMed Research International

4 species possessing identical SSR motifs in each species cluster). The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla, and the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region parameter was set to search all regions except introns, and the length of the basic pattern was selected from 1 to 6 nucleotides for comprehensive representation.

According to Figure 1 for SSR color codes, users can quickly observe that only three SSR motifs shared by the two species clusters, "C/Upstream," "AG/Upstream," and "A/Downstream," were found, all coded in yellow. There is no blue coded SSR tag in this experiment, which implies that no motif is coconserved in at least 4 model species of each species cluster simultaneously. The three yellow coded SSR tags appear in both species clusters but are not well conserved with respect to the assigned conserved ratio. The dark green SSR tag "AG/Coding" represents a consensus SSR motif found only in the second cluster of fishery species: at least 4 fishery species contain the motif in coding regions, whereas it was not found in the coding region of any mammal species from the first cluster. The light green SSR tags represent consensus SSR motifs that were found only in the fishery group but do not satisfy the motif conserved ratio requirement of 60%; that is, these light green coded SSR patterns were found in fewer than 4 fishery species. On the other hand, the pink coded SSR tags represent consensus SSR motifs found exclusively in the mammal species cluster, in fewer than 4 mammal species. In addition, the dark green SSR tag "AG/Coding," shown with the biggest font size, is the most representative and exclusive feature of fishery species compared to mammal species.
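The color rules described in this case study can be summarized in a short sketch. It is an illustration only, not the system's implementation: the function name `classify_tag`, its arguments, and the `None` result for an absent motif are hypothetical, and the rule that any motif present in both clusters without reaching the threshold in both is yellow is inferred from the description above.

```python
from typing import Optional


def classify_tag(n_cluster1: int, n_cluster2: int, min_species: int) -> Optional[str]:
    """Return an assumed color code for one SSR tag.

    n_cluster1 / n_cluster2: number of species in the first (mammal) and
    second (fishery) cluster that hold the motif in the given region.
    min_species: species count required by the motif conserved ratio
    (4 of 6 species for a ratio of 60%).
    """
    in1, in2 = n_cluster1 > 0, n_cluster2 > 0
    ok1, ok2 = n_cluster1 >= min_species, n_cluster2 >= min_species
    if in1 and in2:
        # Shared by both clusters: blue only if well conserved in both.
        return "blue" if (ok1 and ok2) else "yellow"
    if in1:
        # Exclusive to the first cluster.
        return "red" if ok1 else "pink"
    if in2:
        # Exclusive to the second cluster.
        return "dark green" if ok2 else "light green"
    return None  # motif absent from both clusters, so no tag is drawn


# Hypothetical species counts, chosen only to exercise each branch.
print(classify_tag(0, 4, 4))  # exclusive to cluster 2 and well conserved
print(classify_tag(2, 1, 4))  # shared but not well conserved
```

With a threshold of 4 species, a motif held by no mammal and 4 fishery species falls into the dark green class, which matches the description of "AG/Coding" above; the exact species counts behind Figure 3 are not given in the text.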

3.2.2. Case Study of ENSG00000108883 (EFTUD2). The Ensembl gene ENSG00000108883 is the elongation factor Tu GTP binding domain containing 2 gene (EFTUD2), which has an average sequence identity of 80% in pairwise alignments between human and each of the other 11 model species. The resulting SSR tag cloud for EFTUD2 is shown in Figure 4, obtained with exactly the same parameters as in the previous example. From the resulting tag cloud, users can immediately identify that only one coconserved SSR tag, "ATC/Coding," could be found as a notable biomarker between the two species clusters; it was well conserved across at least 4 species in each species cluster, and hence the tag is shown in blue. Furthermore, one red coded SSR tag, "A/Downstream," represents a consensus SSR motif found only in the first (mammal) species cluster, with at least 4 species containing the motif in the downstream region; this motif could not be found in any fishery species. The pink SSR tags represent conserved SSR motifs found only in the mammal group that do not satisfy the motif conserved ratio requirement. Similarly, the light green coded SSR tags represent consensus SSR motifs found exclusively in the fishery species cluster, in

Figure 4: An SSR tag cloud example for ENSG00000108883 (EFTUD2) between two 6-species clusters.

fewer than 4 fishery species. In addition, the red SSR tag "A/Downstream" is shown with the biggest font size, which implies that this SSR is the most representative and exclusive for mammal species among all SSR candidates.

Interestingly, the first gene, VPS35 (ENSG00000069329), is associated with Parkinson's disease (PD) [19], and mutations in the second gene, EFTUD2 (ENSG00000108883), cause mandibulofacial dysostosis with microcephaly [20]. In both cases, scientists have so far demonstrated only that the diseases are caused by gene mutations. Through in silico SSR biomarker detection with the proposed system, we could efficiently identify many conserved and exclusive SSRs between the two species groups as biomarker candidates. However, without experimental verification, we cannot be sure whether either disease is truly correlated with the identified SSR motifs. To gain more confidence in the proposed system, we tested it on disease genes known to be associated with specific SSR biomarkers. If a genetic disease is indeed caused by abnormal distributions of SSR motifs, we expect the proposed SSR tag cloud representation system to identify those significant SSR biomarkers efficiently and effectively.

3.3. Case Study of a Set of Skeletal Development Genes. To demonstrate functionally related SSR motifs, we selected a gene set with the specific function of skeletal development. A total of 17 genes associated with this function were selected: HOXA11, ZIC2, ALX4, HOXA2, DLX2, HOXA7, TWIST1, HOXC13, RUNX2, SOX9, HOXD11, HOXD13, GDF11, HLX, SIX3, HOXD8, and HOXA10 [21]. In this example, we show that the detailed information of each SSR tag is available in a floating dialog by clicking on it and that the appearance number and conserved ratio of a selected SSR motif from the target genes can be viewed by moving the mouse cursor over the SSR tag.

The resulting SSR tag clouds from different combinations of settings for the 17 skeletal development related genes are shown in Figure 5. In Figure 5(a), the parameter settings were defined as follows: SSR quality of 90% for perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing


Figure 5: (a) SSR tag cloud for 17 skeletal development related genes constrained to coding regions; (b) result of moving the mouse over the SSR tag "CCG/Coding"; (c) detailed information of the SSR tag "CCG/Coding" in a floating dialog; (d) an SSR tag cloud for 17 skeletal development related genes showing only SSRs with high conserved ratios.

identical SSR motifs in each species cluster), with all possible SSR candidates shown. The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla; the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides for comprehensive representation. With these settings, the results are shown in Figure 5: the red coded SSR tag "CCG/Coding" represents the only exclusive SSR motif well conserved in mammal species. This tag could be found in at least 5 species within the mammal group, and it is highly correlated with the skeletal development related genes. Users can move the mouse over the tag "CCG/Coding," and the appearance number and conserved ratio of the selected SSR motif are shown in a pop-up. In Figure 5(b), the CCG/Coding motif appears in the mammal species cluster a total of 62 times and with

a conserved ratio of 100%, while no such SSR motif could be discovered in the skeletal development gene set within the fishery species cluster. If a user clicks on the tag "CCG/Coding," detailed information of the SSR tag is shown in a floating dialog with Ensembl gene ID, transcript ID, species name, coordinates in genomes, and DNA sequence contents. In particular, if the SSRs appear within coding regions, the table also provides the cDNA sequence and its corresponding translated protein sequence. In Figure 5(c), the CCG repeated pattern in the last row, from the human gene ENSG00000135414 (GDF11), is located on chromosome 12, with coordinates from 56137185 to 56137224. Since the CCG repeated pattern was found in coding regions, the table also provides the DNA, cDNA, and corresponding protein sequence contents. Notably, this repeated pattern in the coding region of the RUNX2 gene is a polyalanine peptide (GCC repeat in coding region), and it indeed plays a crucial role in cellular development. Abnormal distribution of this polyalanine repeat biomarker might cause dysplasia, a genetic disorder of abnormal cellular development.

In Figure 5(d), most parameters were set identically to those of Figure 5(a), except that the display parameter was modified to show only highly conserved SSRs instead of all identified SSRs. In other words, tags with pink, light green, and yellow color codes are hidden. The resulting tag cloud shows that only one red coded tag, "CCG/Coding," exists under such high conservation requirements. Again, the SSR motif "CCG/Coding" represents a significant biomarker in mammal species that is highly correlated with the skeletal development related genes.

3.4. Case Study of Gene Ontology Term of "Embryonic Cranial Skeleton Morphogenesis". To demonstrate functionally related SSR motifs through GO term assignment, we selected the GO term "embryonic cranial skeleton morphogenesis." The related genes annotated by this GO term include TBX15, SIX4, DLX2, PRRX1, TWIST1, BMP4, SIX1, SMAD2, NIPBL, NODAL, WNT9B, TGFBR2, GAS1, SIX2, FOXC2, SMAD3, TBX1, TGFBR2, TBX15, GNAS, PRRX2, TGFBR1, TFAP2A, SMAD2, SETD2, BMP4, SMAD3, TWIST2, TFAP2A, SMAD3, TGFBR1, and BMP4. To compare the results obtained under various settings, we tried several combinations of input parameters that differed from the system default settings. In this case study, the parameter settings were defined as follows: SSR quality of 90% for perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and display of all possible SSR candidates. The first species cluster was assigned as the mammal group and the second as the fishery group, as in the default settings. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides. With these settings, the results are shown in Figure 6(a). We could observe only one red coded SSR tag, "CCG/Coding," which is the unique biomarker conserved in mammal species with


Figure 6: (a) SSR tag cloud for the GO keyword "embryonic cranial skeleton morphogenesis" with a motif conserved ratio of 80%; (b) motif conserved ratio of 60%; (c) motif conserved ratio of 100%.

respect to the embryonic cranial skeleton morphogenesis related genes.

Then we lowered the motif conserved ratio to 60%, and the resulting SSR tag cloud is shown in Figure 6(b). We could observe that several tags changed their coded colors. Taking red coded tags as an example, there was only one red tag, "CCG/Coding," in Figure 6(a), but in Figure 6(b) the red coded SSR tags gained another tag, "AATCTG/Coding," which was originally denoted in pink in Figure 6(a). Conversely, when we increased the motif conserved ratio to 100%, the result shown in Figure 6(c) contained no red coded SSR tag. Compared to Figure 6(a), the original red tag "CCG/Coding" changed to pink because only 5 of the 6 species in the mammal group hold the tag "CCG/Coding." Figures 6(b) and 6(c) together show that color coded tags may switch colors under different motif conserved ratio settings. A higher motif conserved ratio reduces the number of red, green, and blue coded tags.
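The arithmetic behind these color switches can be made explicit with a small worked example. The function names are hypothetical, and rounding the required species count up to the next integer is an assumption that agrees with the thresholds quoted in the text (60% of 6 species gives 4, 80% gives 5).

```python
def min_species_required(ratio_percent: int, cluster_size: int) -> int:
    """Smallest species count that reaches the motif conserved ratio."""
    # Integer ceiling of ratio_percent * cluster_size / 100, avoiding floats.
    return (ratio_percent * cluster_size + 99) // 100


def is_well_conserved(n_species_with_motif: int, cluster_size: int,
                      ratio_percent: int) -> bool:
    """True if enough species in the cluster hold the motif."""
    return n_species_with_motif >= min_species_required(ratio_percent, cluster_size)


# A motif held by 5 of 6 species, as reported for "CCG/Coding" in mammals.
for ratio in (60, 80, 100):
    print(ratio, min_species_required(ratio, 6), is_well_conserved(5, 6, ratio))
```

For a 6-species cluster the required counts are 4, 5, and 6 species, so a motif held by 5 species stays well conserved at 60% and 80% but not at 100%, consistent with the red tag turning pink in Figure 6(c).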

3.5. An Example of Genetic Disease of "Huntington's Disease (HD)". To demonstrate genetic diseases caused by abnormal distribution of SSR motifs, we selected a well-known neurodegenerative genetic disease, Huntington's disease (HD), as an example. HD is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the ENSG00000197386 (HTT) gene on chromosome 4 [22]. It presents with involuntary movements caused by loss of muscle coordination and leads to psychiatric problems. The nucleotide repeat length and the average age of symptom onset of Huntington's disease are inversely related [23].

The verification results of the SSR tag cloud are shown in Figure 7, and the parameter settings were defined as follows: SSR quality of 100% and 80%, minimum SSR length of 20

[Figure 7 panel content: (a) Mammalian: Bos taurus, Mus musculus, Homo sapiens, Gorilla gorilla, Canis familiaris; Fishery: Danio rerio. (b) Mammalian: Bos taurus, Homo sapiens; Fishery: none.]

Figure 7: (a) SSR tag cloud for the HTT gene with SSR quality of 80%, motif conserved ratio of 80%, and 5 organisms holding the conserved SSR tag "AGC/Coding"; (b) SSR quality of 100%, motif conserved ratio of 80%, and only two species, human and cattle, holding the perfect SSR tag "AGC/Coding".

nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and the "show all SSRs" option selected. The first species cluster was assigned as the mammal group and the second as the fishery group. In Figure 7, we could observe "AGC/Coding" in both tag clouds as an important biomarker. In fact, because an SSR repeat pattern can be shifted cyclically, the "AGC" repeat unit can theoretically be considered the same pattern as "CAG" for efficient identification. However, SSRs located within coding regions are further translated into their corresponding amino acid sequences according to precise loci verification on exon regions. Frame-shifted SSRs in coding regions might result in different encoded amino acids. For example, the trinucleotide pattern "AGC" encodes serine (S), whereas "CAG" encodes glutamine (Q). Therefore, identified SSRs in coding regions should be carefully treated and translated into the appropriate protein sequence based on the annotated genome database. In this example, we noticed that a significant SSR motif, "AGC/Coding," in HTT genes could be identified

[Figure 8 panel content: (a) amount of SSRs in Cluster I and Cluster II relative to the CR threshold line; (b) SSR quality of 80% and 100% for Cluster I and Cluster II.]

Figure 8: (a) Relationship between the motif conserved ratio parameter and the number of SSR tags in different colors; (b) relationship between the SSR quality parameter and SSR tag colors.

with different sizes (occurrence rates) according to various SSR quality settings. This repeat motif in coding regions appears in most mammal species except macaque under a minimum length requirement of 20 nucleotides. In addition, among all fishery species, only zebrafish possesses a similar repeat motif in the coding region. When the SSR quality parameter was increased to 100% (without any tolerance), the pattern "AGC/Coding" (or, equivalently, "CAG/Coding" on the DNA sense strand) could be retrieved only from cattle and human among the mammal species. We could observe that the font size and color of each SSR tag changed gradually according to the tolerance rate setting. Accordingly, the tag "AGC/Coding" appeared with the biggest icon, in pink, compared with all other SSRs in coding regions, reflecting the significance of this feature as exclusive to mammal species compared with fishery species. These observations might also provide important information to biologists for selecting animal species in future experimental studies of specific diseases.
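The relationship between cyclically shifted repeat units and their translations can be illustrated with a short sketch. It is an illustration only: the helper names are hypothetical, choosing the lexicographically smallest rotation as the representative unit is an assumption (the text does not say how the system picks the displayed unit), and the translation uses the standard genetic code read from the first base of the given unit.

```python
from itertools import product

# Standard genetic code in the usual compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rotations(unit: str) -> list:
    """All cyclic shifts of a repeat unit, e.g. CAG -> CAG, AGC, GCA."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical_unit(unit: str) -> str:
    """Assumed representative of a repeat unit: its smallest rotation."""
    return min(rotations(unit))


def translate_repeat(unit: str, copies: int = 3) -> str:
    """Translate a perfect repeat, reading codons from its first base."""
    seq = unit * copies
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


for unit in ("CAG", "AGC", "GCC", "CCG"):
    print(unit, canonical_unit(unit), translate_repeat(unit))
```

Under these assumptions, "CAG" and "AGC" share one representative unit but translate to glutamine and serine runs, respectively, and "GCC" and "CCG" likewise share a unit while only the GCC frame yields polyalanine. This is why the reading frame has to come from the annotated exon coordinates rather than from the repeat unit alone.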

4. Discussion

Two key parameters affect the color and size distribution within an SSR tag cloud. The first is the motif conserved ratio (CR): different conserved ratio values change the colors of SSR tags. When the motif conserved ratio increases, the number of red, green, and blue tags may decrease. In Figure 8(a), Cluster I represents the first species cluster and Cluster II the second. The horizontal straight line in the figure represents a motif conserved ratio value. When the CR threshold is increased, the red, blue, and dark green areas decrease; in contrast, when the CR threshold is decreased, these areas increase. Each area is proportional to the number of SSR tags.

The second important parameter for a tag cloud is the SSR quality threshold. As shown in Figure 8(b), different SSR quality values change not only the number of SSR tags but also their colors. Increasing the SSR quality value may reduce the number of SSR tags, since the SSRs of higher quality are always a subset of the SSRs of lower quality. When the quality threshold is decreased to gain more SSR candidates, some red and green tags might change into yellow or blue tags. This is mainly caused by the newly intersecting region after the set of SSR candidates expands.
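The subset relation between quality thresholds can be sketched with a few records. The species and clusters below follow the HTT example of Figure 7, but the individual quality values are invented for illustration, and the record layout and function name are hypothetical.

```python
# Hypothetical records for one motif: (species, cluster, SSR quality in percent).
# Species follow Figure 7; the per-species quality values are invented.
RECORDS = [
    ("Homo sapiens", "mammal", 100),
    ("Bos taurus", "mammal", 100),
    ("Mus musculus", "mammal", 85),
    ("Gorilla gorilla", "mammal", 90),
    ("Canis familiaris", "mammal", 80),
    ("Danio rerio", "fishery", 80),
]


def species_holding_motif(records, min_quality):
    """Species per cluster whose SSR meets the quality threshold."""
    result = {"mammal": set(), "fishery": set()}
    for species, cluster, quality in records:
        if quality >= min_quality:
            result[cluster].add(species)
    return result


loose = species_holding_motif(RECORDS, 80)
strict = species_holding_motif(RECORDS, 100)

# Raising the threshold can only remove species, never add them.
print(strict["mammal"] <= loose["mammal"], strict["fishery"] <= loose["fishery"])
print(len(loose["mammal"]), len(loose["fishery"]))
print(len(strict["mammal"]), len(strict["fishery"]))
```

In this toy data the motif is shared by both clusters at a quality of 80% but survives only in the mammal cluster at 100%, which is the kind of change that turns a shared tag into a tag exclusive to one cluster.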

In addition, a few common SSR tags originally coded in yellow might change to either red or green when the quality factor is increased. This happens mainly because the total number of species possessing a certain SSR tag decreases, so that SSR motifs previously shared between the two species clusters become representative SSR tags for one species cluster exclusively. Table 2 lists the total number of SSR motifs for each species with a minimum SSR length of 20 nucleotides. SSR quantities for mammal species are usually larger than those for fishery species, and increasing the SSR quality value generally reduces the number of SSR motifs in each species.

5. Conclusion

SSRs are nonrandomly distributed nucleotides in genomes with repeating basic patterns of 1 to 6 nucleotides in length, and a large number of functional SSR motifs have been demonstrated to be important biomarkers involved in various biological processes and gene regulation. Because of the abundance of SSRs in each species' genome, it is difficult to recognize significant SSR biomarkers or gene regulation related SSRs based mainly on the repeat sequence length, genetic location, and fundamental repeat pattern of an SSR motif. In this paper, we proposed the concept of identifying SSR biomarker candidates through cross-species cluster comparison on a specified set of target genes. The developed system provides an online tool with multiparameter selection functions, and the identified SSR motifs are displayed by a tag cloud visualization method. The exclusive and consensus SSR motifs between two species clusters are shown efficiently in different font colors and sizes. The in silico comparison of SSR motifs across different species clusters may provide clues and evidence for further understanding of evolutionary development and functional associations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Table 2: The number of SSR motifs of each species for various SSR quality settings.

Scientific name           Species name   SSR quality 80%   SSR quality 90%   SSR quality 100%
Danio rerio               Zebrafish      1,175,832         594,741           401,503
Gasterosteus aculeatus    Stickleback    160,413           87,343            51,779
Oryzias latipes           Medaka         122,505           37,730            15,460
Takifugu rubripes         Fugu           261,612           148,043           90,753
Tetraodon nigroviridis    Tetraodon      119,557           69,473            43,584
Gadus morhua              Cod            359,592           209,540           123,880
Homo sapiens              Human          3,023,284         1,406,186         644,338
Gorilla gorilla           Gorilla        757,571           344,973           152,403
Macaca mulatta            Macaque        1,075,737         526,515           225,403
Mus musculus              Mouse          2,463,222         1,301,019         812,873
Bos taurus                Cow            323,386           132,923           44,906
Canis familiaris          Dog            715,776           340,433           152,502
Caenorhabditis elegans    Roundworm      59,273            13,637            4,225
Drosophila melanogaster   Fruit fly      199,458           79,952            21,223

Acknowledgments

This work is supported by the Center of Excellence for the Oceans from National Taiwan Ocean University and the National Science Council, Taiwan (NSC 102-2321-B-019-001 and NSC 101-2627-B-019-003 to T.-W. Pai), and the Department of Health in Taiwan (DOH102-TD-B-111-004 to H.-T. Chang).

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, "The evolutionary dynamics of repetitive DNA in eukaryotes," Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, "Microsatellites within genes: structure, function, and evolution," Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, "Microsatellite repeat instability and neurological disease," BioEssays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, "Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review," Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos et al., "Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia," Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, "Glutamine repeats and neurodegeneration," Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, "Role of insulin/insulin-like growth factor 1 signaling pathway in longevity," World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, "Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature," Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase et al., "A single IGF1 allele is a major determinant of small size in dogs," Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake et al., "Gene ontology: tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, "Comparison of tag cloud layouts: task-related performance and visual exploration," in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, "Automated gene ontology annotation for anonymous sequence data," Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, "iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website," BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, "INTERFEROME: the database of interferon regulated genes," Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, "Revigo summarizes and visualizes long lists of gene ontology terms," PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan et al., "An overview of Ensembl," Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, "Efficient algorithms for identifying orthologous simple sequence repeats of disease genes," Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes et al., "Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies," Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal et al., "A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease," The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith et al., "Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype," The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.


[21] M. A. Lines, L. Huang, J. Schwartzentruber et al., "Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly," The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, "Molecular origins of rapid and continuous morphological evolution," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao et al., "A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes," Cell, vol. 72, no. 6, pp. 971–983, 1993.


natural long-term evolutionary processes. On the other hand, exclusive SSR motifs discovered among different species clusters could serve as species-specific genetic markers or indicate unique genetic functions that developed after species differentiation events. The comparison of SSR motifs across different species clusters may provide important clues and evidence for further understanding evolutionary development.

To efficiently identify SSR biomarkers from a large number of genes in different species, considering a few genes of interest at a time provides an intuitive and effective approach. One possible approach, selecting gene groups of interest from gene ontology (GO) terms, was employed in this study. The GO is a set of structured vocabularies defined by the Gene Ontology Consortium [10], which aims to provide a universal standard of functional annotation for gene products. All GO terms are connected to each other in directed acyclic graphs with hierarchical relationships. Each term belongs to one of three independent ontologies, biological process (BP), molecular function (MF), and cellular component (CC), which represent different aspects of a gene in the temporal, functional, and spatial domains, respectively. In this study, query biological keywords associated with corresponding GO terms provide a functionally associated gene set for SSR biomarker analysis. Recently, several gene sequence studies associated with GO analysis have been reported, such as the Gene Ontology SSR Hierarchy (GOSH) system, which adopts GO terms to reveal prominent orthologous SSR patterns [11]; FatiGO, a web tool for finding significant associations of Gene Ontology terms with groups of genes [2]; and the Goblet system, which performs automatic GO term annotation on anonymous sequences [12].
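As a rough illustration of how a GO term can yield a query gene set, the sketch below walks a toy directed acyclic graph and collects the genes annotated to a term and to its descendants. The term identifiers, gene names, and function name are placeholders, and whether the described system also includes descendant terms is an assumption, since the text does not say.

```python
from collections import deque

# Hypothetical toy ontology: parent term -> child terms. Real GO terms form
# a DAG, so a term may be reachable through several parents.
CHILDREN = {
    "T:skeleton": ["T:cranial", "T:limb"],
    "T:cranial": ["T:embryonic_cranial"],
    "T:limb": [],
    "T:embryonic_cranial": [],
}

# Hypothetical gene annotations per term.
ANNOTATIONS = {
    "T:skeleton": {"geneA"},
    "T:cranial": {"geneB"},
    "T:limb": {"geneC"},
    "T:embryonic_cranial": {"geneB", "geneD"},
}


def genes_for_term(term, children, annotations):
    """Collect genes annotated to `term` or to any of its descendants."""
    seen, queue, genes = {term}, deque([term]), set()
    while queue:
        current = queue.popleft()
        genes |= annotations.get(current, set())
        for child in children.get(current, []):
            if child not in seen:  # visit each term once, even with shared parents
                seen.add(child)
                queue.append(child)
    return genes


print(sorted(genes_for_term("T:cranial", CHILDREN, ANNOTATIONS)))
```

The resulting gene set would then be the input to the cross-species SSR comparison; duplicate gene names collapse automatically because a set is used.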

To enhance the ranking and readability of identified SSR motifs, a tag cloud technique was adopted to display the comparative results of cross-species SSRs. Tag cloud representation is a widespread visualization technology that provides users with an informative image of a designated set of data. Tags consisting of phrases or short sentences represent the key information of each entry in the dataset. Multiple tags for various data entries, either manually assigned by users or automatically generated by computer algorithms, can be displayed in an image simultaneously. Each tag can be shown with different visual attributes, such as different sizes or colors. Traditionally, different tag sizes are designed to indicate various levels of representativeness of tags within the dataset [12]. Currently, tag clouds are widely used in several types of websites, including photo albums, bookmarks, and blogs. They are also used in some tag-based biomedical datasets to help users rapidly understand the representative information in a complex dataset. For example, the iHOPerator system employed a tag cloud technique with related functions for gene analysis [13], INTERFEROME applied tag cloud visualization to gene ontology databases for interferon regulated genes [14], and REVIGO used a tag cloud approach to summarize and visualize long lists of gene ontology terms [15]. All these examples show that tag cloud visualization techniques can be applied to highlight key information from complex biological datasets.

In this study, we collected the complete genome sequences of 14 model species as the fundamental dataset. All SSR motifs in each gene were extracted and saved in the designed database in advance. Users can prepare a set of genes or assign keywords to define query genes and then choose model species of interest for cross-species cluster comparison. Model species of interest can be manually clustered or automatically categorized into two groups, mammal and marine species clusters. SSR retrieval and distribution analysis for a single species are also available in the developed system. Once all parameters have been set, the system performs online comparison and displays all grouped SSR motifs in a tag cloud visualization. All significantly conserved or exclusive SSR motifs located within the specified gene sets from the two species clusters are efficiently identified and displayed. In addition, all retrieved SSR biomarker candidates are shown in a tag cloud representation with occurrence frequency, conserved ratio, gene annotation, sequence contents, and corresponding translated proteins through a fast, responsive, and user-friendly web page design.

2. Materials and Methods

2.1. System Configuration. In this study, fourteen initially selected genomes were obtained from the Ensembl database [16], and all collected gene sequences with their corresponding gene coordinates and annotations were downloaded for cross-species comparison in the subsequent modules. Each gene sequence, including upstream and downstream regions, was scanned, and all perfect/imperfect SSR patterns under different parameter settings were extracted from the collected genes and saved in a newly created SSR database. According to gene coordinate information, the developed system determined the corresponding genetic region for each SSR motif, and all related annotations were saved in the same entry. The analytical module then applied cross-species comparison between two assigned species clusters, and all statistically conserved and/or exclusive SSR patterns could be shown in a tag cloud representation. All details are introduced in the next two sections.

2.2. Genome Sequences. To obtain genome sequences of various organisms, the developed system employed Ensembl release 65 as the major data resource. The Ensembl database provides complete genome information on multiple eukaryotic model organisms, including whole genome sequences, gene annotations, and molecular functions. To emphasize the identification of consensus and unique features of SSRs among different species, two species clusters, fishery and mammal species, were initially selected for comparison. Since only 6 fishery species could be collected from Ensembl release 65, we selected 6 popular mammal species to balance the clusters. In addition, two widely used experimental organisms were also included in our database. These intentionally selected

BioMed Research International 3

model species are zebrafish (Danio rerio) stickleback (Gas-terosteus aculeatus) medaka (Oryzias latipes) fugu (Takifugurubripes) tetraodon (Tetraodon nigroviridis) and cod (Gadusmorhua) as fishery species human (Homo sapiens) gorilla(Gorilla gorilla) macaque (Macaca mulatta) mouse (Musmusculus) cow (Bos taurus) and dog (Canis familiaris)as mammal species roundworm (Caenorhabditis elegans)and fruit fly (Drosophila melanogaster) as extra two pop-ular experimental species These downloaded data includesequence contents coordinates of exonsintrons and UTRsfor each gene and upstream and downstream regions witha length of 2000 nucleotides

2.3. SSR Motif Database Construction. To accelerate the search for all perfect and imperfect orthologous SSRs in a set of specified genes among the fourteen species, we applied an autocorrelation-based SSR discovery algorithm and constructed the SSR motif database in advance [17]. The autocorrelation algorithm can efficiently and effectively extract all candidate perfect/imperfect SSR motifs under different threshold parameters. In this study, we only considered SSR motifs longer than 20 nucleotides whose fundamental repeat unit ranges from 1 to 6 nucleotides in length. The SSR motif-searching algorithm also applies a proportional quality factor for defining SSR patterns with different degrees of noise. Three tolerance settings were tentatively applied so that noisy patterns could be considered within the multiscale SSR tag clouds presented later. The tolerance parameters were initially set to 0, 0.1, and 0.2, representing 0, 10, and 20 percent of noisy content within an SSR motif. The percentage of noise is defined as the ratio of nonrepeated nucleotides to the total length of an identified SSR, where the noise types include insertion, deletion, and substitution mutations. In other words, a noise rate of zero percent represents a perfect repeat segment without any tolerance. The formula for the tolerance percentage is shown in the following equation:

\[
\text{Tolerant (\%)} = \frac{\text{nonrepeated nucleotides}}{\text{identified SSR length}} \times 100 \tag{1}
\]
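Equation (1) can be illustrated with a minimal Python sketch. It is not the autocorrelation algorithm of [17]: it assumes the repeat unit is already known and counts only substitution noise by comparing a segment against a perfect tiling of that unit (insertions and deletions would require an alignment step). The function names `noise_percentage` and `passes_tolerance` are hypothetical.

```python
def noise_percentage(segment: str, unit: str) -> float:
    """Percentage of positions in `segment` that deviate from a perfect
    tiling of `unit` (substitution noise only; a simplification of Eq. (1))."""
    if not segment or not unit:
        raise ValueError("segment and unit must be non-empty")
    segment, unit = segment.upper(), unit.upper()
    mismatches = sum(
        1 for i, base in enumerate(segment) if base != unit[i % len(unit)]
    )
    return 100.0 * mismatches / len(segment)


def passes_tolerance(segment: str, unit: str, tolerance: float,
                     min_length: int = 20) -> bool:
    """True if the segment is long enough and its noise does not exceed
    `tolerance` (0, 0.1, or 0.2 in the settings described above)."""
    # A tiny epsilon guards against floating-point rounding at the boundary.
    return (len(segment) >= min_length
            and noise_percentage(segment, unit) <= tolerance * 100.0 + 1e-9)


if __name__ == "__main__":
    noisy = "ACAGACACACTCACACACAC"  # 20 nt "AC" repeat with 2 substitutions
    print(noise_percentage(noisy, "AC"))       # 2 of 20 positions differ
    print(passes_tolerance(noisy, "AC", 0.1))  # accepted at 10% tolerance
    print(passes_tolerance(noisy, "AC", 0.0))  # rejected as a perfect SSR
```

Under this simplification, a tolerance of 0 accepts only perfect repeats, which matches the statement that a zero percent noise rate represents a perfect repeat segment.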

An SSR motif can be located in six different genetic regions of a specified gene: coding, intron, 5′ UTR (untranslated region), 3′ UTR, and upstream and downstream regions. In this system, the upstream and downstream regions are defined as an extended range of 2000 nucleotides from the start and end positions of transcription. In addition, because of the shifting of repeating segments and the complementary base-paired nature of the double-stranded helical structure of DNA, several combinations of SSR patterns can be considered an identical SSR motif within genetic loci. For example, any rotation of a basic repeat pattern is considered the same SSR element: a “TA” repeat is equivalent to an “AT” repeat through a one-nucleotide shift. Another case of an identical SSR motif with a different appearance arises from complementary base pairing and reverse reading of the DNA sequence. For example, the repeat pattern “AGC” appears as “GCT” on the complementary strand of DNA. Therefore, when all possible SSR patterns in DNA sequences are enumerated under these two constraints, there are exactly 501 fundamental SSR patterns from 1 to 6 nucleotides in length [18]. However, one special condition should be carefully considered when an SSR motif occurs in coding regions. Since translation converts an mRNA sequence into a string of amino acids through the codon table, the equivalence due to shifting and the complementary strand should be limited. Here, we provide the true translated protein sequences from the locations of identified SSR motifs, and the in-frame information is clearly annotated when orthologous repeat motifs are found in coding regions. Finally, to distinguish different SSR patterns from extensive genomic resources, the system defines an identifier for an SSR motif by its basic pattern together with its genetic location within the specified gene. For example, “AG/Coding” in Ensembl gene ID “ENSG00000069329” represents the repeated pattern “AG” appearing within the coding region of “ENSG00000069329”. Through preprocessing runs under various parameter settings, which identified all possible SSR motifs in accordance with the detailed coordinates and annotations from the Ensembl database, we constructed a comprehensive SSR motif database for all genes from any specified species. The SSR motifs identified in each gene are treated as “tag” items for the subsequent cross-species comparison, and all SSR tags retrieved from the input gene set are further compared based on occurrence rates and used to construct a multiscale tag cloud representation.
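The two equivalence rules described above (rotation and reverse complement) can be sketched in Python as follows. This is an illustrative enumeration rather than the system's own code, and the function names are hypothetical. It assumes that repeat units which are themselves repetitions of a shorter unit (for example, “ATAT”) are counted with the shorter unit, which is how the total of 501 is reached.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence read from the complementary strand, 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotations(unit: str) -> list:
    """All cyclic shifts of a repeat unit, e.g. 'TA' -> ['TA', 'AT']."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical_unit(unit: str) -> str:
    """Smallest representative among all rotations of the unit and of its
    reverse complement, so that equivalent units share one key."""
    unit = unit.upper()
    return min(rotations(unit) + rotations(reverse_complement(unit)))


def is_primitive(unit: str) -> bool:
    """False if the unit is a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return not any(
        n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n)
    )


def count_fundamental_patterns(max_len: int = 6) -> dict:
    """Number of distinct repeat units per unit length (1..max_len)."""
    counts = {}
    for k in range(1, max_len + 1):
        units = ("".join(p) for p in product("ACGT", repeat=k))
        counts[k] = len({canonical_unit(u) for u in units if is_primitive(u)})
    return counts


if __name__ == "__main__":
    counts = count_fundamental_patterns()
    print(counts, sum(counts.values()))
```

As the text notes, this equivalence is appropriate outside coding regions; within coding regions the reading frame and strand determine the translated amino acids, so the canonical key alone is not sufficient there.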

2.4. Grouped Species and Cross-Cluster Comparison. Because a tremendous number of SSRs are nonrandomly distributed in genome sequences, it is not an intuitive task to observe SSR biomarkers or to identify SSR motifs related to gene regulation from an individual genome. Hence, we assume that conserved or exclusive SSR motifs provide important clues for identifying functional SSR motifs or representative biomarkers among various species. To emphasize long-distance relationships from an evolutionary point of view, we selected two groups of model vertebrate species for orthologous SSR motif comparison. The first group represents the mammalian species, including Bos taurus, Canis familiaris, Homo sapiens, Gorilla gorilla, Macaca mulatta, and Mus musculus; the second group represents the fishery species, including Danio rerio, Gadus morhua, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis. In addition to these twelve clustered species, we also included two widely used model organisms, Drosophila melanogaster and Caenorhabditis elegans. In the developed system, users can either apply the two predefined species groups or manually assign species to two clusters without any limitation. By combining cross-species comparison techniques with overrepresentation analysis of the assigned gene sets, SSR patterns with conserved and exclusive characteristics in the selected genes can be recognized between different species clusters. An identified conserved SSR motif is initially defined as an orthologous SSR motif if its conserved ratio meets the minimum threshold in an assigned species cluster. For example, a conserved ratio of 80% denotes that the identified conserved SSR pattern can be found in at least 80% of the species in the assigned species cluster. For a cluster of 6 species, this means that at least 5 species (6 × 80% = 4.8, rounded up) possess the orthologous gene(s) and hold the specific SSR pattern within the same genetic region of all orthologous gene(s). For many-to-many orthologous genes, an SSR motif is defined as conserved as long as it can be detected in any one of its orthologous genes. The threshold level of the conserved ratio can be assigned by users through interactive webpage settings.
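The threshold arithmetic in this subsection can be written as a short Python sketch. The function names are hypothetical, and the conserved ratio is passed as an integer percentage so that the ceiling can be computed with integer arithmetic only.

```python
def min_species_required(cluster_size: int, conserved_ratio_pct: int) -> int:
    """Smallest number of species that satisfies the conserved ratio,
    i.e. ceil(cluster_size * ratio / 100) using integer arithmetic."""
    return -(-cluster_size * conserved_ratio_pct // 100)


def is_conserved(species_with_motif, cluster_species,
                 conserved_ratio_pct: int) -> bool:
    """True if enough species of the cluster hold the motif in the same
    genetic region of an orthologous gene."""
    cluster = set(cluster_species)
    hits = set(species_with_motif) & cluster
    return len(hits) >= min_species_required(len(cluster), conserved_ratio_pct)


if __name__ == "__main__":
    mammals = ["human", "gorilla", "macaque", "mouse", "cow", "dog"]
    holders = ["human", "gorilla", "mouse", "cow", "dog"]
    print(min_species_required(6, 80))         # 5, as in the 6 x 80% example
    print(is_conserved(holders, mammals, 80))  # True: 5 of 6 species
    print(is_conserved(holders, mammals, 100)) # False: macaque is missing
```

Integer ceiling division avoids floating-point artifacts that can arise when a product such as cluster size times ratio is rounded up.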

Through cross-species comparison between two clustered groups, retrieving conserved or exclusive SSR motifs can help biologists choose significant biomarkers from a previously defined gene set before performing biological experiments. Moreover, exclusive or common SSR motifs between two different species clusters might be regarded as important genetic markers that provide evidence of biological evolution and functional conservation.

2.5. SSR Tag Cloud Visualization. The tag cloud visualization technique provides a keyword representation of text data by showing each tag in various font sizes and colors. To emphasize the importance of conserved and exclusive SSR motifs extracted from a set of specified homologous genes between two different species clusters, we adopted the tag cloud representation to display the identified SSR motifs according to weighting coefficients calculated from the query gene sets. In an SSR tag cloud, the tag size of each SSR motif indicates not only the conservation status of the motif among orthologous genes but also its representativeness among different species clusters. A linear accumulation formula and normalization procedures were used to determine the SSR weighting coefficients for tag size selection. This formula simply counts the number of occurrences of each SSR motif found in each individual gene in the different species clusters. According to this definition of occurrence rate, if an identified SSR motif is well conserved in two different species clusters or highly represented in the specified gene set, the SSR tag is assigned a larger weighting coefficient. Accordingly, the SSR tag is displayed with a bigger font size in the tag cloud.
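The tag size selection can be sketched as follows. The text specifies only a linear accumulation of occurrences followed by normalization, so the details here are assumptions for illustration: weights are normalized by the largest count, and the font range of 12 to 36 points as well as the function names are hypothetical.

```python
from collections import Counter


def tag_weights(occurrences) -> dict:
    """Linear accumulation: count how often each SSR tag occurs across the
    query genes and species, then normalize by the largest count."""
    counts = Counter(occurrences)
    if not counts:
        return {}
    top = max(counts.values())
    return {tag: n / top for tag, n in counts.items()}


def font_sizes(weights: dict, min_pt: int = 12, max_pt: int = 36) -> dict:
    """Map normalized weights in (0, 1] linearly onto a font size range."""
    return {
        tag: round(min_pt + w * (max_pt - min_pt)) for tag, w in weights.items()
    }


if __name__ == "__main__":
    # One entry per (gene, species) hit of a tag; illustrative data only.
    hits = ["AG/Coding"] * 4 + ["C/Upstream"] * 2 + ["A/Downstream"]
    print(font_sizes(tag_weights(hits)))
```

With this mapping, the most frequent tag always receives the largest font, and the relative sizes of the other tags follow their occurrence counts.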

To visually emphasize identified SSR motifs belonging to different species clusters, we applied different colors to SSR tags to distinguish the conserved and/or exclusive features of SSR biomarkers between two species clusters. In this study, red tags represent consensus SSR motifs found in the first species cluster only that satisfy the conservation threshold in that cluster; pink tags represent consensus SSR motifs found in the first species cluster only that do not satisfy the conservation threshold; dark green tags represent consensus SSR motifs found in the second species cluster only that satisfy the conservation criterion in that cluster; light green tags denote consensus SSR motifs found in the second species cluster only that do not satisfy the conservation threshold; blue tags represent SSR patterns that are well conserved in both species clusters and satisfy the species conservation percentage; and yellow tags show SSR patterns that are found in both species clusters for the query gene set but do not satisfy the species conservation criterion. The color-coded information of a resulting tag cloud is shown in Figure 1, and the corresponding attributes are described in Table 1. The abbreviation CR denotes the “conserved ratio” percentage of the corresponding species clusters for each simulation.

Figure 1: Color-coded chart for tag cloud representation of identified SSR motifs between two species clusters (Cluster I and Cluster II) and the criterion of conserved ratio. *CR: conserved ratio percentage.

Table 1: Relationship between colors, species clusters, and conserved ratios of detected SSR motifs.

Color         Species cluster   Conserved motif ratio
Red           I                 ≥CR
Pink          I                 <CR
Blue          I and II          ≥CR
Yellow        I and II          <CR
Dark green    II                ≥CR
Light green   II                <CR
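The color rules of Table 1 can be sketched as a small Python function. One point is an assumption rather than something the text states explicitly: a tag found in both clusters is colored blue only when both clusters meet the conserved ratio, and yellow otherwise. The function names and the use of integer species counts are hypothetical.

```python
from typing import Optional


def meets_cr(count: int, cluster_size: int, cr_pct: int) -> bool:
    """True if count / cluster_size is at least cr_pct percent."""
    return count * 100 >= cr_pct * cluster_size


def tag_color(count_i: int, count_ii: int, size_i: int, size_ii: int,
              cr_pct: int) -> Optional[str]:
    """Color of an SSR tag given how many species hold it in Cluster I and
    Cluster II. Returns None if no species holds the motif."""
    in_i, in_ii = count_i > 0, count_ii > 0
    ok_i = in_i and meets_cr(count_i, size_i, cr_pct)
    ok_ii = in_ii and meets_cr(count_ii, size_ii, cr_pct)
    if in_i and in_ii:
        # Assumption: blue requires the threshold to hold in both clusters.
        return "blue" if (ok_i and ok_ii) else "yellow"
    if in_i:
        return "red" if ok_i else "pink"
    if in_ii:
        return "dark green" if ok_ii else "light green"
    return None


if __name__ == "__main__":
    # A motif held by 5 of 6 Cluster I species and by no Cluster II species.
    for cr in (60, 80, 100):
        print(cr, tag_color(5, 0, 6, 6, cr))
```

Running the loop shows how the same tag can switch between red and pink as the conserved ratio threshold is raised.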

In the developed system, users can also identify imperfect SSR biomarkers by setting different tolerance levels, and the number of retrieved imperfect SSR motifs varies with these settings. Higher noise rates admit more tolerant repeat patterns and therefore yield a larger number of possible SSR motifs. Accordingly, the corresponding tag clouds can be depicted as multiscale representations under various noise threshold settings. In other words, tag clouds at different scales are composed of SSR motifs of different tolerance qualities. For instance, in the highest quality SSR tag cloud, all identified conserved SSR motifs have perfect repeating patterns among the different genes and species groups. In contrast, lower quality SSR tag clouds contain more tolerant SSR motifs, which may reflect evolutionary status due to gene speciation and/or duplication events in either distant or close species. Multiscale tag clouds give biologists an easier way to compare and select suitable SSR candidate motifs as biomarkers through a progressive approach over different tolerance levels, which can be applied in various situations for the further design of biological experiments.


Figure 2: Interface of the SSR tag cloud web system (http://ssrtc.cs.ntou.edu.tw).

3. Results

3.1. SSR Tag Cloud Web System. In this study, we developed an online web system (http://ssrtc.cs.ntou.edu.tw) for identifying conserved and exclusive SSR biomarkers through cross-species cluster comparison. The main interface of the developed web system is shown in Figure 2. To discover significant SSR biomarker candidates from an automatically generated SSR tag cloud, a user is required to provide gene name(s) or keyword(s) of gene function and can simply apply the default parameters for system prediction. In other words, a set of query genes is defined in the first step by providing relevant Ensembl gene IDs, GO terms, or keywords. The threshold settings of the SSR feature parameters can also be assigned manually instead of using the default settings; these parameters are genetic region, length of basic pattern, minimum length of SSR motif, SSR quality, species cluster, and SSR motif conserved ratio. The genetic region and length of basic pattern are used to distinguish fundamental features of SSR motifs under cross-species cluster comparison. The minimum SSR length defines the minimal length for identification of SSR motifs. The SSR quality factor represents a tolerance threshold for allowing imperfect SSRs as candidate biomarkers. The developed system initially provides three settings for efficient identification: 1.0 for perfect SSRs, and 0.8 and 0.9 for imperfect SSRs with 20% and 10% tolerance, respectively, within an identified SSR motif. The species cluster assignment function supports cross-species comparison by classifying the species of interest into two clusters. The motif conserved ratio parameter is the percentage of qualified species within a cluster that possess the conserved SSR motif within a target gene. Two operation modes were designed for the motif conserved ratio. If a user chooses the condition “larger than or equal to” the motif conserved ratio, the system displays the resulting SSR tag cloud in 6 colors; otherwise, the SSR tag cloud appears in 3 colors only. The color modes of an SSR tag cloud are defined in the previous section. Once all parameters and operation modes are defined, the system performs SSR biomarker evaluation automatically and generates a final SSR tag cloud for visualization. The font color of each SSR tag is mainly decided by the motif conserved ratio parameter, and the font size depends only on the occurrence frequency of the SSR element. When users move the mouse over any SSR item within the resulting tag cloud, the total number of appearances and the conserved ratio of the selected SSR motif in the target genes of the assigned species cluster are displayed. Detailed information on each SSR tag is also available in a floating dialog box by clicking on the tag; it includes the Ensembl gene ID, the transcript ID of the specified gene possessing the target SSR motif, the species name, the genome coordinates, and the DNA sequence contents. Additionally, if an SSR appears within coding regions, its corresponding protein sequences can be retrieved from the Ensembl database and shown in an additional window.

Figure 3: An SSR tag cloud example for ENSG00000069329 (VPS35) between two 6-species clusters.

3.2. SSR Biomarkers for Orthologous Genes. To demonstrate system performance, we selected all orthologous genes from the twelve vertebrate model species (that is, excluding fruit fly and roundworm). All selected genes individually possess sequence identities higher than 80% compared to the human genome. Under this criterion, a total of 162 orthologous genes were selected for the first test case. When the twelve vertebrate species were classified into the mammal and fishery species clusters for comparison, the conserved and exclusive SSR motifs for each gene could be successfully identified; the significant SSR biomarker candidates for each individual gene are included in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/678971. Here, we illustrate only two genes, ENSG00000069329 and ENSG00000108883, as examples, and all conserved SSR motifs were carefully verified within all orthologous genes from the twelve model species.

3.2.1. Case Study of ENSG00000069329 (VPS35). The gene with Ensembl gene ID ENSG00000069329 is a vacuolar protein sorting gene (VPS35), which possesses an average sequence identity of 80% in pairwise alignments between human and the other eleven model species. The resulting SSR tag cloud for VPS35 is shown in Figure 3, obtained by setting an SSR quality of 80%, a minimum SSR length of 20 nucleotides, and a motif conserved ratio of 60% (i.e., at least 4 species in each species cluster are required to possess identical SSR motifs). The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla, and the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region parameter was set to search all regions except introns, and the length of basic pattern was selected from 1 to 6 nucleotides for comprehensive representation.

According to the SSR color codes in Figure 1, users can quickly observe that only three coconserved SSR motifs, “C/Upstream”, “AG/Upstream”, and “A/Downstream”, shown in yellow, were found between the two species clusters. There is no blue-coded SSR tag in this experiment, which implies that no coconserved SSR motif exists in at least 4 model species of each species cluster simultaneously. The three yellow-coded SSR tags appear in both species clusters but are not well conserved with respect to the assigned conserved ratio. The dark green SSR tag “AG/Coding” represents a consensus SSR motif found only in the second cluster of fishery species, with at least 4 fishery species containing the motif in the coding region; this motif pattern was not found in the coding region of any mammal species in the first cluster. The light green SSR tags represent consensus SSR motifs that were found only in the fishery group but do not satisfy the motif conserved ratio requirement of 60%; that is, these light green SSR patterns were found in fewer than 4 fishery species. Similarly, the pink-coded SSR tags represent consensus SSR motifs found exclusively in the mammal species cluster in fewer than 4 mammal species. In addition, the dark green SSR tag “AG/Coding” has the biggest font size, which implies that this SSR is the most representative and exclusive feature of the fishery species compared to the mammal species.

3.2.2. Case Study of ENSG00000108883 (EFTUD2). The gene with Ensembl gene ID ENSG00000108883 encodes an elongation factor Tu GTP binding domain protein (EFTUD2), which possesses an average sequence identity of 80% in individual pairwise alignments between human and the other 11 model species. The resulting SSR tag cloud for EFTUD2 is shown in Figure 4, obtained with exactly the same parameters as in the previous example. From the resulting tag cloud, users can immediately identify that only one coconserved SSR tag, “ATC/Coding”, was found as a notable biomarker between the two species clusters; it is well conserved across at least 4 species in each species cluster and is therefore shown in blue. Furthermore, one red-coded SSR tag, “A/Downstream”, represents a consensus SSR motif found only in the first (mammal) species cluster, with at least 4 species containing the motif in that region; this motif could not be found in any fishery species. The pink SSR tags represent conserved SSR motifs found only in the mammal group that do not satisfy the motif conserved ratio requirement. Similarly, the light green SSR tags represent consensus SSR motifs found exclusively in the fishery species cluster in fewer than 4 fishery species. In addition, the red SSR tag “A/Downstream” is shown with the biggest font size, which implies that this SSR is the most representative and exclusive for mammal species compared to all other SSR candidates.

Figure 4: An SSR tag cloud example for ENSG00000108883 (EFTUD2) between two 6-species clusters.

Interestingly, the first gene, VPS35 (ENSG00000069329), is associated with Parkinson’s disease (PD) [19], and the second gene, EFTUD2 (ENSG00000108883), causes mandibulofacial dysostosis with microcephaly [20]. In both cases, scientists have so far demonstrated only that the diseases are caused by certain gene mutations. Through in silico SSR biomarker detection with the proposed system, we could efficiently identify many conserved and exclusive SSRs between the two species groups as biomarker candidates. However, without experimental verification, we cannot be sure whether either disease has a true correlation with the identified SSR motifs. To gain more confidence in the proposed system, we tested it on disease genes known to be associated with specific SSR biomarkers. If a genetic disease is indeed caused by abnormal distributions of SSR motifs, we expect the proposed SSR tag cloud representation system to identify those significant SSR biomarkers efficiently and effectively.

3.3. Case Study of a Set of Skeletal Development Genes. To demonstrate functionally related SSR motifs, we selected a gene set with the specific function of skeletal development. A total of 17 genes associated with this function were selected: HOXA11, ZIC2, ALX4, HOXA2, DLX2, HOXA7, TWIST1, HOXC13, RUNX2, SOX9, HOXD11, HOXD13, GDF11, HLX, SIX3, HOXD8, and HOXA10 [21]. In this example, we show that detailed information on each SSR tag is available in a floating dialog by clicking on the tag, and that the number of appearances and the conserved ratio of a selected SSR motif in the target genes can be viewed by moving the mouse cursor over the SSR tag.

The resulting SSR tag clouds from different combinations of settings for the 17 skeletal development related genes are shown in Figure 5. In Figure 5(a), the parameter settings were defined as follows: SSR quality of 90% (i.e., up to 10% noise tolerated), minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla; the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region filter was set to coding regions only, and the length of basic pattern was selected from 1 to 6 nucleotides for comprehensive representation. With these settings, the results are shown in Figure 5: the red-coded SSR tag “CCG/Coding” represents the only exclusive SSR motif that is well conserved in mammal species. This tag could be found in at least 5 species within the mammal group, and it is highly correlated with the skeletal development related genes. When users move the mouse over the tag “CCG/Coding”, the number of appearances and the conserved ratio of the selected SSR motif are shown in a pop-up. In Figure 5(b), the “CCG/Coding” motif appears in the mammal species cluster a total of 62 times with a conserved ratio of 100%, while no such SSR motif could be discovered in the skeletal development gene set within the fishery species cluster. If a user clicks on the tag “CCG/Coding”, detailed information on the SSR tag is shown in a floating dialog with the Ensembl gene ID, transcript ID, species name, genome coordinates, and DNA sequence contents. In particular, if the SSRs appear within coding regions, the table also provides the cDNA sequence and its corresponding translated protein sequence. In Figure 5(c), the CCG repeated pattern in the last row, from the human ENSG00000135414 (GDF11) gene, is located on chromosome 12 at coordinates 56137185 to 56137224. Since the CCG repeated pattern was found in coding regions, the table also provides the DNA, cDNA, and corresponding protein sequence contents. In the RUNX2 gene, this repeated pattern in the coding region encodes a polyalanine peptide (GCC repeat in the coding region), and it indeed plays a crucial role in cellular development. Abnormal distribution of this polyalanine repeat biomarker might cause dysplasia disease, a genetic disorder of abnormal cellular development.

Figure 5: (a) SSR tag cloud for 17 skeletal development related genes constrained to coding regions; (b) results of moving the mouse over the SSR tag “CCG/Coding”; (c) detailed information on the SSR tag “CCG/Coding” in a floating dialog; (d) an SSR tag cloud for 17 skeletal development related genes showing only SSRs with high conserved ratios.

In Figure 5(d), most parameters were set identically to those of Figure 5(a), except that the display parameter was modified to show only highly conserved SSRs instead of all identified SSRs. In other words, tags with pink, light green, and yellow color codes are hidden. The corresponding tag cloud shows that only one red-coded tag, “CCG/Coding”, exists under such high conservation requirements. Again, the SSR motif “CCG/Coding” appears as a significant biomarker in mammal species that is highly correlated with the skeletal development related genes.

3.4. Case Study of the Gene Ontology Term “Embryonic Cranial Skeleton Morphogenesis”. To demonstrate functionally related SSR motifs through GO term assignment, we selected the GO term “embryonic cranial skeleton morphogenesis”. The related genes annotated by this GO term include TBX15, SIX4, DLX2, PRRX1, TWIST1, BMP4, SIX1, SMAD2, NIPBL, NODAL, WNT9B, TGFBR2, GAS1, SIX2, FOXC2, SMAD3, TBX1, TGFBR2, TBX15, GNAS, PRRX2, TGFBR1, TFAP2A, SMAD2, SETD2, BMP4, SMAD3, TWIST2, TFAP2A, SMAD3, TGFBR1, and BMP4. To compare the results under various settings, we tried several combinations of input parameters that differ from the system default settings. In this case study, the parameter settings were defined as follows: SSR quality of 90% (i.e., up to 10% noise tolerated), minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group and the second species cluster as the fishery group, as in the default settings. The genetic region filter was set to coding regions only, and the length of basic pattern was selected from 1 to 6 nucleotides. With these settings, the results are shown in Figure 6(a). There is only one red-coded SSR tag, “CCG/Coding”, which is the unique biomarker conserved in mammal species with respect to the embryonic cranial skeleton morphogenesis related genes.

Figure 6: (a) SSR tag cloud for the GO keyword “embryonic cranial skeleton morphogenesis” with motif conserved ratio of 80%; (b) motif conserved ratio of 60%; (c) motif conserved ratio of 100%.

We then lowered the motif conserved ratio to 60%, and the resulting SSR tag cloud is shown in Figure 6(b). Several tags changed their coded colors. Taking the red-coded tags as an example, there was only one red tag, “CCG/Coding”, in Figure 6(a), whereas in Figure 6(b) the red-coded SSR tags gained another tag, “AATCTG/Coding”, which was originally shown in pink in Figure 6(a). Conversely, when we increased the motif conserved ratio to 100%, the result shown in Figure 6(c) contained no red-coded SSR tag. Compared to Figure 6(a), the original red tag “CCG/Coding” changed to pink because only 5 of the 6 species in the mammal group hold the tag “CCG/Coding”. Figures 6(b) and 6(c) show that color-coded tags may switch colors when the motif conserved ratio is adjusted. A higher motif conserved ratio reduces the number of red, green, and blue color-coded tags.

3.5. An Example of the Genetic Disease “Huntington’s Disease (HD)”. To demonstrate genetic diseases caused by abnormal distribution of SSR motifs, we selected a well-known neurodegenerative genetic disease, Huntington’s disease (HD), as an example. HD is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the ENSG00000197386 (HTT) gene on chromosome 4 [22]. It presents with involuntary movements caused by the loss of muscle coordination and leads to psychiatric problems. The nucleotide repeat length and the average age of symptom onset in Huntington’s disease are inversely related [23].

The verification results of the SSR tag cloud are shown in Figure 7, and the parameter settings were defined as follows: SSR quality of 100% and 80%, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and the “show all SSRs” option selected. The first species cluster was assigned as the mammal group and the second species cluster as the fishery group. In Figure 7, “AGC/Coding” can be observed in both tag clouds as an important biomarker. According to the shifting transformation of SSR repeat patterns, the “AGC” repeat unit can theoretically be considered the same pattern as “CAG” for efficient identification. However, SSRs located within coding regions are further translated into their corresponding amino acid sequences according to precise loci verification on exon regions. Frame-shifted SSRs in coding regions might result in different coded amino acids. For example, the trinucleotide pattern “AGC” codes for serine (S), whereas “CAG” codes for glutamine (Q). Therefore, SSRs identified in coding regions should be carefully treated and translated into the appropriate protein sequence based on the annotated genome database. In this example, a significant SSR motif, “AGC/Coding”, in HTT genes could be identified with different sizes (occurrence rates) under various SSR quality settings. With a minimum length requirement of 20 nucleotides, this repeat motif appears in the coding regions of most mammal species, the exception being macaque. Among the fishery species, only zebrafish possesses a similar repeat motif in the coding region. When the SSR quality parameter was increased to 100% (without any tolerance), the pattern “AGC/Coding” (equivalent to “CAG/Coding” on the DNA sense strand) could be retrieved only from cattle and human among the mammal species. The font size and color of each SSR tag changed gradually with the different tolerance rate settings. Accordingly, the tag “AGC/Coding” appeared with the biggest icon in pink when compared to all other SSRs in coding regions, which reflects the significance of its exclusive features for mammal species compared to fishery species. These observations might also provide important information for biologists selecting animal species in future experimental studies of specific diseases.

Figure 7: (a) SSR tag cloud for the HTT gene with SSR quality of 80% and motif conserved ratio of 80%; 5 organisms hold the conserved SSR tag “AGC/Coding”. (b) SSR quality of 100% and motif conserved ratio of 80%; only two species, human and cattle, hold the perfect SSR tag “AGC/Coding”. Panel labels: (a) Mammalian: Bos taurus, Canis familiaris, Mus musculus, Homo sapiens, Gorilla gorilla; Fishery: Danio rerio. (b) Mammalian: Bos taurus, Homo sapiens; Fishery: none.

Figure 8: (a) Relationship between the motif conserved ratio parameter and the amount of SSR tags in different colors; (b) relationship between the SSR quality parameter and SSR tag colors.
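The frame dependence described for the HTT example can be sketched in Python. The same CAG repeat tract yields different amino acid runs depending on the reading frame, which is why the rotation equivalence of repeat units cannot be applied blindly in coding regions. Only the three codons needed here are included (from the standard genetic code), and the function name is hypothetical.

```python
# Subset of the standard genetic code covering the rotations of "CAG".
CODONS = {"CAG": "Q", "AGC": "S", "GCA": "A"}


def translate_repeat(dna: str, frame: int) -> str:
    """Translate `dna` starting at offset `frame` (0, 1, or 2); codons
    outside the small table above are reported as 'X'."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
    return "".join(CODONS.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    tract = "CAG" * 7  # a perfect 21-nt repeat, above the 20-nt minimum
    for frame in range(3):
        print(frame, translate_repeat(tract, frame))
```

In frame 0 the tract reads as polyglutamine, while the shifted frames read as polyserine and polyalanine, so the annotated reading frame has to be consulted before an SSR in a coding region is interpreted.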

4. Discussion

Two key parameters affect the color and size distribution within an SSR tag cloud. The first is the motif conserved ratio (CR): different CR values change the colors of SSR tags. When the motif conserved ratio increases, the number of red, dark green, and blue tags may decrease. In Figure 8(a), Cluster I represents the first species cluster and Cluster II the second. The horizontal line in the figure represents the motif conserved ratio value. When the CR threshold is increased, the areas of red, blue, and dark green decrease; conversely, when the CR threshold is decreased, these areas increase. Each area is proportional to the number of SSR tags.

The second important parameter for a tag cloud is the SSR quality threshold. As shown in Figure 8(b), different SSR quality values change not only the number of SSR tags but also their colors. Increasing the SSR quality value may reduce the number of SSR tags, since SSRs of higher quality are always a subset of SSRs of lower quality. When the quality threshold is decreased to gain more SSR candidates, some red and green tags might change into yellow or blue tags, respectively. This is mainly caused by the newly intersecting region that arises after the SSR candidates are expanded.

In addition, a few common SSR tags originally coded in yellow might turn red or green when the quality factor is increased. This occurs mainly because the total number of species possessing a certain SSR tag decreases, so that an SSR motif conserved between the two species clusters may become a representative SSR tag for one species cluster exclusively. Table 2 lists the total number of SSR motifs for each species at a minimum SSR length of 20 nucleotides. The SSR counts for mammal species are usually higher than those for fishery species, and increasing the SSR quality value generally reduces the number of SSR motifs in each species.
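The interplay between the quality threshold and the shared or exclusive status of a tag can be sketched in a few lines of Python. The records below are hypothetical and only serve to show the nesting of tag sets; the field layout (tag, cluster, quality) and the function names are assumptions made for illustration.

```python
# Hypothetical records: ((repeat unit, region), cluster, best quality observed).
RECORDS = [
    (("AGC", "Coding"), "I", 1.0),
    (("AGC", "Coding"), "II", 0.85),
    (("A", "Downstream"), "I", 0.9),
    (("AT", "Upstream"), "II", 1.0),
]

def tags_by_cluster(records, min_quality):
    """Collect the tags of each cluster whose quality meets the threshold."""
    tags = {"I": set(), "II": set()}
    for tag, cluster, quality in records:
        if quality >= min_quality:
            tags[cluster].add(tag)
    return tags

def shared_and_exclusive(tags):
    """Split the tags into those shared by both clusters and those exclusive to one."""
    shared = tags["I"] & tags["II"]
    return {"shared": shared, "only_I": tags["I"] - shared, "only_II": tags["II"] - shared}

strict = tags_by_cluster(RECORDS, 1.0)
loose = tags_by_cluster(RECORDS, 0.8)
print(shared_and_exclusive(strict))
print(shared_and_exclusive(loose))
```

In this toy data set the AGC coding tag is exclusive to Cluster I at a quality of 1.0 and becomes a shared tag at 0.8, which mirrors the color transitions described above.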

5. Conclusion

SSRs are nonrandomly distributed nucleotides in genomes with repeating basic patterns of 1 to 6 nucleotides in length, and a large number of functional SSR motifs have been demonstrated to be important biomarkers involved in various biological processes and gene regulation. Because of the abundance of SSRs in each species' genome, it is difficult to recognize significant SSR biomarkers or gene-regulation-related SSRs based mainly on the repeat sequence length, genetic location, and fundamental repeat pattern of an SSR motif. In this paper, we proposed the concept of identifying SSR biomarker candidates through cross-species cluster comparison on a specified set of target genes. The developed system provides an online tool with multiparameter selection functions, and the identified SSR motifs are displayed by a tag cloud visualization method. The exclusive and consensus SSR motifs between two species clusters are shown efficiently in different font colors and sizes. The in silico comparison of SSR motifs across different species clusters may provide clues and evidence for further understanding of evolutionary development and functional associations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Table 2: The number of SSR motifs of each species for various SSR quality settings.

Scientific name          Species name  SSR quality 80%  SSR quality 90%  SSR quality 100%
Danio rerio              Zebrafish     1175832          594741           401503
Gasterosteus aculeatus   Stickleback   160413           87343            51779
Oryzias latipes          Medaka        122505           37730            15460
Takifugu rubripes        Fugu          261612           148043           90753
Tetraodon nigroviridis   Tetraodon     119557           69473            43584
Gadus morhua             Cod           359592           209540           123880
Homo sapiens             Human         3023284          1406186          644338
Gorilla gorilla          Gorilla       757571           344973           152403
Macaca mulatta           Macaque       1075737          526515           225403
Mus musculus             Mouse         2463222          1301019          812873
Bos taurus               Cow           323386           132923           44906
Canis familiaris         Dog           715776           340433           152502
Caenorhabditis elegans   Roundworm     59273            13637            4225
Drosophila melanogaster  Fruit fly     199458           79952            21223
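As a quick check on the trend reported in Table 2, the short Python sketch below takes three rows copied from the table and confirms that the counts fall as the quality requirement rises. The helper names are illustrative assumptions; the ratio is a plain ratio of counts and makes no claim about how individual motifs map between quality levels.

```python
# SSR motif counts copied from three rows of Table 2 (quality 80%, 90%, 100%).
SSR_COUNTS = {
    "Zebrafish": (1175832, 594741, 401503),
    "Human": (3023284, 1406186, 644338),
    "Roundworm": (59273, 13637, 4225),
}

def strictly_decreasing(counts):
    """True when every count is smaller than the one at the looser quality setting."""
    return all(a > b for a, b in zip(counts, counts[1:]))

def count_ratio(counts):
    """Ratio of the quality-100% count to the quality-80% count."""
    loosest, _, strictest = counts
    return strictest / loosest

for species, counts in SSR_COUNTS.items():
    print(f"{species}: decreasing={strictly_decreasing(counts)}, "
          f"ratio 100%/80% = {count_ratio(counts):.2f}")
```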

Acknowledgments

This work is supported by the Center of Excellence for the Oceans from National Taiwan Ocean University and the National Science Council, Taiwan (NSC 102-2321-B-019-001 and NSC 101-2627-B-019-003 to T.-W. Pai), and the Department of Health in Taiwan (DOH102-TD-B-111-004 to H.-T. Chang).

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, “The evolutionary dynamics of repetitive DNA in eukaryotes,” Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, “Microsatellites within genes: structure, function, and evolution,” Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, “Microsatellite repeat instability and neurological disease,” Bioessays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, “Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review,” Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos et al., “Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia,” Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, “Glutamine repeats and neurodegeneration,” Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, “Role of insulin/insulin-like growth factor 1 signaling pathway in longevity,” World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, “Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature,” Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase et al., “A single IGF1 allele is a major determinant of small size in dogs,” Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, “Comparison of tag cloud layouts: task-related performance and visual exploration,” in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, “Automated gene ontology annotation for anonymous sequence data,” Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, “iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website,” BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, “INTERFEROME: the database of interferon regulated genes,” Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, “Revigo summarizes and visualizes long lists of gene ontology terms,” PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan et al., “An overview of Ensembl,” Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, “Efficient algorithms for identifying orthologous simple sequence repeats of disease genes,” Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes et al., “Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies,” Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal et al., “A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset parkinson disease,” The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith et al., “Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype,” The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.


[21] M. A. Lines, L. Huang, J. Schwartzentruber et al., “Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly,” The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, “Molecular origins of rapid and continuous morphological evolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao et al., “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes,” Cell, vol. 72, no. 6, pp. 971–983, 1993.


model species are zebrafish (Danio rerio), stickleback (Gasterosteus aculeatus), medaka (Oryzias latipes), fugu (Takifugu rubripes), tetraodon (Tetraodon nigroviridis), and cod (Gadus morhua) as fishery species; human (Homo sapiens), gorilla (Gorilla gorilla), macaque (Macaca mulatta), mouse (Mus musculus), cow (Bos taurus), and dog (Canis familiaris) as mammal species; and roundworm (Caenorhabditis elegans) and fruit fly (Drosophila melanogaster) as two additional popular experimental species. The downloaded data include sequence contents; coordinates of exons/introns and UTRs for each gene; and upstream and downstream regions with a length of 2000 nucleotides.

2.3. SSR Motif Database Construction. To accelerate the search for all perfect and imperfect orthologous SSRs from a set of specified genes among the fourteen species, we ran an autocorrelation-based SSR discovery algorithm and constructed the SSR motif database in advance [17]. The autocorrelation algorithm can efficiently and effectively extract all candidate perfect/imperfect SSR motifs under different threshold parameters. In this study, we only considered SSR motifs longer than 20 nucleotides whose fundamental repeat unit ranges from 1 to 6 nucleotides in length. The SSR motif-searching algorithm also applies a proportional quality factor to define SSR patterns with different degrees of noise. Three tolerance settings were tentatively applied to account for noisy patterns within the multiscale SSR tag clouds presented later. The tolerance parameters were initially set to 0, 0.1, and 0.2, representing 0%, 10%, and 20% noisy content within an SSR motif. The percentage of noise is defined as the ratio of nonrepeated nucleotides to the total length of an identified SSR, where the noise types include insertion, deletion, and substitution mutations. In other words, a noise rate of zero percent represents a perfect repeat segment without any tolerance. The formula for the tolerant percentage is shown in the following equation:

$$\text{Tolerant}\,(\%) = \frac{\text{nonrepeated nucleotides}}{\text{identified SSR length}} \times 100. \tag{1}$$
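Equation (1) can be exercised on a small example. The sketch below is a simplification under a stated assumption: it counts only substitution noise by comparing a segment with the perfect periodic extension of its repeat unit, and it ignores insertions and deletions, which the autocorrelation algorithm of [17] does handle. The function names are hypothetical.

```python
def tolerant_percent(nonrepeated, ssr_length):
    """Equation (1): share of nonrepeated nucleotides in an identified SSR, in percent."""
    if ssr_length <= 0:
        raise ValueError("SSR length must be positive")
    return nonrepeated / ssr_length * 100

def substitution_noise(segment, unit):
    """Count positions that differ from the perfect periodic extension of the unit.

    Assumption: substitutions only; insertions and deletions are not modeled here.
    """
    return sum(1 for i, base in enumerate(segment) if base != unit[i % len(unit)])

# A 21-nucleotide CAG tract with a single G-to-A substitution in the fourth unit.
segment = "CAG" * 3 + "CAA" + "CAG" * 3
noise = substitution_noise(segment, "CAG")
percent = tolerant_percent(noise, len(segment))
print(noise, round(percent, 2))
print("passes 10% tolerance:", percent <= 10)
```

With one mismatch in 21 nucleotides the tolerant percentage is below 10%, so this segment would be accepted at the 0.1 and 0.2 tolerance settings but rejected when no tolerance is allowed.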

An SSR motif can be located in six different genetic regions of a specified gene: coding, intron, 5′ UTR (untranslated region), 3′ UTR, upstream, and downstream regions. In this system, the upstream and downstream regions are defined as an extended range of 2000 nucleotides from the start and end positions of transcription. In addition, because of the shifting mechanism of repeating segments and the complementary base-paired nature of DNA double-stranded helical structures, several possible combinations of SSR patterns can be considered an identical SSR motif within a genetic locus. For example, any rotation of a basic repeat pattern is considered the same SSR element: a “TA” repeat is equivalent to an “AT” repeat through a one-nucleotide shift. An identical SSR motif can also take a different appearance through complementary base pairing and reverse reading of the DNA sequence. For example, the repeat pattern “AGC” appears as “GCT” on the complementary strand of DNA. Therefore, when all possible SSR patterns in all DNA sequences are enumerated under these two constraints, there are exactly 501 fundamental basic SSR patterns from 1 to 6 nucleotides in length [18]. However, one special condition must be considered carefully when an SSR motif occurs in a coding region. Since translation converts an mRNA sequence into a string of amino acids through the codon table, the equivalence due to the shifting mechanism and the complementary strand should be limited. Here, we provide the true translated protein sequences from the locations of identified SSR motifs, and the in-frame information is clearly annotated when orthologous repeat motifs are found in coding regions. Finally, to distinguish different SSR patterns across extensive genomic resources, the system defines an identifier for an SSR motif from its basic pattern together with its genetic location within the specified gene. For example, “AG/Coding” in Ensembl gene ID “ENSG00000069329” represents the repeated pattern “AG” appearing within the coding region of “ENSG00000069329.” Through prerunning processes under various parameter settings, which identify all possible SSR motifs in accordance with the detailed coordinates and annotation in the Ensembl database, we constructed a comprehensive SSR motif database for all genes from any specified species. The SSR motifs identified from each gene are treated as “tag” items for the subsequent cross-species comparison, and all SSR tags retrieved from the input gene set are further compared based on occurrence rates and used to construct a multiscale tag cloud representation.
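The two equivalence rules (rotation and reverse complement) can be checked by brute force. The following Python sketch picks the alphabetically smallest member of each equivalence class as a representative and counts the classes for repeat units of 1 to 6 nucleotides, skipping units that are themselves repeats of a shorter unit. The choice of the smallest string as the canonical form and all function names are assumptions made for illustration; the source does not state how the system normalizes units.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(unit):
    """Return the unit as read on the complementary strand."""
    return unit.translate(COMPLEMENT)[::-1]

def rotations(unit):
    """All cyclic shifts of a repeat unit."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]

def canonical_unit(unit):
    """Smallest string among all rotations of the unit and of its reverse complement."""
    return min(rotations(unit) + rotations(reverse_complement(unit)))

def is_primitive(unit):
    """False when the unit is a repetition of a shorter unit, e.g. 'ACAC'."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))

def count_classes(max_len=6):
    """Number of distinct repeat-unit classes of length 1..max_len."""
    classes = set()
    for n in range(1, max_len + 1):
        for letters in product("ACGT", repeat=n):
            unit = "".join(letters)
            if is_primitive(unit):
                classes.add(canonical_unit(unit))
    return len(classes)

print(canonical_unit("TA"), canonical_unit("GCT"), canonical_unit("CAG"))
print(count_classes())
```

The enumeration reproduces the figure of 501 patterns quoted above, and it maps “TA” to “AT” and both “GCT” and “CAG” to “AGC”. For coding regions this normalization would have to be paired with the reading frame, as discussed in the text.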

2.4. Grouped Species and Cross-Cluster Comparison. Because a tremendous number of SSRs are nonrandomly distributed in genome sequences, it is not intuitive to observe SSR biomarkers or identify gene-regulation-related SSR motifs from an individual genome. Hence, we assume that conserved or exclusive SSR motifs provide important clues for identifying functional SSR motifs or representative biomarkers among various species. To emphasize the long-distance relationship from an evolutionary point of view, we selected two groups of model vertebrate species for orthologous SSR motif comparison. The first group represents the mammalian species, including Bos taurus, Canis familiaris, Homo sapiens, Gorilla gorilla, Macaca mulatta, and Mus musculus; the second group represents the fishery species, including Danio rerio, Gadus morhua, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis. In addition to these twelve clustered species, we also included two widely used model organisms, Drosophila melanogaster and Caenorhabditis elegans. Nevertheless, in the developed system, users can either apply the two predefined species groups or manually assign species to two clusters without any limitation. By integrating cross-species comparison techniques with overrepresentation analysis of the assigned gene sets, SSR patterns with conserved and exclusive characteristics in the selected genes can be recognized between different species clusters. An identified conserved SSR motif is initially defined as an orthologous SSR motif if its conserved ratio meets the minimum threshold in an assigned species cluster. For example, a conserved ratio of 80% means that the identified conserved SSR pattern can be found in at least 80% of the species in the assigned cluster; for a cluster of six species, at least 5 different species (6 × 80% = 4.8) must possess the orthologous gene(s) and hold the specific SSR pattern within the same genetic region among all orthologous gene(s). For many-to-many orthologous genes, an SSR motif is regarded as conserved as long as it can be detected in any one of the orthologous genes. The threshold level of the conserved ratio can be assigned by users through interactive webpage settings.
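The conserved-ratio rule, including the rounding up to whole species and the any-ortholog rule for many-to-many orthologs, can be sketched as follows. The data layout (species mapped to one tag set per orthologous gene) and the function names are assumptions for illustration, and the tag sets are hypothetical.

```python
def min_species_required(cluster_size, ratio_percent):
    """Smallest whole number of species that reaches the conserved ratio.

    Integer ceiling division avoids floating-point rounding (6 * 80% = 4.8 -> 5).
    """
    return -(-cluster_size * ratio_percent // 100)

def species_with_tag(orthologs_by_species, tag):
    """Species in which the tag occurs in at least one orthologous gene."""
    return [species for species, gene_tags in orthologs_by_species.items()
            if any(tag in tags for tags in gene_tags)]

def is_conserved(orthologs_by_species, tag, ratio_percent):
    """True when enough species of the cluster carry the tag."""
    needed = min_species_required(len(orthologs_by_species), ratio_percent)
    return len(species_with_tag(orthologs_by_species, tag)) >= needed

# Hypothetical mammal cluster; mouse has two orthologs and only one carries the tag.
TAG = ("AGC", "Coding")
MAMMALS = {
    "human": [{TAG}],
    "gorilla": [{TAG}],
    "macaque": [set()],
    "mouse": [set(), {TAG}],
    "cow": [{TAG}],
    "dog": [{TAG}],
}
print(min_species_required(6, 80), min_species_required(6, 60))
print(is_conserved(MAMMALS, TAG, 80), is_conserved(MAMMALS, TAG, 100))
```

Here the tag is found in five of six species, so it meets an 80% conserved ratio but not a 100% one.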

Through cross-species comparison between two clustered groups, retrieving conserved or exclusive SSR motifs can help biologists choose significant biomarkers from a previously defined gene set before performing biological experiments. Moreover, exclusive or common SSR motifs between two different species clusters might be regarded as important genetic markers, given the evidence of biological evolution and functional conservation.

2.5. SSR Tag Cloud Visualization. The tag cloud visualization technique provides a keyword representation of text data by showing each tag in various font sizes and colors. To emphasize the importance of conserved and exclusive SSR motifs extracted from a set of specified homologous genes between two different species clusters, we adopted the tag cloud representation to display the identified SSR motifs according to weighting coefficients calculated from the query gene sets. In an SSR tag cloud, the tag size of each SSR motif indicates not only the conservation status of the motif among orthologous genes but also its representativeness among the species clusters. A linear accumulation formula and normalization procedures were used to determine the SSR weighting coefficients for tag size selection. This formula simply counts the number of occurrences of each SSR motif found in each individual gene in the different species clusters. By this definition of occurrence rate, if an identified SSR motif is well conserved in the two species clusters or highly represented in the specified gene set, its SSR tag is assigned a larger weighting coefficient and is therefore displayed in a bigger font size in the tag cloud.
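The text does not give the exact accumulation and normalization formula, so the following Python sketch shows only one plausible reading: occurrences are counted per tag, divided by the largest count, and mapped linearly onto a font-size range. The normalization by the maximum, the 12 to 48 point range, and the function names are all assumptions.

```python
from collections import Counter

def tag_weights(occurrences):
    """Count occurrences per tag and normalize by the most frequent tag (assumed scheme)."""
    counts = Counter(occurrences)
    if not counts:
        return {}
    top = max(counts.values())
    return {tag: n / top for tag, n in counts.items()}

def font_size(weight, smallest=12, largest=48):
    """Map a weight in [0, 1] linearly onto a hypothetical font-size range."""
    return round(smallest + weight * (largest - smallest))

# Hypothetical occurrences gathered over genes and species clusters.
occurrences = ([("AG", "Coding")] * 4
               + [("A", "Downstream")] * 2
               + [("C", "Upstream")])
for tag, weight in tag_weights(occurrences).items():
    print(tag, weight, font_size(weight))
```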

To visually emphasize identified SSR motifs belonging to different species clusters, we applied different colors to SSR tags to distinguish the conserved and/or exclusive features of SSR biomarkers between the two species clusters. In this study, red tags represent consensus SSR motifs found in the first species cluster only that satisfy the conservation threshold in that cluster; pink tags represent consensus SSR motifs found in the first species cluster only that do not satisfy the conservation threshold; dark green tags represent consensus SSR motifs found in the second species cluster only that satisfy the conservation criterion in that cluster; light green tags denote consensus SSR motifs found in the second species cluster only that do not satisfy the conservation threshold; blue tags represent SSR patterns that are well conserved in both species clusters and satisfy the species conservation percentage; and yellow tags show SSR patterns that are found in both species clusters for the query gene set but do not satisfy the species conservation criterion. The color-coded information in a resulting tag cloud is shown in Figure 1, and the corresponding attributes are described in Table 1. The abbreviation CR represents the “conserved ratio” percentage of the corresponding species clusters for each simulation.

Figure 1: Color-coded chart for the tag cloud representation of identified SSR motifs between two species clusters (Cluster I and Cluster II) and the criterion of conserved ratio. *CR: conserved ratio percentage.

Table 1: Relationship between colors, species clusters, and conserved ratios of detected SSR motifs.

Color        Species cluster  Conserved motif ratio
Red          I                ≥CR
Pink         I                <CR
Blue         I and II         ≥CR
Yellow       I and II         <CR
Dark green   II               ≥CR
Light green  II               <CR
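The six-color scheme of Table 1 amounts to a small decision rule, sketched below in Python. One point is an assumption: the text does not say how a tag is colored when it is present in both clusters but meets the CR in only one of them, and this sketch treats that case as yellow. The function name and the use of per-cluster percentages as inputs are likewise illustrative.

```python
def tag_color(ratio_i, ratio_ii, cr):
    """Color of a tag from its conserved ratios (in percent) in Cluster I and Cluster II.

    Assumption: a tag present in both clusters is blue only if both ratios reach CR.
    """
    in_i, in_ii = ratio_i > 0, ratio_ii > 0
    if in_i and in_ii:
        return "blue" if ratio_i >= cr and ratio_ii >= cr else "yellow"
    if in_i:
        return "red" if ratio_i >= cr else "pink"
    if in_ii:
        return "dark green" if ratio_ii >= cr else "light green"
    return None  # the tag occurs in neither cluster

for ratios in [(100, 0), (33, 0), (83, 83), (50, 17), (0, 67), (0, 17)]:
    print(ratios, tag_color(*ratios, cr=60))
```

Raising `cr` in this sketch moves tags from red, blue, and dark green to pink, yellow, and light green, which matches the behavior described for the motif conserved ratio.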

In the developed system, users can also identify imperfect SSR biomarkers by setting different tolerance levels, and the number of imperfect SSR motifs retrieved varies with these settings. Higher noise rates tolerate more imperfect repeat patterns and yield a larger number of possible SSR motifs. Accordingly, the corresponding tag clouds can be depicted in multiscale representations under various noise threshold settings. In other words, tag clouds at different scales are composed of SSR motifs of different tolerance qualities. For instance, the highest-quality SSR tag cloud contains only conserved SSR motifs with perfect repeating patterns among the different genes and grouped species. In contrast, lower-quality SSR tag clouds contain more tolerant SSR motifs, which may reflect evolutionary status due to gene specification and/or duplication events in either distant or close species. Multiscale tag clouds give biologists an easier way to compare and select suitable SSR candidate motifs as biomarkers through a progressive approach over different tolerance levels, which can be applied in various situations for the further design of biological experiments.


Figure 2: Interface of the SSR tag cloud web system (http://ssrtc.cs.ntou.edu.tw).

3. Results

3.1. SSR Tag Cloud Web System. In this study, we have developed an online web system (http://ssrtc.cs.ntou.edu.tw) for identifying conserved and exclusive SSR biomarkers through cross-species cluster comparison. The main interface of the developed web system is shown in Figure 2. To discover significant SSR biomarker candidates from an automatically generated SSR tag cloud, a user only needs to provide gene name(s) or keyword(s) of gene function and can simply apply the default parameters for system prediction. In other words, a set of query genes is defined in the first step by providing relevant Ensembl Gene IDs, GO terms, or keywords. The threshold settings of the SSR feature parameters can also be assigned manually instead of using the defaults; these parameters are genetic region, length of basic pattern, minimum length of SSR motif, SSR quality, species cluster, and SSR motif conserved ratio. The genetic region and length of basic pattern distinguish the fundamental features of SSR motifs under cross-species cluster comparison. The minimum SSR length defines the minimal length for identification of SSR motifs. The SSR quality factor represents a tolerance threshold for allowing imperfect SSRs as candidate biomarkers. The developed system initially provides three settings for efficient identification: 1.0 for perfect SSRs, and 0.8 and 0.9 for imperfect SSRs with tolerant percentages of 20% and 10%, respectively. The species cluster assignment function supports cross-species comparison by classifying the species of interest into two clusters. The motif conserved ratio parameter is the percentage of qualified species within a cluster that possess the conserved SSR motif within a target gene. Two operation modes were designed for the motif conserved ratio. If a user chooses the condition “larger than or equal to motif conserved ratio,” the system displays the resulting SSR tag cloud in 6 colors; otherwise, the SSR tag cloud appears in 3 colors only. The color modes of an SSR tag cloud are defined in the previous section. Once all parameters and operation modes are defined, the system performs SSR biomarker evaluation automatically and generates a final SSR tag cloud for visualization. The font color of each SSR tag is mainly decided by the motif conserved ratio parameter, and the font size depends only on the occurrence frequency of the SSR element. Users can move the mouse pointer over any SSR item within the resulting tag cloud to display the total appearance number and conserved ratio of the selected SSR motif from the target genes of the assigned species cluster. Detailed information for each SSR tag is also available in a floating dialog box by clicking on the tag; it includes the Ensembl gene ID, the transcript ID of the specified gene possessing the target SSR motif, species name, genome coordinates, and DNA sequence contents. Additionally, if an SSR appears within a coding region, its corresponding protein sequence can be retrieved from the Ensembl database and shown in an additional window.

Figure 3: An SSR tag cloud example for ENSG00000069329 (VPS35) between two 6-species clusters.

3.2. SSR Biomarkers for Orthologous Genes. To demonstrate system performance, we selected all orthologous genes from the twelve vertebrate model species (excluding fruit fly and roundworm). All selected genes possess sequence identities higher than 80% compared individually with the human genome. Under this criterion, a total of 162 orthologous genes were selected for the first testing case. When these twelve vertebrate species were classified into two species clusters, mammal and fishery, for comparison, the conserved and exclusive SSR motifs for each gene could be successfully identified; the significant SSR biomarker candidates for each individual gene are included in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/678971. Here, we illustrate only two genes, ENSG00000069329 and ENSG00000108883, as examples; all conserved SSR motifs were carefully verified within all orthologous genes from the twelve model species.

3.2.1. Case Study of ENSG00000069329 (VPS35). The Ensembl gene ID ENSG00000069329 denotes a vacuolar protein sorting gene (VPS35), which possesses an average sequence identity of 80% in pairwise alignments between human and the other eleven model species. The resulting SSR tag cloud for VPS35 is shown in Figure 3, obtained with an SSR quality of 80%, a minimum SSR length of 20 nucleotides, and a motif conserved ratio of 60% (i.e., at least 4 species are required to possess identical SSR motifs in each species cluster). The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla, and the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region parameter was set to search all regions except introns, and the length of basic pattern was set to 1 to 6 nucleotides for comprehensive representation.

According to the SSR color codes in Figure 1, users can quickly observe that only three coconserved SSR motifs, “C/Upstream,” “AG/Upstream,” and “A/Downstream,” shown in yellow, were found between the two species clusters. There is no blue-coded SSR tag in this experiment, which implies that no coconserved SSR motif exists in at least 4 model species of each species cluster simultaneously. The three yellow-coded SSR tags appear in both species clusters but are not well conserved with respect to the assigned conserved ratio. The dark green SSR tag “AG/Coding” represents a consensus SSR motif found only in the second cluster of fishery species: at least 4 fishery species contain the SSR motif in a coding region, whereas the motif was not found in the coding region of any mammal species from the first cluster. The light green SSR tags represent consensus SSR motifs that were found only in the fishery group but do not satisfy the motif conserved ratio requirement of 60%; that is, these light-green-coded SSR patterns were found in fewer than 4 fishery species. Likewise, the pink-coded SSR tags represent consensus SSR motifs found exclusively in the mammal species cluster in fewer than 4 mammal species. In addition, the dark green SSR tag “AG/Coding” has the biggest font size, which implies that this SSR is the most representative and exclusive feature of fishery species compared with mammal species.

3.2.2. Case Study of ENSG00000108883 (EFTUD2). The Ensembl gene ID ENSG00000108883 denotes the elongation factor Tu GTP binding domain gene (EFTUD2), which possesses an average sequence identity of 80% in pairwise alignments between human and each of the other 11 model species. The resulting SSR tag cloud for EFTUD2 is shown in Figure 4, obtained with exactly the same parameters as in the previous example. From the resulting tag cloud, users can immediately identify that only one coconserved SSR tag, “ATC/Coding,” was found as a notable biomarker between the two species clusters, and it was well conserved across at least 4 species in each species cluster; hence, the SSR tag is shown in blue. Furthermore, one red-coded SSR tag, “A/Downstream,” represents a consensus SSR motif found only in the first (mammal) species cluster, with at least 4 species containing the SSR motif in the downstream region; this motif could not be found in any fishery species. The pink SSR tags represent conserved SSR motifs found only in the mammal group that do not satisfy the motif conserved ratio requirement. Similarly, the light-green-coded SSR tags represent consensus SSR motifs found exclusively in the fishery species cluster in fewer than 4 fishery species. In addition, the red SSR tag “A/Downstream” is shown with the biggest font size, which implies that this SSR is the most representative and exclusive for mammal species compared with all other SSR candidates.

Figure 4: An SSR tag cloud example for ENSG00000108883 (EFTUD2) between two 6-species clusters.

Interestingly, the first gene, VPS35 (ENSG00000069329), is associated with “Parkinson's disease (PD)” [19], and the second gene, EFTUD2 (ENSG00000108883), causes “mandibulofacial dysostosis with microcephaly” [20]. In both cases, scientists have so far demonstrated only that the diseases are caused by certain gene mutations. Through in silico SSR biomarker detection with the proposed system, we could efficiently identify many important conserved and exclusive SSRs between the two grouped species as biomarkers. However, without experimental verification, we cannot be sure whether either disease is truly correlated with the identified SSR motifs. To gain more confidence in the proposed system, we tested it on disease genes known to be associated with specific SSR biomarkers. If a genetic disease is indeed caused by abnormal distributions of SSR motifs, we expect the proposed SSR tag cloud representation system to identify those significant SSR biomarkers efficiently and effectively.

3.3. Case Study of a Set of Skeletal Development Genes. To demonstrate functionally related SSR motifs, we selected a gene set with the specific function of skeletal development. A total of 17 genes associated with this function were selected: HOXA11, ZIC2, ALX4, HOXA2, DLX2, HOXA7, TWIST1, HOXC13, RUNX2, SOX9, HOXD11, HOXD13, GDF11, HLX, SIX3, HOXD8, and HOXA10 [21]. In this example, we show that detailed information for each SSR tag is available in a floating dialog by clicking on the tag, and that the appearance number and conserved ratio of a selected SSR motif from the target genes can be viewed by moving the mouse cursor over the SSR tag.

The resulting SSR tag clouds from different combinations of settings for the 17 skeletal development related genes are shown in Figure 5. In Figure 5(a), the parameter settings were defined as follows: SSR quality of 90% for perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla; the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides for comprehensive representation. With these settings, the results are shown in Figure 5(a): the red-coded SSR tag “CCGCoding” represents the only exclusive SSR motif well conserved in mammal species. This tag was found in at least 5 species within the mammal group, and it is highly correlated with the skeletal development related genes. Users can move the mouse cursor over the tag “CCGCoding”, and the appearance number and conserved ratio of the selected SSR motif are shown in a pop-up. In Figure 5(b), the “CCGCoding” motif appears 62 times in the mammal species cluster with a conserved ratio of 100%, while no such SSR motif could be found in the skeletal development gene set within the fishery species cluster. If a user clicks on the tag “CCGCoding”, detailed information on the SSR tag is shown in a floating dialog, including the Ensembl gene ID, transcript ID, species name, genomic coordinates, and DNA sequence contents. In particular, if the SSRs appear within coding regions, the table also provides the cDNA sequence and its corresponding translated protein sequence. In Figure 5(c), the CCG repeat pattern in the last row, from the human ENSG00000135414 (GDF11) gene, is located on chromosome 12, at coordinates 56137185 to 56137224. Since this CCG repeat pattern was found in coding regions, the table also provides the DNA, cDNA, and corresponding protein sequence contents. In fact, this repeat pattern in the coding region of the RUNX2 gene is a polyalanine peptide (GCC repeat in coding region), and it indeed plays a crucial role in cellular development. Abnormal distribution of this polyalanine repeat biomarker might cause dysplasia disease, a genetic disorder of abnormal cellular development.

Figure 5: (a) SSR tag cloud for 17 skeletal development related genes, constrained to coding regions; (b) result of moving the mouse cursor over the SSR tag “CCGCoding”; (c) detailed information of the SSR tag “CCGCoding” in a floating dialog; (d) an SSR tag cloud for 17 skeletal development related genes showing only SSRs possessing high conserved ratios.

In Figure 5(d), most parameters were set identically to those in Figure 5(a), except that the display parameter was changed to show only highly conserved SSRs instead of all identified SSRs. In other words, tags with pink, light green, and yellow color codes are hidden. The corresponding tag cloud shows that only one red-coded tag, “CCGCoding”, remains under such high conservation requirements. Again, the SSR motif “CCGCoding” appears as a significant biomarker in mammal species that is highly correlated with the skeletal development related genes.

3.4. Case Study of Gene Ontology Term of “Embryonic Cranial Skeleton Morphogenesis”. To demonstrate functionally related SSR motifs through GO term assignment, we selected the GO term “embryonic cranial skeleton morphogenesis”. The related genes annotated by this GO term include TBX15, SIX4, DLX2, PRRX1, TWIST1, BMP4, SIX1, SMAD2, NIPBL, NODAL, WNT9B, TGFBR2, GAS1, SIX2, FOXC2, SMAD3, TBX1, TGFBR2, TBX15, GNAS, PRRX2, TGFBR1, TFAP2A, SMAD2, SETD2, BMP4, SMAD3, TWIST2, TFAP2A, SMAD3, TGFBR1, and BMP4. To compare results under various settings, we tried several combinations of input parameters that differed from the system defaults. In this case study, the parameter settings were defined as follows: SSR quality of 90% for perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group and the second as the fishery group, as in the default settings. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides. With these settings, the results are shown in Figure 6(a). We observed only one red-coded SSR tag, “CCGCoding”, which is the unique biomarker conserved in mammal species with respect to the embryonic cranial skeleton morphogenesis related genes.

Figure 6: (a) SSR tag cloud for GO keyword “embryonic cranial skeleton morphogenesis” with motif conserved ratio of 80%; (b) motif conserved ratio of 60%; (c) motif conserved ratio of 100%.

Then we lowered the motif conserved ratio to 60%, and the resulting SSR tag cloud is shown in Figure 6(b). Several tags changed color. Taking red tags as an example, there was only one red tag, “CCGCoding”, in Figure 6(a), but in Figure 6(b) a second red tag, “AATCTGCoding”, appeared; it was originally shown in pink in Figure 6(a). Conversely, when we increased the motif conserved ratio to 100%, the result, shown in Figure 6(c), contained no red tag. Compared to Figure 6(a), the original red tag “CCGCoding” changed to pink because only 5 out of 6 species in the mammal group hold this tag. Figures 6(b) and 6(c) together show that tags may switch colors as the motif conserved ratio is adjusted: a higher motif conserved ratio reduces the number of red, dark green, and blue tags.
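The color switches in Figure 6 follow directly from the threshold arithmetic. The sketch below is only an illustration under stated assumptions: the helper name `exclusive_tag_color` is hypothetical, it covers only tags exclusive to the first (mammal) cluster, and the count of 4 species for “AATCTGCoding” is inferred from its reported colors (pink at 80%, red at 60%) rather than stated in the text.

```python
def exclusive_tag_color(holders: int, cluster_size: int, cr_percent: int) -> str:
    """Color of a tag found only in the first species cluster (hypothetical helper).

    Red if the share of species holding the motif reaches the motif conserved
    ratio (CR), pink otherwise. Integer arithmetic avoids rounding issues.
    """
    return "red" if holders * 100 >= cr_percent * cluster_size else "pink"


CLUSTER_SIZE = 6     # six mammal species in the first cluster
CCG_HOLDERS = 5      # stated in the text: 5 of 6 mammals hold the CCG coding tag
AATCTG_HOLDERS = 4   # assumption: inferred from pink at 80% and red at 60%

for cr in (60, 80, 100):
    print(cr,
          exclusive_tag_color(CCG_HOLDERS, CLUSTER_SIZE, cr),
          exclusive_tag_color(AATCTG_HOLDERS, CLUSTER_SIZE, cr))
```

With these counts, the CCG tag is red at 60% and 80% and pink at 100%, matching Figures 6(a) to 6(c).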

3.5. An Example of Genetic Disease of “Huntington’s Disease (HD)”. To demonstrate genetic diseases caused by abnormal distribution of SSR motifs, we selected a well-known neurodegenerative genetic disease, Huntington’s disease (HD), as an example. HD is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the ENSG00000197386 (HTT) gene on chromosome 4 [22]. It manifests as involuntary movements caused by loss of muscle coordination and leads to psychiatric problems. The nucleotide repeat length and the average age of symptom onset in Huntington’s disease are inversely related [23].

The verification results of the SSR tag cloud are shown in Figure 7. The parameter settings were defined as follows: SSR quality of 100% and 80%, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and the “show all SSRs” option selected. The first species cluster was assigned as the mammal group and the second as the fishery group. In Figure 7, “AGCCoding” can be observed in both tag clouds as an important biomarker. In fact, under a shifting transformation of the SSR repeat pattern, the “AGC” repeat unit can theoretically be considered the same pattern as “CAG” for efficient identification. However, SSRs located within coding regions are further translated into their corresponding amino acid sequences according to precise locus verification on exon regions. Frame-shifted SSRs in coding regions might result in different encoded amino acids. For example, the trinucleotide pattern “AGC” encodes serine (S), whereas “CAG” encodes glutamine (Q). Therefore, identified SSRs in coding regions should be treated carefully and translated into the appropriate protein sequence based on an annotated genome database.

In this example, a significant SSR motif, “AGCCoding”, in HTT genes could be identified with different sizes (occurrence rates) under the various SSR quality settings. With a minimum length requirement of 20 nucleotides, this repeat motif appears in the coding regions of most mammal species, except macaque. Among all fishery species, only zebrafish possesses a similar repeat motif in the coding region. When the SSR quality parameter was increased to 100% (without any tolerance), the pattern “AGCCoding” (or equivalently “CAGCoding” on the DNA sense strand) could be retrieved only from cattle and human among the mammal species. The font size and color of each SSR tag changed gradually with the tolerance setting. Accordingly, the tag “AGCCoding” appeared with the biggest icon in pink when compared to all other SSRs in coding regions, which reflects the significance of its exclusive features for mammal species compared to fishery species. These observations might also provide biologists with important information for selecting animal species in future experimental studies of specific diseases.

Figure 7: (a) SSR tag cloud for the HTT gene with SSR quality of 80% and motif conserved ratio of 80%; 5 organisms hold the conserved SSR tag “AGCCoding”. (b) SSR quality of 100% and motif conserved ratio of 80%; only two species, human and cattle, hold the perfect SSR tag “AGCCoding”. Species labels in panel (a): mammalian, Bos taurus, Mus musculus, Homo sapiens, Gorilla gorilla, and Canis familiaris; fishery, Danio rerio. Species labels in panel (b): mammalian, Bos taurus and Homo sapiens; fishery, none.

Figure 8: (a) Relationship between the motif conserved ratio (CR) parameter and the amount of SSR tags in different colors for Cluster I and Cluster II; (b) relationship between the SSR quality parameter (80% and 100%) and SSR tag colors.
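The point made in the Huntington’s disease example, that “AGC” and “CAG” are the same repeat pattern under cyclic shifting yet encode different amino acids, can be illustrated with a short sketch. This is a minimal sketch, not the system’s implementation: the function names are hypothetical, the codon table contains only three entries taken from the standard genetic code, and reverse-complement equivalence is deliberately not handled.

```python
def canonical_unit(unit: str) -> str:
    """Return the alphabetically smallest cyclic rotation of a repeat unit."""
    unit = unit.upper()
    rotations = [unit[i:] + unit[:i] for i in range(len(unit))]
    return min(rotations)


# Three codons from the standard genetic code (enough for this example only).
CODONS = {"AGC": "S", "CAG": "Q", "GCA": "A"}


def translate_repeat(unit: str, copies: int) -> str:
    """Translate `copies` in-frame copies of a trinucleotide unit."""
    return CODONS[unit.upper()] * copies


# The same repeat pattern under cyclic shifting ...
print(canonical_unit("CAG"), canonical_unit("AGC"))            # AGC AGC
# ... but different peptides depending on the reading frame.
print(translate_repeat("CAG", 5), translate_repeat("AGC", 5))  # QQQQQ SSSSS
```

Grouping by a canonical rotation is one way to count shifted copies of a repeat as a single tag, while translation still has to use the annotated reading frame.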

4. Discussion

Two key parameters affect the color and size distribution within an SSR tag cloud. The first is the motif conserved ratio (CR). Different CR values change the colors of SSR tags: when the motif conserved ratio increases, the number of red, dark green, and blue tags may decrease. In Figure 8(a), Cluster I represents the first species cluster and Cluster II represents the second species cluster. The horizontal straight line in the figure represents the motif conserved ratio value. When the CR threshold is increased, the areas of red, blue, and dark green decrease; in contrast, when the CR threshold is decreased, these areas increase. Each area is proportional to the number of SSR tags.

The second important parameter for a tag cloud is the SSR quality threshold. As shown in Figure 8(b), different SSR quality values change not only the number of SSR tags but also their colors. Increasing the SSR quality value may reduce the number of SSR tags, since the SSRs with higher quality are always a subset of the SSRs with lower quality. When the quality threshold is lowered to obtain more SSR candidates, some red and green tags might change into yellow or blue tags, respectively. This is mainly caused by the newly intersecting region after expanding the SSR candidates.

Besides, a few common SSR tags originally coded in yellow might turn either red or green when the quality factor is increased. This is mainly because the total number of species possessing a certain SSR tag decreases, so that SSR motifs conserved between the two species clusters might become representative SSR tags for one species cluster exclusively. Table 2 lists the total number of SSR motifs for each species, with a minimum SSR length of 20 nucleotides. The SSR quantities for mammal species are usually larger than those for fishery species, and increasing the SSR quality value generally reduces the number of SSR motifs in each species.

5. Conclusion

SSRs are nonrandomly distributed nucleotides in genomes, with repeating basic patterns of 1 to 6 nucleotides in length, and a large number of functional SSR motifs have been demonstrated to be important biomarkers involved in various biological processes and gene regulation. Owing to the abundance of SSRs in each species’ genome, it is difficult to recognize significant SSR biomarkers or gene regulation related SSRs based mainly on the repeat sequence length, genetic location, and fundamental repeat pattern of an SSR motif. In this paper, we proposed the concept of identifying SSR biomarker candidates through cross-species cluster comparison on a specified set of target genes. The developed system provides an online tool with multiparameter selection functions, and the identified SSR motifs are displayed with a tag cloud visualization method. The exclusive and consensus SSR motifs between two species clusters are shown efficiently in different font colors and sizes. The in silico comparison of SSR motifs across different species clusters may provide clues and evidence for further understanding of evolutionary development and functional associations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Table 2: The number of SSR motifs of each species for various SSR quality settings.

| Scientific name | Species name | SSR quality 80% | SSR quality 90% | SSR quality 100% |
|---|---|---|---|---|
| Danio rerio | Zebrafish | 1175832 | 594741 | 401503 |
| Gasterosteus aculeatus | Stickleback | 160413 | 87343 | 51779 |
| Oryzias latipes | Medaka | 122505 | 37730 | 15460 |
| Takifugu rubripes | Fugu | 261612 | 148043 | 90753 |
| Tetraodon nigroviridis | Tetraodon | 119557 | 69473 | 43584 |
| Gadus morhua | Cod | 359592 | 209540 | 123880 |
| Homo sapiens | Human | 3023284 | 1406186 | 644338 |
| Gorilla gorilla | Gorilla | 757571 | 344973 | 152403 |
| Macaca mulatta | Macaque | 1075737 | 526515 | 225403 |
| Mus musculus | Mouse | 2463222 | 1301019 | 812873 |
| Bos taurus | Cow | 323386 | 132923 | 44906 |
| Canis familiaris | Dog | 715776 | 340433 | 152502 |
| Caenorhabditis elegans | Roundworm | 59273 | 13637 | 4225 |
| Drosophila melanogaster | Fruit fly | 199458 | 79952 | 21223 |

Acknowledgments

This work is supported by the Center of Excellence for the Oceans from National Taiwan Ocean University, the National Science Council, Taiwan (NSC 102-2321-B-019-001 and NSC 101-2627-B-019-003 to T.-W. Pai), and the Department of Health in Taiwan (DOH102-TD-B-111-004 to H.-T. Chang).

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, “The evolutionary dynamics of repetitive DNA in eukaryotes,” Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, “Microsatellites within genes: structure, function, and evolution,” Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, “Microsatellite repeat instability and neurological disease,” Bioessays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, “Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review,” Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos et al., “Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia,” Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, “Glutamine repeats and neurodegeneration,” Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, “Role of insulin/insulin-like growth factor 1 signaling pathway in longevity,” World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, “Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature,” Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase et al., “A single IGF1 allele is a major determinant of small size in dogs,” Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, “Comparison of tag cloud layouts: task-related performance and visual exploration,” in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, “Automated gene ontology annotation for anonymous sequence data,” Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, “iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website,” BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, “INTERFEROME: the database of interferon regulated genes,” Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, “Revigo summarizes and visualizes long lists of gene ontology terms,” PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan et al., “An overview of Ensembl,” Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, “Efficient algorithms for identifying orthologous simple sequence repeats of disease genes,” Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes et al., “Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies,” Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal et al., “A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset parkinson disease,” The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith et al., “Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype,” The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.

[21] M. A. Lines, L. Huang, J. Schwartzentruber et al., “Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly,” The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, “Molecular origins of rapid and continuous morphological evolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao et al., “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes,” Cell, vol. 72, no. 6, pp. 971–983, 1993.


an orthologous SSR motif if the conserved ratio meets the minimum threshold in an assigned species cluster. For example, a conserved ratio of 80% means that the identified conserved SSR pattern can be found in at least 80% of the species in the assigned species cluster; for a cluster of 6 species, at least 5 (6 × 80% = 4.8) different species must possess the orthologous gene(s) and hold the specific SSR pattern within the same genetic region among all orthologous gene(s). For many-to-many orthologous genes, an SSR motif is defined as conserved as long as it can be detected in any one of its orthologous genes. The threshold level of the conserved ratio can be assigned by users through interactive webpage settings.
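The rounding in the example above (6 × 80% = 4.8, hence at least 5 species) is a ceiling operation. A minimal sketch, assuming the ratio is supplied as an integer percentage; the function name `min_species_required` is hypothetical:

```python
def min_species_required(cluster_size: int, conserved_ratio_percent: int) -> int:
    """Smallest number of species whose share of the cluster reaches the ratio."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    if not 0 <= conserved_ratio_percent <= 100:
        raise ValueError("conserved_ratio_percent must be between 0 and 100")
    # Integer ceiling division avoids floating-point artifacts
    # (for instance, 6 * 0.6 is not exactly 3.6 in binary floating point).
    return (cluster_size * conserved_ratio_percent + 99) // 100


for ratio in (60, 80, 100):
    print(ratio, min_species_required(6, ratio))
```

For a 6-species cluster this gives 4, 5, and 6 species at ratios of 60%, 80%, and 100%, the thresholds used in the case studies.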

Through cross-species comparison between two clustered groups, retrieving conserved or exclusive SSR motifs could help biologists choose significant biomarkers from a previously defined gene set before performing biological experiments. In addition, exclusive or common SSR motifs between two different species clusters might be regarded as important genetic markers in light of the evidence of biological evolution and functional conservation.

2.5. SSR Tag Cloud Visualization. The tag cloud visualization technique provides a keyword representation of text data by showing each tag in various font sizes and colors. To emphasize the importance of conserved and exclusive SSR motifs extracted from a set of specified homologous genes between two different species clusters, we adopted the tag cloud representation to display the identified SSR motifs according to their weighting coefficients calculated from query gene sets. In an SSR tag cloud, the tag size of each SSR motif indicates not only the conservation status of the motif among orthologous genes but also its representativeness among different species clusters. A linear accumulation formula and normalization procedures were used to determine the SSR weighting coefficients for tag size selection. This formula simply counts the number of occurrences of each SSR motif found in each individual gene in the different species clusters. According to this definition of occurrence rate, if an identified SSR motif is well conserved in two different species clusters or highly represented in the specified gene set, the SSR tag is assigned a larger weighting coefficient and is accordingly displayed with a bigger font size in the tag cloud.
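The text does not give the exact accumulation and normalization formula, so the following is only one plausible reading, offered as a sketch: occurrences are counted per tag and mapped linearly onto a font-size range. The min-max normalization, the 12 to 48 point range, the function name, and the example counts other than the 62 occurrences reported for the CCG coding tag are all assumptions.

```python
from collections import Counter


def tag_font_sizes(occurrences, min_pt=12, max_pt=48):
    """Map per-tag occurrence counts linearly onto a font-size range.

    `occurrences` holds one (motif, region) tuple per SSR found in any gene
    of any species. Min-max normalization is an assumption of this sketch.
    """
    counts = Counter(occurrences)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for tag, n in counts.items():
        weight = (n - lowest) / span if span else 1.0  # normalized to 0..1
        sizes[tag] = round(min_pt + weight * (max_pt - min_pt))
    return sizes


# Hypothetical counts, except 62, which is the figure reported for CCG in coding regions.
found = ([("CCG", "Coding")] * 62
         + [("AG", "Upstream")] * 36
         + [("A", "Downstream")] * 10)
print(tag_font_sizes(found))
```

Any monotonic mapping from count to size would preserve the property described above, namely that more frequent motifs are drawn larger.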

In order to visually emphasize identified SSR motifs belonging to different species clusters, we applied different colors to SSR tags to distinguish the conserved and/or exclusive features of SSR biomarkers between two species clusters. In this study, red tags represent consensus SSR motifs found in the first species cluster only that satisfy the conservation threshold in that cluster; pink tags represent consensus SSR motifs found in the first species cluster only for which the conservation threshold is not satisfied; dark green tags represent consensus SSR motifs found in the second species cluster only that satisfy the conservation criterion in that cluster; light green tags denote consensus SSR motifs found in the second species cluster only for which the conservation threshold is not satisfied; blue tags represent identified SSR patterns that are well conserved in both species clusters and satisfy the species conservation percentage; and yellow tags show identified SSR patterns that are found in both species clusters for the query gene set but do not satisfy the species conservation criterion. The color-coded information in a resulting tag cloud is shown in Figure 1, and the corresponding attributes are described in Table 1. The abbreviation CR denotes the “conserved ratio” percentage of the corresponding species clusters for each simulation.

Figure 1: Color-coded chart for tag cloud representation of identified SSR motifs between two species clusters (Cluster I and Cluster II) and the criterion of conserved ratio. *CR: conserved ratio percentage.

Table 1: Relationship between colors, species clusters, and conserved ratios of detected SSR motifs.

| Color | Species cluster | Conserved motif ratio |
|---|---|---|
| Red | I | ≥ CR |
| Pink | I | < CR |
| Blue | I and II | ≥ CR |
| Yellow | I and II | < CR |
| Dark green | II | ≥ CR |
| Light green | II | < CR |
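The six-color scheme of Table 1 amounts to a small decision rule. The sketch below is one reading of it, not the system’s code: the function name and signature are hypothetical, and the interpretation that a blue tag must meet the CR threshold in both clusters (with any other shared tag shown in yellow) is an assumption based on the description above.

```python
def tag_color(n1: int, n2: int, size1: int, size2: int, cr_percent: int):
    """Classify an SSR tag by the number of species holding it in each cluster.

    n1, n2: species holding the motif in Cluster I and Cluster II.
    size1, size2: number of species in each cluster.
    Returns None when the motif is absent from both clusters.
    """
    if n1 == 0 and n2 == 0:
        return None
    # Integer comparison of n / size >= CR / 100.
    ok1 = n1 * 100 >= cr_percent * size1
    ok2 = n2 * 100 >= cr_percent * size2
    if n1 and n2:                      # present in both clusters
        return "blue" if ok1 and ok2 else "yellow"
    if n1:                             # exclusive to Cluster I
        return "red" if ok1 else "pink"
    return "dark green" if ok2 else "light green"  # exclusive to Cluster II


print(tag_color(5, 0, 6, 6, 80))  # red
print(tag_color(4, 2, 6, 6, 60))  # yellow
```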

In the developed system, users can also try to identify imperfect SSR biomarkers by setting different tolerance levels, and the number of retrieved imperfect SSR motifs varies accordingly. Higher noise rates tolerate more imperfect repeat patterns and yield a larger number of possible SSR motifs. Accordingly, the corresponding tag clouds can be depicted in multiscale representations under various noise threshold settings. In other words, tag clouds at different scales are composed of SSR motifs of different tolerance qualities. For instance, in the highest-quality SSR tag cloud, all identified conserved SSR motifs have perfect repeating patterns among different genes and species groups. In contrast, lower-quality SSR tag clouds contain more tolerant SSR motifs, which may reflect evolutionary status due to gene specification and/or duplication events from either distant or close species. Multiscale tag clouds provide biologists with an easier way to compare and select suitable SSR candidate motifs as biomarkers through a progressive approach over different tolerance levels, which could be applied in various situations for the further design of biological experiments.
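To make the notion of tolerance concrete, the following sketch scores a candidate repeat against the perfect tiling of its unit. It is only an assumed definition for illustration: the text does not specify how quality is computed, and this version counts substitutions only (no insertions or deletions). The function names are hypothetical.

```python
def repeat_quality(sequence: str, unit: str) -> float:
    """Fraction of bases that agree with a perfect tiling of `unit`.

    Assumed definition for illustration; mismatches are substitutions only.
    """
    if not sequence or not unit:
        raise ValueError("sequence and unit must be non-empty")
    sequence, unit = sequence.upper(), unit.upper()
    matches = sum(
        1 for i, base in enumerate(sequence) if base == unit[i % len(unit)]
    )
    return matches / len(sequence)


def passes_quality(sequence: str, unit: str, threshold: float) -> bool:
    """True if the repeat meets the quality threshold (e.g. 0.8, 0.9, 1.0)."""
    return repeat_quality(sequence, unit) >= threshold


perfect = "CAG" * 7                              # 21 nt, no mismatch
one_mismatch = "CAG" * 2 + "CAA" + "CAG" * 4     # 20 of 21 bases match
for t in (0.8, 0.9, 1.0):
    print(t, passes_quality(perfect, "CAG", t), passes_quality(one_mismatch, "CAG", t))
```

Under this definition, every repeat accepted at a higher threshold is also accepted at a lower one, which is the nesting that the multiscale tag clouds rely on.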

Figure 2: Interface of the SSR tag cloud web system (http://ssrtc.cs.ntou.edu.tw).

3. Results

3.1. SSR Tag Cloud Web System. In this study, we developed an online web system (http://ssrtc.cs.ntou.edu.tw) for identifying conserved and exclusive SSR biomarkers through cross-species cluster comparison. The main interface of the developed web system is shown in Figure 2. To discover significant SSR biomarker candidates from an automatically generated SSR tag cloud, a user provides gene name(s) or keyword(s) of gene function and can simply apply the default parameters for system prediction. In other words, a set of query genes is defined in the first step by providing relevant Ensembl gene IDs, GO terms, or keywords. In addition, the threshold settings of the SSR feature parameters can be assigned manually instead of using the defaults; these parameters are the genetic region, length of basic pattern, minimum length of SSR motif, SSR quality, species cluster, and SSR motif conserved ratio. The genetic region and length of basic pattern are used to distinguish fundamental features of SSR motifs in the cross-species cluster comparison. The minimum SSR length defines the minimal length for identification of SSR motifs. The SSR quality factor represents a tolerance threshold for allowing imperfect SSRs as candidate biomarkers. The system initially provides three settings for efficient identification: 1.0 for perfect SSRs, and 0.8 and 0.9 for imperfect SSRs with 20% and 10% tolerance, respectively, within an identified SSR motif. The species cluster assignment function supports cross-species comparison by classifying the species of interest into two clusters. The motif conserved ratio is defined as the percentage of qualified species within a cluster that possess the conserved SSR motif within a target gene. Two operation modes were designed for the motif conserved ratio: if a user chooses the condition “larger than or equal to the motif conserved ratio”, the system displays the resulting SSR tag cloud in 6 colors; otherwise, the SSR tag cloud appears in 3 colors only. The color modes of an SSR tag cloud are defined in the previous section. Once all parameters and operation modes are defined, the system performs SSR biomarker evaluation automatically and generates a final SSR tag cloud for visualization. The font color of each SSR tag is mainly decided by the motif conserved ratio parameter, and the font size depends only on the occurrence frequency of the SSR element. Users can move the mouse cursor over any SSR item within the resulting tag cloud to display the total appearance number and conserved ratio of the selected SSR motif from the target genes of the assigned species cluster. The detailed information of each SSR tag is also available in a floating dialog box by clicking on the tag; it includes the Ensembl gene ID, the transcript ID of the specified gene possessing the target SSR motif, species name, genomic coordinates, and DNA sequence contents. Additionally, if an SSR appears within coding regions, its corresponding protein sequences can be retrieved from the Ensembl database and shown in an additional window.

Figure 3: An SSR tag cloud example for ENSG00000069329 (VPS35) between two 6-species clusters.
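Two of the parameters described in Section 3.1, the length of the basic pattern (1 to 6 nucleotides) and the minimum SSR length (20 nucleotides), can be illustrated with a small scanner for perfect repeats. This is a minimal sketch for SSR quality 1.0 only and is not the detection algorithm used by the system; the function names are hypothetical, and overlapping or nested repeats are handled only in the simplest way.

```python
def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repetition of a shorter unit."""
    return unit not in (unit + unit)[1:-1]


def find_perfect_ssrs(seq: str, min_length: int = 20, max_unit: int = 6):
    """Return sorted (start, end, unit) tuples for perfect repeats (end exclusive).

    A run may end with a partial copy of the unit; its full extent is reported.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):           # candidate basic-pattern length
        i = 0
        while i + k <= len(seq):
            j = i + k
            # Extend while each base equals the base one period (k) earlier.
            while j < len(seq) and seq[j] == seq[j - k]:
                j += 1
            if j - i >= max(min_length, 2 * k):
                unit = seq[i:i + k]
                if is_primitive(unit):          # skip e.g. "AA"; "A" covers it
                    hits.append((i, j, unit))
                i = j - k + 1                   # resume just past this run
            else:
                i += 1
    return sorted(hits)


example = "GT" + "CAG" * 8 + "TA"
print(find_perfect_ssrs(example))  # [(2, 26, 'CAG')]
```

Six copies of CAG (18 nucleotides) fall below the 20-nucleotide minimum and are not reported, whereas the eight copies in the example are.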

3.2. SSR Biomarkers for Orthologous Genes. To demonstrate system performance, we selected all orthologous genes from twelve vertebrate model species (i.e., excluding fruit fly and roundworm). All selected genes possess sequence identities higher than 80% compared individually to the human genome. Under this criterion, a total of 162 orthologous genes were selected for the first test case. With these twelve vertebrate species classified into two species clusters, the mammal and fishery clusters, for comparison, the conserved and exclusive SSR motifs for each gene could be successfully identified; significant SSR biomarker candidates for each individual gene are included in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/678971. Here we illustrate only two genes, ENSG00000069329 and ENSG00000108883, as examples, and all conserved SSR motifs were carefully verified within all orthologous genes from the twelve model species.

3.2.1. Case Study of ENSG00000069329 (VPS35). The Ensembl gene ID ENSG00000069329 corresponds to a vacuolar protein sorting gene (VPS35), which has an average sequence identity of 80% in pairwise alignments between human and the other eleven model species. The resulting SSR tag cloud for VPS35 is shown in Figure 3, generated with an SSR quality of 80%, a minimum SSR length of 20 nucleotides, and a motif conserved ratio of 60% (i.e., at least 4 species possessing identical SSR motifs are required in each species cluster). The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla, and the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region parameters were set to search all regions except introns, and the length of the basic pattern was selected from 1 to 6 nucleotides for comprehensive representation.

According to the SSR color codes in Figure 1, users can quickly observe that only three coconserved SSR motifs, “CUpstream”, “AGUpstream”, and “ADownstream”, shown in yellow, were found between the two species clusters. However, there is no blue-coded SSR tag in this experiment, which implies that no coconserved SSR motif exists in at least 4 model species in each species cluster simultaneously. These three yellow SSR tags appear in both species clusters but are not well conserved with respect to the assigned conserved ratio. The dark green SSR tag “AGCoding” represents a consensus SSR motif found only in the second cluster (fishery species), with at least 4 fishery species containing the SSR motif in the coding region; this motif pattern in the coding region was not found in any mammal species from the first cluster. The light green SSR tags represent consensus SSR motifs that were found only in the fishery group but do not satisfy the motif conserved ratio requirement of 60%; that is, these light green SSR patterns were found in fewer than 4 fishery species. On the other hand, the pink SSR tags represent consensus SSR motifs found exclusively in the mammal species cluster, in fewer than 4 mammal species. In addition, the dark green SSR tag “AGCoding” has the biggest font size, which implies that this SSR is the most representative and exclusive feature for fishery species compared to mammal species.

3.2.2. Case Study of ENSG00000108883 (EFTUD2). The Ensembl gene ID ENSG00000108883 denotes an elongation factor Tu GTP binding domain gene (EFTUD2), which possesses an average sequence identity of 80% in pairwise alignments between human and each of the other 11 model species. The resulting SSR tag cloud for EFTUD2 is shown in Figure 4, using exactly the same parameters as in the previous example. According to the resulting tag cloud, users can immediately identify that only one coconserved SSR tag, “ATC/Coding,” was found as a notable biomarker between the two species clusters; it was well conserved across at least 4 species in each species cluster and is therefore shown in blue. Furthermore, one red coded SSR tag, “A/Downstream,” represents a consensus SSR motif found only in the first (mammal) species cluster, with at least 4 species containing the motif in the downstream region; this motif could not be found in any fishery species. The pink SSR tags represent conserved SSR motifs found only in the mammal group that did not satisfy the motif conserved ratio requirement. Similarly, the light green coded SSR tags represent consensus SSR motifs found exclusively in the fishery species cluster, in fewer than 4 fishery species. In addition, the red SSR tag “A/Downstream” is shown with the biggest font size, which marks it as the most representative and exclusive SSR for mammal species among all SSR candidates.

Figure 4: An SSR tag cloud example for ENSG00000108883 (EFTUD2) between two 6-species clusters.

Interestingly, the first gene, VPS35 (ENSG00000069329), is associated with “Parkinson’s disease (PD)” [19], and mutations in the second gene, EFTUD2 (ENSG00000108883), cause “mandibulofacial dysostosis with microcephaly” [20]. In both cases, scientists have so far only demonstrated that the diseases are caused by certain gene mutations. Through in silico SSR biomarker detection with our proposed system, we could efficiently identify many conserved and exclusive SSRs between the two species groups as biomarker candidates. However, without experimental verification, we cannot be sure whether either disease is truly correlated with the identified SSR motifs. To gain more confidence in the proposed system, we tested it on disease genes that are known to be associated with specific SSR biomarkers. If a genetic disease is indeed caused by abnormal distributions of SSR motifs, we expect that the proposed SSR tag cloud representation system can identify those significant SSR biomarkers efficiently and effectively.

3.3. Case Study of a Set of Skeletal Development Genes. To demonstrate functionally related SSR motifs, we selected a gene set with the specific function of skeletal development. A total of 17 genes associated with this function were selected: HOXA11, ZIC2, ALX4, HOXA2, DLX2, HOXA7, TWIST1, HOXC13, RUNX2, SOX9, HOXD11, HOXD13, GDF11, HLX, SIX3, HOXD8, and HOXA10 [21]. This example also shows that the detailed information of each SSR tag is available in a floating dialog by clicking on it, and that the appearance number and conserved ratio of a selected SSR motif from the target genes can be viewed by moving the mouse cursor over the SSR tag.

The resulting SSR tag clouds from different combinatorial settings for the 17 skeletal development related genes are shown in Figure 5. In Figure 5(a), the parameter settings were defined as follows: SSR quality of 90% relative to perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla; the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides for comprehensive representation. With these settings, the results are shown in Figure 5: the red coded SSR tag “CCG/Coding” represents the only exclusive SSR motif well conserved in mammal species. This tag was found in at least 5 species within the mammal group, and it is highly correlated with the skeletal development related genes. Users can move the mouse over the “CCG/Coding” tag, and the appearance number and conserved ratio of the selected SSR motif are shown in a pop-up. In Figure 5(b), the CCG/Coding motif appears in the mammal species cluster a total of 62 times with a conserved ratio of 100%, while no such SSR motif was discovered in the skeletal development gene set within the fishery species cluster.

If a user clicks on the “CCG/Coding” tag, detailed information about the SSR tag is shown in a floating dialog, including the Ensembl gene ID, transcript ID, species name, coordinates in genomes, and DNA sequence contents. If the SSRs appear within coding regions, the table also provides the cDNA sequence and its corresponding translated protein sequences. In Figure 5(c), the CCG repeated pattern in the last row, from the human ENSG00000135414 (GDF11) gene, is located on chromosome 12 at coordinates 56137185 to 56137224. Since this CCG repeated pattern was found in coding regions, the table also lists the DNA, cDNA, and corresponding protein sequence contents. Notably, this repeated pattern in the coding region of the RUNX2 gene encodes a polyalanine peptide (GCC repeat in the coding region), and it indeed plays a crucial role in cellular development. Abnormal distribution of this polyalanine repeat biomarker might cause dysplasia, a genetic disorder of abnormal cellular development.

Figure 5: (a) SSR tag cloud for 17 skeletal development related genes, constrained to coding regions; (b) result of moving the mouse over the SSR tag “CCG/Coding”; (c) detailed information of the SSR tag “CCG/Coding” in a floating dialog; (d) an SSR tag cloud for 17 skeletal development related genes showing only SSRs with high conserved ratios.

In Figure 5(d), most parameters were set identically to those of Figure 5(a), except that the display parameter was changed to show only highly conserved SSRs instead of all identified SSRs. In other words, tags with pink, light green, and yellow color codes are hidden. The resulting tag cloud shows only one red coded tag, “CCG/Coding,” under such high conservation requirements. Again, the SSR motif “CCG/Coding” is a significant biomarker in mammal species, highly correlated with the skeletal development related genes.

3.4. Case Study of the Gene Ontology Term “Embryonic Cranial Skeleton Morphogenesis”. To demonstrate functionally related SSR motifs through GO term assignment, we selected the GO term “embryonic cranial skeleton morphogenesis.” The related genes annotated by this GO term include TBX15, SIX4, DLX2, PRRX1, TWIST1, BMP4, SIX1, SMAD2, NIPBL, NODAL, WNT9B, TGFBR2, GAS1, SIX2, FOXC2, SMAD3, TBX1, TGFBR2, TBX15, GNAS, PRRX2, TGFBR1, TFAP2A, SMAD2, SETD2, BMP4, SMAD3, TWIST2, TFAP2A, SMAD3, TGFBR1, and BMP4. To compare the results obtained under various settings, we tried several combinations of input parameters that differ from the system default settings. In this case study, the parameter settings were defined as follows: SSR quality of 90% relative to perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group and the second as the fishery group, as in the default settings. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides. With these settings, the results are shown in Figure 6(a). Only one red coded SSR tag, “CCG/Coding,” is present; it is the unique biomarker conserved in mammal species with respect to the embryonic cranial skeleton morphogenesis related genes.

Figure 6: (a) SSR tag cloud for the GO keyword “embryonic cranial skeleton morphogenesis” with a motif conserved ratio of 80%; (b) motif conserved ratio of 60%; (c) motif conserved ratio of 100%.

Then we lowered the motif conserved ratio to 60%, and the resulting SSR tag cloud is shown in Figure 6(b). Several tags changed their coded colors. Taking red coded tags as an example, there was only one red tag, “CCG/Coding,” in Figure 6(a), but in Figure 6(b) a second red tag, “AATCTG/Coding,” appears, which was originally shown in pink in Figure 6(a). Conversely, when we increased the motif conserved ratio to 100%, the result, shown in Figure 6(c), contains no red coded SSR tag. Compared to Figure 6(a), the original red tag “CCG/Coding” changed to pink because only 5 of the 6 species in the mammal group hold this tag. Figures 6(b) and 6(c) show that tags may switch colors as the motif conserved ratio is adjusted: a higher motif conserved ratio reduces the number of red, green, and blue coded tags.
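The color switches reported for Figure 6 follow directly from the species-count threshold implied by each ratio. The sketch below replays them for mammal-exclusive tags. It is an illustration under stated assumptions: the function name is hypothetical, “CCG/Coding” is taken to occur in 5 of 6 mammals as stated above, and the count of 4 for “AATCTG/Coding” is inferred from its pink-to-red switch between 80% and 60%, not reported in the text.

```python
def exclusive_color(n_species: int, ratio_percent: int, cluster_size: int = 6) -> str:
    """Color of a tag found only in the mammal cluster (red if well conserved)."""
    need = (ratio_percent * cluster_size + 99) // 100  # integer ceiling
    return "red" if n_species >= need else "pink"


# Number of mammal species carrying each tag.
# 5 is stated in the text; 4 is an inferred, illustrative value.
TAGS = {"CCG/Coding": 5, "AATCTG/Coding": 4}

if __name__ == "__main__":
    for ratio in (60, 80, 100):
        colors = {tag: exclusive_color(n, ratio) for tag, n in TAGS.items()}
        print(f"conserved ratio {ratio}%: {colors}")
```

Raising the ratio from 60% to 80% to 100% raises the required species count from 4 to 5 to 6, so the tags drop from red to pink one after the other.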

3.5. An Example of the Genetic Disease “Huntington’s Disease (HD)”. To demonstrate genetic diseases caused by abnormal distribution of SSR motifs, we selected a well-known neurodegenerative genetic disease, “Huntington’s disease (HD),” as an example. HD is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the ENSG00000197386 (HTT) gene on chromosome 4 [22]. It presents with involuntary movements caused by loss of muscle coordination and leads to psychiatric problems. The nucleotide repeat length and the average age of symptom onset in Huntington’s disease are inversely related [23].

The verification results of the SSR tag cloud are shown in Figure 7. The parameter settings were defined as follows: SSR quality of 100% and 80%, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and the “show all SSRs” option selected. The first species cluster was assigned as the mammal group and the second as the fishery group. In Figure 7, “AGC/Coding” appears in both tag clouds as an important biomarker. In fact, under a shifting transformation of the SSR repeat pattern, the “AGC” repeat unit can theoretically be considered the same pattern as “CAG” for efficient identification. However, SSRs located within coding regions are further translated into their corresponding amino acid sequences according to precise loci verification on exon regions. Frame-shifted SSRs in coding regions might result in different encoded amino acids. For example, the trinucleotide pattern “AGC” encodes serine (S), whereas “CAG” encodes glutamine (Q). Therefore, SSRs identified in coding regions should be carefully treated and translated into the appropriate protein sequence based on the annotated genome database.

In this example, a significant SSR motif, “AGC/Coding,” in HTT genes was identified with different sizes (occurrence rates) under various SSR quality settings. This repeat motif appears in the coding regions of most mammal species, except macaque, under a minimum length requirement of 20 nucleotides. In addition, only zebrafish possesses a similar repeat motif in the coding region among all fishery species. When the SSR quality parameter was increased to 100% (without any tolerance), the pattern “AGC/Coding” (or, equivalently, “CAG/Coding” in the DNA sense strand) could be retrieved only from cattle and human among the mammal species. The font size and color of each SSR tag changed gradually with the tolerance rate setting. Accordingly, the tag “AGC/Coding” appeared with the biggest icon in pink when compared to all other SSRs in coding regions, reflecting the significance of exclusive features for mammal species compared to fishery species. These observations might also provide important information for biologists selecting animal species in future experimental studies of specific diseases.

Figure 7: (a) SSR tag cloud for the HTT gene with SSR quality of 80% and motif conserved ratio of 80%, with 5 organisms holding the conserved SSR tag “AGC/Coding” (panel labels: mammalian, Bos taurus, Mus musculus, Homo sapiens, Gorilla gorilla, and Canis familiaris; fishery, Danio rerio); (b) SSR quality of 100% and motif conserved ratio of 80%, with only two species, human and cattle, holding the perfect SSR tag “AGC/Coding” (panel labels: mammalian, Bos taurus and Homo sapiens; fishery, none).

Figure 8: (a) Relationship between the motif conserved ratio parameter and the number of SSR tags in different colors; (b) relationship between the SSR quality parameter and SSR tag colors.
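The distinction drawn above between a shifted repeat unit (same SSR pattern) and its reading frame (different amino acids) can be made concrete with a small sketch. The helper names are hypothetical, choosing the lexicographically smallest rotation as the canonical unit is an assumption about how tags such as “AGC/Coding” might be named, and only the six codons needed here are listed from the standard genetic code. Reverse-complement strands are not handled.

```python
# Subset of the standard genetic code, limited to codons used in this example.
CODON_TABLE = {
    "AGC": "S", "GCA": "A", "CAG": "Q",   # rotations of the HTT repeat unit
    "CCG": "P", "CGC": "R", "GCC": "A",   # rotations of the RUNX2 repeat unit
}


def rotations(unit: str) -> list[str]:
    """All cyclic shifts of a repeat unit, e.g. AGC -> AGC, GCA, CAG."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical_unit(unit: str) -> str:
    """Assumed naming rule: the lexicographically smallest rotation."""
    return min(rotations(unit))


def translate_repeat(unit: str, copies: int) -> str:
    """Translate `copies` tandem repeats of a trinucleotide unit in frame 0."""
    dna = unit * copies
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(canonical_unit("CAG"), canonical_unit("GCC"))
    print(translate_repeat("CAG", 4), translate_repeat("AGC", 4))
```

Under this rule, “CAG” and “AGC” collapse to the same tag, and “GCC” collapses to “CCG”, while translation still depends on the frame: the CAG frame yields polyglutamine and the GCC frame yields polyalanine.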

4. Discussion

Two key parameters affect the color and size distribution within an SSR tag cloud. The first is the motif conserved ratio (CR). Different conserved ratio values change the colors of SSR tags: when the motif conserved ratio increases, the number of red, green, and blue tags may decrease. In Figure 8(a), Cluster I represents the first species cluster and Cluster II the second. The horizontal straight line in the figure represents a motif conserved ratio value. When the CR threshold is increased, the areas of red, blue, and dark green decrease; in contrast, when the CR threshold is decreased, these areas increase. The area is proportional to the number of SSR tags.

The second important parameter for a tag cloud is the SSR quality threshold. As shown in Figure 8(b), different SSR quality values change not only the number of SSR tags but also their colors. Increasing the SSR quality value may reduce the number of SSR tags, since SSRs of higher quality are always a subset of SSRs of lower quality. When the quality threshold is decreased to gain more SSR candidates, some red and green tags might change into yellow or blue tags, respectively. This is mainly caused by the newly intersecting region that appears after the set of SSR candidates expands.

In addition, a few common SSR tags originally coded in yellow might be transformed into either red or green by increasing the quality factor. This happens mainly because the total number of species possessing a certain SSR tag decreases, so that SSR motifs conserved between the two species clusters might become representative SSR tags for one species cluster exclusively. Table 2 lists the total number of SSR motifs for each species, using a minimum SSR length of 20 nucleotides. Mammal species usually have more SSRs than fishery species, and increasing the SSR quality value generally reduces the number of SSR motifs in each species.
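The subset property of the SSR quality threshold can be checked on toy sequences. The paper does not spell out how quality is scored, so the sketch below assumes a simple substitution-only definition (fraction of positions that agree with a perfect tiling of the repeat unit); the function names and the three example sequences are hypothetical.

```python
def ssr_quality(seq: str, unit: str) -> float:
    """Assumed score: share of positions matching a perfect tiling of `unit`.
    Insertions and deletions are not modeled."""
    perfect = (unit * (len(seq) // len(unit) + 1))[:len(seq)]
    matches = sum(a == b for a, b in zip(seq, perfect))
    return matches / len(seq)


def passing(candidates: list[tuple[str, str]], threshold: float) -> set[str]:
    """Sequences whose quality meets the threshold (0.8, 0.9, or 1.0)."""
    return {seq for seq, unit in candidates if ssr_quality(seq, unit) >= threshold}


# Three 21-nt CAG-like candidates with 0, 1, and 4 substitutions.
CANDIDATES = [
    ("CAG" * 7, "CAG"),
    ("CAGCAGCAACAGCAGCAGCAG", "CAG"),
    ("CAACAGCAACAGCAACAGCAA", "CAG"),
]

if __name__ == "__main__":
    for threshold in (0.8, 0.9, 1.0):
        print(threshold, len(passing(CANDIDATES, threshold)))
```

Whatever the exact scoring rule, any candidate that passes a stricter threshold also passes a looser one, which is why the tag sets can only shrink as the quality setting rises.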

5. Conclusion

SSRs are nonrandomly distributed nucleotides in genomes, with repeating basic patterns of 1 to 6 nucleotides in length, and a large number of functional SSR motifs have been demonstrated to be important biomarkers involved in various biological processes and gene regulation. Because of the abundance of SSRs in each species’ genome, it is difficult to recognize significant SSR biomarkers or gene regulation related SSRs based mainly on the repeat sequence length, genetic location, and fundamental repeat pattern of an SSR motif. In this paper, we proposed the concept of identifying SSR biomarker candidates through cross-species cluster comparison on a specified set of target genes. The developed system provides an online tool with multiparameter selection functions, and the identified SSR motifs are displayed by a tag cloud visualization method. The exclusive and consensus SSR motifs between two species clusters are shown efficiently in different font colors and sizes. The in silico comparison of SSR motifs across different species clusters may provide clues and evidence for further understanding of evolutionary development and functional associations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Table 2: The number of SSR motifs of each species for various SSR quality settings.

Scientific name          Species name  SSR quality 80%  SSR quality 90%  SSR quality 100%
Danio rerio              Zebrafish     1175832          594741           401503
Gasterosteus aculeatus   Stickleback   160413           87343            51779
Oryzias latipes          Medaka        122505           37730            15460
Takifugu rubripes        Fugu          261612           148043           90753
Tetraodon nigroviridis   Tetraodon     119557           69473            43584
Gadus morhua             Cod           359592           209540           123880
Homo sapiens             Human         3023284          1406186          644338
Gorilla gorilla          Gorilla       757571           344973           152403
Macaca mulatta           Macaque       1075737          526515           225403
Mus musculus             Mouse         2463222          1301019          812873
Bos taurus               Cow           323386           132923           44906
Canis familiaris         Dog           715776           340433           152502
Caenorhabditis elegans   Roundworm     59273            13637            4225
Drosophila melanogaster  Fruit fly     199458           79952            21223
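The two trends read off Table 2 in the Discussion (counts fall as the quality setting rises, and mammals usually carry more SSRs than fishery species) can be verified with simple arithmetic on the tabulated values. This is a small sketch for the reader; the counts are copied from Table 2, while the helper names and the mammal/fishery grouping constants are introduced here for illustration.

```python
# (quality 80%, quality 90%, quality 100%) SSR counts copied from Table 2.
TABLE2 = {
    "Zebrafish": (1175832, 594741, 401503),
    "Stickleback": (160413, 87343, 51779),
    "Medaka": (122505, 37730, 15460),
    "Fugu": (261612, 148043, 90753),
    "Tetraodon": (119557, 69473, 43584),
    "Cod": (359592, 209540, 123880),
    "Human": (3023284, 1406186, 644338),
    "Gorilla": (757571, 344973, 152403),
    "Macaque": (1075737, 526515, 225403),
    "Mouse": (2463222, 1301019, 812873),
    "Cow": (323386, 132923, 44906),
    "Dog": (715776, 340433, 152502),
    "Roundworm": (59273, 13637, 4225),
    "Fruit fly": (199458, 79952, 21223),
}
MAMMALS = ("Human", "Gorilla", "Macaque", "Mouse", "Cow", "Dog")
FISH = ("Zebrafish", "Stickleback", "Medaka", "Fugu", "Tetraodon", "Cod")


def retained_fraction(species: str) -> float:
    """Share of the quality-80% SSRs that are still found at quality 100%."""
    q80, _, q100 = TABLE2[species]
    return q100 / q80


def mean_count(group: tuple[str, ...], column: int = 0) -> float:
    """Average SSR count of a species group for one quality column."""
    return sum(TABLE2[s][column] for s in group) / len(group)


if __name__ == "__main__":
    for species in TABLE2:
        print(f"{species:12s} retained at 100%: {retained_fraction(species):.1%}")
    print(f"mean at 80%: mammals {mean_count(MAMMALS):.0f}, fish {mean_count(FISH):.0f}")
```

The word "usually" matters: cow has fewer SSRs at the 80% setting than zebrafish or cod, even though the mammal average is higher.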

Acknowledgments

This work is supported by the Center of Excellence for the Oceans from National Taiwan Ocean University and the National Science Council, Taiwan (NSC 102-2321-B-019-001 and NSC 101-2627-B-019-003 to T.-W. Pai), and the Department of Health in Taiwan (DOH102-TD-B-111-004 to H.-T. Chang).

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, “The evolutionary dynamics of repetitive DNA in eukaryotes,” Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, “Microsatellites within genes: structure, function, and evolution,” Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, “Microsatellite repeat instability and neurological disease,” BioEssays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, “Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review,” Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos, et al., “Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia,” Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, “Glutamine repeats and neurodegeneration,” Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, “Role of insulin/insulin-like growth factor 1 signaling pathway in longevity,” World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, “Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature,” Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase, et al., “A single IGF1 allele is a major determinant of small size in dogs,” Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake, et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, “Comparison of tag cloud layouts: task-related performance and visual exploration,” in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, “Automated gene ontology annotation for anonymous sequence data,” Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, “iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website,” BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, “INTERFEROME: the database of interferon regulated genes,” Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, “Revigo summarizes and visualizes long lists of gene ontology terms,” PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan, et al., “An overview of Ensembl,” Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, “Efficient algorithms for identifying orthologous simple sequence repeats of disease genes,” Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes, et al., “Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies,” Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal, et al., “A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease,” The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith, et al., “Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype,” The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.

[21] M. A. Lines, L. Huang, J. Schwartzentruber, et al., “Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly,” The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, “Molecular origins of rapid and continuous morphological evolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao, et al., “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes,” Cell, vol. 72, no. 6, pp. 971–983, 1993.



Figure 2: Interface of the SSR tag cloud web system (http://ssrtc.cs.ntou.edu.tw).

3. Results

3.1. SSR Tag Cloud Web System. In this study, we developed an online web system (http://ssrtc.cs.ntou.edu.tw) for identifying conserved and exclusive SSR biomarkers through cross-species cluster comparison. The main interface of the developed web system is shown in Figure 2. To discover significant SSR biomarker candidates from an automatically generated SSR tag cloud, a user provides gene name(s) or keyword(s) of gene function and may simply apply the default parameters for system prediction. In other words, a set of query genes is defined in the first step by providing relevant Ensembl gene IDs, GO terms, or keywords. The threshold settings of the SSR feature parameters can also be assigned manually instead of using the defaults; these include the genetic region, length of basic pattern, minimum length of SSR motif, SSR quality, species cluster, and SSR motif conserved ratio. The genetic region and length of basic pattern distinguish fundamental features of SSR motifs under cross-species cluster comparison. The minimum SSR length defines the minimal length for identification of SSR motifs. The SSR quality factor represents a tolerance threshold for allowing imperfect SSRs as candidate biomarkers. The developed system initially provides three settings for efficient identification: 1.0 for perfect SSRs, and 0.8 and 0.9 for imperfect SSRs with 20% and 10% tolerance, respectively. The species cluster assignment function supports cross-species comparison by classifying the species of interest into two clusters. The motif conserved ratio is the percentage of qualified species within a cluster that possess the conserved SSR motif within a target gene. Two operation modes were designed for the motif conserved ratio: if a user chooses the condition “larger than or equal to” the motif conserved ratio, the system displays the resulting SSR tag cloud in 6 colors; otherwise, the SSR tag cloud appears in 3 colors only. The color modes of an SSR tag cloud are defined in the previous section.

Once all parameters and operation modes are defined, the system performs SSR biomarker evaluation automatically and generates a final SSR tag cloud for visualization. The font color of each SSR tag is mainly decided by the motif conserved ratio parameter, and the font size depends only on the occurrence frequency of an SSR element. Users can move the mouse over any SSR item within the resulting tag clouds, and the total appearance number and conserved ratio of the selected SSR motif from the target genes of the assigned species cluster are displayed. The detailed information of each SSR tag is also available in a floating dialog box by clicking on it, which includes the Ensembl gene ID, transcript ID of the specified gene possessing the target SSR motif, species name, coordinates in genomes, and DNA sequence contents. Additionally, if an SSR appears within coding regions, its corresponding protein sequences can be retrieved from the Ensembl database and shown in an additional window.

Figure 3: An SSR tag cloud example for ENSG00000069329 (VPS35) between two 6-species clusters.
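To make two of the parameters above concrete (length of basic pattern from 1 to 6 nucleotides and minimum SSR length of 20 nucleotides), the following sketch scans a DNA string for perfect SSRs only, corresponding to the quality setting of 1.0. It is an illustrative regular-expression approach, not the detection algorithm used by the web system; the function names and the example sequence are hypothetical, and imperfect SSRs (quality 0.8 or 0.9) are not handled.

```python
import re


def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repetition of a shorter unit
    (e.g. 'CAGCAG' is not primitive because it is 'CAG' twice)."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_perfect_ssrs(seq: str, min_length: int = 20, max_unit: int = 6):
    """Return sorted (start, end, unit) tuples for perfect tandem repeats.

    Coordinates are 0-based and end-exclusive. A trailing partial copy of the
    unit is not counted toward the repeat length.
    """
    hits = []
    for k in range(1, max_unit + 1):
        # A k-nucleotide unit followed by at least one more copy of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit) and len(match.group(0)) >= min_length:
                hits.append((match.start(), match.end(), unit))
    return sorted(hits)


if __name__ == "__main__":
    example = "TT" + "CAG" * 8 + "GG"   # invented sequence with a 24-nt CAG run
    print(find_perfect_ssrs(example))
```

Scanning each unit length separately and discarding non-primitive units avoids reporting the same run twice, for example once as CAG and once as CAGCAG.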

3.2. SSR Biomarkers for Orthologous Genes. To demonstrate system performance, we selected all orthologous genes from twelve vertebrate model species (excluding fruit fly and roundworm). All selected genes possess sequence identities higher than 80% compared individually to the human genome. Under this criterion, a total of 162 orthologous genes were selected for the first test case. When these twelve vertebrate species were classified into two species clusters, the mammal and fishery species clusters, for comparison, the conserved and exclusive SSR motifs for each gene could be successfully identified; significant SSR biomarker candidates for each individual gene are included in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/678971. Here we illustrate only two genes, ENSG00000069329 and ENSG00000108883, as examples, and all conserved SSR motifs were carefully verified within all orthologous genes from the twelve model species.

321 Case Study of ENSG00000069329 (VPS35) TheEnsembl gene ID of ENSG00000069329 is a vacuolar proteinsorting gene (VPS35) which possesses an average sequenceidentity of 80 by taking pairwise alignment betweenhuman and the other eleven model species The resultingSSR tag cloud for VPS35 was shown in Figure 3 by settingSSR quality of 80 minimum SSR length of 20 nucleotidesand motif conserved ratio of 60 (ie required at least

6 BioMed Research International

4 species possessing identical SSR motifs in each speciescluster) The first species cluster was assigned as the mammalgroup including human macaque mouse cow dog andgorilla and the second species cluster was assigned as thefishery group including zebrafish stickleback medaka fugutetraodon and cod The genetic region parameters were setas searching for all regions except introns and the lengthof basic pattern was selected from 1 to 6 nucleotides forcomprehensive representation

According to Figure 1 for SSR color codes users canquickly observe that only three coconserved SSR motifs ofldquoCUpstreamrdquo ldquoAGUpstreamrdquo and ldquoADownstreamrdquo inyellow were found between two species clusters Howeverin this case there is not any blue coded SSR tag in thisexperiment and which implies no coconserved SSR motifexisting for at least 4 model species in each species clustersimultaneouslyThese three yellow color coded SSR tags werefound due to their appearance in both species clusters but notwell conserved with respect to the assigned conserved ratioThedark green SSR tag of ldquoAGCodingrdquo represented the con-sensus SSRmotif could be found only in the second cluster offishery species with more than 4 fishery species containingthe SSR motif at coding region but this motif pattern atcoding regionwas not found in anymammal species from thefirst cluster The light green SSR tags represented consensusSSR motifs which were found only in the fishery group butdo not satisfied the motif conserved ratio requirement of80 that is these light green coded SSR patterns were onlyfound with less than 4 fishery species On the other handthe pink coded SSR tags represented consensus SSR motifsfoundonly in themammal species cluster exclusivelywith lessthan 4 mammal species In addition the dark green SSR tagof ldquoAGCodingrdquo with the biggest font size implied this SSRholding as the most representative and exclusive feature forfishery species compared to mammal species

3.2.2. Case Study of ENSG00000108883 (EFTUD2). The Ensembl gene ID ENSG00000108883 denotes the elongation factor Tu GTP binding domain containing 2 gene (EFTUD2), which possesses an average sequence identity of 80% in pairwise alignments between human and each of the other 11 model species. The resulting SSR tag cloud for EFTUD2 is shown in Figure 4, obtained with exactly the same parameters as the previous example. From the tag cloud, users can immediately identify that only one coconserved SSR tag, “ATCCoding,” was found as a notable biomarker between the two species clusters, and it was well conserved across at least 4 species in each species cluster; hence, the SSR tag is shown in blue. Furthermore, one red coded SSR tag, “ADownstream,” represents a consensus SSR motif found only in the first (mammal) species cluster, with at least 4 species containing the motif in the downstream region; this motif could not be found in any fishery species. The pink SSR tags represent SSR motifs that were found only in the mammal group but do not satisfy the motif conserved ratio requirement. Similarly, the light green coded SSR tags represent consensus SSR motifs found exclusively in the fishery species cluster, in fewer than 4 fishery species. In addition, the red SSR tag “ADownstream” is shown with the biggest font size, which marks it as the most representative and exclusive SSR for mammal species compared to all other SSR candidates.

Figure 4: An SSR tag cloud example for ENSG00000108883 (EFTUD2) between two 6-species clusters.

Interestingly, the first gene, VPS35 (ENSG00000069329), is associated with “Parkinson’s disease (PD)” [19], and the second gene, EFTUD2 (ENSG00000108883), causes “mandibulofacial dysostosis with microcephaly” [20]. In both cases, scientists have so far demonstrated only that the diseases are caused by gene mutations. Through in silico SSR biomarker detection with the proposed system, we could efficiently identify many conserved and exclusive SSRs between the two species groups as biomarker candidates. However, without experimental verification, we cannot confirm whether either disease is truly correlated with the identified SSR motifs. To gain more confidence in the proposed system, we therefore tested it on disease genes known to be associated with specific SSR biomarkers. If a genetic disease is indeed caused by abnormal distributions of SSR motifs, we expect that the proposed SSR tag cloud representation system can identify those significant SSR biomarkers efficiently and effectively.

3.3. Case Study of a Set of Skeletal Development Genes. To demonstrate functionally related SSR motifs, we selected a gene set with the specific function of skeletal development. A total of 17 genes associated with this function were selected: HOXA11, ZIC2, ALX4, HOXA2, DLX2, HOXA7, TWIST1, HOXC13, RUNX2, SOX9, HOXD11, HOXD13, GDF11, HLX, SIX3, HOXD8, and HOXA10 [21]. In this example, we show that the detailed information of each SSR tag is available in a floating dialog by clicking on the tag, and that the appearance number and conserved ratio of a selected SSR motif in the target genes can be viewed by moving the mouse cursor over the tag.

The resulting SSR tag clouds from different combinations of settings for the 17 skeletal development related genes are shown in Figure 5. In Figure 5(a), the parameter settings were defined as follows: SSR quality of 90% for perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group, including human, macaque, mouse, cow, dog, and gorilla; the second species cluster was assigned as the fishery group, including zebrafish, stickleback, medaka, fugu, tetraodon, and cod. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides for comprehensive representation. According to these settings, the simulated results are shown in Figure 5: the red coded SSR tag “CCGCoding” represents the only exclusive SSR motif that is well conserved in mammal species. This tag could be found in at least 5 species within the mammal group, and it is highly correlated with the skeletal development related genes. Users can move the mouse over the “CCGCoding” tag, and the appearance number and conserved ratio of the selected SSR motif are shown in a pop-up icon. In Figure 5(b), the CCGCoding motifs appear in the mammal species cluster a total of 62 times with a conserved ratio of 100%, while no such SSR motif could be discovered in the skeletal development gene set within the fishery species cluster. If a user clicks on the “CCGCoding” tag, detailed information on the SSR tag is shown in a floating dialog with the Ensembl gene ID, transcript ID, species name, genome coordinates, and DNA sequence contents. In particular, if the SSRs appear within coding regions, the table also provides the cDNA sequence and its corresponding translated protein sequence. In Figure 5(c), the CCG repeated pattern in the last row, from the human gene ENSG00000135414 (GDF11), is located on chromosome 12 at coordinates 56137185 to 56137224. Since this CCG repeated pattern was found in a coding region, the table also lists its DNA, cDNA, and corresponding protein sequence contents. Notably, this repeated pattern in the coding region of the RUNX2 gene encodes a polyalanine peptide (GCC repeat in the coding region), and it indeed plays a crucial role in cellular development. Abnormal distribution of this polyalanine repeat biomarker might cause dysplasia disease, a genetic disorder of abnormal cellular development.

Figure 5: (a) SSR tag cloud for 17 skeletal development related genes constrained to coding regions; (b) results of moving a mouse device over the SSR tag “CCGCoding”; (c) detailed information of the SSR tag “CCGCoding” in a floating dialog; (d) an SSR tag cloud for 17 skeletal development related genes showing only SSRs possessing high conserved ratios.

In Figure 5(d), most parameters were set identically to those of Figure 5(a), except that the display parameter was modified to show only highly conserved SSRs instead of all identified SSRs. In other words, tags with pink, light green, and yellow color codes are hidden. The resulting tag cloud shows that only one red coded tag, “CCGCoding,” exists under such high conservation requirements. Again, the SSR motif “CCGCoding” represents a significant biomarker in mammal species that is highly correlated with the skeletal development related genes.

3.4. Case Study of Gene Ontology Term of “Embryonic Cranial Skeleton Morphogenesis.” To demonstrate functionally related SSR motifs through GO term assignment, we selected the GO term “embryonic cranial skeleton morphogenesis.” The related genes annotated by this GO term include TBX15, SIX4, DLX2, PRRX1, TWIST1, BMP4, SIX1, SMAD2, NIPBL, NODAL, WNT9B, TGFBR2, GAS1, SIX2, FOXC2, SMAD3, TBX1, TGFBR2, TBX15, GNAS, PRRX2, TGFBR1, TFAP2A, SMAD2, SETD2, BMP4, SMAD3, TWIST2, TFAP2A, SMAD3, TGFBR1, and BMP4. To compare the results under various settings, we tried several combinations of input parameters that differ from the system default settings. In this case study, the parameter settings were defined as follows: SSR quality of 90% for perfect SSR patterns, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and all possible SSR candidates shown. The first species cluster was assigned as the mammal group and the second species cluster as the fishery group, as in the default settings. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides. According to these settings, the simulated results are shown in Figure 6(a). We could observe only one red coded SSR tag, “CCGCoding,” which is the unique biomarker conserved in mammal species with respect to the embryonic cranial skeleton morphogenesis related genes.

Figure 6: SSR tag cloud for the GO keyword “embryonic cranial skeleton morphogenesis” with a motif conserved ratio of (a) 80%, (b) 60%, and (c) 100%.

We then lowered the motif conserved ratio to 60%, and the resulting SSR tag cloud is shown in Figure 6(b). Several tags changed their coded colors. Taking red coded tags as an example, there was only one red tag, “CCGCoding,” in Figure 6(a), but in Figure 6(b) the red coded SSR tags gained another tag, “AATCTGCoding,” which was originally shown in pink in Figure 6(a). Conversely, when we increased the motif conserved ratio to 100%, the result shown in Figure 6(c) contained no red coded SSR tag. Compared to Figure 6(a), the original red tag “CCGCoding” changed to pink because only 5 out of 6 species in the mammal group hold this tag. Figures 6(b) and 6(c) together show that color coded tags may switch colors when the motif conserved ratio is adjusted: a higher motif conserved ratio reduces the number of red, green, and blue coded tags.
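The color switches in Figure 6 follow from simple arithmetic on the cluster size. The sketch below assumes that the required species count is the motif conserved ratio times the cluster size, rounded up; this rounding is an assumption, chosen because it reproduces the stated correspondence between a ratio of 80% and at least 5 of 6 species. The function names are hypothetical.

```python
def required_species(ratio_percent: int, cluster_size: int = 6) -> int:
    """Smallest species count that reaches the conserved ratio (rounded up)."""
    # Integer ceiling division avoids floating point rounding surprises.
    return -(-ratio_percent * cluster_size // 100)


def is_conserved(n_species: int, ratio_percent: int, cluster_size: int = 6) -> bool:
    """True if a motif held by n_species species counts as well conserved."""
    return n_species >= required_species(ratio_percent, cluster_size)


# "CCGCoding" is held by 5 of the 6 mammal species in this case study.
for ratio in (60, 80, 100):
    print(ratio, required_species(ratio), is_conserved(5, ratio))
# 60 4 True
# 80 5 True
# 100 6 False
```

With 5 of 6 species, the tag remains well conserved at 60% and 80% but falls below the requirement at 100%, which matches its change from red to pink in Figure 6(c).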

3.5. An Example of Genetic Disease of “Huntington’s Disease (HD).” To demonstrate genetic diseases caused by abnormal distribution of SSR motifs, we selected a well-known neurodegenerative genetic disease, “Huntington’s disease (HD),” as an example. HD is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the ENSG00000197386 (HTT) gene on chromosome 4 [22]. It presents with involuntary movements caused by loss of muscle coordination and leads to psychiatric problems. The nucleotide repeat length and the average age of symptom onset of Huntington’s disease are inversely related [23].

The verification results of the SSR tag cloud are shown in Figure 7, and the parameter settings were defined as follows: SSR quality of 100% and 80%, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and the “show all SSRs” option selected. The first species cluster was assigned as the mammal group and the second species cluster as the fishery group. In Figure 7, “AGCCoding” appears in both tag clouds as an important biomarker. In fact, under shifting transformation of the SSR repeat pattern, the “AGC” repeat unit can theoretically be considered the same pattern as “CAG” for efficient identification. However, SSRs located within coding regions are further translated into their corresponding amino acid sequences according to precise loci verification on exon regions, and frame shifted SSRs in coding regions might result in different coded amino acids. For example, the trinucleotide pattern “AGC” codes for serine (S), whereas “CAG” codes for glutamine (Q). Therefore, SSRs identified in coding regions should be treated carefully and translated into an appropriate protein sequence based on an annotated genome database. In this example, a significant SSR motif “AGCCoding” in HTT genes could be identified with different sizes (occurrence rates) according to various SSR quality settings. With a minimum length requirement of 20 nucleotides, this repeat motif appears in the coding regions of most mammal species except macaque. Besides, among all fishery species, only zebrafish possesses a similar repeat motif in the coding region. When the SSR quality parameter was increased to 100% (without any tolerance), the pattern “AGCCoding” (or, equivalently, “CAGCoding” in the DNA sense strand) could be retrieved only from cattle and human among the mammal species. The font size and color of each SSR tag changed gradually according to the tolerance rate settings. Accordingly, the “AGCCoding” tag appeared with the biggest icon in pink when compared to all other SSRs in coding regions, which reflects the significance of this exclusive feature for mammal species compared to fishery species. These observations might also provide important information for biologists selecting animal species in future experimental studies of specific diseases.

Figure 7: (a) SSR tag cloud for the HTT gene with SSR quality of 80% and motif conserved ratio of 80%; 5 organisms hold the conserved SSR tag “AGCCoding” (mammalian: Bos taurus, Mus musculus, Homo sapiens, Gorilla gorilla, and Canis familiaris; fishery: Danio rerio). (b) SSR quality of 100% and motif conserved ratio of 80%; only two species, human and cattle, hold the perfect SSR tag “AGCCoding” (mammalian: Bos taurus and Homo sapiens; fishery: none).

Figure 8: (a) Relationship between the motif conserved ratio parameter and the amount of SSR tags in different colors; (b) relationship between the SSR quality parameter and SSR tag colors. Panel labels: amount of SSRs, Cluster I, Cluster II, and CR in panel (a); SSR quality 80 and 100, Cluster I, and Cluster II in panel (b).
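The distinction drawn in the HTT example, between treating shifted repeat units as one pattern and translating them in a specific reading frame, can be illustrated with a short sketch. The Python code below is a hypothetical illustration, not the implementation of the system: `canonical_unit` treats cyclic shifts of a unit as one pattern, and `translate_repeat` reads a perfect repeat in a chosen frame using only the few standard genetic code entries needed here. Strand orientation is ignored.

```python
from typing import List

# Partial standard genetic code: only the codons used in this example.
CODONS = {
    "AGC": "S",  # serine
    "CAG": "Q",  # glutamine
    "GCA": "A",  # alanine
    "CCG": "P",  # proline
    "CGC": "R",  # arginine
    "GCC": "A",  # alanine
}


def rotations(unit: str) -> List[str]:
    """All cyclic shifts of a repeat unit, e.g. AGC -> AGC, GCA, CAG."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical_unit(unit: str) -> str:
    """Pick one representative (the alphabetically smallest shift)."""
    return min(rotations(unit))


def translate_repeat(unit: str, copies: int, frame: int = 0) -> str:
    """Translate a perfect repeat read from the given frame offset (0, 1, 2)."""
    seq = unit * copies
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return "".join(CODONS[c] for c in codons)


print(canonical_unit("AGC") == canonical_unit("CAG"))  # True
print(translate_repeat("AGC", 4, frame=0))  # SSSS
print(translate_repeat("AGC", 4, frame=2))  # QQQ
print(translate_repeat("CCG", 4, frame=2))  # AAA
```

The same DNA repeat therefore yields polyserine or polyglutamine depending on the frame, which is why the annotated exon coordinates are needed before a coding SSR is reported as a protein level repeat.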

4. Discussion

Two key parameters affect the color and size distribution within an SSR tag cloud. The first is the motif conserved ratio (CR): different conserved ratio values change the colors of SSR tags. When the motif conserved ratio increases, the number of red, green, and blue tags may decrease. In Figure 8(a), Cluster I represents the first species cluster and Cluster II the second. The horizontal straight line in the figure represents a motif conserved ratio value. When the CR threshold is increased, the areas of red, blue, and dark green decrease; in contrast, when the CR threshold is decreased, these areas increase. Each area is proportional to the number of SSR tags.

The second important parameter for a tag cloud is the SSR quality threshold. As shown in Figure 8(b), different SSR quality values change not only the number of SSR tags but also their colors. Increasing the SSR quality value may reduce the number of SSR tags, since the SSRs of higher quality are always a subset of the SSRs of lower quality. When the quality threshold is decreased to gain more SSR candidates, some red and green tags might change into yellow or blue tags, respectively. This is mainly caused by the newly intersecting region that appears after the set of SSR candidates expands.
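The subset relation and the resulting change in tag status can be made concrete with the species lists reported in Figure 7 for the HTT tag “AGCCoding.” The sketch below is illustrative only; the variable and function names are hypothetical, and it reports only whether a tag is exclusive to one cluster or shared, not its final color.

```python
# Species holding "AGCCoding" in HTT at each SSR quality, as listed in Figure 7.
holders = {
    100: {
        "cluster1": {"Bos taurus", "Homo sapiens"},
        "cluster2": set(),
    },
    80: {
        "cluster1": {"Bos taurus", "Mus musculus", "Homo sapiens",
                     "Gorilla gorilla", "Canis familiaris"},
        "cluster2": {"Danio rerio"},
    },
}


def sharing(entry: dict) -> str:
    """Describe whether a tag is exclusive to one cluster or shared."""
    in1, in2 = bool(entry["cluster1"]), bool(entry["cluster2"])
    if in1 and in2:
        return "shared by both clusters"
    if in1:
        return "exclusive to cluster I"
    if in2:
        return "exclusive to cluster II"
    return "absent"


# Higher quality hits are a subset of lower quality hits in each cluster.
for cluster in ("cluster1", "cluster2"):
    print(cluster, holders[100][cluster] <= holders[80][cluster])

for quality in (100, 80):
    print(quality, sharing(holders[quality]))
```

Lowering the quality from 100% to 80% adds zebrafish to the second cluster, so a tag that was exclusive to the mammal group becomes one that intersects both clusters.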

Besides, a few common SSR tags originally coded in yellow might change to either red or green when the quality factor is increased. This is mainly because the total number of species possessing a certain SSR tag decreases, so that SSR motifs conserved between the two species clusters might become representative SSR tags for one species cluster exclusively. Table 2 lists the total number of SSR motifs for each species with a minimum SSR length of 20 nucleotides. The SSR quantities for mammal species are usually greater than those for fishery species, and increasing the SSR quality value generally reduces the number of SSR motifs in each species.

5. Conclusion

SSRs are nonrandomly distributed nucleotides in genomes with repeating basic patterns of 1 to 6 nucleotides, and a large number of functional SSR motifs have been demonstrated to be important biomarkers involved in various biological processes and gene regulation. Because of the abundance of SSRs in each species’ genome, it is difficult to recognize significant SSR biomarkers or gene regulation related SSRs based mainly on the repeat sequence length, genetic location, and fundamental repeat pattern of an SSR motif. In this paper, we proposed the concept of identifying SSR biomarker candidates through cross-species cluster comparison on a specified set of target genes. The developed system provides an online tool with multiparameter selection functions, and the identified SSR motifs are displayed by a tag cloud visualization method. The exclusive and consensus SSR motifs between two species clusters are shown efficiently in different font colors and sizes. The in silico comparison of SSR motifs across different species clusters may provide clues and evidence for further understanding of evolutionary development and functional associations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Table 2: The number of SSR motifs of each species for various SSR quality settings.

Scientific name           Species name   SSR quality 80%   SSR quality 90%   SSR quality 100%
Danio rerio               Zebrafish      1175832           594741            401503
Gasterosteus aculeatus    Stickleback    160413            87343             51779
Oryzias latipes           Medaka         122505            37730             15460
Takifugu rubripes         Fugu           261612            148043            90753
Tetraodon nigroviridis    Tetraodon      119557            69473             43584
Gadus morhua              Cod            359592            209540            123880
Homo sapiens              Human          3023284           1406186           644338
Gorilla gorilla           Gorilla        757571            344973            152403
Macaca mulatta            Macaque        1075737           526515            225403
Mus musculus              Mouse          2463222           1301019           812873
Bos taurus                Cow            323386            132923            44906
Canis familiaris          Dog            715776            340433            152502
Caenorhabditis elegans    Roundworm      59273             13637             4225
Drosophila melanogaster   Fruit fly      199458            79952             21223
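The two trends noted for Table 2 can be checked directly from its values. The following Python sketch copies the counts from the table; the variable names are hypothetical, and the grouping into fishery and mammal species follows the two default species clusters.

```python
# SSR counts at quality 80%, 90%, and 100%, copied from Table 2.
TABLE2 = {
    "Zebrafish": (1175832, 594741, 401503),
    "Stickleback": (160413, 87343, 51779),
    "Medaka": (122505, 37730, 15460),
    "Fugu": (261612, 148043, 90753),
    "Tetraodon": (119557, 69473, 43584),
    "Cod": (359592, 209540, 123880),
    "Human": (3023284, 1406186, 644338),
    "Gorilla": (757571, 344973, 152403),
    "Macaque": (1075737, 526515, 225403),
    "Mouse": (2463222, 1301019, 812873),
    "Cow": (323386, 132923, 44906),
    "Dog": (715776, 340433, 152502),
    "Roundworm": (59273, 13637, 4225),
    "Fruit fly": (199458, 79952, 21223),
}
FISHERY = ["Zebrafish", "Stickleback", "Medaka", "Fugu", "Tetraodon", "Cod"]
MAMMAL = ["Human", "Gorilla", "Macaque", "Mouse", "Cow", "Dog"]


def mean_count(names, level):
    """Average SSR count of a species group at one quality level (0, 1, 2)."""
    return sum(TABLE2[name][level] for name in names) / len(names)


# Trend 1: counts shrink as the quality requirement rises, for every species.
decreasing = all(q80 > q90 > q100 for q80, q90, q100 in TABLE2.values())
print("counts decrease with quality:", decreasing)

# Trend 2: mammals carry more SSRs than fishery species on average.
for level, label in enumerate(("80%", "90%", "100%")):
    print(label, mean_count(MAMMAL, level) > mean_count(FISHERY, level))
```

The comparison is made on group averages because the trend is stated as usual rather than universal; for example, cow has fewer SSRs than zebrafish at every quality level.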

Acknowledgments

This work is supported by the Center of Excellence for the Oceans from National Taiwan Ocean University and the National Science Council, Taiwan (NSC 102-2321-B-019-001 and NSC 101-2627-B-019-003 to T.-W. Pai), and the Department of Health in Taiwan (DOH102-TD-B-111-004 to H.-T. Chang).

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, “The evolutionary dynamics of repetitive DNA in eukaryotes,” Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, “Microsatellites within genes: structure, function, and evolution,” Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, “Microsatellite repeat instability and neurological disease,” Bioessays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, “Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review,” Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos et al., “Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia,” Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, “Glutamine repeats and neurodegeneration,” Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, “Role of insulin/insulin-like growth factor 1 signaling pathway in longevity,” World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, “Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature,” Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase et al., “A single IGF1 allele is a major determinant of small size in dogs,” Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, “Comparison of tag cloud layouts: task-related performance and visual exploration,” in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, “Automated gene ontology annotation for anonymous sequence data,” Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, “iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website,” BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, “INTERFEROME: the database of interferon regulated genes,” Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, “Revigo summarizes and visualizes long lists of gene ontology terms,” PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan et al., “An overview of Ensembl,” Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, “Efficient algorithms for identifying orthologous simple sequence repeats of disease genes,” Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes et al., “Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies,” Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal et al., “A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease,” The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith et al., “Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype,” The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.


[21] M. A. Lines, L. Huang, J. Schwartzentruber et al., “Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly,” The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, “Molecular origins of rapid and continuous morphological evolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao et al., “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes,” Cell, vol. 72, no. 6, pp. 971–983, 1993.



According to Figure 1 for SSR color codes users canquickly observe that only three coconserved SSR motifs ofldquoCUpstreamrdquo ldquoAGUpstreamrdquo and ldquoADownstreamrdquo inyellow were found between two species clusters Howeverin this case there is not any blue coded SSR tag in thisexperiment and which implies no coconserved SSR motifexisting for at least 4 model species in each species clustersimultaneouslyThese three yellow color coded SSR tags werefound due to their appearance in both species clusters but notwell conserved with respect to the assigned conserved ratioThedark green SSR tag of ldquoAGCodingrdquo represented the con-sensus SSRmotif could be found only in the second cluster offishery species with more than 4 fishery species containingthe SSR motif at coding region but this motif pattern atcoding regionwas not found in anymammal species from thefirst cluster The light green SSR tags represented consensusSSR motifs which were found only in the fishery group butdo not satisfied the motif conserved ratio requirement of80 that is these light green coded SSR patterns were onlyfound with less than 4 fishery species On the other handthe pink coded SSR tags represented consensus SSR motifsfoundonly in themammal species cluster exclusivelywith lessthan 4 mammal species In addition the dark green SSR tagof ldquoAGCodingrdquo with the biggest font size implied this SSRholding as the most representative and exclusive feature forfishery species compared to mammal species

322 Case Study of ENSG00000108883 (EFTUD2) TheEnsemble gene ID of ENSG00000108883 is an elongationfactor Tu GTP binding domain (EFTUD2) which possessesan average sequence identity of 80 by taking pairwisealignment between human species and other 11model speciesindividually The resulting SSR tag cloud for EFTUD2 wasshown in Figure 4 by setting exactly the same parameters asthe previous example According to the resulting tag cloudusers can immediately identified that only one coconservedSSR tag of ldquoATCCodingrdquo could be found as a notablebiomarker between two species clusters and it was wellconserved across at least 4 species in each species clusterHence the SSR tag was indicated by blue Furthermoreone red coded SSR tag of ldquoADownstreamrdquo represented theconsensus SSRmotifs found only in the first mammal speciescluster and more than 4 species containing the SSR motif atcoding region However this motif could not be found in anyfishery species The pink SSR tags represented all conservedSSRmotifs found only in themammal group but not satisfiedthe requirement ofMotif Conserved Ratio Similarly the lightgreen coded SSR tags represented consensus SSR motifsonly found in the fishery species cluster exclusively with

Figure 4 An SSR tag cloud example for ENSG000000108883(EFTUD2) between two 6-species clusters

less than 4 fishery species In addition the red SSR tagof ldquoADownstreamrdquo was shown with the biggest font sizewhich implied the SSR holding as the most representativeand exclusive for mammal species compared to all other SSRcandidates

Interestingly the first gene VPS35 (ENSG00000069329)is associated with ldquoParkinsonrsquos disease (PD)rdquo [19] andthe second gene EFTUD2 (ENSG00000108883) causesldquomandibulofacial dysostosis withmicrocephalyrdquo [20] In bothcases so far scientists have only demonstrated that bothdiseases were caused by some gene mutations Through insilico SSR biomarker detection by our proposed system wecould efficiently identify many important conserved andexclusive SSRs between two grouped species as biomarkersHowever without experimental verification we could notmake sure whether both diseases possess a true correlationwith identified SSR motifs To gain more confidence onthe proposed system we verified on some disease geneswhich were known to be associated with some specific SSRbiomarkers If a genetic disease is indeed caused by abnormaldistributions of SSR motifs we expect that our proposedSSR tag cloud representation system could identify thosesignificant SSR biomarkers in an efficient and effective way

33 Case Study of a Set of Skeletal Development Genes Todemonstrate functionally related SSRmotifs we have selecteda gene set containing specific function of skeletal develop-ment A total of 17 genes associated with such function areselected and these genes are HOXA11 ZIC2 ALX4 HOXA2DLX2 HOXA7 TWIST1 HOXC13 RUNX2 SOX9 HOXD11HOXD13 GDF11 HLX SIX3 HOXD8 and HOXA10 [21] Inthis example we have shown that the detailed informationof each SSR tag is available in a floating dialog by clickingon it and the appearance number and conserved ratio of aselected SSR motif from the target genes can be viewed bymoving mouse cursor over the SSR tag

The resulting SSR tag clouds from different combinato-rial settings for 17 skeletal development related genes wereshown in Figure 5 In Figure 5(a) the parameter settingswere defined as follows SSR quality of 90 for perfectSSR patterns minimum SSR length of 20 nucleotides motifconserved ratio of 80 (ie at least 5 species possessing

BioMed Research International 7

(a)

(b)

(c)

(d)

Figure 5 (a) SSR tag cloud for 17 skeletal development related genesconstrained to coding regions (b) results of moving a mouse deviceover the SSR tag of ldquoCCGCodingrdquo (c) detailed information of theSSR tag ldquoCCGCodingrdquo in a floating dialog (d) an SSR tag cloud for17 skeletal development related genes by showing SSRs possessinghigh conserved ratios only

identical SSR motifs in each species cluster) and all possibleSSR candidates were shown The first species cluster wasassigned as the mammal group including human macaquemouse cow dog and gorilla the second species cluster wasassigned as the fishery group including zebrafish sticklebackmedaka fugu tetraodon and cod The filter of geneticregion was selected for coding region only and the lengthof basic pattern was selected from 1 to 6 nucleotides forcomprehensive representation According to these settingsthe simulated results were shown in Figure 5 the red codedSSR tag of ldquoCCGCodingrdquo represented the only exclusiveSSR motifs well conserved in mammal species This tagcould be found from at least 5 species within the mammalgroup and it is highly correlated to the skeletal developmentrelated genes Users can move a mouse device over the tag ofldquoCCGCodingrdquo and the appearance number and conservedratio of the selected SSR motif would be shown with a pop-up icon In Figure 5(b) the CCGCoding motifs appear inthe mammal species cluster with a total of 62 times and

a conserved ratio of 100 while no such an SSR motifcould be discovered from the skeletal development gene setwithin the fishery species cluster If a user clicked on thetag of ldquoCCGCodingrdquo detailed information of the SSR tagwill be shown by a floating dialog with Ensembl gene IDtranscript ID species name coordinates in genomes andDNA sequence contents Particularly if the SSRs appearedwithin coding regions the table also provided the detailedinformation of cDNA sequence and its corresponding trans-lated protein sequences In Figure 5(c) the CCG repeatedpattern in the last row of humanrsquos ENSG00000135414 (GDF11)gene is located at chromosome 12 and its coordinates arefrom 56137185 to 56137224 Since the CCG repeated patternwas found in coding regions the table also provided thedetailed information of DNA cDNA and correspondingprotein sequence contents Actually this repeated patternin RUNX2 gene at coding region is a polyalanine pep-tide (GCC repeat in coding region) and it indeed playsa crucial role in cellular development function Abnormaldistribution of this polyalanine repeat biomarkermight causedysplasia disease a genetic disorder of abnormal cellulardevelopment

In Figure 5(d) most of parameters were set identicallyas Figure 5(a) except the display parameter was modifiedfor showing highly conserved SSRs instead of showing allof identified SSRs In the other words tags with pinklight green and yellow color codes would be hiddenThe corresponding tag showed only one red coded tagof ldquoCCGCodingrdquo existed under such high conservationrequirements Again the SSR motif of ldquoCCGCodingrdquo rep-resented as a significant biomarker inmammal species highlycorrelated to the skeletal development related genes

3.4. Case Study of Gene Ontology Term of “Embryonic Cranial Skeleton Morphogenesis”. To demonstrate functionally related SSR motifs through GO term assignment, we selected the GO term “embryonic cranial skeleton morphogenesis.” The related genes annotated by this GO term include TBX15, SIX4, DLX2, PRRX1, TWIST1, BMP4, SIX1, SMAD2, NIPBL, NODAL, WNT9B, TGFBR2, GAS1, SIX2, FOXC2, SMAD3, TBX1, TGFBR2, TBX15, GNAS, PRRX2, TGFBR1, TFAP2A, SMAD2, SETD2, BMP4, SMAD3, TWIST2, TFAP2A, SMAD3, TGFBR1, and BMP4. To compare the results obtained under various settings, we tried several combinations of input parameters that differed from the system default settings. In this case study, the parameter settings were defined as follows: SSR quality of 90%, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and display of all possible SSR candidates. The first species cluster was assigned as the mammal group and the second species cluster as the fishery group, as in the default settings. The genetic region filter was set to coding regions only, and the length of the basic pattern was selected from 1 to 6 nucleotides. The results obtained with these settings are shown in Figure 6(a). There was only one red-coded SSR tag, “CCG/Coding,” which is the unique biomarker conserved in mammal species with respect to the embryonic cranial skeleton morphogenesis related genes.

Figure 6: (a) SSR tag cloud for GO keyword “embryonic cranial skeleton morphogenesis” with motif conserved ratio of 80%; (b) motif conserved ratio of 60%; (c) motif conserved ratio of 100%.

Then we lowered the motif conserved ratio to 60%, and the resulting SSR tag cloud is shown in Figure 6(b). Several tags changed their color codes. Taking red-coded tags as an example, there was only one red tag, “CCG/Coding,” in Figure 6(a), but in Figure 6(b) the red-coded SSR tags gained another tag, “AATCTG/Coding,” which was displayed in pink in Figure 6(a). Conversely, when we increased the motif conserved ratio to 100%, the result shown in Figure 6(c) contained no red-coded SSR tag. Compared to Figure 6(a), the original red tag “CCG/Coding” changed to pink because only 5 out of 6 species in the mammal group hold this tag. Figures 6(b) and 6(c) thus show that color-coded tags may switch colors when the motif conserved ratio is adjusted. A higher motif conserved ratio reduces the number of red, green, and blue color-coded tags.
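The threshold behavior described for Figure 6 can be sketched in a few lines. This is only an illustration under stated assumptions, not the system's implementation: the function names are hypothetical, and the red/pink rule is the simplified reading of the text for a motif that is exclusive to the first (mammal) cluster, where the tag is red when its conserved ratio meets the threshold and pink otherwise.

```python
def conserved_ratio(species_with_motif: int, cluster_size: int) -> float:
    """Percentage of species in a cluster that hold the motif."""
    return 100.0 * species_with_motif / cluster_size


def min_species_required(threshold_percent: int, cluster_size: int) -> int:
    """Smallest species count whose ratio reaches the threshold (integer ceiling)."""
    return (threshold_percent * cluster_size + 99) // 100


def exclusive_tag_color(ratio: float, threshold_percent: float) -> str:
    """Assumed rule for a motif exclusive to cluster I: red if conserved, else pink."""
    return "red" if ratio >= threshold_percent else "pink"


# "CCG/Coding" is held by 5 of the 6 mammal species in the GO term case study.
ratio = conserved_ratio(5, 6)
for threshold in (60, 80, 100):
    needed = min_species_required(threshold, 6)
    print(threshold, needed, exclusive_tag_color(ratio, threshold))
```

With six species per cluster, an 80% threshold corresponds to at least 5 species, which matches the parameter description above, and a tag held by 5 of 6 species stays red at 60% and 80% but becomes pink at 100%.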

3.5. An Example of Genetic Disease of “Huntington’s Disease (HD)”. To demonstrate genetic diseases caused by abnormal distribution of SSR motifs, we selected a well-known neurodegenerative genetic disease, Huntington’s disease (HD), as an example. HD is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the ENSG00000197386 (HTT) gene on chromosome 4 [22]. It presents with involuntary movements caused by loss of muscle coordination and leads to psychiatric problems. The nucleotide repeat length and the average age of symptom onset in Huntington’s disease are inversely related [23].

Figure 7: (a) SSR tag cloud for the HTT gene with SSR quality of 80%, motif conserved ratio of 80%, and 5 organisms holding the conserved SSR tag “AGC/Coding”; (b) SSR quality of 100%, motif conserved ratio of 80%, and only two species, human and cattle, holding the perfect SSR tag “AGC/Coding.” [Panel labels: (a) Mammalian: Bos taurus, Mus musculus, Homo sapiens, Gorilla gorilla, Canis familiaris; Fishery: Danio rerio. (b) Mammalian: Bos taurus, Homo sapiens; Fishery: none.]

The verification results of the SSR tag cloud are shown in Figure 7. The parameter settings were defined as follows: SSR quality of 100% and 80%, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and the “show all SSRs” selection. The first species cluster was assigned as the mammal group and the second species cluster as the fishery group. In Figure 7, the tag “AGC/Coding” appears in both tag clouds as an important biomarker. Under a shifting transformation of the SSR repeat pattern, the “AGC” repeat unit can theoretically be considered the same pattern as “CAG” for efficient identification. However, SSRs located within coding regions are further translated into their corresponding amino acid sequences according to precise loci verification on exon regions, and frame-shifted SSRs in coding regions might encode different amino acids. For example, the trinucleotide pattern “AGC” encodes serine (S), whereas “CAG” encodes glutamine (Q). Therefore, identified SSRs in coding regions should be carefully treated and translated into the appropriate protein sequence based on an annotated genome database. In this example, a significant SSR motif, “AGC/Coding,” could be identified in HTT genes with different sizes (occurrence rates) according to the SSR quality settings. With a minimum length requirement of 20 nucleotides, this repeat motif appears in the coding regions of most mammal species, except macaque. In addition, only zebrafish possesses a similar repeat motif in the coding region among all fishery species. When the SSR quality parameter was increased to 100% (without any tolerance), the pattern “AGC/Coding” (or, equivalently, “CAG/Coding” in the DNA sense strand) could be retrieved only from cattle and human among the mammal species. The font size and color of each SSR tag changed gradually with different settings of the tolerance rate. Accordingly, the tag “AGC/Coding” appeared as the biggest icon, in pink, when compared to all other SSRs in coding regions, and it reflected the significance of exclusive features for mammal species compared to fishery species. These observations might also provide important information for biologists selecting animal species in future experimental studies regarding specific diseases.

Figure 8: (a) Relationship between the motif conserved ratio parameter and the number of SSR tags in different colors; (b) relationship between the SSR quality parameter and SSR tag colors. [Panel labels: (a) vertical axis: amount of SSRs, with a horizontal CR threshold line over Cluster I and Cluster II; (b) SSR quality of 80% and 100% for Cluster I and Cluster II.]
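The difference between treating “AGC” and “CAG” as the same repeat unit and translating them in a fixed reading frame, as discussed for the HTT example, can be made concrete with a minimal sketch. The helper names are hypothetical, the choice of the lexicographically smallest rotation as the canonical unit is an assumption made for illustration, and the codon table holds only the standard-code entries needed here.

```python
def canonical_unit(unit: str) -> str:
    """Return the lexicographically smallest rotation of a repeat unit."""
    rotations = [unit[i:] + unit[:i] for i in range(len(unit))]
    return min(rotations)


# Subset of the standard genetic code, restricted to the codons used below.
CODON_TABLE = {
    "AGC": "S", "CAG": "Q", "GCA": "A",
    "CCG": "P", "CGC": "R", "GCC": "A",
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of `seq`, starting at offset `frame`."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


tract = "CAG" * 7  # a 21-nucleotide repeat tract, above the 20 nt minimum

print(canonical_unit("CAG"), canonical_unit("AGC"))  # same canonical unit
print(translate(tract, frame=0))  # in-frame CAG codons: polyglutamine
print(translate(tract, frame=1))  # shifted by one: AGC codons, serine
```

The same tract is reported under a single motif name for identification purposes, yet its protein-level meaning depends on the annotated reading frame, which is why coding-region SSRs need exon coordinates before translation. The canonical form also explains why a GCC (polyalanine) repeat can be listed under a “CCG” tag.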

4. Discussion

Two key parameters affect the color and size distribution within an SSR tag cloud. The first is the motif conserved ratio (CR). Different CR values change the colors of SSR tags: when the motif conserved ratio increases, the number of red, green, and blue tags might decrease. In Figure 8(a), Cluster I represents the first species cluster and Cluster II represents the second species cluster. The horizontal straight line in the figure represents a motif conserved ratio value. When the CR threshold is increased, the areas of red, blue, and dark green decrease; in contrast, when the CR threshold is decreased, these areas increase. The area is proportional to the number of SSR tags.

The second important parameter for a tag cloud is the SSR quality threshold. As shown in Figure 8(b), different SSR quality values change not only the number of SSR tags but also their colors. Increasing the SSR quality value may reduce the number of SSR tags, since the SSRs with higher quality are always a subset of the SSRs with lower quality. When the quality threshold is decreased to gain more SSR candidates, some red and green tags might change their colors to yellow or blue, respectively. This is mainly caused by the newly intersecting region that arises after the set of SSR candidates expands.

In addition, a few common SSR tags originally coded in yellow might be transformed into either red or green when the quality factor is increased. This happens mainly because the total number of species possessing a certain SSR tag decreases, and therefore SSR motifs conserved between the two species clusters might become representative SSR tags for one species cluster exclusively. Table 2 lists the total number of SSR motifs for each species with a minimum SSR length of 20 nucleotides. The SSR quantities for mammal species are usually larger than those for fishery species, and increasing the SSR quality value generally reduces the number of SSR motifs in each species.
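The effect of the SSR quality threshold on whether a tag is shared between clusters or exclusive to one of them can be illustrated with a small sketch. The per-species quality scores below are hypothetical values chosen only to reproduce the species lists shown in Figure 7 for “AGC/Coding”; the function names are likewise illustrative and do not describe the system's actual implementation.

```python
# Hypothetical best quality score (percent) of an "AGC/Coding" SSR per species.
HYPOTHETICAL_HITS = {
    "cow": ("mammal", 100),
    "human": ("mammal", 100),
    "mouse": ("mammal", 90),
    "gorilla": ("mammal", 85),
    "dog": ("mammal", 80),
    "zebrafish": ("fishery", 80),
}


def carriers(hits, min_quality):
    """Group the species whose SSR meets the quality threshold by cluster."""
    result = {"mammal": set(), "fishery": set()}
    for species, (cluster, quality) in hits.items():
        if quality >= min_quality:
            result[cluster].add(species)
    return result


def sharing(by_cluster):
    """Describe whether a motif is shared, exclusive to one cluster, or absent."""
    present = [name for name, members in by_cluster.items() if members]
    if len(present) == 2:
        return "shared"
    if len(present) == 1:
        return f"exclusive to {present[0]}"
    return "absent"


for quality in (80, 100):
    grouped = carriers(HYPOTHETICAL_HITS, quality)
    print(quality, sharing(grouped), sorted(grouped["mammal"]), sorted(grouped["fishery"]))
```

Because every SSR that passes the 100% threshold also passes the 80% threshold, the carrier sets are nested; raising the threshold can only remove species, which is how a tag common to both clusters can become exclusive to one.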

5. Conclusion

SSRs are nonrandomly distributed nucleotides in genomes with repeating basic patterns of 1 to 6 nucleotides in length, and a large number of functional SSR motifs have been demonstrated to be important biomarkers involved in various biological processes and gene regulation. Owing to the abundance of SSRs in each species' genome, it is difficult to recognize significant SSR biomarkers or gene regulation related SSRs based mainly on the repeat sequence length, genetic location, and fundamental repeat pattern of an SSR motif. In this paper, we proposed the concept of identifying SSR biomarker candidates through cross-species cluster comparison on a specified set of target genes. The developed system provides an online tool with multiparameter selection functions, and the identified SSR motifs are displayed by a tag cloud visualization method. The exclusive and consensus SSR motifs between two species clusters are shown in different font colors and sizes so that they can be distinguished efficiently. The in silico comparison of SSR motifs across different species clusters may provide clues and evidence for further understanding of evolutionary development and functional associations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Table 2: The number of SSR motifs of each species for various SSR quality settings.

Scientific name            Species name   SSR quality 80%   SSR quality 90%   SSR quality 100%
Danio rerio                Zebrafish      1175832           594741            401503
Gasterosteus aculeatus     Stickleback    160413            87343             51779
Oryzias latipes            Medaka         122505            37730             15460
Takifugu rubripes          Fugu           261612            148043            90753
Tetraodon nigroviridis     Tetraodon      119557            69473             43584
Gadus morhua               Cod            359592            209540            123880
Homo sapiens               Human          3023284           1406186           644338
Gorilla gorilla            Gorilla        757571            344973            152403
Macaca mulatta             Macaque        1075737           526515            225403
Mus musculus               Mouse          2463222           1301019           812873
Bos taurus                 Cow            323386            132923            44906
Canis familiaris           Dog            715776            340433            152502
Caenorhabditis elegans     Roundworm      59273             13637             4225
Drosophila melanogaster    Fruit fly      199458            79952             21223
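The two observations drawn from Table 2 (counts fall as SSR quality rises, and mammal species usually carry more SSR motifs than fishery species) can be checked directly against the tabulated values. The sketch below copies the counts from Table 2; the variable and function names are illustrative only.

```python
# Counts copied from Table 2: (quality 80%, quality 90%, quality 100%).
TABLE2 = {
    "Zebrafish": (1175832, 594741, 401503),
    "Stickleback": (160413, 87343, 51779),
    "Medaka": (122505, 37730, 15460),
    "Fugu": (261612, 148043, 90753),
    "Tetraodon": (119557, 69473, 43584),
    "Cod": (359592, 209540, 123880),
    "Human": (3023284, 1406186, 644338),
    "Gorilla": (757571, 344973, 152403),
    "Macaque": (1075737, 526515, 225403),
    "Mouse": (2463222, 1301019, 812873),
    "Cow": (323386, 132923, 44906),
    "Dog": (715776, 340433, 152502),
    "Roundworm": (59273, 13637, 4225),
    "Fruit fly": (199458, 79952, 21223),
}
FISHERY = ["Zebrafish", "Stickleback", "Medaka", "Fugu", "Tetraodon", "Cod"]
MAMMALS = ["Human", "Gorilla", "Macaque", "Mouse", "Cow", "Dog"]


def decreases_with_quality(counts):
    """True when each stricter quality setting yields fewer SSR motifs."""
    return all(a > b for a, b in zip(counts, counts[1:]))


def cluster_total(species, column):
    """Sum one quality column (0 = 80%, 1 = 90%, 2 = 100%) over a cluster."""
    return sum(TABLE2[name][column] for name in species)


print(all(decreases_with_quality(counts) for counts in TABLE2.values()))
for column, label in enumerate(("80%", "90%", "100%")):
    print(label, cluster_total(MAMMALS, column), cluster_total(FISHERY, column))
```

The comparison holds at the cluster level, but not for every pair of species: zebrafish, for example, has more SSR motifs than cow at every quality setting, which is consistent with the wording "usually" in the text.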

Acknowledgments

This work is supported by the Center of Excellence for the Oceans from National Taiwan Ocean University and National Science Council, Taiwan (NSC 102-2321-B-019-001 and NSC 101-2627-B-019-003 to T.-W. Pai), and Department of Health in Taiwan (DOH102-TD-B-111-004 to H.-T. Chang).

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, “The evolutionary dynamics of repetitive DNA in eukaryotes,” Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, “Microsatellites within genes: structure, function, and evolution,” Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, “Microsatellite repeat instability and neurological disease,” Bioessays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, “Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review,” Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos et al., “Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia,” Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, “Glutamine repeats and neurodegeneration,” Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, “Role of insulin/insulin-like growth factor 1 signaling pathway in longevity,” World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, “Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature,” Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase et al., “A single IGF1 allele is a major determinant of small size in dogs,” Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, “Comparison of tag cloud layouts: task-related performance and visual exploration,” in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, “Automated gene ontology annotation for anonymous sequence data,” Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, “iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website,” BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, “INTERFEROME: the database of interferon regulated genes,” Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, “Revigo summarizes and visualizes long lists of gene ontology terms,” PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan et al., “An overview of Ensembl,” Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, “Efficient algorithms for identifying orthologous simple sequence repeats of disease genes,” Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes et al., “Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies,” Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal et al., “A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset parkinson disease,” The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith et al., “Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype,” The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.

[21] M. A. Lines, L. Huang, J. Schwartzentruber et al., “Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly,” The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, “Molecular origins of rapid and continuous morphological evolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao et al., “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes,” Cell, vol. 72, no. 6, pp. 971–983, 1993.



a conserved ratio of 100 while no such an SSR motifcould be discovered from the skeletal development gene setwithin the fishery species cluster If a user clicked on thetag of ldquoCCGCodingrdquo detailed information of the SSR tagwill be shown by a floating dialog with Ensembl gene IDtranscript ID species name coordinates in genomes andDNA sequence contents Particularly if the SSRs appearedwithin coding regions the table also provided the detailedinformation of cDNA sequence and its corresponding trans-lated protein sequences In Figure 5(c) the CCG repeatedpattern in the last row of humanrsquos ENSG00000135414 (GDF11)gene is located at chromosome 12 and its coordinates arefrom 56137185 to 56137224 Since the CCG repeated patternwas found in coding regions the table also provided thedetailed information of DNA cDNA and correspondingprotein sequence contents Actually this repeated patternin RUNX2 gene at coding region is a polyalanine pep-tide (GCC repeat in coding region) and it indeed playsa crucial role in cellular development function Abnormaldistribution of this polyalanine repeat biomarkermight causedysplasia disease a genetic disorder of abnormal cellulardevelopment

In Figure 5(d) most of parameters were set identicallyas Figure 5(a) except the display parameter was modifiedfor showing highly conserved SSRs instead of showing allof identified SSRs In the other words tags with pinklight green and yellow color codes would be hiddenThe corresponding tag showed only one red coded tagof ldquoCCGCodingrdquo existed under such high conservationrequirements Again the SSR motif of ldquoCCGCodingrdquo rep-resented as a significant biomarker inmammal species highlycorrelated to the skeletal development related genes

34 Case Study of Gene Ontology Term of ldquoEmbryonicCranial Skeleton Morphogenesisrdquo To demonstrate function-ally related SSR motifs through GO term assignmentwe selected a GO term of ldquoembryonic cranial skeletonmorphogenesisrdquo The related genes annotated by this GOterm include TBX15 SIX4 DLX2 PRRX1 TWIST1 BMP4SIX1 SMAD2 NIPBL NODAL WNT9B TGFBR2 GAS1SIX2 FOXC2 SMAD3 TBX1 TGFBR2 TBX15 GNASPRRX2 TGFBR1 TFAP2A SMAD2 SETD2 BMP4 SMAD3TWIST2 TFAP2A SMAD3 TGFBR1 and BMP4 To com-pare and show different results by various settings we havetried several combinations of input parameters which weredifferent from system default settings In this case study theparameter settings were defined as follows SSR quality of90 for perfect SSR patterns minimum SSR length of 20nucleotidesmotif conserved ratioof 80 (ie at least 5 speciespossessing identical SSR motifs in each species cluster) andshowed all possible SSR candidates The first species clusterwas assigned as mammal group and the second species clusterfor fishery group as default settingsThe filter of genetic regionwas selected for analyzing on coding regions only and thelength of basic pattern was selected from 1 to 6 nucleotidesAccording to these settings the simulated results were shownin Figure 6(a) We could observe that there was only onered color coded SSR tag of ldquoCCGCodingrdquo and which isthe unique biomarker conserved in mammal species with

8 BioMed Research International

(a)

(b)

(c)

Figure 6 (a) SSR tag cloud for GO keyword ldquoembryonic cranialskeleton morphogenesisrdquo with motif conserved ratio of 80 (b)motif conserved ratio of 60 (c) motif conserved ratio of 100

respect to the embryonic cranial skeleton morphogenesisrelated genes

Then we lowered down the motif conserved ratio to 60and the resulting SSR tag cloud was shown in Figure 6(b)We could observe that several tags were changed by theircoded colors Taking red color coded tags as an example therewas only one red tag ldquoCCGCodingrdquo in previous Figure 6(a)but in Figure 6(b) we noticed that the red color coded SSRtags increased another tag of ldquoAATCTGCodingrdquo whichwas displayed in originally denoted as pink in Figure 6(a)Inversely if we increased the motif conserved ratio to 100the result was shown in Figure 6(c) with no red color codedSSR tag in this cloud Compared to Figure 6(a) the originalred tag of ldquoCCGCodingrdquo was changed into pink due toonly 5 out of 6 species in the mammal group holding thetag of ldquoCCGCodingrdquo In both Figures 6(b) and 6(c) wesimply observed that color coded tagsmay switch their colorsthrough different motif conserved ratio adjustments Thehigher setting ofmotif conserved ratio reduces the amount ofred green and blue color coded tags

35 An Example of Genetic Disease of ldquoHuntingtonrsquos Disease(HD)rdquo To demonstrate genetic diseases caused by abnor-mal distribution of SSR motifs we have selected a well-known neurodegenerative genetic disease ldquoHuntingtonrsquos dis-ease (HD)rdquo as an example HD was found as an irregular dis-tribution of polyglutamine expansions (CAG repeats) locatedwithin the coding regions of ENSG00000197386 (HTT)gene at chromosome 4 [22] It appears with involuntarymovements caused by losing muscle coordination and leadsto psychiatric problemsThe nucleotide repeat length and theaverage age of symptom occurrence of Huntingtonrsquos diseasewere in inverse relationship [23]

The verification results of SSR tag cloud were shown inFigure 7 and the parameter settings were defined as followsSSR quality of 100 and 80 minimum SSR length of 20

MammalianBos taurus

Mus musculusHomo sapiensGorilla gorilla

FisheryDanio rerio

Canis familiaris

(a)

MammalianBos taurus

Homo sapiens

FisheryNone

(b)

Figure 7 (a) SSR tag cloud for HTT gene with SSR quality of80 Motif Conserved Ratio of 80 and 5 organisms holding theconserved SSR tag of ldquoAGCCodingrdquo (b) SSR quality of 100MotifConserved Ratio of 80 and only two species of human and cattlespecies holding the perfect SSR tag of ldquoAGCCodingrdquo

nucleotidesmotif conserved ratioof 80 (ie at least 5 speciespossessing identical SSR motifs in each species cluster) andwith a selection of ldquoshow all SSRsrdquo The first species clusterwas assigned as mammal group while the second speciescluster as fishery group In Figure 7 we could observe theldquoAGCCodingrdquo in both two-tag clouds as an importantbiomarker In fact according to shifting transformation ofSSR repeat pattern the ldquoAGCrdquo repeat unit could be theoret-ically considered as the same pattern of ldquoCAGrdquo for efficientidentification However SSRs located within coding regionswould be further translated into their corresponding aminoacid sequences according to precise loci verification on exonregions Frame shifted SSRs in coding regions might result indifferent coded amino acids For example the coded aminoacid of the trinucleotide pattern of ldquoAGCrdquo is serine(S) andldquoCAGrdquo for glutamine (Q) Therefore identified SSRs in cod-ing regions should be carefully treated and translated into anappropriated protein sequence based on annotated genomedatabase In this example we noticed that a significant SSRmotif of ldquoAGCCodingrdquo in HTT genes could be identified

BioMed Research International 9A

mou

nt o

f SSR

s Cluster I Cluster II Cluster I Cluster II

CR

0

(a)

SSR quality 80 80

100100

Cluster I Cluster II

(b)

Figure 8 (a) Relationship between the parameter ofmotif conservedratio and the amount of SSR tags in different colors (b) relationshipbetween the parameter of SSR quality and SSR tag colors

with different sizes (occurrence rates) according to variousSSR quality settings This repeat motif in coding regionsappears in most mammal species except macaque with aminimum length requirement of 20 nucleotides Besides onlyzebrafish possesses a similar repeat motif in coding regionamong all fishery species When the parameter of SSR qualitywas increased to 100 (without any tolerance) the pattern ofldquoAGCCodingrdquo (or equivalently to ldquoCAGCodingrdquo in DNAsense strand) could be retrieved from both cattle and humanin mammal species only We could observe that the font sizeand color of each SSR tag were gradually changed accordingto different settings of tolerance rate Accordingly the tag ofldquoAGCCodingrdquo appeared with the biggest icon in pink whencompared to all other SSRs in coding regions and it reflectedthe significance of exclusive features for mammal speciescompared to fishery species These observations might alsoprovide important information for biologists for animalspecies selection in future experimental studies regardingspecific diseases

4 Discussion

Two key parameters affect the color and size distributionwithin an SSR tag cloud The first one is the motif conservedratio Different conserved ratio values change colors of SSRtags When the motif conserved ratio increased the amountof red green and blue tags might decrease In Figure 8(a)Cluster I represents the first species cluster and Cluster IIrepresents the second species cluster The horizontal straightline in the figure represents a motif conserved ratio valueWhen the CR threshold value is increased the areas of red

blue and dark green decreased In contrast when the CRthreshold value is decreased the areas of red blue and darkgreen increased The area is proportional to the amount ofSSR tags

The second important parameter for a tag cloud is theSSR quality threshold As shown in Figure 8(b) different SSRquality values were not only changing the number of SSR tagsbut also transforming the colors Increment of SSR qualityvalue may reduce the amount of SSR tags since the SSRswith higher qualities are always a subset of SSRs with lowerqualities When a quality threshold decreases to gain moreSSR candidates part of red and green tags might change theircolors into yellow or blue tags respectively This is mainlycaused by newly intersecting region after expanding SSRcandidates

Besides a few common SSR tags originally coded inyellowmight be transformed into either red or green throughincreasing the quality factors which is mainly becausethe total number of species possessing certain SSR tag isdecreased and therefore the conserved SSR motifs betweentwo species clusters might become representative SSR tagsfor one species cluster exclusively In Table 2 a list of totalamount of SSR motifs for each species is presented by settinga minimum SSR length of 20 nucleotides The SSR quantitiesfor mammal species are usually more than fishery speciesand the increment of SSR quality value reduces the amountof SSR motifs in each species generally

5 Conclusion

SSRs are nonrandomly distributed nucleotides in thegenomes with repeating basic patterns of lengths from 1to 6 nucleotides and a large number of functional SSRmotifs have been demonstrated as important biomarkersinvolved within various biological processes and generegulations Due to abundant number of SSRs in eachspecies genomes it is difficult to recognize significant SSRbiomarkers or gene regulation related SSRs mainly based onrepeat sequence length genetic locations and fundamentalrepeat pattern of an SSR motif In this paper we proposedthe concept of identifying SSR biomarker candidate throughcross-species cluster comparison on a specified set of targetgenes The developed system provides an online tool withmultiparameter selection functions and the identified SSRmotifs are displayed by a tag cloud visualization methodThe exclusive and consensus SSR motifs between two speciesclusters are shown in different font colors and sizes in anefficient approach The in silico comparison of SSR motifsacross different species clusters may provide the cluesand evidences for further understanding of evolutionarydevelopment and functional associations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

10 BioMed Research International

Table 2 The number of SSR motifs of each species for various SSR quality settings

Scientific name Species name SSR quality 80 SSR quality 90 SSR quality 100Danio rerio Zebrafish 1175832 594741 401503Gasterosteus aculeatus Stickleback 160413 87343 51779Oryzias latipes Medaka 122505 37730 15460Takifugu rubripes Fugu 261612 148043 90753Tetraodon nigroviridis Tetraodon 119557 69473 43584Gadus morhua Cod 359592 209540 123880Homo sapiens Human 3023284 1406186 644338Gorilla gorilla Gorilla 757571 344973 152403Macaca mulatta Macaque 1075737 526515 225403Mus musculus Mouse 2463222 1301019 812873Bos taurus Cow 323386 132923 44906Canis familiaris Dog 715776 340433 152502Caenorhabditis elegans Roundworm 59273 13637 4225Drosophila melanogaster Fruit fly 199458 79952 21223

Acknowledgments

This work is supported by the Center of Excellence for theOceans fromNational TaiwanOceanUniversity andNationalScience Council Taiwan (NSC 102ndash2321-B-019-001 and NSC101ndash2627-B-019-003 to T-W Pai) and Department of Healthin Taiwan (DOH102-TD-B-111-004 to H-T Chang)

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, “The evolutionary dynamics of repetitive DNA in eukaryotes,” Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, “Microsatellites within genes: structure, function, and evolution,” Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, “Microsatellite repeat instability and neurological disease,” Bioessays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, “Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review,” Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos et al., “Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia,” Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, “Glutamine repeats and neurodegeneration,” Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, “Role of insulin/insulin-like growth factor 1 signaling pathway in longevity,” World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, “Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature,” Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase et al., “A single IGF1 allele is a major determinant of small size in dogs,” Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, “Comparison of tag cloud layouts: task-related performance and visual exploration,” in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, “Automated gene ontology annotation for anonymous sequence data,” Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, “iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website,” BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, “INTERFEROME: the database of interferon regulated genes,” Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, “Revigo summarizes and visualizes long lists of gene ontology terms,” PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan et al., “An overview of Ensembl,” Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, “Efficient algorithms for identifying orthologous simple sequence repeats of disease genes,” Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes et al., “Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies,” Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal et al., “A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset parkinson disease,” The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith et al., “Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype,” The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.

[21] M. A. Lines, L. Huang, J. Schwartzentruber et al., “Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly,” The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, “Molecular origins of rapid and continuous morphological evolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao et al., “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes,” Cell, vol. 72, no. 6, pp. 971–983, 1993.


Figure 6: (a) SSR tag cloud for GO keyword “embryonic cranial skeleton morphogenesis” with motif conserved ratio of 80%; (b) motif conserved ratio of 60%; (c) motif conserved ratio of 100%.

respect to the embryonic cranial skeleton morphogenesis-related genes.

Then we lowered the motif conserved ratio to 60%, and the resulting SSR tag cloud is shown in Figure 6(b). Several tags changed their coded colors. Taking the red tags as an example, Figure 6(a) contained only one red tag, “CCGCoding”, whereas Figure 6(b) contained an additional red tag, “AATCTGCoding”, which had been shown in pink in Figure 6(a). Conversely, when we raised the motif conserved ratio to 100%, the result, shown in Figure 6(c), contained no red SSR tag. Compared to Figure 6(a), the original red tag “CCGCoding” changed to pink because only 5 of the 6 species in the mammal group hold this tag. Figures 6(b) and 6(c) together show that color-coded tags may switch colors as the motif conserved ratio is adjusted. A higher motif conserved ratio reduces the number of red, green, and blue tags.
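The threshold behavior described above can be illustrated with a minimal Python sketch. It assumes the motif conserved ratio is applied as a rounded-up minimum number of species per cluster, which agrees with the stated case of an 80% ratio requiring at least 5 of 6 species; the function names are hypothetical and not part of the described system.

```python
def min_species_required(cluster_size: int, ratio_percent: int) -> int:
    """Smallest number of species that satisfies the conserved ratio."""
    # Ceiling division on integers avoids floating-point rounding.
    return -(-cluster_size * ratio_percent // 100)


def meets_conserved_ratio(holders: int, cluster_size: int, ratio_percent: int) -> bool:
    """True if `holders` out of `cluster_size` species reach the threshold."""
    return holders >= min_species_required(cluster_size, ratio_percent)


if __name__ == "__main__":
    # A tag held by 5 of the 6 species in a cluster, as in the example above.
    for ratio in (60, 80, 100):
        print(ratio, min_species_required(6, ratio), meets_conserved_ratio(5, 6, ratio))
```

Under this assumption, a tag held by 5 of 6 species passes at 60% and 80% but fails at 100%, which matches the change of “CCGCoding” from red to pink in Figure 6(c).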

3.5. An Example of Genetic Disease of “Huntington’s Disease (HD)”. To demonstrate genetic diseases caused by abnormal distribution of SSR motifs, we selected a well-known neurodegenerative genetic disease, Huntington’s disease (HD), as an example. HD is associated with an irregular distribution of polyglutamine expansions (CAG repeats) located within the coding regions of the HTT gene (ENSG00000197386) on chromosome 4 [22]. It presents with involuntary movements caused by loss of muscle coordination and leads to psychiatric problems. The nucleotide repeat length and the average age of symptom onset in Huntington’s disease are inversely related [23].

The verification results of the SSR tag cloud are shown in Figure 7. The parameter settings were defined as follows: SSR quality of 100% and 80%, minimum SSR length of 20 nucleotides, motif conserved ratio of 80% (i.e., at least 5 species possessing identical SSR motifs in each species cluster), and the “show all SSRs” option selected. The first species cluster was assigned as the mammal group and the second as the fishery group. In Figure 7, the tag “AGCCoding” appears in both tag clouds as an important biomarker. Under a shifting transformation of the SSR repeat pattern, the “AGC” repeat unit can in theory be treated as the same pattern as “CAG” for efficient identification. However, SSRs located within coding regions are further translated into their corresponding amino acid sequences according to precise loci verification on exon regions, and frame-shifted SSRs in coding regions might encode different amino acids. For example, the trinucleotide pattern “AGC” codes for serine (S), whereas “CAG” codes for glutamine (Q). Therefore, SSRs identified in coding regions should be treated carefully and translated into the appropriate protein sequence based on an annotated genome database.

Figure 7: (a) SSR tag cloud for the HTT gene with SSR quality of 80%, motif conserved ratio of 80%, and 5 organisms holding the conserved SSR tag “AGCCoding”; (b) SSR quality of 100%, motif conserved ratio of 80%, and only two species, human and cattle, holding the perfect SSR tag “AGCCoding”. Species listed in panel (a): Mammalian: Bos taurus, Mus musculus, Homo sapiens, Gorilla gorilla, Canis familiaris; Fishery: Danio rerio. Species listed in panel (b): Mammalian: Bos taurus, Homo sapiens; Fishery: None.

Figure 8: (a) Relationship between the motif conserved ratio parameter and the number of SSR tags in different colors; (b) relationship between the SSR quality parameter and SSR tag colors. Recoverable panel labels: amount of SSRs; Cluster I; Cluster II; CR; SSR quality 80 and 100.

In this example, a significant SSR motif, “AGCCoding”, could be identified in HTT genes with different sizes (occurrence rates) according to the SSR quality setting. With a minimum length requirement of 20 nucleotides, this repeat motif appears in the coding regions of most mammal species, macaque being the exception. In addition, zebrafish is the only fishery species that possesses a similar repeat motif in a coding region. When the SSR quality parameter was increased to 100% (without any tolerance), the pattern “AGCCoding” (or, equivalently, “CAGCoding” in the DNA sense strand) could be retrieved only from cattle and human among the mammal species. The font size and color of each SSR tag changed gradually with the tolerance rate setting. Accordingly, the tag “AGCCoding” appeared as the biggest icon, in pink, when compared to all other SSRs in coding regions, which reflects the significance of features exclusive to mammal species compared to fishery species. These observations might also provide important information for biologists selecting animal species in future experimental studies of specific diseases.
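The equivalence of “AGC” and “CAG” under shifting, and the reason the reading frame still matters in coding regions, can be sketched in Python as follows. This is only an illustration: choosing the lexicographically smallest rotation as the representative unit is an assumption made here, not a documented rule of the system, and the codon table is deliberately partial (standard genetic code, three codons only).

```python
# Partial codon table (standard genetic code): only the codons in this example.
CODON_TO_AMINO_ACID = {"AGC": "S", "CAG": "Q", "GCA": "A"}


def rotations(unit: str) -> list:
    """All cyclic shifts of a repeat unit, e.g. CAG -> CAG, AGC, GCA."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical_unit(unit: str) -> str:
    """Representative of a shift-equivalence class (assumed: smallest rotation)."""
    return min(rotations(unit))


def translate_repeat(unit: str, copies: int) -> str:
    """Translate a perfect trinucleotide repeat read in the frame given by `unit`."""
    dna = unit * copies
    # Step through complete codons only; units must be trinucleotides here.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return "".join(CODON_TO_AMINO_ACID[codon] for codon in codons)


if __name__ == "__main__":
    print(canonical_unit("CAG"), canonical_unit("AGC"))
    print(translate_repeat("CAG", 4), translate_repeat("AGC", 4))
```

Both units collapse to the same representative for pattern matching, yet the same repeat read as CAG yields a glutamine run and read as AGC yields a serine run, which is why the exon frame from the annotated genome has to decide the translation.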

4. Discussion

Two key parameters affect the color and size distribution within an SSR tag cloud. The first is the motif conserved ratio (CR). Different conserved ratio values change the colors of SSR tags: when the motif conserved ratio increases, the number of red, green, and blue tags might decrease. In Figure 8(a), Cluster I represents the first species cluster and Cluster II represents the second species cluster. The horizontal straight line in the figure represents a motif conserved ratio value. When the CR threshold value is increased, the areas of red, blue, and dark green decrease. In contrast, when the CR threshold value is decreased, the areas of red, blue, and dark green increase. Each area is proportional to the number of SSR tags.

The second important parameter for a tag cloud is the SSR quality threshold. As shown in Figure 8(b), different SSR quality values change not only the number of SSR tags but also their colors. Raising the SSR quality value may reduce the number of SSR tags, since the SSRs with higher qualities are always a subset of the SSRs with lower qualities. When the quality threshold is lowered to gain more SSR candidates, some red and green tags might change into yellow or blue tags, respectively. This is mainly caused by regions that newly intersect once the set of SSR candidates expands.

In addition, a few common SSR tags originally coded in yellow might change to either red or green when the quality factor is increased. This happens mainly because the total number of species possessing a certain SSR tag decreases, so that SSR motifs conserved between the two species clusters might become representative SSR tags for one species cluster exclusively. Table 2 lists the total number of SSR motifs for each species with a minimum SSR length of 20 nucleotides. Mammal species usually have more SSRs than fishery species, and increasing the SSR quality value generally reduces the number of SSR motifs in each species.
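The effect of the quality threshold on shared and exclusive tags can be sketched with plain set operations. The tag sets below are hypothetical toy data, not results from the system, and the sketch ignores the motif conserved ratio; it only shows how a tag common to both clusters at a low threshold can become exclusive to one cluster at a higher threshold.

```python
def classify_tags(cluster_one: set, cluster_two: set) -> dict:
    """Split tags into those shared by both clusters and those exclusive to one."""
    return {
        "common": cluster_one & cluster_two,
        "only_cluster_one": cluster_one - cluster_two,
        "only_cluster_two": cluster_two - cluster_one,
    }


# Hypothetical tag sets per SSR quality threshold (illustrative only).
# Each higher-quality set is a subset of the lower-quality set of the same cluster.
TAGS = {
    80: {"one": {"AGC", "CCG", "AC"}, "two": {"AGC", "AC", "AT"}},
    100: {"one": {"AGC", "CCG"}, "two": {"AC"}},
}

low = classify_tags(TAGS[80]["one"], TAGS[80]["two"])
high = classify_tags(TAGS[100]["one"], TAGS[100]["two"])

if __name__ == "__main__":
    print("quality 80:", low)
    print("quality 100:", high)
```

In this toy data, “AGC” is common to both clusters at quality 80 but exclusive to the first cluster at quality 100, which corresponds to a yellow tag turning into a cluster-exclusive one as described above.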

5. Conclusion

SSRs are nonrandomly distributed nucleotides in genomes, with repeating basic patterns of 1 to 6 nucleotides in length, and a large number of functional SSR motifs have been demonstrated to be important biomarkers involved in various biological processes and gene regulation. Because of the abundance of SSRs in each species’ genome, it is difficult to recognize significant SSR biomarkers or gene-regulation-related SSRs based mainly on the repeat sequence length, genetic location, and fundamental repeat pattern of an SSR motif. In this paper, we proposed the concept of identifying SSR biomarker candidates through cross-species cluster comparison on a specified set of target genes. The developed system provides an online tool with multiparameter selection functions, and the identified SSR motifs are displayed by a tag cloud visualization method. The exclusive and consensus SSR motifs between two species clusters are efficiently shown in different font colors and sizes. The in silico comparison of SSR motifs across different species clusters may provide clues and evidence for further understanding of evolutionary development and functional associations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Table 2: The number of SSR motifs of each species for various SSR quality settings.

Scientific name            Species name   SSR quality 80%   SSR quality 90%   SSR quality 100%
Danio rerio                Zebrafish      1175832           594741            401503
Gasterosteus aculeatus     Stickleback    160413            87343             51779
Oryzias latipes            Medaka         122505            37730             15460
Takifugu rubripes          Fugu           261612            148043            90753
Tetraodon nigroviridis     Tetraodon      119557            69473             43584
Gadus morhua               Cod            359592            209540            123880
Homo sapiens               Human          3023284           1406186           644338
Gorilla gorilla            Gorilla        757571            344973            152403
Macaca mulatta             Macaque        1075737           526515            225403
Mus musculus               Mouse          2463222           1301019           812873
Bos taurus                 Cow            323386            132923            44906
Canis familiaris           Dog            715776            340433            152502
Caenorhabditis elegans     Roundworm      59273             13637             4225
Drosophila melanogaster    Fruit fly      199458            79952             21223
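As a quick check on how strongly the quality setting thins out the candidates, the counts in Table 2 can be compared directly. The sketch below copies four rows from the table and computes the share of SSRs found at the lowest quality setting that remain at the strictest one; the helper names are hypothetical.

```python
# Rows copied from Table 2: counts at SSR quality 80, 90, and 100.
SSR_COUNTS = {
    "Zebrafish": (1175832, 594741, 401503),
    "Cod": (359592, 209540, 123880),
    "Human": (3023284, 1406186, 644338),
    "Mouse": (2463222, 1301019, 812873),
}


def retained_fraction(counts: tuple) -> float:
    """Share of the quality-80 SSRs that also pass the quality-100 setting."""
    q80, _q90, q100 = counts
    return q100 / q80


def is_monotone_decreasing(counts: tuple) -> bool:
    """True if each stricter quality setting yields fewer SSRs than the previous one."""
    return all(a > b for a, b in zip(counts, counts[1:]))


if __name__ == "__main__":
    for species, counts in SSR_COUNTS.items():
        print(f"{species}: {retained_fraction(counts):.1%} retained, "
              f"monotone={is_monotone_decreasing(counts)}")
```

For these rows the counts fall at every stricter setting, consistent with the subset relation between quality levels noted in the Discussion.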

Acknowledgments

This work is supported by the Center of Excellence for theOceans fromNational TaiwanOceanUniversity andNationalScience Council Taiwan (NSC 102ndash2321-B-019-001 and NSC101ndash2627-B-019-003 to T-W Pai) and Department of Healthin Taiwan (DOH102-TD-B-111-004 to H-T Chang)

References

[1] B Charlesworth P Sniegowski and W Stephan ldquoThe evolu-tionary dynamics of repetitive DNA in eukaryotesrdquoNature vol371 no 6494 pp 215ndash220 1994

[2] Y-C Li A B Korol T Fahima and E Nevo ldquoMicrosatelliteswithin genes structure function and evolutionrdquo MolecularBiology and Evolution vol 21 no 6 pp 991ndash1007 2004

[3] J R Brouwer R Willemsen and B A Oostra ldquoMicrosatelliterepeat instability andneurological diseaserdquoBioessays vol 31 no1 pp 71ndash83 2009

[4] Y C Li A B Korol T Fahima A Beiles and E NevoldquoMicrosatellites genomic distribution putative functions andmutational mechanisms a reviewrdquo Molecular Ecology vol 11no 12 pp 2453ndash2465 2002

[5] S Mundlos F Otto C Mundlos et al ldquoMutations involving thetranscription factor CBFA1 cause cleidocranial dysplasiardquo Cellvol 89 no 5 pp 773ndash779 1997

[6] H Y Zoghbi and H T Orr ldquoGlutamine repeats and neurode-generationrdquoAnnual Review of Neuroscience vol 23 pp 217ndash2472000

[7] C L Cheng T Q Gao Z Wang and D D Li ldquoRoleof insulininsulin-like growth factor 1 signaling pathway inlongevityrdquoWorld Journal of Gastroenterology vol 11 no 13 pp1891ndash1895 2005

[8] K AWoods C Camacho-Hubner D Barter A J L Clark andMO Savage ldquoInsulin-like growth factor I gene deletion causingintrauterine growth retardation and severe short staturerdquo ActaPaediatrica vol 86 no 423 pp 39ndash45 1997

[9] N B Sutter C D Bustamante K Chase et al ldquoA single IGF1allele is a major determinant of small size in dogsrdquo Science vol316 no 5821 pp 112ndash115 2007

[10] M Ashburner C A Ball J A Blake et al ldquoGene ontology toolfor the unification of biologyrdquoNature Genetics vol 25 no 1 pp25ndash29 2000

[11] S Lohmann J Ziegler and L Tetzlaff ldquoComparison of tag cloudlayouts task-related performance and visual explorationrdquo inHuman-Computer InteractionmdashINTERACT 2009 vol 5726 ofLecture Notes in Computer Science pp 392ndash404 2009

[12] S Hennig D Groth and H Lehrach ldquoAutomated gene ontol-ogy annotation for anonymous sequence datardquo Nucleic AcidsResearch vol 31 no 13 pp 3712ndash3715 2003

[13] B M Good E A Kawas B Kuo andM DWilkinson ldquoiHOP-erator User-scripting a personalized bioinformaticsWeb start-ing with the iHOP websiterdquo BMC Bioinformatics vol 7 article534 2006

[14] S A Samarajiwa S Forster K Auchettl and P J HertzogldquoINTERFEROME the database of interferon regulated genesrdquoNucleic Acids Research vol 37 no 1 pp D852ndashD857 2009

[15] F Supek M Bosnjak N Skunca and T Smuc ldquoRevigosummarizes and visualizes long lists of gene ontology termsrdquoPLoS ONE vol 6 no 7 Article ID e21800 2011

[16] E Birney T D Andrews P Bevan et al ldquoAn overview ofEnsemblrdquo Genome Research vol 14 pp 925ndash928 2004

[17] CMChen C C Chen TH Shih TW Pai CHHu andW STzou ldquoEfficient algorithms for identifying orthologous simplesequence repeats of disease genesrdquo Journal of Systems Scienceand Complexity vol 23 pp 906ndash916 2010

[18] E Nascimento R Martinez A R Lopes et al ldquoDetection andselection of microsatellites in the genome of Paracoccidioidesbrasiliensis as molecular markers for clinical and epidemiolog-ical studiesrdquo Journal of Clinical Microbiology vol 42 no 11 pp5007ndash5014 2004

[19] A Zimprich A Benet-Pages W Struhal et al ldquoA mutationin VPS35 encoding a subunit of the retromer complex causeslate-onset parkinson diseaserdquo The American Journal of HumanGenetics vol 89 no 1 pp 168ndash175 2011

[20] D V Luquetti A V Hing M J Rieder D A Nickerson EH Turner J Smith et al ldquoMandibulofacial dysostosis withmicrocephaly caused by EFTUD2 mutations expanding thephenotyperdquo The American Journal of Medical Genetics A vol161 pp 108ndash113 2013

BioMed Research International 11

[21] M A Lines L Huang J Schwartzentruber et al ldquoHaploinsuf-ficiency of a spliceosomal GTPase encoded by EFTUD2 causesmandibulofacial dysostosis with microcephalyrdquo The AmericanJournal of Human Genetics vol 90 no 2 pp 369ndash377 2012

[22] J W Fondon III and H R Garner ldquoMolecular origins of rapidand continuous morphological evolutionrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 101 no 52 pp 18058ndash18063 2004

[23] M E MacDonald C M Ambrose M P Duyao et al ldquoA novelgene containing a trinucleotide repeat that is expanded andunstable on Huntingtonrsquos disease chromosomesrdquo Cell vol 72no 6 pp 971ndash983 1993

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 9A

mou

nt o

f SSR

s Cluster I Cluster II Cluster I Cluster II

CR

0

(a)

SSR quality 80 80

100100

Cluster I Cluster II

(b)

Figure 8 (a) Relationship between the parameter ofmotif conservedratio and the amount of SSR tags in different colors (b) relationshipbetween the parameter of SSR quality and SSR tag colors

with different sizes (occurrence rates) according to variousSSR quality settings This repeat motif in coding regionsappears in most mammal species except macaque with aminimum length requirement of 20 nucleotides Besides onlyzebrafish possesses a similar repeat motif in coding regionamong all fishery species When the parameter of SSR qualitywas increased to 100 (without any tolerance) the pattern ofldquoAGCCodingrdquo (or equivalently to ldquoCAGCodingrdquo in DNAsense strand) could be retrieved from both cattle and humanin mammal species only We could observe that the font sizeand color of each SSR tag were gradually changed accordingto different settings of tolerance rate Accordingly the tag ofldquoAGCCodingrdquo appeared with the biggest icon in pink whencompared to all other SSRs in coding regions and it reflectedthe significance of exclusive features for mammal speciescompared to fishery species These observations might alsoprovide important information for biologists for animalspecies selection in future experimental studies regardingspecific diseases

4 Discussion

Two key parameters affect the color and size distributionwithin an SSR tag cloud The first one is the motif conservedratio Different conserved ratio values change colors of SSRtags When the motif conserved ratio increased the amountof red green and blue tags might decrease In Figure 8(a)Cluster I represents the first species cluster and Cluster IIrepresents the second species cluster The horizontal straightline in the figure represents a motif conserved ratio valueWhen the CR threshold value is increased the areas of red

blue and dark green decreased In contrast when the CRthreshold value is decreased the areas of red blue and darkgreen increased The area is proportional to the amount ofSSR tags

The second important parameter for a tag cloud is theSSR quality threshold As shown in Figure 8(b) different SSRquality values were not only changing the number of SSR tagsbut also transforming the colors Increment of SSR qualityvalue may reduce the amount of SSR tags since the SSRswith higher qualities are always a subset of SSRs with lowerqualities When a quality threshold decreases to gain moreSSR candidates part of red and green tags might change theircolors into yellow or blue tags respectively This is mainlycaused by newly intersecting region after expanding SSRcandidates

Besides a few common SSR tags originally coded inyellowmight be transformed into either red or green throughincreasing the quality factors which is mainly becausethe total number of species possessing certain SSR tag isdecreased and therefore the conserved SSR motifs betweentwo species clusters might become representative SSR tagsfor one species cluster exclusively In Table 2 a list of totalamount of SSR motifs for each species is presented by settinga minimum SSR length of 20 nucleotides The SSR quantitiesfor mammal species are usually more than fishery speciesand the increment of SSR quality value reduces the amountof SSR motifs in each species generally

5 Conclusion

SSRs are nonrandomly distributed nucleotides in thegenomes with repeating basic patterns of lengths from 1to 6 nucleotides and a large number of functional SSRmotifs have been demonstrated as important biomarkersinvolved within various biological processes and generegulations Due to abundant number of SSRs in eachspecies genomes it is difficult to recognize significant SSRbiomarkers or gene regulation related SSRs mainly based onrepeat sequence length genetic locations and fundamentalrepeat pattern of an SSR motif In this paper we proposedthe concept of identifying SSR biomarker candidate throughcross-species cluster comparison on a specified set of targetgenes The developed system provides an online tool withmultiparameter selection functions and the identified SSRmotifs are displayed by a tag cloud visualization methodThe exclusive and consensus SSR motifs between two speciesclusters are shown in different font colors and sizes in anefficient approach The in silico comparison of SSR motifsacross different species clusters may provide the cluesand evidences for further understanding of evolutionarydevelopment and functional associations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

10 BioMed Research International

Table 2 The number of SSR motifs of each species for various SSR quality settings

Scientific name Species name SSR quality 80 SSR quality 90 SSR quality 100Danio rerio Zebrafish 1175832 594741 401503Gasterosteus aculeatus Stickleback 160413 87343 51779Oryzias latipes Medaka 122505 37730 15460Takifugu rubripes Fugu 261612 148043 90753Tetraodon nigroviridis Tetraodon 119557 69473 43584Gadus morhua Cod 359592 209540 123880Homo sapiens Human 3023284 1406186 644338Gorilla gorilla Gorilla 757571 344973 152403Macaca mulatta Macaque 1075737 526515 225403Mus musculus Mouse 2463222 1301019 812873Bos taurus Cow 323386 132923 44906Canis familiaris Dog 715776 340433 152502Caenorhabditis elegans Roundworm 59273 13637 4225Drosophila melanogaster Fruit fly 199458 79952 21223

Acknowledgments

This work is supported by the Center of Excellence for theOceans fromNational TaiwanOceanUniversity andNationalScience Council Taiwan (NSC 102ndash2321-B-019-001 and NSC101ndash2627-B-019-003 to T-W Pai) and Department of Healthin Taiwan (DOH102-TD-B-111-004 to H-T Chang)

References

[1] B Charlesworth P Sniegowski and W Stephan ldquoThe evolu-tionary dynamics of repetitive DNA in eukaryotesrdquoNature vol371 no 6494 pp 215ndash220 1994

[2] Y-C Li A B Korol T Fahima and E Nevo ldquoMicrosatelliteswithin genes structure function and evolutionrdquo MolecularBiology and Evolution vol 21 no 6 pp 991ndash1007 2004

[3] J R Brouwer R Willemsen and B A Oostra ldquoMicrosatelliterepeat instability andneurological diseaserdquoBioessays vol 31 no1 pp 71ndash83 2009

[4] Y C Li A B Korol T Fahima A Beiles and E NevoldquoMicrosatellites genomic distribution putative functions andmutational mechanisms a reviewrdquo Molecular Ecology vol 11no 12 pp 2453ndash2465 2002

[5] S Mundlos F Otto C Mundlos et al ldquoMutations involving thetranscription factor CBFA1 cause cleidocranial dysplasiardquo Cellvol 89 no 5 pp 773ndash779 1997

[6] H Y Zoghbi and H T Orr ldquoGlutamine repeats and neurode-generationrdquoAnnual Review of Neuroscience vol 23 pp 217ndash2472000

[7] C L Cheng T Q Gao Z Wang and D D Li ldquoRoleof insulininsulin-like growth factor 1 signaling pathway inlongevityrdquoWorld Journal of Gastroenterology vol 11 no 13 pp1891ndash1895 2005

[8] K AWoods C Camacho-Hubner D Barter A J L Clark andMO Savage ldquoInsulin-like growth factor I gene deletion causingintrauterine growth retardation and severe short staturerdquo ActaPaediatrica vol 86 no 423 pp 39ndash45 1997

[9] N B Sutter C D Bustamante K Chase et al ldquoA single IGF1allele is a major determinant of small size in dogsrdquo Science vol316 no 5821 pp 112ndash115 2007

[10] M Ashburner C A Ball J A Blake et al ldquoGene ontology toolfor the unification of biologyrdquoNature Genetics vol 25 no 1 pp25ndash29 2000

[11] S Lohmann J Ziegler and L Tetzlaff ldquoComparison of tag cloudlayouts task-related performance and visual explorationrdquo inHuman-Computer InteractionmdashINTERACT 2009 vol 5726 ofLecture Notes in Computer Science pp 392ndash404 2009

[12] S Hennig D Groth and H Lehrach ldquoAutomated gene ontol-ogy annotation for anonymous sequence datardquo Nucleic AcidsResearch vol 31 no 13 pp 3712ndash3715 2003

[13] B M Good E A Kawas B Kuo andM DWilkinson ldquoiHOP-erator User-scripting a personalized bioinformaticsWeb start-ing with the iHOP websiterdquo BMC Bioinformatics vol 7 article534 2006

[14] S A Samarajiwa S Forster K Auchettl and P J HertzogldquoINTERFEROME the database of interferon regulated genesrdquoNucleic Acids Research vol 37 no 1 pp D852ndashD857 2009

[15] F Supek M Bosnjak N Skunca and T Smuc ldquoRevigosummarizes and visualizes long lists of gene ontology termsrdquoPLoS ONE vol 6 no 7 Article ID e21800 2011

[16] E Birney T D Andrews P Bevan et al ldquoAn overview ofEnsemblrdquo Genome Research vol 14 pp 925ndash928 2004

[17] CMChen C C Chen TH Shih TW Pai CHHu andW STzou ldquoEfficient algorithms for identifying orthologous simplesequence repeats of disease genesrdquo Journal of Systems Scienceand Complexity vol 23 pp 906ndash916 2010

[18] E Nascimento R Martinez A R Lopes et al ldquoDetection andselection of microsatellites in the genome of Paracoccidioidesbrasiliensis as molecular markers for clinical and epidemiolog-ical studiesrdquo Journal of Clinical Microbiology vol 42 no 11 pp5007ndash5014 2004

[19] A Zimprich A Benet-Pages W Struhal et al ldquoA mutationin VPS35 encoding a subunit of the retromer complex causeslate-onset parkinson diseaserdquo The American Journal of HumanGenetics vol 89 no 1 pp 168ndash175 2011

[20] D V Luquetti A V Hing M J Rieder D A Nickerson EH Turner J Smith et al ldquoMandibulofacial dysostosis withmicrocephaly caused by EFTUD2 mutations expanding thephenotyperdquo The American Journal of Medical Genetics A vol161 pp 108ndash113 2013

BioMed Research International 11

[21] M A Lines L Huang J Schwartzentruber et al ldquoHaploinsuf-ficiency of a spliceosomal GTPase encoded by EFTUD2 causesmandibulofacial dysostosis with microcephalyrdquo The AmericanJournal of Human Genetics vol 90 no 2 pp 369ndash377 2012

Table 2: The number of SSR motifs of each species for various SSR quality settings.

| Scientific name | Species name | SSR quality 80 | SSR quality 90 | SSR quality 100 |
|---|---|---|---|---|
| Danio rerio | Zebrafish | 1175832 | 594741 | 401503 |
| Gasterosteus aculeatus | Stickleback | 160413 | 87343 | 51779 |
| Oryzias latipes | Medaka | 122505 | 37730 | 15460 |
| Takifugu rubripes | Fugu | 261612 | 148043 | 90753 |
| Tetraodon nigroviridis | Tetraodon | 119557 | 69473 | 43584 |
| Gadus morhua | Cod | 359592 | 209540 | 123880 |
| Homo sapiens | Human | 3023284 | 1406186 | 644338 |
| Gorilla gorilla | Gorilla | 757571 | 344973 | 152403 |
| Macaca mulatta | Macaque | 1075737 | 526515 | 225403 |
| Mus musculus | Mouse | 2463222 | 1301019 | 812873 |
| Bos taurus | Cow | 323386 | 132923 | 44906 |
| Canis familiaris | Dog | 715776 | 340433 | 152502 |
| Caenorhabditis elegans | Roundworm | 59273 | 13637 | 4225 |
| Drosophila melanogaster | Fruit fly | 199458 | 79952 | 21223 |

Acknowledgments

This work is supported by the Center of Excellence for the Oceans from National Taiwan Ocean University and National Science Council, Taiwan (NSC 102-2321-B-019-001 and NSC 101-2627-B-019-003 to T.-W. Pai), and Department of Health in Taiwan (DOH102-TD-B-111-004 to H.-T. Chang).

References

[1] B. Charlesworth, P. Sniegowski, and W. Stephan, “The evolutionary dynamics of repetitive DNA in eukaryotes,” Nature, vol. 371, no. 6494, pp. 215–220, 1994.

[2] Y.-C. Li, A. B. Korol, T. Fahima, and E. Nevo, “Microsatellites within genes: structure, function, and evolution,” Molecular Biology and Evolution, vol. 21, no. 6, pp. 991–1007, 2004.

[3] J. R. Brouwer, R. Willemsen, and B. A. Oostra, “Microsatellite repeat instability and neurological disease,” Bioessays, vol. 31, no. 1, pp. 71–83, 2009.

[4] Y. C. Li, A. B. Korol, T. Fahima, A. Beiles, and E. Nevo, “Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review,” Molecular Ecology, vol. 11, no. 12, pp. 2453–2465, 2002.

[5] S. Mundlos, F. Otto, C. Mundlos et al., “Mutations involving the transcription factor CBFA1 cause cleidocranial dysplasia,” Cell, vol. 89, no. 5, pp. 773–779, 1997.

[6] H. Y. Zoghbi and H. T. Orr, “Glutamine repeats and neurodegeneration,” Annual Review of Neuroscience, vol. 23, pp. 217–247, 2000.

[7] C. L. Cheng, T. Q. Gao, Z. Wang, and D. D. Li, “Role of insulin/insulin-like growth factor 1 signaling pathway in longevity,” World Journal of Gastroenterology, vol. 11, no. 13, pp. 1891–1895, 2005.

[8] K. A. Woods, C. Camacho-Hubner, D. Barter, A. J. L. Clark, and M. O. Savage, “Insulin-like growth factor I gene deletion causing intrauterine growth retardation and severe short stature,” Acta Paediatrica, vol. 86, no. 423, pp. 39–45, 1997.

[9] N. B. Sutter, C. D. Bustamante, K. Chase et al., “A single IGF1 allele is a major determinant of small size in dogs,” Science, vol. 316, no. 5821, pp. 112–115, 2007.

[10] M. Ashburner, C. A. Ball, J. A. Blake et al., “Gene ontology: tool for the unification of biology,” Nature Genetics, vol. 25, no. 1, pp. 25–29, 2000.

[11] S. Lohmann, J. Ziegler, and L. Tetzlaff, “Comparison of tag cloud layouts: task-related performance and visual exploration,” in Human-Computer Interaction – INTERACT 2009, vol. 5726 of Lecture Notes in Computer Science, pp. 392–404, 2009.

[12] S. Hennig, D. Groth, and H. Lehrach, “Automated gene ontology annotation for anonymous sequence data,” Nucleic Acids Research, vol. 31, no. 13, pp. 3712–3715, 2003.

[13] B. M. Good, E. A. Kawas, B. Kuo, and M. D. Wilkinson, “iHOPerator: User-scripting a personalized bioinformatics Web, starting with the iHOP website,” BMC Bioinformatics, vol. 7, article 534, 2006.

[14] S. A. Samarajiwa, S. Forster, K. Auchettl, and P. J. Hertzog, “INTERFEROME: the database of interferon regulated genes,” Nucleic Acids Research, vol. 37, no. 1, pp. D852–D857, 2009.

[15] F. Supek, M. Bosnjak, N. Skunca, and T. Smuc, “Revigo summarizes and visualizes long lists of gene ontology terms,” PLoS ONE, vol. 6, no. 7, Article ID e21800, 2011.

[16] E. Birney, T. D. Andrews, P. Bevan et al., “An overview of Ensembl,” Genome Research, vol. 14, pp. 925–928, 2004.

[17] C. M. Chen, C. C. Chen, T. H. Shih, T. W. Pai, C. H. Hu, and W. S. Tzou, “Efficient algorithms for identifying orthologous simple sequence repeats of disease genes,” Journal of Systems Science and Complexity, vol. 23, pp. 906–916, 2010.

[18] E. Nascimento, R. Martinez, A. R. Lopes et al., “Detection and selection of microsatellites in the genome of Paracoccidioides brasiliensis as molecular markers for clinical and epidemiological studies,” Journal of Clinical Microbiology, vol. 42, no. 11, pp. 5007–5014, 2004.

[19] A. Zimprich, A. Benet-Pages, W. Struhal et al., “A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease,” The American Journal of Human Genetics, vol. 89, no. 1, pp. 168–175, 2011.

[20] D. V. Luquetti, A. V. Hing, M. J. Rieder, D. A. Nickerson, E. H. Turner, J. Smith et al., “Mandibulofacial dysostosis with microcephaly caused by EFTUD2 mutations: expanding the phenotype,” The American Journal of Medical Genetics A, vol. 161, pp. 108–113, 2013.

[21] M. A. Lines, L. Huang, J. Schwartzentruber et al., “Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly,” The American Journal of Human Genetics, vol. 90, no. 2, pp. 369–377, 2012.

[22] J. W. Fondon III and H. R. Garner, “Molecular origins of rapid and continuous morphological evolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 52, pp. 18058–18063, 2004.

[23] M. E. MacDonald, C. M. Ambrose, M. P. Duyao et al., “A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes,” Cell, vol. 72, no. 6, pp. 971–983, 1993.
